SchemaSpy: how to understand your data model
#Mysql # datamodel #arm64 #mac m1
Motivation
For a new data engineer or analyst who has just been onboarded, it is easy to be overwhelmed by your company's data model, especially when it is not well documented. In this article, we are going to talk about an open-source tool called SchemaSpy.
It can show you the metadata of your database.
It can also show you the ERD (1 implied and 2 implied), so you are not limited to a SQL IDE such as MySQL Workbench.
What is SchemaSpy?
SchemaSpy is an open-source tool for analyzing and visualizing database schemas. It generates interactive diagrams and documentation that help users understand the relationships between tables, columns, and other database objects.
SchemaSpy supports a wide range of databases, including MySQL, PostgreSQL, Oracle, and SQL Server, and can be used to generate documentation for both small and large databases. SchemaSpy is a popular tool for data architects, developers, and other professionals who work with databases and need to understand their structure and relationships.
It works well if your database is on-prem, but I haven't tested it against a cloud-based database.
Installation
You need the following prerequisites to run it locally without Docker:
graphviz: for making diagrams
java: for running the app
mysql jdbc driver: select "Platform Independent" as the OS and download the zip file
schemaspy.jar: you can download it by following the instructions on their GitHub
Let's use brew to install the prerequisites:
brew install graphviz
brew install java
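Before moving on, it can help to confirm that both tools are actually on your PATH. The following is a minimal sketch, not part of SchemaSpy itself: `missing_tools` is a hypothetical helper name, and `dot` is the executable that Graphviz provides.

```shell
# Print the name of every given command that cannot be found on PATH.
# Prints nothing when all of them are available.
missing_tools() {
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
    fi
  done
}

# dot comes from graphviz; java runs schemaspy.jar.
missing="$(missing_tools dot java)"
if [ -n "$missing" ]; then
  echo "Missing prerequisites:" $missing
else
  echo "All prerequisites found"
fi
```

If something is reported as missing right after installing it, opening a new terminal session so that PATH is reloaded is usually the first thing to try.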
Once graphviz and java are installed, and the mysql jdbc driver and schemaspy.jar are in the same directory, make a directory for the output:
# check your java version
java --version
# make a directory for storing schemaSpy index.html
mkdir output
Your file structure should then look something like this:
├── mysql-connector-j-8.0.33
├── output
└── schemaspy.jar
Then run SchemaSpy with the following command:
java -jar ./schemaspy.jar \
-dp ./mysql-connector-j-8.0.33/mysql-connector-j-8.0.33.jar \
-t mysql \
-db <database_name> \
-schemas <database_schema> \
-host <host_ip> \
-port 3306 \
-u <username> \
-p <password> \
-o ./output
Here is what each flag in the command above means:
Flag | Description |
-dp | Path to the JDBC driver JAR file |
-t | Type of database being analyzed |
-db | Name of the database to analyze |
-schemas | Comma-separated list of schemas to include in the analysis |
-host | IP address or hostname of the database server |
-port | Port number to use for the database connection |
-u | Username to use for the database connection |
-p | Password to use for the database connection |
-o | Output directory for the generated documentation |
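If you run this often, one option is to wrap the command in a small shell function so the connection details come from environment variables instead of being retyped (and saved in your shell history) each time. This is only a sketch using the flags from the table above: the function name `run_schemaspy` and every variable name (`DB_NAME`, `DB_USER`, `DB_PASSWORD`, `DB_SCHEMAS`, `DB_HOST`, `DB_PORT`, `JAVA_BIN`, `SCHEMASPY_JAR`, `JDBC_JAR`, `OUTPUT_DIR`) are hypothetical, and defaulting `-schemas` to the database name is an assumption that fits MySQL, where a schema and a database are the same thing.

```shell
# Sketch: run SchemaSpy with settings taken from environment variables.
# All variable names are hypothetical and chosen for this example only.
run_schemaspy() {
  # Stop with a clear message if a required setting is missing.
  : "${DB_NAME:?set DB_NAME}" "${DB_USER:?set DB_USER}" "${DB_PASSWORD:?set DB_PASSWORD}"

  # JAVA_BIN can be overridden (for example with echo) to preview the command.
  "${JAVA_BIN:-java}" -jar "${SCHEMASPY_JAR:-./schemaspy.jar}" \
    -dp "${JDBC_JAR:-./mysql-connector-j-8.0.33/mysql-connector-j-8.0.33.jar}" \
    -t mysql \
    -db "$DB_NAME" \
    -schemas "${DB_SCHEMAS:-$DB_NAME}" \
    -host "${DB_HOST:-localhost}" \
    -port "${DB_PORT:-3306}" \
    -u "$DB_USER" \
    -p "$DB_PASSWORD" \
    -o "${OUTPUT_DIR:-./output}"
}
```

After exporting `DB_NAME`, `DB_USER`, and `DB_PASSWORD`, calling `run_schemaspy` runs the same command as above. Note that the password is still passed as a command-line argument, so it remains visible to other processes on the machine while SchemaSpy runs.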
SchemaSpy will then generate index.html in the output directory. To view it:
cd output
# use chrome to open it, you can use other browser of your preference
open -a "Google Chrome" index.html
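The `open` command above is specific to macOS. If teammates on Linux also need to view the report, a rough sketch like the one below picks an opener based on the operating system. It assumes you run it from the project directory (the one containing `output`), and `opener_for` is a hypothetical helper name.

```shell
# Map an OS name (as printed by `uname -s`) to a command that opens files.
# Returns a non-zero status for systems this sketch does not know about.
opener_for() {
  case "$1" in
    Darwin) echo "open" ;;
    Linux)  echo "xdg-open" ;;
    *)      return 1 ;;
  esac
}

report="./output/index.html"
if [ ! -f "$report" ]; then
  # Nothing to open until SchemaSpy has been run.
  echo "No report at $report yet; run SchemaSpy first"
elif opener="$(opener_for "$(uname -s)")"; then
  "$opener" "$report"
else
  echo "Open $report manually in a browser"
fi
```

This opens the report in the system's default browser rather than in a specific one such as Chrome.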
Now you should be able to enjoy a beautiful generated site for understanding your database and data model (believe me, understanding the data model is what takes the most time).
Summary
In this article, we covered how to use SchemaSpy to understand your database model and how to run it locally. SchemaSpy is great, but it does have limitations: for example, the ERD only shows tables that are related by keys (logical mapping is not supported), which is understandable.