SchemaSpy: how to understand your data model

#Mysql #datamodel #arm64 #mac m1

Motivation

For a new data engineer or analyst who has just been onboarded, it is easy to be overwhelmed by your company's data model, especially when it is not well documented. In this article, we are going to talk about an open-source tool called SchemaSpy.

It can show you the meta-information of your database:

A SchemaSpy-generated web page

It can also show you the ERD (1 implied and 2 implied), so you are not limited to what a SQL IDE such as MySQL Workbench offers.

What is SchemaSpy?

SchemaSpy is an open-source tool for analyzing and visualizing database schemas. It generates interactive diagrams and documentation that help users understand the relationships between tables, columns, and other database objects.

SchemaSpy supports a wide range of databases, including MySQL, PostgreSQL, Oracle, and SQL Server, and can be used to generate documentation for both small and large databases. SchemaSpy is a popular tool for data architects, developers, and other professionals who work with databases and need to understand their structure and relationships.

It works well if your database is on-prem, but I haven't tested it against a cloud-based database.

Installation

You need the following prerequisites to run it locally without Docker:

  • graphviz : for making diagrams

  • java: for running the app.

  • MySQL JDBC driver: select "Platform Independent" as the OS and download the zip file.

  • schemaspy.jar: you can download it by following the instructions on their GitHub.

Let's use brew to install the first two prerequisites:

brew install graphviz

brew install java

Once graphviz and java are installed, and the MySQL JDBC driver and schemaspy.jar sit in the same directory, check your Java version and create an output directory:

# check your java version
java --version

# make a directory for storing schemaSpy index.html
mkdir output

Your file structure should then look something like this:

├── mysql-connector-j-8.0.33 
├── output 
└── schemaspy.jar
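Before running anything, a quick preflight check can save some head-scratching. The sketch below is not part of SchemaSpy; the function names (check_tool, check_path, preflight) are made up for this article, and the file names assume the exact layout shown above, so adjust them to your driver version.

```shell
#!/usr/bin/env bash
# Preflight sketch: report which prerequisites from this article are present.
# Function names are hypothetical; paths mirror the layout shown above.

# Print "ok: NAME" if a command is on PATH, otherwise "missing: NAME".
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# Print "ok: PATH" if a file or directory exists, otherwise "missing: PATH".
check_path() {
  if [ -e "$1" ]; then
    echo "ok: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# Run every check and return non-zero if any of them failed.
preflight() {
  failed=0
  check_tool dot  || failed=1   # dot is the Graphviz binary
  check_tool java || failed=1
  check_path ./schemaspy.jar || failed=1
  check_path ./mysql-connector-j-8.0.33/mysql-connector-j-8.0.33.jar || failed=1
  check_path ./output || failed=1
  return "$failed"
}

preflight || echo "Fix the missing items before running SchemaSpy."
```

Every line of output starts with either "ok:" or "missing:", so it is easy to spot what still needs to be installed or downloaded.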

Then run SchemaSpy with the following command:

java -jar ./schemaspy.jar \
   -dp ./mysql-connector-j-8.0.33/mysql-connector-j-8.0.33.jar \
   -t mysql \
   -db <database_name> \
   -schemas <database_schema> \
   -host <host_ip> \
   -port 3306 \
   -u <username> \
   -p <password> \
   -o ./output

Here is what each flag in the command above means:

Flag       Description
-dp        Path to the JDBC driver JAR file
-t         Type of database being analyzed
-db        Name of the database to analyze
-schemas   Comma-separated list of schemas to include in the analysis
-host      IP address or hostname of the database server
-port      Port number to use for the database connection
-u         Username to use for the database connection
-p         Password to use for the database connection
-o         Output directory for the generated documentation
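If you run this often, it may be handy to wrap the flags above in a small function so the connection details come from environment variables rather than being retyped. This is only a sketch: the DB_* variable names, DRY_RUN, and the run_schemaspy function are invented for this example and are not something SchemaSpy defines. It uses only the flags listed in the table.

```shell
#!/usr/bin/env bash
# Wrapper sketch around the SchemaSpy command from this article.
# DB_NAME, DB_SCHEMA, DB_HOST, DB_PORT, DB_USER, DB_PASSWORD and DRY_RUN
# are hypothetical names chosen for this example.

run_schemaspy() {
  # Stop with a readable message if a required value is unset or empty.
  : "${DB_NAME:?set DB_NAME}" "${DB_SCHEMA:?set DB_SCHEMA}"
  : "${DB_HOST:?set DB_HOST}" "${DB_USER:?set DB_USER}"
  : "${DB_PASSWORD:?set DB_PASSWORD}"

  # In dry-run mode, mask the password so it never reaches the terminal.
  password="$DB_PASSWORD"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    password='***'
  fi

  # Build the argument list; quoting keeps values with spaces intact.
  set -- java -jar ./schemaspy.jar \
    -dp ./mysql-connector-j-8.0.33/mysql-connector-j-8.0.33.jar \
    -t mysql \
    -db "$DB_NAME" \
    -schemas "$DB_SCHEMA" \
    -host "$DB_HOST" \
    -port "${DB_PORT:-3306}" \
    -u "$DB_USER" \
    -p "$password" \
    -o ./output

  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"   # print the command instead of running it
  else
    "$@"        # run SchemaSpy for real
  fi
}

# Example (values are placeholders):
#   DB_NAME=shop DB_SCHEMA=shop DB_HOST=127.0.0.1 DB_USER=reader \
#   DB_PASSWORD=secret DRY_RUN=1 run_schemaspy
```

Running it once with DRY_RUN=1 lets you check the assembled command before it touches the database.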

SchemaSpy then generates index.html in the output directory:

cd output

# use chrome to open it, you can use other browser of your preference
open -a "Google Chrome" index.html
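If the run failed part-way, index.html may not exist and the browser will just show an error. A small guard like the one below (report_path is a name made up for this article) checks for the file first and only then hands the path to open.

```shell
#!/usr/bin/env bash
# Sketch: print the report path if it exists, otherwise explain and fail.
# report_path is a hypothetical helper; the default directory matches the
# ./output directory used in this article.

report_path() {
  report="${1:-./output}/index.html"
  if [ -f "$report" ]; then
    echo "$report"
  else
    echo "no report at $report; check the SchemaSpy output for errors" >&2
    return 1
  fi
}

# Usage on macOS (opens the report in the default browser):
#   report=$(report_path ./output) && open "$report"
```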

Now you can browse the generated pages to understand your database and data model (believe me, that is the part that takes the most time).

Summary

In this article, we covered how to use SchemaSpy to understand your data model and how to run it locally. SchemaSpy is great, but it does have limitations: the ERD only shows tables related by keys (logical mappings that are not declared as keys are not supported), which is understandable.
