Motivation
The goal of this blog is to show how to use dbt
to manage multiple database environments for data engineers or analysts.
Due to the delay in sync between QA, development and production database, your SQL code might be slightly different for different databases.
The problem we are facing is how to manage multiple database environments with one codebase.
Solution
Let's set up a dbt project called dbt_project
# set up a dbt project in current directory
dbt init <project_name>
You could set up database credentials in nano ~/.dbt/profile.yml
, which looks like this
dbt_project:
# the default env
target: dev
outputs:
# dev environemnt credential
dev:
type: mysql
server: [server/host]
port: [port] # optional
database: [schema] # optional, should be same as schema
schema: [schema]
username: [username]
password: [password]
driver: MySQL ODBC 8.0 ANSI Driver
# qa environemnt credential
qa:
type: mysql
server: [server/host]
port: [port] # optional
database: [schema] # optional, should be same as schema
schema: [schema]
username: [username]
password: [password]
driver: MySQL ODBC 8.0 ANSI Driver
# prod environemnt credential
prod:
type: mysql
server: [server/host]
port: [port] # optional
database: [schema] # optional, should be same as schema
schema: [schema]
username: [username]
password: [password]
driver: MySQL ODBC 8.0 ANSI Driver
You could set up multiple environments specified in the same file.
To test the connection for different environments, you can do
# test the connection for target database specified in yml, in here
# it's dev
dbt debug
# test connection for qa
dbt debug --target qa
After that, you can compile and run your model for different environment with the following commands
# run the production env
dbt run --target prod
# run the development env
dbt run --target dev
Now, you need to solve how to use one SQL code base for all environments. There are two ways for different levels of control,
jinja config in
dbt_project/models/sources.yml
jinja config in sql code like
test.sql
change sources.yml
jinja template has a variable called target
that contains info about your connection to the warehouse. It has some general ones such as
variable | example | description |
target.name | dev | name of the active target in your profile.yml |
target.type | postgres | The active adapter being used. |
version: 2
sources:
- name: source_name
database: |
{%- if target.name == "dev" -%} raw_dev
{%- elif target.name == "qa" -%} raw_qa
{%- elif target.name == "prod" -%} raw_prod
{%- else -%} invalid_database
{%- endif -%}
schema: source_schema
You can control the sources to compile different database environments
with more granular control
filter out the data only for dev environment with the following command
-- only filter out data only for 'dev' environment
select
*
from source('raw', 'client')
{% if target.name == 'dev' %}
where ingested_date >= '2023-07-01'
{% endif %}
Summary
Try dbt
to manage multiple environments to write SQL!