ONE dbt project for multiple environments

ONE dbt project for multiple environments

Dev, QA and prod with 1 codebase!

Motivation

The goal of this blog is to show how to use dbt to manage multiple database environments for data engineers or analysts.

Due to the delay in sync between QA, development and production database, your SQL code might be slightly different for different databases.

The problem we are facing is how to manage multiple database environments with one codebase.

Solution

Let's set up a dbt project called dbt_project

# set up a dbt project in current directory
dbt init <project_name>

You could set up database credentials in nano ~/.dbt/profile.yml , which looks like this

dbt_project:
  # the default env
  target: dev
  outputs:
    # dev environemnt credential
    dev:
      type: mysql
      server: [server/host]
      port: [port]  # optional
      database: [schema] # optional, should be same as schema
      schema: [schema]
      username: [username]
      password: [password]
      driver: MySQL ODBC 8.0 ANSI Driver
    # qa environemnt credential
    qa:
      type: mysql
      server: [server/host]
      port: [port]  # optional
      database: [schema] # optional, should be same as schema
      schema: [schema]
      username: [username]
      password: [password]
      driver: MySQL ODBC 8.0 ANSI Driver
    # prod environemnt credential
    prod:
      type: mysql
      server: [server/host]
      port: [port]  # optional
      database: [schema] # optional, should be same as schema
      schema: [schema]
      username: [username]
      password: [password]
      driver: MySQL ODBC 8.0 ANSI Driver

You could set up multiple environments specified in the same file.

To test the connection for different environments, you can do

# test the connection for target database specified in yml, in here
# it's dev
dbt debug 

# test connection for qa
dbt debug --target qa

After that, you can compile and run your model for different environment with the following commands

# run the production env
dbt run --target prod

# run the development env
dbt run --target dev

Now, you need to solve how to use one SQL code base for all environments. There are two ways for different levels of control,

  • jinja config in dbt_project/models/sources.yml

  • jinja config in sql code like test.sql

change sources.yml

jinja template has a variable called target that contains info about your connection to the warehouse. It has some general ones such as

variableexampledescription
target.namedevname of the active target in your profile.yml
target.typepostgresThe active adapter being used.
version: 2

sources:
  - name: source_name 
    database: |
      {%- if  target.name == "dev" -%} raw_dev
      {%- elif target.name == "qa"  -%} raw_qa
      {%- elif target.name == "prod"  -%} raw_prod
      {%- else -%} invalid_database
      {%- endif -%}
    schema: source_schema

You can control the sources to compile different database environments

with more granular control

filter out the data only for dev environment with the following command

-- only filter out data only for 'dev' environment 
select
  *
from source('raw', 'client')
{% if target.name == 'dev' %}
where ingested_date >= '2023-07-01'
{% endif %}

Summary

Try dbt to manage multiple environments to write SQL!

Reference