SmartDataMark logo

DBT 101

What is dbt (data build tool)?

dbt helps analytics engineers/ data engineers transform data inside the warehouse (it’s the T in ELT). You write SQL (plus a bit of YAML for config), and dbt handles dependencies, environments, tests, documentation, and deployment.

Project basics

  • dbt_project.yml → defines your project (models path, materializations, configs).
  • Models → each model is a single SELECT in a .sql file; dbt builds them into views/tables.
  • Adapters (connectors) → Snowflake, BigQuery, Redshift, Postgres, Databricks, DuckDB, etc.
  • Core commands

  • dbt run → compiles + runs your SQL models.
  • dbt build → runs models and tests (and snapshots if present).
  • dbt test → executes tests.
  • dbt compile → only compiles (no execution).
  • dbt docs generate/serve → builds and serves your project docs.
  • Materializations (build strategies)

  • view (default): lightweight, always reads from source each query.
  • table: persists results (fast reads; higher storage).
  • incremental: only processes new/changed data; supports MERGE on many adapters.
  • ephemeral: inlines as CTEs (not persisted).
  • Sources & seeds

  • sources: reference external tables not built by dbt; you can run freshness checks.
  • seeds: load small CSVs from your repo into the warehouse (dbt seed).
  • Testing

  • Generic (schema) tests in YAML: unique, not_null, accepted_values, relationships.
  • Singular (data) tests in SQL: custom assertions as SELECT queries.
  • Run: dbt test, dbt test --select test_type:generic|singular, or target a model.
  • Documentation

  • Write descriptions in YAML or Markdown; dbt generates a searchable HTML site.
  • Customize the landing page with overview.md and add images/assets in your project.
  •         - models/customer_spending.sql
    SELECT
    customer_id,
    SUM(order_amount) AS total_spent
    FROM
    {{ ref('raw_orders') }} -- 'ref' function creates a dependency on the 'raw_orders' table
    GROUP BY
    customer_id
          

    DBT turns SQL + YAML into reliable, tested, documented, and deployable data transformations for modern ELT.