Don't be dogmatic about following this, you can't get too far off track with this
-
Ask questions to understand stakeholder needs
- User story: Dashboard will be visited by this persona at this cadence to make this decision.
-
Ask questions to understand data
-
Sketch dashboard (on pen and paper)
- Chart elements
- Define filters
-
SQL queries to generate input to dashboard elements using
some_ideal_clean_and_pristine.table_that_you_think_exists
-
Before starting model, do some sensemaking queries on the source data.
-
Start writing the dbt model to produce
some_ideal_clean_and_pristine.table_that_you_think_exists
- Configure sources, test non-null/uniqueness
- Speak to materialization strategy (don’t add
is_incremental
until you have a running model) - Create model. Start with import CTEs.
- Scaffold with step-by-step CTEs before you start hacking away.
- Filter near the top:
- Deleted records
- Excluded users
- Events too old to be relevant
-
Create yaml
- add relevant tests
with events as (
select * from {{ref('events')}}
{% if is_incremental() %}
where event_timestamp >= (select max(event_timestamp)::date from {{this}})
{% endif %}
)
--rest of model...
Unique Key (Incremental Models docs)
{{
config(
materialized='incremental',
unique_key='date_day'
)
}}
select
date_trunc('day', event_at) as date_day,
count(distinct user_id) as daily_active_users
from {{ ref('app_data_events') }}
{% if is_incremental() %}
-- this filter will only be applied on an incremental run
-- (uses >= to include records arriving later on the same day as the last run of this model)
where date_day >= (select coalesce(max(date_day), '1900-01-01') from {{ this }})
{% endif %}
group by 1
dbt run --full-refresh --select my_incremental_model+
I probably want to avoid packages, but if you need codegen or utils, create a packages.yml
file in the dir as the project yaml, and run dbt deps
.
packages:
- package: dbt-labs/codegen
version: 0.10.0
Run dbt deps
. And then here's the command line
dbt --quiet run-operation generate_source --args '{"table_names": ["orders"]}' > models/staging/jaffle_shop/_sources.yml
- Late binding views
WITH NO SCHEMA BINDING;
in dbt it's{{ config(materialized='view', bind=False) }}
. - Auto-refresh
Via the dbt docs for Redshift config
Be up-front about not being super familiar. But my basic understanding is: distkey is for joining, sortkey is for grouping and filtering.
-- Example with one sort key
{{ config(materialized='table', sort='reporting_day', dist='unique_id') }}
select ...
-- Example with multiple sort keys
{{ config(materialized='table', sort=['category', 'region', 'reporting_day'], dist='received_at') }}
select ...
-- Example with interleaved sort keys
{{ config(materialized='table',
sort_type='interleaved'
sort=['category', 'region', 'reporting_day'],
dist='unique_id')
}}
select ...