I have a job interview and wanted to have stuff handy.
Don't be dogmatic about following this but it's better than no plan.
Ask questions to understand stakeholder needs
Ask questions to understand data
Sketch dashboard (on pen and paper)
SQL queries to generate input to dashboard elements using some_ideal_clean_and_pristine.table_that_you_think_exists
Before starting model, consider doing sensemaking queries on the source data.
Start writing the dbt model to produce some_ideal_clean_and_pristine.table_that_you_think_exists
is_incremental
until you have a running model)Create yaml
with events as (
select * from {{ref('events')}}
{% if is_incremental() %}
where event_timestamp >= (select max(event_timestamp)::date from {{this}})
{% endif %}
)
--rest of model...
{{
config(
materialized='incremental',
unique_key='date_day'
)
}}
select
date_trunc('day', event_at) as date_day,
count(distinct user_id) as daily_active_users
from {{ ref('app_data_events') }}
{% if is_incremental() %}
-- this filter will only be applied on an incremental run
-- (uses >= to include records arriving later on the same day as the last run of this model)
where date_day >= (select coalesce(max(date_day), '1900-01-01') from {{ this }})
{% endif %}
group by 1
dbt run --full-refresh --select my_incremental_model+
I probably want to avoid packages, but if you need codegen or utils, create a packages.yml
file in the dir as the project yaml, and run dbt deps
.
packages:
- package: dbt-labs/codegen
version: 0.10.0
Run dbt deps
. And then here's the command line
dbt --quiet run-operation generate_source --args '{"table_names": ["orders"]}' > models/staging/jaffle_shop/_sources.yml
WITH NO SCHEMA BINDING;
in dbt it's {{ config(materialized='view', bind=False) }}
.Via the dbt docs for Redshift config
Be up-front about not being super familiar. But my basic understanding is: distkey is for joining, sortkey is for grouping and filtering.
-- Example with one sort key
{{ config(materialized='table', sort='reporting_day', dist='unique_id') }}
select ...
-- Example with multiple sort keys
{{ config(materialized='table', sort=['category', 'region', 'reporting_day'], dist='received_at') }}
select ...
-- Example with interleaved sort keys
{{ config(materialized='table',
sort_type='interleaved'
sort=['category', 'region', 'reporting_day'],
dist='unique_id')
}}
select ...
unbounded: Where BETWEEN is not specified, the frame is implicitly bounded by the current row. For example, ROWS 5 PRECEDING is equal to ROWS BETWEEN 5 PRECEDING AND CURRENT ROW. Also, ROWS UNBOUNDED FOLLOWING is equal to ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING.
over (order by day rows unbounded preceding)
is enough