SQL for Modern Analytics
The engine behind data extraction. Beyond SELECT *.
SQL remains the lingua franca of data work, but the version most analysts learn in bootcamps is a pale shadow of what production analytics requires. Window functions, CTEs, query optimisation on multi-billion-row tables: these are the skills that separate a data analyst from a data strategist.
In this module
- Window Functions: The Game Changer
- CTEs and Query Readability
- Cloud-Scale SQL: BigQuery & Snowflake
Window Functions: The Game Changer
Most analysts are comfortable with GROUP BY aggregations. Window functions take this several levels higher, letting you compute an aggregation OVER a sliding frame while keeping every row visible in the result set. Consider a simple use case: computing a 7-day rolling average of daily revenue without collapsing your dataset.
The canonical pattern is OVER (PARTITION BY … ORDER BY … ROWS BETWEEN …). Mastering this alone will make you 10× more productive in Power BI, because your pre-aggregated SQL becomes trivially connectable to any visual layer. Functions like ROW_NUMBER(), RANK(), DENSE_RANK(), LAG(), and LEAD() enable cohort analysis, funnel step tracking, and retention modelling: use cases that naive GROUP BY queries simply cannot express elegantly.
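The rolling-average case above can be sketched end to end with Python's built-in sqlite3 module (SQLite has supported standard window functions since 3.25). The table and figures are invented for illustration; the frame clause is the point:

```python
import sqlite3
from datetime import date, timedelta

# Toy table of daily revenue (hypothetical data, in-memory database).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_revenue (day TEXT, revenue REAL)")
start = date(2024, 1, 1)
conn.executemany(
    "INSERT INTO daily_revenue VALUES (?, ?)",
    [((start + timedelta(days=i)).isoformat(), 100.0 + i * 10) for i in range(10)],
)

# Canonical frame: average over the current row and the 6 preceding days.
# Note that every daily row survives in the output -- nothing is collapsed.
rows = conn.execute("""
    SELECT
        day,
        revenue,
        AVG(revenue) OVER (
            ORDER BY day
            ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
        ) AS rolling_7d_avg
    FROM daily_revenue
    ORDER BY day
""").fetchall()

for day, revenue, avg7 in rows:
    print(day, revenue, round(avg7, 2))
```

On the seventh day the window first spans a full week, so the rolling average there is the mean of the first seven revenues; earlier rows average over a shorter, growing frame.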
CTEs and Query Readability
Common Table Expressions (CTEs) introduced with the WITH keyword are the closest thing SQL has to functions. They let you decompose a complex query into named, readable steps. A 300-line monolithic SELECT becomes a sequence of well-named sub-queries, each testable in isolation.
The performance implications in modern engines like BigQuery, Snowflake, and DuckDB are often neutral or positive, since the query planner inlines CTEs intelligently. Where they do matter is in collaborative environments: a CTE-based query is reviewable, debuggable, and maintainable in a way that nested sub-queries are not. Treat your SQL like production code, because in a data-driven company, it is.
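A minimal sketch of the decomposition idea, runnable via the stdlib sqlite3 module; the orders table and the 100-unit threshold are invented for illustration. Each CTE is one named, one-sentence step:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 50), ("alice", 70), ("bob", 20), ("bob", 30), ("carol", 200)],
)

rows = conn.execute("""
    WITH
    customer_totals AS (      -- step 1: total spend per customer
        SELECT customer, SUM(amount) AS total
        FROM orders
        GROUP BY customer
    ),
    big_spenders AS (         -- step 2: keep customers at or above 100
        SELECT customer, total
        FROM customer_totals
        WHERE total >= 100
    )
    SELECT customer, total    -- step 3: final presentation
    FROM big_spenders
    ORDER BY total DESC
""").fetchall()
print(rows)
```

Each step can be tested in isolation by replacing the final SELECT with `SELECT * FROM customer_totals` (or `big_spenders`), which is exactly the debuggability argument made above.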
Cloud-Scale SQL: BigQuery & Snowflake
The move from on-premise databases to columnar cloud warehouses changes how you write SQL in subtle but important ways. In BigQuery, you pay per byte scanned, which changes your relationship with SELECT * permanently. You learn to be surgical with column selection, to use partitioned and clustered tables, and to understand when to materialise intermediate results vs. query them on the fly.
In Snowflake, the concept of virtual warehouses means your queries compete for compute resources in a way that on-premise SQL never did. Understanding how to size warehouses, use result caching, and write cache-friendly queries becomes part of your daily analytical toolkit. These platforms also introduce dialect differences, such as QUALIFY (Snowflake's elegant way to filter window function results), UNNEST, ARRAY_AGG, and STRUCT, that have no equivalent in standard SQL.
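To see what QUALIFY buys you, it helps to write the portable equivalent: filtering on a window function in standard SQL requires a wrapping sub-query. The sketch below runs against SQLite (which, like standard SQL, lacks QUALIFY); the events table and the "latest event per user" task are invented for illustration, with the Snowflake form shown in a comment:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id TEXT, ts INTEGER, action TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", [
    ("u1", 1, "signup"), ("u1", 2, "purchase"),
    ("u2", 5, "signup"), ("u2", 9, "churn"),
])

# Snowflake lets you filter on the window function directly:
#   SELECT * FROM events
#   QUALIFY ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY ts DESC) = 1
# Standard SQL needs the window result wrapped in a sub-query first:
rows = conn.execute("""
    SELECT user_id, ts, action
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY user_id ORDER BY ts DESC
               ) AS rn
        FROM events
    )
    WHERE rn = 1
    ORDER BY user_id
""").fetchall()
print(rows)  # latest event per user
```

The two forms return the same rows; QUALIFY simply removes one level of nesting, which adds up quickly in queries with several window-filtered steps.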
Production Best Practices
- Always EXPLAIN ANALYZE before pushing a query to a BI tool; understand the query plan.
- Partition your tables by date in BigQuery; filter on the partition column first, always.
- Write CTEs top-down: each CTE should be explainable in one sentence.
- Never use SELECT * in a production query; column pruning is free performance.
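The EXPLAIN habit is worth practising even locally. Production warehouses expose EXPLAIN ANALYZE; SQLite's lighter equivalent is EXPLAIN QUERY PLAN, which is enough to confirm that a filter hits an index instead of a full scan. The table and index names below are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, region TEXT, amount REAL)")
conn.execute("CREATE INDEX idx_sales_date ON sales (sale_date)")

# Inspect the plan before shipping the query to a BI tool.
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT region, SUM(amount)
    FROM sales
    WHERE sale_date >= '2024-01-01'
    GROUP BY region
""").fetchall()

for row in plan:
    # Last column is the human-readable plan step, e.g. a SEARCH using the index.
    print(row[-1])
```

If the plan shows SCAN rather than SEARCH on the filtered table, the predicate is not using the index; the same reading discipline transfers directly to BigQuery's execution details and Snowflake's query profile.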