GROUP BY
Learning Focus
Use this lesson to master GROUP BY for summarizing data by category in DuckDB.
Concept Overview
GROUP BY collapses multiple rows sharing the same value(s) into a single summary row, enabling aggregate functions to compute per-group statistics.
Basic Syntax
SELECT column, aggregate_function(other_col)
FROM table
GROUP BY column;
Examples
Example 1: Headcount per Department
SELECT department, COUNT(*) AS headcount
FROM employees
GROUP BY department;
Example 2: Multi-Column Grouping
SELECT department, active, COUNT(*) AS count, AVG(salary) AS avg_salary
FROM employees
GROUP BY department, active
ORDER BY department, active;
Example 3: Group by Expression
-- Group by year of hire
SELECT
extract('year' FROM hire_date) AS hire_year,
COUNT(*) AS hires
FROM employees
GROUP BY hire_year
ORDER BY hire_year;
Example 4: Group on File Data
-- Aggregate directly from a CSV
SELECT category, SUM(revenue) AS total_revenue
FROM read_csv_auto('sales.csv')
GROUP BY category
ORDER BY total_revenue DESC;
Example 5: Group by with FILTER
SELECT
department,
COUNT(*) FILTER (WHERE active = true) AS active,
COUNT(*) FILTER (WHERE active = false) AS inactive
FROM employees
GROUP BY department;
Common Pitfalls
| Pitfall | Consequence | Prevention |
|---|---|---|
| Non-aggregated column missing from GROUP BY | Error | Every SELECT column must be in GROUP BY or wrapped in an aggregate |
| Grouping by high-cardinality columns | Many small groups | Check cardinality with COUNT(DISTINCT col) first |
| Using WHERE instead of HAVING for group filters | Error or wrong results | Filter group results with HAVING |
Quick Reference
SELECT col, COUNT(*), AVG(col2) FROM t GROUP BY col;
SELECT col1, col2, SUM(col3) FROM t GROUP BY col1, col2;
SELECT extract('year' FROM date_col), COUNT(*) FROM t GROUP BY 1;
What's Next
- Next: HAVING - Filter groups after aggregation.
- Module Overview - Return to this module index.