Skip to main content

GROUP BY

Learning Focus

Use this lesson to master GROUP BY for summarizing data by category in DuckDB.

Concept Overview

GROUP BY collapses multiple rows sharing the same value(s) into a single summary row, enabling aggregate functions to compute per-group statistics.

Basic Syntax

SELECT column, aggregate_function(other_col)
FROM table
GROUP BY column;

Examples

Example 1: Headcount per Department

SELECT department, COUNT(*) AS headcount
FROM employees
GROUP BY department;

Example 2: Multi-Column Grouping

SELECT department, active, COUNT(*) AS count, AVG(salary) AS avg_salary
FROM employees
GROUP BY department, active
ORDER BY department, active;

Example 3: Group by Expression

-- Group by year of hire
SELECT
extract('year' FROM hire_date) AS hire_year,
COUNT(*) AS hires
FROM employees
GROUP BY hire_year
ORDER BY hire_year;

Example 4: Group on File Data

-- Aggregate directly from a CSV
SELECT category, SUM(revenue) AS total_revenue
FROM read_csv_auto('sales.csv')
GROUP BY category
ORDER BY total_revenue DESC;

Example 5: Group by with FILTER

SELECT
department,
COUNT(*) FILTER (WHERE active = true) AS active,
COUNT(*) FILTER (WHERE active = false) AS inactive
FROM employees
GROUP BY department;

Common Pitfalls

PitfallConsequencePrevention
Non-aggregated column missing from GROUP BYErrorEvery SELECT column must be in GROUP BY or wrapped in an aggregate
Grouping by high-cardinality columnsMany small groupsCheck cardinality with COUNT(DISTINCT col) first
Using WHERE instead of HAVING for group filtersError or wrong resultsFilter group results with HAVING

Quick Reference

SELECT col, COUNT(*), AVG(col2) FROM t GROUP BY col;
SELECT col1, col2, SUM(col3) FROM t GROUP BY col1, col2;
SELECT extract('year' FROM date_col), COUNT(*) FROM t GROUP BY 1;

What's Next