Skip to main content

SELECT DISTINCT

Learning Focus

Use this lesson to understand SELECT DISTINCT and how to remove duplicates in DuckDB query results.

Concept Overview

SELECT DISTINCT returns only unique rows from a result set — duplicate rows are eliminated before results are returned.

Basic Syntax

SELECT DISTINCT column1, column2
FROM table_name;

Examples

Example 1: Distinct Departments

SELECT DISTINCT department FROM employees;

Output:

┌─────────────┐
│ department │
├─────────────┤
│ Engineering │
│ Marketing │
│ HR │
│ Finance │
└─────────────┘

Example 2: Distinct Combinations

-- Unique department + active status combinations
SELECT DISTINCT department, active
FROM employees
ORDER BY department;

Example 3: Count of Distinct Values

-- How many unique departments?
SELECT COUNT(DISTINCT department) AS unique_departments
FROM employees;

Example 4: Distinct on File Data

-- Unique categories in a CSV, no import needed
SELECT DISTINCT category
FROM read_csv_auto('products.csv');

DISTINCT vs GROUP BY

Both can deduplicate, but there are differences:

-- These are equivalent for deduplication:
SELECT DISTINCT department FROM employees;

SELECT department FROM employees GROUP BY department;

-- GROUP BY allows aggregations; DISTINCT does not
SELECT department, COUNT(*) FROM employees GROUP BY department;
-- SELECT DISTINCT department, COUNT(*) FROM employees; -- invalid!

Common Pitfalls

PitfallConsequencePrevention
Applying DISTINCT to all columnsRarely removes rowsOnly apply to the columns that should be unique
Using DISTINCT when GROUP BY is neededCannot add aggregationsUse GROUP BY when you also need COUNT, SUM, etc.
Expecting DISTINCT to deduplicate NULLsNULLs treated as equal (one NULL kept)Use IS NOT NULL filtering if NULLs are unwanted

Quick Reference

SELECT DISTINCT col FROM table;
SELECT COUNT(DISTINCT col) FROM table;
SELECT DISTINCT col1, col2 FROM table ORDER BY col1;

What's Next