Indexes in DuckDB

Learning Focus

Use this lesson to understand DuckDB's approach to indexing — different from traditional RDBMS indexes.

DuckDB's Indexing Philosophy

DuckDB is an OLAP engine, meaning it is designed for full-column scans rather than point lookups. As a result:

Traditional B-tree indexes provide limited benefit for analytics
DuckDB uses ART (Adaptive Radix Tree) indexes for primary keys and unique constraints
DuckDB internally uses zone maps (min/max statistics per row group) for Parquet files

ART Indexes (PRIMARY KEY / UNIQUE)

ART indexes are automatically created when you define PRIMARY KEY or UNIQUE constraints:

-- Primary key automatically creates an ART index
CREATE TABLE employees (
    id   INTEGER PRIMARY KEY,
    name VARCHAR,
    dept VARCHAR
);

-- Explicit UNIQUE constraint creates an index
CREATE TABLE users (
    email    VARCHAR UNIQUE,
    username VARCHAR
);

Explicit Index Creation

-- Create an index on a column
CREATE INDEX idx_dept ON employees (department);

-- Create a composite index
CREATE INDEX idx_dept_salary ON employees (department, salary);

-- Drop an index
DROP INDEX idx_dept;

When Indexes Help in DuckDB

Scenario	Index Helps?
Point lookup (`WHERE id = 5`)	Yes — ART index on primary key
Exact match on indexed column	Yes
Range scan on large table	Limited (zone maps often faster)
Full table analytics / GROUP BY	No — columnar scan is faster
Join on unique key	Yes

Zone Maps (Automatic Parquet Optimization)

For Parquet files, DuckDB automatically uses zone maps (row group min/max statistics):

-- DuckDB reads this filter and skips Parquet row groups
-- where year is not between the min and max of each group
SELECT * FROM read_parquet('sales.parquet')
WHERE year = 2024;

No explicit index creation needed — this is automatic for Parquet.

Checking Existing Indexes

-- Show all indexes in the database
SELECT * FROM duckdb_indexes();

-- Show index details for a table
SELECT * FROM duckdb_indexes() WHERE table_name = 'employees';

Best Practices

Don't over-index: Indexes add write overhead and are rarely useful for OLAP queries
Use PRIMARY KEY: Always define a PK — it helps JOIN performance
Use Parquet: Zone maps provide automatic filtering without explicit indexes
Sort Parquet data: Sort by common filter columns before writing Parquet to improve zone map efficiency

-- Sorted Parquet = better zone maps
COPY (SELECT * FROM sales ORDER BY year, region)
TO 'sales_sorted.parquet' (FORMAT PARQUET);

Common Pitfalls

Pitfall	Consequence	Prevention
Creating many indexes expecting MySQL-style speedup	No analytics benefit	Prefer sorted Parquet over indexes
Index on low-cardinality column	Index rarely used	Index high-cardinality columns only
Forgetting PRIMARY KEY on join tables	Slower joins	Always define PK on dimension tables

Quick Reference

CREATE INDEX idx_name ON table (column);
CREATE INDEX idx_comp ON table (col1, col2);
DROP INDEX idx_name;
SELECT * FROM duckdb_indexes();

What's Next

Next: Extensions - Expand DuckDB with spatial, full-text search, and more.
Module Overview - Return to this module index.

DuckDB's Indexing Philosophy​

ART Indexes (PRIMARY KEY / UNIQUE)​

Explicit Index Creation​

When Indexes Help in DuckDB​

Zone Maps (Automatic Parquet Optimization)​

Checking Existing Indexes​

Best Practices​

Common Pitfalls​

Quick Reference​

What's Next​