Indexes in DuckDB
Learning Focus
Use this lesson to understand DuckDB's approach to indexing — different from traditional RDBMS indexes.
DuckDB's Indexing Philosophy
DuckDB is an OLAP engine, meaning it is designed for full-column scans rather than point lookups. As a result:
- Traditional B-tree indexes provide limited benefit for analytics
- DuckDB uses ART (Adaptive Radix Tree) indexes for primary keys and unique constraints
- DuckDB internally uses zone maps (min/max statistics per row group) for Parquet files
ART Indexes (PRIMARY KEY / UNIQUE)
ART indexes are automatically created when you define PRIMARY KEY or UNIQUE constraints:
-- Primary key automatically creates an ART index
CREATE TABLE employees (
id INTEGER PRIMARY KEY,
name VARCHAR,
dept VARCHAR
);
-- Explicit UNIQUE constraint creates an index
CREATE TABLE users (
email VARCHAR UNIQUE,
username VARCHAR
);
Explicit Index Creation
-- Create an index on a column
CREATE INDEX idx_dept ON employees (department);
-- Create a composite index
CREATE INDEX idx_dept_salary ON employees (department, salary);
-- Drop an index
DROP INDEX idx_dept;
When Indexes Help in DuckDB
| Scenario | Index Helps? |
|---|---|
Point lookup (WHERE id = 5) | Yes — ART index on primary key |
| Exact match on indexed column | Yes |
| Range scan on large table | Limited (zone maps often faster) |
| Full table analytics / GROUP BY | No — columnar scan is faster |
| Join on unique key | Yes |
Zone Maps (Automatic Parquet Optimization)
For Parquet files, DuckDB automatically uses zone maps (row group min/max statistics):
-- DuckDB reads this filter and skips Parquet row groups
-- where year is not between the min and max of each group
SELECT * FROM read_parquet('sales.parquet')
WHERE year = 2024;
No explicit index creation needed — this is automatic for Parquet.
Checking Existing Indexes
-- Show all indexes in the database
SELECT * FROM duckdb_indexes();
-- Show index details for a table
SELECT * FROM duckdb_indexes() WHERE table_name = 'employees';
Best Practices
- Don't over-index: Indexes add write overhead and are rarely useful for OLAP queries
- Use PRIMARY KEY: Always define a PK — it helps JOIN performance
- Use Parquet: Zone maps provide automatic filtering without explicit indexes
- Sort Parquet data: Sort by common filter columns before writing Parquet to improve zone map efficiency
-- Sorted Parquet = better zone maps
COPY (SELECT * FROM sales ORDER BY year, region)
TO 'sales_sorted.parquet' (FORMAT PARQUET);
Common Pitfalls
| Pitfall | Consequence | Prevention |
|---|---|---|
| Creating many indexes expecting MySQL-style speedup | No analytics benefit | Prefer sorted Parquet over indexes |
| Index on low-cardinality column | Index rarely used | Index high-cardinality columns only |
| Forgetting PRIMARY KEY on join tables | Slower joins | Always define PK on dimension tables |
Quick Reference
CREATE INDEX idx_name ON table (column);
CREATE INDEX idx_comp ON table (col1, col2);
DROP INDEX idx_name;
SELECT * FROM duckdb_indexes();
What's Next
- Next: Extensions - Expand DuckDB with spatial, full-text search, and more.
- Module Overview - Return to this module index.