Skip to main content

Indexes in DuckDB

Learning Focus

Use this lesson to understand DuckDB's approach to indexing — different from traditional RDBMS indexes.

DuckDB's Indexing Philosophy

DuckDB is an OLAP engine, meaning it is designed for full-column scans rather than point lookups. As a result:

  • Traditional B-tree indexes provide limited benefit for analytics
  • DuckDB uses ART (Adaptive Radix Tree) indexes for primary keys and unique constraints
  • DuckDB internally uses zone maps (min/max statistics per row group) for Parquet files

ART Indexes (PRIMARY KEY / UNIQUE)

ART indexes are automatically created when you define PRIMARY KEY or UNIQUE constraints:

-- Primary key automatically creates an ART index
CREATE TABLE employees (
id INTEGER PRIMARY KEY,
name VARCHAR,
dept VARCHAR
);

-- Explicit UNIQUE constraint creates an index
CREATE TABLE users (
email VARCHAR UNIQUE,
username VARCHAR
);

Explicit Index Creation

-- Create an index on a column
CREATE INDEX idx_dept ON employees (department);

-- Create a composite index
CREATE INDEX idx_dept_salary ON employees (department, salary);

-- Drop an index
DROP INDEX idx_dept;

When Indexes Help in DuckDB

ScenarioIndex Helps?
Point lookup (WHERE id = 5)Yes — ART index on primary key
Exact match on indexed columnYes
Range scan on large tableLimited (zone maps often faster)
Full table analytics / GROUP BYNo — columnar scan is faster
Join on unique keyYes

Zone Maps (Automatic Parquet Optimization)

For Parquet files, DuckDB automatically uses zone maps (row group min/max statistics):

-- DuckDB reads this filter and skips Parquet row groups
-- where year is not between the min and max of each group
SELECT * FROM read_parquet('sales.parquet')
WHERE year = 2024;

No explicit index creation needed — this is automatic for Parquet.

Checking Existing Indexes

-- Show all indexes in the database
SELECT * FROM duckdb_indexes();

-- Show index details for a table
SELECT * FROM duckdb_indexes() WHERE table_name = 'employees';

Best Practices

  • Don't over-index: Indexes add write overhead and are rarely useful for OLAP queries
  • Use PRIMARY KEY: Always define a PK — it helps JOIN performance
  • Use Parquet: Zone maps provide automatic filtering without explicit indexes
  • Sort Parquet data: Sort by common filter columns before writing Parquet to improve zone map efficiency
-- Sorted Parquet = better zone maps
COPY (SELECT * FROM sales ORDER BY year, region)
TO 'sales_sorted.parquet' (FORMAT PARQUET);

Common Pitfalls

PitfallConsequencePrevention
Creating many indexes expecting MySQL-style speedupNo analytics benefitPrefer sorted Parquet over indexes
Index on low-cardinality columnIndex rarely usedIndex high-cardinality columns only
Forgetting PRIMARY KEY on join tablesSlower joinsAlways define PK on dimension tables

Quick Reference

CREATE INDEX idx_name ON table (column);
CREATE INDEX idx_comp ON table (col1, col2);
DROP INDEX idx_name;
SELECT * FROM duckdb_indexes();

What's Next