
Parallel Execution

Learning Focus

Use this lesson to understand DuckDB's parallel execution model and how to tune it for your hardware.

DuckDB is Parallel by Default

DuckDB parallelizes queries automatically. By default its thread pool matches the number of CPU cores it detects, so basic parallel execution needs no configuration.

-- This query runs in parallel across the available cores, provided the input is large enough to split
SELECT department, SUM(salary)
FROM read_parquet('huge_employees.parquet')
GROUP BY department;

How DuckDB Parallelizes

Query → split into pipelines → each pipeline runs on multiple threads
┌────────────────────────────────┐
│ Thread 1: scan rows 0-1M       │
│ Thread 2: scan rows 1M-2M      │
│ Thread 3: scan rows 2M-3M      │
│ Thread 4: scan rows 3M-4M      │
└────────────────────────────────┘
                ↓
        Merge + Aggregate
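
The scan-then-merge flow in the diagram can be imitated in a few lines of Python. This is a toy model, not DuckDB's implementation: the sample rows, the `partial_sum` and `parallel_group_sum` names, and the even chunking are all assumptions made for illustration. Each worker aggregates its own chunk, and a final step merges the partial results.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample data: (department, salary) rows.
ROWS = [("eng", 120), ("ops", 90), ("eng", 130), ("sales", 80),
        ("ops", 95), ("sales", 85), ("eng", 110), ("ops", 100)]

def partial_sum(chunk):
    """Aggregate one chunk independently, as a single worker thread would."""
    totals = Counter()
    for department, salary in chunk:
        totals[department] += salary
    return totals

def parallel_group_sum(rows, workers=4):
    """Toy GROUP BY department, SUM(salary): chunked scan, then merge."""
    # Split the input into contiguous chunks, roughly one per worker.
    size = max(1, -(-len(rows) // workers))  # ceiling division, at least 1
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    # Merge step: add up the per-worker partial sums.
    merged = Counter()
    for part in partials:
        merged.update(part)
    return dict(merged)

print(parallel_group_sum(ROWS))
```

The merged totals are the same for any worker count, which is the property that makes this kind of aggregation safe to parallelize.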

Configuring Thread Count

-- Check current thread count
SELECT current_setting('threads');

-- Set to a specific count (default: the number of CPU cores DuckDB detects)
PRAGMA threads = 4;

-- Restore the default (use all detected cores)
RESET threads;

-- At startup, from the shell
-- duckdb mydb.db -cmd "SET threads = 4"

Memory Configuration

-- Check memory limit (default: 80% of physical RAM)
SELECT current_setting('memory_limit');

-- Set memory limit
PRAGMA memory_limit = '8GB';

-- Set the directory used for spill-to-disk when a query exceeds the memory limit
PRAGMA temp_directory = '/tmp/duckdb_temp';

Monitoring Query Parallelism

-- Show a progress bar for long-running queries
PRAGMA enable_progress_bar;

-- Run the query and report per-operator timings and row counts
EXPLAIN ANALYZE
SELECT COUNT(*), AVG(salary)
FROM read_parquet('huge_dataset.parquet');

Parallel File Reads

-- Multiple Parquet files are read in parallel automatically
SELECT * FROM read_parquet('data/*.parquet');

-- S3 files are also read in parallel (requires the httpfs extension)
INSTALL httpfs;
LOAD httpfs;
SELECT * FROM read_parquet('s3://bucket/events/**/*.parquet');

Vectorized Execution

DuckDB processes data in vectors of 2048 rows (the default vector size) rather than row-by-row. This:

  • Maximizes CPU cache efficiency
  • Enables SIMD instructions
  • Reduces function call overhead

You do not need to configure vectorized execution — it is always active.
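
To make the batch-at-a-time idea concrete, here is a minimal Python sketch. It is a conceptual toy, not how DuckDB is implemented: `scan`, `filter_op`, `vectorized_sum`, and the `VECTOR_SIZE` constant are hypothetical names chosen for this example. The point it shows is that each operator is called once per batch instead of once per row.

```python
VECTOR_SIZE = 2048  # illustrative batch size for this sketch

def scan(values, vector_size=VECTOR_SIZE):
    """Yield the input as fixed-size batches ("vectors")."""
    for start in range(0, len(values), vector_size):
        yield values[start:start + vector_size]

def filter_op(batch, threshold):
    # One call handles a whole batch, so call overhead is paid per batch, not per row.
    return [v for v in batch if v > threshold]

def vectorized_sum(values, threshold):
    """Sum the values above threshold; also return how many operator calls were made."""
    total, calls = 0, 0
    for batch in scan(values):
        total += sum(filter_op(batch, threshold))
        calls += 1
    return total, calls

salaries = list(range(10_000))
total, calls = vectorized_sum(salaries, threshold=4_999)
print(total, calls)
```

For 10,000 input rows the filter runs 5 times rather than 10,000 times, and each call works on a contiguous slice of data.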

Benchmarking Parallel Queries

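-- Enable timing (DuckDB CLI dot command)
.timer on

-- Single thread
PRAGMA threads = 1;
SELECT COUNT(*), AVG(salary) FROM read_parquet('data.parquet');

-- All cores (restore the default thread count)
RESET threads;
SELECT COUNT(*), AVG(salary) FROM read_parquet('data.parquet');

-- Run each query more than once: the first read is slower while the file is not yet in the OS cache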

Common Pitfalls

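| Pitfall | Consequence | Prevention |
|---|---|---|
| Using DuckDB in many concurrent processes | CPU and memory contention, because each process sizes its threads and memory limit for the whole machine | Prefer a single DuckDB process; if you need several, cap threads and memory_limit in each |
| Setting too many threads on a shared server | Starves other processes | Set PRAGMA threads = N, where N is your allocated core count |
| Running DuckDB in Docker with limited CPUs | Reduced parallelism | Set the Docker --cpus flag appropriately and set threads to match |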
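
When several DuckDB processes must share one machine, one simple way to avoid the contention described above is to split the allocation evenly and emit the matching settings. This is a sketch under that assumption: `per_process_settings` is a hypothetical helper, and an even split is only one possible policy.

```python
def per_process_settings(allocated_cores, allocated_memory_gb, processes=1):
    """Return SET statements that divide a core and memory budget evenly."""
    if processes < 1:
        raise ValueError("processes must be at least 1")
    # Integer division keeps the total at or under the allocation; never go below 1.
    threads = max(1, allocated_cores // processes)
    memory_gb = max(1, allocated_memory_gb // processes)
    return [
        f"SET threads = {threads};",
        f"SET memory_limit = '{memory_gb}GB';",
    ]

# Example: 8 cores and 16 GB shared by 2 DuckDB processes.
for statement in per_process_settings(8, 16, processes=2):
    print(statement)
```

Run the generated statements at the start of each process, before any heavy queries.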

Quick Reference

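SELECT current_setting('threads');          -- show current
PRAGMA threads = 8;                         -- set to 8 threads
RESET threads;                              -- restore the default
PRAGMA memory_limit = '4GB';                -- cap memory use
PRAGMA temp_directory = '/tmp/duck_spill';  -- spill-to-disk location
PRAGMA enable_progress_bar;                 -- progress bar for long queries
EXPLAIN ANALYZE <query>;                    -- run and profile a query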
