Skip to main content

COPY FROM and Export

Learning Focus

Use this lesson to import data into DuckDB tables and export query results to various file formats.

COPY FROM (Import)

-- Import CSV into an existing table
COPY employees FROM 'employees.csv' (FORMAT CSV, HEADER true, DELIMITER ',');

-- Import Parquet
COPY employees FROM 'employees.parquet' (FORMAT PARQUET);

-- Import JSON Lines
COPY events FROM 'events.ndjson' (FORMAT JSON);

COPY TO (Export)

Export to CSV

-- Export entire table
COPY employees TO 'output/employees.csv' (FORMAT CSV, HEADER true);

-- Export query result
COPY (
SELECT name, department, salary
FROM employees
WHERE active = true
ORDER BY salary DESC
) TO 'active_employees.csv' (FORMAT CSV, HEADER true, DELIMITER ',');

Export to Parquet

-- Export with Parquet compression
COPY employees TO 'employees.parquet' (FORMAT PARQUET, COMPRESSION SNAPPY);

-- Export partitioned Parquet (Hive layout)
COPY (SELECT * FROM sales)
TO 'output/' (FORMAT PARQUET, PARTITION_BY (year, month));

Export to JSON

COPY employees TO 'employees.json' (FORMAT JSON);

Practical ETL Patterns

CSV → Parquet Pipeline

-- Read, clean, and write in one step
COPY (
SELECT
id,
UPPER(name) AS name,
department,
ROUND(salary,2) AS salary,
hire_date
FROM read_csv_auto('raw/employees.csv')
WHERE salary > 0 AND name IS NOT NULL
) TO 'clean/employees.parquet' (FORMAT PARQUET, COMPRESSION ZSTD);

Split Export into Multiple Files

-- Export one file per department using PARTITION_BY
COPY (SELECT * FROM employees)
TO 'output/by_dept/' (FORMAT PARQUET, PARTITION_BY (department));
-- Creates: output/by_dept/department=Engineering/data.parquet etc.

COPY Options Reference

OptionValuesDescription
FORMATCSV, PARQUET, JSONOutput format
HEADERtrue/falseInclude CSV header
DELIMITERAny charField delimiter for CSV
COMPRESSIONSNAPPY, ZSTD, GZIPParquet compression
PARTITION_BYColumn listHive-partition the output

Common Pitfalls

PitfallConsequencePrevention
COPY to a path without directory existingError: directory not foundCreate directory before COPY
Exporting large result without compressionLarge output filesAlways use COMPRESSION ZSTD for Parquet
CSV export without HEADERMissing column namesAlways set HEADER true for CSV

Quick Reference

-- Import
COPY t FROM 'file.csv' (FORMAT CSV, HEADER true);
COPY t FROM 'file.parquet' (FORMAT PARQUET);

-- Export
COPY t TO 'out.csv' (FORMAT CSV, HEADER true);
COPY t TO 'out.parquet' (FORMAT PARQUET, COMPRESSION ZSTD);
COPY (SELECT ...) TO 'out.parquet' (FORMAT PARQUET, PARTITION_BY (col));

What's Next