Skip to main content

Reading JSON Files

Learning Focus

Use this lesson to read JSON files directly in DuckDB — including nested documents and JSON arrays.

Basic Usage

-- Auto-detect JSON format and schema
SELECT * FROM read_json_auto('records.json') LIMIT 10;

-- Shorthand
SELECT * FROM 'records.json' LIMIT 10;

-- Newline-delimited JSON (NDJSON / JSON Lines)
SELECT * FROM read_json_auto('events.ndjson');

-- JSON array format
SELECT * FROM read_json_auto('data.json', format = 'array');

JSON Format Options

FormatExampleOption
JSON Lines (NDJSON){"a":1}\n{"a":2}format = 'newline_delimited'
JSON Array[{"a":1},{"a":2}]format = 'array'
Auto-detectEitherformat = 'auto' (default)

Reading Nested JSON

-- Nested objects are exposed with struct dot notation
SELECT
event.user_id,
event.metadata.source AS source,
event.timestamp
FROM read_json_auto('events.json') AS t(event);

-- Or access nested fields with ->
SELECT
json_extract_string(payload, '$.user.name') AS user_name,
json_extract(payload, '$.amount')::DOUBLE AS amount
FROM raw_events;

Flatten Nested Arrays (UNNEST)

-- JSON: {"order_id": 1, "items": ["Shirt","Jeans"]}
SELECT
order_id,
unnest(items) AS item
FROM read_json_auto('orders.json');

JSON Functions

-- Extract string value
SELECT json_extract_string('{"name":"Alice"}', '$.name'); -- Alice

-- Extract typed value
SELECT json_extract('{"salary":95000}', '$.salary')::INTEGER; -- 95000

-- Check if key exists
SELECT json_contains('{"a":1}', '{"a":1}'); -- true

-- Build JSON
SELECT to_json({'name': 'Alice', 'age': 30});

-- Array length
SELECT json_array_length('["a","b","c"]'); -- 3

Convert JSON to Parquet

COPY (SELECT * FROM read_json_auto('raw.json'))
TO 'clean.parquet' (FORMAT PARQUET);

Common Pitfalls

PitfallConsequencePrevention
Mixed JSON formats in one fileParse errorStandardize to NDJSON (one object per line)
Deeply nested JSON with auto-detectFlat struct, not expandedUse json_extract() for specific paths
Large JSON vs ParquetSlower and largerConvert to Parquet after initial ingestion

Quick Reference

SELECT * FROM read_json_auto('file.json');
SELECT * FROM read_json_auto('file.json', format='array');
SELECT json_extract_string(col, '$.key') FROM t;
SELECT unnest(json_array) FROM t;
COPY (SELECT * FROM read_json_auto('f.json')) TO 'f.parquet' (FORMAT PARQUET);

What's Next