Reading JSON Files
Learning Focus
Use this lesson to read JSON files directly in DuckDB — including nested documents and JSON arrays.
Basic Usage
-- Auto-detect JSON format and schema
SELECT * FROM read_json_auto('records.json') LIMIT 10;
-- Shorthand
SELECT * FROM 'records.json' LIMIT 10;
-- Newline-delimited JSON (NDJSON / JSON Lines)
SELECT * FROM read_json_auto('events.ndjson');
-- JSON array format
SELECT * FROM read_json_auto('data.json', format = 'array');
JSON Format Options
| Format | Example | Option |
|---|---|---|
| JSON Lines (NDJSON) | {"a":1}\n{"a":2} | format = 'newline_delimited' |
| JSON Array | [{"a":1},{"a":2}] | format = 'array' |
| Auto-detect | Either | format = 'auto' (default) |
Reading Nested JSON
-- Nested objects are exposed with struct dot notation
SELECT
event.user_id,
event.metadata.source AS source,
event.timestamp
FROM read_json_auto('events.json') AS t(event);
-- Or access nested fields with ->
SELECT
json_extract_string(payload, '$.user.name') AS user_name,
json_extract(payload, '$.amount')::DOUBLE AS amount
FROM raw_events;
Flatten Nested Arrays (UNNEST)
-- JSON: {"order_id": 1, "items": ["Shirt","Jeans"]}
SELECT
order_id,
unnest(items) AS item
FROM read_json_auto('orders.json');
JSON Functions
-- Extract string value
SELECT json_extract_string('{"name":"Alice"}', '$.name'); -- Alice
-- Extract typed value
SELECT json_extract('{"salary":95000}', '$.salary')::INTEGER; -- 95000
-- Check if key exists
SELECT json_contains('{"a":1}', '{"a":1}'); -- true
-- Build JSON
SELECT to_json({'name': 'Alice', 'age': 30});
-- Array length
SELECT json_array_length('["a","b","c"]'); -- 3
Convert JSON to Parquet
COPY (SELECT * FROM read_json_auto('raw.json'))
TO 'clean.parquet' (FORMAT PARQUET);
Common Pitfalls
| Pitfall | Consequence | Prevention |
|---|---|---|
| Mixed JSON formats in one file | Parse error | Standardize to NDJSON (one object per line) |
| Deeply nested JSON with auto-detect | Flat struct, not expanded | Use json_extract() for specific paths |
| Large JSON vs Parquet | Slower and larger | Convert to Parquet after initial ingestion |
Quick Reference
SELECT * FROM read_json_auto('file.json');
SELECT * FROM read_json_auto('file.json', format='array');
SELECT json_extract_string(col, '$.key') FROM t;
SELECT unnest(json_array) FROM t;
COPY (SELECT * FROM read_json_auto('f.json')) TO 'f.parquet' (FORMAT PARQUET);
What's Next
- Next: HTTPFS and S3 - Query files directly from HTTP or S3 without downloading.
- Module Overview - Return to this module index.