Reading JSON Files

Learning Focus

Use this lesson to read JSON files directly in DuckDB — including nested documents and JSON arrays.

Basic Usage

-- Auto-detect JSON format and schema
SELECT * FROM read_json_auto('records.json') LIMIT 10;

-- Shorthand
SELECT * FROM 'records.json' LIMIT 10;

-- Newline-delimited JSON (NDJSON / JSON Lines)
SELECT * FROM read_json_auto('events.ndjson');

-- JSON array format
SELECT * FROM read_json_auto('data.json', format = 'array');

JSON Format Options

Format	Example	Option
JSON Lines (NDJSON)	`{"a":1}\n{"a":2}`	`format = 'newline_delimited'`
JSON Array	`[{"a":1},{"a":2}]`	`format = 'array'`
Auto-detect	Either	`format = 'auto'` (default)

Reading Nested JSON

-- Nested objects are exposed with struct dot notation
SELECT
    event.user_id,
    event.metadata.source AS source,
    event.timestamp
FROM read_json_auto('events.json') AS t(event);

-- Or access nested fields with ->
SELECT
    json_extract_string(payload, '$.user.name') AS user_name,
    json_extract(payload, '$.amount')::DOUBLE    AS amount
FROM raw_events;

Flatten Nested Arrays (UNNEST)

-- JSON: {"order_id": 1, "items": ["Shirt","Jeans"]}
SELECT
    order_id,
    unnest(items) AS item
FROM read_json_auto('orders.json');

JSON Functions

-- Extract string value
SELECT json_extract_string('{"name":"Alice"}', '$.name');  -- Alice

-- Extract typed value
SELECT json_extract('{"salary":95000}', '$.salary')::INTEGER;  -- 95000

-- Check if key exists
SELECT json_contains('{"a":1}', '{"a":1}');  -- true

-- Build JSON
SELECT to_json({'name': 'Alice', 'age': 30});

-- Array length
SELECT json_array_length('["a","b","c"]');  -- 3

Convert JSON to Parquet

COPY (SELECT * FROM read_json_auto('raw.json'))
TO 'clean.parquet' (FORMAT PARQUET);

Common Pitfalls

Pitfall	Consequence	Prevention
Mixed JSON formats in one file	Parse error	Standardize to NDJSON (one object per line)
Deeply nested JSON with auto-detect	Flat struct, not expanded	Use `json_extract()` for specific paths
Large JSON vs Parquet	Slower and larger	Convert to Parquet after initial ingestion

Quick Reference

SELECT * FROM read_json_auto('file.json');
SELECT * FROM read_json_auto('file.json', format='array');
SELECT json_extract_string(col, '$.key') FROM t;
SELECT unnest(json_array) FROM t;
COPY (SELECT * FROM read_json_auto('f.json')) TO 'f.parquet' (FORMAT PARQUET);

What's Next

Next: HTTPFS and S3 - Query files directly from HTTP or S3 without downloading.
Module Overview - Return to this module index.

Basic Usage​

JSON Format Options​

Reading Nested JSON​

Flatten Nested Arrays (UNNEST)​

JSON Functions​

Convert JSON to Parquet​

Common Pitfalls​

Quick Reference​

What's Next​

Basic Usage

JSON Format Options

Reading Nested JSON

Flatten Nested Arrays (UNNEST)

JSON Functions

Convert JSON to Parquet

Common Pitfalls

Quick Reference

What's Next