Skip to main content

Select Statement

Learning Focus

Use this lesson to understand Select Statement with practical DuckDB syntax and examples.

Concept Overview

What is the SELECT Statement?

The SELECT statement is the most fundamental command in DuckDB. It retrieves data from tables, files, or expressions. DuckDB extends standard SELECT with columnar optimizations and direct file querying.

Why is SELECT Important?

  • Data Retrieval: Primary way to get data from any source
  • File Querying: DuckDB can SELECT directly from CSV, Parquet, and JSON
  • Foundation for Analysis: All analytical work begins with SELECT
  • OLAP Power: Columnar execution makes large scans exceptionally fast

Basic Syntax

SELECT column1, column2, ...
FROM table_name;

Sample Database Setup

CREATE TABLE employees (
id INTEGER PRIMARY KEY,
name VARCHAR,
department VARCHAR,
salary DECIMAL(10,2),
hire_date DATE,
active BOOLEAN
);

INSERT INTO employees VALUES
(1, 'Alice Johnson', 'Engineering', 95000.00, '2020-03-15', true),
(2, 'Bob Smith', 'Marketing', 72000.00, '2019-06-01', true),
(3, 'Charlie Davis', 'Engineering', 88000.00, '2022-01-20', true),
(4, 'Diana White', 'HR', 65000.00, '2020-09-10', false),
(5, 'Evan Brown', 'Finance', 78000.00, '2021-11-05', true);

Step-by-Step Examples

Example 1: Select All Columns

SELECT * FROM employees;

Expected Output:

┌─────┬───────────────┬─────────────┬──────────┬────────────┬─────────┐
│ id │ name │ department │ salary │ hire_date │ active │
├─────┼───────────────┼─────────────┼──────────┼────────────┼─────────┤
│ 1 │ Alice Johnson │ Engineering │ 95000.00 │ 2020-03-15 │ true │
│ 2 │ Bob Smith │ Marketing │ 72000.00 │ 2019-06-01 │ true │
│ 3 │ Charlie Davis │ Engineering │ 88000.00 │ 2022-01-20 │ true │
│ 4 │ Diana White │ HR │ 65000.00 │ 2020-09-10 │ false │
│ 5 │ Evan Brown │ Finance │ 78000.00 │ 2021-11-05 │ true │
└─────┴───────────────┴─────────────┴──────────┴────────────┴─────────┘

Example 2: Specific Columns

SELECT name, department FROM employees;

Example 3: Column Aliases

SELECT
name AS "Employee Name",
department AS "Dept",
salary AS "Annual Salary"
FROM employees;

Example 4: Calculated Columns

SELECT
name,
salary,
salary / 12 AS monthly_salary,
round(salary * 1.1, 2) AS with_10pct_raise
FROM employees;

Example 5: SELECT from a CSV File (DuckDB-specific)

-- No table creation needed!
SELECT name, department
FROM read_csv_auto('employees.csv')
LIMIT 5;

Practical Use Cases

Use Case 1: Department Directory

SELECT
name AS "Employee",
department AS "Department",
CASE WHEN active THEN 'Active' ELSE 'Inactive' END AS "Status"
FROM employees;

Use Case 2: Payroll Estimation

SELECT
name,
salary,
round(salary / 12, 2) AS monthly_pay,
round(salary * 0.2, 2) AS estimated_tax
FROM employees
WHERE active = true;

Use Case 3: Query a Parquet File Directly

SELECT department, COUNT(*) AS headcount
FROM read_parquet('employees.parquet')
GROUP BY department;

Common Mistakes & Troubleshooting

Mistake 1: Forgetting FROM

-- Wrong
SELECT name, department;

-- Right
SELECT name, department FROM employees;

Mistake 2: Misspelled Column Names

-- Wrong
SELECT naem FROM employees;
-- Error: Referenced column "naem" not found

-- Right: check with DESCRIBE
DESCRIBE employees;
SELECT name FROM employees;

Mistake 3: Quoting Identifiers

-- DuckDB uses double-quotes for identifiers (not backticks)
SELECT "name", "department" FROM employees;

-- MySQL-style backticks don't work in DuckDB
-- SELECT `name` FROM employees; -- Error!

Best Practices

  • Avoid SELECT * in production — specify only needed columns
  • Add LIMIT when exploring large tables or files
  • Use aliases for clarity in reports
  • Push filters earlyWHERE reduces data before other operations

Quick Reference

-- Basic select
SELECT col1, col2 FROM table;

-- All columns
SELECT * FROM table;

-- With alias
SELECT col AS alias FROM table;

-- From file
SELECT * FROM read_csv_auto('file.csv') LIMIT 10;

-- Table info
DESCRIBE tablename;
SHOW TABLES;

Common Pitfalls

PitfallConsequencePrevention
Using SELECT * on large filesFull file scan, slow resultsAlways add LIMIT when exploring
Backtick identifiersSyntax errorUse double quotes for identifiers in DuckDB
Skipping DESCRIBE before queryingUnknown column errorsRun DESCRIBE tablename first

What's Next