Select Statement
Learning Focus
Use this lesson to understand Select Statement with practical DuckDB syntax and examples.
Concept Overview
What is the SELECT Statement?
The SELECT statement is the most fundamental command in DuckDB. It retrieves data from tables, files, or expressions. DuckDB extends standard SELECT with columnar optimizations and direct file querying.
Why is SELECT Important?
- Data Retrieval: Primary way to get data from any source
- File Querying: DuckDB can SELECT directly from CSV, Parquet, and JSON
- Foundation for Analysis: All analytical work begins with SELECT
- OLAP Power: Columnar execution makes large scans exceptionally fast
Basic Syntax
SELECT column1, column2, ...
FROM table_name;
Sample Database Setup
CREATE TABLE employees (
id INTEGER PRIMARY KEY,
name VARCHAR,
department VARCHAR,
salary DECIMAL(10,2),
hire_date DATE,
active BOOLEAN
);
INSERT INTO employees VALUES
(1, 'Alice Johnson', 'Engineering', 95000.00, '2020-03-15', true),
(2, 'Bob Smith', 'Marketing', 72000.00, '2019-06-01', true),
(3, 'Charlie Davis', 'Engineering', 88000.00, '2022-01-20', true),
(4, 'Diana White', 'HR', 65000.00, '2020-09-10', false),
(5, 'Evan Brown', 'Finance', 78000.00, '2021-11-05', true);
Step-by-Step Examples
Example 1: Select All Columns
SELECT * FROM employees;
Expected Output:
┌─────┬───────────────┬─────────────┬──────────┬────────────┬─────────┐
│ id │ name │ department │ salary │ hire_date │ active │
├─────┼───────────────┼─────────────┼──────────┼────────────┼─────────┤
│ 1 │ Alice Johnson │ Engineering │ 95000.00 │ 2020-03-15 │ true │
│ 2 │ Bob Smith │ Marketing │ 72000.00 │ 2019-06-01 │ true │
│ 3 │ Charlie Davis │ Engineering │ 88000.00 │ 2022-01-20 │ true │
│ 4 │ Diana White │ HR │ 65000.00 │ 2020-09-10 │ false │
│ 5 │ Evan Brown │ Finance │ 78000.00 │ 2021-11-05 │ true │
└─────┴───────────────┴─────────────┴──────────┴────────────┴─────────┘
Example 2: Specific Columns
SELECT name, department FROM employees;
Example 3: Column Aliases
SELECT
name AS "Employee Name",
department AS "Dept",
salary AS "Annual Salary"
FROM employees;
Example 4: Calculated Columns
SELECT
name,
salary,
salary / 12 AS monthly_salary,
round(salary * 1.1, 2) AS with_10pct_raise
FROM employees;
Example 5: SELECT from a CSV File (DuckDB-specific)
-- No table creation needed!
SELECT name, department
FROM read_csv_auto('employees.csv')
LIMIT 5;
Practical Use Cases
Use Case 1: Department Directory
SELECT
name AS "Employee",
department AS "Department",
CASE WHEN active THEN 'Active' ELSE 'Inactive' END AS "Status"
FROM employees;
Use Case 2: Payroll Estimation
SELECT
name,
salary,
round(salary / 12, 2) AS monthly_pay,
round(salary * 0.2, 2) AS estimated_tax
FROM employees
WHERE active = true;
Use Case 3: Query a Parquet File Directly
SELECT department, COUNT(*) AS headcount
FROM read_parquet('employees.parquet')
GROUP BY department;
Common Mistakes & Troubleshooting
Mistake 1: Forgetting FROM
-- Wrong
SELECT name, department;
-- Right
SELECT name, department FROM employees;
Mistake 2: Misspelled Column Names
-- Wrong
SELECT naem FROM employees;
-- Error: Referenced column "naem" not found
-- Right: check with DESCRIBE
DESCRIBE employees;
SELECT name FROM employees;
Mistake 3: Quoting Identifiers
-- DuckDB uses double-quotes for identifiers (not backticks)
SELECT "name", "department" FROM employees;
-- MySQL-style backticks don't work in DuckDB
-- SELECT `name` FROM employees; -- Error!
Best Practices
- Avoid
SELECT *in production — specify only needed columns - Add
LIMITwhen exploring large tables or files - Use aliases for clarity in reports
- Push filters early —
WHEREreduces data before other operations
Quick Reference
-- Basic select
SELECT col1, col2 FROM table;
-- All columns
SELECT * FROM table;
-- With alias
SELECT col AS alias FROM table;
-- From file
SELECT * FROM read_csv_auto('file.csv') LIMIT 10;
-- Table info
DESCRIBE tablename;
SHOW TABLES;
Common Pitfalls
| Pitfall | Consequence | Prevention |
|---|---|---|
| Using SELECT * on large files | Full file scan, slow results | Always add LIMIT when exploring |
| Backtick identifiers | Syntax error | Use double quotes for identifiers in DuckDB |
| Skipping DESCRIBE before querying | Unknown column errors | Run DESCRIBE tablename first |
What's Next
- Next: SELECT DISTINCT - Remove duplicate rows from results.
- Module Overview - Return to this module index.