Skip to main content

INNER JOIN

Learning Focus

Use this lesson to master INNER JOIN — combining rows from multiple tables where a condition is satisfied.

Concept Overview

An INNER JOIN returns only rows where the join condition is TRUE in both tables. Rows with no match are excluded.

Sample Data

CREATE TABLE departments (
dept_id INTEGER PRIMARY KEY,
dept_name VARCHAR
);

INSERT INTO departments VALUES
(1, 'Engineering'), (2, 'Marketing'), (3, 'HR'), (4, 'Finance');

CREATE TABLE employees (
id INTEGER PRIMARY KEY,
name VARCHAR,
dept_id INTEGER,
salary DECIMAL(10,2)
);

INSERT INTO employees VALUES
(1, 'Alice', 1, 95000),
(2, 'Bob', 2, 72000),
(3, 'Charlie', 1, 88000),
(4, 'Diana', 3, 65000),
(5, 'Evan', NULL, 78000); -- no department assigned

Basic INNER JOIN

SELECT e.name, d.dept_name, e.salary
FROM employees e
INNER JOIN departments d ON e.dept_id = d.dept_id;

Note: Evan (NULL dept_id) is excluded — INNER JOIN only returns matched rows.

Multiple JOINs

SELECT e.name, d.dept_name, p.project_name
FROM employees e
INNER JOIN departments d ON e.dept_id = d.dept_id
INNER JOIN project_assignments pa ON e.id = pa.employee_id
INNER JOIN projects p ON pa.project_id = p.id;

JOIN with Aggregation

SELECT
d.dept_name,
COUNT(e.id) AS headcount,
AVG(e.salary) AS avg_salary
FROM employees e
INNER JOIN departments d ON e.dept_id = d.dept_id
GROUP BY d.dept_name
ORDER BY avg_salary DESC;

JOIN on File Data (DuckDB-specific)

-- Join a table with a CSV file — no import needed
SELECT e.name, c.company_name
FROM employees e
INNER JOIN read_csv_auto('companies.csv') c ON e.company_id = c.id;

Common Pitfalls

PitfallConsequencePrevention
Joining on non-indexed columnsSlow joins on large tablesCreate indexes or use DuckDB's parallel hash joins
Cartesian product from missing ONMillions of unexpected rowsAlways verify ON condition
Column name ambiguityError on ambiguous column nameAlways alias tables and prefix columns

Quick Reference

SELECT a.col, b.col
FROM table_a a
INNER JOIN table_b b ON a.id = b.a_id;

-- Shorthand (INNER is implicit)
SELECT * FROM a JOIN b ON a.id = b.a_id;

What's Next