STDDEV and Variance
Learning Focus
Use this lesson to compute statistical measures of data spread and distribution in DuckDB.
Statistical Aggregate Functions
| Function | Description |
|---|---|
stddev(col) | Population std deviation |
stddev_samp(col) | Sample std deviation |
stddev_pop(col) | Population std deviation (alias) |
variance(col) | Variance (population) |
var_samp(col) | Sample variance |
median(col) | Middle value |
quantile(col, q) | q-th quantile (0 to 1) |
Examples
-- Salary statistics
SELECT
AVG(salary) AS mean_salary,
median(salary) AS median_salary,
stddev(salary) AS std_dev,
variance(salary) AS variance
FROM employees;
-- By department
SELECT
department,
AVG(salary) AS avg_salary,
stddev(salary) AS std_dev,
median(salary) AS median_salary
FROM employees
GROUP BY department;
Quantiles and Percentiles
-- Median (50th percentile)
SELECT quantile(salary, 0.50) FROM employees;
-- 25th and 75th percentiles (IQR)
SELECT
quantile(salary, 0.25) AS p25,
quantile(salary, 0.75) AS p75,
quantile(salary, 0.75) - quantile(salary, 0.25) AS iqr
FROM employees;
-- Multiple quantiles at once
SELECT quantile(salary, [0.10, 0.25, 0.50, 0.75, 0.90]) AS salary_distribution
FROM employees;
Mode (Most Frequent Value)
SELECT mode(department) AS most_common_dept FROM employees;
Common Pitfalls
| Pitfall | Consequence | Prevention |
|---|---|---|
Using stddev vs stddev_samp | Different results for small samples | Use stddev_samp for sample data |
| Expecting quantile on NULL-heavy columns | Skewed results | Filter IS NOT NULL before quantile |
| Confusing MEDIAN with AVG on skewed data | Misleading central tendency | Report both when data is skewed |
Quick Reference
SELECT AVG(col), median(col), stddev(col), variance(col) FROM t;
SELECT quantile(col, 0.25) AS p25, quantile(col, 0.75) AS p75 FROM t;
SELECT quantile(col, [0.1, 0.5, 0.9]) FROM t;
SELECT mode(col) FROM t;
What's Next
- Next: Approximate Aggregation - Fast approximations for massive datasets.
- Module Overview - Return to this module index.