Skip to main content

STDDEV and Variance

Learning Focus

Use this lesson to compute statistical measures of data spread and distribution in DuckDB.

Statistical Aggregate Functions

FunctionDescription
stddev(col)Population std deviation
stddev_samp(col)Sample std deviation
stddev_pop(col)Population std deviation (alias)
variance(col)Variance (population)
var_samp(col)Sample variance
median(col)Middle value
quantile(col, q)q-th quantile (0 to 1)

Examples

-- Salary statistics
SELECT
AVG(salary) AS mean_salary,
median(salary) AS median_salary,
stddev(salary) AS std_dev,
variance(salary) AS variance
FROM employees;

-- By department
SELECT
department,
AVG(salary) AS avg_salary,
stddev(salary) AS std_dev,
median(salary) AS median_salary
FROM employees
GROUP BY department;

Quantiles and Percentiles

-- Median (50th percentile)
SELECT quantile(salary, 0.50) FROM employees;

-- 25th and 75th percentiles (IQR)
SELECT
quantile(salary, 0.25) AS p25,
quantile(salary, 0.75) AS p75,
quantile(salary, 0.75) - quantile(salary, 0.25) AS iqr
FROM employees;

-- Multiple quantiles at once
SELECT quantile(salary, [0.10, 0.25, 0.50, 0.75, 0.90]) AS salary_distribution
FROM employees;

Mode (Most Frequent Value)

SELECT mode(department) AS most_common_dept FROM employees;

Common Pitfalls

PitfallConsequencePrevention
Using stddev vs stddev_sampDifferent results for small samplesUse stddev_samp for sample data
Expecting quantile on NULL-heavy columnsSkewed resultsFilter IS NOT NULL before quantile
Confusing MEDIAN with AVG on skewed dataMisleading central tendencyReport both when data is skewed

Quick Reference

SELECT AVG(col), median(col), stddev(col), variance(col) FROM t;
SELECT quantile(col, 0.25) AS p25, quantile(col, 0.75) AS p75 FROM t;
SELECT quantile(col, [0.1, 0.5, 0.9]) FROM t;
SELECT mode(col) FROM t;

What's Next