Skip to main content

Cohort Analysis

Learning Focus

Use this lesson to build cohort retention tables — a foundational analytics technique for measuring user loyalty and product stickiness.

What is Cohort Analysis?

Cohort analysis groups users by their acquisition date (signup month) and tracks what percentage of each cohort is still active in subsequent months.

Cohort Month │ Month 0 │ Month 1 │ Month 2 │ Month 3
─────────────┼─────────┼─────────┼─────────┼────────
2024-01 │ 100% │ 65% │ 45% │ 32%
2024-02 │ 100% │ 70% │ 50% │ —
2024-03 │ 100% │ 68% │ — │ —

Sample Data Setup

CREATE TABLE user_events (
user_id INTEGER,
event_type VARCHAR,
event_date DATE
);

-- Simulate signups and activity
INSERT INTO user_events VALUES
(1, 'signup', '2024-01-05'), (1, 'active', '2024-02-10'),
(1, 'active', '2024-03-15'), (2, 'signup', '2024-01-10'),
(2, 'active', '2024-02-20'), (3, 'signup', '2024-02-01'),
(3, 'active', '2024-03-05'), (4, 'signup', '2024-01-20');

Step 1: Identify Cohort (Signup Month)

CREATE VIEW user_cohorts AS
SELECT
user_id,
MIN(event_date) AS signup_date,
date_trunc('month', MIN(event_date)) AS cohort_month
FROM user_events
WHERE event_type = 'signup'
GROUP BY user_id;

SELECT * FROM user_cohorts LIMIT 5;

Step 2: Calculate Activity Months

CREATE VIEW user_activity AS
SELECT DISTINCT
e.user_id,
date_trunc('month', e.event_date) AS activity_month
FROM user_events e
WHERE event_type = 'active';

Step 3: Build the Cohort Retention Matrix

WITH cohort_data AS (
SELECT
c.cohort_month,
-- How many months after signup is this activity?
DATEDIFF('month', c.cohort_month, a.activity_month) AS period,
COUNT(DISTINCT a.user_id) AS active_users
FROM user_cohorts c
INNER JOIN user_activity a ON c.user_id = a.user_id
GROUP BY c.cohort_month, period
),
cohort_sizes AS (
SELECT cohort_month, COUNT(*) AS cohort_size
FROM user_cohorts
GROUP BY cohort_month
)
SELECT
cd.cohort_month,
cs.cohort_size,
cd.period,
cd.active_users,
ROUND(cd.active_users * 100.0 / cs.cohort_size, 1) AS retention_pct
FROM cohort_data cd
JOIN cohort_sizes cs ON cd.cohort_month = cs.cohort_month
ORDER BY cd.cohort_month, cd.period;

Export Cohort Table to CSV

COPY (
-- same query as above
SELECT cohort_month, period, retention_pct
FROM cohort_data cd JOIN cohort_sizes cs ON cd.cohort_month = cs.cohort_month
ORDER BY cohort_month, period
) TO 'cohort_retention.csv' (FORMAT CSV, HEADER true);

Common Pitfalls

PitfallConsequencePrevention
Including re-signup events in cohortDuplicated users in cohortUse MIN(event_date) for signup
Not deduplicating activity per monthInflated retention %Use DISTINCT user_id in activity CTE
DATEDIFF in days instead of monthsWrong period columnUse DATEDIFF('month', ...) explicitly

Quick Reference

-- Cohort = first event per user
MIN(event_date) AS signup_date

-- Period = months since signup
DATEDIFF('month', cohort_month, activity_month) AS period

-- Retention %
active_users * 100.0 / cohort_size AS retention_pct

What's Next