Survival Analysis Report — Telco Customer Churn

1. Data Preparation

1.1 Dataset

| Item | Value |
| --- | --- |
| Source | IBM Telco Customer Churn |
| Original rows (Bronze) | 7,043 |
| Columns | 21 |

1.2 Filtering for Survival Analysis

Two filters applied:

  1. Contract = “Month-to-month” only
  2. InternetService ≠ “No” (internet subscribers only)

| Stage | Rows | % of Original |
| --- | --- | --- |
| Bronze (raw) | 7,043 | 100.0% |
| Silver (filtered) | 3,351 | 47.6% |
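The two filters can be sketched in plain Python as follows. The sample rows are hypothetical, and the field names `Contract` and `InternetService` are assumed to match the column names of the raw CSV; the report does not show its own filtering code.

```python
# Hypothetical sample rows; only the two filter columns matter here.
bronze_rows = [
    {"customerID": "A", "Contract": "Month-to-month", "InternetService": "DSL"},
    {"customerID": "B", "Contract": "Two year", "InternetService": "Fiber optic"},
    {"customerID": "C", "Contract": "Month-to-month", "InternetService": "No"},
    {"customerID": "D", "Contract": "Month-to-month", "InternetService": "Fiber optic"},
]


def to_silver(rows):
    """Keep month-to-month customers who subscribe to an internet service."""
    return [
        r for r in rows
        if r["Contract"] == "Month-to-month" and r["InternetService"] != "No"
    ]


silver_rows = to_silver(bronze_rows)
kept_ids = [r["customerID"] for r in silver_rows]
kept_share = 100.0 * len(silver_rows) / len(bronze_rows)
print(kept_ids, f"{kept_share:.1f}%")
```

The same ratio applied to the reported counts (3,351 of 7,043) gives the 47.6% shown in the table.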

1.3 Churn Distribution (Silver Table)

| Churn | Count | Percentage |
| --- | --- | --- |
| 0 (Retained) | 1,795 | 53.6% |
| 1 (Churned) | 1,556 | 46.4% |
| Total | 3,351 | 100.0% |

1.4 Contract Distribution (Full Dataset)

| Contract | Count | Percentage |
| --- | --- | --- |
| Month-to-month | 3,875 | 55.0% |
| Two year | 1,695 | 24.1% |
| One year | 1,473 | 20.9% |

2. Kaplan-Meier Estimator

2.1 What is Kaplan-Meier?

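Kaplan-Meier is a non-parametric method that estimates the survival function S(t): the probability that a customer has not churned by time t. It accounts for censored observations, that is, customers who had not yet churned when observation ended, by keeping them in the at-risk group up to their last observed month.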
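To make the estimator concrete, here is a minimal pure-Python sketch of the product-limit calculation. The toy durations and event flags are invented for illustration and are not taken from the Telco data; the function name `kaplan_meier` is likewise only illustrative.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(t, S(t))] at each distinct time where a churn was observed.

    durations: observed tenure per customer
    events:    1 if churn was observed, 0 if the customer is censored
    """
    churned = Counter(t for t, e in zip(durations, events) if e == 1)
    leaving = Counter(durations)  # churned or censored, both leave the risk set
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = churned.get(t, 0)
        if d > 0:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Toy data: five customers, two of them censored (event flag 0).
toy_durations = [2, 3, 3, 5, 8]
toy_events = [1, 1, 0, 1, 0]
km_curve = kaplan_meier(toy_durations, toy_events)
for t, s in km_curve:
    print(t, round(s, 4))
```

Censored customers never reduce S(t) directly; they only shrink the at-risk count used at later event times.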

2.2 Population-Level Survival Curve

Median survival time: 34.0 months

| Time Point | Survival Probability | Interpretation |
| --- | --- | --- |
| 6 months | 0.7803 | 78.0% survive at least 6 months |
| 12 months | 0.6950 | 69.5% survive at least 1 year |
| 24 months | 0.5753 | 57.5% survive at least 2 years |
| 34 months | 0.5000 | Median: half have churned |
| 48 months | 0.3872 | 38.7% survive 4 years |
| 60 months | 0.2890 | 28.9% survive 5 years |
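The median survival time is conventionally read off the curve as the first time at which S(t) drops to 0.5 or below. A small sketch of that lookup, applied to the six tabulated points only (the full month-by-month curve is not reproduced in this report, so the helper sees just these rows):

```python
def median_survival(curve):
    """Smallest time t with S(t) <= 0.5, or None if the curve never gets there."""
    for t, s in sorted(curve):
        if s <= 0.5:
            return t
    return None


# (month, survival probability) pairs copied from the table above.
reported_curve = [
    (6, 0.7803), (12, 0.6950), (24, 0.5753),
    (34, 0.5000), (48, 0.3872), (60, 0.2890),
]
median_month = median_survival(reported_curve)
print(median_month)
```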

2.3 Covariate-Level Analysis with Log-Rank Test

The log-rank test determines whether survival curves for different groups are statistically distinguishable. Null hypothesis (H₀): the groups have the same survival distribution.
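As an illustration of what the test computes, the sketch below implements the two-group log-rank statistic in plain Python. It is a simplified version under stated assumptions: it handles exactly two groups (variables with three or four levels need the multi-group form), the toy data are invented, and the p-value uses the closed form of the chi-square tail with one degree of freedom.

```python
import math


def logrank_test(dur_a, ev_a, dur_b, ev_b):
    """Two-group log-rank test. Returns (chi2, p_value) with 1 degree of freedom."""
    all_dur = list(dur_a) + list(dur_b)
    all_ev = list(ev_a) + list(ev_b)
    event_times = sorted({t for t, e in zip(all_dur, all_ev) if e == 1})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for d in dur_a if d >= t)  # group A still at risk at t
        n_b = sum(1 for d in dur_b if d >= t)  # group B still at risk at t
        d_a = sum(1 for d, e in zip(dur_a, ev_a) if d == t and e == 1)
        d_b = sum(1 for d, e in zip(dur_b, ev_b) if d == t and e == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n  # expected churns in A under H0
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square tail, 1 df
    return chi2, p_value


# Toy groups: A churns early, B churns late, no censoring.
early = [1, 2, 3, 4, 5, 6]
late = [7, 8, 9, 10, 11, 12]
all_churned = [1] * 6
chi2_diff, p_diff = logrank_test(early, all_churned, late, all_churned)
chi2_same, p_same = logrank_test(early, all_churned, early, all_churned)
print(p_diff < 0.05, p_same)
```

With identical groups the observed and expected counts coincide, so the statistic is zero and H₀ is not rejected.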

Results for all 15 categorical variables:

| Variable | Levels | Overall p-value | Significant (p < 0.05)? |
| --- | --- | --- | --- |
| onlineSecurity | 3 | < 0.000001 | ✅ Yes |
| onlineBackup | 3 | < 0.000001 | ✅ Yes |
| deviceProtection | 3 | < 0.000001 | ✅ Yes |
| techSupport | 3 | < 0.000001 | ✅ Yes |
| partner | 2 | < 0.000001 | ✅ Yes |
| dependents | 2 | < 0.000001 | ✅ Yes |
| internetService | 2 | 0.000001 | ✅ Yes |
| paymentMethod | 4 | < 0.000001 | ✅ Yes |
| multipleLines | 3 | < 0.000001 | ✅ Yes |
| streamingMovies | 2 | 0.000023 | ✅ Yes |
| streamingTV | 2 | 0.000322 | ✅ Yes |
| paperlessBilling | 2 | 0.003876 | ✅ Yes |
| gender | 2 | 0.153317 | ❌ No |
| phoneService | 2 | 0.194432 | ❌ No |
| seniorCitizen | 2 | 0.723174 | ❌ No |

Key findings:

  • 12 out of 15 variables show statistically significant differences.

  • paymentMethod IS significant (overall p < 0.000001).

  • The service add-ons (onlineSecurity, onlineBackup, deviceProtection, techSupport) are among the most significant variables, each with p < 0.000001.

2.4 DSL Subscriber Survival Probabilities

| Month | Survival Probability |
| --- | --- |
| 0 | 1.0000 |
| 3 | 0.8347 |
| 6 | 0.7839 |
| 9 | 0.7508 |
| 12 | 0.7270 |

3. Cox Proportional Hazards Model

3.1 What is Cox PH?

Cox Proportional Hazards is a semi-parametric regression model:

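\[h(t|X) = h_0(t) \times e^{\beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p}\]

Here h₀(t) is the baseline hazard, and the hazard ratio for covariate j is HR = exp(β_j), holding the other covariates fixed:

  • HR < 1 → protective (reduces churn risk)

  • HR > 1 → risk factor (increases churn risk)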
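A short worked step shows why exp(β) can be read as a hazard ratio. Assuming a binary covariate X_j and all other covariates held fixed, dividing the model's hazard at X_j = 1 by the hazard at X_j = 0 cancels both the baseline hazard and the other terms:

\[\frac{h(t \mid X_j = 1)}{h(t \mid X_j = 0)} = \frac{h_0(t)\, e^{\beta_j \cdot 1 + \sum_{k \neq j} \beta_k X_k}}{h_0(t)\, e^{\beta_j \cdot 0 + \sum_{k \neq j} \beta_k X_k}} = e^{\beta_j}\]

The ratio does not depend on t, which is the proportional hazards assumption.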

3.2 Feature Encoding

| Original Variable | Kept Column | Dropped (baseline) |
| --- | --- | --- |
| dependents | dependents_Yes | dependents_No |
| internetService | internetService_DSL | internetService_Fiber optic |
| onlineBackup | onlineBackup_Yes | onlineBackup_No |
| techSupport | techSupport_Yes | techSupport_No |
| paperlessBilling | paperlessBilling_Yes | paperlessBilling_No |
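The encoding in the table is one-hot encoding with one level dropped as the baseline. A minimal sketch of that idea in plain Python follows; the helper name `one_hot_drop_baseline` and the two sample rows are hypothetical, and the report's own pipeline may well use a library routine instead.

```python
def one_hot_drop_baseline(rows, column, baseline):
    """Replace `column` with 0/1 indicator columns for every level except `baseline`."""
    levels = sorted({r[column] for r in rows} - {baseline})
    encoded = []
    for r in rows:
        out = {k: v for k, v in r.items() if k != column}
        for level in levels:
            out[f"{column}_{level}"] = 1 if r[column] == level else 0
        encoded.append(out)
    return encoded


sample_customers = [
    {"internetService": "DSL"},
    {"internetService": "Fiber optic"},
]
encoded_customers = one_hot_drop_baseline(
    sample_customers, "internetService", baseline="Fiber optic"
)
print(encoded_customers)
```

Dropping the baseline avoids perfectly collinear columns, and each coefficient is then read relative to the dropped level.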

3.3 Model Results

| Covariate | Coef (β) | Hazard Ratio exp(β) | p-value | 95% CI for HR |
| --- | --- | --- | --- | --- |
| onlineBackup_Yes | -0.7766 | 0.4600 | < 0.001 | [0.4096, 0.5165] |
| techSupport_Yes | -0.6392 | 0.5277 | < 0.001 | [0.4553, 0.6117] |
| dependents_Yes | -0.3287 | 0.7199 | < 0.001 | [0.6265, 0.8272] |
| internetService_DSL | -0.2173 | 0.8047 | 0.0002 | [0.7167, 0.9034] |

Concordance Index: 0.6409
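For readers unfamiliar with the concordance index, the sketch below computes Harrell's C on a toy example: the share of comparable customer pairs in which the customer with the higher predicted risk churns first. The data and risk scores are invented, and this simple version skips pairs with tied durations, which library implementations handle more carefully.

```python
def concordance_index(durations, events, risk_scores):
    """Harrell's C for risk scores where a higher score means earlier expected churn."""
    concordant = tied = comparable = 0
    n = len(durations)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when i's churn was observed strictly before j's time.
            if events[i] == 1 and durations[i] < durations[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
                elif risk_scores[i] == risk_scores[j]:
                    tied += 1
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return (concordant + 0.5 * tied) / comparable


# Toy data: four customers, the third is censored.
c_durations = [2, 4, 6, 8]
c_events = [1, 1, 0, 1]
c_risk = [0.9, 0.3, 0.5, 0.1]
c_index = concordance_index(c_durations, c_events, c_risk)
print(round(c_index, 4))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts the reported 0.6409 in context.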

3.4 Interpretation

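  • Online Backup (HR = 0.460): 54.0% lower hazard of churning than customers without online backup.

  • Tech Support (HR = 0.528): 47.2% lower hazard than customers without tech support.

  • Dependents (HR = 0.720): 28.0% lower hazard than customers without dependents.

  • DSL Internet (HR = 0.805): 19.5% lower hazard compared to Fiber Optic.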
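The percentages in this section follow from the coefficients in section 3.3 via HR = exp(β) and reduction = (1 − HR) × 100. A quick check in Python, with the coefficients copied from the results table (the dictionary and function names are only illustrative):

```python
import math

# Cox PH coefficients as reported in section 3.3.
cox_coefs = {
    "onlineBackup_Yes": -0.7766,
    "techSupport_Yes": -0.6392,
    "dependents_Yes": -0.3287,
    "internetService_DSL": -0.2173,
}


def hazard_ratio(beta):
    """Hazard ratio implied by a Cox coefficient."""
    return math.exp(beta)


def hazard_reduction_pct(beta):
    """Percentage reduction in hazard; negative values mean an increase."""
    return 100.0 * (1.0 - math.exp(beta))


for name, beta in cox_coefs.items():
    print(f"{name}: HR = {hazard_ratio(beta):.4f}, "
          f"reduction = {hazard_reduction_pct(beta):.1f}%")
```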

3.5 Proportional Hazards Assumption Check

| Variable | p-value | PH Assumption Violated? |
| --- | --- | --- |
| internetService_DSL | < 0.0001 | ✅ Yes |
| onlineBackup_Yes | < 0.0001 | ✅ Yes |
| techSupport_Yes | 0.0002 | ✅ Yes |
| dependents_Yes | > 0.05 | ❌ No |

4. Accelerated Failure Time (AFT) Model

4.1 What is AFT?

AFT models how covariates “accelerate” or “decelerate” the time to event:

\[T = T_0 \times e^{\beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p}\]

  • exp(β) > 1 → time to churn is longer (protective)

  • exp(β) < 1 → time to churn is shorter (risk factor)

4.2 Feature Encoding

9 covariates: partner, multipleLines, internetService_DSL, onlineSecurity, onlineBackup, deviceProtection, techSupport, paymentMethod_Bank, paymentMethod_Credit.

4.3 Model Results

Median survival time: 135.51 months

| Metric | Cox PH | AFT (Log-Logistic) |
| --- | --- | --- |
| Concordance | 0.6409 | 0.7306 |

4.4 AFT Coefficients

| Covariate | Coef (β) | exp(β) | p-value | Interpretation |
| --- | --- | --- | --- | --- |
| onlineSecurity_Yes | 0.8616 | 2.3669 | < 0.001 | 2.37× longer survival |
| onlineBackup_Yes | 0.8128 | 2.2542 | < 0.001 | 2.25× longer survival |
| paymentMethod_Credit | 0.7990 | 2.2234 | < 0.001 | 2.22× longer survival |
| techSupport_Yes | 0.6893 | 1.9923 | < 0.001 | 1.99× longer survival |
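Because the AFT formula in section 4.1 multiplies the baseline time by the exponential of a sum, the time ratios of several covariates combine by multiplication. The sketch below illustrates this with the four coefficients from the table; it assumes every other covariate stays at its baseline level, and the helper name `time_ratio` is hypothetical.

```python
import math

# AFT coefficients as reported in the table above.
aft_coefs = {
    "onlineSecurity_Yes": 0.8616,
    "onlineBackup_Yes": 0.8128,
    "paymentMethod_Credit": 0.7990,
    "techSupport_Yes": 0.6893,
}


def time_ratio(active_covariates):
    """Multiplier on time to churn for a customer with these covariates set to 1."""
    return math.exp(sum(aft_coefs[name] for name in active_covariates))


single = time_ratio(["onlineSecurity_Yes"])
combined = time_ratio(["onlineSecurity_Yes", "techSupport_Yes"])
print(round(single, 2), round(combined, 2))
```

The combined ratio equals the product of the two individual ratios, since exp(a + b) = exp(a) × exp(b).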

5. Customer Lifetime Value (CLV)

5.1 Methodology

\[\text{Expected Profit}_m = S(m) \times \text{Monthly Revenue}\]

\[\text{NPV}_m = \frac{\text{Expected Profit}_m}{(1 + \text{Monthly IRR})^m}\]
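The two formulas can be turned into a running table with a few lines of Python. This is a sketch under stated assumptions: the three-month survival curve, the 30.0 monthly revenue and the 0.10 / 12 monthly rate are illustrative inputs chosen for the example, not values stated by the report, and `clv_table` is a hypothetical helper name.

```python
def clv_table(survival_by_month, monthly_revenue, monthly_irr):
    """Return (month, expected_profit, npv, cumulative_npv) rows, starting at month 1."""
    rows = []
    cumulative = 0.0
    for month, survival in enumerate(survival_by_month, start=1):
        expected_profit = survival * monthly_revenue
        npv = expected_profit / (1.0 + monthly_irr) ** month
        cumulative += npv
        rows.append((month, expected_profit, npv, cumulative))
    return rows


# Illustrative inputs (assumptions, not taken from the report).
clv_rows = clv_table([0.98, 0.96, 0.95], monthly_revenue=30.0, monthly_irr=0.10 / 12)
for month, profit, npv, cumulative in clv_rows:
    print(month, round(profit, 2), round(npv, 2), round(cumulative, 2))
```

Cumulative NPV is the running sum of the monthly NPV values, so it needs the survival probability for every month, not only the months shown in a summary table.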

5.2 Customer Profile

Has dependents, Fiber Optic, has online backup, has tech support.

5.3 CLV Table

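| Month | Survival Prob | Expected Profit | NPV | Cumulative NPV |
| --- | --- | --- | --- | --- |
| 1 | 0.9830 | $29.49 | $29.24 | $29.24 |
| 6 | 0.9416 | $28.25 | $26.90 | $168.13 |
| 12 | 0.9073 | $27.22 | $24.64 | $319.76 |
| 24 | 0.8583 | $25.75 | $21.12 | $591.61 |
| 36 | 0.8118 | $24.35 | $18.06 | $824.71 |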

5.4 Profile Comparison (36-Month Cumulative NPV)

| Profile | 36-Month Cumulative NPV |
| --- | --- |
| DSL + TechSupport | $1,004.91 |
| Fiber + TechSupport | $907.05 |
| Fiber, No TechSupport | $596.69 |

6. Summary & Key Takeaways

6.1 Method Comparison

| Method | Concordance | Key Assumption Met? |
| --- | --- | --- |
| Cox PH | 0.6409 | ❌ 3/4 violated |
| AFT | 0.7306 | Partially |

6.2 Most Important Findings

  1. Median time to churn is 34 months for the target segment (month-to-month internet subscribers).

  2. Online Backup is the strongest protective factor (HR = 0.460).

  3. The AFT model outperforms Cox PH (concordance 0.73 vs 0.64).

  4. Gender, phoneService, and seniorCitizen show NO statistically significant difference in survival (log-rank p > 0.05).

6.3 Business Recommendations

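  • Prioritize Online Backup and Tech Support adoption; both are associated with roughly half the churn hazard (HR = 0.460 and 0.528).

  • Monitor Fiber Optic customers closely; DSL subscribers have a 19.5% lower churn hazard (HR = 0.805).

  • Use the AFT model for prediction, given its higher concordance (0.7306 vs 0.6409).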


Appendix: Corrections

| Issue | Original | Corrected |
| --- | --- | --- |
| paymentMethod significance | Not significant | Significant (p < 0.000001) |
| AFT interpretation | exp(β) > 1 = faster churn | exp(β) > 1 = longer survival |
| CLV 12m NPV | $292.68 | $319.76 |
| CLV 36m NPV | $799.97 | $824.71 |