Predicting Flight Delays — Logistic Regression on 1M+ U.S. Flights
Cleaned and feature-engineered a dataset of 1M+ U.S. flights (Jan–Feb 2024), conducted exploratory data analysis (EDA) across airports and days of the week, and trained a logistic regression model to predict the probability that a flight is delayed. Found that origin airport was the dominant predictor: JFK, LGA, and EWR each exceeded an 80% delay rate, while distance and departure time had minimal influence. The model achieved 65.2% accuracy on a held-out test set.
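A minimal, dependency-free sketch of this kind of workflow is shown below. It is not the project's code. It runs on a synthetic toy dataset with hypothetical airport codes (APT1, APT2, APT3) and made-up delay rates, and it assumes a simple feature set (one-hot origin airport plus standardized distance and departure hour). It computes per-airport delay rates as an EDA step, fits a logistic regression by full-batch gradient descent on the log-loss, and reports accuracy on a held-out split.

```python
import math
import random
import statistics

# Hypothetical airport codes and made-up delay rates for a toy dataset;
# these are NOT figures from the real 1M+ flight dataset.
TOY_DELAY_RATES = {"APT1": 0.8, "APT2": 0.5, "APT3": 0.2}


def make_synthetic(n, seed=0):
    """Generate (origin, distance_miles, dep_hour) rows and 0/1 delay labels."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        origin = rng.choice(sorted(TOY_DELAY_RATES))
        distance = rng.uniform(200, 2500)
        dep_hour = rng.randrange(5, 23)
        # In this toy data, delay depends only on the origin airport.
        labels.append(1 if rng.random() < TOY_DELAY_RATES[origin] else 0)
        rows.append((origin, distance, dep_hour))
    return rows, labels


def delay_rate_by_origin(rows, labels):
    """EDA step: fraction of delayed flights per origin airport."""
    totals, delayed = {}, {}
    for (origin, _, _), y in zip(rows, labels):
        totals[origin] = totals.get(origin, 0) + 1
        delayed[origin] = delayed.get(origin, 0) + y
    return {a: delayed[a] / totals[a] for a in totals}


def mean_std(values):
    return statistics.fmean(values), statistics.pstdev(values)


def featurize(row, airports, stats):
    """One-hot origin (acts as a per-airport intercept), then z-scored distance and hour."""
    origin, distance, dep_hour = row
    x = [1.0 if origin == a else 0.0 for a in airports]
    x.append((distance - stats["dist"][0]) / stats["dist"][1])
    x.append((dep_hour - stats["hour"][0]) / stats["hour"][1])
    return x


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)))


def train_logreg(X, y, lr=0.5, epochs=300):
    """Full-batch gradient descent on the mean log-loss (no regularization)."""
    w = [0.0] * len(X[0])
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi  # d(log-loss)/d(logit)
            for j, xj in enumerate(xi):
                grad[j] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def accuracy(w, X, y, threshold=0.5):
    hits = sum((predict_proba(w, xi) >= threshold) == (yi == 1) for xi, yi in zip(X, y))
    return hits / len(y)


def run_pipeline(n=1000, seed=42, test_frac=0.2):
    rows, labels = make_synthetic(n, seed)
    split = int(n * (1 - test_frac))  # rows are i.i.d., so a contiguous cut is a random split
    train, test = rows[:split], rows[split:]
    y_train, y_test = labels[:split], labels[split:]
    airports = sorted({r[0] for r in train})
    # Fit scaling statistics on the training split only, to avoid test leakage.
    stats = {"dist": mean_std([r[1] for r in train]), "hour": mean_std([r[2] for r in train])}
    X_train = [featurize(r, airports, stats) for r in train]
    X_test = [featurize(r, airports, stats) for r in test]
    w = train_logreg(X_train, y_train)
    return delay_rate_by_origin(train, y_train), airports, w, accuracy(w, X_test, y_test)


if __name__ == "__main__":
    rates, airports, w, acc = run_pipeline()
    print("delay rate by origin:", {a: round(r, 3) for a, r in rates.items()})
    print("weights:", dict(zip(airports + ["distance_z", "hour_z"], (round(v, 3) for v in w))))
    print(f"held-out accuracy: {acc:.3f}")
```

Because the toy labels depend only on the origin airport by construction, the learned distance and hour weights end up close to zero. That is a property of this synthetic data and says nothing about the real flights. On real data, a library implementation such as scikit-learn's `LogisticRegression` combined with `OneHotEncoder` would normally replace the hand-written training loop. The pure-Python version here simply makes the mechanics visible.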