Strategic Insights into US Airline Market: Fare Prediction and Market Landscape

For the final project of INF1290-Data Analytics: Introduction, Methods, and Practical Approaches, we analyzed fare pricing patterns, market competition, and market segmentation in the US airline industry. The motivation behind this analysis stems from the significant impact that fare pricing and market competition have on the profitability and sustainability of airlines.
After the necessary data cleaning and preparation, we conducted exploratory data analysis (EDA), including a correlation study, a time-series analysis of fare trends, and a market analysis of the leading locations and the top players offering the lowest fares.




Our preliminary findings identified Southwest Airlines as a prime candidate for further study, so we focused our subsequent analysis on recommending pricing strategies for Southwest Airlines and gaining insights into its passengers' needs. Our analysis addresses the following research questions:
[R1]: What factors influence fare prices for Southwest Airlines, and how can they inform its pricing strategy?
[R2]: What are the characteristics of the market segments that Southwest Airlines aims to attract?
To address our first research question [R1], we designed three machine learning models (Linear Regression, Decision Tree Regressor, and Random Forest Regressor) to forecast fare prices for Southwest Airlines’ flights and evaluated their predictive performance. We trained all three models on all relevant continuous variables and assessed their accuracy using the R-squared metric. The Random Forest Regressor achieved the highest accuracy in forecasting Southwest Airlines’ flight fares (R^2 = 0.9982).
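
The model comparison described above can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: the data is synthetic, the 80/20 train/test split and all hyperparameters are assumptions, and the printed scores will not match the reported R^2 of 0.9982. Only the column names nsmiles, passengers, and Year come from the report.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the flight data (hypothetical values, not the project's dataset).
rng = np.random.default_rng(0)
n = 500
nsmiles = rng.uniform(100, 2500, n)       # flight distance in miles
passengers = rng.uniform(10, 1000, n)     # passenger count
year = rng.integers(2000, 2024, n)        # year of the flight
fare = 60 + 0.08 * nsmiles + 1.5 * (year - 2000) + rng.normal(0, 10, n)

X = np.column_stack([nsmiles, passengers, year])
X_train, X_test, y_train, y_test = train_test_split(
    X, fare, test_size=0.2, random_state=0  # assumed split, not stated in the report
)

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree Regressor": DecisionTreeRegressor(random_state=0),
    "Random Forest Regressor": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Fit each model on the training set and score it on the held-out test set.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))

for name, score in scores.items():
    print(f"{name}: R^2 = {score:.4f}")
```

Scoring on a held-out test set rather than on the training data is the design choice worth noting here, since tree-based models can reach a near-perfect R-squared on data they have already seen.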

The three most important features were the flight distance (nsmiles), the fare per mile (fare_per_mile), and the year of the flight (Year). Because flight distance (nsmiles) is a significant factor in predicting fares, our subsequent clustering analysis centres on this feature to extract insights into Southwest’s market segments.
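
The report does not state how the top features were ranked. One common approach, assumed here purely for illustration, is to read the impurity-based importances from a fitted random forest. The data below is synthetic and the printed ranking is not the project's result.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data (hypothetical values, not the project's dataset).
rng = np.random.default_rng(1)
n = 400
nsmiles = rng.uniform(100, 2500, n)
year = rng.integers(2000, 2024, n).astype(float)
fare = 60 + 0.08 * nsmiles + 1.5 * (year - 2000) + rng.normal(0, 10, n)
fare_per_mile = fare / nsmiles  # derived from the target, as the column name suggests

feature_names = ["nsmiles", "fare_per_mile", "Year"]
X = np.column_stack([nsmiles, fare_per_mile, year])

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, fare)

# feature_importances_ holds one non-negative weight per column, summing to 1.
ranked = sorted(
    zip(feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
```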

To tackle our second research question [R2], we used a K-Means clustering model to identify groups of Southwest flights with similar characteristics. The model partitioned the data into distinct clusters based on two key features: the number of passengers (passengers) and the distance traveled (nsmiles). This clustering allows for a nuanced analysis of customer segmentation and could inform how Southwest Airlines tailors its marketing strategies to the characteristics of each cluster, for example by offering promotional deals for regional flights (Cluster 0) or loyalty programs for frequent long-haul travelers (Cluster 2).
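
A minimal sketch of this clustering step is shown below. It assumes three clusters (the report names Cluster 0 and Cluster 2, so there are at least three) and standardizes both features before fitting. Neither choice is confirmed by the report. The data is synthetic, and the cluster numbers K-Means assigns here are arbitrary and do not correspond to the report's Cluster 0 or Cluster 2.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)

# Three hypothetical route groups given as (mean passengers, mean nsmiles).
centres = [(800, 300), (400, 1000), (150, 2200)]
data = np.vstack([
    rng.normal(loc=centre, scale=(60, 80), size=(100, 2)) for centre in centres
])  # columns: passengers, nsmiles

# Standardize so that neither feature dominates the Euclidean distance.
scaled = StandardScaler().fit_transform(data)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(scaled)

# Summarize each cluster in the original units.
for k in range(3):
    members = data[labels == k]
    print(
        f"Cluster {k}: {len(members)} routes, "
        f"mean passengers = {members[:, 0].mean():.0f}, "
        f"mean nsmiles = {members[:, 1].mean():.0f}"
    )
```

Reading the per-cluster means in the original units is what makes the segments interpretable, for instance separating short, busy routes from long, thin ones.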