Date of Award

Spring 5-2022

Document Type


Degree Name

Doctor of Philosophy (PhD)


Civil & Environmental Engineering


Civil and Environmental Engineering

Committee Director

Sherif Ishak

Committee Member

Mecit Cetin

Committee Member

Hong Yang

Committee Member

Kun Xie


Ride-sourcing transportation services offered by transportation network companies (TNCs) like Uber and Lyft are disrupting the transportation landscape. The growing demand for these services, along with their potential short- and long-term impacts on the environment, society, and infrastructure, emphasizes the need to further understand the ride-sourcing system. Until recently, there were insufficient data to fully understand the system and integrate it within regional multimodal transportation frameworks. This can be attributed to commercial and competitive reasons, given the technology-enabled and innovative nature of the system. In 2019, the City of Chicago released extensive and complete trip-level ride-sourcing data for all trips made within the city since November 1, 2018. The data comprise the trip ends (pick-up and drop-off locations), trip timestamps, trip length and duration, fare including tipping amounts, and whether the trip was authorized to be shared (pooled) with another passenger.

Therefore, the main goal of this dissertation is to develop a comprehensive data-driven framework to understand and model the system using these data from Chicago in a reproducible and transferable fashion. Using a data fusion approach, sociodemographic, economic, parking supply, transit availability and accessibility, built environment, and crime data are collected from open sources to develop this framework. The framework is predicated on three pillars of analytics: (1) explorative and descriptive analytics, (2) diagnostic analytics, and (3) predictive analytics. The dissertation research framework also provides a guide to the key spatial and behavioral explanatory variables that shape the utility of the mode, drive the demand, and govern the interdependencies between the demand’s willingness to share and the surge price. Thus, the key findings can be readily challenged, verified, and utilized in different geographies.

In the explorative and descriptive analytics layer, the spatial and temporal dimensions of the ride-sourcing system are analyzed to achieve two objectives: (1) explore, reveal, and assess the significance of spatial effects, i.e., spatial dependence and heterogeneity, in the system behavior, and (2) develop a behavioral market segmentation and trend mining of the willingness to share. This is linked to the diagnostic analytics layer, as the revealed spatial effects motivate the adoption of spatial econometric models to analytically identify the determinants of the ride-sourcing system. Multiple linear regression (MLR) is used as a benchmark model against the spatial error model (SEM), the spatially lagged X (SLX) model, and the geographically weighted regression (GWR) model. Two innovative modeling constructs are introduced in the diagnostic analytics layer to deal with the ride-sourcing system’s spatial effects and multicollinearity: (1) the Calibrated Spatially Lagged X Ridge Model (CSLXR) and (2) the Calibrated Geographically Weighted Ridge Regression (CGWRR).
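As a rough illustration of the SLX idea mentioned above, an SLX specification augments each zone's own explanatory variables X with their spatial lags WX, where W is a row-standardized spatial weights matrix. The sketch below builds that augmented design matrix in plain Python. It is not the dissertation's CSLXR model: the ridge penalty and the calibration step are omitted, and the three-zone weights matrix, the single feature, and all function names are hypothetical.

```python
def row_standardize(weights):
    """Scale each row of a spatial weights matrix so that it sums to 1."""
    standardized = []
    for row in weights:
        total = sum(row)
        # A zone with no neighbours keeps an all-zero row.
        standardized.append([w / total if total else 0.0 for w in row])
    return standardized


def spatial_lag(weights, features):
    """Return WX: for each zone, the weighted average of its neighbours' features."""
    n_features = len(features[0])
    return [
        [sum(w * features[j][k] for j, w in enumerate(row)) for k in range(n_features)]
        for row in weights
    ]


def slx_design_matrix(weights, features):
    """Concatenate own-zone features X with their spatial lags WX."""
    lagged = spatial_lag(row_standardize(weights), features)
    return [own + lag for own, lag in zip(features, lagged)]


# Hypothetical example: three zones in a line (0-1-2), one feature per zone.
W = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
X = [[10.0], [20.0], [60.0]]
design = slx_design_matrix(W, X)
```

Each row of `design` holds a zone's own value followed by the average over its neighbours; a regression (ordinary or ridge) would then be fitted on these columns.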

The determinants identified in the diagnostic analytics layer are then fed into the predictive analytics layer to develop an interpretable machine learning (ML) modeling framework. The system’s annual average weekday origin-destination (AAWD OD) flow is modeled using the following state-of-the-art ML models: (1) Multilayer Perceptron (MLP) Regression, (2) Support Vector Machines Regression (SVR), and (3) tree-based ensemble learning methods, i.e., Random Forest Regression (RFR) and Extreme Gradient Boosting (XGBoost). The CGWRR construct developed in the diagnostic analytics layer is then validated in a predictive context and is found to outperform the state-of-the-art ML models, with a testing score of 0.914 compared with 0.906 for XGBoost, 0.84 for RFR, 0.89 for SVR, and 0.86 for MLP. The CGWRR also outperforms these models in terms of the root mean squared error (RMSE) and mean absolute error (MAE).
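For readers unfamiliar with the comparison metrics, the following minimal sketch computes RMSE, MAE (the mean absolute error), and a coefficient of determination on held-out predictions. It assumes the "testing score" is an R² value, which the abstract does not state; the observed and predicted flows are made-up numbers, not results from the dissertation.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large errors more heavily."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error: average magnitude of the errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical held-out OD flows and model predictions.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 4.0, 5.0, 8.0]

scores = {
    "rmse": rmse(observed, predicted),
    "mae": mae(observed, predicted),
    "r2": r_squared(observed, predicted),
}
```

A lower RMSE and MAE and a higher R² on the same test set would indicate the better-performing model under these assumptions.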

The findings of this dissertation partially bridge the gap between practice and research on understanding and integrating ride-sourcing transportation systems. The empirical findings from the descriptive and explorative analytics can be further utilized by regional agencies to fill practice and policymaking gaps in regulating ride-sourcing services using corridor or cordon tolls, optimally allocating standing areas to minimize deadheading, especially during off-peak periods, and promoting the willingness to share rides in disadvantaged communities. The CGWRR provides researchers and practitioners with a reliable modeling and simulation tool to integrate the ride-sourcing system into multimodal transportation modeling frameworks, a simulation testbed for testing the long-range impacts of policies on ride-sourcing, such as improved transit supply, congestion pricing, or increased parking rates, and a means to plan ahead for similar futuristic transportation modes, such as shared autonomous vehicles.