Date of Award

Spring 2019

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Civil/Environmental Engineering

Committee Director

Rajesh Paleti

Committee Director

Mecit Cetin

Committee Member

Hong Yang

Abstract

One of the most fundamental tasks in statistical data analysis is to understand the relationship between the explanatory variables and the outcome. Misclassification of explanatory variables is a common risk when using statistical modeling techniques. In this dissertation, we define ‘misclassification’ as a response that is reported or recorded in the wrong category; for example, a variable is recorded as one when its true value is zero. Misclassification can easily occur in any data set, for instance in an interview where the respondent misunderstands the question or the interviewer checks the wrong box.

In the first part of the dissertation, the results uncovered significant misclassification rates ranging from 1% to 40% across auto ownership alternatives. The results from latent class models also provide evidence that misclassification probabilities vary across population segments. The second part of the dissertation uses traditional crash databases, which record police-reported injury severity data that are prone to misclassification errors. We developed a mixed generalized ordered response model that quantifies misclassification rates in the injury severity variable and corrects the bias in parameter estimates due to misclassification. The model uncovered a 32% misclassification rate in the non-incapacitating severity category. As another case study, the extent of misclassification in telecommuting frequency data is also investigated. Telecommuting frequency is a response variable collected in travel surveys; it is therefore prone to errors that lead to mismeasurement or misclassification. The objective of this part of the dissertation is to develop a statistical model to analyze telecommuting data while accounting for potential misclassification errors.

Models that ignore misclassification were found not only to have lower statistical fit but also to produce significantly different elasticity effects, particularly for choice alternatives with high misclassification probabilities. Overall, the simulation analysis, along with the other models developed, suggests that models that account for misclassification in the data perform better than those that ignore it. The methods developed in this study can be extended to analyze misclassification in other transportation disciplines.
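
As a rough illustration of what accounting for misclassification in a categorical outcome can mean, the sketch below mixes a set of true-category probabilities through a misclassification matrix to obtain the probabilities of the recorded categories. This is a generic textbook-style formulation and not the dissertation's own model; the function name `observed_probabilities`, the three-category setup, and all numbers are hypothetical and chosen only for illustration.

```python
def observed_probabilities(true_probs, misclass):
    """Mix true-category probabilities through a misclassification matrix.

    misclass[j][k] is the assumed probability that a response whose true
    category is j is recorded as category k; each row must sum to 1.
    """
    n = len(true_probs)
    for row in misclass:
        if abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("each row of misclass must sum to 1")
    # P(recorded = k) = sum over j of P(true = j) * P(recorded = k | true = j)
    return [
        sum(true_probs[j] * misclass[j][k] for j in range(n))
        for k in range(n)
    ]


# Hypothetical numbers, for illustration only (not taken from the dissertation).
true_probs = [0.5, 0.3, 0.2]
misclass = [
    [0.9, 0.1, 0.0],  # true category 0 is recorded correctly 90% of the time
    [0.2, 0.7, 0.1],  # true category 1 is misrecorded 30% of the time
    [0.0, 0.1, 0.9],
]
observed = observed_probabilities(true_probs, misclass)
print(observed)
```

Under these assumptions, a model fitted directly to the recorded categories would target `observed` rather than `true_probs`, which is one way to see why ignoring misclassification can bias parameter estimates.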

DOI

10.25777/9nvc-bn12

ISBN

9781085623308
