Date of Award

Spring 2019

Document Type


Degree Name

Doctor of Philosophy (PhD)


Computational Modeling & Simulation Engineering


Modeling and Simulation

Committee Director

Jose Padilla

Committee Member

Saikou Diallo

Committee Member

George Hsieh

Committee Member

John Sokolowski

Committee Member

Masha Sosonkina


Agents are commonly created on a set of simple rules driven by theories, hypotheses, and assumptions. Such modeling premise has limited use of real-world data and is challenged when modeling real-world systems due to the lack of empirical grounding. Simultaneously, the last decade has witnessed the production and availability of large-scale data from various sensors that carry behavioral signals. These data sources have the potential to change the way we create agent-based models; from simple rules to driven by data. Despite this opportunity, the literature has neglected to offer a modeling approach to generate granular agent behaviors from data, creating a gap in the literature.

This dissertation proposes a novel data-driven approach for modeling agents to bridge the research gap. The approach is composed of four detailed steps including data preparation, attribute model creation, behavior model creation, and integration. The connection between and within each step is established using data flow diagrams.

The practicality of the approach is demonstrated with a human mobility model that uses millions of location footprints collected from social media. In this model, the generation of movement behavior is tested with five machine learning/statistical modeling techniques covering a large number of model/data configurations. Results show that Random Forest-based learning is the most effective for the mobility use case. Furthermore, agent attribute values are obtained/generated with machine learning and translational assignment techniques.

The proposed approach is evaluated in two ways. First, the use case model is compared to another model which is developed using a state-of-the-art data-driven approach. The model’s prediction performance is comparable to the state-of-the-art model. The plausibility of behaviors and model structure in the use case model is found to be closer to real-world than the state-of-the-art model. This outcome indicates that the proposed approach produces realistic results. Second, a standard mobility dataset is used for driving the mobility model in place of social media data. Despite its small size, the data and model resembled the results gathered from the primary use case indicating the possibility of using different datasets with the proposed approach.