Document Type

Article

Publication Date

2025

DOI

10.36227/techrxiv.174123588.85605769/v1

Publication Title

TechRxiv

Pages

1-38

Abstract

Large Language Models (LLMs) play an increasingly integrated and pivotal role in generating diverse types of text, such as social media messages, emails, narratives, and technical reports, among other forms of textual communication. As AI-generated messaging filters into human communication, a systematic exploration of its effectiveness at mimicking human-like communication of life events is needed. In this study, we employ a zero-shot structured narrative prompt to generate 24,000 life event messages for birth, death, hiring, and firing events using OpenAI's GPT-4. From this dataset, we manually classify 2,880 messages and evaluate their validity in conveying these life events in the form of X (formerly Twitter) posts. Human evaluators found that 87.43% of the sampled messages (n = 2,880) sufficiently met the intentions of their structured prompts, based on the evaluators' interpretations of each prompt and the corresponding AI-generated message. To automate the identification of valid and invalid messages, we train and validate nine machine learning (ML) models on the classified datasets. Leveraging an ensemble of these nine models, we extend our analysis to predict the classifications of the remaining 21,120 untagged messages. Finally, we manually tag 1% of the messages with predicted classifications and attain 90.57% accuracy (sd = 29.3%, n = 212) across the four life event types. The ML models excelled at classifying valid messages as valid but had difficulty simultaneously classifying invalid messages as invalid. Our findings advance the study of LLM capabilities, limitations, and validity while offering practical insights for message generation and natural language processing applications.
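The reported accuracy and standard deviation can be cross-checked with a short calculation. The sketch below is not taken from the paper: it assumes that the sd is the sample standard deviation of per-message 0/1 correctness indicators, and that 192 of the 212 checked messages were correct (192 is the only integer count that rounds to 90.57%). The function name `accuracy_and_sd` is hypothetical.

```python
import math


def accuracy_and_sd(correct: int, total: int) -> tuple[float, float]:
    """Return accuracy and the sample standard deviation of 0/1 correctness indicators."""
    p = correct / total
    # Sample variance of a 0/1 variable with Bessel's correction:
    # n / (n - 1) * p * (1 - p)
    sd = math.sqrt(total / (total - 1) * p * (1 - p))
    return p, sd


# Assumption: 192 correct out of 212 manually checked predictions.
acc, sd = accuracy_and_sd(192, 212)
print(f"accuracy = {acc:.2%}, sd = {sd:.1%}")  # accuracy = 90.57%, sd = 29.3%
```

Under these assumptions the figures in the abstract are mutually consistent: a binary outcome with about 90.6% success has a standard deviation near 29%, so the large sd reflects the 0/1 nature of the measure rather than instability in the estimate.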

Rights

© 2025 The Authors.

Published under the terms of a Creative Commons Attribution 4.0 International (CC BY 4.0) License.

Comments

This work has been submitted to the IEEE for possible publication. e-Prints posted on TechRxiv are preliminary reports that are not peer reviewed. They should not be regarded as conclusive, guide clinical practice/health-related behavior, or be reported in the media as established information.

Original Publication Citation

Lynch, C., Jensen, E., Gore, R., Zamponi, V., O'Brien, K., Feldhaus, B., Smith, K., Martínez, J., Munro, M. H., Ozkose, T. E., Gundogdu, T. B., Reinhold, A. M., Kavak, H., & Ezell, B. (2025). AI-generated messaging for life events using structured prompts: A comparative study of GPT with human experts and machine learning. TechRxiv. https://doi.org/10.36227/techrxiv.174123588.85605769/v1

ORCID

0000-0002-4830-7488 (Lynch), 0000-0002-5026-4501 (Smith), 0000-0002-2244-8274 (Martínez), 0000-0003-4274-908X (Ezell)
