StyleScript: A Structured Data Augmentation Framework for Transformer-Based OCR in Engineering Documents

Document Type

Conference Paper

Publication Date

2026

Publication Title

Proceedings of the 59th Hawaii International Conference on System Sciences

Pages

1692-1701

Conference Name

59th Hawaii International Conference on System Sciences, January 6-9, 2026, Maui, Hawaii, USA

Abstract

Engineering documents often contain a combination of printed and handwritten text, intricate layouts, and visual degradation, posing significant challenges to Optical Character Recognition (OCR) systems. Transformer-based models like TrOCR provide strong baseline performance but require domain-specific data augmentation to generalize effectively. This paper introduces StyleScript, a structured data augmentation framework that generates realistic synthetic word images by extracting stroke-based style features such as slant angle and thickness. We fine-tune both TrOCR (small) and TrOCR (large) models using real and StyleScript-augmented data derived from a Military Sealift Command (MSC) dataset. Additionally, we develop a systematic OCR pipeline combining preprocessing, CRAFT-based text detection, and fine-tuned TrOCR recognition to digitize full-page engineering documents with spatial fidelity. Experimental results show that StyleScript-enhanced training improves OCR performance across diverse document conditions, making it a practical solution for engineering and other technical domains with limited annotated data.

Rights

© 2026 The Authors.

Published under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND) License.

ORCID

0009-0004-6773-3943 (Javadnejad), 0000-0003-0091-6986 (Li), 0000-0003-0144-9099 (Sousa-Poza), 0000-0003-2542-5454 (Khallouli)

Original Publication Citation

Javadnejad, F., Li, J., Kovacic, S., Sousa-Poza, A., & Khallouli, W. (2026). StyleScript: A structured data augmentation framework for transformer-based OCR in engineering documents. In T. X. Bui (Ed.), Proceedings of the 59th Hawaii International Conference on System Sciences (pp. 1692-1701). University of Hawai'i at Mānoa. https://hdl.handle.net/10125/111597
