Document Type
Article
Publication Date
2026
DOI
10.1007/s44196-025-01067-0
Publication Title
International Journal of Computational Intelligence Systems
Volume
19
Issue
1
Pages
93
Abstract
In today’s rapidly evolving digital landscape, the demand for accurate and contextually relevant captions for image and video content, particularly in the medical domain, is increasingly critical. Despite the proliferation of visual data across platforms, existing captioning systems often struggle with variations in visual settings, complex temporal relationships, and nuanced semantics. Additional challenges, such as limited datasets, privacy concerns, and specialized annotation requirements, make medical image captioning particularly difficult. To address these challenges, we conduct a comparative analysis of cutting-edge deep learning methodologies: Transfer Learning via the MedVisionCapturer model and Transformer models via CausalVLM. Our findings reveal that the Transfer Learning model achieves notable performance on a limited set of CT scan recordings, with a BLEU score of 83.34, a CIDEr score of 89.23, a METEOR score of 43.91, and a ROUGE-L value of 73.41. In contrast, the Transformer model attains lower BLEU (71.42) and CIDEr (74.20) scores but higher METEOR (99.06) and ROUGE-L (96.10) values. This work therefore underscores the promise of the introduced models for improving the efficacy of automatic medical image captioning systems, ultimately fostering better health outcomes in an increasingly complex medical landscape.
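The abstract evaluates caption quality with n-gram overlap metrics such as BLEU. As a minimal sketch of how such a score is computed (simplified, single-reference, sentence-level; the example captions below are invented for illustration and are not from the paper's dataset), BLEU combines clipped n-gram precisions with a brevity penalty:

```python
from collections import Counter
import math

def clipped_precision(reference, hypothesis, n):
    """Clipped n-gram precision: each hypothesis n-gram counts at most
    as often as it appears in the reference."""
    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    hyp, ref = ngrams(hypothesis, n), ngrams(reference, n)
    total = sum(hyp.values())
    overlap = sum(min(count, ref[gram]) for gram, count in hyp.items())
    return overlap / total if total else 0.0

def bleu(reference, hypothesis, max_n=2):
    """Simplified sentence-level BLEU: geometric mean of clipped n-gram
    precisions (n = 1..max_n) times a brevity penalty."""
    precisions = [clipped_precision(reference, hypothesis, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty discounts hypotheses shorter than the reference.
    bp = 1.0 if len(hypothesis) > len(reference) else \
        math.exp(1 - len(reference) / max(len(hypothesis), 1))
    return bp * geo_mean

# Hypothetical example captions (not from the paper's CT dataset).
ref = "axial ct scan shows a hypodense lesion in the liver".split()
hyp = "ct scan shows a hypodense lesion in the liver".split()
score = bleu(ref, hyp)
```

Here every hypothesis n-gram matches the reference, so the precisions are 1.0 and the score reduces to the brevity penalty alone; production systems use multi-reference corpus-level BLEU with smoothing (e.g., NLTK's `corpus_bleu`) rather than this toy version.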
Rights
© The Authors 2025.
This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if you modified the licensed material. You do not have permission under this license to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Data Availability
Article states: "The real time images are collected from Department of Radiology, Saveetha Medical College & Hospital (SMCH). Saveetha Institute of Medical and Technical Sciences of India. All methods were carried out in accordance with relevant guidelines and regulations. All experimental protocols were approved and consent was obtained from Department of Radiology, Saveetha Medical College & Hospital (SMCH). Saveetha Institute of Medical and Technical Sciences of India."
Original Publication Citation
Aswiga, R. V., & Zahir, M. A. (2026). Exploring the synergy between very large transformer and LSTM models for effective medical captioning from videos to text: The impact of captioning in healthcare. International Journal of Computational Intelligence Systems, 19(1), Article 93. https://doi.org/10.1007/s44196-025-01067-0
Repository Citation
Aswiga, R. V., & Zahir, M. A. (2026). Exploring the synergy between very large transformer and LSTM models for effective medical captioning from videos to text: The impact of captioning in healthcare. International Journal of Computational Intelligence Systems, 19(1), Article 93. https://doi.org/10.1007/s44196-025-01067-0
Included in
Artificial Intelligence and Robotics Commons, Communication Technology and New Media Commons, Data Science Commons, Diagnosis Commons, Medical Education Commons