Document Type

Article

Publication Date

2025

DOI

10.1109/ACCESS.2025.3574234

Publication Title

IEEE Access

Volume

Article in press

Pages

20 pp.

Abstract

Research in Bengali Natural Language Processing (BNLP) is rapidly expanding. Although Bengali is one of the most widely spoken languages in the world, BNLP research remains insufficient, particularly in Bengali speech recognition. The language's rich morphology, agglutinative structure, and diverse dialects make text and speech processing especially challenging. However, these challenges can be addressed with effective preprocessing techniques. Various organizations in Bangladesh and West Bengal are integrating Natural Language Processing (NLP) into their services, but without a thorough understanding of preprocessing, these implementations remain incomplete. Applying proper preprocessing techniques to the Bengali language will serve as a foundation for developing robust NLP applications. This paper presents a comprehensive review of preprocessing techniques in BNLP based on state-of-the-art research. It covers key areas such as sentiment analysis, Named Entity Recognition, speech recognition, text categorization, and summarization. First, the paper provides an in-depth discussion of Bengali language characteristics and research areas in BNLP. It then explores the challenges faced by researchers in processing Bengali text and speech. Additionally, it details various preprocessing techniques, highlighting their advantages and disadvantages. Finally, the paper examines future directions for BNLP, emphasizing the role of effective preprocessing in advancing the field.

Comments

Bibliographic note: There is a non-functional placeholder DOI on this article, which may cause confusion and may be replaced as the published version becomes available. The non-functional DOI visible on the article is: https://doi.org/10.1109/ACCESS.2017.DOI

The functional DOI, visible on the publisher's landing page, is: https://doi.org/10.1109/ACCESS.2025.3574234

Rights

© 2025 The Authors.

Published under the terms of the Creative Commons Attribution 4.0 International (CC BY 4.0) License.

Original Publication Citation

Chakraborty, S., Das, P., Dipto, S. M., Pramanik, M. A., & Noor, J. (2025). An analytical review of preprocessing techniques in Bengali Natural Language Processing. IEEE Access. Advance online publication. https://doi.org/10.1109/ACCESS.2025.3574234
