Document Type
Article
Publication Date
2025
DOI
10.1109/ACCESS.2025.3574234
Publication Title
IEEE Access
Volume
Article in press
Pages
20 pp.
Abstract
Research in Bengali Natural Language Processing (BNLP) is rapidly expanding. Although Bengali is one of the most widely spoken languages in the world, BNLP research remains insufficient, particularly in Bengali speech recognition. The language's rich morphology, agglutinative structure, and diverse dialects make text and speech processing especially challenging. However, these challenges can be addressed with effective preprocessing techniques. Various organizations in Bangladesh and West Bengal are integrating Natural Language Processing (NLP) into their services, but without a thorough understanding of preprocessing, these implementations remain incomplete. Applying proper preprocessing techniques to the Bengali language will serve as a foundation for developing robust NLP applications. This paper presents a comprehensive review of preprocessing techniques in BNLP based on state-of-the-art research. It covers key areas such as sentiment analysis, Named Entity Recognition, speech recognition, text categorization, and summarization. First, the paper provides an in-depth discussion of Bengali language characteristics and research areas in BNLP. It then explores the challenges faced by researchers in processing Bengali text and speech. Additionally, it details various preprocessing techniques, highlighting their advantages and disadvantages. Finally, the paper examines future directions for BNLP, emphasizing the role of effective preprocessing in advancing the field.
Rights
© 2025 The Authors.
Published under the terms of the Creative Commons Attribution 4.0 International (CC BY 4.0) License.
Original Publication Citation
Chakraborty, S., Das, P., Dipto, S. M., Pramanik, M. A., & Noor, J. (2025). An analytical review of preprocessing techniques in Bengali Natural Language Processing. IEEE Access. Advance online publication. https://doi.org/10.1109/ACCESS.2025.3574234
Included in
Artificial Intelligence and Robotics Commons, Numerical Analysis and Scientific Computing Commons, Scholarly Publishing Commons, South and Southeast Asian Languages and Societies Commons, Systems Architecture Commons
Comments
Bibliographic note: There is a non-functional placeholder DOI on this article, which may cause confusion and may be replaced as the published version becomes available. The non-functional DOI visible on the article is: https://doi.org/10.1109/ACCESS.2017.DOI
The functional DOI, visible on the publisher's landing page, is: https://doi.org/10.1109/ACCESS.2025.3574234