Document Type
Conference Paper
Publication Date
2023
DOI
10.18653/v1/2023.findings-emnlp.305
Publication Title
Findings of the Association for Computational Linguistics: EMNLP 2023
Pages
4587-4603
Conference Name
The 2023 Conference on Empirical Methods in Natural Language Processing, December 6-10, 2023, Singapore
Abstract
Pre-trained language models (PLMs) have demonstrated exceptional performance across a wide range of natural language processing tasks. PLM-based sentence embeddings provide contextual representations that capture rich semantic information. However, despite their success on unseen samples, current PLM-based representations are not robust in adversarial settings. In this paper, we propose RobustEmbed, a self-supervised sentence embedding framework that improves both generalization and robustness across various text representation tasks and against diverse adversarial attacks. RobustEmbed generates high-risk adversarial perturbations to promote higher invariance in the embedding space and uses these perturbations within a novel contrastive objective to learn high-quality sentence embeddings. Extensive experiments show that RobustEmbed outperforms previous state-of-the-art self-supervised representations in adversarial settings, while also improving results on seven semantic textual similarity (STS) tasks and six transfer tasks. Specifically, our framework reduces the attack success rate of the BERTAttack technique from 75.51% to 39.62%, and improves performance by 1.20% on STS tasks and 0.40% on transfer tasks.
Original Publication Citation
Asl, J., Blanco, E., & Takabi, D. (2023). RobustEmbed: Robust sentence embeddings using self-supervised contrastive pre-training. In H. Bouamor, J. Pino, & K. Bali (Eds.), Findings of the Association for Computational Linguistics: EMNLP 2023 (pp. 4587-4603). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.findings-emnlp.305
Repository Citation
Asl, J., Blanco, E., & Takabi, D. (2023). RobustEmbed: Robust sentence embeddings using self-supervised contrastive pre-training. In H. Bouamor, J. Pino, & K. Bali (Eds.), Findings of the Association for Computational Linguistics: EMNLP 2023 (pp. 4587-4603). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.findings-emnlp.305
ORCID
0000-0003-0447-3641 (Takabi)
Comments
Bibliographic information: ISBN: 979-8-89176-061-5
Editors: Houda Bouamor, Juan Pino, Kalika Bali
© 2023 Association for Computational Linguistics.
Materials published in or after 2016 are licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) License.