Document Type

Conference Paper

Publication Date

2023

DOI

10.18653/v1/2023.findings-emnlp.305

Publication Title

Findings of the Association for Computational Linguistics: EMNLP 2023

Pages

4587-4603

Conference Name

The 2023 Conference on Empirical Methods in Natural Language Processing, December 6-10, 2023, Singapore

Abstract

Pre-trained language models (PLMs) have demonstrated their exceptional performance across a wide range of natural language processing tasks. The utilization of PLM-based sentence embeddings enables the generation of contextual representations that capture rich semantic information. However, despite their success with unseen samples, current PLM-based representations suffer from poor robustness in adversarial scenarios. In this paper, we propose RobustEmbed, a self-supervised sentence embedding framework that enhances both generalization and robustness in various text representation tasks and against diverse adversarial attacks. By generating high-risk adversarial perturbations to promote higher invariance in the embedding space and leveraging the perturbation within a novel contrastive objective approach, RobustEmbed effectively learns high-quality sentence embeddings. Our extensive experiments validate the superiority of RobustEmbed over previous state-of-the-art self-supervised representations in adversarial settings, while also showcasing relative improvements in seven semantic textual similarity (STS) tasks and six transfer tasks. Specifically, our framework achieves a significant reduction in attack success rate from 75.51% to 39.62% for the BERTAttack attack technique, along with enhancements of 1.20% and 0.40% in STS tasks and transfer tasks, respectively.
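The attack success rate figures quoted in the abstract can be read two ways: as an absolute drop in percentage points or as a relative drop. The short sketch below works through that arithmetic using only the two numbers stated above (75.51% and 39.62%). The helper name `reduction` is hypothetical and is not taken from the paper or its code.

```python
def reduction(before: float, after: float) -> tuple[float, float]:
    """Return (absolute drop in percentage points, relative drop in percent)."""
    absolute = before - after              # difference between the two rates
    relative = absolute / before * 100.0   # drop as a share of the starting rate
    return absolute, relative


# Attack success rates for BERTAttack as reported in the abstract.
absolute, relative = reduction(75.51, 39.62)
print(f"absolute reduction: {absolute:.2f} percentage points")
print(f"relative reduction: {relative:.2f}%")
```

Under this reading, the reported change is roughly 35.89 percentage points, or about a 47.5% relative reduction in attack success rate.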

Comments

Bibliographic information: ISBN: 979-8-89176-061-5

Editors: Houda Bouamor, Juan Pino, Kalika Bali

© 2023 Association for Computational Linguistics.

Materials published in or after 2016 are licensed on a Creative Commons Attribution 4.0 International (CC BY 4.0) License.

Original Publication Citation

Asl, J., Blanco, E., & Takabi, D. (2023). RobustEmbed: Robust sentence embeddings using self-supervised contrastive pre-training. In H. Bouamor, J. Pino, & K. Bali (Eds.), Findings of the Association for Computational Linguistics: EMNLP 2023 (pp. 4587-4603). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.findings-emnlp.305

ORCID

0000-0003-0447-3641 (Takabi)
