RobustEmbed: Robust Sentence Embeddings Using Self-Supervised Contrastive Pre-Training

Javad Asl, Georgia State University
Eduardo Blanco, University of Arizona
Daniel Takabi, Old Dominion University

Document Type

Conference Paper

Publication Date

2023

DOI

10.18653/v1/2023.findings-emnlp.305

Publication Title

Findings of the Association for Computational Linguistics: EMNLP 2023

Pages

4587-4603

Conference Name

The 2023 Conference on Empirical Methods in Natural Language Processing, December 6-10, 2023, Singapore

Abstract

Pre-trained language models (PLMs) have demonstrated their exceptional performance across a wide range of natural language processing tasks. The utilization of PLM-based sentence embeddings enables the generation of contextual representations that capture rich semantic information. However, despite their success with unseen samples, current PLM-based representations suffer from poor robustness in adversarial scenarios. In this paper, we propose RobustEmbed, a self-supervised sentence embedding framework that enhances both generalization and robustness in various text representation tasks and against diverse adversarial attacks. By generating high-risk adversarial perturbations to promote higher invariance in the embedding space and leveraging the perturbation within a novel contrastive objective approach, RobustEmbed effectively learns high-quality sentence embeddings. Our extensive experiments validate the superiority of RobustEmbed over previous state-of-the-art self-supervised representations in adversarial settings, while also showcasing relative improvements in seven semantic textual similarity (STS) tasks and six transfer tasks. Specifically, our framework achieves a significant reduction in attack success rate from 75.51% to 39.62% for the BERTAttack attack technique, along with enhancements of 1.20% and 0.40% in STS tasks and transfer tasks, respectively.

Comments

Bibliographic information: ISBN: 979-8-89176-061-5

Editors: Houda Bouamor, Juan Pino, Kalika Bali

Materials published in or after 2016 are licensed on a Creative Commons Attribution 4.0 International (CC BY 4.0) License.

Original Publication Citation

Asl, J., Blanco, E., & Takabi, D. (2023) RobustEmbed: Robust sentence embeddings using self-supervised contrastive pre-training. In H. Bouamor, J. Pino, & K. Bali (Eds.), Findings of the Association for Computational Linguistics: EMNLP 2023 (pp. 4587-4603). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.findings-emnlp.305

ORCID

0000-0003-0447-3641 (Takabi)

School of Cybersecurity Faculty Publications
