SEER: Backdoor Detection for Vision-Language Models Through Searching Target Text and Image Trigger Jointly
Document Type
Article
Publication Date
2024
DOI
10.1609/aaai.v38i7.28611
Publication Title
Proceedings of the AAAI Conference on Artificial Intelligence
Volume
38
Issue
7
Pages
7766-7774
Abstract
This paper proposes SEER, a novel backdoor detection algorithm for vision-language models, addressing the gap in the literature on multi-modal backdoor detection. While backdoor detection in single-modal models has been well studied, the investigation of such defenses in multi-modal models remains limited. Existing backdoor defense mechanisms cannot be directly applied to multi-modal settings because of their increased complexity and the resulting explosion of the search space. SEER detects backdoors in vision-language models by jointly searching for image triggers and malicious target texts in the feature space shared by the vision and language modalities. Our extensive experiments demonstrate that SEER achieves a detection rate of over 92% on vision-language models in various settings, without access to training data or knowledge of downstream tasks.
Rights
Copyright © 2024, Association for the Advancement of Artificial Intelligence. All rights reserved.
"In the returned rights section of the AAAI copyright form, authors are specifically granted back the right to use their own papers for noncommercial uses, such as inclusion in their dissertations or the right to deposit their papers in their institutional repositories, provided there is proper attribution. The published version is not available for posting outside the AAAI Digital Library."
Included in accordance with publisher policy.
Original Publication Citation
Zhu, L., Ning, R., Li, J., Xin, C., & Wu, H. (2024). SEER: Backdoor detection for vision-language models through searching target text and image trigger jointly. Proceedings of the AAAI Conference on Artificial Intelligence, 38(7), 7766-7774. https://doi.org/10.1609/aaai.v38i7.28611
ORCID
0000-0003-4050-6252 (Ning), 0000-0003-0091-6986 (Li)