Exploring Computer Vision-Based Models for Multi-Person Attention Analysis
Abstract/Description/Artist Statement
Joint visual attention (JVA) occurs when multiple people share focus on an object, event, or person. Individuals establish joint attention through cues such as voice, eye gaze, and head orientation. Research most commonly measures gaze and head orientation because they are strong indicators of JVA. JVA is a fundamental skill for effective communication and social interaction; understanding where individuals direct their attention, and why, is therefore essential for advancing research in social behavior and human-computer interaction. Early studies relied on manual observation. With the advancement of eye-tracking systems and computer vision technologies, the field has expanded significantly and adopted automated, data-driven approaches to analyzing JVA. These systems help infer attention states, including mutual attention (two individuals look at each other) and shared attention (multiple individuals focus on a common object). This project investigates the feasibility of deploying computer vision-based models for multi-person attention analysis in third-person images and videos, an approach that does not require specialized hardware or wearable sensors. Because the data are collected non-invasively, these models allow us to study attention patterns in naturalistic environments. We plan to apply this approach to collaborative learning environments to investigate the role of JVA and how differences in JVA influence learning outcomes. Understanding these attention dynamics may allow educators to identify moments of disengagement or misalignment in real time and adapt instructional strategies to better support collaborative learning.
Keywords: Computer vision models, Joint attention, Collaborative learning, Human-computer interaction
Faculty Advisor/Mentor
Sampath Jayarathna
Faculty Advisor/Mentor Email
sampath@cs.odu.edu
Faculty Advisor/Mentor Department
Computer Science
College/School Affiliation
College of Sciences
Student Level Group
Undergraduate
Presentation Type
Poster