Exploring Computer Vision-Based Models for Multi-Person Attention Analysis

Abstract/Description/Artist Statement

Joint visual attention (JVA) occurs when multiple people share focus on an object, event, or person. Individuals achieve joint attention through cues such as voice, eye gaze, and head orientation; in research, gaze and head orientation are the most commonly used measurements because they provide strong indicators of JVA. JVA is a fundamental skill for effective communication and social interaction; thus, understanding where individuals direct their attention and why is essential for advancing research in social behavior and human-computer interaction. Early studies relied on manual observation; however, with advances in eye-tracking systems and computer vision technologies, the field has expanded significantly, adopting automated, data-driven approaches to analyzing JVA. These systems help infer attention states, including mutual attention, when two individuals look at each other, and shared attention, when multiple individuals focus on a common object. This project investigates the feasibility of deploying computer vision-based models for multi-person attention analysis in third-person images and videos, an approach that requires no wearable sensors or specialized hardware. By collecting data non-invasively, these models allow us to study attention patterns in naturalistic environments. We plan to apply this approach to collaborative learning environments to investigate the role of JVA in collaboration and how differences in JVA influence learning outcomes. Understanding these attention dynamics may allow educators to identify moments of disengagement or misalignment in real time and adapt instructional strategies to better support collaborative learning.

Keywords: computer vision models, joint attention, collaborative learning, human-computer interaction

Presenting Author Name/s

Jannah Elmousalami

Faculty Advisor/Mentor

Sampath Jayarathna

Faculty Advisor/Mentor Email

sampath@cs.odu.edu

Faculty Advisor/Mentor Department

Computer Science

College/School Affiliation

College of Sciences

Student Level Group

Undergraduate

Presentation Type

Poster
