Enterprise AI Analysis: October 26, 2024
Do You (Dis)agree With Me? Modelling Implicit User Disagreement in Human-AI Interaction Using Gaze Data
Authors: Abdulrahman Mohamed Selim, Omair Shahzad Bhatti, Michael Barz, Amr Gomaa, Daniel Sonntag
The widespread use of generative AI has led to increased focus on human-AI interaction. This paper focuses on modelling user disagreement using machine learning (ML) by observing users' implicit viewing behaviour, primarily gaze data. We conducted a controlled study with 30 participants evaluating captions from a simulated ML image-captioning system. Personalised gaze-based models achieved an average balanced accuracy of 68.4%, outperforming generalised modelling (57.0%) and multimodal approaches. Our findings highlight the importance of personalisation, feature selection, and temporal dynamics for robust disagreement detection, while also addressing ethical and privacy implications of continuous passive gaze and facial monitoring. We release the dataset to support reproducibility and further work.
Key Findings & Executive Impact
Discover the core insights that can transform your enterprise AI strategy, informed by cutting-edge research.
Deep Analysis & Enterprise Applications
This research addresses the critical challenge of implicitly detecting user disagreement in Human-AI interaction. As AI systems become more prevalent, unexpected or conflicting outputs can lead to user frustration and mistrust. The study investigates how passive signals, specifically gaze and facial data, can be used to model user disagreement without requiring explicit feedback. By focusing on image-captioning as a task, the researchers explored whether implicit viewing behaviors correlate with explicit user judgments of agreement or disagreement.
Key Research Questions:
- RQ1: Can perceived disagreement be reliably detected from passive gaze and facial signals?
- RQ2: How do personalized and generalized models compare in performance and robustness?
- RQ3: Which gaze-based features and time windows are most predictive, and how consistent are they across participants?
Ultimately, the study aims to enhance the responsiveness and trustworthiness of interactive AI systems by enabling implicit disagreement detection.
A controlled user study was conducted with 30 participants (21 male, 9 female; mean age 26.4 years) evaluating 154 image-caption pairs drawn from the FOIL-COCO dataset, in which foil captions contain a deliberate single-word error. Participants provided a binary agree/disagree judgment for each caption.
Data Collection:
- Eye-tracking data: Tobii Pro Fusion eye-tracker at 250 Hz.
- Facial video recordings: Luxonis OAK-D camera at 30 Hz.
Data Processing & Feature Extraction:
- Gaze data: Resampled to 4 ms intervals. Fixation and saccadic events were identified using the Dispersion-Threshold Identification (I-DT) algorithm with an adaptive procedure based on the Median Absolute Deviation (MAD) of gaze velocities (see the sketch after this list).
- Features: A total of 39 gaze-based features were extracted, including fixation counts/durations, saccade dynamics (amplitude, velocity), scanpath characteristics, AOI transitions, and pupil responses.
- Facial data: Action Units (AUs) were extracted from facial video recordings, with statistical summaries computed.
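For illustration, here is a minimal Python sketch of the I-DT step described above, with the dispersion threshold derived from the MAD of sample-to-sample gaze velocities. The constant k, the minimum fixation duration, and the rule converting the velocity statistic into a dispersion bound are illustrative assumptions, not the paper's published parameters.

```python
import numpy as np

def mad(v):
    """Median Absolute Deviation: a robust estimate of spread."""
    return np.median(np.abs(v - np.median(v)))

def dispersion(x, y):
    """I-DT dispersion measure: (max - min) in x plus (max - min) in y."""
    return (x.max() - x.min()) + (y.max() - y.min())

def idt_fixations(t, x, y, disp_thresh, min_dur=0.1):
    """Dispersion-Threshold Identification (I-DT): grow a window while its
    dispersion stays under disp_thresh; windows spanning at least min_dur
    seconds are labelled fixations (onset, offset)."""
    fixations, i, n = [], 0, len(t)
    while i < n - 1:
        # Seed the window so it covers the minimum fixation duration.
        j = i
        while j < n - 1 and t[j] - t[i] < min_dur:
            j += 1
        if dispersion(x[i:j + 1], y[i:j + 1]) <= disp_thresh:
            # Expand the window until dispersion exceeds the threshold.
            while j + 1 < n and dispersion(x[i:j + 2], y[i:j + 2]) <= disp_thresh:
                j += 1
            if t[j] - t[i] >= min_dur:  # guard the tail of the recording
                fixations.append((t[i], t[j]))
            i = j + 1
        else:
            i += 1  # no fixation starts here; slide the window forward
    return fixations

# Illustrative adaptive threshold: a robust velocity cut-off (median plus
# k * MAD of sample-to-sample velocities) converted to a dispersion bound
# over min_dur. k = 2.5 and the conversion rule are assumptions.
rng = np.random.default_rng(0)
t = np.arange(0.0, 2.0, 0.004)  # 4 ms resampled timeline
x, y = np.cumsum(rng.normal(0.0, 2.0, (2, t.size)), axis=1)
vel = np.hypot(np.diff(x), np.diff(y)) / np.diff(t)
disp_thresh = (np.median(vel) + 2.5 * mad(vel)) * 0.1
print(idt_fixations(t, x, y, disp_thresh))
```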
Machine Learning:
- Classical ML algorithms were used: support vector machines (SVM), XGBoost, linear discriminant analysis (LDA), and random forests (RF).
- Evaluated using Balanced Accuracy, F1-score, Recall, Precision, and AUC.
- Two main experimental sets:
- Initial Experiments: Multimodal (gaze + facial) vs. unimodal (gaze-only, facial-only) inputs; generalised (across-participant) vs. personalised (within-participant) models (see the evaluation sketch below).
- Post-hoc Analysis: Focused on gaze data, exploring the effects of feature selection (F_A, F_B, F_pool subsets) and time-window selection (full recording, last 11 seconds, last 3 seconds) for personalised models.
The processed dataset is publicly available on GitHub.
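The split logic behind the generalised/personalised comparison can be expressed compactly with scikit-learn. The sketch below uses synthetic placeholder features and labels rather than the released dataset; balanced accuracy is the mean of per-class recalls, i.e. (TPR + TNR) / 2 for binary labels.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder data standing in for the released dataset: 30 participants
# x 10 trials, 39 gaze features, binary agree(0)/disagree(1) labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 39))
y = np.tile([0, 1], 150)
groups = np.repeat(np.arange(30), 10)  # participant id per trial

clf = make_pipeline(StandardScaler(), SVC(class_weight="balanced"))

# Generalised: train across participants, test on a held-out participant.
gen = cross_val_score(clf, X, y, groups=groups,
                      cv=LeaveOneGroupOut(), scoring="balanced_accuracy")

# Personalised: fit and evaluate within each participant's own trials.
per = []
for p in np.unique(groups):
    m = groups == p
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    per.extend(cross_val_score(clf, X[m], y[m], cv=cv,
                               scoring="balanced_accuracy"))

# On this synthetic data both scores hover near chance (0.5); the split
# logic, not the numbers, is the point of the sketch.
print(f"generalised  BA: {np.mean(gen):.3f}")
print(f"personalised BA: {np.mean(per):.3f}")
```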
The research revealed that personalised models significantly outperformed generalised models in detecting implicit user disagreement, particularly when relying solely on gaze data.
Key Results:
- RQ1 (Detectability): Perceived disagreement can be reliably detected from passive gaze signals. Gaze-only models were more effective than multimodal models (combining gaze and facial data). Facial data alone yielded chance-level performance.
- RQ2 (Personalisation vs. Generalisation): Personalised models achieved a mean balanced accuracy of 68.40% (±5.70%), significantly outperforming generalised models (mean BA 57.00% ±2.54%). This highlights high inter-participant variability and the necessity of tailoring models to individual users.
- RQ3-A (Predictive Features): Fixation counts and durations, saccade and scanpath characteristics, and pupil diameter variability were identified as primary gaze-based signals for disagreement. However, no single feature set was universally optimal, reinforcing the need for adaptive, participant-specific modelling.
- RQ3-B (Time Window): The optimal time window for disagreement detection varied across individuals. While full recordings offered robust overall performance (BA = 70.20%, AUC = 70.30%) at the group level, some participants benefited from shorter windows (e.g., the 3-second window achieved the highest recall at 72.50%). This suggests that discriminative cues can occur at different temporal segments of an interaction, requiring personalisation (see the windowing sketch below).
Overall, the modest accuracy (though significantly improved by personalisation) indicates that subtle implicit disagreement signals are challenging to capture, but the approach demonstrates feasibility.
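To make the time-window comparison concrete, the sketch below recomputes a few simple features on a trailing window of a trial, mirroring the full-recording / last-11-second / last-3-second conditions. The feature choices and data here are illustrative placeholders, not the paper's full 39-feature pipeline.

```python
import numpy as np

def window_features(t, pupil, fix_onsets, window=None):
    """Recompute simple trial features on a trailing time window.
    window=None uses the full recording; window=3.0 keeps only the
    last 3 seconds before the participant's judgement."""
    if window is not None:
        keep = t >= t[-1] - window
        t, pupil = t[keep], pupil[keep]
        fix_onsets = fix_onsets[fix_onsets >= t[0]]
    return {
        "fixation_count": int(len(fix_onsets)),
        "pupil_std": float(np.std(pupil)),  # pupil diameter variability
        "window_s": float(t[-1] - t[0]),
    }

# Illustrative trial: ~250 Hz samples over 11 s with synthetic pupil data.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 11.0, 2751)
pupil = 3.0 + 0.1 * rng.normal(size=t.size)
fix_onsets = np.array([0.4, 1.9, 3.2, 6.0, 8.7, 10.1])
print(window_features(t, pupil, fix_onsets))              # full recording
print(window_features(t, pupil, fix_onsets, window=3.0))  # last 3 s only
```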
The study concludes that while implicit disagreement can be detected above chance level using passive gaze signals, reliable detection suitable for real-world applications remains an open challenge. The findings underscore the critical importance of personalisation in human-AI interaction, as fixed, one-size-fits-all models are unlikely to generalise well across diverse users.
Ethical and Privacy Concerns:
Continuous passive gaze and facial monitoring raises significant ethical considerations, including informed consent, data privacy, and user discomfort. Practical benefits must clearly outweigh the privacy burden. Future deployments should adopt privacy-preserving designs (e.g., opt-in, data minimisation, local processing) and offer user control and transparency.
Future Research Directions:
- Richer Stimuli: Incorporate datasets with controlled variation in foil similarity and contextual plausibility, and different error types (object, attribute, relation, context violations) to elicit stronger, more varied disagreement signals and enable analysis of disagreement intensity.
- Multimodal Signals: Continue exploring richer facial, vocal, and other behavioral signals, and advanced multimodal fusion techniques.
- Advanced ML Models: Investigate state-of-the-art deep neural networks for automatic feature extraction and representation learning, requiring larger, more diverse datasets to avoid overfitting.
- Naturalistic Contexts: Expand studies to more naturalistic settings and broader populations to assess robustness under realistic variability.
- User-Controllable Interventions: Develop adaptive systems that trigger lightweight, user-controllable interventions when disagreement is detected, rather than fully automated, potentially disruptive actions (see the policy sketch below).
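A hypothetical shape for such an intervention policy: act only on confident predictions, and surface a dismissible suggestion rather than an automated correction. The class name, the threshold, and the prompt text are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class InterventionPolicy:
    """Hypothetical user-controllable intervention policy."""
    threshold: float = 0.75  # minimum predicted P(disagree) before acting
    enabled: bool = True     # user-level switch to disable interventions

    def on_prediction(self, p_disagree: float) -> Optional[str]:
        """Return a lightweight, dismissible prompt, or None to stay silent."""
        if not self.enabled or p_disagree < self.threshold:
            return None
        return "Not what you expected? Tap to request an alternative caption."

policy = InterventionPolicy()
print(policy.on_prediction(0.82))  # confident disagreement -> gentle prompt
print(policy.on_prediction(0.40))  # uncertain -> no interruption
```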
The work provides a baseline and motivates further development of end-to-end interactive prototypes for adaptive human-AI systems.
Implicit Disagreement Detection Process (Figure 1)
| Model Type | Balanced Accuracy (BA) | Key Benefits |
|---|---|---|
| Personalised models | 68.40% (±5.70%) | Adapt to individual viewing patterns; substantially better implicit disagreement detection |
| Generalised models | 57.00% (±2.54%) | Require no per-user training data, but perform only modestly above chance |
Ethical Considerations in Passive Monitoring
The research highlights significant ethical concerns with continuous passive gaze and facial monitoring. While valuable for implicit feedback, this approach raises issues related to informed consent, data privacy, and user discomfort. The study suggests that practical deployments must carefully weigh benefits against the privacy burden, advocating for:
- Privacy-preserving designs: Opt-in consent, purpose limitation, local/on-device processing.
- Transparency: Clear communication about what is sensed and why.
- User Control: Easy opt-out mechanisms and episodic, rather than continuous, sensing (see the configuration sketch below).
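As one possible embodiment of these recommendations, a sensing configuration could encode opt-in consent, on-device processing, data minimisation, and episodic capture as explicit defaults. All field names and default values below are illustrative assumptions, not a published specification.

```python
from dataclasses import dataclass

@dataclass
class GazeSensingConfig:
    """Hypothetical privacy-preserving defaults for passive gaze sensing."""
    opt_in: bool = False            # sensing stays off until the user consents
    on_device_only: bool = True     # features computed locally; raw video never uploaded
    store_raw_video: bool = False   # data minimisation: retain derived features only
    episodic_window_s: float = 3.0  # capture short episodes, not a continuous stream

def sensing_allowed(cfg: GazeSensingConfig, user_consented: bool) -> bool:
    """Sensing requires both an opt-in configuration and explicit consent."""
    return cfg.opt_in and user_consented

cfg = GazeSensingConfig(opt_in=True)
print(sensing_allowed(cfg, user_consented=True))   # True
print(sensing_allowed(cfg, user_consented=False))  # False
```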
These considerations are crucial for building trustworthy and ethical human-AI systems that respect user agency and privacy.
Quantify the Impact of Enhanced AI Interaction
Estimate potential annual savings and reclaimed human hours by deploying AI systems with implicit disagreement detection, improving human-AI collaboration efficiency.
Your Roadmap to Seamless AI Integration
Our structured approach ensures a smooth transition and maximum impact for your enterprise.
Phase 1: Diagnostic Assessment & Customization
Our experts analyze your current AI interaction workflows and data. We identify key points where implicit disagreement impacts efficiency and tailor a data collection strategy for your specific use cases and user base.
Phase 2: Pilot Deployment & Personalization
Implement a pilot program with our implicit disagreement detection models, leveraging your collected gaze data. We fine-tune models to individual user patterns, maximizing accuracy and minimizing false positives in a controlled environment.
Phase 3: Integration & Scalable Rollout
Seamlessly integrate the refined models into your existing AI platforms. We provide continuous monitoring and adaptation strategies, ensuring robust performance across diverse user groups and evolving interaction scenarios.
Ready to Transform Your Human-AI Interaction?
Book a complimentary strategy session to explore how personalized implicit disagreement detection can elevate your enterprise AI systems.