Extracting, predicting, and explaining human visual reasoning
Abstract
[EMBARGOED UNTIL 12/01/2026] Vision enables humans to navigate complex environments, integrate contextual knowledge, and perform high-level analytic tasks through sophisticated visual reasoning processes. While eye-tracking studies have revealed important behavioral patterns and modern deep learning models have achieved impressive predictive performance, an important scientific gap remains: current AI systems have limited ability to model, interpret, or predict human visual reasoning in a way that is both computationally rigorous and explainable. This dissertation contributes toward addressing this gap by developing frameworks that extract human visual reasoning patterns from behavioral data and incorporate contextual knowledge into neural architectures to support more interpretable prediction. The first contribution introduces a pattern-mining framework that decomposes raw eye-movement sequences into semantically meaningful subsequences. By identifying contrastive visual reasoning patterns across tasks and levels of expertise, this approach highlights latent reasoning strategies that are often obscured in whole-sequence analyses. The method provides a richer and more nuanced representation of visual cognition, revealing spatiotemporal structures and context-dependent signatures that existing computational techniques may overlook. Building on these insights, the dissertation proposes the Context Guided Visual Transformer, a neural architecture that incorporates task-specific contextual information directly into its attention pathways. This design aims to enhance multimodal interaction and improve explainability by generating context-aware attention maps that clarify how input features interact with contextual guidance. These maps provide insight into the model's prediction process and represent a step toward more transparent mechanisms than conventional pixel-level saliency methods.
Together, these contributions represent an initial step toward modeling and predicting aspects of context-dependent visual reasoning. By connecting human behavioral insights with explainable computational models, this work provides tools and directions for developing more transparent AI systems and outlines a path for future research in domains where interpretability and trustworthy reasoning are essential.
Degree
Ph.D.
