Sunday Morning, 18th June 2023 (half-day)

Five papers have been accepted to our GAZE 2023 workshop.
Congratulations to all authors!


Introduction

The 5th International Workshop on Gaze Estimation and Prediction in the Wild (GAZE 2023) at CVPR 2023 aims to encourage and highlight novel strategies for eye gaze estimation and prediction with a focus on robustness and accuracy in extended parameter spaces, both spatially and temporally. This is expected to be achieved by applying novel neural network architectures, incorporating anatomical insights and constraints, introducing new and challenging datasets, and exploiting multi-modal training. Specifically, the workshop topics include (but are not limited to):

  • Reformulating eye detection, gaze estimation, and gaze prediction pipelines with deep networks.
  • Incorporating geometric and anatomical constraints into the training of (sparse or dense) deep networks.
  • Leveraging additional cues such as context from the face region and head pose information.
  • Developing adversarial methods to deal with conditions where current methods fail (illumination, appearance, etc.).
  • Exploring attention mechanisms to predict the point of regard.
  • Designing new accurate measures to account for rapid eye gaze movement.
  • Novel methods for temporal gaze estimation and prediction including Bayesian methods.
  • Integrating differentiable components into 3D gaze estimation frameworks.
  • Robust estimation from different data modalities such as RGB, depth, head pose, and eye region landmarks.
  • Generic gaze estimation methods for handling extreme head poses and gaze directions.
  • Using temporal information in eye tracking to provide consistent on-screen gaze estimation.
  • Personalization of gaze estimators with few-shot learning.
  • Semi-/weakly-/un-/self-supervised learning methods, domain adaptation methods, and other novel methods towards improved representation learning from eye/face region images or gaze target region images.

We will host two invited speakers on the topic of gaze estimation. As in previous editions of the workshop, we will also accept submissions of full unpublished papers. These papers will be peer-reviewed via a double-blind process, published in the official workshop proceedings, and presented at the workshop itself. More information will be provided as soon as possible.


Call for Contributions


Full Workshop Papers

Submission: We invite authors to submit unpublished papers (8-page CVPR format) to our workshop, to be presented at a poster session upon acceptance. All submissions will go through a double-blind review process. All contributions must be submitted (along with supplementary materials, if any) at this CMT link.

Accepted papers will be published in the official CVPR Workshops proceedings and the Computer Vision Foundation (CVF) Open Access archive.

Note: Authors of previously rejected main conference submissions are also welcome to submit their work to our workshop. When doing so, you must include the previous reviews (named previous_reviews.pdf) and a letter of changes (named letter_of_changes.pdf) in your supplementary materials to clearly demonstrate how the previous reviewers' comments have been addressed.



Important Dates


Paper Submission Deadline: March 13, 2023 (12:00 Pacific Time)
Notification to Authors: March 31, 2023
Camera-Ready Deadline: April 8, 2023


Workshop Schedule


Time in UTC          Start Time in UTC*            Item
3:25pm - 3:30pm      18 Jun 2023 15:25:00 UTC      Opening remarks
3:30pm - 4:10pm      18 Jun 2023 15:30:00 UTC      Invited talk by Erroll Wood
4:10pm - 4:50pm      18 Jun 2023 16:10:00 UTC      Invited talk by Ruth Rosenholtz
4:50pm - 5:10pm      18 Jun 2023 16:50:00 UTC      Invited poster spotlight talks
5:10pm - 6:00pm      18 Jun 2023 17:10:00 UTC      Coffee break & poster presentations
6:00pm - 6:50pm      18 Jun 2023 18:00:00 UTC      Workshop paper presentations
6:50pm - 7:00pm      18 Jun 2023 18:50:00 UTC      Awards & closing remarks
* This time is calculated in your computer's reported time zone. For example, viewers in Los Angeles may see UTC-7, while those in Berlin may see UTC+2.

Please note that the reported time zone may differ from your actual time zone.
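As a rough illustration of how such a conversion could be done in the browser, here is a minimal TypeScript sketch (not the actual website code; the function name and timestamp format are assumptions):

    // Illustrative sketch only: convert a UTC timestamp to the viewer's local time.
    // No explicit timeZone option is passed, so the browser's reported zone is used.
    function formatLocalStart(utcIso: string): string {
      const start = new Date(utcIso); // the trailing "Z" marks the string as UTC
      return start.toLocaleString(undefined, {
        dateStyle: "medium",
        timeStyle: "short",
      });
    }

    // Example: the opening remarks start at 18 Jun 2023 15:25:00 UTC.
    // A viewer in Los Angeles (UTC-7) would see roughly "Jun 18, 2023, 8:25 AM",
    // while a viewer in Berlin (UTC+2) would see roughly "18.06.2023, 17:25".
    console.log(formatLocalStart("2023-06-18T15:25:00Z"));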


Invited Keynote Speakers


Ruth Rosenholtz
Massachusetts Institute of Technology

Human vision at a glance


Abstract

Research at the GAZE 2023 workshop aims to estimate and predict where someone is looking. But if we know someone's direction of gaze, what can we say about what they see, and what parts of the visual input they process? One cannot simply assume that observers process the object lying at the point of regard. Human visual perception gathers considerable information over a fairly wide field of view. People may not always point their eyes at the interesting bits, because that may not always be optimal for real-world tasks.

One can make sense of these questions through an understanding of the strengths and limitations of human peripheral vision. We move our eyes as part of a complex tradeoff between the information available in the fovea vs. periphery, and the costs of shifting one's gaze. Furthermore, this tradeoff depends on factors such as individual differences, age, and level of experience. Recent understanding and modeling of peripheral vision can provide important insights into what a person can see, given where they look. This understanding could also help determine such things as the cost of misestimating someone's direction of gaze.

Biography

Ruth Rosenholtz is a Principal Research Scientist in MIT’s Department of Brain and Cognitive Sciences, a member of CSAIL, and currently on sabbatical at NVIDIA Research. She has a Ph.D. in EECS (Computer Vision) from UC Berkeley. She joined MIT in 2003 after 7 years at the Palo Alto Research Center (formerly Xerox PARC). Her work focuses on developing predictive models of human visual processing, including peripheral vision, visual search, visual clutter, and perceptual organization. In addition, her lab works on applying understanding of human vision to image fidelity (NASA Ames), and to design of user interfaces and information visualizations (Xerox PARC and MIT). She is a world expert in peripheral vision and its implications for how we think about vision, attention, and design.


Erroll Wood
Google

Synthetics for Gaze Estimation and More


Abstract

Nowadays, collecting the right dataset for machine learning is often more challenging than choosing the model. We address this with photorealistic synthetic training data – labelled images of humans made using computer graphics. With synthetics we can generate clean labels without annotation noise or error, produce labels otherwise impossible to annotate by hand, and easily control variation and diversity in our datasets. I will show you how synthetics underpins our work on understanding humans, including how it enables fast and accurate 3D face tracking, including eye gaze, in the wild.

Biography

Erroll is a Staff Software Engineer at Google, working on Digital Humans. Previously, he was a member of Microsoft's Mixed Reality AI Lab, where he worked on hand tracking for HoloLens 2, avatars for Microsoft Mesh, synthetic data for face tracking, and Holoportation. He did his PhD at the University of Cambridge, working on gaze estimation.



Awards

Best Paper Award


Where are they looking in the 3D space?
Nora Horanyi (University of Birmingham); Linfang Zheng (University of Birmingham); Eunji Chong (Amazon); Ales Leonardis (University of Birmingham); Hyung Jin Chang (University of Birmingham)

PDF (CVF)

Best Poster Award


Kappa Angle Regression with Ocular Counter-Rolling Awareness for Gaze Estimation
Shiwei Jin (UCSD); Ji Dai (UCSD); Truong Nguyen (UC San Diego)

PDF (CVF)


EFE: End-to-end Frame-to-Gaze Estimation
Haldun Balim (ETH Zurich); Seonwook Park (Lunit Inc.); Xi Wang (ETH Zurich); Xucong Zhang (Delft University of Technology); Otmar Hilliges (ETH Zurich)

PDF (CVF)




Accepted Full Papers

Multimodal Integration of Human-Like Attention in Visual Question Answering
Ekta Sood (University of Stuttgart); Fabian Kögel (Sony Europe B.V.); Philipp Müller (DFKI GmbH); Dominike Thomas (University of Stuttgart); Mihai Bace (University of Stuttgart); Andreas Bulling (University of Stuttgart)
PDF (CVF)

Kappa Angle Regression with Ocular Counter-Rolling Awareness for Gaze Estimation
Shiwei Jin (UCSD); Ji Dai (UCSD); Truong Nguyen (UC San Diego)
PDF (CVF) Suppl. (CVF)

GazeCaps: Gaze Estimation with Self-Attention-Routed Capsules
Hengfei Wang (University of Birmingham); Jun O Oh (Dankook University); Hyung Jin Chang (University of Birmingham); Jin Hee Na (VTouch Inc.); Minwoo Tae (Dankook University); Zhongqun Zhang (University of Birmingham); Sang-Il Choi (Dankook University)
PDF (CVF) Suppl. (CVF)

Where are they looking in the 3D space?
Nora Horanyi (University of Birmingham); Linfang Zheng (University of Birmingham); Eunji Chong (Amazon); Ales Leonardis (University of Birmingham); Hyung Jin Chang (University of Birmingham)
PDF (CVF) Suppl. (CVF)

EFE: End-to-end Frame-to-Gaze Estimation
Haldun Balim (ETH Zurich); Seonwook Park (Lunit Inc.); Xi Wang (ETH Zurich); Xucong Zhang (Delft University of Technology); Otmar Hilliges (ETH Zurich)
PDF (CVF) arXiv

Invited Posters

Real-Time Multi-Person Eyeblink Detection in the Wild for Untrimmed Video
Wenzheng Zeng, Yang Xiao, Sicheng Wei, Jinfang Gan, Xintao Zhang, Zhiguo Cao, Zhiwen Fang, Joey Tianyi Zhou
PDF

Source-Free Adaptive Gaze Estimation by Uncertainty Reduction
Xin Cai, Jiabei Zeng, Shiguang Shan, Xilin Chen
PDF Supp.

ReDirTrans: Latent-to-Latent Translation for Gaze and Head Redirection
Shiwei Jin, Zhen Wang, Lei Wang, Ning Bi, Truong Nguyen
PDF Supp.

GazeNeRF: 3D-Aware Gaze Redirection With Neural Radiance Fields
Alessandro Ruzzi, Xiangwei Shi, Xi Wang, Gengyan Li, Shalini De Mello, Hyung Jin Chang, Xucong Zhang, Otmar Hilliges
PDF Supp.

GFIE: A Dataset and Baseline for Gaze-Following From 2D to 3D in Indoor Environments
Zhengxi Hu, Yuxue Yang, Xiaolin Zhai, Dingye Yang, Bohan Zhou, Jingtai Liu
PDF Supp.


Program Committee

Yifei Huang
University of Tokyo
Heon Song
Lunit Inc.
Rafael Cabeza
Public University of Navarre
Yin Li
University of Wisconsin-Madison
Marcel Bühler
ETH Zürich
Qiang Ji
Rensselaer Polytechnic Institute
Chenyi Kuang
Rensselaer Polytechnic Institute
Jiawei Qin
The University of Tokyo
Hengfei Wang
University of Birmingham
Shreya Ghosh
Monash University
Wolfgang Fuhl
University of Tuebingen
Yufeng Zheng
ETH Zurich


Organizers



Hyung Jin Chang
University of Birmingham
Xucong Zhang
Delft University of Technology
Shalini De Mello
NVIDIA Research
Seonwook Park
Lunit Inc.

Otmar Hilliges
ETH Zürich
Aleš Leonardis
University of Birmingham

Website Chair



Hengfei Wang
University of Birmingham


Please contact me if you have any questions about this website.
Email: hxw080@student.bham.ac.uk

Workshop sponsored by: