Date: Tuesday, 18th June 2024
Time: 8:30 AM – 12:30 PM (half-day)
Location: Arch 309


The 6th International Workshop on Gaze Estimation and Prediction in the Wild (GAZE 2024) at CVPR 2024 aims to encourage and highlight novel strategies for eye gaze estimation and prediction. The workshop topics include (but are not limited to):

  • Enhancing eye image segmentation, landmark localization, gaze estimation, and other tasks in extended and augmented reality (XR/AR) settings.
  • Novel multi-modal systems for incorporating gaze information to improve visual recognition tasks.
  • Improving eye detection, gaze estimation, and gaze prediction pipelines in various ways, such as by applying geometric and anatomical constraints, leveraging additional cues such as head pose, scene content, or considering multi-modal inputs.
  • Developing adversarial or domain generalization methods to improve cross-dataset performance or to deal with conditions where current methods fail (illumination, appearance, etc.).
  • Exploring attention mechanisms and temporal information to predict the point of regard.
  • Novel methods for temporal gaze estimation and prediction including Bayesian methods.
  • Personalization of gaze estimators with methods such as few-shot learning.
  • Semi-/weak-/un-/self- supervised learning methods, domain adaptation methods, and other novel methods towards improved representation learning from eye/face region images or gaze target region images.

We will host two invited speakers and will also accept submissions of full unpublished papers, as in previous editions of the workshop. These papers will be peer-reviewed via a double-blind process, published in the official workshop proceedings, and presented at the workshop itself.

Call for Contributions

Full Workshop Papers

Submission: We invite authors to submit unpublished papers (8-page CVPR format) to our workshop, to be presented at a poster session upon acceptance. All submissions will go through a double-blind review process. All contributions must be submitted (along with supplementary materials, if any) at this CMT link.

Accepted papers will be published in the official CVPR Workshops proceedings and the Computer Vision Foundation (CVF) Open Access archive.

Note: Authors of previously rejected main conference submissions are also welcome to submit their work to our workshop. When doing so, you must include the previous reviewers' comments (named previous_reviews.pdf) and a letter of changes (named letter_of_changes.pdf) in your supplementary materials to clearly demonstrate how those comments have been addressed.

Important Dates

Paper Submission Deadline: March 15, 2024 (23:59 Pacific Time)
Notification to Authors: April 5, 2024
Camera-Ready Deadline: April 14, 2024
Workshop Day: June 18, 2024

Workshop Schedule

All times are in UTC, 18 June 2024.

15:30 – 15:35  Opening remarks
15:35 – 16:15  Invited talk by Feng Xu
16:15 – 16:55  Invited talk by Alexander Fix
16:55 – 17:10  Invited poster spotlight talks
17:10 – 18:10  Coffee break & poster presentation
18:10 – 18:50  Workshop paper presentations
18:50 – 19:00  Awards & closing remarks

Invited Keynote Speakers

Feng Xu
Tsinghua University

Eye Region Reconstruction with a Monocular Camera


Eye region reconstruction is an important yet challenging task in computer vision and graphics. It suffers from complicated geometry and motion, severe occlusions, and eyeglass interference, forcing existing methods to trade off capture cost against reconstruction quality. We focus on low-cost capture setups and propose novel algorithms that achieve high-quality eye region reconstruction from limited inputs. In addition, we tackle eyeglass interference, which lays the foundation for high-quality eye region reconstruction, and we explore applying eye region reconstruction in medicine for disease diagnosis.

Biography

Feng Xu is currently an associate professor at the School of Software, Tsinghua University, Beijing, China. He earned a Ph.D. in automation and a B.S. in physics from Tsinghua University in 2012 and 2007, respectively. Until 2015, he was a researcher in the Internet Graphics group at Microsoft Research Asia. His research interests include human body reconstruction, face animation, and medical image analysis. He has authored more than 40 conference and journal papers in these areas, in venues including Nature Medicine, SIGGRAPH, CVPR, ICCV, ECCV, and PAMI.

Alexander Fix
Meta Reality Labs Research

Challenges in Near Eye Tracking for AR/VR


Augmented and Virtual Reality (AR/VR) has incredible potential for using eye tracking to power the future of computing, but also poses incredible challenges in making eye tracking work for everyone, all the time. In this talk, I will discuss some of the work we are doing at Meta Reality Labs to build eye tracking into AR/VR, as well as the key areas where the CVPR and GAZE 2024 community can help solve the hardest problems in this space. I will highlight Aria, the eye-tracking-enabled research glasses from Meta, and how they are a great tool for investigating applications of eye tracking. I will also show some new approaches to eye tracking based on event cameras, polarization, and more.

Biography

Alexander Fix is a Research Scientist at Meta Reality Labs Research, where he has worked on eye tracking and related topics for the last nine years. His research interests include 3D reconstruction, NeRF and other implicit reconstructions, and geometric eye tracking. His collaborations at Meta include a substantial amount of eye-tracking hardware research, particularly novel sensing approaches such as event cameras. He graduated from Cornell in 2016 with a Ph.D. in Computer Science, and from the University of Chicago in 2009 with a B.S. in Computer Science and Mathematics.

Accepted Full Papers

Spatio-Temporal Attention and Gaussian Processes for Personalized Video Gaze Estimation
Swati Jindal, Mohit Yadav, Roberto Manduchi

Exploring the Zero-Shot Capabilities of Vision-Language Models for Improving Gaze Following
Anshul Gupta, Pierre Vuillecard, Arya Farkhondeh, Jean-Marc Odobez

Gaze Scanpath Transformer: Predicting Visual Search Target by Spatiotemporal Semantic Modeling of Gaze Scanpath
Takumi Nishiyasu, Yoichi Sato

GESCAM: A Dataset and Method on Gaze Estimation for Classroom Attention Measurement
Athul Mathew, Arshad Khan, Thariq Khalid, Riad Souissi

Invited Posters

What Do You See in Vehicle? Comprehensive Vision Solution for In-Vehicle Gaze Estimation
Yihua Cheng, Yaning Zhu, Zongji Wang, Hongquan Hao, Liu Wei, Shiqing Cheng, Xi Wang, Hyung Jin Chang

Learning from Observer Gaze: Zero-shot Attention Prediction Oriented by Human-Object Interaction Recognition
Yuchen Zhou, Linkai Liu, Chao Gou

Sharingan: A Transformer Architecture for Multi-Person Gaze Following
Samy Tafasca, Anshul Gupta, Jean-Marc Odobez

From Feature to Gaze: A Generalizable Replacement of Linear Layer for Gaze Estimation
Yiwei Bao, Feng Lu


Organizers

Hyung Jin Chang
University of Birmingham
Xucong Zhang
Delft University of Technology
Shalini De Mello
NVIDIA Research
Seonwook Park
NVIDIA Research

Jean-Marc Odobez
EPFL & Idiap Research Institute
Yihua Cheng
University of Birmingham
Xi Wang
ETH Zürich
Otmar Hilliges
ETH Zürich
Aleš Leonardis
University of Birmingham

Website Chair

Hengfei Wang
University of Birmingham

Please contact me if you have any questions about this website.

Workshop sponsored by: