Image credit to Daniel Hannah

Sunday, 20th June 2021 (half-day)
3pm - 8:20pm UTC

8am - 1:20pm PDT (UTC-7)
11am - 4:20pm EDT (UTC-4)
4pm - 9:20pm BST (UTC+1)
5pm - 10:20pm CEST (UTC+2)
8:30pm - 1:50am (+1) IST (UTC+5:30)
11pm - 4:20am (+1) CST (UTC+8)
12am - 5:20am (+1) KST (UTC+9)

YouTube recording:


The 3rd International Workshop on Gaze Estimation and Prediction in the Wild (GAZE 2021) at CVPR 2021 aims to encourage and highlight novel strategies for eye gaze estimation and prediction with a focus on robustness and accuracy in extended parameter spaces, both spatially and temporally. This is expected to be achieved by applying novel neural network architectures, incorporating anatomical insights and constraints, introducing new and challenging datasets, and exploiting multi-modal training. Specifically, the workshop topics include (but are not limited to):

  • Reformulating eye detection, gaze estimation, and gaze prediction pipelines with deep networks.
  • Incorporating geometric and anatomical constraints into the training of (sparse or dense) deep networks.
  • Leveraging additional cues such as context from the face region and head pose information.
  • Developing adversarial methods to deal with conditions where current methods fail (illumination, appearance, etc.).
  • Exploring attention mechanisms to predict the point of regard.
  • Designing new, accurate measures to account for rapid gaze movements.
  • Novel methods for temporal gaze estimation and prediction including Bayesian methods.
  • Integrating differentiable components into 3D gaze estimation frameworks.
  • Robust estimation from different data modalities such as RGB, depth, head pose, and eye region landmarks.
  • Generic gaze estimation methods for handling extreme head poses and gaze directions.
  • Using temporal information in eye tracking to provide consistent on-screen gaze estimation.
  • Personalization of gaze estimators with few-shot learning.
  • Semi-/weakly-/un-/self-supervised learning methods, domain adaptation methods, and other novel approaches to improved representation learning from eye/face region images or gaze target region images.
We will host three invited speakers and run two deep learning challenges on the topic of gaze estimation. We will also accept submissions of full unpublished papers, as in previous editions of the workshop. These papers will be peer-reviewed via a double-blind process, published in the official workshop proceedings, and presented at the workshop itself. More information will be provided as soon as possible.

Call for Contributions

Full Workshop Papers

Submission: We invite authors to submit unpublished papers (8-page CVPR format) to our workshop, to be presented at a poster session upon acceptance. All submissions will go through a double-blind review process. All contributions must be submitted (along with supplementary materials, if any) at this CMT link.

Accepted papers will be published in the official CVPR Workshops proceedings and the Computer Vision Foundation (CVF) Open Access archive.

Note: Authors of previously rejected main conference submissions are also welcome to submit their work to our workshop. When doing so, you must include the previous reviewers' comments (named previous_reviews.pdf) and a letter of changes (named letter_of_changes.pdf) in your supplementary materials to clearly demonstrate the changes made to address the previous reviewers' comments.

GAZE 2021 Challenges

The GAZE 2021 Challenges are hosted on Codalab, and can be found at:

More information on the respective challenges can be found on their pages.

We are thankful to our sponsors for providing the following prizes:

ETH-XGaze Challenge Winner USD 500 courtesy of
EVE Challenge Winner Tobii Eye Tracker 5 courtesy of

Important Dates

ETH-XGaze & EVE Challenges Released February 13, 2021
Paper Submission Deadline March 29, 2021 (23:59 Pacific time)
Notification to Authors April 13, 2021
Camera-Ready Deadline April 20, 2021
ETH-XGaze & EVE Challenges Closed May 28, 2021 (23:59 UTC)

Workshop Schedule

All times are in UTC, Sunday, 20 June 2021.

3:00pm - 3:05pm Opening Remarks and Awards
3:05pm - 3:40pm Challenge Winner Talks
3:40pm - 4:40pm Full Workshop Paper Presentations
4:40pm - 6:00pm Break + Poster Session
6:00pm - 6:35pm Keynote Talk: Jim Rehg
6:35pm - 7:10pm Keynote Talk: Moshe Eizenman
7:10pm - 7:45pm Keynote Talk: Adrià Recasens
7:45pm - 8:15pm Panel Discussion
8:15pm - 8:20pm Closing Remarks

Invited Keynote Speakers

Jim Rehg
Georgia Institute of Technology

An Egocentric View of Social Behavior

Biography

James M. Rehg (pronounced “ray”) is a Professor in the School of Interactive Computing at the Georgia Institute of Technology, where he is Director of the Center for Behavioral Imaging, co-Director of the Center for Computational Health, and co-Director of the Computational Perception Lab. He received his Ph.D. from CMU in 1995 and worked at the Cambridge Research Lab of DEC (and then Compaq) from 1995 to 2001, where he managed the computer vision research group. He received an NSF CAREER award in 2001 and a Raytheon Faculty Fellowship from Georgia Tech in 2005. He and his students have received a number of best paper awards, including best student paper awards at ICML 2005, BMVC 2010, Mobihealth 2014, and Face and Gesture 2015, and a Method of the Year award from the journal Nature Methods. Dr. Rehg serves on the Editorial Board of the Intl. J. of Computer Vision, and he served as General co-Chair for CVPR 2009 and Program co-Chair for CVPR 2017. He has authored more than 100 peer-reviewed scientific papers and holds 23 issued US patents. Dr. Rehg’s research interests include computer vision, machine learning, behavioral imaging, and mobile health (mHealth). He is the Deputy Director of the NIH Center of Excellence on Mobile Sensor Data-to-Knowledge (MD2K), which is developing novel on-body sensing and predictive analytics for improving health outcomes. Dr. Rehg is also leading a multi-institution effort, funded by an NSF Expedition award, to develop the science and technology of Behavioral Imaging: the capture and analysis of social and communicative behavior using multi-modal sensing, to support the study and treatment of developmental disorders such as autism.

Moshe Eizenman
University of Toronto

Development of hybrid eye tracking systems for studies of neuropsychiatric disorders


Visual scanning behaviour is controlled by both low-level perceptual processes (e.g., the colour and spatial characteristics of the visual stimuli) and high-level cognitive processes driven by memories, emotions, expectations, and goals. During natural viewing, subjects are unaware of their visual scanning behaviour, and as such it can provide physiological markers for the objective evaluation of cognitive processes in patients with neuropsychiatric disorders.

In this talk I will present our past and current work towards the development of objective markers for neuropsychiatric disorders. This work includes both the development of new methods to analyse visual scanning patterns and the development of eye-tracking systems to monitor such patterns.

I will start by describing a general method for the analysis of visual scanning behaviour in neuropsychiatric disorders. I will then demonstrate the utility of this novel method with examples from our studies of patients with eating and mood disorders. Finally, I will describe two low-cost eye-tracking systems that we developed for such studies. One system uses a smartphone to display visual stimuli and analyse visual scanning patterns, while the other uses a virtual reality headset to display visual stimuli. In both systems, point-of-gaze is computed by an eye model whose parameters are estimated from eye images by machine learning techniques (i.e., a hybrid approach to point-of-gaze estimation).

Biography

Moshe Eizenman is a professor in the departments of Ophthalmology and Visual Science and Electrical and Computer Engineering at the University of Toronto. He is also a senior researcher at the Krembil Brain Institute. He received his Ph.D. from the University of Toronto in 1984 and worked at the Institute of Biomedical Engineering as the head of the vision and eye-movements group. He has authored more than 120 peer-reviewed scientific papers, and his research interests include the development of eye-tracking systems, the analysis of eye movements and visual scanning patterns, and the development of objective physiological markers for psychiatric and neurological disorders. Prof. Eizenman is the founder of EL-MAR Inc., a company that develops advanced eye-tracking technologies for pilot training, driving, and medical research.

Adrià Recasens
DeepMind

Where are they looking?


In order to understand actions or anticipate intentions, humans need efficient ways of gathering information about each other. In particular, gaze is a rich source of information about other people's activities and intentions. In this talk, we describe our work on predicting human gaze. We introduce a series of methods to follow gaze across different modalities. First, we present GazeFollow, a dataset and model to predict the location of people's gaze in an image. Furthermore, we introduce Gaze360, a large-scale gaze-tracking dataset and method for robust 3D gaze direction estimation in unconstrained scenes. Finally, we also propose a saliency-based sampling layer designed to improve performance in arbitrary tasks by efficiently zooming into the relevant parts of the input image.

Biography

Adrià Recasens is a Research Scientist at DeepMind. He completed his PhD in computer vision at the Computer Science and Artificial Intelligence Laboratory at the Massachusetts Institute of Technology in 2019. During his PhD, he worked on various topics related to image and video understanding, with several publications on gaze estimation in images and video. His current research focuses on self-supervised learning applied to multiple modalities such as video, audio, and text.


Best Paper Award sponsored by

PupilTAN: A Few-Shot Adversarial Pupil Localizer
Nikolaos Poulopoulos, Emmanouil Z. Psarakis, and Dimitrios Kosmopoulos

ETH-XGaze Challenge Winner sponsored by

Team VIPL-TAL-Gaze
Xin Cai, Jiabei Zeng, Yunjia Sun, Xiao Wang, Jiajun Zhang, Boyu Chen, Zhilong Ji, Xiao Liu, Xilin Chen, and Shiguang Shan


EVE Challenge Winner sponsored by

Jun Bao, Buyu Liu, and Jun Yu


Accepted Full Papers

GOO: A Dataset for Gaze Object Prediction in Retail Environments Henri Tomas, Marcus Reyes, Raimarc Dionido, Mark Ty, Jonric Mirando, Joel Casimiro, Rowel Atienza, and Richard Guinto
PupilTAN: A Few-Shot Adversarial Pupil Localizer Nikolaos Poulopoulos, Emmanouil Z. Psarakis, and Dimitrios Kosmopoulos
Appearance-based Gaze Estimation using Attention and Difference Mechanism Murthy L R D and Pradipta Biswas
Visual Focus of Attention Estimation in 3D Scene with an Arbitrary Number of Targets Rémy Siegfried and Jean-Marc Odobez

Invited Posters

Weakly-Supervised Physically Unconstrained Gaze Estimation Rakshit Kothari, Shalini De Mello, Umar Iqbal, Wonmin Byeon, Seonwook Park, and Jan Kautz
Dual Attention Guided Gaze Target Detection in the Wild Yi Fang, Jiapeng Tang, Wang Shen, Wei Shen, Xiao Gu, Li Song, and Guangtao Zhai
Connecting What To Say With Where To Look by Modeling Human Attention Traces Zihang Meng, Licheng Yu, Ning Zhang, Tamara L. Berg, Babak Damavandi, Vikas Singh, and Amy Bearman
ETH-XGaze: A Large Scale Dataset for Gaze Estimation under Extreme Head Pose and Gaze Variation Xucong Zhang, Seonwook Park, Thabo Beeler, Derek Bradley, Siyu Tang, and Otmar Hilliges
Towards End-to-end Video-based Eye-Tracking Seonwook Park, Emre Aksan, Xucong Zhang, and Otmar Hilliges

Program Committee

Marcel Bühler
ETH Zürich
Hyung Jin Chang
University of Birmingham
Eunji Chong
Georgia Tech
Shalini De Mello
NVIDIA Research
Tobias Fischer
Queensland University of Technology
Wolfgang Fuhl
University of Tübingen
Otmar Hilliges
ETH Zürich
Nora Horanyi
University of Birmingham
Yifei Huang
University of Tokyo
Aleš Leonardis
University of Birmingham
Yin Li
University of Wisconsin-Madison
Miao Liu
Georgia Tech
Seonwook Park
Lunit Inc.
Xucong Zhang
ETH Zürich

Organizers

Hyung Jin Chang
University of Birmingham
Xucong Zhang
ETH Zürich
Seonwook Park
Lunit Inc.
Shalini De Mello
NVIDIA Research

Qiang Ji
Rensselaer Polytechnic Institute
Otmar Hilliges
ETH Zürich
Aleš Leonardis
University of Birmingham

Workshop sponsored by: