GAZE 2021: Gaze Estimation and Prediction in the Wild

Image credit to Daniel Hannah

Sunday, 20th June 2021 (half-day)
3pm - 8:20pm UTC

8am	- 1:20pm	PDT	(UTC-7)
11am	- 4:20pm	EDT	(UTC-4)
4pm	- 9:20pm	BST	(UTC+1)
5pm	- 10:20pm	CEST	(UTC+2)
8:30pm	- 1:50am (+1)	IST	(UTC+5.5)
11pm	- 4:20am (+1)	CST	(UTC+8)
12am	- 5:20am (+1)	KST	(UTC+9)

YouTube recording: https://youtu.be/WQ8azMW_dn8

Introduction

The 3rd International Workshop on Gaze Estimation and Prediction in the Wild (GAZE 2021) at CVPR 2021 aims to encourage and highlight novel strategies for eye gaze estimation and prediction with a focus on robustness and accuracy in extended parameter spaces, both spatially and temporally. This is expected to be achieved by applying novel neural network architectures, incorporating anatomical insights and constraints, introducing new and challenging datasets, and exploiting multi-modal training. Specifically, the workshop topics include (but are not limited to):

Reformulating eye detection, gaze estimation, and gaze prediction pipelines with deep networks.
Applying geometric and anatomical constraints into the training of (sparse or dense) deep networks.
Leveraging additional cues such as contexts from face region and head pose information.
Developing adversarial methods to deal with conditions where current methods fail (illumination, appearance, etc.).
Exploring attention mechanisms to predict the point of regard.
Designing new accurate measures to account for rapid eye gaze movement.
Novel methods for temporal gaze estimation and prediction including Bayesian methods.
Integrating differentiable components into 3D gaze estimation frameworks.
Robust estimation from different data modalities such as RGB, depth, head pose, and eye region landmarks.
Generic gaze estimation methods for handling extreme head poses and gaze directions.
Temporal information usage for eye tracking to provide consistent gaze estimation on the screen.
Personalization of gaze estimators with few-shot learning.
Semi-/weakly-/un-/self-supervised learning methods, domain adaptation methods, and other novel methods towards improved representation learning from eye/face region images or gaze target region images.

We will be hosting 3 invited speakers and holding 2 deep learning challenges on the topic of gaze estimation. We will also be accepting submissions of full unpublished papers, as in previous editions of the workshop. These papers will be peer-reviewed via a double-blind process, published in the official workshop proceedings, and presented at the workshop itself. More information will be provided as soon as possible.

Call for Contributions

Full Workshop Papers

Submission: We invite authors to submit unpublished papers (8-page CVPR format) to our workshop, to be presented at a poster session upon acceptance. All submissions will go through a double-blind review process. All contributions must be submitted (along with supplementary materials, if any) at this CMT link.

Accepted papers will be published in the official CVPR Workshops proceedings and the Computer Vision Foundation (CVF) Open Access archive.

Note: Authors of previously rejected main conference submissions are also welcome to submit their work to our workshop. When doing so, you must submit the previous reviewers' comments (named as previous_reviews.pdf) and a letter of changes (named as letter_of_changes.pdf) as part of your supplementary materials to clearly demonstrate the changes made to address the comments made by previous reviewers.

GAZE 2021 Challenges

The GAZE 2021 Challenges are hosted on CodaLab and can be found at:

ETH-XGaze Challenge: https://competitions.codalab.org/competitions/28930
EVE Challenge: https://competitions.codalab.org/competitions/28954

More information on the respective challenges can be found on their pages.

We are thankful to our sponsors for providing the following prizes:

ETH-XGaze Challenge Winner	USD 500	courtesy of
EVE Challenge Winner	Tobii Eye Tracker 5	courtesy of

Important Dates

ETH-XGaze & EVE Challenges Released	February 13, 2021
Paper Submission Deadline	March 29, 2021 (23:59 Pacific time)
Notification to Authors	April 13, 2021
Camera-Ready Deadline	April 20, 2021
ETH-XGaze & EVE Challenges Closed	May 28, 2021 (23:59 UTC)

Workshop Schedule

Time in UTC	Start Time in UTC* (probably your time zone)	Item
3:00pm - 3:05pm	20 Jun 2021 15:00:00 UTC	Opening Remarks and Awards
3:05pm - 3:40pm	20 Jun 2021 15:05:00 UTC	Challenge Winner Talks
3:40pm - 4:40pm	20 Jun 2021 15:40:00 UTC	Full Workshop Paper Presentations
4:40pm - 6:00pm	20 Jun 2021 16:40:00 UTC	Break + Poster Session
6:00pm - 6:35pm	20 Jun 2021 18:00:00 UTC	Keynote Talk: Jim Rehg
6:35pm - 7:10pm	20 Jun 2021 18:35:00 UTC	Keynote Talk: Moshe Eizenman
7:10pm - 7:45pm	20 Jun 2021 19:10:00 UTC	Keynote Talk: Adrià Recasens
7:45pm - 8:15pm	20 Jun 2021 19:45:00 UTC	Panel Discussion
8:15pm - 8:20pm	20 Jun 2021 20:15:00 UTC	Closing Remarks

* This time is calculated to be in your computer's reported time zone.
For example, those in Los Angeles may see UTC-7,
while those in Berlin may see UTC+2.

Please note that there may be differences to your actual time zone.
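
The local times listed on this page follow from the UTC window by applying each zone's offset. The following is a minimal Python sketch of that arithmetic, assuming the fixed offsets given in the time zone table near the top of the page; the names `OFFSETS_HOURS`, `local_window`, and `format_window` are illustrative and not part of the workshop site.

```python
from datetime import datetime, timedelta, timezone

# Workshop window in UTC, as stated on this page.
START_UTC = datetime(2021, 6, 20, 15, 0, tzinfo=timezone.utc)
END_UTC = datetime(2021, 6, 20, 20, 20, tzinfo=timezone.utc)

# Fixed UTC offsets in hours, copied from the time zone table on this page.
# Assumption: fixed offsets are enough here because only one date is involved.
OFFSETS_HOURS = {
    "PDT": -7, "EDT": -4, "BST": 1, "CEST": 2,
    "IST": 5.5, "CST": 8, "KST": 9,
}


def local_window(label):
    """Return the (start, end) datetimes of the workshop in the given zone."""
    tz = timezone(timedelta(hours=OFFSETS_HOURS[label]), label)
    return START_UTC.astimezone(tz), END_UTC.astimezone(tz)


def format_window(label):
    """Format the window as 24-hour times, marking next-day times with (+1)."""
    def fmt(t):
        extra_days = (t.date() - START_UTC.date()).days
        suffix = f" (+{extra_days})" if extra_days > 0 else ""
        return f"{t:%H:%M}{suffix}"

    start, end = local_window(label)
    return f"{fmt(start)} - {fmt(end)} {label}"


if __name__ == "__main__":
    for label in OFFSETS_HOURS:
        print(format_window(label))
```

For a zone not listed in the table, adding its offset to `OFFSETS_HOURS` would be enough for this sketch; a real deployment would more likely rely on a time zone database than on hard-coded offsets.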

Invited Keynote Speakers

Jim Rehg

Georgia Institute of Technology

An Egocentric View of Social Behavior

Biography

James M. Rehg (pronounced “ray”) is a Professor in the School of Interactive Computing at the Georgia Institute of Technology, where he is Director of the Center for Behavioral Imaging, co-Director of the Center for Computational Health, and co-Director of the Computational Perception Lab. He received his Ph.D. from CMU in 1995 and worked at the Cambridge Research Lab of DEC (and then Compaq) from 1995 to 2001, where he managed the computer vision research group. He received an NSF CAREER award in 2001 and a Raytheon Faculty Fellowship from Georgia Tech in 2005. He and his students have received a number of best paper awards, including best student paper awards at ICML 2005, BMVC 2010, Mobihealth 2014, Face and Gesture 2015, and a Method of the Year award from the journal Nature Methods. Dr. Rehg serves on the Editorial Board of the Intl. J. of Computer Vision, and he served as the General co-Chair for CVPR 2009 and is serving as the Program co-Chair for CVPR 2017 (Puerto Rico). He has authored more than 100 peer-reviewed scientific papers and holds 23 issued US patents. Dr. Rehg’s research interests include computer vision, machine learning, behavioral imaging, and mobile health (mHealth). He is the Deputy Director of the NIH Center of Excellence on Mobile Sensor Data-to-Knowledge (MD2K), which is developing novel on-body sensing and predictive analytics for improving health outcomes. Dr. Rehg is also leading a multi-institution effort, funded by an NSF Expedition award, to develop the science and technology of Behavioral Imaging: the capture and analysis of social and communicative behavior using multi-modal sensing, to support the study and treatment of developmental disorders such as autism.

Moshe Eizenman

University of Toronto

Development of hybrid eye tracking systems for studies of neuropsychiatric disorders.

Abstract

Visual scanning behaviour is controlled by both low-level perception processes (e.g., colour, spatial characteristics of the visual stimuli) and high-level cognitive processes, which are driven by memories, emotions, expectations, and goals. During natural viewing, subjects are unaware of their visual scanning behaviour; as such, it can provide physiological markers for objective evaluation of cognitive processes in patients with neuropsychiatric disorders.

In this talk I will present our past and current work towards the development of objective markers for neuropsychiatric disorders. This work includes both the development of new methods to analyse visual scanning patterns and the development of eye-tracking systems to monitor such patterns.

I will start by describing a general method for the analysis of visual scanning behaviour in neuropsychiatric disorders. I will then demonstrate the utility of this novel method by providing examples from our studies in patients with eating and mood disorders. I will then describe two low-cost eye-tracking systems that we developed for such studies. One system uses a smartphone to display visual stimuli and analyse visual scanning patterns, while the other uses a virtual reality headset to display visual stimuli. Point-of-gaze, in both systems, is computed by an eye-model whose parameters are estimated from eye-images by machine learning techniques (i.e., a hybrid approach to point-of-gaze estimation).

Biography

Moshe Eizenman is a professor in the departments of Ophthalmology and Visual Science and Electrical and Computer Engineering at the University of Toronto. He is also a senior researcher at the Krembil Brain Institute. He received his Ph.D. from the University of Toronto in 1984, where he worked at the Institute of Biomedical Engineering as the head of the vision and eye-movements group. He has authored more than 120 peer-reviewed scientific papers, and his research interests include the development of eye-tracking systems, analysis of eye-movements and visual scanning patterns, and development of objective physiological markers for psychiatric and neurological disorders. Prof. Eizenman is the founder of EL-MAR Inc., a company that develops advanced eye-tracking technologies for pilot training, driving, and medical research.

Adrià Recasens

DeepMind

Where are they looking?

Abstract

In order to understand actions or anticipate intentions, humans need efficient ways of gathering information about each other. In particular, gaze is a rich source of information about other people's activities and intentions. In this talk, we describe our work on predicting human gaze. We introduce a series of methods to follow gaze for different modalities. First, we present GazeFollow, a dataset and model to predict the location of people's gaze in an image. Furthermore, we introduce Gaze360, a large-scale gaze-tracking dataset and method for robust 3D gaze direction estimation in unconstrained scenes. Finally, we also propose a saliency-based sampling layer designed to improve performance in arbitrary tasks by efficiently zooming into the relevant parts of the input image.

Biography

Adrià Recasens is a Research Scientist at DeepMind. He completed his PhD in computer vision at the Computer Science and Artificial Intelligence Laboratory at the Massachusetts Institute of Technology in 2019. During his PhD, he worked on various topics related to image and video understanding. In particular, he has several publications on gaze estimation in images and video. His current research focuses on self-supervised learning, specifically applied to multiple modalities such as video, audio, or text.

Awards

Best Paper Award sponsored by

PupilTAN: A Few-Shot Adversarial Pupil Localizer
Nikolaos Poulopoulos, Emmanouil Z. Psarakis, and Dimitrios Kosmopoulos

ETH-XGaze Challenge Winner sponsored by

Team VIPL-TAL-Gaze
Xin Cai, Jiabei Zeng, Yunjia Sun, Xiao Wang, Jiajun Zhang, Boyu Chen, Zhilong Ji, Xiao Liu, Xilin Chen, and Shiguang Shan

Code

EVE Challenge Winner sponsored by

Team HDU_CS
Jun Bao, Buyu Liu, and Jun Yu

Code

Accepted Full Papers

GOO: A Dataset for Gaze Object Prediction in Retail Environments
Henri Tomas, Marcus Reyes, Raimarc Dionido, Mark Ty, Jonric Mirando, Joel Casimiro, Rowel Atienza, and Richard Guinto

PDF (CVF) arXiv

PupilTAN: A Few-Shot Adversarial Pupil Localizer
Nikolaos Poulopoulos, Emmanouil Z. Psarakis, and Dimitrios Kosmopoulos

PDF (CVF)

Appearance-based Gaze Estimation using Attention and Difference Mechanism
Murthy L R D and Pradipta Biswas

PDF (CVF)

Visual Focus of Attention Estimation in 3D Scene with an Arbitrary Number of Targets
Rémy Siegfried and Jean-Marc Odobez

PDF (CVF) IDIAP

Invited Posters

Weakly-Supervised Physically Unconstrained Gaze Estimation
Rakshit Kothari, Shalini De Mello, Umar Iqbal, Wonmin Byeon, Seonwook Park, and Jan Kautz

PDF (CVF) Supp. (CVF) arXiv

Dual Attention Guided Gaze Target Detection in the Wild
Yi Fang, Jiapeng Tang, Wang Shen, Wei Shen, Xiao Gu, Li Song, and Guangtao Zhai

PDF (CVF)

Connecting What To Say With Where To Look by Modeling Human Attention Traces
Zihang Meng, Licheng Yu, Ning Zhang, Tamara L. Berg, Babak Damavandi, Vikas Singh, and Amy Bearman

PDF (CVF) Supp. (CVF) arXiv Code

ETH-XGaze: A Large Scale Dataset for Gaze Estimation under Extreme Head Pose and Gaze Variation
Xucong Zhang, Seonwook Park, Thabo Beeler, Derek Bradley, Siyu Tang, and Otmar Hilliges

PDF (ECVA Open Access) Project Page

Towards End-to-end Video-based Eye-Tracking
Seonwook Park, Emre Aksan, Xucong Zhang, and Otmar Hilliges

PDF (ECVA Open Access) Project Page