Five papers have been accepted to our GAZE 2023 workshop.
Congratulations to all authors!
The 5th International Workshop on Gaze Estimation and Prediction in the Wild (GAZE 2023) at CVPR 2023 aims to encourage and highlight novel strategies for eye gaze estimation and prediction with a focus on robustness and accuracy in extended parameter spaces, both spatially and temporally. This is expected to be achieved by applying novel neural network architectures, incorporating anatomical insights and constraints, introducing new and challenging datasets, and exploiting multi-modal training. Specifically, the workshop topics include (but are not limited to):
- Reformulating eye detection, gaze estimation, and gaze prediction pipelines with deep networks.
- Incorporating geometric and anatomical constraints into the training of (sparse or dense) deep networks.
- Leveraging additional cues such as context from the face region and head pose information.
- Developing adversarial methods to deal with conditions where current methods fail (illumination, appearance, etc.).
- Exploring attention mechanisms to predict the point of regard.
- Designing new accurate measures to account for rapid eye gaze movement.
- Novel methods for temporal gaze estimation and prediction including Bayesian methods.
- Integrating differentiable components into 3D gaze estimation frameworks.
- Robust estimation from different data modalities such as RGB, depth, head pose, and eye region landmarks.
- Generic gaze estimation methods for handling extreme head poses and gaze directions.
- Using temporal information in eye tracking to provide consistent on-screen gaze estimation.
- Personalization of gaze estimators with few-shot learning.
- Semi-/weakly-/un-/self-supervised learning methods, domain adaptation methods, and other novel methods towards improved representation learning from eye/face region images or gaze target region images.
Call for Contributions
Full Workshop Papers
Submission: We invite authors to submit unpublished papers (8-page CVPR format) to our workshop, to be presented at a poster session upon acceptance. All submissions will go through a double-blind review process. All contributions must be submitted (along with supplementary materials, if any) at this CMT link.
Accepted papers will be published in the official CVPR Workshops proceedings and the Computer Vision Foundation (CVF) Open Access archive.
Note: Authors of previously rejected main conference submissions are also welcome to submit their work to our workshop. When doing so, you must include the previous reviewers' comments (named previous_reviews.pdf) and a letter of changes (named letter_of_changes.pdf) in your supplementary materials to clearly demonstrate how the previous reviewers' comments have been addressed.
Workshop Schedule
| Time in UTC | Start Time in UTC* (probably your time zone) | Item |
|---|---|---|
| 3:25pm - 3:30pm | 18 Jun 2023 15:25:00 UTC | Opening remarks |
| 3:30pm - 4:10pm | 18 Jun 2023 15:30:00 UTC | Invited talk by Erroll Wood |
| 4:10pm - 4:50pm | 18 Jun 2023 16:10:00 UTC | Invited talk by Ruth Rosenholtz |
| 4:50pm - 5:10pm | 18 Jun 2023 16:50:00 UTC | Invited poster spotlight talks |
| 5:10pm - 6:00pm | 18 Jun 2023 17:10:00 UTC | Coffee break & poster presentations |
| 6:00pm - 6:50pm | 18 Jun 2023 18:00:00 UTC | Workshop paper presentations |
| 6:50pm - 7:00pm | 18 Jun 2023 18:50:00 UTC | Awards & closing remarks |
*For example, those in Los Angeles may see UTC-7, while those in Berlin may see UTC+2. Please note that the displayed time may differ from your actual time zone.
Invited Keynote Speakers
Ruth Rosenholtz
Massachusetts Institute of Technology
Human vision at a glance
Abstract
Research at the GAZE 2023 workshop aims to estimate and predict where someone is looking. But if we know someone's direction of gaze, what can we say about what they see, and what parts of the visual input they process? One cannot simply assume that observers process the object lying at the point of regard. Human visual perception gathers considerable information over a fairly wide field of view. People may not always point their eyes at the interesting bits, because that may not always be optimal for real-world tasks.
One can make sense of these questions through an understanding of the strengths and limitations of human peripheral vision. We move our eyes as part of a complex tradeoff between the information available in the fovea vs. the periphery, and the costs of shifting one's gaze. Furthermore, this tradeoff depends on factors such as individual differences, age, and level of experience. Recent understanding and modeling of peripheral vision can provide important insights into what a person can see, given where they look. This understanding could also help determine such things as the cost of misestimating someone's direction of gaze.
Ruth Rosenholtz is a Principal Research Scientist in MIT’s Department of Brain and Cognitive Sciences, a member of CSAIL, and currently on sabbatical at NVIDIA Research. She has a Ph.D. in EECS (Computer Vision) from UC Berkeley. She joined MIT in 2003 after 7 years at the Palo Alto Research Center (formerly Xerox PARC). Her work focuses on developing predictive models of human visual processing, including peripheral vision, visual search, visual clutter, and perceptual organization. In addition, her lab works on applying an understanding of human vision to image fidelity (NASA Ames) and to the design of user interfaces and information visualizations (Xerox PARC and MIT). She is a world expert in peripheral vision and its implications for how we think about vision, attention, and design.
Erroll Wood
Google
Synthetics for Gaze Estimation and More
Abstract
Nowadays, collecting the right dataset for machine learning is often more challenging than choosing the model. We address this with photorealistic synthetic training data – labelled images of humans made using computer graphics. With synthetics we can generate clean labels without annotation noise or error, produce labels that would otherwise be impossible to annotate by hand, and easily control variation and diversity in our datasets. I will show how synthetic data underpins our work on understanding humans, for example by enabling fast and accurate 3D face tracking, including eye gaze, in the wild.
Erroll is a Staff Software Engineer at Google, working on Digital Humans. Previously, he was a member of Microsoft's Mixed Reality AI Lab, where he worked on hand tracking for HoloLens 2, avatars for Microsoft Mesh, synthetic data for face tracking, and Holoportation. He did his PhD at the University of Cambridge, working on gaze estimation.
Awards
Best Paper Award
Where are they looking in the 3D space?
Nora Horanyi (University of Birmingham); Linfang Zheng (University of Birmingham); Eunji Chong (Amazon); Ales Leonardis (University of Birmingham); Hyung Jin Chang (University of Birmingham)
Best Poster Award
Kappa Angle Regression with Ocular Counter-Rolling Awareness for Gaze Estimation
Shiwei Jin (UC San Diego); Ji Dai (UC San Diego); Truong Nguyen (UC San Diego)
EFE: End-to-end Frame-to-Gaze Estimation
Haldun Balim (ETH Zurich); Seonwook Park (Lunit Inc.); Xi Wang (ETH Zurich); Xucong Zhang (Delft University of Technology); Otmar Hilliges (ETH Zurich)
Accepted Full Papers
Invited Posters
Program Committee
Amazon
University of Tokyo
Lunit Inc.
Public University of Navarre
University of Wisconsin-Madison
ETH Zürich
Rensselaer Polytechnic Institute
Rensselaer Polytechnic Institute
The University of Tokyo
University of Birmingham
Monash University
University of Tuebingen
ETH Zurich
University of Birmingham
Delft University of Technology
NVIDIA Research
Lunit Inc.
ETH Zürich
University of Birmingham
University of Birmingham
Please contact me if you have any questions about this website.
Email: hxw080@student.bham.ac.uk