Taein Kwon

I am a postdoctoral research fellow at VGG @ Oxford working with Prof. Andrew Zisserman.

My research interests include Egocentric Vision, Action Recognition, Contextual AI, Hand-object Interaction, Video Understanding, AR/VR, Multi-modal Learning, Vision-Language Models, and Self-supervised Learning.

Previously, I completed my PhD under the supervision of Prof. Marc Pollefeys at ETH Zurich. I earned my Master's degree from UCLA and my Bachelor's in Electrical Engineering from Yonsei University, Seoul, Korea.

If you are interested in semester projects (ETHZ), master's theses (ETHZ), 4YPs (Oxford), or personal projects related to action recognition, egocentric vision, video understanding, and hand-object interaction that could lead to publications, feel free to email me; we can discuss exciting potential projects.

Email  /  LinkedIn  /  CV  /  Google Scholar  /  Twitter  /  GitHub

profile photo
News
  • 03/2025 Our paper JEGAL is accepted to ICCV 2025.
  • 03/2025 Our paper EgoPressure is accepted to CVPR 2025 (highlight).
  • 09/2024 I will start my postdoctoral fellowship at VGG, Oxford.
  • 07/2024 I successfully defended my PhD.
  • 07/2024 Our paper HoloAssist received the CVPR EgoVis 2022/2023 Distinguished Paper Award.
  • 05/2024 I received the prestigious SNSF Postdoc.Mobility fellowship.
  • 07/2023 Our paper HoloAssist is accepted to ICCV 2023.
  • 12/2022 I will start my research internship at Meta Reality Labs Research.
  • 06/2022 Our paper Egobody is accepted to ECCV 2022.
  • 06/2022 I will start my research internship at Microsoft Research.
  • 05/2022 I will co-organize Human Body, Hands, and Activities from Egocentric and Multi-view Cameras @ ECCV 2022.
  • 03/2022 Our paper CASA is accepted to CVPR 2022 (oral).
  • 03/2021 Our paper H2O is accepted to ICCV 2021.

Research
EgoWorld: Translating Exocentric View to Egocentric View using Rich Exocentric Observations
Junho Park, Andrew Sangwoo Ye, Taein Kwon
arXiv, 2025 
project page / paper

We introduce EgoWorld, a novel two-stage framework that reconstructs the egocentric view from rich exocentric observations, including depth maps, 3D hand poses, and textual descriptions.

Understanding Co-speech Gestures in-the-wild
Sindhu Hegde*, K R Prajwal*, Taein Kwon, Andrew Zisserman
ICCV, 2025 
project page / paper / code

We introduce JEGAL, a Joint Embedding space for Gestures, Audio and Language. Our semantic gesture representations can be used to perform multiple downstream tasks such as cross-modal retrieval, spotting gestured words, and identifying who is speaking solely using gestures.

* co-first authors
EgoPressure: A Dataset for Hand Pressure and Pose Estimation in Egocentric Vision
Yiming Zhao*, Taein Kwon*, Paul Streli*, Marc Pollefeys, Christian Holz
CVPR, 2025  (Highlight)
project page / paper

We introduce EgoPressure, a novel dataset of touch contact and pressure interaction from an egocentric perspective, complemented with hand pose meshes and fine-grained pressure intensities for each contact.

* co-first authors
Multi-Activity Sequence Alignment via Implicit Clustering
Taein Kwon, Zador Pataki, Mahdi Rad, Marc Pollefeys
arXiv, 2025 
paper

We propose a novel framework for aligning sequences of multiple activities via implicit clustering. Specifically, our key idea is to perform implicit clip-level clustering while aligning frames in sequences. Coupled with our proposed dual augmentation technique, this enhances the network's ability to learn generalizable and discriminative representations.

HoloAssist: An Egocentric Human Interaction Dataset for Interactive AI Assistants in the Real World
Xin Wang*, Taein Kwon*, Mahdi Rad, Bowen Pan, Ishani Chakraborty, Sean Andrist, Ashley Feniello, Bugra Tekin, Felipe Vieira Frujeri, Neel Joshi, Marc Pollefeys
ICCV, 2023 
project page / paper / supp

HoloAssist is a large-scale egocentric human interaction dataset, where two people collaboratively complete physical manipulation tasks. By augmenting the data with action and conversational annotations and observing the rich behaviors of various participants, we present key insights into how human assistants correct mistakes, intervene in the task completion procedure, and ground their instructions to the environment.

* co-first authors
CaSAR: Contact-aware Skeletal Action Recognition
Junan Lin*, Zhichao Sun*, Enjie Cao*, Taein Kwon, Mahdi Rad, Marc Pollefeys
arXiv, 2023 
paper

Contact-aware Skeletal Action Recognition (CaSAR) uses novel representations of hand-object interaction that encompass spatial information: 1) contact points, where the hand joints meet the objects, and 2) distant points, where the hand joints are far from the object and barely involved in the current action. Our framework learns how the hands touch or stay away from the objects in each frame of the action sequence and uses this information to predict the action class.

* co-first authors
EgoBody: Human Body Shape and Motion of Interacting People from Head-Mounted Devices
Siwei Zhang, Qianli Ma, Yan Zhang, Zhiyin Qian, Taein Kwon, Marc Pollefeys, Federica Bogo, Siyu Tang
ECCV, 2022 
project page / paper / code

EgoBody is a large-scale dataset of accurate 3D human body shape, pose, and motion of humans interacting in 3D scenes, with multi-modal streams from third-person and egocentric views, captured by Azure Kinects and a HoloLens 2. Given two interacting subjects, we leverage a lightweight multi-camera rig to reconstruct their 3D shape and pose over time.

Context-Aware Sequence Alignment using 4D Skeletal Augmentation
Taein Kwon, Bugra Tekin, Siyu Tang, Marc Pollefeys
CVPR, 2022 (Oral Presentation)
project page / paper / video / code

We propose a skeletal self-supervised learning approach that uses alignment as a pretext task. Our approach relies on a context-aware attention model that incorporates spatial and temporal context within and across sequences, and on a contrastive learning formulation built on 4D skeletal augmentations. Pose data provides a valuable cue for alignment and downstream tasks, such as phase classification and phase progression, as it is robust to different camera angles and changes in the background, while remaining efficient for real-time processing.

H2O: Two Hands Manipulating Objects for First Person Interaction Recognition
Taein Kwon, Bugra Tekin, Jan Stühmer, Federica Bogo, Marc Pollefeys
ICCV, 2021
project page / paper / video

In this paper, we propose a method to collect a dataset of two hands manipulating objects for first-person interaction recognition. We provide a rich set of annotations including action labels, object classes, 3D left and right hand poses, 6D object poses, camera poses, and scene point clouds. We further propose the first method to jointly recognize the 3D poses of two hands manipulating objects, and a novel topology-aware graph convolutional network for recognizing hand-object interactions.

Smart Refrigerator for Healthcare Using Food Image Classification
Taein Kwon, Eunjeong Park, Hyukjae Chang
ACM BCB, 2016

We propose Smart Refrigerator, a sensor-equipped food container that recognizes foods and monitors their status. We demonstrate its performance in food detection and show that automatic monitoring of food intake can provide intuitive feedback to users.

Mentoring
  • Alyssa Chan (2024-2025, master's thesis/4YP, "Recognising British Sign Language in Video using Deep Learning"), MSc student at Oxford

  • Junho Park (2024-2025, collaboration project, "EgoWorld: Translating Exocentric View to Egocentric View using Rich Exocentric Observations"), AI researcher at LG; led the EgoWorld submission

  • Seokjun Kim (2024-2025, collaboration project, "Teleoperating Robots in Virtual Reality"), MSc student at ETH Zurich / Neuromeka -> now PhD student at Georgia Tech

  • Minsung Kang (2024-2025, semester project, "LLM-Driven Data Augmentation and Classification for Mistake Detection in Egocentric Videos"), MSc student at ETH Zurich

  • Tavis Siebert (2024-2025, 3DV project, "Gaze-Guided Scene Graphs for Egocentric Action Prediction"), MSc student at ETH Zurich

  • Dennis Baumann & Christopher Bennewitz (2024-2025, 3DV project, "Contact-Aware Action Recognition"), MSc students at ETH Zurich

  • Yiming Zhao (2023-2024, master's thesis, "EgoPressure: A Dataset for Hand Pressure and Pose Estimation in Egocentric Vision"), MSc student at ETH Zurich; led the CVPR '25 paper EgoPressure (highlight)

  • Boub Mischa (2023-2024, semester project, "Text-Enhanced Few-Shot Learning for Egocentric Video Action Recognition"), MSc student at ETH Zurich -> now co-founder at Swiss Engineering Partners AG

  • Junan Lin, Zhichao Sun & Enjie Cao (2023, 3DV project, "CaSAR: Contact-aware Skeletal Action Recognition"), MSc students at ETH Zurich; led the CaSAR technical report

  • Aashish Singh (2023, semester project, "Object Pose Estimation and Tracking in Mixed Reality Applications"), MSc student at ETH Zurich

  • Yanik Künzi (2023, bachelor's thesis, "Point Cloud-Based Tutorials for the HoloLens 2"), BSc student at ETH Zurich; led the main software development for the project's repository

Contact
Dept. of Engineering Science

University of Oxford

Engineering and Technology Building

18 Banbury Rd

Oxford OX1 3PH

taein@robots (dot) ox (dot) ac (dot) uk

Website template from Jon Barron.