HoloAssist is a large-scale egocentric human interaction dataset,
where two people collaboratively complete physical manipulation tasks. By augmenting the data with action and conversational annotations and observing the rich behaviors of various participants, we present key insights into how human assistants correct mistakes, intervene in the task completion procedure, and ground their instructions to the environment.
Contact-aware Skeletal Action Recognition (CaSAR) uses novel representations
of hand-object interaction that encompass spatial information:
1) contact points where the hand joints meet the objects,
2) distant points where the hand joints are far from the object
and largely uninvolved in the current action.
Our framework learns, for each frame of the action sequence, how the hands touch
or stay away from the objects, and uses this information to predict the action class.
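The split into contact and distant points can be illustrated with a minimal sketch: label each hand joint by its nearest distance to the object's point cloud. The function name, array shapes, and the 1 cm threshold below are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def split_contact_distant(hand_joints, object_points, contact_thresh=0.01):
    """Label each hand joint as a contact point or a distant point by its
    nearest distance to the object point cloud (threshold is an assumption).

    hand_joints:   (J, 3) array of 3D joint positions for one frame
    object_points: (P, 3) array of object surface points
    Returns a boolean mask of shape (J,): True = contact point.
    """
    # Pairwise distances between every joint and every object point
    dists = np.linalg.norm(hand_joints[:, None, :] - object_points[None, :, :],
                           axis=-1)
    nearest = dists.min(axis=1)       # distance from each joint to the object
    return nearest <= contact_thresh  # contact if within the threshold

# Toy example: one joint touching the object surface, one far away
joints = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
obj = np.array([[0.0, 0.0, 0.005], [0.0, 0.1, 0.0]])
mask = split_contact_distant(joints, obj)
```

Applying this mask per frame yields the two joint subsets whose trajectories the action classifier can treat differently.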
EgoBody is a large-scale dataset of accurate 3D human body shape,
pose and motion of people interacting in 3D scenes, with multi-modal streams from
third-person and egocentric views, captured by Azure Kinects and a HoloLens2.
Given two interacting subjects, we leverage a lightweight multi-camera rig to
reconstruct their 3D shape and pose over time.
We propose a skeletal self-supervised learning approach that uses alignment as a pretext task.
Our approach to alignment relies on a context-aware attention model that incorporates spatial
and temporal context within and across sequences and a contrastive learning formulation that
relies on 4D skeletal augmentations. Pose data provides a valuable cue for alignment and
downstream tasks, such as phase classification and phase progression, as it is robust
to different camera angles and changes in the background, while being efficient for real-time applications.
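The alignment pretext task can be sketched as an InfoNCE-style contrastive loss over per-frame embeddings of two sequences, where temporally corresponding frames are positives. This is a generic illustration under that assumption, not the paper's exact context-aware attention formulation.

```python
import numpy as np

def alignment_contrastive_loss(emb_a, emb_b, temperature=0.1):
    """InfoNCE-style sketch of alignment as a pretext task.

    emb_a, emb_b: (T, D) per-frame embeddings of two aligned sequences;
    frame t in emb_a is the positive for frame t in emb_b.
    """
    # L2-normalize so dot products are cosine similarities
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature  # (T, T) similarity matrix
    # Cross-entropy with the diagonal as the matching (positive) pairs
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))
loss_matched = alignment_contrastive_loss(x, x)  # identical sequences
loss_random = alignment_contrastive_loss(x, rng.normal(size=(8, 16)))
```

Minimizing this loss pulls corresponding frames together across views while pushing non-corresponding frames apart, which is what makes the learned embeddings useful for phase classification and phase progression.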
In this paper, we propose a method to collect a dataset of two hands manipulating objects for
first person interaction recognition. We provide a rich set of annotations including action labels,
object classes, 3D left & right hand poses, 6D object poses, camera poses and scene point clouds.
We further propose the first method to jointly recognize the 3D poses of two hands manipulating
objects and a novel topology-aware graph convolutional network for recognizing hand-object interactions.
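At its core, a graph convolution over a hand-object graph averages features over each node's neighbors and projects them; the sketch below shows one plain message-passing step as an illustration, not the paper's topology-aware architecture. The chain graph and feature dimensions are invented for the example.

```python
import numpy as np

def graph_conv(X, A, W):
    """One plain graph-convolution step: X' = D^-1 (A + I) X W.

    X: (N, F) node features, A: (N, N) adjacency, W: (F, F_out) weights.
    """
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    deg = A_hat.sum(axis=1, keepdims=True)  # node degrees
    return (A_hat / deg) @ X @ W            # average neighbors, project

# Tiny 3-node chain graph (e.g. wrist - knuckle - fingertip)
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
X = np.array([[1.0], [0.0], [0.0]])  # feature only at the first node
W = np.eye(1)
out = graph_conv(X, A, W)
```

Stacking such layers propagates information along the skeleton and across hand-object edges, which is the mechanism an interaction-recognition head builds on.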
We propose a sensor-equipped food container, the Smart Refrigerator, which recognizes foods and
monitors their status. We demonstrate its performance in food detection and show that
automatic monitoring of food intake can provide intuitive feedback to users.