Taein Kwon

I am a Ph.D. student in the Computer Vision and Geometry Group at ETH Zurich, working with Prof. Marc Pollefeys.

My research interests are Egocentric Vision, Action Recognition, Video Understanding, 3D Vision, Hand-object Interaction, Multi-modal Learning, AR/VR and Computer Vision.

Previously, I received an M.S. in Electrical and Computer Engineering from UCLA and a B.S. in Electrical Engineering from Yonsei University, Seoul, Korea.

Email  /  LinkedIn  /  CV  /  Google Scholar  /  Twitter  /  Github

  • 07/2023 Our paper HoloAssist is accepted to ICCV 2023.
  • 12/2022 I will start my research internship at Meta Reality Labs Research.
  • 06/2022 Our paper EgoBody is accepted to ECCV 2022.
  • 06/2022 I will start my research internship at Microsoft Research.
  • 05/2022 I will co-organize Human Body, Hands, and Activities from Egocentric and Multi-view Cameras @ ECCV 2022.
  • 03/2022 Our paper CASA is accepted to CVPR 2022 (oral).
  • 03/2021 Our paper H2O is accepted to ICCV 2021.

HoloAssist: An Egocentric Human Interaction Dataset for Interactive AI Assistants in the Real World
Xin Wang*, Taein Kwon*, Mahdi Rad, Bowen Pan, Ishani Chakraborty, Sean Andrist, Ashley Fanello, Bugra Tekin, Felipe Vieira Frujeri, Neel Joshi, Marc Pollefeys
ICCV, 2023 
project page / paper / supp

HoloAssist is a large-scale egocentric human interaction dataset, where two people collaboratively complete physical manipulation tasks. By augmenting the data with action and conversational annotations and observing the rich behaviors of various participants, we present key insights into how human assistants correct mistakes, intervene in the task completion procedure, and ground their instructions to the environment.

* co-first authors
CaSAR: Contact-aware Skeletal Action Recognition
Junan Lin*, Zhichao Sun*, Enjie Cao*, Taein Kwon, Mahdi Rad, Marc Pollefeys
arXiv, 2023 

Contact-aware Skeletal Action Recognition (CaSAR) uses novel representations of hand-object interaction that encompass spatial information: 1) contact points, where the hand joints meet the objects, and 2) distant points, where the hand joints are far from the object and barely involved in the current action. Our framework learns how the hands touch or stay away from the objects in each frame of the action sequence and uses this information to predict the action class.

* co-first authors
EgoBody: Human Body Shape and Motion of Interacting People from Head-Mounted Devices
Siwei Zhang, Qianli Ma, Yan Zhang, Zhiyin Qian, Taein Kwon, Marc Pollefeys, Federica Bogo, Siyu Tang
ECCV, 2022 
project page / paper / code

EgoBody is a large-scale dataset of accurate 3D human body shape, pose and motion of humans interacting in 3D scenes, with multi-modal streams from third-person and egocentric views, captured by Azure Kinects and a HoloLens2. Given two interacting subjects, we leverage a lightweight multi-camera rig to reconstruct their 3D shape and pose over time.

Context-Aware Sequence Alignment using 4D Skeletal Augmentation
Taein Kwon, Bugra Tekin, Siyu Tang, Marc Pollefeys
CVPR, 2022 (Oral Presentation)
project page / paper / video / code

We propose a skeletal self-supervised learning approach that uses alignment as a pretext task. Our approach to alignment relies on a context-aware attention model that incorporates spatial and temporal context within and across sequences and a contrastive learning formulation that relies on 4D skeletal augmentations. Pose data provides a valuable cue for alignment and downstream tasks, such as phase classification and phase progression, as it is robust to different camera angles and changes in the background, while being efficient for real-time processing.

H2O: Two Hands Manipulating Objects for First Person Interaction Recognition
Taein Kwon, Bugra Tekin, Jan Stuhmer, Federica Bogo, Marc Pollefeys
ICCV, 2021
project page / paper / video

In this paper, we propose a method to collect a dataset of two hands manipulating objects for first person interaction recognition. We provide a rich set of annotations including action labels, object classes, 3D left & right hand poses, 6D object poses, camera poses and scene point clouds. We further propose the first method to jointly recognize the 3D poses of two hands manipulating objects and a novel topology-aware graph convolutional network for recognizing hand-object interactions.

Smart Refrigerator for Healthcare Using Food Image Classification
Taein Kwon, Eunjeong Park, Hyukjae Chang
ACM BCB, 2016

We propose a sensor-equipped food container, Smart Refrigerator, which recognizes foods and monitors their status. We demonstrate its food detection performance and suggest that automatic monitoring of food intake can provide intuitive feedback to users.

Teaching Assistant
Awards and Talks
  • Grant, Swiss National Science Foundation, “Beyond Frozen Worlds: Capturing functional 3D Digital Twins from the Real World”. Role: Project Conceptualization; PI: Prof. Marc Pollefeys (2M USD) 2023

  • Scholarship, Recipient of Korean Government Scholarship from NIIED (150K USD) 2018

  • Scholarship, Yonsei International Foundation 2016

  • IBM Innovation Prize, Startup Weekend, Technology Competition 2015

  • Best Technology Prize, Internet of Things (IoT) Hackathon by the government of Korea 2014

  • Best Laboratory Intern, Yonsei Institute of Information and Communication Technology 2014

  • Scholarship, Yonsei University Foundation, Korean Telecom Group Foundation 2014, 2011, 2010

  • Creative Prize, Startup Competition, Yonsei University 2014
  • 2023/06: Toward Interactive AI in Mixed Reality @ Microsoft Mixed Reality & AI Lab, Zurich

  • 2023/06: Toward Interactive AI in Mixed Reality @ AIoT Lab, Seoul National University

  • 2023/05: Toward Interactive AI in Mixed Reality @ KASE Open Seminar, ETH Zurich

  • 2022/03: Context-Aware Sequence Alignment using 4D Skeletal Augmentation. Applied Machine Learning Days (AMLD) @EPFL & Swiss JRC [Link]

  • 2021/10: H2O: Two Hands Manipulating Objects for First Person Interaction Recognition. ICCV 2021 Workshop on Egocentric Perception, Interaction and Computing (EPIC) [Link|Video]

  • 2021/04: H2O: Two Hands Manipulating Objects for First Person Interaction Recognition. Swiss Joint Research Center (JRC) Workshop 2021 [Link|Video]
Academic Service
Organizer: Human Body, Hands, and Activities from Egocentric and Multi-view Cameras @ ECCV'22,
KSAE Open Seminar @ ETH Zurich

Student Projects
Dept. of Computer Science

ETH Zurich

CNB G 85.2

Universitätstrasse 6

CH-8092 Zurich, Switzerland

Tel: +41 (0)44 63 26 360

taein.kwon@inf (dot) ethz (dot) ch

Website template from Jon Barron.