Publications and Preprints
Papers are in reverse chronological order. '*' denotes equal contribution.
|
|
Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives
Kristen Grauman,
...
Sanjay Haresh,
Yongsen Mao*,
Manolis Savva,
...
CVPR, 2024
project page
/
arXiv
We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset and benchmark challenge to push the frontier of first-person video understanding of skilled human activity.
|
|
Habitat Synthetic Scenes Dataset (HSSD-200): An Analysis of 3D Scene Scale and Realism Tradeoffs for ObjectGoal Navigation
Mukul Khanna*,
Yongsen Mao*,
Hanxiao Jiang,
Sanjay Haresh,
Brennan Shacklett,
Dhruv Batra,
Alexander Clegg,
Eric Undersander,
Angel Chang,
Manolis Savva
CVPR, 2024
project page
/
arXiv
We present the Habitat Synthetic Scenes Dataset, a dataset of 211 high-quality 3D scenes, and use it to investigate how the scale and realism of synthetic 3D scene datasets affect training embodied agents to find and navigate to objects.
|
|
Articulated 3D Human-Object Interactions from RGB Videos: An Empirical Analysis of Approaches and Challenges
Sanjay Haresh,
Xiaohao Sun,
Hanxiao Jiang,
Angel Chang,
Manolis Savva
3DV, 2022
project page
/
arXiv
We canonicalize the task of reconstructing articulated 3D human-object interactions from RGB videos and benchmark five families of methods on it.
|
|
Timestamp-Supervised Action Segmentation with Graph Convolutional Networks
Hamza Khan,
Sanjay Haresh,
Awais Ahmed,
Shakeeb Siddiqui,
Andrey Konin,
M. Zeeshan Zia,
Quoc-Huy Tran
IROS, 2022
project page
/
arXiv
We leverage graph convolutional networks to propagate timestamp labels to the whole video, resulting in a 97% reduction in required labels.
|
|
Unsupervised Action Segmentation by Joint Representation Learning and Online Clustering
Sanjay Haresh*,
Sateesh Kumar*,
Awais Ahmed,
Andrey Konin,
M. Zeeshan Zia,
Quoc-Huy Tran
CVPR, 2022
project page
/
arXiv
We propose temporal optimal transport for jointly learning representations and performing online clustering in an unsupervised manner.
|
|
Learning by Aligning Video in Time
Sanjay Haresh*,
Sateesh Kumar*,
Huseyin Coskun,
Shahram N. Syed,
Andrey Konin,
M. Zeeshan Zia,
Quoc-Huy Tran
CVPR, 2021
project page
/
arXiv
Good frame representations can be learned by globally aligning pairs of videos via differentiable dynamic time warping.
|
|
Towards Anomaly Detection in Dashcam Videos
Sanjay Haresh*,
Sateesh Kumar*,
M. Zeeshan Zia,
Quoc-Huy Tran
IV, 2020
talk
/
arXiv
We curate a large dataset of dashcam videos for understanding road anomalies and propose an object-object interaction reasoning approach for detecting anomalies without additional supervision.
|
|