Publications
For the up-to-date publication list, please visit the Google Scholar page.
* Equal contribution. † Equal advising.
2025

HOVER: Versatile Neural Whole-Body Controller for Humanoid Robots
IEEE International Conference on Robotics and Automation (ICRA), May 2025

SPOT: SE(3) Pose Trajectory Diffusion for Object-Centric Manipulation
IEEE International Conference on Robotics and Automation (ICRA), May 2025

DexMimicGen: Automated Data Generation for Bimanual Dexterous Manipulation via Imitation Learning
IEEE International Conference on Robotics and Automation (ICRA), May 2025

RT-Affordance: Affordances are Versatile Intermediate Representations for Robot Manipulation
IEEE International Conference on Robotics and Automation (ICRA), May 2025

PRESTO: Fast Motion Planning Using Diffusion Models Based on Key-Configuration Environment Representation
IEEE International Conference on Robotics and Automation (ICRA), May 2025

BUMBLE: Unifying Reasoning and Acting with Vision-Language Models for Building-wide Mobile Manipulation
IEEE International Conference on Robotics and Automation (ICRA), May 2025

LongVILA: Scaling Long-Context Visual Language Models for Long Videos
International Conference on Learning Representations (ICLR), April 2025

GR00T N1: An Open Foundation Model for Generalist Humanoid Robots
Technical report arXiv:2503.14734, March 2025

LEGATO: Cross-Embodiment Imitation Using a Grasping Tool
IEEE Robotics and Automation Letters (RA-L), March 2025

Sim-to-Real Reinforcement Learning for Vision-Based Dexterous Manipulation on Humanoids
Technical report arXiv:2502.20396, February 2025

ASAP: Aligning Simulation and Real-World Physics for Learning Agile Humanoid Whole-Body Skills
Technical report arXiv:2502.01143, February 2025

Humanoid Locomotion and Manipulation: Current Progress and Challenges in Control, Planning, and Learning
Technical report arXiv:2501.02116, January 2025
2024

AMAGO-2: Breaking the Multi-Task Barrier in Meta-Reinforcement Learning with Transformers
Conference on Neural Information Processing Systems (NeurIPS), December 2024

Harmon: Whole-Body Motion Generation of Humanoid Robots from Language Descriptions
Conference on Robot Learning (CoRL), November 2024

OKAMI: Teaching Humanoid Robots Manipulation Skills through Single Video Imitation
Conference on Robot Learning (CoRL), November 2024
Oral Presentation

Multi-Task Interactive Robot Fleet Learning with Visual World Models
Conference on Robot Learning (CoRL), November 2024

Robot Learning on the Job: Human-in-the-Loop Autonomy and Learning During Deployment
International Journal of Robotics Research (IJRR), October 2024

ARDuP: Active Region Video Diffusion for Universal Policies
International Conference on Intelligent Robots and Systems (IROS), October 2024

PRIME: Scaffolding Manipulation Tasks with Behavior Primitives for Data-Efficient Imitation Learning
IEEE Robotics and Automation Letters (RA-L), October 2024

Foundation Models in Robotics: Applications, Challenges, and the Future
International Journal of Robotics Research (IJRR), September 2024

PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs
International Conference on Machine Learning (ICML), July 2024

DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset
Robotics: Science and Systems (RSS), July 2024

InterPreT: Interactive Predicate Learning from Language Feedback for Generalizable Task Planning
Robotics: Science and Systems (RSS), July 2024

DrEureka: Language Model Guided Sim-To-Real Transfer
Robotics: Science and Systems (RSS), July 2024

RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots
Robotics: Science and Systems (RSS), July 2024

ORION: Vision-based Manipulation from Single Human Video with Open-World Object Graphs
Technical report arXiv:2405.20321, May 2024

Doduo: Dense Visual Correspondence from Unsupervised Semantic-Aware Flow
IEEE International Conference on Robotics and Automation (ICRA), May 2024

Model-Based Runtime Monitoring with Interactive Imitation Learning
IEEE International Conference on Robotics and Automation (ICRA), May 2024

Open X-Embodiment: Robotic Learning Datasets and RT-X Models
IEEE International Conference on Robotics and Automation (ICRA), May 2024
Best Conference Paper Award

LOTUS: Continual Imitation Learning for Robot Manipulation Through Unsupervised Skill Discovery
IEEE International Conference on Robotics and Automation (ICRA), May 2024

AMAGO: Scalable In-Context Reinforcement Learning for Adaptive Agents
International Conference on Learning Representations (ICLR), May 2024
Spotlight Presentation

Eureka: Human-Level Reward Design via Coding Large Language Models
International Conference on Learning Representations (ICLR), May 2024

Few-View Object Reconstruction with Unknown Categories and Camera Poses
International Conference on 3D Vision (3DV), March 2024
Oral Presentation

Granger Causal Interaction Skill Chains
Transactions on Machine Learning Research (TMLR), March 2024

Voyager: An Open-Ended Embodied Agent with Large Language Models
Transactions on Machine Learning Research (TMLR), March 2024

Building Minimal and Reusable Causal State Abstractions for Reinforcement Learning
AAAI Conference on Artificial Intelligence (AAAI), February 2024
Oral Presentation
2023

Deep Imitation Learning for Humanoid Loco-manipulation through Human Teleoperation
International Conference on Humanoid Robots (Humanoids), December 2023
Oral Presentation

LIBERO: Benchmarking Knowledge Transfer in Lifelong Robot Learning
NeurIPS 2023 Datasets and Benchmarks Track, December 2023

Cross-Episodic Curriculum for Transformer Agents
Conference on Neural Information Processing Systems (NeurIPS), December 2023

Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning
Conference on Empirical Methods in Natural Language Processing (EMNLP), December 2023

Learning Generalizable Manipulation Policies with Object-Centric 3D Representations
Conference on Robot Learning (CoRL), November 2023

MimicGen: A Data Generation System for Scalable Robot Learning using Human Demonstrations
Conference on Robot Learning (CoRL), November 2023

MUTEX: Learning Unified Policies from Multimodal Task Specifications
Conference on Robot Learning (CoRL), November 2023

MimicPlay: Long-Horizon Imitation Learning by Watching Human Play
Conference on Robot Learning (CoRL), November 2023
Best Paper Award Finalist

Interactive Robot Learning from Verbal Correction
CoRL Workshop on Language and Robot Learning (LangRob), November 2023

Symbolic State Space Optimization for Long Horizon Mobile Manipulation Planning
International Conference on Intelligent Robots and Systems (IROS), October 2023

ACID: Action-Conditional Implicit Visual Dynamics for Deformable Object Manipulation
International Journal of Robotics Research (IJRR), July 2023

VIMA: General Robot Manipulation with Multimodal Prompts
International Conference on Machine Learning (ICML), July 2023

Robot Learning on the Job: Human-in-the-Loop Autonomy and Learning During Deployment
Robotics: Science and Systems (RSS), July 2023
Best Paper Award Finalist

Fast Monocular Scene Reconstruction with Global-Sparse Local-Dense Grids
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2023

Ditto in the House: Building Articulated Models of Indoor Scenes through Interactive Perception
IEEE International Conference on Robotics and Automation (ICRA), May 2023

Learning to Walk by Steering: Perceptive Quadrupedal Locomotion in Dynamic Environments
IEEE International Conference on Robotics and Automation (ICRA), May 2023
2022

Learning and Retrieval from Prior Data for Skill-based Imitation Learning
Conference on Robot Learning (CoRL), December 2022

VIOLA: Imitation Learning for Vision-Based Manipulation with Object Proposal Priors
Conference on Robot Learning (CoRL), December 2022

MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge
NeurIPS 2022 Datasets and Benchmarks Track, November 2022
Outstanding Paper Award

Pre-Trained Language Models for Interactive Decision-Making
Conference on Neural Information Processing Systems (NeurIPS), November 2022
Oral Presentation

Causal Dynamics Learning for Task-Independent State Abstraction
International Conference on Machine Learning (ICML), July 2022
Long Presentation

ACID: Action-Conditional Implicit Visual Dynamics for Deformable Object Manipulation
Robotics: Science and Systems (RSS), June 2022
Best Student Paper Award Finalist

COOPERNAUT: End-to-End Driving with Cooperative Perception for Networked Vehicles
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2022

Ditto: Building Digital Twins of Articulated Objects from Interaction
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2022
Oral Presentation

Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2022
Oral Presentation

Augmenting Reinforcement Learning with Behavior Primitives for Diverse Manipulation Tasks
IEEE International Conference on Robotics and Automation (ICRA), May 2022
Outstanding Learning Paper Award

OSCAR: Data-Driven Operational Space Control for Adaptive and Robust Robot Manipulation
IEEE International Conference on Robotics and Automation (ICRA), May 2022

Visually Grounded Task and Motion Planning for Mobile Manipulation
IEEE International Conference on Robotics and Automation (ICRA), May 2022

RelViT: Concept-Guided Vision Transformer for Visual Relational Reasoning
International Conference on Learning Representations (ICLR), April 2022

Bottom-Up Skill Discovery from Unsegmented Demonstrations for Long-Horizon Robot Manipulation
IEEE Robotics and Automation Letters (RA-L), January 2022
2021

Adversarial Skill Chaining for Long-Horizon Robot Manipulation via Terminal State Regularization
Conference on Robot Learning (CoRL), November 2021

What Matters in Learning from Offline Human Demonstrations for Robot Manipulation
Conference on Robot Learning (CoRL), November 2021
Oral Presentation

DiscoBox: Weakly Supervised Instance Segmentation and Semantic Correspondence from Box Supervision
International Conference on Computer Vision (ICCV), October 2021

Discovering Generalizable Skills via Automated Generation of Diverse Tasks
Robotics: Science and Systems (RSS), July 2021

Synergies Between Affordance and Geometry: 6-DoF Grasp Detection via Implicit Representations
Robotics: Science and Systems (RSS), July 2021

SECANT: Self-Expert Cloning for Zero-Shot Generalization of Visual Policies
International Conference on Machine Learning (ICML), July 2021

Coach-Player Multi-Agent Reinforcement Learning for Dynamic Team Composition
International Conference on Machine Learning (ICML), July 2021
Long Talk

Tesseract: Tensorised Actors for Multi-Agent Reinforcement Learning
International Conference on Machine Learning (ICML), July 2021

MultiBench: Multiscale Benchmarks for Multimodal Representation Learning
NeurIPS 2021 Datasets and Benchmarks Track, July 2021

Fast Uncertainty Quantification for Deep Object Pose Estimation
IEEE International Conference on Robotics and Automation (ICRA), May 2021

Hierarchical Planning for Long-Horizon Manipulation with Geometric and Symbolic Scene Graphs
IEEE International Conference on Robotics and Automation (ICRA), May 2021

Deep Affordance Foresight: Planning Through What Can Be Done in the Future
IEEE International Conference on Robotics and Automation (ICRA), May 2021

Detect, Reject, Correct: Crossmodal Compensation of Corrupted Sensors
IEEE International Conference on Robotics and Automation (ICRA), May 2021

Emergent Hand Morphology and Control from Optimizing Robust Grasps of Diverse Objects
IEEE International Conference on Robotics and Automation (ICRA), May 2021

Learning Multi-Arm Manipulation Through Collaborative Teleoperation
IEEE International Conference on Robotics and Automation (ICRA), May 2021
Best Multi-Robotic Systems Paper Award Finalist

Adaptive Procedural Task Generation for Hard-Exploration Problems
International Conference on Learning Representations (ICLR), May 2021