Publications
For the up-to-date publication list, please visit the Google Scholar page.
* Equal contribution. † Equal advising.
2023

VIMA: General Robot Manipulation with Multimodal Prompts
International Conference on Machine Learning (ICML), July 2023

Robot Learning on the Job: Human-in-the-Loop Autonomy and Learning During Deployment
Robotics: Science and Systems (RSS), July 2023

Fast Monocular Scene Reconstruction with Global-Sparse Local-Dense Grids
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2023

Learning to Walk by Steering: Perceptive Quadrupedal Locomotion in Dynamic Environments
IEEE International Conference on Robotics and Automation (ICRA), May 2023

Ditto in the House: Building Articulated Models of Indoor Scenes through Interactive Perception
IEEE International Conference on Robotics and Automation (ICRA), May 2023

Voyager: An Open-Ended Embodied Agent with Large Language Models
Technical report arXiv:2305.16291, May 2023

MimicPlay: Long-Horizon Imitation Learning by Watching Human Play
Technical report arXiv:2302.12422, February 2023

Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning
Technical report arXiv:2302.04858, February 2023

2022

VIOLA: Imitation Learning for Vision-Based Manipulation with Object Proposal Priors
Conference on Robot Learning (CoRL), December 2022

Learning and Retrieval from Prior Data for Skill-based Imitation Learning
Conference on Robot Learning (CoRL), December 2022

Few-View Object Reconstruction with Unknown Categories and Camera Poses
Technical report arXiv:2212.04492, December 2022

MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge
Conference on Neural Information Processing Systems (NeurIPS), November 2022
Outstanding Paper Award

Pre-Trained Language Models for Interactive Decision-Making
Conference on Neural Information Processing Systems (NeurIPS), November 2022
Oral Presentation

Causal Dynamics Learning for Task-Independent State Abstraction
International Conference on Machine Learning (ICML), July 2022
Long Presentation

ACID: Action-Conditional Implicit Visual Dynamics for Deformable Object Manipulation
Robotics: Science and Systems (RSS), June 2022
Best Student Paper Finalist

Ditto: Building Digital Twins of Articulated Objects from Interaction
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2022
Oral Presentation

COOPERNAUT: End-to-End Driving with Cooperative Perception for Networked Vehicles
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2022

Bongard-HOI: Benchmarking Few-Shot Visual Reasoning for Human-Object Interactions
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2022
Oral Presentation

Augmenting Reinforcement Learning with Behavior Primitives for Diverse Manipulation Tasks
IEEE International Conference on Robotics and Automation (ICRA), May 2022
Outstanding Learning Paper Award

OSCAR: Data-Driven Operational Space Control for Adaptive and Robust Robot Manipulation
IEEE International Conference on Robotics and Automation (ICRA), May 2022

Visually Grounded Task and Motion Planning for Mobile Manipulation
IEEE International Conference on Robotics and Automation (ICRA), May 2022

RelViT: Concept-Guided Vision Transformer for Visual Relational Reasoning
International Conference on Learning Representations (ICLR), April 2022

Bottom-Up Skill Discovery from Unsegmented Demonstrations for Long-Horizon Robot Manipulation
IEEE Robotics and Automation Letters (RA-L), January 2022

2021

Adversarial Skill Chaining for Long-Horizon Robot Manipulation via Terminal State Regularization
Conference on Robot Learning (CoRL), November 2021

What Matters in Learning from Offline Human Demonstrations for Robot Manipulation
Conference on Robot Learning (CoRL), November 2021
Oral Presentation

DiscoBox: Weakly Supervised Instance Segmentation and Semantic Correspondence from Box Supervision
International Conference on Computer Vision (ICCV), October 2021

Synergies Between Affordance and Geometry: 6-DoF Grasp Detection via Implicit Representations
Robotics: Science and Systems (RSS), July 2021

Learning Generalizable Skills via Automated Generation of Diverse Tasks
Robotics: Science and Systems (RSS), July 2021

Tesseract: Tensorised Actors for Multi-Agent Reinforcement Learning
International Conference on Machine Learning (ICML), July 2021

Coach-Player Multi-Agent Reinforcement Learning for Dynamic Team Composition
International Conference on Machine Learning (ICML), July 2021
Long Talk

SECANT: Self-Expert Cloning for Zero-Shot Generalization of Visual Policies
International Conference on Machine Learning (ICML), July 2021

MultiBench: Multiscale Benchmarks for Multimodal Representation Learning
NeurIPS 2021 Datasets and Benchmarks Track, July 2021

Hierarchical Planning for Long-Horizon Manipulation with Geometric and Symbolic Scene Graphs
IEEE International Conference on Robotics and Automation (ICRA), May 2021

Fast Uncertainty Quantification for Deep Object Pose Estimation
IEEE International Conference on Robotics and Automation (ICRA), May 2021

Deep Affordance Foresight: Planning Through What Can Be Done in the Future
IEEE International Conference on Robotics and Automation (ICRA), May 2021

Detect, Reject, Correct: Crossmodal Compensation of Corrupted Sensors
IEEE International Conference on Robotics and Automation (ICRA), May 2021

Learning Multi-Arm Manipulation Through Collaborative Teleoperation
IEEE International Conference on Robotics and Automation (ICRA), May 2021
Best Multi-Robotic Systems Paper Finalist

Emergent Hand Morphology and Control from Optimizing Robust Grasps of Diverse Objects
IEEE International Conference on Robotics and Automation (ICRA), May 2021

Adaptive Procedural Task Generation for Hard-Exploration Problems
International Conference on Learning Representations (ICLR), May 2021

2020

Human-in-the-Loop Imitation Learning using Remote Teleoperation
Technical report arXiv:2012.06733, December 2020

Bongard-LOGO: A New Benchmark for Human-Level Concept Learning and Reasoning
Conference on Neural Information Processing Systems (NeurIPS), December 2020
Spotlight Presentation

Learning a Contact-Adaptive Controller for Robust, Efficient Legged Locomotion
Conference on Robot Learning (CoRL), November 2020

robosuite: A Modular Simulation Framework and Benchmark for Robot Learning
Technical report arXiv:2009.12293, September 2020

RubiksNet: Learnable 3D-Shift for Efficient Video Action Recognition
European Conference on Computer Vision (ECCV), August 2020

OCEAN: Online Task Inference for Compositional Tasks with Context Adaptation
Conference on Uncertainty in Artificial Intelligence (UAI), August 2020

DualSMC: Tunneling Differentiable Filtering and Planning under Continuous POMDPs
International Joint Conference on Artificial Intelligence (IJCAI), July 2020

6-PACK: Category-Level 6D Pose Tracker with Anchor-Based Keypoints
IEEE International Conference on Robotics and Automation (ICRA), May 2020

KETO: Learning Keypoint Representations for Tool Manipulation
IEEE International Conference on Robotics and Automation (ICRA), May 2020

Making Sense of Vision and Touch: Learning Multimodal Representations for Contact-Rich Tasks
IEEE Transactions on Robotics (T-RO), March 2020

2019

Causal Induction from Visual Observations for Goal Directed Tasks
NeurIPS 2019 Workshop on Causal Machine Learning, December 2019

Regression Planning Networks
Conference on Neural Information Processing Systems (NeurIPS), December 2019

Scaling Robot Supervision to Hundreds of Hours with RoboTurk: Robotic Manipulation Dataset through Human Reasoning and Dexterity
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), November 2019
Best Cognitive Robotics Paper Finalist

Continuous Relaxation of Symbolic Planner for One-Shot Imitation Learning
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), November 2019

Dynamics Learning with Cascaded Variational Inference for Multi-Step Manipulation
Conference on Robot Learning (CoRL), October 2019
Oral Presentation

Situational Fusion of Visual Representation for Visual Navigation
International Conference on Computer Vision (ICCV), October 2019

SURREAL-System: Fully-Integrated Stack for Distributed Deep Reinforcement Learning
Technical report arXiv:1909.12989, September 2019

Closing the Perception-Action Loop: Towards Building General-Purpose Robot Autonomy
Stanford University Ph.D. Dissertation, August 2019

Learning Task-Oriented Grasping for Tool Manipulation from Simulated Self-Supervision
International Journal of Robotics Research (IJRR), August 2019

DenseFusion: 6D Object Pose Estimation by Iterative Dense Fusion
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019

Neural Task Graphs: Generalizing to Unseen Tasks from a Single Video Demonstration
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019
Oral Presentation

Making Sense of Vision and Touch: Self-Supervised Learning of Multimodal Representations for Contact-Rich Tasks
IEEE International Conference on Robotics and Automation (ICRA), May 2019
Best Conference Paper Award