Published on Oct 19, 2024
Title: On Building General, Zero-Shot Robot Policies
Abstract:
Robot models, particularly those trained with large amounts of data, have recently shown a wide range of real-world manipulation and navigation capabilities. Several independent efforts have shown that, given sufficient training data in an environment, robot policies can generalize to demonstrated variations in that environment. However, needing to finetune robot models for every new environment stands in stark contrast to models in language or vision, which can be deployed zero-shot for open-world problems. In this talk, I will present Robot Utility Models (RUMs), a framework for training and deploying zero-shot robot policies that generalize directly to new environments without any finetuning. To create RUMs efficiently, we developed new tools to quickly collect data for mobile manipulation tasks, integrate such data into a policy with multi-modal imitation learning, and deploy policies on-device on Hello Robot Stretch, an inexpensive commodity robot, with an external mLLM verifier for retrying. We trained five such utility models for opening cabinet doors, opening drawers, picking up napkins, picking up paper bags, and reorienting fallen objects. Our system achieves, on average, a 90% success rate in unseen, novel environments while interacting with unseen objects. Moreover, the utility models can also succeed on different robot and camera setups with no further data, training, or fine-tuning. I will discuss our primary lessons from training RUMs: the importance of training data over training algorithm and policy class, guidance on data scaling, the necessity of diverse yet high-quality demonstrations, and a recipe for robot introspection and retrying that improves performance in individual environments. All the code, data, and models I will talk about have been open-sourced on our website, https://robotutilitymodels.com/
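
The verifier-driven retrying mentioned in the abstract can be sketched as a simple execute / verify / retry loop. The snippet below is only a minimal illustration of that pattern, not the actual RUM implementation; all names in it (run_with_retries, execute_task, verify_success, and the dummy verifier) are hypothetical placeholders.

```python
# Hypothetical sketch of a verify-and-retry deployment loop.
# Not the RUM codebase; function names and structure are assumptions.

import random


def run_with_retries(execute_task, verify_success, max_retries=3):
    """Roll out a policy, ask an external verifier whether the attempt
    succeeded, and retry until success or the retry budget is exhausted."""
    for attempt in range(1, max_retries + 1):
        execute_task()            # one rollout of the learned policy
        if verify_success():      # e.g. an mLLM judging a camera image
            return True, attempt
        # A real system would reset (re-home, re-grasp) before retrying.
    return False, max_retries


if __name__ == "__main__":
    # Dummy stand-ins: a no-op "policy" and a verifier that succeeds 60% of the time.
    succeeded, attempts = run_with_retries(
        execute_task=lambda: None,
        verify_success=lambda: random.random() < 0.6,
    )
    print(f"success={succeeded} after {attempts} attempt(s)")
```

In the deployed system described in the talk, the verifier role is played by a multimodal LLM and the retry happens on the physical robot; here both are stand-ins so the sketch runs on its own.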
Bio:
Nur Muhammad “Mahi” Shafiullah is a Ph.D. student at the NYU Courant Institute, advised by Lerrel Pinto. His research is driven by a vision of robots seamlessly integrated into our messy everyday lives: automating away everyday problems and continuously learning alongside us. Mahi's recent work has developed new algorithms for learning robotic behavior, large robot models for robust manipulation, and spatio-semantic memory that can handle dynamic changes in the world. He is passionate about getting these models and algorithms out into the real world, operating autonomously in NYC homes. His work has been featured in oral and spotlight presentations and demos at conferences such as ICRA, RSS, NeurIPS, ICML, and ICLR. Mahi is supported by the Apple Fellowship and the Jacob T. Schwartz Fellowship, and was a visiting scientist at Meta. In a past life, Mahi was a silver medalist at the IMO and worked on adversarial robustness as an undergraduate at MIT (S.B. ‘19).