In the animated series "The Jetsons," Rosie the robotic maid effortlessly switches between tasks like vacuuming, cooking, and taking out the trash. However, in reality, training a robot to handle a wide range of tasks remains a significant challenge.
Traditionally, engineers gather task-specific data for a particular robot in controlled environments to train it. This process is costly, time-consuming, and often results in robots that struggle to adapt to new environments or tasks they haven’t been trained for.
To address this, MIT researchers have developed a technique that integrates a vast amount of data from diverse sources to train robots more efficiently. Their approach combines data from different domains, including simulations and real-world robots, along with various modalities such as vision sensors and robotic arm encoders, into a unified “language” that a generative AI model can understand.
By synthesizing such a large volume of data, the method allows robots to be trained for various tasks without starting from scratch every time. The technique is faster and less costly than traditional methods, and in both simulations and real-world experiments it outperformed training from scratch by more than 20 percent.
“In robotics, we often hear that there isn’t enough training data. But the bigger issue is that this data comes from many different domains, modalities, and robot hardware,” explains Lirui Wang, an EECS graduate student and lead author of the research. “Our work demonstrates how you can train a robot using all this diverse data.”
Wang's co-authors include fellow graduate student Jialiang Zhao, Meta research scientist Xinlei Chen, and senior author Kaiming He, an associate professor in EECS and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL). The research will be presented at the Conference on Neural Information Processing Systems.
A robot’s “policy” is the set of instructions it follows based on sensor inputs, such as camera images or arm position data. Policies are typically trained through imitation learning, where a human demonstrates actions or operates the robot remotely. However, this method relies on a limited amount of task-specific data, which often causes robots to struggle when faced with new environments or tasks.
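The core of imitation learning is behavioral cloning: fit a model that maps observations to the actions a demonstrator took. The sketch below illustrates that idea only; the class and variable names are hypothetical, it assumes PyTorch, and it is not the MIT team's implementation.

```python
# Minimal behavioral-cloning sketch (hypothetical; assumes PyTorch).
import torch
import torch.nn as nn

class ImitationPolicy(nn.Module):
    """Maps a flat observation (e.g. camera features + joint angles) to an arm action."""
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, act_dim),
        )

    def forward(self, obs):
        return self.net(obs)

# Toy demonstration data: observations paired with the demonstrator's actions.
obs = torch.randn(1024, 32)   # 32-dim observation per timestep
acts = torch.randn(1024, 7)   # 7-DoF arm action per timestep

policy = ImitationPolicy(obs_dim=32, act_dim=7)
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

for epoch in range(10):
    pred = policy(obs)
    loss = nn.functional.mse_loss(pred, acts)  # match the demonstrated actions
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Because the policy only ever sees this one demonstration set, it has no basis for generalizing beyond it, which is the limitation the pretraining approach targets.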
To improve this, the MIT team drew inspiration from large language models like GPT-4, which are pretrained on vast amounts of data and then fine-tuned with task-specific data. This pretraining enables them to perform a wide range of tasks effectively.
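The pretrain-then-fine-tune pattern can be summarized as: reuse a representation learned from large, diverse data, and train only a small task-specific piece on the limited new-task demonstrations. The sketch below shows that general pattern under assumed names and shapes; the checkpoint path and dimensions are illustrative, not from the paper.

```python
# Hypothetical sketch of pretrain-then-fine-tune (assumes PyTorch).
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 256), nn.ReLU())
# In practice the backbone weights would come from large-scale pretraining, e.g.:
# backbone.load_state_dict(torch.load("pretrained_backbone.pt"))

for p in backbone.parameters():
    p.requires_grad = False          # keep the pretrained representation fixed

head = nn.Linear(256, 7)             # new 7-DoF action head for the target robot/task

obs = torch.randn(256, 32)           # small task-specific demonstration set
acts = torch.randn(256, 7)

opt = torch.optim.Adam(head.parameters(), lr=1e-3)
for step in range(100):
    feats = backbone(obs)
    loss = nn.functional.mse_loss(head(feats), acts)
    opt.zero_grad()
    loss.backward()
    opt.step()
```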
"In the language domain, data consists of sentences. But in robotics, the data is much more varied. To pretrain robots like language models, we need a different architecture," says Wang.
Given the diverse nature of robotic data—ranging from camera images to language instructions and depth maps—and the mechanical differences between robots, the researchers developed a new architecture called Heterogeneous Pretrained Transformers (HPT). This architecture unifies data from different modalities and domains to train robots more efficiently.
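One way to picture "unifying data from different modalities" is to give each modality its own small encoder that projects it into a shared token space, then let a single transformer trunk attend across all the tokens. The sketch below illustrates that general idea with assumed dimensions and layer sizes; it is not the authors' HPT implementation.

```python
# Hypothetical sketch: modality-specific encoders feeding a shared transformer trunk
# (assumes PyTorch; all sizes are illustrative).
import torch
import torch.nn as nn

D = 128  # shared token dimension

# Per-modality encoders: each maps its raw input into tokens of width D.
vision_stem = nn.Linear(512, D)   # e.g. pooled camera features
proprio_stem = nn.Linear(14, D)   # e.g. joint angles + velocities for a 7-DoF arm

trunk = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=D, nhead=4, batch_first=True),
    num_layers=2,
)
action_head = nn.Linear(D, 7)     # robot-specific head producing a 7-DoF action

# One batch of heterogeneous observations.
camera = torch.randn(8, 512)
joints = torch.randn(8, 14)

tokens = torch.stack([vision_stem(camera), proprio_stem(joints)], dim=1)  # (8, 2, D)
fused = trunk(tokens)                       # shared trunk attends across modalities
action = action_head(fused.mean(dim=1))     # pool tokens, predict an action
```

Under this kind of design, data from different robots and sensors can share the same trunk, while only the lightweight encoders and heads differ per embodiment.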