How TRI is using Generative AI to teach robots

By Steve Crowe | September 19, 2023

Toyota Research Institute (TRI) today unveiled how it is using Generative AI to help robots learn new dexterous behaviors from demonstration. TRI said this new approach “is a step towards building ‘Large Behavior Models (LBMs)’ for robots, analogous to the Large Language Models (LLMs) that have recently revolutionized conversational AI.”

TRI said it has already taught robots more than 60 difficult, dexterous skills using the new approach. Some of these skills include pouring liquids, using tools and manipulating deformable objects. These were all realized, according to TRI, without writing a single line of new code; the only change was supplying the robot with new data. TRI has published additional videos of this approach.

“The tasks that I’m watching these robots perform are simply amazing – even one year ago, I would not have predicted that we were close to this level of diverse dexterity,” said Russ Tedrake, vice president of robotics research at TRI and the Toyota professor of electrical engineering and computer science, aeronautics and astronautics, and mechanical engineering at MIT. “What is so exciting about this new approach is the rate and reliability with which we can add new skills. Because these skills work directly from camera images and tactile sensing, using only learned representations, they are able to perform well even on tasks that involve deformable objects, cloth, and liquids — all of which have traditionally been extremely difficult for robots.”

At RoboBusiness, which takes place October 18-19 in Santa Clara, Calif., a keynote panel of robotics industry leaders will discuss the application of Large Language Models (LLMs) and text generation to robotics. It will also explore fundamental ways generative AI can be applied to robotics design, model training, simulation, control algorithms and product commercialization.

The panel will include Pras Velagapudi, VP of Innovation at Agility Robotics, Jeff Linnell, CEO and founder of Formant, Ken Goldberg, the William S. Floyd Jr. Distinguished Chair in Engineering at UC Berkeley, Amit Goel, director of product management at NVIDIA, and Ted Larson, CEO of OLogic. 

Teleoperation

TRI’s robot behavior model learns from haptic demonstrations from a teacher, combined with a language description of the goal. It then uses an AI-based diffusion policy to learn the demonstrated skill. This process allows a new behavior to be deployed autonomously from dozens of demonstrations.
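
To make that recipe more concrete, the sketch below shows one training step for a conditional diffusion policy in PyTorch: a demonstrated chunk of actions is noised, and a network learns to predict that noise conditioned on observation and language-goal embeddings. The module names, noise schedule, and tensor shapes are assumptions for illustration, not TRI's implementation.

```python
# Hypothetical sketch of one denoising-diffusion training step on demonstrated
# action chunks; shapes, schedule, and names are illustrative, not TRI's code.
import torch
import torch.nn as nn

class NoisePredictor(nn.Module):
    """Predicts the noise added to an action chunk, conditioned on
    observation and language-goal embeddings (placeholder MLP)."""
    def __init__(self, act_dim, horizon, cond_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(act_dim * horizon + cond_dim + 1, 256),
            nn.ReLU(),
            nn.Linear(256, act_dim * horizon),
        )

    def forward(self, noisy_actions, cond, t):
        x = torch.cat([noisy_actions.flatten(1), cond, t], dim=1)
        return self.net(x)

def training_step(model, optimizer, demo_actions, obs_emb, lang_emb):
    """demo_actions: (batch, horizon, act_dim) chunk from a human demonstration."""
    cond = torch.cat([obs_emb, lang_emb], dim=1)           # conditioning vector
    t = torch.rand(demo_actions.shape[0], 1)               # diffusion "time"
    noise = torch.randn_like(demo_actions)
    alpha = (1.0 - t).view(-1, 1, 1)                       # toy linear schedule
    noisy = alpha.sqrt() * demo_actions + (1 - alpha).sqrt() * noise
    pred = model(noisy, cond, t).view_as(noise)
    loss = nn.functional.mse_loss(pred, noise)             # learn to predict the noise
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```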

TRI's approach to robot learning is agnostic to the choice of teleoperation device, and the company said it has used a variety of low-cost interfaces such as joysticks. For more dexterous behaviors, it teaches via bimanual haptic devices with position-position coupling between the teleoperation device and the robot. Position-position coupling means the input device sends its measured pose as commands to the robot, and the robot tracks those pose commands using torque-based Operational Space Control. The robot's pose-tracking error is then converted to a force and sent back to the input device for the teacher to feel. This lets teachers close the feedback loop with the robot through force, which TRI said has been critical for many of the most difficult skills it has taught.
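
A minimal sketch of that coupling loop is below. The device and robot interfaces and the stiffness gain are hypothetical placeholders; in the real system the tracking error is handled through torque-based Operational Space Control rather than a single scalar gain.

```python
# Hypothetical position-position coupling loop: the device pose becomes the
# robot command, and the robot's pose-tracking error is fed back as a force.

STIFFNESS = 200.0  # N/m, illustrative gain from pose error to feedback force

def teleop_step(device, robot):
    commanded_pose = device.read_pose()                # teacher's measured pose
    robot.track_pose(commanded_pose)                   # robot tracks the command
    error = commanded_pose - robot.measured_pose()     # pose-tracking error
    device.apply_force(STIFFNESS * error)              # teacher feels the contact
```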

When the robot holds a tool with both arms, it creates a closed kinematic chain. For any given configuration of the robot and tool, there is a large range of possible internal forces that are unobservable visually. Certain force configurations, such as pulling the grippers apart, are inherently unstable and make it likely the robot’s grasp will slip. If human demonstrators do not have access to haptic feedback, they won’t be able to sense or teach proper control of force.

So TRI employs its Soft-Bubble sensors on many of its platforms. These sensors consist of an internal camera observing an inflated deformable outer membrane. They go beyond measuring sparse force signals and allow the robot to perceive spatially dense information about contact patterns, geometry, slip, and force.
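
As a rough illustration of the kind of dense signal such a sensor can provide, the sketch below turns an internal depth image of the membrane into a contact mask, patch centroid, and crude force estimate. This is not TRI's published pipeline; the threshold and force model are assumptions.

```python
# Illustrative only: derive dense contact information from a depth image of
# the bubble membrane relative to its undeformed reference shape.
import numpy as np

def contact_features(depth_image, reference_depth, contact_thresh=0.001):
    deformation = reference_depth - depth_image            # membrane pushed inward
    contact_mask = deformation > contact_thresh            # dense contact patch
    if not contact_mask.any():
        return contact_mask, None, 0.0
    ys, xs = np.nonzero(contact_mask)
    centroid = (xs.mean(), ys.mean())                      # where contact occurs
    force_estimate = float(deformation[contact_mask].sum())  # ~ proportional to load
    return contact_mask, centroid, force_estimate
```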

Making good use of the information from these sensors has historically been a challenge. But TRI said diffusion provides a natural way to use the full richness these visuotactile sensors afford, allowing robots to apply them to arbitrary dexterous tasks.

In one test, a human teacher attempted 10 egg-beating demonstrations. With haptic force feedback, the operator succeeded every time. Without this feedback, they failed every time.

Diffusion

Instead of image generation conditioned on natural language, TRI uses diffusion to generate robot actions conditioned on sensor observations and, optionally, natural language. TRI said using diffusion to generate robot behavior provides three benefits over previous approaches:

  1. Applicability to multi-modal demonstrations. This means human demonstrators can teach behaviors naturally and not worry about confusing the robot.
  2. Suitability to high-dimensional action spaces. This means it’s possible for the robot to plan forward in time which helps avoid myopic, inconsistent, or erratic behavior.
  3. Stable and reliable training. This means it’s possible to train robots at scale and have confidence they will work, without laborious hand-tuning or hunting for golden checkpoints.

According to TRI, diffusion is well suited for high-dimensional output spaces. Generating images, for example, requires predicting hundreds of thousands of individual pixels. For robotics, this is a key advantage that allows diffusion-based behavior models to scale to complex robots with multiple limbs. It also gave TRI the ability to predict intended trajectories of actions instead of single timesteps.
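
At inference time, the same kind of model can generate an entire action trajectory rather than a single timestep by starting from noise and iteratively denoising. The sketch below reuses the hypothetical NoisePredictor above and deliberately uses a simplified update rule rather than an exact DDPM/DDIM sampler.

```python
# Hypothetical sketch: sample a whole chunk of future actions by iterative
# denoising; the update rule is simplified for illustration.
import torch

@torch.no_grad()
def sample_action_trajectory(model, cond, horizon, act_dim, steps=50):
    actions = torch.randn(1, horizon, act_dim)             # start from pure noise
    for k in reversed(range(1, steps + 1)):
        t = torch.full((1, 1), k / steps)                  # current noise level
        pred_noise = model(actions, cond, t).view_as(actions)
        actions = actions - pred_noise / steps             # crude denoising step
    return actions                                          # (1, horizon, act_dim)
```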

 

TRI said this Diffusion Policy is “embarrassingly simple” to train; new behaviors can be taught without requiring numerous costly and laborious real-world evaluations to hunt for the best-performing checkpoints and hyperparameters. Unlike computer vision or natural language applications, AI-based closed-loop systems cannot be accurately evaluated with offline metrics; they must be evaluated in a closed-loop setting, which in robotics generally requires evaluation on physical hardware.

This means any learning pipeline that requires extensive tuning or hyperparameter optimization becomes impractical due to this bottleneck in real-life evaluation. Because Diffusion Policy works out of the box so consistently, it allowed TRI to bypass this difficulty.

Next steps

TRI admitted that “when we teach a robot a new skill, it is brittle.” Skills will work well in circumstances that are similar to those used in teaching, but the robot will struggle when they differ. TRI said the most common causes of the failures it observes are:

  • States where no recovery has been demonstrated. This can be the result of demonstrations that are too clean.
  • Significant changes in camera viewpoint or background.
  • Test-time manipulands that were not encountered during training.
  • Distractor objects, for example, significant clutter that was not present during training.

Part of TRI’s technology stack is Drake, a model-based design and verification tool for robotics that includes a toolbox and a simulation platform. Drake’s degree of realism allows TRI to develop both in simulation and in reality, and it could help overcome these shortcomings going forward.
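
For readers unfamiliar with Drake, a minimal pydrake example is sketched below: build a diagram with a MultibodyPlant and SceneGraph, load a model, and advance a simulation. The model path is a placeholder, and the parsing call may vary slightly between Drake releases.

```python
# Minimal pydrake sketch: simulate a placeholder robot model in Drake.
from pydrake.systems.framework import DiagramBuilder
from pydrake.systems.analysis import Simulator
from pydrake.multibody.plant import AddMultibodyPlantSceneGraph
from pydrake.multibody.parsing import Parser

builder = DiagramBuilder()
# Discrete-time MultibodyPlant plus SceneGraph for geometry and contact.
plant, scene_graph = AddMultibodyPlantSceneGraph(builder, time_step=1e-3)
Parser(plant).AddModels("robot_arm.urdf")   # placeholder model file
plant.Finalize()

diagram = builder.Build()
simulator = Simulator(diagram)
simulator.AdvanceTo(1.0)                    # run the simulation for one second
```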

TRI’s robots have already learned more than 60 dexterous skills, with a target of hundreds by the end of 2023 and 1,000 by the end of 2024.

“Existing Large Language Models possess the powerful ability to compose concepts in novel ways and learn from single examples,” TRI said. “In the past year, we’ve seen this enable robots to generalize semantically (for example, pick and place with novel objects). The next big milestone is the creation of equivalently powerful Large Behavior Models that fuse this semantic capability with a high level of physical intelligence and creativity. These models will be critical for general-purpose robots that are able to richly engage with the world around them and spontaneously create new dexterous behaviors when needed.”

About The Author

Steve Crowe

Steve Crowe is Executive Editor, Robotics, WTWH Media, and chair of the Robotics Summit & Expo and RoboBusiness. He is also co-host of The Robot Report Podcast, the top-rated podcast for the robotics industry. He joined WTWH Media in January 2018 after spending four-plus years as Managing Editor of Robotics Trends Media. He can be reached at scrowe@wtwhmedia.com.
