The Robot Report

Reinforcement learning, YouTube teaching robots new tricks

By Oliver Mitchell | November 5, 2018


The sun may be setting on what David Letterman would call “Stupid Robot Tricks,” as intelligent machines begin to surpass humans in a wide variety of manual and intellectual pursuits. In March 2016, Google DeepMind’s AlphaGo program defeated the reigning Go champion, Lee Sedol. Go, a Chinese game that originated more than 3,000 years ago, is said to be a googol times more complex than chess. Lee, the winner of 18 world titles, was considered the greatest player of the previous decade. Today, AlphaGo holds the top ranking.

Deconstructing how the DeepMind team was able to cross a once-impossible threshold for computer scientists could provide a primer on the tools available to roboticists. According to the AlphaGo website, “traditional AI methods, which construct a search tree over all possible positions, don’t have a chance in Go. This is because of the sheer number of possible moves and the difficulty of evaluating the strength of each possible board position.”
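
A back-of-the-envelope calculation shows why exhaustive search breaks down. A naive game tree contains roughly b^d move sequences, where b is the number of legal moves per turn and d is the length of a game. The sketch below uses the commonly cited approximations (b ≈ 35 and d ≈ 80 for chess; b ≈ 250 and d ≈ 150 for Go); these are rough averages, not exact counts, and the names `GAMES` and `tree_size_exponent` are purely illustrative.

```python
import math

# Commonly cited rough figures: b = average legal moves per turn,
# d = typical game length in moves. Approximations, not exact counts.
GAMES = {"chess": (35, 80), "go": (250, 150)}


def tree_size_exponent(branching: int, depth: int) -> float:
    """Return log10(b ** d), the order of magnitude of a naive game tree."""
    # Working in log space avoids building enormous integers.
    return depth * math.log10(branching)


for name, (b, d) in GAMES.items():
    print(f"{name}: roughly 10^{tree_size_exponent(b, d):.0f} move sequences")
```

Even with these rough figures, the Go tree comes out more than 200 orders of magnitude larger than the chess tree, which is why a Go program has to prune and evaluate positions rather than enumerate them.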

Instead, the researchers combined the traditional search tree approach with a deep learning system. “One neural network, the ‘policy network,’ selects the next move to play. The other neural network, the ‘value network,’ predicts the winner of the game.” The key to AlphaGo’s strength, however, was reinforcement learning: after studying a database of human games, the AI played against itself thousands of times.
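
The two networks meet inside the search. The sketch below is a hypothetical, heavily simplified illustration of one way a tree search can use both: each candidate move is scored by its average value so far (Q) plus an exploration bonus proportional to the policy network’s prior (P), a PUCT-style rule similar in form to the one DeepMind described for AlphaGo Zero. The `Edge` class, the `c_puct` constant, and the numbers in the usage example are assumptions; real networks are replaced here by fixed priors and values.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Edge:
    """Statistics for one candidate move at a search-tree node (hypothetical)."""
    prior: float            # P(s, a): probability from the policy network
    visits: int = 0         # N(s, a): how often the search has tried this move
    value_sum: float = 0.0  # running total of value estimates for this move

    @property
    def q(self) -> float:
        # Mean value of the move so far; 0 for moves never visited.
        return self.value_sum / self.visits if self.visits else 0.0


def puct_score(edge: Edge, parent_visits: int, c_puct: float = 1.5) -> float:
    """Exploitation term Q plus an exploration bonus scaled by the prior."""
    bonus = c_puct * edge.prior * math.sqrt(parent_visits) / (1 + edge.visits)
    return edge.q + bonus


def select_move(edges: dict[str, Edge]) -> str:
    """Pick the move with the highest PUCT score."""
    parent_visits = sum(e.visits for e in edges.values())
    return max(edges, key=lambda m: puct_score(edges[m], parent_visits))


def backup(edge: Edge, value: float) -> None:
    """Record one value estimate (e.g. from a value network) for a move."""
    edge.visits += 1
    edge.value_sum += value
```

For example, with priors of 0.6, 0.3, and 0.1 for three moves, if the search has seen the first move evaluate poorly (-0.5) and the second evaluate well (0.4), `select_move` picks the second move next: the accumulated value estimate outweighs the policy network’s initial preference.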

“We showed AlphaGo a large number of strong amateur games to help it develop its own understanding of what reasonable human play looks like. Then we had it play against different versions of itself thousands of times, each time learning from its mistakes and incrementally improving until it became immensely strong.”
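
The self-play loop can be illustrated, in miniature, on a far simpler game. The sketch below is a hypothetical toy, not AlphaGo’s training procedure: a single table of value estimates plays both sides of a take-away game (players remove one to three stones, and whoever takes the last stone wins), usually choosing its best-looking move but occasionally exploring, and after each game it nudges its estimates toward the actual result. It uses tabular Monte Carlo updates instead of neural networks and tree search; the game, function names, and hyperparameters are all assumptions.

```python
from __future__ import annotations

import random

MAX_TAKE = 3  # each turn removes 1-3 stones; taking the last stone wins


def legal_moves(stones: int) -> list[int]:
    return [m for m in range(1, MAX_TAKE + 1) if m <= stones]


def greedy_move(values: dict[int, float], stones: int) -> int:
    # Leave the opponent in the position with the lowest estimated win chance.
    return min(legal_moves(stones), key=lambda m: values[stones - m])


def train_self_play(max_stones: int = 12, episodes: int = 20000,
                    epsilon: float = 0.1, alpha: float = 0.05,
                    seed: int = 0) -> dict[int, float]:
    """Tabular Monte Carlo self-play: one value table plays both sides."""
    rng = random.Random(seed)
    # values[n] = estimated chance that the player to move wins with n stones left.
    values = {n: 0.5 for n in range(max_stones + 1)}
    values[0] = 0.0  # no stones left: the previous player took the last one
    for _ in range(episodes):
        stones = rng.randint(1, max_stones)
        visited = []  # positions seen, each from the mover's point of view
        while stones > 0:
            visited.append(stones)
            if rng.random() < epsilon:
                move = rng.choice(legal_moves(stones))  # explore
            else:
                move = greedy_move(values, stones)      # exploit
            stones -= move
        # The last mover won; walking backwards, the winner alternates.
        outcome = 1.0
        for position in reversed(visited):
            values[position] += alpha * (outcome - values[position])
            outcome = 1.0 - outcome
    return values
```

After training with these defaults, the greedy policy should learn to leave its opponent a multiple of four stones whenever it can (for example, taking one stone from five), which is the known winning strategy for this game. The agent is never told that rule; it emerges from the outcomes of its own games.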

By October 2017, DeepMind had removed human input from the process altogether. Its new program, AlphaGo Zero, skipped the stage of learning from professional and amateur games and instead learned solely by playing against earlier versions of itself. AlphaGo Zero beat the previous version, which had defeated Lee months earlier, by 100 games to 0, making it the greatest Go player in history. DeepMind is now looking to apply this approach to “a wide set of structured problems that share similar properties to a game like Go, such as planning tasks or problems where a series of actions have to be taken in the correct sequence. Examples could include protein folding, reducing energy consumption or searching for revolutionary new materials.”

Reinforcement learning for physical skills

Reinforcement learning techniques are not limited to games of strategy. Researchers at the Berkeley Artificial Intelligence Research (BAIR) Lab at the University of California, Berkeley, recently presented a paper that uses YouTube videos to train simulated humanoid characters to mimic human movements. Like AlphaGo, the BAIR system relies on reinforcement learning: it pairs deep learning models that estimate an actor’s pose in online video with reinforcement learning that teaches a simulated character to reproduce the motion. “A staggering 300 hours of videos are uploaded to YouTube every minute,” the BAIR team wrote in its blog. “Unfortunately, it is still very challenging for our machines to learn skills from this vast volume of visual data.”

Because this treasure trove of training data remains largely untapped, programmers today must instead purchase and ferry around bulky motion capture (mocap) equipment to record their own demonstrations. “Mocap systems also tend to be restricted to indoor environments with minimal occlusion, which can limit the types of skills that can be recorded,” said BAIR researchers Xue Bin (Jason) Peng and Angjoo Kanazawa. To get around these limits, Peng and Kanazawa set out to build a system that lets machines learn skills directly from ordinary online video clips.

The paper states: “In this work, we present a framework for learning skills from videos (SFV). By combining state-of-the-art techniques in computer vision and reinforcement learning, our system enables simulated characters to learn a diverse repertoire of skills from video clips. Given a single monocular video of an actor performing some skill, such as a cartwheel or a backflip, our characters are able to learn policies that reproduce that skill in a physics simulation, without requiring any manual pose annotations.”
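
To learn a policy that reproduces a skill, a reinforcement learner needs a reward that measures how closely the simulated character is following the target motion. The sketch below shows one common shape for such a reward: an exponentiated negative squared tracking error, which equals 1 when the character’s pose matches the reference exactly and decays toward 0 as the poses diverge. It is modeled loosely on the pose-tracking term described for DeepMimic; treating each joint as a single angle, the `scale` default, and the function name are simplifying assumptions, and the published reward uses 3D joint rotations plus several additional terms.

```python
import math
from typing import Sequence


def pose_tracking_reward(sim_angles: Sequence[float],
                         ref_angles: Sequence[float],
                         scale: float = 2.0) -> float:
    """Reward in (0, 1] that peaks when the simulated pose matches the reference.

    Simplification (assumption): each joint is one angle in radians rather
    than a 3D rotation, and only a pose-matching term is modeled.
    """
    if len(sim_angles) != len(ref_angles):
        raise ValueError("poses must have the same number of joints")
    squared_error = 0.0
    for sim, ref in zip(sim_angles, ref_angles):
        # Wrap the difference into [-pi, pi) so 359 deg vs 1 deg counts as close.
        diff = (sim - ref + math.pi) % (2 * math.pi) - math.pi
        squared_error += diff * diff
    return math.exp(-scale * squared_error)
```

In a training loop, this value would be computed at every control step and summed over an episode, so a policy that tracks the reference motion closely, and for longer, earns a higher return.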

Future developments

The SFV pipeline processes a video in three stages: “pose estimation, motion reconstruction, and motion imitation.” The first stage predicts the actor’s pose in each frame of the video. The motion reconstruction stage then consolidates these per-frame predictions into a “reference motion.” Finally, in the motion imitation stage, a simulated character is trained through reinforcement learning to reproduce that reference motion. SFV builds on DeepMimic, an earlier system from Peng and colleagues that learned skills from motion capture data rather than video. To date, the results have been staggering, with 20 different skills acquired just from ordinary online videos, as shown below.

Peng and Kanazawa are hopeful that such simulations could be leveraged in the future to enable machines to navigate new environments: “Even though the environments are quite different from those in the original videos, the learning algorithm still develops fairly plausible strategies for handling these new environments.” The team is also optimistic about its contribution to furthering the development of mobile unmanned systems: “All in all, our framework is really just taking the most obvious approach that anyone can think of when tackling the problem of video imitation. The key is in decomposing the problem into more manageable components, picking the right methods for those components, and integrating them together effectively.”
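
To make one such component concrete, consider motion reconstruction, which turns jittery per-frame pose predictions into a usable reference motion. SFV’s reconstruction solves a richer optimization over the whole clip; the sketch below is a much simpler, hypothetical stand-in that captures only one ingredient, temporal smoothness. For a single joint angle over time, it finds the trajectory x that minimizes ||x - y||² + λ||Dx||², where y holds the per-frame estimates and D takes second differences (a Whittaker-style smoother). The choice of smoother, the λ default, and the use of NumPy are assumptions.

```python
import numpy as np


def smooth_trajectory(estimates, lam: float = 10.0) -> np.ndarray:
    """Smooth a 1D sequence of per-frame joint-angle estimates.

    Minimizes ||x - estimates||^2 + lam * ||D x||^2, where D computes second
    differences (frame-to-frame acceleration). Larger lam gives smoother output.
    """
    y = np.asarray(estimates, dtype=float)
    n = y.shape[0]
    if n < 3:
        return y.copy()  # too short to have any second differences
    # Each row of D is a [1, -2, 1] stencil: x[t] - 2 x[t+1] + x[t+2].
    d = np.diff(np.eye(n), n=2, axis=0)
    # Setting the gradient to zero gives the linear system (I + lam D^T D) x = y.
    system = np.eye(n) + lam * d.T @ d
    return np.linalg.solve(system, y)
```

Applied joint by joint across a clip, this removes frame-to-frame jitter while leaving steady motion (such as a constant-velocity swing) unchanged, since a straight-line trajectory has zero second differences and therefore incurs no smoothness penalty.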

Humbly, the BAIR team admits that most YouTube videos are still too complicated for its AI to imitate. Whimsically, Peng and Kanazawa single out dancing “Gangnam Style” as one of these hurdles. “We still have all of our work ahead of us,” the researchers declare, “and we hope that this work will help inspire future techniques that will enable agents to take advantage of the massive volume of publicly available video data to acquire a truly staggering array of skills.”

About The Author

Oliver Mitchell

Oliver Mitchell is a partner at ff Venture Capital. Oliver first joined ff VC in 2014 as a Limited Partner and then, in 2018, as a Venture Partner. Today, he takes a leading role on the investment team in expanding the portfolio’s deep tech position with holdings in robotics, drones, artificial intelligence, and industrial automation technologies. Oliver also works with ff VC’s investor relations team to forge strategic relationships for its limited partners and corporate venture groups. In addition, he serves on the boards of Civ Robotics, Cambrian Intelligence, AppBind, Storyfit, and Cardflight.

Previously, Oliver ran his own investment portfolio of a dozen companies that have since produced eight exits, including two IPOs (NVCR and EKSO) and one unicorn (TripleLift), with a combined value of over $20 billion. Previous startup outcomes include selling Holmes Protection to ADT/Tyco and AmeriCash to American Express, and launching RobotGalaxy, a national consumer S.T.E.M. brand. Oliver is an Adjunct Professor at Sy Syms School of Business and a frequent writer for trade periodicals.

Copyright © 2025 WTWH Media LLC. All Rights Reserved. The material on this site may not be reproduced, distributed, transmitted, cached or otherwise used, except with the prior written permission of WTWH Media