The Robot Report

Meta V-JEPA 2 world model uses raw video to train robots

By Steve Crowe | June 11, 2025

Meta today introduced V-JEPA 2, a 1.2-billion-parameter world model trained primarily on video to support understanding, prediction, and planning in robotic systems. Built on the Joint Embedding Predictive Architecture (JEPA), the model is designed to help robots and other “AI agents” navigate unfamiliar environments and tasks with limited domain-specific training.
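
The core idea behind a JEPA-style model is to predict missing or future video content in an abstract representation space rather than reconstructing pixels. The sketch below is a minimal, hypothetical illustration of that idea in PyTorch; the encoders, predictor, and loss are simplified stand-ins, not Meta's released V-JEPA 2 code.

```python
import torch
import torch.nn as nn

class ToyJEPA(nn.Module):
    """Toy joint-embedding predictive setup: predict the latent of the
    hidden (masked/future) part of a clip, not its raw pixels."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.context_encoder = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
        self.target_encoder = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
        self.predictor = nn.Linear(dim, dim)

    def forward(self, visible: torch.Tensor, hidden: torch.Tensor) -> torch.Tensor:
        z_context = self.context_encoder(visible)
        with torch.no_grad():                     # target branch only provides the prediction target
            z_target = self.target_encoder(hidden)
        z_pred = self.predictor(z_context)
        return ((z_pred - z_target) ** 2).mean()  # loss lives in latent space, not pixel space

# Toy usage with random "patch embeddings" standing in for real video features
model = ToyJEPA()
loss = model(torch.randn(8, 256), torch.randn(8, 256))
loss.backward()
```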

V-JEPA 2 follows a two-stage training process, all without additional human annotation. In the first, self-supervised stage, the model learns from more than 1 million hours of video and 1 million images, capturing patterns of physical interaction. The second stage introduces action-conditioned learning using a small set of robot control data (about 62 hours), allowing the model to factor in agent actions when predicting outcomes. This makes the model usable for planning and closed-loop control tasks.
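
As a rough illustration of that second, action-conditioned stage, the snippet below trains a small predictor to map the current latent state plus a robot action to the next latent state, reusing a frozen encoder from stage one. Every component here (the stand-in encoder, the 7-dimensional action, the random data) is a hypothetical placeholder, not Meta's actual training pipeline.

```python
import torch
import torch.nn as nn

latent_dim, action_dim = 256, 7                     # a 7-DoF arm command is an assumption
frozen_encoder = nn.Linear(3 * 64 * 64, latent_dim).requires_grad_(False)  # stand-in for the stage-1 encoder
action_predictor = nn.Sequential(
    nn.Linear(latent_dim + action_dim, 512), nn.GELU(), nn.Linear(512, latent_dim)
)
opt = torch.optim.AdamW(action_predictor.parameters(), lr=1e-4)

# One stage-2 step on a toy batch of (observation, action, next observation) triples
frames = torch.randn(16, 3 * 64 * 64)
actions = torch.randn(16, action_dim)
next_frames = torch.randn(16, 3 * 64 * 64)

z_now = frozen_encoder(frames)
z_next = frozen_encoder(next_frames)
z_pred = action_predictor(torch.cat([z_now, actions], dim=-1))
loss = ((z_pred - z_next) ** 2).mean()              # predict the next latent state, not the next image
loss.backward()
opt.step()
```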

Meta said it has already tested the new model on robots in its labs. The company reports that V-JEPA 2 performs well on common robotic tasks like pick-and-place, using vision-based goal representations. For simpler, short-horizon tasks, the system generates candidate actions and evaluates them based on predicted outcomes. For tougher, longer-horizon tasks, such as picking up an object and placing it in the right spot, V-JEPA 2 uses a sequence of visual subgoals to guide behavior.
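
One common way to realize "generate candidate actions and evaluate them based on predicted outcomes" is sampling-based planning against a goal image: encode the current and goal observations, roll each sampled action through the world model, and execute the action whose predicted latent lands closest to the goal. The sketch below shows that general pattern with hypothetical stand-in components; it is not Meta's published planner.

```python
import torch
import torch.nn as nn

latent_dim, action_dim = 256, 7
encoder = nn.Linear(3 * 64 * 64, latent_dim)                  # stand-in image encoder
world_model = nn.Linear(latent_dim + action_dim, latent_dim)  # stand-in action-conditioned predictor

def plan_one_step(current_image: torch.Tensor, goal_image: torch.Tensor,
                  num_candidates: int = 256) -> torch.Tensor:
    """Pick the sampled action whose predicted next latent is closest to the goal latent."""
    z_now, z_goal = encoder(current_image), encoder(goal_image)
    candidates = torch.randn(num_candidates, action_dim)       # random action proposals
    z_pred = world_model(torch.cat([z_now.expand(num_candidates, -1), candidates], dim=-1))
    scores = ((z_pred - z_goal) ** 2).mean(dim=-1)             # predicted distance to the visual goal
    return candidates[scores.argmin()]

best_action = plan_one_step(torch.randn(1, 3 * 64 * 64), torch.randn(1, 3 * 64 * 64))
```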

In internal tests, Meta said the model showed promising ability to generalize to new objects and settings, with success rates ranging from 65% to 80% on pick-and-place tasks in previously unseen environments.

“We believe world models will usher in a new era for robotics, enabling real-world AI agents to help with chores and physical tasks without needing astronomical amounts of robotic training data,” said Meta’s chief AI scientist Yann LeCun.

Although V-JEPA 2 shows improvements over prior models, Meta AI said there remains a noticeable gap between model and human performance on these benchmarks. Meta suggests this points to the need for models that can operate across multiple timescales and modalities, such as incorporating audio or tactile information.

To assess progress in physical understanding from video, Meta is also releasing the following three benchmarks:

  • IntPhys 2: evaluates the model’s ability to distinguish between physically plausible and implausible scenarios.
  • MVPBench: tests whether models rely on genuine understanding rather than dataset shortcuts in video question-answering.
  • CausalVQA: examines reasoning about cause-and-effect, anticipation, and counterfactuals.

The V-JEPA 2 code and model checkpoints are available for commercial and research use, with Meta aiming to encourage broader exploration of world models in robotics and embodied AI.

Meta joins other tech leaders developing world models. Google DeepMind has been developing its own version, Genie, which can simulate entire 3D environments. And World Labs, a startup founded by Fei-Fei Li, raised $230 million to build large world models.

About The Author

Steve Crowe

Steve Crowe is Executive Editor, Robotics, WTWH Media, and chair of the Robotics Summit & Expo and RoboBusiness. He is also co-host of The Robot Report Podcast, the top-rated podcast for the robotics industry. He joined WTWH Media in January 2018 after spending four-plus years as Managing Editor of Robotics Trends Media. He can be reached at scrowe@wtwhmedia.com.
