Why physical AI 2.0 needs a reality check


Physical AI needs more than data to enable robots to be more effective. Source: Erika AI, via Adobe Stock

The world of artificial intelligence is moving from chatbots to vision processing, meaning AI that lives in robots and self-driving cars. While we have made major strides in training these systems using massive datasets and digital simulations, a critical gap remains between what a robot “sees” and what is actually happening in our messy physical world.

High-level reasoning is not enough if the system doesn’t fully understand the physical state of its environment.

Physical AI evolves from Version 1.0 to 2.0

Currently, the industry is dominated by “physical AI 1.0.” This phase is defined by scale: using massive amounts of video and text data, along with hyper-realistic simulations like NVIDIA’s Cosmos platform, to teach machines how the world works before they ever take their first steps.

However, physical AI 1.0 has a “vision-first” bias. It assumes that if a robot has enough cameras and enough compute power, it can accurately predict the future. But as any driver knows, cameras can be blinded by glare, objects can be hidden in shadows, and sensors can provide noisy, conflicting data.

“Physical AI 2.0” introduces a new, essential layer to the stack: physical state recovery.

The distinction matters because the unit of competition in physical AI is no longer just the model. In digital AI, the model is often the product.

In embodied systems, the model has to work with sensing, simulation, policy training, orchestration, safety systems, edge deployment, and feedback from live operations. A robot that misreads the present cannot reason its way out of a bad state estimate.


The new architecture of action

To function safely in the real world, a system needs four distinct capabilities working in a loop:

World models: These provide the “priors”—the learned knowledge of what might happen based on past experience and simulations.
Physical state recovery: This is the “missing link.” It takes noisy, incomplete sensor data and reconstructs the actual physical state of the world. It’s the difference between guessing where a pedestrian is and knowing their exact trajectory through a cluttered scene.
Reasoning systems: Once the state is recovered, the AI deliberates. It compares options, weighs risks, and decides on the best intent, for example, “Should I yield or nudge?”
Action: In the final step, the system executes a movement within strict safety boundaries.

Reasoning is only as good as the state estimate it reasons over. If the observation is incomplete or distorted, even an excellent reasoning model can become confidently wrong.

The separation between reasoning and action is important. Reasoning systems influence control; they do not actuate directly. In robust systems, reasoning proposes intent, constraints, explanations, or candidate actions; planning, control, and safety logic then convert those outputs into bounded motion.
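The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference design: the names `StateEstimate`, `Intent`, `recover_state`, `reason`, and `bound_action`, the numeric limits, and the fixed `stop_gap_m` prior standing in for a learned world model are all hypothetical. It shows only the division of labor: recovery produces an estimate with uncertainty, reasoning proposes an intent, and a separate safety step turns that intent into a bounded command.

```python
from dataclasses import dataclass


@dataclass
class StateEstimate:
    gap_m: float  # estimated distance to the obstacle ahead, in metres
    std_m: float  # one-sigma uncertainty of that estimate


@dataclass
class Intent:
    name: str
    target_speed_mps: float


def recover_state(readings):
    """Physical state recovery (toy version): mean and spread of noisy ranges."""
    n = len(readings)
    mean = sum(readings) / n
    var = sum((r - mean) ** 2 for r in readings) / n
    return StateEstimate(gap_m=mean, std_m=var ** 0.5)


def reason(state, stop_gap_m=5.0):
    """Reasoning: propose an intent from a pessimistic reading of the state.

    stop_gap_m is an assumed prior standing in for a learned world model.
    """
    pessimistic_gap = state.gap_m - 2.0 * state.std_m
    if pessimistic_gap < stop_gap_m:
        return Intent("yield", 0.0)
    return Intent("proceed", 8.0)


def bound_action(intent, current_speed, max_speed=5.0, max_delta=1.0):
    """Safety layer: clamp the proposed speed and the change per control step."""
    target = min(max(intent.target_speed_mps, 0.0), max_speed)
    delta = max(-max_delta, min(max_delta, target - current_speed))
    return current_speed + delta


if __name__ == "__main__":
    current_speed = 2.0
    for readings in ([20.0, 20.0, 20.0], [6.0, 4.0, 8.0]):
        state = recover_state(readings)
        intent = reason(state)
        current_speed = bound_action(intent, current_speed)
        print(intent.name, round(current_speed, 2))
```

Note that `reason` never writes a speed directly. Even when it proposes 8 m/s, `bound_action` caps the command at the assumed limits, which mirrors the point that reasoning influences control without actuating.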

Physical AI is not merely descriptive or predictive. It becomes physical when decisions are translated into movement, and when that movement changes the world and creates the next set of observations.

Why more data isn’t the only answer

A common counterargument is that if we just build bigger “end-to-end” models, the AI will eventually learn to handle noisy sensors on its own.

The counterpoint is that a dedicated recovery layer is more efficient. By treating physical state recovery as its own module, developers can exploit specialized sensing (like radar or touch) and improve observability before the higher-level “brain” even starts thinking. This prevents every new robot from having to “relearn” the basic laws of physics from scratch.
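One simple way to picture a recovery module that combines specialized sensors is inverse-variance weighting, a standard fusion rule for independent measurements of the same quantity. The article does not say which method a recovery layer should use, so treat this as an assumed stand-in; the function name `fuse` and the camera and radar numbers are hypothetical.

```python
def fuse(estimates):
    """Fuse independent (value, variance) estimates of the same quantity.

    Each estimate is weighted by 1 / variance, so the less noisy sensor
    dominates. The fused variance is never larger than the best input.
    """
    weights = [1.0 / var for _, var in estimates]
    total = sum(weights)
    value = sum(w * v for w, (v, _) in zip(weights, estimates)) / total
    return value, 1.0 / total


if __name__ == "__main__":
    camera = (12.0, 4.0)  # range in metres, variance inflated by glare (assumed)
    radar = (10.0, 1.0)   # range in metres, variance unaffected by glare (assumed)
    print(fuse([camera, radar]))
```

With these made-up numbers the fused range is 10.4 m with variance 0.8, closer to the radar reading and tighter than either sensor alone. The downstream reasoning system receives one estimate with an honest uncertainty instead of two conflicting readings.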

The key distinction is between difficult cases and poorly observed cases. Benchmarks can tell developers that a system struggles with long-tail scenarios, such as occlusions or unusual road-user behavior.

But identifying a hard case is not the same as recovering what the sensors failed to capture. A camera can produce more frames, and a model can analyze them longer, but if the underlying observation is structurally degraded, downstream reasoning may still be operating on the wrong picture.

In those cases, the answer is not only more data. It is a stronger recovery layer that uses physics-based constraints and richer sensing to make the hidden state more visible.
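As an illustration of a physics-based constraint, the sketch below tracks a one-dimensional position with a constant-velocity Kalman filter and keeps predicting while the object is occluded. This is an assumed textbook technique, not a method the article describes. The function names, the noise values, and the simplified process noise (added to the velocity variance only) are hypothetical choices for illustration.

```python
def predict(state, cov, dt, accel_var):
    """Constant-velocity motion model: position advances by velocity * dt."""
    p, v = state
    (a, b), (_, c) = cov
    new_state = (p + v * dt, v)
    new_cov = (
        (a + 2 * dt * b + dt * dt * c, b + dt * c),
        (b + dt * c, c + accel_var * dt),  # simplified process noise
    )
    return new_state, new_cov


def update(state, cov, z, meas_var):
    """Correct the prediction with a position measurement z."""
    p, v = state
    (a, b), (_, c) = cov
    s = a + meas_var             # innovation variance
    k_p, k_v = a / s, b / s      # Kalman gains for position and velocity
    resid = z - p
    new_state = (p + k_p * resid, v + k_v * resid)
    new_cov = (
        ((1 - k_p) * a, (1 - k_p) * b),
        ((1 - k_p) * b, c - k_v * b),
    )
    return new_state, new_cov


def track(measurements, dt=1.0, meas_var=0.25, accel_var=0.1):
    """Return (position, position variance) per step; None means occluded.

    The first measurement must not be None, since it initializes the track.
    """
    state = (measurements[0], 0.0)
    cov = ((meas_var, 0.0), (0.0, 10.0))  # velocity starts out very uncertain
    history = [(state[0], cov[0][0])]
    for z in measurements[1:]:
        state, cov = predict(state, cov, dt, accel_var)
        if z is not None:
            state, cov = update(state, cov, z, meas_var)
        history.append((state[0], cov[0][0]))
    return history


if __name__ == "__main__":
    # A cyclist moving about 1 m per step, hidden for the last three steps.
    for pos, var in track([0.0, 1.0, 2.0, 3.0, None, None, None]):
        print(round(pos, 2), round(var, 2))
```

During the occluded steps the motion model carries the position estimate forward, and the position variance grows at each step. That growing variance is the useful output: it tells the reasoning layer how much to trust the hidden state rather than letting it act on a stale or missing observation.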

Real-world examples: Robots and cars

Capability	Humanoid robot folding laundry	Autonomous vehicle navigating a city
World models	Predicts how different fabrics should fold	Predicts how traffic flows in rain
State recovery	Identifies the garment's shape despite wrinkles, shadows, partial views, and ambiguous contact	Tracks a cyclist hidden behind a parked truck in a cluttered scene
Reasoning	Decides whether to fold, re-grasp, set aside, or ask for help	Decides whether to yield, stop, nudge, or replan
Action	Gently folds the sleeve	Executes a smooth, safe steering maneuver

Observation is the bottom line for physical AI

The next frontier of AI isn’t just about making models “smarter” at reasoning; it’s about making them “better” at observing. The winner of the AI race will be the system that can most accurately bridge the gap between digital prediction and physical reality.

Vision and language are a start, but for physical AI to truly graduate into the real world, it needs a more trustworthy grip on the environment it is trying to move in.

Because in the real world, what you don’t see matters more than what you do.

About the author

Dr. Behrooz Rezvani is a serial entrepreneur, technologist, and systems architect who has repeatedly turned frontier mathematics into platforms and products. He founded Ikanos Communications, which helped redefine high-speed wireline broadband and was later acquired by Qualcomm Atheros.

Rezvani also co-founded Quantenna Communications, a leading Wi‑Fi semiconductor company acquired by ON Semiconductor for approximately $1.07 billion. He is founder and CEO of Atomathic, which is building the physics, mathematics, and inference software platform for physical AI — “making the invisible visible for defense, autonomy, robotics, aviation, and intelligent machines” — with strategic backing from RTX Ventures and GM Ventures.