Boston Dynamics Inc. recently posted a video highlighting how its new, electric Atlas humanoid robot performs tasks in the lab. You can watch it above.
The first thing that strikes me in the video is how Atlas showcases its real-time perception. The video shows the humanoid actively registering frames of reference for the engine covers and all of the pick-and-place locations.
The robot continually updates its understanding of the world to handle the parts effectively. When Atlas picks something up, it evaluates the topology of the part – how to handle it and where to place it.

Atlas perceives the topology of the part held in its hand as it acquires the part from the shelf. | Credit: Boston Dynamics
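Boston Dynamics hasn't published how Atlas represents this world model internally, but the bookkeeping the video implies is familiar: keep a registry of fixture and part poses in a fixed world frame and overwrite them as new camera observations arrive. Below is a minimal sketch of that idea in Python; the `WorldModel` class, frame names, and numbers are hypothetical illustrations, not Atlas code.

```python
import numpy as np

def compose(a_T_b: np.ndarray, b_T_c: np.ndarray) -> np.ndarray:
    """Chain two 4x4 homogeneous transforms: a_T_c = a_T_b @ b_T_c."""
    return a_T_b @ b_T_c

class WorldModel:
    """Hypothetical registry of fixture/part poses, keyed by name, in a world frame."""

    def __init__(self):
        self.world_T_object = {}  # name -> 4x4 pose in the world frame

    def register(self, name, world_T_robot, robot_T_object):
        """Fold a fresh camera-derived detection (expressed in the robot's frame)
        back into the world frame, replacing any stale estimate."""
        self.world_T_object[name] = compose(world_T_robot, robot_T_object)

    def lookup(self, name):
        return self.world_T_object[name]

# Example: re-register an engine-cover fixture after the robot has moved.
world = WorldModel()
world_T_robot = np.eye(4)                # robot pose from the state estimator
robot_T_cover = np.eye(4)
robot_T_cover[:3, 3] = [0.6, -0.1, 0.9]  # detection 0.6 m ahead of the robot
world.register("engine_cover_fixture", world_T_robot, robot_T_cover)
```

The point is simply that every detection made in the robot's own frame has to be folded back into a persistent world frame so the task can continue after the robot moves.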
Then there is the moment at 1:14 in the demo when an engineer drops an engine cover on the floor. Atlas reacts as if it hears the part hit the floor. The humanoid then looks around, locates the part, figures out how to pick it up (again evaluating its form), and places it with the necessary precision into the engine cover area.
“In this particular clip, the search behavior is manually triggered,” Scott Kuindersma, senior director of robotics research at Boston Dynamics, told The Robot Report. “The robot isn’t using audio cues to detect an engine cover hitting the ground. The robot is autonomously ‘finding’ the object on the floor, so in practice we can run the same vision model passively and trigger the same behavior if an engine cover — or whatever part we’re working with — is detected out of the fixture during normal operation.”
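Kuindersma's description suggests a simple trigger pattern: run the detector on every frame anyway, and kick off the recovery behavior whenever a tracked part shows up somewhere it shouldn't be. The sketch below is my own illustration of that pattern, not Boston Dynamics' code; the `Detection` type and `trigger_search` callback are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    """Hypothetical per-frame detector output for one tracked part."""
    part_id: str
    in_fixture: bool  # is the part where the task expects it to be?

def watch_for_dropped_parts(detections, trigger_search):
    """Run the vision model passively: if any tracked part is seen
    outside its fixture, kick off the search-and-recover behavior."""
    for det in detections:
        if not det.in_fixture:
            trigger_search(det.part_id)

# Example: an engine cover spotted out of its fixture triggers recovery.
frame = [Detection("engine_cover_03", in_fixture=False)]
watch_for_dropped_parts(frame, trigger_search=lambda pid: print(f"searching for {pid}"))
```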
The video highlights Atlas’ ability to perceive its environment, adjust its model of the world, and still stick to its assigned task. It shows how the robot can handle a chaotic environment, maintain its task objective, and change its plan on the fly.

Atlas can scan the floor and identify a part that doesn’t belong there. | Credit: Boston Dynamics
“When the object is in view of the cameras, Atlas uses an object pose estimation model that uses a render-and-compare approach to estimate pose from monocular images,” Boston Dynamics wrote in a blog post. “The model is trained with large-scale synthetic data and generalizes zero-shot to novel objects given a CAD model. When initialized with a 3D pose prior, the model iteratively refines it to minimize the discrepancy between the rendered CAD model and the captured camera image.”
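That render-and-compare loop is easier to picture in code. The sketch below shows only its shape: the renderer and the learned comparison network are stubbed out with placeholders, and every function name is hypothetical rather than taken from Boston Dynamics' stack.

```python
import numpy as np

def render(cad_model, pose, intrinsics):
    """Placeholder: a real system rasterizes the CAD model at `pose`
    into a synthetic image using the camera intrinsics."""
    return np.zeros((480, 640))

def predict_pose_update(rendered, observed):
    """Placeholder for the learned comparison step, which looks at the
    rendering and the camera image and predicts a small corrective transform."""
    return np.eye(4)

def refine_pose(cad_model, observed_image, intrinsics, pose_prior, iters=5):
    """Iteratively adjust a 3D pose prior so the rendered CAD model
    lines up with the captured monocular image (render-and-compare)."""
    pose = pose_prior.copy()
    for _ in range(iters):
        rendered = render(cad_model, pose, intrinsics)
        delta = predict_pose_update(rendered, observed_image)  # small correction
        pose = delta @ pose                                    # apply it
    return pose

# Example call with dummy inputs.
estimate = refine_pose(cad_model=None, observed_image=np.zeros((480, 640)),
                       intrinsics=np.eye(3), pose_prior=np.eye(4))
```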
“Alternatively, the pose estimator can be initialized from a 2D region-of-interest prior (such as an object mask),” said the company. “Atlas then generates a batch of pose hypotheses that are fed to a scoring model, and the best fit hypothesis is subsequently refined. Atlas’s pose estimator works reliably on hundreds of factory assets which we have previously modeled and textured in-house.”
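Starting from a 2D region of interest instead of a 3D prior changes the first step: sample many candidate poses, score them, and refine only the winner. Again, this is a rough sketch with hypothetical names and a placeholder heuristic standing in for the learned scoring model.

```python
import numpy as np

def sample_pose_hypotheses(roi_mask, n=64, seed=0):
    """Placeholder: sample candidate 4x4 poses consistent with a 2D region of
    interest. A real system would back-project the mask and sample orientations."""
    rng = np.random.default_rng(seed)
    poses = np.tile(np.eye(4), (n, 1, 1))
    poses[:, :3, 3] = rng.uniform(-0.5, 0.5, size=(n, 3))  # rough translations only
    return poses

def score(pose, observed_image):
    """Placeholder for the learned scoring model: higher means the CAD model
    rendered at `pose` matches the camera image better."""
    return -np.linalg.norm(pose[:3, 3])

def best_hypothesis(roi_mask, observed_image):
    """Pick the best-scoring hypothesis; it would then go through the
    render-and-compare refinement sketched above."""
    return max(sample_pose_hypotheses(roi_mask), key=lambda p: score(p, observed_image))

# Example call with dummy inputs.
winner = best_hypothesis(roi_mask=None, observed_image=np.zeros((480, 640)))
```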
I see, therefore I am
Robot vision guidance has been viable since the 1990s. Even then, robots could track items on moving conveyors and adjust local frames of reference for circuit board assembly based on fiducials. There is nothing surprising or novel about that level of vision guidance.
What’s unique now for humanoids is the mobility of the robot. Any mobile manipulator must continuously update its world map. Modern robot guidance uses vision language models (VLMs) to understand the world through the camera.
Older industrial robots were fixed in place and used 2D vision and complex calibration routines to map the camera’s field of view. What Boston Dynamics has demonstrated is a mobile humanoid robot that understands its surroundings and continues its task even as the environment changes around it. Modern robots are gaining a 3D understanding of the world around them.
Boston Dynamics acknowledged that this demo mixes AI-based functions, such as perception, with procedural programming for managing the mission. The video shows how far robotic software has evolved. For these systems to work in the real world, they must handle both subtle and large-scale changes to their operating environments.
Atlas makes its way through the world
While Atlas’ movements seem odd at times, the video does illustrate how artificial intelligence perceives the world and the choices it makes to move through it. We get to witness only a small slice of that decision-making.
Boston Dynamics has previously posted videos showing motion-capture (mocap)-based behaviors. Those videos demonstrated the agility of the system and what it can do with smooth input.
The jerkiness in this latest video, with the robot under AI decision-making and control, is a long way from the “uncanny valley” smoothness of the mocap demonstrations. Aaron Saunders, chief technology officer of Boston Dynamics, explained the company’s development work in a keynote at the 2025 Robotics Summit and Expo in Boston.
Atlas still requires a lot of real-time processing to comprehend its world. In the video, we see the robot pause to process its environment before it makes a decision and continues.
I’m confident this is only going to get faster over time as the code evolves and the AI models become better at comprehension. I think that’s where the race is now: developing the AI-based software that allows these robots to adapt, understand their environments, and continuously learn from multimodal data.
Editor’s note: This article was updated with a quote from Scott Kuindersma, senior director of robotics research at Boston Dynamics.