The Frontier Of AI

by Jacob Claessens

Apr 06, 2025

Encompassing Embodied AI, Video Generation, Humanoid Robotics, and the associated companies

Research

Embodied AI

Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI
A great survey paper on embodied artificial intelligence.

Core Topics

Visual SLAM
GS-SLAM, OVD-SLAM, DynaSLAM
3D Scene Understanding
Projection, voxel, point-based
Active Exploration
3D Visual Grounding
VLN (Visual Language Navigation)
R2R, giant simulator chart
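
Of the three 3D scene understanding families listed above, the voxel-based one is the easiest to sketch: points are grouped by the grid cell they fall into. This is only an illustration in plain Python. The function name voxelize and the voxel_size parameter are hypothetical and do not come from the survey.

```python
import math
from collections import defaultdict


def voxelize(points, voxel_size):
    """Group 3D points by the integer index of the voxel that contains them.

    points: iterable of (x, y, z) tuples in any consistent unit.
    voxel_size: edge length of each cubic voxel, in the same unit.
    Returns a dict mapping (i, j, k) voxel indices to the points inside.
    """
    grid = defaultdict(list)
    for x, y, z in points:
        # floor() keeps negative coordinates in the correct cell
        key = (
            math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size),
        )
        grid[key].append((x, y, z))
    return dict(grid)


cloud = [(0.1, 0.1, 0.1), (0.2, 0.3, 0.4), (1.2, 0.0, -0.1)]
occupied = voxelize(cloud, voxel_size=0.5)
print(sorted(occupied))
```

Voxel methods typically run 3D convolutions over a grid like this, while point-based methods skip the grid and operate on the raw coordinates.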

Simulators

iGibson
TDW
SAPIEN
Habitat
AI2-THOR
VirtualHome
Isaac Sim
Gazebo
PyBullet?

Environments

ALFRED
BEHAVIOR-1K
ManiSkill3
VLN-CE (Habitat)
Re-arrangement (Habitat)
Find / Co-operation / RoboTHOR
ProcTHOR / ManipulaTHOR / iTHOR
Isaac Lab

VLA Models

Octo - May 2024
Paper

Octo-Base: 13 it/sec, 93M Params
Octo-Small: 17 it/sec, 27M Params
Outperforms RT-1-X. Comparable to RT-2-X (55B).
Supports language + goal image conditioning.
On WidowX tasks, goal image conditioning gives 25% higher average performance than language conditioning.
Finetuning Octo beats training from scratch or starting from pretrained VC-1 weights.

Pi0 - Physical Intelligence
paper + blog

VLM – 3B Params

Complex tasks: multi-stage, deformable object manipulation, strategy variance.
From the 1X blog: they use real-world data because simulators can't handle deformable objects well.

RT-2-X (Open X-Embodiment) – June 2024

55B Params
Performs unseen tasks in lab settings.

OpenVLA

7B param, discretized actions
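
To make "discretized actions" concrete, here is a minimal sketch of uniform binning: each continuous action dimension is clipped to a range and mapped to an integer token, which a language-model-style policy can then predict like any other token. The bin count of 256, the per-dimension bounds, and the function names are assumptions for illustration. This is not OpenVLA's actual code.

```python
def discretize(action, low, high, n_bins=256):
    """Map each continuous action dimension to an integer bin in [0, n_bins - 1]."""
    tokens = []
    for a, lo, hi in zip(action, low, high):
        clipped = min(max(a, lo), hi)        # keep the value inside the bounds
        frac = (clipped - lo) / (hi - lo)    # normalize to [0, 1]
        tokens.append(min(int(frac * n_bins), n_bins - 1))
    return tokens


def undiscretize(tokens, low, high, n_bins=256):
    """Recover the bin-center value for each token."""
    return [
        lo + (t + 0.5) * (hi - lo) / n_bins
        for t, lo, hi in zip(tokens, low, high)
    ]


# Hypothetical 3-dimensional action with bounds of [-1, 1] per dimension
low, high = [-1.0] * 3, [1.0] * 3
tokens = discretize([-1.0, 0.0, 1.0], low, high)
print(tokens)
print(undiscretize(tokens, low, high))
```

The round trip is lossy: a decoded action can differ from the original by up to half a bin width, which is the precision traded away for treating control as token prediction.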

Bi-Manual Diffusion Policy

Video Generation

Companies

Embodied AI

Figure AI
Tesla Optimus
Unitree
Skild AI
Physical Intelligence
Agility Robotics
Sanctuary AI

Video Generation

OpenAI
Stability AI
Runway
Meta
Pika Labs
Hotshot AI
