The Frontier Of AI

Encompassing Embodied AI, Video Generation, Humanoid Robotics, and the associated companies

Research

Embodied AI

Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI
A great survey paper on embodied artificial intelligence.

Core Topics

  • Visual SLAM: GS-SLAM, OVD-SLAM, DynaSLAM
  • 3D Scene Understanding: projection-, voxel-, and point-based methods
  • Active Exploration
  • 3D Visual Grounding
  • VLN (Visual Language Navigation): R2R, giant simulator chart
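As a rough illustration of the voxel-based representation listed under 3D Scene Understanding, the sketch below buckets a point cloud into a regular grid and reduces each occupied cell to its centroid. The function names, the `voxel_size` parameter, and the sample points are all hypothetical; this is a minimal sketch of the general idea, not the method of any particular paper in the survey.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def voxelize(points: List[Point], voxel_size: float) -> Dict[Voxel, List[Point]]:
    """Group points by the integer index of the grid cell that contains them."""
    grid: Dict[Voxel, List[Point]] = defaultdict(list)
    for x, y, z in points:
        # floor() keeps negative coordinates in the correct cell
        key = (
            math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size),
        )
        grid[key].append((x, y, z))
    return dict(grid)


def voxel_centroids(grid: Dict[Voxel, List[Point]]) -> Dict[Voxel, Point]:
    """Replace the points in each occupied voxel with their mean position."""
    return {
        key: tuple(sum(axis) / len(pts) for axis in zip(*pts))
        for key, pts in grid.items()
    }


# Hypothetical sample cloud, for illustration only
cloud = [(0.1, 0.1, 0.1), (0.2, 0.2, 0.2), (1.5, 0.0, -0.1)]
grid = voxelize(cloud, voxel_size=1.0)
centroids = voxel_centroids(grid)
```

Point-based methods would instead operate on `cloud` directly, and projection-based methods would render it into 2D views first.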

Simulators

  • iGibson
  • TDW
  • SAPIEN
  • Habitat
  • AI2-THOR
  • VirtualHome
  • Isaac Sim
  • Gazebo
  • PyBullet?

Environments

  • ALFRED
  • BEHAVIOR-1K
  • ManiSkill3
  • VLN-CE (Habitat)
  • Re-arrangement (Habitat)
  • Find / Co-operation / RoboTHOR
  • ProcTHOR / ManipulaTHOR / iTHOR
  • Isaac Lab

VLA Models

Octo - May 2024
Paper

  • Octo-Base: 13 it/sec, 93M Params
  • Octo-Small: 17 it/sec, 27M Params
  • Outperforms RT-1-X. Comparable to RT-2-X (55B).
  • Supports language + goal image conditioning.
  • On WidowX tasks, 25% higher avg performance with goal image conditioning.
  • Fine-tuning Octo beats training from scratch or starting from VC-1.

Pi0 - Physical Intelligence
paper + blog

  • VLM – 3B Params

Complex tasks: multi-stage, deformable object manipulation, strategy variance.
According to the 1X blog, they use real-world data because simulators can't handle deformable objects well.

RT-2-X (Open X-Embodiment) – June 2024

  • 55B Params
  • Performs unseen tasks in lab settings.

OpenVLA

  • 7B params, discretized actions
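To make "discretized actions" concrete: each continuous action dimension is mapped to one of a fixed number of bins, so the model can predict actions as tokens. The sketch below uses uniform bins; the bin count of 256, the per-dimension bounds, and the function names are illustrative assumptions rather than OpenVLA's actual code.

```python
from typing import List

N_BINS = 256  # assumed bin count, for illustration


def discretize(action: List[float], low: List[float], high: List[float],
               n_bins: int = N_BINS) -> List[int]:
    """Map each continuous action dimension to a bin index in [0, n_bins - 1]."""
    tokens = []
    for a, lo, hi in zip(action, low, high):
        clipped = min(max(a, lo), hi)       # clamp to the assumed bounds
        frac = (clipped - lo) / (hi - lo)   # normalize to [0, 1]
        tokens.append(min(int(frac * n_bins), n_bins - 1))
    return tokens


def undiscretize(tokens: List[int], low: List[float], high: List[float],
                 n_bins: int = N_BINS) -> List[float]:
    """Map bin indices back to the center of each bin."""
    return [lo + (t + 0.5) * (hi - lo) / n_bins
            for t, lo, hi in zip(tokens, low, high)]


# Hypothetical 3-dimensional action with bounds [-1, 1] per dimension
low, high = [-1.0] * 3, [1.0] * 3
tokens = discretize([-1.0, 0.0, 0.3], low, high)
recovered = undiscretize(tokens, low, high)
```

The round trip loses at most half a bin width per dimension, which is the precision cost of treating actions as tokens.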

Bi-Manual Diffusion Policy

Robotic Datasets

Robotics Survey

Humanoid Training

World Models

Video Generation

Companies

Embodied AI

  • Figure AI
  • Tesla Optimus
  • Unitree
  • Skild AI
  • Physical Intelligence
  • Agility Robotics
  • Sanctuary AI

Video Generation

  • OpenAI
  • Stability AI
  • Runway
  • Meta
  • Pika Labs
  • Hotshot AI