Encompassing Embodied AI, Video Generation, Humanoid Robotics, and the associated companies
Research
Embodied AI
Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI
A great survey paper on embodied artificial intelligence.
Core Topics
- Visual SLAM
- GS SLAM, OVD-SLAM, DynaSLAM
- 3D Scene Understanding
- Projection, voxel, point-based
- Active Exploration
- 3D Visual Grounding
- VLN (Vision-and-Language Navigation)
- R2R, giant simulator chart
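Of the three 3D scene understanding families listed above (projection, voxel, point-based), the voxel approach is the easiest to illustrate in a few lines. The following is a minimal sketch, assuming a point cloud given as plain `(x, y, z)` tuples; the function names `voxelize` and `voxel_centroids` and the `voxel_size` parameter are hypothetical and not taken from the survey.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Point = Tuple[float, float, float]
VoxelIndex = Tuple[int, int, int]


def voxelize(points: Iterable[Point], voxel_size: float) -> Dict[VoxelIndex, List[Point]]:
    """Group points by the integer index of the cubic voxel that contains them."""
    grid: Dict[VoxelIndex, List[Point]] = defaultdict(list)
    for x, y, z in points:
        # floor() keeps negative coordinates in the correct cell.
        key = (
            math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size),
        )
        grid[key].append((x, y, z))
    return dict(grid)


def voxel_centroids(grid: Dict[VoxelIndex, List[Point]]) -> Dict[VoxelIndex, Point]:
    """Summarise each occupied voxel by the mean of its points."""
    centroids: Dict[VoxelIndex, Point] = {}
    for key, pts in grid.items():
        n = len(pts)
        centroids[key] = (
            sum(p[0] for p in pts) / n,
            sum(p[1] for p in pts) / n,
            sum(p[2] for p in pts) / n,
        )
    return centroids


cloud = [(0.1, 0.1, 0.1), (0.2, 0.2, 0.2), (1.5, 0.0, -0.1)]
grid = voxelize(cloud, voxel_size=1.0)
centroids = voxel_centroids(grid)
```

Point-based methods would instead operate on `cloud` directly, and projection methods would render it to one or more 2D views first.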
Simulators
- iGibson
- TDW
- SAPIEN
- Habitat
- AI2-THOR
- VirtualHome
- Isaac Sim
- Gazebo
- PyBullet?
Environments
- ALFRED
- BEHAVIOR 1K
- ManiSkill3
- VLN-CE (Habitat)
- Re-arrangement (Habitat)
- Find / Co-operation / RoboTHOR
- ProcTHOR / ManipulaTHOR / iTHOR
- Isaac Lab
VLA Models
Octo - May 2024
Paper
- Octo-Base: 13 it/sec, 93M Params
- Octo-Small: 17 it/sec, 27M Params
- Outperforms RT-1-X. Comparable to RT-2-X (55B).
- Supports language + goal image conditioning.
- On WidowX tasks, 25% higher avg performance with goal image conditioning.
- Finetuning Octo outperforms training from scratch or starting from pretrained VC-1 representations.
Pi0 - Physical Intelligence
paper + blog
- VLM – 3B Params
Complex tasks: multi-stage, deformable object manipulation, strategy variance.
According to the 1X blog, they use real-world data because simulators can't handle deformable objects well.
RT-2-X (Open X-Embodiment) – June 2024
- 55B Params
- Performs unseen tasks in lab settings.
OpenVLA
- 7B params, discretized actions
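To make "discretized actions" concrete, here is a minimal sketch of uniform action binning: each continuous action dimension is clipped to a range and mapped to one of a fixed number of integer bins, which a language model can then emit as tokens. The function names, the 256-bin default, and the per-dimension `low`/`high` bounds are assumptions for illustration, not OpenVLA's actual code.

```python
from typing import List, Sequence


def discretize_action(
    action: Sequence[float],
    low: Sequence[float],
    high: Sequence[float],
    n_bins: int = 256,  # assumed bin count, for illustration only
) -> List[int]:
    """Map each continuous action dimension to an integer bin in [0, n_bins - 1]."""
    tokens = []
    for a, lo, hi in zip(action, low, high):
        a = min(max(a, lo), hi)           # clip to the supported range
        frac = (a - lo) / (hi - lo)       # normalise to [0, 1]
        idx = min(int(frac * n_bins), n_bins - 1)
        tokens.append(idx)
    return tokens


def undiscretize_action(
    tokens: Sequence[int],
    low: Sequence[float],
    high: Sequence[float],
    n_bins: int = 256,
) -> List[float]:
    """Recover a continuous action by taking the centre of each bin."""
    action = []
    for t, lo, hi in zip(tokens, low, high):
        width = (hi - lo) / n_bins
        action.append(lo + (t + 0.5) * width)
    return action


low, high = [-1.0, -1.0, -1.0], [1.0, 1.0, 1.0]
tokens = discretize_action([-1.0, 0.0, 1.0], low, high)
recovered = undiscretize_action(tokens, low, high)
```

The round trip is lossy: the recovered value can differ from the original by up to half a bin width, which is the price paid for predicting actions as discrete tokens.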
Bi-Manual Diffusion Policy
Robotic Datasets
- RH20T: A Comprehensive Robotic Dataset for Learning Diverse Skills in One-Shot – 2TB, 110k contact-rich sequences
- RoboMIND: Benchmark on Multi-embodiment Intelligence Normative Data for Robot Manipulation – 55k demos, 279 tasks
- DROID: A Large-Scale In-the-Wild Robot Manipulation Dataset – 76k demos, 564 scenes
- ARIO: All Robots in One – A New Standard and Unified Dataset for Versatile, General-Purpose Embodied Agents – Docs, Code
- Open X-Embodiment Dataset
Robotics Survey
Humanoid Training
World Models
Video Generation
- Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
- VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models
Companies
Embodied AI
- Figure AI
- Tesla Optimus
- Unitree
- Skild AI
- Physical Intelligence
- Agility Robotics
- Sanctuary AI
Video Generation
- OpenAI
- Stability AI
- Runway
- Meta
- Pika Labs
- Hotshot AI