Towards Versatile Vision-Language-Action Models via Learning Unified Vision-Motion Representations
Visualize episode embeddings and select maximally diverse training subsets for robotics ML. Train on 10K diverse episodes instead of 50K random ones.
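A minimal sketch of one common way to pick a maximally diverse subset from episode embeddings: greedy farthest-point sampling (k-center greedy). The function name, the embedding file, and the array shape are illustrative assumptions, not this repository's API.

```python
import numpy as np

def select_diverse_subset(embeddings: np.ndarray, k: int, seed: int = 0) -> np.ndarray:
    """Greedy farthest-point sampling: pick k episodes whose embeddings
    are maximally spread out (k-center greedy)."""
    rng = np.random.default_rng(seed)
    n = embeddings.shape[0]
    selected = [int(rng.integers(n))]            # arbitrary first pick
    # distance from every episode to its nearest selected episode
    dists = np.linalg.norm(embeddings - embeddings[selected[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dists))              # farthest from the current subset
        selected.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(embeddings - embeddings[nxt], axis=1))
    return np.array(selected)

# e.g. keep 10K diverse episodes out of 50K (hypothetical file and shape):
# embeddings = np.load("episode_embeddings.npy")   # shape (50_000, d)
# subset_idx = select_diverse_subset(embeddings, k=10_000)
```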
Imitation Learning for Surgical Robot Task Automation — Behavioral Cloning, DAgger, Diffusion Policy, and VLA models on JIGSAWS surgical demonstrations
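For orientation, a minimal behavioral-cloning sketch in PyTorch: supervised regression of expert actions from demonstration observations, the simplest of the methods listed above. It assumes demonstrations arrive as (observation, action) tensor pairs; `BCPolicy` and `train_bc` are illustrative names, not this repository's API.

```python
import torch
import torch.nn as nn

class BCPolicy(nn.Module):
    """Maps an observation vector (e.g. kinematic features from a
    surgical demonstration) to a continuous action."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

def train_bc(policy: BCPolicy, loader, epochs: int = 10, lr: float = 1e-3) -> BCPolicy:
    """Behavioral cloning: minimize MSE between predicted and expert actions."""
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for obs, expert_act in loader:           # (obs, action) pairs from demos
            loss = loss_fn(policy(obs), expert_act)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return policy
```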