Small-Data VLM Backbone Benchmarking for Robotics

Systematic benchmarking of vision backbones under small-data constraints for real-world robotics deployment

Overview

This project addresses a critical gap in vision model research: how do different architectures perform when trained with limited data?

Most academic benchmarks assume:

Millions of training samples
Expensive compute clusters
Large batch sizes (512+)

Real robotics teams face:

<300k training samples
1-4 GPUs or edge devices
Tight latency, memory, and power constraints

This framework provides standardized benchmarking to answer: Which backbone should a robotics developer use given their constraints?

Quick Start

# Clone and setup
git clone https://github.com/cronenberg64/VLM-arch.git
cd VLM-arch
pip install -r requirements.txt

# Benchmark all models
python scripts/run_batch.py --mode benchmark

# Train with 5k samples (small-data regime)
python scripts/train.py model=convnext_v2_tiny dataset=cifar10 dataset.subset=5000

# Analyze results
jupyter lab notebooks/analysis.ipynb

Read the Complete Usage Guide (USAGE_GUIDE.md) for detailed instructions.

What Gets Benchmarked

Architectures (7 models)

Type          Models
CNNs          ConvNeXt-V2 Tiny, EfficientNetV2-S, MobileNetV3, ResNet50
Transformers  ViT-Base, DeiT-Small, Swin-Tiny
Hybrid        Coming soon: CoAtNet, MetaFormer

Metrics

Model Complexity

Parameters (M)
FLOPs (G)
Peak Memory (MB)
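
To make the parameter and FLOP metrics concrete, here is a minimal sketch of how both are counted for a single convolution layer. It is not the repository's profiler: `conv2d_cost` is a hypothetical helper, and it counts multiply-accumulates (MACs). Profiling tools differ on whether they report MACs or 2 × MACs as "FLOPs", so treat the absolute numbers as convention-dependent.

```python
def conv2d_cost(c_in, c_out, kernel, stride, padding, h, w, bias=False):
    """Parameter and multiply-accumulate (MAC) counts for one Conv2d layer."""
    # Standard output-size formula for a convolution without dilation.
    h_out = (h + 2 * padding - kernel) // stride + 1
    w_out = (w + 2 * padding - kernel) // stride + 1
    # Each output channel owns one kernel x kernel x c_in filter (plus optional bias).
    params = kernel * kernel * c_in * c_out + (c_out if bias else 0)
    # Every output position applies every filter once.
    macs = kernel * kernel * c_in * c_out * h_out * w_out
    return params, macs


# A 7x7, stride-2 stem convolution (3 -> 64 channels), as found at the
# start of ResNet-style networks, at two input resolutions.
for size in (224, 32):
    params, macs = conv2d_cost(3, 64, kernel=7, stride=2, padding=3, h=size, w=size)
    print(f"{size}x{size}: {params} params, {macs / 1e9:.4f} GMACs")
```

The parameter count does not depend on input resolution, while the compute cost scales with the number of output positions. Compare FLOP figures only when they were measured at the same input size.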

Training Behavior

Convergence speed
Sample efficiency (5k, 10k, 50k, full)
Batch size sensitivity

Deployment

Inference latency (ms)
Throughput (img/s)
CPU/GPU performance
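
As a rough illustration of how latency and throughput relate, the sketch below times an arbitrary callable with warm-up iterations and reports the median. It is an assumption-laden stand-in, not the code in src/benchmark/: `measure_latency` and `fake_forward` are hypothetical names, and the workload is a placeholder for a real forward pass.

```python
import statistics
import time


def measure_latency(run_batch, batch_size, warmup=3, iters=20):
    """Time a callable that processes one batch; return latency and throughput."""
    # Warm-up calls are excluded so one-off costs (caches, lazy init) don't skew timing.
    for _ in range(warmup):
        run_batch()
    timings_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        run_batch()
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    # The median is less sensitive to occasional slow iterations than the mean.
    latency_ms = statistics.median(timings_ms)
    return {
        "latency_ms": latency_ms,
        "throughput_img_s": batch_size / (latency_ms / 1000.0),
    }


def fake_forward():
    # Stand-in workload (hypothetical); replace with a real model forward pass.
    return sum(i * i for i in range(10_000))


stats = measure_latency(fake_forward, batch_size=32)
print(f"{stats['latency_ms']:.3f} ms/batch, {stats['throughput_img_s']:.1f} img/s")
```

When timing a PyTorch model on a GPU, call `torch.cuda.synchronize()` before reading the clock, because CUDA kernels run asynchronously and the timer would otherwise stop before the work finishes.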

Project Structure

VLM-arch/
├── src/
│   ├── data/          # Dataset loading & subsampling
│   ├── models/        # Model factory (timm integration)
│   ├── engine/        # Training loop (AMP, grad accumulation)
│   └── benchmark/     # Profiling tools
├── scripts/
│   ├── train.py       # Training entry point
│   ├── benchmark.py   # Benchmarking entry point
│   └── run_batch.py   # Batch processing utility
├── configs/           # Hydra configurations
├── notebooks/         # Analysis & visualization
└── results/           # Benchmark outputs

Key Features

Automatic subsampling with stratified sampling (maintains class balance)
Unified interface for 7+ architectures via timm
Mixed precision training (AMP) and gradient accumulation
Comprehensive profiling: FLOPs, latency, memory, throughput
WandB integration for experiment tracking
Batch processing for running multiple experiments
Jupyter notebooks for analysis and visualization
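
The stratified subsampling mentioned in the first feature can be sketched as follows. This is a minimal pure-Python illustration of the idea, assuming labels are available as a flat list; `stratified_subset` is a hypothetical helper and not necessarily how src/data/ implements it.

```python
import random
from collections import defaultdict


def stratified_subset(labels, subset_size, seed=0):
    """Return sorted indices of a subset that preserves class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    total = len(labels)
    chosen = []
    for label in sorted(by_class):
        indices = by_class[label]
        # Allocate samples in proportion to the class frequency (rounded down).
        quota = subset_size * len(indices) // total
        chosen.extend(rng.sample(indices, quota))

    # Rounding down can leave a few slots unfilled; top up at random from the rest.
    chosen_set = set(chosen)
    remaining = [i for i in range(total) if i not in chosen_set]
    chosen.extend(rng.sample(remaining, subset_size - len(chosen)))
    return sorted(chosen)


# Toy labels: 10 classes with 100 samples each (CIFAR-10-like balance).
labels = [i % 10 for i in range(1000)]
subset = stratified_subset(labels, subset_size=50)
counts = defaultdict(int)
for i in subset:
    counts[labels[i]] += 1
print(dict(counts))
```

Fixing the seed keeps the subset identical across runs, so models trained on the "5k" split all see the same 5k images.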

Example Results

After running python scripts/run_batch.py --mode benchmark:

                Model  Params (M)  FLOPs (G)  Latency (ms)  Throughput (img/s)
mobilenetv3_large_100    4.21       0.007        9.89           1133.44
  tf_efficientnetv2_s   20.19       0.062       26.60            531.24
             resnet50   23.53       0.084       11.25            628.88
      convnextv2_tiny   27.87       0.091       22.51            328.17

Your results will vary with hardware, input resolution, and batch size.

Research Workflow

1. Benchmark: Profile all models to understand complexity trade-offs
2. Train: Run experiments with different data sizes (5k → 50k → full)
3. Analyze: Generate plots comparing accuracy vs. data size
4. Deploy: Select best model for your robotics hardware constraints
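
The final selection step can be sketched as a simple filter over benchmark rows. The numbers below are copied from the Example Results table; the budgets and the `shortlist` helper are hypothetical, chosen only to show the mechanics.

```python
# Rows copied from the Example Results table (measured on the author's hardware).
results = [
    {"model": "mobilenetv3_large_100", "params_m": 4.21, "latency_ms": 9.89, "throughput": 1133.44},
    {"model": "tf_efficientnetv2_s", "params_m": 20.19, "latency_ms": 26.60, "throughput": 531.24},
    {"model": "resnet50", "params_m": 23.53, "latency_ms": 11.25, "throughput": 628.88},
    {"model": "convnextv2_tiny", "params_m": 27.87, "latency_ms": 22.51, "throughput": 328.17},
]


def shortlist(rows, max_latency_ms, max_params_m):
    """Keep models within both budgets, lowest latency first."""
    ok = [
        r for r in rows
        if r["latency_ms"] <= max_latency_ms and r["params_m"] <= max_params_m
    ]
    return sorted(ok, key=lambda r: r["latency_ms"])


# Hypothetical budgets: 25 ms latency and 25M parameters.
candidates = shortlist(results, max_latency_ms=25.0, max_params_m=25.0)
print([r["model"] for r in candidates])
```

A shortlist like this only narrows the field. The accuracy each candidate reaches at your data size, taken from the training runs, should decide among the models that fit the budget.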

Documentation

Complete Usage Guide - Detailed instructions, code explanations, advanced usage
Implementation Plan - Technical design decisions
Task List - Development progress

Use Cases

This framework is ideal for:

Robotics researchers evaluating vision backbones for edge deployment
ML engineers comparing architectures under data constraints
Students learning about model efficiency and benchmarking
Companies selecting models for production systems

Advanced Usage

Custom Dataset Sizes

# Compare performance across data regimes
for subset in 5000 10000 50000; do
    python scripts/train.py model=vit_base dataset=cifar10 dataset.subset=$subset
done

Hyperparameter Tuning

python scripts/train.py model=convnext_v2_tiny \
    training.lr=0.001 \
    training.batch_size=64 \
    training.epochs=200
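
If a batch size such as 64 does not fit in memory on a small GPU, gradient accumulation (listed under Key Features) lets several smaller micro-batches stand in for one large batch. The sketch below is a pure-Python illustration of why that works, assuming equal-sized micro-batches and a mean-reduced loss; it uses a toy one-parameter model and is not the repository's training loop.

```python
def grad_mse(w, batch):
    """Gradient of mean squared error for the 1-D model y = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


data = [(float(x), 3.0 * x + 1.0) for x in range(1, 9)]  # 8 toy samples
w = 0.5

# Gradient from one full batch of 8 samples.
full_grad = grad_mse(w, data)

# The same gradient accumulated over 4 micro-batches of 2 samples each.
accum_steps = 4
micro_bs = len(data) // accum_steps
accum_grad = 0.0
for step in range(accum_steps):
    micro = data[step * micro_bs:(step + 1) * micro_bs]
    # Scale each micro-batch gradient by 1/accum_steps so the sum
    # matches the mean over the full batch.
    accum_grad += grad_mse(w, micro) / accum_steps

print(full_grad, accum_grad)
```

The two gradients agree, so the optimizer step taken after the last micro-batch matches a single large-batch step. Layers that depend on batch statistics, such as batch normalization, still see the smaller micro-batch and are not made equivalent by this trick.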

Add Your Own Model

1. Create configs/model/my_model.yaml
2. Run: python scripts/benchmark.py model=my_model

See USAGE_GUIDE.md for more details.

Contributing

Contributions welcome! Ideas:

Add new architectures (RepVGG, PVT, CoAtNet, etc.)
Extend to new datasets (ImageNet, custom robotics data)
Add deployment benchmarks (Jetson, Raspberry Pi, Intel Movidius)
Improve profiling tools (energy consumption, etc.)

License

MIT License - See LICENSE for details

Citation

If you use this framework in your research, please cite:

@misc{vlm-arch-benchmark,
  title={Small-Data VLM Backbone Benchmarking for Real-World Robotics},
  author={cronenberg64},
  year={2025},
  url={https://github.com/cronenberg64/VLM-arch}
}

Acknowledgments

Built with PyTorch, timm, and Hydra
Inspired by real-world robotics deployment challenges

Ready to benchmark? → Start with the Usage Guide
