We optimize vision, language, and action models for real-time edge deployment. Every model here is compressed, benchmarked, and ready for production robotics.
The robotics community deserves production-ready models, not just research checkpoints. We take the best open foundation models — CLIP, SAM2, DINOv2, Qwen, Depth Anything — and make them actually deployable on the hardware robots use: Jetson Orin, industrial PCs, edge GPUs.
Every model is quantized (INT4/INT8), exported (ONNX/SafeTensors/TorchScript), and benchmarked on real hardware. No guesswork, no "should work in theory" — measured performance on real silicon.
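To illustrate what "benchmarked on real hardware" means in practice, here is a minimal latency check. This is a sketch only: the model path is a placeholder, and it assumes onnxruntime with the CUDA execution provider installed on the target device (e.g. a Jetson Orin).

```python
import time
import numpy as np
import onnxruntime as ort

# Placeholder path: any exported INT8/INT4 ONNX model from this collection.
session = ort.InferenceSession(
    "model_int8.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

# Build a dummy input from the model's first input signature
# (dynamic dimensions are replaced with 1 for this sketch).
inp = session.get_inputs()[0]
shape = [d if isinstance(d, int) else 1 for d in inp.shape]
dummy = np.random.rand(*shape).astype(np.float32)

# Warm up, then time repeated runs to get a mean latency on the actual silicon.
for _ in range(10):
    session.run(None, {inp.name: dummy})

runs = 100
start = time.perf_counter()
for _ in range(runs):
    session.run(None, {inp.name: dummy})
print(f"mean latency: {(time.perf_counter() - start) / runs * 1000:.2f} ms")
```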
Organized by capability for the ANIMA robotics stack.
Segmentation, features, depth estimation, and visual grounding for robotic scene understanding.
INT4 quantized language models for instruction following, planning, and robotic reasoning.
Vision-language models for visual QA, scene description, and grounding language to observations.
Vision-Language-Action models for end-to-end robotic control and manipulation.
Our 4-stage pipeline takes any 7B+ VLA model down to <2GB for real-time edge deployment.
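For rough intuition on why this takes a multi-stage pipeline rather than quantization alone, here is a back-of-envelope estimate under our own assumptions, not the pipeline's published numbers:

```python
# Approximate weight sizes for a 7B-parameter model (ignoring KV cache,
# activations, and format overhead).
params = 7e9
print(f"FP16: {params * 2 / 1e9:.1f} GB")    # ~14.0 GB
print(f"INT4: {params * 0.5 / 1e9:.1f} GB")  # ~3.5 GB
# Getting under 2 GB therefore also requires pruning, distillation, or other
# reductions on top of 4-bit quantization.
```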
Automated hyperparameter optimization via Optuna · 400+ trials across 4 GPUs · W&B experiment tracking
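For readers unfamiliar with the tooling, the sketch below shows the general shape of such a search; the study name, parameter ranges, W&B project, and scoring logic are placeholders, not the actual sweep configuration. A shared Optuna storage backend is what lets several worker processes (e.g. one per GPU) pull trials from the same study.

```python
import optuna
import wandb

def objective(trial: optuna.Trial) -> float:
    # Placeholder search space: typical compression knobs, not the real sweep.
    bits = trial.suggest_categorical("weight_bits", [4, 8])
    group_size = trial.suggest_categorical("group_size", [64, 128, 256])
    lr = trial.suggest_float("distill_lr", 1e-5, 1e-3, log=True)

    # Placeholder score: a real objective would compress the model with these
    # settings and return an eval metric (perplexity, task success rate, ...).
    score = bits * 0.1 + group_size * 1e-3 + lr

    run = wandb.init(project="vla-compression-sweep", config=trial.params, reinit=True)
    wandb.log({"score": score})
    run.finish()
    return score

# RDB storage lets multiple workers (e.g. one process per GPU) share one study.
study = optuna.create_study(
    study_name="vla-compression",
    storage="sqlite:///optuna.db",
    direction="minimize",
    load_if_exists=True,
)
study.optimize(objective, n_trials=100)
```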