RL agent that learns to play Doom, trained with PPO, exported to INT8 TFLite, and run at inference time inside a bare-metal doomgeneric port on RISC-V.
The full pipeline: VizDoom environment → Stable-Baselines3 PPO training → ONNX → TensorFlow Lite INT8 → static C codegen → bare-metal RISC-V ELF (or x86 host with SDL2).
Training (Python) Export Codegen Bare-metal
┌──────────┐ ┌─────────────────┐ ┌──────────────┐ ┌──────────────────┐
│ VizDoom │ │ PyTorch → ONNX │ │ TFLite → C │ │ doomgeneric │
│ + SB3 │───▶│ → TF → TFLite │───▶│ weights + │───▶│ + agent inference│
│ PPO │ │ INT8 quant │ │ graph code │ │ on RV32 / x86 │
└──────────┘ └─────────────────┘ └──────────────┘ └──────────────────┘
# 1. Setup
git config core.hooksPath .githooks
uv sync
git submodule update --init --recursive
scripts/download_wad.sh
# 2. Train (curriculum: basic → corridor → arena → E1M1)
scripts/train_curriculum.sh --model v2
# 3. Build static binary (x86 with SDL2 display)
scripts/build_static.sh doom_agent_v2_ppo.zip x86
# 4. Run
./build-x86/doom_agent_static_host --model ignored -iwad wads/DOOM1.WADOr for bare-metal RISC-V under QEMU:
scripts/build_static.sh doom_agent_v2_ppo.zip rv32
qemu-system-riscv32 -machine virt \
-cpu rv32,v=true,vlen=128,zve32f=true -m 128M \
-nographic -bios none \
-semihosting-config enable=on,target=native \
-kernel build-rv32/doom_agent_rv32.elfTwo model variants, both using depthwise-separable convolutions for compute efficiency on constrained hardware:
Baseline — 3 conv blocks, flatten, single dense layer:
Visual (4, 45, 60) NCHW
→ DWSepConv(4→16, s2) → DWSepConv(16→32, s2) → DWSepConv(32→32, s2)
→ Flatten (1536) → concat with state (20) → Dense(256, ReLU)
→ 256-dim features → policy_net → sigmoid → 6 actions
V2 — 6 conv blocks, Global Average Pooling, two dense layers:
Visual (4, 60, 80) NCHW
→ DWSepConv(4→32, s2) → DWSepConv(32→64, s2) → DWSepConv(64→64, s1)
→ DWSepConv(64→128, s2) → DWSepConv(128→128, s1) → DWSepConv(128→192, s2)
→ GAP → (192,) → concat with state (20) = 212
→ Dense(256, ReLU) → Dense(128, ReLU)
→ 128-dim features → policy_net → sigmoid → 6 actions
Each DWSepConv block: depthwise 3×3 → pointwise 1×1
→ BatchNorm → ReLU. GAP replaces the flatten layer, reducing
the feature vector from 1536 to 192 and eliminating the TRANSPOSE →
RESHAPE sequence in the exported graph.
| Baseline | V2 | |
|---|---|---|
| Parameters | 401K | 149K |
| MACs per inference | 0.67M | 5.0M |
| INT8 TFLite size | ~390 KB | 202 KB |
| Scratch buffer | ~14 KB | 53 KB |
| Visual input | 45×60×4 | 60×80×4 |
| Feature dim | 256 | 128 |
| Inference (QEMU rv32) | — | ~1030 ms |
V2 trades 7.5× more MACs for 2.7× fewer parameters (GAP eliminates the large flatten→FC weight matrix). The smaller weight footprint matters more than MAC count on memory-constrained targets.
| Input | Shape | Description |
|---|---|---|
visual |
(4, H, W) float |
Grayscale frame stack, channels-first |
state |
(20,) float |
health, armor, ammo[4], weapon_onehot[9], velocity_xy[2], reserved[3] |
6 multi-binary actions with contradiction masking (forward+backward and left+right cannot be simultaneously active):
| Bit | Action |
|---|---|
| 0 | Forward |
| 1 | Backward |
| 2 | Turn left |
| 3 | Turn right |
| 4 | Fire |
| 5 | Use |
Action repeat: 4 tics per agent decision (~8.6 decisions/sec at 35 tic/sec).
Training uses Stable-Baselines3
PPO with a VizDoom environment wrapper (training/env.py).
# Single scenario
uv run python -m training.train --scenario basic --total-timesteps 500000
# Full curriculum (recommended)
scripts/train_curriculum.sh --model v2The curriculum progressively increases difficulty across 4 phases (2.5M total steps):
| Phase | Scenario | Steps | Skill |
|---|---|---|---|
| 1 | basic.cfg |
500K | Single room, 1 enemy — learn to aim and shoot |
| 2 | deadly_corridor.cfg |
1M | Hallway with enemies — move, shoot, dodge |
| 3 | defend_the_center.cfg |
1M | 360° arena — spatial awareness |
| 4 | e1m1_agent.cfg |
2.5M | Full E1M1 — navigation + combat |
Each phase resumes from the previous checkpoint. Monitor with
TensorBoard: tensorboard --logdir tb_doom/.
| Parameter | Value |
|---|---|
| Learning rate | 3×10-4 |
| n_steps | 2048 |
| Batch size | 64 |
| Epochs | 10 |
| Gamma | 0.99 |
| GAE lambda | 0.95 |
| Clip range | 0.2 |
| Entropy coef | 0.01 |
| Parallel envs | 8 |
Dense reward signals layered on top of the base game reward:
| Signal | Weight | Source |
|---|---|---|
| Kill | ×50 | KILLCOUNT delta |
| Health | ×0.5 | HEALTH delta |
| Ammo | ×0.2 | Total ammo delta |
| Movement | ×0.01 | XY position delta |
| Time penalty | −0.001 | Per step |
Converts a trained SB3 checkpoint to a fully-quantized INT8 TFLite model:
scripts/export.sh doom_agent_v2_ppo.zip --output-dir models/v2Pipeline steps:
- Extract inference policy — Strip value head, wrap feature extractor + policy net + sigmoid
- PyTorch → ONNX —
torch.onnx.export(opset 18) - Preprocess ONNX — Rewrite Conv pads to
auto_pad='SAME_UPPER'for TFLite compatibility - ONNX → TensorFlow —
onnx2tfwith NCHW→NHWC auto-transpose - Collect calibration data — 200 representative samples from VizDoom
- INT8 quantization — TFLite converter with full-integer quantization
- Verify — Compare INT8 vs FP32 outputs across 50 random inputs
Outputs: doom_agent.onnx, doom_agent_fp32.tflite,
doom_agent_fp16.tflite, doom_agent_int8.tflite.
Note: Export requires Python 3.11–3.13 (TensorFlow doesn't support 3.14). The export script auto-creates a separate venv if needed.
The static code generator (tools/codegen_graph.py) compiles a TFLite
INT8 model into plain C:
uv run python tools/codegen_graph.py \
--model models/v2/doom_agent_int8.tflite \
--output-dir inference/generatedThis produces:
| File | Contents |
|---|---|
doom_agent_weights.c/h |
Const weight arrays with .cnidoom.weights section attribute |
doom_agent_graph.c/h |
run_graph() — sequential kernel calls with liveness-optimized scratch buffer |
Key optimizations:
- TRANSPOSE elimination — Folds TRANSPOSE+RESHAPE into FC weight permutation
- Liveness-based scratch allocation — Greedy packing minimizes peak scratch (~53 KB vs 64 KB TFLM arena)
- Section attributes —
.cnidoom.weightsand.cnidoom.scratchfor linker-script-based memory placement - LUT generation — Pre-computed 256-entry tanh/logistic lookup tables
The inference code builds with CMake. Key options:
cmake -B build -S inference \
-DDOOM_AGENT_STATIC=ON \ # Enable codegen backend
-DDOOM_AGENT_KERNEL_TARGET=riscv \ # Kernel target: generic, x86, riscv
-DDOOM_AGENT_RV32=ON \ # Bare-metal RISC-V target
-DDOOM_AGENT_EMBEDDED_LIB=ON \ # Build libcnidoom.a
-DDOOM_AGENT_HOST=ON # SDL2 host binary (x86 only)| Target | Binary | Description |
|---|---|---|
doom_agent_host |
x86 | SDL2 display, TFLM from file |
doom_agent_static_host |
x86 | SDL2 display, static codegen (no TFLM) |
doom_agent_rv32.elf |
rv32 | Bare-metal RISC-V, QEMU virt + ramfb |
libcnidoom.a |
rv32 | Embedded library for firmware integration |
Platform-optimized implementations override generic C reference kernels:
| Target | ISA | Kernels |
|---|---|---|
generic |
Pure C | Reference implementations for all ops |
x86 |
AVX2 | conv2d, depthwise_conv2d, fully_connected, mean |
riscv |
RVV (Zve32x/f) | conv2d, depthwise_conv2d, fully_connected, mean, logistic_lut, tanh_lut |
scripts/build_static.sh runs the entire pipeline from checkpoint to
binary:
# x86 (AVX2 host, runs natively with SDL2)
scripts/build_static.sh doom_agent_v2_ppo.zip x86
# rv32 (bare-metal ELF, runs under QEMU)
scripts/build_static.sh doom_agent_v2_ppo.zip rv32Steps: export → codegen → cmake configure → build → bit-accuracy test → SDK export (rv32).
For integrating Doom + agent inference into your own firmware. The library provides weak-symbol fallbacks for all platform-specific functions; override what you need with strong symbols.
#include "cnidoom.h"
// Run with default QEMU virt config:
int main(void) { cnidoom_run(NULL); }
// Or configure:
cnidoom_config_t cfg = cnidoom_default_config();
cfg.wad_path = "DOOM1.WAD";
cfg.clint_mtime_base = 0x200BFF8;
cnidoom_run(&cfg);Override any of these with strong symbols in your platform .c file:
| Callback | Default | Purpose |
|---|---|---|
cnidoom_platform_init() |
no-op | One-time hardware init |
cnidoom_putc(char c) |
semihosting SYS_WRITEC |
Console output (for printf) |
cnidoom_draw(fb, w, h) |
no-op | Display XRGB8888 framebuffer |
cnidoom_get_ticks_ms() |
CLINT mtime or semihosting | Monotonic millisecond clock |
cnidoom_sleep_ms(ms) |
busy-wait on get_ticks_ms |
Sleep/delay |
The library annotates performance-critical data with named sections for memory placement on real hardware:
| Section | Contents | Typical placement | Size (V2) |
|---|---|---|---|
.cnidoom.weights |
Model weights (const) | SRAM / flash | ~200 KB |
.cnidoom.scratch |
Activation scratch (r/w) | fast SRAM / DTCM | ~53 KB |
.cnidoom.wad |
Embedded WAD (const) | DDR / ext flash | ~4 MB |
If your linker script doesn't mention these sections, they fall into
default parents (.rodata, .bss) and still work.
The rv32 build automatically produces a self-contained SDK in
build-rv32/cnidoom-sdk/:
cnidoom-sdk/
include/cnidoom.h Public API (the only header you need)
lib/libcnidoom.a Merged static library (2 MB)
lib/embedded_wad.o DOOM1.WAD as linkable object (optional)
cnidoom.mk Makefile fragment
BUILD_FLAGS Compiler flags reference
examples/ Working QEMU virt platform
Makefile Builds firmware.elf from the SDK
qemu_main.c Entry point
qemu_platform.c UART + ramfb overrides
startup.S, linker.ld Boot code + memory map
uart.c, ramfb.c/h QEMU virt drivers
Build from the SDK with a plain Makefile:
CNIDOOM_SDK := path/to/cnidoom-sdk
include $(CNIDOOM_SDK)/cnidoom.mk
firmware.elf: startup.o main.o my_platform.o
$(CC) -nostartfiles -Wl,--gc-sections -Tlinker.ld $^ $(CNIDOOM_LDFLAGS) -o $@# Python tests
uv run pytest
# C tests (host)
cmake -B build -S inference -DDOOM_AGENT_BUILD_TESTS=ON
cmake --build build && cd build && ctest
# AVX2 vs generic bit-accuracy (x86)
cmake -B build-x86 -S inference \
-DDOOM_AGENT_STATIC=ON -DDOOM_AGENT_KERNEL_TARGET=x86
cmake --build build-x86 && ./build-x86/test_x86_bitexact
# RVV vs generic bit-accuracy (QEMU)
cmake -B build-rv32 -S inference \
-DCMAKE_TOOLCHAIN_FILE=cmake/riscv32-elf-clang.cmake \
-DDOOM_AGENT_RV32=ON -DDOOM_AGENT_STATIC=ON \
-DDOOM_AGENT_KERNEL_TARGET=riscv
cmake --build build-rv32
qemu-system-riscv32 -machine virt \
-cpu rv32,v=true,vlen=128,zve32f=true -m 128M \
-nographic -bios none \
-semihosting-config enable=on,target=native \
-kernel build-rv32/test_rvv_bitexact.elf
# Model comparison (golden logs)
scripts/compare_models.shtraining/ Python — PPO training, evaluation, export
train.py Main training script
model.py DoomFeatureExtractor (baseline + V2)
env.py DoomHybridEnv (VizDoom wrapper)
export.py ONNX → TF → TFLite INT8 pipeline
calibrate.py Collect quantization calibration data
scenarios/ VizDoom config files
inference/ C/C++ — doomgeneric + agent inference
CMakeLists.txt Build system
doom_agent.h/c Core agent API (init, infer, destroy)
doom_agent_preprocess.c Frame downsampling + quantization
doom_agent_static.c Static codegen backend
doom_agent_tflm.cc TFLM interpreter backend
doom_agent_host.cc Host (file-based) backend
cnidoom.h/c Embedded library public API
cnidoom_platform_default.c Weak platform fallbacks
cnidoom_syscalls.c Portable libc stubs (semihosting)
w_file_cnidoom.c Unified WAD backend (embedded + stdc)
kernels/
generic/ Reference C kernels
x86/ AVX2-optimized kernels
riscv/ RVV-optimized kernels
platform/rv32/ QEMU virt bare-metal platform
startup.S Reset handler + vector unit init
linker.ld Memory map (ITCM/DTCM/SRAM/DDR)
qemu_main.c Entry point for library build
qemu_platform.c UART + ramfb strong overrides
uart.c, ramfb.c NS16550a UART, QEMU ramfb driver
syscalls.c Original monolithic syscalls
generated/ Auto-generated by codegen (gitignored)
tools/
codegen_graph.py TFLite INT8 → static C inference code
scripts/
download_wad.sh Fetch shareware DOOM1.WAD
train_curriculum.sh Phased curriculum training
export.sh Checkpoint → INT8 TFLite
build_static.sh Full pipeline + SDK export
compare_models.sh Golden log comparison across models
doomgeneric/ Git submodule (do not modify)
tflite-micro/ Git submodule (do not modify)
patches/ Submodule patches (applied at build time)
- Python ≥ 3.11 (training + export)
- uv package manager
- VizDoom (installed via
uv sync) - CMake ≥ 3.16 (inference build)
- RISC-V toolchain — auto-fetched by
cmake/riscv32-elf-clang.cmake(GCC 15, newlib, RV32IMF_Zve32x_Zve32f) - QEMU —
qemu-system-riscv32for bare-metal testing - SDL2 — for x86 host display (optional)
| Decision | Rationale |
|---|---|
| Depthwise-separable convolutions | 8× fewer multiply-accumulates than standard conv |
| INT8 full-integer quantization | No float ops at inference — runs on integer-only HW |
| NCHW training, NHWC export | PyTorch-native training, TFLite-native inference |
| Static codegen over TFLM interpreter | 53 KB scratch vs 64 KB arena; no C++ runtime |
| Weak-symbol platform API | One library binary, any board — just override what you need |
| Curriculum training | Gradual difficulty prevents catastrophic forgetting |
| Embedded WAD via objcopy | Zero-copy memory-mapped access; optional semihosting fallback |
| Target: RV32IMF_Zve32x_Zve32f | Integer + single-float + vector — Google Kelvin-class RISC-V |
doomgeneric is licensed under GPLv2. Shareware DOOM1.WAD is © id Software.