Unified management and routing for llama.cpp, MLX and vLLM models with web dashboard.
A lightweight terminal chat interface for the llama.cpp server, written in C++, with many features and Windows/Linux support.
A DevOps-friendly local LLM proxy.
A robust, production-ready Python toolkit that automates synchronization between a directory of .gguf model files and a llama-swap config.yaml.
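As a rough illustration of the idea (not the toolkit's actual code), the sketch below scans a folder of .gguf files and writes a llama-swap-style config.yaml. The /models path, the models:/cmd schema, and the ${PORT} placeholder are assumptions and may need adjusting for your llama-swap version.

```python
"""Sketch: regenerate a llama-swap-style config.yaml from a folder of .gguf files.
The exact llama-swap schema is assumed here; adjust keys to match your version."""
from pathlib import Path
import yaml  # pip install pyyaml

MODELS_DIR = Path("/models")       # assumed location of the .gguf files
CONFIG_PATH = Path("config.yaml")  # assumed llama-swap config location

def build_config(models_dir: Path) -> dict:
    models = {}
    for gguf in sorted(models_dir.glob("*.gguf")):
        # One llama-server command per model; llama-swap fills in ${PORT} at launch time.
        models[gguf.stem] = {"cmd": f"llama-server --port ${{PORT}} -m {gguf}"}
    return {"models": models}

if __name__ == "__main__":
    config = build_config(MODELS_DIR)
    CONFIG_PATH.write_text(yaml.safe_dump(config, sort_keys=True))
    print(f"Wrote {CONFIG_PATH} with {len(config['models'])} model entries")
```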
A production-grade Python SDK for llama-server that streamlines authentication, token rotation, observability, and PII masking—helping AI architects ship secure, traceable LLM systems with enterprise-ready guardrails.
A simple web application for real-time AI vision analysis using SmolVLM-500M-Instruct with live camera feed processing and text-to-speech.
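A minimal sketch of the kind of request such an app makes, assuming a vision-capable model (e.g. SmolVLM with its projector) is already loaded in llama-server on localhost:8080 and that the server accepts OpenAI-style image_url content parts. The endpoint, port, and prompt are assumptions; this is not the app's actual code.

```python
"""Sketch: send one camera frame to a local llama-server for description.
Assumes a vision-capable model is loaded and the server runs on localhost:8080."""
import base64
import requests

def describe_frame(jpeg_bytes: bytes) -> str:
    data_uri = "data:image/jpeg;base64," + base64.b64encode(jpeg_bytes).decode()
    payload = {
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what you see in one sentence."},
                {"type": "image_url", "image_url": {"url": data_uri}},
            ],
        }],
        "max_tokens": 128,
    }
    resp = requests.post("http://localhost:8080/v1/chat/completions", json=payload, timeout=60)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    with open("frame.jpg", "rb") as f:  # stand-in for a live camera capture
        print(describe_frame(f.read()))
```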
A Bash script that automatically launches llama-server, detects available .gguf models, and selects GPU layers based on your free VRAM.
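The script itself is Bash; as a hedged Python sketch of the same idea, the snippet below reads free VRAM from nvidia-smi and derives a --n-gpu-layers value before launching llama-server. The per-layer memory estimate, model path, and port are placeholder assumptions, and the NVIDIA-only query is an assumption as well.

```python
"""Sketch: pick --n-gpu-layers from free VRAM, then launch llama-server.
The MiB-per-layer figure is a rough guess and varies by model and quantization."""
import subprocess

def free_vram_mib() -> int:
    # Query the first GPU's free memory via nvidia-smi (NVIDIA-only assumption).
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.free", "--format=csv,noheader,nounits"],
        text=True,
    )
    return int(out.splitlines()[0].strip())

def launch(model_path: str, mib_per_layer: int = 350, max_layers: int = 99) -> None:
    layers = min(max_layers, free_vram_mib() // mib_per_layer)
    subprocess.run([
        "llama-server", "-m", model_path,
        "--n-gpu-layers", str(layers),
        "--port", "8080",
    ], check=True)

if __name__ == "__main__":
    launch("/models/example.gguf")  # hypothetical model path
```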
Create a code completion model & tool for IDEs that can run locally on consumer hardware and rival the performance of commercial products like Cursor.
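One way such a tool can talk to a local backend is llama.cpp's llama-server, which exposes a fill-in-the-middle /infill endpoint when a FIM-capable model is loaded. The sketch below is an illustration under those assumptions (host, port, and parameters are placeholders), not the project's implementation.

```python
"""Sketch: fill-in-the-middle completion against llama-server's /infill endpoint.
Assumes a FIM-capable code model is loaded and the server runs on localhost:8080."""
import requests

def complete_middle(prefix: str, suffix: str) -> str:
    payload = {
        "input_prefix": prefix,
        "input_suffix": suffix,
        "n_predict": 64,
        "temperature": 0.2,
    }
    resp = requests.post("http://localhost:8080/infill", json=payload, timeout=30)
    resp.raise_for_status()
    return resp.json()["content"]

if __name__ == "__main__":
    before = "def add(a, b):\n    "
    after = "\n\nprint(add(2, 3))\n"
    print(before + complete_middle(before, after) + after)
```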
Hikma is a minimal GTK4 chat client in Vala for OpenAI‑compatible APIs. It renders messages as plain text, stores settings securely via libsecret, and builds with Meson/Ninja plus a simple Debian packaging flow.
FIMpad is a FIM-focused local LLM interface in the form of a tabbed GUI text editor.