Lavabo: The Kitchen Sink of Local AI
An all-in-one Docker container that bundles LLMs, embeddings, vision, and TTS into a single unified inference server.
Every local AI project I build needs some combination of the same pieces: an LLM for text generation, an embedding model for search, maybe TTS for audio output, maybe a vision model for image understanding. Each one has its own server, its own dependencies, its own Docker container, its own port. Multiply that across a few projects and you’re managing a small fleet of containers just to have a baseline set of AI capabilities available.
Lavabo is the answer to that sprawl. It’s a single Docker container that bundles all of these capabilities behind unified HTTP endpoints. The name is Spanish for washbasin — a nod to the “kitchen sink” approach of just putting everything in one place.
What’s Inside
The container serves a surprisingly broad set of models through a single FastAPI server:
- Any GGUF LLM for chat, completion, and structured output — point it at a local model file or a Hugging Face repo
- Transformers-based text embeddings with built-in similarity search helpers
- Kokoro TTS for quick, compact text-to-speech with multiple built-in voices
- Piper TTS for a wider selection of voices and accents, with support for training custom voices
- CLIP for zero-shot image classification using natural language prompts
- Moondream for heavier vision tasks — image captioning, visual Q&A, object detection, and pointing
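The "similarity search helpers" on the embedding side boil down to a familiar technique: cosine similarity between vectors, with a top-k ranking on top. Here is a generic sketch of that idea in plain Python; it illustrates the math, not Lavabo's actual implementation:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query: list[float], corpus: dict[str, list[float]], k: int = 3) -> list[str]:
    """Rank corpus entries by similarity to the query embedding, best first."""
    scored = [(cosine_similarity(query, vec), key) for key, vec in corpus.items()]
    return [key for _, key in sorted(scored, reverse=True)[:k]]
```

In practice the vectors would come from the embedding endpoint rather than being built by hand.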
Everything is exposed through a Swagger UI at /docs, so you can explore and test endpoints without writing any client code.
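Calling the server from code is just an HTTP POST. As a sketch of what that looks like with the standard library (the /llm/chat route, the default port 8000, and the "text" response field are assumptions here; the real routes and schemas are what the Swagger UI shows):

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed default port; check your compose file

def build_chat_request(prompt: str, max_tokens: int = 256) -> urllib.request.Request:
    """Build a POST request for the LLM chat endpoint (route name assumed)."""
    body = json.dumps({"prompt": prompt, "max_tokens": max_tokens}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/llm/chat",  # hypothetical route; see /docs for the real one
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def chat(prompt: str) -> str:
    """Send the prompt to the running container and return the generated text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        return json.loads(resp.read())["text"]  # response field name assumed

# Usage, with the container running:
#   print(chat("Name three uses of a washbasin."))
```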
Why One Container
The obvious argument against this approach is that monoliths are bad. And sure, in production you’d probably want separate services with independent scaling. But for local development, having a single docker compose up that gives you an LLM, embeddings, TTS, and vision is a massive reduction in friction. I can prototype an idea that needs three different model types without spending the first hour on infrastructure.
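For a sense of what "a single docker compose up" means in practice, a minimal compose file might look like the following. The image name, port, and volume layout are assumptions for illustration, not the project's actual configuration:

```yaml
# Hypothetical compose file; adjust image, port, and paths to match Lavabo's docs.
services:
  lavabo:
    image: lavabo:latest        # assumed image name
    ports:
      - "8000:8000"             # FastAPI server, Swagger UI at /docs
    volumes:
      - ./models:/models        # GGUF files and other model weights
```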
It also makes the models composable in ways that separate containers don’t easily allow. Chain an LLM response into TTS. Use CLIP to classify an image, then pass the label to the LLM for elaboration. The examples directory has a few of these patterns ready to go.
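The LLM-into-TTS chain from the paragraph above can be sketched as a small pipeline. The routes and response fields below are assumptions (check /docs for the real ones); the generation and speech steps are injectable so the chaining itself is easy to test offline:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed default port

def post_json(path: str, payload: dict) -> bytes:
    """POST a JSON payload to the server and return the raw response body."""
    req = urllib.request.Request(
        f"{BASE_URL}{path}",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()

def narrate(prompt: str, generate=None, speak=None) -> bytes:
    """Chain LLM generation into TTS: prompt -> text -> audio bytes."""
    generate = generate or (lambda p: json.loads(
        post_json("/llm/chat", {"prompt": p}))["text"])  # route and field assumed
    speak = speak or (lambda t: post_json("/tts/kokoro", {"text": t}))  # route assumed
    return speak(generate(prompt))

# Usage, with the container running:
#   audio = narrate("Explain what a washbasin is in one sentence.")
#   open("out.wav", "wb").write(audio)  # audio format assumed
```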
The Tradeoff
Lavabo is intentionally built around compact models. These aren’t the biggest or best models available — they’re the ones that actually fit in a single container and run on consumer hardware. The philosophy is accessibility over peak performance. If you need frontier model quality, you’ll outgrow this. If you need a capable local inference stack that just works, this is it.