Every AI agent framework I’ve looked at wants to be the whole stack. They bundle the LLM integration, the tool execution, the memory, the UI — all in one monolith with opinions I don’t share. I wanted something I could wire together myself, where each piece is a standalone service that does one thing and exposes an HTTP API. That project became Oneiros.

The Architecture

Oneiros is a collection of single-purpose FastAPI and Node.js servers that compose into a full agent system:

  • Agent server — the core runtime, built on OpenAI’s Agents SDK. It handles conversation threads, fetches relevant tools per request, retrieves user memories, and talks to whatever OpenAI-compatible LLM you point it at.
  • Tool server — loads MCP tools from a Cursor-style mcp_settings.json and exposes them as HTTP endpoints. On startup it embeds every tool’s description, so at request time it can return only the tools relevant to the current conversation. This means the agent doesn’t get overwhelmed with fifty tools when it only needs two.
  • Memory server — classifies and stores user memories as semantic, episodic, or procedural. An LLM extracts memories from conversations with confidence scoring, and semantic deduplication prevents redundant entries. Memories are stored in both SQLite and human-readable markdown files.
  • Embedding server — handles all vector operations. It generates text embeddings, stores them with sqlite-vec, and serves similarity search for both the tool server and the memory server.
  • Browser server — Playwright-based browser automation with AI-optimized accessibility tree snapshots. The key innovation is persistent element refs — the agent sees button "Submit" [ref=e1] and can reference e1 in subsequent actions without fragile CSS selectors.
  • TTS server — OpenAI-compatible text-to-speech endpoint backed by PocketTTS, with PCM streaming support.
  • Generator server — the wildest piece. It uses OpenCode CLI in an iterative loop to generate single-file HTML5 apps from natural language prompts, then validates them in a real browser using vision-based testing. If validation fails, the errors get fed back and it tries again.
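To give a feel for how the tool server is configured: a Cursor-style mcp_settings.json maps server names to the commands that launch them. This is a rough sketch — the server names and packages here are made up:

```json
{
  "mcpServers": {
    "web-search": {
      "command": "npx",
      "args": ["-y", "@example/mcp-web-search"]
    },
    "filesystem": {
      "command": "node",
      "args": ["./servers/filesystem.js"]
    }
  }
}
```

The tool server reads this file at startup, launches each MCP server, and embeds every tool description it finds.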

The Interfaces

On top of the service layer, there are multiple ways to talk to the agent:

  • A web chat UI with streaming responses, voice input via Whisper, TTS playback, model selection, and conversation history
  • A Discord bot that joins voice channels, does real-time STT, forwards transcripts to the agent, and speaks responses back via TTS
  • A Telegram bot with text and voice note support
  • A WhatsApp bot using the Baileys library

All three bots share a cross-platform identity system — a single identities.json maps platform-specific IDs to canonical usernames, so the agent recognizes me whether I’m on Discord, Telegram, or WhatsApp. Conversation threads and memories follow me across platforms.
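The shape of identities.json is simple: one canonical username per person, keyed by platform-specific IDs. These values are invented for illustration — the real file obviously looks different:

```json
{
  "identities": [
    {
      "username": "alice",
      "discord": "123456789012345678",
      "telegram": "987654321",
      "whatsapp": "15551234567@s.whatsapp.net"
    }
  ]
}
```

Each bot resolves its platform ID to the canonical username before talking to the agent server, which is what lets threads and memories follow a person across platforms.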

Context-Aware Tool Selection

The part I’m most pleased with is how tool selection works. The tool server doesn’t just dump every available tool into the agent’s context. It embeds all tool descriptions at startup. At request time, the agent server sends the recent conversation history and gets back only the most semantically relevant tools, filtered by a tunable similarity threshold, with a minimum-return guarantee so edge cases still work.

This means I can have dozens of MCP servers configured — web search, email, calendar, file operations, browser control, custom scripts — and the agent only sees what’s relevant to what we’re actually talking about.
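The core of the filter fits in a few lines. This is a minimal sketch, not the actual Oneiros code — the names (select_tools, SCORE_THRESHOLD, MIN_TOOLS) and the specific values are assumptions, and in the real system the embeddings come from the embedding server rather than being passed in directly:

```python
import math

SCORE_THRESHOLD = 0.35  # tunable relevance cutoff (assumed value)
MIN_TOOLS = 2           # minimum-return guarantee (assumed value)

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def select_tools(query_vec, tool_index):
    """tool_index: list of (tool_name, embedding) pairs built at startup."""
    scored = sorted(
        ((cosine(query_vec, vec), name) for name, vec in tool_index),
        reverse=True,
    )
    # Keep everything above the threshold...
    selected = [name for score, name in scored if score >= SCORE_THRESHOLD]
    # ...but never return fewer than MIN_TOOLS, so a vague or off-topic
    # query still gives the agent something to work with.
    if len(selected) < MIN_TOOLS:
        selected = [name for _, name in scored[:MIN_TOOLS]]
    return selected
```

The minimum-return branch is the part that matters in practice: without it, a short or ambiguous message can score below the threshold against every tool and leave the agent with nothing.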

The Generator

The generator server deserves its own mention. It uses the Ralph Wiggum Loop — a stateless iterative pattern where, on each iteration, an LLM reads the current file-system state, makes changes, and either declares the task complete or keeps going. Browser validation catches runtime errors that pure code generation misses, and vision-based testing can identify visual issues and interact with the generated UI.
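Stripped of the OpenCode and Playwright plumbing, the loop itself is tiny. This is a sketch under stated assumptions: generate_step stands in for the OpenCode CLI invocation and validate for the browser validation pass, and both names are mine, not the project's:

```python
def ralph_wiggum_loop(prompt, generate_step, validate, max_iters=5):
    """Stateless iterative generation: each pass sees only the prompt,
    the current output, and the previous round's validation errors."""
    html, errors = "", []
    for _ in range(max_iters):
        # The model gets the failures fed back and tries again.
        html = generate_step(prompt, html, errors)
        errors = validate(html)  # e.g. console errors from a real browser
        if not errors:
            return html  # validation passed: task complete
    return html  # best effort after max_iters
```

The statelessness is the point — there is no conversation history to drift or bloat, just the file on disk and the latest errors, so a bad iteration can't poison the ones after it.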

There’s even a random generation mode where an LLM generates a creative app idea, the generator builds it, and the result gets validated automatically. It’s a fun way to see what falls out of the process unsupervised.

Why Microservices for a Personal Project

It might seem like overkill, but there are real benefits. I can restart the memory server without dropping a Discord call. I can develop the tool server independently and test it with curl. Each service has its own little web UI at /ui for manual poking. And when something breaks, the blast radius is contained.

Everything runs as systemd user services with health check dependencies, so ./start-services.sh brings up the whole stack in the right order. For deployment elsewhere, there’s a Docker Compose setup that wires the core services together.
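For the curious, a health-check dependency in a systemd user unit looks roughly like this. Everything here — service names, ports, the /health endpoint, the uvicorn invocation — is illustrative, not copied from the repo:

```ini
# ~/.config/systemd/user/oneiros-agent.service (illustrative)
[Unit]
Description=Oneiros agent server
# Order after the services the agent depends on
After=oneiros-embedding.service oneiros-tools.service
Wants=oneiros-embedding.service oneiros-tools.service

[Service]
# Block startup until the embedding server answers its health endpoint
ExecStartPre=/bin/sh -c 'until curl -sf http://localhost:8001/health; do sleep 1; done'
ExecStart=%h/.local/bin/uvicorn agent.main:app --port 8000
Restart=on-failure

[Install]
WantedBy=default.target
```

After= only controls ordering, so the ExecStartPre poll is what actually guarantees the dependency is healthy before the agent starts.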