<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"><channel><title>Sleeping Robots</title><description>Project write-ups, experiments, and things I wanted to exist.</description><link>https://sleepingrobots.com/</link><item><title>ROCm 7 on Strix Halo: Benchmarking the New Toolbox Images</title><link>https://sleepingrobots.com/dreams/rocm7-toolbox-upgrade-strix-halo/</link><guid isPermaLink="true">https://sleepingrobots.com/dreams/rocm7-toolbox-upgrade-strix-halo/</guid><description>AMD released ROCm 7.13 with Strix Halo optimizations. I benchmarked kyuz0&apos;s latest toolbox images against my current ROCm 6.4.4 production baseline to see if upgrading my llama-swap stack is worth it. The answer is complicated.</description><pubDate>Sun, 17 May 2026 00:00:00 GMT</pubDate><category>amd</category><category>local</category><category>llm</category><category>linux</category><category>strix-halo</category><category>hardware</category><category>infra</category><category>benchmark</category><author>Zetaphor</author></item><item><title>Running DramaBox on Strix Halo</title><link>https://sleepingrobots.com/dreams/dramabox-strix-halo/</link><guid isPermaLink="true">https://sleepingrobots.com/dreams/dramabox-strix-halo/</guid><description>Getting Resemble AI&apos;s expressive TTS model running on AMD Strix Halo with no NVIDIA hardware. 
TheRock gfx1151 nightlies, bitsandbytes preview for ROCm, reduced step counts, and torch.compile bringing the 3.3B DiT from RTF 4.0 down to 1.75.</description><pubDate>Fri, 15 May 2026 00:00:00 GMT</pubDate><category>amd</category><category>local</category><category>voice</category><category>linux</category><category>strix-halo</category><category>benchmark</category><category>python</category><author>Zetaphor</author></item><item><title>Unsloth Studio on Strix Halo: Full GPU Training Without System ROCm</title><link>https://sleepingrobots.com/dreams/unsloth-studio-strix-halo/</link><guid isPermaLink="true">https://sleepingrobots.com/dreams/unsloth-studio-strix-halo/</guid><description>Getting Unsloth Studio&apos;s full training pipeline running on AMD Strix Halo (gfx1151) using pip-packaged ROCm nightlies, no /opt/rocm required. Chat, training, data recipes, and model export all working on Fedora 43.</description><pubDate>Mon, 11 May 2026 00:00:00 GMT</pubDate><category>llm</category><category>local</category><category>hardware</category><category>amd</category><category>linux</category><category>strix-halo</category><category>training</category><category>python</category><author>Zetaphor</author></item><item><title>Gemma 4 MTP Assistant: 3.7x Faster 31B and +45% Faster 26B-A4B on Strix Halo</title><link>https://sleepingrobots.com/dreams/gemma4-mtp-assistant-strix-halo/</link><guid isPermaLink="true">https://sleepingrobots.com/dreams/gemma4-mtp-assistant-strix-halo/</guid><description>Google&apos;s official Gemma 4 MTP assistant heads bring speculative decoding to MoE models that couldn&apos;t benefit before, and nearly quadruple dense model throughput on AMD Strix Halo&apos;s bandwidth-limited unified memory.</description><pubDate>Sun, 10 May 2026 00:00:00 GMT</pubDate>
<category>llm</category><category>local</category><category>hardware</category><category>amd</category><category>linux</category><category>strix-halo</category><category>benchmark</category><author>Zetaphor</author></item><item><title>Optimizing Echo-TTS: CPU Beats GPU</title><link>https://sleepingrobots.com/dreams/echo-tts-optimizations/</link><guid isPermaLink="true">https://sleepingrobots.com/dreams/echo-tts-optimizations/</guid><description>Eight optimization attempts on Echo-TTS CPU inference, the five that worked, quality evaluation with voice cloning, and how the optimized CPU path ended up faster than the GPU hybrid.</description><pubDate>Thu, 07 May 2026 00:00:00 GMT</pubDate><category>amd</category><category>local</category><category>voice</category><category>linux</category><category>strix-halo</category><category>benchmark</category><category>python</category><author>Zetaphor</author></item><item><title>MTP Speculative Decoding: 4.8x Faster Qwen 3.6 27B on Strix Halo</title><link>https://sleepingrobots.com/dreams/mtp-qwen36-strix-halo/</link><guid isPermaLink="true">https://sleepingrobots.com/dreams/mtp-qwen36-strix-halo/</guid><description>Multi-Token Prediction turns Qwen 3.6 27B from 6 t/s to 30 t/s on AMD Strix Halo, succeeding where draft models and ngram decoding failed, by using prediction heads baked into the model itself.</description><pubDate>Wed, 06 May 2026 00:00:00 GMT</pubDate><category>llm</category><category>local</category><category>hardware</category><category>amd</category><category>linux</category><category>strix-halo</category><category>benchmark</category><author>Zetaphor</author></item><item><title>Benchmarking Echo-TTS on Strix Halo</title><link>https://sleepingrobots.com/dreams/echo-tts-strix-halo/</link><guid isPermaLink="true">https://sleepingrobots.com/dreams/echo-tts-strix-halo/</guid><description>Running a diffusion-based TTS model on AMD&apos;s Strix Halo, patching CUDA-only code for CPU, discovering a bf16 
GPU hang on gfx1151, and a hybrid GPU/CPU trick that beats every other TTS model I&apos;ve tested.</description><pubDate>Sun, 03 May 2026 00:00:00 GMT</pubDate><category>amd</category><category>local</category><category>voice</category><category>linux</category><category>strix-halo</category><category>benchmark</category><author>Zetaphor</author></item><item><title>Running HY-World 2.0 on Strix Halo: 3D World Reconstruction on an AMD iGPU</title><link>https://sleepingrobots.com/dreams/hy-world-2-strix-halo/</link><guid isPermaLink="true">https://sleepingrobots.com/dreams/hy-world-2-strix-halo/</guid><description>Porting Tencent&apos;s CUDA-only 3D world model to AMD&apos;s Radeon 8060S via ROCm Docker, with flash-attention CK kernels, a fully compiled gsplat with wave32 patches, and complete 3D reconstruction output including Gaussian splats.</description><pubDate>Fri, 17 Apr 2026 00:00:00 GMT</pubDate><category>amd</category><category>local</category><category>linux</category><category>docker</category><category>hardware</category><category>strix-halo</category><category>3D</category><category>benchmark</category><author>Zetaphor</author></item><item><title>Friends Don&apos;t Let Friends Use Ollama</title><link>https://sleepingrobots.com/dreams/stop-using-ollama/</link><guid isPermaLink="true">https://sleepingrobots.com/dreams/stop-using-ollama/</guid><description>Ollama gained traction by being the first easy llama.cpp wrapper, then spent years dodging attribution, misleading users, and pivoting to cloud, all while riding VC money earned on someone else&apos;s engine. 
Here&apos;s the full history, and why the alternatives are better.</description><pubDate>Wed, 15 Apr 2026 00:00:00 GMT</pubDate><category>llm</category><category>local</category><category>open-source</category><category>opinion</category><author>Zetaphor</author></item><item><title>Speculative Decoding on Strix Halo: 2x Faster Gemma 4 31B Token Generation</title><link>https://sleepingrobots.com/dreams/speculative-decoding-gemma4-strix-halo/</link><guid isPermaLink="true">https://sleepingrobots.com/dreams/speculative-decoding-gemma4-strix-halo/</guid><description>Benchmarking speculative decoding with Gemma 4 E2B as a draft model for Gemma 4 31B on AMD Strix Halo, a bandwidth-bound setup where the optimal draft-max differs from discrete GPUs.</description><pubDate>Sun, 12 Apr 2026 00:00:00 GMT</pubDate><category>llm</category><category>local</category><category>hardware</category><category>amd</category><category>linux</category><category>infra</category><category>strix-halo</category><category>benchmark</category><author>Zetaphor</author></item><item><title>Pi Web UI: A Browser Interface for the Pi Coding Agent</title><link>https://sleepingrobots.com/dreams/pi-web-ui/</link><guid isPermaLink="true">https://sleepingrobots.com/dreams/pi-web-ui/</guid><description>A full-stack web interface that puts the Pi coding agent in the browser, with system-level access, session history, and model switching through a local LiteLLM proxy.</description><pubDate>Sat, 11 Apr 2026 00:00:00 GMT</pubDate><category>agents</category><category>llm</category><category>local</category><category>infra</category><category>linux</category><author>Zetaphor</author></item><item><title>Local LLM Infrastructure on Strix Halo</title><link>https://sleepingrobots.com/dreams/local-llm-infrastructure-strix-halo/</link><guid isPermaLink="true">https://sleepingrobots.com/dreams/local-llm-infrastructure-strix-halo/</guid><description>How LiteLLM, llama-swap, and Lemonade Server compose into a unified 
local inference platform, routing dozens of models across GPU and NPU through a single API endpoint, accessible anywhere via Tailscale and a local reverse proxy.</description><pubDate>Fri, 10 Apr 2026 00:00:00 GMT</pubDate><category>strix-halo</category><category>llm</category><category>amd</category><category>local</category><category>hardware</category><category>linux</category><category>infra</category><category>docker</category><category>npu</category><author>Zetaphor</author></item><item><title>Running LLMs on the AMD NPU with Lemonade Server</title><link>https://sleepingrobots.com/dreams/lemonade-server-npu-strix-halo/</link><guid isPermaLink="true">https://sleepingrobots.com/dreams/lemonade-server-npu-strix-halo/</guid><description>Setting up AMD&apos;s Lemonade Server on Strix Halo to run LLM and Whisper inference on the XDNA 2 NPU: driver builds, architecture decisions, and benchmarks against the integrated GPU.</description><pubDate>Thu, 09 Apr 2026 00:00:00 GMT</pubDate><category>llm</category><category>amd</category><category>local</category><category>hardware</category><category>linux</category><category>speech-to-text</category><category>infra</category><category>strix-halo</category><category>npu</category><category>benchmark</category><author>Zetaphor</author></item><item><title>Benchmarking OmniVoice on Strix Halo</title><link>https://sleepingrobots.com/dreams/omnivoice-strix-halo/</link><guid isPermaLink="true">https://sleepingrobots.com/dreams/omnivoice-strix-halo/</guid><description>Running a 600+ language zero-shot TTS model on an AMD integrated GPU: voice cloning benchmarks, ROCm compatibility adventures, and the container workaround that actually worked.</description><pubDate>Thu, 09 Apr 2026 00:00:00 GMT</pubDate><category>amd</category><category>local</category><category>voice</category><category>strix-halo</category><category>benchmark</category><author>Zetaphor</author></item>
<item><title>Benchmarking VoxCPM2 on Strix Halo</title><link>https://sleepingrobots.com/dreams/voxcpm-strix-halo/</link><guid isPermaLink="true">https://sleepingrobots.com/dreams/voxcpm-strix-halo/</guid><description>Running a 2B-parameter tokenizer-free TTS model in both Python and C++ on AMD&apos;s integrated GPU: near-real-time speech synthesis on CPU, and the Vulkan crash that stopped GPU acceleration in its tracks.</description><pubDate>Thu, 09 Apr 2026 00:00:00 GMT</pubDate><category>amd</category><category>local</category><category>voice</category><category>linux</category><category>strix-halo</category><category>benchmark</category><author>Zetaphor</author></item><item><title>Self-Hosting Fish Audio on Strix Halo</title><link>https://sleepingrobots.com/dreams/fish-audio-strix-halo/</link><guid isPermaLink="true">https://sleepingrobots.com/dreams/fish-audio-strix-halo/</guid><description>Running Fish Audio&apos;s 4B-parameter S2-Pro text-to-speech model locally on an AMD Strix Halo integrated GPU via ROCm and Podman.</description><pubDate>Sun, 15 Mar 2026 00:00:00 GMT</pubDate><category>voice</category><category>amd</category><category>local</category><category>strix-halo</category><category>benchmark</category><author>Zetaphor</author></item><item><title>Medium-Claw: A Persistent AI Companion on Telegram</title><link>https://sleepingrobots.com/dreams/medium-claw/</link><guid isPermaLink="true">https://sleepingrobots.com/dreams/medium-claw/</guid><description>A Telegram bot backed by the Pi coding agent with autonomous scheduling, persistent memory, cross-session search, and a web dashboard.</description><pubDate>Sun, 15 Mar 2026 00:00:00 GMT</pubDate><category>agents</category><category>local</category><author>Zetaphor</author></item><item><title>LoopMaker Web</title><link>https://sleepingrobots.com/dreams/loopmaker-web/</link><guid isPermaLink="true">https://sleepingrobots.com/dreams/loopmaker-web/</guid><description>A browser-based AI music generation tool powered by ACE-Step, ported to Linux for 
local generation on AMD Strix Halo hardware.</description><pubDate>Thu, 05 Mar 2026 00:00:00 GMT</pubDate><category>music</category><category>local</category><category>linux</category><category>strix-halo</category><author>Zetaphor</author></item><item><title>QuizForge: Self-Learning Quiz Maker</title><link>https://sleepingrobots.com/dreams/self-learning-quiz-maker/</link><guid isPermaLink="true">https://sleepingrobots.com/dreams/self-learning-quiz-maker/</guid><description>A full-stack quiz platform that turns markdown files and YouTube transcripts into mixed-format quizzes with AI grading, contextual chat, and performance analytics.</description><pubDate>Fri, 20 Feb 2026 00:00:00 GMT</pubDate><category>local</category><category>tools</category><author>Zetaphor</author></item><item><title>Oneiros: A Personal AI Agent Platform</title><link>https://sleepingrobots.com/dreams/oneiros/</link><guid isPermaLink="true">https://sleepingrobots.com/dreams/oneiros/</guid><description>A modular collection of services for building a personal AI agent: tool use, memory, browser automation, TTS, and multi-platform chat interfaces.</description><pubDate>Fri, 30 Jan 2026 00:00:00 GMT</pubDate><category>agents</category><category>local</category><category>infra</category><author>Zetaphor</author></item><item><title>OCR List Maker</title><link>https://sleepingrobots.com/dreams/ocr-list-maker/</link><guid isPermaLink="true">https://sleepingrobots.com/dreams/ocr-list-maker/</guid><description>Snap a photo of a handwritten list, OCR it with a local vision model, and print a formatted checklist on a thermal receipt printer.</description><pubDate>Sun, 28 Dec 2025 00:00:00 GMT</pubDate><category>ocr</category><category>local</category><category>hardware</category><author>Zetaphor</author></item><item><title>llama-cpp-python in Docker</title><link>https://sleepingrobots.com/dreams/llama-cpp-python-docker/</link><guid 
isPermaLink="true">https://sleepingrobots.com/dreams/llama-cpp-python-docker/</guid><description>A Dockerfile and docker-compose setup for running llama.cpp with its Python bindings in a container, because finding a working one shouldn&apos;t be this hard.</description><pubDate>Mon, 03 Nov 2025 00:00:00 GMT</pubDate><category>llm</category><category>docker</category><category>local</category><category>infra</category><author>Zetaphor</author></item><item><title>Lavabo: The Kitchen Sink of Local AI</title><link>https://sleepingrobots.com/dreams/lavabo/</link><guid isPermaLink="true">https://sleepingrobots.com/dreams/lavabo/</guid><description>An all-in-one Docker container that bundles LLMs, embeddings, vision, and TTS into a single unified inference server.</description><pubDate>Sun, 10 Aug 2025 00:00:00 GMT</pubDate><category>llm</category><category>docker</category><category>local</category><category>infra</category><author>Zetaphor</author></item><item><title>Web Browser Wrapped</title><link>https://sleepingrobots.com/dreams/web-browser-wrapped/</link><guid isPermaLink="true">https://sleepingrobots.com/dreams/web-browser-wrapped/</guid><description>Generating weekly Spotify Wrapped-style reports from browser history using local models and Browser Recall data.</description><pubDate>Mon, 14 Apr 2025 00:00:00 GMT</pubDate><category>browser</category><category>recall</category><category>local</category><author>Zetaphor</author></item><item><title>Speech To Text Typing for Wayland Users</title><link>https://sleepingrobots.com/dreams/speech-to-text-typing-wayland/</link><guid isPermaLink="true">https://sleepingrobots.com/dreams/speech-to-text-typing-wayland/</guid><description>Building a custom speech-to-text solution for Linux Wayland users using NVIDIA&apos;s Canary model, Silero VAD, and ydotool.</description><pubDate>Sun, 13 Apr 2025 00:00:00 GMT</pubDate>
<category>speech-to-text</category><category>linux</category><category>python</category><category>local</category><author>Zetaphor</author></item><item><title>Total (Browser) Recall</title><link>https://sleepingrobots.com/dreams/total-browser-recall/</link><guid isPermaLink="true">https://sleepingrobots.com/dreams/total-browser-recall/</guid><description>Building a personal browser history search engine with full-text recall, inspired by Microsoft&apos;s Recall and rewind.ai.</description><pubDate>Sun, 13 Apr 2025 00:00:00 GMT</pubDate><category>browser</category><category>recall</category><category>productivity</category><category>local</category><author>Zetaphor</author></item><item><title>A Practical, Fully Local Desktop Voice Agent</title><link>https://sleepingrobots.com/dreams/desktop-voice-agent/</link><guid isPermaLink="true">https://sleepingrobots.com/dreams/desktop-voice-agent/</guid><description>Building a natural-language voice controller for the Linux desktop using Qt6, a tiny 1.7B LLM, and a clever vector embedding trick for tool calling.</description><pubDate>Wed, 05 Feb 2025 00:00:00 GMT</pubDate><category>voice</category><category>agents</category><category>local</category><category>linux</category><category>llm</category><author>Zetaphor</author></item><item><title>A Fully Local, In-Browser Voice Assistant</title><link>https://sleepingrobots.com/dreams/browser-based-voice-assistant/</link><guid isPermaLink="true">https://sleepingrobots.com/dreams/browser-based-voice-assistant/</guid><description>Building a private, browser-based voice assistant using WebAssembly, Moonshine STT, Piper TTS, and local LLMs.</description><pubDate>Thu, 16 Jan 2025 00:00:00 GMT</pubDate><category>voice</category><category>local</category><category>wasm</category><category>llm</category><author>Zetaphor</author></item><item><title>You Are John</title><link>https://sleepingrobots.com/dreams/you-are-john/</link><guid 
isPermaLink="true">https://sleepingrobots.com/dreams/you-are-john/</guid><description>A text-driven simulation where you interact with a guy named John through natural language, and an LLM determines how his world responds.</description><pubDate>Sun, 13 Oct 2024 00:00:00 GMT</pubDate><category>llm</category><category>games</category><category>local</category><author>Zetaphor</author></item></channel></rss>