Dreams

Write-ups of what I built and how it went.

by Zetaphor

Running DramaBox on Strix Halo

Getting Resemble AI's expressive TTS model running on AMD Strix Halo with no NVIDIA hardware. TheRock gfx1151 nightlies, bitsandbytes preview for ROCm, reduced step counts, and torch.compile bringing the 3.3B DiT from a real-time factor (RTF) of 4.0 down to 1.75.


Benchmarking Echo-TTS on Strix Halo

Running a diffusion-based TTS model on AMD's Strix Halo, patching CUDA-only code for CPU, discovering a bf16 GPU hang on gfx1151, and a hybrid GPU/CPU trick that beats every other TTS model I've tested.


Friends Don't Let Friends Use Ollama

Ollama gained traction by being the first easy llama.cpp wrapper, then spent years dodging attribution, misleading users, and pivoting to cloud, all while riding VC money earned on someone else's engine. Here's the full history, and why the alternatives are better.


Local LLM Infrastructure on Strix Halo

How LiteLLM, llama-swap, and Lemonade Server compose into a unified local inference platform, routing dozens of models across GPU and NPU through a single API endpoint, accessible anywhere via Tailscale and a local reverse proxy.


Benchmarking VoxCPM2 on Strix Halo

Running a 2B-parameter, tokenizer-free TTS model in both Python and C++ on AMD's integrated GPU, near-real-time speech synthesis on CPU, and the Vulkan crash that stopped GPU acceleration in its tracks.


LoopMaker Web

A browser-based AI music generation tool powered by ACE-Step, ported to Linux for local generation on AMD Strix Halo hardware.


OCR List Maker

Snap a photo of a handwritten list, OCR it with a local vision model, and print a formatted checklist on a thermal receipt printer.


You Are John

A text-driven simulation where you interact with a guy named John through natural language, and an LLM determines how his world responds.
