#infra


by Zetaphor

Local LLM Infrastructure on Strix Halo

How LiteLLM, llama-swap, and Lemonade Server compose into a unified local inference platform, routing dozens of models across GPU and NPU through a single API endpoint that is accessible anywhere via Tailscale and a local reverse proxy.
