#infra


by Zetaphor

Local LLM Infrastructure on Strix Halo

How LiteLLM, llama-swap, and Lemonade Server compose into a unified local inference platform, routing dozens of models across GPU and NPU through a single API endpoint that is accessible anywhere via Tailscale and a local reverse proxy.
