Running LLMs on the AMD NPU with Lemonade Server
Setting up AMD's Lemonade Server on Strix Halo to run LLM and Whisper inference on the XDNA 2 NPU: driver builds, architecture decisions, and benchmarks against the integrated GPU.
Building a custom speech-to-text solution for Linux Wayland users using NVIDIA's Canary model, Silero VAD, and ydotool.