“It feels like I’ve plugged a GPU into my desktop — but it’s just the Mac mini M4 doing all the work.”
The dream of running powerful large language models (LLMs) locally, without GPU cloud costs or latency trade-offs, is not just alive; it's thriving. Thanks to Apple's MLX framework, the llm-mlx plugin, and the silicon inside the Mac mini M4, running Llama 3.2, Mistral, and even larger models locally has become remarkably fast and easy.
Inspired by Apple’s WWDC25 session “Explore large language models on Apple silicon with MLX”, this post covers:
- Installing and using llm-mlx
- Running Llama, Mistral, and more locally
- Fine-tuning, quantization, and Swift integration
- Performance benchmarks on the Mac mini M4
- Real-world prompt tests
- Why this stack might be the best PLG (product-led growth) dev tool Apple has built yet
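
As a quick taste before we dive in, here is a minimal sketch of the workflow the rest of the post walks through, using the `llm` CLI that llm-mlx plugs into (the model name below is one example from the mlx-community collection; any MLX-converted model should work the same way):

```shell
# Install Simon Willison's llm CLI, then the MLX plugin
pip install llm
llm install llm-mlx

# Download a 4-bit quantized model from the mlx-community collection
llm mlx download-model mlx-community/Llama-3.2-3B-Instruct-4bit

# Run a prompt entirely on-device -- no API key, no network round trip
llm -m mlx-community/Llama-3.2-3B-Instruct-4bit "Summarize MLX in one sentence"
```

Everything runs locally on Apple silicon; the only network traffic is the one-time model download.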