“It feels like I’ve plugged a GPU into my desktop — but it’s just the Mac mini M4 doing all the work.”
The dream of running powerful large language models (LLMs) locally, without GPU cloud costs or latency trade-offs, is not just alive; it's thriving. Thanks to Apple's MLX framework, the llm-mlx plugin, and the silicon inside the Mac mini M4, running Llama 3.2, Mistral, and even larger models locally has become remarkably fast and easy.
Inspired by Apple’s WWDC25 session “Explore large language models on Apple silicon with MLX”, this post covers:
- Installing and using llm-mlx
- Running Llama, Mistral, and more locally
- Fine-tuning, quantization, and Swift integration
- Performance benchmarks on the Mac mini M4
- Real-world prompt tests
- Why this stack might be the best PLG (product-led growth) dev tool Apple has built yet
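
As a quick taste before we dive in, here is a minimal sketch of the workflow the rest of the post walks through, using the `llm` CLI that llm-mlx plugs into (the model name below is one example from the mlx-community collection; any MLX-converted model should work the same way):

```shell
# Install Simon Willison's llm CLI, then the MLX plugin
pip install llm
llm install llm-mlx

# Download a 4-bit quantized model from the mlx-community collection
llm mlx download-model mlx-community/Llama-3.2-3B-Instruct-4bit

# Run a prompt entirely on-device -- no API key, no network round trip
llm -m mlx-community/Llama-3.2-3B-Instruct-4bit "Summarize MLX in one sentence"
```

Everything runs locally on Apple silicon; the only network traffic is the one-time model download.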