- llm
- inference
- on-device-ai
- reinforcement-learning
- robotics
- ml
- thinking
-
Optimizing LLM Inference for On-Device Deployment
A practical overview of quantization, pruning, KV caching, and hardware-specific techniques for running large language models faster, with a focus on edge devices.
-
Evaluation of Large Language Models
A practical guide to LLM benchmarks: what they measure, how they are computed, and how to run them yourself.