- llm
- inference
- on-device-ai
- reinforcement-learning
- robotics
- ml
- thinking
-
Optimizing LLM Inference for On-Device Deployment
A practical overview of quantization, pruning, KV caching, and hardware-specific techniques for running large language models faster, with a focus on edge devices.
-
Evaluation of Large Language Models
A practical guide to LLM benchmarks: what they measure, how they are computed, and how to run them yourself.