吴正龙’s large-language-models Bookmarks
27 OCT 2025
Combines the reliable performance of on-policy training with the cost-efficiency of a dense reward signal.
29 SEP 2025
LoRA may offer advantages in the cost and speed of post-training, and there are also a few operational reasons to prefer it to full fine-tuning.
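(A minimal sketch of the low-rank update idea behind that cost and speed advantage, assuming a PyTorch setting; the class, rank, and sizes below are illustrative, not taken from the bookmarked post. The pretrained weight is frozen and only two small factors A and B are trained.)

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA sketch: freeze the base layer, train only low-rank factors A and B."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                      # pretrained weights stay frozen
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # delta starts at zero
        self.scale = alpha / rank

    def forward(self, x):
        # y = base(x) + scale * x A^T B^T  -- gradients flow only into A and B
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(nn.Linear(4096, 4096), rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 65,536 trainable parameters vs ~16.8M in the frozen base layer
```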
22 JAN 2025
The industry-rocking paper from DeepSeek that sent shock waves around the world. DeepSeek-R1 incorporates multi-stage training and cold-start data before RL.
27 DEC 2024
To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures.
08 JUN 2024
A practical guide to building successful LLM products, covering the tactical, operational, and strategic.
11 DEC 2023
MoEs are pretrained much faster than dense models, have faster inference than a dense model with the same total number of parameters, and require high VRAM since all experts are loaded in memory.
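(A toy sketch of top-k expert routing, assuming PyTorch; the module and sizes below are illustrative, not from the bookmarked post. Every expert sits in memory, which is why VRAM is high, but each token runs through only k of them, which is why inference stays fast.)

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Toy mixture-of-experts layer: a router sends each token to its top-k experts."""
    def __init__(self, dim=512, hidden=2048, num_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        # All experts are instantiated and held in memory (hence the high VRAM cost)...
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        )
        self.k = k

    def forward(self, x):                              # x: (tokens, dim)
        weights = F.softmax(self.router(x), dim=-1)
        topw, topi = weights.topk(self.k, dim=-1)      # ...but each token uses only k of them
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e in range(len(self.experts)):
                mask = topi[:, slot] == e
                if mask.any():
                    out[mask] += topw[mask, slot, None] * self.experts[e](x[mask])
        return out

x = torch.randn(4, 512)
print(TopKMoE()(x).shape)  # torch.Size([4, 512])
```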
30 JUL 2023
Practical patterns for integrating large language models (LLMs) into real systems and products. Overview of seven key patterns: evals, RAG, fine-tuning, caching, guardrails, defensive UX, and collecting user feedback.
21 MAY 2023
A different angle on the attention mechanism to help build further intuition. Intended for people who have read the "Attention is All You Need" paper and have a basic understanding of how attention works.