Markola - wuzhenglong's Bookmarks

27 OCT 2025

[Thinking Machines] On-Policy Distillation

The reliable performance of on-policy training with the cost-efficiency of a dense reward signal.

29 SEP 2025

LoRA may offer advantages in the cost and speed of post-training, and there are also a few operational reasons to prefer it to full fine-tuning.

large-language-models reinforcement-learning

28 FEB 2025

Gradient Descent Explained

A practical breakdown of Gradient Descent, the backbone of ML optimization, with step-by-step examples and visualizations.

gradient-descent machine-learning

22 JAN 2025

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

The industry-rocking paper from DeepSeek that sent shock waves around the world. DeepSeek-R1 incorporates multi-stage training and cold-start data before RL.

large-language-models

27 DEC 2024

DeepSeek-V3 Technical Report

To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures.

large-language-models mixture-of-experts

24 AUG 2024

[Chip Huyen] Machine Learning Interviews Book

Machine learning interview prep materials representing the collective wisdom of many people who have sat on both sides of the table, and who have spent a lot of time thinking about the hiring process.

machine-learning

08 JUN 2024

[Applied LLMs]

A practical guide to building successful LLM products, covering the tactical, operational, and strategic.

large-language-models

11 DEC 2023

[HuggingFace] Mixture of Experts Explained

MoEs pretrain much faster than dense models and run inference faster than a dense model with the same number of parameters, but require high VRAM because all experts are loaded in memory.

large-language-models mixture-of-experts

30 JUL 2023

[Eugene Yan] Patterns for Building LLM-based Systems and Products

Practical patterns for integrating large language models (LLMs) into real systems and products. An overview of seven key patterns: evals, RAG, fine-tuning, caching, guardrails, defensive UX, and collecting user feedback.

large-language-models machine-learning

21 MAY 2023

[Eugene Yan] Some Intuition on Attention and the Transformer

A different angle on the attention mechanism to help build further intuition. Intended for people who have read the "Attention Is All You Need" paper and have a basic understanding of how attention works.

large-language-models machine-learning
