Markola
吴正龙’s distillation Bookmarks
27 OCT 2025
[Thinking Machines] On-Policy Distillation
The reliable performance of on-policy training with the cost-efficiency of a dense reward signal.
distillation
large-language-models
machine-learning