吴正龙’s distillation Bookmarks

27 OCT 2025

The reliable performance of on-policy training with the cost-efficiency of a dense reward signal.
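The bookmark does not spell out the mechanism. The sketch below is a minimal illustration that assumes the "dense reward signal" is a per-token score of the kind often used in on-policy distillation: the student samples its own sequence, which is the on-policy part. Each sampled token then receives reward log p_teacher - log p_student, which is a single-sample estimate of the negative per-token reverse KL. The toy 4-token distributions and every function name here (`sample_from`, `per_token_rewards`, `rollout`) are hypothetical stand-ins for real student and teacher models.

```python
import math
import random

# Hypothetical toy setup: 3 positions, 4-token vocabulary. In practice these
# per-position distributions would come from a student and a teacher LLM.
STUDENT = [[0.4, 0.3, 0.2, 0.1], [0.25, 0.25, 0.25, 0.25], [0.7, 0.1, 0.1, 0.1]]
TEACHER = [[0.6, 0.2, 0.1, 0.1], [0.1, 0.6, 0.2, 0.1], [0.7, 0.1, 0.1, 0.1]]


def sample_from(probs, rng):
    """Draw one token index from a categorical distribution."""
    r = rng.random()
    cumulative = 0.0
    for token, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return token
    return len(probs) - 1  # guard against float round-off in the cumulative sum


def per_token_rewards(tokens):
    """Dense reward: log p_teacher - log p_student at every sampled token.

    This is a single-sample estimate of -KL(student || teacher) at each
    position. It is zero wherever the two distributions agree.
    """
    return [
        math.log(TEACHER[t][tok]) - math.log(STUDENT[t][tok])
        for t, tok in enumerate(tokens)
    ]


def rollout(seed=0):
    """On-policy step: the student, not the teacher, generates the tokens."""
    rng = random.Random(seed)
    tokens = [sample_from(STUDENT[t], rng) for t in range(len(STUDENT))]
    return tokens, per_token_rewards(tokens)


if __name__ == "__main__":
    tokens, rewards = rollout(seed=0)
    for t, (tok, r) in enumerate(zip(tokens, rewards)):
        print(f"position {t}: token {tok}, reward {r:+.3f}")
```

Every generated token gets its own reward, so the training signal is much denser than a single sequence-level score. A policy-gradient update could use these values as per-token advantages. That update step is left out of this sketch.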

distillation large-language-models machine-learning