吴正龙’s mixture-of-experts Bookmarks

27 DEC 2024
To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures.
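The core idea of MLA is that keys and values are reconstructed from a small per-token latent, so the KV cache only has to store that latent. Below is a minimal sketch of that compression step, not DeepSeek-V3's actual implementation; all dimensions, module names (`LatentKVAttention`, `kv_down`, `k_up`, `v_up`), and the omission of causal masking and rotary embeddings are assumptions made for illustration.

```python
# Illustrative sketch of latent KV compression (MLA-style), not DeepSeek-V3's code.
# The KV cache stores only the small latent; keys/values are re-expanded at attention time.
import torch
import torch.nn as nn

class LatentKVAttention(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_latent=64):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.kv_down = nn.Linear(d_model, d_latent)  # compress hidden state -> cached latent
        self.k_up = nn.Linear(d_latent, d_model)     # expand cached latent -> keys
        self.v_up = nn.Linear(d_latent, d_model)     # expand cached latent -> values
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x, kv_cache=None):
        # x: (batch, seq, d_model); causal masking omitted for brevity
        b, t, d = x.shape
        latent = self.kv_down(x)                      # (b, t, d_latent) — what the KV cache holds
        if kv_cache is not None:
            latent = torch.cat([kv_cache, latent], dim=1)
        q = self.q_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        k = self.k_up(latent).view(b, -1, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v_up(latent).view(b, -1, self.n_heads, self.d_head).transpose(1, 2)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, t, d)
        return self.out_proj(out), latent             # latent doubles as the updated KV cache
```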
11 DEC 2023
MoEs are pretrained much faster than dense models and have faster inference than a model with the same number of parameters, but they require high VRAM because all experts are loaded in memory.
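A minimal sketch of why that VRAM cost arises: in a top-k gated MoE layer, every expert's weights are instantiated and resident in memory, even though each token only runs through k of them. Everything below (class name `MoELayer`, the sizes, the naive routing loop) is illustrative and not taken from any specific implementation.

```python
# Minimal top-k gated MoE layer (illustrative, not a production router).
import torch
import torch.nn as nn

class MoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, k=2):
        super().__init__()
        self.k = k
        # All experts live in memory at once; routing only decides which ones run per token.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.router = nn.Linear(d_model, n_experts)

    def forward(self, x):                   # x: (n_tokens, d_model)
        scores = self.router(x)              # (n_tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)
        weights = torch.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        # Each token's output is a weighted sum of its k selected experts;
        # compute stays proportional to k, but parameters scale with n_experts.
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out
```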