joshuago’s ai-research Bookmarks
An interaction model trained from scratch, with real-time responsiveness enabled by a multi-stream, micro-turn design. A big break from the common turn-based models.
Paper from DeepSeek detailing the LLM architectural and infrastructure changes that make DeepSeek-V4 low-cost and token-efficient.
Different tokens activate different experts, based on their hidden representations.
A look at the building blocks of MoEs, how they’re trained, and the tradeoffs to consider when serving them for inference.
A set of advanced, theoretically grounded quantization algorithms that enable massive compression for large language models and vector search engines.
Built on a fully open, end-to-end data pipeline that spans pretraining, post-training, and interactive reinforcement learning, which gives developers reproducible building blocks.
HipKittens is an opinionated collection of programming primitives to help developers realize the hardware's capabilities. It includes optimized register tiles, plus 8-wave and 4-wave kernel patterns (used instead of wave specialization) to schedule work within processors. It also includes chiplet-optimized cache reuse patterns to schedule work across processors.
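A note on the mixture-of-experts entries above: the idea that different tokens activate different experts based on their hidden representations can be shown with a minimal sketch of top-k routing. Everything in it is an illustrative assumption rather than something taken from the bookmarked sources: the name `route_token`, the toy `EXPERT_ROUTER_WEIGHTS`, the choice of `top_k=2`, and the softmax over only the selected experts.

```python
import math

# Hypothetical router: one weight vector per expert (3 experts, hidden size 2).
EXPERT_ROUTER_WEIGHTS = [
    [1.0, 0.0],   # expert 0 responds to the first hidden dimension
    [0.0, 1.0],   # expert 1 responds to the second hidden dimension
    [0.5, 0.5],   # expert 2 responds a little to both
]

def route_token(hidden, router_weights, top_k=2):
    """Return (expert indices, gate weights) for one token's hidden vector."""
    # One logit per expert: dot product of the hidden state and that expert's router vector.
    logits = [sum(h * w for h, w in zip(hidden, vec)) for vec in router_weights]
    # Keep the top_k experts with the largest logits, highest first.
    top = sorted(range(len(logits)), key=lambda e: logits[e], reverse=True)[:top_k]
    # Softmax over the selected experts only, so the gates sum to 1.
    peak = max(logits[e] for e in top)
    exps = [math.exp(logits[e] - peak) for e in top]
    total = sum(exps)
    gates = [x / total for x in exps]
    return top, gates

# Two tokens with different hidden representations are sent to different experts.
experts_a, gates_a = route_token([1.0, 0.0], EXPERT_ROUTER_WEIGHTS)
experts_b, gates_b = route_token([0.0, 1.0], EXPERT_ROUTER_WEIGHTS)
print(experts_a, gates_a)
print(experts_b, gates_b)
```

In a real model the router is a learned linear layer and the selected experts' outputs are combined using the gate weights. Load balancing and capacity limits are further serving-time concerns that this sketch leaves out.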