joshuago’s ai-research Bookmarks

15 MAY 2026
Interaction Models: A Scalable Approach to Human-AI Collaboration

An interaction model trained from scratch, with real-time responsiveness enabled by a multi-stream, micro-turn design. A big break from the common turn-based models.

01 MAY 2026
DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence

Paper from DeepSeek detailing the architectural and infrastructure changes that make DeepSeek-V4 low-cost and token-efficient.

06 APR 2026
Mixture of Experts (MoEs) in Transformers

Different tokens activate different experts, based on their hidden representations.
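
A minimal sketch of that routing idea, assuming a plain top-k softmax router. The names `route`, `ROUTER`, `token_a`, and `token_b` and all the numbers are made up for illustration and do not come from the linked post or any particular library.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(hidden, router_weights, top_k=2):
    """Pick the top_k experts for one token.

    hidden: the token's hidden representation (list of floats).
    router_weights: one row of weights per expert (hypothetical values).
    Returns [(expert_index, gate_weight), ...], gate weights summing to 1.
    """
    # One logit per expert: dot product of the hidden state with that expert's row.
    logits = [sum(h * w for h, w in zip(hidden, row)) for row in router_weights]
    probs = softmax(logits)
    # Keep only the top_k most probable experts and renormalise their weights.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


# Four experts, two-dimensional hidden states (illustrative only).
ROUTER = [
    [1.0, 0.0],
    [0.0, 1.0],
    [-1.0, 0.0],
    [0.0, -1.0],
]

token_a = [2.0, 0.5]    # routed to experts 0 and 1
token_b = [-0.5, -2.0]  # routed to experts 3 and 2

if __name__ == "__main__":
    print(route(token_a, ROUTER))
    print(route(token_b, ROUTER))
```

The two tokens differ only in their hidden vectors, yet they land on disjoint sets of experts. Real routers typically add load-balancing terms and capacity limits, which this sketch leaves out.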

06 APR 2026
Mixture of Experts Explained

A look at the building blocks of MoEs, how they’re trained, and the tradeoffs to consider when serving them for inference.

25 MAR 2026
TurboQuant: Redefining AI efficiency with extreme compression

A set of advanced, theoretically grounded quantization algorithms that enable massive compression for large language models and vector search engines.

12 MAR 2026
Introducing Nemotron 3 Super: An Open Hybrid Mamba-Transformer MoE for Agentic Reasoning

Built on a fully open, end-to-end data pipeline that spans pretraining, post-training, and interactive reinforcement learning, which gives developers reproducible building blocks.

19 NOV 2025
AMD GPUs go brrr

HipKittens is an opinionated collection of programming primitives that helps developers realize the full capabilities of AMD GPUs. It includes optimized register tiles, plus 8-wave and 4-wave kernel patterns (used instead of wave specialization) to schedule work within processors. It also includes chiplet-optimized cache reuse patterns to schedule work across processors.