"it took Claude Fable 2.5 hours to write a fused megakernel which delivers a >18x speed-up over a PyTorch baseline
now please recall that:
- Fable is not the full Mythos model
- Anthropic can spend much more than just 2.5h and ~550k tokens on this
- they probably have better harnes…" — Lisan al Gaib Reddit
Claude Fable 5 [max] wrote the first genuine (and fastest) megakernel ever submitted to KernelBench-Mega.
It was tested on: Kimi-Linear W4A16 batch-1 decode for RTX PRO 6000 Blackwell. Every prior model "won" it with a multi-kernel Triton pipeline that fails our… — Elliot Arledge
Source: https://x.com/elliotarledge/status/2072814573753975266
…ses. Anthropic is definitely doing some sweet autoresearch internally. Especially architecture research bros are probably so happy at Anthropic. Imagine vibe-testing a new arch or tweaking some arch and wanting to test it in a semi-optimized way. Just let 10T Mythos cook for a day. — Lisan al Gaib
