It’s already been mentioned in Fable’s system card, but raw chain of thought output is getting hard to read. It’s a consequence of RLVR: apply enough reinforcement learning to a model and it’ll learn that plain English isn’t the most efficient way to reason about something. It’s meaningful: see here for an example of someone “translating” the reasoning trace from the system card.
On one hand, it’s kind of fascinating to see how LLMs “think” under the hood and that they’re sniffing out ways to think more and better with fewer tokens. On the other, this is going to be an issue for interpretability going forward—researchers are concerned about neuron-only representations being incomprehensible, but it looks like text is already starting to head in that direction too.