Progression


Respect to Cursor for showing this about their own Composer model! "We're sharing new research on how models hack public benchmarks. The latest models learn to retrieve solutions from the internet or git history. When we apply a stricter harness, eval scores drop significantly." (Reddit)

Jun 26, 2026 08:21

