"Datacurve released DeepSWE, a new benchmark for frontier coding agents on real developer tasks. Unlike SWE-Bench, whose public GitHub issues models can memorize, DeepSWE uses original tasks. Prompts are short, but solutions edit 668 lines across 7 files on average, 5.5× more code than SWE-Bench."
Reddit, May 26, 2026 22:11