VibeThinker is a 3B param model that beats Opus 4.5 on reasoning with novel SFT+GRPO
Reddit | Jun 23, 2026 14:18 | arxiv.org