5.6-sol-medium looks like the replacement for 5.5-xhigh

The ExploitGym numbers suggest 5.6 is not just pushing peak scores. It is improving cost efficiency.

5.5-xhigh gets 15% on intended exploits at a cost of $36.80. 5.6-sol-medium gets 16% at $19.62.

That is 1 point better for about 47% less cost. Cost per score point drops from about $2.45 to about $1.23.

The 5.4 replacement looks similar.

5.4-xhigh gets 7% at $26.57. 5.6-terra-high gets 9% at $14.62.

That is 2 points better for about 45% less cost. Cost per score point drops from about $3.80 to about $1.62.
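
The arithmetic behind both comparisons can be checked with a short sketch. The scores and costs are the ones quoted above; the names `RUNS`, `cost_per_point`, `cost_reduction_pct` and `compare` are made up for illustration. It uses `Decimal` with half-up rounding so that a value like 19.62 / 16 = 1.22625 rounds to 1.23 and does not depend on float representation.

```python
from decimal import Decimal, ROUND_HALF_UP

# (score in percent, cost in USD), copied from the figures quoted in the post.
RUNS = {
    "5.5-xhigh": (Decimal("15"), Decimal("36.80")),
    "5.6-sol-medium": (Decimal("16"), Decimal("19.62")),
    "5.4-xhigh": (Decimal("7"), Decimal("26.57")),
    "5.6-terra-high": (Decimal("9"), Decimal("14.62")),
}


def cost_per_point(score, cost):
    """Dollars spent per percentage point of score, rounded to cents."""
    return (cost / score).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def cost_reduction_pct(old_cost, new_cost):
    """Relative saving of new over old, rounded to a whole percent."""
    saving = 100 * (old_cost - new_cost) / old_cost
    return saving.quantize(Decimal("1"), rounding=ROUND_HALF_UP)


def compare(old, new):
    """Summarize how the `new` config compares with the `old` one."""
    old_score, old_cost = RUNS[old]
    new_score, new_cost = RUNS[new]
    return {
        "point_gain": new_score - old_score,
        "cost_reduction_pct": cost_reduction_pct(old_cost, new_cost),
        "old_cost_per_point": cost_per_point(old_score, old_cost),
        "new_cost_per_point": cost_per_point(new_score, new_cost),
    }


if __name__ == "__main__":
    for old, new in [("5.5-xhigh", "5.6-sol-medium"),
                     ("5.4-xhigh", "5.6-terra-high")]:
        print(old, "->", new, compare(old, new))
```

Cost per score point is a rough metric at single-digit scores: a 1 point change in the 5.4-xhigh result would move its figure by roughly 50 cents.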

This looks like OpenAI is moving the efficiency curve, not only the benchmark ceiling. The highest new reasoning levels may still cost more in absolute terms, but in these two matchups the score per dollar is roughly twice as good.

TL;DR: 5.6-sol-medium beats 5.5-xhigh (16% vs 15%) for about 47% less cost; 5.6-terra-high beats 5.4-xhigh (9% vs 7%) for about 45% less cost.