It took first place in the ‘overall’ (but not the coding) category of the LMSYS arena. However, the arena rankings aren’t necessarily accurate.
There don’t seem to be any coding benchmarks released for it yet (or I haven’t found them).
Still, it would be interesting to see whether it could be integrated into Cursor, including the long-context chat, and whether anyone has run proper benchmarks on it.