The LMArena Model Leaderboard is Out: DeepSeek-R1's Programming Ability Surpasses Claude Opus 4
via cnBeta.COM (source: QbitAI)
In the field of open-source models, DeepSeek has delivered another surprise.

On the 28th of last month, DeepSeek quietly shipped a minor update, upgrading its R1 reasoning model to the latest version (0528) and releasing the model weights publicly. R1-0528 further improves benchmark performance, enhances front-end capabilities, reduces hallucinations, and adds support for JSON output and function calls.

Today, LMArena, a well-known public benchmarking platform for large models that has recently been mired in controversy (it was previously accused of favoring models from OpenAI, Google, and Meta), released its latest performance leaderboard. Among the results, DeepSeek-R1 (0528) stands out in particular. Its...
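The article only mentions that R1-0528 adds JSON output and function calls; as a rough illustration of what those two features usually look like in practice, here is a minimal sketch against an OpenAI-compatible chat endpoint. The base URL, the model identifier "deepseek-reasoner", and the example tool `get_leaderboard_rank` are assumptions for illustration, not details confirmed by the source.

```python
# Minimal sketch (not from the article): exercising JSON output and function
# calling via the OpenAI-compatible Python SDK. Endpoint, model name, and the
# tool schema below are illustrative assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepseek.com",  # assumed OpenAI-compatible endpoint
    api_key="YOUR_API_KEY",
)

# 1) Structured JSON output: ask the model to answer strictly as a JSON object.
json_resp = client.chat.completions.create(
    model="deepseek-reasoner",  # hypothetical model identifier
    messages=[
        {"role": "system", "content": "Reply only with a JSON object."},
        {"role": "user", "content": "Summarize the R1-0528 update with keys "
                                    "'benchmarks', 'frontend', 'hallucinations'."},
    ],
    response_format={"type": "json_object"},
)
print(json_resp.choices[0].message.content)

# 2) Function calling: declare a tool the model may choose to invoke.
tools = [{
    "type": "function",
    "function": {
        "name": "get_leaderboard_rank",  # hypothetical tool for illustration
        "description": "Look up a model's rank on a public leaderboard.",
        "parameters": {
            "type": "object",
            "properties": {"model_name": {"type": "string"}},
            "required": ["model_name"],
        },
    },
}]
tool_resp = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "Where does DeepSeek-R1 rank in coding?"}],
    tools=tools,
)
print(tool_resp.choices[0].message.tool_calls)
```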