💻 Coding Ability Showdown between Gemini 2.5 Pro and Claude 3.7 Sonnet: Exploration of Practical Experience and Limitations
Recently, the developer community has been engaged in a heated discussion about the programming capabilities of two leading language models, Google Gemini 2.5 Pro and Anthropic Claude 3.7 Sonnet. A real-world challenge was proposed to probe the capabilities and limitations of current LLMs on complex programming tasks: porting approximately 2,000 lines of C++ GTK3 code from the Solvespace project to GTK4.
User feedback is diverse:
* Gemini 2.5 Pro: It leads on the aider multi-language coding leaderboard with a score of 73%, compared to Sonnet 3.7's 65%. Some users find it better at generating code from scratch and handling complex logic (such as concurrency issues); it also offers an extremely long context window of 1 million tokens (versus Claude's 200,000) and can be tried for free through AI Studio. However, other users criticize it for struggling to follow precise instructions when modifying existing code: it tends to make irrelevant changes, sometimes refuses to output complete code, or generates redundant code.
* Claude 3.7 Sonnet: Many users think it excels at refactoring existing code, following instructions, and tool usage (such as Cursor and MCP), making it better suited to iterative development. But quite a few users report that version 3.7 performs worse than 3.5, over-modifying code and being difficult to control. Some even describe it as "confused in its thinking", especially in "thinking" mode.
The general view is that although LLMs perform impressively on specific, small-scale tasks or brand-new (greenfield) projects, they still fall short on large, complex, or legacy codebases. They struggle with changes that require in-depth understanding and multiple rounds of iteration, and are prone to introducing technical debt. Providing sufficient context (such as API documentation) and using dedicated auxiliary tools (such as aider) are considered key to improving LLM coding effectiveness.
Overall, developers are cautious about whether LLMs can replace software engineers in the short term, viewing the current technology as better suited to assisting with specific tasks. Both models have strengths and weaknesses, and their actual performance depends heavily on the specific task, how they are used, and prompting technique. The community is still continuously evaluating the practical value of LLMs in real-world programming environments.
(HackerNews)
via Teahouse - Telegram Channel