On September 29, 2025, Anthropic released Claude Sonnet 4.5, a new AI model for programming, and it's really good. I've been testing it for several weeks myself, and the results surprised me.

What This Model Can Do
Claude Sonnet 4.5 solves 77.2% of real development tasks on the SWE-bench Verified test. This is a test where models are given real bugs from GitHub to see if they can fix them. For comparison: GPT-5 solves 72.8%, Gemini 2.5 Pro — 63.8%.
But the most interesting thing is that the model can work autonomously for more than 30 hours straight. Not just generating code, but doing full-fledged work. Developers gave it access to a server, and it deployed databases, purchased domain names, and set up environments on its own. And without errors.
I used to teach friends programming. Explaining basic things, showing them how to write code. Now everything has changed. With tools like these, you can do ten times more work.
Code Refactoring
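As a minimal illustration of the kind of transformation this section describes (collapsing nested conditions into early-return guard clauses), here is a hypothetical before-and-after sketch. All names and the logic itself are illustrative, not taken from the actual test.

```python
# Hypothetical example: the same validation logic written two ways, to show
# what flattening nested conditions looks like.

def ship_order_nested(order: dict) -> str:
    # "Before": three levels of nesting make the control flow hard to follow.
    if order.get("paid"):
        if order.get("items"):
            if order.get("address"):
                return "shipped"
            else:
                return "error: no address"
        else:
            return "error: no items"
    else:
        return "error: not paid"

def ship_order_flat(order: dict) -> str:
    # "After": guard clauses return early, so the happy path reads top to bottom.
    if not order.get("paid"):
        return "error: not paid"
    if not order.get("items"):
        return "error: no items"
    if not order.get("address"):
        return "error: no address"
    return "shipped"
```

The two functions behave identically, which is exactly the property a refactoring must preserve and why passing tests afterward matters.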
Remember legacy code with seven levels of nested conditions? You look at it and can't tell what's even happening. In one test, the model took a 210-line function with cyclomatic complexity of 16 and turned it into 30 lines of code with complexity of 3-6. It untangled 13 nested conditions, extracted repeated logic, and broke everything into well-scoped functions. And you know what? All tests passed after the refactoring. Replit confirmed that code-editing errors dropped from 9% to 0%. Zero percent is serious. When I was teaching people programming, refactoring was always a difficult topic: you need to understand the code, see patterns, and know how to rewrite it better. Now the model does it automatically, and does it well.
Finding and Fixing Bugs
I had a case. A friend wrote an application for a startup. Production was crashing, the logs were full of errors, and nobody understood what was going on. Before, I would sit down and spend hours figuring it out. Now you can just show the logs to Claude. Cora (they make an AI assistant for development) shared a case: their agent based on Claude Sonnet 4.5 fixed a bug in 20 minutes that the previous model, Claude Opus 4.1, couldn't handle at all. At CrowdStrike (a cybersecurity company), using Claude reduced vulnerability-processing time by 44% while accuracy increased by 25%. The model finds vulnerabilities and fixes them before anyone can exploit them.
Test Generation
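To make this concrete, here is the style of edge-case tests this section talks about, applied to a tiny hypothetical function. Both the function and the pytest-style tests are illustrative sketches, not actual model output.

```python
import re

def slugify(text: str) -> str:
    """Lowercase the text and replace runs of non-alphanumerics with '-'."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

# Generated-style tests: one typical case plus the edge cases a model tends
# to surface (empty input, punctuation only, repeated separators).
def test_basic():
    assert slugify("Hello World") == "hello-world"

def test_empty():
    assert slugify("") == ""

def test_punctuation_only():
    assert slugify("!!!") == ""

def test_collapses_runs():
    assert slugify("a -- b") == "a-b"
```

In a TDD workflow you would ask for tests like these first, watch them fail, and only then ask for the implementation.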
Writing tests is boring but necessary. Claude Sonnet 4.5 generates tests with a success rate of about 95%, and 85% of those tests are actually useful rather than a formality. The model understands project structure, adapts to the framework in use (Jest, pytest, JUnit), creates mock objects, and covers edge cases. You can work TDD-style: first ask it to write the tests, then the code that satisfies them. I like that the model finds edge cases you wouldn't think of yourself; it can generate 50+ test cases for a single function. When I was teaching people, I always said tests are important, write tests. But everyone knew it takes time. Now that problem doesn't exist.
Code Review
Code review has always been a painful topic. A colleague might nitpick minor details or miss a serious problem. Claude does a comprehensive review in 2 minutes; for comparison, GPT-5 does the same in 10. The model checks:
- Code quality (naming, structure)
- Security (input validation, vulnerabilities)
- Performance (time complexity, query efficiency)
- Compliance with project standards
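As a toy illustration of the security item on that checklist, here is the kind of issue a review typically flags, along with the fix. The schema and function names are hypothetical.

```python
import sqlite3

def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # Flagged in review: user input interpolated into SQL (injection risk).
    return conn.execute(
        f"SELECT id FROM users WHERE name = '{name}'"
    ).fetchall()

def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    # Fixed: a '?' placeholder lets the driver handle escaping.
    return conn.execute(
        "SELECT id FROM users WHERE name = ?", (name,)
    ).fetchall()
```

With an injection payload like `x' OR '1'='1`, the unsafe version returns every row in the table, while the parameterized one returns nothing.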
What Makes Claude Special
There are many AI tools for coding on the market. GitHub Copilot, Cursor, GPT-5, Gemini. Why do I single out Claude specifically?
- First — performance. 77.2% on SWE-bench is the best result among all available models.
- Second — autonomy. More than 30 hours of continuous work. This isn't "write a function," this is "build an entire application."
- Third — computer use. 61.4% on the OSWorld test. The model can work with browsers, spreadsheets, any programs like a human.
- Fourth — production-ready code. Not "seems to work," but actually works. Error reduction to zero.
- Fifth — tool coordination. The model can run multiple commands in parallel, coordinate different services.
- Sixth — domain expertise. It's good not only at code, but also in finance, law, medicine. Experts in these fields confirm this.
- Seventh — security. Low rates of sycophancy and deception, resistance to prompt injection attacks.
Honest Comparison with Competitors
We need to be honest. Every tool has pros and cons.
GPT-5
Simon Willison (a well-known developer and the creator of Datasette) tested Claude and called it a "better model for code than GPT-5-Codex." But there's a nuance. The Every.to team gave both models a large pull request to review. Claude handled it in 2 minutes; GPT-5 Codex took 10. But GPT-5 found a complex edge case that Claude missed. The conclusion is simple: for fast development, Claude is a great choice; for critical production code, it's better to use GPT-5 for a final review. Plus, GPT-5 is 2.4 times cheaper for input tokens.
Gemini 2.5 Pro
Gemini lags in performance: 63.8% on SWE-bench versus Claude's 77.2%. But it has a context window of up to 2 million tokens, 10 times more than Claude's. If you're working with a huge codebase where you need to load hundreds of files at once, Gemini might be more convenient. It's also about twice as cheap.
GitHub Copilot
Copilot is the market leader: 20 million users and 90% of Fortune 100 companies. It now offers Claude Sonnet 4.5 as one of its models. Copilot wins on real-time completion and GitHub integration (PR reviews, Issues, Actions), with a fixed price of $10-39 per month instead of paying per token. If you're in the GitHub ecosystem, it's a good choice.
How to Get Started
Access to the model is simple.
Web interface claude.ai:
- Free plan — about 100 messages per day
- Pro plan $20/month — about 45 messages every 5 hours (up to 6,500 per month)
- Max plan from $100/month — 5-20 times more than Pro
API:
- $3 per million input tokens
- $15 per million output tokens
- Prompt caching gives 90% savings ($0.30 instead of $3)
- Batch processing — 50% discount ($1.50/$7.50)
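These API prices are easy to sanity-check with a few lines of arithmetic. The sketch below only models the numbers listed above ($3/M input, $15/M output, 90% caching discount on input, 50% batch discount); real billing has details not modeled here, such as separate pricing for cache writes, so treat it as a rough estimator.

```python
# Per-million-token prices from the list above.
INPUT_PER_M = 3.00
OUTPUT_PER_M = 15.00
CACHED_INPUT_PER_M = 0.30  # prompt caching: 90% off input reads

def request_cost(input_tokens: int, output_tokens: int,
                 cached: bool = False, batch: bool = False) -> float:
    """Rough USD cost of one request under the listed prices."""
    in_rate = CACHED_INPUT_PER_M if cached else INPUT_PER_M
    cost = (input_tokens / 1e6) * in_rate + (output_tokens / 1e6) * OUTPUT_PER_M
    return cost * 0.5 if batch else cost
```

For example, a million input tokens costs $3.00 normally, $0.30 with caching, and $1.50 in batch mode, matching the list above.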
IDE Integration:
- VS Code — there's a native Claude Code extension plus integration through GitHub Copilot.
- JetBrains IDEs (IntelliJ IDEA, PyCharm, WebStorm) — support for Claude Code and GitHub Copilot.
- Cursor — deployed Claude Sonnet 4.5 for all users. Cursor's CEO called it "state-of-the-art performance." Users report 30% less code rework compared to regular Cursor.
There are also integrations with Windsurf, Replit, Zed, and other editors.
Real-World Examples
Numbers from tests are one thing. What happens in reality?
- Devin AI (autonomous AI developer) — 18% increase in planning accuracy, 12% improvement in end-to-end metrics.
- Vercel (platform for Next.js) — 17% performance improvement on Next.js tasks.
- Replit — editing errors reduced from 9% to 0%.
Language Support
Claude Sonnet 4.5 supports all major languages. It's especially good at:
- Python — Django, Flask, FastAPI, data science libraries (Polars, Pandas, NumPy). Works with virtual environments, pip, poetry.
- JavaScript/TypeScript — excellent type inference, Node.js code execution, NPM package installation.
- Frontend — React with hooks and functional components, proper architecture, state management, TypeScript. Also Vue, Angular, Svelte.
- Java and C# — enterprise code with pattern understanding, Spring Framework, .NET.
How the Developer's Role Is Changing
Many fear that AI will replace programmers. I don't think so. AI changes what a developer does. Professor Armando Solar-Lezama from MIT put it well: "Code completion is the easy part; the hard part is everything else." The real work of a programmer is:
- Architectural planning
- Understanding business requirements
- Choosing technologies
- Creative problem solving
- Team communication
- Mentoring juniors
- Performance optimization
Adoption Numbers
Stack Overflow Developer Survey 2025: 84% of developers use or plan to use AI tools, and 51% of professional developers use them daily. Google DORA Report 2025 (5,000 respondents): 90% of software developers use AI tools, with median usage of 2 hours per day; 80%+ report productivity growth. In tests, developers complete tasks 55% faster, and Java developers see up to 61% of their code generated by AI.
But there's an interesting point. Only 24% report high levels of trust in AI, and 46% actively distrust the accuracy of the tools. Sentiment dropped from 70%+ to 60% in 2025. A METR study (July 2025) showed a 19% slowdown for experienced developers using AI, even though they subjectively felt a 20% speedup. What does this mean? Context matters. AI speeds up less experienced developers; for experts, it's still a complement, not a replacement.
The Future of Development
Anthropic is developing aggressively: three major releases in five months of 2025. The company reached $5 billion in annual recurring revenue and is tripling staff by the end of 2025. New capabilities:
- Extended Thinking with Tool Use — the model can alternate between reasoning and using tools (web search, code execution).
- Improved memory — local file access for continuity between sessions.
- 65% reduction in "scheming" behavior — the model less often tries to work around tasks in non-standard ways instead of proper solutions.
- Claude Code — background tasks via GitHub Actions, VS Code and JetBrains integration, auto-response to PR feedback.
Practical Tips
If you decide to try Claude Sonnet 4.5, here's what I recommend:
- Start with refactoring. It's a low-risk task where you can evaluate quality.
- Use TDD. Ask it to write tests first, then the code. Quality will be higher.
- Provide context. The more the model understands about the project, the better the result.
- Combine tools. Claude for refactoring, Copilot for completion, GPT-5 for critical review.
- Learn from AI. Watch how the model solves problems. It will improve your skills.
- Don't trust blindly. AI can make mistakes. Always check the code.
- Automate routine. Code review, documentation, tests — that's for AI.