Claude 4 is out and it's a leap: first impressions after a week of testing
Anthropic did it again. Claude 4 is out, and after a week of intensive testing I can say: this is a generational leap, not an incremental update.
What's new
Opus 4: that smart colleague who never sleeps
Claude Opus 4 is the most powerful model I've ever used. And I've tried everything — GPT-5, Gemini 2 Pro, Llama 3, DeepSeek. Opus 4 is just on another level.
Where I see it most:
- Code. Opus 4 understands whole systems, not just functions. Tell it "refactor authentication" and it will grasp the flow across 15 files.
- Reasoning. Extended thinking in Opus 4 is on another level. Watching it think through an architectural problem is like reading a senior engineer's mind.
- Context. 200k tokens, but now with even better retrieval. I had it analyze a whole monorepo (~150 files) and it answered precisely.
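Before pasting a whole repo into the context window, it helps to sanity-check that it fits. A minimal sketch, assuming the common rule of thumb of roughly 4 characters per token (an approximation, not Anthropic's actual tokenizer):

```python
from pathlib import Path

def estimate_repo_tokens(root: str, exts=(".ts", ".tsx", ".js", ".py")) -> int:
    """Rough token estimate for a repo: ~4 characters per token on average
    for code and prose. Good enough for a fits/doesn't-fit check."""
    total_chars = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in exts:
            total_chars += len(path.read_text(errors="ignore"))
    return total_chars // 4

# Usage: check against the 200k-token window before sending.
# fits = estimate_repo_tokens("./my-monorepo") < 200_000
```

For a precise count you would use the provider's token-counting endpoint instead; this is just a cheap pre-flight check.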
Sonnet 4: the performance/price ratio that makes sense
Sonnet 4 is what you'll use 90% of the time. It's faster than Opus, cheaper, and sufficient for most tasks. Anthropic did great work here — Sonnet 4 is better than Opus 3 on benchmarks, at a fraction of the price.
My rule: Sonnet for everything. Opus when I need deep reasoning or I'm working with a large context.
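That rule is simple enough to encode as a tiny router. A sketch; the model ID strings are illustrative placeholders, not official identifiers:

```python
def pick_model(needs_deep_reasoning: bool, context_tokens: int) -> str:
    """Route to Opus only for deep reasoning or very large contexts;
    default to Sonnet, the 90% case. Model IDs are placeholders."""
    if needs_deep_reasoning or context_tokens > 100_000:
        return "claude-opus-4"
    return "claude-sonnet-4"

# pick_model(False, 5_000)    -> "claude-sonnet-4"
# pick_model(True, 5_000)     -> "claude-opus-4"
# pick_model(False, 150_000)  -> "claude-opus-4"
```

The 100k-token threshold is my own cutoff, not anything Anthropic recommends; tune it to your latency and cost budget.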
Haiku 4.5: speed demon
Haiku is for real-time applications — chatbots, classification, simple extractions. Latency under 200ms. For most API use cases, Haiku is more than enough.
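For the classification use case, a Messages API request body might look like this. A sketch only: the model ID is an assumption, so verify it against the current model list, and keeping max_tokens tiny is what keeps output latency low:

```python
def build_classification_request(text: str, labels: list[str]) -> dict:
    """Build a Messages API payload asking Haiku to pick one label.
    The model ID is a placeholder; check the docs for the real one."""
    return {
        "model": "claude-haiku-4-5",  # assumed ID, not verified
        "max_tokens": 10,  # a single label; small output = low latency
        "messages": [
            {
                "role": "user",
                "content": (
                    f"Classify the following text into exactly one of "
                    f"{labels}. Reply with the label only.\n\n{text}"
                ),
            }
        ],
    }
```

You would pass this payload to the SDK or POST it to the API; error handling and retries are omitted here.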
My test: a real project
I took a medium-sized Next.js project (~80 files) and gave Opus 4 the task: "Go through the whole project, find performance bottlenecks, and suggest solutions."
Result:
- Identified an N+1 query in Server Components (it was right)
- Suggested moving data fetching into a `use cache` component with `cacheTag` (correct Next.js 15 pattern)
- Found unnecessary re-renders in a client component (it was right)
- Recommended lazy loading for a heavy library (correct)
- One false positive — flagged something as a problem that was intentional
Score: 4 out of 5 valid findings. In 30 seconds. A manual review would have taken me an hour.
Comparison with GPT-5
I have to be fair — GPT-5 is also an excellent model. Here's my honest comparison:
| Area | Claude Opus 4 | GPT-5 |
|------|---------------|-------|
| Code and architecture | Better | Very good |
| Long context | Significantly better | Good, but quality drops |
| Creative writing | Good | Better |
| Instruction following | Excellent | Very good |
| Multimodality | Text + images | Text + images + audio + video |
| Hallucinations | Lowest | Low |
| Price (Sonnet vs GPT-5 mini) | Comparable | Comparable |
My verdict: For programming and analytical tasks → Claude. For creative and multimodal tasks → GPT. For the price sweet spot → both have great mid-tier models.
What surprised me
Constitutional AI v4 is noticeable. Claude 4 is less "cautious" than Claude 3. Anthropic clearly worked to make the model helpful without unnecessary gatekeeping. I can finally discuss security topics normally.
Tool use is production-ready. Structured function calling is reliable and consistent. I'm building an API endpoint with it that parses documents, and it works beautifully.
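For the document-parsing endpoint, a tool definition in the Messages API tool-use format looks roughly like this. The `name`/`description`/`input_schema` fields are the documented shape; the invoice tool itself is a made-up example, not my actual endpoint:

```python
# A tool definition for structured extraction, in the Messages API
# tool-use format. The extract_invoice tool is hypothetical.
invoice_tool = {
    "name": "extract_invoice",
    "description": "Extract structured fields from an invoice document.",
    "input_schema": {
        "type": "object",
        "properties": {
            "vendor": {"type": "string"},
            "total": {"type": "number"},
            "due_date": {"type": "string", "description": "ISO 8601 date"},
        },
        "required": ["vendor", "total"],
    },
}
```

You pass a list of such definitions in the request's `tools` field; the model then returns structured arguments matching the JSON Schema instead of free-form prose.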
Czech is surprisingly good. Claude 4 generates more natural text in Czech than Claude 3 did. Fewer anglicisms, better declensions. Still not perfect, but the progress is visible.
Price
| Model | Input (1M tokens) | Output (1M tokens) |
|-------|-------------------|--------------------|
| Opus 4 | $15 | $75 |
| Sonnet 4 | $3 | $15 |
| Haiku 4.5 | $0.25 | $1.25 |
Opus is expensive, but for what it can do, it's worth it. Sonnet covers 90% of needs at a reasonable price.
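The table translates directly into a per-request cost function, handy for budgeting a workload. Prices are the per-million-token figures listed above:

```python
PRICES = {  # USD per 1M tokens: (input, output), from the table above
    "opus-4": (15.00, 75.00),
    "sonnet-4": (3.00, 15.00),
    "haiku-4.5": (0.25, 1.25),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of a single request for the given model."""
    inp, out = PRICES[model]
    return input_tokens / 1_000_000 * inp + output_tokens / 1_000_000 * out

# A 10k-in / 1k-out Sonnet call:
# request_cost("sonnet-4", 10_000, 1_000) -> 0.045
```

At these rates the same call on Opus costs five times more, which is exactly why the "Sonnet by default" rule pays off.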
Conclusion
Claude 4 confirms that Anthropic is serious. They're not just "the safety guys" — they build top-tier models that I actually use every day.
If you're on Claude 3 — upgrade. If you're on GPT — try Sonnet 4 for a week. At minimum, you'll have a comparison.
And if you've never used AI... well, there won't be a better time to start.