Claude 4 is out and it's a leap: first impressions after a week of testing
Anthropic did it again. Claude 4 is out, and after a week of intensive testing I can say: this is a generational leap, not an incremental update.
What's new
Opus 4: that smart colleague who never sleeps
Claude Opus 4 is the most powerful model I've ever used. And I've tried everything — GPT-5, Gemini 2 Pro, Llama 3, DeepSeek. Opus 4 is just on another level.
Where I see it most:
- Code. Opus 4 understands whole systems, not just functions. Tell it "refactor authentication" and it will grasp the flow across 15 files.
- Reasoning. Extended thinking in Opus 4 is on another level. Watching it think through an architectural problem is like reading a senior engineer's mind.
- Context. 200k tokens, but now with even better retrieval. I had it analyze a whole monorepo (~150 files) and it answered precisely.
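Before pasting a whole repo into the context window, it helps to sanity-check that it fits. A minimal sketch, assuming the common rule of thumb of roughly 4 characters per token (an approximation, not Anthropic's actual tokenizer):

```python
from pathlib import Path

def estimate_repo_tokens(root: str, exts=(".ts", ".tsx", ".js", ".py")) -> int:
    """Rough token estimate for a repo: ~4 characters per token on average
    for code and prose. Good enough for a fits/doesn't-fit check."""
    total_chars = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in exts:
            total_chars += len(path.read_text(errors="ignore"))
    return total_chars // 4

# Usage: check against the 200k-token window before sending.
# fits = estimate_repo_tokens("./my-monorepo") < 200_000
```

For a precise count you would use the provider's token-counting endpoint instead; this is just a cheap pre-flight check.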
Sonnet 4: the performance/price ratio that makes sense
Sonnet 4 is what you'll use 90% of the time. It's faster than Opus, cheaper, and sufficient for most tasks. Anthropic did great work here — Sonnet 4 is better than Opus 3 on benchmarks, at a fraction of the price.
My rule: Sonnet for everything. Opus when I need deep reasoning or I'm working with a large context.
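That rule is simple enough to encode as a tiny router. A sketch; the model ID strings are illustrative placeholders, not official identifiers:

```python
def pick_model(needs_deep_reasoning: bool, context_tokens: int) -> str:
    """Route to Opus only for deep reasoning or very large contexts;
    default to Sonnet, the 90% case. Model IDs are placeholders."""
    if needs_deep_reasoning or context_tokens > 100_000:
        return "claude-opus-4"
    return "claude-sonnet-4"

# pick_model(False, 5_000)    -> "claude-sonnet-4"
# pick_model(True, 5_000)     -> "claude-opus-4"
# pick_model(False, 150_000)  -> "claude-opus-4"
```

The 100k-token threshold is my own cutoff, not anything Anthropic recommends; tune it to your latency and cost budget.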
Haiku 4.5: speed demon
Haiku is for real-time applications — chatbots, classification, simple extractions. Latency under 200ms. For most API use cases, Haiku is more than enough.
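For the classification use case, a Messages API request body might look like this. A sketch only: the model ID is an assumption, so verify it against the current model list, and keeping max_tokens tiny is what keeps output latency low:

```python
def build_classification_request(text: str, labels: list[str]) -> dict:
    """Build a Messages API payload asking Haiku to pick one label.
    The model ID is a placeholder; check the docs for the real one."""
    return {
        "model": "claude-haiku-4-5",  # assumed ID, not verified
        "max_tokens": 10,  # a single label; small output = low latency
        "messages": [
            {
                "role": "user",
                "content": (
                    f"Classify the following text into exactly one of "
                    f"{labels}. Reply with the label only.\n\n{text}"
                ),
            }
        ],
    }
```

You would pass this payload to the SDK or POST it to the API; error handling and retries are omitted here.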
My test: a real project
I took a medium-sized Next.js project (~80 files) and gave Opus 4 the task: "Go through the whole project, find performance bottlenecks, and suggest solutions."
Result:
- Identified an N+1 query in Server Components (it was right)
- Suggested moving data fetching into a `use cache` component with `cacheTag` (correct Next.js 15 pattern)
- Found unnecessary re-renders in a client component (it was right)
- Recommended lazy loading for a heavy library (correct)
- One false positive — flagged something as a problem that was intentional
Score: 4 out of 5 valid findings. In 30 seconds. A manual review would have taken me an hour.
Comparison with GPT-5
I have to be fair — GPT-5 is also an excellent model. Here's my honest comparison:
| Area | Claude Opus 4 | GPT-5 |
|------|---------------|-------|
| Code and architecture | Better | Very good |
| Long context | Significantly better | Good, but quality drops |
| Creative writing | Good | Better |
| Instruction following | Excellent | Very good |
| Multimodality | Text + images | Text + images + audio + video |
| Hallucinations | Lowest | Low |
| Price (Sonnet vs GPT-5 mini) | Comparable | Comparable |
My verdict: For programming and analytical tasks → Claude. For creative and multimodal tasks → GPT. For the price sweet spot → both have great mid-tier models.
What surprised me
Constitutional AI v4 is noticeable. Claude 4 is less "cautious" than Claude 3. Anthropic clearly worked to make the model helpful without unnecessary gatekeeping. I can finally discuss security topics normally.
Tool use is production-ready. Structured function calling is reliable and consistent. I'm building an API endpoint with it that parses documents, and it works beautifully.
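For the document-parsing endpoint, a tool definition in the Messages API tool-use format looks roughly like this. The `name`/`description`/`input_schema` fields are the documented shape; the invoice tool itself is a made-up example, not my actual endpoint:

```python
# A tool definition for structured extraction, in the Messages API
# tool-use format. The extract_invoice tool is hypothetical.
invoice_tool = {
    "name": "extract_invoice",
    "description": "Extract structured fields from an invoice document.",
    "input_schema": {
        "type": "object",
        "properties": {
            "vendor": {"type": "string"},
            "total": {"type": "number"},
            "due_date": {"type": "string", "description": "ISO 8601 date"},
        },
        "required": ["vendor", "total"],
    },
}
```

You pass a list of such definitions in the request's `tools` field; the model then returns structured arguments matching the JSON Schema instead of free-form prose.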
Czech is surprisingly good. Claude 4 generates more natural text in Czech than Claude 3 did. Fewer anglicisms, better declensions. Still not perfect, but the progress is visible.
Price
| Model | Input (1M tokens) | Output (1M tokens) |
|-------|-------------------|--------------------|
| Opus 4 | $15 | $75 |
| Sonnet 4 | $3 | $15 |
| Haiku 4.5 | $0.25 | $1.25 |
Opus is expensive, but for what it can do, it's worth it. Sonnet covers 90% of needs at a reasonable price.
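The table translates directly into a per-request cost function, handy for budgeting a workload. Prices are the per-million-token figures listed above:

```python
PRICES = {  # USD per 1M tokens: (input, output), from the table above
    "opus-4": (15.00, 75.00),
    "sonnet-4": (3.00, 15.00),
    "haiku-4.5": (0.25, 1.25),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of a single request for the given model."""
    inp, out = PRICES[model]
    return input_tokens / 1_000_000 * inp + output_tokens / 1_000_000 * out

# A 10k-in / 1k-out Sonnet call:
# request_cost("sonnet-4", 10_000, 1_000) -> 0.045
```

At these rates the same call on Opus costs five times more, which is exactly why the "Sonnet by default" rule pays off.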
Conclusion
Claude 4 confirms that Anthropic is serious. They're not just "the safety guys" — they build top-tier models that I actually use every day.
If you're on Claude 3 — upgrade. If you're on GPT — try Sonnet 4 for a week. At minimum, you'll have a comparison.
And if you've never used AI... well, there won't be a better time to start.