Peak hours, my mistake, and 4 tools that give tokens back

I started the day normally. Coffee, editor, Claude Code, Max 5× plan for $100 a month. After a few minutes of work, 29% usage popped up in the header.
Twenty-nine percent. In ten minutes.
I wrote a frustrated post on Threads. I figured someone would like it, mumble something, and we'd all move on. Instead, a discussion with dozens of reactions kicked off from half the world — from Berliners to Californians to someone in Bangalore with exactly the same problem.
And during that discussion, I realized two things that typically contradict each other, but here both are true at once: Anthropic has a real problem. And at the same time, I caused half of it myself.
What's actually happening with the limits
In March 2026, Anthropic quietly tightened session limits during peak hours. Official communication came later, but it basically looks like this:
- The weekly limit stayed the same
- But session limits drain dramatically faster during 14:00–20:00 CET
- According to Anthropic, it affects about 7% of users — i.e. the ones running most intensively
The problem is that if you live in Europe and work during the day, you fall into the peak window for practically your whole workday. 14:00 CET is 8 AM on the US East Coast — and at that moment infrastructure starts getting cleared for American users, and from our perspective it looks like someone is clipping your tokens in real time.
It's not an illusion. It's a real change in how compute resources get allocated. And it's not communicated entirely fairly.
But.
My mistake (embarrassing, but instructive)
When I calmly went through my setup, I hit a line in my project CLAUDE.md. It had a reference to a big folder of another project — something I'd put there weeks ago as "inspiration reference" and forgotten.
Claude Code dutifully loaded that entire foreign project at the start of every session. Thousands of lines of code that had nothing to do with the current work.
Initial context is the most expensive part of a session. You pay for it with every subsequent message, because the model holds it in its head the whole time. When you've got 40k tokens of ballast in there, every next question costs you an extra 40k.
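A back-of-the-envelope model makes the math concrete. This is a simplification (it ignores prompt caching, and the numbers are illustrative), but it shows why ballast in the initial context is so expensive:

```python
def session_input_tokens(initial_context: int, messages: int,
                         avg_message: int = 500) -> int:
    """Total input tokens billed over a session.

    Simplified model: each turn re-sends the entire history,
    so the initial context is paid for again with every message.
    """
    total = 0
    history = initial_context
    for _ in range(messages):
        history += avg_message  # the new message joins the history
        total += history        # the whole history is billed this turn
    return total

lean = session_input_tokens(initial_context=5_000, messages=20)
bloated = session_input_tokens(initial_context=45_000, messages=20)
print(bloated - lean)  # → 800000: 40k of ballast × 20 messages
```

Forty thousand tokens of dead weight doesn't cost you 40k once. Over a twenty-message session it costs you 800k.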
When I deleted that line and restarted the session, the first message consumed... normally. Exactly what you'd expect.
29% in a few minutes suddenly made sense. The peak hour limit made it worse, but the fuel was my own mistake.
Four tools that actually give tokens back
In that Threads discussion, four tools I hadn't known about came up, and I've been running them ever since. None of them does miracles on its own — but together the savings add up fast.
RTK — Rust Token Killer
A transparent proxy for bash commands. It sits between you and the shell, and everything Claude Code runs (git status, cat, grep, npm test…) first goes through a filter that trims the ballast — duplicate lines, verbose outputs, empty blocks.
brew install rtk && rtk init -g
In practice this means −60 to −90% on token usage for git/test/read operations. The project has over 19 thousand stars on GitHub, so it's not some esoteric thing.
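To make the filtering idea tangible: this is a naive sketch of what such a proxy does conceptually, not RTK's actual algorithm (RTK is written in Rust and does much more):

```python
def trim_output(text: str, max_lines: int = 40) -> str:
    """Toy output filter in the spirit of a token-trimming proxy:
    drop empty lines and exact duplicates, then truncate long tails."""
    seen: set[str] = set()
    lines: list[str] = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped in seen:
            continue  # skip empty lines and repeats
        seen.add(stripped)
        lines.append(line)
    if len(lines) > max_lines:
        trimmed = len(lines) - max_lines
        lines = lines[:max_lines]
        lines.append(f"... [{trimmed} more lines trimmed]")
    return "\n".join(lines)
```

Command output like test logs and git status is full of exactly this kind of repetition, which is why the savings are so large.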
Caveman — caveman mode
A skill that rewrites the system prompt so Claude responds like a caveman. Sounds like a joke, but technical accuracy stays fully intact — just the polite phrases, intros, and three-paragraph explanations of why it's a good idea disappear.
npx skills add JuliusBrussee/caveman
For me it comes out to roughly −75% on output tokens. Instead of "Great idea! Here's how I would approach this step by step..." you get "Run npm install. Then edit line 42." Paradoxically, it even sped up my reading of the responses.
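I haven't read the skill's actual prompt, but the gist of such a system-prompt override is roughly this (my paraphrase, not the skill's text):

```
Respond in the shortest possible form. No greetings, no praise,
no restating the question, no explanations unless asked.
Commands and code only. Telegraph style is fine.
```

Output tokens are the expensive ones, so cutting the pleasantries pays off disproportionately.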
lean-ctx — context compressor
MCP server + shell hook. It compresses all the context that goes into the model — before sending, it chews it over, throws out redundancies, merges similar passages.
curl -fsSL https://leanctx.com/install.sh | sh
You notice the difference mainly in longer sessions where context grows organically. Instead of a bloated chat, you have a continuously compressed state.
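The core trick, deduplicating content that's already in the context, can be sketched in a few lines. This is a toy illustration of the idea, not lean-ctx's implementation:

```python
import hashlib

def compress_context(chunks: list[str]) -> list[str]:
    """Keep only chunks whose normalized content hasn't been seen yet.
    Whitespace and case are normalized so trivial variants count as dupes."""
    seen: set[str] = set()
    kept: list[str] = []
    for chunk in chunks:
        key = hashlib.sha256(" ".join(chunk.lower().split()).encode()).hexdigest()
        if key in seen:
            continue  # this passage is already in the context
        seen.add(key)
        kept.append(chunk)
    return kept
```

In a long session the same file excerpts and error messages get pasted over and over; collapsing them is where most of the win comes from.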
graphify — codebase as a graph
Probably the most interesting approach of the four. Instead of Claude reading raw files, it first builds a knowledge graph of your codebase — components, imports, dependencies, function calls. Then it navigates the structure instead of reading files sequentially.
pip install graphify
graphify install
graphify claude install
Mainly useful for larger projects, where Claude would otherwise keep re-reading the same files over and over to figure out the relationships.
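A minimal version of the underlying idea, an import graph built statically so a tool can follow edges instead of re-reading files, fits in one function. A sketch only; graphify's real graph covers far more than imports:

```python
import ast
from pathlib import Path

def import_graph(root: str) -> dict[str, set[str]]:
    """Map each Python file under root to the module names it imports.
    Toy illustration of a codebase-as-graph approach."""
    graph: dict[str, set[str]] = {}
    for path in Path(root).rglob("*.py"):
        tree = ast.parse(path.read_text(encoding="utf-8"))
        deps: set[str] = set()
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                deps.update(alias.name for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                deps.add(node.module)
        graph[str(path)] = deps
    return graph
```

With a structure like this, answering "what depends on auth.py" is a graph lookup, not a re-read of half the repo.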
Best practices from the discussion
Besides the tools, a few habits came up in the discussion that are worth mentioning. I covered some of them in the previous article on saving tokens, so here just as bullet points:
- Sonnet for agents, Opus only for strategy — Opus is great for initial analysis and review, but 80% of the concrete work can be handled by Sonnet more cheaply
- /compact after every major change — summarizes the existing context, so subsequent messages are billed from a smaller base
- New conversation for each new task — no dragging ballast between unrelated tasks
- Short, concise CLAUDE.md — no references to big external folders (see above, my mistake)
- No big external references in project instructions — if you need to show something, send it ad-hoc in a specific message
- More demanding work outside peak hours — before 14:00 or after 20:00 CET it runs differently
- /context to track the current state — stop guessing and actually see how much the model is holding
Takeaways
I wrote that Threads post in frustration and expected sympathy likes. Instead, the discussion showed me that reality is almost always "and" instead of "or".
Anthropic really did tighten peak hour limits and communicated it poorly. That's a legitimate complaint.
At the same time, I had a bomb in my project instructions that was inflating my first message by tens of thousands of tokens. That's my mistake.
Both are true. And in practice this means that even though we can't influence how Anthropic divvies up compute during peak, we have quite a lot of influence over how many tokens we actually consume. RTK, Caveman, lean-ctx, graphify plus a few habits — and suddenly the same work leaves more of the limit for the end of the week.
Expensive lesson. But probably the one that paid off the most out of my $100 a month.