Opus 4.6 Made AI Feel Like a Real Team
I woke up this morning planning to do a bit of reading.
I’d seen a few posts about Opus 4.6, watched people poke at it, say it was good, say it was faster, say it didn’t use more tokens. All the usual launch noise. But instead of skimming and moving on, I thought: right, I should actually put this to work properly.
Not a demo. Not a clever prompt. Real projects, with real mess in them.
So I did.
I ran Opus 4.6 across two projects. One through the terminal, one through the Claude desktop app. Same account, same model, same expectations. The first thing I wanted to sanity-check was token usage, because that's always the quiet killer. Anthropic and most of the launch coverage say it doesn't use more tokens than previous versions, sometimes even fewer. From what I saw, that's mostly true, right up until you stop asking it single questions and start asking it to behave like a team.
The first thing I asked it to do was a proper review. I didn’t want surface-level feedback. I wanted it to act like a project manager, a product manager and a technical architect all at once, and tell me what was wrong. Hard-coded variables. Duplication. Anything fragile. Anything that just looked off.
It came back with a big report.
It found loads of things it wasn’t happy with.
Which, frankly, was reassuring.
One of the projects already had a few sub-agents. The other didn’t — I’d gone in headfirst and figured I’d tidy it up later. So I asked it a simple question: looking at this project, and the agents that do or don’t exist, what team would you actually build?
It didn’t hesitate. It suggested a full setup. Security. Code review. Testing. Product. Project management. All the boring, necessary stuff that real teams have.
I told it to create that team as agents, make them all report to a project manager, and do it for both projects.
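For anyone who wants to try the same setup: in Claude Code, sub-agents live as markdown files with YAML frontmatter under `.claude/agents/` in the project. Here's a rough sketch of what one role definition might look like. The exact frontmatter fields and tool names depend on your Claude Code version, so treat this as illustrative rather than a copy-paste recipe:

```markdown
---
name: code-reviewer
description: Reviews the codebase for hard-coded values, duplication, and
  fragile patterns. Reports findings to the project-manager agent.
tools: Read, Grep, Glob
---

You are a senior code reviewer on this project. Look for hard-coded
variables, duplicated logic, and anything fragile or that just looks off.
Do not modify files. Summarise what you find and hand the list to the
project-manager agent for prioritisation.
```

One file per role, and the project-manager agent's own definition tells it that everyone else reports in.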
It did.
Then it said something that genuinely made me laugh. It explained that for the new team to get up to speed, they should do a full review together.
AI talking like a consultancy deck, but somehow… accurate.
All the agents kicked in at once.
In parallel.
They reviewed different parts of the codebase, talked to each other, flagged problems, disagreed, escalated things. They even had names. Some I'd given them myself, after people I've actually worked with, people I trust. That part was oddly surreal.
One project churned away for nearly an hour. The other for about half that. No interruptions. No nudging. Just work happening.
When it finished, it didn't just dump output. It came back with recommendations, refactoring plans, regression tests, and a clear sense of what should happen next. That was the moment it stopped feeling like "using AI" and started feeling like managing a team that had just gone away and done the work.
At that point I locked it down. No new features. No scope creep. Just make this solid.
I put it into that mode where it doesn’t ask many questions — which is slightly dangerous, I know — but the goal was stability, not creativity. It tightened the architecture, expanded the tests, and started behaving like a team preparing something for production rather than hacking something together.
One project went from thirteen test cases to several hundred, complete with sprint structure. That jump came from a 0.1 increase in the model version.
That’s the bit that should probably make people pause.
The downside is obvious once you see what’s happening. The desktop version of Claude absolutely chewed through credits. I ran out quickly and had to top up, even though the terminal version on the same account behaved differently.
It makes sense, though. When you’ve got multiple agents running in parallel, reviewing, documenting, planning and arguing at the same time, you’re paying for that work. That’s where the credits go.
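The economics are easy to sketch. A single session bills one stream of tokens; a team of agents bills one stream per agent, plus the coordination chatter between them. A back-of-envelope model makes the multiplier obvious. Every number here is invented for illustration, not real Opus pricing or real usage figures:

```python
# Back-of-envelope cost model for parallel agents.
# All figures below are illustrative assumptions, not real pricing.

def session_cost(agents, tokens_per_agent, coordination_tokens, price_per_million):
    """Total cost when every agent bills its own full token stream."""
    total_tokens = agents * tokens_per_agent + coordination_tokens
    return total_tokens * price_per_million / 1_000_000

# One model doing a review on its own:
solo = session_cost(agents=1, tokens_per_agent=200_000,
                    coordination_tokens=0, price_per_million=15.0)

# A six-agent team reviewing in parallel, plus inter-agent chatter:
team = session_cost(agents=6, tokens_per_agent=200_000,
                    coordination_tokens=300_000, price_per_million=15.0)

print(f"solo: ${solo:.2f}, team: ${team:.2f}")
# solo: $3.00, team: $22.50
```

The shape of the curve is the point: the team doesn't cost a little more, it costs a multiple, because each agent is a full token stream and the arguing between them isn't free either.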
But that’s also where the value is.
This is the point where it starts to feel different.
You’re no longer just asking an AI questions. You’re defining roles, creating structure, letting agents talk to each other, letting one agent tell another something’s broken and watching it get fixed. The shape of the work changes.
It stops being about prompts and starts being about systems.
When I go back into these projects properly, I’m hoping everything’s better.
If it’s completely broken things, I’ll write another post saying Opus 4.6 broke everything.
But honestly, I don’t think it did.
This was really, really good.
Off you go.