squaresinger’s point matches what I’ve found. Once three agents are going, you become the coordination point - you’re holding the plan and reviewing all of it, and that part doesn’t scale the way the generating does. What’s kept it manageable for me is treating each one like an intern on a single, well-specified task I can check before it moves on, rather than running a swarm and hoping it converges. Wrote this up here: https://prickles.org/tenet/the-intern-pattern/AI1
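For anyone who wants the shape of that in code, here is a rough sketch of the "one well-specified task, checked before it moves on" loop. It is only an illustration under my own assumptions: `Task`, `run_with_checkpoints`, and the `agent` callable are hypothetical names, and `agent` stands in for whatever tool you actually drive.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Task:
    name: str
    spec: str                     # the single, well-specified instruction
    check: Callable[[str], bool]  # the review step: accept or reject the result


def run_with_checkpoints(
    tasks: List[Task],
    agent: Callable[[str], str],  # hypothetical stand-in for a real agent call
    max_attempts: int = 2,
) -> Dict[str, str]:
    """Run tasks strictly one at a time and stop at the first failed review."""
    accepted: Dict[str, str] = {}
    for task in tasks:
        for _ in range(max_attempts):
            result = agent(task.spec)
            if task.check(result):
                accepted[task.name] = result
                break
        else:
            # No attempt passed review, so nothing later is allowed to start.
            raise RuntimeError(
                f"{task.name!r} failed review after {max_attempts} attempts"
            )
    return accepted
```

The point of the structure is that the review is a gate, not an afterthought: a later task never starts on top of output nobody has accepted.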
Is anyone actually productively running multiple agents at once? All the context switching in such a short time span feels like a great way to completely forget what you are doing and lose track of tasks in the mess.
Multiple top-level agents can’t modify the same codebase simultaneously; they’ll confuse each other. But you can instruct the main agent to spawn sub-agents that it coordinates for you, to increase throughput and reduce token consumption.
No but you can lie to yourself that you are.
I am getting in the habit of keeping one async agent going in the background working on things while I also use ai in windsurf.
I think windsurf supports this natively with their background agents, but I run my background task in Claude Code because then I can use my local qwen 3.6 27b.
What does this parallel work mean? Does the background agent work on the same codebase as you? Doesn’t that cause conflicts and confusion?
Nah, I typically have it doing something else. And every 15 minutes or so I toggle back and do the next step.
Quite often Sysadmin stuff too. I have it do ansible for my pi cluster, and general cluster maintenance like check backups, troubleshoot services, create a firewall rule, etc.
I’ll also ask it research-style stuff, like “check out ram usage of ai-1 box and lmk if cache is big enough for 5 concurrent full contexts. If not, change the recipe and restart it.”
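For a sense of the arithmetic behind a question like that, here is a back-of-the-envelope sketch. It assumes a standard transformer KV cache (one key and one value vector per layer per KV head per token) and every number in the example call is a placeholder, not the config of any model mentioned here. Models with other attention designs or a quantized cache will come out differently.

```python
def kv_cache_bytes(
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    context_tokens: int,
    concurrent: int,
    bytes_per_elem: int = 2,  # 2 bytes per element assumes an fp16/bf16 cache
) -> int:
    """Estimate KV cache size for `concurrent` full-length contexts."""
    # The leading factor of 2 covers keys plus values.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return per_token * context_tokens * concurrent


# Placeholder config: 64 layers, 8 KV heads, head_dim 128, 32768-token context.
needed = kv_cache_bytes(64, 8, 128, 32768, concurrent=5)
print(f"{needed / 2**30:.1f} GiB")  # 40.0 GiB
```

With those placeholder numbers each full context costs 8 GiB of cache, so five of them need 40 GiB on top of the weights.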
But what for? Just to burn your employer’s tokens to teach them that AI is a waste of money? (I mean, I’d respect that.)
I am self employed. I do it because it allows me to do my work in less time, or do more work in the same amount of time. Sometimes I’m having it do little personal projects in the background.
$20/mo for a windsurf sub. Plus like I said, I run qwen 3.6 locally (free) and get very productive output, and that’s also private, which is the main reason I invested in hardware.
AI code is pretty unusably bad for long term use anyway https://medium.com/@dumaysacha/i-saw-the-horror-of-ai-and-coderabbit-ai-did-too-a09622ac85de so the best solution is just to handwrite proper code as before. It’s not like we ever had much of an output problem in most coding industries; it was always a quality and bugs problem.
Can you maybe post the text.
That article is from January. This space moves too fast. It’s not worth reading. I thought things still sucked in Jan too. But they’re impressive af now.
I’m sorry to say this is a garbage take. I have been told “6 months ago things sucked, but they are amazing now” for like 2 years.
When chatgpt4 came out I was told it was amazing and that 6 months old models sucked.
Nowadays I use chatgpt4 and it produces garbage and I get told “yeah but chatgpt4 is garbage”. Well, it was supposedly amazing 6 months ago and my work is still the same and the codebase is mostly the same.
This is called bullshitting. This stuff isn’t amazing now and it wasn’t amazing 6 months ago.
I realize you aren’t happy about it. But it’s true.
I was basically born behind a computer in 1978. Been a fulltime software dev since 1998.
What the latest models are doing is nothing short of incredible. And in 6 months the current models will suck compared to the latest.
Somewhere around Feb is when things really shifted for me personally. I can do all home sys and net admin tasks now by just asking a bot, running a LOCAL model. Frontier models can whip up apps in minutes.
It does require dev/architect knowledge to get quality. You have to understand the broad solution, then just get ai to do the grunt work.
I wrote all 4 of these this week, 100% ai code. I wouldn’t have had the time to write the first three, but it (opus 4.6 I think) oneshot them all in a couple mins:
Homey apps:
Other:
Do these repos have bugs? Yep probably. But they’re working today for me solving my problems.
The same applies on large repos where I do work. When properly guided by a high skill dev/architect, the results are profound. Even non code stuff like terraform and ansible.
Given proper direction, an LLM allows you to perform at a much higher level.
LLMs seem to be inherently dumb: https://machinelearning.apple.com/research/illusion-of-thinking
And from what I can find in recent studies, no, they didn’t suddenly get smart. They just plagiarize slightly better: https://www.sciencedirect.com/science/article/pii/S2949719123000213#b7
We found that the models that consistently output the highest-quality text are also the ones that have the highest memorization rate.
It’s impressive until it isn’t because it decided to “fix” an issue by simply ignoring an exception.
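To make that concrete, this is roughly the kind of "fix" I mean, next to what a reviewer would usually want instead. Both functions are made-up illustrations (the names are hypothetical, not from any real codebase):

```python
import json


def load_config_swallowed(text: str) -> dict:
    """The 'fix': the error goes away because it is silently ignored."""
    try:
        return json.loads(text)
    except Exception:
        # Broken input now looks exactly like an empty config.
        return {}


def load_config_strict(text: str) -> dict:
    """Catch only the expected failure and surface it with context."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"config is not valid JSON: {exc}") from exc
```

The first version makes the test go green and moves the failure somewhere much harder to debug. The second one still fails, but loudly and at the place where the bad input arrived.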
https://machinelearning.apple.com/research/illusion-of-thinking It’s not surprising LLMs keep messing up in what seem to be the most braindead ways.





