Ryan Heneise

Six things AI hasn’t replaced in engineering


“AI is going to replace engineers” is the most confident wrong prediction in tech right now. 

I know because my team is exactly the team that prediction is about. We’ve handed our AI agents nearly everything you can safely hand them: drafting code, reviewing it, writing tests, tracing dependencies across multiple repos, chasing down data anomalies, decomposing PRDs into task lists - and watched our velocity go through the roof. We’ve cut our shipping cycles by two-thirds. I’m not guessing at that number; we measured it.

But despite the revolutionary shift in our work style, AI has made our human engineers more essential, not less. 

AI is like a power tool for the mind. When power tools were introduced, they cut construction times drastically - workers could accomplish the same tasks in a fraction of the time. But carpenters and construction workers did not go away; a person is still required to wield and direct the tools. A nail gun lets a carpenter work faster, but it never replaced the carpenter, because driving nails was never the job. Knowing where the walls go is the job.

That’s what’s happening in software. AI made execution cheap, but execution was never the bottleneck. Judgment was. And the more we automate, the more obvious that becomes.

Because we use AI this heavily, we can see exactly where its limitations are. Here are six places where someone on my team recently had to be the human in the room. None of them are theoretical.

1. Code review where a human says yes

Our automated code review is very good. For every pull request, we hand initial review to at least four specialized reviewer agents: one for object-oriented design, one for Rails idioms, one for framework conventions, and one hunting antipatterns. Agent reviewers never skim. They perform deep, cross-cutting analysis and frequently catch issues a human reviewer might miss because of limited time, context switching, or the sheer size of a change. Code reviewer agents are thorough and merciless - if code is bad, they will tell you why and how to fix it.
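As a rough sketch of that fan-out, assuming plain Ruby: each lens below is a stand-in lambda doing a toy string check, not a real reviewer agent, and the names `Finding`, `LENSES`, and `review` are hypothetical. It shows only the shape of the pipeline (one diff, several independent reviewers run concurrently, one merged report), not the actual review setup described here.

```ruby
# Hypothetical fan-out: each lens takes a diff string and returns findings.
# The lambdas are toy string checks standing in for real reviewer agents.
Finding = Struct.new(:lens, :message)

LENSES = {
  oo_design:    ->(diff) { diff.include?("def self.") ? ["consider an instance method"] : [] },
  rails_idioms: ->(diff) { diff.include?("update_attribute(") ? ["update_attribute skips validations"] : [] },
  conventions:  ->(diff) { diff.match?(/def get_\w+/) ? ["drop the get_ prefix on readers"] : [] },
  antipatterns: ->(diff) { diff.include?("rescue Exception") ? ["rescue StandardError, not Exception"] : [] }
}.freeze

# Run every lens concurrently on the same diff, then merge the findings
# in lens order so the combined report is deterministic.
def review(diff, lenses = LENSES)
  threads = lenses.map do |name, lens|
    Thread.new { lens.call(diff).map { |msg| Finding.new(name, msg) } }
  end
  threads.flat_map(&:value)
end

findings = review("def get_total\n  rescue Exception\nend")
findings.each { |f| puts "#{f.lens}: #{f.message}" }
# prints:
# conventions: drop the get_ prefix on readers
# antipatterns: rescue StandardError, not Exception
```

Running the lenses in threads only pays off when each call waits on I/O, such as a request to a model API; for pure string checks like these it simply demonstrates the structure.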

So why does a human still review every PR?

Because review was never just defect detection. Finding bugs is the easy part. The machines are already better at finding those than we are. They're better at writing tests, too. Every line of code we ship is now tested, with coverage we never achieved when humans wrote the tests by hand. But a passing test suite doesn't mean the code is right. Tests prove the code does what the author intended - they can't prove the author intended the right thing. That’s where the human comes in. 
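A small, hypothetical illustration of a suite that passes while encoding the wrong intent. The free-shipping rule, the threshold constant, and the method name are all invented for the example: the author read the requirement as "above $50", the business meant "$50 or more", and the author's own tests cannot tell the difference.

```ruby
# Hypothetical rule: the author read the ticket as "free shipping above $50",
# while the business meant "$50 or more". Amounts are in cents.
FREE_SHIPPING_THRESHOLD_CENTS = 5_000

def free_shipping?(subtotal_cents)
  subtotal_cents > FREE_SHIPPING_THRESHOLD_CENTS
end

# The author's tests encode the author's intent, so they pass.
raise "test failed" unless free_shipping?(6_000)
raise "test failed" if free_shipping?(4_000)

# The boundary nobody wrote a test for is exactly where intent matters.
puts free_shipping?(5_000) # prints "false"; the business expected true
```

A boundary test would catch this, but only if whoever writes it already knows the intended rule.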

Review is consent. It's a human saying: I understand this change, it conforms to the intent of the designers, and I accept it into the system I'm responsible for. Our company policy makes this explicit: you, the human developer, are responsible for every line of code you commit - even if an agent wrote it. That sentence does work no scanner can do. It means someone can explain this code in an incident at 2am. It means someone noticed that the change is correct but solves the wrong problem. It means the knowledge of what changed, and why, now lives in a person and not just in a diff.

An AI reviewer can tell you the code is plausible. The human reviewer is the checkpoint between plausible and correct.

Plausible compiles. Plausible passes tests. Plausible looks exactly like correct right up until it meets production. The gap between them isn't a defect you can scan for - it's a judgment about intent, context, and consequences. Somebody has to close that gap on purpose.

When people ask if AI review means we can skip human review, they have it backwards. The cheaper code gets to write, the more of it shows up at the review gate - and the more that gate matters.

2. Knowing why the code is weird

An agent with access to our codebase knows the code better than any single engineer on my team. It can read every file, trace every call path, map every dependency in minutes, and hold the whole thing in context. What it can't tell you is why the code is weird.

We have marketplace integrations with retry logic that looks paranoid. An agent reviewing it cold would flag it for simplification, and by everything visible in the repo it would be right. What's not in the repo: the partner outages, unpublished rate limits, and data anomalies that shaped that code, one painful incident at a time.
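Here is a minimal sketch of what "paranoid" retry logic for a partner API might look like, assuming plain Ruby. `TransientError`, `with_retries`, and every default limit are hypothetical placeholders, not the integration described here. It uses exponential backoff with full jitter; the `sleeper` parameter exists only so the delays can be observed without actually sleeping.

```ruby
# Hypothetical "paranoid" retry wrapper for a partner API call. The limits
# are placeholders; in a real codebase each one would have a reason that
# lives in an incident history, not in the code.
class TransientError < StandardError; end

def with_retries(max_attempts: 6, base_delay: 0.5, max_delay: 30.0, sleeper: ->(secs) { sleep(secs) })
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue TransientError
    raise if attempt >= max_attempts
    # Exponential backoff with "full jitter": sleep a random time up to a
    # capped exponential delay so many workers don't retry in lockstep.
    cap = [base_delay * (2**(attempt - 1)), max_delay].min
    sleeper.call(rand * cap)
    retry
  end
end

delays = []
result = with_retries(sleeper: ->(secs) { delays << secs }) do |attempt|
  raise TransientError, "partner timeout" if attempt < 3
  :ok
end
```

Read cold, six attempts with a 30-second cap looks like overkill. Whether it is depends on history the code cannot show.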

Every mature codebase is full of these. The migration we abandoned halfway through, and the reason we'd never try it again. The table that's named wrong because renaming it would break a report a customer depends on. The feature flag that's been "temporary" for two years because the day we removed it was a very bad day.

Context windows hold tokens. They don't hold scars.

Now, we're working on this. We write architecture decision records. We keep a CONTEXT.md that explains the domain to agents before they touch the code. Every hard-won lesson we write down is one an agent can use - and honestly, agents are better at reading this stuff than new hires are.

But notice what that workflow actually is: a human recognizing, in the moment, that something just happened worth writing down. The agent can read institutional knowledge all day. Somebody still has to know an institution-shaping event when they see one.

3. Deciding what to build

This is the big one. Everything else on this list is downstream of it.

Hand an agent a PRD and it will decompose it into a clean, well-scoped task list in minutes - dependencies mapped, edge cases called out, acceptance criteria attached. I've watched it do work in an afternoon that used to take a planning week. It's genuinely one of the best things AI has done for our team.

But that decomposition is only as good as the PRD you feed it. I've seen agents go completely off the rails with requirements that weren't fully scoped from the beginning - confidently building out an entire plan on top of an assumption nobody made on purpose. The agent doesn't know what you forgot to tell it. It fills the gaps with plausible guesses, and plausible is the problem.

One of the ways we've mitigated this is by flipping the direction of the conversation: before any decomposition happens, the agent interviews us about the requirements - flagging ambiguities, surfacing unstated assumptions, pushing on anything that could be read two ways. It's remarkably good at finding the holes in a spec. It still can't fill them. Every answer in that interview comes from a human who knows what the product is actually for.
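One way to make that interview step enforceable, sketched under assumptions: the agent's questions are stored as records, and decomposition refuses to start while any of them lacks a human answer. `Question`, `decompose`, `UnresolvedRequirements`, and the sample question are hypothetical; the workflow described above is a conversation, not necessarily a code gate.

```ruby
# Hypothetical gate: the agent records the ambiguities it found; task
# decomposition refuses to run until a human has answered every one.
Question = Struct.new(:text, :answer, :answered_by) do
  def open?
    answer.nil? || answered_by.nil?
  end
end

class UnresolvedRequirements < StandardError; end

def decompose(prd_title, questions)
  unresolved = questions.select(&:open?)
  unless unresolved.empty?
    raise UnresolvedRequirements,
          "#{unresolved.size} open question(s): #{unresolved.map(&:text).join('; ')}"
  end
  # Placeholder for the agent's real task breakdown.
  ["Break down: #{prd_title}"]
end

questions = [Question.new("Should archived listings count toward totals?")]
begin
  decompose("Seller dashboard", questions)
rescue UnresolvedRequirements => e
  puts e.message # prints "1 open question(s): Should archived listings count toward totals?"
end
```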

And across all those PRDs, the agent has never once supplied the why. It can't. The why doesn't live in any repo or document it can read. It lives in our business strategy, in conversations with customers, and in the gap between what the market does and what it should do.

The nail gun drives the nails, but it never picks the house.

And there's a sharper version of this problem. AI is additive by nature: ask it for features and you get features, ask it for options and you get options. It says yes with infinite stamina. But strategy is subtractive. Vision isn't a long list of things you could build - it's the short list of things you will, and the discipline to say no to everything else. Every "no" is a bet, and bets require someone who has something at stake.

We have agents that can build nearly anything we ask for. Which means the entire game is now what we ask for. Execution used to hide mediocre strategy - you could be vague about direction because everything took so long to build that the vagueness never got tested. That cover is gone. When building is cheap, choosing is everything.

4. Knowing code is wrong even when it works

A few weeks ago we shipped a large feature - a full dashboard, one of the bigger things we've built this year. It worked. Tests passed. The acceptance criteria were met. The automated reviewers were satisfied. By every measurable standard, it was done.

Then I sat down and reviewed it for style and found six distinct problems. Custom controller actions where plain REST resources belonged. Service objects doing work that should have lived in the models. Abstractions built for flexibility nobody had asked for. Validation handled three different ways in three different places. Concerns that quietly depended on instance variables they didn't own. Names that described how the code worked instead of what it meant.
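To make one of those six concrete, here is a hypothetical sketch in plain Ruby (Rails concerns are modules too, so the idea carries over). `Billable` and `ExplicitBillable` return the same email and would pass the same test; the difference is that the first quietly reaches into an instance variable it doesn't own. All names here are invented for illustration.

```ruby
# Hypothetical illustration using plain modules (Rails concerns are modules too).
Account = Struct.new(:email)

# Works, but silently relies on every host class setting @account.
module Billable
  def billing_email
    @account.email
  end
end

# States its dependency as a method the host must provide, so a host that
# forgets fails with a clear message instead of a nil error somewhere else.
module ExplicitBillable
  def billing_email
    billing_account.email
  end

  def billing_account
    raise NotImplementedError, "#{self.class} must define #billing_account"
  end
end

class Invoice
  include Billable

  def initialize(account)
    @account = account
  end
end

class Statement
  include ExplicitBillable

  def initialize(account)
    @account = account
  end

  # Overrides the module's placeholder; the dependency is now visible.
  def billing_account
    @account
  end
end
```

A host class that forgets to set `@account` gets a `NoMethodError` on `nil` from `Billable`, far from the real cause; with `ExplicitBillable` it gets an error that names the missing method.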

Every one of those patterns worked. Not one was a bug. You can't write a test that fails because an abstraction is over-built. There's no linter rule for "this service object is hiding what your domain model should be saying."

Spotting them takes taste, and taste accumulates through years of experience - knowing which of today's harmless patterns become next year's unmaintainable mess, because you've personally paid for each one before.

AI optimizes for plausible solutions, and all six of those patterns were plausible. Taste is the veto. It's the voice that says "this works and it's still wrong."

Can you encode taste into a review lens and hand it to an agent? Partially. We’ve created specialized code review skills to do exactly that, and it helps. But someone had to have the taste first, recognize it was missing from the output, and know it was worth encoding. The agent applies taste. It doesn't originate it.

5. Noticing something is off

Recently, one of our marketplace partners sent an automated notice that our API success rate with them had dropped below their threshold. Nothing in our own systems was on fire. Every error was already being logged and reported to our monitoring; taken one at a time, none of them looked like more than routine noise.

The signal that mattered wasn't any single failure. It was the pattern connecting them - an accumulation that no individual alert was built to catch. Catching it took two human calls: taking an outside notification seriously enough to investigate instead of archiving it, and then looking at a set of individually unremarkable errors and recognizing the pattern behind them. The investigation that followed was spearheaded by AI agents searching logs, tracing requests, and comparing time windows - much faster than any of us could. We found the root cause, shipped a fix, and the success rate recovered. But the investigation only happened because a person decided the signal was real.
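The aggregate signal in this story can be sketched as a windowed success rate, assuming each request is recorded with a timestamp and an ok/failed flag. `Request`, the hourly window, and the 0.95 threshold are hypothetical; the partner's actual threshold isn't stated. The arithmetic is the easy part; the code only helps once someone has decided the question is worth asking.

```ruby
# Hypothetical sketch: errors are logged one by one, but the signal is the
# success rate per time window compared against an external threshold.
Request = Struct.new(:at, :ok) # at: epoch seconds (Integer), ok: true/false

# Bucket requests into fixed windows and compute each window's success rate.
def success_rates(requests, window_seconds: 3600)
  requests
    .group_by { |r| r.at - (r.at % window_seconds) }
    .sort_by(&:first)
    .map { |start, reqs| [start, reqs.count(&:ok).fdiv(reqs.size)] }
end

# Windows whose success rate falls below the threshold, e.g. a partner's SLA.
def windows_below(requests, threshold:, window_seconds: 3600)
  success_rates(requests, window_seconds: window_seconds)
    .select { |_start, rate| rate < threshold }
end

hour0 = Array.new(100) { |i| Request.new(i, i >= 2) }         # 98 of 100 succeed
hour1 = Array.new(100) { |i| Request.new(3600 + i, i >= 10) } # 90 of 100 succeed
p windows_below(hour0 + hour1, threshold: 0.95) # prints [[3600, 0.9]]
```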

AI is the best anomaly investigator I've ever worked with. But it still takes a person with the experience and intuition to recognize when something is not quite right.

6. Owning the blast radius

Recently we consolidated our background job system, collapsing a sprawl of queues into a handful of tiers. Clean migration. But when the old queues were removed, a few dozen failed jobs were left stranded on queues that no longer existed. They couldn't retry - there was nowhere for them to go.

The agent laid out both options perfectly: the tradeoffs, the risks, and the exact commands for each. What it could not do was decide - because deciding meant owning whatever went wrong next, and there's no version of that an agent can hold.
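A hedged sketch of keeping that decision with a person: the code finds jobs stranded on removed queues and builds a plan, but it won't produce one without an explicit decision and a named owner. `Job`, `CURRENT_QUEUES`, the queue names, and the requeue/discard options are assumptions for illustration; the options in the actual incident aren't spelled out, and this deliberately avoids any real job library's API.

```ruby
# Hypothetical sketch: find failed jobs whose queue was removed in the
# consolidation and build a plan for one of two example options. The code
# does not pick; a named person has to pass in the decision.
Job = Struct.new(:id, :queue)

CURRENT_QUEUES = %w[critical default low].freeze

def stranded(jobs, current_queues = CURRENT_QUEUES)
  jobs.reject { |job| current_queues.include?(job.queue) }
end

def plan(jobs, decision:, decided_by:, queue_map: {})
  raise ArgumentError, "a person must own this decision" if decided_by.to_s.strip.empty?

  targets = stranded(jobs)
  case decision
  when :requeue
    # fetch raises KeyError for any old queue nobody mapped to a new tier.
    targets.map { |job| [:requeue, job.id, queue_map.fetch(job.queue)] }
  when :discard
    targets.map { |job| [:discard, job.id] }
  else
    raise ArgumentError, "unknown decision: #{decision.inspect}"
  end
end

jobs = [Job.new(1, "mailers"), Job.new(2, "default"), Job.new(3, "reports")]
p plan(jobs, decision: :requeue, decided_by: "on-call engineer",
             queue_map: { "mailers" => "default", "reports" => "low" })
# prints [[:requeue, 1, "default"], [:requeue, 3, "low"]]
```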

This is the deepest item on the list, and it's the one I'm most sure about. Accountability is non-transferable. When something breaks in production at scale, someone answers for it - to a customer, to the team, to a regulator. "The AI suggested it" has never once been an acceptable answer in an incident review, and it never will be. Not because the AI was wrong, but because accountability was never about who was right. It's about who is responsible for what happens next.

Tools amplify what you can do. They don't absorb responsibility for what you did. You can delegate the writing of the code, the review of the code, the debugging of the code - and we delegate all of it. What you can't delegate is the part where you stand behind it. That stays with a person, because a person is the only thing an organization can actually hold to account.

Sharper tools, same carpenter

These six things have one thing in common. Saying yes to a change, knowing why the code is the way it is, choosing what to build, knowing what's wrong even when it works, sensing that something's off, owning what happens next: none of them are tasks. Every one of them is a judgment - and judgment is exactly the thing power tools were never for.

That's why I keep coming back to the nail gun. It made carpenters faster without making them optional, because the hard part of building a house was never how fast you could drive a nail. It was knowing what makes a house a home. AI is the most powerful tool our trade has ever been handed. It is driving the nails at a speed that still surprises me. But it’s not deciding where the walls go - that’s my job. 

The teams that win the next few years won't be the ones that replaced their engineers. They'll be the ones whose engineers picked up the sharpest tools ever made and used them to build, faster and better, exactly the right thing. That has always been the job. It still is.

