It's Time To Let The Model Drive (for coding)
It's 2025 and you're still writing code by hand?!
Just a few months ago, I would not have trusted an LLM to generate tons of code. The quality of the models was good, but some combination of model capability, my own skill issues, and my ingrained preference to control the code held me back.
Something changed very recently. After using Opus-4.5, GPT-5.2, and Composer-1 for a few weeks inside Cursor, I'm a full convert. The quality of the code is good (probably better than what I can write), the instruction following is good, and I've discovered a workflow that feels genuinely powerful.
It started when I attended the 2025 AI Engineer Summit, which was focused on coding agents. Numerous talks covered topics like context engineering, Research - Plan - Implement workflows, using markdown files and skills, and so on. It opened my eyes to the reality that I needed to change how I worked and put more trust in the models.
The workflow
The workflow itself is pretty simple.
- I use the smartest model I can get and have it sketch a plan.
  - In the initial prompt, I reference files that I know the model should use, and I give it as much other context as I can.
  - I use plan mode in Cursor, which generates a markdown file with the implementation details.
  - I then review the plan and iterate on it with the model until I'm satisfied.
- Then I implement the plan with a fast model.
  - In Cursor, I use Composer-1. You can also use a model like Sonnet-4.5 to execute the plan.
- Afterwards, I review the code and accept it.
- Finally, I have the model add tests.
The key point is something I learned in Dex Horthy's talk[1]: it's important to create a plan and review it in plain English so you can catch potential issues before they're turned into code. This makes a world of difference!
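To make that concrete, here's a rough sketch of what a plan-mode markdown file might look like for a small task (the task, file, and parameter names below are made up for illustration):

```markdown
# Plan: add retry logic to the HTTP client wrapper

## Steps
1. Add a `max_retries` parameter (default 3) to the `request()` helper.
2. Retry only on connection errors and 5xx responses, with exponential backoff.
3. Update existing call sites; behavior is unchanged when `max_retries=0`.
4. Add unit tests for success-after-retry and retries-exhausted.

## Out of scope
- Changing timeout defaults.
```

Every line of a plan like this is reviewable in plain English, which is exactly where you want to catch a bad assumption.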
I'm also a fan of Cursor. I've used VSCode for years, and it's familiar and does what I want. You can do the same thing with CLI coding agents like Claude Code as well; I just hate reviewing diffs in the terminal interface.
Code organization
Another discovery is that how you organize your code matters a lot in the age of agents. You need to spend extra time organizing your code into discrete, well-organized chunks that the model can pick up and digest easily, and that won't eat up its context.
What I mean is organizing your code so that you can tell the model to "go use the foo utility/package/library" and it can find the code easily, get context about it, and then use it. If you have a spaghetti mess, it will be very expensive for the model to poke around the codebase and learn how to use it.
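As a hedged illustration (the `foo` package and `slugify` helper below are invented, not from any real codebase), this is the kind of small, self-describing module an agent can find, read, and use without crawling the whole repo:

```python
# foo/text_utils.py -- hypothetical example of a discrete, well-organized chunk.
# The module docstring doubles as cheap context for an agent that opens the file.
"""Small text-normalization helpers shared across the project.

Usage:
    from foo.text_utils import slugify
    slugify("Hello, World!")  # -> "hello-world"
"""

import re

__all__ = ["slugify"]


def slugify(value: str) -> str:
    """Lowercase ``value`` and collapse non-alphanumeric runs into single hyphens."""
    value = re.sub(r"[^a-z0-9]+", "-", value.lower())
    return value.strip("-")
```

A narrow public surface plus a usage note at the top means the model spends its tokens using the code, not reverse-engineering it.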
Agent context
Using something like claude.md/agents.md is important. This document provides high-level contextual information about your codebase that the model can easily discover and learn from. Use a model to generate it, giving it stylistic guidance, contextual information, or other details that are important to point out.
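As a rough sketch (every detail below is invented for illustration, not a prescription), an agents.md might contain something like:

```markdown
# Agent notes for this repo

- Python 3.12; run `make test` before declaring any task done.
- Application code lives in `src/`, tests mirror it under `tests/`.
- Follow existing patterns: type hints everywhere, no new global state.
- Never edit generated files under `src/generated/`.
```

Short and authoritative beats long and exhaustive; the model only needs enough to avoid the obvious wrong turns.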
Skills are another pattern that's starting to emerge. A skill is just a folder with markdown files, scripts, or other files that a model can use to "learn" how to do a thing: different chunks of context it can dynamically load if it needs them.
This is actually a really powerful pattern. When I mentioned above that you can tell a model to just use a utility/package/library, you're effectively telling the model to go use a "skill". It helps if your skills contain contextual data in the form of a readme, or you can follow Anthropic's skill spec. A skill is a neatly packaged combination of code + context, and the model can use this discrete chunk to fulfill your intent.
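For a sense of the shape (the layout and file names here are hypothetical; see Anthropic's skill spec for the exact format), a skill might look like:

```text
skills/
  generate-report/
    SKILL.md          # when to use this skill and how, in a few short paragraphs
    template.html     # report template the script fills in
    build_report.py   # script the agent runs to produce the report
```

The point is the packaging: the model pulls in the folder only when it actually needs that capability.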
Trust
I think letting go and trusting the models to do the right thing is hard. You need to continuously prove to yourself that you're giving the model the right context and that the model is doing what you asked. It's like you and the models are a team now, and like all teams, it takes time and effort to build a level of trust and understanding. But once you're on a high-functioning, high-trust team, you're able to get so much done.
It takes time and you'll need practice, but after a while it just clicks and it's hard to go back. You may have to take the reins sometimes and write actual code to get stuff done, especially with edge cases. I'm quite fortunate that my employer is very AI-forward and provides a ton of support when it comes to trying the best tools, and this freedom has allowed me to gain a lot of competency with them.
You'll need to practice, and you really can't cheap out by running open models (unfortunately).
Async vs Sync
There's also uncertainty about how asynchronous agents and synchronous agents should be used. Should you be firing off a bunch of tasks and then context switching between them? Should you just do one thing at a time with an IDE-like experience?
I think you can do both, but you need to be mindful about how you're using them.
Async agents
I have used an asynchronous agent at work that's able to run code against our own internal codebases. These are really powerful tools, but because they're asynchronous, the expectation is that you fire off the tool, it just runs, and you check in at the end.
This puts the burden on you to context-dump at the beginning and put in a lot of effort thinking through all the potential context the agent might need. There's no real iteration loop. The expectation is you give it a task and it gets the work done.
This means that small, self-contained tasks are perfect for async agents: bug fixes, small JIRA tickets, anything that has a definitive answer for whether it's done or not. Just fire it off; it can run tests, and once it's done it'll create a PR for you. Easy.
Where async agents fail is when the task requires a human in the loop.
Synchronous agents
A synchronous agent runs in an IDE or in the terminal, and you're continuously prompting it to work on small discrete tasks that add up to something bigger or more ambiguous. These agents require the driver to be highly aware of the context around the code, and they act more as a pair programmer than anything. I've found the most success using the Plan/Research, then Execute workflow on small chunks of work.
An example:
Goal: I want to refactor a part of the codebase.
Chunk 1: Go find this module, move it over here, fix imports and references to this code.
Chunk 2: Split up this code into these separate modules, fix imports and references, add tests.
Chunk 3: ...so on and so forth.
The project is completed by implementing small discrete steps that you're able to plan, execute, and verify along the way. If you try to do a huge big-bang refactor, you're more than likely not going to like what you see. There are a lot of small decisions along the way that you as the human will need to make; left up to an agent, many of them will go wrong (because it lacks the context!).
Conclusion
Building trust and coming to a workflow that works for you is going to take time and a lot of tokens. Those of us who can adapt will implement more than ever before, and those who refuse will fall behind and become obsolete. It's unfortunate, but these tools are here to stay; they're far too economically useful.
Footnotes
[1] Dex Horthy, No Vibes Allowed: Solving Hard Problems in Complex Codebases, HumanLayer, 2025. https://www.youtube.com/watch?v=rmvDxxNubIg