Coding Agents as the Serious-Work Stack

For a long time I treated coding agents as coding agents.

The name does that to you. You assume the boundary is the craft. If you are not shipping code, you should be somewhere else. ChatGPT can handle the words. Claude can handle the brainstorm. The terminal-shaped product is for engineers doing engineer work.

I stayed skeptical until I didn’t.

The reality is that the environment around the model matters as much as the model. When I started pairing skill files with MCP servers, local files, and the ability to run commands or write scripts on the fly, I stopped thinking of Cursor-style setups as “for code only.” I started using them for work that looks nothing like a pull request.

Go-to-market was my gateway. Customer research, positioning notes, first-pass outreach, mapping pain from real threads instead of vibes. But GTM is not the thesis. It is the example that made the pattern obvious.

Simply put: the winning interface for serious work is shifting from “talk to a model” toward “steer an agent inside a project.”

A project, here, is not poetry. It is a folder with history. Rules the agent should respect. Skills that encode someone else’s process. Tools that pull direct sources from the web or your stack. A shell when the task needs ten steps, not one paragraph. Git when you want iteration to leave a trail.

That bundle does something chat-first products are still awkward at. Not because they lack models. Because they optimize for conversation, not composition.

What Actually Changed

Nothing magic happened inside the weights.

What changed was the stack around the model: judgment packaged as skills, evidence wired through MCP, hands in the terminal and in small programs the agent writes when needed, and memory that lives in the same place as the thing you are building.

Skill files are not generic motivation. They are stored process and taste. I wrote about what that means in There’s a Skill for That. The short version is that a good skill does not only instruct. It teaches the agent what good looks like and how to move through a real workflow.

MCP is the opposite of hoping the model “just knows.” It is a pipe to live data: pages, APIs, docs, sometimes a browser. The issue with plain chat is not honesty. It is grounding. If you want to know whether a market pain is real, you need more than a plausible paragraph. You need to pull the source—forum threads, pricing pages, changelogs, support complaints—and let the argument attach to reality.

Then there is execution. CLI tools and “write a script to do the boring part” sound technical. They are. That is the point. A lot of valuable work is half judgment and half repetition. The coding agent can turn the repetition into something reusable. Chat UIs can bolt on code execution, but it rarely feels as natural as the machine that was already built for files, diffs, and commands.

Finally, the repo. One place. Context does not get lost across thirty threads. Your constraints, your prior drafts, your links, your decisions sit next to the work. That is not a small UX tweak. It changes how knowledge compounds.

Why the Chat Window Feels Second-Class for This

I still use general chatbots for simple tasks. A quick rewrite. A short explanation. A back-and-forth where I do not need a trail.

The issue with using only that layer for workflow automation or deep research is not that the model is dumb. Apps and plugins exist. You can connect tools. You can upload PDFs. You can get a long answer.

But the default posture is still conversation as product. Memory is session-shaped. Tooling is bolted on. The center of gravity is the scrollback, not the filesystem.

In a coding agent on my machine, a skill installs once and the agent gains a repeatable superpower. MCP is first-class wiring, not an afterthought. The codebase is the spine. That difference matters when the output has to be checked, re-run, and improved the way real work does.

A Concrete Scene (GTM as Example, Not the Whole Point)

Picture customer discovery before you build.

You grab a research skill, the kind of packaged marketing process that domain experts distribute in ecosystems such as Marketing Skills for AI Agents, where repeatable go-to-market work gets turned into skill files. You wire in Firecrawl or another connector so the agent is not inventing Reddit threads. You point it at a subreddit, a competitor’s help forum, a niche Slack archive if you have access. You ask for pain points, language people use, what they tried, what they hate.

The output is not only a summary. It can be excerpts, rough taxonomy, a list of hypotheses that trace back to URLs. That is different from “here is what I think customers feel.”
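
One way to keep that traceability honest is to make it a data shape instead of a habit. The sketch below is a minimal illustration in Python, not the format any particular skill or connector actually produces. The names Excerpt, Hypothesis, and untraced are hypothetical, and the URL and quotes are placeholders, not real research output.

```python
from dataclasses import dataclass, field


@dataclass
class Excerpt:
    """A verbatim quote plus the page it was pulled from (hypothetical shape)."""
    quote: str
    url: str


@dataclass
class Hypothesis:
    """A claim about customer pain and the excerpts that support it."""
    claim: str
    evidence: list[Excerpt] = field(default_factory=list)


def untraced(hypotheses: list[Hypothesis]) -> list[str]:
    """Return the claims that have no excerpt carrying an http(s) URL."""
    return [
        h.claim
        for h in hypotheses
        if not any(e.url.startswith(("http://", "https://")) for e in h.evidence)
    ]


# Placeholder data for illustration only.
notes = [
    Hypothesis(
        claim="Onboarding takes too long",
        evidence=[Excerpt("Setup took me a full weekend.", "https://example.com/thread/1")],
    ),
    Hypothesis(claim="Pricing is confusing"),  # no source attached yet
]

print(untraced(notes))
```

Anything the check returns is still “here is what I think customers feel” and needs a source before it earns a place in the notes.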

That workflow looks like go-to-market. The underlying stack is not limited to GTM.

The Same Machine, Different Jobs

Once you see the pattern, you stop forcing “developer tool” to mean “developer task.”

Ops and internal workflows: Turn a messy weekly process into a script plus a checklist the agent follows, with logs in the repo.
Competitive teardowns: Same evidence loop—pull pages, changelogs, pricing, positioning—then synthesize with a skill that knows what to compare.
Research dossiers: Long-running notes with sources, not a single ephemeral essay.
Policy-ish or high-stakes drafts: I am not saying the agent is a lawyer. I am saying the serious version of the task is to iterate with citations and structure, which maps cleanly to files and tools.
Content pipelines: First drafts grounded in your own corpus, stored next to the site, diffable like code.
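
To make the ops item concrete, here is a rough sketch of “a script plus a checklist, with logs in the repo.” It is an assumption about how such a script might look, not a prescribed layout. The run_checklist helper, the two stand-in steps, and the weekly.jsonl file name are all hypothetical, and the example writes to a temporary directory so it can run anywhere.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable


def run_checklist(steps: list[tuple[str, Callable[[], str]]], log_path: Path) -> list[dict]:
    """Run each step in order and append one JSON line per step to log_path."""
    records = []
    with log_path.open("a", encoding="utf-8") as log:
        for name, action in steps:
            try:
                record = {"step": name, "ok": True, "detail": action()}
            except Exception as exc:  # record the failure and keep going
                record = {"step": name, "ok": False, "detail": str(exc)}
            record["at"] = datetime.now(timezone.utc).isoformat()
            log.write(json.dumps(record) + "\n")
            records.append(record)
    return records


def export_report() -> str:
    return "exported 3 rows"  # stand-in for the real weekly task


def notify_team() -> str:
    raise RuntimeError("no webhook configured")  # simulated failure


# In a real project the log would sit in the repo (assumed path: logs/weekly.jsonl).
with tempfile.TemporaryDirectory() as tmp:
    log_file = Path(tmp) / "weekly.jsonl"
    results = run_checklist([("export", export_report), ("notify", notify_team)], log_file)
    logged_lines = log_file.read_text(encoding="utf-8").splitlines()

print([(r["step"], r["ok"]) for r in results])
```

Because the log is append-only text, committing it gives each weekly run a diffable trail, which is the part a chat scrollback does not give you.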

The through-line is not sales. It is judgment plus verifiable inputs plus repeatable execution, held in one place.

A Simple Framework

If you want a shorthand for why this stack hits harder than chat alone, think in five dimensions.

Judgment (skills) — Whose process are you borrowing, and what does “good” mean?
Evidence (MCP and tools) — What can the agent actually read, fetch, or query?
Execution (CLI / code on the fly) — What can it run, automate, or regenerate without you typing everything again?
Memory (repo) — Where do constraints, decisions, and drafts live so the next session starts ahead?
Iteration (diffs, commits) — How does improvement leave a trail you can trust?

You do not need perfect scores on all five every time. You need enough of them that the work stops feeling like a magic trick and starts feeling like work.

Picture all five feeding one workspace: skills and repo shape what to do and what you already decided; MCP and CLI handle what is true in the world and what can be automated; the agent sits in the middle and produces artifacts you can reopen, diff, and verify.

Concerns

This stack has costs.

Setup is not free. Skills vary wildly in quality. MCP and APIs tie you to vendors and rate limits. The terminal is power with responsibility: you can automate harm or spam as easily as insight if you stop being careful. Polished Markdown can look rigorous when the underlying pulls were shallow—fake rigor is a real failure mode.

Scraping public threads raises ethics and ToS questions. “Research” can slide into surveillance vibes if you forget there are people on the other side of the text.

None of that kills the pattern. It means the grown-up version includes skepticism, consent, and checking sources—same as any serious research habit.

What This Means

Tools do not only change output. They change behavior.

When the serious interface for knowledge work shifts toward agents inside projects, the floor rises for solo builders and small teams. You can borrow expertise without hiring a full role for every early task. You can ground decisions in direct sources instead of narrative confidence.

Institutions and habits will lag. Lots of knowledge work is still organized around meetings, decks, and Slack scroll, not repos and diffs. That mismatch will keep creating openings for people who learn this stack early.

I am not claiming chatbots are useless, or that every task belongs in a coding agent. I am saying the center of gravity for workflow automation and deep research has started to move—toward environments built for files, tools, and memory, not only messages.

That is bigger than GTM.

It is software reshaping how serious work gets done, one folder at a time.

Related: There’s a Skill for That on skill files, packaged process, and labor leverage.