The era of intelligence design

The system's behavior has always been determined by code, but we've often treated the model rather than the architecture as the central artifact.

After two years of building AI systems, I've started describing a shift in focus as intelligence design: treating architecture, not the model alone, as the main design surface. This aligns with what Andrej Karpathy calls context engineering, the industry's move from crafting prompts to designing systems.

Prompt Engineering Era                     | Intelligence Design Era
Intelligence lives inside the prompt/model | Intelligence lives in the system architecture
Work: craft prompts, tweak temperature     | Work: design context, state, memory, routing, loops
Primary artifact: the prompt string        | Primary artifact: type system + primitives
Wait for better models                     | Design better systems

Six key shifts:

  1. LLMs demoted to stateless primitives
  2. Primitives over prompts
  3. Horizontal over vertical
  4. Context over chat
  5. Designed loops over human-fixes-blob
  6. Theory over framework

You don't get reliability by making the model smarter; you get it by treating the model as a fallible component and wrapping it in contracts, loops, and checks.

Many advanced AI systems will adopt coding-agent architectures because code offers high expressivity and adaptability. The challenge is productization: moving from kernel mode (full system access) to user mode (constrained, trustable systems). This includes giving agents domain-specific languages to generate code in, structured programs instead of scattered tool calls.


Part 1: My Journey

Late 2023: Excitement

ChatGPT had launched a year earlier, and the demos were compelling: code generation, writing assistance, reasoning that felt almost human. I started building immediately: chatbots, assistants, automations.

Within weeks, specific problems with reliability and maintainability showed up.

Early 2024: Prompt Engineering Disillusionment

The process lacked clear abstractions, invariants, or tests. It reduced to ad-hoc prompt adjustments.

I'd write a prompt, test it, tweak a word, test again. Add "think step by step." Remove "think step by step." Try "You are an expert." Try "You are a helpful assistant." Change the temperature from 0.7 to 0.3. Back to 0.7. Maybe 0.5?

There was no clear set of abstractions or principles for predicting or explaining model behavior across tasks. The best practices were folklore: things that seemed to work for someone, sometime, on some model. Nothing composed. Nothing generalized. Every new task meant starting from scratch.

Compared with conventional software engineering, there were no types, tests, or composable abstractions to reason about behavior. Prompt engineering relied on trial-and-error heuristics rather than explicit models, guarantees, or repeatable methods.

Mid-2024: Framework Disillusionment

I evaluated common frameworks to see whether their abstractions improved reliability and structure.

I dove into LangChain. Then LangGraph. Then DSPy. I read the docs, followed the tutorials, built the example apps. The frameworks advertised composable abstractions and repeatable patterns for building LLM applications, but in practice their boundaries didn't match the systems I needed to build. This frustration is widespread.

LangChain gave me abstractions like Chain and Agent and Memory, but the boundaries did not align with concrete system boundaries I needed to model. The framework's concepts did not map to anything fundamental. They reflected one author's opinionated theory about how to organize code.

LangGraph was worse. The graph abstraction forced me to think in nodes and edges, but the real question (how do I know when to split a node?) had no principled answer. It was the framework author's intuition, encoded as API.

DSPy promised automatic optimization, but optimized for metrics that didn't capture what I actually cared about. The optimizer would find prompts that scored well on benchmarks but failed on the edge cases that mattered.

The frameworks encoded their authors' ad-hoc design choices as APIs without grounding them in shared underlying principles. They standardized APIs around abstractions that, in my experience, didn't yet reflect well-understood design tradeoffs.

The problem went beyond weak or leaky abstractions: many of the boundaries felt arbitrary, reflecting local conventions rather than shared underlying principles.

November 2024: Semantic Objects Turn

The standard framing for agent design is tool-centric: "What tools can the agent use?" You give the agent a list of functions (search_web(), read_file(), send_email()) and it decides which to call. The tools float free, unattached to any domain structure.

I started thinking differently: "What can the agent manipulate?" Instead of tools, give the agent objects. A Document with sections. A Codebase with modules and functions. A Conversation with history. The operations are methods on the objects, not free-floating functions.

I called these semantic objects: typed entities with operations attached.

// Tool-centric: "What can the agent do?"
const agent = new Agent({
  prompt: "You are a code assistant...",
  tools: [read_file, write_file, search, run_tests]
})
await agent.chat("Update the login function")

// Object-centric: "What can the agent manipulate?"
const agent = new Agent({
  prompt: "You are a code assistant...",
  objects: [codebase, testSuite]  // operations are attached to objects
})
await agent.chat("Update the login function")

In the first pattern, read_file and write_file float free. In the second, the codebase object exposes .getModule(), .getFunction(), .updateImplementation(). The operations are scoped to what they affect.

This is object-oriented agent design. The shift matters because:

The objects define what's manipulable. If the agent has a Document object, it can manipulate sections, paragraphs, citations. If it only has read_file and write_file, it manipulates bytes. The object's interface constrains the space of valid operations.

The tools belong to the objects. Instead of a flat list of tools, operations are namespaced to the entities they affect. project.addFunction() is clearly about the project. add_function(project_id, ...) is not.

Text becomes serialization. The text representation is how objects get passed to the model; the object is what the agent is actually manipulating. In most systems, components operate on structured representations rather than raw strings.

I wrote a spec for semantic objects: rich, manipulable objects with data, operations, relationships, and visualization. But semantic objects alone weren't enough. Where do the objects live? How do they persist? Agents need environments with shared storage and lifecycle management, not just isolated objects.
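A semantic object can be sketched in a few lines. The `Document` and `Section` names here are illustrative, not from any framework; the point is that operations are methods scoped to the entity they affect, and text is just a serialization.

```typescript
// A minimal semantic object: a typed entity with operations attached.
// All names here are illustrative, not from any framework.

interface Section {
  heading: string;
  body: string;
}

class Document {
  constructor(private sections: Section[]) {}

  // Operations are scoped to what they affect: the agent manipulates
  // sections, not raw bytes.
  getSection(heading: string): Section | undefined {
    return this.sections.find(s => s.heading === heading);
  }

  updateSection(heading: string, body: string): boolean {
    const s = this.getSection(heading);
    if (!s) return false; // invalid targets are simply not reachable
    s.body = body;
    return true;
  }

  // Text is serialization: how the object is shown to the model,
  // not what the agent is actually manipulating.
  toContext(): string {
    return this.sections.map(s => `## ${s.heading}\n${s.body}`).join("\n\n");
  }
}
```

The interface, not a prompt, defines the space of valid operations.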

Late 2024: Tool Calling Problem

Around this time, I started hating tool calling.

Most frameworks treat tool calling as a single primitive: the model decides to call a tool, you execute it, you return the result. This conflates distinct concerns.

Tool calling actually conflates two completely different operations:

Decision making: The model choosing what action to take. "I should search the web for X." This is a reasoning operation, analyzing context, weighing options, committing to a course of action.

Context manipulation: How the tool's results transform the system's state. The search results get added to context, or stored in memory, or used to update an artifact. This is a state management operation. It's about information flow, not reasoning.

These should be separate abstractions: one for control and decision-making, and another for how tool outputs update system state. But most frameworks collapse them into one "tool call" concept, making systems hard to reason about, hard to debug, hard to compose.
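The separation can be made concrete. In this sketch (all names hypothetical), a `Decision` is what the model emits, while a `ContextUpdate` describes how a tool result transforms system state; the two concerns compose explicitly instead of being fused into one opaque tool call.

```typescript
// Sketch: splitting "tool calling" into two abstractions.
// Decision = a reasoning output; ContextUpdate = a state transition.
// All names here are hypothetical.

type Decision = { action: string; args: Record<string, unknown> };

type ContextUpdate =
  | { kind: "append"; text: string }               // add result to context
  | { kind: "store"; key: string; value: unknown } // write to memory
  | { kind: "discard" };                           // result consumed, not kept

interface SystemState {
  context: string[];
  memory: Map<string, unknown>;
}

// State management is a pure transition, debuggable in isolation
// from any model call.
function applyUpdate(state: SystemState, update: ContextUpdate): SystemState {
  switch (update.kind) {
    case "append":
      return { ...state, context: [...state.context, update.text] };
    case "store": {
      const memory = new Map(state.memory);
      memory.set(update.key, update.value);
      return { ...state, memory };
    }
    case "discard":
      return state;
  }
}
```

Because `applyUpdate` never touches the model, the information-flow half of "tool calling" can be tested and composed like any other state machine.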

These conflated tool-call abstractions were crystallized into APIs before anyone understood the underlying structure.

Early 2025: Visual Grammar Attempt

I needed a representation that made the system's data flow and control structure visible, not just textual logs.

I started designing a visual grammar for agent systems. Boxes for components, arrows for data flow, colors for different types of operations. The goal was to make context composition visible, to see how information assembled for each call, how state flowed through the system, where decisions were made.

The visual models revealed control-flow and data-flow issues, like unnecessary loops and redundant context, that I had overlooked in code. The grammar let me reason about structure in a way that prose and code alone couldn't.

But I hit limits. Visual representations constrain you. Every design choice in the grammar is a commitment about what matters. Some things that were easy to draw were hard to build; some things that were easy to build were hard to draw. The visual was good for understanding, but code was still where the building happened.

In practice, I use visualization for exploration and rely on code for implementation. Diagrams made call sequences and data dependencies easier to inspect than log output alone.

Mid-2025: XJSN Exploration

Around this time I started exploring constrained output languages for agents.

Consider why tools like Bash and Grep are so expressive for LLMs. Models have seen thousands of examples of shell pipelines in their training data. They can generate correct Bash quickly because they understand the syntax patterns deeply. The same is true for JavaScript.

The insight behind XJSN: leverage the model's prior understanding of JavaScript syntax, but don't actually execute it as JavaScript. Instead, parse it into an AST and write a custom interpreter.

Pipeline([
  Filter({ status: "active" }),
  Map({ extract: ["id", "name", "email"] }),
  GroupBy({ field: "department" }),
  Aggregate({ count: Count(), average_age: Average("age") })
])

This looks like JavaScript function calls, so the model generates it fluently. But it's not meant to be executed as JS. You parse it, and each function call becomes a node in your AST. Filter, Map, GroupBy, Aggregate are node types you define, not JavaScript functions.

The power is in the AST phase. You control the grammar: what node types exist, what arguments they accept, what can appear inside them. You can define an If construct but constrain what's allowed in its branches. You can allow Filter but restrict what predicates it accepts. The validation happens at parse time, before any interpretation.

// You define the grammar
const grammar = {
  Pipeline: { children: ['Filter', 'Map', 'GroupBy', 'Aggregate'] },
  Filter: { args: { status: 'string' } },
  // ...
}

// Agent generates XJSN
const output = await agent.generate("Transform this data...")

// Parse and validate against grammar
const ast = parse(output)
const errors = validate(ast, grammar)
if (errors.length > 0) return retry(errors)

// Your custom interpreter executes the validated AST
const result = interpret(ast, data)

Businesses can define their own grammars for their domain. A compliance team defines audit node types. A data team defines transformation primitives. The agent generates code in familiar JavaScript syntax, but only the node types in your grammar are valid. Verification is tractable because you validate the AST against a known grammar, not arbitrary code.

I didn't take XJSN all the way at the time, but the concept shaped how I think about constrained generation: leverage existing syntax knowledge, parse into an AST, validate against a grammar, interpret with custom semantics.

Fall 2025: Coding Agents Arrive

Claude Code shipped. Cursor exploded. Devin made waves. Coding agents went from research demos to production tools.

They were effective enough to complete nontrivial coding tasks end-to-end, with some failures. These systems could take a task, explore a codebase, write code, run tests, fix errors, iterate until done. They exhibited more autonomous problem-solving than many earlier "agent" demos I had seen.

Why did coding agents work when so many other agent applications flopped?

The answer started crystallizing everything I'd been thinking: coding agents work because they're well-architected systems implementing real primitives.

Context: Intelligent assembly of relevant code, documentation, error messages, history. Not just "dump everything in" but careful selection of what the model needs.

Memory: Persistent understanding of the codebase. Files, structure, patterns, conventions.

Agency: Real effects in the world. File writes, command execution, test runs. Not just text generation but actual action.

Reasoning: Verification loops. Write code, run tests, see errors, fix, repeat. The loop structure turns unreliable generation into reliable convergence.

Evaluation: Test suites, linters, type checkers. External verification that doesn't depend on the model's self-assessment.

Coding agents made this architecture concrete and showed that it can work in practice. The intelligence isn't in the prompt. It's in the system architecture. The model is a component. The orchestration is the product.

November 2025: Mechanistic Mindset Connection

Meanwhile, a parallel thread was running that I hadn't fully connected.

For years I'd been building a wiki on what I call the "mechanistic mindset": the idea that cognition is computation. Not as metaphor but literally. Thinking is information processing. Behavior is the execution of algorithms. The mind is a machine, and understanding the machine is understanding the mind.

The wiki had grown to 95 articles covering everything from motivation to laziness to willpower, all reframed through a computational lens. Instead of "I procrastinate because I'm lazy," you ask: What are the competing reward signals? What's the latency and friction of switching tasks? How is uncertainty about success suppressing action?

This isn't everyone's view. But it's mine, and I published it as an Obsidian site. At that point, I connected this view of cognition with how I was structuring AI systems.

If cognition is computation, then designing cognition is designing computation. The question "how do I build an agent?" becomes "how do I design a computational system that exhibits intelligent behavior?" This is software engineering: inputs, outputs, state, control flow, abstraction, composition.

In this framing, intelligence design reduces to code design. The code is where the intelligence lives. The model is one component, a stateless text transformer. The intelligence emerges from how you orchestrate context, state, memory, and loops.

December 2025: Elements of Agentic System Design

I started writing Elements of Agentic System Design. The goal was to describe the design space from first principles, rather than documenting a particular framework.

I landed on 10 elements that capture every design decision you make when building intelligent systems:

  1. Context: What information is assembled for each call
  2. Memory: What persists across calls
  3. Agency: Capacity to produce effects in the world
  4. Reasoning: Structure of model calls (loops, planning, reflection)
  5. Coordination: How multiple components communicate
  6. Artifacts: Structured outputs that evolve over iterations
  7. Autonomy: What triggers computation
  8. Evaluation: How you measure success
  9. Feedback: Signals that flow back to improve the system
  10. Learning: How the system gets better over time

These aren't new inventions for AI. They're distributed systems primitives, programming language concepts, standard software engineering applied to a context where one of your components is an LLM.

The model is a stateless text transformer. You control everything else.

January 2026: Intelligence Design

On January 1st, 2026, I finished the book and connected the strands of the previous work into a single framework.

Experience                         | Insight
The prompt engineering frustration | The primitives are missing
The framework disillusionment      | Abstractions before understanding
The semantic objects work          | Agents need typed environments
The tool calling critique          | Operations are conflated
The visual grammar                 | Code is the final artifact
The XJSN branch                    | Give agents languages to think in
The mechanistic mindset            | Cognition is computation
The coding agents success          | The paradigm is proven

The system's behavior has always been determined by code, but we rarely treated it as the primary locus of intelligence.

I started calling this "intelligence design": the discipline of designing systems that think with you, where the code architecture is the central artifact rather than the prompt.


Part 2: The Core Reframes

1. LLMs demoted: From "smart teammate" to "stateless text transformer"

The model has no memory. No identity. No continuity between calls. Each invocation is pure function over its input, text in, text out. The "agent" that feels continuous is your architecture, not any capability inherent to the model.

This demotion changes the design focus from eliciting "understanding" to engineering around known limitations. You stop relying on the model to infer everything correctly and instead design systems that tolerate and correct its errors.

2. Primitives over prompts: From prompt engineering to system primitives

Instead of focusing on phrasing a single prompt, focus on how the system constructs context for each model call. Instead of tweaking temperature, design verification loops.

Each of the 10 elements is a category of decisions you're making, whether explicitly or by default.

3. Horizontal over vertical: From apps to horizontal primitives

Don't build a "coding agent." Don't build a "writing agent." Build context construction primitives, reasoning loop primitives, verification primitives. The horizontal primitives compose into any vertical application.

The coding agents that work look more like orchestration systems applied to code than like monolithic "coding AIs." Many of the same patterns also carry over to writing, research, and data analysis.

4. Context over chat: From chat agents to context machines

The context window IS the agent's "mind" for that call. What you put in the window determines what the model can do. Context engineering (deciding what goes in, what stays out, how it's structured) is the primary leverage point.

A 10-line prompt with perfect context beats a 1000-line prompt with wrong context. Every time.
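As a hedged sketch of what context construction looks like as code: a selector that fills a fixed budget with the highest-relevance items rather than dumping everything in. The relevance scores are assumed to come from some upstream scorer; the shape of the decision, not the heuristic, is the point.

```typescript
// Sketch: context assembly as explicit selection under a budget.
// The scoring mechanism is assumed to exist elsewhere.

interface ContextItem {
  text: string;
  relevance: number; // assumed to be produced by an upstream scorer
}

function assembleContext(
  items: ContextItem[],
  budget: number // max total characters for this call
): string {
  const selected: string[] = [];
  let used = 0;
  // Highest-relevance first; skip anything that would bust the budget.
  for (const item of [...items].sort((a, b) => b.relevance - a.relevance)) {
    if (used + item.text.length > budget) continue;
    selected.push(item.text);
    used += item.text.length;
  }
  return selected.join("\n---\n");
}
```

Even this naive greedy version makes the tradeoff inspectable: you can log exactly what was included, what was cut, and why.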

5. Designed loops: From human-fixes-blob to draft→critique→revise

The old pattern: model generates, human evaluates, human fixes, repeat. The model produces blobs; humans are the loop.

The new pattern: model generates, model (or tool) critiques, model revises, system evaluates, repeat. The loop is designed into the system. Humans set the termination condition, not the iteration.

Draft → critique → revise turns unreliable generation into directed search. The model doesn't need to be right; it needs to be improvable.
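The loop itself is ordinary control flow. In this sketch, `draft`, `critique`, and `revise` are stand-ins for model calls; the structure around them is what the system designer owns.

```typescript
// Sketch of a draft → critique → revise loop. The callbacks stand in
// for model calls; the control structure is the designed artifact.

interface Critique {
  ok: boolean;
  feedback: string;
}

async function refineLoop(
  draft: () => Promise<string>,
  critique: (text: string) => Promise<Critique>,
  revise: (text: string, feedback: string) => Promise<string>,
  maxIterations = 3
): Promise<{ text: string; converged: boolean }> {
  let text = await draft();
  for (let i = 0; i < maxIterations; i++) {
    const c = await critique(text);
    if (c.ok) return { text, converged: true }; // termination condition
    text = await revise(text, c.feedback);      // one directed-search step
  }
  // Humans set the termination condition, not the iteration:
  // after maxIterations, surface the best attempt instead of looping.
  return { text, converged: false };
}
```

The `converged: false` branch is where escalation hooks in: the system admits it did not finish rather than silently returning a blob.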

6. Theory over framework: From framework-driven to theory-first

Build on distributed systems thinking. Build on programming language theory. Build on state machine formalism. The primitives of intelligent systems are the primitives of computation, applied to a new domain.

Don't learn LangChain. Learn context construction. Don't learn LangGraph. Learn control flow patterns. The framework is temporary. The theory persists.

Reliable Systems from Unreliable Components

You don't get reliability by making the model smarter; you get it by treating the model as a fallible component and wrapping it in contracts, loops, and checks.

The model will hallucinate. The model will misunderstand. The model will go off the rails. This isn't a bug to be fixed; it's a constraint to be designed around.

Typed interfaces. Every LLM boundary is a contract. Structured outputs, not blob-of-text. Schemas that enforce valid structure. Instead of emitting free-form JSON, the model fills fields defined by a schema, and the runtime enforces structural validity.
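A minimal sketch of such a boundary, with a hand-rolled validator for illustration (in practice a schema library would do this): the model's raw text is parsed and checked before anything downstream sees it.

```typescript
// Sketch: a typed LLM boundary. Only structurally valid values cross.
// TaskPlan and the validator are illustrative, not from any library.

interface TaskPlan {
  goal: string;
  steps: string[];
}

function parseTaskPlan(raw: string): TaskPlan | { error: string } {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return { error: "not valid JSON" };
  }
  const obj = data as Record<string, unknown>;
  if (typeof obj?.goal !== "string") return { error: "missing goal" };
  if (
    !Array.isArray(obj.steps) ||
    !obj.steps.every((s) => typeof s === "string")
  ) {
    return { error: "missing or malformed steps" };
  }
  // Downstream code receives a TaskPlan, never a blob of text.
  return { goal: obj.goal, steps: obj.steps };
}
```

The error branch is what feeds a retry: the validator's message goes back to the model instead of propagating a malformed value through the system.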

Narrow primitives. Single-responsibility operations. Each model call does one thing. Compose small reliable calls, don't build large unreliable monoliths.

Verification loops. Draft, then verify. Generate, then check. The loop structure means errors get caught and corrected. Self-evaluation, tool-based verification (run the tests, check the types), cross-checks against external sources.

Graceful degradation. When verification fails repeatedly, don't crash, escalate. Ask the user. Fall back to simpler approaches. Admit uncertainty. Reliability includes knowing when you can't succeed.
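Graceful degradation is also just control flow. In this sketch, `attempt` and `fallback` are illustrative stand-ins for model-backed strategies: the system retries a bounded number of times, falls back to a simpler approach, and finally escalates instead of crashing.

```typescript
// Sketch: bounded retries, then fallback, then escalation.
// attempt/fallback stand in for model-backed strategies.

type Outcome =
  | { status: "ok"; value: string }
  | { status: "fallback"; value: string }
  | { status: "escalate"; reason: string };

function runWithDegradation(
  attempt: () => string | null,  // null = verification failed
  fallback: () => string | null, // simpler, more reliable approach
  maxRetries = 2
): Outcome {
  for (let i = 0; i < maxRetries; i++) {
    const v = attempt();
    if (v !== null) return { status: "ok", value: v };
  }
  const f = fallback();
  if (f !== null) return { status: "fallback", value: f };
  // Reliability includes knowing when you can't succeed.
  return { status: "escalate", reason: "all strategies exhausted" };
}
```

The `escalate` outcome is where the human enters the loop, by design rather than by crash.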

Spotify's engineering team describes coding workflows that use similar draft-run-revise loops. They draft code, run tests, see failures, revise, repeat. The loop turns an unreliable generator into a reliable problem-solver.


Part 3: What Comes Next

Convergence Toward Coding Agent Patterns

Many sophisticated AI systems converge toward coding agent patterns.

Why? Because code is a highly expressive medium. A coding agent isn't limited to predefined tools; it can write new tools. It isn't limited to predefined workflows; it can create new workflows. The ability to write and execute code lets the system define new tools and workflows on demand.

Coding agents can reach high levels of agency and adaptability when given enough access and tooling:

  • Maximum expressivity: Any computable operation is reachable
  • Maximum adaptability: The system can modify its own behavior
  • Maximum causality: Real effects in real systems

This is why coding agents work better than most domain-specific agents. They're not constrained by the tools some framework author imagined. They have the full power of computation.

I expect many workflows that can be expressed as code-generation loops to shift toward coding-agent-style architectures. Writing assistance becomes "write code that produces documents." Data analysis becomes "write code that analyzes data." Research becomes "write code that searches and synthesizes."

How Do We Productize It?

Productizing these systems raises a constraint:

Current coding agents operate in what I call kernel mode: full access to everything.

  • Full file system access
  • Arbitrary code execution
  • Unconstrained tool use
  • Complete system control

This is necessary for capability. You can't build a real coding agent without letting it read and write files, run commands, make network requests.

But kernel mode isn't productizable. You can't ship a system that has arbitrary access to users' systems. You can't deploy an agent that might do anything. For products, you need constraints. You need safety. You need trust.

One useful framing is a shift from "kernel mode" systems toward more constrained "user mode" designs: still capable, but bounded in ways that enable trust.

Prediction 1: File Systems → Object Systems

Current state: Agents operate on raw files with free-floating tools.

// Tool-centric agent construction
const agent = new Agent({
  prompt: "You are a coding assistant...",
  tools: [readFile, writeFile, search, runCommand]
})
// Agent can access any path, write any content

Arbitrary paths. Arbitrary content. The file system is the interface. This works, but it's maximally unconstrained. The agent can read anything, write anything, delete anything.

Future state: Agents operate on semantic objects with attached operations.

// Object-centric agent construction
const agent = new Agent({
  prompt: "You are a coding assistant...",
  objects: [project, testSuite]
  // project exposes: .getModule(), .getFunction(), .updateImplementation()
  // testSuite exposes: .run(), .getFailures(), .getCoverage()
})
// Agent can only manipulate what the objects expose

This is the semantic objects pattern applied to production systems. The operations are methods on the objects, not free-floating tools. The file system becomes an implementation detail. The agent sees "functions" and "modules" and "tests", not paths and bytes.

Why this matters:

The objects define what's manipulable. You can't accidentally delete system files if your interface is "modify this function." Invalid operations become impossible, not just forbidden.

Operations have semantic preconditions. "Update this function" can verify that the new code type-checks, has tests, doesn't break callers. "Write arbitrary bytes to path" can't.

Objects enforce their own manipulation. Today, "agent skills" are prompts explaining how to use scripts: when to invoke them, what they do, how to call them. This is weak because it's just instructions; the agent can still do anything. With object systems, the object's interface IS the enforcement. You don't write prompts explaining how to manipulate a Module. The Module exposes .getFunction(), .updateImplementation(), and that's what's possible. The abstraction enforces itself.

Higher-level structures become possible. When objects enforce their interfaces, you can build on top of them. A Project contains Modules which contain Functions. You stop re-reading files to understand what's possible. The types tell you. Agents at different levels compose because the interfaces guarantee what each level can do.

Builders can refine object types. Domain experts create ReportSpec, DataPipeline, AuditTrail objects with operations tailored to their domain. These aren't prompts explaining file formats. They're typed abstractions that enforce how they're used. The ecosystem grows because people share object definitions that carry their own guarantees.
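As a sketch of a self-enforcing interface (names hypothetical), an update operation can carry its own precondition: the new implementation must pass a verifier before it lands, so "write arbitrary bytes to path" is simply not expressible.

```typescript
// Sketch: an object whose interface enforces its own manipulation.
// Module and its verifier are illustrative, not from any framework.

interface ModuleFunction {
  name: string;
  source: string;
}

class Module {
  constructor(
    private fns: Map<string, ModuleFunction>,
    private verify: (source: string) => string[] // returns error messages
  ) {}

  getFunction(name: string): ModuleFunction | undefined {
    return this.fns.get(name);
  }

  // Semantic precondition: the update only lands if verification passes.
  // Returns [] on success, error messages otherwise.
  updateImplementation(name: string, source: string): string[] {
    const fn = this.fns.get(name);
    if (!fn) return [`no such function: ${name}`];
    const errors = this.verify(source);
    if (errors.length > 0) return errors;
    this.fns.set(name, { name, source });
    return [];
  }
}
```

There is no prompt explaining how to use the `Module` correctly; the only exposed operations are the correct ones.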

Prediction 2: Constrained Code Generation

Current state: Sandbox everything, hope for the best.

Agents generate arbitrary code. We contain the damage with sandboxes, isolated environments, limited permissions, restricted networks. The model can do anything; we just limit the blast radius when it goes wrong.

Sandboxing limits the blast radius of mistakes but doesn't address whether the generated code is appropriate in the first place.

Future state: Constrain the type of code agents can generate.

Instead of "generate arbitrary Python," generate "data transformations in this DSL." Instead of "write any code," write "workflow steps using these primitives." The output language constrains what's possible, not just what's allowed.

// Arbitrary generation: anything goes
const code = await model.generate("Write Python to process this data")
await sandbox.execute(code)  // Hope it's safe

// Constrained generation: valid by construction
const transform = await model.generate<DataTransform>(schema)
const result = interpret(transform, data)  // Safe by design

Why this matters:

Verification becomes possible. A DSL with limited operations can be verified exhaustively. Arbitrary code can only be contained.

Safety becomes design. You're not defending against bad code; you're making more classes of harmful outputs unrepresentable. The grammar of valid outputs excludes harmful outputs.

Reliability improves. Constrained generation is easier than arbitrary generation. The model has less rope to hang itself with.

Constrained generation reduces the need to rely solely on sandboxing by making more classes of harmful outputs unrepresentable.

Prediction 3: Domain-Specific Languages for Agents

Current state: Agents express computation through scattered tool calls.

Each tool call requires a full model invocation. The agent can't express "search for X, filter by Y, summarize Z" as a single thought. It must interleave execution with decision-making, rebuilding context at each step. This is analogous to a system that recomputes its control state from scratch after every step, instead of maintaining an explicit program.

Future state: Businesses create domain-specific languages with JavaScript-like syntax, and agents generate code in those languages.

The XJSN approach: leverage the model's existing understanding of JavaScript syntax, but constrain the valid constructs via a grammar. A compliance agent generates audit expressions that look like JavaScript function calls:

AuditProcess({
  scope: RecursiveWalk({
    root: EntityGraph("subsidiaries"),
    filter: And([
      Revenue(GreaterThan(10000000)),
      Jurisdiction(In(["EU", "US", "UK"]))
    ])
  }),
  decision: Switch({
    cases: [
      { when: Above(Percentile(95)), then: Escalate({ to: "CFO" }) },
      { when: Above(Percentile(80)), then: Schedule({ review: "quarterly" }) },
      { default: Archive({ retention: Years(7) }) }
    ]
  })
})

The model generates this fluently because it looks like JavaScript. But only the function names and argument structures defined in your grammar are valid. You validate the output against the grammar and retry with error messages if it fails.
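The validate-and-retry step can be sketched as a walk over the parsed call tree (AST shape simplified here to nested calls): any construct outside the grammar produces an error message to feed back to the model.

```typescript
// Sketch: grammar validation for a DSL like the audit expression above.
// The AST shape is simplified to nested function calls.

interface CallNode {
  name: string;
  args: CallNode[]; // nested calls only, for brevity
}

function validateAgainstGrammar(
  node: CallNode,
  allowed: Set<string>,
  errors: string[] = []
): string[] {
  if (!allowed.has(node.name)) {
    errors.push(`unknown construct: ${node.name}`);
  }
  for (const child of node.args) {
    validateAgainstGrammar(child, allowed, errors);
  }
  // Empty list = valid program; otherwise retry with these messages.
  return errors;
}
```

Because the check is set membership over a known grammar rather than analysis of arbitrary code, it runs in a single pass and produces actionable retry messages.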

Why this matters:

Leverages existing knowledge. Models already know JavaScript syntax deeply. You don't need to teach them a new language from scratch.

Grammar-based validation. You define which constructs are valid. Validation is fast because you check against a known grammar, not arbitrary code.

Domain-specific constraints. A compliance team defines audit operations. A data team defines transformation primitives. The grammar constrains what operations are expressible.

Safety by design. If "delete all files" isn't in the grammar, it can't be generated. The grammar is the security boundary.


Part 4: The Thesis

The Core Claim

During the prompt-engineering phase, most effort went into writing instructions for the model instead of designing system structure.

Intelligence design shifts the focus to architecting with models: designing context construction, loop structures, verification layers, and state management. The model is a component; the system is the product.

The 10 Elements

A practical design space for intelligent systems can be organized as follows:

Element      | What It Is                                             | Standard CS Analog
Context      | What information is assembled for each call            | Inputs & representations
Memory       | What persists across calls                             | Data structures & storage
Agency       | Capacity to produce effects in the world               | I/O & side effects
Reasoning    | Structure of model calls (loops, planning, reflection) | Control flow & algorithms
Coordination | How multiple components communicate                    | Distributed systems
Artifacts    | Structured outputs that evolve                         | Data models & versioning
Autonomy     | What triggers computation                              | Event systems & scheduling
Evaluation   | How you measure success                                | Testing & observability
Feedback     | Signals that flow back to improve                      | Monitoring & logging
Learning     | How the system improves over time                      | Online learning & adaptation

These aren't new inventions for AI; they're the same primitives we use for any distributed system. The model is just one component. You control the other nine.

The Vocabulary

Intelligence Design: The discipline of designing systems that think with you. It's more specific than prompt engineering, which tunes one component, and more precise than generic "AI development": the systematic practice of building computational systems that exhibit intelligent behavior.

The term "agent" is overloaded. Focusing on system design clarifies the underlying concerns: context construction, state management, control flow, verification. The 10 elements provide a vocabulary to discuss these systems precisely.

The Consequences

Context construction over prompt phrasing. The prompt is one line. The context is everything. What information does the model need? How is it assembled? How is it structured? This is where leverage lives.

System design over model capability. Model quality will improve over time, but in many projects I see larger near-term gains from improving system architecture. You control nine of the ten elements of any intelligent system.

Primitives over frameworks. It is more durable to understand architectural primitives than to depend on the APIs of any single framework. LangChain will change. LangGraph will evolve. DSPy will iterate. The 10 elements won't.

System thinking over agent thinking. Focusing on the system's architecture (context, loops, and state) clarifies design choices more than the generic label "agent." What's my context strategy? What's my loop topology? What's my state model? What's my verification approach? These questions have answers. They lead to code. They compose into systems that work.


Conclusion

Over two years, I iterated across many systems and frameworks, discarding abstractions that did not support reliable behavior.

In practice, intelligence design reduces to code design. The code is where the intelligence lives. The model is stateless, a function from text to text. Everything that makes a system "intelligent" is your architecture: what context you construct, what loops you run, what state you maintain, what verification you perform.

Work is shifting from prompt-centric experimentation toward system-level intelligence design. The system's behavior has always been determined by code. Now we recognize that code as the primary artifact.