
Building with AI Agents: A Technical Deep-Dive into Multi-Agent Orchestration

Nicholaus Link

Feb 20, 2026 · 12 min read

TL;DR

We built a system that coordinates 7 specialized AI agents in a sequential pipeline to clone entire websites from reference templates. Each agent is an expert in one domain (extraction, planning, building, polishing), and they communicate through a shared workspace to transform a design reference into a production-ready React app.


The Agent Architecture

Meet the Team

Our system uses 7 specialized agents, each with specific tools and responsibilities:

  1. EXTRACT Agent - Web scraping specialist

    • Uses Browserbase MCP for browser automation
    • Captures desktop/mobile screenshots at multiple viewports
    • Extracts text content, structure, and assets
    • Downloads source site images
    • Output: workspace/extraction/*.json with structured content
  2. REFERENCE Agent - Design system analyst

    • Screenshots reference template at 5+ viewport sizes
    • Downloads HTML/CSS source code
    • Analyzes design tokens (colors, typography, spacing)
    • Documents component patterns and animations
    • Output: workspace/reference/ with screenshots + source + analysis
  3. PLANNER Agent - Implementation architect (uses Claude Sonnet 4.5)

    • Maps source content to reference design
    • Extracts exact CSS values for design tokens
    • Creates component specifications
    • Plans page layouts and routing
    • Output: workspace/plan/*.md with detailed implementation specs
  4. BUILDER Agent - React developer

    • Translates designs to React + TypeScript + Tailwind
    • Implements animations with Framer Motion
    • Integrates shadcn/ui components
    • Creates responsive layouts
    • Output: Full React app in workspace/src/
  5. COPYWRITE Agent - Content specialist

    • Generates tone-appropriate copy (luxury, SaaS, playful, etc.)
    • Fills content gaps from extraction
    • Enhances headlines and CTAs
    • Output: workspace/content/copy.json
  6. ASSETS Agent - Image curator

    • Searches Pexels/Unsplash APIs
    • Curates style-matching stock photos
    • Downloads optimized sizes
    • Output: workspace/public/images/ + metadata
  7. POLISH Agent - QA engineer

    • Screenshots built site vs reference
    • Identifies visual discrepancies
    • Makes iterative fixes (max 3 rounds)
    • Output: workspace/screenshots/round_N/ + critique files
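
In code, this roster boils down to an enum plus an output map. A minimal sketch (the string values and the dictionary are illustrative assumptions; only the AgentType name also appears in the orchestrator snippets later in this post):

from enum import Enum

class AgentType(Enum):
    EXTRACT = "extract"
    REFERENCE = "reference"
    PLANNER = "planner"
    BUILDER = "builder"
    COPYWRITE = "copywrite"
    ASSETS = "assets"
    POLISH = "polish"

# Where each agent writes inside the shared workspace (from the roster above)
OUTPUT_DIRS = {
    AgentType.EXTRACT: "workspace/extraction/",
    AgentType.REFERENCE: "workspace/reference/",
    AgentType.PLANNER: "workspace/plan/",
    AgentType.BUILDER: "workspace/src/",
    AgentType.COPYWRITE: "workspace/content/",
    AgentType.ASSETS: "workspace/public/images/",
    AgentType.POLISH: "workspace/screenshots/",
}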

The Orchestration Strategy

Sequential Pipeline Execution

Unlike systems that spam parallel agents, we use a strict sequential pipeline:

INIT → EXTRACT → REFERENCE → PLAN → BUILD →
COPYWRITE → ASSETS → VERIFY_BUILD → DEV_SERVER → POLISH → COMPLETE

Why sequential?

  • Data dependencies: Each agent needs outputs from previous agents
  • Workspace coordination: All agents write to the same directory
  • Resource management: Prevents file write conflicts
  • Predictability: Clear execution flow, easy to debug
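
Encoded as an enum, the sequence is trivial to reason about. A sketch assuming explicit integer ordering (the real PipelinePhase definition, referenced in the orchestrator code later in this post, isn't shown in full):

from enum import Enum

class PipelinePhase(Enum):
    INIT = 0
    EXTRACT = 1
    REFERENCE = 2
    PLAN = 3
    BUILD = 4
    COPYWRITE = 5
    ASSETS = 6
    VERIFY_BUILD = 7
    DEV_SERVER = 8
    POLISH = 9
    COMPLETE = 10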

Hub-and-Spoke Communication

Agents don't talk to each other directly. They communicate through:

  1. HTTP API: Orchestrator sends prompts to OpenCode server (http://localhost:4096/api/chat)
  2. File System: Agents read outputs from previous agents in shared workspace/
  3. Session IDs: Maintain conversational context across agent invocations

Pipeline Orchestrator (Hub)
    ↓ HTTP POST
EXTRACT Agent → Writes extraction/*.json
    ↓ File System
REFERENCE Agent → Reads extraction, writes reference/*
    ↓ File System
PLANNER Agent → Reads both, writes plan/*.md
    ... and so on

State Machine with Resume Capability

The pipeline is a state machine that saves progress to .pipeline_state.json:

{
  "phase": "BUILD",
  "start_time": "2026-01-31T10:30:00",
  "polish_rounds": 0,
  "session_ids": {
    "extract": "ses_abc123",
    "reference": "ses_def456",
    "planner": "ses_ghi789"
  },
  "errors": []
}

Resume from anywhere: If an agent fails, you can restart the pipeline from the last successful phase:

corebolt resume
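
A minimal sketch of the load/save plumbing behind corebolt resume, assuming a dataclass that mirrors the JSON fields above (the orchestrator's actual types may differ):

import json
from dataclasses import dataclass, field, asdict

STATE_FILE = ".pipeline_state.json"

@dataclass
class PipelineState:
    phase: str = "INIT"
    start_time: str = ""
    polish_rounds: int = 0
    session_ids: dict[str, str] = field(default_factory=dict)
    errors: list[str] = field(default_factory=list)

def save_state(state: PipelineState) -> None:
    # Persist after every phase so a crash loses at most one phase of work
    with open(STATE_FILE, "w") as f:
        json.dump(asdict(state), f, indent=2)

def load_state() -> PipelineState:
    # Called on resume to pick up from the last successful phase
    with open(STATE_FILE) as f:
        return PipelineState(**json.load(f))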

The Iterative Polish Loop

The POLISH agent is unique—it's the only agent that runs iteratively:

polish_rounds = 0
while polish_rounds < 3:  # max 3 iterations
    # (helper names are illustrative; the agent drives these steps with its own tools)
    screenshot_built_site()               # 1. screenshot the built site at multiple viewports
    compare_to_reference()                # 2. compare to reference screenshots
    write_critique()                      # 3. write a critique identifying visual bugs
    apply_fixes()                         # 4. make fixes to React components
    if marker_exists("POLISH_COMPLETE"):  # 5. check for the completion marker
        break                             # 6. if complete, stop; else repeat
    polish_rounds += 1

This creates a self-improving loop where the agent:

  • Identifies spacing issues ("gap should be 24px, not 16px")
  • Fixes color mismatches ("bg-gray-100 should be bg-gray-50")
  • Adjusts responsive breakpoints
  • Refines animations

Each round saves screenshots to workspace/screenshots/round_N/ for debugging.


Performance & Resource Management

Timeouts

  • Agent timeout: 5 minutes per invocation (for long-running tasks like BUILD)
  • HTTP timeout: 5 minutes for OpenCode API calls
  • No retry logic: Failed agents are logged but not auto-retried

Optimizations

  1. Resume capability: Skip completed phases on restart
  2. Workspace isolation: Each project gets its own directory
  3. Non-blocking build: Build failures don't halt pipeline (POLISH fixes them)
  4. Background dev server: npm run dev runs in a subprocess
  5. Health checks: Verify OpenCode server before starting
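
The health check (item 5) can be as simple as a single GET before kicking off the pipeline. A sketch; the endpoint path is an assumption, since OpenCode's health route isn't documented in this post:

import httpx

def opencode_healthy(base_url: str = "http://localhost:4096") -> bool:
    # Hypothetical probe; substitute whatever route your OpenCode build exposes
    try:
        return httpx.get(f"{base_url}/", timeout=5.0).status_code < 500
    except httpx.RequestError:
        return False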

Resource Limits

  • Polish rounds: Max 3 iterations to prevent infinite loops
  • No rate limiting: External API calls (Pexels, Unsplash) are unbounded
  • No parallel execution: All agents run sequentially (could optimize COPYWRITE + ASSETS to run in parallel)

Configuration & Customization

Agent Definitions (.opencode/agents/*.md)

Each agent is defined with YAML frontmatter:

---
name: planner
description: Creates implementation roadmap
mode: primary
model: anthropic/claude-sonnet-4-5 # Uses most advanced model
tools:
  read: true
  write: true
  bash: true
  glob: true
---
[Detailed markdown instructions...]

Key features:

  • Model selection: Different agents use different Claude versions
  • Tool access: Fine-grained control (e.g., EXTRACT gets webfetch, BUILDER doesn't)
  • Instructions: Embedded prompts with examples and constraints
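
Loading these definitions is a two-step parse: split off the frontmatter, then hand it to a YAML parser. A hypothetical loader (the project's actual parser isn't shown here):

import yaml  # pip install pyyaml
from pathlib import Path

def load_agent_definition(path: Path) -> tuple[dict, str]:
    # Split "---\n<frontmatter>\n---\n<instructions>" into its two halves
    _, frontmatter, instructions = path.read_text().split("---", 2)
    return yaml.safe_load(frontmatter), instructions.strip()

meta, prompt = load_agent_definition(Path(".opencode/agents/planner.md"))
print(meta["model"])  # -> anthropic/claude-sonnet-4-5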

Pattern Library (patterns/)

Agents reference curated design patterns:

  • Color palettes: 7 presets (e.g., dark-luxury, warm-neutral, tech-blue)
  • Heroes: 5 patterns (e.g., editorial, minimal-centered, bento-intro)
  • Navigation: 4 patterns (e.g., minimal-corners, centered-logo)
  • Footers: 3 patterns (cta-footer, mega-footer, minimal-bar)

PLANNER agent selects patterns, BUILDER agent implements them.


Error Handling & Resilience

Graceful Degradation

# Critical errors halt pipeline
if not extract_response.success:
    log_error(f"Extract failed: {extract_response.error}")
    return False  # Stop execution

# Non-critical errors are logged but don't stop pipeline
if build_failed:
    log_error("Build verification failed")
    # Continue to POLISH phase - it will fix the build

State Recovery

corebolt status   # Inspect current phase and errors
corebolt resume   # Resume from last successful phase

HTTP Error Handling

try:
    response = httpx.post(url, json=payload, timeout=300.0)
    response.raise_for_status()
except httpx.HTTPStatusError as e:
    return AgentResponse(success=False, error=f"HTTP {e.response.status_code}")
except httpx.RequestError as e:
    return AgentResponse(success=False, error=f"Request error: {str(e)}")

The Tech Stack

| Component          | Technology                    |
| ------------------ | ----------------------------- |
| Orchestrator       | Python 3.11+                  |
| Agent Runtime      | OpenCode (Claude Code fork)   |
| LLM Models         | Claude Sonnet 4 & 4.5         |
| Browser Automation | Browserbase MCP               |
| Output Framework   | React + TypeScript + Tailwind |
| UI Components      | shadcn/ui                     |
| Animations         | Framer Motion                 |
| HTTP Client        | httpx (async-capable)         |
| State Management   | JSON file-based               |


What Makes This Interesting

1. Session-Based Context Preservation

Each agent maintains a session_id so it remembers previous conversations:

# First invocation
response = client.invoke(
    agent=AgentType.POLISH,
    prompt="Polish round 1: Fix spacing issues",
    session_id="ses_polish_123"
)

# Second invocation (remembers round 1)
response = client.invoke(
    agent=AgentType.POLISH,
    prompt="Polish round 2: Check color palette",
    session_id="ses_polish_123"  # Same session
)

This is critical for iterative agents like POLISH that need to remember what they've already fixed.

2. Multi-Model Strategy

Not all agents use the same LLM:

  • PLANNER: Claude Sonnet 4.5 (best reasoning for architecture decisions)
  • Others: Claude Sonnet 4 (faster, cheaper for execution tasks)

This balances quality (use best model where it matters) vs cost (use cheaper model for routine tasks).
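
In the orchestrator, this could be a one-line lookup. A hypothetical sketch; in the real system the model is pinned in each agent's YAML frontmatter:

# Only PLANNER gets the premium model; everything else falls back to the default
AGENT_MODELS: dict[str, str] = {
    "planner": "anthropic/claude-sonnet-4-5",  # best reasoning for architecture
}
DEFAULT_MODEL = "anthropic/claude-sonnet-4"    # faster, cheaper execution

def model_for(agent: str) -> str:
    return AGENT_MODELS.get(agent, DEFAULT_MODEL)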

3. Human-in-the-Loop Checkpoints

The pipeline has natural pause points:

  • After EXTRACT: Review extracted content for accuracy
  • After REFERENCE: Verify design analysis is correct
  • After BUILD: Manually test the app
  • During POLISH: Inspect screenshot comparisons

You can stop, inspect outputs, and resume with corebolt resume.

4. File-Based Inter-Agent Communication

Agents don't use complex message queues or databases. They just read/write files:

workspace/
  extraction/
    content.json          ← EXTRACT writes, PLANNER reads
  reference/
    screenshots/          ← REFERENCE writes, POLISH reads
    analysis.md           ← REFERENCE writes, PLANNER reads
  plan/
    style-spec.md         ← PLANNER writes, BUILDER reads
    component-specs.md    ← PLANNER writes, BUILDER reads
  src/
    components/           ← BUILDER writes, POLISH reads
  content/
    copy.json             ← COPYWRITE writes, BUILDER reads
    assets.json           ← ASSETS writes, BUILDER reads

Pros: simple, debuggable, versionable (git tracks everything).
Cons: no parallelization, potential file locks.
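
A handoff is then just path hygiene. A sketch of the PLANNER side, reading EXTRACT's output and writing a spec for BUILDER (the content.json schema here is an assumption):

import json
from pathlib import Path

workspace = Path("workspace")

# Read what EXTRACT wrote...
content = json.loads((workspace / "extraction" / "content.json").read_text())

# ...and write something for BUILDER to pick up
(workspace / "plan").mkdir(parents=True, exist_ok=True)
spec = f"# Style Spec\n\nPages found: {len(content.get('pages', []))}\n"
(workspace / "plan" / "style-spec.md").write_text(spec)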


Limitations & Future Improvements

Current Limitations

  1. No parallel execution: COPYWRITE and ASSETS could run simultaneously
  2. No retry logic: Failed agents require manual restart
  3. Fixed pipeline: Can't dynamically reorder agents based on errors
  4. Global timeout: All agents share 5-minute timeout (EXTRACT might need longer)
  5. No rate limiting: Could hit API limits on Pexels/Unsplash

Potential Improvements

  1. Parallel agent groups: Run independent agents concurrently

    await asyncio.gather(
        invoke_async(AgentType.COPYWRITE),
        invoke_async(AgentType.ASSETS)
    )
    
  2. Dynamic agent spawning: If BUILDER fails, spawn a DEBUG agent

    if build_failed:
        invoke(AgentType.DEBUG, error_log)
    
  3. Agent communication protocol: Use message queue instead of files

    # Agent A publishes
    queue.publish("design_tokens", {...})
    
    # Agent B subscribes
    tokens = queue.subscribe("design_tokens")
    
  4. Adaptive timeouts: Per-agent timeout configuration

    extract:
      timeout: 600 # 10 minutes for browser automation
    builder:
      timeout: 900 # 15 minutes for complex builds
    
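  5. Client-side rate limiting: Throttle stock-photo API calls to stay under Pexels/Unsplash quotas (limitation 5). A minimal sketch, not part of the current system:

    import time

    class MinIntervalLimiter:
        """Enforce a minimum gap between consecutive API requests."""

        def __init__(self, min_interval: float = 1.0):
            self.min_interval = min_interval
            self._last = 0.0

        def wait(self) -> None:
            # Sleep just long enough to honor the minimum spacing
            elapsed = time.monotonic() - self._last
            if elapsed < self.min_interval:
                time.sleep(self.min_interval - elapsed)
            self._last = time.monotonic()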

Why This Matters

Time Savings

Manual website cloning: 40-80 hours (design analysis, React conversion, responsive testing)

With agent system: 15-30 minutes (mostly waiting for BUILD and POLISH phases)

That's roughly a 100x speedup for template-based websites (40 hours ÷ 30 minutes ≈ 80x at the conservative end, 80 hours ÷ 15 minutes ≈ 320x at the optimistic end).

Consistency

Agents follow strict patterns:

  • Always use Tailwind tokens (no arbitrary values)
  • Always implement mobile-first responsive design
  • Always use shadcn/ui components
  • Always match reference animations

Humans are inconsistent. Agents follow rules.

Reproducibility

The entire pipeline runs from a single, repeatable command:

corebolt run \
  --source https://source-site.com \
  --reference https://template-demo.com

Same inputs → consistent outputs (modulo LLM nondeterminism). Great for testing and iteration.


Code Snippets

Agent Client (orchestrator/agents.py)

import httpx

class AgentClient:
    def invoke(self, agent: AgentType, prompt: str,
               session_id: str | None = None) -> AgentResponse:
        payload = {
            "agent": agent.value,
            "message": prompt,
            "session_id": session_id,
        }

        try:
            response = self._client.post(
                f"{self.base_url}/api/chat",
                json=payload,
                timeout=self.timeout
            )
            response.raise_for_status()

            data = response.json()
            return AgentResponse(
                success=True,
                content=data.get("content", ""),
                session_id=data.get("session_id")
            )
        except httpx.HTTPStatusError as e:
            return AgentResponse(
                success=False,
                error=f"HTTP error {e.response.status_code}"
            )
        except httpx.RequestError as e:
            # Network-level failures (connection refused, timeout) are reported the same way
            return AgentResponse(
                success=False,
                error=f"Request error: {e}"
            )

Pipeline Orchestrator (orchestrator/pipeline.py)

class Pipeline:
    def run(self, resume: bool = False) -> bool:
        if resume:
            self._load_state()

        if self.state.phase == PipelinePhase.EXTRACT:
            if not self._phase_extract():
                return False
            self._advance_phase(PipelinePhase.REFERENCE)

        if self.state.phase == PipelinePhase.REFERENCE:
            if not self._phase_reference():
                return False
            self._advance_phase(PipelinePhase.PLAN)

        # ... continues through all phases

        return True

    def _advance_phase(self, next_phase: PipelinePhase):
        self.state.phase = next_phase
        self._save_state()  # Persist after every phase

Conclusion

This is a workflow orchestration system optimized for complex, multi-step creative tasks with strong sequential dependencies.

Key Takeaways:

  • 7 specialized agents beat one general-purpose agent
  • Sequential execution prevents chaos (for now)
  • File-based communication is simple and debuggable
  • Session continuity enables iterative refinement
  • State persistence makes long-running workflows resumable

Not every problem needs parallel agents. Sometimes, a well-ordered assembly line is exactly what you need.