Building with AI Agents: A Technical Deep-Dive into Multi-Agent Orchestration

Nicholaus Link
Feb 20, 2026 · 12 min read
TL;DR
We built a system that coordinates 7 specialized AI agents in a sequential pipeline to clone entire websites from reference templates. Each agent is an expert in one domain (extraction, planning, building, polishing), and they communicate through a shared workspace to transform a design reference into a production-ready React app.
The Agent Architecture
Meet the Team
Our system uses 7 specialized agents, each with specific tools and responsibilities:
- EXTRACT Agent - Web scraping specialist
  - Uses Browserbase MCP for browser automation
  - Captures desktop/mobile screenshots at multiple viewports
  - Extracts text content, structure, and assets
  - Downloads source site images
  - Output: workspace/extraction/*.json with structured content
- REFERENCE Agent - Design system analyst
  - Screenshots the reference template at 5+ viewport sizes
  - Downloads HTML/CSS source code
  - Analyzes design tokens (colors, typography, spacing)
  - Documents component patterns and animations
  - Output: workspace/reference/ with screenshots + source + analysis
- PLANNER Agent - Implementation architect (uses Claude Sonnet 4.5)
  - Maps source content to the reference design
  - Extracts exact CSS values for design tokens
  - Creates component specifications
  - Plans page layouts and routing
  - Output: workspace/plan/*.md with detailed implementation specs
- BUILDER Agent - React developer
  - Translates designs to React + TypeScript + Tailwind
  - Implements animations with Framer Motion
  - Integrates shadcn/ui components
  - Creates responsive layouts
  - Output: Full React app in workspace/src/
- COPYWRITE Agent - Content specialist
  - Generates tone-appropriate copy (luxury/SaaS/playful/etc.)
  - Fills content gaps from extraction
  - Enhances headlines and CTAs
  - Output: workspace/content/copy.json
- ASSETS Agent - Image curator
  - Searches the Pexels/Unsplash APIs
  - Curates style-matching stock photos
  - Downloads optimized sizes
  - Output: workspace/public/images/ + metadata
- POLISH Agent - QA engineer
  - Screenshots the built site vs. the reference
  - Identifies visual discrepancies
  - Makes iterative fixes (max 3 rounds)
  - Output: workspace/screenshots/round_N/ + critique files
The Orchestration Strategy
Sequential Pipeline Execution
Unlike systems that spam parallel agents, we use a strict sequential pipeline:
INIT → EXTRACT → REFERENCE → PLAN → BUILD →
COPYWRITE → ASSETS → VERIFY_BUILD → DEV_SERVER → POLISH → COMPLETE
Why sequential?
- Data dependencies: Each agent needs outputs from previous agents
- Workspace coordination: All agents write to the same directory
- Resource management: Prevents file write conflicts
- Predictability: Clear execution flow, easy to debug
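The strict ordering can be pinned down as an ordered enum with a small advance helper. This is an illustrative sketch, not the actual orchestrator API; `PipelinePhase` and `next_phase` are assumed names:

```python
from enum import Enum

# Hypothetical phase enum mirroring the pipeline order above.
class PipelinePhase(Enum):
    INIT = "INIT"
    EXTRACT = "EXTRACT"
    REFERENCE = "REFERENCE"
    PLAN = "PLAN"
    BUILD = "BUILD"
    COPYWRITE = "COPYWRITE"
    ASSETS = "ASSETS"
    VERIFY_BUILD = "VERIFY_BUILD"
    DEV_SERVER = "DEV_SERVER"
    POLISH = "POLISH"
    COMPLETE = "COMPLETE"

# Enum members iterate in definition order, so this list IS the pipeline.
PHASE_ORDER = list(PipelinePhase)

def next_phase(phase: PipelinePhase) -> PipelinePhase:
    """Return the phase that follows `phase`; COMPLETE stays COMPLETE."""
    idx = PHASE_ORDER.index(phase)
    return PHASE_ORDER[min(idx + 1, len(PHASE_ORDER) - 1)]
```

Encoding the order once in data means the orchestrator and the resume logic can never disagree about what comes next.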
Hub-and-Spoke Communication
Agents don't talk to each other directly. They communicate through:
- HTTP API: The orchestrator sends prompts to the OpenCode server (http://localhost:4096/api/chat)
- File System: Agents read outputs from previous agents in the shared workspace/
- Session IDs: Maintain conversational context across agent invocations
Pipeline Orchestrator (Hub)
↓ HTTP POST
EXTRACT Agent → Writes extraction/*.json
↓ File System
REFERENCE Agent → Reads extraction, writes reference/*
↓ File System
PLANNER Agent → Reads both, writes plan/*.md
... and so on
State Machine with Resume Capability
The pipeline is a state machine that saves progress to .pipeline_state.json:
{
  "phase": "BUILD",
  "start_time": "2026-01-31T10:30:00",
  "polish_rounds": 0,
  "session_ids": {
    "extract": "ses_abc123",
    "reference": "ses_def456",
    "planner": "ses_ghi789"
  },
  "errors": []
}
Resume from anywhere: If an agent fails, you can restart the pipeline from the last successful phase:
corebolt resume
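A minimal sketch of how the resume check could work, assuming the .pipeline_state.json shape shown above; `load_resume_phase` is a hypothetical helper, not the real CLI internals:

```python
import json
from pathlib import Path

def load_resume_phase(workspace: Path) -> str:
    """Read .pipeline_state.json and report which phase to restart from."""
    state_file = workspace / ".pipeline_state.json"
    if not state_file.exists():
        return "INIT"  # no saved state: fresh run
    state = json.loads(state_file.read_text())
    return state.get("phase", "INIT")
```

Because the state file is plain JSON, you can also hand-edit it to force a re-run of an earlier phase.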
The Iterative Polish Loop
The POLISH agent is unique—it's the only agent that runs iteratively:
while polish_rounds < 3:  # Max 3 iterations
    # 1. Screenshot the built site at multiple viewports
    # 2. Compare to reference screenshots
    # 3. Write a critique identifying visual bugs
    # 4. Make fixes to React components
    # 5. Check if the "POLISH_COMPLETE" marker exists
    # 6. If complete, break; else, repeat
This creates a self-improving loop where the agent:
- Identifies spacing issues ("gap should be 24px, not 16px")
- Fixes color mismatches ("bg-gray-100 should be bg-gray-50")
- Adjusts responsive breakpoints
- Refines animations
Each round saves screenshots to workspace/screenshots/round_N/ for debugging.
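The loop above can be sketched as a runnable function. The POLISH_COMPLETE marker check comes from the pipeline as described; `run_polish_round` is a hypothetical callback standing in for the actual agent invocation:

```python
from pathlib import Path

MAX_POLISH_ROUNDS = 3  # matches the max-3-rounds limit above

def polish(workspace: Path, run_polish_round) -> int:
    """Run polish rounds until the agent writes the POLISH_COMPLETE marker.

    Returns the number of rounds actually executed.
    """
    marker = workspace / "POLISH_COMPLETE"
    rounds = 0
    while rounds < MAX_POLISH_ROUNDS:
        rounds += 1
        run_polish_round(rounds)  # screenshot, critique, fix (agent call)
        if marker.exists():       # the agent writes this when satisfied
            break
    return rounds
```

Using a marker file (rather than parsing the agent's reply) keeps the completion signal on disk, where it survives a crash and a resume.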
Performance & Resource Management
Timeouts
- Agent timeout: 5 minutes per invocation (for long-running tasks like BUILD)
- HTTP timeout: 5 minutes for OpenCode API calls
- No retry logic: Failed agents are logged but not auto-retried
Optimizations
- Resume capability: Skip completed phases on restart
- Workspace isolation: Each project gets its own directory
- Non-blocking build: Build failures don't halt pipeline (POLISH fixes them)
- Background dev server: npm run dev runs in a subprocess
- Health checks: Verify the OpenCode server before starting
Resource Limits
- Polish rounds: Max 3 iterations to prevent infinite loops
- No rate limiting: External API calls (Pexels, Unsplash) are unbounded
- No parallel execution: All agents run sequentially (could optimize COPYWRITE + ASSETS to run in parallel)
Configuration & Customization
Agent Definitions (.opencode/agents/*.md)
Each agent is defined with YAML frontmatter:
---
name: planner
description: Creates implementation roadmap
mode: primary
model: anthropic/claude-sonnet-4-5 # Uses most advanced model
tools:
  read: true
  write: true
  bash: true
  glob: true
---
[Detailed markdown instructions...]
Key features:
- Model selection: Different agents use different Claude versions
- Tool access: Fine-grained control (e.g., EXTRACT gets webfetch, BUILDER doesn't)
- Instructions: Embedded prompts with examples and constraints
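Loading such a definition file only requires splitting the YAML frontmatter from the markdown instructions. A naive stdlib sketch (`split_agent_file` is a hypothetical helper; a real loader would hand the frontmatter to a YAML parser):

```python
from pathlib import Path

def split_agent_file(path: Path) -> tuple[str, str]:
    """Split an agent .md file into (frontmatter, instructions).

    Assumes the file starts with a `---` fenced frontmatter block,
    as in the example above.
    """
    text = path.read_text()
    # split on the first two `---` fences: "" / frontmatter / body
    _, frontmatter, body = text.split("---", 2)
    return frontmatter.strip(), body.strip()
```

The frontmatter string can then be fed to `yaml.safe_load` to get the name, model, and tool flags as a dict.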
Pattern Library (patterns/)
Agents reference curated design patterns:
- Color palettes: 7 presets (dark-luxury, warm-neutral, tech-blue)
- Heroes: 5 patterns (editorial, minimal-centered, bento-intro)
- Navigation: 4 patterns (minimal-corners, centered-logo)
- Footers: 3 patterns (cta-footer, mega-footer, minimal-bar)
PLANNER agent selects patterns, BUILDER agent implements them.
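The library can be mirrored as a simple registry that the PLANNER's choices are validated against. A sketch using the preset names listed above (only a subset of the 7 palettes is shown; `validate_choice` is a hypothetical helper):

```python
# Registry of curated patterns, keyed by slot; names come from the
# pattern library described above (palettes list is abbreviated).
PATTERNS = {
    "palettes": ["dark-luxury", "warm-neutral", "tech-blue"],
    "heroes": ["editorial", "minimal-centered", "bento-intro"],
    "navigation": ["minimal-corners", "centered-logo"],
    "footers": ["cta-footer", "mega-footer", "minimal-bar"],
}

def validate_choice(slot: str, name: str) -> bool:
    """True if `name` is a known pattern for `slot`."""
    return name in PATTERNS.get(slot, [])
```

Validating against a closed set is what keeps the BUILDER from having to interpret free-form design prose: every plan references a pattern it already knows how to implement.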
Error Handling & Resilience
Graceful Degradation
# Critical errors halt the pipeline
if not extract_response.success:
    log_error(f"Extract failed: {extract_response.error}")
    return False  # Stop execution

# Non-critical errors are logged but don't stop the pipeline
if build_failed:
    log_error("Build verification failed")
    # Continue to POLISH phase - it will fix the build
State Recovery
corebolt status # Inspect current phase and errors
corebolt resume # Resume from last successful phase
HTTP Error Handling
try:
    response = httpx.post(url, json=payload, timeout=300.0)
    response.raise_for_status()
except httpx.HTTPStatusError as e:
    return AgentResponse(success=False, error=f"HTTP {e.response.status_code}")
except httpx.RequestError as e:
    return AgentResponse(success=False, error=f"Request error: {str(e)}")
The Tech Stack
| Component          | Technology                    |
| ------------------ | ----------------------------- |
| Orchestrator       | Python 3.11+                  |
| Agent Runtime      | OpenCode (Claude Code fork)   |
| LLM Models         | Claude Sonnet 4 & 4.5         |
| Browser Automation | Browserbase MCP               |
| Output Framework   | React + TypeScript + Tailwind |
| UI Components      | shadcn/ui                     |
| Animations         | Framer Motion                 |
| HTTP Client        | httpx (async-capable)         |
| State Management   | JSON file-based               |
What Makes This Interesting
1. Session-Based Context Preservation
Each agent maintains a session_id so it remembers previous conversations:
# First invocation
response = client.invoke(
    agent=AgentType.POLISH,
    prompt="Polish round 1: Fix spacing issues",
    session_id="ses_polish_123"
)

# Second invocation (remembers round 1)
response = client.invoke(
    agent=AgentType.POLISH,
    prompt="Polish round 2: Check color palette",
    session_id="ses_polish_123"  # Same session
)
This is critical for iterative agents like POLISH that need to remember what they've already fixed.
2. Multi-Model Strategy
Not all agents use the same LLM:
- PLANNER: Claude Sonnet 4.5 (best reasoning for architecture decisions)
- Others: Claude Sonnet 4 (faster, cheaper for execution tasks)
This balances quality (use best model where it matters) vs cost (use cheaper model for routine tasks).
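One way to encode this strategy is a per-agent model map with a cheaper default. A sketch, using model identifiers in the same `provider/model` convention as the agent frontmatter (the mapping itself is illustrative):

```python
# Hypothetical per-agent model map reflecting the strategy above:
# PLANNER gets the strongest model, everyone else gets the default.
AGENT_MODELS = {
    "planner": "anthropic/claude-sonnet-4-5",  # best reasoning for architecture
}
DEFAULT_MODEL = "anthropic/claude-sonnet-4"    # faster/cheaper for execution

def model_for(agent: str) -> str:
    """Resolve which model an agent should run on."""
    return AGENT_MODELS.get(agent, DEFAULT_MODEL)
```

Centralizing the mapping means upgrading a single agent's model is a one-line change rather than an edit across seven definition files.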
3. Human-in-the-Loop Checkpoints
The pipeline has natural pause points:
- After EXTRACT: Review extracted content for accuracy
- After REFERENCE: Verify design analysis is correct
- After BUILD: Manually test the app
- During POLISH: Inspect screenshot comparisons
You can stop, inspect outputs, and resume with corebolt resume.
4. File-Based Inter-Agent Communication
Agents don't use complex message queues or databases. They just read/write files:
workspace/
  extraction/
    content.json        ← EXTRACT writes, PLANNER reads
  reference/
    screenshots/        ← REFERENCE writes, POLISH reads
    analysis.md         ← REFERENCE writes, PLANNER reads
  plan/
    style-spec.md       ← PLANNER writes, BUILDER reads
    component-specs.md  ← PLANNER writes, BUILDER reads
  src/
    components/         ← BUILDER writes, POLISH reads
  content/
    copy.json           ← COPYWRITE writes, BUILDER reads
    assets.json         ← ASSETS writes, BUILDER reads
Pros: Simple, debuggable, versionable (git tracks everything).
Cons: No parallelization, potential file locks.
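The handoff needs nothing more than a couple of path conventions. A minimal sketch (`read_extraction` and `write_plan` are hypothetical helpers; the paths match the tree above):

```python
import json
from pathlib import Path

def read_extraction(workspace: Path) -> dict:
    """PLANNER side: read the content EXTRACT wrote."""
    return json.loads((workspace / "extraction" / "content.json").read_text())

def write_plan(workspace: Path, spec: str) -> None:
    """PLANNER side: write the style spec BUILDER will read."""
    plan_dir = workspace / "plan"
    plan_dir.mkdir(parents=True, exist_ok=True)
    (plan_dir / "style-spec.md").write_text(spec)
```

Because every handoff is an ordinary file, `git diff` doubles as an inter-agent message log.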
Limitations & Future Improvements
Current Limitations
- No parallel execution: COPYWRITE and ASSETS could run simultaneously
- No retry logic: Failed agents require manual restart
- Fixed pipeline: Can't dynamically reorder agents based on errors
- Global timeout: All agents share 5-minute timeout (EXTRACT might need longer)
- No rate limiting: Could hit API limits on Pexels/Unsplash
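A small interval-based limiter would close the rate-limiting gap. This is a sketch of what could wrap the Pexels/Unsplash calls, not part of the current system; the 0.5 s default interval is an assumption:

```python
import time

class RateLimiter:
    """Enforce a minimum interval between successive API calls."""

    def __init__(self, min_interval: float = 0.5):
        self.min_interval = min_interval
        self._last = 0.0

    def wait(self) -> None:
        """Block until at least min_interval has passed since the last call."""
        elapsed = time.monotonic() - self._last
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self._last = time.monotonic()
```

Calling `limiter.wait()` before each stock-photo request caps throughput without touching the agent prompts.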
Potential Improvements
- Parallel agent groups: Run independent agents concurrently
  await asyncio.gather(
      invoke_async(AgentType.COPYWRITE),
      invoke_async(AgentType.ASSETS),
  )
- Dynamic agent spawning: If BUILDER fails, spawn a DEBUG agent
  if build_failed:
      invoke(AgentType.DEBUG, error_log)
- Agent communication protocol: Use a message queue instead of files
  # Agent A publishes
  queue.publish("design_tokens", {...})
  # Agent B subscribes
  tokens = queue.subscribe("design_tokens")
- Adaptive timeouts: Per-agent timeout configuration
  extract:
    timeout: 600   # 10 minutes for browser automation
  builder:
    timeout: 900   # 15 minutes for complex builds
Why This Matters
Time Savings
Manual website cloning: 40-80 hours (design analysis, React conversion, responsive testing)
With agent system: 15-30 minutes (mostly waiting for BUILD and POLISH phases)
~100x speedup for template-based websites.
Consistency
Agents follow strict patterns:
- Always use Tailwind tokens (no arbitrary values)
- Always implement mobile-first responsive design
- Always use shadcn/ui components
- Always match reference animations
Humans are inconsistent. Agents follow rules.
Reproducibility
The entire pipeline is reproducible from a single command:
corebolt run \
--source https://source-site.com \
--reference https://template-demo.com
Same inputs → consistent outputs (LLM sampling adds some variance, but the pipeline structure and patterns are identical). Great for testing and iteration.
Code Snippets
Agent Client (orchestrator/agents.py)
class AgentClient:
    def invoke(self, agent: AgentType, prompt: str,
               session_id: str | None = None) -> AgentResponse:
        payload = {
            "agent": agent.value,
            "message": prompt,
            "session_id": session_id,
        }
        try:
            response = self._client.post(
                f"{self.base_url}/api/chat",
                json=payload,
                timeout=self.timeout
            )
            response.raise_for_status()
            data = response.json()
            return AgentResponse(
                success=True,
                content=data.get("content", ""),
                session_id=data.get("session_id")
            )
        except httpx.HTTPStatusError as e:
            return AgentResponse(
                success=False,
                error=f"HTTP error {e.response.status_code}"
            )
        except httpx.RequestError as e:
            return AgentResponse(
                success=False,
                error=f"Request error: {e}"
            )
Pipeline Orchestrator (orchestrator/pipeline.py)
class Pipeline:
    def run(self, resume: bool = False) -> bool:
        if resume:
            self._load_state()

        if self.state.phase == PipelinePhase.EXTRACT:
            if not self._phase_extract():
                return False
            self._advance_phase(PipelinePhase.REFERENCE)

        if self.state.phase == PipelinePhase.REFERENCE:
            if not self._phase_reference():
                return False
            self._advance_phase(PipelinePhase.PLAN)

        # ... continues through all phases
        return True

    def _advance_phase(self, next_phase: PipelinePhase):
        self.state.phase = next_phase
        self._save_state()  # Persist after every phase
Conclusion
This is a workflow orchestration system optimized for complex, multi-step creative tasks with strong sequential dependencies.
Key Takeaways:
- 7 specialized agents beat one general-purpose agent
- Sequential execution prevents chaos (for now)
- File-based communication is simple and debuggable
- Session continuity enables iterative refinement
- State persistence makes long-running workflows resumable
Not every problem needs parallel agents. Sometimes, a well-ordered assembly line is exactly what you need.