Building with AI Agents: A Technical Deep-Dive into Multi-Agent Orchestration

Nicholaus Link
Feb 20, 2026 · 12 min read
TL;DR
We built a system that coordinates 7 specialized AI agents in a sequential pipeline to clone entire websites from reference templates. Each agent is an expert in one domain (extraction, planning, building, polishing), and they communicate through a shared workspace to transform a design reference into a production-ready React app.
The Agent Architecture
Meet the Team
Our system uses 7 specialized agents, each with specific tools and responsibilities:
- EXTRACT Agent - Web scraping specialist
  - Uses Browserbase MCP for browser automation
  - Captures desktop/mobile screenshots at multiple viewports
  - Extracts text content, structure, and assets
  - Downloads source site images
  - Output: workspace/extraction/*.json with structured content
- REFERENCE Agent - Design system analyst
  - Screenshots the reference template at 5+ viewport sizes
  - Downloads HTML/CSS source code
  - Analyzes design tokens (colors, typography, spacing)
  - Documents component patterns and animations
  - Output: workspace/reference/ with screenshots + source + analysis
- PLANNER Agent - Implementation architect (uses Claude Sonnet 4.5)
  - Maps source content to the reference design
  - Extracts exact CSS values for design tokens
  - Creates component specifications
  - Plans page layouts and routing
  - Output: workspace/plan/*.md with detailed implementation specs
- BUILDER Agent - React developer
  - Translates designs to React + TypeScript + Tailwind
  - Implements animations with Framer Motion
  - Integrates shadcn/ui components
  - Creates responsive layouts
  - Output: Full React app in workspace/src/
- COPYWRITE Agent - Content specialist
  - Generates tone-appropriate copy (luxury/SaaS/playful/etc.)
  - Fills content gaps from extraction
  - Enhances headlines and CTAs
  - Output: workspace/content/copy.json
- ASSETS Agent - Image curator
  - Searches the Pexels/Unsplash APIs
  - Curates style-matching stock photos
  - Downloads optimized sizes
  - Output: workspace/public/images/ + metadata
- POLISH Agent - QA engineer
  - Screenshots the built site vs. the reference
  - Identifies visual discrepancies
  - Makes iterative fixes (max 3 rounds)
  - Output: workspace/screenshots/round_N/ + critique files
The Orchestration Strategy
Sequential Pipeline Execution
Unlike systems that spam parallel agents, we use a strict sequential pipeline:
INIT → EXTRACT → REFERENCE → PLAN → BUILD →
COPYWRITE → ASSETS → VERIFY_BUILD → DEV_SERVER → POLISH → COMPLETE
Why sequential?
- Data dependencies: Each agent needs outputs from previous agents
- Workspace coordination: All agents write to the same directory
- Resource management: Prevents file write conflicts
- Predictability: Clear execution flow, easy to debug
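The strict ordering can be pinned down as an ordered enum with a small advance helper. This is an illustrative sketch, not the actual orchestrator API; `PipelinePhase` and `next_phase` are assumed names:

```python
from enum import Enum

# Hypothetical phase enum mirroring the pipeline order above.
class PipelinePhase(Enum):
    INIT = "INIT"
    EXTRACT = "EXTRACT"
    REFERENCE = "REFERENCE"
    PLAN = "PLAN"
    BUILD = "BUILD"
    COPYWRITE = "COPYWRITE"
    ASSETS = "ASSETS"
    VERIFY_BUILD = "VERIFY_BUILD"
    DEV_SERVER = "DEV_SERVER"
    POLISH = "POLISH"
    COMPLETE = "COMPLETE"

# Enum members iterate in definition order, so this list IS the pipeline.
PHASE_ORDER = list(PipelinePhase)

def next_phase(phase: PipelinePhase) -> PipelinePhase:
    """Return the phase that follows `phase`; COMPLETE stays COMPLETE."""
    idx = PHASE_ORDER.index(phase)
    return PHASE_ORDER[min(idx + 1, len(PHASE_ORDER) - 1)]
```

Encoding the order once in data means the orchestrator and the resume logic can never disagree about what comes next.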
Hub-and-Spoke Communication
Agents don't talk to each other directly. They communicate through:
- HTTP API: The orchestrator sends prompts to the OpenCode server (http://localhost:4096/api/chat)
- File System: Agents read outputs from previous agents in the shared workspace/
- Session IDs: Maintain conversational context across agent invocations
Pipeline Orchestrator (Hub)
↓ HTTP POST
EXTRACT Agent → Writes extraction/*.json
↓ File System
REFERENCE Agent → Reads extraction, writes reference/*
↓ File System
PLANNER Agent → Reads both, writes plan/*.md
... and so on
State Machine with Resume Capability
The pipeline is a state machine that saves progress to .pipeline_state.json:
{
  "phase": "BUILD",
  "start_time": "2026-01-31T10:30:00",
  "polish_rounds": 0,
  "session_ids": {
    "extract": "ses_abc123",
    "reference": "ses_def456",
    "planner": "ses_ghi789"
  },
  "errors": []
}
Resume from anywhere: If an agent fails, you can restart the pipeline from the last successful phase:
corebolt resume
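A minimal sketch of how the resume check could work, assuming the .pipeline_state.json shape shown above; `load_resume_phase` is a hypothetical helper, not the real CLI internals:

```python
import json
from pathlib import Path

def load_resume_phase(workspace: Path) -> str:
    """Read .pipeline_state.json and report which phase to restart from."""
    state_file = workspace / ".pipeline_state.json"
    if not state_file.exists():
        return "INIT"  # no saved state: fresh run
    state = json.loads(state_file.read_text())
    return state.get("phase", "INIT")
```

Because the state file is plain JSON, you can also hand-edit it to force a re-run of an earlier phase.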
The Iterative Polish Loop
The POLISH agent is unique—it's the only agent that runs iteratively:
while polish_rounds < 3:  # Max 3 iterations
    # 1. Screenshot the built site at multiple viewports
    # 2. Compare to reference screenshots
    # 3. Write a critique identifying visual bugs
    # 4. Make fixes to React components
    # 5. Check if the "POLISH_COMPLETE" marker exists
    # 6. If complete, break; else, repeat
This creates a self-improving loop where the agent:
- Identifies spacing issues ("gap should be 24px, not 16px")
- Fixes color mismatches ("bg-gray-100 should be bg-gray-50")
- Adjusts responsive breakpoints
- Refines animations
Each round saves screenshots to workspace/screenshots/round_N/ for debugging.
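The loop above can be sketched as a runnable function. The POLISH_COMPLETE marker check comes from the pipeline as described; `run_polish_round` is a hypothetical callback standing in for the actual agent invocation:

```python
from pathlib import Path

MAX_POLISH_ROUNDS = 3  # matches the max-3-rounds limit above

def polish(workspace: Path, run_polish_round) -> int:
    """Run polish rounds until the agent writes the POLISH_COMPLETE marker.

    Returns the number of rounds actually executed.
    """
    marker = workspace / "POLISH_COMPLETE"
    rounds = 0
    while rounds < MAX_POLISH_ROUNDS:
        rounds += 1
        run_polish_round(rounds)  # screenshot, critique, fix (agent call)
        if marker.exists():       # the agent writes this when satisfied
            break
    return rounds
```

Using a marker file (rather than parsing the agent's reply) keeps the completion signal on disk, where it survives a crash and a resume.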
Performance & Resource Management
Timeouts
- Agent timeout: 5 minutes per invocation (for long-running tasks like BUILD)
- HTTP timeout: 5 minutes for OpenCode API calls
- No retry logic: Failed agents are logged but not auto-retried
Optimizations
- Resume capability: Skip completed phases on restart
- Workspace isolation: Each project gets its own directory
- Non-blocking build: Build failures don't halt pipeline (POLISH fixes them)
- Background dev server: npm run dev runs in a subprocess
- Health checks: Verify the OpenCode server before starting
Resource Limits
- Polish rounds: Max 3 iterations to prevent infinite loops
- No rate limiting: External API calls (Pexels, Unsplash) are unbounded
- No parallel execution: All agents run sequentially (could optimize COPYWRITE + ASSETS to run in parallel)
Configuration & Customization
Agent Definitions (.opencode/agents/*.md)
Each agent is defined with YAML frontmatter:
---
name: planner
description: Creates implementation roadmap
mode: primary
model: anthropic/claude-sonnet-4-5 # Uses most advanced model
tools:
  read: true
  write: true
  bash: true
  glob: true
---
[Detailed markdown instructions...]
Key features:
- Model selection: Different agents use different Claude versions
- Tool access: Fine-grained control (e.g., EXTRACT gets webfetch, BUILDER doesn't)
- Instructions: Embedded prompts with examples and constraints
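Loading such a definition file only requires splitting the YAML frontmatter from the markdown instructions. A naive stdlib sketch (`split_agent_file` is a hypothetical helper; a real loader would hand the frontmatter to a YAML parser):

```python
from pathlib import Path

def split_agent_file(path: Path) -> tuple[str, str]:
    """Split an agent .md file into (frontmatter, instructions).

    Assumes the file starts with a `---` fenced frontmatter block,
    as in the example above.
    """
    text = path.read_text()
    # split on the first two `---` fences: "" / frontmatter / body
    _, frontmatter, body = text.split("---", 2)
    return frontmatter.strip(), body.strip()
```

The frontmatter string can then be fed to `yaml.safe_load` to get the name, model, and tool flags as a dict.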
Pattern Library (patterns/)
Agents reference curated design patterns:
- Color palettes: 7 presets (dark-luxury, warm-neutral, tech-blue)
- Heroes: 5 patterns (editorial, minimal-centered, bento-intro)
- Navigation: 4 patterns (minimal-corners, centered-logo)
- Footers: 3 patterns (cta-footer, mega-footer, minimal-bar)
PLANNER agent selects patterns, BUILDER agent implements them.
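The library can be mirrored as a simple registry that the PLANNER's choices are validated against. A sketch using the preset names listed above (only a subset of the 7 palettes is shown; `validate_choice` is a hypothetical helper):

```python
# Registry of curated patterns, keyed by slot; names come from the
# pattern library described above (palettes list is abbreviated).
PATTERNS = {
    "palettes": ["dark-luxury", "warm-neutral", "tech-blue"],
    "heroes": ["editorial", "minimal-centered", "bento-intro"],
    "navigation": ["minimal-corners", "centered-logo"],
    "footers": ["cta-footer", "mega-footer", "minimal-bar"],
}

def validate_choice(slot: str, name: str) -> bool:
    """True if `name` is a known pattern for `slot`."""
    return name in PATTERNS.get(slot, [])
```

Validating against a closed set is what keeps the BUILDER from having to interpret free-form design prose: every plan references a pattern it already knows how to implement.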
Error Handling & Resilience
Graceful Degradation
# Critical errors halt the pipeline
if not extract_response.success:
    log_error(f"Extract failed: {extract_response.error}")
    return False  # Stop execution

# Non-critical errors are logged but don't stop the pipeline
if build_failed:
    log_error("Build verification failed")
    # Continue to POLISH phase - it will fix the build
State Recovery
corebolt status # Inspect current phase and errors
corebolt resume # Resume from last successful phase
HTTP Error Handling
try:
    response = httpx.post(url, json=payload, timeout=300.0)
    response.raise_for_status()
except httpx.HTTPStatusError as e:
    return AgentResponse(success=False, error=f"HTTP {e.response.status_code}")
except httpx.RequestError as e:
    return AgentResponse(success=False, error=f"Request error: {str(e)}")
The Tech Stack
| Component          | Technology                    |
| ------------------ | ----------------------------- |
| Orchestrator       | Python 3.11+                  |
| Agent Runtime      | OpenCode (Claude Code fork)   |
| LLM Models         | Claude Sonnet 4 & 4.5         |
| Browser Automation | Browserbase MCP               |
| Output Framework   | React + TypeScript + Tailwind |
| UI Components      | shadcn/ui                     |
| Animations         | Framer Motion                 |
| HTTP Client        | httpx (async-capable)         |
| State Management   | JSON file-based               |
What Makes This Interesting
1. Session-Based Context Preservation
Each agent maintains a session_id so it remembers previous conversations:
# First invocation
response = client.invoke(
    agent=AgentType.POLISH,
    prompt="Polish round 1: Fix spacing issues",
    session_id="ses_polish_123"
)

# Second invocation (remembers round 1)
response = client.invoke(
    agent=AgentType.POLISH,
    prompt="Polish round 2: Check color palette",
    session_id="ses_polish_123"  # Same session
)
This is critical for iterative agents like POLISH that need to remember what they've already fixed.
2. Multi-Model Strategy
Not all agents use the same LLM:
- PLANNER: Claude Sonnet 4.5 (best reasoning for architecture decisions)
- Others: Claude Sonnet 4 (faster, cheaper for execution tasks)
This balances quality (use best model where it matters) vs cost (use cheaper model for routine tasks).
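One way to encode this strategy is a per-agent model map with a cheaper default. A sketch, using model identifiers in the same `provider/model` convention as the agent frontmatter (the mapping itself is illustrative):

```python
# Hypothetical per-agent model map reflecting the strategy above:
# PLANNER gets the strongest model, everyone else gets the default.
AGENT_MODELS = {
    "planner": "anthropic/claude-sonnet-4-5",  # best reasoning for architecture
}
DEFAULT_MODEL = "anthropic/claude-sonnet-4"    # faster/cheaper for execution

def model_for(agent: str) -> str:
    """Resolve which model an agent should run on."""
    return AGENT_MODELS.get(agent, DEFAULT_MODEL)
```

Centralizing the mapping means upgrading a single agent's model is a one-line change rather than an edit across seven definition files.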
3. Human-in-the-Loop Checkpoints
The pipeline has natural pause points:
- After EXTRACT: Review extracted content for accuracy
- After REFERENCE: Verify design analysis is correct
- After BUILD: Manually test the app
- During POLISH: Inspect screenshot comparisons
You can stop, inspect outputs, and resume with corebolt resume.
4. File-Based Inter-Agent Communication
Agents don't use complex message queues or databases. They just read/write files:
workspace/
  extraction/
    content.json        ← EXTRACT writes, PLANNER reads
  reference/
    screenshots/        ← REFERENCE writes, POLISH reads
    analysis.md         ← REFERENCE writes, PLANNER reads
  plan/
    style-spec.md       ← PLANNER writes, BUILDER reads
    component-specs.md  ← PLANNER writes, BUILDER reads
  src/
    components/         ← BUILDER writes, POLISH reads
  content/
    copy.json           ← COPYWRITE writes, BUILDER reads
    assets.json         ← ASSETS writes, BUILDER reads
Pros: Simple, debuggable, versionable (git tracks everything).
Cons: No parallelization, potential file locks.
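The handoff needs nothing more than a couple of path conventions. A minimal sketch (`read_extraction` and `write_plan` are hypothetical helpers; the paths match the tree above):

```python
import json
from pathlib import Path

def read_extraction(workspace: Path) -> dict:
    """PLANNER side: read the content EXTRACT wrote."""
    return json.loads((workspace / "extraction" / "content.json").read_text())

def write_plan(workspace: Path, spec: str) -> None:
    """PLANNER side: write the style spec BUILDER will read."""
    plan_dir = workspace / "plan"
    plan_dir.mkdir(parents=True, exist_ok=True)
    (plan_dir / "style-spec.md").write_text(spec)
```

Because every handoff is an ordinary file, `git diff` doubles as an inter-agent message log.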
Limitations & Future Improvements
Current Limitations
- No parallel execution: COPYWRITE and ASSETS could run simultaneously
- No retry logic: Failed agents require manual restart
- Fixed pipeline: Can't dynamically reorder agents based on errors
- Global timeout: All agents share 5-minute timeout (EXTRACT might need longer)
- No rate limiting: Could hit API limits on Pexels/Unsplash
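A small interval-based limiter would close the rate-limiting gap. This is a sketch of what could wrap the Pexels/Unsplash calls, not part of the current system; the 0.5 s default interval is an assumption:

```python
import time

class RateLimiter:
    """Enforce a minimum interval between successive API calls."""

    def __init__(self, min_interval: float = 0.5):
        self.min_interval = min_interval
        self._last = 0.0

    def wait(self) -> None:
        """Block until at least min_interval has passed since the last call."""
        elapsed = time.monotonic() - self._last
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self._last = time.monotonic()
```

Calling `limiter.wait()` before each stock-photo request caps throughput without touching the agent prompts.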
Potential Improvements
- Parallel agent groups: Run independent agents concurrently
  await asyncio.gather(
      invoke_async(AgentType.COPYWRITE),
      invoke_async(AgentType.ASSETS),
  )
- Dynamic agent spawning: If BUILDER fails, spawn a DEBUG agent
  if build_failed:
      invoke(AgentType.DEBUG, error_log)
- Agent communication protocol: Use a message queue instead of files
  # Agent A publishes
  queue.publish("design_tokens", {...})
  # Agent B subscribes
  tokens = queue.subscribe("design_tokens")
- Adaptive timeouts: Per-agent timeout configuration
  extract:
    timeout: 600   # 10 minutes for browser automation
  builder:
    timeout: 900   # 15 minutes for complex builds
Why This Matters
Time Savings
Manual website cloning: 40-80 hours (design analysis, React conversion, responsive testing)
With agent system: 15-30 minutes (mostly waiting for BUILD and POLISH phases)
~100x speedup for template-based websites.
Consistency
Agents follow strict patterns:
- Always use Tailwind tokens (no arbitrary values)
- Always implement mobile-first responsive design
- Always use shadcn/ui components
- Always match reference animations
Humans are inconsistent. Agents follow rules.
Reproducibility
The entire pipeline is reproducible from a single command:
corebolt run \
--source https://source-site.com \
--reference https://template-demo.com
Same inputs → consistent outputs (LLM sampling adds some variance, but the pipeline structure and patterns are identical). Great for testing and iteration.
Code Snippets
Agent Client (orchestrator/agents.py)
class AgentClient:
    def invoke(self, agent: AgentType, prompt: str,
               session_id: str | None = None) -> AgentResponse:
        payload = {
            "agent": agent.value,
            "message": prompt,
            "session_id": session_id,
        }
        try:
            response = self._client.post(
                f"{self.base_url}/api/chat",
                json=payload,
                timeout=self.timeout
            )
            response.raise_for_status()
            data = response.json()
            return AgentResponse(
                success=True,
                content=data.get("content", ""),
                session_id=data.get("session_id")
            )
        except httpx.HTTPStatusError as e:
            return AgentResponse(
                success=False,
                error=f"HTTP error {e.response.status_code}"
            )
        except httpx.RequestError as e:
            return AgentResponse(
                success=False,
                error=f"Request error: {e}"
            )
Pipeline Orchestrator (orchestrator/pipeline.py)
class Pipeline:
    def run(self, resume: bool = False) -> bool:
        if resume:
            self._load_state()

        if self.state.phase == PipelinePhase.EXTRACT:
            if not self._phase_extract():
                return False
            self._advance_phase(PipelinePhase.REFERENCE)

        if self.state.phase == PipelinePhase.REFERENCE:
            if not self._phase_reference():
                return False
            self._advance_phase(PipelinePhase.PLAN)

        # ... continues through all phases
        return True

    def _advance_phase(self, next_phase: PipelinePhase):
        self.state.phase = next_phase
        self._save_state()  # Persist after every phase
Conclusion
This is a workflow orchestration system optimized for complex, multi-step creative tasks with strong sequential dependencies.
Key Takeaways:
- 7 specialized agents beat one general-purpose agent
- Sequential execution prevents chaos (for now)
- File-based communication is simple and debuggable
- Session continuity enables iterative refinement
- State persistence makes long-running workflows resumable
Not every problem needs parallel agents. Sometimes, a well-ordered assembly line is exactly what you need.