A 20-year-old shipped a production AI system in 9 months: 10 patterns from 4 real failures


A 20-year-old college student in Japan started writing code nine months ago by talking to Claude Code. What came out the other side is a production B2B automation system, four customer-facing failures, and a set of 10 engineering patterns he’s now packaging as a paid blueprint and a free MIT-licensed skills bundle.

The product is called Arena Blueprint, shipping May 13, 2026. This is the technical write-up behind it, and the patterns are worth reading whether or not you buy anything.

What He Built and Ran in Production

The system is a B2B proposal pipeline running on Lancers and CrowdWorks (Japan’s major freelance platforms). It handles automated proposal submissions, customer reply management, content distribution, and a self-evolution loop that feeds past failures back into future prompts.

That last part matters. The system wasn’t just automated. It was designed to get slightly less wrong over time by learning from its own errors.

The 4 Failures That Became the Blueprint

The patterns didn’t come from theory. They came from four production incidents that touched real customers.

  • Duplicate sends: The dedup mechanism was an in-memory Set that reset every time the watcher restarted. The retry loop did exactly what it was told. Same proposal, sent four times to the same person.
  • Silent skips: A PowerShell encoding error was swallowed without any log entry. Tasks dropped with no trace.
  • Dead schedulers: Daemons died quietly and stayed dead for weeks. No alert. No dual-trigger health check.
  • Dedup race: A Make.com Iterator output shape mismatch consumed 30 minutes of debugging before the shape of the incident was recognized.

His observation: none of these were AI failures. They were plain systems failures with an AI in the loop.
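
That first failure is easy to reproduce. Here is a sketch (names are illustrative, not from the real system) of dedup state that lives only in process memory:

```javascript
// Illustrative only: dedup backed by an in-memory Set survives
// exactly as long as the process does.
function makeWatcher() {
  const sent = new Set(); // wiped on every restart
  return {
    send(proposalId) {
      if (sent.has(proposalId)) return 'skipped';
      sent.add(proposalId);
      return 'sent';
    },
  };
}

let watcher = makeWatcher();
watcher.send('client-42'); // 'sent'
watcher.send('client-42'); // 'skipped', dedup holds while the process lives

watcher = makeWatcher();   // ...then the watcher restarts
watcher.send('client-42'); // 'sent' again: a duplicate reaches the customer
```

Pattern 1 is the cure: move that Set onto disk.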


Pattern 1: The Inbox Pattern (File-Based Dedup)

The fix for duplicate sends moves dedup state off memory and onto disk. An append-only JSONL file. Two operations: has() and commit().

// inbox.js — minimal core
const fs = require('fs');
const path = require('path');
const crypto = require('crypto');

const INBOX = path.join(__dirname, '.state/inbox.jsonl');

function hashKey(parts) {
  return crypto.createHash('sha256').update(parts.join('|')).digest('hex').slice(0, 16);
}

function has(parts) {
  const key = hashKey(parts);
  if (!fs.existsSync(INBOX)) return false;
  const lines = fs.readFileSync(INBOX, 'utf8').split('\n');
  return lines.some((l) => l && JSON.parse(l).key === key);
}

function commit(parts, meta = {}) {
  const key = hashKey(parts);
  const record = { key, ts: new Date().toISOString(), ...meta };
  fs.mkdirSync(path.dirname(INBOX), { recursive: true });
  fs.appendFileSync(INBOX, JSON.stringify(record) + '\n');
}

module.exports = { has, commit, hashKey };

The rule is simple: before any side-effect that touches a customer, call inbox.has(parts). If it returns true, exit. Otherwise send, then call inbox.commit(parts).

if (inbox.has(['proposal', clientId, today])) {
  console.log('skip — already sent');
  return;
}
await sender.sendProposal(clientId);
inbox.commit(['proposal', clientId, today], { channel: 'lancers' });

No database. No Redis. An append-only file you can also cat to debug. His advice: reach for the database when you’ve outgrown the JSONL file, not because you’re afraid you’ll outgrow one.

Pattern 2: The 10-Layer Validator

Before building this, validation was a single Claude prompt: “is this proposal text OK?” At around 80 sends per week, that cost roughly $40/week on validation alone. It still let a typo-laden draft through.

The fix is ordering validation layers by cost. Cheap deterministic checks first. Expensive LLM judge last. If any layer rejects, stop.

  • Layer 0 (Typography): Full-width / half-width character consistency
  • Layer 1 (Minefields): Banned phrases (legal, brand)
  • Layer 2 (Placeholders): Unfilled {{client_name}} tokens
  • Layer 3 (Duplicate send): Calls inbox.has()
  • Layer 4 (Brand guard): Required disclosure strings present
  • Layer 5 (Sanity): Length, link count, encoding
  • Layer 6 (Reflection): Known-failure patterns from past incidents
  • Layer 7 (Platform rules): Site-specific limits (Lancers 20-char title, etc.)
  • Layer 8 (Secrets leak): Emails, phone numbers, API key shapes
  • Layer 9 (Sender audit): Static analysis of the caller (runs at code-merge, not runtime)

Layer 9 runs at code-merge. Any new sender must call inbox.has(...) before the side-effect. If it doesn’t, the build fails. It’s a CI gate, not a vibe check.
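
The Blueprint’s actual audit isn’t published, but a toy stand-in shows the shape of the gate: scan a sender module’s source and fail the build if a send-like call isn’t preceded by an inbox.has() check. (The regex and function naming convention are assumptions; a real audit would walk the AST.)

```javascript
// Toy sender audit: checks that an inbox.has() call appears before
// the first send-like call in a sender module's source text.
function auditSenderSource(source) {
  const hasIdx = source.indexOf('inbox.has(');
  const sendIdx = source.search(/\bsend[A-Z]\w*\(/); // e.g. sendProposal(
  if (sendIdx === -1) return { ok: true }; // no side-effect to guard
  if (hasIdx === -1 || hasIdx > sendIdx) {
    return { ok: false, reason: 'send call without a prior inbox.has() check' };
  }
  return { ok: true };
}

auditSenderSource('await sender.sendProposal(id);');
// { ok: false, reason: 'send call without a prior inbox.has() check' }
```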

The outcome: 9 of 10 problems caught by cheap deterministic rules. The Claude judge runs on roughly 10% of cases. Validation cost dropped from $40/week to approximately $4/week. Rejection reasons are legible: “Layer 4: missing disclosure” is fixable. “The model said no, somehow” is not.
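
A minimal sketch of the cost-ordered chain: an array of named layers, run in order, short-circuiting on the first rejection. The layer names echo the table above; the check bodies are simplified stand-ins, not the Blueprint’s rules.

```javascript
// Simplified stand-ins for the cheap deterministic layers. A real
// chain would end with the expensive LLM judge, which only sees
// drafts that survived everything before it.
const layers = [
  { name: 'Placeholders', check: (t) => !/\{\{\w+\}\}/.test(t) },
  { name: 'Sanity', check: (t) => t.length > 0 && t.length <= 4000 },
  { name: 'Secrets leak', check: (t) => !/[\w.+-]+@[\w-]+\.\w+/.test(t) },
];

function validate(text) {
  for (const layer of layers) {
    if (!layer.check(text)) return { ok: false, reason: layer.name };
  }
  return { ok: true };
}

validate('Dear {{client_name}}, ...'); // { ok: false, reason: 'Placeholders' }
validate('Short, clean proposal text.'); // { ok: true }
```

Because each layer has a name, every rejection comes with a legible reason for free.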


The Other 8 Patterns

Patterns 1 and 2 are documented here with code. The remaining eight are in the Blueprint with their own failure logs:

  • Atomic State Writer: Race-free shared-state writes using the rename trick
  • Polling Watcher: Lock plus dual triggers so daemons don’t quietly die
  • Sender Pattern Audit: Static analysis as a CI gate
  • Generator + Validator Split: Mandatory post-check layer on every output
  • Reflection Loop: Auto-collect failures and inject them into the next prompt
  • Devil’s Advocate Council: Institutionalized dissent before unrecoverable decisions
  • Time-to-Detect Log: Measure how long until anomalies are noticed, then shrink it
  • Predictions Registry: Force the agent to commit predictions you can verify later

What’s Free and What’s Paid

The Skills Bundle on GitHub is free and MIT-licensed: roughly 12 skills covering all 10 patterns, ready to drop into a Claude Code session.

The Blueprint on Notion is paid: deep dives on each pattern, the four failure logs in full, and the trade-offs he’d flag if you asked him in person. The Blueprint and the code bundle each work independently of the other.

Three tiers:

  • $39: Blueprint only
  • $99: Blueprint + Skills bundle
  • $199: Blueprint + Skills bundle + a 30-minute call where he reviews your stack and identifies which 3 modules to start with

The Replication Playbook

If you’re building any agent system that touches customers or money, the two patterns worth dropping in first are the Inbox Pattern and the 10-Layer Validator. Together they eliminate the most common class of production failures: duplicate side-effects and undetected bad output.

The broader lesson from the failure log: most AI agent bugs aren’t model problems. They’re dedup state that doesn’t survive a restart, watchers that don’t report their own death, and validation rules that were never made explicit and named. Fix the infrastructure around the model before you try to fix the model.

Stay on top of AI & Automation with BizStack Newsletter
BizStack  —  Entrepreneur’s Business Stack