Paitho
Self-Teaching Loop
The compounding moat

Prompts you can measure.

Every LLM call writes a row. Every funnel event back-links to the call that drafted the email. After a month, "which pitch variant is winning" is a SQL query, not an opinion.

What gets logged. What gets answered.

Inputs (per LLM call)
stage_runs
  • stage_runs.prompt_id — versioned, scoped
  • stage_runs.input_presence — which facts were available
  • stage_runs.missing_required — which hard inputs forced abort
  • stage_runs.engine — anthropic / google / perplexity
  • stage_runs.model — exact model id
  • stage_runs.tokens_in + tokens_out
  • stage_runs.output_meta — templates fired, signals emitted, etc.
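
A minimal sketch of how that row could be shaped, assuming a Postgres store. The column names follow the list above; the types and the extra bookkeeping columns (id, lead_id, stage, aborted, created_at) are illustrative assumptions, not the actual schema.

-- Illustrative stage_runs shape; assumed types and extra columns.
CREATE TABLE stage_runs (
    id               bigserial PRIMARY KEY,
    lead_id          bigint NOT NULL,           -- assumed link to the lead
    stage            text   NOT NULL,           -- pitch_brief, email_draft, ...
    prompt_id        text   NOT NULL,           -- versioned, scoped
    input_presence   jsonb,                     -- which facts were available
    missing_required text[] DEFAULT '{}',       -- hard inputs that forced abort
    aborted          boolean DEFAULT false,
    engine           text,                      -- anthropic / google / perplexity
    model            text,                      -- exact model id
    tokens_in        integer,
    tokens_out       integer,                   -- backfilled after completion
    output_meta      jsonb,                     -- templates fired, signals emitted
    created_at       timestamptz DEFAULT now()
);
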
Outputs (queryable answers)
analytics
  • Which pitch_brief variant has the highest promote-to-reply rate on a given vertical
  • Which pain-signal combinations correlate with reply vs ignore
  • Which solution templates actually convert vs just get fired
  • Whether switching email_draft from sonnet to opus is worth the cost
  • Which ICPs have the highest missing_required rate (signals to fix)
  • Per-prompt × vertical × subcategory funnel attribution

Versioned.
Scoped.
Attributed.

prompt → call → stage_run → output → draft → send → funnel_event

join back to prompt_id → per-prompt outcome metric

Every prompt in the system carries a version, a scope, and a tier. The version increments on any meaningful edit. The scope is a tuple — (lead_type, vertical, subcategory, brand) — and tiers are baseline, vertical-override, and subcategory-override. When the runtime needs a pitch_brief prompt, it walks a 5-step fallback resolution from most-specific to most-general scope and picks the first match.
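
One way to express that walk, sketched as a single Postgres query in which the 5-step fallback collapses into the ordering. prompt_registry, its columns, and the :params are assumed names for illustration, with NULL in a scope column read as "matches anything".

-- Most-specific active prompt wins; baseline is the final fallback.
SELECT prompt_id
FROM prompt_registry
WHERE stage = 'pitch_brief'
  AND status = 'active'
  AND (lead_type   = :lead_type   OR lead_type   IS NULL)
  AND (vertical    = :vertical    OR vertical    IS NULL)
  AND (subcategory = :subcategory OR subcategory IS NULL)
  AND (brand       = :brand       OR brand       IS NULL)
ORDER BY
  CASE tier
    WHEN 'subcategory_override' THEN 3   -- most specific
    WHEN 'vertical_override'    THEN 2
    WHEN 'baseline'             THEN 1   -- most general
  END DESC,
  version DESC                           -- newest version within a tier
LIMIT 1;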

Every call writes a stage_runs row before it executes — prompt_id, input presence, engine, model, token estimates. After completion, tokens_out and output_meta are backfilled. If the call aborts because hard inputs are missing, that's also recorded. This means even abort patterns are queryable: "which prompt has the highest abort rate, and on which input field?" That tells you what data to fetch, not what prompt to rewrite.
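
That question is a straightforward group-by, sketched here in Postgres under the assumption that missing_required is stored as a text array; the table follows the stage_runs description above.

-- Abort rate per prompt, broken down by the input field that forced the abort.
WITH per_prompt AS (
    SELECT prompt_id, count(*) AS calls
    FROM stage_runs
    GROUP BY prompt_id
)
SELECT s.prompt_id,
       m.missing_field,
       count(*)                               AS aborts,
       round(count(*)::numeric / p.calls, 3)  AS abort_rate
FROM stage_runs s
CROSS JOIN LATERAL unnest(s.missing_required) AS m(missing_field)
JOIN per_prompt p USING (prompt_id)
GROUP BY s.prompt_id, m.missing_field, p.calls
ORDER BY abort_rate DESC;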

Downstream, every email draft records pitch_prompt_id, email_prompt_id, and tune_prompt_id. Every send records email_prompt_id. Every funnel event on a lead — replied, call_booked, demo, closed_won, closed_lost — joins back to the exact prompt versions that drafted the outreach. After a quarter of operation, you can compare promote-to-reply rate between two pitch_brief variants on the same vertical-subcategory pair and pick the winner. The competitor running mail-merge cannot reconstruct this: their prompts aren't versioned and their outputs aren't attributed.
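
A sketch of that comparison in Postgres. email_drafts, funnel_events, and the promoted / vertical / subcategory columns are assumed names; in the real schema some of these may live on the lead row instead.

-- Promote-to-reply rate per pitch_brief variant on one vertical-subcategory pair.
WITH promoted AS (
    SELECT d.pitch_prompt_id,
           EXISTS (
               SELECT 1
               FROM funnel_events f
               WHERE f.lead_id = d.lead_id
                 AND f.event   = 'replied'
           ) AS replied
    FROM email_drafts d
    WHERE d.promoted
      AND d.vertical    = 'kubernetes'
      AND d.subcategory = 'b2b'
)
SELECT pitch_prompt_id,
       count(*)                                                      AS promoted_drafts,
       count(*) FILTER (WHERE replied)                               AS replies,
       round(100.0 * count(*) FILTER (WHERE replied) / count(*), 1)  AS reply_pct
FROM promoted
GROUP BY pitch_prompt_id
ORDER BY reply_pct DESC;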

What a registry entry looks like.

registry · pitch_brief_v14
--- entry ---
prompt_id:    pitch_brief_v14
stage:        pitch_brief
tier:         vertical_override
scope:
  pack:        devtools
  vertical:    kubernetes
  subcategory: b2b
  brand:       acme

input_contract:
  required:  [lead.painSignals, lead.aiSummary, brand.kbExcerpts]
  soft:      [lead.namedLeaders, lead.socialResearch]
  abort_on:  any required missing

engine_pref:  anthropic
model_pref:   claude-sonnet-4-6
fallback_to:  pitch_brief_v9 (baseline tier)

--- attached metrics (last 30d) ---
calls:                 412
abort_rate:            7.0%   # missing_required: aiSummary
promote_rate:          68.4%  # human approves draft
reply_rate:            — PLACEHOLDER —
call_booked_rate:      — PLACEHOLDER —
avg_tokens_in / out:   3,914 / 842

--- variant under test ---
prompt_id:    pitch_brief_v15  # 10% traffic split
delta:        new system instruction on opener grounding
status:       live · 9 days

# PLACEHOLDER — full registry available in app

What can break.
And what catches it.

Risk
Small-sample overfitting

A new prompt variant looks great after 12 calls. You promote it. It loses badly at scale.

Mitigation

Tier-based prompt resolution: 5-step fallback to baseline. Variants are flagged "experimental" until they cross a minimum-call threshold.

Risk
Model drift

An LLM provider silently updates the model behind a name. Performance changes overnight.

Mitigation

Exact model id is logged on every call. Date-keyed metrics surface step-changes. Pin to specific model versions when stability matters.
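
One date-keyed check, sketched in Postgres against the stage_runs description above (created_at is an assumed timestamp column): a step change in weekly volume or average output length under the same model id is a drift signal worth investigating.

-- Weekly volume and output length per exact model id; watch for step changes.
SELECT model,
       date_trunc('week', created_at) AS week,
       count(*)                       AS calls,
       round(avg(tokens_out))         AS avg_tokens_out
FROM stage_runs
GROUP BY model, date_trunc('week', created_at)
ORDER BY model, week;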

Risk
Attribution loss

A reply comes in two months after send. The prompt that drafted it has been edited.

Mitigation

Prompts are immutable on version increment. The exact prompt_id at send time is preserved on the email_message row, not refetched.

The self-teaching loop isn't a stage. It's the bookkeeping that wraps every AI stage in the pipeline.

[Pipeline diagram: highlighted stages are the AI stages that write to stage_runs and are back-linked from funnel events]

Stop guessing which pitch works.
Start measuring it.

See the prompt registry, stage_runs, and funnel attribution in a 30-minute walkthrough.