Paitho
Pipeline · Stage 05

The pack picks the signals.

Pain signals are not a single global list. Every vertical pack ships its own taxonomy, built by an operator who has actually sold in that industry and knows which observations correlate with a reply. The pipeline can only fire pitch angles from signals the pack defines, so the model can't pitch a lead on a pain the pack doesn't name.

Different industry, different signals.
Same pipeline.

The pack is the source of truth for what counts as a pain signal. Below are signals from three live packs.

Devtools pack · tuned by an ex-Datadog SRE
kubernetes_sprawl · observability_gap · on_call_burnout · missing_oss_repo · no_changelog_visible · stale_runbooks · + 18 more
Clinical Ops pack · tuned by a clinic admin
legacy_emr_lock · manual_intake_forms · no_patient_portal · no_telehealth · staffing_burnout · prior_auth_backlog · + 14 more
Fintech pack · tuned by an ex-Stripe lead
manual_kyc · single_payment_gateway · no_audit_trail · pci_self_assessment · reconciliation_manual · no_chargeback_workflow · + 16 more

A signal in one pack can be meaningless in another. no_chargeback_workflow is a sales hook for fintech and noise everywhere else. no_dealer_locator matters for industrial manufacturers and is irrelevant for SaaS. The pack draws the line. You can fork a pack, rename signals, add your own — renames propagate through every pitch angle that referenced them.
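One way renames can propagate is for pitch angles to reference a signal by a stable id rather than by its name. The sketch below assumes that design; the `Signal`, `PitchAngle`, and `Pack` shapes and the `forkPack` / `renameSignal` / `resolveSignalName` helpers are hypothetical, not Paitho's actual data model.

```typescript
// Hypothetical shapes: a signal has a stable id plus an editable display name.
interface Signal {
  id: string;
  name: string;
}

// A pitch angle points at a signal by id, never by name.
interface PitchAngle {
  signalId: string;
  template: string;
}

interface Pack {
  vertical: string;
  signals: Map<string, Signal>; // keyed by stable id
  angles: PitchAngle[];
}

// Forking copies each signal and angle so edits never leak back into the source pack.
function forkPack(source: Pack, vertical: string): Pack {
  const signals = new Map<string, Signal>();
  source.signals.forEach((s, id) => signals.set(id, { ...s }));
  return { vertical, signals, angles: source.angles.map((a) => ({ ...a })) };
}

// Renaming edits one record; every angle picks up the new name when it resolves the id.
function renameSignal(pack: Pack, id: string, newName: string): void {
  const s = pack.signals.get(id);
  if (!s) throw new Error(`unknown signal id: ${id}`);
  s.name = newName;
}

function resolveSignalName(pack: Pack, angle: PitchAngle): string | undefined {
  return pack.signals.get(angle.signalId)?.name;
}
```

Because angles hold only the id, a rename is a single write and no template text has to be rewritten.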

What goes in, what comes out.

Inputs
Audit + Taxonomy
  • lead.webAudit — Perplexity structured extraction (channels, traffic, ecommerce maturity, content recency)
  • lead.htmlSnapshot — homepage + key page captures
  • lead.socialResearch — follower counts, post recency
  • taxonomy.signals[] — controlled vocabulary, per-vertical extensions
  • taxonomy.derivationRules[] — declarative rules (e.g. no_dealer_locator if HTML lacks "locator")
Outputs
PainSignal[]
  • painSignal.name — must match taxonomy entry
  • painSignal.confidence — 0–1, with derivation source
  • painSignal.evidence — quoted text or URL anchor
  • painSignal.derivedFrom — rule | llm | manual
  • painSignal.frequencyCap — max signals retained per lead
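The output contract can be written down as TypeScript types. Since model responses arrive as untyped JSON, a runtime guard is also useful before anything is treated as a `PainSignal`. This is a minimal sketch: `isPainSignal` is a hypothetical helper, taxonomy membership is left to a separate check, and `frequencyCap` is treated as a per-lead setting rather than a field on each signal.

```typescript
// Provenance values, as listed under Outputs.
type DerivedFrom = "rule" | "llm" | "manual";

interface PainSignal {
  name: string;            // must match a taxonomy entry (checked elsewhere)
  confidence: number;      // 0..1
  evidence: string;        // quoted text or URL anchor
  derivedFrom: DerivedFrom;
}

const DERIVED_FROM: readonly string[] = ["rule", "llm", "manual"];

// Runtime check for untyped JSON (e.g. a parsed model response)
// before it is trusted as a PainSignal.
function isPainSignal(x: unknown): x is PainSignal {
  if (typeof x !== "object" || x === null) return false;
  const o = x as Record<string, unknown>;
  return (
    typeof o.name === "string" &&
    typeof o.confidence === "number" &&
    o.confidence >= 0 &&
    o.confidence <= 1 &&
    typeof o.evidence === "string" &&
    typeof o.derivedFrom === "string" &&
    DERIVED_FROM.indexOf(o.derivedFrom) !== -1
  );
}
```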

Rules first.
LLM second.
Operator last.

Signal extraction is a three-pass operation. The first pass runs declarative rules over the structured audit and HTML snapshot. no_dealer_locator fires if neither the HTML snapshot nor the audit shows an anchor or section labelled "locator", "find a dealer", or an equivalent. slow_load_>3s fires from the Lighthouse-style timing in the audit. These rules are deterministic, free, and explain themselves.
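A rule pass of this kind can be expressed as data plus a small evaluator. The sketch assumes two rule kinds (phrase absence in the HTML, a numeric threshold on an audit field) and a `loadTimeMs` field on the audit; both are illustrative, since the real `lead.webAudit` shape and rule syntax are not shown here. Rule hits get confidence 1 on the assumption that a deterministic check either matched or did not.

```typescript
// Hypothetical inputs; only a load-time field is assumed on the audit.
interface RuleInputs {
  html: string;                     // lead.htmlSnapshot
  audit: { loadTimeMs?: number };   // assumed field on lead.webAudit
}

// A declarative rule: "none of these phrases appear in the HTML"
// or "a numeric audit field exceeds a threshold".
type DerivationRule =
  | { signal: string; kind: "htmlLacksAny"; phrases: string[] }
  | { signal: string; kind: "auditAbove"; field: "loadTimeMs"; threshold: number };

interface RuleHit {
  name: string;
  confidence: number;
  evidence: string;
  derivedFrom: "rule";
}

function runRules(rules: DerivationRule[], input: RuleInputs): RuleHit[] {
  const html = input.html.toLowerCase();
  const hits: RuleHit[] = [];
  for (const rule of rules) {
    if (rule.kind === "htmlLacksAny") {
      const found = rule.phrases.some((p) => html.includes(p.toLowerCase()));
      if (!found) {
        // For an absence rule, the evidence records which check came up empty.
        hits.push({ name: rule.signal, confidence: 1, derivedFrom: "rule",
          evidence: `HTML contains none of: ${rule.phrases.join(", ")}` });
      }
    } else {
      const value = input.audit[rule.field];
      if (value !== undefined && value > rule.threshold) {
        hits.push({ name: rule.signal, confidence: 1, derivedFrom: "rule",
          evidence: `audit.${rule.field} = ${value} > ${rule.threshold}` });
      }
    }
  }
  return hits;
}

// The two examples from the text, written as data.
const exampleRules: DerivationRule[] = [
  { signal: "no_dealer_locator", kind: "htmlLacksAny", phrases: ["locator", "find a dealer"] },
  { signal: "slow_load_>3s", kind: "auditAbove", field: "loadTimeMs", threshold: 3000 },
];
```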

The second pass is LLM-driven extraction (Perplexity Sonar with structured output) over the same inputs. It can name signals the rule pass missed — most signals about strategy and positioning fall into this bucket. The model is bound to the taxonomy: it can only emit signal names that already exist. New signal proposals go into a separate candidates table for human curation.
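The prompt asks the model to stay inside the vocabulary, but the binding can also be enforced after the response arrives. In this hypothetical `bindToTaxonomy` step, any out-of-vocabulary name is moved into `candidates` instead of passing through as a signal.

```typescript
interface ModelSignal {
  name: string;
  confidence: number;
  evidence: string;
}

interface Candidate {
  proposedName: string;
  rationale: string;
}

interface BoundOutput {
  signals: ModelSignal[];
  candidates: Candidate[];
}

// Keep only signals whose name is in the taxonomy; route everything else
// to candidates for human curation, preserving the model's evidence.
function bindToTaxonomy(raw: BoundOutput, taxonomy: ReadonlySet<string>): BoundOutput {
  const signals: ModelSignal[] = [];
  const candidates: Candidate[] = raw.candidates.slice();
  for (const s of raw.signals) {
    if (taxonomy.has(s.name)) {
      signals.push(s);
    } else {
      candidates.push({ proposedName: s.name, rationale: `emitted as a signal; evidence: ${s.evidence}` });
    }
  }
  return { signals, candidates };
}
```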

The third pass is the operator. Every signal carries provenance — rule, LLM, or manual — and an evidence quote. You can override a signal that fired wrong, add a signal the system missed, or downvote a signal that the model overuses. Overrides feed back into the prompt registry as few-shot examples on the next version bump. The taxonomy itself is editable: rename, deprecate, merge, or split signals, and every solution template that referenced the old name updates by reference.
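A merge of the three passes might look like the following. The precedence is an assumption, not documented behaviour: operator removals and additions win outright, and when the rule and LLM passes emit the same name the rule version is kept because it is deterministic. `Override` and `mergePasses` are hypothetical names.

```typescript
type DerivedFrom = "rule" | "llm" | "manual";

interface PainSignal {
  name: string;
  confidence: number;
  evidence: string;
  derivedFrom: DerivedFrom;
}

// Hypothetical override records captured from the operator UI.
type Override =
  | { action: "remove"; name: string }
  | { action: "add"; name: string; evidence: string };

function mergePasses(ruleHits: PainSignal[], llmHits: PainSignal[], overrides: Override[]): PainSignal[] {
  const byName = new Map<string, PainSignal>();
  for (const s of llmHits) byName.set(s.name, s);
  for (const s of ruleHits) byName.set(s.name, s); // rule replaces llm on a name collision
  for (const o of overrides) {
    if (o.action === "remove") {
      byName.delete(o.name); // operator says this signal fired wrong
    } else {
      // operator adds a missed signal; provenance is recorded as manual
      byName.set(o.name, { name: o.name, confidence: 1, evidence: o.evidence, derivedFrom: "manual" });
    }
  }
  return Array.from(byName.values());
}
```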

Bound to the vocabulary.

prompt · extract_signals_v17 · fintech
--- system ---
You extract pain signals from a B2B company's web presence.
You may ONLY emit signal names from the provided taxonomy. If you
identify a pain that does not match any taxonomy entry, write it to
candidates[] for human review — never invent a name in the output.

--- inputs ---
audit:        {{lead.webAudit | json}}
htmlSnapshot: {{lead.htmlSnapshot | truncate(8000)}}
social:       {{lead.socialResearch | json}}
taxonomy:     {{taxonomy.signals[] | json}}      # 28 entries

--- output schema ---
{
  "signals": [
    {
      "name":       string,  # MUST be in taxonomy
      "confidence": 0..1,
      "evidence":   string   # quoted text or URL anchor
    }
  ],
  "candidates": [
    { "proposedName": string, "rationale": string }
  ],
  "missing_required": [ string ]   # input fields that were absent
}

--- guards ---
- frequency_cap: max 6 signals per lead. Drop lowest confidence first.
- evidence required: signals without an evidence quote are dropped.
- on missing input: do NOT emit. Write field name to missing_required.

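The guards above can also be enforced server-side rather than trusted to the model. A minimal sketch, assuming the caller passes the rendered inputs keyed by name; `applyGuards` is hypothetical, a field counts as missing when it is `undefined` or `null`, and the default cap of 6 mirrors the guard text.

```typescript
interface PainSignal {
  name: string;
  confidence: number;
  evidence: string;
}

interface GuardResult {
  signals: PainSignal[];
  missingRequired: string[];
}

function applyGuards(
  inputs: Record<string, unknown>,
  required: string[],
  emitted: PainSignal[],
  frequencyCap = 6,
): GuardResult {
  // on missing input: emit nothing and report which fields were absent
  const missingRequired = required.filter((f) => inputs[f] === undefined || inputs[f] === null);
  if (missingRequired.length > 0) return { signals: [], missingRequired };

  // evidence required: drop signals whose evidence quote is empty
  const withEvidence = emitted.filter((s) => s.evidence.trim().length > 0);

  // frequency cap: sort the filtered copy by confidence, keep the top N
  const signals = withEvidence
    .sort((a, b) => b.confidence - a.confidence)
    .slice(0, frequencyCap);
  return { signals, missingRequired: [] };
}
```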

What can break.
And what catches it.

Risk
Signal over-detection

Every lead gets the same eight signals, and pitches start to look templated.

Mitigation

Frequency cap per lead (default 6). Lowest-confidence signals drop first. Per-signal saturation alerts in the registry.
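A saturation alert could be as simple as measuring how often each signal fires across a batch of leads. The 50% threshold and the `saturatedSignals` helper are assumptions for illustration; the registry's actual alerting rule is not described here.

```typescript
// Each inner array is the list of signal names that fired on one lead.
// Returns signals that fire on more than `threshold` of the leads, sorted by name.
function saturatedSignals(leads: string[][], threshold = 0.5): string[] {
  if (leads.length === 0) return [];
  const counts = new Map<string, number>();
  for (const signals of leads) {
    // count each signal at most once per lead
    new Set(signals).forEach((name) => counts.set(name, (counts.get(name) ?? 0) + 1));
  }
  const flagged: string[] = [];
  counts.forEach((n, name) => {
    if (n / leads.length > threshold) flagged.push(name);
  });
  return flagged.sort();
}
```

A signal that fires on most leads is not discriminating, which is exactly the over-detection pattern described above.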

Risk
Hallucinated signal name

Model emits a plausible-sounding signal that no template references.

Mitigation

Schema-bound output. Names not in the taxonomy are silently moved to candidates[] for curation instead of reaching a template.

Risk
Evidence-free claims

A signal fires but the operator can't see why.

Mitigation

Every signal must carry an evidence quote or URL anchor. Signals without evidence are dropped server-side.

Extract pain signals on a real lead.
In about 12 seconds.

Drop a domain. Watch the rules and the LLM agree (or argue) in the sandbox.