Paitho
Pipeline · Stage 05

A vocabulary,
not adjectives.

28+ signals in a controlled, editable taxonomy. A pitch can only fire from a signal that was actually observed, so the model can't compliment a lead on a strength they don't have.

Real signal names from the live registry.

multi_system_scatter · manual_po_entry · marketplace_dependency · no_online_booking · portfolio_wp_heavy · portfolio_bloat · slow_load_>3s · stale_blog_>6mo · no_dealer_locator · no_request_quote · single_channel_amazon · low_review_velocity · missing_oss_repo · support_email_only · no_changelog_visible · team_page_outdated · single_payment_gateway · no_intl_shipping · manual_inventory_sync
+ 9 more in the registry

Verticals add their own. The Devtools pack adds missing_oss_repo. The Coatings pack adds no_dealer_locator. You can add and rename your own signals, and each rename propagates to every solution template that references the signal.
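Renames only propagate cleanly if templates point at something that doesn't change. A minimal sketch, assuming (hypothetically) that each taxonomy entry carries a stable id and that solution templates store that id rather than the display name. `SignalEntry`, `Taxonomy`, `renameSignal`, `templateSignalNames`, the id `sig_017`, and the new name `missing_dealer_locator` are all illustrative, not Paitho's API.

```typescript
// Hypothetical shapes: a taxonomy entry keyed by a stable id, and a
// solution template that stores ids, never display names.
interface SignalEntry {
  id: string;        // stable, never changes
  name: string;      // display name, editable
  vertical?: string; // set for per-vertical extensions, e.g. "coatings"
}

interface SolutionTemplate {
  title: string;
  signalIds: string[]; // references by id, so a rename needs no template edit
}

class Taxonomy {
  private byId = new Map<string, SignalEntry>();

  add(entry: SignalEntry): void {
    if (this.byId.has(entry.id)) throw new Error(`duplicate id ${entry.id}`);
    this.byId.set(entry.id, entry);
  }

  // Renaming touches only the entry; every template holding the id
  // resolves to the new name on its next lookup.
  renameSignal(id: string, newName: string): void {
    const entry = this.byId.get(id);
    if (!entry) throw new Error(`unknown signal id ${id}`);
    entry.name = newName;
  }

  nameOf(id: string): string | undefined {
    return this.byId.get(id)?.name;
  }
}

// Resolve a template's signal references to current display names.
function templateSignalNames(t: SolutionTemplate, tax: Taxonomy): string[] {
  return t.signalIds.map((id) => tax.nameOf(id) ?? `<missing:${id}>`);
}

const tax = new Taxonomy();
tax.add({ id: "sig_017", name: "no_dealer_locator", vertical: "coatings" });
const pitch: SolutionTemplate = { title: "Dealer finder", signalIds: ["sig_017"] };

tax.renameSignal("sig_017", "missing_dealer_locator");
console.log(templateSignalNames(pitch, tax)); // [ 'missing_dealer_locator' ]
```

Storing ids instead of names is what makes "updates by reference" cheap: the template never has to be rewritten, only re-resolved.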

What goes in, what comes out.

Inputs
Audit + Taxonomy
  • lead.webAudit — Perplexity structured extraction (channels, traffic, ecommerce maturity, content recency)
  • lead.htmlSnapshot — homepage + key page captures
  • lead.socialResearch — follower counts, post recency
  • taxonomy.signals[] — controlled vocabulary, per-vertical extensions
  • taxonomy.derivationRules[] — declarative rules (e.g. no_dealer_locator if HTML lacks "locator")
Outputs
PainSignal[]
  • painSignal.name — must match taxonomy entry
  • painSignal.confidence — 0–1, with derivation source
  • painSignal.evidence — quoted text or URL anchor
  • painSignal.derivedFrom — rule | llm | manual
  • painSignal.frequencyCap — max signals retained per lead
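The output fields above map naturally onto a TypeScript type. A sketch, assuming the taxonomy is available as a plain set of names; `PainSignal` mirrors the listed fields, while `validatePainSignal` and the example values are hypothetical, not a published API.

```typescript
type Derivation = "rule" | "llm" | "manual";

// Mirrors the PainSignal output fields listed above.
interface PainSignal {
  name: string;           // must match a taxonomy entry
  confidence: number;     // 0 to 1
  evidence: string;       // quoted text or URL anchor
  derivedFrom: Derivation;
  frequencyCap: number;   // max signals retained per lead
}

// Returns a list of problems; an empty list means the signal is well-formed.
function validatePainSignal(s: PainSignal, taxonomy: ReadonlySet<string>): string[] {
  const problems: string[] = [];
  if (!taxonomy.has(s.name)) problems.push(`name not in taxonomy: ${s.name}`);
  if (!(s.confidence >= 0 && s.confidence <= 1)) problems.push(`confidence out of range: ${s.confidence}`);
  if (s.evidence.trim() === "") problems.push("missing evidence");
  if (!Number.isInteger(s.frequencyCap) || s.frequencyCap < 1) problems.push(`bad frequencyCap: ${s.frequencyCap}`);
  return problems;
}

const taxonomy = new Set(["no_online_booking", "stale_blog_>6mo"]);
const ok: PainSignal = {
  name: "stale_blog_>6mo",
  confidence: 0.9,
  evidence: "https://example.com/blog",
  derivedFrom: "rule",
  frequencyCap: 6,
};
console.log(validatePainSignal(ok, taxonomy).length); // 0
```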

Rules first.
LLM second.
Operator last.

Signal extraction is a three-pass operation. The first pass runs declarative rules over the structured audit and the HTML snapshot. no_dealer_locator fires if the HTML has no anchor or section labelled "locator" or "find a dealer" and the audit shows no equivalent. slow_load_>3s fires from the Lighthouse-style timing in the audit. These rules are deterministic, free to run, and explain themselves.
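A minimal sketch of the rule pass, assuming a simplified audit shape. `Audit`, `DerivationRule`, `runRules`, the `loadTimeMs` and `pageTexts` fields, and the label regex are hypothetical stand-ins for the real `taxonomy.derivationRules[]`. Each rule is a pure predicate that returns its own evidence string, which is what lets a fired signal explain itself. Assigning rule hits a confidence of 1 is also an assumption.

```typescript
// Hypothetical, simplified inputs for the rule pass.
interface Audit {
  loadTimeMs?: number; // Lighthouse-style timing, if the audit captured it
  pageTexts: string[]; // visible anchor/section labels from the audit
}

interface RuleHit {
  name: string;
  confidence: number;
  evidence: string;
  derivedFrom: "rule";
}

// A rule returns evidence text when it fires, or null when it does not.
interface DerivationRule {
  name: string;
  test: (html: string, audit: Audit) => string | null;
}

const DEALER_LABEL = /locator|find a dealer/i;

const rules: DerivationRule[] = [
  {
    name: "no_dealer_locator",
    test: (html, audit) =>
      DEALER_LABEL.test(html) || audit.pageTexts.some((t) => DEALER_LABEL.test(t))
        ? null
        : 'no "locator" or "find a dealer" label in HTML snapshot or audit',
  },
  {
    name: "slow_load_>3s",
    test: (_html, audit) =>
      audit.loadTimeMs !== undefined && audit.loadTimeMs > 3000
        ? `load time ${audit.loadTimeMs} ms`
        : null,
  },
];

// Deterministic: the same inputs always yield the same hits.
function runRules(html: string, audit: Audit): RuleHit[] {
  const hits: RuleHit[] = [];
  for (const rule of rules) {
    const evidence = rule.test(html, audit);
    if (evidence !== null) {
      hits.push({ name: rule.name, confidence: 1, evidence, derivedFrom: "rule" });
    }
  }
  return hits;
}

const hits = runRules("<a href='/contact'>Contact</a>", { loadTimeMs: 4200, pageTexts: ["Contact"] });
console.log(hits.map((h) => h.name)); // [ 'no_dealer_locator', 'slow_load_>3s' ]
```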

The second pass is LLM-driven extraction (Perplexity Sonar with structured output) over the same inputs. It can name signals the rule pass missed — most signals about strategy and positioning fall into this bucket. The model is bound to the taxonomy: it can only emit signal names that already exist. New signal proposals go into a separate candidates table for human curation.
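The binding can be enforced a second time after the model responds, so a schema slip cannot leak an invented name downstream. A sketch, assuming the model's JSON output has already been parsed; `LlmOutput`, `bindToTaxonomy`, and the example name `weak_brand_story` are hypothetical. Any emitted name that is not in the taxonomy moves to candidates[] instead of staying in signals[].

```typescript
interface LlmSignal { name: string; confidence: number; evidence: string; }
interface Candidate { proposedName: string; rationale: string; }
interface LlmOutput { signals: LlmSignal[]; candidates: Candidate[]; }

// Keep only taxonomy names; route everything else to candidates for curation.
function bindToTaxonomy(out: LlmOutput, taxonomy: ReadonlySet<string>): LlmOutput {
  const signals: LlmSignal[] = [];
  const candidates: Candidate[] = [...out.candidates];
  for (const s of out.signals) {
    if (taxonomy.has(s.name)) {
      signals.push(s);
    } else {
      candidates.push({
        proposedName: s.name,
        rationale: `emitted as a signal without a taxonomy match; evidence: ${s.evidence}`,
      });
    }
  }
  return { signals, candidates };
}

const taxonomy = new Set(["single_channel_amazon", "low_review_velocity"]);
const bound = bindToTaxonomy(
  {
    signals: [
      { name: "single_channel_amazon", confidence: 0.8, evidence: "Buy on Amazon (only CTA)" },
      { name: "weak_brand_story", confidence: 0.6, evidence: "About page is one line" },
    ],
    candidates: [],
  },
  taxonomy,
);
console.log(bound.signals.length, bound.candidates.length); // 1 1
```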

The third pass is the operator. Every signal carries provenance — rule, LLM, or manual — and an evidence quote. You can override a signal that fired wrong, add a signal the system missed, or downvote a signal that the model overuses. Overrides feed back into the prompt registry as few-shot examples on the next version bump. The taxonomy itself is editable: rename, deprecate, merge, or split signals, and every solution template that referenced the old name updates by reference.
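One way the three passes could combine while keeping provenance. A sketch under stated assumptions: a rule hit takes precedence over an LLM hit for the same name, operator decisions are applied last and always win, and manual additions get confidence 1. `Override` and `mergePasses` are hypothetical names; downvotes and the few-shot feedback loop are not modeled.

```typescript
type Source = "rule" | "llm" | "manual";

interface Signal {
  name: string;
  confidence: number;
  evidence: string;
  derivedFrom: Source;
}

// Hypothetical operator actions: remove a wrong signal or add a missed one.
type Override =
  | { kind: "remove"; name: string }
  | { kind: "add"; name: string; evidence: string };

function mergePasses(ruleHits: Signal[], llmHits: Signal[], overrides: Override[]): Signal[] {
  const merged = new Map<string, Signal>();
  // Pass 1: deterministic rule hits.
  for (const s of ruleHits) merged.set(s.name, s);
  // Pass 2: LLM hits fill gaps; a rule hit for the same name is kept as-is.
  for (const s of llmHits) if (!merged.has(s.name)) merged.set(s.name, s);
  // Pass 3: operator decisions are applied last and always win.
  for (const o of overrides) {
    if (o.kind === "remove") merged.delete(o.name);
    else merged.set(o.name, { name: o.name, confidence: 1, evidence: o.evidence, derivedFrom: "manual" });
  }
  return [...merged.values()];
}

const result = mergePasses(
  [{ name: "no_online_booking", confidence: 1, evidence: "no booking widget on /contact", derivedFrom: "rule" }],
  [
    { name: "no_online_booking", confidence: 0.7, evidence: "phone-only scheduling", derivedFrom: "llm" },
    { name: "team_page_outdated", confidence: 0.55, evidence: "/team lists 2019 hires", derivedFrom: "llm" },
  ],
  [
    { kind: "remove", name: "team_page_outdated" },
    { kind: "add", name: "support_email_only", evidence: "support@ is the only contact" },
  ],
);
console.log(result.map((s) => `${s.name}:${s.derivedFrom}`));
// [ 'no_online_booking:rule', 'support_email_only:manual' ]
```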

Bound to the vocabulary.

prompt · extract_signals_v17 · fintech +
--- system ---
You extract pain signals from a B2B company's web presence.
You may ONLY emit signal names from the provided taxonomy. If you
identify a pain that does not match any taxonomy entry, write it to
candidates[] for human review — never invent a name in the output.

--- inputs ---
audit:        {{lead.webAudit | json}}
htmlSnapshot: {{lead.htmlSnapshot | truncate(8000)}}
social:       {{lead.socialResearch | json}}
taxonomy:     {{taxonomy.signals[] | json}}      # 28 entries

--- output schema ---
{
  "signals": [
    {
      "name":       string, # MUST be in taxonomy
      "confidence": 0..1,
      "evidence":   string  # quoted text or URL anchor
    }
  ],
  "candidates": [
    { "proposedName": string, "rationale": string }
  ],
  "missing_required": [ string ]   # input fields that were absent
}

--- guards ---
- frequency_cap: max 6 signals per lead. Drop lowest confidence first.
- evidence required: signals without an evidence quote are dropped.
- on missing input: do NOT emit. Write field name to missing_required.

# (excerpt; the full prompt is in the in-app prompt registry)
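The guards can be re-applied server-side to whatever the model returns. A sketch, assuming the default cap of 6 and a hypothetical list of required inputs; `applyGuards` and `REQUIRED_INPUTS` are illustrative names. Ties in confidence keep the model's original order, since `Array.prototype.sort` is stable in ES2019 and later engines.

```typescript
interface GuardSignal { name: string; confidence: number; evidence: string; }

interface GuardResult {
  signals: GuardSignal[];
  missing_required: string[];
}

// Hypothetical: which inputs count as required is an assumption here.
const REQUIRED_INPUTS = ["webAudit", "htmlSnapshot", "socialResearch"] as const;

function applyGuards(
  inputs: Record<string, unknown>,
  signals: GuardSignal[],
  frequencyCap = 6,
): GuardResult {
  // On missing input: emit no signals, report which fields were absent.
  const missing_required = REQUIRED_INPUTS.filter((k) => inputs[k] == null);
  if (missing_required.length > 0) return { signals: [], missing_required };

  const kept = signals
    // Evidence required: drop signals without a quote or anchor.
    .filter((s) => s.evidence.trim().length > 0)
    // Frequency cap: highest confidence first, keep the top N,
    // so the lowest-confidence signals are the ones dropped.
    .sort((a, b) => b.confidence - a.confidence)
    .slice(0, frequencyCap);
  return { signals: kept, missing_required: [] };
}

// Eight candidate signals; the highest-confidence one has no evidence.
const many: GuardSignal[] = Array.from({ length: 8 }, (_, i) => ({
  name: `signal_${i}`,
  confidence: (i + 1) / 10,
  evidence: i === 7 ? "" : `quote ${i}`,
}));
const res = applyGuards({ webAudit: {}, htmlSnapshot: "<html>", socialResearch: {} }, many);
console.log(res.signals.map((s) => s.name).join(","));
// signal_6,signal_5,signal_4,signal_3,signal_2,signal_1
```

Note the order of operations: evidence is checked before the cap, so an evidence-free signal can never crowd out a supported one.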

What can break.
And what catches it.

Risk
Signal over-detection

Every lead gets the same eight signals, and pitches start to look templated.

Mitigation

Frequency cap per lead (default 6). Lowest-confidence signals drop first. Per-signal saturation alerts in the registry.
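A per-signal saturation check could count the share of recent leads on which each signal fired and flag any signal above a threshold. The 0.6 threshold, the window of leads passed in, and the `saturationAlerts` name are assumptions for illustration; the page does not say how the registry computes its alerts.

```typescript
// Each inner array is the list of signal names retained for one lead.
function saturationAlerts(
  leads: string[][],
  threshold = 0.6,
): { name: string; share: number }[] {
  if (leads.length === 0) return [];
  const counts = new Map<string, number>();
  for (const signals of leads) {
    // Count each signal at most once per lead.
    for (const name of new Set(signals)) counts.set(name, (counts.get(name) ?? 0) + 1);
  }
  return [...counts.entries()]
    .map(([name, n]) => ({ name, share: n / leads.length }))
    .filter((s) => s.share > threshold)
    .sort((a, b) => b.share - a.share);
}

const recent = [
  ["slow_load_>3s", "no_changelog_visible"],
  ["slow_load_>3s", "support_email_only"],
  ["slow_load_>3s", "no_changelog_visible"],
  ["slow_load_>3s"],
];
console.log(saturationAlerts(recent)); // [ { name: 'slow_load_>3s', share: 1 } ]
```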

Risk
Hallucinated signal name

Model emits a plausible-sounding signal that no template references.

Mitigation

Schema-bound output. Names not in the taxonomy are silently moved out of signals[] and into candidates[] for curation.

Risk
Evidence-free claims

A signal fires but the operator can't see why.

Mitigation

Every signal must carry an evidence quote or URL anchor. Signals without evidence are dropped server-side.

Extract pain signals from a real lead.
In about 12 seconds.

Drop a domain. Watch the rules and the LLM agree (or argue) in the sandbox.