Pipeline · stage 02

web_audit()

Read the company's site like a human would. Pricing page, careers page, changelog, blog, status page.

Inputs and outputs

In. Canonical company record from discover().

Out. Site map, last-modified dates, page-type tags, raw text excerpts capped at 12k tokens per page.

Current version

v2.1.0 Forkable through the prompt library.

Model defaults

Headless fetch through our crawler pool. No LLM at this stage. The classifier that tags page type is a small fine-tune we host ourselves. You can override per stage in your routing config.

How it fails

Cloudflare bot pages, JavaScript-heavy SPAs that do not server-render, and paywalls. We retry once with a real browser and then mark the lead web_audit_partial. Partial leads can still pass downstream stages but they get a lower confidence score.

Evals

Page-type classification accuracy: 93.4% across 18 page types. Pricing-page detection specifically: 98.2%. The full eval set ships with the prompt and is editable. Eval format.