BYOK Economics: What 1,000 Leads Actually Cost Across 6 Models and 4 Providers

Bring Your Own Key. Easy to say. The real question is: what does it actually cost to run 1,000 leads through the full 11-stage pipeline, and how much does the model choice matter?

We ran the numbers. They are messier than you would expect, and more instructive.

What BYOK means in practice

Paitho does not proxy your model calls through our infrastructure and mark up the tokens. When you connect your own API key — Anthropic, OpenAI, Google, Perplexity, Groq, or OpenRouter — the calls go directly from the pipeline to your provider account. You pay the provider. You see the receipt. We see nothing.

Principle 4 — Your domains, your data, your keys — is not a legal formality. It is the honest path. If we proxied your tokens, we would profit from your volume. We chose not to.

The tradeoff is that our costs and your costs are the same. We have no reason to push you toward expensive models, and you have the data to make your own decision.

This is what that decision looks like.

The pipeline: where tokens actually go

A lead processed through all 11 stages triggers multiple LLM calls. Not every stage calls a model: Stage 07 (Contact Enrichment) and Stage 09 (Human Review) are human-gated. Eight stages (01 through 06, 08, and 11) do call models, with varying complexity.

The stages that consume the most tokens, in order:

Stage 06 — Pitch Brief is the heaviest call. It takes the output of five prior stages (discovery, web audit, social research, qualification, pain signals) and synthesizes a structured brief. Input context can run 4,000–8,000 tokens depending on how much signal the prior stages surfaced. Output is 400–600 tokens of structured brief.

Stage 08 — Email Drafted calls the model against the pitch brief to write the actual email. Input is the brief plus a brand-voice prompt. Output is 150–250 tokens. Shorter call, but it uses the most capable model in the pipeline — this is where quality matters most.

Stage 05 — Pain Signals processes the web audit and social research output to extract signals. This is a classification task and runs efficiently on mid-tier models.

The remaining stages (Discovery, Web Audit, Social Research, Qualification, Funnel Tracking) are lighter calls — structured extraction and scoring, not generation.

The cost table: 6 models, 4 providers, 1,000 leads

The figures below are calculated from actual token counts logged during Paitho v0.9 beta testing, then priced against each provider's current published rates. Token counts are real; prices reflect API list pricing as of Q1 2026. Treat as directional — provider pricing changes.

One full pipeline run averages 12,400 tokens across all stages per lead: approximately 9,800 input and 2,600 output.

Provider	Model Used	Input $/1M tokens	Output $/1M tokens	Cost per lead	Cost per 1,000 leads
Anthropic	Claude 3.5 Sonnet	$3.00	$15.00	$0.082	$82
Anthropic	Claude 3 Haiku	$0.25	$1.25	$0.007	$7
OpenAI	GPT-4o	$5.00	$15.00	$0.088	$88
OpenAI	GPT-4o mini	$0.15	$0.60	$0.004	$4
Google	Gemini 1.5 Pro	$3.50	$10.50	$0.062	$62
Groq	Llama 3.1 70B	$0.59	$0.79	$0.008	$8

Illustrative based on v0.9 beta token logs and Q1 2026 provider pricing. Your actual costs will vary by list composition and signal density.
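The table's arithmetic is tokens times list rate, with input and output priced separately. Below is a minimal Python sketch, assuming plain list pricing with no prompt-caching or batch discounts and using the pooled 9,800 input / 2,600 output per-lead averages; `Rates` and `cost_per_lead` are illustrative names, not part of Paitho. At those averages it reproduces the GPT-4o, Gemini 1.5 Pro, and Llama 3.1 70B rows. The Claude 3.5 Sonnet, Claude 3 Haiku, and GPT-4o mini rows come out lower (about $68, $6, and $3 per 1,000 leads), so the pooled average is a cross-check of the method rather than a way to regenerate every row.

```python
from dataclasses import dataclass

# Pooled per-lead token averages from the v0.9 beta logs.
AVG_INPUT_TOKENS = 9_800
AVG_OUTPUT_TOKENS = 2_600


@dataclass(frozen=True)
class Rates:
    """Provider list prices in USD per 1M tokens (illustrative container)."""
    input_per_m: float
    output_per_m: float


# List prices from the cost table above.
RATES = {
    "Claude 3.5 Sonnet": Rates(3.00, 15.00),
    "Claude 3 Haiku": Rates(0.25, 1.25),
    "GPT-4o": Rates(5.00, 15.00),
    "GPT-4o mini": Rates(0.15, 0.60),
    "Gemini 1.5 Pro": Rates(3.50, 10.50),
    "Llama 3.1 70B (Groq)": Rates(0.59, 0.79),
}


def cost_per_lead(rates: Rates, input_tokens: int = AVG_INPUT_TOKENS,
                  output_tokens: int = AVG_OUTPUT_TOKENS) -> float:
    """USD for one lead: input and output tokens priced separately, then summed."""
    return (input_tokens * rates.input_per_m + output_tokens * rates.output_per_m) / 1_000_000


if __name__ == "__main__":
    for model, rates in RATES.items():
        per_lead = cost_per_lead(rates)
        print(f"{model:<22} ${per_lead:.4f} per lead   ${per_lead * 1_000:,.2f} per 1,000 leads")
```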

The number that matters more than total cost

Cost per lead is the wrong metric. Cost per qualified reply is the right one.

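A $4 pipeline run over 1,000 leads with an expected reply rate of roughly 2-4% yields 20-40 replies, or about $0.10-0.20 in model spend per reply. An $82 pipeline run with an expected reply rate of roughly 17-19% yields 170-190 replies, or about $0.43-0.48 per reply. The cheap model costs less per reply and still comes out more expensive where it counts: roughly 150 fewer replies per 1,000 leads.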

We ran this comparison directly in the Devtools pack with matched lists — same leads, same signals, two model configurations. The gap in reply rate between GPT-4o mini and Claude 3.5 Sonnet was 4.8 percentage points on 800+ sends. At the average deal size in that vertical, the per-reply cost difference was irrelevant. The outcome difference was not.

That is not an argument to always run the expensive model. It is an argument to know which stages the quality difference shows up in.
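Cost per reply is the model spend on a batch divided by the replies that batch produces. A minimal sketch, assuming reply rates are expressed as fractions and counting model spend only (no review labor, no deliverability costs); the function names are illustrative. The inputs are the $4 and $82 per-1,000-lead figures and the reply-rate ranges quoted above.

```python
def replies_per_1000(reply_rate: float) -> float:
    """Expected replies from 1,000 leads; reply_rate is a fraction (0.17 means 17%)."""
    return 1_000 * reply_rate


def cost_per_reply(cost_per_1000_leads: float, reply_rate: float) -> float:
    """Model spend per reply: cost of 1,000 leads divided by the replies they produce."""
    replies = replies_per_1000(reply_rate)
    if replies <= 0:
        raise ValueError("reply_rate must be positive")
    return cost_per_1000_leads / replies


if __name__ == "__main__":
    # (configuration, USD per 1,000 leads, low reply rate, high reply rate)
    scenarios = [
        ("GPT-4o mini", 4.0, 0.02, 0.04),
        ("Claude 3.5 Sonnet", 82.0, 0.17, 0.19),
    ]
    for name, cost, low, high in scenarios:
        best = cost_per_reply(cost, high)   # more replies means cheaper per reply
        worst = cost_per_reply(cost, low)
        print(f"{name}: {replies_per_1000(low):.0f}-{replies_per_1000(high):.0f} replies, "
              f"${best:.2f}-${worst:.2f} model spend per reply")
```

With those inputs the cheaper configuration is cheaper per reply (about $0.10-0.20 versus $0.43-0.48) while producing roughly 150 fewer replies per 1,000 leads, which is why the reply count deserves as much attention as the per-reply spend.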

The split-model approach

Most operators running BYOK do not run one model across all stages. The economical configuration is to use a capable model at Stage 06 and Stage 08, the synthesis and generation stages, and a cheaper model for the extraction, scoring, and classification stages (02, 03, 04, 05, and 11).

Here is the configuration that produced the best cost-per-qualified-reply in our beta testing:

# paitho_model_config.yaml — split-model example

stage_02_web_audit: "gemini-1.5-flash"      # structured extraction
stage_03_social_research: "gemini-1.5-flash" # structured extraction
stage_04_qualification: "gpt-4o-mini"        # scoring task
stage_05_pain_signals: "claude-3-haiku"      # classification
stage_06_pitch_brief: "claude-3.5-sonnet"    # synthesis — quality matters
stage_08_email_draft: "claude-3.5-sonnet"    # generation — quality matters
stage_11_funnel_tracking: "gpt-4o-mini"      # classification

Cost per 1,000 leads in this configuration: approximately $31. Reply rate in Devtools pack testing: roughly 14-16%. Cost per reply: approximately $0.21 ($31 across roughly 150 replies).

For comparison, the all-Sonnet configuration cost an estimated $82-92 per 1,000 leads and produced an expected reply rate of roughly 17-19%. Cost per reply: roughly $0.43-0.54. Wait. That reverses the earlier point.

It does. At ~14-16% vs. ~17-19%, the difference is about 23 replies per 1,000 leads. Moving to all-Sonnet adds $51-61 in model cost per 1,000 leads to get them, or roughly $2-3 per additional reply. Whether that is worth it depends on your deal size and how often a reply becomes a deal. At $5k ACV, about one in 2,000 of those extra replies has to close to cover the extra model spend; at $50k ACV, about one in 20,000.
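The upgrade decision reduces to two numbers: extra model spend per additional reply, and the share of those additional replies that must close at your ACV to repay it. A minimal sketch using the split-model ($31) and all-Sonnet ($82) per-1,000-lead costs and the 23-reply difference; it counts model spend only, and both helpers are illustrative, not Paitho features.

```python
def marginal_cost_per_reply(base_cost: float, upgrade_cost: float,
                            extra_replies: float) -> float:
    """Extra model spend per additional reply when moving from a base config to an upgrade.

    Costs are USD per 1,000 leads; extra_replies is additional replies per 1,000 leads.
    """
    if extra_replies <= 0:
        raise ValueError("the upgrade must produce more replies to be worth pricing")
    return (upgrade_cost - base_cost) / extra_replies


def break_even_close_rate(cost_per_extra_reply: float, acv: float) -> float:
    """Fraction of additional replies that must become deals at `acv` to repay their model spend."""
    return cost_per_extra_reply / acv


if __name__ == "__main__":
    per_reply = marginal_cost_per_reply(base_cost=31.0, upgrade_cost=82.0, extra_replies=23)
    print(f"extra model spend per additional reply: ${per_reply:.2f}")
    for acv in (5_000, 50_000):
        rate = break_even_close_rate(per_reply, acv)
        print(f"ACV ${acv:,}: break-even at {rate:.4%} of extra replies (1 in {1 / rate:,.0f})")
```

With those inputs the extra spend comes to about $2.22 per additional reply; using $92 for the upper end of the all-Sonnet estimate gives about $2.65.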

This is the calculation we want operators to make with their own numbers. The pipeline logs every token. The receipt is always available. You have the data.
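Turning logged calls into a per-stage receipt is a sum of tokens times rate. A minimal sketch, assuming a hypothetical log record of the form `{"stage", "model", "usage"}` where `usage` is the provider's raw usage object: Anthropic's Messages API reports `input_tokens` and `output_tokens`, while OpenAI's Chat Completions API (and OpenAI-compatible endpoints such as Groq's) report `prompt_tokens` and `completion_tokens`. Model keys reuse the aliases from the example YAML, and only models with a price in the cost table are included; `gemini-1.5-flash` would need its own rate. The sample token counts are made up for illustration, not taken from the beta logs.

```python
from __future__ import annotations

from collections import defaultdict

# USD per 1M tokens (input, output), list prices from the cost table.
# Keys reuse the example YAML aliases; add a rate for every model you configure.
PRICES: dict[str, tuple[float, float]] = {
    "claude-3.5-sonnet": (3.00, 15.00),
    "claude-3-haiku": (0.25, 1.25),
    "gpt-4o-mini": (0.15, 0.60),
}


def normalize_usage(usage: dict) -> tuple[int, int]:
    """Return (input_tokens, output_tokens) from a provider usage object."""
    if "input_tokens" in usage:  # Anthropic Messages API shape
        return usage["input_tokens"], usage["output_tokens"]
    return usage["prompt_tokens"], usage["completion_tokens"]  # OpenAI-style shape


def stage_receipt(calls: list[dict]) -> dict[str, float]:
    """Sum USD model spend per stage across logged calls (hypothetical record format)."""
    totals = defaultdict(float)
    for call in calls:
        in_tok, out_tok = normalize_usage(call["usage"])
        in_rate, out_rate = PRICES[call["model"]]  # KeyError if a model has no rate
        totals[call["stage"]] += (in_tok * in_rate + out_tok * out_rate) / 1_000_000
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical token counts for one lead, for illustration only.
    calls = [
        {"stage": "stage_05_pain_signals", "model": "claude-3-haiku",
         "usage": {"input_tokens": 1_500, "output_tokens": 200}},
        {"stage": "stage_06_pitch_brief", "model": "claude-3.5-sonnet",
         "usage": {"input_tokens": 6_000, "output_tokens": 500}},
        {"stage": "stage_11_funnel_tracking", "model": "gpt-4o-mini",
         "usage": {"prompt_tokens": 800, "completion_tokens": 100}},
    ]
    for stage, usd in sorted(stage_receipt(calls).items()):
        print(f"{stage:<26} ${usd:.5f}")
```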

The managed-credit comparison

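If you are on managed credits (our $0.0133-per-credit bundle at the $400 tier), you pay roughly $0.093 per full-pipeline lead (7 credits at $0.0133), or about $93 per 1,000 leads. That is slightly more than the most expensive single-model row in the BYOK table (GPT-4o at $0.088 per lead) and roughly three times the $31-per-1,000 split-model configuration.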
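The managed-credit comparison is one multiplication per side. A minimal sketch using the $0.0133 credit price, 7 credits per lead, and the per-lead BYOK figures from the cost table and the split-model result; it compares the quoted per-lead figures only and does not try to price anything beyond model spend that the credit rate may cover.

```python
CREDIT_PRICE_USD = 0.0133   # per credit at the $400 tier (from the text)
CREDITS_PER_LEAD = 7        # credits for one full-pipeline lead (from the text)

# BYOK model spend per lead: table rows plus the split-model result ($31 per 1,000 leads).
BYOK_PER_LEAD_USD = {
    "GPT-4o": 0.088,
    "Claude 3.5 Sonnet": 0.082,
    "Split-model config": 0.031,
    "GPT-4o mini": 0.004,
}


def managed_cost_per_lead(credit_price: float = CREDIT_PRICE_USD,
                          credits_per_lead: int = CREDITS_PER_LEAD) -> float:
    """Managed-credit spend for one full-pipeline lead."""
    return credit_price * credits_per_lead


if __name__ == "__main__":
    managed = managed_cost_per_lead()
    print(f"managed credits: ${managed:.4f} per lead, ${managed * 1_000:.2f} per 1,000 leads")
    for config, byok in BYOK_PER_LEAD_USD.items():
        print(f"  {managed / byok:4.1f}x the BYOK spend of {config}")
```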

Under managed credits, we select the model configuration. As of v0.9, that configuration is roughly equivalent to the split-model YAML above. The difference: you do not see the per-stage breakdown, you pay per credit rather than per token, and you cannot adjust the model selection.

BYOK is better if you care about control and are willing to manage a provider relationship. Managed credits are better if you want a single line on your invoice and do not want to think about model selection.

Neither is wrong. They are different positions on the control-vs-simplicity trade.

What BYOK does not solve

Domain reputation. You can bring your own keys and still burn your sending domain if you are sending at the wrong volume or to the wrong list. BYOK handles the research and drafting cost. The deliverability question is separate.

Human review time. The fastest operators in our beta triage 200 drafts per hour. At 1,000 leads with a 65% approval rate (about what we see in practice), that is 650 drafts and roughly 3.25 hours of review. That is not a token cost. It is a labor cost. Account for it.
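The 3.25-hour figure is 650 drafts at 200 per hour, which counts the approved drafts. If reviewers also read every draft they reject, the same pace over 1,000 drafts gives 5 hours. A minimal sketch of both bounds; the hourly reviewer cost is a hypothetical placeholder, not a figure from the beta, included only to put labor in the same dollar units as model spend.

```python
def review_hours(drafts: int, drafts_per_hour: float = 200) -> float:
    """Reviewer hours to get through `drafts` at a steady triage pace."""
    if drafts_per_hour <= 0:
        raise ValueError("drafts_per_hour must be positive")
    return drafts / drafts_per_hour


if __name__ == "__main__":
    leads, approval_rate = 1_000, 0.65
    approved = round(leads * approval_rate)  # 650 drafts
    print(f"approved drafts only: {review_hours(approved):.2f} h")
    print(f"every draft read:     {review_hours(leads):.2f} h")
    # HYPOTHETICAL loaded hourly cost of a reviewer; substitute your own.
    hourly_cost = 60.0
    print(f"labor at ${hourly_cost:.0f}/h: ${review_hours(approved) * hourly_cost:.0f}-"
          f"${review_hours(leads) * hourly_cost:.0f}")
```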

List quality. The pipeline is only as good as the list going in. A 1,000-lead list with 40% bad-fit companies will produce 400 disqualified leads at Stage 04, and those leads still consume tokens in Stages 01 through 04 before they are dropped. The economics above assume a reasonably qualified input list.
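The list-quality cost is the bad-fit share times what a lead costs up to Stage 04. The beta figures here do not break out Stage 01-04 spend, so the per-lead value below is a hypothetical placeholder to be replaced with the number from your own token logs; `prequal_waste` is an illustrative helper.

```python
def prequal_waste(leads: int, bad_fit_share: float,
                  prequal_cost_per_lead: float) -> tuple[int, float]:
    """Leads dropped at Stage 04 and the model spend they consumed in Stages 01-04."""
    if not 0.0 <= bad_fit_share <= 1.0:
        raise ValueError("bad_fit_share must be between 0 and 1")
    disqualified = round(leads * bad_fit_share)
    return disqualified, disqualified * prequal_cost_per_lead


if __name__ == "__main__":
    # HYPOTHETICAL per-lead spend for Stages 01-04; read yours from the token logs.
    dropped, wasted = prequal_waste(leads=1_000, bad_fit_share=0.40, prequal_cost_per_lead=0.01)
    print(f"{dropped} leads disqualified, ${wasted:.2f} spent on leads dropped at Stage 04")
```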

Receipts

All cost figures: calculated from actual token logs, Paitho v0.9 beta. Provider pricing: Q1 2026 list rates. Reply rates: beta operator data, Devtools vertical. All illustrative.

Average tokens per full pipeline lead: 12,400 (9,800 input / 2,600 output)
Stage 06 share of total input tokens: 38%
Stage 08 share of total output tokens: 31%
Split-model configuration cost per 1,000 leads: ~$31
All-Sonnet configuration cost per 1,000 leads: ~$82
Reply rate delta, split-model vs. all-Sonnet: 2.4 percentage points (~14-16% vs. ~17-19%)
Extra model spend per additional reply, all-Sonnet vs. split-model: approximately $2.20 ($51 more per 1,000 leads for about 23 more replies)

Closing

Principle 4 — Your domains, your data, your keys — exists because the honest version of this product shows you the receipt. The receipt here is: 1,000 leads costs between $4 and $88 depending on model selection, and cost per qualified reply is the number that determines whether either is worth it.

We push BYOK because it is the honest path. You see what you are spending. You make the call.

— Alex Kim, Engineering
Principle 4 — Your domains, your data, your keys.