
Why We Rejected 4,200 Drafts Last Quarter

By Rosa Marin · GTM Operator · Feb 24, 2026 · 7 min read

Last quarter, Paitho's pipeline generated just under 15,000 email drafts across all active operators. Operators rejected 4,200 of them, about 28%, before a single one went out.

This is not a bug report. This is a product metric we are proud of.


Why a 28% rejection rate is the right number

The Manifesto is direct about this: Principle 2 is "Never ship a template." The corollary is that a human review step with a 0% rejection rate is not a review step. It is a rubber stamp. And a rubber stamp is a faster way to burn your domain than anything you would do intentionally.

A draft gets rejected when a human operator reads it and thinks: I would not send this. That judgment is irreplaceable. We have not tried to replace it. We have tried to make it fast — the keyboard-first triage interface is built around J (reject), E (edit), and Enter (approve) at a sustained rate of 200 drafts per hour — and to learn from it.
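
To give a concrete sense of how little machinery that takes, here is a minimal sketch of a keyboard-driven triage loop in Python. The J, E, and Enter bindings are from this post; Draft, Verdict, read_key, and everything else are hypothetical names, not Paitho's actual code.

```python
# Minimal sketch of a keyboard-first triage loop. Bindings (J/E/Enter) are
# from the post; all names here are hypothetical illustrations.
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    REJECT = "reject"
    EDIT = "edit"
    APPROVE = "approve"


@dataclass
class Draft:
    draft_id: str
    body: str


# One keystroke per decision: no menus, no mouse.
KEYMAP = {"j": Verdict.REJECT, "e": Verdict.EDIT, "\n": Verdict.APPROVE}


def triage(drafts, read_key):
    """Yield (draft, verdict) pairs as fast as the operator can key them."""
    for draft in drafts:
        verdict = None
        while verdict is None:  # unbound keys are ignored; stay on the draft
            verdict = KEYMAP.get(read_key().lower())
        yield draft, verdict
```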

Twenty-eight percent is where we are. We think the right long-run number is somewhere between 15% and 25%. If it falls below 10%, the review step has been operationally bypassed and we need to find out why.


What operators actually reject (categorized)

We added a mandatory one-click rejection reason in Q4 of last year. The data is blunt.

Figures below are from Paitho internal operator data, Q1 2026 — illustrative of real patterns.

Rejection reason                                      Share of rejections
Generic observation (could apply to any company)      34%
Signal cited is stale or no longer accurate           22%
Wrong contact (right company, wrong role)             19%
Tone mismatch with operator's voice                   11%
Factual error in the research summary                  8%
Structural issue (too long, buried lede)               6%

The top reason — "generic observation" — is the one Principle 2 was written for. Thirty-four percent of rejections are because the draft could, in some meaningful sense, have been sent to a different company. The research did not crystallize into a specific enough observation to carry the email.

This is the hardest failure mode to fix because it is the subtlest. The draft is not wrong. It is just not right enough.
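
For readers building something similar: the one-click taxonomy is just an enum plus a record that cannot be written without a reason. A sketch, with the six categories from the table above and hypothetical names for everything else; this is not Paitho's actual schema.

```python
# Sketch of a mandatory one-click rejection taxonomy. The six categories are
# from the post; the enum, record shape, and field names are assumptions.
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum


class RejectionReason(Enum):
    GENERIC_OBSERVATION = "generic observation, could apply to any company"
    STALE_SIGNAL = "signal cited is stale or no longer accurate"
    WRONG_CONTACT = "right company, wrong role"
    TONE_MISMATCH = "tone mismatch with operator's voice"
    FACTUAL_ERROR = "factual error in the research summary"
    STRUCTURAL = "structural issue (too long, buried lede)"


@dataclass
class Rejection:
    draft_id: str
    reason: RejectionReason
    prompt_version: str  # ties the rejection back to the eval set
    rejected_at: datetime


def record_rejection(draft_id: str, reason: RejectionReason,
                     prompt_version: str) -> Rejection:
    # The reason is a required argument: there is no "reject without reason"
    # path, which is what makes a quarterly breakdown like the one above possible.
    return Rejection(draft_id, reason, prompt_version, datetime.now(timezone.utc))
```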


The "two-company test"

We added a heuristic to the human review flow called the two-company test. It is not algorithmic. It is a question the reviewer asks before approving:

Could this email, with only the company name swapped, be sent to a different company and still make sense?

If yes: reject.

This sounds obvious. It is not easy. The model produces fluent, specific-sounding prose. An email can name a competitor, reference an industry trend, and still fail the two-company test because the observation is categorical ("companies in your space often struggle with X") rather than particular ("your last three job postings are all for the same role — that's usually a sign that X").

The distinction between categorical and particular is the one most outbound tools cannot navigate, and it is exactly what the rejection data shows operators catching.


Stale signals: the 22% problem

The second-largest rejection category — stale or inaccurate signals — is a pipeline problem, not a model problem.

Pain signals expire. A competitor funding event is relevant for roughly 8–12 weeks. A job posting is live for an average of 43 days. An engineering blog post about a system being rebuilt signals pain for maybe 90 days, then signals that the problem has probably been addressed.

The pipeline runs the research layer at the time a lead is processed. If that lead then sits in a queue — because the operator is triaging 400 drafts and works through them over two weeks — some signals have aged out by the time the email would go.

We added a signal freshness timestamp in v0.9. Every signal now carries a signal_date and a decay_flag that fires at 30, 60, and 90 days depending on signal type. Drafts built on decayed signals are automatically flagged in the review queue with a yellow indicator. Operators can still approve them; they just see the flag.
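
The flag itself is a small computation. A sketch, assuming a hypothetical type-to-window mapping: the 30/60/90-day windows and the signal_date field are from this post, but which signal type gets which window is our illustration, not Paitho's actual table.

```python
# Sketch of the decay flag described above. The 30/60/90-day windows and
# signal_date are from the post; the mapping and names are assumptions.
from datetime import date

# Hypothetical mapping of signal type to decay window, in days.
DECAY_WINDOW_DAYS = {
    "job_posting": 30,       # postings are live ~43 days on average; flag early
    "funding_event": 60,     # relevant for roughly 8-12 weeks
    "engineering_blog": 90,  # signals pain for maybe 90 days
}


def decay_flag(signal_type: str, signal_date: date, today: date) -> bool:
    """Return True when a signal has aged past its window and the draft
    should get the yellow indicator in the review queue."""
    window = DECAY_WINDOW_DAYS.get(signal_type, 30)  # default to strictest
    return (today - signal_date).days > window
```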

Rejection rate for flagged drafts: 61%. For unflagged: 18%. The flag is doing its job.


Wrong contact: a qualification problem upstream

Nineteen percent of rejections are for wrong contact. Right company, wrong person.

This one is a qualification failure that manifests in the review queue. If Stage 04 (Qualification) correctly identifies the company but Stage 07 (Contact Enrichment) surfaces the wrong role, an operator who knows their ICP will catch it in review.

The failure mode is most common in companies with ambiguous org structures — Series A companies where the CTO is also the de facto VP of Engineering and the VP of Engineering also runs part of Product. The model makes a reasonable choice. The operator knows the company and knows it is wrong.

We do not want to automate this correction. We want the human to catch it, because the human's judgment about which role to target is upstream of the research: it is a strategic call about the campaign, not a data retrieval problem. Contact Enrichment is human-gated by design (Stage 07 is marked HUMAN in the pipeline), but the role selection that precedes it happens in the Qualification stage, where the model sometimes makes the call and sometimes gets it wrong.


The 8% that embarrasses us

Factual errors account for 8% of rejections. That is 337 drafts last quarter with something objectively incorrect in the research summary.

Revenue or employee count figures 12–24 months stale account for most of it. Competitor attributions come next — the model cites a product launch by Company B and attributes it to Company A. Leadership names that have turned over since the source was published round it out.

Every one of these should have been caught in Stage 02 (Web Audit) or Stage 05 (Pain Signals). When they are not, it is usually a retrieval failure — the source found by the model is older than a more recent source that would have corrected it.

We publish this number because Principle 3 ("Humans review. Every time.") exists specifically to catch this failure mode before it goes out. An auto-send system would have sent 337 emails with wrong facts last quarter. We did not. The review step is not a formality.


What the rejection data teaches the model

Every rejected draft, with its rejection reason, goes back into the eval set for the relevant prompt version. This is the feedback loop the Manifesto mentions — "Next week's pitches are better than this week's."

The mechanism is blunt, but it works: when a given rejection reason exceeds a threshold share of rejections within a rolling 30-day window, it triggers a prompt review. In Q1, the "generic observation" category exceeded the threshold twice, triggering two prompt revisions. Both revisions measurably reduced the generic-observation rejection rate over the following 30 days: from ~37% to ~31% after the first, and from ~31% to ~28% after the second.

Slowly. But measurably.
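
The trigger behind those revisions can be equally small. A sketch, with an illustrative threshold value and hypothetical names throughout; the rolling 30-day window is from this post, the exact threshold is not published here.

```python
# Sketch of the threshold trigger described above. The rolling 30-day window
# is from the post; the 30% threshold and all names are assumptions.
from datetime import date, timedelta

ROLLING_WINDOW = timedelta(days=30)
REVIEW_THRESHOLD = 0.30  # share of rejections; illustrative, not Paitho's value


def needs_prompt_review(rejections, reason: str, today: date) -> bool:
    """rejections: iterable of (rejected_on: date, reason: str) pairs.

    Fire a prompt review when `reason` accounts for more than the threshold
    share of all rejections inside the rolling window."""
    window = [r for d, r in rejections if today - d <= ROLLING_WINDOW]
    if not window:
        return False
    share = sum(1 for r in window if r == reason) / len(window)
    return share > REVIEW_THRESHOLD
```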


What we are not doing

We are not training an auto-approval model. We have looked at this. The idea is tempting: predict which drafts operators will approve and skip the review on those.

The problem is that the value of human review is not just catching bad drafts. It is keeping the operator's judgment attached to what goes out under their name. An operator who reviews 200 drafts an hour is an operator who knows what their pipeline is producing. They catch things we did not design the model to catch. They update their intuition in ways that make the whole system better.

Taking the operator out of that loop — even for the "obviously good" drafts — degrades the feedback quality that makes next week better than this week.

That is Principle 3, applied to a product decision.


Receipts

All figures: Paitho internal, Q1 2026. Illustrative of real operator behavior.


Closing

A 28% rejection rate means the system is working. The review layer has teeth. The feedback loop is real.

Principle 2 — Never ship a template — is easy to put on a manifesto. It is harder to operationalize. Operationalizing it means building a review interface that is fast enough to be real, a rejection taxonomy that is specific enough to learn from, and a team that is honest enough to publish the number.

The 4,200 drafts that did not go out are not failures. They are the product working correctly.

