Course → Module 11: Quality Control & The Human Gate
Session 2 of 7

Manual Fact-Checking Does Not Scale

In Session 11.1, you learned where hallucination concentrates. Now the question is: how do you check claims at production volume? If you publish 10 articles per week and each contains 15 verifiable claims, that is 150 claims to check. Doing this manually, one Google search at a time, takes hours. API-assisted fact-checking reduces that to minutes.

The key distinction: this is not automated fact-checking. No system reliably verifies truth autonomously. This is automated flagging, a process that searches for evidence, compares it against claims, and directs human attention to the items that need it most.

API-Assisted Fact-Checking: A workflow that uses search APIs to gather evidence for verifiable claims, then flags discrepancies for human review. The API handles the searching. The human handles the judgment.

The Four-Stage Fact-Check Pipeline

Every fact-checking workflow follows the same structure, regardless of which APIs or tools you use.

```mermaid
flowchart LR
    A[AI Output] --> B[Extract Claims]
    B --> C[Search Each Claim]
    C --> D[Compare & Flag]
    D --> E[Human Review]
    B -.->|"Manual or AI-assisted"| B1["List of verifiable claims<br>with category labels"]
    C -.->|"Tavily / Google API"| C1["Top 3-5 results per claim"]
    D -.->|"Match / Mismatch / No data"| D1["Flagged report"]
    E -.->|"Approve / Correct / Remove"| E1["Verified content"]
    style A fill:#8a8478,color:#ede9e3
    style B fill:#c8a882,color:#111
    style C fill:#c8a882,color:#111
    style D fill:#c8a882,color:#111
    style E fill:#6b8f71,color:#111
```

Stage 1: Extract Claims

Pull every verifiable claim from the AI output. You can do this manually for small batches or use a second AI call with a prompt like: "List every factual claim in this text that could be verified with a search engine. Include the exact claim text and its category (statistic, date, attribution, source, technical fact)."

The output should be a structured list. Not prose. A table or JSON array that your next step can process.
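As a minimal sketch of that structure, the extraction step might return a JSON array like the one below. The field names (`claim`, `category`) are illustrative, not a fixed schema; a small parsing step guards against malformed entries before the claims move to the search stage.

```python
import json

# Hypothetical example of the structured output Stage 1 should produce:
# one object per verifiable claim, with the exact claim text and a category.
EXTRACTION_OUTPUT = """
[
  {"claim": "73% of marketers report increased ROI", "category": "statistic"},
  {"claim": "According to McKinsey (2024)...", "category": "attribution"}
]
"""

def parse_claims(raw: str) -> list[dict]:
    """Parse the extraction model's JSON output, keeping only
    entries that have both a claim text and a category label."""
    claims = json.loads(raw)
    return [c for c in claims if "claim" in c and "category" in c]

claims = parse_claims(EXTRACTION_OUTPUT)
```

Filtering out malformed entries here keeps a single bad extraction from breaking the rest of the pipeline.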

Stage 2: Search Each Claim

For each extracted claim, run a search query. Tavily is purpose-built for this: its API returns structured results optimized for AI consumption, with relevant snippets pre-extracted. Google Search API works too, but requires more parsing.

The search query is usually the claim itself, rephrased when necessary. Strip out the specific value you are trying to verify so the results are not biased toward pages that merely echo it: "73% of marketers report increased ROI" becomes the search query "percentage of marketers reporting increased ROI."
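Stage 2 can be sketched against Tavily's REST search endpoint. The URL and field names below follow Tavily's public API but should be checked against the current documentation before you rely on them; the API key is a placeholder.

```python
import json
import urllib.request

TAVILY_ENDPOINT = "https://api.tavily.com/search"

def build_search_payload(claim: str, api_key: str, max_results: int = 5) -> dict:
    """Build the request body for one claim. The claim itself
    (rephrased if needed) serves as the search query."""
    return {
        "api_key": api_key,      # placeholder; supply your own key
        "query": claim,
        "max_results": max_results,
    }

def search_claim(claim: str, api_key: str) -> list[dict]:
    """POST one claim to Tavily and return its result snippets."""
    data = json.dumps(build_search_payload(claim, api_key)).encode()
    req = urllib.request.Request(
        TAVILY_ENDPOINT, data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp).get("results", [])
```

Each result in Tavily's response carries an extracted content snippet, which is what makes the comparison stage in the next section cheap to run.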

Stage 3: Compare and Flag

For each claim, compare the AI's statement against the search results. Three possible outcomes:

| Outcome | Meaning | Action |
|---|---|---|
| Match | Search results confirm the claim | Approve. Low risk. |
| Mismatch | Search results contradict the claim | Flag for correction. Include the contradicting source. |
| No data | Search returns no relevant results | Flag for manual review. The claim may be fabricated entirely. |

The "no data" outcome is often the most dangerous. When a search for a specific statistic or source returns nothing, the most likely explanation is that the AI invented it.
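The three outcomes can be sketched as a single flagging function. The token-overlap check and the 0.5 threshold below are crude stand-ins for a real comparison (in practice you would pass the claim and snippets to an LLM comparison prompt); the "no data" branch is the one worth copying exactly.

```python
def flag_claim(claim: str, snippets: list[str]) -> str:
    """Assign one of the three verdicts to a claim given its search snippets.
    Empty results -> 'no data'; otherwise a naive token-overlap heuristic
    stands in for an LLM-based comparison."""
    if not snippets:
        return "no data"  # often the most dangerous outcome: likely fabricated
    claim_tokens = set(claim.lower().split())
    best_overlap = max(
        len(claim_tokens & set(s.lower().split())) / len(claim_tokens)
        for s in snippets
    )
    # 0.5 is an arbitrary illustrative threshold, not a tuned value
    return "match" if best_overlap >= 0.5 else "mismatch"
```

Whatever comparison you substitute, keep the hard rule that empty search results are never silently approved.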

Stage 4: Human Review

The flagged report goes to a human reviewer who makes final decisions. For matches, a quick scan is sufficient. For mismatches, the reviewer corrects the claim using the contradicting source. For no-data flags, the reviewer either finds the information through deeper research or removes the claim entirely.

Tavily in the Workflow

Tavily's search API is designed for exactly this use case. Unlike a standard web search that returns page titles and URLs, Tavily returns extracted content snippets that an AI model (or a human) can compare directly against claims. The workflow becomes:

  1. Extract claim from AI output
  2. Send claim as query to Tavily API
  3. Receive structured results with relevant text excerpts
  4. Pass claim + excerpts to a comparison prompt (or review manually)
  5. Record the verdict in your verification report
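The five steps above reduce to a short loop. This sketch takes the search and comparison steps as injected functions so it can be exercised with stand-ins; a real run would pass a Tavily-backed search function and an LLM-backed comparison.

```python
def run_pipeline(claims: list[dict], search_fn, compare_fn) -> list[dict]:
    """Steps 2-5: search each extracted claim, compare claim against
    snippets, and record the verdict alongside the claim."""
    report = []
    for entry in claims:
        results = search_fn(entry["claim"])                  # steps 2-3
        snippets = [r.get("content", "") for r in results]
        verdict = compare_fn(entry["claim"], snippets)       # step 4
        report.append({**entry, "verdict": verdict})         # step 5
    return report

# Stand-ins for illustration only: the fake search "confirms" every claim.
fake_search = lambda q: [{"content": q}]
exact_match = lambda claim, snippets: "match" if claim in snippets else "mismatch"

report = run_pipeline(
    [{"claim": "Founded in 2010", "category": "fact"}],
    fake_search, exact_match,
)
```

Keeping search and comparison pluggable also lets you swap Tavily for another search API without touching the loop.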

A single Tavily API call costs a fraction of a cent. Checking 150 claims per week costs less than a dollar. The economics of API-assisted fact-checking are not the bottleneck. The bottleneck is building the workflow and running it consistently.

The Verification Report

Your pipeline should produce a structured report for each piece of content. This report is your audit trail, your proof of due diligence, and your training data for improving the pipeline over time.

| Claim | Category | Search Result | Verdict | Action Taken |
|---|---|---|---|---|
| "Market reached $4.2B in 2025" | Statistic | Multiple sources confirm $4.1B | Minor mismatch | Corrected to $4.1B |
| "According to McKinsey (2024)..." | Citation | Report exists, but from 2023 | Date mismatch | Corrected year |
| "CEO John Smith stated..." | Attribution | No matching quote found | No data | Quote removed |
| "Founded in San Francisco" | Fact | Confirmed by company website | Match | Approved |
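A report like the one above is easy to persist as CSV, which makes a workable audit trail. This is one possible serialization, using the table's column names; the row shown is taken from the table above.

```python
import csv
import io

COLUMNS = ["Claim", "Category", "Search Result", "Verdict", "Action Taken"]

def write_report(rows: list[dict]) -> str:
    """Render verification rows as CSV text, one row per checked claim."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

report_csv = write_report([{
    "Claim": "Founded in San Francisco",
    "Category": "Fact",
    "Search Result": "Confirmed by company website",
    "Verdict": "Match",
    "Action Taken": "Approved",
}])
```

One file per piece of content, dated and kept alongside the published article, is enough to demonstrate due diligence later.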

Limitations You Must Accept

API-assisted fact-checking catches fabricated sources, wrong numbers, and misattributed quotes. It does not catch subtle misrepresentations, out-of-context claims, or claims that are technically true but misleading. Those require human judgment that no search API can replicate.

The goal is not perfection. The goal is catching the 80% of hallucinations that are straightforward verification failures, so your human reviewers can spend their time on the 20% that require actual thinking.

Assignment

Build a fact-checking pipeline for one piece of AI-generated content. Extract all verifiable claims (manually is fine for now). Search each claim using Tavily or any search tool. Produce a verification report with columns: Claim, Category, Search Result, Verdict, Action Taken. How many claims did you flag? How many flags were legitimate issues? What was the false positive rate of your flagging process?