FACT-CHECKING SYSTEMS

What gets verified, how it's verified, and what "automatically checked" actually means — including its limits.

01 — External Source Verification

Seven Academic APIs, Running Continuously

Source verification runs against external databases — not just our own judgment. Every DOI, ISBN, and named scholar is cross-checked against real academic infrastructure.

4,318 Docs CrossRef Verified
61,304 Entries Checked
70.1% DOI Match Rate
33 Retraction Signals
Source | What It Checks | Type
CrossRef (147M+ academic records) | DOI validity, title/year match, retraction signals, bibliographic metadata | Live API
Retraction Watch (database of retracted papers) | Whether cited papers have been formally retracted or issued corrections | Live API
Open Library (ISBN validation) | Book existence, title/author/publisher match for cited books | Live API
Semantic Scholar (citation graph) | Paper existence, citation counts, venue; additional DOI cross-check | Live API
PubMed / NCBI (biomedical literature) | Biomedical and life-science paper validation | Live API
Wikipedia (entity existence) | Named scholars, institutions, and entities exist and match description | Live API
OpenAlex (author graph) | Author identity, publication record, institutional affiliation | Live API

Honest note: A 70.1% CrossRef match rate means ~30% of bibliography entries could not be automatically verified via DOI. Many of these are correctly cited books without DOIs, older papers, or grey literature — not necessarily errors. Unverified entries reduce a document's source confidence score. They do not trigger automatic removal, but they are flagged for review.

02 — Eight Quality Dimensions

Every Document Scored on 8 Dimensions

The quality pipeline scores each document automatically. Scores are aggregated to a 0–100 quality index. Current corpus average: 88.45/100 (A-).

Structural Compliance
15 pts weight · Current: 100%

Document follows the canonical template: all required sections present, header metadata complete, bibliography minimum met.

Source Attribution
15 pts weight · Current: 85%

Claims are tied to specific bibliography entries. Named scholars appear in bold. Institutional attribution present where applicable.

Evidence Depth
15 pts weight · Current: 85.5%

Evidence is specific: exact dates, measurements with units, artifact IDs, excavation site references. Vague "studies show" language penalized.

Counter-Argument Rigor
15 pts weight · Current: 70%

Counter-arguments are real, published objections — not strawmen. Strongest available challenge to each claim is presented fairly.

Bibliography Quality
15 pts weight · Current: 94.5%

Citations are in canonical format. DOIs and ISBNs present where available. Mix of source types (peer-reviewed, academic books, primary texts).

Cross-Referencing
10 pts weight · Current: 87%

Documents link to related docs across the corpus. Cross-reference index present. Connections are genuine and described, not just listed.

Factual Traceability
10 pts weight · Current: 81.8%

Each factual claim can be traced to a specific citation. No floating facts without source anchoring.

Content Completeness
5 pts weight · Current: 99%

All required content sections contain substantive, topic-specific content. No placeholder text or boilerplate.
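
As an illustration of how eight dimension scores could roll up into one 0–100 index, here is a minimal sketch. The weights are the point values listed above (they sum to 100). The linear weighted sum, the function name `quality_index`, and the example document are assumptions for illustration, not a description of the pipeline's actual formula.

```python
# Point weights as listed for each quality dimension (they sum to 100).
QUALITY_WEIGHTS = {
    "structural_compliance": 15,
    "source_attribution": 15,
    "evidence_depth": 15,
    "counter_argument_rigor": 15,
    "bibliography_quality": 15,
    "cross_referencing": 10,
    "factual_traceability": 10,
    "content_completeness": 5,
}


def quality_index(scores):
    """Combine per-dimension scores (each 0.0 to 1.0) into a 0-100 index.

    Assumption: a plain weighted sum; the real aggregation may differ.
    """
    missing = set(QUALITY_WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    total = sum(QUALITY_WEIGHTS[name] * scores[name] for name in QUALITY_WEIGHTS)
    return round(total, 2)


# Hypothetical document: full marks everywhere except counter-arguments.
example = {name: 1.0 for name in QUALITY_WEIGHTS}
example["counter_argument_rigor"] = 0.6
print(quality_index(example))  # 94.0
```

Under this assumption, a weak score on a 15-point dimension costs three times as much as the same weakness on the 5-point Content Completeness dimension.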

03 — Seven Factuality Dimensions

Separate Factuality Scan, Same Documents

The factuality scanner runs independently from the quality scorecard. It targets verifiability rather than structural completeness. Current corpus average: 75.29/100 (B).

Tier Weight
25 pts weight

Documents are weighted by their primary evidence tier. Tier 1-dominant documents score higher than Tier 3-dominant ones.

Bibliography Breadth
20 pts weight

How many distinct sources are cited. Documents with thin bibliographies (fewer than 10–15 entries) are penalized.

Bibliography Density
15 pts weight

Ratio of citations to document length. Longer documents with proportionally fewer citations score lower.

Verifiable Sources
10 pts weight

Proportion of bibliography entries with a verified DOI or ISBN. Higher verification = higher score.

Vague Sourcing
10 pts weight

Penalty for phrases like "studies show" or "researchers believe" without specific attribution. The corpus currently scores 98% on this dimension, meaning vague sourcing is rare.
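
A vague-sourcing check of this kind can be approximated with a phrase scan. The sketch below is illustrative only: the first two patterns come from the description above, the other two are hypothetical, and scoring by the share of unflagged sentences is an assumption. It also does not check whether a specific attribution follows the phrase, which a real scanner would need to do.

```python
import re

# First two phrases are quoted in the text; the rest are hypothetical additions.
VAGUE_PATTERNS = [
    r"\bstudies show\b",
    r"\bresearchers believe\b",
    r"\bexperts say\b",
    r"\bit is widely known\b",
]
VAGUE_RE = re.compile("|".join(VAGUE_PATTERNS), re.IGNORECASE)


def vague_sourcing_score(text):
    """Return the share of sentences free of vague-sourcing phrases (0.0 to 1.0)."""
    # Naive sentence split on terminal punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if not sentences:
        return 1.0
    flagged = sum(1 for s in sentences if VAGUE_RE.search(s))
    return 1 - flagged / len(sentences)


sample = (
    "Studies show the site is old. "
    "Smith (2019) dated the hearth to 9500 BCE. "
    "The wall is 2.1 m high. "
    "Researchers believe it was a temple."
)
print(vague_sourcing_score(sample))  # 0.5
```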

Counter-Argument Rigor
10 pts weight

Uses the same criteria as the quality scorecard's Counter-Argument Rigor dimension, but weighted differently: factuality scoring penalizes weak counter-argument sections more heavily in documents that make strong positive claims.

Evidence Specificity
10 pts weight

Claims are supported by specific, named evidence rather than general descriptions. Publication years, scholar names, and measurements required.

04 — The Automated Pipeline

Quality Checks That Run Automatically

The pipeline runs on the full corpus whenever documents are added or updated. Ten quality steps plus ten infrastructure steps — each with defined inputs, outputs, and failure modes.

Q0
Structural Validation

Checks every document for required sections, header format compliance, bibliography minimum. 0 errors maintained as a hard standard.
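
A structural check along these lines could look like the sketch below. The section names, header fields, document shape, and the minimum of 10 bibliography entries are all hypothetical placeholders, since the canonical template is not reproduced on this page.

```python
# Hypothetical template details; the real canonical template is not shown here.
REQUIRED_SECTIONS = ["Summary", "Evidence", "Counter-Arguments", "Bibliography"]
REQUIRED_HEADER_FIELDS = ("title", "date")
MIN_BIBLIOGRAPHY_ENTRIES = 10  # assumed minimum


def validate_structure(doc):
    """Return a list of structural errors; an empty list means the document passes."""
    errors = []
    sections = doc.get("sections", {})
    for name in REQUIRED_SECTIONS:
        if not sections.get(name, "").strip():
            errors.append(f"missing or empty section: {name}")
    header = doc.get("header", {})
    for field in REQUIRED_HEADER_FIELDS:
        if not header.get(field):
            errors.append(f"missing header field: {field}")
    if len(doc.get("bibliography", [])) < MIN_BIBLIOGRAPHY_ENTRIES:
        errors.append("bibliography below minimum")
    return errors
```

Returning a list of errors rather than a boolean makes the "0 errors" standard directly countable across the corpus.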

Q1
DOI Enrichment

Adds DOIs to bibliography entries via CrossRef API where missing. Rate-limited to respect CrossRef's infrastructure.
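
Rate-limited enrichment can be sketched as follows. The URL uses CrossRef's public `/works` endpoint with its `query.bibliographic`, `rows`, and `mailto` parameters. Everything else is an assumption for illustration: the `RateLimiter` class, the entry shape, the placeholder email address, and the `lookup` callable, which stands in for whatever code performs the HTTP request and returns a DOI or `None`.

```python
import time
from urllib.parse import urlencode

CROSSREF_WORKS_URL = "https://api.crossref.org/works"


def build_lookup_url(citation, mailto):
    """Build a CrossRef bibliographic query URL for a free-text citation."""
    params = {"query.bibliographic": citation, "rows": 1, "mailto": mailto}
    return f"{CROSSREF_WORKS_URL}?{urlencode(params)}"


class RateLimiter:
    """Enforce a minimum interval (in seconds) between calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_call = None

    def wait(self):
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now


def enrich_missing_dois(entries, lookup, limiter, mailto="editor@example.org"):
    """Fill in 'doi' on entries that lack one, calling lookup(url) at a limited rate."""
    for entry in entries:
        if entry.get("doi"):
            continue  # already has a DOI; no request needed
        limiter.wait()
        doi = lookup(build_lookup_url(entry["citation"], mailto))
        if doi:
            entry["doi"] = doi
    return entries
```

Injecting `clock` and `sleep` keeps the limiter testable without real delays; in production the defaults would apply.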

Q2
Weighted Source Ratings

Classifies each bibliography entry by source type (journal/book/other), calculates weighted score, assigns [N/5] confidence rating.
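
One way such a rating could be derived is sketched below. The type weights, the classification heuristic (which will misclassify books that carry DOIs), and the choice to rate a whole bibliography rather than each entry are all assumptions; the pipeline's real values are not published on this page.

```python
# Hypothetical weights per source type.
TYPE_WEIGHTS = {"journal": 1.0, "book": 0.8, "other": 0.4}


def classify(entry):
    """Roughly classify a bibliography entry as journal, book, or other."""
    if entry.get("doi") or entry.get("journal"):
        return "journal"
    if entry.get("isbn"):
        return "book"
    return "other"


def confidence_rating(entries):
    """Average the type weights and map the result onto a [N/5] rating."""
    if not entries:
        return "[0/5]"
    mean_weight = sum(TYPE_WEIGHTS[classify(e)] for e in entries) / len(entries)
    return f"[{round(mean_weight * 5)}/5]"


sample = [
    {"doi": "10.1000/example"},    # placeholder DOI
    {"isbn": "978-0-00-000000-0"},  # placeholder ISBN
    {"title": "Unpublished field notes"},
]
print(confidence_rating(sample))  # [4/5]
```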

Q3
CrossRef Batch Verify

Submits all DOIs to CrossRef for validation. Flags mismatches between cited metadata and CrossRef records. Checks for retraction signals.
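
The metadata comparison can be sketched offline. The `record` argument below is assumed to be the `message` object from a CrossRef `/works/{doi}` response, where `title` is a list of strings and `issued["date-parts"]` is a nested list starting with the year. The function name, flag strings, and the 0.9 similarity threshold are hypothetical, and retraction checks are left out of this sketch.

```python
from difflib import SequenceMatcher


def normalize(title):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(title.lower().split())


def check_against_crossref(cited, record, min_similarity=0.9):
    """Compare a cited entry with a CrossRef record; return a list of mismatch flags."""
    flags = []
    titles = record.get("title") or [""]
    similarity = SequenceMatcher(
        None, normalize(cited["title"]), normalize(titles[0])
    ).ratio()
    if similarity < min_similarity:
        flags.append("title_mismatch")
    date_parts = (record.get("issued") or {}).get("date-parts") or [[None]]
    first = date_parts[0]
    record_year = first[0] if first else None
    if record_year is not None and record_year != cited["year"]:
        flags.append("year_mismatch")
    return flags
```

A fuzzy title comparison is used here because cited titles often differ from the registered ones in capitalization or spacing; the threshold would need tuning against real data.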

Q4–Q5
Quality Scorecard + Factuality Scan

Scores all 3,632 documents on the 8 quality dimensions (Q4) and 7 factuality dimensions (Q5). Outputs per-document JSON and corpus aggregates.

Q7–Q9
Consistency, Formula, and Date Validators

Q7: checks claim consistency across documents. Q8: validates mathematical formulas against the formula reference. Q9: checks date plausibility and cross-document date consistency.
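
To make the Q9 idea concrete, here is a minimal sketch of a date check. It assumes date claims have already been extracted as `(doc_id, event, year)` tuples, which is the hard part and is not shown. The tuple shape, function name, and message wording are hypothetical.

```python
from collections import defaultdict
from datetime import date


def find_date_issues(claims, current_year=None):
    """Flag future-dated claims and events dated differently across documents.

    claims: iterable of (doc_id, event, year) tuples; negative years mean BCE.
    """
    if current_year is None:
        current_year = date.today().year
    issues = []
    years_by_event = defaultdict(set)
    for doc_id, event, year in claims:
        if year > current_year:  # plausibility: nothing cited can be in the future
            issues.append(f"{doc_id}: '{event}' dated in the future ({year})")
        years_by_event[event].add(year)
    for event, years in sorted(years_by_event.items()):
        if len(years) > 1:  # consistency: same event, different years
            issues.append(f"'{event}' has conflicting years: {sorted(years)}")
    return issues
```

For example, if one document dates the Battle of Hastings to 1066 and another to 1067, the second loop reports the conflict without deciding which document is wrong; that decision stays with a reviewer.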

What the pipeline cannot catch: a well-formatted citation to a real paper that doesn't actually support the claim made in the document. Pipeline checks verify that a source exists and that its metadata matches the citation, not that the claim accurately represents what the source says. That requires human review, which we do on a rotating basis for high-stakes claims.