Two Scoring Modes
AI (default)
Gemini 2.5 Flash reads each full article and a sample of its citations, then returns structured 0–100 scores for objectivity, formality, confidence, bias, sentiment, and citation quality, plus a one-sentence rationale per article. The same call produces the structured difference summary (factual disagreements, framing differences, coverage gaps, sourcing notes, and a verdict).
Heuristics (fallback)
A fast word-list scan that runs entirely on the server with no model call. It counts loaded words, hedging language, citation domains, and source-type mixes. It is used as the fallback when AI scoring is unavailable, and is available as a toggle if you want to see the raw word-list view.
The AI scores are intentionally allowed to diverge from the heuristic scores. A heuristic "low bias" score driven by neutral vocabulary can mask one-sided framing; a heuristic "high citation score" driven by tier-1 domain matches can mask citations that don't actually support the claim next to them. Gemini is asked to weigh those things.
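The word-list scan described under Heuristics can be illustrated with a minimal sketch. The word lists, the function name `scan_text`, and the output keys are all hypothetical; the actual lists and server code are not given here, and source-type classification is left out.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Hypothetical word lists for illustration only; the real lists are not documented here.
LOADED_WORDS = {"stunning", "disastrous", "punished"}
HEDGES = {"may", "might", "possibly", "reportedly"}


def scan_text(text: str, citation_urls: list[str]) -> dict:
    """Count loaded words, hedges, and citation domains for one article."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    # Tally how many citations point at each host name.
    domains = Counter(urlparse(url).netloc.lower() for url in citation_urls)
    return {
        "words": len(tokens),
        "loaded": sum(counts[w] for w in LOADED_WORDS),
        "hedges": sum(counts[w] for w in HEDGES),
        "domains": dict(domains),
    }


result = scan_text(
    "The disastrous policy may have punished ordinary families.",
    [
        "https://www.who.int/report",
        "https://example.com/a",
        "https://example.com/b",
    ],
)
```

A scan like this sees only vocabulary and domains, which is why it can miss one-sided framing written in neutral words.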
Tone Metrics
Tone metrics describe how the article reads. They are not a fact-check.
Objectivity
How neutral and evidence-led the prose reads rather than promotional, accusatory, or opinionated.
Higher objectivity: "The company reported revenue growth in 2024."
Lower objectivity: "The company achieved stunning revenue growth in 2024."
The word "stunning" adds judgment instead of evidence.
Bias Level
How much the prose tries to frame the reader toward a positive or negative interpretation.
Lower bias: "Critics said the policy increased costs."
Higher bias: "The disastrous policy punished ordinary families."
"Disastrous" and "punished" are stronger emotional framing signals.
Formality
Whether the text reads like formal reference writing rather than casual commentary.
Higher formality: "The proposal was subsequently revised."
Lower formality: "They later tweaked the plan."
"Tweaked" is conversational; "subsequently revised" reads more like reference prose.
Confidence
How definite or cautious the article sounds when making claims.
Higher confidence: "The system reduced latency."
Lower confidence: "The system may have reduced latency."
"May have" is a hedge. Hedging can be accurate, but it lowers the confidence score.
Sentiment
The emotional direction of the language, from negative to positive.
Positive language: "successful", "praised", "improved"
Negative language: "failed", "controversial", "criticized"
The score balances positive and negative emotional language across the text.
Citation Quality
Citation quality is a single 0–100 score per article. In AI mode it reflects Gemini's judgment of the sourcing after reviewing the citation sample. In Heuristics mode it is a composite of source authority, source diversity, high-quality source ratio, and citation volume.
Source Credibility
Whether the cited sources look credible for the subject matter — academic, government, and established institutional sources score higher than blogs, forums, social posts, or weakly identified pages.
Scores higher: A WHO report or peer-reviewed journal.
Scores lower: An unsourced personal blog post.
Claim Support
Whether the citations actually support the claims they are attached to, rather than just decorating them.
Scores higher: Each major claim has a citation pointing to a source that backs it.
Scores lower: Citations exist but link to sources that do not address the claim.
Source Diversity
Whether the article draws on different kinds of sources or leans on one source family.
Scores higher: Academic research, official records, and reputable reporting.
Scores lower: Ten links to the same news site.
Citation Volume
Citation density relative to article length.
Scores higher: A long article with citations throughout its major claims.
Scores lower: A 10,000-word article with only three citations.
Heuristic Score Formula
When the Heuristics mode is active, the citation score uses this weighting:
- 40% average domain score
- 20% source diversity
- 20% high-quality source ratio
- 20% citation volume relative to article length
In AI mode the score is a single value returned by Gemini after reviewing the citation sample, not a weighted formula.
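The Heuristics-mode weighting can be written out as a short sketch. The function and parameter names are hypothetical, and it assumes each of the four components has already been scaled to 0–100; how the components themselves are computed is not specified here.

```python
def heuristic_citation_score(
    avg_domain: float,
    diversity: float,
    high_quality_ratio: float,
    volume: float,
) -> float:
    """Combine four 0-100 component scores using the 40/20/20/20 weighting."""
    for value in (avg_domain, diversity, high_quality_ratio, volume):
        if not 0 <= value <= 100:
            raise ValueError("each component is assumed to be on a 0-100 scale")
    # Integer weights out of 100 keep the arithmetic easy to check by hand.
    weighted = (
        40 * avg_domain
        + 20 * diversity
        + 20 * high_quality_ratio
        + 20 * volume
    )
    return weighted / 100


# Worked example: (3200 + 1000 + 1200 + 800) / 100
example = heuristic_citation_score(
    avg_domain=80, diversity=50, high_quality_ratio=60, volume=40
)
```

Because the average domain score carries twice the weight of any other component, an article citing a handful of strong domains can outscore one with many varied but weaker sources.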
Divergence Score
A single 0–100 score that estimates how far apart the two articles are overall. Higher means more divergent. It is shown in the condensed summary header at the top of every comparison.
AI mode. Gemini produces the score directly as part of the difference summary, considering factual disagreements, framing, coverage gaps, and sourcing together, with a short rationale.
Heuristics mode. A weighted average of four signals, each scored independently on a 0–100 scale:
- 50% text divergence: 100 minus the bigram-Jaccard similarity of the two article texts, with the similarity expressed as a percentage.
- 20% section divergence: 100 minus the Jaccard overlap of normalized section titles, also expressed as a percentage.
- 15% length divergence: the absolute word-count difference as a percentage of the longer article's word count.
- 15% tone divergence: the average absolute difference across objectivity, formality, confidence, and bias.
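The four-signal composite above can be sketched as follows. This is an illustration under stated assumptions, not the server implementation: tokenization, title normalization, and the helper names are hypothetical, and every signal is scaled to 0–100 before weighting.

```python
import re


def bigrams(text: str) -> set:
    """Set of adjacent lowercase word pairs in the text."""
    words = re.findall(r"\w+", text.lower())
    return set(zip(words, words[1:]))


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity in [0, 1]; two empty sets are treated as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def normalize_title(title: str) -> str:
    # Assumed normalization: lowercase, punctuation stripped, single spaces.
    return " ".join(re.findall(r"\w+", title.lower()))


def heuristic_divergence(text_a, text_b, titles_a, titles_b, tone_a, tone_b) -> float:
    """Weighted 0-100 divergence from text, section, length, and tone signals."""
    text_div = 100 * (1 - jaccard(bigrams(text_a), bigrams(text_b)))
    section_div = 100 * (
        1 - jaccard(
            {normalize_title(t) for t in titles_a},
            {normalize_title(t) for t in titles_b},
        )
    )
    len_a, len_b = len(text_a.split()), len(text_b.split())
    length_div = 100 * abs(len_a - len_b) / max(len_a, len_b, 1)
    metrics = ("objectivity", "formality", "confidence", "bias")
    tone_div = sum(abs(tone_a[m] - tone_b[m]) for m in metrics) / len(metrics)
    return (50 * text_div + 20 * section_div + 15 * length_div + 15 * tone_div) / 100


tone = {"objectivity": 70, "formality": 80, "confidence": 60, "bias": 20}
same = heuristic_divergence("alpha beta gamma", "alpha beta gamma",
                            ["History"], ["History"], tone, tone)
different = heuristic_divergence("alpha beta gamma", "delta epsilon zeta",
                                 ["History"], ["Reception"], tone, tone)
```

In the second call the texts and section titles share nothing while length and tone match, so only the text and section weights contribute to the result.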
Bands used in the UI: 0–15 substantively aligned, 16–35 mostly aligned, 36–60 clearly divergent, 61–85 meaningfully different, 86–100 fundamentally different. The score is a signal for reading, not a verdict on accuracy — two articles can diverge stylistically while agreeing on the facts, or vice versa.
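The band lookup can be sketched as a simple threshold table. The function name is hypothetical, and rounding fractional scores to the nearest integer before the lookup is an assumption, since the bands are stated as integer ranges.

```python
# Upper bound (inclusive) and label for each band, in ascending order.
BANDS = [
    (15, "substantively aligned"),
    (35, "mostly aligned"),
    (60, "clearly divergent"),
    (85, "meaningfully different"),
    (100, "fundamentally different"),
]


def band_for(score: float) -> str:
    """Return the UI band label for a 0-100 divergence score."""
    if not 0 <= score <= 100:
        raise ValueError("divergence score must be between 0 and 100")
    rounded = round(score)  # assumption: fractional scores are rounded first
    for upper, label in BANDS:
        if rounded <= upper:
            return label
    raise AssertionError("unreachable for scores in range")
```

For example, `band_for(70)` falls in the 61–85 range and `band_for(16)` in the 16–35 range.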
AI Difference Summary
In addition to scoring, the AI mode generates a structured summary that calls out:
- Key factual differences — specific claims, numbers, dates, or events where the two articles materially disagree.
- Framing differences — how each side foregrounds or backgrounds aspects of the subject.
- Coverage gaps — subjects one article covers that the other omits or barely mentions.
- Sourcing notes — a short comparison of how each article supports its claims.
- Verdict — a one-sentence read on whether the articles are aligned, partially divergent, or telling different stories.
The summary is generated by Gemini 2.5 Flash using structured output with a fixed JSON schema, so the rendered layout stays the same even when the model's wording varies. Each article is sent in full unless it exceeds roughly 30k characters, in which case it is truncated at that limit.
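To make the fixed-schema idea concrete, here is a minimal sketch of how a client might truncate the articles and validate the returned JSON. The field names, the `DifferenceSummary` type, and both helper functions are hypothetical; the actual schema is not reproduced here, and no model call is shown.

```python
import json
from dataclasses import dataclass, fields

MAX_CHARS = 30_000  # approximate per-article limit described above


def truncate_article(text: str, limit: int = MAX_CHARS) -> str:
    """Return the text unchanged if it fits, otherwise cut it at the limit."""
    return text[:limit]


@dataclass
class DifferenceSummary:
    # Hypothetical field names mirroring the five parts of the summary.
    key_factual_differences: list[str]
    framing_differences: list[str]
    coverage_gaps: list[str]
    sourcing_notes: str
    verdict: str


def parse_summary(raw_json: str) -> DifferenceSummary:
    """Parse a model response and reject any payload that strays from the schema."""
    payload = json.loads(raw_json)
    expected = {f.name for f in fields(DifferenceSummary)}
    if set(payload) != expected:
        raise ValueError(f"unexpected keys: {sorted(set(payload) ^ expected)}")
    return DifferenceSummary(**payload)


raw = json.dumps({
    "key_factual_differences": ["Founding year differs between the articles."],
    "framing_differences": [],
    "coverage_gaps": ["Only one article covers the legal dispute."],
    "sourcing_notes": "One article relies mostly on primary documents.",
    "verdict": "Partially divergent.",
})
summary = parse_summary(raw)
```

Since the renderer only ever sees these five fields, the page layout does not depend on how the model phrases each entry.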