Methodology & Scores

Comparipedia uses Gemini (Google's LLM) to score tone and citation quality by default — the model reads both articles in full and assigns 0–100 scores with a short written rationale for each side. Word-list heuristics are kept as a fast, offline fallback for when AI scoring is unavailable or the user explicitly switches to the Heuristics view. These scores are signals for reading, not proof that one article is correct.

Two Scoring Modes

AI (default)

Gemini 2.5 Flash reads each full article and a sample of its citations, then returns structured 0–100 scores for objectivity, formality, confidence, bias, and citation quality, a sentiment value from -1.0 to 1.0, and a one-sentence rationale per article. The same call produces the structured difference summary (factual disagreements, framing differences, coverage gaps, sourcing notes, and a verdict).

Heuristics (fallback)

A fast word-list scan that runs entirely on the server with no model call. It counts loaded words, hedging language, citation domains, and source-type mixes. It is used as the fallback when AI scoring is unavailable, and is available as a toggle if you want to see the raw word-list view.

The AI scores are intentionally allowed to diverge from the heuristic scores. A heuristic "low bias" score driven by neutral vocabulary can mask one-sided framing; a heuristic "high citation score" driven by tier-1 domain matches can mask citations that don't actually support the claim next to them. Gemini is asked to weigh framing and claim support directly, which word lists and domain matches cannot do.
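
The fallback order described above can be sketched as follows. This is a minimal illustration, not Comparipedia's actual code: `score_with_fallback`, `ai_scorer`, `heuristic_scorer`, and `prefer_heuristics` are hypothetical names, and treating any exception as "AI scoring is unavailable" is an assumption.

```python
from typing import Callable, Dict

Scores = Dict[str, float]


def score_with_fallback(
    text: str,
    ai_scorer: Callable[[str], Scores],
    heuristic_scorer: Callable[[str], Scores],
    prefer_heuristics: bool = False,
) -> Dict[str, object]:
    """Return the scores together with the mode that produced them."""
    if not prefer_heuristics:
        try:
            return {"mode": "ai", "scores": ai_scorer(text)}
        except Exception:
            # Assumption: any AI failure (timeout, quota, malformed reply)
            # counts as "unavailable" and drops to the word-list path.
            pass
    return {"mode": "heuristics", "scores": heuristic_scorer(text)}
```

Returning the mode alongside the scores lets the interface label which view the reader is looking at.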

Tone Metrics

Tone metrics describe how the article reads. They are not a fact-check.

Objectivity

How neutral and evidence-led the prose reads rather than promotional, accusatory, or opinionated.

AI scoring: Gemini reads the full article and rates 0–100 based on whether claims are attributed, whether multiple perspectives are represented, and whether language is judgmental. A short rationale explains the score.
Heuristic fallback: A word-list scan looks for loaded adjectives, weasel words, and assertive praise or criticism that is not clearly attributed.
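
A minimal sketch of what such a word-list scan might look like. The word list, the `objectivity_score` name, and the fixed penalty per hit are illustrative assumptions; the real curated list and scoring curve are not given here.

```python
import re

# Hypothetical stand-in for the curated list of loaded adjectives.
LOADED_WORDS = {"stunning", "remarkable", "shocking", "brilliant", "shameful"}


def objectivity_score(text: str, penalty_per_hit: float = 15.0) -> float:
    """Start at 100 and subtract an assumed fixed penalty per loaded word."""
    words = re.findall(r"[a-z']+", text.lower())
    hits = sum(1 for word in words if word in LOADED_WORDS)
    return max(0.0, 100.0 - penalty_per_hit * hits)
```

Under these assumed values, the "reported revenue growth" sentence used as the example for this metric scores 100.0 and the "stunning revenue growth" variant scores 85.0.
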
Example:
Higher objectivity

"The company reported revenue growth in 2024."

Lower objectivity

"The company achieved stunning revenue growth in 2024."

Why

The word "stunning" adds judgment instead of evidence.

Bias Level

How much the prose tries to frame the reader toward a positive or negative interpretation.

AI scoring: Gemini scores 0–100 based on loaded language, one-sided framing, and selective emphasis. Higher scores mean MORE bias detected. The model can catch framing bias even when the vocabulary itself looks neutral.
Heuristic fallback: Counts charged words, one-sided framing phrases, and emotional trigger language from a curated list.
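
One way to turn such counts into a score is charged-word density. The sketch below assumes a tiny stand-in list and a hypothetical `bias_level` function that reports charged words as a percentage of all words, capped at 100; the production list and scaling may differ.

```python
import re

# Hypothetical stand-in for the curated list of charged words.
CHARGED_WORDS = {"disastrous", "punished", "outrageous", "heroic", "catastrophic"}


def bias_level(text: str) -> float:
    """Charged words as a percentage of all words; higher means more bias."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in CHARGED_WORDS)
    return min(100.0, 100.0 * hits / len(words))
```

A scan like this only sees vocabulary. A one-sided passage written in neutral words would still score 0, which is the gap the AI scoring is meant to cover.
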
Example:
Lower bias

"Critics said the policy increased costs."

Higher bias

"The disastrous policy punished ordinary families."

Why

"Disastrous" and "punished" are stronger emotional framing signals.

Formality

Whether the text reads like formal reference writing rather than casual commentary.

AI scoring: Gemini scores 0–100 based on register — encyclopedic and academic phrasing vs. casual or conversational language.
Heuristic fallback: Rewards structured academic phrasing and penalizes contractions, slang, and conversational wording.
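
As a rough sketch of the penalty side of this heuristic (the reward for academic phrasing is left out), the following counts contractions and casual words. `formality_score`, the word list, the contraction pattern, and the penalty value are assumptions made for illustration.

```python
import re

# Hypothetical stand-in for the list of casual or slang terms.
CASUAL_WORDS = {"tweaked", "stuff", "kinda", "gonna", "okay"}

# Matches common English contractions written with a straight apostrophe.
# Possessives such as "company's" also match, a known limitation of this sketch.
CONTRACTION = re.compile(r"\b\w+'(?:t|s|re|ve|ll|d|m)\b", re.IGNORECASE)


def formality_score(text: str, penalty: float = 20.0) -> float:
    """Start at 100 and subtract an assumed penalty per casual word or contraction."""
    words = re.findall(r"[a-z']+", text.lower())
    casual_hits = sum(1 for word in words if word in CASUAL_WORDS)
    contraction_hits = len(CONTRACTION.findall(text))
    return max(0.0, 100.0 - penalty * (casual_hits + contraction_hits))
```

With these assumed values, "The proposal was subsequently revised." scores 100.0 and "They later tweaked the plan." scores 80.0.
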
Example:
More formal

"The proposal was subsequently revised."

Less formal

"They later tweaked the plan."

Why

"Tweaked" is conversational; "subsequently revised" reads more like reference prose.

Confidence

How definite or cautious the article sounds when making claims.

AI scoring: Gemini scores 0–100 based on how often the article hedges ("may", "allegedly", "some argue") vs. asserts declaratively. Hedging is not treated as inaccuracy; it only lowers the confidence score.
Heuristic fallback: Compares firm declarative language with hedging terms such as "may", "possibly", "reportedly", and "some argue".
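
A minimal sketch of the hedging side of this comparison, using the four hedging terms named above. The `confidence_score` name and the penalty per hedge are assumptions, and the sketch does not try to measure "firm declarative language" separately.

```python
import re

# Hedging terms taken from the description of this heuristic.
HEDGES = ("may", "possibly", "reportedly", "some argue")

# Whole-word match so that "may" does not fire inside "mayor".
# It still matches the month "May" after lowercasing, a known limitation.
HEDGE_PATTERN = re.compile(r"\b(?:" + "|".join(re.escape(h) for h in HEDGES) + r")\b")


def confidence_score(text: str, penalty_per_hedge: float = 25.0) -> float:
    """Start at 100 and subtract an assumed penalty for each hedging term."""
    hits = len(HEDGE_PATTERN.findall(text.lower()))
    return max(0.0, 100.0 - penalty_per_hedge * hits)
```

With the assumed penalty, "The system reduced latency." scores 100.0 and "The system may have reduced latency." scores 75.0.
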
Example:
Higher confidence

"The system reduced latency."

Lower confidence

"The system may have reduced latency."

Why

"May have" is a hedge. Hedging can be accurate, but it lowers confidence.

Sentiment

The emotional direction of the language, from negative to positive.

AI scoring: Gemini returns a value from -1.0 to 1.0 for the overall affective tilt of the article — negative, neutral, or positive.
Heuristic fallback: Uses curated positive and negative word lists, then balances the counts across the article text.
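
One plausible way to balance the two counts is sketched below. The word sets reuse the positive and negative signals listed for this metric; the `sentiment` function and the choice to normalize by the total number of hits (so the result lands in the same -1.0 to 1.0 range as the AI value) are assumptions.

```python
import re

# Signal words taken from the example for this metric.
POSITIVE = {"successful", "praised", "improved"}
NEGATIVE = {"failed", "controversial", "criticized"}


def sentiment(text: str) -> float:
    """Return (positive - negative) / (positive + negative), or 0.0 with no hits."""
    words = re.findall(r"[a-z]+", text.lower())
    positive_hits = sum(1 for word in words if word in POSITIVE)
    negative_hits = sum(1 for word in words if word in NEGATIVE)
    total = positive_hits + negative_hits
    if total == 0:
        return 0.0
    return (positive_hits - negative_hits) / total
```

Normalizing by the number of hits rather than by article length keeps a long neutral article from diluting a clear tilt; a length-normalized variant would be an equally reasonable choice.
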
Example:
Positive signals

"successful", "praised", "improved"

Negative signals

"failed", "controversial", "criticized"

Why

The score balances positive and negative emotional language across the text.

Citation Quality

Citation quality is a single 0–100 score per article. In AI mode it reflects Gemini's judgment of the sourcing after reviewing the citation sample. In Heuristics mode it is a composite of source authority, source diversity, high-quality source ratio, and citation volume.

Source Credibility

Whether the cited sources look credible for the subject matter — academic, government, and established institutional sources score higher than blogs, forums, social posts, or weakly identified pages.

AI scoring: Gemini reviews a sample of the actual citations and judges whether they look credible and appropriate for the article's claims.
Heuristic fallback: Citations are grouped into reliability tiers based on the domain (e.g. .gov, .edu, established news, blogs).
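
A minimal sketch of domain-based tiering. The tier values, the `domain_score` name, and the short domain lists are hypothetical stand-ins; the actual tier table is not reproduced here.

```python
from urllib.parse import urlparse

# Hypothetical stand-ins for the curated tier lists.
ESTABLISHED_NEWS = {"reuters.com", "apnews.com", "bbc.co.uk"}
BLOG_MARKERS = ("blogspot.", "wordpress.", "substack.")


def domain_score(url: str) -> int:
    """Map a citation URL to an assumed 0-100 reliability tier by its host name."""
    # urlparse only fills in hostname when the URL includes a scheme such as https://
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if host.endswith((".gov", ".edu")):
        return 90  # government and academic domains
    if host in ESTABLISHED_NEWS:
        return 70  # established news outlets
    if any(marker in host for marker in BLOG_MARKERS):
        return 30  # blog platforms
    return 50  # unknown domains get a neutral default
```

Country-specific government domains (for example those ending in `.gov.uk`) are not handled by this sketch and would fall into the default tier.
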
Example:
Higher score

A WHO report or peer-reviewed journal.

Lower score

An unsourced personal blog post.

Claim Support

Whether the article appears to actually support its claims with the citations it uses, not just decorate them.

AI scoring: Gemini reads the article and citation sample and judges whether the sourcing genuinely backs the article's assertions or whether claims are under-supported.
Heuristic fallback: Not measured by heuristics — they count citations but cannot tell whether a citation actually supports the claim next to it.
Example:
Well supported

Each major claim has a citation pointing to a source that backs it.

Decorative

Citations exist but link to sources that do not address the claim.

Source Diversity

Whether the article draws on different kinds of sources or leans on one source family.

AI scoring: Considered as part of Gemini's overall sourcing judgment.
Heuristic fallback: Improves when an article uses a mix of source types (academic, news, government, etc.).
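
As an illustration, diversity can be approximated by counting distinct source types. The `source_diversity` function and the assumption that four distinct types earn the full score are hypothetical choices for this sketch.

```python
from typing import List


def source_diversity(source_types: List[str], full_score_types: int = 4) -> float:
    """Score 0-100 from the number of distinct source types among the citations."""
    distinct = len(set(source_types))
    return min(100.0, 100.0 * distinct / full_score_types)
```

Under this assumption, ten citations that are all typed as news score 25.0, while a mix of academic, government, and news sources scores 75.0.
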
Example:
More diverse

Academic research, official records, and reputable reporting.

Less diverse

Ten links to the same news site.

Citation Volume

Citation density relative to article length.

AI scoring: Considered as part of Gemini's overall sourcing judgment.
Heuristic fallback: A long article with many citations scores higher than an equally long article with only a handful.
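
Citation density can be sketched as citations per 1,000 words scaled against a target. The `citation_volume_score` name and the target of five citations per 1,000 words are assumptions chosen only to make the example concrete.

```python
def citation_volume_score(
    citation_count: int,
    word_count: int,
    target_per_1000_words: float = 5.0,
) -> float:
    """Score 0-100 from citation density relative to an assumed target density."""
    if word_count <= 0:
        return 0.0
    density = citation_count / (word_count / 1000.0)
    return min(100.0, 100.0 * density / target_per_1000_words)
```

With this target, a 10,000-word article with three citations has a density of 0.3 per 1,000 words and scores about 6, while a 2,000-word article with ten citations reaches the cap of 100.
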
Example:
Better supported

A long article with citations throughout its major claims.

Possibly thin

A 10,000-word article with only three citations.

Heuristic Score Formula

When the Heuristics mode is active, the citation score uses this weighting:

  • 40% average domain score
  • 20% source diversity
  • 20% high-quality source ratio
  • 20% citation volume relative to article length
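
The weighting above amounts to a weighted sum of four component scores. In this sketch the weights come from the list; the component key names and the `heuristic_citation_score` function are hypothetical, and each component is assumed to be on a 0–100 scale already.

```python
from typing import Dict

# Weights from the list above; the key names are illustrative.
WEIGHTS = {
    "domain": 0.40,
    "diversity": 0.20,
    "high_quality_ratio": 0.20,
    "volume": 0.20,
}


def heuristic_citation_score(components: Dict[str, float]) -> float:
    """Combine four 0-100 component scores into one 0-100 citation score."""
    return sum(weight * components[name] for name, weight in WEIGHTS.items())
```

For example, component scores of 80 (domain), 50 (diversity), 60 (high-quality ratio), and 40 (volume) combine to 32 + 10 + 12 + 8 = 62.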

In AI mode the score is a single value returned by Gemini after reviewing the citation sample, not a weighted formula.

Divergence Score

A single 0–100 score that estimates how far apart the two articles are overall. Higher means more divergent. It is shown in the condensed summary header at the top of every comparison.

AI mode. Gemini produces the score directly as part of the difference summary, considering factual disagreements, framing, coverage gaps, and sourcing together, with a short rationale.

Heuristics mode. A composite of four signals, each independently scored 0–100 and then combined as a weighted average using the weights below:

  • 50% text divergence: 100 minus the bigram-Jaccard similarity of the two article texts, with similarity expressed as a percentage.
  • 20% section divergence: 100 minus the Jaccard overlap of normalized section titles, also expressed as a percentage.
  • 15% length divergence: absolute word-count difference as a percentage of the larger article's word count.
  • 15% tone divergence: average absolute delta across objectivity, formality, confidence, and bias.
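
The four signals and their weights can be sketched as below. This is an illustration under assumptions: word bigrams (rather than character bigrams), lowercasing and whitespace trimming as the title "normalization", and all function names are choices made for this sketch, not details confirmed by the description above.

```python
import re
from typing import Dict, Iterable, Set, Tuple

TONE_KEYS = ("objectivity", "formality", "confidence", "bias")


def _jaccard(a: Set, b: Set) -> float:
    """Jaccard similarity in [0, 1]; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def _bigrams(text: str) -> Set[Tuple[str, str]]:
    words = re.findall(r"[a-z0-9]+", text.lower())
    return set(zip(words, words[1:]))


def text_divergence(text_a: str, text_b: str) -> float:
    return 100.0 * (1.0 - _jaccard(_bigrams(text_a), _bigrams(text_b)))


def section_divergence(titles_a: Iterable[str], titles_b: Iterable[str]) -> float:
    norm_a = {title.strip().lower() for title in titles_a}
    norm_b = {title.strip().lower() for title in titles_b}
    return 100.0 * (1.0 - _jaccard(norm_a, norm_b))


def length_divergence(words_a: int, words_b: int) -> float:
    larger = max(words_a, words_b)
    return 0.0 if larger == 0 else 100.0 * abs(words_a - words_b) / larger


def tone_divergence(tone_a: Dict[str, float], tone_b: Dict[str, float]) -> float:
    return sum(abs(tone_a[key] - tone_b[key]) for key in TONE_KEYS) / len(TONE_KEYS)


def divergence_score(text: float, section: float, length: float, tone: float) -> float:
    """Weighted average of the four 0-100 signals, using the weights listed above."""
    return 0.50 * text + 0.20 * section + 0.15 * length + 0.15 * tone
```

Because text divergence carries half the weight, two articles with identical section layouts and similar tone can still land in a high band if their wording barely overlaps.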

Bands used in the UI: 0–15 substantively aligned, 16–35 mostly aligned, 36–60 clearly divergent, 61–85 meaningfully different, 86–100 fundamentally different. The score is a signal for reading, not a verdict on accuracy — two articles can diverge stylistically while agreeing on the facts, or vice versa.
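
The band lookup is a simple threshold table. In this sketch the labels and upper bounds come from the bands above; the `band_for` name and the decision to round fractional scores to the nearest integer before lookup are assumptions.

```python
# (inclusive upper bound, label) pairs taken from the bands listed above.
BANDS = [
    (15, "substantively aligned"),
    (35, "mostly aligned"),
    (60, "clearly divergent"),
    (85, "meaningfully different"),
    (100, "fundamentally different"),
]


def band_for(score: float) -> str:
    """Return the UI band label for a 0-100 divergence score."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    rounded = round(score)  # assumption: bands are defined on whole numbers
    for upper_bound, label in BANDS:
        if rounded <= upper_bound:
            return label
    return BANDS[-1][1]  # not reached for valid input
```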

AI Difference Summary

In addition to scoring, the AI mode generates a structured summary that calls out:

  • Key factual differences — specific claims, numbers, dates, or events where the two articles materially disagree.
  • Framing differences — how each side foregrounds or backgrounds aspects of the subject.
  • Coverage gaps — subjects one article covers that the other omits or barely mentions.
  • Sourcing notes — a short comparison of how each article supports its claims.
  • Verdict — a one-sentence read on whether the articles are aligned, partially divergent, or telling different stories.

The summary is generated by Gemini 2.5 Flash with structured output and a fixed JSON schema, so the page layout renders consistently even when the model's wording varies. Each article is sent in full unless it exceeds roughly 30k characters, in which case it is truncated at that limit.
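
To show why a fixed schema keeps rendering predictable, here is a minimal sketch of truncating an article and validating a summary payload. The field names, the `DifferenceSummary` class, and both helper functions are hypothetical; the actual schema and the call to the model are not shown here.

```python
import json
from dataclasses import dataclass, fields
from typing import List

MAX_CHARS = 30_000  # approximate per-article limit mentioned above


def truncate_article(text: str, limit: int = MAX_CHARS) -> str:
    """Keep at most `limit` characters of the article text."""
    return text[:limit]


@dataclass
class DifferenceSummary:
    # Hypothetical field names mirroring the five parts of the summary.
    factual_differences: List[str]
    framing_differences: List[str]
    coverage_gaps: List[str]
    sourcing_notes: str
    verdict: str


def parse_summary(raw_json: str) -> DifferenceSummary:
    """Parse a model reply and fail loudly if any expected field is absent."""
    data = json.loads(raw_json)
    if not isinstance(data, dict):
        raise ValueError("summary must be a JSON object")
    names = [field.name for field in fields(DifferenceSummary)]
    missing = [name for name in names if name not in data]
    if missing:
        raise ValueError(f"summary is missing fields: {missing}")
    return DifferenceSummary(**{name: data[name] for name in names})
```

Because every reply is checked against the same set of fields, the page can lay out each part in a fixed position regardless of how the model phrases the content.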