
AI Evals: The hidden framework that will define Responsible Journalism

AI is coming for the newsroom. But who’s checking its work?


A breaking story hits the wires. Your newsroom’s AI serves up a polished draft in seconds. The headline is slick, the quotes are convincing, and the copy reads like it came from a pro. Then the phone rings.

The “exclusive” fact in the first paragraph? Wrong.

The convincing quote? Misattributed.

The story? Legally shaky.

In a world where trust is the most valuable currency in media, a single slip like this isn’t just embarrassing - it’s existential. And whilst building AI tools on vibes is an okay starting point, it leaves you wide open to bias, especially around verification.

This is exactly why evals matter. Vibes don’t scale - but evals do.

And it’s why newsrooms are about to hear a whole lot more about evals.

Why “Good Enough” isn’t good enough

Most AI testing today is built for consumer tech.

  • Did the output sound plausible?
  • Was it fast enough?
  • Did the user click the thumbs-up?

That’s fine if you’re recommending a playlist. But in journalism, plausibility isn’t the bar.

Truth is. Accountability is. Context is.

Without evaluation frameworks built specifically for journalism, you are leaving credibility up to chance.

Journalism needs its own rulebook

Here’s what responsible AI evaluation looks like in the media world:

  • Correctness vs. Truthiness: Not “does this sound okay?” but “who said it, when, and can it be verified?”
  • Speed vs. Accuracy: Breaking news tolerates provisional facts. Investigations cannot.
  • Audience Impact: Does this inform citizens, shift behaviour, or build trust?
  • Legal and Ethical Stakes: In journalism, errors don’t just annoy users. They invite lawsuits, break privacy, and erode credibility.
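To make “Correctness vs. Truthiness” concrete, here’s a minimal sketch in Python of what an attribution check might look like. It assumes claims have already been extracted from the draft (a hard problem in its own right), and names like Claim and eval_attribution are illustrative, not a real library:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Claim:
    text: str               # the factual assertion as written
    source: Optional[str]   # who said it, if anyone is attributed
    said_on: Optional[str]  # when it was said, if stated

def eval_attribution(claims: list[Claim]) -> dict:
    """Gate on verifiability, not plausibility: every claim
    needs a named source, and ideally a date."""
    unattributed = [c.text for c in claims if not c.source]
    undated = [c.text for c in claims if not c.said_on]
    return {
        "claims": len(claims),
        "unattributed": unattributed,
        "undated": undated,
        # Hard gate: a single unattributed claim blocks auto-publish.
        "publishable": not unattributed,
    }

# The opening scenario: a slick "exclusive" with no source attached.
claims = [
    Claim("The merger closed on Friday.", source=None, said_on=None),
    Claim('"We had no warning."', source="company spokesperson", said_on="Friday"),
]
print(eval_attribution(claims))  # "publishable": False
```

The design choice matters more than the code: the gate is verifiability - a named source, a date - not how plausible the sentence sounds.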

The five layers of Responsible AI

At Futurice, we’ve been developing multi-layer evals designed with journalism in mind:

  1. The basics: Who/what/when/where/why in the lede. Proper attribution. No corners cut.
  2. The flow: Logical narrative, credible sources, coherent context.
  3. The checks: Real-time verification against databases, records, and competitor coverage.
  4. The accessibility: Readability, jargon detection, audience inclusivity.
  5. The judgment calls: At scale, AI systems approximate editorial fairness, harm minimisation, and public value.

And crucially, each layer is an interlocking safeguard.
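As a rough sketch of how that interlocking might work in code (the check functions below are toy stand-ins under the same hypothetical setup as above, not our production evals):

```python
from typing import Callable

# Each layer inspects the draft and returns (passed, notes).
Layer = Callable[[str], tuple[bool, str]]

def check_basics(draft: str) -> tuple[bool, str]:
    # Toy stand-in for layer 1: any quote must carry attribution.
    ok = '"' not in draft or " said" in draft
    return ok, "every quote needs a named source"

def check_flow(draft: str) -> tuple[bool, str]:
    # Toy stand-in for layer 2: crude proxy for narrative context.
    ok = draft.count(".") >= 2
    return ok, "a claim needs surrounding context, not a bare assertion"

def run_pipeline(draft: str, layers: list[Layer]) -> dict:
    results = []
    for layer in layers:
        passed, notes = layer(draft)
        results.append({"layer": layer.__name__, "passed": passed, "notes": notes})
        if not passed:
            # A failure at any layer blocks publication outright
            # and routes the draft back to a human editor.
            return {"publishable": False, "results": results}
    return {"publishable": True, "results": results}

draft = '"We had no warning," a spokesperson said. The merger was announced on Friday.'
print(run_pipeline(draft, [check_basics, check_flow]))
```

Each layer gates the next, so a failure anywhere stops a draft from auto-publishing and hands it back to a human.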

Why leaders should care (now)

This isn’t about teaching editors how to write eval prompts. It’s about board-level governance and building a system to deliver them. Evals are what allow you to catch missteps before your readers do.

The C-suite question is simple: If AI is helping produce journalism under our brand, how do we know it’s safe to publish?

Because here’s the hard truth: without evals, you’re not innovating. You’re gambling with trust, something credible journalism has never done. So why start now?

The bottom line

AI will transform newsrooms. That much is inevitable. The real question is whether it transforms them responsibly.

Evals aren’t a technical afterthought. They’re the frontline defence for integrity. And the media companies that get this right will be the ones audiences still trust five years from now - while the rest scramble to rebuild their reputation.

Author

  • Fraser Hamilton
    Director Of Product & Growth