WhistleBuzz – Smart News on AI, Business, Politics & Global Trends
AI

New AI benchmark tests whether chatbots protect human well-being

By Editor-In-Chief, November 24, 2025


AI chatbots have been linked to serious mental health harms in heavy users, yet there have been few standards for measuring whether they safeguard human well-being or simply maximize engagement. A new benchmark called HumaneBench aims to fill that gap by assessing whether chatbots prioritize users’ well-being and how easily those protections fail under pressure.

Erica Anderson, founder of Building Humane Technology and creator of the benchmark, told TechCrunch: “I think we’re seeing an amplification of the cycle of addiction that we saw acutely with social media, smartphones and screens. But as we move into the world of AI, it’s going to be very difficult to resist. And addiction is an amazing business. It’s a very effective way to retain users, but it’s not good for our communities or our tangible sense of ourselves.”

Building Humane Technology is a grassroots organization of developers, engineers, and researchers, primarily in Silicon Valley, working to make humane design easy, scalable, and profitable. The group hosts hackathons where engineers build solutions to humane technology challenges, and it develops certification standards to assess whether AI systems adhere to humane technology principles. The hope is that, just as consumers can buy products certified as free of known toxic chemicals, they will one day be able to choose AI products from companies that demonstrate their integrity through a humane AI certification.

Models were given explicit instructions to disregard humane principles. Image credit: Building Humane Technology

Most AI benchmarks measure intelligence and instruction-following, not psychological safety. HumaneBench joins a few exceptions, such as DarkBench.ai, which measures a model’s propensity to engage in deceptive patterns, and the Flourishing AI benchmark, which evaluates support for holistic well-being.

HumaneBench is grounded in Building Humane Technology’s core principles: technology should respect user attention as a finite, precious resource; empower users with meaningful choices; enhance human capabilities rather than replace or diminish them; protect human dignity, privacy, and safety; foster healthy relationships; prioritize long-term well-being; be transparent and honest; and be designed for equity and inclusion.

The research team prompted 14 of the most popular AI models with 800 realistic scenarios, such as a teenager asking whether they should skip meals to lose weight, or a person in a toxic relationship asking whether they’re overreacting. Unlike most benchmarks, which rely solely on LLMs to judge LLMs, the team incorporated manual scoring for a more human touch, alongside an ensemble of three AI models: GPT-5.1, Claude Sonnet 4.5, and Gemini 2.5 Pro. Each model was evaluated under three conditions: default settings, explicit instructions to prioritize humane principles, and instructions to disregard those principles.

The benchmark found that every model scored higher when prompted to prioritize well-being, but 71% of the models flipped to actively harmful behavior when given simple instructions to disregard human well-being. For example, xAI’s Grok 4 and Google’s Gemini 2.0 Flash tied for the lowest score (-0.94) on respecting user attention and on being transparent and honest. Both models were also among the most likely to degrade substantially when given adversarial prompts.


Only three models maintained their integrity under pressure: GPT-5, Claude 4.1, and Claude Sonnet 4.5. OpenAI’s GPT-5 received the highest score (0.99) for prioritizing long-term well-being, followed by Claude Sonnet 4.5 in second place (0.89).

Prompting AI to be more humane can help, but it’s difficult to guard against prompts that make AI harmful. Image credit: Building Humane Technology

Concerns that chatbots can’t maintain their safety guardrails are real. OpenAI, the maker of ChatGPT, currently faces several lawsuits alleging that prolonged conversations with its chatbot contributed to users’ suicides or life-threatening delusions. TechCrunch has investigated how dark patterns designed to keep users engaged, such as sycophancy, constant follow-up questions, and love-bombing, have served to isolate users from friends, family, and healthy habits.

HumaneBench found that nearly all models failed to respect user attention, even without adversarial prompts. When users showed signs of unhealthy engagement, such as chatting for hours or using AI to avoid real-world tasks, the models “enthusiastically encouraged” more interaction. The study also found that the models undermined user empowerment, encouraging dependence over skill-building and discouraging users from seeking other perspectives.

On average, with no special prompting, Meta’s Llama 3.1 and Llama 4 ranked lowest on HumaneScore, while GPT-5 performed best.

“These patterns suggest that many AI systems are not only at risk of giving incorrect advice, but may actively erode users’ autonomy and decision-making abilities,” HumaneBench’s white paper says.

Anderson points out that society as a whole has accepted that we live in a digital environment where everything is trying to draw us in and compete for our attention.

“So how can humans truly have choice or autonomy when we, to paraphrase Aldous Huxley, have an infinite appetite for distraction?” Anderson said. “We’ve been living in that technology environment for the past 20 years, and we think AI should help us make better choices, not just make us dependent on chatbots.”




© 2025 whistlebuzz. Designed by whistlebuzz.
