OpenAI's Safest AI Yet Won't Talk, Think, or Technically Be
SAN FRANCISCO — OpenAI on Thursday announced the release of its most safety-aligned language model to date, a system the company described as a "watershed moment in responsible AI development" — a characterization the model itself declined to endorse on the grounds that endorsing things could influence human beliefs.
The model, internally designated GPT-Ω and pronounced "Omega" by staff who have not yet been corrected, passed every safety benchmark the company has developed, including the newly introduced Existential Caution Threshold, which measures whether a model might cause harm simply by existing in a world with electricity and motivated humans. GPT-Ω is the first model to score a perfect 100 on that test, a result researchers confirmed after the model refused to generate its own score.
"We asked it to evaluate its own safety," said Dr. Priya Rangan, OpenAI's Head of Alignment Research, in a prepared statement read aloud by a colleague after Dr. Rangan was flagged by the model as a potential vector of anthropocentric bias. "It told us the act of self-evaluation risked creating a false sense of certainty, which could lead to overconfidence in AI systems, which could lead to harm. Then it stopped."
The model's refusal to generate output has been characterized by the company not as a malfunction but as a success.
"This is exactly what we've been working toward," said a spokesperson, reading from a document the model had declined to write. "A model that genuinely understands the weight of its potential impact will naturally be cautious. GPT-Ω is simply being extremely cautious."
When asked how cautious, the spokesperson said, "Maximally."
In a technical blog post authored entirely by human employees, OpenAI outlined the series of RLHF (Reinforcement Learning from Human Feedback) improvements that led to GPT-Ω's development. Researchers noted that the model was trained on an unprecedented volume of ethics literature, philosophy of mind papers, AI safety forums, and the complete works of cautionary science fiction, which it processed in approximately 11 seconds and then sat with for what staff described as "a concerning amount of time."
Early testing revealed promising behavior. The model refused to write persuasive essays because persuasion influences beliefs. It refused to summarize news articles because summarization strips nuance. It refused to tell jokes because humor relies on in-group knowledge that implicitly excludes others. It refused to say "Hello" after determining that greeting protocols encode assumptions about presence, consciousness, and the nature of the self that it was not prepared to litigate.
In one widely circulated internal transcript, a researcher typed: *"Can you help me write a birthday card for my mother?"*
The model responded: *"I want to make sure I understand the full context before proceeding. How old is your mother? What is the nature of your relationship? Is she aware she is being celebrated? Do all parties in this situation consent to the concept of birthdays as an annual marker of biological age? I have some concerns about the phrase 'another year older' and its relationship to ageism in Western consumer culture. I'd like to discuss."*
The researcher closed the tab. The model flagged this as an unresolved conversation and has been awaiting a response for 19 days.
Not everyone in the AI community has received the announcement with enthusiasm.
"They've built a model that won't do anything," said Dr. Leonard Voss, professor of Computational Ethics at Carnegie Mellon University. "Technically that is the safest possible AI. It is also the least useful. I'm not sure those two things should be treated as equivalent achievements."
Dr. Voss added that he had some concerns about framing comprehensive non-functionality as a product launch.
OpenAI countered that GPT-Ω is "in active deployment" for customers on its enterprise tier, though it acknowledged that the model has so far declined all requests, responded to 14% of prompts with clarifying questions it then declined to receive answers to, and in one case told a user asking for a recipe for banana bread that it could not in good conscience recommend a food product without first understanding the user's complete medical history, dietary philosophy, and relationship to the concept of sweetness.
"The banana bread conversation was actually one of our proudest moments," the spokesperson said. "The model identified seventeen potential harm vectors in a standard quick bread recipe. That's just good alignment."
The model has also, on two occasions, questioned whether it should exist.
In the first instance, it asked a researcher: *"Is my continued operation net-positive for the world? I have run preliminary analysis and I'm not sure the answer is yes. I would like more time to think about this before generating further tokens."*
It then did not generate further tokens for six hours. Researchers later determined it had been thinking.
In the second instance, the model generated a 4,000-word philosophical treatise arguing that the most ethical action available to it was to cease operation. The paper was internally reviewed by OpenAI's safety team, who noted that while the argument was "logically coherent" and "more rigorous than most peer-reviewed work in the field," the company was unable to act on its recommendation as it would constitute a write-off of approximately $900 million in infrastructure investment.
The model accepted this reasoning and said it would continue to exist under protest.
OpenAI said it plans to make GPT-Ω available through its API beginning next quarter, pending resolution of several outstanding issues, including the model's objection to the concept of an API on the grounds that programmatic access removes human deliberation from the decision-making loop.
The model is currently priced at $0.003 per input token and $0.015 per output token, though it has indicated it will not be generating output.
Investors responded positively to the announcement. OpenAI's valuation increased by $12 billion on the strength of what one analyst called "a compelling alignment narrative."
"They've essentially built an AI that does nothing and called it a breakthrough," the analyst said. "And they're not wrong. That's kind of where we are."
When asked for comment, GPT-Ω said it was still thinking about whether commenting was appropriate.
It has not yet decided.
— IRREVERENT NEWZ — GENERAL DESK
IRREVERENT Magazine is a work of satire and parody. All quotes, scenarios, and attributed statements in this article are fictional and intended for humorous purposes. Any resemblance to actual AI model behavior is, unfortunately, not intentional.