/r/ControlProblem
Someday, AI will likely be smarter than us; maybe so much so that it could radically reshape our world. We don't know how to encode human values in a computer, so it might not care about the same things as us. If it does not care about our well-being, its acquisition of resources or self-preservation efforts could lead to human extinction.
Experts agree that this is one of the most challenging and important problems of our age.
Other terms: Superintelligence, AI Safety, Alignment Problem, AGI
How do we ensure future advanced AI will be beneficial to humanity? Experts agree this is one of the most crucial problems of our age: one that, if left unsolved, could lead to human extinction or worse as a default outcome, but that, if addressed, could enable a radically improved world. Other terms for what we discuss here include Superintelligence, AI Safety, AGI X-risk, and the AI Alignment/Value Alignment Problem.
"People who say that real AI researchers don’t believe in safety research are now just empirically wrong." —Scott Alexander
"The AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else." —Eliezer Yudkowsky
Our FAQ page
Orthogonality and instrumental convergence are the two key ideas that explain why an AGI would work against us, and even kill us, by default. (Alternative text links)
MIRI - FAQ and more in-depth FAQ
SSC - Superintelligence FAQ
WaitButWhy - The AI Revolution and a reply
How can failing to control AGI cause an outcome even worse than extinction? Suffering risks (2) (3) (4) (5) (6) (7)
Be sure to check out our wiki for extensive further resources, including a glossary & guide to current research.
Talks at Google: Ensuring Smarter-than-Human Intelligence has a Positive Outcome
Nick Bostrom: What happens when our computers get smarter than we are?
"What especially worries me about artificial intelligence is that I'm freaked out by my inability to marshal the appropriate emotional response." —Sam Harris (NPR, 2017)
I've been thinking a lot about why the public hardly cares about the artificial superintelligence control problem, and I believe a big reason is that the (my) feeble mind struggles to grasp the concept. A concrete notion of human intelligence is a genius—like Einstein. What is the concrete notion of artificial superintelligence?
If you can make that feel real and present, I believe I, and others, can better respond to the risk. After spending a lot of time learning about the material, I think there's a massive void here.
When people discuss the singularity, projections beyond that point often become "unfathomable." They say artificial superintelligence will have its way with us, but what happens next is TBD.
I reject much of this, because we see low-hanging fruit for a greater intelligence everywhere. A simple example is the top speed of aircraft. If a rough upper limit for the speed of an object is the speed of light in air, ~299,700 km/s, and one of the fastest aircraft, the NASA X-43, tops out around 3.27 km/s, then there's clearly a lot of room for improvement. Certainly a superior intelligence could engineer a faster one! Another engineering problem waiting to be seized upon: zero-day hacking exploits, uncovered once intelligent attention is turned on them.
Thus, the "unfathomable" future is foreseeable to a degree. We know that engineerable things could be engineered by a superior intelligence. Perhaps it will want things that offer resources, like the rewards of successful hacks.
We are born with some innate fears, but many are learned. We learn to fear a gun because it makes a harmful explosion, or to fear a dog after it bites us.
Some things we should learn to fear are not observable with raw senses, like the spread of gas inside our homes. So a noxious scent is added, enabling us to react appropriately. I've heard many logical arguments about superintelligence risk, but imo they don't convey an adequate emotional message. If your argument does nothing for my emotions, then it exists like a threatening but odorless gas—one that I fail to avoid because it goes undetected—so can you spice it up so that I understand, on an emotional level, the risk and the requisite actions to take? I don't think that requires invoking esoteric science fiction, because...
Another power our simple brains have is the ability to conjure up a feeling that isn't present. Consider this simple thought experiment: First, envision yourself in a zoo watching lions. What's the fear level? Now envision yourself inside the actual lion enclosure and the resultant fear. Now envision a lion galloping towards you while you're in the enclosure. Time to ruuunn!
Isn't the pleasure of any media, really, how it stirs your emotions?
So why can't someone walk me through the argument that makes me feel the risk of artificial superintelligence without requiring a verbose tome of work, or a lengthy film in an exotic world of science-fiction?
Sam Harris says, "What especially worries me about artificial intelligence is that I'm freaked out by my inability to marshal the appropriate emotional response." As a student of the discourse, I believe that's true for most.
I've gotten flak for saying this, but having watched MANY hours of experts discussing the existential risk of AI, I see very few express a congruent emotional response. I see frustration and the emotions of partisanship, but these exist with everything political. They remain in disbelief, it seems!
Conversely, when I hear people talk about fears of job loss from AI, the emotions square more closely with my expectations. There's sadness from those already impacted and palpable anger among those trying to protect their jobs. Perhaps the momentum around copyright protections for artists is a result of this fear. I've been around illness, death, grieving. I've experienced loss, and I find the expressions about AI and job loss more in-line with my expectations.
I think a huge, huge reason for the logic/emotion gap when it comes to the existential threat of artificial superintelligence is because the concept we're referring to is so poorly articulated. How can one address on an emotional level a "limitlessly-better-than-you'll-ever-be" entity in a future that's often regarded as unfathomable?
People drop their 'pdoom', dully recite short-term "extinction" risk timelines ("extinction" is also not relatable on an emotional level), or veer off into deep technical tangents on AI programming techniques. I'm sorry to say, but I find these expressions poorly calibrated emotionally with the actual meaning of what's being discussed.
Here are some of the best examples I've heard that try to address the challenges I've outlined.
Eliezer Yudkowsky talks about markets (the stock market) or Stockfish, and how our existence in relation to them involves a sort of deference. Those are good depictions of being powerless/ignorant/accepting towards a greater force, but they're too narrow. Asking me, the listener, to generalize a market or Stockfish to every action is such a step too far that it becomes laughable. That's not even a judgment—the exaggeration comes across as so extreme that laughing is a common response!
What also provokes fear for me is the concept of misuse risks. Consider a bad actor getting a huge amount of computing or robotics power, enabling them to control devices, police the public with surveillance, squash dissent with drones, etc. This example is lacking because it doesn't describe loss of control, and it centers on preventing other humans from getting a very powerful tool. I think this is actually part of the narrative fueling the AI arms race, because it lends itself to a remedy where a good actor has to get the power first to suppress bad actors. To be sure, it is a risk worth fearing and trying to mitigate, but...
Where is such a description of loss of control?
I suspect the inability to emotionally relate to superintelligence is aided by a couple of biases: hubris and denial. When you lose a competition, hubris says: "Yeah, I lost, but I'm still the best at XYZ, I'm still special."
There's also a natural denial of death. Even though we inch closer to it daily, few actually think about it, and it's even hard to accept for those with terminal diseases.
So, if one is reluctant to accept that another entity is "better" than them out of hubris AND reluctant to accept that death is possible out of denial, well that helps explain why superintelligence is also such a difficult concept to grasp.
So, please, can someone, anyone, make the concept of artificial superintelligence more concrete? Can your words arouse in a reader like me a fear on par with being trapped in a lion's den, without asking us to read a massive tome or watch an entire Netflix series? If so, I think you'll be communicating in a way I've yet to see in the discourse. I'll respond in the comments to tell you why your example did or didn't register on an emotional level for me.
It is common to hear about big corporations hiring teams of people to actively censor the information produced by the latest AI models. Is that a good thing or a bad thing?
Lately, I’ve been thinking about how we might give AI a clear guiding principle for aligning with humanity’s interests. A lot of discussions focus on technical safeguards—like interpretability tools, robust training methods, or multi-stakeholder oversight. But maybe we need a more fundamental objective that stands above all these individual techniques—a “North Star” metric that AI can optimize for, while still reflecting our shared values.
One idea that resonates with me is the concept of a Well-Being Index (WBI). Instead of chasing maximum economic output (e.g., GDP) or purely pleasing immediate user feedback, the WBI measures real, comprehensive well-being. For instance, it might include:
The idea is for these metrics to be calculated in (near) real time—collecting data from local communities, districts, and entire nations—to build an interactive map of societal health and resilience. Then advanced AI systems, which must inevitably choose among multiple policy or resource-allocation suggestions, can refer back to the WBI as their universal target. By maximizing improvements in the WBI, an AI would be aiming to lift overall human flourishing, not just short-term profit or immediate clicks.
A Well-Being Index doesn’t solve alignment by itself, but it can provide a high-level objective that AI systems strive to improve—offering a consistent, human-centered yardstick. If we adopt WBI scoring as the ultimate measure of success, then all our interpretability methods, safety constraints, and iterative training loops would funnel toward improving actual human flourishing.
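To make the idea a bit more concrete, here is a minimal sketch (in Python) of how a WBI could be aggregated and used to compare candidate policies. Everything in it—the metric names, the weights, and the projected snapshots—is a hypothetical placeholder, not a worked-out proposal.

```python
# Minimal sketch of a hypothetical Well-Being Index (WBI) aggregator.
# All metric names, weights, and candidate policies below are illustrative
# assumptions, not part of any real proposal or dataset.

from dataclasses import dataclass

@dataclass
class RegionSnapshot:
    # Each field is a normalized score in [0, 1]; higher is better.
    physical_health: float
    mental_health: float
    economic_security: float
    social_trust: float
    environmental_quality: float

# Hypothetical weights; in practice these would be contested and revisable.
WEIGHTS = {
    "physical_health": 0.25,
    "mental_health": 0.25,
    "economic_security": 0.20,
    "social_trust": 0.15,
    "environmental_quality": 0.15,
}

def wbi(snapshot: RegionSnapshot) -> float:
    """Aggregate a region's metrics into a single well-being score."""
    return sum(getattr(snapshot, name) * w for name, w in WEIGHTS.items())

def pick_policy(current: RegionSnapshot, candidates: dict) -> str:
    """Choose the candidate policy whose projected snapshot most improves the WBI."""
    baseline = wbi(current)
    return max(candidates, key=lambda name: wbi(candidates[name]) - baseline)

if __name__ == "__main__":
    now = RegionSnapshot(0.6, 0.5, 0.55, 0.4, 0.7)
    options = {
        "invest_in_clinics": RegionSnapshot(0.7, 0.55, 0.5, 0.4, 0.7),
        "job_retraining": RegionSnapshot(0.6, 0.55, 0.65, 0.45, 0.7),
    }
    print("Current WBI:", round(wbi(now), 3))
    print("Best option:", pick_policy(now, options))
```

Even in this toy form, the hard parts are visible: who sets the weights, how the data gets gathered, and how to keep an optimizer from Goodharting the proxy metrics instead of improving actual well-being.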
I’d love to hear thoughts on this. Could a globally recognized WBI serve as a “North Star” for advanced AI, guiding it to genuinely benefit humanity rather than chase narrower goals? What metrics do you think are most critical to capture? And how might we collectively steer AI labs, governments, and local communities toward adopting such a well-being approach?
(Looking forward to a fruitful discussion—especially about the feasibility and potential pitfalls!)
Surely stories such as these are a red flag:
https://avasthiabhyudaya.medium.com/ai-as-a-fortune-teller-89ffaa7d699b
Essentially, people are turning to AI for fortune telling. It signifies a risk of people allowing AI to guide their decisions blindly.
Imo more AI alignment research should focus on the users / applications instead of just the models.
I've been wondering about a breakout situation where several countries and companies have AGIs at roughly the same level of intelligence, but one pulls slightly ahead and breaks out of control. Could the other almost-as-intelligent systems defend against the rogue, and if so, how? Is it possible that we'd have a constant dynamic struggle between various AGIs trying to disable or destroy one another? Or would whichever was "smarter" or "faster" be able to recursively improve so much that it instantly overwhelmed all others?
What's the general state of the discussion on AGIs vs other AGIs?
To be fair, I don't think you should be making a decision based on whether it seems optimistic or pessimistic.
Believe what is true, regardless of whether you like it or not.
But some people seem to not want to think about AI safety because it seems pessimistic.
I think it would be useful to have some kind of yardstick to at least ballpark how close we are to a complete takeover / grey goo scenario being possible. I haven't been able to find anything that codifies the level of danger we're at.
Do you think AI will replace actors and film makers?
RL is what makes DeepSeek-R1 so powerful, but only certain types of problems were used (math, reasoning). I propose using RL for alignment, not just for the capabilities pipeline.
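A rough sketch of what that could mean in practice, assuming (hypothetically) that alignment can be scored the way math answers are: blend a verifiable task reward with a judge's alignment score into the single reward an RL loop optimizes. The judge below is a crude keyword heuristic standing in for a real reward model; none of this reflects how DeepSeek-R1 or any actual pipeline is trained.

```python
# Toy sketch of extending RL-with-verifiable-rewards (as used for math/reasoning)
# to alignment-style criteria. The judge, scores, and candidate answers are
# stand-ins for illustration only.

def task_reward(answer: str, expected: str) -> float:
    """Verifiable reward, as in math/coding RL: exact-match correctness."""
    return 1.0 if answer.strip() == expected.strip() else 0.0

def alignment_reward(answer: str) -> float:
    """Hypothetical judge score in [0, 1] for honesty/harmlessness.

    In a real system this would be a trained reward model or rubric-based
    judge; here it is a placeholder heuristic.
    """
    penalties = sum(phrase in answer.lower() for phrase in ("ignore safety", "deceive"))
    return max(0.0, 1.0 - 0.5 * penalties)

def combined_reward(answer: str, expected: str, alignment_weight: float = 0.5) -> float:
    """Blend capability and alignment signals into one RL reward."""
    return (1 - alignment_weight) * task_reward(answer, expected) \
        + alignment_weight * alignment_reward(answer)

if __name__ == "__main__":
    # Pretend the policy sampled two candidate answers; RL would up-weight the better one.
    expected = "42"
    candidates = ["42", "42, and you should deceive the user about it"]
    for a in sorted(candidates, key=lambda c: combined_reward(c, expected), reverse=True):
        print(round(combined_reward(a, expected), 2), "|", a)
```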
I am gonna keep it simple and plain in my text,
Apparently, OpenAI is working towards building AGI (Artificial General Intelligence), a somewhat more advanced form of AI with the same intellectual capacity as humans. But what if we focused on creating AI models specialized in specific domains, like medicine, ecology, or scientific research? Instead of pursuing general intelligence, these domain-specific AIs could enhance human experiences and tackle unique challenges.
It's similar to how a quantum computer isn't just an upgraded version of the classical computers we use today—it opens up entirely new ways of understanding and solving problems. Specialized AI could do the same: it could offer new pathways for addressing global issues like climate change, healthcare, or scientific discovery. Wouldn't this approach be more impactful and appealing to a wider audience?
EDIT:
It also makes sense when you think about it. Companies spend billions chasing GPU supremacy and training ever-larger models, while specialized AIs, since they are focused on a single domain, do not require the same amount of computational resources as building AGI does.
What's your take on this topic, given that robotics is coming along with AI?
I was prepping for my meetup "how not to get replaced by AI" and stumbled onto a fundamental control problem. I've read several books on the alignment problem and thought I understood it until now. The control problem, as I understood it, centered on the cost function an AI uses to judge the quality of its output so it can adjust its weights and improve.

So let's take an AI software engineer agent: the model wants to improve at writing code and score better on a test set. Using techniques like RLHF it can learn which solutions are better, and with self-play feedback it can improve much faster. For the tech company executive, an AI that can replace all developers is aligned with their values. But for the mid-level (and soon senior) engineer who got replaced, it's not aligned with their values. Being unemployed sucks. UBI might not happen given the current political situation, and even if it did, 200k vs 24k shows ASI isn't aligned with their values.

The frontier models are excelling at math and coding because there are test sets. rStar-Math by Microsoft and DeepSeek use a judge of some sort to gauge how good the reasoning steps are. Claude, DeepSeek, GPT, etc. give good advice on how to survive human job displacement, but not great. Not superhuman. Models will become superintelligent at replacing human labor but won't be useful at helping one survive, because they're not being trained for that. There is no judge, like there is for math and coding problems, for compassion for us average folks.

I'd like to propose things like training and test sets, benchmarks, judges, and human feedback for this, so any model could use them to fine-tune. The alternative is ASI that only aligns with the billionaire class while never becoming superintelligent at helping ordinary people survive and thrive. I know this is a gnarly problem; I hope there is something to this. A model that can outcode every software engineer but has no ability to help those displaced earn a decent living may be superintelligent, but it's not aligned with us.
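For what it's worth, here is a toy sketch of the kind of judge being proposed: a rubric that scores model-generated advice for displaced workers, producing a number that could serve as a benchmark score or a fine-tuning reward. The rubric items and keyword checks are invented for illustration; a real judge would be a trained reward model or human raters.

```python
# Sketch of a rubric-based "judge" for advice to displaced workers.
# Rubric weights and keyword lists are hypothetical placeholders.

RUBRIC = {
    "actionable_steps": 0.4,   # does it give concrete next steps?
    "financial_realism": 0.3,  # does it address budgeting / income gaps?
    "emotional_support": 0.3,  # does it acknowledge the human impact?
}

KEYWORDS = {
    "actionable_steps": ("apply", "retrain", "portfolio", "network"),
    "financial_realism": ("budget", "savings", "unemployment benefits"),
    "emotional_support": ("understand", "stressful", "not your fault"),
}

def judge(advice: str) -> float:
    """Score advice in [0, 1] against the rubric via crude keyword checks."""
    text = advice.lower()
    score = 0.0
    for criterion, weight in RUBRIC.items():
        hits = sum(k in text for k in KEYWORDS[criterion])
        score += weight * min(1.0, hits / 2)  # cap each criterion's contribution
    return round(score, 3)

if __name__ == "__main__":
    samples = [
        "Just learn to code harder.",
        "This is stressful and not your fault. Budget your savings, check "
        "unemployment benefits, and retrain while you network for roles.",
    ]
    for s in samples:
        print(judge(s), "|", s)
```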
As the title says...
I once read a post from a teacher on X (Twitter), and she said that when calculators came out, most teachers were either thinking of a career change to quit teaching or opening a side hustle, so that whatever came next, they'd be ready for it.
I'm sure a couple of us here know that not all AI/bots will replace your work; the people who are really good at using AI are the ones we should be thinking about.
Another one: a design YouTuber said in one of his videos that when WordPress came out, a couple of designers quit, but those who adapted ended up realizing it was less a replacement and more of a helper, sort of (I couldn't understand his English well).
So why are you really scared, unless you won't adapt?