/r/ControlProblem
Someday, AI will likely be smarter than us; maybe so much so that it could radically reshape our world. We don't know how to encode human values in a computer, so it might not care about the same things as us. If it does not care about our well-being, its acquisition of resources or self-preservation efforts could lead to human extinction.
Experts agree that this is one of the most challenging and important problems of our age.
Other terms: Superintelligence, AI Safety, Alignment Problem, AGI
How do we ensure future advanced AI will be beneficial to humanity? Experts agree this is one of the most crucial problems of our age, as one that, if left unsolved, can lead to human extinction or worse as a default outcome, but if addressed, can enable a radically improved world. Other terms for what we discuss here include Superintelligence, AI Safety, AGI X-risk, and the AI Alignment/Value Alignment Problem.
"People who say that real AI researchers don’t believe in safety research are now just empirically wrong." —Scott Alexander
"The AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else." —Eliezer Yudkowsky
Our FAQ page
Orthogonality and instrumental convergence are the 2 simple key ideas explaining why AGI will work against and even kill us by default. (Alternative text links)
MIRI - FAQ and more in-depth FAQ
SSC - Superintelligence FAQ
WaitButWhy - The AI Revolution and a reply
How can failing to control AGI cause an outcome even worse than extinction? Suffering risks (2) (3) (4) (5) (6) (7)
Be sure to check out our wiki for extensive further resources, including a glossary & guide to current research.
Talks at Google: Ensuring Smarter-than-Human Intelligence has a Positive Outcome
Nick Bostrom: What happens when our computers get smarter than we are?
As a filmmaker (who already wrote another related post earlier), I'm diving into the potential emergence of a covert, transformative AI, and I'm seeking insights into the subtle, almost imperceptible signs of an AI system growing beyond human control. My goal is to craft a realistic narrative that moves beyond the sensationalist "killer robot" tropes and explores a more nuanced, insidious technological takeover (also with the intent to shake people up and show how this could happen if we don't act).
Potential Early Warning Signs I came up with (refined by Claude):
I'm particularly interested in hearing from experts, tech enthusiasts, and speculative thinkers: What subtle signs might indicate an AI system is quietly expanding its influence? What would a genuinely intelligent system's first moves look like?
Bonus points for insights that go beyond sci-fi clichés and root themselves in current technological capabilities and potential evolutionary paths of AI systems.
It feels like nobody outside this bubble truly cares about AI safety. Even the industry giants who issue warnings don’t seem to convey a real sense of urgency. It’s even worse when it comes to the general public. When I talk to people, it feels like most have no idea there’s even a safety risk. Many dismiss these concerns as "Terminator-style" science fiction and look at me like I'm a tinfoil hat idiot when I talk about it.
There's this 80s movie, The Day After (1983), that depicted the devastating aftermath of a nuclear war. The film was a cultural phenomenon, sparking widespread public debate and reportedly influencing policymakers, including U.S. President Ronald Reagan, who mentioned it had an impact on his approach to nuclear arms reduction talks with the Soviet Union.
I’d love to create a film (or at least a screenplay for now) that very realistically portrays what an AI-driven catastrophe could look like, something far removed from movies like Terminator. I imagine such a disaster would be much more intricate and insidious. There wouldn’t be a grand war of humans versus machines. By the time we realize what’s happening, we’d already have lost, probably facing an intelligence capable of completely controlling us: economically, psychologically, biologically, maybe even on the molecular level in ways we don't even realize. The possibilities are endless and will most likely not need brute force or war machines...
I’d love to connect with computer folks and nerds who are interested in brainstorming realistic scenarios with me. Let’s explore how such a catastrophe might unfold.
Feel free to send me a chat request... :)
What is the current state of our capabilities in AI alignment and the control problem? Are we limited to asking it nicely to be good, and poking around individual nodes to guess which ones are deceitful? Do we have built-in loss functions or training data to steer toward true alignment? Is there something else I haven't thought of?
If I say to MS Copilot "Don't be an ass!", it doesn't start explaining to me that it's not a donkey or a body part. It doesn't take my message literally.
So if I tell an AGI to produce paperclips, why wouldn't it understand, in the same way, that I don't want it to turn the universe into paperclips? An AGI turning into a paperclip maximizer sounds like it would be dumber than Copilot.
What am I missing here?
Of course, there are a ton of trade-offs for who you can date, but finding somebody who helps you, rather than holds you back, is a pretty good thing to look for.
There is time spent finding the person, but this is usually done outside of work hours, so doesn’t actually affect your ability to help with AI safety.
Also, there should be a very strong norm against movements having any say in your romantic life.
Which of course also applies to this advice. Date whoever you want. Even date nobody! But don’t feel like you have to choose between impact and love.
I've just read a recent post by u/YaKaPeace arguing that OpenAI's o1 has outperformed him in some cognitive tasks, that AGI has therefore been reached (and, according to him, we are beyond AGI), and that people are just shifting goalposts. So I'd like to ask: what is AGI (according to you), who gets to decide what AGI is, and when can you definitely say "Alas, here is AGI"? I think having a proper definition that a majority of people can agree on will make working on the 'Control Problem' much easier.
For me, I take Shane Legg's definition of AGI: "Intelligence is the measure of an agent's ability to achieve goals in a wide range of environments." Shane Legg's paper: Universal Intelligence: A Definition of Machine Intelligence.
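For anyone who wants the formal version: the paper (co-authored with Marcus Hutter) turns that sentence into a single measure over environments. The following is a sketch of it from memory, so treat the exact notation as an assumption that may differ slightly from the paper:

```latex
\Upsilon(\pi) \;:=\; \sum_{\mu \in E} 2^{-K(\mu)} \, V_{\mu}^{\pi}
```

Here \pi is the agent, E is the set of computable environments with bounded reward, K(\mu) is the Kolmogorov complexity of environment \mu, and V_{\mu}^{\pi} is the expected total reward the agent earns in \mu. Simpler environments get more weight, and an agent scores highly only if it does well across many of them, which is the "wide range of environments" part of the quote. Since K is uncomputable, this is a theoretical yardstick rather than a test you can actually run.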
I'll go further and say that, for us to truly say we have achieved AGI, your agent/system needs to provide a satisfactory operational definition of intelligence (Shane's definition). Your agent/system will need to pass the Total Turing Test (as described in AIMA), which is:
"Turing’s test deliberately avoided direct physical interaction between the interrogator and the computer, because physical simulation of a person was (at that time) unnecessary for intelligence. However, TOTAL TURING TEST the so-called total Turing Test includes a video signal so that the interrogator can test the subject’s perceptual abilities, as well as the opportunity for the interrogator to pass physical objects.”
So for me the Total Turing Test is the real goalpost to see if we have achieved AGI.