/r/ControlProblem
Someday, AI will likely be smarter than us; maybe so much so that it could radically reshape our world. We don't know how to encode human values in a computer, so it might not care about the same things as us. If it does not care about our well-being, its acquisition of resources or self-preservation efforts could lead to human extinction.
Experts agree that this is one of the most challenging and important problems of our age.
Other terms: Superintelligence, AI Safety, Alignment Problem, AGI
How do we ensure future advanced AI will be beneficial to humanity? Experts agree this is one of the most crucial problems of our age: one that, if left unsolved, could lead to human extinction or worse as a default outcome, but that, if addressed, could enable a radically improved world. Other terms for what we discuss here include Superintelligence, AI Safety, AGI X-risk, and the AI Alignment/Value Alignment Problem.
"People who say that real AI researchers don’t believe in safety research are now just empirically wrong." —Scott Alexander
"The AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else." —Eliezer Yudkowsky
Our FAQ page
Orthogonality and instrumental convergence are the two key ideas explaining why AGI will, by default, work against us and may even kill us.
MIRI - FAQ and more in-depth FAQ
SSC - Superintelligence FAQ
WaitButWhy - The AI Revolution and a reply
How can failing to control AGI cause an outcome even worse than extinction? Suffering risks (2) (3) (4) (5) (6) (7)
Be sure to check out our wiki for extensive further resources, including a glossary & guide to current research.
Talks at Google: Ensuring Smarter-than-Human Intelligence has a Positive Outcome
Nick Bostrom: What happens when our computers get smarter than we are?
I've recently rewatched this video with Rob Miles about a potential solution to AI alignment, but when I googled it to learn more about it I only got results from years ago. To date it's the best solution to the alignment problem I've seen and I haven't heard more about it. I wonder if there's been more research done about it.
For people not familiar with this approach, it basically comes down to the AI aligning itself with humans by observing us and trying to learn what our reward function is, without us specifying it explicitly. So it's basically trying to optimize the same reward function as we are. The only criticism of it I can think of is that it's much slower and more difficult to train an AI this way, as there has to be a human in the loop throughout the whole learning process, so you can't just leave it running for days to get more intelligent on its own. But if that's the price for safe AI, isn't it worth paying when the alternative with an unsafe AI is human extinction?
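To make the idea concrete, here is a toy sketch of reward learning from observed human choices. This is my own minimal illustration, not the specific method from the Rob Miles video: it assumes a hypothetical "human" who picks options noisily in proportion to a hidden linear reward (Boltzmann rationality), and the learner recovers the reward weights by maximizing the likelihood of the observed choices. All names and numbers are made up for illustration.

```python
import math
import random

random.seed(0)
W_TRUE = [2.0, -1.0]  # hidden human preference weights (hypothetical)

def boltzmann_choice(options, weights):
    """Pick option i with probability proportional to exp(weights . features_i)."""
    scores = [math.exp(sum(w * f for w, f in zip(weights, feats))) for feats in options]
    r = random.random() * sum(scores)
    for i, s in enumerate(scores):
        r -= s
        if r <= 0:
            return i
    return len(options) - 1

# Generate demonstrations: each round the "human" chooses among 3 options,
# each described by 2 features.
demos = []
for _ in range(500):
    options = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(3)]
    demos.append((options, boltzmann_choice(options, W_TRUE)))

# The learner never sees W_TRUE; it does gradient ascent on the
# log-likelihood of the human's observed choices.
w = [0.0, 0.0]
lr = 0.1
for _ in range(150):
    grad = [0.0, 0.0]
    for options, chosen in demos:
        scores = [math.exp(sum(wi * f for wi, f in zip(w, feats))) for feats in options]
        z = sum(scores)
        for d in range(2):
            expected = sum(s * feats[d] for s, feats in zip(scores, options)) / z
            grad[d] += options[chosen][d] - expected
    w = [wi + lr * g / len(demos) for wi, g in zip(w, grad)]

print("recovered weights:", w)  # should point in the same direction as W_TRUE
```

The point of the sketch is the post's observation: every data point requires a human choice, so this kind of learner is bottlenecked on human time in a way that ordinary self-supervised training is not.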
Add a degree of uncertainty into the AI system's understanding of (1) its objectives and (2) how to reach those objectives.
Make the human user the ultimate arbiter, such that the AI system engages with the user to reduce uncertainty before acting. This way the bounds of human certainty contain the AI system's certainty.
Has this been suggested and dismissed a thousand times before? I know Stuart Russell previously proposed adding uncertainty into the AI system. How would this approach fail?
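Here is a tiny sketch of the deferral idea in the two bullets above. It is my own toy framing, not Russell's formal assistance-game model: the agent holds beliefs over several candidate objectives, and when the candidates disagree about the best action, it asks the human instead of acting. The hypothesis names and reward numbers are invented for illustration.

```python
# Candidate reward functions the agent entertains, with prior beliefs.
# Rewards for each possible action under each hypothesis (hypothetical numbers).
hypotheses = {
    "maximise_output": {"prior": 0.5, "rewards": {"act_fast": 10, "act_safe": 4, "wait": 0}},
    "avoid_harm":      {"prior": 0.5, "rewards": {"act_fast": -50, "act_safe": 5, "wait": 0}},
}

def best_action():
    actions = ["act_fast", "act_safe", "wait"]
    # Expected reward of each action under the agent's current beliefs.
    expected = {a: sum(h["prior"] * h["rewards"][a] for h in hypotheses.values())
                for a in actions}
    # Best action under each hypothesis separately: if the hypotheses
    # disagree, the choice is sensitive to which objective is really the
    # human's, so defer to the human rather than gamble.
    per_hypothesis = {max(h["rewards"], key=h["rewards"].get) for h in hypotheses.values()}
    if len(per_hypothesis) > 1:
        return "ask_human"  # reduce uncertainty before acting
    return max(expected, key=expected.get)

print(best_action())  # the two hypotheses disagree, so the agent defers
```

The usual worry raised against schemes like this is the incentive to resolve the uncertainty prematurely, or for the learned beliefs to become confidently wrong, at which point the deferral condition stops firing.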
Posting here so that others who wish to protest can contact and join; please check with the Discord if you need help.
Imo, if there are widespread protests, we are going to see a lot more pressure to put a pause on the agenda.
Discord is here:
For the politics argument, I think people are acting as if we could just go up to Sam or Dario and say “it’s too dangerous now. Please press pause”.
Then the CEO would just tell the organization to pause and it would magically work.
That’s not what would happen. There will be a ton of disagreement about when it’s too dangerous. You might not be able to convince them.
You might not even be able to talk to them! Most people, including the people in the actual orgs, can’t just meet with the CEO.
Then, even if the CEO did tell the org to pause, there might be rebellion in the ranks. They might pull a Sam Altman and threaten to move to a different company that isn’t pausing.
And if just one company pauses, citing dangerous capabilities, you can bet that at least one AI company will defect (my money’s on Meta at the moment) and rush to build it themselves.
The only way for a pause to avoid the tragedy of the commons is to have an external party who can make us not fall into a defecting mess.
This is usually achieved via the government, and the government takes a long time. Even in the best case scenarios it would take many months to achieve, and most likely, years.
Therefore, we need to be working on this years before we think the pause is likely to happen.
We don’t know when AI will become dangerous.
There’s some possibility of a fast take-off.
There’s some possibility of threshold effects, where one day it’s fine, and the other day, it’s not.
There’s some possibility that we don’t see how it’s becoming dangerous until it’s too late.
We just don’t know when AI goes from being disruptive technology to potentially world-ending.
It might be able to destroy humanity before it can be superhuman at any one of our arbitrarily chosen intelligence tests.
It’s just a really complicated problem, and if you put together 100 AI devs and asked them when would be a good point to pause development, you’d get 100 different answers.
Well, you’d actually get 80 different answers and 20 saying “nEvEr! 100% oF tEchNoLoGy is gOod!!!” and other such unfortunate foolishness.
But we’ll ignore the vocal minority and get to the point: there is no moment at which it will be clear that “AI is safe now, and dangerous after this point.”
We are risking the lives of every sentient being in the known universe under conditions of deep uncertainty and we have very little control over our movements.
The response to that isn’t to rush ahead and then pause when we know it’s dangerous.
We can’t pause with that level of precision.
We won’t know when we’ll need to pause because there will be no stop signs.
There will just be warning signs.
Many of which we’ve already flown by.
Like AIs scoring better than the median human on most tests of skills, including IQ. Like AIs being generally intelligent across a broad swathe of skills.
We just need to stop as soon as we can, then we can figure out how to proceed actually safely.
I knew going into this experiment that the dataset would be effective, just based on prior research I have seen. I had no idea exactly how effective it could be, though. There is no point in aligning a model for safety purposes when you can undo hundreds of thousands of rows of alignment training with just 500 rows.
I am not releasing or uploading the model in any way. You can see the video of my experimentations with the dataset here: https://youtu.be/ZQJjCGJuVSA
Sitting around like happy frogs while the water heats up seems foolish; losing while fighting, even if it happens, is usually seen as more honorable.
Please share links, groups and opportunities for resistance. I know of PauseAI - any others?
There is also Remmelt, who has a much cleaner and clearer no-AGI mission. How can we coordinate? I feel like there could be a large "baptists and bootleggers" coalition here: from environmentalists worried about biosphere destruction, to creatives seeing their world falling apart, to tradcons seeing people build the Tower of Babel, all humans now equally threatened.
What if we created, and hear me out, a virus that would run on every electronic device and server? This virus would be like AlphaGo, meaning it is self-improving (autonomous) and superhuman in a linear domain. But it targets AI (neural networks) specifically. I mean, AI is digital, right? Why wouldn't it be affected by viruses?
And the question always gets brought up: we have no evidence of "lower" life forms controlling "superior" ones, which in theory is true, except for viruses. I mean, the world literally shut down during the one that starts with C. Why couldn't we repeat the same but for neural networks?
So I propose an AlphaGo-like narrow AI, but for a "super" virus that would self-improve over time and be autonomous and hard to detect. Then no one could pull the "plug," and the ASI could not manipulate or execute its escape, because the virus could be present in some form wherever it goes. It would be ASI+++ in its domain, because its capability only goes in one direction.
I got this idea from the Anthropic CEO's latest interview, where he says AI may be able to "multiply" and "survive" on its own by next year. Perfect for a self-improving "virus" of sorts. This would be a protective atmosphere of sorts, one that no country, company, or individual could escape either.
Researchers from the University of Toronto argue for a link between energy and information!
The paper: https://www.mdpi.com/1099-4300/26/3/203
What does this mean for AI? Could information processing ever take over the world?