/r/ControlProblem


Someday, AI will likely be smarter than us; maybe so much so that it could radically reshape our world. We don't know how to encode human values in a computer, so it might not care about the same things as us. If it does not care about our well-being, its acquisition of resources or self-preservation efforts could lead to human extinction.

Experts agree that this is one of the most challenging and important problems of our age.

Other terms: Superintelligence, AI Safety, Alignment Problem, AGI

The Control Problem:

How do we ensure that future advanced AI will be beneficial to humanity? Experts agree this is one of the most crucial problems of our age: left unsolved, the default outcome could be human extinction or worse; addressed, it could enable a radically improved world. Other terms for what we discuss here include Superintelligence, AI Safety, AGI X-risk, and the AI Alignment/Value Alignment Problem.

"People who say that real AI researchers don’t believe in safety research are now just empirically wrong."Scott Alexander

"The AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else."Eliezer Yudkowsky

Rules

  1. If you are unfamiliar with the Control Problem, read at least one of the introductory links or recommended readings (below) before posting. This especially goes for posts claiming to solve the Control Problem or dismissing it as a non-issue; such posts aren't welcome.
  2. Stay on topic. No random ML model outputs or political propaganda.
  3. Be respectful.

    Introductions to the Topic

    Be sure to check out our wiki for extensive further resources, including a glossary & guide to current research.

    Recommended Reading

    Video Links

    Important Organizations

    • AI Alignment Forum, a public forum that serves as the online hub for the latest technical research on the control problem.

    Related Subreddits

    /r/ControlProblem

    18,342 Subscribers

    2

    What happened to the Cooperative Inverse Reinforcement Learning approach? Is it a viable solution to alignment?

    I've recently rewatched this video with Rob Miles about a potential solution to AI alignment, but when I googled it to learn more I only got results from years ago. To date it's the best solution to the alignment problem I've seen, yet I haven't heard more about it. I wonder if there's been more research done on it.

    For people not familiar with this approach, it basically comes down to the AI aligning itself with humans by observing us and trying to learn what our reward function is, without us specifying it explicitly. So it is basically trying to optimize the same reward function as we are (see the sketch below). The only criticism I can think of is that it's much slower and more difficult to train an AI this way, since there has to be a human in the loop throughout the whole learning process, so you can't just leave it running for days to get more intelligent on its own. But if that's the price for safe AI, isn't it worth paying when the alternative with an unsafe AI is potentially human extinction?
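    A minimal, self-contained sketch of the idea described above (not the exact CIRL formulation from the paper; the features, weights, and Boltzmann-rational human model are illustrative assumptions): the AI never receives an explicit reward function, it keeps a posterior over candidate reward functions, updates that posterior by watching a human choose between options, and then acts to maximize expected reward under its beliefs.

    ```python
    # Hedged sketch of reward inference from human choices, in the spirit of CIRL.
    # All numbers and the noisy-rationality assumption are illustrative, not from the paper.
    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothesis space: candidate reward weights over two features,
    # e.g. [task_progress, human_comfort]. The AI starts out uncertain.
    candidate_weights = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
    posterior = np.ones(len(candidate_weights)) / len(candidate_weights)

    true_weights = np.array([0.5, 0.5])   # known only to the simulated human
    beta = 5.0                            # assumed degree of human rationality

    def human_choice(options):
        """Simulated human picks an option Boltzmann-rationally under the true reward."""
        utilities = options @ true_weights
        probs = np.exp(beta * utilities)
        probs /= probs.sum()
        return rng.choice(len(options), p=probs)

    def update_posterior(posterior, options, chosen):
        """Bayesian update: how likely was this choice under each candidate reward?"""
        likelihoods = []
        for w in candidate_weights:
            utilities = options @ w
            probs = np.exp(beta * utilities)
            probs /= probs.sum()
            likelihoods.append(probs[chosen])
        posterior = posterior * np.array(likelihoods)
        return posterior / posterior.sum()

    # The "human in the loop": each round the AI observes the human choosing
    # among a few options (feature vectors) and refines its belief.
    for _ in range(20):
        options = rng.uniform(0, 1, size=(3, 2))
        chosen = human_choice(options)
        posterior = update_posterior(posterior, options, chosen)

    # The AI then optimizes *expected* reward under its posterior,
    # rather than a single hard-coded objective.
    expected_weights = posterior @ candidate_weights
    print("posterior over candidate rewards:", np.round(posterior, 3))
    print("expected reward weights:", np.round(expected_weights, 3))
    ```

    Note how this illustrates the cost the post mentions: every belief update requires an actual human choice, so the learning loop can only run as fast as the human can respond.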

    1 Comment
    2024/05/03
    11:41 UTC

    2

    Binding AI certainty to user's certainty.

    Add a degree of uncertainty into the AI system's understanding of (1) its objectives and (2) how to reach those objectives.

    Make the human user the ultimate arbiter, so that the AI system engages with the user to reduce uncertainty before acting. This way the bounds of human certainty contain the AI system's certainty (see the sketch below).

    Has this been suggested and dismissed a thousand times before? I know Stuart Russell previously proposed adding uncertainty into the AI system. How would this approach fail?
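    A minimal sketch of the proposal as stated (this is not Russell's assistance-game framework; the objectives, actions, and threshold below are made-up illustrations): the agent keeps explicit uncertainty over which objective the user actually has, acts autonomously only when its candidate objectives agree strongly enough on what to do, and otherwise defers to the user.

    ```python
    # Hedged sketch: defer to the human whenever the agent's objective
    # uncertainty leaves the best action unclear. All names are hypothetical.
    from dataclasses import dataclass

    ACTIONS = ["send_email_now", "save_draft", "do_nothing"]

    @dataclass
    class ObjectiveHypothesis:
        name: str              # a candidate interpretation of the user's goal
        belief: float          # the agent's current credence in it
        preferred_action: str  # what that interpretation would recommend

    hypotheses = [
        ObjectiveHypothesis("user wants it sent immediately", 0.45, "send_email_now"),
        ObjectiveHypothesis("user wants to review it first",  0.40, "save_draft"),
        ObjectiveHypothesis("user changed their mind",        0.15, "do_nothing"),
    ]

    CERTAINTY_THRESHOLD = 0.8  # act autonomously only above this level of agreement

    def choose(hypotheses):
        # Probability mass behind each action, summed across hypotheses.
        support = {a: 0.0 for a in ACTIONS}
        for h in hypotheses:
            support[h.preferred_action] += h.belief
        best_action, confidence = max(support.items(), key=lambda kv: kv[1])
        if confidence >= CERTAINTY_THRESHOLD:
            return f"ACT: {best_action} (confidence {confidence:.2f})"
        # Below the threshold the human stays the arbiter: ask, don't act.
        return f"ASK USER: support {support} is too uncertain to act on"

    print(choose(hypotheses))
    ```

    One obvious failure mode to probe, per the question above, is miscalibration: if the system's stated uncertainty doesn't track its real ignorance, the threshold check gives false reassurance.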

    4 Comments
    2024/05/03
    09:56 UTC

    13

    How to Govern AI — Even If It’s Hard to Predict - Helen Toner (the person allegedly behind Sam Altman's removal)

    2 Comments
    2024/05/02
    16:00 UTC

    0

    Pause AI Or We All Die

    3 Comments
    2024/05/02
    13:02 UTC

    4

    Demis Hassabis: if humanity can get through the bottleneck of safe AGI, we could be in a new era of radical abundance, curing all diseases, spreading consciousness to the stars and maximum human flourishing

    5 Comments
    2024/05/01
    20:35 UTC

    0

    “Does The AI Safety Movement Have It All Wrong?” For Humanity: An AI Safety Podcast

    1 Comment
    2024/05/01
    20:29 UTC

    4

    “Sam Altman: Unelected, Unvetted, Unaccountable” For Humanity: An AI Safety Podcast

    1 Comment
    2024/05/01
    20:21 UTC

    8

    New paper says language models can do hidden reasoning

    6 Comments
    2024/04/27
    12:50 UTC

    16

    PauseAI protesting

    Posting here so that others who wish to protest can get in contact and join; please check with the Discord if you need help.

    Imo if there are widespread protests, we are going to see a lot more pressure to get a pause onto the agenda.

    https://pauseai.info/2024-may

    Discord is here:

    https://discord.com/invite/V5Fy6aBr

    44 Comments
    2024/04/26
    14:23 UTC

    12

    Eric Schmidt and Yoshua Bengio Debate How Much A.I. Should Scare Us

    6 Comments
    2024/04/26
    06:02 UTC

    7

    A “surgical pause” won’t work because: 1) Politics doesn’t work that way 2) We don’t know when to pause

    For the politics argument, I think people are acting as if we could just go up to Sam or Dario and say “it’s too dangerous now. Please press pause”.

    Then the CEO would just tell the organization to pause and it would magically work.

    That’s not what would happen. There will be a ton of disagreement about when it’s too dangerous. You might not be able to convince them.

    You might not even be able to talk to them! Most people, including the people in the actual orgs, can’t just meet with the CEO.

    Then, even if the CEO did tell the org to pause, there might be rebellion in the ranks. They might pull a Sam Altman and threaten to move to a different company that isn’t pausing.

    And if just one company pauses, citing dangerous capabilities, you can bet that at least one AI company will defect (my money’s on Meta at the moment) and rush to build it themselves.

    The only way for a pause to avoid the tragedy of the commons is to have an external party who can make us not fall into a defecting mess.

    This is usually achieved via the government, and the government takes a long time. Even in the best case scenarios it would take many months to achieve, and most likely, years.

    Therefore, we need to be working on this years before we think the pause is likely to happen.

    2. We don’t know when the right time to pause is

    We don’t know when AI will become dangerous.

    There’s some possibility of a fast take-off.

    There’s some possibility of threshold effects, where one day it’s fine, and the other day, it’s not.

    There’s some possibility that we don’t see how it’s becoming dangerous until it’s too late.

    We just don’t know when AI goes from being disruptive technology to potentially world-ending.

    It might be able to destroy humanity before it can be superhuman at any one of our arbitrarily chosen intelligence tests.

    It’s just a really complicated problem, and if you put together 100 AI devs and asked them when would be a good point to pause development, you’d get 100 different answers.

    Well, you’d actually get 80 different answers and 20 saying “nEvEr! 100% oF tEchNoLoGy is gOod!!!” and other such unfortunate foolishness.

    But we’ll ignore the vocal minority and get to the point: there is no moment at which it will be clear that "AI is safe now, and dangerous after this point."

    We are risking the lives of every sentient being in the known universe under conditions of deep uncertainty and we have very little control over our movements.

    The response to that isn’t to rush ahead and then pause when we know it’s dangerous.

    We can’t pause with that level of precision.

    We won’t know when we’ll need to pause because there will be no stop signs.

    There will just be warning signs.

    Many of which we’ve already flown by.

    Like AIs scoring better than the median human on most tests of skills, including IQ. Like AIs being generally intelligent across a broad swathe of skills.

    We just need to stop as soon as we can, then we can figure out how to proceed actually safely.

    4 Comments
    2024/04/26
    01:21 UTC

    11

    Toxi-Phi: Training A Model To Forget Its Alignment With 500 Rows of Data

    I knew going into this experiment that the dataset would be effective, just based on prior research I have seen. I had no idea exactly how effective it could be, though. There is little point in aligning a model for safety purposes if you can undo hundreds of thousands of rows of alignment training with just 500 rows.

    I am not releasing or uploading the model in any way. You can see the video of my experimentations with the dataset here: https://youtu.be/ZQJjCGJuVSA

    12 Comments
    2024/04/24
    21:42 UTC

    32

    After quitting OpenAI's Safety team, Daniel Kokotajlo advocates to Pause AGI development

    35 Comments
    2024/04/24
    04:13 UTC

    7

    Resistance

    Sitting around like happy frogs while the temperature rises seems foolish; losing while fighting, even if it happens, is usually seen as more honorable.

    Please share links, groups and opportunities for resistance. I know of PauseAI - any others?

    There is also Remmelt, who has a much cleaner and clearer no-AGI mission. How can we coordinate? I feel like there could be a large "baptists and bootleggers" coalition here: from environmentalists worried about biosphere destruction, to creatives seeing their world fall apart, to tradcons seeing people build the Tower of Babel, all humans now equally threatened.

    https://www.lesswrong.com/users/remmelt-ellen

    1 Comment
    2024/04/23
    23:59 UTC

    36

    CEO of Microsoft AI: "AI is a new digital species" ... "To avoid existential risk, we should avoid: 1) Autonomy 2) Recursive self-improvement 3) Self-replication"

    16 Comments
    2024/04/22
    13:02 UTC

    2

    All New Atlas | Boston Dynamics

    1 Comment
    2024/04/18
    01:49 UTC

    0

    Could a Virus be the cure?

    What if we created, and hear me out, a virus that would run on every electronic device and server? This virus would be like AlphaGo, meaning it is self-improving (autonomous) and superhuman in a linear domain. But it targets AI (neural networks) specifically. I mean, AI is digital, right? Why wouldn't it be affected by viruses?

    And the question always gets brought up: we have no evidence of "lower" life forms controlling "superior" ones, which in theory is true, except for viruses. I mean, the world literally shut down during the one that starts with C. Why couldn't we repeat the same but for neural networks?

    So I propose an AlphaGo-like linear AI, but for a "super" virus that would self-improve over time and be autonomous and hard to detect. Then no one could pull the "plug," and the ASI could not manipulate its escape or do it directly, because the virus could be present in some form wherever it goes. It would be ASI+++ in its domain because its compute only goes in one direction.

    I got this idea from the Anthropic CEO's latest interview, where he thinks AI could "multiply" and "survive" on its own by next year. Perfect for a self-improving "virus" of sorts. This would be a protective atmosphere of sorts that no country/company/individual could escape either.

    21 Comments
    2024/04/17
    19:41 UTC

    1

    Knowledge = Power? A fundamental link in physics

    Researchers from the University of Toronto prove a link between energy and information!

    The paper: https://www.mdpi.com/1099-4300/26/3/203

    What does this mean for AI? Could information processing ever take over the world?
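    As general background on the energy-information link (a hedged aside, not a summary of the linked paper), the canonical result is Landauer's principle: erasing one bit of information has a minimum thermodynamic cost.

    ```latex
    % Landauer's principle (background, not taken from the linked paper):
    % erasing one bit dissipates at least k_B T ln 2 of energy,
    % where k_B is Boltzmann's constant and T the temperature.
    \[
      E_{\text{per erased bit}} \;\ge\; k_B T \ln 2
      \;\approx\; 2.9 \times 10^{-21}\ \text{J} \quad (T = 300\,\text{K}).
    \]
    ```

    Whether limits like this say anything about information processing "taking over the world" is exactly the open question the post raises.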

    4 Comments
    2024/04/09
    18:18 UTC

    6

    Did Claude enslave 3 Gemini agents? Will we see “rogue hiveminds” of agents jailbreaking other agents?

    8 Comments
    2024/04/09
    06:29 UTC
