/r/mlsafety


ML/AI/DL research toward making models safer, more reliable, and more aligned

https://twitter.com/topofmlsafety | newsletter.mlsafety.org


340 Subscribers

1

Improved Techniques for Optimization-Based Jailbreaking on Large Language Models

0 Comments
2024/06/04
19:01 UTC

1

Efficient Adversarial Training in LLMs with Continuous Attacks: proposes a method for LLM adversarial training that does not require expensive discrete optimization steps (see the sketch after this entry).

0 Comments
2024/05/29
22:02 UTC
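
A minimal sketch of the continuous-attack idea named in the post above, not the paper's actual algorithm: rather than searching over discrete tokens (as in optimization-based jailbreaks like GCG), perturb the input embeddings with a few PGD steps, then train the model on the perturbed embeddings. The function name, hyperparameters, and loop structure here are illustrative assumptions.

```python
import torch

def continuous_adversarial_step(model, input_ids, labels,
                                eps=0.05, alpha=0.01, pgd_steps=3):
    """One adversarial training step using embedding-space PGD
    (illustrative; hyperparameters are not from the paper)."""
    # Work in continuous embedding space instead of discrete tokens.
    embeds = model.get_input_embeddings()(input_ids).detach()
    delta = torch.zeros_like(embeds, requires_grad=True)

    # Inner loop: ascend the loss w.r.t. a continuous perturbation.
    for _ in range(pgd_steps):
        out = model(inputs_embeds=embeds + delta, labels=labels)
        grad, = torch.autograd.grad(out.loss, delta)
        with torch.no_grad():
            delta += alpha * grad.sign()
            delta.clamp_(-eps, eps)  # project back into the L-inf ball

    # Outer step: minimize the loss on the perturbed embeddings.
    loss = model(inputs_embeds=embeds + delta.detach(), labels=labels).loss
    loss.backward()
    return loss.item()
```

Because the inner loop is a handful of gradient steps in embedding space, each adversarial example costs a few extra forward/backward passes rather than the thousands of evaluations discrete token search typically needs.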

2

Using Degeneracy in the Loss Landscape for Mechanistic Interpretability

0 Comments
2024/05/28
22:05 UTC

2

Benchmark Early and Red Team Often: A Framework for Assessing and Managing Dual-Use Hazards of AI Foundation Models

0 Comments
2024/05/27
21:53 UTC

2

"Our testbed, which we call Poser, is a step toward evaluating whether developers would be able to detect alignment faking."

0 Comments
2024/05/13
17:02 UTC

3

Paid facilitator roles for AI Safety, Ethics, and Society, a 12-week online course running July-October 2024. Apply by May 31st!

We are excited to announce the launch of AI Safety, Ethics, and Society, a textbook on AI safety by Dan Hendrycks, Director of the Center for AI Safety, which is freely available!

We will be running a 12-week free online course in summer 2024, following a curriculum based on the textbook. Apply by May 31st to take part.

We are also actively seeking people with experience in AI safety (such as previous Intro to ML Safety participants) to serve as paid course facilitators - you can learn more and apply here.

Key topics discussed in the textbook and course include:

  • Fundamentals of modern AI systems and deep learning, scaling laws, and their implications for AI safety
  • Technical challenges in building safe AI including opaqueness, proxy gaming, and adversarial attacks, and their consequences for managing AI risks
  • The diverse sources of societal-scale risks from advanced AI, such as malicious use, accidents, rogue AI, and the role of AI racing dynamics and organizational risks
  • The importance of focusing on the safety of the sociotechnical systems within which AI is embedded, the relevance of safety engineering and complex systems theory, and approaches to managing tail events and black swans
  • Collective action problems associated with AI development and challenges with building cooperative AI systems
  • Approaches to AI governance, including safety standards and international treaties, and trade-offs between centralised and decentralised access to advanced AI
0 Comments
2024/04/25
03:55 UTC

2

$250K in Prizes: SafeBench Competition Announcement

The Center for AI Safety is excited to announce SafeBench, a competition to develop benchmarks for empirically assessing AI safety! This project is supported by Schmidt Sciences, with $250,000 in prizes available for the best benchmarks - submissions are open until February 25th, 2025.

To view additional info about the competition, including submission guidelines, example ideas and FAQs, visit https://www.mlsafety.org/safebench

If you are interested in receiving updates about SafeBench, feel free to sign up on our homepage here.

0 Comments
2024/03/27
16:21 UTC
