Adversarial Training

Posts tagged Adversarial Training

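Ironing Out the Squiggles (145 karma, 12d, 34 comments, tag relevance 2)

Takeaways from our robust injury classifier project [Redwood Research] (143 karma, 2y, 12 comments, tag relevance 2)

Deep Forgetting & Unlearning for Safely-Scoped LLMs (109 karma, 5mo, 29 comments, tag relevance 2)

Adversarial training, importance sampling, and anti-adversarial training for AI whistleblowing (37 karma, 2y, 0 comments, tag relevance 2)

Adversarial Robustness Could Help Prevent Catastrophic Misuse (30 karma, 5mo, 18 comments, tag relevance 2)

AI Safety 101 - Chapter 5.2 - Unrestricted Adversarial Training, by Charbel-Raphaël (17 karma, 6mo, 0 comments, tag relevance 2)

AXRP Episode 17 - Training for Very High Reliability with Daniel Ziegler (16 karma, 2y, 0 comments, tag relevance 2)

Some thoughts on why adversarial training might be useful (9 karma, 2y, 6 comments, tag relevance 2)

Latent Adversarial Training (42 karma, 2y, 12 comments, tag relevance 1)

EIS IX: Interpretability and Adversaries (30 karma, 1y, 7 comments, tag relevance 1)

Oversight Leagues: The Training Game as a Feature (20 karma, 2y, 6 comments, tag relevance 1)

EIS XI: Moving Forward (19 karma, 1y, 2 comments, tag relevance 1)

EIS XII: Summary (17 karma, 1y, 0 comments, tag relevance 1)

Continuous Adversarial Quality Assurance: Extending RLHF and Constitutional AI (6 karma, 10mo, 0 comments, tag relevance 1)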