Akash

Comments

Akash312

I found this answer helpful and persuasive– thank you!

Akash70

Potentially unpopular take, but if you have the skillset to do so, I'd rather you just come up with simple/clear explanations for why ARA is dangerous and what implications this has for AI policy, present these ideas to policymakers, and iterate on your explanations as you start to see where people get confused.

Note also that in the US, the NTIA has been tasked with making recommendations about open-weight models. The deadline for official submissions has passed, but I'm pretty confident that if you had something you wanted them to know, you could just email it to them and they'd take a look. My impression is that they're broadly aware of extreme risks from certain kinds of open-sourcing but might benefit from (a) clearer explanations of ARA threat models and (b) specific suggestions for what needs to be done.

Akash121

Why do you think we are dropping the ball on ARA?

I think many members of the policy community feel like ARA is "weird" and therefore don't want to bring it up. It's much tamer to talk about CBRN threats and bioweapons. Those topics also require less knowledge and general competence– explaining ARA and autonomous-systems risks is difficult: you get more questions, you're more likely to explain something poorly, etc.

Historically, there was also a fair amount of gatekeeping, where some of the experienced policy people were actively discouraging others from being explicit about AGI threat models (this still happens to some degree, but I think the effect is much weaker than it was a year ago).

With all this in mind, I currently think raising awareness about ARA threat models and AI R&D threat models is one of the most important things for AI comms/policy efforts to get right.

In the status quo, even if the evals go off, I don't think we have laid the intellectual foundation required for policymakers to understand why the capabilities those evals reveal are dangerous. "Oh interesting– an AI can make copies of itself? A little weird, but I guess we make copies of files all the time, shrug." or "Oh wow– AI can help with R&D? That's awesome– seems very exciting for innovation."

I do think there's potential to lay the intellectual foundation before it's too late, and I think many groups are starting to be more direct/explicit about the "weirder" threat models. Also, I think national security folks have more of a "take things seriously and worry about things even if there isn't clear empirical evidence yet" mentality than ML people. And I think typical policymakers fall somewhere in between.

Akash92

Minor note: Paul is at the US AI Safety Institute, while Jade & Geoffrey are at the UK AI Safety Institute. 

Akash51

@habryka I think you're making a claim about whether or not the difference matters (IMO it does) but I perceived @Kaj_Sotala to be making a claim about whether "an average reasonably smart person out in society" would see the difference as meaningful (IMO they would not). 

(My guess is you interpreted "reasonable people" to mean something like "people who are really into reasoning about the world and trying to figure out the truth" and Kaj interpreted "reasonable people" to mean something like "an average person." Kaj should feel free to correct me if I'm wrong.)

Akash51

My two cents RE particular phrasing:

When talking to US policymakers, I don't think there's a big difference between "causes a national security crisis" and "kills literally everyone." Worth noting that even though many in the AIS community see a big difference between "99% of people die but civilization restarts" and "100% of people die," IMO this distinction does not matter to most policymakers (or at least matters way less to them).

Of course, in addition to conveying "this is a big deal" you need to convey the underlying threat model. There are lots of ways to interpret "AI causes a national security emergency" (e.g., China, military conflict). "Kills literally everyone" probably leads people to envision a narrower set of worlds.

But IMO even "kills literally everybody" doesn't really convey the underlying misalignment/AI takeover threat model.

So my current recommendation (weakly held) is probably to go with "causes a national security emergency" or "overthrows the US government" and then accept that you have to do some extra work to actually get them to understand the "AGI → AI takeover → lots of people die and we lose control" model.

Akash83

Thanks! Despite the lack of SMART goals, I still feel like this reply gave me a better sense of what your priorities are & how you'll be assessing success/failure.

One failure mode– which I'm sure is already on your radar– is something like: "MIRI ends up producing lots of high-quality stuff but no one really pays attention. Policymakers and national security people are very busy and often only read things that (a) directly relate to their work or (b) are sent to them by someone who they respect."

Another is something like: "MIRI ends up focusing too much on making arguments/points that are convincing to general audiences but fails to understand the cruxes/views of the People Who Matter." (A strawman version of this is something like: "MIRI ends up spending a lot of time in the Bay, and there's lots of pressure to engage a bunch with the cruxes/views of rationalists, libertarians, e/accs, and AGI company employees. Meanwhile, the kinds of conversations happening among natsec folks & policymakers look very different, and MIRI's materials end up being less relevant/useful to this target audience.")

I'm extremely confident that these are already on your radar, but I figure it might be worth noting that these are two of the failure modes I'm most worried about. (I guess besides the general boring failure mode along the lines of "hiring is hard and doing anything is hard and maybe things just stay slow, and when someone asks what good materials you guys have produced the answer is still 'we're working on it'.")

(Final note: A lot of my questions and thoughts have been critical, but I should note that I appreciate what you're doing & I'm looking forward to following MIRI's work in the space! :D)

Akash42

Thank you! I still find myself most curious about the "how will MIRI make sure it understands its audience" and "how will MIRI make sure its materials are read by policymakers + natsec people" parts of the puzzle. Feel free to ignore this if we're getting too in the weeds, but I wonder if you can share more details about either of these parts.

There is also an audience-specific component, and to do well on that, we do need to understand our audience better. We are working to recruit beta readers from appropriate audience pools.

There are several approaches here, most of which will not be executed by the comms team directly; we hand off to others.

Akash2212

I'm surprised that some people are so interested in the idea of liability for extreme harms. I understand that, from a legal/philosophical perspective, there are some nice arguments about how companies should have to internalize the externalities of their actions, etc.

But in practice, I'd be fairly surprised if liability approaches were actually able to provide a meaningful incentive shift for frontier AI developers. My impression is that frontier AI developers already have fairly strong incentives to avoid catastrophes (e.g., it would be horrible for Microsoft if its AI model caused $1B in harms, and it would be horrible for Meta and the entire open-source movement if an open-source model caused $1B in damages).

And my impression is that most forms of liability would not affect this cost-benefit tradeoff by very much. This is especially true if the liability is only applied post-catastrophe. Extreme forms of liability could require insurance, but this essentially feels like a roundabout and less effective way of implementing some form of licensing (you have to convince us that risks are below an acceptable threshold to proceed).

I think liability also has the "added" problem of being quite unpopular, especially among Republicans. It is easy to attack liability regulation as anti-innovation, argue that it creates a moat (only big companies can afford to comply), and argue that it's just not how America ends up regulating things (we don't hold Adobe accountable for someone doing something bad with Photoshop).

To be clear, I don't think "something is politically unpopular" should be a full-stop argument against advocating for it.

But I do think that "liability for AI companies" scores poorly both on "actual usefulness if implemented" and on "political popularity/feasibility." I also think "liability for AI companies" advocacy often ends up in abstract philosophy land (to what extent should companies internalize externalities?) while avoiding some of the "weirder" points (we expect AI has a considerable chance of posing extreme national security risks, which is why we need to treat AI differently than Photoshop).

I would rather people just make the direct case that AI poses extreme risks & discuss the direct policy interventions that are warranted.

That said, I'm not an expert in liability and admittedly haven't been following the discussion in great detail (partly because the little I have seen has not convinced me that this is an approach worth investing in). I'd be interested in hearing more from people who have thought about liability– particularly concrete stories for how liability would be expected to meaningfully shift the incentives of labs. (See also here.)

Stylistic note: I'd prefer replies along the lines of "here is the specific argument for why liability would significantly affect lab incentives and how it would work in concrete cases" rather than replies along the lines of "here is a thing you can read about the general legal/philosophical arguments about how liability is good."
