Follow-up to: Reshaping the AI Industry: Straightforward Appeals to Insiders


Introduction

The central issue with convincing people of the AI Risk is that the arguments for it are not respectable. In the public consciousness, the well's been poisoned by media, which relegated AGI to the domain of science fiction. In the technical circles, the AI Winter is to blame — there's a stigma against expecting AGI in the short term, because the field's been burned in the past.

As such, being seen taking the AI Risk seriously is bad for your status. It wouldn't advance your career, it wouldn't receive popular support or peer support, it wouldn't get you funding or an in with powerful entities. It would waste your time, if not mark you as a weirdo.

The problem, I would argue, lies only partly in the meat of the argument. Certainly, the very act of curtailing the AI capabilities research would step on some organizations' toes, and mess with people's careers. Some of the resistance is undoubtedly motivated by these considerations.

It's not, however, the whole story. If it were, we could've expected widespread public support, and political support from institutions which would be hurt by AI proliferation.

A large part of the problem lies in the framing of the arguments. The specific concept of AGI and risks thereof is politically poisonous, parsed as fictional nonsense or a social faux pas. And yet this is exactly what we reach for when arguing our cause. We talk about superintelligent entities worming their way out of boxes, make analogies to human superiority over animals and our escape from evolutionary pressures, extrapolate to a new digital species waging war on humanity.

That sort of talk is not popular with anyone. The very shape it takes and the social signals it sends doom it to failure.

Can we talk about something else instead? Can we reframe our arguments?


The Power of Framing

Humanity has developed a rich suite of conceptual frameworks to talk about the natural world. We can view it through the lens of economy, of physics, of morality, of art. We can emphasize certain aspects of it while abstracting others away. We can take a single set of facts, and spin innumerable different stories out of them, without even omitting or embellishing any of them, simply by playing with emphases.

The same ground-truth reality can be comprehensively described in many different ways, simply by applying different conceptual frameworks. If humans were ideal reasoners, the choice of framework or narrative wouldn't matter: we would extract the ground-truth facts from the semantics, and reach the conclusion we were always going to reach.

We are not, however, ideal reasoners. What spin we give to the facts matters.

The classical example goes as follows:

Participants were asked to choose between two treatments for 600 people affected by a deadly disease. Treatment A was predicted to result in 400 deaths, whereas treatment B had a 33% chance that no one would die but a 66% chance that everyone would die. This choice was then presented to participants either with positive framing, i.e. how many people would live, or with negative framing, i.e. how many people would die.

Framing | Treatment A | Treatment B
Positive | "Saves 200 lives" | "A 33% chance of saving all 600 people, 66% possibility of saving no one."
Negative | "400 people will die" | "A 33% chance that no people will die, 66% probability that all 600 will die."

Treatment A was chosen by 72% of participants when it was presented with positive framing ("saves 200 lives"), dropping to 22% when the same choice was presented with negative framing ("400 people will die").
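
To make the equivalence explicit, here is a minimal sketch of the arithmetic behind the example. It assumes the quoted 33% and 66% are rounded versions of exactly 1/3 and 2/3, and the function and variable names are illustrative only. Under that assumption, both treatments have the same expected outcome, so the two framings differ only in wording.

```python
from fractions import Fraction

POPULATION = 600  # people affected by the disease in the example


def expected_deaths(outcomes):
    """Expected deaths for a list of (probability, deaths) pairs."""
    # Sanity check: the probabilities must cover every outcome.
    assert sum(p for p, _ in outcomes) == 1
    return sum(p * deaths for p, deaths in outcomes)


# Treatment A: 400 deaths with certainty.
treatment_a = [(Fraction(1), 400)]

# Treatment B: ASSUMED exact odds of 1/3 (no one dies) and 2/3 (everyone dies).
treatment_b = [(Fraction(1, 3), 0), (Fraction(2, 3), POPULATION)]

deaths_a = expected_deaths(treatment_a)
deaths_b = expected_deaths(treatment_b)

# The "positive" framing is the same quantity viewed from the other side.
saved_a = POPULATION - deaths_a
saved_b = POPULATION - deaths_b

print("expected deaths:", deaths_a, deaths_b)
print("expected lives saved:", saved_a, saved_b)
```

The treatments still differ in variance (A is certain, B is a gamble), so a preference between A and B is not irrational in itself. The framing effect is that the preference flips when only the description changes.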

As another example, we can imagine two descriptions of an island — one that waxes rhapsodic on its picturesque landscapes, and one that dryly lists the island's contents in terms of their industrial uses. One would imagine that reading one or the other would have different effects on the reader's desire to harvest that island, even if both descriptions communicated the exact same set of facts.

More salient examples exist in the worlds of journalism and politics — these industries have developed advanced tools for telling any story in a way that advances the speaker's agenda.

Fundamentally, language matters. The way you speak, the conceptual handles you use, the facts you emphasize and the story you tell, have social connotations that go beyond the literal truths of your statements.

And the AGI frame is, bluntly, a bad one. To those outside our circles, to anyone not feeling charitable, it communicates detachment from reality, fantastical thinking, overhyping, low status.

On top of that, framing has disproportionate effects on people with domain knowledge. Trying to convince a professional of something while using a bad frame is a twice-doomed endeavor.


What Frame Do We Want?

[Successful policies] allow people to continue to pretend to be trying to get the thing they want to pretend to want while actually getting more other things they actually want even if they can deny it. (Robin Hanson)

We don't have to use the AGI frame, I would argue. If the problem is with specific terms, such as "intelligence" and "AGI", we can start by tabooing them and other "agenty" terms, then seeing what convincing arguments we can come up with under these restrictions.

More broadly, we can repackage our arguments using a different conceptual framework — the way a poetic description of an island could be translated into utilitarian terms to advance the cause of resource-extraction. We simply have to look for a suitable one.  (I'll describe a concrete approach I consider promising in the next section.)

What we need is a frame of argumentation that is, at once:

  • Robust. It isn't a lie or mischaracterization, and wouldn't fall apart under minimal scrutiny. It is, fundamentally, a valid way to discuss what we're currently calling "the AI Risk".
  • Respectable. Being seen acting on it doesn't cost people social points, and indeed, grants them social points. (Alternatively, not acting on it once it's been made common knowledge costs social points.)
  • Safety-promoting. It causes people/companies to act in ways that reduce the AI Risk.

Also, as Rob notes:

Info about AGI propagates too slowly through the field, because when one ML person updates, they usually don't loudly share their update with all their peers. [...] On a gut level, they see that they have no institutional home and no super-widely-shared 'this is a virtuous and respectable way to do science' narrative.

By implication, there's a fair number of AI researchers who are "sold" on the AI Risk, but who can't publicly act on that belief because it'd have personal costs they're not willing to pay. Finding a frame that would be beneficial to be seen supporting would flip that dynamic: it would allow them to rally behind it, solve the coordination problem.


Potential Candidate

(I suggest taking the time to think about the problem on your own, before I potentially bias you.)

It seems that any effective framing would need to talk about AI systems as volitionless mechanisms, not agents. From that, a framework naturally offers itself: software products and integrity thereof.

It's certainly a valid way to look at the problem. AI models are software, and they're used for the same tasks mundane software is. More parallels:

  • Modern large software is often an incomprehensible mess of code, and we barely understand how it works — much like ML models.
  • This incomprehensibility gives rise to wide varieties of bugs and unintended behaviors, and their severity and potential for catastrophic failure scale with the complexity of the application.
  • Poorly-audited software contains a lot of security vulnerabilities and instabilities. AI, as well.
  • Much like security, Alignment Won't Happen By Accident.
  • Do What I Mean is the equivalent of the AI control problem: how can we tell the program what we really want, instead of what we technically programmed it to do?

Most people would agree that putting a program that was never code-audited and couldn't be bug-fixed in charge of critical infrastructure is madness. That, at least, should be a "respectable" way to argue for the importance of interpretability research, and the foolishness of putting ML systems in control of anything important.

Mind, "respectable" doesn't mean "popular": software security/reliability isn't exactly most companies' or users' top priority. But it's certainly viewed with more respect than the AI Risk. If we argued that integrity is especially important with regard to this particular software industry, we might get somewhere.

It wouldn't be smooth sailing, even then. We'd need to continuously argue that fixing "bugs" only after a failure has occurred "in the wild" is lethally irresponsible, and there would always be people trying to lower the standards for interpretability. But that should be relatively straightforward to oppose.

This much success would already be good. It would motivate companies that plan to use AI commercially to invest in interpretability, and make interpretability-focused research & careers more prestigious.

It wouldn't decisively address the real issue, however: AI labs conducting in-house experiments with large ML models. Some non-trivial work would need to be done to expand the frame, perhaps by developing a suite of arguments in which sufficiently powerful "glitches" could "spill over" into the environment. Making allusions to nuclear power and pollution, and borrowing some language from these subjects, might be a good way to start on that.

There would be some difficulties in talking about concrete scenarios, since they often involve AI models acting in unmistakably intelligent ways. But, for example, Paul Christiano's story would work with minimal adjustments, since the main "vehicle of agency" there is human economy.

To further ameliorate this problem, we can also imagine rolling out our arguments in stages. First, we may popularize the straightforward "AI as software" case that argues for interpretability and control of deployed models, as above. Then, once the language we use has been accepted as respectable and we've expanded the Overton Window accordingly, we may extrapolate, and discuss concrete examples that involve AI models exhibiting agenty behaviors. If we have sufficient momentum, they should be accepted as natural extensions of established arguments, instead of instinctively dismissed.


Comments

Here are some of my thoughts after reflecting on this post for a day. These ideas are somewhat disconnected from one another but hopefully in aggregate provide some useful commentary on different aspects of the "reframing AI risk" proposal:

  • The results of the AI Safety Arguments Competition should be released soon (Thomas Woodside told me last week they were wrapping up review of argument submissions). If it went well, then we may see some compelling reframings coming out of that.
  • I agree that AGI discussion has historically been stigmatized, but it seems to be becoming less so. DeepMind isn't shy about using the term AGI on their website and in their podcast. The same goes for OpenAI, which mentions "artificial general intelligence" in the headline of its About page, and whose CEO tweeted "AGI is gonna be wild" earlier this year. Elon Musk tweeted a few weeks ago that he thinks we'll see AGI by 2029. Google Trends shows that searches on the terms "AGI" and "artificial intelligence" have been at historically high levels since around 2016.

    Projecting forward, I believe that as increasingly impressive large model and multimodal feats like GPT-N, DALL-E, etc. continue to be announced, it will become more natural for ML experts and other people following these developments to think and talk about AGI. Instead of seeming like a wacky far-off sci-fi idea, AGI starts to look more like something that's coming down the pipe. So while I think reframing AI risk to get around AGI stigma might be useful for getting traction on alignment ideas sooner, I don't think the benefit would be as large as it would be if you're assuming that AGI skepticism is constant.
  • Changing the terminology has costs and risks. You touch on this in your post (the "Robust" point), but I think it's worth emphasizing the care that needs to be taken with a sudden change. I worry that if we start using a term like "advanced software security" instead of "AGI alignment", some ML researchers will be trying to decode what you're saying. Then when they probe your models and realize what you're talking about, their reaction will be "Oh look, the paranoid AGI transhumanists are getting sneaky and using euphemisms now". Changing the terminology could also be confusing to people who are sympathetic or open to taking the risks seriously.  
  • I thought this post from Matthew Yglesias was interesting, where he argues for somewhat the opposite approach to this post. He says that we should embrace analogies to The Terminator movies to make AI risk more concrete and relatable, at least when discussing it with broader audiences.
  • There is something that has been bothering me about the dynamics of the ML/AI field for a while now with respect to AGI risk - your post reminded me of it again, so I'm going to finally try to write it down here. The opinions of ML researchers and engineers across the software industry are granted elevated respect on this topic. For example, the influential Grace et al. 2017 survey on AI timelines surveyed researchers from NIPS/NeurIPS and ICML, which are conferences for people who work in ML broadly, not just for AGI researchers. Also, anecdotally, I have several friends who work in ML (but not on AGI research), and when I initially started talking to them about AGI risk they scoffed and looked down their noses (though over time they have become more sympathetic).

    My point is that working on narrow ML systems and AGI research are very different things. We currently treat anyone in the ML industry as having an expert opinion on AGI, but I don't think we should. Working on a spam classifier model or self-driving cars does not qualify you to talk about AGI. In fact, it tends to bias you to think that AGI is more of a pipe dream than you would otherwise, because you're accustomed to dealing with the embarrassing day-to-day failures of present-day ML systems, and you assume that's the state of the art while not having time to read about recent research out of DeepMind and OpenAI. I would be interested in a reframing that rebalanced the default status/respect granted in AGI risk discussions from "anyone who works in ML" to "people who work at AGI research labs and people who have been specifically studying AGI".

I think "AI is software" that has "bugs" is dangerously misleading - it incorrectly implies that we know what we want AI to do, and just need to be a little more careful about how we program it. But in reality, today's AIs are not programmed, but instead semi-randomly chosen by a process that we do not fully understand and are not fully in control of. I think it's the latter part that needs to be emphasized - we are not in control, and we do not know how to regain control of something that keeps getting further and further away from us, and a runaway crash is the only possible outcome of the current trajectory we are on.

In addition to being misleading, this just makes AI one more (small) facet of security.  But security is broadly underinvested in and there is limited government pushback.  In addition, there is already a security community which prioritizes other issues and thinks differently.  So this would place AI in the wrong metaphorical box.  

While I'm not a fan of the proposed solution, I do want to note that it's good that people are beginning to look at the problem.

ChatGPT was recently launched, and it is so powerful that it made me think about the problem of misuse of a powerful AI. It's a very powerful tool. No one really knows how to use it, but I am sure we will soon see it used as a tool for unpleasant things.

But I also see more and more perception of AI as a living entity with agency. People are having conversations with ChatGPT as if with a human.

Interesting proposal. Just finished reading and will be thinking on it.

One candidate for an alternative to "AGI safety" that is less precise but also less fraught is "ML safety", a term which I've noticed Dan Hendrycks using.

Would be very interested in your feedback, once you've thought it through.

Just shared some of my thoughts in a top-level comment.