Quick Takes

decision theory is no substitute for utility function

some people, upon learning about decision theories such as LDT and how they cooperate on problems such as the prisoner's dilemma, end up believing the following:

my utility function is about what i want for just me; but i'm altruistic (/egalitarian/cosmopolitan/pro-fairness/etc) because decision theory says i should cooperate with other agents. decision-theoretic cooperation is the true name of altruism.

it's possible that this is true for some people, but in general i expect that to be a mistaken anal... (read more)

Viliam (6h)

ah, it also annoys me when people say that caring about others can only be instrumental.

what does it even mean? helping other people makes me feel happy. watching a nice movie makes me feel happy. the argument that I don't "really" care about other people would also prove that I don't "really" care about movies etc.

I am happy for the lucky coincidence that decision theories sometimes endorse cooperation, but I would probably do that regardless. For example, if I had an option to donate something useful to a million people, or sell it to a dozen people, I would... (read more)

MinusGix (7h)
I agree, though I haven't seen many people proposing that. But also see So8res' Decision theory does not imply that we get to have nice things, though that is coming from the opposite direction (it starts with people invalidly assuming too much out of LDT cooperation). For our morals, though, I do think there's an active question of which pieces we feel better replacing with the more formal understanding, because there isn't a sharp distinction between our utility function and our decision theory. Some values trump others when given better tools. Still, I agree that replacing all the altruism components goes many steps farther than whatever the best solution is in that regard.
mako yass (7h)
An interesting question for me is how much true altruism is required to give rise to a generally altruistic society under high quality coordination frameworks. I suspect it's quite small. Another question is whether building coordination frameworks to any degree requires some background of altruism. I suspect that this is the case. It's the hypothesis I've accreted for explaining the success of post-war economies (war leads to a boom in altruism, generally increased fairness and mutual faith).
Mati_Roy (10h)

it seems to me that disentangling beliefs and values is an important part of being able to understand each other

and using words like "disagree" to mean both "different beliefs" and "different values" is really confusing in that regard

Viliam (6h)

Let's use "disagree" vs. "dislike".

So the usual refrain from Zvi and others is that the specter of China beating us to the punch with AGI is not real because of limits on compute, etc. I think Zvi has tempered his position on this in light of Meta's promise to release the weights of its 400B+ model. Now there is word that SenseTime just released a model that beats GPT-4 Turbo on various metrics. Of course, maybe Meta chooses not to release its big model, and maybe SenseTime is bluffing--I would point out, though, that Alibaba's Qwen model seems to do pretty okay in the arena... anyway, my point is that I don't think the "what if China" argument can be dismissed as quickly as some people on here seem ready to do.

Seth Herd (15h)
Are you saying that China will use Llama 3 400B weights as a basis for improving their research on LLMs? Or to make more tools from? Or to reach real AGI? Or what?
Andrew Burns (14h)
Yes, yes. Probably not. And they already have a Sora clone called Vidu, for heaven's sake. We spend all this time debating: should greedy companies be in control, should government intervene, will intervention slow progress toward the good stuff (cancer cures, longevity, etc.)? All of these arguments assume that WE (which I read as a gloss for the West) will have some say in the use of AGI. If the PRC gets it, and it is as powerful as predicted, these arguments become academic. And this is not because the Chinese are malevolent. It's because AGI would fall into the hands of the CCP via their civil-military fusion. This is a far more calculating group than those in Western governments. Here, officials have to worry about getting through the next election. There, they can more comfortably wield AGI for their ends while worrying less about the palatability of the means: observe how the population quietly endured a draconian lockdown and only meekly revolted when conditions began to deteriorate and containment looked futile. I am not an accelerationist. But I am a get-it-before-them-ist. Whether the West (which I count as including Korea, Japan, and Taiwan) can maintain its edge is an open question. A country that churns out PhDs and loves AI will not be easily thwarted.
niplav (8h)

The standard way of dealing with this:

Quantify how much worse the PRC getting AGI would be than OpenAI or the US government getting it, and how much existential risk there is from not pausing/pausing, or from the PRC/OpenAI/the US government building AGI first; then calculate whether pausing to do {alignment research, diplomacy, sabotage, espionage} has higher expected value than moving ahead.

(Is China getting AGI first half the value of the US getting it first, or 10%, or 90%?)

The discussion over pause or competition around AGI has been lacking this so far. Maybe I should write such an analysis.

Gentlemen, calculemus!
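A minimal sketch of the shape of that calculation (every number below is a hypothetical placeholder, not an estimate):

# hypothetical placeholder numbers; only the structure of the comparison matters
value_if_us_agi = 1.0     # normalize: US/OpenAI-built aligned AGI = 1
value_if_prc_agi = 0.5    # relative value if the PRC builds it first (the question above)
p_xrisk_race = 0.30       # chance of existential catastrophe if we race ahead
p_xrisk_pause = 0.20      # chance of existential catastrophe if we pause for alignment work etc.
p_prc_first_race = 0.20   # chance the PRC gets AGI first if we race
p_prc_first_pause = 0.60  # chance the PRC gets AGI first if we pause

def expected_value(p_xrisk, p_prc_first):
	survive = 1 - p_xrisk
	return survive * (p_prc_first * value_if_prc_agi + (1 - p_prc_first) * value_if_us_agi)

ev_race = expected_value(p_xrisk_race, p_prc_first_race)     # 0.63 with these placeholders
ev_pause = expected_value(p_xrisk_pause, p_prc_first_pause)  # 0.56 with these placeholders

The actual analysis would argue for the inputs rather than assume them; the point is only that once they are written down, the pause-versus-race comparison is a short calculation.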

I think that people who work on AI alignment (including me) have generally not put enough thought into the question of whether a world where we build an aligned AI is better by their values than a world where we build an unaligned AI. I'd be interested in hearing people's answers to this question. Or, if you want more specific questions:

  • By your values, do you think a misaligned AI creates a world that "rounds to zero", or still has substantial positive value?
  • A common story for why aligned AI goes well goes something like: "If we (i.e. humanity) align AI
... (read more)
Wei Dai (1d)
Thank you for detailing your thoughts. Some differences for me:

  1. I'm also worried about unaligned AIs as a competitor to aligned AIs/civilizations in the acausal economy/society. For example, suppose there are vulnerable AIs "out there" that can be manipulated/taken over via acausal means, unaligned AI could compete with us (and with others with better values from our perspective) in the race to manipulate them.
  2. I'm perhaps less optimistic than you about commitment races.
  3. I have some credence on max good and max bad being not close to balanced, that additionally pushes me towards the "unaligned AI is bad" direction.

ETA: Here's a more detailed argument for 1 that I don't think I've written down before. Our universe is small enough that it seems plausible (maybe even likely) that most of the value or disvalue created by a human-descended civilization comes from its acausal influence on the rest of the multiverse. An aligned AI/civilization would likely influence the rest of the multiverse in a positive direction, whereas an unaligned AI/civilization would probably influence the rest of the multiverse in a negative direction. This effect may outweigh what happens in our own universe/lightcone so much that the positive value from unaligned AI doing valuable things in our universe as a result of acausal trade is totally swamped by the disvalue created by its negative acausal influence.

I'm also worried about unaligned AIs as a competitor to aligned AIs/civilizations in the acausal economy/society. For example, suppose there are vulnerable AIs "out there" that can be manipulated/taken over via acausal means, unaligned AI could compete with us (and with others with better values from our perspective) in the race to manipulate them.

This seems like a reasonable concern.

My general view is that it seems implausible that much of the value from our perspective comes from extorting other civilizations.

It seems unlikely to me that >5% of the... (read more)

ryan_greenblatt (8h)
Naively, acausal influence should be in proportion to how much others care about what a lightcone controlling civilization does with our resources. So, being a small fraction of the value hits on both sides of the equation (direct value and acausal value equally). Of course, civilizations elsewhere might care relatively more about what happens in our universe than whoever controls it does. (E.g., their measure puts much higher relative weight on our universe than the measure of whoever controls our universe.) This can imply that acausal trade is extremely important from a value perspective, but this is unrelated to being "small" and seems more well described as large gains from trade due to different preferences over different universes. (Of course, it does need to be the case that our measure is small relative to the total measure for acausal trade to matter much. But surely this is true?) Overall, my guess is that it's reasonably likely that acausal trade is indeed where most of the value/disvalue comes from due to very different preferences of different civilizations. But, being small doesn't seem to have much to do with it.

conditionalization is not the probabilistic version of implies

P     Q     Q|P     P → Q
T     T     T       T
T     F     F       F
F     T     N/A     T
F     F     N/A     T

Resolution logic for conditionalization:

def conditional(P, Q):
	# "Q given P": undefined (None) when P is false
	if P:
		return Q
	else:
		return None

Resolution logic for implies:

def implies(P, Q):
	# material implication: vacuously true when P is false
	if P:
		return Q
	else:
		return True

# equivalently: return (not P) or Q
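A small numeric illustration of how the two come apart, using a made-up joint distribution over (P, Q):

# toy joint distribution; the probabilities are arbitrary illustrative values
joint = {
	(True, True): 0.1,
	(True, False): 0.1,
	(False, True): 0.4,
	(False, False): 0.4,
}

p_P = sum(pr for (p, q), pr in joint.items() if p)                      # P(P)      = 0.2
p_Q_given_P = sum(pr for (p, q), pr in joint.items() if p and q) / p_P  # P(Q|P)    = 0.5
p_implies = sum(pr for (p, q), pr in joint.items() if (not p) or q)     # P(P -> Q) = 0.9

Here P(P → Q) = 0.9 while P(Q | P) = 0.5, so the probability of the material implication is not the conditional probability.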

Agreed code as coordination mechanism

Code nowadays can do lots of things, from buying items to controlling machines. This makes code a possible coordination mechanism: if you can get multiple people to agree on what code should be run in particular scenarios and situations, that code can take actions on their behalf that might otherwise need to be coordinated.

This would require moving away from the "one person commits code and another person reviews it" model.

This could start with many people reviewing the code, people could write their own t... (read more)

faul_sname (3d)
Can you give a concrete example of a situation where you'd expect this sort of agreed-upon-by-multiple-parties code to be run, and what that code would be responsible for doing? I'm imagining something along the lines of "given a geographic boundary, determine which jurisdictions that boundary intersects for the purposes of various types of tax (sales, property, etc)". But I don't know if that's wildly off from what you're imagining.

Looks like someone has worked on this kind of thing, for different reasons: https://www.worlddriven.org/

Will_Pearson (3d)
I was thinking that evals which control deployment of LLMs could be something that needs multiple stakeholders to agree upon. But really it is a general-use pattern.
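A minimal sketch of what such a multi-stakeholder gate could look like, purely for illustration (the class and method names are hypothetical, not any existing system):

# a minimal illustrative sketch; names are hypothetical
from typing import Callable, Set

class AgreedAction:
	def __init__(self, code: Callable[[], None], required_approvers: Set[str]):
		self.code = code                              # the code everyone agreed should run
		self.required_approvers = required_approvers  # stakeholders who must sign off
		self.approvals: Set[str] = set()

	def approve(self, stakeholder: str) -> None:
		if stakeholder in self.required_approvers:
			self.approvals.add(stakeholder)

	def run_if_agreed(self) -> bool:
		# the agreed-upon code only executes once every required stakeholder has approved
		if self.approvals == self.required_approvers:
			self.code()
			return True
		return False

For example, an eval-gated model deployment could be wrapped as AgreedAction(deploy_model, {"lab", "auditor", "regulator"}) and would only run after all three call approve().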
Raemon (9d)

What would a "qualia-first-calibration" app look like?

Or, maybe: "metadata-first calibration"

The thing with putting probabilities on things is that often, the probabilities are made up. And the final probability throws away a lot of information about where it actually came from.

I'm experimenting with primarily focusing on "what are all the little-metadata-flags associated with this prediction?". I think some of this is about "feelings you have" and some of it is about "what do you actually know about this topic?"

The sort of app I'm imagining would he... (read more)

"what are all the little-metadata-flags associated with this prediction?"

Some metadata flags I associate with predictions:

  • what kinds of evidence went into this prediction? ('did some research', 'have seen things like this before', 'mostly trusting/copying someone else's prediction')
    • if I'm taking other people's predictions into account, there's a metadata-flag for 'what would my prediction be if I didn't consider other people's predictions?'
  • is this a domain in which I'm well calibrated?
  • is my prediction likely to change a lot, or have I already seen most of the evidence that I expect to for a while?
  • how important is this?
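A rough sketch of how these flags might hang off a prediction record (the field names are made up for illustration):

# a rough sketch; field names are hypothetical
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Prediction:
	claim: str
	probability: float                                       # the headline number
	evidence_kinds: List[str] = field(default_factory=list)  # 'did some research', 'copying someone else', ...
	inside_view_probability: Optional[float] = None          # my prediction ignoring other people's predictions
	well_calibrated_domain: Optional[bool] = None            # am I well calibrated in this domain?
	evidence_mostly_in: bool = False                         # have I already seen most of the evidence I expect to?
	importance: str = "low"                                  # how much this prediction matters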

My current main cruxes:

  1. Will AI get takeover capability? When?
  2. Single ASI or many AGIs?
  3. Will we solve technical alignment?
  4. Value alignment, intent alignment, or CEV?
  5. Defense>offense or offense>defense?
  6. Is a long-term pause achievable?

If there is reasonable consensus on any one of those, I'd much appreciate knowing about it. Else, I think these should be research priorities.

I offer no consensus, but my own opinions:

Will AI get takeover capability? When?

0-5 years.

Single ASI or many AGIs?

There will be a first ASI that "rules the world" because its algorithm or architecture is so superior. If there are further ASIs, that will be because the first ASI wants there to be. 

Will we solve technical alignment?

Contingent. 

Value alignment, intent alignment, or CEV?

For an ASI you need the equivalent of CEV: values complete enough to govern an entire transhuman civilization. 

Defense>offense or offense>defense?

Of... (read more)

AGI doom by noise-cancelling headphones:

ML is already used to train which sound waves to emit in order to cancel those from the environment. This works well with constant, high-entropy sound waves that are easy to predict, but not with low-entropy sounds like speech. Bose or Soundcloud or whoever train very hard on... (read more)


FWIW it was obvious to me

Seth Herd (12d)
Thanks! A joke explained will never get a laugh, but I did somehow get a cackling laugh from your explanation of the joke. I think I didn't get it because I don't think the trend line breaks. If you made a good enough noise reducer, it might well develop smart and distinct enough simulations that one would gain control of the simulator and potentially, from there, the world. See "A smart enough LLM might be deadly simply if you run it for long enough" if you want to hurt your head on this. I've thought about it a little because it's interesting, but not a lot, because I think we are probably killed by agents we made deliberately long before we're killed by accidentally emerging ones.
faul_sname (16d)
Fixed, thanks
nim (1d)

I've found an interesting "bug" in my cognition: a reluctance to rate subjective experiences on a subjective scale useful for comparing them. When I fuzz this reluctance against many possible rating scales, I find that it seems to arise from the comparison-power itself.

The concrete case is that I've spun up a habit tracker on my phone and I'm trying to build a routine of gathering some trivial subjective-wellbeing and lifestyle-factor data into it. My prototype of this system includes tracking the high and low points of my mood through the day as recalled ... (read more)

dirk (2d)

I'm not alexithymic; I directly experience my emotions and have, additionally, introspective access to my preferences. However, some things manifest directly as preferences which, I have been shocked to realize in my old age, were in fact emotions all along. (In rare cases these are even stronger than the directly-felt ones, despite reliably seeming on initial inspection to be simply neutral metadata.)

Viliam (1d)
Specific examples would be nice. Not sure if I understand correctly, but I imagine something like this: You always choose A over B. You have been doing it for such a long time that you forgot why. Without reflecting on this directly, it just seems like there is probably a rational reason or something. But recently, either accidentally or by experiment, you chose B... and realized that experiencing B (or expecting to experience B) creates unpleasant emotions. So now you know that the emotions were the real cause of choosing A over B all that time. (This is probably wrong, but hey, people say that the best way to elicit an answer is to provide a wrong one.)

Here's an example for you: I used to turn the faucet on while going to the bathroom, thinking it was due simply to a preference for somewhat masking the sound of my elimination habits from my housemates. Then one day I walked into the bathroom listening to something-or-other via earphones, forgetting to turn the faucet on, only to realize about halfway through that apparently I didn't actually much care about such masking; previously, being able to hear myself just seemed to trigger some minor anxiety about it that I'd failed to recognize, though its ab... (read more)

dirk (2d)

I'm against intuitive terminology [epistemic status: 60%] because it creates the illusion of transparency; opaque terms make it clear you're missing something, but if you already have an intuitive definition that differs from the author's, it's easy to substitute yours in without realizing you've misunderstood.

cubefox (1d)
I agree. This is unfortunately often done in various fields of research where familiar terms are reused as technical terms. For example, in ordinary language "organic" means "of biological origin", while in chemistry "organic" describes a type of carbon compound. Those two definitions mostly coincide on Earth (most such compounds are of biological origin), but when astronomers announce they have found "organic" material on an asteroid this leads to confusion.

Also astronomers: anything heavier than helium is a "metal"

Research Writing Workflow: First figure stuff out

  • Do research and first figure stuff out, until you feel like you are not confused anymore.
  • Explain it to a person, or a camera, or ideally to a person and a camera.
    • If there are any hiccups, expand your understanding.
    • Ideally, as the last step, explain it to somebody you have never explained it to before.
  • Only once you have made a presentation without hiccups are you ready to write the post.
    • If you have a recording, it is useful as a starting point.

I like the rough-thoughts way though. I'm not here to, like, read a textbook.

Nathan and Carson's Manifold discussion.

As of the last edit my position is something like:

"Manifold could have handled this better, so as not to force everyone with large amounts of mana to have to do something urgently, when many were busy. 

Beyond that they are attempting to satisfy two classes of people:

  • People who played to donate can donate the full value of their investments
  • People who played for fun now get the chance to turn their mana into money

To this end, and modulo the above hassle, this decision is good.

It is unclear to me whether there... (read more)


Nevertheless, lots of people were hassled. That has real costs, both to them and to you.

Nathan Young (1d)
If that were true then there are many ways you could partially do that - e.g. give people a set of tokens to represent their mana at the time of the devaluation, and if at a future point you raise, you could give them 10x those tokens back.
Nathan Young (1d)
I'm discussing with Carson. I might change my mind, but I don't know that I'll argue with both of you at once.

Have there been any great discoveries made by someone who wasn't particularly smart?

This seems worth knowing if you're considering pursuing a career with a low chance of high impact. Is there any hope for relatively ordinary people (like the average LW reader) to make great discoveries?

Various sailors made important discoveries back when geography was cutting-edge science.  And they don't seem to have been particularly bright.

Vasco da Gama discovered that Africa was circumnavigable.

Columbus was wrong about the shape of the Earth, and he discovered America.  He died convinced that his newly discovered islands were just off the coast of Asia, so that's a negative sign for his intelligence (or a positive sign for his arrogance, which he had in plenty).

Cortez discovered that the Aztecs were rich and easily conquered.

Of course, lots of other wou... (read more)

niplav (2d)
My best guess is that people in these categories were ones that were high in some other trait, e.g. patience, which allowed them to collect datasets or make careful experiments for quite a while, thus enabling others to make great discoveries. I'm thinking for example of Tycho Brahe, who is best known for 15 years of careful astronomical observation & data collection, or Gregor Mendel's 7-year-long experiments on peas. Same for Dmitri Belyaev and fox domestication. Of course I don't know their cognitive scores, but those don't seem like a bottleneck in their work. So the recipe to me looks like "find an unexplored data source that requires long-term observation to bear fruit, but would yield a lot of insight if studied closely, then investigate".
Gunnar_Zarncke (2d)
I asked ChatGPT and it's difficult to get examples out of it. Even with additional drilling down and accusing it of not being inclusive of people with cognitive impairments, most of its examples are either pretty smart anyway, savants, or only from poor backgrounds. The only ones I could verify that fit are:

  • Richard Jones accidentally created the Slinky
  • Frank Epperson, as a child, invented the popsicle
  • George Crum inadvertently invented potato chips

I asked ChatGPT (in a separate chat) to estimate the IQ of all the inventors it listed, and it is clearly biased to estimate them high, precisely because of their inventions. It is difficult to estimate the IQ of people retroactively. There is also selection and availability bias.

I expect large parts of interpretability work could be safely automatable very soon (e.g. GPT-5 timelines) using (V)LM agents; see A Multimodal Automated Interpretability Agent for a prototype. 

Notably, MAIA (GPT-4V-based) seems approximately human-level on a bunch of interp tasks, while (overwhelmingly likely) being non-scheming (e.g. current models are bad at situational awareness and out-of-context reasoning) and basically-not-x-risky (e.g. bad at ARA).

Given the potential scalability of automated interp, I'd be excited to see plans to use large amo... (read more)

ryan_greenblatt (2d)
Notably, the mainline approach for catching doesn't involve any internals usage at all, let alone labeling a bunch of internals. I agree that this model might help in performing various input/output experiments to determine what made a model do a given suspicious action.

Notably, the mainline approach for catching doesn't involve any internals usage at all, let alone labeling a bunch of things.

This was indeed my impression (except for potentially using steering vectors, which I think are mentioned in one of the sections in 'Catching AIs red-handed'), but I think not using any internals might be overly conservative / might increase the monitoring/safety tax too much (I think this is probably true more broadly of the current control agenda framing).

Bogdan Ionut Cirstea (2d)
Hey Jacques, sure, I'd be happy to chat!  
dirk (2d)

Sometimes a vague phrasing is not an inaccurate demarcation of a more precise concept, but an accurate demarcation of an imprecise concept

Yeah. It's possible to give quite accurate definitions of some vague concepts, because the words used in such definitions also express vague concepts. E.g. "cygnet" - "a young swan".

dkornai (2d)
I would say that if a concept is imprecise, more words [but good and precise words] have to be dedicated to faithfully representing the diffuse nature of the topic. If this larger faithful representation is compressed down to fewer words, that can lead to vague phrasing. I would therefore often view vague phrasing as a compression artefact, rather than a necessary outcome of translating certain types of concepts to words.

Today I learned that being successful can involve feelings of hopelessness.

When you are trying to solve a hard problem, where you have no idea if you can solve it, let alone if it is even solvable at all, your brain makes you feel bad. It makes you feel like giving up.

This is quite strange, because most of the time when I am in such a situation and manage to make a real effort anyway, I seem to always surprise myself with how much progress I manage to make. Empirically, this feeling of hopelessness does not seem to track the actual likelihood that you will completely fail.

Carl Feynman (4d)
I was depressed once for ten years and didn't realize that it was fixable. I thought it was normal to have no fun and be disagreeable and grumpy and out of sorts all the time. Now that I've fixed it, I'm much better off, and everyone around me is better off. I enjoy enjoyable activities, I'm pleasant to deal with, and I'm only out of sorts when I'm tired or hungry, as is normal.

If you think you might be depressed, you might be right, so try fixing it. The cost seems minor compared to the possible benefit (at least it was in my case). I don't think there's a high possibility of severe downside consequences, but I'm not a psychiatrist, so what do I know.

I had been depressed for a few weeks at a time in my teens and twenties and I thought I knew how to fix it: withdraw from stressful situations, plenty of sleep, long walks in the rain. (In one case I talked to a therapist, which didn't feel like it helped.) But then it crept up on me slowly in my forties, and in retrospect I spent ten years being depressed.

So fixing it started like this. I have a good friend at work, of many years' standing. I'll call him Barkley, because that's not his name. I was riding in the car with my wife, complaining about some situation at work. My wife said "well, why don't you ask Barkley to help?" And I said "Ahh, Barkley doesn't care." And my wife said "What are you saying? Of course he cares about you." And I realized in that moment that I was detached from reality, that Barkley was a good friend who had done many good things for me, and yet my brain was saying he didn't care. And thus my brain was lying to me to make me miserable. So I think for a bit and say "I think I may be depressed." And my wife thinks (she told me later) "No duh, you're depressed. It's been obvious for years to people who know you." But she says "What would you like to do about it?" And I say, "I don't know, suffer I guess, do you have a better idea?" And she says "How about if I find you a
Johannes C. Mayer (3d)
This is useful. Now that I think about it, I do this. Specifically, I have extremely unrealistic assumptions about how much I can do, such that they are impossible to accomplish. And then I feel bad for not accomplishing the thing. I haven't tried to be mindful of that. The problem is that this is, I think, mainly subconscious. I don't think things like "I am dumb" or "I am a failure" basically at all, at least not in explicit language. I might have accidentally suppressed these and thought I had now succeeded in not being harsh to myself. But maybe I only moved it to the subconscious level, where it is harder to debug.

I would highly recommend getting someone else to debug your subconscious for you.  At least it worked for me.  I don’t think it would be possible for me to have debugged myself.
 

My first therapist was highly directive. He'd say stuff like "Try noticing when you think X, and asking yourself what happened immediately before that. Report back next week." And he'd list agenda items and draw diagrams on a whiteboard. As an engineer, I loved it. My second therapist was more in the "providing supportive comments while I tal... (read more)

Fabien Roger (2d)

"List sorting does not play well with few-shot" mostly doesn't replicate with davinci-002.

When using length-10 lists (it crushes length-5 no matter the prompt), I get:

  • 32-shot, no fancy prompt: ~25%
  • 0-shot, fancy python prompt: ~60% 
  • 0-shot, no fancy prompt: ~60%

So few-shot hurts, but the fancy prompt does not seem to help. Code here.

I'm interested if anyone knows another case where a fancy prompt increases performance more than few-shot prompting, where a fancy prompt is a prompt that does not contain information that a human would use to solve the task. ... (read more)
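For anyone who wants to poke at this themselves, a rough sketch of the comparison setup (the complete() argument is a placeholder for whatever completion API/model you use; the exact prompt formats here are my assumptions, not necessarily the ones used above):

# a rough sketch of the few-shot vs. "fancy prompt" comparison; complete() is a placeholder
import random

def make_list(length=10):
	return [random.randint(0, 99) for _ in range(length)]

def few_shot_prompt(xs, n_shots=32):
	shots = []
	for _ in range(n_shots):
		ex = make_list(len(xs))
		shots.append(f"List: {ex}\nSorted: {sorted(ex)}")
	return "\n\n".join(shots) + f"\n\nList: {xs}\nSorted:"

def fancy_python_prompt(xs):
	# "fancy" prompt: frames the task as evaluating Python, without task-specific hints a human would need
	return f">>> sorted({xs})\n"

def accuracy(lists, make_prompt, complete):
	# scores a completion as correct if the sorted list appears in it; complete(prompt) -> completion string
	return sum(str(sorted(xs)) in complete(make_prompt(xs)) for xs in lists) / len(lists)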
