TsviBT


TsviBT

That's another main possibility. I don't buy the reasoning in general, though--integrity is just super valuable. (Separately, I'm aware of projects that are very important and neglected (legibly so) without being funded, so I don't overall believe that there are a bunch of people strategically capitulating to anti-integrity systems in order to fund key projects.) Anyway, my main interest here is to say that there are real, large-scale, ongoing problems with the social world which increase X-risk; it would be good for some people to think clearly about that; and it's not good to be satisfied with false / vague / superficial stories about what's happening.

TsviBT

I'm interpreting "realize" colloquially, as in, "be aware of". I don't think it has simply never occurred to the people discussed in the post that pre-singularity wealth doesn't matter because a win-singularity society very likely wouldn't care much about it. Instead someone might, for example...

  • ...care a lot about their and their people's lives in the next few decades.
  • ...view it as being the case that [wealth mattering] is dependent on human coordination, and not trust others to coordinate like that. (In other words: the "stakeholders" would have to all agree to cede de facto power from themselves, to humanity.)
  • ...not agree that humanity will or should treat wealth as not mattering; and instead intend to pursue a wealthy and powerful position mid-singularity, with the expectation of this strategy having large payoffs.
  • ...be in some sort of mindbroken state (in the genre of Moral Mazes), such that they aren't really (say, in higher-order derivatives) modeling the connection between actions and long-term outcomes, and instead are, I don't know, doing something else, maybe involving arbitrary obeisance to power.

I don't know what's up with people, but I think it's potentially important to understand deeply what's up with people, without making whatever assumption goes into thinking that IF someone only became aware of this vision of the future, THEN they would adopt it.

(If Tammy responded that "realize" was supposed to mean the etymonic sense of "making real" then I'd have to concede.)

TsviBT

> the AGI can be corrected and can act as a collaborator in improving its alignment as we collaborate to improve its intelligence.

Why do you think you can get to a state where the AGI is materially helping to solve extremely difficult problems (not extremely difficult like chess, extremely difficult like inventing language before you have language), and also the AGI got there due to some process that doesn't also immediately cause there to be a much smarter AGI? https://tsvibt.blogspot.com/2023/01/a-strong-mind-continues-its-trajectory.html

TsviBT

IDK if there's political support that would be helpful and that could be affected by people saying things to their representatives. But if so, then it would be helpful to have a short, clear, on-point letter that people can adapt to send to their representatives. Things I'd want to see in such a letter:

  1. AGI, if created, would destroy all or nearly all human value.
  2. We aren't remotely on track to solving the technical problems that would need to be solved in order to build AGI without destroying all or nearly all human value.
  3. Many researchers say they are trying to build AGI and/or doing research that materially contributes toward building AGI. None of those researchers has a plausible plan for making AGI that doesn't destroy all or nearly all human value.
  4. As your constituent, I don't want all or nearly all human value to be destroyed.
  5. Please start learning about this so that you can lend your political weight to proposals that would address existential risk from AGI.
  6. This is more important to me than all other risks about AI combined.

Or something.

TsviBT

I wish you would realize that whatever we're looking at, it isn't people not realizing this.

TsviBT

Look... Consider the hypothetically possible situation that in fact everyone is very far from being on the right track, and everything everyone is doing doesn't help with the right track and isn't on track to get on the right track or to help with the right track.

Ok, so I'm telling you that this hypothetically possible situation seems to me like the reality. And then you're, I don't know, trying to retreat to some sort of agreeable live-and-let-live stance, or something, where we all just agree that due to model uncertainty, and the fact that people have vaguely plausible stories for how their thing might possibly be helpful, everyone should do their own thing, and it's not helpful to try to say that some big swath of research is doomed? If that is what's happening, then I think what you in particular are doing here is a bad thing to do.

Maybe we can have a phone call if you'd like to discuss further.

TsviBT

> Doomed to irrelevance, or doomed to not being a complete solution in and of itself?

Doomed to not be trying to go to and then climb the mountain.

> my brain is a dirty lying liar that lies to me at every opportunity

So then it isn't easy. But it's feedback. Also there's not that much distinction between making a philosophically rigorous argument and "doing introspection" in the sense I mean, so if you think the former is feasible, work from there.

TsviBT

> Is there a particular reason you expect there to be exactly one hard part of the problem,

Have you stopped beating your wife? I say "the" here in the sense of like "the problem of climbing that mountain over there". If you're far away, it makes sense to talk about "the (thing over there)", even if, when you're up close, there's multiple routes, multiple summits, multiple sorts of needed equipment, multiple sources of risk, etc.

> and for the part that ends up being hardest in the end to be the part that looks hardest to us now?

We make an argument like "any solution would have to address X" or "anything with feature Y does not do Z" or "property W is impossible", and then we can see what a given piece of research is and is not doing / how it is doomed to irrelevance. It's not like pointing to a little ball in ideaspace and being like "the answer is somewhere in here". Rather it's like cutting out a halfspace and saying "everything on this side of this plane is doomed, we'd have to be somewhere in the other half", or like pointing out a manifold that all research is on and saying "anything on this manifold is doomed, we'd have to figure out how to move somewhat orthogonalward".

> research that stemmed from someone trying something extremely simple and getting an unexpected result

I agree IF we are looking at the objects in question. If LLMs were minds, the research would be much more relevant. (I don't care if you have an army of people who all agree on taking a stance that seems to imply that there's not much relevant difference between LLMs and future AGI systems that might kill everyone.)

> What is your preferred method for getting feedback from reality on whether your theory describes the world as it is?

I think you (and everyone else) don't know how to ask this question properly. For example, "on whether your theory describes the world as it is" is a too-narrow idea of what our thoughts about minds are supposed to be. Sub-example: our thoughts about mind are supposed to also produce design ideas.

To answer your question: by looking at and thinking about minds. The only minds that currently exist are humans, and the best access you have to minds is introspection. (I don't mean meditation, I mean thinking and also thinking about thinking/wanting/acting--aka some kinds of philosophy and math.)

TsviBT

I broadly agree with this. (And David was like .7 out of the 1.5 profs on the list who I guessed might genuinely want to grant the needed freedom.)

I do think that people might do good related work in math (specifically, probability/information theory, logic, etc.--stuff about formalized reasoning), philosophy (of mind), and possibly in other fields such as theoretical linguistics. But this would require that the academic context be conducive to good novel work in the field (and even that lower bar is probably far from universally met), and would require the researcher to have good taste. And this is "related" in the sense of "might write a paper which leads to another paper which would be cited by [the alignment textbook from the future] for proofs/analogies/evidence about minds".
