Fun to do with names. Patrick: English version of the Latin name Patricius, which means "noble", referring to the Roman nobility, which was originally composed of the patres familias, the heads of large families. From pater (father), which is Latin but goes back to Proto-Indo-European. From Proto-Indo-European *peh₂-, which means "to protect/shepherd".
I really like that last bit about chronological cycles of increasing S-level to "win against" the current level, until physical reality smacks us in the face and we reset. Let me try something:
I'm gonna be lazy and say:
If it comes up tails, you get nothing.
If that ^ is a given premise in this hypothetical, then we know for certain it is not a simulation (because in a simulation, after tails, you'd get something). Therefore the probability of receiving a lollipop here is 0 (unless you receive one for a completely unrelated reason)
The next step will be to write a shell app that takes your prompt, gets the GPT response, and uses GPT to check whether the response was a "graceful refusal". If so, it embeds your original prompt into one of these loophole formats and tries again, until it gets a response that isn't a graceful refusal, which it then returns to you. So the user experience is a bot with no content filters.
EY is right: these safety features are trivial.
These things are indeed correlated with being right, but aren't you risking Goodharting? What does it really mean to "be right" about things? If you're native to LessWrong you'll probably answer something like, "to accurately anticipate future sensory experiences". Isn't that all you need? Find an opportunity for you and your friend to predict measurably different futures, then see who wins. All the rest is distraction.
And if you predict all the same things, then you have no real disagreement, just semantic differences.
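To make "see who wins" concrete, here is a minimal sketch in Python. It assumes both of you write down probabilities for the same yes/no questions and that the outcomes get resolved later. Scoring with the Brier score is my choice for illustration, not the only option (a log score would work too), and the names `brier_score` and `compare` are made up for this example.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probabilities and 0/1 outcomes. Lower is better."""
    if len(forecasts) != len(outcomes):
        raise ValueError("need exactly one forecast per outcome")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(outcomes)


def compare(mine, yours, outcomes, tol=1e-9):
    """Score only the questions where the two forecasters actually disagree.

    Assumption: forecasts that differ by no more than `tol` count as the
    same prediction, i.e. no real disagreement on that question.
    """
    contested = [i for i, (a, b) in enumerate(zip(mine, yours)) if abs(a - b) > tol]
    if not contested:
        return "no real disagreement"

    def pick(values):
        return [values[i] for i in contested]

    my_score = brier_score(pick(mine), pick(outcomes))
    your_score = brier_score(pick(yours), pick(outcomes))
    if my_score == your_score:
        return "tie"
    return "me" if my_score < your_score else "you"


# Two questions: we disagree on the first and agree on the second.
# The first one resolved "yes" (1), the second "no" (0).
print(compare([0.9, 0.5], [0.2, 0.5], [1, 0]))
```

If every forecast matches, `compare` reports no real disagreement, which is the "just semantic differences" case.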