
Inadequate Equilibria

Where and How Civilizations Get Stuck

Chapter 6

Against Modest Epistemology

Modest epistemology doesn’t need to reflect a skepticism about causal models as such. It can manifest instead as a wariness about putting weight down on one’s own causal models, as opposed to others’.

In 1976, Robert Aumann demonstrated that two ideal Bayesian reasoners with the same priors cannot have common knowledge of a disagreement. Tyler Cowen and Robin Hanson have extended this result, establishing that even under various weaker assumptions, something has to go wrong in order for two agents with the same priors to get stuck in a disagreement.1 If you and a trusted peer don’t converge on identical beliefs once you have a full understanding of one another’s positions, at least one of you must be making some kind of mistake.

If we were fully rational (and fully honest), then we would always eventually reach consensus on questions of fact. To become more rational, then, shouldn’t we set aside our claims to special knowledge or insight and modestly profess that, really, we’re all in the same boat?

When I’m trying to sort out questions like these, I often find it useful to start with a related question: “If I were building a brain from scratch, would I have it act this way?”

If I were building a brain and I expected it to have some non-fatal flaws in its cognitive algorithms, I expect that I would have it spend some of its time using those flawed reasoning algorithms to think about the world; and I would have it spend some of its time using those same flawed reasoning algorithms to better understand its reasoning algorithms. I would have the brain spend most of its time on object-level problems, while spending some time trying to build better meta-level models of its own cognition and how its cognition relates to its apparent success or failure on object-level problems.

If the thinker is dealing with a foreign cognitive system, I would want the thinker to try to model the other agent’s thinking and predict the degree of accuracy this system will have. However, the thinker should also record the empirical outcomes, and notice if the other agent’s accuracy is more or less than expected. If particular agents are correct more often than the thinker’s model predicts, the thinker should recalibrate its estimates so that it won’t be predictably mistaken in a known direction.
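One way to make that recalibration concrete, offered as a sketch and not as anything this chapter specifies: encode the model’s up-front prediction of another agent’s accuracy as a Beta prior worth some number of pseudo-observations, then fold in the recorded right/wrong outcomes. The class name, the prior strength, and the track record below are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AccuracyBelief:
    """Beta(alpha, beta) belief about how often some agent is right (hypothetical model)."""
    alpha: float
    beta: float

    @classmethod
    def from_prediction(cls, predicted_accuracy: float, strength: float) -> "AccuracyBelief":
        # Encode the model's up-front prediction as `strength` pseudo-observations.
        return cls(predicted_accuracy * strength, (1 - predicted_accuracy) * strength)

    def update(self, correct: int, wrong: int) -> "AccuracyBelief":
        # Conjugate Beta-Binomial update on the recorded outcomes.
        return AccuracyBelief(self.alpha + correct, self.beta + wrong)

    @property
    def mean(self) -> float:
        # Expected accuracy under the current belief.
        return self.alpha / (self.alpha + self.beta)


# Hypothetical: the model predicts the other agent is right 40% of the time,
# held with the weight of 10 observations...
belief = AccuracyBelief.from_prediction(0.4, strength=10)
# ...but the recorded track record is 14 right out of 20.
belief = belief.update(correct=14, wrong=6)
print(f"recalibrated accuracy estimate: {belief.mean:.2f}")
```

Here the estimate moves from 0.40 to 0.60. The prior strength is the knob that decides how much track record it takes to override the model; the only claim the sketch makes is that a persistent gap between predicted and observed accuracy gets absorbed instead of repeated.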

In other words, I would want the brain to reason about brains in pretty much the same way it reasons about other things in the world. And in practice, I suspect that the way I think, and the way I’d advise people in the real world to think, works very much like that:

  • Try to spend most of your time thinking about the object level. If you’re spending more of your time thinking about your own reasoning ability and competence than you spend thinking about Japan’s interest rates and NGDP, or competing omega-6 vs. omega-3 metabolic pathways, you’re taking your eye off the ball.
  • Less than a majority of the time: Think about how reliable authorities seem to be and should be expected to be, and how reliable you are—using your own brain to think about the reliability and failure modes of brains, since that’s what you’ve got. Try to be evenhanded in how you evaluate your own brain’s specific failures versus the specific failures of other brains.2 While doing this, take your own meta-reasoning at face value.
  • … and then next, theoretically, should come the meta-meta level, considered yet more rarely. But I don’t think it’s necessary to develop special skills for meta-meta reasoning. You just apply the skills you already learned on the meta level to correct your own brain, and go on applying them while you happen to be meta-reasoning about who should be trusted, about degrees of reliability, and so on. Anything you’ve already learned about reasoning should automatically be applied to how you reason about meta-reasoning.3
  • Consider whether someone else might be a better meta-reasoner than you, and hence that it might not be wise to take your own meta-reasoning at face value when disagreeing with them, if you have been given strong local evidence to this effect.

That probably sounded terribly abstract, but in practice it means that everything plays out in what I’d consider to be the obvious intuitive fashion.

i.

Once upon a time, my colleague Anna Salamon and I had a disagreement. I thought—this sounds really stupid in retrospect, but keep in mind that this was without benefit of hindsight—I thought that the best way to teach people about detaching from sunk costs was to write a script for local Less Wrong meetup leaders to carry out exercises, thus enabling all such meetups to be taught how to avoid sunk costs. We spent a couple of months trying to write this sunk costs unit, though a lot of that was (as I conceived of it) an up-front cost to figure out the basics of how a unit should work at all.

Anna was against this. Anna thought we should not try to carefully write a unit. Anna thought we should just find some volunteers and improvise a sunk costs teaching session and see what happened.

I explained that I wasn’t starting out with the hypothesis that you could successfully teach anti-sunk-cost reasoning by improvisation, and therefore I didn’t think I’d learn much from observing the improvised version fail. This may sound less stupid if you consider that I was accustomed to writing many things, most of which never worked or accomplished anything, and a very few of which people paid attention to and mentioned later, and that it had taken me years of writing practice to get even that far. And so, to me, negative examples seemed too common to be valuable. The literature was full of failed attempts to correct for cognitive biases—would one more example of that really help?

I tried to carefully craft a sunk costs unit that would rise above the standard level (which was failure), so that we would actually learn something when we ran it (I reasoned). I also didn’t think up-front that it would be two months to craft; the completion time just kept extending gradually—beware the planning fallacy!—and then at some point we figured we had to run what we had.

As read by one of the more experienced meetup leaders, the script did not work. It was, by my standards, a miserable failure.

Here are three lessons I learned from that experiment.

The first lesson is to not carefully craft anything that it is possible to literally just improvise and test immediately in its improvised version, ever. Even if the minimum improvisable product won’t be representative of the real version. Even if you already expect the current version to fail. You don’t know what you’ll learn from trying the improvised version.4

The second lesson was that my model of teaching rationality by producing units for consumption at meetups wasn’t going to work, and we’d need to go with Anna’s approach of training teachers who could fail on more rapid cycles, and running centralized workshops using those teachers.

The third thing I learned was to avoid disagreeing with Anna Salamon in cases where we would have common knowledge of the disagreement.

What I learned wasn’t quite as simple as, “Anna is often right.” Eliezer is also often right.

What I learned wasn’t as simple as, “When Anna and Eliezer disagree, Anna is more likely to be right.” We’ve had a lot of first-order disagreements and I haven’t particularly been tracking whose first-order guesses are right more often.

But the case above wasn’t a first-order disagreement. I had presented my reasons, and Anna had understood and internalized them and given her advice, and then I had guessed that in a situation like this I was more likely to be right. So what I learned is, “Anna is sometimes right even when my usual meta-reasoning heuristics say otherwise,” which was the real surprise and the first point at which something like an extra push toward agreement is additionally necessary.

It doesn’t particularly surprise me if a physicist knows more about photons than I do; that’s a case in which my usual meta-reasoning already predicts the physicist will do better, and I don’t need any additional nudge to correct it. What I learned from that significant multi-month example was that my meta-rationality—my ability to judge which of two people is thinking more clearly and better integrating the evidence in a given context—was not particularly better than Anna’s meta-rationality. And that meant the conditions for something like Cowen and Hanson’s extension of Aumann’s agreement theorem were actually being fulfilled. Not pretend ought-to-be fulfilled, but actually fulfilled.

Could adopting modest epistemology in general have helped me get the right answer in this case? The versions of modest epistemology I hear about usually involve deference to the majority view, to the academic mainstream, or to publicly recognized elite opinion. Anna wasn’t a majority; there were two of us, and nobody else in particular was party to the argument. Neither of us was part of a mainstream. And at the point in time where Anna and I had that disagreement, any outsider would have thought that Eliezer Yudkowsky had the more impressive track record at teaching rationality. Anna wasn’t yet heading CFAR. Any advice to follow track records, to trust externally observable eliteness in order to avoid the temptation to overconfidence, would have favored listening to Yudkowsky over Salamon—that’s part of the reason I trusted myself over her in the first place! And then I was wrong anyway, because in real life that is allowed to happen even when one person has more externally observable status than another.

Whereupon I began to hesitate to disagree with Anna, and to hesitate even more if she had heard out my reasons and yet still disagreed with me.

I extend a similar courtesy to Nick Bostrom, who recognized the importance of AI alignment three years before I did (as I discovered afterwards, reading through one of his papers). Once upon a time I thought Nick Bostrom couldn’t possibly get anything done in academia, and that he was staying in academia for bad reasons. After I saw Nick Bostrom successfully found his own research institute doing interesting things, I concluded that I was wrong to think Bostrom should leave academia—and also meta-wrong to have been so confident while disagreeing with Nick Bostrom. I still think that oracle AI (limiting AI systems to only answer questions) isn’t a particularly useful concept to study in AI alignment, but every now and then I dust off the idea and check to see how much sense oracles currently make to me, because Nick Bostrom thinks they might be important even after knowing that I’m more skeptical.

There are people who think we all ought to behave this way toward each other as a matter of course. They reason:

a)  on average, we can’t all be more meta-rational than average; and

b)  you can’t trust the reasoning you use to think you’re more meta-rational than average. After all, due to Dunning-Kruger, a young-Earth creationist will also think they have plausible reasoning for why they’re more meta-rational than average.

… Whereas it seems to me that if I lived in a world where the average person on the street corner were Anna Salamon or Nick Bostrom, the world would look extremely different from how it actually does.

… And from the fact that you’re reading this at all, I expect that if the average person on the street corner were you, the world would again look extremely different from how it actually does.

(In the event that this book is ever read by more than 30% of Earth’s population, I withdraw the above claim.)

ii.

I once poked at someone who seemed to be arguing for a view in line with modest epistemology, nagging them to try to formalize their epistemology. They suggested that we all treat ourselves as having a black box receiver (our brain) which produces a signal (opinions), and treat other people as having other black boxes producing other signals. And we all received our black boxes at random—from an anthropic perspective of some kind, where we think we have an equal chance of being any observer. So we can’t start out by believing that our signal is likely to be more accurate than average.

But I don’t think of myself as having started out with the a priori assumption that I have a better black box. I learned about processes for producing good judgments, like Bayes’s Rule, and this let me observe when other people violated Bayes’s Rule, and try to keep to it myself. Or I read about sunk cost effects, and developed techniques for avoiding sunk costs so I can abandon bad beliefs faster. After having made observations about people’s real-world performance and invested a lot of time and effort into getting better, I expect some degree of outperformance relative to people who haven’t made similar investments.

To which the modest reply is: “Oh, but any crackpot could say that their personal epistemology is better because it’s based on a bunch of stuff that they think is cool. What makes you different?”

Or as someone advocating what I took to be modesty recently said to me, after I explained why I thought it was sometimes okay to give yourself the discretion to disagree with mainstream expertise when the mainstream seems to be screwing up, in exactly the following words: “But then what do you say to the Republican?”

Or as Ozy Brennan puts it, in dialogue form:

Becoming Sane Side: “Hey! Guys! I found out how to take over the world using only the power of my mind and a toothpick.”

Harm Reduction Side: “You can’t do that. Nobody’s done that before.”

Becoming Sane Side: “Of course they didn’t, they were completely irrational.”

Harm Reduction Side: “But they thought they were rational, too.”

Becoming Sane Side: “The difference is that I’m right.”

Harm Reduction Side: “They thought that, too!”

This question, “But what if a crackpot said the same thing?”, I’ve never heard formalized—though it seems clearly central to the modest paradigm.

My first and primary reply is that there is a saying among programmers: “There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code.”

This is known as Flon’s Law.

The lesson of Flon’s Law is that there is no point in trying to invent a programming language which can coerce programmers into writing code you approve of, because that is impossible.

The deeper message of Flon’s Law is that this kind of defensive, adversarial, lock-down-all-the-doors, block-the-idiots-at-all-costs thinking doesn’t lead to the invention of good programming languages. And I would say much the same about epistemology for humans.

Probability theory and decision theory shouldn’t deliver clearly wrong answers. Machine-specified epistemology shouldn’t mislead an AI reasoner. But if we’re just dealing with verbal injunctions for humans, where there are degrees of freedom, then there is nothing we can say that a hypothetical crackpot could not somehow misuse. Trying to defend against that hypothetical crackpot will not lead us to devise a good system of thought.

But again, let’s talk formal epistemology.

So far as probability theory goes, a good Bayesian ought to condition on all of the available evidence. E. T. Jaynes lists this as a major desideratum of good epistemology—that if we know A, B, and C, we ought not to decide to condition only on A and C because we don’t like where B is pointing. If you’re trying to estimate the accuracy of your epistemology, and you know what Bayes’s Rule is, then—on naive, straightforward, traditional Bayesian epistemology—you ought to condition on both of these facts, and estimate P(accuracy|know_Bayes) instead of P(accuracy). Doing anything other than that opens the door to a host of paradoxes.
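To see why dropping a known fact is itself a choice of answer, here is a minimal sketch with a made-up joint distribution over whether a reasoner is accurate and whether they know Bayes’s Rule. The table is entirely hypothetical; whether knowing Bayes’s Rule actually correlates with accuracy is an empirical question the numbers simply assume.

```python
from fractions import Fraction

# Hypothetical joint distribution over (accurate, knows_bayes) for a population
# of reasoners. Made up for illustration, not estimated from any data.
JOINT = {
    (True, True): Fraction(24, 100),
    (True, False): Fraction(26, 100),
    (False, True): Fraction(6, 100),
    (False, False): Fraction(44, 100),
}


def prob_accurate(joint, knows_bayes=None):
    """P(accurate), or P(accurate | knows_bayes) when knows_bayes is given."""
    # Keep only the rows consistent with what we are conditioning on.
    rows = {k: p for k, p in joint.items() if knows_bayes is None or k[1] == knows_bayes}
    total = sum(rows.values())
    return sum(p for (accurate, _), p in rows.items() if accurate) / total


print("P(accuracy)            =", prob_accurate(JOINT))
print("P(accuracy|know_Bayes) =", prob_accurate(JOINT, knows_bayes=True))
```

Under this table the unconditioned estimate is 1/2 and the conditioned one is 4/5. A reasoner who knows they know Bayes’s Rule and reports 1/2 anyway is not being neutral; they are conditioning on less than they know.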

The convergence that perfect Bayesians exhibit on factual questions doesn’t involve anyone straying, even for a moment, from their individual best estimate of the truth. The idea isn’t that good Bayesians try to make their beliefs more closely resemble their political rivals’ so that their rivals will reciprocate, and it isn’t that they toss out information about their own rationality. Aumann agreement happens incidentally, without any deliberate push toward consensus, through each individual’s single-minded attempt to reason from their own priors to the hypotheses that best match their own observations (which happen to include observations about other perfect Bayesian reasoners’ beliefs).
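A small simulation can show this “incidental” convergence. The sketch below follows the style of Geanakoplos and Polemarchakis’s “We Can’t Disagree Forever” dialogue: finitely many states, a common prior, and each agent privately knowing only which block of its own partition the true state lies in. Agents take turns publicly announcing their honest posterior for an event; listeners then rule out every state in which the speaker would have announced something else. The states, partitions, event, and agent names are a hypothetical toy example.

```python
from fractions import Fraction


def posterior(prior, event, info):
    """P(event | info) under the common prior, for finite sets of states."""
    total = sum(prior[s] for s in info)
    return sum(prior[s] for s in info & event) / total


def cell_of(partition, state):
    """The block of an agent's information partition that contains `state`."""
    for block in partition:
        if state in block:
            return block
    raise ValueError(f"state {state!r} is not covered by the partition")


def agreement_dialogue(prior, partitions, event, true_state, max_turns=100):
    """Agents take turns publicly announcing their honest posterior for `event`.

    No one ever reports anything but their own best estimate; beliefs change
    only because announcements rule out states. Returns [(agent, posterior)].
    """
    public = frozenset(prior)  # states consistent with every announcement so far
    history, quiet_turns, agent = [], 0, 0
    for _ in range(max_turns):
        partition = partitions[agent]
        # What this agent would say in each state that is still publicly possible.
        would_say = {s: posterior(prior, event, cell_of(partition, s) & public) for s in public}
        said = would_say[true_state]
        history.append((agent, said))
        narrowed = frozenset(s for s in public if would_say[s] == said)
        quiet_turns = quiet_turns + 1 if narrowed == public else 0
        public = narrowed
        if quiet_turns >= len(partitions):  # a full round taught nobody anything
            break
        agent = (agent + 1) % len(partitions)
    return history


# Hypothetical example: nine equally likely states, two agents with different partitions.
prior = {s: Fraction(1, 9) for s in range(1, 10)}
alice = [frozenset({1, 2, 3}), frozenset({4, 5, 6}), frozenset({7, 8, 9})]
bob = [frozenset({1, 2, 3, 4}), frozenset({5, 6, 7, 8}), frozenset({9})]
event = frozenset({3, 4})

for agent, p in agreement_dialogue(prior, [alice, bob], event, true_state=1):
    print(f"{['alice', 'bob'][agent]} announces P(E) = {p}")
```

Alice opens at 1/3 and Bob at 1/2; after a couple of exchanges both announce 1/3. Neither agent ever shades an answer toward the other. Bob’s estimate moves only because Alice’s announcements, read as evidence, let him rule out states he previously couldn’t.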

Modest epistemology seems to me to be taking the experiments on the outside view showing that typical holiday shoppers are better off focusing on their past track record than trying to model the future in detail, and combining that with the Dunning-Kruger effect, to argue that we ought to throw away most of the details in our self-observation. At its epistemological core, modesty says that we should abstract up to a particular very general self-observation, condition on it, and then not condition on anything else because that would be inside-viewing. An observation like, “I’m familiar with the cognitive science literature discussing which debiasing techniques work well in practice, I’ve spent time on calibration and visualization exercises to address biases like base rate neglect, and my experience suggests that they’ve helped,” is to be generalized up to, “I use an epistemology which I think is good.” I am then to ask myself what average performance I would expect from an agent, conditioning only on the fact that the agent is using an epistemology that they think is good, and not conditioning on that agent using Bayesian epistemology or debiasing techniques or experimental protocol or mathematical reasoning or anything in particular.

Only in this way can we force Republicans to agree with us… or something. (Even though, of course, anyone who wants to shoot off their own foot will actually just reject the whole modest framework, so we’re not actually helping anyone who wants to go astray.)

Whereupon I want to shrug helplessly and say, “But given that this isn’t normative probability theory and I haven’t seen modesty advocates appear to get any particular outperformance out of their modesty, why go there?”

I think that’s my true rejection, in the following sense: If I saw a sensible formal epistemology underlying modesty and I saw people who advocated modesty going on to outperform myself and others, accomplishing great deeds through the strength of their diffidence, then, indeed, I would start paying very serious attention to modesty.

That said, let me go on beyond my true rejection and try to construct something of a reductio. Two reductios, actually.

The first reductio is just, as I asked the person who proposed the signal-receiver epistemology: “Okay, so why don’t you believe in God like a majority of people’s signal receivers tell them to do?”

“No,” he replied. “Just no.”

“What?” I said. “You’re allowed to say ‘just no’? Why can’t I say ‘just no’ about collapse interpretations of quantum mechanics, then?”

This is a serious question for modest epistemology! It seems to me that on the signal-receiver interpretation you have to believe in God. Yes, different people believe in different Gods, and you could claim that there’s a majority disbelief in every particular God. But then you could as easily disbelieve in quantum mechanics because (you claim) there isn’t a majority of physicists that backs any particular interpretation. You could disbelieve in the whole edifice of modern physics because no exactly specified version of that physics is agreed on by a majority of physicists, or for that matter, by a majority of people on Earth. If the signal-receiver argument doesn’t imply that we ought to average our beliefs together with the theists and all arrive at an 80% probability that God exists, or whatever the planetary average is, then I have no idea how the epistemological mechanics are supposed to work. If you’re allowed to say “just no” to God, then there’s clearly some level—object level, meta level, meta-meta level—where you are licensed to take your own reasoning at face value, despite a majority of other receivers getting a different signal.

But if we say “just no” to anything, even God, then we’re no longer modest. We are faced with the nightmare scenario of having granted ourselves discretion about when to disagree with other people, a discretionary process where we take our own reasoning at face value. (Even if a majority of others disagree about this being a good time to take our own beliefs at face value, telling us that reasoning about the incredibly deep questions of religion is surely the worst of all times to trust ourselves and our pride.) And then what do you say to the Republican?

And if you give people the license to decide that they ought to defer, e.g., only to a majority of members of the National Academy of Sciences, who mostly don’t believe in God, then surely the analogous license is for theists to defer to the true experts on the subject, their favorite priesthood.

The second reductio is to ask yourself whether a superintelligent AI system ought to soberly condition on the fact that, in the world so far, many agents (humans in psychiatric wards) have believed themselves to be much more intelligent than a human, and they have all been wrong.

Sure, the superintelligence thinks that it remembers a uniquely detailed history of having been built by software engineers and raised on training data. But if you ask any other random agent that thinks it’s a superintelligence, that agent will just tell you that it remembers a unique history of being chosen by God. Each other agent that believes itself to be a superintelligence will forcefully reject any analogy to the other humans in psychiatric hospitals, so clearly “I forcefully reject an analogy with agents who wrongly believe themselves to be superintelligences” is not sufficient justification to conclude that one really is a superintelligence. Perhaps the superintelligence will plead that its internal experiences, despite the extremely abstract and high-level point of similarity, are really extremely dissimilar in the details from those of the patient in the psychiatric hospital. But of course, if you ask them, the psychiatric patient could just say the same thing, right?

I mean, the psychiatric patient wouldn’t say that, the same way that a crackpot wouldn’t actually give a long explanation of why they’re allowed to use the inside view. But they could, and according to modesty, That’s Terrible.

iii.

To generalize, suppose we take the following rule seriously as epistemology, terming it Rule M for Modesty:

Rule M: Let X be a very high-level generalization of a belief subsuming specific beliefs X1, X2, X3.… For example, X could be “I have an above-average epistemology,” X1 could be “I have faith in the Bible, and that’s the best epistemology,” X2 could be “I have faith in the words of Mohammed, and that’s the best epistemology,” and X3 could be “I believe in Bayes’s Rule, because of the Dutch Book argument.” Suppose that all people who believe in any Xi, taken as an entire class X, have an average level F of fallibility. Suppose also that most people who believe some Xi also believe that their Xi is not similar to the rest of X, and that they are not like most other people who believe some X, and that they are less fallible than the average in X. Then when you are assessing your own expected level of fallibility you should condition only on being in X, and compute your expected fallibility as F. You should not attempt to condition on being in X3 or ask yourself about the average fallibility you expect from people in X3.
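Stated as a computation, Rule M is a weighted average over the whole class X that every member is required to use, with the subclass-specific figure explicitly off-limits. The sketch below mirrors the rule’s X, X1, X2, X3 notation; the subclass sizes and fallibility rates are hypothetical numbers chosen only to make the two estimates differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subclass:
    name: str           # one specific belief X_i inside the broad class X
    members: int        # how many people hold it (hypothetical)
    fallibility: float  # average error rate among those people (hypothetical)


def rule_m_fallibility(classes):
    """Rule M: condition only on membership in X, i.e. return the class-wide average F."""
    total = sum(c.members for c in classes)
    return sum(c.members * c.fallibility for c in classes) / total


def specific_fallibility(classes, name):
    """Ordinary conditioning: use the average for the specific X_i you are actually in."""
    return next(c.fallibility for c in classes if c.name == name)


X = [
    Subclass("X1", members=600, fallibility=0.5),
    Subclass("X2", members=300, fallibility=0.5),
    Subclass("X3", members=100, fallibility=0.2),
]
print(f"Rule M estimate for anyone in X: {rule_m_fallibility(X):.2f}")
print(f"estimate conditioned on X3:      {specific_fallibility(X, 'X3'):.2f}")
```

With these made-up figures Rule M assigns 0.47 to everyone in X, including members of X3, whose own subclass rate is 0.20. The second function is exactly the computation Rule M forbids.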

Then the first machine superintelligence should conclude that it is in fact a patient in a psychiatric hospital. And you should believe, with a probability of around 33%, that you are currently asleep.

Many people, while dreaming, are not aware that they are dreaming. Many people, while dreaming, may believe at some point that they have woken up, while still being asleep. Clearly there can be no license from “I think I’m awake” to the conclusion that you actually are awake, since a dreaming person could just dream the same thing.

Let Y be the state of not thinking that you are dreaming. Then Y1 is the state of a dreaming person who thinks this, and Y2 is the state of actually being awake. It boots nothing, on Rule M, to say that Y2 is introspectively distinguishable from Y1 or that the inner experiences of people in Y2 are actually quite different from those of people in Y1. Since people in Y1 usually falsely believe that they’re in Y2, you ought to just condition on being in Y, not condition on being in Y2. Therefore you should assign a 67% probability to currently being awake, since 67% of observer-moments who believe they’re awake are actually awake.

Which is why—in the distant past, when I was arguing against the modesty position for the first time—I said: “Those who dream do not know they dream, but when you are awake, you know you are awake.” The modest haven’t formalized their epistemology very much, so it would take me some years past this point to write down the Rule M that I thought was at the heart of the modesty argument, and say that “But you know you’re awake” was meant to be a reductio of Rule M in particular, and why. Reasoning under uncertainty and in a biased and error-prone way, still we can say that the probability we’re awake isn’t just a function of how many awake versus sleeping people there are in the world; and the rules of reasoning that let us update on Bayesian evidence that we’re awake can serve that purpose equally well whether or not dreamers can profit from using the same rules. If a rock wouldn’t be able to use Bayesian inference to learn that it is a rock, still I can use Bayesian inference to learn that I’m not.
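The ordinary Bayesian alternative to Rule M can be sketched in a few lines: start from Rule M’s 67% as a prior, then update on a specific introspective check. The check and both of its likelihoods are hypothetical assumptions made only for illustration; the chapter gives no such figures.

```python
def posterior_awake(prior_awake: float, p_evidence_if_awake: float, p_evidence_if_dreaming: float) -> float:
    """Bayes's Rule: P(awake | evidence) from a prior and the evidence's likelihood under each state."""
    joint_awake = prior_awake * p_evidence_if_awake
    joint_dreaming = (1 - prior_awake) * p_evidence_if_dreaming
    return joint_awake / (joint_awake + joint_dreaming)


# Rule M's answer: 67% of observer-moments that believe they are awake really are.
rule_m = 2 / 3
# Hypothetical check: rereading the same sentence and getting the same words.
# Assumed (not measured) to pass 99% of the time awake and 5% of the time in dreams.
updated = posterior_awake(rule_m, p_evidence_if_awake=0.99, p_evidence_if_dreaming=0.05)
print(f"Rule M: {rule_m:.2f}  after conditioning on the check: {updated:.3f}")
```

Under these assumed likelihoods the estimate rises from about 0.67 to about 0.975. The same function is equally available to a dreamer; what differs between the two cases is which evidence actually shows up, not which rules of inference are permitted.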

  1. See Cowen and Hanson, “Are Disagreements Honest?” 
  2. This doesn’t mean the net estimate of who’s wrong comes out 50-50. It means that if you rationalized last Tuesday then you expect yourself to rationalize this Tuesday, if you would expect the same thing of someone else after seeing the same evidence. 
  3. And then the recursion stops here, first because we already went in a loop, and second because in practice nothing novel happens after the third level of any infinite recursion. 
  4. Chapter 22 of my Harry Potter fanfiction, Harry Potter and the Methods of Rationality, was written after I learned this lesson.