The thesis that needs to be contrasted with modesty is not the assertion that everyone can beat their civilization all the time. It’s not that we should be the sort of person who sees the world as mad and pursues the strategy of believing a hot stock tip and investing everything.
It’s just that it’s okay to reason about the particulars of where civilization might be inadequate, okay to end up believing that you can state a better monetary policy than the Bank of Japan is implementing, okay to check that against observation whenever you get the chance, and okay to update on the results in either direction. It’s okay to act on a model of what you think the rest of the world is good at, and for this model to be sensitive to the specifics of different cases.
Why might this not be okay?
It could be that “acting on a model” is suspect, at least when it comes to complicated macrophenomena. Consider Isaiah Berlin’s distinction between “hedgehogs” (who rely more on theories, models, global beliefs) and “foxes” (who rely more on data, observations, local beliefs). Many people I know see the fox’s mindset as more admirable than the hedgehog’s, on the basis that it has greater immunity to fantasy and dogmatism. And Philip Tetlock’s research has shown that political experts who rely heavily on simple overarching theories (the kind of people who use the word “moreover” more often than “however”) perform substantially worse on average in forecasting tasks.[1]
Or perhaps the suspect part is when models are “sensitive to the specifics of different cases.” In a 2002 study, Buehler, Griffin, and Ross asked a group of experimental subjects to provide lots of details about their Christmas shopping plans: where, when, and how. On average, this experimental group expected to finish shopping more than a week before Christmas. Another group was simply asked when they expected to finish their Christmas shopping, and gave an average response of 4 days before Christmas. Both groups finished an average of 3 days before Christmas. Similarly, students who expected to finish their assignments 10 days before deadline actually finished one day before deadline; and when asked when they had previously completed similar tasks, they replied, “one day before deadline.” This suggests that taking the outside view is an effective response to the planning fallacy: rather than trying to predict how many hiccups and delays your plans will run into by reflecting in detail on each plan’s particulars (the “inside view”), you can do better by just guessing that your future plans will work out roughly as well as your past plans.
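To make the contrast concrete, here is a minimal sketch of the two forecasting styles, assuming a single planner with a short history of similar tasks. The numbers loosely echo the student example above but are made up for illustration, and the function names are hypothetical rather than anything from the study.

```python
from statistics import mean


def outside_view_forecast(past_margins_days):
    """Forecast the next margin (days before deadline) as the average of past margins."""
    if not past_margins_days:
        raise ValueError("need at least one past case to form a reference class")
    return mean(past_margins_days)


def inside_view_forecast(planned_margin_days):
    """Take the detailed plan at face value, ignoring track record."""
    return planned_margin_days


# Illustrative, made-up data: days before deadline on earlier, similar assignments.
past_margins = [1, 0, 2, 1]
planned_margin = 10   # what the detailed plan says will happen this time
actual_margin = 1     # what then happens (assumed for the sake of the example)

outside = outside_view_forecast(past_margins)
inside = inside_view_forecast(planned_margin)

# Absolute forecast error, in days, for each style.
outside_error = abs(outside - actual_margin)
inside_error = abs(inside - actual_margin)

print(f"outside view: {outside} days early (error {outside_error})")
print(f"inside view:  {inside} days early (error {inside_error})")
```

The sketch only works because the past cases are causally similar to the new one. It says nothing about which past cases to include when that similarity is in doubt.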
As stated, these can be perfectly good debiasing measures. I worry, however, that many people end up misusing and overapplying the “outside view” concept very soon after they learn about it, and that a lot of people tie too much of their mental conception of what good reasoning looks like to the stereotype of the humble empiricist fox. I recently noticed this as a common thread running through three conversations I had.
I am not able to recount these conversations in a way that does justice to the people I spoke to, so please treat my recounting as an unfair and biased illustration of relevant ideas, rather than as a neutral recitation of the facts. My goal is to illustrate the kinds of reasoning patterns I think are causing epistemic harm: to point to some canaries in the coal mine, and to be clear that when I talk about modesty I’m not just talking about Hal Finney’s majoritarianism or the explicit belief in civilizational adequacy.
Conversation 1 was about the importance of writing code to test AI ideas. I suggested that when people tried writing code to test an idea I considered important, I wanted to see the code in advance of the experiment, or without being told the result, to see if I could predict the outcome correctly.
I got pushback against this, which surprised me; so I replied that my having a chance to make advance experimental predictions was important, for two reasons.
First, I thought it was important to develop a skill and methodology of predicting “these sorts of things” in advance, because past a certain level of development when working with smarter-than-human AI, if you can’t see the bullets coming in advance of the experiment, the experiment kills you. This being the case, I needed to test this skill as much as possible, which meant trying to make experimental predictions in advance so I could put myself on trial.
Second, if I could predict the results correctly, it meant that the experiments weren’t saying anything I hadn’t figured out through past experience and theorizing. I was worried that somebody might take a result I considered an obvious prediction under my current views and say that it was evidence against my theory or methodology, since both often get misunderstood.[2] If you want to use experiment to show that a certain theory or methodology fails, you need to give advocates of the theory/methodology a chance to say beforehand what they think they predict, so the prediction is on the record and neither side can move the goalposts.
And I still got pushback, from a MIRI supporter with a strong technical background; so I conversed further.
I now suspect that—at least this is what I think was going on—their mental contrast between empiricism and theoreticism was so strong that they thought it was unsafe to have a theory at all. That having a theory made you a bad hedgehog with one big idea instead of a good fox who has lots of little observations. That the dichotomy was between making an advance prediction instead of doing the experiment, versus doing the experiment without any advance prediction. Like, I suspect that every time I talked about “making a prediction” they heard “making a prediction instead of doing an experiment” or “clinging to what you predict will happen and ignoring the experiment.”
I can see how this kind of outlook would develop. The policy of making predictions to test your understanding, to put it on trial, presupposes that you can execute the “quickly say oops and abandon your old belief” technique, so that you can employ it if the prediction turns out to be wrong. To the extent that “quickly say oops and abandon your old belief” is something the vast majority of people fail at, maybe on an individual level it’s better for people to try to be pure foxes and only collect observations and try not to have any big theories. Maybe the average cognitive use case is that if you have a big theory and observation contradicts it, you will find some way to keep the big theory and thereby doom yourself. (The “Mistakes Were Made, But Not By Me” effect.)
But from my perspective, there’s no choice. You just have to master “say oops” so that you can have theories and make experimental predictions. Even on a strictly empiricist level, if you aren’t allowed to have models and you don’t make your predictions in advance, you learn less. An empiricist of that sort can only learn surface generalizations about whether this phenomenon superficially “looks like” that phenomenon, rather than building causal models and putting them on trial.
Conversation 2 was about a web application under development, and it went something like this.
ELIEZER: I predict users will not want to use this version.
FOUNDER 1: Well, from the things I’ve read about startups, it’s important to test as early as possible whether users like your product, and not to overengineer things.
ELIEZER: The concept of a “minimum viable product” isn’t the minimum product that compiles. It’s the least product that is the best tool in the world for some particular task or workflow. If you don’t have an MVP in that sense, of course the users won’t switch. So you don’t have a testable hypothesis. So you’re not really learning anything when the users don’t want to use your product.[3]
FOUNDER 1: No battle plan survives contact with reality. The important thing is just to get the product in front of users as quickly as possible, so you can see what they think. That’s why I’m disheartened that (group of users) did not want to use (early version of product).
ELIEZER: This reminds me of a conversation I had about AI twice in the last month. Two separate people were claiming that we would only learn things empirically by experimenting, and I said that in cases like that, I wanted to see the experiment description in advance so I could make advance predictions and put on trial my ability to foresee things without being hit over the head by them.
In both of those conversations I had a very hard time conveying the idea, “Just because I have a theory does not mean I have to be insensitive to evidence; the evidence tests the theory, potentially falsifies the theory, but for that to work you need to make experimental predictions in advance.” I think I could have told you in advance that (group of users) would not want to use (early version of product), because (group of users) is trying to accomplish (task 1) and this version of the product is not the best available tool they’ll have seen for doing (task 1).
I can’t convey it very well with all the details redacted, but the impression I got was that the message of “distrust theorizing” had become so strong that Founder 1 had stopped trying to model users in detail and thought it was futile to make an advance prediction. But if you can’t model users in detail, you can’t think in terms of workflows and tasks that users are trying to accomplish, or at what point you become visibly the best tool the user has ever encountered to accomplish some particular workflow (the minimum viable product). The alternative, from what I could see, was to think in terms of “features” and that as soon as possible you would show the product to the user and see if they wanted that subset of features.
There’s a version of this hypothesis which does make sense, which is that when you have the minimum compilable product that it is physically possible for a user to interact with, you can ask one of your friends to sit down in front of it, you can make a prediction about what parts they will dislike or find difficult, and then you can see if your prediction is correct. Maybe your product actually fails much earlier than you expect.
But this is not like getting early users to voluntarily adopt your product. This is about observing, as early as possible, how volunteers react to unviable versions of your product, so you know what needs fixing earliest or whether the exposed parts of your theory are holding up so far.
It really looks to me like the modest reactions to certain types of overconfidence or error are taken by many believers in modesty to mean, in practice, that theories just get you into trouble; that you can either make predictions or look at reality, but not both.
Conversation 3 was with Startup Founder 2, a member of the effective altruism community who was making Material Objects—I’ll call them “Snowshoes”—who had remarked that modern venture capital was only interested in 1000x returns and not 20x returns.
I asked why he wasn’t trying for 1000x returns with his current company selling Snowshoes—was that more annoyance/work than he wanted to undertake?
He replied that most companies in a related industry, Flippers, weren’t that large, and it seemed to him that based on the outside view, he shouldn’t expect his company to become larger than the average company in the Flippers industry. He asked if I was telling him to try being more confident.
I responded that, no, the thing I wanted him to think was orthogonal to modesty versus confidence. I observed that the customer use case for Flippers was actually quite different from Snowshoes, and asked him if he’d considered how many uses of Previous Snowshoes in the world would, in fact, benefit from being replaced by the more developed version of Snowshoes he was making.
He said that this seemed to him too much like optimism or fantasy, compared to asking what his company had to do next.
I had asked about how customers would benefit from new and improved Snowshoes because my background model says that startups are more likely to succeed if they provide real economic value: value of the kind that Danslist would provide over Craigslist if Danslist succeeded, and of the kind that Craigslist provides over newspaper classifieds. Getting people to actually buy your product, of course, is a separate question from whether it would provide real value of that kind. And there’s an obvious failure mode where you’re in love with your product and you overestimate the product’s value or underestimate the costs to the user. There’s an obvious failure mode where you just look at the real economic value and get all cheerful about that, without asking the further necessary question of how many decisionmakers will choose to use your product; or whether your marketing message is either opaque or easily faked; or whether any competitors will get there first if they see you being successful early on; or whether you could defend a price premium in the face of competition. But the question of real economic value seems to me to be one of the factors going into a startup’s odds of succeeding (Craigslist’s success is in part explained by the actual benefit buyers and sellers derive from the existence of Craigslist) and worth factoring out before discussing purchaser decisionmaking and value-capturing questions.[4]
It wasn’t that I was trying to get Founder 2 to be more optimistic (though I did think, given his Snowshoes product, that he ought to at least try to be more ambitious). It was that it looked to me like the outside view was shutting down his causal model of how and why people might use his product, and substituting, “Just try to build your Snowshoes and see what happens, and at best don’t expect to succeed more than the average company in a related industry.” But I don’t think you can get so far as even the average surviving company, unless you have a causal model (the dreaded inside view) of where your company is supposed to go and what resources are required to get there.
I was asking, “What level do you want to grow to? What needs to be done for your company to grow that much? What’s the obstacle to taking the next step?” And… I think it felt immodest to him to claim that his company could grow to a given level; so he thought only in terms of things he knew he could try, forward-chaining from where he was rather than backward-chaining from where he wanted to go, because that way he didn’t need to immodestly think about succeeding at a particular level, or endorse an inside view of a particular pathway.
I think the details of his business plan had the same outside-view problem. In the Flippers industry, two common versions of Flippers that were sold were Deluxe Flippers and Basic Flippers. Deluxe Flippers were basically preassembled Basic Flippers, and Deluxe Flippers sold at a much higher price than Basic Flippers even though Basic Flippers were easy to assemble.
We were talking about a potential variation of his Snowshoes, and he said that it would be too expensive to ship a Deluxe version, but not worth it to ship a Basic version, given the average premiums the outside view said these products could command.
I asked him why, in the Flippers industry, Deluxe sold for such a premium over Basic when it was so easy to assemble Basic into Deluxe. Why was this price premium being maintained?
He suggested that maybe people really valued the last little bit of convenience from buying Deluxe instead of Basic.
I suggested that in this large industry of slightly differentiated Flippers, maybe a lot of price-sensitive consumers bought only Basic versions, meaning that the few Deluxe buyers were price-insensitive. I then observed again that the best use case for his product was quite different from the standard use case in the Flipper industry, and that he didn’t have much direct competition. I suggested that, for his customers that weren’t otherwise customers in the Flippers industry, it wouldn’t make much of a difference to his pricing power whether he sold Deluxe or the much easier to ship Basic version.
And I remarked that it seemed to me unwise in general to look at a mysterious pricing premium, and assume that you could get that premium. You couldn’t just look at average Deluxe prices and assume you could get them. Generally speaking, this indicates some sort of rent or market barrier; and where there is a stream of rent, there will be walls built to exclude other people from drinking from the stream. Maybe the high Deluxe prices meant that Deluxe consumers were hard to market to, or very unlikely to switch providers. You couldn’t just take the outside view of what Deluxe products tended to sell like.
He replied that he didn’t think it was wise to say that you had to fully understand every part of the market before you could do anything; especially because, if you had to understand why Deluxe products sold at a premium, it would be so easy to just make up an explanation.
Again I understand where he was coming from, in terms of the average cognitive use case. When I try to explain a phenomenon, I’m also implicitly relying on my ability to use a technique like “don’t even start to rationalize,” which is a skill that I started practicing at age 15 and that took me a decade to hone to a reliable and productive form. I also used the “notice when you’re confused about something” technique to ask the question, and a number of other mental habits and techniques for explaining mysterious phenomena—for starters, “detecting goodness of fit” (see whether the explanation feels “forced”) and “try further critiquing the answer.” Maybe there’s no point in trying to explain why Deluxe products sell at a premium to Basic products, if you don’t already have a lot of cognitive technique for not coming up with terrible explanations for mysteries, along with enough economics background to know which things are important mysteries in the first place, which explanations are plausible, and so on.
But at the same time, it seems to me that there is a learnable skill here, one that entrepreneurs and venture capitalists at least have to learn if they want to succeed on purpose instead of by luck.
One needs to be able to identify mysterious pricing and sales phenomena, read enough economics to speak the right simplicity language for one’s hypotheses, and then not come up with terrible rationalizations. One needs to learn the key answers for how the challenged industry works, which means that one needs to have explicit hypotheses that one can test as early as possible.
Otherwise you’re… not quite doomed per se, but from the perspective of somebody like me, there will be ten of you with bad ideas for every one of you that happens to have a good idea. And the people that do have good ideas will not really understand what human problems they are addressing, what their potential users’ relevant motivations are, or what their critical obstacles to success are.
Given that analysis of ideas takes place on the level it does, I can understand why people would say that it’s futile to try to analyze ideas, or that teams rather than ideas are important. I’m not saying that either entrepreneurs or venture capitalists could, by an effort of will, suddenly become great at analyzing ideas. But it seems to me that the outside view concept, along with the Fox=Good/Hedgehog=Bad, Observation=Good/Theory=Bad messages (including the related misunderstanding of MVP as “just build something and show it to users”), are preventing people from even starting to develop those skills. At least, my observation is that some people go too far in their skepticism of model-building.[5]
Maybe there’s a valley of bad rationality here and the injunction to not try to have theories or causal models or preconceived predictions is protective against entering it. But first, if it came down to only those alternatives, I’d frankly rather see twenty aspiring rationalists fail painfully until one of them develops the required skills than have nobody with those skills. And second, god damn it, there has to be a better way.
In situations that are drawn from a barrel of causally similar situations, where human optimism runs rampant and unforeseen troubles are common, the outside view beats the inside view. But in novel situations where causal mechanisms differ, the outside view fails—there may not be relevantly similar cases, or it may be ambiguous which similar-looking cases are the right ones to look at.
Where two sides disagree, this can lead to reference class tennis—both parties get stuck insisting that their own “outside view” is the correct one, based on diverging intuitions about what similarities are relevant. If it isn’t clear what the set of “similar historical cases” is, or what conclusions we should draw from those cases, then we’re forced to use an inside view—thinking about the causal process to distinguish relevant similarities from irrelevant ones.
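A small sketch of why reference class tennis stalls, assuming two candidate reference classes for the same company. The class names and outcome lists are invented for illustration (1 means the company reached some target size, 0 means it did not), and `base_rate` is a hypothetical helper.

```python
def base_rate(outcomes):
    """Fraction of cases in a reference class that reached the target (1) rather than not (0)."""
    if not outcomes:
        raise ValueError("empty reference class")
    return sum(outcomes) / len(outcomes)


# Two made-up reference classes that both look "similar" on the surface.
reference_classes = {
    "companies in a related industry": [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
    "companies serving the same use case": [1, 0, 1, 1, 0],
}

# Each side's "outside view" is just the base rate of its preferred class.
estimates = {name: base_rate(cases) for name, cases in reference_classes.items()}

# The gap between the estimates is what the two sides end up arguing about.
spread = max(estimates.values()) - min(estimates.values())

for name, estimate in estimates.items():
    print(f"{name}: {estimate:.0%}")
print(f"spread between 'outside views': {spread:.0%}")
```

Nothing in the arithmetic says which class is the right one. Choosing between them takes a causal argument about which similarities matter, which is already an inside view.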
You shouldn’t avoid outside-view-style reasoning in cases where it looks likely to work, like when planning your Christmas shopping. But in many contexts, the outside view simply can’t compete with a good theory.
Intellectual progress on the whole has usually been the process of moving from surface-level resemblances to more technical understandings of particulars. Extreme examples of this are common in science and engineering: the deep causal models of the world that allowed humans to plot the trajectory of the first moon rocket before launch, for example, or that allow us to verify that a computer chip will work before it’s ever manufactured.
Where items in a reference class differ causally in more ways than two Christmas shopping trips you’ve planned or two university essays you’ve written, or where there’s temptation to cherry-pick the reference class of things you consider “similar” to the phenomenon in question, or where the particular biases underlying the planning fallacy just aren’t a factor, you’re often better off doing the hard cognitive labor of building, testing, and acting on models of how phenomena actually work, even if those models are very rough and very uncertain, or admit of many exceptions and nuances. And, of course, during and after the construction of the model, you have to look at the data. You still need fox-style attention to detail—and you certainly need empiricism.
The idea isn’t, “Be a hedgehog, not a fox.” The idea is rather: developing accurate beliefs requires both observation of the data and the development of models and theories that can be tested by the data. In most cases, there’s no real alternative to sticking your neck out, even knowing that reality might surprise you and chop off your head.
1. See Philip Tetlock, “Why Foxes Are Better Forecasters Than Hedgehogs.”
2. As an example, my conception of the reward hacking problem for reinforcement learning systems is that below certain capability thresholds, making the system smarter will often produce increasingly helpful behavior, assuming the rewards are a moderately good proxy for the actual objectives we want the system to achieve. The problem of the system exploiting loopholes and finding ways to maximize rewards in undesirable ways is mainly introduced when the system’s resourcefulness is great enough, and its policy search space large enough, that operators can’t foresee even in broad strokes what the reward-maximizing strategies are likely to look like. If this idea gets rounded off to just “making an RL system smarter will always reduce its alignment with the operator’s goal,” however, then a researcher will misconstrue what counts as evidence for or against prioritizing reward hacking research.
   And there are many other cases where ideas in AI alignment tend to be misunderstood, largely because “AI” calls to mind present-day applications. It’s certainly possible to run useful experiments with present-day software to learn things about future AGI systems, but “see, this hill-climbing algorithm doesn’t exhibit the behavior you predicted for highly capable Bayesian reasoners” will usually reflect a misconception about what the concept of Bayesian reasoning is doing in AGI alignment theory.
3. I did not say this then, but I should have: Overengineering is when you try to make everything look pretty, or add additional cool features that you think the users will like… not when you try to put in the key core features that are necessary for your product to be the best tool the user has ever seen for at least one workflow.
4. And a startup founder definitely needs to ask that question and answer it before they go out and try to raise venture capital from investors who are looking for 1000x returns. Don’t discount your company’s case before it starts. They’ll do that for you.
5. As Tetlock puts it in a discussion of the limitations of the fox/hedgehog model in the book Superforecasting: “Models are supposed to simplify things, which is why even the best are flawed. But they’re necessary. Our minds are full of models. We couldn’t function without them. And we often function pretty well because some of our models are decent approximations of reality.”