Sunday, 22 November 2020

My fictional influences

I’ve identified as a bookworm for a very long time. Throughout primary school and high school I read voraciously, primarily science fiction and fantasy. But given how much time I’ve spent reading fiction, it’s surprisingly difficult to pin down how it’s influenced me. (This was also tricky to do for nonfiction, actually - see my attempt in this post.)

Thinking back to the fiction I’ve enjoyed the most, two themes emerge: atmosphere, and cleverness. The atmosphere that really engages me in fiction is one that says: the world is huge; there’s so much to explore; and there’s a vastness of potential. But one that’s also a little melancholy - because you can’t possibly experience all of it, and time always flows onwards. I was particularly struck by the ending of The Lord of the Rings, when Frodo leaves all of Middle-Earth behind; by His Dark Materials, when Lyra gains, and loses, uncountable worlds; by the Malazan saga, occurring against a fictional backdrop of hundreds of thousands of years of epic history; and by Speaker for the Dead, as Ender skims through the millennia. Oh, and I can’t forget George R. R. Martin’s A Song for Lya, Neil Gaiman’s Ocean at the End of the Lane, and Paterson’s Bridge to Terabithia - none are the subtlest, but they're all exquisitely wistful. I’m compelled by the aesthetic: each of these a whole world that never was and will never be!

The other thing I love in fiction is cleverness: Xanatos Gambits and Magnificent Bastards and plans within plans that culminate in startling and brilliant ways. Ender’s Game is a great example; so too is The Lies of Locke Lamora. On the literary side, I loved Catch-22 for its cleverness in weaving together so many peculiar threads into a striking tapestry. Lately the novels which most scratch this itch have been online works, particularly Unsong, Worm, and A Practical Guide to Evil. Some sci-fi novels also fall in this category - I’m thinking particularly of Snow Crash, Accelerando, and Hyperion.

It’s hard to tell whether my fiction preferences shaped my worldview or vice versa, but I’d be surprised if all this reading weren’t at least partially responsible for me often thinking about the big picture for humanity, and personally aiming for ambitious goals. What’s more difficult is to point to specific things I gained from these books. I don’t identify with many fictional characters, and can't think of any personal conclusions that I've gained from depictions of them (perhaps apart from: communicate more!). I did read a lot of “big idea” books, but they were never that satisfying - fiction always seemed like an inefficient medium for communicating them.

But for some reason this has changed a bit over the last few years. I now find myself regularly thinking back to a handful of books as a way to remind myself of certain key ideas - in particular books that pair those ideas with compelling plots and characters. In no particular order:
  • Unsong is the work of fiction that most inspires me to be a better person; to do the things that “somebody has to and no one else will”.
  • Diaspora makes me reflect on the emptiness of pure ambition, and the arbitrariness of human preferences.
  • The Darkness That Comes Before pushes me to understand my mind and motivations - to illuminate “what comes before” my thoughts and actions.
  • Accelerando confronts me with the sheer scale of change that humanity might face.
  • Island and Walden Two underline the importance of social progress in building utopias.
  • Flowers for Algernon reminds me of the importance of emotional intelligence.
I wish I had a similar list of fiction which taught me important lessons about friendships and relationships, but for whatever reason I haven’t really found many fictional relationships particularly inspiring. I’m very curious about what would be on other people’s lists, though.

My intellectual influences

Prompted by a friend's question about my reading history, I've been thinking about what shaped the worldview I have today. This has been a productive exercise, which I recommend to others. Although I worry that some of what's written below is post-hoc confabulation, at the very least it's forced me to pin down what I think I learned from each of the sources listed, which I expect will help me track how my views change from here on. This blog post focuses on non-fiction books (and some other writing); I've also written a blog post on how fiction has influenced me.

My first strong intellectual influence was Eliezer Yudkowsky’s writings on Less Wrong (now collected in Rationality: from AI to Zombies). I still agree with many of his core claims, but don’t buy into the overarching narratives as much. In particular, the idea of “rationality” doesn’t play a big role in my worldview any more. Instead I focus on specific habits and tools for thinking well (as in Superforecasting), and creating communities with productive epistemic standards (a focus of less rationalist accounts of reason and science, e.g. The Enigma of Reason and The Structure of Scientific Revolutions).

Two other strong influences around that time were Scott Alexander’s writings on tribalism in politics, and Robin Hanson’s work on signalling (particularly Elephant in the Brain), both of which are now foundational to my worldview. Both are loosely grounded in evolutionary psychology, although not reliant on it. More generally, even if I’m suspicious of many individual claims from evolutionary psychology, the idea that humans are continuous with animals is central to my worldview (see Darwin’s Unfinished Symphony and Are We Smart Enough to Know How Smart Animals Are?). In particular, it has shaped my views on naturalistic ethics (via a variety of sources, with Wright’s The Moral Animal being perhaps the most central).

Another big worldview question is: how does the world actually change? At one point I bought into techno-economic determinism about history, based on reading big-picture books like Guns, Germs and Steel and The Silk Roads, and also because of my understanding of the history of science (e.g. the prevalence of multiple discovery). Sandel’s What Money Can’t Buy nudged me towards thinking more about cultural factors; so did books like The Dream Machine and The Idea Factory, which describe how many technologies I take for granted were constructed. And reading Bertrand Russell’s History of Western Philosophy made me start thinking about the large-scale patterns in intellectual history (on which The Modern Mind further shaped my views).

This paved the way for me to believe that there’s room to have a comparable influence on our current world. Here I owe a lot to Tyler Cowen’s The Great Stagnation (and to a lesser extent its sequels), Peter Thiel’s talks and essays (and to a lesser extent his book Zero to One), and Paul Graham’s essays. My new perspective is similar to the standard “Silicon Valley mindset”, but focusing more on the role of ideas than technologies. To repurpose the well-known quote: “Practical men who believe themselves to be quite exempt from any intellectual influence are usually the slaves of some defunct philosopher.”

Here’s a more complete list of nonfiction books which have influenced me, organised by topic (although I’ve undoubtedly missed some). I welcome recommendations, whether they’re books that fit in with the list below, or books that fill gaps in it!

On ethics:

  • The Righteous Mind

  • Technology and the Virtues

  • Reasons and Persons

  • What Money Can’t Buy

  • The Precipice

On human evolution:

  • The Enigma of Reason

  • The Human Advantage

  • Darwin’s Unfinished Symphony

  • The Secret of our Success

  • Human Evolution (Dunbar)

  • The Mating Mind

  • The Symbolic Species

On human minds and thought:

  • Rationality: from AI to Zombies

  • The Elephant in the Brain

  • How to Create a Mind

  • Why Buddhism is True

  • The Blank Slate

  • The Language Instinct

  • The Stuff of Thought

  • The Mind is Flat

  • Superforecasting

  • Thinking, Fast and Slow

On other sciences:

  • Scale: The Universal Laws of Life and Death in Organisms, Cities and Companies

  • Superintelligence

  • The Alignment Problem

  • Are We Smart Enough to Know How Smart Animals Are?

  • The Moral Animal

  • Ending Aging

  • Improbable Destinies

  • The Selfish Gene

  • The Blind Watchmaker

  • Complexity: The Emerging Science at the Edge of Order and Chaos

  • Quantum Computing Since Democritus

On science itself:

On philosophy:

  • A History of Western Philosophy

  • The Intentional Stance

  • From Bacteria to Bach and Back

  • Good and Real

  • The Big Picture

  • Consciousness and the Social Brain

  • An Enquiry Concerning Human Understanding

On history and economics:

  • The Shortest History of Europe

  • A Farewell to Alms

  • The Technology Trap

  • Iron, Steam and Money

  • The Enlightened Economy

  • The Commanding Heights

On politics and society:

On life, love, etc:

  • Deep Work

  • Man's Search for Meaning

  • More Than Two

  • Authentic Happiness

  • Happiness by Design

  • Written in History


  • Age of Em

  • Immortality: The Quest to Live Forever and How It Drives Civilization

  • Surely you’re Joking, Mr Feynman

  • Impro

  • Never Split the Difference

Saturday, 7 November 2020

Why philosophy of science?

During my last few years working as an AI researcher, I increasingly came to appreciate the distinction between what makes science successful and what makes scientists successful. Science works because it has distinct standards for what types of evidence it accepts, with empirical data strongly prioritised. But scientists spend a lot of their time following hunches which they may not even be able to articulate clearly, let alone in rigorous scientific terms - and throughout the history of science, this has often paid off. In other words, the types of evidence which are most useful in choosing which hypotheses to prioritise can differ greatly from the types of evidence which are typically associated with science. In particular, I’ll highlight two ways in which this happens.

First is scientists thinking in terms of concepts which fall outside the dominant paradigm of their science. That might be because those concepts are too broad, or too philosophical, or too interdisciplinary. For example, machine learning researchers are often inspired by analogies to evolution, or beliefs about human cognition, or issues in philosophy of language - which are all very hard to explore deeply in a conventional machine learning paper! Often such ideas are mentioned briefly in papers, perhaps in the motivation section - but there’s not the freedom to analyse them with the level of detail and rigour that is required for making progress on tricky conceptual questions.

Secondly, scientists often have strong visions for what their field could achieve, and long-term aspirations for their research. These ideas may make a big difference to what subfields or problems those researchers focus on. In the case of AI, some researchers aim to automate a wide range of tasks, or to understand intelligence, or to build safe AGI. Again, though, these aren’t ideas which the institutions and processes of the field of AI are able to thoroughly discuss and evaluate - instead, they are shared and developed primarily in informal ways.

Now, I’m not advocating for these ideas to be treated the same as existing scientific research - I think norms about empiricism are very important to science’s success. But the current situation is far from ideal. As one example, Rich Sutton’s essay on the bitter lesson in AI was published on his blog, and then sparked a fragmented discussion on other blogs and personal Facebook walls. Yet in my opinion this argument about AI, which draws on his many decades of experience in the field, is one of the most crucial ideas for the field to understand and evaluate properly. So I think we need venues for such discussions to occur in parallel with the process of doing research that conforms to standard publication norms.

One key reason I’m currently doing a PhD in philosophy is that I hope philosophy of science can provide one such venue for addressing important questions which can’t be explored very well within scientific fields themselves. To be clear, I’m not claiming that this is the main focus of philosophy of science - there are many philosophical research questions which, to me and most scientists, seem misguided or confused. But the remit of philosophy of science is broad enough to allow investigations of a wide range of issues, while also rewarding thorough and rigorous analysis. So I’m excited about the field’s potential to bring clarity and insight to the high-level questions scientists are most curious about, especially in AI. Even if this doesn’t allow us to resolve those questions directly, I think it will at least help to tease out different conceptual possibilities, and thereby make an important contribution to scientific - and human - progress.

Tuesday, 27 October 2020

What is past, and passing, and to come?

I've realised lately that I haven't posted much on my blog this year. Funnily enough, this coincides with 2020 being my most productive year so far. So in addition to belatedly putting up a few cross-posts from elsewhere, I thought it'd be useful to share here some of the bigger projects I've been working on which haven't featured elsewhere on this blog.

The most important is AGI safety from first principles (also available here as a PDF), my attempt to put together the most compelling case for why the development of artificial general intelligence might pose an existential threat to humanity. It's long (about 15,000 words) but I've tried to make it as accessible as possible to people without a machine learning background, because I think the topic is so critically important, and because there's an appalling lack of clear explanations of what might go wrong and why. Early work by Bostrom and Yudkowsky is less relevant in the context of modern machine learning; more recent work is scattered and brief. I originally intended to just summarise other people's arguments, but as the report grew, it became more representative of my own views and less representative of anyone else's. So while it covers the standard ideas, I also think that it provides a new perspective on how to think about AGI - one which doesn't take any previous claims for granted, but attempts to work them out from first principles.

A second big piece of work is Thiel on progress and stagnation, a 100-page compendium of quotes from Peter Thiel on - you guessed it - progress and stagnation in technology, and in society more generally. This was a joint project with Jeremy Nixon. We both find Thiel's views to be exciting and thought-provoking - but apart from his two books (which focused on different topics) they'd previously only been found scattered across the internet. Our goal was to select and arrange quotes from him to form a clear, compelling and readable presentation of his views. You can judge for yourself if we succeeded - although if you're pressed for time, there's a summary here.

Thirdly, I've put together the Effective Altruism archives reading list. This collates a lot of material from across the internet written by EAs on a range of relevant topics, much of which is otherwise difficult to find (especially older posts). The reading list is aimed at people who are familiar with EA but want to explore in more detail some of the ideas that have historically been influential within EA. These are often more niche or unusual than the material used to promote EA, and I don't endorse all of them - although I tried to only include high-quality content that I think is worth reading if you're interested in the corresponding topic.

Fourth is my first published paper, Avoiding Side Effects By Considering Future Tasks, which was accepted at NeurIPS 2020! Although note that my contributions were primarily on the engineering side; this is my coauthor Victoria's brainchild. From the abstract: Designing reward functions is difficult: the designer has to specify what to do (what it means to complete the task) as well as what not to do (side effects that should be avoided while completing the task). To alleviate the burden on the reward designer, we propose an algorithm to automatically generate an auxiliary reward function that penalizes side effects. This auxiliary objective rewards the ability to complete possible future tasks, which decreases if the agent causes side effects during the current task. ... Using gridworld environments that test for side effects and interference, we show that our method avoids interference and is more effective for avoiding side effects than the common approach of penalizing irreversible actions.

Fifth, a series of posts on AI safety, exploring safety problems and solutions applicable to agents trained in open-ended environments, particularly multi-agent ones. Unlike most safety techniques, these don't rely on precise specifications - instead they involve "shaping" our agents to think in safer ways, and have safer motivations. Note that this is primarily speculative brainstorming; I'm not confident in any of them, although I'd be excited to see further exploration along these lines.

More generally, I've been posting a range of AI safety content on the Alignment Forum; I'm particularly happy about these three posts. And I've been asking questions I'm curious about on Less Wrong and the Effective Altruism Forum. Lastly, I've been very active on Twitter over the past couple of years; I haven't yet gotten around to collating my best tweets, but will do so eventually (and post them on this blog).

So that's what I've been up to so far this year. What's now brewing? I'm currently drafting my first piece of work for my PhD, on the links between biological fitness-maximisation and optimisation in machine learning. A second task is to revise the essay on Tinbergen's levels of explanation which I wrote for my Cambridge application - I think there are some important insights in there, but it needs a lot of work. I'm also writing a post tentatively entitled A philosopher's apology, explaining why I decided to get a PhD, what works very well about academia and academic philosophy, what's totally broken, and how I'm going to avoid (or fix) those problems. Lastly, I'm ruminating over some of the ideas discussed here, with the goal of (very slowly) producing a really comprehensive exploration of them. Thoughts or comments on any of these very welcome!

Zooming out, this year has featured what was probably the biggest shift of my life so far: the switch from my technical career as an engineer and AI researcher, to becoming a philosopher and general thinker-about-things. Of course this was a little butterfly-inducing at times. But increasingly I believe that what the world is missing most is novel and powerful ideas, so I'm really excited about being in a position where I can focus on producing them. So far I only have rough stories about how that happens, and what it looks like to make a big difference as a public intellectual - I hope to refine these over time to be able to really leverage my energies. Then onwards, and upwards!

Against strong bayesianism

In this post (cross-posted from Less Wrong) I want to lay out some intuitions about why bayesianism is not very useful as a conceptual framework for thinking either about AGI or human reasoning. This is not a critique of bayesian statistical methods; it’s instead aimed at the philosophical position that bayesianism defines an ideal of rationality which should inform our perspectives on less capable agents, also known as "strong bayesianism". As described here:

The Bayesian machinery is frequently used in statistics and machine learning, and some people in these fields believe it is very frequently the right tool for the job.  I’ll call this position “weak Bayesianism.”  There is a more extreme and more philosophical position, which I’ll call “strong Bayesianism,” that says that the Bayesian machinery is the single correct way to do not only statistics, but science and inductive inference in general – that it’s the “aspirin in willow bark” that makes science, and perhaps all speculative thought, work insofar as it does work.

Or another way of phrasing the position, from Eliezer:

You may not be able to compute the optimal [Bayesian] answer.  But whatever approximation you use, both its failures and successes will be explainable in terms of Bayesian probability theory.

First, let’s talk about Blockhead: Ned Block’s hypothetical AI that consists solely of a gigantic lookup table. Consider a version of Blockhead that comes pre-loaded with the optimal actions (according to a given utility function) for any sequence of inputs which takes less than a million years to observe. So for the next million years, Blockhead will act just like an ideal superintelligent agent. Suppose I argued that we should therefore study Blockhead in order to understand advanced AI better. Why is this clearly a bad idea? Well, one problem is that Blockhead is absurdly unrealistic; you could never get anywhere near implementing it in real life. More importantly, even though Blockhead gets the right answer on all the inputs we give it, it’s not doing anything remotely like thinking or reasoning.

The general lesson here is that we should watch out for when a purported "idealised version" of some process is actually a different type of thing to the process itself. This is particularly true when the idealisation is unimaginably complex, because it might be hiding things in the parts which we can’t imagine. So let's think about what an ideal bayesian reasoner like a Solomonoff inductor actually does. To solve the grain of truth problem, the set of hypotheses it represents needs to include every possible way that the universe could be. We don't yet have any high-level language which can describe all these possibilities, so the only way to do so is listing all possible Turing machines. Then in order to update the probabilities in response to new evidence, it needs to know how that entire universe evolves up to the point where the new evidence is acquired.
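To make the shape of that procedure concrete, here is a minimal sketch in Python. It is a toy, not Solomonoff induction: the hypothesis class is just "the observations are some bit pattern repeated forever", the prior weight of 4^-length is an arbitrary stand-in for a simplicity prior, and the function names (make_hypotheses, update, prob_next_is_one) are invented for this illustration.

```python
from fractions import Fraction
from itertools import product


def make_hypotheses(max_len):
    """Enumerate every repeating bit pattern up to max_len, with a simplicity prior.

    Each pattern of length n gets unnormalised weight 4**-n (an assumption made
    for this toy), so shorter patterns start out more probable.
    """
    weights = {}
    for n in range(1, max_len + 1):
        for bits in product("01", repeat=n):
            weights["".join(bits)] = Fraction(1, 4 ** n)
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


def predicts(pattern, t):
    """The bit that a repeating pattern says will appear at time step t."""
    return pattern[t % len(pattern)]


def update(posterior, t, observed_bit):
    """Condition on one observation: drop refuted hypotheses, then renormalise."""
    survivors = {h: w for h, w in posterior.items() if predicts(h, t) == observed_bit}
    if not survivors:
        # No grain of truth: nothing in the hypothesis class fits the data.
        raise ValueError("every hypothesis has been refuted")
    total = sum(survivors.values())
    return {h: w / total for h, w in survivors.items()}


def prob_next_is_one(posterior, t):
    """Posterior probability that the bit at time step t is '1'."""
    return sum(w for h, w in posterior.items() if predicts(h, t) == "1")


beliefs = make_hypotheses(4)
for t, bit in enumerate("010101"):
    beliefs = update(beliefs, t, bit)
print(beliefs)
```

Even in this toy, all of the work is enumeration: the number of hypotheses doubles with each extra bit of pattern length, and every one of them has to be run forward to the current time step before it can be compared with the evidence.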

In other words, an ideal bayesian is not thinking in any reasonable sense of the word - instead, it’s simulating every logically possible universe. By default, we should not expect to learn much about thinking based on analysing a different type of operation that just happens to look the same in the infinite limit. Similarly, the version of Blockhead I described above is basically an optimal tabular policy in reinforcement learning. In reinforcement learning, we’re interested in learning policies which process information about their surroundings - but the optimal tabular policy for any non-trivial environment is too large to ever be learned, and when run does not actually do any information-processing! Yet it's particularly effective as a red herring because we can do proofs about it, and because it can be calculated in some tiny environments.
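As a rough illustration of "can be calculated in some tiny environments", the sketch below runs value iteration on a five-state chain and returns the optimal tabular policy as a plain dictionary. The environment (a chain with a reward for reaching the rightmost state), the discount factor and the function name are all assumptions chosen for this example.

```python
def optimal_tabular_policy(n_states=5, gamma=0.9, sweeps=100):
    """Value iteration on a toy chain; returns a lookup table from state to action.

    States are 0..n_states-1, actions are -1 (left) and +1 (right), and entering
    the last state gives reward 1 and ends the episode.
    """
    actions = (-1, +1)
    goal = n_states - 1
    values = [0.0] * n_states  # the terminal state keeps value 0

    def step(state, action):
        nxt = min(max(state + action, 0), goal)
        reward = 1.0 if nxt == goal else 0.0
        return nxt, reward

    def action_value(state, action):
        nxt, reward = step(state, action)
        return reward + gamma * values[nxt]

    # Repeatedly back up each non-terminal state's value until it settles.
    for _ in range(sweeps):
        for s in range(goal):
            values[s] = max(action_value(s, a) for a in actions)

    # The finished policy is nothing more than a table of best actions.
    return {s: max(actions, key=lambda a: action_value(s, a)) for s in range(goal)}


policy = optimal_tabular_policy()
print(policy[2])  # acting is a single dictionary lookup
```

All of the computation happens while the table is being built; running the policy afterwards is a single lookup per step. The table also needs one entry per state, which is what rules it out for any environment of realistic size.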

You might argue that strong bayesianism is conceptually useful, and thereby helps real humans reason better. But I think that concepts from strong bayesianism are primarily useful because they have suggestive names, which make it hard to realise how much work our intuitions are doing to translate from ideal bayesianism to our actual lives. For more on what I mean by this, consider the following (fictional) dialogue:

Alice the (literal-minded) agnostic: I’ve heard about this bayesianism thing, and it makes sense that I should do statistics using bayesian tools, but is there any more to it?

Bob the bayesian: Well, obviously you can’t be exactly bayesian with finite compute. But the intuition that you should try to be more like an ideal bayesian is a useful one which will help you have better beliefs about the world. In fact, most of what we consider to be “good reasoning” is some sort of approximation to bayesianism.

A: So let me try to think more like an ideal bayesian for a while, then. Well, the first thing is - you’re telling me that a lot of the things I’ve already observed to be good reasoning are actually approximations to bayesianism, which means I should take bayesianism more seriously. But ideal bayesians don’t update on old evidence. So if I’m trying to be more like an ideal bayesian, I shouldn’t change my mind about how useful bayesianism is based on those past observations.

B: No, that’s silly. Of course you should. Ignoring old evidence only makes sense when you’ve already fully integrated all its consequences into your understanding of the world.

A: Oh, I definitely haven’t done that. But speaking of all the consequences - what if I’m in a simulation? Or an evil demon is deceiving me? Should I think about as many such skeptical hypotheses as I can, to be more like an ideal bayesian who considers every hypothesis?

B: Well, technically ideal bayesians consider every hypothesis, but only because they have infinite compute! In practice you shouldn’t bother with many far-fetched hypotheses, because that’s a waste of your limited time.*

A: But what if I have some evidence towards that hypothesis? For example, I just randomly thought of the hypothesis that the universe has exactly a googolplex atoms in it. But there's some chance that this thought was planted in my mind by a higher power to allow me to figure out the truth! I should update on that, right?

B: Look, in practice that type of evidence is not worth keeping track of. You need to use common sense to figure out when to actually make the effort of updating.

A: Hmm, alright. But when it comes to the hypotheses I do consider, they should each be an explicit description of the entire universe, right, like an ideal bayesian’s hypotheses?

B: No, that’s way too hard for a human to do.

A: Okay, so I’ll use incomplete hypotheses, and then assign probabilities to each of them. I guess I should calculate as many significant digits of my credences as possible, then, to get them closer to the perfectly precise real-valued credences that an ideal bayesian has?

B: Don’t bother. Imprecise credences are good enough except when you’re solving mathematically precise problems.

A: Speaking of mathematical precision, I know that my credences should never be 0 or 1. But when an ideal bayesian conditions on evidence they’ve received, they’re implicitly being certain about what that evidence is. So should I also be sure that I’ve received the evidence I think I have?

B: No-

A: Then since I’m skipping all these compute-intensive steps, I guess getting closer to an ideal bayesian means I also shouldn’t bother to test my hypotheses by making predictions about future events, right? Because an ideal bayesian gets no benefit from doing so - they can just make updates after they see the evidence.

B: Well, it’s different, because you’re biased. That’s why science works, because making predictions protects you from post-hoc rationalisation.

A: Fine then. So what does it actually mean to be more like an ideal bayesian?

B: Well, you should constantly be updating on new evidence. And it seems like thinking of degrees of belief as probabilities, and starting from base rates, are both helpful. And then sometimes people conditionalise wrong on simple tasks, so you need to remind them how to do so.

A: But these aren’t just bayesian ideas - frequentists are all about base rates! Same with “when the evidence changes, I change my mind” - that one’s obvious. Also, when people try to explicitly calculate probabilities, sometimes they’re way off.** What’s happening there?

B: Well, in complex real-world scenarios, you can’t trust your explicit reasoning. You have to fall back on intuitions like “Even though my inside view feels very solid, and I think my calculations account for all the relevant variables, there’s still a reasonable chance that all my models are wrong.”

A: So why do people advocate for the importance of bayesianism for thinking about complex issues if it only works in examples where all the variables are well-defined and have very simple relationships?

B: I think bayesianism has definitely made a substantial contribution to philosophy. It tells us what it even means to assign a probability to an event, and cuts through a lot of metaphysical bullshit.

Back to the authorial voice. Like Alice, I'm not familiar with any principled or coherent characterisation of what trying to apply bayesianism actually means. It may seem that Alice’s suggestions are deliberately obtuse, but I claim these are the sorts of ideas you’d consider if you seriously tried to consistently “become more bayesian”, rather than just using bayesianism to justify types of reasoning you endorse for other reasons.

I agree with Bob that the bayesian perspective is useful for thinking about the type signature of calculating a subjective probability: it’s a function from your prior beliefs and all your evidence to numerical credences, whose quality should be evaluated using a proper scoring rule. But for this insight, just like Bob’s insights about using base rates and updating frequently, we don’t need to make any reference to optimality proofs or the idealised limit of intelligence as brute force search. In fact, doing so often provides an illusion of objectivity which is ultimately harmful. I do agree that most things people identify as tenets of bayesianism are useful for thinking about knowledge; but I claim that they would be just as useful, and better-justified, if we forced each one to stand or fall on its own.
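The type signature and the scoring rule can both be written down in a few lines, without any appeal to an idealised reasoner. The sketch below is only an illustration: the base-rate numbers are made up, the log score is one proper scoring rule among several, and the function names are invented for this example.

```python
import math


def posterior(prior, likelihoods):
    """Bayes' rule over a finite set of hypotheses.

    prior maps each hypothesis to P(h); likelihoods maps it to P(evidence | h).
    """
    unnormalised = {h: prior[h] * likelihoods[h] for h in prior}
    total = sum(unnormalised.values())
    return {h: p / total for h, p in unnormalised.items()}


def log_score(credence, outcome):
    """Log scoring rule for a binary event: higher (closer to zero) is better."""
    return math.log(credence if outcome else 1.0 - credence)


def expected_log_score(reported, true_prob):
    """Average score from reporting `reported` when the event has chance `true_prob`."""
    return (true_prob * log_score(reported, True)
            + (1.0 - true_prob) * log_score(reported, False))


# Starting from a base rate: 1% prevalence, and a test that comes back positive
# for 90% of sick people and 9% of healthy people (made-up numbers).
after_positive = posterior({"sick": 0.01, "healthy": 0.99},
                           {"sick": 0.90, "healthy": 0.09})
print(after_positive["sick"])

# "Proper" means that honesty is optimal: on this grid, the report with the
# best expected score is the true probability itself.
grid = [i / 100 for i in range(1, 100)]
print(max(grid, key=lambda q: expected_log_score(q, 0.3)))
```

The posterior for "sick" comes out at a little over 9%, which is the kind of simple conditionalisation people often get wrong. Nothing here depends on the hypotheses being complete descriptions of the universe, or on the credences being exact.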

* Abram Demski has posted about moving past bayesianism by accounting for logical uncertainty to a greater extent, but I think that arguments similar to the ones I’ve made above are also applicable to logical inductors (although I’m less confident about this).

** You can probably fill in your own favourite example of this. The one I was thinking about was a post where someone derived that the probability of extinction from AI was less than 1 in 10^200; but I couldn’t find it. 

The Future of Science

This is the transcript of a short talk I gave a few months ago, which contains a (fairly rudimentary) presentation of some ideas about the future of science that I've been mulling over for a while. I'm really hoping to develop them much further, since I think this is a particularly important and neglected area of inquiry. Cross-posted from Less Wrong; thanks to Jacob Lagerros and David Lambert for editing the transcript, and to various other people for asking very thought-provoking questions.

Today I'll be talking about the future of science. Even though this is an important topic (because science is very important) it hasn’t received the attention I think it deserves. One reason is that people tend to think, “Well, we’re going to build an AGI, and the AGI is going to do the science.” But this doesn’t really offer us much insight into what the future of science actually looks like.

It seems correct to assume that AGI is going to figure a lot of things out. I am interested in what these things are. What is the space of all the things we don’t currently understand? What knowledge is possible? These are ambitious questions. But I’ll try to come up with some framings that I think are interesting.

One way of framing the history of science is through individuals making observations and coming up with general principles to explain them. So in physics, you observe how things move and how they interact with each other. In biology, you observe living organisms, and so on. I'm going to call this “descriptive science”. More recently, however, we have developed a different type of science, which I'm going to call “generative science”. This basically involves studying the general principles behind things that don’t exist yet and still need to be built.

This is, I think, harder than descriptive science, because you don't actually have anything to study. You need to bootstrap your way into it. A good example of this is electric circuits. We can come up with fairly general principles for describing how they work. And eventually this led us to computer science, which is again very general. We have a very principled understanding of many aspects of computer science, which is a science of things that didn't exist before we started studying them. I would also contrast this to most types of engineering such as aerospace engineering. I don't think it's principled or general enough to put it in the same class as physics or biology and so on.

So what would it look like if we took all the existing sciences and made them more generative? For example, in biology, instead of saying, "Here are a bunch of living organisms, how do they work?" you would say, "What are all the different possible ways that you might build living organisms, or what is the space of possible organisms and why did we end up in this particular part of the space on Earth?"

Even just from the perspective of understanding how organisms work, this seems really helpful. You understand things in contrast to other things. I don't think we're really going to fully understand how the organisms around us work until we understand why evolution didn't go down all these different paths. And for doing that it's very useful to build those other organisms.

You could do the same thing with physics. Rather than asking how our universe works, you could ask how an infinite number of other possible universes would work. It seems safe to assume that this would keep people busy for quite a long time.

Another direction that you could go in is asking how this would carry over to things we don’t currently think of as science. Take sociology, for example. Sociology is not very scientific right now. It's not very good, broadly speaking. But why? And how might it become more scientific in the future? One aspect of this is just that societies are very complicated, and they're composed of minds, which are also very complicated. There are also a lot of emergent effects of those minds interacting with each other, which makes it a total mess.

So one way of solving this is by having more intelligent scientists. Maybe humans just aren't very good at understanding systems where the base-level components are as intelligent as humans. Maybe you need to have a more intelligent agent studying the system in order to figure out the underlying principles by which it works.

But another aspect of sociology that makes it really hard to study, and less scientific, is that you can't generate societies to study. You have a hypothesis, but you can't generate a new society to test it. I think this is going to change over the coming decades. You are going to be able to generate systems of agents intelligent enough that they can do things like cultural evolution. And you will be able to study these generated societies as they form. So even a human-level scientist might be able to make a science out of sociology by generating lots of different environments and model societies.

The examples of this we've seen so far are super simple but actually quite interesting, like Axelrod's Prisoner's Dilemma tournament or Laland's Social Learning Tournament. There are a couple of things like that which led to really interesting conclusions, despite having really, really basic agents. So I'm excited to see what much more advanced work of this type could look like.
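
As a rough illustration of the kind of tournament mentioned above, here is a minimal Python sketch of a round-robin iterated Prisoner's Dilemma. It is not a reconstruction of Axelrod's actual setup: the three strategies, the conventional 5/3/1/0 payoff values, the 200-round match length, and all function names (play_match, run_tournament, and so on) are assumptions chosen for illustration.

```python
from itertools import combinations

# Conventional prisoner's dilemma payoffs for the first player
# (assumed values; any T > R > P > S ordering gives the same dilemma).
PAYOFF = {
    ("C", "C"): 3,  # reward for mutual cooperation
    ("C", "D"): 0,  # sucker's payoff
    ("D", "C"): 5,  # temptation to defect
    ("D", "D"): 1,  # punishment for mutual defection
}


def tit_for_tat(own_history, other_history):
    """Cooperate first, then copy the opponent's previous move."""
    return other_history[-1] if other_history else "C"


def always_defect(own_history, other_history):
    """Defect unconditionally."""
    return "D"


def grudger(own_history, other_history):
    """Cooperate until the opponent defects once, then defect forever."""
    return "D" if "D" in other_history else "C"


def play_match(strategy_a, strategy_b, rounds=200):
    """Play one iterated match and return the two total scores."""
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        # Each strategy sees its own history first, then the opponent's.
        move_a = strategy_a(history_a, history_b)
        move_b = strategy_b(history_b, history_a)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b


def run_tournament(strategies, rounds=200):
    """Pair every strategy with every other once and total the scores."""
    totals = {name: 0 for name in strategies}
    for (name_a, a), (name_b, b) in combinations(strategies.items(), 2):
        score_a, score_b = play_match(a, b, rounds)
        totals[name_a] += score_a
        totals[name_b] += score_b
    return totals


if __name__ == "__main__":
    entrants = {
        "tit_for_tat": tit_for_tat,
        "always_defect": always_defect,
        "grudger": grudger,
    }
    for name, score in sorted(run_tournament(entrants).items(),
                              key=lambda item: -item[1]):
        print(f"{name}: {score}")
```

Even with agents this basic, the ranking depends heavily on which strategies are in the pool: adding an unconditional cooperator, for instance, gives the defector an easy target and changes who comes out ahead. That sensitivity to the surrounding population is part of what makes these toy tournaments informative.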


Ben: Thank you very much, Richard. That was fascinating. So you made this contrast between generative and more descriptive versions of science.

How much of that set was just a matter of whether or not feedback loops existed in these other spaces? Once we came up with microprocessors, suddenly we were able to build, research, and explore quite a lot of new, more advanced things using science.

And similarly with the sociology example, you mentioned something along the lines of, "We'll potentially get to a place where we can actually just test a lot of these things and then a science will form around this measurement tool." In your opinion, is this a key element in being able to explore new sciences?

Richard: Yes. I think feedback loops are pretty useful. I'd say there's probably just a larger space of things in generative sciences. We have these computer architectures, right? So we can study them. But how do we know that the computer architectures couldn't have been totally different? This is not really a question that traditional sciences focus on that much.

Biologists aren't really spending much of their time asking, "But what if animals had been totally different? What are all the possible ways that you could design a circulatory system, and mitochondria, and things like that?" I think some interesting work is being done that does ask these questions, but it seems like, broadly speaking, there's just a much richer space to explore.

David: So when you started talking about generative versus descriptive, my initial thought was Schelling's “Micromotives and Macrobehavior” where basically the idea was, “Hey, even if you start with these pretty basic things, you can figure out how discrimination happens even if people have very slight preferences.” There's a lot of things he did with that, but what strikes me about it is that it was done with very simple individual agents. Beyond that (unless you go all the way to purely rational actor agents, and even then you need lots and lots of caveats and assumptions), you don’t get much in terms of how economics works.

Even if you can simulate everybody, it doesn't give you much insight. Is that a problem for your idea of how science develops?

Richard: So you're saying that if we can simulate everyone given really simple models of them, it still doesn't give us much insight?

David: Even when we have complex models of them, we can observe their behavior but we can't do much with it. We can't tell you much that's useful as a result of even pretty good models.

Richard: I would just say that our models are not very good, right? Broadly speaking, often in economics, it feels something like "we're going to reduce all human preferences to a single dimension but still try to study all the different ways that humans interact; all their friendships and various types of psychological workings and goals and so on".

You can collapse all of these things in different ways and then study them, but I don't think we've had models that are anywhere near the complexity of the phenomena that are actually relevant to people's behavior.

David: But even when they are predictive, even when you can actually replicate what it is that you see with humans, it doesn't seem like you get very much insight into the dynamics... other than saying, "Hey, look, this happens." And sometimes, your assumptions are actually wrong, yet you still recover correct behavior. So overall it didn't tell us very much other than, yes, you successfully replicated what happened.

Richard: Right. What it seems like to me is that there are lots of interesting phenomena that happen when you have systems of interacting agents in the world. People do a bunch of interesting things. So I think that if you have the ability to recreate that, then you’d have the ability to play around with it and just see in which cases this arises and which cases it doesn't arise.

Maybe the way I'd characterize it is something like: in our current models, sometimes they're good enough to recreate something that vaguely looks like this phenomenon, but then if you modify it you don't get other interesting phenomena. It's more that they break, I guess. So what would be interesting is the case where you have the ability to model agents that are sophisticated enough, that when you change the inputs away from recreating the behavior that we have observed in humans, you still get some other interesting behavior. Maybe the tit-for-tat agents are a good example of this, where the set-up is pretty simple, but even then you can come up with something that's fairly novel.

Owain: I think your talk was based on a really interesting premise. Namely that, if we do have AGI in the next 50 years, I think it's plausible that development will be fairly continuous; meaning that on the road to AGI we'll have very powerful, narrow AI that is going to be transformative for science. And I think now is a really good time to think about, in advance, how science could be transformed by this technology.

Maybe it is an opportunity similar to big science coming out of World War II, or the mathematization of lots of scientific fields in the 20th century that were informal before.

You brought up one plausible aspect of that: much better ability to run simulations. In particular, simulations of intelligent agents, which are very difficult to run at the moment.

But you could look at all the aspects of what we do in science and say “how much will narrow AI (that’s still much more advanced than today’s AI) actually help with that?” I think that even with simulations, there are going to be limits, due to its difficulty. Some things are just computationally intractable to simulate. AI's not going to change that. There are NP-hard problems even when simulating very simple physical systems.

And when you're doing economics or sociology, there are humans, rational agents. You can get better at simulating them. But humans interact with the physical world, right? We create technologies. We suffer natural disasters. We suffer from pandemics. And so, the intractability is going to bite when you're trying to simulate, say, human history or the future of a group of humans. Does that make sense? I am curious about your response.

Richard: I guess I don't have strong opinions about which bits will be intractable in particular. I think there's probably a lot of space for high-level concepts that we don't currently have. So maybe one way of thinking about this is game theory. Game theory is a pretty limited model in a lot of ways. But it still gives us many valuable concepts like “defecting in a prisoner's dilemma”, and so on, that inform the way that we view complex systems, even though we don't really know exactly what the hypothesis we're evaluating is. Even just having that type of thing brought to our attention is sufficient to reframe the way that we see a lot of things.

So I guess the thing I'm most excited about is this expansion of concepts. This doesn't feel super intractable because it doesn't feel like you need to simulate anything in its full complexity in order to get the concepts that are going to be really useful going forward.