Tuesday, 19 January 2021

Deutsch and Yudkowsky on scientific explanation

Science aims to come up with good theories about the world - but what makes a theory good? The standard view is that the key traits are predictive accuracy and simplicity. Deutsch focuses instead on the concepts of explanation and understanding: a good theory is an explanation which enhances our understanding of the world. This is already a substantive claim, because various schools of instrumentalism have been fairly influential in the philosophy of science. I do think that this perspective has a lot of potential, and later in this essay I explore some ways to extend it. First, though, I discuss a few of Deutsch’s arguments which I don’t think succeed, in particular when compared to the Bayesian rationalist position defended by Yudkowsky.

Firstly, Deutsch says that good explanations are “hard to vary”, because every part of the explanation is playing a role. But this seems very similar to the standard criterion of simplicity. Deutsch rejects simplicity as a criterion because he claims that theories like “The gods did it” are simple. Yet I’m persuaded by Yudkowsky’s argument that a version of the “The gods did it” theory which could actually predict a given set of data would essentially need to encode all that data, making it very complex. I’m not sold on Yudkowsky’s definition of simplicity in terms of Kolmogorov complexity (for reasons I’ll explain later on), but re-encoding a lot of data should give rise to a complex hypothesis by any reasonable definition. So it seems most parsimonious to interpret the “hard to vary” criterion as an implication of the simplicity criterion.
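
To make this concrete, here is a minimal sketch of the point (my own toy illustration, not Deutsch’s or Yudkowsky’s example; the data, the two hypotheses, and the use of compressed length as a crude stand-in for Kolmogorov complexity are all assumptions made for illustration). The idea is simply that a “gods did it” hypothesis which actually predicts the observations must carry them inside itself, while the true generating rule stays short:

```python
import random
import zlib

# Toy illustration: compressed length as a crude stand-in for Kolmogorov complexity.
rng = random.Random(42)
observations = "".join(rng.choice("HT") for _ in range(10_000))  # coin-flip-like data

# Hypothesis A: the actual generating rule, written down as a short description.
hypothesis_a = "observations = random.Random(42).choice('HT'), repeated 10000 times"

# Hypothesis B: "the gods did it", fleshed out enough to actually predict each
# observation - which means it has to carry every observation inside itself.
hypothesis_b = "the gods decreed the following sequence: " + observations

def description_length(text: str) -> int:
    """Compressed size in bytes: a rough proxy for descriptive complexity."""
    return len(zlib.compress(text.encode()))

print(description_length(hypothesis_a))  # tens of bytes: the rule is genuinely simple
print(description_length(hypothesis_b))  # over a thousand bytes: grows with the data
```

However the hypothesis is phrased, a theory that merely re-encodes its data cannot come out simple.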

Secondly, Deutsch says that good explanations aren’t just predictive, but rather tell us about the underlying mechanisms which generate those predictions. As an illustration, he argues that even if we can predict the outcome of a magic trick, what we really want to know is how the trick works. But this argument doesn’t help very much in adjudicating between scientific theories - in practice, it’s often valuable to accept purely predictive theories as stepping-stones to more complete theories. For example, Newton’s inverse square law of gravity was a great theory despite not attempting to explain why gravity worked that way; instead it paved the way for future theories which did so (and which also made better predictions). If Deutsch is just arguing that eventually science should aim to identify all the relevant underlying mechanisms, then I think that most scientific realists would agree with him. The main exception would be in the context of foundational physics. Yet that’s a domain in which it’s very unclear what it means for an underlying mechanism to “really exist”; it’s so far removed from our everyday intuitions that Deutsch’s magician analogy doesn’t seem very applicable.

Thirdly, Deutsch says that we can understand the importance of testability in terms of the difference between good and bad explanations:

“The best explanations are the ones that are most constrained by existing knowledge – including other good explanations as well as other knowledge of the phenomena to be explained. That is why testable explanations that have passed stringent tests become extremely good explanations.”

But this doesn’t help us distinguish between explanations which have themselves been tested, versus explanations which were formulated afterwards to match the data from those same tests. Both are equally constrained by existing knowledge - why should we be more confident in the former? Without filling in this step of the argument, it’s hard to understand the central role of testability in science. I think, again, that Yudkowsky provides the best explanation: that the human tendency towards hindsight bias means we dramatically overestimate how well our theories explain observed data, unless we’re forced to make predictions in advance.

Having said all this, I do think that Deutsch’s perspective is valuable in other ways. I was particularly struck by his argument that the “theory of everything” which fundamental physicists search for would be less interesting than a high-level “theory of everything” which forges deep links between ideas from many disciplines (although I wish he’d say a bit more about what it means for a theory to be “deep”). This argument (along with the rest of Deutsch’s framework) pushes back against the longstanding bias in philosophy of science towards treating physics as the central example of science. In particular, thinking of theories as sets of equations is often appropriate for physics, but much less so for fields which are less formalism-based - i.e. almost all of them.[0] For example, the theory of evolution is one of the greatest scientific breakthroughs, and yet its key insights can’t be captured by a formal model. In Chapman’s terminology, evolution and most other theories are somewhat nebulous. This fits well with Deutsch’s focus on science as a means of understanding the world - because even though formalisms don’t deal well with nebulosity, our minds do.

Another implication of the nebulosity of scientific theories is that we should move beyond the true-false dichotomy when discussing them. Bayesian philosophy of science is based on our credences about how likely theories are to be true. But it’s almost never the case that high-level theories are totally true or totally false; they can explain our observations pretty well even if they don’t account for everything, or are built on somewhat leaky abstractions. And so assigning probabilities only to the two outcomes “true” and “false” seems simplistic. I still consider probabilistic thinking about science to be valuable, but I expect that thinking in terms of degrees of truth is just as valuable. And the latter comes naturally from thinking of theories as explanations, because we intuitively understand that the quality of explanations should be evaluated in a continuous rather than binary way.[1]
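
As a very rough illustration of the difference (my own sketch, not a formalism from the Bayesian literature or from Deutsch; the domains and numbers below are invented), compare a single credence that a theory is exactly true with a graded measure of how much of the relevant territory it gets right:

```python
# Binary view: one credence that the theory is exactly true.
credence_theory_is_true = 0.05  # most high-level theories fail somewhere, so this stays low

# Graded view: how well the theory accounts for observations, domain by domain.
explanatory_adequacy = {
    "core phenomena":    0.95,
    "edge cases":        0.60,
    "novel predictions": 0.80,
}
degree_of_truth = sum(explanatory_adequacy.values()) / len(explanatory_adequacy)

print(credence_theory_is_true)    # 0.05: says little about how useful the theory is
print(round(degree_of_truth, 2))  # 0.78: "mostly right", which is usually what we care about
```

The first number collapses as soon as any part of the theory fails; the second degrades gracefully, which is closer to how we actually talk about the quality of explanations.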

Lastly, Deutsch provides a good critique of philosophical positions which emphasise prediction over explanation. He asks us to imagine an “experiment oracle” which is able to tell us exactly what the outcome of any specified experiment would be:

“If we gave it the design of a spaceship, and the details of a proposed test flight, it could tell us how the spaceship would perform on such a flight. But it could not design the spaceship for us in the first place. And even if it predicted that the spaceship we had designed would explode on take-off, it could not tell us how to prevent such an explosion. That would still be for us to work out. And before we could work it out, before we could even begin to improve the design in any way, we should have to understand, among other things, how the spaceship was supposed to work. Only then would we have any chance of discovering what might cause an explosion on take-off. Prediction – even perfect, universal prediction – is simply no substitute for explanation.”

Although I assume it isn’t intended as such, this is a strong critique of Solomonoff induction, a framework which Yudkowsky defends as an idealised model for how to reason. The problem is that the types of hypotheses considered by Solomonoff induction are not explanations, but rather computer programs which output predictions. This means that even a hypothesis which is assigned very high credence by Solomonoff induction might be nearly as incomprehensible as the world itself, or more so - for example, if it merely consists of a simulation of our world. So I agree with Deutsch: even idealised Solomonoff induction (with infinite compute) would lack some crucial properties of explanatory science.[2]
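
To see why, it helps to caricature the framework (a deliberately tiny sketch of my own; real Solomonoff induction sums over all computable programs, and the candidate “programs” and description lengths below are invented). Hypotheses are just programs weighted by two to the power of minus their length, and nothing in the procedure requires the winning program to be comprehensible:

```python
# A toy caricature of Solomonoff induction: hypotheses are programs that generate
# the observed sequence, prior-weighted by 2^(-description length in bits).
observed = [0, 1, 0, 1, 0, 1]

# Stand-ins for "all programs": (name, notional length in bits, generator).
candidates = [
    ("alternate 0 and 1", 8, lambda i: i % 2),
    ("always output 0", 5, lambda i: 0),
    ("replay the observed data, then output 1s", 40,
     lambda i: observed[i] if i < len(observed) else 1),
]

def predict_next(data):
    """Mixture prediction for the next bit, over programs consistent with the data."""
    weights, predictions = [], []
    for name, length_bits, program in candidates:
        if all(program(i) == bit for i, bit in enumerate(data)):
            weights.append(2.0 ** -length_bits)
            predictions.append(program(len(data)))
    return sum(w * p for w, p in zip(weights, predictions)) / sum(weights)

print(predict_next(observed))  # ~2e-10: the short "alternate" program dominates
```

The output is a prediction, and the dominant program is legible here only because the example is tiny; a program simulating our entire world could win the same competition without conferring any understanding at all.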

Extending the view of science as explanation

How could Deutsch’s view of science as the production of human-comprehensible explanations actually improve science in practice? One way is by drawing on the social science literature on explanations. Miller identifies four overarching lessons:
  1. Explanations are contrastive - they are sought in response to particular counterfactual cases.
  2. Explanations are selected (in a biased manner) - humans are adept at selecting one or two causes from a sometimes infinite number of causes to be the explanation.
  3. Referring to probabilities or statistical relationships in explanation is not as effective as referring to causes.
  4. Explanations are social - they are a transfer of knowledge, presented as part of a conversation or interaction, and are thus presented relative to the explainer’s beliefs about the explainee’s beliefs.
We can apply some of these lessons to improve scientific explanations. Scientific theories are usually formulated in terms of existing phenomena; but to provide properly contrastive explanations, science will need to refer to counterfactuals. For example, in order to fully explain the anatomy of an animal species, we’ll need to understand other possible anatomical structures, and the reasons why those didn’t evolve instead. Geoffrey West’s work on scaling laws in biology provides a good example of this type of explanation. Similarly, we shouldn’t think of fundamental physics as complete until we understand not only how our universe works, but also which counterfactual laws of physics could have generated other universes as interesting as ours.

A second way we can try to use Deutsch’s framework to improve science: what does it mean for a human to understand an explanation? Can we use findings from cognitive science, psychology or neuroscience to make suggestions for the types of theories scientists work towards? This seems rather difficult, but I’m optimistic that there’s some progress to be made. For example, analogies and metaphors play an extensive role in everyday human cognition, as highlighted by Lakoff’s Metaphors We Live By. So instead of thinking about analogies as useful ways to communicate a scientific theory, perhaps we should consider them (in some cases) to be a core part of the theory itself. Focusing on analogies may slightly reduce those theories’ predictive power (because it’s hard to cash out analogies in terms of predictions) while nevertheless increasing the extent to which they allow us to actually understand the world. I’m reminded of the elaborate comparison between self-reference in mathematics and self-replication in biology drawn by Hofstadter in Gödel, Escher, Bach - if we prioritise a vision of science as understanding, then this sort of work should be much more common. However, the human tendency towards hindsight bias is a formidable opponent, and so we should always demand that such theories also provide novel predictions, in order to prevent ourselves from generating an illusion of understanding.


[0]. As an example of this bias, see the first two perspectives on scientific theories discussed here; my position is closest to the third, the pragmatic view.
[1]. Work on logical induction and embedded agency may partly address this issue; I’m not sure.
[2]. I was originally planning to go on to discuss Deutsch’s broader critiques of empiricism and induction. But Deutsch makes it hard to do this, because he doesn’t refer very much to the philosophical literature, or to specific people whose views he disagrees with. It seems to me that this leads to a lot of merely verbal disagreements. For example, when he critiques the idea of knowledge being “derived” from experience, or scientific theories being “justified” by empirical experience, I feel like he’s using definitions of these terms which diverge from what most people, and also most philosophers, take them to mean. Nor do I think that his characterisation of observation as theory-laden is inconsistent with standard inductivism; he seems to think it is, but doesn’t provide evidence for that. So I’ve decided not to go deeper into these issues, except to note my skepticism about his position.

Friday, 15 January 2021

Meditations on faith

A few months before his death, Leonard Cohen, the great lyricist of modern spirituality, sang to God:

Magnified, sanctified

Be the holy name

Vilified, crucified

In the human frame

A million candles burning

For the help that never came

You want it darker


You're lining up the prisoners

And the guards are taking aim

I struggled with some demons

They were middle class and tame

I didn't know I had permission

To murder and to maim

You want it darker


Hineni, hineni

I'm ready, My Lord

The first lines are a reference to the Mourner’s Kaddish, a Jewish prayer for the deceased. The million candles - each one in remembrance of a life lost - remind us of tragedies upon preventable tragedies. So too with the prisoners, the guards, the murders: if these are part of some deity’s plan, it’s a deity which wants the world darker. Finally, hineni is what Abraham said when God called upon him to sacrifice Isaac. It means Here I am; but with deep connotations: I am willing, or perhaps I am yours.

Together, I find these verses, and the rest of the song, deeply striking and totally incomprehensible. They throw the brute fact of immense suffering, and death, and darkness, at the listener. And then they don’t question it, or retreat from it, or refute it. The opposite! Hineni: yielding completely. Leonard Cohen, at death’s door, singing “I’m ready, My Lord”, as if there is nothing to excuse or explain. Or more: as if the litany of suffering in the rest of the song only strengthens his convictions.

It occurred to me, listening to these lyrics, that I had no idea what that mental state feels like; that there is a whole spectrum of experience - apparently, overwhelmingly powerful experience - which is alien to me.

Now, perhaps I am misinterpreting Cohen; perhaps these lyrics are ironic, or despairing (although if so, I just don’t see it, and neither does the former UK Chief Rabbi). But either way, the story of Abraham and Isaac, which encapsulates the question of faith in the face of suffering, is one with which religious thinkers have wrestled down the centuries. It is the story that Kierkegaard chooses to illustrate the absolute absurdity of absolute faith - and its greatness, not in spite of, but because of that absurdity.

Yet, striking as Kierkegaard’s writings are, suppose that we would prefer not to found our worldviews on absurdity; what then? Is there something about this experience which atheists can learn from and absorb? I think there is, and it comes from thinking more deeply about faith. Forgive me if I am naively recapitulating well-known arguments; in this domain, it seems worth cultivating these thoughts myself.

What do we actually mean by faith? Let me distinguish two related concepts: faith as belief (in particular, about properties of God), and faith as surrender. Atheists know that faith in standard beliefs about God is misguided: that God is not omnibenevolent, nor the source of morality, nor the rightful authority. From this perspective, the story of Abraham arouses bewilderment, or scorn, or pity. But, for me at least, this has obscured the second aspect of faith. There are many beautiful things about the human spirit, and one of the greatest is our ability to trust each other - where surrendering control entirely to another is the ultimate form of trust. The absurdity of the story of Abraham comes not from the fact that he surrenders in this way, but rather from the fact that we cannot imagine any good reason why an omnipotent God should require Isaac to be sacrificed. Yet if, instead of God, we picture Abraham’s dearest friend saying to him, “I can’t tell you why, and I’m aghast that this is the case, but it is imperative that you take your only son, whom you love dearly, and kill him” - if at that point Abraham says “Hineni”, in the knowledge that his friend would do the same if their roles were reversed - then it is clearer that Abraham’s surrender reflects a depth of trust and connection that we should strive towards.[0]

So where does this leave us? Abraham’s belief in God was misguided. Today we do better by embracing the virtues of the Enlightenment - of reason and humanism and defiance of unjust authority. But nevertheless, in his ability to trust, he displays something valuable, and powerful enough that it has kept us coming back to his story for thousands of years.

----------------------------------------------------------------------------------

I think I believe this argument. Yet there’s something odd: if faith as surrender is so compelling to so many others, why do I (still) not feel the appeal of it? I’ve been familiar with this core idea since learning that Islam literally means “surrender”, many years ago. And yet I’ve never felt like that’s what’s missing in my life; nor do I hear people talk about it very much outside the context of religion.

Maybe I’m biased by a longstanding dislike of religion. But a hypothesis that worries me more is that I’m too much a child of individualistic modernity to actually understand or appreciate this sort of surrender. Scott Alexander talks about concept-shaped holes: ideas you don’t even realise you don’t understand. Some of his examples: not believing that society has become atomised and individualistic, because you mistake the scattered remnants of community around you for the real thing. Or being shut off from genuine emotional connection, but not realising it because you think that your shallow connections are as good as it gets.

And I wonder whether the same thing has happened to “trust”. When I say that I trust somebody, often I’m thinking about them being a good and sensible person. Even when I say that I’d trust somebody with my life, it feels like that’s still pretty close to saying that they’re a really good person, and pretty careful, and also that they care about me.

Okay, but now if I imagine saying: I trust somebody enough to surrender my will to theirs - enough that they could override any of my decisions, and I wouldn’t even resent them for it, because I’ve chosen to have faith in them. Oh. Yeah. That feels pretty different. That feels like a state of mind I don’t even understand, because I think of myself so deeply in individualistic terms that I’m not sure what it would mean to hand over that level of agency. Does it happen in the best relationships? I don’t know, but I can’t recall anyone talking about it, at least not in these terms. More generally, maybe this is what people mean when they talk about becoming “part of something bigger than you”. But I’ve done that, with movements and ideologies, and even so, I relate to them very much as a free individual. I expect that it used to be much easier to feel defined by being a part of something much bigger than you - your family, your town, your religion. Now, even when we still belong to these groups, we no longer let them subsume our autonomy in the same way.

Another way to think about this change is as an increasing desire for control. Modernity’s defining feature is humans exercising control over our environments. Many object to this, perhaps because it seems hubristic - but the tragedy is that we shouldn’t trust nature, or God, or fate; they’re indifferent to us. Humanity needs to seize control of its destiny because nobody else will do it for us. And one of the key ways we do so is via science, where taking nothing on faith is a central tenet.

But perhaps my mistake has been in thinking of my personal life as a microcosm of humanity, and trying to seize control of it in similar ways. Unlike humanity, I’m not surrounded by uncaring nature, but by a social environment of people who are sufficiently trustworthy that this core religious instinct, to surrender, may be worth leaning into. In other words: if we should no longer put our faith in God, and we should no longer (after the atrocities of the 20th century) put our faith in society at large - perhaps we can still fill that fundamental need by putting our faith in the people around us, on a smaller scale.

I started off by talking about absolute faith. But extremism is seldom a virtue when it comes to human interactions - and in this case it runs into many of the same problems as absolute faith in God. What if you’re mistaken about the people you’ve put your faith in? What if they want you to do bad things? I notice that many people who attract this level of faith end up as cult leaders; and surrender to ideologies has often led to atrocities. So in practice, I think we shouldn’t get anywhere near Abrahamic levels of trust when we don’t have an omnibenevolent being to direct it towards. But Abraham as an idealised example, to nudge us in the right direction in our small-scale personal lives at least, and to counterbalance the underlying pressure to be in control - well, that seems like an idea worth having. As a humanist, I consider human relationships to be one of the main sources of meaning and value in the world. And in a secular, humanist context, we’re not just the one surrendering, but also the one being surrendered to. So part of moving towards that ideal is making ourselves worthy of that responsibility. In that sense, at least, humanists should aim to play the role of God.


[0] I’ve specified an equal relationship, because that fits with modern sensibilities. When we imagine Abraham trusting an authority figure in an unreciprocated way, it starts to seem a bit weird - what gives them that authority? But I imagine that the ancients saw unequal relationships (e.g. their marriages) as potentially also very meaningful and fulfilling.

Scope-sensitive ethics: capturing the core intuition motivating utilitarianism

Classical utilitarianism has many advantages as an ethical theory. But there are also many problems with it, some of which I discuss here. A few of the most important:
  • The idea of reducing all human values to a single metric is counterintuitive. Most people care about a range of things, including both their conscious experiences and outcomes in the world. I haven’t yet seen a utilitarian conception of welfare which describes what I’d like my own life to be like.
  • Concepts derived from our limited human experiences will lead to strange results when they’re taken to extremes (as utilitarianism does). Even for things which seem robustly good, trying to maximise them will likely give rise to divergence at the tails between our intuitions and our theories, as in the repugnant conclusion.
  • Utilitarianism doesn’t pay any attention to personal identity (except by taking a person-affecting view, which leads to worse problems). At an extreme, it endorses the world destruction argument: that, if given the opportunity to kill everyone who currently exists and replace them with beings with greater welfare, we should do so.
  • Utilitarianism is post-hoc on small scales; that is, although you can technically argue that standard moral norms are justified on a utilitarian basis, it’s very hard to explain why these moral norms are better than others. In particular, it seems hard to make utilitarianism consistent with caring much more about people close to us than strangers.
I (and probably many others) think that these objections are compelling, but none of them defeat the core intuition which makes utilitarianism appealing: that some things are good, and some things are bad, and we should continue to want more good things and fewer bad things even beyond the parochial scales of our own everyday lives. Instead, the problems seem like side effects of trying to pin down a version of utilitarianism which provides a precise, complete guide for how to act. Yet I’m not convinced that this is very useful, or even possible. So I’d prefer that people defend the core intuition directly, at the cost of being a bit vaguer, rather than defending more specific utilitarian formalisations which have all sorts of unintended problems. Until now I’ve been pointing to this concept by saying things like “utilitarian-ish” or “90 percent utilitarian”. But it seems useful for coordination purposes to put a label on the property which I consider to be the most important part of utilitarianism; I’ll call it “scope-sensitivity”.

My tentative definition is that scope-sensitive ethics consists of:
  • Endorsing actions which, in expectation, bring about more intuitively valuable aspects of individual lives (e.g. happiness or preference-satisfaction), or bring about fewer intuitively disvaluable aspects of individual lives (e.g. suffering or betrayal).
  • A tendency to endorse actions much more strongly when those actions increase (or decrease, respectively) those things much more.
I hope that describing myself as caring about scope-sensitivity conveys the most important part of my ethical worldview, without implying that I have a precise definition of welfare, or that I want to convert the universe into hedonium, or that I’m fine with replacing humans with happy aliens. Now, you could then ask me which specific scope-sensitive moral theory I subscribe to. But I think that this defeats the point: as soon as we start trying to be very precise and complete, we’ll likely run into many of the same problems as utilitarianism. Instead, I hope that this term can be used in a way which conveys a significant level of uncertainty or vagueness, while also being a strong enough position that if you accept scope-sensitivity, you don’t need to clarify the uncertainty or vagueness much in order to figure out what to do. (I say "uncertainty or vagueness" because moral realists are often particularly uncomfortable with the idea of morality being intrinsically vague, and so this phrasing allows them to focus on the uncertainty part: the idea that some precise scope-sensitive theory is true, but we don't yet know which one. Whereas my own position is that it's fine and indeed necessary for morality to be intrinsically imprecise, and so it's hard to draw the line between questions we're temporarily uncertain about, and questions which don't have well-defined answers. From this perspective, we can also think about scope-sensitive ethics as a single vague theory in its own right.)

How does the definition I've given address the problems I described above? Firstly, it’s pluralist (within the restrictions of common sense) about what contributes to the welfare of individuals. The three most common types of utilitarian conceptions of welfare are hedonic theories, desire theories and objective-list theories. But each of these captures something which I care about; and I don't think we know nearly enough about human minds (let alone non-human minds) to justify taking a strong position on which combination of these constitutes a good life. Scope-sensitivity also allows room for even wider conceptions of welfare: for example, people who think that achieving virtue is the most valuable aspect of life can be scope-sensitive if they try to promote that widely.

Secondly, it’s also consistent with pluralism about value more generally. Scope-sensitivity doesn’t require you to only care about welfare; you can value other things, as long as they don’t override the overall tendency to prioritise actions with bigger effects. In particular, unlike utilitarianism, scope-sensitivity is consistent with using non-consequentialist or non-impartial reasoning about most small-scale actions we take (even when we can't justify why that reasoning leads to the best consequences by impartial standards). Furthermore, it doesn’t require that you endorse welfare-increasing actions because they increase welfare. In addition to my moral preferences about sentient lives, I also have moral preferences about the trajectory of humanity as a whole: as long as humanity flourishing is correlated closely enough with humans flourishing, then those motivations are consistent with scope-sensitivity.

Thirdly, scope-sensitivity isn’t rigid. It doesn’t require welfare-maximisation in all cases; instead, specifying a “tendency” rather than a “rule” of increasing welfare allows us to abide by other constraints as well. I think this reflects the fact that a lot of people do have qualms about extreme cases (for which there may not be any correct answers) even when their general ethical framework aims towards increasing good things and decreasing bad things.

I should make two further points about evaluating the scope-sensitivity of existing moral theories. Firstly, I think it’s best interpreted as a matter of degree, rather than a binary classification. Secondly, we can distinguish between “principled” scope-sensitivity (scope-sensitivity across a wide range of scenarios, including implausible thought experiments) versus “practical” scope-sensitivity (scope-sensitivity given realistic scenarios and constraints).

I expect that almost all of the people who are most scope-sensitive in principle will be consequentialists. But in practice, non-consequentialists can also be highly scope-sensitive. For example, it may be the case that a deontologist who follows the rule “try to save the world, if it’s in danger” is in practice nearly as scope-sensitive as a classical utilitarian, even if they also obey other rules which infrequently conflict with it (e.g. not lying). Meanwhile, some variants of utilitarianism (such as average utilitarianism) also aren’t scope-sensitive in principle, although they may be in practice.

One problem with the concept of scope-sensitivity is that it might induce motte-and-bailey fallacies - that is, we might defend our actions on the basis of scope-sensitivity when challenged, but then in practice act according to a particular version of utilitarianism which we haven't justified. But I actually think the opposite happens now: people are motivated by the intuition towards scope-sensitivity, and then defend their actions by appealing to utilitarianism. So I hope that introducing this concept improves our moral discourse, by pushing people to explicitly make the argument that scope-sensitivity is sufficient to motivate views like longtermism.

Another possibility is that scope-sensitivity is too weak a concept to motivate action - for example, if people claim to be scope-sensitive, but add a few constraints which mean they don’t ever need to act accordingly. But even if scope-sensitivity in principle is broad enough to include such views, hopefully the concept of practical scope-sensitivity identifies a natural cluster of moral views which, if people follow them, will actually make the world a much better place.

Sunday, 22 November 2020

My fictional influences

I’ve identified as a bookworm for a very long time. Throughout primary school and high school I read voraciously, primarily science fiction and fantasy. But given how much time I’ve spent reading fiction, it’s surprisingly difficult to pin down how it’s influenced me. (This was also tricky to do for nonfiction, actually - see my attempt in this post.)

Thinking back to the fiction I’ve enjoyed the most, two themes emerge: atmosphere, and cleverness. The atmosphere that really engages me in fiction is one that says: the world is huge; there’s so much to explore; and there’s a vastness of potential. But one that’s also a little melancholy - because you can’t possibly experience all of it, and time always flows onwards. I was particularly struck by the ending of The Lord of the Rings, when Frodo leaves all of Middle-Earth behind; by His Dark Materials, when Lyra gains, and loses, uncountable worlds; by the Malazan saga, occurring against a fictional backdrop of hundreds of thousands of years of epic history; and by Speaker for the Dead, as Ender skims through the millennia. Oh, and I can’t forget George R. R. Martin’s A Song for Lya, Neil Gaiman’s The Ocean at the End of the Lane, and Paterson’s Bridge to Terabithia - none are the subtlest, but they're all exquisitely wistful. I’m compelled by the aesthetic: each of these a whole world that never was and will never be!

The other thing I love in fiction is cleverness: Xanatos Gambits and Magnificent Bastards and plans within plans that culminate in startling and brilliant ways. Ender’s Game is a great example; so too is The Lies of Locke Lamora. On the literary side, I loved Catch-22 for its cleverness in weaving together so many peculiar threads into a striking tapestry. Lately the novels which most scratch this itch have been online works, particularly Unsong, Worm, and A Practical Guide to Evil. Some sci-fi novels also fall in this category - I’m thinking particularly of Snow Crash, Accelerando, and Hyperion.

It’s hard to tell whether my fiction preferences shaped my worldview or vice versa, but I’d be surprised if all this reading weren’t at least partially responsible for me often thinking about the big picture for humanity, and personally aiming for ambitious goals. What’s more difficult is to point to specific things I gained from these books. I don’t identify with many fictional characters, and can't think of any personal conclusions that I've gained from depictions of them (perhaps apart from: communicate more!). I did read a lot of “big idea” books, but they were never that satisfying - fiction always seemed like an inefficient medium for communicating them.

But for some reason this has changed a bit over the last few years. I now find myself regularly thinking back to a handful of books as a way to remind myself of certain key ideas - in particular books that pair those ideas with compelling plots and characters. In no particular order:
  • Unsong is the work of fiction that most inspires me to be a better person; to do the things that “somebody has to and no one else will”.
  • Diaspora makes me reflect on the emptiness of pure ambition, and the arbitrariness of human preferences.
  • The Darkness That Comes Before pushes me to understand my mind and motivations - to illuminate “what comes before” my thoughts and actions.
  • Accelerando confronts me with the sheer scale of change that humanity might face.
  • Island and Walden Two underline the importance of social progress in building utopias.
  • Flowers for Algernon reminds me of the importance of emotional intelligence.
I wish I had a similar list of fiction which taught me important lessons about friendships and relationships, but for whatever reason I haven’t really found many fictional relationships particularly inspiring. I’m very curious about what would be on other people’s lists, though.

My intellectual influences

Prompted by a friend's question about my reading history, I've been thinking about what shaped the worldview I have today. This has been a productive exercise, which I recommend to others. Although I worry that some of what's written below is post-hoc confabulation, at the very least it's forced me to pin down what I think I learned from each of the sources listed, which I expect will help me track how my views change from here on. This blog post focuses on non-fiction books (and some other writing); I've also written a blog post on how fiction has influenced me.

My first strong intellectual influence was Eliezer Yudkowsky’s writing on Less Wrong (now collected in Rationality: From AI to Zombies). I still agree with many of his core claims, but don’t buy into the overarching narratives as much. In particular, the idea of “rationality” doesn’t play a big role in my worldview any more. Instead I focus on specific habits and tools for thinking well (as in Superforecasting), and on creating communities with productive epistemic standards (a focus of less rationalist accounts of reason and science, e.g. The Enigma of Reason and The Structure of Scientific Revolutions).

Two other strong influences around that time were Scott Alexander’s writings on tribalism in politics, and Robin Hanson’s work on signalling (particularly The Elephant in the Brain), both of which are now foundational to my worldview. Both are loosely grounded in evolutionary psychology, although not reliant on it. More generally, even if I’m suspicious of many individual claims from evolutionary psychology, the idea that humans are continuous with animals is central to my worldview (see Darwin’s Unfinished Symphony and Are We Smart Enough to Know How Smart Animals Are?). In particular, it has shaped my views on naturalistic ethics (via a variety of sources, with Wright’s The Moral Animal being perhaps the most central).

Another big worldview question is: how does the world actually change? At one point I bought into techno-economic determinism about history, based on reading big-picture books like Guns, Germs and Steel and The Silk Roads, and also because of my understanding of the history of science (e.g. the prevalence of multiple discovery). Sandel’s What Money Can’t Buy nudged me towards thinking more about cultural factors; so did books like The Dream Machine and The Idea Factory, which describe how many technologies I take for granted were constructed. And reading Bertrand Russell’s History of Western Philosophy made me start thinking about the large-scale patterns in intellectual history (on which The Modern Mind further shaped my views).

This paved the way for me to believe that there’s room to have a comparable influence on our current world. Here I owe a lot to Tyler Cowen’s The Great Stagnation (and to a lesser extent its sequels), Peter Thiel’s talks and essays (and to a lesser extent his book Zero to One), and Paul Graham’s essays. My new perspective is similar to the standard “Silicon Valley mindset”, but focusing more on the role of ideas than technologies. To repurpose the well-known quote: “Practical men who believe themselves to be quite exempt from any intellectual influence are usually the slaves of some defunct philosopher.”

Here’s a more complete list of nonfiction books which have influenced me, organised by topic (although I’ve undoubtedly missed some). I welcome recommendations, whether they’re books that fit in with the list below, or books that fill gaps in it!

On ethics:

  • The Righteous Mind

  • Technology and the Virtues

  • Reasons and Persons

  • What Money Can’t Buy

  • The Precipice


On human evolution:

  • The Enigma of Reason

  • The Human Advantage

  • Darwin’s Unfinished Symphony

  • The Secret of our Success

  • Human Evolution (Dunbar)

  • The Mating Mind

  • The Symbolic Species


On human minds and thought:

  • Rationality: from AI to Zombies

  • The Elephant in the Brain

  • How to Create a Mind

  • Why Buddhism is True

  • The Blank Slate

  • The Language Instinct

  • The Stuff of Thought

  • The Mind is Flat

  • Superforecasting

  • Thinking, Fast and Slow


On other sciences:

  • Scale: The Universal Laws of Life and Death in Organisms, Cities and Companies

  • Superintelligence

  • The Alignment Problem

  • Are We Smart Enough to Know How Smart Animals Are?

  • The Moral Animal

  • Ending Aging

  • Improbable Destinies

  • The Selfish Gene

  • The Blind Watchmaker

  • Complexity: The Emerging Science at the Edge of Order and Chaos

  • Quantum Computing Since Democritus


On science itself:


On philosophy:

  • A History of Western Philosophy

  • The Intentional Stance

  • From Bacteria to Bach and Back

  • Good and Real

  • The Big Picture

  • Consciousness and the Social Brain

  • An Enquiry Concerning Human Understanding


On history and economics:

  • The Shortest History of Europe

  • A Farewell to Alms

  • The Technology Trap

  • Iron, Steam and Money

  • The Enlightened Economy

  • The Commanding Heights


On politics and society:


On life, love, etc:

  • Deep Work

  • Man's Search for Meaning

  • More Than Two

  • Authentic Happiness

  • Happiness by Design

  • Written in History


Other:

  • Age of Em

  • Immortality: The Quest to Live Forever and How It Drives Civilization

  • Surely you’re Joking, Mr Feynman

  • Impro

  • Never Split the Difference

Saturday, 7 November 2020

Why philosophy of science?

During my last few years working as an AI researcher, I increasingly came to appreciate the distinction between what makes science successful and what makes scientists successful. Science works because it has distinct standards for what types of evidence it accepts, with empirical data strongly prioritised. But scientists spend a lot of their time following hunches which they may not even be able to articulate clearly, let alone in rigorous scientific terms - and throughout the history of science, this has often paid off. In other words, the types of evidence which are most useful in choosing which hypotheses to prioritise can differ greatly from the types of evidence which are typically associated with science. In particular, I’ll highlight two ways in which this happens.

The first is scientists thinking in terms of concepts which fall outside the dominant paradigm of their science. That might be because those concepts are too broad, or too philosophical, or too interdisciplinary. For example, machine learning researchers are often inspired by analogies to evolution, or beliefs about human cognition, or issues in philosophy of language - which are all very hard to explore deeply in a conventional machine learning paper! Often such ideas are mentioned briefly in papers, perhaps in the motivation section - but there’s not the freedom to analyse them with the level of detail and rigour that is required for making progress on tricky conceptual questions.

The second is that scientists often have strong visions for what their field could achieve, and long-term aspirations for their research. These ideas may make a big difference to what subfields or problems those researchers focus on. In the case of AI, some researchers aim to automate a wide range of tasks, or to understand intelligence, or to build safe AGI. Again, though, these aren’t ideas which the institutions and processes of the field of AI are able to thoroughly discuss and evaluate - instead, they are shared and developed primarily in informal ways.

Now, I’m not advocating for these ideas to be treated the same as existing scientific research - I think norms about empiricism are very important to science’s success. But the current situation is far from ideal. As one example, Rich Sutton’s essay on the bitter lesson in AI was published on his blog, and then sparked a fragmented discussion on other blogs and personal Facebook walls. Yet in my opinion this argument about AI, which draws on his many decades of experience, is one of the most crucial ideas for the field to understand and evaluate properly. So I think we need venues for such discussions to occur in parallel with the process of doing research that conforms to standard publication norms.

One key reason I’m currently doing a PhD in philosophy is that I hope philosophy of science can provide one such venue for addressing important questions which can’t be explored very well within scientific fields themselves. To be clear, I’m not claiming that this is the main focus of philosophy of science - there are many philosophical research questions which, to me and most scientists, seem misguided or confused. But the remit of philosophy of science is broad enough to allow investigations of a wide range of issues, while also rewarding thorough and rigorous analysis. So I’m excited about the field’s potential to bring clarity and insight to the high-level questions scientists are most curious about, especially in AI. Even if this doesn’t allow us to resolve those questions directly, I think it will at least help to tease out different conceptual possibilities, and thereby make an important contribution to scientific - and human - progress.

Tuesday, 27 October 2020

What is past, and passing, and to come?

I've realised lately that I haven't posted much on my blog this year. Funnily enough, this coincides with 2020 being my most productive year so far. So in addition to belatedly putting up a few cross-posts from elsewhere, I thought it'd be useful to share some of the bigger projects I've been working on which haven't yet featured on this blog.

The most important is AGI safety from first principles (also available here as a PDF), my attempt to put together the most compelling case for why the development of artificial general intelligence might pose an existential threat to humanity. It's long (about 15,000 words) but I've tried to make it as accessible as possible to people without a machine learning background, because I think the topic is so critically important, and because there's an appalling lack of clear explanations of what might go wrong and why. Early work by Bostrom and Yudkowsky is less relevant in the context of modern machine learning; more recent work is scattered and brief. I originally intended to just summarise other people's arguments, but as the report grew, it became more representative of my own views and less representative of anyone else's. So while it covers the standard ideas, I also think that it provides a new perspective on how to think about AGI - one which doesn't take any previous claims for granted, but attempts to work them out from first principles.

A second big piece of work is Thiel on progress and stagnation, a 100-page compendium of quotes from Peter Thiel on - you guessed it - progress and stagnation in technology, and in society more generally. This was a joint project with Jeremy Nixon. We both find Thiel's views to be exciting and thought-provoking - but apart from his two books (which focused on different topics) they'd previously only been available scattered across the internet. Our goal was to select and arrange quotes from him to form a clear, compelling and readable presentation of his views. You can judge for yourself if we succeeded - although if you're pressed for time, there's a summary here.

Thirdly, I've put together the Effective Altruism archives reading list. This collates a lot of material from across the internet written by EAs on a range of relevant topics, much of which is otherwise difficult to find (especially older posts). The reading list is aimed at people who are familiar with EA but want to explore in more detail some of the ideas that have historically been influential within EA. These are often more niche or unusual than the material used to promote EA, and I don't endorse all of them - although I tried to only include high-quality content that I think is worth reading if you're interested in the corresponding topic.

Fourth is my first published paper, Avoiding Side Effects By Considering Future Tasks, which was accepted at NeurIPS 2020! Although note that my contributions were primarily on the engineering side; this is my coauthor Victoria's brainchild. From the abstract: Designing reward functions is difficult: the designer has to specify what to do (what it means to complete the task) as well as what not to do (side effects that should be avoided while completing the task). To alleviate the burden on the reward designer, we propose an algorithm to automatically generate an auxiliary reward function that penalizes side effects. This auxiliary objective rewards the ability to complete possible future tasks, which decreases if the agent causes side effects during the current task. ... Using gridworld environments that test for side effects and interference, we show that our method avoids interference and is more effective for avoiding side effects than the common approach of penalizing irreversible actions.
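
For a rough flavour of the general idea (a hypothetical sketch of my own, not the algorithm from the paper; the function names, goals, reachability check and the coefficient beta are all invented), the reward being optimised can be augmented with a term tracking how many possible future tasks remain achievable, so that side effects which foreclose future tasks are implicitly penalised:

```python
from typing import Callable, Iterable

def shaped_reward(
    task_reward: float,
    state,
    future_goals: Iterable,
    reachable: Callable[[object, object], bool],
    beta: float = 0.1,
) -> float:
    """Task reward plus a bonus for keeping hypothetical future goals reachable."""
    goals = list(future_goals)
    if not goals:
        return task_reward
    ability = sum(reachable(state, g) for g in goals) / len(goals)
    return task_reward + beta * ability

# Hypothetical usage: breaking a vase (a side effect) makes one future goal
# unreachable, so the shaped reward drops even though the task reward is identical.
goals = ["deliver_coffee", "put_flowers_in_vase"]
reachable = lambda state, goal: not (goal == "put_flowers_in_vase" and state["vase_broken"])

print(shaped_reward(1.0, {"vase_broken": False}, goals, reachable))  # 1.1
print(shaped_reward(1.0, {"vase_broken": True}, goals, reachable))   # 1.05
```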

Fifth is a series of posts on AI safety, exploring safety problems and solutions applicable to agents trained in open-ended environments, particularly multi-agent ones. Unlike most safety techniques, these don't rely on precise specifications - instead they involve "shaping" our agents to think in safer ways, and have safer motivations. Note that this is primarily speculative brainstorming; I'm not confident in any of these proposals, although I'd be excited to see further exploration along these lines.

More generally, I've been posting a range of AI safety content on the Alignment Forum; I'm particularly happy about these three posts. And I've been asking questions I'm curious about on Less Wrong and the Effective Altruism Forum. Lastly, I've been very active on Twitter over the past couple of years; I haven't yet gotten around to collating my best tweets, but will do so eventually (and post them on this blog).

So that's what I've been up to so far this year. What's now brewing? I'm currently drafting my first piece of work for my PhD, on the links between biological fitness-maximisation and optimisation in machine learning. A second task is to revise the essay on Tinbergen's levels of explanation which I wrote for my Cambridge application - I think there are some important insights in there, but it needs a lot of work. I'm also writing a post tentatively entitled A philosopher's apology, explaining why I decided to get a PhD, what works very well about academia and academic philosophy, what's totally broken, and how I'm going to avoid (or fix) those problems. Lastly, I'm ruminating over some of the ideas discussed here, with the goal of (very slowly) producing a really comprehensive exploration of them. Thoughts or comments on any of these very welcome!

Zooming out, this year has featured what was probably the biggest shift of my life so far: the switch from my technical career as an engineer and AI researcher, to becoming a philosopher and general thinker-about-things. Of course this was a little butterfly-inducing at times. But increasingly I believe that what the world is missing most is novel and powerful ideas, so I'm really excited about being in a position where I can focus on producing them. So far I only have rough stories about how that happens, and what it looks like to make a big difference as a public intellectual - I hope to refine these over time to be able to really leverage my energies. Then onwards, and upwards!