Saturday, 15 September 2018

Realism about rationality

Epistemic status: trying to vaguely gesture at vague intuitions.

Cross-posted to Less Wrong, where there's been some good discussion. A similar idea was explored here under the heading "the intelligibility of intelligence", although I hadn't seen it before writing this post.

There’s a mindset which is common in the rationalist community, which I call “realism about rationality” (the name being intended as a parallel to moral realism). I feel like my skepticism about agent foundations research is closely tied to my skepticism about this mindset, and so in this essay I try to articulate what it is.

Humans ascribe properties to entities in the world in order to describe and predict them. Here are three such properties: "momentum", "evolutionary fitness", and "intelligence". These are all pretty useful properties for high-level reasoning in the fields of physics, biology and AI, respectively. There's a key difference between the first two, though. Momentum is very amenable to formalisation: we can describe it using precise equations, and even prove things about it. Evolutionary fitness is the opposite: although nothing in biology makes sense without it, no biologist can take an organism and write down a simple equation to define its fitness in terms of more basic traits. This isn't just because biologists haven't figured out that equation yet. Rather, we have excellent reasons to think that fitness is an incredibly complicated "function" which basically requires you to describe that organism's entire phenotype, genotype and environment.

In a nutshell, then, realism about rationality is a mindset in which reasoning and intelligence are more like momentum than like fitness. It's a mindset which makes the following ideas seem natural:
  • The idea that there is a simple yet powerful theoretical framework which describes human intelligence and/or intelligence in general. (I don't count brute force approaches like AIXI for the same reason I don't consider physics a simple yet powerful description of biology).
  • The idea that there is an “ideal” decision theory.
  • The idea that AGI will very likely be an “agent”.
  • The idea that Turing machines and Kolmogorov complexity are foundational for epistemology.
  • The idea that, given certain evidence for a proposition, there's an "objective" level of subjective credence which you should assign to it, even under computational constraints.
  • The idea that morality is quite like mathematics, in that there are certain types of moral reasoning that are just correct.
  • The idea that defining coherent extrapolated volition in terms of an idealised process of reflection roughly makes sense, and that it converges in a way which doesn’t depend very much on morally arbitrary factors.
  • The idea that having contradictory preferences or beliefs is really bad, even when there’s no clear way that they’ll lead to bad consequences (and you’re very good at avoiding Dutch books and money pumps and so on).

To be clear, I am neither claiming that realism about rationality makes people dogmatic about such ideas, nor claiming that they're all false. In fact, from a historical point of view I’m quite optimistic about using maths to describe things in general. But starting from that historical baseline, I’m inclined to adjust downwards on questions related to formalising rational thought, whereas rationality realism would endorse adjusting upwards. This essay is primarily intended to explain my position, not justify it, but one important consideration for me is that intelligence as implemented in humans and animals is very messy, and so are our concepts and inferences, and so is the closest replica we have so far (intelligence in neural networks). It's true that "messy" human intelligence is able to generalise to a wide variety of domains it hadn't evolved to deal with, which supports rationality realism, but analogously an animal can be evolutionarily fit in novel environments without implying that fitness is easily formalisable.

Another way of pointing at rationality realism: suppose we model humans as internally-consistent agents with beliefs and goals. This model is obviously flawed, but also predictively powerful on the level of our everyday lives. When we use this model to extrapolate much further (e.g. imagining a much smarter agent with the same beliefs and goals), or base morality on this model (e.g. preference utilitarianism), is that more like using Newtonian physics to approximate relativity (works well, breaks down in edge cases) or more like cavemen using their physics intuitions to reason about space (a fundamentally flawed approach)?

Another gesture towards the thing: a popular metaphor for Kahneman and Tversky's dual process theory is a rider trying to control an elephant. Implicit in this metaphor is the localisation of personal identity primarily in the system 2 rider. Imagine reversing that, so that the experience and behaviour you identify with are primarily driven by your system 1, with a system 2 that is mostly a Hansonian rationalisation engine on top (one which occasionally also does useful maths). Does this shift your intuitions about the ideas above, e.g. by making your coherent extrapolated volition feel less well-defined? I claim that the latter perspective is just as sensible, and perhaps even more so - see, for example, Paul Christiano's model of the mind, which leads him to conclude that "imagining conscious deliberation as fundamental, rather than a product and input to reflexes that actually drive behavior, seems likely to cause confusion."

These ideas have been stewing in my mind for a while, but the immediate trigger for this post was a conversation about morality which went along these lines:

R (me): Evolution gave us a jumble of intuitions, which might contradict when we extrapolate them. So it’s fine to accept that our moral preferences may contain some contradictions.
O (a friend): You can’t just accept a contradiction! It’s like saying “I have an intuition that 51 is prime, so I’ll just accept that as an axiom.”
R: Morality isn’t like maths. It’s more like having tastes in food, and then having preferences that the tastes have certain consistency properties - but if your tastes are strong enough, you might just ignore some of those preferences.
O: For me, my meta-level preferences about the ways to reason about ethics (e.g. that you shouldn’t allow contradictions) are so much stronger than my object-level preferences that this wouldn’t happen. Maybe you can ignore the fact that your preferences contain a contradiction, but if we scaled you up to be much more intelligent, running on a brain orders of magnitude larger, having such a contradiction would break your thought processes.
R: Actually, I think a much smarter agent could still be weirdly modular like humans are, and work in such a way that describing it as having idealised “beliefs” is still a very lossy approximation. And it’s plausible that there’s no canonical way to “scale me up”.

I had a lot of difficulty in figuring out what I actually meant during that conversation, but I think a quick way to summarise the disagreement is that O is a rationality realist, and I’m not. This is not a problem, per se: I'm happy that some people are already working on AI safety from this mindset, and I can imagine becoming convinced that rationality realism is a more correct mindset than my own. But I think it's a distinction worth keeping in mind, because assumptions baked into underlying worldviews are often difficult to notice, and also because the rationality community has selection effects favouring this particular worldview even though it doesn't necessarily follow from the community's founding thesis (that humans can and should be more rational).

Tuesday, 4 September 2018

I'm so meta, even this acronym

This is my 52nd blog post within the span of one calendar year, since I (re)started blogging on September 5th, 2017. My writing productivity has far exceeded my expectations, and I'm very happy about managing to explore so many ideas. Here are some metrics.

Breakdown by topic:
  • 10 posts on computer science, machine learning, and maths
  • 10 posts on philosophy
  • 7 posts on politics and economics
  • 6 posts on modern life and the future of society
  • 5 posts on history and geography
  • 3 posts on intelligence
  • 12 miscellaneous posts (including this one)

Most popular (in order):
  1. In search of All Souls
  2. Is death bad?
  3. What have been the greatest intellectual achievements?
  4. Utilitarianism and its discontents
  5. The unreasonable effectiveness of deep learning
  6. Yes, you should be angry about the housing crisis
  7. Oxford vs Cambridge

Longest posts (in order of word count):
  1. A brief history of India. 5915
  2. Proof, computation and truth. 5627
  3. Utilitarianism and its discontents. 5550
  4. Yes, you should be angry about the housing crisis. 5291
  5. The unreasonable effectiveness of deep learning. 3887
  6. In search of All Souls. 3873
  7. Which neural network architectures perform best at sentiment analysis? 3291

Word clouds:

Daily users (since I signed up for Google Analytics):
I really need a better way of getting readers than just the spikes from sharing on Facebook.

Overall word count: 92,808 words (as a comparison, the first Harry Potter book was 76,944 words; The Hobbit was 95,022 words). This is over double the 44,816 words of material (excluding academic essays; most of it found here) which I'd written over the previous few years. Also note that this is a significant underestimate of how much I've actually written this last year, because I have at least a dozen drafts on the go at any one time, and right now also an extra half-dozen which I've written during my internship and am still mulling over.

However, I've been thinking lately that the focus for the next year will be on quality rather than quantity. This year has been fantastic in terms of intellectual exploration, but I'm not sure that any of my posts would actually contribute robustly valuable knowledge to people who already know about the topic. By contrast, I think my friends Jacob and Tom have blog posts which do so, because they're more careful and thorough. So I'd like to move in that direction a little more.

On the other hand, this blog is useful to me in many ways even if it doesn't make novel intellectual contributions. Probably the biggest is the fact that I started seriously reading current machine learning research while I was writing summaries of key ideas in deep learning. Without that, I wouldn't have learned nearly as much and may well not have gotten either my current internship or my upcoming job. Meanwhile, writing book reviews pushes me to read and understand books more thoroughly. In general, I'm glad to have something which feels like a tangible and permanent record of my personal intellectual progress, because I do worry about losing touch with my past self. Now I can be much more confident that my future self won't lose touch with me.

A compendium of conundrums

Logic puzzles

None of the puzzles below have trick answers - they can all be solved using logic and a bit of maths. Whenever a group of people need to achieve a task, assume they're allowed to confer and come up with a strategy beforehand. They're listed roughly in order of difficulty. Let me know of any other good ones you find!

Two ropes
I have two ropes which each, if lighted at one end, takes 1 hour to burn all the way to the other end. However, they burn at variable rates (e.g. the first might take 55 minutes to burn 1/4 of the way, then 5 minutes to burn all the rest; the second might be the opposite). How do I use them to time 45 minutes?

25 horses
I have 25 horses, and am trying to find the 3 fastest. I have no timer, but can race 5 at a time against each other; I know that a faster horse will always beat a slower horse. How many races do I need to find the 3 fastest, in order?

Monty Hall problem (explanation taken from here)
The set of Monty Hall's game show Let's Make a Deal has three closed doors. Behind one of these doors is a car; behind the other two are goats. The contestant does not know where the car is, but Monty Hall does. The contestant picks a door and Monty opens one of the remaining doors, one he knows doesn't hide the car. If the contestant has already chosen the correct door, Monty is equally likely to open either of the two remaining doors. After Monty has shown a goat behind the door that he opens, the contestant is always given the option to switch doors. Is it advantageous to do so, or disadvantageous, or does it make no difference?
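If you want to check your answer empirically, here is a minimal simulation sketch of the game as described above. The function names (`play`, `win_rate`), the number of trials and the fixed seed are my own choices for illustration, not part of the original puzzle.

```python
import random

def play(switch, rng):
    """Play one round; return True if the contestant ends up with the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # Monty opens a door that is neither the contestant's pick nor the car,
    # choosing uniformly at random when he has two options.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the only door that is neither the first pick nor the opened one.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

def win_rate(switch, trials=100_000, seed=0):
    """Estimate the fraction of rounds won under a fixed switch/stay policy."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials
```

Comparing `win_rate(True)` with `win_rate(False)` gives an estimate of how much switching matters, though it won't tell you why.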

Four-way duel
A, B, C and D are in a duel. In turn (starting with A) they each choose one person to shoot at, until all but one have been eliminated. They hit their chosen target 0%, 33%, 66% and 100% of the time, respectively. A goes first, and of course misses. It's now B's turn. Who should B aim at, to maximise their probability of winning?

Duck in pond
A duck is in a circular pond with a menacing cat outside. The cat runs four times as fast as the duck can swim, and always runs around the edge of the pond in whichever direction will bring it closest to the duck, but cannot enter the water. As soon as the duck reaches the shore it can fly away, unless the cat is already right there. Can the duck escape?

Non-transitive dice
Say that a die A beats another die B if, when both are rolled, the number on A is greater than the number on B more than 50% of the time. Is it possible to design three dice A, B and C such that A beats B, B beats C and C beats A?
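The definition of "beats" can be checked mechanically, which is handy for testing candidate designs. This is a small sketch assuming each die is given as a list of its face values and that every face is equally likely; `win_probability` and `beats` are names I've made up for it.

```python
from fractions import Fraction
from itertools import product

def win_probability(a, b):
    """Exact probability that die `a` shows a strictly greater number than die `b`."""
    wins = sum(1 for x, y in product(a, b) if x > y)
    return Fraction(wins, len(a) * len(b))

def beats(a, b):
    """True if `a` beats `b` in the sense defined above (more than 50% of the time)."""
    return win_probability(a, b) > Fraction(1, 2)
```

For example, an ordinary six-sided die does not beat a copy of itself, because ties don't count as wins: `beats([1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6])` is `False`.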

Wine tasting
A king has 100 bottles of wine, exactly one of which is poisoned. He decides to figure out which it is by feeding the wines to some of his servants, and seeing which ones drop dead. He wants to find out before the poisoner has a chance to get away, and so he doesn't have enough time to do this sequentially - instead he plans to give each servant some combination of the wines tonight, and see which are still alive tomorrow morning.
a) How many servants does he need?
b) Suppose he had 100 servants - then how many wines could he test?

Crawling on the planet's face
Two people are dropped at random places on a featureless spherical planet (by featureless I also mean that there are no privileged locations like poles). Assume that each person can leave messages which the other might stumble across if they come close enough (within a certain fixed distance).
a) How can they find each other for certain?
b) How can they find each other in an amount of time which scales linearly with the planet's radius?

Dropping coconuts
I have two identical coconuts, and am in a 100-floor building; I want to figure out the highest floor I can drop them from without them breaking. Assume that the coconuts aren't damaged at all by repeated drops from below that floor - but once one is broken, I can't use it again.
a) What's the smallest number of drops I need, in the worst case, to figure that out?
b) Can you figure out an equation for the general case, in terms of number of coconuts and number of floors?
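To check a proposed answer, a brute-force search over strategies is enough for small cases. The sketch below assumes the coconut might break even on floor 1 and might survive even floor 100; `min_drops` is a hypothetical helper name. It tries every first floor and recurses on whichever outcome is worse, so it gives you numbers to test a formula against rather than the formula itself.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def min_drops(coconuts, floors):
    """Fewest drops that always suffice, in the worst case, to find the
    highest safe floor among `floors` floors using `coconuts` coconuts."""
    if coconuts < 1:
        raise ValueError("need at least one coconut")
    if floors == 0:
        return 0  # nothing left to test
    if coconuts == 1:
        return floors  # must go up one floor at a time
    best = floors
    for x in range(1, floors + 1):
        # Drop from floor x: if it breaks, search the x - 1 floors below with
        # one fewer coconut; if not, search the floors - x floors above.
        worst = 1 + max(min_drops(coconuts - 1, x - 1),
                        min_drops(coconuts, floors - x))
        best = min(best, worst)
    return best
```

Calling `min_drops(2, 100)` answers part a), and tabulating a few rows and columns is one way to start guessing at part b).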

Pirate treasure
There are 5 pirates dividing up 100 gold coins in the following manner. The most senior pirate proposes a division (e.g. "99 for me, 1 for the next pirate, none for the rest of you"). All pirates then vote on this division. If a majority vote no, then the most senior pirate is thrown overboard, and the next most senior pirate proposes a division. Otherwise (including in the case of ties) the coins are split up as proposed. All pirates are entirely selfish, and have common knowledge of each other's perfect rationality.
a) What will the most senior pirate propose?
b) What about if there are 205 pirates?
c) Can you figure out a solution for the general case, in terms of number of coins and number of pirates?

There are n people, all wearing black or white hats. Each can see everyone else's hat colour, but not their own. They have to sort themselves into a line with all the white hats on one end and all the black hats on the other, but are not allowed to communicate about hat colours in any way. How can they do it?

Knights and knaves
You are travelling along a road and come to a fork, where a guardian stands in front of each path. A sign tells you that one guardian only speaks the truth, and one only speaks lies; also, one road goes to Heaven, and one goes to Hell. You are able to ask yes/no questions (each directed to only one of the guardians) to figure out which is which.
a) Can you figure it out using two questions?
b) How about one?

What is the name of this god? (explanation taken from here)
Three gods A, B, and C are called, in no particular order, True, False, and Random. True always speaks truly, False always speaks falsely, but whether Random speaks truly or falsely is a completely random matter. Your task is to determine the identities of A, B, and C by asking three yes/no questions; each question must be put to exactly one god. The gods understand English, but will answer all questions in their own language, in which the words for yes and no are da and ja, in some order. You do not know which word means which.

A game of greed
You have a pile of n chips, and play the following two-player game. The first player takes some chips, but not all of them. After that players alternate taking chips; the only rule is that you cannot take more than the previous player did. The person who takes the last chip wins. Is it the first player or the second player who has a winning strategy, and what is it?
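For small piles the game can simply be solved by exhaustive search, which is a reasonable way to look for a pattern before trying to prove anything. This is a sketch under the rules as stated above; the function names are mine, and it assumes n is at least 2 so that the first player has a legal opening move.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def mover_wins(chips, limit):
    """True if the player about to move can force a win with `chips` left,
    when they may take at most `limit` chips this turn."""
    for take in range(1, min(chips, limit) + 1):
        # Taking the last chip wins outright; otherwise the move is good
        # only if it leaves the opponent in a losing position.
        if take == chips or not mover_wins(chips - take, take):
            return True
    return False

def first_player_wins(n):
    """True if the first player has a winning strategy for a pile of n chips."""
    # The opening move may take any number of chips except all of them.
    return any(not mover_wins(n - take, take) for take in range(1, n))
```

Listing the values of n for which `first_player_wins(n)` is `False` should suggest what the winning strategy looks like.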

Heat-seeking missiles
Four heat-seeking missiles are placed at the corners of a square with side length 1. Each of them flies directly towards the missile on its left at a constant speed. How far does each travel before collision? (Assume they're ideal points which only "collide" when right on top of each other).
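A numerical check is possible here too. The sketch below steps all four missiles forward in small time increments and adds up the distance one of them covers. The step size `dt` and the stopping distance `eps` are arbitrary choices of mine, so the result is only an approximation to the exact answer, and the points are treated as collided once they are within `eps` of each other.

```python
def pursuit_path_length(side=1.0, speed=1.0, dt=1e-4, eps=1e-3):
    """Approximate distance each missile travels before they meet."""
    # Corners as complex numbers, listed so each missile chases the next one.
    pos = [complex(0, 0), complex(side, 0), complex(side, side), complex(0, side)]
    travelled = 0.0
    while abs(pos[1] - pos[0]) > eps:
        targets = pos[1:] + pos[:1]
        # All missiles move simultaneously, each one step toward its target.
        pos = [p + speed * dt * (t - p) / abs(t - p) for p, t in zip(pos, targets)]
        travelled += speed * dt
    return travelled
```

By symmetry the four missiles always sit at the corners of a shrinking, rotating square, which is why tracking a single path length is enough.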

Blind maze
You're located within a finite square maze. You do not know how large it is, where you are, or where the walls or exit are. At each step you can move left, right, up or down; if there's a wall in the given direction, then you don't go anywhere (but you don't get any feedback telling you that you bumped into it). Is there a sequence of steps you can take to ensure that you will eventually find the exit?

Hats in lines
There are 100 prisoners in a line, facing forwards. Each is wearing a black or white hat, and can see the hat colour of everyone in front of them, but not their own or that of anyone behind them; also, they don't know the total number of hats of each colour. Starting from the back of the line, each person is allowed to say either "black" or "white", and is set free if they correctly say the colour of their hat, but shot otherwise. Everyone in the line can hear every answer, and whether or not they were shot afterwards.
a) How many people can be saved for certain, and using what strategy?
b) Suppose that the number of prisoners is countably infinite (i.e. in correspondence with the natural numbers, with number 1 being at the back). How can they save all but one?
c) Suppose that the number of prisoners is countably infinite, and none of them can hear the answers of the other prisoners. How can they save all but finitely many?

Prisoners and hats
Seven prisoners are given the chance to be set free tomorrow. An executioner will put a hat on each prisoner's head. Each hat can be one of the seven colors of the rainbow and the hat colors are assigned completely at the executioner's discretion. Every prisoner can see the hat colors of the other six prisoners, but not his own. They cannot communicate with others in any form, or else they are immediately executed. Then each prisoner writes down his guess of his own hat color. If at least one prisoner correctly guesses the color of his hat, they all will be set free immediately; otherwise they will be executed. Is there a strategy that they can use which guarantees that they will be set free?

Prisoners and switch
There are 100 immortal prisoners in solitary confinement, whose warden decides to play a game with them. Each day, one will be chosen at random and taken into an empty room with a switch on the wall. The switch can be in the up position or the down position, but isn't connected to anything. The prisoner is allowed to change the switch position if they want, and is then taken back to their cell; the switch will then remain unchanged until the next prisoner comes in. The other prisoners don't know who is chosen each day, and cannot communicate in any other way.
At any point, any prisoner can declare to the warden "I know that every single prisoner has been in this room already". If they are correct, all the prisoners will be set free; if not, they will all be executed.
a) What's a strategy that's guaranteed to work?
b) Does it still work if the warden is allowed to take prisoners into the room as often as he wants, without the other prisoners knowing? If not, find one that does.

Prisoners and boxes
Another 100 prisoners are in another game. They are each given a piece of paper on which they can write whatever they like. The papers are then taken by the warden, shuffled, and placed into boxes labelled 1 to 100 (one per box). One by one, each prisoner will be taken into the room with the boxes, and must find their own piece of paper by opening at most 50 boxes. If they do so, they're set free. To make things easier for them, before anyone else goes inside, the warden allows one prisoner to look inside all the boxes and, if they choose, to swap the contents of any two boxes (the other prisoners aren't allowed to move anything). Find the strategy which saves the greatest number of prisoners for certain.

Blue eyes (explanation taken from here)
A group of people with assorted eye colors live on an island. They are all perfect logicians -- if a conclusion can be logically deduced, they will do it instantly. No one knows the color of their own eyes. Every night at midnight, a ferry stops at the island. Any islanders who have figured out the color of their own eyes then leave the island, and the rest stay. Everyone can see everyone else at all times and keeps a count of the number of people they see with each eye color (excluding themselves), but they cannot otherwise communicate. Everyone on the island knows all the rules in this paragraph.
On this island there are 100 blue-eyed people, 100 brown-eyed people, and the Guru (she happens to have green eyes). So any given blue-eyed person can see 100 people with brown eyes and 99 people with blue eyes (and one with green), but that does not tell him his own eye color; as far as he knows the totals could be 101 brown and 99 blue. Or 100 brown, 99 blue, and he could have red eyes.
The Guru is allowed to speak once (let's say at noon), on one day in all their endless years on the island. Standing before the islanders, she says the following:
"I can see someone who has blue eyes."
Who leaves the island, and on what night?

Can you write a quine: a program that, when executed, prints its own source code?

Cheating on a string theory exam (puzzle taken from here)
You have to take a 90-minute string theory exam consisting of 23 true-false questions, but unfortunately you know absolutely nothing about the subject. You have a friend who will be writing the exam at the same time as you, is able to answer all of the questions in a fraction of the allotted time, and is willing to help you cheat — but the proctors are alert and will throw you out if they suspect him of communicating any information to you. You and your friend have watches which are synchronized to the second, and the proctors are used to him often finishing exams quickly and won't be suspicious if he leaves early.
a) What is the largest value N such that you can guarantee that you answer at least N out of the 23 questions correctly?
b) (Easier). The obvious answer is 12, but in fact you can do better than that, even though it seems like 12 is the information-theoretic limit. How come?

The hydra game (explanation taken from here)
A hydra is a finite tree, with a root at the bottom. The object of the game is to cut down the hydra to its root. At each step, you can cut off one of the heads, after which the hydra grows new heads according to the following rules:
  • If you cut off a head growing out of the root, the hydra does not grow any new heads.
  • Otherwise, remove that head and then make n copies of its grandfather subtree (as in the diagram below), where n is the number of the step you're on.
What strategy can you use to eventually kill the hydra?

Physical puzzles

Balancing nails
Picture a nail hammered vertically into the floor (with most of it still sticking out). You're trying to balance as many other nails on it as you can, such that none of them touch the ground. How do you do so?

Hanging pictures
Consider a picture hanging by a string draped over some nails in the wall, in a way such that if any single nail is removed, the picture will fall to the ground.
a) Is it possible for 2 nails?
b) How about n nails?

Two-piece pyramid
Consider the two identical shapes shown below. Each has two planes of symmetry, and a square base. Is it possible to put them together to create a regular pyramid? (For a fun discussion of this problem in the context of machine learning, see a few minutes into this video).

Plane on a treadmill
Suppose that a plane were on a gigantic treadmill, which was programmed to roll backwards just as fast as the plane was moving forwards. Could the plane ever take off?

Pennies game
Two players take turns to place pennies flat on a circular table. The first one who can't place a penny loses. Is it the first or the second player who has a winning strategy?

Joining chains
You have four chains, each consisting of three rings. You're able to cut individual rings open and later weld them closed again. How many cuts do you need to make to form one twelve-ring bracelet?

Going postal
Alice and Bob live far apart, but are getting married and want to send each other engagement rings. However, they live in Russia, where all valuable items sent by post are stolen unless they're in a locked box. They each have boxes and locks, but no key for the other person's lock. How do they get the rings to each other?

Nine dots puzzle
Without lifting your pen from the paper, draw four straight lines that go through the centres of all 9 dots.

Mutilated chessboard
Consider a chessboard missing two diagonally opposite corner squares. Is it possible to cover all the remaining squares with dominos (where each domino covers two adjacent squares)?

Safe sex
Suppose a man wants to have safe sex with three women, but only has two condoms. How can he do so, while ensuring that no STD is passed from anyone to anyone else?

Two people are tied together as in the following diagram. Without being able to undo or cut the ropes, how can they get free?