How hard is implementing intelligence?

Is implementing a model of intelligence like the one which I outlined in my last essay easy or hard? How surprised should we be if we learn that it won't be achieved in the next 200 years? My friend Alex and I have very different priors for these questions. He's a mathematician, and constantly sees the most intelligent people in the world bashing their minds against problems which are simple and natural to pose, but whose solutions teeter on the edge of human ability (e.g. Fermat's last theorem), and where any tiny error can invalidate a proof. Many took hundreds of years to solve, or are still open.

I'm a computer scientist, and so work in a field which has blossomed in less than a century. It's a field where theoretical problems naturally cluster into complexity classes which are tightly linked, so that solutions to one problem can easily be transformed into solutions for others. We can alter Turing machines in many ways - adding extra tapes, making those tapes infinite in both directions, allowing non-determinism - without changing the set of functions they can compute at all. And even in cases where we can't find exact solutions (including most of machine learning) we can often approximate them fairly well. My instincts say that if we clarify the conceptual issues around abilities like abstraction and composition, then the actual implementation should be relatively easy.

Of course, these perspectives are both quite biased. The study of maths is so old that all the easy problems have been solved, and so of course the rest are at the limits of human ability. Conversely, computer science is so new a field that few problems have been solved except the easy ones, and so we haven't gotten into the really messy bits. But our perspectives also reflect underlying differences in the fields. Maths is simply much more rigorous than computer science. Proofs are, by and large, evaluated using a binary metric: valid or not. There are often many ways to present a proof, but they don't alter that fundamental property. Machine learning algorithms, by contrast, can improve on the previous best by arbitrarily small gradations, and often require many different hyperparameters and implementation choices which subtly change the performance. So improving them via experimentation is much easier. That's also a drawback, since the messier a domain is, the more difficult it is to make a crisp conceptual advance.

I'm really not sure how our expectations about the rate of progress towards AGI should be affected by these two properties. I do think that significant conceptual advances are required before we get anywhere near AGI, and I can imagine machine learning instead getting bogged down for decades on incrementally improving neural network architectures. We can't assume that better scores on standard benchmarks demonstrate long-term potential - in fact the head of Oxford's CS department, Michael Wooldridge, thinks there's been very little progress towards AGI (as opposed to narrow AI) in the last decade. Meanwhile, theoretical physics has been in a rut for thirty years according to some physicists, who blame top researchers unsuccessfully plugging away at string theory without reevaluating the assumptions behind it. On the other hand, there's an important way in which the two cases aren't analogous: deep learning is fundamentally driven by improving performance, whereas string theory is essentially untestable. And the historical record is pretty clear: given the choice between high-minded armchair theorising vs hypotheses informed by empirical investigation, bet on the latter (Einstein is the most notable exception, but still definitely an exception).

What other evidence can we marshal, one way or the other? We might think that the fact that evolution managed to make humans intelligent is proof that it's not so hard. But here we're stymied by anthropic reasoning. We can't use our own existence to distinguish between intelligence being so hard that it only evolved once, or so easy that it has evolved billions of times. So we have evidence against the two extremes - that it's so difficult intelligence is unlikely to arise on any planet, and that it's so easy intelligence should have evolved several times on Earth already - but can't really distinguish anything in the middle. (Also, if there is an infinite multiverse, then the upper bound on difficulty basically vanishes.)

We could instead identify specific steps in the evolution of humans and estimate their difficulty based on how long they took, but here we run into anthropic considerations again. For example, we have evidence that life evolved only a few hundred million years after the Earth itself formed 4.5 billion years ago, which suggests that it was a relatively easy step. However, intelligent species can only ever arise via a series of steps which together take less time than the habitable lifetime of their planet. On Earth, temperatures are predicted to rise sharply in about a billion years and render animal life impossible within the following few hundred million years. Let's round this off to a 6-billion-year habitable period, which we're 3/4 of the way through. Then even if the average time required for the formation of life on earth-like planets were 100 billion years, on planets which produced intelligent life within 6 billion years the average would be much lower - so the early appearance of life on Earth is only weak evidence that this step was easy.
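To make this selection effect concrete, here's a minimal Monte Carlo sketch (my own illustration, not part of the original argument). It assumes the time for life to arise on an earth-like planet is exponentially distributed with a mean of 100 billion years, and uses the simplification that observers only appear on planets where life arose within a 6-billion-year habitable window:

```python
import random

MEAN_TIME = 100.0       # assumed average time to abiogenesis, in billions of years
HABITABLE_WINDOW = 6.0  # assumed habitable lifetime of an earth-like planet
TRIALS = 1_000_000

# Sample waiting times for life to arise, then keep only the planets where it
# happened within the habitable window - the only ones that produce observers.
samples = [random.expovariate(1 / MEAN_TIME) for _ in range(TRIALS)]
observed = [t for t in samples if t <= HABITABLE_WINDOW]

print(f"fraction of planets producing life in time: {len(observed) / TRIALS:.3f}")
print(f"average time to life, as seen by observers: {sum(observed) / len(observed):.2f} billion years")
# Prints roughly 0.06 and ~2.9 billion years: every observer sees life arising
# "early", even though the unconditional average is 100 billion years.
```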

On the other hand, the last common ancestor of humans and chimpanzees lived only 6 million years ago, which is a surprisingly short time given how much smarter we are than apes. So either relatively few evolutionary steps are required to go from ape-level to human-level intelligence, or else evolution progressed unusually quickly during those 6 million years. The latter is plausible because increasing intelligence is a fast way to increase reproductive fitness, both in absolute terms (using tools and coordinating hunts allowed larger human populations) and in relative terms (several theories hold that a main driver of human intelligence was an "intelligence arms race" to outsmart rivals and get to the top of social hierarchies). However, my "simple" model of intelligence also makes me sympathetic to the former. Most mammals seem to have ontologies - to view the world around them as being composed of physical objects with distinct properties. I wouldn't be surprised if implementing ape-level ontologies ended up being the hardest part of building an AGI, and the human ability to reason about abstract objects turned out to be a simple extension of it. That would fit with previous observations that what we think of as high-level thought, like playing chess, is usually much easier to reproduce than low-level capabilities like object recognition, which nature took much longer to hone.

How hard are these low-level capabilities, then? The fact that most vertebrates can recognise and interact with a variety of objects and environments isn't particularly good evidence, since those abilities might just have evolved once and been passed on to all of them. But in fact, many sophisticated behaviours have evolved independently several times, which would be very unlikely if they were particularly difficult steps. Examples include extensive nervous systems and tool use in octopuses (even though the last common ancestor we shared with them was a very primitive wormlike creature), inventive and deceptive behaviour in crows and ravens, and creativity and communication in dolphins. However, we don't know whether these animal behaviours are implemented in ways which have the potential to generalise to human-level intelligence, or whether they use fundamentally different mechanisms which are dead ends in that respect. The fact that they didn't actually lead to human-level intelligence is some evidence for the latter, but not much: even disregarding cognitive architecture, none of the animals I mentioned above have all the traits which accelerated our own evolution. In particular, birds can't support large brains; dolphins would have a lot of trouble evolving appendages well-suited to tool use; and octopuses lack group environments with complicated social interactions. Apart from apes, the only species I can think of which meets all these criteria is elephants - and it turns out they're pretty damn smart. So I doubt that dead-end cognitive architectures are common in mammals (although I really have no idea about octopuses). For a more detailed discussion of the ideas in the last few paragraphs, see Shulman and Bostrom (2012).

What can we deduce from the structure of the brain? It seems like most of the neocortex, where abstract thought occurs, consists of "cortical columns" with a regular, layered structure. This is further evidence that the jump to human-level intelligence wasn't particularly difficult from an evolutionary standpoint. Another important fact is that the brain is a very messy environment. Neurons depend on the absorption of various nutrients and hormones from the blood; signals between them are transmitted by the Brownian diffusion of chemicals. Meanwhile the whole thing is housed in a skull which is constantly shaking around and occasionally sustains blunt traumas. And we know that in cases of brain damage or birth defects, whole sections of the brain can reorganise themselves to pick up the slack. In short, there's a great deal of error tolerance in how brains work, which is pretty strong evidence that intelligence isn't necessarily fiddly to implement. This suggests that once we figure out roughly what the algorithms behind human intelligence are, fine-tuning them until they actually work will be fairly easy. If anything, we'd want to make our implementations less precise as a form of regularisation (something which I'll discuss in detail in my next literature review).
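As an aside on that last point, here's a minimal sketch (a toy example of my own, not taken from the post) of what deliberately making an implementation less precise looks like in practice: dropout-style noise injected into a layer of a small network during training, a standard form of regularisation:

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(x, w, drop_prob=0.2, train=True):
    """A toy network layer where imprecision is injected on purpose during
    training (inverted dropout), which acts as a regulariser."""
    h = np.maximum(0.0, x @ w)  # ReLU activation
    if train:
        # Randomly silence a fraction of units, scaling the survivors so the
        # expected activation stays the same.
        mask = rng.random(h.shape) > drop_prob
        h = h * mask / (1.0 - drop_prob)
    return h

x = rng.normal(size=(4, 8))   # a small batch of inputs
w = rng.normal(size=(8, 16))  # randomly initialised weights
print(forward(x, w).shape)    # (4, 16)
```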

Could it be the case that those algorithms require very advanced hardware, and our lack of it is what's holding us back from AGI? At first, it seems not: the computational power required for intelligence is bounded by that of brains, which supercomputers already exceed. But there are reasons to think hardware is still a limiting factor. If our progress were bounded mainly by the quality of our algorithms, then we should expect to see "hardware overhangs": cases where we invent algorithms that require much less computing power than is currently available to us. But in fact it's difficult to think of such cases - most breakthroughs in AI (such as Deep Blue, Watson, AlphaGo and deep learning in general) required new algorithmic methods to be implemented on state-of-the-art hardware. My best guess for how to reconcile these two positions: it would be possible to run an efficient AGI on today's hardware, but it's significantly harder to figure out how to implement AGI efficiently than it is to implement it at all. And since AGI is already pretty hard, we won't be able to actually build one until we have far more processing power than we'd theoretically need - not least because the more compute you have, the more freely you can experiment. This model predicts that we should now be able to implement much more efficient versions of algorithms invented in previous decades - for example, that the best chess software we could implement today using Deep Blue's hardware would vastly outperform the original. However, serious efforts to make very efficient versions of old algorithms are rare, since compute is so abundant now (even smartphones have an order of magnitude more processing power than Deep Blue did).

Lastly, even if all we need to do is solve the "conceptual problems" I identified above, AGI is probably still a long way away. If the history of philosophy teaches us one thing, it's that clarifying and formalising concepts is hard. And in this case, the formalisations need to be rigorous enough that they can be translated into actual code, which is a pretty serious hurdle. But I don't think these conceptual problems are 200-years-away hard; my guesstimate for the median time until transformative AGI is roughly 1/3 of that. This post and the one before it contain many of the reasons my estimate isn't higher. My other recent post, on skepticism about deep learning, contains many of the reasons my estimate isn't lower. But note that this headline number summarises a probability distribution with high variance. A very rough outline, which I haven't thought through a great deal, is something like: 10% within the next 20 years, 20% within the 20 years after that, 20% within the 20 years after that, 20% in the 40 years after that, 20% in the 50 years after that, and 10% even later or never. This spread balances my suspicion that the median should be near the mode, as in a Poisson distribution, with the idea that when we're very uncertain about probabilities, we should have priors which are somewhat scale-invariant, and therefore assign less probability to a k-year period the further away it is.
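As a sanity check on that outline, here's a small sketch (my own tabulation of the numbers above, not from the original post) which accumulates the stated probability masses into a rough cumulative distribution:

```python
# Bucket endpoints (years from now) and the probability mass assigned to each
# interval in the outline above; a further 10% is left for "even later or never".
buckets = [(20, 0.10), (40, 0.20), (60, 0.20), (100, 0.20), (150, 0.20)]

cumulative = 0.0
for end, mass in buckets:
    cumulative += mass
    print(f"P(transformative AGI within {end} years) ~ {cumulative:.2f}")
# Prints 0.10, 0.30, 0.50, 0.70, 0.90: the cumulative probability reaches 50%
# at the 60-year mark, consistent with a median of roughly a third of 200 years.
```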

In conclusion, I have no catchy conclusion - but given the difficulty of the topic, that's probably a good thing. Thanks to Vlad for the argument about the role of hardware in algorithm development, and to Alex again for inspiring this pair of essays. Do get in touch if you have any comments or feedback.

Comments

  1. Thanks for another interesting post.

    You argue that because the brain is very robust to noise and trauma, once we know the algorithm there should be very few hyperparameters that need to be tweaked? I am not so convinced. It is hard to say which parameters are crucial to development and continued functioning and which aren't. For example, the balance between inhibition and excitation seems to be crucial, as does the correct functioning of a huge class of different neurotransmitters and receptors, not to mention critical periods for development.

    On the hardware front, yes, the brain uses about the same amount of energy as a lightbulb and our computers have far more FLOPS, but our largest deep learning models still have far fewer than 150 trillion parameters (mapping these to synapse count; it should be more if you count glial cells too) and are not nearly as parallelized or sparse, meaning there is a whole class of algorithms that researchers are disincentivized from testing.
