Posts

Showing posts from February, 2018

The Unreasonable Effectiveness of Deep Learning

Image
This essay consists of summaries, explanations, and discussions of several papers which provide high-level arguments and intuitions about why, conceptually, deep learning works. Particular areas of investigation are "Which classes of functions can deep neural nets approximate well in principle?"; "Why can they quickly learn functions which have very small training loss?"; and "Why do the functions they learn generalise so well?". Understanding deep learning requires rethinking generalisation -  Zhang, Bengio, Hardt, Recht and Vinyals, 2017 Machine learning in general is about identifying functions in high-dimensional spaces based on finitely many samples from them. In doing so, we navigate between two potential errors: learning a function which is too simple to capture most of the variation in our data (underfitting) and learning a function which matches the data points well, but doesn't generalise (overfitting). Underfitting implies higher train

In defence of conflict theory

Scott Alexander recently wrote an interesting blog post on the differences between approaches to politics based on conflict theory and mistake theory. Here's a rough summary, in his words: "Mistake theorists treat politics as science, engineering, or medicine. The State is diseased. We’re all doctors, standing around arguing over the best diagnosis and cure. Some of us have good ideas, others have bad ideas that wouldn’t help, or that would cause too many side effects. Conflict theorists treat politics as war. Different blocs with different interests are forever fighting to determine whether the State exists to enrich the Elites or to help the People... Right now I think conflict theory is probably a less helpful way of viewing the world in general than mistake theory. But obviously both can be true in parts and reality can be way more complicated than either." This comparison doesn't explain everything, but it definitely captures some important aspects of p

Topics on my mind: January 2018

My degree has forced me to start learning some linguistics, which turns out to be very interesting. It feels like, in trying to figure out what understanding language really means, we're grappling with the very notion of concepthood, and the nature of intelligence itself. My thesis is based on the question of how to represent words in machine learning models. Vectors seem to work pretty well, and make intuitive sense for nouns and verbs at least. In this model, if you take king, subtract man, and add woman, you get queen. Something can be more or less 'rain' (drizzle, shower, downpour, torrent), or more or less 'run' (jog, lope, sprint), or even more or less 'bird' (ostrich, penguin, vulture, sparrow). Things get a little more complicated when we consider determiners, conjunctions, and prepositions, since you can't really be more or less 'for' or 'or'. And when it comes to putting it all together into representations of sentences, we rea

Which neural network architectures perform best at sentiment analysis?

This essay was my main project for my module on Machine Learning for Natural Language Processing at Cambridge. It assumes some familiarity with NLP and deep learning. Over the last few years, deep neural networks have produced state-of-the-art results in many tasks associated with natural language processing. This comes in conjunction with their excellent results in other areas of machine learning, perhaps most notably computer vision. Different types of neural networks have been particularly successful in different areas. For example, CNNs are the tool of choice in image recognition problems; their internal structure has distinct parallels with the human visual system. Two other neural network architectures have achieved particular success in NLP; these are recurrent neural nets (RNNs) and recursive neural nets (here abbreviated RSNNs). It seems at first glance that the structures of RNNs and RSNNs are better suited to processing language than CNNs; however, empirical result