In this notebook I study deep non-linear networks at initialisation. This is a prerequisite for understanding such a network at the end of training, as discussed in The Principles of Deep Learning Theory (PDLT) by Roberts, Yaida, and Hanin. This notebook also serves as a demonstration of my DeepLearningTheoryTools paclet for Mathematica and is meant to be read as complementary material to the PDLT book.
Some thoughts on why paper reading groups aren’t helping with research (unless you specialize in reproducing research, which is important, too).
Suppose you think you’re 80% likely to have left your laptop power adapter somewhere inside a case with 4 otherwise-identical compartments. You check 3 compartments without finding your adapter. What’s the probability that the adapter is inside the remaining compartment?
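The update behind this puzzle can be sketched in a few lines of Python. This is only an illustration, and it assumes (the puzzle does not say so explicitly) that the 80% prior is spread evenly over the 4 compartments; the function name `posterior_remaining` is made up for this sketch.

```python
from fractions import Fraction

def posterior_remaining(prior_in_case, n_compartments, n_checked_empty):
    """Probability that the item is in an unchecked compartment,
    given that n_checked_empty compartments turned out to be empty.

    Assumption: the prior mass for "inside the case" is split
    uniformly across all compartments.
    """
    per_compartment = prior_in_case / n_compartments
    # Prior mass that survives the search: the unchecked compartments ...
    still_in_case = per_compartment * (n_compartments - n_checked_empty)
    # ... plus the possibility that the item was never in the case.
    not_in_case = 1 - prior_in_case
    return still_in_case / (still_in_case + not_in_case)

answer = posterior_remaining(Fraction(4, 5), 4, 3)
print(answer)  # 1/2
```

Under that uniformity assumption, the three empty compartments remove 60% of the prior mass, and the remaining 20% (last compartment) competes with the 20% (not in the case at all).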
How can I become a better researcher? I grouped my thoughts on this into four themes: Robustness, Sense of Direction, Execution, and Collaboration.
Feed forward neural network layers are typically fully connected, or dense. But do we actually need to connect every input with every output? And if not, which inputs should we connect to which outputs? It turns out that in some of Rasa’s machine learning models we can randomly drop as much as 80% of all connections in feed forward layers throughout training and see their performance unaffected! Here we explore this in more detail.
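To make the idea concrete, here is a minimal pure-Python sketch of a feed forward layer whose connections are randomly dropped by a fixed mask. It is not Rasa’s implementation; the helper names `random_mask` and `masked_dense`, the seed, and the layer sizes are all hypothetical, and a density of 0.2 corresponds to dropping 80% of the connections.

```python
import random

def random_mask(n_in, n_out, density, seed=0):
    """Return an n_in x n_out matrix of 0/1 entries that keeps
    roughly `density` of all connections, chosen at random."""
    rng = random.Random(seed)
    n_keep = round(density * n_in * n_out)
    kept = set(rng.sample(range(n_in * n_out), n_keep))
    return [[1 if i * n_out + j in kept else 0 for j in range(n_out)]
            for i in range(n_in)]

def masked_dense(x, weights, mask):
    """Linear layer y_j = sum_i x_i * w_ij * m_ij (no bias, no activation)."""
    n_out = len(weights[0])
    return [sum(x[i] * weights[i][j] * mask[i][j] for i in range(len(x)))
            for j in range(n_out)]

# Hypothetical sizes: 10 inputs, 5 outputs, 20% of connections kept.
mask = random_mask(10, 5, density=0.2)
weights = [[1.0] * 5 for _ in range(10)]
outputs = masked_dense([1.0] * 10, weights, mask)
print(sum(map(sum, mask)), "of 50 connections kept")
```

Because the mask is created once and reused, the same connections stay dropped for every forward pass, which is one simple way to read "throughout training".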
In Part I we introduced semantic map embeddings and their properties. Now it’s time to see how we create those embeddings in an unsupervised way and how they might improve your NLU pipeline.
How do you convey the “meaning” of a word to a computer? Nowadays, the default answer to this question is “use a word embedding”. A typical word embedding, such as GloVe or Word2Vec, represents a given word as a real vector of a few hundred dimensions. But vectors are not the only form of representation. Here we explore semantic map embeddings as an alternative that has some interesting properties. Semantic map embeddings are easy to visualize, allow you to semantically compare single words with entire documents, and they are sparse and therefore might yield some performance boost.
A medical test does not give you the probability that you have a certain disease (such as Corona). Instead, it is evidence, which moves the odds of you having the disease up or down. I wrote a little app that helps you calculate how these odds change.
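The odds update described here can be sketched as follows. This is not the app’s code, just the standard rule "posterior odds = prior odds × likelihood ratio"; the function name and the example numbers (1% prior, 90% sensitivity, 95% specificity) are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

def update_probability(prior, sensitivity, specificity, positive=True):
    """Update the probability of disease after one test result.

    positive result: likelihood ratio = sensitivity / (1 - specificity)
    negative result: likelihood ratio = (1 - sensitivity) / specificity
    """
    prior_odds = prior / (1 - prior)
    if positive:
        likelihood_ratio = sensitivity / (1 - specificity)
    else:
        likelihood_ratio = (1 - sensitivity) / specificity
    posterior_odds = prior_odds * likelihood_ratio
    # Convert odds back to a probability.
    return posterior_odds / (1 + posterior_odds)

# Hypothetical numbers: 1% prior, 90% sensitivity, 95% specificity.
after_positive = update_probability(
    Fraction(1, 100), Fraction(9, 10), Fraction(19, 20), positive=True
)
print(after_positive, float(after_positive))
```

With these made-up numbers, a positive result multiplies the odds by 18, which moves a 1% prior to 2/13, or about 15%: strong evidence, but far from certainty.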
Just a note for reference, because I find myself explaining this repeatedly.
Seven scientists (A, B, C, D, E, F, G) with widely-differing experimental skills measure a quantity m. You expect some of them to do accurate work, and some of them to turn in wildly inaccurate answers. What is m and how reliable is each scientist?
For most of my life, I was not really interested in politics. I think, perhaps, this lack of interest originated in my affinity for mechanistic and abstract theories that reflect the real world, and politics always seemed anything but mechanistic and abstract – just a huge mess of opinions and empty talk. I changed my view on […]
Throughout my life in academia, I have received much advice on how I should and shouldn’t write. Countless books have been written on the topic, but The Reader’s Brain is special. Instead of just telling people what to do, Yellowlees Douglas actually explains why it is good to write one way or another, based on […]
Buzzwords like artificial intelligence (AI), machine learning (ML), big data, and neural networks appear everywhere around us, but if you are not involved in the field, you may misunderstand them. Thus, here is a short overview of what these words mean, how the associated technologies can affect you personally, and what you can do to […]
Recently, I discovered this neat little algorithm called “self-organizing maps” that can be used to create a low-dimensional “map” (as in cartography) of high-dimensional data. The algorithm is very simple. Say you have a set of high-dimensional vectors and you want to represent them in an image, such that each vector is associated with a […]
This week’s post is about “Semantic Folding Theory and its Application in Semantic Fingerprinting” by Webber. The basic ideas were also discussed in this Braininspired podcast, and presented and recorded at the HVB Forum in Munich. You don’t need any particular prior knowledge to understand this post. In my own words: The space […]
This week’s article is “World Models” by Ha and Schmidhuber. You can find a fancy interactive version of the article here. To understand this post, you need to have a basic understanding of neural networks, recurrent neural networks / LSTMs and reinforcement learning. In my own words: Let’s say you want to train a […]
Last year, I noticed that Snickers bars seem to taste different in different countries, but I was not sure. So my partner Nellissa and I conducted a little experiment that involved a lot of chocolate and a little Bayesian statistics. We wanted to establish whether Snickers bars from different countries taste different or not. To […]
Why do we die? Why does any animal die? How old can we expect an average individual of any species to become, and what does that have to do with its body size or heart rate? In “Scale”, Geoffrey West outlines a simple mathematical model that answers all of the questions above with amazing predictive […]
This week’s article is “Conditional Neural Processes” by Garnelo et al. To understand this post, you need to have a basic understanding of neural networks and Gaussian processes. In my own words: A neural process (NP) is a novel framework for regression and classification tasks that combines the strengths of neural networks (NNs) and […]
This month’s book is “Algorithms to Live By”, by Brian Christian and Tom Griffiths. I dread reading my postal mail. Bills here, adverts there, and worst of all: forms to fill out. It feels like such a waste of time! Which is why I sometimes let letters stay in my inbox for several months. Reading […]
A Gaussian Process is a mathematical tool that you can use to model a probability distribution from data, i.e. to do regression, classification, and inference.
This week’s article is “Solving the Rubik’s Cube Without Human Knowledge” by McAleer, Agostinelli, Shmakov, and Baldi, which was submitted to NeurIPS 2018. To understand this article, you need to have a basic understanding of neural networks and be familiar with reinforcement learning. In my own words: The Rubik’s Cube is a 3-dimensional combination […]