This week’s post is about “Semantic Folding Theory and its Application in Semantic Fingerprinting” by Webber. The basic ideas were also discussed on the Brain Inspired podcast and presented in a recorded talk at the HVB Forum in Munich. You don’t need any particular prior knowledge to understand this post.
In my own words
The space of all concepts is enormously large. It is much larger than the space of all possible things. Yet somehow our brains can navigate this space and find meaningful relations between concepts. How does this work, and how is it related to natural language?
In natural language, words do not map one-to-one onto concepts. Instead, the same word may describe different concepts depending on the context. For example, the word “organ” may refer to a musical instrument or to an assemblage of tissues.
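Intuitively, we can “add” or “subtract” words to refine concepts. For example, “organ” plus “music” points to the instrument, while “organ” minus “music” leaves the anatomical sense.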
But before Webber’s work, which this post describes, it was very difficult to make a computer handle such relations. So how does his approach work?
The key insight comes from neuroscience. Specifically, neuroscientists have hypothesized that the outermost layer of the brain, the neocortex, is essentially made up of a large number of physical, two-dimensional maps of concept space, known as cortical modules. Crucially, although every point on such a map corresponds to a concept, not every concept corresponds to a single point on that map. Instead, some concepts are combinations of points on the map.
For example, there may be a point for “car”, and a point for “fast”. If both are active, then this represents “sports car”.
We can carry this idea over to computer science and mimic a cortical module with a two-dimensional binary array of, say, 128 × 128 = 16,384 bits. To assign meaning to these bits, we take a large body of text, e.g. Wikipedia, slice the raw text into snippets, and then assign one bit of this 128 × 128 matrix to each snippet in such a way that snippets with similar content point to bits that are close to one another. Since a large corpus yields far more than 16,384 snippets, many snippets end up sharing the same bit. This process is known as semantic folding.
The clustering of similar snippets during semantic folding could be done, for example, with a self-organizing map, although Webber never seems to specify the exact algorithm he uses in his work.
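Since Webber does not specify the algorithm, the following is only a sketch of one possible approach: a small self-organizing map trained on binary bag-of-words vectors of the snippets. The function names (`bag_of_words`, `train_som`, `build_semantic_map`), the learning-rate and radius schedule, the grid size and the toy snippets are all my own assumptions. Because the map’s Gaussian neighbourhood pulls nearby cells towards each winning snippet, snippets that share words tend to land close to one another.

```python
import math
import random


def bag_of_words(snippet, vocab):
    """Binary bag-of-words vector of one snippet over a fixed vocabulary."""
    words = set(snippet.lower().split())
    return [1.0 if w in words else 0.0 for w in vocab]


def best_matching_unit(weights, vector):
    """Grid position (row, col) whose weight vector is closest to `vector`."""
    best, best_dist = (0, 0), float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            dist = sum((a - b) ** 2 for a, b in zip(w, vector))
            if dist < best_dist:
                best, best_dist = (r, c), dist
    return best


def train_som(vectors, rows, cols, epochs=100, lr0=0.5, seed=0):
    """Fit a rows x cols self-organizing map to the snippet vectors (hypothetical schedule)."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    radius0 = max(rows, cols) / 2
    for epoch in range(epochs):
        # Learning rate and neighbourhood radius both shrink as training goes on.
        progress = epoch / epochs
        lr = lr0 * (1 - progress)
        radius = max(radius0 * (1 - progress), 0.5)
        for vector in rng.sample(vectors, len(vectors)):
            br, bc = best_matching_unit(weights, vector)
            for r in range(rows):
                for c in range(cols):
                    # Gaussian neighbourhood: cells near the winner move the most,
                    # which is what places similar snippets close together.
                    h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * radius ** 2))
                    w = weights[r][c]
                    for k in range(dim):
                        w[k] += lr * h * (vector[k] - w[k])
    return weights


def build_semantic_map(snippets, rows, cols):
    """Map every snippet to the grid cell of its best-matching unit."""
    vocab = sorted({w for s in snippets for w in s.lower().split()})
    vectors = [bag_of_words(s, vocab) for s in snippets]
    weights = train_som(vectors, rows, cols)
    return {s: best_matching_unit(weights, v) for s, v in zip(snippets, vectors)}


snippets = [
    "the organ played a hymn in the church",
    "a pipe organ accompanied the choir",
    "the liver is an organ of the body",
    "tissue cells form an organ",
]
semantic_map = build_semantic_map(snippets, rows=8, cols=8)
```

The exact cells depend on the random seed. A real semantic map would use 128 × 128 cells and millions of snippets, which calls for a vectorized implementation rather than these nested Python loops.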
Once the semantic map is built, we can create semantic fingerprints of words. To do so, we activate all points of the semantic map that correspond to snippets in which the word appears. This yields a sparse distributed representation of the word as a single 128 × 128 binary matrix.
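As a minimal sketch, assuming the semantic map is stored as a dictionary from snippet to (row, col) cell, a word’s fingerprint is just the set of cells whose snippets contain that word. The names `word_fingerprint` and `active_cells` and the toy coordinates below are hypothetical.

```python
def word_fingerprint(word, semantic_map, size=128):
    """Binary size x size matrix with a 1 at every cell holding a snippet that contains `word`."""
    fingerprint = [[0] * size for _ in range(size)]
    for snippet, (row, col) in semantic_map.items():
        if word.lower() in snippet.lower().split():
            fingerprint[row][col] = 1
    return fingerprint


def active_cells(fingerprint):
    """List the (row, col) positions that are switched on, in row-major order."""
    return [(r, c) for r, row in enumerate(fingerprint) for c, bit in enumerate(row) if bit]


# Hypothetical semantic map: snippet -> (row, col) on the 128 x 128 grid.
semantic_map = {
    "the organ played a hymn": (3, 4),
    "a pipe organ and a choir": (3, 5),
    "the liver is an organ": (90, 100),
    "the cake was baked": (50, 7),
}

print(active_cells(word_fingerprint("organ", semantic_map)))  # [(3, 4), (3, 5), (90, 100)]
print(active_cells(word_fingerprint("cake", semantic_map)))   # [(50, 7)]
```

Note how “organ” lights up two separate regions, one per sense, while a word like “the” would switch on cells scattered across the whole map.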
For a whole sentence or document, we simply add up the fingerprints of all the words within it, which yields a matrix of counts rather than of bits.
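Here is a small sketch of that summation, assuming each word’s fingerprint has already been computed and stored in a dictionary. The toy 4 × 4 fingerprints and the helper names `document_map` and `from_cells` are hypothetical; real fingerprints would be 128 × 128.

```python
def document_map(words, fingerprints):
    """Element-wise sum of the binary fingerprints of all words in a document."""
    size = len(next(iter(fingerprints.values())))
    counts = [[0] * size for _ in range(size)]
    for word in words:
        fingerprint = fingerprints.get(word)
        if fingerprint is None:
            continue  # word has no fingerprint: it contributes nothing
        for r in range(size):
            for c in range(size):
                counts[r][c] += fingerprint[r][c]
    return counts


def from_cells(cells, size=4):
    """Toy helper: binary size x size matrix from a list of active cells."""
    matrix = [[0] * size for _ in range(size)]
    for r, c in cells:
        matrix[r][c] = 1
    return matrix


# Hypothetical 4 x 4 fingerprints.
fingerprints = {
    "organ": from_cells([(0, 0), (0, 1), (3, 3)]),
    "church": from_cells([(0, 1), (1, 1)]),
    "hymn": from_cells([(0, 1), (1, 1), (2, 0)]),
}

for row in document_map(["organ", "church", "hymn"], fingerprints):
    print(row)
# [1, 3, 0, 0]
# [0, 2, 0, 0]
# [1, 0, 0, 0]
# [0, 0, 0, 1]
```

Cells where several words of the document overlap, such as (0, 1) here, accumulate the highest counts.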
Words that are unspecific (a.k.a. stop-words), such as “with” or “it”, will activate points all over the map, whereas very specific words, such as “cake” or “molecule”, will only activate a few points on the map.
We then eliminate the stop-words by deactivating all but the most active 2% of the points on the map. In this way, only the most important semantic points remain, and we are left with a sparse distributed representation of the entire document.
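A minimal sketch of this sparsification step, assuming “most active” means the cells with the highest summed counts. How ties are broken is not described in the source, so ordering tied cells by position is my own assumption, as is the name `sparsify`. On a 128 × 128 map, 2% of 16,384 cells rounds to 328 active cells.

```python
def sparsify(counts, fraction=0.02):
    """Keep only the most active `fraction` of cells; return a binary matrix."""
    rows, cols = len(counts), len(counts[0])
    k = max(1, round(fraction * rows * cols))  # 2% of 128 x 128 = 16,384 cells -> 328
    # Rank non-zero cells by count; ties are broken by position (an assumption).
    ranked = sorted(
        ((counts[r][c], r, c) for r in range(rows) for c in range(cols) if counts[r][c] > 0),
        key=lambda cell: (-cell[0], cell[1], cell[2]),
    )
    sdr = [[0] * cols for _ in range(rows)]
    for _, r, c in ranked[:k]:
        sdr[r][c] = 1
    return sdr


# Toy count map standing in for a summed document map.
counts = [[(r + c) % 5 for c in range(128)] for r in range(128)]
counts[10][20] = 40  # one strongly shared cell
sdr = sparsify(counts)
print(sum(map(sum, sdr)))  # 328
print(sdr[10][20])         # 1
```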
The possible applications of these fingerprints are many. Essentially, anything natural language processing is trying to do might be made possible with semantic fingerprints. The demos at cortical.io show a few applications in action.
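Many of these applications presumably come down to comparing two fingerprints. The post does not say how cortical.io measures similarity, so this sketch assumes a simple option: count the cells that are active in both fingerprints and normalise the count, here as the cosine similarity of the two binary vectors. The function names and toy fingerprints are hypothetical.

```python
import math


def overlap(fp_a, fp_b):
    """Number of cells active in both binary fingerprints."""
    return sum(a & b for row_a, row_b in zip(fp_a, fp_b) for a, b in zip(row_a, row_b))


def similarity(fp_a, fp_b):
    """Cosine similarity of two binary fingerprints: overlap / sqrt(|a| * |b|)."""
    size_a = sum(map(sum, fp_a))
    size_b = sum(map(sum, fp_b))
    if size_a == 0 or size_b == 0:
        return 0.0
    return overlap(fp_a, fp_b) / math.sqrt(size_a * size_b)


# Hypothetical 2 x 2 fingerprints.
organ = [[1, 1], [0, 0]]
church = [[0, 1], [1, 0]]
cake = [[0, 0], [0, 1]]
print(similarity(organ, church))  # 0.5
print(similarity(organ, cake))    # 0.0
```

Because the fingerprints are sparse, unrelated words share few or no active cells, so even this crude overlap count separates related from unrelated terms in the toy example.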
Opinion, and what I have learned
This is another one of those ideas that blew my mind. I am baffled that semantic fingerprinting does not even appear on the Wikipedia page about natural language processing.
Sparse distributed representations could just as well be used to encode visual or audio data, but to my knowledge this has not yet been explored.
Having just spent time studying neural processes (NPs), I see a close relation between NPs and the theory presented here. I wonder whether the performance of NPs could be improved by redesigning the encoder(s) to generate two-dimensional sparse distributed representations.