Word2Vec is a technique that uses a neural network to produce a representation of words as vectors in a multidimensional vector space. The network is trained to guess a word in a corpus based upon the context in which it appears (continuous bag-of-words) or a context based on a word (skip-gram). As the weights and biases are set to perform this task more accurately, words in the corpus are assigned to vectors. Over the course of training, words that appear in similar contexts end up located closer to each other in the space. This is often called semantic similarity. There are some neat features of these vector representations. For one, word-vectors can be added and subtracted with results that almost look like conceptual analysis. For example, vector(“King”) – vector(“Man”) + vector(“Woman”) = a vector whose closest word-vector is “Queen”. There are also several cases in which the word2vec model seems to learn unexpected information about things in the corpus. When trained on the entirety of Wikipedia, word2vec produces vectors for city names which, when reduced to 2 dimensions, reflect the geographical relations among those cities with a spooky degree of accuracy. It’s natural to wonder, does it reveal anything interesting about the history of philosophy?
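To make the vector arithmetic concrete, here is a minimal sketch in plain Python. The three-dimensional vectors are made up by hand for illustration and are not output from any trained model. The `analogy` helper is a hypothetical name. It adds and subtracts the vectors, then returns the remaining word whose vector has the highest cosine similarity to the result, leaving out the input words as is common practice.

```python
import math

# Hand-made toy vectors (NOT real word2vec output). The three dimensions
# loosely stand for: royalty, maleness, femaleness.
toy_vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "fish":  [0.0, 0.3, 0.3],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def analogy(positive, negative, vectors):
    """Return the word closest to sum(positive) - sum(negative).

    The input words themselves are excluded from the candidates.
    """
    dims = len(next(iter(vectors.values())))
    target = [0.0] * dims
    for word in positive:
        target = [t + x for t, x in zip(target, vectors[word])]
    for word in negative:
        target = [t - x for t, x in zip(target, vectors[word])]
    candidates = [w for w in vectors if w not in positive and w not in negative]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))

# vector("king") - vector("man") + vector("woman")
result = analogy(["king", "woman"], ["man"], toy_vectors)
print(result)
```

With real embeddings the same idea applies, only with hundreds of dimensions and a vocabulary of thousands or millions of words.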
The glamorously named ‘fasttext-wiki-news-subwords-300’ is one of the most impressive pre-trained sets of word vectors. It was trained on Wikipedia in 2017 and contains 1 million word vectors taken from a corpus of 16 billion tokens.* When fed a list of philosophers, it gives us this:
This is… a bit rubbish.
It gets that Kant is a special little guy, but that’s about all. It’s also very limited in what it can show us because names like ‘Eriugena’ and ‘Anscombe’ aren’t among the top million words on Wikipedia.
So what if we pick a much smaller but better curated dataset? What if we trained word2vec on the Stanford Encyclopaedia of Philosophy?
What I did
I’ve stuck my code here. I used a combination of BeautifulSoup and the module Newspaper to build the corpus and Gensim, an open-source NLP package, to train the word2vec model. The model output vectors in 100 dimensions. Here’s what the vectors for ‘kant’ and ‘fish’ look like:
I then fed the model a list of around 130 philosophers to find their vectors. This list was formed by taking a bunch of existing ‘top philosophers’ lists and then adding some philosophers who I thought should have been on those lists (e.g. Je Tsongkhapa, Ruth Millikan, Du Bois). If your favourite philosopher isn’t here, no offence was intended. I’m planning to do this again in the future and am open to suggestions.
I then used a standard principal component analysis algorithm to get these vectors down to 2 dimensions while preserving as much information as possible. The end product looks like this:
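For readers who want to see what this reduction step involves, here is a rough sketch of principal component analysis written with NumPy's singular value decomposition. It is not necessarily the implementation used for the plot. The function name `pca_2d` is hypothetical, and the random matrix merely stands in for the roughly 130 philosopher vectors of 100 dimensions each.

```python
import numpy as np

def pca_2d(vectors):
    """Project row vectors onto their first two principal components.

    Returns the 2-D coordinates and the fraction of total variance
    captured by each of the two components.
    """
    X = np.asarray(vectors, dtype=float)
    centered = X - X.mean(axis=0)  # PCA works on mean-centred data
    # Rows of vt are the principal directions, ordered by variance explained.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    coords = centered @ vt[:2].T
    explained = singular_values**2 / np.sum(singular_values**2)
    return coords, explained[:2]

# Placeholder data: 130 random 100-dimensional vectors (an assumption,
# standing in for the real word vectors, which are not reproduced here).
rng = np.random.default_rng(0)
fake_vectors = rng.normal(size=(130, 100))

coords, explained = pca_2d(fake_vectors)
print(coords.shape)     # one (x, y) pair per input vector
print(explained.sum())  # share of variance kept by the 2-D picture
```

The second return value is worth a look in practice: if the two components capture only a small share of the variance, distances on the 2-D plot say little about distances in the original space.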
This may appear chaotic at first, but the more closely you look, the more patterns start to emerge. The ‘big three’ Ancient Greeks are lumped together, as are early modern European philosophers, and some kind of analytic-continental divide seems to have been noted. The biggest question is why most philosophers clump around the bottom left. The answer, I suspect (and this is guesswork on my part), is that they are underrepresented in the dataset. I will discuss this more below because it highlights something very important about these representations.
In some cases, philosophers are placed close together due to ‘semantic similarity’. In other cases, they are placed closer together because there was not sufficient incentive to distinguish them. The model has tried to find an efficient representation of names in order to complete its task. There are many contexts in the dataset in which Locke and Hobbes appear but fewer in which Śankara and Zhu Xi appear. I suspect this has led to the bottom left corner compressing a bunch of philosophically distinct people. Anyone who knows something of the history of philosophy could have avoided this error, but the network walked right into it.
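One way to test the underrepresentation guess would be to count how often each name turns up in the corpus. The sketch below is only illustrative: `name_counts` is a hypothetical helper, the sample text is invented rather than taken from the encyclopaedia, and it handles single-token names only, so a name like Zhu Xi would need to be merged into one token first.

```python
from collections import Counter
import re

def name_counts(text, names):
    """Count occurrences of each single-token name in the text (case-insensitive)."""
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tokens)
    return {name: counts[name.lower()] for name in names}

# Invented sample text, standing in for the scraped corpus.
sample_text = (
    "Locke and Hobbes disagree about the state of nature. "
    "Hobbes is pessimistic; Locke is less so. "
    "Locke also wrote on toleration. "
    "Dharmakirti is a figure in Indian philosophy."
)

counts = name_counts(sample_text, ["Locke", "Hobbes", "Dharmakirti", "Tsongkhapa"])
for name, n in sorted(counts.items(), key=lambda item: -item[1]):
    print(name, n)
```

If the names crowded into the bottom left also sit at the bottom of such a count, that would support the guess. It would not prove it, since the variety of contexts matters as well as raw frequency.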
We should keep in mind that this is not a map of the history of philosophy, at least not directly. It’s informed by historical and philosophical information, since both are contained in the dataset. But these aren’t objective reflections of philosophical similarity, whatever that might be. It is the partial and impartial collective memory of a field as seen from the perspective of some prominent living scholars. It is a shared dream, a confabulation, a fantasy.
And since the whole thing’s imagined anyhow…
The Geography of our Past
What we have is a scatterplot. The process of interpretation has already begun when we treat its points as names. The network doesn’t know what a name is. And so, perhaps, representation isn’t the best way to think about the graph; it may be more insightful to appeal to metaphor to make sense of the model. This metaphor is often implicit; the network ‘learns’ that Kant has greater ‘semantic similarity’ to Hume, and it ‘knows’ that Berkeley has more in common with American pragmatism than early modern Europe. I have chosen to call attention to it with a map.
The eastern half of the map presents the standard ‘History of Western Philosophy’ taught in countless undergraduate courses. The river runs from the highlands of Ancient Greece, through the intermediary Ibn-Rushd to St Thomas Aquinas and out to fertilise the plains of early modern Europe. Socrates is closest to Plato and Aquinas is closest to Aristotle. Along its course, we find the ‘Land of the Sages’. I still don’t like this name but I couldn’t think of a better one for a region that includes Confucius, Parmenides and Pythagoras. Why are these people together? Confucians and Taoists, the Neo-Aristotelian Maimonides and the Neoplatonist Plotinus. They have been granted their own neighbourhood. Zhuangzi has a little pond where he can contemplate the happiness of fish.
Northwards, the river curls below the lofty citadels of Scholastica and on to the well-coppiced forest of early modern Europe. Philosophers here have all been given sufficient space to grow and develop. The network has grouped them together but given them each a plot of land with which to mix their labour. Further north, we have David Hume, and rising above them all stands Mt Kant. As with the Wikipedia chart, the whole map can be viewed in terms of philosophers’ proximity to Kant.
And then things get interesting. West of the Land of the Sages, across the fields of Epicurus and Eriugena, we find Scholastica Minor. Why is Democritus lurking on the outskirts? Nearby, Margaret Cavendish finds herself living between Machiavelli and Erasmus. South of Scholastica Minor lies the Timeless Oasis. This may be my favourite point on the map. Three great dialecticians whose work either directly or indirectly called into question the reality of time have been brought together. One might pause here and wonder why Pyrrho isn’t closer to Sextus or why Vasubandhu is so far from his half-brother Asanga. We’ll return to this later. Northeast of The Continent, we find the odd couple of Friedrich Nietzsche and George Berkeley, and Søren Kierkegaard sharing the river bank with Butler.
The Continent has its own internal geography which I might explore at another time. It makes sense for Kitaro Nishida to be beside Henri Bergson, but it is still surprising to see Bentham living closer to Fichte than Mill. I was struck to find Josiah Royce in here as well. I suspect the proximity of these writers to each other is another indication of the Anglo-American bias of the encyclopaedia. From the shoreline, they must envy the well-distinguished, floating hills of the analytic philosophers.
At the southern edge of the Continent, we find the Philippa Foot Hills. In the northern regions live Arendt and Buber, next to Jacques Lacan. To the south, we find some French political theory and the proto-feminism of Condorcet and Astell.
South of this, the algorithm begins to assert itself. By this, I mean that the neural network did not consider it worthwhile learning the differences between these philosophers. This is likely because they are underrepresented in the data set. It is worth contrasting this region with early modern Europe where the network has been given the information required to treat the canonical figures as distinct individuals. In the Desert of the Underexamined, wildly different philosophers are thrust together. They exist as general forms rather than particulars. They have been added on to – but not incorporated into – how philosophy is presented. Articles on the metaphysics of causation discuss Kant and Hume but not Nagarjuna. Dharmakirti is presented as a figure in Indian philosophy but not as an epistemologist or semanticist. I suspect that the philosophers in this region will begin to separate out as their work is connected to specific problems. History – and by this I mean the current choices of our discipline – will show whether this prediction is right or wrong.
Off the coast, we find the Analytic Archipelago, a fragmented realm of floating islands. You may be surprised to see Millikan and Barcan Marcus so far south. I suspect that they have been classified together on account of their name, Ruth. Though perhaps the network recognises them as two theorists who have done more than most to render modal notions respectable. I doubt this, though; the network is fickle and easily distracted. Northwards, the most surprising sights are Marx’s island and Heidegger’s tower, surrounded by fog. I still don’t know why Heidegger has ended up off the coast from Wittgenstein and alongside American pragmatists, but we must imagine Heidegger unhappy. In the far north, we have a land of philosophers who gave themselves to the contemplation of numbers and the normativity of logic, and it is here with Rudolf Carnap we find the furthest point from Aristotle and presumably the apotheosis of philosophy.
This is a sloppy, silly map. In some cases, I had to use shortened forms of names (e.g. Al-Farabi became ‘Farabi’ like ‘Aquinas’) and in others, I have only included one name when there should probably be several (e.g. Zeno, Mill, Lewis). I interpret Butler as Judith and not Joseph as their work seems to be better attested in the corpus, but I may be wrong. This is why I removed the word ‘Stanford’ from the title. The errors here are mine. The Stanford Encyclopaedia is one of the great recent accomplishments of philosophy whereas this map should not be taken very seriously.
I’ll be honest, the main reason I chose to represent this as a fantasy map was because I thought it would be fun, but I don’t think the form is inappropriate. Few genres reflect biases more clearly than fantasy. And maps, it’s worth remembering, reflect our knowledge of the world, not the world itself (for more thoughts on ‘knowledge-first’ cartography see here). This isn’t a map of the history of philosophy or of the space of logical possibilities. It is a map of an encyclopaedia of philosophy that reflects the interests and values of the community who compiled it. Filtering this through a neural network has made things less, not more, reflective of reality. The space itself is not Euclidean but bent and distorted by the algorithm. Some people are treated as similar because their work shares common themes; others are treated as similar because their work is less well-known. The network knows only signs (and maybe less than that).
If you find this map hideously twee or even offensive, a projection of bucolic innocence onto a violent history, that’s fair. There are countless maps which could be made with the same data. I used Inkarnate to make mine. I’d love to see others. If I were to start making it again, I would probably do it completely differently. Perhaps I will. Feel free to share your thoughts or recommendations.
Since it’s illegal to write about philosophy and maps without citing either Borges or Calvino, I’ll leave this microchapter from Invisible Cities below.
Cities & Desire 4
In the center of Fedora, that gray stone metropolis, stands a metal building with a crystal globe in every room. Looking into each globe, you see a blue city, the model of a different Fedora. These are the forms the city could have taken if, for one reason or another, it had not become what we see today. In every age someone, looking at Fedora as it was, imagined a way of making it the ideal city, but while he constructed his miniature model, Fedora was already no longer the same as before, and what had been until yesterday a possible future became only a toy in a glass globe.
The building with the globes is now Fedora’s museum: every inhabitant visits it, chooses the city that corresponds to his desires, contemplates it, imagining his reflection in the medusa pond that would have collected the waters of the canal (if it had not been dried up), the view from the high canopied box along the avenue reserved for elephants (now banished from the city), the fun of sliding down the spiral, twisting minaret (which never found a pedestal from which to rise).
On the map of your empire, O Great Khan, there must be room both for the big, stone Fedora and the little Fedoras in glass globes. Not because they are all equally real, but because all are only assumptions.
The one contains what is accepted as necessary when it is not yet so; the others, what is imagined as possible and, a moment later, is possible no longer.
Italo Calvino, Invisible Cities