Luxemburg by the Sea

When I made my map of the Stanford Encyclopaedia of Philosophy in February (okay, when the Gensim implementation of Word2Vec made the map and I coloured it in), I included a bit of discussion about how more neglected philosophers were clustered in one corner, in a ‘Desert of the Underexamined’, because their work was insufficiently attested within the encyclopaedia. Well, Rosa Luxemburg has made a break for the coast! On April 13th, Lea Ypi’s article on Luxemburg was published. When I retrained the neural network on the whole SEP, she had noticeably moved northwest, closer to the sea. I think this gives a neat visual representation of the impact of a new article on the encyclopaedia as a whole. So accordingly, I have added her new location onto the map, a nice red lighthouse by the shore.


As you would expect, retraining the network moved lots of things around by a tiny amount. The general outline of the map has been very resilient over reproductions; however, for the sake of science, here’s the new graph.

Digital Doxography and the Memory of Philosophy


Word2Vec is a technique that uses a neural network to produce a representation of words as vectors in a multidimensional vector space. The network is trained either to guess a word in a corpus from the context in which it appears (continuous bag-of-words) or to guess the context from the word (skip-gram). As the weights and biases are adjusted to perform this task more accurately, words in the corpus are assigned to vectors. Over the course of training, words that appear in similar contexts end up located closer to each other in the space. This is often called semantic similarity. There are some neat features of these vector representations. For one, word-vectors can be added and subtracted with results that almost look like conceptual analysis. For example, vector(“King”) – vector(“Man”) + vector(“Woman”) = a vector whose closest word-vector is “Queen”. There are also several cases in which the word2vec model seems to learn unexpected information about things in the corpus. When trained on the entirety of Wikipedia, word2vec produces vectors for city names which, when reduced to 2 dimensions, reflect the geographical relations among those cities with a spooky degree of accuracy. It’s natural to wonder: does it reveal anything interesting about the history of philosophy?
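The famous King/Queen analogy can be sketched in a few lines. The vectors below are hand-made toys purely to illustrate the arithmetic, not learned ones; a trained model assigns each word a vector of 100–300 dimensions, but the vector arithmetic and the cosine-similarity lookup work the same way.

```python
from math import sqrt

# Hand-made toy "word vectors" to illustrate the arithmetic;
# a trained model would learn these from a corpus.
vectors = {
    "king":  [0.9, 0.8, 0.1, 0.2],
    "queen": [0.9, 0.1, 0.8, 0.2],
    "man":   [0.1, 0.9, 0.1, 0.1],
    "woman": [0.1, 0.1, 0.9, 0.1],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

# vector("King") - vector("Man") + vector("Woman")
target = [k - m + w for k, m, w in zip(vectors["king"], vectors["man"], vectors["woman"])]

# The word whose vector lies closest to the result of the arithmetic.
closest = max(vectors, key=lambda word: cosine(vectors[word], target))
print(closest)  # queen
```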

The glamorously named ‘fasttext-wiki-news-subwords-300’ is one of the most impressive pre-trained sets of word vectors. It was trained on Wikipedia in 2017 and contains 1 million word vectors taken from a corpus of 16 billion tokens.* When fed a list of philosophers, it gives us this:

This is… a bit rubbish.

It gets that Kant is a special little guy, but that’s about all. It’s also very limited in what it can show us because names like ‘Eriugena’ and ‘Anscombe’ aren’t among the top million words on Wikipedia.

So what if we pick a much smaller but better curated dataset? What if we trained word2vec on the Stanford Encyclopaedia of Philosophy?

What I did

I’ve stuck my code here. I used a combination of BeautifulSoup and the module Newspaper to build the corpus and Gensim, an open-source NLP package, to train the word2vec model. The model output vectors in 100 dimensions. Here’s what the vectors for ‘kant’ and ‘fish’ look like:

I then fed the model a list of around 130 philosophers to find their vectors. This list was formed by taking a bunch of existing ‘top philosophers’ lists and then adding some philosophers who I thought should have been on those lists (e.g. Je Tsongkhapa, Ruth Millikan, Du Bois). If your favourite philosopher isn’t here, no offence was intended. I’m planning to do this again in the future and am open to suggestions.

I then used a standard principal component analysis algorithm to get these vectors down to 2 dimensions while preserving as much information as possible. The end product looks like this:
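The reduction step looks roughly like this. The random vectors here are stand-ins for the real 100-dimensional word vectors, and the names are just for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in data: random 100-dimensional "word vectors" for three names.
rng = np.random.default_rng(42)
names = ["kant", "hume", "plato"]
vecs = rng.normal(size=(len(names), 100))

# Project onto the two principal components, i.e. the two directions
# along which the vectors vary most.
coords = PCA(n_components=2).fit_transform(vecs)

for name, (x, y) in zip(names, coords):
    print(name, round(x, 2), round(y, 2))
```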

This may appear chaotic at first, but the closer you look, the more patterns emerge. The ‘big three’ Ancient Greeks are lumped together, as are the early modern European philosophers, and something like an analytic-continental divide seems to have been registered. The biggest question is why most philosophers clump around the bottom left. The answer, I suspect (and this is guesswork on my part), is that they are underrepresented in the dataset. I will discuss this more below because it highlights something very important about these representations.

In some cases, philosophers are placed close together due to ‘semantic similarity’. In other cases, they are placed closer together because there was not sufficient incentive to distinguish them. The model has tried to find an efficient representation of names in order to complete its task. There are many contexts in the dataset in which Locke and Hobbes appear but fewer in which Śankara and Zhu Xi appear. I suspect this has led to the bottom left corner compressing a bunch of philosophically distinct people. Anyone who knows something of the history of philosophy could have avoided this error, but the network walked right into it.

We should keep in mind that this is not a map of the history of philosophy, at least not directly. It’s informed by historical and philosophical information, since both are contained in the dataset. But these aren’t objective reflections of philosophical similarity, whatever that might be. It is the partial and impartial collective memory of a field as seen from the perspective of some prominent living scholars. It is a shared dream, a confabulation, a fantasy.

And since the whole thing’s imagined anyhow…

The Geography of our Past

What we have is a scatterplot graph. The process of interpretation has already begun when we treat its points as names. The network doesn’t know what a name is. And so, perhaps, representation isn’t the best way to think about the graph; it may be more insightful to appeal to metaphor to make sense of the model. This metaphor is often implicit; the network ‘learns’ that Kant has greater ‘semantic similarity’ to Hume, and it ‘knows’ that Berkeley has more in common with American pragmatism than early modern Europe. I have chosen to call attention to it with a map.


The eastern half of the map presents the standard ‘History of Western Philosophy’ taught in countless undergraduate courses. The river runs from the highlands of Ancient Greece, through the intermediary Ibn-Rushd to St Thomas Aquinas and out to fertilise the plains of early modern Europe. Socrates is closest to Plato and Aquinas is closest to Aristotle. Along its course, we find the ‘Land of the Sages’. I still don’t like this name but I couldn’t think of a better one for a region that includes Confucius, Parmenides and Pythagoras. Why are these people together? Confucians and Taoists, the Neo-Aristotelian Maimonides and the Neoplatonist Plotinus. They have been granted their own neighbourhood. Zhuangzi has a little pond where he can contemplate the happiness of fish.

Northwards, the river curls below the lofty citadels of Scholastica and on to the well-coppiced forest of early modern Europe. Philosophers here have all been given sufficient space to grow and develop. The network has grouped them together but given them each a plot of land with which to mix their labour. Further north, we have David Hume, and rising above them all stands Mt Kant. As with the Wikipedia chart, the whole map can be viewed in terms of philosophers’ proximity to Kant.

And then things get interesting. West of the Land of the Sages, across the fields of Epicurus and Eriugena, we find Scholastica Minor. Why is Democritus lurking on the outskirts? Nearby, Margaret Cavendish finds herself living between Machiavelli and Erasmus. South of Scholastica Minor lies the Timeless Oasis. This may be my favourite point on the map. Three great dialecticians whose work either directly or indirectly called into question the reality of time have been brought together. One might pause here and wonder why Pyrrho isn’t closer to Sextus or why Vasubandhu is so far from his half-brother Asanga. We’ll return to this later. Northeast of The Continent, we find the odd couple of Friedrich Nietzsche and George Berkeley, and Søren Kierkegaard sharing the river bank with Butler.

The Continent has its own internal geography which I might explore at another time. It makes sense for Kitaro Nishida to be beside Henri Bergson, but it is still surprising to see Bentham living closer to Fichte than Mill. I was struck to find Josiah Royce in here as well. I suspect the proximity of these writers to each other is another indication of the Anglo-American bias of the encyclopaedia. From the shoreline, they must envy the well-distinguished, floating hills of the analytic philosophers.

At the southern edge of the Continent, we find the Philippa Foot Hills. In the northern regions live Arendt and Buber, next to Jacques Lacan. To the south, we find some French political theory and the proto-feminism of Condorcet and Astell.

South of this, the algorithm begins to assert itself. By this, I mean that the neural network did not consider it worthwhile learning the differences between these philosophers. This is likely because they are underrepresented in the data set. It is worth contrasting this region with early modern Europe where the network has been given the information required to treat the canonical figures as distinct individuals. In the Desert of the Underexamined, wildly different philosophers are thrust together. They exist as general forms rather than particulars. They have been added on to – but not incorporated into – how philosophy is presented. Articles on the metaphysics of causation discuss Kant and Hume but not Nagarjuna. Dharmakirti is presented as a figure in Indian philosophy but not as an epistemologist or semanticist. I suspect that the philosophers in this region will begin to separate out as their work is connected to specific problems. History – and by this I mean the current choices of our discipline – will show whether this prediction is right or wrong.

Off the coast, we find the Analytic Archipelago, a fragmented realm of floating islands. You may be surprised to see Millikan and Barcan Marcus so far south. I suspect that they have been classified together on account of their name, Ruth. Though perhaps the network recognises them as two theorists who have done more than most to render modal notions respectable. I doubt this, though; the network is fickle and easily distracted. Northwards, the most surprising sights are Marx’s island and Heidegger’s tower, surrounded by fog. I still don’t know why Heidegger has ended up off the coast from Wittgenstein and alongside American pragmatists, but we must imagine Heidegger unhappy. In the far north, we have a land of philosophers who gave themselves to the contemplation of numbers and the normativity of logic, and it is here with Rudolf Carnap we find the furthest point from Aristotle and presumably the apotheosis of philosophy.


This is a sloppy, silly map. In some cases, I had to use shortened forms of names (e.g. Al-Farabi became ‘Farabi’ like ‘Aquinas’) and in others, I have only included one name when there should probably be several (e.g. Zeno, Mill, Lewis). I interpret Butler as Judith and not Joseph, as their work seems to be better attested in the corpus, but I may be wrong. This is why I removed the word ‘Stanford’ from the title. The errors here are mine. The Stanford Encyclopaedia is one of the great recent accomplishments of philosophy, whereas this map should not be taken very seriously.

Fantasy Maps

I’ll be honest, the main reason I chose to represent this as a fantasy map was because I thought it would be fun, but I don’t think the form is inappropriate. Few genres reflect biases more clearly than fantasy. And maps, it’s worth remembering, reflect our knowledge of the world, not the world itself (for more thoughts on ‘knowledge-first’ cartography see here). This isn’t a map of the history of philosophy or of the space of logical possibilities. It is a map of an encyclopaedia of philosophy that reflects the interests and values of the community who compiled it. Filtering this through a neural network has made things less, not more, reflective of reality. The space itself is not Euclidean but bent and distorted by the algorithm. Some people are treated as similar because their work shares common themes; others are treated as similar because their work is less well-known. The network knows only signs (and maybe less than that).

If you find this map hideously twee or even offensive, a projection of bucolic innocence onto a violent history, that’s fair. There are countless maps which could be made with the same data. I used inkarnate to make mine. I’d love to see others. If I were to start making it again, I would probably do it completely differently. Perhaps I will. Feel free to share your thoughts or recommendations.

Since it’s illegal to write about philosophy and maps without citing either Borges or Calvino, I’ll leave this microchapter from Invisible Cities below.

Cities & Desire 4

In the center of Fedora, that gray stone metropolis, stands a metal building with a crystal globe in every room. Looking into each globe, you see a blue city, the model of a different Fedora. These are the forms the city could have taken if, for one reason or another, it had not become what we see today. In every age someone, looking at Fedora as it was, imagined a way of making it the ideal city, but while he constructed his miniature model, Fedora was already no longer the same as before, and what had been until yesterday a possible future became only a toy in a glass globe.

The building with the globes is now Fedora’s museum: every inhabitant visits it, chooses the city that corresponds to his desires, contemplates it, imagining his reflection in the medusa pond that would have collected the waters of the canal (if it had not been dried up), the view from the high canopied box along the avenue reserved for elephants (now banished from the city), the fun of sliding down the spiral, twisting minaret (which never found a pedestal from which to rise).

On the map of your empire, O Great Khan, there must be room both for the big, stone Fedora and the little Fedoras in glass globes. Not because they are all equally real, but because all are only assumptions.

The one contains what is accepted as necessary when it is not yet so; the others, what is imagined as possible and, a moment later, is possible no longer.

  Italo Calvino, Invisible Cities

Zellig Harris, a correction

This post was prompted by Noam Chomsky’s personal reflections on the history of the last 70 years of linguistics (Chomsky 2021). It’s a nice piece if – like me – you’re into that kind of thing but it repeats what I take to be a misrepresentation of Zellig Harris’s work. I’ll get to the exact quote in a bit but I should say something about why I find myself interested. I will also avoid saying anything about Harris’s ideas on ‘metalanguage’ which I think were integral to how he thought about language. To even try would make this post interminably long.

Zellig Harris’s work in linguistics has two important features that make it relevant to current thought. The first is that he defines syntactic categories distributionally. The fact that an expression is a noun or adjective is not grounded by its possession of a syntactic property (e.g., [+N, -V]) but by its membership of a class: the class of expressions with which it can be substituted to produce another grammatical sentence. What makes ‘cat’ and ‘object’ nouns is nothing more than the fact that they can appear in the same position in sentences. This approach is extensional, as categories are defined by their members, and holist, as syntactic classes must be defined in terms of the whole language considered as a unified system.
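The distributional idea can be made concrete with a toy sketch (the corpus and the frame representation are invented for illustration, not Harris’s own procedure): classify each word by the set of left/right contexts it occurs in, and ‘cat’ and ‘object’ fall into the same substitution class because they share a frame.

```python
from collections import defaultdict

# A toy corpus of tokenised sentences.
sentences = [
    ("the", "cat", "sleeps"),
    ("the", "object", "sleeps"),
    ("the", "cat", "purrs"),
    ("a", "red", "object"),
]

# Map each word to the set of (left neighbour, right neighbour) frames
# in which it occurs.
frames = defaultdict(set)
for s in sentences:
    for i, w in enumerate(s):
        left = s[i - 1] if i > 0 else "<s>"
        right = s[i + 1] if i < len(s) - 1 else "</s>"
        frames[w].add((left, right))

# 'cat' and 'object' share the frame ('the', 'sleeps'), so on a purely
# distributional definition they belong to the same class.
print(frames["cat"] & frames["object"])
```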

The second strand of Harris’s work was an emphasis on what he took to be the probabilistic nature of syntax. From the late 1960s onwards, he argued that the relations that hold between constituents in a syntactic structure were probabilistic relations such that the occurrence of one increases the probability that the other will occur. There is no abstract structural relationship between them projected by a competence grammar existing independent of performance. There are merely relations of co-occurrence in usage.

These approaches are often seen to contrast with traditional generative models which treat categories as features lexical items possess independently of each other and which treat syntactic structure as an abstract relation between lexical items reflecting how a grammar represents an agent’s knowledge of language.

The arguments against Harris’s methods are simple enough. While ‘cat’ and ‘object’ can both occur in the frame ‘there is a __ on the table’, ‘object’ can also occur in the frame ‘I __ to what you are saying’ while ‘cat’ cannot. When we actually consider the complexity and ambiguity of natural languages, the distributional method appears hopeless. The case against probabilistic methods is also straightforward. Consider the following sentences:

“(1) Colourless green ideas sleep furiously.
(2) Furiously sleep ideas green colourless.
It is fair to assume that neither sentence (1) nor (2) (nor indeed any part of these sentences) has ever occurred in an English discourse. Hence, in any statistical model for grammaticalness, these sentences will be ruled out on identical grounds as equally ‘remote’ from English. Yet (1), though nonsensical, is grammatical, while (2) is not” (Chomsky, 1957).

With such simple refutations, it’s no wonder these were not pursued as major lines of inquiry. Nevertheless, both approaches have seen a resurgence within linguistics (computational linguistics, cognitive science etc.). Clark and others have shown how distributional methods can overcome many of the problems identified in LSLT, such that a large class of languages are learnable (in a formally precise sense) (see Clark, 2015 for a mathsy overview). Meanwhile, the use of probabilistic ideas in syntax is even more widespread. To pick just one example, Pereira has shown that, with even a relatively simple aggregate bigram model, the probability of (1) is about 2 × 10⁵ times that of (2); that is, p(1)/p(2) ≈ 2 × 10⁵. The moral seems to be: simple models are easily refuted by a priori reasoning; complex models are harder to handle.
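Pereira’s aggregate model clusters words into latent classes; the much cruder sketch below (an invented toy corpus with simple add-one smoothing, not his model) makes the same basic point: a smoothed model never has to assign two unseen sentences the same probability.

```python
from collections import Counter

# A toy corpus in which neither test sentence (nor any of their bigrams
# except those of sentence 1) occurs.
corpus = ("colourless liquids flow . green ideas spread . "
          "ideas sleep . dogs sleep furiously .").split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
V = len(unigrams)  # vocabulary size

def p_bigram(w1, w2):
    # Add-one (Laplace) smoothed conditional probability P(w2 | w1):
    # unseen bigrams get a small but non-zero probability.
    return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + V)

def p_sentence(words):
    p = 1.0
    for w1, w2 in zip(words, words[1:]):
        p *= p_bigram(w1, w2)
    return p

s1 = "colourless green ideas sleep furiously".split()
s2 = "furiously sleep ideas green colourless".split()

# Neither sentence occurred, yet (1) comes out strictly more probable
# because some of its bigrams were attested.
print(p_sentence(s1) / p_sentence(s2))
```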

I should make it clear that I don’t have a cat in this café. I don’t find either position intrinsically more intuitive. My main interest is in clarifying certain philosophical claims that have occurred alongside the development of these formal models. Which brings us back to Harris.

Harris in Context

I suspect that for many people, the 1975 preface to The Logical Structure of Linguistic Theory [LSLT] contains much of what they know about the work of Zellig Harris. The Harris one finds there is a radically pluralist antirealist. ‘In his view, there are no ‘competing theories’ and ‘pitting of one linguistic tool against another’ is senseless. Alternative theories are equally valid, as alternative procedures of analysis are equally valid’ (Chomsky, 1975: 38). This Harris was unconcerned with the empirical reality of linguistic theories or with their explanatory adequacy and had no interest in the psychological basis of our knowledge of language. This is in contrast to the realist attitude of generative grammar which seeks to discern the genuine structures of natural languages and their psychological basis. This is the presentation of Harris’s work which has been repeated.

“Retrospectively, Harris (1965) took a still stronger stand: There are no “competing theories”; “pitting of one linguistic tool against another” is senseless, an “aberration” with sociological roots. Alternative procedures of analysis can be applied “as a basis for a description of the whole language,” bringing out its various properties in different but not competing ways” (Chomsky, 2021: 4)

I’m going to argue that this account of Harris’s work is not wholly accurate, that these quotes are taken out of context, and that Harris was not proposing antirealism.

First though, let’s see the case against Harris. It’s easy to read some of Harris’s writing as expressing a straightforward commitment to mid-century positivism. He rejected earlier grammarians’ inventories of primitive categories on the grounds that ‘[t]he danger of using such undefined and intuitive criteria as pattern, symbol, and logical a prioris, is that linguistics is precisely the one empirical field which may enable us to derive definitions of these intuitive fundamental relationships out of correlations of observable phenomena’ (Harris, 1940: 228). He contrasted the linguist’s task of discovering novel structures in natural language with those which are ‘already built into the system’ in mathematics and logic (Harris, 1952), while suggesting that the primitives of these systems are likely to be based upon elements of the languages of the systems’ creators (Harris, 1951: 303). The idea that linguistics shouldn’t assume a prior set of categories was fundamental to Harris’s view of language (Harris, 1960).

According to the positivist reading, Harris’s use of distributional analysis can be seen as an attempt to extensionally reduce the vocabulary of grammars to observable data, just as the caricature-version of a logical positivist tries to show how the empirical content of scientific theories can be reduced to protocol sentences. He even uses Bloomfield’s word ‘report’, which the latter had presented as a translation of Protokollsatz. However, it’s one thing to argue that there is no a priori set of categories to which we can appeal when describing a language; it’s another to claim that all sets of categories are equivalent and devoid of psychological significance.

Harris didn’t take linguistics to be a subfield of psychology but he wasn’t indifferent to psychological considerations. In Co-Occurrence and Transformation in Linguistic Structure he proposed that ‘[t]here is also some reason to think that the kernels may function differently in memory and thought from the transformations’ (Harris, 1957: 339). This claim would be unintelligible if we read him, as Chomsky proposes, as viewing transformations as a mere means of arranging data rather than as a property of languages themselves. His opposition to the conflation of linguistics and psychology arose because he was unwilling to invoke psychological notions to define the primitive terms of linguistic theory (Harris, 1940: 225). Claims which seem to support the pluralist/antirealist reading make much more sense when understood this way. For example, consider the following claim:

“Any psychological or sociological interpretation of language is permissible (and by the same token every one is irrelevant) so long as it does not conflict with the results of linguistic investigation; which of them is desirable can only be decided in terms of the other sciences.”

This might be taken as an indication that any kind of psychological theory might be introduced to explain linguistic phenomena. However, when we understand it in context, the inadequacy of this reading is clear. Harris is responding directly to the use of psychological arguments in Gray’s Foundations of Language.

“Psychological explanations are often circular: ‘The earliest stages of IE [Proto-Indo-European] had no future, but as need arose to express future time and, consequently, to denote such a tense, a number of devices were adopted’ (20); the tense is there because they had need of it, and the proof that they had need of it is that the tense is there”

He criticises Gray for making ad hoc speculations about the psychological processes underlying pejoration and semantic drift. Positing a psychological process can be an easy way out of explaining a feature of language, but risks ‘explaining’ something well-defined and observable with something poorly defined and unobservable. In this context, we can understand Harris’s claim to be that theorists can posit psychological explanations for language change, but these claims must be tested against the claims of other fields. They do not stand alone as explanations. Later in his career, he connected his system of grammar to claims about learnability and language evolution: ‘It is in this way that the structure of a language can be conformed to even without the speakers explicitly knowing the grammar’ (Harris, 1989).

Now, what about the quotes Chomsky identifies? They come from the paper Transformational Theory published in Language in 1965. Since this section seems to have been used as the primary textual source for claims about Harris’s radical pluralism, it’s worth quoting it in full. They immediately follow the distinction between string, constituent, and transformational analyses.

“To interrelate these [transformational, string, constituent] analyses, it is necessary to understand that these are not competing theories, but rather complement each other in the description of sentences.5 It is not that grammar is one or another of these analyses, but that sentences exhibit simultaneously all of these properties. Indeed one can devise modifications of languages, say of English, which lack one property while retaining the others; but the result is not structurally the same as the original language.” And the footnote: “The pitting of one linguistic tool against another has in it something of the absolutist postwar temper of social institutions, but is not required by the character and range of these tools of analysis.”

Again, context shows that Harris is not saying that there are no ‘competing theories’ at all (which would be an incredibly strong philosophical claim) but that string analysis, constituent analysis, and transformational analysis are not competing theories: they complement each other because a sentence exhibits all of these structures at once. At the same time Harris was making this claim, Chomsky was describing the difference between deep structure and surface structure. In much the same way, we can affirm that these are not competing theories of linguistic structure while also holding that they are distinct structures that inhere within languages.

None of this makes sense if these analyses are simply ways of systematising data rather than describing something which is present in linguistic structure.

Harris’s career spanned sixty years and his ideas did change. His work from the 60s onward was increasingly shaped by the idea that natural languages have no independent metalanguage and by the claim that this introduces significant constraints on how a language should be described and explained. I think it is impossible to read texts like A Theory of Language and Information and come to the conclusion that he was any kind of antirealist, or that he thought that theoretical approaches were equivalent. I have tried to show here that we don’t even have to consider this material to draw this conclusion.

I think the narratives we tell about the history of a field matter. It’s easy to fall into caricature, to say: look at the narrow-minded technologists with their neural networks and machine learning, look at how they are repeating the philosophical mistakes of the behaviourists of the 1950s and the positivists of the 1920s (and the empiricists of the 1690s etc.). All prediction and no explanation. And when this is a natural way to think, it can be valuable to see that the connections between broad philosophical positions and theoretical methods are more complicated than these narratives suggest. At least in the case of Harris.

Some of the texts mentioned

Chomsky, N. 1955: The Logical Structure of Linguistic Theory. Ms., Harvard/MIT. [Published in part, 1975, New York: Plenum.]

Chomsky, N. 1957: Syntactic Structures. The Hague: Mouton.

Chomsky, N. 2021: Linguistics Then and Now: Some Personal Reflections. Annual Review of Linguistics, 7:1–11.

Clark, A. 2015: The Syntactic Concept Lattice. Journal of Logic and Computation, Vol. 25, Issue 5.

Harris, Z. 1940: Review of Foundations of Language by Louis H. Gray. Language, Vol. 16, No. 3, pp. 216–235.

Harris, Z. 1952: Discourse Analysis. Language, Vol. 28, No. 1, pp. 1–30.

Harris, Z. 1954: Transfer Grammar. International Journal of American Linguistics, Vol. 20, No. 4, pp. 259–270.

Harris, Z. 1955: From Phoneme to Morpheme. Language, Vol. 31, No. 2, pp. 190–222.

Harris, Z. 1957: Co-Occurrence and Transformation in Linguistic Structure. Language, Vol. 33, No. 3, Part 1, pp. 283–340.

Harris, Z. 1959: The Transformational Model of Language Structure. Anthropological Linguistics, 11:27-3.

Pereira, F. 2000: Formal Grammar and Information Theory: Together Again? Phil. Trans. R. Soc. Lond. A, 358, pp. 1239–1253.

Mesas and the mundane

With lockdown wearing on, my occasional walks outside took on a more epic feel. I think many of us felt something like this. Perhaps the new-found sense of risk rarefied the landscape. I took to photographing tree stumps in the style of Ansel Adams. I can’t say I really captured the style – I was using a telephone, not a camera – still, I hope I captured something of these mesas.


Music (Almost)

Here’s a piece of music that has been described as a sign that I obviously hate pianists and wish for them to suffer. That isn’t strictly true. But if you’re in quarantine and want to hurt your fingers in the service of some noise, fill your boots:  A Minor Etude

If you’re wondering if the title is a terrible pun, it is.

Here’s how a machine renders the piece:

Amhrán na bhFiann

Here’s a little thing I wrote on Liam Ó Rinn’s translation of the Irish national anthem. It’s not exactly a scholarly work; in fact, its existence is almost exclusively the result of a recent bout of insomnia. The anthem gets a rough time, but I think that Ó Rinn’s translation is truly impressive and thought it would be nice to dole out some due praise for a change.

Blame Cassirer

While in the process of dissecting my PhD dissertation into discrete articles I have come across some sections which, while interesting to me, are too short or trivial to warrant full publication. Over the next few weeks, I’ll be turning some of these into blog posts beginning with a discussion that connects two of my favourite things; mathematical linguistics and German idealism. 

When Cartesian Linguistics came out in 1966, there was one point on which all reviewers and critics agreed: Chomsky had seriously misunderstood Humboldt. This is perhaps surprising, as Chomsky had heralded generative grammar as a ‘return rather to the Humboldtian conception of underlying competence as a system of generative processes’ (Chomsky, 1964: 4).

‘It can, furthermore, be quite accurately described as an attempt to develop further the Humboldtian notion of ‘form of language’ and its implications for cognitive psychology, as will surely be evident to anyone familiar both with Humboldt and with recent work in generative grammar’ (Chomsky, 1964: 9) 

In fact, the term ‘generative grammar’ was coined with Humboldt in mind.

‘The term ‘generate’ is familiar in the sense here intended in logic, particularly in Post’s theory of combinatorial systems. Furthermore, ‘generate’ seems to be the most appropriate translation for Humboldt’s term erzeugen, which he frequently uses, it seems, in essentially the sense intended here’ (Chomsky, 1965: 9)

Even Chomsky’s more sympathetic reviewers like Gilbert Harman regarded the connection to Humboldt as tenuous. 

`Chomsky mars his discussion of romanticism by trying to read the whole theory of generative grammar into the musings of Wilhelm von Humboldt’ (Harman, 1968: 233)

Others were less forgiving. Ernest Gellner accused Chomsky of ‘irresponsible ancestor-snatching’ while Eschbach & Trabant criticise ‘Chomsky’s totally aberrant interpretation and unhistorical misuse of Humboldt for ends of his own’. German scholars were particularly keen to adumbrate Chomsky’s confusions.

‘For him [Humboldt], Erzeugung has no mathematical implication but a strongly diachronic one, and it is just this that is missing in Chomsky’ (Baumann, 1971: 3)

‘Humboldt’s ‘erzeugen’ does not correspond to Chomsky’s ‘generate’, and the same holds for what is connected with it’ (Weydt, 1972: 259)

‘Unfortunately, this error is not confined to itself; rather, it has contributed considerably to a picture of the history of linguistics that does not accord with reality’ (Ibid)

‘Chomsky’s picture of Humboldt is not only to be rejected as factually false. Beyond that, his scientistic reduction of substantially more complex traditional theoretical approaches to the nature of language is also to be understood as symptomatic in ideological terms, insofar as its unhermeneutic blinkers seem suited to further immunise dominant models of language and linguistics, such as that of generative-transformational grammar, against being called into question’ (Scharf, 1983: 235)

Hans Aarsleff sums up his review of Cartesian Linguistics this way:

‘I must conclude with the firm belief that I do not see that anything at all useful can be salvaged from Chomsky’s version of the history of linguistics. That version is fundamentally false from beginning to end, because the scholarship is poor, because the texts have not been read, because the arguments have not been understood, because the secondary literature that might have been helpful has been left aside or unread, even when referred to.’ (Aarsleff, 1970: 583)

Now I don’t think that such ‘false’ histories are necessarily useless. There can be considerable value in giving ‘rational reconstructions’ of a discipline, identifying ideas in their nascency and tracing their development over time. The stories we tell when we do this will be biased and likely anachronistic, but if we want to understand the past, we might have to boost the signal in the noise. Nevertheless, if the reviews are anything to go by, Chomsky’s account of the history of linguistics is so wrong as to be actively harmful. Whole books have been dedicated to detailing the problems with Chomsky’s reading of Humboldt (e.g. Scharf’s Chomskys Humboldt-Interpretation). This raises the question: how did Chomsky get it so wrong?

Rather than trying to do justice to this literature, we’ll confine our focus to a single issue: Humboldt’s notion of the form of language.

While it can be difficult to pin down exactly what Humboldt means by this, he does make it clear that when discussing form, ‘we are talking, not of language as such, but of the various different peoples, so that it is also a matter of defining what is meant by one particular language, in contrast, on the one hand, to the linguistic family, and on the other to a dialect, and what we are to understand by one language, where it undergoes essential changes during its career. Language, regarded in its real nature, is an enduring thing, and at every moment a transitory one…The concept of form does not as such exclude anything factual and individual; everything to be actually established on historical grounds only, together with the most individual features, is in fact comprehended and included in this concept’ (Humboldt, 49).

Whatever Humboldt means by form, it is something historical and concerns peoples: ‘it is the quite individual urge whereby a nation gives validity to thought and feeling in language’ (Ibid, 50).

‘Through exhibiting the form we must perceive the specific course which the language, and with it the nation it belongs to, has hit upon for the expression of thought. We must be able to see how it relates to other languages, not only in the particular goals prescribed to it, but also in its reverse effect upon the mental activity of the nation’ (Ibid, 52).

How does Chomsky interpret this? 

‘In developing the notion of ‘form of language’ as a generative principle, fixed and unchanging, determining the scope and providing the means for the unbounded set of individual ‘creative’ acts that constitute normal language use, Humboldt makes an original and significant contribution to linguistic theory – a contribution that unfortunately remained unrecognized and unexploited until fairly recently’ (Chomsky, 1966: 71)

‘Humboldt’s effort to reveal the organic form of language – the generative system of rules and principles that determines each of its isolated elements – had little impact on modern linguistics…’(Chomsky, 1966: 74)

What’s going on here? Humboldt seems to be talking about a diachronic principle of linguistic organisation while Chomsky is talking about a generative system of rules. Humboldt’s form is something that exists at the level of the ‘nation’ while Chomsky is concerned with internal mental processes. How did this happen? 

While Chomsky’s reading of Humboldt differs from that of pretty much every serious scholar of the history of linguistics, there is one person whose ideas it does agree with: Ernst Cassirer.

Cassirer was a leading figure of the Marburg school of neo-Kantianism, and in The Philosophy of Symbolic Forms, which Chomsky cites as a source for his understanding of the philosophy of language of German romanticism, he presents a highly Kantian version of Humboldt.

For Cassirer, Humboldt’s form of language has more in common with Kant’s forms of intuition than with any diachronic, empirical phenomenon. 

‘This distinction, the differentiation of matter and form, which dominates Humboldt’s general view, is also rooted in Kantian thought…The unity of form is the synthetic unity in which the unity of the object is grounded…In order to characterize this form of conjunction, grounded in the transcendental subject and its spontaneity, yet strictly “objective,” because necessary and universally valid, Kant himself had invoked the unity of judgment and so indirectly that of the sentence…. Humboldt’s concept of form extends what is here said of a single linguistic term to the whole of language’ (Cassirer, 1955: 161)

Cassirer’s Humboldt takes the form of language to be the subject-internal conditions of the possibility of linguistic experience: the internal relations which ground the objectivity of thought. For Cassirer’s Humboldt, ‘objectification in thought must come about through objectification in the sounds of language’ (Cassirer, 1923/2013: 117) and, just as the forms of intuition, space and time, make the objectification of physical objects possible for Kant, so the form of language makes this possible for Humboldt.

All I have been trying to suggest here is that Chomsky’s understanding of Humboldt has its roots in Cassirer’s writing. I am not arguing (here at least) that Chomsky is a crypto-Kantian. Chomsky is rightly seen as the founder of the so-called cognitive revolution in psychology, but this revolution was at least in part a return to the principles of Kant’s Copernican Revolution. In contrast to the structuralism and logical empiricism of the mid-twentieth century, Chomsky’s great idea was that the source of the structure of linguistic phenomena was to be sought not in the data of the external world but in the mind itself. Understanding the relation of these ideas to their Kantian antecedents, in other words performing rational reconstructions, can help us go some way towards understanding not just how we ended up with the theoretical assumptions we have but why our theorising matters in the first place.


Old Notes

When going through some old folders the other day, I found some notes for a paper which I had been working on several years ago. Its purpose was to give a quick overview of some of the ideas and theorems that connect formal language theory, model theory, automata theory and abstract algebra, but written specifically for a philosophical audience. While most philosophers have a basic training in model and proof theory, automata and grammars are much less known or discussed (a notable and praiseworthy exception to this is Robert Brandom’s discussion of the Chomsky Hierarchy in chapter two of Between Saying and Doing). I still think that the tools of automata theory are at least as important for a philosophical education as the proof-theoretic techniques typically covered in an introductory logic course. In particular, I think that these different fields provide different tools for thinking about the same structures in a way that is philosophically interesting. Anyway, work on my dissertation got in the way of completing this, but I thought that some of the more hand-waving, speculative parts might be of value if launched into the aether. It was never meant for publication anyway. I intend to provide more worked examples in future drafts.
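To give a flavour of the kind of tool I have in mind, here is a toy illustration of my own (not an excerpt from the notes): a deterministic finite automaton, the machine counterpart of the regular grammars at the bottom of the Chomsky Hierarchy, recognising the language of strings over {a, b} that contain an even number of a’s:

```python
def accepts(string):
    """DFA over {'a', 'b'} accepting strings with an even number of 'a's."""
    transitions = {
        ('even', 'a'): 'odd',  ('even', 'b'): 'even',
        ('odd',  'a'): 'even', ('odd',  'b'): 'odd',
    }
    state = 'even'  # start state; also the sole accepting state
    for symbol in string:
        state = transitions[(state, symbol)]
    return state == 'even'

print(accepts('abba'))  # True: two a's
print(accepts('ab'))    # False: one a
```

Moving one level up the hierarchy to the context-free languages corresponds to adding a stack to the machine; it is correspondences like this, between grammars, machines and algebraic structures, that the paper was meant to survey.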


A simple program for language learners

A tedious part of reading a book in a new language is having to jot down all the new words you’re encountering. My normal method is to read a couple of pages, writing down words I don’t know and trying to get the gist of what’s happening. Then I’ll look up the words I’ve written down, learn them, and reread those pages. This isn’t a very efficient way to do things, and it undermines the experience of reading a book for the first time. Ideally, you’d know in advance which new words you’ll have to learn before you start reading. The problem is that unless you fork out on special learners’ editions of books, you can’t really do this. This is silly, since most of the books I want to read are classics and on Project Gutenberg anyway.

To help with this, I’ve written a short and simple Python program which lets you scan the words in a .txt file, separate them into words you know and words you don’t, and then convert the list of unknown words into a .csv file which can be imported into an Anki deck. The full code is pasted at the bottom of this post.

Here’s an explanation.

The program gives you the option of sorting the words either by frequency (most common words come first) or by their order of appearance in the text. Both are reasonable approaches, and your choice will probably depend on how much time you have. The first two functions strip punctuation and capitalisation from the text and return the words sorted according to your choice.

The program will then present each word to you individually, and you can respond by pressing ‘y’ or ‘n’ depending on whether or not you already know the word. If you get bored or need to stop early, you can type ‘quit’. You’ll then have the option of putting the words you don’t know into a Python dictionary and/or a .csv file. The file will be output to your desktop.

Once you have the csv file, you can import it to an Anki deck and fill in the meanings of the words you don’t know.

Here’s an example of how it can be used.

Go to Project Gutenberg and copy and paste ‘Du côté de chez Swann’ into a .txt file on your desktop. Run the program and sort the obvious words (‘longtemps’, ‘de’, ‘yeux’) from the words you might not know (‘sifflement’). Make an Anki deck of the words you don’t know and look them up in the dictionary (‘sifflement’ means whistling). You can then learn the words you’ll need to know to read a chapter without having to read the whole chapter. (I might write another post on Zipfian distributions, which could help us determine exactly how much more efficient this method is than the reading-and-note-taking method.)
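As a rough preview of the Zipfian point: if the frequency of the r-th most common word is proportional to 1/r, then the share of running text covered by the n most frequent word types can be estimated in a few lines. This is a back-of-the-envelope sketch; the 50,000-type vocabulary is an illustrative assumption, not a figure measured from the novel:

```python
def zipf_coverage(n_learned, vocab_size):
    """Estimated fraction of running text covered by the n_learned most
    frequent word types, assuming frequency proportional to 1/rank."""
    def harmonic(k):
        return sum(1.0 / r for r in range(1, k + 1))
    return harmonic(n_learned) / harmonic(vocab_size)

# Learning a small number of high-frequency words covers a large share
# of the text; returns diminish steadily after that.
print(round(zipf_coverage(2000, 50000), 2))   # ~0.72
print(round(zipf_coverage(10000, 50000), 2))  # ~0.86
```

In other words, under these assumptions, sorting by frequency front-loads most of the benefit into the first couple of thousand flashcards.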

I’ve copy and pasted the code below. I can’t claim that it’s particularly elegant. Feel free to use it or improve it. Most of the trouble I had when writing it concerned maintaining accents in UTF-8. The code represents a personal milestone in that it is the first labour-saving program I have written which was not more labour-intensive to produce than the sum of labour saved.

Update: Friday 3rd March

Here’s the obvious extension of the program. This will automatically translate the words you don’t know into English and store them in your dictionary. You can then use these translations to build your Anki deck. It uses goslate, a free Google Translate API. Google have recently updated Google Translate to prevent this from working; however, if you switch your VPN around when you start getting HTTP ERROR 503, it should still work. You can download the source code for goslate here: . If you are using the program to learn a language, I’d recommend using a proper dictionary and not Google Translate.

import collections
import csv
import os

x = open(os.path.expanduser('~/Desktop/Text.txt'), 'rt', encoding='utf8')

def strip_punc(text):
    # remove punctuation character by character and lower-case the result
    for char in ',<>?&^%$#@!.:;"\'()/':
        text = text.replace(char, '')
    return text.lower()

def prepfreq(f):
    # words sorted by frequency, most common first
    with f as handle:
        split = strip_punc(handle.read()).split()
    count = collections.Counter(split)
    return sorted(count, key=count.get, reverse=True)

def prepapp(f):
    # words in order of first appearance, duplicates removed
    with f as handle:
        split = strip_punc(handle.read()).split()
    words = []
    for w in split:
        if w not in words:
            words.append(w)
    return words

def convert(data, path):
    # one row per word, with a placeholder translation column
    with open(path, 'w', newline='', encoding='utf-8') as output:
        writer = csv.writer(output)
        for word in data:
            writer.writerow([word, 'Translation'])

def call(f):
    answer = input('Would you like to sort the words by frequency or appearance?: ')
    if answer == 'frequency':
        words = prepfreq(f)
    else:
        words = prepapp(f)
    klist = []
    nlist = []
    for word in words:
        inp = input(word + ': ')
        if inp == 'y':
            klist.append(word)
        elif inp == 'n':
            nlist.append(word)
        elif inp == 'quit':
            break
    print('All sorted. You know:', klist)
    print("You don't know:", nlist)
    dicto = input('Would you like this as a dictionary?  ')
    if dicto == 'yes':
        dictionary = dict(zip(nlist, ['Translation'] * len(nlist)))
        print(dictionary)
    if input('Would you like this as a csv file?  ') == 'yes':
        convert(nlist, os.path.expanduser('~/Desktop/words.csv'))

call(x)

For translations, replace the dictionary-and-csv section at the end of call, together with the convert function, with the following (you’ll also need to import goslate):

    translation = goslate.Goslate()
    tlist = list(translation.translate(nlist, 'en'))
    dictionary = dict(zip(nlist, tlist))
    if input('Would you like this as a dictionary?  ') == 'yes':
        print(dictionary)
    if input('Would you like this as a csv file?  ') == 'yes':
        convert(dictionary, os.path.expanduser('~/Desktop/words.csv'))

def convert(data, path):
    # one row per word alongside its machine translation
    with open(path, 'w', newline='', encoding='utf-8') as output:
        writer = csv.writer(output)
        for word, trans in data.items():
            writer.writerow([word, trans])