This post was prompted by Noam Chomsky’s personal reflections on the history of the last 70 years of linguistics (Chomsky, 2021). It’s a nice piece if, like me, you’re into that kind of thing, but it repeats what I take to be a misrepresentation of Zellig Harris’s work. I’ll get to the exact quote in a bit, but first I should say something about why I find myself interested. I will avoid saying anything about Harris’s ideas on ‘metalanguage’, which I think were integral to how he thought about language; even to try would make this post interminably long.
Zellig Harris’s work in linguistics has two important features that make it relevant to current thought. The first is that he defined syntactic categories distributionally. That an expression is a noun or an adjective is not grounded in its possession of a syntactic property (e.g., [+N, -V]) but in its membership of a class: the class of expressions with which it can be substituted to produce another grammatical sentence. What makes ‘cat’ and ‘object’ nouns is nothing more than the fact that they can appear in the same positions in sentences. This approach is extensional, since categories are defined by their members, and holist, since syntactic classes must be defined in terms of the whole language considered as a unified system.
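To make the substitution idea concrete, here is a deliberately naive sketch in Python. It assumes a hypothetical toy corpus (invented for illustration) and treats a ‘frame’ as the exact left and right context of a word within a sentence; two words land in the same class only if they fill exactly the same set of frames. It is meant to show the extensional, holist flavour of the method, not to reproduce Harris’s actual procedures, which were far more sophisticated.

```python
from collections import defaultdict

# Hypothetical toy corpus, invented for illustration only.
TOY_CORPUS = [
    "the cat is on the table",
    "the dog is on the table",
    "the object is on the table",
    "I saw the cat yesterday",
    "I saw the dog yesterday",
    "I saw the object yesterday",
    "I object to what you are saying",
]


def frames_by_word(sentences):
    """Map each word to the set of frames (left context, right context) it fills."""
    frames = defaultdict(set)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            frame = (" ".join(tokens[:i]), " ".join(tokens[i + 1:]))
            frames[word].add(frame)
    return frames


def substitution_classes(sentences):
    """Group words whose sets of frames are identical (a naive exact-match criterion)."""
    classes = defaultdict(set)
    for word, word_frames in frames_by_word(sentences).items():
        classes[frozenset(word_frames)].add(word)
    # Sort for a deterministic, readable result.
    return sorted(sorted(words) for words in classes.values())


if __name__ == "__main__":
    for word_class in substitution_classes(TOY_CORPUS):
        print(word_class)
```

On this toy data ‘cat’ and ‘dog’ end up in one class, while ‘object’ is split off on its own because it also fills a verbal frame. Class membership depends on every sentence in the corpus, which is the holism at work.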
The second strand of Harris’s work was an emphasis on what he took to be the probabilistic nature of syntax. From the late 1960s onwards, he argued that the relations holding between constituents in a syntactic structure are probabilistic: the occurrence of one constituent increases the probability that the other will occur. There is no abstract structural relationship between them projected by a competence grammar existing independently of performance; there are merely relations of co-occurrence in usage.
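The core of this probabilistic relation can be stated with conditional probabilities: if the occurrence of word a raises the probability of word b, then P(b | a) exceeds P(b). The sketch below assumes a hypothetical toy corpus and counts only sentence-level co-occurrence. It is a simplification for illustration, not Harris’s own formalism.

```python
# Hypothetical toy corpus, invented for illustration only.
SENTENCES = [
    "the dog barked loudly",
    "the dog barked at night",
    "the dog slept",
    "the cat slept",
    "a bird sang",
]


def p(word, sentences):
    """Fraction of sentences that contain `word`."""
    return sum(word in s.split() for s in sentences) / len(sentences)


def p_given(word, cue, sentences):
    """Fraction of sentences containing `cue` that also contain `word`."""
    with_cue = [s for s in sentences if cue in s.split()]
    if not with_cue:
        raise ValueError(f"cue word {cue!r} never occurs")
    return p(word, with_cue)


if __name__ == "__main__":
    print(f"P(barked)       = {p('barked', SENTENCES):.2f}")
    print(f"P(barked | dog) = {p_given('barked', 'dog', SENTENCES):.2f}")
```

On this data P(barked) is 0.40 while P(barked | dog) is about 0.67: the occurrence of ‘dog’ raises the probability of ‘barked’, with no appeal to any structure beyond co-occurrence.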
These approaches are often seen to contrast with traditional generative models which treat categories as features lexical items possess independently of each other and which treat syntactic structure as an abstract relation between lexical items reflecting how a grammar represents an agent’s knowledge of language.
The arguments against Harris’s methods are simple enough. While ‘cat’ and ‘object’ can both occur in the frame ‘there is a(n) __ on the table’, ‘object’ can also occur in the frame ‘I __ to what you are saying’ while ‘cat’ cannot. When we actually consider the complexity and ambiguity of natural languages, the distributional method appears hopeless. The case against probabilistic methods is also straightforward. Consider the following sentences:
“(1) Colourless green ideas sleep furiously.
(2) Furiously sleep ideas green colourless.
It is fair to assume that neither sentence (1) nor (2) (nor indeed any part of these sentences) has ever occurred in an English discourse. Hence, in any statistical model for grammaticalness, these sentences will be ruled out on identical grounds as equally ‘remote’ from English. Yet (1), though nonsensical, is grammatical, while (2) is not” (Chomsky, 1957).
With such simple refutations, it’s no wonder these were not pursued as major lines of inquiry. Nevertheless, both approaches have seen a resurgence within linguistics (computational linguistics, cognitive science, etc.). Clark and others have shown how distributional methods can overcome many of the problems identified in LSLT, so that a large class of languages is learnable (in a formally precise sense) (see Clark, 2015 for a mathsy overview). Meanwhile, the use of probabilistic ideas in syntax is even more widespread. To pick just one example, Pereira (2000) has shown that even with a relatively simple aggregate bigram model, (1) is about 2 × 10⁵ times more probable than (2): that is, p(1)/p(2) ≈ 2 × 10⁵. The moral seems to be that simple models can be easily refuted by a priori reasoning, but complex models are harder to dismiss.
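For readers curious how a bigram model can separate two sentences that share no attested word pairs, here is a rough summary of the aggregate bigram model in Pereira (2000), written as I understand it. Each word-to-word transition is routed through a small set of hidden classes c, so an unseen bigram such as ‘colourless green’ still receives non-zero probability from the classes its words belong to. The class parameters are learned from text; the training details are in Pereira’s paper and are not reproduced here.

```latex
% Aggregate bigram model: C hidden classes mediate between adjacent words
P(w_i \mid w_{i-1}) \approx \sum_{c=1}^{C} P(w_i \mid c)\, P(c \mid w_{i-1})

% Sentence probability under the bigram (first-order Markov) assumption
P(w_1 \ldots w_n) \approx P(w_1) \prod_{i=2}^{n} P(w_i \mid w_{i-1})

% The reported contrast between sentences (1) and (2)
\frac{p(1)}{p(2)} \approx 2 \times 10^{5}
```

Because word order determines which conditional probabilities are multiplied together, reversing the sentence changes every factor after the first, which is why (1) and (2) need not come out as ‘equally remote’.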
I should make it clear that I don’t have a cat in this café. I don’t find either position intrinsically more intuitive. My main interest is in clarifying certain philosophical claims that have occurred alongside the development of these formal models. Which brings us back to Harris.
Harris in Context
I suspect that for many people, the 1975 preface to The Logical Structure of Linguistic Theory [LSLT] contains much of what they know about the work of Zellig Harris. The Harris one finds there is a radically pluralist antirealist: ‘In his view, there are no “competing theories” and “pitting of one linguistic tool against another” is senseless. Alternative theories are equally valid, as alternative procedures of analysis are equally valid’ (Chomsky, 1975: 38). This Harris was unconcerned with the empirical reality of linguistic theories or with their explanatory adequacy, and had no interest in the psychological basis of our knowledge of language. He stands in contrast to the realist attitude of generative grammar, which seeks to discern the genuine structures of natural languages and their psychological basis. It is this presentation of Harris’s work that Chomsky repeats in the 2021 piece:
“Retrospectively, Harris (1965) took a still stronger stand: There are no “competing theories”; “pitting of one linguistic tool against another” is senseless, an “aberration” with sociological roots. Alternative procedures of analysis can be applied “as a basis for a description of the whole language,” bringing out its various properties in different but not competing ways” (Chomsky, 2021: 4)
I’m going to argue that this account of Harris’s work is not wholly accurate, that these quotes are taken out of context, and that Harris was not proposing antirealism.
First, though, let’s see the case against Harris. It’s easy to read some of Harris’s writing as expressing a straightforward commitment to mid-century positivism. He rejected earlier grammarians’ inventories of primitive categories on the grounds that ‘[t]he danger of using such undefined and intuitive criteria as pattern, symbol, and logical a prioris, is that linguistics is precisely the one empirical field which may enable us to derive definitions of these intuitive fundamental relationships out of correlations of observable phenomena’ (Harris, 1940: 228). He contrasted the novel structures the linguist must discover in natural language with those which are ‘already built into the system’ in mathematics and logic (Harris, 1952), while suggesting that the primitives of these systems are likely to be based upon elements of the languages of the systems’ creators (Harris, 1951: 303). The idea that linguistics shouldn’t assume a prior set of categories was fundamental to Harris’s view of language (Harris, 1960).
According to the positivist reading, Harris’s use of distributional analysis can be seen as an attempt to extensionally reduce the vocabulary of grammars to observable data, just as the caricatured logical positivist tries to show how the empirical content of scientific theories can be reduced to protocol sentences. He even uses Bloomfield’s word ‘report’, which Bloomfield had presented as a translation of Protokollsatz. However, it’s one thing to argue that there is no a priori set of categories to which we can appeal when describing a language; it’s another to claim that all sets of categories are equivalent and devoid of psychological significance.
Harris didn’t take linguistics to be a subfield of psychology, but he wasn’t indifferent to psychological considerations. In Co-Occurrence and Transformation in Linguistic Structure, he proposed that ‘[t]here is also some reason to think that the kernels may function differently in memory and thought from the transformations’ (Harris, 1957: 339). This claim would be unintelligible if we read him, as Chomsky proposes, as viewing transformations as a mere means of arranging data rather than as a property of languages themselves. His opposition to the conflation of linguistics and psychology arose because he was unwilling to invoke psychological notions to define the primitive terms of linguistic theory (Harris, 1940: 225). Claims which seem to support the pluralist/antirealist reading make much more sense when understood this way. For example, consider the following claim:
“Any psychological or sociological interpretation of language is permissible (and by the same token every one is irrelevant) so long as it does not conflict with the results of linguistic investigation; which of them is desirable can only be decided in terms of the other sciences.”
This might be taken as an indication that any kind of psychological theory could be introduced to explain linguistic phenomena. However, once we read it in context, the inadequacy of this interpretation is clear. Harris is responding directly to the use of psychological arguments in Louis H. Gray’s Foundations of Language:
“Psychological explanations are often circular: ‘The earliest stages of IE [Proto-Indo-European] had no future, but as need arose to express future time and, consequently, to denote such a tense, a number of devices were adopted’ (20); the tense is there because they had need of it, and the proof that they had need of it is that the tense is there”
He criticises Gray for making ad hoc speculations about the psychological processes underlying pejoration and semantic drift. Positing a psychological process can be an easy way out of explaining a feature of language, but it risks ‘explaining’ something well defined and observable with something poorly defined and unobservable. In this context, we can understand Harris’s claim to be that theorists can posit psychological explanations for language change, but these claims must be tested against the claims of other fields; they do not stand alone as explanations. Later in his career, he connected his system of grammar to claims about learnability and language evolution: ‘It is in this way that the structure of a language can be conformed to even without the speakers explicitly knowing the grammar’ (Harris, 1989).
Now, what about the quotes Chomsky identifies? They come from the paper ‘Transformational Theory’, published in Language in 1965. Since this passage seems to have been the primary textual source for claims about Harris’s radical pluralism, it’s worth quoting in full. It immediately follows Harris’s distinction between string, constituent, and transformational analyses.
“To interrelate these [transformational, string, constituent] analyses, it is necessary to understand that these are not competing theories, but rather complement each other in the description of sentences.[5] It is not that grammar is one or another of these analyses, but that sentences exhibit simultaneously all of these properties. Indeed one can devise modifications of languages, say of English, which lack one property while retaining the others; but the result is not structurally the same as the original language.” And the footnote: “The pitting of one linguistic tool against another has in it something of the absolutist postwar temper of social institutions, but is not required by the character and range of these tools of analysis.”
Again, context shows that Harris is not saying that there are no ‘competing theories’ at all, which would be an incredibly strong philosophical claim. He is saying that string analysis, constituent analysis, and transformational analysis are not competing theories: they complement each other because a sentence exhibits all these structures at once. At the same time as Harris was making this claim, Chomsky was describing the difference between deep structure and surface structure. In much the same way, we can affirm that deep and surface structure are not competing theories of linguistic structure while also holding that they are distinct structures that inhere within languages.
None of this makes sense if these analyses are simply ways of systematising data rather than describing something which is present in linguistic structure.
Harris’s career spanned sixty years, and his ideas did change. His work from the 1960s onward increasingly stressed the idea that natural languages have no independent metalanguage, and the claim that this imposes significant constraints on how language should be described and explained. I think it is impossible to read texts like A Theory of Language and Information and conclude that he was any kind of antirealist, or that he thought theoretical approaches were all equivalent. I have tried to show here that we don’t even need to consider this material to reach that conclusion.
I think the narratives we tell about the history of a field matter. It’s easy to fall into caricature: to say, look at the narrow-minded technologists with their neural networks and machine learning, look at how they are repeating the philosophical mistakes of the behaviourists of the 1950s and the positivists of the 1920s (and the empiricists of the 1690s, etc.). All prediction and no explanation. When this is such a natural way to think, it can be valuable to see that the connection between broad philosophical positions and theoretical methods is more complicated than these narratives suggest. At least in the case of Harris.
Some of the texts mentioned
Chomsky, N. 1955: The logical structure of linguistic theory. Ms., Harvard/MIT. [Published in part, 1975, New York: Plenum.]
Chomsky, N. 1957: Syntactic structures. The Hague: Mouton.
Chomsky, N. 2021: Linguistics Then and Now: Some Personal Reflections. Annual Review of Linguistics, Vol. 7, pp. 1–11.
Clark, A. 2015: The syntactic concept lattice. Journal of Logic and Computation, Vol. 25, No. 5.
Harris, Z. 1940: Review of Foundations of Language by Louis H. Gray. Language, Vol. 16, No. 3, pp. 216–235.
Harris, Z. 1952: Discourse Analysis. Language, Vol. 28, No. 1, pp. 1–30.
Harris, Z. 1954: Transfer Grammar. International Journal of American Linguistics, Vol. 20, No. 4, pp. 259–270.
Harris, Z. 1955: From Phoneme to Morpheme. Language, Vol. 31, No. 2, pp. 190–222.
Harris, Z. 1957: Co-Occurrence and Transformation in Linguistic Structure. Language, Vol. 33, No. 3, Part 1, pp. 283–340.
Harris, Z. 1959: The Transformational Model of Language Structure. Anthropological Linguistics 11:27-3
Pereira, F. 2000: Formal grammar and information theory: together again? Philosophical Transactions of the Royal Society of London A, Vol. 358, pp. 1239–1253.