VM3a: Einführung
in der
Sprachtypologie

Contact:
vicente@uni-potsdam.de
Office:
Haus 35:1.03

Next important date:
July 12: Final exam
(worth 70% of the grade)

List of classes
Class 1 (April 12)
Class 2 (April 19)
Class 3 (April 26)
Class 4 (May 3)
Class 5 (May 10)
Class 6 (May 23)
Class 7 (May 31)
Midterm (June 7)
Class 8 (June 14)
Class 9 (June 21)
Class 10 (June 28)
Class 11 (July 5)
Final exam (July 12)

Class 11: July 5

Topic: Writing systems.

Reading: Rogers, Henry. 2005. Writing systems: a linguistic approach. Blackwell: Oxford (warning: 220MB file!).

Additional reading: Moser, David. Why Chinese is so damn hard

Slides: download them here.

Summary of the class: The core notion of this class is that of a script. A script is a finite set of symbols that can be used to write a language, and thus we speak of the Latin script, the Cyrillic script, the Chinese script, etc. Wikipedia has an extensive list of different scripts grouped by type. You might also want to check the website of the Unicode Consortium, the organization that developed and maintains the Unicode Standard for the representation of text in software products. Unfortunately, the University Library only has a copy of an old version (2.0) of the Unicode Standard book, and it's in Babelsberg. StaBi in Potsdamer Straße has a copy of the 4.0 version. The current one is 6.0.

Scripts can be divided into four categories.

Additionally, each script has a number of components. The most basic one is the grapheme, which can be defined as the smallest distinctive unit of the script. A grapheme may have one or more allographs, in the same way that phonemes may have allophones. For example, a, a, and A are allographs of the same grapheme. Allography plays an important role in the Arabic script, where each grapheme is realized as a specific allograph depending on its position within the word. The combination of a grapheme and a diacritic forms a complex grapheme. Graphemes can be written independently of each other or joined as a single graphical unit. In the latter case, we talk about ligatures. A non-structural ligature is used only for aesthetic effect (or as a mark of good typesetting) and has no effect on pronunciation. For instance, many high quality publications typeset "fi" and "fl" sequences as one graphical unit, with the roof of the "f" joining either the dot of the "i" or the top of the "l". As opposed to this, there are structural ligatures, which correspond to a totally distinct grapheme. For instance, Icelandic has the ligated grapheme æ, which is distinct from ae.
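
The difference between non-structural and structural ligatures is mirrored, at least roughly, in how the Unicode Standard encodes them. The following is a minimal sketch using Python's standard unicodedata module; the helper name compat_form is only for illustration. The typographic "fi" ligature (U+FB01) is encoded as a compatibility variant and dissolves into two letters under compatibility normalization, whereas æ (U+00E6) is encoded as a letter in its own right and stays intact. A complex grapheme such as é can likewise be split into a base grapheme plus a combining diacritic.

```python
import unicodedata


def compat_form(text: str) -> str:
    """Return the NFKD (compatibility decomposition) form of text."""
    return unicodedata.normalize("NFKD", text)


# Non-structural ligature: U+FB01 LATIN SMALL LIGATURE FI decomposes into f + i.
fi_ligature = "\ufb01"
print(list(compat_form(fi_ligature)))

# Structural ligature: U+00E6 LATIN SMALL LETTER AE has no decomposition.
ae = "\u00e6"
print(compat_form(ae) == ae)

# Complex grapheme: U+00E9 (e with acute) is a base grapheme plus a diacritic.
e_acute = "\u00e9"
print([unicodedata.name(ch) for ch in compat_form(e_acute)])
```

Note that this only shows how one encoding standard treats these characters; whether a ligature counts as structural is ultimately a fact about the writing system of a particular language.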

Another issue is the writing direction. The most basic distinction is that of left-to-right (LTR) scripts vs. right-to-left (RTL) ones. This distinction can be crossed with the distinction of whether scripts are written horizontally or vertically to give a four-way classification.
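
As a small illustration of the LTR/RTL distinction, the sketch below looks up the bidirectional class that Unicode assigns to individual characters, again with Python's standard unicodedata module. The function name direction and the three-way labels are assumptions made for this example. Unicode's bidirectional classes only cover the horizontal dimension; the horizontal/vertical distinction is handled at the level of page layout and is not captured here.

```python
import unicodedata


def direction(char: str) -> str:
    """Classify one character as 'LTR', 'RTL', or 'neutral' by its bidi class."""
    bidi = unicodedata.bidirectional(char)
    if bidi == "L":            # strong left-to-right
        return "LTR"
    if bidi in ("R", "AL"):    # strong right-to-left (AL = Arabic letter)
        return "RTL"
    return "neutral"           # digits, punctuation, spaces, etc.


# Latin a, Cyrillic de, Hebrew alef, Arabic alif
for ch in ("a", "\u0434", "\u05d0", "\u0627"):
    print(unicodedata.name(ch), direction(ch))
```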

Class 10: June 26

Topic: Sign languages.

Reading: The following two books contain plenty of information about sign languages.

Slides: download them here

Summary of the class: One of the most important statements that one can make about sign languages is that they are real languages in the same way as spoken languages. The only thing that changes is the channel of communication (from auditory to visual). Other than that, sign languages exhibit many of the characteristics of spoken languages.

Take, for instance, the fact that the units of sound and meaning in spoken languages (phonemes and morphemes) are composed of several independent features, each with a variety of values. For instance, a good description of a phoneme requires detailing its place of articulation, manner of articulation, voicing, length, etc. Similarly, a sign can vary across the following dimensions (usually, location and motion are performed in sequence, whereas handshapes and non-manual markings may spread over a stretch of locations and motions). Additionally, not every possible combination of these factors is licit, in the same way that not all potentially pronounceable phonemes are actually found in spoken languages (e.g., there are no languages with bilabial approximants or bilabial taps).

Although handshapes and orientations are the most salient features of signs, motions and locations also have their importance. Specifically, they can be used to mark grammatical information, with specific locations reserved for various discourse participants. For instance, the space behind the signer is used for reference to the past, whereas the space in front is used for reference to the future. If a discourse referent (e.g., BOB) is assigned a space, let's say, to the right side of the signer, then the signer will point to that particular space any time he or she wants to refer to BOB. In many sign languages, including American Sign Language (ASL), this can be done both manually and non-manually. For instance, in order to sign BOB KISS ALICE, one would do either one of the following. In conclusion, locations in space around the signer's body are a crucial part of the expression of agreement in sign languages. Additionally, non-manual markings have an important role, in that they are used for all the following.

Class 9: June 19

Topic: Agreement.

Slides: download them here

Summary of the class: we are used to agreement systems like those in German, English, and other European languages, in which the finite verb always reflects the agreement features of the grammatical subject. However, this is not the only possibility. Agreement can happen in multiple forms across languages, some of which might look really exotic from our Europe-centered perspective. Here are a few of the agreement patterns that we find around the world (see the corresponding slides for an example of each kind).

Class 8: June 14

Topic: The typology of pronouns.

Reading: Harley, Heidi, and Elizabeth Ritter. 2002. Person and number in pronouns: a feature-geometric analysis. Language 78:482--526

Slides: download them here

Summary of the class: We are used to pronominal systems like the ones in English, German, French, etc., in which we have two numbers (singular and plural) and three persons (1st, 2nd, 3rd). This is not really a big surprise, because this system is the most common one typologically: almost half of the languages in Harley and Ritter's corpus look like this (see slide 7). However, there is quite some variation. In particular, one can have up to four numbers and four persons. The extra person we need comes from splitting 1st into exclusive and inclusive.

In languages without an inclusive/exclusive distinction, a 1st person pronoun is simply required to include the speaker, without specifying whether it includes or excludes the addressee.

The additional two numbers we need to complete the paradigm are dual and paucal. Slides 2 through 6 give you the pronoun systems of various languages that exemplify the cells of the paradigm space. Additionally, slides 8 and 9 give you examples of two languages with reduced pronoun systems, i.e., Piraha, which only has person distinctions, and Winnebago (also called Ho-Chunk), which only has a conversation participant/non-participant distinction.

Feature hierarchies: These pronoun systems are not totally unrestricted. Typological research has uncovered a number of generalizations about them, such as the following. These generalizations have inspired the work of Harley and Ritter, among others, whose core claim is that what we call "plural", "singular", "1st person", etc., are not primitive notions. Rather, they are combinations of more basic notions, illustrated in the diagram on slide 10. The important thing to remember is that the lines in the diagram encode dependency relations ---that is, you can't have a certain node unless all of the higher nodes are present. In other words, you can't have a [speaker] node unless you also have a [participant] node, and you can't have an [augmented] node unless you also have a [group] and an [individuation] node. In order to understand how this diagram describes all the feature systems that we encounter, you are strongly encouraged to read carefully sections 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, and 2.8 of Harley and Ritter.
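
The dependency idea can be made concrete with a small well-formedness check. This is only a sketch: the table below contains just the dependencies spelled out in the paragraph above (it is not the full diagram from slide 10 or from Harley and Ritter's article), and the names REQUIRES and is_well_formed are made up for this example.

```python
from typing import Iterable

# Fragment of the geometry, restricted to the dependencies stated in the text:
# each feature is mapped to the higher node that must be present to license it.
REQUIRES = {
    "speaker": "participant",
    "augmented": "group",
    "group": "individuation",
}


def is_well_formed(features: Iterable[str]) -> bool:
    """True if every feature's required higher node is also in the bundle."""
    bundle = set(features)
    # Checking immediate parents is enough: the parent is itself in the
    # bundle, so its own requirement is checked in the same pass.
    return all(REQUIRES[f] in bundle for f in bundle if f in REQUIRES)


print(is_well_formed({"participant", "speaker"}))        # licensed
print(is_well_formed({"speaker"}))                       # [participant] missing
print(is_well_formed({"individuation", "augmented"}))    # [group] missing
```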

Midterm: June 7

You can download the text of the midterm here. The results are here.

Class 7: May 30

Topic: Ergativity and switch reference
Reading: van de Visser, Mario. 2005. The marked status of ergativity. Doctoral dissertation, University of Utrecht. (read Chapter 1, section 2 "Ergativity in the languages of the world")

Slides: download them here

Assignment 1: figure out the difference between syntactic and morphological ergativity
Answer: as the terms suggest, the difference is whether a language is ergative at the level of morphology or syntax. A morphologically ergative language has the same syntax as an accusative language; the only thing that changes is the way cases are assigned. In contrast, in a syntactically ergative language, the relation between a verb and its arguments is affected. In a syntactically accusative language, Agents are mapped to the subject position of the verb, whereas Patients are mapped to the object position of the verb. In a syntactically ergative language, in contrast, Agents are mapped to the object position of the verb, and Patients to the subject position. This has important consequences in several domains of grammar, such as coordination of predicates and raising/control sentences (embedded non-finite sentences without an overt subject). See slides 5--10 for illustration.

Assignment 2: try to find languages that go in all four cells of this table:

                             Syntactically accusative    Syntactically ergative
Morphologically accusative   German                      ---
Morphologically ergative     Basque                      Dyirbal

According to Dixon (1994:172), there are no languages that are syntactically ergative and morphologically accusative. Comrie (1984) mentions a small class of nouns from certain Iranian languages, but he also comments that it is not totally clear that these languages constitute an exception.

Switch reference: this is a phenomenon we observe in a number of languages in (primarily) North America, Australia, and Papua-New Guinea, in which a special morpheme is used to indicate whether the subjects of two different clauses refer to the same individual or not. For example, the German sentence:

(1) Wenn sie eintrat, setzte sie sich.

is ambiguous. The two pronouns sie can refer to either the same woman or to two different women. In the languages in question (see slide 24, where the language is Kiowa), the coordinating or subordinating particle comes with a morpheme that indicates whether the reference of the subjects is the same (SS, same subject) or not (DS, different subject). The important features of switch reference are the following.

In addition, there is the phenomenon of non-canonical switch reference, in which it is not the subjects of the clauses that are compared; rather, what is compared as "same" or "different" is something more abstract, like narratives, times, scenes, purposes, etc. (see slides 32-36). This has led some researchers to propose that switch reference morphemes don't compare subjects, but rather topics. Subjects are the default topics of a sentence, which explains why these morphemes often target subjects. However, one can also topicalize "situations", and make them the thing that a sentence is about. In languages like German and English, one can already use pronouns and demonstratives to refer to situations (see slide 38) ---in switch reference languages, switch reference morphemes have taken up this particular role of pronouns and demonstratives. This doesn't mean, though, that switch reference morphemes are pronouns or demonstratives (as we said above, they have no relation to nominal elements). It simply means that they perform one of the functions of pronouns and demonstratives.

Class 6: May 23

Topic: Subjects
Slides: download here.

Summary of the class: how do we know what a "subject" is? Typically, subjects are associated with a certain class of properties, of which the following are the most prominent.

The important word here is typically. Subjects exhibit these properties most of the time, but not always. As shown in slides 5-9, one can find noun phrases that we want to call "subjects" in a non-trivial way, but which fail to exhibit one or more of these properties. For this reason, it is better to define "subject" as "the most prominent argument of a clause". In most languages, "prominence" is defined in structural (syntactic) terms ---i.e., the subject is the argument that occupies the highest position in the structure of the clause. We can determine this property through a variety of tests, such as anaphoric binding: a subject can be the antecedent of a reflexive pronoun in another argument, but other arguments can't be the antecedents of a reflexive pronoun in the subject.

However, not all languages resort to structural prominence to determine what the subject is. In some cases, we find that prominence is defined with respect to semantic and pragmatic scales, which incorporate concepts like "human"/"animate"/"inanimate", or "obviative"/"proximate". In these languages, the highest element in the relevant scale can be considered the "subject".

Class 5: May 10

Topic: Typology of relative clauses
Reading: de Vries, Mark. 2002. The syntax of relativization. Doctoral dissertation, University of Amsterdam. (read only Chapter 2)
Slides: download here.

Summary of the class: I could write many things here, but in reality the best thing you can do is go and read selected sections of Mark de Vries's dissertation, where he explains all the important concepts with great clarity. Specifically, you should read chapter 2, sections 2 ("Overview"), 5 ("Downing's universals and general implications"), and 6 ("Special types of relative clauses"). From chapter 5 ("Relative elements") you should also read section 3.2 ("Resumptive pronouns").

I also recommend having a quick look at Keenan and Comrie's article "Noun Phrase Accessibility and Universal Grammar", where they provide many examples of languages that obey the hierarchy of relativization that I reproduce on slide 15.
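
To see what an implicational hierarchy of this kind predicts, here is a minimal sketch. The hierarchy is not spelled out in these notes, so the ordering below is an assumption: it is the one usually cited from Keenan and Comrie's article (subject > direct object > indirect object > oblique > genitive > object of comparison), and you should check it against slide 15. The function name relativizable_positions is made up for this example. The idea is that if a language can relativize a given position, it can also relativize every position higher on the hierarchy.

```python
from typing import List

# Assumed ordering, from most to least accessible (check against slide 15).
HIERARCHY = [
    "subject",
    "direct object",
    "indirect object",
    "oblique",
    "genitive",
    "object of comparison",
]


def relativizable_positions(lowest: str) -> List[str]:
    """Positions predicted to be relativizable, given the lowest attested one."""
    cutoff = HIERARCHY.index(lowest)   # raises ValueError for unknown labels
    return HIERARCHY[: cutoff + 1]


# A language that relativizes indirect objects is predicted to relativize
# subjects and direct objects as well.
print(relativizable_positions("indirect object"))
```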

Class 4: May 3

Topic: Typology of wh- questions
Reading: Rudin, Catherine. 1988. Multiple questions and multiple wh- fronting. Natural Language and Linguistic Theory 6:445--501. (read only pages 461--475)
Slides: download here.

Summary of the class: Today, we begin looking at how languages can be divided into different classes according to their behavior in a specific construction. The construction we choose for today is wh- questions, where languages can be divided into wh- fronting (itself divided into single and multiple fronting), wh- in situ, clefting, and wh- postposing. See slides 1-7. Pay also special attention to pages 461--475 of Rudin's article to understand the subclassifications of multiple wh- fronting languages. The two subclasses are the clustering and non-clustering languages, which differ on whether other kinds of phrases (e.g., clitics or parentheticals) are allowed to break the string of multiple wh- words. Only non-clustering languages allow this option; in clustering languages, clitics and parentheticals necessarily appear after all of the wh- words that have been fronted.

(if you are interested in the question of why some languages are wh- fronting whereas others are wh- in situ, I recommend reading Norvin Richards' article Beyond Strength and Weakness. Be warned, though, that Richards uses syntactic and phonological ideas that are more complicated than what you have been exposed to so far in your other syntax classes, so you might not understand the proposal completely).

Islands: a very interesting property shared by all wh- fronting languages is the phenomenon of islands (wh- in situ languages also have islands, but they are more difficult to detect, so we will not say anything about them). Islands can be defined as follows: John Ross (in a dissertation that you should at least have a look at some day, "Constraints on variables in syntax") noticed that there are two kinds of syntactic rules, which he called construal rules and movement rules. The difference is that movement rules are constrained to apply within certain environments ("islands") whereas construal rules can apply across islands (see slide 8). Wh- question formation is the prototypical movement rule that cannot cross islands. Furthermore, islands subdivide into strong islands, which block all sorts of movement, and weak islands, which only block movement of adjuncts (when, where, why, how), but not of arguments. See slide 10 for a partial list of which environments constitute weak and strong islands.

Besides blocking different kinds of wh- words, strong and weak islands also differ on how they behave under the kind of deletion known as sluicing (see slide 11). The difference is that sluicing eliminates strong islands, but it doesn't eliminate weak islands. You can see examples of this difference in chapter 3 of Jason Merchant's dissertation "The syntax of silence" and especially Uli Sauerland's short article "Guess how?".
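
The two generalizations about islands (what they block, and how they behave under sluicing) can be summarized as a small decision table. The sketch below simply restates the summary above in code; it is not a theory of islands, and the function name and string labels are assumptions made for this example.

```python
def extraction_allowed(island: str, phrase: str, sluiced: bool = False) -> bool:
    """Can a wh- phrase move out of the given environment?

    island:  'none', 'weak', or 'strong'
    phrase:  'argument' or 'adjunct'
    sluiced: whether the clause containing the island is deleted by sluicing
    """
    if phrase not in ("argument", "adjunct"):
        raise ValueError(f"unknown phrase type: {phrase}")
    if island == "none":
        return True
    if island == "strong":
        # Strong islands block all movement, but sluicing eliminates them.
        return sluiced
    if island == "weak":
        # Weak islands block adjuncts only, and sluicing does not help.
        return phrase == "argument"
    raise ValueError(f"unknown island type: {island}")


print(extraction_allowed("strong", "argument"))                # blocked
print(extraction_allowed("strong", "argument", sluiced=True))  # repaired
print(extraction_allowed("weak", "adjunct", sluiced=True))     # still blocked
```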

Wh- in situ and wh- postposing languages: for a number of years, the belief was that wh- postposing languages did not exist. However, more recent research has revealed that wh- postposing languages do exist, although they form a peculiar group ---i.e., sign languages. As far as we know, there is no spoken language that is uncontroversially wh- postposing. Some African languages, like Kikuyu, have something that looks like wh- postposing, but it is not clear that this is really wh- postposing.

As for wh- in situ languages, one of the most interesting generalizations about them is the one uncovered in Lisa Cheng's dissertation "On the typology of wh- questions", where she noticed that wh- in situ languages, and only wh- in situ languages, always have a special mark for yes/no questions, which can be a sentence final particle, a special type of agreement, and other things. See slide 15 for a partial list of these possibilities. Another interesting generalization about wh- in situ languages is the observation by Emmon Bach that languages with clause-final complementizers are also wh- in situ languages.

Class 3: April 26

Topic: Universals of language: the VO/OV contrast
Reading: Dryer, Matthew. 1992. The Greenbergian Word Order Correlations. Language 68:81--138.
Additional reading: Chomsky, Noam. 1959. A review of B.F. Skinner's Verbal Behavior (optional).
Slides: download here.

Quiz (slide 1): this language (Basque) is clearly an ergative language, as you can see many ERG(ative) and ABS(olutive) morphemes. It is also (mostly) an agglutinative language, as each morpheme corresponds to a single piece of grammatical information. Finally, it is also a verb final language (that is, either SOV or OSV ---the right answer is SOV, but you can't tell this from the text), as all the sentences have the verb in the last position.

Summary of the class: For a detailed description of the languages of America and Oceania, please refer to the relevant chapters of Pereltsvaig 2012. In previous weeks, we have seen that there are a number of large language families (e.g., Austronesian, Papua-New Guinea, Indo-European, etc.). The question is whether this is the upper limit of language classification. Is there a larger family that encompasses all or most of these? Some people believe so, but the evidence remains scarce (some would say even non-existent). For instance, Illyc-Svityc and Dolgopolsky have proposed the Nostratic family, which comprises many of the language families of Europe, Asia, Africa, and North America. Similarly, Greenberg has proposed the Eurasiatic family, which partially overlaps with the Nostratic family. The problem with these hypotheses is that they are based on ideas about sound change and grammar change that have to be extrapolated beyond what people are used to. So, for the purposes of this course, we will treat Afro-Asiatic, Indo-European, and the rest as the largest families.

While it is difficult to go beyond these families, many people accept that all languages in the world have the same underlying structure. This idea comes partly from psycholinguistic studies (see, for instance, Neil Smith's work on language savants and the learning of artificial languages), and partly from typological studies. With respect to the latter, and starting with the work of Greenberg, it became evident that many properties of language are invariant across languages and language families, even totally unrelated ones. These are what we call universals and tendencies, which can be cross-classified with respect to whether they are implicational or non-implicational. See slides 6 through 14 for explanation, as well as the summary from Class 1. Although Greenberg was the first one to provide a list of generalizations, later researchers have uncovered many more ---see slides 15 to 17 for some examples, as well as Matthew Dryer's article "The Greenbergian Word Order Correlations". This article is interesting in that it discusses how certain word order correlations carry over from one domain to others. For instance, many languages with Verb-Object word order also have Noun-Relative Clause, Noun-Genitive, Verb-Prepositional Phrase, and so on (see slide 20). On the other hand, many languages with Object-Verb order have the opposite orderings, i.e., Relative Clause-Noun, Genitive-Noun, Prepositional Phrase-Verb, etc. There are exceptions, but these correlations hold over a much larger percentage of languages than we would expect from chance alone.

There are two major kinds of attempts at explaining why these correlations hold ---namely, the formalist and the functionalist theories. The formalist theories, developed mainly by Chomsky and his colleagues and students, treat language as a formal system like mathematics or computer programming. Therefore, these correlations can be stated as rules over an abstract structure that (by hypothesis) underlies all the individual languages. This abstract structure is called Universal Grammar, and here terminology gets slightly confusing. On the one hand, there is Universal Grammar as a concept (i.e., the idea that all languages come from the same source); on the other hand, there is Universal Grammar as a specific theory of this hypothesis (for instance, Chomsky's theory, although there are more). For today, we will concentrate on Chomsky's theory of Universal Grammar, which is called Principles and Parameters (and, more recently, Minimalism). "Principles" are parts of language that are invariant, whereas "parameters" are the parts that can vary among languages (typically only in a limited number of ways). For instance, the word order correlations are reduced to a "Head directionality parameter", which states that some languages order heads (Verbs, Nouns...) before complements (PPs, Relative Clauses), whereas other languages do it the other way around.
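
The logic of a parameter can be illustrated with a toy example: a single binary setting fixes the order of every head-complement pair at once, which is why the correlations come out as a package. This is a deliberately simplified sketch, not an implementation of Principles and Parameters; the pair list and the function names are assumptions for illustration.

```python
from typing import List, Tuple

# Head-complement pairs mentioned in the text; the labels are plain strings.
PAIRS: List[Tuple[str, str]] = [
    ("Verb", "Object"),
    ("Noun", "Relative Clause"),
    ("Verb", "Prepositional Phrase"),
]


def linearize(head: str, complement: str, head_initial: bool) -> str:
    """Order a head and its complement according to the parameter setting."""
    return f"{head} {complement}" if head_initial else f"{complement} {head}"


def predicted_orders(head_initial: bool) -> List[str]:
    """All pairs are linearized by the same setting, hence the correlations."""
    return [linearize(h, c, head_initial) for h, c in PAIRS]


print(predicted_orders(head_initial=True))   # VO-type language
print(predicted_orders(head_initial=False))  # OV-type language
```

Since the actual correlations are tendencies with exceptions, a single categorical switch like this one is too strong as it stands.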

In contrast, functionalist theories don't necessarily require that there is an underlying structure to all languages ---all that they require is that there is an underlying structure to all of human cognition, which is a much wider domain. For instance, John Hawkins has argued that the word order correlations don't reflect a Head Directionality Parameter; rather, they reflect properties of human sentence comprehension. He argues that some word orders are easier to comprehend than others, and the correlations that we see correspond to the orders that are the easiest to process.

Class 2: April 19

Topic: Language classification criteria; languages of Africa and Asia.
Reading: Pereltsvaig, Asya. 2012. Languages of the World. Cambridge: Cambridge University Press (chapters 5-7).
Slides: download here.

Summary of the class:

1. Language classification criteria

Languages can be classified into different groups according to a number of different factors. It is important that you become familiar with some of this terminology, so that you can understand what someone means when they say that a certain language is (for example), "dominant SOV, free constituent order, split ergative".

1.1 Word order criteria

Last week, we saw that languages can be classified according to their dominant word order as SOV, SVO, etc. As important as knowing the dominant order of a language is knowing to what extent it allows deviations from that order. In this respect we can establish four categories.

  1. Rigid order: languages of this class include English and Mandarin. They are characterized by the fact that it is really difficult to deviate from the established word order. For instance, English allows some degree of object fronting (in sentences such as "this cake, I like"), but it is not as productive as similar manipulations in German.

  2. Semi-rigid (or semi-free) order: these are languages like German or Spanish. Order can deviate to some extent from the default one, but some constraints need to be obeyed no matter what. For example, German main clauses always require the verb to be in the second position, even though there are no restrictions on what can be in the first position.

  3. Free constituent (or phrase) order: in languages like these (e.g., Basque or Russian), each of the six logically possible combinations of Subject, Verb, and Object is possible. Each different order typically has a different kind of emphasis, but the truth-conditional meaning remains the same. Similarly, different word orders don't affect any of the inflectional morphology. As an example, here are the six possible orders of the Basque sentence the boy has read the book (where the verbal complex irakurri zuen forms an indivisible unit).

    (1) a. Mutilak  liburua   irakurri zuen
           boy.erg  book.abs  read     has
        b. Mutilak irakurri zuen liburua
        c. Irakurri zuen mutilak liburua
        d. Irakurri zuen liburua mutilak
        e. Liburua irakurri zuen mutilak
        f. Liburua mutilak irakurri zuen

  4. Free word order: in languages like Basque or Russian, constituents can be more or less freely rearranged with each other, but they (usually) cannot be split. That's why we speak of free constituent order. In contrast, many Australian aboriginal languages display true free word order, where constituents can be routinely split. For instance, the language Warlpiri permits splitting the subject big dog into two separate parts.

    (2) Maliki-rli  ji-yarlku-rnu  wiri-ngki
        dog.erg     1sg.bite.past  big.erg
        "The big dog bit me"

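The six orders in the Basque example (1) are simply the permutations of three constituents, with the verbal complex treated as one indivisible unit. The following sketch enumerates them with Python's standard itertools module; the variable and function names are made up for this example.

```python
from itertools import permutations
from typing import List

# Three constituents; "irakurri zuen" is kept together as a single unit.
constituents = ["Mutilak", "liburua", "irakurri zuen"]


def constituent_orders(parts: List[str]) -> List[str]:
    """Return every linear order of the given constituents."""
    return [" ".join(order) for order in permutations(parts)]


orders = constituent_orders(constituents)
for order in orders:
    print(order)
print(len(orders), "orders")
```

Because the units being permuted are whole constituents, no order ever separates irakurri from zuen. A language with true free word order, like Warlpiri, would additionally allow the parts of a single constituent to be separated, as in (2).
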
1.2 Morphological criteria

Languages can be classified into two different groups with respect to how they realize functional information (tense, person, number...).
  1. Isolating/analytic languages: isolating languages have few bound morphemes (technically, they have a low morpheme-to-word ratio). Tense, agreement, and other functional information is conveyed through free morphemes. A good example of an isolating language is Mandarin.

    (3) mingtian  wo  de    pengyou  hui     gei   wo  zuo   yi   ge          shengri   dangao.
        tomorrow  I   poss  friend   future  give  I   make  one  classifier  birthday  cake
        "Tomorrow, my friends will make me a birthday cake"

    Isolating languages are also analytic, meaning that word order is used to mark grammatical relationships between constituents (as opposed to, e.g., case markers). Note that isolation and analyticity are logically independent notions, even though many languages might exhibit them simultaneously.

  2. Synthetic languages: of which there are three kinds.

    1. Agglutinating languages: in these languages, there is a dominant one-to-one ratio between inflectional morphemes and grammatical information ---i.e., any given morpheme expresses a single grammatical property, like tense, aspect, gender, number, case... The paradigmatic agglutinating language is Turkish (those interested in Turkish morphology or agglutinating morphology in general should have a look at Jorge Hankamer's work, especially "Morphological parsing and the lexicon").

      (4) kedi-m-in    ayak-kab-lar-i     yok-tu
          cat-mine-of  foot-cover-pl-his  not.was
          "My cat had no shoes"

    2. Inflectional (or fusional) languages: in these languages, a single morpheme can express multiple pieces of grammatical information. For instance, in the Latin example below, the suffix -i expresses (i) gender, (ii) number, and (iii) case:

      (5) bon-i               vir-i
          good-[nom.masc.pl]  man-[nom.masc.pl]
          "Good men"

    3. Polysynthetic languages: in these languages, free morphemes can be freely incorporated inside other words, forming very large morphological units; so large, in fact, that often entire sentences can be expressed with one word. Many of the Native North American languages are polysynthetic, and people interested in these can consult Mark Baker's work. Here is an example from Yup'ik Inuit, a language from Far Eastern Russia.

      (6) tuntu-ssur-qatar-ni-ksaite-ngqiggte-uq.
      reindeer-hunt-future-say-not-again-3.sg-indicative
      "He had not yet said again that he was going to hunt reindeer"

1.3 Case/agreement criteria

Here we divide languages into the accusative and ergative groups, with certain additional nuances.
  1. Accusative languages: these are the kind of languages you are already used to, namely, European languages. In these languages, subjects are marked with nominative case and objects are marked with accusative case, irrespective of whether the verb is transitive or intransitive.

    (7) a. De-r Mann nimmt de-n Stuhl.
    b. De-r Mann kam.

  2. Ergative languages: in these languages, case distribution works in a different way: absolutive is assigned to the subjects of intransitive verbs and the objects of transitive verbs, whereas ergative is assigned to the subjects of transitive verbs. About 25% of the world's languages are ergative, and the example below comes from Basque (ergative is expressed with the -k suffix, whereas absolutive has a zero suffix, notated with an underscore).

    (8) Gizona-k  aulkia-_   hartu  zuen
        man.erg   chair.abs  take   has
        "The man took the chair"
    (9) Gizona-_  etorri  zen
        man.abs   come    was
        "The man came"

    A different way of putting it is that accusative languages group transitive and intransitive subjects together, and treat objects as a different case; in contrast, ergative languages group intransitive subjects and transitive objects together, and treat transitive subjects as a different case. Ergative languages can be further subdivided into morphologically ergative and syntactically ergative, but we don't have to worry about this distinction just yet.

  3. Split languages: usually referred to as split ergative languages, here we find a combination of both the accusative and the ergative pattern. Which one is used in any given sentence depends on some apparently unrelated factor, such as tense or aspect. For instance, in Hindi, sentences with perfective aspect have an ergative pattern, but sentences with a non-perfective aspect have an accusative pattern.

    (10) larka     kitab  xaridta           hai
         boy.nom   book   buy-imperfective  present
         "The boy buys a book"
    (11) larka-ne  kitab  xaridi
         boy.erg   book   buy-perfective
         "The boy bought a book"

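The three alignment patterns described in this section can be summarized as a mapping from argument roles to cases. The sketch below is a simplification for study purposes: the role labels S (intransitive subject), A (transitive subject), and O (transitive object) are a common shorthand that these notes do not use themselves, and the aspect-based split is only loosely modelled on the Hindi description above.

```python
def case_for(role: str, alignment: str) -> str:
    """Case assigned to a role under a given alignment.

    role:      'S' (intransitive subject), 'A' (transitive subject),
               'O' (transitive object)
    alignment: 'accusative' or 'ergative'
    """
    if role not in ("S", "A", "O"):
        raise ValueError(f"unknown role: {role}")
    if alignment == "accusative":
        # S and A pattern together; O is the odd one out.
        return "accusative" if role == "O" else "nominative"
    if alignment == "ergative":
        # S and O pattern together; A is the odd one out.
        return "ergative" if role == "A" else "absolutive"
    raise ValueError(f"unknown alignment: {alignment}")


def split_case_for(role: str, perfective: bool) -> str:
    """Toy split system: ergative pattern in the perfective, accusative otherwise."""
    return case_for(role, "ergative" if perfective else "accusative")


for role in ("S", "A", "O"):
    print(role, case_for(role, "accusative"), case_for(role, "ergative"))
```
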
2. Languages of Africa and Asia

Please refer to this week's reading for details.



Class 1: April 12

Topic: General introduction to typology; the Indo-European family.
Reading: Pereltsvaig, Asya. 2012. Languages of the World. Cambridge: Cambridge University Press (chapters 1-3).
Slides: download here.

Summary of the class

1. Languages of the world

There are many languages in the world. The most comprehensive list is the one you can find at ethnologue.org, which advertises itself as "an encyclopedic reference work cataloging all of the world's 6 909 known living languages" (see slide 5 for the geographic distribution of these languages). There are a few things to note about this slogan. The first one is that it makes reference to known languages. There might still be languages out there that we don't know about, especially in underexplored areas (e.g., the Amazonian rainforest, where a number of hunter-gatherer tribes still live). Second, it makes reference to living languages. The classification of a language as "living" or "dead" (and the intermediate categories) does not depend on language size: for instance, Faroese only has about 30,000 speakers (the population of the Faroe Islands), but it is as living as Spanish, which has over 1,000 times as many speakers. Rather, the classification depends on how many speakers will be acquiring the language in the foreseeable future.

  1. Active languages: the number of native speakers remains constant or increases with each new generation. It is very likely that, 100 years from now, there will be as many speakers as there are today.
  2. Endangered languages: the number of native speakers decreases at a more or less constant rate with each new generation. If the trend is not reversed, very few or no native speakers will be left in 100 years.
  3. Moribund languages: the only native speakers left are adults, i.e., no children are currently acquiring the language as their mother tongue. When the current generation of speakers dies, no native speakers will remain.
  4. Dead languages: there are no native speakers left. Nonetheless, the language might still be used non-natively in certain environments; for instance, Latin is a dead language by this definition, but it is still used as the official language of the Catholic Church.
  5. Revitalized languages: through coordinated efforts, it is possible to bring a dead language back to life. So far, this has only happened with Hebrew. "Revitalization" can also refer to the process of reversing the decrease in the number of speakers that characterizes an endangered language.
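
The first four categories amount to a decision procedure, which can be sketched in Python as follows. This is a simplified illustration, not an established classification tool: the function name, the parameters, and the trend labels are assumptions made for this example, and revitalization is left out because it describes a language's history rather than its current state.

```python
def classify_vitality(native_speakers: int, children_acquiring: bool, trend: str) -> str:
    """Classify a language by the criteria listed above (simplified).

    trend is a hypothetical label: "increasing", "stable", or "decreasing".
    """
    if native_speakers == 0:
        return "dead"          # no native speakers left
    if not children_acquiring:
        return "moribund"      # only adult native speakers remain
    if trend == "decreasing":
        return "endangered"    # fewer speakers with each generation
    return "active"            # constant or growing number of speakers


# Size alone does not decide the category: a small language can be active.
print(classify_vitality(30_000, True, "stable"))
print(classify_vitality(5_000_000, True, "decreasing"))
```

Note that the number of speakers only matters when it is zero. Everything else depends on transmission to the next generation.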

Finally, there is the tricky issue of delimiting what a "language" is. A definition that you might hear every now and then is that two speech communities have different languages if they can't understand each other; otherwise, we talk about two different dialects of the same language. This is problematic, though. For instance, Swedish and Norwegian are very similar to each other, and Swedish and Norwegian speakers can understand each other without too much trouble, yet they are considered two separate languages. In contrast, Mandarin and Cantonese are occasionally considered dialects of Chinese, but the differences between the two of them are as large as the differences between, e.g., Spanish and Italian. The conclusion is that the difference between a "language" and a "dialect" depends on social and political factors, rather than purely linguistic ones.

An interesting illustration of this fact is the case of Serbo-Croatian, which used to be the language of the former Yugoslavia. After the Yugoslav Wars of the 1990s, Serbo-Croatian was split into four different languages, corresponding to four of the independent republics that emerged from Yugoslavia: Serbian, Croatian, Bosnian, and Montenegrin. Obviously, the languages themselves didn't change. What changed was how we categorized them as dialects or languages on the basis of political factors.

2. Typology: establishing genetic relationships

Typology is the part of linguistics that studies how all of these languages are grouped. For instance, we know that German is closely related to Dutch, and that both German and Dutch are a bit more distantly related to Swedish, which in turn is even more distantly related to Spanish and Italian. Each of the relevant groupings of languages is called a family; in turn, two or more families can be grouped to form a larger family. For instance, German and Dutch belong to the West Germanic family, which in turn is part of the Germanic family, which in turn is part of the Indo-European family. Note that some families only consist of one language: these are called isolated languages. For instance, the Ugric family (part of Finno-Ugric) only contains Hungarian, and we say that Hungarian is an isolated language within the Finno-Ugric family. More extreme are cases like Basque, which cannot be included in any other known family. You are strongly advised to check ethnologue.org for more information on both language families and isolated languages.

Note that most language families only consist of a few languages. In fact, there are only 9 language families with more than 100 languages (see slide 13). There is no correlation between belonging to a large family and being a widely spoken language. For instance, the 477 languages of the Trans-New-Guinea family (the 3rd largest) only have a few thousand speakers each; in contrast, the Japonic family consists of only 12 languages, but one of them is Japanese, the 5th most spoken language in the world.
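
The nesting of families described above is naturally a tree. The Python sketch below is a minimal illustration that encodes only the handful of languages mentioned in this section. The labels "North Germanic" and "Romance" are added here to give Swedish, Spanish, and Italian a place in the tree, and the function names are made up for this example.

```python
# A tiny, partial family tree: inner nodes are dicts, leaves are lists of languages.
FAMILIES = {
    "Indo-European": {
        "Germanic": {
            "West Germanic": ["German", "Dutch"],
            "North Germanic": ["Swedish"],
        },
        "Romance": ["Spanish", "Italian"],
    }
}


def path_to(language, tree, trail=()):
    """Return the chain of families leading to a language, or None if absent."""
    for name, subtree in tree.items():
        here = trail + (name,)
        if isinstance(subtree, dict):
            found = path_to(language, subtree, here)
            if found:
                return found
        elif language in subtree:
            return here
    return None


def closest_common_family(a, b):
    """Smallest family in the tree containing both languages (both must be listed)."""
    common = None
    for x, y in zip(path_to(a, FAMILIES), path_to(b, FAMILIES)):
        if x != y:
            break
        common = x
    return common


print(closest_common_family("German", "Dutch"))     # West Germanic
print(closest_common_family("German", "Swedish"))   # Germanic
print(closest_common_family("Swedish", "Spanish"))  # Indo-European
```

The deeper the closest common family sits in the tree, the more closely related the two languages are, which mirrors the German, Dutch, Swedish, and Spanish comparison in the text.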

How do we know that two languages belong to the same family? We compare various aspects of these languages (e.g., vocabulary, syntax, phonology...) and try to figure out if there are enough commonalities to suggest a common ancestor. This is a relatively old method: already in the 16th century, the French scholar Joseph Scaliger noted that the Southern European languages had a word for God different from the one in the Northern European languages (see slide 15). From this, he could infer that Italian, Spanish, French, and Portuguese could be grouped together and apart from English, German, Dutch, and Swedish. James Parsons undertook a similar project (but on a larger scale) in the 18th century: he compared the numeral words in several languages, and found them to be strikingly similar in many languages of Europe and Southern Asia (see slides 17 and 18). From this, he hypothesized the existence of the Indo-European family, which extended from Portugal all the way to the Eastern part of India (the reason why he used numerals is that they are more resistant than other kinds of words to being borrowed from neighboring, unrelated languages). This hypothesis was reinforced later on by the studies of Sir William Jones on the roots and inflectional morphemes of various languages. Very eloquently, he wrote that these items exhibit:
"a stronger affinity, both in the roots of verbs and in the forms of grammar, than could possibly have been produced by accident: so strong indeed, that no philologer could examine [them] without believing them to have sprung from some common source, which, perhaps, no longer exists."
The method of comparing the lexicons of different languages was formalized in the 20th century with the work of Morris Swadesh. Swadesh identified a small set of words that strongly resist being borrowed from other languages: these are words referring to very basic everyday concepts like man, woman, eat, sleep, pronouns, common animals and plants, etc. (see slides 22 and 23). These form what have come to be called the Swadesh lists (there is more than one, but the variations are relatively minor).

3. Greenberg's universals

The most famous typologist of the 20th century, however, is Joseph Greenberg. His contribution to typology is the formulation of universals, i.e., properties that hold of all or nearly all of the languages of the world, often irrespective of genetic relations. Here are some examples of universals, from his landmark article "Some universals of grammar with particular reference to the order of meaningful elements":

In reality, Greenberg formulated both universals and tendencies, the difference being that tendencies hold of many languages, but not all (as in the examples above, he uses phrases like "almost always", or "with overwhelmingly greater than chance frequency"). Additionally, each universal or tendency can be implicational or not. An implicational one has the form "if a language has property x, then it (almost always) also has property y", i.e., it establishes a correlation between two properties. Here are some examples, some more trivial than others.

Note that most of these universals/tendencies make reference to the default word order. Contrary to what you might expect, the six logically possible orderings of Subject, Verb, and Object are not equally distributed; rather, some are much more frequent than others. Here are the percentages of languages with each order.

This classification is not totally exact, because some languages have more than one default order (e.g., German and Dutch are SVO in main clauses and SOV in subordinate clauses), but it should give you an idea. As you can see, verb-initial languages are rare, whereas object-initial languages are almost unheard of (the standard claim is that there are six OVS languages and nine OSV ones, at least that we know of; most of them are spoken by small tribal communities in the Amazonian rainforest). Some theoretical linguists have argued that these orders are so rare because they require a much more complicated underlying syntax; however, this is a topic for a different course.
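
The implicational format "if a language has property x, then it also has property y" can be tested mechanically against a language sample. The Python sketch below shows one way to do this. The sample is entirely invented (the languages are placeholders, not real data), and the function name and property labels are assumptions made for this example.

```python
def check_implication(languages, antecedent, consequent):
    """Test 'if antecedent then consequent' against a sample.

    Returns the number of supporting languages and the names of exceptions.
    Languages lacking the antecedent are irrelevant to the claim.
    """
    relevant = [lang for lang in languages if antecedent in lang["properties"]]
    exceptions = [lang["name"] for lang in relevant
                  if consequent not in lang["properties"]]
    return len(relevant) - len(exceptions), exceptions


# Hypothetical sample, invented purely for illustration.
SAMPLE = [
    {"name": "Language A", "properties": {"VSO", "prepositions"}},
    {"name": "Language B", "properties": {"SOV", "postpositions"}},
    {"name": "Language C", "properties": {"VSO", "prepositions"}},
    {"name": "Language D", "properties": {"SVO", "prepositions"}},
]

supporting, exceptions = check_implication(SAMPLE, "VSO", "prepositions")
print(supporting, exceptions)
```

An empty list of exceptions is what a universal predicts, whereas a short list of exceptions next to many supporting languages is what a tendency looks like.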

4. A trip into the Indo-European family

To illustrate the way in which languages get classified into families, here is a partial tree of the Indo-European family. So that this section doesn't get impossibly long, I only consider the Germanic family in detail (you might be surprised that the relatively small geographical area of Northern Europe can contain so many languages). Names of individual languages are given in italics; numbers in parentheses indicate the number of languages in each unexpanded branch.