Historical linguistics (or comparative linguistics) is primarily the study of languages that are recognizably related through similarities in vocabulary, word formation, and syntax. It aims to classify the world's languages by their genetic affiliations and to trace their historical development. Most research to date has concentrated on the Indo-European languages.

Languages change over time. Eventually, they change so much that no recognizable similarity to the original remains. Estimates vary, but one plausible opinion is that if a group of Americans were sent to a distant galaxy, after 10,000 years their descendants would be speaking a language no more similar to English than to Chinese or Arabic.

Historical linguists construct family trees, an idea pioneered by the 19th-century historical linguist August Schleicher. The basis for the trees is the comparative method: languages presumed to be related are compared with one another, and, based on what is generally known about how languages can change, linguists reconstruct the best hypothesis about the nature of the common ancestor language from which the attested languages are descended.

Use of the comparative method is validated by its application to languages whose common ancestor is known. Thus, when the method is applied to the Romance languages (which include French, Spanish, Portuguese, Italian, and Romanian), the reconstructed common ancestor language comes out rather similar to Latin--not the classical Latin of Horace and Cicero, but something perhaps more akin to the Latin that must have been spoken in various dialects during the Dark Ages, following the breakup of the Roman Empire.

The comparative method can be used to reconstruct languages for which no written records exist, either because none were preserved or because the speakers were illiterate. Thus, the Germanic languages (which include German, Dutch, English, Norwegian, Swedish, Danish, Faroese, Icelandic, and the extinct Gothic) can be compared to reconstruct Proto-Germanic, a language that was probably contemporaneous with Latin and for which no records are preserved.

Germanic and Latin (more precisely, Proto-Italic, the ancestor of Latin and a few of its neighbors) are themselves related, being co-descended from Proto-Indo-European, spoken perhaps 5,000 years ago. Scholars have reconstructed Proto-Indo-European on the basis of data from its eleven daughter branches, which are: Germanic, Italic, Celtic, Greek, Baltic, Slavic, Albanian, Armenian, Indo-Iranian, and the two dead branches Tocharian and Anatolian.
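The comparative method rests on sound correspondences that recur regularly across many cognate sets, rather than on isolated lookalikes. The short Python sketch below illustrates the idea with a handful of well-known Latin and English cognates. It is only an illustration under simplifying assumptions: the initial consonants have been segmented by hand, Latin c is written as k, and the function names and the `min_count` threshold are hypothetical choices made for this example, not part of any established tool.

```python
from collections import Counter

# Latin/English cognate pairs, with the initial consonant of each word
# segmented by hand (an assumption of this sketch; Latin "c" is written "k").
# Layout: (latin_word, latin_initial, english_word, english_initial)
COGNATES = [
    ("pater", "p", "father", "f"),
    ("pes", "p", "foot", "f"),
    ("piscis", "p", "fish", "f"),
    ("tres", "t", "three", "th"),
    ("tu", "t", "thou", "th"),
    ("tenuis", "t", "thin", "th"),
    ("cornu", "k", "horn", "h"),
    ("centum", "k", "hundred", "h"),
    ("cor", "k", "heart", "h"),
]


def tabulate_correspondences(pairs):
    """Count how often each (Latin initial, English initial) pairing occurs."""
    return Counter((lat, eng) for _, lat, _, eng in pairs)


def regular_correspondences(pairs, min_count=2):
    """Keep only the pairings that recur at least min_count times.

    min_count is a hypothetical threshold chosen for illustration.
    """
    table = tabulate_correspondences(pairs)
    return {pair: n for pair, n in table.items() if n >= min_count}


if __name__ == "__main__":
    for (lat, eng), n in sorted(regular_correspondences(COGNATES).items()):
        print(f"Latin {lat} : English {eng}  ({n} cognate sets)")
```

In this toy data set each Latin initial lines up with a single English initial across several words. Recurring patterns of this kind, rather than any single matching pair, are what linguists treat as evidence of common descent.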

The comparative method allows us to distinguish true linguistic descent (that is, the passing of a language from parents to children, down through the generations) from resemblance due to cultural contact. For example, a large share of the vocabulary of Persian (Farsi) is taken from Arabic, as a result of the Arab conquest of Iran in the 7th century and much subsequent cultural contact. Yet Persian is Indo-European, being a member of the Indo-Iranian branch that also includes Sanskrit and many of the languages of modern India. The clue that Persian is Indo-European is that its core vocabulary generally has Indo-European cognates (as in madar 'mother'), and its essential grammatical elements are likewise Indo-European (as in bud 'was', which includes elements related to English "be" and the English past tense ending "-ed").

The comparative method has been successfully used to reconstruct some very large language families, notably Austronesian (which includes Hawaiian, Tagalog, Indonesian, and Malagasy) and Niger-Congo (the majority of the languages of modern Africa). Once the various changes in the daughter branches have been worked out, and a fair amount of the core vocabulary and grammar of the protolanguage is understood, scholars will quite generally agree that a genetic relationship has been proven.

Vastly more controversial are hypotheses about relatedness which are not supported by application of the comparative method. Scholars who attempt to probe deeper than the comparative method supports (for example, by trying to relate Indo-European to neighboring languages) are often accused of scholarly wishful thinking. The problem is that any two languages have a huge number of opportunities to resemble one another just by accident, so merely pointing out isolated resemblances has little evidentiary value. A famous example is the Persian word for "bad", which is pronounced (more or less) just like English "bad". It can be shown that the resemblance between these two words is completely accidental, and has nothing to do with the (rather remote) genetic connection between English and Persian. For further examples, see False cognate.
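The arithmetic behind this point can be sketched in a few lines of Python. The figures used here are assumptions chosen purely for illustration: `p_match` is a hypothetical probability that two unrelated words with the same meaning happen to sound alike, `n_meanings` is a hypothetical number of meanings compared, and the comparisons are assumed to be independent of one another.

```python
def expected_matches(p_match, n_meanings):
    """Expected number of chance lookalikes among n_meanings comparisons."""
    return p_match * n_meanings


def prob_at_least_one(p_match, n_meanings):
    """Probability of at least one chance lookalike.

    Assumes every comparison is independent and has the same probability
    p_match of producing an accidental resemblance.
    """
    return 1.0 - (1.0 - p_match) ** n_meanings


if __name__ == "__main__":
    # Hypothetical figures: a 1% chance per meaning, 1,000 meanings compared.
    p_match, n_meanings = 0.01, 1000
    print("expected chance matches:", expected_matches(p_match, n_meanings))
    print("P(at least one match):", prob_at_least_one(p_match, n_meanings))
```

Even with a small per-word probability, comparing a large vocabulary makes at least a few accidental matches close to certain under these assumptions, which is why a single pair such as Persian and English "bad" proves nothing.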

Since supporting distant genetic relationships is so difficult, and the methodology for finding and proving such relationships is not well established (in the way that the comparative method is), the field of locating remote relationships is riven with scholarly controversy. Nevertheless, the temptation to pursue remote relationships remains a powerful lure to many scholars--after all, Proto-Indo-European must have seemed a rather wild hypothesis to many when it was first proposed.

The ultimate in remote reconstruction is the recovery of a Proto-World language. Not all scholars believe that such a language necessarily existed. Moreover, it is difficult to reconcile Proto-World with what we know about prehistory. Pat Ryan and Joseph H. Greenberg have suggested that people coming out of northeast Africa around 50,000 BC spoke Proto-World. But that would conflict with the estimate, mentioned above, that no relationship remains recognizable after about 10,000 years. If all the languages of the world are recognizably related, this relationship must somehow have formed more recently. In addition, the proposal contradicts genetic evidence suggesting that the first humans migrated from the Horn of Africa into Yemen, rather than across the Sinai peninsula.

Dené-Caucasian has also been postulated to include Na-Dené (North America), Sino-Tibetan, Ket (Siberia), Burushaski (Pakistan), Caucasian (Chechen, Dagestan languages), and Basque. This language family is highly speculative.

The Nostratic hypothesis was proposed by the Danish linguist Holger Pedersen in 1903. The hypothesis claims that the Nostratic grouping includes such widely ranging language families as Indo-European, Afro-Asiatic, Uralic, Altaic, Sumerian, Elamo-Dravidian, and Kartvelian. Other proponents include different sets of languages. Some have speculated that the Nostratics were refugees from a Black Sea Flood of around 5600 BC, and some think this is the origin of Noah's Flood from the Bible. However, linguists have reached no firm conclusion about the validity of the Nostratic hypothesis.

  • Afro-Asiatic includes the Semitic languages, Egyptian, Berber, the Chadic languages spoken around the Sahel (such as Hausa), and the Cushitic languages of the Horn of Africa (such as Somali).
  • Uralic includes Finnish, Hungarian, and some languages around the Ural mountains such as Komi and Mansi.
  • Altaic, itself a disputed grouping, includes Turkish, Mongolian, Manchu, Kazakh, Uzbek, and other languages mostly spoken in Central Asia.
  • Elamo-Dravidian groups the extinct Elamite language of ancient southwestern Iran with the Dravidian languages, which today are mostly spoken in South India and which may have been spoken in the Indus Valley Civilization. The connection between Elamite and Dravidian is highly speculative, and Elamite has generally been considered unrelated to any other known language.