What is a language?

Actually, there is perhaps one question even scarier for a linguist (professional or amateur) than “How many languages do you speak?”

It is, simply: what is a language?

As noted in the above-linked article, in the same way astronomers cannot really define an apparently simple term like “planet”, linguists cannot really define an apparently simple term like “language”.

In any attempt to answer it, it is worth re-emphasising a core point at the outset. When we refer to “English”, usually (particularly when we refer to the language in teaching or administration) we in fact mean “the standard dialect of English, derived from the variety spoken by the educated classes in the Oxford-Cambridge-London triangle at around the time of the invention of printing”. French has a similar story centred on the area around the Sorbonne; Spanish on Salamanca; Portuguese on Coimbra; Italian a notably complex one, reaching a little further back in time but based around Florence; German an even more complex one involving Frankfurt (written) and Hanover (spoken). Whatever the exact history, in most cases we are referring to an agreed “standard” written variety and how that variety is reproduced in contemporary speech. There is, notably, a degree of artificiality to this, and yet no “standard” could survive if it did not represent a variety of the language widely understood and accepted by its users.

Then, in most of Europe (and most places which speak languages of European origin) and at least parts of the Indian Subcontinent, it is worth noting that nearly all “languages” derive from a common source, most likely somewhere in modern Ukraine around 4000-5000 years ago. At that time, in that location, there was a tribe which spoke what we now refer to as “Indo-European”, which was of course never “standardised”. As that tribe spread out, notably westward (from a European point of view) and southward (from an Indian one), its language dispersed. As speakers entered new areas, they had to describe different things (new types of tree, sorts of landscape, or even shades of colour, for example); and they came across other tribes from whom they borrowed words and who influenced their grammar and pronunciation. The real issue here is that the difference between languages is not just one of space (notably in terms of modern mutual intelligibility), but also of time. At some stage Indo-Europeans were speaking a single language, and later they were speaking Latin, Ancient Greek, and Sanskrit; later still Italian/Romanian/French/Spanish/Portuguese, Modern Greek and Hindustani.

Additionally, at very different times we find the first written examples of each branch of the tree, and then the first published examples – all of which may have an impact on our perception of what is and is not a language. The issue here is that our instinctive Western bias towards defining “language” very closely alongside “standard written variety” is problematic. Did “Latin” only exist once it was written? Did “German” only exist once it was published? Do Amazonian tribes with no concept of writing not speak “languages”? In future we may find generations rejecting any language which does not have at least 1,000,000 Wikipedia articles as evidence of its existence!

We also have to consider further the distinction between written and spoken varieties. Clearly, they are connected. However, they are also differentiated in ways which to many of us are simply intuitive. How often do you use the word “therefore” in daily speech, for example? If we take this further, we find that a majority of people globally do not only switch between spoken and written, or formal and informal, registers, but actually between languages. A rural dweller in Morocco, for example, may well speak Berber at home, Arabic at the market, and French in education and government dealings. Does that person speak three languages, when none of them is used in all contexts? Indeed, if Berber is never used for commerce, education or administration (and is never written), is it a “language” at all? And then, if that person meets a trader from Syria who also purports to speak “Arabic” but they cannot understand each other at all, who is speaking what, and are they different “languages”?

Most of the terminology around this issue is in fact borrowed from German. Abstand (“distance”) refers to language differentiation by linguistic distance (“Irish” is clearly linguistically different from “English” but not from “Gaelic”; “English” is clearly different from “Irish” but not from “Scots”); Ausbau (“development” or “building out”) is the notion of how far a language has been deliberately developed (not just towards written standards, though that is an obvious issue); Dachsprache (“roof language”) is the standard variety which sits as a “roof” over related varieties, and so essentially reflects a person’s sense of which language they are speaking (or writing) regardless of context (so a northwestern German farmer may linguistically speak something closer to Standard Dutch than Standard German at home, but if he regards himself as speaking German then, arguably at least, by definition he is); and Halbsprache (“half-language”) is a term used for a linguistic variety which is not fully developed as a written standard language of a community or communities, but has some sense of development and commonality (perhaps, for example, in literature) which goes beyond a perfectly regular non-standard regional dialect or similar.

It is here that we find “language” status, in the West at least, is an intensely political thing – the old maxim is that “a language is a dialect with an army and a navy”. At the time of the French Revolution, Parisian French would have been easily understood by only a minority of the population; many people spoke completely different languages (from Breton to Dutch), and most spoke a different variety descended from Latin. At the time of Italian unification it was openly admitted that “We have created Italy; now we have to create Italians”. Of course, this political-linguistic emphasis can go the other way too – the successful revivals of Catalan and Welsh are tied, with different levels of connection and comfort, to nationalist/separatist political movements (as are many rather less successful ones). Countries such as Spain generally struggle with the challenge of having so many languages at different levels of development and with different levels of popular support.

What is the solution to all of this? I have no idea! However, I would suggest the best solution I have seen is a language pyramid:

English
Spanish Arabic French
Japanese Russian German Hindi Indonesian
Thai Swahili Polish Dutch Gujarati Korean Wolof
Kannada Zulu Irish Catalan Afrikaans Papiamento Belarusian Maori Icelandic

Here, we can see (if formatting allows!) that English has a unique status as the foremost language of global trade, knowledge and diplomacy. Even this presents challenges, however. How different are the varieties, and should we specify which one (American, British, or even a different non-native version) predominates? For how long has English had this unique status? Which language had it before, and how did it lose it?

At the next level, purely by way of example, I include three languages of unquestionable global reach and cultural relevance. That said, even here they have attained this status by different means. Spanish has it by weight of numbers; Arabic due to its religious role; and French due to its previous role as the high language of royal elites and global diplomacy. Some of these may not stand the test of time.

At the next level we have significant national languages, not only because they are spoken by a lot of people in globally relevant economies, but also because they have some degree of cultural reach (Pokémon, vodka, Vorsprung durch Technik, guru, nasi goreng etc.). Even here, we have some challenges. What exactly does Russian cover? Do we allow for Austrian German in any way? Is Hindi to be considered distinct from Urdu, and why? Is Indonesian to be considered alongside Malay, and does this affect its status?

At the next level we have significant national languages which perhaps do not have quite the same reach, or significant international trading languages in particular regions. These are quite distinct issues, and we are now touching on just how far our Western bias towards “Written Standards” takes us, versus the practical reality of trading and living in some form of “lingua franca” for hundreds of millions of people.

At the final level we have a great deal of variety: established regionally significant languages, national languages in restricted use but of historical significance, growing regional languages in large economies, languages in administrative use in regional powers, significant inter-regional trading languages, national languages whose status as distinct from other languages is disputed, national languages within nations, and linguistically significant national languages of small countries. Maybe these do not all belong at the same “level”, but they show a range of uses and challenges in terms of definition of a “language” and why it may (or may not) be so defined – globally, nationally, regionally; socially, politically, economically; never mind linguistically!

Of course, most of the world’s languages would not even make it onto the above pyramid. From tribal languages of restricted range to languages of uncertain status (Ulster Scots anyone?), the challenges only multiply below the pyramid! This is to say nothing of constructed languages such as Esperanto or Klingon, or indeed codes or systems which meet some of the common definitions of “language”.

We may, in practice, never be able to agree on the definition of a “language”. We should at least reach some agreement, however, on the complexities which surround the challenge of agreeing that definition!





