Western European Languages – A Reference Guide: book order details

Western European Languages – A Reference Guide is now available online in all markets.

“It occurs to me that there should be a copy of this book in every household.”

“An easy book to dip into as well as read more thoroughly.”

“An exploration of these languages not just as they are spoken in Europe but across the world.”

Average 5* reviews on Amazon, Google and Loot.

My new book, Western European Languages – A Reference Guide, is now available in all major markets online, including in:

UK & Ireland

United States

Canada

Japan

Germany

Netherlands

France

Italy

Spain

Sweden

Poland

Australia

South Africa

However, if you are in the UK, do contact me directly and I can save you the postage costs.

Eyebeetha… and the perils of “right” and “wrong”

I happened upon a podcast recently where there was a discussion about the “correct” or “right” pronunciation of Ibiza (discussed here three years ago).

The three gentlemen on the podcast had been told that their pronunciation, roughly “Eye-bee-tha”, was “wrong”. With regard to what was “wrong” about it, they discussed at length the final syllable, which they had been told was pronounced by Dutch people as “-tsa” rather than “-tha”. They nevertheless established that “Spanish people” do in fact pronounce it “-tha”, and thus determined that their pronunciation was “right”. Well… there were two key points for language learning and linguistics emerging from this…

First is the obvious issue from the title of this post: who determines what is “right” and “wrong”? It had not occurred to the three gentlemen (and nor should it, by the way, as none of them was claiming linguistic competence) that there is no particular “right” or “wrong” here. What they were determining to be “right”, i.e. that the final syllable should be pronounced “-tha”, was specifically the pronunciation in what we can reasonably refer to as “Standard European Spanish” (hence the reference to “Spanish people”). However, of course, not all “Spanish people” speak “Standard European Spanish”; there are Spanish people, notably in the southwest, who when speaking “Spanish” nevertheless pronounce the final syllable “-sa” (and, as a direct result given that is where the boats for the New World typically departed from, this means that the overwhelming majority of Spanish speakers globally pronounce the final syllable “-sa” too). And then, of course, not all “Spanish people” speak “Spanish” at all: Catalan speakers also pronounce the final syllable closer to “-sa”, and this is highly relevant because Ibiza is of course part of the Catalan-speaking area; the focus of the podcast was entirely on attempting to replicate the Spanish-language name of the place, Ibiza, without considering the Catalan-language name of the place, Eivissa; yet in “Ibiza” itself, given that it falls within the “Catalan Lands”, signage refers to Eivissa not Ibiza. To some extent, therefore, “Ibiza” is not even the “right” spelling, so why would anyone aiming for the “right” pronunciation use it as a basis? Put more simply, in linguistics there are “good” (essentially “consistent”) and “bad” (“inconsistent”) answers, but rarely a straightforward “right” or “wrong”.

Second, however, is in some ways the more relevant issue for language learners, namely “noticing”. Even if we accept that by “right” the objective is “Standard European Spanish”, the issue with the pronunciation “Eye-bee-tha” was categorically not the final syllable. The issue was their pronunciation of the first syllable, something which, interestingly, they did not consider at all. Such was the focus on the letter ‘z’, they completely missed the first ‘i’! In “Standard European Spanish”, or indeed any version of Spanish, ‘i’ is consistently pronounced similarly to typical English “ee”, thus to replicate Spanish pronunciation we are looking for “Ee-bee-tha” (or, as an absolutely acceptable alternative, “Ee-bee-sa”) but certainly not something beginning with a syllable pronounced similarly to the English word “eye”!

As an aside, we also have to accept that we are unlikely ever to be able to replicate the “native” pronunciation, whichever “natives” we may mean by that, absolutely faithfully. There is, after all, also a problem with the ‘b’ for English speakers, which is much less plosive (almost half way to a ‘v’) for most Spanish speakers; and indeed if we opt for final “-sa” this too requires the ‘s’ to be moved part of the way towards what English speakers would think of as an “-sh-” and also quietened (‘s’ in English is pronounced much more loudly than in Romance languages, something which causes learners considerable difficulty in both directions).

It so happens that the initial syllable of the Catalan name Eivissa is closer to English “eye” (though the location and duration of articulation is far from identical), but then we absolutely must have “-sa” as the final syllable and we also have to consider that although the ‘v’ is pronounced in Standard Catalan similarly to the Spanish ‘b’ in Ibiza, it is pronounced much more like English ‘v’ in the variety of Catalan spoken on the island itself.

All of this means that there are various “right” pronunciations not just for the name of the island, but actually for every single syllable of that name! However, it is important to “notice” that they cannot be randomly jumbled up – if we start with something close to “eye” then we should probably finish “-sa”, but if we are determined to finish “-tha” (noting that there is no reason for such determination in English) then the first syllable really needs to be “ee-”.

After all that, the main point is not to be too insistent on exactly what is “right” or “wrong”…!

Why “say” but “sez”?

One of the most difficult aspects of English for learners is the apparently random way in which letter combinations can be spelled the same way but pronounced differently.

The noun ‘day’, for example, has the straightforward plural ‘days’. The verb ‘pay’ has the straightforward third person present indicative form ‘pays’ and the straightforward preterite and past participle form ‘paid’ (pronounced as if ‘payed’, and thus in speech perfectly regular). The verb ‘stay’ likewise becomes ‘stays’ and ‘stayed’; in this case even the spelling remains completely consistent, and the sound of the ending is entirely aligned (i.e. ‘stayed’ rhymes with ‘paid’ and the vowel sound is the same as in ‘stay’ or ‘pay’).

So why, then, does this not happen with the verb ‘say’? The forms are written as with ‘pay’: third person singular indicative ‘says’ and past ‘said’. Surely these too should be pronounced to rhyme with ‘pays’ and ‘paid’ or even, for that matter, ‘stays’ and ‘stayed’.

Here we run into a number of issues with the whole concept of “standard” languages and language change.

Firstly, when we say that ‘said’ rhymes neither with ‘paid’ nor ‘stayed’, we are thinking of what we consider the “standard” language – in fairness, this is the one taught to foreigners. In fact, in many dialects, ‘says’ is pronounced as if it were simply ‘say’ followed by the letter -s, and ‘said’ is pronounced as if it were ‘say’ followed by the letter -d; this is common in parts of the north of England, for example.

Secondly, in language change we do have tendencies but few absolute rules; a sound shift may occur fairly regularly, but there will always be the odd exception – perhaps because of the “phonological environment” (a fancy way of saying the sounds around the sound which shifts) or because of the regularity with which a particular word is used. This latter may apply a little here: the verb ‘say’ is more common than the verbs ‘pay’ or ‘stay’, for example.

Fundamentally, what has happened is that the -ay ending of the base verb (sometimes represented in the past form as -ai-) was once pronounced even in the south of England by educated speakers much as it still is by many speakers in the north – as a higher long vowel. When this shifted downwards in the mouth, with the less common verbs ‘pay’ and ‘stay’ (and with the noun ‘day’) the derived forms shifted in the same way – leaving the modern ‘pays’, ‘stays’ and ‘days’ all rhyming across the south of England (and thus in the “standard” taught to foreigners) with the “newer” pronunciation. Yet the shift did not occur in the more commonly used derived forms of ‘say’; ‘says’ and ‘said’ in fact retained their older pronunciation, and then subsequently exhibited a different shift as the vowel shortened (as noted, since the shift did not occur in many northern English dialects in the first place, the derived forms did not need to shift either, so in those dialects they remain aligned).

But that brings us to another point: in fact, in some dialects the division between ‘say’ on one hand and the derived forms ‘says’ and ‘said’ on the other did occur even with the other words. In fact in Northern Ireland, even in fairly educated speech, you will hear ‘days’ to rhyme with ‘says’ with a distinct vowel sound from that in ‘day’ to rhyme with ‘say’ (or perhaps with a longer vowel than in ‘says’, more like the pronunciation in the north of England); likewise ‘paid’ will be distinct from ‘pay’ (again, though, quite often retaining a slightly longer and in fact more conservative vowel than in “standard” ‘said’), and potentially even the vowel of ‘stayed’ will be distinguished from that in ‘stay’.

For most people this is a relatively minor curiosity, but it is exactly this sort of thing – with the variation in different dialects – which enables us, in the absence of written evidence, to reconstruct languages back through time.

Even in Pompeii, for example, we see graffiti including the word veces ‘times, occasions’, which was no doubt an accurate depiction of the current pronunciation at the time but was in fact a non-classical (or “non-standard”) form, as Classical Latin had vices; yet modern “Standard” Spanish does indeed have veces. This is why it is so important to look beyond the “standard” for solutions to linguistic curiosities – and perhaps even an echo of generations long past.

War in the East – and priorities

Almost exactly a decade ago I wrote this and, with apologies to one regular correspondent, we need to talk about it a lot more.

Some time before that, in 1997, Baz Luhrmann released a single entitled “Wear Sunscreen”, a list of advice from fortysomethings (which I now just about still am) to twentysomethings (which I became that year). It contained the line “The real troubles in your life are apt to be the things that never crossed your troubled mind; the kind that blindsides you at 4pm some idle Tuesday”.

So it is with geopolitics, and indeed any politics. While the UK Government desperately, for reasons known only unto itself, battles with courts for the right to fly people to Rwanda and the United States embroils itself in a court case about hush money against someone who should long ago have been tried for sedition, the world is becoming increasingly turbulent. Unfortunately, hoping this turbulence will go away is not really a strategy. Nor, in fact, is merely hoping one side will magically “win”.

The Middle East, sadly, was always a tinder box. However, in 2023, there were more terrorist deaths in Burkina Faso than in any other country. Burkina Faso, on some rankings literally the poorest country on the planet, is in West Africa, a region of (nominally) independent states until recently still dominated in practice economically and militarily by France. Angered by what many see as neo-colonialism, some have now turned away from the French for protection – choosing instead the Russians.

Further east, the Sudan region continues down the road to chaos. There are significant interests there for China too, as well as for the West (never mind the people who actually live there). “Interventions” rarely have much to do with helping out locals, but they are significant in geopolitical terms. Nearby Djibouti now hosts military units from a whole host of countries; neighbouring Somalia is so unstable it regularly impedes global shipping (this is one reason the lead-in time for Japanese cars in Europe extended to nine months around a year ago, though it is now back down to around four).

It is probable, however, that our main issue is still in Europe. Ukraine’s counter-offensive in 2023 to try to split Russian-held territory in the southeast of the country in two essentially failed; this was partially evened up by the failure of Russia to secure full naval superiority in the Black Sea, but the truth (whether we want to listen to it or not) is Ukraine is not winning. Russia’s ultimate goal of linking up territory all the way around to Transnistria (where there were even polling stations for Russia’s recent Presidential Election, despite the fact that almost everyone recognises Transnistria is part of the Republic of Moldova) remains on track, despite the huge toll. It is quite possible that, by 2030, all of Transnistria, south and east Ukraine and Belarus (already a “Union State”) will be under Russian control.

It is then that we get to the real flashpoints. Russia would then inevitably try to destabilise the rest of the Republic of Moldova, in which as a response French troops are now stationed. Which is interesting, when you consider what is going on in West Africa (notably in Mali). French and Russian troops are beginning to displace each other or to stare at each other in at least two locations.

Throw in the potential for China to destabilise Taiwan all while the United States is distracted by another Trump presidency (a 50/50 chance currently), and it is close to inevitable that Russia will continue to reshape its new Soviet Union while all this is going on. At some stage, that will mean attention turns to the “Russian” minorities in the Baltic States, particularly Estonia and Latvia.

The three Baltic States are members of NATO and of the European Union, and here is the crux – both of those include mutual defence pacts, not just NATO. NATO is, if anything, the lesser of the two – until recently only three of its members (the US, the UK and Turkey) spent the supposedly required 2% of GDP on defence (remarkably, even France will not pass this figure until next year), leaving it much more poorly equipped to deal with threats than it would be if the requirement had been met since it was agreed thirty years ago. The main issue to be noted here, however, is that it is practically impossible to be a member of the EU without also, de facto at least, being allied to NATO – this is a particular point of note for Austria and Ireland, not least since Sweden and Finland have already made it the case de jure. On top of this, the United States is not the only country being tempted by nativism; leaving Brexit quite aside, Slovakia recently elected a hard man populist to match Hungary’s (and, for that matter, Russia’s).

As I wrote back in 2022, this is not to alarm people unnecessarily; it is, however, to suggest we may wish to rethink our priorities and recognise that, as so often, decisive early intervention is better than a desperate later cure.

For Enric, England and St George…

Next week is St George’s Day, not really a very big event in England but a huge one in Catalonia, where Sant Jordi is also the patron saint.

What better excuse to point out a strange “grammatical rule” which applies to both English and Catalan, but not to other languages?

Well, you’re reading now, so here goes…

In German, if someone asks you if you have time to help with something, you may say something like “Ja, ich habe viel Zeit” or “Nein, ich habe nicht viel Zeit” (where viel translates broadly as ‘much’); likewise in Dutch we might have “Ja, ik heb veel tijd” or “Nee, ik heb niet veel tijd”. That covers the main Continental West Germanic languages; deriving from Dutch, Afrikaans has an interesting double negation and an interesting common word borrowed from Malay (baie, cf. Malay/Indonesian banyak), but it too would be simply “Ja, ek het baie tyd” or “Nee, ek het nie baie tyd nie”. So far, so simple.

In Romance languages, Catalan’s siblings, this simplicity is (mercifully for learners) maintained: in Spanish we might say “sí, tengo mucho tiempo” or “no, no tengo mucho tiempo”; in Italian “sì, ho molto tempo” or “no, non ho molto tempo”; in French “oui, j’ai beaucoup de temps” or “non, je n’ai pas beaucoup de temps”. So far, still, so straightforward.

Yet in Catalan, in fact, we would most likely say “sí, tinc molt temps” but, and here is the catch, “no, no tinc gaire temps”. This complicates matters – the expression in the positive clause is “molt temps” but in the negative clause it is “gaire temps”.

Many English-speaking readers will be cursing Catalan: what kind of crazed language would cause such awkwardness? To which the answer is, of course, our own…

In English, though few of us notice it, the most likely translation of these clauses is “Yes, I have lots of time” or “No, I don’t have much time” – and there again, we have the distinction. While it is not held to be “wrong” in English to say “Yes, I have much time”, it would sound odd except if we are perhaps specifically emphasising a degree of poshness or even conservatism. Likewise, although it is not strictly wrong to say “I don’t have lots of time”, it would sound strange; “I don’t have a lot of time” is fine in some contexts but it probably would not fit well here; it perhaps carries a subtly different meaning, emphasising the “time” element.

For language learners, there are two points of note here aside from the commonality of St George next week. First, sometimes frustrating “rules” (I am beginning to prefer the term “tendencies”) are in fact shared by our own language, perhaps without us even noticing; second, it is often difficult to distinguish between phrasing which is deemed “grammatically wrong” and phrasing which is deemed “idiomatically odd”. My own instinct is that it is the “idiomatically odd” which is more liable to catch us out in a foreign language.

(Oh, St George is also the patron saint of Portugal but I didn’t write about that because “não tenho muito tempo”… it appears St George doesn’t require idiomatic awkwardness in every one of his related languages…)

The pronunciation of Spanish foods – and chaos…

I have written two posts recently about the linguistics of Italian food (well, pasta) and drink (well, coffee), but if we ever want an example of how linguistics defies rhyme or reason when it comes to pronouncing foreign terms, perhaps Spanish is an even more obvious example.

A recent video on YouTube warned us absolutely not to pronounce chorizo as “chorr-its-o”, yet as I established on this blog some time ago I would argue that even that pronunciation (despite to modern ears sounding more Italian than Spanish) has some historical legitimacy.

There are two more examples we need to discuss as we enter this chaotic realm, however!

Firstly, there is the utterly bizarre English language pronunciation of jalapeño; here, there is at least a vague attempt to Hispanicise the first letter (which in Spanish has no obvious English equivalent, though it is not far from the “ch” in Scottish ‘loch’, and is often rendered by English speakers more or less as an “h-”), but after that we just seem to give up… the word is blatantly pronounced otherwise as if it were English, with no account taken of the tilde on the letter ñ, nor any attempt to pronounce any of the sounds as in Spanish at all – to the extent that the “e” tends to be randomly lengthened, as it might be in ‘scene’ or to rhyme with the English pronunciation of the place name Reno (which in fact derives from a French family name, but let’s not go there). This all makes you wonder why, for consistency, we do not just pronounce the first letter as if it were a j- in January.

Secondly, there is the tricky example of the famous dish paella. The same YouTube video warned us in no uncertain terms that we have to make an effort particularly with the double ll here: we are looking for “pa-e-ll-a” where the double ll is pronounced with the tongue towards the back of the mouth as if set for an English y and then shifted to form a lateral sound resembling (but not identical to) English l.

Except there’s a problem with that. Paella may be to many of us the quintessential Spanish dish, but its name is not in fact Spanish! Paella is a Catalan (actually, specifically in this case Valencian) word meaning simply ‘pan’ (cf. Italian padella from the Latin diminutive form patella), and that ll is pronounced quite differently in Catalan-Valencian from in Spanish. To be clear, it is not wrong to pronounce it as above in Spanish, but there is no particular reason to adopt the Spanish pronunciation in English given the word does not come from Spanish. Technically, therefore, what we are looking for is ll pronounced with a more rounded mouth than in Spanish, so we are aiming for the l in ‘leaf’ but with the tongue starting further back and perhaps held just a smidgeon longer.

Or of course you could just say ‘paella’ as if it is an English word; after all, that’s what you do with (most of) jalapeño…

The curious uniqueness of English ‘w’

Proto-Indo-European was a language spoken up to around 4500 years ago, probably by a few thousand people around modern eastern Ukraine, and it is remarkable because it is the ancestral language to the one spoken natively by around 3.5 billion people globally today.

We can reconstruct it from daughter languages, and we know for example that those speaking it had wóynh2om, or ‘wine’.

This comes down into Modern German as Wein, Modern Dutch as wijn and Modern Scandinavian as vin. In all of these cases, the initial letter is pronounced as per English /v/.

It also came down into Latin as vinum and into its daughter languages as vino or something very close to it. Again, in each case, the initial letter in nearly all of the modern languages derived from Latin is pronounced as per English /v/ (Modern Spanish has shifted this slightly); indeed, the word ‘vine’ is a reborrowing back into English from Norman French.

This is a little peculiar, however, because in the Classical Latin of 2000 or so years ago the initial letter in vinum was in fact pronounced as in (or, at least, close to) Modern English /w/. This shifted over the next few centuries to /v/, hence the pronunciation in Late Latin and thus in modern languages derived from it; but there remains a peculiarity.

German Wein and Dutch wijn may be pronounced with an initial /v/, but they are not spelled with one: and that is indeed because, until comparatively recently (i.e. within the past millennium – in linguistic terms, that constitutes “recent”), they were pronounced with an initial /w/. This shift to /v/ was entirely separate from that which occurred in Latin and it was also entirely separate from (and more recent than) that which occurred in Nordic languages including those spoken in Scandinavia (hence the spelling vin), but it does reflect a fairly common one which is also found in other language groups.

We can see, therefore, that in each of Latin (and thus the Romance languages), the Nordic Languages and the Continental West Germanic languages there was an identical but separate shift from /w/ to /v/.

The remarkable truth, in fact, is that this shift occurred some time in the history of every single modern language derived from Proto-Indo-European – except English (and Scots)!

English /w/ is therefore a complete linguistic outlier – and a unique echo of our ancestors and of five millennia ago.

Language change and the “sonority hierarchy”

We saw last week and in other recent posts the process of language change, as sounds get shifted, simplified or just dropped over time. There is an element of randomness to this – sometimes, a change just becomes “cool” with a particular community or generation. If there were no randomness, Latin would have developed in more or less the same way everywhere. However, thankfully, there are also some rules or, at least, identifiable tendencies.

One principle of language change is simply that of “least effort” – people do not want to make too much effort to recreate a sound! Put more formally, this can be demonstrated as a tendency, known as the “sonority hierarchy” (this is not a religious group, but a phonological trait!)

Essentially, this hierarchy requires syllables to conform to an order of sounds (at least as a preference) from more sonorous to less, starting from the middle out. Making this slightly more complex, the preference is for vowels; then glides (like the /w/ we heard about last week; also /y/ in words like ‘cube’); then liquids (/l/ or /r/ predominantly); then nasals (/n/ and /m/) and then obstruents (fricatives such as /f/ or plosives such as /p/).

The most obvious use for this hierarchy is in fact to assess the volume of each sound – vowels are much louder, in practice, than obstruents (consider /a/ versus /v/).

The relevance to language change, however, is fundamentally linked to phonotactics. Here, the tendency will be for the middle of the syllable to take the loudest sound (most obviously a vowel) and the extremes to take the quietest: the most commonly given example is ‘plant’, with two obstruents on the outside, a vowel in the middle and a liquid and a nasal towards the middle. This will be allowed in most languages which allow consonants together at all (though many, even major ones such as Japanese and Indonesian, do not), but it will be specific to this order: *lpatn will not generally be allowed, nor even in all probability will *palnt or *plnat.

This is contested and languages do of course have exceptions, hence the use of the word “tendency” rather than “rule” in this post. Nevertheless, it can be used to predict changes or identify dialect variations. For example Latin tabula was fine (three syllables centred on a vowel), as is modern Italian tavola and even Catalan taula (two syllables), but modern French table is tricky, because in practice it has become a single syllable where the obstruent /b/ comes before the liquid /l/ (thus contrary to the hierarchy). Although it is pronounced this way in Standard French (/tabl/), we can predict that this will become unstable. In fact we know this, because in Quebec it is already generally pronounced simply /tab/, with the final /l/ dropped, which thus brings it back into line with the hierarchy.
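As a rough illustration of the ordering described above, here is a minimal sketch in Python. The function name, the one-letter-per-sound spelling of syllables and the small sound inventory are all assumptions made purely for the example, not a standard tool or notation. It simply checks that sonority rises to a single peak and then falls.

```python
# Sonority ranks in the order given in the text, least sonorous first:
# obstruents, nasals, liquids, glides, vowels.
# The inventory is a deliberately tiny, hypothetical one-letter-per-sound set.
SONORITY = {}
for rank, sounds in enumerate(["ptkbdgfvsz", "mn", "lr", "wy", "aeiou"]):
    for sound in sounds:
        SONORITY[sound] = rank


def conforms(syllable: str) -> bool:
    """Return True if sonority rises strictly to one peak, then falls strictly."""
    ranks = [SONORITY[sound] for sound in syllable]
    if not ranks:
        return False
    peak = ranks.index(max(ranks))  # position of the most sonorous sound
    rising = all(a < b for a, b in zip(ranks[:peak], ranks[1:peak + 1]))
    falling = all(a > b for a, b in zip(ranks[peak:], ranks[peak + 1:]))
    return rising and falling


for word in ["plant", "lpatn", "plnat", "tabl", "tab"]:
    print(word, conforms(word))
```

On this simplified check ‘plant’ and Quebec-style ‘tab’ pass, while *lpatn, *plnat and single-syllable ‘tabl’ fail. It is worth noting that *palnt also passes the bare ordering test, so ruling it out would need further, language-specific constraints (on how many consonants may follow the vowel, for instance) which this sketch does not attempt to model.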

This does not mean that sounds will not otherwise be lost: English ‘talk’ does conform to the hierarchy (the final obstruent comes after a liquid) but, nevertheless, the /l/ has been lost. The hierarchy gives us a guide as to what to expect, but the “principle of least effort” can become all-conquering…

Romance languages’ past tense – and the Catalan oddity

If we look across the five major Western Romance languages – Portuguese, Spanish, Catalan, French and Italian (from west to east, as it were) – we can sometimes group them usefully. Quite often they fall into a single Iberian group on one hand and an Italo-Gallic group on the other; Catalan may tend towards the former but, as with locative pronouns, it can also go with the latter. There is one obvious way, however, in which it is a complete outlier.

When expressing the past in the modern spoken languages, there is an obvious divide between Iberian and Italo-Gallic. Iberian languages retain all three past formations – in Spanish, the imperfect (typically expressed in English by, for example, ‘I was singing’ or ‘I was asking’) is cantaba or pedía, the preterite or “past remote” (often expressed in English by ‘I sang’ or ‘I asked’) is canté or pedí, and the perfect (often ‘I have sung’ or ‘I have asked’) is he cantado or he pedido. Exactly how these are distinguished from each other varies (in particular, European Spanish uses the perfect markedly more often than most Latin American varieties); and modern Portuguese rarely uses the perfect other than to express something was continuous (so tenho cantado or tenho pedido in most dialects would be understood more as ‘I have been singing’ or ‘I have been asking’). Fundamentally, however, the Iberian languages maintain all three grammatically, including in informal speech.

Italo-Gallic languages, at least Standard French and northern Italian, have largely abandoned the preterite in common use (though it may be found in formal texts, literature and, in Italian, in lyrics). Therefore, in French, j’ai chanté covers both ‘I sang’ and ‘I have sung’ (though it is distinguished from an imperfect: je chantais); and likewise in Italian ho cantato covers both (and is distinguished from imperfect cantavo). It is worth noting that the “past remote” form cantai will be heard in southern Italy which for clear historical reasons, both linguistically and culturally, is rather closer to Spanish anyway, and therefore learners may encounter all forms of the “past remote” at some stage; however, learners of both French and Italian can get away without proactively knowing the “past remote” forms (even if it is useful to recognise them). Hence, there is a clear boundary between Iberian languages which make regular use of the preterite (“past remote”) form in modern speech, and the Italo-Gallic languages which do not.

Which side does Catalan fall on, then? It is perhaps worth noting that in Valencian (broadly regarded by linguists as a variety of Catalan, but a clearly distinct one nonetheless) there is a tendency to go the Iberian route; the “past remote” form is quite common in the spoken language (cantí ‘I sang’). However, in Central Catalan, upon which the version taught to foreigners is based (and which is regarded in Catalonia itself and in the Balearics, at least up to a point, as the “standard” version), this form is rarely if ever heard in contemporary speech. This would suggest that Catalan is aligned to French and (northern) Italian in this regard, except that a distinction between “past one-off” (‘I sang’) and “past perfect” (‘I have sung’) is nevertheless maintained. This is where it gets curious…

To express Spanish he cantado, Catalan does indeed make use of he cantat (likewise for Spanish cantaba, Catalan makes use of cantava); however, for Spanish canté (or for that matter Valencian cantí), Catalan has an entirely different form – and a very odd one.

To indicate what is meant by the “past remote” in Spanish, Valencian and southern Italian, Catalan uses the verb anar ‘to go’ (sometimes in slightly modified form) plus the infinitive, thus vaig cantar translates generally as ‘I sang’ (although exactly how this is distinguished varies from other languages and even within Catalan).

This raises an obvious question – why? Typically, the form ‘to go’ plus infinitive or base form indicates future: French je vais chanter, Portuguese vou cantar, Spanish (in a slightly different form) voy a cantar and even English ‘I am going to sing’ (or, for that matter, Dutch ik ga zingen or Afrikaans ek gaan sing) all have clear future grammatical reference (Italian vado a cantare is to be taken more literally as meaning that I am in the act of going to sing, so as in Classical Latin or indeed modern German is literal rather than grammatical; however, the grammatical use in those other languages does derive from an original literal use).

What happened in Catalan, therefore? Well, surprisingly, there is nothing particularly future about “to go” grammatically. Even in some dialects of English, in colloquial speech we may hear forms which use “to go” for the past: in southern England, for example, we may hear something like “Well, would you believe it, he only goes and sings the whole song, doesn’t he!” – here “goes and” is effectively a grammatical marker but it refers to something in a narrative which occurred in the past.

This is not just a Cockney-Catalan link, by any means. There is evidence of a similar formulation developing across many Western European dialects, including most notably the one which became Standard French; it is only in recent centuries that je vais chanter has come definitively to mean future rather than past. Catalan in fact does allow the formulation vaig a cantar (with the preposition, as in Spanish) to refer to the future although, for obvious reasons of potential confusion, it is less common.

Essentially, therefore, Catalan’s development of what in most languages is a future grammatical form to refer instead to the past is not unlike that of some English dialects. It is a curiosity but, in modern Catalan (at least around Barcelona and along Catalonia’s coast), it has become a widespread one!

All languages are the same age…

Lá Fhéile Pádraig Sona Daoibh just a couple of days ahead of time in one of those rarities – a leap year with St Patrick’s Day on a Sunday.

Of course, St Patrick himself would not have been a native Gaeilgeoir, although he would of course have come to speak the language (most likely with a hint of an East Ulster accent). In fact, he would have arrived in Ireland where the Irish language was long established, having come perhaps from England where the English language was at the time completely unknown.

Names

In fact, language names are a funny thing because the above sentence would not really work in Irish itself; what in the English language is referred to as “Irish” is referred to in Irish as Gaeilge (literally ‘Gaelic’), whereas the adjective for ‘of Ireland’ is Éireann. Similarly, ‘England’ in Irish is Sasana but ‘English’ (as in the language) is Béarla.

Languages travel

This is in fact more accurate than in English; the island of Ireland has taken roughly its current form for as long as human habitation has existed on it (almost 10,000 years); yet Gaeilge has existed on the island for “only” a little over a quarter of that period. During the preceding millennia it had made its way across Europe, likely dipping south at one stage (while still likely part of a single Proto-Celtic language and perhaps even of a single Proto-Italo-Celtic language) before heading northwest, ultimately from what is now Eastern Ukraine. It arrived in Ireland about half its lifetime ago (if we define that as its existence as a separate language from the original Proto-Indo-European) and of course it kept going, to Scotland and the Isle of Man.

A little over a millennium after “Irish” arrived in “Ireland”, “English” arrived in “England”; it is worth emphasising that the language was named first. Various Germanic tribes headed across the North Sea from what is now northern Germany and southern Denmark, the most prominent among them being the Angles and the Saxons. Interestingly, the Saxons may have been more numerous in total, but the number travelling constituted a lower share of all Saxons than the number of Angles travelling (it is thought almost all Angles travelled). For whatever reason, the pure numbers determined that “Saxons” would give their name to Sasana directly (i.e. the land was named for the tribe). In their own language, however, the fact that almost all “Angles” travelled meant that the language spoken in the new location of eastern Britain came to be known as “Angle-ish” or “English”; ultimately, in English itself, the name of the kingdom followed the name of the language (i.e. there was an “English” language before there was an “England”). Curiously, Béarla derives from an older term simply meaning ‘speech’ (but, at least implicitly, ‘foreign speech’).

Ireland

This all means, of course, that “Irish” has spent about half its lifetime in “Ireland”, but people in Ireland spoke different languages for almost three quarters of humanity’s existence on the island.

Interestingly, if you define “lifetime” a little differently, it could be argued that “English” has spent roughly half its lifetime in Ireland too (having arrived in the 12th century, albeit as a minor sideshow given that government in England at the time took place in Norman French), taking as the starting date its departure from Continental Europe. Given that length of time in each case, it is small wonder that so many regret the decline of “Irish”, but also that so many of the most prominent exponents of literary “English” have in fact been “Irish” (although it is subjective, this must surely be a number disproportionate to population, particularly if we include contributions not just to writing but also to cinema and popular music).

Of course, there are other linguistic curiosities in Ireland. It is probable that some unexplained aspects of place names date from before even Irish (Gaelic); there has also been influence from other languages, most obviously Scots in the north but many others. It is not infrequent across the island now for more than 20 languages to be spoken in a single school. (However, it does need to be emphasised that this should not detract from the particular requirement to maintain Irish; Polish, Tagalog or Portuguese will manage just fine regardless of how they develop in Ireland, whereas Irish needs promotion in Ireland itself or its very survival is at risk.)

“Old” languages

We do need to take particular care of Irish, for reasons noted above, but we also need to be clear about one thing: all languages are actually the same age!

Irish does hold a notable distinction of having extant writing existing before any Germanic language (including English); this does not make it “older”, just “attested earlier” – even though it may be noted that this is of significant value to historical linguists, particularly in their reconstruction of the very Proto-Indo-European language from which Irish, English and indeed languages spoken natively by almost half the world’s population (from Spanish to Hindi) all derive.

English is in fact the “earliest attested” extant Germanic language (though we do have quite a number of texts in Gothic, a now extinct Germanic language, from centuries before this), but again this does not make it the “oldest”. It is perhaps apt that one of the best known translations of the Old English epic poem Beowulf was by the late Seamus Heaney, an Irishman also well acquainted with the Irish language!

Ultimately, however, all languages are the same age. They pre-date even Proto-Indo-European (spoken roughly between 6500 and 4500 years ago), of course – that too must have derived from a different language perhaps influenced by other different languages (though it is extremely unlikely that they bore any relation to the one spoken in Ireland at the time).

Ultimately every sentence we utter is an echo of the distant past. If we speak an Indo-European language like Irish or English, it is in fact an echo of a common distant past. Maybe that is a thought for St Patrick’s Day?!

Which Western European languages are closest to each other?

The major Western European languages fall broadly into two family groups – those descended from Latin, and those descended from Proto-West-Germanic (PWG, a language contemporary with Golden Age Latin but much less attested than it; to the north, Nordic languages are descended from Proto-North-Germanic, better known as Old Norse, which then shares a common ancestor farther back with PWG – of course, ultimately they all share a common ancestor when you go back far enough).

Even this is, of course, a simplification: English is derived from PWG (which gives it most of its core vocabulary – ‘here’, ‘three’, ‘give’) but took on board a lot of vocabulary first from Nordic languages (‘take’, ‘they’, ‘egg’) and then from Norman French (‘animal’, ‘chief’, ‘mayor’) and other Latin-derived languages (‘concert’, ‘cuisine’, ‘chocolate’). Words are shifting between those groups all the time in any case: German has borrowed words from Latin-derived languages (‘tschau’, ‘studieren’, ‘Nation’) and Latin-derived languages such as Italian have borrowed words from German (‘fon’, ‘norte’, ‘guerra’). Like a good song, a good word travels (pizza, anyone?).

Statistics

Sometimes attempts are made to show how closely languages are related. One very common way is to compare “lexicon” (vocabulary): a number of words is taken from a text (or even a dictionary) and then the researcher looks for cognates in another language. For example, Italian costa is very obviously cognate with Spanish costa, fairly obviously with English coast, and perhaps marginally less obviously with French côte (the circumflex gives it away there, as it often marks what was once a following -s). You will then see figures bandied around such as “85% lexical similarity” (a fairly typical score for Western Romance languages). There are problems with this, however: how far back do you go? Ultimately German zehn is cognate with Portuguese dez ‘ten’, but that is not at all obvious either in modern pronunciation or in modern spelling and the link is many millennia ago. Also, what happens if you have a word with a different meaning (for example Italian controllare does not typically mean the same as English “control”, usually translating more as ‘check’; and the French anniversaire typically has a much more specific meaning of ‘birthday’ than English “anniversary”)?
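
To make the arithmetic behind such a percentage concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the five-word list, the hand-assigned cognate classes and the function name lexical_similarity are invented here, and the tiny list is deliberately skewed, so the scores say nothing about the real languages.

```python
# Toy word list: meaning -> {language code: (word, cognate class)}.
# Two words share a cognate class if they go back to the same ancestor word.
# The classes are assigned by hand here, purely for illustration.
WORDS = {
    "coast":  {"it": ("costa", 1),    "es": ("costa", 1),   "fr": ("côte", 1)},
    "ten":    {"it": ("dieci", 1),    "es": ("diez", 1),    "fr": ("dix", 1)},
    "dog":    {"it": ("cane", 1),     "es": ("perro", 2),   "fr": ("chien", 1)},
    "window": {"it": ("finestra", 1), "es": ("ventana", 2), "fr": ("fenêtre", 1)},
    "table":  {"it": ("tavola", 1),   "es": ("mesa", 2),    "fr": ("table", 1)},
}


def lexical_similarity(lang_a, lang_b, words=WORDS):
    """Return the share of meanings whose words fall in the same cognate class."""
    shared = sum(
        1 for entry in words.values()
        if entry[lang_a][1] == entry[lang_b][1]
    )
    return shared / len(words)


if __name__ == "__main__":
    for pair in (("it", "es"), ("it", "fr"), ("es", "fr")):
        print(pair, f"{lexical_similarity(*pair):.0%}")
```

The score depends entirely on which words go into the list and on who decides what counts as cognate, which is exactly where the problems described above come in.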

Another attempt was made through a process known as lexicostatistics, which happened to be the topic of my own dissertation. This lined up a particular corpus of words (say, the most common one hundred, or a particular series of verbs) and then identified how many were cognate, with the figure then used to determine when the languages divided from their common ancestor. My own work demonstrated this to be, well, somewhere between problematic and utterly pointless…
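
For readers curious about the arithmetic, the dating step can be sketched as follows. This is not taken from the dissertation mentioned above; it assumes the classic formulation usually credited to Morris Swadesh, in which each language is taken to keep a fixed share r of the word list per thousand years, so that two languages sharing a proportion c of cognates split t = ln(c) / (2 ln(r)) millennia ago. The default of 0.86 is the figure conventionally quoted for the 100-word list, and the function name divergence_millennia is made up for this sketch.

```python
import math


def divergence_millennia(shared_share, retention=0.86):
    """Estimate time since two languages split, in thousands of years.

    shared_share: proportion of the word list that is cognate (0 < c <= 1).
    retention:    assumed share of the list each language keeps per
                  millennium (0.86 is the conventionally quoted value
                  for a 100-word list; it is an assumption, not a fact).
    """
    if not 0 < shared_share <= 1:
        raise ValueError("shared_share must be in the range (0, 1]")
    if not 0 < retention < 1:
        raise ValueError("retention must be strictly between 0 and 1")
    # Both languages lose words independently, hence the factor of 2.
    return math.log(shared_share) / (2 * math.log(retention))


if __name__ == "__main__":
    for c in (0.85, 0.70, 0.50):
        print(f"{c:.0%} shared -> {divergence_millennia(c):.2f} millennia")
```

The whole calculation rests on the retention rate being constant across languages and periods, which is a very strong assumption and one reason to treat any date produced this way with considerable caution.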

Of course, all of this statistical work also focuses solely on vocabulary – not on phonological or grammatical similarities, for example. Ultimately, I do not think there is a purely objective way of determining how closely related any two languages are (nor indeed if they are even to be regarded as different languages at all – that is, really, a socio-political question).

I can then give only an opinion, and even that will be specifically with regard to the standard languages that are actually taught to outsiders.

West Germanic

From northwest to southeast, the major West Germanic languages are English, Dutch and German; practically, between Dutch and German needs to be inserted Afrikaans.

There are, as I have written before, also other considerations here. Luxembourgish now exists with its own standard separate from German, but is spoken by only a few hundred thousand people (and is intelligible, after a bit of tuning in, to German speakers). Swiss German is an umbrella term for the dialects of “German” spoken in Switzerland, Liechtenstein and (arguably) the western Austrian province of Vorarlberg; ultimately, however, its speakers still regard Standard German as their written language (and switch to it happily in the presence of outsiders). Low German, spoken in northern Germany and perhaps parts of the Netherlands (depending on definition) was once a language of considerable prestige in medieval trade and administration, but has largely dropped out of use leaving only a substrate; Scots (and its variant in Northern Ireland and Donegal known as Ulster Scots) has a similar linguistic status to Low German – it was once in a stronger position but retains a considerable historical prestige and modern revival movement. Frisian, spoken in different varieties in the northern Netherlands and small parts of Germany, is regarded as the closest language to English (largely because it too has made /k/ affricate, as in English ‘cheese’ versus Dutch kaas), but it is a discernibly regional rather than national language. Some may also argue that “Flemish” should be regarded as separate from “Dutch”, but most accept it is just a different name for the language as used by a narrow majority of Belgians overall and as the official language across the north of the country. Yiddish is also a language in religious use (though much less so than was once the case), which is a High German variety and thus very close to Standard German (although written in a different script and with markedly different vocabulary in particular areas). We should arguably not be discounting English-based creole languages, such as Jamaican Patois or Tok Pisin, which are sometimes in widespread use in Africa, the Far East and the Caribbean.

Leaving those considerations to one side and focusing on just the four major languages, there is little real doubt that:

  • the closest to English is Dutch;
  • the closest to German is also Dutch; but
  • by far the closest to Dutch is Afrikaans (and vice-versa).

My impression is that English is the outlier given its notably simplified grammar and considerable Latin and Norse lexical influence; that said, Afrikaans is also an outlier grammatically (in many ways, grammatically, the closest to Dutch is German).

One quirk here is that I would say Swiss German is slightly closer to Dutch both phonologically and particularly grammatically than Standard German is, despite being geographically farther removed from it (though obviously Swiss German is closer to Standard German than it is to Dutch).

Western Romance

From west to east, the major Western Romance languages (i.e. those derived from the language of “Rome”, thus Latin) are Portuguese, Spanish (aka Castellano), Catalan, French and Italian.

Again here we are omitting Romanian (most obviously), the national language not only of Romania but also of Moldova, because it is isolated to the east (and also, frankly, because I don’t speak it at all…); we are also omitting some consequential regional languages in Spain, most obviously Galician (which is particularly interesting as it shares its origins more directly with Portuguese than with Spanish); we are also including Valencian under Catalan (which would be controversial to some) and we are discounting Occitan spoken across southern France (again interesting because its origins are more closely bound to Catalan than to French); and we are removing from consideration the various dialetti of Italy (translating these as “dialects” does not do them justice; they are really separate languages derived from Latin separately from the version which became Standard Italian, but they have all – broadly to a greater extent in the north than the south and on the mainland than on the islands but to a considerable extent everywhere – been heavily influenced now by it). There is also a national language of Switzerland, Romansh, alongside related regional languages of Italy, Ladin and Friulian, which are long attested and hold some role in education even today, but are not considered here.

With the five remaining languages, I would say personally (but this is very open to debate) that:

  • French is the clear outlier (phonologically, grammatically and lexically), but of the other four Italian is closest to it followed by Catalan (and then actually by Portuguese);
  • Italian is a secondary outlier (definitely grammatically), not least because it is the farthest east, but is still on balance closer to the Iberian languages than to French (except perhaps lexically), although which Iberian language it is closest to is debatable (I err just towards Spanish but there is a good case, particularly lexically, for Catalan);
  • Catalan is on balance just about closer to Spanish than it is to the others, but it is much less clearly the case than you would perhaps expect – French would probably be the nearest challenger but it is also quite close, particularly lexically, to Italian;
  • Spanish, however, is closer to Portuguese than it is to Catalan (and then closer to Italian than it is to French);
  • Portuguese is closest to Spanish as you would expect, but then it is debatable – phonologically there is a case for French being next but, taking grammar and vocabulary into account, it is probably actually Italian followed by Catalan.

The geography does make some sense, but only if you recognise that it was once easier to cross sea than land: so Portuguese does naturally become Spanish which then kind of branches upwards into Catalan and sidewards into Italian, with the latter two then kind of meeting French together.

Esperanto

For the record, Esperanto has a distinct structure and quite a Slavic phonology, but lexically it is roughly two thirds Romance, a quarter Germanic, with most of the rest Slavic; on balance, lexically, it is probably closest to French then Italian (there was no specific attempt to take words into it from the Iberian languages, though common Romance words such as granda ‘big’ do appear). There is also a slight grammatical bias towards Romance generally in that the language contains a grammatically marked future tense, unlike Germanic languages.

However, as noted, this is really quite subjective – others will have legitimate alternative views on all of this!