About the map
This map draws on two main datasets: Glottolog and the UNESCO Atlas of the World's Languages in Danger.
Glottolog is an open academic database that catalogues the world's languages, dialects and language families. It aggregates data from thousands of published sources, which means its quality is only as good as the underlying research. For many languages, especially in South and Southeast Asia, that research is outdated, contested, or has simply never been carried out. Speaker numbers from India in particular should be treated with caution, as they derive from a 2011 census that is now significantly out of date, and India's methodology for counting language speakers has long been disputed by linguists. Mapping isn’t easy, which is probably why UNESCO’s own map has been down for the entire time I (Sophia) have been researching language loss. That’s over three years!
The UNESCO Atlas identifies languages at risk of disappearing, classifying them across five levels of endangerment. Like Glottolog, it reflects the state of documentation at the time of publication, and some classifications have since been disputed or revised by affected communities.
If you think the information on a label is out of date or wrong, I’d like to add a community tag to it. You can submit community tags here.
What this map cannot fully capture
Official status data reflects legal recognition on paper, which does not always correspond to real-world use. A language can be nationally official and still be losing speakers rapidly; Irish is a well-known example. Conversely, some languages with no official status remain healthy within their communities.
Documentation levels, meanwhile, show how well a language has been recorded by researchers, which is not the same as how close it is to disappearing. A well-documented language can still be endangered, and a documented language that loses its last native speakers is not necessarily lost forever. Languages with strong written records can be and have been revived.
Linguists increasingly prefer the term sleeping over extinct for documented languages, to reflect this possibility. As you may have noticed, none of the datasets I use has shifted its vocabulary, so I simply write what they write to reflect their data accurately.
In my work, I try to pivot the direction of scrutiny towards the agent of language loss, rather than the language itself, which is why I focus on linguicide. We might disagree on words like death, extinction and sleep, but I hope we can all agree that languages experience suppression, erasure and everything else linguicide attempts to capture.
Language situations change, communities push back against classifications imposed by outside researchers, and new speakers emerge. The map will always have a date on it from when it was last updated.