# ISO 639
ISO 639 language codes with names in English, French & German, provided in JSON, CSV, & a NodeJS module.

See https://en.wikipedia.org/wiki/ISO_3166

## Table of Contents
1. [Overview](#overview)
  1. [Information](#information)
  1. [Data & Formats](#data--formats)
1. [Examples](#examples)
  1. [Node Module](#node-module)
  1. [JSON](#json)
  1. [CSV](#csv)
1. [Scrapers](#scrapers)
1. [Sources](#sources)

### Overview

JSON & CSV files can be found in [data/](data/)

#### Information
This repository consists of all ~~six~~ four parts of the ISO 639 standard. Part 4 & 6 are not included because they are not relevant.

For a detailed description, see https://en.wikipedia.org/wiki/ISO_639

1. [**ISO 639-1** (Part 1): Alpha-2 codes (184 major languages)](#part1)
1. [**ISO 639-2** (Part 2): Alpha-3 codes (485 languages + 20)](#part2)
1. *(WIP)* [**ISO 639-3** (Part 3): Alpha-3 codes (? languages)](#part3)
1. ~~**ISO 639-4** (Part 4): N/A~~
1. *(WIP)* [**ISO 639-5** (Part 5): Alpha-3 codes (language families and groups)](#part5)
1. ~~**ISO 639-6** (Part 6): *withdrawn*~~

#### Data & Formats
For every JSON file, there is a beautified version (`.json`) & a minified version (`.min.json`).

For convenience & readability, JSON files are ordered by key alphabetically (i.e. `chv` always precedes `chy`).

Unless explicitly stated, CSV files with the same name as a JSON file will contain the same data, with the following qualifications:
* Array values are joined into a string. For example, see names (en/fr/de) in [Part 2](#part2).
* Values which are marked optional for a JSON file will be treated as an empty string in the corresponding CSV file.

#### Part 1: Alpha-2 Codes for Major Languages
<a name="part1"></a>

Part 1 contains two-digit alphabetical codes for 184 major languages.

[data/iso_639-1.json](data/iso_639-1.json) is keyed by ISO 639-1 Alpha 2 code. Each value has:
* `639-2 (string)`: A mapping to the (terminological) ISO 639-2 code
* `639-2/B (string; optional)`: A mapping to the bibliographic ISO 639-2 code, where relevant.
* `name (string)`: The name(s) of the language, in English.
* `nativeName (string)`: The name(s) of the language, in the language itself.
* `wikiUrl (string)`: A link the the Wikipedia article on the language.

##### Example
```
{
  ...
  "bo": {
    "639-1": "bo",
    "639-2": "bod",
    "639-2/B": "tib",
    "family": "Sino-Tibetan",
    "name": "Tibetan Standard, Tibetan, Central",
    "nativeName": "བོད་ཡིག",
    "wikiUrl": "https://en.wikipedia.org/wiki/Standard_Tibetan"
  },
  ...
  "ru": {
    "639-1": "ru",
    "639-2": "rus",
    "family": "Indo-European",
    "name": "Russian",
    "nativeName": "Русский",
    "wikiUrl": "https://en.wikipedia.org/wiki/Russian_language"
  },
  ...
}
```

#### Part 2: Alpha-3 Codes for More Languages
<a name="part2"></a>

Part 2 contains three-digit alphabetical codes for 485 major languages. There are 20 additional entries for bibliographic codes (based upon the English name for the language).

Note, part 2 has been practically superceded by part 3.

[data/iso_639-2.json](data/iso_639-2.json) is keyed by ISO 639-2 Alpha 3 code. Each value has:
* `639-1 (string; optional)`: A mapping to the ISO 639-1 code, if such exists.
* `639-2 (string)`: A mapping to the (terminological) ISO 639-2 code
* `639-2/B (string; optional)`: A mapping to the bibliographic ISO 639-2 code, where relevant.
* `en (Array<string>)`: The names of the language, in English.
* `fr (Array<string>)`: The names of the language, in French.
* `de (Array<string>)`: The names of the language, in German.
* `wikiUrl (string: optional)`: A link the the Wikipedia article on the language, unless the code is special (see below).

** Note: **
* The following special codes are:
  * Included: mis, mul, und, zxx
  * Excluded: qaa-qtz (a range reserved for local use)
  See [https://en.wikipedia.org/wiki/List_of_ISO_639-2_codes]
* The German name for `zgh` has been manually added based upon a translation from English.
* The following errors exist in the sources:
  * The Library of Congress page incorrectly states that there are 21 additional entries for bibliographic codes while only listing 20. (Feb 2017)

##### Example
```
{
  ...
  "alb": {
    "639-1": "sq",
    "639-2": "sqi",
    "639-2/B": "alb",
    "de": [
      "Albanisch"
    ],
    "en": [
      "Albanian"
    ],
    "fr": [
      "albanais"
    ],
    "wikiUrl": "https://en.wikipedia.org/wiki/Albanian_languages"
  },
  ...
  "chu": {
    "639-1": "cu",
    "639-2": "chu",
    "de": [
      "Kirchenslawisch"
    ],
    "en": [
      "Church Slavic",
      "Old Slavonic",
      "Church Slavonic",
      "Old Bulgarian",
      "Old Church Slavonic"
    ],
    "fr": [
      "slavon d'église",
      "vieux slave",
      "slavon liturgique",
      "vieux bulgare"
    ],
    "wikiUrl": "https://en.wikipedia.org/wiki/Church_Slavic_language"
  },
  ...
  "sqi": {
    ...
  }
  ...
}
```

#### Part 3: Alpha-3 Codes for Comprehensive Coverage of Languages

Part 3 contains three-digit alphabetical codes for ? major languages.

#### Part 4

#### Part 5: Alpha-3 Code for Language Families and Groups

Part 3 contains three-digit alphabetical codes for ? major languages.

#### Part 6

### Examples
***TODO***

#### Node Module
```
npm install iso-639
```

This module is a work in progress. At present, it exposes two properties:
* `iso_639_1`: contains the contents of [data/iso_639-1.json](data/iso_639-1.json)
* `iso_639_2`: contains the contents of [data/iso_639-2.json](data/iso_639-2.json)

***TODO***

### Scrapers

Data can be re-sourced using the scripts in [scrapers/](scrapers/). Alternatively:
```
npm run scrape
```

### Sources

Data is sourced from the following webpages:

* Part 1: https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes
* Part 2:
  * https://en.wikipedia.org/wiki/List_of_ISO_639-2_codes
  * http://www.loc.gov/standards/iso639-2/php/code_list.php
* Part 3: (WIP)
* Part 5: (WIP)
