doc: add documentation on ICU · nodejs/node@90fcccd

# Internationalization Support

Node.js has many features that make it easier to write internationalized
programs. Some of them are:

- Locale-sensitive or Unicode-aware functions in the [ECMAScript Language
  Specification][ECMA-262]:
  - [`String.prototype.normalize()`][]
  - [`String.prototype.toLowerCase()`][]
  - [`String.prototype.toUpperCase()`][]
- All functionality described in the [ECMAScript Internationalization API
  Specification][ECMA-402] (aka ECMA-402):
  - [`Intl`][] object
  - Locale-sensitive methods like [`String.prototype.localeCompare()`][] and
    [`Date.prototype.toLocaleString()`][]
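
As a brief illustration of the functions listed above, the following sketch
compares two spellings of the same character and uses a locale-sensitive
comparison. The variable names are chosen for this example only, and the
normalization result assumes a build with ICU enabled.

```js
// Two spellings of "Å": one precomposed code point, and "A" followed by
// U+030A COMBINING RING ABOVE.
const composed = '\u00c5';
const decomposed = 'A\u030a';

// The raw strings differ, because their code units differ.
const equalBefore = composed === decomposed;

// NFC normalization composes the pair into the single code point.
// This assumes ICU is enabled in the running binary.
const equalAfter = composed === decomposed.normalize('NFC');

// localeCompare() returns a negative number when the receiver sorts first.
const aBeforeB = 'a'.localeCompare('b') < 0;

console.log({ equalBefore, equalAfter, aBeforeB });
```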

Node.js (and its underlying V8 engine) uses [ICU][] to implement these features
in native C/C++ code. However, some of them require a very large ICU data file
in order to support all locales of the world. Because it is expected that most
Node.js users will make use of only a small portion of ICU functionality, only
a subset of the full ICU data set is provided by Node.js by default. Several
options are provided for customizing and expanding the ICU data set either when
building or running Node.js.

## Options for building Node.js

To control how ICU is used in Node.js, four `configure` options are available
during compilation. Additional details on how to compile Node.js are documented
in [BUILDING.md][].

- `--with-intl=none` / `--without-intl`
- `--with-intl=system-icu`
- `--with-intl=small-icu` (default)
- `--with-intl=full-icu`

An overview of available Node.js and JavaScript features for each `configure`
option:

|                                         | `none`                            | `system-icu`                 | `small-icu`            | `full-icu` |
|-----------------------------------------|-----------------------------------|------------------------------|------------------------|------------|
| [`String.prototype.normalize()`][]      | none (function is no-op)          | full                         | full                   | full       |
| `String.prototype.to*Case()`            | full                              | full                         | full                   | full       |
| [`Intl`][]                              | none (object does not exist)      | partial/full (depends on OS) | partial (English-only) | full       |
| [`String.prototype.localeCompare()`][]  | partial (not locale-aware)        | full                         | full                   | full       |
| `String.prototype.toLocale*Case()`      | partial (not locale-aware)        | full                         | full                   | full       |
| [`Number.prototype.toLocaleString()`][] | partial (not locale-aware)        | partial/full (depends on OS) | partial (English-only) | full       |
| `Date.prototype.toLocale*String()`      | partial (not locale-aware)        | partial/full (depends on OS) | partial (English-only) | full       |

*Note*: The "(not locale-aware)" designation denotes that the function carries
out its operation just like the non-`Locale` version of the function, if one
exists. For example, under `none` mode, `Date.prototype.toLocaleString()`'s
operation is identical to that of `Date.prototype.toString()`.
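
One way to observe the behavior described in this note is to compare the two
outputs directly. This is only a sketch: the variable names are illustrative,
and on a build with ICU enabled the two strings are expected to differ.

```js
const date = new Date(9e8);

// According to the note above, these two calls produce the same string
// under `none` mode. With ICU enabled, toLocaleString() formats the date
// for the default locale instead, so the strings normally differ.
const localized = date.toLocaleString();
const plain = date.toString();

console.log(localized === plain ? 'not locale-aware' : 'locale-aware');
```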

### Disable all internationalization features (`none`)

If this option is chosen, most internationalization features mentioned above
will be **unavailable** in the resulting `node` binary.

### Build with a pre-installed ICU (`system-icu`)

Node.js can link against an ICU build already installed on the system. In fact,
most Linux distributions already come with ICU installed, and this option makes
it possible to reuse the same set of data used by other components in the OS.

Functionality that only requires the ICU library itself, such as
[`String.prototype.normalize()`][], is fully supported under `system-icu`.
Features that additionally require ICU locale data, such as
[`Intl.DateTimeFormat`][], *may* be fully or partially supported, depending on
the completeness of the ICU data installed on the system.

### Embed a limited set of ICU data (`small-icu`)

This option makes the resulting binary link against the ICU library statically,
and includes a subset of ICU data (typically only the English locale) within
the `node` executable.

Functionality that only requires the ICU library itself, such as
[`String.prototype.normalize()`][], is fully supported under `small-icu`.
Features that additionally require ICU locale data, such as
[`Intl.DateTimeFormat`][], generally only work with the English locale:

```js
const january = new Date(9e8);
const english = new Intl.DateTimeFormat('en', { month: 'long' });
const spanish = new Intl.DateTimeFormat('es', { month: 'long' });

console.log(english.format(january));
// Prints "January"
console.log(spanish.format(january));
// Prints "January" or "M01" on small-icu
// Should print "enero"
```

This mode provides a good balance between features and binary size, and it is
the default behavior if no `--with-intl` flag is passed. The official binaries
are also built in this mode.
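
To see which locales a given binary can actually format,
`Intl.DateTimeFormat.supportedLocalesOf()` can be queried. The sketch below is
illustrative: the `requested` list is an arbitrary choice, and the exact result
depends on the ICU data available to the running `node` executable.

```js
// Locales this example asks about (an arbitrary choice for illustration).
const requested = ['en', 'es'];

// supportedLocalesOf() returns the subset of the requested locales for
// which locale data is available. Guard against builds without Intl.
const supported = typeof Intl === 'object' ?
  Intl.DateTimeFormat.supportedLocalesOf(requested) :
  [];

console.log(supported);
// With English-only data, 'es' is expected to be missing from the result.
```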

#### Providing ICU data at runtime

If the `small-icu` option is used, one can still provide additional locale data
at runtime so that the JS methods work for all ICU locales. Assuming the data
file is stored at `/some/directory`, it can be made available to ICU through
either:

* The [`NODE_ICU_DATA`][] environment variable:

  ```shell
  env NODE_ICU_DATA=/some/directory node
  ```

* The [`--icu-data-dir`][] CLI parameter:

  ```shell
  node --icu-data-dir=/some/directory
  ```

(If both are specified, the `--icu-data-dir` CLI parameter takes precedence.)

ICU is able to automatically find and load a variety of data formats, but the
data must be appropriate for the ICU version, and the file must be correctly
named. The most common name for the data file is `icudt5X[bl].dat`, where `5X`
denotes the intended ICU version, and `b` or `l` indicates the system's
endianness. See the ["ICU Data"][] article in the ICU User Guide for other
supported formats and more details on ICU data in general.

The [full-icu][] npm module can greatly simplify ICU data installation by
detecting the ICU version of the running `node` executable and downloading the
appropriate data file. After installing the module through `npm i full-icu`,
the data file will be available at `./node_modules/full-icu`. This path can
then be passed either to `NODE_ICU_DATA` or `--icu-data-dir` as shown above to
enable full `Intl` support.
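
After pointing Node.js at a data directory, it can be useful to confirm what
the process actually sees. The following sketch assumes it is saved under a
hypothetical file name such as `check-icu.js` and started with one of the two
mechanisms shown above; the `report` object and its field names are
illustrative only.

```js
// Hypothetical usage:
//   node --icu-data-dir=./node_modules/full-icu check-icu.js

// ICU version string, or undefined when ICU is disabled.
const icuVersion = process.versions.icu;

// Data directory passed through the environment, if any.
const envDir = process.env.NODE_ICU_DATA;

// Data directory flag passed on the command line, if any.
const cliFlag = process.execArgv.find(
  (arg) => arg.startsWith('--icu-data-dir'));

// Month name for a non-English locale; 'enero' indicates that Spanish
// locale data was found.
let spanishMonth;
try {
  spanishMonth = new Intl.DateTimeFormat('es', { month: 'long' })
    .format(new Date(9e8));
} catch (err) {
  spanishMonth = undefined; // Intl is missing or the locale is unusable.
}

const report = { icuVersion, envDir, cliFlag, spanishMonth };
console.log(report);
```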

### Embed the entire ICU (`full-icu`)

This option makes the resulting binary link against ICU statically and include
a full set of ICU data. A binary created this way has no further external
dependencies and supports all locales, but might be rather large. See
[BUILDING.md][BUILDING.md#full-icu] on how to compile a binary using this mode.

## Detecting internationalization support

To verify that ICU is enabled at all (`system-icu`, `small-icu`, or
`full-icu`), checking for the existence of `Intl` should suffice:

```js
const hasICU = typeof Intl === 'object';
```

Alternatively, checking for `process.versions.icu`, a property defined only
when ICU is enabled, works too:

```js
const hasICU = typeof process.versions.icu === 'string';
```

To check for support for a non-English locale (i.e. `full-icu` or
`system-icu`), [`Intl.DateTimeFormat`][] can be a good distinguishing factor:

```js
const hasFullICU = (() => {
  try {
    const january = new Date(9e8);
    const spanish = new Intl.DateTimeFormat('es', { month: 'long' });
    return spanish.format(january) === 'enero';
  } catch (err) {
    return false;
  }
})();
```
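
The checks in this section can be combined into a single helper. The function
name `detectIntlSupport` and the three labels it returns are hypothetical and
exist only for this sketch.

```js
function detectIntlSupport() {
  // No Intl object or no ICU version string: ICU is disabled.
  if (typeof Intl !== 'object' || typeof process.versions.icu !== 'string') {
    return 'none';
  }
  try {
    const january = new Date(9e8);
    const spanish = new Intl.DateTimeFormat('es', { month: 'long' });
    // 'enero' means Spanish locale data is available.
    return spanish.format(january) === 'enero' ?
      'non-english' :
      'english-only';
  } catch (err) {
    // Formatting failed, so treat non-English locales as unavailable.
    return 'english-only';
  }
}

console.log(detectIntlSupport());
```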

For more thorough tests of `Intl` support, the following resources may be
helpful:

- [btest402][]: Generally used to check whether Node.js with `Intl` support is
  built correctly.
- [Test262][]: ECMAScript's official conformance test suite includes a section
  dedicated to ECMA-402.

[btest402]: https://github.com/srl295/btest402
[BUILDING.md]: https://github.com/nodejs/node/blob/master/BUILDING.md
[BUILDING.md#full-icu]: https://github.com/nodejs/node/blob/master/BUILDING.md#build-with-full-icu-support-all-locales-supported-by-icu
[`Date.prototype.toLocaleString()`]: https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/Date/toLocaleString
[ECMA-262]: https://tc39.github.io/ecma262/
[ECMA-402]: https://tc39.github.io/ecma402/
[full-icu]: https://www.npmjs.com/package/full-icu
[ICU]: http://icu-project.org/
["ICU Data"]: http://userguide.icu-project.org/icudata
[`--icu-data-dir`]: cli.html#cli_icu_data_dir_file
[`Intl`]: https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/Intl
[`Intl.DateTimeFormat`]: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/DateTimeFormat
[`NODE_ICU_DATA`]: cli.html#cli_node_icu_data_file
[`Number.prototype.toLocaleString()`]: https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/Number/toLocaleString
[`String.prototype.localeCompare()`]: https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/String/localeCompare
[`String.prototype.normalize()`]: https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/String/normalize
[`String.prototype.toLowerCase()`]: https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/String/toLowerCase
[`String.prototype.toUpperCase()`]: https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/String/toUpperCase
[Test262]: https://github.com/tc39/test262/tree/master/test/intl402