Intl.Segmenter
Overview
Intl.Segmenter splits text into graphemes, words, or sentences according to locale-aware rules. It is especially useful for languages such as Japanese, Chinese, or Thai, which do not separate words with spaces, so simple whitespace splitting is not enough.
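As a rough illustration of the three granularities, the sketch below segments one English string three ways. The helper name `segmentTexts` is made up for this example and is not part of the API.

```javascript
// Segment the same string at each of the three granularities.
const text = 'Hello world. How are you?';

// Hypothetical helper: returns only the substring of each segment.
function segmentTexts(input, locale, granularity) {
  const segmenter = new Intl.Segmenter(locale, { granularity });
  // Each iterated item is an object; its `segment` property holds the substring.
  return Array.from(segmenter.segment(input), (s) => s.segment);
}

const graphemes = segmentTexts(text, 'en', 'grapheme');
const words = segmentTexts(text, 'en', 'word');
const sentences = segmentTexts(text, 'en', 'sentence');

console.log(graphemes.length); // one entry per user-perceived character
console.log(words);            // words, plus separate entries for spaces and punctuation
console.log(sentences);        // one entry per sentence, trailing space included
```

Whatever the granularity, the segments cover the input without gaps, so joining them reproduces the original string.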
Browser support
Minimum supporting version per browser (desktop: Chrome, Edge, Firefox, Safari; mobile: Chrome Android, Safari iOS).

| Feature | Chrome | Edge | Firefox | Safari | Chrome Android | Safari iOS |
|---|---|---|---|---|---|---|
| Intl.Segmenter (built-in object) | 87 | 87 | 125 | 14.1 | 87 | 14.5 |
| Intl.Segmenter() constructor: creates Intl.Segmenter objects. | 87 | 87 | 125 | 14.1 | 87 | 14.5 |
| resolvedOptions(): method of Intl.Segmenter instances that returns a new object with properties reflecting the options computed during initialization of this Segmenter object. | 87 | 87 | 125 | 14.1 | 87 | 14.5 |
| segment(): method of Intl.Segmenter instances that segments a string according to the locale and granularity of this Intl.Segmenter object. | 87 | 87 | 125 | 14.1 | 87 | 14.5 |
| Intl.Segmenter.supportedLocalesOf(): static method that returns an array containing those of the provided locales that are supported in segmentation without having to fall back to the runtime's default locale. | 87 | 87 | 125 | 14.1 | 87 | 14.5 |
| Segments: an iterable collection of the segments of a text string, returned by a call to the segment() method of an Intl.Segmenter object. | 87 | 87 | 125 | 14.1 | 87 | 14.5 |
| [Symbol.iterator](): method of Segments instances that implements the iterable protocol, so Segments objects can be consumed by most syntaxes expecting iterables, such as the spread syntax and for...of loops. It returns a segments iterator object that yields data about each segment. | 87 | 87 | 125 | 14.1 | 87 | 14.5 |
| containing(): method of Segments instances that returns an object describing the segment in the string that includes the code unit at the specified index. | 87 | 87 | 125 | 14.1 | 87 | 14.5 |
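Because older browsers lack the API, code that must run everywhere may want to feature-detect it first. The following is a minimal sketch; `createSegmenter` is a hypothetical helper name, and the whitespace fallback is only a rough approximation that suits space-delimited languages.

```javascript
// Hypothetical helper: returns a segmenter, or null when the API is missing.
function createSegmenter(locale, granularity) {
  const supported =
    typeof Intl === 'object' && typeof Intl.Segmenter === 'function';
  return supported ? new Intl.Segmenter(locale, { granularity }) : null;
}

const wordSegmenter = createSegmenter('en', 'word');

const pieces = wordSegmenter
  ? Array.from(wordSegmenter.segment('Hello world'), (s) => s.segment)
  // Rough fallback: split on whitespace, keeping the separators.
  : 'Hello world'.split(/(\s+)/);

console.log(pieces); // [ 'Hello', ' ', 'world' ]
```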
Syntax
const segmenter = new Intl.Segmenter('ja', { granularity: 'word' });
const segments = [...segmenter.segment('今日は天気がいいです')];
// Each element is a segment data object; its `segment` values are, in order:
// '今日', 'は', '天気', 'が', 'いい', 'です'
Use cases
Word-aware processing
Segment text for counters, highlights, or analysis in languages that do not rely on spaces between words.
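A word counter along these lines could look like the sketch below. It relies on the `isWordLike` flag that word-granularity segments carry, which is false for spaces and punctuation; `countWords` is an illustrative name, not an API method.

```javascript
// Count word-like segments, skipping whitespace and punctuation.
function countWords(text, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'word' });
  let count = 0;
  for (const { isWordLike } of segmenter.segment(text)) {
    if (isWordLike) count += 1;
  }
  return count;
}

console.log(countWords('The quick brown fox.', 'en')); // 4

// For text without spaces the result depends on the runtime's
// dictionary data, so no fixed number is shown here.
console.log(countWords('今日は天気がいいです', 'ja'));
```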
Grapheme-safe editing
Treat user-perceived characters more accurately when cursor movement or counting must respect composed characters.
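To make the difference concrete, the sketch below compares `String.prototype.length` (UTF-16 code units) with a grapheme count, and uses `containing()` to snap an arbitrary code-unit index back to the start of its character. The helper names `countGraphemes` and `graphemeStart` are assumptions made for this example.

```javascript
const graphemeSegmenter = new Intl.Segmenter('en', { granularity: 'grapheme' });

// Number of user-perceived characters in a string.
function countGraphemes(text) {
  return [...graphemeSegmenter.segment(text)].length;
}

// 'e' followed by U+0301 (combining acute accent) displays as a single 'é'.
const composed = 'cafe\u0301';
console.log(composed.length);          // 5 (code units)
console.log(countGraphemes(composed)); // 4 (graphemes)

// Snap a code-unit index to the start of the grapheme that contains it,
// e.g. so a cursor never lands between a letter and its accent.
function graphemeStart(text, codeUnitIndex) {
  const info = graphemeSegmenter.segment(text).containing(codeUnitIndex);
  // containing() returns undefined when the index is out of range.
  return info ? info.index : text.length;
}

console.log(graphemeStart(composed, 4)); // 3 (index 4 is the accent of 'é')
```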
Cautions
- Segmentation rules vary by locale, so choose the locale intentionally rather than assuming one universal behavior.
- It improves tokenization but does not replace full natural-language understanding or grammar-aware parsing.
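One way to make the locale choice explicit is to check what the runtime actually supports and which locale it resolved to. The sketch below uses `supportedLocalesOf()` and `resolvedOptions()` from the table above; the locale tags are arbitrary examples.

```javascript
// Locales we would like to use, in order of preference (example values).
const requested = ['ja-JP', 'en-US'];

// Subset of `requested` that the runtime can segment without falling back.
const supported = Intl.Segmenter.supportedLocalesOf(requested);

const sentenceSegmenter = new Intl.Segmenter(requested, {
  granularity: 'sentence',
});

// resolvedOptions() reports the locale and granularity actually in effect.
const { locale, granularity } = sentenceSegmenter.resolvedOptions();

console.log(supported);           // depends on the runtime's locale data
console.log(locale, granularity); // resolved locale, then 'sentence'
```

Logging or asserting on the resolved locale during development can catch cases where a requested locale silently fell back to a different one.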
Accessibility
- Better segmentation can improve counters, truncation, and editing behavior for multilingual users, so text is less likely to be cut mid-character or miscounted.
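As an example of the truncation case, the following sketch cuts a string after a given number of graphemes instead of code units, so combined characters stay intact. `truncateGraphemes` and its default locale are assumptions for illustration.

```javascript
// Truncate to at most `maxGraphemes` user-perceived characters,
// appending an ellipsis only when something was cut off.
function truncateGraphemes(text, maxGraphemes, locale = 'en') {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'grapheme' });
  let count = 0;
  for (const { index } of segmenter.segment(text)) {
    if (count === maxGraphemes) {
      // `index` is the code-unit offset where the next grapheme starts.
      return text.slice(0, index) + '…';
    }
    count += 1;
  }
  return text; // short enough already
}

const label = 'cafe\u0301 au lait'; // 'é' written as 'e' + combining accent

console.log(label.slice(0, 4));           // 'cafe' (the accent is lost)
console.log(truncateGraphemes(label, 4)); // 'café…' (the accent is kept)
```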