Text boundary analysis
Chrome's V8 JavaScript engine detects text boundaries with ICU (International Components for Unicode):
- V8 relies on ICU for Unicode text processing, including breaking text into words. ICU's boundary-detection code includes a “Dictionary-Based BreakIterator” for languages written without spaces between words, such as Japanese, Chinese, and Thai.
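From JavaScript, one way to reach these ICU break iterators is the standard `Intl.Segmenter` API (ECMA-402), which V8 implements on top of ICU. The sketch below is a minimal illustration, not V8's internal code. It assumes an ES2022+ runtime with `Intl.Segmenter` and full ICU data, such as recent Chrome or Node.js. The helper `wordsOf` is hypothetical. For space-delimited text, boundaries fall at spaces and punctuation. For Japanese or Thai, the boundaries come from ICU's dictionary-based breaking, so the exact split can vary with the ICU version.

```typescript
// Sketch: word segmentation via Intl.Segmenter (ICU-backed in V8).
// Assumes an ES2022+ runtime built with full ICU data.

/** Hypothetical helper: returns the word-like segments of `text` for a BCP 47 locale. */
function wordsOf(text: string, locale: string): string[] {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  const words: string[] = [];
  for (const { segment, isWordLike } of segmenter.segment(text)) {
    // Skip whitespace and punctuation segments; keep only word-like runs.
    if (isWordLike) {
      words.push(segment);
    }
  }
  return words;
}

// Space-delimited text: boundaries fall at spaces and punctuation.
console.log(wordsOf("Hello, world!", "en")); // → ["Hello", "world"]

// Text without spaces: ICU uses dictionary-based breaking here.
// The exact split depends on the ICU version and its dictionary data.
console.log(wordsOf("今日は良い天気です", "ja"));
console.log(wordsOf("สวัสดีครับ", "th"));
```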
Grapheme cluster (user-perceived character) boundary rules, from Unicode Standard Annex #29 (Unicode Text Segmentation): http://www.unicode.org/reports/tr29/#Grapheme%5FCluster%5FBoundaries
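The same API exposes grapheme-cluster boundaries through `granularity: "grapheme"`. The following is a minimal sketch that assumes an ES2022+ runtime, and `graphemesOf` is a hypothetical helper. It shows why `String.prototype.length`, which counts UTF-16 code units, can differ from the number of user-perceived characters.

```typescript
// Sketch: splitting a string into grapheme clusters (UAX #29) with Intl.Segmenter.
// Assumes an ES2022+ runtime.

/** Hypothetical helper: returns the grapheme clusters of `text`. */
function graphemesOf(text: string): string[] {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return Array.from(segmenter.segment(text), (s) => s.segment);
}

// "e" + U+0301 COMBINING ACUTE ACCENT: two UTF-16 code units, one grapheme.
const accented = "e\u0301";
console.log(accented.length, graphemesOf(accented).length); // → 2 1

// A flag emoji is a pair of regional indicator symbols (U+1F1EF U+1F1F5):
// four UTF-16 code units, one grapheme.
const flag = "\u{1F1EF}\u{1F1F5}";
console.log(flag.length, graphemesOf(flag).length); // → 4 1
```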