Terms in languages without word delimiters (CJK, Thai, etc.) are not detected #118

@aromarious

Description

Problem

Contextive currently relies on delimiter-based tokenisation (spaces, parentheses, etc.) and regex-based splitting (camelCase, snake_case) to identify candidate terms. This works well for languages that use spaces between words, but languages that don't use spaces to delimit words — such as Japanese, Chinese, Korean, Thai, Lao, and Khmer — cannot match glossary terms at all.

Reproduction

1. Create a glossary file:

   ```yaml
   contexts:
     - terms:
       - name: 注文
         definition: An order placed by a customer
   ```

2. Open a file containing the text 注文が届く
3. Hover over 注文 — no hover result is shown

Why this happens

The Tokeniser extracts the entire 注文が届く as a single token (no delimiters within it). The CandidateTerms regex then attempts to split it by camelCase/snake_case patterns, but since CJK characters are not in the \p{Lu} / \p{Ll} ranges used by the regex, no splitting occurs. The full string 注文が届く is looked up as-is in the index, which only contains 注文, so no match is found.
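The failure mode described above can be sketched in Python. This is a simplified model of the Tokeniser → CandidateTerms → index-lookup pipeline, not Contextive's actual implementation; stdlib `re` has no Unicode property classes, so ASCII `[A-Z]`/`[a-z]` stand in for `\p{Lu}`/`\p{Ll}` — CJK characters match neither class in either notation, so the outcome is the same:

```python
import re

# Hypothetical glossary index containing only the defined term
glossary = {"注文": "An order placed by a customer"}

def tokenise(text):
    # Delimiter-based tokenisation: split on spaces, parentheses,
    # and similar punctuation. 注文が届く contains none of these,
    # so it survives as a single token.
    return [t for t in re.split(r"[\s().,;:]+", text) if t]

def candidate_terms(token):
    # camelCase splitting. CJK characters match neither the
    # uppercase nor the lowercase class, so no split occurs and
    # the token falls through unchanged.
    parts = re.findall(r"[A-Z][a-z]*|[a-z]+", token)
    return parts or [token]

tokens = tokenise("注文が届く")
candidates = [c for t in tokens for c in candidate_terms(t)]
matches = [c for c in candidates if c in glossary]
# tokens == ["注文が届く"]; candidates == ["注文が届く"]; matches == []
```

For comparison, a camelCase identifier like `orderShipped` is split into `["order", "Shipped"]` by the same regex, which is why the existing approach works for space-delimited languages.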

Affected languages

Any language that does not use spaces between words:

| Language | Script | Example |
| --- | --- | --- |
| Japanese | Kanji / Kana | 注文が届く |
| Chinese | Hanzi | 购物车管理 |
| Korean | Hangul | 주문을처리 |
| Thai | Thai script | สวัสดีครับ |
| Lao | Lao script | ພາສາລາວ |
| Khmer | Khmer script | ភាសាខ្មែរ |
| Myanmar | Myanmar script | မြန်မာဘာသာ |
