Unless they've changed it, it's probably similar to CFStringTokenizer, which used ICU Boundary Analysis (and maybe MeCab for Japanese).
https://unicode-org.github.io/icu/userguide/boundaryanalysis...
Is that the same as the macOS dictionary being parsed here? It seems like a pretty big file to grep every time!
I assume it's converted at compile time to a more efficient query format.
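To make "a more efficient query format" concrete, here's a rough sketch of one common option: load the word list into a trie once, then do longest-match lookups against it instead of scanning the file on every query. This is only an illustration of the general idea. I don't know what format Apple or ICU actually use, and the names (`build_trie`, `longest_match`, `segment`) and the tiny word list are made up for the example.

```python
# Hypothetical sketch: a trie built once from a word list, then queried many times.
# Not a description of how macOS or ICU store their dictionaries.

_END = None  # sentinel key marking "a word ends here"; never equal to a character


def build_trie(words):
    """Build a nested-dict trie from an iterable of words (the one-time 'compile' step)."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[_END] = True
    return root


def longest_match(trie, text, start):
    """Return the length of the longest dictionary word starting at text[start], or 0."""
    node = trie
    best = 0
    for i in range(start, len(text)):
        node = node.get(text[i])
        if node is None:
            break
        if _END in node:
            best = i - start + 1
    return best


def segment(trie, text):
    """Greedy longest-match segmentation; unknown characters become one-character tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        length = longest_match(trie, text, pos) or 1
        tokens.append(text[pos:pos + length])
        pos += length
    return tokens


if __name__ == "__main__":
    trie = build_trie(["東京", "東京都", "都", "に", "住む"])
    print(segment(trie, "東京都に住む"))
```

Each lookup costs time proportional to the length of the match rather than the size of the dictionary, which is the whole point of not grepping the file every time. Real dictionary-based segmenters generally use more compact structures and smarter scoring than greedy longest match.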