How do I break a text or sentence into words?
At first this might look simple: we can just split the text with String.split(), using a space as the delimiter. But what if a word ends with a question mark (?) or an exclamation mark (!) instead? There may also be other rules that we need to take care of.
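To make the problem concrete, here is a minimal sketch of the naive approach. The class name, the helper method and the sample sentence are only illustrative assumptions; the point is that punctuation stays glued to the words.

```java
import java.util.Arrays;

class NaiveSplitExample {
    // Hypothetical helper: splits on single spaces only,
    // so punctuation remains attached to the neighbouring word.
    static String[] naiveSplit(String text) {
        return text.split(" ");
    }

    public static void main(String[] args) {
        String[] words = naiveSplit("Is the dog lazy? Yes, very!");
        // prints [Is, the, dog, lazy?, Yes,, very!]
        System.out.println(Arrays.toString(words));
    }
}
```

Searching this result for the word "lazy" with equals() would fail, because the array contains "lazy?" instead.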
java.text.BreakIterator makes this much simpler. The class's getWordInstance() factory method creates a BreakIterator instance for word breaks. Passing a locale when creating the BreakIterator makes the iterator break the text or sentence according to the rules of that locale. This is really helpful when we are working with a language such as Japanese or Chinese, where words are not separated by spaces.
Let us see an example of using the BreakIterator class.
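The original listing is not included in this text, so the following is only a sketch of what such a program could look like. The class name, the countWord helper, the sample sentence and the choice of Locale.US are assumptions, picked so that the program prints the kind of output listed further down.

```java
import java.text.BreakIterator;
import java.util.Locale;

class BreakIteratorExample {
    // Hypothetical helper: prints every word with its boundaries and
    // returns how many times `target` occurs (ignoring case).
    static int countWord(String text, String target, Locale locale) {
        BreakIterator iterator = BreakIterator.getWordInstance(locale);
        iterator.setText(text);

        int count = 0;
        int start = iterator.first();
        for (int end = iterator.next();
             end != BreakIterator.DONE;
             start = end, end = iterator.next()) {

            String word = text.substring(start, end);
            // The iterator also reports spaces and punctuation as
            // segments, so keep only those that start with a letter or digit.
            if (Character.isLetterOrDigit(word.charAt(0))) {
                System.out.println("'" + word + "' found at ("
                        + start + ", " + end + ")");
                if (word.equalsIgnoreCase(target)) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Assumed sample sentence.
        String text = "The quick brown fox jumps over the lazy dog.";

        System.out.println("Iterates each word: ");
        int count = countWord(text, "dog", Locale.US);
        System.out.println("Number of word 'dog' found = " + count);
    }
}
```

Each boundary pair is a start index (inclusive) and an end index (exclusive), which is why it can be passed directly to String.substring().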
Here is the program output:
Iterates each word:
'The' found at (0, 3)
'quick' found at (4, 9)
'brown' found at (10, 15)
'fox' found at (16, 19)
'jumps' found at (20, 25)
'over' found at (26, 30)
'the' found at (31, 34)
'lazy' found at (35, 39)
'dog' found at (40, 43)
Number of word 'dog' found = 1