Problem description:

I have a text that I have already split into sentences and words. Next I need to split it into tokens (`,`, `.`, `?`, `!`, ...), and this is where I'm having trouble. Can you advise which regex I should use?

This is my code, which splits the text into sentences and words:

    String s = ReadFromFile();
    String sentences[] = s.split("[.!?]\\s*");
    String words[][] = new String[sentences.length][];
    for (int i = 0; i < sentences.length; ++i)
    {
        words[i] = sentences[i].split("[\\p{Punct}\\s]+");
    }
    System.out.println(Arrays.deepToString(words));

So I have a separate array of sentences and an array of words, but I have a problem with the tokens.

Input data

Arithmetic operators are used in mathematical expressions in the same way that they are used in algebra. The following table lists the arithmetic operators:

Assume integer variable A holds 10 and variable B holds 20, then:

Expected result

. : , :

The simplest solution is not to use `split`, which requires you to describe what you don't want in the result, but to use `Matcher#find` and describe what you do want to find.

```
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "Arithmetic operators are used in mathematical expressions in the same way that they are used in algebra. The following table lists the arithmetic operators: Assume integer variable A holds 10 and variable B holds 20, then:";
// \p{Punct} matches a single ASCII punctuation character
Pattern p = Pattern.compile("\\p{Punct}");
// or Pattern.compile("[.]{3}|\\p{Punct}"); if you want "..." found as one token
Matcher m = p.matcher(s);
while (m.find()) {
    System.out.println(m.group()); // print each punctuation mark found
}
```

Output:

```
.
:
,
:
```

Instead of printing `m.group()`, you can store each match in a collection such as a `List`.
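As a minimal sketch of that last suggestion, the matches could be collected like this. The class name `PunctuationTokens` and the method `findTokens` are made up for illustration, and the pattern is the `"..."`-aware variant from the comment above; adjust it if you need other multi-character tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question's code.
class PunctuationTokens {
    // "..." is listed first so an ellipsis is kept as one token;
    // otherwise any single ASCII punctuation character is a token.
    private static final Pattern TOKEN = Pattern.compile("[.]{3}|\\p{Punct}");

    // Returns the punctuation tokens of the text in order of appearance.
    static List<String> findTokens(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String s = "Assume integer variable A holds 10 and variable B holds 20, then:";
        System.out.println(findTokens(s)); // prints [,, :]
    }
}
```

If you also want the tokens grouped per sentence, like your `words[][]` array, you could call `findTokens` on each piece of text separately. Note that `split("[.!?]\\s*")` already discards the sentence-ending punctuation, so in that case the sentences would have to be extracted in a way that keeps it.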