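I want to remove punctuation from text in Java. I know there is a pattern that matches all punctuation, \p{Punct}, but it removes every punctuation mark. I want to keep acronyms and hyphenated words, though. For example, when I strip punctuation I want to keep "m.i.t.", "state-of-the-art", "9.4", "11:00", "p.m.", and "976-4275".
I tried \p{Punct}, but it removes all punctuation: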
String text = "There's a string from M.I.T., written by Jason at 11:00 p.m. 976-4275, 9.5, another word is state-of-the-art.";
text = text.replaceAll("\\p{Punct}", ""); // String is immutable, so the result must be assigned
System.out.println(text);
The result is:
"There s a string from MIT written by Jason at 1100 pm 9764275 95 another word is stateoftheart"
But what I want is:
"There s a string from M.I.T. written by Jason at 11:00 p.m. 976-4275 9.5 another word is state-of-the-art"
Please include code.
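Solution: intersect \p{Punct} with a negated class that lists the characters to keep. Note that [^.] on its own keeps only periods, so the colon and hyphen have to be listed as well, and the result must be assigned back because String is immutable:
text = text.replaceAll("[\\p{Punct}&&[^.:-]]", "");
This keeps every period, colon, and hyphen, including a period that ends a sentence.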
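A character class alone cannot tell the period inside "9.5" from the one that ends a sentence. The following is a minimal sketch of one alternative, not a definitive answer: it keeps a '.', ':' or '-' only when it sits between two letters or digits, keeps dotted abbreviations such as "M.I.T." and "p.m." whole, and removes every other punctuation mark. The class name PunctuationFilter and the method strip are hypothetical names chosen for illustration, and the rule for what counts as an abbreviation (two or more single letters, each followed by a period) is an assumption.

```java
import java.util.regex.Pattern;

// Hypothetical helper, written for illustration only.
final class PunctuationFilter {

    // Group 1 captures punctuation we want to keep:
    //   - a dotted abbreviation: two or more "letter + period" pairs (M.I.T., p.m.)
    //   - a '.', ':' or '-' that has a letter or digit on both sides (9.5, 11:00, 976-4275)
    // The last alternative matches any other punctuation mark, leaving group 1 unset.
    private static final Pattern PUNCT = Pattern.compile(
            "(\\b(?:\\p{Alpha}\\.){2,}"
            + "|(?<=\\p{Alnum})[.:-](?=\\p{Alnum}))"
            + "|\\p{Punct}");

    static String strip(String text) {
        // "$1" re-inserts the kept text; when group 1 did not match,
        // Java substitutes nothing, so the punctuation mark is dropped.
        return PUNCT.matcher(text).replaceAll("$1");
    }

    public static void main(String[] args) {
        String text = "There's a string from M.I.T., written by Jason at 11:00 p.m. "
                + "976-4275, 9.5, another word is state-of-the-art.";
        System.out.println(strip(text));
        // prints: Theres a string from M.I.T. written by Jason at 11:00 p.m. 976-4275 9.5 another word is state-of-the-art
    }
}
```

Note that this sketch deletes the apostrophe, so "There's" becomes "Theres" rather than "There s". If a space is wanted there, one option is to replace apostrophes with a space in a separate pass before calling strip. \p{Alpha} and \p{Alnum} are ASCII-only by default, so non-ASCII letters would need the UNICODE_CHARACTER_CLASS flag or different classes.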