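First, you should extract the words to remove. Since you define a word as any run of characters other than whitespace and punctuation, the pattern can be written as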
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
...
// One or more characters that are neither whitespace nor punctuation
string pattern = @"[^\s\p{P}]+";
Now we collect the words with the help of LINQ and regular expressions:
string exclude = "A quick brown fox (лисиця) jumps over (very!) lazy dog";
HashSet<string> words = new HashSet<string>(Regex
.Matches(exclude, pattern)
.Cast<Match>()
.Select(match => match.Value), StringComparer.OrdinalIgnoreCase);
Then we remove these words, again with a regular expression:
string source = "Лисиця (Fox) is a red smart wild dog.";
string result = Regex.Replace(
source,
pattern,
match => words.Contains(match.Value) ? "" : match.Value);
Let's have a look:
Console.Write(result);
Outcome:
 () is  red smart wild .
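Please note that the words Лисиця, Fox, a and dog are removed. The comparison ignores case because the set is built with StringComparer.OrdinalIgnoreCase. Only the words themselves are deleted; the surrounding spaces and punctuation stay in place.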
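For convenience, the steps above can be gathered into one helper. This is only a minimal sketch: the names WordFilter and RemoveWords are made up for illustration and do not come from any library, and the pattern and comparer are the same ones used above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordFilter
{
    // One or more characters that are neither whitespace nor punctuation
    private const string Pattern = @"[^\s\p{P}]+";

    // Hypothetical helper: removes from source every word that occurs in exclude.
    // Words are compared case-insensitively (ordinal).
    public static string RemoveWords(string source, string exclude)
    {
        // Collect the words to remove into a case-insensitive set
        var words = new HashSet<string>(
            Regex.Matches(exclude, Pattern)
                 .Cast<Match>()
                 .Select(match => match.Value),
            StringComparer.OrdinalIgnoreCase);

        // Replace each excluded word with an empty string; keep everything else
        return Regex.Replace(
            source,
            Pattern,
            match => words.Contains(match.Value) ? "" : match.Value);
    }
}

public static class Program
{
    public static void Main()
    {
        string result = WordFilter.RemoveWords(
            "Лисиця (Fox) is a red smart wild dog.",
            "A quick brown fox (лисиця) jumps over (very!) lazy dog");

        Console.WriteLine(result);
    }
}
```

The helper rebuilds the set on every call, so if the same exclusion list is applied to many strings it is probably better to build the HashSet once and reuse it.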