Capture all consecutive all-caps words with regex in Python?

Asked by: 小点点

I am trying to match every run of consecutive all-caps words or phrases using a regular expression in Python. Given the following:

    text = "The following words are ALL CAPS. The following word is in CAPS."

the code should return:

    ALL CAPS, CAPS

I am currently using:

    matches = re.findall('[A-Z\s]+', text, re.DOTALL)

but this returns:

    ['T', ' ', ' ', ' ', ' ALL CAPS', ' T', ' ', ' ', ' ', ' ', ' CAPS']

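I obviously don't want the punctuation or the lone 'T'. I want to return only runs of consecutive words, or single words, that consist entirely of capital letters.

Thanks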

3 answers

Anonymous user

This one does the job:

    import re
    text = "tHE following words aRe aLL CaPS. ThE following word Is in CAPS."
    matches = re.findall(r"(\b(?:[A-Z]+[a-z]?[A-Z]*|[A-Z]*[a-z]?[A-Z]+)\b(?:\s+(?:[A-Z]+[a-z]?[A-Z]*|[A-Z]*[a-z]?[A-Z]+)\b)*)", text)
    print(matches)

Output:

    ['tHE', 'aLL CaPS', 'ThE', 'Is', 'CAPS']

Explanation:

(           : start group 1
  \b        : word boundary
  (?:       : start non-capturing group
    [A-Z]+  : 1 or more capitals
    [a-z]?  : 0 or 1 lowercase letter
    [A-Z]*  : 0 or more capitals
   |        : OR
    [A-Z]*  : 0 or more capitals
    [a-z]?  : 0 or 1 lowercase letter
    [A-Z]+  : 1 or more capitals
  )         : end group
  \b        : word boundary
  (?:       : start non-capturing group
    \s+     : 1 or more whitespace characters
    (?:[A-Z]+[a-z]?[A-Z]*|[A-Z]*[a-z]?[A-Z]+) : same as above
    \b      : word boundary
  )*        : repeat the non-capturing group 0 or more times
)           : end group 1
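
The breakdown above can also be written directly into the pattern with `re.VERBOSE`. The following is only a sketch of that idea: the names `WORD` and `PATTERN` are made up for illustration and do not appear in the answer, and the outer capturing group is dropped so that `findall` returns the whole match.

```python
import re

# WORD and PATTERN are hypothetical names, not part of the original answer.
# One "word": at least one capital, with at most one lowercase letter in it.
WORD = r"(?:[A-Z]+[a-z]?[A-Z]*|[A-Z]*[a-z]?[A-Z]+)"

PATTERN = re.compile(
    rf"""
    \b{WORD}\b          # first word, delimited by word boundaries
    (?:\s+{WORD}\b)*    # further words, each preceded by whitespace
    """,
    re.VERBOSE,
)

text = "tHE following words aRe aLL CaPS. ThE following word Is in CAPS."
matches = PATTERN.findall(text)
print(matches)  # ['tHE', 'aLL CaPS', 'ThE', 'Is', 'CAPS']
```

Note that this pattern deliberately tolerates one lowercase letter per word, which is looser than what the question asks for.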

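Anonymous user

Your regular expression relies on an explicit condition (whitespace following the letters).

    matches = re.findall(r"([A-Z]+\s?[A-Z]+[^a-z0-9\W])", text)

This captures repetitions of A to Z as long as they are not followed by a lowercase letter or a non-letter character.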
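
To see what this pattern does and does not match, here is a small sketch. The pattern is copied from the answer; the extra sample strings are assumptions chosen for illustration. For ASCII text, the final class `[^a-z0-9\W]` effectively means "an uppercase letter or an underscore", so every match needs at least three characters, and only one optional whitespace character is allowed inside a match.

```python
import re

# Pattern copied from the answer above; the sample inputs are illustrative.
pattern = re.compile(r"([A-Z]+\s?[A-Z]+[^a-z0-9\W])")

original = "The following words are ALL CAPS. The following word is in CAPS."
print(pattern.findall(original))         # ['ALL CAPS', 'CAPS']

# A run of three words is split: only one whitespace is allowed per match.
print(pattern.findall("ONE TWO THREE"))  # ['ONE TWO', 'THREE']

# A two-letter word is not matched: the pattern needs at least three characters.
print(pattern.findall("AB"))             # []
```

So it works for the sample sentence, but it may need adjusting for longer runs or very short words.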

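Anonymous user

Keeping your regex, you can use strip() and filter:

    import re
    string = "The following words are ALL CAPS. The following word is in CAPS."
    # list() is needed on Python 3, where filter() returns an iterator
    result = list(filter(None, [x.strip() for x in re.findall(r"\b[A-Z\s]+\b", string)]))
    # ['ALL CAPS', 'CAPS']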
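
The strip-and-filter step is needed because `[A-Z\s]+` can match whitespace on its own. One possible alternative, not taken from any of the answers, is to require each word to be made of capitals and allow whitespace only between words. The names `CAPS_RUN` and `find_caps_runs` below are hypothetical.

```python
import re

# Hypothetical helper, not taken from any of the answers above.
# One all-caps word, optionally followed by more all-caps words,
# each separated by whitespace; \b keeps "T" in "The" from matching.
CAPS_RUN = re.compile(r"\b[A-Z]+(?:\s+[A-Z]+)*\b")

def find_caps_runs(text):
    """Return every run of consecutive all-uppercase (ASCII) words in text."""
    return CAPS_RUN.findall(text)

text = "The following words are ALL CAPS. The following word is in CAPS."
print(find_caps_runs(text))  # ['ALL CAPS', 'CAPS']
```

With this sketch, single-letter capitalised words such as "I" also count as all-caps words; if that is unwanted, `[A-Z]+` could be changed to `[A-Z]{2,}`.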
