Python regular expression to match a whole word

Asked by: 小点点

I can't find the correct regular expression for the following scenario.

Let's say:

a = "this is a sample"

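I want to match whole words only. For example, matching "hi" should return False, because "hi" is not a word in the string, while "is" should return True, because there are no alphabetic characters directly to its left or right.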
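To see why a plain substring test is not enough for this, here is a minimal sketch. The variable names `hi_is_substring` and `is_is_substring` are made up for illustration and are not part of the question.

```python
a = "this is a sample"

# A substring test cannot tell a whole word from a fragment of one:
# "hi" occurs inside "this", so both checks succeed.
hi_is_substring = "hi" in a
is_is_substring = "is" in a

print(hi_is_substring, is_is_substring)  # True True
```

Both checks report True, which is exactly the behavior the question wants to avoid for "hi".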

3 answers

Anonymous user

Try

re.search(r'\bis\b', your_string)

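From the documentation:

\b matches the empty string, but only at the beginning or end of a word.

Note that the re module defines a "word" simply as "a sequence of alphanumeric or underscore characters", where "alphanumeric" depends on the locale or Unicode settings. (In Python 3, str patterns use Unicode matching by default.)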
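A short sketch of what that definition of "word" means in practice, assuming Python 3 and str patterns. The sample strings and variable names are invented for illustration.

```python
import re

pattern = re.compile(r"\bis\b")

# Underscores and digits count as word characters, so they do not
# create a word boundary next to "is".
underscore = bool(pattern.search("this_is_a_sample"))
digit = bool(pattern.search("is2"))

# A hyphen is not a word character, so "is" is bounded on both sides.
hyphen = bool(pattern.search("this-is-a-sample"))

# By default a str pattern treats the accented letter "\u00e9" as a word
# character; with re.ASCII only [a-zA-Z0-9_] count as word characters.
unicode_default = bool(re.search(r"\bis\b", "is\u00e9"))
ascii_only = bool(re.search(r"\bis\b", "is\u00e9", re.ASCII))

print(underscore, digit, hyphen)    # False False True
print(unicode_default, ascii_only)  # False True
```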

Also note that without the raw string prefix, \b is interpreted as the backspace character rather than as the regex word boundary.
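The difference between the raw and the plain string literal can be checked directly. This is only a sketch; the names `raw_pattern` and `plain_pattern` are illustrative.

```python
import re

text = "this is a sample"

raw_pattern = r"\bis\b"    # backslash followed by "b": the regex word boundary
plain_pattern = "\bis\b"   # "\b" here is the backspace character, "\x08"

raw_match = re.search(raw_pattern, text)
plain_match = re.search(plain_pattern, text)

print(len(raw_pattern), len(plain_pattern))  # 6 4
print(raw_match is not None)                 # True
print(plain_match is not None)               # False: text has no backspace characters
```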

Anonymous user

Try using the "word boundary" character class of the regex module, re:

>>> import re
>>> x = "this is a sample"
>>> y = "this isis a sample."
>>> regex = re.compile(r"\bis\b")  # To ignore case: re.compile(r"\bis\b", re.IGNORECASE)
>>> regex.findall(y)
[]
>>> regex.findall(x)
['is']

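From the documentation for re.search():

\b matches the empty string, but only at the beginning or end of a word

...

For example, r'\bfoo\b' matches 'foo', 'foo.', '(foo)', 'bar foo baz', but not 'foobar' or 'foo3'.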
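The documentation's examples can be reproduced with a few lines. This is a small sketch; `samples` and `results` are names chosen for illustration.

```python
import re

pattern = re.compile(r"\bfoo\b")
samples = ["foo", "foo.", "(foo)", "bar foo baz", "foobar", "foo3"]

# Map each sample string to whether the pattern finds a whole-word "foo" in it.
results = {s: bool(pattern.search(s)) for s in samples}

for s, matched in results.items():
    print(f"{s!r}: {matched}")
```

The first four strings match; 'foobar' and 'foo3' do not, because a letter or digit directly after "foo" means there is no word boundary there.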

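Anonymous user

I think the answers given do not fully achieve the behavior the OP wanted. Specifically, they do not produce the desired boolean output. They do illustrate the concept well, and I think they are excellent. Perhaps I can show what I mean by explaining why I think the OP chose these examples.

The string given was,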

a = "this is a sample"

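The OP then stated,

I want to match whole words only. For example, matching "hi" should return False, because "hi" is not a word ...

As I understand it, this refers to the search token "hi" as it is found inside the word "this". If someone searches the string a for the word "hi", they should receive False as the response.

The OP continues,

... and "is" should return True, because there are no alphabetic characters directly to its left or right.

In this case, the reference is to the search token "is" as it is found in the word "is". I hope this helps clarify why we use word boundaries. The other answers have the behavior "don't return a word unless that word is found by itself, not inside other words." The "word boundary" shorthand character class does this job nicely.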

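So far, only the word "is" has been used in the examples. I think these answers are correct, but I also think there is more to the question's basic meaning that needs to be addressed. The behavior of other search strings should be noted in order to understand the concept. In other words, we need to generalize the (excellent) answer by @Georg, which uses re.search(r"\bis\b", your_string). The same r"\bis\b" idea appears in the answer by @Omprakash, who began generalizing it by showing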

>>> y="this isis a sample."
>>> regex=re.compile(r"\bis\b")  # For ignore case: re.compile(r"\bis\b", re.IGNORECASE)
>>> regex.findall(y)
[]

Let's say the function that should exhibit the behavior I have discussed is named

find_only_whole_word(search_string, input_string)

The following behavior should then be expected.

>>> a = "this is a sample"
>>> find_only_whole_word("hi", a)
False
>>> find_only_whole_word("is", a)
True

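Once again, this is how I understand the OP's question. The answer from @Georg takes us a step toward that behavior, but its result is a little hard to interpret and use directly. That is,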

>>> import re
>>> a = "this is a sample"
>>> re.search(r"\bis\b", a)
<_sre.SRE_Match object; span=(5, 7), match='is'>
>>> re.search(r"\bhi\b", a)
>>>

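The second command produces no output, because re.search() returns None when nothing matches. The helpful answer from @Omprakash shows output, but not True or False.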
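Turning the result of re.search() into a boolean takes one extra step. The sketch below shows two equivalent ways; the names `is_found` and `hi_found` are illustrative only.

```python
import re

a = "this is a sample"

# re.search() returns a match object on success and None otherwise,
# so comparing against None yields the desired True/False.
is_found = re.search(r"\bis\b", a) is not None

# Match objects are always truthy, so bool() works as well.
hi_found = bool(re.search(r"\bhi\b", a))

print(is_found, hi_found)  # True False
```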

Here is a more complete sampling of the expected behavior.

>>> find_only_whole_word("this", a)
True
>>> find_only_whole_word("is", a)
True
>>> find_only_whole_word("a", a)
True
>>> find_only_whole_word("sample", a)
True
# Use "ample", part of the word, "sample": (s)ample
>>> find_only_whole_word("ample", a)
False
# (t)his
>>> find_only_whole_word("his", a)
False
# (sa)mpl(e)
>>> find_only_whole_word("mpl", a)
False
# Any random word
>>> find_only_whole_word("applesauce", a)
False
>>>

This can be accomplished with the following code:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
#
#@file find_only_whole_word.py

import re

def find_only_whole_word(search_string, input_string):
  # Build a pattern with word boundaries around the user's search_string
  raw_search_string = r"\b" + search_string + r"\b"

  match_output = re.search(raw_search_string, input_string)
  ## As noted by @Omprakash, if you want to ignore case, uncomment
  ## the next two lines
  #match_output = re.search(raw_search_string, input_string,
  #                         flags=re.IGNORECASE)

  # re.search() returns None when there is no match
  return match_output is not None

##endof:  find_only_whole_word(search_string, input_string)
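One caveat with the function above: search_string is inserted into the pattern as-is, so regex metacharacters in it (such as "." or "+") keep their special meaning. A possible variant, sketched here under that assumption, escapes the search string with re.escape() and exposes case-insensitivity as a parameter. The name find_whole_word_escaped and the ignore_case parameter are hypothetical and not part of the original answer.

```python
import re

def find_whole_word_escaped(search_string, input_string, ignore_case=False):
    """Return True if search_string occurs as a whole word in input_string.

    Hypothetical variant of find_only_whole_word: the search string is
    treated as literal text, not as a regular expression.
    """
    flags = re.IGNORECASE if ignore_case else 0
    # re.escape() makes every character of search_string match literally.
    pattern = r"\b" + re.escape(search_string) + r"\b"
    return re.search(pattern, input_string, flags=flags) is not None

a = "this is a sample"
print(find_whole_word_escaped("is", a))                    # True
print(find_whole_word_escaped("hi", a))                    # False
print(find_whole_word_escaped("IS", a, ignore_case=True))  # True
# Unescaped, the "." in "a.b" would match any character, e.g. the "x" in "axb".
print(find_whole_word_escaped("a.b", "axb"))               # False
```

Note that \b only marks a transition between word and non-word characters, so a search string that begins or ends with punctuation (for example "c++") may not behave as expected even when escaped.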

Here is a simple demonstration. Run the Python interpreter from the directory where you saved the file, find_only_whole_word.py.

>>> from find_only_whole_word import find_only_whole_word
>>> a = "this is a sample"
>>> find_only_whole_word("hi", a)
False
>>> find_only_whole_word("is", a)
True
>>> find_only_whole_word("cucumber", a)
False
# The excellent example from @OmPrakash
>>> find_only_whole_word("is", "this isis a sample")
False
>>>