new_state_line = """
08 FEB 20 HME FEB08 WEBLW HGH @10:08 359.00
08 FEB 20 HME FEB08 WEBLW HGH @10:10 550.00 912.00
18 FEB 20 JJ MAYOR WINNER 34.06 875.94
28 FEB 20 ADVICE CONFIRMS RBC280W5F82WW SOMETING GIVEN 3,459.00 4,333.94
02 MAR 20 STAGECOACH SHOW STOP 59.50 4,277.44
"""
I wrote the following regex pattern:
import re

pattern = r'(\d{2}\s[A-Z]{3}\s\d{2}) (.+)\s([0-9,]+\.[0-9]+)\s*(([0-9,]+\.[0-9]+)|$)'
for ech_line in new_state_line.split('\n'):
    reg = re.search(pattern, ech_line.upper())
    if reg:
        print(reg.group(3), reg.group(4))
It gives this output:
359.00
912.00
875.94
4,333.94
4,277.44
I would like to see output like this instead:
359.00 None
550.00 912.00
34.06 875.94
3,459.00 4,333.94
59.50 4,277.44
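This is written in Python. Can someone help me with the regex pattern? I'm lost here.
Your second capture group is too greedy: it consumes the first of the two numeric values you want. Adding a '?' after the quantifier makes it lazy, which leaves that value for the third capture group. Like this: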
(\d{2}\s[A-Z]{3}\s\d{2}) (.+?)\s([0-9,]+\.[0-9]+)\s*(([0-9,]+\.[0-9]+)|$)
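To see the lazy quantifier in action, here is a minimal sketch run against the sample lines from the question. One caveat: with the `|$` alternation, the fourth group comes back as an empty string rather than `None` when the balance is missing. The variant below makes the second amount an optional group instead, so a missing value is reported as `None`. The names `statement`, `AMOUNT`, `LAZY_PATTERN`, and `parse_amounts` are illustrative and not part of the original code.

```python
import re

statement = """
08 FEB 20 HME FEB08 WEBLW HGH @10:08 359.00
08 FEB 20 HME FEB08 WEBLW HGH @10:10 550.00 912.00
18 FEB 20 JJ MAYOR WINNER 34.06 875.94
28 FEB 20 ADVICE CONFIRMS RBC280W5F82WW SOMETING GIVEN 3,459.00 4,333.94
02 MAR 20 STAGECOACH SHOW STOP 59.50 4,277.44
"""

# A money value such as 359.00 or 3,459.00 (hypothetical helper name).
AMOUNT = r'[0-9,]+\.[0-9]+'

# Date, lazy description (.+?), a required amount, then an optional
# second amount. Because the second amount sits inside an optional
# non-capturing group, its capture group is None when it is absent.
LAZY_PATTERN = re.compile(
    r'(\d{2}\s[A-Z]{3}\s\d{2})\s(.+?)\s(' + AMOUNT + r')'
    r'(?:\s+(' + AMOUNT + r'))?\s*$'
)


def parse_amounts(text):
    """Return one (amount, balance_or_None) tuple per matching line."""
    pairs = []
    for line in text.splitlines():
        match = LAZY_PATTERN.search(line.upper())
        if match:
            pairs.append((match.group(3), match.group(4)))
    return pairs


for amount, balance in parse_amounts(statement):
    print(amount, balance)
# 359.00 None
# 550.00 912.00
# 34.06 875.94
# 3,459.00 4,333.94
# 59.50 4,277.44
```

Anchoring the pattern with `$` matters here: without it, the lazy group would be free to stop at the first amount-like token and ignore the rest of the line.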
In fact, there is a simpler way:
new_state_line = """
08 FEB 20 HME FEB08 WEBLW HGH @10:08 359.00
08 FEB 20 HME FEB08 WEBLW HGH @10:10 550.00 912.00
18 FEB 20 JJ MAYOR WINNER 34.06 875.94
28 FEB 20 ADVICE CONFIRMS RBC280W5F82WW SOMETING GIVEN 3,459.00 4,333.94
02 MAR 20 STAGECOACH SHOW STOP 59.50 4,277.44
"""
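lines = new_state_line.split("\n")
result = []
for line in lines:
    data = line.split()
    if not data:
        continue  # skip the blank lines at the start and end of the string
    try:
        # Strip thousands separators so values like "3,459.00" parse as floats
        float(data[-2].replace(",", ""))
        result.append((data[-2], data[-1]))
    except ValueError:
        # The second-to-last token is not a number, so only one value is present
        result.append((data[-1],))
print(result)
# [('359.00',), ('550.00', '912.00'), ('34.06', '875.94'), ('3,459.00', '4,333.94'), ('59.50', '4,277.44')]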
Much simpler, right? If you found my answer helpful, please accept it.