Remove the string before a certain word with R

Asked by: 小点点

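I have a character vector that I need to clean. Specifically, I want to remove the number that comes before the word "Votes". Note that the number uses a comma as the thousands separator, so it is easier to treat it as a string.

I know that gsub(".*Votes", "", text) will remove everything, but how do I remove just the number? Also, how can I collapse the repeated spaces into a single space?

Thanks for your help!

Example data: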

text <- "STATE QUESTION NO. 1                       Amendment to Title 15 of the Nevada Revised Statutes Shall Chapter 202 of the Nevada Revised Statutes be amended to prohibit, except in certain circumstances, a person from selling or transferring a firearm to another person unless a federally-licensed dealer first conducts a federal background check on the potential buyer or transferee?                    558,586 Votes"

2 answers

Anonymous user

You can use

text <- "STATE QUESTION NO. 1                       Amendment to Title 15 of the Nevada Revised Statutes Shall Chapter 202 of the Nevada Revised Statutes be amended to prohibit, except in certain circumstances, a person from selling or transferring a firearm to another person unless a federally-licensed dealer first conducts a federal background check on the potential buyer or transferee?                    558,586 Votes"
trimws(gsub("(\\s){2,}|\\d[0-9,]*\\s*(Votes)", "\\1\\2", text))
# => [1] "STATE QUESTION NO. 1 Amendment to Title 15 of the Nevada Revised Statutes Shall Chapter 202 of the Nevada Revised Statutes be amended to prohibit, except in certain circumstances, a person from selling or transferring a firearm to another person unless a federally-licensed dealer first conducts a federal background check on the potential buyer or transferee? Votes"

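See the online R demo and the online regex demo.

Details

(\\s){2,} - matches 2 or more whitespace characters while capturing the last one matched, which the \1 placeholder reinserts in the replacement
| - or
\\d - a digit
[0-9,]* - 0 or more digits or commas
\\s* - 0 or more whitespace characters
(Votes) - Group 2 (restored in the output with the \2 placeholder): the substring Votes

Note that trimws will remove any leading/trailing whitespace.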
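
If the single combined pattern is hard to read, the same cleanup can probably be split into two passes. The following is only a sketch, not part of the answer above: `sample_text` is a shortened, made-up stand-in for the example data, and the intermediate variable names are illustrative.

```r
# Shortened, hypothetical stand-in for the example data
sample_text <- "STATE QUESTION NO. 1     Amendment to Title 15?          558,586 Votes"

# Pass 1: drop a comma-grouped number that sits directly before "Votes",
# keeping the word itself via capture group 1
no_number <- gsub("\\d[0-9,]*\\s*(Votes)", "\\1", sample_text)

# Pass 2: collapse every run of whitespace into a single space,
# then trim leading/trailing whitespace
result <- trimws(gsub("\\s+", " ", no_number))

print(result)
# [1] "STATE QUESTION NO. 1 Amendment to Title 15? Votes"
```

Two passes cost an extra scan of the string, but each pattern does one job, which may be easier to adjust later (for example, if the label is not always "Votes").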

Anonymous user

The easiest way is to use stringr:

> library(stringr)
> regexp <- "-?[[:digit:]]+\\.*,*[[:digit:]]*\\.*,*[[:digit:]]* Votes+"
> str_extract(text,regexp)
[1] "558,586 Votes"

To do the same thing but extract only the number, wrap it in gsub:

> gsub('\\s+[[:alpha:]]+', '', str_extract(text,regexp))
[1] "558,586"

Here is a base R version that pulls out every number in front of the word "Votes", even if it contains commas or periods:

> gsub('\\s+[[:alpha:]]+', '', unlist(regmatches (text,gregexpr("-?[[:digit:]]+\\.*,*[[:digit:]]*\\.*,*[[:digit:]]* Votes+",text) )) )
[1] "558,586"

If you want the label as well, just drop the gsub part:

> unlist(regmatches (text,gregexpr("-?[[:digit:]]+\\.*,*[[:digit:]]*\\.*,*[[:digit:]]* Votes+",text) )) 
[1] "558,586 Votes"

If you want to pull out all of the numbers:

> unlist(regmatches (text,gregexpr("-?[[:digit:]]+\\.*,*[[:digit:]]*\\.*,*[[:digit:]]*",text) ))
[1] "1"       "15"      "202"     "558,586"
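
As a possible alternative to extracting "558,586 Votes" and then stripping the label with gsub, a lookahead can match only the number. This is a sketch rather than part of the answer above: it assumes perl = TRUE (lookaheads are not available in R's default regex engine), and `sample_text` is a shortened, made-up stand-in for the example data.

```r
# Shortened, hypothetical stand-in for the example data
sample_text <- "STATE QUESTION NO. 1     Amendment to Title 15?          558,586 Votes"

# Match a digit followed by digits/commas, but only when optional
# whitespace and the word "Votes" come next; the lookahead (?=...)
# checks for the label without including it in the match
m <- regexpr("[0-9][0-9,]*(?=\\s*Votes)", sample_text, perl = TRUE)
votes <- regmatches(sample_text, m)

print(votes)
# [1] "558,586"
```

The other numbers in the string ("1" and "15") are skipped because they are not followed by "Votes".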
