Given this DOM:
$html=<<<'EOD'
<div class='container clickable' data-param='{"footer":"<div>Bye</div>","info":"We win"}'>
<img src='a.jpg' />
</div>
<a href='a.html'>The A</a>
<span></span>
<span data-span-param='{"detailTag":"<span class=\"link\">Anything here</span>"}'>
<a></a>
</span>
EOD;
I am trying to preg_match_all the HTML tags with the following expression:
$tags = array();
if (preg_match_all('~<\s*[\w]+[^>]*>|<\s*/\s*[\w]+\s*>~im', $html, $matchall, PREG_SET_ORDER)) {
    foreach ($matchall as $m) {
        $tags[] = $m[0];
    }
}
print_r($tags);
The output of this expression is:
Array
(
[0]=>
Expected output:
Array
(
[0] => <div class='container clickable' data-param='{"footer":"<div>Bye</div>","info":"We win"}'>
[1] => <img src='a.jpg' />
[2] => </div>
[3] => <a href='a.html'>
[4] => </a>
[5] => <span>
[6] => </span>
[7] => <span data-span-param='{"detailTag":"<span class=\"link\">Anything here</span>"}'>
[8] => <a>
[9] => </a>
[10] => </span>
)
This regex works in your code as-is, with no additional code needed:
<\s*(?:/\s*)?\w++(?>[^>'"]++|'[^']+'|"[^"]+")*>
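To see why this helps, here is a minimal sketch that runs the pattern above against the `$html` from the question. The original expression stops at the first `>` it meets, even when that `>` sits inside a quoted attribute value. This pattern consumes quoted values as whole units, so a `>` inside quotes cannot end the tag. The pattern is written in extended (`x`) mode only so that each part can carry a comment. The variable names `$pattern`, `$m`, `$old` and `$tags` are my own choices for illustration.

```php
<?php
// Sample input from the question (nowdoc, so nothing is interpolated).
$html = <<<'EOD'
<div class='container clickable' data-param='{"footer":"<div>Bye</div>","info":"We win"}'>
<img src='a.jpg' />
</div>
<a href='a.html'>The A</a>
<span></span>
<span data-span-param='{"detailTag":"<span class=\"link\">Anything here</span>"}'>
<a></a>
</span>
EOD;

// Same pattern as above, spread out in extended (x) mode with comments.
$pattern = <<<'REGEX'
~
<\s*(?:/\s*)?      # opening bracket, then an optional slash for closing tags
\w++               # tag name (possessive, so no backtracking into it)
(?>                # atomic group, repeated for the rest of the tag
    [^>'"]++       #   any run of characters other than a closing bracket or a quote
  | '[^']+'        #   a single-quoted value, taken whole
  | "[^"]+"        #   a double-quoted value, taken whole
)*
>                  # the bracket that really closes the tag
~x
REGEX;

// For comparison: the first match of the expression from the question.
preg_match('~<\s*[\w]+[^>]*>|<\s*/\s*[\w]+\s*>~im', $html, $old);
echo $old[0], "\n"; // cut off at the ">" inside the data-param value

// With the default PREG_PATTERN_ORDER, $m[0] holds every full match.
preg_match_all($pattern, $html, $m);
$tags = $m[0];
print_r($tags);
```

On this input the result has eleven entries, the same as the expected output listed in the question. One caveat: since the quoted alternatives use `+`, a tag with an empty attribute value such as `alt=''` would not be matched. Changing `[^']+` and `[^"]+` to `[^']*` and `[^"]*` should cover that case, if your markup needs it.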