Asked by: 小点点

preg_match_all HTML tags, except tags inside double or single quotes


Given this DOM:

$html=<<<'EOD'
<div class='container clickable' data-param='{"footer":"<div>Bye</div>","info":"We win"}'>
 <img src='a.jpg' />
</div>
<a href='a.html'>The A</a>
<span></span>
<span data-span-param='{"detailTag":"<span class=\"link\">Anything here</span>"}'>
 <a></a>
</span>  
EOD;

I am trying to preg_match_all the HTML tags using the following expression:

$tags = array();
if(preg_match_all('~<\s*[\w]+[^>]*>|<\s*/\s*[\w]+\s*>~im',$html,$matchall,PREG_SET_ORDER)){
   foreach($matchall as $m){
       $tags[] = $m[0];
   }
}  
print_r($tags);

The output of this expression is:

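Array
(
[0] => <div class='container clickable' data-param='{"footer":"<div>
[1] => </div>
[2] => <img src='a.jpg' />
[3] => </div>
[4] => <a href='a.html'>
[5] => </a>
[6] => <span>
[7] => </span>
[8] => <span data-span-param='{"detailTag":"<span class=\"link\">
[9] => </span>
[10] => <a>
[11] => </a>
[12] => </span>
)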

Desired output:

Array
(
[0] => <div class='container clickable' data-param='{"footer":"<div>Bye</div>","info":"We win"}'>
[1] => <img src='a.jpg' />
[2] => </div>
[3] => <a href='a.html'>
[4] => </a>
[5] => <span>
[6] => </span>
[7] => <span data-span-param='{"detailTag":"<span class=\"link\">Anything here</span>"}'>
[8] => <a>
[9] => </a>
[10] => </span>
)

Anonymous user

This regular expression works in your code as is; no additional code is needed:

<\s*(?:/\s*)?\w++(?>[^>'"]++|'[^']+'|"[^"]+")*>

Demo
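
As a rough sketch of how the pattern drops into the code from the question: the variable name $pattern and the nowdoc label REGEX are illustrative choices, not part of the original post. The pattern is kept in a nowdoc only so that the single quotes inside it need no escaping.

```php
<?php
// Sample input copied from the question (nowdoc: quotes and backslashes stay literal).
$html = <<<'EOD'
<div class='container clickable' data-param='{"footer":"<div>Bye</div>","info":"We win"}'>
 <img src='a.jpg' />
</div>
<a href='a.html'>The A</a>
<span></span>
<span data-span-param='{"detailTag":"<span class=\"link\">Anything here</span>"}'>
 <a></a>
</span>
EOD;

// Pattern from the answer, delimited with "~" as in the question.
//   <\s*(?:/\s*)?   "<" plus an optional "/" for closing tags
//   \w++            tag name (possessive quantifier)
//   (?> ... )*      atomic group, repeated; each pass consumes one of:
//     [^>'"]++        a run of characters that are neither ">" nor a quote
//     '[^']+'         a whole single-quoted value (may contain "<" and ">")
//     "[^"]+"         a whole double-quoted value (may contain "<" and ">")
//   >               the ">" that closes the tag itself
$pattern = <<<'REGEX'
~<\s*(?:/\s*)?\w++(?>[^>'"]++|'[^']+'|"[^"]+")*>~
REGEX;

$tags = array();
if (preg_match_all($pattern, $html, $matchall, PREG_SET_ORDER)) {
    foreach ($matchall as $m) {
        $tags[] = $m[0]; // $m[0] is the full text of one matched tag
    }
}
print_r($tags); // 11 entries for the sample input
```

Because each quoted attribute value is consumed as a single unit, the tags embedded in data-param and data-span-param are never considered as match starts. One caveat with the pattern as written: '[^']+' and "[^"]+" require at least one character, so a tag with an empty attribute value such as class='' would not match. Changing those two + quantifiers to * should allow that case.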