Asked by: 小点点

Getting multiple headings and the first paragraph under each one


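I'm having a hard time figuring out how to get multiple headings along with the first paragraph under each of them. In this case I only need the h3 headings and the paragraph that follows each one.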

Sample code

function everything_in_tags($string, $tagname)
{
    $pattern = "#<\s*?$tagname\b[^>]*>(.*?)</$tagname\b[^>]*>#s";
    preg_match($pattern, $string, $matches);
    return $matches[1];
}
$tagname = "h3";

$string = "<h1>This is my title</h1>

<p>This is a text right under my h1 title.</p>
<p>This is some more text under my h1 title</p>

<h2>This is my level 2 heading</h2>
<p>This is text right under my level 2 heading</p>

<h3>First h3</h3>
<p>First paragraph for the first h3</p>

<h3>Second h3</h3>
<p>First paragraph for the second h3</p>

<h3>Third h3</h3>
<p>First paragraph for the third h3</p>
<p>Second paragraph for the third h3</p>

<h2>This is my level 2 heading</h2>
<p>This is text right under my level 2 heading</p>";

//OUTPUT: First h3
echo everything_in_tags($string, $tagname);

I'd like to use a foreach loop like the one below, but for that the function above has to work as intended.

// Intended loop: $headings and $paragraphs would be parallel arrays
foreach ($headings as $i => $heading) {
    echo "<h3>".$heading."</h3>";
    echo "<p>".$paragraphs[$i]."</p>";
}

//Expected output:
//<h3>First h3</h3>
//<p>First paragraph for the first h3</p>

//<h3>Second h3</h3>
//<p>First paragraph for the second h3</p>

//<h3>Third h3</h3>
//<p>First paragraph for the third h3</p>

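So with the code above I can get the first h3. But even after a lot of reading, I can't figure out how to get all of the h3 headings and the first paragraph of each.

I'd be very grateful if someone could point me in the right direction and explain how to do this. Thanks a lot.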
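Only the first h3 comes back because preg_match() stops after the first match, whereas preg_match_all() collects every match. If the markup is as controlled as the sample (each h3 followed by its p with only whitespace in between), one pattern can capture both parts at once and produce the two parallel arrays the loop above expects. This is a minimal sketch under that assumption, not a general HTML parser; the pattern is illustrative, and the answer below explains why a DOM parser is usually the safer route.

```php
<?php
// Sketch only: assumes controlled HTML where every <h3> we care about is
// followed, after optional whitespace, directly by a <p>.
$string = "<h2>This is my level 2 heading</h2>
<p>This is text right under my level 2 heading</p>

<h3>First h3</h3>
<p>First paragraph for the first h3</p>

<h3>Second h3</h3>
<p>First paragraph for the second h3</p>

<h3>Third h3</h3>
<p>First paragraph for the third h3</p>
<p>Second paragraph for the third h3</p>";

// Group 1: heading text (the tempered token stops it from running past </h3>).
// Group 2: the paragraph immediately after the heading (lazy, so first </p> wins).
$pattern = '#<h3\b[^>]*>((?:(?!</h3>).)*)</h3>\s*<p\b[^>]*>(.*?)</p>#s';

// preg_match() would stop at the first hit; preg_match_all() returns all of them.
preg_match_all($pattern, $string, $matches);

$headings   = $matches[1];
$paragraphs = $matches[2];

foreach ($headings as $i => $heading) {
    echo "<h3>" . $heading . "</h3>\n";
    echo "<p>" . $paragraphs[$i] . "</p>\n";
}
```

The h2 paragraph and the third h3's second paragraph are skipped because neither sits directly after an h3.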


1 Answer

Anonymous user

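There's an obligatory, de facto answer to this: don't parse HTML with regular expressions. There are exceptions, such as HTML you fully control or cases where the occasional error doesn't matter, but in general I agree with it. Instead, I'd point you to something DOM-aware, where you can work in terms of HTML elements and the concept of "the next element".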
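As a minimal illustration of that idea (a sketch, not part of the original answer), XPath can express "the element right after each h3" directly: `following-sibling::*[1]` selects the nearest following element, skipping whitespace text nodes, and `[self::p]` keeps it only if it is a p. The variable names here are illustrative.

```php
<?php
$html = <<<HTML
<h2>This is my level 2 heading</h2>
<p>This is text right under my level 2 heading</p>

<h3>First h3</h3>
<p>First paragraph for the first h3</p>

<h3>Second h3</h3>
<p>First paragraph for the second h3</p>

<h3>Third h3</h3>
<p>First paragraph for the third h3</p>
<p>Second paragraph for the third h3</p>
HTML;

$dom = new DOMDocument();
// With no flags, libxml wraps the fragment in <html><body> for us.
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$pairs = [];
foreach ($xpath->query('//h3') as $h3) {
    // Relative query evaluated with the current <h3> as context node.
    $p = $xpath->query('following-sibling::*[1][self::p]', $h3)->item(0);
    $pairs[] = [$h3->textContent, $p ? $p->textContent : null];
}

foreach ($pairs as [$heading, $paragraph]) {
    echo "<h3>$heading</h3>\n";
    if ($paragraph !== null) {
        echo "<p>$paragraph</p>\n";
    }
}
```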

Here is a working example, although you may need to adjust where I'm dumping the output.

<?php

$html = <<<TAG
<h1>This is my title</h1>

<p>This is a text right under my h1 title.</p>
<p>This is some more text under my h1 title</p>

<h2>This is my level 2 heading</h2>
<p>This is text right under my level 2 heading</p>

<h3>First h3</h3>
<p>First paragraph for the first h3</p>

<h3>Second h3</h3>
<p>First paragraph for the second h3</p>

<h3>Third h3</h3>
<p>First paragraph for the third h3</p>
<p>Second paragraph for the third h3</p>

<h2>This is my level 2 heading</h2>
<p>This is text right under my level 2 heading</p>
TAG;


$dom = new DOMDocument();
// Load the HTML, don't worry about it being a fragment
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

$xpath = new DOMXPath($dom);

// Grab all H3 tags. This might need to be adjusted if there's more to the depth
$results = $xpath->query("//h3");
foreach ($results as $result) {
    var_dump(sprintf('<h3>%1$s</h3>', $result->textContent));
    
    // See if the next element (text nodes are skipped) is a P tag.
    // Note: nextElementSibling is available on DOMElement since PHP 8.0.
    $next = $result->nextElementSibling;
    if ($next && 'p' === $next->nodeName) {
        var_dump(sprintf('<p>%1$s</p>', $next->textContent));
    }
}

Output:

string(17) "<h3>First h3</h3>"
string(39) "<p>First paragraph for the first h3</p>"
string(18) "<h3>Second h3</h3>"
string(40) "<p>First paragraph for the second h3</p>"
string(17) "<h3>Third h3</h3>"
string(39) "<p>First paragraph for the third h3</p>"

Demo: https://3v4l.org/gvBrv
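The example above pairs each h3 only with the element immediately after it. If the real pages can have other elements between a heading and its paragraph, one possible adjustment is to walk the following siblings until the first p, giving up as soon as another heading starts. This is a hedged sketch: the helper name firstParagraphPerHeading() is hypothetical, and treating any h1-h6 as the end of a section is an assumption about the markup. Walking nextSibling and checking nodeType also avoids relying on nextElementSibling.

```php
<?php
$html = <<<HTML
<h3>First h3</h3>
<p>First paragraph for the first h3</p>

<h3>Second h3</h3>
<div>Not a paragraph</div>
<p>First paragraph for the second h3</p>

<h3>Third h3</h3>
<h3>Fourth h3</h3>
<p>First paragraph for the fourth h3</p>
HTML;

/**
 * Hypothetical helper: for each heading of $tag, return its text and the
 * text of the first <p> after it, or null if another heading comes first.
 */
function firstParagraphPerHeading(string $html, string $tag = 'h3'): array
{
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $result = [];
    foreach ($dom->getElementsByTagName($tag) as $heading) {
        $paragraph = null;
        // nextSibling also visits whitespace text nodes, so skip non-elements.
        for ($node = $heading->nextSibling; $node !== null; $node = $node->nextSibling) {
            if ($node->nodeType !== XML_ELEMENT_NODE) {
                continue;
            }
            if (preg_match('/^h[1-6]$/', $node->nodeName)) {
                break; // a new section starts: no paragraph for this heading
            }
            if ($node->nodeName === 'p') {
                $paragraph = $node->textContent;
                break;
            }
        }
        $result[] = ['heading' => $heading->textContent, 'paragraph' => $paragraph];
    }
    return $result;
}

foreach (firstParagraphPerHeading($html) as $item) {
    echo "<h3>" . $item['heading'] . "</h3>\n";
    if ($item['paragraph'] !== null) {
        echo "<p>" . $item['paragraph'] . "</p>\n";
    }
}
```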