Regex to remove images with a style attribute from HTML
Question
I am new to Regex, but I decided it was the easiest route to what I needed to do. Basically, I have a string (in PHP) which contains a whole load of HTML code, and I want to remove any img tags which have style=display:none.
For example:
<img src="https://www.it1352.com/1824314.html" />
<img src="https://www.it1352.com/1824314.html" >
etc.
So far my Regex is:
<img.*style=.*display.*:.*none;.* >
But that seems to leave bits of HTML behind, and it also takes the next element away when used in PHP with preg_replace.
Accepted answer
Like Michael pointed out, you don't want to use Regex for this purpose. A Regex does not know what an element tag is. <foo> is as meaningful as >foo< unless you teach it the difference. Teaching it the difference is incredibly tedious, though.
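To make that concrete, here is a small sketch of what the pattern from the question does on a single line of markup. The sample input and file names are made up for illustration. The second pattern is only a bounded variant for comparison, not a recommendation: it would still break on a ">" inside an attribute value.

```php
<?php
// Hypothetical input: file names and style values are assumptions for illustration.
$html = '<img src="a.png" style="display: none;" ><p>keep me</p><img src="b.png" >';

// Pattern from the question. Every .* is greedy and knows nothing about tag
// boundaries, so the final ".* >" runs on to the LAST " >" in the string.
$greedy = preg_replace('/<img.*style=.*display.*:.*none;.* >/', '', $html);

// Bounded variant: [^>]* cannot cross the closing ">" of the current tag.
$bounded = preg_replace('/<img\b[^>]*style=[^>]*display\s*:\s*none[^>]*>/i', '', $html);

var_dump($greedy);   // everything was swallowed, including <p> and the second image
var_dump($bounded);  // only the hidden image is gone
```

Because "." does not match a newline by default, the same greedy pattern on multi-line input can also stop short at a line break, which is one way bits of HTML get left behind.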
DOM is very convenient for this:
$html = <<< HTML
<img src="https://www.it1352.com/1824314.html" style="display:none" />
<IMG src="https://www.it1352.com/1824314.html" style="display: none" >
<img src="https://www.it1352.com/1824314.html" style="width:11px" >
HTML;
The above is our (invalid) markup. We feed it to DOM like this:
$dom = new DOMDocument();
$dom->loadHtml($html);
$dom->normalizeDocument();
Now we query the DOM for all IMG elements that have a style attribute containing the text "display". We could query for "display: none" directly in the XPath, but our input markup also has occurrences with no space in between:
$xpath = new DOMXPath($dom);
foreach($xpath->query('//img[contains(@style, "display")]') as $node) {
    $style = str_replace(' ', '', $node->getAttribute('style'));
    if(strpos($style, 'display:none') !== FALSE) {
        $node->parentNode->removeChild($node);
    }
}
We iterate over the IMG nodes and remove all spaces from their style attribute content. Then we check if it contains "display:none" and, if so, remove the element from the DOM.
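If the markup is messier than the sample, the same loop can be made a little more tolerant. The following is only a sketch under a few assumptions: removeHiddenImages() is a hypothetical helper name, and the input string is made up. It strips all whitespace (not just spaces) and compares in lower case, so "DISPLAY : none" is caught as well.

```php
<?php
// removeHiddenImages() is a hypothetical helper name, not part of any library.
function removeHiddenImages(string $html): string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // XPath 1.0 has no lower-case(); translate() folds "DISPLAY" to "display".
    $query = '//img[contains(translate(@style, "DISPLAY", "display"), "display")]';
    foreach ($xpath->query($query) as $node) {
        // Strip every kind of whitespace, then compare in lower case.
        $style = strtolower(preg_replace('/\s+/', '', $node->getAttribute('style')));
        if (strpos($style, 'display:none') !== false) {
            $node->parentNode->removeChild($node);
        }
    }
    return $dom->saveHTML();
}

// Hypothetical input for illustration.
echo removeHiddenImages(
    '<img src="hidden.png" style="DISPLAY : none"><img src="visible.png" style="width:11px">'
);
```

Removing nodes inside the foreach is fine here because the XPath result is a static list, unlike the live list returned by getElementsByTagName().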
Now we only need to save our HTML:
echo $dom->saveHTML();
which gives us:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><img src="https://www.it1352.com/1824314.html" style="width:11px"></body></html>
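As the output shows, loadHTML() wraps the input in a doctype plus html and body elements. If you only want the fragment back, one option is to serialize just the children of body. This is a minimal sketch with a made-up input string; it relies on saveHTML() accepting an optional node argument.

```php
<?php
$dom = new DOMDocument();
// Hypothetical fragment for illustration.
$dom->loadHTML('<p>intro</p><img src="visible.png" style="width:11px">');

// loadHTML() wrapped the fragment in <html><body>; serialize only the
// children of <body> to get the fragment back without the wrapper.
$body = $dom->getElementsByTagName('body')->item(0);
$fragment = '';
foreach ($body->childNodes as $child) {
    $fragment .= $dom->saveHTML($child);   // saveHTML() accepts an optional node
}

echo $fragment;
```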
Screw Regex!
Addendum: you might also be interested in Parsing XML documents with CSS selectors