How to extract image src from HTML Content using PHP

Updated on September 1, 2017

If you want to load images from RSS feeds, you need to know how to parse the item content to find the image source, because some feeds do not include an explicit media:content tag. Instead, they embed the images in the HTML delivered in the content:encoded or description tags. Here I will show two ways to extract the image source and then recommend the more reliable one.
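As a rough sketch of where that HTML comes from, the snippet below reads content:encoded from each item and falls back to description when it is missing. The sample feed, the example.com URL and the helper name itemHtml are assumptions made for illustration; a real feed would be fetched from its own URL.

```php
<?php
// Hypothetical sample feed, inlined so the sketch runs on its own.
$rss = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Example feed</title>
    <item>
      <title>Example item</title>
      <description>Short summary without an image.</description>
      <content:encoded><![CDATA[<p><img src="https://example.com/a.jpg" alt="A" /></p><p>Body text.</p>]]></content:encoded>
    </item>
  </channel>
</rss>
XML;

// Hypothetical helper: returns the HTML body of one RSS item.
function itemHtml(SimpleXMLElement $item): string
{
    // content:encoded belongs to the RSS "content" module namespace.
    $content = $item->children('http://purl.org/rss/1.0/modules/content/');
    $encoded = (string) $content->encoded;

    // Fall back to the plain description when content:encoded is absent.
    return $encoded !== '' ? $encoded : (string) $item->description;
}

// LIBXML_NOCDATA merges CDATA sections into ordinary text nodes.
$feed = simplexml_load_string($rss, 'SimpleXMLElement', LIBXML_NOCDATA);

$descriptions = [];
foreach ($feed->channel->item as $item) {
    $descriptions[] = itemHtml($item);
}

echo $descriptions[0], PHP_EOL;
```

Each string collected this way plays the role of $description in the extraction code later in this article.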


There are two ways to do this: one uses PHP regular expression matching, the other uses the Document Object Model (DOM).

Using regexp: Parsing HTML with regular expressions is a bad idea and is likely to lead to unmaintainable and unreliable code. Still, here is how to extract the image source from the HTML content shown below.

<p>
<img title="Renault Clio Advertisement" src="https://techglimpse.com/wp-content/uploads/2013/03/renault-clio-video-ad.jpg" alt="renault clio video ad Renault Clios Va Va voom brings Lingerie girls in Front of you : Prank Video" width="637" height="348" />
</p>
<p>The ad was created by Unruly and Scorch London, where the guys take Renault Clio for a test drive and after driving few meters into the street, the salesman shows of a &#8220;va va voom&#8221; button, which actually brings a romantic scene featuring couples and then the real hot scene; the hot lingerie girls dancing around you.</p>

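The regexp code is as follows, assuming the HTML above is stored in $description. To also extract title and alt, you would need a separate, well-defined regular expression for each attribute. Because real-world HTML is not guaranteed to be well-formed or consistently written, this way of extracting is fragile.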

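// Capture the value of every double-quoted src attribute found in $description.
if (preg_match_all('/src="([^"]*)"/', $description, $result)) {
    echo "Image Source : " . $result[1][0]; // first match only
}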
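To make the fragility concrete, here is a small sketch that runs the same pattern over a few inputs. The sample strings are hypothetical, chosen only to resemble markup a feed could plausibly contain.

```php
<?php
// Same pattern as in the article: any double-quoted src attribute.
$pattern = '/src="([^"]*)"/';

// Hypothetical inputs, not taken from a real feed.
$samples = [
    'double quotes' => '<img src="a.jpg" alt="A">',
    'single quotes' => "<img src='b.jpg' alt='B'>",
    'script tag'    => '<script src="tracker.js"></script><img src="c.jpg">',
    'data-src attr' => '<img data-src="lazy.jpg" src="d.jpg">',
];

$found = [];
foreach ($samples as $label => $html) {
    preg_match_all($pattern, $html, $m);
    $found[$label] = $m[1]; // captured src values for this sample
    echo $label, ': ', implode(', ', $m[1]), PHP_EOL;
}
```

The single-quoted attribute is missed entirely, the script URL is reported as if it were an image, and the data-src value is picked up because the pattern also matches the tail of that attribute name. Each case could be patched with a more elaborate expression, but that is exactly how such code becomes hard to maintain.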

Using DOM: Here is a DOMDocument-based example of how to do it, with SimpleXML used to run the XPath query. This is arguably the right way to do it, because an HTML parser handles the edge cases (attribute order, quoting style, extra whitespace) that will most likely break a regular expression.

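$doc = new DOMDocument();
libxml_use_internal_errors(true); // feed HTML is rarely valid; collect parser warnings instead of printing them
$doc->loadHTML($description);
libxml_clear_errors();
$xml = simplexml_import_dom($doc); // SimpleXML makes the XPath query and attribute access shorter
$images = $xml->xpath('//img');
foreach ($images as $img) {
    echo "Image Source : " . $img['src'] . "\n";
    echo "Image Alt : " . $img['alt'] . "\n";
    echo "Image Title : " . $img['title'] . "\n";
    echo "Image Width : " . $img['width'] . "\n"; // the sample markup has width, not data-src-width
}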
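The same idea can also be written with DOMXPath directly, without the SimpleXML conversion. The following is a minimal sketch, assuming you want the attributes back as plain arrays; the function name extractImages and the returned keys are choices made for this illustration.

```php
<?php
// Hypothetical helper: returns one array per <img> element that has a src attribute.
function extractImages(string $html): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();

    // Collect libxml warnings about invalid markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $images = [];
    foreach ($xpath->query('//img[@src]') as $img) {
        // getAttribute() returns an empty string for a missing attribute.
        $images[] = [
            'src'   => $img->getAttribute('src'),
            'alt'   => $img->getAttribute('alt'),
            'title' => $img->getAttribute('title'),
            'width' => $img->getAttribute('width'),
        ];
    }

    return $images;
}

// Shortened version of the sample markup from this article.
$description = '<p><img title="Renault Clio Advertisement" '
    . 'src="https://techglimpse.com/wp-content/uploads/2013/03/renault-clio-video-ad.jpg" '
    . 'alt="renault clio video ad" width="637" height="348" /></p><p>The ad was created by Unruly.</p>';

$images = extractImages($description);
foreach ($images as $image) {
    echo "Image Source : " . $image['src'] . PHP_EOL;
}
```

Returning arrays instead of echoing inside the loop keeps the parsing separate from the output, which makes the helper easier to reuse for each item in a feed.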

