Using PHP cURL to read RSS feed XML
Thursday, June 19th, 2008
I very rarely use feeds; I prefer to go to my favourite websites and see what is going on instead of downloading their content to my computer with an RSS reader. However, feeds are getting popular, and people are looking for ways to fetch and parse them automatically with PHP. Because RSS and Atom are nothing more than XML documents, there is a really simple way of handling them, which I want to share with you (through my blog as well as through my blog feeds).
Obviously, to parse feeds we have to get them first. A very popular tool for handling HTTP connections (as well as other types of connections) is the cURL library. libcurl is designed for communication between servers over different protocols: it supports not only HTTP, HTTPS, gopher and telnet connections, but also lets you send data with POST and GET and manage cookies sent by the server. So with this library you can get feeds from practically any page on the Internet, even if the data is password protected or requires POSTing some data first.
Using PHP cURL
To use cURL on Windows you only need to uncomment the extension in the php.ini file; on Linux (as always) you need to compile PHP with --with-curl. To connect to an RSS feed with cURL you first need to initialise a cURL resource handle, which is done with:
$ch = curl_init("http://localhost/curl/rss.xml");
where, obviously, the first parameter is the URL of the website (the feed, in our case) you want to connect to. Next we need to set a few connection options with curl_setopt(), which takes three parameters: the cURL resource we created earlier, the option key, and the option value. For our simple connection we need only two options.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, 0);
The most important one here is CURLOPT_RETURNTRANSFER. By default curl_exec() prints the response straight to the output and returns only true or false; with this option set it returns the response as a string instead. Printing directly is sometimes useful, but in our case we want to parse the XML feed, and that would be quite difficult without having the response in a variable. Setting CURLOPT_HEADER to 0 keeps the HTTP headers out of that string.
OK, the next step is to execute the request, wait for the response, and close the connection. It sounds like a lot of work, but it is done with only two lines of code:
$data = curl_exec($ch);
curl_close($ch);
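The steps so far can be wrapped into one function with some error checking. This is only a sketch: the name fetchFeed(), the timeout value and the extra options (CURLOPT_FOLLOWLOCATION, CURLOPT_FAILONERROR, CURLOPT_TIMEOUT) are my assumptions and are not part of the code above.

```php
<?php
// Hypothetical helper (not from the article): fetch a URL and return the
// response body as a string, or throw if the transfer failed.
function fetchFeed($url, $timeoutSeconds = 10)
{
    $ch = curl_init($url);
    if ($ch === false) {
        throw new RuntimeException('Could not initialise cURL');
    }

    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it
    curl_setopt($ch, CURLOPT_HEADER, false);         // keep HTTP headers out of the body
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow redirects (assumption: feeds may move)
    curl_setopt($ch, CURLOPT_FAILONERROR, true);     // treat HTTP status >= 400 as a failure
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);

    $data  = curl_exec($ch);
    $error = curl_error($ch);
    // The handle is released when $ch goes out of scope at the end of the function.

    if ($data === false) {
        throw new RuntimeException('cURL request failed: ' . $error);
    }
    return $data;
}
```

A call such as $data = fetchFeed("http://localhost/curl/rss.xml"); would then give the same string as the two lines above, but a failed request raises an exception instead of silently leaving false in $data.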
Half of the work is done. If everything went right and the page we connected to contained an RSS feed or some other kind of XML data, the $data variable contains a string which can now be parsed. A lot of newbies try to parse such a string with regular expressions, or explode the string and pseudo-parse it line by line, removing tags with str_replace(). This is something I do not encourage, not only because it is very unprofessional, but also because since PHP 5.x we have built-in tools for parsing XML, which are not only more professional but a lot easier to use than pseudo-parsing line by line.
Working with SimpleXML
Currently the PHP manual describes 13 libraries for handling XML data. That is quite a lot, but most of them are designed to help build XML files rather than parse them, so in our case the best bet is SimpleXML, which is well suited to converting a string into an XML object and is enabled by default in PHP 5, so there is usually no need to install it. We have our XML string in the $data variable, so everything we need to do to parse it is this:
$doc = new SimpleXmlElement($data, LIBXML_NOCDATA);
Note that we could also pass a third (boolean) parameter to the constructor, which is false by default. If we set it to true, the first argument should be a URL pointing to the XML document instead of the XML data itself, so strictly speaking we do not have to use cURL at all.
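As a rough sketch of that third parameter in use: the helper name loadFeedFromUrl() and the choice to return null on failure are my assumptions, not something the SimpleXML extension prescribes. The constructor throws an Exception when the document cannot be loaded or parsed, so the call is wrapped in try/catch.

```php
<?php
// Hypothetical helper: build a SimpleXMLElement directly from a URL or file path.
// Returns null when the document cannot be loaded or is not well-formed XML.
function loadFeedFromUrl($url)
{
    // Collect libxml parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    try {
        // Third argument true: treat the first argument as a URL/path, not as XML data.
        return new SimpleXMLElement($url, LIBXML_NOCDATA, true);
    } catch (Exception $e) {
        return null;
    } finally {
        libxml_clear_errors();
        libxml_use_internal_errors($previous);  // restore the previous setting
    }
}
```

The trade-off is control: going through cURL lets you set timeouts, cookies or POST data, while this shortcut relies on PHP's own stream handling.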
That’s it: now $doc is an instance of SimpleXmlElement, which basically consists only of fields and arrays. If a node occurs only once in the XML document it is a field; if it occurs many times it is an array … well, usually. If you want to know what is inside this object, use the old-fashioned:
print_r($doc);
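To make the field-versus-array behaviour concrete, here is a small sketch on a made-up XML string (the element values are purely illustrative and do not come from a real feed):

```php
<?php
// Illustrative XML only; not taken from a real feed.
$sample = '<channel>'
    . '<title>Demo feed</title>'
    . '<item><title>First</title></item>'
    . '<item><title>Second</title></item>'
    . '</channel>';

$doc = new SimpleXMLElement($sample);

// <title> occurs once, so it can be read like a plain field.
// Casting to string gives the text content rather than an object.
$feedTitle = (string) $doc->title;

// <item> occurs twice, so it can be counted, indexed and iterated.
$itemCount   = count($doc->item);
$secondTitle = (string) $doc->item[1]->title;

$titles = [];
foreach ($doc->item as $item) {
    $titles[] = (string) $item->title;
}

echo $feedTitle, ' has ', $itemCount, ' items: ', implode(', ', $titles), "\n";
```

The (string) casts matter: without them you keep SimpleXMLElement objects, which behave oddly in comparisons and as array keys.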
So far so good, but now comes the hard part. As you know, there are two popular feed standards on the web, RSS and Atom. Each has a different structure and, what is worse, different node names. Fortunately, with SimpleXML it is easy to check which one we have.
if(isset($doc->channel)) {
    parseRSS($doc);
}
if(isset($doc->entry)) {
    parseAtom($doc);
}
All RSS documents have a <channel> node, so if our document contains this node there is a good chance that it is an RSS document; on the other hand, if it contains an <entry> node there is a good chance that it is an Atom document. I used if { … } if { … } here instead of if { … } else { … } because the document we parsed may be neither RSS nor Atom. In the code you can also see two functions, parseRSS() and parseAtom(). These are the functions we will use to get data out of SimpleXmlElement objects, and we are going to write them right now.
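A slightly stricter variant of this check also looks at the name of the root element, which is <rss> for RSS 2.0 and <feed> for Atom. The function name detectFeedType() and its return values are hypothetical, and the sketch deliberately ignores RSS 1.0, whose root element is rdf:RDF.

```php
<?php
// Hypothetical helper, not part of the article's code.
// Returns 'rss', 'atom' or 'unknown' for an already parsed document.
function detectFeedType(SimpleXMLElement $doc)
{
    $root = strtolower($doc->getName());   // name of the root element

    if ($root === 'rss' && isset($doc->channel)) {
        return 'rss';
    }
    if ($root === 'feed') {
        return 'atom';
    }
    return 'unknown';                       // neither RSS 2.0 nor Atom
}

// Tiny made-up documents to show the three outcomes.
$rss   = new SimpleXMLElement('<rss version="2.0"><channel><title>T</title></channel></rss>');
$atom  = new SimpleXMLElement('<feed xmlns="http://www.w3.org/2005/Atom"><title>T</title></feed>');
$other = new SimpleXMLElement('<html><body/></html>');

echo detectFeedType($rss), ' ', detectFeedType($atom), ' ', detectFeedType($other), "\n";
```

Returning a string rather than calling the parser directly keeps the "neither RSS nor Atom" case explicit, so the caller can decide what to do with it.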
function parseRSS($xml) {
    echo "<strong>".$xml->channel->title."</strong>";
    $cnt = count($xml->channel->item);
    for($i=0; $i<$cnt; $i++) {
        $url = $xml->channel->item[$i]->link;
        $title = $xml->channel->item[$i]->title;
        $desc = $xml->channel->item[$i]->description;
        echo '<a href="'.$url.'">'.$title.'</a>'.$desc.'';
    }
}
RSS is much easier to handle than Atom because it does not keep important data in attributes. There is not really much to talk about here: you have access to any node with the simple syntax $xml->node->childNode, and if a node is an array you slightly change the code to $xml->node[$i]->childNode->childChildNode.
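If you would rather collect the items than echo them straight away, the same node access can feed a plain array. This is a sketch under my own assumptions: the helper names rssItems() and renderItem() are hypothetical, and escaping with htmlspecialchars() is my addition, since feed content comes from a third party.

```php
<?php
// Hypothetical helper: collect RSS items as plain arrays of strings.
function rssItems(SimpleXMLElement $xml)
{
    $items = [];
    foreach ($xml->channel->item as $item) {
        $items[] = [
            'url'   => (string) $item->link,
            'title' => (string) $item->title,
            'desc'  => (string) $item->description,
        ];
    }
    return $items;
}

// Hypothetical helper: turn one item into HTML, escaping every value.
function renderItem(array $item)
{
    return '<a href="' . htmlspecialchars($item['url'], ENT_QUOTES, 'UTF-8') . '">'
        . htmlspecialchars($item['title'], ENT_QUOTES, 'UTF-8') . '</a>'
        . htmlspecialchars(strip_tags($item['desc']), ENT_QUOTES, 'UTF-8');
}
```

Usage would look like foreach (rssItems($doc) as $item) { echo renderItem($item); }. Separating extraction from output also makes it easy to store the items instead of printing them.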
The following example is a bit more complicated, because in order to get the entry URL we need to read an attribute of the <link> node:
function parseAtom($xml) {
    echo "<strong>".$xml->author->name."</strong>";
    $cnt = count($xml->entry);
    for($i=0; $i<$cnt; $i++) {
        // index the entry, not the link, so each iteration reads a different entry
        $urlAtt = $xml->entry[$i]->link[0]->attributes();
        $url = $urlAtt['href'];
        $title = $xml->entry[$i]->title;
        $desc = strip_tags($xml->entry[$i]->content);
        echo '<a href="'.$url.'">'.$title.'</a>'.$desc.'';
    }
}
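Note how we get to the node attributes: calling attributes() on a <link> node, e.g. $urlAtt = $xml->entry->link[0]->attributes(). Now $urlAtt can be used like an associative array, where the attribute name is the key and the attribute value is the value for this key. (It is actually a SimpleXmlElement itself, so cast the values to string when you need plain text.)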
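An Atom entry may carry several <link> nodes (Blogger feeds, for instance, add links for comments and editing), and the post URL is the one with rel="alternate". The sketch below picks that one; the function name atomEntryUrl() and the fallback to the first link are my assumptions. Treating a missing rel attribute as "alternate" follows the Atom specification (RFC 4287).

```php
<?php
// Hypothetical helper: return the href of an entry's rel="alternate" link,
// falling back to the first <link> when none is marked as alternate.
function atomEntryUrl(SimpleXMLElement $entry)
{
    $fallback = '';
    foreach ($entry->link as $link) {
        $href = (string) $link['href'];
        // A <link> without a rel attribute counts as rel="alternate".
        $rel = isset($link['rel']) ? (string) $link['rel'] : 'alternate';

        if ($rel === 'alternate') {
            return $href;
        }
        if ($fallback === '') {
            $fallback = $href;   // remember the first link in case no alternate exists
        }
    }
    return $fallback;
}

// Made-up feed with two links per entry, the post URL listed second.
$feed = new SimpleXMLElement(
    '<feed xmlns="http://www.w3.org/2005/Atom">'
    . '<entry><title>Post</title>'
    . '<link rel="replies" href="http://example.com/post/comments"/>'
    . '<link rel="alternate" href="http://example.com/post"/>'
    . '</entry></feed>'
);

foreach ($feed->entry as $entry) {
    echo atomEntryUrl($entry), "\n";
}
```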
Well, I do not know what more to write here; this is all really simple, which is probably why they called this library SimpleXML. If you have written your code and want to test it, use some WordPress blog feed for RSS and some Blogger blog feed for Atom.


The ditio.net is amazing site, thanks, admin.
WOW, nice tutorial. There are very few articles on PHP cURL considering its power. cURL is the best library available, but the lack of documentation makes it difficult to implement.
Thanks for your comment Blogsdna.
I agree with you on that; in fact there are more great extensions which are poorly documented. We can only hope that will change in the future.
I am going to write another article on cURL soon, as I see in my stats that this is a topic a lot of readers are interested in.
Excellent script.
Thanks for sharing this!
Hi all, I recently added PHP cURL routines to my AJAX library which displays formatted files including XML RSS Feeds with an XSL translation.
Not to spam (but it’s also the easiest way to get you the free source and examples)… But you can download it all at http://ajamyajax.com. Again it is all free and I think you might find the PHPcURL and RSS Feed examples useful, in addition to the ideas here of course.
Best regards, Mark
There is an error in your code
for($i=0; $<$cnt; $i++)
should be
for($i=0; $i<$cnt; $i++)
@Mark Thanks for your comment, your library seems to be pretty advanced. I usually delete links in comments, but deleting your link would be a big loss for all guys reading this post.
@Brent Fixed. This article has been online for over 3 months now and no one seemed to notice that. Maybe no one got that far. Anyway, thanks for pointing that out.
Thanks Greg, appreciate the comments and the link. Mainly the idea with my site is to help share ideas, just as you are doing here. I also post Firefox workarounds on Bugzilla and just added a PHP tip on php.net today… I hope to do more of that. But also, on your other comment I wonder how many developers either don’t know about PHP cURL or haven’t tried it much, so don’t realize the potential? I might spread the word on some AJAX boards. My theory is there are a LOT more of them… So we’ll see how that goes. Who knows, maybe I can send a few back here.
Good cURL coding then! Mark
No prob. I wanted to point out as well that this is a great tutorial. I have never used the “SimpleXmlElement” before, and this tutorial explained it very well and in a way that worked.
Thanks
@Brent It’s funny/scary that there are literally tons of tutorials that simply do not work at all. Actually mine didn’t work either, but thanks to you it is OK now.
@Mark It looks like you are doing a great job helping people out. I agree with you that a lot of PHP programmers do not know about cURL. Probably the biggest reason for that is the lack of good tutorials and articles about it. Take this article, for example: I showed how to use cURL, but the fact is I could have completed the same tutorial without using cURL at all, or by using a function like fsockopen(). Besides, cURL usually does its job in the background, so it is not so easy to show what cURL is capable of.
On the other hand, web developers do not know its potential and are looking for simple tutorials explaining how to use basic cURL functions.
You make good points, Greg. I like PHP cURL a lot recently (+ PHP’s XML, XSL lib functions etc.) because I can write XMLHttpRequest style routines with less code than I could in JavaScript AJAX… But it really isn’t needed or there are other ways to do things as you mentioned. So maybe we should champion PHP more in general? You are doing that already, obviously. Maybe that’s what I will do elsewhere. Thanks again for the blog and the helpful tutorials for all to learn from. Mark
Mark, English is not my native language, so I am not sure what “champion PHP” means.
Btw I contacted you by email you gave in comment form, did you get that email? If not please contact me with contact form on this website.
Hi Greg, I got your e-mail and just sent a reply. Sorry about the English phrase above, what I meant was “promote” or suggest using PHP more to others who could still be using mostly JavaScript. They might not even know about some PHP features like cURL as we discussed before. Your English is very good otherwise! You are obviously smarter than I am because I only speak one language.
Talk to you later, have a good one.
Well, I think this is how things kind of go: knowing English is a must if one is thinking about becoming a programmer or developer. I wouldn’t say that I am smarter than you are. Have a nice day as well.
Btw: it is almost night for me right now
Wow… Excellent tutorial. I am planning to use curl for twitter.. its really easy to use… Thx for such a gr8 tutor.
Thanks for the tutorial man!
Thanks for the great script!
I must say that you provide genuine, quality information. Thanks for this!
BTW, don’t you think your blog needs a better WordPress template?
Hi Greg,
I recently removed the free download from my web site and thought you might want to remove my comment here that references the download (#5 I think), and this message, and any other also if you like. It’s your blog, do what you want here of course. Thanks! Just trying to do the right thing for your readers. Best of luck to you.
Mark
Thanks for the info, Mark. I just struck out a part of the comment for the sake of the discussion. Otherwise I would probably have to moderate a few other comments as well.
Thanks a lot for this, this helped a lot to code my application
I found this a very informative article about PHP cURL and RSS feed XML.
This is a fantastic tutorial. Very simple and straight to the point. I had looked at the SimpleXmlElement in the past, but had been put off by the (unnecessary) complexity of other examples / tutorials. This has really helped me, and may even get me a job (in the process of building an example for an interview!). Thanks again.
Thank you for that information. I have been very confused about CURL for a while, this helps clear it up.
its what i was looking for about parsing rss with php
Thanks, for the info Mark….
Hi,Paul Reinheimer
Thanks for your script PHP,
Thank you very much indeed
Thank you for this script,
but you have an error in your parseAtom() function:
$title = $xml->entry->title;
$desc = strip_tags($xml->entry->content);
this must be
$title = $xml->entry[$i]->title;
$desc = strip_tags($xml->entry[$i]->content);
Please replace these lines to get a dynamic title and description; otherwise it will display the first title for every hyperlink.
Thank you
Vasim Padhiyar
For Accurate result of post url use following script
instead of
$urlAtt = $xml->entry->link[$i]->attributes();
$url = $urlAtt['href'];
use my following code
$total_link = count($xml->entry[$i]->link);
for($j=0; $j<$total_link; $j++)
{
$urlAtt = $xml->entry[$i]->link[$j]->attributes();
if($urlAtt['rel']=="alternate")
{
$url = $urlAtt['href'];
}
}
This will give you URL of your post.
Thank You.
Vasim Padhiyar
It seems my code was not displayed correctly when I first posted it; I don’t know why part of it was stripped. The loop header
for($j=0; $j<$total_link; $j++)
should read:
for ( $j [equal to] 0; $j [is less than] $total_link; $j++ )
great, i looking for this..
now my starred google reader’s atom feed could be sent automatically to my blog via mail-to-blogger, and posterous too…
thx.
@Vasim Padhiyar Good catch! I couldn’t figure out why the Atom feed was crashing. But your help fixed it up. Hopefully the script can be fixed in the actual tutorial.
Excellent Article. Thanks. This has been great help.
great tutorial..
thank you very much..
Thanx for the great tut, but you don’t have to parse the RSS, you can use get_object_vars() to get the data out. However, you need to do this more than once:
$doc = get_object_vars($doc);
$doc = get_object_vars($doc["channel"]);
If you print_r it then, you’ll see it’s become an array. Hope this helps!
Martijn
Nice tutorial. You can also use biterscripting for parsing RSS. There are several scripts posted on the net. You can check http://www.biterscripting.com for free download.
Nice article. But I was looking for a way to download only the RSS feed items that have been updated since the last download. I’m working on a little pet project and this is the only thing keeping it on hold.
Any idea on how to implement it?
@Helen, you have to save the last update date somewhere and fetch only the items with a date greater than your saved date. After that, save the new date and repeat the process. Hope that helps.
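A minimal sketch of that idea for RSS, assuming each item carries a <pubDate> that strtotime() can parse and that the saved date is kept as a Unix timestamp (the function name itemsNewerThan() is hypothetical, and how you store the timestamp, file or database, is up to you):

```php
<?php
// Hypothetical helper: return the RSS items published after $lastSeen
// (a Unix timestamp saved from the previous run).
function itemsNewerThan(SimpleXMLElement $xml, $lastSeen)
{
    $fresh = [];
    foreach ($xml->channel->item as $item) {
        $published = strtotime((string) $item->pubDate);
        // Skip items with a missing or unparsable date.
        if ($published !== false && $published > $lastSeen) {
            $fresh[] = $item;
        }
    }
    return $fresh;
}
```

After processing the returned items, save the largest pubDate timestamp you saw and pass it as $lastSeen on the next run.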