Quick and dirty benchmark of RSS/ATOM parsing libs
Posted on July 10, 2014 in Dev • 5 min read
EDIT: I just realized that the PHP function microtime
does not return what I expected. This does not change the results much
(the comparison between the solutions still holds) but it changes the units. I have updated the results accordingly.
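For the record, here is the pitfall in a minimal sketch:

```php
<?php
// microtime() without arguments returns the string "msec sec"
// (e.g. "0.12345600 1404987265"), which is easy to misuse in arithmetic.
var_dump(microtime());

// microtime(true) returns the Unix timestamp as a float, in seconds,
// which is what you usually want for timing measurements.
$start = microtime(true);
usleep(50000); // simulate 50 ms of work
printf("Elapsed: %.1f ms\n", (microtime(true) - $start) * 1000);
```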
As I wrote in a previous article, I am working on an RSS reader that fits my needs. For this purpose, I am currently trying to find the best way to parse RSS and ATOM feeds in PHP.
I searched the web for benchmarks, but I could only find old ones, run against old versions of the libs, and weird stuff (like parsing the feed directly with regexes). So I did a quick and dirty benchmark of my own (and this is the reason why this article is in English :).
Which lib is the best one to parse RSS and ATOM feeds?
I searched the web for the available solutions and found three main ones (ordered from the most lightweight to the heaviest):
- feed2array, a lib by Bronco which is basically a wrapper around SimpleXML. It is used by timo in the RSS reader built into blogotext, so it has been tested on a fairly wide range of feeds and can be considered fully working (see the sketch after this list).
- lastRSS, a dedicated feed-parsing lib written in PHP.
- SimplePie, the well-known lib: very complete, able to handle a wide range of feeds, both correctly and incorrectly formatted, but very heavy.
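To give an idea of the lightweight approach, here is a minimal SimpleXML sketch. This is the general technique that feed2array wraps, not its exact API, and the feed URL is only a placeholder:

```php
<?php
// Minimal SimpleXML parsing of an RSS 2.0 feed (the technique that
// feed2array wraps; the URL is a placeholder). An ATOM feed would need
// different element names (entry, updated, ...).
$xml = simplexml_load_file('https://example.com/feed.xml');
if ($xml === false) {
    die("Unable to parse the feed.\n");
}

$items = array();
foreach ($xml->channel->item as $item) {
    $items[] = array(
        'title'       => (string) $item->title,
        'link'        => (string) $item->link,
        'guid'        => (string) $item->guid,
        'pubDate'     => (string) $item->pubDate,
        'description' => (string) $item->description,
    );
}
```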
My goal was just a quick benchmark, so it is completely dirty and may not be very precise, but I did not need more. I did not test every available lib extensively, especially not all the wrappers around SimpleXML, as the one I found was sufficient and basic enough to reflect a general result.
My test relies on six RSS and ATOM feeds (both formats, to be sure the libs work with both) totalling 75 articles. I parse them with each lib, and I display nothing but the total time to parse them. I do not care about a lib's ability to handle badly malformed feeds, as such feeds should not exist and parsing them may encourage their use. So I am only interested in the time needed to parse these six feeds.
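The benchmark loop is roughly the following (a simplified sketch, not the exact code from the archive linked at the end; here parse_feed() is defined with the basic SimpleXML approach, and you would swap its body for lastRSS or SimplePie to compare):

```php
<?php
// parse_feed() stands for whichever lib is being tested; the SimpleXML
// version is used here so the sketch is self-contained.
function parse_feed($path) {
    $xml = simplexml_load_file($path);
    return $xml === false ? array() : $xml->channel->item;
}

$feeds = glob(__DIR__ . '/feeds/*.xml'); // the six test feeds, stored locally

$start = microtime(true);
foreach ($feeds as $feed) {
    $articles = parse_feed($feed);
}
$elapsed = (microtime(true) - $start) * 1000; // milliseconds
printf("Parsed %d feeds in %.1f ms\n", count($feeds), $elapsed);
```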
All three libs parsed all of them successfully. I ran the test on my laptop, which was almost 100% idle.
The results are:
| Library | Time |
|---|---|
| feed2array (and similar basic SimpleXML-based solutions) | about 40 ms |
| lastRSS | about 120 ms (and I got some mistakes) |
| SimplePie | about 280 ms |
So, for my personal use case, I would simply say “the simpler, the better” and go for feed2array, which works perfectly on the feeds I want to use and is way faster than the overkill libraries. I have often read that SimplePie was heavy and slow (despite it being advertised as “super fast”), and my results seem to confirm it.
In conclusion, although these results should only be taken as orders of magnitude, and not as precise measurements, I would say that you should avoid any complicated, overkill library unless you really need some of its advanced features. Your script will be way faster (up to roughly 7 times faster, according to the numbers above).
Note: I only focused on these three libraries because they appear to be the three main ones available for this purpose (apart from feed2array, for which there are plenty of similar scripts). I only wanted scripts under a fully open source license, which eliminated some others. The only notable candidates I could also have considered (I think) are the feed library from Zend, but I did not want to look for a way to extract only the relevant functions from Zend, and the more recently integrated PHP extensions such as XSLT. However, these PHP extensions are not widely available and are not built in at all, so they may be missing on most shared hosting.
Store in a database / files, or parse the feeds each time?
The next question I had was how this parsing time compares with retrieving the information from a database. For this purpose, I compared three timings:
- the time to parse the feeds using feed2array, which is about 40 ms, as found above.
- the time to load the arrays representing the feeds from serialized, gzipped files, which is about 8 ms (see the sketch after this list).
- the time to load 75 elements (id, description, guid, link and pubDate) from an SQLite database, not optimized at all, which is about 2 ms.
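For reference, the flat-file variant is roughly this (a sketch; the array below stands for the output of the parsing step):

```php
<?php
// Cache the parsed feed as a serialized, gzipped flat file.
$items = array(
    array('guid' => '1', 'title' => 'First post', 'description' => '...'),
);
file_put_contents('feed.cache', gzencode(serialize($items)));

// Loading it back is the ~8 ms path from the comparison above.
$items = unserialize(gzdecode(file_get_contents('feed.cache')));
```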
As one could expect, parsing the feeds takes longer than loading them from storage. So it is definitely not a good idea to parse them at each page generation. Moreover, the RSS format is not practical at all for searches and complex queries.
The legit solutions are then flat files or a database. The difference between the two timings is not that large, considering that the files are gzipped and that I actually stored a bit more information in the files than in the table.
However, there is not much optimization to do with files, whereas there are many ways to improve my results with a database. For instance, I used a basic SQLite table without any optimization, but I could have used a more robust solution. If performance is really a concern, I could even use a temporary database, stored in RAM, to hold the feed elements. If this table is lost, that is not a big deal, as a simple refresh will get them back.
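SQLite makes this particularly easy, since a pure in-memory database is just a different DSN (a sketch):

```php
<?php
// An in-memory SQLite database: its contents vanish when the connection
// closes, which is acceptable here since a refresh can rebuild the cache.
$db = new PDO('sqlite::memory:');
$db->exec('CREATE TABLE items (guid TEXT PRIMARY KEY, description TEXT)');
```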
Finally, one of the major problems with SQLite seems to be that it may be slow on writes and locks the whole database while writing. But this is also the case for flat files.
In conclusion, I would say that the best solution appears to be SQLite through PDO. Using PDO will make it very easy to switch to another database later, and SQLite might be as good as (if not better than) flat files.
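As an illustration, here is a minimal PDO + SQLite sketch (the table layout is just an example, not the schema from my test archive):

```php
<?php
// Open (or create) the SQLite database through PDO.
$db = new PDO('sqlite:' . __DIR__ . '/feeds.sqlite');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$db->exec('CREATE TABLE IF NOT EXISTS items (
    id INTEGER PRIMARY KEY,
    guid TEXT UNIQUE,
    link TEXT,
    pubDate TEXT,
    description TEXT
)');

// Insert a parsed item (prepared statement, reusable in a loop).
$stmt = $db->prepare('INSERT OR IGNORE INTO items (guid, link, pubDate, description)
    VALUES (:guid, :link, :pubDate, :description)');
$stmt->execute(array(
    ':guid'        => 'https://example.com/post/1',
    ':link'        => 'https://example.com/post/1',
    ':pubDate'     => 'Thu, 10 Jul 2014 00:00:00 +0000',
    ':description' => 'Example item',
));

// The ~2 ms path: load everything back.
$items = $db->query('SELECT * FROM items')->fetchAll(PDO::FETCH_ASSOC);
```

Switching to MySQL or PostgreSQL later would then mostly be a matter of changing the DSN and, at worst, a few SQL details.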
Note: I put all my code and the test RSS feeds in a zip archive available here.