Developing Webbots, Spiders and Screen Scrapers with PHP cURL

Written by on September 21, 2008 in Reviews - 1 Comment

For a long time I have wanted to write a detailed tutorial on using PHP and cURL to create bots, and I still intend to, but while browsing Amazon I found a very interesting book on this topic, Webbots, Spiders, and Screen Scrapers: A Guide to Developing Internet Agents with PHP/CURL by Mike Schrenk. I have not read this book yet, but it looks very promising and interesting … at least to me, because it is not yet another PHP-for-beginners book but one that focuses on a single topic and explores it in depth: creating webbots.

In my opinion, knowing how to create webbots is one of the most important skills a web programmer can have. There is so much data on the internet that we need bots to take full advantage of the resources on the web and to automate boring day-to-day online tasks. Moreover, you will be surprised how often you are asked in a job interview whether you can write a webbot.

OK, so what is inside this book? First, the author starts with an introduction and explains why you should learn to develop webbots. This does not differ much from other books, since all of them start with some generic introduction. Then it gets better: seven pages of ideas and inspiration for webbot projects. It is always good to have something like that, because authors often fail to give readers practical examples or to explain why they should read the book.

Next comes what the author calls fundamental techniques:

  • Downloading web pages: using PHP built-in functions, PHP and cURL, and LIB_http (I did not know what it was at first, but judging by the name it is a helper library that comes with the book, like LIB_parse below)
  • Parsing data: using LIB_parse and some PHP built-in functions
  • Automating form submission: a very important topic that a lot of people have problems with, even though submitting a form is one of the most common actions for a webbot
  • Managing large amounts of data: some database related stuff
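To give a feel for the first three techniques, here is a minimal sketch using only PHP's cURL extension and built-in string functions. It is not the book's code: it does not reproduce LIB_http or LIB_parse, and the function names `fetch_page` and `text_between` and the user-agent string are hypothetical choices made for this example.

```php
<?php
// Minimal sketch (not the book's LIB_http/LIB_parse): download a page with
// PHP's cURL extension (ext-curl) and extract text with built-in functions.

/**
 * Fetch $url with a GET request, or submit a form with a POST request when
 * $postFields is given. Returns the response body; throws on transport errors.
 */
function fetch_page(string $url, ?array $postFields = null): string
{
    $ch = curl_init($url);
    if ($ch === false) {
        throw new RuntimeException('curl_init failed');
    }
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,             // return the body instead of echoing it
        CURLOPT_FOLLOWLOCATION => true,             // follow HTTP redirects
        CURLOPT_TIMEOUT        => 30,               // give up after 30 seconds
        CURLOPT_USERAGENT      => 'ExampleBot/0.1', // hypothetical user-agent string
    ]);

    if ($postFields !== null) {
        // Form submission: send the fields URL-encoded, the way a browser
        // submits a plain <form method="post">.
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($postFields));
    }

    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }
    return $body; // the handle is released when $ch goes out of scope
}

/**
 * Return the text between the first occurrence of $start and the next
 * occurrence of $end, or null when either delimiter is missing.
 */
function text_between(string $haystack, string $start, string $end): ?string
{
    $from = strpos($haystack, $start);
    if ($from === false) {
        return null;
    }
    $from += strlen($start);

    $to = strpos($haystack, $end, $from);
    if ($to === false) {
        return null;
    }
    return substr($haystack, $from, $to - $from);
}
```

For example, `text_between(fetch_page('https://example.com/'), '<title>', '</title>')` would pull out a page title, and `fetch_page($formUrl, ['username' => 'bot', 'password' => 'secret'])` would submit a form. The URL and field names there are placeholders that have to match the target form's `action` and `name` attributes. Delimiter-based parsing is simple but brittle; for messy real-world HTML, PHP's `DOMDocument` with `DOMXPath` is usually more robust.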

Parts two and three cover advanced techniques: the author explains how to create common webbots such as a price-monitoring spider, an aggregation bot, webbots that exchange data through FTP and NNTP servers, and bots that send and receive email.

Part three describes techniques for creating an auction-bidding bot, a bot that can authenticate itself, and a bot that can handle cookies sent by the server. This is where many programmers get stuck: they do not know how to handle cookies with PHP, so they cannot create the very useful bots that log in to a user account. It's nice that Mike covers this topic, because it seems like secret knowledge; there are almost no good resources on the web covering it (or I couldn't find them).
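As a rough idea of what cookie handling involves, here is a minimal sketch based on libcurl's cookie engine, which PHP exposes through `CURLOPT_COOKIEFILE` and `CURLOPT_COOKIEJAR`. It is not taken from the book, whose own approach I have not seen. The function name `request_with_cookies` is hypothetical, and the sketch assumes the site keeps the login session in ordinary cookies.

```php
<?php
// Minimal sketch (not the book's code): persist cookies between requests
// with libcurl's cookie engine, so a bot can log in once and reuse the
// session cookie on later requests.

/**
 * Send a GET (or, with $postFields, a form POST) to $url while reading and
 * writing cookies in $jarFile. Returns the response body.
 */
function request_with_cookies(string $url, string $jarFile, ?array $postFields = null): string
{
    $ch = curl_init($url);
    if ($ch === false) {
        throw new RuntimeException('curl_init failed');
    }
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,     // logins often redirect right after setting a cookie
        CURLOPT_COOKIEFILE     => $jarFile, // load cookies saved by earlier requests
        CURLOPT_COOKIEJAR      => $jarFile, // save cookies when the handle is released
    ]);

    if ($postFields !== null) {
        // Login forms are usually plain URL-encoded POSTs.
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($postFields));
    }

    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }
    return $body; // $ch is released on return, and libcurl then writes $jarFile
}
```

A login flow would call `request_with_cookies($loginUrl, $jar, ['user' => 'me', 'pass' => 'secret'])` and then `request_with_cookies($accountUrl, $jar)`, with `$jar` pointing at a writable file such as `sys_get_temp_dir() . '/cookies.txt'`. The URLs and field names are placeholders. Keeping the cookies in a file rather than in memory lets the session survive between separate runs of the bot, but it also means the file holds session tokens and should be kept private.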

Well, this is actually my first review ever, and I do not know what more to add here. Sure, I could write more about Mike's book, but I think that is enough; besides, you can go directly to the Amazon website and take a look inside.

To sum up this review: great book. I rate it 9/10. It covers in detail important topics for web developers that are very hard to find on the web. I scanned the table of contents, and it probably has all the information you need to develop any kind of bot, from a simple scraper to an advanced bot with … well, for lack of a better term, let's call it AI.

Get it here: Webbots, Spiders, and Screen Scrapers: A Guide to Developing Internet Agents with PHP/CURL

About the Author

Greg Winiarski is a freelance PHP and JavaScript programmer. He specializes in web applications and WordPress development.

One Comment on "Developing Webbots, Spiders and Screen Scrapers with PHP cURL"

  1. Steve Harrison June 20, 2009 at 8:28 pm

    Any comments or info about the web bot project used to make future predictions in the book?
