Q for Mike Clark: screen scraping

 
Dear Mike,

In the context of monitoring Java applications,
you mention screen scraping.

I recently posted in the General
Computing forum asking about screen
scraping. I'm completely new to this
subject.

One useful reply suggested HttpUnit
and other frameworks for the job.

Could you go into (a bit) more detail on
the subject of 'screen scraping a web
application'? How would you tackle this,
and do you have examples?

My intent is to write screen-scraping
functionality in a web application that
has to peek into another web application's
pages.

Cheers,

Gian Franco Casula
[ September 23, 2004: Message edited by: Bear Bibeault ]
 
If I simply want to monitor that a web application is alive and well, I'll just use a simple Unix shell script. Here's the description of an example that does just that:

http://www.pragmaticautomation.com/cgi-bin/pragauto.cgi/Monitor/ProgramsThatGrowl.rdoc

(After scraping the site, that script does a Mac-specific thing, but you can do whatever you want. Sending a text message to your cell phone is a fun and effective thing to do, for example.)

You can just as easily do this on Windows using the 'wget' program to hit the page instead of 'curl', or better yet, by using a scripting language such as Ruby.
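
For readers who would rather stay in Java than use a shell script, the same "is it alive and well?" check can be sketched with the JDK's built-in HTTP client (Java 11 or later). This is only an illustration, not the script described at the link above: the class name, the default URL, and the marker text "Welcome" are all assumptions.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class SiteMonitor {

    // Pure check: the page answered with HTTP 200 and contains the text we expect.
    static boolean looksHealthy(int statusCode, String body, String expectedText) {
        return statusCode == 200 && body != null && body.contains(expectedText);
    }

    // Fetches the page with a plain GET and applies the check above.
    static boolean isAlive(String url, String expectedText)
            throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(10))
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(10))
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return looksHealthy(response.statusCode(), response.body(), expectedText);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical defaults; pass a real URL and marker text as arguments.
        String url = args.length > 0 ? args[0] : "http://localhost:8080/";
        String marker = args.length > 1 ? args[1] : "Welcome";
        try {
            boolean ok = isAlive(url, marker);
            System.out.println(ok ? "OK" : "PROBLEM: unexpected response from " + url);
        } catch (IOException e) {
            // Connection refused, timeout, and similar failures end up here.
            System.out.println("PROBLEM: could not reach " + url + ": " + e.getMessage());
        }
    }
}
```

Run it from cron or a scheduler and replace the println calls with whatever notification you prefer. Checking for a piece of expected text, and not just the status code, catches the case where the server answers but serves an error page.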

If you need to drill into pages, and HTTP GETs with URLs won't get you there for some reason, you can use something like HttpUnit. Here's a simple method that uses the HttpUnit API to log into a web application using a login form, which then redirects the user to a product catalog page. Then it uses the resulting 'response' object to scrape an HTML table's contents:
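
The listing from the original post is not reproduced here. As a stand-in, and explicitly not Mike's method, the sketch below shows the login step using only the JDK (Java 11 or later) instead of HttpUnit: it posts the login form, keeps the session cookie, follows the redirect, and returns the HTML of the page it lands on. The field names "username" and "password", the class name, and the URL are assumptions; use whatever the target form actually declares. With HttpUnit, the corresponding steps would go through WebConversation.getResponse(), WebResponse.getForms(), WebForm.setParameter(), WebForm.submit(), and then WebResponse.getTables() with WebTable.asText() for the table contents.

```java
import java.net.CookieManager;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class LoginScraper {

    // Encodes form fields as application/x-www-form-urlencoded.
    static String formEncode(Map<String, String> fields) {
        return fields.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    // Posts the login form and returns the HTML of the page reached afterwards.
    static String loginAndFetch(String loginUrl, String user, String password)
            throws Exception {
        HttpClient client = HttpClient.newBuilder()
                .cookieHandler(new CookieManager())          // keep the session cookie
                .followRedirects(HttpClient.Redirect.NORMAL) // follow the post-login redirect
                .build();

        // Hypothetical field names; check the login form's HTML for the real ones.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("username", user);
        fields.put("password", password);

        HttpRequest request = HttpRequest.newBuilder(URI.create(loginUrl))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formEncode(fields)))
                .build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical URL and credentials, for illustration only.
        String html = loginAndFetch("http://localhost:8080/login", "guest", "secret");
        System.out.println(html.length() + " characters of HTML received");
    }
}
```

The returned HTML still has to be parsed to get at the table, which is the part a library such as HttpUnit does for you.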



You can also get the DOM tree back from an HttpUnit 'response' object and traverse it using XPath, for example.
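
As a rough sketch of the XPath idea, the code below pulls the cell text out of a table in an org.w3c.dom.Document using only the JDK's XPath support. With HttpUnit the Document would come from the response's getDOM() method; here a small well-formed XHTML string is parsed instead so the example runs on its own. The table id "catalog", the sample data, and the class name are assumptions. Element names in XPath are case-sensitive, and a DOM built by an HTML parser may report them in upper case, so the expressions may need adjusting.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TableScraper {

    // Returns the trimmed text of every cell in the table with the given id, row by row.
    static List<List<String>> scrapeTable(Document dom, String tableId) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList rows = (NodeList) xpath.evaluate(
                "//table[@id='" + tableId + "']//tr", dom, XPathConstants.NODESET);

        List<List<String>> result = new ArrayList<>();
        for (int r = 0; r < rows.getLength(); r++) {
            // Header and data cells that are direct children of this row.
            NodeList cells = (NodeList) xpath.evaluate(
                    "th|td", rows.item(r), XPathConstants.NODESET);
            List<String> row = new ArrayList<>();
            for (int c = 0; c < cells.getLength(); c++) {
                row.add(cells.item(c).getTextContent().trim());
            }
            result.add(row);
        }
        return result;
    }

    // Parses well-formed XHTML into a DOM; real-world HTML usually needs a lenient parser.
    static Document parse(String xhtml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample page standing in for a product catalog.
        String page = "<html><body><table id=\"catalog\">"
                + "<tr><th>Item</th><th>Price</th></tr>"
                + "<tr><td>Widget</td><td>9.99</td></tr>"
                + "</table></body></html>";
        for (List<String> row : scrapeTable(parse(page), "catalog")) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```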

Using HttpUnit is more heavy-handed than a scripting approach, but it demonstrates another way of going about screen-scraping deep into a web application.

Mike
[ September 23, 2004: Message edited by: Mike Clark ]
 
I hadn't heard of Growl, but it looks like it could be very useful to me! Thanks!
 
Gian Franco
Excellent, thank you very much!
 