screen scrape

 
Ranch Hand
Posts: 93
Hi all,

Can anyone give me an example of screen scraping a website and returning the result, e.g. the HTML as a string?

Or point me to some tutorials or examples?

Many thanks
 
Sheriff
Posts: 11343
Like TESS?
 
Ranch Hand
Posts: 263
Dale,

An old example I wrote a while ago, which screen-scraped the USPS ZIP+4 info, is located at http://www.mycgiserver.com/~tblough/screenscrape.htm.

It looks like it doesn't work any more because USPS changed the website from CGI to JSP-based pages, but the idea still holds. The source is available from a link on the page.

Cheers,
 
Java Cowboy
Posts: 16084
Reading the content of a webpage is simple enough with class java.net.URL:
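A minimal sketch of that approach, assuming the page is UTF-8 encoded and small enough to hold in memory. The class name PageFetcher and the method name fetch are made up for illustration; only java.net.URL and the standard I/O classes are real API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of any library.
class PageFetcher {

    // Opens the URL, reads the response line by line and returns it as one String.
    // Assumption: the content is UTF-8. Line endings are normalised to '\n'.
    public static String fetch(URL url) throws IOException {
        StringBuilder html = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                html.append(line).append('\n');
            }
        }
        return html.toString();
    }
}
```

You would call it with something like PageFetcher.fetch(java.net.URI.create("http://example.com/").toURL()), where the address is only a placeholder. Because url.openStream() works for any protocol the JVM supports, the same method also reads file: URLs, which is handy for trying it out offline.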

After you've done that, you'll have to locate the data you want in the HTML page. You could do it the simple way, with String.indexOf() for example, but that may not be flexible enough.
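As a rough sketch of the String.indexOf() route: find an opening marker, then find the closing marker after it, and take the substring in between. The class TextBetween and its method are hypothetical names, and the markers have to match the page's markup exactly, which is why this breaks easily when the site changes.

```java
// Hypothetical helper class, not part of any library.
class TextBetween {

    // Returns the trimmed text between the first occurrence of 'open' and the
    // next occurrence of 'close' after it, or null if either marker is missing.
    public static String between(String html, String open, String close) {
        int start = html.indexOf(open);
        if (start < 0) {
            return null;
        }
        start += open.length();          // skip past the opening marker itself
        int end = html.indexOf(close, start);
        if (end < 0) {
            return null;
        }
        return html.substring(start, end).trim();
    }
}
```

For example, TextBetween.between(html, "<title>", "</title>") would pull out the page title, provided the tags are written in exactly that case and without attributes.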

You could try regular expressions, or you could use an HTML parser to walk through the structure of the HTML and find the text you're looking for. Something like http://htmlparser.sourceforge.net/ might be useful for that.
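To illustrate the regular expression option only (not the HTML parser library), here is a small sketch that collects the href values of links using java.util.regex. The class name LinkFinder and the pattern are my own assumptions; the pattern only handles double-quoted attributes and will miss or mismatch unusual markup, which is where a real parser does better.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class LinkFinder {

    // Matches an <a ...> tag and captures the double-quoted href value.
    // Assumption: attributes use double quotes; single-quoted or unquoted hrefs are ignored.
    private static final Pattern HREF = Pattern.compile(
            "<a\\s[^>]*?href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Returns every captured href in the order it appears in the page.
    public static List<String> links(String html) {
        List<String> found = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }
}
```

Calling LinkFinder.links(html) on the string you fetched gives you a list of link targets to work with.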
 
dale con
Ranch Hand
Posts: 93
Cheers, guys, for all your help. Much appreciated.

I know this is a relatively old technique, but finding material on it is quite difficult.
 