Screen scraping (extracting data from a web page) in Java

 
Ranch Hand
Posts: 97
Hi everybody,

I want to know how to do screen scraping in Java. Is there an open-source tool that extracts data from a website and stores it as XML, Excel, or some other format?

Please help me as soon as possible.


--
Regards,
M. Bharathi
 
Rancher
Posts: 43081
I'd probably use a library like jWebUnit for downloading the pages, and extracting the relevant parts. Then you can use any XML- or XLS-creating library you like for storing the interesting parts.
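
As a rough sketch of the "extract the relevant parts, then store them" idea, here is a version that uses only the JDK instead of jWebUnit, so nothing in it is jWebUnit's API. The class name `LinkScraper`, the choice of links as the "interesting parts", and the `<links>`/`<link>` element names are all assumptions made for illustration. The HTML is passed in as a string; in practice it would come from whatever download step you use.

```java
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class LinkScraper {
    // Naive pattern for <a ... href="...">text</a>.
    // A real HTML parser copes far better with messy markup.
    private static final Pattern LINK = Pattern.compile(
            "<a\\s+[^>]*href=\"([^\"]*)\"[^>]*>(.*?)</a>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns one {href, text} pair per link found in the HTML. */
    public static List<String[]> extractLinks(String html) {
        List<String[]> links = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            links.add(new String[] { m.group(1), m.group(2).trim() });
        }
        return links;
    }

    /** Serializes the pairs as <links><link href="...">text</link>...</links>. */
    public static String toXml(List<String[]> links) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element root = doc.createElement("links");
        doc.appendChild(root);
        for (String[] link : links) {
            Element e = doc.createElement("link");
            e.setAttribute("href", link[0]);
            e.setTextContent(link[1]);
            root.appendChild(e);
        }
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical page content standing in for a downloaded page.
        String html = "<html><body><a href=\"/a.html\">First</a> "
                + "<a href=\"/b.html\">Second</a></body></html>";
        System.out.println(toXml(extractLinks(html)));
    }
}
```

The regular expression is only good enough for tidy markup; for real pages an HTML parsing library is the safer choice, and the XML step stays the same.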
 
muthu bharathi
Ranch Hand
Posts: 97
Hi Ulf,

Thanks for your quick response.

I searched the net and found an open-source tool, but it works for HTTP only. I need to scrape data from HTTPS pages.

I'm a newbie. I tried to write the code using jWebUnit, but it isn't working. Can you give me sample code for jWebUnit? I also want to know whether jWebUnit supports HTTPS, because I need to extract data from HTTPS pages as well.

Awaiting your reply.

--
With Thanks
M. Bharathi
 
Ranch Hand
Posts: 378
This may be what you need. It is the first result for "jwebunit https" on Google.

The site talks about untrusted certificates, so jWebUnit may already trust a number of certificates from certificate authorities. Perhaps it uses the Java truststore itself?
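
To make the truststore point concrete, here is a minimal sketch using only the JDK's own TLS classes. It says nothing about what jWebUnit does internally, which I haven't checked. The class name `TrustStoreDemo` and both method names are made up for illustration. Passing `null` as the store makes the JDK fall back to its default truststore; passing a loaded `KeyStore` restricts trust to the certificates in that file, which is the usual route for a site with a self-signed or otherwise untrusted certificate.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.KeyStore;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

public class TrustStoreDemo {
    /**
     * Builds an SSLContext from the given truststore.
     * A null store means "use the JDK's default truststore".
     */
    public static SSLContext contextFor(KeyStore trustStore) throws Exception {
        TrustManagerFactory tmf = TrustManagerFactory.getInstance(
                TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(trustStore);
        SSLContext ctx = SSLContext.getInstance("TLS");
        // No client key managers; default source of randomness.
        ctx.init(null, tmf.getTrustManagers(), null);
        return ctx;
    }

    /** Loads a truststore file; the path and password come from the caller. */
    public static KeyStore load(Path file, char[] password) throws Exception {
        KeyStore ks = KeyStore.getInstance(KeyStore.getDefaultType());
        try (InputStream in = Files.newInputStream(file)) {
            ks.load(in, password);
        }
        return ks;
    }

    public static void main(String[] args) throws Exception {
        // Default truststore, so the sketch runs without any extra file.
        SSLContext ctx = contextFor(null);
        System.out.println("Protocol: " + ctx.getProtocol());
    }
}
```

The resulting context can be handed to `HttpsURLConnection.setSSLSocketFactory(ctx.getSocketFactory())` or to `HttpClient.newBuilder().sslContext(ctx)` on Java 11 and later.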
 
muthu bharathi
Ranch Hand
Posts: 97
Hi,

Thanks for your response. I have managed to scrape data over both HTTP and HTTPS with an open-source web data extractor tool.

But there is one issue with that tool: I cannot scrape data from an HTTPS page that uses a session. Please help or guide me on this. I searched the net, but without success.


--
with thanks,
M. Bharathi
 
Marshal
Posts: 80616
Too difficult a question for beginners. Moving.
 
Ulf Dittmer
Rancher
Posts: 43081

But there is one issue with that tool: I cannot scrape data from an HTTPS page that uses a session.


Why not? jWebUnit supports cookies, if that's what is used for the sessions. If the sessions use URL rewriting, then there's no problem to begin with.
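
For the cookie case, here is a small sketch of how a session cookie gets carried from one request to the next, using the JDK's `java.net.CookieManager` rather than jWebUnit. The class name `SessionCookieDemo`, the `example.com` URLs, and the `JSESSIONID=abc123` value are hypothetical, and the server response is simulated so the code runs without a network connection.

```java
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URI;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class SessionCookieDemo {
    /** Returns the Cookie header that would accompany a follow-up request. */
    public static String cookieHeaderAfterLogin() throws Exception {
        CookieManager cookies = new CookieManager(null, CookiePolicy.ACCEPT_ALL);

        // Simulate the Set-Cookie header a server might send after login.
        URI login = URI.create("https://example.com/app/login");
        Map<String, List<String>> response = Collections.singletonMap(
                "Set-Cookie",
                Collections.singletonList("JSESSIONID=abc123; Path=/app"));
        cookies.put(login, response);

        // Ask the manager which cookies belong on a later request.
        URI data = URI.create("https://example.com/app/data");
        Map<String, List<String>> request =
                cookies.get(data, Collections.emptyMap());
        List<String> header = request.get("Cookie");
        return (header == null || header.isEmpty()) ? "" : header.get(0);
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Cookie: " + cookieHeaderAfterLogin());
    }
}
```

In a real scraper you would not call `put` and `get` yourself. Installing the manager once with `CookieHandler.setDefault(cookies)` for `HttpURLConnection`, or with `HttpClient.newBuilder().cookieHandler(cookies)` on Java 11 and later, lets it do this bookkeeping on every request.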
 
muthu bharathi
Ranch Hand
Posts: 97
Hi,

I have tried a lot, but I can't get any output. Please give me some sample source code.

--
Regards,
M. Bharathi
 
Ulf Dittmer
Rancher
Posts: 43081
What have you tried? Post a relevant code excerpt. What, exactly, happened when you ran it?
 
Greenhorn
Posts: 3

muthu bharathi wrote:
Hi everybody,

I want to know how to do screen scraping in Java. Is there an open-source tool that extracts data from a website and stores it as XML, Excel, or some other format?

Please help me as soon as possible.

--
Regards,
M. Bharathi



Screenshots are here: http://binhgiang.sourceforge.net/xmlalbum/screenshots.html

The free version of the web data extractor can be downloaded from http://binhgiang.sourceforge.net/site/download.jsp.

VDer is built on a Java HTML parser, which can be downloaded from http://sourceforge.net/projects/binhgiang/files/htmlparser/HTMLParser2_Build9.zip/download. It is open source.
 
muthu bharathi
Ranch Hand
Posts: 97
Hi,

Thanks for your valuable guidance.

One thing I need to know: can it scrape data from HTTPS pages?

--
With Thanks
M. Bharathi
 