I want to know how to do screen scraping in Java. Or is there any open source tool to extract data from a website and store it in XML, Excel, or some other format?
I'd probably use a library like jWebUnit for downloading the pages, and extracting the relevant parts. Then you can use any XML- or XLS-creating library you like for storing the interesting parts.
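If you just want to see the basic idea, here is a minimal JDK-only sketch (no jWebUnit): download a page over HTTP(S) with `java.net.URL` and pull out the link targets with a regex. The URL in `main` is only a placeholder, and the regex approach is deliberately naive; for real work you'd want a proper HTML parser.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SimpleScraper {

    // Naive extraction of <a href="..."> targets from an HTML string.
    // A real scraper should use an HTML parser instead of a regex.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = Pattern.compile("<a\\s+[^>]*href=\"([^\"]+)\"").matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    // Downloads a page over HTTP or HTTPS using only the JDK.
    static String fetch(String pageUrl) throws Exception {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new URL(pageUrl).openStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        String html = fetch("https://example.com/");  // hypothetical target URL
        for (String link : extractLinks(html)) {
            System.out.println(link);
        }
    }
}
```

Once you have the extracted strings, writing them out as XML or CSV/XLS is a separate step you can do with any library you like.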
I searched the net and found one open source tool. It works fine for HTTP only. I need to scrape data from HTTPS.
I'm a newbie. I tried to write the code using jWebUnit, but it's not working. Can you give me sample jWebUnit code? I also want to know whether jWebUnit supports HTTPS, because I need to extract data from HTTPS sites as well.
This may be what you need. It's the first result for "jwebunit https" in Google.
That page talks about untrusted certificates, so jWebUnit may already trust certificates issued by the well-known certificate authorities. It might be using the Java truststore itself?
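For reference, the two usual ways to deal with HTTPS certificates in plain Java are: point the JVM at a truststore that contains the site's certificate, or (for quick experiments only, since it disables validation entirely) install a trust-all `TrustManager`. A sketch of both, with hypothetical truststore paths:

```java
import java.security.cert.X509Certificate;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

public class TrustAllDemo {

    // Option 1: tell the JVM to use a truststore that contains the
    // target site's certificate. (Path and password are hypothetical.)
    static void useCustomTruststore() {
        System.setProperty("javax.net.ssl.trustStore", "/path/to/my-truststore.jks");
        System.setProperty("javax.net.ssl.trustStorePassword", "changeit");
    }

    // Option 2 (quick experiments ONLY -- this disables certificate
    // validation): an SSLContext whose TrustManager accepts every chain.
    static SSLContext trustAllContext() throws Exception {
        TrustManager[] trustAll = new TrustManager[] {
            new X509TrustManager() {
                public X509Certificate[] getAcceptedIssuers() { return new X509Certificate[0]; }
                public void checkClientTrusted(X509Certificate[] certs, String authType) { }
                public void checkServerTrusted(X509Certificate[] certs, String authType) { }
            }
        };
        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(null, trustAll, new java.security.SecureRandom());
        return ctx;
    }

    public static void main(String[] args) throws Exception {
        // All HttpsURLConnections opened after this call skip certificate checks.
        HttpsURLConnection.setDefaultSSLSocketFactory(trustAllContext().getSocketFactory());
    }
}
```

Whichever tool sits on top (jWebUnit or anything else built on `HttpsURLConnection`) will pick up the default socket factory or the truststore system properties.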
Thanks for your response. I was able to scrape data over HTTP and HTTPS through one open source web data extractor tool.
But there is one issue with that tool: I cannot scrape data from HTTPS pages that require a session. Please help me or guide me on this issue. I searched the net, but faced only failure.
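Session-protected pages usually start working once the client preserves cookies between requests, because the server tracks the session with a cookie (typically `JSESSIONID`). With the plain JDK you can install a `java.net.CookieManager` as the default cookie handler so that every `HttpURLConnection` stores and replays cookies automatically. A sketch, with a hypothetical site and a hand-added cookie just to show where the session state lives:

```java
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.HttpCookie;
import java.net.URI;

public class SessionDemo {

    // Install a cookie manager so every HttpURLConnection in this JVM
    // keeps and resends cookies, including the server's session cookie.
    static CookieManager install() {
        CookieManager manager = new CookieManager(null, CookiePolicy.ACCEPT_ALL);
        CookieHandler.setDefault(manager);
        return manager;
    }

    public static void main(String[] args) throws Exception {
        CookieManager manager = install();

        // Hypothetical flow: the first request (e.g. a login POST) would set
        // JSESSIONID; later requests reuse it automatically. Here we add a
        // cookie by hand only to show where it is kept.
        HttpCookie session = new HttpCookie("JSESSIONID", "abc123");
        session.setPath("/");
        session.setVersion(0);
        manager.getCookieStore().add(new URI("https://example.com/"), session);

        System.out.println(manager.getCookieStore().getCookies());
    }
}
```

In a real run you would first request the login page (or POST the login form) through `HttpURLConnection`, then fetch the session-protected page; the cookie manager carries the session cookie between the two requests for you.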