
Submitting and Extracting Information from a Web Site

 
Craig Worsell
Greenhorn
Posts: 14
Hi,

I am fairly new to Java, but I thought this question would be too "big" to ask in the beginners' section.

Basically, I am trying to scrape a website for prices, using our existing customer data as the input.

I am unsure whether I can do this with an OutputStreamWriter on a URLConnection, as I need to fill in specific elements (preferably by ID). I can see that org.w3c.dom may be able to handle that part, but I cannot find how to "POST" the result.
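On the "POST" question: submitting a form does not require editing a DOM. The browser simply sends an HTTP POST, whose body is the form's fields URL-encoded, to the URL in the form's action attribute, and a URLConnection can do the same. Below is a minimal sketch using the JDK's HttpURLConnection, assuming the form uses method="post" with the default application/x-www-form-urlencoded encoding; the class, method and field names are hypothetical. Note that the keys are the inputs' name attributes, not their ids, because fields are submitted by name.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

public class FormPost {

    // Builds an application/x-www-form-urlencoded body, e.g. "name=J+Smith&postcode=AB1+2CD".
    // Keys are the inputs' name attributes (hypothetical here), not their ids.
    static String encodeForm(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            body.add(URLEncoder.encode(field.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(field.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    // POSTs the fields to the form's action URL and returns the HTTP status code.
    static int postForm(URL action, Map<String, String> fields) throws IOException {
        byte[] body = encodeForm(fields).getBytes(StandardCharsets.UTF_8);
        HttpURLConnection conn = (HttpURLConnection) action.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true); // we are writing a request body
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }
        return conn.getResponseCode(); // the resulting page is available from conn.getInputStream()
    }
}
```

For example, encodeForm turns the fields name = "J Smith" and postcode = "AB1 2CD" (in that order) into name=J+Smith&postcode=AB1+2CD. If the landing page's cookies must travel with the POST, install a cookie handler once before the first request, e.g. CookieHandler.setDefault(new CookieManager()); HttpURLConnection does not store or resend cookies unless a default handler is set.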

The scrape also needs to follow this journey through the site:
1. Go to a landing page (to set certain cookies), then navigate to the form page.
2. Fill in the information and submit it.
3. Submit a confirmation screen.
4. Get the price details from the generated Price Summary page.
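The four steps above map onto a sequence of HTTP requests that share one cookie store. Here is a sketch using the JDK 11+ java.net.http.HttpClient with a CookieManager, so cookies set by the landing page are sent back automatically on every later request. All URLs, field names and the "confirm" field are hypothetical placeholders, and the sketch assumes that submitting the confirmation screen returns (possibly via a redirect) the Price Summary page.

```java
import java.io.IOException;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.stream.Collectors;

public class PriceJourney {

    // One client for the whole journey: its CookieManager keeps the cookies set by
    // the landing page and sends them back on every later request.
    static HttpClient newClient() {
        CookieManager cookies = new CookieManager(null, CookiePolicy.ACCEPT_ALL);
        return HttpClient.newBuilder()
                .cookieHandler(cookies)
                .followRedirects(HttpClient.Redirect.NORMAL) // sites often redirect after a POST
                .build();
    }

    static String get(HttpClient client, String url) throws IOException, InterruptedException {
        HttpRequest req = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(req, HttpResponse.BodyHandlers.ofString()).body();
    }

    // Sends the fields as a URL-encoded form body, like a browser submitting a form.
    static String post(HttpClient client, String url, Map<String, String> fields)
            throws IOException, InterruptedException {
        String body = fields.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
        HttpRequest req = HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        return client.send(req, HttpResponse.BodyHandlers.ofString()).body();
    }

    // Hypothetical URLs and field names; take the real ones from the site's HTML.
    static String fetchPriceSummary(Map<String, String> customer)
            throws IOException, InterruptedException {
        HttpClient client = newClient();
        get(client, "https://example.com/landing");                       // 1. landing page sets cookies
        get(client, "https://example.com/quote-form");                    // 1. then the form page
        post(client, "https://example.com/quote-form/submit", customer);  // 2. fill in and submit
        // 3. submit the confirmation screen; assumed to lead to the Price Summary page (4.)
        return post(client, "https://example.com/quote-confirm", Map.of("confirm", "yes"));
    }
}
```

In practice the form and confirmation pages often contain hidden input fields (session tokens and the like) that have to be read from the returned HTML and posted back along with the customer data; the sketch ignores the form page's body for brevity.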

Any help or pointers with this will be very much appreciated.

P.S. I currently have a program built with Excel VBA that does this job, but I am hoping Java will be more efficient and save me from having to drive an actual browser.

Thanks.
 
Ove Lindström
Ranch Hand
Posts: 326
Web pages are written in HTML and are usually fairly well structured. If you know that structure, you can open an HTTP connection to the site and fetch the page. I would use something like Apache HttpComponents HttpClient for that.

Then you can use an XML parser or an HTML parser to find the elements you need.
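As a stdlib-only illustration of the parsing step: if a page happens to be well-formed XHTML, the JDK's own XML parser (the org.w3c.dom API mentioned in the question) plus XPath can pull out an element by its id. This is a sketch under that assumption; most real-world HTML is not well-formed XML (unclosed tags, entities such as &nbsp;), in which case a dedicated HTML parser such as jsoup, or HtmlUnit's own DOM, is the practical choice. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

public class PriceExtractor {

    // Returns the trimmed text content of the element whose id attribute equals `id`,
    // or "" if there is none. Only works if the page is well-formed XML (XHTML).
    static String textById(String xhtml, String id) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Don't download the XHTML DTD named in a DOCTYPE from w3.org on every parse.
            factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // XPath rather than Document.getElementById(): the DOM method only matches
            // attributes declared as type ID (e.g. via a DTD), which this parse does not do.
            return xpath.evaluate("//*[@id='" + id + "']", doc).trim();
        } catch (Exception e) {
            throw new IllegalArgumentException("page could not be parsed as XML", e);
        }
    }
}
```

For example, textById("<html><body><td id='total'> 123.45 </td></body></html>", "total") returns "123.45".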
 
Craig Worsell
Greenhorn
Posts: 14
Thanks for your reply, Ove. Unfortunately, I do not know how to use what you are suggesting. I can connect to the page with the code below:



But I really don't know how to work with the page/s from here.
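Whatever the connection code looks like, the next step is reading the response body into a String so that it can be parsed. A minimal sketch, assuming conn is an already configured URLConnection (the class and method names are hypothetical). It takes the charset from the Content-Type header and falls back to UTF-8, which is an assumption about the site.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class PageReader {

    // Extracts the charset from a header like "text/html; charset=ISO-8859-1".
    // Falls back to UTF-8 when the header is missing or names no charset
    // (an assumption: the page may declare its charset in a <meta> tag instead).
    static Charset charsetOf(String contentType) {
        if (contentType != null) {
            for (String param : contentType.split(";")) {
                String p = param.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length()).replace("\"", "").trim();
                    return Charset.forName(name);
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    // Reads the whole response body of the connection into a String.
    static String readPage(URLConnection conn) throws IOException {
        Charset cs = charsetOf(conn.getContentType());
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), cs);
        }
    }
}
```

The resulting String is what you would hand to an XML or HTML parser to locate the price elements.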
 
Tim Moores
Saloon Keeper
Posts: 4024
You can make your job easier by using a library like HtmlUnit. It takes care of all the networking, lets you navigate a website programmatically, and has APIs for extracting individual parts of a web page.
 
Craig Worsell
Greenhorn
Posts: 14

Thanks Tim, looks like it'll do the job perfectly. I'll give it a go.
 