
Java code for extracting data from a HTML Table from a web page

 
Nandu Vajjala
Greenhorn
Posts: 8
Hi
We need some pointers on how to extract data from an HTML table on a web page using a Java program.
For instance: http://www.fsa.gov.uk/ukla/hcaList.do

The page at the above link has a table in the following format:

Company name           Country of Incorporation  Home member state
3I INFRASTRUCTURE PLC  CHANNEL ISLANDS           UNITED KINGDOM
888 HOLDINGS PLC       GIBRALTAR                 UNITED KINGDOM

We need to extract the data and write it to a CSV file.

Thanks
Anand Vardhan


 
Praveen mourya Kumar
Greenhorn
Posts: 16
Hibernate Java Spring
Hi,

Your problem can be solved by using a web crawler written in Java. For more help, see the following link:
http://java.sun.com/developer/technicalArticles/ThirdParty/WebCrawler/
Good luck.
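
Once the page has been downloaded, the table-to-CSV step asked about in the question could look roughly like the sketch below. It is only an illustration, not code from the linked article: the class name HtmlTableToCsv and its methods are made up for this example, and it assumes a simple, well-formed table with no nested tables. It uses regular expressions from the JDK so that it runs without extra libraries; for real-world pages a proper HTML parser (jsoup, for example) is likely to be more robust.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration; assumes simple, non-nested tables.
public class HtmlTableToCsv {

    // Matches one table row and captures everything between <tr ...> and </tr>.
    private static final Pattern ROW =
            Pattern.compile("<tr\\b[^>]*>(.*?)</tr>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Matches one header or data cell and captures its inner HTML.
    private static final Pattern CELL =
            Pattern.compile("<t[dh]\\b[^>]*>(.*?)</t[dh]>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Converts every table row found in the HTML into one CSV line. */
    public static String toCsv(String html) {
        StringBuilder csv = new StringBuilder();
        Matcher row = ROW.matcher(html);
        while (row.find()) {
            List<String> fields = new ArrayList<>();
            Matcher cell = CELL.matcher(row.group(1));
            while (cell.find()) {
                fields.add(quote(cleanCell(cell.group(1))));
            }
            // Skip rows that contain no cells at all.
            if (!fields.isEmpty()) {
                csv.append(String.join(",", fields)).append("\n");
            }
        }
        return csv.toString();
    }

    // Strips nested tags, decodes a few common entities, and collapses whitespace.
    static String cleanCell(String innerHtml) {
        String text = innerHtml.replaceAll("<[^>]+>", "");
        text = text.replace("&nbsp;", " ")
                   .replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&quot;", "\"")
                   .replace("&amp;", "&");
        return text.replaceAll("\\s+", " ").trim();
    }

    // Quotes a field only when it contains a comma, a double quote, or a line break.
    static String quote(String field) {
        if (field.contains(",") || field.contains("\"") || field.contains("\n")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    public static void main(String[] args) {
        // Sample markup built from the rows quoted in the question.
        String html = "<table>"
                + "<tr><th>Company name</th><th>Country of Incorporation</th><th>Home member state</th></tr>"
                + "<tr><td>3I INFRASTRUCTURE PLC</td><td>CHANNEL ISLANDS</td><td>UNITED KINGDOM</td></tr>"
                + "<tr><td>888 HOLDINGS PLC</td><td>GIBRALTAR</td><td>UNITED KINGDOM</td></tr>"
                + "</table>";
        System.out.print(toCsv(html));
    }
}
```

The string returned by toCsv can then be written to a file, for example with java.nio.file.Files.writeString on Java 11 or later. Fields are quoted only when needed, so a company name containing a comma would still end up in a single CSV column.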
 
Alec Lee
Ranch Hand
Posts: 569
You would be better off using JavaScript to extract the HTML data. Using Java means your program needs to act as an HTTP client. Although open-source solutions such as Jakarta's HttpClient already exist, JavaScript is a much better choice because the browser already supports it. In particular, you will probably need to use an HTA (HTML Application) file (see MSDN for details; it is basically an HTML file with embedded JavaScript, renamed to .hta).
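
For comparison, having a Java program act as the HTTP client can be fairly short. The sketch below is only an illustration under stated assumptions: it uses the JDK's built-in java.net.http.HttpClient (available from Java 11) instead of the Jakarta HttpClient mentioned above, and the class name PageFetcher, the 30-second timeout, and the Accept header are choices made for this example.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper class for illustration; requires Java 11 or later.
public class PageFetcher {

    /** Builds a plain GET request for the given page URL. */
    public static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(30))   // assumed timeout, adjust as needed
                .header("Accept", "text/html")
                .GET()
                .build();
    }

    /** Downloads the page and returns its body as a string; fails on non-200 responses. */
    public static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpResponse<String> response =
                client.send(buildRequest(url), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode() + " for " + url);
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // Pass the page URL as the first argument, e.g. the table page from the question.
        String html = fetch(args[0]);
        System.out.println("Downloaded " + html.length() + " characters");
    }
}
```

The returned HTML string can then be handed to whatever code extracts the table rows. Note that this only retrieves the markup the server sends; if a page builds its table with client-side scripts, a plain HTTP fetch like this will not see it.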
 
Christophe Verré
Sheriff
Posts: 14691
Eclipse IDE Ubuntu VI Editor
Do not duplicate threads. Closing this one. Continue there.
 