Win a copy of Programmer's Guide to Java SE 8 Oracle Certified Associate (OCA) this week in the OCAJP forum!
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic

Java code for extracting data from a HTML Table from a web page

 
Nandu Vajjala
Greenhorn
Posts: 8
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi
We need some pointers on how do we extract data from a HTML Table from a web page using a Java program.
For instance: http://www.fsa.gov.uk/ukla/hcaList.do

Above link has a table in the below format

Company name Country of Incorporation Home member state
3I INFRASTRUCTURE PLC CHANNEL ISLANDS UNITED KINGDOM
888 HOLDINGS PLC GIBRALTAR UNITED KINGDOM

We need to extract the data and convert it to a csv format file.

Thanks
Anand Vardhan


 
Amit Vinod Dali
Ranch Hand
Posts: 42
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
You can use HTML Parsers for this requirement as;
Open Source HTML Parsers in Java
 
Ulf Dittmer
Rancher
Posts: 42968
73
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
The easiest would be to use a library like HtmlUnit which provides various ways to extract data from a web page (and also handles the HTTP communication, cookies, etc.).
 
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic