
java html parser.

 
Zeena Shah
Greenhorn
Posts: 5
Hi friends,
I want to write an HTML parser that will take a .dco file as input and parse it.
Please help me if anyone knows about this.
 
Paul Sturrock
Bartender
Posts: 10336
You need to look at java.util.regex which contains the SDK's regular expression classes. At least I think you do - but then I've no idea what a .dco file is so you might be asking something completely different.
 
Zeena Shah
Greenhorn
Posts: 5
Thanks for your reply. By .doc I mean any MS Word document. In fact, I want to write a program that reads a Word file and pulls out all the keywords a user could use to search for that document, like the Google search engine does. I have built a search engine, but manually entering all the related keywords into the database so that a document is available when searched would be too tiring a process.

I hope you understand what I want.
Bye.

Sorry for the typing mistake; it's .doc.
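
The keyword step described above could be sketched roughly as follows. This is only an illustration, assuming the document's text has already been extracted into a String. The class name `KeywordExtractor`, the method `topKeywords`, the tiny stop-word list, and the three-letter minimum are all hypothetical choices, not anything specified in this thread. It simply counts word frequencies and returns the most frequent words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeywordExtractor {
    // Hypothetical, deliberately tiny stop-word list; a real one would be much longer.
    private static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "and", "are", "for", "that", "the", "this", "was", "with"));

    // A run of at least three letters counts as a candidate word (an assumption).
    private static final Pattern WORD = Pattern.compile("\\p{L}{3,}");

    /** Returns up to limit words, most frequent first, ties broken alphabetically. */
    static List<String> topKeywords(String text, int limit) {
        Map<String, Integer> counts = new HashMap<>();
        Matcher m = WORD.matcher(text.toLowerCase(Locale.ROOT));
        while (m.find()) {
            String word = m.group();
            if (!STOP_WORDS.contains(word)) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        List<String> words = new ArrayList<>(counts.keySet());
        words.sort((x, y) -> {
            int byCount = Integer.compare(counts.get(y), counts.get(x));
            return byCount != 0 ? byCount : x.compareTo(y);
        });
        return words.subList(0, Math.min(limit, words.size()));
    }

    public static void main(String[] args) {
        String text = "The parser reads the document. The parser lists document keywords.";
        System.out.println(topKeywords(text, 3)); // prints [document, parser, keywords]
    }
}
```

Raw frequency is a crude measure of relevance; the returned words would still need to be stored in the search engine's database alongside the document.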
 
Arvind Giri
Ranch Hand
Posts: 91
Did you try POI?
Actually, I am working on the same kind of project these days,
so I am also looking for an efficient parser.
 
Joe Ess
Bartender
Posts: 9425
POI HWPF is in a very alpha state and is not under active development, but it will let you read the contents of a Word doc (get the latest version out of SVN). If you are doing anything more complex (editing, converting), I recommend Open Office.
 
Casper Maxwell
Ranch Hand
Posts: 88
For a list of HTML parsers, you can try the following link:

http://www.java-tips.org/java-libraries/html-parser/

If you want to see some usage examples for the java.util.regex package, you can visit the following URL:

http://www.java-tips.org/java-se-tips/java.util.regex/

One related example available there is:

How to find and display hyperlinks contained within a web page
http://www.java-tips.org/java-se-tips/java.util.regex/how-to-find-and-display-hyperlinks-contained-within-a-web-page.html
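
For readers who cannot follow the link, here is a minimal sketch of the same idea using java.util.regex. It is not the code from the linked page. The class name `LinkFinder` and the method `findLinks` are hypothetical, and the pattern only handles quoted `href` values inside `<a>` tags, so it is a rough approximation rather than a real HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkFinder {
    // Matches href="..." or href='...' inside an <a ...> tag, case-insensitively.
    // Group 1 is the quote character, group 2 is the link target.
    // Unquoted href values are not matched (a simplification).
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the href values of all anchor tags found in the given HTML text. */
    static List<String> findLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(2));
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<p><a href=\"http://example.com/a\">A</a> "
                + "<A class='x' HREF='b.html'>B</A></p>";
        for (String link : findLinks(html)) {
            System.out.println(link);
        }
    }
}
```

Regular expressions work for simple, well-formed pages like this; for arbitrary HTML, one of the dedicated parsers from the list above is likely the safer choice.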
 