Java HTML parser

 
Greenhorn
Posts: 5
Hi friends,
I want to make an HTML parser that will take a .dco file as input and parse it.
Please help me if someone knows about it.
 
Bartender
Posts: 10336
Hibernate Eclipse IDE Java
You need to look at java.util.regex which contains the SDK's regular expression classes. At least I think you do - but then I've no idea what a .dco file is so you might be asking something completely different.
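As a rough illustration of what java.util.regex can do here, the sketch below strips HTML tags from a string and collapses the leftover whitespace. It is a minimal example under simple assumptions: the class name `RegexTagStripper` is hypothetical, the tag pattern is deliberately crude, and comments, scripts and character entities are not handled, so it is no substitute for a real HTML parser.

```java
import java.util.regex.Pattern;

public class RegexTagStripper {
    // Matches anything that looks like a tag, e.g. <p>, </b> or <a href="x">; not a full HTML grammar
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    // Matches runs of whitespace so the result is collapsed to single spaces
    private static final Pattern SPACES = Pattern.compile("\\s+");

    /** Replaces every tag with a space, then collapses and trims whitespace. */
    public static String stripTags(String html) {
        String noTags = TAG.matcher(html).replaceAll(" ");
        return SPACES.matcher(noTags).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String html = "<html><body><h1>Title</h1><p>Some <b>bold</b> text.</p></body></html>";
        System.out.println(stripTags(html)); // prints "Title Some bold text."
    }
}
```

Replacing tags with a space rather than an empty string keeps words from adjacent elements (such as a heading and the following paragraph) from being glued together.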
 
Zeena Shah
Greenhorn
Posts: 5
Thanks for your reply. By .doc I mean any MS Word document. In fact, I want to make a program that will read in a Word file and pull out all the keywords that a user can use for searching that document, like what is done in the Google search engine. I made a search engine, but it would be too tiring a process to manually feed all the related keywords into the database so that the document is available when searched.

Hope you understand what I want.
Bye.

Sorry for the typing mistake earlier; it's .doc, not .dco.
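One simple way to pick candidate keywords automatically, assuming the plain text has already been pulled out of the .doc file by some other means, is to count word frequencies and ignore very common words. The sketch below does exactly that; the class name `KeywordExtractor`, the tiny stop-word list and the "skip words shorter than three letters" rule are all hypothetical choices for illustration, not how Google or any particular search engine ranks keywords.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class KeywordExtractor {
    // Hypothetical, deliberately small stop-word list; a real indexer would use a much larger one
    private static final Set<String> STOP_WORDS = Set.of(
        "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in",
        "is", "it", "of", "on", "or", "that", "the", "this", "to", "was", "with");

    /** Returns up to maxKeywords words, most frequent first, ties broken alphabetically. */
    public static List<String> topKeywords(String text, int maxKeywords) {
        Map<String, Integer> counts = new HashMap<>();
        // Split on anything that is not a letter or digit, after lower-casing the text
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (token.length() < 3 || STOP_WORDS.contains(token)) {
                continue; // skip empty/short tokens and common function words
            }
            counts.merge(token, 1, Integer::sum);
        }
        return counts.entrySet().stream()
            .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
            .limit(maxKeywords)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String text = "Java parsers parse documents. A parser for Word documents helps search; "
                    + "search engines index documents.";
        System.out.println(topKeywords(text, 3)); // prints [documents, search, engines]
    }
}
```

The resulting list could then be stored in the search engine's database instead of typing the keywords in by hand. Frequency alone is a crude signal; weighting schemes such as TF-IDF, which also consider how rare a word is across all documents, usually give better keywords.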
 
Ranch Hand
Posts: 91
Did you try POI?
Actually, nowadays I'm working on the same project,
so I'm also searching for an efficient parser.
 
Bartender
Posts: 9565
Mac OS X Linux Windows
POI HWPF is in a very early alpha state and is not under active development, but it will let you read the contents of a Word doc (get the latest version out of SVN). If you are doing anything more complex (editing, converting), I recommend OpenOffice.
 
Ranch Hand
Posts: 88
For a list of HTML parsers, you can try the following link:

http://www.java-tips.org/java-libraries/html-parser/

And if you want to see some examples of using the java.util.regex package, you can visit the following URL:

http://www.java-tips.org/java-se-tips/java.util.regex/

One related example available there is:

How to find and display hyperlinks contained within a web page
http://www.java-tips.org/java-se-tips/java.util.regex/how-to-find-and-display-hyperlinks-contained-within-a-web-page.html
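Without reproducing the linked article, a minimal sketch of the same idea (finding hyperlinks in a page with java.util.regex) might look like the following. The class name `LinkFinder` and the pattern are my own assumptions: the pattern only captures quoted `href` values inside `<a ...>` tags, and a regular expression cannot cope with every valid HTML construct, so a proper HTML parser is safer for real pages.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkFinder {
    // <a, whitespace, any other attributes, then href = "..." or '...'; \1 requires the same closing quote.
    // CASE_INSENSITIVE also accepts <A HREF=...>; DOTALL lets a URL value span a line break.
    private static final Pattern HREF = Pattern.compile(
        "<a\\s[^>]*?href\\s*=\\s*([\"'])(.*?)\\1",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the href values of all anchor tags, in document order. */
    public static List<String> findLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(2)); // group 2 is the URL between the matching quotes
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<p>See <a href=\"http://example.com/a\">A</a> and "
                    + "<A class='x' HREF='/b.html'>B</A>.</p>";
        // prints http://example.com/a then /b.html, one per line
        findLinks(html).forEach(System.out::println);
    }
}
```

Requiring whitespace right after `<a` keeps tags such as `<abbr>` from being mistaken for anchors.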
 