
How to convert Word Doc to HTML using Java

 
Pradeep Kadambar
Ranch Hand
Posts: 148
I have been trying to convert MS Word documents, PDFs, and RTF files to HTML. There is a lot of code and there are many tools available for this in other languages such as VC++ and C.

Can anyone suggest a solution for this in Java? Can I write a custom parser using POI or PDFBox? Please point me to some examples.

:roll:
 
Chris Stehno
Ranch Hand
Posts: 180
I'd say that PDFBox and POI are your best bets, though you may want to look into the OpenOffice (http://www.openoffice.org) API. It may have components that you can use. If so, you could load the document and then use its conversion APIs.

Hope this helps.
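
For the OpenOffice route, one way to avoid writing a parser is to shell out to the office suite's own converter. The sketch below is an assumption rather than something tested in this thread: it assumes LibreOffice (the OpenOffice successor) is installed with `soffice` on the PATH, and it relies on that program's `--headless` and `--convert-to html` command-line options. The class and method names are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper: runs the office suite as an external process.
final class OfficeConvert {

    // Builds the command line. "soffice" being on the PATH is an assumption.
    static List<String> command(Path input, Path outDir) {
        return List.of(
                "soffice",
                "--headless",            // no GUI
                "--convert-to", "html",  // target format
                "--outdir", outDir.toString(),
                input.toString());
    }

    // Starts the conversion, waits for it to finish, and returns the exit code.
    static int convert(Path input, Path outDir) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(input, outDir))
                .inheritIO()
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        int exit = convert(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("converter exited with " + exit);
    }
}
```

Running it with `report.doc` and `out` as arguments should leave `out/report.html` behind if the conversion succeeds. Because it is just a process call, it can be looped over a whole directory of uploads.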
 
Pradeep Kadambar
Ranch Hand
Posts: 148
I looked at OpenOffice. It has a facility to convert documents to HTML, but it requires using a service through a URL.

Can anyone suggest how to use this as a bean? How can it be used to convert a doc to HTML?

 
Joe Urbanek
Greenhorn
Posts: 15
Why don't you just open the document in Word and save it as HTML?
 
Horatio Westock
Ranch Hand
Posts: 221
Originally posted by Joe Urbanek:
Why don't you just open the document in Word and save it as HTML?


I see your point, but what if they need to convert 10,000 of them, or convert them after they have been uploaded to a server, and so on?

Also, the default HTML output from Word is the most horrifically bad code I have ever seen. By writing a conversion program using the APIs mentioned, you have a lot more control over the output.
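
To illustrate what "control over the output" can look like, here is a minimal sketch of the writing half of such a converter. It assumes the paragraph text has already been pulled out of the document by an extraction library such as POI or PDFBox (that step is not shown), and the `CleanHtml` class and its methods are hypothetical names for this example only. It emits nothing but escaped `<p>` elements inside a bare page skeleton.

```java
import java.util.List;

// Hypothetical helper: turns already-extracted paragraph text into plain HTML.
final class CleanHtml {

    // Escapes the characters that are special in HTML text and attributes.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    // Wraps each non-blank paragraph in <p>...</p> inside a minimal page.
    static String toHtml(String title, List<String> paragraphs) {
        StringBuilder sb = new StringBuilder();
        sb.append("<!DOCTYPE html>\n<html>\n<head><title>")
          .append(escape(title))
          .append("</title></head>\n<body>\n");
        for (String p : paragraphs) {
            String text = p.trim();
            if (text.isEmpty()) {
                continue; // skip blank paragraphs instead of emitting empty tags
            }
            sb.append("<p>").append(escape(text)).append("</p>\n");
        }
        sb.append("</body>\n</html>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(toHtml("Sample", List.of("First paragraph.", "Price < 10 & rising")));
    }
}
```

A real converter would also have to map headings, lists, and tables, but the point is the same: the program decides exactly which tags appear in the result.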
 