
How to convert a Word doc to HTML/WML/XML?

 
Robert Paris
Ranch Hand
Posts: 585
Does anyone know how to convert a Word doc to any of HTML, XML, or WML using Java, on Linux and/or Windows?
 
Barry Andrews
Ranch Hand
Posts: 523
C++ Java Ubuntu
Check out HDF (Horrible Document Format) in the Jakarta POI project: http://jakarta.apache.org/poi/index.html
 
Robert Paris
Ranch Hand
Posts: 585
I guess HDF was never tested, even by the person who wrote it. It always fails with a NegativeArraySizeException. I submitted the bug and they accepted it, so I guess they couldn't solve it. I can't figure out the problem either, but I'll say this:
1. For anyone who understands this format: in the LVLF, cbGrpprlChpx somehow comes out as -1. How is that possible? Is -1 a stand-in for null? If so, how should that null be handled?
2. HDF is some of the worst code I've seen in a while (no offense meant). I was surprised, because it's an Apache project, but there is no exception handling anywhere. The program either completes (which it doesn't) or throws an exception and exits without cleaning anything up or closing its streams. It's ugly.
3. There's no explanation at any point in the code of what it's doing. I had to do a lot of research just to figure that out.
4. If anyone has a solution, let me know! I need this to work. The problem only occurs when the document contains multi-level lists (such as bullets).
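For what it's worth, one possible explanation for a length of -1 is that a one-byte count is being read as a signed Java byte, so the raw value 0xFF shows up as -1 instead of 255 and the following array allocation fails. That is only a guess about the cause, not a diagnosis of HDF. The sketch below is not HDF code: the class and method names are made up for illustration, and it assumes a simple layout of one length byte followed by that many payload bytes. It also shows the stream being closed even when reading fails.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical illustration only; none of these names come from HDF or POI.
public class UnsignedLengthSketch {

    // Interpret a single byte as an unsigned count in the range 0..255.
    public static int unsignedByte(byte b) {
        return b & 0xFF;
    }

    // Read a one-byte length prefix followed by that many payload bytes.
    // try-with-resources closes the stream whether or not reading succeeds.
    public static byte[] readLengthPrefixed(InputStream in) throws IOException {
        try (DataInputStream data = new DataInputStream(in)) {
            int length = data.readUnsignedByte(); // 0..255, never negative
            byte[] payload = new byte[length];
            data.readFully(payload);              // throws EOFException if the input is short
            return payload;
        }
    }

    public static void main(String[] args) throws IOException {
        byte raw = (byte) 0xFF;
        System.out.println(raw);               // signed view: -1
        System.out.println(unsignedByte(raw)); // unsigned view: 255

        byte[] payload = readLengthPrefixed(
                new ByteArrayInputStream(new byte[] {2, 10, 20}));
        System.out.println(payload.length);    // 2
    }
}
```

If the field really is meant to be signed and -1 is a sentinel, the right fix would be an explicit check before allocating rather than masking, so this is worth confirming against the format documentation first.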
 