Reading contents from a Microsoft Word document

 
Amirtharaj Chinnaraj
Ranch Hand
Posts: 241
Hi guys,

I need to read a Microsoft Word document and print its contents to the console. While doing that, I ran into a problem: I get some strange characters in the output that are not present in the document.

When I do the same thing with a text (*.txt) file, everything is fine.
 
jeroen dijkmeijer
Ranch Hand
Posts: 132
I think you should have a look at the Apache POI framework.
Regards,
Jeroen.
 
Ulf Dittmer
Rancher
Posts: 42968
.doc files use a binary format: besides the text itself, they contain a lot of data that is not part of the actual text (e.g., layout and formatting information). If you just want the text, use POI as suggested. This page explains how it can be used for text extraction.
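The binary nature of .doc files can be seen without any library: a classic Word 97-2003 .doc is an OLE2 compound file, which starts with the 8-byte signature D0 CF 11 E0 A1 B1 1A E1, whereas a .docx is a ZIP archive starting with `PK\3\4`. The following is a minimal JDK-only sketch (Java 9+) that checks those leading bytes before anything is printed. The class name `WordFileSniffer` and its methods are hypothetical, not part of POI, and the sketch only identifies the container type; extracting the actual text is still a job for POI.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper: guesses a file's container type from its first bytes.
// It does not extract any text; use a real extractor (e.g. Apache POI) for that.
class WordFileSniffer {

    enum Kind { OLE2_BINARY, ZIP_CONTAINER, OTHER }

    // OLE2 compound file signature; Word 97-2003 .doc files (and other legacy Office files) start with it.
    static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // ZIP local file header "PK\3\4"; .docx files are ZIP archives of XML parts.
    static final byte[] ZIP_MAGIC = { 'P', 'K', 3, 4 };

    // True if data begins with every byte of prefix.
    static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    // Classifies a header (the first few bytes of a file).
    static Kind detect(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) {
            return Kind.OLE2_BINARY;
        }
        if (startsWith(header, ZIP_MAGIC)) {
            return Kind.ZIP_CONTAINER;
        }
        return Kind.OTHER;
    }

    // Reads at most 8 bytes from the file and classifies them.
    static Kind detect(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[OLE2_MAGIC.length];
            int n = in.readNBytes(buf, 0, buf.length); // returns 0..8, never -1
            return detect(Arrays.copyOf(buf, n));
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args[0]);
        Kind kind = detect(file);
        if (kind == Kind.OLE2_BINARY) {
            System.out.println(file + ": OLE2 binary file (e.g. .doc); its raw bytes mix binary data with the text");
        } else if (kind == Kind.ZIP_CONTAINER) {
            System.out.println(file + ": ZIP container (e.g. .docx); the text is stored in compressed XML");
        } else {
            System.out.println(file + ": no OLE2/ZIP signature found; it may be plain text");
        }
    }
}
```

Run it as `java WordFileSniffer report.doc`. Only the first 8 bytes are read, so the check is cheap even for large files; note that the OLE2 signature is shared by other legacy Office formats (.xls, .ppt), so a match means "binary Office container", not necessarily "Word document".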