Reading MS Word File

 
Maneeesh Saxena
Greenhorn
Posts: 26
Hi Ranchers,

I need to write Java code to read an MS Word file. I used the org.apache.poi.hwpf.extractor.WordExtractor class from Apache POI for this. Below is the code I tried.

package com.test.model;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

import org.apache.poi.hwpf.extractor.WordExtractor;

public class TestClass {

    public static void main(String[] args) throws IOException {
        File file = null;
        file = new File("C:\\Test.docx");
        FileInputStream inputStream = new FileInputStream(file);
        WordExtractor extractor = new WordExtractor(inputStream);
        String text = extractor.getText();
        System.out.println(text);
    }
}

But I am getting the exception below:


Exception in thread "main" org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears to be in the Office 2007+ XML. You are calling the part of POI that deals with OLE2 Office Documents. You need to call a different part of POI to process this data (eg XSSF instead of HSSF)
at org.apache.poi.poifs.storage.HeaderBlockReader.<init>(HeaderBlockReader.java:116)
at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:153)
at org.apache.poi.hwpf.HWPFDocumentCore.verifyAndBuildPOIFS(HWPFDocumentCore.java:96)
at org.apache.poi.hwpf.extractor.WordExtractor.<init>(WordExtractor.java:53)
at com.test.model.TestClass.main(TestClass.java:17)

Kindly let me know how I can modify my code to make this work.

Best Regards
 
Ulf Dittmer
Rancher
Posts: 42970
73
The error message is very helpful in pointing out exactly what is going wrong, and how to fix it; do you understand what it is trying to tell you? Check the POI javadocs to find the class you need to use instead; it's a subclass of one of WordExtractor's superclasses.
 
Maneeesh Saxena
Greenhorn
Posts: 26
Thanks for your kind reply. Actually, I've never used this API. I was wondering if you could point me to the class I should use.

Best Regards
 
Ulf Dittmer
Rancher
Posts: 42970
73
If you're serious about programming with Java, then you need to get comfortable using the javadoc documentation; the one for POI is at http://poi.apache.org/apidocs/index.html (and is also part of the POI download).

Find the WordExtractor class, look up its superclasses, and then check their respective subclasses for something suitable.
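
For readers who want to see why the exception appears: the message says the file is in the Office 2007+ XML format rather than OLE2. Here is a minimal sketch, using only the JDK, that tells the two formats apart by their first four bytes. The class name WordFormatSniffer and the method suggestExtractor are made up for this illustration and are not part of POI. The only POI names it mentions are WordExtractor (from the original post) and org.apache.poi.xwpf.extractor.XWPFWordExtractor, which is my reading of the class the javadoc search above leads to for .docx files. Treat that mapping as a suggestion to verify against the javadocs.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class WordFormatSniffer {

    // First four bytes of an OLE2 compound file (legacy binary .doc).
    private static final int[] OLE2_MAGIC = {0xD0, 0xCF, 0x11, 0xE0};
    // First four bytes of a ZIP archive; a .docx file is a ZIP package.
    private static final int[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    // Hypothetical helper: names the POI extractor class that matches the header.
    public static String suggestExtractor(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) {
            return "org.apache.poi.hwpf.extractor.WordExtractor";
        }
        if (startsWith(header, ZIP_MAGIC)) {
            return "org.apache.poi.xwpf.extractor.XWPFWordExtractor";
        }
        return "unknown";
    }

    private static boolean startsWith(byte[] data, int[] magic) {
        if (data == null || data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            // Mask to compare as unsigned values, since Java bytes are signed.
            if ((data[i] & 0xFF) != magic[i]) {
                return false;
            }
        }
        return true;
    }

    // Reads up to four bytes from the start of a file.
    public static byte[] readHeader(Path path) throws IOException {
        byte[] header = new byte[4];
        int total = 0;
        try (InputStream in = Files.newInputStream(path)) {
            while (total < header.length) {
                int n = in.read(header, total, header.length - total);
                if (n < 0) {
                    break; // file is shorter than four bytes
                }
                total += n;
            }
        }
        return java.util.Arrays.copyOf(header, total);
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            // Usage: java WordFormatSniffer C:\Test.docx
            System.out.println(suggestExtractor(readHeader(Paths.get(args[0]))));
        } else {
            // Without a file argument, demonstrate with the ZIP signature.
            byte[] zipHeader = {0x50, 0x4B, 0x03, 0x04};
            System.out.println(suggestExtractor(zipHeader));
        }
    }
}
```

The file extension alone is not reliable, since a .doc file can be renamed to .docx and vice versa, which is why the sketch looks at the bytes instead.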
 