
Reading MS Word File

 
Maneeesh Saxena
Greenhorn
Posts: 26
Hi Ranchers,

I need to write Java code to read an MS Word file. I used the org.apache.poi.hwpf.extractor.WordExtractor class for that. Below is the code I tried.

package com.test.model;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

import org.apache.poi.hwpf.extractor.WordExtractor;

public class TestClass {

    public static void main(String[] args) throws IOException {
        File file = null;
        file = new File("C:\\Test.docx");
        FileInputStream inputStream = new FileInputStream(file);
        WordExtractor extractor = new WordExtractor(inputStream);
        String text = extractor.getText();
        System.out.println(text);
    }
}

But I am getting the exception below:

Exception in thread "main" org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears to be in the Office 2007+ XML. You are calling the part of POI that deals with OLE2 Office Documents. You need to call a different part of POI to process this data (eg XSSF instead of HSSF)
at org.apache.poi.poifs.storage.HeaderBlockReader.<init>(HeaderBlockReader.java:116)
at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:153)
at org.apache.poi.hwpf.HWPFDocumentCore.verifyAndBuildPOIFS(HWPFDocumentCore.java:96)
at org.apache.poi.hwpf.extractor.WordExtractor.<init>(WordExtractor.java:53)
at com.test.model.TestClass.main(TestClass.java:17)

Kindly let me know how I can modify my code to make this work.

Best Regards
 
Ulf Dittmer
Rancher
Posts: 42970
The error message is very helpful in pointing out exactly what is going wrong, and how to fix it; do you understand what it is trying to tell you? Check the POI javadocs to find the class you need to use instead; it's a subclass of one of WordExtractor's superclasses.
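
To make the message concrete: a legacy .doc file is an OLE2 compound document, which starts with the signature bytes D0 CF 11 E0 A1 B1 1A E1, while a .docx file is a ZIP package, which starts with "PK" followed by 03 04. The sketch below uses only the JDK to check which of the two a file looks like. The class name WordFormatSniffer and the returned labels are made up for illustration and are not part of POI. Note that a ZIP signature alone only suggests an Office 2007+ file; any ZIP archive starts the same way.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class WordFormatSniffer {

    // Signature at the start of every OLE2 compound document (.doc, .xls, ...).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // ZIP local file header signature "PK\003\004" (.docx, .xlsx, any ZIP).
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    // Classifies the leading bytes of a file. Labels are illustrative only.
    static String sniff(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) {
            return "OLE2";
        }
        if (startsWith(header, ZIP_MAGIC)) {
            return "OOXML";
        }
        return "UNKNOWN";
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "C:\\Test.docx";
        byte[] header = new byte[8];
        try (InputStream in = new FileInputStream(path)) {
            int n = in.read(header);
            // Trim to the bytes actually read (n is -1 for an empty file).
            header = Arrays.copyOf(header, Math.max(n, 0));
        }
        System.out.println(path + " looks like: " + sniff(header));
    }
}
```

A file that reports "OOXML" here is the kind that the HWPF classes (including WordExtractor) reject with OfficeXmlFileException.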
 
Maneeesh Saxena
Greenhorn
Posts: 26
Thanks for your kind reply. I've never used this API before, though. Could you point me to the class I should use to get this working?

Best Regards
 
Ulf Dittmer
Rancher
Posts: 42970
If you're serious about programming with Java, then you need to get comfortable using the javadoc documentation; the one for POI is at http://poi.apache.org/apidocs/index.html (and is also part of the POI download).

Find the WordExtractor class, look up its superclasses, and then check their respective subclasses for something suitable.
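
For readers who follow that trail through the javadocs: WordExtractor descends from POITextExtractor, and on the Office 2007+ side the text extractor for .docx files is XWPFWordExtractor in org.apache.poi.xwpf.extractor, which is presumably the class being hinted at. Below is a minimal sketch of the original program adapted to it. It assumes the poi-ooxml jar and its dependencies are on the classpath and that the POI version in use has closeable extractors; the class name DocxTextReader and the helper readDocx are invented for this example.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class DocxTextReader {

    // Reads the plain text of a .docx (Office 2007+ XML) file.
    // Hypothetical helper; not part of POI itself.
    static String readDocx(String path) throws IOException {
        try (InputStream in = new FileInputStream(path);
             XWPFWordExtractor extractor =
                     new XWPFWordExtractor(new XWPFDocument(in))) {
            return extractor.getText();
        }
    }

    public static void main(String[] args) throws IOException {
        // Same path as in the original post unless one is given on the command line.
        String path = args.length > 0 ? args[0] : "C:\\Test.docx";
        System.out.println(readDocx(path));
    }
}
```

The HWPF WordExtractor from the original code remains the right choice for old binary .doc files; only the .docx case needs the XWPF classes.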
 