Hi All,
I want to convert DOC, DOCX and RTF files to plain text.
Below is my code:
import java.io.*;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class DocToText {
    public static void main(String[] args) {
        File file = new File("E:\\Search.doc");
        File fil = new File("E:\\New.txt");
        // Read the Doc/Docx file; try-with-resources closes the input stream
        try (FileInputStream fis = new FileInputStream(file)) {
            XWPFDocument doc = new XWPFDocument(fis);
            XWPFWordExtractor ex = new XWPFWordExtractor(doc);
            String text = ex.getText();
            // Write the text to a .txt file; the writer is closed automatically
            try (Writer output = new BufferedWriter(new FileWriter(fil))) {
                output.write(text);
            }
        } catch (Exception exep) {
            exep.printStackTrace();
        }
    }
}
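The code above sends every file through XWPFDocument, which reads only the OOXML .docx format, so it has no path for RTF at all. One option that needs no extra library is the JDK's own javax.swing.text.rtf.RTFEditorKit. This is a minimal sketch, assuming fairly simple RTF input: RTFEditorKit understands only a subset of RTF, so complex documents may lose content. The class name RtfToText is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.rtf.RTFEditorKit;

public class RtfToText {

    // Parses RTF from the stream into a Swing document model using the
    // JDK's built-in RTFEditorKit, then returns the document's plain text.
    static String rtfToText(InputStream in) throws IOException, BadLocationException {
        RTFEditorKit kit = new RTFEditorKit();
        Document doc = kit.createDefaultDocument();
        kit.read(in, doc, 0);
        return doc.getText(0, doc.getLength());
    }

    public static void main(String[] args) throws Exception {
        // Small in-memory RTF sample; for a real file, pass a FileInputStream instead.
        String rtf = "{\\rtf1\\ansi Hello RTF\\par}";
        try (InputStream in = new ByteArrayInputStream(rtf.getBytes(StandardCharsets.US_ASCII))) {
            System.out.println(rtfToText(in).trim());
        }
    }
}
```

The returned string can be written to the .txt file with the same BufferedWriter approach used for the .docx case.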
It works for some files, but for others I get the following exception:
org.apache.poi.openxml4j.exceptions.InvalidFormatException: Package should contain a content type part [M1.13].
How can I resolve this? Any help is appreciated.
Thanks.
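That exception comes from POI's OOXML package reader (openxml4j), which expects a ZIP package containing a [Content_Types].xml part. Given the E:\Search.doc path in the code, a likely cause is that the failing files are legacy binary Word 97-2003 .doc files (OLE2 compound files) or RTF files, which XWPFDocument cannot open. For binary .doc files, POI's HWPF classes (org.apache.poi.hwpf.extractor.WordExtractor, shipped in the poi-scratchpad artifact) are the usual route. The JDK-only sketch below checks the first bytes of a file to tell these formats apart, so each file can be routed to a suitable extractor. It assumes Java 11+, and the class WordFormatSniffer and its Format enum are hypothetical names. OLE2 is a generic container (password-protected .docx files and .xls files also use it), so the check indicates the container type, not a guaranteed .doc.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class WordFormatSniffer {

    enum Format { OLE2, OOXML_ZIP, RTF, UNKNOWN }

    // OLE2 compound file header, used by legacy binary .doc files.
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1 };
    // ZIP local-file-header signature; a .docx file is a ZIP package.
    private static final byte[] ZIP = { 'P', 'K', 0x03, 0x04 };
    // RTF documents begin with the ASCII text "{\rtf".
    private static final byte[] RTF = { '{', '\\', 'r', 't', 'f' };

    // Classifies a file by its leading bytes (signature), ignoring the extension.
    static Format detect(byte[] header) {
        if (startsWith(header, OLE2)) return Format.OLE2;
        if (startsWith(header, ZIP)) return Format.OOXML_ZIP;
        if (startsWith(header, RTF)) return Format.RTF;
        return Format.UNKNOWN;
    }

    // Reads up to 8 bytes (the longest signature above) from the file and classifies them.
    static Format detect(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return detect(in.readNBytes(8));
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(data, 0, prefix.length, prefix, 0, prefix.length);
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            System.out.println(name + " -> " + detect(Path.of(name)));
        }
    }
}
```

Running it as `java WordFormatSniffer E:\Search.doc` prints the detected container type. If it reports OLE2 for a file that fails, that file would need HWPF rather than XWPFDocument.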