
InvalidFormatException

 
vamshi gurudu
Greenhorn
Posts: 21
Hi all,

I want to convert DOC, DOCX and RTF files to plain text.
Below is my code.

import java.io.*;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class DocToText {
    public static void main(String[] args) {
        File file = null;

        try {
            // Read the DOC/DOCX file
            file = new File("E:\\Search.doc");
            FileInputStream fis = new FileInputStream(file.getAbsolutePath());
            XWPFDocument doc = new XWPFDocument(fis);
            XWPFWordExtractor ex = new XWPFWordExtractor(doc);
            String text = ex.getText();

            // Write the text to a .txt file
            File fil = new File("E:\\New.txt");
            Writer output = new BufferedWriter(new FileWriter(fil));
            output.write(text);
            output.close();
        } catch (Exception exep) {
            exep.printStackTrace();
        }
    }
}

It works properly for some files, but for others I get this exception:
org.apache.poi.openxml4j.exceptions.InvalidFormatException: Package should contain a content type part [M1.13].

How can I resolve this? Any help would be appreciated.

Thanks.
 
Greg Charles
Sheriff
Posts: 3014
I've never used POI, but judging from the number of questions we get about it here, it must be pretty popular. Here's my process when faced with a cryptic error message:

1. Copy the whole text of the message into a Google search box.
2. Click on whichever result looks the most relevant.

In this case, I get: http://comments.gmane.org/gmane.comp.jakarta.poi.devel/18439, which says:

POI does not support the Open Document Format (odt, ods, etc) files
irrespective of which application created them. If you are looking for a
Java api that will support this type of file then you need to use the
ODFToolkit - http://odftoolkit.openoffice.org/


So, it looks to me like you've got an OpenOffice document there and not an MS Word one. Is that possible?
 
Rob Spoor
Sheriff
Posts: 21050
I don't think it's an ODT file, unless the extension is completely wrong.

Vamshi, what happens if you switch from org.apache.poi.xwpf.extractor.XWPFWordExtractor and org.apache.poi.xwpf.usermodel.XWPFDocument to org.apache.poi.hwpf.extractor.WordExtractor and org.apache.poi.hwpf.HWPFDocument? Perhaps the document isn't a Word 2007/2010 document but comes from a previous version.
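
One way to test that guess before choosing a POI class is to look at the first few bytes of the file, since the extension alone can be misleading. The following is a minimal sketch using only the JDK (Java 11 or later, for readNBytes); the class name WordFormatSniffer and its Format enum are hypothetical names made up for illustration and are not part of POI. It relies on binary .doc files being OLE2 compound documents, .docx files being ZIP packages, and RTF files starting with {\rtf. A ZIP signature only says the file is some ZIP-based package, so an ODT file would match too.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of POI): guesses the container format from the file's first bytes.
public class WordFormatSniffer {

    public enum Format { OLE2, ZIP, RTF, UNKNOWN }

    // OLE2 compound document signature (Word 97-2003 .doc files use this container)
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0, (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local file header signature (.docx is a ZIP package, but so is .odt)
    private static final byte[] ZIP_MAGIC = {'P', 'K', 0x03, 0x04};
    // RTF files begin with the literal characters {\rtf
    private static final byte[] RTF_MAGIC = {'{', '\\', 'r', 't', 'f'};

    // Classify a buffer holding the leading bytes of a file.
    public static Format sniff(byte[] head) {
        if (startsWith(head, OLE2_MAGIC)) return Format.OLE2;
        if (startsWith(head, ZIP_MAGIC)) return Format.ZIP;
        if (startsWith(head, RTF_MAGIC)) return Format.RTF;
        return Format.UNKNOWN;
    }

    // Read up to 8 bytes from the file and classify them.
    public static Format sniff(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return sniff(in.readNBytes(OLE2_MAGIC.length));
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            System.out.println(name + " -> " + sniff(Paths.get(name)));
        }
    }
}
```

An OLE2 result would point towards HWPFDocument and WordExtractor, and a ZIP result towards XWPFDocument and XWPFWordExtractor.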
 
vamshi gurudu
Greenhorn
Posts: 21
Thanks for the replies.

I have solved the problem.
Here is the approach I used.

For DOC: org.apache.poi.hwpf.extractor.WordExtractor.WordExtractor(InputStream is); if it throws an exception, I fall back to org.apache.poi.xwpf.usermodel.XWPFDocument.XWPFDocument(InputStream is)

For DOCX: org.apache.poi.xwpf.extractor.XWPFWordExtractor.XWPFWordExtractor

For RTF: javax.swing.text.rtf.RTFEditorKit.RTFEditorKit()

All of these convert the file to text.
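
For the RTF case, here is a minimal sketch of what the RTFEditorKit route might look like, using only JDK classes. The class name RtfToText and the sample RTF string are made up for illustration, and this is not necessarily the exact code used above. It reads the RTF stream into a styled document and then pulls out the document's plain text.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.rtf.RTFEditorKit;

// Hypothetical helper: extracts plain text from an RTF stream with the JDK's RTFEditorKit.
public class RtfToText {

    public static String extract(InputStream in) throws IOException, BadLocationException {
        RTFEditorKit kit = new RTFEditorKit();
        // createDefaultDocument() returns a styled document, which the RTF reader requires
        Document doc = kit.createDefaultDocument();
        // Parse the RTF content into the document, starting at offset 0
        kit.read(in, doc, 0);
        // Return the document's text with all formatting dropped
        return doc.getText(0, doc.getLength());
    }

    public static void main(String[] args) throws Exception {
        // Tiny in-memory RTF sample so the sketch runs without any file on disk
        String rtf = "{\\rtf1\\ansi Hello, RTF}";
        try (InputStream in = new ByteArrayInputStream(rtf.getBytes(StandardCharsets.US_ASCII))) {
            System.out.println(extract(in));
        }
    }
}
```

For a real file, a FileInputStream (or Files.newInputStream) would take the place of the in-memory stream.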

Thanks, Rob Spoor, your reply prompted me to experiment.

Thanks to both of you for the helpful information.


 
Rob Spoor
Sheriff
Posts: 21050
You're welcome.
 