InvalidFormatException

 
Greenhorn
Posts: 21
Hi all,

I want to convert DOC, DOCX, and RTF files to plain text.
Below is my code.

import java.io.*;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class DocToText {
    public static void main(String[] args) {
        // Read the Doc/DOCx file
        File file = new File("E:\\Search.doc");

        // try-with-resources closes the input stream even if extraction fails
        try (FileInputStream fis = new FileInputStream(file)) {
            XWPFDocument doc = new XWPFDocument(fis);
            XWPFWordExtractor ex = new XWPFWordExtractor(doc);
            String text = ex.getText();

            // Write the extracted text to a txt file
            try (Writer output = new BufferedWriter(new FileWriter("E:\\New.txt"))) {
                output.write(text);
            }
        } catch (Exception exep) {
            exep.printStackTrace();
        }
    }
}

It works properly for some files, but for others I get this exception:
org.apache.poi.openxml4j.exceptions.InvalidFormatException: Package should contain a content type part [M1.13].

How can I resolve this issue? Any help would be appreciated.

Thanks.
 
Sheriff
Posts: 3034
I've never used POI, but judging from the number of questions we get about it here, it must be pretty popular. Here's my process when faced with a cryptic error message:

1. Copy the whole text of the message into a Google search box.
2. Click on whichever result looks the most relevant.

In this case, I get: http://comments.gmane.org/gmane.comp.jakarta.poi.devel/18439, which says:

POI does not support the Open Document Format (odt, ods, etc) files
irrespective of which application created them. If you are looking for a
Java api that will support this type of file then you need to use the
ODFToolkit - http://odftoolkit.openoffice.org/



So, it looks to me like you've got an Open Office document there and not an MS Word one. Is that possible?
 
Sheriff
Posts: 21817
I don't think it's an ODT file, unless the extension is completely wrong.

Vamshi, what happens if you switch from org.apache.poi.xwpf.extractor.XWPFWordExtractor and org.apache.poi.xwpf.usermodel.XWPFDocument to org.apache.poi.hwpf.extractor.WordExtractor and org.apache.poi.hwpf.HWPFDocument? Perhaps the document isn't a Word 2007/2010 document but comes from a previous version.
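
If the suspicion above is right, the file extension and the actual container format disagree. Here is a minimal sketch of checking the first few bytes of a file before choosing between HWPF and XWPF. It uses only the JDK; the class name WordFormatSniffer and its Format enum are hypothetical and not part of POI. The signatures checked are the OLE2 header (the container used by Word 97-2003 .doc files), the ZIP local-file header (used by .docx, but also by .odt and ordinary ZIP archives, so a ZIP result is only a hint), and the {\rtf prefix.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class WordFormatSniffer {
    // Hypothetical labels for this sketch; ZIP does not prove the file is a .docx.
    enum Format { OLE2, ZIP, RTF, UNKNOWN }

    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    private static final byte[] ZIP_MAGIC = {'P', 'K', 0x03, 0x04};
    private static final byte[] RTF_MAGIC = {'{', '\\', 'r', 't', 'f'};

    // Peeks at up to 8 bytes and resets the stream so the caller can still parse it.
    static Format sniff(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(8);
        byte[] head = new byte[8];
        int n = 0;
        while (n < head.length) {
            int r = in.read(head, n, head.length - n);
            if (r < 0) {
                break; // file shorter than 8 bytes
            }
            n += r;
        }
        in.reset();

        if (startsWith(head, n, OLE2_MAGIC)) return Format.OLE2;
        if (startsWith(head, n, ZIP_MAGIC)) return Format.ZIP;
        if (startsWith(head, n, RTF_MAGIC)) return Format.RTF;
        return Format.UNKNOWN;
    }

    private static boolean startsWith(byte[] head, int len, byte[] magic) {
        if (len < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (head[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java WordFormatSniffer <file>");
            return;
        }
        // BufferedInputStream supplies the mark/reset support that FileInputStream lacks.
        try (InputStream in = new BufferedInputStream(new FileInputStream(args[0]))) {
            System.out.println(sniff(in));
        }
    }
}
```

A caller could hand an OLE2 stream to the HWPF classes, a ZIP stream to the XWPF classes, and an RTF stream to an RTF reader, rather than trusting the .doc or .docx extension.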
 
vamshi gurudu
Greenhorn
Posts: 21
Thanks for the replies.

I have solved the problem.
Here is what I ended up using.

For DOC: org.apache.poi.hwpf.extractor.WordExtractor(InputStream is); if it throws an exception, I fall back to org.apache.poi.xwpf.usermodel.XWPFDocument(InputStream is).

For DOCX: org.apache.poi.xwpf.extractor.XWPFWordExtractor

For RTF: javax.swing.text.rtf.RTFEditorKit

All of these convert the file to text.

Thanks, Rob Spoor; your reply prompted me to experiment.

Thanks to both of you for the useful information.
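
For the RTF case mentioned above, the following is a rough sketch of extracting plain text with the JDK's javax.swing.text.rtf.RTFEditorKit. The class name RtfToText, the method name rtfToText, and the sample RTF string are assumptions made for illustration; this is not necessarily the exact code used in this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.rtf.RTFEditorKit;

public class RtfToText {
    // Parses RTF from the stream into a Swing document and returns its plain text.
    static String rtfToText(InputStream in) throws IOException, BadLocationException {
        RTFEditorKit kit = new RTFEditorKit();
        Document doc = kit.createDefaultDocument();
        kit.read(in, doc, 0); // insert the parsed content at position 0
        return doc.getText(0, doc.getLength());
    }

    public static void main(String[] args) throws Exception {
        // Tiny in-memory RTF sample (hypothetical); a real program would open a file instead.
        String rtf = "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Helvetica;}}\\f0\\pard Hello, world.\\par}";
        try (InputStream in = new ByteArrayInputStream(rtf.getBytes(StandardCharsets.US_ASCII))) {
            System.out.println(rtfToText(in));
        }
    }
}
```

RTFEditorKit ships with the JDK, so this path needs no POI dependency. Its RTF support is fairly basic, so complex documents may lose some content.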


 
Rob Spoor
Sheriff
Posts: 21817
You're welcome.
 