InvalidFormatException

 
vamshi gurudu
Greenhorn
Posts: 21
Hi All,

I want to convert DOC, DOCX and RTF files to plain text.
Below is my code.

import java.io.*;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class DocToText {
    public static void main(String[] args) {
        File file = null;

        try {
            // Read the DOC/DOCX file
            file = new File("E:\\Search.doc");
            FileInputStream fis = new FileInputStream(file.getAbsolutePath());
            XWPFDocument doc = new XWPFDocument(fis);
            XWPFWordExtractor ex = new XWPFWordExtractor(doc);
            String text = ex.getText();

            // Write the extracted text to a .txt file
            File fil = new File("E:\\New.txt");
            Writer output = new BufferedWriter(new FileWriter(fil));
            output.write(text);
            output.close();
        } catch (Exception exep) {
            exep.printStackTrace();
        }
    }
}

This works properly for some files, but for others I get the following exception:
org.apache.poi.openxml4j.exceptions.InvalidFormatException: Package should contain a content type part [M1.13].

How can I resolve this issue? Any help is appreciated.

Thanks.
 
Sheriff
Posts: 3064
I've never used POI, but judging from the number of questions we get about it here, it must be pretty popular. Here's my process when faced with a cryptic error message:

1. Copy the whole text of the message into a Google search box.
2. Click on whichever result looks the most relevant.

In this case, I get: http://comments.gmane.org/gmane.comp.jakarta.poi.devel/18439, which says:

POI does not support the Open Document Format (odt, ods, etc) files
irrespective of which application created them. If you are looking for a
Java api that will support this type of file then you need to use the
ODFToolkit - http://odftoolkit.openoffice.org/



So, it looks to me like you've got an OpenOffice document there and not an MS Word one. Is that possible?
 
Rob Spoor
Sheriff
Posts: 22818
I don't think it's an ODT file, unless the extension is completely wrong.

Vamshi, what happens if you switch from org.apache.poi.xwpf.extractor.XWPFWordExtractor and org.apache.poi.xwpf.usermodel.XWPFDocument to org.apache.poi.hwpf.extractor.WordExtractor and org.apache.poi.hwpf.HWPFDocument? Perhaps the document isn't a Word 2007/2010 document but comes from a previous version.
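
One way to find out which of those two APIs a given file needs is to look at its first bytes instead of trusting the extension. The following is only a rough sketch: WordFormatSniffer and its Kind enum are hypothetical names, not part of POI, and the Path overload assumes Java 11 or later for readNBytes. It relies on the fact that pre-2007 .doc files are OLE2 containers, .docx files are ZIP packages, and RTF files start with {\rtf.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper (not part of POI): guesses the container format
// of a file from its leading "magic" bytes.
class WordFormatSniffer {

    enum Kind { OLE2, ZIP, RTF, UNKNOWN }

    // OLE2 compound document header, used by legacy .doc (and .xls/.ppt) files
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local file header, used by .docx (and also by .odt and plain .zip)
    private static final byte[] ZIP_MAGIC = { 'P', 'K', 3, 4 };
    // RTF files begin with the literal text {\rtf
    private static final byte[] RTF_MAGIC = { '{', '\\', 'r', 't', 'f' };

    // Classifies a buffer holding at least the first 8 bytes of the file.
    static Kind sniff(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) return Kind.OLE2;
        if (startsWith(header, ZIP_MAGIC)) return Kind.ZIP;
        if (startsWith(header, RTF_MAGIC)) return Kind.RTF;
        return Kind.UNKNOWN;
    }

    // Convenience overload that reads the first 8 bytes of a file on disk.
    static Kind sniff(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return sniff(in.readNBytes(8));
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }
}
```

A result of OLE2 points towards HWPFDocument and WordExtractor, ZIP towards XWPFDocument and XWPFWordExtractor. A ZIP header alone does not prove the file is a .docx: an .odt file is also a ZIP archive, so XWPFDocument can still reject it.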
 
vamshi gurudu
Greenhorn
Posts: 21
Thanks for the replies.

I have solved the problem.
Below is the approach I used.

For DOC : org.apache.poi.hwpf.extractor.WordExtractor.WordExtractor(InputStream is); if it throws an exception, I fall back to org.apache.poi.xwpf.usermodel.XWPFDocument.XWPFDocument(InputStream is)

For DOCX : org.apache.poi.xwpf.extractor.XWPFWordExtractor.XWPFWordExtractor

For RTF : javax.swing.text.rtf.RTFEditorKit.RTFEditorKit()

All of these convert the file to text.

Thanks, Rob Spoor, your reply prompted me to experiment.

Thanks to both of you for the helpful information.
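
For the RTF case, which needs no POI classes at all, the extraction could look roughly like the sketch below. The class name RtfToText and the method name extract are made up for illustration; only RTFEditorKit and the javax.swing.text types come from the JDK.

```java
import java.io.IOException;
import java.io.InputStream;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.rtf.RTFEditorKit;

// Hypothetical helper: extracts plain text from an RTF stream using the
// RTFEditorKit that ships with the JDK (Swing).
class RtfToText {

    static String extract(InputStream rtf) throws IOException, BadLocationException {
        RTFEditorKit kit = new RTFEditorKit();
        // An empty styled document for the kit to fill
        Document doc = kit.createDefaultDocument();
        // Parse the RTF markup into the document, starting at offset 0
        kit.read(rtf, doc, 0);
        // Return the document text without any formatting information
        return doc.getText(0, doc.getLength());
    }
}
```

Calling RtfToText.extract(new FileInputStream("E:\\Search.rtf")) would return the text of that file (the path is just an example); the result can then be written out with a BufferedWriter as in the original program.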


 
Rob Spoor
Sheriff
Posts: 22818
You're welcome.
 