Which parser to choose for MIME multipart/mixed

Tien Shan
Ranch Hand

Joined: Oct 08, 2004
Posts: 38
I want to use a MIME parser to parse a file like the following, which is neither an email nor an HTTP request body.


The file has one header section, which is in XML format, and several sections whose first part is a subHeader and whose second part is binary data.
That is, the section immediately after "Content-Location: dataX.bin" is the binary data section. The length of this binary data is given inside the XML header ("dataPoints").
Each subHeader section has information specific to the binary data that follows.

Once I parse the XML header part, I know how to extract values from that section.
I want to parse and store the binary data (let's say "data1.bin") into an array for further processing. I am not sure how I will proceed after I get to this point, but I want to get to that point first.
(The binary data is encoded as IEEE 754 32-bit single-precision floats. I need to convert those byte groups to "normal looking numbers", even though they have 10 to 12 digits, and do all sorts of calculations on them.)
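
For the float conversion mentioned above, a minimal sketch using java.nio.ByteBuffer might look like this. The class name FloatDecoder is hypothetical, and the byte order is an assumption the caller has to supply, since the thread does not say whether the data is big- or little-endian.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class FloatDecoder {

    /**
     * Decodes raw bytes into 32-bit IEEE 754 floats.
     * The byte order is an assumption supplied by the caller.
     */
    public static float[] decode(byte[] raw, ByteOrder order) {
        if (raw.length % 4 != 0) {
            throw new IllegalArgumentException("length is not a multiple of 4: " + raw.length);
        }
        float[] values = new float[raw.length / 4];
        // View the bytes as floats and copy them out in one bulk operation.
        ByteBuffer.wrap(raw).order(order).asFloatBuffer().get(values);
        return values;
    }

    public static void main(String[] args) {
        // 0x3F800000 is 1.0f and 0xC0200000 is -2.5f in IEEE 754 single precision.
        byte[] raw = {0x3F, (byte) 0x80, 0, 0, (byte) 0xC0, 0x20, 0, 0};
        for (float f : decode(raw, ByteOrder.BIG_ENDIAN)) {
            System.out.println(f);
        }
    }
}
```

If the decoded values look like noise, trying ByteOrder.LITTLE_ENDIAN is the first thing to check. Note also that a single-precision float carries only about 7 significant decimal digits, so digits printed beyond that are not meaningful.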

Many samples I read on the internet are for emails.
Google has given me JavaMail, mime4j, MimeEntity from IBM/Lotus Domino, and netscape.messaging.mime.
I am thinking of using mime4j. Or should I use JavaMail?

There are different types of binary data sections for each subHeader, all properly separated by "--MIME_boundary-2" lines, of which I will only need, say, "data1.bin", so a blanket reading of all content is not efficient. That's why I want to use a parser and write proper code (in Java) that grabs only what is required.

I will have to read 50-60 files like that, so a faster implementation is ideal.

I found a sample at mozgoweb.com/posts/how-to-parse-mime-message-using-mime4j-library/ but it parses emails only.
I will keep on googling, but with the kind of skill I have, I doubt I will find the right one and understand it.
Can someone give me some idea, an outline, or some code snippets if possible, of how to read and get those parts?

Thank you for your time reading and possibly giving some hints.

Cheers!
Tien Shan
Ranch Hand

Joined: Oct 08, 2004
Posts: 38
In Stackoverflow, the.jxc has written:
Get the "Boundary" and split the message on lines matching "--". Then for each chunks, parse headings until you get to CRLF, CRLF. Then your content starts.

Easy for gurus. Greek for me.
[I do see "0A" "0A" at the end of the "dataX.bin" line, before the beginning of the binary part, when I use a hex editor (FlexHEX or HexNeo). That is probably "LF" "LF", i.e. a blank line.]
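
The advice quoted above (split on the boundary lines, then read headers up to the first blank line) can be sketched roughly as below. This is an illustration only, not the code from any of the libraries mentioned. The class name MimeSplitter and its Part type are hypothetical. It works on bytes rather than text so binary bodies survive, and it accepts both CRLF CRLF and the LF LF seen in the hex editor.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MimeSplitter {

    /** One part: its raw header text and its raw body bytes. */
    public static final class Part {
        public final String headers;
        public final byte[] body;
        Part(String headers, byte[] body) { this.headers = headers; this.body = body; }
    }

    /** Index of the first occurrence of needle at or after from, or -1. */
    static int indexOf(byte[] data, byte[] needle, int from) {
        outer:
        for (int i = from; i <= data.length - needle.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (data[i + j] != needle[j]) continue outer;
            }
            return i;
        }
        return -1;
    }

    /** True if the two bytes at index end are "--", which marks a closing delimiter. */
    static boolean dashesAt(byte[] data, int end) {
        return end + 1 < data.length && data[end] == '-' && data[end + 1] == '-';
    }

    /** Next "--boundary" that fills a whole line (a trailing "--" is allowed), or -1. */
    static int nextDelimiter(byte[] data, byte[] delim, int from) {
        for (int i = indexOf(data, delim, from); i >= 0; i = indexOf(data, delim, i + 1)) {
            int end = i + delim.length;
            boolean lineStart = i == 0 || data[i - 1] == '\n';
            boolean lineEnd = end == data.length || data[end] == '\r' || data[end] == '\n'
                    || dashesAt(data, end);
            if (lineStart && lineEnd) return i;
        }
        return -1;
    }

    /** Splits data into the parts that lie between successive delimiter lines. */
    public static List<Part> split(byte[] data, String boundary) {
        byte[] delim = ("--" + boundary).getBytes(StandardCharsets.ISO_8859_1);
        List<Part> parts = new ArrayList<>();
        int start = nextDelimiter(data, delim, 0);
        while (start >= 0 && !dashesAt(data, start + delim.length)) {
            int from = start;                        // skip the rest of the delimiter line
            while (from < data.length && data[from] != '\n') from++;
            if (from < data.length) from++;          // step past the '\n'
            boolean crlf = from >= 2 && data[from - 1] == '\n' && data[from - 2] == '\r';
            int next = nextDelimiter(data, delim, from);
            if (next < 0) break;                     // input ended without a closing delimiter
            int to = next;                           // the line break before a delimiter belongs to it
            if (to > from && data[to - 1] == '\n') to--;
            if (crlf && to > from && data[to - 1] == '\r') to--;
            parts.add(toPart(Arrays.copyOfRange(data, from, to)));
            start = next;
        }
        return parts;
    }

    /** Splits a chunk at its first blank line (CRLF CRLF or LF LF) into headers and body. */
    static Part toPart(byte[] chunk) {
        int headerEnd = chunk.length, bodyStart = chunk.length;
        if (chunk.length > 0 && chunk[0] == '\n') {             // part without headers
            headerEnd = 0; bodyStart = 1;
        } else if (chunk.length > 1 && chunk[0] == '\r' && chunk[1] == '\n') {
            headerEnd = 0; bodyStart = 2;
        } else {
            int crlf = indexOf(chunk, new byte[] {'\r', '\n', '\r', '\n'}, 0);
            int lf = indexOf(chunk, new byte[] {'\n', '\n'}, 0);
            if (crlf >= 0 && (lf < 0 || crlf < lf)) { headerEnd = crlf; bodyStart = crlf + 4; }
            else if (lf >= 0) { headerEnd = lf; bodyStart = lf + 2; }
        }
        String headers = new String(chunk, 0, headerEnd, StandardCharsets.ISO_8859_1);
        return new Part(headers, Arrays.copyOfRange(chunk, bodyStart, chunk.length));
    }
}
```

The sketch assumes the boundary string never occurs as a whole line inside a binary body, which is what MIME itself assumes. Where the XML header states the body length, that length is a useful cross-check on what the splitter returns.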
Tien Shan
Ranch Hand

Joined: Oct 08, 2004
Posts: 38
I found a site that was like manna for me.
I was able to extract header info, and the mime parts.
I don't know if I am allowed to link this website, so I will just give the name of the file: "SimpleMimeReader.java".

A HUGE part of my worries are gone. JR, Thank you so much.

I will need to modify a section of his code.
My next struggle is how to extract a particular "part" when the Content-Type is multipart/related.
Right now, as it is, SimpleMimeReader.java extracts all the constituent parts in one block, because it does not distinguish boundaries inside boundaries. (The secondary boundary is "MIME_boundary-2".)
Tony Docherty
Bartender

Joined: Aug 07, 2007
Posts: 2310
Thanks for posting your solution so far.
Yes, you can post relevant links to other sites, but the class name should be enough for any interested party to find the site.

If you get stuck on the next section, post your code and the details and we will try to help.
Tien Shan
Ranch Hand

Joined: Oct 08, 2004
Posts: 38
Tony:
Thanks for the words of encouragement. Yes, I have made a lot of progress since the last post.
To extract information from within the big "part", which had a boundary like "MIME_boundary-2", all I had to do was call SimpleMimeReader.java again, this time setting the boundary to "--MIME_boundary-2".
Next, I added some lines here and there, basically skipping "parts" that I didn't need. Works beautifully. I am grateful to the author of that program.
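
Skipping unwanted parts and finding the inner boundary can both be driven by the part headers. The following is a rough sketch under that assumption. The class PartHeaders and its methods are hypothetical and are not part of SimpleMimeReader; "data1.bin" and "MIME_boundary-2" are the names used in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PartHeaders {

    private static final Pattern BOUNDARY = Pattern.compile(
            "boundary\\s*=\\s*(?:\"([^\"]*)\"|([^;\\s]+))", Pattern.CASE_INSENSITIVE);

    /** Parses "Name: value" lines into a map keyed by the lower-cased header name. */
    public static Map<String, String> parse(String headerBlock) {
        Map<String, String> headers = new LinkedHashMap<>();
        String lastName = null;
        for (String line : headerBlock.split("\r?\n")) {
            if (line.isEmpty()) continue;
            boolean folded = line.charAt(0) == ' ' || line.charAt(0) == '\t';
            if (folded && lastName != null) {
                // A line starting with whitespace continues the previous header.
                headers.put(lastName, headers.get(lastName) + " " + line.trim());
                continue;
            }
            int colon = line.indexOf(':');
            if (colon <= 0) continue;                // not a header line, ignore it
            lastName = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            headers.put(lastName, line.substring(colon + 1).trim());
        }
        return headers;
    }

    /** True if the part's Content-Location header equals the wanted name. */
    public static boolean isWanted(String headerBlock, String wantedLocation) {
        return wantedLocation.equals(parse(headerBlock).get("content-location"));
    }

    /** Extracts the boundary parameter from a Content-Type value, or returns null. */
    public static String boundaryOf(String contentType) {
        if (contentType == null) return null;
        Matcher m = BOUNDARY.matcher(contentType);
        if (!m.find()) return null;
        return m.group(1) != null ? m.group(1) : m.group(2);
    }
}
```

With helpers like these, a loop over the parts can keep only the one where isWanted(headers, "data1.bin") is true, and a multipart/related part can be split again using boundaryOf(...) on its Content-Type instead of a hard-coded boundary string.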

My next challenge is how to read/parse a file (or rather, a stream which, if written to a file, would be an XML file) that has not been written.

Here is the scene.

At the very beginning, the above-mentioned parser gets me a header section. The header is written in XML. In fact, for test purposes, I wrote that extracted header XML into a file, successfully read it using a DOM parser and extracted what I needed. I can write the files now because I am in the test phase. I *do* need information from the headers, but I don't want to write a header file for every file I process. There will be many dozens of files, and it may slow down processing. (I can probably delete the files after I read them, but if I can get away with not physically writing the files, that would be something!)

That is what I meant by reading a file without writing it.

Here are some snippets showing how I created the header (XML) file using SimpleMimeReader.



Somewhere else, I have some XML reader code. Snippets:

What would you suggest?
Tien Shan
Ranch Hand

Joined: Oct 08, 2004
Posts: 38
Looks like I am not going to get answers.
In the meantime, I have no choice but to create xml header files and later delete using file.delete() command from within the program.
Time to close this thread and move ahead.
Thanks everyone and Cheers!
Tien Shan
Ranch Hand

Joined: Oct 08, 2004
Posts: 38
Where is the "Close this thread" button? I can't see it.
Tony Docherty
Bartender

Joined: Aug 07, 2007
Posts: 2310
Tien Shan wrote: Looks like I am not going to get answers.
In the meantime, I have no choice but to create xml header files and later delete using file.delete() command from within the program.
Time to close this thread and move ahead.
Thanks everyone and Cheers!

Sorry for the lack of response. I had marked the thread as resolved after you posted the solution and hadn't spotted that you had posted again.

Reading directly from a stream rather than a file should be simple. The SimpleMimeReader class has a constructor which takes an InputStream as an argument, i.e. the one you are currently using when you create a FileInputStream object to read the file. So if you have an InputStream for reading the header, just substitute it for the FileInputStream.
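
The same idea applies to the DOM parser: DocumentBuilder.parse accepts any InputStream, so the header XML never has to exist on disk. A minimal sketch, assuming a made-up header layout in which "dataPoints" is an element (the real header may differ); the class HeaderXml is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class HeaderXml {

    /** Parses XML from any InputStream and returns the text of the first element named tag, or null. */
    public static String firstText(InputStream in, String tag) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(in);   // works for file streams and in-memory streams alike
            NodeList nodes = doc.getElementsByTagName(tag);
            return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse header XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical header content, held in memory only.
        String xml = "<header><dataPoints>1024</dataPoints></header>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(firstText(in, "dataPoints"));
    }
}
```

Passing a FileInputStream to firstText would work the same way, which is the point of coding against InputStream rather than against a file.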
Tien Shan
Ranch Hand

Joined: Oct 08, 2004
Posts: 38
Hi TD,
It worked all right, just as you wrote. I was scared without even trying!

One reason I was hesitant was that I am using a ByteArrayOutputStream (baos) to write the header information to a file (an XML file).
All I had to do was convert that baos to a ByteArrayInputStream and pass that input stream.

Some approximate code:
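
A minimal sketch of that conversion, as an illustration rather than the poster's own code. The class and method names are hypothetical; only ByteArrayOutputStream.toByteArray() and the ByteArrayInputStream constructor are standard library calls.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class InMemoryHandoff {

    /** Wraps everything written to the output stream so far in a fresh input stream. */
    public static ByteArrayInputStream toInputStream(ByteArrayOutputStream baos) {
        // toByteArray() returns a copy, so later writes to baos do not affect the reader.
        return new ByteArrayInputStream(baos.toByteArray());
    }

    public static void main(String[] args) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        byte[] header = "<header/>".getBytes(StandardCharsets.UTF_8);
        baos.write(header, 0, header.length);      // stands in for the extracted XML header

        ByteArrayInputStream in = toInputStream(baos);
        byte[] back = new byte[in.available()];    // available() is exact for in-memory streams
        in.read(back, 0, back.length);
        System.out.println(new String(back, StandardCharsets.UTF_8));
    }
}
```

The resulting stream can be handed to anything that accepts an InputStream, such as a DOM parser, so no temporary file and no file.delete() call are needed.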


Cannot ask for more!

Now, please close this thread or tell me how to do it.

Cheers.

Tony Docherty
Bartender

Joined: Aug 07, 2007
Posts: 2310
Glad to hear your problem is now resolved.

As I said earlier, the thread is already marked as resolved, which is probably why you can't see an icon to close the thread.
 