
reading SMTP mail text file

 
larry upnorth
Greenhorn
Posts: 4
I am reading what appears to be an ASCII file with some sort of non-ASCII header and footer/trailer. The file begins with approximately 26 bytes of what looks like a header, followed by a CR/LF pair and then a run of spaces before the data I actually need. After the header the file appears to be plain ASCII, ending with a 7-byte trailer, two bytes of which are a CR/LF pair.
My question is, what is the best way to read this file? Should I just open a FileInputStream and skip the first 26/28 bytes and then read the rest of the file one byte at a time, or can I switch to another kind of reader and read a line at a time?
Finally, what should I do about the trailer?
Thanks very much for your input.
 
Jim Yingst
Wanderer
Sheriff
Posts: 18671
"My question is, what is the best way to read this file? Should I just open a FileInputStream and skip the first 26/28 bytes and then read the rest of the file one byte at a time, or can I switch to another kind of reader and read a line at a time?"
That (the latter) is what I would recommend, if not for that pesky trailer. Can you learn more about the trailer? For instance, is it always the same? If we read it with a Reader, what do the bytes look like? Unless you can establish some definite rules for what the trailer contains, I think you should avoid trying to interpret it with a Reader; you'll just get gibberish.
Hmmm... if the files aren't too horribly long, the easiest thing might be something like:
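The snippet posted at this point is missing from this copy of the thread. Judging from the annotated copy quoted in the next reply, it read the whole file into a byte array and decoded only the middle portion. A rough sketch of that idea, assuming a fixed 28-byte header and 7-byte trailer; the class and method names are made up for illustration, and it uses a Charset object rather than an encoding name so that decoding cannot throw a checked exception:

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class HeaderTrailerStripper {
    // Assumed fixed sizes, taken from the problem description in this thread.
    static final int HEADER_LENGTH = 28;
    static final int TRAILER_LENGTH = 7;

    // Copies everything from the stream into a byte array.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        // read() returns the number of bytes actually read, or -1 at end of input.
        for (int count; (count = in.read(buffer)) != -1; ) {
            out.write(buffer, 0, count);
        }
        return out.toByteArray();
    }

    // Decodes only the bytes between the header and the trailer.
    static String extractText(byte[] array, Charset charset) {
        int textLength = array.length - (HEADER_LENGTH + TRAILER_LENGTH);
        if (textLength < 0) {
            throw new IllegalArgumentException("Input is shorter than header plus trailer");
        }
        return new String(array, HEADER_LENGTH, textLength, charset);
    }

    // Convenience wrapper: read a file and return its text portion.
    static String readText(File file) throws IOException {
        try (InputStream in = new FileInputStream(file)) {
            return extractText(readAll(in), StandardCharsets.ISO_8859_1); // or whatever
        }
    }
}
```

Something like `HeaderTrailerStripper.readText(new File("foo.txt"))` would then return the text portion as one String, line terminators included.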

[ January 30, 2003: Message edited by: Jim Yingst ]
 
larry upnorth
Greenhorn
Posts: 4
Jim, thanks for your reply. I'm including a copy of the code snippet you created along with my comments so you can see what I think it does.
File file = new File("foo.txt");
// create stream for input file
InputStream in = new FileInputStream(file);
// file.length() returns a long, so it must be cast to int here
ByteArrayOutputStream out = new ByteArrayOutputStream((int) file.length());
byte[] buffer = new byte[1024];
// read number of bytes equal to size of buffer
for (int count; (count = in.read(buffer)) != -1; ) {
    // write everything in buffer to output stream
    out.write(buffer, 0, count);
}
// release the file handle once everything has been read
in.close();
// create new byte array
byte[] array = out.toByteArray();

String encoding = "ISO-8859-1"; // or whatever
// create new string utilizing specified encoding
String text = new String(array, 28, (array.length - (28 + 7)), encoding);
At this point I'm thinking I didn't do a very good job of explaining my question. Since each record in the file is followed by a CR/LF pair, would it be OK to use a method in one of the file input classes to read up to and including the CR/LF? Once I have the record, I will need to decide whether it consists of ASCII characters, meaning it is one I want to parse and write out to a new file with some reformatting, or whether to ignore it as a header or trailer record. What class(es) and methods do you think are safe for doing this?
Thanks in advance
 
Jim Yingst
Wanderer
Sheriff
Posts: 18671
// read number of bytes equal to size of buffer
This is more like: read a number of bytes up to the size of the buffer. It may be fewer if (a) there's no more input, (b) the file is fragmented and the hard drive needs time to reposition before it can deliver the remaining bytes, or (c) a system buffer somewhere is smaller than our byte buffer, and the FileInputStream implementation has elected to use the smaller buffer size for efficiency. There are other possibilities too.
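Because a single read call may hand back fewer bytes than requested, any code that needs an exact count (the header, say) has to loop until it has them all. A minimal sketch of such a loop; the class and method names are hypothetical, and the JDK's DataInputStream.readFully does much the same job:

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

class ReadHelper {
    // Keeps calling read() until exactly 'length' bytes have arrived.
    static byte[] readExactly(InputStream in, int length) throws IOException {
        byte[] result = new byte[length];
        int offset = 0;
        while (offset < length) {
            // May return fewer bytes than requested; -1 means end of input.
            int count = in.read(result, offset, length - offset);
            if (count == -1) {
                throw new EOFException("Expected " + length + " bytes but got only " + offset);
            }
            offset += count;
        }
        return result;
    }
}
```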
As for your last paragraph: looking at the bytes in a line to decide if they're really ascii or not may be simple, or it may be complex. What if the headers/trailers usually use byte values that fall in the expected ascii range, but sometimes don't? We need some sort of reliable rule to identify if bytes are part of the header, the tail, or text. From your original problem description I gathered that the one thing that was certain was that the first 28 bytes were header, and the last 7 were trailer (including \r\n in both cases). So I made use of that. I'm pretty happy with the byte array solution unless the file sizes are prohibitively large; most alternatives I can imagine get a lot more complex, and don't seem worth the time to work on.
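For what it's worth, the "simple" version of a per-line ASCII check is just a loop over the bytes. The sketch below assumes that "text" means printable ASCII plus tab, CR and LF; that rule is a guess, not something established in this thread, and the class and method names are made up. As noted above, it would misclassify a header or trailer that happens to contain only bytes in that range.

```java
class AsciiCheck {
    // Returns true if every byte in the given range is printable ASCII, tab, CR or LF.
    static boolean looksLikeAsciiText(byte[] bytes, int offset, int length) {
        for (int i = offset; i < offset + length; i++) {
            int b = bytes[i] & 0xFF; // treat the byte as unsigned
            boolean printable = b >= 0x20 && b <= 0x7E;
            boolean whitespace = b == '\t' || b == '\r' || b == '\n';
            if (!printable && !whitespace) {
                return false;
            }
        }
        return true;
    }
}
```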
One modification you might prefer is to replace the final "new String" with something like
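The replacement snippet is also missing from this copy of the thread. Presumably it wrapped the same slice of the byte array in a reader instead of building one big String. A sketch under that assumption; the class and method names and the use of a Charset parameter are illustrative, not the original code:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

class LineReaderSketch {
    // Wraps only the bytes between header and trailer in a BufferedReader.
    static BufferedReader openText(byte[] array, int headerLength, int trailerLength,
                                   Charset charset) {
        int textLength = array.length - (headerLength + trailerLength);
        if (textLength < 0) {
            throw new IllegalArgumentException("Input is shorter than header plus trailer");
        }
        ByteArrayInputStream textBytes =
                new ByteArrayInputStream(array, headerLength, textLength);
        return new BufferedReader(new InputStreamReader(textBytes, charset));
    }

    // Collects the lines of the text portion, assuming a 28-byte header and 7-byte trailer.
    static List<String> readLines(byte[] array, Charset charset) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = openText(array, 28, 7, charset)) {
            String line;
            // readLine() strips the line terminator and returns null at end of input.
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```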

Now you can read individual lines with the readLine() method of BufferedReader. That is the usual way to read lines from a text file, so it may be seen as preferable.
[ January 31, 2003: Message edited by: Jim Yingst ]
 