JavaRanch.com/granny.jsp

Using Apache Commons TarArchiveInputStream results in corrupt un-archived files

 
Ron McLeod
Saloon Keeper
I have a web service which consumes a TAR archive of binary files and passes them as a List of files to another subsystem for processing. I am using the Apache Commons Compress library (version 1.11) to work with the TAR formatted input.

I am finding that when I use TarArchiveEntry#getSize to determine the size, allocate storage for the contents, and then call TarArchiveInputStream#read, the extracted files may differ from the original files in the uploaded archive. If I instead read from the TarArchiveInputStream in chunks, the resulting files are fine. I noticed that, of the three files in the archive, the smallest one (594 bytes) was identical with both implementations.

This is my first time working with the library so I am probably missing something. Any ideas or suggestions?


Working Code
Console output:
Contents MD5: df194ba4f2fe114be709c5605839930f (9627051 bytes)
Contents MD5: 3996f04fc6a830520c336825ef5afc1b (508571 bytes)
Contents MD5: 1cf5fca3f6209042fac634f718d30d43 (594 bytes)


Problematic Code
Console output:
Contents MD5: 3ee34d1e3ad7761303107cf9c3a5f6ad (9627051 bytes)
Contents MD5: c5c5dd952977fa6068d717586e57d9a8 (508571 bytes)
Contents MD5: 1cf5fca3f6209042fac634f718d30d43 (594 bytes)
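The code for both versions was not preserved in this thread, but the symptom can be reproduced without a TAR file. The following is a minimal sketch, not the original code: `ChunkyInputStream` is a hypothetical stream that returns at most 512 bytes per read call (which the InputStream contract permits), `readOnce` is a hypothetical helper mimicking the problematic pattern of sizing a buffer up front and making a single read call, and `md5Hex` is a hypothetical helper that prints digests in the style of the console output above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ShortReadDemo {

    /** Hypothetical stream that hands back at most maxPerRead bytes per call. */
    static class ChunkyInputStream extends InputStream {
        private final InputStream in;
        private final int maxPerRead;

        ChunkyInputStream(byte[] data, int maxPerRead) {
            this.in = new ByteArrayInputStream(data);
            this.maxPerRead = maxPerRead;
        }

        @Override
        public int read() throws IOException {
            return in.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            // Legal per the InputStream contract: return fewer bytes than len.
            return in.read(b, off, Math.min(len, maxPerRead));
        }
    }

    /** Lowercase hex MD5 digest of the given bytes. */
    static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder sb = new StringBuilder();
            for (byte x : digest) {
                sb.append(String.format("%02x", x));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** The problematic pattern: one read() call into a buffer of the expected size. */
    static byte[] readOnce(InputStream in, int size) throws IOException {
        byte[] buf = new byte[size];
        in.read(buf); // BUG: return value ignored, buffer may be only partly filled
        return buf;
    }

    public static void main(String[] args) throws IOException {
        byte[] original = new byte[2000];
        for (int i = 0; i < original.length; i++) {
            original[i] = (byte) (i % 251 + 1); // every byte is non-zero
        }
        byte[] copy = readOnce(new ChunkyInputStream(original, 512), original.length);
        System.out.println("original MD5: " + md5Hex(original) + " (" + original.length + " bytes)");
        System.out.println("copy MD5:     " + md5Hex(copy) + " (" + copy.length + " bytes)");
    }
}
```

Both arrays are 2000 bytes long, yet the digests differ, because only the first 512 bytes of the copy were filled and the rest are still zero. A file small enough to be delivered by a single read call would come through intact, which may be why the 594-byte file matched, though that is a guess.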
 
Tony Docherty
Bartender
I've not used TarArchiveInputStream, but if its read() method conforms to the standard InputStream read() contract (which it almost certainly does, as it is an InputStream subclass), then the problem you are facing is that read() does not guarantee to fill the buffer you pass in. The API docs state: "Reads up to len bytes of data from the input stream into an array of bytes. An attempt is made to read as many as len bytes, but a smaller number may be read. The number of bytes actually read is returned as an integer." So you need to check the returned value and, if it is less than len, keep calling read until the total number of bytes read equals len.
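A minimal sketch of that loop, written against plain java.io so it runs without Commons Compress. `readFully` is a hypothetical helper name; in the question's scenario `in` would be the TarArchiveInputStream positioned at the current entry, and `buf` would be allocated as `new byte[(int) entry.getSize()]` (assuming the entry fits in an int-sized array).

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ReadFullyExample {

    /**
     * Reads exactly buf.length bytes, calling read() repeatedly because a
     * single call may return fewer bytes than requested.
     *
     * @throws EOFException if the stream ends before the buffer is full
     */
    static void readFully(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            // Ask only for the bytes still missing, writing after what we have.
            int n = in.read(buf, total, buf.length - total);
            if (n == -1) {
                throw new EOFException("Expected " + buf.length + " bytes but got " + total);
            }
            total += n;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello, tar entry".getBytes(StandardCharsets.US_ASCII);
        byte[] buf = new byte[data.length];
        readFully(new ByteArrayInputStream(data), buf);
        System.out.println(new String(buf, StandardCharsets.US_ASCII));
    }
}
```

The JDK already has equivalents: `DataInputStream.readFully(byte[])` throws EOFException on a short stream, and since Java 9 `InputStream.readNBytes(byte[], int, int)` loops internally and returns the number of bytes actually read.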
 
Ron McLeod
Saloon Keeper
Thanks Tony - I'm sure that's what it is.
 
Rob Spoor
Sheriff
Which points out another (potential) bug in your code:
Ron McLeod wrote: Working Code

0 is a perfectly fine result for InputStream.read(byte[]), and it does not indicate that the stream is exhausted; only -1 does. You should use >= 0, > -1 or != -1 as the loop condition.
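A sketch of the chunked approach with the corrected loop condition, again on plain java.io. `copy` is a hypothetical helper; in the thread's scenario `in` would be the TarArchiveInputStream after a call to getNextTarEntry(), and `out` something like a ByteArrayOutputStream collecting one file. The 8192-byte buffer size is an arbitrary choice.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class CopyLoopExample {

    /**
     * Copies everything remaining in 'in' to 'out' in fixed-size chunks.
     * Only -1 ends the loop; a return of 0 just means no bytes this time.
     */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n); // write only the n bytes actually read
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[20000];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(data), out);
        System.out.println(copied + " bytes copied"); // prints "20000 bytes copied"
    }
}
```

Writing `n` bytes rather than the whole buffer matters because the last chunk is usually partial. On Java 9 and later, `InputStream.transferTo(OutputStream)` performs the same copy.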
 
Ron McLeod
Saloon Keeper
Yikes -- I'm getting sloppy. Thanks for pointing that out.
 
Rob Spoor
Sheriff
You're welcome.