Access the entry such as entry filename in gzip file

 
Greenhorn
Posts: 26
Hi all,

Does anyone know how to access the entry information in a gzip file, such as the file name, the way the zip API in java.util.zip allows (by calling getNextEntry() on ZipInputStream and then getName() on the returned ZipEntry)?

The gzip file I have was created with GZIPOutputStream.

Thanks!
 
Sheriff
Posts: 22781
GZIP compression is very, very different from ZIP compression.

Unlike ZIP, GZIP can only store a single file. The name of that file is usually the name of the GZIP file minus the .gz extension.
This is also why TAR is so popular in combination with GZIP: it packs multiple files (and folders) into one file that can then be gzipped.

If you have a file called "myfile.tar.gz", unzipping it with gunzip will create the file "myfile.tar". If you rename the .gz file, the resulting file will also be named differently.


Now if you did not use the same naming approach when creating the GZIP file, then there is no way at all to retrieve the original name.
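The naming convention described above can be sketched in a few lines of Java. This is only an illustration: the class and method names are made up, and it assumes the archive still carries the ".gz" suffix it was created with.

```java
import java.util.Locale;

public class GzipImpliedName {
    /** Strips a trailing ".gz" (case-insensitive); returns the input unchanged if the suffix is absent. */
    static String impliedName(String gzipFileName) {
        boolean hasSuffix = gzipFileName.toLowerCase(Locale.ROOT).endsWith(".gz");
        return hasSuffix
                ? gzipFileName.substring(0, gzipFileName.length() - 3)
                : gzipFileName;
    }

    public static void main(String[] args) {
        System.out.println(impliedName("myfile.tar.gz"));
        System.out.println(impliedName("renamed-archive"));
    }
}
```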
 
Ken Kirin
Greenhorn
Posts: 26
Rob Prime,

The gzip file I mentioned was not created using the tar utility. An example name is RDC2008081300085624.DAT.gz.

As I understand it, if the .gz file was created through a Tar API, you can look inside the gzip file and check whether the entry name is valid.

What I actually want is a way to validate the entry in a gzip file that was not created by tar (if it were, I could use, for example, getTarEntry() from the available API) and check whether the entry's file name follows the validation rule. If it does not, I can simply reject that gzip file and not allow it to be processed further.

My guess is that there is no way to look inside that type of gzip file and get the entry information.

Cheers!
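If only the archive's own file name is available, the accept/reject check could look roughly like this. The pattern is a hypothetical rule guessed from the single example name in the post ("RDC", 16 digits, ".DAT.gz"); the real validation rule is not stated in the thread, and the class and method names are invented.

```java
import java.util.regex.Pattern;

public class GzipNameValidator {
    // Hypothetical rule inferred from "RDC2008081300085624.DAT.gz"; adjust to the real one.
    private static final Pattern RULE = Pattern.compile("RDC\\d{16}\\.DAT\\.gz");

    /** Returns true if the whole file name matches the (assumed) naming rule. */
    static boolean isAcceptable(String gzipFileName) {
        return RULE.matcher(gzipFileName).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAcceptable("RDC2008081300085624.DAT.gz"));
        System.out.println(isAcceptable("report.txt.gz"));
    }
}
```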
 
Rob Spoor
Sheriff
Posts: 22781
The closest thing you can get out of the file is "RDC2008081300085624.DAT". Since it has a timestamp in it (2008-08-13 00:08:56; I don't know what the 24 is), you could reduce it to RDC.DAT, but that's about it.

And TAR was just an example, because that's where GZIP is used the most (at least in the Unix / Linux world). You can use GZIP to compress any single file.
[ September 18, 2008: Message edited by: Rob Prime ]
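The timestamp reading of that name can be checked mechanically. A small sketch, assuming the 14 digits after "RDC" are laid out as yyyyMMddHHmmss (an interpretation, not something the thread confirms); the class and method names are made up.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class RdcTimestamp {
    private static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    /** Parses the 14 digits that follow the 3-letter prefix as a local date-time. */
    static LocalDateTime timestampOf(String name) {
        return LocalDateTime.parse(name.substring(3, 17), FORMAT);
    }

    public static void main(String[] args) {
        System.out.println(timestampOf("RDC2008081300085624.DAT"));
    }
}
```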
 
Ranch Hand
Posts: 98
You are wrong.

- gzip DOES store the filename
- gzip DOES allow more files per archive

Read the man page at least.
 
Rob Spoor
Sheriff
Posts: 22781
You are right about the name; if you use "gunzip -N" you can get the original file name back, unless the file was compressed using "gzip -n". (Guess I learned something new today.)
That's not the default behaviour though (gunzip -n and gzip -N are the defaults), and it is not supported by the java.util.zip classes.

gzip does NOT support multiple files though; when you pass multiple files as arguments, it will compress each of them into its own .gz file. I read the entire man page, and found nothing about multiple files per archive; only the behaviour I just described.
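Although java.util.zip offers no accessor for it, the stored name sits in the gzip header in a documented place (RFC 1952: an FNAME flag bit, followed by a zero-terminated ISO-8859-1 string after the optional extra field). Below is a hedged sketch of reading it by hand; the class and method names are made up for illustration. As far as I know, GZIPOutputStream writes a header with no flags set, so files produced by it carry no stored name, which the second line of main is meant to show.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class GzipHeaderName {
    private static final int FEXTRA = 0x04; // optional extra field present
    private static final int FNAME = 0x08;  // original file name present

    /** Returns the name stored in the first member's header, or null if none is stored. */
    static String readOriginalName(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        if (in.readUnsignedByte() != 0x1f || in.readUnsignedByte() != 0x8b) {
            throw new IOException("Not a gzip stream");
        }
        in.readUnsignedByte();             // CM, compression method (8 = deflate)
        int flags = in.readUnsignedByte(); // FLG
        in.readFully(new byte[6]);         // MTIME (4 bytes), XFL, OS
        if ((flags & FEXTRA) != 0) {
            int xlen = in.readUnsignedByte() | (in.readUnsignedByte() << 8); // little-endian
            in.readFully(new byte[xlen]);
        }
        if ((flags & FNAME) == 0) {
            return null;
        }
        ByteArrayOutputStream name = new ByteArrayOutputStream();
        for (int b = in.readUnsignedByte(); b != 0; b = in.readUnsignedByte()) {
            name.write(b);
        }
        return new String(name.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    /** Convenience wrapper for in-memory data. */
    static String nameOf(byte[] gzipBytes) {
        try {
            return readOriginalName(new ByteArrayInputStream(gzipBytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Compresses with GZIPOutputStream so its header can be inspected. */
    static byte[] javaGzip(byte[] content) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(content);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        // Hand-built header: magic, deflate, FNAME flag, 6 filler bytes, then "data.txt\0".
        byte[] withName = {0x1f, (byte) 0x8b, 8, 0x08, 0, 0, 0, 0, 0, 3,
                           'd', 'a', 't', 'a', '.', 't', 'x', 't', 0};
        System.out.println(nameOf(withName));
        System.out.println(nameOf(javaGzip(new byte[] {1, 2, 3})));
    }
}
```

For a file on disk, the same method can be fed a stream from Files.newInputStream(path); only the first few header bytes are read, so nothing is decompressed.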
 
David Balažic
Ranch Hand
Posts: 98
From wikipedia:

"the gzip format allows one .gz file to contain multiple compressed files"
 
Rob Spoor
Sheriff
Posts: 22781
From both experience and the man pages: nonsense.

And your Wikipedia quote is either older or you misinterpreted it. What I found about multiple files:

Although its file format also allows for multiple such streams to be concatenated (zipped files are simply decompressed concatenated as if they were originally one file), gzip is normally used to compress just single files. Compressed archives are typically created by assembling collections of files into a single tar archive, and then compressing that archive with gzip. The final .tar.gz or .tgz file is usually called a tarball.


So yes, according to this quote it is possible to compress multiple files. But in the end, it will turn up as one huge file with all separate file contents chained. So all in all, you can still get one single file from it. You would have to separate it yourself to get the original multiple files back.
 
David Balažic
Ranch Hand
Posts: 98
It stores them separately.
It is very clearly described there (wikipedia).

No, the GNU gzip v1.3.12 program does not give any easy way to decode each separately. But the format allows and supports it.
 
Rob Spoor
Sheriff
Posts: 22781
Can you post the Wikipedia URL then? Because http://en.wikipedia.org/wiki/Gzip says nothing about multiple files except what I have quoted before.
 
David Balažic
Ranch Hand
Posts: 98
But that's it.

"Although its file format also allows for multiple such streams to be concatenated."
The "streams" are "files". They have a length, name and body (file content).

See also RFC 1952. The stream is called "member" there.

It stores one complete compressed file. And the GZ file can have more such streams/members.
 
Rob Spoor
Sheriff
Posts: 22781
But it also says "zipped files are simply decompressed concatenated as if they were originally one file". So adding multiple files is no problem. Getting them back as multiple files is, because you only get one file back.
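The "decompressed concatenated" behaviour can be tried out in Java. The sketch below (invented class and method names) writes two gzip members back to back and reads them through a single GZIPInputStream. On recent JDKs I would expect the two contents to come back as one joined stream; the JDK of the thread's era (2008) may have stopped after the first member.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipConcatDemo {
    /** Compresses one string into a complete gzip member. */
    static byte[] gzip(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Concatenates two gzip members and decompresses the result with one GZIPInputStream. */
    static String gunzipConcatenated(String first, String second) {
        byte[] a = gzip(first);
        byte[] b = gzip(second);
        ByteArrayOutputStream archive = new ByteArrayOutputStream();
        archive.write(a, 0, a.length);
        archive.write(b, 0, b.length);

        ByteArrayOutputStream plain = new ByteArrayOutputStream();
        try (GZIPInputStream in =
                 new GZIPInputStream(new ByteArrayInputStream(archive.toByteArray()))) {
            byte[] buf = new byte[4096];
            for (int n; (n = in.read(buf)) != -1; ) {
                plain.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(plain.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(gunzipConcatenated("first file\n", "second file\n"));
    }
}
```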
 
David Balažic
Ranch Hand
Posts: 98
That depends on the tool used for extraction. I believe the original poster wants to use some Java code and not the GNU zip utility.
 
Rob Spoor
Sheriff
Posts: 22781
Hmm, we are getting quite off-topic.

Conclusion: it's not possible with Java. You can only retrieve the stored contents as one large stream which would have to be separated by the programmer himself.
 
David Balažic
Ranch Hand
Posts: 98
It is not possible with the current version of java.util.zip.GZIPInputStream.

It is quite possible in Java (by writing your own code to do it).
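One way such hand-written code might look: walk the byte array member by member, parse each RFC 1952 header, inflate the raw deflate data with java.util.zip.Inflater, and use getRemaining() to find where the next member starts. This is a rough sketch under stated assumptions, not a hardened implementation: it works on an in-memory array, skips the CRC32/ISIZE trailer without verifying it, and all class and method names are invented. The member() helper exists only to build test input with a stored name.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class GzipMemberSplitter {
    static final class Member {
        final String name; // null when the header has no FNAME field
        final byte[] data; // decompressed content of this member
        Member(String name, byte[] data) { this.name = name; this.data = data; }
    }

    /** Splits a (possibly multi-member) gzip byte array into its members. */
    static List<Member> split(byte[] gz) {
        List<Member> members = new ArrayList<>();
        int pos = 0;
        while (pos < gz.length) {
            if ((gz[pos] & 0xff) != 0x1f || (gz[pos + 1] & 0xff) != 0x8b) {
                throw new IllegalArgumentException("No gzip magic at offset " + pos);
            }
            int flags = gz[pos + 3] & 0xff;
            pos += 10;                         // fixed part of the header
            if ((flags & 0x04) != 0) {         // FEXTRA: 2-byte little-endian length, then data
                pos += 2 + ((gz[pos] & 0xff) | ((gz[pos + 1] & 0xff) << 8));
            }
            String name = null;
            if ((flags & 0x08) != 0) {         // FNAME: zero-terminated ISO-8859-1
                int end = pos;
                while (gz[end] != 0) end++;
                name = new String(gz, pos, end - pos, StandardCharsets.ISO_8859_1);
                pos = end + 1;
            }
            if ((flags & 0x10) != 0) {         // FCOMMENT: zero-terminated, ignored here
                while (gz[pos++] != 0) { }
            }
            if ((flags & 0x02) != 0) pos += 2; // FHCRC: header CRC16, ignored here

            Inflater inflater = new Inflater(true); // true = raw deflate, no zlib wrapper
            inflater.setInput(gz, pos, gz.length - pos);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            try {
                while (!inflater.finished()) {
                    int n = inflater.inflate(buf);
                    if (n == 0 && inflater.needsInput()) {
                        throw new IllegalArgumentException("Truncated member");
                    }
                    out.write(buf, 0, n);
                }
            } catch (DataFormatException e) {
                throw new IllegalArgumentException(e);
            }
            // getRemaining() counts unread input; the 8-byte trailer (CRC32 + ISIZE) follows the data.
            pos = gz.length - inflater.getRemaining() + 8;
            inflater.end();
            members.add(new Member(name, out.toByteArray()));
        }
        return members;
    }

    /** Test helper: builds one gzip member with an FNAME field. */
    static byte[] member(String name, byte[] content) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(new byte[] {0x1f, (byte) 0x8b, 8, 0x08, 0, 0, 0, 0, 0, (byte) 255}, 0, 10);
        byte[] nameBytes = name.getBytes(StandardCharsets.ISO_8859_1);
        out.write(nameBytes, 0, nameBytes.length);
        out.write(0);
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        deflater.setInput(content);
        deflater.finish();
        byte[] buf = new byte[8192];
        while (!deflater.finished()) out.write(buf, 0, deflater.deflate(buf));
        deflater.end();
        CRC32 crc = new CRC32();
        crc.update(content);
        writeIntLE(out, (int) crc.getValue());
        writeIntLE(out, content.length);
        return out.toByteArray();
    }

    private static void writeIntLE(ByteArrayOutputStream out, int v) {
        for (int i = 0; i < 4; i++) out.write((v >>> (8 * i)) & 0xff);
    }
}
```

Calling split() on the concatenation of member("a.txt", ...) and member("b.txt", ...) should return two Member objects, each with its own name and content. A streaming version for large files would need more care around buffering.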
 
Rob Spoor
Sheriff
Posts: 22781

Originally posted by David Balažic:
It is not possible with the current version of java.util.zip.GZIPInputStream.


That's what I meant. Thanks for the correction
 