Any tutorial recommendations for compression algorithms?

 
Greenhorn
Posts: 15
I have a project that needs to compress and decompress files. However, I don't know where to start learning compression algorithms.
The project requires compressing files into .zip format and decompressing .zip and .rar files.
I would like to ask for some resources about compression algorithms.
 
Saloon Keeper
Posts: 6928
164
Are you supposed to code zip (de)compression yourself? If not, the java.util.zip package has classes that can do it for you.
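For reference, a minimal sketch of what using java.util.zip can look like. It packs a few named byte arrays into an in-memory ZIP archive and reads them back. The class name ZipRoundTrip and its two methods are made up for illustration; only the java.util.zip and java.io classes come from the JDK. A real project would more likely wrap file streams than byte arrays.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, for illustration only.
class ZipRoundTrip {

    // Packs name -> content pairs into a ZIP archive held in memory.
    static byte[] zip(Map<String, byte[]> files) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(buffer)) {
            for (Map.Entry<String, byte[]> file : files.entrySet()) {
                out.putNextEntry(new ZipEntry(file.getKey()));
                out.write(file.getValue());
                out.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed here, so the central directory has been written.
        return buffer.toByteArray();
    }

    // Reads every entry of an in-memory ZIP archive back into a map.
    static Map<String, byte[]> unzip(byte[] archive) {
        Map<String, byte[]> files = new LinkedHashMap<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry entry;
            byte[] chunk = new byte[4096];
            while ((entry = in.getNextEntry()) != null) {
                ByteArrayOutputStream content = new ByteArrayOutputStream();
                int n;
                // read() returns -1 at the end of the current entry, not of the archive.
                while ((n = in.read(chunk)) != -1) {
                    content.write(chunk, 0, n);
                }
                files.put(entry.getName(), content.toByteArray());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return files;
    }

    public static void main(String[] args) {
        Map<String, byte[]> files = new LinkedHashMap<>();
        files.put("notes/readme.txt", "hello hello hello".getBytes(StandardCharsets.UTF_8));
        files.put("empty.txt", new byte[0]);

        Map<String, byte[]> restored = unzip(zip(files));
        restored.forEach((name, content) ->
                System.out.println(name + ": " + content.length + " bytes"));
    }
}
```

To work with files on disk, the same two loops should apply with FileOutputStream and FileInputStream in place of the byte-array streams.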
 
Sheriff
Posts: 22153
117
And if ZIP or GZIP isn't the compression algorithm you need, you can always check out Apache Commons Compress.
 
Saloon Keeper
Posts: 23540
161
ZIP files have about 4 different compression algorithms available to them. One of them ("store") actually doesn't compress anything at all. You could use that if you just wanted to pack a directory into a single file without compression (although that seems kind of pointless), but its main use is as a fallback: every lossless compression algorithm has a worst-case scenario where the "compressed" output ends up bigger than the input! This is especially likely with small files, like the ones you'd store configuration options in.
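To see the "bigger than the input" effect for yourself, here is a small sketch using java.util.zip.Deflater, which produces DEFLATE data in a zlib wrapper. The class name TinyInputDemo and the sample inputs are made up for illustration. The exact sizes depend on the zlib build, so the code prints them rather than promising particular numbers.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.Deflater;

// Hypothetical demo class, for illustration only.
class TinyInputDemo {

    // Compresses data with DEFLATE (zlib wrapper) at the strongest setting.
    static byte[] deflate(byte[] data) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(data);
        deflater.finish();

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(chunk);
            out.write(chunk, 0, n);
        }
        deflater.end(); // release the native zlib resources
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // A tiny config-style input with no repetition to exploit.
        byte[] tiny = "debug=true".getBytes(StandardCharsets.UTF_8);

        // A highly repetitive input of 2000 identical bytes.
        byte[] repetitive = new byte[2000];
        Arrays.fill(repetitive, (byte) 'a');

        System.out.println("tiny:       " + tiny.length + " -> " + deflate(tiny).length + " bytes");
        System.out.println("repetitive: " + repetitive.length + " -> " + deflate(repetitive).length + " bytes");
    }
}
```

The tiny input comes out larger because the header, block framing and checksum cost more than the compression saves, which is exactly the case where "store" is the better choice.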

What ZIP actually does for each eligible file is try its battery of different compression algorithms and then (unless told otherwise) use the one that compresses most for the actual ZIP storage. It does this on a file-by-file basis for all of its inputs. You'll see which one got used by the progress report that ZIP outputs, when it lists each file being processed followed by words like "storing", "compressing", "squeezing" and so forth.

The GZIP program acts similarly, but since it only stores one file, there's no ZIP directory in the output file.
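The JDK covers the GZIP format as well, through GZIPOutputStream and GZIPInputStream in the same java.util.zip package. A minimal sketch, with a made-up class name GzipRoundTrip: note there are no entries or names to manage, just one stream of bytes in and one out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, for illustration only.
class GzipRoundTrip {

    // Compresses a single blob of data into GZIP format.
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Closing the stream above wrote the GZIP trailer.
        return buffer.toByteArray();
    }

    // Restores the original bytes from GZIP data.
    static byte[] gunzip(byte[] compressed) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] original = "one file in, one file out".getBytes(StandardCharsets.UTF_8);
        byte[] restored = gunzip(gzip(original));
        System.out.println(new String(restored, StandardCharsets.UTF_8));
    }
}
```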

RAR is a bit more problematic, since parts of RAR are proprietary, I believe.
 
Rob Spoor
Sheriff
Posts: 22153
117

Tim Holloway wrote:You could use that if you just wanted to pack a directory into a single file without compression (although that seems kind of pointless)


tar
 
Tim Holloway
Saloon Keeper
Posts: 23540
161
If you have tar. It's pretty universal in the Unix world, but an extra-cost add-on for Windows. ZIP is actually more universal. In fact ZIP understands filesystems and OSes you've never heard of. On the Commodore Amiga, for example, you could attach a plain-text note to the directory entry of a file and ZIP would happily include it as part of the ZIP archive.

Also note that compressed tarballs are not compressed per file as in ZIP. Instead the tarball is first written uncompressed, then the resulting tarball is compressed as a single file. The "z" option for compressed tars was a relatively recent addition, made because people got tired of piping tar through gzip or compress.
 
Elya Matsunomi
Greenhorn
Posts: 15

Tim Moores wrote:Are you supposed to code zip (de)compression yourself? If not, the java.util.zip package has classes that can do it for you.



Thanks for this! I had no idea that Java has this package. I thought I needed to create my own zip package.
Although, I want to try coding a compression algorithm by hand someday.
Now, decompressing .rar files is the main problem. I'll leave that functionality to others. It seems hard to do since WinRAR is proprietary software.
 
Tim Moores
Saloon Keeper
Posts: 6928
164
https://stackoverflow.com/questions/11647362/using-java-to-extract-rar-files discusses a library called junrar, which can decompress RAR files, although apparently not RAR v5, whatever that is.
 
Tim Holloway
Saloon Keeper
Posts: 23540
161
Before you get too excited about the ZIP libraries in the core JDK, I should warn you that they cannot do full CRUD on ZIP file members. The Java ZIP classes can read ZIP files (and therefore things like WAR files) and they can create new ZIP files, but they cannot alter what's in a ZIP file on a file-by-file basis.
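The usual workaround with these classes is to write a whole new archive, copying every member except the one you want to change. Here is a rough sketch of that idea, working in memory. The class ZipEntryReplacer and its methods are hypothetical names for illustration. Note that this simple version only carries over entry names, so timestamps, comments and per-entry compression settings from the original are not preserved.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, for illustration only.
class ZipEntryReplacer {

    // Builds a NEW archive in which entry `name` holds `newContent`;
    // every other entry is copied across unchanged.
    static byte[] replaceEntry(byte[] archive, String name, byte[] newContent) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(archive));
             ZipOutputStream out = new ZipOutputStream(buffer)) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                // Only the name is carried over; other metadata is dropped.
                out.putNextEntry(new ZipEntry(entry.getName()));
                if (entry.getName().equals(name)) {
                    out.write(newContent);
                } else {
                    copy(in, out);
                }
                out.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Test helper: builds an archive from alternating name/content strings.
    static byte[] zipOf(String... namesAndContents) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(buffer)) {
            for (int i = 0; i + 1 < namesAndContents.length; i += 2) {
                out.putNextEntry(new ZipEntry(namesAndContents[i]));
                out.write(namesAndContents[i + 1].getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Test helper: returns the text of one entry, or null if it is absent.
    static String readEntry(byte[] archive, String name) {
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                if (entry.getName().equals(name)) {
                    ByteArrayOutputStream content = new ByteArrayOutputStream();
                    copy(in, content);
                    return new String(content.toByteArray(), StandardCharsets.UTF_8);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return null;
    }

    private static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
    }
}
```

For example, `replaceEntry(zipOf("a.txt", "old", "b.txt", "keep"), "a.txt", newBytes)` should give back an archive where a.txt has the new bytes and b.txt is untouched, while the original byte array stays as it was.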

The actual question was on compression algorithms, and that's a science unto itself. One of the simplest algorithms is Run Length Encoding (RLE), which replaces multiple instances of a repeated character or bit pattern with a count and character indicator. Not usually a good compactor of data (although I could mention some cases!), but very simple to implement and very fast.
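Since RLE is small enough to code by hand, here is one possible byte-oriented sketch. The layout it uses (a count byte followed by a value byte, with runs capped at 255) is just one simple choice made for illustration, not a standard format, and the class name RunLengthEncoding is made up.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical example class, for illustration only.
class RunLengthEncoding {

    // Encodes data as (count, value) byte pairs. Runs longer than 255
    // are split, because the count has to fit into a single byte.
    static byte[] encode(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < data.length) {
            byte value = data[i];
            int run = 1;
            while (i + run < data.length && data[i + run] == value && run < 255) {
                run++;
            }
            out.write(run);   // count, 1..255
            out.write(value); // the repeated byte
            i += run;
        }
        return out.toByteArray();
    }

    // Expands (count, value) pairs back into the original bytes.
    static byte[] decode(byte[] encoded) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i + 1 < encoded.length; i += 2) {
            int run = encoded[i] & 0xFF; // treat the count as unsigned
            for (int k = 0; k < run; k++) {
                out.write(encoded[i + 1]);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] repetitive = "aaaaaaaaaabbbbbbbbbb".getBytes(StandardCharsets.UTF_8);
        byte[] varied = "abcdefghij".getBytes(StandardCharsets.UTF_8);
        System.out.println(repetitive.length + " -> " + encode(repetitive).length);
        System.out.println(varied.length + " -> " + encode(varied).length);
    }
}
```

The second input in main shows why RLE is usually a poor general-purpose compactor: data without runs doubles in size under this layout, since every byte picks up a count of 1.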

The Lempel-Ziv and similar algorithms, which can routinely compress almost everything by around 50%, analyze their inputs to look for repeated patterns. It's similar to RLE, except that the patterns can be many bits long, depending on what the algorithm finds in the data. Also the compressed output is not necessarily byte-aligned, so that things can be packed in tighter.

To get a good idea of how variable-length encoding works, look at Morse Code:

The dot/dash patterns are assigned based on frequency. In English, the letters most likely to occur in normal text are (from most to least frequent) etaoin shrdlu... with x and z at the far end of that list. So the Morse patterns for "e" and "t" are simply "dot" and "dash". "a" and "n" are dot-dash and dash-dot. You'll notice that "m" is a shorter sequence than "o" here, but I don't know exactly why. I'm not 100% sure that the common Morse is solely indebted to English, and there are other Morse code dialects as well.
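The same "frequent symbols get short codes" idea is what Huffman coding automates, and DEFLATE (the usual ZIP method) uses Huffman codes as one of its stages. To be clear, the sketch below is Huffman coding, not Morse: Morse relies on pauses between letters, whereas Huffman codes are built so that no code is a prefix of another. The class name HuffmanLengths and the sample text are made up for illustration, and the code only computes how many bits each character would get rather than producing an actual bit stream.

```java
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

// Hypothetical example class, for illustration only.
class HuffmanLengths {

    // A tree node: leaves carry a symbol, inner nodes carry two children.
    private static final class Node {
        final int weight;
        final Character symbol;
        final Node left;
        final Node right;

        Node(int weight, Character symbol, Node left, Node right) {
            this.weight = weight;
            this.symbol = symbol;
            this.left = left;
            this.right = right;
        }
    }

    // Returns, for each character in the text, the length in bits of its Huffman code.
    static Map<Character, Integer> codeLengths(String text) {
        Map<Character, Integer> counts = new TreeMap<>();
        for (char c : text.toCharArray()) {
            counts.merge(c, 1, Integer::sum);
        }

        PriorityQueue<Node> queue =
                new PriorityQueue<>((a, b) -> Integer.compare(a.weight, b.weight));
        counts.forEach((symbol, count) -> queue.add(new Node(count, symbol, null, null)));

        Map<Character, Integer> lengths = new TreeMap<>();
        if (queue.isEmpty()) {
            return lengths;
        }
        if (queue.size() == 1) {
            // A single distinct symbol still needs one bit per occurrence.
            lengths.put(queue.poll().symbol, 1);
            return lengths;
        }

        // Repeatedly merge the two least frequent nodes into one subtree.
        while (queue.size() > 1) {
            Node a = queue.poll();
            Node b = queue.poll();
            queue.add(new Node(a.weight + b.weight, null, a, b));
        }
        assignDepths(queue.poll(), 0, lengths);
        return lengths;
    }

    // A leaf's depth in the tree is the length of its code.
    private static void assignDepths(Node node, int depth, Map<Character, Integer> lengths) {
        if (node.symbol != null) {
            lengths.put(node.symbol, depth);
            return;
        }
        assignDepths(node.left, depth + 1, lengths);
        assignDepths(node.right, depth + 1, lengths);
    }

    public static void main(String[] args) {
        // 'e' occurs 8 times, 't' 4 times, 'a' twice, 'z' once.
        codeLengths("eeeeeeeettttaaz")
                .forEach((symbol, bits) -> System.out.println(symbol + " -> " + bits + " bit(s)"));
    }
}
```

When several symbols have equal counts, the individual lengths can come out differently depending on how ties are broken, although the total encoded size stays the same.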
 