OK, so one of my teachers from college just made me do a file compressing utility, using the Huffman encoding. I don't have much problem with the code tree, since I've been reading quite a bit about data structures in Java. The problem, however, is with the encoding process itself.
The thing is, that Huffman encoding creates code words of absolutely any length, from 1 bit to whatever the **** you get. But guess what? A computer can only handle full bytes. That's a big problem for me, 'cause I have absolutely no idea on how to handle these "partial bytes", and I don't think padding with zeroes would do, because that would make my program much, much less efficient. I've been looking on the Internets a way to read and handle "partial bytes" with Java, partly because I'm really good at Java, partly because I know absolutely nothing about C++, but so far I've found nothing at all.
Any idea on how can I keep my Huffman encoding efficient?
You may want to ask your professor on what you are allowed to use.
If I remember correctly, way back in college, when I did this assignment, I was expected to use the bit operators to mask and shift the bits into place. I did it in C back then -- I wanted to do it in assembly (for my Commodor 64), but my professor insisted that it ran on the school's DEC system.
Anyway, Java has all of the bit operators from the C language, so you should check if this is what you are expected to use.
Originally posted by Acoyani Garrido Sandoval: But guess what? A computer can only handle full bytes. That's a big problem for me, 'cause I have absolutely no idea on how to handle these "partial bytes", and I don't think padding with zeroes would do, because that would make my program much, much less efficient.
As others have said, why worry about efficiency.
Sounds like a good assignment. One thing it will teach you is that its only modern computers driven from a particular set of engineering assumptions that even have a concept of a 'byte' which seems to be universally defined as 8 bits.
Some of the greatest old computers were word oriented, and could pick up and write any number of bits anywhere in the word (which was often 36 bits). So it is good to break out of the mindset that a byte is an important concept, and that all bytes contain 8 bits.
Most compression systems use heavy packing, not only Huffman, but LZW, and similar approaches.
You can always use shift and XOR/AND/OR operators to do anything.
Every snowflake is perfect and unique. And every snowflake contains a very tiny ad.