file compression and hard disk block sizes

 
paul wheaton
Trailboss
I read on Wikipedia that NTFS supports LZ77 compression. Is it always on, or is it usually off?

Further, while NTFS allows block sizes of up to 64KB, I thought most installations used block sizes of about 8KB. But some reading today seemed to suggest that a lot of people are using the 64KB max. True?

These wacky questions are tied to the idea of having, say, 10,000 text files ranging in size from 5KB to 100KB, averaging around 40KB. The design in front of me says that each text file is to be compressed into its own compressed file, and that these text files are getting an amazing 98% compression! And I'm thinking it is quite likely that while the file system reports each file as smaller, the amount of disk space actually used is probably about the same.

I'm thinking I want to advocate putting all of the files into one zip file instead of the current approach. But I want to get my facts straight first.

Anybody have much knowledge about industry norms with NTFS compression or NTFS block size?
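
A rough back-of-the-envelope sketch of the arithmetic behind that suspicion (purely illustrative: it takes the 98% figure at face value, tries a 4KB and a 64KB cluster size, and ignores NTFS subtleties such as very small files being stored inside the MFT record):

public class ClusterWasteEstimate {
    public static void main(String[] args) {
        int fileCount = 10_000;
        long avgOriginalSize = 40 * 1024;            // ~40KB average, per the numbers above
        long compressedSize = avgOriginalSize / 50;  // 98% compression leaves ~2%, roughly 800 bytes

        for (long clusterSize : new long[] {4 * 1024, 64 * 1024}) {
            // An ordinary file occupies whole clusters, so round up to a cluster boundary.
            long onDiskPerFile = ((compressedSize + clusterSize - 1) / clusterSize) * clusterSize;
            System.out.printf("%2dKB clusters: reported %,d bytes, on disk roughly %,d bytes%n",
                    clusterSize / 1024,
                    compressedSize * fileCount,
                    onDiskPerFile * fileCount);
        }
    }
}

Under those assumptions, 4KB clusters still give a real on-disk saving (about 4KB instead of 40KB per file), but with 64KB clusters each compressed file ties up a full cluster, which is the same footprint as its 40KB original.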
 
Balasubramanian Chandrasekaran
Ranch Hand
Sorry, I am not an expert in this area, but I think this link will help you get your answers.
 
Tim Holloway
Saloon Keeper
Compression is off by default, though IIRC you can turn it on for a directory subtree (all the way up to the root, if desired). Compressed files and directories have their names displayed differently in the GUI as well, I believe. The Mac equivalent was to italicize the name, but I think NT just used an alternate font color if the standard preferences were in use.

There are two different space-saving mechanisms available here. One is sparse files, such as when you create a 10GB file and write 6 bytes at one end and 4 bytes at the other. NTFS won't allocate any of the intervening 9.999(?) GB until there's actually data for it. I think that feature is always on (I'd have to check the create-file function defaults to be sure). The other is actual data compression (LZW or otherwise). Depending on the data, you may see huge space savings, or files that end up larger than they would be uncompressed (the worst-case scenario).

For the ultimate in compression, a ZIP file is still better, since even if you create a compressed directory, the system overhead for the directory and its files is still more than for a single file containing a ZIP directory. Plus the compression travels with the file when you copy the ZIP.
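
A minimal sketch of that one-big-ZIP approach, using java.util.zip (the directory and archive names here are made up for illustration):

import java.io.IOException;
import java.nio.file.*;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipAllTextFiles {
    public static void main(String[] args) throws IOException {
        Path sourceDir = Paths.get("textfiles");   // hypothetical directory holding the ~10,000 text files
        Path zipFile = Paths.get("textfiles.zip"); // single archive replacing the per-file compression

        List<Path> files;
        try (Stream<Path> stream = Files.list(sourceDir)) {
            files = stream.filter(Files::isRegularFile).collect(Collectors.toList());
        }

        try (ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(zipFile))) {
            for (Path file : files) {
                zos.putNextEntry(new ZipEntry(file.getFileName().toString()));
                Files.copy(file, zos);   // stream the file's bytes into the current ZIP entry
                zos.closeEntry();
            }
        }
    }
}

Besides the compression itself, the archive keeps all 10,000 entries in one file, so the per-file cluster rounding only happens once.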
 
paul wheaton
Trailboss
Balasubramanian, thanks for the link. I've read a dozen pages like that one. The big question right now is: what's the norm?

Tim, you had me at "off by default".

When you advocate zip, I take it you advocate many files in one zip, not one file per zip?

And Tim: I love that sig!

... I'm currently running XP, and when I look at the properties of a tiny file, it shows "size" and "size on disk". I take it the difference has to do with the block size. On my machine it is 4KB.

Anybody care to share what their block size is?
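
If anyone wants to check programmatically rather than via the Properties dialog, something like this works on newer Java (FileStore.getBlockSize() arrived in Java 10; the file name is hypothetical, and the "size on disk" here is only an estimate). On Windows, fsutil fsinfo ntfsinfo C: reports the same value as "Bytes Per Cluster".

import java.io.IOException;
import java.nio.file.*;

public class SizeOnDisk {
    public static void main(String[] args) throws IOException {
        Path file = Paths.get("tiny.txt");                         // hypothetical small file
        long blockSize = Files.getFileStore(file).getBlockSize();  // Java 10+; the volume's cluster size
        long size = Files.size(file);                              // the logical "size"

        // "Size on disk" for an ordinary, uncompressed file is the size rounded up to the
        // next cluster boundary (very small files may be stored inside the MFT record instead).
        long sizeOnDisk = ((size + blockSize - 1) / blockSize) * blockSize;

        System.out.println("block size:   " + blockSize);
        System.out.println("size:         " + size);
        System.out.println("size on disk: " + sizeOnDisk + " (estimate)");
    }
}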
 
Paul Clapham
Sheriff
I'm using XP too, and my block size is 4KB as well.

But I'm not sure you need to know what the normal value is unless you plan to put those text files on many different computers. Don't you just need to know the value for the computer you plan to put them on?

Or to put it another way, what would you do differently if you found that 20% of disks had an 8KB block size?
 
Marilyn de Queiroz
Sheriff
AIX has 512K blocks. It seems like you would need to know the block size on an NTFS system.
 
paul wheaton
Trailboss
Originally posted by Marilyn de Queiroz:
AIX has 512K blocks. It seems like you would need to know the block size on an NTFS system.


512K? Half a meg?
 
Marilyn de Queiroz
Sheriff
GPFS offers five block sizes for file systems: 16KB, 64KB, 256KB, 512KB, and 1024KB. You should choose the block size based on the application set that you plan to support:

* The 256KB block size is the default block size and normally is the best block size for file systems that contain large files accessed in large reads and writes.
* The 16KB block size optimizes use of disk storage at the expense of large data transfers.
* The 64KB block size offers a compromise. It makes more efficient use of disk space than 256KB while allowing faster I/O operations than 16KB.
* The 512KB and 1024KB block sizes may be more efficient if data accesses are larger than 256KB. You may also consider using these block sizes if your RAID (Redundant Array of Independent Disks) hardware works optimally with either size.
Reference
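
To relate those sizes to the files in this thread, here is a tiny illustration of how much space one ~40KB file could tie up at each GPFS block size if it always consumed whole blocks (an upper bound only; as the note below suggests, GPFS can allocate smaller units for small files):

public class GpfsBlockSizeComparison {
    public static void main(String[] args) {
        long fileSize = 40 * 1024;  // one of the ~40KB text files discussed above
        int[] blockSizesKB = {16, 64, 256, 512, 1024};

        for (int kb : blockSizesKB) {
            long blockSize = kb * 1024L;
            // Worst case: the file is padded out to whole blocks.
            long allocated = ((fileSize + blockSize - 1) / blockSize) * blockSize;
            System.out.printf("%4dKB blocks: %d block(s), %4dKB allocated%n",
                    kb, allocated / blockSize, allocated / 1024);
        }
    }
}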

On the other hand, I read this
The smallest file extent is 4KB. If a user creates or extends a file anywhere from 0-4096 bytes, a 4KB block will be allocated from the free list to accommodate that request.

So I guess I would have to say that it depends.
 