JavaRanch » Java Forums » Databases » Hadoop

Hive Gzip Compression splitting supported now?

Darrel Riekhof

Joined: May 21, 2009
Posts: 1
Does Hadoop now automatically support splitting gzip files into blocks? I have read that splitting doesn't work for tables using gzip compression in Hadoop/Hive here:


From the above link: "in this case Hadoop will not be able to split your file into chunks/blocks and run multiple maps in parallel. This can cause under-utilization of your cluster's 'mapping' power."
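For context, the kind of setup the linked article talks about can be sketched roughly like this (the table name, column layout, and file path below are hypothetical, made up for illustration):

```shell
# Hypothetical sketch: load a gzipped text file into a Hive table.
# Hive accepts .gz files directly via LOAD DATA; no decompression step is needed.
hive -e "
  CREATE TABLE logs_gz (line STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
  LOAD DATA LOCAL INPATH '/tmp/logs.txt.gz' INTO TABLE logs_gz;
"
# Because gzip is not a splittable codec, a MapReduce query over this
# table would be expected to read each .gz file with a single mapper.
```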

However, when I load my table exactly as they describe, I notice that the .gz file I load is definitely split into blocks in the HDFS directory where my files are stored. It looks like this after doing the load:

It is clearly chopping it up into 64 MB blocks during the load to HDFS.
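One way to inspect that block layout directly is HDFS's fsck tool (a sketch; the path below assumes Hive's default warehouse directory and a hypothetical table name, so adjust both for your install):

```shell
# On Hadoop 1.x, fsck is invoked through the 'hadoop' front-end.
# -files -blocks -locations prints each file, its HDFS blocks, and
# the datanodes holding each block replica.
hadoop fsck /user/hive/warehouse/mytable -files -blocks -locations
```

Seeing multiple blocks here only shows how HDFS stores the file; it does not by itself show whether MapReduce can split the gzipped data across multiple mappers.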

Is this something they have added recently? I'm using Hadoop 1.0.4, r1393290 in pseudo-distributed mode.