digesting a file

 
Ranch Hand
Posts: 70
Hi,

Is it possible to hash a 1 GB file with the MessageDigest class? All of MessageDigest's update methods take only a byte[] argument. I guess reading the entire 1 GB file into a byte array and passing that to MessageDigest is not a good idea. Is there another way, or an API that supports feeding a FileInputStream to a MessageDigest, that takes care of hashing large files intelligently?
 
Author and all-around good cowpoke
Posts: 13078
6
As I understand it, you would read some convenient buffer length of bytes and call update repeatedly until the entire file is processed. Watch out for the last bufferload being less than the full byte[] - that's what the
update( byte[], start, length )
method call is for.
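In code, that loop looks something like this (a minimal sketch; the FileDigest class, digestFile method, and 8 KB buffer size are illustrative choices, not anything from the posts above):

import java.io.FileInputStream;
import java.security.MessageDigest;

public class FileDigest {
    public static byte[] digestFile(String path) throws Exception {
        MessageDigest md = MessageDigest.getInstance("MD5");
        FileInputStream in = new FileInputStream(path);
        byte[] buf = new byte[8 * 1024]; // any convenient buffer length
        int n;
        // read() reports how many bytes it actually delivered; the last
        // bufferload is usually short, hence update(buf, 0, n)
        while ((n = in.read(buf)) != -1) {
            md.update(buf, 0, n);
        }
        in.close();
        return md.digest();
    }
}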
Bill
 
muthu muruges
Ranch Hand
Posts: 70
Thanks for the reply. I tried multiple updates (of the MessageDigest class), and it takes 80 seconds to digest a single 1.4 GB file. That seems like a very long time to hash one file, and my system has billions of files that need to be hashed. Please suggest any other mechanism that can do it more quickly.
 
Greenhorn
Posts: 18
m muruges

This display name doesn't comply with the JavaRanch naming rules described here

We require your display name to be two words: your first name, a space, then your last name. An obviously short first name (like the one above) is not allowed. Accounts with invalid display names get deleted, often without warning.

If you want to get help from this community, we require that you follow the rules of the community.

Thanks, Anna
[ May 31, 2004: Message edited by: Anna Kapricornikova ]
 
muthu muruges
Ranch Hand
Posts: 70
Hi,

My display name now complies with the naming policy. Can you please answer my question?
 
author and iconoclast
Posts: 24207
46
80 seconds is a long time, but the digest algorithm is complicated enough that it really does take a while to process your 1.4 billion bytes, one at a time. Let's say you have a 1 GHz processor, and it spends only 12 cycles on each byte (this would be a very fast digest algorithm even in native code ;) ) - that's 12 nanoseconds * 1.4 billion = 17 seconds right there, and only if you've got enough RAM to hold your whole program and all the data at once. If you don't have a couple GB of RAM, then you're also going to spend a lot of time swapping data to disk. Now factor in that Java isn't quite as fast as native code, and you're already in the ballpark of your 80 seconds -- so although I can imagine you possibly speeding things up by a factor of 2 or so, you're not going to do much better than that.
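Spelling out that estimate: at 1 GHz one clock cycle takes 1 nanosecond, so

12 cycles/byte * 1 ns/cycle = 12 ns/byte
12 ns/byte * 1.4 billion bytes = 16.8 seconds, call it 17

and that is before counting any disk I/O, paging, or JVM overhead.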
[ May 31, 2004: Message edited by: Ernest Friedman-Hill ]
 
muthu muruges
Ranch Hand
Posts: 70
Thanks for the update. I will try it with 4GB RAM.
 
Greenhorn
Posts: 1
Hi,

I have been looking for some help with generating an MD5 hash of a file. When I tried the code below I got wrong hash values; I think the problem is in the way I update the chunks.

Any help would be useful.
Thanks
public String getMD5HashofFile(String fileFullname)
{
    try {
        MessageDigest digest = MessageDigest.getInstance("MD5");

        FileInputStream fileIn = new FileInputStream(fileFullname);
        BufferedInputStream in = new BufferedInputStream(fileIn);

        int chunkSize = 64 * 1024; // hash the file 64 KB at a time
        byte[] buff = new byte[chunkSize];

        // read() returns -1 at end of file, and may deliver fewer bytes
        // than requested, so always update with the actual count read
        int bytesRead;
        while ((bytesRead = in.read(buff, 0, chunkSize)) != -1) {
            digest.update(buff, 0, bytesRead);
        }
        in.close();

        // digest() finishes the hash; calling reset() before this point
        // would throw away everything the update() calls accumulated
        byte[] digestBuf = digest.digest();

        // convert the raw 16 bytes to hex; new String(digestBuf) would
        // mangle the binary digest
        StringBuffer hex = new StringBuffer();
        for (int i = 0; i < digestBuf.length; i++) {
            int b = digestBuf[i] & 0xff;
            if (b < 0x10) {
                hex.append('0');
            }
            hex.append(Integer.toHexString(b));
        }
        return hex.toString();

    } catch (Exception e) {
        return null;
    }
}
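As an aside, the standard library does have the kind of API the original question asked about: java.security.DigestInputStream wraps another stream and updates a MessageDigest as a side effect of reading. A minimal sketch (the StreamDigest class, md5Of method, and 64 KB buffer are arbitrary choices):

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;

public class StreamDigest {
    public static byte[] md5Of(String path) throws Exception {
        MessageDigest md = MessageDigest.getInstance("MD5");
        InputStream in = new DigestInputStream(
                new BufferedInputStream(new FileInputStream(path)), md);
        byte[] buf = new byte[64 * 1024];
        // reading drives the digest; there is nothing else to do per chunk
        while (in.read(buf) != -1) {
        }
        in.close();
        return md.digest();
    }
}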
 