Fastest way of calculating MD5

 
Greenhorn
Posts: 8
I want to calculate MD5 values for large files. I am using the following code:

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public String Checker(File fi) throws NoSuchAlgorithmException {
    MessageDigest md = MessageDigest.getInstance("MD5");
    byte[] buffer = new byte[4000];
    // System.nanoTime() stands in for the StopWatch so no extra library is needed
    long start = System.nanoTime();
    // try-with-resources closes the stream even if a read fails
    try (InputStream is = new FileInputStream(fi)) {
        int read;
        // read() returns -1 at end of stream
        while ((read = is.read(buffer)) != -1) {
            md.update(buffer, 0, read);
        }
    } catch (IOException e) {
        throw new RuntimeException("Unable to process file for MD5", e);
    }
    byte[] md5sum = md.digest();
    // %032x keeps the leading zeros that BigInteger.toString(16) would drop
    String output = String.format("%032x", new BigInteger(1, md5sum));
    long elapsedMs = (System.nanoTime() - start) / 1_000_000;
    System.out.println("MD5 : " + output);
    System.out.println("MD5 Time taken: " + elapsedMs + " ms");
    return output;
}

For a file with a million records, it takes 55 seconds.

How can I improve the performance (i.e. decrease the processing time)?

Any suggestions or code would help.

Thanks, take care.
 
Bartender
Posts: 9626
Processing a million of anything will take some time.
How long does it take to run the file through the md5sum command-line tool? I'd think that's pretty much as fast as you'll get.
 
Ranch Hand
Posts: 862
How much time is the IO taking, and how much is the digest taking? You should time the IO separately and see how long the IO alone takes. I am not that familiar with IO in Java; however, you should make sure your IO is buffered and that you are using the fastest IO classes available.
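On the "fastest IO classes" point: besides java.io streams, the JDK also offers NIO channels, and MessageDigest can consume a ByteBuffer directly. Below is a minimal sketch, assuming Java 7 or later; the class name ChannelMd5 and the 64 KB buffer size are illustrative choices, not anything from this thread. There is no guarantee this beats a plain FileInputStream with a large byte array, so time both on your own files.

```java
import java.io.IOException;
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ChannelMd5 {

    /** Hashes a file by reading it through a FileChannel into one reusable direct buffer. */
    public static String md5(Path path, int bufferSize)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        ByteBuffer buffer = ByteBuffer.allocateDirect(bufferSize);
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            // read() returns -1 at end of file
            while (channel.read(buffer) != -1) {
                buffer.flip();      // switch the buffer from filling to draining
                md.update(buffer);  // consumes the bytes between position and limit
                buffer.clear();     // make the whole buffer available for the next read
            }
        }
        // %032x pads to 32 hex digits, keeping leading zeros
        return String.format("%032x", new BigInteger(1, md.digest()));
    }

    public static void main(String[] args) throws Exception {
        Path path = Paths.get(args[0]);
        long start = System.nanoTime();
        String hex = md5(path, 64 * 1024); // 64 KB: an assumed starting point, not a tuned value
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println(hex + "  " + ms + " ms");
    }
}
```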

Also, I'm not sure what you are doing with this message digest when you are done. If you simply want to compare it against other files you receive, there may be faster checks you can run on the input files first, calculating the digest only if those checks can't tell the files apart (for example, does the number of bytes or the number of rows match the original?).
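To illustrate the cheap-check-first idea, here is a minimal sketch, assuming both files are available locally; the class and method names are hypothetical. The file size comes from file system metadata, so a length mismatch is detected without reading either file, and the MD5 pass only runs when the sizes agree.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class QuickCompare {

    /** Full MD5 of a file, read in 64 KB chunks. */
    static byte[] md5(Path path) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buffer = new byte[64 * 1024];
        try (InputStream in = Files.newInputStream(path)) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                md.update(buffer, 0, read);
            }
        }
        return md.digest();
    }

    /**
     * Returns true if the files appear identical. The cheap size check runs
     * first; the digests are computed only when the sizes match.
     */
    public static boolean sameContent(Path a, Path b)
            throws IOException, NoSuchAlgorithmException {
        if (Files.size(a) != Files.size(b)) {
            return false; // different lengths: no need to read either file
        }
        return Arrays.equals(md5(a), md5(b));
    }
}
```

If you already have the original file's size and digest stored, you would compare against those stored values instead of re-reading the original.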
 
author
Posts: 14112
Yes, try wrapping your FileInputStream in a BufferedInputStream.
 
Rancher
Posts: 4686
You can just comment out the actual MD5 calculation; what's left is the time it takes to read the file.

In general, MD5 and SHA can compute digests much faster than the IO can deliver the data.

Do the buffering suggestions mentioned upthread.
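The "comment out the MD5 call" experiment can be written as two passes over the same file: one that only reads, and one that reads and digests. This is a minimal sketch; the class and method names are hypothetical, and the 4000-byte buffer simply mirrors the original post. Note that the second pass may be served from the operating system's file cache because the first pass just read the file, so run it several times, or swap the order, before drawing conclusions.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class SplitTiming {

    /** Reads the whole file and discards the bytes; returns the byte count. */
    static long readOnly(String file, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        try (InputStream in = new FileInputStream(file)) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                total += read; // keep a result so the loop does observable work
            }
        }
        return total;
    }

    /** Reads the file and feeds every chunk to MD5; returns the raw digest. */
    static byte[] readAndDigest(String file, int bufferSize)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buffer = new byte[bufferSize];
        try (InputStream in = new FileInputStream(file)) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                md.update(buffer, 0, read);
            }
        }
        return md.digest();
    }

    public static void main(String[] args) throws Exception {
        String file = args[0];
        int bufferSize = 4000; // same size as the original post
        long t0 = System.nanoTime();
        long bytes = readOnly(file, bufferSize);
        long t1 = System.nanoTime();
        readAndDigest(file, bufferSize);
        long t2 = System.nanoTime();
        System.out.printf("read only:  %d ms (%d bytes)%n", (t1 - t0) / 1_000_000, bytes);
        System.out.printf("read + MD5: %d ms%n", (t2 - t1) / 1_000_000);
    }
}
```

The difference between the two timings approximates the cost of the digest itself.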
 
Ranch Hand
Posts: 1970
Recommending buffering the input stream seems wrong here, to me. The input is already being read in biggish chunks, using a 4000-byte buffer.

Experimenting with the size of this buffer would make more sense. I would hazard a guess that a bigger buffer might give slightly better performance. But measurement is the only way to know for sure.

Putting an additional buffer in the way, as with BufferedInputStream, seems to me unlikely to help. More likely, it will make it very slightly slower.
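Since measurement is the only way to know, here is a minimal sketch of a buffer-size sweep using a plain FileInputStream and no BufferedInputStream. The class name and the particular sizes tried are illustrative assumptions. It does one warm-up pass and keeps the fastest of three runs per size, which means the numbers mostly reflect reads from the OS file cache rather than cold disk reads.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class BufferSweep {

    /** Hashes the file with a plain FileInputStream and a byte[] of the given size. */
    static byte[] md5(String file, int bufferSize)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buffer = new byte[bufferSize];
        try (InputStream in = new FileInputStream(file)) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                md.update(buffer, 0, read);
            }
        }
        return md.digest();
    }

    public static void main(String[] args) throws Exception {
        String file = args[0];
        int[] sizes = {4000, 4096, 16 * 1024, 64 * 1024, 256 * 1024, 1024 * 1024};
        md5(file, 64 * 1024); // warm-up pass: lets the JIT compile and fills the OS cache
        for (int size : sizes) {
            long best = Long.MAX_VALUE;
            for (int run = 0; run < 3; run++) { // keep the fastest of three runs
                long start = System.nanoTime();
                md5(file, size);
                best = Math.min(best, System.nanoTime() - start);
            }
            System.out.printf("%8d bytes: %d ms%n", size, best / 1_000_000);
        }
    }
}
```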
 
Ranch Hand
Posts: 66
You mentioned you were reading a million records. Even if a record is only around 0.5 KB, that's about 500 MB.

So I tried reading a 500 MB file. Adding a simple BufferedInputStream with a 16 KB buffer brought the response time down by a third.

InputStream is = new BufferedInputStream(new FileInputStream(fi), 16000);
byte[] buffer = new byte[16000];

~Rajesh.B
 
Joe Ess
Bartender
Posts: 9626

Originally posted by rajesh bala:

So I tried reading a 500Mb file. And adding a simple BufferedInputStream with 16Kb brings down the response time by 1/3.



Did you try it just reading 16k at a time from an InputStream?
My guess is it would be pretty close. As Peter suggested before, the performance improvement is from the size of the buffer, because all BufferedInputStream does is duplicate the effort of reading a chunk at a time.
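One way to settle this question is to run both variants with the same 16000-byte chunk size, so the only difference is whether a BufferedInputStream sits in between. This is a minimal sketch; the class name is hypothetical, and the loop repeats the comparison because the first iteration pays for JIT compilation and a possibly cold file cache.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class BufferedVsPlain {

    /** Feeds the whole stream to MD5 using a byte[] of chunkSize, then closes the stream. */
    static byte[] digest(InputStream in, int chunkSize)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] chunk = new byte[chunkSize];
        try (InputStream s = in) {
            int read;
            while ((read = s.read(chunk)) != -1) {
                md.update(chunk, 0, read);
            }
        }
        return md.digest();
    }

    public static void main(String[] args) throws Exception {
        String file = args[0];
        int size = 16000; // the size used in the earlier reply
        for (int run = 0; run < 3; run++) {
            long t0 = System.nanoTime();
            digest(new FileInputStream(file), size);
            long t1 = System.nanoTime();
            digest(new BufferedInputStream(new FileInputStream(file), size), size);
            long t2 = System.nanoTime();
            System.out.printf("plain: %d ms, buffered: %d ms%n",
                    (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
        }
    }
}
```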
 
Author and all-around good cowpoke
Posts: 13078


I suggest you use a buffer size that matches the file system allocation block size, just to keep things simple. These are always powers of two; try 4096, for example.

Timing tests to determine the optimum get tricky because the operating system, and possibly the hard drive itself, will be buffering large chunks of data.

Bill
 
Greenhorn
Posts: 4
I've actually heard of a way that's faster than 55 seconds.
 