Only 44 hours left in the trailboss' kickstarter!

New rewards and stretch goals. CLICK HERE!



  • Post Reply Bookmark Topic Watch Topic
  • New Topic

fast list/map??  RSS feed

 
olze oli
Ranch Hand
Posts: 169
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
hi,

following problem:
i have a textfile which contains about 20.000 md5sums and some more informations
and i have a directory which contains files called like those md5sums

i created a class which contains the md5sum and some further informations (retrieved from the txt)

i know need to check if every file in the directory is in the textfile, and if so, store the further informations about that file in the class (thats why i need a extra class)

whats the fastest way to do this? when i use a normal linkedlist i need about 30min
i even loaded the whole file into ram which reduces the time (for searching etc.) to about 15min

 
Ernest Friedman-Hill
author and iconoclast
Sheriff
Posts: 24215
37
Chrome Eclipse IDE Mac OS X
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Checking for an item in a list is quite slow, because you have to search one-by-one through the list. Finding something in a Java Set, though, is quite fast: TreeSet and especially HashSet lookups take very little time even for huge sets. Read your list of file names and put them into a HashSet, and then check whether the HashSet contains each name -- you should find this to be enormously faster.
 
olze oli
Ranch Hand
Posts: 169
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
what you said i just read on another website but thats really confusing me
how can it be faster to put some thousands objects in a hashsetand iterate over it until the matching one is found?
i thought using a list with binarySearch should be the fastest as possible way to get this

edit: when i use hashset how can i get an object?
 
Matthew Brown
Bartender
Posts: 4568
9
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Don't iterate over it - just use the contains() method. HashSet is written to optimise this sort of thing.

Another option would be to use a HashMap, mapping the MD5 sum to the object you want to store. That'll give you very fast lookup based on the MD5 sum - much faster than iterating over a list. And that has the advantage of giving you access to the object.
 
olze oli
Ranch Hand
Posts: 169
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
that sounds pretty good
i will try this tomorrow

thanks!
 
Ernest Friedman-Hill
author and iconoclast
Sheriff
Posts: 24215
37
Chrome Eclipse IDE Mac OS X
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
A TreeSet would be about as fast as a binary search on a list (as long as you only have to sort the list once!) But a HashSet is even faster. You can read on Wikipedia how hash tables work and why they are fast.

I'm afraid I don't understand your last question; can you rephrase it?

Also, I'm a little concerned -- I can't imagine how 20,000 binary searches could take 15 minutes or more; there must be something else going on.
 
olze oli
Ranch Hand
Posts: 169
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
its working pretty fast now with a hashmap where the md5 is the key and the value is the object i want to retrieve
thanks
 
It is sorta covered in the JavaRanch Style Guide.
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
Boost this thread!