
Question about file comparison

 
jonathan ford
Greenhorn
Posts: 9
Hello there,

I have 500k files in GFS. Every time I add a new file to the system, I need to compare it with the existing 500k files to see whether it already exists; if it does not, I add it to the system.

Here is my question: how can I design an efficient way to do this comparison? Should I use a database, or some other approach? Please help me out; any suggestions would be much appreciated.
 
jonathan ford
Greenhorn
Posts: 9
PS: I only need to compare the names of the files.
 
Ulf Dittmer
Rancher
Posts: 42968
73
What is a "GFS"?
 
jonathan ford
Greenhorn
Posts: 9
The files are stored in the Global File System (GFS).
 
Joanne Neal
Rancher
Posts: 3742
16
Will File.exists() do what you want?
 
jonathan ford
Greenhorn
Posts: 9
Sure it can, but how much time does it take? I need to handle a huge number of articles per day, so I'm looking for the most efficient way to do this.
 
Joanne Neal
Rancher
Posts: 3742
16
Originally posted by jonathan:
Sure it can, but how much time does it take? I need to handle a huge number of articles per day, so I'm looking for the most efficient way to do this.


I would imagine that will depend on your file system. Try it with File.exists() and see if it meets your performance requirements. If it doesn't then try something else. As long as you design your system correctly it should be straightforward to change the code that does the check without having to modify the rest of your system.
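
To illustrate what "design your system correctly" might look like here, below is a minimal sketch, not a definitive implementation. The interface `FileNameIndex`, the class `FileSystemIndex`, and the helper `addIfAbsent` are hypothetical names invented for this example, and it assumes all files sit in a single directory. Creating an empty file stands in for whatever "add it to the system" really involves.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

class ExistenceCheckSketch {

    /** Hypothetical seam: the rest of the system depends only on this interface. */
    interface FileNameIndex {
        /** Returns true if a file with this name is already stored. */
        boolean contains(String fileName);
    }

    /** First attempt: ask the file system on every lookup via File.exists(). */
    static class FileSystemIndex implements FileNameIndex {
        private final File directory;

        FileSystemIndex(File directory) {
            this.directory = directory;
        }

        @Override
        public boolean contains(String fileName) {
            return new File(directory, fileName).exists();
        }
    }

    /**
     * Adds the file only if the index does not know the name yet.
     * Returns true if the file was added, false if it was already there.
     */
    static boolean addIfAbsent(FileNameIndex index, File directory, String fileName)
            throws IOException {
        if (index.contains(fileName)) {
            return false;
        }
        // Placeholder for the real "add": createNewFile() creates an empty file
        // and returns false if a file with that name appeared in the meantime.
        return new File(directory, fileName).createNewFile();
    }

    public static void main(String[] args) throws IOException {
        // A temporary directory is used so the sketch can be run anywhere.
        File dir = Files.createTempDirectory("index-sketch").toFile();
        FileNameIndex index = new FileSystemIndex(dir);
        System.out.println(addIfAbsent(index, dir, "article-1.txt")); // prints true
        System.out.println(addIfAbsent(index, dir, "article-1.txt")); // prints false
    }
}
```

If File.exists() turns out to be too slow, only a second implementation of the interface has to be written; the calling code stays the same.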
 
Ulf Dittmer
Rancher
Posts: 42968
73
Could you spare 10 or 20MB to cache the file names in memory, say, in a List?
 
Jesper de Jong
Java Cowboy
Saloon Keeper
Posts: 15436
41
"jonathan", please check your private messages. You can see them by clicking My Profile.
 
jonathan ford
Greenhorn
Posts: 9
Originally posted by Ulf Dittmer:
Could you spare 10 or 20MB to cache the file names in memory, say, in a List?

I know that's one way to solve the problem. The key question is which is more efficient: using File.exists() or a hash table?
 
Jim Yingst
Wanderer
Sheriff
Posts: 18671
I would expect that a HashSet would be fastest. (I would never use a List for this.) But any Collection or Map will take some memory, so as Ulf indicated, you need to determine if the amount of memory required is acceptable, and if it's worth the increased speed. In general I would think that File.exists() is pretty fast in the first place, but there's really no way for us to determine if it will be fast enough for you. Seems like it would be pretty easy to just write the code yourself and see how fast it is. Using File.exists() is very simple, and using a HashSet is only a little more complex. It should be easy to change your code from using one to using the other, if you need to. Asking people here won't really answer your question, I think. Try it and see.
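
As a starting point for "try it and see", here is a rough sketch under stated assumptions. The class name `HashSetTimingSketch`, the helper `loadNames`, and the probe name `article-1.txt` are made up for illustration, and it assumes all files live in one directory passed as the first argument. The timing loop is a crude measurement, not a proper benchmark, and the numbers will depend heavily on the file system.

```java
import java.io.File;
import java.util.HashSet;
import java.util.Set;

class HashSetTimingSketch {

    /** Loads every file name in the directory into memory once. */
    static Set<String> loadNames(File directory) {
        Set<String> names = new HashSet<>();
        String[] listed = directory.list(); // null if not a directory or on I/O error
        if (listed != null) {
            for (String name : listed) {
                names.add(name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        String probe = "article-1.txt"; // hypothetical name to look up
        int lookups = 100_000;

        Set<String> names = loadNames(dir);

        // Time repeated lookups against the in-memory set.
        long start = System.nanoTime();
        int setHits = 0;
        for (int i = 0; i < lookups; i++) {
            if (names.contains(probe)) setHits++;
        }
        long setNanos = System.nanoTime() - start;

        // Time the same number of lookups against the file system.
        start = System.nanoTime();
        int fileHits = 0;
        for (int i = 0; i < lookups; i++) {
            if (new File(dir, probe).exists()) fileHits++;
        }
        long fileNanos = System.nanoTime() - start;

        System.out.println("HashSet:       " + setNanos / lookups + " ns/lookup, hits=" + setHits);
        System.out.println("File.exists(): " + fileNanos / lookups + " ns/lookup, hits=" + fileHits);
    }
}
```

One detail worth knowing if you go the HashSet route: `names.add(name)` returns false when the name is already present, so a single call both checks and records it. The set has to be kept in step with the file system whenever files are added or removed by anything else.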
 
jonathan ford
Greenhorn
Posts: 9
That helps a lot. I'll try it.
 