
Duplicated random numbers

 
Barry Harvey
Greenhorn
Posts: 4
We have a situation where we are running the same code in separate JVMs. Part of what the code does is generate a file whose name is based on the date and time to the second and, just to make sure, a random number.

Unfortunately, we've just had a situation where two processes generated exactly the same datetime and, I assume because the dates were the same, the same random number. Both processes were running the same code but were in separate JVMs.

My question is: do I have to use some shared third party (such as our database) to generate a sequence number, or is there some other reference that would make each instance unique?

I was hoping that hashcode would be the answer. Could someone tell me if two objects from different JVMs could have exactly the same hashcode?
 
Ernest Friedman-Hill
author and iconoclast
Marshal
Posts: 24211
Yes they can and do, absolutely.

Can you just use java.io.File.createTempFile() ?
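
A minimal sketch of this suggestion, assuming the files can be created directly in a directory the process is allowed to write to. The "report-" prefix, ".dat" suffix and class name are made up for illustration. createTempFile creates a new empty file under a name that did not exist before the call, so two JVMs sharing a directory should not end up with the same file.

```java
import java.io.File;
import java.io.IOException;

class UniqueFileSketch {
    // Creates a new, empty file whose name did not exist before the call.
    // The prefix passed to createTempFile must be at least three characters long.
    static File newOutputFile(File directory) throws IOException {
        return File.createTempFile("report-", ".dat", directory);
    }
}
```

Calling UniqueFileSketch.newOutputFile(dir) twice returns two different files. A date/time string can still be put into the prefix if the name needs to carry it.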
 
Ilja Preuss
author
Sheriff
Posts: 14112
Originally posted by Barry Harvey:
Could someone tell me if two objects from different JVMs could have exactly the same hashcode?


Actually the hashcode isn't even guaranteed to be unique inside a single JVM.
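
To make the point reproducible: hashCode() returns a 32-bit int, so distinct objects are allowed to share a value. The sketch below uses two well-known colliding strings rather than the identity hash codes of plain objects, which vary from run to run. The class name is invented for the example.

```java
class HashCollisionSketch {
    // Returns true when two different strings report the same hash code.
    static boolean collide(String a, String b) {
        return !a.equals(b) && a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        // "Aa" and "BB" both hash to 2112 under String.hashCode().
        System.out.println(collide("Aa", "BB")); // prints "true"
    }
}
```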
 
Dave Wingate
Ranch Hand
Posts: 262
Perhaps you could use Process ID or Thread ID to ensure that the file names are unique. I've never had to do this in Java, but it works well in Bash scripting.
 
Stan James
(instanceof Sidekick)
Ranch Hand
Posts: 8791
We use a multi-part key where we get the first part from a database sequence number and increment a simple counter for the second part. The part sizes are configurable; right now the counter is 3 digits so I go to the database for a new sequence number every 1000 keys. A cluster of six servers shares the database, and the database manages concurrency in getting the first parts.

If you don't want to use a database, you could designate one server in your cluster as the high-order-part vendor. It could use some persistence scheme or any other technique to make sure it vends unique parts.
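
The multi-part key described above could look roughly like this. It is only a sketch: the LongSupplier stands in for whatever hands out the high-order part (a database sequence or a designated server), and the class name and the way the two parts are combined are assumptions, not the actual implementation.

```java
import java.util.function.LongSupplier;

class BlockKeyGenerator {
    private final LongSupplier highPartSource; // stand-in for the shared sequence
    private final int blockSize;               // 1000 matches a 3-digit counter
    private long highPart;
    private int counter;

    BlockKeyGenerator(LongSupplier highPartSource, int blockSize) {
        this.highPartSource = highPartSource;
        this.blockSize = blockSize;
        this.counter = blockSize; // forces a fetch of the high part on first use
    }

    // Returns highPart * blockSize + counter and asks the shared source
    // for a new high part only when the local block is used up.
    synchronized long nextKey() {
        if (counter == blockSize) {
            highPart = highPartSource.getAsLong();
            counter = 0;
        }
        return highPart * blockSize + counter++;
    }
}
```

With a block size of 1000, the shared source is contacted once per 1000 keys, and uniqueness across servers rests on the source never handing out the same high part twice.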
 
Barry Harvey
Greenhorn
Posts: 4
We're going to try the Unix process ID idea.

The class involved is used purely to run Unix commands by writing them out to scripts and running them. Dropping out to Unix to call ps fits in nicely with the workings of the class.

Thanks everyone for the advice.
 
Stan James
(instanceof Sidekick)
Ranch Hand
Posts: 8791
Just curious, because I don't know Unix: if you run ps and get back a list of processes, how do you know which one to use? Could there be a bunch called "java" and maybe some called "ps"?
 
M Beck
Ranch Hand
Posts: 323
the command-line options to "ps" can be tuned almost indefinitely; for example excluding processes owned by other users and processes executing in the background or on other TTYs might help. with a bit of ingenuity and a lot of poring through the man page for the ps command, one can usually narrow the field sufficiently.

but really, the best way to do this would be for the Java API to have a method somewhere to return the PID of the current JVM. i can't imagine why one doesn't seem to exist - even if the JVM has been ported to run on platforms that have no equivalent concept to the "process ID", the sensible thing for them to do would be to just throw a checked exception indicating the method is not applicable.
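
For readers on newer JVMs: such a method was added later. Since Java 9, ProcessHandle.current().pid() returns the PID of the running JVM, so no call out to ps is needed. A minimal sketch assuming Java 9 or newer; the class name, method names and ".dat" suffix are illustrative only.

```java
class PidSketch {
    // PID of the current JVM; requires Java 9 or newer.
    static long currentPid() {
        return ProcessHandle.current().pid();
    }

    // Builds a file name from a caller-supplied timestamp string and the PID.
    static String fileName(String timestamp) {
        return timestamp + "-" + currentPid() + ".dat";
    }
}
```

A PID only distinguishes processes on the same machine, so this helps with several JVMs on one host but not, by itself, across hosts.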
 
Jeroen Wenting
Ranch Hand
Posts: 5093
Back to mathematics and random: random means that there's no guarantee about which will turn up.
Getting the same number 10 times in a row can therefore happen in a random sequence of numbers, but the probability of it happening is low.

Far better to use a timestamp down to millisecond level, maybe in combination with a random number.
 
Horatio Westock
Ranch Hand
Posts: 221
If I understand the requirement correctly, you want to have files named differently across a number of different machines; presumably the files are going to some shared network storage or something similar. I also presume that these files are aggregated to the shared storage by some external process, such that the name can't be checked for uniqueness at creation time. (Note that you could write your aggregation process so that it renames a file if it finds a duplicate name.)

If this is the case, I'd say that the process ID is a poor choice. It is only unique within one machine, so it seems like one of the more likely candidates for a collision between servers.

I think Stan's suggestion is probably the best. The servers are requesting a unique id (or block of) from a master process which manages them.

If you really want independently generated (very likely to be) unique numbers, something like a UUID generator would probably be a good solution.

There are various implementations around: Jakarta Commons (sandboxed), Jini, and one called 'JUG'. It would be worth reading the docs for each and seeing which is suitable for you. Some use the MAC address from the network card, and this requires some native code.
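
As a further option, the JDK itself includes java.util.UUID (since Java 5). Its randomUUID() method produces a random (version 4) UUID without native code or access to the MAC address. A minimal sketch; the class name, prefix handling and ".dat" suffix are assumptions for the example.

```java
import java.util.UUID;

class UuidNameSketch {
    // Appends a random UUID (36 characters in its string form) to the prefix.
    static String uniqueName(String prefix) {
        return prefix + "-" + UUID.randomUUID() + ".dat";
    }
}
```

Each JVM can call UuidNameSketch.uniqueName("out") independently. A collision is not impossible, only extremely unlikely.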

Hope this helps.
 
M Beck
Ranch Hand
Posts: 323
a random number appended to a string identifying which process, on which machine, created it, might also be unique. along the lines of some USENET message-IDs, maybe: "machine.name.com : process-id-of-some-kind : date-time-stamp : random-number". it'd be long-ish, but filenames don't have to be short, do they?
[ April 07, 2005: Message edited by: M Beck ]
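
A rough sketch of such a composite name, under several assumptions: Java 9 or newer for the PID, underscores instead of colons so the result is usable as a file name on most file systems, a millisecond timestamp, and SecureRandom for the random part so that two JVMs started at the same instant do not depend on a time-based seed. All names in the code are illustrative.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.security.SecureRandom;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

class CompositeNameSketch {
    private static final SecureRandom RANDOM = new SecureRandom();
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmssSSS");

    // Host name of this machine, or "unknown-host" if it cannot be resolved.
    static String hostName() {
        try {
            return InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            return "unknown-host";
        }
    }

    // Joins host, PID, millisecond timestamp and a random number with underscores.
    static String compose(String host, long pid, LocalDateTime when) {
        return host + "_" + pid + "_" + STAMP.format(when)
                + "_" + RANDOM.nextInt(1_000_000);
    }

    // Convenience wrapper that fills in the values for the current JVM.
    static String newName() {
        return compose(hostName(), ProcessHandle.current().pid(), LocalDateTime.now());
    }
}
```

The host and PID parts separate machines and processes, while the timestamp and random number separate files written by the same process.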
 