
unique id / not too long

 
matthew kane
Greenhorn
Posts: 5
Hi, I am looking for a program to generate a unique alphanumerical identifier that is not too long. For example, it would start out with 6 characters, like a licence plate or a postal code (e.g. AAA001), and use up all the possible combinations until 999ZZZ (just an example); when the possibilities are exhausted, a 7th character is added, and so on. The identifiers need not be sequential; they just need to be unique and not too hard to remember (also, I don't want to use an IP address or any personal identification). Any insight into how I can accomplish this using Java would be appreciated. It is for a non-profit humanitarian cause that relies on anonymous participation. Many thanks!
 
Richard Tookey
Bartender
Posts: 1166
000001, 000002, 000003, 000004 ...
 
Winston Gutkowski
Bartender
Posts: 10575
matthew kane wrote:hi, I am looking for a program to generate a unique alphanumerical identifier that is not too long...

Other than a sequence number (which has its own issues, particularly in multi-user situations): trickier than you might think.

Is this an ID or a PIN? The latter tend to be fairly random (and you could make them so), and also memorable to the user - like passwords - but are not unique unless you force them to be.
Another possibility is a good hashcode, but even that can't be guaranteed to be unique.
Yet another is a UID (java.rmi.server.UID) or UUID (java.util.UUID), but neither is particularly short.

matthew kane wrote:Any insight into how I can accomplish this using java would be appreciated. It is for a non profit humanitarian cause that relies on anonymous participation.

A few:
1. (as Richard said) A sequence number.
2. A code (ie, a password/PIN) chosen by the prospective user. If it must be unique, then it must be rejected if already in use.
3. A random code chosen by the machine, with the same constraints as (2).
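
Option 3 might be sketched roughly as follows. This is only an illustration under stated assumptions: the class name `RandomIdGenerator`, the 36-symbol alphabet, and the in-memory `Set` (standing in for a database column with a unique constraint) are all hypothetical and not taken from the thread.

```java
import java.security.SecureRandom;
import java.util.HashSet;
import java.util.Set;

/** Sketch of option 3: random codes, retried until an unused one is found. */
final class RandomIdGenerator {
    // Assumed alphabet: digits plus upper-case letters (36 symbols).
    static final String ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    static final int LENGTH = 6;

    private final SecureRandom random = new SecureRandom();
    // Hypothetical stand-in for a database table with a UNIQUE constraint on the code.
    private final Set<String> issued = new HashSet<>();

    /** Returns a code that this generator has not issued before. */
    synchronized String next() {
        while (true) {
            StringBuilder sb = new StringBuilder(LENGTH);
            for (int i = 0; i < LENGTH; i++) {
                sb.append(ALPHABET.charAt(random.nextInt(ALPHABET.length())));
            }
            String code = sb.toString();
            // Set.add returns false when the code is already present: reject and retry.
            if (issued.add(code)) {
                return code;
            }
        }
    }

    public static void main(String[] args) {
        RandomIdGenerator generator = new RandomIdGenerator();
        for (int i = 0; i < 5; i++) {
            System.out.println(generator.next());
        }
    }
}
```

In a real system the uniqueness check would presumably be done by the database itself (insert, and retry on a constraint violation) rather than by a set held in memory.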

For the last two, it's best if you have some idea of how many people are likely to use the service, because you'll need enough possible values to choose from to avoid the "Birthday" paradox. Option 2 is also far more prone to "collisions" (and therefore rejection, if it must be unique).
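
To get a feel for the "Birthday" effect, the usual approximation p ≈ 1 - exp(-n(n-1) / (2N)) can be evaluated for n random codes drawn from N possible values. The sketch below assumes uniformly random codes and a 6-character, 36-symbol code space; the class and method names are made up for illustration.

```java
/** Rough estimate of the chance that random codes collide (birthday approximation). */
final class BirthdayEstimate {
    /**
     * Approximate probability that at least two of n uniformly random codes
     * are equal when drawn from a space of the given size:
     * p ~= 1 - exp(-n(n-1) / (2 * space)).
     */
    static double collisionProbability(long n, double space) {
        double exponent = -((double) n * (n - 1)) / (2.0 * space);
        // -expm1(x) equals 1 - e^x and stays accurate when x is close to zero.
        return -Math.expm1(exponent);
    }

    public static void main(String[] args) {
        double space = Math.pow(36, 6); // assumed: 6 characters, 36 symbols
        for (long n : new long[] {1_000, 10_000, 100_000, 1_000_000}) {
            System.out.printf("%,d codes: %.4f%n", n, collisionProbability(n, space));
        }
    }
}
```

Under these assumptions, at least one collision somewhere among all issued codes becomes likely long before a million codes have been handed out, even though the chance that any single new code clashes with an existing one stays small (roughly n / N). That is why random codes still need a uniqueness check with a retry.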

Oddly enough, a truly random 6-character alphanumeric code (as opposed to the "pattern" you suggest in your OP) allows almost exactly the same number of combinations as an int (2,116,316,160 to be precise), which should certainly allow for several million "users" without too much probability of collision.

HIH

Winston
 
matthew kane
Greenhorn
Posts: 5
It is an ID, not a PIN. It need not have a pattern. It needs to be as short as possible (alphanumeric allows it to be shorter than with just numbers). It needs to accommodate millions of unique IDs (does anyone know how many unique possibilities there are with a 6-character alphanumeric code?). It is to be used as the unique confidential identifier in a database. The confidential part means that I can't use an IP address or MAC address to form the ID.
 
Paweł Baczyński
Bartender
Posts: 2087
matthew kane wrote:It is an ID, not a PIN. It need not have a pattern. It needs to be as short as possible (alphanumeric allows it to be shorter than with just numbers). It needs to accommodate millions of unique IDs (does anyone know how many unique possibilities there are with a 6-character alphanumeric code?). It is to be used as the unique confidential identifier in a database. The confidential part means that I can't use an IP address or MAC address to form the ID.


Exactly 6 characters long? And case-insensitive?
Well, there are 26 letters in the Latin alphabet and 10 digits: together, 36 distinct characters.
So the number of distinct 6-character sequences is 36^6 = 2,176,782,336

@Winston, how did you get 2,116,316,160?
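
The count above can be checked with a few lines of Java. This is just a sketch; the class and method names are invented here, and the comparison with `Integer.MAX_VALUE` is included only to show how the result relates to the range of an int.

```java
/** Counts how many fixed-length codes an alphabet of a given size allows. */
final class CodeSpace {
    /** Number of distinct codes of the given length over alphabetSize symbols. */
    static long combinations(int alphabetSize, int length) {
        long total = 1;
        for (int i = 0; i < length; i++) {
            // multiplyExact throws ArithmeticException instead of silently overflowing.
            total = Math.multiplyExact(total, (long) alphabetSize);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(combinations(36, 6)); // 2176782336
        System.out.println(Integer.MAX_VALUE);   // 2147483647
    }
}
```

Note that 36^6 is slightly larger than `Integer.MAX_VALUE`, so the full 6-character range does not quite fit in an int.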
 
Winston Gutkowski
Bartender
Posts: 10575
Pawel Pawlowicz wrote:Well there are 26 letters in latin alphabet and there are 10 digits. Together 36 distinct characters.
So the number of distinct 6-character sequences is 36^6 = 2,176,782,336
@Winston, how did you get 2,116,316,160?

Because I was overthinking.

I was thinking of it like a number, so that in order to have 6 characters, the number had to be >= 36^5, which of course is nonsense.

Damn Alzheimers...

Winston
 
Paweł Baczyński
Bartender
Posts: 2087
Ah right, 36^6 - 36^5 = 2,116,316,160 ;)

I thought it might be 36^6 - 36^5 before I asked, but I calculated it incorrectly that time :P
 
Winston Gutkowski
Bartender
Posts: 10575
matthew kane wrote:It is an ID, not a PIN. It need not have a pattern. It needs to be as short as possible (alphanumeric allows it to be shorter than with just numbers). It needs to accommodate millions of unique IDs (does anyone know how many unique possibilities there are with a 6-character alphanumeric code?). It is to be used as the unique confidential identifier in a database. The confidential part means that I can't use an IP address or MAC address to form the ID.

Millions of IDs suggests to me that you're going to be storing them in a database, so a sequence number is probably best. The main thing is that these should only be established when the new user is added (indeed, many databases allow for a primary key to be based on a sequence number); otherwise you have to start worrying about "gaps", which usually precludes a "your ID will be" until you've completed the registration process.

OTOH, if there is any security aspect to this ID (i.e., you don't want people to simply "guess" an ID), then a sequence is probably not a good solution. Unfortunately, you're then into the realm of random or time-based IDs, and these are usually quite big. However, if that's the case, you should probably combine an ID with some sort of non-unique PIN or password.

Also: Don't confuse how the ID is stored (database IDs are usually integers) with how it's displayed (see Integer.toString(n, radix)) or input (Integer.parseInt(str, radix)).
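
As an illustration of the stored-versus-displayed distinction, here is a small sketch that shows a numeric key as a zero-padded base-36 string and parses it back. It uses the `Long` counterparts of the methods mentioned above, because the largest 6-character base-36 value (36^6 - 1) is bigger than `Integer.MAX_VALUE`; the class name, method names and padding width are assumptions made for the example.

```java
import java.util.Locale;

/** Displays a numeric key in base 36 and reads it back. */
final class RadixDisplay {
    /** Formats a non-negative key as upper-case base 36, left-padded with '0' to the given width. */
    static String toCode(long key, int width) {
        String digits = Long.toString(key, 36).toUpperCase(Locale.ROOT);
        StringBuilder sb = new StringBuilder();
        for (int i = digits.length(); i < width; i++) {
            sb.append('0');
        }
        return sb.append(digits).toString();
    }

    /** Parses a base-36 code back into the numeric key; letter case does not matter. */
    static long fromCode(String code) {
        return Long.parseLong(code, 36);
    }

    public static void main(String[] args) {
        System.out.println(toCode(1, 6));       // 000001
        System.out.println(toCode(36, 6));      // 000010
        System.out.println(fromCode("00000Z")); // 35
    }
}
```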

HIH

Winston
 
matthew kane
Greenhorn
Posts: 5
Sequential would be OK, and 2 billion would be plenty (I think there may be an error in your formula; you can't count the number 10 since it is double digits: 1 to 9 + a to z is 35). How much is that?
I want users to answer questions, and when they complete the "quiz" the sequential alphanumeric code is generated and submitted along with the answers to the questions into the database, and the user can jot down the short unique identifier for future reference. Only the user will know which code is theirs, and the site will not have cookies or collect any personal identification data (completely anonymous participation).
 
Paweł Baczyński
Bartender
Posts: 2087
matthew kane wrote:Sequential would be OK, and 2 billion would be plenty (I think there may be an error in your formula; you can't count the number 10 since it is double digits: 1 to 9 + a to z is 35). How much is that?

Well... There are 10 digits, you know. 0 to 9. I'm not counting number 10.
You didn't say you don't want zeroes. And if you don't want zeroes then there are 35 different characters. I think you should be able to calculate how many combinations that would be based on previous posts.
 
Steve Fahlbusch
Bartender
Posts: 612
If in fact you want both letters and numbers (assuming only one case of letter) and there is no pattern, like LLLNNN,

then you probably want to cull certain letters / numbers, e.g. the O / 0 (you may not want the 0), or the I / 1, or even the Z / 2.

And you have to realize that once you are talking about 3+ random letters, you need some way of dropping inappropriate words, and you may have to do this for multiple languages.

-steve
 
matthew kane
Greenhorn
Posts: 5
Good points! I had forgotten the 0, but as Steve mentioned, it may be a good idea, for clarity, to cull the 0 and the O. I think Z and the 2 are different enough provided a clear font is used. Perhaps I can use a font that has a line through the zero and just cull the letter O. Thanks for the insights!
 
Winston Gutkowski
Bartender
Posts: 10575
matthew kane wrote:I think Z and the 2 are different enough provided a clear font is used. Perhaps I can use a font that has a line through the zero and just cull the letter o.

I wouldn't rely on that. Different systems render fonts differently. This site, for example, looks quite different now I'm on Linux than when I was using Windows, even though I'm using the same browser.

One other tip: Remove letters, not digits. It might be worth noting that just removing 'O' and 'I' reduces the number of combinations to 1,544,804,416. If you remove 'Z' as well, that comes down to 1,291,467,969. Still plenty, I'd say. And if you remove one more (L?) then you'll have an "alphabet" of 32 symbols, which makes conversion extremely quick.
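
With exactly 32 symbols, each character carries 5 bits, so a 6-character code maps onto a 30-bit number using only shifts and masks. The sketch below assumes the alphabet is 0-9 plus A-Z without I, L, O and Z (the letters discussed in this thread); the class name and the fixed length of 6 are illustrative choices.

```java
/** Encodes a 30-bit value as 6 characters from a 32-symbol alphabet and back. */
final class Base32Id {
    // Assumed alphabet: digits plus letters, with I, L, O and Z removed (32 symbols).
    static final String ALPHABET = "0123456789ABCDEFGHJKMNPQRSTUVWXY";
    static final int LENGTH = 6;

    /** Encodes a value in [0, 2^30) as 6 characters, 5 bits per character. */
    static String encode(int value) {
        if (value < 0 || value >= (1 << 30)) {
            throw new IllegalArgumentException("out of range: " + value);
        }
        char[] out = new char[LENGTH];
        for (int i = LENGTH - 1; i >= 0; i--) {
            out[i] = ALPHABET.charAt(value & 31); // low 5 bits pick one symbol
            value >>>= 5;
        }
        return new String(out);
    }

    /** Decodes a 6-character code; lower-case input is accepted. */
    static int decode(String code) {
        if (code.length() != LENGTH) {
            throw new IllegalArgumentException("expected " + LENGTH + " characters");
        }
        int value = 0;
        for (int i = 0; i < code.length(); i++) {
            int digit = ALPHABET.indexOf(Character.toUpperCase(code.charAt(i)));
            if (digit < 0) {
                throw new IllegalArgumentException("invalid character: " + code.charAt(i));
            }
            value = (value << 5) | digit;
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(encode(32));       // 000010
        System.out.println(decode("00000Y")); // 31
    }
}
```

This gives 32^6 = 2^30 = 1,073,741,824 possible codes, somewhat fewer than the 33-symbol count quoted above.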

However, I'm still not quite sure exactly what this 'ID' is used for. As I said earlier: if it's only to uniquely key a user, and it's something they enter to identify themselves, then a sequence number provides almost no security at all. All I'd have to do is go on your site and enter the code for '1', and I'd probably get a hit. That's why I say that identification probably needs to involve something else as well.

HIH

Winston

PS: The reason I suggest 'L' is that then you don't have to worry about the case your letters are rendered in either (lowercase 'l' is often confused with '1').
 
Winston Gutkowski
Bartender
Posts: 10575
matthew kane wrote:Only the user will know which code is theirs, and the site will not have cookies or collect any personal identification data (completely anonymous participation).

Hmm, just noticed this, and it's precisely what I was worried about. A sequence provides almost no security at all (see the post above) and a randomly selected ID only provides minimal security.

Winston
 
Richard Tookey
Bartender
Posts: 1166
I'm sceptical about this whole approach. If, as the OP seems to say, the ID is an identity for use in identifying a record in a database, then one should use an auto-increment data type for the identity column. The value should never be revealed outside of the application, and a simple (user name, password) approach should be used to tie a user to the identity.

 
Winston Gutkowski
Bartender
Posts: 10575
Richard Tookey wrote:I'm sceptical about this whole approach. If, as the OP seems to say, the ID is an identity for use in identifying a record in a database, then one should use an auto-increment data type for the identity column. The value should never be revealed outside of the application, and a simple (user name, password) approach should be used to tie a user to the identity.

Yup. I'd go along with that.

Winston
 
matthew kane
Greenhorn
Posts: 5
Users will be answering a one-time survey and getting the ID so that they can check their results at a later time from a "read only" database. The database will be open-confidential: everyone can view all results, but only you will know which result is yours because of the supplied ID. Since it is read-only and open-source/confidential, I don't think that knowing the ID of the previous or next entry will matter. As long as nothing identifies users and everyone has access to all results, the secrecy of the ID is that only the user knows the ID is theirs.
 
Richard Tookey
Bartender
Posts: 1166
So let the users select their own ID and you just map that to their database entry.
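
One way to picture this suggestion: keep a mapping from the user-chosen ID to the internal row key, and refuse an ID that is already taken. The sketch below is only an assumption about how that might look; `ChosenIdRegistry`, its methods, and the in-memory map (standing in for a database table with a unique column) are hypothetical.

```java
import java.util.Locale;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Maps user-chosen IDs to internal row keys, rejecting IDs that are already in use. */
final class ChosenIdRegistry {
    // Hypothetical in-memory stand-in for a table keyed by the chosen ID.
    private final ConcurrentMap<String, Long> rowByChosenId = new ConcurrentHashMap<>();

    /** Returns true if the ID was free and is now mapped to rowKey, false if it was taken. */
    boolean register(String chosenId, long rowKey) {
        // putIfAbsent is atomic, so two users cannot both claim the same ID.
        return rowByChosenId.putIfAbsent(normalize(chosenId), rowKey) == null;
    }

    /** Returns the internal row key for the chosen ID, or null if the ID is unknown. */
    Long lookup(String chosenId) {
        return rowByChosenId.get(normalize(chosenId));
    }

    // Assumed policy: IDs are compared ignoring case and surrounding whitespace.
    private static String normalize(String id) {
        return id.trim().toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        ChosenIdRegistry registry = new ChosenIdRegistry();
        System.out.println(registry.register("bluekite7", 1L)); // true
        System.out.println(registry.register("BlueKite7", 2L)); // false
        System.out.println(registry.lookup("bluekite7"));       // 1
    }
}
```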
 
Winston Gutkowski
Bartender
Posts: 10575
matthew kane wrote:As long as nothing identifies users and everyone has access to all results the secrecy of the id is that only the user knows the id is theirs.

In which case I totally agree with Richard: Your ID is spurious.

In database terms, an ID is something that identifies an item. No more, no less. There is absolutely no need for anyone who uses the system to know what that is; it's simply an internal number. If you want users to identify themselves, get them to enter their name and password, and fetch items on that basis. And in the case of "John Smith" (if you have so many that it causes problems), add another question (post-/zipcode, birthdate?) that resolves duplications.

Winston
 
Paul Clapham
Sheriff
Posts: 22835
I have an identity on a lot of websites, and I'm sure you all do as well. There are a couple of them which have told me what my ID is, as opposed to letting me choose it or use an e-mail address. Both of those websites are managing some of my money, for what it's worth, and the number points to my account. And both of them are just 7-digit numbers. However by far the commonest identification method is where I choose my user ID.
 
Winston Gutkowski
Bartender
Posts: 10575
Paul Clapham wrote:Both of those websites are managing some of my money, for what it's worth, and the number points to my account. And both of them are just 7-digit numbers. However by far the commonest identification method is where I choose my user ID.

Mine is a 10-digit ID, and I still (after all this Alzheimers) have no trouble remembering it. However, it's only the first of a wall of things I have to remember (or enter) to get to my accounts online. Which (I think) is what Richard and I have been saying.

Winston
 