# Ordering Alphanumeric Values in Ascending Order

anandmindmill rai
Greenhorn
Posts: 3
I have data with a mixed number of digits that I would like to sort. Somehow, the
normal sorting process gives me an undesired result. Can anyone help?
It's very important to me.
A10
B100
A2
AA1
32
11
A1
The result I am looking for is:
A1
A2
A10
B2
B100
11
32

Thanks and Regards:
Anand

Grant Gainey
Ranch Hand
Posts: 65
The problem is that you're trying to sort both lexicographically and numerically at the same time.

Sorting by string, "2" comes AFTER "100", just like "b" comes after "aaa". If you interpret the strings as numbers, then it's clear that 2 comes before 100.
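To make the difference concrete, here is a small sketch that sorts the same digit strings both ways. The class and method names are made up for illustration, and the numeric version assumes every string parses as an int.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical demo class: the names are illustrative only.
class StringVsNumberDemo {

    // Sorts a copy of the input using plain String ordering.
    static List<String> sortedAsStrings(List<String> in) {
        List<String> out = new ArrayList<>(in);
        out.sort(Comparator.naturalOrder());
        return out;
    }

    // Sorts a copy of the input by numeric value.
    // Assumption: every element is a valid int literal.
    static List<String> sortedAsNumbers(List<String> in) {
        List<String> out = new ArrayList<>(in);
        out.sort(Comparator.comparingInt(Integer::parseInt));
        return out;
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("100", "2", "32", "11");
        System.out.println(sortedAsStrings(data)); // [100, 11, 2, 32]
        System.out.println(sortedAsNumbers(data)); // [2, 11, 32, 100]
    }
}
```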

Your requirement needs it both ways. Sounds like you need a Comparator that can split each string into an alpha prefix (which appears to be optional) and a numeric remainder, do a lexicographic compare() on the alpha portion, and convert the numeric portion to an Integer and compare it numerically.
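A rough sketch of such a Comparator might look like the following. The class name is hypothetical, and it makes two assumptions that the original question does not confirm: everything after the prefix is digits that fit in an int, and strings with no prefix sort after all prefixed ones (which matches the desired output in the question).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical class: a sketch of the "alpha prefix + numeric remainder" idea.
class PrefixNumberComparator implements Comparator<String> {

    // Index of the first digit, or s.length() if there is none.
    private static int prefixLength(String s) {
        int i = 0;
        while (i < s.length() && !Character.isDigit(s.charAt(i))) {
            i++;
        }
        return i;
    }

    // Numeric value of the remainder; an empty remainder counts as 0.
    // Assumption: the remainder is all digits and fits in an int.
    private static int number(String s, int from) {
        String digits = s.substring(from);
        return digits.isEmpty() ? 0 : Integer.parseInt(digits);
    }

    @Override
    public int compare(String a, String b) {
        int pa = prefixLength(a);
        int pb = prefixLength(b);
        String prefixA = a.substring(0, pa);
        String prefixB = b.substring(0, pb);

        // Assumption: strings with no prefix sort after all prefixed ones.
        if (prefixA.isEmpty() != prefixB.isEmpty()) {
            return prefixA.isEmpty() ? 1 : -1;
        }
        // Lexicographic comparison of the alpha portion.
        int byPrefix = prefixA.compareTo(prefixB);
        if (byPrefix != 0) {
            return byPrefix;
        }
        // Same prefix: compare the numeric portion by value.
        return Integer.compare(number(a, pa), number(b, pb));
    }

    public static void main(String[] args) {
        List<String> data = new ArrayList<>(
                Arrays.asList("A10", "B100", "A2", "AA1", "32", "11", "A1"));
        data.sort(new PrefixNumberComparator());
        System.out.println(data); // [A1, A2, A10, AA1, B100, 11, 32]
    }
}
```

Input such as "A1B" would throw a NumberFormatException here, so real data may need a stricter split.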

Does that help?

Grant

Alan Moore
Ranch Hand
Posts: 262
This problem turns out to be much more complicated than you might expect. For example, how should whitespace and punctuation characters be sorted? Do leading zeroes affect the sort order, and if so, how? Should decimal numbers be recognized? Will there be any accented letters, or other characters outside the 7-bit ASCII set? If so, you'll have to use a Collator for the non-numeric parts--but Collators do very strange things with punctuation characters.

Below is a simple comparator I wrote for my own use. It sorts numbers before letters, not after, but you can change that easily enough.

Grant, converting digit sequences to numbers is the obvious approach, but it has a major flaw: what if the number is too large to be represented as an int or long? The character-by-character approach also seems cleaner and more efficient to me.
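For readers who want to see the character-by-character idea in code, here is a minimal sketch. It is not the comparator referred to above, only an illustration under simple assumptions: ASCII digits only, case-sensitive, no Collator, and leading zeroes ignored. The class name is hypothetical. Digit runs are compared by length and then digit by digit, so nothing is ever converted to an int or long.

```java
import java.util.Comparator;

// Hypothetical sketch, NOT the comparator from the original post.
class DigitRunComparator implements Comparator<String> {

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Index just past the digit run that starts at 'start'.
    private static int runEnd(String s, int start) {
        int end = start;
        while (end < s.length() && isDigit(s.charAt(end))) {
            end++;
        }
        return end;
    }

    // Compares two digit runs by numeric value without parsing them.
    private static int compareRuns(String a, int i, int endA,
                                   String b, int j, int endB) {
        // Skip leading zeros (keeping one digit) so "007" equals "7".
        while (i < endA - 1 && a.charAt(i) == '0') i++;
        while (j < endB - 1 && b.charAt(j) == '0') j++;
        // More significant digits means a larger number.
        int lenA = endA - i;
        int lenB = endB - j;
        if (lenA != lenB) {
            return Integer.compare(lenA, lenB);
        }
        // Same length: the first differing digit decides.
        for (; i < endA; i++, j++) {
            char ca = a.charAt(i);
            char cb = b.charAt(j);
            if (ca != cb) {
                return Character.compare(ca, cb);
            }
        }
        return 0;
    }

    @Override
    public int compare(String a, String b) {
        int i = 0;
        int j = 0;
        while (i < a.length() && j < b.length()) {
            char ca = a.charAt(i);
            char cb = b.charAt(j);
            if (isDigit(ca) && isDigit(cb)) {
                int endA = runEnd(a, i);
                int endB = runEnd(b, j);
                int cmp = compareRuns(a, i, endA, b, j, endB);
                if (cmp != 0) {
                    return cmp;
                }
                i = endA;
                j = endB;
            } else {
                // Plain char comparison; ASCII digits sort before letters.
                if (ca != cb) {
                    return Character.compare(ca, cb);
                }
                i++;
                j++;
            }
        }
        // Whichever string ran out first sorts first. Strings that differ
        // only in leading zeros compare as equal in this sketch.
        return Integer.compare(a.length() - i, b.length() - j);
    }
}
```

Sorting the data from the first post with this sketch gives 11, 32, A1, A2, A10, AA1, B100, that is, numbers before letters.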

Grant Gainey
Ranch Hand
Posts: 65
Originally posted by Alan Moore:
Grant, converting digit sequences to numbers is the obvious approach, but it has a major flaw: what if the number is too large to be represented as an int or long? The character-by-character approach also seems cleaner and more efficient to me.

Well, your code is certainly thorough! I think it's overkill for the OP's requirements (which, admittedly, are possibly not complete). If your digit-strings are more than 20 digits, you'll need more than a long - although if I really wanted to worry about that, I'd probably just convert to using BigInteger for it, rather than rolling my own.
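As a sketch of the BigInteger route, assuming the digit portion has already been split off and contains only decimal digits (the class name is hypothetical):

```java
import java.math.BigInteger;
import java.util.Comparator;

// Hypothetical class: compares digit strings of any length by value.
class BigDigitsComparator implements Comparator<String> {
    @Override
    public int compare(String a, String b) {
        // Assumption: both arguments are non-empty and contain only decimal
        // digits; anything else makes the BigInteger constructor throw
        // NumberFormatException.
        return new BigInteger(a).compareTo(new BigInteger(b));
    }

    public static void main(String[] args) {
        BigDigitsComparator cmp = new BigDigitsComparator();
        // Both values are too large for a long, yet still compare correctly.
        System.out.println(
                cmp.compare("99999999999999999999", "100000000000000000000") < 0); // true
    }
}
```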

But I think the real issue underlying this discussion is one of flawed design. Anytime I see code that wants to do more than one thing at a time, it suggests to me that I haven't stated the problem correctly.

Just my US\$0.02,

Grant

[Edited because there is a difference between "you'll need a long" and "you'll need more than a long", and I are a idiot.]
[ April 14, 2006: Message edited by: Grant Gainey ]

Alan Moore
Ranch Hand
Posts: 262
I did say I wrote it for my own use. And I'm sure the OP's (stated) requirements are incomplete; he didn't even say whether letters should be sorted case-sensitively or not. But this class should at least give him a good start on solving his problem.

I don't see this as trying to do more than one thing at a time; it's just another way of sorting strings. Unless you're saying that, instead of using strings, the OP should create a custom data object with a string field and a numeric field. If that's an option, I agree it would be a better way to go.
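If a custom data object is an option, one possible shape is sketched below. The type and field names are hypothetical, and it assumes the numeric part fits in a long.

```java
import java.util.Comparator;

// Hypothetical value type: a string field plus a numeric field.
final class Label implements Comparable<Label> {
    private final String prefix;
    private final long number;

    // Orders by prefix first, then by number.
    private static final Comparator<Label> ORDER =
            Comparator.comparing((Label l) -> l.prefix)
                      .thenComparingLong(l -> l.number);

    Label(String prefix, long number) {
        this.prefix = prefix;
        this.number = number;
    }

    @Override
    public int compareTo(Label other) {
        return ORDER.compare(this, other);
    }

    @Override
    public String toString() {
        return prefix + number;
    }
}
```

With the two parts stored separately, new Label("A", 2) sorts before new Label("A", 10) with no string parsing at comparison time.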

Grant Gainey
Ranch Hand
Posts: 65
Originally posted by Alan Moore:
I did say I wrote it for my own use. And I'm sure the OP's (stated) requirements are incomplete; he didn't even say whether letters should be sorted case-sensitively or not. But this class should at least give him a good start on solving his problem.

Fair enough.

Originally posted by Alan Moore:
I don't see this as trying to do more than one thing at a time; it's just another way of sorting strings. Unless you're saying that, instead of using strings, the OP should create a custom data object with a string field and a numeric field. If that's an option, I agree it would be a better way to go.

Again, fair enough. I think I was reacting to too many prior episodes of "All I Want It To Do Is This Really Simple Thing" - and then, as one asks questions just like yours above (whitespace? punctuation? case-sensitive? more than one 'numeric' inside the string? scientific notation? sign characters? floating point? hexadecimal??), one finds that the real requirement is for the person asking the question to actually think through the implications of their design.

And I do think your Comparator above is a very nifty thing. May need to add it to my toolbox, if you don't mind.

Grant
[ April 14, 2006: Message edited by: Grant Gainey ]

Alan Moore
Ranch Hand
Posts: 262
Originally posted by Grant Gainey:

And I do think your Comparator above is a very nifty thing. May need to add it to my toolbox, if you don't mind.

Thanks, feel free. I've started tweaking it again, mainly to add support for accented characters. In the process I discovered that, if two file names are equal except for the number of leading zeroes, Windows Explorer treats the one with more zeroes as smaller, not larger as I had assumed. And that a difference in leading zeroes is less significant than a difference in accents, no matter where in the names they occur. Every time I look at this problem it becomes more complicated...
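The leading-zero rule described above can be sketched for a single digit run as follows. This is only an illustration with a hypothetical class name; it assumes both inputs are non-empty runs of decimal digits and does not attempt the whole-name ordering (accents before zeroes) mentioned in the post.

```java
import java.math.BigInteger;
import java.util.Comparator;

// Hypothetical helper: orders digit strings by value, then by leading zeros.
class LeadingZeroTieBreak implements Comparator<String> {
    @Override
    public int compare(String a, String b) {
        // Assumption: both arguments are non-empty runs of decimal digits.
        int byValue = new BigInteger(a).compareTo(new BigInteger(b));
        if (byValue != 0) {
            return byValue;
        }
        // Equal values: the longer run has more leading zeros and sorts first.
        return Integer.compare(b.length(), a.length());
    }
}
```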

Angelo Savio
Greenhorn
Posts: 28
How do you change the fact that it sorts letters before numbers?

Campbell Ritchie
Sheriff
Posts: 50277
Have you tried altering the order of the if-else blocks? That might help.

Paul Clapham
Sheriff
Posts: 21416
Are those last two posts addressed to the people who were last posting in this thread three years ago?

Campbell Ritchie
Sheriff
Posts: 50277
The last of those last-which-aren't-last-any-more posts was directed to the post 3 hours previously. As to the post before that: don't know. I hadn't noticed its age, sorry.