
Duplicating default behavior of unix sort

 
Charles Rennie
Greenhorn
Posts: 1
Unix sort takes its ordering from the environment's locale, which on my machine is en_US.UTF-8 by default.

I'm trying to find a Locale and Collator that will duplicate the way that unix sort works by default.

Does anyone have any ideas?

Thanks much.

sl73caeapp03:~ $ cat f
a
A
b
B
sl73caeapp03:~ $ sort f # how to duplicate this behavior?
a
A
b
B
sl73caeapp03:~ $ LC_ALL=c sort f # not this behavior
A
B
a
b


-dreamer
 
Ivan Jozsef Balazs
Rancher
Posts: 999
This appears to be simply ignoring the case:
a
A
b
B
 
Winston Gutkowski
Bartender
Posts: 10575
Ivan Jozsef Balazs wrote: This appears to be simply ignoring the case:

Not quite, because you could get upper and lower case mixed.

@Charles: I'm not too familiar with setting up Collators or Locales, but if they use Comparators, you could easily write a method that compares two characters ignoring case and, when they differ only in case, puts the lowercase one first. You could then set up a String Comparator that uses it.
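A minimal sketch of that idea, not a definitive implementation. The class and method names (LowerFirstSort, compareCase, compareLowerFirst, LOWER_FIRST) are made up for illustration. It assumes plain char comparison, compares the whole strings ignoring case first, and only looks at case to break ties. If two characters differ but neither is uppercase, it falls back to raw char order, which is an arbitrary choice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper class; none of these names come from the thread.
class LowerFirstSort {

    // Orders two chars that are already equal ignoring case:
    // non-uppercase before uppercase, then raw char value as an arbitrary fallback.
    static int compareCase(char a, char b) {
        int byCase = Boolean.compare(Character.isUpperCase(a), Character.isUpperCase(b));
        return byCase != 0 ? byCase : Character.compare(a, b);
    }

    // First pass: compare ignoring case. Second pass: use case only to break ties.
    static int compareLowerFirst(String s, String t) {
        int byLetter = String.CASE_INSENSITIVE_ORDER.compare(s, t);
        if (byLetter != 0) {
            return byLetter;
        }
        int n = Math.min(s.length(), t.length());
        for (int i = 0; i < n; i++) {
            int byCase = compareCase(s.charAt(i), t.charAt(i));
            if (byCase != 0) {
                return byCase;
            }
        }
        return Integer.compare(s.length(), t.length());
    }

    // The String Comparator built on the methods above.
    static final Comparator<String> LOWER_FIRST = LowerFirstSort::compareLowerFirst;

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>(Arrays.asList("B", "A", "b", "a"));
        lines.sort(LOWER_FIRST);
        System.out.println(lines); // [a, A, b, B]
    }
}
```

Comparing the whole string ignoring case before looking at case means "Aa" still sorts before "ab"; a purely character-by-character version would put "ab" first.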
However, as you were thinking, it is definitely possible that there's a Locale around that already does this; I just don't know what it is.

It should be added that the above won't work for supplementary characters, but you could make one that does.

HIH

Winston

[Edit] It also occurs to me that the above might produce odd results if neither character is uppercase, but their cases are different (eg, one is a TITLECASE_LETTER). You'll have to decide how you handle that yourself.
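
On the Locale/Collator route: a hedged sketch worth trying is java.text.Collator for Locale.US at tertiary strength. As far as I know, the JDK's default rules place lowercase before uppercase when the letters are otherwise equal, which would match the sort output in the question, but check it on your own JDK. It is not guaranteed to agree with the system's en_US.UTF-8 collation for punctuation or accented input. The class and method names here (CollatorSortDemo, sortLikeEnUs) are made up for illustration.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical demo class; the names are not from the thread.
class CollatorSortDemo {

    // Returns a sorted copy of the input using the JDK's collation rules for Locale.US.
    static List<String> sortLikeEnUs(List<String> input) {
        Collator collator = Collator.getInstance(Locale.US);
        // TERTIARY strength lets case differences break ties between equal letters.
        collator.setStrength(Collator.TERTIARY);
        List<String> sorted = new ArrayList<>(input);
        sorted.sort(collator); // Collator implements Comparator<Object>
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(sortLikeEnUs(Arrays.asList("B", "a", "A", "b")));
    }
}
```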
 