
How to determine csv delimiter for a given locale?

 
Jon Swanson
Ranch Hand
Posts: 230
I've been asked to modify my code to produce csv files for Dutch (or US) Excel users. It is simple in principle, a line like:

0.5,1.5

would be written as

0,5;1,5

I've found how to determine the country and the decimal separator. When I am writing in R, there is an R function that also returns the csv delimiter. Is there an equivalent in Java? Here is what I have so far. My thought is that, if nothing else, I will assume "," when the decimal separator is "." and ";" when the decimal separator is ",". For whatever reason I was told I couldn't go with a tab-separated file (or a 3rd party module).
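
The fallback described above can be sketched roughly as follows. This is only an illustration of the heuristic, not the code originally posted; the class and method names (LocaleCsvGuess, guessDelimiter, formatRow) are made up for the example, and the "0.###" number pattern is an assumption.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

public class LocaleCsvGuess {

    // Heuristic only: assume ';' when the locale's decimal separator is ',',
    // and ',' otherwise. Java has no API that reports a CSV delimiter.
    static char guessDelimiter(Locale locale) {
        char decimal = DecimalFormatSymbols.getInstance(locale).getDecimalSeparator();
        return decimal == ',' ? ';' : ',';
    }

    // Formats each value with the locale's decimal separator and joins the
    // results with the guessed delimiter.
    static String formatRow(Locale locale, double... values) {
        DecimalFormat format =
                new DecimalFormat("0.###", DecimalFormatSymbols.getInstance(locale));
        char delimiter = guessDelimiter(locale);
        StringBuilder row = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                row.append(delimiter);
            }
            row.append(format.format(values[i]));
        }
        return row.toString();
    }

    public static void main(String[] args) {
        System.out.println(formatRow(Locale.US, 0.5, 1.5));                       // 0.5,1.5
        System.out.println(formatRow(Locale.forLanguageTag("nl-NL"), 0.5, 1.5)); // 0,5;1,5
    }
}
```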

 
Tony Docherty
Bartender
Posts: 3271
82
Jon Swanson wrote:When I am writing in R, there is an R function to also return the csv delimiter.

CSV stands for Comma Separated Values, and therefore the delimiter is a comma. Admittedly you can use other characters as the delimiter, and many people do, but AFAIK there are no 'standard' delimiters defined for different locales. Whichever character you choose, you are likely to run into the same problem, i.e. the delimiter character can also be part of a value, so you have to escape any values that contain it.

I suppose it makes sense to choose a delimiter that isn't likely to be embedded in many of the values (especially if you want the CSV file to be more easily human readable), but using a ; instead of a , is probably only useful if the Dutch values are mainly numeric.

BTW there are freely available libraries that simplify handling CSV files; you may want to take a look at some of them.
 
Jon Swanson
Ranch Hand
Posts: 230
Thanks for the discussion. You would think CSV would mean one particular format.

Here is one thing I can do in Windows 7:

Control Panel -> Clock, Language and Region -> Change Location -> Formats -> Additional Settings

The dialog that comes up includes:

Decimal symbol
Digit grouping symbol
List separator

among others. The first two correspond to getDecimalSeparator() and getGroupingSeparator(). I've not come across a Java method that returns what Windows calls the List separator, but it is what Excel uses as the delimiter in a CSV file. I'm trying to be compatible with Excel, so I'm happy taking whatever Windows says, if I could query that value.
 
Tony Docherty
Bartender
Posts: 3271
82
Jon Swanson wrote:I'm trying to be compatible with Excel, so I'm happy taking whatever Windows says, if I could query that value.

The problem is that other OSes don't necessarily have a notion of this type of list separator, so you can't get it directly through Java: it is an OS-dependent setting.
I found an SO thread here that gives various ideas that may help you.
 
Bear Bibeault
Author and ninkuma
Marshal
Posts: 66307
152
Rather than making something up, perhaps you should consult the specification. (Which says to enclose values that contain commas or other problem characters in double quotes.)

There is no locale-specific delimiter; it's always a comma.
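
For reference, the quoting rule mentioned above could look something like this in Java. It is a minimal sketch that assumes the specification meant is RFC 4180 (fields containing the delimiter, a double quote, or a line break are wrapped in double quotes, and embedded double quotes are doubled). The class and method names are hypothetical, and the delimiter is a parameter only so the same rule can be tried with ';'.

```java
public class CsvQuote {

    // Wraps the field in double quotes if it contains the delimiter, a double
    // quote, CR or LF; embedded double quotes are doubled. Other fields are
    // returned unchanged.
    static String quote(String field, char delimiter) {
        boolean needsQuotes = field.indexOf(delimiter) >= 0
                || field.indexOf('"') >= 0
                || field.indexOf('\n') >= 0
                || field.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Quotes each field as needed and joins them with the delimiter.
    static String joinRow(char delimiter, String... fields) {
        StringBuilder row = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                row.append(delimiter);
            }
            row.append(quote(fields[i], delimiter));
        }
        return row.toString();
    }

    public static void main(String[] args) {
        // With ',' as delimiter the Dutch-style number has to be quoted.
        System.out.println(joinRow(',', "0,5", "say \"hi\"", "plain"));
        // With ';' as delimiter it does not.
        System.out.println(joinRow(';', "0,5", "1,5"));
    }
}
```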
 
Jon Swanson
Ranch Hand
Posts: 230
I know that the registry entry is:

"HKEY_CURRENT_USER\\Control Panel\\International\\sList"

from looking around on StackOverflow. So if I put that together with the link that you sent, that provides an answer: read that key from the registry.

This is what I added to my original code; I still need to do a little more to extract the key value.



It should be easy to finish it up from here.
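
One way to read that value, shown purely as a sketch and not as the code originally posted: run the Windows reg command and pick the value out of its output. The class and method names are hypothetical. The sketch assumes reg.exe is on the PATH and prints the value on a line of the form "sList    REG_SZ    ;", and it will not recover a separator that is itself whitespace.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class ListSeparatorQuery {

    // Extracts the value from one line of `reg query` output such as
    // "    sList    REG_SZ    ;". Returns null if the line does not match.
    static String parseRegLine(String line) {
        String[] parts = line.trim().split("\\s+", 3);
        if (parts.length == 3 && parts[0].equals("sList") && parts[1].equals("REG_SZ")) {
            return parts[2];
        }
        return null;
    }

    // Runs `reg query` for the sList value. Returns null if no value is found.
    // Only meaningful on Windows; elsewhere starting the process fails.
    static String queryListSeparator() throws IOException, InterruptedException {
        Process process = new ProcessBuilder(
                "reg", "query", "HKCU\\Control Panel\\International", "/v", "sList")
                .redirectErrorStream(true)
                .start();
        String result = null;
        try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(process.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String value = parseRegLine(line);
                if (value != null) {
                    result = value;
                }
            }
        }
        process.waitFor();
        return result;
    }

    public static void main(String[] args) throws Exception {
        String os = System.getProperty("os.name", "");
        if (os.toLowerCase().contains("windows")) {
            System.out.println("List separator: " + queryListSeparator());
        } else {
            System.out.println("Not Windows; no list separator to query.");
        }
    }
}
```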

Thanks.
 
Tony Docherty
Bartender
Posts: 3271
82
Jon Swanson wrote:So if I put that together with the link that you sent, that provides an answer- read that key from the registry.

Be aware that you are now restricting your software to the Windows platform, and possibly to a particular version of Windows. I've no idea whether that registry key is the same one used in XP or in Win 8, and no one knows for sure whether it will be used in future versions of Windows.
 
Bear Bibeault
Author and ninkuma
Marshal
Posts: 66307
152
So I guess following the standard is out?
 
Jon Swanson
Ranch Hand
Posts: 230
I am a slave to the requirements. My choice would have been to use tab-delimited (TSV) files. However, when you enter numeric data into Excel, one option for saving the data is "CSV (Comma delimited)." The file that is saved is not necessarily delimited by commas. The delimiter is the list separator from the Windows regional settings. The requirement "read/write CSV files" that I have been given is defined not by the specification, but by whether a "CSV" file written by Excel can be read by the program and a "CSV" file written by the program can be read by Excel. I was following the standard and it was flagged as a bug.

And, yes, since the program needs to work on XP, W7 and W8, I need to make sure that that key is in all three or find the equivalent. I think I will also include an option to override the list separator detection, since as you all point out the whole thing is a bit dicey. But the users will be happy.
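
The override idea could be wired up along these lines. This is a sketch under assumptions: the property name csv.delimiter, the class name, and the order of precedence (explicit override, then whatever was detected from the OS, then the decimal-separator guess) are all made up for illustration.

```java
import java.text.DecimalFormatSymbols;
import java.util.Locale;

public class DelimiterChoice {

    // Picks the delimiter in order of preference:
    // 1. a single-character user override,
    // 2. a single-character value detected from the OS (may be null),
    // 3. a guess from the locale's decimal separator.
    static char choose(String override, String detected, Locale locale) {
        if (override != null && override.length() == 1) {
            return override.charAt(0);
        }
        if (detected != null && detected.length() == 1) {
            return detected.charAt(0);
        }
        char decimal = DecimalFormatSymbols.getInstance(locale).getDecimalSeparator();
        return decimal == ',' ? ';' : ',';
    }

    public static void main(String[] args) {
        // Hypothetical property, e.g. java -Dcsv.delimiter=";" DelimiterChoice
        String override = System.getProperty("csv.delimiter");
        // No OS detection in this sketch, so null is passed for the detected value.
        char delimiter = choose(override, null, Locale.getDefault());
        System.out.println("Using delimiter: " + delimiter);
    }
}
```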

 
Bear Bibeault
Author and ninkuma
Marshal
Posts: 66307
152
Jon Swanson wrote:The file that is saved is not necessarily delimited by commas. The delimiter is the list separator from the Windows regional settings.

Ah, yes. Once again, Microsoft declaring darkness as the standard.
 