
regex question

 
jeroen dijkmeijer
Ranch Hand
Posts: 132
Hi, I want to parse an Excel CSV file that has some typical Excel stupidity in its notation: negative numbers are displayed as (nnnn) and, even worse, thousands separators are commas and the values are placed within double quotes. The latter makes line.split(",") unusable, alas.
So a line may look something like this:


The Euro I can deal with, and the parentheses I can deal with (I did some Lisp in the past), but I'm wondering whether there is some really nice, smart and fast way to remove the commas between the quotes.
I'm probably able to knock out something ugly by iterating over the string, scanning for " and removing the commas in between. But I'm hoping for the "one-line regex" shining like a star, readable like Dan Brown, that leaves all my colleagues speechless and gives me a wonderful performance review next year.
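For comparison, here is a sketch of both options, assuming the quotes on a line are balanced (no quoted field spans a line break). The class and method names (`QuotedCommas`, `stripWithRegex`, `stripWithScan`) are made up for illustration. The regex treats a comma as being inside quotes when an odd number of `"` characters follow it on the rest of the line; the scanning version flips a flag at every quote and drops commas while the flag is set.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical helper: removes commas that sit inside double-quoted fields,
 * so that a plain split(",") works afterwards.
 * Assumes every quoted field starts and ends on the same line.
 */
class QuotedCommas {

    // A comma is inside quotes when an odd number of '"' follow it
    // on the rest of the line: one quote, then any number of pairs.
    private static final Pattern COMMA_IN_QUOTES =
            Pattern.compile(",(?=[^\"]*\"(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

    /** The "one-line regex" version. */
    static String stripWithRegex(String line) {
        return COMMA_IN_QUOTES.matcher(line).replaceAll("");
    }

    /** The scanning version: toggle a flag on each quote, skip commas while inside. */
    static String stripWithScan(String line) {
        StringBuilder out = new StringBuilder(line.length());
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;
            }
            if (c == ',' && inQuotes) {
                continue; // thousands separator inside a quoted field
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String line = "ACME,\"1,234.56\",\"(2,500.00)\",7";
        System.out.println(stripWithRegex(line)); // ACME,"1234.56","(2500.00)",7
        System.out.println(stripWithScan(line));  // ACME,"1234.56","(2500.00)",7
    }
}
```

The regex does deliver the one-liner, but its lookahead rescans the remainder of the line for every comma, so on long lines the single-pass scan is the cheaper choice. Neither version helps when a quoted field contains a line break.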

As always looking forward to your responses,
and happy Christmas and New Year to all of you!

Jeroen.

PS: I'm on JDK 1.5.

[ December 22, 2006: Message edited by: jeroen dijkmeijer ]
 
Ulf Dittmer
Rancher
Posts: 42972
While there may be a regex solution, why reinvent the wheel? Parsing CSV files is not quite as simple as it initially looks: for example, text in double quotes can contain line breaks, so reading the file line by line does not work in general. Fortunately, a ready-made library that does the parsing for you is available: the Ostermiller Utils.
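To illustrate the line-break problem, the following is a minimal hand-rolled sketch of a quote-aware record reader. It is not the Ostermiller API; `SimpleCsvReader` and `readRecord` are hypothetical names. It reads character by character, treats commas and line breaks inside quotes as data, and turns a doubled quote (`""`) inside a quoted field into a single quote, which is how Excel escapes quotes.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal CSV record reader (not the Ostermiller API). */
class SimpleCsvReader {
    private static final int NONE = -2; // marker: no character buffered
    private final Reader in;
    private int buffered = NONE;

    SimpleCsvReader(Reader in) {
        this.in = in;
    }

    private int read() throws IOException {
        if (buffered != NONE) {
            int c = buffered;
            buffered = NONE;
            return c;
        }
        return in.read();
    }

    private int peek() throws IOException {
        if (buffered == NONE) {
            buffered = in.read();
        }
        return buffered;
    }

    /** Returns the fields of the next record, or null at end of input. */
    List<String> readRecord() throws IOException {
        int c = read();
        if (c == -1) {
            return null;
        }
        List<String> fields = new ArrayList<String>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        while (true) {
            if (inQuotes) {
                if (c == -1) {
                    break; // unterminated quote: keep what was read
                }
                if (c == '"') {
                    if (peek() == '"') {
                        read();            // consume the second quote of ""
                        field.append('"');
                    } else {
                        inQuotes = false;  // closing quote
                    }
                } else {
                    field.append((char) c); // commas and line breaks are data here
                }
            } else {
                if (c == -1 || c == '\n') {
                    break;
                }
                if (c == '\r') {
                    if (peek() == '\n') {
                        read(); // treat \r\n as a single terminator
                    }
                    break;
                }
                if (c == '"') {
                    inQuotes = true;
                } else if (c == ',') {
                    fields.add(field.toString());
                    field.setLength(0);
                } else {
                    field.append((char) c);
                }
            }
            c = read();
        }
        fields.add(field.toString());
        return fields;
    }

    public static void main(String[] args) throws IOException {
        String csv = "\"Smith, J.\",\"1,234\"\n\"two\nlines\",x\n";
        SimpleCsvReader reader = new SimpleCsvReader(new StringReader(csv));
        for (List<String> r = reader.readRecord(); r != null; r = reader.readRecord()) {
            System.out.println(r.size() + " fields: " + r);
        }
    }
}
```

Because it hands back one record at a time, memory use is bounded by the longest record rather than by the size of the file.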
 
jeroen dijkmeijer
Ranch Hand
Posts: 132
Thank you!

That looks like what I'm looking for. However, it returns a two-dimensional array, which is a bit inconvenient for an 18 MB CSV file.
I'll still go line by line and check for unmatched quotes by comparing column counts.
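A sketch of the line-by-line approach, assuming a record is unfinished whenever the lines read so far contain an odd number of double quotes. This counts quotes rather than comparing column counts, and `LogicalLineReader` is a hypothetical name.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

/**
 * Hypothetical helper: glues physical lines together until every quoted
 * field is closed, assuming an odd quote count means a field is still open.
 */
class LogicalLineReader {
    private final BufferedReader in;

    LogicalLineReader(BufferedReader in) {
        this.in = in;
    }

    /** Returns the next complete record as one string, or null at end of input. */
    String readRecord() throws IOException {
        String line = in.readLine();
        if (line == null) {
            return null;
        }
        StringBuilder record = new StringBuilder(line);
        int quotes = countQuotes(line);
        while (quotes % 2 != 0) {             // a quoted field is still open
            String next = in.readLine();
            if (next == null) {
                break;                        // unterminated quote at end of file
            }
            record.append('\n').append(next); // put the embedded line break back
            quotes += countQuotes(next);
        }
        return record.toString();
    }

    private static int countQuotes(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == '"') {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        String csv = "a,\"first\nsecond\",b\nc,d\n";
        LogicalLineReader reader =
                new LogicalLineReader(new BufferedReader(new StringReader(csv)));
        for (String r = reader.readRecord(); r != null; r = reader.readRecord()) {
            System.out.println("[" + r + "]");
        }
    }
}
```

`readLine()` drops the original terminator, so an embedded break always comes back as `\n`, even if the file used `\r\n`. Each complete record can then be handed to a CSV parser one at a time.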

regards,
Jeroen.
 
jeroen dijkmeijer
Ranch Hand
Posts: 132
Reading it line by line, using the ExcelCSVParser and a decimal formatter like:

produced some very efficient and readable code! Thanks for the response!
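The formatter from the post was not preserved. As an illustration only: `java.text.DecimalFormat` accepts a negative subpattern after `;`, so `#,##0.00;(#,##0.00)` reads Excel's parenthesised negatives directly. The assumptions here are US-style separators (comma grouping, dot decimal), a Euro sign and surrounding quotes that can simply be stripped, and the hypothetical helper name `ExcelNumbers.parse`.

```java
import java.math.BigDecimal;
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParseException;
import java.util.Locale;

/** Hypothetical helper for Excel-style amounts such as "(1,234.56)". */
class ExcelNumbers {
    // Positive subpattern ; negative subpattern. When parsing, only the
    // negative prefix "(" and suffix ")" of the second part matter.
    private static final String PATTERN = "#,##0.00;(#,##0.00)";

    static BigDecimal parse(String field) throws ParseException {
        // DecimalFormat is not thread-safe, so this sketch creates one per call.
        DecimalFormat format =
                new DecimalFormat(PATTERN, new DecimalFormatSymbols(Locale.US));
        format.setParseBigDecimal(true); // exact decimal result, no double rounding
        // Assumption: the Euro sign and surrounding quotes can simply be dropped.
        String cleaned = field.replace("\u20AC", "").replace("\"", "").trim();
        return (BigDecimal) format.parse(cleaned);
    }

    public static void main(String[] args) throws ParseException {
        System.out.println(parse("\"(1,234.56)\""));
        System.out.println(parse("\u20AC2,500.00"));
    }
}
```

`setParseBigDecimal(true)` (available since JDK 1.5) makes `parse` return a `BigDecimal` instead of a `Long` or `Double`, which avoids binary rounding on money amounts.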
 