Regular Expressions on a .csv file

 
Elisha Cassidy
Greenhorn
Posts: 9
Hi,

I was wondering if someone could help me. I have a .csv file with a few useless lines of text at the beginning that I would like to ignore, so that I read in only the lines that begin with numbers. The file is properly formatted, with each value in its own column. I tried to use regular expressions to extract only the lines where the first field is a number, but I can't get the pattern matching to work. Can you please help, as I am really stuck? My data looks like this:

123,Fri Aug 11 11:21:25 2006,2,C:\Documents and Settings\Test\continues till end of file path,18

where the commas separate the columns. The C:\Documents and Settings part changes, i.e. it could be C:\Test, but it always begins with C:\
Is there a way to look at just the first field of every line and, if it begins with a number, take that whole line? For example, if the first field is 123, then return 123,Fri...,2,C:\...,18 as it is above. I only want the lines where the first field contains numbers.
Thanks in advance for your help,

kedklok

Here is what i tried:

 
Henry Wong
author
Marshal
Posts: 21504
Originally posted by Elisha Cassidy:
Is there a way to look at just the first field of every line and, if it begins with a number, take that whole line? For example, if the first field is 123, then return 123,Fri...,2,C:\...,18 as it is above. I only want the lines where the first field contains numbers.


Well, in your code, you are already reading the file line by line. You just need to check if the line starts with a number and if true do something with it. There is also no need to return the line, as you already have each line.

Basically, you just need the regex to tell you whether the line that you already have starts with a number.

Anyway, try this...
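
A minimal sketch of such a check follows; it is not necessarily the code from the original post. The class name `LineCheck`, the method name `processLine`, the `kept` list, and the pattern `\d+,.*` (one or more digits, a comma, then the rest of the line) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: all names here are illustrative assumptions.
class LineCheck {

    // Called once per line that the reading loop has already obtained.
    // If the first field is all digits, "do something" with the line;
    // in this sketch that means adding it to the kept list.
    static void processLine(String line, List<String> kept) {
        // One or more digits, then a comma, then anything to the end of the line.
        Pattern p = Pattern.compile("\\d+,.*");
        Matcher m = p.matcher(line);
        // matches() succeeds only if the entire line fits the pattern.
        if (m.matches()) {
            kept.add(line);
        }
    }

    public static void main(String[] args) {
        List<String> kept = new ArrayList<>();
        processLine("Some useless header text", kept);
        processLine("123,Fri Aug 11 11:21:25 2006,2,C:\\Test,18", kept);
        System.out.println(kept);
    }
}
```

Because the digits must run all the way up to the first comma, a line such as `12a,foo` is rejected even though it starts with a digit.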



Henry
 
Alan Moore
Ranch Hand
Posts: 262
 
Henry Wong
author
Marshal
Posts: 21504
Alan,

The request was to check if the first field was a valid number -- not if the first character was a digit. The first field contains all the characters up to the comma separator.

Henry
 
Garrett Rowe
Ranch Hand
Posts: 1296
Henry, out of curiosity, was there a specific reason you explicitly created a Pattern/Matcher instance instead of using the String.matches() convenience method? Is there a performance gain that can be realized by doing it that way?
 
Henry Wong
author
Marshal
Posts: 21504
Originally posted by Garrett Rowe:
Henry, out of curiosity, was there a specific reason you explicitly created a Pattern/Matcher instance instead of using the String.matches() convenience method? Is there a performance gain that can be realized by doing it that way?


There is a performance gain -- but not in the way I did it. The reason I did that was just out of habit...

Since the pattern does not change, there is no reason to compile it repeatedly. The "p" variable can be instantiated once, instead of every time the method is called.
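
As a rough sketch of that idea, the pattern can live in a static final field so that it is compiled once rather than on every call. The class, field, and method names below are hypothetical, and the pattern `\d+,.*` is an assumed stand-in for whatever regex is actually in use.

```java
import java.util.regex.Pattern;

// Hypothetical class; the field and method names are illustrative.
class FirstFieldFilter {

    // Compiled once when the class is loaded and shared by every call.
    // Pattern instances are immutable, so sharing one is safe.
    private static final Pattern NUMERIC_FIRST_FIELD = Pattern.compile("\\d+,.*");

    static boolean hasNumericFirstField(String line) {
        // Only a lightweight Matcher is created per call; nothing is recompiled.
        return NUMERIC_FIRST_FIELD.matcher(line).matches();
    }

    public static void main(String[] args) {
        System.out.println(hasNumericFirstField("123,Fri Aug 11 11:21:25 2006,2,C:\\Test,18"));
        System.out.println(hasNumericFirstField("Exported by some tool"));
    }
}
```

By contrast, `line.matches("\\d+,.*")` compiles the regex again each time it is called, which is the overhead a shared Pattern avoids.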

Henry
 
Alan Moore
Ranch Hand
Posts: 262
Originally posted by Henry Wong:
The request was to check if the first field was a valid number -- not if the first character was a digit. The first field contains all the characters up to the comma separator.


Yes, but it's probably safe to assume that, if a line starts with a digit, the entire first field is numeric (but of course, only the OP will know for sure). People who are just starting to use regexes have a tendency to make their regexes more specific than they need to be (and thus more complicated and error-prone), or to use regexes where something simple like Character.isDigit() will suffice.
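
For comparison, a regex-free check along those lines might look like the sketch below. The class and method names are made up for illustration, and the check relies on the assumption stated above: a line that starts with a digit has a fully numeric first field.

```java
// Hypothetical helper illustrating the regex-free alternative.
class DigitCheck {

    // True when the line is non-empty and its first character is a digit.
    // Note: Character.isDigit() also accepts non-ASCII Unicode digits.
    static boolean startsWithDigit(String line) {
        return !line.isEmpty() && Character.isDigit(line.charAt(0));
    }

    public static void main(String[] args) {
        System.out.println(startsWithDigit("123,Fri Aug 11 11:21:25 2006,2,C:\\Test,18"));
        System.out.println(startsWithDigit("Some useless header text"));
    }
}
```

This is looser than checking the whole first field: a line such as `1abc,x` passes, so it only fits data where that cannot happen.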

Originally posted by Garrett Rowe:
Henry, out of curiosity, was there a specific reason you explicitly created a Pattern/Matcher instance instead of using the String.matches() convenience method? Is there a performance gain that can be realized by doing it that way?


The way Henry used it, there's no benefit, but pre-compiling the regex can save a lot of overhead if you're using the regex in a tight loop. And if performance is really critical, you can save a little more overhead by pre-instantiating the Matcher. Another benefit is that you can use Matcher's other matching methods: find(), find(int), and lookingAt(). I used lookingAt() because it requires the match to start at the beginning of the target text but doesn't require it to match all the way to the end. That makes it slightly more efficient than matches(), but again, you won't notice that unless you're doing some heavy-duty text processing.
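
A hedged sketch of the pre-instantiated Matcher with lookingAt() is shown below. The class and method names are hypothetical, and the single-digit pattern `\d` is an assumption based on the "starts with a digit" check discussed above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class showing one reusable Matcher and lookingAt().
class LookingAtFilter {

    static List<String> linesStartingWithDigit(List<String> lines) {
        // Compile once and create a single Matcher, initially on empty input.
        Matcher m = Pattern.compile("\\d").matcher("");
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            // reset(line) points the same Matcher at the next line.
            // lookingAt() must match at the start of the line but does not
            // have to consume the rest of it.
            if (m.reset(line).lookingAt()) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Some useless header text",
                "123,Fri Aug 11 11:21:25 2006,2,C:\\Test,18");
        System.out.println(linesStartingWithDigit(lines));
    }
}
```

A Matcher holds match state, so unlike a Pattern it should not be shared between threads; keeping it local to the loop, as here, avoids that problem.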
 
Elisha Cassidy
Greenhorn
Posts: 9
Hi,

Thanks very much for the help. I have tried Henry's code above and it works grand for me, except that it keeps asking me for a return statement. I have modified the code to get it to work with my program, but now it returns a blank line. I want it to return only the lines where the first field is a number, as I then need to put the results into an SWT table.

Thanks in advance for your help

Elisha

 
Anand Hariharan
Rancher
Posts: 272
Originally posted by Elisha Cassidy:

(...)
grand for me, except that it keeps asking me for a return statement. I have modified the code to get it to work with my program, but now it returns a blank line. I want it to return only the lines where the first field is a number, as I then need to put the results into an SWT table.

(...)


How about getting it to return a true/false instead?
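
One way to read that suggestion, sketched under assumptions: the check returns a boolean, and the reading loop decides what to keep, so no blank line is ever returned for a rejected row. The names `CsvDataLines`, `isDataLine`, and `readDataLines` are hypothetical, and a StringReader stands in for the real file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical class; names are illustrative only.
class CsvDataLines {

    private static final Pattern NUMERIC_FIRST_FIELD = Pattern.compile("\\d+,.*");

    // Answers only yes or no; it never has to invent a value for rejected lines.
    static boolean isDataLine(String line) {
        return NUMERIC_FIRST_FIELD.matcher(line).matches();
    }

    // Reads every line and keeps only those whose first field is numeric.
    static List<String> readDataLines(Reader source) {
        List<String> dataLines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (isDataLine(line)) {
                    dataLines.add(line);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return dataLines;
    }

    public static void main(String[] args) {
        String text = "Report generated by tool\n\n"
                + "123,Fri Aug 11 11:21:25 2006,2,C:\\Test,18\n";
        System.out.println(readDataLines(new StringReader(text)));
    }
}
```

The returned list contains only data rows, which can then be handed to whatever fills the table.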



HTH,
- Anand
 
Elisha Cassidy
Greenhorn
Posts: 9
Hi,

I have the line printing out in full, but does anyone know how to split it so I can print specific columns of the file? This is what I tried:

Thanks again for all your help

Elisha

[ August 30, 2006: Message edited by: Elisha Cassidy ]
 
Garrett Rowe
Ranch Hand
Posts: 1296
how about:
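
A sketch of one common approach, not necessarily the code from the original post, is to split each kept line on commas and index into the resulting array. The class and method names are hypothetical, and the sketch assumes that no field contains a comma or is quoted.

```java
// Hypothetical helper for pulling individual columns out of a line.
class ColumnSplit {

    // Splits on every comma. The limit of -1 keeps trailing empty fields,
    // so a line ending in a comma still yields its final (empty) column.
    static String[] fields(String line) {
        return line.split(",", -1);
    }

    public static void main(String[] args) {
        String line = "123,Fri Aug 11 11:21:25 2006,2,C:\\Test,18";
        String[] f = fields(line);
        // Columns are zero-based: f[0] is the id, f[3] is the path.
        System.out.println(f[0]);
        System.out.println(f[3]);
    }
}
```

If a path or any other field could itself contain a comma, a plain split is no longer enough and a proper CSV parser would be the safer choice.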

[ August 30, 2006: Message edited by: Garrett Rowe ]
 
K Terr
Greenhorn
Posts: 14
Hi,

I am trying to read in a .csv file, but I only want the lines that have URL or REDR in the first field. Does anyone know the regular expression for this? I can't use [a-z], as the first few lines contain text that I want to ignore. I only want URL and REDR.

Thanks in advance for the help

K Terr
 
Elisha Cassidy
Greenhorn
Posts: 9
Hi all,

Thanks for all the help; it is now working. K Terr, I have no idea how to do that, as I too am new to regular expressions.

Elisha
 
Henry Wong
author
Marshal
Posts: 21504
K. Terr,

Please start a new topic -- and provide examples of what you are looking for.

Henry
 
K Terr
Greenhorn
Posts: 14
It's OK, I got it to work.

For anyone else stuck on this, use the following:



gets the lines starting with URL

K Terr
[ September 01, 2006: Message edited by: K Terr ]
 
Anand Hariharan
Rancher
Posts: 272
Originally posted by K Terr:


gets the lines starting with URL


You don't need the parentheses, and you'd be better off including the comma in your RE, so that the whole first field has to be URL rather than merely start with it.

Perhaps something like "^URL,"?
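
As a sketch of how that pattern might be extended to cover both values asked about earlier in the thread (URL and REDR), the version below uses an alternation. Unlike the single-value case, the alternation does need a group; a non-capturing one is used here. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper; names are illustrative only.
class PrefixFilter {

    // ^ anchors the match at the start of the line, (?:URL|REDR) accepts either
    // keyword, and the trailing comma requires the first field to end there.
    private static final Pattern URL_OR_REDR = Pattern.compile("^(?:URL|REDR),");

    static boolean firstFieldIsUrlOrRedr(String line) {
        // find() suffices because the pattern is anchored with ^ and only
        // needs to cover the first field, not the whole line.
        return URL_OR_REDR.matcher(line).find();
    }

    public static void main(String[] args) {
        System.out.println(firstFieldIsUrlOrRedr("URL,http://example.com,200"));
        System.out.println(firstFieldIsUrlOrRedr("URLS,not wanted"));
    }
}
```

Including the comma is what rejects a first field such as `URLS`, which a bare `^URL` would accept.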
 