Parsing text file into 3 columns

 
Vatsa dude
Greenhorn
Posts: 22
Hi,
I have a requirement to parse a text file (sample pasted below) and extract its 3 columns. I am using the Scanner class, but I cannot seem to use the useDelimiter method to skip the spaces in the text file.

Sample file

64.105.4.90 mail.virtuosoworks.com. ptr
64.105.4.178 mail.imaamd.org. ptr
64.105.4.186 smtp.vernonlaw.com. ptr
64.105.5.25 64-105-5-25.adsl.lbdsl.net. ptr
64.105.5.26 stitch.chipworks.net. ptr
64.105.5.27 studley.chipworks.net. ptr
64.105.5.28 heman.chipworks.net. ptr
64.105.5.29 xena.chipworks.net. ptr
64.105.9.133 MAIL.LINCOLNINDUSTRIAL.COM. ptr
64.105.9.137 DNS1.LINCOLNINDUSTRIAL.COM. ptr

My code extracts the 1st column (IP address), but does not find the 2nd and 3rd columns. Any help is appreciated. ptr is the 3rd column; it is sometimes 1 space after the 2nd column and sometimes multiple spaces after it. This file is auto-generated by a mysterious script.

 
Henry Wong
author
Sheriff
Posts: 23295


The default delimiter is "one or more whitespace characters", so there is no need to set the delimiter; the default is already correct. In fact, the delimiter that you set is exactly one space, and since you mention that there may be more than one space, it is actually incorrect.

Henry
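
As a rough illustration of the point about the default delimiter, here is a minimal sketch that reads the three columns without calling useDelimiter at all. The class name ScannerColumns and the parse helper are made up for this example, and it assumes every record has exactly three whitespace-separated tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ScannerColumns {
    // Hypothetical helper: reads tokens three at a time (IP, host name, record type).
    // Assumes the token count is a multiple of 3; otherwise next() throws
    // NoSuchElementException.
    static List<String[]> parse(String text) {
        List<String[]> rows = new ArrayList<>();
        // No useDelimiter call: the default delimiter already matches runs of whitespace.
        try (Scanner in = new Scanner(text)) {
            while (in.hasNext()) {
                String ip = in.next();
                String host = in.next();
                String type = in.next();
                rows.add(new String[] {ip, host, type});
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Second line has several spaces before "ptr", as described in the question.
        String sample = "64.105.4.90 mail.virtuosoworks.com. ptr\n"
                      + "64.105.5.25 64-105-5-25.adsl.lbdsl.net.     ptr\n";
        for (String[] row : parse(sample)) {
            System.out.println(row[0] + " | " + row[1] + " | " + row[2]);
        }
    }
}
```

To read from the actual file, the same loop should work with new Scanner(new File(...)) in place of the string, since only the input source changes.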
 
Rob Spoor
Sheriff
Posts: 21135
aLine.split("\\s+") is also an option.
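
A minimal sketch of that option, applied to one line of the sample. The class name SplitColumns and the columns helper are hypothetical; the trim() call is an added precaution, not something the suggestion above requires.

```java
public class SplitColumns {
    // Hypothetical helper: splits one line on runs of whitespace.
    static String[] columns(String aLine) {
        // trim() first: leading whitespace would otherwise produce an empty first element.
        return aLine.trim().split("\\s+");
    }

    public static void main(String[] args) {
        // Several spaces before "ptr", as in the auto-generated file.
        String aLine = "64.105.9.133 MAIL.LINCOLNINDUSTRIAL.COM.    ptr";
        String[] parts = columns(aLine);
        System.out.println("ip=" + parts[0] + " host=" + parts[1] + " type=" + parts[2]);
    }
}
```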
 
Adeel Ansari
Ranch Hand
Posts: 2874
Rob Prime wrote: aLine.split("\\s+") is also an option.


And this one will be more efficient in terms of performance.
 
Rob Spoor
Sheriff
Posts: 21135
I wouldn't dare say that without some figures to back it up. Both Scanner and String.split use a java.util.regex.Pattern object, and both pieces of code create one for each line.
It is likely, though, that String.split is more efficient, since Scanner uses several Pattern objects.

You are right that neither is really efficient. With the String.split approach, though, the Pattern can be compiled once, outside the loop, and reused through Pattern.split.
I've searched the Scanner API, and there is no way to reset a Scanner with new input. Therefore, the split approach will be the more efficient one.
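
Pulling the Pattern out of the loop could look something like the sketch below. The snippet from the original post is not preserved here, so this is only an illustration; the names PrecompiledSplit, WHITESPACE and parse are made up, and skipping blank lines is an added assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class PrecompiledSplit {
    // Compiled once and reused for every line, instead of once per split call.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Hypothetical helper: splits each non-blank line into its columns.
    static List<String[]> parse(List<String> lines) {
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // assumption: blank lines carry no record
            }
            rows.add(WHITESPACE.split(trimmed));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "64.105.5.26 stitch.chipworks.net. ptr",
            "64.105.5.27 studley.chipworks.net.     ptr");
        for (String[] row : parse(lines)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

Whether this is measurably faster than calling String.split on each line would need to be benchmarked on the real file.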
 
Adeel Ansari
Ranch Hand
Posts: 2874
Rob Prime wrote: I wouldn't dare say that without some figures to back it up...


Actually, I benchmarked both a few months ago, using file I/O: reading the file and then splitting the strings on some token. I tried both Scanner and String's split. The latter seemed faster.
 
Rob Spoor
Sheriff
Posts: 21135
Well, then you have some figures to back your claim.
 