
Parsing data file

 
Mike Stein
Ranch Hand
Posts: 33
Hello,

I am hitting a bit of a snag trying to parse a data file. I have a text file with the following format:

Input.txt

//commented text
// more commented text
// even more comments
// Author format:
// a:<index>:<first name>:<last name>:<middle initial>
a : 1001 : Hank : Jones : M
a : 1002 : Tom : Smith : R
// Book format:
// b:<index>:<title>
b : 2001 :Java Programming
b: 2002 : Advanced Java Programming


The text file above features varied spacing on purpose (for the sake of this project, I must assume that I don't have control over the text files, and therefore, I must account for random spacing).

Goals:
I need to read the text and do the following:
*skip over commented lines (//)
*skip colons
*handle white space
*accept spacing between title names (e.g. Java Programming)
*ignore any extra data.
In this case, the extra data that I need to ignore is the middle initial data. I would like to be able to ignore more data if need be.

I would like the output to look like this:

a 1001 Hank Jones

Ultimately, I'd like to pass this data into an array list. However, I won't even entertain that idea until I can sort out reading the file.

Any help would be greatly appreciated.

What I have so far seems to skip lines at random and throws the following exception:

Exception in thread "main" java.util.NoSuchElementException: No line found
at java.util.Scanner.nextLine(Scanner.java:1540)
at TestFileReader.readFile(TestFileReader.java:30)
at TestFileReader.main(TestFileReader.java:41)

Please keep in mind that this Java class is just a test class. It isn't meant to look pretty; I'm just trying to get the basic idea behind delimiters:


Output:

// more commented text
// Author format:
a : 1001 : Hank : Jones : M
// Book format:
b : 2001 :Java Programming
Exception in thread "main" java.util.NoSuchElementException: No line found
at java.util.Scanner.nextLine(Scanner.java:1540)
at TestFileReader.readFile(TestFileReader.java:24)
at TestFileReader.main(TestFileReader.java:37)



 
Greg Charles
Sheriff
Posts: 3015
You're calling nextLine() twice per iteration of the loop: once to assign the result to the line variable, and once to print it out. That results in outputting only every other line, and in certain cases asking the scanner for a line that the file doesn't have. How about changing the System.out.println() parameter to use the line variable, instead of reading a new line from the Scanner?

I also don't think you need to set delimiters for the scanner, but that's down the road a bit. As you say, try to get just the file read and output to work first.
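
The original test class isn't shown here, so the following is only a minimal sketch of the fix described above, not the poster's code. The class name ReadOncePerIteration and the inline sample text (standing in for Input.txt) are assumptions made for illustration. The point is that nextLine() is called exactly once per pass and the stored value is reused:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical class name; a sketch of reading each line only once.
class ReadOncePerIteration {

    // Collects every line, calling nextLine() exactly once per loop pass.
    static List<String> readLines(Scanner scanner) {
        List<String> lines = new ArrayList<>();
        while (scanner.hasNextLine()) {
            String line = scanner.nextLine(); // the only nextLine() call in the loop
            lines.add(line);                  // reuse the variable, do not read again
        }
        return lines;
    }

    public static void main(String[] args) {
        // Assumption: inline text replaces Input.txt so the sketch runs without a file.
        String sample = "//commented text\n"
                + "a : 1001 : Hank : Jones : M\n"
                + "b : 2001 :Java Programming\n";
        try (Scanner scanner = new Scanner(sample)) {
            for (String line : readLines(scanner)) {
                System.out.println(line);
            }
        }
    }
}
```

Because hasNextLine() guards the single nextLine() call, the loop cannot ask for a line that isn't there, which is what triggers NoSuchElementException when a second unguarded call runs past the end of the input.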
 
Mike Stein
Ranch Hand
Posts: 33
Hi Greg,

Thank you for the quick response!

You're calling nextLine() twice per iteration of the loop


Totally missed that!

How about changing the System.out.println() parameter to use the line variable, instead of reading a new line from the Scanner?


Okay, I changed the Sys out line to System.out.println(line)

However, that yielded the following output:

//commented text
// more commented text
// even more comments
// Author format:
// a:<index>:<first name>:<last name>:<middle initial>
a : 1001 : Hank : Jones : M
a : 1002 : Tom : Smith : R
// Book format:
// b:<index>:<title>
b : 2001 :Java Programming
b: 2002 : Advanced Java Programming


Looks like my delimiters aren't doing anything now!
 
Mike Stein
Ranch Hand
Posts: 33
Greg,

I tweaked the code a bit. Now I am able to get rid of the commented lines, but I still have the colons.



Output looks a bit better:

a : 1001 : Hank : Jones : M
a : 1002 : Tom : Smith : R
b : 2001 :Java Programming
b: 2002 : Advanced Java Programming
 
Greg Charles
Sheriff
Posts: 3015
It's true. All you're doing now is eliminating comment lines and printing out the others. You're not modifying those lines before you print them out, so you still have colons. Is your assignment definitely telling you to use Scanner? The way you've written it so far, Scanner isn't doing anything for you except letting you read line by line. BufferedReader or LineNumberReader could do the same thing. What if you used one of those to get lines of text, and then passed each line to a Scanner? That might work out better.
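
Here is one way that idea could look, as a rough sketch rather than a finished solution. The class name LineThenScanner, the method names, and the delimiter pattern are my own assumptions; the sample text stands in for Input.txt. A BufferedReader supplies whole lines, and a fresh Scanner per line splits on a colon together with any whitespace around it:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical class name; sketch of "reader for lines, Scanner for fields".
class LineThenScanner {

    // Breaks one data line into fields; the delimiter is a colon plus any spaces around it.
    static List<String> fields(String line) {
        List<String> parts = new ArrayList<>();
        try (Scanner fieldScanner = new Scanner(line.trim())) {
            fieldScanner.useDelimiter("\\s*:\\s*");
            while (fieldScanner.hasNext()) {
                parts.add(fieldScanner.next());
            }
        }
        return parts;
    }

    // Reads line by line, skips blank and // comment lines, and parses the rest.
    static List<List<String>> parse(BufferedReader reader) {
        List<List<String>> records = new ArrayList<>();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                if (trimmed.isEmpty() || trimmed.startsWith("//")) {
                    continue;
                }
                records.add(fields(trimmed));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        // Assumption: inline text replaces Input.txt so the sketch runs without a file.
        String sample = "// Author format:\n"
                + "a : 1001 : Hank : Jones : M\n"
                + "b: 2002 : Advanced Java Programming\n";
        for (List<String> record : parse(new BufferedReader(new StringReader(sample)))) {
            System.out.println(String.join(" ", record));
        }
    }
}
```

Since the delimiter only matches around colons, a title such as "Advanced Java Programming" stays together as one field. To drop extra data like the middle initial, you could keep only the leading fields of each record, for example with subList.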
 
Mike Stein
Ranch Hand
Posts: 33
Greg,

The assignment gives no direction on how to read in a file or what to use. The only reason I went with Scanner is that I have a vague familiarity with it. I have never used BufferedReader or LineNumberReader. That being said, I went ahead and toyed with Scanner a bit more to see if I could make some progress. Sadly, I've just managed to confuse myself even more! I am adding data from the text file to an array, but now I have duplicates, and data that should be on separate lines is appearing on the same line. It is all a mess.

This is what I've tried and the output that results:


Output:








1001 Hank Jones

1001 Hank Jones 1002 Tom Smith

1001 Hank Jones 1002 Tom Smith

2001Java Programming1001 Hank Jones 1002 Tom Smith

2001Java Programming2002 Advanced Java Programming


The output is preceded by a sizable empty space.

Any ideas how I could remove the duplicate information?
 
Mike Stein
Ranch Hand
Posts: 33
Greg,

Sorry, I am a complete fool. I had my printArray call inside of my while loop. I called the printArray method after the while and got the following output (NO MORE DUPES!):








 
Mike Stein
Ranch Hand
Posts: 33
I have a new problem that I can't seem to work out.

I have to move the additional information in my data file (the middle initial in this example) to a separate array list. However, I keep throwing the following exception:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 4

This is what I have tried but it doesn't seem to be getting me anywhere:










 
Greg Charles
Sheriff
Posts: 3015
Hi Mike,

It looks to me like one of your lines doesn't have five parts, so lineParts[4] blows up. Maybe one of the input lines doesn't include a middle initial. If you have access to a debugger, using it can help track down problems like these. Otherwise, System.out.println() is a good fallback tool. Print out the line before you parse it; the size of the array after you split the line would probably be useful information too.
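
As a rough sketch of that kind of check (I can't see how your code builds lineParts, so the split pattern, the class name SplitWithGuard, and the helper names below are assumptions): print the line and the array length, and only read index 4 when it exists. Note that in the sample file from the first post, the book lines have only three fields.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class name; sketch of guarding an optional fifth field.
class SplitWithGuard {

    // Assumed splitting rule: a colon plus any whitespace around it.
    static String[] splitLine(String line) {
        return line.trim().split("\\s*:\\s*");
    }

    // Returns the fifth field if the line has one, otherwise null.
    static String extraField(String[] lineParts) {
        return lineParts.length > 4 ? lineParts[4] : null;
    }

    public static void main(String[] args) {
        // Assumption: these lines are copied from the sample Input.txt in the first post.
        String[] sampleLines = {
            "a : 1001 : Hank : Jones : M",
            "b : 2001 :Java Programming"
        };
        List<String> extras = new ArrayList<>(); // separate list for the extra data
        for (String line : sampleLines) {
            String[] lineParts = splitLine(line);
            // Debug output: the raw line and how many parts it split into.
            System.out.println(line + " -> " + lineParts.length + " parts");
            String extra = extraField(lineParts);
            if (extra != null) {
                extras.add(extra);
            }
        }
        System.out.println("extras: " + extras);
    }
}
```

With the length check in place, a three-field book line is simply skipped instead of throwing ArrayIndexOutOfBoundsException.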
 