Scanning a text file

 
Greenhorn
Posts: 2
Hello,

I'm scanning a text file and creating Word objects out of each word in the file. I added a custom delimiter to my Scanner because my 'rules' for what counts as a word differ from the default. I got it to work and I have an ArrayList of Word objects.

My problem has to do with what I need to do next. I need to rescan the file, and whenever I encounter one of the words from my list in the text file, I need to store the line number and paragraph number (one pair per occurrence of the word) in the Word object. I have a method to do that.

To be more particular, I can't figure out how to set up my counts to count the line number and paragraph number. Both are meant to start at 1, but the line number needs to reset back to 1 each time I get to a new paragraph. Paragraphs are separated by one or more blank lines. Also, the text file is UNIX-format.

Here is my code which first scans in all the words and adds them to the list, then removes the duplicates:



Here is my flawed code (where I try to add the paragraph-line pairs to each word):


Any help with this would be greatly appreciated!

Thank you!
 
Ranch Hand
Posts: 441
Scala IntelliJ IDE Windows
I think doing 2 separate scans and trying to match the words back to the line numbers on the second run through is the wrong approach, and is prone to error. You need to add the paragraph and line information in when you first scan it. Do this by using Scanner.nextLine() so that you know which line you're on, then running another scanner to match words on that line.
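
A minimal sketch of that single-pass idea, assuming the text is already in a String and that a word is any run of letters or apostrophes (the original poster's actual delimiter was not shown, so that pattern is a guess). The class and method names are hypothetical. An outer Scanner reads lines and keeps the count; an inner Scanner tokenizes only the current line:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class LineWordScan {
    // Returns entries of the form "lineNumber:word" for every word in the text.
    public static List<String> scan(String text) {
        List<String> found = new ArrayList<>();
        int lineNumber = 0;
        try (Scanner lines = new Scanner(text)) {
            while (lines.hasNextLine()) {
                String line = lines.nextLine();
                lineNumber++; // we always know which line the inner scanner is on
                try (Scanner words = new Scanner(line)) {
                    // Assumed word rule: anything that is not a letter or apostrophe separates words.
                    words.useDelimiter("[^A-Za-z']+");
                    while (words.hasNext()) {
                        found.add(lineNumber + ":" + words.next());
                    }
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(scan("Hello, world\nsecond line"));
    }
}
```

For a real file, the outer Scanner could be constructed from a File or Path instead of a String; the rest should stay the same.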

Also, an easier way of getting unique values is by sticking your values in a HashSet rather than an ArrayList. Just make sure you're overriding hashCode() and equals() in your Word class, so that the hashCode of 2 equal words is the same, so that HashSet can recognize duplicates.
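
As a rough illustration, here is a hypothetical stand-in for the Word class (the real one was not posted, so the single `text` field is an assumption). Equality and the hash code are both derived from the same field, which is what lets a HashSet treat two Word objects with the same text as duplicates:

```java
import java.util.HashSet;
import java.util.Set;

public class WordSetDemo {
    // Hypothetical Word class: only the text takes part in equality.
    public static final class Word {
        private final String text;

        public Word(String text) {
            this.text = text;
        }

        public String getText() {
            return text;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) return true;
            if (!(other instanceof Word)) return false;
            return text.equals(((Word) other).text);
        }

        @Override
        public int hashCode() {
            // Must agree with equals: equal words produce equal hash codes.
            return text.hashCode();
        }
    }

    // Builds a set of unique words; duplicates are silently dropped by add().
    public static Set<Word> unique(String... tokens) {
        Set<Word> words = new HashSet<>();
        for (String token : tokens) {
            words.add(new Word(token));
        }
        return words;
    }
}
```

If the order in which words first appeared matters, a LinkedHashSet could be swapped in for the HashSet without other changes.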
 
Marshal
Posts: 79153
377
Welcome to the Ranch

What about scanning line by line with the nextLine() method? Then you can count the lines. Split each line into individual words with the String#split method. Whenever you get a line whose length (after trimming) is 0 (or use the String#isEmpty method) you increment your paragraph count and reset your line count probably to 0.
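
One way this counting could look, as a sketch rather than a finished solution. It assumes whitespace-separated words and uses hypothetical names throughout. Because paragraphs may be separated by more than one blank line, the sketch advances the paragraph count only for the first blank line after some text; the line count resets to 0 and is incremented before use, so the first line of each paragraph is reported as 1:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ParagraphLineCounter {
    // Returns entries of the form "word@paragraph:line".
    public static List<String> index(String text) {
        List<String> entries = new ArrayList<>();
        int paragraph = 1;
        int line = 0;
        boolean inParagraph = false; // true once the current paragraph has a non-blank line
        try (Scanner in = new Scanner(text)) {
            while (in.hasNextLine()) {
                String current = in.nextLine().trim();
                if (current.isEmpty()) {
                    // A run of blank lines ends the paragraph only once.
                    if (inParagraph) {
                        paragraph++;
                        line = 0;
                        inParagraph = false;
                    }
                    continue;
                }
                inParagraph = true;
                line++;
                // Assumed word rule: split on runs of whitespace.
                for (String word : current.split("\\s+")) {
                    entries.add(word + "@" + paragraph + ":" + line);
                }
            }
        }
        return entries;
    }
}
```

In the original poster's program, the `entries.add(...)` call would presumably be replaced by looking up the matching Word object and storing the paragraph-line pair on it.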
 
Michael Wassack
Greenhorn
Posts: 2
Thank you both! I'll have at it.
 
Author and all-around good cowpoke
Posts: 13078
6
Note that the java.io.LineNumberReader could track line numbers for you while providing a readLine method.

Bill
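
A short sketch of how that might be used, with a hypothetical class name and a StringReader standing in for the file. `getLineNumber()` reports how many lines have been read so far, so it returns 1 right after the first `readLine()`:

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class LineNumberReaderDemo {
    // Returns each line prefixed with the number LineNumberReader reports for it.
    public static List<String> numbered(String text) {
        List<String> out = new ArrayList<>();
        try (LineNumberReader reader = new LineNumberReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // The counter has already been advanced past the line just read.
                out.add(reader.getLineNumber() + ": " + line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

The reader counts lines across the whole input, blank ones included. For per-paragraph numbering, calling `setLineNumber(0)` when a blank line is read should restart the count, though the paragraph number would still need its own variable.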
 
Campbell Ritchie
Marshal
Posts: 79153
377
I never knew about a line number reader. Thank you.