Reading a file from disk

 
Ranch Hand
Posts: 77
2
Hi

We have been taught to use the Scanner class with a FileReader to read a file from disk, and this is great except that it has to keep reading small bits from the file. I was wondering how to read the whole file in as one big chunk. I know you can use a BufferedReader, but what is a suitable data structure to put the contents in once they have been read?

Thanks!
Toni
 
Ranch Hand
Posts: 106
I thought Scanner was deprecated?

What is the structure of the file? If you just want the raw bytes I think you could use Files.readAllBytes(), or call readAllBytes() on a FileInputStream, perhaps wrapped in a BufferedInputStream.

A suitable structure to read the file into objects in memory depends totally on the structure of the file (unless you just want the raw bytes). I suppose that's why they were using the Scanner.

Eventually you'll have to parse the contents of the file into a suitable in-memory object structure, but until we know the file structure, or even better, the logical structure, it's impossible to say.
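
For the "one big chunk" case, here is a minimal sketch using java.nio.file.Files, which is one way to load a whole file in a single call. The class name, method names and sample data are made up for illustration, and it assumes the file is UTF-8 text that fits comfortably in memory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class WholeFileDemo {

    // Reads every byte of the file in one call and decodes it as UTF-8 text.
    static String readAsString(Path file) throws IOException {
        byte[] raw = Files.readAllBytes(file);
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Reads the whole file and returns one String per line.
    static List<String> readAsLines(Path file) throws IOException {
        return Files.readAllLines(file, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample data written to a temporary file.
        Path file = Files.createTempFile("sample", ".csv");
        Files.write(file, "a,1\nb,2\n".getBytes(StandardCharsets.UTF_8));

        System.out.println(readAsString(file).length() + " characters");
        System.out.println(readAsLines(file));

        Files.delete(file);
    }
}
```

A List<String> of lines is often the handiest container for a CSV file, since each element can then be parsed on its own.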

 
Antonio Moretti
Ranch Hand
Posts: 77
2

Damon McNeill wrote:I thought Scanner was deprecated?

What is the structure of the file? If you just want the raw bytes I think you could use Files.readAllBytes(), or call readAllBytes() on a FileInputStream, perhaps wrapped in a BufferedInputStream.

A suitable structure to read the file into objects in memory depends totally on the structure of the file (unless you just want the raw bytes). I suppose that's why they were using the Scanner.

Eventually you'll have to parse the contents of the file into a suitable in-memory object structure, but until we know the file structure, or even better, the logical structure, it's impossible to say.



It's a CSV file. Where did you read that Scanner was deprecated?

My idea was to get the whole file in somehow, store it somewhere in memory, then parse it from there rather than read it in line by line. I know we can read characters into a large buffer, but where do we put them after that?
 
Damon McNeill
Ranch Hand
Posts: 106
What is the application? Inputs/outputs?

I'm not sure how to read a CSV file line-by-line into an array of Strings (as is easily done in Python) in Java, but one could potentially write a small class that accomplishes that.

You may get away with not having to read the entire file, depending on what output your program should produce. Does the output of the program depend on the entire file contents or can you produce an output for each line?

Either way you're gonna have to read the entire file, right? For a CSV file, it's one record (object) created per line, with each field in the line corresponding to an element of the record. You could use a simple ArrayList to store each object as you read it in from the file, line by line.

Use the methods of the Scanner class to read in each property (a comma separated value) of each line in the file, into a new object, named after its logical purpose (GenericRecordClass), and store these into an array I suppose.
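
As a rough sketch of the "one object per line, stored in a list" idea, assuming a made-up two-column file of name and quantity. The Item class and its fields are hypothetical, and the code takes each line with Scanner.nextLine() and splits it on commas, so it will not cope with quoted fields or embedded commas.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class CsvRecordsDemo {

    // Hypothetical record type: a name and a quantity per CSV line.
    static final class Item {
        final String name;
        final int quantity;

        Item(String name, int quantity) {
            this.name = name;
            this.quantity = quantity;
        }
    }

    // Reads lines of the form "name,quantity" and builds one Item per line.
    static List<Item> parse(Readable source) {
        List<Item> items = new ArrayList<>();
        try (Scanner in = new Scanner(source)) {
            while (in.hasNextLine()) {
                String line = in.nextLine().trim();
                if (line.isEmpty()) {
                    continue; // skip blank lines, e.g. at the end of the file
                }
                String[] fields = line.split(",");
                items.add(new Item(fields[0].trim(), Integer.parseInt(fields[1].trim())));
            }
        }
        return items;
    }

    public static void main(String[] args) {
        // A StringReader stands in for a FileReader to keep the sketch self-contained.
        List<Item> items = parse(new StringReader("apple,3\npear,5\n"));
        for (Item item : items) {
            System.out.println(item.name + " x " + item.quantity);
        }
    }
}
```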
 
Marshal
Posts: 74371
334

Damon McNeill wrote:I thought Scanner was deprecated? . . .

No, it isn't. It is good for what it is good for (only circular arguments today), but it is often badly taught. And some of its method names aren't intuitive.
 
Campbell Ritchie
Marshal
Posts: 74371
334
Unless you have an assignment reading a CSV file, you are probably better off downloading an app optimised for CSVs.
I wouldn't want to store values read in an array, not if I could put them directly into an object.
 
Saloon Keeper
Posts: 1636
55
Everything in this discussion is awesome, with the exception that I am virtually certain that Scanner is most decidedly not deprecated.

For some reason, presumably just "too much to potentially be on the exam!", Scanner is not covered on the OCPJP exams (but Console is). That just puts it on the list of things that are still important in Real Life but won't be found on the exam. It has a couple of weird behaviors until you get used to it, but it definitely hasn't been widely displaced by something better.
 
Damon McNeill
Ranch Hand
Posts: 106

Campbell Ritchie wrote:Unless you have an assignment reading a CSV file, you are probably better off downloading an app optimised for CSVs.
I wouldn't want to store values read in an array, not if I could put them directly into an object.



No, I was saying read each record into an object, then append it to an array. At a lower level, yes, you would need to store each individual field somewhere until the end of the line, then aggregate those into a logical object.
 
Antonio Moretti
Ranch Hand
Posts: 77
2

Jesse Silverman wrote:Everything in this discussion is awesome, with the exception that I am virtually certain that Scanner is most decidedly not deprecated.

For some reason, presumably just "too much to potentially be on the exam!", Scanner is not covered on the OCPJP exams (but Console is). That just puts it on the list of things that are still important in Real Life but won't be found on the exam. It has a couple of weird behaviors until you get used to it, but it definitely hasn't been widely displaced by something better.



I put each line from the csv as an object in an ArrayList. All I was thinking here was to reduce the number of times the file was accessed. Or does Scanner put it all in a buffer anyway?
 
Damon McNeill
Ranch Hand
Posts: 106
Scanner is basically a parser that reads Java values from an input source (a file or a string). What are the values? Integers, Strings, floats, doubles: basically the primitive types plus Strings.

You need to understand the Scanner API.

https://docs.oracle.com/en/java/javase/16/docs/api/java.base/java/util/Scanner.html

So basically the API is that you have an input source and you call hasNextX, where X is some type, to see whether the next token is of that type. If that's true, you get the value with nextX. Say X = "Int".

Unfortunately that doesn't parse CSV. You would need to use hasNextInt() and nextInt(), for example, in combination with something that handles the commas separating those items.

Given an input file

1,2,3

Your program should then produce an array of objects [1, 2, 3].

Scanner won't help you to distinguish the items in a CSV file and it is not a CSV parser.

It WILL help with parsing basic Java data types such as int and double (*NOT* string) but it's up to you to parse out the comma separation.
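
For the 1,2,3 input above, one possible way to handle the commas is to change the Scanner's delimiter with useDelimiter() and then loop with hasNextInt() and nextInt(). This is only a sketch: the class and method names are made up, and it assumes the text holds nothing but integers and commas, with no spaces or line breaks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class CommaIntsDemo {

    // Collects every int token from comma-separated text such as "1,2,3".
    static List<Integer> parseInts(String text) {
        List<Integer> values = new ArrayList<>();
        try (Scanner in = new Scanner(text)) {
            in.useDelimiter(","); // treat commas, not whitespace, as separators
            while (in.hasNextInt()) {
                values.add(in.nextInt());
            }
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(parseInts("1,2,3")); // prints [1, 2, 3]
    }
}
```

The loop stops at the first token that is not an int, so malformed input is silently cut short rather than reported.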

 
Antonio Moretti
Ranch Hand
Posts: 77
2

Damon McNeill wrote:Scanner is basically a parser that reads Java values from an input source (a file or a string). What are the values? Integers, Strings, floats, doubles: basically the primitive types plus Strings.

You need to understand the Scanner API.

https://docs.oracle.com/en/java/javase/16/docs/api/java.base/java/util/Scanner.html

So basically the API is that you have an input source and you call hasNextX, where X is some type, to see whether the next token is of that type. If that's true, you get the value with nextX. Say X = "Int".

Unfortunately that doesn't parse CSV. You would need to use hasNextInt() and nextInt(), for example, in combination with something that handles the commas separating those items.



I know how to parse a CSV. I just used the split() method, put each value in an array, and took it from there. I just wondered if there was another way of reading the file other than line by line, which to me suggests a lot of disk operations if the file is long. That was my question all along.
 
Damon McNeill
Ranch Hand
Posts: 106

Antonio Moretti wrote:
I know how to parse a CSV. I just used the split() method, put each value in an array, and took it from there. I just wondered if there was another way of reading the file other than line by line, which to me suggests a lot of disk operations if the file is long. That was my question all along.



No. How else would one read a file other than line by line?

This is why you wrap your input stream in a BufferedInputStream.

You could read the entire file contents as a byte array, pass that as the input to a ByteArrayInputStream, and create a Scanner on that. But you still have to read the entire file. I don't see the point of your premature optimization.
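
A sketch of that byte array route, for what it's worth. The class and method names are invented for the example, and it assumes a UTF-8 file small enough to hold in memory; whether it is any faster than a buffered reader would have to be measured.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Scanner;

class InMemoryScanDemo {

    // Loads the file into memory in one read, then scans the in-memory copy.
    static int countLines(Path file) throws IOException {
        byte[] contents = Files.readAllBytes(file);
        int lines = 0;
        try (Scanner in = new Scanner(new ByteArrayInputStream(contents), "UTF-8")) {
            while (in.hasNextLine()) {
                in.nextLine();
                lines++;
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample data written to a temporary file.
        Path file = Files.createTempFile("sample", ".csv");
        Files.write(file, "a,1\nb,2\n".getBytes(StandardCharsets.UTF_8));

        System.out.println(countLines(file) + " lines");

        Files.delete(file);
    }
}
```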
 
Antonio Moretti
Ranch Hand
Posts: 77
2

Damon McNeill wrote:

Antonio Moretti wrote:
I know how to parse a CSV. I just used the split() method, put each value in an array, and took it from there. I just wondered if there was another way of reading the file other than line by line, which to me suggests a lot of disk operations if the file is long. That was my question all along.



No. How else would one read a file other than line by line?



I'll let you know when I find out.
 
Campbell Ritchie
Marshal
Posts: 74371
334
  • Likes 1

Damon McNeill wrote:. . . How else would one read a file other than line by line? . . .

Token by token, as a Scanner does. That may be more efficient than reading line by line because the Scanner does its own parsing, even if it takes longer than a BufferedReader.
 
Antonio Moretti
Ranch Hand
Posts: 77
2
Hi

It seems that BufferedReader was what I was looking for, as it has a larger buffer whose size is determined by the class itself. It reads characters, so the parsing has to be handled explicitly, which for a CSV file is simple enough. However, it seems that Scanner's buffer is big enough for small files anyway.

Antonio.
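
For anyone finding this later, a sketch of the BufferedReader approach with explicit parsing. The names are made up, the buffer size passed to the constructor is an arbitrary illustrative value (the argument is optional), and the split on commas assumes no quoted fields. For a real file, the Reader could come from new FileReader(...) or Files.newBufferedReader(...).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BufferedCsvDemo {

    // Reads lines through a BufferedReader and splits each one on commas.
    static List<String[]> readRows(Reader source) throws IOException {
        List<String[]> rows = new ArrayList<>();
        // 64 * 1024 chars is an arbitrary buffer size chosen for illustration.
        try (BufferedReader in = new BufferedReader(source, 64 * 1024)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (!line.isEmpty()) {
                    rows.add(line.split(","));
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for a file to keep the sketch self-contained.
        for (String[] row : readRows(new StringReader("a,1\nb,2\n"))) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```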
 
Campbell Ritchie
Marshal
Posts: 74371
334
  • Likes 1
Neither Scanner nor BufferedReader attempts to read a whole file at once. BufferedReader reads individual chars or lines, and you should always read lines. Individual chars are very annoying to use.
A Scanner reads tokens, which are determined by the delimiter used. The delimiter defaults to runs of whitespace; for more details see the documentation, and experiment with a few lines of test code.
If you want separation by commas, try setting its delimiter with useDelimiter() to a pattern that allows commas with or without adjacent whitespace. If you use a Scanner, it might not read the lines as quickly as a buffered reader, but it will do the parsing for you, without lots of Integer.parseInt(xyz) or new BigDecimal(xyz) calls.
Be sure to find out what nextLine() does from its documentation before you use it. As DMcN implied, that approach is very dependent on the format of the files and the order in which you parse tokens; if there is any discrepancy you will suffer all sorts of errors and probably have exceptions thrown.
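
The code originally posted here is not shown, so what follows is only a sketch of the idea: it prints the default delimiter pattern, then uses a comma delimiter that tolerates adjacent whitespace. The regex, the three-field line format (name, count, price) and the class name are all assumptions made for the example, and Locale.US is set so that "2.50" parses the same way on any machine.

```java
import java.math.BigDecimal;
import java.util.Locale;
import java.util.Scanner;

class DelimiterDemo {

    // Hypothetical line format: name, count, price, with optional spaces around commas.
    static String describe(String line) {
        try (Scanner in = new Scanner(line)) {
            in.useLocale(Locale.US);      // fixed locale so '.' is the decimal separator
            in.useDelimiter("\\s*,\\s*"); // comma with optional surrounding whitespace
            String name = in.next();
            int count = in.nextInt();
            BigDecimal price = in.nextBigDecimal();
            return name + ":" + count + ":" + price;
        }
    }

    public static void main(String[] args) {
        // Shows the pattern a new Scanner uses before any delimiter is set.
        try (Scanner fresh = new Scanner("")) {
            System.out.println(fresh.delimiter());
        }
        System.out.println(describe("widget , 4,2.50"));
    }
}
```

If a token is not of the expected type, nextInt() or nextBigDecimal() throws an InputMismatchException, which is the kind of failure described above for files that do not match the expected format.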
If you loop through the file with a Scanner and read tokens, use while (myScanner.hasNext()) ... rather than while (myScanner.hasNextLine()) ... because things can go wrong if your file ends with an empty line. Most text files do end with an empty line.
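
A small sketch of that difference, using made-up method names and whitespace-separated ints as the sample data. The second method deliberately uses the hasNextLine() loop with token reads, to show it running past the last token.

```java
import java.util.NoSuchElementException;
import java.util.Scanner;

class HasNextDemo {

    // Token loop guarded by hasNext(): stops cleanly after the last token.
    static int sumWithHasNext(String text) {
        int sum = 0;
        try (Scanner in = new Scanner(text)) {
            while (in.hasNext()) {
                sum += in.nextInt();
            }
        }
        return sum;
    }

    // Token loop guarded by hasNextLine(): the line terminators left after the
    // last token still count as a line, so nextInt() is called with nothing to read.
    static boolean hasNextLineLoopFails(String text) {
        try (Scanner in = new Scanner(text)) {
            while (in.hasNextLine()) {
                in.nextInt();
            }
            return false;
        } catch (NoSuchElementException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String text = "1 2\n3 4\n\n"; // ends with an empty line
        System.out.println(sumWithHasNext(text));
        System.out.println(hasNextLineLoopFails(text));
    }
}
```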
I have never looked at that part of the source, but if a buffered reader maintains a 0x400-byte (=1024) buffer, that should suffice to contain a single line, and Scanner probably has a buffer of similar size. So I don't think buffer size will cause you any problems. The Scanner uses a regular expression to traverse the text read and find its delimiter, and that is probably why it is slower than reading the whole line.
 
Campbell Ritchie
Marshal
Posts: 74371
334
  • Likes 1

Campbell Ritchie wrote:. . . if there is any discrepancy you will . . . probably have exceptions thrown. . . .

If that happens on the first line, you have written your code wrongly. If it happens on a later line, you might have to send the file back to whoever gave you it and say it is corrupt.
 