Looking for a Quick Way to Write Longs to a File and Read Longs from a File

 
Kevin Simonson
Ranch Hand
Posts: 224
I have written a Java program that uses the Scanner class to represent a text file, and that reads in a lot of long values using method nextLine() and stores them, so that I can see roughly how long it takes to read in a lot of long values from a file. I guess that similarly I could use class PrintWriter to write those same long values to a file using method println(). I'm wondering though, is there a quicker way to write just long values to a file and read just long values from a file than using classes Scanner and PrintWriter? My guess is that those two classes make sense when the stuff stored in a file and read from a file are a mix of long values and text, but if I know that all I'll have stored in the file is long values, then my guess is that there would be a quicker way to do it. Anybody know what that quicker way to do it might be?
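A minimal sketch of the kind of program being described, assuming one value per line, parsed with Long#parseLong (the parsing step and the class name are illustrative assumptions, not taken from the post):

import java.io.File;
import java.io.FileNotFoundException;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class LongFileBaseline {

    // Read one long per line, roughly as described above.
    static List<Long> readLongs(File file) throws FileNotFoundException {
        List<Long> values = new ArrayList<>();
        try (Scanner in = new Scanner(file)) {
            while (in.hasNextLine()) {
                values.add(Long.parseLong(in.nextLine().trim()));
            }
        }
        return values;
    }

    // Write one long per line with PrintWriter#println.
    static void writeLongs(File file, List<Long> values) throws FileNotFoundException {
        try (PrintWriter out = new PrintWriter(file)) {
            for (long value : values) {
                out.println(value);
            }
        }
    }
}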
 
Paul C
Sheriff
Posts: 23867
Well, the vast majority of the time required to write data to a file is occupied in transferring the bytes to the hard drive. It really doesn't matter how much code is executed to write those bytes unless that code is absurdly and ridiculously horrible. So if you're looking for "faster" then writing fewer bytes is going to be one strategy. The other strategy is using a larger buffer so that transferring the bytes takes place fewer times.
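One way to apply both strategies (fewer bytes per value, plus a bigger buffer) would be to store the longs in binary rather than as text. DataOutputStream and DataInputStream over buffered streams are a suggestion of mine rather than anything prescribed here, and the 64 KiB buffer size is just an example:

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class BinaryLongFile {

    private static final int BUFFER_SIZE = 1 << 16; // 64 KiB: fewer transfers to the disk

    // Each long occupies exactly 8 bytes, instead of up to 20 characters of text.
    static void write(String fileName, List<Long> values) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(fileName), BUFFER_SIZE))) {
            for (long value : values) {
                out.writeLong(value);
            }
        }
    }

    static List<Long> read(String fileName) throws IOException {
        List<Long> values = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(fileName), BUFFER_SIZE))) {
            while (true) {
                try {
                    values.add(in.readLong());
                } catch (EOFException endOfFile) {
                    break; // readLong() signals the end of the file this way
                }
            }
        }
        return values;
    }
}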
 
Campbell Ritchie
Marshal
Posts: 61713

Kevin Simonson wrote:. . . Scanner class to represent a text file, and that reads in a lot of long values using method nextLine() and stores them, so that I can see roughly how long it takes to read in a lot of long values from a file. . . .

No, that isn't quite what you are measuring. You are measuring how long it takes to read the individual lines, with the Scanner object finding the line ends; you are not reading the longs there. Maybe you are using some other method, e.g. Long#parseLong(), to read the individual numbers.
A Scanner is described as a simple text scanner which uses regular expressions, so you might think it is a simple class. That description confused me for a long time; I don't think it means there is anything simple about Scanner. I think it is the text that is simple: Scanner is intended to scan “simple” text, as opposed to HTML, XML, or one of those older word-processor files that contain control characters as well as text. Anyway, the Scanner uses a regular expression to search whatever it is scanning, and that takes time. Once you have got used to Scanner, you can use it to find the individual tokens and parse them in one step, along the lines of the sketch below. That will read the whole file if it contains nothing but longs in text form, stopping as soon as it encounters anything that is not a long. I presume all the constructs in the code are familiar to you.
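A minimal sketch of that token-by-token approach (the file name is only an example):

import java.io.IOException;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ScanLongs {
    public static void main(String[] args) throws IOException {
        List<Long> values = new ArrayList<>();
        try (Scanner scan = new Scanner(Paths.get("longs.txt"))) {
            // hasNextLong() uses the Scanner's regular expressions to find the
            // next token and check that it is a long; nextLong() parses it.
            while (scan.hasNextLong()) {
                values.add(scan.nextLong());
            }
        }
        System.out.println(values.size() + " longs read");
    }
}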
But the time it takes to find the individual tokens (especially if there are several on the same line), and the fact that you might be opening a new buffer much more frequently, will slow down your reading. The same applies to writing with a Formatter. If you want faster reading, do what Paul C suggests and use something with a bigger buffer.
As you doubtless know, BufferedReader#lines() returns a Stream<String>, where each element represents a line in the file. The documentation reminds us that Streams are executed lazily, so you might get slightly faster execution because no reading is done until the end of the process (the collect call). It might not be necessary to catch the IOException: see the lines() link.
That Stream can produce a Stream<String[]> with its map() method, passing a λ as the Function argument; the λ splits each String on whitespace with the well-known String#split() method.
To get back to a Stream<String>, use flatMap(): Arrays#stream() creates a little Stream<String> from each array, and flatMap() then “flattens” those into a single Stream<String>.
Then use map() again, this time with a method reference to Long#valueOf, which I am sure you don't need a link to, and that creates a Stream<Long>.
Last but not least, use a terminal operation, so called because it terminates the Stream by creating a different kind of object, in this case a List, I believe an instance of the well-known ArrayList class. You use the collect() method, which takes a Collector object as its parameter. The Collector documentation says see Collectors, and in that class you find a no-arguments method which returns a Collector ready-made to create your List.
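Put together, those steps might look something like this; the file name is illustrative, and the filter() call is only a guard against blank lines:

import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class StreamLongs {
    public static void main(String[] args) throws IOException {
        List<Long> values;
        try (BufferedReader reader = Files.newBufferedReader(Paths.get("longs.txt"))) {
            values = reader.lines()                               // Stream<String>, one element per line
                           .map(line -> line.trim().split("\\s+")) // Stream<String[]>
                           .flatMap(Arrays::stream)               // back to Stream<String>, one token each
                           .filter(token -> !token.isEmpty())     // skip tokens from blank lines
                           .map(Long::valueOf)                    // Stream<Long>
                           .collect(Collectors.toList());         // terminal operation: a List<Long>
        }
        System.out.println(values.size() + " longs read");
    }
}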
Obviously you will have different requirements and you will do different things from my suggestions. I think similarly that a buffered writer will provide the fastest way to write such numbers to a text file, for the same reason Paul C gave.
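For the writing side, a sketch of a Formatter wrapped around a BufferedWriter (again, the names and file handling are illustrative):

import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Formatter;
import java.util.List;

public class WriteLongsText {
    static void write(Path file, List<Long> values) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(file);
             Formatter formatter = new Formatter(writer)) {
            for (long value : values) {
                formatter.format("%d%n", value); // one number per line
            }
        }
    }
}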

Do you have to use a text file? You might find you are quicker with a random access file instead.
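A sketch of how a random access file might hold the numbers; storing each long in a fixed eight-byte slot is my assumption, not something specified above:

import java.io.IOException;
import java.io.RandomAccessFile;

public class RandomAccessLongs {

    // Because every long occupies exactly 8 bytes, the n-th value can be
    // read or written directly by seeking to n * 8.
    static void writeAt(RandomAccessFile file, long index, long value) throws IOException {
        file.seek(index * Long.BYTES);
        file.writeLong(value);
    }

    static long readAt(RandomAccessFile file, long index) throws IOException {
        file.seek(index * Long.BYTES);
        return file.readLong();
    }

    public static void main(String[] args) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile("longs.bin", "rw")) {
            for (long i = 0; i < 10; i++) {
                writeAt(file, i, i * i);
            }
            System.out.println(readAt(file, 7)); // prints 49
        }
    }
}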
 
Kevin Simonson
Ranch Hand
Posts: 224

Campbell Ritchie wrote:Do you have to use a text file? You might find you are quicker with a random access file instead.


I don't have to use a text file. I followed the link you gave me and came up with the following code. I know I said my values were longs, but I was treating them as strings, and rather than convert them from hexadecimal strings into longs, I just left them as strings and did string compares.

Is this close to what you had in mind? I ran it three times on a file containing the same strings, no longer separated by newlines, ran my original program three times on the strings separated by newlines, and averaged the times: the new version is about 2.3 times faster. That's a significant improvement.
 
Kevin Simonson
Ranch Hand
Posts: 224

Kevin Simonson wrote:

Campbell Ritchie wrote:Do you have to use a text file? You might find you are quicker with a random access file instead.


I don't have to use a text file. I followed the link you gave me and came up with the following code. I know I said my values were longs, but I was treating them as strings, and rather than convert them from hexadecimal strings into longs, I just left them as strings and did string compares.


Oops! Sloppy code! I fixed it here:

 
Campbell Ritchie
Marshal
Posts: 61713
Did you find anything useful? Why are you using the millisecond method rather than this? I don't think you are going to find out anything useful about the timings until you are well into the thousands. Write 10000 numbers twice into a file before you start the timings (you can delete the files later); that is to make sure any just-in-time optimisations have kicked in before you start timing. Then try several runs with 1000000 numbers, along the lines of the sketch below. You will obviously have to use a different technique for a random access file; that code would be easier to write with a Formatter. Once you have written that sort of file, consider using it for testing the reading.
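A rough sketch of that warm-up-then-measure idea; System#nanoTime(), the temporary files, and the exact run counts are illustrative choices, not prescriptions:

import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;

public class TimingHarness {

    public static void main(String[] args) throws IOException {
        Random random = new Random();

        // Warm-up: two small runs whose timings are thrown away, so that any
        // just-in-time optimisation has happened before measurement starts.
        for (int i = 0; i < 2; i++) {
            runOnce(10_000, random);
        }

        // Measured runs with a much larger file.
        for (int run = 1; run <= 5; run++) {
            long start = System.nanoTime();
            int count = runOnce(1_000_000, random);
            double seconds = (System.nanoTime() - start) / 1e9;
            System.out.printf("run %d: wrote and read %d longs in %.3f s%n", run, count, seconds);
        }
    }

    // Writes count random longs to a temporary file, reads them back, deletes the file.
    private static int runOnce(int count, Random random) throws IOException {
        Path file = Files.createTempFile("longs", ".txt");
        try (PrintWriter out = new PrintWriter(Files.newBufferedWriter(file))) {
            for (int i = 0; i < count; i++) {
                out.println(random.nextLong());
            }
        }
        List<Long> values = Files.readAllLines(file).stream()
                                 .map(Long::valueOf)
                                 .collect(Collectors.toList());
        Files.delete(file);
        return values.size();
    }
}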