
File Existence Verification

 
Greenhorn
Posts: 4
I have a list of file names & paths in a master file.

I need to verify that each of these files exists.
If the file exists, I write the path and name to one output file (Exists.txt).
If the file does not exist, I write the path and name to a different output file (NotExist.txt). (I need at least this output file; the list of files that DO exist is a nice-to-have but not absolutely necessary.)

I need a faster way than reading a line of the input file, verifying the location and existence of that file, and repeating.

Can java.io.File.listFiles verify the existence of a large list of files all at the same time, and list the files that do not exist?

java documentation says :

listFiles()
Returns an array of abstract pathnames denoting the files in the directory denoted by this abstract pathname.

This tool will be run against a list of a minimum of several hundred thousand files each time it is used. 

Thank You,
Tom
 
Campbell Ritchie
Marshal
Posts: 56600
172
Welcome to the Ranch

I would suggest you try the simplest available technique first. If you have lots of files to look for, consider timing your code. The reason for doing the reading twice is that the first run makes all the optimisations kick in; look up just-in-time compilation for more details. The reason I have three local variables is to allow for the time taken by the nanoTime method. You may find you can divide the result by 1,000 and print it in μs instead.

Now, maybe you would like some code for the reading and checking. Try here. You may want methods like Files#exists(). I have never tried that sort of code, so I don't know how well it will work, or whether it will work at all.
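A minimal sketch of the timing idea described above; the countExisting helper and the class name are my own illustration, not code from this thread, and it assumes the master file holds one path per line:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class TimingDemo {
    // Hypothetical helper: counts how many of the listed files exist.
    static long countExisting(Path listFile) throws IOException {
        try (Stream<String> lines = Files.lines(listFile)) {
            return lines.map(Paths::get).filter(Files::exists).count();
        }
    }

    public static void main(String[] args) throws IOException {
        Path listFile = Paths.get(args.length > 0 ? args[0] : "master.txt");
        if (!Files.exists(listFile)) {
            System.out.println("No list file: " + listFile);
            return;
        }
        countExisting(listFile);              // first run: lets the JIT optimisations kick in
        long before = System.nanoTime();      // three locals, so the nanoTime calls
        long found = countExisting(listFile); // themselves stay out of the
        long after = System.nanoTime();       // measured interval as far as possible
        System.out.printf("%d files found in %d us%n", found, (after - before) / 1_000);
    }
}
```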
Most of the code is explained in the link I gave. The allMatch method tests for a universal quantification using a method reference to the exists method, and map uses the Stream<String> to create a Path object from each String via a method reference to Paths#get(). You may find you have to catch more kinds of Exception.
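For concreteness, a sketch of the allMatch approach just described (my own code, not tested against a large list):

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class AllExist {
    // map turns each String into a Path via a method reference to Paths#get;
    // allMatch then tests the universal quantification with Files#exists.
    static boolean allExist(Stream<String> names) {
        return names.map(Paths::get)
                    .allMatch(Files::exists);
    }

    public static void main(String[] args) {
        // "." always exists, so any false here comes from the bogus name
        System.out.println(allExist(Stream.of(".", "no-such-file.xyz")));
    }
}
```

Note that allMatch short-circuits at the first missing file, so it only tells you *whether* every file exists; for listing *which* files are missing, a filter on the negated predicate would fit better.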
 
Campbell Ritchie
Marshal
Posts: 56600
172
The listFiles method seems to return a File[], and it is not particularly fast to search an array to see whether a particular element exists. You can try putting all the File objects into a set, but the File class is regarded as legacy code nowadays.
The listFiles method gives you an array of the Files in a particular directory, so I am not convinced it is really what you want.
 
Tom Tumelty
Greenhorn
Posts: 4
What would you suggest to speed up file verification?
 
Tom Tumelty
Greenhorn
Posts: 4
Correction:

What would you suggest to speed up file verification, considering that all the path and file data cannot be in memory at the same time?
 
Tom Tumelty
Greenhorn
Posts: 4
Mr Campbell Ritchie,
Thank you for the welcome to the Ranch!
 
Sheriff
Posts: 22846
43
Tom Tumelty wrote:What would you suggest to speed up file verification? Considering that all the path and file data cannot be in memory at same time ?


I'm confused about your estimate of memory requirements. Finding out whether a file exists takes essentially zero memory -- you just need a few very small objects. (That "zero" means "essentially zero compared to the 1 gigabyte of memory you have access to".) And even reading a list of path names from a file, you don't need to store them all in memory at the same time.

Or perhaps you've chosen a really bad algorithm; if you're doing something which uses a hell of a lot of memory, then replacing it with a straightforward algorithm might reduce the running time as well.
 
Campbell Ritchie
Marshal
Posts: 56600
172
Tom Tumelty wrote:. . . What would you suggest to speed up file verification?
Start by working out how long file verification takes. Decide how long a delay you will tolerate for 100,000 entries or 1,000,000 entries. Repeat the procedure several times (with timings) and then decide whether you have a performance problem at all. In the thread I linked to previously, P‑YS tried my suggestions and concluded that lines() was faster than my suggestion.
Considering that all the path and file data cannot be in memory at same time ?
As Paul C said, why not? If you are reading 1,000,000 file names and creating Path objects and putting them into a List, you will probably occupy a few hundred MB of RAM. The default heap space capacity (maximum, not actual use) is 25% of available RAM, so most PCs will have at least 1GB available, which you can increase if necessary. If you are simply verifying existence you may need less than 1kB at any one time.

I forgot about it last night, but Streams have methods which can be used for partitioning their input depending on a predicate, so you can create a Map<Boolean, List<Path>> where the two Boolean keys give you Lists of Paths which do or don't exist. Go through the Stream documentation and its collect() method, and Collector and Collectors, particularly the partitioningBy() method. Or ask again and somebody will tell you.

I have also noticed in the Stream documentation that you have to close a Stream created from Files#lines(), which probably means try‑with‑resources.
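A sketch putting those last two points together (my own code, assuming one path per line): Collectors.partitioningBy splits the paths on Files#exists, and try-with-resources closes the Stream from Files#lines.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class PartitionDemo {
    // Partitions the listed paths into those that exist (key true) and
    // those that don't (key false). The Stream from Files#lines must be
    // closed, which the try-with-resources takes care of.
    static Map<Boolean, List<Path>> partition(Path masterList) throws IOException {
        try (Stream<String> lines = Files.lines(masterList)) {
            return lines.map(Paths::get)
                        .collect(Collectors.partitioningBy(Files::exists));
        }
    }

    public static void main(String[] args) throws IOException {
        Path list = Paths.get(args.length > 0 ? args[0] : "master.txt");
        if (Files.exists(list)) {
            System.out.println("missing: " + partition(list).get(false));
        }
    }
}
```

Unlike a line-at-a-time loop, the resulting Map holds every Path in memory, which is where the few-hundred-MB estimate above comes in.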
Tom Tumelty wrote:Thank You for welcome
My pleasure
 
Campbell Ritchie
Marshal
Posts: 56600
172
Another thing about Streams: if your Stream runs into 10⁵s of elements, consider making the Stream run in parallel; you will probably get faster performance, but I don't think the file reading can be parallelised, and that is often the slowest part of the process.
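A sketch of that idea (mine, not benchmarked): read the list sequentially, then run the existence checks through a parallel stream.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class ParallelCheck {
    // The read itself stays sequential; only the per-path existence
    // checks are spread across the common fork-join pool.
    static long countMissing(Path masterList) throws IOException {
        List<String> names = Files.readAllLines(masterList); // sequential I/O
        return names.parallelStream()                        // parallel checks
                    .map(Paths::get)
                    .filter(p -> !Files.exists(p))
                    .count();
    }

    public static void main(String[] args) throws IOException {
        Path list = Paths.get(args.length > 0 ? args[0] : "master.txt");
        if (Files.exists(list)) {
            System.out.println(countMissing(list) + " files missing");
        }
    }
}
```

Whether this actually beats the sequential version depends on the file system, so it is worth timing both, as suggested earlier in the thread.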
 