I need to verify that each of these files exists.
If a file exists, I write its path and name to one output file (Exists.txt).
If a file does not exist, I write its path and name to a different output file (NotExist.txt). (I need at least this output file. The list of files that DO exist is nice to have but not absolutely necessary.)
I need a faster way than reading a line of the input file, verifying the location and existence of that file, and then repeating.
Can java.io.File.listFiles verify the existence of a large list of files all at the same time, and list the files that do not exist?
The Java documentation says:
Returns an array of abstract pathnames denoting the files in the directory denoted by this abstract pathname.
This tool will be run against a list of a minimum of several hundred thousand files each time it is used.
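For reference, the line-by-line approach described in the question can be sketched as follows. This is a minimal sketch, assuming a UTF-8 input file with one path per line and the output names Exists.txt and NotExist.txt; the class and method names (CheckFiles, check, exists) are hypothetical. Only the current line is held in memory, so memory use stays flat however long the list is.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CheckFiles {

    /** Returns false for strings that are not valid paths on this platform, instead of throwing. */
    static boolean exists(String name) {
        try {
            return Files.exists(Paths.get(name));
        } catch (InvalidPathException e) {
            return false;
        }
    }

    /**
     * Reads one path per line from input and copies each line to existsOut or notExistOut.
     * Only the current line is held in memory. Returns the number of missing files.
     */
    public static long check(Path input, Path existsOut, Path notExistOut) throws IOException {
        long missing = 0;
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8);
             BufferedWriter found = Files.newBufferedWriter(existsOut, StandardCharsets.UTF_8);
             BufferedWriter notFound = Files.newBufferedWriter(notExistOut, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines in the list
                }
                if (exists(line)) {
                    found.write(line);
                    found.newLine();
                } else {
                    notFound.write(line);
                    notFound.newLine();
                    missing++;
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        long missing = check(Paths.get(args[0]), Paths.get("Exists.txt"), Paths.get("NotExist.txt"));
        System.out.println(missing + " file(s) not found");
    }
}
```

Run it as `java CheckFiles list.txt`; the buffered writers avoid a disk write per line.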
I would suggest you try the simplest available technique first. If you have lots of files to look for, consider timing your code. The reason for doing the reading twice is that it lets all the optimisations kick in; look up just-in-time compilation for more details. The reason I have three local variables is to allow for the time taken by the nanoTime method itself. You may find it more convenient to divide the result by 1000 and print it in μs instead.
Now, maybe you would like some code for the reading and checking. Try here. You may want methods like Files#exists(). I have never tried that sort of code, so I don't know how well it will work, or whether it will work at all.
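A minimal sketch of the timing technique described in this post, assuming the workload is counting how many listed paths exist; it is not the poster's original code, and the names Timing, timeNanos and countExisting are hypothetical. Three System.nanoTime() readings are taken so the cost of the call itself (t1 - t0) can be subtracted, and the measurement runs twice so the second figure reflects JIT-compiled code (and a warm OS file cache).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class Timing {

    /** Runs task once; returns elapsed nanoseconds minus the estimated cost of one nanoTime() call. */
    public static long timeNanos(Runnable task) {
        long t0 = System.nanoTime();
        long t1 = System.nanoTime(); // t1 - t0 estimates the cost of nanoTime() itself
        task.run();
        long t2 = System.nanoTime();
        return (t2 - t1) - (t1 - t0);
    }

    /** Hypothetical workload: counts how many paths listed (one per line) in listFile exist. */
    static long countExisting(Path listFile) {
        try (Stream<String> lines = Files.lines(listFile)) {
            return lines.filter(s -> !s.trim().isEmpty())
                        .map(Paths::get)
                        .filter(Files::exists)
                        .count();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path listFile = Paths.get(args[0]);
        // The first run warms up the JIT compiler; the second is usually more representative.
        for (int run = 1; run <= 2; run++) {
            long ns = timeNanos(() -> countExisting(listFile));
            System.out.printf("run %d: %d us%n", run, ns / 1000);
        }
    }
}
```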
Most of the code is explained in the link I gave. The allMatch method tests for universal quantification (whether every element satisfies a predicate), here using a method reference to the exists method. The map method turns the Stream<String> into a stream of Paths, creating a Path object from each String with a method reference to Paths#get(). You may find you have to catch more kinds of Exception.
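A minimal sketch of that pipeline, assuming one path per line in the input file; the class and method names are hypothetical. Note that allMatch() short-circuits at the first missing file and only answers "do they all exist?", so it does not tell you which files are missing. Paths#get() can throw the unchecked InvalidPathException for malformed strings, which is one of the extra exceptions you may meet.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class AllExist {

    /** Returns true only if every path listed (one per line) in listFile exists. */
    public static boolean allExist(Path listFile) throws IOException {
        // try-with-resources closes the Stream, and with it the underlying file
        try (Stream<String> lines = Files.lines(listFile)) {
            return lines.map(Paths::get)           // String -> Path via Paths#get()
                        .allMatch(Files::exists);  // stops at the first missing file
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(allExist(Paths.get(args[0])) ? "all files exist" : "at least one file is missing");
    }
}
```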
The listFiles method gives you an array of the Files in a particular directory, so I am not convinced it is really what you want.
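To illustrate what listFiles() does, here is a small sketch (class and method names hypothetical). It returns the entries of one directory, or null if the File is not a directory or an I/O error occurs, so on its own it knows nothing about your list of expected file names.

```java
import java.io.File;
import java.util.Arrays;

public class ListFilesDemo {

    /** Returns the sorted names of the entries in dir, or an empty array if listFiles() returns null. */
    public static String[] entryNames(File dir) {
        File[] entries = dir.listFiles();
        if (entries == null) {
            return new String[0]; // not a directory, or it could not be read
        }
        return Arrays.stream(entries).map(File::getName).sorted().toArray(String[]::new);
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        System.out.println(Arrays.toString(entryNames(dir)));
    }
}
```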
What would you suggest to speed up file verification, considering that all the path and file data cannot be in memory at the same time?
Thank you for the welcome to the Ranch!
Tom Tumelty wrote: What would you suggest to speed up file verification? Considering that all the path and file data cannot be in memory at same time ?
I'm confused about your estimate of memory requirements. Finding out whether a file exists takes essentially zero memory -- you just need a few very small objects. (That "zero" means "essentially zero compared to the 1 gigabyte of memory you have access to".) And even reading a list of path names from a file, you don't need to store them all in memory at the same time.
Or perhaps you've chosen a really bad algorithm; if you're doing something which uses a hell of a lot of memory then perhaps replacing that by a straightforward algorithm might reduce the running time as well.
Start by working out how long file verification takes. Decide how long a delay you will tolerate for 100,000 entries or 1,000,000 entries. Repeat the procedure several times (with timings) and then decide whether you have a performance problem at all. In the thread I linked to previously, P‑YS tried my suggestions and concluded that lines() was faster than my suggestion.
Tom Tumelty wrote:. . . What would you suggest to speed up file verification?
As Paul C said, why not? If you are reading 1,000,000 file names, creating Path objects and putting them into a List, you will probably occupy a few hundred MB of RAM. The default heap capacity (the maximum, not actual use) is 25% of physical RAM, so most PCs will give you at least 1GB, and you can increase it with the -Xmx option if necessary. If you are simply verifying existence one file at a time, you may need less than 1kB at any one time.
Considering that all the path and file data cannot be in memory at same time ?
I forgot about it last night, but Streams have methods which can partition their input according to a predicate, so you can create a Map<Boolean, List<Path>> where the two Boolean keys give you Lists of the Paths which do and don't exist. Go through the Stream documentation and its collect() method, then Collector and Collectors, particularly Collectors.partitioningBy(). Or ask again and somebody will tell you.
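A minimal sketch of the partitioning idea, assuming one path per line in the input file; the class name PartitionPaths is hypothetical and the output names are the ones from the question. Collectors.partitioningBy() with Files::exists as the predicate gives a map whose true list holds the existing Paths and whose false list holds the missing ones; the map always contains both keys. Both lists stay in memory until the stream finishes, which by the estimate in the previous post is still modest for a million entries.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class PartitionPaths {

    /** Splits the paths listed in listFile into existing (key true) and missing (key false). */
    public static Map<Boolean, List<Path>> partition(Path listFile) throws IOException {
        // Files.lines() must be closed, hence try-with-resources
        try (Stream<String> lines = Files.lines(listFile)) {
            return lines.filter(s -> !s.trim().isEmpty())   // ignore blank lines
                        .map(Paths::get)
                        .collect(Collectors.partitioningBy(Files::exists));
        }
    }

    private static List<String> toLines(List<Path> paths) {
        return paths.stream().map(Path::toString).collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        Map<Boolean, List<Path>> result = partition(Paths.get(args[0]));
        // Write each group to its own output file, one path per line
        Files.write(Paths.get("Exists.txt"), toLines(result.get(true)));
        Files.write(Paths.get("NotExist.txt"), toLines(result.get(false)));
    }
}
```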
I have also noticed in the Stream documentation that you have to close a Stream created from Files#lines(), which probably means try‑with‑resources.