
how to make sure a file is complete when opening it

 
Bucsie Dusca
Ranch Hand
Posts: 31
Hi
I have an application that continuously scans a directory for files. As soon as a file is found, processing begins.
The problem is that these files can be very large. What if a file is being transferred over the network? It may be only partially downloaded when the application tries to open it, and errors might occur. So how can I make sure that the file is completely there before I go ahead and process it?
I was thinking of trying to .rename() it for a while (that shouldn't work if the file is still being written) ... but maybe it's not a good idea to specify a fixed time, since file size and network lag may vary.
Thanks in advance...
 
Srinivasa Raghavan
Ranch Hand
Posts: 1228
In one of my projects we handled the same situation like this.
You can keep a status field for each file, in a database or somewhere in the class. Whenever a new file is created in the source folder, a new row is added to the database for that file. The other thread or application that wants to use the file can then check the status to learn whether the file has been transferred completely or is still being transferred.

The other way is to compare the file size on the source and destination sides.
[ April 07, 2005: Message edited by: Srinivasa Raghavan ]
 
Bucsie Dusca
Ranch Hand
Posts: 31
Thanks for the reply, but the thing is, my app doesn't really know what files to expect or what size they should be. It doesn't know where the files come from. It just scans for them...
But I think there should be some information that a stream is still accessing/writing the file ... I mean, that someone/something has that file open.
...
 
Horatio Westock
Ranch Hand
Posts: 221
Some ideas:

- Before starting the big file upload, create some kind of 'lock' file, or a naming scheme (pre/postfix '_incomplete'?). When the upload is complete, delete the lock file (or remove the pre/postfix). The reading process then just needs to check that there is no lock present before consuming the file.

- Pre/postfix the filename with the size, then check before using the file.

Of course, these would only work if you have control over the creating process. If the creating process is an ftp server or similar, then you probably couldn't do this. In that case, you could perhaps monitor the log output from the server?
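
A minimal sketch of the lock-file / naming-scheme check on the reading side. The `.lock` sibling file and the `_incomplete` suffix are hypothetical conventions that the creating process would have to follow, and the sketch assumes Java 7 or later for `java.nio.file`:

```java
import java.nio.file.Files;
import java.nio.file.Path;

public class MarkerCheck {
    // Hypothetical conventions agreed with the writing process.
    static final String LOCK_SUFFIX = ".lock";
    static final String INCOMPLETE_SUFFIX = "_incomplete";

    // True if the file exists, is not itself a marker or in-progress file,
    // and no "<name>.lock" sibling is present.
    static boolean isReady(Path file) {
        String name = file.getFileName().toString();
        if (name.endsWith(INCOMPLETE_SUFFIX) || name.endsWith(LOCK_SUFFIX)) {
            return false;
        }
        Path lock = file.resolveSibling(name + LOCK_SUFFIX);
        return Files.isRegularFile(file) && !Files.exists(lock);
    }
}
```

The scanner would call `MarkerCheck.isReady(path)` for each candidate and skip anything that returns false until the next pass.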

I wrote the above before reading your latest reply. I suppose your best bet in an environment where you have no control over, or communication with, the file-creating process is to look and see if there is a write lock on the file.
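
One way to probe for a write lock from Java is `FileChannel.tryLock()`. This is only a sketch: file locking is platform dependent (on many Unix systems locks are advisory, so a writer that never takes a lock will not be detected), and the class and method names here are made up for illustration.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class LockProbe {
    // Returns true only if we could briefly take an exclusive lock ourselves.
    static boolean canLockExclusively(Path file) {
        // WRITE alone neither creates nor truncates the file.
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
            FileLock lock = ch.tryLock();
            if (lock == null) {
                return false; // another process holds an overlapping lock
            }
            lock.release();
            return true;
        } catch (OverlappingFileLockException | IOException e) {
            // Locked within this JVM, or the file could not be opened for writing.
            return false;
        }
    }
}
```

A `false` result means "try again later"; a `true` result is not proof that the transfer has finished, only that nobody currently holds a lock.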

Another option might be to have your process note the size of files at a certain time point, then compare after a set (perhaps quite large) period. If there has been no change in size during the period, then (perhaps dangerously) assume that the file has been transferred completely.
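
The size comparison could be sketched as follows. The quiet period is an assumption you would need to tune to your transfer speeds, and a stalled upload will look exactly like a finished one:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SizeStability {
    // Samples the size twice, quietMillis apart, and reports whether it stayed the same.
    static boolean isSizeStable(Path file, long quietMillis)
            throws IOException, InterruptedException {
        long before = Files.size(file);
        Thread.sleep(quietMillis);
        long after = Files.size(file);
        return before == after;
    }
}
```

In a real scanner you would more likely remember the size seen on the previous pass instead of sleeping, so that one slow file does not hold up the others.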

My last thought on the matter is, if you are working with a particular file type, and that type declares its content length in a header block, I suppose you could attempt to read this and check if the file is complete. This restricts you somewhat though.
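
As an illustration of the header idea, here is a sketch for a made-up format whose first 8 bytes are a big-endian payload length. Real formats differ, so the header layout is purely an assumption:

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class HeaderLengthCheck {
    // Hypothetical layout: 8-byte big-endian payload length, then the payload.
    static final int HEADER_BYTES = 8;

    static boolean isComplete(Path file) throws IOException {
        long actual = Files.size(file);
        if (actual < HEADER_BYTES) {
            return false; // not even the header has arrived yet
        }
        try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
            long declared = in.readLong();
            // Subtract rather than add to avoid overflow on a bogus declared length.
            return declared >= 0 && actual - HEADER_BYTES >= declared;
        }
    }
}
```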

At the end of the day, you might not currently have control of the creating process, but if it's a serious requirement of your project, then you might have to change that.
[ April 07, 2005: Message edited by: Horatio Westock ]
 
Bucsie Dusca
Ranch Hand
Posts: 31
I have no control whatsoever, nor do I know where they come from. The files just (magically) turn up! Well... actually another application (independent of mine) puts them there.
 
Mudi Appu
Ranch Hand
Posts: 44
Create a File object for the file and call its lastModified() method to check whether the file is still downloading before you start reading it. (If the difference between the last modified time and the current system time is small, the file is still being copied. If so, leave it to be picked up later.)
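
A rough sketch of that check. The threshold `quietMillis` is an assumed tuning value, and the test relies on the writer's clock and yours agreeing (not a given on network shares):

```java
import java.io.File;

public class QuietPeriodCheck {
    // True if the file exists and was last modified at least quietMillis ago.
    static boolean isQuiet(File file, long quietMillis) {
        long modified = file.lastModified(); // 0L if the file does not exist
        if (modified == 0L) {
            return false;
        }
        return System.currentTimeMillis() - modified >= quietMillis;
    }
}
```

For example, `QuietPeriodCheck.isQuiet(new File(dir, name), 60000)` would skip any file touched within the last minute.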
 
Srinivasa Raghavan
Ranch Hand
Posts: 1228
Try to play with the last modified time ..
 
Jeroen Wenting
Ranch Hand
Posts: 5093
Unless you have some form of checksum transmitted with the file, there's no way you can know if it's complete until you're done processing it.
Of course, if you encounter an error because of missing data during processing, that's a pretty good indication that the file was incomplete or corrupt.
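
If a checksum were available, verifying it might look like the sketch below. SHA-256 and a hex-encoded expected value are assumptions here; the thread does not say what, if anything, the sender provides.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ChecksumCheck {
    // Streams the file through SHA-256 and returns the digest as lowercase hex.
    static String sha256Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Compares against the digest supplied by the sender (hypothetical input).
    static boolean matches(Path file, String expectedHex)
            throws IOException, NoSuchAlgorithmException {
        return sha256Hex(file).equalsIgnoreCase(expectedHex.trim());
    }
}
```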
 
M Beck
Ranch Hand
Posts: 323
can you tell that a file is incomplete by looking at its ending? i.e., is there anything you'd expect to find at or near the end of a completely downloaded file that'd be missing from an incomplete one? if so, you could just read the last few kilobytes of a file and check for it.

after all, you say that trying to process an incomplete file gives you errors - is there any quick way to just create such an error, as a way to see if running the entire processing would fail or not?
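
a sketch of the "look at the ending" idea, assuming the format finishes with a known byte sequence. the marker is hypothetical, and this version only checks the very last bytes rather than scanning the last few kilobytes:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;
import java.util.Arrays;

public class TrailerCheck {
    // True if the last marker.length bytes of the file equal the marker.
    static boolean endsWith(Path file, byte[] marker) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long length = raf.length();
            if (length < marker.length) {
                return false;
            }
            raf.seek(length - marker.length); // jump straight to the tail
            byte[] tail = new byte[marker.length];
            raf.readFully(tail);
            return Arrays.equals(tail, marker);
        }
    }
}
```

because it seeks to the end, the cost does not depend on how large the file is.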
 
Ilja Preuss
author
Sheriff
Posts: 14112
Moving to IO...
 
Steve Taiwan
Ranch Hand
Posts: 166
I am having the same problem reading a file that is still uploading.
I tried the following code to read a very large file while the file was still uploading.
The result was that the do-while loop ended after the file upload completed.
Could someone else double check if this works?

 
S Herod
Greenhorn
Posts: 10
We have a system that is very file based (hundreds of thousands of files a day arriving by various means) and have a similar issue.

We have three approaches

1/ Monitor for files with a certain extension: uploading processes upload with a '.tmp' extension and then rename the file at the end.

2/ Monitor the file's size. If its size hasn't changed in a minute, assume the file is complete.

3/ A variation of 1/: upload a group of files to a temp directory outside the monitored directory, then rename the directory when the final upload is complete.

All three of these options work well for us.
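
Approach 1/ could be sketched like this, assuming Java 7 or later and that both sides agree on the '.tmp' extension. The class and method names are illustrative only, and `ATOMIC_MOVE` is only expected to work when source and target are on the same file system:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

public class TmpThenRename {
    // Producer side: write under a ".tmp" name, then rename in one step.
    static Path publish(Path dir, String finalName, byte[] content) throws IOException {
        Path tmp = dir.resolve(finalName + ".tmp");
        Files.write(tmp, content);
        return Files.move(tmp, dir.resolve(finalName), StandardCopyOption.ATOMIC_MOVE);
    }

    // Consumer side: ignore anything that still carries the ".tmp" extension.
    static List<Path> listReady(Path dir) throws IOException {
        List<Path> ready = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p) && !p.getFileName().toString().endsWith(".tmp")) {
                    ready.add(p);
                }
            }
        }
        return ready;
    }
}
```

The consumer never sees a half-written file under its final name, which is why this only helps when you control the uploading process.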
 
Frank Ertl
Ranch Hand
Posts: 59
Hi,

We also had a similar problem. Working with a Solaris machine, we did something like this:

  • 1. Copy the files into a temporary folder
  • 2. Do a lookup in this folder every minute or so (cronjob)
  • 3. Copy the files with a unix-script (short snippet below). This script is able to tell if there's another process running on the file.

I don't know if there's something like this on other operating systems, and I know it's not the best solution because it spoils platform independence. But if you are using a Unix system, try it. It has been working very well in production for at least half a year.

    [ April 25, 2005: Message edited by: Frank Ertl ]
     