
Sorting files in folder by last modified date

 
martin diaz
Greenhorn
Posts: 5
Hi, I'm trying to figure out how to solve the following problem.
I have to save some particular information about PDF files into a database.
There are lots of folders with tons of PDF files to be analyzed, and the extracted info has to be saved.
Anyway... how can I make the process incremental? I need the program to be able to stop at some point
and later continue analyzing the remaining files in the folders (I don't want the program to start all over again from zero).
My ideas are:
sorting all the files by date using an Apache Commons sort method (I've seen it somewhere) and saving the last file analyzed in the database, so the program can continue from there,
OR
just adding an extra character to the name of each PDF file that has been analyzed, for example renaming xx-xxx.pdf to xx-xxxa.pdf
(I don't think that's the way to do it, but...)

Any suggestions?
Thanks
 
Knute Snortum
Sheriff
Posts: 4279
Welcome to the Ranch.

I think idea one is on the right track. Have you tried writing out, on paper or in a Notepad program, how it would work? Do you need to sort by modification date, or just save a list of file names?
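
To make idea one concrete, here is a minimal sketch using only the standard java.nio.file API rather than Apache Commons. The class name IncrementalScan, the method pendingFiles and the idea of storing the checkpoint as a single timestamp are assumptions for illustration, not something from the original post. It lists the PDF files under a folder that were modified after the saved checkpoint, oldest first.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

public class IncrementalScan {

    /**
     * Returns the PDF files under root whose last-modified time is strictly
     * after the checkpoint, sorted oldest first (ties broken by path).
     */
    static List<Path> pendingFiles(Path root, FileTime checkpoint) throws IOException {
        // Read each modification time once, so the sort sees stable values.
        Map<Path, FileTime> times = new HashMap<>();
        try (Stream<Path> walk = Files.walk(root)) {
            Iterator<Path> it = walk.iterator();
            while (it.hasNext()) {
                Path p = it.next();
                boolean isPdf = p.getFileName() != null
                        && p.getFileName().toString().toLowerCase().endsWith(".pdf");
                if (isPdf && Files.isRegularFile(p)) {
                    FileTime modified = Files.getLastModifiedTime(p);
                    if (modified.compareTo(checkpoint) > 0) {
                        times.put(p, modified);
                    }
                }
            }
        }
        List<Path> pending = new ArrayList<>(times.keySet());
        Comparator<Path> byTime = Comparator.comparing((Path p) -> times.get(p));
        pending.sort(byTime.thenComparing(Comparator.naturalOrder()));
        return pending;
    }
}
```

After each file is processed, the caller would save that file's modification time as the new checkpoint. Two caveats with this approach: a file that shares the exact timestamp of the checkpoint is skipped, and a file copied into the folder with an old modification date is never picked up.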
 
Stefan Evans
Bartender
Posts: 1837
Another idea might be to have an "incoming" and a "processed" directory.
Your program picks up anything in "incoming", processes it, and puts it in "processed".

Some other things to consider:
- Will it matter if a file gets processed multiple times?
- What happens to files that can't be processed? Will they (or should they) block any other files from being dealt with?
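
A rough sketch of the two-directory idea, assuming plain java.nio.file. The names InboxProcessor, drain and FileHandler are hypothetical, and the extra "failed" directory is one possible answer to the second question above, not the only one: a file that cannot be processed is parked there so it does not block the others.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class InboxProcessor {

    /** Hypothetical hook for the real work (read the PDF, write to the database). */
    interface FileHandler {
        void handle(Path pdf) throws Exception;
    }

    /**
     * Handles every *.pdf in incoming, then moves it to processed on success
     * or to failed on error. Returns the number of files processed successfully.
     */
    static int drain(Path incoming, Path processed, Path failed, FileHandler handler)
            throws IOException {
        Files.createDirectories(processed);
        Files.createDirectories(failed);

        // Snapshot the listing first so moving files does not disturb the iteration.
        List<Path> pdfs = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(incoming, "*.pdf")) {
            for (Path pdf : stream) {
                pdfs.add(pdf);
            }
        }

        int done = 0;
        for (Path pdf : pdfs) {
            Path targetDir;
            try {
                handler.handle(pdf);
                targetDir = processed;
                done++;
            } catch (Exception e) {
                targetDir = failed; // park the file; the remaining files still run
            }
            // Without REPLACE_EXISTING, a name clash in the target throws
            // FileAlreadyExistsException instead of overwriting an earlier file.
            Files.move(pdf, targetDir.resolve(pdf.getFileName()));
        }
        return done;
    }
}
```

Because a file leaves "incoming" as soon as it is handled, a restart simply picks up whatever is still there, so nothing is processed twice as long as the move succeeds.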
 
martin diaz
Greenhorn
Posts: 5
Hey! On the sorting question: well, it's not necessary to sort by date, but it came to my mind as the best option. Let's say I have 1000 PDF files. I need to save some metadata in a database (size, page count, last modified date, file name).
The thing is, at my work there is a "server", just a Windows file system with tons of PDFs, and it is growing daily. Sooner or later the file system will explode (literally? :P ). I would leave the program running for certain periods of time,
so it should not process a file twice!
The folder proposal is a very simple and good option indeed! If I can get permission to modify the folders, I'll go with that solution.

thanks for the help!
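
For reference, three of the four metadata items can be read with the standard library alone. This is only a sketch: the class name PdfFileInfo is made up for illustration, and the page count is left out on purpose because it needs a PDF parsing library, which is outside what this example assumes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.time.Instant;

public class PdfFileInfo {
    final String fileName;
    final long sizeBytes;
    final Instant lastModified;

    private PdfFileInfo(String fileName, long sizeBytes, Instant lastModified) {
        this.fileName = fileName;
        this.sizeBytes = sizeBytes;
        this.lastModified = lastModified;
    }

    /** Reads name, size and last-modified time in one file system call. */
    static PdfFileInfo read(Path pdf) throws IOException {
        BasicFileAttributes attrs = Files.readAttributes(pdf, BasicFileAttributes.class);
        return new PdfFileInfo(
                pdf.getFileName().toString(),
                attrs.size(),
                attrs.lastModifiedTime().toInstant());
    }
}
```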


 
Liutauras Vilda
Sheriff
Posts: 4918
martin diaz wrote: just adding an extra character to the name of each PDF file that has been analyzed, for example renaming xx-xxx.pdf to xx-xxxa.pdf
(I don't think that's the way to do it, but...)

As you rightly pointed out yourself, amending the file name or (worst case) the content is not a good idea at any point in data analysis. The original files have to remain as they are at all times.
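
One way to remember which files are done without touching the originals is to keep a separate record of processed paths. The sketch below uses a plain text file as that record; ProcessedLedger and its methods are hypothetical names, and in practice the same check could be a lookup against the database table that already stores the file name.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashSet;
import java.util.Set;

public class ProcessedLedger {
    private final Path ledgerFile;
    private final Set<String> done = new HashSet<>();

    /** Loads previously recorded paths, if the ledger file already exists. */
    ProcessedLedger(Path ledgerFile) throws IOException {
        this.ledgerFile = ledgerFile;
        if (Files.exists(ledgerFile)) {
            done.addAll(Files.readAllLines(ledgerFile, StandardCharsets.UTF_8));
        }
    }

    boolean isDone(Path pdf) {
        return done.contains(key(pdf));
    }

    /** Records the file as processed and appends it to the ledger on disk. */
    void markDone(Path pdf) throws IOException {
        String k = key(pdf);
        if (done.add(k)) {
            Files.write(ledgerFile,
                    (k + System.lineSeparator()).getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }
    }

    // One absolute, normalized path per line; the PDF itself is never modified.
    private static String key(Path pdf) {
        return pdf.toAbsolutePath().normalize().toString();
    }
}
```

On restart, the program creates the ledger from the same file and skips every path for which isDone returns true.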
 
Anjaneya Reddy
Greenhorn
Posts: 11
Hi,
I have some questions about how the data in the PDFs is created.

1. Is the data in the PDFs dynamic or static?

2. If it is generated at runtime, is it a one-time creation or a batch process?

In my opinion:

If the data is created dynamically, we can create each PDF with a date-time suffix in its name and write the data into it. That way we can easily separate the PDF files by date and time.

Then we can write a Java component that reads the data from the PDF files based on date and time.

Once the files are read (in parallel), we can push the data into the database.

Thanks.




 