I have a web app for writing reports. The annexes to the reports are scanned documents of a few pages per report, which are stored in the database. My client's scanner is producing scanned documents of roughly 200 KB per page, so many of the annexes are in the 1-1.5 MB range. This somewhat annoys me, and I'm worried about how big the db is going to become at this rate. We checked the scanner settings, and this was the lowest setting that produced acceptable quality (300 dpi, I believe). I don't really know if this file size is normal for a scanner. So my question is: does this sound normal to you, or should something be done? The client is a very small business, and I'd guess that once their business really gets up and running there might be 500-700 reports per year, so the db would grow accordingly.
So if I understand your situation right, you expect roughly 1 GB of new data per year (500-700 reports at 1-1.5 MB each). That in itself doesn't sound unmanageable to me; databases can handle a lot, lot more than that. You just need to size your hardware accordingly, i.e. ensure you have the necessary amount of disk space for your database.
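To make that estimate concrete, here is a minimal sketch of the arithmetic in Python. The function name is made up for illustration, and the inputs are just the figures from the question (500-700 reports per year, 1-1.5 MB per report), so treat the result as a rough range rather than a measurement.

```python
# Rough yearly growth estimate, using only the figures from the question.
# yearly_growth_gb is a hypothetical helper name, not part of any library.
MB_PER_GB = 1024


def yearly_growth_gb(reports_per_year: int, mb_per_report: float) -> float:
    """Return the expected new data per year, in GB."""
    return reports_per_year * mb_per_report / MB_PER_GB


low = yearly_growth_gb(500, 1.0)   # fewest reports, smallest annexes
high = yearly_growth_gb(700, 1.5)  # most reports, largest annexes
print(f"{low:.2f} to {high:.2f} GB per year")  # prints "0.49 to 1.03 GB per year"
```

So the upper end of the range is where the "roughly 1 GB per year" figure comes from.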
However, to keep the size from growing indefinitely, you might want to purge or otherwise archive old records, if that is possible. Say that your clients are required to keep the data for 5 years at most. If that is the case, you can delete reports older than that, and you can also easily calculate the total amount of space your database will require (five years times the amount of data per year).
You should also be prepared for the situation in which your client's business grows, and allocate a bit more space for the database than the projection suggests ("a bit more" is not very specific, so, say, allocate twice as much). You could also monitor the rate at which your clients generate new data and, if it grows above projections, contact them to arrange the necessary changes. It would be great if you had a contract that specified the maximum amount of database space they can use and the procedure to follow if it is exceeded (say, have the clients pay for additional storage), but we're getting beyond the realm of databases here.
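The sizing rule from the last two paragraphs can be sketched in a few lines of Python. The function names, the 5-year retention period and the factor of two are only the example values used above, not requirements; plug in whatever your clients actually need.

```python
# Sketch of the sizing rule: retention period times yearly growth, times headroom.
# Both helpers are hypothetical names introduced for illustration.


def required_space_gb(gb_per_year: float, retention_years: int,
                      headroom_factor: float = 2.0) -> float:
    """Space to allocate if records older than retention_years are purged."""
    return gb_per_year * retention_years * headroom_factor


def exceeds_projection(observed_gb_per_year: float,
                       projected_gb_per_year: float) -> bool:
    """True when the measured growth rate is above what was planned for."""
    return observed_gb_per_year > projected_gb_per_year


# About 1 GB per year kept for 5 years, with twice the space allocated.
print(required_space_gb(1.0, 5))      # prints 10.0
print(exceeds_projection(1.4, 1.0))   # prints True: time to contact the client
```

How you measure the observed growth rate depends on your database; most have some way to report table or database size, which you could sample periodically.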
Yeah, I guess it's OK. I was thinking of archiving old reports, so that's probably what I'm going to do. It just seems silly and sort of wrong to have scanned documents that are only a few pages but over a megabyte. Then again, we haven't found a scanner setting that reduces the size. I think I'm just going to let it be, since the overall size won't be a problem for the database. I'll check with the hosting provider for the total space available to us, but I'm sure it is sufficient. Thanks for your answer!
Scanned documents take much more space than the original (text) form. If you're scanning the documents in color but could actually make do with greyscale, set up your scanner to scan in greyscale; it might result in some reduction in size.
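To give a feel for why the color mode matters, here is a small Python sketch that computes the uncompressed size of one page at 300 dpi. It assumes a US Letter page (8.5 x 11 inches) and ignores any row padding or file headers; the helper name is made up. The scanner compresses the image, so the real files are far smaller (about 200 KB per page in your case) and the savings after compression will not follow these ratios exactly.

```python
# Uncompressed raster size of one scanned page at a given resolution.
# raw_page_bytes is a hypothetical helper; US Letter page size is an assumption.


def raw_page_bytes(width_in: float, height_in: float, dpi: int,
                   bits_per_pixel: int) -> int:
    """Return the raw pixel data size in bytes, ignoring padding and headers."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


modes = {"24-bit color": 24, "8-bit greyscale": 8, "1-bit black and white": 1}
for name, bits in modes.items():
    size = raw_page_bytes(8.5, 11, 300, bits)
    print(f"{name}: {size / 1_000_000:.1f} MB")
```

Before compression, greyscale carries a third of the data of full color, and pure black and white carries a twenty-fourth, which is why the mode is usually the first setting worth checking.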
It looks like different tools encode images inside PDFs differently, which might result in slightly bigger or smaller files (see this). Different PDF tools might therefore produce different file sizes from the same image. I wouldn't expect much of a difference, though. Most of the image formats are lossy, meaning that you can get a smaller image, but only at the cost of reduced sharpness of the scanned image, which might make it hard to read.