Win a copy of The Little Book of Impediments (e-book only) this week in the Agile and Other Processes forum!
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic

Mssql vs pure file save vs mongodb

 
Daniel Peichev
Greenhorn
Posts: 3
1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hello,

I am currently working on a document management system. The file storage is in one table in ms sql database ziped and saved as binary. The files are usually ms office documents pdf pictures and txt files but it is possible to have all file types in it. The files will be very big amount and the file storage will grow greatly in the feature. At all i heard its not good to save big amount of binary fles in the db because it become slower and it may have complications with the backup when the db become too big. I recieved request to put the files on the file system or at least put them in different database. So my questions are the following. Why the db will become slower (I have key to every file, I search by this key and then extract the binary)? If I want to put the files on the file system, should i save them directly to the fs and just keep a key in the db or should I use for example mongodb. I heard mongodb and its equivalents are specialized document type db which can handle big amount of data. How it is better than just pure file saving on the fs or saving in a table in ms sql? Is it faster? Is it more scalable? How are the things with the backup there?

Thanks
 
Winston Gutkowski
Bartender
Pie
Posts: 10571
64
Eclipse IDE Hibernate Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Daniel Peichev wrote:I am currently working on a document management system....

First: Welcome to JavaRanch.

Second: Excellent description of your problem. Have a cow.

Now to the answer - or rather, my answer, because it's purely my opinion:
1. The best place to store files is on a filesystem, because that's what they're designed for.
2. The best place to store data - especially data that contains relationships - is in a database.

When you're dealing with document management you have both; so my preference would be to store the document files (compressed or not) in a filesystem, and store the path (or URL) to the file in the database, along with any other data you need.

Now not everyone will agree with this, so let me explain my reasoning:
1. You can handle the storage (and backup) of your repositories (database and files) separately.
2. You can keep your database and files on separate servers if you want.
3. You can use existing system, 3rd party or Java utilities to read, compress/uncompress, backup, sync, and even version your files if you want. If you store them as BLOBs, you're pretty much forced to use SQL to retrieve them, and the code tends to be clumsy.

Now let me qualify that a little. One thing that many modern databases can do very well is "enhanced searching" on text - eg, "find me every document that contains the words "corporate" and "philosophy" within 5 words of each other" - so if that's a major requirement of your system, or if you want to be able to use a search engine like Google, you might have to rethink my suggestion a bit.
Unfortunately, I've been out of the document management biz for a while now, so I can't really comment; but my gut feel would still be to keep document files and data about those files (even if it includes some or all of the text) separate.

HIH

Winston
 
With a little knowledge, a cast iron skillet is non-stick and lasts a lifetime.
  • Post Reply
  • Bookmark Topic Watch Topic
  • New Topic