Uh oh. Looks like it's just you and me again, Andy. Oh well, here goes.
Virtually every organisation larger than your corner store will be using a relational (SQL) database for something. Businesses, government organisations, colleges, hospitals etc etc. Almost every significant business application needs a datastore, and in the vast majority of cases that store will be a relational database. So if you want to work on business applications that use data (pretty much all of them), you need to know something about relational databases.
Every RDBMS (relational DB management system) is ultimately based on the relational model
, which is a robust, flexible and logical approach to modelling data based on mathematical logic and set theory. There are variations e.g. you usually need to denormalise your relational data model to improve performance, and most importantly, data warehouses use a different approach to data modelling - the star schema
- which is based on a different logic but is often implemented on a conventional RDBMS platform (with some tweaks).
So if you want to learn about DBs, you could buy a college-level textbook on relational databases, or follow a suitable online course like the one on Coursera
, and you definitely need to learn SQL properly. All the RDBMSs support SQL, although there are typically various proprietary extras on different commercial DBs - some of these are very useful, others are simply variations on a theme. Whichever route you take, you need to understand at least the basics of the standard relational approach and SQL before you can appreciate how to use it, when not to use it and how to pick and use an appropriate alternative approach - star schemas for DWH, NoSQL etc.
The most widely used proprietary RDBMSs are probably Oracle
, IBM's DB2
and Microsoft SQL Server
. There are free versions of these available for download, so you can pick one and start learning it using the relevant online tutorials e.g from the vendor.
As an Oracle developer, I've naturally seen a lot of Oracle systems, so I know Oracle
is very widely used for all kinds of systems from small departmental applications to massive DWH systems serving multi-national corporations. As for the others, my impression is that IBM DB2
tends to be found in big data centres for large organisations like banks, but I don't know much about it. SQL Server
has traditionally been less popular for large scale applications (this may be changing), but very popular for mid-sized applications, possibly because of the easy integration with Microsoft development tools. Personally, I prefer Oracle but obviously I'm not exactly unbiased here!
There are also specialist databases for big data warehousing systems e.g. IBM Netezza, Teradata, Exadata etc. I'm not aware of any free options for downloading these, and to be honest if you want to learn about DWH and star schemas, you'd be better off doing so on a conventional RDBMS first.
The two main open source relational DBs are MySQL
(now owned by Oracle) and PostgreSQL
. There are also embedded SQL databases such as SQLite
which are aimed at lightweight database applications, but I haven't used these.
is very widely used for small/medium sized web applications (it's the default datastore for most of the free content management systems, for example). It's very popular with people learning about DBs or developing DB applications e.g. prototyping with Java
. I'm not sure how scalable it is for larger applications by default, but I seem torecall reading that Google used grids of MySQL servers running in parallel to serve some of their applications. There is also an "enterprise" version, but I'm not sure how widely it's used.
seems to be less widely used than MySQL, but my own impression is that it's the more robust and mature of the two databases. It's definitely used in large applications, but is equally suitable for smaller apps as well. It includes support for an implementation of Oracle's PL/SQL language for stored procedures, and there is a paid-for variation of PostgreSQL (EnterpriseDB
) which is marketed as an alternative to Oracle for commercial applications. PostgreSQL is also particularly popular for spatial data, because it's always had excellent support for specialist spatial functionality via the PostGIS add-on.
Having used both occasionally for small projects, I much prefer PostgreSQL to MySQL, but I seem to be in the minority there. There are plenty of books and online tutorials for both of these.
There is no single "NoSQL" model, because each NoSQL database has its own approach to modelling data. Typically these DBs aim to provide better scalability than the traditional RDBMS (among other benefits) but they usually have to sacrifice other RDBMS features in order to do so e.g. they may not guarantee that your transactions are always consistent at any given time, may not support arbitrary queries etc. This combination of different data models and different properties means that different NoSQL DBs may be appropriate for different applications. This is great, as it provides extra options for people who are struggling to achieve their goals with a conventional RDBMS. However, there is also a danger of people reaching for fashionable NoSQL solutions simply because they never bothered to learn how to use a conventional SQL database properly and they think the NoSQL DB will somehow make all their problems go away. Don't be one of those guys!
The best place to start learning about NoSQL is the excellent book Seven Databases In Seven Weeks
by Eric Redmond and Jim Wilson. Buy it now.
If you want to get some free hands-on training in a NoSQL DB, 10Gen are starting a fresh online course in MongoDB For Developers
in the New Year, using Java or Python (but you don't need much of either). I just finished the first presentation of the Python course
and it was excellent.
Your DB project:
Final words of advice:
Pick a database, download and install it and start coding your own web/DB application. You'll learn much more that way than reading endless rambling diatribes from superannuated Oracle developers on JavaRanch!