Data Science Certificate is a set of courses from Johns Hopkins University (via Coursera) looking at data science using the R language. You have to pay for the certificate track, but you can study the individual courses for free. R is widely used in data science and statistics, but these courses are not specifically about Big Data technologies.
I'm working on a small team doing R&D around Big Data technologies. We've found the following tools interesting so far:
MongoDB - NoSQL database that stores data as JSON-like documents. Great for scalability, flexible data models and arbitrary queries. Not so good for number-crunching or easy administration.
Cassandra - NoSQL database that stores data in column-family format. We're just starting to look at this, but it looks great for scalability, robustness and speed. Not so good for flexible data models or arbitrary queries (you can only query by key columns).
Apache Spark - excellent distributed processing engine that can run on a Hadoop or Cassandra cluster, in stand-alone mode, or on a local machine. It has APIs for Scala, Python and Java, with R support coming soon. This is definitely going to be a core Big Data technology.
Cloudera or Hortonworks - pre-packaged bundles of Hadoop-based technologies. Free "sandbox" downloads available.
Python (especially with the IPython Notebook) - great for interactive work, ad hoc data analysis, prototyping etc. Not so good for scaling up/out, but powerful when combined with Spark.
Scala - primarily for developing scalable applications e.g. using Spark, Akka, Kafka, etc.
R - I don't use this myself, although some of my statistician colleagues like it. It's hard to scale up/out.
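
To make the MongoDB vs Cassandra trade-off above a bit more concrete, here is a rough sketch in plain Python. It is only a toy in-memory model, not the real client API of either database: the class names `DocumentStore` and `KeyedTable` and all the sample data are made up for illustration, and it follows the simplification above that a column-family table can only be queried by its key columns.

```python
# Toy in-memory models. These are NOT the MongoDB or Cassandra client APIs;
# all names and data here are hypothetical, for illustration only.

class DocumentStore:
    """Document-style store: each record is a free-form dict (like a JSON document)."""

    def __init__(self):
        self.docs = []

    def insert(self, doc):
        self.docs.append(doc)

    def find(self, **criteria):
        # Any field can be used as a filter; documents lacking the field don't match.
        return [d for d in self.docs
                if all(k in d and d[k] == v for k, v in criteria.items())]


class KeyedTable:
    """Column-family-style table: rows are only reachable through their key columns."""

    def __init__(self, key_columns):
        self.key_columns = tuple(key_columns)
        self.rows = {}

    def insert(self, row):
        key = tuple(row[c] for c in self.key_columns)
        self.rows[key] = row

    def get(self, **key_values):
        # Reject any lookup that is not on exactly the key columns.
        if set(key_values) != set(self.key_columns):
            raise ValueError("can only query by key columns: %s" % (self.key_columns,))
        key = tuple(key_values[c] for c in self.key_columns)
        return self.rows.get(key)


docs = DocumentStore()
docs.insert({"user": "ann", "city": "Oslo", "tags": ["r", "stats"]})
docs.insert({"user": "bob", "city": "Leeds"})   # a differently shaped document is fine
oslo_users = docs.find(city="Oslo")             # filter on an arbitrary field

table = KeyedTable(key_columns=["user"])
table.insert({"user": "ann", "city": "Oslo"})
ann_row = table.get(user="ann")                 # lookup by key works
try:
    table.get(city="Oslo")                      # lookup by a non-key column is rejected
    rejected = False
except ValueError:
    rejected = True
```

The point of the sketch is the design trade-off: the document store pays for its flexibility by scanning every record, while the keyed table gets a fast direct lookup but forces you to decide up front which columns you will query by.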