Roadmap for Big Data study

Dear All,
I am new to the Big Data area, and I would like some help from the folks here in recommending good Big Data resources for beginners.
Big Data is a Big Topic, so you probably need to decide on a more specific area to look at initially.

  • IBM Big Data University has a wide range of resources, although I haven't tried these myself.
  • MongoDB has some excellent free online courses on the MongoDB NoSQL database.
  • Datastax Academy has free online courses on the Cassandra NoSQL database.
  • Hortonworks Sandbox is an excellent way to get started with Hadoop, including several useful tutorials.
  • Data-wrangling with MongoDB is a free online course from Udacity on applying MongoDB for data science. You don't have to pay for the course - choose the "Access course materials" option.
  • Intro to Hadoop and MapReduce is another Udacity course on Hadoop (Cloudera).
  • Intro to Data Science is a Udacity course on data science using Python.
  • Data Science Certificate is a set of courses from Johns Hopkins University (via Coursera) looking at data science using the R language. You have to pay for the certificate track, but you can study the individual courses for free. R is widely used in data science and statistics, but these courses are not specifically about Big Data technologies.
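
The Hadoop courses above are built around the MapReduce model. As a rough illustration only, here is a single-process Python sketch of the classic word-count example, split into map, shuffle and reduce steps. The function names are my own (hypothetical) and this is not Hadoop's API; on a real cluster the framework runs each phase in parallel across many machines and handles the shuffle for you.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group intermediate values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    # Run map over every line, flatten the results, group by word, then reduce each group.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data is a big topic", "start with a small topic"]
    print(word_count(docs))
```

In Spark, the same pipeline is usually written with RDD methods such as `flatMap`, `map` and `reduceByKey`, which is a handy way to see how the two technologies relate.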

I'm working on a small team doing R&D around Big Data technologies. We've found the following tools interesting so far:

  • MongoDB - NoSQL database that stores data as JSON-like documents. Great for scalability, flexible data models and arbitrary queries. Not so good for number-crunching or easy administration.
  • Cassandra - NoSQL database that stores data in column-family format. I'm just starting to look at this; it's great for scalability, robustness and speed. Not so good for flexible data models or arbitrary queries (you can only query by key columns).
  • Apache Spark - excellent distributed processing engine that can run on a Hadoop or Cassandra cluster, in stand-alone mode, or on a local machine. APIs for Scala, Python and Java, with R support coming soon. This is definitely going to be a core Big Data technology.
  • Cloudera or Hortonworks - pre-packaged bundles of Hadoop-based technologies. Free "sandbox" downloads available.
  • Python (especially with the IPython Notebook) - great for interactive work, ad hoc data analysis, prototyping, etc. Not so good for scaling up/out, but powerful when combined with Spark.
  • Scala - primarily for developing scalable applications, e.g. with Spark, Akka or Kafka.
  • R - I don't use this, but some of my statistical colleagues like it. It's hard to scale up/out easily, though.
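
To make the MongoDB/Cassandra trade-off above a bit more concrete, here's a toy in-memory sketch in plain Python. It does not use either database or its driver; `find` and `KeyedTable` are hypothetical names for illustration. The document store lets you filter on any field and tolerates records with different fields, while the keyed table only answers queries on its key column, which is roughly the restriction described for Cassandra.

```python
# Toy illustration of two data models; NOT the MongoDB or Cassandra APIs.

# Document-style store: each record is a JSON-like dict, and fields can differ per record.
documents = [
    {"_id": 1, "name": "Ann", "city": "Leeds", "skills": ["Scala", "Spark"]},
    {"_id": 2, "name": "Bob", "city": "York"},  # no "skills" field: flexible schema
]


def find(docs, **criteria):
    """Hypothetical document query: filter on any fields; missing fields simply don't match."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


class KeyedTable:
    """Hypothetical key-based table: rows are reachable only through their key column."""

    def __init__(self, key_column):
        self.key_column = key_column
        self.partitions = {}  # key value -> list of rows stored under that key

    def insert(self, row):
        self.partitions.setdefault(row[self.key_column], []).append(row)

    def select(self, **criteria):
        # Mimic the "can only query by key columns" restriction.
        if set(criteria) != {self.key_column}:
            raise ValueError(f"can only query by key column {self.key_column!r}")
        return self.partitions.get(criteria[self.key_column], [])


if __name__ == "__main__":
    print(find(documents, city="York"))
    table = KeyedTable("city")
    for doc in documents:
        table.insert(doc)
    print(table.select(city="Leeds"))
```

This is simplified: real Cassandra tables have partition and clustering keys, and Cassandra also offers secondary indexes, while MongoDB needs indexes to keep arbitrary queries fast at scale.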

Hope this gives you some ideas.