Storm compared to Hadoop and Spark

Hello Authors,
As far as I understand, Apache has released at least three cluster computing frameworks: Hadoop, Spark, and Storm.
Could you please help me understand which use cases are a better fit for Storm compared to Hadoop and Spark?
There is another one, Giraph, but as I understand it, it is best suited to graph processing (I have never used it, though).

Hadoop is oriented towards working with batches of data.
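To make the batch model concrete, here is a plain-Java sketch (no actual Hadoop API) of a MapReduce-style word count: the entire bounded dataset is read, mapped, and reduced in one job.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class BatchSketch {
    // Count word frequencies over a complete, bounded dataset -- the
    // map -> shuffle -> reduce shape of a Hadoop MapReduce job, done
    // in plain Java purely for illustration.
    static Map<String, Long> wordCount(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split("\\s+")))
                .collect(Collectors.groupingBy(w -> w, Collectors.counting()));
    }

    public static void main(String[] args) {
        // The whole dataset is available up front; the job runs to completion.
        List<String> logLines = List.of("error timeout", "error retry", "ok");
        System.out.println(wordCount(logLines));
    }
}
```

The key property is that the input is finite and fully available before processing starts.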

Spark can work with batches of data like Hadoop, or it can do "micro-batching": processing an incoming stream as a series of small batches, which starts to approximate a true streaming solution.
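Micro-batching can be sketched in a few lines of plain Java (this is the idea behind Spark Streaming, not its actual API): buffer incoming events into small fixed-size batches and process each batch as a unit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class MicroBatchSketch {
    // Collect incoming events into small fixed-size batches and hand each
    // full batch to a processing function. Latency is bounded by the batch
    // size (real systems usually cut batches by time interval instead).
    private final int batchSize;
    private final Consumer<List<String>> processor;
    private final List<String> buffer = new ArrayList<>();

    MicroBatchSketch(int batchSize, Consumer<List<String>> processor) {
        this.batchSize = batchSize;
        this.processor = processor;
    }

    void onEvent(String event) {
        buffer.add(event);
        if (buffer.size() == batchSize) {
            processor.accept(new ArrayList<>(buffer)); // emit one micro-batch
            buffer.clear();
        }
    }
}
```

Each micro-batch is itself a small batch job, which is why this model sits between pure batch and pure streaming.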

Storm is oriented towards working on a never-ending stream of data, where you are constantly computing and there is no start or end: whenever data arrives, it is processed. Via Trident, Storm can also do micro-batching.
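In the pure streaming model there is no buffering at all; every event is handed to the processing logic the moment it arrives. A plain-Java sketch of that model (a real Storm topology would instead wire spouts to bolts, but the per-event shape is the same):

```java
import java.util.List;
import java.util.function.Consumer;

public class StreamSketch {
    // Process every event the instant it arrives, with no batching.
    // "bolt" here loosely mirrors Storm's term for a processing step;
    // this is a conceptual sketch, not the Storm API.
    private final Consumer<String> bolt;

    StreamSketch(Consumer<String> bolt) {
        this.bolt = bolt;
    }

    void onEvent(String event) {
        bolt.accept(event); // no buffer: latency is measured per event
    }

    public static void main(String[] args) {
        StreamSketch alerts = new StreamSketch(
                e -> { if (e.contains("FAILED_LOGIN")) System.out.println("alert: " + e); });
        List.of("ok", "FAILED_LOGIN from 10.0.0.5").forEach(alerts::onEvent);
    }
}
```

Because nothing waits for a batch to fill, results are available as fast as the events themselves.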

Think of a batch processing system when you are crunching a large amount of data and don't need the answer right now. For example, if you process your website's log files once a day to look for trends and extract value from them, a batch framework like Hadoop is perfect. However, if you are analyzing those same logs to detect intrusion attempts against your system, you want to know as soon as possible. For that, you would want a system like Storm, where each event in your system is shipped to Storm as a stream the moment it happens, so you can analyze it immediately.