
Apache Accumulo and Hadoop

 
Ranch Hand
Posts: 119
What is the role of Apache Accumulo in the context of Hadoop, and are there any real-world cases describing their usage together?

Regards,
Mohamed
 
author
Posts: 15
Hadoop, using MapReduce, is a batch-processing framework. Typically you churn through a lot of data in queries that take seconds, minutes or longer.

HBase and Accumulo offer something more like a database, modelled on the Google BigTable paper. These can service low-latency, end-user-facing queries. Accumulo adds a number of extensions over HBase, notably much finer-grained security labelling and the ability to efficiently run server-side functions.
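To give a flavour of the security labelling: in Accumulo every cell can carry a visibility expression, and a scan only returns cells whose expression is satisfied by the scanner's authorizations. The class below is a simplified, self-contained stand-in for that evaluation (Accumulo's real ColumnVisibility grammar is richer, and in practice you'd use the Accumulo client API rather than roll your own):

```java
import java.util.Set;

// Minimal sketch of Accumulo-style cell-level visibility evaluation.
// A cell labelled "admin|(analyst&uk)" is visible to a scan whose
// authorizations satisfy that boolean expression. Illustrative only --
// not the actual Accumulo ColumnVisibility implementation.
public class VisibilityDemo {
    private final String expr;
    private final Set<String> auths;
    private int pos;

    private VisibilityDemo(String expr, Set<String> auths) {
        this.expr = expr;
        this.auths = auths;
    }

    // Evaluate an expression like "admin|(analyst&uk)" against a set of auths.
    public static boolean evaluate(String expr, Set<String> auths) {
        VisibilityDemo p = new VisibilityDemo(expr, auths);
        boolean result = p.parseOr();
        if (p.pos != expr.length()) {
            throw new IllegalArgumentException("bad expression: " + expr);
        }
        return result;
    }

    // or-expr := and-expr ('|' and-expr)*
    private boolean parseOr() {
        boolean v = parseAnd();
        while (pos < expr.length() && expr.charAt(pos) == '|') {
            pos++;
            v = parseAnd() || v;   // parseAnd() runs first, so parsing always advances
        }
        return v;
    }

    // and-expr := atom ('&' atom)*
    private boolean parseAnd() {
        boolean v = parseAtom();
        while (pos < expr.length() && expr.charAt(pos) == '&') {
            pos++;
            v = parseAtom() && v;
        }
        return v;
    }

    // atom := '(' or-expr ')' | token; a token is visible iff it is held as an auth
    private boolean parseAtom() {
        if (expr.charAt(pos) == '(') {
            pos++;                 // consume '('
            boolean v = parseOr();
            pos++;                 // consume ')'
            return v;
        }
        int start = pos;
        while (pos < expr.length()
                && (Character.isLetterOrDigit(expr.charAt(pos)) || expr.charAt(pos) == '_')) {
            pos++;
        }
        return auths.contains(expr.substring(start, pos));
    }

    public static void main(String[] args) {
        Set<String> auths = Set.of("analyst", "uk");
        System.out.println(evaluate("admin|(analyst&uk)", auths)); // prints "true"
        System.out.println(evaluate("admin&secret", auths));       // prints "false"
    }
}
```

The point is that visibility is enforced per cell at scan time, which is what makes the labelling so much finer-grained than table- or column-family-level permissions.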
Garry
 
Mohamed El-Refaey
Ranch Hand
Posts: 119
So does that mean Accumulo can be used in real-time cases? Or is it similar to an in-memory database?
 
author
Posts: 2
Accumulo can satisfy queries that demand fast response times, but its internal operations are not strictly in-memory. All underlying data structures are persisted to Hadoop HDFS.

Its primary purpose is to enable low-latency fetches over persistent columnar data stored in HDFS.
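The way the BigTable design reconciles "fast reads" with "everything persisted" is roughly: recent writes sit in an in-memory sorted map, periodically flushed to sorted files in HDFS, and a read consults memory first before falling back to the persisted layer. Here's a minimal sketch of that idea (purely illustrative; the names and the flush mechanics are simplified stand-ins, not Accumulo's actual internals, which also involve a write-ahead log and compactions):

```java
import java.util.TreeMap;

// Illustrative sketch of the BigTable-style read/write path: recent writes
// live in an in-memory sorted map, flushed data lives in sorted files on
// HDFS. Reads check memory first, then the persisted layer, so lookups
// stay fast even though all data ultimately resides on disk.
public class TabletSketch {
    private final TreeMap<String, String> memTable = new TreeMap<>(); // recent writes
    private final TreeMap<String, String> flushed  = new TreeMap<>(); // stand-in for sorted files in HDFS

    public void put(String key, String value) {
        memTable.put(key, value); // writes land in memory (plus a write-ahead log in the real system)
    }

    public String get(String key) {
        String v = memTable.get(key); // newest data wins
        return v != null ? v : flushed.get(key);
    }

    // Flush: persist the in-memory map and start afresh (a "minor compaction")
    public void flush() {
        flushed.putAll(memTable);
        memTable.clear();
    }

    public static void main(String[] args) {
        TabletSketch t = new TabletSketch();
        t.put("row1", "v1");
        t.flush();                         // "row1" now lives only in the persisted layer
        t.put("row1", "v2");               // a newer version in memory shadows it
        System.out.println(t.get("row1")); // prints "v2"
        System.out.println(t.get("row2")); // prints "null"
    }
}
```

So it behaves like a fast store without being an in-memory database: losing the process loses nothing that was flushed, and the write-ahead log covers the rest.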
 
Mohamed El-Refaey
Ranch Hand
Posts: 119
I see. Thanks Brian.

Regards,
Mohamed
 
Garry Turkington
author
Posts: 15
Low latency is really what it's all about: in particular, latency low enough that you could potentially use it to directly back applications serving end users.

But if your interest here, and in the other question regarding real time, is not just low latency but true 'hard' real-time systems with all their consequent requirements, then that's likely not a good fit for Hadoop or any of the related projects. Indeed, once you take into account the basic mechanics of a distributed system, adding hard real-time requirements would put you into a very specialised niche that most Hadoop use cases don't have to worry about.

Garry
 
Mohamed El-Refaey
Ranch Hand
Posts: 119
Thanks, Garry, for the clarifications. It seems from all the responses I got that Hadoop may not be the best option for hard real-time processing, but it is at least capable of processing large volumes of data at an adequate speed.
Thanks again and have a nice day!

Regards,
Mohamed
 