Iterative computation in Hadoop

 
Ranch Hand
Posts: 33
Hi,
Is there any way to do iterative computation in Hadoop? In the MapReduce paradigm, both the map and reduce phases run only once. Is there a procedure to fire the map phase multiple times, like the iterations of a loop? I believe Apache Giraph runs its compute method with similar logic. Is there a way to perform a MapReduce task in a similar fashion, or is there an alternative programming paradigm that performs a similar task?
Thanks in advance!
 
Bartender
Posts: 1210
What is the nature of the data you are processing, that it requires iterations?

Depending on the nature of the data, the simplest approach may be for the driver to run multiple MapReduce jobs in a while loop.
If the number of iterations is known and constant, or can be determined by a first-pass job, it's straightforward to run the loop.
If it's unknown or varies depending on the data, the reducer in each job is responsible for telling the driver whether more iterations are necessary or the terminating condition has been satisfied.
It can do this either via a status file on HDFS, or using Counters.
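A minimal sketch of that counter-driven driver loop might look like the following. The class names (IterativeDriver, PassMapper, PassReducer), the Convergence.UPDATED counter, and the output-directory scheme are all illustrative assumptions; only the imported Hadoop classes are real API, and the actual per-record logic would depend on your data.

```java
// Sketch of an iterative driver: run MapReduce passes until the reducers
// stop reporting changes via a custom Counter. Names marked below are
// assumptions, not part of Hadoop's API.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class IterativeDriver {

    // Reducers increment this counter when a record still changed;
    // a value of zero after a pass means the computation has converged.
    public enum Convergence { UPDATED }

    public static class PassMapper extends Mapper<Object, Text, Text, Text> {
        @Override
        protected void map(Object key, Text value, Context ctx)
                throws IOException, InterruptedException {
            // ... one refinement step over a record (application-specific) ...
            ctx.write(new Text("key"), value);
        }
    }

    public static class PassReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context ctx)
                throws IOException, InterruptedException {
            boolean changed = false;
            // ... aggregate values; set changed = true if the metric moved ...
            if (changed) {
                ctx.getCounter(Convergence.UPDATED).increment(1);
            }
            for (Text v : values) {
                ctx.write(key, v);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String input = args[0];
        int pass = 0;
        long updated;
        do {
            // Each pass writes to a fresh directory and the next pass reads it.
            String output = args[1] + "/pass-" + pass;
            Job job = Job.getInstance(conf, "iterative-pass-" + pass);
            job.setJarByClass(IterativeDriver.class);
            job.setMapperClass(PassMapper.class);
            job.setReducerClass(PassReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            FileInputFormat.addInputPath(job, new Path(input));
            FileOutputFormat.setOutputPath(job, new Path(output));
            if (!job.waitForCompletion(true)) {
                System.exit(1);   // a pass failed; stop iterating
            }
            // Read back the counter the reducers incremented.
            updated = job.getCounters()
                         .findCounter(Convergence.UPDATED).getValue();
            input = output;
            pass++;
        } while (updated > 0);    // loop until no reducer reported a change
    }
}
```

The overhead here is one full job submission per iteration, which is exactly the cost you pay for this pattern; the status-file-on-HDFS alternative has the same loop shape but checks a file the reducer writes instead of a Counter.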

Another option is to use Giraph itself, especially its blocks framework.
 
Debajyoti Kundu
Ranch Hand
Posts: 33
Hi Karthik,
Thanks for your insight. The data I'm working with is a set of records that needs to go through multiple phases until some desired metric is achieved. I briefly worked with Giraph before, and I think it only takes a network (graph) as input, so there isn't much scope for using Giraph here (please correct me if I'm wrong).
Also, running a Hadoop job in a while loop adds overhead: each time a job is submitted, a set of internal setup steps runs before the map and reduce phases begin. I tried such an approach before, but its runtime was too long.
I've mostly worked with Hadoop 1.0. I recently read in some blogs that Hadoop 2.0 makes provision for iterative programming alongside the standard MapReduce paradigm, but I haven't found much help on this topic on the internet. Please let me know if you know anything about this.

Thanks.
 