As per my (limited) understanding of the MapReduce algorithm, I believe that within a job, reduce tasks start running only after all the map tasks (and the combiner tasks, if there are any) have finished executing. If there is no chance of a reduce task running while map tasks are still pending (correct me if I am wrong), why does the TaskTracker have separate (configurable) slots for map and reduce tasks? I have read that before starting a map task, a TaskTracker looks for a free map slot; if it finds one, it allocates that slot to the map task, and if no free map slots are left, it allocates a slot from the reduce slots. I just want to know why Hadoop has a configuration like this. Is this configuration per job (which makes no sense to me, since reduce tasks cannot start before all map tasks complete; again, correct me if I am wrong) or per system (a system running many jobs, which makes some sense)?
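To make the question concrete, here is a minimal sketch of how I picture the two settings. The property names are the Hadoop 1.x TaskTracker ones as far as I know; the values, the cluster size and the `totalSlots` helper are made up for illustration, and the code deliberately does not use the Hadoop API.

```java
import java.util.Properties;

public class SlotSketch {
    // Hadoop 1.x property names (set per TaskTracker); values below are illustrative only.
    static final String MAP_SLOTS = "mapred.tasktracker.map.tasks.maximum";
    static final String REDUCE_SLOTS = "mapred.tasktracker.reduce.tasks.maximum";

    // Hypothetical helper: total slots of one kind across all TaskTrackers,
    // assuming every TaskTracker uses the same setting.
    static int totalSlots(int taskTrackers, int slotsPerTracker) {
        return taskTrackers * slotsPerTracker;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(MAP_SLOTS, "4");    // assumed value
        conf.setProperty(REDUCE_SLOTS, "2"); // assumed value

        int trackers = 10; // assumed cluster size
        int mapSlots = Integer.parseInt(conf.getProperty(MAP_SLOTS));
        int reduceSlots = Integer.parseInt(conf.getProperty(REDUCE_SLOTS));

        System.out.println("map slots in cluster: " + totalSlots(trackers, mapSlots));
        System.out.println("reduce slots in cluster: " + totalSlots(trackers, reduceSlots));
    }
}
```

If the slots really are a property of each TaskTracker rather than of a job, then the two pools would be shared by every job running on the cluster, which is the "per system" reading in my question.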
OCP Java SE 6 Programmer, OCP Java EE 5 Web Component Developer, OCE Java EE 6 Web Services Developer, VMware Certified Core Spring 3.x Developer, EMC Proven Professional (ISM-V2)