Why are there separate slots for map and reduce tasks?
posted 4 years ago
As per my (limited) knowledge of the MapReduce algorithm, I believe that within a job, reduce tasks start running only after all the map tasks (and the combiner tasks, if there are any) have finished execution. If there is no chance of a reduce task running while map tasks are still pending (correct me if I am wrong), why does the TaskTracker have separate, configurable slots for map and reduce tasks?

I have read that before starting a map task, a TaskTracker looks for a free map task slot. If it finds one, it allocates that slot to the map task; if there are no free map task slots left, it allocates a slot from the reduce task slots.

I just want to know why Hadoop has a configuration like this. Is this configuration per job (which makes no sense to me, since reduce tasks cannot start before all the map tasks have completed; again, correct me if I am wrong) or per system (for a system running many jobs, this makes some sense)?
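
For reference, I think the slot settings I am asking about are the two properties below, which as far as I can tell go in mapred-site.xml on each TaskTracker node in Hadoop 1.x (MRv1). This is only a sketch of what I mean: the values shown are placeholders I picked for illustration, not recommendations.

```xml
<!-- mapred-site.xml (Hadoop 1.x / MRv1); values are illustrative placeholders -->
<configuration>
  <!-- Maximum number of map tasks a TaskTracker runs simultaneously -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>2</value>
  </property>
  <!-- Maximum number of reduce tasks a TaskTracker runs simultaneously -->
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
</configuration>
```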