posted 3 years ago
I'm not sure about specific algorithms, but map-reduce is most useful when the map tasks can run independently, in parallel. If each task depends on the output of another task, those tasks can't run in parallel. So it comes down to how far you can break your job up into parallel tasks, or into successive map-reduce steps where each step consumes the previous step's output. You could also look at higher-level tools (Hive, Cascading, etc.), which might let you define the job at a higher level and leave it to the tool to work out how to map-reduce it.
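To make the independence point concrete, here is a rough sketch in plain Python (not Hadoop, Hive, or Cascading). The function names `map_count_words`, `reduce_counts`, `word_count`, and `running_total` are made up for illustration, and a thread pool only stands in for a cluster. The word count splits cleanly because each chunk is processed without looking at any other chunk. The running total does not, at least as written, because every step needs the previous result.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_count_words(chunk):
    """Map task: count words in one chunk, with no reference to other chunks."""
    return Counter(chunk.lower().split())


def reduce_counts(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right


def word_count(chunks, workers=4):
    # The map tasks are independent, so they can run concurrently.
    # A thread pool is only a stand-in here for a cluster of workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_count_words, chunks))
    # The reduce step combines the partial results once they exist.
    return reduce(reduce_counts, partials, Counter())


def running_total(values):
    # Counter-example: each iteration needs the total from the previous one,
    # so this loop cannot be split into independent map tasks as written.
    total, out = 0, []
    for value in values:
        total += value
        out.append(total)
    return out


if __name__ == "__main__":
    print(word_count(["the cat sat", "the dog sat down"]))
    print(running_total([1, 2, 3, 4]))
```

A job like the second one usually has to be restructured, or expressed as several map-reduce steps run one after another, before it fits the model.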