Suppose I have some data and I want to process it iteratively, grouping by a different key each time. I think this could be done by running several Hadoop jobs, but each job would pay the same startup cost: the initial I/O and the mapping phase.
My idea was to map once and then run several reduces. Each reduce would emit new map output for the next reduce.
Is it possible to do this with Hadoop? What do you think about this approach?
Is there any plan to change Hadoop so it can run many reduces from the same map? (I guess the way it is now, it can't.)
IMO, if you could run many reduces against map output that is already in memory, the process would be faster than running many jobs that each repeat the same map.
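To make the idea concrete, here is a toy in-memory sketch in plain Python (not Hadoop API; the record fields are made up for illustration): the input is mapped once, and then grouped by a different key on each "reduce" pass without re-reading or re-mapping the data.

```python
from collections import defaultdict

def map_once(records):
    # Run the expensive mapping phase a single time.
    # (user, country, amount) tuples are a hypothetical record format.
    return [(r["user"], r["country"], r["amount"]) for r in records]

def reduce_by(mapped, key_index):
    # Group the already-mapped data by a different key each call,
    # summing the amounts, without touching the original input again.
    groups = defaultdict(int)
    for row in mapped:
        groups[row[key_index]] += row[2]
    return dict(groups)

records = [
    {"user": "a", "country": "BR", "amount": 1},
    {"user": "b", "country": "BR", "amount": 2},
    {"user": "a", "country": "US", "amount": 3},
]
mapped = map_once(records)
by_user = reduce_by(mapped, 0)     # {'a': 4, 'b': 2}
by_country = reduce_by(mapped, 1)  # {'BR': 3, 'US': 3}
```

This is what I'd like Hadoop to do for me: pay for the map and the input I/O once, then reuse that output for every grouping, instead of launching a fresh job (with its own map phase) per key.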