Hope this is the correct forum for a Hadoop question.
I have a file with a bunch of lines like this:
It continues on for all 50 states, then there is another word, like politics:30 Virginia ... etc.
I want to do a distributed sort on this using MapReduce. I know MapReduce sorts by key between the map and reduce stages, so I just want to emit each record from map and then from reduce without any processing, but it is not working. Here are my map and reduce functions:
Here is my main class
And here is the InputFormat class I wrote, since FileInputFormat would always fail
Here is the error
posted 7 years ago
Thought I would post the solution I found. It was an incredibly dumb error on my part. In my main class, I named the Job instance sort, but when setting the mapOutputKey, mapOutputValue, outputKey and outputValue classes, I used the identifier job. That identifier belonged to a previous MapReduce job in the chain; I had copied and pasted the code without remembering to change the job identifier, so the sort job never received those settings and ran with its defaults.
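For anyone who hits the same thing, here is a rough sketch of why configuring the wrong identifier breaks the second job. It is plain Java and does not use Hadoop: `StageConfig` is a hypothetical stand-in for a job's type settings, and `Long`/`String` stand in for whatever key classes the real jobs use. The assumption is that a job whose types were never set falls back to defaults that do not match what the mapper emits.

```java
public class JobMixupSketch {

    // Hypothetical stand-in for one job's type settings (NOT the Hadoop Job class).
    static class StageConfig {
        final String name;
        // Assumed default key type when nothing is configured (stand-in value).
        Class<?> mapOutputKeyClass = Long.class;

        StageConfig(String name) {
            this.name = name;
        }

        // Mimics a framework check that a key emitted by map has the configured type.
        void emitFromMap(Object key) {
            if (!mapOutputKeyClass.isInstance(key)) {
                throw new IllegalStateException(
                    "type mismatch in " + name + ": expected "
                    + mapOutputKeyClass.getSimpleName() + ", received "
                    + key.getClass().getSimpleName());
            }
        }
    }

    // Configures a two-stage chain and emits one String key from the second stage.
    static String run(boolean fixed) {
        StageConfig job = new StageConfig("previous stage");
        StageConfig sort = new StageConfig("sort stage");

        if (fixed) {
            sort.mapOutputKeyClass = String.class; // setting applied to the right instance
        } else {
            job.mapOutputKeyClass = String.class;  // copy-paste bug: wrong identifier
        }

        try {
            sort.emitFromMap("politics:30");
            return "ok";
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println("buggy: " + run(false));
        System.out.println("fixed: " + run(true));
    }
}
```

In the buggy run the sort stage still carries its default key type, so the first emitted key is rejected. Nothing in the compiler catches this, because both identifiers are valid variables of the same type.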