It seems that I was able to create custom indexes based on InputSplit. Performance has improved greatly in my test environment, but is there any Hadoop guru here who could review my implementation to make sure I did it the right way? For example, can I be sure I will not get undesired side effects when using it in production?
Indexing on mapreduce
Thank you in advance