
How to get line numbers from TextInputFormat

 
Khajaasmath Mohammed
Greenhorn
Posts: 2
Hi,

I am new to Hadoop, and I have a project requirement to retrieve the line number of each line of a text file. Is there any way to get the line number in MapReduce? By the way, I am using Java to develop the MapReduce job. Eagerly waiting for your answers.

Input file structure
n1,n2,n3
f1,f2,f3
------

Thanks,
Azzu
 
amit punekar
Ranch Hand
Posts: 544
Hello,
I think you will need to write your own InputFormat that emits the line number as the key and the content of that line as the value.
However, I am not sure how you would handle the scenario where your input file is split across different parts. In that case you may have "x" lines in one split and the remaining "y" lines in another split.
Would you start the line numbering at 1 again for the second split? If there are two splits, two map tasks may run simultaneously, and then the task reading the second split cannot even know how many lines the first split had.
Maybe this is the reason the Hadoop API does not have a built-in input format that gives out the line number and the line's contents as a record.
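To make the split problem concrete, here is a minimal plain-Java sketch. It has no Hadoop dependency and is not an InputFormat. It cuts a small file into fixed-size byte splits, assuming each line belongs to the split that contains its first byte (a simplification; Hadoop's own line reader handles boundaries in its own way). The extra lines `g1,g2,g3` and `h1,h2,h3`, the 20-byte split size, and the class and method names are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SplitLineNumbers {

    /**
     * Groups the lines of a text into fixed-size byte splits.
     * Assumption (simplified): a line belongs to the split containing its first byte.
     */
    static List<List<String>> splitLines(String text, int splitSize) {
        byte[] data = text.getBytes(StandardCharsets.UTF_8);
        int numSplits = (data.length + splitSize - 1) / splitSize;
        List<List<String>> splits = new ArrayList<>();
        for (int i = 0; i < numSplits; i++) {
            splits.add(new ArrayList<>());
        }
        int lineStart = 0;
        for (int pos = 0; pos <= data.length; pos++) {
            if (pos == data.length || data[pos] == '\n') {
                // Keep every line, including empty ones, except the empty tail after a final newline.
                if (pos < data.length || pos > lineStart) {
                    String line = new String(data, lineStart, pos - lineStart, StandardCharsets.UTF_8);
                    splits.get(lineStart / splitSize).add(line);
                }
                lineStart = pos + 1;
            }
        }
        return splits;
    }

    /** What each map task can compute on its own: a counter that restarts at 1 in every split. */
    static List<List<Integer>> localLineNumbers(List<List<String>> splits) {
        List<List<Integer>> result = new ArrayList<>();
        for (List<String> split : splits) {
            List<Integer> numbers = new ArrayList<>();
            for (int i = 0; i < split.size(); i++) {
                numbers.add(i + 1);
            }
            result.add(numbers);
        }
        return result;
    }

    /** Correct numbering needs the line counts of all earlier splits (a running prefix sum). */
    static List<List<Integer>> globalLineNumbers(List<List<String>> splits) {
        List<List<Integer>> result = new ArrayList<>();
        int offset = 0; // number of lines in all preceding splits
        for (List<String> split : splits) {
            List<Integer> numbers = new ArrayList<>();
            for (int i = 0; i < split.size(); i++) {
                numbers.add(offset + i + 1);
            }
            result.add(numbers);
            offset += split.size();
        }
        return result;
    }

    public static void main(String[] args) {
        String file = "n1,n2,n3\nf1,f2,f3\ng1,g2,g3\nh1,h2,h3\n";
        List<List<String>> splits = splitLines(file, 20);
        System.out.println("lines per split: " + splits);
        System.out.println("local numbers:   " + localLineNumbers(splits));
        System.out.println("global numbers:  " + globalLineNumbers(splits));
    }
}
```

With 20-byte splits, the third line starts inside the first split and crosses into the second, so the per-split counters come out as `[[1, 2, 3], [1]]`. The correct numbering, `[[1, 2, 3], [4]]`, needs the first split's line count, which is exactly what a map task running in parallel with the others does not have.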

Could you tell us why you need linenumbers ?

Regards,
Amit
 
Rajesh Nagaraju
Ranch Hand
Posts: 63
Additionally, because the mappers run in parallel and the output goes through the shuffle and sort phase, the order of the lines in the output can change.
Hence the line numbers in the output will not be in sync with the input.
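A minimal plain-Java sketch of this reordering, under stated assumptions: a shuffled list stands in for the unpredictable order in which map output reaches a reducer, and a `TreeMap` stands in for the framework's sort by key with a single reducer. It assumes each record already carries its correct, unique line number as its key, which, as discussed earlier in the thread, is the hard part. `ShuffleOrder` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

public class ShuffleOrder {

    /** Simulates map output arriving in an order unrelated to the input file. */
    static List<Map.Entry<Integer, String>> arriveShuffled(List<String> lines, long seed) {
        List<Map.Entry<Integer, String>> records = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            // Key = line number carried along with the record (assumed correct and unique).
            records.add(Map.entry(i + 1, lines.get(i)));
        }
        Collections.shuffle(records, new Random(seed));
        return records;
    }

    /** Stand-in for the sort phase: orders records by key. Duplicate keys would overwrite each other. */
    static List<String> sortByLineNumber(List<Map.Entry<Integer, String>> records) {
        TreeMap<Integer, String> sorted = new TreeMap<>();
        for (Map.Entry<Integer, String> record : records) {
            sorted.put(record.getKey(), record.getValue());
        }
        return new ArrayList<>(sorted.values());
    }

    public static void main(String[] args) {
        List<String> lines = List.of("n1,n2,n3", "f1,f2,f3", "g1,g2,g3", "h1,h2,h3");
        List<Map.Entry<Integer, String>> arrived = arriveShuffled(lines, 7L);
        System.out.println("arrival order: " + arrived);
        System.out.println("sorted by key: " + sortByLineNumber(arrived));
    }
}
```

The arrival order depends on the random seed, but sorting by the carried key returns the original line order. If the records carry no line number, nothing in the output says where each line came from.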
 