
how to set map-reduce task?

 
Ranch Hand
Posts: 34
The following data is the output of my first MapReduce job:

country ; title ; sex ; units ; file location
Turkey ; Population ; Males ; Persons ; L/F/W/A/5/LFWA55MATRQ647N.csv
Turkey ; Population ; Males ; Persons ; L/F/W/A/5/LFWA55MATRA647N.csv
Turkey ; Population ; Males ; Persons ; L/F/W/A/5/LFWA55MATRQ647S.csv
Turkey ; Population ; Males ; Persons ; L/F/W/A/5/LFWA55MATRA647S.csv

Next, I am trying to set up a second MapReduce job that takes the CSV files listed in the file location column as its input. Each CSV file has the following format:

year ; population
2004 ; 2130034
2005 ; 2239913
2006 ; 2437712
2007 ; 2210673

But I have no idea how to set up the second MapReduce job so that it uses the file location column from the first job's output. The final output should look like this:

country ; year ; population
Turkey ; 2004 ; 2130034
Turkey ; 2005 ; 2239913
Turkey ; 2006 ; 2437712
Turkey ; 2007 ; 2210673

As far as I know, the input file path is set only in the driver class with the FileInputFormat.setInputPaths() method, but in my MapReduce job the file location is only available inside the map and reduce classes. How can I get the input file paths from the map and reduce classes into the driver class?
How can I pass the file location value to FileInputFormat.setInputPaths(), for example FileInputFormat.setInputPaths(job, new Path("L/F/W/A/5/LFWA55MATRQ647N.csv"));?
I need your advice. Thank you in advance for your help!
 
Ranch Hand
Posts: 63
One way to chain MR jobs is to use Spring Batch.
 
Ranch Hand
Posts: 544
  • Likes 1
Hello,
Are your CSV files on HDFS? How big is one file, i.e. how many rows of "year";"population" does it contain? If they are not on HDFS yet, you could copy them there first.

Then run a Pig script, which would automatically chain the MR jobs required to process the data.

The Pig script would roughly do the following (assuming the output of your first MR job is in a file):
1) Read the first MR job's output with the schema: country, title, sex, units, file location (or name).
2) If the CSV files are on HDFS, read those files with the schema: file location (or name), year, population. [You may have to write your own loader function for this, because the file location has to be one of the output fields.]
3) Join 1 and 2 on "file location (name)", which yields the desired output, i.e. country, year, population.
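To make the join in steps 1 to 3 concrete, here is a minimal plain-Java sketch of the same logic. It is not Pig or Hadoop code; the class name LocationJoinSketch and its methods are hypothetical, and it assumes the " ; " separated layouts shown in the question, with header rows skipped.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java illustration of the join (hypothetical class, not Pig or Hadoop code). */
class LocationJoinSketch {

    /** Step 1: map each file location to its country ("country ; title ; sex ; units ; file location"). */
    static Map<String, String> countryByLocation(List<String> firstJobRows) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String row : firstJobRows) {
            String[] f = row.trim().split("\\s*;\\s*");
            // Assumption: data rows have 5 fields and end in a .csv path; this also skips the header row.
            if (f.length == 5 && f[4].endsWith(".csv")) {
                result.put(f[4], f[0]);
            }
        }
        return result;
    }

    /** Steps 2 and 3: tag each "year ; population" row of one CSV file with the country for that file. */
    static List<String> join(Map<String, String> countryByLocation, String location, List<String> csvRows) {
        List<String> out = new ArrayList<>();
        String country = countryByLocation.get(location);
        if (country == null) {
            return out; // no matching row in the first job's output
        }
        for (String row : csvRows) {
            String[] f = row.trim().split("\\s*;\\s*");
            // Assumption: the year is numeric, so the "year ; population" header is skipped.
            if (f.length == 2 && f[0].matches("\\d+")) {
                out.add(country + " ; " + f[0] + " ; " + f[1]);
            }
        }
        return out;
    }
}
```

In a real job the lookup key would be the path of the file being read, which is why the file location has to travel with every year/population row.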

Of course, all of this can be done with plain MR as well, but you will have to chain the jobs together yourself. Whichever way you proceed, I believe the CSV files need to be on the HDFS cluster.
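For the plain MR route, one possible approach (a sketch, not the only way) is to let the driver read the first job's output after it finishes, collect the distinct file locations, and hand them to the second job. The helper below is plain Java with no Hadoop dependency; the class name InputPathCollector and its methods are hypothetical, and the " ; " separated layout from the question is assumed.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical driver-side helper: turns the first job's output rows into an input path list. */
class InputPathCollector {

    /** Collects the distinct values of the fifth column (file location), in first-seen order. */
    static Set<String> fileLocations(List<String> firstJobRows) {
        Set<String> locations = new LinkedHashSet<>();
        for (String row : firstJobRows) {
            String[] f = row.trim().split("\\s*;\\s*");
            // Assumption: data rows have 5 fields and end in a .csv path; this also skips the header row.
            if (f.length == 5 && f[4].endsWith(".csv")) {
                locations.add(f[4]);
            }
        }
        return locations;
    }

    /** Joins the locations with commas, the form accepted by FileInputFormat.setInputPaths(job, String). */
    static String toCommaSeparatedPaths(List<String> firstJobRows) {
        return String.join(",", fileLocations(firstJobRows));
    }
}
```

In the driver, the resulting string could be passed to FileInputFormat.setInputPaths(secondJob, paths) once the first job's waitForCompletion(true) has returned and its output lines have been read back. To attach the country to each row, the second job's mapper would still need to know which file it is reading, for example from its input split.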

Regards,
Amit
 