mapreduce giving a wrong count

 
Hi All,

I am new to MapReduce and am trying to explore it a little. I took the basic WordCount example and ran it over data that is in a MySQL table, and it gives a count of 34 for each individual record in the table. I assume that the map function is being called 34 times for each record in the table. Is there a way to control the number of times the map function is called? Please let me know if there is something I am missing. Any help on this is appreciated.

Here is the code that I am using:

import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Iterator;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class WordCount1 {

    public static Connection Con;
    public static Statement statement = null;
    public static PreparedStatement preparedStatement = null;
    public static ResultSet resultSet = null;

    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);

        public static Text word = new Text();

        public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            try {
                Class.forName("com.mysql.jdbc.Driver");
                Con = DriverManager.getConnection("jdbc:mysql://<<ip_add>>:3306/test", "user", "mysql");
                statement = Con.createStatement();

                // The whole table is queried on every call to map(); key and value are not used.
                resultSet = statement.executeQuery("select * from test.a1");

                while (resultSet.next()) {
                    word.set(resultSet.getString("aname"));
                    output.collect(word, one);
                }
            } catch (ClassNotFoundException e) {
                System.out.println("no MYSQL Driver found");
                word.set(e.toString());
                output.collect(word, one);
            } catch (SQLException e) {
                word.set(e.toString());
                output.collect(word, one);
            }
        }
    }

    public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            // Sum all the counts emitted for this key.
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }

            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount1.class);
        conf.setJobName("wordcount");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        // Input records come from the text file(s) under args[0], one record per line.
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}





output:
name1 34
name2 34

Thanks in Advance,

Praveen K Bandi
 
Hi Praveen,

I'm also very new to Hadoop, but let me try to answer.

Hadoop is best suited to unstructured or semi-structured data.
I don't know why you are trying to get the data from a database; you could read it from a file or another source instead.


Your assumption is correct: the map function is being called 34 times.

In the Hadoop framework, the map function is called once for each input record. With TextInputFormat, each line of the input file is one record, so your input file presumably has 34 lines.
But inside map you query the database on every call, so each call emits every row of the table once, and each name ends up with a count of 34. The map function should take its data only from the framework.
That data normally arrives in the Text value parameter, and the operation is performed on that value.

So I guess your job is configured correctly; just don't connect to the DB and read the result set inside map.

Each record of your data should arrive in value, so try to build your output from that value.
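
To make the difference concrete, here is a small plain-Java sketch (no Hadoop involved) that imitates what the framework does: it calls a map function once per input line and sums the emitted counts, roughly as the reducer would. The class name MapCallDemo, the two-row stand-in for test.a1, and the 34-line input are all assumptions made up for illustration; the real input file was not posted.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MapCallDemo {

    // Hypothetical stand-in for the "aname" column of table test.a1.
    static final List<String> TABLE_ROWS = Arrays.asList("name1", "name2");

    // Imitates the posted mapper: ignores the input line and emits every table row.
    static void mapIgnoringInput(String inputLine, Map<String, Integer> counts) {
        for (String name : TABLE_ROWS) {
            counts.merge(name, 1, Integer::sum);
        }
    }

    // Imitates a mapper that only uses the record the framework hands to it.
    static void mapUsingInput(String inputLine, Map<String, Integer> counts) {
        counts.merge(inputLine.trim(), 1, Integer::sum);
    }

    // The "framework": one map call per input line, counts summed per key.
    static Map<String, Integer> runBuggy(List<String> inputLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : inputLines) {
            mapIgnoringInput(line, counts);
        }
        return counts;
    }

    static Map<String, Integer> runFixed(List<String> inputLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : inputLines) {
            mapUsingInput(line, counts);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Assumed input: a file with 34 lines whose content the buggy mapper never reads.
        List<String> thirtyFourLines = Collections.nCopies(34, "any line");
        System.out.println(runBuggy(thirtyFourLines)); // prints {name1=34, name2=34}

        // Assumed input: the table exported to a file, one name per line.
        System.out.println(runFixed(TABLE_ROWS));      // prints {name1=1, name2=1}
    }
}
```

In the real job, the second variant would correspond to calling word.set(value.toString()) and output.collect(word, one) in map, with the table first exported to a text file. If the rows have to come straight from MySQL, Hadoop also ships a DBInputFormat that is meant for feeding table rows to mappers, though I have not tried it with this job.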

I hope you understood.

Regards
Amruth