mapreduce giving a wrong count

praveenKumar Bandi
Greenhorn

Joined: Feb 11, 2013
Posts: 1
Hi All,

I am new to MapReduce and am trying to explore it a little. I took the basic WordCount example and ran it over data stored in a MySQL table, and it gives a count of 34 for each individual record in the table. I assume the map function is being called 34 times for each record. Is there a way to control how many times the map function is called? Please let me know if I am missing something. Any help is appreciated.

Here is the code that I am using:

import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Iterator;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class WordCount1 {

    public static Connection Con;
    public static Statement statement = null;
    public static PreparedStatement preparedStatement = null;
    public static ResultSet resultSet = null;

    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);

        public static Text word = new Text();

        // Called by the framework once per line of the input file in args[0];
        // the key and value (that line) are never used below.
        public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            try {
                Class.forName("com.mysql.jdbc.Driver");
                Con = DriverManager.getConnection("jdbc:mysql://<<ip_add>>:3306/test", "user", "mysql");
                statement = Con.createStatement();

                // Re-reads the whole table on every map() call, so every row is
                // emitted once per input line.
                resultSet = statement.executeQuery("select * from test.a1");

                while (resultSet.next()) {
                    word.set(resultSet.getString("aname"));
                    output.collect(word, one);
                }
            } catch (ClassNotFoundException e) {
                System.out.println("no MYSQL Driver found");
                word.set(e.toString());
                output.collect(word, one);
            } catch (SQLException e) {
                word.set(e.toString());
                output.collect(word, one);
            }
        }
    }

    // Sums the 1s emitted for each key (also used as the combiner).
    public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }

            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount1.class);
        conf.setJobName("wordcount");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        // TextInputFormat produces one record (one map() call) per line of args[0].
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}





output:
name1 34
name2 34

Thanks in Advance,

Praveen K Bandi
Amruth Puppala
Ranch Hand

Joined: Jul 14, 2008
Posts: 295
Hi Praveen,

I'm also very new to Hadoop, but let me try to answer.

Hadoop is mainly meant for unstructured or semi-structured data; that is where its features pay off.
I'm not sure why you are reading the data from a database. Instead, you can read it from a file or some other source that the framework feeds to the job.
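A minimal, framework-free sketch of the "read from a file" route, assuming the a1 table has first been exported with one aname value per line (for example with MySQL's SELECT ... INTO OUTFILE). The class FileInputSketch, the temp-file name, and the sample rows are hypothetical; the loop only mimics how TextInputFormat hands map() one line at a time.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FileInputSketch {

    // Hypothetical export step: writes rows one per line, the layout
    // TextInputFormat turns into one record per line.
    static Path exportRows(List<String> rows) {
        try {
            Path file = Files.createTempFile("a1-export", ".txt");
            Files.write(file, rows);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the file line by line; each line plays the role of map()'s value,
    // and the per-name sum plays the role of the reducer.
    static Map<String, Integer> countNames(Path file) {
        Map<String, Integer> counts = new TreeMap<>();
        try {
            for (String line : Files.readAllLines(file)) {
                String name = line.trim();
                if (!name.isEmpty()) {
                    counts.merge(name, 1, Integer::sum);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }

    public static void main(String[] args) {
        Path file = exportRows(List.of("name1", "name2", "name1"));
        System.out.println(countNames(file)); // {name1=2, name2=1}
        file.toFile().delete();
    }
}
```

If the data has to stay in MySQL, Hadoop also ships DBInputFormat (package org.apache.hadoop.mapred.lib.db for the old API used above), which reads the table through the job configuration and passes each row to map() as the value; that route is not shown here.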


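Your understanding of your program is correct: map() is being called 34 times. With TextInputFormat, map() runs once for every line of the input file in args[0] (presumably 34 lines here), and every call queries the whole table again, so each name is counted 34 times.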
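To make the 34 concrete, here is a framework-free Java simulation, assuming the file passed as args[0] has 34 lines and the table holds the two rows from the output above. The class WhyThirtyFour and its helpers are hypothetical; they only mimic the loop the framework runs (one map() call per input line) followed by the reducer's sum.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WhyThirtyFour {

    // Stand-in for "select * from test.a1": the rows the mapper re-reads on every call.
    static final List<String> TABLE_ROWS = List.of("name1", "name2");

    // Mimics the original map(): ignores its input line and emits every table row.
    static List<String> mapIgnoringValue(String inputLine) {
        return new ArrayList<>(TABLE_ROWS);
    }

    // Simulated job: one map() call per input line, then a sum per key (the reduce step).
    static Map<String, Integer> run(int inputLineCount) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int line = 0; line < inputLineCount; line++) {
            for (String word : mapIgnoringValue("line " + line)) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(run(34)); // {name1=34, name2=34}
    }
}
```

Changing run(34) to run(1) gives a count of 1 per name, which shows that the count tracks the number of input lines rather than anything in the table.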

In the Hadoop framework, map() is called once for each input record.
Inside your map(), however, you always fetch the data from the database; map() should only work on the data the framework passes in.
Usually that is the Text value parameter, and the operation is performed on that value.
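A framework-free sketch of a map step that relies only on its value argument, in the spirit of the stock WordCount. The class ValueBasedWordCount and the sample records are hypothetical, and the run() loop stands in for the framework calling map() once per record and the reducer summing per key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class ValueBasedWordCount {

    // Map step: splits only the record it is given (the "value") into words.
    static List<String> map(String value) {
        List<String> words = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(value);
        while (tokenizer.hasMoreTokens()) {
            words.add(tokenizer.nextToken());
        }
        return words;
    }

    // Simulated framework: calls map() once per record, then sums per key (the reduce step).
    static Map<String, Integer> run(List<String> records) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String record : records) {
            for (String word : map(record)) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("name1", "name2", "name1 name2 name1")));
        // {name1=3, name2=2}
    }
}
```

In a real old-API Mapper the same loop would run over value.toString() and call word.set(tokenizer.nextToken()) followed by output.collect(word, one), as the stock WordCount example does.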

So configure the job input correctly instead, and don't connect to the database or loop over a ResultSet inside map().

Each record from the result set should then arrive in value; use that value to produce your output.

I hope this helps.

Regards
Amruth
 
I agree. Here's the link: http://aspose.com/file-tools
 
subject: mapreduce giving a wrong count
 
Similar Threads
Hadoop key mismatch
BMP, Jboss, and mysql configuring
Hadoop(Beginner Level Question)
SQLException: Invalid MySQL syntax
Concordancer