
Find the maximum occurrences of words in a file

 
Santhu Santhua
Greenhorn
Posts: 1
I have to parse through a file and count the occurrences of each word in it. Then I have to display the words and the number of times each one occurs in the file. I have to display only the 10 words with the most occurrences, in descending order (i.e. the word with the maximum occurrences should be displayed first).

I am using a BufferedReader to read the file and a StringTokenizer to split the words on spaces. I am also maintaining a Map to store the words and their counts: the key is the word and the value is the count. But how do I get the 10 words with the most occurrences and display them in descending order?
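A minimal sketch of the counting step described above, assuming Java 8 or later (for `Map.merge`). The class and method names `WordCount` and `countWords` are hypothetical, and words are counted exactly as they appear, so "The" and "the" (or "end." and "end") count as different words.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

public class WordCount {

    // Reads every line and counts each whitespace-separated token.
    // Case and punctuation are kept as-is (an assumption, not a requirement).
    public static Map<String, Integer> countWords(BufferedReader reader) throws IOException {
        Map<String, Integer> counts = new HashMap<>();
        String line;
        while ((line = reader.readLine()) != null) {
            // With no delimiter argument, StringTokenizer splits on
            // space, tab, newline, carriage return and form feed.
            StringTokenizer tokens = new StringTokenizer(line);
            while (tokens.hasMoreTokens()) {
                // merge() stores 1 for a new word, otherwise adds 1 to the old count
                counts.merge(tokens.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path of the file to analyse
        try (BufferedReader reader = new BufferedReader(new FileReader(args[0]))) {
            System.out.println(countWords(reader));
        }
    }
}
```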

Any help would be highly appreciated!

Thanks
San

 
Wim Vanni
Ranch Hand
Posts: 96
Eclipse IDE Java Oracle
Welcome to the Ranch, San!

Apparently you are already trained and/or experienced enough to use collections in your solution. It shouldn't be too hard to find a sorting algorithm to sort on the word count, and then your solution is complete. If you run into a problem implementing this, please post the code you already have and we'll see how we can get you back on track!

Cheers,
Wim
 
Rob Spoor
Sheriff
Posts: 21135
Chrome Eclipse IDE Java Windows
Separate the two problems. Focus first on getting the number of occurrences of each word. I think you already have an idea of how to solve that.

After that, focus on the other problem: getting some actual information out of it. I'd probably use a List<Map.Entry<String,Integer>> that stores each map entry, then sort it using a custom comparator.
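The list-plus-comparator approach could look roughly like this. It is a sketch, not the poster's code: `TopWordsBySort`, `topN` and `BY_COUNT_DESC` are hypothetical names, and breaking ties alphabetically is an added assumption so that the output is deterministic.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TopWordsBySort {

    // Highest count first; equal counts fall back to alphabetical order.
    static final Comparator<Map.Entry<String, Integer>> BY_COUNT_DESC =
            new Comparator<Map.Entry<String, Integer>>() {
                @Override
                public int compare(Map.Entry<String, Integer> a, Map.Entry<String, Integer> b) {
                    int byCount = b.getValue().compareTo(a.getValue()); // reversed: descending
                    return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
                }
            };

    // Copies the map entries into a list, sorts the whole list (O(n log n))
    // and returns at most the first n entries.
    public static List<Map.Entry<String, Integer>> topN(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        Collections.sort(entries, BY_COUNT_DESC);
        return new ArrayList<>(entries.subList(0, Math.min(n, entries.size())));
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("the", 5);
        counts.put("cat", 2);
        counts.put("dog", 3);
        counts.put("a", 2);
        // prints "the 5", "dog 3", "a 2" on separate lines
        for (Map.Entry<String, Integer> e : topN(counts, 3)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

For the original question, calling `topN(counts, 10)` on the map built while reading the file gives the ten most frequent words, most frequent first.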
 
Campbell Ritchie
Marshal
Posts: 56600
You don't need to sort a List to find its maximum value (sorting takes at least O(n log n) time). A single iteration through the List that records the largest value found so far runs in linear time, though it will not find both maxima in a bimodal distribution.
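A sketch of that single linear pass over the word counts; `MostFrequentWord` and `mostFrequent` are hypothetical names. Because only a strictly greater count replaces the current best, a tie for first place yields just one of the tied words, which is the bimodal limitation mentioned above.

```java
import java.util.HashMap;
import java.util.Map;

public class MostFrequentWord {

    // One pass over the entries, remembering the best entry seen so far: O(n).
    // Ties keep whichever entry was encountered first.
    public static Map.Entry<String, Integer> mostFrequent(Map<String, Integer> counts) {
        Map.Entry<String, Integer> best = null;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (best == null || e.getValue() > best.getValue()) {
                best = e;
            }
        }
        return best; // null when the map is empty
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("the", 5);
        counts.put("cat", 2);
        counts.put("dog", 3);
        System.out.println(mostFrequent(counts)); // prints the=5
    }
}
```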
 
Rob Spoor
Sheriff
Posts: 21135
Chrome Eclipse IDE Java Windows
But what if you want the 10 most frequently occurring words? That was one of the questions.
 
Campbell Ritchie
Marshal
Posts: 56600
Yes, I missed that bit about 10 words. Sorry.
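Not something proposed in the thread, but one way to combine the single-pass idea with the top-10 requirement is a bounded min-heap: keep only the k best entries seen so far in a `java.util.PriorityQueue`, which costs O(n log k) instead of sorting everything. This is a sketch under stated assumptions: `TopWordsByHeap`, `topK` and `WORST_FIRST` are hypothetical names, and ties are broken alphabetically.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class TopWordsByHeap {

    // "Worst" entry first: lowest count, and among equal counts the
    // alphabetically last word, so the heap head is always the next to evict.
    static final Comparator<Map.Entry<String, Integer>> WORST_FIRST =
            new Comparator<Map.Entry<String, Integer>>() {
                @Override
                public int compare(Map.Entry<String, Integer> a, Map.Entry<String, Integer> b) {
                    int byCount = a.getValue().compareTo(b.getValue());
                    return byCount != 0 ? byCount : b.getKey().compareTo(a.getKey());
                }
            };

    // Keeps at most k entries in a min-heap while scanning the map once.
    public static List<Map.Entry<String, Integer>> topK(Map<String, Integer> counts, int k) {
        List<Map.Entry<String, Integer>> result = new ArrayList<>();
        if (k <= 0) {
            return result;
        }
        PriorityQueue<Map.Entry<String, Integer>> heap = new PriorityQueue<>(k + 1, WORST_FIRST);
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            heap.offer(e);
            if (heap.size() > k) {
                heap.poll(); // drop the weakest of the k + 1 candidates
            }
        }
        // poll() yields the weakest entry first, so reverse to put the most frequent first
        while (!heap.isEmpty()) {
            result.add(heap.poll());
        }
        Collections.reverse(result);
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("the", 5);
        counts.put("cat", 2);
        counts.put("dog", 3);
        counts.put("a", 2);
        // prints "the 5", "dog 3", "a 2" on separate lines
        for (Map.Entry<String, Integer> e : topK(counts, 3)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

For a typical text file the map is small enough that sorting the whole list is perfectly fine; the heap mainly pays off when the number of distinct words is very large compared with k.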
 
It is sorta covered in the JavaRanch Style Guide.