
Word frequency program for Unicode

 
the shrink
Greenhorn
Posts: 1
I'm new to Java and am stuck on a problem:
I need to write a program that counts word frequencies in a file containing Urdu words (encoded in Unicode).
I'd appreciate any help getting started. I understand that the BufferedWriter, FileInputStream, and FileOutputStream classes would be very helpful.
I'd also appreciate tips on handling whitespace, newlines, punctuation marks, and end-of-file characters.
 
marc weber
Sheriff
Posts: 11343
"the shrink,"

Welcome to JavaRanch!

First, please revise your display name to meet the JavaRanch Naming Policy. To maintain the friendly atmosphere here at the ranch, we like folks to use real (or at least real-looking) names. You can edit your name here. Thank you for your prompt attention.

Now, with respect to your question, where exactly are you stuck? Can you post some of the code you have so far?

-Marc
 
Adam Richards
Ranch Hand
Posts: 135
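Since Java strings are Unicode (UTF-16 internally), the fact that the words are Urdu is irrelevant, as long as the file is decoded with the correct character encoding (for example, UTF-8) when it is read.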
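To make that concrete: the only place the encoding matters is when the file's bytes are turned into Strings. Below is a minimal sketch, assuming the input file is UTF-8 encoded; the class name ReadUtf8Lines and the method readLines are illustrative, not from this thread. It reads text through a Reader with an explicit charset rather than a raw FileInputStream, and it also shows that Java has no end-of-file character to look for: readLine() returns null when the input runs out.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class ReadUtf8Lines {
    // Reads every line of a text file, assumed (hypothetically) to be UTF-8.
    static List<String> readLines(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        // Decode explicitly as UTF-8 instead of relying on the platform default charset.
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            // No EOF character exists in Java: readLine() returns null at end of input.
            while ((line = reader.readLine()) != null) {
                lines.add(line); // line terminators (\n, \r\n) are already stripped
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Write a small Urdu sample to a temporary file, then read it back.
        Path file = Files.createTempFile("urdu", ".txt");
        Files.write(file, List.of("یہ ایک جملہ ہے", "دوسری سطر"), StandardCharsets.UTF_8);
        List<String> lines = readLines(file);
        System.out.println(lines.size());                     // prints 2
        System.out.println(lines.get(1).equals("دوسری سطر")); // prints true
        Files.delete(file);
    }
}
```

Once the lines are Strings, Urdu text is processed exactly like English text.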

A common approach to counting words (or any kind of substrings) is:

1. Create a HashMap<String, Integer>.
2. For each word in the text, if it is not yet in the map, add it with a count of 1; otherwise, increment its count.
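The two steps above can be sketched in Java roughly as follows. This is a minimal sketch, not code from the thread: the class name WordFrequency, the method countWords, and the rule "a word is anything between runs of Unicode whitespace or punctuation" are assumptions. That one regular expression also covers the original question about spaces, newlines, and punctuation, including Urdu marks such as the Arabic comma (U+060C), Arabic question mark (U+061F), and Arabic full stop (U+06D4), which all fall in the Unicode punctuation category \p{P}.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class WordFrequency {
    // Separators: any run of whitespace or punctuation. UNICODE_CHARACTER_CLASS
    // makes \s match all Unicode whitespace rather than only ASCII whitespace.
    private static final Pattern SEPARATORS =
            Pattern.compile("[\\s\\p{P}]+", Pattern.UNICODE_CHARACTER_CLASS);

    // Adds the words found in 'text' to the running counts.
    static void countWords(String text, Map<String, Integer> counts) {
        for (String word : SEPARATORS.split(text)) {
            if (word.isEmpty()) {
                continue; // split() yields "" when the text starts with a separator
            }
            // Step 2: put 1 if the word is absent, otherwise add 1 to its count.
            counts.merge(word, 1, Integer::sum);
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>(); // Step 1
        countWords("یہ کتاب اچھی ہے۔ یہ کتاب میری ہے۔", counts);
        System.out.println(counts.get("کتاب")); // prints 2
        System.out.println(counts.get("ہے"));   // prints 2
    }
}
```

For a whole file, calling countWords once per line read from a BufferedReader builds the totals. Map.merge is just a compact form of the "if absent, add it; else increment" check; an explicit containsKey/put pair would work the same way.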
 
Layne Lund
Ranch Hand
Posts: 3061
What have you done so far? At the very least, I assume you know to start with a main() method. Please post some code to illustrate what you have tried.

Layne
 