
Parsing files and storing them in a data type

Naziru Gelajo
Ranch Hand
Posts: 175
Hello, I am working on a project that parses data from two .dat files: one containing various movies and one containing separate ratings for those movies. The overall goal of the project is to predict ratings for movies that a particular user has not seen yet.

The ratings are predicted by first computing the similarity between pairs of movies using cosine similarity, and then using those movie similarities to compute the predicted rating.
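For reference, here is one common formulation of item-based cosine similarity and the weighted-sum prediction that often follows it. The thread does not state which exact formulas this exercise expects (variants such as adjusted cosine or restricting the sum to the k most similar movies are also used), so treat these as an assumption. Here $r_{u,i}$ is user $u$'s rating of movie $i$, $U_i$ is the set of users who rated movie $i$, and $N(u)$ is the set of movies user $u$ has rated. Treating a missing rating as 0 is why the numerator only runs over users who rated both movies.

$$\operatorname{sim}(i,j) = \frac{\sum_{u \in U_i \cap U_j} r_{u,i}\, r_{u,j}}{\sqrt{\sum_{u \in U_i} r_{u,i}^2}\;\sqrt{\sum_{u \in U_j} r_{u,j}^2}}$$

$$\hat{r}_{u,i} = \frac{\sum_{j \in N(u)} \operatorname{sim}(i,j)\, r_{u,j}}{\sum_{j \in N(u)} \lvert \operatorname{sim}(i,j) \rvert}$$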

My problem stems from parsing the files and storing the data in a suitable data type. Ideally I want to use a HashMap, because of its efficient lookups, to store each user's movie ratings, so it is something like this:

Map<userID, Map<itemID, rating>>
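A minimal sketch of that nested structure, assuming integer IDs and double ratings; the class and method names are hypothetical. The outer map is declared with the Map interface, and computeIfAbsent creates a user's inner map the first time that user appears.

```java
import java.util.HashMap;
import java.util.Map;

public class RatingStore {
    // userID -> (itemID -> rating); ID and rating types are assumptions
    private final Map<Integer, Map<Integer, Double>> ratingsByUser = new HashMap<>();

    /** Records one rating, creating the user's inner map on first use. */
    public void addRating(int userId, int itemId, double rating) {
        ratingsByUser.computeIfAbsent(userId, id -> new HashMap<>()).put(itemId, rating);
    }

    /** Returns the user's ratings, or an empty map if the user has none. */
    public Map<Integer, Double> ratingsOf(int userId) {
        return ratingsByUser.getOrDefault(userId, Map.of());
    }

    public static void main(String[] args) {
        RatingStore store = new RatingStore();
        store.addRating(1, 10, 4.0);
        store.addRating(1, 20, 3.0);
        store.addRating(2, 10, 5.0);
        System.out.println(store.ratingsOf(1).size());    // 2
        System.out.println(store.ratingsOf(3).isEmpty()); // true
    }
}
```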

The problem is that the cosine similarity has to be computed in a different method. So I first parse both files (the movie file and the ratings file), and then I attempt to compute the cosine similarity for all the movie items based on their respective ratings. I don't know whether this explanation is clear or vague, but here is what I have done thus far:

I am a bit confused: I have parsed the data, but now the problem is computing the similarity with the cosine similarity algorithm. Am I on the right track?
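Because the ratings are stored per user while item-based cosine similarity compares one movie's ratings with another's, one possible next step is to invert the user map into a per-movie map before computing similarities. This is an assumption about the intended design, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class RatingIndex {
    /** Inverts userID -> (itemID -> rating) into itemID -> (userID -> rating). */
    public static Map<Integer, Map<Integer, Double>> byItem(Map<Integer, Map<Integer, Double>> byUser) {
        Map<Integer, Map<Integer, Double>> result = new HashMap<>();
        for (Map.Entry<Integer, Map<Integer, Double>> user : byUser.entrySet()) {
            for (Map.Entry<Integer, Double> rating : user.getValue().entrySet()) {
                // one inner map per movie, keyed by the user who rated it
                result.computeIfAbsent(rating.getKey(), id -> new HashMap<>())
                      .put(user.getKey(), rating.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, Map<Integer, Double>> byUser =
                Map.of(1, Map.of(10, 4.0, 20, 3.0), 2, Map.of(10, 5.0));
        Map<Integer, Map<Integer, Double>> byItem = byItem(byUser);
        System.out.println(byItem.get(10).size());  // 2
        System.out.println(byItem.get(20).get(1));  // 3.0
    }
}
```

Each value in the inverted map is then one movie's rating vector, ready to be compared with another movie's.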

Fred Kleinschmidt
Bartender
Posts: 571
You should never use Math.pow() to compute the square of a number. Just use x*x instead.
In your computeCosineSimilarity() method, what do you think will happen if movieA is longer than movieB?
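Since the original code isn't shown in the thread, this is only a guess at the shape of an array-based version: a minimal sketch applying both of Fred's points, squaring with x*x and refusing vectors of different lengths instead of reading past the end of the shorter one. The class name is hypothetical, and index k is assumed to stand for the same user in both arrays.

```java
public class ArrayCosine {
    /**
     * Cosine similarity of two rating vectors of equal length:
     * dot(a, b) / (|a| * |b|). Index k must refer to the same user in both arrays.
     */
    public static double cosineSimilarity(double[] a, double[] b) {
        if (a.length != b.length) {
            // A loop that runs to a.length while reading b[k] would throw
            // ArrayIndexOutOfBoundsException whenever a is longer than b.
            throw new IllegalArgumentException("rating vectors must have the same length");
        }
        double dot = 0.0;
        double sumSqA = 0.0;
        double sumSqB = 0.0;
        for (int k = 0; k < a.length; k++) {
            dot += a[k] * b[k];
            sumSqA += a[k] * a[k]; // x*x instead of Math.pow(x, 2)
            sumSqB += b[k] * b[k];
        }
        if (sumSqA == 0.0 || sumSqB == 0.0) {
            return 0.0; // assumed convention: similarity to an all-zero vector is 0
        }
        return dot / Math.sqrt(sumSqA * sumSqB);
    }

    public static void main(String[] args) {
        System.out.println(cosineSimilarity(new double[] {1, 2}, new double[] {2, 1})); // 0.8
    }
}
```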

Naziru Gelajo
Ranch Hand
Posts: 175
Fred Kleinschmidt wrote:You should never use Math.pow() to compute the square of a number.  Just use x*x instead.
In your computeCosineSimilarity() method, what do you think will happen if movieA is longer than movieB?

Well, that could be a problem if movieA is longer than movieB, so I'm thinking about using a HashMap instead of an array.
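A minimal sketch of the HashMap idea, assuming each movie is stored as a hypothetical Map<Integer, Double> from user ID to rating. A user who didn't rate a movie is simply a missing key (equivalent to a 0 entry in the vector), so the two maps can have different sizes without any index going out of bounds.

```java
import java.util.Map;

public class MapCosine {
    /** Cosine similarity of two movies, each given as userID -> rating. */
    public static double cosineSimilarity(Map<Integer, Double> movieA, Map<Integer, Double> movieB) {
        double dot = 0.0;
        for (Map.Entry<Integer, Double> entry : movieA.entrySet()) {
            Double other = movieB.get(entry.getKey());
            if (other != null) { // only users who rated both movies add to the dot product
                dot += entry.getValue() * other;
            }
        }
        double sumSqA = sumOfSquares(movieA);
        double sumSqB = sumOfSquares(movieB);
        if (sumSqA == 0.0 || sumSqB == 0.0) {
            return 0.0; // assumed convention for a movie with no (non-zero) ratings
        }
        return dot / Math.sqrt(sumSqA * sumSqB);
    }

    private static double sumOfSquares(Map<Integer, Double> ratings) {
        double sum = 0.0;
        for (double r : ratings.values()) {
            sum += r * r;
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<Integer, Double> movieA = Map.of(1, 5.0, 2, 3.0, 3, 4.0);
        Map<Integer, Double> movieB = Map.of(1, 4.0, 2, 3.0, 3, 5.0);
        System.out.println(cosineSimilarity(movieA, movieB)); // 0.98
    }
}
```

With the sample values in main, the dot product is 49 and both sums of squares are 50, giving 49/50.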

Liutauras Vilda
Sheriff
Posts: 4917
The parseMovieFile() and parseRatingFile() methods look almost identical. Consider adding a parameter for the file path and removing the duplicated method.
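One way to follow this suggestion is a single method that takes the file path (plus, as a hypothetical extra, the delimiter) and passes each split line to a caller-supplied handler, so the movie-specific and rating-specific logic lives in the handler rather than in two copies of the reading loop. The names and the "userID|itemID|rating" line format are assumptions; for brevity it reads the whole file with Files.readAllLines, which keeps every line in memory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.regex.Pattern;

public class DatFileParser {
    /**
     * Splits every non-blank line of the file on the given regex and passes the
     * fields to the handler. Works for both the movie file and the rating file.
     */
    public static void parseFile(Path path, String delimiterRegex, Consumer<String[]> handler)
            throws IOException {
        Pattern delimiter = Pattern.compile(delimiterRegex);
        for (String line : Files.readAllLines(path)) { // readAllLines decodes as UTF-8
            if (!line.trim().isEmpty()) {
                handler.accept(delimiter.split(line));
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical "userID|itemID|rating" lines written to a temporary file
        Path tmp = Files.createTempFile("ratings", ".dat");
        Files.write(tmp, List.of("1|10|4.0", "2|10|5.0"));
        List<String[]> rows = new ArrayList<>();
        parseFile(tmp, "\\|", rows::add);
        System.out.println(rows.size());    // 2
        System.out.println(rows.get(1)[2]); // 5.0
        Files.delete(tmp);
    }
}
```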

Liutauras Vilda
Sheriff
Posts: 4917
All the comments you wrote are redundant too. There is no point in writing such comments; they just make the code longer and harder to read.

Naziru Gelajo
Ranch Hand
Posts: 175
Liutauras Vilda wrote:parseMovieFile() and parseRatingFile() methods look almost identical. Consider adding parameter for file path and removing one duplicated method.

Noted; however, one data set is far larger than the other, hence the separate methods for parsing the movie file and the rating file.

Knute Snortum
Sheriff
Posts: 4276
Some minor notes on your code:

1) Always code to the interface.  So prefer, for example: Map<String, Double> ratings = new HashMap<>();

2) Use try-with-resources when the object you're reading from is AutoCloseable:
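Both notes fit in one small sketch: the map is declared as Map and created as HashMap, and the reader is opened in a try-with-resources statement so it is closed even if a line fails to parse. It takes a Reader so it can be tried without a file (with a real file one might pass Files.newBufferedReader(path)); the class name and the "itemID|rating" line format are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

public class RatingReader {
    /** Reads "itemID|rating" lines into a map keyed by item ID. */
    public static Map<Integer, Double> readRatings(Reader source) throws IOException {
        Map<Integer, Double> ratings = new HashMap<>(); // declared as the interface
        // try-with-resources closes the reader even if parsing throws
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines
                }
                String[] fields = line.split("\\|");
                ratings.put(Integer.parseInt(fields[0].trim()), Double.parseDouble(fields[1].trim()));
            }
        }
        return ratings;
    }

    public static void main(String[] args) throws IOException {
        Map<Integer, Double> ratings = readRatings(new StringReader("10|4.0\n20|3.5\n"));
        System.out.println(ratings.size());   // 2
        System.out.println(ratings.get(20));  // 3.5
    }
}
```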

Campbell Ritchie
Marshal
Posts: 56536
Why not use a Scanner to read the file? You can use nextInt or similar to find the ID, and nextLine to find the title as the remainder of the line. Use "\\|" as your delimiter. You can pass a line to a Scanner's constructor and use that Scanner object to parse the line.
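A sketch of the Scanner approach for a single line, assuming a hypothetical "ID|title|..." layout; the class names are illustrative. This version reads the title with next() rather than nextLine(): after nextInt() the scanner sits just before the following |, so nextLine() would return the rest of the line starting with that delimiter (plus any later fields).

```java
import java.util.Scanner;

public class MovieLineScanner {
    /** Minimal hypothetical holder for a parsed movie line. */
    public static final class Movie {
        final int id;
        final String title;

        Movie(int id, String title) {
            this.id = id;
            this.title = title;
        }
    }

    /** Parses one line such as "1|Toy Story (1995)|..." using '|' as the delimiter. */
    public static Movie parseMovieLine(String line) {
        try (Scanner scanner = new Scanner(line).useDelimiter("\\|")) {
            int id = scanner.nextInt();    // first field
            String title = scanner.next(); // second field; spaces survive because only '|' delimits
            return new Movie(id, title);
        }
    }

    public static void main(String[] args) {
        Movie m = parseMovieLine("1|Toy Story (1995)");
        System.out.println(m.id + " -> " + m.title); // 1 -> Toy Story (1995)
    }
}
```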
If this isn't a learning exercise, consider a CSV reading library instead of creating your own method.
Don't call the array in line 18 tokenDelimiter; that is a misleading name.
Your method for reading the ratings into a Map (line 46) is probably incorrect.
You have already been told the comments are redundant. I would say, though, that the cosine similarity method ought to have a comment (preferably a /** */ Javadoc comment) explaining the algorithm. Then people can verify that your code actually implements that algorithm.
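As an illustration of that kind of Javadoc comment, here is a sketch of the rating-prediction step whose comment states the formula it implements, so a reader can check the loop against it. The weighted-average formula, the parameter layout, and the NaN convention are assumptions rather than anything given in the thread.

```java
import java.util.Map;

public class RatingPredictor {
    /**
     * Predicts a user's rating of a target movie as a similarity-weighted average:
     *
     *   predicted = sum_j sim(target, j) * r(u, j) / sum_j |sim(target, j)|
     *
     * where j ranges over the movies the user has rated.
     *
     * @param userRatings        movieID -> rating, for the movies the user has rated
     * @param similarityToTarget movieID -> cosine similarity between that movie and the target
     * @return the predicted rating, or Double.NaN if every similarity is 0 or missing
     */
    public static double predictRating(Map<Integer, Double> userRatings,
                                       Map<Integer, Double> similarityToTarget) {
        double weightedSum = 0.0;
        double weightTotal = 0.0;
        for (Map.Entry<Integer, Double> rated : userRatings.entrySet()) {
            double sim = similarityToTarget.getOrDefault(rated.getKey(), 0.0);
            weightedSum += sim * rated.getValue();
            weightTotal += Math.abs(sim);
        }
        return weightTotal == 0.0 ? Double.NaN : weightedSum / weightTotal;
    }

    public static void main(String[] args) {
        Map<Integer, Double> userRatings = Map.of(1, 4.0, 2, 2.0);
        Map<Integer, Double> sims = Map.of(1, 0.5, 2, 0.5);
        System.out.println(predictRating(userRatings, sims)); // 3.0
    }
}
```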

Naziru Gelajo
Ranch Hand
Posts: 175
So, I have made some modifications, but my cosineSimilarity computation is not working properly when checked against some values given to me for this exercise. I am not sure what I'm doing wrong. Is my formula somehow wrong?

I know that some people mentioned removing the comments, so I attempted to do so. What I'm not sure about is the math: the cosine similarity computation is not working properly.

Knute Snortum
Sheriff
Posts: 4276
my cosineSimilarity computation is not working properly

It's hard to diagnose a problem from a statement like this.  What output do you get?  What do you expect?
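One way to answer "what do you expect?" is to work a tiny case by hand and compare it with the method's output. With made-up ratings from three users who rated both movies, (5, 3, 4) for movie i and (4, 3, 5) for movie j:

$$\operatorname{sim}(i,j) = \frac{5\cdot 4 + 3\cdot 3 + 4\cdot 5}{\sqrt{5^2+3^2+4^2}\;\sqrt{4^2+3^2+5^2}} = \frac{49}{\sqrt{50}\,\sqrt{50}} = \frac{49}{50} = 0.98$$

Identical rating vectors should give exactly 1, and a result outside [0, 1] for non-negative ratings points to a bug in the formula.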
