posted 7 months ago

Hello, I am working on a project that parses data from two .dat files (one listing movies and one containing ratings for those movies). The goal of the project is to predict the ratings a particular user would give to movies that user has not seen yet.

Ratings are predicted in two steps: first compute the cosine similarity between movies, then use those similarities to compute the predicted rating.
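To make the second step concrete, here is a rough sketch of one common way to turn similarities into a predicted rating: a similarity-weighted average of the ratings the user has already given. This is an assumption about the approach, not necessarily the formula the exercise requires, and the class, method, and parameter names are hypothetical.

```java
import java.util.Map;

class RatingPredictionSketch {
    /**
     * Predicts a rating for one target movie.
     *
     * @param userRatings        movieId -> rating this user already gave
     * @param similarityToTarget movieId -> cosine similarity between that movie and the target
     * @return the similarity-weighted average, or NaN if no rated movie has a known similarity
     */
    static double predict(Map<Integer, Double> userRatings,
                          Map<Integer, Double> similarityToTarget) {
        double weightedSum = 0.0;
        double similaritySum = 0.0;
        for (Map.Entry<Integer, Double> entry : userRatings.entrySet()) {
            Double similarity = similarityToTarget.get(entry.getKey());
            if (similarity == null) {
                continue; // no similarity known for this movie, so it contributes nothing
            }
            weightedSum += similarity * entry.getValue();
            similaritySum += Math.abs(similarity);
        }
        return similaritySum == 0.0 ? Double.NaN : weightedSum / similaritySum;
    }
}
```

Returning NaN when nothing overlaps is just one choice; falling back to the user's average rating would be another.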

My problem stems from parsing the files and storing the data in a suitable data structure. Ideally I want to use a HashMap, for its efficiency, to store each user's rated movies, so it is something like this:

Map<userID, HashMap<itemID, rating>>
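A minimal sketch of that nested structure, assuming integer user and movie IDs and double ratings. The class and method names are hypothetical and only meant to illustrate the shape of the data.

```java
import java.util.HashMap;
import java.util.Map;

class RatingStore {
    // userId -> (movieId -> rating)
    private final Map<Integer, Map<Integer, Double>> ratingsByUser = new HashMap<>();

    /** Records one rating, creating the inner map the first time a user is seen. */
    void addRating(int userId, int movieId, double rating) {
        ratingsByUser.computeIfAbsent(userId, id -> new HashMap<>()).put(movieId, rating);
    }

    /** Returns the stored rating, or null if the user has not rated that movie. */
    Double getRating(int userId, int movieId) {
        Map<Integer, Double> ratings = ratingsByUser.get(userId);
        return ratings == null ? null : ratings.get(movieId);
    }
}
```

For item-to-item similarity it may be more convenient to also keep the inverse view (movieId -> userId -> rating), so that all ratings for one movie can be fetched in a single lookup.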

The complication is that the cosine similarity has to be computed in a separate method. So I first parse both files (movies and ratings), then attempt to compute the cosine similarity for all the movies based on their ratings. I do not know whether this explanation is clear, but here is what I have done thus far:

I am a bit confused: I have parsed the data, and the problem now is computing the similarity with the cosine similarity algorithm. Am I on the right track?

Fred Kleinschmidt

Bartender

Posts: 571

9

posted 7 months ago

You should never use Math.pow() to compute the square of a number. Just use x*x instead.

In your computeCosineSimilarity() method, what do you think will happen if movieA is longer than movieB?

posted 7 months ago

Fred Kleinschmidt wrote: You should never use Math.pow() to compute the square of a number. Just use x*x instead.

In your computeCosineSimilarity() method, what do you think will happen if movieA is longer than movieB?

Well, that could be a problem if movieA is longer than movieB, so I'm thinking about using a HashMap instead of an array.
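A rough sketch of what a map-based version could look like, assuming each movie is represented as a map from user ID to rating and that a missing rating counts as 0. The class and helper names are hypothetical; only the formula, dot(a, b) / (|a| * |b|), is the standard definition of cosine similarity.

```java
import java.util.Map;

class CosineSimilaritySketch {
    /**
     * Cosine similarity between two movies: dot(a, b) / (|a| * |b|).
     * Each map is userId -> rating for one movie. Assumption: a user who is
     * absent from a map is treated as having rated that movie 0.
     */
    static double cosine(Map<Integer, Double> movieA, Map<Integer, Double> movieB) {
        double dot = 0.0;
        for (Map.Entry<Integer, Double> entry : movieA.entrySet()) {
            Double other = movieB.get(entry.getKey());
            if (other != null) {
                dot += entry.getValue() * other; // only users who rated both movies contribute
            }
        }
        double squaresA = sumOfSquares(movieA);
        double squaresB = sumOfSquares(movieB);
        if (squaresA == 0.0 || squaresB == 0.0) {
            return 0.0; // avoid dividing by zero when a movie has no ratings
        }
        return dot / (Math.sqrt(squaresA) * Math.sqrt(squaresB));
    }

    /** Sum of r * r over all ratings, using plain multiplication rather than Math.pow(). */
    static double sumOfSquares(Map<Integer, Double> ratings) {
        double sum = 0.0;
        for (double rating : ratings.values()) {
            sum += rating * rating;
        }
        return sum;
    }
}
```

Because the lookups are keyed by user ID, the two movies no longer need the same number of ratings. Whether the norms should cover all ratings (as here) or only the users who rated both movies is a design choice, and the two variants give different numbers.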

posted 7 months ago

Liutauras Vilda wrote: parseMovieFile() and parseRatingFile() methods look almost identical. Consider adding parameter for file path and removing one duplicated method.

Noted; however, one data set is far larger than the other, which is why there are separate methods for movie file parsing and rating file parsing.

posted 7 months ago

Some minor notes on your code:

1) Always code to the interface. So prefer, for example:

`Map<String, Double> ratings = new HashMap<>();`

2) Use try-with-resources when the object you're reading from is autocloseable:
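As a rough illustration of both notes, here is a sketch that reads a file inside a try-with-resources block and declares its result by interface type (List rather than ArrayList). The class and method names are hypothetical, and the choice to skip blank lines is an assumption about the data files.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class LineReaderSketch {
    /** Reads all non-blank lines from the given file; the reader is closed automatically. */
    static List<String> readLines(Path file) throws IOException {
        List<String> lines = new ArrayList<>(); // declared as List, the interface
        try (BufferedReader reader = Files.newBufferedReader(file)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.trim().isEmpty()) {
                    lines.add(line);
                }
            }
        } // reader.close() runs here, even if readLine() throws
        return lines;
    }
}
```

Since the path is a parameter, the same method could serve both the movie file and the ratings file, regardless of their sizes.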

All things are lawful, but not all things are profitable.

Campbell Ritchie

Marshal

Posts: 56536

172

posted 7 months ago

Why not use a Scanner to read the file? You can use nextInt or similar to find the ID, and nextLine to find the title as the remainder of the line. Use "\\|" as your delimiter. You can pass a line to a Scanner's constructor and use that Scanner object to parse the line.
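A small sketch of that idea, assuming each movie line looks like `id|title` (possibly followed by more `|`-separated fields). The real file format is not shown in this thread, so treat the layout and the class name as assumptions. This version uses next() for the title, since with "\\|" as the delimiter nextLine() would return the rest of the line including the leading `|`.

```java
import java.util.Scanner;

class MovieLineSketch {
    final int id;
    final String title;

    MovieLineSketch(int id, String title) {
        this.id = id;
        this.title = title;
    }

    /** Parses one line of the assumed form "id|title" or "id|title|more fields". */
    static MovieLineSketch parse(String line) {
        try (Scanner scanner = new Scanner(line)) {
            scanner.useDelimiter("\\|"); // the pipe is a regex metacharacter, so it must be escaped
            int id = scanner.nextInt();
            String title = scanner.next();
            return new MovieLineSketch(id, title.trim());
        }
    }
}
```

A malformed line makes nextInt() or next() throw (InputMismatchException or NoSuchElementException), so the caller would need to decide whether to skip such lines or fail.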

If this isn't a learning exercise, consider a CSV reading library instead of creating your own method.

Don't call the array in line 18 tokenDelimiter; that is a misleading name.

Your method for reading the ratings into a Map (line 46) is probably incorrect.

You have already been told the comments are redundant. I would say that the cosine similarity method ought to have a comment (preferably /** type */) to explain the algorithm. Then people can verify that your code actually implements that algorithm.

posted 7 months ago

So, I have made some modifications, but my cosine similarity computation does not reproduce some of the values I was given for this exercise. I am not sure what I'm doing wrong. Is my formula somehow wrong?

I know some people suggested removing the comments, so I have tried to do that. The part I'm unsure about is the math: the cosine similarity computation is not working properly.
