Comparing two lists

 
Ranch Hand
Posts: 99
1
I have two lists of book titles from two different bookstores. The same title can be written differently in each list, e.g. "For example" and "For - example": they denote the same book, but the strings are not equal.

That's why I wrote a stream that purifies the elements of each list (it deletes blank spaces and special characters) so that matching titles become equal: after the stream, both look like "forexample".
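A minimal sketch of such a purifying step, assuming that "special characters" means anything that is not a letter or a digit and that case should be ignored. The class and method names are made up for illustration; this is not the code originally posted.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

class TitlePurifier {

    // Lower-cases the title and drops every character that is not a letter or digit.
    static String purify(String title) {
        return title.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}\\p{Nd}]", "");
    }

    // Applies purify() to every title, keeping the list order.
    static List<String> purifyAll(List<String> titles) {
        return titles.stream()
                .map(TitlePurifier::purify)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Both spellings collapse to the same key.
        System.out.println(purifyAll(Arrays.asList("For example", "For - example")));
        // prints [forexample, forexample]
    }
}
```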


The problem is... I want to get ONE map that holds each title (from the original list) and the number of occurrences of that book (maximum 2 occurrences, default 1). I've written an algorithm that compares two titles and adds the title from the first bookstore to the map, but I also have to add the titles from the second bookstore, and I don't know how to get them.

To make it clear...

I compare each title from the first bookstore with every title from the second bookstore. If they are equal, I add +1, and when the for loop ends, I add the iterated title from the first bookstore with its number of occurrences. But what about titles from the second bookstore that occur only once? I know the index of the iterated title from the first bookstore, so I can get that title from the original list (with unpurified titles) using the .get(i) method, but I do not know the index of the iterated title from the second bookstore, so I cannot get its original title.

The only solution I see is to first compare each title from the first bookstore with every title from the second, and then compare each title from the second with every title from the first, but that is not an optimal solution... or to somehow unpurify the list.

To sum up, the map only contains titles from the first bookstore. How can I add the titles from the second bookstore that were omitted? I want the original titles in the map (e.g. the purified title is houseisbig, but the original is House - is big)! I compare using the purified lists and add the original titles.

The class:

 
Marshal
Posts: 65038
247
  • Likes 1
And when you have a book in the Economics category about changing currencies, “Forex - Ample”, what then?

I think you are trying to do too much all at once. Look at the length of your method names. Line 81, “for example”, and that isn't even the longest. What if you want twenty book titles, or you add “SciFi” as a book category? Why are you using so many methods with similar names when you could pass the category as an argument? Why are you using Strings for categories when you could create an enum?
I think you should divide that task into small parts. Start by creating a book title class with some sort of equals() method with a proper algorithm to compare titles.

Later: Please explain what you are doing with all the Lists and Maps. Have you got the right data structures?
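The suggestion above to pass the category as an argument and to use an enum instead of Strings could be sketched roughly as follows. The category names, the URLs and the `CategoryUrls` class are all placeholders invented for illustration; the real service would plug in its own bookstore URLs.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical categories and placeholder URLs, for illustration only.
enum BookCategory {
    GUIDES("https://first-store.example/guides", "https://second-store.example/guides"),
    CRIME("https://first-store.example/crime", "https://second-store.example/crime"),
    SCI_FI("https://first-store.example/scifi", "https://second-store.example/scifi");

    private final String firstStoreUrl;
    private final String secondStoreUrl;

    BookCategory(String firstStoreUrl, String secondStoreUrl) {
        this.firstStoreUrl = firstStoreUrl;
        this.secondStoreUrl = secondStoreUrl;
    }

    String getFirstStoreUrl() { return firstStoreUrl; }

    String getSecondStoreUrl() { return secondStoreUrl; }
}

class CategoryUrls {
    // One method that takes the category replaces one method per category.
    static List<String> urlsFor(BookCategory category) {
        return Arrays.asList(category.getFirstStoreUrl(), category.getSecondStoreUrl());
    }
}
```

With this shape, adding a category means adding one enum constant rather than another pair of similarly named methods.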
 
must Janik
Ranch Hand
Posts: 99
1

Campbell Ritchie wrote:And when you have a book in the Economics category about changing currencies, “Forex - Ample”, what then?



Then the purified String will be "forexample" (I see that I did not add `.toLowerCase()` to the stream) and in the list it should be "Forex - Ample".

Campbell Ritchie wrote:I think you are trying to do too much all at once. Look at the length of your method names. Line 81, “for example”, and that isn't even the longest. What if you want twenty book titles, or you add “SciFi” as a book category? Why are you using so many methods with similar names when you could pass the category as an argument? Why are you using Strings for categories when you could create an enum?



Tbh, I was wondering how to choose the proper implementation for a given category, and that is what I came up with. It is hard for me to come up with a new solution, as my methods for categories look like this: https://github.com/must1/BookstoreScraper/blob/master/src/main/java/bookstore/scraper/book/scrapingtypeservice/CategorizedBookService.java I'm using two different bookstores and need to pass two different URLs based on the given category. Is there any way to refactor it? I will create an enum for categories for sure, but that is the only part I know how to do.

Campbell Ritchie wrote:

I think you should divide that task into small parts. Start by creating a book title class with some sort of equals() method with a proper algorithm to compare titles.

Later: Please explain what you are doing with all the Lists and Maps. Have you got the right data structures?



I have a Book class which contains the title. Do you think I should create another entity class?
I think I've got the right data structures. I've got the titles in a list, and I want to get a Map with titles and occurrences.
 
Campbell Ritchie
Marshal
Posts: 65038
247

must Janik wrote:. . . you think I should create another entity class?

Probably not. But what I think you should do is give a good explanation of why you are using that particular structure.

I think, I've got right data structures. . . .

You will only know that when you have explained what you are going to use them for.

There are suggestions about Maps and counting in  this thread.
 
must Janik
Ranch Hand
Posts: 99
1

Campbell Ritchie wrote:

must Janik wrote:. . . you think I should create another entity class?

Probably not. But what I think you should do is give a good explanation of why you are using that particular structure.



Tbh, probably because I use Jsoup to retrieve a detailed book containing the title, author etc. That's why I wanted to retrieve the title from the Book instance.

You will only know that when you have explained what you are going to use them for.



Do not know what you are asking exactly. I just wanted to keep that information (title, occurrences) in map, that's all.

Do you have any idea how I can refactor the category service to shorten everything up as you said, and then get the result I want in the ranking service? I'm stuck. I want to count the titles that are repeated in both lists and then merge them. Example:




the result should be
 
Campbell Ritchie
Marshal
Posts: 65038
247
As I said, I think your book class wants an equals() method, and the other things that go with equals(), so you can define equality, so your books “toj est”, “tojest”, and “to jest” count as the same. Then, maybe, combine the two Lists and stream the combined List. Remember, it will only work if you have overridden equals(), etc, correctly. I am not joking.
 
must Janik
Ranch Hand
Posts: 99
1

Campbell Ritchie wrote:As I said, I think your book class wants an equals() method, and the other things that go with equals(), so you can define equality, so your books “toj est”, “tojest”, and “to jest” count as the same. Then, maybe, combine the two Lists and stream the combined List. Remember, it will only work if you have overridden equals(), etc, correctly. I am not joking.



But I also need to replace special characters like ".", "-" etc. Should I do that in the equals() method inside the Book entity as well? It seems to be harder than I thought :/
 
Campbell Ritchie
Marshal
Posts: 65038
247
  • Likes 1

must Janik wrote:. . . . I should do that also in equals() method inside Book entity?

Probably. I think that is what I would do. If the book objects are immutable, it should be possible to have a second field for a plain simple title, minus double spaces and hyphens.

It seems to be harder than I thought :/

Sorry. You seem to have run into an awkward so‑and‑so who delights in finding flaws in people's programs. Maybe I am that awkward person.
Maybe it will be easier if you make the book class immutable and have a secondary field for the sanitised title. Use that secondary field in the equals() method, but be sure to explain how it works in the documentation comments. Make sure to think carefully through what you want to do before you try it.
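A minimal sketch of that idea, assuming the sanitised title ignores case and every character that is not a letter or digit (that rule is an assumption and should be adjusted to whatever is decided for the real data). Note that hashCode() is overridden together with equals() and uses the same field.

```java
import java.util.Locale;
import java.util.Objects;

/**
 * Immutable book. Two books are considered equal when their sanitised
 * titles match, so "to jest" and "tojest" count as the same book.
 */
final class Book {
    private final String title;          // title as scraped from the bookstore
    private final String sanitisedTitle; // derived once, used by equals/hashCode

    Book(String title) {
        this.title = Objects.requireNonNull(title);
        // Assumption: ignore case and anything that is not a letter or digit.
        this.sanitisedTitle = title.toLowerCase(Locale.ROOT)
                .replaceAll("[^\\p{L}\\p{Nd}]", "");
    }

    String getTitle() { return title; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Book)) return false;
        return sanitisedTitle.equals(((Book) o).sanitisedTitle);
    }

    @Override
    public int hashCode() {
        // Must agree with equals(), otherwise HashMap and HashSet misbehave.
        return sanitisedTitle.hashCode();
    }
}
```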
 
must Janik
Ranch Hand
Posts: 99
1
Okay. I will try to do that.

Do you have any idea how to improve the code in the category service so that it does not repeat ge15BooksFromGuides, From Crimes etc.?
 
Saloon Keeper
Posts: 3407
149
Is that purified title strong enough to base the 'equals' method on it?

But to make a frequencymap, you do not need to override 'equals'. Assuming your Book class has a method 'String getPurifiedTitle', then a frequencymap of two Book lists can be made like:
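A sketch of what such a frequency map could look like, assuming a Book class with `getTitle()` and `getPurifiedTitle()`. The purification rule and the `FrequencyMapDemo` class are assumptions for illustration.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class Book {
    private final String title;

    Book(String title) { this.title = title; }

    String getTitle() { return title; }

    // Assumed rule: lower-case, letters and digits only.
    String getPurifiedTitle() {
        return title.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}\\p{Nd}]", "");
    }
}

class FrequencyMapDemo {
    // Key: purified title. Value: number of books in both lists sharing it.
    static Map<String, Long> frequencies(List<Book> list1, List<Book> list2) {
        return Stream.of(list1.stream(), list2.stream())
                .flatMap(Function.identity()) // one Stream<Book> over both lists
                .collect(Collectors.groupingBy(Book::getPurifiedTitle,
                                               Collectors.counting()));
    }
}
```

No equals() override is involved: books are grouped by the String key returned from `getPurifiedTitle()`, and `Collectors.counting()` counts the members of each group.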

 
must Janik
Ranch Hand
Posts: 99
1

Piet Souris wrote:Is that purified title strong enough to base the 'equals' method on it?

But to make a frequencymap, you do not need to override 'equals'. Assuming your Book class has a method 'String getPurifiedTitle', then a frequencymap of two Book lists can be made like:



I do not understand this solution at all... The purified titles of two Strings, for example "Ta - la" and "ta.la", would be "tala" for both.
The part about the frequency map is hard for me to understand.
 
Piet Souris
Saloon Keeper
Posts: 3407
149
  • Likes 1
Well,

As I wrote, it seems handy to add a method 'getPurifiedTitle()' to the Book class.

If you have a List with two books, with the titles in your example, then they have the same purified title, and thus, if you group the list based on that purified title, you get a count of 2. That is what I did, and as you see, for this the books do not have to be equal. Just having the same purified title is enough.

The Stream.of(list1.stream(), list2.stream()... concatenates the two lists list1 and list2 in the form of a Stream, so that you do not need to merge the two lists.

All in all, it is just a concise way to get your frequency count of two lists.
 
Campbell Ritchie
Marshal
Posts: 65038
247
Good point, Piet, that you can create the Map from the sanitised title. Please supply a bit more information about the collect() call, because it is obviously something unfamiliar to MJ.
MJ: Make sure the sanitised title retains enough information to distinguish it from other titles. Consider retaining single spaces, but not multiple spaces. Have a look at this recent thread, which tells you a bit more about frequencies. Once you have a Stream, it is very easy to limit it to a maximum size.
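As a rough illustration of the two points above, here is one possible sanitiser that keeps single spaces (so "For example" and "Forex - Ample" stay distinct) together with a stream limited to a maximum size. The exact rule (punctuation becomes a space, whitespace runs collapse to one space) and all names are assumptions.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

class SpaceKeepingSanitiser {

    // Keeps word boundaries: punctuation turns into a space, runs of
    // whitespace collapse into a single space, case is ignored.
    static String sanitise(String title) {
        return title.toLowerCase(Locale.ROOT)
                .replaceAll("[^\\p{L}\\p{Nd}\\s]", " ")
                .trim()
                .replaceAll("\\s+", " ");
    }

    // At most max distinct sanitised titles, in encounter order.
    static List<String> firstDistinct(List<String> titles, int max) {
        return titles.stream()
                .map(SpaceKeepingSanitiser::sanitise)
                .distinct()
                .limit(max)
                .collect(Collectors.toList());
    }
}
```

The trade-off is that titles differing only in where the spaces fall, such as "to jest" and "tojest", no longer match under this rule.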
 
must Janik
Ranch Hand
Posts: 99
1
All right, so here is what I understood.
I have the method getPurifiedTitle(), which returns the purified form of the iterated title. Each title is compared by that value with the other titles in the stream, and if the same purified title is found I get a count of 2. The question is: does it save the original title or the purified one, and if the original, which one, from the first list or the second list? Because, as I said, the original titles can be written differently but still be the same (see the example in my previous post).
 
Campbell Ritchie
Marshal
Posts: 65038
247

must Janik wrote:. . . it is saving original title or the purified

Since you are calling a getPurifiedTitle() method, it must be the purified/sanitised title, not the original.

if so, which one, from first list or second list? . . .

Does that matter? You will get the same result whichever order you count the lists in. Since the counts are independent of the order of counting, you will get the same result from both orders.
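A small sketch to check that claim, counting plain title Strings with the lists concatenated in either order. The purification rule and the sample titles are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class CountOrderDemo {

    // Assumed rule: lower-case, letters and digits only.
    static String purify(String title) {
        return title.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}\\p{Nd}]", "");
    }

    // Counts purified titles over first followed by second.
    static Map<String, Long> count(List<String> first, List<String> second) {
        return Stream.concat(first.stream(), second.stream())
                .collect(Collectors.groupingBy(CountOrderDemo::purify,
                                               Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("To jest", "House - is big");
        List<String> b = Arrays.asList("to.jest");
        // Same keys and same counts whichever list comes first.
        System.out.println(count(a, b).equals(count(b, a))); // prints true
    }
}
```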

If you are doing something where encounter order matters, you must be careful which order you concatenate the two Streams.
 
must Janik
Ranch Hand
Posts: 99
1

Campbell Ritchie wrote:

must Janik wrote:. . . it is saving original title or the purified

Since you are calling a getPurifiedTitle() method, it must be the purified/sanitised title, not the original.



So it is not what I wanted. I already have a stream that does the same thing.

I want to have the original title, or at least a title which is readable, not purified. What can I do with a purified title which looks like "purifiedtitle"? It is not readable and looks ugly in the map.
 
Piet Souris
Saloon Keeper
Posts: 3407
149
  • Likes 1
To clarify what I was doing: I supposed that your Book class looks like this:
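A minimal sketch of a Book class of that kind, assuming the purified title is derived once in the constructor. The field names, the purification rule and the toString format are assumptions, not the listing as originally posted.

```java
import java.util.Locale;

final class Book {
    private final String title;         // the real title, as scraped
    private final String purifiedTitle; // derived from title at construction

    Book(String title) {
        this.title = title;
        // Assumed rule: lower-case, letters and digits only.
        this.purifiedTitle = title.toLowerCase(Locale.ROOT)
                .replaceAll("[^\\p{L}\\p{Nd}]", "");
    }

    String getTitle() { return title; }

    String getPurifiedTitle() { return purifiedTitle; }

    @Override
    public String toString() {
        return "Book[title=" + title + ", purifiedTitle=" + purifiedTitle + "]";
    }
}
```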

So now you do not need extra methods to derive those purified titles from books. I think this is just a convenient way of doing it.

Hopefully this makes my example of grouping books by their purified titles a little clearer. However, when grouping your books on that purified title, there are many things that you can put in each group. I give three examples:
That will give you the number of books in each group, i.e. a frequencycount.
This will give you a List of all the Books in each group. If you print these lists, you will see a List of Books in the form of their toString method.
The groups now contain the true titles of the books in each group.
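The three groupings described above could be sketched like this, assuming a Book class with `getTitle()` and `getPurifiedTitle()`. The `GroupingExamples` class and the purification rule are assumptions for illustration.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

final class Book {
    private final String title;

    Book(String title) { this.title = title; }

    String getTitle() { return title; }

    // Assumed rule: lower-case, letters and digits only.
    String getPurifiedTitle() {
        return title.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}\\p{Nd}]", "");
    }
}

class GroupingExamples {

    // 1. Number of books in each group, i.e. a frequency count.
    static Map<String, Long> counts(List<Book> books) {
        return books.stream().collect(
                Collectors.groupingBy(Book::getPurifiedTitle, Collectors.counting()));
    }

    // 2. All the Books in each group.
    static Map<String, List<Book>> booksPerGroup(List<Book> books) {
        return books.stream().collect(
                Collectors.groupingBy(Book::getPurifiedTitle));
    }

    // 3. The real titles of the books in each group.
    static Map<String, List<String>> titlesPerGroup(List<Book> books) {
        return books.stream().collect(
                Collectors.groupingBy(Book::getPurifiedTitle,
                        Collectors.mapping(Book::getTitle, Collectors.toList())));
    }
}
```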

See the API of the Collectors class, to see what more you can do with this grouping. Experiment as much as you can, since it is a very convenient way of reporting characteristics of collections.
 
must Janik
Ranch Hand
Posts: 99
1
Thanks Piet for your time. Appreciate it, but it is not what I was asking or I misunderstood something.
 
Piet Souris
Saloon Keeper
Posts: 3407
149
  • Likes 1
Then I apologize for my misunderstanding. I based my posts on this part of one of your replies:


Now, I thought you were mistaken here, since you are grouping based on the purified title but reporting the real titles. Now I understand why you asked which list the title should be taken from! That question wasn't clear to me at the time.

Well, this is also doable, but we must add yet another Map (this is certainly great training in Maps!).

Suppose we have that last map that I described, i.e. a Map<purified title, List<real title>>, so to speak. Now, we may assume that the first real title in each List is coming from list 1, except when a Book is only present in list 2. So what we can do is take all the values of that last map, and turn them into a Map themselves, like:
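A sketch of that two-step approach, assuming a Book class with `getTitle()` and `getPurifiedTitle()`. The `TitleOccurrences` class and the purification rule are assumptions for illustration. Because list 1 is streamed first and the grouping keeps encounter order for a sequential stream, the first element of each group comes from list 1 whenever the book occurs there.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class Book {
    private final String title;

    Book(String title) { this.title = title; }

    String getTitle() { return title; }

    // Assumed rule: lower-case, letters and digits only.
    String getPurifiedTitle() {
        return title.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}\\p{Nd}]", "");
    }
}

class TitleOccurrences {

    // Key: a real (readable) title. Value: how many books share its purified form.
    static Map<String, Integer> count(List<Book> list1, List<Book> list2) {
        // Step 1: purified title -> real titles, list1 entries first.
        Map<String, List<String>> titlesByPurified =
                Stream.of(list1.stream(), list2.stream())
                        .flatMap(Function.identity())
                        .collect(Collectors.groupingBy(Book::getPurifiedTitle,
                                Collectors.mapping(Book::getTitle, Collectors.toList())));

        // Step 2: first real title of each group -> size of the group.
        return titlesByPurified.values().stream()
                .collect(Collectors.toMap(titles -> titles.get(0), List::size));
    }
}
```

For example, with list 1 holding "To jest" and "House - is big" and list 2 holding "to.jest" and "Only second", the map contains "To jest" with 2 and the other two titles with 1 each.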
And, please, don't tell me I STILL misunderstood!    

(PS: but if so, can you again explain your intentions?)
 
must Janik
Ranch Hand
Posts: 99
1
Yes! It is working!
Thanks a lot!
 
Piet Souris
Saloon Keeper
Posts: 3407
149
You're welcome!
 
Campbell Ritchie
Marshal
Posts: 65038
247
Mentioned in despatches?

Congratulations! For starting a thread quoted in the July 2019 CodeRanch Journal, you have been awarded a cow.
 