must Janik

Ranch Hand
since Jul 04, 2018

Recent posts by must Janik

Bump. Really looking forward to seeing what I did wrong. Thanks.
1 week ago
BookstoreScraper is an application based on Spring Boot. It scrapes data from the two biggest Polish online bookstores, EMPIK and MERLIN. You can scrape books in 3 ways:

bestsellers
most precise book (you give a title and it looks for the closest matching book)
categorized book (currently 5 categories are available: BIOGRAPHY, CRIME, GUIDES, FANTASY, ROMANCES)
There is also a ranking option. It compares the books from each bookstore, and if a title is repeated, the book is placed higher in the ranking.
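
The ranking idea can be sketched roughly as below. This is only an illustration under stated assumptions, not the project's actual CategorizedBookRankingService: the class name `TitleRankingSketch` and the method `rank` are hypothetical, and titles are compared as plain strings without any purification.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: counts how often each title appears across two
// bookstores and orders the titles from most to least frequent.
class TitleRankingSketch {

    static Map<String, Integer> rank(List<String> empikTitles, List<String> merlinTitles) {
        // First pass: title -> number of occurrences, in first-seen order.
        Map<String, Integer> counts = Stream.concat(empikTitles.stream(), merlinTitles.stream())
                .collect(Collectors.toMap(Function.identity(), t -> 1, Integer::sum, LinkedHashMap::new));

        // Second pass: sort by count, highest first; ties keep first-seen order.
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder()))
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new));
    }

    public static void main(String[] args) {
        List<String> empik = Arrays.asList("Book A", "Book B", "Book C");
        List<String> merlin = Arrays.asList("Book C", "Book D");
        // prints {Book C=2, Book A=1, Book B=1, Book D=1}
        System.out.println(rank(empik, merlin));
    }
}
```

A `LinkedHashMap` is used so that the ranking order survives in the returned map; a plain `HashMap` would lose it.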

There is also a history system which tracks every action of the logged-in user, as simple security is provided.

There are a lot of classes, so I will paste just the most important ones. I will also paste the GitHub link so you can check out the rest and tell me about the structure and the other unit tests that are not posted here.

Let's start from the class responsible for scraping data from the site:

MerlinSource, which implements the BookServiceSource interface. I'm not going to paste EmpikSource as it looks really similar.



BookService - it fetches the results from the scraping classes and wraps them in a Map.



CategorizedBookRankingService - as I said, this is the service responsible for comparing books and creating the ranking.



BookController



Let's go further to the Account stuff.

AccountService - creates users.



LoggedAccountService - retrieves the id of the logged-in account




MerlinUrlProperties - I have a .yml file with the corresponding URLs.



JSoupConnector - creates a connection to the URL and returns the document



HistorySystemService - service responsible for fetching account history and saving account history.



TEST SECTION

AccountServiceTest



MerlinSourceTest


CategorizedBookRankingService



BookServiceTest



I also wanted to post MerlinUrlPropertiesTest and AccountHistorySystemServiceTest, but I couldn't as the post was too long.

That's all for now. Looking forward to hearing your opinions. Thanks.

2 weeks ago

Dave Tolls wrote:Since you have mocked out the List then there isn't really a List sat behind it at all.
Why don't you just use a real List?



You mean that I should pass the list via the constructor?
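
A minimal sketch of what "use a real List" could look like, assuming the service receives its sources through the constructor. All names here (`SourceSketch`, `BookServiceSketch`, `FixedSource`) are hypothetical stand-ins, not the classes from the project, and the stub is written by hand so the example runs without a mocking library.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the BookServiceSource interface from the post.
interface SourceSketch {
    String name();
    List<String> booksByCategory(String category);
}

// Hypothetical service: the sources arrive through the constructor,
// so a test can hand in a real List instead of mocking List itself.
class BookServiceSketch {
    private final List<SourceSketch> sources;

    BookServiceSketch(List<SourceSketch> sources) {
        this.sources = sources;
    }

    // Collects the books of every source under that source's name.
    Map<String, List<String>> booksByCategory(String category) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (SourceSketch source : sources) {
            result.put(source.name(), source.booksByCategory(category));
        }
        return result;
    }
}

// Hand-written stub that always returns the same books, whatever the category.
// A mocking library could produce an equivalent object.
class FixedSource implements SourceSketch {
    private final String name;
    private final List<String> books;

    FixedSource(String name, List<String> books) {
        this.name = name;
        this.books = books;
    }

    @Override public String name() { return name; }
    @Override public List<String> booksByCategory(String category) { return books; }
}

class BookServiceSketchDemo {
    public static void main(String[] args) {
        // A real list holding two stubbed sources.
        List<SourceSketch> sources = Arrays.<SourceSketch>asList(
                new FixedSource("EMPIK", Arrays.asList("Book A")),
                new FixedSource("MERLIN", Arrays.asList("Book B")));
        System.out.println(new BookServiceSketch(sources).booksByCategory("CRIME"));
    }
}
```

With this shape only the individual sources are stubbed; the list is an ordinary list, so its size in the test is whatever the test puts into it.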
2 weeks ago
I'm trying to unit test the method responsible for adding categorized books to a map.



BookServiceSource is an interface. This interface is implemented by two classes. I'm going to provide just one, as the second is really similar.

EmpikSource (one of the implementations)



JsoupConnector:




Properties class:




While debugging the test I see that the size of sources is 0. How should I add a mocked object to the sources list, or could you tell me if there is a better way to do this?

Test



If adding elements to the list is a bad choice, how should it be done?

Greetings,
Peter
2 weeks ago
DO NOT REVIEW IT. I MADE BIG CHANGES, SO THIS CODE IS NOT UP TO DATE

I'm developing my bookscraper to improve my coding, and I think I have a solid structure that I want reviewed by people more experienced than me. It is an app based on Spring Boot which fetches data from 2 different bookstores. For now it has 3 options for fetching books.

1) Bestsellers

2) Book with a given title, e.g. you say you want the book with the title "Great boy" and it gives you the matching book from each bookstore with its title, price and link

3) Categorized books (currently for 5 categories, e.g. crimes, biographies etc.)

I have created a ranking for categorized books which takes 15 books from each bookstore, then counts how many times a given title is repeated (if, for example, twice, then it is higher in the ranking) and returns the result as a Map<String,Integer> of title to number of occurrences.

I have integration and unit tests. In my test resources I put 3 .html files for each bookstore, one for each type of fetching (bestsellers, categorized book, most precise book).

Here is the link to GitHub, because I'm not going to put all the classes here as there are a lot of them: https://github.com/must1/BookstoreScraper

I also added CI with Travis and SonarCloud.

JsoupConnector (responsible for connecting to the given url)



MerlinUrlProperties (all links are stored in .yml file), same class for second bookstore



MerlinFetchingBookService (fetches data from the net)



CategorizedBookRankingService (responsible for ranking for given category)



CategorizedBookService



MostPreciseBookService



BestsellersService



TESTS

CategorizedBooksRankingServiceTest



MerlinFetchingBookService



BestsellersServiceTest




Those are the main classes that I want reviewed. Thanks a lot for every suggestion/opinion!

2 weeks ago
Yes! It is working!
Thanks a lot!
3 weeks ago
Thanks Piet for your time. I appreciate it, but it is not what I was asking, or I misunderstood something.
3 weeks ago

Campbell Ritchie wrote:

must Janik wrote:. . . it is saving original title or the purified

Since you are calling a getPurifiedTitle() method, it must be the purified/sanitised title, not the original.



So it is not what I wanted. I already have a stream that does the same thing.

I want to have the original title, or at least a title which is readable, not the purified one. What can I do with a purified title which looks like "purifiedtitle"? It is not readable and looks ugly in the map.
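
One possible way to keep a readable title in the map is sketched below: count under the purified key, but remember the first original title seen for that key and use it in the result. The class `ReadableTitleCountSketch` and its `purify` rule (lower-case, then drop everything that is not a letter or digit) are assumptions for illustration, not the project's getPurifiedTitle().

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Stream;

// Hypothetical sketch: groups titles by a purified key, reports them
// under the first original (readable) spelling that was seen.
class ReadableTitleCountSketch {

    // Assumed purification rule: lower-case, keep only letters and digits.
    static String purify(String title) {
        return title.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}\\p{N}]", "");
    }

    static Map<String, Integer> countByReadableTitle(List<String> first, List<String> second) {
        Map<String, String> displayTitle = new LinkedHashMap<>(); // purified -> first original seen
        Map<String, Integer> counts = new LinkedHashMap<>();      // purified -> occurrences

        Stream.concat(first.stream(), second.stream()).forEach(title -> {
            String key = purify(title);
            displayTitle.putIfAbsent(key, title);
            counts.merge(key, 1, Integer::sum);
        });

        // Swap the purified keys for their readable spelling.
        Map<String, Integer> result = new LinkedHashMap<>();
        counts.forEach((key, count) -> result.put(displayTitle.get(key), count));
        return result;
    }

    public static void main(String[] args) {
        // prints {Ta - la=2, Other=1}
        System.out.println(countByReadableTitle(
                Arrays.asList("Ta - la", "Other"), Arrays.asList("ta.la")));
    }
}
```

Because the first list is streamed first, the spelling from the first list wins whenever both lists contain the same book.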
3 weeks ago
All right, so here is what I understood.
I have the method getPurifiedTitle(), which returns the purified title of the book being iterated over. The stream compares it with the other titles, and if it finds the same purified title I get a count of 2. The question is: is it saving the original title or the purified one, and if the original, which one, from the first list or the second list? Because as I said, the original titles can be written differently but still be the same (see the example in my previous post).
3 weeks ago

Piet Souris wrote:Is that purified title strong enough to base the 'equals' method on it?

But to make a frequencymap, you do not need to override 'equals'. Assuming your Book class has a method 'String getPurifiedTitle', then a frequencymap of two Book lists can be made like:
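
The snippet that followed is not preserved here. A minimal sketch of such a frequency map, assuming a book class with a `getPurifiedTitle()` method, might look like the following; `BookSketch`, `FrequencyMapSketch` and the purification rule are hypothetical and not Piet's original code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical book class; only the title matters for this sketch.
class BookSketch {
    private final String title;

    BookSketch(String title) {
        this.title = title;
    }

    // Assumed rule: lower-case, keep only letters and digits.
    String getPurifiedTitle() {
        return title.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}\\p{N}]", "");
    }
}

class FrequencyMapSketch {

    // Joins both lists into one stream and counts how many books share
    // each purified title. No equals() override is needed for this.
    static Map<String, Long> frequencies(List<BookSketch> first, List<BookSketch> second) {
        return Stream.concat(first.stream(), second.stream())
                .collect(Collectors.groupingBy(BookSketch::getPurifiedTitle, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<BookSketch> first = Arrays.asList(new BookSketch("Ta - la"), new BookSketch("Other"));
        List<BookSketch> second = Arrays.asList(new BookSketch("ta.la"));
        System.out.println(frequencies(first, second));
    }
}
```

`groupingBy` puts every book with the same purified title into one bucket, and `counting()` reduces each bucket to its size, so the map keys are purified titles and the values are occurrence counts.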



I do not understand this solution at all. The purified title of two Strings, for example "Ta - la" and "ta.la", would be "tala" for both.
The part about the frequency map is hard for me to understand.
3 weeks ago
Okay. I will try to do that.

Do you have any idea how to improve the code in the category service, so as not to repeat ge15BooksFromGuides, From Crimes etc.?
3 weeks ago

Campbell Ritchie wrote:As I said, I think your book class wants an equals() method, and the other things that go with equals(), so you can define equality, so your books “toj est”, “tojest”, and, “to jest” count as the same. Then, maybe, combine the two Lists and stream the combined List. Remember, it will only work if you have overridden equals(), etc, correctly. I am not joking.



But I also need to replace special characters like ".", "-" etc. Should I do that in the equals() method inside the Book entity as well? It seems to be harder than I thought :/
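
One way to keep equals() simple is to do the cleanup once, in the constructor, and let equals() and hashCode() compare only the cleaned form. The sketch below assumes a separate, hypothetical `BookTitleSketch` value class and an assumed normalisation rule (lower-case, keep only letters and digits); it is not the project's Book entity.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical value class: two titles are equal when their
// normalised forms match, whatever the original punctuation or spacing.
final class BookTitleSketch {
    private final String original;
    private final String normalized;

    BookTitleSketch(String original) {
        this.original = original;
        // Assumed rule: lower-case, then drop spaces, dots, dashes and
        // anything else that is not a letter or digit.
        this.normalized = original.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}\\p{N}]", "");
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof BookTitleSketch)) return false;
        return normalized.equals(((BookTitleSketch) other).normalized);
    }

    // Must agree with equals(): equal titles need equal hash codes.
    @Override
    public int hashCode() {
        return normalized.hashCode();
    }

    // The readable spelling is kept for display.
    @Override
    public String toString() {
        return original;
    }

    public static void main(String[] args) {
        Set<BookTitleSketch> titles = new HashSet<>(Arrays.asList(
                new BookTitleSketch("toj est"),
                new BookTitleSketch("tojest"),
                new BookTitleSketch("to jest")));
        System.out.println(titles.size()); // prints 1
    }
}
```

The trade-off is the one raised in the thread: any rule this aggressive will also merge titles that differ only in punctuation or spacing but are really different books.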
3 weeks ago

Campbell Ritchie wrote:

must Janik wrote:. . . you think I should create another entity class?

Probably not. But what I think you should do is give a good explanation of why you are using that particular structure.



Tbh, probably that's why: using Jsoup I retrieve a detailed book containing title, author etc. That's why I wanted to retrieve the title from the Book instance.

You will only know that when you have explained what you are going to use them for.



I do not know what exactly you are asking. I just wanted to keep that information (title, occurrences) in a map, that's all.

Do you have any idea how I can refactor the category service to shorten everything up as you said, and then get the result I want in the ranking service? I'm stuck. I want to count the titles that are repeated in both lists and then merge them. Example:




the result should be
3 weeks ago

Campbell Ritchie wrote:And when you have a book in the Economics category about changing currencies, “Forex - Ample”, what then?



Then the purified String will be "forexample" (I see that I did not add the `.toLowerCase()` call to the stream) and in the list it should be "Forex - Ample"

Campbell Ritchie wrote:I think you are trying to do too much all at once. Look at the length of your method names. Line 81, “for example”, and that isn't even the longest. What if you want twenty book titles, or you add “SciFi” as a book category? Why are you using so many methods with similar names when you could pass the category as an argument? Why are you using Strings for categories when you could create an enum?



Tbh, I was wondering how I can choose the proper implementation for a given category, and this is what I came up with. It is hard for me to come up with a new solution, as my methods for categories look like this: https://github.com/must1/BookstoreScraper/blob/master/src/main/java/bookstore/scraper/book/scrapingtypeservice/CategorizedBookService.java I'm using two different bookstores and need to pass two different URLs based on the given category. Is there any way to refactor it? I will create an enum for the categories for sure, but that is as much as I know how to do.
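
A rough sketch of the "pass the category as an argument" idea: one enum for the categories, and one URL lookup per bookstore keyed by that enum. Everything here is hypothetical, including the names `CategoryType` and `CategoryUrlsSketch` and the placeholder `example.com` URLs; in the project the URLs would come from the .yml properties instead.

```java
import java.util.EnumMap;
import java.util.Map;

// The five categories mentioned in the thread.
enum CategoryType { BIOGRAPHY, CRIME, GUIDES, FANTASY, ROMANCES }

// Hypothetical per-bookstore lookup: category -> URL.
class CategoryUrlsSketch {
    private final Map<CategoryType, String> urls = new EnumMap<>(CategoryType.class);

    // Returns this so registrations can be chained.
    CategoryUrlsSketch register(CategoryType category, String url) {
        urls.put(category, url);
        return this;
    }

    // One method for every category, instead of one method per category.
    String urlFor(CategoryType category) {
        String url = urls.get(category);
        if (url == null) {
            throw new IllegalArgumentException("No URL configured for " + category);
        }
        return url;
    }

    public static void main(String[] args) {
        // Placeholder URLs, not the real bookstore addresses.
        CategoryUrlsSketch empik = new CategoryUrlsSketch()
                .register(CategoryType.CRIME, "https://example.com/empik/crime")
                .register(CategoryType.FANTASY, "https://example.com/empik/fantasy");
        CategoryUrlsSketch merlin = new CategoryUrlsSketch()
                .register(CategoryType.CRIME, "https://example.com/merlin/crime");

        System.out.println(empik.urlFor(CategoryType.CRIME));
        System.out.println(merlin.urlFor(CategoryType.CRIME));
    }
}
```

Adding a new category then means adding one enum constant and one URL per bookstore, while the service method that takes a `CategoryType` stays unchanged.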

Campbell Ritchie wrote:

I think you should divide that task into small parts. Start by creating a book title class with some sort of equals() method with a proper algorithm to compare titles.

Later: Please explain what you are doing with all the Lists and Maps. Have you got the right data structures?



I have a Book class which contains the title; you think I should create another entity class?
I think I've got the right data structures. In the lists I've got titles, and I want to get a Map with titles and occurrences.
3 weeks ago