A friendly place for programming greenhorns!
Code For Review: Find Nth Rarest Element
(0 likes, 3 cows)
Hi folks,

I sat a technical test yesterday for a tech company.  I can't divulge the company's name but I did take note of the Java question I was asked to complete.  My solution passed only one of three test cases and failed the performance test.  If the people on here don't mind, I'd like to post my solution and get some feedback.  Efficient algorithms and array parsing has never been my strong suit - something I'm looking to rectify.  My code has comments to explain my thinking.

The problem: presented with an integer array, write a method that will find the nth rarest element in the array.  For example, consider an integer array [3, 2, 3, 2, 3, 1].  The second rarest element in the array is 2 as it occurs just twice.

And what the answer supposed to be in your given example? 1 or 5 because both are 4th rarest.

However, your code gives answer 4, I'd say 4 isn't rarest, but rather most occurred one.

4 - four occurrences
3 - three occurrences
2 - two occurrences
1 and 5 - by 1 occurrence

Ok, I see now. I misinterpreted. It seems your logic for identifying rare one is correct, ok, so we understand rules now.
Yes, sorry, I should have clarified what was being sought there.

Just looking at my code, the thing that jumps out at me is that it's very long.  There must be a more efficient and creative way of doing this without writing so much code.
(1 like)
Here's shorter version I just came up, but I believe it can be shortened further by omitting list of occurrences and apparently the part about optional answer by dealing only with sortedMap. Plus you can build initial map with streams. Challenge for you as I didn't come up in fast.

Or somebody else I'm sure will do so. I'm not that yet proficient with streams.

(1 like)
For instance

Piet Souris wrote:

I absolutely love that statement, Can you explain how the collect and grouping work ? I am really curious to learn that!
hi Salvin,

took me more than a year to get such a statement in less than one hour trying... but practice and a lot of patience (and ugly words) were my friend...

Arrays.stream(array) makes a stream from a primitive array, which is here an array of ints
But we're dealing with primitives here, while the Map has Integer as key, so with .boxed I transform the ints to Integers.

Then we collect the result: see the API of Stream.collect(), and especially the API of the Collectors class. First we specify that we want a grouping, in order to make the Map. Next we specify what keys we want to have, in this case the keys are the boxed Integers themselves, so I use the identity Function i -> i, and lastly we specify what exactly do we want to report about the keys. Here we want to count them, and so we get Collectors.counting() (mind you, that gives us a Long!), and there we have our frequency table.

Edit: this is such a case where a couple of static imports can save a lot of typing!
Thanks for that explanation Piet, keep up the good work.

Do let me know if my understanding is correct :

The collect method requires a Collector. The Collectors class has a few utility methods to return a Collector. One such method is the groupingBy.
The groupingBy method has a couple of variants. Here, we are using a classifier, downStream variant.
The "classifier" is a Function.In this example, we can use Function.identity() or simply i-> i
The "downStream" (I don't like this name) is a Collector. The groupingBy internally uses this Collector to collect the count of occurrences.
(1 like)
Spot on!

Indeed one of the hurdles to take is the namegiving in the API's. Mind you, the downstream Collector is a Collector, and Collectors.counting() is such a Collector. Very interesting is also the Collector Collectors.mapping() that you can also use. For instance, we have now a Map<Integer, Long>, being the frequency table. But suppose we want a grouping by the frequencies, putting keys with equal frequency into a List? How about:

where 'frequencies(list)' is a method that makes, as before, a frequency map from the List. But as said, it took me a lot of trouble to get a little aquainted with this stuff.

Edit: I did use a static impot here!
(1 like)
Thanks Piet.

I hope OP won't mind that I'll throw one or two questions to front row.

Alrigth, my initial idea was to have something like this, so I got details in agreement now after stealing groupBy part from Piet (it didn't work when I tried, because instead Long I had Integer):
So, my sortedMap now contains {1=1, 5=1, 2=2, 3=3, 4=4}. What else I wanted to do (before I went dirty way), I wanted to remove duplicates by values and leave with the very first "key" which was part of duplicate series.

Is that even possible? So after I do some XYZ thing, I'd end up with {1=1, 2=2, 3=3, 4=4}. Prefer not to anything else apart from sortedMap. Is there a way you could think of?
(1 like)
I see what you mean. No, I did not find yet a solution with LinkedHashMap, but I made something likewise, namely if two Integers have the same frequency, then the order will be the order in which they appear in the array. For that I had to introduce a List. Getting more and more complex, and further away from the intention of the exercise! Hmm... time to stop.
Some great code there, Liutauras and Piet.  Thanks very much.  I'm finding much of it difficult to follow as I'm not anywhere near proficient on streams or lambdas but it's given me something to work on.
(1 like)
You're welcome! We are a bit hobbying, but while doing so one silently hopes that it all has some educational value, and for us it is also some experimenting and learning as well. Topics like this one are always much fun! Thanks for starting and enjoy a cow!
(1 like)

Simon Ritchie wrote:I'm finding much of it difficult to follow as I'm not anywhere near proficient on streams or lambdas but it's given me something to work on.

To be honest to me it also looked very confusing and complicated some time (not much) ago. By no means I'm proficient either.

To get started, I could recommend couple of books, well, phrase to get started maybe not correct, authors give an opportunity not just to start but reach some advanced level.

1. Functional Programming in Java by Venkat Subramaniam
2. Functional Programming in Java by Pierre-Yves Saumont Saumont

Note: my defined ordering has no significance.
Liutauras, can I ask specifically what's going on with this line of code from your first example?

Liutauras Vilda wrote:

I was looking at the Collectors API for the toMap() method but I can't see an example there that takes four parameters.  My understanding of this line of code is that it's saying something like:

"Get every key and value in this Map, then map the key value onto a parameter called k, then return the results in a LinkedHashMap object"

Simon Ritchie wrote:I was looking at the Collectors API for the toMap() method but I can't see an example there that takes four parameters.

it is one before the last one. It goes before function toSet() in your given link.

"Get every key and value in this Map, then map the key value onto a parameter called k, then return the results in a LinkedHashMap object"

Idea is that, but you shifted I think everything to the right by one step. Mapping happens on very first 2 parameters, so it takes key from Entry and maps to key, then takes value from the Entry and maps to value, then the bit (k, v) -> k, ... is a function (mergeFunction), solving collisions of possible duplicated keys. Meaning, if there is such existing key with some value X already, replace the value (most recent) and assign it to key, or remove in case there is null. Read more about that bit here (nice explanation). And the last bit yes - just adds the result to a newly created LinkedHashMap. Linked one that's because trying to solve your exercise I wanted to preserve insertion order after I sorted numbers by their occurrence frequency. And then I stuck on the last bit, how to remove keys whose frequencies are possibly duplicated (in your given data it was a case {1=1, 5=1}) with some other numbers. To remove duplicates I wanted for the reason to be able to identify Nth rarest occurrence. Well, but stuck But Piet is right, these kind of threads as yours are good training material to learn something new.
I took your various solutions, along with my non-Java8 version, and I'm getting inconsistent results. The result should have been 2 or 4.

Carey Brown wrote:I took your various solutions, along with my non-Java8 version, and I'm getting inconsistent results. ...

There's a tiny bug in the code...

Fixed code:
With that patch I now get:
Why '7'? 7 has a count of 4 and is the third rarest. (unless I'm misunderstanding the requirements.)

1 and 5 are the first
3 and 6 are the second
7 is the third
2 and 4 are the fourth
I know that my initial code has a subtle bug , knew that on the same day, and waited till somebody spots that. It gave correct answer because of the OP’s input. Because I retrieve nth frequency (value) instead of the key. You can verify that by looking to my answer, which is the frequency of 4.

Sould be fairly easy fix. Will fix once pc on.

But Piet, how come you got it wrong?

Liutauras Vilda wrote:Because I retrieve nth frequency (value) instead of the key. You can verify that by looking to my answer, which is the frequency of 4.

The problem: presented with an integer array, write a method that will find the nth rarest element in the array.  For example, consider an integer array [3, 2, 3, 2, 3, 1].  The second rarest element in the array is 2 as it occurs just twice.

He says "element" not frequency. I take that to mean the key not the value.
I start counting from 0, as is not uncommon in computerland, and so I return 7 when n = 4.

But guys, this topic is one of those topics where the journey is much more interesting than the destination, let's not spoil it over some interpretation differences or minor bugs.

Carey Brown wrote:He says "element" not frequency. I take that to mean the key not the value.

Yes, this is what I mean. While mine initial code retrieved nth frequency, which was the same with OP's input, because there were, 1, 2, 2, 3, 3, 3, .... And then I spotted that bug on the same day, but didn't tell anybody anything yet, now that you said, I had no other choice just to admit
It looks more of an interpretation issue, I rarely ever see a spec which says get the nth rarest item. My code does a occurrence search. This means that it would search the item occurring the nth time. (I recall even calling it occurrence).
One thing we all got correct (I think so) was the occurrence map. Here's mine

To get the 4th rarest, I guess it would have to be sorted as per occurrence rate ascending:
• 1=1,      -> 1st (most) rarest
• 5=1,      -> 2nd rarest
• 3=3,      -> 3rd rarest
• 6=3,      -> 4th rarest
• 7=4,      -> 5th rarest
• 2=5,      -> 6th rarest
• 4=5      -> 7th (least) rarest

• This brings me to the answer either 3 or 6  Am I right ?

Here's my fixed version.

Answer is 2 because have findFirst() on line 20.

salvin francis wrote:To get the 4th rarest, I guess it would have to be sorted as per occurrence rate ascending:
1=1,      -> 1st (most) rarest
5=1,      -> 2nd rarest
3=3,      -> 3rd rarest
6=3,      -> 4th rarest
7=4,      -> 5th rarest
2=5,      -> 6th rarest
4=5      -> 7th (least) rarest

Nope. How you prioritize same frequency value? Which one is more rare than the other - they are indeed the same. That means, if 2 numbers have same frequency, they count as they were same Nth rare.

Addition: In which case, on Carey's given input we supposed to provide also 2 answers, because there are 2 Nth numbers with same frequency.
In which case the modification of my code could look like:

Answer printed is 2 and 4.

salvin francis wrote:There's a tiny bug in the code...

salvin, I like how we programmers are so protective about our code. One may think there is a big bug, but no! Programmer said there is a tiny bug, that means tiny and no further discussions

Liutauras Vilda wrote:salvin, I like how we programmers are so protective about our code.

It's a skillset one earns after countless debates with QA Some bugs are bugs. Some bugs are undiscovered surprise features.
Congratulations Simon Ritchie,