Forums:

A more idiomatic (Kotlin) way of creating an inverted index

Junilu Lacar

Sheriff

Posts: 17644

300

I like...

posted 3 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

I've been working on one of the Kotlin projects at hyperskill.org, Simple Search Engine, and the second to last stage has a bit of a challenging requirement to create an inverted index. Here's what that's about:

Given a list of strings that represent entries in a phone book (one entry per line)

You're supposed to create an inverted index, basically a Map<String, List<Int>> where the key are tokens extracted from the entries, i.e. each first name, last name, and email is a separate token, and the values are lists of indices of entries that contain that token.

Here's an example of my program running with the above input:
=== Menu === 1. Find a person 2. Print all people 3. Print index 0. Exit 2 === 6 people indexed === Dwight Joseph djo@gmail.com Rene Webb webb@gmail.com Katie Jacobs Erick Harrington harrington@gmail.com Myrtle Medina Erick Burgess === Menu === 1. Find a person 2. Print all people 3. Print index 0. Exit 3 dwight=[0] joseph=[0] djo@gmail.com=[0] rene=[1] webb=[1] webb@gmail.com=[1] katie=[2] jacobs=[2] erick=[3, 5] harrington=[3] harrington@gmail.com=[3] myrtle=[4] medina=[4] burgess=[5]

(continued...)

JUnit 4 (verified skill)
Skill verified by Jeanne Boyarsky

Junilu Lacar

Sheriff

Posts: 17644

300

I like...

posted 3 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

Here's the code I came up with to create the inverted index:

This works and it's about as clean as I can get it with what I know (or can remember) for now. I'm just wondering if there's a better and maybe more idiomatic way to build the inverted index. I'm looking at some of the suggestions on SO (https://stackoverflow.com/questions/54232530/merge-values-in-map-kotlin) and a few look promising but right now I'm leaning towards taking a swing at this with groupingBy() and fold().

Would love to see what others might come up with.

Mike Simmons

Master Rancher

Posts: 4830

posted 3 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

Here's mine:

Junilu Lacar

Sheriff

Posts: 17644

300

I like...

posted 3 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

Here's the incantation I came up with after quite a bit of fiddling around and Googling:

The above works (at least it passes the tests and as far as I can tell, it generates the correct index)

Mike, your solution works, too, according to the tests.

Got a few things to digest and fully understand. One thing that's kind of driving me nuts right now is figuring out how to format these darn chains.

Keep 'em coming, please!

Junilu Lacar

Sheriff

Posts: 17644

300

I like...

posted 3 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

Here's my refactoring of Mike's code (just renamed and reindented):

Here's another refactoring using destructuring of IndexedValue

and actually, for consistency... (trying to put method to the madness)

Junilu Lacar

Sheriff

Posts: 17644

300

I like...

posted 3 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

It seems to me that Mike's and my versions vary only in some details (such as using vs not using Sequence). The verbs may be different but are essentially just different ways of saying the same thing. In the end, the results are the same. I'm kind of liking how Mike's version reads though.

Mike Simmons

Master Rancher

Posts: 4830

posted 3 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

Yeah, I've reformatted a bunch of different ways, and haven't settled on the "right" style really. But in functional and stream-oriented programming I tend to favor formatting long method chains like this:

It's weird at first, but quickly becomes very readable, I think. Of course, once those method calls then require multiline blocks in their argument lists, well then there are many more formatting questions to resolve.

Your destructuring of the IndexedValues is a nice improvement - I'd forgotten about that.

Junilu Lacar

Sheriff

Posts: 17644

300

I like...

posted 3 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

Here's what I guess I'm starting to settle on around formatting and indentation:

Starting at line 2, people.withIndex().flatMap is the top level structure, kind of like an outer for-loop. The iteration parameters, (index, person) are placed on the same line.

Lines 3-5 are indented one level because it's essentially the body of the "for-loop"

Line 3 has a nested structure, person.trim().toLowerCase().split(" ") and the .map starts another iteration. Again, the iteration parameter, word is placed on the same line.

Line 4 is indented another level because it's the "body" of the "nested loop" or iteration that starts on line 3.

On line 6, I keep the final operations that apply to the result of the "outer loop" started on line 2 on the same line as the close } -- I might drop it down to the next line if there are more operations in the chain and format those as Mike showed.

Given that there's a bit of a paradigm shift in going from imperative to functional, I don't know if the above indentation approach is helpful in making that shift or if in fact it's keeping me in a weird no-man's-land between the two paradigms. I know it helps me understand what's going on better for now because my brain still works very much in terms of for-loops and if-then-else statements. Forcing your brain to think differently is fun but challenging as well.

Mike Simmons

Master Rancher

Posts: 4830

posted 3 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

I feel like there's no need to hold on to the old notion of what a "loop" looks like, especially given that lambdas can often be replaced my method references - if

can be replaced with

I'm not going to make that

and, given the above, I see nothing at all wrong with this either:

Though I haven't really decided how much space, if any, I need around curly braces. Depends on my mood.

Here's another formatting of the code that I like:

Yes, this approach to method chain formatting may seem a bit extreme - but I find it very clean and consistent. Code takes up more lines, but less width, which is neither here nor there. And extracting a method is a nice way to introduce a method name to summarize what's going on in an important intermediate step.

Piet Souris

Bartender

Posts: 5465

212

posted 3 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

My style is like loopings indeed. I don't speak Kotlin (never will, I'd rather learn a completely different language like Cobol). Taking into account that java lacks these 'indexed' things, I got this:

where my Person class has the method

and Pair is the JavaFX Pair class.

There are three kinds of actuaries: those who can count, and those who can't.

Mike Simmons

Master Rancher

Posts: 4830

posted 3 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

For what it's worth, those IndexedValue classes are just another type of Pair, where the members are called index and value.

Piet Souris

Bartender

Posts: 5465

212

posted 3 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

Thanks for the info.

There are three kinds of actuaries: those who can count, and those who can't.

Stephan van Hulst

Saloon Keeper

Posts: 15525

364

posted 3 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

I prefer to put each chained method call on a separate line when writing functional code in Java. This is hard to marry with the use of lambda expressions, because their syntax kind of messes up the formatting. This is how I'd format your code Junilu.

There is no hard and fast rule to the formatting. I use something different every time I write a piece of functional code. It depends on what looks good for that particular code snippet.

Lambdas are the culprit. Thankfully, Kotlin has local functions. Use lambdas only for one-liners. If your expressions become more complex, use a local function:

Disclaimer: I don't know Kotlin. The code above might contain syntax errors.

Junilu Lacar

Sheriff

Posts: 17644

300

I like...

posted 3 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

I have to admit that I've gotten a little bit Perl-ish with Kotlin, wanting to keep the code down to the fewest possible lines and still have the clarity of expression the functional-style is supposed to give. There are just so many ways to do the same thing though and I'm still trying to figure out what's idiomatic. Then again, there may not be one idiomatic way and maybe that's the point, too. Take the operator overload function for matrix multiplication below. Matrix is a class I've defined; this function overloads the * operator for it so I can just write simpler high-level code:

The nested for-loop on lines 5-9 can be rewritten in at least two ways:

and

That's just the imperative style way of doing it. I'm sure there's a more functional way but I just haven't found the right incantation yet.

Line 14 shows another trick I've found to cut down one line of code using destructuring. If you use a Triple() this way, you can reduce 3 lines down to one. Then there's Line 2, which is the guard clause. If this were Java code, I'd normally make that take up 3 lines, like so:

But then this slowly building Perl-ish desire for brevity pushes me into making it a one-liner in Kotlin.

Stephan van Hulst

Saloon Keeper

Posts: 15525

364

posted 3 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

Junilu, if you really want to challenge yourself to write purely functional, you must keep the following rule in mind: You may only use the assignment operator when declaring a variable. Statements like the one on line 7 are out of the question.

Junilu Lacar

Sheriff

Posts: 17644

300

I like...

posted 3 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

Stephan van Hulst wrote:You may only use the assignment operator when declaring a variable. Statements like the one on line 7 are out of the question.

Thanks, Stephan. It feels a bit uncomfortable though, the thought of replicating an entire matrix just to change one element. Thoughts of premature optimization probably but the feeling is undeniable and hard not to listen to. I'm sure it's just a phase that I'll (hopefully) get past.

Mike Simmons

Master Rancher

Posts: 4830

posted 3 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

Hm, I feel like a matrix is a really good example where insisting on immutability everywhere is likely to lead to very very bad performance, for precisely the reason you cite, Junilu. Which may make it an interesting exercise as a learning challenge, but let's not be surprised too much if it doesn't work well. That said though, it may work better if you focus on bulk operations, e.g. adding and multiplying entire matrices, rather than a new matrix every time you set one single element. Or even, is there a way to build a list of such operations, and then process them all one after another? Similarly to how we build Streams that don't execute anything until a terminal operation is called.

Mike Simmons

Master Rancher

Posts: 4830

posted 3 years ago

Number of slices to send:

Optional 'thank-you' note:

Send

And now I see that's the direction you seem to be heading in this thread: https://coderanch.com/t/729490/languages/Designing-immutability-Matrix-class-Kotlin

Cool.