• Post Reply Bookmark Topic Watch Topic
  • New Topic

Auto complete implementation using primitives only  RSS feed

 
Rafael Rodrigez
Greenhorn
Posts: 7
2
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I saw a question from an interview (without a solution) that asks to implement a data structure that will allow perform an auto complete of words. I.E., say that I've already inserted the following words into my data structure: "hello", "her", "ht", "ab".                          
And now, I wish to get an auto complete options for the word "he". In this case, the following words will be printed to the screen: "hello", "her" .                         
                        
I've already found an answer for this here, but the problem is that I'm allowed to use only primitives (besides non-primitives parameters and fields that exist in the skeleton class for this problem, which I'll show now). This is the skeleton class that I got for solving this problem:              
                                     
                    
                          
I succeed in solving this using the code below, but as you can see- I used non-primitives in some places. Do you know how to fix this code so it will have the same functionality, but without using primitives (besides non-primitives parameters and fields that exist in the skeleton, which are allowed to use)?          
                                  



[Moderator edit: awarding additional cow for posting an interesting and fun problem, as well as posting well formatted code.]
 
Junilu Lacar
Sheriff
Posts: 11477
180
Android Debian Eclipse IDE IntelliJ IDE Java Linux Mac Spring Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
The question is probably meant to see if candidates know how to implement a Red-Black Tree. You are using an ArrayList so that immediately disqualifies you.

Do you have any background in algorithms and data structures? That's a subject that most Computer Science students should be familiar with. If you don't have that kind of background, then you were at a disadvantage. If you do have a CS background, then either it wasn't a very good program or you were absent when they taught that stuff.
 
Junilu Lacar
Sheriff
Posts: 11477
180
Android Debian Eclipse IDE IntelliJ IDE Java Linux Mac Spring Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
On second thought, I'm not sure a red-black tree would be that great at implementing autocorrect suggestions. I'm thinking maybe a hash table or map with the key as a Soundex code and the value as a list of words that have that same Soundex code. You'd have to implement your own list data structure and map, and you'd still want some kind of balanced tree to do quick searches of the Soundex code. That actually sounds like an interesting exercise. Of course, in the context of an interview, you have to consider how much time you are given to come up with a solution. You could start simple at first, then maybe discuss improvements when you get to the next round in the interview process.
 
Rafael Rodrigez
Greenhorn
Posts: 7
2
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Junilu Lacar wrote:On second thought, I'm not sure a red-black tree would be that great at implementing autocorrect suggestions. I'm thinking maybe a hash table or map with the key as a Soundex code and the value as a list of words that have that same Soundex code. You'd have to implement your own list data structure and map, and you'd still want some kind of balanced tree to do quick searches of the Soundex code. That actually sounds like an interesting exercise. Of course, in the context of an interview, you have to consider how much time you are given to come up with a solution. You could start simple at first, then maybe discuss improvements when you get to the next round in the interview process.


I'm not allowed to implement any data structure or a map. When it comes to variables, I'm allowed to use only primitives (char, short, int,....) .
I read your suggestions, including what you wrote about Red-Black Tree, and I still don't see how to implement what I want with primitives only.                   
Do you see a way for implementation, and can explain how do you implement it?
 
Junilu Lacar
Sheriff
Posts: 11477
180
Android Debian Eclipse IDE IntelliJ IDE Java Linux Mac Spring Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Rafael Rodriguez wrote:I'm not allowed to implement any data structure or a map. When it comes to variables, I'm allowed to use only primitives (char, short, int,....) .
I read your suggestions, including what you wrote about Red-Black Tree, and I still don't see how to implement what I want with primitives only. Do you see a way for implementation, and can explain how do you implement it?

The Maps, Trees, etc. that are in the standard Java API did not create themselves or come into existence out of thin air. *Somebody* had to write those classes from scratch using only primitives and what they know about how those high-level abstract data types should work. If you want to see how it's done, try to Google for terms like "source code for java.util.Map implementations" and "source code for java.util.Tree implementations".

Or you can try, "how to write my own implementation of a Red-Black Tree"
 
Piet Souris
Master Rancher
Posts: 2043
75
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
But must you go that distance? If I look at OP's first (given) code:

That does not seem all too heavy.

For instance, suppose you have the words 'he' and 'her'. Now, the first 'Tree' would contain 'h' as its char. Then we have this Tree[26]  array, of which we have only one element filled: a Tree with char 'e'.
Now, one of the two words is finished, so my suggestion would be to either have this Tree[26] as Treee[27], with one element filled with, say, '\0', to indicate that a word is finished here, while we also have a Tree element containg the char 'r'. This way, you would indeed use only 'primitives'.

The top level contains a Tree[26] array, to get it all going. If you have typed one or more characters, then from the point in the relevant Tree you can construct all possible words. Maybe, for efficiency reasons, you can add a String to the Tree, containing all the characters we have so far. Is that more or less what OP already made?

Never made a thing like that myself, but it sounds not too complicated and it looks like what is given. But maybe I completely misunderstood.

A simple alternative is to have a Map<Character, List<String>>, and whenever a char is given, retrieve the relevant list. For every next char, you just filter that list with 'startsWith'. But that does not look at all like what's been given.

Interestingl Hope to hear.

 
Junilu Lacar
Sheriff
Posts: 11477
180
Android Debian Eclipse IDE IntelliJ IDE Java Linux Mac Spring Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Piet Souris wrote:But must you go that distance? If I look at OP's first (given) code:

That does not seem all too heavy.

Serves me right for not reading carefully. I thought that was OP's attempt, not what he was given as a starting point. Thanks for pointing that out.

In that case, it looks like they wanted to see an implementation of a recursive structure.
so my suggestion would be to either have this Tree[26] as Treee[27], with one element filled with, say, '\0', to indicate that a word is finished here,

I don't think that's necessary. The end of a word would be when you get to a leaf node.  Does your suggestion have anything to do with the comment about assuming that words will contain only small English letters? Because I think that actually means that it's OK to assume that the input only contains lowercase letters.
 
Junilu Lacar
Sheriff
Posts: 11477
180
Android Debian Eclipse IDE IntelliJ IDE Java Linux Mac Spring Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Piet Souris wrote:A simple alternative is to have a Map<Character, List<String>>

OP said he was specifically not allowed to do that. I take "primitives" to mean primitive data types and arrays. OP would have to create his own abstract data type like Map or List or, in this case, Tree.
 
Piet Souris
Master Rancher
Posts: 2043
75
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Junilu Lacar wrote:
I don't think that's necessary. The end of a word would be when you get to a leaf node.  Does your suggestion have anything to do with the comment about assuming that words will contain only small English letters? Because I think that actually means that it's OK to assume that the input only contains lowercase letters.

My first assumption is that we are indeed dealing with lowercase English characters, otherwise a Tree[26] would be by far insufficient. My 27th element is meant to discriminate the word 'he' from 'her'. In that case 'he' would not be a leaf. But maybe a dedicated boolean for this situation is handier. As said, I never made such a structure, and I am only ventliating some ideas, that may (or may not) be handy!

Junilu wrote:OP said he was specifically not allowed to do that. I take "primitives" to mean primitive data types and arrays. OP would have to create his own abstract data type like Map or List or, in this case, Tree.

Agreed. I only mentioned it as a much simpler structure than what OP was given. But since it is irrelevant in this context, I might better not have said it


[Moderator edit: Cow awarded for foresight in recognizing condition where a bug might arise. Good thinking!]
 
Junilu Lacar
Sheriff
Posts: 11477
180
Android Debian Eclipse IDE IntelliJ IDE Java Linux Mac Spring Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
It helps if you try to visualize what your object graph will look like. Here's my interpretation of what the tree that contains the words "hello" and "her" looks like:


Something else you might find useful: char variables are integer types so you can do math on them. That means that if you subtract 'a' from 'z', you get 25. That is:

Do you see how this might be useful when you have an array that can hold 26 things in it?
 
Junilu Lacar
Sheriff
Posts: 11477
180
Android Debian Eclipse IDE IntelliJ IDE Java Linux Mac Spring Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Junilu Lacar wrote:It helps if you try to visualize what your object graph will look like.

Actually, when I look at that visual representation of the Tree, I suddenly realize how silly and heavy-handed my previous ideas were about a Red-Black Tree, Maps, Lists and other things that might be needed. This Tree structure is all you need to hold all the unique words you can imagine. No matter how much experience and practice one gets in keeping things simple, the brain still tends to make things more complicated than they need to be <facepalm>, especially if you don't fully understand what needs to be done.
 
Rafael Rodrigez
Greenhorn
Posts: 7
2
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Junilu Lacar wrote:It helps if you try to visualize what your object graph will look like. Here's my interpretation of what the tree that contains the words "hello" and "her" looks like:


Something else you might find useful: char variables are integer types so you can do math on them. That means that if you subtract 'a' from 'z', you get 25. That is:

Do you see how this might be useful when you have an array that can hold 26 things in it?


Yes, and if you take a look at my implementation, you will see that I implemented the Tree class just like you suggested.
The problem is that I don't know how to implement this without using non-primitives (besides non-primitives that were parameters as I wrote in my post).
I.E. , I can use only char, short, int etc.
I can't use Object, Array, String or any other non-primitive.
 
Junilu Lacar
Sheriff
Posts: 11477
180
Android Debian Eclipse IDE IntelliJ IDE Java Linux Mac Spring Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
It can be done, because I just did it. I have to admit, it was a little more tricky than I thought it would be.  One thing that will get in your way of thinking through the problem and the solution are poorly chosen names.  I find the names that you use like "printFromNode" and "TreeArray" typical in that they include a word that give details about the implementation--"Node" and "Array"--but don't do much to help you think about the problem at a more abstract level.

Here's my main:

And here's the outline of my Tree class:

I put comments to show where the start and end lines are in my IDE to give you an idea of how long this class is. So I guess it's about 92 lines of code long, with the longest method being the printAllWordsStartingWith(String prefix) from line 84-96 or about 12 lines of code, including white space and lines with only closing braces on them. I only used the Tree array itself which I call "branches" and the String.substring() method and just what you see in those convenience methods. The rest of the code is all high-level abstract. For example, I have this:

and this:
I've elided that last return statement so I don't give too much away and give you a chance to work it out yourself.

This shows how it's helpful to try to keep your code as high level as possible by pushing down details as deep into private methods as you can. It allows you to make your program more expressive and therefore easier to understand and break down into small logical parts. I started out trying to write a quick and dirty solution with a lot of detailed calculations mixed together but that just confused the heck out of me and I couldn't figure out how to express the algorithm. 

However, once I started extracting detailed calculations into small convenience methods, the high-level code became more expressive and I finally understood what I needed to tell the computer to do. This is the practice of writing expressive code by choosing good names and getting a Single Level of Abstraction Principle in action.
 
Junilu Lacar
Sheriff
Posts: 11477
180
Android Debian Eclipse IDE IntelliJ IDE Java Linux Mac Spring Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
The practice of ruthless refactoring can become quite a little obsession. I just made this change:

I just renamed the parameter from startOfWord to firstPartOfWord. I thought that startOfWord could be misconstrued to mean "the first letter".

Always try to find the right word(s) to use so that your program "does what it says and says what it does". This can be challenging at times but it can be very rewarding in terms of giving clarity to your program and not having code that misleads you and leads to bugs.
 
Junilu Lacar
Sheriff
Posts: 11477
180
Android Debian Eclipse IDE IntelliJ IDE Java Linux Mac Spring Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
@OP, you wrote:

That looks correct to me. However, compare it with how I wrote it:

I didn't look at your code when I wrote this but we seem to have arrived at identical solutions. However, the names that I use are much more expressive and it's easy for me to walk through my code and verify that it's doing what I want it to do. The passing tests are just additional confirmation that it works as intended. It's a little more difficult to cut through the noise of your code and get to the intent of it.
 
Junilu Lacar
Sheriff
Posts: 11477
180
Android Debian Eclipse IDE IntelliJ IDE Java Linux Mac Spring Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
This code can actually be refactored directly into what I wrote.

FAIR WARNING: this is a nice example that demonstrates the refactoring process but it is VERY LONG. Buckle up and get some coffee before you proceed! (EDIT: Also, there may be a few places that are slightly off. When I wrote this, it was just off the top of my head. When I actually walked through it just now, there were a few slight adjustments I had to make. I will edit this a few more times to correct whatever mistakes there are but I think I captured the main motivations and the general flow of the refactoring about right)

Starting with your original version, I'm going to apply a series of refactorings to get the version that I wrote:

Most of the code in this method is nested in that outer if statement. If you introduce a guard clause, you can eliminate one level of nesting:

Now I want to eliminate that len variable because it's used in only one place. So I will inline the value in the guard clause.

The guard clause is not very expressive. I could replace it with an equivalent expression that's expressed in terms of String values instead of int values:

I want to make that cryptic formula for index on line 18 more expressive. I also see that the expression word.charAt(0) is used in a couple of places. We can express that better, so we extract it to a method:

Ok, now we're getting somewhere. There's a lot of noise created by this.TreeArray[index]. That will be next on the refactoring block.  Let's finish refactoring line 18 first. We extract the entire right side of the assignment statement to its own method:

I also deleted the comment at the end of the line because the refactoring for expressiveness made it redundant.

Now, this index variable is only used in the expression this.TreeArray[index]. This must represent a single idea then. I first rename the field called TreeArray to just branches. First, this makes it conform to the standard Java naming convention for field names. I don't want any name that includes "Array" because that's an implementation detail. I want to stay high level.  I also don't want to use "tree" or "trees" because that can get confusing given that our class is called Tree (I tried those and it got confusing). So, we settle on the name "branches" for the array.

Now I can clearly see that this.branches[index] refers to the branch of the Tree along which we want to insert the given word. EDIT: So, I'm going to extract that to a method and leave more expressive code in its place. I do this in a couple of steps. First, introduce a temporary variable: The assignment statement on line 20 prevents us from extracting this right away. So, we first introduce a temporary variable:

Notice what I did on line 22. I put the new temporary branch variable to the left of the original assignment. This creates a multiple assignment statement (I think that's what it's called) and keeps the original behavior intact.  I can now extract the rightmost assignment statement into its own method:

You may notice a slight change to the Tree constructor call on line 28. I didn't see a need to pass anything in to the constructor so I just got rid of the argument.

Now I want to get rid of the index variable. So again, I will inline the formula and get rid of the declaration for index:

I still see a lot of implementation details being leaked with these references to "index", so I change it up a little bit so that the code only talks in terms of Strings and parts of Strings. I change the signature of the newBranchFor() method to accept a char value instead and adjust the calling code accordingly:

Now I can create a branchFor() method that's similar to the newBranchFor() method and replace the right side of the assignment statement on line 17.


Almost done! This looks a lot more readable than what we started with.

Line 22 is not very expressive so we extract that formula to a new method:

That's what I current have in my insert(String word) method.  If you want to go a step further, you can extract that first line:

I tried that but I talked myself out of it because a method called "isEmpty" in a Tree class is more likely to be expected to pertain to the Tree rather than to a String value. I didn't want to potentially confuse myself again, so I kept the "".equals(word) as it is. I didn't think it was too bad of a formula to keep around with the other high-level code.

Note: Out of curiosity, I dug around the String API a little bit more and was pleasantly surprised to see that since Java 6, String has an isEmpty() method, so that code can actually be refactored to use that method instead:

This is a little more expressive but since this is a public method and I have no real control over what gets passed, there's still a chance that this code will generate an NullPointerException if word is null. I still think that "".equals(word) is a safer bet because you're guaranteed to NOT get a NPE with it even though it's not as expressive as the alternatives.
 
Junilu Lacar
Sheriff
Posts: 11477
180
Android Debian Eclipse IDE IntelliJ IDE Java Linux Mac Spring Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Piet Souris wrote:
Now, one of the two words is finished, so my suggestion would be to either have this Tree[26] as Treee[27], with one element filled with, say, '\0', to indicate that a word is finished here, while we also have a Tree element containg the char 'r'.
...
Interestingl Hope to hear.

Your foresight continues to amaze me, Piet. I just found a bug in my code that reminded me of this comment that I had previously dismissed as unnecessary. Boy, was I wrong. The bug I found causes words to lose their status as entries in the dictionary and they basically disappear when a longer word that starts with that entire shorter word is added to the dictionary. There needs to be a way to flag that condition in the Tree although at this junction I'm thinking of a boolean.

Will post an update when I fix the bug.
 
Junilu Lacar
Sheriff
Posts: 11477
180
Android Debian Eclipse IDE IntelliJ IDE Java Linux Mac Spring Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Found it. Fixed it. Again, clean code saves the day!

EDIT: Wow, I didn't think it was over 12 minutes between me posting that I'd fix the bug and posting that I was done. Time sure flies when you're having fun.
 
Junilu Lacar
Sheriff
Posts: 11477
180
Android Debian Eclipse IDE IntelliJ IDE Java Linux Mac Spring Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
The bug can be reproduced with this test:
1. Add "ham" then "hamburger"
2. Print the words in the Tree.
3. Result should have both "ham" and "hamburger" (FAILED - only printed "hamburger")

Same case with "hand" and "handle", "base" and "baseball".

I'm pretty sure OP's code has the same bug, given that his code is functionally equivalent to the buggy version that I had. I had to fix the code in two places: in the insert(String word) method and in the recursive method to print the words. I also added a boolean endOfWord field that I set on insert and checked when printing words.
 
Piet Souris
Master Rancher
Posts: 2043
75
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Rafael Rodrigez wrote:(...)
I can't use Object, Array, String or any other non-primitive.

hi Rafael,

in one of my two replies i wondered whether you had indeed implemented this model in the way Junilu describes. I had indeed the impression. Have you testend your implementation? If so, did it work?

And I think you are way too harsh for yourself. If I look at the given template (see your opening post), then I see a Tree[], I see a String as parameter, so the only thing a little doubtful is the use of an ArrayList. Well, that counts for me as 'more than enough primitives'! So, well done as far as I'm concerned. Have a cow!

@Junilu
in my scond reply I suggested to use a boolean to indicate the end of a word, so I fully agree. And this is how we know you: a complete refactoring. I will grant you a cow as well tomorrow, since I reached my max for today.

But overlooking the whole construct: if you see how much trouble it takes to retrieve all the possible words, given some beginning of a String, wading through all those 26-sized arrays, maybe using a StringBuilder to construct all those words from their individual chars. brrr. I just looked at it now, so I have to check how Rafael and you manage to get all these words, I am very curious.

That brings me to my earlier suggestion and that is to 'store' the complete word in every Tree that is marked with the boolean 'isComplete'. That would save a lot of work, and it certainly would defeat the whole purpose of this construct! I would love to see the face of that interviewer, when one would do such a suggestion  

But again: interesting stuff! I'll be back later with some more comment.
 
Junilu Lacar
Sheriff
Posts: 11477
180
Android Debian Eclipse IDE IntelliJ IDE Java Linux Mac Spring Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Piet Souris wrote:
That brings me to my earlier suggestion and that is to 'store' the complete word in every Tree that is marked with the boolean 'isComplete'.

I went with endOfWord for the name and changed the other private boolean method to hasNoWordsThatAreLonger() so I could write:

Both of these checks are needed in the method that recursively prints all the words that match a given prefix. Once set to true, the endOfWord field remains set for the duration of the program. The hasNoWordsThatAreLonger() is easier to implement as a method although I suppose if you really want to optimize for time, you could make it a field as well. The flipside to that would be an additional boolean field in each object. I think I'll stick with calculating it each time it's called even though once it's true, it will always return true, assuming you can never remove a word from the Tree.
 
Junilu Lacar
Sheriff
Posts: 11477
180
Android Debian Eclipse IDE IntelliJ IDE Java Linux Mac Spring Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Rafael Rodrigez wrote:
The problem is that I don't know how to implement this without using non-primitives (besides non-primitives that were parameters as I wrote in my post).
I.E. , I can use only char, short, int etc.
I can't use Object, Array, String or any other non-primitive.

I think that last part is going a bit far, IMO. Not allowing the use of String makes it necessary to write you own utility classes to represent long sequences of characters. That's ridiculous. Like I said, I was able to implement this with just an array of Tree[], char, int, and String variables and parameters. I don't see how you can't use Object or arrays without making it unnecessarily difficult for the average programmer to come up with a decent solution. I used a few methods from the standard String class and, of course, I passed around String objects to various methods in the Tree class.  If the interviewer won't accept a solution like that, then I would challenge them to come up with as well-factored code that I came up with.
 
Junilu Lacar
Sheriff
Posts: 11477
180
Android Debian Eclipse IDE IntelliJ IDE Java Linux Mac Spring Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Here are some of the changes I made to fix that bug:

The reason I created a separate private add(String) method was so that I could separate the recursive logic from the logic of validating input values coming from an unknown external client. The checks on lines 10 and 17 may seem redundant to the casual observer but if you follow the flow of recursion, the check on line 17 is actually the check for the degenerate case in the recursion. The check on line 10 has an entirely different intent/motivation from the seemingly duplicated code on line 17. 

I have happily found another good example that shows how Don't Repeat Yourself (DRY) is not about a mechanical, character-by-character comparison of code but rather it's about eliminating duplication of knowledge/intent/idea. So, even though the two checks in question have exactly the same code, they are not really duplicated or redundant.
 
Rafael Rodrigez
Greenhorn
Posts: 7
2
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Junilu Lacar,

Tnx for your help. Can you please write a post with all the code you wrote?
It will be easier to follow.
 
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
Boost this thread!