Win a copy of TensorFlow 2.0 in Action this week in the Artificial Intelligence and Machine Learning forum!
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Liutauras Vilda
  • Paul Clapham
  • Bear Bibeault
  • Jeanne Boyarsky
Sheriffs:
  • Ron McLeod
  • Tim Cooke
  • Devaka Cooray
Saloon Keepers:
  • Tim Moores
  • Tim Holloway
  • Jj Roberts
  • Stephan van Hulst
  • Carey Brown
Bartenders:
  • salvin francis
  • Scott Selikoff
  • fred rosenberger

Better code for interview question list duplicate characters in a String

 
Ranch Foreman
Posts: 1809
9
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I was asked in an interview to write a program to list the duplicate characters in a String. I wrote the below code but the interviewer said that this way is not the optimal way of doing it .



If this is not the optimal way of doing it then what would be the optimal way of doing this ? Thanks
 
Marshal
Posts: 70689
288
  • Likes 1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
If you submitted code that badly formatted for an interview question, you are lucky that they even read it
 
Monica Shiralkar
Ranch Foreman
Posts: 1809
9
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator

Campbell Ritchie wrote:If you submitted code that badly formatted for an interview question, you are lucky that they even read it



Thanks.I had this in Eclipse and then the formatting got changed when I pasted.Sorry for that .Below is the formatted code:

 
Campbell Ritchie
Marshal
Posts: 70689
288
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Line 5: Why did you call your class DuplicateRemoval? Nobody said anything about removal.
Line 7: Why didn't you use <>?
Line 9: Why are you using for rather than for‑each?
Lines 13‑15: There are several ways to do this, including myMap.put(c, myMap.get(c) + 1);.
Lines 21‑22: Serious error not specifying actual type parameters for the Map parameter and the Iterator local variable. Then you don't need the cast in line 25, which is horribly error‑prone. Consider using var in line 22; you might be able to infer its type throughout.
Line 27: Why are you calling remove()?

It is possible to do what you want in two statements, but you will need some declarations as well.
 
Monica Shiralkar
Ranch Foreman
Posts: 1809
9
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Thanks.
What exactly is the issue with line 7.I am using a generic Map.
 
Campbell Ritchie
Marshal
Posts: 70689
288
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Let me remind you that the issue is on line 21; I shall let you work out what is wrong for yourself.
 
Campbell Ritchie
Marshal
Posts: 70689
288
  • Likes 1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Sorry, I misunderstood your last question. Anybody reading your code would expect you to be up to speed with the very newest features, as well as <> and other features introduced in Java7 in 2009.
 
Monica Shiralkar
Ranch Foreman
Posts: 1809
9
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator

Campbell Ritchie wrote:<> [/tt]and other features introduced in Java7 in 2009.



I thought <> means using generics which I am using by creating generic hashmap. However ,I see that you have mentioned that it was introduced in Java 7, so it can't be generics which I was assuming as generics were introduced in Java 5. Let me search what <> when it is not generics .
 
Monica Shiralkar
Ranch Foreman
Posts: 1809
9
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator

Campbell Ritchie wrote:Sorry, I misunderstood your last question. Anybody reading your code would expect you to be up to speed with the very newest features, as well as <> and other features introduced in Java7 in 2009.


Instead of


One can write

Advantage of this diamond operator is that it is simpler  .Is that correct ?
 
Campbell Ritchie
Marshal
Posts: 70689
288
  • Likes 1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Yes. It also obviates an opportunity for spelling errors and compiler errors. It works in 90+% of cases.
 
Bartender
Posts: 2702
130
Google Web Toolkit Eclipse IDE Java
  • Likes 1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Adding to what campbell said...

I would be very angry to see a method called "printMap" that modifies the map I pass to it. Even angrier if it deletes something from my map.
 
Monica Shiralkar
Ranch Foreman
Posts: 1809
9
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator

salvin francis wrote:Adding to what campbell said...

I would be very angry to see a method called "printMap" that modifies the map I pass to it. Even angrier if it deletes something from my map.



I will correct it. By 'modifies' do you mean the delete line ?
 
Monica Shiralkar
Ranch Foreman
Posts: 1809
9
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator

Campbell Ritchie wrote:Yes. It also obviates an opportunity for spelling errors and compiler errors. It works in 90+% of cases.



What kind of spelling error ?
 
Campbell Ritchie
Marshal
Posts: 70689
288
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
It is smart not to make such errors. I hope you are sharp enough to get the point.
 
Monica Shiralkar
Ranch Foreman
Posts: 1809
9
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator

Campbell Ritchie wrote:It is smart not to make such errors. I hope you are sharp enough to get the point.



But it is not even a possibility because the the IDE will show error for it and it would not compile.
 
Campbell Ritchie
Marshal
Posts: 70689
288
  • Likes 2
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Remember every keystroke you use is an opportunity to gett it wrong. The whole point is, the smart thing is not to get stung with compiler errors because it really hurts to have to correct them.
 
salvin francis
Bartender
Posts: 2702
130
Google Web Toolkit Eclipse IDE Java
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator

Monica Shiralkar wrote:

salvin francis wrote:...

I will correct it. By 'modifies' do you mean the delete line ?

I do mean any kind of modification. If a method says print, it should ... print !! But of all the modification sins the method can commit, deletion ranks the highest

As an analogy in real life, if you handed over a few paper sheets to an employee to read out loud to the team, you wouldn't expect him/her to scribble something on it and return it back. (that's modification).
Now, taking this a step ahead, you wouldn't expect him/her to to promptly shred any random pages once he is done reading. (that's deletion).
 
Monica Shiralkar
Ranch Foreman
Posts: 1809
9
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator

Campbell Ritchie wrote:Remember every keystroke you use is an opportunity to gett it wrong. The whole point is, the smart thing is not to get stung with compiler errors because it really hurts to have to correct them.



Yes.
 
Master Rancher
Posts: 3701
44
  • Likes 3
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
The "not the optimal way of doing it" comment makes me think the interviewer is looking for the old-school fastest possible implementation.  I suspect that might be achieved by replacing the HashMap with an int[] array of size 128, or 256, assuming all the char fall in the 0-127 or at least 0-255 range.  That's worth coding for practice, to see what it would look like.

If I were the interviewer, I'd be looking first for, what questions does the candidate ask?  Some that come to mind are:
  • Do uppercase and lowercase characters count the same?
  • Is the input restricted to standard US-ASCII characters?  Or are others possible?  Do we need full Unicode support?
  • Should we ignore spaces and punctuation, or include them in the count?
  • Are we looking to optimize speed?  Or maintainability?  I would really prefer the latter, but some interviewers may have something else on their minds.

  • Whatever solution you come up with should reflect the answers to these questions.  Try to use modern coding conventions where appropriate, such as generics.  

    Personally I've never liked code that does a containsKey() followed by a get() on a Map.  It's like looking up the key twice. I would prefer just doing a get(), and checking if the result was null.  Null means it wasn't there.

    There's also a Map.merge() method that makes all this quicker and easier.  And there's a Map.forEach() that could also simplify your printMap().  There are also solutions using streams.  Both approaches can minimize the amount of coding you need, which in turn minimizes the chance for errors, and can make everything easier for the reader to understand.
     
    Monica Shiralkar
    Ranch Foreman
    Posts: 1809
    9
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator

    Mike Simmons wrote:
    If I were the interviewer, I'd be looking first for, what questions does the candidate ask?  Some that come to mind are:

    Do uppercase and lowercase characters count the same?

    Is the input restricted to standard US-ASCII characters?  Or are others possible?  Do we need full Unicode support?

    Are we looking to optimize speed?  Or maintainability?  I would really prefer the latter, but some interviewers may have something else on their minds.

    .



    That's a very good point .Thanks
     
    Mike Simmons
    Master Rancher
    Posts: 3701
    44
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    Oops, I added one more question just before you quoted me.  Sorry for the confusion.
     
    Campbell Ritchie
    Marshal
    Posts: 70689
    288
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator

    Mike Simmons wrote:The "not the optimal way of doing it" comment makes me think the interviewer is looking for the old-school fastest possible implementation.

    I hope you wouldn't go for speed above anything else.

    . . . get(), and checking if the result was null.  Null means it wasn't there. . . .

    It might do in the old Hashtable, but not necessarily in HashMap, which allows null both as a K or as a V.
     
    Mike Simmons
    Master Rancher
    Posts: 3701
    44
    • Likes 1
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator

    Campbell Ritchie wrote:

    Mike Simmons wrote:The "not the optimal way of doing it" comment makes me think the interviewer is looking for the old-school fastest possible implementation.

    I hope you wouldn't go for speed above anything else.


    Not as first choice, no... but it may be what the interviewer is looking for.  Hence the questions I suggest.  And frankly, I think that can be coded pretty simply, too.  It just doesn't handle larger character sets.

    Campbell Ritchie wrote:

    . . . get(), and checking if the result was null.  Null means it wasn't there. . . .

    It might do in the old Hashtable, but not necessarily in HashMap, which allows null both as a K or as a V.


    Yes, but if the HashMap is under my control, and I know I never put in a null value, then null means it's not there.
     
    salvin francis
    Bartender
    Posts: 2702
    130
    Google Web Toolkit Eclipse IDE Java
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    Also looking at the code OP posted :
    I suggest removing code that calls the same method repeatedly as in the case of charAt above. If a method always returns the same value (for the given 'i'), let's just call it once reducing that code to something like this :

    As Campbell pointed out, the 'i' is not even needed above if we can use the enhanced for-each loop instead.
     
    Mike Simmons
    Master Rancher
    Posts: 3701
    44
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    That further reduces to:


    For comparison, here's the char array approach:

    Which frankly seems pretty readable.  As long as you don't use any "exotic" (non-US-ASCII) characters.
     
    Rancher
    Posts: 218
    16
    • Likes 1
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    Another thing to note - the question only asked to find duplicate letters, not to count how many times they were duplicated. Every answer so far is counting up the letters, but that's not strictly necessary.

    You can create a Set of Character. For each Character in the String, attempt to add the Character being iterated through to the Set. If it returns false, print that you found a duplicate (or add it to a Set of Character which tracks duplicates and print it out at the end). Whether it's faster or not than the Map approach would have to be tested.

    Unfortunately with such a vague description as "not optimal", we can't say what the interviewer meant. It could mean too slow, or has too much code, or the code is unreadable, or that the interviewer simply would not have done it that way and didn't like it.
     
    Campbell Ritchie
    Marshal
    Posts: 70689
    288
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator

    Zachary Griggs wrote:. . . create a Set of Character. For each Character in the String, attempt to add the Character being iterated through to the Set. If it returns false . . . you found a duplicate . . .

    Salvin and I had a private conversation about this question and we would both have used that approach. It has the advantage that the memory footprint of the app makes some attempt to match the memory actually used.

    . . . . It could mean too slow, or has too much code, or the code is unreadable, or that the interviewer simply would not have done it that way and didn't like it.

    Some of those would be good reasons for the interviewer to reject the applicant and some would be good reasons for the applicant to reject the interviewer.
     
    Mike Simmons
    Master Rancher
    Posts: 3701
    44
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator

    Campbell Ritchie wrote:

    Zachary Griggs wrote:. . . create a Set of Character. For each Character in the String, attempt to add the Character being iterated through to the Set. If it returns false . . . you found a duplicate . . .

    Salvin and I had a private conversation about this question and we would both have used that approach. It has the advantage that the memory footprint of the app makes some attempt to match the memory actually used.


    Well, I would assume that if a character occurs ten times, we want to report it as a duplicate once, not nine times.  That requires the second set which Zachary mentioned:

    (or add it to a Set of Character which tracks duplicates and print it out at the end).


    Which works fine, and ends up with code of similar complexity as the other approaches listed.  But by the time you're adding another Set, I don't think there's any reduced memory footprint.  Unless duplicates are rare for some reason.  Where do you think this saves memory?  Yeah, the counting approaches require a bunch of Integer or Long objects - but those were probably in the cache anyway, unless the count exceeds 127.  A single HashSet uses essentially the same memory as a HashMap with the same number of keys - the HashSet contains an actual HashMap internally.  Once you add a second HashMap... I don't see memory savings here.  Am I missing something?
     
    Campbell Ritchie
    Marshal
    Posts: 70689
    288
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator

    Mike Simmons wrote:. . . if a character occurs ten times, we want to report it as a duplicate once . . . . That requires the second set . . .

    That is what we all three did.

    I don't see memory savings here.  Am I missing something?

    I should know better than to mention memory, since memory is cheap. If you use simple chars, you might want an array to accommodate 64Ki different values. Or 1110111 if you use all code points, a number 1000 too few In C, a char might be 8 bits, but not in Java┬«. The memory footprint of each set/map will depend far more on the length of the input text rather than the potential range of values.
     
    Mike Simmons
    Master Rancher
    Posts: 3701
    44
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    Oh, I thought you were comparing the Set to the Map-based solutions which have mostly been under discussion.  For the char[] solution, I would consider sizes of 128 or 256 pretty much insignificant, and would probably not use char[] array for any larger charsets.  But yes, it's a fixed cost that has nothing to do with the input length, rather with the range of the character set.  That may be a good thing or bad, depending on your use case.
     
    Monica Shiralkar
    Ranch Foreman
    Posts: 1809
    9
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator

    Mike Simmons wrote:The "not the optimal way of doing it" comment makes me think the interviewer is looking for the old-school fastest possible implementation .



    I could recall his words now only.
    He was talking something about O(n)/ O(1)/O(2) something.
     
    Zachary Griggs
    Rancher
    Posts: 218
    16
    • Likes 1
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    There's not going to be an O(1) way of counting duplicate letters in a String, unless I'm missing something critical here. Reason being that in order to discover duplicates, we have to have some logic that knows about all previous characters we've seen, which becomes a loop through each character, i.e. O(n). All the approaches mentioned in this thread are O(n), however that doesn't mean their performance is the same. It just means they grow linearly with the amount of items.

    Saying O(2) doesn't make any sense, it's a misuse of big O notation, since constant time would simple be said O(1), regardless of how performant that constant operation is (since big O notation measures time as the items grow)

    You can also write this algorithm in O(n ^ 2) complexity pretty trivially (nested for loop), and you could write it in O(n * log n) by sorting it first; but it's not worth writing that algorithm out because... well why would you, when we can do it in O(n).
     
    Mike Simmons
    Master Rancher
    Posts: 3701
    44
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    Agree on all points.  The only reason I can think to mention O(1) here is to talk about individual operations within a loop - all our implementations, including Monica's initial implementation, use O(1) operations with a loop, and the loop can go n times, so it's O(n) overall.  (Even if you don't see the loop explicitly.)  These implementations have different speeds relative to each other, but none have a different big-O performance profile.
     
    salvin francis
    Bartender
    Posts: 2702
    130
    Google Web Toolkit Eclipse IDE Java
    • Likes 2
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator

    Monica Shiralkar wrote:...He was talking something about O(n)/ O(1)/O(2) something.


    Isn't it a homework for you to figure out what that is ? Also can you share your final program after all the discussions we've had so far ? Maybe we can discuss about what you think the Big-O notation for your program would be and why.
     
    Greenhorn
    Posts: 18
    3
    • Likes 2
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    Well, this post caused me to wonder how I would do this in Java, as I haven't written any Java for about 10 years.  I came up with this, but it looks way too clunky and krufty to be a good solution (IMO):



    I've never worked with the more "functional" stuff in Java (it shows   ), so then I tried Scala, where the Collections functions do most of the work for you, giving me this:


    There have to be better ways of doing this in a functional style, but it was fun playing around, anyway.  
     
    Campbell Ritchie
    Marshal
    Posts: 70689
    288
    • Likes 1
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    It is rather awkward to work out with Streams because you have both to collect that letters and to count them. Remember there is no such thing as a CharacterStream. I thought of something like this:-The argument to the constructor in line 1 is redundant; you probably won't notice any difference in speed or memory footprint if you remove it.
     
    Christopher Webster
    Greenhorn
    Posts: 18
    3
    • Mark post as helpful
    • send pies
    • Quote
    • Report post to moderator
    Yeah, if I were interested in working with Java I'd probably have to actually go and RTFM on Streams and Lambdas and stuff!    

    I like the two sets solution - it's nice and clear what's happening and it does the job with a minimum of boilerplate.  
     
    With a little knowledge, a cast iron skillet is non-stick and lasts a lifetime.
    reply
      Bookmark Topic Watch Topic
    • New Topic