
HashMap interview Question.

 
Guy Emerson
Ranch Hand
Posts: 41
Hi all,
In a recent interview I was given a scenario in which a database contains millions of records, including duplicates. If a Java program reads these records, how would I ensure that the data I am processing contains no duplicates?

They hinted that Maps could be used, or maybe the hashCode() method.

Now the question is how?

Thanks

 
naved momin
Ranch Hand
Posts: 692
Eclipse IDE Java Linux
Guy Emerson wrote:Hi all,
In a recent interview I was given a scenario wherein there is a database which contains millions of records even duplicate records as well. Now in a java program if we access these records, how would I ensure that the data I am accessing is not duplicate?

They provided a hint that Maps can be used, or may be hashcode() method!!

Now the question is how?

Thanks


A HashMap stores unique keys but allows duplicate values.
So if empid is unique, use it as the key; that way your HashMap will never hold two entries with the same key.
For example, a HashMap cannot hold both of these entries, because the second put replaces the value stored by the first:
hashmap.put(1,"naved");
hashmap.put(1,"sam");
But this is possible alongside either of them:
hashmap.put(2,"naved");

Alternatively, you can compare the hash codes of two objects: if the objects are equal, their hash codes will be the same (the reverse does not hold, since unequal objects can share a hash code).
However, I'm not very sure about the hashCode approach, but a HashMap will be the solution for this problem.
Let's see what others say about hashCode...
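A minimal sketch of the behaviour described above, assuming Integer keys and String values as in the post. The class name PutOverwriteDemo and the helper buildMap are made up for illustration. It shows that a second put under the same key replaces the earlier value, and that equal hash codes alone do not prove two objects are equal.

```java
import java.util.HashMap;
import java.util.Map;

class PutOverwriteDemo {
    // Builds the map from the post: two puts under key 1, one under key 2.
    static Map<Integer, String> buildMap() {
        Map<Integer, String> map = new HashMap<>();
        map.put(1, "naved");
        map.put(1, "sam");   // same key: replaces "naved", no second entry is created
        map.put(2, "naved"); // different key: duplicate values are allowed
        return map;
    }

    public static void main(String[] args) {
        Map<Integer, String> map = buildMap();
        System.out.println(map.size()); // 2
        System.out.println(map.get(1)); // sam

        // Equal objects must report equal hash codes...
        System.out.println("naved".hashCode() == new String("naved").hashCode()); // true
        // ...but equal hash codes do not imply equality: "Aa" and "BB" collide.
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true
        System.out.println("Aa".equals("BB"));                  // false
    }
}
```

So comparing hash codes can rule equality out, but only equals() can confirm it.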
 
Martin Vajsar
Sheriff
Posts: 3752
62
Chrome Netbeans IDE Oracle
  • Likes 2
I don't know what the interviewer wanted to hear, but the correct way would be to obtain the data without duplicates right from the database, using a proper SQL construct (i.e. select distinct or group by). Databases were designed to handle tasks like these.

Now if I were forced to do this in Java, putting the records into a HashSet would be the natural way of doing it. Yes, hashCode() plays a role in this. If you're unsure why or how, I'd recommend reading any Java Collections Framework tutorial. You should definitely do it before your next interview.
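A rough sketch of the HashSet approach, under the assumption that each fetched row is held as a List<String> of column values (the class HashSetDedupDemo and the method distinctRows are hypothetical names). It uses LinkedHashSet, a HashSet subclass, only so that the first-seen order is kept. The set decides membership through the rows' hashCode() and equals(), which for lists compare element by element.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class HashSetDedupDemo {
    // Returns the rows with exact duplicates removed, keeping first-seen order.
    static List<List<String>> distinctRows(List<List<String>> rows) {
        // List.equals/hashCode compare element by element, so two rows with
        // the same column values count as the same set member.
        Set<List<String>> unique = new LinkedHashSet<>(rows);
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
            Arrays.asList("1", "naved"),
            Arrays.asList("2", "sam"),
            Arrays.asList("1", "naved")); // exact duplicate of the first row
        System.out.println(distinctRows(rows)); // [[1, naved], [2, sam]]
    }
}
```

With a custom row class instead of a List, the same idea works only if that class overrides both equals() and hashCode().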

If the question actually was to detect whether there are duplicates in the database, even this could (and should) be handled by pure SQL, e.g. select key_columns from my_table group by key_columns having count(*) > 1, though, depending on the database and other requirements, better approaches might exist. A pure Java solution in this case might actually employ a Map. Was this the gist of the question?
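One way the Map-based detection could look, as a sketch only: it assumes the key columns of each row have already been reduced to a single String, and the names DuplicateKeyFinder and duplicatedKeys are invented here. The map plays the role of group by, and the final filter plays the role of having count(*) > 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DuplicateKeyFinder {
    // Returns the keys that occur more than once, in first-seen order.
    static List<String> duplicatedKeys(List<String> keys) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String key : keys) {
            counts.merge(key, 1, Integer::sum); // like count(*) per group
        }
        List<String> duplicates = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > 1) {             // like having count(*) > 1
                duplicates.add(e.getKey());
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("E1", "E2", "E1", "E3", "E2", "E1");
        System.out.println(duplicatedKeys(keys)); // [E1, E2]
    }
}
```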

Anyway, it seems to me that either the question was more twisted, or your interviewer knew neither Java nor databases.
 
Guy Emerson
Ranch Hand
Posts: 41
Thanks Naved and Martin for your replies.

Guy Emerson
 
Alok Aparanji
Greenhorn
Posts: 4
A million records in memory is way too much heap. That is not possible in a practical scenario. Duplicates have to be eliminated while fetching from the database.

To answer the interview question: have a well-defined key and put each row into the map as an object under that key. Duplicates will be eliminated, since a HashMap cannot have multiple entries with the same key. The older value will be overwritten (the older and newer values should be the same in your case).
 
Pavan Kumar Dittakavi
Ranch Hand
Posts: 106
Eclipse IDE Java
I have one question here. With a hashmap, you can have a key that points to the value. And we can have two cases in this problem.

1. Database schema having a primary key:

In this case, we can capture the primary key of the database in the KEY position of the map. To be honest, there won't be a need for a KEY here, as a database with a primary key ensures that no duplicate records are stored in it. So this case is handled automatically.

2. Database schema not having a primary key:

In this case, since the database does not have a primary key, what should be the KEY for the map? Surely this is something that can't be determined.

So, the appropriate way for doing this should be at the query level.

[ Experts please drop your views on this one ].


Thanks,
Pavan.
 
Martin Vajsar
Sheriff
Posts: 3752
62
Chrome Netbeans IDE Oracle
Pavan Kumar Dittakavi wrote:
2. Database schema not having a primary key:

In this case, since the db does not have a primary key..what should be the KEY for the map?. Surely this is something that can't be determined.

Of course that can be determined. When constructing a query to detect or remove duplicates, you need to know which columns uniquely identify each row object, so that you can group by them or select distinct on them. These columns would then be used as the map's key. You'd need to create a class for them and define the hashCode() and equals() methods properly, of course, but that is true for any multi-column key.

So, the appropriate way for doing this should be at the query level.

I agree, but it does not mean it is not doable from Java.

Edited for clarity. The key columns do not uniquely identify rows, as in this case no duplicate rows could occur. Since there can be duplicate rows, the key columns must be chosen so that they identify the "objects" that are stored in the table.
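A sketch of such a multi-column key class, with the caveat that the columns (firstName, lastName, department) and the names CompositeKeyDemo, EmployeeKey and firstRowPerKey are all hypothetical. The key class overrides equals() and hashCode() over the same fields, and putIfAbsent keeps the first row seen for each key.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

class CompositeKeyDemo {
    // Hypothetical key made of two columns that identify the stored "object".
    static final class EmployeeKey {
        private final String firstName;
        private final String lastName;

        EmployeeKey(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof EmployeeKey)) return false;
            EmployeeKey other = (EmployeeKey) o;
            return Objects.equals(firstName, other.firstName)
                && Objects.equals(lastName, other.lastName);
        }

        @Override
        public int hashCode() {
            // Must be computed from the same fields that equals() compares.
            return Objects.hash(firstName, lastName);
        }
    }

    // Keeps the first row seen per key; each row is {firstName, lastName, department}.
    static Map<EmployeeKey, String[]> firstRowPerKey(String[][] rows) {
        Map<EmployeeKey, String[]> byKey = new LinkedHashMap<>();
        for (String[] row : rows) {
            byKey.putIfAbsent(new EmployeeKey(row[0], row[1]), row);
        }
        return byKey;
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"Ann", "Lee", "Sales"},
            {"Bob", "Ray", "IT"},
            {"Ann", "Lee", "Sales"}, // same key as the first row
        };
        System.out.println(firstRowPerKey(rows).size()); // 2
    }
}
```

If hashCode() were left at the Object default, two keys with identical column values would usually land in different buckets and the duplicate would go unnoticed.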
 
Campbell Ritchie
Sheriff
Pie
Posts: 50278
80
  • Likes 1
Welcome to the Ranch, Alok Aparanji.
 
Alok Aparanji
Greenhorn
Posts: 4
Pavan Kumar Dittakavi wrote:I have one question here. With a hashmap, you can have a key that points to the value. And we can have two cases in this problem.

1. Database schema having a primary key:

In this case, we can capture the primary key of the database in the KEY position of the map. In this case, to be honest there wont be a need for KEY as a db having a primary key ensures that no duplicate records are stored in it. So, this case is handled automatically.

2. Database schema not having a primary key:

In this case, since the db does not have a primary key..what should be the KEY for the map?. Surely this is something that can't be determined.

So, the appropriate way for doing this should be at the query level.

[ Experts please drop your views on this one ].


Thanks,
Pavan.


+1 to your first point.

For the second point, there can be several reasons why the DB might not have a primary / unique key:
1) There is a column which is actually unique but is not declared to be the primary key.
2) The DB might be de-normalized for performance reasons.
There could be more reasons, I'm sure. But if you have domain knowledge about the table, you can be pretty sure whether a column or a combination of columns could work as a unique key. It sounds like the interviewer's question was just meant to test the candidate's collections knowledge and was not a real-world scenario.

Campbell Ritchie wrote:Welcome to the Ranch Alok Aparanji

Thanks, Campbell. I hope to learn lots of stuff here.
 
Matthew Brown
Bartender
Posts: 4568
9
  • Likes 1
Alok Aparanji wrote:
For the second point, there can be several reasons why the DB might not have a primary / unique key
1) There is a column which is actually unique but is not declared to be the primary key.
2) The DB might be de-normalized for performance reasons.

3) The database designer doesn't know what they're doing. While 1 and 2 may be valid, in my experience 3 is more common.
 