Comparing hashmaps... I think

 
John Morgan
Ranch Hand
Posts: 144
2
I am trying to write a program that takes one group of rows from a table (a last-name lookup returning multiple rows) and another group of rows (a first-name lookup returning multiple rows) and compares the two groups to find the row that appears in both. I am not sure hashmaps are the best way to do this. What would be the best way to handle this kind of scenario?
 
S Fox
Ranch Hand
Posts: 120
1
C++ Eclipse IDE Java
It sounds like what you want is a database. You could make your own class to act as a data type holding all of a person's info (name, address, etc.), then make a hashmap containing these objects, with a unique key such as an employee ID#. The best way to deal with this is probably to use real database software like MS Access.
 
John Morgan
Ranch Hand
Posts: 144
2
I am actually pulling the information from a SQL database. I have the SQL queries written so that I can pull the information and view it manually. I am trying to figure out whether a hashmap for each of the queries is the best way to go, or whether there is some other way of storing the information temporarily to determine which row is common to both queries.
 
S Fox
Ranch Hand
Posts: 120
1
C++ Eclipse IDE Java
I guess it's the best way for what you're doing. The key should be the row# itself in each of your maps.
 
John Morgan
Ranch Hand
Posts: 144
2
Okay still a little confused but to better explain what I am trying to do... (simplified for message board)

I do a query which returns the following SQL results
(Search all males)
FName   LName   Country   Sex
Larry   Curry   USA       Male
Thomas  Sword   Canada    Male
Daryl   Davis   France    Male


(search everyone in USA)
FName       LName   Country   Sex
Kaleb       Lynn    USA       Male
Larry       Curry   USA       Male
Temperance  Vaugn   USA       Female


I want to return the value of:
FName=Larry
LName=Curry
Country=USA
Sex=Male
As this is the only row that is common to both search results.

So my question is: should I be creating HashMapUSASrch1, HashMapUSASrch2, HashMapUSASrch3, and HashMapMaleSrch1, HashMapMaleSrch2, HashMapMaleSrch3, or is there a better way to step through this process?
 
S Fox
Ranch Hand
Posts: 120
1
C++ Eclipse IDE Java
The problem is that none of that information is unique, so you have no way to identify the exact database record you want. You need a unique identifier for every record in the database, such as an employee ID# or Social Security number. Using the row# is not good because it changes when data is added or removed. Otherwise, how do you know you got the right person if there's more than one Larry, or two separate people at the same address? So you've got to do it like this:

AccountID#  FName  LName  Country  Sex
0001        Larry  Curry  USA      M

HashMap1 should be:
Key: AcctID
Value: FName

HashMap2 should be:
Key: AcctID
Value: LName

Done like this, I can search for the key "0001" in either of these maps and always come back with the correct record.
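
As a rough illustration of the two-map layout described in this post, here is a minimal Java sketch. The class name `AccountLookup` and the helper `fullName` are made up for the example; only the account ID "0001" and the Larry Curry row come from the thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper illustrating two maps that share the same unique key.
final class AccountLookup {

    // Looks up both maps with the same account ID and joins the results.
    // Returns null when either map has no entry for that ID.
    static String fullName(Map<String, String> firstNames,
                           Map<String, String> lastNames,
                           String accountId) {
        String first = firstNames.get(accountId);
        String last = lastNames.get(accountId);
        if (first == null || last == null) {
            return null;
        }
        return first + " " + last;
    }

    public static void main(String[] args) {
        Map<String, String> firstNames = new HashMap<>(); // Key: AcctID, Value: FName
        Map<String, String> lastNames = new HashMap<>();  // Key: AcctID, Value: LName
        firstNames.put("0001", "Larry");
        lastNames.put("0001", "Curry");

        System.out.println(fullName(firstNames, lastNames, "0001"));
    }
}
```

A single map from the account ID to one object holding every column, as suggested earlier in the thread, would avoid having to keep several maps in sync.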
 
John Morgan
Ranch Hand
Posts: 144
2
And there is my conundrum... The information I am pulling contains 10+ columns, and there are 5 points of reference that need to be matched up to give me the unique record I am looking for. I can do it by sight, and there is always a record in each one that matches all 5 points, whereas the other records would have one or more of the points different. In my case above, all 4 columns match up identically even though the identifier in each one would be different.
 
S Fox
Ranch Hand
Posts: 120
1
C++ Eclipse IDE Java
Well, if nobody will fix the DB for you, I thought of a hack way you can maybe do this. You'd basically have to make your own class for a DB record (FName, LName, all that), then put these into a HashSet for query 1 and do the same for each query. You'll probably need to define record equality on those 5 points of reference (for a HashSet that means overriding equals() and hashCode() rather than writing a comparator). This should let you iterate through the query 1 HashSet and ask whether it contains any record from the query 2 HashSet; if it does, you can pull those out into subsets of the data by sticking them into a new HashSet.
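
The approach in this post could be sketched roughly as follows, using the four columns from the simplified example earlier in the thread. The class names `PersonRow` and `CommonRows` are hypothetical. Note that a `HashSet` decides membership through `equals()` and `hashCode()`, so the sketch overrides those, and it uses `retainAll` in place of a hand-written loop.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical value class for one result row; the fields mirror the example columns.
final class PersonRow {
    final String firstName;
    final String lastName;
    final String country;
    final String sex;

    PersonRow(String firstName, String lastName, String country, String sex) {
        this.firstName = firstName;
        this.lastName = lastName;
        this.country = country;
        this.sex = sex;
    }

    // Two rows are equal when every compared column matches.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof PersonRow)) return false;
        PersonRow other = (PersonRow) o;
        return Objects.equals(firstName, other.firstName)
                && Objects.equals(lastName, other.lastName)
                && Objects.equals(country, other.country)
                && Objects.equals(sex, other.sex);
    }

    // Must hash the same fields that equals() compares.
    @Override
    public int hashCode() {
        return Objects.hash(firstName, lastName, country, sex);
    }

    @Override
    public String toString() {
        return firstName + " " + lastName + " " + country + " " + sex;
    }
}

final class CommonRows {

    // Returns the rows present in both result sets; the inputs are not modified.
    static Set<PersonRow> intersect(Set<PersonRow> first, Set<PersonRow> second) {
        Set<PersonRow> common = new HashSet<>(first);
        common.retainAll(second);
        return common;
    }

    public static void main(String[] args) {
        Set<PersonRow> males = new HashSet<>(Arrays.asList(
                new PersonRow("Larry", "Curry", "USA", "Male"),
                new PersonRow("Thomas", "Sword", "Canada", "Male"),
                new PersonRow("Daryl", "Davis", "France", "Male")));
        Set<PersonRow> inUsa = new HashSet<>(Arrays.asList(
                new PersonRow("Kaleb", "Lynn", "USA", "Male"),
                new PersonRow("Larry", "Curry", "USA", "Male"),
                new PersonRow("Temperance", "Vaugn", "USA", "Female")));

        System.out.println(intersect(males, inUsa));
    }
}
```

To match on 5 points out of 10+ columns, only those 5 fields would go into `equals()` and `hashCode()`; any identifier column that differs between duplicates would be left out of both.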
 
Dave Tolls
Rancher
Posts: 3539
39
So, when you say "there are 5 points of reference that need to be matched up" is that a fixed set of 5 columns?

So in your table with 10 (+) columns, you only need to compare the same 5?

Can you give a fuller example?
Because to me that looks like a single query, so I'm clearly missing something.
 
John Morgan
Ranch Hand
Posts: 144
2

S Fox wrote: Well, if nobody will fix the DB for you, I thought of a hack way you can maybe do this. You'd basically have to make your own class for a DB record (FName, LName, all that), then put these into a HashSet for query 1 and do the same for each query. You'll probably need to define record equality on those 5 points of reference (for a HashSet that means overriding equals() and hashCode() rather than writing a comparator). This should let you iterate through the query 1 HashSet and ask whether it contains any record from the query 2 HashSet; if it does, you can pull those out into subsets of the data by sticking them into a new HashSet.



Okay, which is kind of what I thought. This is part of a larger DB cleanup where we have individuals who have been entered multiple times but with different IDs (yes, bad practice, I know, but the reality is that it happened and now I have to fix it). Most times we know the first and last name, sex, SSN and DOB (although the DOB can be listed as a generic number, not sure why), so I am working on comparing those 5 points on employees that appear to match. I will run the query on each employee I find and see if the other data needs to be added to one of them (e.g. personnel actions, paygrades, etc.), but with over a million people in the database it is a time-consuming problem. The other issue is the changes in names (e.g. Smith vs. Smyth or John vs. Jon).

It appears from the responses that I was heading down the right path; it is just going to be more time consuming than I thought.

Thanks all.
 
Dave Tolls
Rancher
Posts: 3539
39


And if you just want the multiples:



That will give all the ones that are likely duplicates.
 
John Morgan
Ranch Hand
Posts: 144
2

Dave Tolls wrote:So, when you say "there are 5 points of reference that need to be matched up" is that a fixed set of 5 columns?

So in your table with 10 (+) columns, you only need to compare the same 5?

Can you give a fuller example?
Because to me that looks like a single query, so I'm clearly missing something.



Typically we will get 2 employee IDs to compare and see if the individuals are the same. We then compare the 5 points (First Name, Last Name, DOB, Sex, SSN) and see if they are the same person or if in fact they are different. Where the rub comes in is that there could be multiple entries returned for an individual who moved to a different division, sub company, etc. (it creates a new record).
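
A minimal sketch of the pairwise check described here, assuming the two records have already been loaded from the database. The class names, field names, and sample values (including the placeholder SSN) are invented for illustration.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical holder for the employee ID plus the five comparison points.
final class EmployeeRecord {
    final String employeeId;
    final String firstName;
    final String lastName;
    final LocalDate dob;
    final String sex;
    final String ssn;

    EmployeeRecord(String employeeId, String firstName, String lastName,
                   LocalDate dob, String sex, String ssn) {
        this.employeeId = employeeId;
        this.firstName = firstName;
        this.lastName = lastName;
        this.dob = dob;
        this.sex = sex;
        this.ssn = ssn;
    }
}

final class FivePointCheck {

    // Lists the comparison points on which the two records disagree.
    // An empty list means all five points match; the employee ID is ignored.
    static List<String> differingPoints(EmployeeRecord a, EmployeeRecord b) {
        List<String> diffs = new ArrayList<>();
        if (!Objects.equals(a.firstName, b.firstName)) diffs.add("First Name");
        if (!Objects.equals(a.lastName, b.lastName)) diffs.add("Last Name");
        if (!Objects.equals(a.dob, b.dob)) diffs.add("DOB");
        if (!Objects.equals(a.sex, b.sex)) diffs.add("Sex");
        if (!Objects.equals(a.ssn, b.ssn)) diffs.add("SSN");
        return diffs;
    }

    public static void main(String[] args) {
        // Made-up sample data: same person entered under two employee IDs.
        EmployeeRecord first = new EmployeeRecord(
                "1001", "Larry", "Curry", LocalDate.of(1970, 1, 1), "Male", "000-00-0001");
        EmployeeRecord second = new EmployeeRecord(
                "2002", "Larry", "Curry", LocalDate.of(1970, 1, 1), "Male", "000-00-0001");

        List<String> diffs = differingPoints(first, second);
        System.out.println(diffs.isEmpty() ? "All 5 points match" : "Differs on: " + diffs);
    }
}
```

Returning the list of mismatched points rather than a plain boolean makes near-misses visible, such as records that differ only on a generic DOB or a spelling variant of the name.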
 
Dave Tolls
Rancher
Posts: 3539
39
OK.

So not a case of pulling up likely matches from the whole table then?

To be honest I don't see why a trawl wouldn't work...or at least cut the effort.
These are employees, so there can't be that many.
 
Dave Tolls
Rancher
Posts: 3539
39
Oh, just noticed I mucked up that query:

SELECT firstname, lastname, sex, ssn, dob, COUNT(*) AS num_entries
FROM your_table
GROUP BY firstname, lastname, sex, ssn, dob


Had the WHERE clause left over from my first pass!
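
For anyone who has already pulled the rows into Java, the same grouping can be approximated in memory. This is only a sketch under assumptions: each row is taken to be a `String[]` in the order {employeeId, firstName, lastName, sex, ssn, dob}, the class name `DuplicateFinder` is made up, and the sample values in `main` are invented.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical in-memory counterpart of the GROUP BY query.
final class DuplicateFinder {

    // Counts rows per (firstName, lastName, sex, ssn, dob) combination.
    // Assumed row layout: {employeeId, firstName, lastName, sex, ssn, dob}.
    static Map<List<String>, Long> countByFivePoints(List<String[]> rows) {
        return rows.stream().collect(Collectors.groupingBy(
                row -> Arrays.asList(row[1], row[2], row[3], row[4], row[5]),
                Collectors.counting()));
    }

    // Keeps only the combinations that occur more than once.
    static Map<List<String>, Long> likelyDuplicates(List<String[]> rows) {
        return countByFivePoints(rows).entrySet().stream()
                .filter(entry -> entry.getValue() > 1)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    public static void main(String[] args) {
        // Made-up sample rows: the first two share all five points.
        List<String[]> rows = Arrays.asList(
                new String[] {"1001", "Larry", "Curry", "Male", "000-00-0001", "1970-01-01"},
                new String[] {"2002", "Larry", "Curry", "Male", "000-00-0001", "1970-01-01"},
                new String[] {"3003", "Thomas", "Sword", "Male", "000-00-0002", "1980-02-02"});

        likelyDuplicates(rows).forEach((key, count) ->
                System.out.println(key + " appears " + count + " times"));
    }
}
```

With over a million rows, letting the database do the grouping is likely the better choice; the in-memory version is mainly useful for small batches or for testing the matching rule.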
 