comparing records

 
Sherif Shehab
Ranch Hand
Posts: 485
Hi guys,

I have nearly 6 million records and another 1 million records. I need to check in Java whether the 6 million records contain the 1 million records or not. What is the fastest way to do this in Java, from a performance perspective?
 
Jim Hoglund
Ranch Hand
Posts: 525
I would sort both sets of records, independently, and then step
through them in parallel to answer your question.
Jim ...
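
A minimal sketch of that sort-and-step idea, assuming each record can be reduced to a single long key (the class and method names are made up for illustration):

```java
import java.util.Arrays;

class SortedContainment {
    /**
     * Returns true if every key in small also occurs in large.
     * Both arrays are sorted in place as a side effect.
     */
    static boolean containsAll(long[] large, long[] small) {
        Arrays.sort(large);
        Arrays.sort(small);
        int i = 0; // cursor into large; it only ever moves forward
        for (long key : small) {
            // Skip entries of large that are smaller than the key we want.
            while (i < large.length && large[i] < key) {
                i++;
            }
            // Either large is exhausted or its next entry is past the key.
            if (i == large.length || large[i] != key) {
                return false;
            }
        }
        return true;
    }
}
```

After the two sorts, the comparison itself is a single pass over both arrays, so nothing beyond the arrays has to be held in memory.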
 
Rob Spoor
Sheriff
Posts: 20550
Sherif Shehab wrote: I have nearly 6 million records and another 1 million records. I need to check in Java whether the 6 million records contain the 1 million records or not. What is the fastest way to do this in Java, from a performance perspective?

If you're talking about database records, then letting the database handle it is probably the best solution. Use an INNER JOIN, LEFT OUTER JOIN, RIGHT OUTER JOIN or FULL OUTER JOIN to combine the two tables.

For instance, to get the number of records that are in the 1 million but not in the 6 million (let's call the tables "one" and "six"), LEFT OUTER JOIN table "one" to table "six" on the two key fields, add a WHERE clause that keeps only the rows where a column of "six" IS NULL, and select COUNT(*).

This links the two tables on the two fields specified in the JOIN clause. Every record of table "one" without a matching record in table "six" will have NULL values for all columns of table "six". The WHERE clause then selects only these unmatched records, and COUNT(*) returns how many there are.
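
A query matching that description, run through JDBC, might look like the sketch below. The column names key1 and key2 are hypothetical stand-ins for the two join fields, and the class and method names are made up for illustration.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class MissingRecordCount {
    // key1 and key2 are hypothetical column names; substitute the real join fields.
    static final String SQL =
          "SELECT COUNT(*) "
        + "FROM one LEFT OUTER JOIN six "
        + "ON one.key1 = six.key1 AND one.key2 = six.key2 "
        + "WHERE six.key1 IS NULL";

    /** Counts the rows of table "one" that have no match in table "six". */
    static long countMissing(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(SQL)) {
            rs.next(); // COUNT(*) always yields exactly one row
            return rs.getLong(1);
        }
    }
}
```

A result of 0 would mean that every record of "one" has a match in "six".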


If they're not database records, use Maps.
 
Sherif Shehab
Ranch Hand
Posts: 485
If they're not database records, use Maps.


What about Sets vs. Maps from a performance perspective? What I'm thinking of is using a HashSet for each of the two groups of data ("one" and "six", as you named them) to make sure there is no duplication, and then checking whether the "six" HashSet contains what is in the "one" HashSet. What do you think?
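
A rough sketch of a HashSet check along those lines, assuming each record boils down to a String key (the names here are made up). It is a variation on the idea above: only the smaller group is hashed, and the larger group is just streamed past it.

```java
import java.util.HashSet;
import java.util.Set;

class SetContainment {
    /** Returns true if every key of "one" also appears somewhere in "six". */
    static boolean sixContainsAllOfOne(Iterable<String> one, Iterable<String> six) {
        // Only the smaller group is held in memory; the HashSet also drops duplicates.
        Set<String> remaining = new HashSet<>();
        for (String key : one) {
            remaining.add(key);
        }
        // Cross off every key that shows up in the larger group.
        for (String key : six) {
            remaining.remove(key);
            if (remaining.isEmpty()) {
                return true; // everything was found, no need to read further
            }
        }
        return remaining.isEmpty();
    }
}
```

Whatever is left in the set at the end is exactly the part of "one" that "six" does not contain.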
 
Rob Spoor
Sheriff
Posts: 20550
It depends on how you wrote your equals() and hashCode() methods. If they are not compatible, a HashMap is better, with the common keys as the map's keys and the records themselves as the values.

If you use a Set or Map, check out the bulk methods containsAll, removeAll and retainAll defined in java.util.Collection. Map has methods keySet(), values() and entrySet() you can use to get a Collection (Set extends Collection).
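
As an illustration of those bulk methods, here is a small sketch that assumes both groups were loaded into Maps keyed by the common key (the class and method names are hypothetical):

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class BulkCompare {
    /** True if every key of "one" is also a key of "six". */
    static <K> boolean containsAll(Map<K, ?> six, Map<K, ?> one) {
        return six.keySet().containsAll(one.keySet());
    }

    /** Keys that are in "one" but not in "six". */
    static <K> Set<K> missingKeys(Map<K, ?> one, Map<K, ?> six) {
        // Copy first: keySet() is a view, so calling removeAll on it directly
        // would delete the entries from the map itself.
        Set<K> missing = new HashSet<>(one.keySet());
        missing.removeAll(six.keySet());
        return missing;
    }
}
```

retainAll works the same way on a copy if you want the keys the two groups have in common instead.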
 
Christophe Verré
Sheriff
Posts: 14691
You don't plan to hold these millions of keys in a Map, do you?
 
Sherif Shehab
Ranch Hand
Posts: 485
I think I'll go for Sets, but which is faster for iterating over a Set: a for loop or an Iterator? Or are both the same?
 
Christophe Verré
Sheriff
Posts: 14691
If possible, I'd do what Jim said in his post. Where are your records? In a database? In a file?
 
Sherif Shehab
Ranch Hand
Posts: 485
Christophe Verré wrote: If possible, I'd do what Jim said in his post. Where are your records? In a database? In a file?

Actually, the records are in a DB, but they don't want me to do anything on the DB because of major performance issues. This is why I need to put them in some Collections and do the comparison there.
 
Rob Spoor
Sheriff
Posts: 20550
So they are moving the performance problem from the database server to the application? Great...
 
Sherif Shehab
Ranch Hand
Posts: 485
Rob Prime wrote:So they are moving the performance problem from the database server to the application? Great...


Ya Rob
 
Jim Hoglund
Ranch Hand
Posts: 525
Maybe you can creep up on those pesky DBAs. If they are
nervous about the joins, maybe you can at least get each
set of output records sorted before you receive it.
Jim ...
 
Campbell Ritchie
Sheriff
Posts: 49442
Rob is right (he usually is), and you will probably get better performance asking the database to do that; database management programs are specially optimised for that sort of query.
 