This week's book giveaway is in the Web Services forum.
We're giving away four copies of Microservices in Action and have Morgan Bruce & Paulo A. Pereira on-line!
See this thread for details.
Win a copy of Microservices in Action this week in the Web Services forum!
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Bear Bibeault
  • Devaka Cooray
  • Liutauras Vilda
  • Jeanne Boyarsky
Sheriffs:
  • Knute Snortum
  • Junilu Lacar
  • paul wheaton
Saloon Keepers:
  • Ganesh Patekar
  • Frits Walraven
  • Tim Moores
  • Ron McLeod
  • Carey Brown
Bartenders:
  • Stephan van Hulst
  • salvin francis
  • Tim Holloway

Is there any query/code that can find common values in records?  RSS feed

 
Greenhorn
Posts: 2
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Would you please guide me (maybe Simple and fast query if there is or some fast code) to convert my CSV data file (with commas separation):

1,A,C,Z,F,G
2,G,Q,R,C,
3,Z,G,Q,
4,C,F,
5,O,P,
6,O,X,Y,J,
7,A,P,X,

I have this table with ~1,000,000 records like these 7 records that you see (In real Database A,B,C,... are words in string), Records 1 and 2 are common in G and C value and 2,3 and 1,3 and ...

I want to sync records if they have at least two common value like Records 1 & 2,3,4 (but record 5,6,7 haven't at least 2 shared values with others) and generate a list like this:

1 A C Z F G Q R
2 G Q R C A Z F
3 Z G Q A C F R
4 C F A Z G Q R
5 O P
6 O X Y J
7 A P X

at the end we must have 4 same records if we sort data and one others without sync:

1 A C F G Q R Z
2 A C F G Q R Z
3 A C F G Q R Z
4 A C F G Q R Z
5 O P
6 J O X Y
7 A P X

Maybe I do not use good term for my meaning, please see:

1 A C Z F G
2 G Q R C

record 1 has C and G common with Record 2 now 1 has not R and Q thus we must have 1 A C Z F G + Q and R and Record 2 has not A,Z and F thus we must have: 2 G Q R C + A,Z and F thus at the end we have:

1 A C Z F G Q R
2 G Q R C A Z F

I need all records Respectively in the queue from top to bottom. wrote a delphi code but it is so slow. Someone suggest me this groovy code:

def f=[:]
new File('Data.csv').readLines().each{
def items=it.split(',')
def name
items.eachWithIndex { String entry, int i ->
   if(i==0){
       name=entry
   }
   else if(entry){
       if(!f[entry])
           f[entry]=[]
       f[entry]<<name
   }
}

}
f.findAll {it.value.size()>1}

It is very fast (because of using a map file I think), but It only finds the common values.
 
Marshal
Posts: 61805
193
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Please explain how many duplicates you will have. One way you can do it is to put each String into a Set<String> and work out the intersections of those Sets. But I am a bit worried that you will end up with 1,000,000 Sets in memory simultaneously, which will be expensive in terms of memory consumption and performance. So it is quite likely there will be better ways to do it.
Are those Strings common words? If so, you can probably save space by interning every single String as soon as you read it from the database.
Are you reading from a CSV file or a database? It is probably better to create an SQL query to look for duplicates, but I can't think how at the moment.

And welcome to the Ranch
 
sam Saam
Greenhorn
Posts: 2
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Thank you for your Time
Really I dont know How many duplicates may be inside of it, now I read CSV but I can make an mysql from it,
But How can I make a query for this when they are not in the same column and I do not know where are there
Those are not common words
I thout maybe with NoSql Databases like mongodb I can manage it.
 
With a little knowledge, a cast iron skillet is non-stick and lasts a lifetime.
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
Boost this thread!