Deleting duplicate numbers from a .csv file.

 
Greenhorn
Posts: 7
Hi All,

I have generated 1 crore (10 million) random numbers using Java's SecureRandom and stored them all in a .csv file.
As my next step, I noticed that the .csv file contains lots of duplicate numbers.

So now I need to delete all the duplicate numbers from that file. Can anyone please help me?

Or just let me know if you are aware of any other method to generate 1 crore unique random numbers.
 
Rancher
Posts: 43081
Lots of duplicates doesn't necessarily mean it's not random, although it certainly sounds suspicious. How, exactly, are you generating the numbers?

How are you storing them in a file, meaning, what makes the output a CSV file? If there's a single number on each line then it's not really a CSV.
 
Shreedhar Naik
Greenhorn
Posts: 7
Hi,
Thanks for your response. First I generate the random numbers and store them in a CSV file. Each line of the file has 254 random numbers separated by ',' (with no comma after the last number on each line). I need to open the file in Microsoft Excel, which only supports 256 columns, and that is why I am doing it this way. The code I have written for this is below:



-Thanks
Shree
 
Marshal
Posts: 79177
I tried to work out the chances of your never having any duplicates, and I may have got it wrong, but it was too small a number to display on my calculator. It simply showed "0". That was assuming 2^32 possibilities for SecureRandom#next() which returns an int, not a long.
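
For anyone who wants to reproduce that estimate without a calculator rounding it to zero, here is a minimal sketch. It assumes each draw is uniform over 2^32 values and sums logarithms instead of multiplying probabilities. The class and method names are made up for illustration.

```java
// Sketch: chance that n uniform draws from a space of size N are all distinct.
// Works in log space, because the probability itself is far too small for a double.
class BirthdayEstimate {

    /** Returns ln P(all distinct) = sum over i in [0, draws) of ln(1 - i / spaceSize). */
    static double logProbabilityAllDistinct(long draws, double spaceSize) {
        double logP = 0.0;
        for (long i = 0; i < draws; i++) {
            // The (i+1)-th draw must avoid the i values already taken.
            logP += Math.log1p(-i / spaceSize);
        }
        return logP;
    }

    public static void main(String[] args) {
        long draws = 10_000_000L;         // 1 crore
        double spaceSize = 4294967296.0;  // 2^32 possible int values (assumption)
        double logP = logProbabilityAllDistinct(draws, spaceSize);
        System.out.println("ln P(no duplicates) = " + logP);
        System.out.println("P(no duplicates)    = " + Math.exp(logP));
    }
}
```

The natural log comes out at roughly minus eleven thousand, so converting it back with Math.exp underflows to 0.0, which matches the "0" on the calculator.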
 
Greenhorn
Posts: 4
I won't verify whether there are duplicates. ;) But try storing your numbers in a set, and leave the loop once the size of the set tells you that you have as many numbers as you want.

Just my 2 cents ..
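
A minimal sketch of that idea, assuming the numbers are plain int values from SecureRandom#nextInt(). The class and method names are hypothetical, and the demo asks for only 1000 numbers. For the full 1 crore, a set of boxed Integers will probably need a few hundred MB of heap.

```java
import java.security.SecureRandom;
import java.util.LinkedHashSet;
import java.util.Set;

class UniqueRandomNumbers {

    /** Draws random ints until the set holds exactly `count` distinct values. */
    static Set<Integer> generateUnique(int count) {
        SecureRandom random = new SecureRandom();
        // LinkedHashSet keeps insertion order; add() silently ignores duplicates.
        Set<Integer> unique = new LinkedHashSet<>();
        while (unique.size() < count) {
            unique.add(random.nextInt());
        }
        return unique;
    }

    public static void main(String[] args) {
        Set<Integer> numbers = generateUnique(1000);
        System.out.println("Generated " + numbers.size() + " distinct numbers");
    }
}
```

Because the loop condition checks the size of the set rather than the number of draws, a duplicate simply costs one extra iteration.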
 
Campbell Ritchie
Marshal
Posts: 79177

Leander Kirstein-Heine wrote:I won't verify whether there are duplicates. ;) But try storing your numbers in a set, and leave the loop once the size of the set tells you that you have as many numbers as you want.

Just my 2 cents ..

Agree. That has already been suggested here.
 
Leander Kirstein-Heine
Greenhorn
Posts: 4

Campbell Ritchie wrote:Agree. That has already been suggested here.



Right and sorry, I'm really new here and haven't read all threads ...
 
Master Rancher
Posts: 4806
I replied in Shree's previous thread, because that thread seemed to have more info about what I believe is the main difficulty here: ensuring that the numbers are unique. Trying to delete duplicates after the fact is still going to require some way of detecting duplicates. And for ten million numbers, this may be nontrivial. The problem here is comparable to the one in the original post, so I figured as long as it has to be solved, it's better to eliminate duplicates before they are written to the file.
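
If the file already exists and has to be cleaned up after the fact, the detection step could look something like the sketch below. It assumes the whole file fits in memory and that values can be compared as text, so "7" and "007" would count as different. The class name and the file path argument are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class CsvDeduplicator {

    /** Collects every comma-separated cell once, in order of first appearance. */
    static Set<String> distinctValues(List<String> lines) {
        Set<String> seen = new LinkedHashSet<>();
        for (String line : lines) {
            for (String cell : line.split(",")) {
                String value = cell.trim();
                if (!value.isEmpty()) {
                    seen.add(value); // no effect if the value was seen before
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java CsvDeduplicator <file.csv>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.println(distinctValues(lines).size() + " distinct values");
    }
}
```

Note that removing duplicates this way leaves fewer numbers than were generated, so the file would still have to be topped up to reach ten million.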

Having said that though, I note that the code above has a simple bug which ensures that the number at the end of each line is duplicated at the beginning of the next line. Removing that bug may be enough to generate files that look, to the casual eye, like they have no duplicates. If you need to ensure this, well, see the other thread for more discussion.
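
The code from the earlier post is not shown in this thread, so the following is not a fix of that code. It is only a sketch of one way to split values into rows so that each value is written exactly once and nothing carries over from the end of one line to the start of the next. The names are hypothetical; the demo uses 3 values per line, where the real file would use 254.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

class CsvRowWriter {

    /** Splits values into comma-separated rows of at most `perLine` entries. */
    static List<String> toCsvLines(List<Integer> values, int perLine) {
        List<String> lines = new ArrayList<>();
        for (int start = 0; start < values.size(); start += perLine) {
            int end = Math.min(start + perLine, values.size());
            // StringJoiner puts the comma between entries only, never after the last.
            StringJoiner row = new StringJoiner(",");
            for (int i = start; i < end; i++) {
                row.add(String.valueOf(values.get(i)));
            }
            lines.add(row.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
        for (String line : toCsvLines(values, 3)) {
            System.out.println(line);
        }
    }
}
```

Each index is visited once by the inner loop, and the next row always starts at the index after the last one written.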