• Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other Pie Elite all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Paul Clapham
  • Tim Cooke
  • Jeanne Boyarsky
  • Liutauras Vilda
Sheriffs:
  • Frank Carver
  • Henry Wong
  • Ron McLeod
Saloon Keepers:
  • Tim Moores
  • Frits Walraven
  • Tim Holloway
  • Stephan van Hulst
  • Carey Brown
Bartenders:
  • Al Hobbs
  • Piet Souris
  • Himai Minh

Sort very large data without using database

 
Greenhorn
Posts: 1
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
How to sort large data (say 100 million) integer numbers in a machine with 64 mb ram without using database
 
author and iconoclast
Posts: 24204
44
Mac OS X Eclipse IDE Chrome
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Hi,

Welcome to JavaRanch!

The classic, most general solution is an N-way merge sort. Divide the data into chunks that fit in memory, and sort each one in-core, then write out individual sorted data files. Now open all the files, and read one number from each. Write the largest one to the final file, and replace it with a number from the same file. Repeat. Continue until all the files are empty.

But there are other things you can do if you know more about the data. For example, what if the numbers are all unique and have a known maximum number of digits (for example, a large collection of US phone numbers?) Then you can create an output array that contains just one bit to represent each possible number, and just run through the 100 million numbers on disk, setting the bit in the array corresponding to each number you find. Then you run through the bit array, writing the full number for each "on" bit to a final file. This last algorithm is linear in the size of the data -- quite a feat! The idea comes from Jon Bentley's "Programming Pearls", by the way -- a marvelous little book.
 
Ranch Hand
Posts: 3640
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Originally posted by Ernest Friedman-Hill:
The idea comes from Jon Bentley's "Programming Pearls", by the way -- a marvelous little book.



Thanks for suggesting book.
 
Ranch Hand
Posts: 257
Hibernate Firefox Browser Linux
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Hello,

Always there will be a trade off between the memory and performance. Can you tell me, will we get this type of situation in real world. If we get this situation, then will we limit memory to 64mb and will think about good algorithm. probably no, and we have to go for tradeoffs.

More inputs are welcome.
 
(instanceof Sidekick)
Posts: 8791
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

The classic, most general solution is an N-way merge sort. Divide the data into chunks that fit in memory, and sort each one in-core, then write out individual sorted data files. Now open all the files, and read one number from each. Write the largest one to the final file, and replace it with a number from the same file. Repeat. Continue until all the files are empty.



Brings back memories ... on 70's vintage mainframes we could see the tape drives write a few records to each tape, spin them all back, merge them together, and, as you say, repeat more or less forever.
 
Clowns were never meant to be THAT big! We must destroy it with this tiny ad:
the value of filler advertising in 2021
https://coderanch.com/t/730886/filler-advertising
reply
    Bookmark Topic Watch Topic
  • New Topic