• Post Reply Bookmark Topic Watch Topic
  • New Topic

Sorting huge collections based on a property  RSS feed

 
krish venkat
Greenhorn
Posts: 4
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi,
I have a requirement where I need to sort a collection on the basis of a property(ex ssn) with atleast a million records.Is using a List(ex ArrayList) and then using a Comparator the best way to go?I'm looking for an approach that is efficient.Please advise.

Thanks
 
Campbell Ritchie
Marshal
Posts: 56521
172
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
What does Collections#sort use? What time complexity does the algorithm have? Does it require a Comparator?
What about List#sort()? That is only available in Java8.
 
Jesper de Jong
Java Cowboy
Sheriff
Posts: 16057
88
Android IntelliJ IDE Java Scala Spring
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
If you have the data in a database and you're getting it with a database query, then the most efficient way to sort the data is most likely by letting the database sort it, by adding an "order by" clause to your SQL query.

Otherwise, using Collections.sort(...) (or List.sort(...)) is quite efficient, and you should use these first before trying to write your own sorting algorithm (you're unlikely to write something that is correct and faster than the well-tested sorting algorithm included in the standard Java library, and writing it yourself will certainly cost a lot more time than using what's already available for you).
 
Campbell Ritchie
Marshal
Posts: 56521
172
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Since the description of List#sort is so similar to what I remember from the last time I looked at Collections#sort, I suspect they use each other. It may even look like this
 
Paweł Baczyński
Bartender
Posts: 2074
44
Firefox Browser IntelliJ IDE Java Linux Spring
  • Likes 1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Well, Collections.sort already handles nulls so List.sort is:
 
Campbell Ritchie
Marshal
Posts: 56521
172
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Even simpler than I thought.
 
Rob Spoor
Sheriff
Posts: 21131
87
Chrome Eclipse IDE Java Windows
  • Likes 1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
They've changed it; Collections.sort now directs to List.sort which does this*:
Now since Arrays.sort is used, this one step can be executed in parallel:
That does mean you're iterating over the list twice (toArray, setting the values) in addition to the sorting. I'm not sure if it will be more efficient.


* LinkedList inherits this implementation. ArrayList skips the toArray and setting and directly sorts its backing array. So does a List created using Arrays.asList.
 
Paweł Baczyński
Bartender
Posts: 2074
44
Firefox Browser IntelliJ IDE Java Linux Spring
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
That is correct. The code I posted was from version 1.8.0_05.
 
Campbell Ritchie
Marshal
Posts: 56521
172
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
That is because a linked list is liable to go into O(n²logn) complexity if you try to sort it in situ.
 
krish venkat
Greenhorn
Posts: 4
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Thanks everyone for these ideas.
 
Lucian Whiteman
Ranch Hand
Posts: 62
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Rob Spoor wrote:They've changed it; Collections.sort now directs to List.sort which does this*:
Now since Arrays.sort is used, this one step can be executed in parallel:
That does mean you're iterating over the list twice (toArray, setting the values) in addition to the sorting. I'm not sure if it will be more efficient.


* LinkedList inherits this implementation. ArrayList skips the toArray and setting and directly sorts its backing array. So does a List created using Arrays.asList.


Holly cow, this is a great deal of memory cost. Wouldn't it be cheaper to write a custom sort method that swaps the list elements values ?
 
Paweł Baczyński
Bartender
Posts: 2074
44
Firefox Browser IntelliJ IDE Java Linux Spring
  • Likes 1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
ArrayList overrides sort() and sorts its internal array directly.

And why such implementation in default method?
javadoc of List.sort() wrote:Implementation Requirements:
The default implementation obtains an array containing all elements in this list, sorts the array, and iterates over this list resetting each element from the corresponding position in the array. (This avoids the n2 log(n) performance that would result from attempting to sort a linked list in place.)


This is nice with default methods. If your class can provide better way of doing a task (like the ArrayList), you can override the method.
 
Campbell Ritchie
Marshal
Posts: 56521
172
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Lucian Whiteman wrote: . . .
Holly cow, this is a great deal of memory cost. Wouldn't it be cheaper to write a custom sort method that swaps the list elements values ?
. . .
Memory is cheap. Very cheap. I can buy 4GB of additional RAM for less than it would cost to go to see a film with my wife and buy a pizza.
Memory is fast. Very fast. Complexity is slow. And unnecessarily slow if simpler complexity is available.

A long time ago, I tried sorting a 1000000‑element int[] using bubble sort and selection sort which both run in O(n²) complexity; the two sorts took about 100 and 99 minutes respectively. If you use a sort with O(n²logn) complexity, you are looking at taking all day to sort them. Newer machines with faster chips would probably do that more quickly. I also tried Arrays#sort which is a quicksort, and that took about 0.4seconds. For a merge sort on a List, you are looking at about 4MB to store the array and not more than 4MB for storage of temporary arrays during the merging process. The Arrays#sort and the Collections#sort methods use the same algorithm which is more economical on memory than that. Because there are faster algorithms available, it is harmful to use slower algorithms.

But for sorting a 20‑element array, bubble sort is probably as fast as any other algorithm. And the swapping method in a linked list is probably more difficult to write than sorting an array.
 
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
Boost this thread!