satyendra adhikari

Ranch Hand

Posts: 52

posted 10 years ago

Class XYZ

{

public void depositData (String data)

{

//stores this data

}

public String[] getSortedData ()

{

//returns all data that is currently there in a sorted String array.

}

}

Other considerations.

>used in multithreaded environment add method will be called roughly 5-10 times every second, while get called once every 10 second.

> Should be able to perform for very large number of records, say a million.

> there is enough memory available to run this application.

Would appreciate a detailed description of approach you would follow, including choice of data structure to hold data and sorting, synchronization issues.

{

public void depositData (String data)

{

//stores this data

}

public String[] getSortedData ()

{

//returns all data that is currently there in a sorted String array.

}

}

Other considerations.

>used in multithreaded environment add method will be called roughly 5-10 times every second, while get called once every 10 second.

> Should be able to perform for very large number of records, say a million.

> there is enough memory available to run this application.

Would appreciate a detailed description of approach you would follow, including choice of data structure to hold data and sorting, synchronization issues.

Dave Wingate

Ranch Hand

Posts: 262

posted 10 years ago

A good starting point is to consider this piece:

You want to return sorted data, but how are the items sorted?

Are items sorted by the order in which they were added? In that case, a list would be a good candidate for your data structure.

If you need to return items sorted accoding to a relationship among the data members, then you might consider using a heap.

//returns all data that is currently there in a sorted String array.

You want to return sorted data, but how are the items sorted?

Are items sorted by the order in which they were added? In that case, a list would be a good candidate for your data structure.

If you need to return items sorted accoding to a relationship among the data members, then you might consider using a heap.

satyendra adhikari

Ranch Hand

Posts: 52

Dave Wingate

Ranch Hand

Posts: 262

posted 10 years ago

If

- data items need to be returned sorted alpha

- you're going to be working with a large number of records

- you'll be calling both the add and get methods frequently

- calling get does not remove elements from your collection

- you don't plan to also provide a method like getFirstOrderdElements(int numElements)

A heap is not the way to go after all; time complexities for your operations backed by a heap would be:

depositData: O(lg n) since we need only insert the new element into the heap

getStoredData: O(n + n lg n) we must both copy our heap (to not destroy data) and then remove elements from the heap in order

An linked list would be the way to go; time complexities for your operations backed by a linked list would be either:

(unordered)

depositData: O(1) since we would just insert new daa at end of list

getStoredData: O(n + n lg n) since we must sort prior to returing a copy of the data in array form

or

(ordered)

depositData: O(lg n) since we must use binary search to find proper insert location

getStoredData: O(n) since all we have to do is return a copy of the data in array form

Since you plan to call the two methods with equal frequency, I'd recommend an ordered list for better performance .... because when we combine time complexities for both operations:

O(1 + n + n lg n) > O(n + lg n)

You could further improve performance by allowing getStoredData to return a reference to the underlying data structure:

(ordered)

depositData: O(lg n) since we must use binary search to find proper insert location

getStoredData: O(1) if we don't have to make a copy of our data

[ January 05, 2007: Message edited by: Dave Wingate ]

- data items need to be returned sorted alpha

- you're going to be working with a large number of records

- you'll be calling both the add and get methods frequently

- calling get does not remove elements from your collection

- you don't plan to also provide a method like getFirstOrderdElements(int numElements)

A heap is not the way to go after all; time complexities for your operations backed by a heap would be:

depositData: O(lg n) since we need only insert the new element into the heap

getStoredData: O(n + n lg n) we must both copy our heap (to not destroy data) and then remove elements from the heap in order

An linked list would be the way to go; time complexities for your operations backed by a linked list would be either:

(unordered)

depositData: O(1) since we would just insert new daa at end of list

getStoredData: O(n + n lg n) since we must sort prior to returing a copy of the data in array form

or

(ordered)

depositData: O(lg n) since we must use binary search to find proper insert location

getStoredData: O(n) since all we have to do is return a copy of the data in array form

Since you plan to call the two methods with equal frequency, I'd recommend an ordered list for better performance .... because when we combine time complexities for both operations:

O(1 + n + n lg n) > O(n + lg n)

You could further improve performance by allowing getStoredData to return a reference to the underlying data structure:

(ordered)

depositData: O(lg n) since we must use binary search to find proper insert location

getStoredData: O(1) if we don't have to make a copy of our data

[ January 05, 2007: Message edited by: Dave Wingate ]

Garrett Rowe

Ranch Hand

Posts: 1296

posted 10 years ago

Insertion into an ordered linked list is O(n), binary search assumes random access.An linked list would be the way to go; time complexities for your operations backed by a linked list would be either:

(unordered)

depositData: O(1) since we would just insert new daa at end of list

getStoredData: O(n + n lg n) since we must sort prior to returing a copy of the data in array form

or

(ordered)

depositData: O(lg n) since we must use binary search to find proper insert location

getStoredData: O(n) since all we have to do is return a copy of the data in array form

Some problems are so complex that you have to be highly intelligent and well informed just to be undecided about them. - Laurence J. Peter

Dave Wingate

Ranch Hand

Posts: 262

posted 10 years ago

Yep ... thanks for the correction ... so using either an array based list or a linked list would result in these time complexities:

(ordered - array-based list)

depositData: O(lg n + n) since we must use binary search to find proper insert location and then call the insert operation which over time amortizes to O(n) (described in API)

getStoredData: O(1) if we don't have to make a copy of our data

(ordered - linked list)

depositData: O(n + 1) since we may have to search the entire list for the proper insertion point, but insertion takes a constant amount of time for a linked list

getStoredData: O(1) if we don't have to make a copy of our data

Although not quite as good as I led you to believe in my previous post, the ordered linked list implementation is still more efficient than the ordered array list or any unordered list structure.

Note that the implication of insert operatin having O(n) instead of O(lg n) means that the orrdered linked list is a viable option only if you really are going to call the "get" method about as frequently as you call the "insert" method. Insert of O(n) is a tough price to pay for very large collections. If it turns out that you will be calling "insert" much more frequently than you call "get," you should consider using a heap as I originally mentioned.

[ January 06, 2007: Message edited by: Dave Wingate ]

[ January 06, 2007: Message edited by: Dave Wingate ]

(ordered - array-based list)

depositData: O(lg n + n) since we must use binary search to find proper insert location and then call the insert operation which over time amortizes to O(n) (described in API)

getStoredData: O(1) if we don't have to make a copy of our data

(ordered - linked list)

depositData: O(n + 1) since we may have to search the entire list for the proper insertion point, but insertion takes a constant amount of time for a linked list

getStoredData: O(1) if we don't have to make a copy of our data

Although not quite as good as I led you to believe in my previous post, the ordered linked list implementation is still more efficient than the ordered array list or any unordered list structure.

Note that the implication of insert operatin having O(n) instead of O(lg n) means that the orrdered linked list is a viable option only if you really are going to call the "get" method about as frequently as you call the "insert" method. Insert of O(n) is a tough price to pay for very large collections. If it turns out that you will be calling "insert" much more frequently than you call "get," you should consider using a heap as I originally mentioned.

[ January 06, 2007: Message edited by: Dave Wingate ]

[ January 06, 2007: Message edited by: Dave Wingate ]

Ilja Preuss

author

Sheriff

Sheriff

Posts: 14112

posted 10 years ago

If you don't have duplicates in the collection, I would use a TreeSet.

The soul is dyed the color of its thoughts. Think only on those things that are in line with your principles and can bear the light of day. The content of your character is your choice. Day by day, what you do is who you become. Your integrity is your destiny - it is the light that guides your way. - Heraclitus