• Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other Pie Elite all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Tim Cooke
  • paul wheaton
  • Jeanne Boyarsky
  • Ron McLeod
Sheriffs:
  • Paul Clapham
  • Liutauras Vilda
  • Devaka Cooray
Saloon Keepers:
  • Tim Holloway
  • Roland Mueller
Bartenders:

Looking for a Collection more efficient than ArrayList

 
Ranch Hand
Posts: 181
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Can there be any efficient Collection/DataStructure more efficient than ArrayList for the following scenario?

I have a List of Employee objects in an ArrayList. I have to perform a search in this List basing on criteria for eg. search for String Patterns "MR", "MRS","MS"...etc.
 
Marshal
Posts: 5791
367
IntelliJ IDE Python TypeScript Java Linux
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Probably. It depends.

However, my follow up question is: Do you actually have a problem with the performance of an ArrayList? Have you identified this look up as a bottleneck in your application? Have you measured it?

Let's say you answer yes to these questions. Then the most reliable thing to do is to take your benchmark performance statistics that most replicate your real life usage, re-run your performance tests again using a selection of different List implementations, then compare results. You can now base your decision on facts, rather than the opinion of strangers.
 
Tauri Valor
Ranch Hand
Posts: 181
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
I have performed the test using various Collections - Set implementations, Map implementations and List implementations. Following are the results in nanoseconds:





Sorry, I couldnt format the result more than that

As seen from the above, ArrayList is far better than all other collections for the scenario mentioned. (For Key,Value requirement of Map, I used unique identifier EmpID as key and Emp object as value. )
 
Tim Cooke
Marshal
Posts: 5791
367
IntelliJ IDE Python TypeScript Java Linux
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
So assuming that your test is representative of the real life usage, then you've answered your own question.
 
Tauri Valor
Ranch Hand
Posts: 181
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Thanks Tim.

Im looking for a data structure which yeilds me be better result than the ArrayList. I have a requirement which demands a more efficient result than that of ArrayList.
 
Bartender
Posts: 15737
368
  • Likes 1
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
If your search always compares your query string to the beginning of your search results, then you can use a Trie (not tree) to save your strings in.



You can very quickly compare your query string to the trie (and sub-strings to sub-tries). You can implement it yourself or maybe find a library online.
 
Tauri Valor
Ranch Hand
Posts: 181
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
@Campbell Ritchie Thanks for the Edit.
Stephan, Thanks for the Trie solution.
I have a search requirement to simply have a look up of the people with prefixes in the pattern from a List of People Objects. It doesn't always compares my query String to the beginning of result. I need to search for a Pattern of Strings in a String array eg:

 
Marshal
Posts: 80217
423
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
I have corrected the formatting by putting code tags round your text, changing java to text and then adding spaces to align the numbers on the right

If you do that sort of profiling, we do not know whether you are comparing like for like, nor how you are searching. Also, if you do such profiling you should be aware that you get great differences from run to run. Implementation A might be 33% faster than implementation B and on a later run, B might be 10% faster than A.
You are also trying the wrong implementations. If you need a List, then you need a List. Not a Map or a Set. So why are you testing Sets and Maps? They all do different things. The only thing you have managed to compare with your ArrayList is Vector.
How many elements are there in the Lists? Are there millions? For collections that large you should put all the data into a database; databases are specially optimised for that sort of search.
How are you searching? Are you using a loop? What sort of loop? Have you considered a Stream?

All you have here is a tendency for Vector (15seconds) to be slightly slower than ArrayList (9seconds), but you do not have enough information to come to any firm conclusions.


What's more, I have tried something with a simple class, and got this sort of result:

campbell@campbellsComputer:~/java$ java ListSpeedDemo arraylist
With an ArrayList it took 239054ns
campbell@campbellsComputer:~/java$ java ListSpeedDemo vector
With an Vector it took 243199ns

Here is the code:That appears to show that you can iterate a 10⁷‑element List in about 0.24μs and there is no significant difference between the two implementations.

You would need to tell us much more about your profiling and about your search strategies before we can make any useful suggestions.
 
Stephan van Hulst
Bartender
Posts: 15737
368
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
It may be a lot less space-efficient, but a solution could be to store your people objects in a Map<Title, Collection<Person>>. Retrieving all people with a certain title would then be as simple as employees.get(Title.MRS).
 
Campbell Ritchie
Marshal
Posts: 80217
423
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
I still think we need to know more about your search strategy, and the results from your profiler, before we can really help.
 
Tauri Valor
Ranch Hand
Posts: 181
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
I do not really have a complex search strategy, but simply searching for an exact phrase/s depending on the user selection. I placed the selection in a String array as below:


and then search as follows:


Neither did I use a profiler, I was just manually switching different collections and noting down the efficiency.

 
Bartender
Posts: 10978
87
Eclipse IDE Firefox Browser MySQL Database VI Editor Java Windows ChatGPT
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Tauri Valor wrote:I do not really have a complex search strategy, but simply searching for an exact phrase/s depending on the user selection. I placed the selection in a String array as below:


and then search as follows:


Neither did I use a profiler, I was just manually switching different collections and noting down the efficiency.


Right off the bat I'd swap the two loops and in the inner loop (pattern) I'd break out of the loop as soon as you found a match. You don't say what you are doing with the boolean result which may affect how we try to help you. You could always use a regular expression "(Mr|Ms|Dr|Fr|Br|Sis|Er|Hon|H\.E)\."
 
Campbell Ritchie
Marshal
Posts: 80217
423
  • Likes 1
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
So I tried enhancing my little class to emulate what you are doing… and Vector took about 6‑8× as long as an ArrayList.

campbell@campbellsComputer:~/java$ java ListSpeedDemo arrayList
With an ArrayList it took 84439761ns and found 1109692 doctors.
campbell@campbellsComputer:~/java$ java ListSpeedDemo vector
With an Vector it took 620384900ns and found 1114150 doctors.

That is 0.6″ as against about 0.08″.

Please supply details of the size of your list and the speed of your computer.

In my previous post I said μs when it should have been milliseconds. Sorry.
 
Carey Brown
Bartender
Posts: 10978
87
Eclipse IDE Firefox Browser MySQL Database VI Editor Java Windows ChatGPT
  • Likes 1
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
My test. Compare OP's loop, my suggest loop modification, and a regular expression approach. I didn't use enums because I though the OP's data was most likely already a String.
I was surprised at how much longer the regex approach took.
My results:

 
Tauri Valor
Ranch Hand
Posts: 181
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
My Processor and Speed: Intel(R) Core(TM)2 Duo CPU T7500 @2.20GHz 2.20GHz
I have 100,000 records which will grow over time.
I reading the data currently from CSV data file, no DB being used for my requirement .
I'm so sorry that I was so abstract, while giving only an outline of what I need (I'm compelled to do so ).
From all your expert comments and code, I believe and can conclude that probably ArrayList will be the best preference.

I also tried, by adding all of my ArrayList in a LinkedHashSet and made a search, it yeilded me a result which is closest so far to ArrayList:

Adding to the earlier results in Nanoseconds.

 
Campbell Ritchie
Marshal
Posts: 80217
423
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Not certain, but I suspect those nested loops lengthen the search time, CB. I also suspect that the nested loops take longer than the search from a List. Remember I didn't have any nested loops; since you have 5 elements in the array to search (line 11) it is not surprising that your search took about 5× as long as mine.
My chip is a Mobile Celeron 2 × 1.40GHz with 2GB of RAM using Ubuntu 14.04LTS.

You should not even consider comparing a linked hash map with an array list. Linked maps are quite different from lists. As I said yesterday, if you need a List, you need a List. Have a look in the Java® Tutorials about the interfaces and remind yourself what each interface represents. That will help you decide whether you need a different data structure from a List.
Have you tried a linked list? I suspect that will give much slower execution than array‑based lists (Vector, ArrayList, etc), however.
Have you considered a mapping from title to list of employees, as SvH suggested yesterday? But that Map represents indexing. Indexing is probably best done by a database; remember databases are optimised for that sort of search and designed to handle large numbers of data. When you run a search, the database will probably index that search in case you run it again.

Please explain what you are doing, TV, because your search is taking much longer than mine even when I have 100× as many elements to search. Are you including reading from the CSV in that time?
 
Tauri Valor
Ranch Hand
Posts: 181
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
I missed a major point to tell you. I am performing search operation for 200 times using for loop in the test, I'm not including the time to read from CSV. And that's why the result.
Yes, I must not think of comparing Map with a List. But was curious about the result just experimenting. My goal is to get a better result than the ArrayList.

The following is my test code:

 
Campbell Ritchie
Marshal
Posts: 80217
423
  • Likes 1
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Tauri Valor wrote: . . .

 
Tauri Valor
Ranch Hand
Posts: 181
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Typo only in the pasted code here. But in the original code by which the test was made it is correct.

 
Campbell Ritchie
Marshal
Posts: 80217
423
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Tauri Valor wrote:Typo only in the pasted code here. . . .

Thank goodness for that.

Where does the 200 come from in line 12? What are you doing with the return value in line 15?
 
Tauri Valor
Ranch Hand
Posts: 181
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

I missed a major point to tell you. I am performing search operation for 200 times using for loop in the test



Just for the sake of test. I'm not doing anything with the return value. Just measuring the time.
 
Campbell Ritchie
Marshal
Posts: 80217
423
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
So you are running the test 200 times and taking the total value? So your 100000 becomes 20000000. And your titles array is more than twice as long as CB's. So you would expect to get 4+× the duration that CB had. So his 0.4″ becomes 2″, and you had 9″.

Please supply some details about your computer, because maybe CB's machine is faster.

I think that if you want a List you will not find one with faster execution than ArrayList, as long as you are not multi‑threading.

Have you considered a Stream? That works in Java8 only.
 
Tauri Valor
Ranch Hand
Posts: 181
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Processor and Speed: Intel(R) Core(TM)2 Duo CPU T7500 @2.20GHz 2.20GHz
I cannot use Stream because of the version issue.

So finally, I think ArrayList is the best option for me.

Thank you all GREATS who have participated in the discussion, special thanks to Campbell. Thanks for your precious time!!


 
Author and all-around good cowpoke
Posts: 13078
6
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Perhaps I missed it in the discussion, but I didn't see any mention of something that might affect optimizations with alternate data structures.

How many searches do you need to do before the dataset changes?

Bill
 
Tauri Valor
Ranch Hand
Posts: 181
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Bill this is for testing my scenario to yeild a better result than ArrayList. I have 100000 records of Person to be searched 200 times.
 
Rancher
Posts: 4801
50
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
I would seriously consider looking into how long your 200 searches would take after bulk loading into a db.
 
Tauri Valor
Ranch Hand
Posts: 181
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Im reading the data from csv file holding 100000 records and then adding them to ArrayList, not loading them into DB.
 
Campbell Ritchie
Marshal
Posts: 80217
423
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
What Dave Tolls means is what I hinted before. It might be much quicker to read those data into a database and let the database do the search. It could be something simple like
SELECT * FROM employees WHERE employees.title = 'Mrs'
 
Carey Brown
Bartender
Posts: 10978
87
Eclipse IDE Firefox Browser MySQL Database VI Editor Java Windows ChatGPT
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Tauri Valor wrote:I do not really have a complex search strategy, but simply searching for an exact phrase/s depending on the user selection. I placed the selection in a String array as below:


and then search as follows:


Neither did I use a profiler, I was just manually switching different collections and noting down the efficiency.


I seemed to have take phrases as meaning that a user can search for both "Mr" and "Mrs" simultaneously. Did I read you wrong?
 
Tauri Valor
Ranch Hand
Posts: 181
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator
Yes you are right. I'm placing the user selection in String array 'pattern'. For the test I'm assuming all the values in array as selected.
 
Dave Tolls
Rancher
Posts: 4801
50
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Tauri Valor wrote:Im reading the data from csv file holding 100000 records and then adding them to ArrayList, not loading them into DB.



Most databases have a bulk loading option that will load data from a CSV.
This is usually pretty quick.

With suitable indexes it might prove quicker doing this than trying to find a faster Collection than ArrayList.

Of course if there's some other requirement restrictions you haven't told us then maybe that's not a possibility.
 
Campbell Ritchie
Marshal
Posts: 80217
423
  • Mark post as helpful
  • send pies
    Number of slices to send:
    Optional 'thank-you' note:
  • Quote
  • Report post to moderator

Dave Tolls wrote:. . . [databases] . . . might prove quicker doing this than trying to find a faster Collection than ArrayList. . . .

And they might be quicker for the next query because you have all the data ready to investigate.
 
Why fit in when you were born to stand out? - Seuss. Tiny ad:
Smokeless wood heat with a rocket mass heater
https://woodheat.net
reply
    Bookmark Topic Watch Topic
  • New Topic