Win a copy of Murach's Python Programming this week in the Jython/Python forum!
  • Post Reply Bookmark Topic Watch Topic
  • New Topic

Fastest search search  RSS feed

 
Billy Fence
Greenhorn
Posts: 17
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi. This has been puzzling me for a while. I'm storing, let's say error-codes in a container which I pass back to a client. That client starts looking for certain error-codes it is interested in. What's performance-wise the best way of doing this?
-1- Store public static final declared String instances in an ArrayList,
-2- Store public static final declared Integer instances in an ArrayList,
-3- Store public static final declared int primitives in an integer array and use Arrays.binarySearch() after an Arrays.sort() call,
or ....
I did some testing and found that method -1- is fastest, with 3 following behind and 2 lagging far behind. Can someone explain? I used random size Strings, some where even 30 chars long.
What about adding to an ArrayList compared to adding to an array? Which is the most expensive. Maybe odd questions but I'd like to tune my performance since my application is dealing with loads of data and loads of requests and I got to start somewhere.
Tx.
 
Ilja Preuss
author
Sheriff
Posts: 14112
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Billy Fence:
Can someone explain?

I don't have an idea. Can you show us some code?
I'd like to tune my performance since my application is dealing with loads of data and loads of requests and I got to start somewhere.
Tx.

With all due respect - you shouldn't start by *guessing* what your performance bottlenecks might be. If you do so, there is a very high probability that you are wasting efforts on the wrong parts of your application.
What you *should* do is apply some performance testing. *Try out* how fast your application really is. If you find that some features aren't fast enough, use a profiler to find the actual bottlenecks. *Then* you can start to think about how to optimize those.
 
Billy Fence
Greenhorn
Posts: 17
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
That's what I did. It's loosing to much time in the client side where processing of the errors is done. So I ran seperate tests and found what I posted. So there's no guessing here, just looking for an explaination and alternatives.
Has someone an idea (instead of an opinion)?
 
Ilja Preuss
author
Sheriff
Posts: 14112
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Sorry, then I misinterpreted "I got to start somewhere"...
I don't know why Strings in an ArrayList is fastest from the above solutions - but I would expect a HashSet to be even faster.
Hope this helps.
 
Jamie Robertson
Ranch Hand
Posts: 1879
MySQL Database Suse
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I agree with Ilja and this sounds like a job for HashSet ( has excellent lookup speed and seems to match what you want to use the structure ).
I would also assume that Integers would have faster lookup times in a HashSet than a String. But depending on the size of the HashSet you may not even notice a difference.
Jamie
 
Jamie Robertson
Ranch Hand
Posts: 1879
MySQL Database Suse
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Billy Fence:
...Has someone an idea (instead of an opinion)?
Ilja's opinions are well respected around these parts, especially in the Performance forum. More times than not, performance tuning is done prematurely, which usually results in tuning code that may not make a difference in the end or that is already optimized behind the scenes by the compiler or JIT, instead of identifying the actual bottleneck in the system using a profiler. The more you frequent this forum, the more you'll understand why he posted his first response.

Back to your question, you can read up performance of each data structure in the Javadocs. Each has a performance description for common activities such as add, delete and lookups. eg. HashSet:
"This class offers constant time performance for the basic operations (add, remove, contains and size), assuming the hash function disperses the elements properly among the buckets. Iterating over this set requires time proportional to the sum of the HashSet instance's size (the number of elements) plus the "capacity" of the backing HashMap instance (the number of buckets). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important."
where as ArrayList says:
"The size, isEmpty, get, set, iterator, and listIterator operations run in constant time. The add operation runs in amortized constant time, that is, adding n elements requires O(n) time. All of the other operations run in linear time (roughly speaking). The constant factor is low compared to that for the LinkedList implementation.
Each ArrayList instance has a capacity. The capacity is the size of the array used to store the elements in the list. It is always at least as large as the list size. As elements are added an ArrayList, its capacity grows automatically. The details of the growth policy are not specified beyond the fact that adding an element has constant amortized time cost."

so it depends on if you spend more time adding or looking up the data structure, or if you need the order of addition to the structure to be preserved.
Jamie
 
Ilja Preuss
author
Sheriff
Posts: 14112
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Jamie Robertson:
Ilja's opinions are well respected around these parts, especially in the Performance forum. More times than not, performance tuning is done prematurely, which usually results in tuning code that may not make a difference in the end or that is already optimized behind the scenes by the compiler or JIT, instead of identifying the actual bottleneck in the system using a profiler. The more you frequent this forum, the more you'll understand why he posted his first response.

Thanks for backing me up!
I think Billy is right in that I could have stated it more friendly, though. Again, I apologize.
so it depends on if you spend more time adding or looking up the data structure, or if you need the order of addition to the structure to be preserved.

If the latter, you could also take a look at LinkedHashSet (if you are able to use JDK 1.4.x).
 
Billy Fence
Greenhorn
Posts: 17
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
More times than not, performance tuning is done prematurely, which usually results in tuning code that may not make a difference in the end or that is already optimized behind the scenes by the compiler or JIT, instead of identifying the actual bottleneck in the system using a profiler.



That's what I did. It's loosing to much time in the client side where processing of the errors is done.

I used a profiler, found the bottlenecks. Working on each of them seperatly. I hear a lot of people say concentrate on the real bottlenecks. Sure, but usually re-design of the architecture and refactoring of some code at some stage will sort that out, if you have the time and resources for that. Still I think that when you program you should be aware of performance all the time. I see a lot of programmers use '+' with Strings whilst all books say use StringBuffer. Ok, it might be a small thing but if you're a real-time systems programmer they'll hang you by the goolies with an attitude like that. What I'm trying to say is, if there is a better way, get to know it, use it and don't always hide behind the re-design dogma. You guys know as well as I that projects hardly ever go the way you want and there might never be the time to sort out the big issues which cost a lot of time (=$$$).
Anyway thanks for the replies.
 
Ilja Preuss
author
Sheriff
Posts: 14112
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Originally posted by Billy Fence:
Still I think that when you program you should be aware of performance all the time.

It certainly doesn't hurt to *be aware* of performance implications. How you should act on that awareness is a different topic, though.
I see a lot of programmers use '+' with Strings whilst all books say use StringBuffer.

If you have a book that says something like "always use StringBuffer in favor of '+'", burn it. Seriously. Search this forum for threads on this topic.
Ok, it might be a small thing but if you're a real-time systems programmer they'll hang you by the goolies with an attitude like that.

Why? Why would they do so if your code is *fast enough*???
What I'm trying to say is, if there is a better way, get to know it, use it and don't always hide behind the re-design dogma.

Rudeness objection. I don't hide behind anything - I very openly advocate awareness of the Anti Pattern "Premature Optimization" and absolutely consciously choose good design over fast code.
And, btw, faster != better.

You guys know as well as I that projects hardly ever go the way you want there might never be the time to sort out the big issues which cost a lot of time (=$$$).

Who is more likely to waste time and risk a project being late:
- The one concentrating on a well decoupled, maintainable design, only optimizing where he has a proof for a performance problem, or
- the one concentrating on performance all along, thereby compromising the design
???
 
Anupam Sinha
Ranch Hand
Posts: 1090
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Hi all
I thought that the 3rd option in the first post should offer the maximum performance on the time per computation front. Shouldn't the 3rd option if used with the switch case construct offer the best performance.
[ July 27, 2003: Message edited by: Anupam Sinha ]
 
Stan James
(instanceof Sidekick)
Ranch Hand
Posts: 8791
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Here's a storage and search technique that is advertised as very fast for large, stable sets with string keys, eg a spell check dictionary. All the Map classes work with Object keys so they don't optimize on the string itself. On the other hand, this involves quite a lot of little node objects and might not serialize quickly to go from server to client.
http://www.javaworld.com/javaworld/jw-02-2001/jw-0216-ternary.html
 
It is sorta covered in the JavaRanch Style Guide.
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
Boost this thread!