• Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Jeanne Boyarsky
  • Bear Bibeault
  • Knute Snortum
  • Liutauras Vilda
Sheriffs:
  • Tim Cooke
  • Devaka Cooray
  • Paul Clapham
Saloon Keepers:
  • Tim Moores
  • Frits Walraven
  • Ron McLeod
  • Ganesh Patekar
  • salvin francis
Bartenders:
  • Tim Holloway
  • Carey Brown
  • Stephan van Hulst

Does an "if" statement change the time complexity of function?  RSS feed

 
Ranch Hand
Posts: 35
Java Netbeans IDE Oracle
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Does "if" statement change the time complexity of function? I saw an example about it and someone commented like this: " the if statement does not change the time complexity in this example." And still I didn't get the exact answer. Now I want to know that is it true for all the functions which include "if" statment or it depends on the "if" 's content.
 
Marshal
Posts: 62231
193
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Do you mean an if or an if‑else? I always think of an if‑else, which has two “arms”. If each option contains the same amount of processing, what difference will it make to the complexity of the code overall? All the selection does is to pick an option.
Please tell us where you saw that comment, so we can assess the source.
 
Bartender
Posts: 20125
103
Android Eclipse IDE Linux
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Alas, too much of my time (no pun intended) has been spent doing applied computing rather than the theoretical stuff, so I never got around to learning the formal definition of the CompSci term "time complexity". So I'm going to reverse-engineer for meanings and leave the interpretation to you.

A conditional statement like "if" obviously changes the logical complexity of code, since it creates multiple execution paths depending on whether the tested condition was true or false.

In raw hardware terms, as a general rule, the instruction execution time for a branch statement would often have one value for a true result and a different timing value for a false result. Branch statements at the machine level were commonly either done as a jump or as a skip. In a jump, a new next-instruction address would be loaded into the program counter address register, in a skip, the program counter address register would simply increment an extra step (this works best on machines with fixed-length instructions). As to the differences in timing (if any) in either case, it would be dependent on the hardware design. On the machines I spent the most amount of time on, both conditional jump and conditional skip took slightly longer than a condition that didn't jump/skip.

It gets even messier when you consider modern processors. The old-time machines were pretty much models of the classic Turing architecture. Modern processors are much more complex with multiple instructions at various stages of execution at any moment. It is not uncommon for a conditional instruction to look ahead at both true and false results and load up the pipeline with them, discarding whichever path the conditional test later proved inapplicable.

Short answer, from a physical timing point of view, an "if" statement will probably change the complexity of computing how long a given instruction sequence (containing the conditional) will take, but at this point, the world of computer hardware is only slightly short of quantum mechanics when it comes to determinacy. And, that's not even counting literal quantum computing!
 
Asif Haciyev
Ranch Hand
Posts: 35
Java Netbeans IDE Oracle
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator


I am confused with this code, then I searched to find answer. And I saw this stackoverflow question and still don't know the answer because on the comment is written like this: "the if statement does not change the time complexity in THIS example." (which I mentioned in the question above)
 
Bartender
Posts: 9559
189
  • Likes 2
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
The answer is, it depends.

Let's start the discussion by saying it's not sufficient to just state that a piece of code "performs in the order so-and-so". You should actually say that the piece of code "performs in the order so-and-so in the best case" or it "performs in the order so-and-so in the average case". Usually when people omit what case they are considering, they actually mean the worst case.

Usually, for average and worst cases, conditional statements don't influence time complexity, because you just consider the branch with the highest complexity. In the sample that Asif gave, the if-clause has a time complexity of O(1), and the else-clause has a time complexity of O(n) (because of the for-loop), so the entire method has a time complexity of O(n).

Usually, for best cases, you only consider the branches with the lower complexity.

Now, here comes the part that makes complexity analysis, well... complex. Conditional statements inside loops may terminate them prematurely, and if the termination is quick enough, it will affect the time complexity of the loop. Here's a simple example:

This loop performs in the order O(1), in the worst case. That's because it never iterates more than 5 times, a constant factor.

Average cases make things even more difficult, because you have to consider how the method performs over many repeated calls. A good example is ArrayList.add(). If you look at a single method call in isolation, it will appear that ArrayList.add() runs in the order O(n), where n is the current size of the list. This is definitely true for the worst case, because in the worst case the list has reached its capacity and it needs to copy every element to a new array. On average though, the amount of times that the capacity needs to be increased is a logarithmic function over the amount of times that the method is called, because the capacity doubles every time. This logarithm cancels out with the linear cost of copying the elements, and so over many calls to ArrayList.add(), the average time becomes O(1). This is called "amortized constant time". This is also a case where a condition (has the list size reached the capacity?) affects the time complexity.
 
Sheriff
Posts: 12816
211
Android Debian Eclipse IDE IntelliJ IDE Java Linux Mac Spring Ubuntu
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
I concur with Stephan's answer, with a cow.
 
Asif Haciyev
Ranch Hand
Posts: 35
Java Netbeans IDE Oracle
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Thank you Stephan van Hulst.I understand like that: The if-else statement can change the time complexity but as a entire time complexity we have to take worst case time. Is it true? Did I understand right?

But I didn't get just this sentence. Can you explain please?

Stephan van Hulst wrote:because the capacity doubles every time. This logarithm cancels out with the linear cost of copying the elements, and so over many calls to ArrayList.add(), the average time becomes O(1).

 
Tim Holloway
Bartender
Posts: 20125
103
Android Eclipse IDE Linux
  • Likes 1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Thank you Stephan for explaining in CompSci terms.

Now for the caveats.

What you code isn't always what you get, even at the higher levels. I've already mentioned that we've gone beyond the determinate to the statistical at the machine level.

Most compilers these days have some pretty extensive optimizations. For example, there's loop unrolling:


May compile to:

And thus the "if" condition that terminates the loop reduces out entirely. No extra time complexity there!

But this is Java, so the generated bytecode may be interpreted or it may be converted from bytecode to machine code using JIT (Just-in-Time) compiling at runtime.

And there's more. On an industrial scale, the JVM may monitor the code's performance in real time. And if the machine does have a noticeable performance penalty for one outcome of an "if" over another, it may decide to discard the original JIT code and replace it with more optimal code.Consider the following:


Real-world is more complicated than that, since an unconditional jump over the "else" also costs, but I hope you can see what I mean.

Which is simply that static analysis of source code is not a very reliable predictor of real-world performance these days. It's one of the reasons why we always caution people against premature optimization - another being that you may gain vastly more performance simply by choosing a more appropriate algorithm. Only very short sequences such as interrupt handlers really benefit by manual instruction-level optimization.

Thus for practical purposes, the complexity you should worry about most outside of academia is not low-level time complexity, it's human complexity. Compilers can take legible code and do wonders with it, but if the human aspect of the code isn't addressed, then not only will humans burn a lot of resources understanding and maintaining the code, but since the compiler designers tend to design optimizers to exploit common coding practices, the compiler will find less to optimize.
 
Stephan van Hulst
Bartender
Posts: 9559
189
  • Likes 1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator

Asif Haciyev wrote:but as a entire time complexity we have to take worst case time. Is it true? Did I understand right?


No. You can take either best case, average case or worst case (or even some specific custom case that you describe in detail), and when you give the time complexity you must mention which case you considered. If you don't mention what case you considered, people will assume you are talking about worst case.

But I didn't get just this sentence. Can you explain please?

Stephan van Hulst wrote:because the capacity doubles every time. This logarithm cancels out with the linear cost of copying the elements, and so over many calls to ArrayList.add(), the average time becomes O(1).


Take a look at this chart that shows the runtime per call to ArrayList.add():



Initially, the list is empty and has capacity for 1 element. Calling add() will just put the element in the empty slot and return. This takes constant time.

Calling add() a second time causes the list to double its capacity, because it's currently full. It allocates a new array with size 2, copies the single element from the previous array, and then adds the new element in the only remaining empty slot.

Calling add() a third time causes the list to once again double its capacity. This time it allocates a new array with size 4, copies the two elements from the previous array, and then adds the new element in the first of the two remaining empty slots.

Calling add() a fourth time will just put the element in the final remaining empty slot.

This goes on like this, doubling the size of the array and copying all elements on the 5th, 9th, 17th, 33rd, 65th etc. call to add(). On each of these calls, the amount of time that the call needs to copy all elements increases, but so does the number of times you can call add() again before the array needs to double its size. These two factors cancel each other out, so if you divide the total running time of all calls to add() by the number of calls to add(), the average running time will approach a constant value as the the number of calls increases.

The time complexity of ArrayList.add() is "constant time" for the best case (the array isn't full), "linear time" for the worst case (the array is full) and "amortized constant time" for the average case (the total runtime divided by the number of calls).
 
Stephan van Hulst
Bartender
Posts: 9559
189
  • Likes 1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator

Tim Holloway wrote:What you code isn't always what you get, even at the higher levels. I've already mentioned that we've gone beyond the determinate to the statistical at the machine level.

Most compilers these days have some pretty extensive optimizations. For example, there's loop unrolling:


What Tim says here ties in nicely with the notation used in complexity analysis. "Big Oh" notation is used to give an upper bound to the performance of a function. If a function is in O(n) it means it requires at most linear time with regards to n.

Other often used notations are "Big Omega" and "Big Theta". If a function is in Ω(n) it means that it requires at least linear time with regards to n. If a function is in Θ(n) it means it requires at most, AND at least linear time with regards to n.

Tim's point however is while you can determine the Big Oh complexity class from static analysis of the code, the lower bound given by Big Omega and Big Theta can be misleading because of optimizations that the compiler performs.

Another important thing to consider: Complexity classes only tell you how the code performs when n is large. This means that if you know that the problem size is small, an algorithm with a poor time complexity may outperform a "faster algorithm". A great example of this is Mergesort versus Insertion Sort. Mergesort has a worst case time complexity of O(n log(n)), which is okay, while Insertion Sort has a worst case time complexity of O(n^2), which is bad. However, for small arrays, Insertion Sort will still have a better runtime.
 
Campbell Ritchie
Marshal
Posts: 62231
193
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
The point about merge sort and insertion sort (the same applies to selection sort and bubble sort) shows that the actual running time depends both on the time complexity and on the speed of the individual operations. The same sort of thing applies if the operations are optimised away, as Tim told you. I would say also that a plain simple if‑else doesn't change the complexity; it is the content of the two branches that make such changes. And remember that a plain simple if doesn't exist in CS terms; you must always impute an else to it, even if that else is empty.
 
Tim Holloway
Bartender
Posts: 20125
103
Android Eclipse IDE Linux
  • Likes 1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator

Campbell Ritchie wrote:And remember that a plain simple if doesn't exist in CS terms; you must always impute an else to it, even if that else is empty.



I dunno. I think that actually "else" is supposed to be shorthand for "if not" (preceeding condition). I know that when Structured Programming was first posited, it was supposed to have been proven that only a small number of specialized statements were required to create any program while avoiding the dreaded "go-to". To be precise, I think it was "IF", "LOOP" and maybe procedure call. For example, SELECT (switch/case) almost immediately started getting adopted, but all a switch statement is is just a bunch of implied "IF" statements that happen to be predicated on mutually-exclusive conditions. And that's true of ELSE and OTHERWISE/default in a switch as well.

Regardless, the all this brings up the importance of the difference between worst-case and average-case timing. As I've mentioned before, the overhead for a heap or quicksort when presented with data that's already in order is horrible, whereas it's O(n) for a bubble-sort where n is the number of elements being sorted. Scramble that data thoroughly, however, and the timing changes drastically for both algorithms.

And here again, real-world optimizations can also distort one's computations. I've seen sort/search libraries where when the data being processed consisted of 10 items or less, the algorithm switched from a binary mode to a linear mode, since the overhead of divide-and-conquer was higher than simply plowing through with brute force.
 
Asif Haciyev
Ranch Hand
Posts: 35
Java Netbeans IDE Oracle
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Thanks you all
 
Marshal
Posts: 6363
437
BSD Linux Mac OS X VI Editor
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator

Tim Holloway wrote:all a switch statement is is just a bunch of implied "IF" statements that happen to be predicated on mutually-exclusive conditions.


Not sure I understood you correctly, but that's not true. Perhaps that's true in situations with really few switch cases, but when the number grows switch statements use lookup tables. So it outperforms technically bunch of if-else statements as search of matching condition is O(1) as oppose to bunch of if-then-else goes in a linear fashion.
 
Ranch Foreman
Posts: 1065
24
IBM DB2 Java Netbeans IDE Spring
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
An excellent discussion guys. Much more cleaner that a lot of lessons I had to learn at University's times.
Thanks, all of you.
 
Tim Holloway
Bartender
Posts: 20125
103
Android Eclipse IDE Linux
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator

Liutauras Vilda wrote:

Tim Holloway wrote:all a switch statement is is just a bunch of implied "IF" statements that happen to be predicated on mutually-exclusive conditions.


Not sure I understood you correctly, but that's not true. Perhaps that's true in situations with really few switch cases, but when the number grows switch statements use lookup tables. So it outperforms technically bunch of if-else statements as search of matching condition is O(1) as oppose to bunch of if-then-else goes in a linear fashion.



Don't mistake theory for practice. The net (theoretical) effect of a switch/case is always a series of mutually-exclusive "IF" statements. But in practice, yes, it's often done via lookup tables. In fact, it's common that when you have sparse sets of switch values for the compiler to generate several sets of internal "IF"s plus lookups. That's not as uncommon a situation as you might think. For example, when using a switch statement where all the upper-case letters go in one case, all the lower_case go in another, digits in another, and special characters in yet another. And doubly-so when it's being compiled for an IBM EBCDIC system where there are gaps between blocks of letter codes.

This optimization is one reason why in many languages you can only use switch/case on integral values (including character and byte data types) and not on stuff like strings.

Oh, BTW. the IBM System/360 architecture has a native code instruction called "translate and test" which is used to scan byte sequences and classify them. It's designed with lexical scanning in mind, and it can be made to return a user-defined code value from a lookup table based on the byte values of the presumed string being scanned. Which very nicely then optimises into an indexed set of jump statements.
 
Tim Holloway
Bartender
Posts: 20125
103
Android Eclipse IDE Linux
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Incidentally, the switch statement is a good study in extremes. In cases where it doesn't optimise to a lookup, the worst-case performance would be O(n+1) where n is the number of cases tested plus optionally a default. The best-case performance should be O(1). Calculation of the performance for clustered sparse cases is left to the reader.

And if you're operating on a massively-parallel processor even an IF-based switch implementation could be O(1)!
 
Stephan van Hulst
Bartender
Posts: 9559
189
  • Likes 1
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator

Tim Holloway wrote:the worst-case performance would be O(n+1) where n is the number of cases tested plus optionally a default.


Just a minor remark: O(n+1) doesn't actually mean anything other than O(n).

O is a function that returns the set of all functions that don't grow faster than the function f(n) = k*n, where n is the argument of the O function, and k is any arbitrary constant that is large enough. This implies that:

O(7) returns the same set of functions as O(1), because I can choose k to be 8 and f(n) = 8*1 grows faster than g(n) = 7.

O(2n + 4) returns the same set of functions as O(n), because I can choose k to be 3 and f(n) = 3n grows faster than g(n) = 2n + 4.

O(41n³ + 5n² + 100n + 7) returns the same set of functions as O(n³), because I can choose k to be 42 and f(n) = 42n³ grows faster than g(n) = 41n³ + 5n² + 100n + 7.

The complexity class indicates an order of growth. You can imagine that the omitted coefficients influence how well the algorithm performs for small problem sizes, but O doesn't indicate how well an algorithm performs for small problem sizes, so the coefficients are meaningless.

For instance, the performance of Insertion Sort might be represented by f(n) = 2n² + 5 (I pulled the coefficients out of my backside), while the performance of Mergesort might be represented by g(n) = 4n * log₂(n) + 80. For small values of n, g(n) returns greater runtimes than f(n), but as the problem size increases, eventually f(n) will start reporting greater runtimes.

This is what the ArrayList documentation means by:

The constant factor is low compared to that for the LinkedList implementation.


Even though some of LinkedList's operations are conceptually as fast (are in the same complexity class) as the corresponding operations of ArrayList, ArrayList will almost always outperform LinkedList.
 
Don't touch me. And dont' touch this tiny ad:
RavenDB is an Open Source NoSQL Database that’s fully transactional (ACID) across your database
https://coderanch.com/t/704633/RavenDB-Open-Source-NoSQL-Database
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
Boost this thread!