This week's book giveaway is in the Programmer Certification forum.
We're giving away four copies of OCP Oracle Certified Professional Java SE 11 Programmer I Study Guide: Exam 1Z0-815 and have Jeanne Boyarsky & Scott Selikoff on-line!
See this thread for details.
Win a copy of OCP Oracle Certified Professional Java SE 11 Programmer I Study Guide: Exam 1Z0-815 this week in the Programmer Certification forum!
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
programming forums Java Mobile Certification Databases Caching Books Engineering Micro Controllers OS Languages Paradigms IDEs Build Tools Frameworks Application Servers Open Source This Site Careers Other all forums
this forum made possible by our volunteer staff, including ...
Marshals:
  • Campbell Ritchie
  • Liutauras Vilda
  • Junilu Lacar
  • Jeanne Boyarsky
  • Bear Bibeault
Sheriffs:
  • Knute Snortum
  • Devaka Cooray
  • Tim Cooke
Saloon Keepers:
  • Tim Moores
  • Stephan van Hulst
  • Tim Holloway
  • Ron McLeod
  • Carey Brown
Bartenders:
  • Paweł Baczyński
  • Piet Souris
  • Vijitha Kumara

Effect of Doubling the Size on Running Time

 
Marshal
Posts: 14399
240
Mac Android IntelliJ IDE Eclipse IDE Spring Debian Java Ubuntu Linux
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator

Ryan McGuire wrote:As far as the math goes, I'm going to point at my first post in this thread:  Use the known data set size and time to determine a coefficient, and then use that coefficient to estimate the time for the larger data set size.


While this might seem true by citing specific examples, I doubt the math is logically sound. N log N is not a linear function wheres calculating for a coefficient like that assumes that it is.

If you were to argue that substituting lower bounds from 1 element to 2 elements (1 log 1 : 2 log 2) gives you 2 * (1 log 1) < 2 log 2

For the upper bound, you'd have to do some math with limits and inequalities to show that as N approaches positive infinity, 4 * (N/2 log N/2) approaches N log N.

At least that's how I'd approach making a sound mathematical proof based on my (probably faulty) recollection of my engineering math.
 
Junilu Lacar
Marshal
Posts: 14399
240
Mac Android IntelliJ IDE Eclipse IDE Spring Debian Java Ubuntu Linux
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
We do have at least one math expert around in Paul Clapham but I  dont know if he's been watching this thread or not. I'd bet he could give a definitive if not more convincing answer than I could.
 
Bartender
Posts: 3616
151
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
Or one could pay some more attention to what has been replied sofar.
 
Junilu Lacar
Marshal
Posts: 14399
240
Mac Android IntelliJ IDE Eclipse IDE Spring Debian Java Ubuntu Linux
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator

Piet Souris wrote:Or one could pay some more attention to what has been replied sofar.


Well, I have read the replies but I haven't found one so far that makes me go "Yeah, that looks about right."

For example, I'm not really sure how you got to the conclusion with this. Not saying that it's incorrect but it doesn't jump out at me as obviously correct either.

Since 0 < log (2) / log (N) < 1, we get a ratio that is between 2 and 4.


But then again, all it might be is an insufficiency in my math skills.

 
Junilu Lacar
Marshal
Posts: 14399
240
Mac Android IntelliJ IDE Eclipse IDE Spring Debian Java Ubuntu Linux
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator

Junilu Lacar wrote:N log N is not a linear function wheres calculating for a coefficient like that assumes that it is.


Wasn't planning on digging into Big O tonight but just wanted to double check my earlier statement and it appears that for pretty much all practical purposes, N log N is, in fact, more or less linear (quasilinear or linearithmic) so I guess calculating a coefficient the way Ryan showed is valid after all.
 
Marshal
Posts: 66266
250
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator

Junilu Lacar wrote:. . . quasilinear or linearithmic . . .

If you want nice clear terms that explain what they mean without your having to look in several dictionaries all giving slightly different definitions, look no further!

calculating a coefficient the way Ryan showed is valid after all.

Of course that is a valid calculation.
 
Junilu Lacar
Marshal
Posts: 14399
240
Mac Android IntelliJ IDE Eclipse IDE Spring Debian Java Ubuntu Linux
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator

Campbell Ritchie wrote:Of course that is a valid calculation.


Only because it's linearithmic. The general procedure he proposed, however, is incorrect for the other types of curves that are non-linear.

Ryan McGuire wrote:How to do these in general:

1. Create an equation that sets the known time equal to an unknown coefficient times the big O expression with the known value for N substitutes in.  
2. Solve that for the unknown coefficient.
3. Use that with the other known value for N to get the (reasonable estimate of) the time for that N value.


Of course, this doesn't work for the quadratic curves.

For example, take O(N²). That procedure would have you do this:
c * (500²) = 3ms
c = 3/25000

Applying that coefficient to solve for O(1000²):
3/25000 * (1000²) = 120ms

That is, of course, incorrect if you take what the 2nd column in the table says (increases by a factor of 2²) as the correct answer: 3ms * (2²) = 12ms

You can also compare those two results with the plots for the Quadratic family of curves at the link that I posted previously: http://science.slc.edu/~jmarshall/courses/2002/spring/cs50/BigO/index.html

The reason this jumped out at me as incorrect is because the "coefficient" he's solving for is simply the slope of the linear function (rise/run). For a linearithmic function like n log n, the slope of a similar line can be used as an approximation. The slope on a quadratic curve is not as straightforward though so the "generalization" he proposes is actually not applicable to anything that is non-linear that cannot be approximated by a line.
 
Campbell Ritchie
Marshal
Posts: 66266
250
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator

Junilu Lacar wrote:. . . incorrect for the other types of curves that are non-linear. . . .

I only checked that one calculation. For quadratic complexity, I would calculate 3 × 1000² ÷ 500², which should come to 12.
 
Junilu Lacar
Marshal
Posts: 14399
240
Mac Android IntelliJ IDE Eclipse IDE Spring Debian Java Ubuntu Linux
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator

Campbell Ritchie wrote:For quadratic complexity, I would calculate 3 × 1000² ÷ 500², which should come to 12.


Yes, that approach gives you consistent results for N³ as well: 3 * 2³ ==> 24ms, per the table and 3 * 1000³ / 500³ ==> 24ms
 
Junilu Lacar
Marshal
Posts: 14399
240
Mac Android IntelliJ IDE Eclipse IDE Spring Debian Java Ubuntu Linux
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
But that kind of begs the question "Why try to come up with all these other approaches for this exercise when the 2nd column already tells you what to do?"
 
Campbell Ritchie
Marshal
Posts: 66266
250
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
And, for the rather unusual n²logn complexity, I would wantabout 13.34ms.
 
Piet Souris
Bartender
Posts: 3616
151
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
All Ryan does is making the assumption that the time needed to perform one "operation" is independent of the number of operations. So, if the formula is N log(N), and it takes 3 ms to perform 500 log(500) operations, then one operation takes 3 / (500 log(500)) time. So, for a size of 2N, taking 2N log(2N) operations, we get a total time of 3 / (500log(500) * 1000log(1000).

What I did was to determine the ratio between the number of operations for size 2N and size N. As is immediately clear, that boils down to the same.

In the third column the "answer" is given, so for the NlogN the correct answer should be: between 6 and 12 ms.

For the N^2 log(N), if you determine the ratio, you get: 4 + 4log(2)/log(N), so in terms of the course: 4x < work < 8x.

@Campbell
you explained it more than once, but I am unable to remember it. How do you get all these nice sub- and superscripts and the other thingies?
 
Campbell Ritchie
Marshal
Posts: 66266
250
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator

Piet Souris wrote:. . . nice sub- and superscripts . . .

See this RO thread and here. You need to learn a few Unicode numbers; you will find superscripts and subscripts here. By the way, superscript 1 is &#xb9; not what I posted before. You can also try &sup1; = ¹.
 
Rancher
Posts: 1189
16
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator

Junilu Lacar wrote:But that kind of begs the question "Why try to come up with all these other approaches for this exercise when the 2nd column already tells you what to do?"



There are a handful of reasons I went into the math for figuring it out:
1. At least for myself, it's a lot easier to remember a single method to figure it out than to memorize a table.
2. By knowing the general method for figuring it out, you can figure out the estimates for other big O expressions that aren't in the table, such as the n²logn that Campbell contributed.
3. The 2x < work < 4x rubbed me the wrong way.  Even at the level of N=500 or 1000, the run time is multiplied by a factor of 2.2 (if I recall correctly).  As N continues to grow, the "4x" end of the range becomes meaningless.
4. The "increase by 1" for the O(logN) row in the table also rubbed me the wrong way.  As my first message in this thread suggested, giving a unitless number in this context is meaningless and can give wildly different results depending on what unit the original test run was measured in - 0.003 s, 3 ms, 3000 microseconds, 3000000 ns.  For t=3ms at N=500, the amount to be added for each doubling is 0.33 ms or so.
5. By working out the math, you can also estimate the time needed for data set sizes other than those that are some power of 2 larger than the known one.  e.g. If an O(N logN) algorithm takes 3 ms to complete for a data set of N=500, how long would you expect it to take for an N=50,000 data set?  Even if you say the that N increased by a factor of 100 which means it "doubled" between 6 and 7 times, you still need to apply the rule from the table 6 and 7 times and then estimate somewhere in between.  If you know the math, you can just work it out directly.


 
Campbell Ritchie
Marshal
Posts: 66266
250
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator

Ryan McGuire wrote:. . . 2x < work < 4x . . .

That part is correct for the figures given.

. . . "increase by 1" for the O(logN) . . .

Agree that part is at best confusing.

. . . 3 ms to complete for a data set of N=500, how long would you expect it to take for an N=50,000 . . .

At this point, you might start to move out of Big O notation; maybe 50000 executions will fill the available heap space and require multiple GC runs, causing space complexity to become its limiting factor. Repeated String concatenation is rather like that, though the Java9+ performance is much faster than in earlier versions.
 
Ryan McGuire
Rancher
Posts: 1189
16
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator

Campbell Ritchie wrote:

. . . 3 ms to complete for a data set of N=500, how long would you expect it to take for an N=50,000 . . .

At this point, you might start to move out of Big O notation; maybe 50000 executions will fill the available heap space and require multiple GC runs, causing space complexity to become its limiting factor. Repeated String concatenation is rather like that, though the Java9+ performance is much faster than in earlier versions.



Maybe if we increase the initial data set size by a factor of 3 or 5 or 12, instead of 100, we can avoid the big O expression becoming invalid while still making my point about the memorized table being restricted to only doubling the data.
 
Junilu Lacar
Marshal
Posts: 14399
240
Mac Android IntelliJ IDE Eclipse IDE Spring Debian Java Ubuntu Linux
  • Mark post as helpful
  • send pies
  • Quote
  • Report post to moderator
In my opinion, all that other stuff is mostly an exercise in overthinking. The table looks pretty straightforward to me and I would expect the student to be able to fill it less than 5 minutes, if that. I shouldn't be the one to talk about going overboard but in this case, it seems like wandering too far from the given instructions is no longer helpful to the OP.
 
Roses are red, violets are blue. Some poems rhyme and some don't. And some poems are a tiny ad.
Java file APIs (DOC, XLS, PDF, and many more)
https://products.aspose.com/total/java
  • Post Reply Bookmark Topic Watch Topic
  • New Topic
Boost this thread!