Mike Simmons

Ranch Hand
since Mar 05, 2008

Recent posts by Mike Simmons

Piet Souris wrote:Java allows you to write 50_000_000_000, to prevent all possible misery ;)

Well, that still wouldn't prevent me from mixing up "million" and "billion" in my head, which I believe was the issue here.  But it's a good point nonetheless.
@Piet - regarding Day 12, I had a similar problem, kept getting "too small" for my answer.  Finally discovered that for "fifty billion" I had typed 50000000 rather than 50000000000.  Not to imply that anyone *else* would ever be so foolish... but don't forget to recheck the little things that seem obvious.

The thing about day 12 is, I was actually pretty proud of the nice efficient implementation I had, which would calculate each new generation with a minimum of computation.  Except... it turned out to be no help at all in the actual problem. :|  A classic example of premature optimization.

As for the rest, I'm way behind.  I expect to be coming back to these for some time though - they're pretty fun.
@Tim, I had the same experience for day 11 part 2.  I let version 1 keep running while I worked on some speed enhancements for part 2... but then it eventually completed before I was done with the revisions.  I lost motivation to continue optimizing after that since I'm behind on other problems.

But I don't think it's luck, exactly, that the answer is found early on.  More like regression to the mean - as the rectangles get bigger and bigger, they become more "average" in a sense, taking in more of a mix of positive and negative values.  Still, it's at least theoretically possible that a max could occur later, so we need to complete the search, I guess.
2 months ago
Sorry, I misspoke - I meant that doing it kN times, proportionate to the length, becomes O(N^2) overall.  Each individual remove in an ArrayList is indeed O(N), unless it's at the end.

Of course this can also be slightly improved by using an ArrayDeque, O(1) at both ends, but that still doesn't solve the problem in the middle, which is what was needed for day 5.

And obviously, to use LinkedList effectively here you have to use its ListIterator, not any indexed method.  That's precisely why it worked so much better than ArrayList here.
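To illustrate the distinction (a toy example of mine, not the puzzle code) - with a LinkedList, ListIterator.remove() deletes the current element in constant time, whereas any index-based access first has to walk to the position:

```java
import java.util.*;

class IteratorRemoval {
    // Remove every element matching a condition in one O(N) pass.
    // With LinkedList, ListIterator.remove() is O(1) per removal;
    // list.remove(index) on a LinkedList would be O(N) per call.
    static List<Integer> removeEvens(List<Integer> list) {
        ListIterator<Integer> it = list.listIterator();
        while (it.hasNext()) {
            if (it.next() % 2 == 0) {
                it.remove();   // constant time for LinkedList
            }
        }
        return list;
    }

    public static void main(String[] args) {
        List<Integer> list = new LinkedList<>(List.of(1, 2, 3, 4, 5));
        System.out.println(removeEvens(list)); // [1, 3, 5]
    }
}
```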
2 months ago

Stephan van Hulst wrote:I really liked that one, because it's the first real example I've seen of a case where a LinkedList greatly outperforms an ArrayList.

Really?  That's a little surprising - I would guess that may just mean that the list lengths aren't usually big enough to notice how big the difference can be.  But as an example, my first solution to day 5 part 2 used a LinkedList, and part 2 runs in 0.25 seconds.  Replacing the LinkedList with an ArrayList causes the code to run in 3.2 seconds.  If you make the input string longer, it gets much worse, as it's fundamentally an O(N^2) operation to delete from the ArrayList anywhere but the end.

To be clear, I wasn't using the Lists as stacks, but rather, removing from the list as I went.  I eventually refactored to use a stack instead, which didn't seem to change performance substantially from the original 0.25 seconds - though other changes have gotten it down to about 0.18 seconds.  Hard to tell how much the stack itself contributed really.
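For reference, if this is the usual pairwise-cancellation puzzle (adjacent letters of the same type but opposite case annihilate - my assumption about the problem statement), the stack approach can be sketched like so:

```java
import java.util.*;

class React {
    // Stack-based reduction: adjacent letters of the same type but
    // opposite case cancel each other out.
    static String reduce(String polymer) {
        Deque<Character> stack = new ArrayDeque<>();
        for (char c : polymer.toCharArray()) {
            if (!stack.isEmpty() && stack.peek() != c
                    && Character.toUpperCase(stack.peek()) == Character.toUpperCase(c)) {
                stack.pop();      // the pair reacts and vanishes
            } else {
                stack.push(c);
            }
        }
        // ArrayDeque pushes to the head, so the descending iterator
        // (tail to head) restores original left-to-right order.
        StringBuilder sb = new StringBuilder();
        stack.descendingIterator().forEachRemaining(sb::append);
        return sb.toString();
    }
}
```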

Unfortunately I'm way behind since then on the other challenges.  Don't you people have anything else you need to be doing? ;)  I look forward to catching up over time.  Thanks for posting about it and bringing it to my attention.
2 months ago

Campbell Ritchie wrote:If you iterate the entry set, you may get a different order of iteration.

I should hope not.  From the javadoc for java.util.SortedMap:

The map is ordered according to the natural ordering of its keys, or by a Comparator typically provided at sorted map creation time. This order is reflected when iterating over the sorted map's collection views (returned by the entrySet, keySet and values methods).
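A quick toy demonstration of that guarantee - insertion order is irrelevant, the views iterate in sort order:

```java
import java.util.*;

class SortedIteration {
    // A TreeMap's entrySet(), keySet() and values() views all iterate
    // in the map's sort order, regardless of insertion order.
    static List<String> keysInOrder(Map<String, Integer> source) {
        SortedMap<String, Integer> sorted = new TreeMap<>(source);
        return new ArrayList<>(sorted.keySet());
    }

    public static void main(String[] args) {
        Map<String, Integer> m = Map.of("pear", 3, "apple", 1, "mango", 2);
        System.out.println(keysInOrder(m)); // [apple, mango, pear]
    }
}
```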

2 months ago
Well, in the first place, I wouldn't be so concerned about creating a new Map now and then - sometimes it's warranted. Especially if the keys are changing, as was the case in your original code.  Didn't you originally want a TreeMap<Double, T> rather than a TreeMap<T, Double>?  If that's the ultimate goal, you might as well do it all at once, and the cost of a new map is essentially nothing, since you need to reinsert each entry anyway.  I originally did this with that custom remap() function I wrote on the fly, but I now realize Collectors.toMap() works as well:
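Something like this sketch, for sequential streams only (the names and exact shape are my guesses, not the code from earlier in the thread):

```java
import java.util.*;
import java.util.stream.*;

class InvertMap {
    // Invert a count map into a TreeMap keyed by cumulative probability.
    // Sequential streams only: the running total relies on encounter order.
    static <T> TreeMap<Double, T> toInverseCdf(SortedMap<T, Long> counts) {
        double total = counts.values().stream().mapToLong(Long::longValue).sum();
        long[] running = {0};                    // running total of observations
        return counts.entrySet().stream()
                .collect(Collectors.toMap(
                        e -> (running[0] += e.getValue()) / total,  // cumulative probability
                        Map.Entry::getKey,
                        (a, b) -> a,             // cumulative keys can't collide
                        TreeMap::new));
    }
}
```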

However if you want a TreeMap<T, Double>, and are willing to mutate the input map (fine here since you just created it with the counting collector, and no one else has a reference) then you can use some creative casting to accomplish what you want even faster:
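A sketch of the idea (my reconstruction, not the original snippet - the values are replaced in place, so no new map is allocated):

```java
import java.util.*;

class RecycleMap {
    // Reuse the same TreeMap instance while changing its value type:
    // replace each Long count with a Double probability in place.
    // Compiles, but with unchecked-cast warnings.
    static <T> TreeMap<T, Double> toProbabilities(TreeMap<T, Long> counts) {
        double total = counts.values().stream().mapToLong(Long::longValue).sum();
        TreeMap<T, Object> map = (TreeMap<T, Object>) (TreeMap) counts;
        for (Map.Entry<T, Object> e : map.entrySet()) {
            e.setValue(((Long) e.getValue()) / total);   // Long out, Double in
        }
        return (TreeMap<T, Double>) (TreeMap) map;
    }
}
```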

The above code generates warnings for unchecked casts, which can be cleaned up with the previously shown coerciveCast util method:
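One plausible shape for such a utility (my guess, since the original isn't quoted here) - it funnels the unchecked cast through a single suppressed method:

```java
class Casts {
    // Hypothetical reconstruction of a coerciveCast-style helper:
    // the caller's target type drives the inferred T.
    @SuppressWarnings("unchecked")
    static <T> T coerciveCast(Object value) {
        return (T) value;
    }

    public static void main(String[] args) {
        Object o = "hello";
        String s = coerciveCast(o);   // no cast warning at the call site
        System.out.println(s.length()); // 5
    }
}
```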

This works as long as you know the thing passed in really is a TreeMap, and the values can really be any Object type.  So the same map can have new types for its values.  But make sure no one tries to access the original map using the TreeMap<T, Long> reference, as that will probably throw ClassCastException if you try to access any Long values in the map - they aren't Longs any more.
3 months ago
Hi Piet - thanks for clarifications.

I'm not clear which code you're using that latest code with, but I don't believe it works.  The second argument to collectingAndThen() needs to be something like a  

Function<TreeMap<T, Long>, TreeMap<T, Double>>

whereas you have a

Function<Long, Double>
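Spelled out, a finisher of the right type might look like this (my own sketch of the surrounding collector, not the code under discussion):

```java
import java.util.*;
import java.util.function.*;
import java.util.stream.*;

class Finisher {
    // collectingAndThen's second argument is applied to the whole
    // collected map, not to individual values - so it must be a
    // Function<TreeMap<T, Long>, TreeMap<T, Double>>.
    static <T extends Comparable<T>> TreeMap<T, Double> countsToProbs(List<T> data) {
        Collector<T, ?, TreeMap<T, Long>> counted =
                Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting());
        return data.stream().collect(Collectors.collectingAndThen(counted,
                counts -> {
                    TreeMap<T, Double> probs = new TreeMap<>();
                    counts.forEach((k, v) -> probs.put(k, v / (double) data.size()));
                    return probs;
                }));
    }
}
```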
3 months ago
Or, just for fun, the condensed version :
3 months ago
Stephan, regarding your Accumulator code - nice!

I see you're still doing a CDF, which is to say a map of observation to cumulative probability, rather than the inverse function.  I.e. Map<T, BigDecimal> rather than Map<BigDecimal, T>.  I think the inverse function is what Piet actually needs, as previously noted, but I'll go with your interpretation here.

Also I've already noted my feelings on BigDecimal for this problem, but here I'll accept it and move on.  I guess there is a possible benefit in being able to pass in a MathContext, at least for some applications.

I found one small optimization to make in the combine() method, always merging the smaller map into the bigger one:
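The idea, sketched here with plain Long counts for brevity (the code under discussion used BigDecimal values):

```java
import java.util.*;

class Combine {
    // Merge the smaller map into the bigger one, so the number of
    // insertions is bounded by the smaller map's size.
    static <T> Map<T, Long> combine(Map<T, Long> left, Map<T, Long> right) {
        Map<T, Long> big = left.size() >= right.size() ? left : right;
        Map<T, Long> small = big == left ? right : left;
        small.forEach((k, v) -> big.merge(k, v, Long::sum));
        return big;
    }
}
```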

As for the general design... I see you're aggressively reusing the same Map instance throughout.  That can work.  But I feel it's imposing a lot of costs as well, having to do all your counting by adding BigDecimal.ONE for every single observation, when long would be much faster.  You're also doing a TreeMap log(N) lookup for every access, while you really only need the sorted nature of the map after the counting has been done.  I'm thinking it's better to let Collectors.groupingBy() and counting() do most of that work.  If BigDecimal is desired, it's really only needed for the division; that can be done in a downstream transformation.  And if we really want to reuse the map rather than recopying it, we can still do that too, with a little... ummm... questionable casting. ;)

Of course, if you reverse the map as I think Piet intended, then you might as well make at least one new Map along the way, since you need to map on different keys.  But here we're assuming that is not what is needed.
3 months ago
Hi Stephan!

Stephan van Hulst wrote:Strongly agreed. Always perform division as late as possible, and then prefer BigDecimal over Double, unless performance is important and accuracy isn't.

In this case, it's not that double is less accurate - it's that there is no benefit whatsoever to bringing in BigDecimal, unless we wanted more than 18 digits in the answer or were concerned with overflowing the long - which is pretty much impossible here; we could really use an int.

Moreover, it's putting unnecessary work on the clients of this code, adapting to an unnecessarily convoluted API.  This isn't financial data - it's a probability distribution.  When someone uses this TreeMap, they're most likely going to be generating random numbers with Random.nextDouble() and using that as a key to look up the nearest values with floor() and ceiling().  One could convert these random numbers into BigDecimal I suppose - but why?  It's not going to be any more precise, just slower and harder to read.

I also confess to a general prejudice against BigDecimal even for financial data, as I find a properly written round(double val, int digits) method can fix most any problem, and is more readable.  Rounding errors can creep into BigDecimal calculations too (especially where division is concerned, or where a value is converted incorrectly from floating point to BigDecimal), but they're harder to find, because all code involving BigDecimal is harder to read than equivalent code with double or long.  But that's a more general discussion - in this code, I don't see any reason for BigDecimal at all.
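For illustration, one way such a helper might be written (a sketch - half-way cases still inherit the usual binary floating-point quirks):

```java
class Rounding {
    // Scale up, round to the nearest long, scale back down.
    static double round(double val, int digits) {
        double scale = Math.pow(10, digits);
        return Math.round(val * scale) / scale;
    }

    public static void main(String[] args) {
        // 0.1 + 0.2 is 0.30000000000000004 in binary floating point;
        // rounding to 2 digits recovers the intended 0.3.
        System.out.println(round(0.1 + 0.2, 2)); // 0.3
    }
}
```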

Stephan van Hulst wrote:Don't use raw types. Declare your type variable as T extends Comparable<? super T>.

Agreed, good catch.

Stephan van Hulst wrote:Use wildcards in method parameters: Map<? extends T, ? extends Long>. I don't feel strongly about doing this with final types (like Long), but it is more consistent and clearly conveys intent.

Yeah, I'm less enthusiastic about this one, as it makes method signatures even harder to read, but it can have benefits, so OK.

Stephan van Hulst wrote:Why is the size parameter a double? If the only reason is to perform an implicit cast, then you're sacrificing the principle of least astonishment to make your code just a tiny little bit less verbose.

I have a hard time seeing anything astonishing in that code, but that was the reason, yes.

Stephan van Hulst wrote:Personally I would have your remap() method operate on the current map, rather than a new one. It saves a lot of copying, and if you actually need a copy then just make the copy before you pass it into the method.

Hmmm, mutating the input map seems not very FP.

Actually I did consider that - but note that we're also changing the return type, transforming a Map<T, Long> to a Map<Double, T>.  We can hack that within the method with some creative casting tricks, but it won't be pretty.  And anyone retaining a reference to the original map will find some unpleasant surprises if they try to use it.  Also, since we're swapping the key and value, among other things, we need to remove and reinsert the entry anyway for the new key.  Seems much cleaner and more intuitive to do it in a new map.  

Stephan van Hulst wrote:Don't use the return values of assignments in enclosing expressions in any but the most common idioms, such as incrementing an index while accessing an array.

In cases where it might be overlooked, I agree.  But in a tiny loop like that it's pretty obvious.  This should be a common idiom.  You're lucky I bothered to put in curly braces around the loop body at all. ;)
3 months ago
Hi Piet, Stephan.  I'm enjoying this - I hope you are too.

Stephan van Hulst wrote:You use two collect() operations in your cdf() method. That's wasteful. Use a downstream collector to specify what should be done with the values of the map while the groupingBy() is doing its magic.

Aside from the extra collection, I'm a bit troubled by doing all the summation on floating-point values.  Seems like it would be less error prone to do the summation on longs, with nice precise values, and then divide by the observation count at the end.  In this way you can better guarantee that the total probability will be 1.0, rather than 0.999999999756 or some such.
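In sketch form (my own illustration of the idea, not code from the thread) - accumulate exact long counts, and divide each running total only once at the end:

```java
import java.util.*;

class Cdf {
    // Accumulate exact long counts first; divide only once per entry,
    // so the final cumulative probability is exactly 1.0.
    static <T> TreeMap<Double, T> inverseCdf(SortedMap<T, Long> counts) {
        long total = counts.values().stream().mapToLong(Long::longValue).sum();
        TreeMap<Double, T> result = new TreeMap<>();
        long running = 0;
        for (Map.Entry<T, Long> e : counts.entrySet()) {
            running += e.getValue();
            result.put((double) running / total, e.getKey()); // last key maps from exactly 1.0
        }
        return result;
    }
}
```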

Stephan van Hulst wrote:Why is T a Number? The only thing that your methods require about T is that it has a natural order.

Good catch, this is true.  To some extent, you may not even need this requirement, for sampling purposes - but it's nice to see the map in a sorted order.  Which I assume is why the extra TreeMap was put there.

Piet Souris wrote:The map.forEach() allows the very convenient (k,v) -> construct, and for that I'm willing to skip functional code. As you see, it is just a very clear short one-liner.

Yeah, that's a nice feature.  Conversely, I don't really like having to use the one-element array trick to work around Java's lack of true closures - I find it's an additional readability impediment, since readability is often determined by how often you (and other readers) have seen a particular construct.  In this case, I favor returning to a more traditional loop so I can just update a normal variable.

Here are the tweaks I came up with, for your consideration:
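In that spirit, a sketch of what such integrate() and remap() utilities could look like (the names come from the discussion; the bodies are my guesses, not the posted code):

```java
import java.util.*;
import java.util.function.*;

class Tweaks {
    // integrate(): running totals via a plain loop instead of the
    // one-element array trick.  Uses the iteration order of the map.
    static <T> List<Long> integrate(SortedMap<T, Long> counts) {
        List<Long> running = new ArrayList<>();
        long sum = 0;
        for (long v : counts.values()) {
            sum += v;
            running.add(sum);
        }
        return running;
    }

    // remap(): rebuild a map with transformed keys and values.
    // Assumes the new key type K2 has a natural order.
    static <K, V, K2, V2> TreeMap<K2, V2> remap(Map<K, V> map,
            BiFunction<K, V, K2> keyFn, BiFunction<K, V, V2> valueFn) {
        TreeMap<K2, V2> result = new TreeMap<>();
        map.forEach((k, v) -> result.put(keyFn.apply(k, v), valueFn.apply(k, v)));
        return result;
    }
}
```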

Or, a slightly less performant but maybe more readable and reusable version:

Here both integrate() and remap() become more general-purpose utility methods.  The final call to remap could also be expanded to:

which is, once again, more readable, but slower.

In several places, we could also replace List and TreeMap with more general types like Collection and NavigableMap.  But I've gotten used to the short versions of each, even when they're unnecessarily specific.
3 months ago

Stephan van Hulst wrote:Why is your CDF keyed by probability? A CDF returns a probability given an observation, so it makes sense to keep T as the key.

Well, an inverse CDF can also be useful - e.g. if you want to generate values that follow the original probability distribution, you can generate random numbers in [0, 1) and apply the inverse CDF.  If that's the intent (or something similar), then fine.  However, it's misleading to call it a CDF in this case - it's an inverse CDF, or quantile function.
3 months ago

This is a bit troubling though, as it relies on the assumption that the combiner (3rd arg of Collector.of()) will never be called, since we're not using a parallel stream.  Though truthfully, most all of these implementations rely on that sort of assumption - otherwise our sums would be totaled in a different order.  In comparison, an old-fashioned non-streaming solution would be clearer:
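For comparison, a plain-loop sketch of the same computation (my illustration) - no collectors, no combiner worries:

```java
import java.util.*;

class PlainLoop {
    // Count in one pass, then accumulate in sorted order - no stream,
    // so there is no combiner whose invocation order we have to assume.
    static <T extends Comparable<T>> TreeMap<Double, T> inverseCdf(List<T> observations) {
        TreeMap<T, Integer> counts = new TreeMap<>();
        for (T t : observations) {
            counts.merge(t, 1, Integer::sum);
        }
        TreeMap<Double, T> result = new TreeMap<>();
        int running = 0;
        for (Map.Entry<T, Integer> e : counts.entrySet()) {
            running += e.getValue();
            result.put((double) running / observations.size(), e.getKey());
        }
        return result;
    }
}
```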

3 months ago
Here's a variation I came up with that's close to the original structure, but it avoids the requirement of knowing the list size.  Accordingly, it uses an Iterable rather than a List as input.
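A sketch of what such an Iterable-based version could look like (my reconstruction, not the posted code):

```java
import java.util.*;

class IterableInput {
    // Works for any Iterable: tally the grand total in the same pass
    // as the per-value counts, so the size is never needed up front.
    static <T extends Comparable<T>> TreeMap<T, Double> distribution(Iterable<T> data) {
        TreeMap<T, Long> counts = new TreeMap<>();
        long total = 0;
        for (T t : data) {
            counts.merge(t, 1L, Long::sum);
            total++;
        }
        TreeMap<T, Double> result = new TreeMap<>();
        for (Map.Entry<T, Long> e : counts.entrySet()) {
            result.put(e.getKey(), e.getValue() / (double) total);
        }
        return result;
    }
}
```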

Another variant, for fun:
3 months ago