
Jonathan Graef

since Jan 27, 2020
Aspiring game engine and programming language developer with an interest in game theory and existential philosophy. Peace and happiness to my fellow math folks who like games.

Recent posts by Jonathan Graef

Um, wut? I come from a background of logic and theoretical computing; I didn't know about CAS. It makes sense that RAM would be striped over multiple internal buses, I think. Maybe.

I read online that DDR3 RAM has around a 15-nanosecond round-trip latency. That's 66 MHz. Divide by a reasonable 15 memory accesses per loop and you get 4.4 MHz of memory accesses that can't be predicted.
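The arithmetic above can be sketched directly. The 15 ns latency and 15 accesses per loop are the numbers from this post; everything else is just unit conversion:

```java
// Back-of-the-envelope: 15 ns round-trip DDR3 latency -> max rate of
// unpredicted memory accesses, then per-loop rate.
public class LatencyMath {
    public static void main(String[] args) {
        double latencyNs = 15.0;                      // DDR3 round trip, from the post
        double accessMHz = 1000.0 / latencyNs;        // 1000 ns per us -> ~66.7 MHz
        double accessesPerLoop = 15.0;                // assumed accesses per loop iteration
        double loopMHz = accessMHz / accessesPerLoop; // ~4.4 MHz of unpredicted loops
        System.out.printf("%.1f MHz access rate, %.1f MHz loop rate%n", accessMHz, loopMHz);
    }
}
```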

I got 5 MHz on my benchmark, so I would go so far as to say that Java is predicting hardly anything.

Long story short: RAM is latent, you need branch prediction and array preloading to get that sweet 2-billion instruction count, and since Java didn't detect the loop counter in this benchmark, it wasn't predicting which element needed to be preloaded before array[ai] was accessed. It may not do this every time, but I'm hopping over to LuaJIT, which gets me 2 GHz on the button. They predict it, I guess.
1 week ago
Um, as I understand it, you load the two operands from memory, which is as fast as the CPU, then perform the operation. That's 3 ticks. What's Java doing, compiling itself?

I gave it 100 for the possibly-BS notion that memory access is slow. Or maybe Java just recompiles itself every time we change a variable and then runs for no reason (no reason, because the variables would all be compiled in statically if it recompiled every time).

And what about C? It doesn't have a JIT.


If you're getting 30 MHz, they're probably getting 40 MHz. In C?

There's a full spectrum of possibilities, but I don't think activating naive benchmark optimizations is going to cure our miserable Minecraft FPS. Or the experience. But we don't like to talk about that.
1 week ago
You need to change the two occurrences of 27L to 28L (one is at the bottom), 27.0 to 28.0 in "Ticks per loop", and 27 to 28 in "Operations per loop". That is all.

It computes the actual number of CPU ticks per Java opcode that you are getting, based on this number. It's an average, so e.g. an array access probably costs more than an addition. Your PC's background processes are included in the count; if you care, check your Task Manager and correct the number yourself: ticks * (1 - bitcoin / 100), where bitcoin is the percentage (0-100) of CPU spent in the background.

Ticks per loop tells you the total number of ticks it takes to loop once, by multiplying the average ticks per (Java bytecode) operation by 28.0.
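As a sketch, here is the whole bookkeeping described above in one place. The 8318 measurement and the 30% background figure are example inputs (assumptions), not output from the actual benchmark:

```java
// Sketch of the benchmark's bookkeeping: average ticks per bytecode op,
// then the background-CPU correction formula from this thread.
public class TickMath {
    public static void main(String[] args) {
        long ticksPerLoop = 8318;     // example measurement (assumption)
        double opsPerLoop = 28.0;     // bytecode ops per loop, per Stephan's count
        double ticksPerOp = ticksPerLoop / opsPerLoop;       // ~297
        double backgroundPct = 30.0;  // % CPU eaten in background (assumption)
        double corrected = ticksPerOp * (1 - backgroundPct / 100.0);
        System.out.printf("%.0f ticks/op raw, %.0f corrected%n", ticksPerOp, corrected);
    }
}
```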
2 weeks ago
Stephan said it was 28 Java bytecode instructions per loop. That's off by one from my estimate of 27.

The numbers in my previous post were his estimate in processor ticks, not real results. I added it up and gave it some padding: 10 "real" instructions per Java bytecode (actually, there's a multiplier), instead of 4. My real results were 8318 ticks per loop, ~300 ticks per operation. As you can see, there's a disparity there.
2 weeks ago
The 28 bytecode instructions roughly correspond to high-level-language instructions, which is what I counted in the loop. But a two-orders-of-magnitude difference, from 10 to 1000 ticks, for the two array accesses adds up to 2260 ticks per loop, or 80 ticks per operation on average, in which case you would get "Java has performant code!" at the end of the benchmark.
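The 2260 figure checks out if you assume 26 cheap operations at 10 ticks each plus 2 array accesses at 1000 ticks each (those per-op costs are the estimates from this post, not measurements):

```java
// Worked arithmetic for the 2260-ticks-per-loop estimate above.
public class LoopCost {
    public static void main(String[] args) {
        int ops = 28, arrayAccesses = 2;
        int cheapTicks = 10, ramTicks = 1000;   // assumed per-op costs
        int ticksPerLoop = (ops - arrayAccesses) * cheapTicks
                         + arrayAccesses * ramTicks;      // 260 + 2000 = 2260
        double avg = ticksPerLoop / (double) ops;          // ~80.7
        System.out.printf("%d ticks/loop, ~%.1f ticks/op%n", ticksPerLoop, avg);
    }
}
```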

According to Carey Brown above, a good PC gets ~150 ticks per operation on average. DDR3 runs at the same clock speed as the CPU (modern RAM is actually faster), so that isn't the bottleneck. Maybe the bus failed school or something. But in my opinion, either some extra cruft has been written by the Java devs, or the JIT is compiling late, as you mentioned.

As for tools like JMH, I feel they fail to simulate a real app: they warm up the JIT first after any change to the variables, thus making the entire output of the app a fixed value. If I understand correctly.
2 weeks ago
You may be right. My "background processes", whatever they may be, are eating up 20-50% of my processor. Correct my benchmark's 10 MHz for that and you get 20 MHz; I might be able to get Arduino speeds out of the HotSpot JVM if I ditch the unwanted backwares.
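The correction works by dividing the measured rate by the fraction of the CPU the benchmark actually got. The 50% figure is the worst case mentioned above; treat it as an assumption:

```java
// Correcting a measured rate for background CPU usage.
public class BackgroundCorrection {
    public static void main(String[] args) {
        double measuredMHz = 10.0;
        double backgroundPct = 50.0;                 // worst case from Task Manager (assumption)
        double share = 1 - backgroundPct / 100.0;    // fraction the benchmark actually got
        double correctedMHz = measuredMHz / share;   // 10 / 0.5 = 20 MHz
        System.out.printf("%.0f MHz corrected%n", correctedMHz);
    }
}
```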
2 weeks ago
I added a new JIT warmup phase. I also upped the process priority to High in Task Manager.
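A warmup phase of the kind described might look like the following sketch. The iteration count and method name are my assumptions; HotSpot's actual compilation thresholds vary by JVM version and flags:

```java
// Sketch: run the hot method repeatedly before timing so HotSpot has a
// chance to JIT-compile it (10_000 is a guess, not a documented threshold).
public class WarmupSketch {
    static long work(long[] a) {
        long sum = 0;
        for (int ai = 0; ai < a.length; ai++) sum += a[ai];
        return sum;
    }

    public static void main(String[] args) {
        long[] data = new long[1_000];
        for (int i = 0; i < 10_000; i++) work(data);  // warmup, results discarded
        long t0 = System.nanoTime();
        long sum = work(data);                         // the timed run
        long t1 = System.nanoTime();
        System.out.println("sum=" + sum + ", ns=" + (t1 - t0));
    }
}
```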


Here were my new results:

As you can see, there has been a performance improvement of around 30%, which brings us down to a better ~300 ticks per operation. But that's still a lot. Where's my gigahertz? Arduinos are advertised as faster than this...
3 weeks ago
You're getting way better performance than I am: 140 ticks per high-level-language operation (plus, assignment, variable read, etc.). But why not 1-4, plus a few (5-10) for cleaning up CPU registers?

For example, I counted ai++ as 4 HLL operations: read ai, read the constant 1, add, and assign to ai. That's what the 27 operations per loop means: I counted all of them. But you're getting 4000 ticks per loop, not 27. That's down to a much more reasonable level, but the cutoff I put in the code is <= 100 ticks per HLL operation for "no performance gap" (see line 91 of the code).

TL;DR: it says "performance gap" if there are more than 100 ticks per thingy you see in the source.
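So the final check is essentially the following. The 100-tick cutoff and the "Java has performant code!" message come from this thread; the 8318 input and the variable names are mine:

```java
// The benchmark's verdict: average ticks per counted HLL operation vs. a cutoff.
public class GapCheck {
    public static void main(String[] args) {
        double totalTicks = 8318;              // example measurement (assumption)
        double ticksPerOp = totalTicks / 27.0; // 27 counted HLL operations per loop
        if (ticksPerOp <= 100) {
            System.out.println("Java has performant code!");
        } else {
            System.out.println("performance gap");   // 8318/27 ~ 308, so this prints
        }
    }
}
```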
3 weeks ago
I counted. There are 27 statements, operators, and variable reads. Those roughly correspond to CPU instructions. So what I want to know is: where are the other 396 ticks going, if each instruction takes 4 ticks?
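For concreteness, here is how that kind of counting works on a loop shaped like the one being discussed. The loop itself is my stand-in, not the original code, and the per-line tallies follow the counting convention from this thread:

```java
// Counting high-level operations per iteration, the way this thread does:
//   ai < array.length : read ai, read length, compare             -> 3 ops
//   ai++              : read ai, read constant 1, add, assign     -> 4 ops
//   sum += array[ai]  : read ai, index array, read sum, add, assign -> 5 ops
public class OpCount {
    public static void main(String[] args) {
        int[] array = new int[100];
        int sum = 0;
        for (int ai = 0; ai < array.length; ai++) {
            sum += array[ai];
        }
        System.out.println(sum); // 0: the array is all zeros
    }
}
```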

I know Java does some things for you: array bounds checking, double pointer indirection, um... the goto at the end of the loop? I counted that. I honestly can't think of anything else it could be doing. Where are my CPU cycles going in Java?

C has the same performance problem - see "Has Intel misreported its specs?" They do say real performance may vary, but every once in a while they advertise their fancy 1-4-tick instructions. Then there's the clock multiplier, not counted in my benchmark. Look up your processor family: the i7 has 20-35 lil' ones per clock cycle.
3 weeks ago
That's what the code does to compute the number of ticks. However, Intel says modern CPUs can do floating-point math in 1-4 ticks. My code uses integer math, which should be faster. Does it really take 1000+ ticks to access or update an element of an array? What is it doing, mining for bitcoin?
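One way to sanity-check the 1000+ ticks claim without the full benchmark is to time a plain array pass with System.nanoTime(). The array size and loop shape here are my choices, not the original benchmark's:

```java
// Rough timing of sequential array reads; compare ns/element against the
// thread's ticks-per-access numbers (on a ~2 GHz CPU, 1 ns ~ 2 ticks).
public class ArrayAccessTiming {
    public static void main(String[] args) {
        long[] a = new long[1 << 20];          // 1M elements (assumption)
        long sum = 0;
        long t0 = System.nanoTime();
        for (int i = 0; i < a.length; i++) {
            sum += a[i];                        // one read + one add per element
        }
        long t1 = System.nanoTime();
        double nsPerElement = (t1 - t0) / (double) a.length;
        System.out.printf("sum=%d, ~%.2f ns per element%n", sum, nsPerElement);
    }
}
```

Sequential access like this is prefetch-friendly, so it shows the best case; a random index pattern would be closer to the raw latency being argued about.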
3 weeks ago
This is my first post here, so here we go:

I did a benchmark for Java array handling, and it came out looking like I have a rootkit. Does anybody know what Java does behind the scenes when iterating over an array?

Here's the code:
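As a rough guide to what follows, here is a minimal sketch of an array benchmark in the spirit described later in the thread (27 counted operations per loop, a tick estimate derived from the CPU clock, and a <= 100 ticks-per-op cutoff). It is a reconstruction under assumptions, not the original listing; every name and constant in it is mine:

```java
// Sketch only: times repeated array[ai] updates and reports average "ticks"
// per loop / per counted operation, assuming a 2 GHz clock.
public class ArrayBench {
    public static void main(String[] args) {
        int[] array = new int[1_000_000];
        long loops = 10_000_000L;
        long t0 = System.nanoTime();
        int ai = 0;
        for (long i = 0; i < loops; i++) {
            array[ai] = array[ai] + 1;          // the access under test
            ai++;
            if (ai == array.length) ai = 0;     // wrap around
        }
        long t1 = System.nanoTime();
        double ghz = 2.0;                        // assumed CPU clock
        double ticks = (t1 - t0) * ghz;          // ns * ticks-per-ns
        double ticksPerLoop = ticks / loops;
        double ticksPerOp = ticksPerLoop / 27.0; // 27 counted HLL operations
        System.out.printf("Ticks per loop: %.1f%n", ticksPerLoop);
        System.out.println("Operations per loop: 27");
        System.out.println(ticksPerOp <= 100 ? "Java has performant code!"
                                             : "performance gap");
    }
}
```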

You can run it yourself in the file here:

Here were my results:

I was expecting this, having run similar benchmarks before, but as you can see, my processor carries an internal advertisement for 2 gigahertz. I get 5 megs. Where did they all go?
4 weeks ago