John Matthews wrote:Hi Silvi
Without examining the code...
Probably a silly question, but have you enabled compiler optimisations, and tried playing with the various options to see which give the best results?
Are you aware of AVX (Advanced Vector Extensions)? Basically it allows the code to do multiple calculations in parallel. Assuming your CPU supports this, ensure your compiler is set to generate the best code for your CPU. And take a look at stuff like this:
https://software.intel.com/en-us/articles/using-avx-without-writing-avx-code
Cheers
John
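As a rough illustration of the "AVX without writing AVX code" idea above, here is a minimal sketch of the kind of loop gcc can typically vectorise by itself once optimisation is switched on. The function name scale_add and the arrays are hypothetical, not taken from Silvi's code. The options in the comment are real gcc options, but whether AVX instructions are actually emitted depends on the CPU and the compiler version.

```c
#include <stddef.h>

/* Hypothetical example, not from the original program.
 * Build with e.g.:  gcc -O3 -march=native -c scale_add.c
 * Adding -fopt-info-vec asks gcc to report which loops it vectorised. */
void scale_add(float *restrict out, const float *restrict a,
               const float *restrict b, float k, size_t n)
{
    /* Each iteration is independent, and restrict promises the arrays
     * do not overlap, which makes the loop easy to vectorise. */
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] * k + b[i];
}
```

The point of the sketch is that nothing in the source mentions AVX; the compiler options decide whether the wider instructions are used.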
John Matthews wrote:Looking at a bit of the code, specifically lines 40/41...
That doesn't look right
Silvi Hasana wrote:I'll take a look at it. Also, if possible, I want to improve the code itself so I'm not dependent on the machine.
John Matthews wrote:One thing that made a small improvement on my machine was to remove some redundant code, ie.
can (as I'm sure you're aware) be simplified to:
John Matthews wrote:Using -O3 certainly makes a big difference on my machine, where I'm using gcc. But I expected that; something else that also made a significant difference (>30%) was the option -fsingle-precision-constant "Treat floating point constant as single precision constant instead of implicitly converting it to double precision constant". If you're not using gcc, look for an equivalent option for your compiler.
...or instead of using numeric literals, use const variables eg.
Silvi Hasana wrote:Done! It's 23 fps now! It's small steps, but it is improving and I'm learning a lot too. You're such a lifesaver.
Silvi Hasana wrote:As for single precision, yes, I'm compiling with gcc. I tried the flag and got 29 fps, but I'm working in a very small coordinate space where single precision isn't sufficient, so I guess the flag isn't the best approach for me.
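To make the precision concern concrete, here is a minimal sketch showing how a small coordinate step can vanish in float but survive in double. The helper names and the values in the usage note are hypothetical and not from Silvi's program; they only illustrate the general effect.

```c
#include <stdbool.h>

/* Returns true if adding `step` to `base` changes the value when the
 * arithmetic is done in single precision. */
bool float_sees_step(float base, float step)
{
    volatile float sum = base + step;  /* volatile: store really is a float */
    return sum != base;
}

/* Same check, but in double precision. */
bool double_sees_step(double base, double step)
{
    volatile double sum = base + step;
    return sum != base;
}
```

For instance, with a base of 1.0 and a step of 1e-8, the float version cannot see the step (float carries roughly 7 significant decimal digits), while the double version can.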
John Matthews wrote:If you want/need double precision then all your float variables should be double.
Are you sure you need double? What is the effect? It looks like you're doing some sort of image processing, so is the difference visible? I'm curious.
Silvi Hasana wrote:I have other functions too which really can't use single precision. But you're also correct: in the end the picture looks the same. Now I have a dilemma.
Silvi Hasana wrote:but.. why...
John Matthews wrote:It might not answer your question, but to clarify, the 'single precision flag' doesn't mean the compiler only generates single precision code - it just means that the numeric literals are only held as single precision instead of the default of double precision. But this has the effect of an 'overall single precision flag' in this code because all the other values in the expressions are also single precision (float). Hence the compiler generates single precision code.
As soon as a double appears in an expression, eg. one of the numeric literals (without the single precision flag), the expression is calculated using double precision.
Using const float variables for the numeric literals is just another way of ensuring that no doubles appear in the expressions, so single precision is used.
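The promotion rule described above can be checked directly. Below is a minimal sketch, assuming a C11 compiler, that uses _Generic to report the type of three expressions. The macro and function names are hypothetical. Compiled without -fsingle-precision-constant, the first expression should be double and the other two float; with the flag, I would expect the first one to report float as well.

```c
/* C11 _Generic selects a string from the static type of the expression.
 * The expression itself is not evaluated. */
#define TYPE_NAME(expr) _Generic((expr), \
    float: "float",                      \
    double: "double",                    \
    default: "other")

/* float * double literal: the whole expression is promoted to double. */
const char *type_float_times_double_literal(void)
{
    float x = 2.0f;
    return TYPE_NAME(x * 0.5);
}

/* float * float literal (f suffix): the expression stays float. */
const char *type_float_times_float_literal(void)
{
    float x = 2.0f;
    return TYPE_NAME(x * 0.5f);
}

/* float * const float variable: the double literal is converted once,
 * at initialisation, so the expression itself stays float. */
const char *type_float_times_const_float(void)
{
    const float half = 0.5;
    float x = 2.0f;
    return TYPE_NAME(x * half);
}
```

So the f suffix and a const float variable are two portable ways of getting the same effect as the flag, one expression at a time.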
Regarding -Ofast and -O3, it just might be that for this particular (relatively simple) code there is no difference.