Need Advice on Improving Matrix Calculation Process in C

 
Greenhorn
Posts: 18
Hi, I'm trying to do a per-pixel image calculation. My program works, but not at the speed (fps) I want.
I have an image as input and, after several computations, I want to apply a pixel transformation (below). Whether I put it directly in the function in my main file or put it in a struct and then call it, the computation is very slow.
Without this calculation I get 40 fps. When I put it directly into the function in my main program that calculates the pixels (below), my fps was cut in half (only 20 fps).


And when I put it in a struct (like below) and then called it, my fps dropped to 18-19 fps.


I'm aware that the computation time also depends on the size of my image, but with the same image I get 40 fps when this calculation is disabled. Is there any way I can improve this calculation?
 
Rancher
Posts: 508
15
Hi Silvi

Without examining the code...

Probably a silly question, but have you enabled compiler optimisations, and tried playing with the various options to see which give the best results?

Are you aware of AVX (Advanced Vector Extensions)? Basically it allows the code to do multiple calculations in parallel. Assuming your CPU supports this, ensure your compiler is set to generate the best code for your CPU. And take a look at stuff like this:
https://software.intel.com/en-us/articles/using-avx-without-writing-avx-code

Cheers
John
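As an illustration of the "AVX without writing AVX code" idea, here is a minimal sketch of a per-pixel loop written in a shape that compilers such as gcc can often auto-vectorize at -O3 (optionally with -march=native). The function name `scale_pixels`, the gain/offset parameters and the clamping are assumptions for illustration; this is not the code from the original post.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-channel transform: out = clamp(in * gain + offset, 0, 255).
 * 'restrict' promises the compiler that src and dst do not overlap, which is
 * one of the things it needs to know before it can vectorize the loop. */
void scale_pixels(const unsigned char *restrict src,
                  unsigned char *restrict dst,
                  size_t count, float gain, float offset)
{
    assert(src != NULL && dst != NULL);

    for (size_t i = 0; i < count; ++i) {
        float v = (float)src[i] * gain + offset;
        if (v < 0.0f)   v = 0.0f;
        if (v > 255.0f) v = 255.0f;
        dst[i] = (unsigned char)v;   /* truncates toward zero */
    }
}
```

With gcc, adding -fopt-info-vec to the command line asks the compiler to report which loops it managed to vectorize, which is a quick way to see whether a loop like this one qualified.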
 
John Matthews
Rancher
Posts: 508
15
Looking at a bit of the code, specifically lines 40/41...

That doesn't look right
 
Rania Bradbury
Greenhorn
Posts: 18


Hi, thanks. I'm currently working with an Intel i7 + Linux terminal + NVIDIA 1080Ti GPU, which should be pretty fast. I'm not familiar with compiler optimization, so I'll take a look at it. Also, if possible, I want to improve the code itself so I'm not dependent on the machine.
 
Rania Bradbury
Greenhorn
Posts: 18

John Matthews wrote:Looking at a bit of the code, specifically lines 40/41...

That doesn't look right



Indeed, my bad! It's supposed to be:

I can't edit my post, that's why.
 
John Matthews
Rancher
Posts: 508
15
One thing that made a small improvement on my machine was to remove some redundant code, ie.
can (as I'm sure you're aware) be simplified to:
 
John Matthews
Rancher
Posts: 508
15

Silvi Hasana wrote:I'll take a look at it. Also, if possible, I want to improve the code itself so I'm not dependent on the machine.


Sure - really you should always start with the algorithm. But general compiler optimisations such as gcc's -On where n=2 or 3 are machine independent - typically they improve performance at the expense of code size, memory usage, or the ability to single-step through the code when debugging.
 
Rania Bradbury
Greenhorn
Posts: 18

John Matthews wrote:One thing that made a small improvement on my machine was to remove some redundant code, ie.
can (as I'm sure you're aware) be simplified to:



Oh wow, thanks! I feel so stupid, but it has indeed improved. It's 22 fps now, up from 20 fps.
 
John Matthews
Rancher
Posts: 508
15
Using -O3 certainly makes a big difference on my machine, where I'm using gcc. But I expected that; something else that also made a significant difference (>30%) was the option -fsingle-precision-constant "Treat floating point constant as single precision constant instead of implicitly converting it to double precision constant". If you're not using gcc, look for an equivalent option for your compiler.
 
John Matthews
Rancher
Posts: 508
15

John Matthews wrote:If you're not using gcc, look for an equivalent option for your compiler.


...or instead of using numeric literals, use const variables eg.
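A minimal sketch of what that could look like, assuming a made-up brightness transform (the names `GAIN`, `OFFSET` and `adjust` and the coefficient values are hypothetical, not taken from the original code). Because every operand is a float, the arithmetic stays in single precision; writing a bare `1.5` in the expression instead of `GAIN` would promote it to double.

```c
#include <assert.h>

/* Hypothetical coefficients, declared as float so that no double literal
 * appears inside the expression below. */
static const float GAIN   = 1.5f;
static const float OFFSET = 16.0f;

/* Applies value * GAIN + OFFSET to one channel, clamped to 0..255. */
unsigned char adjust(unsigned char value)
{
    const float v = (float)value * GAIN + OFFSET;

    assert(v >= 0.0f);           /* holds for these non-negative coefficients */
    if (v > 255.0f)
        return 255;
    return (unsigned char)v;     /* truncates toward zero */
}
```

Adding an `f` suffix to each literal in the expression has the same effect; named constants mainly make the intent easier to see.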
 
Rania Bradbury
Greenhorn
Posts: 18

John Matthews wrote:Using -O3 certainly makes a big difference on my machine, where I'm using gcc. But I expected that; something else that also made a significant difference (>30%) was the option -fsingle-precision-constant "Treat floating point constant as single precision constant instead of implicitly converting it to double precision constant". If you're not using gcc, look for an equivalent option for your compiler.



Thanks, I tried -O3 optimization and there's no difference. As for single precision: yes, I'm compiling with gcc. I tried the flag and got 29 fps, but I'm working in a very small coordinate space and single precision isn't sufficient, so I guess the flag isn't the best approach for me.
 
Rania Bradbury
Greenhorn
Posts: 18

John Matthews wrote:

John Matthews wrote:If you're not using gcc, look for an equivalent option for your compiler.


...or instead of using numeric literals, use const variables eg.



Done! It's 23 fps now! These are small steps, but it is improving and I'm learning a lot too. You're such a life saver.
 
John Matthews
Rancher
Posts: 508
15

Silvi Hasana wrote:Done! It's 23 fps now!


...but I think it will be using single precision, i.e. using those float variables instead of double precision constants means that all calculations are single precision. Using double precision constants results in double precision calculations.
 
John Matthews
Rancher
Posts: 508
15

Silvi Hasana wrote:As for single precision: yes, I'm compiling with gcc. I tried the flag and got 29 fps, but I'm working in a very small coordinate space and single precision isn't sufficient, so I guess the flag isn't the best approach for me.


If you want/need double precision then all your float variables should be double.

Are you sure you need double? What is the effect? It looks like you're doing some sort of image processing, so is the difference visible? I'm curious.
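One way to answer that empirically is to run the same transform in both precisions and compare the final 8-bit results. The sketch below does this for a hypothetical contrast curve; the function names and coefficients are assumptions, and the real transform from the original post would be substituted in.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical contrast curve evaluated in single precision. */
static int curve_float(unsigned char in)
{
    const float x = (float)in / 255.0f;
    return (int)(255.0f * (0.2f * x + 0.8f * x * x) + 0.5f);
}

/* The same curve evaluated in double precision. */
static int curve_double(unsigned char in)
{
    const double x = (double)in / 255.0;
    return (int)(255.0 * (0.2 * x + 0.8 * x * x) + 0.5);
}

/* Returns the largest difference between the two 8-bit results over all
 * 256 possible input values. */
int max_precision_difference(void)
{
    int worst = 0;
    for (int in = 0; in <= 255; ++in) {
        const int diff = abs(curve_float((unsigned char)in) -
                             curve_double((unsigned char)in));
        if (diff > worst)
            worst = diff;
    }
    return worst;
}
```

If the largest difference is zero or one output level, the extra precision is unlikely to be visible in the final image, although intermediate coordinate calculations may still need double.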
 
Rania Bradbury
Greenhorn
Posts: 18

I replaced all the literals (in another function too) with constants. I also changed the compiler flag from -O to -Ofast (Source here), and now I get 28 fps. For now I don't know how "not accurate" -Ofast is; I'm working on it.
 
Rania Bradbury
Greenhorn
Posts: 18

John Matthews wrote:

Silvi Hasana wrote:As for single precision: yes, I'm compiling with gcc. I tried the flag and got 29 fps, but I'm working in a very small coordinate space and single precision isn't sufficient, so I guess the flag isn't the best approach for me.


If you want/need double precision then all your float variables should be double.

Are you sure you need double? What is the effect? It looks like you're doing some sort of image processing, so is the difference visible? I'm curious.



I have other functions too which really can't use single precision. But you're also correct: in the end the picture looks the same. Now I have a dilemma.
 
Rania Bradbury
Greenhorn
Posts: 18

OK, I'm using the single precision option anyway, lol. I tried the single precision flag before I read your other comment and then changed the literal values to constants. So far:
  • With single precision flag = 29 fps
  • With single precision flag + literal values changed to constant variables + -O changed to -Ofast = 29 fps

but.. why...
     
John Matthews
Rancher
Posts: 508
15

Silvi Hasana wrote:but.. why...


It might not answer your question, but to clarify: the 'single precision flag' doesn't mean the compiler only generates single precision code. It just means that numeric literals are held as single precision instead of the default double precision. In this code it has the effect of an 'overall single precision flag', because all the other values in the expressions are also single precision (float). Hence the compiler generates single precision code.

As soon as a double appears in an expression, e.g. one of the numeric literals (without the single precision flag), the expression is calculated using double precision.

Using const float variables for the numeric literals is just another way of ensuring that no doubles appear in the expressions, so single precision is used.

Regarding -Ofast and -O3, it just might be that for this particular (relatively simple) code there is no difference.
     
John Matthews
Rancher
Posts: 508
15

John Matthews wrote:Regarding -Ofast and -O3, it just might be that for this particular (relatively simple) code there is no difference.


Just as an experiment, try -O0. That *should* make a difference (although not in a good way); it certainly did for me.
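To compare settings such as -O0, -O3 and -Ofast without relying on the fps counter alone, a rough timing harness can help. This is a sketch using the standard `clock()` function; `time_frames`, `frame_fn` and `dummy_frame` are hypothetical names, and CPU time is only an approximation of the wall-clock frame rate.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical callback type: processes one frame. */
typedef void (*frame_fn)(void);

/* Stand-in workload so the sketch can be tried on its own; the real
 * per-pixel function would be passed to time_frames() instead. */
static volatile unsigned long sink;

void dummy_frame(void)
{
    for (unsigned long i = 0; i < 100000UL; ++i)
        sink += i;
}

/* Runs 'process' for 'frames' iterations and returns the average CPU time
 * per frame in seconds, measured with clock(). */
double time_frames(frame_fn process, int frames)
{
    assert(process != NULL && frames > 0);

    const clock_t start = clock();
    for (int i = 0; i < frames; ++i)
        process();
    const clock_t end = clock();

    return (double)(end - start) / CLOCKS_PER_SEC / frames;
}
```

Rebuilding the same file with each optimisation level and comparing the returned value gives a like-for-like comparison of the calculation itself, separate from capture and display overhead.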
     
John Matthews
Rancher
Posts: 508
15

...and here's a useful summary of C's type promotion rules:
https://www.eskimo.com/~scs/cclass/int/sx4cb.html
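The effect of those rules on the expressions discussed in this thread can be checked with `sizeof`, which reports the size of the type an expression would have without evaluating it. A minimal sketch (the function names are made up for illustration):

```c
#include <assert.h>
#include <stddef.h>

/* float * double literal: the float operand is converted to double,
 * so the result type is double. */
size_t size_with_double_literal(void)
{
    float x = 2.0f;
    return sizeof(x * 0.5);
}

/* float * float literal: the result type stays float. */
size_t size_with_float_literal(void)
{
    float x = 2.0f;
    return sizeof(x * 0.5f);
}

/* Same idea with a const float variable in place of the literal. */
size_t size_with_const_float(void)
{
    const float half = 0.5f;
    float x = 2.0f;
    return sizeof(x * half);
}
```

On a typical platform the first function returns 8 and the other two return 4. Note that building with gcc's -fsingle-precision-constant would change the first result, since that flag makes `0.5` a float.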
     
Rania Bradbury
Greenhorn
Posts: 18

Thanks a lot for the detailed explanation! I completely understand now. Also, yes, I tried -O0 and only managed to get 14 fps.
     
John Matthews
Rancher
Posts: 508
15

Some other suggestions, aimed primarily at improving code quality rather than performance.

Reduce variable scope by declaring variables close to where they are used, and use const where appropriate to help the reader (i.e. self-documenting code) and the compiler (it might result in further optimisations, although I suspect that won't happen here). In other words, declare each local variable at the point of first use instead of declaring them all at the top of the function.
And I wouldn't bother with variables r/g/b - just assign the relevant values direct to p1[0/1/2].

p should be defined as a const unsigned char *, because what it's pointing to isn't, and shouldn't be, modified. This would detect bugs in the code such as storing a result in p[] instead of p1[] - it would give a compiler error. And I would rename p as src (source), p1 as dst (destination).
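Put together, these suggestions might look like the following sketch. The name `transform_row`, the packed RGB layout and the channel inversion used as the per-pixel operation are all placeholders, since the actual transform is not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* src is read-only: writing through it (e.g. src[0] = ...) would be a
 * compile-time error, which catches src/dst mix-ups. */
void transform_row(const unsigned char *src, unsigned char *dst, size_t pixels)
{
    assert(src != NULL && dst != NULL);

    for (size_t i = 0; i < pixels; ++i) {
        /* Locals are declared where they are used and marked const. */
        const unsigned char *const in  = src + 3 * i;
        unsigned char *const       out = dst + 3 * i;

        /* Placeholder operation: invert each channel, assigned directly
         * rather than through temporary r/g/b variables. */
        out[0] = (unsigned char)(255 - in[0]);
        out[1] = (unsigned char)(255 - in[1]);
        out[2] = (unsigned char)(255 - in[2]);
    }
}
```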

Assuming this is C, not C++, then the compute function should be defined as compute(void). Although I think it would be better if it took arguments. Then either the image1 malloc() would need to be done before compute() was called, or the image1 argument could be a pointer-to-pointer with the malloc() inside the function.

One other C-specific point: malloc() returns a void *, which doesn't need casting in C (it does in C++). The cast doesn't add anything, so it should be removed.
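A sketch of the pointer-to-pointer variant, with the allocation inside the function and no cast on malloc(). The parameter list, the `image0` name and the return convention are assumptions; the post only names compute() and image1, and the memcpy() stands in for the real per-pixel work.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical signature: allocates *image1, fills it from image0, and
 * returns 0 on success or -1 if the allocation fails. The caller owns
 * *image1 and must free() it. */
int compute(const unsigned char *image0, unsigned char **image1, size_t size)
{
    assert(image0 != NULL && image1 != NULL);

    unsigned char *dst = malloc(size);   /* void * converts implicitly in C */
    if (dst == NULL)
        return -1;

    memcpy(dst, image0, size);           /* placeholder for the real work */
    *image1 = dst;
    return 0;
}
```

Passing the input, output and size as arguments also removes the dependence on global variables, which makes the function easier to test in isolation.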