Quest for the ultimate timer framework

A lot stuff on this blog talks about code optimization, and sometime very small improvement that have performance minimal impacts but in case they are called a lot of time it became difficult to ensure optimization are useful. Let’s take an example. Imagine you have a function that lasts one millisecond. You are optimizing this function and as a result you found two solutions to optimize your code . But if you are using a timer that lasts 0.5 milliseconds, you won’t be able to choose one of this other. The aim of this article is to help you to understand ???

Throughput of the algorithm

In this case

Pro:

  • Easy to implement
  • Can cover a full range of service, or the lifetime

Con

  • Can be difficult to implement as you invert the timing (aka how many cycles I’ve done in one minute for instance)
  • Initial conditions can be impossible to reproduce
  • Program should maintain this feature

Example:

  • ¬†Our ethminer with -m option.

Time with linux command

The time command, is an Unix/Linux standard line command. It will use the internal timer to return the time elapsed by the command.

time ls -l 

real	0m0.715s
user	0m0.000s
sys	0m0.004s

You know that in this blog I like to use example. Imagine you have a 3D application that do a lot complex mathematical calculus function (square root, cosines,..). You know that theses functions are called a lot of time (billion per second). As we have already seen a small improvement in this function can have very strong impact on all the program. Now you know two way to implement this calculus in C by using the standard mathematic library or in assembler which is a bit complex to do but you might achieve better performance. The method I’m gonna to present can be also used when you have two implementations of the same feature and you don’t know which one to choose. If you have to choose how fast is the new version, and do it worth the pain to do it in assembler as the code becomes difficult to maintain.

#include <math.h>
inline double Calc_c(double x,double y,double z){
double tmpx = sqrt(x)*cos(x)/sin(x);
double tmpy = sqrt(y)*cos(y)/sin(y);
double tmpz = sqrt(z)*cos(z)/sin(z);
return (tmpx+tmpy+tmpz)*tmpx+tmpy+tmpz
}

inline double Calc_as(double x,double y,double z){

    __m512d a1  _mm512_set4_pd(x,y,z,0.0);
    
}

We know that the assembler version will be faster but to which value ?

 

Leave a Reply