On Compiler

Adding compilation flags to gcc does not always make the code faster; a flag only enables a specific set of optimization methods. It is also worth turning on platform-specific flags (for example -march=native) so that the compiler can apply optimizations specific to the target architecture. Remember that compilers are conservative: they skip an optimization whenever they cannot prove that it is valid in every case.

What are they good at

Compilers are good at mapping the program to the machine:

  • register allocation
  • instruction scheduling
  • dead code elimination
  • eliminating minor inefficiencies

Common limitations

Compilers are not good at:

  • algorithmic restructuring, for example to increase ILP or locality; they cannot deal with choices (for example, picking among alternative algorithms)
  • overcoming “optimization blockers”:
    • potential memory aliasing
    • potential procedure side effects
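A minimal sketch of why potential procedure side effects block optimization. The names counter, next_id, sum_twice and sum_doubled are hypothetical and chosen only for illustration. A compiler that cannot see inside next_id must not rewrite the two calls as one call multiplied by two, because the two versions return different values.

```c
/* Hypothetical global state that the function below modifies. */
static int counter = 0;

/* Has a side effect: every call changes counter. */
int next_id(void) {
    return counter++;
}

/* Two calls: starting from counter == 0 this returns 0 + 1 = 1. */
int sum_twice(void) {
    return next_id() + next_id();
}

/* One call, doubled: starting from counter == 0 this returns 2 * 0 = 0.
 * It is NOT equivalent to sum_twice, so the compiler may not substitute it. */
int sum_doubled(void) {
    return 2 * next_id();
}
```

If the programmer knows that a function has no side effects, doing the rewrite by hand is safe; the compiler has to assume the worst case.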

Common optimizations

Done by the compiler

  • Code motion: compilers are usually good at this. Static dependency analysis lets them move a computation out of a loop when its result does not change between iterations.
  • Strength reduction: replace an operation with an equivalent one that is faster, for example $16 \cdot x = x \ll 4$.
  • Function inlining: most of the time the compiler is able to inline.
    • Not always done: possible side effects, or other properties the compiler cannot prove, can block it, and the code stays slower as a result.
  • Memory accesses: redundant memory accesses are very hard to remove automatically, because the compiler has to assume that pointers may alias.
    • In some cases the compiler can do some aliasing checks.
    • There are some hints: restrict, pragma ivdep, or compiler flags that assume there is no aliasing anywhere.
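The aliasing problem from the list above can be sketched as follows. The function names are hypothetical. add_twice and add_double look interchangeable, but they differ when both pointers refer to the same object, which is why the compiler cannot merge the memory accesses on its own. The C99 restrict qualifier is a promise from the programmer that the pointers do not alias.

```c
/* Reads *yp twice and writes *xp twice. */
void add_twice(long *xp, long *yp) {
    *xp += *yp;
    *xp += *yp;
}

/* Reads *yp once and writes *xp once.
 * Only equivalent to add_twice when xp and yp do not alias:
 * with xp == yp and *xp == 3, add_twice gives 12 but this gives 9. */
void add_double(long *xp, long *yp) {
    *xp += 2 * *yp;
}

/* restrict tells the compiler that xp and yp never overlap, so it is
 * allowed to keep *yp in a register and combine the two updates.
 * Calling this with aliasing pointers is undefined behavior. */
void add_twice_restrict(long *restrict xp, const long *restrict yp) {
    *xp += *yp;
    *xp += *yp;
}
```

Whether a given compiler actually merges the accesses in the restrict version depends on the compiler and the optimization level; the qualifier only removes the obstacle.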

What the compiler cannot do

  • Common subexpression elimination: if spotting the shared subexpression requires following too many arithmetic dependencies, the compiler is often unable to do it.
  • Scalar replacement: usually you have to do it yourself, because the compiler cannot rule out memory aliasing.
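A minimal sketch of scalar replacement done by hand, assuming a row-major n × n matrix a and an output vector b (the function names are hypothetical). In the first version the compiler has to load and store b[i] in every inner iteration, because a and b might overlap. The second version accumulates in a local variable, which can live in a register.

```c
#include <stddef.h>

/* Accumulates directly in memory: b[i] is read and written on every
 * inner iteration, since a write to b[i] could change some a[...]. */
void sum_rows_mem(const double *a, double *b, size_t n) {
    for (size_t i = 0; i < n; i++) {
        b[i] = 0.0;
        for (size_t j = 0; j < n; j++) {
            b[i] += a[i * n + j];
        }
    }
}

/* Scalar replacement: the running sum lives in a local scalar and is
 * written to memory once per row. Same result when a and b do not overlap. */
void sum_rows_scalar(const double *a, double *b, size_t n) {
    for (size_t i = 0; i < n; i++) {
        double acc = 0.0;
        for (size_t j = 0; j < n; j++) {
            acc += a[i * n + j];
        }
        b[i] = acc;
    }
}
```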

Vector Instructions

Vector instructions apply the same operation to a set of data elements at once (usually 2, 4, 8, …). Many operations on images repeat the same computation on different data, which is one reason vector instructions are used there. Linear algebra is another, since it involves independent operations on a lot of data. Many workloads can therefore benefit from vector instructions. Historically, on x86 they were introduced with MMX for integers, then SSE with 128 bits and floating-point support, then the AVX family with 256 bits, and now AVX-512. There are a great many operations in this family; it has also made obsolete common optimization frameworks explored in Fast Linear Algebra.
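As a rough illustration, the sketch below adds two float arrays four elements at a time with SSE intrinsics from xmmintrin.h. The function name add_f32 is hypothetical. The preprocessor guard is an assumption made here so that the file still compiles on targets without SSE, where only the scalar loop runs.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_add_ps, ... */
#endif

/* out[i] = a[i] + b[i] for i in [0, n). */
void add_f32(const float *a, const float *b, float *out, size_t n) {
    size_t i = 0;
#if defined(__SSE__)
    /* Vector part: one 128-bit instruction handles 4 floats. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);  /* unaligned load of 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar tail (or the whole loop when SSE is unavailable). */
    for (; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}
```

For a loop this simple the compiler may vectorize the scalar version by itself at higher optimization levels; intrinsics become more relevant when the access pattern is too complex for automatic vectorization.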