Locality principles

Remember the two locality principles in Memoria. Temporal locality and spatial locality.

Temporal Locality

Some elements just are accessed many times in time. This is an example of a temporal locality.

Spatial locality

Some elements are accessed close to each other, this is an idea of spatial locality. In modern architectures, the a line of cache is usually 64 bytes.

For example consider this snippet:

sum = 0; 
for (i = 0; i < n; i++) sum += a[i]; 
return sum;

Sum is an example of temporal locality as the same memory location (or register) is accessed many times, and the access of the array a is an example of spatial locality. loops cycle through the same instructions, this is an example of temporal locality.

Other characteristics of cache

Cache structure

The structure of the cache can be described by mainly three parameters (usually powers of two).

  • The numbers of sets (cache lines)
  • The size of a single line.
  • An associativity index. (The least recently used block is removed between the associated caches). Then parts of the double is used to map into the cache structure (row, column). And part of it used for the tag, and you need to store the tag with the values. Also a valid bit is stored usually.

Types of cache misses

  • Cold (Compulsory) Miss: where there is no data in the cache, so it is impossible to have a hit.
  • Capacity: The cache is just not big enough to fit the data of the program.
  • Conflict: The data is badly organized, you evict some parts of the cache that you need to access again.

Usually a Hit costs about 4 cycles for L1, 12 for L2 while a miss on skylake L3 costs 200 cycles).

Writing policies

Cache Optimization-20250317112553854

Image from the slides.

Roofline model

See paper. The roofline model is a way to represent the performance of a system. It is a model that represents the performance of a system in terms of the arithmetic intensity and the peak performance of the system.

Cache Optimization-20250320172407372

Cache-Aware Roofline Model

The cache-aware roofline model is a way to represent the performance of a system in terms of the arithmetic intensity and the peak performance of the system, but also considering the cache hierarchy. Intel has chosen to implement this version of the Roofline. It is computed by a tool called Intel Advisor.

Sparse Matrix-Vector Multiplication