Why does the order of the loops affect performance when iterating over a 2D array ?


loop 2d array

Why does the order of the loops affect performance when iterating over a 2D array ?

When fetching a certain element of a matrix from memory, elements near it(15 ints) will be fetched as well and stored in a cache line(64 bytes = 16 ints).

For simplicity, assume the cache comprises a single cache line which can contain 2 matrix elements and that when a given element is fetched from memory, the next one is too. Say we want to take the sum over all elements in the example 2x2 matrix above (lets call it M):

Exploiting the ordering (e.g. changing column index first in c++):

M[0][0] (memory) + M[0][1] (cached) + M[1][0] (memory) + M[1][1] (cached)
= 1 + 2 + 3 + 4
--> 2 cache hits, 2 memory accesses

Not exploiting the ordering (e.g. changing row index first in c++):

M[0][0] (memory) + M[1][0] (memory) + M[0][1] (memory) + M[1][1] (memory)
= 1 + 3 + 2 + 4
--> 0 cache hits, 4 memory accesses

Reference

History

  • 2019/11/08: created.

Author: kezunlin
Reprint policy: All articles in this blog are used except for special statements CC BY 4.0 reprint polocy. If reproduced, please indicate source kezunlin !
评论
  TOC