
Memory Access

Row algorithm has computation/memory ratio O(n/p).
Block algorithm has computation/memory ratio O(n/sqrt(p)).
Block algorithm has higher data locality.
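One way to see where these two ratios come from is the following sketch. It rests on assumptions not stated on this slide: C = A·B with n×n matrices, the row algorithm gives each of the p processes n/p rows of C (so it needs n/p rows of A and all of B), and the block algorithm gives each process an (n/sqrt(p))×(n/sqrt(p)) block of C (so it needs n/sqrt(p) rows of A and n/sqrt(p) columns of B). In both cases a process performs n³/p multiply-add operations.

```latex
\begin{align*}
\text{Row:}\quad
  & \frac{\text{computation}}{\text{memory}}
    = \frac{n^3/p}{n^2/p + n^2}
    = \frac{n}{1+p}
    = O\!\left(\frac{n}{p}\right) \\
\text{Block:}\quad
  & \frac{\text{computation}}{\text{memory}}
    = \frac{n^3/p}{2n^2/\sqrt{p}}
    = \frac{n}{2\sqrt{p}}
    = O\!\left(\frac{n}{\sqrt{p}}\right)
\end{align*}
```

Under these assumptions the block algorithm performs more operations per matrix element it touches, which is what the higher data locality refers to.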
Cache performance of the block algorithm improves.
- Large input matrices.
- Row algorithm: subsequent accesses to B cannot be cached -> O(n³/p) memory operations.
- Block algorithm: subsequent accesses to B can be cached -> O(n³/(pc)) memory operations.
Especially important for distributed shared memory architectures.
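The caching effect can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the lecture's own code: it models a single process (p = 1), a hypothetical fully associative LRU cache holding `capacity` matrix elements, and counts only the misses on B. The names `LRUCache`, `row_misses` and `block_misses` are made up for this example, and the tile side `b` only plays a role analogous to the factor c above, whose exact meaning the slide does not define.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical fully associative LRU cache that only counts misses."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()
        self.misses = 0

    def access(self, key):
        if key in self.lines:
            # Hit: mark the element as most recently used.
            self.lines.move_to_end(key)
        else:
            # Miss: load the element, evicting the least recently used one if full.
            self.misses += 1
            if len(self.lines) >= self.capacity:
                self.lines.popitem(last=False)
            self.lines[key] = True


def row_misses(n, capacity):
    """Misses on B for the plain i-j-k loop order (all of B is swept for every row of C)."""
    cache = LRUCache(capacity)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                cache.access((k, j))  # element B[k][j]
    return cache.misses


def block_misses(n, b, capacity):
    """Misses on B when the loops are tiled into b x b blocks (assumes b divides n)."""
    cache = LRUCache(capacity)
    for ii in range(0, n, b):
        for jj in range(0, n, b):
            for kk in range(0, n, b):
                # Within one tile, the same b*b elements of B are reused for every i.
                for i in range(ii, ii + b):
                    for j in range(jj, jj + b):
                        for k in range(kk, kk + b):
                            cache.access((k, j))  # element B[k][j]
    return cache.misses


if __name__ == "__main__":
    n, b, capacity = 8, 4, 16  # B has 64 elements, the cache holds one 4x4 tile
    print("row algorithm misses on B:  ", row_misses(n, capacity))
    print("block algorithm misses on B:", block_misses(n, b, capacity))
```

In this model the row order misses on every access to B as soon as B no longer fits in the cache (n³ misses), while the tiled order misses only on the first pass over each tile (n³/b misses), provided a whole tile fits in the cache.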

Reduce average memory latency by increasing locality.

Author: Wolfgang Schreiner
Last Modification: October 27, 1997