Memory Access
Row algorithm has computation/memory ratio O(n/p).
Block algorithm has computation/memory ratio O(n/sqrt(p)).
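The two ratios can be checked with a small calculation. The sketch below is not taken from the slides; it assumes that in the row algorithm each of the p processors computes n/p rows of the result and touches its n^2/p elements of A plus all n^2 elements of B, and that in the block algorithm the processors form a sqrt(p) x sqrt(p) grid, each touching n/sqrt(p) rows of A and n/sqrt(p) columns of B. The function names `row_ratio` and `block_ratio` are hypothetical.

```python
import math


def row_ratio(n, p):
    """Computation/memory ratio of the row algorithm (assumed cost model)."""
    ops = n**3 / p              # multiply-adds per processor
    data = n**2 / p + n**2      # own rows of A plus all of B
    return ops / data           # = n / (p + 1), i.e. O(n/p)


def block_ratio(n, p):
    """Computation/memory ratio of the block algorithm (assumed cost model)."""
    q = math.isqrt(p)
    if q * q != p:
        raise ValueError("p must be a perfect square for a square processor grid")
    ops = n**3 / p              # multiply-adds per processor
    data = 2 * n**2 / q         # n/sqrt(p) rows of A and n/sqrt(p) columns of B
    return ops / data           # = n / (2 * sqrt(p)), i.e. O(n/sqrt(p))


if __name__ == "__main__":
    n, p = 1024, 16
    print("row  :", row_ratio(n, p))
    print("block:", block_ratio(n, p))
```

Under these assumptions the block ratio exceeds the row ratio by a factor of roughly sqrt(p)/2, which is the sense in which the block algorithm does more computation per element fetched.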
Block algorithm has higher data locality.
Cache performance of algorithm improves.
For large input matrices:
Row algorithm: subsequent accesses to B cannot be cached -> O(n^3/p) memory operations.
Block algorithm: subsequent accesses to B can be cached -> O(n^3/(pc)) memory operations.
Especially important for distributed shared memory architectures.
Increasing locality reduces the average memory latency.
Author: Wolfgang Schreiner
Last modification: November 15, 1996