See Quinn, Figure 7-15.
Each processor P[i] initially holds row A[i] of A and column B[i] of B.
Each processor P[i] does:

    p = (i+1) mod N
    j = i
    for k = 0 to N-1 do
        C[i][j] = A[i] * B[j]
        j = (j+1) mod N
        Receive B[j] from P[p]
On completion, P[i] holds C[i].
Point-to-point communication -> Step 2 takes O(N) time.
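
The ring algorithm above can be sketched as a sequential Python simulation. This is only an illustration, not Quinn's code: the function name `ring_matmul` and the lists `rows`, `cols`, and `held` are hypothetical, the matrices are assumed to be square N x N lists of rows, and the "Receive B[j] from P[p]" step is modeled by rotating the list of columns one position, which assumes every processor passes its current column to its predecessor at the same time.

```python
def ring_matmul(A, B):
    """Simulate the ring algorithm for N x N matrices given as lists of rows."""
    n = len(A)
    # Simulated processor i starts with row A[i] and column B[i].
    rows = [list(A[i]) for i in range(n)]
    cols = [[B[r][i] for r in range(n)] for i in range(n)]
    held = list(range(n))  # held[i] is the index j of the column P[i] holds
    C = [[0] * n for _ in range(n)]

    for _ in range(n):  # for k = 0 to N-1
        # Compute phase: C[i][j] = A[i] * B[j] (a dot product) on every P[i].
        for i in range(n):
            C[i][held[i]] = sum(a * b for a, b in zip(rows[i], cols[i]))
        # Communication phase: P[i] receives the column held by P[(i+1) mod N].
        cols = [cols[(i + 1) % n] for i in range(n)]
        held = [(j + 1) % n for j in held]
    return C
```

For example, `ring_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])` returns `[[19, 22], [43, 50]]`. In a real message-passing program each processor would also have to send its current column before overwriting it, a step the pseudocode leaves implicit.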