The program is simply a repeated matrix multiplication: two different matrices are multiplied, with the result placed in a third matrix. The series of multiplications is repeated 100 times.

This time, I only tested *MatLab* and TONS.

Language | Time (seconds)
---|---
*MatLab* | 9.2
TONS (one CPU) | 20.5
TONS (two CPUs) | 10.9

That looks more like a reasonable result from *MatLab*. The
matrix multiplication I implemented is a quick hack, and it is far
from fast. *MatLab*, obviously, has efficient multiplication
routines implemented, so even though its virtual machine, or
interpreter, is slow as molasses, *MatLab* is twice as fast as
TONS on one CPU.

We scale almost linearly in performance as the extra CPU is taken into use. This is because the two independent loops (which, of course, the original loop is transformed into when we add the second CPU) do the same amount of work: neither node server waits for the other.

If we added a third CPU, it would never be taken into use: with the current state of the TONS parallelizer, this code simply does not parallelize onto three or more CPUs. I do not see an easy way of parallelizing this program any further at the virtual machine opcode level without changing the order in which things happen, which we refrain from doing.

We could, however, sometime in the future implement parallel versions
of the instructions, so that if nodes were available, the matrix
multiplication could run in parallel on several nodes. But two things
should be noted. First, this is not ``automatic parallelization'' in the
sense that the virtual machine code is parallelized; it is simply a
matter of exchanging the actual implementations of the instructions
for parallel ones. Second, implementing parallel matrix operations
in *C++* is well beyond the scope of this work. It is an area in
which there has been a lot of research, and it should be fairly simple
to plug in well-known, efficient parallel matrix routines once we
get the actual free-node/busy-node communication done.
