A piece of *C* code that calculates an element-wise vector
product could look like:

```c
double A[1000];
double B[1000];
double C[1000];

void vmult(double* a, double* b, double* c, int length)
{
    c += length;
    while (length--)
        *a++ = *b++ * *--c;
}

int main(int argc, char** argv)
{
    /* some code here... */

    /* A(1:1000) = B(1:1000) .* C(1000:-1:1) */
    vmult(A, B, C, 1000);

    /* some more code here... */

    /* C(1:500)    = B(1:500)    .* C(1000:-1:501) */
    /* and                                         */
    /* C(501:1000) = B(501:1000) .* C(500:-1:1)    */
    vmult(C, B, C, 1000);
}
```

The above code calculates the element-wise product of two vectors and stores the result in a third vector. For two input vectors `b` and `c` of length *n*, the routine calculates `a[i] = b[i] * c[n-1-i]` for `i = 0, ..., n-1`. Note that the ordering of the second input vector is reversed.

I put the equivalent *MatLab* code in the comments above the two
calls to the `vmult()` routine.

The first call, `vmult(A, B, C, 1000)`, does exactly what we
would expect it to: it stores the reversed element-wise vector
product in the `A` array. This could fairly easily be
parallelized; we could, for example, execute 100 of the 1000
multiplications on each node in a 10-node cluster.

The second call, however, is the one with the two-line *MatLab*
equivalent, and it produces a radically different result. The first
500 multiplication results are stored in the `C` vector, which
is then used as *input* to the last 500 multiplications. This
does not parallelize as well as the previous example, since we need
the first results before we can compute the last ones.

Because we do not pass the argument values (e.g. the elements of
`A`, `B` and `C`) to the function, but rather
pointers to the memory locations where their first elements reside,
we are completely unable to decide, just by looking at the routine,
whether or not we can parallelize `vmult()`. We need complete
knowledge of the addresses passed to the routine. This is bad, since
that information often cannot be predicted at all.

One could then argue that the *C* example should have been
written differently. Well, if one needs an element-wise vector
multiplication routine that reverses the order of the right-hand-side
vector, then the above *C* code is a compact and very efficient
implementation (simple to the trained *C* programmer, however
obscure to the untrained eye).

Besides being difficult to parallelize without more knowledge
about the code than a compiler usually has, the *C* example is
also a pristine example of code that works as expected until someone
uses it in a way the original implementer did not anticipate
(for example, storing the result in one of the argument vectors, as
in the second call to the routine above, while still expecting the
result to be the plain element-wise product of the two arguments).

In the language designed here, similar code would look like:

```
entry "example"

decl vector,        ; first vm register r0 is a vector
     vector,        ; second vm register r1 is a vector
     vector         ; third vm register r2 is too

; some code here...

; multiply
move r0, r1                     ; r0 <- r1
mult r0, r2[999:-1:0]           ; r0 <- r0 * r2[999:-1:0]

; some more code here...

; multiply again
; first we compute r2[0:499] <- r1[0:499] * r2[999:-1:500]
move r2[0:499], r1[0:499]
mult r2[0:499], r2[999:-1:500]

; then we compute r2[500:999] <- r2[0:499] * r2[499:-1:0]
move r2[500:999], r2[0:499]
mult r2[500:999], r2[499:-1:0]

end
```

There is a perfectly clear relationship between each instruction,
the data it depends upon or *uses*, and the data it modifies or
*touches*.

It is also clear from the code what actually happens. It is easy to
see (at least for the virtual machine) that the results from the
second `mult` are re-used in the last `mult`.

In the virtual machine we do not have an equivalent of *C*
pointers. The user does not know where the data are located in
memory, and should not need to know either. The fact that data may be
located on different machines also makes the whole idea of pointers
somewhat ambiguous or inadequate. As a user, one only cares about the
data, not where the data physically resides.