Side effects

The main difference between this language and more conventional imperative languages is that side effects do not exist.

A piece of C code that calculates an element-wise vector product could look like:

double A[1000];
double B[1000];
double C[1000];

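/* a[i] = b[i] * c[length-1-i] for i = 0 .. length-1 */
void vmult(double* a, double* b, double* c, int length)
{
  c += length;            /* point one past the last element of c */
  while(length--)
    *a++ = *b++ * *--c;   /* walk a and b forward, c backward */
}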

int main(int argc, char** argv)
{
  /* some code here... */ 

  /* A(1:1000) = B(1:1000) .* C(1000:-1:1) */
  vmult(A, B, C, 1000);

  /* some more code here... */ 

  /* C(1:500) = B(1:500) .* C(1000:-1:501)    */
  /* and                                      */
  /* C(501:1000) = B(501:1000) .* C(500:-1:1) */
  vmult(C, B, C, 1000);

}

The above code calculates the element-wise product of two vectors and stores the result in a third vector. For two vectors $b$ and $c$ of length $n$, the routine calculates $a_i = b_i \, c_{n+1-i}$ for $i = 1, \ldots, n$. Note that the second vector is traversed in reverse order.
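The pointer arithmetic in vmult() can be hard to follow. As a minimal sketch, here is the same computation with explicit indices, written 0-based as C requires (so $c_{n+1-i}$ becomes c[n-1-i]). The name vmult_indexed is hypothetical, and the sketch assumes the three arrays do not overlap.

```c
#include <stddef.h>

/* Index-based restatement of vmult(): a[i] = b[i] * c[n-1-i].
   Assumes a, b and c are distinct, non-overlapping arrays of n doubles. */
void vmult_indexed(double *a, const double *b, const double *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] * c[n - 1 - i];   /* c is read back to front */
}
```

With b = {1, 2, 3, 4} and c = {10, 20, 30, 40}, a becomes {40, 60, 60, 40}.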

I put the equivalent MatLab code in the comments above the two calls to the vmult() routine.

The first call, vmult(A, B, C, 1000), does exactly what we would expect: it stores the reversed element-wise product in the A array. This is easy to parallelize, because every output element depends on exactly one element of B and one element of C, and neither B nor C is modified. On a 10-node cluster, for example, each node could perform 100 of the 1000 multiplications.
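To make the 10-node example concrete, the following sequential sketch divides the indices into equal chunks and deliberately processes the chunks in reverse order, standing in for nodes that may finish in any order. Because each a[i] depends only on b[i] and c[n-1-i], the order does not affect the result. The function names are hypothetical, and actual distribution across a cluster is not shown.

```c
#include <stddef.h>

/* The share of one "node": a[i] = b[i] * c[n-1-i] for i in [lo, hi). */
static void vmult_chunk(double *a, const double *b, const double *c,
                        size_t n, size_t lo, size_t hi)
{
    for (size_t i = lo; i < hi; i++)
        a[i] = b[i] * c[n - 1 - i];
}

/* Splits n elements into `nodes` equal chunks and runs them last-to-first.
   Assumes n is divisible by nodes and that a overlaps neither b nor c. */
void vmult_split(double *a, const double *b, const double *c,
                 size_t n, size_t nodes)
{
    size_t chunk = n / nodes;
    for (size_t k = nodes; k-- > 0; )
        vmult_chunk(a, b, c, n, k * chunk, (k + 1) * chunk);
}
```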

The second call, however, has a two-line MatLab equivalent and produces a radically different result. The first 500 products are stored in the first half of C, which is then used as input to the last 500 multiplications. This does not parallelize as well as the first call, since the first results must be known before the last results can be computed.
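The dependency can be demonstrated directly. The sketch below (hypothetical names; n is assumed even) performs the in-place computation C <- B .* reverse(C) one half at a time. Doing the first half first reproduces vmult(C, B, C, n); doing the halves in the opposite order, as two independent nodes might, gives a different answer.

```c
#include <stddef.h>

/* a[i] = b[i] * c[n-1-i] for i in [lo, hi); a and c may be the same array. */
static void vmult_range(double *a, const double *b, const double *c,
                        size_t n, size_t lo, size_t hi)
{
    for (size_t i = lo; i < hi; i++)
        a[i] = b[i] * c[n - 1 - i];
}

/* In-place C <- B .* reverse(C), half by half. A nonzero first_half_first
   matches the sequential order used by vmult(C, B, C, n). */
void vmult_inplace_halves(double *c, const double *b, size_t n,
                          int first_half_first)
{
    size_t h = n / 2;
    if (first_half_first) {
        vmult_range(c, b, c, n, 0, h);   /* reads the untouched upper half */
        vmult_range(c, b, c, n, h, n);   /* reads the freshly written lower half */
    } else {
        vmult_range(c, b, c, n, h, n);   /* reads the original lower half */
        vmult_range(c, b, c, n, 0, h);   /* reads the freshly written upper half */
    }
}
```

With B = {1, 2, 3, 4} and C = {10, 20, 30, 40}, the first order yields {40, 60, 180, 160} while the reversed order yields {40, 120, 60, 40}.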

Because we do not pass the argument values (e.g. the elements of A, B and C) to the function, but rather pointers to the memory locations where the first elements reside, we are completely unable to decide whether the vmult() routine can be parallelized just by looking at the routine. We need complete knowledge of the addresses passed to the routine, and of whether the arrays they point into overlap. This is bad, since this information often cannot be predicted at all.
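At run time, the only way to decide is to inspect the addresses themselves. A conservative check might look like the following minimal sketch. The function names are hypothetical, and comparing addresses through uintptr_t assumes a flat address space, which ISO C does not guarantee (relational comparison of pointers into unrelated arrays is undefined).

```c
#include <stddef.h>
#include <stdint.h>

/* Nonzero if [p, p+n) and [q, q+m) share any element.
   Sketch only: assumes addresses compare meaningfully as integers. */
int ranges_overlap(const double *p, size_t n, const double *q, size_t m)
{
    uintptr_t p0 = (uintptr_t)p, p1 = (uintptr_t)(p + n);
    uintptr_t q0 = (uintptr_t)q, q1 = (uintptr_t)(q + m);
    return p0 < q1 && q0 < p1;
}

/* Conservative test: vmult() is treated as safe to split across nodes
   only if the output a overlaps neither input. */
int vmult_is_parallel_safe(const double *a, const double *b,
                           const double *c, size_t length)
{
    return !ranges_overlap(a, length, b, length) &&
           !ranges_overlap(a, length, c, length);
}
```

For the two calls above, vmult_is_parallel_safe(A, B, C, 1000) is true and vmult_is_parallel_safe(C, B, C, 1000) is false; a compiler looking only at vmult() itself has no such addresses to inspect.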

One could then argue that the C example should have been written differently. Well, if one needs an element-wise vector multiplication routine that reverses the order of the right-hand-side vector, then the above C code is a compact and very efficient implementation (simple to the trained C programmer, though obscure to the untrained eye).

Besides being difficult to parallelize without more knowledge of the code than a compiler usually has, the C example is also a pristine example of code that works as expected until someone uses it in a way the original implementer did not anticipate. For example, someone may store the result in one of the argument vectors, as in the second call to the routine above, but expect the result to be the product of the two arguments.
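In C, one way to remove this surprise is to pay for a copy. The sketch below (hypothetical name vmult_copying) snapshots both inputs before writing, so the result is always the product of the arguments as they were at call time, even when the output overlaps an input. It trades extra memory and copying time for predictable results.

```c
#include <stdlib.h>
#include <string.h>

/* Like vmult(), but immune to overlap between a and b or c:
   a[i] = b[i] * c[n-1-i] using copies of b and c taken before any write.
   Returns 0 on success, -1 if the temporary buffer cannot be allocated. */
int vmult_copying(double *a, const double *b, const double *c, size_t n)
{
    if (n == 0)
        return 0;
    double *tmp = malloc(2 * n * sizeof *tmp);
    if (tmp == NULL)
        return -1;
    double *bc = tmp, *cc = tmp + n;     /* private copies of b and c */
    memcpy(bc, b, n * sizeof *bc);
    memcpy(cc, c, n * sizeof *cc);
    for (size_t i = 0; i < n; i++)
        a[i] = bc[i] * cc[n - 1 - i];
    free(tmp);
    return 0;
}
```

Called as vmult_copying(C, B, C, n), this gives the plain reversed product rather than the two-stage result of vmult(C, B, C, n).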

In the language designed here, similar code would look like:

entry "example"
  decl  vector,   ; first vm register r0 is a vector
        vector,   ; second vm register r1 is a vector
        vector    ; third vm register r2 is a vector too

  ; some code here...

  ; multiply
  move r0, r1             ; r0 <- r1
  mult r0, r2[999:-1:0]   ; r0 <- r0 * r2[999:-1:0]
  
  ; some more code here...

  ; multiply again

  ; first we compute  r2[0:499] <- r1[0:499] * r2[999:-1:500]
  move r2[0:499], r1[0:499]  
  mult r2[0:499], r2[999:-1:500] 
  ; then we compute  r2[500:999] <- r1[500:999] * r2[499:-1:0]
  move r2[500:999], r1[500:999]
  mult r2[500:999], r2[499:-1:0]

end
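The slice operations can be modelled in C to check that the "multiply again" sequence matches the second vmult() call. This is a hypothetical sketch, not the VM's implementation: it reads r[first:step:last] MatLab-style with inclusive bounds and a step of +1 or -1, and it assumes the source and destination of a single instruction do not overlap (true for every instruction here). Following the MatLab comment C(501:1000) = B(501:1000) .* C(500:-1:1), the last move copies the second half of r1 into r2.

```c
/* Hypothetical model of a VM slice r[first:step:last]. */
typedef struct { double *reg; long first, step, last; } slice;

static long slice_len(slice s) { return (s.last - s.first) / s.step + 1; }
static double *slice_at(slice s, long k) { return &s.reg[s.first + k * s.step]; }

/* move dst, src : dst <- src */
void vm_move(slice dst, slice src)
{
    for (long k = 0; k < slice_len(dst); k++)
        *slice_at(dst, k) = *slice_at(src, k);
}

/* mult dst, src : dst <- dst .* src */
void vm_mult(slice dst, slice src)
{
    for (long k = 0; k < slice_len(dst); k++)
        *slice_at(dst, k) *= *slice_at(src, k);
}

/* The "multiply again" sequence for vectors of length n (n assumed even). */
void vm_example_second(double *r1, double *r2, long n)
{
    long h = n / 2;
    vm_move((slice){r2, 0, 1, h - 1}, (slice){r1, 0, 1, h - 1});
    vm_mult((slice){r2, 0, 1, h - 1}, (slice){r2, n - 1, -1, h});
    vm_move((slice){r2, h, 1, n - 1}, (slice){r1, h, 1, n - 1});
    vm_mult((slice){r2, h, 1, n - 1}, (slice){r2, h - 1, -1, 0});
}
```

For n = 4, r1 = {1, 2, 3, 4} and r2 = {10, 20, 30, 40}, this leaves r2 = {40, 60, 180, 160}, the same as vmult(C, B, C, 4).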

There is a perfectly clear relationship between each instruction, the data it reads, and the data it modifies.

It is also clear from the code what actually happens. It is easy to see (at least for the virtual machine) that the result of the second mult is re-used in the last mult.
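Because every operand names its register and index range, such dependencies can be found mechanically. The sketch below is hypothetical (the span and instr types and depends_on() are not part of the VM described here); it treats each operand as a closed index interval, ignoring direction, and reports a dependency whenever a later instruction reads or overwrites data an earlier one writes, or overwrites data an earlier one reads.

```c
#include <stdbool.h>

/* Register number and closed index interval [lo, hi] touched by an operand. */
typedef struct { int reg; long lo, hi; } span;

/* A two-operand instruction such as "mult dst, src" (dst is also read). */
typedef struct { span dst; span src; } instr;

static bool overlap(span a, span b)
{
    return a.reg == b.reg && a.lo <= b.hi && b.lo <= a.hi;
}

/* True if `later` must wait for `earlier`. */
bool depends_on(instr later, instr earlier)
{
    return overlap(earlier.dst, later.src)   /* read after write  */
        || overlap(earlier.dst, later.dst)   /* write after write */
        || overlap(earlier.src, later.dst);  /* write after read  */
}
```

For the example program, the last mult (dst r2[500:999], src r2[0:499]) depends on the second mult (dst r2[0:499]), whereas the first move (r0 <- r1) and the move into r2[0:499] are independent.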

In the virtual machine we do not have the equivalent of C pointers. The user does not know where the data are located in memory, and should not need to know either. The fact that data may be located on different machines also makes the whole idea of pointers somewhat ambiguous or inadequate. As a user, one only cares about the data, not where the data physically reside.

1999-08-09