The Stalin Compiler

Stalin is a questionably named Scheme compiler written by Jeffrey Mark Siskind that can supposedly create binaries as fast as or faster than Fortran or C for numerical problems. To test this, I wrote a simple program to numerically integrate \sqrt{x} from 0 to 10000. To make things interesting, I used a manual Newton’s method implementation of sqrt from SICP. The integration is done by a simple tail-recursive method.
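
For reference, the exact value of the integral is \int_0^{10000} \sqrt{x}\,dx = \frac{2}{3}\,10000^{3/2} = \frac{2}{3}\times 10^6 \approx 666666.67, which gives something to check the numerical results against.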

The Scheme code is very pretty:

(define (sqrt-iter guess x)
  (if (good-enough? guess x)
      guess
      (sqrt-iter (improve guess x)
                 x)))

(define (improve guess x)
  (average guess (/ x guess)))

(define (average x y)
  (/ (+ x y) 2))

(define (good-enough? guess x)
  (< (abs (- (* guess guess) x)) 0.001))

(define (mysqrt x)
  (sqrt-iter 1.0 x))

(define (int x acc step)
  (if (>= x 10000.0)
      acc
      (int (+ x step)
           (+ acc (* step (mysqrt x)))
           step)))

(write (int 0.0 0.0 .001))

I then converted this to C. It is pretty much a transliteration, except it uses a for loop instead of recursion to accumulate the values:

#include "stdio.h"

double improve(double guess, double x);
double average(double x, double y);

double sqrt_iter(double guess, double x){
  if( good_enough(guess, x))
    return guess;
  else
    return sqrt_iter( improve(guess,x), x);
}

double improve(double guess, double x){
  return average(guess, x/guess);
}

double average(double x, double y){
  return (x+y)/2;
}

int good_enough(double guess, double x){
  if (fabs(guess*guess-x)<.001)
    return 1;
  return 0;
}

double mysqrt(double x){
  return sqrt_iter(1.0, x);
}

int main(){
  double rez = 0;
  double x;
  double step = .001;
  for(x=0; x<= 10000; x+=step)
    rez += mysqrt(x)*step;
  printf("%f\n", rez);
}

I compiled the two versions via:

stalin -On -d -copt -O3 int.sc
gcc -O3 int.c

The results are:
Stalin: 1.90s
gcc: 3.61s

If you declare every function inline in C, you get:

gcc-inline: 3.28s

For reference, I also compiled the code with Chicken Scheme, using the -optimize-level 3 compiler flag.

Chicken Scheme: 27.9s.

Some issues:

  • The Scheme code is more appealing on the whole, though it suffers from a lack of infix notation: (< (abs (- (* guess guess) x)) 0.001) is definitely harder to read than abs(guess*guess-x)<.001. I wonder whether more familiarity with prefix notation would reduce this.
  • All three methods give slightly different results, which is worrisome. In particular, Stalin is correct to 6 digits, whilst gcc and Chicken are correct to 7.
  • Stalin apparently does not include macros. One would think it would be easy to use the macro system from a different Scheme implementation to expand the macros, and then send the resulting source to Stalin for final compilation.
  • Stalin is extremely slow to compile. In principle this isn’t a big deal: you can debug using a different Scheme compiler. Still, Stalin seems to be somewhat less robust to edge cases than Chicken Scheme, at least.
  • It is amazing that Scheme code with no type declarations can beat C by almost a factor of 2.
  • Though in principle Stalin produces intermediate C code, it is utterly alien and low-level. I have not been able to determine exactly what options Stalin is using when it calls gcc on the source code. That could account for some of the difference.
  • A detailed writeup on Stalin is here.

Mandelbrot in Scala

With my growing frustration with Matlab, I’ve been looking for a while for a language with

  1. Garbage collection
  2. Good notation for numerical computation
  3. Enough speed for numerical computation

After a long search, I think I’ve finally found my home in Scala. Today I did a very trivial, self-contained computation of some images of the Mandelbrot set to test it out.
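
For concreteness, each pixel is mapped to a point c in the complex plane and the code runs the standard Mandelbrot iteration z_{n+1} = z_n^2 + c starting from z_0 = 0; a pixel is treated as inside the set if |z| is still below 2 after the allotted number of iterations.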

When learning Scala, I couldn’t find enough examples of numerical stuff, so I thought I would post this here for those potentially interested. Here’s a first attempt:

import java.io._
import java.lang.Math

object mandelbrot {
  // tiny complex number class including syntactic sugar for basic operations
  class Complex(val a: Double, val b: Double){
    // represents the complex number a + b*i
    def +(that: Complex) = new Complex(this.a+that.a,this.b+that.b)
    def *(that: Complex) = new Complex(this.a*that.a-this.b*that.b,this.a*that.b+that.a*this.b)
    def abs() = Math.sqrt(this.a*this.a + this.b*this.b)
  }

  def run(n: Int, level: Int) : Unit = {
    val out = new FileOutputStream("scalaimage.pgm")
    out.write(("P5\n"+n+" "+n+"\n255\n").getBytes())

    for {y0 <- 0 until n
	 x0 <- 0 until n }{
	   // y0 and x0 are in pixel integers
	   // x  and y  are real number coordinates
	   val x = -2.0 + x0*3.0/n
	   val y = -1.5 + y0*3.0/n

	   var z = new Complex(0,0)
	   var c = new Complex(x,y)
	   for(i <- 0 until level)
	     z = z*z + c 

	   if (z.abs < 2)
	     out.write(0);
	   else
	     out.write(255);
	 }
    out.close()
  }

  def main(args: Array[String]) {
    run(Integer.parseInt(args(1)), Integer.parseInt(args(0)))
  }
}

Not bad, right? Notice how painlessly we create the Complex class, along with the natural syntax for manipulating the numbers. What’s more, the result is very fast. For a 2048 by 2048 image, I get results in less than 30 seconds. (Just consider running this algorithm in Matlab. Ha!)
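
To give a feel for that syntax outside the main loop, here is a tiny usage sketch (the complexDemo object and its values are made up for illustration; it just assumes the mandelbrot object above has been compiled, since Complex is nested inside it):

object complexDemo {
  def main(args: Array[String]) {
    // Complex is a public class nested in the mandelbrot object above.
    val z = new mandelbrot.Complex(1.0, 2.0)
    val c = new mandelbrot.Complex(0.5, -0.25)
    // * binds more tightly than +, so this reads as (z*z) + c, like ordinary arithmetic.
    val w = z * z + c
    println(w.a + " + " + w.b + "i, |w| = " + w.abs)
  }
}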

Save this to a file “mandelbrot1.scala”, and compile and run with:

scalac mandelbrot1.scala
scala mandelbrot 100 2048

The first argument is how many times to run the Mandelbrot iteration (the iteration limit), and the second is the size of the square image to output. The result is:

scalaimage11

(I manually converted the .pgm to .png to save space– click to get the full resolution version.)

A slightly more complex version follows. I added the operators *= and +=, which speeds things up by about 10%. I also now color the pixels by how many iterations it takes the Mandelbrot iteration to diverge.

import java.io._
import java.lang.Math

object mandelbrot {
  // tiny complex number class including syntactic sugar for basic operations
  class Complex(var a: Double, var b: Double){
    // represents the complex number a + b*i
    def +(that: Complex) = new Complex(this.a+that.a,this.b+that.b)
    def *(that: Complex) = new Complex(this.a*that.a-this.b*that.b,this.a*that.b+that.a*this.b)
    def abs() = Math.sqrt(this.a*this.a + this.b*this.b)
    def *=(that: Complex) ={
      val newa = this.a*that.a-this.b*that.b
      this.b = this.a*that.b+that.a*this.b
      this.a = newa
      this
    }
    def +=(that: Complex)={
      this.a += that.a
      this.b += that.b
      this
    }
  }

  def run(n: Int, level: Int) : Unit = {
    val out = new FileOutputStream("scalaimage.pgm")
    out.write(("P5\n"+n+" "+n+"\n255\n").getBytes())

    for {y0 <- 0 until n
	 x0 <- 0 until n }{
	   // y0 and x0 are in pixel integers
	   // x  and y  are real number coordinates
	   val x = -2.0 + x0*3.0/n
	   val y = -1.5 + y0*3.0/n

	   var z = new Complex(0,0)
	   var c = new Complex(x,y)
	   var i = 0
	   do {
	     z *= z
	     z += c
	     i += 1
	   } while( z.abs < 2 && i < level)

	   if (z.abs < 2)
	     out.write(0);
	   else
	     out.write( (i*255.0/level).toInt );
	 }
    out.close()
  }

  def main(args: Array[String]){
    run(Integer.parseInt(args(1)), Integer.parseInt(args(0)))
  }
}

Compile and run as before. The output is:

scalaimage2

I’m not sure how many people are using Scala for numerical applications, but it looks very good so far. There are a few minor tricks that turn out to be incredibly convenient. For example, if you define a method func on some class and a is an instance of that class, in a normal language you would call it by

a.func(b).

Scala supports this, but also automatically gives you the notation

a func b.
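
With the Complex class from above, for example, the two lines below are literally the same call written two ways (a small sketch; the object and value names are made up):

object infixDemo {
  def main(args: Array[String]) {
    val z = new mandelbrot.Complex(1.0, 1.0)
    val c = new mandelbrot.Complex(0.25, 0.5)

    val dotted = z.+(c)   // explicit method-call syntax: + is just a method name
    val infix  = z + c    // the same call written in infix notation
    println(dotted.a + " " + infix.a)   // both values are 1.25
  }
}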

Adding in Scala’s non-mandatory type declarations (as above, notice that x0 and x aren’t declared to be Int or Double) and its tricks for implicit conversion between types, it looks like it is possible to recreate almost all of Matlab’s syntactic sugar for matrix manipulation, while actually allowing reasonable speed for non-vectorizable code and reasonable facilities for abstraction. I hope to soon be an ex-Matlab user.
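
For example, here is roughly what such an implicit conversion can look like (a sketch with made-up object names, not something taken from the code above): wrapping a Double in a Complex on demand lets mixed expressions like 2.0 + z compile.

object ComplexImplicits {
  // Lets a plain Double stand in for a Complex with zero imaginary part
  // whenever a Complex (from the mandelbrot object above) is expected.
  implicit def doubleToComplex(d: Double): mandelbrot.Complex =
    new mandelbrot.Complex(d, 0.0)
}

object implicitDemo {
  import ComplexImplicits._
  def main(args: Array[String]) {
    val z = new mandelbrot.Complex(0.0, 1.0)
    val w = 2.0 + z   // the Double on the left is converted to a Complex
    println(w.a + " + " + w.b + "i")   // prints 2.0 + 1.0i
  }
}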

The scalala library looks very promising, though I haven’t taken the time to install it yet, and it looks like absolutely no one is using it. (dramage, give us some documentation!)