Eccentric Developments


Very Slow Data-Oriented Design

This article ended up having a much different outcome than what was initially expected. The idea was to improve the code for the path tracer I have been writing in previous articles, using a Data-oriented design approach to improve the overall rendering speed.

By changing the scene data structure and the sphere tracing function, the expected outcome was that the Javascript engine would be able to better cache the scene and vectorize the code using SIMD operations; by leveraging the change to a data-oriented approach.

As you have guessed up to this point, this didn't work as expected, Javascript didn't vectorize the code and to make things worse, the performance tanked.

The changes made

I made some changes to the scene data structures and the code working on the objects in it (only spheres for now), to be arrays and array operations.

More specifically, I changed the spheres data structure from an array of objects to several arrays of values, as shown bellow.

This structure used initially:

[
    {
        center: [1, 2, 3],
        radius: 4,
    },
    {
        ... //other spheres
    },
];

Was updated to look like this:

{
    centerX: [1, ...],
    centerY: [2, ...],
    centerZ: [3, ...],
    radius:  [4, ...],
}

With this new data structure, instead of checking for instersections one sphere at a time, the algorithm is updated to check for intersections against all the spheres "at the same time".

I quoted "at the same time" because, the algorithm still works with one single data value at a time, but hints at the interpreter that it could instead use the much optimized SIMD instructions. Cue the add function bellow:

const add = (A, B, OUT) => {
    const len = OUT.length;
    for (let i = 0; i < len; i++) {
      OUT[i] = A[i] + B[i];
    }
};

What I expected to happen

This functions is very simple, take two arrays and sum them element-wise. Which can be implemented with ADDPS, available in SSE with variants avalailable in AVX and Neon VADD.

The expectation is that Javascript would detect such optimizations and implement them in the code, as I mentioned, modern compilers already do this.

This does not happen in Javascript, and the result is much worse.

Javascript shenanigans

Javascript does not do the optimizations I expected, and do things its own way, and that means:

Javascript Arrays are not real arrays

They are Array-like objects (MDN) that give special treatment to numeric properties (spec). As such, accessing an element goes through a lengthy set of instructions.

Instead of doing the expected set of operations other programming languages do when accessing an array using an index, i.e. checking the array bounds, calculating the memory location and returning the value, Javascript does so much more than that (spec); also, there is no guarantee the elements are even in contiguous memory locations.

At this point I though: let's use TypedArrays, I can use the Float32Array which is backed by an ArrayBuffer and should behave similar to regular arrays, in other words, items would be stored in contiguous memory locations (cache friendly), and accessing elements should not go through all the checkings that the other Arrays do. But then...

Javascript normalizes values read/written from/to TypedArrays

Surprise! Performance is even worse this time. Not only the Javascript engine does not vectorize the code, there are some hidden costs involved, the most egregious one being value encoding and normalization.

This means that, whenever a value is read from a TypedArray, it will be converted when used in an operation with a regular Javascript Number, also, when a value is going to be written to a TypedArray, it will be converted to the expected type. All this will impacting performance, but in a much harder way than anticipated; again, other programming languages do this all the time (see type coercion), but Javascript goes a bit beyond by making Numbers follow the prototype inheritance pattern: (123).toExponential() and (123).__proto__ in your console.

I expected this normalization to be avoided by using a Float64Array, as Javascript uses 64bit numbers, but it didn't help at all, you can try it in the code example.

Javascript engines optimization seems not there for TypedArrays

This is more like a guess at this point, but the developers discussing this same issue here, talk about what could be causing this performance discrepancy and think the problem is caused by lack of optimization. I agree on this point of view, the ArrayBuffer concept is very simple and given that trying to avoid allocations (see my memory implementation in the path tracing code) didn´t improve performance, seems to indicate that they are treated like second class citizens by the Javascript engine.

Full code

This is the code with all the changes I've mentioned so far. Some things to try:

  • Update the pipeline to use createTraceFunction instead of createTraceSIMDFunction.
  • Update the createMemoryFunction to return new Array(size) instead of allocating a Float32Array from the shared buffer space.
  • Not cry.
function createTriangleIntersectFunction(args) {
  const {
    vector: { add, scale, sub, permute, maxDimension, abs },
  } = args;
  const triangleIntersect = (ray, triangle) => {
    let pt0_t = sub(triangle.pt0, ray.origin);
    let pt1_t = sub(triangle.pt1, ray.origin);
    let pt2_t = sub(triangle.pt2, ray.origin);
    const k = maxDimension(abs(ray.origin));
    const i = (k + 1) % 3;
    const j = (k + 2) % 3;

    const pd = permute(ray.direction, i, j, k);
    const sz = 1.0 / pd[2];
    const sx = -pd[0] * sz;
    const sy = -pd[1] * sz;

    pt0_t = permute(pt0_t, i, j, k);
    pt1_t = permute(pt1_t, i, j, k);
    pt2_t = permute(pt2_t, i, j, k);

    const pt0_t_0 = pt0_t[0] + sx * pt0_t[2];
    const pt0_t_1 = pt0_t[1] + sy * pt0_t[2];
    const pt0_t_2 = pt0_t[2] * sz;
    const pt1_t_0 = pt1_t[0] + sx * pt1_t[2];
    const pt1_t_1 = pt1_t[1] + sy * pt1_t[2];
    const pt1_t_2 = pt1_t[2] * sz;
    const pt2_t_0 = pt2_t[0] + sx * pt2_t[2];
    const pt2_t_1 = pt2_t[1] + sy * pt2_t[2];
    const pt2_t_2 = pt2_t[2] * sz;

    const e0 = pt1_t_0 * pt2_t_1 - pt1_t_1 * pt2_t_0;
    const e1 = pt2_t_0 * pt0_t_1 - pt2_t_1 * pt0_t_0;
    const e2 = pt0_t_0 * pt1_t_1 - pt0_t_1 * pt1_t_0;

    if (
      (e0 < 0.0 || e1 < 0.0 || e2 < 0.0) &&
      (e0 > 0.0 || e1 > 0.0 || e2 > 0.0)
    ) {
      return { hit: false };
    }

    const det = e0 + e1 + e2;
    if (det === 0.0) {
      return { hit: false };
    }

    const t_scaled = e0 * pt0_t_2 + e1 * pt1_t_2 + e2 * pt2_t_2;


    const t = t_scaled / det;

    if (t > 0.007) {
      const point = add(scale(ray.direction, t), ray.origin);
      return {
        hit: true,
        distance: t,
        point,
        normal: triangle.normal,
      };
    }

    return {
      hit: false,
    };
  };

  return {
    triangleIntersect,
  };
}

function createScene(args) {
  const {
    vector: { sub, unit, cross },
  } = args;
  const scene = [
    {
      center: [0, -10002, 0],
      radius: 9999.0,
      color: [1.0, 1.0, 1.0],
      isLight: false,
      isSphere: true,
    },
    {
      center: [-10012, 0, 0],
      radius: 9999.0,
      color: [1, 0, 0],
      isLight: false,
      isSphere: true,
    },
    {
      center: [10012, 0, 0],
      radius: 9999.0,
      color: [0, 1, 0],
      isLight: false,
      isSphere: true,
    },
    {
      center: [0, 0, 10012],
      radius: 9999.0,
      color: [1, 1, 1],
      isLight: false,
      isSphere: true,
    },
    {
      center: [0, 10012, 0],
      radius: 9999.0,
      color: [1, 1, 1],
      isLight: true,
      isSphere: true,
    },
    {
      center: [-5, 0, 2],
      radius: 2.0,
      color: [1, 1, 0],
      isLight: false,
      isSphere: true,
    },
    {
      center: [0, 5, -1],
      radius: 4.0,
      color: [1, 0, 0],
      isLight: false,
      isSphere: true,
    },
    {
      center: [8, 5, -1],
      radius: 2,
      color: [0, 0, 1],
      isLight: false,
      isSphere: true,
    },
  ].map((obj, id) => {
    if (obj.isTriangle) {
      const edge0 = sub(obj.pt1, obj.pt0);
      const edge1 = sub(obj.pt2, obj.pt0);
      obj.normal = unit(cross(edge0, edge1));
    }
    obj.id = id;
    return obj;
  });
  const spheresVector = scene.filter(obj => obj.isSphere).reduce((acc, sp) => {
    acc.centerX.push(sp.center[0]);
    acc.centerY.push(sp.center[1]);
    acc.centerZ.push(sp.center[2]);
    acc.radius.push(sp.radius);
    acc.id.push(acc.id);
    return acc;
  }, {
    centerX: [],
    centerY: [],
    centerZ: [],
    radius: [],
    id: [],
  });
  return {
    scene,
    spheresVector
  };
}

function createRandomDirectionFunction(args) {
  const {
    vector: { unit, dot },
  } = args;
  const randomDirection = (normal) => {
    while (true) {
      let p = unit([2 * Math.random() - 1, 2 * Math.random() - 1, 2 * Math.random() - 1]);
      if (dot(p, normal) >= 0) {
        return p;
      }
    }
  };
  return {
    randomDirection,
  };
}

function createMemoryFunctions() {
  const buffer = new ArrayBuffer(1024);
  let offset = 0;
  const allocFloat32Array = (size) => {
    const res = new Float32Array(buffer, offset, size);
    offset += size * 4;
    return res;
  };
  const free = () => offset = 0;
  return {
    memory: {
      allocFloat32Array,
      free,
    },
  }
}

function createSphereIntersectSIMDFunction(args) {
  const {
    simd: { add, fill, sub, mulAdd, mul, sqrt, min, div },
    vector,
    memory: {
      allocFloat32Array,
      free,
    }
  } = args;
  const sphereIntersectSIMD = (ray, spheresVector) => {
    const len = spheresVector.id.length;
    const OCX = allocFloat32Array(len); fill(ray.origin[0], OCX);
    const OCY = allocFloat32Array(len); fill(ray.origin[1], OCY);
    const OCZ = allocFloat32Array(len); fill(ray.origin[2], OCZ);
    const rayDirectionX = allocFloat32Array(len); fill(ray.direction[0], rayDirectionX);
    const rayDirectionY = allocFloat32Array(len); fill(ray.direction[1], rayDirectionY);
    const rayDirectionZ = allocFloat32Array(len); fill(ray.direction[2], rayDirectionZ);
    sub(OCX, spheresVector.centerX, OCX);
    sub(OCY, spheresVector.centerY, OCY);
    sub(OCZ, spheresVector.centerZ, OCZ);
    const A = allocFloat32Array(len); fill(vector.dot(ray.direction, ray.direction), A);
    const B = allocFloat32Array(len); fill(0, B);
    mulAdd(OCX, rayDirectionX, B);
    mulAdd(OCY, rayDirectionY, B);
    mulAdd(OCZ, rayDirectionZ, B);
    const MO = allocFloat32Array(len); fill(-1, MO);
    mul(B, MO, B);
    const C = allocFloat32Array(len); fill(0, C);
    mulAdd(OCX, OCX, C);
    mulAdd(OCY, OCY, C);
    mulAdd(OCZ, OCZ, C);

    const MR = allocFloat32Array(len); mul(spheresVector.radius, spheresVector.radius, MR)
    sub(C, MR, C);
    const BB = allocFloat32Array(len); mul(B, B, BB);
    const AC = allocFloat32Array(len); mul(A, C, AC);
    const DIS = allocFloat32Array(len); sub(BB, AC, DIS);
    const MASK = allocFloat32Array(len);
    for (let i = 0; i < len; i++) {
      MASK[i] = DIS[i] < 0 ? 0 : 1;
    }

    const E = allocFloat32Array(len); sqrt(DIS, E); //There will be a bunch of NaN here
    mul(E, MASK, E);
    const T1 = allocFloat32Array(len); sub(B, E, T1);
    const T2 = allocFloat32Array(len); add(B, E, T2);
    div(T1, A, T1);
    div(T2, A, T2);

    min(T1, T2, T1)
    mul(T1, MASK, T1);
    free();
    return T1;
  };
  return {
    sphereIntersectSIMD,
  };
}

function createSphereIntersectFunction(args) {
  const {
    vector: { sub, dot, scale, unit, add },
  } = args;
  const sphereIntersect = (ray, sphere) => {
    const oc = sub(ray.origin, sphere.center);
    const a = dot(ray.direction, ray.direction);
    const b = dot(oc, ray.direction);
    const c = dot(oc, oc) - sphere.radius * sphere.radius;
    const dis = b * b - a * c;

    if (dis > 0) {
      const e = Math.sqrt(dis);
      let t = (-b - e) / a;
      if (t > 0.007) {
        const point = add(scale(ray.direction, t), ray.origin);
        return {
          hit: true,
          distance: t,
          point,
          // This is the new code to calculate the normal
          normal: unit(sub(point, sphere.center)),
        };
      }

      t = (-b + e) / a;
      if (t > 0.007) {
        const point = add(scale(ray.direction, t), ray.origin);
        return {
          hit: true,
          distance: t,
          point,
          // This is the new code to calculate the normal
          normal: unit(sub(point, sphere.center)),
        };
      }
    }
    return {
      hit: false,
    };
  };
  return {
    sphereIntersect,
  };
}

function createAspectRatioFunction() {
  return {
    aspectRatio: (width, height) => {
      let gcd = width;
      let reminder = height;
      while (reminder != 0) {
        const temp = reminder;
        reminder = gcd % reminder;
        gcd = temp;
      }
      return [width / gcd, height / gcd];
    },
  };
}

function createCamera(args) {
  const { width, height, aspectRatio } = args;
  const [w, h] = aspectRatio(width, height);
  return {
    camera: {
      leftTop: [-w, h + 1, -50.0],
      rightTop: [w, h + 1, -50.0],
      leftBottom: [-w, -h + 1, -50.0],
      eye: [0.0, 0.0, -65.0],
    },
  };
}

function createImageGeometry({ width, height }) {
  return {
    imageGeometry: {
      width,
      height,
    },
  };
}

function createSIMDFunctions() {
  const add = (A, B, OUT) => {
    const len = OUT.length;
    for (let i = 0; i < len; i++) {
      OUT[i] = A[i] + B[i];
    }
  };
  const fill = (b, OUT) => {
    const len = OUT.length;
    for (let i = 0; i < len; i++) {
      OUT[i] = b;
    }
  };
  const sub = (A, B, OUT) => {
    const len = OUT.length;
    for (let i = 0; i < len; i++) {
      OUT[i] = A[i] - B[i];
    }
  };
  const mulAdd = (A, B, OUT) => {
    const len = OUT.length;
    for (let i = 0; i < len; i++) {
      OUT[i] += A[i] * B[i];
    }
  };
  const sqrt = (A, OUT) => {
    const len = OUT.length;
    for (let i = 0; i < len; i++) {
      OUT[i] = Math.sqrt(A[i]);
    }
  };
  const scale = (A, b, OUT) => {
    const len = OUT.length;
    for (let i = 0; i < len; i++) {
      OUT[i] = A[i] * b;
    }
  };
  const div = (A, B, OUT) => {
    const len = OUT.length;
    for (let i = 0; i < len; i++) {
      OUT[i] = A[i] / B[i];
    }
  };
  const mul = (A, B, OUT) => {
    const len = OUT.length;
    for (let i = 0; i < len; i++) {
      OUT[i] = A[i] * B[i];
    }
  }
  const min = (A, B, OUT) => {
    const len = OUT.length;
    for (let i = 0; i < len; i++) {
      OUT[i] = A[i] < B[i] ? A[i] : B[i];
    }
  }

  return {
    simd: {
      add,
      fill,
      sub,
      mulAdd,
      sqrt,
      div,
      mul,
      min,
      scale,
    },
  };
}

function createVectorFunctions() {
  const sub = (A, B) => [A[0] - B[0], A[1] - B[1], A[2] - B[2]];
  const add = (A, B) => [A[0] + B[0], A[1] + B[1], A[2] + B[2]];
  const mul = (A, B) => [A[0] * B[0], A[1] * B[1], A[2] * B[2]];
  const dot = (A, B) => A[0] * B[0] + A[1] * B[1] + A[2] * B[2];
  const scale = (A, s) => [A[0] * s, A[1] * s, A[2] * s];
  const norm = (A) => Math.sqrt(dot(A, A));
  const unit = (A) => scale(A, 1.0 / norm(A));
  const abs = (A) => [Math.abs(A[0]), Math.abs(A[1]), Math.abs(A[2])];
  const maxDimension = (A) => {
    if (A[0] > A[1] && A[0] > A[2]) return 0;
    if (A[1] > A[0] && A[1] > A[3]) return 1;
    return 2;
  };
  const permute = (A, i, j, k) => [A[i], A[j], A[k]];
  const cross = (A, B) => {
    const j = A[1] * B[2] - B[1] * A[2];
    const k = A[2] * B[0] - A[0] * B[2];
    const l = A[0] * B[1] - A[1] * B[0];
    return [j, k, l];
  };
  const vector = {
    sub,
    add,
    mul,
    dot,
    scale,
    norm,
    unit,
    abs,
    maxDimension,
    permute,
    cross,
  };

  return {
    vector,
  };
}

function calculatePrimaryRays(args) {
  const {
    camera: { rightTop, leftTop, leftBottom, eye },
    imageGeometry: { width, height },
    vector: { scale, add, sub, unit },
  } = args;
  const vdu = scale(sub(rightTop, leftTop), 1.0 / width);
  const vdv = scale(sub(leftBottom, leftTop), 1.0 / height);
  const primaryRays = [];

  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      const pixel = y * width + x;
      const origin = eye;
      const direction = unit(
        sub(add(add(scale(vdu, x), scale(vdv, y)), leftTop), origin)
      );
      primaryRays.push({
        pixel,
        origin,
        direction,
      });
    }
  }

  return {
    primaryRays,
  };
}

function createTraceSIMDFunction(args) {
  const { scene, spheresVector, sphereIntersectSIMD, vector: { add, scale, unit, sub } } = args;
  const len = spheresVector.id.length;
  const trace = (ray) => {
    const distances = sphereIntersectSIMD(ray, spheresVector);
    let closestIndex = -1;
    let closestDistance = Number.MAX_VALUE;
    for (let i = 0; i < len; i++) {
      const distance = distances[i];
      if (distance > 0 && distance < closestDistance) {
        closestDistance = distance;
        closestIndex = i;
      }
    }
    if (closestIndex > -1) {
      const sphere = scene[closestIndex];
      const point = add(scale(ray.direction, closestDistance), ray.origin);
      const normal = unit(sub(point, sphere.center));
      return {
        hit: true,
        distance: closestDistance,
        point,
        obj: sphere,
        normal,
      }
    }
    return {
      hit: false
    };
  }
  return {
    trace,
  };
}

function createTraceFunction(args) {
  const { scene, sphereIntersect, triangleIntersect } = args;
  const trace = (ray) => {
    let closestHit = { hit: false, distance: Number.MAX_VALUE };
    for (obj of scene) {
      const fn = obj.isTriangle ? triangleIntersect : sphereIntersect;
      const res = fn(ray, obj);
      if (!res.hit) continue;
      if (res.distance >= closestHit.distance) continue;
      closestHit = res;
      closestHit.obj = obj;
    }
    return closestHit;
  };
  return {
    trace,
  };
}

function tracePrimaryRays(args) {
  const { trace, primaryRays } = args;
  const traceResults = [];
  for (const ray of primaryRays) {
    traceResults.push({
      ray,
      intersection: trace(ray),
    });
  }

  return {
    traceResults,
  };
}

function generateBitmap(args) {
  const {
    traceResults,
    shading,
    vector: { mul },
  } = args;

  const bitmap = [];
  let idx = 0;
  for (const result of traceResults) {
    const it = result.intersection;
    let pixel = [0, 0, 0];
    if (it.hit) {
      pixel = it.obj.color;
      if (!it.obj.isLight) {
        const intensity = shading(it.point, it.normal, 0);
        pixel = mul(pixel, intensity);
      }
    }
    bitmap[idx++] = pixel;
  }

  return {
    bitmap,
  };
}

function pipeline(fns) {
  return (args) => {
    let acc = { ...args };
    for (const fn of fns) {
      const result = fn(acc);
      acc = { ...acc, ...result };
    }
    return acc;
  };
}

function createShadingFunction(args) {
  const {
    vector: { add, scale, mul, dot },
    trace,
    randomDirection,
  } = args;
  const shading = (shadingPoint, pointNormal, depth) => {
    if (depth === 5) {
      return [0, 0, 0];
    }
    const origin = add(shadingPoint, scale(pointNormal, 0.01));
    const direction = randomDirection(pointNormal);
    const d = dot(pointNormal, direction);
    const ray = { origin, direction };
    const tr = trace(ray);
    if (!tr.hit) {
      return [0, 0, 0];
    }
    if (tr.obj.isLight) {
      return scale(tr.obj.color, d);
    }
    return mul(tr.obj.color, scale(shading(tr.point, tr.normal, depth + 1), d));
  };

  return {
    shading,
  };
}

function createRandomFunction() {
  let x = 123456789;
  let y = 362436069;
  let z = 521288629;
  let w = 88675123;
  let t = 0;
  let max = 4294967295;

  const random = () => {
    t = (x ^ (x << 11)) >>> 0;
    x = y;
    y = z;
    z = w;
    w = (w ^ (w >>> 19) ^ (t ^ (t >>> 8))) >>> 0;
    return w / max;
  };

  return { random };
}

var renderingPipeline = pipeline([
  createMemoryFunctions,
  createVectorFunctions,
  createSIMDFunctions,
  createAspectRatioFunction,
  createScene,
  createCamera,
  createImageGeometry,
  createRandomFunction,
  createRandomDirectionFunction,
  calculatePrimaryRays,
  createSphereIntersectFunction,
  createSphereIntersectSIMDFunction,
  createTriangleIntersectFunction,
  createTraceSIMDFunction,
  // createTraceFunction,
  createShadingFunction,
  tracePrimaryRays,
  generateBitmap,
]);

const canvas = document.getElementById('canvas-1');
const ctx = canvas.getContext('2d');
const width = canvas.width;
const height = canvas.height;

const result = renderingPipeline({ width, height });
const { bitmap } = result;
const finalBitmap = new Uint32Array(width * height);
for (let i = 0; i < bitmap.length; i++) {
  finalBitmap[i] =
    (255 << 24) |
    ((bitmap[i][2] * 255) << 16) |
    ((bitmap[i][1] * 255) << 8) |
    (bitmap[i][0] * 255);
}

const imageData = new ImageData(
  new Uint8ClampedArray(finalBitmap.buffer),
  width
);
ctx.putImageData(imageData, 0, 0);

What's next

At this point I am very dissapointed and surprised the the performance for this implementation did so bad, but not all is lost. There is a way to avoid all these Javascript antics by doing some more weird stuff using WebAssembly.

With WebAssembly, I can work around the use of Javascript objects and the weird Array behavior, have all the memory in the same place and more importantly: use WebAssembly SIMD instructions like in here.

After the WebAssembly experiment is done, I'll move to the next tool in the belt: accelerator structures. But all this will have to wait for another entry.