Eccentric Developments


Speeding Things Up

Up to this point, the path tracer I have been developing in this blog entries turned out to be VERY slow, so much that it takes several seconds to render a single image with just four objects in the scene!

To improve things, I'll fix some of the problems introduced in the code. The updates have nothing to do with better algorithms, but just writing the algorithms into a form that represents less overhead to the browser javascsript engine.

Looking at the performance through a profiler we can observe that the vast majority of time is spent using the vector functions, this is as expected. The problem is the approach I took for implementing those operations, i.e. a fluent interface. This pattern implies that a new vector closure is created for every operation, which is wasteful (read: unnecessary).

So for starters, let's rewrite the vector function as a simple collection of functions that operate on arrays that no longer return closures. After the change the code looks like this:

function createVectorFunction() {
    const zip = (A, B) => A.map((a, idx) => [a, B[idx]]);
    const sub = (A, B) => zip(A, B).map(([a, b]) => a - b);
    const add = (A, B) => zip(A, B).map(([a, b]) => a + b);
    const mul = (A, B) => zip(A, B).map(([a, b]) => a * b);
    const dot = (A, B) => mul(A, B).reduce((acc, v) => acc + v, 0.0);
    const scale = (A, s) => A.map((a) => a * s);
    const norm = (A) => Math.sqrt(dot(A, A));
    const unit = (A) => scale(A, 1.0 / norm(A));
    const abs = (A) => A.map(Math.abs);
    const maxDimension = (A) => {
        if (A[0] > A[1] && A[0] > A[2]) return 0;
        if (A[1] > A[0] && A[1] > A[3]) return 1;
        return 2;
    };
    const permute = (A, i, j, k) => {
        return [A[i], A[j], A[k]];
    };
    const cross = (A, B) => {
        const j = A[1] * B[2] - B[1] * A[2];
        const k = A[2] * B[0] - A[0] * B[2];
        const l = A[0] * B[1] - A[1] * B[0];
        return [j, k, l];
    };
    const vector = {
        zip,
        sub,
        add,
        mul,
        dot,
        scale,
        norm,
        unit,
        abs,
        maxDimension,
        permute,
        cross,
    };

    return {
        vector,
    };
}

Which improves the running speed from 7251ms to 5049ms, a nice change. The code is still slow, so the next part to fix tis the zip function, which appart from being used a lot, it is used only internally in the vector functions for making the code clean; the update should be simple enough.

function createVectorFunction() {
    const sub = (A, B) => [A[0] - B[0], A[1] - B[1], A[2] - B[2]];
    const add = (A, B) => [A[0] + B[0], A[1] + B[1], A[2] + B[2]];
    const mul = (A, B) => [A[0] * B[0], A[1] * B[1], A[2] * B[2]];
    const dot = (A, B) => A[0] * B[0] + A[1] * B[1] + A[2] * B[2];
    const scale = (A, s) => [A[0] * s, A[1] * s, A[2] * s];
    const norm = (A) => Math.sqrt(dot(A, A));
    const unit = (A) => scale(A, 1.0 / norm(A));
    const abs = (A) => [Math.abs(A[0]), Math.abs(A[1]), Math.abs(A[2])];
    const maxDimension = (A) => {
        if (A[0] > A[1] && A[0] > A[2]) return 0;
        if (A[1] > A[0] && A[1] > A[3]) return 1;
        return 2;
    };
    const permute = (A, i, j, k) => [A[i], A[j], A[k]];
    const cross = (A, B) => {
        const j = A[1] * B[2] - B[1] * A[2];
        const k = A[2] * B[0] - A[0] * B[2];
        const l = A[0] * B[1] - A[1] * B[0];
        return [j, k, l];
    };
    const vector = {
        sub,
        add,
        mul,
        dot,
        scale,
        norm,
        unit,
        abs,
        maxDimension,
        permute,
        cross,
    };

    return {
        vector,
    };
}

Which now brings the running speed to 2844ms, we are looking good! Alright for the next trick, we are going to run the same code in Firefox... and we get 642ms... what is wrong with Safari?!

So when looking once again to the profiler results, and changing to "Inverted" "Top Functions" in the call tree, the two most costly operations are generatorResume and next, which indicates Safari does not quite like the generator functions and the code that does the cooperative execution. Since we care about performance, let's remove it and see what comes out.

The pipeline code now looks like this:

function pipeline(fns) {
    return (args) => {
        let acc = { ...args };
        for (const fn of fns) {
            const result = fn(acc);
            acc = { ...acc, ...result };
        }
        return acc;
    };
}

Much simpler but also, 550ms execution time! there goes the fancy cooperative execution. Firefox liked it too, executing the code in 372ms.

And for the final change, I removed all the destructuring and spread operators that are in the hot path, now the code executes takes 431ms in Safari, win!

Check the final version below.

Summary

Started from 7251ms on my M1 MBA and improved the code to take only 431ms, meaning a 15x performance improvement. If there is one thing to learn from this, is that Safari is very sensitive to memory operations: closures, spread operations and object destructuring, really carry a huge performance impact.

For next time, keep in mind the amount of memory pressure put in the codes hot path to help Safari keep up with other browsers.


Enrique CR - 2023-12-03