Very Slow Data-Oriented Design
This article ended up with a much different outcome than initially expected. The idea was to improve the code for the path tracer I have been writing in previous articles, using a data-oriented design approach to improve the overall rendering speed.
By changing the scene data structure and the sphere tracing function to a data-oriented layout, the expected outcome was that the Javascript engine would be able to cache the scene better and vectorize the code using SIMD operations.
As you may have guessed by this point, this didn't work as expected: Javascript didn't vectorize the code and, to make things worse, the performance tanked.
The changes made
I changed the scene data structures and the code that works on the objects in it (only spheres for now) to use plain arrays and array operations.
More specifically, I changed the sphere data structure from an array of objects to several arrays of values, as shown below.
This is the structure used initially:
[
  {
    center: [1, 2, 3],
    radius: 4,
  },
  {
    ... // other spheres
  },
];
Was updated to look like this:
{
  centerX: [1, ...],
  centerY: [2, ...],
  centerZ: [3, ...],
  radius: [4, ...],
}
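As an illustration (this helper is not part of the path tracer code; the name is made up for this example), converting the old layout into the new one is a straightforward mapping:

// Hypothetical helper: build the struct-of-arrays layout from the
// original array-of-objects scene shown above.
const toStructOfArrays = (sphereObjects) => ({
  centerX: sphereObjects.map((s) => s.center[0]),
  centerY: sphereObjects.map((s) => s.center[1]),
  centerZ: sphereObjects.map((s) => s.center[2]),
  radius: sphereObjects.map((s) => s.radius),
});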
With this new data structure, instead of checking for intersections one sphere at a time, the algorithm is updated to check for intersections against all the spheres "at the same time".
I quoted "at the same time" because, the algorithm still works with one single data value at a time, but hints at the interpreter that it could instead use the much optimized SIMD instructions. Cue the add
function bellow:
// Element-wise sum: writes A[i] + B[i] into the preallocated OUT array.
const add = (A, B, OUT) => {
  const len = OUT.length;
  for (let i = 0; i < len; i++) {
    OUT[i] = A[i] + B[i];
  }
};
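To give an idea of how helpers in this style are meant to compose (this is only a sketch, not the actual tracing code; the function name and scratch arrays are made up for the example), a batched sphere intersection pass could compute the discriminant for every sphere in bulk:

// Sketch only: batched ray-sphere discriminant test over the
// struct-of-arrays scene. ox/oy/oz and dx/dy/dz are the ray origin and
// (normalized) direction; the scratch arrays are assumed to be preallocated.
const intersectAll = (spheres, ox, oy, oz, dx, dy, dz, scratch) => {
  const { centerX, centerY, centerZ, radius } = spheres;
  const { lx, ly, lz, b, c, disc } = scratch;
  const len = radius.length;
  for (let i = 0; i < len; i++) lx[i] = centerX[i] - ox; // center - origin (x)
  for (let i = 0; i < len; i++) ly[i] = centerY[i] - oy; // center - origin (y)
  for (let i = 0; i < len; i++) lz[i] = centerZ[i] - oz; // center - origin (z)
  for (let i = 0; i < len; i++) b[i] = lx[i] * dx + ly[i] * dy + lz[i] * dz;
  for (let i = 0; i < len; i++) c[i] = lx[i] * lx[i] + ly[i] * ly[i] + lz[i] * lz[i] - radius[i] * radius[i];
  for (let i = 0; i < len; i++) disc[i] = b[i] * b[i] - c[i];
  // A final pass would pick the closest sphere with disc[i] >= 0
  // and t = b[i] - sqrt(disc[i]) > 0.
  return disc;
};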
What I expected to happen
This function is very simple: take two arrays and sum them element-wise. This can be implemented with ADDPS, available in SSE, with variants available in AVX and Neon's VADD.
The expectation was that Javascript would detect this pattern and apply such optimizations; as I mentioned, modern compilers already do this.
This does not happen in Javascript, and the result is much worse.
Javascript shenanigans
Javascript does not do the optimizations I expected and does things its own way, and that means:
Javascript Arrays are not real arrays
They are Array-like objects (MDN) that give special treatment to numeric properties (spec). As such, accessing an element goes through a lengthy set of instructions.
Instead of doing what other programming languages do when accessing an array by index (check the array bounds, calculate the memory location and return the value), Javascript does much more than that (spec); also, there is no guarantee the elements are even stored in contiguous memory locations.
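A quick way to see this object-like behavior in the console:

// Regular Arrays are objects: indices are string property keys and the
// array can be sparse, so there is no contiguous-storage guarantee.
const a = [];
a[2] = 7;                    // leaves holes at indices 0 and 1
console.log(Object.keys(a)); // ["2"] - the index is a string key
console.log(0 in a);         // false - index 0 was never created
a.foo = "bar";               // arbitrary properties are allowed too
console.log(a.length);       // 3 - length is just highest index + 1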
At this point I thought: let's use TypedArrays. I can use a Float32Array, which is backed by an ArrayBuffer and should behave like a real array; in other words, items would be stored in contiguous memory locations (cache friendly), and accessing elements should not go through all the checks that regular Arrays do. But then...
Javascript normalizes values read/written from/to TypedArrays
Surprise! Performance is even worse this time. Not only does the Javascript engine not vectorize the code, there are also some hidden costs involved, the most egregious one being value encoding and normalization.
This means that whenever a value is read from a TypedArray, it will be converted before being used in an operation with a regular Javascript Number; likewise, when a value is written to a TypedArray, it will be converted to the expected element type. All this impacts performance, but much harder than anticipated. Again, other programming languages do this all the time (see type coercion), but Javascript goes a bit further by making Numbers follow the prototype inheritance pattern: try (123).toExponential() and (123).__proto__ in your console.
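The round trip is easy to observe: a Number written into a Float32Array is rounded to 32-bit precision on write and converted back to a 64-bit Number on read:

const f32 = new Float32Array(1);
f32[0] = 0.1;                // rounded to the nearest 32-bit float on write
console.log(f32[0]);         // 0.10000000149011612 - converted back to a Number on read
console.log(f32[0] === 0.1); // false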
I expected this normalization to be avoided by using a Float64Array, since Javascript Numbers are 64-bit floats, but it didn't help at all; you can try it in the code example.
Javascript engine optimizations seem to be missing for TypedArrays
This is more of a guess at this point, but the developers discussing this same issue here talk about what could be causing this performance discrepancy and think the problem is a lack of optimization. I agree with this point of view: the ArrayBuffer concept is very simple, and given that trying to avoid allocations (see my memory implementation in the path tracing code) didn't improve performance, it seems TypedArrays are treated like second-class citizens by the Javascript engine.
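For context, the allocation-avoiding memory helper mentioned above works roughly like this (a simplified sketch with made-up names, not the exact implementation from the path tracer):

// Hand out Float32Array views over one preallocated ArrayBuffer instead of
// allocating new arrays every frame.
const createMemory = (totalFloats) => {
  const buffer = new ArrayBuffer(totalFloats * Float32Array.BYTES_PER_ELEMENT);
  let byteOffset = 0;
  return {
    alloc(size) {
      const view = new Float32Array(buffer, byteOffset, size);
      byteOffset += size * Float32Array.BYTES_PER_ELEMENT;
      return view;
    },
    reset() {
      byteOffset = 0; // reuse the same memory for the next frame
    },
  };
};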
Full code
This is the code with all the changes I've mentioned so far. Some things to try:
- Update the pipeline to use createTraceFunction instead of createTraceSIMDFunction.
- Update the createMemoryFunction to return new Array(size) instead of allocating a Float32Array from the shared buffer space.
- Not cry.
What's next
At this point I am very disappointed and surprised that the performance of this implementation was so bad, but not all is lost. There is a way to avoid all these Javascript antics by doing some more weird stuff using WebAssembly.
With WebAssembly, I can work around the use of Javascript objects and the weird Array behavior, keep all the memory in one place and, more importantly, use WebAssembly SIMD instructions like the ones shown here.
After the WebAssembly experiment is done, I'll move to the next tool in the belt: acceleration structures. But all this will have to wait for another entry.
Enrique CR - 2023-12-31