Ruby Raytracer on M2 Max
A few years — and iterations of the Apple M-series — has passed, and I recently found myself in possession of a Mac Studio M2, in its base configuration of 8P + 4E CPU cores, and even more ridiculous1 amounts of cache.
But the burning question is, does it deliver on the promise of up to 18% CPU speed improvement?
Hard shadows
The base-line benchmark, as before, is the single-threaded render without any fancy rendering techniques active
Mac Studio M2 (2023)
[1] pry(#<GlisteningRuby::DSL::Scene>)> render verbose:true
Setup time: 0.178938 s
Render time: 104.306864 s
Total time: 104.485802 s
That’s a 17% improvement in single-core performance over the M1, so right on the spot.
MacBook Pro M1 (2021)
[1] pry(#<GlisteningRuby::DSL::Scene>)> render verbose:true
Setup time: 0.240659 s
Render time: 122.153057 s
Total time: 122.393716 s
Then we get into the unfair territory, because the M1 did the multi-threaded benchmark using all its P-, and E-cores, where the M2 Max can let its P-cores do all the work.
Mac Studio M2 (2023)
[1] pry(#<GlisteningRuby::DSL::Scene>)> render verbose:true, threads:8
Setup time: 0.173121 s
Render time: 15.689671 s
Total time: 15.862792 s
BOOM even lightning faster — over six and a half times as fast multi-threaded, and almost twice (1.79x) as fast as the M1 was.
MacBook Pro M1 (2021)
[2] pry(#<GlisteningRuby::DSL::Scene>)> render verbose:true, threads:8
Setup time: 0.266847 s
Render time: 28.061283 s
Total time: 28.32813 s
Going All In — Soft Shadows and Anti-Aliasing
This time, we’re throwing everything we have, letting the tiny E-cores do what they can to help along.
Mac Studio M2 (2023)
[1] pry(#<GlisteningRuby::DSL::Scene>)> render verbose:true, threads:12, ssaa:3
Setup time: 0.17466 s
Render time: 304.752032 s
Total time: 304.926692 s
Just shy of twice (1.97x) the crunch of an M1, in the extremely unfair final round.
MacBook Pro M1 (2021)
[1] pry(#<GlisteningRuby::DSL::Scene>)> render verbose:true, threads:8, ssaa:3
Setup time: 0.23379 s
Render time: 601.335931 s
Total time: 601.569721 s
Still, this is an extremely inefficient raytracer, and nothing we can throw at it will make renders reasonable. All the benchmarks yield a 320x256 image, which of course is the standard Amiga PAL low-res, and would take days to run with an optimized raytracer on an actual Amiga.
Which raises the next question, is it more efficient to emulate an Amiga, running an optimized raytracer, than this nonsense one I made in Ruby?
-
This monster has 87 MiB L1+L2+L3 cache in total. It can fit my entire first hard-drive (52 MB) in CPU cache. ↩
Previous related
RasPi 400 bonus benchmarks
I didn’t quite finish the bonus benchmarks, so for the sake of completeness, here we go. The missing ones for the RasPi 400.
Messy shadows
Apparently, I messed up some more vector algebra. Ray / surface hits currently record the event in surface space, which works just fine for shading a unit sphere still positioned at origin. When we transform things around, and try to cast the secondary shadow rays, it becomes painfully obvious that surface space and world space no longer coincide.
All the things
One limitation so far was that there could only be one surface in a scene. To get past that, we add the Group meta-surface, that contains a number of surfaces, and intersects with rays on their behalf. To make this actually useful, surfaces also need transforms to translate them out of their shared origin.