Wow, I had the time to re-run my benchmarks with DCP-o-matic 2.74 and compare them against
my previous tests. These new code optimizations really pay off.
v1.83 Bunny: 12.3fps (20 threads on master +28 threads on localhost encode server)
v2.74 Bunny: 19.3fps (32 threads on master GUI only)
v1.83 Sintel: 9.9fps (20 threads on master +28 threads on localhost encode server)
v2.74 Sintel: 14.6fps (32 threads on master GUI only)
These were done on a dual Xeon-5660 machine (2 x 6 cores + HT) using the standard
BigBuckBunny and Sintel benchmark metafile data:
http://dcpomatic.com/benchmarks/
Aside from the J2C code optimizations brought in by Aaron Boxer, Carl also improved the
thread handling for machines with many logical cores (e.g. multi-CPU/multi-core systems).
Before, you needed to start a local encode server together with the DCP-o-matic master GUI
on the same machine to fully load your CPU during encoding. That penalty grew worse and
worse with the number of logical cores.
Now, running only the master GUI already maxes out the CPU load, and with fewer threads
configured in preferences. DCP-o-matic's default thread setting (equal to the number of
logical cores found in the machine) already comes very close to the optimum.
'Overthreading' (setting more encoding threads than the machine has logical cores)
is no longer necessary; at best it yields very small further improvements.
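To put 'overthreading' in concrete terms, here is a small Python sketch. It is only an
illustration and not DCP-o-matic code: the function names are made up for this post, and
os.cpu_count() is simply how Python reports the number of logical cores.

```python
import os

def default_threads():
    """Logical core count as reported by the OS; falls back to 1 if unknown."""
    return os.cpu_count() or 1

def overthreading(threads, logical_cores):
    """Encoding threads configured beyond the logical core count (0 if none)."""
    return max(0, threads - logical_cores)

# The dual Xeon from the benchmarks: 2 CPUs x 6 cores x 2 (HT) = 24 logical cores.
xeon_logical_cores = 2 * 6 * 2

print("logical cores on this machine:", default_threads())
print("32 threads on the dual Xeon is", overthreading(32, xeon_logical_cores), "over")
```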
That also means that on multi-core machines you get away with fewer threads, and therefore
with less memory usage (every thread needs a decent amount of memory).
So now, especially with 4k encodes (4k needs even more memory per thread), you don't
need as much memory as before to max out your CPU.
Encoding 4k at 48+ threads would push most machines into swapping, even with lots of
RAM. Now you get the same encoding performance using only half this thread setting, thus
needing only half as much RAM (roughly).
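As a back-of-the-envelope sketch of that claim in Python: if memory grows linearly with
the thread count (an assumption on my part), halving the threads halves the RAM. The
per-thread figure below is a placeholder I picked for illustration, not a measured value.

```python
def encode_memory_gb(threads, per_thread_gb):
    """Estimated encoder memory, assuming it scales linearly with thread count."""
    return threads * per_thread_gb

PER_THREAD_GB = 0.5  # hypothetical placeholder, NOT a measured DCP-o-matic figure

before = encode_memory_gb(48, PER_THREAD_GB)
after = encode_memory_gb(24, PER_THREAD_GB)
print(f"48 threads: {before:.1f} GB, 24 threads: {after:.1f} GB, ratio {after / before:.2f}")
```

Whatever the real per-thread figure is, the ratio stays the same under this assumption.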
On not-so-fast machines, like legacy systems that you might use as encoding servers or for
the occasional still conversion, there is an improvement as well:
v1.83 Bunny: 0.82fps (4 threads Core2D-T7200)
v2.74 Bunny: 1.15fps (4 threads Core2D-T7200)
That's about a 40% speed increase over the last 1.x release.
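For anyone who wants to check the percentages, here is a quick Python sketch using only
the fps figures from this post. The helper name speedup_percent is just something I made
up for the calculation.

```python
# Frame rates (fps) copied from the benchmark figures in this post: (v1.83, v2.74).
results = {
    "Bunny, dual Xeon": (12.3, 19.3),
    "Sintel, dual Xeon": (9.9, 14.6),
    "Bunny, Core2D-T7200": (0.82, 1.15),
}

def speedup_percent(old_fps, new_fps):
    """Relative speed increase of new_fps over old_fps, in percent."""
    return (new_fps / old_fps - 1.0) * 100.0

for name, (old, new) in results.items():
    print(f"{name}: {speedup_percent(old, new):.0f}% faster")
```

That works out to roughly 57% and 47% on the dual Xeon, and about 40% on the old Core2D.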
Well done Carl and Aaron!
- Carsten