On Fri, 12 Sep 2014 01:29:01 -0600 (MDT)
Leslie Hartmier <leslieh1(a)shaw.ca> wrote:
> Hey everyone!
>
> I have a question regarding the conversion farms that people have
> made, and I would like some insight.
>
> We're going to have some fun creating a four octo-core machine farm
> to make DCPs out of whatever comes our way, and I was wondering if
> anyone is using 10GbE for their internal network.
Hi Leslie,
A while ago I sent Carl an Excel file with a range of benchmarks I ran on various
single machines and network configurations.
You can look up most of that data at the benchmark pages Carl set up here:
http://dcpomatic.com/benchmarks/
Currently, DCP-o-matic works with a 'master' process distributing jobs to local or
remote encode processes. The master and the encode processes need to be balanced so that
they do not starve each other. Carl has already done a great job of improving master/client
balancing in recent versions of DCP-o-matic.
It seems that currently any machine configuration, be it a local multicore machine or
multiple remote nodes, will max out at around 18-20fps. I don't know if Carl has more
insight into the limits of this whole process, i.e. whether they come from memory, disk,
network, etc. As you can see from these benchmarks, a common, non-optimized 1GigE network
will sustain around 17-20fps, more or less independently of how many encoding clients you
use; that is, you will hit this limit with either two very fast networked encoding servers
or six slow machines. As far as I understand the process, 15fps will need something like
100MByte/s to sustain 2k/8Bit encoding. If you want to go 4k, or Carl enables 12Bit
processing, this will probably drop to 5-10fps. So, if you want to move towards 4k and
future 12Bit, a single faster multicore machine with higher local transfer bandwidth is
preferable to multiple networked encode servers.
Going to 10GigE will probably not pay off at the moment.
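To make the bandwidth argument concrete, here is a rough back-of-envelope sketch. It assumes (this is my assumption, not how DCP-o-matic is documented to work) that each frame is shipped uncompressed as 3-channel RGB at 8 bits per sample, ignores protocol overhead, and takes about 118 MB/s as usable 1GigE throughput. The 2048x1080 and 4096x2160 frame sizes and the "12Bit in a 16-bit container" case are illustrative assumptions too.

```python
# Back-of-envelope bandwidth check for shipping frames to remote encode servers.
# ASSUMPTION: each frame travels uncompressed as 3-channel RGB; the real
# DCP-o-matic wire format and any protocol overhead are not modelled here.

def bytes_per_frame(width, height, channels=3, bytes_per_sample=1):
    """Size of one uncompressed frame in bytes."""
    return width * height * channels * bytes_per_sample


def required_mbytes_per_s(fps, width, height, bytes_per_sample=1):
    """Throughput (MB/s, 1 MB = 10**6 bytes) needed to sustain `fps`."""
    return fps * bytes_per_frame(width, height, bytes_per_sample=bytes_per_sample) / 1e6


def max_fps(link_mbytes_per_s, width, height, bytes_per_sample=1):
    """Upper bound on frames/s that a link with this usable throughput can feed."""
    return link_mbytes_per_s * 1e6 / bytes_per_frame(
        width, height, bytes_per_sample=bytes_per_sample)


if __name__ == "__main__":
    # ASSUMPTION: roughly 118 MB/s usable on 1GigE, roughly 10x that on 10GigE.
    for label, link in (("1GigE", 118), ("10GigE", 1180)):
        print(label,
              "| 2K 8-bit: %.1f fps" % max_fps(link, 2048, 1080),
              "| 2K 16-bit container: %.1f fps" % max_fps(link, 2048, 1080, bytes_per_sample=2),
              "| 4K 8-bit: %.1f fps" % max_fps(link, 4096, 2160))
    print("15 fps of 2K 8-bit needs %.1f MB/s" % required_mbytes_per_s(15, 2048, 1080))
```

Under these assumptions 15fps of 2K works out at about 99.5 MB/s, in line with the 100MByte/s figure above, and a 1GigE link tops out near 18fps. 10GigE would lift the network ceiling far above the 18-20fps the encoding side currently reaches, which is why it is unlikely to pay off yet.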
The benchmarks will also show you that, for some reason, single high-performance CPUs like
the i7-4930K give you more bang for the buck than multicore Xeons. For example, two
networked i7-4930K machines will outperform a very expensive dual E5-2690 machine. The
reason is probably that some major pre-processing tasks in DCP-o-matic cannot be
parallelized to keep pace with a multi-multi-core encoding pipeline. Also, DCP-o-matic uses
some libraries, and parallelizing these modules is probably beyond what Carl can or will
do. Simply put: DCP-o-matic is not just a J2K converter, and its pre- and post-processing
tasks will not scale with the number of J2K encode threads.
As you can see from these benchmarks, the results of multiple 8-12-core Xeons are rather
disappointing compared to a single i7-4770 or i7-4930K.
The new Haswell CPUs may be worth looking at, although even there it seems that two
single-CPU Haswell i7 machines with 6 or 8 cores will outperform all Xeon configurations.
DCP-o-matic encoding speed tracks the Passmark benchmark very well. Have a look at this
list:
https://www.cpubenchmark.net/high_end_cpus.html
You will find its Passmark vs. price comparison very helpful. It is not surprising to find
the i7-4930K and the new i7-5820K and i7-5930K among the most cost-effective CPUs, with the
5820K at the sweet spot. The only small drawback is the more expensive mainboard and memory
needed to accompany this CPU.
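A minimal sketch of that kind of price/performance comparison, extended to charge each CPU for the extra mainboard and memory cost mentioned above. The scores and prices are inputs you would copy from the Passmark list yourself; the `Cpu` class, the `platform_premium` field and the candidate entries are hypothetical and for illustration only.

```python
# Rank CPUs by benchmark score per unit of money, optionally charging each CPU
# for the extra cost of the mainboard and memory it requires.
# ASSUMPTION: scores/prices are supplied by you (e.g. from the Passmark list);
# the candidates below are hypothetical placeholders, not real data.
from dataclasses import dataclass


@dataclass
class Cpu:
    name: str
    score: float                    # e.g. Passmark CPU Mark
    cpu_price: float
    platform_premium: float = 0.0   # extra board + RAM cost vs. a baseline build

    def total_price(self) -> float:
        return self.cpu_price + self.platform_premium

    def value(self) -> float:
        """Benchmark points per unit of money spent."""
        return self.score / self.total_price()


def rank_by_value(cpus):
    """Best value first."""
    return sorted(cpus, key=lambda c: c.value(), reverse=True)


if __name__ == "__main__":
    candidates = [
        Cpu("hypothetical_cpu_a", score=10000, cpu_price=300),
        Cpu("hypothetical_cpu_b", score=13000, cpu_price=350, platform_premium=200),
    ]
    for cpu in rank_by_value(candidates):
        print(f"{cpu.name}: {cpu.value():.1f} points per unit of money")
```

Counting the platform premium can flip the ranking: with these placeholder numbers, cpu_b would win on CPU price alone but falls behind once its costlier board and memory are included.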
As Carl says, if you want to throw money at the problem, you may want to try something like
Magna-Mana's 'FinalDCP'. It uses the commercial Kakadu encoder, with an optional turbo
module. The benchmarks are impressive, and there is a free watermark-only version that you
can test on your hardware before buying it.
- Carsten