Together with the Ohio Supercomputer Center (OSC) and UberCloud, we have prepared a preconfigured, tested, and validated environment in which SpeedIT Flow is ready to use. In this sales model the customer pays only for actual consumption instead of buying a license. This may be particularly attractive to small companies and research groups with limited budgets, and to larger companies that want to reduce simulation costs.
This technology may be particularly interesting for HPC centers such as OSC, because our software lets them utilize their resources better: computations can run on CPUs and GPUs concurrently. Moreover, the power efficiency per simulation, an important factor in high-performance computing, is comparable for a dual-socket multicore CPU and a GPU.
Four OpenFOAM test cases were run on two OSC clusters: Oakley, with Intel Xeon X5650 processors, and Ruby, with Intel Xeon E5-2670 v2 processors. The results were compared with SpeedIT Flow, which was run on Ruby using an NVIDIA Tesla K40 GPU.
The scaling results showed that SpeedIT Flow can run CFD simulations on a single GPU in times comparable to those obtained using 16-20 cores of a modern server-class CPU. The electrical energy consumed per simulation is also comparable to that of computations on multicore CPUs.
SpeedIT Flow gives the end user an alternative way to reduce turnaround times. By taking advantage of GPUs, which are available in most systems, simulation time can be reduced, bringing significant cost savings to the production pipeline. Finally, flexible licensing that depends on the number of GPUs rather than the number of CPU cores reduces software licensing costs.
With our software, resource providers such as OSC, as well as private and public cloud providers, can utilize their hardware more efficiently. On a cluster with GPUs, CFD simulations can run on the CPUs and on the GPUs at the same time. For example, on a node with two GPUs and two ten-core CPUs, three simulations could run concurrently: two on the GPUs and one on the eighteen CPU cores not occupied by the GPU runs. As shown in our tests, the turnaround times and the power consumption per simulation are comparable.
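The node arithmetic in this example can be written out as a short Python sketch. It is an illustration only, not part of SpeedIT Flow: the names `NodePlan` and `plan_node` are hypothetical, and the code assumes that each GPU simulation keeps exactly one CPU core busy as its host process, which is how twenty cores would leave eighteen free.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodePlan:
    """How many simulations one node could run concurrently (hypothetical helper)."""
    gpu_runs: int        # one simulation per GPU
    cpu_cores_free: int  # cores left over for a CPU-only simulation
    cpu_runs: int        # 1 if any cores are left over, otherwise 0

    @property
    def total_runs(self) -> int:
        return self.gpu_runs + self.cpu_runs


def plan_node(gpus: int, cpu_sockets: int, cores_per_socket: int,
              host_cores_per_gpu: int = 1) -> NodePlan:
    """Split a node between GPU runs and one CPU run.

    Assumption (not stated in the case study): every GPU simulation
    reserves `host_cores_per_gpu` CPU cores for its host process.
    """
    total_cores = cpu_sockets * cores_per_socket
    reserved = gpus * host_cores_per_gpu
    if reserved > total_cores:
        raise ValueError("not enough CPU cores to host all GPU simulations")
    free = total_cores - reserved
    cpu_runs = 1 if free > 0 else 0
    return NodePlan(gpu_runs=gpus, cpu_cores_free=free, cpu_runs=cpu_runs)


# The node from the example: two GPUs and two ten-core CPUs.
plan = plan_node(gpus=2, cpu_sockets=2, cores_per_socket=10)
print(plan.total_runs, plan.cpu_cores_free)  # 3 18
```

Changing `host_cores_per_gpu` shows how sensitive the leftover core count is to that assumption.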
Read the full case study here.