Multi-GPU simulation of the motorbike in OpenFOAM and SpeedIT technology

Vratis Ltd., Wroclaw, Poland

March 28, 2012

1. Objective

OpenFOAM® simulations take a significant amount of time, which raises their cost. GPGPU technology has the potential to overcome this problem. However, because the memory of a single GPU card is limited, realistic simulations may not be possible on one card. As a solution we propose SpeedIT Multi-GPU technology, which accelerates the solution of the pressure equation, usually the most time-consuming part of incompressible flow simulations. We compare the performance of SpeedIT Multi-GPU with standard OpenFOAM runs on CPU in various test scenarios with up to 32 million cells on clusters with up to 16 GPU cards.
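To see why a single card can run out of memory, a rough back-of-the-envelope estimate of the storage needed for the pressure system helps. The sketch below does not describe SpeedIT's actual storage scheme: it assumes a CSR matrix in double precision with about 7 nonzeros per row (a hexahedral cell and its six neighbours), 32-bit indices, and six work vectors for a preconditioned CG iteration. The function names and all of these parameters are illustrative assumptions.

```python
def pressure_system_bytes(n_cells, nnz_per_row=7, n_vectors=6,
                          value_bytes=8, index_bytes=4):
    """Estimate bytes needed for a CSR matrix plus CG work vectors.

    All defaults are assumptions for illustration, not SpeedIT internals.
    """
    nnz = n_cells * nnz_per_row
    matrix_values = nnz * value_bytes            # nonzero coefficients
    column_indices = nnz * index_bytes           # one column index per nonzero
    row_pointers = (n_cells + 1) * index_bytes   # CSR row offsets
    vectors = n_vectors * n_cells * value_bytes  # x, b, r, p, z, A*p
    return matrix_values + column_indices + row_pointers + vectors


def min_cards(n_cells, card_bytes):
    """Smallest number of cards whose combined memory holds the system,
    assuming a perfectly even split and ignoring halo overhead."""
    total = pressure_system_bytes(n_cells)
    return -(-total // card_bytes)  # ceiling division


if __name__ == "__main__":
    card = 6 * 1024**3  # a 6 GB card (assumed size for this example)
    for cells in (32_000_000, 80_000_000):
        gib = pressure_system_bytes(cells) / 1024**3
        print(f"{cells:,d} cells: {gib:.1f} GiB, "
              f"at least {min_cards(cells, card)} card(s)")
```

Under these assumptions the footprint is about 136 bytes per cell and grows linearly with mesh size, so an 80M-cell case would not fit on one 6 GB card even before halo data and other buffers are counted.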

2. Methodology

SpeedIT is a library that implements iterative solvers on GPU and uses MPI to exchange data between domains. The SpeedIT Plugin to OpenFOAM® was used to call the GPU-accelerated iterative solvers from OpenFOAM, which was responsible for decomposition of the case. Preliminary tests (see the report) for cavity3D, performed on the PLGRID cluster with a varying number of cells, showed (see Fig. 1-2) that the technology has the potential to reduce simulation time. The tests performed on the CINECA cluster aimed at solving larger simulations, with geometries of up to 80M cells, as well as at testing more efficient preconditioners, such as AMG. The CINECA PLX cluster was equipped with 548 Intel Xeon E5645 processors and 548 Tesla M2070 cards, each with 6 GB of memory and 448 CUDA cores. The following test cases were prepared for both multi-CPU and multi-GPU environments, with a fixed number of time steps:

  1. Ramp case, 80M cells, simpleFoam.
  2. motorBike case, 32M cells, simpleFoam.

Figure 1: Acceleration, defined as the ratio nGPU vs. nCPU, for different cavity3D runs with icoFoam and a diagonal preconditioner.

Figure 2: Acceleration, defined as the ratio nGPU vs. nCPU, for AhmedBody and Cabin runs with simpleFoam.
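
The role of the data exchange between domains described in the methodology can be illustrated without MPI. In the sketch below, which is not SpeedIT code, a 1D Laplacian is split into contiguous subdomains; each subdomain needs one "halo" value from each neighbour before it can compute its part of the matrix-vector product. In a real run each subdomain would live on its own GPU and the halo values would travel through MPI; here plain Python lists and the hypothetical helpers `local_matvec` and `decomposed_matvec` stand in for both.

```python
def global_matvec(v):
    """Apply tridiag(-1, 2, -1) to the whole vector on one device."""
    n = len(v)
    return [2.0 * v[i]
            - (v[i - 1] if i > 0 else 0.0)
            - (v[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]


def local_matvec(local, left_halo, right_halo):
    """Apply the same stencil to one subdomain.

    left_halo / right_halo are the neighbouring subdomains' boundary
    values (0.0 at a physical boundary).
    """
    padded = [left_halo] + list(local) + [right_halo]
    return [2.0 * padded[i] - padded[i - 1] - padded[i + 1]
            for i in range(1, len(padded) - 1)]


def decomposed_matvec(v, n_parts):
    """Split v into n_parts contiguous chunks, exchange halos, multiply.

    Assumes len(v) >= n_parts so that no chunk is empty.
    """
    n = len(v)
    bounds = [n * k // n_parts for k in range(n_parts + 1)]
    chunks = [v[bounds[k]:bounds[k + 1]] for k in range(n_parts)]
    result = []
    for k, chunk in enumerate(chunks):
        # "Receive" one value from each neighbour; in a multi-GPU run
        # this is the step that would be an MPI exchange.
        left = chunks[k - 1][-1] if k > 0 else 0.0
        right = chunks[k + 1][0] if k < n_parts - 1 else 0.0
        result.extend(local_matvec(chunk, left, right))
    return result


if __name__ == "__main__":
    v = [float(i * i % 7) for i in range(12)]
    print(decomposed_matvec(v, 4) == global_matvec(v))
```

The decomposed product matches the single-device product because only the values on subdomain boundaries have to be communicated; the interior of each subdomain is computed independently.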


The tests performed on the CINECA cluster were inspired by industrial applications. The first case, a ramp with 80M cells, was provided by a government agency. We used the PLX cluster to decompose and mesh the case. Unfortunately, due to technical issues, we have not yet been able to run the simulations.

The second test was a standard OpenFOAM case, motorBike, modified by SGI so that it had 32 million cells. Tests were performed in multi-GPU and multi-CPU environments. For the time being, the SpeedIT library offers a CG solver with a diagonal preconditioner for the solution of the pressure equation on GPU. For the computations on CPU we used the same CG solver with a diagonal preconditioner, and also the GAMG solver, since it is the one most often used in real-life simulations. Results are presented in Fig. 3. Computations on GPUs can be up to 8 times faster than CG computations on the same number of processor cores. Compared against the GAMG solver, SpeedIT multi-GPU still provides an acceleration of 1.1x to 1.4x (1.5x when the first time step is excluded).
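
For readers unfamiliar with the method, the CG solver with a diagonal preconditioner mentioned above can be sketched in a few lines. This is a minimal single-process illustration in plain Python, not the SpeedIT or OpenFOAM implementation; the function names, the tolerance, and the 1D Poisson test matrix are assumptions chosen for the example.

```python
import math


def pcg_diagonal(matvec, diag, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A using CG
    with a diagonal (Jacobi) preconditioner M = diag(A)."""
    x = [0.0] * len(b)
    r = list(b)                               # residual for x = 0
    z = [ri / di for ri, di in zip(r, diag)]  # z = M^-1 r
    p = list(z)
    rz = sum(ri * zi for ri, zi in zip(r, z))
    if rz == 0.0:                             # b is zero, so x = 0
        return x, 0
    b_norm = math.sqrt(sum(bi * bi for bi in b))

    for iteration in range(1, max_iter + 1):
        ap = matvec(p)
        alpha = rz / sum(pi * api for pi, api in zip(p, ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        if math.sqrt(sum(ri * ri for ri in r)) / b_norm < tol:
            return x, iteration
        z = [ri / di for ri, di in zip(r, diag)]
        rz_new = sum(ri * zi for ri, zi in zip(r, z))
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


def laplace_1d(v):
    """Apply the 1D Poisson matrix tridiag(-1, 2, -1) to v."""
    n = len(v)
    return [2.0 * v[i]
            - (v[i - 1] if i > 0 else 0.0)
            - (v[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]


if __name__ == "__main__":
    n = 50
    b = [1.0] * n
    x, iters = pcg_diagonal(laplace_1d, [2.0] * n, b)
    residual = max(abs(bi - ai) for bi, ai in zip(b, laplace_1d(x)))
    print(f"converged in {iters} iterations, max residual {residual:.2e}")
```

Each iteration consists of one matrix-vector product, a few dot products, and vector updates, all of which parallelise well on a GPU. The diagonal preconditioner is cheap to apply but does little to reduce the iteration count, which is consistent with the much smaller gain reported against GAMG.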

Figure 3: Acceleration, defined as the ratio nGPU vs. nCPU, for motorBike, measured against CPU runs that solve the pressure equation with either CG with a diagonal preconditioner or GAMG.


We gratefully acknowledge NVIDIA and CINECA for their support in performing the simulations, and SGI for providing the test case.


  1. This offering is not approved or endorsed by OpenCFD Limited, the producer of the OpenFOAM software and owner of the OPENFOAM®  and OpenCFD®  trade marks (see the Disclaimer).
  2. The views and statements expressed in this blog are of Vratis Ltd. and are not necessarily the views of or endorsement by 3rd parties named in this activity.
  3. OPENFOAM®  is a registered trade mark of OpenCFD Limited, the producer of the OpenFOAM software.