SpeedIT FLOW accelerates OpenFOAM


Our recent findings indicate that SpeedIT alone cannot accelerate OpenFOAM (and probably other CFD codes) to a satisfactory extent. If you follow our recent reports, you will see that SpeedIT is attractive on desktop computers but performs worse when compared to server-class CPUs such as Intel Xeon. The reason for such mild acceleration is Amdahl’s Law, which states that the overall speedup is bounded by the fraction of the code that cannot be parallelized. Since in non-linear Navier-Stokes solvers only a fragment of the algorithm, the iterative linear solver, runs on the GPU, the acceleration is limited. The only reasonable solution is to implement the whole algorithm on the GPU.
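To make the Amdahl's Law argument concrete, here is a minimal Python sketch. The 60% figure for the share of runtime spent in the linear solver is a hypothetical number chosen for illustration only, not a measurement from OpenFOAM or SpeedIT, and the function names are our own.

```python
def amdahl_speedup(accelerated_fraction, accelerator_gain):
    """Overall speedup when only `accelerated_fraction` of the runtime
    is sped up by a factor of `accelerator_gain`."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must lie in [0, 1]")
    if accelerator_gain <= 0.0:
        raise ValueError("accelerator_gain must be positive")
    serial_fraction = 1.0 - accelerated_fraction
    return 1.0 / (serial_fraction + accelerated_fraction / accelerator_gain)


def amdahl_limit(accelerated_fraction):
    """Upper bound on the speedup as the accelerated part becomes infinitely fast."""
    serial_fraction = 1.0 - accelerated_fraction
    return float("inf") if serial_fraction == 0.0 else 1.0 / serial_fraction


if __name__ == "__main__":
    # Hypothetical profile: the linear solver takes 60% of the total runtime.
    solver_share = 0.6
    for gpu_gain in (2.0, 10.0, 100.0):
        print(gpu_gain, round(amdahl_speedup(solver_share, gpu_gain), 2))
    print("limit", round(amdahl_limit(solver_share), 2))
```

Under this assumed profile, the overall speedup stays below 2.5x however fast the GPU linear solver becomes, which is why moving the whole algorithm to the GPU is the more promising route.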

SpeedIT FLOW is a newly developed solver for incompressible transient and steady-state laminar flows that supports 3D unstructured grids and the OpenFOAM format. It has been fully implemented on the GPU using CUDA. The solver currently implements the PISO and SIMPLE algorithms and a selection of boundary conditions, and it has been thoroughly tested on a number of OpenFOAM cases. The maximal acceleration is up to 3.5x compared to OpenFOAM run on a multi-core Intel Xeon with 12 MPI processes and the fastest available multigrid solver (GAMG). To our knowledge, SpeedIT FLOW is the fastest publicly available accelerator of OpenFOAM.


Our approach takes advantage of the structure of the tetrahedral mesh itself as well as its neighborhood properties. A special data format has been designed so that memory is accessed efficiently in a coalesced way. More information about the prototype of the technology is available in Int J Comp. Fluid Dynamics.
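The SpeedIT FLOW data format itself is not described here, so the following Python sketch only illustrates the general idea of coalesced access on a generic example. It counts how many memory segments one warp touches when 32 threads each read one velocity component of consecutive cells. The two layouts, the function names, and the 128-byte segment size are assumptions made for illustration.

```python
WARP_SIZE = 32        # threads that issue a memory request together
FLOAT_BYTES = 4       # single-precision value
SEGMENT_BYTES = 128   # assumed size of one aligned memory transaction


def segments_touched(byte_addresses):
    """Count the distinct aligned memory segments covered by a warp's loads."""
    return len({addr // SEGMENT_BYTES for addr in byte_addresses})


def aos_addresses(first_cell, component, n_components=3):
    """Array of structures: (ux, uy, uz) of one cell are stored side by side."""
    return [((first_cell + lane) * n_components + component) * FLOAT_BYTES
            for lane in range(WARP_SIZE)]


def soa_addresses(first_cell, component, n_cells):
    """Structure of arrays: all ux values first, then all uy, then all uz."""
    return [(component * n_cells + first_cell + lane) * FLOAT_BYTES
            for lane in range(WARP_SIZE)]


if __name__ == "__main__":
    # Each thread of the warp reads ux of its own cell, starting at cell 0.
    print("AoS segments:", segments_touched(aos_addresses(0, 0)))
    print("SoA segments:", segments_touched(soa_addresses(0, 0, n_cells=1024)))
```

In this toy model the structure-of-arrays layout serves the whole warp from a single segment, while the array-of-structures layout spreads the same request over several segments. A format designed for coalescing aims for the former pattern.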


Fig. 1 presents the acceleration of SpeedIT FLOW on a single NVIDIA Tesla vs. OpenFOAM run on an Intel Xeon with 12 cores. Tab. 1 presents the duration of selected test cases: lid-driven cavity flow with a varying number of cells, Poiseuille flow, and blood flow through coronary arteries.


Fig. 1. Acceleration of SpeedIT FLOW run on NVIDIA Tesla C2050 vs. OpenFOAM run on Intel Xeon with 12 cores.

                   PISO                                      SIMPLE
                   SpeedIT FLOW   SpeedIT FLOW   Intel Xeon  SpeedIT FLOW   SpeedIT FLOW   Intel Xeon
Case               diagonal+CG    AMG+CG         GAMG        diagonal+CG    AMG+CG         GAMG
cavity3D, 10K      190.0          172.0          33.3        13.9           13.5           0.8
cavity3D, 100K     655.0          445.0          379.7       123.0          81             69.1
cavity3D, 1M       4026.0         1542.0         5093.3      2062.0         821            2773.2
coronary_artery    3436.0         1077.0         1114        348.0          140            158.4
Poiseuille Flow    55877.0        1776.0         4182.4

Table 1. Duration of the simulations in seconds. SpeedIT FLOW run on NVIDIA Tesla C2050 vs. OpenFOAM run on Intel Xeon with 12 cores.
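The acceleration can be recomputed from the durations in Table 1. The Python sketch below assumes that each group of three columns reads as SpeedIT FLOW with diagonal+CG, SpeedIT FLOW with AMG+CG, and OpenFOAM on Intel Xeon with GAMG, and that the three Poiseuille Flow values belong to the PISO group. It compares the AMG+CG column with the GAMG column; the function names are our own.

```python
# Durations in seconds, copied from Table 1.
# Each entry: (SpeedIT FLOW with AMG+CG, OpenFOAM on Intel Xeon with GAMG).
# Assumption: the Poiseuille Flow row belongs to the PISO columns.
DURATIONS = {
    ("PISO", "cavity3D, 10K"): (172.0, 33.3),
    ("PISO", "cavity3D, 100K"): (445.0, 379.7),
    ("PISO", "cavity3D, 1M"): (1542.0, 5093.3),
    ("PISO", "coronary_artery"): (1077.0, 1114.0),
    ("PISO", "Poiseuille Flow"): (1776.0, 4182.4),
    ("SIMPLE", "cavity3D, 10K"): (13.5, 0.8),
    ("SIMPLE", "cavity3D, 100K"): (81.0, 69.1),
    ("SIMPLE", "cavity3D, 1M"): (821.0, 2773.2),
    ("SIMPLE", "coronary_artery"): (140.0, 158.4),
}


def speedup(gpu_seconds, cpu_seconds):
    """Ratio above 1 means the GPU run finished faster than the CPU run."""
    return cpu_seconds / gpu_seconds


def speedups(durations):
    """Speedup for every (algorithm, case) pair."""
    return {key: speedup(gpu, cpu) for key, (gpu, cpu) in durations.items()}


def best_case(durations):
    """The (algorithm, case) pair with the largest speedup."""
    table = speedups(durations)
    return max(table, key=table.get)


if __name__ == "__main__":
    for (algorithm, case), value in speedups(DURATIONS).items():
        print(f"{algorithm:7s} {case:16s} {value:5.2f}x")
    print("best:", best_case(DURATIONS))
```

Under this reading of the table, the GPU solver is slower than the CPU for the smallest cavity case and a little over 3x faster for the 1M-cell cavity, in line with the dependence on case size discussed in this article.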


In order to validate the solver numerically, the results were compared with the results from the same tests run in OpenFOAM with both SIMPLE and PISO solvers.

Tab. 2 and Tab. 3 present the norm of the difference between the OpenFOAM and SpeedIT FLOW results. The norm is defined as the maximal absolute difference, taken over all cells, between the pressure (p norm) or velocity (U norm) fields computed by the two solvers.

Case               t [sec]   p norm     U norm
cavity3D, 1M       0.32      7.64e-06   4.73e-06
cavity3D, 100K     0.32      2.33e-07   4.48e-07
cavity3D, 1000     0.32      1.39e-08   1.30e-09
coronary_artery    0.2       1.17e-06   2.60e-05
Poiseuille Flow    0.5       2.46e-08   1.80e-07

Table 2: Largest absolute difference in velocity magnitude and pressure between SpeedIT FLOW and OpenFOAM for time-dependent flows (PISO).

Case               t [sec]   p norm     U norm
cavity3D, 1M       0.32      1.97e-04   8.41e-04
cavity3D, 100K     0.32      4.23e-05   1.78e-04
cavity3D, 1000     0.32      2.78e-06   9.48e-06
coronary_artery    0.2       3.09e-06   7.25e-05

Table 3: Largest absolute difference in velocity magnitude and pressure between SpeedIT FLOW and OpenFOAM for stationary flows (SIMPLE).
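The norm reported in Tab. 2 and Tab. 3 can be sketched in a few lines of Python. This is a minimal illustration, assuming both solvers store one pressure value and one velocity vector per cell in the same cell order, and that the U norm compares velocity magnitudes as the table captions suggest. The function names and the sample values are hypothetical.

```python
import math


def max_abs_difference(field_a, field_b):
    """Largest absolute cell-by-cell difference between two scalar fields."""
    if len(field_a) != len(field_b):
        raise ValueError("both fields must cover the same cells")
    return max(abs(a - b) for a, b in zip(field_a, field_b))


def velocity_magnitudes(vectors):
    """Magnitude of each (ux, uy, uz) cell value."""
    return [math.sqrt(ux * ux + uy * uy + uz * uz) for ux, uy, uz in vectors]


if __name__ == "__main__":
    # Hypothetical three-cell fields standing in for the two solvers' output.
    p_reference = [0.0, 1.0, 2.0]
    p_candidate = [0.0, 1.0 + 1e-6, 2.0 - 3e-6]
    u_reference = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
    u_candidate = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0 + 2e-6)]

    print("p norm:", max_abs_difference(p_reference, p_candidate))
    print("U norm:", max_abs_difference(velocity_magnitudes(u_reference),
                                        velocity_magnitudes(u_candidate)))
```

Because the maximum is taken over all cells, a single badly matching cell determines the reported value, which makes this a strict measure of agreement.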

The next two figures present line plots of the OpenFOAM and SpeedIT FLOW results for cavity3D and Poiseuille flow, run with SIMPLE and PISO, respectively.


Geometry in 3D cavity


Poiseuille flow


SpeedIT FLOW is a 3D solver for incompressible, laminar, transient and steady-state flows, fully implemented on the GPU. The results clearly show that the achieved acceleration depends strongly on the size of the case and the number of iterations per time step. For the cavity3D case, the performance is reasonable when the original mesh has about a million cells. The acceleration is also acceptable for time-dependent flows.

Unfortunately, because the AMG implementation used requires a lot of memory, the largest case that fits into GPU memory has about 4.74 million cells. Therefore, the next goal is to add multi-GPU functionality and a more efficient AMG implementation.

SpeedIT FLOW Features

  • Unstructured 3D Mesh Support
  • Incompressible, laminar transient and steady-state flows
  • Boundary conditions: time-varying inlet conditions, fixed value, groovyBC, totalPressure.
  • Supports OpenFOAM Format.


  • Linux (x86, x86-64 and Itanium).
  • NVIDIA GPU with compute capability 2.0.

More information: info (at) vratis.com or sales (at) vratis.com


SpeedIT FLOW is in an alpha version. Any suggestions or remarks from interested parties will be gratefully acknowledged.

None of the OPENFOAM® related products and services offered by Vratis Limited Sp. z o.o. are approved or authorized by OpenCFD Ltd. (ESI Group), owner of the OPENFOAM® and OpenCFD® trade marks and producer of the OpenFOAM software.