SpeedIT FLOW accelerates OpenFOAM


Our recent findings indicate that SpeedIT alone cannot accelerate OpenFOAM (and probably other CFD codes) to a satisfactory extent. If you follow our recent reports, you will see that SpeedIT is attractive for desktop computers but performs worse when compared to server-class CPUs, such as Intel Xeon. The reason for such mild acceleration is Amdahl's Law, which states that the achievable speedup is bounded by the fraction of the code that cannot be parallelized. Since in non-linear Navier-Stokes solvers only a fragment of the algorithm (the iterative linear solvers) is run on the GPU, the acceleration is limited. The only reasonable solution is to implement the whole algorithm on the GPU.
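As a rough illustration of the Amdahl's Law bound, the sketch below computes the overall speedup when only part of the runtime is accelerated. The 60% solver share and the 10x solver speedup are hypothetical numbers chosen for the example, not measurements from our tests.

```python
def amdahl_speedup(accelerated_fraction: float, component_speedup: float) -> float:
    """Overall speedup when only part of the runtime is accelerated (Amdahl's Law)."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must lie in [0, 1]")
    if component_speedup <= 0.0:
        raise ValueError("component_speedup must be positive")
    serial_fraction = 1.0 - accelerated_fraction
    return 1.0 / (serial_fraction + accelerated_fraction / component_speedup)


# Hypothetical numbers: the linear solvers take 60% of the runtime
# and the GPU makes them 10 times faster.
overall = amdahl_speedup(0.6, 10.0)
# Upper bound if the solvers took no time at all: 1 / (1 - 0.6).
limit = 1.0 / (1.0 - 0.6)
print(f"overall speedup: {overall:.2f}x, upper bound: {limit:.2f}x")
# prints "overall speedup: 2.17x, upper bound: 2.50x"
```

Under these assumptions, even an infinitely fast GPU solver could not push the whole simulation beyond 2.5x, which is why moving the entire algorithm to the GPU matters.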

SpeedIT FLOW is a newly developed solver for incompressible transient and steady-state laminar flows that supports 3D unstructured grids and the OpenFOAM format. It has been fully implemented on the GPU using CUDA. The solver currently implements the PISO and SIMPLE algorithms and a selection of boundary conditions, and it has been tested on a number of OpenFOAM cases. The maximum speedup is about 3.5x compared to OpenFOAM run on a multi-core Intel Xeon with 12 MPI threads and the fastest available multigrid solver (GAMG). To our knowledge, SpeedIT FLOW is the fastest publicly available accelerator for OpenFOAM.


Our approach takes advantage of the structure of the tetrahedral mesh itself as well as its neighborhood properties. A special data format has been designed so that memory is accessed efficiently in a coalesced way. More information about the prototype of the technology is available in the International Journal of Computational Fluid Dynamics.


Fig. 1 presents the acceleration of SpeedIT FLOW on a single NVIDIA Tesla vs. OpenFOAM run on an Intel Xeon with 12 cores. Tab. 1 presents the duration of selected test cases: lid-driven cavity flow with a varying number of cells, Poiseuille flow, and blood flow through coronary arteries.


Fig. 1. Acceleration of SpeedIT FLOW run on NVIDIA Tesla C2050 vs. OpenFOAM run on Intel Xeon with 12 cores.

                     PISO                                     SIMPLE
                     SpeedIT FLOW  SpeedIT FLOW  Intel Xeon   SpeedIT FLOW  SpeedIT FLOW  Intel Xeon
Case                 diagonal+CG   AMG+CG        GAMG         diagonal+CG   AMG+CG        GAMG
cavity3D, 10K        190.0         172.0         33.3         13.9          13.5          0.8
cavity3D, 100K       655.0         445.0         379.7        123.0         81            69.1
cavity3D, 1M         4026.0        1542.0        5093.3       2062.0        821           2773.2
coronary_artery      3436.0        1077.0        1114         348.0         140           158.4
Poiseuille Flow      55877.0       1776.0        4182.4

Table 1. Duration of the simulations in seconds. SpeedIT FLOW run on NVIDIA Tesla C2050 vs. OpenFOAM run on Intel Xeon with 12 cores.
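The speedups implied by Table 1 can be recomputed with a few lines of Python. The sketch below copies the durations from the table and divides the OpenFOAM (GAMG) time by the faster of the two SpeedIT FLOW configurations; the dictionary layout and the helper name `speedups` are our own choices for illustration.

```python
# Wall-clock times in seconds copied from Table 1, per case:
# (SpeedIT FLOW diagonal+CG, SpeedIT FLOW AMG+CG, OpenFOAM GAMG on Intel Xeon).
piso = {
    "cavity3D, 10K": (190.0, 172.0, 33.3),
    "cavity3D, 100K": (655.0, 445.0, 379.7),
    "cavity3D, 1M": (4026.0, 1542.0, 5093.3),
    "coronary_artery": (3436.0, 1077.0, 1114.0),
    "Poiseuille Flow": (55877.0, 1776.0, 4182.4),
}
simple = {
    "cavity3D, 10K": (13.9, 13.5, 0.8),
    "cavity3D, 100K": (123.0, 81.0, 69.1),
    "cavity3D, 1M": (2062.0, 821.0, 2773.2),
    "coronary_artery": (348.0, 140.0, 158.4),
}


def speedups(times):
    """Return the OpenFOAM time divided by the faster SpeedIT FLOW time, per case."""
    return {
        case: cpu / min(gpu_diag, gpu_amg)
        for case, (gpu_diag, gpu_amg, cpu) in times.items()
    }


for label, table in (("PISO", piso), ("SIMPLE", simple)):
    for case, ratio in speedups(table).items():
        print(f"{label:6s} {case:16s} {ratio:5.2f}x")
```

Values above 1 mean SpeedIT FLOW was faster. With these numbers the one-million-cell cavity case comes out at roughly 3.3x for both PISO and SIMPLE, while the 10K-cell case is slower on the GPU than on the CPU.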


In order to validate the solver numerically, the results were compared with the results from the same tests run in OpenFOAM with both SIMPLE and PISO solvers.

Tab. 2 and Tab. 3 present the norm of the difference between the OpenFOAM and SpeedIT FLOW results. The norm is defined as the maximum absolute difference, taken over all cells, between the pressure (or velocity) field computed by OpenFOAM and the one computed by SpeedIT FLOW.
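Written out, the norm described above could look as follows. The superscripts OF and SF (for OpenFOAM and SpeedIT FLOW) and the cell index i are notation introduced here; comparing velocity magnitudes rather than individual components is an assumption based on the table captions.

```latex
\[
p_{\mathrm{norm}} = \max_{i \in \mathrm{cells}} \left| p_i^{\mathrm{OF}} - p_i^{\mathrm{SF}} \right|,
\qquad
U_{\mathrm{norm}} = \max_{i \in \mathrm{cells}} \left| \, \lvert \mathbf{U}_i^{\mathrm{OF}} \rvert - \lvert \mathbf{U}_i^{\mathrm{SF}} \rvert \, \right|
\]
```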

Case               t [sec]   p norm     U norm
cavity3D, 1M       0.32      7.64e-06   4.73e-06
cavity3D, 100K     0.32      2.33e-07   4.48e-07
cavity3D, 1000     0.32      1.39e-08   1.30e-09
coronary_artery    0.2       1.17e-06   2.60e-05
Poiseuille Flow    0.5       2.46e-08   1.80e-07

Table 2: Largest absolute difference in velocity magnitude and pressure between SpeedIT FLOW and OpenFOAM for time-dependent flows (PISO).

Case               t [sec]   p norm     U norm
cavity3D, 1M       0.32      1.97e-04   8.41e-04
cavity3D, 100K     0.32      4.23e-05   1.78e-04
cavity3D, 1000     0.32      2.78e-06   9.48e-06
coronary_artery    0.2       3.09e-06   7.25e-05

Table 3: Largest absolute difference in velocity magnitude and pressure between SpeedIT FLOW and OpenFOAM for stationary flows (SIMPLE).

The next two figures compare line plots from OpenFOAM and SpeedIT FLOW for the 3D cavity and Poiseuille flow cases, run with SIMPLE and PISO, respectively.


Geometry in 3D cavity


Poiseuille flow


SpeedIT FLOW is a 3D solver for incompressible, laminar, transient and steady-state flows, fully implemented on the GPU. The results clearly show that the achieved acceleration depends strongly on the size of the case and the number of iterations per time step. For the cavity3D case the performance is reasonable when the original mesh has about a million cells. The acceleration is also acceptable for time-dependent flows.

Unfortunately, because the AMG implementation we use requires a lot of memory, the largest case that fits into GPU memory has about 4.74 million cells. Therefore, the next goal is to add multi-GPU functionality and a more memory-efficient AMG implementation.

SpeedIT FLOW Features

  • Unstructured 3D Mesh Support
  • Incompressible, laminar transient and steady-state flows
  • Boundary conditions: time varying inlet conditions, fixed value, groovyBC, totalPressure.
  • Supports OpenFOAM Format.


  • Linux (x86, x86-64 and Itanium).
  • NVIDIA GPU with compute capability 2.0

More information: info (at) vratis.com or sales (at) vratis.com


SpeedIT FLOW is in alpha version. Any suggestions or remarks from interested parties will be gratefully acknowledged.

None of the OPENFOAM® related products and services offered by Vratis Limited Sp. z o.o. are approved or authorized by OpenCFD Ltd. (ESI Group), owner of the OPENFOAM® and OpenCFD® trade marks and producer of the OpenFOAM software.

SpeedIT FLOW Benchmark Test

This presentation shows our recent benchmark test where we compare SpeedIT FLOW running on a single Tesla M2050 GPU card vs. OpenFOAM running on 12 CPU threads (Intel Xeon E5649).

SpeedIT 2.4 vs. OpenFOAM


SpeedIT 2.4 is the next version of leading software for accelerating CFD on GPUs. The results show that SpeedIT is a good choice for users with desktop computers who want to accelerate OpenFOAM on their machines. Users with server-class CPUs should follow the development of SpeedIT FLOW.

SpeedIT 2.4 Features:
– OpenCL version of Conjugate Gradient, BiConjugate Gradient together with diagonal preconditioner.
– OpenCL version of Sparse Matrix-Vector Multiplication.

The performance has been tested on three cases: an external flow simulation over a simplified car model (the Ahmed body) with 1.37M cells, and blood flow simulations through the basilar and carotid arteries.


Fig. Acceleration of OpenFOAM on GPU using SpeedIT. On CPUs OpenFOAM was run with 4 MPI threads and GAMG.


Fig. Acceleration of OpenFOAM on GPU using SpeedIT. On CPUs OpenFOAM was run with 4 MPI threads and GAMG.

Fig. Acceleration of OpenFOAM on GPU using SpeedIT. On CPUs OpenFOAM was run with 4 MPI threads and GAMG.


SpeedIT successfully accelerates realistic simulations run on desktop machines to a satisfactory extent. However, in cases where the iterative solvers need only a few iterations, running them on the GPU does not bring a high speedup. Server-class CPUs are still beyond the reach of SpeedIT. The alternative approach, where the solver runs fully on the GPU, is much more effective (see SpeedIT FLOW).

How to run SpeedIT with OpenFOAM?


The SpeedIT plugin for OpenFOAM is a set of libraries that allows you to accelerate OpenFOAM on the GPU. SpeedIT unlocks the computational power lying dormant in NVIDIA Graphics Processing Units (GPUs) that support CUDA technology. The SpeedIT library provides the following accelerated solvers and functions for sparse linear systems of equations:

  • Preconditioned Conjugate Gradient
  • Preconditioned Stabilized Bi-Conjugate Gradient
  • Accelerated Sparse Matrix-Vector Multiplication
  • Diagonal Preconditioner
  • Algebraic Multigrid (AMG) based on Smoothed Aggregation
  • Approximate Inverse (AINV)
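To make the first and fourth items of the list concrete, here is a minimal CPU-only sketch of the conjugate gradient method with a diagonal (Jacobi) preconditioner. It is written in plain Python for readability and is not SpeedIT's implementation; the function names and the small tridiagonal test matrix are assumptions made for this example.

```python
def tridiag_matvec(diag, off, v):
    """Multiply a symmetric tridiagonal matrix (constant off-diagonal) by v."""
    n = len(v)
    out = []
    for i in range(n):
        s = diag[i] * v[i]
        if i > 0:
            s += off * v[i - 1]
        if i < n - 1:
            s += off * v[i + 1]
        out.append(s)
    return out


def pcg_diagonal(matvec, diag, b, tol=1e-10, max_iter=1000):
    """Solve A x = b (A symmetric positive definite) with Jacobi-preconditioned CG.

    matvec(v) returns A times v; diag holds the diagonal of A.
    Returns the solution and the number of iterations used.
    """
    x = [0.0] * len(b)
    r = list(b)                                  # residual b - A x for x = 0
    b_norm = sum(bi * bi for bi in b) ** 0.5 or 1.0
    if sum(ri * ri for ri in r) ** 0.5 <= tol * b_norm:
        return x, 0
    z = [ri / di for ri, di in zip(r, diag)]     # preconditioning: z = D^-1 r
    p = list(z)
    rz = sum(ri * zi for ri, zi in zip(r, z))
    for k in range(1, max_iter + 1):
        ap = matvec(p)
        alpha = rz / sum(pi * api for pi, api in zip(p, ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        if sum(ri * ri for ri in r) ** 0.5 <= tol * b_norm:
            return x, k
        z = [ri / di for ri, di in zip(r, diag)]
        rz_new = sum(ri * zi for ri, zi in zip(r, z))
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


# Small test system whose exact solution is a vector of ones.
diag = [2.0, 3.0, 4.0, 5.0, 6.0]
b = [1.0, 1.0, 2.0, 3.0, 5.0]
x, iterations = pcg_diagonal(lambda v: tridiag_matvec(diag, -1.0, v), diag, b)
print(iterations, [round(xi, 6) for xi in x])
```

Each iteration is dominated by one matrix-vector product and a handful of vector operations, which is the part a GPU library can run in parallel.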


Software dependencies


OpenFOAM is an environment where SpeedIT plugin operates. OpenFOAM can be downloaded from http://www.openfoam.com/download/. Install OpenFOAM by following the instructions on the OpenFOAM page.

IMPORTANT: Make sure you have completed the step that sets the OpenFOAM environment variables.


Download the CUDA library from http://developer.nvidia.com/cuda-downloads and install it. Add the CUDA include directory to your PATH variable.


Depending on your system (32-bit or 64-bit), add the CUDA lib or lib64 directory to LD_LIBRARY_PATH.



The OpenFOAM plugin requires SpeedIT to work. SpeedIT is available commercially, and SpeedIT Classic, with limited functionality, can be downloaded at no cost. SpeedIT can be downloaded from http://speedit.vratis.com


Cuwrap is an intermediate library that provides compatibility between the CUDA and OpenFOAM interfaces. It is distributed with the OpenFOAM plugin; you can find it in the folder cuwrap. You must build this library if you want to use SpeedIT with OpenFOAM.

To build this library, first open the Makefile in the cuwrap folder.

Depending on your configuration, set the proper paths to the CUDA environment.
For a 32-bit system and a default CUDA installation, the header of the file should look as follows:






For 64-bit systems:






After setting the paths, run the make command from the cuwrap folder to build the library. If OpenFOAM is configured properly, the library should be created inside the $FOAM_USER_LIBBIN folder.

Plugin Installation

  1. Create the directory $HOME/OpenFOAM.
  2. Create additional directories by typing:
    mkdir -p $WM_PROJECT_USER_DIR && mkdir -p $FOAM_RUN
  3. From the plugin directory, run:
    ./Allwmake COMMERCIAL
  4. If the compilation completes successfully, you should have the new files libexternalsolv.so and libcuwrap.so in the $FOAM_USER_LIBBIN directory.

Plugin use

  1. Copy the following libraries (or create symbolic links to them) into the $FOAM_USER_LIBBIN directory:

  • libcublas.so

  • libcudart.so

  • libcuwrap.so

  • libspeedit.so

libcublas.so and libcudart.so come from the NVIDIA CUDA toolkit.

libcuwrap.so is distributed with the plugin.

NOTE: Remember to use the library versions that match your system architecture: 32-bit libraries on 32-bit operating systems and 64-bit libraries on 64-bit operating systems.

    1. Go to the directory with your OpenFOAM case, e.g. $FOAM_RUN/tutorials/incompressible/icoFoam/cavity
    2. Append


      to the end of the system/controlDict file of every OpenFOAM case for which you want to use the external, accelerated solvers.

  • In the file system/fvSolution, change the names of the solvers for which you want to enable acceleration. Remember to use the proper names for the accelerated solvers. You may replace:

    PBiCG with SI_PBiCG
    PCG   with SI_PCG
  • For the accelerated solvers, choose an appropriate preconditioner in the file system/fvSolution. You may use the following preconditioners:

    1. SI_DIAGONAL – Diagonal preconditioner

    2. SI_AMG – Algebraic Multigrid preconditioner

    3. SI_AINV – Approximate Inverse preconditioner

    4. SI_AINV_SC – Approximate Inverse Scaled preconditioner

    5. SI_AINV_NS – Approximate Inverse Non-Symmetric preconditioner

  • When accelerated solvers are used, you have to specify the additional keyword "matrix" in the solver definition. It can take two values, CSR or CMR, which stand for:

    1. CSR – Compressed Sparse Row format.

    2. CMR – Compressed Multi-Row Storage format (see our article for details)

When CSR is used, all the preconditioners listed above are allowed. When the CMR format is used, only SI_DIAGONAL currently works.
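For readers unfamiliar with the CSR layout, the short sketch below stores a 3x3 matrix in three arrays and multiplies it by a vector. It is a plain Python illustration of the format only, not the plugin's GPU code; the array names `values`, `col_idx` and `row_ptr` are conventional choices rather than identifiers from SpeedIT.

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A x for a matrix A stored in Compressed Sparse Row form."""
    y = []
    for row in range(len(row_ptr) - 1):
        # Non-zeros of this row live in values[start:end].
        start, end = row_ptr[row], row_ptr[row + 1]
        y.append(sum(values[k] * x[col_idx[k]] for k in range(start, end)))
    return y


# Example matrix:
# [ 4 -1  0]
# [-1  4 -1]
# [ 0 -1  4]
values = [4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0]   # non-zero entries, row by row
col_idx = [0, 1, 0, 1, 2, 1, 2]                    # column of each non-zero entry
row_ptr = [0, 2, 5, 7]                             # where each row starts in values

y = csr_matvec(values, col_idx, row_ptr, [1.0, 2.0, 3.0])
print(y)  # prints [2.0, 4.0, 10.0]
```

Only the non-zero entries are stored, so memory use and work per product scale with the number of non-zeros rather than with the square of the matrix size.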

Run icoFoam from $FOAM_RUN/tutorials/incompressible/icoFoam/cavity.

The accelerated solvers should now be available.

Example of fvSolution:

/*--------------------------------*- C++ -*----------------------------------*\
| =========                 |                                                 |
| \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox           |
|  \\    /   O peration     | Version:  1.7.1                                 |
|   \\  /    A nd           | Web:      www.OpenFOAM.com                      |
|    \\/     M anipulation  |                                                 |
\*---------------------------------------------------------------------------*/
FoamFile
{
    version     2.0;
    format      ascii;
    class       dictionary;
    location    "system";
    object      fvSolution;
}
// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //

solvers
{
    // Pressure: accelerated CG with the AMG preconditioner
    p
    {
        solver          SI_PCG;
        preconditioner  SI_AMG;
        matrix          CSR;
        tolerance       1e-06;
        relTol          0;
    }

    // Velocity: accelerated BiCG with the diagonal preconditioner
    U
    {
        solver          SI_PBiCG;
        preconditioner  SI_DIAGONAL;
        matrix          CSR;
        tolerance       1e-05;
        relTol          0;
    }
}

PISO
{
    nCorrectors     2;
    nNonOrthogonalCorrectors 0;
    pRefCell        0;
    pRefValue       0;
}

// ************************************************************************* //

Acceleration of OpenFOAM with SpeedIT 2.1

Comparison to GAMG and DIC preconditioners

Vratis Ltd., Wroclaw, Poland
April 5, 2012

1. Objective

OpenFOAM® simulations take a significant amount of time, which leads to higher simulation costs. GPGPU technology has the potential to overcome this problem. As a solution, we propose to use SpeedIT technology, which replaces the iterative solvers in OpenFOAM with their GPU-accelerated versions. In the following tests we accelerate the solution of the pressure equation, which usually takes most of the time in simulations of incompressible flows. We compare the performance of OpenFOAM with SpeedIT run on the GPU to standard OpenFOAM run on the CPU using various preconditioners, on a typical PC equipped with a CUDA-compatible NVIDIA GPU card. This report also presents the new version, SpeedIT 2.1, which contains a new set of preconditioners.

2. Methodology

SpeedIT is a library that implements a set of accelerated solvers with various preconditioners. Thanks to the CUSP library, SpeedIT 2.1 can use an algebraic multigrid preconditioner with smoothed aggregation (AMG). This preconditioner significantly reduces the number of iterations during the pressure calculation, which shortens the calculation time. The SpeedIT plugin for OpenFOAM® was used to substitute OpenFOAM's iterative solvers with those provided by SpeedIT. Tests were performed on the following machines:

A) CPU: Intel Core 2 Duo E8400, 3 GHz, 8 GB RAM @ 800 MHz
   GPU: NVIDIA GTX 460, 1 GB VRAM
   Software: Ubuntu 11.04 x64, OpenFOAM 2.0.1, CUDA Toolkit 4.1
B) CPU: Intel Q8400, 2.66 GHz, 8 GB RAM @ 800 MHz
   GPU: NVIDIA Tesla C2070, 6 GB VRAM
   Software: Ubuntu 11.04 x64, OpenFOAM 2.0.1, CUDA Toolkit 4.1

To solve the pressure equation with OpenFOAM on the CPU, either the GAMG solver or CG with the DIC preconditioner was used, on different numbers of cores. On the GPU, SpeedIT was run together with the AMG preconditioner. We tested the following cases for a fixed number of time steps.

  1. Cavity 3D, 512K cells, icoFoam, on 1 and 2 cores with the PCG solver and DIC preconditioner, the GAMG solver, FDIC preconditioner, Gauss-Seidel smoother, and SpeedIT 2.1 with the AMG preconditioner.
    Picture 1. Cavity 3D, velocity streamlines.
  2. Aorta, 200K cells, simpleFoam, on 1 and 2 cores with the PCG solver and DIC preconditioner, the GAMG solver, FDIC preconditioner, Gauss-Seidel smoother, and SpeedIT 2.1 with the AMG preconditioner.
    Picture 2. Aorta, velocity streamlines.
  3. Ahmed case with 2.5M cells, simulated with the original simpleFoam on 1, 2, 3 and 4 cores with the GAMG solver and Gauss-Seidel smoother, and with SpeedIT with the AMG preconditioner.
    Picture 3. Ahmed 25º, velocity streamlines and pressure field.

Cases 1 and 2 were executed on machine A, and case 3 on machine B.

3. Validation

To validate our solution, we plotted the pressure field along the x axis for cases 1 and 2. Figs. 1-3 show that the solutions obtained with the different preconditioners agree.

Figure 1. Cavity 3D cross section along x axis. Solution for all preconditioners.
Figure 2. Aorta cross section along x axis. Solution for all preconditioners
Figure 3. Aorta cross section along x axis. Solution for all preconditioners.

4. Results
Cavity 3D

Figure 4. Execution time of the Cavity 3D case for different preconditioners and numbers of cores.
Figure 5. Mean number of iterations for GAMG, AMG and DIC preconditioner during pressure calculations.
Figure 6. Acceleration defined as a ratio SpeedIT vs CPU with different preconditioners


Aorta

Figure 7. Execution time of the Aorta case for different preconditioners and numbers of cores.
Figure 8. Mean number of iterations for GAMG, SpeedIT with AMG and DIC preconditioner during pressure calculations
Figure 9. Acceleration defined as a ratio GPU (SpeedIT) vs. CPU with different preconditioners.

Ahmed 25º

Figure 10. Execution time for Ahmed case for GPU with AMG preconditioner and different number of cores with GAMG solver

Figs. 1-3 indicate that SpeedIT leads to the same solution as OpenFOAM. The new SpeedIT AMG preconditioner can be competitive with the OpenFOAM GAMG solver running on a 1- or 2-core CPU. The main advantage of the AMG preconditioner is that it significantly reduces the number of iterations needed to solve the pressure equation. Compared to the widely used DIC preconditioner, SpeedIT 2.1 needs about 10 times fewer iterations (Fig. 5 and Fig. 8), which results in a speedup of up to 3.5x. Interestingly, we found that GAMG fails when calculations are performed in single precision, while AMG still works. Fig. 11 presents the mean number of iterations for the Cavity 3D case in single precision. The GAMG solver needs as many as 1000 iterations during the pressure field calculations.

Figure 11. Mean number of iterations for Cavity 3D case in single precision.

5. Acknowledgments

We would like to thank NVIDIA for hardware support and 4-ID network for providing the Ahmed test case. Ahmed test case was based on Motorbike tutorial from OpenFOAM 2.0. We also acknowledge Dominik Szczerba from IT’IS Foundation for providing the geometry of the human aorta.