<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Where CFD meets GPU</title>
	<atom:link href="http://vratis.com/blog/?feed=rss2" rel="self" type="application/rss+xml" />
	<link>http://vratis.com/blog</link>
	<description>Just another WordPress site</description>
	<lastBuildDate>Thu, 26 Apr 2012 17:46:19 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Acceleration of OpenFOAM with SpeedIT 2.1</title>
		<link>http://vratis.com/blog/?p=119</link>
		<comments>http://vratis.com/blog/?p=119#comments</comments>
		<pubDate>Tue, 24 Apr 2012 09:18:55 +0000</pubDate>
		<dc:creator>marta</dc:creator>
				<category><![CDATA[SpeedIT]]></category>
		<category><![CDATA[AMG]]></category>
		<category><![CDATA[cuda]]></category>
		<category><![CDATA[GAMG]]></category>
		<category><![CDATA[gpu]]></category>
		<category><![CDATA[multigrid]]></category>
		<category><![CDATA[openfoam]]></category>
		<category><![CDATA[speedit]]></category>

		<guid isPermaLink="false">http://vratis.com/blog/?p=119</guid>
		<description><![CDATA[Acceleration of OpenFOAM with SpeedIT 2.1 Comparison to GAMG and DIC preconditioners Vratis Ltd., Wroclaw, Poland April 5, 2012 1. Objective OpenFOAM® simulations take a significant amount of time leading to higher costs of simulations. GPGPU technology has a potential &#8230; <a href="http://vratis.com/blog/?p=119">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p align="center"><strong>Acceleration of OpenFOAM with SpeedIT 2.1<br />
</strong><em>Comparison to GAMG and DIC preconditioners</em></p>
<p>Vratis Ltd., Wroclaw, Poland<br />
April 5, 2012</p>
<p style="text-align: justify;"><strong>1. Objective</strong></p>
<p style="text-align: justify;">OpenFOAM® simulations take a significant amount of time, which raises their cost. GPGPU technology has the potential to overcome this problem. We propose SpeedIT, a technology that replaces the iterative solvers in OpenFOAM with GPU-accelerated versions. In the following tests we accelerate the solution of the pressure equation, which usually takes most of the time in simulations of incompressible flows. We compare the performance of OpenFOAM with SpeedIT running on a GPU against standard OpenFOAM running on a CPU with various preconditioners, on a typical PC equipped with a CUDA-compatible NVIDIA GPU card. This report also presents SpeedIT 2.1, a new version that contains a new set of preconditioners.</p>
<p style="text-align: justify;"><strong>2. Methodology</strong></p>
<p style="text-align: justify;">SpeedIT is a library that implements a set of accelerated solvers with various preconditioners. Thanks to the CUSP library, in <strong>SpeedIT 2.1</strong> we were able to use an algebraic multigrid preconditioner with smoothed aggregation (<strong>AMG</strong>). This preconditioner significantly reduces the number of iterations during the pressure calculation, which implies a shorter calculation time. The SpeedIT Plugin to OpenFOAM® was used to substitute OpenFOAM&#8217;s iterative solvers with the ones provided by SpeedIT. Tests were performed on the following machines:</p>
<p style="text-align: justify;">A) CPU: Intel Core 2 Duo E8400, 3GHz, 8GB RAM @ 800MHz<br />
GPU: Nvidia GTX 460, VRAM 1GB<br />
Software: Ubuntu 11.04 x64, OpenFOAM 2.0.1,<span style="text-align: justify;"> CUDA Toolkit 4.1</span><br />
B) CPU: Intel Q8400, 2.66GHz, 8GB RAM @ 800MHz<br />
GPU: Nvidia Tesla C2070, VRAM: 6GB.<br />
Software: Ubuntu 11.04 x64, OpenFOAM 2.0.1, CUDA Toolkit 4.1</p>
<p style="text-align: justify;"><span style="text-align: left;">To solve the pressure equation with OpenFOAM on the CPU, either the GAMG solver or CG with the DIC preconditioner was used on different numbers of cores. On the GPU, SpeedIT was run with the AMG preconditioner. We tested the following cases for a fixed number of time steps.<br />
</span></p>
<ol>
<li style="text-align: justify;">Cavity 3D, 512K cells, icoFoam, on 1 and 2 cores with PCG solver and DIC preconditioner, GAMG solver, FDIC preconditioner, Gauss-Seidel smoother, and SpeedIT 2.1 with AMG preconditioner.</li>
</ol>
<div>
<div id="attachment_120" class="wp-caption aligncenter" style="width: 310px"><a href="http://vratis.com/blog/wp-content/uploads/2012/04/picture1.jpg"><img class="size-medium wp-image-120" title="Picture 1. Cavity 3D, velocity streamlines" src="http://vratis.com/blog/wp-content/uploads/2012/04/picture1-300x189.jpg" alt="" width="300" height="189" /></a><p class="wp-caption-text">Picture 1. Cavity 3D, velocity streamlines</p></div>
</div>
<ol start="2">
<li style="text-align: justify;">Aorta, 200K cells, simpleFoam, on 1 and 2 cores with PCG solver and DIC preconditioner, GAMG solver, FDIC preconditioner, Gauss-Seidel smoother, and SpeedIT 2.1 with AMG preconditioner.</li>
</ol>
<div style="text-align: justify;">
<div class="mceTemp mceIEcenter" style="text-align: center;">
<dl id="attachment_161" class="wp-caption  aligncenter" style="width: 310px;">
<dt class="wp-caption-dt"><a href="http://vratis.com/blog/wp-content/uploads/2012/04/Picture2_popr.jpg"><img class="size-medium wp-image-161" title="Picture 2. Aorta, velocity streamlines" src="http://vratis.com/blog/wp-content/uploads/2012/04/Picture2_popr-300x205.jpg" alt="" width="300" height="205" /></a></dt>
<dd class="wp-caption-dd">Picture 2. Aorta, velocity streamlines</dd>
</dl>
</div>
</div>
<ol start="3">
<li style="text-align: justify;">Ahmed case with 2.5M cells, simulated with the original simpleFoam on 1, 2, 3 and 4 cores with GAMG solver and Gauss-Seidel smoother, and with SpeedIT with AMG preconditioner.
<div class="mceTemp mceIEcenter" style="text-align: center;">
<dl id="attachment_122" class="wp-caption  aligncenter" style="width: 310px;">
<dt class="wp-caption-dt"><a href="http://vratis.com/blog/wp-content/uploads/2012/04/Picture3.png"><img class="size-medium wp-image-122" title="Picture 3. Ahmed, velocity streamlines and pressure field" src="http://vratis.com/blog/wp-content/uploads/2012/04/Picture3-300x226.png" alt="" width="300" height="226" /></a></dt>
<dd class="wp-caption-dd">Picture 3. Ahmed 25º, velocity streamlines and pressure field.</dd>
</dl>
</div>
</li>
</ol>
<p>Cases 1 and 2 were executed on machine A, and case 3 on machine B.</p>
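<p style="text-align: justify;">To illustrate why the choice of preconditioner changes the iteration count, the following Python sketch runs the conjugate gradient method with and without a preconditioner on a small dense toy matrix. It is not SpeedIT or OpenFOAM code: the function names, the matrix and the simple diagonal (Jacobi) preconditioner, which stands in here for the far more elaborate AMG, are assumptions made for this example. The matrix is deliberately built with badly scaled rows so that diagonal scaling helps.</p>
<pre>
```python
import math


def matvec(A, x):
    """Dense matrix-vector product."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pcg(A, b, inv_diag=None, tol=1e-8, max_iter=1000):
    """Conjugate gradient; inv_diag holds 1/diag(A), or None for plain CG."""
    x = [0.0] * len(b)
    r = list(b)                      # residual of the zero initial guess
    z = r if inv_diag is None else [d * ri for d, ri in zip(inv_diag, r)]
    p = list(z)
    rz = dot(r, z)
    b_norm = math.sqrt(dot(b, b))
    for k in range(1, max_iter + 1):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if tol > math.sqrt(dot(r, r)) / b_norm:
            return x, k              # converged after k iterations
        z = r if inv_diag is None else [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


# Hypothetical test matrix: a well-conditioned core with row scales from 1 to 1000.
n = 30
s = [10.0 ** (3.0 * i / (n - 1)) for i in range(n)]
A = [[math.sqrt(s[i] * s[j]) * (1.5 if i == j else 0.5) for j in range(n)]
     for i in range(n)]
b = [1.0] * n
inv_diag = [1.0 / A[i][i] for i in range(n)]

x_cg, cg_iters = pcg(A, b)
x_pcg, pcg_iters = pcg(A, b, inv_diag)
print("plain CG iterations:", cg_iters)
print("diagonally preconditioned CG iterations:", pcg_iters)
```
</pre>
<p style="text-align: justify;">On this toy problem the preconditioned run needs fewer iterations than plain CG. Real pressure matrices behave less dramatically; the measured iteration counts for the test cases are reported in the Results section.</p>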
<p><strong>3. Validation</strong></p>
<p style="text-align: justify;">To validate our solution we plotted the pressure field along the x axis for cases 1 and 2. Figs. 1-3 show that the solutions agree across simulations with different preconditioners.</p>
<div class="mceTemp mceIEcenter" style="text-align: center;">
<dl id="attachment_123" class="wp-caption aligncenter" style="width: 360px;">
<dt class="wp-caption-dt"><a href="http://vratis.com/blog/wp-content/uploads/2012/04/Figure1.jpg"><img class="size-medium wp-image-123  " title="Figure 1. Cavity 3D cross section along x axis. Solution for all preconditioners" src="http://vratis.com/blog/wp-content/uploads/2012/04/Figure1-300x242.jpg" alt="" width="350" height="242" /></a></dt>
<dd class="wp-caption-dd">Figure 1. Cavity 3D cross section along x axis. Solution for all preconditioners.</dd>
</dl>
</div>
<div class="mceTemp mceIEcenter" style="text-align: center;">
<dl id="attachment_162" class="wp-caption  aligncenter" style="width: 310px;">
<dt class="wp-caption-dt"><a href="http://vratis.com/blog/wp-content/uploads/2012/04/Fig.2-popr.jpg"><img class="size-medium wp-image-162" title="Figure 2. Aorta cross section along x axis. Solution for all preconditioners" src="http://vratis.com/blog/wp-content/uploads/2012/04/Fig.2-popr-300x206.jpg" alt="" width="300" height="206" /></a></dt>
<dd class="wp-caption-dd">Figure 2. Aorta cross section along x axis. Solution for all preconditioners</dd>
</dl>
</div>
<div id="attachment_163" class="wp-caption aligncenter" style="width: 310px"><a href="http://vratis.com/blog/wp-content/uploads/2012/04/Fig.3-popr.jpg"><img class="size-medium wp-image-163" title="Figure 3. Aorta cross section along x axis. Solution for all preconditioners." src="http://vratis.com/blog/wp-content/uploads/2012/04/Fig.3-popr-300x210.jpg" alt="" width="300" height="210" /></a><p class="wp-caption-text">Figure 3. Aorta cross section along x axis. Solution for all preconditioners.</p></div>
<p><strong>4. Results<br />
</strong><strong><em>Cavity 3D<br />
</em></strong></p>
<div id="attachment_125" class="wp-caption aligncenter" style="width: 310px"><a href="http://vratis.com/blog/wp-content/uploads/2012/04/Figure-3.jpg"><img class="size-medium wp-image-125" title="Figure 4. Execution time of Cavity 3D case for different preconditioners and number of cores." src="http://vratis.com/blog/wp-content/uploads/2012/04/Figure-3-300x187.jpg" alt="" width="300" height="187" /></a><p class="wp-caption-text">Figure 4. Execution time of Cavity 3D case for different preconditioners and number of cores.</p></div>
<div class="mceTemp mceIEcenter" style="text-align: center;">
<dl id="attachment_126" class="wp-caption  aligncenter" style="width: 310px;">
<dt class="wp-caption-dt"><a href="http://vratis.com/blog/wp-content/uploads/2012/04/Figure-4.jpg"><img class="size-medium wp-image-126" title="Figure 5. Mean number of iterations for GAMG, AMG and DIC preconditioner during pressure calculations" src="http://vratis.com/blog/wp-content/uploads/2012/04/Figure-4-300x187.jpg" alt="" width="300" height="187" /></a></dt>
<dd class="wp-caption-dd">Figure 5. Mean number of iterations for GAMG, AMG and DIC preconditioner during pressure calculations.</dd>
</dl>
</div>
<div id="attachment_127" class="wp-caption aligncenter" style="width: 310px"><a href="http://vratis.com/blog/wp-content/uploads/2012/04/Figure-5.jpg"><img class="size-medium wp-image-127" title="Figure 6. Acceleration defined as a ratio GPU AMG vs CPU with different preconditioners" src="http://vratis.com/blog/wp-content/uploads/2012/04/Figure-5-300x187.jpg" alt="" width="300" height="187" /></a><p class="wp-caption-text">Figure 6. Acceleration defined as a ratio SpeedIT vs CPU with different preconditioners</p></div>
<p><strong><em>Aorta<br />
</em></strong></p>
<div id="attachment_128" class="wp-caption aligncenter" style="width: 310px"><a href="http://vratis.com/blog/wp-content/uploads/2012/04/Figure-6.jpg"><img class="size-medium wp-image-128" title="Figure 7. Execution time of Aorta case for different preconditioners and number of cores" src="http://vratis.com/blog/wp-content/uploads/2012/04/Figure-6-300x187.jpg" alt="" width="300" height="187" /></a><p class="wp-caption-text">Figure 7. Execution time of Aorta case for different preconditioners and number of cores</p></div>
<div id="attachment_129" class="wp-caption aligncenter" style="width: 310px"><a href="http://vratis.com/blog/wp-content/uploads/2012/04/Figure-7.jpg"><img class="size-medium wp-image-129" title="Figure 8. Mean number of iterations for GAMG, AMG and DIC preconditioner during pressure calculations" src="http://vratis.com/blog/wp-content/uploads/2012/04/Figure-7-300x187.jpg" alt="" width="300" height="187" /></a><p class="wp-caption-text">Figure 8. Mean number of iterations for GAMG, SpeedIT with AMG and DIC preconditioner during pressure calculations</p></div>
<div id="attachment_131" class="wp-caption aligncenter" style="width: 310px"><a href="http://vratis.com/blog/wp-content/uploads/2012/04/Figure-8i.jpg"><img class="size-medium wp-image-131" title="Figure 9. Acceleration defined as a ratio GPU AMG vs CPU with different preconditioners" src="http://vratis.com/blog/wp-content/uploads/2012/04/Figure-8i-300x187.jpg" alt="" width="300" height="187" /></a><p class="wp-caption-text">Figure 9. Acceleration defined as a ratio GPU (SpeedIT) vs. CPU with different preconditioners.</p></div>
<p><strong><em>Ahmed 25º<br />
</em></strong></p>
<div id="attachment_132" class="wp-caption aligncenter" style="width: 310px"><a href="http://vratis.com/blog/wp-content/uploads/2012/04/Figure-9.jpg"><img class="size-medium wp-image-132" title="Figure 10. Execution time of Ahmed case for GPU with AMG preconditioner and different number of cores with GAMG solver" src="http://vratis.com/blog/wp-content/uploads/2012/04/Figure-9-300x187.jpg" alt="" width="300" height="187" /></a><p class="wp-caption-text">Figure 10. Execution time of Ahmed case for GPU with AMG preconditioner and different number of cores with GAMG solver</p></div>
<p style="text-align: justify;">Figs. 1-3 show that SpeedIT leads to the same solution as OpenFOAM. The new AMG preconditioner in SpeedIT can be competitive with OpenFOAM GAMG running on 1 or 2 CPU cores. The main advantage of the AMG preconditioner is that it significantly reduces the number of iterations needed to solve the pressure equation. Compared with the widely used DIC preconditioner, <strong>SpeedIT 2.1 needs about 10 times fewer iterations</strong> (Fig. 5 and Fig. 8), which in effect gives a <strong>speedup of up to 3.5x</strong>. Interestingly, we found that GAMG fails when calculations are performed in single precision, while AMG still works. Fig. 11 presents the mean number of iterations for the Cavity 3D case in single precision. The GAMG solver takes as many as 1000 iterations during the pressure field calculations.</p>
<div id="attachment_133" class="wp-caption aligncenter" style="width: 310px"><a href="http://vratis.com/blog/wp-content/uploads/2012/04/Figure-10.jpg"><img class="size-medium wp-image-133" title="Figure 11. Mean number of iterations for Cavity 3D case in single precision." src="http://vratis.com/blog/wp-content/uploads/2012/04/Figure-10-300x187.jpg" alt="" width="300" height="187" /></a><p class="wp-caption-text">Figure 11. Mean number of iterations for Cavity 3D case in single precision.</p></div>
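<p style="text-align: justify;">A rough way to relate the two headline numbers of this section (about 10 times fewer iterations, speedup of up to 3.5x) is a simple cost model in which run time equals the number of iterations multiplied by the cost of one iteration. The Python sketch below is an illustration only: the model ignores the AMG setup time and everything outside the pressure solve, the report does not give such a breakdown, and the function names are hypothetical.</p>
<pre>
```python
def acceleration(cpu_time, gpu_time):
    """Acceleration as plotted in the figures: CPU time divided by GPU time."""
    return cpu_time / gpu_time


def implied_cost_ratio(iteration_reduction, speedup):
    """Per-iteration cost of the GPU AMG solve relative to the CPU DIC solve.

    Assumed model: time = iterations * cost_per_iteration, so
    speedup = iteration_reduction / cost_ratio.
    """
    return iteration_reduction / speedup


# Numbers quoted in the text: about 10x fewer iterations, up to 3.5x speedup.
ratio = implied_cost_ratio(10.0, 3.5)
print(round(ratio, 2))
```
</pre>
<p style="text-align: justify;">Under this assumed model, one AMG-preconditioned iteration on the GPU costs a little under three times as much as one DIC-preconditioned iteration on the CPU, which is why a tenfold drop in iterations does not translate into a tenfold speedup.</p>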
<p style="text-align: justify;"><strong>5. Acknowledgments</strong></p>
<p>We would like to thank NVIDIA for hardware support and the <a title="4-ID Network" href="http://www.4-id.org/" target="_blank">4-ID network</a> for providing the Ahmed test case, which was based on the Motorbike tutorial from OpenFOAM 2.0. We also thank Dominik Szczerba from the <a title="IT'IS Foundation" href="http://www.itis.ethz.ch" target="_blank">IT&#8217;IS Foundation</a> for providing the geometry of the human aorta.</p>
]]></content:encoded>
			<wfw:commentRss>http://vratis.com/blog/?feed=rss2&#038;p=119</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Multi-GPU simulation of the motorbike in OpenFOAM and SpeedIT technology</title>
		<link>http://vratis.com/blog/?p=144</link>
		<comments>http://vratis.com/blog/?p=144#comments</comments>
		<pubDate>Tue, 17 Apr 2012 08:09:03 +0000</pubDate>
		<dc:creator>marta</dc:creator>
				<category><![CDATA[SpeedIT]]></category>
		<category><![CDATA[cineca]]></category>
		<category><![CDATA[cuda]]></category>
		<category><![CDATA[gpu]]></category>
		<category><![CDATA[openfoam]]></category>
		<category><![CDATA[speedit]]></category>

		<guid isPermaLink="false">http://vratis.com/blog/?p=144</guid>
		<description><![CDATA[Multi-GPU simulations of the motorbike in OpenFOAM with SpeedIT technology Vratis Ltd., Wroclaw, Poland March 28, 2012 1. Objective OpenFOAM® simulations take a significant amount of time leading to higher costs of simulations. GPGPU technology has a potential to overcome &#8230; <a href="http://vratis.com/blog/?p=144">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p style="text-align: justify;" align="center"><strong>Multi-GPU simulations of the motorbike in OpenFOAM with SpeedIT technology</strong></p>
<p>Vratis Ltd., Wroclaw, Poland</p>
<p>March 28, 2012</p>
<p style="text-align: justify;" align="center"><strong>1. Objective</strong></p>
<p style="text-align: justify;">OpenFOAM® simulations take a significant amount of time, which raises their cost. GPGPU technology has the potential to overcome this problem. However, because the memory of a single GPU card is limited, realistic simulations may not be possible. As a solution we propose SpeedIT Multi-GPU technology, which accelerates the solution of the pressure equation, usually the most time-consuming part of incompressible flow simulations. We compare the performance of SpeedIT Multi-GPU with standard OpenFOAM runs on CPUs in various test scenarios with up to 32 million cells on clusters with up to 16 GPU cards.</p>
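<p style="text-align: justify;">The memory limit can be made concrete with a back-of-the-envelope estimate. The Python sketch below assumes, purely for illustration, that the pressure matrix is stored in CSR format in double precision with 4-byte indices and about 7 non-zero entries per row (a hexahedral cell and its six neighbours). Real meshes differ, and solver vectors and any preconditioner data come on top, so the figures are rough lower bounds under these assumptions rather than measurements.</p>
<pre>
```python
def csr_bytes(n_rows, nnz_per_row, value_bytes=8, index_bytes=4):
    """Bytes needed for a CSR matrix: values, column indices and row pointers."""
    nnz = n_rows * nnz_per_row
    return nnz * (value_bytes + index_bytes) + (n_rows + 1) * index_bytes


# Cell counts taken from the test cases in this report; 7 entries per row is assumed.
for cells in (32_000_000, 80_000_000):
    gigabytes = csr_bytes(cells, 7) / 1e9
    print(f"{cells} cells: about {gigabytes:.2f} GB for the matrix alone")
```
</pre>
<p style="text-align: justify;">Under these assumptions the matrix of the 80M-cell case alone takes about 7 GB, more than a single card with 6GB of memory can hold, so the case has to be split across several GPUs.</p>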
<p style="text-align: justify;"><strong>2. Methodology<br />
</strong><br />
SpeedIT is a library that implements iterative solvers on GPU using MPI to exchange data between domains. The SpeedIT Plugin to OpenFOAM® was used to call GPU-accelerated iterative solvers from OpenFOAM, which was responsible for decomposing the case. Preliminary tests (see <a title="Multi-GPU tests in OpenFOAM" href="http://vratis.com/blog/?page_id=2" target="_blank">the report</a>) for cavity3D, performed on the PLGRID cluster with a varying number of cells, showed (see Figs. 1-2) that the technology has the potential to reduce simulation time. The tests performed on the CINECA cluster aimed at solving larger simulations, with geometries of up to 80M cells, as well as testing more efficient preconditioners, such as AMG. The CINECA PLX cluster was equipped with 548 Intel Xeon E5645 processors and 548 Tesla M2070 cards, each with 6GB of memory and 448 CUDA cores. The following tests were planned in both multi-CPU and multi-GPU environments for a fixed number of time steps:</p>
<ol>
<li>80M cells, a ramp, simpleFoam.</li>
<li>32M cells motorBike test, simpleFoam.</li>
</ol>
<div class="mceTemp mceIEcenter" style="text-align: justify;">
<dl id="attachment_145" class="wp-caption  aligncenter" style="width: 310px;">
<dt class="wp-caption-dt"><a href="http://vratis.com/blog/wp-content/uploads/2012/04/Fig.1.jpg"><img class="size-medium wp-image-145" title="Figure 1: Acceleration defined as a ratio nGPU vs. nCPU for different cavity3D runs with icoFoam and diagonal preconditioner." src="http://vratis.com/blog/wp-content/uploads/2012/04/Fig.1-300x177.jpg" alt="" width="300" height="177" /></a></dt>
<dd class="wp-caption-dd">Figure 1: Acceleration defined as a ratio nGPU vs. nCPU for different cavity3D runs with icoFoam and diagonal preconditioner.</dd>
</dl>
</div>
<div id="attachment_146" class="wp-caption aligncenter" style="width: 310px"><a href="http://vratis.com/blog/wp-content/uploads/2012/04/Fig.2.jpg"><img class="size-medium wp-image-146 aligncenter" title="Figure 2: Acceleration defined as a ratio nGPU vs. nCPU for AhmedBody and Cabin runs with simpleFoam." src="http://vratis.com/blog/wp-content/uploads/2012/04/Fig.2-300x182.jpg" alt="" width="300" height="182" /></a><p class="wp-caption-text">Figure 2: Acceleration defined as a ratio nGPU vs. nCPU for AhmedBody and Cabin runs with simpleFoam.</p></div>
<p style="text-align: justify;" align="center"><strong><br />
Results</strong></p>
<p style="text-align: justify;">Tests performed on the CINECA cluster were inspired by industry. The first case was delivered by a government agency. It was a ramp with 80M cells. We used the PLX cluster to decompose and mesh the case. Unfortunately, due to technical issues, we have not yet been able to run the simulations.</p>
<p style="text-align: justify;">The second test was a standard OpenFOAM case, motorBike, modified by SGI so that it had 32 million cells. Tests were performed in multi-GPU and multi-CPU environments. For the time being, the SpeedIT library offers the CG solver with a diagonal preconditioner for solving the pressure equation on the GPU. For the computations on the CPU we also used the CG solver with a diagonal preconditioner. In addition, we used the GAMG solver, since it is the one most often used in real-life simulations. Results are presented in Fig. 3. Computation on GPUs can be up to 8 times faster than computation on the same number of processor cores. Compared against the GAMG solver, SpeedIT multi-GPU also provides an acceleration of 1.1x-1.4x (excluding the first time step, the acceleration was 1.5x).</p>
<div id="attachment_148" class="wp-caption aligncenter" style="width: 310px"><a href="http://vratis.com/blog/wp-content/uploads/2012/04/Fig.31.jpg"><img class="size-medium wp-image-148" title="Figure 3: Acceleration defined as a ratio nGPU vs. nCPU for motorBike against diagonalCG or GAMG solvers used for the pressure equation." src="http://vratis.com/blog/wp-content/uploads/2012/04/Fig.31-300x163.jpg" alt="" width="300" height="163" /></a><p class="wp-caption-text">Figure 3: Acceleration defined as a ratio nGPU vs. nCPU for motorBike against diagonal preconditioner and CG or GAMG solvers used to solve the pressure equation.</p></div>
<p style="text-align: justify;"><strong>Acknowledgments</strong></p>
<p style="text-align: justify;">We kindly acknowledge NVIDIA and CINECA for the support in performing the simulations as well as SGI for providing the test case.</p>
<p>Disclaimer</p>
<ol>
<li style="text-align: justify;"><em>This offering is not approved or endorsed by OpenCFD Limited, the producer of the OpenFOAM software and owner of the OPENFOAM®  and OpenCFD®  trade marks (see the <a href="http://www.openfoam.com/legal/trademark-policy.php" target="_blank">Disclaimer</a>).</em></li>
<li style="text-align: justify;"><em>The views and statements expressed in this blog are those of Vratis Ltd. and are not necessarily the views of, or endorsed by, the 3rd parties named in this activity.</em></li>
<li style="text-align: justify;"><em>OPENFOAM®  is a registered trade mark of OpenCFD Limited, the producer of the OpenFOAM software.</em></li>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://vratis.com/blog/?feed=rss2&#038;p=144</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Performance of SpMV in CUSPARSE, CUSP and SpeedIT</title>
		<link>http://vratis.com/blog/?p=1</link>
		<comments>http://vratis.com/blog/?p=1#comments</comments>
		<pubDate>Sat, 12 Nov 2011 20:30:51 +0000</pubDate>
		<dc:creator>lmiroslaw</dc:creator>
				<category><![CDATA[SpeedIT]]></category>
		<category><![CDATA[blas]]></category>
		<category><![CDATA[gpu]]></category>
		<category><![CDATA[speedit]]></category>
		<category><![CDATA[spmv]]></category>

		<guid isPermaLink="false">http://vratis.com/blog/?p=1</guid>
		<description><![CDATA[Introduction Sparse Matrix-Vector multiplication  (SpMV) is one of BLAS operations that are often used in scientific calculations. In order to show that SpeedIT belongs to the fastest implementations of this routine we have tested SpMV on 23 randomly chosen matrices &#8230; <a href="http://vratis.com/blog/?p=1">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><strong>Introduction</strong></p>
<p>Sparse Matrix-Vector multiplication (SpMV) is one of the BLAS operations often used in scientific calculations. To show that SpeedIT is among the fastest implementations of this routine, we tested SpMV on 23 randomly chosen matrices from the <a href="http://www.cise.ufl.edu/research/sparse/matrices/" target="_blank">University of Florida Sparse Matrix Collection</a>. Their properties are described in Tab. 1. Tab. 2 and Tab. 3 present the time of SpMV in single and double precision, while Figs. 1-8 present the results graphically. Since performance is strongly affected by matrix size, we divided the matrices into two groups: small and large. The tests were performed on an NVIDIA Tesla C2050 GPU card.</p>
<p>SpeedIT supports two matrix formats, CSR and a proprietary CMR format; the user can easily choose either one.</p>
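<p>For readers unfamiliar with CSR (compressed sparse row), the following Python sketch shows what an SpMV in this format computes. It is a plain reference loop written for this illustration, not the SpeedIT, CUSPARSE or CUSP kernel, and the function name and the 3x3 example matrix are hypothetical. The proprietary CMR format is not shown.</p>
<pre>
```python
def csr_spmv(row_ptr, col_idx, values, x):
    """Compute y = A * x for a matrix A stored in CSR format."""
    y = []
    for row in range(len(row_ptr) - 1):
        # Non-zeros of this row occupy positions row_ptr[row] .. row_ptr[row + 1] - 1.
        start, end = row_ptr[row], row_ptr[row + 1]
        y.append(sum(values[k] * x[col_idx[k]] for k in range(start, end)))
    return y


# Example matrix:
#   [4 0 1]
#   [0 3 0]
#   [2 0 5]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
x = [1.0, 2.0, 3.0]

y = csr_spmv(row_ptr, col_idx, values, x)
print(y)  # [7.0, 6.0, 17.0]
```
</pre>
<p>Only the non-zero entries are stored and visited, so the work is proportional to NNZ rather than to the full matrix size.</p>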
<div align="center">
<h2>Fig.1. Average time of SPMV in Single Precision for small matrices.<a href="http://vratis.com/blog/wp-content/uploads/2011/11/Wykres-11.jpg"><img class="alignnone size-full wp-image-38" title="Time of SpMV in Single Precision for small matrices." src="http://vratis.com/blog/wp-content/uploads/2011/11/Wykres-11.jpg" alt="Time of SpMV in Single Precision for small matrices. Resulting time is an average from 1000 runs." width="753" height="734" /></a> Fig.1. Time of SpMV in Single Precision for small matrices. Resulting time is an average from 1000 runs.</h2>
<h2>Fig.2. Average time of SPMV in Single Precision for large matrices.<img class="alignnone size-full wp-image-24" title="Time of SpMV in Single Precision" src="http://vratis.com/blog/wp-content/uploads/2011/11/Wykres-2.jpg" alt="Time of SpMV in Single Precision for large matrices. Resulting time is an average from 1000 runs." width="757" height="742" /></h2>
<p align="center">Fig.2. Time of SpMV in Single Precision for large matrices. Resulting time is an average from 1000 runs.</p>
<p align="center"><strong>Fig.3. Average time of SPMV in Double Precision for small matrices.</strong><a href="http://vratis.com/blog/wp-content/uploads/2011/11/Wykres-3.jpg"><img class="alignnone size-full wp-image-26" title="Time of SpMV in Double Precision for small matrices." src="http://vratis.com/blog/wp-content/uploads/2011/11/Wykres-3.jpg" alt="Time of SpMV in Double Precision for small matrices. Resulting time is an average from 1000 runs." width="850" height="824" /></a></p>
<p align="center">Fig.3. Time of SpMV in Double Precision for small matrices. Resulting time is an average from 1000 runs.</p>
<p align="center"><strong>Fig.4. Average time of SPMV in Double Precision for large matrices.</strong><a href="http://vratis.com/blog/wp-content/uploads/2011/11/Wykres-4.jpg"><img class="alignnone size-full wp-image-27" title="Time of SpMV in Double Precision for large matrices" src="http://vratis.com/blog/wp-content/uploads/2011/11/Wykres-4.jpg" alt="Time of SpMV in Double Precision for large matrices. Resulting time is an average from 1000 runs." width="765" height="749" /></a>Fig.4. Time of SpMV in Double Precision for large matrices. Resulting time is an average from 1000 runs.</p>
<p align="center"><strong>Fig. 5. Speed-up of SpeedIT CMR vs. CUSPARSE and CUSP in Single Precision.</strong><a href="http://vratis.com/blog/wp-content/uploads/2011/11/Wykres-5.jpg"><img class="alignnone size-full wp-image-28" title="Performance ratio of SPMV algorithm in single precision" src="http://vratis.com/blog/wp-content/uploads/2011/11/Wykres-5.jpg" alt="Performance ratio of SPMV algorithm in single precision from SpeedIT CMR versus other algorithm" width="980" height="549" /></a>Fig.5. Speed-up of SpeedIT CMR in Single Precision vs. CUSPARSE and CUSP.</p>
<p align="center"><strong>Fig. 6. Speed-up of SpeedIT CSR vs. CUSPARSE and CUSP in Single Precision.</strong> <a href="http://vratis.com/blog/wp-content/uploads/2011/11/Wykres-6.jpg"><img class="alignnone size-full wp-image-30" title="Performance ratio of SPMV algorithm in single precision" src="http://vratis.com/blog/wp-content/uploads/2011/11/Wykres-6.jpg" alt="Performance ratio of SPMV algorithm in single precision from SpeedIT CSR versus other algorithm" width="981" height="533" /></a>Fig.6. Speed-up of SpeedIT CSR in Single Precision vs. CUSPARSE and CUSP.</p>
<p align="center"><strong><strong>Fig. 7. Speed-up of SpeedIT CMR vs. CUSPARSE and CUSP in Double Precision.</strong></strong><a href="http://vratis.com/blog/wp-content/uploads/2011/11/Wykres-7.jpg"><img class="alignnone size-full wp-image-31" title="Performance ratio of SPMV algorithm in double precision" src="http://vratis.com/blog/wp-content/uploads/2011/11/Wykres-7.jpg" alt="Performance ratio of SPMV algorithm in double precision from SpeedIT CMR versus other algorithm" width="982" height="432" /></a>Fig.7. Speed-up of SpeedIT CMR in Double Precision vs. CUSPARSE and CUSP.</p>
<p align="center"><strong>Fig.8. Speed-up of SpeedIT CSR vs. CUSPARSE and CUSP in Double Precision.</strong> <a href="http://vratis.com/blog/wp-content/uploads/2011/11/Wykres-8.jpg"><img class="alignnone size-full wp-image-32" title="Performance ratio of SPMV algorithm in double precision" src="http://vratis.com/blog/wp-content/uploads/2011/11/Wykres-8.jpg" alt="Performance ratio of SPMV algorithm in double precision from SpeedIT CSR versus other algorithm" width="982" height="489" /></a>Fig.8. Speed-up of SpeedIT CSR in Double Precision vs. CUSPARSE and CUSP.</p>
<h2>Conclusions</h2>
<ul>
<li style="text-align: left;">The highest speed-up of SpMV implemented in SpeedIT CMR vs. CUSPARSE is about 2x, while vs. CUSP it is more than 4x.</li>
<li style="text-align: left;">The highest speed-up of SpMV implemented in SpeedIT CSR against CUSPARSE and CUSP is about 1.4x.</li>
<li style="text-align: left;">SpeedIT performs better for large matrices (&gt; 100 000 NNZ), and the CMR format is more efficient than CSR.</li>
</ul>
<h2 style="text-align: left;">Appendix</h2>
<h2 style="text-align: left;">Tab 1. Description of matrix properties used in SpMV tests. NNZ and NZ correspond to the number of non-zero and zero elements. <em>Small</em> matrices are depicted in green. Remaining matrices are termed <em>large</em> in the following tests.</h2>
<p style="text-align: left;"><a href="http://vratis.com/blog/wp-content/uploads/2011/11/tab1.jpg"><img title="tab1 DESCRIPTION OF MATRIX PROPERTIES USED IN  SPMV TESTS. NNZ AND NZ CORRESPOND TO THE NUMBER OF NON-ZERO AND ZERO ELEMENTS. SMALL MATRICES ARE DEPICTED IN GREEN. REMAINING MATRICES ARE TERMED LARGE IN THE FOLLOWING TESTS." src="http://vratis.com/blog/wp-content/uploads/2011/11/tab1.jpg" alt="DESCRIPTION OF MATRIX PROPERTIES USED IN SPMV TESTS. NNZ AND NZ CORRESPOND TO THE NUMBER OF NON-ZERO AND ZERO ELEMENTS. SMALL MATRICES ARE DEPICTED IN GREEN. REMAINING MATRICES ARE TERMED LARGE IN THE FOLLOWING TESTS." width="420" height="431" /></a></p>
</div>
<h2 style="text-align: left;" align="center">Tab. 2 Time of SpMV in Single Precision for CUSPARSE, CUSP and SpeedIT in two available formats.</h2>
<div style="text-align: left;" align="center"><a href="http://vratis.com/blog/wp-content/uploads/2011/11/tab21.jpg"><img title="tab2 TIME OF SPMV IN SINGLE PRECISION FOR CUSPARSE, CUSP AND SPEEDIT IN TWO AVAILABLE FORMATS." src="http://vratis.com/blog/wp-content/uploads/2011/11/tab21.jpg" alt="TIME OF SPMV IN SINGLE PRECISION FOR CUSPARSE, CUSP AND SPEEDIT IN TWO AVAILABLE FORMATS." width="569" height="434" /></a></div>
<div style="text-align: left;" align="center">
<h2 align="left">Tab. 3 Time of SpMV in Double Precision for CUSPARSE, CUSP and SpeedIT in two available formats.</h2>
<div align="left"><a href="http://vratis.com/blog/wp-content/uploads/2011/11/tab3.jpg"><img title="tab3 TIME OF SPMV IN DOUBLE PRECISION FOR CUSPARSE, CUSP AND SPEEDIT IN TWO AVAILABLE FORMATS." src="http://vratis.com/blog/wp-content/uploads/2011/11/tab3.jpg" alt="TIME OF SPMV IN DOUBLE PRECISION FOR CUSPARSE, CUSP AND SPEEDIT IN TWO AVAILABLE FORMATS." width="570" height="433" /></a></div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://vratis.com/blog/?feed=rss2&#038;p=1</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

