<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>SpeedIT Tools: Beyond Acceleration &#187; tests</title>
	<atom:link href="http://vratis.com/speedITblog/tag/tests/feed/" rel="self" type="application/rss+xml" />
	<link>http://vratis.com/speedITblog</link>
	<description>This blog describes the SpeedIT Tools library, which accelerates the solution of linear systems</description>
	<lastBuildDate>Sat, 17 Jul 2010 23:09:24 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Testing solvers</title>
		<link>http://vratis.com/speedITblog/2010/02/testing-solvers/</link>
		<comments>http://vratis.com/speedITblog/2010/02/testing-solvers/#comments</comments>
		<pubDate>Thu, 11 Feb 2010 08:07:50 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Results]]></category>
		<category><![CDATA[benchmark]]></category>
		<category><![CDATA[tests]]></category>

		<guid isPermaLink="false">http://vratis.com/speedITblog/?p=33</guid>
		<description><![CDATA[We are currently testing our iterative solvers (BiCGSTAB and CG), with and without preconditioners. All tests use the same benchmark set of matrices: for a given matrix we will measure the performance (GFLOPS and speed-up) as a function of the number of iterations that are [...]]]></description>
			<content:encoded><![CDATA[<p>We are currently testing our iterative solvers (BiCGSTAB and CG), with and without preconditioners. All tests use the same benchmark set of matrices: for a given matrix we will measure the performance (GFLOPS and speed-up) as a function of the number of iterations needed to reach a stable solution. The results will be compared with standard CPU implementations of these solvers, for example those in the MKL library.</p>
]]></content:encoded>
			<wfw:commentRss>http://vratis.com/speedITblog/2010/02/testing-solvers/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Preliminary tests</title>
		<link>http://vratis.com/speedITblog/2010/01/preliminary-tests/</link>
		<comments>http://vratis.com/speedITblog/2010/01/preliminary-tests/#comments</comments>
		<pubDate>Mon, 04 Jan 2010 20:48:45 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[benchmark]]></category>
		<category><![CDATA[tests]]></category>

		<guid isPermaLink="false">http://vratis.com/speedITblog/?p=14</guid>
		<description><![CDATA[Since around 90% of the computational time is spent on sparse matrix-vector multiplication (SpMV), we focused on testing this operation first. The attached charts present our results for 23 matrices that differ in size, number of nonzeros (NNZ) and structure. As you can see, the performance depends strongly on the matrix structure. [...]]]></description>
			<content:encoded><![CDATA[<p>Since around 90% of the computational time is spent on sparse matrix-vector multiplication (SpMV), we focused on testing this operation first. The attached charts present our results for 23 matrices that differ in size, number of nonzeros (NNZ) and structure. As you can see, the performance depends strongly on the matrix structure. This is why we decided to use two separate kernels for two types of matrices: sparse and denser ones. Please also note that, because of memory transfers and the PCIe bottleneck, it is not worth using our solvers for only a few iterations.</p>
<div id="attachment_17" class="wp-caption aligncenter" style="width: 624px"><a href="http://vratis.com/speedITblog/wp-content/uploads/2010/02/Picture-5.png"><img class="size-large wp-image-17 " title="Performance of SpMV Multiplication in Double Precision" src="http://vratis.com/speedITblog/wp-content/uploads/2010/02/Picture-5-1024x341.png" alt="Performance of SpMV Multiplication in Double Precision" width="614" height="205" /></a><p class="wp-caption-text">Performance of SpMV Multiplication in Double Precision</p></div>
<div id="attachment_18" class="wp-caption aligncenter" style="width: 624px"><a href="http://vratis.com/speedITblog/wp-content/uploads/2010/02/Picture-6.png"><img class="size-large wp-image-18 " title="Performance of SpMV Multiplication in Single Precision" src="http://vratis.com/speedITblog/wp-content/uploads/2010/02/Picture-6-1024x340.png" alt="Performance of SpMV Multiplication in Single Precision" width="614" height="204" /></a><p class="wp-caption-text">Performance of SpMV Multiplication in Single Precision</p></div>
<div id="attachment_28" class="wp-caption aligncenter" style="width: 633px"><a href="http://vratis.com/speedITblog/wp-content/uploads/2010/01/Picture-81.png"><img class="size-large wp-image-28" title="Speed-up GPU vs. CPU" src="http://vratis.com/speedITblog/wp-content/uploads/2010/01/Picture-81-1024x356.png" alt="Speed-up GPU vs. CPU" width="623" height="216" /></a><p class="wp-caption-text">Speed-up GPU vs. CPU</p></div>
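<p>For readers unfamiliar with the operation being benchmarked, here is a minimal reference sketch of SpMV on a matrix stored in CSR format. It is plain Python for illustration only and is not our GPU kernel; the function name <code>csr_spmv</code> and the small 3x3 example matrix are assumptions made for this sketch.</p>

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """Return y = A * x for a matrix A stored in CSR form.

    row_ptr[i] .. row_ptr[i + 1] delimits the stored entries of row i,
    col_idx[k] is the column of the k-th stored entry and values[k] its value.
    """
    y = []
    for row in range(len(row_ptr) - 1):
        start, end = row_ptr[row], row_ptr[row + 1]
        # Dot product of the stored entries of this row with x.
        acc = 0.0
        for k in range(start, end):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


# Illustrative 3x3 matrix (not one of the benchmark matrices):
#   4 0 1
#   0 2 0
#   3 0 5
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
y = csr_spmv(row_ptr, col_idx, values, [1.0, 2.0, 3.0])
print(y)
```

<p>Each stored nonzero is read once and contributes one multiply and one add, while the accesses to <code>x</code> follow the column indices. This is one reason why the matrix structure can matter so much for performance.</p>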
<p><strong>Methodology</strong></p>
<ol>
<li>Peak performance was calculated as the mean of 10 runs under the same experimental conditions.</li>
<li>Benchmark matrices were collected from the University of <a href="http://www.cise.ufl.edu/research/sparse/matrices/" target="_self">Florida Sparse Matrix Collection</a> in CSR format.</li>
<li>Not all of the matrices could be loaded into GPU memory because of its limited size.</li>
<li>CPU denotes a SpMV operation from Intel Math Kernel Library.</li>
<li>GPU denotes our SpMV kernel.</li>
<li>CPU machine: AMD Athlon(tm) 64 X2 Processor 3800+ running at 2010.373 MHz with 3 GB of DDR 400 MHz memory (dual channel, 6.4 GB/s bandwidth) and an Nforce 4 SLI chipset.</li>
<li>GPU machine: NVIDIA GeForce GTX295 (480 SP) with 1792 MB GDDR3 (896 bits) at 999 MHz and 223.8 GB/s bandwidth, on PCI-Express 2.0.</li>
<li>Bandwidth for ONE device, measured with the bandwidthTest utility from the CUDA SDK:
<ul>
<li>device to device: 93 GB/s</li>
<li>host to device, pageable memory: 1090 MB/s</li>
<li>host to device, non-pageable memory: 1591 MB/s</li>
</ul>
</li>
<li>System: Ubuntu 9.10 64-bit, NVIDIA driver version 190.42, CUDA version 2.3</li>
</ol>
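<p>The arithmetic behind these measurements can be sketched as follows. This is a rough illustration, not our benchmark code. It assumes that one SpMV costs 2 floating-point operations per stored nonzero, that CSR indices take 4 bytes each, and that 1 MB equals 10^6 bytes. The only measured number used is the 1591 MB/s host-to-device bandwidth from the list above; the matrix size and per-iteration times are hypothetical, and all function names are made up for this example.</p>

```python
import math


def spmv_gflops(nnz, run_seconds):
    """GFLOPS of one SpMV, using the mean time over repeated runs."""
    mean_time = sum(run_seconds) / len(run_seconds)
    # Assumption: each stored nonzero costs one multiply and one add.
    return 2.0 * nnz / mean_time / 1e9


def csr_bytes(n_rows, nnz, value_bytes=8, index_bytes=4):
    """Size of the CSR arrays: values, column indices and row pointers."""
    return nnz * (value_bytes + index_bytes) + (n_rows + 1) * index_bytes


def transfer_seconds(n_bytes, bandwidth_mb_per_s):
    """Time to copy n_bytes at the given bandwidth (1 MB taken as 10**6 bytes)."""
    return n_bytes / (bandwidth_mb_per_s * 1e6)


def break_even_iterations(copy_s, cpu_iter_s, gpu_iter_s):
    """Iterations needed before the time saved per iteration repays the copy.

    Returns None when the GPU iteration is not faster than the CPU one.
    """
    saved = cpu_iter_s - gpu_iter_s
    if saved > 0:
        return math.ceil(copy_s / saved)
    return None


# Hypothetical matrix: 1,000,000 rows, 10,000,000 nonzeros, double precision.
copy_s = transfer_seconds(csr_bytes(1_000_000, 10_000_000), 1591)
# Hypothetical per-iteration times: 20 ms on the CPU, 10 ms on the GPU.
print(break_even_iterations(copy_s, 0.020, 0.010))
```

<p>With these made-up timings, the one-off copy of the matrix to the device is repaid only after several iterations, which illustrates why a solver that runs for just a few iterations may gain little from the GPU.</p>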
<p style="text-align: center;">
<div id="attachment_23" class="wp-caption aligncenter" style="width: 788px"><a href="http://vratis.com/speedITblog/wp-content/uploads/2010/02/Picture-7.png"><img class="size-full wp-image-23 " title="Benchmark Matrices" src="http://vratis.com/speedITblog/wp-content/uploads/2010/02/Picture-7.png" alt="Benchmark Matrices" width="778" height="402" /></a><p class="wp-caption-text">Benchmark Matrices from University of Florida Sparse Matrix Collection</p></div>
]]></content:encoded>
			<wfw:commentRss>http://vratis.com/speedITblog/2010/01/preliminary-tests/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

