Performance of SpMV in CUSPARSE, CUSP and SpeedIT

Introduction

Sparse Matrix-Vector multiplication  (SpMV) is one of BLAS operations that are often used in scientific calculations. In order to show that SpeedIT belongs to the fastest implementations of this routine we have tested SpMV on 23 randomly chosen matrices from University Florida Matrix Collection.  Their properties are described in Tab.1.  Tab 2 and Tab.3 present time of SpMV in single and double precision while Figs.1-8 present the results in a graphical form. Since the performance is strongly affected by the matrix size we have divided them into two groups: small and large matrices. The tests were performed on a Tesla C2050 GPU card from NVIDIA.

SpeedIT is available in two formats. CSR and a proprietary CMR format, either of which can be easily chosen by the user.

Fig.1. Average time of SPMV in Single Precision for small matrices.Time of SpMV in Single Precision for small matrices. Resulting time is an average from 1000 runs. Fig.1. Time of SpMV in Single Precision for small matrices. Resulting time is an average from 1000 runs.

Fig.2. Average time of SPMV in Single Precision for large matrices.Time of SpMV in Single Precision for large matrices. Resulting time is an average from 1000 runs.

Fig.2. Time of SpMV in Single Precision for large matrices. Resulting time is an average from 1000 runs.

Fig.3. Average time of SPMV in Double Precision for small matrices.Time of SpMV in Double Precision for small matrices. Resulting time is an average from 1000 runs.

Fig.4. Time of SpMV in Double Precision for small matrices. Resulting time is an average from 1000 runs.

Fig.4. Average time of SPMV in Double Precision for large matrices.Time of SpMV in Double Precision for large matrices. Resulting time is an average from 1000 runs.Fig.4. Time of SpMV in Double Precision for large matrices. Resulting time is an average from 1000 runs.

Fig. 5. Speed-up of SpeedIT CMR vs. CUSPARSE and CUSP in Single Precision.Performance ratio of SPMV algorithm in single precision from SpeedIT CMR versus other algorithmFig.5. Speed-up of SpeedIT CMR in Single Precision vs. CUSPARSE and CUSP.

Fig. 6. Speed-up of SpeedIT CSR vs. CUSPARSE and CUSP in Single Precision. Performance ratio of SPMV algorithm in single precision from SpeedIT CSR versus other algorithmFig.6. Speed-up of SpeedIT CSR in Single Precision vs. CUSPARSE and CUSP.

Fig. 7. Speed-up of SpeedIT CMR vs. CUSPARSE and CUSP in Double Precision.Performance ratio of SPMV algorithm in double precision from SpeedIT CMR versus other algorithmFig.7. Speed-up of SpeedIT CMR in Double Precision vs. CUSPARSE and CUSP.

Fig.8. Speed-up of SpeedIT CSR vs. CUSPARSE and CUSP in Double Precision. Performance ratio of SPMV algorithm in double precision from SpeedIT CSR versus other algorithmFig.8. Speed-up of SpeedIT CSR in Double Precision vs. CUSPARSE and CUSP.

Conclusions

  • The highest speed-up of SpMV implemented in SpeedIT CMR vs. CUSPARSE is about 2x while vs. CUSP is more than 4x.
  • The highest speed-up of SpMV implemented in SpeedIT CSR against  and CUSP is about 1.4x.
  • SpeedIT performs better for large matrices ( > 100 000 NNZ) and CMR format is more efficient.

Appendix

Tab 1. Description of matrix properties used in SpMV tests. NNZ and NZ correspond to the number of non-zero and zero elements. Small matrices are depicted in green. Remaining matrices are termed large in the following tests.

DESCRIPTION OF MATRIX PROPERTIES USED IN SPMV TESTS. NNZ AND NZ CORRESPOND TO THE NUMBER OF NON-ZERO AND ZERO ELEMENTS. SMALL MATRICES ARE DEPICTED IN GREEN. REMAINING MATRICES ARE TERMED LARGE IN THE FOLLOWING TESTS.


Tab. 2 Time of SpMV in Single Precision for CUSPARSE, CUSP and SPEEDIT in two available formats.

TIME OF SPMV IN SINGLE PRECISION FOR CUSPARSE, CUSP AND SPEEDIT IN TWO AVAILABLE FORMATS.

Tab. 3 Time of SpMV in Double Precision for CUSPARSE, CUSP and SpeedIT in two available formats.

TIME OF SPMV IN DOUBLE PRECISION FOR CUSPARSE, CUSP AND SPEEDIT IN TWO AVAILABLE FORMATS.

Share

Leave a Reply