Discipline: Technology and Engineering
Subcategory: Computer Engineering
Session: 3
Matthew Bredin - New Mexico State University
Sparse matrices are commonplace in modern computing and are becoming more prominent in machine learning. These matrices often involve very large datasets that can be taxing to process on a modern CPU due to the large overhead of fetching data from main memory. EMU Technologies is developing a new architecture that uses distributed memory and thread migration. Our goal is to investigate the suitability of this new architecture for sparse matrix computations. We run sparse matrix compression and sparse matrix-vector multiplication (SpMV) on the EMU architecture and compare its performance to an Intel x86 architecture. We use one of the most common storage formats for sparse matrices, compressed row storage (CRS), chosen because it exposes a high degree of parallelism. To test the compression and SpMV computations, we randomly generated multiple sparse matrices, all square and varying in dimension and sparsity. The algorithms were limited to a single node on EMU and to a single Intel Core i7-3770 CPU on x86. For fairness, each machine was set to run with the same number of threads, and the Intel CPU has the same number of cores as EMU has nodelets. The compression algorithm showed consistently better performance on EMU than on x86, with an average speedup of 4.42x at 32 threads. This is likely because the generated matrices were too large to take advantage of the cache on x86, while main-memory latency on EMU is somewhat lower. The SpMV computation showed inconsistent results, with an average speedup of 0.89x at 32 threads, likely because SpMV is able to take advantage of the x86 cache. Our results are encouraging but must be expanded, specifically to include the coordinate list (COO) format and publicly available sparse matrix datasets.
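For readers unfamiliar with CRS, the format stores only the nonzero values of a matrix together with two index arrays: one holding the column of each nonzero and one marking where each row begins. The listing below is a minimal serial sketch in plain C, assuming a dense square input; the names crs_t and crs_compress are illustrative, and this is not the EMU code actually benchmarked in this study.

    #include <stdlib.h>

    /* CRS (compressed row storage) representation of an n x n matrix.
     * val     : nonzero values, stored row by row
     * col_idx : column index of each entry in val
     * row_ptr : row_ptr[i] is the offset in val where row i begins;
     *           row_ptr[n] equals the total nonzero count
     * (illustrative sketch, not the benchmarked EMU code) */
    typedef struct {
        int n;
        int nnz;
        double *val;
        int *col_idx;
        int *row_ptr;
    } crs_t;

    crs_t crs_compress(const double *dense, int n)
    {
        crs_t m = { .n = n, .nnz = 0 };

        /* First pass: count nonzeros so the arrays can be sized exactly. */
        for (int i = 0; i < n * n; i++)
            if (dense[i] != 0.0)
                m.nnz++;

        m.val     = malloc(m.nnz * sizeof *m.val);
        m.col_idx = malloc(m.nnz * sizeof *m.col_idx);
        m.row_ptr = malloc((n + 1) * sizeof *m.row_ptr);

        /* Second pass: copy each nonzero and record each row's start. */
        int k = 0;
        for (int i = 0; i < n; i++) {
            m.row_ptr[i] = k;
            for (int j = 0; j < n; j++) {
                if (dense[i * n + j] != 0.0) {
                    m.val[k] = dense[i * n + j];
                    m.col_idx[k] = j;
                    k++;
                }
            }
        }
        m.row_ptr[n] = k;
        return m;
    }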
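The SpMV kernel then computes one dot product per compressed row. Because each row's result is independent of the others, the outer loop can be split across threads, which is the high degree of parallelism mentioned above. Again a serial C sketch, reusing the hypothetical crs_t type from the previous listing rather than reproducing the measured implementation.

    /* y = A * x for a CRS matrix A (serial sketch).
     * Each iteration of the outer loop touches only row i, so rows can
     * be distributed across threads (e.g., one chunk of rows per thread
     * on x86, or across nodelets on EMU). */
    void crs_spmv(const crs_t *a, const double *x, double *y)
    {
        for (int i = 0; i < a->n; i++) {
            double sum = 0.0;
            for (int k = a->row_ptr[i]; k < a->row_ptr[i + 1]; k++)
                sum += a->val[k] * x[a->col_idx[k]];
            y[i] = sum;
        }
    }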
Funder Acknowledgement(s): The authors would like to thank the NSF for funding the REU program (NSF #1559723) that facilitated this research. We would also like to thank EMU Technologies for allowing us to use their systems and for their assistance in learning how to program on their architecture.
Faculty Advisor: Hameed Badawy, badawy@nmsu.edu
Role: I wrote the code to be tested and performed all simulations. EMU Technologies provided me with access to their system, and I received assistance and guidance in writing the code from my mentor and others at EMU Technologies.