High Performance Computering-Application of Electronic Technique

Application and optimization of sparse matrix vector multiplication on C66x

Abstract：With the explosive development of big data, sparse matrix has become an important part of machine learning and edge computing. In the field of machine learning, sparse matrix of data sets can not only save information but also save memory, which has become an inevitable trend. Sparse matrix vector multiplication (SpMV) is the core of sparse matrix computation. The space complexity and time complexity of its iterative solution process have important research significance. Analyze the compression format of sparse matrix C00, CSR, ELLPACK and DIA, change the sparsity of sparse matrix and the distribution of non-zero elements, and conclude that the SpMV read by COO and calculated by CSR is more universal. Utilizing the VLIW instruction architecture of C66x, using software pipelining to manage SpMV_CSR algorithm for instruction parallel optimization, utilizing SIMD single instruction multiple data instruction set for SpMV_CSR algorithm completes data parallel optimization. The experimental results indicate that the optimized SpMV_CSR algorithm has an average acceleration ratio of over 5 times compared to before optimization.

Gene heuristic multi sequence alignment algorithm based on Yarn cloud platform

Abstract：This paper proposes a gene heuristic multi sequence alignment algorithm based on the Yarn cloud platform. Establish a nucleic acid replacement equivalence matrix as a genetic heuristic mathematical model, construct the Yarn cloud platform logical architecture, and classify and save the data through steps such as gene data preprocessing, gene data storage, gene data alignment, gene data management, and gene data analysis. Divide long sequences with high error rates, and obtain multiple shorter gene fragments. Implementing localization on different fragments, generating variable length seeds, constructing skeletons and filling gaps, can achieve gene heuristic multi sequence alignment. The results show that the designed algorithm reduces processing time on different datasets, and the sum of pairs (SP) score for multi sequence alignment is higher. This experiment verifies the practicality of the multi sequence alignment method.

Parallel quantum simulation method based on circuit cutting approach

Abstract：Quantum computing has shown great potential in addressing traditional computational challenges, but due to its high error rates and noise issues, classical simulation has become an essential tool for verifying its performance. However, the superposition and entanglement properties of quantum systems pose significant challenges for simulation, especially when memory is limited. Although circuit cutting methods can decompose large-scale quantum circuits into smaller computational tasks to reduce computational load, previous research primarily focused on their application to quantum computers, without fully considering their effectiveness in quantum circuit simulation. This study fills that gap by proposing an optimization scheme based on a heuristic cutting algorithm and subcircuit state vector reuse to address memory limitations in simulations. By incorporating global computational cost considerations and an integer programming model, the heuristic method proposed in this paper not only optimizes the cutting process but also combines subcircuit state vector reuse to reduce redundant calculations and memory usage. Experimental results show that compared to current popular circuit cutting methods, the proposed approach significantly improves simulation speed while reducing memory requirements, effectively addressing the challenges in quantum circuit simulation. The overall average speedup achieved 46%.

Sparse matrix compressed storage format based on GPU

Abstract：Sparse Matrix-Vector Multiplication (SpMV) is an important linear algebraic subroutine in Matrix numerical computation. Vectorized Compressed Sparse Row (VCSR) sparse matrix compression format is proposed by studying the load balancing and memory access frequency of SpMV algorithm. This format adjusts the data load of each thread according to the statistical characteristics of the distribution of each line of non-zero elements to prevent the problem of thread divergence, and improves the computational performance of SpMV flow based on the strategy of fast segmented summation and the vectorization method. By using the Sparse matrix of the University of Florida as the test set, the performance of the GPU is tested, and the average performance improvement is 10% to 30%, and the maximum performance is 50% compared to the CSR5 (Compressed Sparse Row 5) format.

" High Performance Computering"related to papers

WeChat shop