Accelerating with numba
2021-12-20 17:04:55
songjie

Using numba to accelerate large-scale chi-square tests for detecting interactions

https://numba.pydata.org/numba-doc/latest/cuda

https://zhuanlan.zhihu.com/p/68846159

 

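For now, the four core stages are presumed to be: transferring in-memory data to the GPU, computing matrix statistics on the GPU, performing matrix operations on the GPU, and running the computation in parallel on the GPU.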

 

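1. Data transfer

To copy a NumPy array from host to device:

import numpy as np
from numba import cuda

ary = np.arange(10)
d_ary = cuda.to_device(ary)
# Alternatively, enqueue the copy on a CUDA stream
stream = cuda.stream()
d_ary = cuda.to_device(ary, stream=stream)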

 

To copy from device back to host:

# Let numba allocate the host array
hary = d_ary.copy_to_host()
# Or copy into a preallocated host buffer
ary = np.empty(shape=d_ary.shape, dtype=d_ary.dtype)
d_ary.copy_to_host(ary)

2. Matrix statistics and operations
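As a rough illustration of what "matrix statistics" and "matrix operations" could mean for a chi-square interaction test, the sketch below counts a 9 x 2 contingency table for one pair of markers and then computes the Pearson chi-square statistic from it. It assumes genotypes coded 0/1/2 and a binary phenotype coded 0/1, and runs on the CPU with plain NumPy. The function names `contingency_table` and `chi2_from_table` are hypothetical, not part of numba or the original post.

```python
import numpy as np

def contingency_table(geno_a, geno_b, pheno):
    """Count samples per (genotype pair, phenotype) cell.

    Assumes genotypes are coded 0/1/2 and the phenotype 0/1.
    """
    # Map each sample to one of 9 * 2 = 18 cells and count occurrences.
    cell = (geno_a * 3 + geno_b) * 2 + pheno
    return np.bincount(cell, minlength=18).reshape(9, 2).astype(np.float64)

def chi2_from_table(table):
    """Pearson chi-square statistic of a 2-D table of counts."""
    row = table.sum(axis=1)
    col = table.sum(axis=0)
    expected = np.outer(row, col) / table.sum()
    mask = expected > 0  # skip genotype combinations that never occur
    return float((((table - expected) ** 2)[mask] / expected[mask]).sum())
```

Counting is the "statistics" step and the outer product of the margins is the "operation" step; both are the kind of work that would move into a GPU kernel.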

3. Simple implementation:

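This is a simple implementation of matrix multiplication as a CUDA kernel. Basic NumPy array operations such as indexing and reading shape work inside the kernel: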

from numba import cuda

@cuda.jit
def matmul(A, B, C):
    """Perform matrix multiplication C = A * B."""
    # Absolute position of this thread in the 2-D grid
    i, j = cuda.grid(2)
    # Threads that fall outside C do nothing
    if i < C.shape[0] and j < C.shape[1]:
        tmp = 0.
        for k in range(A.shape[1]):
            tmp += A[i, k] * B[k, j]
        C[i, j] = tmp
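The bounds check in the kernel matters because the launch grid is normally rounded up to whole blocks, so some threads land outside C. A minimal sketch of how the launch dimensions might be computed follows. The helper name `launch_config` and the 16 x 16 block size are assumptions chosen for illustration, and the actual launch is left as a comment because it needs a CUDA GPU.

```python
import math

def launch_config(shape, threads_per_block=(16, 16)):
    """Return (blocks_per_grid, threads_per_block) covering a 2-D output."""
    # Round up so every output element gets at least one thread.
    blocks_per_grid = tuple(
        math.ceil(n / t) for n, t in zip(shape, threads_per_block)
    )
    return blocks_per_grid, threads_per_block

# Example: an output matrix C of shape (1000, 500)
blocks, threads = launch_config((1000, 500))

# On a machine with a CUDA GPU the kernel would then be launched as:
# matmul[blocks, threads](d_A, d_B, d_C)
```

The first grid dimension corresponds to `i` (rows of C) and the second to `j` (columns of C), matching the order returned by `cuda.grid(2)`.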

4. Parallel computation:

Use the @njit decorator together with the prange function.

The example below demonstrates a parallel loop with a reduction (A is a one-dimensional Numpy array):

from numba import njit, prange

@njit(parallel=True)
def prange_test(A):
    s = 0
    # Without "parallel=True" in the jit-decorator
    # the prange statement is equivalent to range
    for i in prange(A.shape[0]):
        s += A[i]
    return s
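Applied to the interaction scan, the same pattern could parallelise over marker pairs instead of summing one array. The sketch below is one possible layout, not a tuned implementation: the function name `pairwise_chi2`, the 0/1/2 genotype coding and the 0/1 phenotype coding are assumptions. Each `prange` iteration writes only to its own row of the output, so no reduction is needed. The `ImportError` fallback is also an assumption, added only so the sketch still runs (slowly) where numba is not installed.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:  # assumption: plain-Python fallback when numba is missing
    prange = range

    def njit(*args, **kwargs):
        def wrap(func):
            return func
        return wrap

@njit(parallel=True)
def pairwise_chi2(geno, pheno):
    """Chi-square statistic for every marker pair (a, b) with a < b.

    geno:  (n_samples, n_markers) integer array, values 0/1/2 (assumed coding)
    pheno: (n_samples,) integer array, values 0/1 (assumed coding)
    """
    n_samples, n_markers = geno.shape
    out = np.zeros((n_markers, n_markers))
    for a in prange(n_markers):  # outer loop is split across threads
        for b in range(a + 1, n_markers):
            # 9 genotype combinations x 2 phenotype classes
            table = np.zeros((9, 2))
            for s in range(n_samples):
                table[geno[s, a] * 3 + geno[s, b], pheno[s]] += 1.0
            col0 = 0.0
            col1 = 0.0
            for r in range(9):
                col0 += table[r, 0]
                col1 += table[r, 1]
            total = col0 + col1
            stat = 0.0
            for r in range(9):
                row = table[r, 0] + table[r, 1]
                if row > 0.0:
                    e0 = row * col0 / total
                    e1 = row * col1 / total
                    if e0 > 0.0:
                        stat += (table[r, 0] - e0) ** 2 / e0
                    if e1 > 0.0:
                        stat += (table[r, 1] - e1) ** 2 / e1
            out[a, b] = stat
    return out
```

Only the upper triangle of the result is filled; entries with a >= b stay zero.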
