Picture by Creator

CuPy is a Python library that’s appropriate with NumPy and SciPy arrays, designed for GPU-accelerated computing. By changing NumPy with CuPy syntax, you’ll be able to run your code on NVIDIA CUDA or AMD ROCm platforms. This lets you carry out array-related duties utilizing GPU acceleration, which ends up in sooner processing of bigger arrays.

By swapping out only a few traces of code, you’ll be able to make the most of the huge parallel processing energy of GPUs to considerably velocity up array operations like indexing, normalization, and matrix multiplication.

CuPy additionally allows entry to low-level CUDA options. It permits passing of `ndarrays`

to current CUDA C/C++ packages utilizing RawKernels, streamlines efficiency with Streams, and allows direct calling of CUDA Runtime APIs.

You’ll be able to set up CuPy utilizing pip, however earlier than that it’s a must to discover out the proper CUDA model utilizing the command under.

```
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Company
Constructed on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation instruments, launch 11.8, V11.8.89
Construct cuda_11.8.r11.8/compiler.31833905_0
```

It appears that evidently the present model of Google Colab is utilizing CUDA model 11.8. Subsequently, we are going to proceed to put in the `cupy-cuda11x`

model.

In case you are operating on an older CUDA model, I’ve supplied a desk under that can assist you decide the suitable CuPy package deal to put in.

Picture from CuPy 12.2.0

After deciding on the proper model, we are going to set up the Python package deal utilizing pip.

You may as well use the `conda`

command to mechanically detect and set up the right model of the CuPy package deal when you’ve got Anaconda put in.

`conda set up -c conda-forge cupy`

On this part, we are going to evaluate the syntax of CuPy with Numpy and they’re 95% comparable. As an alternative of utilizing `np`

you’ll be changing it with `cp`

.

We’ll first create a NumPy and CuPy array utilizing the Python checklist. After that, we are going to calculate the norm of the vector.

```
import cupy as cp
import numpy as np
x = [3, 4, 5]
x_np = np.array(x)
x_cp = cp.array(x)
l2_np = np.linalg.norm(x_np)
l2_cp = cp.linalg.norm(x_cp)
print("Numpy: ", l2_np)
print("Cupy: ", l2_cp)
```

As we will see, we received comparable outcomes.

```
Numpy: 7.0710678118654755
Cupy: 7.0710678118654755
```

To transform a NumPy to CuPy array, you’ll be able to merely use `cp.asarray(X)`

.

```
x_array = np.array([10, 22, 30])
x_cp_array = cp.asarray(x_array)
sort(x_cp_array)
```

Or, use `.get()`

, to transform CuPy to Numpy array.

```
x_np_array = x_cp_array.get()
sort(x_np_array)
```

On this part, we shall be evaluating the efficiency of NumPy and CuPy.

We’ll use `time.time()`

to time the code execution time. Then, we are going to create a 3D NumPy array and carry out some mathematical features.

```
import time
# NumPy and CPU Runtime
s = time.time()
x_cpu = np.ones((1000, 100, 1000))
np_result = np.sqrt(np.sum(x_cpu**2, axis=-1))
e = time.time()
np_time = e - s
print("Time consumed by NumPy: ", np_time)
```

`Time consumed by NumPy: 0.5474584102630615`

Equally, we are going to create a 3D CuPy array, carry out mathematical operations, and time it for efficiency.

```
# CuPy and GPU Runtime
s = time.time()
x_gpu = cp.ones((1000, 100, 1000))
cp_result = cp.sqrt(cp.sum(x_gpu**2, axis=-1))
e = time.time()
cp_time = e - s
print("nTime consumed by CuPy: ", cp_time)
```

`Time consumed by CuPy: 0.001028299331665039`

To calculate the distinction, we are going to divide NumPy time with CuPy time and It looks as if we received above 500X efficiency increase whereas utilizing CuPy.

```
diff = np_time/cp_time
print(f'nCuPy is {diff: .2f} X time sooner than NumPy')
```

`CuPy is 532.39 X time sooner than NumPy`

Observe:To attain higher outcomes, it is strongly recommended to conduct a couple of warm-up runs to attenuate timing fluctuations.

Past its velocity benefit, CuPy provides superior multi-GPU assist, enabling harnessing of collective energy of a number of GPUs.

Additionally, you’ll be able to take a look at my Colab notebook, if you wish to evaluate the outcomes.

In conclusion, CuPy offers a easy solution to speed up NumPy code on NVIDIA GPUs. By making only a few modifications to swap out NumPy for CuPy, you’ll be able to expertise order-of-magnitude speedups on array computations. This efficiency increase permits you to work with a lot bigger datasets and fashions, enabling extra superior machine studying and scientific computing.

## Assets

** Abid Ali Awan** (@1abidaliawan) is an authorized knowledge scientist skilled who loves constructing machine studying fashions. At the moment, he’s specializing in content material creation and writing technical blogs on machine studying and knowledge science applied sciences. Abid holds a Grasp’s diploma in Know-how Administration and a bachelor’s diploma in Telecommunication Engineering. His imaginative and prescient is to construct an AI product utilizing a graph neural community for college kids fighting psychological sickness.