Harness the Power of Nvidia GPU for Deep Learning
If you want to know why people prefer GPUs over CPUs for deep learning, refer to my other article here.
I will get straight to the point, so without wasting your time let's start.
My CPU and GPU specs: i5-10300H and GeForce GTX 1650 Ti 4GB GDDR6
We will follow 6 simple steps to enable the GPU for DL:
1. Install Anaconda
Download Anaconda according to your system OS (Windows, macOS, Linux) and install the package after downloading.
2. Download Visual Studio, CUDA and cuDNN
- Visual Studio Community: A fully-featured, extensible, free IDE for creating modern applications for Android, iOS, Windows, as well as web applications and cloud services.
- CUDA: In GPU-accelerated applications, the sequential part of the workload runs on the CPU, which is optimized for single-threaded performance, while the compute-intensive portion of the application runs on thousands of GPU cores in parallel. When using CUDA, developers program in popular languages such as C, C++, Fortran, Python and MATLAB and express parallelism through extensions in the form of a few basic keywords.
- cuDNN: The NVIDIA CUDA® Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. cuDNN provides highly tuned implementations for standard routines such as forward and backward convolution, pooling, normalization, and activation layers.
You can find the corresponding versions of CUDA and cuDNN to download by referring to the image below, which is provided by TensorFlow itself.
I will use Python 3.7, so I have to install the highlighted versions of the applications, i.e. Microsoft Visual Studio Community 2019, cuDNN 7.6 and CUDA 10.1.
3. Install Visual Studio Community 2019 before installing CUDA, without selecting any extra packages
Now, install the CUDA executable file (.exe); it should take a minute or so.
cuDNN is actually a folder which includes the bin, include, and lib files.
Paste the bin, include, and lib files of cuDNN into the corresponding places in the CUDA 10.1 installation path:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1
You can refer to the official Nvidia website for this.
4. Install TensorFlow
We don't want to mess up the base environment, so we will create a new environment for DL.
When installing TensorFlow, it installs tensorflow-estimator 2.5, which is not compatible with tensorflow-gpu 2.3, so we have to downgrade it manually to make everything work.
conda create -n dl python=3.7
conda activate dl
conda install keras jupyter tensorflow==2.3
conda install --update-specs tensorflow-estimator==2.3
pip3 install tensorflow-gpu==2.3
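To make sure the environment ended up with the version we expect, here is a quick optional sanity check (a minimal sketch, not part of the original steps; run it with the dl environment activated):
import tensorflow as tf
# Confirm the installed TensorFlow version matches what we installed above.
print(tf.__version__)  # expected: 2.3.x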
5. Testing with code
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
This will give output something like this (your GPU should appear in the device list):
import tensorflow as tf
print(tf.test.is_built_with_cuda())
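If you want a more direct confirmation that TensorFlow can actually see the GPU (a small optional check, not part of the original test snippet), you can also list the physical GPU devices:
import tensorflow as tf
# List the GPUs visible to TensorFlow; an empty list means the
# CUDA/cuDNN setup was not picked up correctly.
print(tf.config.list_physical_devices('GPU'))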
6. Finally, train a CNN model
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
num_classes = 10
input_shape = (28, 28, 1)
# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
# Scale images to the [0, 1] range
x_train = x_train.astype("float32") / 255
x_test = x_test.astype("float32") / 255
# Make sure images have shape (28, 28, 1)
x_train = np.expand_dims(x_train, -1)
x_test = np.expand_dims(x_test, -1)
print("x_train shape:", x_train.shape)
print(x_train.shape[0], "train samples")
print(x_test.shape[0], "test samples")
# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
model = keras.Sequential(
[
keras.Input(shape=input_shape),
layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
layers.MaxPooling2D(pool_size=(2, 2)),
layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
layers.MaxPooling2D(pool_size=(2, 2)),
layers.Flatten(),
layers.Dropout(0.5),
layers.Dense(num_classes, activation="softmax"),
]
)
model.summary()
batch_size = 128
epochs = 5
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_split=0.1)
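The script above holds out x_test and y_test but never uses them; as an optional follow-up (a small addition, not part of the original script), you can evaluate the trained model on the test set:
# Evaluate the trained model on the held-out test set.
score = model.evaluate(x_test, y_test, verbose=0)
print("Test loss:", score[0])
print("Test accuracy:", score[1])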
On my system it took 12–16 seconds to train for 5 epochs with an accuracy of 98.21%, while if you train with the CPU it takes 35–40 seconds. You can clearly see the huge difference.
That’s all for today
Thank you for Reading 😀