Adding GPUs to your Pulsar setup

GPUs are widely used to accelerate computationally intensive tasks, thanks to the parallelism built into this kind of hardware. If your cloud provider makes GPUs available to your tenant, you can use them in many scientific contexts, such as molecular docking, prediction and search of molecular structures, or machine learning applications.

The following steps describe how to add GPU nodes to the compute cluster created by following the instructions in the previous section.

Prerequisites

You know the name of the OpenStack flavor that can be used to instantiate a VM with one or more GPU devices attached, and the number of such VMs that you are allowed to create.

Software provided

The VGCN image provides all the software needed to use an NVIDIA GPU and to submit GPU jobs to the HTCondor queue manager, including jobs that run inside a Docker container.

The current VGCN image provides the following packages to your VMs:

CUDA Toolkit 10.1
Docker 19.03.8
NVIDIA Container Toolkit 1.1.1

Note that the NVIDIA software is installed at runtime, by a cloud-init task, during the first boot. It may therefore take some time after the VM starts before the GPU is usable.
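
Because the installation runs in the background, a script that needs the GPU right after boot may have to wait for it. The following is a minimal sketch and not part of the VGCN image: wait_for_command is a hypothetical helper, and the timeout and polling interval are assumptions to adjust for your cloud.

```shell
# Poll until a command appears on PATH, or give up after a timeout.
# Usage: wait_for_command <command> [timeout_seconds] [interval_seconds]
wait_for_command() {
  local cmd="$1" timeout="${2:-600}" interval="${3:-10}" waited=0
  until command -v "$cmd" >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out waiting for $cmd" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}

# Example, to be run on the GPU node (values are illustrative):
#   wait_for_command nvidia-smi 900 15 && nvidia-smi
```

Where the installed cloud-init version supports it, cloud-init status --wait is another way to block until all cloud-init tasks have finished.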

Configuration

In the preparation step, you created a directory named <workspace-name> containing a vars.tf file with all the parameters needed to configure the Pulsar endpoint.

Edit the flavors and gpu_node_count variables in <workspace-name>/vars.tf, replacing the default values with your own.

Example:

variable "flavors" {
  type = "map"
  default = {
    "central-manager" = "m1.medium"
    "nfs-server" = "m1.medium"
    "exec-node" = "m1.medium"
    "gpu-node" = "gpu_flavor_name"  # <-- name of your GPU flavor
  }
}

variable "gpu_node_count" {
  default = 10                      # <-- number of GPU nodes to create
}

Now you can validate the new Terraform configuration:

WS=<workspace-name> make plan

If the previous step does not report any errors, apply the new configuration:

WS=<workspace-name> make apply

Test your setup

Log in to one of your new GPU-enabled workers and type:

nvidia-smi

You should see output similar to this:

$ nvidia-smi
Tue May 19 17:51:12 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64.00    Driver Version: 440.64.00    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:05.0 Off |                    0 |
| N/A   37C    P0    21W /  70W |      0MiB / 15109MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
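
For scripted checks, the full table is awkward to parse. nvidia-smi can also print selected fields as CSV; the sketch below assumes that mode. The helper summarise_gpus is hypothetical, and it reads the CSV from standard input so that it can be tried without GPU hardware.

```shell
# Summarise GPUs from CSV lines of the form "name, driver_version, memory.total".
# Exits with status 1 when no GPU line was seen.
summarise_gpus() {
  awk -F', ' '
    NF >= 3 { n++; printf "GPU %d: %s (driver %s, %s)\n", n - 1, $1, $2, $3 }
    END     { exit (n == 0) }
  '
}

# Example, to be run on the GPU node:
#   nvidia-smi --query-gpu=name,driver_version,memory.total \
#              --format=csv,noheader | summarise_gpus
```

A non-zero exit status from the pipeline is then a simple signal that the driver is not ready or that no GPU is attached to the VM.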

Running nvidia-smi inside a CUDA Docker container should give the same result:

$ docker run --gpus all nvidia/cuda:10.1-base nvidia-smi
Tue May 19 16:08:27 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64.00    Driver Version: 440.64.00    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:05.0 Off |                    0 |
| N/A   37C    P0    20W /  70W |      0MiB / 15109MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
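
Once both checks pass, a test job can be sent through HTCondor to confirm that the scheduler hands out the GPU. The following is a minimal sketch, assuming that the GPU nodes advertise their GPUs to the pool and that nvidia-smi is installed at /usr/bin/nvidia-smi; the helper write_gpu_submit_file and the gpu_test.* file names are hypothetical.

```shell
# Write a minimal HTCondor submit description that requests one GPU
# and runs nvidia-smi on whichever node the job is matched to.
write_gpu_submit_file() {
  local path="${1:-gpu_test.sub}"
  cat > "$path" <<'EOF'
universe              = vanilla
executable            = /usr/bin/nvidia-smi
transfer_executable   = false
request_gpus          = 1
request_cpus          = 1
output                = gpu_test.out
error                 = gpu_test.err
log                   = gpu_test.log
queue
EOF
}

# Example, to be run on a node that can submit jobs:
#   write_gpu_submit_file gpu_test.sub && condor_submit gpu_test.sub
```

If the job completes, gpu_test.out should contain an nvidia-smi table like the ones above. A job that stays idle usually means that no node in the pool matches the request_gpus requirement.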