Home / NVIDIA A100 MIG Cheat Sheat

NVIDIA A100 MIG Cheat Sheat

mig

MIG is only compatible with Linux distributions that support CUDA 11/R450 or higher. It’s also a good idea to use the NVIDIA Datacenter Linux driver version 450.80.02 or higher. MIG is not supported on any Windows Pro or Server OS.

The new Multi-Instance GPU (MIG) functionality allows NVIDIA Ampere-based GPUs (such as the NVIDIA A100) to be securely partitioned into up to seven independent GPU Instances for CUDA applications, giving different users separate GPU resources for optimal GPU use.

This functionality is especially useful for workloads that do not completely utilize the GPU’s computing capabilities, and users may want to run many tasks in parallel to get the most out of their GPU.

The following prerequisites and software versions are highly recommended when using supported GPUs in MIG mode.

Only the GPUs and systems listed below are compatible with MIG.

  • NVIDIA driver 450.80.02 or later is required for CUDA 11.
  • Linux distributions that were supported by CUDA 11

If you are using Kubernetes or containers, make sure to use;

  • Version 2.5.0 or later of the NVIDIA Container Toolkit (nvidia-docker2)
  • v0.7.0 or newer of the NVIDIA K8s Device Plugin
  • Version 0.2.0 or later of NVIDIA gpu-feature-discovery

NVIDIA Management Library (NVML) APIs or the nvidia-smi command-line interface can be used to manage MIG programmatically. Some of the nvidia-smi output in the following examples may be cropped for brevity to highlight the relevant sections of interest.

Enable or Disable MIG

To enable or disable MIG on all GPUs or a certain GPU, enable persistence mode on all GPUs.

nvidia-smi -pm 1

root@server:~# nvidia-smi -pm 1
Enabled persistence mode for GPU 00000000:31:00.0.
Enabled persistence mode for GPU 00000000:CA:00.0.
All done.

Then enable MIG by using the following command.

root@server:~# nvidia-smi -mig 1

root@server:~# nvidia-smi
Tue Feb 22 11:47:45 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.39.01    Driver Version: 510.39.01    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100 80G...  On   | 00000000:31:00.0 Off |                   On |
| N/A   34C    P0    41W / 300W |      0MiB / 81920MiB |     N/A      Default |
|                               |                      |              Enabled |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A100 80G...  On   | 00000000:CA:00.0 Off |                   On |
| N/A   34C    P0    43W / 300W |      0MiB / 81920MiB |     N/A      Default |
|                               |                      |              Enabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| MIG devices:                                                                |
+------------------+----------------------+-----------+-----------------------+
| GPU  GI  CI  MIG |         Memory-Usage |        Vol|         Shared        |
|      ID  ID  Dev |           BAR1-Usage | SM     Unc| CE  ENC  DEC  OFA  JPG|
|                  |                      |        ECC|                       |
|==================+======================+===========+=======================|
|  No MIG devices found                                                       |
+-----------------------------------------------------------------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

To disable MIG use the following command.

nvidia-smi -mig 0

root@server:~# nvidia-smi -mig 0
Disabled MIG Mode for GPU 00000000:31:00.0
Disabled MIG Mode for GPU 00000000:CA:00.0
All done.

To enable or disable MIG on a specific device.

nvidia-smi -i 3 -mig 1
nvidia-smi -i 3 -mig 0

List all GPU Instances

root@server:~# nvidia-smi mig -lgip
+-----------------------------------------------------------------------------+
| GPU instance profiles:                                                      |
| GPU   Name             ID    Instances   Memory     P2P    SM    DEC   ENC  |
|                              Free/Total   GiB              CE    JPEG  OFA  |
|=============================================================================|
|   0  MIG 1g.10gb       19     7/7        9.50       No     14     0     0   |
|                                                             1     0     0   |
+-----------------------------------------------------------------------------+
|   0  MIG 1g.10gb+me    20     1/1        9.50       No     14     1     0   |
|                                                             1     1     1   |
+-----------------------------------------------------------------------------+
|   0  MIG 2g.20gb       14     3/3        19.50      No     28     1     0   |
|                                                             2     0     0   |
+-----------------------------------------------------------------------------+
|   0  MIG 3g.40gb        9     2/2        39.25      No     42     2     0   |
|                                                             3     0     0   |
+-----------------------------------------------------------------------------+
|   0  MIG 4g.40gb        5     1/1        39.25      No     56     2     0   |
|                                                             4     0     0   |
+-----------------------------------------------------------------------------+
|   0  MIG 7g.80gb        0     1/1        78.75      No     98     5     0   |
|                                                             7     1     1   |
+-----------------------------------------------------------------------------+
|   1  MIG 1g.10gb       19     7/7        9.50       No     14     0     0   |
|                                                             1     0     0   |
+-----------------------------------------------------------------------------+
|   1  MIG 1g.10gb+me    20     1/1        9.50       No     14     1     0   |
|                                                             1     1     1   |
+-----------------------------------------------------------------------------+
|   1  MIG 2g.20gb       14     3/3        19.50      No     28     1     0   |
|                                                             2     0     0   |
+-----------------------------------------------------------------------------+
|   1  MIG 3g.40gb        9     2/2        39.25      No     42     2     0   |
|                                                             3     0     0   |
+-----------------------------------------------------------------------------+
|   1  MIG 4g.40gb        5     1/1        39.25      No     56     2     0   |
|                                                             4     0     0   |
+-----------------------------------------------------------------------------+
|   1  MIG 7g.80gb        0     1/1        78.75      No     98     5     0   |
|                                                             7     1     1   |
+-----------------------------------------------------------------------------+

Create GPU instances on MIG-enabled GPU

root@server:~# nvidia-smi mig -cgi 19,19,19,19,19,19,19 -i 3
Successfully created GPU instance ID 13 on GPU  0 using profile MIG 1g.5gb (ID 19)
Successfully created GPU instance ID 11 on GPU  0 using profile MIG 1g.5gb (ID 19)
Successfully created GPU instance ID 12 on GPU  0 using profile MIG 1g.5gb (ID 19)
Successfully created GPU instance ID  7 on GPU  0 using profile MIG 1g.5gb (ID 19)
Successfully created GPU instance ID  8 on GPU  0 using profile MIG 1g.5gb (ID 19)
Successfully created GPU instance ID  9 on GPU  0 using profile MIG 1g.5gb (ID 19)
Successfully created GPU instance ID 10 on GPU  0 using profile MIG 1g.5gb (ID 19)

Create Compute instances on a MIG-enabled GPU

root@server:~# nvidia-smi mig -cci -i 3
Successfully created compute instance ID  0 on GPU  0 GPU instance ID  7 using profile MIG 1g.5gb (ID  0)
Successfully created compute instance ID  0 on GPU  0 GPU instance ID  8 using profile MIG 1g.5gb (ID  0)
Successfully created compute instance ID  0 on GPU  0 GPU instance ID  9 using profile MIG 1g.5gb (ID  0)
Successfully created compute instance ID  0 on GPU  0 GPU instance ID 10 using profile MIG 1g.5gb (ID  0)
Successfully created compute instance ID  0 on GPU  0 GPU instance ID 11 using profile MIG 1g.5gb (ID  0)
Successfully created compute instance ID  0 on GPU  0 GPU instance ID 12 using profile MIG 1g.5gb (ID  0)
Successfully created compute instance ID  0 on GPU  0 GPU instance ID 13 using profile MIG 1g.5gb (ID  0)

View MIG Instances

root@server:~# nvidia-smi

+-----------------------------------------------------------------------------+
| MIG devices:                                                                |
+------------------+----------------------+-----------+-----------------------+
| GPU  GI  CI  MIG |         Memory-Usage |        Vol|         Shared        |
|      ID  ID  Dev |           BAR1-Usage | SM     Unc| CE  ENC  DEC  OFA  JPG|
|                  |                      |        ECC|                       |
|==================+======================+===========+=======================|
|  0    7   0   0  |      1MiB /  4864MiB | 14      0 |  1   0    0    0    0 |
|                  |      0MiB /  8191MiB |           |                       |
+------------------+----------------------+-----------+-----------------------+
|  0    8   0   1  |      1MiB /  4864MiB | 14      0 |  1   0    0    0    0 |
|                  |      0MiB /  8191MiB |           |                       |
+------------------+----------------------+-----------+-----------------------+
|  0    9   0   2  |      1MiB /  4864MiB | 14      0 |  1   0    0    0    0 |
|                  |      0MiB /  8191MiB |           |                       |
+------------------+----------------------+-----------+-----------------------+
|  0   10   0   3  |      1MiB /  4864MiB | 14      0 |  1   0    0    0    0 |
|                  |      0MiB /  8191MiB |           |                       |
+------------------+----------------------+-----------+-----------------------+
|  0   11   0   4  |      1MiB /  4864MiB | 14      0 |  1   0    0    0    0 |
|                  |      0MiB /  8191MiB |           |                       |
+------------------+----------------------+-----------+-----------------------+
|  0   12   0   5  |      1MiB /  4864MiB | 14      0 |  1   0    0    0    0 |
|                  |      0MiB /  8191MiB |           |                       |
+------------------+----------------------+-----------+-----------------------+
|  0   13   0   6  |      1MiB /  4864MiB | 14      0 |  1   0    0    0    0 |
|                  |      0MiB /  8191MiB |           |                       |
+------------------+----------------------+-----------+-----------------------+
root@server:~# nvidia-smi -L
GPU 0: A100-PCIE-40GB (UUID: GPU-0069414c-9f30-41f9-d5d8-87890423f0c4)
  MIG 1g.5gb Device 0: (UUID: MIG-GPU-0069414c-9f30-41f9-d5d8-87890423f0c4/7/0)
  MIG 1g.5gb Device 1: (UUID: MIG-GPU-0069414c-9f30-41f9-d5d8-87890423f0c4/8/0)
  MIG 1g.5gb Device 2: (UUID: MIG-GPU-0069414c-9f30-41f9-d5d8-87890423f0c4/9/0)
  MIG 1g.5gb Device 3: (UUID: MIG-GPU-0069414c-9f30-41f9-d5d8-87890423f0c4/10/0)
  MIG 1g.5gb Device 4: (UUID: MIG-GPU-0069414c-9f30-41f9-d5d8-87890423f0c4/11/0)
  MIG 1g.5gb Device 5: (UUID: MIG-GPU-0069414c-9f30-41f9-d5d8-87890423f0c4/12/0)
  MIG 1g.5gb Device 6: (UUID: MIG-GPU-0069414c-9f30-41f9-d5d8-87890423f0c4/13/0)

List GPU Instances

root@server:~# nvidia mig -lgi

List Compute Instances

root@server:~# nvidia mig -lci

+-------------------------------------------------------+
| Compute instances:                                    |
| GPU     GPU       Name             Profile   Instance |
|       Instance                       ID        ID     |
|         ID                                            |
|=======================================================|
|   0      7       MIG 1g.5gb           0         0     |
+-------------------------------------------------------+
|   0      8       MIG 1g.5gb           0         0     |
+-------------------------------------------------------+
|   0      9       MIG 1g.5gb           0         0     |
+-------------------------------------------------------+
|   0     11       MIG 1g.5gb           0         0     |
+-------------------------------------------------------+
|   0     12       MIG 1g.5gb           0         0     |
+-------------------------------------------------------+
|   0     13       MIG 1g.5gb           0         0     |
+-------------------------------------------------------+
|   0     14       MIG 1g.5gb           0         0     |
+-------------------------------------------------------+

Destroy Compute Instances

root@server:~# nvidia-smi mig -dci -i 3
Successfully destroyed compute instance ID  0 from GPU  0 GPU instance ID  7
Successfully destroyed compute instance ID  0 from GPU  0 GPU instance ID  8
Successfully destroyed compute instance ID  0 from GPU  0 GPU instance ID  9
Successfully destroyed compute instance ID  0 from GPU  0 GPU instance ID 10
Successfully destroyed compute instance ID  0 from GPU  0 GPU instance ID 11
Successfully destroyed compute instance ID  0 from GPU  0 GPU instance ID 12
Successfully destroyed compute instance ID  0 from GPU  0 GPU instance ID 13

Destroy GPU Instances

root@server:~# nvidia-smi mig -dgi -i 3
Successfully destroyed GPU instance ID  7 from GPU  0
Successfully destroyed GPU instance ID  8 from GPU  0
Successfully destroyed GPU instance ID  9 from GPU  0
Successfully destroyed GPU instance ID 10 from GPU  0
Successfully destroyed GPU instance ID 11 from GPU  0
Successfully destroyed GPU instance ID 12 from GPU  0
Successfully destroyed GPU instance ID 13 from GPU  0

Getting MIG Help

root@server:~# nvidia-smi mig -h

    mig -- Multi Instance GPU management.

    Usage: nvidia-smi mig [options]

    Options include:
    [-h | --help]: Display help information.
    [-i | --id]: Enumeration index, PCI bus ID or UUID.
                 Provide comma separated values for more than one device.
    [-gi | --gpu-instance-id]: GPU instance ID.
                               Provide comma separated values for more than one GPU instance.
    [-ci | --compute-instance-id]: Compute instance ID.
                                   Provide comma separated values for more than one compute 
                                   instance.
    [-lgip | --list-gpu-instance-profiles]: List supported GPU instance profiles.
                                            Option -i can be used to restrict the command to
                                            run on a specific GPU.
    [-lgipp | --list-gpu-instance-possible-placements]: List possible GPU instance placements
                                                        in the following format, {Start}:Size.
                                                        Option -i can be used to restrict the
                                                        command to run on a specific GPU.
    [-C | --default-compute-instance]: Create compute instance with the default profile when used
                                       with the option to create a GPU instance (-cgi).
    [-cgi | --create-gpu-instance]: Create GPU instances for the given profile tuples. A profile
                                    tuple consists of a profile name or ID and an optional placement
                                    specifier, which consists of a colon and a placement start index.
                                    Provide comma separated values for more than one profile tuple.
                                    Option -i can be used to restrict the command to run on
                                    a specific GPU.
    [-dgi | --destroy-gpu-instance]: Destroy GPU instances.
                                     Options -i and -gi can be used individually or combined
                                     to restrict the command to run on a specific GPU or GPU
                                     instance.
    [-lgi | --list-gpu-instances]: List GPU instances.
                                   Option -i can be used to restrict the command to run on a
                                   specific GPU.
    [-lcip | --list-compute-instance-profiles]: List supported compute instance profiles.
                                                Options -i and -gi can be used individually or
                                                combined to restrict the command to run on a
                                                specific GPU or GPU instance.
    [-cci | --create-compute-instance]: Create compute instance for the given profile name or IDs.
                                        Provide comma separated values for more than one profile.
                                        If no profile name or ID is given, then the default*
                                        compute instance profile ID will be used. Options -i and
                                        -gi can be used individually or combined to restrict the
                                        command to run on a specific GPU or GPU instance.
    [-dci | --destroy-compute-instance]: Destroy compute instances.
                                         Options -i, -gi and -ci can be used individually or
                                         combined to restrict the command to run on a specific
                                         GPU or GPU instance or compute instance.
    [-lci | --list-compute-instances]: List compute instances.
                                       Options -i and -gi can be used individually or combined
                                       to restrict the command to run on a specific GPU or GPU
                                       instance.

3 comments

  1. Rahul says:

    Do you also have cheatsheet on how MIG works in ESXi..
    I am having hard time in making it work on ESXi,
    I see VMs doesn’t power on post attaching vGPU GRID profile to it

    • admin says:

      Hi, Have you installed GRID software in ESXi. Make sure you have Linux based OS like Ubuntu in VM. MIG is not supported in Windows. MIG also need to be enabled in GPU card like A100.

    • admin says:

      Hi, Secondly MIG cannot be enabled or configured in ESXi. Just enabled MIG compatible GPU and attach it to VM. MIG is enabled in VM using nvidia-smi. Also make sure NVIDIA license is installed in ESXi for using GPU.

Leave a Reply