...

About GPU Resources in O2

The first 3 GPU nodes are now available on O2, including 4 Tesla M40 and 16 Tesla K80 GPUs. To list information about all the nodes with GPU resources you can use the command: 

Code Block
languagetext
login01:~$ sinfo --Format=nodehost,cpusstate,memory,gres|grep 'HOSTNAMES\|gpu'
HOSTNAMES           CPUS(A/I/O/T)       MEMORY              GRES
compute-g-16-177    0/0/24/24           257548              gpu:teslaK80:8
compute-g-16-176    8/12/0/20           257548              gpu:teslaM40:4
compute-g-16-194    6/14/0/20           257548              gpu:teslaK80:8

GPU Partition Limits

The following limits are applied to this partition in order to ensure fair use of the limited resources:

GPU hours

The amount of GPU resources that can be used by each user at any time in the O2 cluster is measured in terms of GPU hours per user; currently there is an active limit of 72 GPU hours for each user.

For example, at any time each user can allocate* at most 1 GPU card for 72 hours, 12 GPU cards for 6 hours, or any other combination that does not exceed the total GPU hours limit (for instance, 6 GPU cards for 12 hours).
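
To make the arithmetic concrete, the sketch below checks whether a hypothetical request stays within the 72 GPU hour limit. This is only an illustration, not an O2 tool, and the example values are arbitrary:

Code Block
languagebash
#!/bin/bash
# Illustrative only: check whether a hypothetical request fits the
# 72 GPU hours / user limit described above.
GPUS=6          # number of GPU cards requested (arbitrary example value)
HOURS=12        # requested wall time in hours (arbitrary example value)
LIMIT=72        # current per-user GPU hours limit

USED=$(( GPUS * HOURS ))
if [ "$USED" -le "$LIMIT" ]; then
    echo "${GPUS} GPUs x ${HOURS} h = ${USED} GPU hours: within the ${LIMIT} GPU hour limit"
else
    echo "${GPUS} GPUs x ${HOURS} h = ${USED} GPU hours: exceeds the ${LIMIT} GPU hour limit"
fi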


Memory

The total amount of memory that each user can have allocated across all of their running GPU jobs is set to 250GB.

CPU cores

The total number of CPU cores that each user can have allocated across all of their running GPU jobs is set to 20.


Those limits will be adjusted as we migrate additional GPU nodes from the older cluster to O2. 

* as resources allow 

How to compile CUDA programs

...
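
The exact compilation steps are omitted above; as a rough, assumption-based sketch (the module name and the hello.cu example are placeholders, not taken from the O2 documentation), compiling a simple CUDA program with nvcc typically looks like this:

Code Block
languagebash
# Write a minimal CUDA source file (illustrative placeholder)
cat > hello.cu <<'EOF'
#include <cstdio>

__global__ void hello_kernel() {
    printf("Hello from GPU thread %d\n", threadIdx.x);
}

int main() {
    hello_kernel<<<1, 4>>>();
    cudaDeviceSynchronize();
    return 0;
}
EOF

# Load a CUDA module so that nvcc is on the PATH
# (the exact module name and version on O2 may differ)
module load cuda

# Compile the source file with the NVIDIA compiler
nvcc hello.cu -o hello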

To submit a GPU job in O2 you need to use the gpu partition and add the flag --gres=gpu:1 to request a GPU resource. The example below shows how to start an interactive bash job requesting 1 CPU core and 1 GPU card:

...
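
In addition to the interactive example above, a batch submission might look like the sketch below. The wall time, memory value, output file, and program name are illustrative assumptions; only the gpu partition and the --gres=gpu:1 flag come from the text above:

Code Block
languagebash
#!/bin/bash
#SBATCH -p gpu                # submit to the gpu partition
#SBATCH --gres=gpu:1          # request 1 GPU card
#SBATCH -c 1                  # request 1 CPU core
#SBATCH -t 12:00:00           # wall time (illustrative value)
#SBATCH --mem=8G              # memory (illustrative value, within the per-user limit)
#SBATCH -o gpu_job_%j.out     # output file (illustrative name)

# Run the GPU application (placeholder program name)
./my_gpu_program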