
What is an HPC Cluster

A typical configuration for a High Performance Computing Cluster contains the following components:

  • Login Nodes: Servers that users connect to remotely and from which they submit jobs to the cluster. Memory- or CPU-intensive processes should never be run on the login nodes. On O2, CPU and memory usage on login nodes is strictly limited, so intensive processes run there will most likely be killed or perform very poorly.

  • Computing Nodes: Servers designed specifically to support memory- and CPU-intensive processes and to provide special resources (GPUs, terabytes of memory, etc.). Any job correctly submitted to the cluster is eventually dispatched by the scheduler to the first available compute node.

  • Storage Server: A system of servers storing the data used on the cluster. These are usually accessible from both login and compute nodes.

  • Scheduler: The scheduler's main task is to manage the cluster's computing resources efficiently and to dispatch jobs to computing nodes according to their priorities while maximizing overall cluster efficiency.





O2 Cluster Architecture

O2 currently includes 268 computing nodes, for a total of 8064 cores and ~68TB of memory:

  • 144 nodes with hostnames formed from the prefix compute-a-16- followed by the node number, for example compute-a-16-28, compute-a-16-29, ..., compute-a-16-171. Each node has 32 physical compute cores and 256GB of memory, and is connected to the network with a 10Gb Ethernet card and, in addition, a 40Gb InfiniBand card.
  • 69 nodes with hostnames formed from the prefix compute-e-16- followed by the node number. Each node has 28 physical compute cores and 256GB of memory, and is connected to the network with a 10Gb Ethernet card.
  • 17 nodes with hostnames formed from the prefix compute-f-16- followed by the node number. Each node has 20 physical compute cores and 188GB of memory, and is connected to the network with a 10Gb Ethernet card.
  • 2 heterogeneous high-memory nodes with hostnames formed from the prefix compute-h-16- followed by the node number. The two nodes have 16 and 12 cores, with 307GB and 768GB of memory; both are connected to the network with a 10Gb Ethernet card.
  • 4 GPU compute nodes with hostnames formed from the prefix compute-g-16- followed by the node number. Two nodes have 8 Tesla K80 computing units each and two nodes have 4 Tesla M40 units each, for a total of 24 GPU units.
  • 3 transfer nodes with hostnames formed from the prefix compute-t-16- followed by the node number. Each node is a VM with 4 cores and 6GB of memory. These nodes are intended for data transfer to and from the /files filesystem.
  • 29 contributed nodes, for a total of 1064 cores and ~10TB of memory.
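As a quick sanity check on the inventory above, the following minimal Python sketch tallies the node groups and expands the compute-a-16- hostname range. The names NODE_GROUPS, total_nodes, and compute_a_hostnames are hypothetical and exist only for this illustration; the counts and the 28 to 171 range are taken from the list above, and the hostname format is assumed to be the prefix followed directly by the node number.

```python
# Node groups as listed above: (hostname prefix or label, node count).
# The "contributed" label is a placeholder; no prefix is given for those nodes.
NODE_GROUPS = [
    ("compute-a-16-", 144),
    ("compute-e-16-", 69),
    ("compute-f-16-", 17),
    ("compute-h-16-", 2),
    ("compute-g-16-", 4),
    ("compute-t-16-", 3),
    ("contributed", 29),
]


def total_nodes(groups):
    """Sum the node counts across all groups."""
    return sum(count for _, count in groups)


def compute_a_hostnames(first=28, last=171):
    """Expand the compute-a-16- range into individual hostnames (inclusive)."""
    return [f"compute-a-16-{n}" for n in range(first, last + 1)]


print(total_nodes(NODE_GROUPS))       # 268
print(len(compute_a_hostnames()))     # 144
print(compute_a_hostnames()[0])       # compute-a-16-28
```

The totals agree with the figures quoted above: 268 nodes overall, and 144 hostnames in the compute-a-16- range.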


Detailed Node Hardware Information

Compute-a-16 CPU

vendor_id : GenuineIntel
model name : Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz
cache size : 40960 KB
flags*  : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch ida arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdseed adx smap xsaveopt cqm_llc cqm_occup_llc  

Compute-e-16 CPU

vendor_id : GenuineIntel
model name : Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz
cache size : 35840 KB
flags*  : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc

Compute-f-16 CPU

vendor_id : GenuineIntel
model name : Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
cache size : 25600 KB
flags*  :  fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt

Compute-h-16 CPU

vendor_id : GenuineIntel
model name : Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz or Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
cache size : 20480 KB or 15360 KB
flags*  :  fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat pln pts dtherm tpr_shadow vnmi flexpriority ept vpid xsaveopt

Compute-g-16 CPU

vendor_id : GenuineIntel
model name : Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz or Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz
cache size : 30720 KB or 25600 KB
flags*  :  fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc

* This information might not be relevant to most users, but it can be helpful if you are writing complex compiled code or applications.
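For compiled code that relies on specific instruction sets, it can help to check whether a node type advertises the required flags. The following is a minimal Python sketch, not an official O2 tool: parse_flags and supports are hypothetical helper names, and the two sample strings are abbreviated excerpts of the compute-a-16 and compute-f-16 flag lists above, with most flags omitted.

```python
def parse_flags(cpuinfo_text):
    """Return the set of CPU flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Accept both "flags" and the "flags*" label used on this page.
        if sep and key.strip().rstrip("*") == "flags":
            return set(value.split())
    return set()


def supports(cpuinfo_text, *required):
    """True if every required flag appears in the flags line."""
    return set(required) <= parse_flags(cpuinfo_text)


# Abbreviated excerpts of the flag lists above (most flags omitted).
COMPUTE_A_SAMPLE = "flags*  : fpu sse sse2 fma avx f16c avx2 bmi2"
COMPUTE_F_SAMPLE = "flags*  : fpu sse sse2 avx f16c"

print(supports(COMPUTE_A_SAMPLE, "avx2", "fma"))  # True
print(supports(COMPUTE_F_SAMPLE, "avx2"))         # False
print(supports(COMPUTE_F_SAMPLE, "avx"))          # True
```

On a Linux node, the same check could be run against the live system by passing the contents of /proc/cpuinfo, for example supports(open("/proc/cpuinfo").read(), "avx2"). Per the lists above, a binary built with AVX2 instructions would not be expected to run on the compute-f-16 or compute-h-16 nodes, which do not list avx2.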

