What is an HPC Cluster?
A typical configuration for a High Performance Computing Cluster contains the following components:
- Login Nodes: Servers that users connect to remotely and from which they submit jobs to the cluster. Memory- or CPU-intensive processes should never be run on the login nodes. On O2, CPU and memory usage on login nodes is strictly limited, so intensive processes run there will most likely be killed or perform very poorly.
- Compute Nodes: Servers designed specifically to support memory- and CPU-intensive processes, as well as special resources (GPUs, terabytes of memory, etc.). Any job correctly submitted to the cluster is eventually dispatched by the scheduler to the first available compute node.
- Storage Servers: A system of servers storing the data used on the cluster. These are usually accessible from both login and compute nodes.
- Scheduler: The scheduler's main task is to manage the cluster's computing resources efficiently and to dispatch jobs to compute nodes according to their priorities while maximizing the cluster's efficiency.
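To make the scheduler's role more concrete, here is a toy sketch of priority-based dispatch. It is only an illustration, not the scheduler actually used on O2: the function name dispatch, the job names, and the priority values are hypothetical, and a real scheduler also accounts for requested cores, memory, wall time, and fair-share usage.

```python
import heapq

def dispatch(jobs, free_nodes):
    """Assign pending jobs to idle nodes, highest priority first (toy model).

    jobs: list of (priority, job_name); a larger priority is dispatched first.
    free_nodes: list of hostnames of currently idle compute nodes.
    Returns (assignments, pending), where pending holds jobs still waiting.
    """
    # heapq is a min-heap, so priorities are negated; the submission index
    # breaks ties so that equal-priority jobs keep their submission order.
    heap = [(-prio, idx, name) for idx, (prio, name) in enumerate(jobs)]
    heapq.heapify(heap)

    nodes = list(free_nodes)
    assignments = {}
    while heap and nodes:
        _, _, name = heapq.heappop(heap)
        assignments[name] = nodes.pop(0)  # first available node

    pending = [name for _, _, name in sorted(heap)]
    return assignments, pending

# Hypothetical jobs: three submissions competing for two idle nodes.
jobs = [(1, "align"), (5, "urgent-fix"), (1, "plot")]
assignments, pending = dispatch(jobs, ["compute-a-16-28", "compute-a-16-29"])
print(assignments)
print(pending)
```

In this example the higher-priority job is placed first, the earlier of the two equal-priority jobs takes the remaining node, and the last job waits until a node frees up.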
O2 Cluster Architecture
O2 currently includes 144 identical nodes. Each node's hostname consists of the prefix compute-a-16- followed by the node number, for example compute-a-16-28, compute-a-16-29, ..., compute-a-16-171.
Each node has 32 physical compute cores and 256 GB of memory, and is connected to the network with a 10 Gb Ethernet card and, in addition, a 40 Gb InfiniBand card.
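The naming scheme can be reproduced with a short sketch. It assumes the node numbers run contiguously from 28 to 171, which is consistent with the count of 144 nodes given above; the function name node_hostnames is only illustrative.

```python
PREFIX = "compute-a-16-"
FIRST, LAST = 28, 171  # assumed contiguous range of node numbers

def node_hostnames(first=FIRST, last=LAST, prefix=PREFIX):
    """Return the list of hostnames formed from the prefix and node number."""
    return [f"{prefix}{n}" for n in range(first, last + 1)]

hosts = node_hostnames()
print(len(hosts), hosts[0], hosts[-1])
# prints: 144 compute-a-16-28 compute-a-16-171
```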
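As a rough sense of scale, the per-node figures can be multiplied out to aggregate totals. This is simple arithmetic on the numbers stated above and assumes all 144 nodes are in service at once; the variable names are illustrative.

```python
NODES = 144
CORES_PER_NODE = 32
MEM_GB_PER_NODE = 256

total_cores = NODES * CORES_PER_NODE    # physical cores across these nodes
total_mem_gb = NODES * MEM_GB_PER_NODE  # memory across these nodes, in GB

print(total_cores)   # prints: 4608
print(total_mem_gb)  # prints: 36864
```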
Detailed Node Hardware Information
vendor_id : GenuineIntel
model name : Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz
cache size : 40960 KB
flags* : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch ida arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdseed adx smap xsaveopt cqm_llc cqm_occup_llc
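The flags line is useful when checking whether software built with a particular instruction set (for example AVX2) can run on these nodes. The following is a minimal sketch that parses text in the key : value layout shown above. The helper name parse_cpuinfo is hypothetical, and the sample string carries only a shortened subset of the flags listed above; on a Linux node the same text can be read from /proc/cpuinfo.

```python
def parse_cpuinfo(text):
    """Parse 'key : value' lines into a dict (later duplicates overwrite earlier ones)."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            # Drop surrounding whitespace and the trailing '*' footnote marker.
            info[key.strip().rstrip("*").strip()] = value.strip()
    return info

# Shortened sample in the same layout as the listing above.
sample = """vendor_id : GenuineIntel
model name : Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz
cache size : 40960 KB
flags* : fpu sse4_1 sse4_2 avx fma avx2"""

info = parse_cpuinfo(sample)
flags = set(info["flags"].split())

print(info["model name"])
print("avx2" in flags)     # prints: True
print("avx512f" in flags)  # prints: False
```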