On Betzy

Log in to Betzy with your usual Sigma2 username (i.e., you are no longer ubuntu) and proceed as on the Virtual Machine, but use /cluster/work/users/$USER (which is equivalent to $USERWORK) instead of /home/ubuntu for staging and job data:

ssh YourSigma2UserName@betzy.sigma2.no
cd $USERWORK
mkdir -p work archive
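
As a quick sanity check (assuming $USERWORK is defined in your login environment, as stated above), you can confirm that both paths point to the same place:

echo $USERWORK    # should print /cluster/work/users/<YourSigma2UserName>
ls $USERWORK      # should list the freshly created work and archive directories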

Note

Since all the necessary input data is already available under /cluster/shared/noresm/inputdata, there is no need to download the Zenodo tarball when you are on Betzy (or Fram)
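
You can verify that the shared input data is reachable from your session (the path is the one quoted in the note above; head just keeps the listing short):

ls /cluster/shared/noresm/inputdata | head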

Pull the same container image as on the Virtual Machine

wget https://zenodo.org/record/5652619/files/NorESM_user_workshop_2021.sif

make it executable

chmod ugo+rwx NorESM_user_workshop_2021.sif

and this time extract the Slurm batch job script job_hpc.sh

singularity exec NorESM_user_workshop_2021.sif cp /opt/esm/job_hpc.sh .

On many systems it is common to use an alternative launcher to start parallel applications, for instance Slurm’s srun rather than the mpirun wrapper provided by a particular MPI installation (mpirun is what we used on the Virtual Machine for the “outside-in” exercise)

This approach is supported with Singularity as long as the MPI library installed in the container supports the same Process Management Interface (PMI) version as the one used by the launcher (which is the case with our container)

In this workshop we will therefore use srun, which automatically distributes the processes over exactly the resources allocated to the job, without loading any MPI module
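
As an illustration only, the launch line inside a batch script then has the general shape below; the executable name cesm.exe is a placeholder for whatever job_hpc.sh actually starts, and the --mpi=pmi2 flag stands for whichever PMI version the container’s MPI and the site’s Slurm agree on:

# srun starts one containerized MPI rank per allocated task,
# with no MPI module loaded on the host
srun --mpi=pmi2 singularity exec NorESM_user_workshop_2021.sif ./cesm.exe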

Exercise 9

Edit the script to set nodes=1 and tasks-per-node=16, specify that we are using the development queue, then submit the Slurm job, which will run NorESM for 1 day on 1 node and 16 CPUs.
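
The relevant header lines in job_hpc.sh should then look roughly like this (a sketch: the QOS name for Betzy’s development queue is assumed to be devel, and any other #SBATCH lines in the script are left untouched):

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=16
#SBATCH --qos=devel

and the job is submitted with

sbatch job_hpc.sh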

Monitor the execution and, once the simulation has finished, check the timing profile
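
Standard Slurm commands are enough for monitoring, for example (the output file name assumes Slurm’s default slurm-<jobid>.out pattern, which job_hpc.sh may override):

squeue -u $USER              # job state: PD = pending, R = running
tail -f slurm-<jobid>.out    # follow the job's standard output (replace <jobid>)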

Note

Short simulations do not necessarily provide the most representative performance figures; for the purpose of benchmarking, longer simulations (typically 1 month or more) are therefore normally performed

Exercise 10

Edit the script to set nodes=8 and tasks-per-node=128, delete the line with --qos (in order to use the normal queue), change the name of the machine to ‘container’ and replace ‘ndays’ with ‘nmonths’, then resubmit the job to get a more precise estimate of the model throughput
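
After these edits the top of job_hpc.sh would look roughly as follows (again a sketch: the #SBATCH lines are standard Slurm syntax, while the machine-name and run-length settings live elsewhere in the script and are only described in comments here):

#SBATCH --nodes=8
#SBATCH --ntasks-per-node=128
# the --qos line has been deleted, so the job goes to the normal queue;
# further down in the script, set the machine name to 'container' and
# change the run-length unit from 'ndays' to 'nmonths'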

Exercise 11

To think about: which of the 1x16 or 8x128 CPU runs provides the best value for money (in simulated years per kWh)?
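
A back-of-the-envelope way to compare them (all numbers below are made-up placeholders; take the throughput from your timing profiles and the per-node power draw from the machine documentation if available):

# simulated years per kWh = SYPD / (nodes * kW_per_node * 24 h)
echo "scale=4; 0.5 / (1 * 0.2 * 24)" | bc    # e.g. 0.5 SYPD on 1 node drawing 0.2 kW
echo "scale=4; 8.0 / (8 * 0.5 * 24)" | bc    # e.g. 8.0 SYPD on 8 nodes drawing 0.5 kW each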

A few words on Bit-For-Bit reproducibility

Are NorESM simulations always identical on platforms with different hardware and software configurations?

Exercise 12

The short answer is “No!” when running on bare metal, but what happens when using a container?

We already had a look at outputs from the “outside-in” and “inside-out” simulations carried out with the container on the Virtual Machine with its 16 Intel® Haswell processors.

How does that compare to this run on Betzy with 16 AMD® Rome processors?
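
One crude way to find out, assuming the corresponding history files from the two runs have been gathered into two hypothetical directories vm_run and betzy_run, is to compare their checksums; identical checksums prove bit-for-bit equality, whereas differing checksums may also stem from metadata such as creation timestamps, so a field-by-field NetCDF comparison tool (such as CESM’s cprnc) is more conclusive:

md5sum vm_run/*.nc betzy_run/*.nc    # vm_run and betzy_run are illustrative names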