On the Virtual Machine

From “inside-out”

Log in to your own Virtual Machine with the private SSH key corresponding to the public key that you provided to the Organizers of this Workshop (it does not grant you access to any other Virtual Machine anyway):

ssh -i ~/.ssh/YourPrivateSSHkey ubuntu@aaa.bb.cc.ddd

Each Virtual Machine features 16 vCPUs, 64 GB of RAM and an 80 GB root disk, and comes with:

  • an Operating System (OS) with the default C/C++/FORTRAN compilers

  • a Message Passing Interface (MPI) library, required for the hands-on exercise on “outside-in” interaction with the container (but not for “inside-out”)

  • Singularity

already installed, nothing else

Note

Use the actual path where your SSH keys are stored on your laptop (instead of ~/.ssh/), replace aaa.bb.cc.ddd with the IP address that you have been allocated, and note that everyone uses the same user name (ubuntu)

Basic verifications

Log in to your Virtual Machine and verify that a container engine is available

Exercise 1

Check which version of Singularity is installed on your Virtual Machine
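
One possible way to check this (a minimal sketch; singularity also accepts the version subcommand):

singularity --version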

Exercise 2

Check the architecture, model and number of processors available on the Virtual Machine
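
One possible way to do this, using standard Linux tools that come with the OS:

lscpu                  # architecture, model name and number of CPUs
nproc                  # just the number of processing units
less /proc/cpuinfo     # detailed per-processor information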

Preparation of the necessary folders and input data

Create the work and archive folders in your $HOME directory

cd $HOME

mkdir work archive 

Get the inputdata from Zenodo

wget https://zenodo.org/record/4683483/files/inputdata_NF2000climo_f19_f19_mg17.tar.gz

Extract (or untar) all the files from the archive

tar zxvf inputdata_NF2000climo_f19_f19_mg17.tar.gz

This will add the inputdata folder under $HOME; having all the files available locally is much faster than downloading individual files on the fly, which saves us a lot of time.

Pull the container image and execute it

Get the NorESM container from Zenodo

wget https://zenodo.org/record/5652619/files/NorESM_user_workshop_2021.sif

The download should take less than a minute.

Type the following commands to change the access permissions of the .sif file (i.e., to make it executable), then start a Singularity container and run an interactive shell within it

chmod ugo+rwx NorESM_user_workshop_2021.sif

ls -l NorESM_user_workshop_2021.sif

singularity shell --contain NorESM_user_workshop_2021.sif

Note

Inside the container, the Bash shell prompt (Singularity>) differs from the prompt on the host (which, depending on the name given to your Virtual Machine when it was created, will look something like ubuntu@nuw_name:)

Exercise 3

Once inside the container, explore: navigate through the folders and try to create new files in the home directory (/home/ubuntu), for instance a file called text.txt. What happens?

Now exit the container and check whether the text.txt file is accessible on the host
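
One possible sequence to try (a sketch; observing what happens is the point of the exercise):

# inside the container (Singularity> prompt)
touch /home/ubuntu/text.txt
ls -l /home/ubuntu
exit

# back on the host
ls -l /home/ubuntu/text.txt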

Run the container with bindings

To be able to share files and folders between the container and the host, explicit binding paths have to be specified in the format src:dest, where src and dest are paths outside and inside of the container, respectively, separated by a colon (:)

Shared work and archive directories will allow us to access the model outputs from the host, even after the container has ceased to exist. Likewise, the inputdata directory, which was created and populated from the host, can be made accessible inside the container

singularity shell --bind $HOME/work:/opt/esm/work,$HOME/inputdata:/opt/esm/inputdata,$HOME/archive:/opt/esm/archive NorESM_user_workshop_2021.sif

This means, for instance, that the content of the directory known as $HOME/archive on the host can be accessed at /opt/esm/archive inside the container, and vice versa
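
A quick way to verify that a binding works (bind_check.txt is just an arbitrary file name used here for illustration):

# inside the container (Singularity> prompt)
touch /opt/esm/work/bind_check.txt
exit

# back on the host: the same file, seen through the binding
ls -l $HOME/work/bind_check.txt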

Create a new simulation, set it up, compile and run it inside the container

The source code for NorESM (release 2.0.5) can be found inside the container in /opt/esm/my_sandbox

A machine named “virtual” has already been configured with the correct compilers, libraries and paths inside the container

Exercise 4

  • Create a new case called “test” in the /opt/esm/archive/cases directory with the NF2000climo compset and f19_f19_mg17 resolution

  • Modify (with the xmlchange tool) the necessary environment variables to only run the simulation for 1 day

  • Change the number of tasks so that all the processors available on the Virtual Machine (single node) are used for all the model components (instead of ‘NTASKS_ATM’: -2, etc.)

  • Then perform the setup, compile and run the simulation (a possible command sequence is sketched below)
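
A possible command sequence (run inside the container), assuming the standard CIME scripts shipped in the NorESM sandbox; the scripts path, flags and variable names below are a sketch, not necessarily the reference solution:

cd /opt/esm/my_sandbox/cime/scripts
./create_newcase --case /opt/esm/archive/cases/test --compset NF2000climo --res f19_f19_mg17 --machine virtual --run-unsupported
cd /opt/esm/archive/cases/test
./xmlchange STOP_OPTION=ndays,STOP_N=1     # run the simulation for 1 day only
./xmlchange NTASKS=16                      # all 16 processors for every model component
./case.setup
./case.build
./case.submit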

If everything went well, the run will start

Note

nohup is a POSIX command whose name means “no hang up”; its purpose is to execute a command such that it ignores the HUP (hangup) signal and therefore does not stop when the user logs out (from https://en.wikipedia.org/wiki/Nohup)

Then you can type:

  • Ctrl+Z to stop (pause) the program and get back to the shell

  • bg to run it in the background

You can then, if you wish, exit the container (with the exit command) and re-enter it with the same command that you used to run it with bindings (singularity shell --bind … workshop_2021.sif)
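
Put together, the sequence looks roughly like this (a sketch, assuming the run was started with case.submit; Ctrl+Z is a keystroke, shown here as a comment):

# inside the container, from the case directory
nohup ./case.submit       # start the run so that it ignores the HUP signal
# press Ctrl+Z            # pause it and get the shell prompt back
bg                        # resume it in the background
exit                      # leave the container; the run keeps going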

Monitor your run

To monitor your job from outside the container while it is still running, you can open a second terminal, log in as you did in the first one, and type the following command

htop

The upper part of the window displays information about CPU and memory usage; the bottom part is a table listing the various processes running on the Virtual Machine, in this instance the 16 cesm.exe tasks, with their process IDs, etc.

Exercise 5

Check the simulation progress in the workdir
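
For example (the run directory name is a placeholder; check what was actually created under $HOME/work):

ls $HOME/work/                                  # locate the run directory of your case
tail -f $HOME/work/<run_directory>/atm.log.*    # follow the atmosphere log as it grows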

Run time, cost and model throughput

After the end of the simulation, it is possible to obtain general information about the total run time and cost, as well as several metrics that facilitate analysis and comparison with other runs

Exercise 6

Have a look at the timing profile located in the case directory from outside the container
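
One way to do this from the host, using the archive binding set up earlier (the timing file name pattern is an assumption; list the directory first to see what is actually there):

ls $HOME/archive/cases/test/timing/
less $HOME/archive/cases/test/timing/cesm_timing.test.*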

From “outside-in”

Let’s now do the same simulation in a more automated way, using the bash script called “job_vm.sh” which comes with the container (and can be extracted from /opt/esm)

cd /home/ubuntu

singularity exec NorESM_user_workshop_2021.sif cp /opt/esm/job_vm.sh .

For the sake of convenience this bash script follows the same structure as the Slurm job batch script which will be used on Betzy

This will export several environment variables (number of nodes, CPUs, etc.), then create the new case, do the setup, compile, run the simulation and generate the timing profile

Submit the job on the Virtual Machine by typing the following command:

bash job_vm.sh

This time we use the sequence

mpirun singularity … cesm.exe

instead of

singularity mpirun … cesm.exe
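
Spelled out a little more, the two launch patterns look roughly like this (a sketch with placeholders; the actual options are those used in job_vm.sh and in the previous section):

# “outside-in”: MPI is launched on the host, and each rank runs inside its own container instance
mpirun -np 16 singularity exec --bind <bindings> NorESM_user_workshop_2021.sif <path_to>/cesm.exe

# “inside-out”: the container is entered first, and mpirun is invoked from inside it
singularity exec --bind <bindings> NorESM_user_workshop_2021.sif mpirun -np 16 <path_to>/cesm.exe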

Exercise 7

Monitor this run, and at the end compare the timing profile to the one obtained “inside-out”

Exercise 8

Can you spot any differences between the outputs of the simulations performed “inside-out” and “outside-in”, for instance by comparing the atm.log files?
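
A simple way to compare them once both runs have finished (the paths are placeholders; locate the atm.log files under the shared work or archive directories of the two runs):

diff <first_run_directory>/atm.log.<id> <second_run_directory>/atm.log.<id> | less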