This lesson is still being designed and assembled (Pre-Alpha version)

Get familiar with CESM and the computing environment

Overview

Teaching: 30 min
Exercises: 60 min
Questions
  • How do I set up CESM on Saga?

  • How do I run a CESM case?

  • How do I monitor my CESM case?

  • What does CESM produce?

  • What is the netCDF data format?

  • How can I quickly inspect and visualize netCDF data files?

Objectives
  • Learn to set up CESM on Saga

  • Learn to run and monitor a simple CESM case on Saga

  • Learn about the netCDF data format

  • Learn to inspect a netCDF file

  • Learn to quickly visualize a netCDF file

First practical: get familiar with CESM

We do all the practicals on Saga.

Notur Initialization

Make sure you have set up your SSH keys properly and that you can transfer files with scp without entering your password. If not, go here.

To run CAM-6 on Saga, we will use:

To be able to compile and run CESM on Saga, no changes to the source code are necessary; we just have to adapt a few scripts for setting the compilers and libraries used by CESM.

To simplify things and allow you to run CESM as quickly as possible, we have prepared a ready-to-use version of CESM: all the machine-specific configuration files for running CESM on Saga have already been added.

On Saga:
cd $HOME

module use /cluster/projects/nn1000k/modulefiles
module load cesm/2.1.0

link_dirtree /cluster/projects/nn1000k/cesm/inputdata /cluster/work/users/$USER/inputdata

The commands above allow you to set up your environment (PATH, libraries, etc.) to use CESM 2.1.0.

All the input data necessary to run our model configuration is in /cluster/work/users/$USER/inputdata (where $USER is your login username on Saga). Input data can be large, which is why we create symbolic links instead of making several copies (one per user). The main copy is located in /cluster/projects/nn1000k/cesm/inputdata.
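
The idea behind linking rather than copying can be illustrated with a minimal sketch. It does not reproduce what link_dirtree does internally; the two temporary directories and the file name forcing.nc are made up for the illustration.

```shell
# Illustration only: SHARED and MINE are throwaway directories and
# forcing.nc is a made-up file name (not part of the CESM input data).
SHARED=$(mktemp -d)   # stands in for the shared main copy
MINE=$(mktemp -d)     # stands in for your own inputdata directory

echo "pretend this is a large input file" > "$SHARED/forcing.nc"

# Create a symbolic link instead of a second copy of the file
ln -s "$SHARED/forcing.nc" "$MINE/forcing.nc"

# The link points back to the shared file and takes almost no space
LINK_TARGET=$(readlink "$MINE/forcing.nc")
echo "link target: $LINK_TARGET"

# Reading through the link returns the content of the shared file
cat "$MINE/forcing.nc"
```

Running ls -l in your own inputdata directory on Saga should show similar arrows pointing back to the main copy.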

Create a new case

The CESM source code is in /cluster/projects/nn1000k/cesm/, and you can have a first look at the code.

We will build and run CAM in its standalone configuration, i.e. without having all the other components active.

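The basic workflow to run the CESM code is the following:

  • create a new case with create_newcase

  • configure the case with case.setup

  • build the executable with case.build

  • submit the run to the batch scheduler with case.submit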

To create a new case, we will use the create_newcase script.
There are many options and we won’t discuss all of them. The online help shows the full usage of create_newcase.

On Saga:
create_newcase --help

Command not found

If you get an error when invoking create_newcase, make sure you have loaded the cesm module in your environment:

module use /cluster/projects/nn1000k/modulefiles
module load cesm/2.1.0

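The 4 main arguments of create_newcase are --case (the case name, which can include the full path to the case directory), --res (the model grid resolution), --compset (the component set) and --mach (the machine name):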

On Saga:
#
# Simulation 1: short simulation
#

module use /cluster/projects/nn1000k/modulefiles
module load cesm/2.1.0

create_newcase --case $HOME/cases/F2000climo-f19_g17 --res f19_g17 --compset F2000climo --mach saga --run-unsupported --project nn1000k

The full list of supported grids is given here.

The notation for the compset longname is:

TIME_ATM[%phys]_LND[%phys]_ICE[%phys]_OCN[%phys]_ROF[%phys]_GLC[%phys]_WAV[%phys][_BGC%phys]

The components of the compset longname appear in a fixed order: atm, lnd, ice, ocn, river, glc, wave, cesm-options.
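
To make the notation concrete, here is a small sketch that splits a longname into its fields. The example string is an assumption: we believe it is the longname behind F2000climo in CESM 2.1, but check README.case in your case directory to be sure. The variable names are chosen for the illustration only.

```shell
# Assumed example longname (verify against README.case in your case directory)
LONGNAME="2000_CAM60_CLM50%SP_CICE%PRES_DOCN%DOM_MOSART_CISM2%NOEVOLVE_SWAV"

# Split on "_" and assign the fields in the order given by the notation
IFS=_ read -r TIME ATM LND ICE OCN ROF GLC WAV << EOF
$LONGNAME
EOF

printf '%s\n' "TIME=$TIME" "ATM=$ATM" "LND=$LND" "ICE=$ICE" \
              "OCN=$OCN" "ROF=$ROF" "GLC=$GLC" "WAV=$WAV"

# The optional %phys modifier can be separated from the model name
LND_MODEL=${LND%%\%*}   # text before the first "%"
LND_PHYS=${LND#*%}      # text after the first "%"
echo "land model: $LND_MODEL, physics option: $LND_PHYS"
```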

Where:

Now you should have a new directory in $HOME/cases/F2000climo-f19_g17 corresponding to our new case.

On Saga:
cd $HOME/cases/F2000climo-f19_g17

Check the content of the directory and browse the sub-directories:

For this test (and all our simulations), we do not wish to have a “cold” start; we will therefore restart from, and continue, an existing simulation that we have previously run.

On Saga:
./xmlchange RUN_TYPE=hybrid
./xmlchange RUN_REFCASE=F2000climo.f19_g17.control
./xmlchange RUN_REFDATE=0014-01-01

We use xmlchange, a small script to update variables (such as RUN_TYPE, RUN_REFCASE, etc.) defined in xml files. All the xml files contained in your test case directory will be used by case.setup to generate your configuration setup (Fortran namelists, etc.).

On Saga:
ls *.xml

If we do not want the dates to start from 0001-01-01, we need to specify the starting date of our test simulation.

On Saga:
./xmlchange RUN_STARTDATE=0014-01-01

We are also going to change the duration of our test simulation and set it to 1 month only.

On Saga:
./xmlchange  STOP_N=1
./xmlchange  STOP_OPTION=nmonths
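
Before going further, it can be worth checking that the values were stored as intended. The sketch below uses xmlquery, the companion script of xmlchange in the case directory; the CASEDIR variable and the guard around the call are our additions, so that the snippet prints a message instead of failing when run outside a case.

```shell
# Assumption: the case directory used in this lesson; override CASEDIR if yours differs
CASEDIR="${CASEDIR:-$HOME/cases/F2000climo-f19_g17}"

# Variables changed so far with xmlchange
VARS="RUN_TYPE,RUN_REFCASE,RUN_REFDATE,RUN_STARTDATE,STOP_N,STOP_OPTION"

if [ -x "$CASEDIR/xmlquery" ]; then
    cd "$CASEDIR"
    # Print the current value of each variable
    ./xmlquery "$VARS"
else
    echo "xmlquery not found in $CASEDIR: create the case first"
fi
```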

Now we are ready to set up our model configuration and build the CESM executable.

On Saga:
./case.setup

./case.build

After building CESM for your configuration, a new directory (and a set of sub-directories) is created in /cluster/work/users/$USER/cesm/F2000climo-f19_g17:

Running a case

Namelists can be changed before configuring and building CESM, but also afterwards, just before running your test case. In the latter case, you cannot use xmlchange to update the xml files; you need to change the namelist files directly.

The default history file from CAM is a monthly average, and this is what we are going to use in this lesson.

However, it is possible to change the output frequency with the namelist variable nhtfrq.

For instance, if we wanted to change the history file from monthly average to daily average, we would have to set the namelist variable nhtfrq to -24.

cat is a Unix shell command to display the content of files or to combine and create files. Using >> followed by a filename (here user_nl_cam) means we wish to append information to that file; if the file does not exist, it is created automatically. Using << followed by a string (here EOF) means that the content we wish to append is not in a file but is written after EOF, until another EOF is found.
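
The construct described above can be sketched as follows. To keep the example harmless, it writes user_nl_cam in a temporary directory (DEMO_DIR is a name we made up); in a real case you would run the cat command from your case directory, and only if you actually wanted daily averages rather than the monthly default used in this lesson.

```shell
# Work in a throwaway directory so the lesson's case is left untouched
DEMO_DIR=$(mktemp -d)
cd "$DEMO_DIR"

# ">>" appends to user_nl_cam (creating it if needed);
# "<< EOF" passes every line up to the closing EOF to cat
cat >> user_nl_cam << EOF
nhtfrq = -24
EOF

# Display the resulting file
cat user_nl_cam
```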

Finally, we have to copy the control restart files (they contain the state of the model at a given time, so that we can restart it). The files are stored on NIRD (they were generated from a previous simulation where the model was run for several years).

On Saga:
cd /cluster/work/users/$USER/cesm/F2000climo-f19_g17/run
wget https://zenodo.org/record/3702975/files/F2000climo.f19_g17.control.rest.0014-01-01-00000.tar.gz
tar zxvf F2000climo.f19_g17.control.rest.0014-01-01-00000.tar.gz
mv 0014-01-01-00000/* .

Now we wish to run our model and, as it may run for several days, we need to use the batch scheduler (SLURM) on Saga. Its role is to dispatch jobs to be run on the cluster. It reads the information given in your job command file (here named .case.run, located in your case directory). This file specifies the number of processors to use (ntasks), the amount of memory per processor (mem-per-cpu) and the maximum amount of time you wish to allow for your job (time).

Check what is in your current job command file (.case.run).

On Saga:
cd $HOME/cases/F2000climo-f19_g17
head .case.run
#!/usr/bin/env python
# Batch system directives
#SBATCH  --job-name=F2000climo-f19_
#SBATCH  --nodes=2
#SBATCH  --ntasks-per-node=40
#SBATCH  --output=F2000climo-f19_
#SBATCH  --exclusive
#SBATCH  --mem-per-cpu=4G
#SBATCH  --ntasks=80
#SBATCH  --export=ALL

The lines starting with #SBATCH are not comments but SLURM directives.
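
To list only the SLURM directives of a job command file, one possible sketch is shown below. The JOBFILE variable is our own; when no .case.run is found in the current directory, the snippet falls back to a sample copied from part of the listing above, so that it can be tried anywhere.

```shell
JOBFILE="${JOBFILE:-.case.run}"

if [ ! -f "$JOBFILE" ]; then
    # Fallback: a sample taken from the listing in this lesson
    JOBFILE=$(mktemp)
    cat > "$JOBFILE" << 'EOF'
#!/usr/bin/env python
# Batch system directives
#SBATCH  --nodes=2
#SBATCH  --ntasks-per-node=40
#SBATCH  --mem-per-cpu=4G
#SBATCH  --ntasks=80
EOF
fi

# Keep the lines starting with #SBATCH and strip that prefix
DIRECTIVES=$(grep '^#SBATCH' "$JOBFILE" | sed 's/^#SBATCH *//')
echo "$DIRECTIVES"
```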
You can now submit your test case.

On Saga:
./xmlchange --subgroup case.run JOB_WALLCLOCK_TIME=01:00:00
./case.submit

Why change JOB_WALLCLOCK_TIME?

Adjusting the wall clock time for short runs (as here for running 1 month) will allow us to reduce the queuing time.

Monitor your test run

The script “case.submit” submits two jobs to the job scheduler on Saga: one for running the model and one for the short-term archive (i.e. storing data for future analysis). More information can be found here.

To monitor your job on Saga:

squeue -u $USER

It will return something like:

JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
444258    normal F2000cli  annefou PD       0:00      2 (None)
444259    normal F2000cli  annefou PD       0:00      1 (Dependency)

Full list of available commands and their usage can be found here.
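
As a complement, here is a hedged sketch for checking on a run. It asks squeue for a custom set of columns and prints the last lines of the CaseStatus file that CESM keeps in the case directory. The CASEDIR variable and the guards are our additions, so the snippet only prints messages when run away from Saga.

```shell
# Assumption: the case directory used in this lesson
CASEDIR="${CASEDIR:-$HOME/cases/F2000climo-f19_g17}"

if command -v squeue >/dev/null 2>&1; then
    # Job id, partition, name, state, elapsed time, and nodes or pending reason
    squeue -u "$USER" -o "%.10i %.9P %.20j %.8T %.10M %R"
    # A job can be cancelled with: scancel <JOBID>
else
    echo "squeue not available: run this on Saga"
fi

if [ -f "$CASEDIR/CaseStatus" ]; then
    # Most recent case events (setup, build, submit, run)
    tail -n 5 "$CASEDIR/CaseStatus"
else
    echo "no CaseStatus file found in $CASEDIR"
fi
```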

First look at your 1 month test run

On Saga during your test case run, CAM-6 generates outputs in the “run” directory:


At the end of your experiment, the run directory will only contain the files needed to continue an existing simulation; all the model outputs are moved to another directory (the archive directory). On Saga, this directory is semi-temporary, which means data will be automatically deleted after a short period of time.

Check that your run was successful and generated all the files you need for your analysis.

On Saga:
cd /cluster/work/users/$USER/cesm/F2000climo-f19_g17/run
ls -lrt
cd /cluster/work/users/$USER/archive/F2000climo-f19_g17/atm/hist/
ls -lrt

What is a netCDF file?

NetCDF stands for “Network Common Data Form”. It is self-describing, portable and metadata-friendly, and it is supported by many languages (including Python, R, Fortran, C/C++, Matlab, NCL, etc.), viewing tools (like Panoply, ncview and ncdump) and suites of file operators (in particular NCO and CDO).

Inspect a netCDF file

NetCDF files are binary and often too big to open directly (with your favorite text editor, for instance). However, one can inspect the content of a netCDF file with dedicated tools, for example by dumping the header of one of the netCDF history files with ncdump.

On Saga:
module load netCDF/4.6.1-intel-2018b
cd /cluster/work/users/$USER/archive/F2000climo-f19_g17/atm/hist/
ncdump -h F2000climo-f19_g17.cam.h0.2000-06.nc
netcdf F2000climo-f19_g17.cam.h0.2000-06 {
dimensions:
        lat = 96 ;
        lon = 144 ;
        time = UNLIMITED ; // (1 currently)
        nbnd = 2 ;
        chars = 8 ;
        lev = 32 ;
        ilev = 33 ;
variables:
        double lat(lat) ;
                lat:_FillValue = -900. ;
                lat:long_name = "latitude" ;
                lat:units = "degrees_north" ;
        double lon(lon) ;
                lon:_FillValue = -900. ;
                lon:long_name = "longitude" ;
                lon:units = "degrees_east" ;
        double gw(lat) ;
                gw:_FillValue = -900. ;
                gw:long_name = "latitude weights" ;
    ....
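
Beyond the header, ncdump can also print the values of selected variables. The sketch below picks the first monthly history file it finds and dumps its latitude and longitude values; HIST_DIR and HIST_FILE are names we introduce, and the guard avoids an error when ncdump or the file is missing.

```shell
# Assumption: the archive directory used in this lesson
HIST_DIR="${HIST_DIR:-/cluster/work/users/$USER/archive/F2000climo-f19_g17/atm/hist}"

# Pick the first monthly history file found, if any
HIST_FILE=$(ls "$HIST_DIR"/*.cam.h0.*.nc 2>/dev/null | head -n 1)

if command -v ncdump >/dev/null 2>&1 && [ -n "$HIST_FILE" ]; then
    # -v prints the header followed by the values of the listed variables
    ncdump -v lat,lon "$HIST_FILE"
else
    echo "ncdump or history file not found: load the netCDF module and check HIST_DIR"
fi
```

For a quick visual check, ncview can open the same file (ncview "$HIST_FILE"), assuming it is installed on the system and that X11 forwarding is enabled in your SSH session.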

Key Points

  • CESM

  • High-Performance Computing

  • Saga

  • SLURM

  • netCDF

  • ncdump

  • ncview