This lesson is still being designed and assembled (Pre-Alpha version)

Long-term archiving

Overview

Teaching: 0 min
Exercises: 0 min
Questions
  • How to archive a CESM experiment?

  • How to clean my storage area?

Objectives
  • Learn about long-term archiving

  • Learn to archive a CESM experiment

  • Clean storage area

Archive experiment on the NIRD archive

For more information, read the NIRD Archive User guide.

Log onto the Web Interface

To access the Archive web interface, direct your browser to: https://archive.sigma2.no and click on “Deposit”.

Choose FEIDE as the Identity provider and click on Login. Use your UIO username and password to log in.

Request Approval (First time users)

If you have never used the Archive before you will be presented with a page informing you that your Feide account is not registered.

You can submit a request for access from this page. Only approved users are allowed to deposit datasets in the Archive.

The Archive Manager will contact you if additional information is required. Approval should be granted within 3 business days (and usually much sooner).

Initiate Deposit request

You probably had to wait for your access request to be approved. Once it has been, log in again (as before, with your UIO username and password) and click on “Deposit”. You will be presented with a page containing a short introduction to the Archive and a link to the Terms and Conditions, as shown in the figure below.

The Terms and Conditions outline your responsibilities and those of the Archive. You will need to agree (tick to agree and click on Proceed) to these before you can start the deposit process.

Upload dataset from the NIRD project area

The goal of the Archive is to provide long-term storage for datasets that are considered to be of lasting value. Your experiment will be valuable for CAM-6 learners and GEO4962 students in the coming years.
However, it is not the subject of a scientific publication, so tick “Unrefereed document (thesis, presentation, proceedings)” and add the course GitHub website as the URI:

Part of the course GEO4962 "The general circulation of the Atmosphere".
See https://nordicesmhub.github.io/GEO4962/

Click on “next” and a new section will appear where you need to provide information about your data:

Adjust the title according to your experiment (here I gave the CO2 experiment as an example), and set the Data Manager and Rights Holder (you) for your dataset.

Once completed, click on “save dataset information”.
A last section will appear on the same web page where you can choose how to upload your dataset. Click on “Project Area”:

You will receive an email with instructions for uploading your dataset. The upload is initiated from the web interface and completed by running the command-line script “ArchiveDataset” on login.nird.sigma2.no.
Once your dataset is archived and your post-processing and visualization are done, you can delete all the files stored on the NIRD project area.

On NIRD (login.nird.sigma2.no)


cd /projects/NS1000K/climate/GEO4962/outputs/$USER/archive

As stated in the email you received, you need to create a "Manifest" file that lists the files that make up your dataset. We wish to archive everything (files and subdirectories) in the directory /projects/NS1000K/climate/GEO4962/outputs/$USER/archive/F2000climo-f19_g17.$EXPNAME (where EXPNAME is your experiment name):

cd /projects/NS1000K/climate/GEO4962/outputs/$USER/archive

# Set EXPNAME to your experiment name (CO2 here; other examples: sea_ice, SST, rocky)

export EXPNAME=CO2
cat > archiveFiles$EXPNAME.txt << EOF
/projects/NS1000K/climate/GEO4962/outputs/$USER/archive///F2000climo-f19_g17.$EXPNAME
EOF
EOF


The three slashes after “archive” remove the prefix /projects/NS1000K/climate/GEO4962/outputs/$USER/archive from the final path in the archive. This matters because you want to be able to download your dataset anywhere, not only onto the NIRD project area /projects/NS1000K/climate/GEO4962/outputs/$USER/archive.
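To make the triple-slash convention concrete, here is a small sketch that splits a manifest entry at the "///" marker using plain shell parameter expansion. It only illustrates the convention described above; it is not how the ArchiveDataset script is implemented, and the variable names (entry, prefix, stored) are made up for this example.

```shell
# Illustration only: split a manifest entry at the "///" marker.
# ${USER:-username} falls back to a placeholder if USER is not set.
entry="/projects/NS1000K/climate/GEO4962/outputs/${USER:-username}/archive///F2000climo-f19_g17.CO2"

# Everything before the first "///": the part dropped from the archived path.
prefix=${entry%%///*}

# Everything after the first "///": the path as it would appear in the archive.
stored=${entry#*///}

echo "prefix dropped : $prefix"
echo "path in archive: $stored"
```

With the CO2 example, the second line printed shows F2000climo-f19_g17.CO2, that is, the experiment directory without the project-specific prefix.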
Now you can upload your dataset:

# Do not use this UUID: a UUID is unique for a given archive request. 
# The UUID you need to use is given in the email you received (at the top, after the title)
# Please let us know if you need help before archiving your dataset!

export UUID=EA9D4DC5-7E5A-4C68-B9E9-850F1910B4B0

ArchiveDataset $UUID archiveFiles$EXPNAME.txt

NOTE that once a dataset has been archived using the “ArchiveDataset” script it is considered closed and it is not possible to add more files to the dataset. You will need to create a new dataset if you wish to update the dataset.
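Because an archived dataset cannot be amended, it may be worth checking your inputs before you run ArchiveDataset. The sketch below defines two helper functions, is_uuid and check_manifest. Both are hypothetical names introduced here and are not part of the Archive tools. The first checks that a string has the usual 8-4-4-4-12 hexadecimal UUID layout; the second checks that every path listed in a manifest file exists.

```shell
# Hypothetical pre-flight helpers; they are not part of ArchiveDataset.

# Succeeds if $1 looks like a UUID (8-4-4-4-12 hexadecimal digits).
is_uuid() {
    printf '%s\n' "$1" |
        grep -Eq '^[0-9A-Fa-f]{8}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{12}$'
}

# Reports every entry of the manifest file $1 that does not exist on disk.
# Returns non-zero if at least one entry is missing.
check_manifest() {
    rc=0
    while IFS= read -r entry || [ -n "$entry" ]; do
        # Skip empty lines.
        [ -z "$entry" ] && continue
        # Repeated slashes ("///") are harmless when testing for existence.
        if [ ! -e "$entry" ]; then
            echo "missing: $entry" >&2
            rc=1
        fi
    done < "$1"
    return "$rc"
}
```

After defining these functions in your session on NIRD, you could run is_uuid "$UUID" and check_manifest archiveFiles$EXPNAME.txt and only call ArchiveDataset if both succeed.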

Publish Dataset (Archiving data)

Clean your storage area on NIRD

Once your experiments are archived on the NIRD archive, you can delete files on the NIRD project area:

On NIRD (login.nird.sigma2.no)


rm -rf /projects/NS1000K/climate/GEO4962/outputs/$USER/runs
rm -rf /projects/NS1000K/climate/GEO4962/outputs/$USER/archive

Now if you wish to work on your experiment again, you need to download it from the NIRD archive. Because your dataset is public, anyone can search for it (this is why it is important to give it a meaningful title and keywords!) and download it.
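Since rm -rf cannot be undone, you may want to look at what the two directories contain before deleting them. The helper below, show_usage, is a hypothetical name introduced for this sketch. It prints the size in KiB of each directory it is given, using the standard du -sk command, and says so when a directory does not exist.

```shell
# Hypothetical helper: report the size of each directory before deleting it.
show_usage() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            # -s: one summary line per directory, -k: sizes in KiB
            du -sk "$dir"
        else
            echo "not found: $dir"
        fi
    done
}
```

On NIRD you could call it as show_usage /projects/NS1000K/climate/GEO4962/outputs/$USER/runs /projects/NS1000K/climate/GEO4962/outputs/$USER/archive before running the rm -rf commands above.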

Key Points

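  • Archive CESM experiments on the NIRD archive for long-term storage

  • Clean your NIRD project area once your experiments are archived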