How to work during the course

You will work on your Galaxy instance throughout the course and produce two kinds of results:

  • source code: Jupyter notebooks, Python scripts

  • data: netCDF, tabular/csv, etc. and figures/plots (png, HTML, etc.)

We will use GitHub to save our code changes (Jupyter notebooks, Python scripts) and our private S3-compatible shared storage (http://forces2021.uiogeo-apps.sigma2.no/) to store data (netCDF, tabular/csv, etc.) and plots/figures (png, HTML, etc.).

Clone the forces-2021 git repo to your Galaxy instance

  • start a terminal window in your JupyterLab to clone our GitHub repository:

    1. create a personal access token (https://www.shanebart.com/clone-repo-using-token/) and make sure you save it in a safe place!

    2. type:

        git clone https://github.com/NordicESMhub/forces-2021
        # when prompted for a password, enter your personal token instead of your GitHub password
      
    3. use the sidebar to navigate the repository structure.

    4. find code examples under content/learning/example-notebooks.

Warning

As previously mentioned, make sure to push your code changes to GitHub regularly.

Save your notebooks

  1. Always include your name in the filename, for example eClimate_2021_NAME.ipynb, to avoid mistakes such as overwriting someone else's notebook.

  2. Add/update your notebook in the private forces-2021 repository:

    • If you are familiar with git and GitHub: make a new branch, add your notebook to the notebooks folder and open a Pull Request;

    • otherwise:

If you want to learn more about sharing your notebooks, check our section on “How to share your notebooks?”.
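The naming convention and the branch workflow described above can be sketched in Python as follows. This is only an illustration: the helper names `notebook_filename` and `branch_commands`, the branch naming scheme and the commit message are assumptions, not course requirements. The commands are built and printed, not executed, so you can review them before typing them in a terminal.

```python
import re
import shlex


def notebook_filename(name, prefix="eClimate_2021"):
    """Return a notebook filename that includes the author's name."""
    # Replace anything that is not an ASCII letter, digit, hyphen or
    # underscore, so the filename has no spaces or special characters.
    safe = re.sub(r"[^A-Za-z0-9_-]+", "_", name.strip()).strip("_")
    if not safe:
        raise ValueError("name must contain at least one letter or digit")
    return f"{prefix}_{safe}.ipynb"


def branch_commands(name, folder="notebooks"):
    """Build (but do not run) git commands for the branch workflow.

    The branch name and commit message are hypothetical choices.
    """
    filename = notebook_filename(name)
    branch = "add-notebook-" + filename[: -len(".ipynb")]
    return [
        ["git", "checkout", "-b", branch],        # make a new branch
        ["git", "add", f"{folder}/{filename}"],   # stage your notebook
        ["git", "commit", "-m", f"Add {filename}"],
        ["git", "push", "-u", "origin", branch],  # publish the branch
    ]


# Print the commands so they can be checked before running them.
for command in branch_commands("Ada Lovelace"):
    print(shlex.join(command))
```

The Pull Request itself is then opened from the pushed branch on the GitHub website.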

Data Management Plan

It is important to think about what you will be using (input data), what you will be producing (Jupyter notebooks, Python scripts, datasets) and what to do with all the different data before, during and after your project.

Write a small document (it can be a Jupyter notebook or a Markdown file), detailing:

  • What input data is needed/used for your project (update it whenever you have new information to add):

    • Does each dataset have a unique identifier (DOI)? Is it available from an authoritative data provider? Provide as much information as you would need to find and reuse the same data in a year or more;

  • What data will you generate?

    • Python scripts and Jupyter notebooks should be saved in the private forces-2021 repository;

    • newly generated datasets and plots/figures should be saved in http://forces2021.uiogeo-apps.sigma2.no/, in the bucket work: create a new folder there with your name (no spaces or special characters). You may also want to save your Python scripts and Jupyter notebooks there before you upload the final versions to the forces-2021 repository;

If you have any doubts, ask your group leader.
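As a starting point, the plan described above could be generated from a few Python records, for instance inside a notebook. This is a minimal sketch under stated assumptions: the `InputDataset` fields, the `dmp_markdown` helper and the example entries are all hypothetical placeholders, so replace them with your own project details.

```python
from dataclasses import dataclass


@dataclass
class InputDataset:
    """One input dataset used by the project (fields are an assumption)."""
    title: str
    provider: str
    identifier: str = ""  # DOI or other persistent identifier, if any
    notes: str = ""       # anything needed to find the data again later


def dmp_markdown(project, inputs, outputs):
    """Render a small data management plan as Markdown text."""
    lines = [f"# Data Management Plan: {project}", "", "## Input data", ""]
    for dataset in inputs:
        identifier = dataset.identifier or "no identifier recorded"
        lines.append(
            f"- {dataset.title} ({dataset.provider}); identifier: {identifier}"
        )
        if dataset.notes:
            lines.append(f"  - {dataset.notes}")
    lines += ["", "## Generated data", ""]
    for kind, location in outputs.items():
        lines.append(f"- {kind}: {location}")
    return "\n".join(lines) + "\n"


# Placeholder entries only: none of these refer to real datasets.
plan = dmp_markdown(
    "my project",
    [InputDataset("Example dataset", "Example provider",
                  notes="update this entry when new information is available")],
    {
        "notebooks and scripts": "private forces-2021 repository",
        "datasets and figures": "bucket work, folder yourname",
    },
)
print(plan)
```

Keeping the records in code makes it easy to update the plan whenever you add a new input dataset.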

Organize my project directories

Most people who have done some data analysis know that working directories can quickly become an unintelligible mess. Therefore, keep a clean house, with e.g. inputs in one corner, outputs in another, and your notebooks in a third. For instance, in http://forces2021.uiogeo-apps.sigma2.no/, bucket: work/yourname, I created 3 folders:

  • input: I keep a document listing the input data I am using, with the DOI for each dataset;

  • tool: I regularly save all my Jupyter notebooks and scripts (remember that you should also upload them to the private GitHub repository forces-2021, at least at the end of the course);

  • output: I created two sub-folders:

    • data: I save all the new datasets (data files), including intermediate results;

    • figures: I save all my important plots/figures (png, HTML, etc.)
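If you want to mirror the same layout in your local working directory, the folder structure above can be created with a few lines of Python. This is a sketch only: the `create_project_dirs` helper is a hypothetical name, and the example writes into a temporary directory so that it does not touch your real files.

```python
import tempfile
from pathlib import Path

# Folder layout described above, relative to your personal folder.
LAYOUT = ("input", "tool", "output/data", "output/figures")


def create_project_dirs(root):
    """Create the input/tool/output layout under root and return the paths."""
    root = Path(root)
    created = []
    for relative in LAYOUT:
        path = root / relative
        # parents=True also creates "output"; exist_ok=True makes reruns safe.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


# Demonstration in a throwaway directory; use your own folder in practice.
with tempfile.TemporaryDirectory() as tmp:
    for path in create_project_dirs(Path(tmp) / "yourname"):
        print(path.relative_to(tmp))
```

Because existing folders are left untouched, the function can be called at the top of every notebook without risk of overwriting earlier results.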