EOSC

EOSC-Nordic Climate Science Workbench roadmap

Video presented at BCC2020 online conference


Authors: Fouilloux, Anne Claire; Hasan, Adil; Lukkarinen, Ari and Struthers, Hamish

EOSC-Nordic Climate Community, EOSC-Life and EOSC-Pillar

EOSC-Nordic activities and synergies with other EOSC projects such as EOSC-Life and EOSC-Pillar.

In this working document, we try to identify areas where EOSC-Nordic and the Nordic Climate Community could collaborate with other EOSC-related projects to foster and advance take-up of the European Open Science Cloud.

EOSC-Nordic & EOSC-Pillar demonstrators

Both EOSC-Pillar and EOSC-Nordic have use cases and community driven pilots:

Roadmaps

EOSC-Life tool roadmap

Reference: https://github.com/eosc-life/tools-collaboratory-roadmap

EOSC-Nordic tool roadmap for the Climate community

Rather than reinventing the wheel, EOSC-Nordic chose to base its roadmap on the EOSC-Life tools roadmap and to reuse as many components as possible for the Climate community. Strengthening collaboration with the Galaxy community and EOSC-Life is an integral part of this plan.

Data

IDC - Intergalactic (reference) Data Commission

Climate (reference) Data

Climate datasets can be large (several PB), so making several copies on various Galaxy instances may not be a reasonable approach.

The current approach for climate datasets is to use cloud storage and zarr so that anyone can seamlessly access the data remotely. Access is faster when the data is locally available, which raises the question of whether tools should be moved to the data.
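To illustrate why a chunked format such as zarr suits remote access: zarr splits an array into fixed-size chunks, each stored as a separate object, so a reader only needs to fetch the chunks that overlap its selection. The sketch below is plain Python and does not use the zarr library; the array shape, the chunk shape and the dot-separated key layout (as used by zarr format version 2) are illustrative assumptions, not a description of any particular climate dataset.

```python
from itertools import product
from math import prod


def chunks_for_selection(selection, chunk_shape):
    """Return the indices of the chunks that overlap a selection.

    selection   -- one (start, stop) pair per dimension, stop exclusive,
                   with stop > start
    chunk_shape -- chunk length along each dimension
    """
    per_dim = []
    for (start, stop), size in zip(selection, chunk_shape):
        first = start // size
        last = (stop - 1) // size
        per_dim.append(range(first, last + 1))
    return list(product(*per_dim))


def chunk_key(index):
    """Dot-separated chunk key, e.g. (0, 2, 3) -> "0.2.3"."""
    return ".".join(str(i) for i in index)


# Hypothetical dataset: 1980 monthly time steps on a 192 x 288 grid,
# stored in chunks of 120 time steps x 96 x 144 grid points.
shape = (1980, 192, 288)
chunk_shape = (120, 96, 144)

# One year of data (time steps 600..611) over a small region.
selection = ((600, 612), (100, 150), (0, 100))

needed = chunks_for_selection(selection, chunk_shape)
# Ceiling division per dimension gives the total number of chunks.
total = prod(-(-n // c) for n, c in zip(shape, chunk_shape))

needed_keys = [chunk_key(i) for i in needed]
print(f"{len(needed)} of {total} chunks needed: {needed_keys}")
```

With these made-up numbers only a small fraction of the stored objects has to travel over the network, which is the property that makes remote access to very large datasets practical.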

We usually distinguish “reference” data from “research data”:

Remark: some datasets, such as CMIP6, are already available on cloud storage. See CMIP6 public cloud storage datasets.

Possible collaboration

The approach chosen by the climate community may be suitable for other communities.

We need to analyze and understand how all these different approaches can be supported by the Galaxy ecosystem (rather than trying to fit everyone within one framework).

The work planned within the EOSC-Nordic framework and WP5: Open Research data and services – demonstrators could serve as a basis for further collaboration.

Tools & packaging

So far we have used bioconda so that we automatically get the corresponding docker container (see https://github.com/BioContainers).

This means that our tools appear in biocontainers.pro. CESM is a concrete example:

Possible collaboration

The collaboration is already effective. A few questions:

Workflows

Workflow Hub developments are driven by EOSC-Life. Most of the work is generic and can be reused as-is by other communities.

A Workflow Hub GitHub organization has been created.

Workflow Hub is based on SEEK, a web-based cataloguing and commons platform for sharing heterogeneous scientific research datasets, models or simulations, processes and research outcomes. It preserves the associations between them, along with information about the people and organisations involved.

RO-Crate objects can be downloaded from Workflow Hub.
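As a rough illustration of what such a download contains: an RO-Crate is a directory (often zipped) with a ro-crate-metadata.json file that describes its contents as JSON-LD. The sketch below builds a tiny crate in memory and reads back its main workflow. The file name workflow.ga and the workflow title are made up, and a real crate carries many more properties than shown here.

```python
import io
import json
import zipfile

METADATA_FILE = "ro-crate-metadata.json"


def build_example_crate():
    """Return the bytes of a minimal, made-up workflow crate (zip)."""
    metadata = {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {
                # Metadata descriptor: points at the root dataset.
                "@id": METADATA_FILE,
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {
                # Root dataset: names the main workflow of the crate.
                "@id": "./",
                "@type": "Dataset",
                "name": "Hypothetical climate analysis crate",
                "mainEntity": {"@id": "workflow.ga"},
            },
            {
                "@id": "workflow.ga",
                "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
                "name": "Hypothetical climate analysis workflow",
            },
        ],
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(METADATA_FILE, json.dumps(metadata, indent=2))
        archive.writestr("workflow.ga", "{}")  # placeholder workflow file
    return buffer.getvalue()


def main_workflow(crate_bytes):
    """Return (file name, title) of the crate's main workflow."""
    with zipfile.ZipFile(io.BytesIO(crate_bytes)) as archive:
        metadata = json.loads(archive.read(METADATA_FILE))
    # Index the flat JSON-LD graph by identifier, then follow the links
    # descriptor -> root dataset -> main workflow.
    entities = {entity["@id"]: entity for entity in metadata["@graph"]}
    root_id = entities[METADATA_FILE]["about"]["@id"]
    workflow_id = entities[root_id]["mainEntity"]["@id"]
    return workflow_id, entities[workflow_id]["name"]


workflow_file, workflow_title = main_workflow(build_example_crate())
print(workflow_file, "-", workflow_title)
```

Because the metadata is a flat list of entities linked by identifiers, the same lookup pattern can be extended to authors, licences or input files if the crate records them.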

Workflow Hub Climate Team

Any workflows related to the Climate community and generated within Galaxy (either with interactive JupyterLab or with Galaxy tools) can be deposited to the Climate Team.

BioComputeObj

Possible collaboration

This work could be done by NICEST2 (Nordic Collaboration on e-Infrastructures for Earth System Modeling) and WP4: ESM workflows to efficiently run NorESM and EC-Earth on EuroHPC.

Infrastructure

The Climate community mostly relies on the use of High Performance Computing for running operational simulations. However, with the use of Galaxy, we have demonstrated that cloud computing can provide a better framework for model development, teaching and most data analysis performed by researchers.

EOSC-Nordic is investigating, at a policy level, how to resolve issues arising from differing access, accounting and authentication policies.

Registries

bio.tools

biotoolsregistry (bio.tools) is the Web application of the ELIXIR Tools & Data Services Registry. It allows the curation and discovery of bioinformatics resources including databases, tools, services and so on, available under a variety of interfaces.

The main objective of such a registry is to help researchers find existing tools and thus avoid re-inventing the wheel. It is a community-based registry, which means that anyone can add new tools (you first need to register).

bio.tools depends upon a resource description model, biotoolsSchema. This description model is available in XML and in JSON format.

The EDAM ontology is used in biotoolsSchema for operations, types of data, data identifiers, data formats, and topics.
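To make the role of EDAM more concrete, the sketch below takes a made-up tool description, loosely in the spirit of the JSON form of biotoolsSchema, and groups the EDAM concept URIs it uses by branch. The field names are a simplified assumption rather than the full schema, and the numeric identifiers are placeholders, not real EDAM terms. The only convention relied upon is that EDAM concepts are identified by URIs of the form http://edamontology.org/ followed by topic_, operation_, data_ or format_ and a number.

```python
import re

EDAM_URI = re.compile(r"^http://edamontology\.org/(topic|operation|data|format)_\d+$")

# Made-up entry: field names are simplified, identifiers are placeholders.
tool_entry = {
    "name": "example-climate-tool",
    "description": "Hypothetical tool used only for illustration.",
    "topic": [{"uri": "http://edamontology.org/topic_0000"}],
    "function": [
        {
            "operation": [{"uri": "http://edamontology.org/operation_0000"}],
            "input": [
                {
                    "data": {"uri": "http://edamontology.org/data_0000"},
                    "format": [{"uri": "http://edamontology.org/format_0000"}],
                }
            ],
        }
    ],
}


def collect_uris(node):
    """Recursively yield every string stored under a "uri" key."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "uri" and isinstance(value, str):
                yield value
            else:
                yield from collect_uris(value)
    elif isinstance(node, list):
        for item in node:
            yield from collect_uris(item)


def edam_branches(entry):
    """Map each EDAM branch (topic, operation, ...) to the URIs found."""
    branches = {}
    for uri in collect_uris(entry):
        match = EDAM_URI.match(uri)
        if match is None:
            raise ValueError(f"not an EDAM concept URI: {uri}")
        branches.setdefault(match.group(1), []).append(uri)
    return branches


branches = edam_branches(tool_entry)
print(sorted(branches))
```

A check of this kind is one way a registry for another community could verify that tool descriptions are annotated with terms from a shared ontology, whichever ontology is eventually chosen.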

EDAM ontology

EOSC-Life makes use of the EDAM ontology for bioinformatics operations, types of data, data identifiers, data formats, and topics.

The EDAM ontology is used by bio.tools.

The EDAM ontology also covers the needs of the ecology community.

biocontainers

As mentioned above, tools we have added to bioconda are also listed in biocontainers.pro.
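As a small illustration of the link between the two, containers built from bioconda packages are published under quay.io/biocontainers with a tag made of the package version and the conda build string. The helper below only composes such a reference; the function name is hypothetical, and the version and build string are placeholders rather than an actual CESM release.

```python
def biocontainer_image(package, version, build):
    """Compose a reference of the form
    quay.io/biocontainers/<package>:<version>--<build>."""
    for label, value in (("package", package), ("version", version), ("build", build)):
        # Reject empty fields and fields containing whitespace.
        if not value or any(ch.isspace() for ch in value):
            raise ValueError(f"invalid {label}: {value!r}")
    return f"quay.io/biocontainers/{package}:{version}--{build}"


# Placeholder version and build string, not a real release.
image = biocontainer_image("cesm", "0.0.0", "0")
print(f"docker pull {image}")
```

The actual versions and build strings available for a given package are listed in the registries themselves and should be looked up there.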

Possible collaboration

geo.tools & geocontainers.pro

With other communities interested in having registries for their tools and containers, the main question is whether we should “duplicate” or “generalize” what has been done for the Life Science community.

EDAM ontology

This work could be done within EOSC-Nordic WP5 and NICEST2 WP3: FAIR climate data for NorESM and EC-Earth. This would be beneficial for Climate tool registries, container registries and Galaxy tools.

Training

EOSC-Life Training Open Call

An application was submitted to the EOSC-Life Training Open Call and accepted.

Development of CLM-FATES training course

The main objective of the course is to learn how to compose and execute repeatable and reproducible modelling workflows with FATES (Functionally Assembled Terrestrial Ecosystem Simulator). FATES is a numerical terrestrial ecosystem model for use in Earth System Models (ESMs) that simulates and predicts growth, death, and regeneration of plants and the resulting tree size distributions. The simulation allows plants with different traits to compete for light, water, and nutrients within an environment that tracks both natural and anthropogenic disturbance and recovery. FATES introduces ecosystem demography and dynamic vegetation into the structure of the land surface components of ESMs, allowing for full coupling to global atmosphere, ocean, and sea-ice processes. FATES is currently coupled to several Earth System Models, including CESM (Community Earth System Model), CTSM (Community Terrestrial Systems Model) and E3SM (Energy Exascale Earth System Model).

The learning objectives include:

During the BCC2020 CollaborationFest we started drafting a Galaxy tutorial:

More information will come as the project starts.