Video presented at BCC2020 online conference
Authors: Fouilloux, Anne Claire; Hasan, Adil; Lukkarinen, Ari and Struthers, Hamish
In this working document, we try to identify areas where EOSC-Nordic and the Nordic Climate Community could collaborate with other EOSC-related projects to foster and advance take-up of the European Open Science Cloud.
Rather than reinventing the wheel, EOSC-Nordic chose to base its roadmap on the EOSC-Life tools roadmap and to reuse as many components as possible for the Climate community. Strengthening collaboration with the Galaxy Community and EOSC-Life is an integral part of its plan.
Climate datasets can be large (several PB), and making several copies on various Galaxy instances may not be a reasonable approach.
The current approach for climate datasets is to use cloud storage and Zarr so that anyone can seamlessly access data remotely. Access is faster when the data is locally available, which raises the question of whether tools should be moved to the data.
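To illustrate why a chunked format suits remote access, here is a minimal sketch of how a reader can work out which chunks a given selection touches, so that only those objects need to be fetched from cloud storage. It assumes the Zarr v2 convention of naming chunks by their dot-separated grid indices; the array shape, chunk sizes and the function name `chunk_keys_for_selection` are hypothetical and chosen only for illustration.

```python
from itertools import product
from math import ceil


def chunk_keys_for_selection(selection, chunks):
    """Return the chunk keys touched by a rectangular selection.

    selection: one (start, stop) index pair per dimension, stop exclusive.
    chunks: chunk length along each dimension.
    Keys follow the Zarr v2 style "i.j.k" (assumed here).
    """
    ranges = []
    for (start, stop), size in zip(selection, chunks):
        if stop <= start:
            return []  # empty selection touches no chunk
        first = start // size
        last = (stop - 1) // size
        ranges.append(range(first, last + 1))
    return [".".join(str(i) for i in idx) for idx in product(*ranges)]


# Hypothetical daily field with dimensions (time, lat, lon).
shape = (3650, 180, 360)
chunks = (365, 90, 90)

# Second year of data over a small region.
needed = chunk_keys_for_selection([(365, 730), (0, 45), (90, 180)], chunks)

# Total number of chunks in the whole array.
total = 1
for n, c in zip(shape, chunks):
    total *= ceil(n / c)

print(f"{len(needed)} of {total} chunks needed: {needed}")
```

In this made-up case a single chunk out of 80 covers the request, which is the property that makes remote access practical without copying the full dataset to each Galaxy instance.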
We usually distinguish “reference” data from “research data”:
The approach chosen by the climate community may be suitable for other communities.
We need to analyze and understand how all these different approaches can be supported by the Galaxy ecosystem (rather than trying to fit everyone within one framework).
The work planned within the EOSC-Nordic framework and WP5: Open Research data and services – demonstrators could serve as a basis for further collaboration.
The collaboration is already effective. A few questions:
A WorkflowHub GitHub organization has been created.
WorkflowHub is based on SEEK, a web-based cataloguing and commons platform for sharing heterogeneous scientific research datasets, models or simulations, processes and research outcomes. It preserves the associations between them, along with information about the people and organisations involved.
RO-Crate objects can be downloaded from WorkflowHub.
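As a rough illustration of what can be done with a downloaded crate, the sketch below lists the entities described in its metadata file. It assumes the crate is a zip archive containing `ro-crate-metadata.json` (or the older `ro-crate-metadata.jsonld`) with a JSON-LD `@graph` array, as in the RO-Crate specification. The demo crate built in memory, including the file name `workflow.ga`, is hypothetical and not taken from WorkflowHub.

```python
import io
import json
import zipfile

# Metadata file names used by RO-Crate 1.1 and 1.0 respectively.
METADATA_NAMES = ("ro-crate-metadata.json", "ro-crate-metadata.jsonld")


def read_crate_entities(crate_zip):
    """Return the entities listed in the @graph of an RO-Crate zip archive."""
    with zipfile.ZipFile(crate_zip) as zf:
        names = zf.namelist()
        for name in METADATA_NAMES:
            if name in names:
                with zf.open(name) as fh:
                    return json.load(fh).get("@graph", [])
    raise FileNotFoundError("no RO-Crate metadata file found in archive")


def summarize(entities):
    """Map each entity @id to its @type."""
    return {entity["@id"]: entity.get("@type") for entity in entities}


# Build a small, made-up crate in memory so the sketch runs without a download.
demo_metadata = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {"@id": "ro-crate-metadata.json", "@type": "CreativeWork",
         "about": {"@id": "./"}},
        {"@id": "./", "@type": "Dataset", "name": "Hypothetical climate workflow"},
        {"@id": "workflow.ga",
         "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"]},
    ],
}
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("ro-crate-metadata.json", json.dumps(demo_metadata))
buffer.seek(0)

summary = summarize(read_crate_entities(buffer))
for entity_id, entity_type in summary.items():
    print(entity_id, "->", entity_type)
```

With a real crate, `read_crate_entities` would be given the path of the downloaded zip file instead of the in-memory buffer.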
Any workflows related to the Climate community and generated within Galaxy (either with interactive JupyterLab or with Galaxy tools) can be deposited to the Climate Team.
This work could be done within NICEST2 (Nordic Collaboration on e-Infrastructures for Earth System Modeling) and its WP4: ESM workflows to efficiently run NorESM and EC-Earth on EuroHPC.
The Climate community mostly relies on the use of High Performance Computing for running operational simulations. However, with the use of Galaxy, we have demonstrated that cloud computing can provide a better framework for model development, teaching and most data analysis performed by researchers.
EOSC-Nordic is investigating, at a policy level, how to resolve issues arising from differing access, accounting and authentication policies.
biotoolsregistry (bio.tools) is the Web application of the ELIXIR Tools & Data Services Registry. It allows the curation and discovery of bioinformatics resources including databases, tools, services and so on, available under a variety of interfaces.
The main objective of such a registry is to help researchers find existing tools and thus avoid reinventing the wheel. It is a community-based registry, which means that anyone can add new tools (you first need to register).
The EDAM ontology is used in biotoolsSchema for operations, types of data, data identifiers, data formats, and topics.
Tools can be added either manually or automatically, by harvesting scientific publications (mostly from PubMed.gov).
The Climate Community could reuse what is done for bio.tools. The EDAM ontology would need to be extended.
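One way to gauge how much extension would be needed is to annotate a climate tool and see which annotations can already point to an EDAM concept. The sketch below only checks that a URI has the EDAM form `http://edamontology.org/<branch>_<number>`, where the branch is `topic`, `operation`, `data` or `format`; it does not verify that the concept actually exists in the ontology. The annotation list, including the `_0000` identifiers and the climate terms, is a made-up placeholder.

```python
import re

# Shape of an EDAM concept URI; existence of the concept is NOT checked.
EDAM_URI = re.compile(
    r"^http://edamontology\.org/(topic|operation|data|format)_(\d+)$"
)


def split_annotations(annotations):
    """Separate annotations carrying a well-formed EDAM URI from the rest."""
    covered, missing = [], []
    for annotation in annotations:
        match = EDAM_URI.match(annotation.get("uri") or "")
        if match:
            covered.append((match.group(1), annotation["term"]))
        else:
            missing.append(annotation["term"])
    return covered, missing


# Hypothetical annotations for a climate tool; IDs are placeholders.
annotations = [
    {"term": "Placeholder topic", "uri": "http://edamontology.org/topic_0000"},
    {"term": "Placeholder format", "uri": "http://edamontology.org/format_0000"},
    {"term": "Regridding", "uri": None},
    {"term": "Bias correction", "uri": None},
]

covered, missing = split_annotations(annotations)
print("Already expressible with EDAM-style URIs:", covered)
print("Candidate terms for an extension:", missing)
```

The terms collected in `missing` would be the starting list for discussing an extension with the EDAM maintainers.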
EOSC-Life makes use of the EDAM ontology for bioinformatics operations, types of data, data identifiers, data formats, and topics.
EDAM ontology is used by bio.tools.
The EDAM ontology also covers the needs of the ecology community.
As mentioned above, tools we have added to bioconda are also listed in biocontainers.pro.
With other communities interested in having registries for their tools and containers, the main question is whether we should “duplicate” or “generalize” what has been done for the Life Science community.
This work could be done within EOSC-Nordic WP5 and NICEST2 WP3: FAIR climate data for NorESM and EC-Earth. This would be beneficial for Climate tool registries, container registries and Galaxy tools.
An application was submitted to the EOSC-Life Training Open Call and accepted.
The main objective of the course is to learn how to compose and execute repeatable and reproducible modelling workflows with FATES (Functionally Assembled Terrestrial Ecosystem Simulator). FATES is a numerical terrestrial ecosystem model for use in Earth System Models (ESMs) that simulates and predicts the growth, death, and regeneration of plants and the resulting tree size distributions. The simulation works by allowing plants with different traits to compete for light, water, and nutrients within an environment that tracks both natural and anthropogenic disturbance and recovery. FATES introduces ecosystem demography and dynamic vegetation into the structure of the land surface components of ESMs, allowing for full coupling to global atmosphere, ocean, and sea-ice processes. FATES is currently coupled to several Earth System Models, including CESM (Community Earth System Model), CTSM (Community Terrestrial Systems Model) and E3SM (Energy Exascale Earth System Model).
The learning objectives include:
During the BCC2020 CollaborationFest we started to draft a Galaxy tutorial:
More information will come as the project starts.