Galaxy training and hackathon at Roscoff/France

April 23-26, 2019

The ELIXIR Galaxy Community organized a 4-day workshop in Roscoff (France) on Galaxy tool integration and training.

More information here

Participants

Galaxy training

During the first day, we learned about the integration of high-quality tools within Galaxy with their dependencies (Bioconda, Planemo) using the IUC best practice guidelines. We also learned how to use Galaxy as a training tool and develop training material for the Galaxy Training Network.

Hackathon

The second half of the workshop was dedicated to hackathon sessions, where we could bring our own projects on tool integration and/or training material and develop them collaboratively, with the support of community experts.

NordicESMhub project

The NordicESMhub initiative aims to share knowledge, tools and resources to study the Earth’s climate and the regional impact of global changes in the Nordic countries (Denmark, Finland, Sweden, Iceland, Estonia, Norway).

The development of this project is mentored by Bérénice as part of the Mozilla Open Leaders Round 7.

See our roadmap to get more information.

Our objective for the hackathon was to integrate climate tools into the Galaxy Toolshed. Before the hackathon, the climate community was not represented in the Galaxy community and no tools related to climate analysis were available.

What we did before the hackathon

We had two major objectives for the hackathon:

Simple python tools

To be ready, we chose to develop 4 simple tools in python:

When we arrived at the hackathon, these 4 Python tools were ready, i.e. developed and tested with test data.

We also did our homework and started to create XML wrappers for Galaxy. For this we got help from Bérénice and the Galaxy Training materials on tool development. All our Galaxy tools are in our galaxy-tools repository.
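For readers who have never seen one, the general shape of a Galaxy XML wrapper can be sketched as below. This is an illustration only and not one of our actual wrappers: the tool id, the script name `example_tool.py` and the parameter names are all hypothetical, and the helper `summarise_wrapper` is just a small check, written with the Python standard library, that the wrapper is well-formed and declares the inputs and outputs we expect.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper: the tool id, script name and parameter names are
# invented for illustration and do not come from our galaxy-tools repository.
WRAPPER = """\
<tool id="example_climate_tool" name="Example climate tool" version="0.1.0">
    <description>illustrative wrapper around a Python script</description>
    <command><![CDATA[
        python '$__tool_directory__/example_tool.py' '$input' '$output'
    ]]></command>
    <inputs>
        <param name="input" type="data" format="txt" label="Input file"/>
    </inputs>
    <outputs>
        <data name="output" format="tabular"/>
    </outputs>
    <help>Illustrative only.</help>
</tool>
"""


def summarise_wrapper(xml_text):
    """Parse a wrapper and return its id plus declared input/output names."""
    root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    return {
        "id": root.get("id"),
        "inputs": [p.get("name") for p in root.findall("./inputs/param")],
        "outputs": [d.get("name") for d in root.findall("./outputs/data")],
    }


summary = summarise_wrapper(WRAPPER)
print(summary)
```

A check like this only confirms that the XML parses; it says nothing about whether Galaxy accepts the tool, which is what planemo is for.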

Björn Grüning also kindly added Travis Continuous Integration so that we would be ready to automate tests and publishing to the Galaxy Toolshed.

All our tools were failing… And this is how we started our hackathon in Roscoff/France!

More complex tool: Community Earth System Model (CESM)

We also chose CESM as a more complex tool to integrate in Galaxy. The Community Earth System Model is a very important model for us because it is used for the Coupled Model Intercomparison Project Phase 6 (CMIP6) and is one of the best-known models for predicting climate and assessing climate change.

The Norwegian Earth System Model (NorESM) is derived from CESM, shares most of its code and is used to assess the regional impact of climate change in the Nordic countries. If we manage to integrate CESM in Galaxy, we will also be able to run NorESM in Galaxy!

We usually run CESM/NorESM on Norwegian HPC systems using Intel compilers. However, for small configurations and for teaching purposes, using HPC is overkill. It adds unnecessary complexity and makes it difficult for new users to focus on the science. We therefore started to compile and run a simple configuration (2 degrees resolution) on the IaaS cloud that we have at the University of Oslo.

We managed to compile and run CESM on small Virtual Machines (Ubuntu 18.04, CentOS 7) with 32 processors and 64 GB of memory. Note that we could not use WACCM because it requires 128 GB of memory. We tried to create a Docker container but never managed to make it work properly.

What we achieved during the hackathon

Flake8 compliance

Many of our Python tools were not suitable for publishing to the Galaxy Toolshed because they failed flake8. Flake8 is a Python package that checks Python code for coding style (PEP8), programming errors (like “library imported but unused” and “Undefined name”) and cyclomatic complexity.
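To make the “imported but unused” category concrete, here is a minimal sketch of such a check written with Python's standard `ast` module. It is not how flake8 itself works (flake8 gets this check from pyflakes and reports it as F401), and it ignores cases a real linter handles, such as `__all__` and star imports; the function name `unused_imports` is hypothetical.

```python
import ast


def unused_imports(source):
    """Return the sorted names that are imported but never referenced.

    Simplified illustration: star imports are skipped and names listed
    in __all__ are not taken into account.
    """
    tree = ast.parse(source)
    imported = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import os.path" binds the top-level name "os"
                imported.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                if alias.name != "*":
                    imported.add(alias.asname or alias.name)
    # Every bare identifier used anywhere in the module
    used = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}
    return sorted(imported - used)


example = "import os\nimport sys\nprint(sys.argv)\n"
print(unused_imports(example))
```

In practice you would simply run flake8 on the tool directory; the sketch only shows what kind of problem the report is pointing at.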

Fixing these issues afterwards is a bit cumbersome, so for our future developments we will enforce flake8 from the start. If your favourite editor is Atom, you can add the Atom linter plugin for Python, which uses flake8.

New datatypes in Galaxy for the Climate Community

Much of our data is encoded in HDF5, netCDF, GeoJSON and shapefile formats.

We were quite lucky because both the HDF5 and netCDF datatypes were already included in Galaxy and therefore ready to be used.

Björn added the GeoJSON datatype, and adding the shapefile datatype to Galaxy is in progress.

New conda package and docker/Singularity containers for the Climate Community

In fact, the easiest route is always to add your tool to conda. Adding CESM to the bioconda channel is a major achievement for us.

It seems a bit controversial for the climate community to use bioconda, but once we understood that we would also get a docker/singularity container for “free”, i.e. without any additional effort, we were easily convinced…

The most amazing thing is that you only need to create two files to get your package in bioconda and the associated docker/singularity container! Once a day, docker/singularity containers are automatically created for all new bioconda packages.

To install cesm using bioconda:

conda install -c conda-forge -c bioconda cesm 

We have successfully used the new cesm bioconda package on our Virtual Machines. The performance is exactly the same as when we compile with the GNU compiler.

We also successfully used the new cesm bioconda package on Fram, currently the largest HPC system in Norway. Our run was about 30% slower than when we compile CESM on Fram with the Intel compiler. Many things could explain this drop in performance, but our first suspect is HDF5: we usually use parallel HDF5 when compiling CESM, while in bioconda HDF5 is not parallel. Of course, we need to perform more tests, and in any case being able to run small configurations on the cloud is definitely a huge achievement for us!

We also started to create the corresponding CESM XML wrapper to add CESM as a new Galaxy tool. We are really looking forward to it. Thanks to the hackathon, there is no technical obstacle anymore; it’s on us!

Thanks Björn for adding CESM to bioconda!

New Climate Analysis Tools in Galaxy Toolshed

A little bit like magic… A new category Climate Analysis appeared in Galaxy Toolshed! Having a team of Galaxy Experts makes everything so much easier!

Getting our simple Python tools flake8 compliant was the first step. Then we also needed to properly link our galaxy-tools repository to Travis by:

And after many runs of planemo test and planemo serve, we successfully published 3 new tools in the Galaxy Toolshed:

We did not manage to publish smithsonian-volcanoes because the Python code still has some problems (names of Icelandic volcanoes contain characters that are not UTF-8), but being able to publish 3 new tools in the Galaxy Toolshed and, more importantly, having the framework for publishing new tools made our day!
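The kind of encoding problem mentioned above can be reproduced in a few lines. The sketch below assumes, purely for illustration, that the troublesome names arrive as Latin-1 bytes; we have not confirmed that this is the actual cause in smithsonian-volcanoes, and the helper name `decode_name` is hypothetical.

```python
def decode_name(raw, fallback="latin-1"):
    """Decode bytes as UTF-8, falling back to another encoding on failure.

    The Latin-1 fallback is an assumption made for this example; the real
    source data may use a different encoding.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback)


name = "Eyjafjallajökull"

# The same name as UTF-8 bytes and as Latin-1 bytes
as_utf8 = name.encode("utf-8")
as_latin1 = name.encode("latin-1")

# Decoding the Latin-1 bytes strictly as UTF-8 fails...
try:
    as_latin1.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# ...while the fallback recovers the name in both cases
print(strict_ok, decode_name(as_utf8), decode_name(as_latin1))
```

A fallback like this hides the problem rather than fixing the data at its source, so it is a stopgap at best.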

Thank you!

This Galaxy training and Hackathon was really well organized and a big success for the Climate Community.

A big thank you to all the Scientific coordinators and trainers:

Sorbonne Université/CNRS, ABiMS, Station Biologique de Roscoff, France

Freiburg Galaxy team, Backofen lab, University of Freiburg, Germany

INRA, BIPAA/GenOuest, Rennes, France

Freiburg Galaxy team, Backofen lab, University of Freiburg, Germany

CNRS/Sorbonne Université, ABiMS, Station Biologique de Roscoff, France

And to all participants. These 4 days were really great!

The pictures by Bérénice Batut are licensed under Copyright 2019.