All source code is available on GitHub. Forks, which allow the Nordic Climate community to develop jointly, are available in the NordicESMHub GitHub organization.
Based on Climate workbench:
To be used in Galaxy, one needs to:
Tool | Description | Reference |
---|---|---|
cds_essential_variability | Get Copernicus Essential Climate Variables for assessing climate variability | Copernicus CDS |
climate_stripes | Create climate stripes from a tabular input file | |
psy_maps | Visualization of regular geographical data on a map with psyplot | |
shift_longitudes | Shift longitudes from the 0 to 360 degree range to the -180 to 180 degree range | |
gdal | GDAL Geospatial Data Abstraction Library functions | |
mean_per_zone | Creates a PNG image showing statistics over areas as defined in the vector file | |
Interactive tools are currently not published in the toolshed, so they may be harder to find and more difficult to deploy on other Galaxy instances.
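The longitude shift listed in the table above can be illustrated with a short sketch. This is not the code of the `shift_longitudes` tool; the function name `shift_longitude` is hypothetical, and the choice to map 180 degrees to -180 (rather than keeping it at 180) is an assumption of this example.

```python
def shift_longitude(lon_deg):
    """Map a longitude in [0, 360) degrees to the equivalent value in [-180, 180).

    Hypothetical helper for illustration only; 180 is mapped to -180 by convention here.
    """
    return ((lon_deg + 180.0) % 360.0) - 180.0


# Longitudes east of 180 degrees become negative (western) longitudes.
shifted = [shift_longitude(lon) for lon in (0.0, 90.0, 180.0, 270.0, 359.5)]
print(shifted)  # [0.0, 90.0, -180.0, -90.0, -0.5]
```

For gridded data, the values usually need to be re-sorted along the longitude axis after the shift so that longitudes remain monotonically increasing.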
Some Machine Learning tools are already available as Galaxy Tools on the default Galaxy Europe instance.
Users can access them through the Machine Learning Workbench:
This section lists the most important tools that have been integrated into the Machine Learning workbench. Many more tools are available, so please take a closer look at the tool panel. For better readability, the tools are divided into categories.
Identifying which category an object belongs to.
Tool | Description | Reference |
---|---|---|
“SVM Classifier” | Support vector machines (SVMs) for classification | Pedregosa et al. 2011 |
“NN Classifier” | Nearest Neighbors Classification | Pedregosa et al. 2011 |
“Ensemble classification” | Ensemble methods for classification and regression | Pedregosa et al. 2011 |
“Discriminant Classifier” | Linear and Quadratic Discriminant Analysis | Pedregosa et al. 2011 |
“Generalized linear” | Generalized linear models for classification and regression | Pedregosa et al. 2011 |
“CLF Metrics” | Calculate metrics for classification performance | Pedregosa et al. 2011 |
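To make the kind of quantity a classification metrics tool reports more concrete, here is a minimal sketch in plain Python. It is not the implementation behind "CLF Metrics" (which the table attributes to scikit-learn); the function name `binary_classification_metrics` and the sample labels are made up for illustration, and only the binary case is covered.

```python
def binary_classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels.

    Hypothetical helper for illustration; undefined ratios are reported as 0.0.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels: 3 true positives, 1 false positive, 1 false negative, 3 true negatives.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = binary_classification_metrics(y_true, y_pred)
print(metrics)
```

With these sample labels, all four metrics come out at 0.75.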
Predicting a continuous-valued attribute associated with an object.
Tool | Description | Reference |
---|---|---|
“Ensemble regression” | Ensemble methods for classification and regression | Pedregosa et al. 2011 |
“Generalized linear” | Generalized linear models for classification and regression | Pedregosa et al. 2011 |
“Regression metrics” | Calculate metrics for regression performance | Pedregosa et al. 2011 |
Automatic grouping of similar objects into sets.
Tool | Description | Reference |
---|---|---|
“Numeric clustering” | Different numerical clustering algorithms | Pedregosa et al. 2011 |
Building general machine learning models.
Tool | Description | Reference |
---|---|---|
“Estimator Attributes” | Estimator attributes to get all attributes from an estimator or scikit object | Pedregosa et al. 2011 |
“Stacking Ensemble Models” | Stacking Ensembles to build stacking, voting ensemble models with numerous base options | Pedregosa et al. 2011 |
“Search CV” | Hyperparameter Search performs hyperparameter optimization using various SearchCVs | Pedregosa et al. 2011 |
“Build Pipeline” | Pipeline Builder, an all-in-one platform to build pipelines, single estimators, preprocessors and custom wrappers | Pedregosa et al. 2011 |
Evaluating, validating and choosing parameters and models.
Tool | Description | Reference |
---|---|---|
“Model validation” | Model Validation includes cross_validate, cross_val_predict, learning_curve, and more | Pedregosa et al. 2011 |
“Pairwise Metrics” | Evaluate pairwise distances or compute affinity or kernel for sets of samples | Pedregosa et al. 2011 |
“Train/Test evaluation” | Train, Test and Evaluation to fit a model using part of a dataset and evaluate it using the rest | Pedregosa et al. 2011 |
“Model Prediction” | Model Prediction predicts on new data using a prefitted model | Chollet et al. 2011 |
“Fitted model evaluation” | Evaluate a Fitted Model using a new batch of labeled data | Pedregosa et al. 2011 |
“Model fitting” | Fit a Pipeline, Ensemble or other models using a labeled dataset | Pedregosa et al. 2011 |
Feature selection and preprocessing.
Tool | Description | Reference |
---|---|---|
“Data preprocessing” | Preprocess raw feature vectors into standardized datasets | Pedregosa et al. 2011 |
“Feature selection” | Feature Selection module, including univariate filter selection methods and recursive feature elimination algorithm | Pedregosa et al. 2011 |
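As an illustration of what "standardized" typically means in this context, the sketch below rescales one feature column to zero mean and unit variance. This is an assumption about one common preprocessing option, not a description of everything the "Data preprocessing" tool offers; the function name `standardize` and the sample values are hypothetical.

```python
import statistics


def standardize(column):
    """Rescale a list of numbers to zero mean and unit (population) variance.

    Hypothetical helper for illustration; a constant column is mapped to all zeros.
    """
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)  # population standard deviation
    if std == 0:
        return [0.0 for _ in column]
    return [(x - mean) / std for x in column]


# Made-up feature values, e.g. temperatures in degrees Celsius.
raw = [2.0, 4.0, 6.0]
scaled = standardize(raw)
print(scaled)
```

The population standard deviation is used here because scikit-learn's `StandardScaler` does the same; a sample standard deviation would give slightly different values on small datasets.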
Build and use deep neural networks.
Tool | Description | Reference |
---|---|---|
“Batch Models” | Build Deep learning Batch Training Models with online data generator for Genomic/Protein sequences and images | Chollet et al. 2011 |
“Model Builder” | Create deep learning model with an optimizer, loss function and fit parameters | Chollet et al. 2011 |
“Model Config” | Create a deep learning model architecture using Keras | Chollet et al. 2011 |
“Train and evaluation” | Deep learning training and evaluation either implicitly or explicitly | Chollet et al. 2011 |
Plotting and visualization.
Tool | Description | Reference |
---|---|---|
“Regression performance plots” | Plot actual vs predicted curves and residual plots of tabular data | |
“ML performance plots” | Plot confusion matrix, precision, recall and ROC and AUC curves of tabular data | |
“Visualization” | Machine Learning Visualization Extension includes several types of plotting for machine learning | Chollet et al. 2011 |
General data and table manipulation tools.
Tool | Description | Reference |
---|---|---|
“Table compute” | The power of the pandas data library for manipulating and computing expressions upon tabular data and matrices. | |
“Datamash operations” | Datamash operations on tabular data | |
“Datamash transpose” | Transpose rows/columns in a tabular file | |
“Sample Generator” | Generate random samples with controlled size and complexity | Pedregosa et al. 2011 |
“Train/Test splitting” | Split Dataset into training and test subsets | Pedregosa et al. 2011 |
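The idea behind splitting a dataset into training and test subsets can be sketched in a few lines of standard-library Python. This is not the implementation of the "Train/Test splitting" tool; the function name `split_train_test`, its parameters and the default test fraction are assumptions made for this example.

```python
import random


def split_train_test(rows, test_fraction=0.25, seed=0):
    """Randomly split rows into (train, test) lists, reproducibly for a given seed.

    Hypothetical helper for illustration; row order is preserved within each subset.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")

    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # local RNG, global random state untouched

    n_test = max(1, round(len(rows) * test_fraction))
    test_idx = set(indices[:n_test])

    train = [row for i, row in enumerate(rows) if i not in test_idx]
    test = [row for i, row in enumerate(rows) if i in test_idx]
    return train, test


# Made-up dataset of 20 rows; a quarter of them is held out for testing.
rows = list(range(20))
train, test = split_train_test(rows, test_fraction=0.25, seed=42)
print(len(train), len(test))  # 15 5
```

Fixing the seed keeps the split reproducible, so that a model evaluated later sees the same held-out rows.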
Galaxy Training Material on GitHub. The fork in the NordicESMHub GitHub organization is available here.
The Galaxy training material is generated automatically. The procedure to develop and publish new training material is explained here.
Training material relevant for the Climate community:
A new topic called Climate gathers all the training material specifically related to Climate Analysis:
New training material planned:
Training material on Machine Learning: