NO2 over Spain with CAMS European air quality analysis using RELIANCE services

Analysis over a particular country and a town in the country of interest

How to discover RELIANCE datacube resources (spatial & temporal search and subsetting), share resources using EGI DataHub, and use RoHub to create FAIR digital objects

This notebook shows how to discover and access the Copernicus Atmosphere Monitoring products available in the RELIANCE datacube resources using the functionalities provided by the ADAM API. The process is structured in 7 steps, including examples of data analysis and visualization with the Python libraries installed in the Jupyter environment, as well as the creation of a FAIR digital object on RoHub where all the resources used and generated in this notebook are aggregated.

You can customize this Jupyter notebook, for instance by updating the content of the Data Management section.

Step 1: Data Management

Authors

  • Make sure you first register with RoHub at https://reliance.rohub.org/.

  • We recommend using your ORCID identifier to log in and register with EOSC services.

  • In the list of authors, add any co-authors using the email address they used when they registered in RoHub.

author_emails = ['annefou@geo.uio.no']
contributor_emails = ['jeani@uio.no', 'mantovani@meeo.it']

Add the University of Oslo and the Nordic e-Infrastructure Collaboration as publishers

UiO_organization = {"org_id":"http://www.uio.no/english/", 
                     "display_name": "University of Oslo", 
                     "agent_type": "organization",
                     "ror_identifier":"01xtthb56",
                     "organization_url": "http://www.uio.no/english/"}
NeIC_organization = {"org_id":"https://neic.no/",
                    "display_name": "Nordic e-Infrastructure Collaboration", 
                     "agent_type": "organization",
                    "ror_identifier":"04jcwf484",
                    "organization_url": "https://neic.no/"}
list_publishers = [UiO_organization, NeIC_organization]
list_copyright_holders = [UiO_organization]

Add the funding

  • If your work is not funded, simply set funded_by = {}; otherwise fill in the grant information as in the example below.

funded_by = {}
funded_by = {
"grant_id": "101017502",
"grant_Name": "RELIANCE",
"grant_title": "Research Lifecycle Management for Earth Science Communities and Copernicus Users",
"funder_name": "European Commission",
"funder_doi": "10.13039/501100000781",
}

Choose a license for your FAIR digital object

pip install rohub
import rohub
licenses = rohub.list_available_licenses()
# Update line below to print more licenses
licenses[0:5]
license = 'MIT'

Organize my data using EGI DataHub

  • Define a prefix for my project (you may need to adjust it for your own usage on your infrastructure).

    • input folder where all the data used as input to my Jupyter Notebook is stored (and eventually shared)

    • output folder where all the results to keep are stored

    • tool folder where all the tools, including this Jupyter Notebook will be copied for sharing

  • Create all corresponding folders

Import Python packages

import os
import warnings
import pathlib
warnings.filterwarnings('ignore')

Initialization

  • Choose a country and add its name and country code

  • Choose the variable to analyze (PM10, PM25, NO2, O3, etc.)

  • Choose the area for your analysis

Choose the country of interest

country_code = 'ES' 
country_fullname = "Spain"
town_fullname = 'Madrid' 
town_coordinates = {'latitude': 40.4168, 'longitude': -3.7038}  # Madrid; longitude is negative (west of Greenwich)
variable_name = 'NO2'
variable_unit = 'µg m-3'
variable_long_name = 'Nitrogen Dioxide'
month_name = 'April'
month_number = '04'
month_nb_days = '30'
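The same parameters can be adapted for another run. As an illustrative (hypothetical) alternative configuration, the following commented lines would switch the analysis to ozone over Italy:

# Example alternative configuration (illustrative values, not used in this analysis):
# country_code = 'IT'
# country_fullname = 'Italy'
# town_fullname = 'Rome'
# town_coordinates = {'latitude': 41.9028, 'longitude': 12.4964}
# variable_name = 'O3'
# variable_unit = 'µg m-3'
# variable_long_name = 'Ozone'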

Geojson for selecting data from ADAM

  • The geometry field is extracted from a GeoJSON file, retrieving the geometry of the first element in the "features" list.

  • To create a geojson file for the area of interest, you can use https://geojson.io/

  • Then paste the result below in the geojson variable

geojson = """{"type": "FeatureCollection","features": [{"type": "Feature","properties": {},"geometry": {"type": "Polygon","coordinates": [[[3.05419921875,42.601619944327965],[-1.69189453125,43.46886761482925],[-8.10791015625,43.866218006556394],[-9.60205078125,43.03677585761058],[-9.11865234375,42.24478535602799],[-9.03076171875,40.245991504199026],[-9.580078125,39.07890809706475],[-9.73388671875,38.70265930723801],[-9.25048828125,38.30718056188316],[-8.942871093749998,38.25543637637947],[-9.052734375,37.142803443716836],[-9.29443359375,36.79169061907076],[-8.10791015625,36.89719446989036],[-7.778320312499999,36.79169061907076],[-7.27294921875,37.07271048132943],[-6.78955078125,36.86204269508728],[-6.17431640625,36.06686213257888],[-5.69091796875,35.90684930677121],[-5.09765625,36.08462129606931],[-4.74609375,36.33282808737917],[-4.10888671875,36.59788913307022],[-3.09814453125,36.54494944148322],[-2.43896484375,36.56260003738545],[-2.04345703125,36.63316209558658],[-1.69189453125,37.16031654673677],[-1.34033203125,37.43997405227057],[-0.439453125,37.49229399862877],[-0.59326171875,37.75334401310656],[-0.37353515625,38.272688535980976],[0.263671875,38.59970036588819],[0.3955078125,38.839707613545144],[0.06591796875,38.94232097947902],[-0.17578125,39.2832938689385],[-0.19775390625,39.58875727696545],[0.24169921874999997,39.977120098439634],[0.68115234375,40.463666324587685],[1.07666015625,40.83043687764923],[1.58203125,41.062786068733026],[2.2412109375,41.178653972331674],[2.83447265625,41.541477666790286],[3.33984375,41.73852846935917],[3.3618164062499996,42.13082130188811],[3.05419921875,42.601619944327965]]]}}]}"""
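As an optional sanity check (a minimal sketch, not part of the original workflow), we can verify that the pasted string parses as a FeatureCollection before going further:

import json
_geo_check = json.loads(geojson)
assert _geo_check['type'] == 'FeatureCollection'
print(len(_geo_check['features']), 'feature(s),',
      len(_geo_check['features'][0]['geometry']['coordinates'][0]), 'polygon vertices')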

Create folders

WORKDIR_FOLDER = os.path.join(os.environ['HOME'], "datahub/Reliance/Climate" + '_' + country_code + '_' + variable_name + '_' + month_name)
print("WORKDIR FOLDER: ", WORKDIR_FOLDER)
INPUT_DATA_DIR = os.path.join(WORKDIR_FOLDER, 'input')
OUTPUT_DATA_DIR = os.path.join(WORKDIR_FOLDER, 'output')
TOOL_DATA_DIR = os.path.join(WORKDIR_FOLDER, 'tool')

list_folders = [INPUT_DATA_DIR, OUTPUT_DATA_DIR, TOOL_DATA_DIR]

for folder in list_folders:
    pathlib.Path(folder).mkdir(parents=True, exist_ok=True)

Geojson file for selecting data from ADAM

  • We dissolve the GeoJSON in case it contains more than one polygon, and then save the result into a GeoJSON file

import cartopy
import geopandas as gpd
local_path_geom = os.path.join(INPUT_DATA_DIR, country_code.lower() + '.geo.json')
local_path_geom
if (pathlib.Path(local_path_geom).exists()):
    os.remove(local_path_geom)
f = open(local_path_geom, "w")
f.write(geojson)
f.close()
data = gpd.read_file(local_path_geom)
single_shape = data.dissolve()

Show area of interest

single_shape.plot()
if (pathlib.Path(local_path_geom).exists()):
    os.remove(local_path_geom)
single_shape.to_file(local_path_geom, driver='GeoJSON')

Step 2: Authentication

The following lines of code read the personal ADAM API key of the user and set the endpoint currently in use, which provides access to the products in the related catalogue. If the authentication process is successful, the personal token and its expiration time are returned as outputs.

pip install adamapi
adam_key = open(os.path.join(os.environ['HOME'],"adam-key")).read().rstrip()
import adamapi as adam
a = adam.Auth()

a.setKey(adam_key)
a.setAdamCore('https://reliance.adamplatform.eu')
a.authorize() 

Step 3: Datasets Discovery

After authorization, the user can browse the whole catalogue, which is returned as a paginated JSON object listing all the available datasets. This operation is executed with the getDatasets() function without any argument. A few lines of code are then needed to parse the JSON object, which can be handled as a Python dictionary, and to extract the names of the datasets.

Pre-filter datasets

We will discover all the datasets available on the ADAM platform but will only print the ones of interest (EU_CAMS, i.e. the European air quality datasets from the Copernicus Atmosphere Monitoring Service).

def list_datasets(a, search="", dataset_name=""):
    datasets = adam.Datasets(a)
    catalogue = datasets.getDatasets()
    datasetID = None

    # Extracting the size of the catalogue (rounding up so the last, partial page is included)
    total = catalogue['properties']['totalResults']
    items = catalogue['properties']['itemsPerPage']
    pages = (total + items - 1) // items

    print('\033[1;34m')
    print('----------------------------------------------------------------------')
    print('List of available datasets:')
    print('\033[0;0m')

    # Extracting the list of datasets across the whole catalogue
    for i in range(0, pages):
        page = datasets.getDatasets(page=i)
        for element in page['content']:
            if search == "" or search in element['title']:
                print(element['title'] + " --> datasetId = " + element['datasetId'])
                if element['datasetId'].split(':')[1] == dataset_name:
                    datasetID = element['datasetId']
    return datasets, datasetID
datasets, datasetID = list_datasets(a, search="CAMS", dataset_name = 'EU_CAMS_SURFACE_' + variable_name + '_G')

We are interested in one variable only, so we discover the corresponding dataset and print its metadata, which shows the data provenance.

def get_metadata(datasetID, datasets, verbose=False):
    print('\033[1;34m' + 'Metadata of ' + datasetID + ':')
    print ('\033[0;0m')
    
    paged = datasets.getDatasets(datasetID)
    for i in paged.items():
        print("\033[1m" +  str(i[0]) + "\033[0m" + ': ' + str(i[1]))
    return paged
metadata_variable = get_metadata(datasetID, datasets, verbose=True)

Step 4: Products Discovery

The products discovery operation for a specific dataset is implemented in the ADAM API with the getProducts() function. A combined spatial and temporal search can be requested by specifying the datasetId of the selected dataset, the geometry argument that defines the Area of Interest, and a temporal range defined by startDate and endDate. The geometry must always be a GeoJSON object describing the polygon in counterclockwise winding order. The optional arguments startIndex and maxRecords control which slice of the result list is returned (a short paging example is shown after the search results below). The results of the search are displayed with their metadata, sorted starting from the most recent product.

Search data

pip install geojson_rewind
from geojson_rewind import rewind
import json

The GeoJSON object needs to be rearranged into counterclockwise winding order. This is done in the next few lines to obtain a geometry that meets the requirements of the method; geom_1 is the final result used in the discovery operation.

with open(local_path_geom) as f:
    geom_dict = json.load(f)
# Rewind the polygon to counterclockwise winding order and keep only the geometry
output = rewind(geom_dict)
geom_1 = str(output['features'][0]['geometry'])

Copernicus air quality analyses are hourly products, but when we select a given date we only get the first 10 products. Below, we list the first 10 available products for the 1st day of the studied month in 2019, i.e. we restrict our search to that date.

start_date = '2019-' + month_number + '-01'
end_date = start_date
search = adam.Search( a )
results = search.getProducts(
    datasetID, 
    geometry=geom_1,
    startDate=start_date,
    endDate=end_date
 )

# Printing the results

print('\033[1;34m' + 'List of available products (maximum 10 products printed):')
print ('\033[0;0m')

count = 1
for i in results['content']:
    print("\033[1;31;1m" + "#" + str(count))
    print('\033[0m')
    for k in i.items():
        print(str(k[0]) + ': ' + str(k[1]))
    count = count + 1
    print('------------------------------------')
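The optional startIndex and maxRecords arguments mentioned above can be used to page through more than the first 10 products. A minimal sketch (the argument names follow the description above; treat the exact call pattern as an assumption to adapt to your adamapi version):

next_page = search.getProducts(
    datasetID,
    geometry=geom_1,
    startDate=start_date,
    endDate=end_date,
    startIndex=10,   # skip the 10 products already printed above
    maxRecords=10    # return at most 10 further products
)
for product in next_page['content']:
    print(product['_id'])   # '_id' is the product identifier used for data access in Step 5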

Step 5: Data Access

After the data discovery operation that checks the availability of products in the catalogue, the data can be accessed with the getData() function. Each product in the output list intersects the selected geometry, and the following example shows how to access a specific product from the list of results obtained in the previous step. While datasetId is always mandatory, each data access request needs only one of the following arguments: geometry or productId, the latter being the value of the _id field in each product's metadata. For a spatial and temporal search, the geometry must be provided to the function together with the time range of interest. The output of getData() is always a .zip file containing the data retrieved by the request, i.e. the spatial subset of the product; the zip file contains one GeoTIFF file for each spatial subset extracted in the selected time range.
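For example, following the description above, a single product could be retrieved by its identifier alone. This is a hedged sketch (the productId-based call pattern is inferred from the description above, and the output file name is illustrative):

# Access one specific product by its identifier instead of a geometry/time range
single_product_id = results['content'][0]['_id']   # '_id' field from the Step 4 results
data = adam.GetData(a)
single_image = data.getData(
    datasetId=datasetID,
    productId=single_product_id,
    outputFname=os.path.join(INPUT_DATA_DIR, 'single_product_example.zip'))
print(single_image)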

Define a function to select a time range and get data

def getZipData(auth, dataset_info):
    # Skip the download if the zip file (or the folder it was already unzipped into) exists
    zip_path = pathlib.Path(dataset_info['outputFname'])
    if not (zip_path.with_suffix('').exists() or zip_path.exists()):
        data = adam.GetData(auth)
        image = data.getData(
            datasetId=dataset_info['datasetID'],
            startDate=dataset_info['startDate'],
            endDate=dataset_info['endDate'],
            geometry=dataset_info['geometry'],
            outputFname=dataset_info['outputFname'])
        print(image)

Get variable of interest for each day of the month we study for 2019, 2020 and 2021 (time 00:00:00)

This process can take a bit of time so be patient!

import time
from IPython.display import clear_output
start = time.time()

for year in ['2019', '2020', '2021']:
    datasetInfo = {
    'datasetID' : datasetID,
    'startDate' : year + '-' + month_number + '-01',
    'endDate' : year + '-' + month_number + '-' + month_nb_days,
    'geometry' : geom_1,
    'outputFname' : INPUT_DATA_DIR + '/' + variable_name + '_' + country_code + '_ADAMAPI_' + year + '.zip'
    }
    getZipData(a, datasetInfo)
    
end = time.time()
clear_output(wait=True)
delta1 = end - start
print('\033[1m' + 'Processing time: ' + str(round(delta1, 2)) + ' seconds')

Step 6: Data Analysis and Visualization

The data retrieved via the ADAM API is now available as zip files that must be unzipped to handle the data directly in GeoTIFF format. Then, with the Python packages provided in the Jupyter environment, it is possible to process and visualize the requested products.

Unzip data

import zipfile
def unzipData(filename, out_prefix):
    with zipfile.ZipFile(filename, 'r') as zip_ref:
        zip_ref.extractall(path = os.path.join(out_prefix, pathlib.Path(filename).stem))
for year in ['2019', '2020', '2021']:
    filename = INPUT_DATA_DIR + '/' + variable_name + '_' + country_code + '_ADAMAPI_' + year + '.zip'
    target_file = pathlib.Path(os.path.join(INPUT_DATA_DIR, pathlib.Path(pathlib.Path(filename).stem)))
    if not target_file.exists():
        unzipData(filename, INPUT_DATA_DIR)
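As an optional sanity check (a small sketch, not part of the original workflow), we can count the GeoTIFF files extracted for each year:

import glob
for year in ['2019', '2020', '2021']:
    tif_files = glob.glob(INPUT_DATA_DIR + '/' + variable_name + '_' + country_code + '_ADAMAPI_' + year + '/*.tif')
    print(year, ':', len(tif_files), 'GeoTIFF files extracted')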

Read data and make a monthly average

import xarray as xr
import xesmf as xe
import glob

We need to regrid the data so that all daily GeoTIFF subsets share a common grid and can be concatenated along the time dimension; one of the 2021 files (read below) is used as the output grid.

def read_file(filename, variable, metadata, factor=1):
    tmp = xr.open_rasterio(filename, parse_coordinates=True)
    # Convert our xarray.DataArray into a xarray.Dataset
    tmp = tmp.to_dataset('band')*factor
    # Rename the dimensions to make it CF-convention compliant
    tmp = tmp.rename_dims({'y': 'latitude', 'x':'longitude'})
    # Rename the variable to a more useful name
    tmp = tmp.rename_vars({1: variable, 'y':'latitude', 'x':'longitude'})
    tmp[variable].attrs = {'units' : metadata['units'], 'long_name' : metadata['description']}
    return tmp
output_grid = read_file(INPUT_DATA_DIR + '/' + variable_name + '_' + country_code + '_ADAMAPI_2021/eu_cams_surface_' + variable_name.lower() + '_g_2021-' + month_number + '-' + month_nb_days + 't000000.tif', variable_name, metadata_variable)
output_grid

We now read these files using xarray. First, we make a list of all the GeoTIFF files in a given folder. To ensure each raster is labelled correctly with its time, we use a helper function paths_to_datetimeindex() to extract the time information from the file paths obtained above. We then load and concatenate each dataset along the time dimension using xarray.open_rasterio(), convert the resulting xarray.DataArray to a xarray.Dataset, and give the variable a more useful name (here, the selected variable, NO2).

from datetime import datetime
def paths_to_datetimeindex(paths):
    return  [datetime.strptime(date.split('_')[-1].split('.')[0], '%Y-%m-%dt%f') for date in paths]
def getData(dirtif, variable, metadata, factor=1, grid_out=None):
    geotiff_list = glob.glob(dirtif)
    # Create variable used for time axis
    time_var = xr.Variable('time', paths_to_datetimeindex(geotiff_list))
    # Load in and concatenate all individual GeoTIFFs
    xarray_list = []
    if grid_out is not None:
        nlats = len(grid_out.latitude.values)
        nlons = len(grid_out.longitude.values)
    for i in geotiff_list:
        tmp = read_file(i, variable, metadata, factor=factor)
        if grid_out is not None:
            print("regridding ", i)
            regridder = xe.Regridder(tmp, grid_out, 'conservative')
            tmp_regrid = regridder(tmp, keep_attrs=True)
            xarray_list.append(tmp_regrid)
        else:
            xarray_list.append(tmp)
    #print(xarray_list[0:2])
    geotiffs_da = xr.concat(xarray_list, dim=time_var)
    return geotiffs_da
geotiff_ds = getData( INPUT_DATA_DIR + '/' + variable_name + '_'+ country_code + '_ADAMAPI_20*/*.tif', variable_name, metadata_variable, factor=1.e9, grid_out=output_grid)
geotiff_ds[variable_name].attrs = {'units' : variable_unit, 'long_name' : variable_long_name }
geotiff_ds

Analyze data

Make yearly average for the month we study

geotiff_dm = geotiff_ds.groupby('time.year').mean('time', keep_attrs=True)
geotiff_dm
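To quantify the year-to-year change (for instance how the studied month in 2020 compares with 2019), one can subtract the 2019 monthly mean from the 2020 one. This is a minimal sketch using the 'year' coordinate created by the groupby above:

# Difference between the 2020 and 2019 monthly means (negative values = lower concentrations in 2020)
diff_2020_2019 = geotiff_dm[variable_name].sel(year=2020) - geotiff_dm[variable_name].sel(year=2019)
diff_2020_2019.attrs = {'units': variable_unit, 'long_name': variable_long_name + ' change (2020 - 2019)'}
diff_2020_2019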

Visualize data

pip install cmaps "holoviews<1.14.8" GeoViews cartopy
import cartopy.crs as ccrs
import matplotlib.pyplot as plt
import cmaps
# Center the map on the area of interest; here we use the longitude of the selected town.
# You may want to change it when plotting over different geographical areas.
central_longitude = town_coordinates['longitude']
# generate figure
proj_plot = ccrs.Mercator(central_longitude=central_longitude)

lcmap = cmaps.BlueYellowRed
# Only plot values greater than 0
p = geotiff_dm[variable_name].where(geotiff_dm[variable_name] > 0).plot(x='longitude', y='latitude',
                                                                 transform=ccrs.PlateCarree(),
                                                                 subplot_kws={"projection": proj_plot},
                                                                 size=8,
                                                                 col='year', col_wrap=3, robust=True,
                                                                 cmap=lcmap, add_colorbar=True)

# We have to set the map's options on every axis (one per year)
for ax,i in zip(p.axes.flat,  geotiff_dm.year.values):
    ax.coastlines()
    ax.set_title('Surface ' + variable_name + '\n' + month_name + ' ' + str(i), fontsize=10)

plot_file = os.path.join(OUTPUT_DATA_DIR, variable_name + '_' + month_name + '_' + country_code + '_2019-2021.png')
if os.path.exists(plot_file + '.bak'):
    os.remove(plot_file + '.bak')
if os.path.exists(plot_file):
    os.rename(plot_file, plot_file + '.bak') 
plt.savefig(plot_file)

Plot one single date

fig=plt.figure(figsize=(10,10))
# Define the projection
crs=ccrs.PlateCarree()

# We're using cartopy and are plotting in the Mercator projection 
# (see documentation on cartopy)
ax = plt.subplot(1, 1, 1, projection=ccrs.Mercator(central_longitude=central_longitude))
ax.coastlines(resolution='10m')

# custom colormap

lcmap = cmaps.BlueYellowRed

# We need to project our data to the new Mercator projection and for this we use `transform`.
# We set the original data projection in transform (here PlateCarree).
# We only plot values greater than 0.
img = geotiff_ds[variable_name].where(geotiff_ds[variable_name] > 0).sel(time='2021-' + month_number + '-15').plot(ax=ax,
                                                                                                 transform=ccrs.PlateCarree(),
                                                                                                 cmap=lcmap)  

# Title for plot
plt.title('Surface ' + variable_name + '\n 15th ' + month_name + ' 2021 over ' + country_fullname,
          fontsize = 16, fontweight = 'bold', pad=10)

plot_file = os.path.join(OUTPUT_DATA_DIR, variable_name + '_' + month_name + '_' + country_code + '_2021-' + month_number + '-15.png')
if os.path.exists(plot_file + '.bak'):
    os.remove(plot_file + '.bak')
if os.path.exists(plot_file):
    os.rename(plot_file, plot_file + '.bak')  
plt.savefig(plot_file)
geotiff_ds = geotiff_ds.sortby('time')

Save Data Cube selection into netCDF

output_file = os.path.join(OUTPUT_DATA_DIR, variable_name + "_" + month_name + "_" + country_code + "_2019-2021.nc")
if os.path.exists(output_file + '.bak'):
    os.remove(output_file + '.bak')
if os.path.exists(output_file):
    os.rename(output_file, output_file + '.bak') 
geotiff_ds.to_netcdf(output_file)
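As a quick verification (optional sketch), the file we just wrote can be reopened with xarray to check its content:

# Reopen the netCDF file and print its structure
check_ds = xr.open_dataset(output_file)
print(check_ds)
check_ds.close()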

Step 7: Create Research Object and Share my work

Create Research Object in ROHUB

pip install rohub
import os
import pathlib
from rohub import rohub, settings

Authenticating

  • If the code cell below fails, make sure you have created the two files:

    • rohub-user: contains your rohub username

    • rohub-pwd: contains your rohub password

rohub_user = open(os.path.join(os.environ['HOME'],"rohub-user")).read().rstrip()
rohub_pwd = open(os.path.join(os.environ['HOME'],"rohub-pwd")).read().rstrip()
rohub.login(username=rohub_user, password=rohub_pwd)

Create a new Executable RO

ro_title = variable_name + ' (' + month_name + ' 2019, 2020, 2021) in ' + country_fullname + ": Jupyter notebook demonstrating the usage of CAMS European air quality analysis from Copernicus Atmosphere Monitoring with RELIANCE services"
ro_research_areas = ["Earth sciences"]
ro_description = "This Research Object demonstrates how to use CAMS European air quality analysis from Copernicus Atmosphere Monitoring with RELIANCE services and compute monthly map of " + \
                 variable_name + " over a given geographical area, here " + country_fullname
ro = rohub.ros_create(title=ro_title, research_areas=ro_research_areas, 
                      description=ro_description, 
                      use_template=True,
                      ros_type="Executable Research Object")

Show metadata

ro.show_metadata()

Add additional authors and/or contributors to our Research Object

ro.set_authors(agents=author_emails)
ro.set_contributors(agents=contributor_emails)

Add RO Funding information

if funded_by:
    ro.add_funding(grant_identifier=funded_by["grant_id"], grant_name=funded_by["grant_Name"],
                   funder_name=funded_by["funder_name"], grant_title=funded_by["grant_title"],
                   funder_doi=funded_by["funder_doi"])

Add RO license

ro.set_license(license_id=license) 

Aggregate Resources

  • We will be adding all the resources generated by our notebook (data and plots)

  • Our data and plots are shared via EGI DataHub (they could also be shared via B2DROP), so we will get the shared links and add them to our research object as external resources

List RO folders for this type of RO

myfolders = ro.list_folders()
myfolders

Aggregate internal resources

Add sketch to my RO

res_file_path = os.path.join(OUTPUT_DATA_DIR, variable_name + '_' + month_name + '_' + country_code + '_2019-2021.png')
res_res_type = "Sketch"
res_title = variable_long_name + " [" + variable_unit + "] over " + country_fullname + " for " + month_name + " 2019, 2020 and 2021"
res_description = "Monthly average maps of CAMS " + variable_long_name + " [" + variable_unit + "] over " + country_fullname + " in 2019, 2020 and 2021"
res_folder =  'output'

ro.add_internal_resource(res_type=res_res_type, file_path=res_file_path, title=res_title, description=res_description, folder=res_folder)

Add conda environment to my RO

def copy_conda_env(local_conda_path, shared_conda_path):
    bkfile = shared_conda_path + '.bak'
    if os.path.exists(bkfile):
        os.remove(bkfile)
    if os.path.exists(shared_conda_path):
        os.rename(shared_conda_path, bkfile)
    shutil.copy2(local_conda_path, shared_conda_path)
import datetime
import shutil
for os_name, conda_filename in zip(['', 'linux-64', 'osx-64'], ['cams-conda.yml', 'conda-lock-linux-64.yml', 'conda-lock-osx-64.yml']):
    local_conda_path = os.path.join('./', conda_filename)
    shared_conda_path = os.path.join(TOOL_DATA_DIR, conda_filename)
    copy_conda_env(local_conda_path, shared_conda_path)
    
    res_file_path = shared_conda_path
    res_res_type = "Script"
    res_title = 'Conda environment ' + os_name
    if os_name == "":
        res_description = "Conda environment used on EGI notebook on " + datetime.date.today().strftime("%d/%m/%Y")
    else:
        res_description = "Conda environment generated with conda-lock for " + os_name
    res_folder =  'input'

    ro.add_internal_resource(res_type=res_res_type, file_path=res_file_path, title=res_title, description=res_description, folder=res_folder)

Aggregate external resources

EGI DataHub initialization

# egi_datahub_init() and egi_datahub_getlink() are helper functions assumed to be defined
# earlier in the environment (they are not defined in this notebook); they return an EGI
# DataHub token and a shareable link for a given path, respectively.
DATAHUB_TOKEN = egi_datahub_init()

Add inputs to my RO

  • I used ADAM to retrieve relevant data but I will be sharing what I retrieved from the data cube so that my collaborators do not have to re-download the same input data again.

Geojson file used for retrieving data from the ADAM data-cube

shared_input_path = os.path.join(INPUT_DATA_DIR, country_code.lower() + '.geo.json')
print(shared_input_path)
res_file_url = egi_datahub_getlink(DATAHUB_TOKEN, shared_input_path)
res_type = "Dataset"
res_title = "Geojson for " + country_fullname
res_description = "Geojson file used for retrieving data from the ADAM platform over " + country_fullname
res_folder = 'input'
ro.add_external_resource(res_type=res_type, input_url=res_file_url, title=res_title, description=res_description, folder=res_folder)

Input data retrieved from ADAM data-cube

for year in ['2019', '2020', '2021']:
    shared_input_path = os.path.join(INPUT_DATA_DIR, variable_name + '_' + country_code + '_ADAMAPI_' + year + '.zip')
    print(shared_input_path)
    res_file_url = egi_datahub_getlink(DATAHUB_TOKEN, shared_input_path)
    res_type = "Data Cube Product"
    res_title = "Data-Cube from ADAM platform over " + country_fullname + " in " + month_name + " " + year
    res_description = "This dataset is a data-Cube retrieved from the ADAM platform over " + country_fullname + " in " + month_name + " " + year
    res_folder ='input'
    ro.add_external_resource(res_type=res_type, input_url=res_file_url, title=res_title, description=res_description, folder=res_folder)

Add our Jupyter Notebook to our RO

  • Make a copy of the current notebook to the tool folder for sharing as an external resource

notebook_filename = 'RELIANCE_' + country_fullname + '_' + variable_name + '_month.ipynb'
local_notebook_path = os.path.join('./', notebook_filename)
shared_notebook_path = os.path.join(TOOL_DATA_DIR, notebook_filename)
shared_notebook_path

Copy current notebook to shared datahub

bkfile = shared_notebook_path + '.bak'
if os.path.exists(bkfile):
    os.remove(bkfile)
if os.path.exists(shared_notebook_path):
    os.rename(shared_notebook_path, bkfile)
shutil.copy2(local_notebook_path, shared_notebook_path)
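The copied notebook can then be aggregated like the other internal resources. This is a hedged sketch following the pattern used for the conda environments above; the resource type string is an assumption and may need to be adapted to the types available in RoHub:

res_file_path = shared_notebook_path
res_res_type = "Jupyter Notebook"   # assumed type name; "Script" (used above) is an alternative
res_title = "Jupyter notebook for the " + variable_name + " analysis over " + country_fullname
res_description = "Jupyter notebook used to retrieve, analyse and visualize CAMS " + variable_long_name + " over " + country_fullname
res_folder = 'tool'

ro.add_internal_resource(res_type=res_res_type, file_path=res_file_path, title=res_title, description=res_description, folder=res_folder)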

Additional metadata for the RO

Add geolocation to my Research Object

  • We need to transform our geojson file into geojson-ld

from geojson_rewind import rewind
import json
geojson_ld_file = os.path.join(INPUT_DATA_DIR, country_code.lower() + '.geo-ld.json')
bkfile = geojson_ld_file + '.bak'
if os.path.exists(bkfile):
    os.remove(bkfile)
if os.path.exists(geojson_ld_file):
    os.rename(geojson_ld_file, bkfile)
shutil.copy2(os.path.join(INPUT_DATA_DIR, country_code.lower() + '.geo.json'), geojson_ld_file)

with open(geojson_ld_file , 'r+') as f:
    data = json.load(f)
    output = rewind(data)
    output['@context'] = { "geojson": "https://purl.org/geojson/vocab#" } 
    f.seek(0)        
    json.dump(output, f, indent=None)
    f.truncate()
geolocation_file_path = geojson_ld_file
ro.add_geolocation(body_specification_json=geolocation_file_path)

Add tags

ro.add_keywords(keywords=[country_fullname, "CAMS", "air quality", "copernicus", variable_name, "jupyter-notebook"])

Export to RO-crate

ro.export_to_rocrate(filename="climate_EU-CAMS_" + country_code +  "_" + variable_name + "_ro-crate", use_format="zip")

Take a snapshot of my RO

#snapshot_id=ro.snapshot()

Archive and publish to Zenodo, optionally assign DOI

snapshot_title="Jupyter Notebook Analysing the Air quality during Covid-19 pandemic using Copernicus Atmosphere Monitoring Service - Applied over "  + country_fullname + ' (' + month_name + " 2019, 2020, 2021) with " + variable_long_name
#snapshot_id_pub=ro.snapshot(title=snapshot_title, create_doi=True, publication_services=["Zenodo"])
#snapshot_id_pub

Load the published Research Object

#published_ro = rohub.ros_load(identifier=snapshot_id)

Fork and reuse existing RO to create derivative work

#fork_id=ro.fork(title="Forked Jupyter Notebook to analyze  the Air quality during Covid-19 pandemic using Copernicus Atmosphere Monitoring Service")
#forked_ro = rohub.ros_load(identifier=fork_id)
#forked_ro.show_metadata()
# ro.delete()  # uncomment to delete the Research Object created above (irreversible)