NO2 over Spain with CAMS European air quality analysis using RELIANCE services

Analysis over a particular country and a town in the country of interest

How to discover RELIANCE datacube resources (spatial & temporal search and subsetting), share resources using EGI DataHub, and use RoHub to create FAIR digital objects

This notebook shows how to discover and access the Copernicus Atmosphere Monitoring products available in the RELIANCE datacube resources using the functionalities provided by the ADAM API. The process is structured in 7 steps, including examples of data analysis and visualization with the Python libraries installed in the Jupyter environment, as well as the creation of a FAIR digital object on RoHub where all the resources used and generated in this notebook are aggregated.

You can customize this Jupyter notebook, for instance by updating the content of the Data Management section.

Step 1: Data Management

Authors

  • Make sure you first register with RoHub at https://reliance.rohub.org/.

  • We recommend using your ORCID identifier to log in and register with EOSC services.

  • In the list of authors, add any co-authors using the email address they used when they registered in RoHub.

author_emails = ['annefou@geo.uio.no']
contributor_emails = ['jeani@uio.no', 'mantovani@meeo.it']

Add the University of Oslo and the Nordic e-Infrastructure Collaboration as publishers

UiO_organization = {"org_id":"http://www.uio.no/english/", 
                     "display_name": "University of Oslo", 
                     "agent_type": "organization",
                     "ror_identifier":"01xtthb56",
                     "organization_url": "http://www.uio.no/english/"}
NeIC_organization = {"org_id":"https://neic.no/",
                    "display_name": "Nordic e-Infrastructure Collaboration", 
                     "agent_type": "organization",
                    "ror_identifier":"04jcwf484",
                    "organization_url": "https://neic.no/"}
list_publishers = [UiO_organization, NeIC_organization]
list_copyright_holders = [UiO_organization]

Add the funding

  • If your work is not funded, simply set funded_by = {}; otherwise fill in the grant information as in the example below.

funded_by = {}
funded_by = {
"grant_id": "101017502",
"grant_Name": "RELIANCE",
"grant_title": "Research Lifecycle Management for Earth Science Communities and Copernicus Users",
"funder_name": "European Commission",
"funder_doi": "10.13039/501100000781",
}

Choose a license for your FAIR digital object

pip install rohub
import rohub
licenses = rohub.list_available_licenses()
# Update line below to print more licenses
licenses[0:5]
license = 'MIT'

Organize my data using EGI DataHub

  • Define a prefix for my project (you may need to adjust it for your own usage on your infrastructure).

    • input folder where all the data used as input to my Jupyter Notebook is stored (and eventually shared)

    • output folder where all the results to keep are stored

    • tool folder where all the tools, including this Jupyter Notebook will be copied for sharing

  • Create all corresponding folders

Import Python packages

import os
import warnings
import pathlib
warnings.filterwarnings('ignore')

Initialization

  • Choose a country and add its name and country code

  • Choose the variable to analyze (PM10, PM25, NO2, O3, etc.)

  • Choose the area for your analysis

Choose the country of interest

country_code = 'ES' 
country_fullname = "Spain"
town_fullname = 'Madrid' 
town_coordinates = {'latitude': 40.4168, 'longitude': -3.7038}  # Madrid; longitude is negative (west of Greenwich)
variable_name = 'NO2'
variable_unit = 'µg m-3'
variable_long_name = 'Nitrogen Dioxide'
month_name = 'April'
month_number = '04'
month_nb_days = '30'
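The same parameters can be adapted for another run. As an illustrative (hypothetical) alternative configuration, the following commented lines would switch the analysis to ozone over Italy:

# Example alternative configuration (illustrative values, not used in this analysis):
# country_code = 'IT'
# country_fullname = 'Italy'
# town_fullname = 'Rome'
# town_coordinates = {'latitude': 41.9028, 'longitude': 12.4964}
# variable_name = 'O3'
# variable_unit = 'µg m-3'
# variable_long_name = 'Ozone'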

Geojson for selecting data from ADAM

  • The geometry field is extracted from a GeoJSON file, retrieving the geometry of the first element in the "features" list.

  • To create a geojson file for the area of interest, you can use https://geojson.io/

  • Then paste the result below in the geojson variable

geojson = """{"type": "FeatureCollection","features": [{"type": "Feature","properties": {},"geometry": {"type": "Polygon","coordinates": [[[3.05419921875,42.601619944327965],[-1.69189453125,43.46886761482925],[-8.10791015625,43.866218006556394],[-9.60205078125,43.03677585761058],[-9.11865234375,42.24478535602799],[-9.03076171875,40.245991504199026],[-9.580078125,39.07890809706475],[-9.73388671875,38.70265930723801],[-9.25048828125,38.30718056188316],[-8.942871093749998,38.25543637637947],[-9.052734375,37.142803443716836],[-9.29443359375,36.79169061907076],[-8.10791015625,36.89719446989036],[-7.778320312499999,36.79169061907076],[-7.27294921875,37.07271048132943],[-6.78955078125,36.86204269508728],[-6.17431640625,36.06686213257888],[-5.69091796875,35.90684930677121],[-5.09765625,36.08462129606931],[-4.74609375,36.33282808737917],[-4.10888671875,36.59788913307022],[-3.09814453125,36.54494944148322],[-2.43896484375,36.56260003738545],[-2.04345703125,36.63316209558658],[-1.69189453125,37.16031654673677],[-1.34033203125,37.43997405227057],[-0.439453125,37.49229399862877],[-0.59326171875,37.75334401310656],[-0.37353515625,38.272688535980976],[0.263671875,38.59970036588819],[0.3955078125,38.839707613545144],[0.06591796875,38.94232097947902],[-0.17578125,39.2832938689385],[-0.19775390625,39.58875727696545],[0.24169921874999997,39.977120098439634],[0.68115234375,40.463666324587685],[1.07666015625,40.83043687764923],[1.58203125,41.062786068733026],[2.2412109375,41.178653972331674],[2.83447265625,41.541477666790286],[3.33984375,41.73852846935917],[3.3618164062499996,42.13082130188811],[3.05419921875,42.601619944327965]]]}}]}"""
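As an optional sanity check (a minimal sketch, not part of the original workflow), we can verify that the pasted string parses as a FeatureCollection before going further:

import json
_geo_check = json.loads(geojson)
assert _geo_check['type'] == 'FeatureCollection'
print(len(_geo_check['features']), 'feature(s),',
      len(_geo_check['features'][0]['geometry']['coordinates'][0]), 'polygon vertices')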

Create folders

WORKDIR_FOLDER = os.path.join(os.environ['HOME'], "datahub/Reliance/Climate" + '_' + country_code + '_' + variable_name + '_' + month_name)
print("WORKDIR FOLDER: ", WORKDIR_FOLDER)
INPUT_DATA_DIR = os.path.join(WORKDIR_FOLDER, 'input')
OUTPUT_DATA_DIR = os.path.join(WORKDIR_FOLDER, 'output')
TOOL_DATA_DIR = os.path.join(WORKDIR_FOLDER, 'tool')

list_folders = [INPUT_DATA_DIR, OUTPUT_DATA_DIR, TOOL_DATA_DIR]

for folder in list_folders:
    pathlib.Path(folder).mkdir(parents=True, exist_ok=True)

Geojson file for selecting data from ADAM

  • We dissolve the GeoJSON in case it contains more than one polygon, and then save the result into a GeoJSON file

import cartopy
import geopandas as gpd
local_path_geom = os.path.join(INPUT_DATA_DIR, country_code.lower() + '.geo.json')
local_path_geom
if (pathlib.Path(local_path_geom).exists()):
    os.remove(local_path_geom)
f = open(local_path_geom, "w")
f.write(geojson)
f.close()
data = gpd.read_file(local_path_geom)
single_shape = data.dissolve()

Show area of interest

single_shape.plot()
if (pathlib.Path(local_path_geom).exists()):
    os.remove(local_path_geom)
single_shape.to_file(local_path_geom, driver='GeoJSON')

Step 2: Authentication

The following lines of code read the personal ADAM API key of the user and set the endpoint currently in use, which provides access to the products in the related catalogue. If the authentication process is successful, the personal token and its expiration time are returned as outputs.

pip install adamapi
adam_key = open(os.path.join(os.environ['HOME'],"adam-key")).read().rstrip()
import adamapi as adam
a = adam.Auth()

a.setKey(adam_key)
a.setAdamCore('https://reliance.adamplatform.eu')
a.authorize() 

Step 3: Datasets Discovery

After authorization, the user can browse the whole catalogue, which is returned as a paginated JSON object listing all the available datasets. This operation is executed with the getDatasets() function without any argument. A few lines of code are then needed to parse the JSON object, which can be handled as a Python dictionary, and to extract the names of the datasets.

Pre-filter datasets

We will discover all the datasets available on the ADAM platform but will only print the ones of interest (EU_CAMS, i.e. the European air quality datasets from the Copernicus Atmosphere Monitoring Service).

def list_datasets(a, search="", dataset_name=""):
    datasets = adam.Datasets(a)
    catalogue = datasets.getDatasets()
    datasetID = None

    # Extracting the size of the catalogue (rounding up so the last, partial page is included)
    total = catalogue['properties']['totalResults']
    items = catalogue['properties']['itemsPerPage']
    pages = (total + items - 1) // items

    print('\033[1;34m')
    print('----------------------------------------------------------------------')
    print('List of available datasets:')
    print('\033[0;0m')

    # Extracting the list of datasets across the whole catalogue
    for i in range(0, pages):
        page = datasets.getDatasets(page=i)
        for element in page['content']:
            if search == "" or search in element['title']:
                print(element['title'] + " --> datasetId = " + element['datasetId'])
                if element['datasetId'].split(':')[1] == dataset_name:
                    datasetID = element['datasetId']
    return datasets, datasetID
datasets, datasetID = list_datasets(a, search="CAMS", dataset_name = 'EU_CAMS_SURFACE_' + variable_name + '_G')

We are interested in one variable only, so we discover the corresponding dataset and print its metadata, which shows the data provenance.

def get_metadata(datasetID, datasets, verbose=False):
    print('\033[1;34m' + 'Metadata of ' + datasetID + ':')
    print ('\033[0;0m')
    
    paged = datasets.getDatasets(datasetID)
    for i in paged.items():
        print("\033[1m" +  str(i[0]) + "\033[0m" + ': ' + str(i[1]))
    return paged
metadata_variable = get_metadata(datasetID, datasets, verbose=True)

Step 4: Products Discovery

The products discovery operation for a specific dataset is implemented in the ADAM API with the getProducts() function. A combined spatial and temporal search can be requested by specifying the datasetId of the selected dataset, the geometry argument that defines the Area of Interest, and a temporal range defined by startDate and endDate. The geometry must always be a GeoJSON object describing the polygon in counterclockwise winding order. The optional arguments startIndex and maxRecords control which slice of the result list is returned (a short paging example is shown after the search results below). The results of the search are displayed with their metadata, sorted starting from the most recent product.

Search data

pip install geojson_rewind
from geojson_rewind import rewind
import json

The GeoJSON object needs to be rearranged into counterclockwise winding order. This is done in the next few lines to obtain a geometry that meets the requirements of the method; geom_1 is the final result used in the discovery operation.

with open(local_path_geom) as f:
    geom_dict = json.load(f)
# Rewind the polygon to counterclockwise winding order and keep only the geometry
output = rewind(geom_dict)
geom_1 = str(output['features'][0]['geometry'])

Copernicus air quality analyses are hourly products, but when we select a given date we only get the first 10 products. Below, we list the first 10 available products for the 1st day of the studied month in 2019, i.e. we restrict our search to that date.

start_date = '2019-' + month_number + '-01'
end_date = start_date
search = adam.Search( a )
results = search.getProducts(
    datasetID, 
    geometry=geom_1,
    startDate=start_date,
    endDate=end_date
 )

# Printing the results

print('\033[1;34m' + 'List of available products (maximum 10 products printed):')
print ('\033[0;0m')

count = 1
for i in results['content']:
    print("\033[1;31;1m" + "#" + str(count))
    print('\033[0m')
    for k in i.items():
        print(str(k[0]) + ': ' + str(k[1]))
    count = count + 1
    print('------------------------------------')
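The optional startIndex and maxRecords arguments mentioned above can be used to page through more than the first 10 products. A minimal sketch (the argument names follow the description above; treat the exact call pattern as an assumption to adapt to your adamapi version):

next_page = search.getProducts(
    datasetID,
    geometry=geom_1,
    startDate=start_date,
    endDate=end_date,
    startIndex=10,   # skip the 10 products already printed above
    maxRecords=10    # return at most 10 further products
)
for product in next_page['content']:
    print(product['_id'])   # '_id' is the product identifier used for data access in Step 5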

Step 5: Data Access

After the data discovery operation that checks the availability of products in the catalogue, the data can be accessed with the getData() function. Each product in the output list intersects the selected geometry, and the following example shows how to access a specific product from the list of results obtained in the previous step. While datasetId is always mandatory, each data access request needs only one of the following arguments: geometry or productId, the latter being the value of the _id field in each product's metadata. For a spatial and temporal search, the geometry must be provided to the function together with the time range of interest. The output of getData() is always a .zip file containing the data retrieved by the request, i.e. the spatial subset of the product; the zip file contains one GeoTIFF file for each spatial subset extracted in the selected time range.
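For example, following the description above, a single product could be retrieved by its identifier alone. This is a hedged sketch (the productId-based call pattern is inferred from the description above, and the output file name is illustrative):

# Access one specific product by its identifier instead of a geometry/time range
single_product_id = results['content'][0]['_id']   # '_id' field from the Step 4 results
data = adam.GetData(a)
single_image = data.getData(
    datasetId=datasetID,
    productId=single_product_id,
    outputFname=os.path.join(INPUT_DATA_DIR, 'single_product_example.zip'))
print(single_image)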

Define a function to select a time range and get data

def getZipData(auth, dataset_info):
    # Skip the download if the zip file (or the folder it was already unzipped into) exists
    zip_path = pathlib.Path(dataset_info['outputFname'])
    if not (zip_path.with_suffix('').exists() or zip_path.exists()):
        data = adam.GetData(auth)
        image = data.getData(
            datasetId=dataset_info['datasetID'],
            startDate=dataset_info['startDate'],
            endDate=dataset_info['endDate'],
            geometry=dataset_info['geometry'],
            outputFname=dataset_info['outputFname'])
        print(image)

Get variable of interest for each day of the month we study for 2019, 2020 and 2021 (time 00:00:00)

This process can take a bit of time so be patient!

import time
from IPython.display import clear_output
start = time.time()

for year in ['2019', '2020', '2021']:
    datasetInfo = {
    'datasetID' : datasetID,
    'startDate' : year + '-' + month_number + '-01',
    'endDate' : year + '-' + month_number + '-' + month_nb_days,
    'geometry' : geom_1,
    'outputFname' : INPUT_DATA_DIR + '/' + variable_name + '_' + country_code + '_ADAMAPI_' + year + '.zip'
    }
    getZipData(a, datasetInfo)
    
end = time.time()
clear_output(wait=True)
delta1 = end - start
print('\033[1m' + 'Processing time: ' + str(round(delta1, 2)) + ' seconds')

Step 6: Data Analysis and Visualization

The data retrieved via the ADAM API is now available as zip files that must be unzipped to handle the data directly in GeoTIFF format. Then, with the Python packages provided in the Jupyter environment, it is possible to process and visualize the requested products.

Unzip data

import zipfile
def unzipData(filename, out_prefix):
    with zipfile.ZipFile(filename, 'r') as zip_ref:
        zip_ref.extractall(path = os.path.join(out_prefix, pathlib.Path(filename).stem))
for year in ['2019', '2020', '2021']:
    filename = INPUT_DATA_DIR + '/' + variable_name + '_' + country_code + '_ADAMAPI_' + year + '.zip'
    target_file = pathlib.Path(os.path.join(INPUT_DATA_DIR, pathlib.Path(pathlib.Path(filename).stem)))
    if not target_file.exists():
        unzipData(filename, INPUT_DATA_DIR)
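As an optional sanity check (a small sketch, not part of the original workflow), we can count the GeoTIFF files extracted for each year:

import glob
for year in ['2019', '2020', '2021']:
    tif_files = glob.glob(INPUT_DATA_DIR + '/' + variable_name + '_' + country_code + '_ADAMAPI_' + year + '/*.tif')
    print(year, ':', len(tif_files), 'GeoTIFF files extracted')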

Read data and make a monthly average

import xarray as xr
import xesmf as xe
import glob

We need to regrid the data so that all daily GeoTIFF subsets share a common grid and can be concatenated along the time dimension; one of the 2021 files (read below) is used as the output grid.

def read_file(filename, variable, metadata, factor=1):
    tmp = xr.open_rasterio(filename, parse_coordinates=True)
    # Convert our xarray.DataArray into a xarray.Dataset
    tmp = tmp.to_dataset('band')*factor
    # Rename the dimensions to make it CF-convention compliant
    tmp = tmp.rename_dims({'y': 'latitude', 'x':'longitude'})
    # Rename the variable to a more useful name
    tmp = tmp.rename_vars({1: variable, 'y':'latitude', 'x':'longitude'})
    tmp[variable].attrs = {'units' : metadata['units'], 'long_name' : metadata['description']}
    return tmp
output_grid = read_file(INPUT_DATA_DIR + '/' + variable_name + '_' + country_code + '_ADAMAPI_2021/eu_cams_surface_' + variable_name.lower() + '_g_2021-' + month_number + '-' + month_nb_days + 't000000.tif', variable_name, metadata_variable)
output_grid

We now read these files using xarray. First, we make a list of all the GeoTIFF files in a given folder. To ensure each raster is labelled correctly with its time, we use a helper function paths_to_datetimeindex() to extract the time information from the file paths obtained above. We then load and concatenate each dataset along the time dimension using xarray.open_rasterio(), convert the resulting xarray.DataArray to a xarray.Dataset, and give the variable a more useful name (here, the selected variable, NO2).

from datetime import datetime
def paths_to_datetimeindex(paths):
    return  [datetime.strptime(date.split('_')[-1].split('.')[0], '%Y-%m-%dt%f') for date in paths]
def getData(dirtif, variable, metadata, factor=1, grid_out=None):
    geotiff_list = glob.glob(dirtif)
    # Create variable used for time axis
    time_var = xr.Variable('time', paths_to_datetimeindex(geotiff_list))
    # Load in and concatenate all individual GeoTIFFs
    xarray_list = []
    if grid_out is not None:
        nlats = len(grid_out.latitude.values)
        nlons = len(grid_out.longitude.values)
    for i in geotiff_list:
        tmp = read_file(i, variable, metadata, factor=factor)
        if grid_out is not None:
            print("regridding ", i)
            regridder = xe.Regridder(tmp, grid_out, 'conservative')
            tmp_regrid = regridder(tmp, keep_attrs=True)
            xarray_list.append(tmp_regrid)
        else:
            xarray_list.append(tmp)
    #print(xarray_list[0:2])
    geotiffs_da = xr.concat(xarray_list, dim=time_var)
    return geotiffs_da
geotiff_ds = getData( INPUT_DATA_DIR + '/' + variable_name + '_'+ country_code + '_ADAMAPI_20*/*.tif', variable_name, metadata_variable, factor=1.e9, grid_out=output_grid)
geotiff_ds[variable_name].attrs = {'units' : variable_unit, 'long_name' : variable_long_name }
geotiff_ds

Analyze data

Make yearly average for the month we study

geotiff_dm = geotiff_ds.groupby('time.year').mean('time', keep_attrs=True)
geotiff_dm
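To quantify the year-to-year change (for instance how the studied month in 2020 compares with 2019), one can subtract the 2019 monthly mean from the 2020 one. This is a minimal sketch using the 'year' coordinate created by the groupby above:

# Difference between the 2020 and 2019 monthly means (negative values = lower concentrations in 2020)
diff_2020_2019 = geotiff_dm[variable_name].sel(year=2020) - geotiff_dm[variable_name].sel(year=2019)
diff_2020_2019.attrs = {'units': variable_unit, 'long_name': variable_long_name + ' change (2020 - 2019)'}
diff_2020_2019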

Visualize data

pip install cmaps "holoviews<1.14.8" GeoViews cartopy
import cartopy.crs as ccrs
import matplotlib.pyplot as plt
import cmaps
# Center the map on the area of interest; here we use the longitude of the selected town.
# You may want to change it when plotting over different geographical areas.
central_longitude = town_coordinates['longitude']
# generate figure
proj_plot = ccrs.Mercator(central_longitude=central_longitude)

lcmap = cmaps.BlueYellowRed
# Only plot values greater than 0
p = geotiff_dm[variable_name].where(geotiff_dm[variable_name] > 0).plot(x='longitude', y='latitude',
                                                                 transform=ccrs.PlateCarree(),
                                                                 subplot_kws={"projection": proj_plot},
                                                                 size=8,
                                                                 col='year', col_wrap=3, robust=True,
                                                                 cmap=lcmap, add_colorbar=True)

# We have to set the map's options on every axis (one per year)
for ax,i in zip(p.axes.flat,  geotiff_dm.year.values):
    ax.coastlines()
    ax.set_title('Surface ' + variable_name + '\n' + month_name + ' ' + str(i), fontsize=10)

plot_file = os.path.join(OUTPUT_DATA_DIR, variable_name + '_' + month_name + '_' + country_code + '_2019-2021.png')
if os.path.exists(plot_file + '.bak'):
    os.remove(plot_file + '.bak')
if os.path.exists(plot_file):
    os.rename(plot_file, plot_file + '.bak') 
plt.savefig(plot_file)

Plot one single date

fig=plt.figure(figsize=(10,10))
# Define the projection
crs=ccrs.PlateCarree()

# We're using cartopy and are plotting in the Mercator projection 
# (see documentation on cartopy)
ax = plt.subplot(1, 1, 1, projection=ccrs.Mercator(central_longitude=central_longitude))
ax.coastlines(resolution='10m')

# custom colormap

lcmap = cmaps.BlueYellowRed

# We need to project our data to the new Mercator projection and for this we use `transform`.
# We set the original data projection in transform (here PlateCarree).
# We only plot values greater than 0.
img = geotiff_ds[variable_name].where(geotiff_ds[variable_name] > 0).sel(time='2021-' + month_number + '-15').plot(ax=ax,
                                                                                                 transform=ccrs.PlateCarree(),
                                                                                                 cmap=lcmap)  

# Title for plot
plt.title('Surface ' + variable_name + '\n 15th ' + month_name + ' 2021 over ' + country_fullname,
          fontsize = 16, fontweight = 'bold', pad=10)

plot_file = os.path.join(OUTPUT_DATA_DIR, variable_name + '_' + month_name + '_' + country_code + '_2021-' + month_number + '-15.png')
if os.path.exists(plot_file + '.bak'):
    os.remove(plot_file + '.bak')
if os.path.exists(plot_file):
    os.rename(plot_file, plot_file + '.bak')  
plt.savefig(plot_file)
geotiff_ds = geotiff_ds.sortby('time')

Save Data Cube selection into netCDF

output_file = os.path.join(OUTPUT_DATA_DIR, variable_name + "_" + month_name + "_" + country_code + "_2019-2021.nc")
if os.path.exists(output_file + '.bak'):
    os.remove(output_file + '.bak')
if os.path.exists(output_file):
    os.rename(output_file, output_file + '.bak') 
geotiff_ds.to_netcdf(output_file)
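As a quick verification (optional sketch), the file we just wrote can be reopened with xarray to check its content:

# Reopen the netCDF file and print its structure
check_ds = xr.open_dataset(output_file)
print(check_ds)
check_ds.close()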

Step 7: Create Research Object and Share my work

Create Research Object in ROHUB

pip install rohub
import os
import pathlib
from rohub import rohub, settings

Authenticating

  • If the code cell below fails, make sure you have created the two files:

    • rohub-user: contains your rohub username

    • rohub-pwd: contains your rohub password

rohub_user = open(os.path.join(os.environ['HOME'],"rohub-user")).read().rstrip()
rohub_pwd = open(os.path.join(os.environ['HOME'],"rohub-pwd")).read().rstrip()
rohub.login(username=rohub_user, password=rohub_pwd)

Create a new Executable RO

ro_title = variable_name + ' (' + month_name + ' 2019, 2020, 2021) in ' + country_fullname + ": Jupyter notebook demonstrating the usage of CAMS European air quality analysis from Copernicus Atmosphere Monitoring with RELIANCE services"
ro_research_areas = ["Earth sciences"]
ro_description = "This Research Object demonstrates how to use CAMS European air quality analysis from Copernicus Atmosphere Monitoring with RELIANCE services and compute monthly map of " + \
                 variable_name + " over a given geographical area, here " + country_fullname
ro = rohub.ros_create(title=ro_title, research_areas=ro_research_areas, 
                      description=ro_description, 
                      use_template=True,
                      ros_type="Executable Research Object")

Show metadata

ro.show_metadata()

Add additional authors and/or contributors to our Research Object

ro.set_authors(agents=author_emails)
ro.set_contributors(agents=contributor_emails)

Add RO Funding information

if funded_by:
    ro.add_funding(grant_identifier=funded_by["grant_id"], grant_name=funded_by["grant_Name"],
                   funder_name=funded_by["funder_name"], grant_title=funded_by["grant_title"],
                   funder_doi=funded_by["funder_doi"])

Add RO license

ro.set_license(license_id=license) 

Aggregate Resources

  • We will be adding all the resources generated by our notebook (data and plots)

  • Our data and plots are shared via EGI DataHub (they could also be shared via B2DROP), so we will get the shared links and add them to our research object as external resources

List RO folders for this type of RO

myfolders = ro.list_folders()
myfolders

Aggregate internal resources

Add sketch to my RO

res_file_path = os.path.join(OUTPUT_DATA_DIR, variable_name + '_' + month_name + '_' + country_code + '_2019-2021.png')
res_res_type = "Sketch"
res_title = variable_long_name + " [" + variable_unit + "] over " + country_fullname + " for " + month_name + " 2019, 2020 and 2021"
res_description = "Monthly average maps of CAMS " + variable_long_name + " [" + variable_unit + "] over " + country_fullname + " in 2019, 2020 and 2021"
res_folder =  'output'

ro.add_internal_resource(res_type=res_res_type, file_path=res_file_path, title=res_title, description=res_description, folder=res_folder)

Add conda environment to my RO

def copy_conda_env(local_conda_path, shared_conda_path):
    bkfile = shared_conda_path + '.bak'
    if os.path.exists(bkfile):
        os.remove(bkfile)
    if os.path.exists(shared_conda_path):
        os.rename(shared_conda_path, bkfile)
    shutil.copy2(local_conda_path, shared_conda_path)
import datetime
import shutil
for os_name, conda_filename in zip(['', 'linux-64', 'osx-64'], ['cams-conda.yml', 'conda-lock-linux-64.yml', 'conda-lock-osx-64.yml']):
    local_conda_path = os.path.join('./', conda_filename)
    shared_conda_path = os.path.join(TOOL_DATA_DIR, conda_filename)
    copy_conda_env(local_conda_path, shared_conda_path)
    
    res_file_path = shared_conda_path
    res_res_type = "Script"
    res_title = 'Conda environment ' + os_name
    if os_name == "":
        res_description = "Conda environment used on EGI notebook on " + datetime.date.today().strftime("%d/%m/%Y")
    else:
        res_description = "Conda environment generated with conda-lock for " + os_name
    res_folder =  'input'

    ro.add_internal_resource(res_type=res_res_type, file_path=res_file_path, title=res_title, description=res_description, folder=res_folder)

Aggregate external resources

EGI DataHub initialization

# egi_datahub_init() and egi_datahub_getlink() are helper functions assumed to be defined
# earlier in the environment (they are not defined in this notebook); they return an EGI
# DataHub token and a shareable link for a given path, respectively.
DATAHUB_TOKEN = egi_datahub_init()

Add inputs to my RO

  • I used ADAM to retrieve relevant data but I will be sharing what I retrieved from the data cube so that my collaborators do not have to re-download the same input data again.

Geojson file used for retrieving data from the ADAM data-cube

shared_input_path = os.path.join(INPUT_DATA_DIR, country_code.lower() + '.geo.json')
print(shared_input_path)
res_file_url = egi_datahub_getlink(DATAHUB_TOKEN, shared_input_path)
res_type = "Dataset"
res_title = "Geojson for " + country_fullname
res_description = "Geojson file used for retrieving data from the ADAM platform over " + country_fullname
res_folder = 'input'
ro.add_external_resource(res_type=res_type, input_url=res_file_url, title=res_title, description=res_description, folder=res_folder)

Input data retrieved from ADAM data-cube

for year in ['2019', '2020', '2021']:
    shared_input_path = os.path.join(INPUT_DATA_DIR, variable_name + '_' + country_code + '_ADAMAPI_' + year + '.zip')
    print(shared_input_path)
    res_file_url = egi_datahub_getlink(DATAHUB_TOKEN, shared_input_path)
    res_type = "Data Cube Product"
    res_title = "Data-Cube from ADAM platform over " + country_fullname + " in " + month_name + " " + year
    res_description = "This dataset is a data-Cube retrieved from the ADAM platform over " + country_fullname + " in " + month_name + " " + year
    res_folder ='input'
    ro.add_external_resource(res_type=res_type, input_url=res_file_url, title=res_title, description=res_description, folder=res_folder)

Add our Jupyter Notebook to our RO

  • Make a copy of the current notebook to the tool folder for sharing as an external resource

notebook_filename = 'RELIANCE_' + country_fullname + '_' + variable_name + '_month.ipynb'
local_notebook_path = os.path.join('./', notebook_filename)
shared_notebook_path = os.path.join(TOOL_DATA_DIR, notebook_filename)
shared_notebook_path

Copy current notebook to shared datahub

bkfile = shared_notebook_path + '.bak'
if os.path.exists(bkfile):
    os.remove(bkfile)
if os.path.exists(shared_notebook_path):
    os.rename(shared_notebook_path, bkfile)
shutil.copy2(local_notebook_path, shared_notebook_path)
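The copied notebook can then be aggregated like the other internal resources. This is a hedged sketch following the pattern used for the conda environments above; the resource type string is an assumption and may need to be adapted to the types available in RoHub:

res_file_path = shared_notebook_path
res_res_type = "Jupyter Notebook"   # assumed type name; "Script" (used above) is an alternative
res_title = "Jupyter notebook for the " + variable_name + " analysis over " + country_fullname
res_description = "Jupyter notebook used to retrieve, analyse and visualize CAMS " + variable_long_name + " over " + country_fullname
res_folder = 'tool'

ro.add_internal_resource(res_type=res_res_type, file_path=res_file_path, title=res_title, description=res_description, folder=res_folder)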

Additional metadata for the RO

Add geolocation to my Research Object

  • We need to transform our geojson file into geojson-ld

from geojson_rewind import rewind
import json
geojson_ld_file = os.path.join(INPUT_DATA_DIR, country_code.lower() + '.geo-ld.json')
bkfile = geojson_ld_file + '.bak'
if os.path.exists(bkfile):
    os.remove(bkfile)
if os.path.exists(geojson_ld_file):
    os.rename(geojson_ld_file, bkfile)
shutil.copy2(os.path.join(INPUT_DATA_DIR, country_code.lower() + '.geo.json'), geojson_ld_file)

with open(geojson_ld_file , 'r+') as f:
    data = json.load(f)
    output = rewind(data)
    output['@context'] = { "geojson": "https://purl.org/geojson/vocab#" } 
    f.seek(0)        
    json.dump(output, f, indent=None)
    f.truncate()
geolocation_file_path = geojson_ld_file
ro.add_geolocation(body_specification_json=geolocation_file_path)

Add tags

ro.add_keywords(keywords=[country_fullname, "CAMS", "air quality", "copernicus", variable_name, "jupyter-notebook"])

Export to RO-crate

ro.export_to_rocrate(filename="climate_EU-CAMS_" + country_code +  "_" + variable_name + "_ro-crate", use_format="zip")

Take a snapshot of my RO

#snapshot_id=ro.snapshot()

Archive and publish to Zenodo, optionally assign DOI

snapshot_title="Jupyter Notebook Analysing the Air quality during Covid-19 pandemic using Copernicus Atmosphere Monitoring Service - Applied over "  + country_fullname + ' (' + month_name + " 2019, 2020, 2021) with " + variable_long_name
#snapshot_id_pub=ro.snapshot(title=snapshot_title, create_doi=True, publication_services=["Zenodo"])
#snapshot_id_pub

Load the published Research Object

#published_ro = rohub.ros_load(identifier=snapshot_id)

Fork and reuse existing RO to create derivative work

#fork_id=ro.fork(title="Forked Jupyter Notebook to analyze  the Air quality during Covid-19 pandemic using Copernicus Atmosphere Monitoring Service")
#forked_ro = rohub.ros_load(identifier=fork_id)
#forked_ro.show_metadata()
# ro.delete()  # uncomment to delete the Research Object created above (irreversible)