Using pydap and pandas to read EBAS data

See more at http://ebas.nilu.no/

The EBAS database collects observational data on atmospheric chemical composition and physical properties from a variety of national and international research projects and monitoring programs, such as ACTRIS, AMAP, EMEP, GAW and HELCOM, as well as the Norwegian monitoring programs funded by the Norwegian Environment Agency, the Ministry of Climate and Environment, and NILU – Norwegian Institute for Air Research.

Import Python packages

from pydap.client import open_dods, open_url  # open_dods fetches data directly; open_url is not used below
from netCDF4 import num2date                  # converts numeric time values to datetime objects
import pandas as pd
import cftime
import matplotlib.pyplot as plt

Get data directly from EBAS database

  • syntax: ST_STATION_CODE.FT_TYPE.RE_REGIME_CODE.MA_MATRIX_NAME.CO_COMP_NAME.DS_RESCODE.FI_REF.ME_REF.DL_DATA_LEVEL.dods

  • or, informally: station.instrument_type.IMG.matrix.component.resolution.instrument_reference.datalevel.dods

  • if there is no data level, the URL simply ends with ..dods (two consecutive dots)

  • if the URL does not work, download one file manually and check what the FI_REF and ME_REF are (see the URL-assembly sketch after this list)

  • another example: http://dev-ebas-pydap.nilu.no/NO0042G.Hg_mon.IMG.air.mercury.1h.NO01L_tekran_42_dup.NO01L_afs..dods
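To make the field order concrete, here is a small sketch that assembles the URL used in the next cell from its named parts; the mapping of values to field names is inferred from the syntax above.

# assemble a .dods URL from its named fields (values match the DMPS example below)
base = 'http://dev-ebas-pydap.nilu.no/'
fields = [
    'NO0042G',                             # ST_STATION_CODE
    'dmps',                                # FT_TYPE (instrument type)
    'IMG',                                 # RE_REGIME_CODE
    'aerosol',                             # MA_MATRIX_NAME
    'particle_number_size_distribution',   # CO_COMP_NAME
    '1h',                                  # DS_RESCODE (time resolution)
    'NO01L_NILU_DMPSmodel2_ZEP',           # FI_REF (instrument reference)
    'NO01L_dmps_DMPS_ZEP01',               # ME_REF (method reference)
    '2',                                   # DL_DATA_LEVEL (leave '' for no level)
]
url = base + '.'.join(fields) + '.dods'
print(url)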

# get directly from EBAS
ds = open_dods(
'http://dev-ebas-pydap.nilu.no/' 
'NO0042G.dmps.IMG.aerosol.particle_number_size_distribution'
'.1h.NO01L_NILU_DMPSmodel2_ZEP.NO01L_dmps_DMPS_ZEP01.2.dods')
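open_dods returns a pydap dataset object; listing its children is a quick way to check which variables the server sent back (optional, but handy when a URL is new to you):

# optional: inspect the variables returned by the server
print(list(ds.keys()))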

Format data into a pandas DataFrame

# get the actual data
dmps_data = ds['particle_number_size_distribution_amean']

# get normalised size distribution in dNdlogDp
dNdlogDp = dmps_data.particle_number_size_distribution_amean.data

# get time in datatime format using function from netCDF4 package
tim_dmps = num2date(dmps_data.time.data, units='days since 1900-01-01 00:00:00',
                    calendar='gregorian')

# get diameter vector
dp_NILU = dmps_data.D.data

# make DataFrame to simplify the handling of data
df_NILU = pd.DataFrame(dNdlogDp.byteswap().newbyteorder(),
                       index=dp_NILU, columns=tim_dmps)
df_NILU.head()
       2016-05-06 05:30:00  2016-05-06 06:30:00  2016-05-06 07:30:00  ...  2017-12-31 21:30:00  2017-12-31 22:30:00  2017-12-31 23:30:00
10.0                 20.16                27.86                33.89  ...                 9.81                10.94                 6.79
12.0                 41.16                37.93                46.07  ...                 7.19                 6.92                 7.05
14.0                 59.79                48.38                57.63  ...                 5.97                 4.80                10.96
17.0                 52.01                45.14                56.40  ...                 6.12                 5.17                12.47
21.0                 21.98                28.03                37.41  ...                 9.21                 8.28                 7.09

5 rows × 14515 columns
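One caveat on the byteswap().newbyteorder() call above: ndarray.newbyteorder() was removed in NumPy 2.0. On newer NumPy, a sketch of the documented replacement (same result, not tested here) is:

# NumPy >= 2.0: ndarray.newbyteorder() is gone; view the array with a swapped dtype instead
df_NILU = pd.DataFrame(dNdlogDp.byteswap().view(dNdlogDp.dtype.newbyteorder()),
                       index=dp_NILU, columns=tim_dmps)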

Select a time and plot

df_NILU.columns
Index([2016-05-06 05:30:00, 2016-05-06 06:30:00, 2016-05-06 07:30:00,
       2016-05-06 08:30:00, 2016-05-06 09:30:00, 2016-05-06 10:30:00,
       2016-05-06 11:30:00, 2016-05-06 12:30:00, 2016-05-06 13:30:00,
       2016-05-06 14:30:00,
       ...
       2017-12-31 14:30:00, 2017-12-31 15:30:00, 2017-12-31 16:30:00,
       2017-12-31 17:30:00, 2017-12-31 18:30:00, 2017-12-31 19:30:00,
       2017-12-31 20:30:00, 2017-12-31 21:30:00, 2017-12-31 22:30:00,
       2017-12-31 23:30:00],
      dtype='object', length=14515)
type(df_NILU.columns[0])
cftime._cftime.DatetimeGregorian
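Because the column labels are cftime objects, selections must use cftime timestamps, as the next cell does. If you would rather work with plain pandas timestamps, one possible conversion (a sketch, assuming all times are representable in the standard calendar) is:

# optional: replace the cftime column labels with a pandas DatetimeIndex
df_pd = df_NILU.copy()
df_pd.columns = pd.to_datetime([t.isoformat() for t in df_NILU.columns])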
tsel = cftime.DatetimeGregorian(2016, 5, 6, 5, 30)
fig = plt.figure(1, figsize=[15,5])
ax = df_NILU[tsel].plot(lw=2, colormap='coolwarm', marker='.', markersize=10, title='Aerosol particle number size distribution', fontsize=15)
ax.set_xlabel("particle size", fontsize=15)
ax.set_ylabel("Number of particles", fontsize=15)
ax.title.set_size(20)
[Figure: aerosol particle number size distribution at the selected time]
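Since the particle diameters span more than a decade, a logarithmic size axis is often easier to read; if you want that, one extra line (optional) does it:

# optional: log-scale diameter axis, common for size distributions
ax.set_xscale('log')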

Save into a local CSV file

filename = 'size_dist' + tsel.strftime('%d%B%Y_%H%M') + '.csv'
print(filename)
size_dist06May2016_0530.csv
df_NILU[tsel].to_csv(filename, sep='\t', index=True, header=True)
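As a quick sanity check (not part of the original workflow), you can read the file straight back with the same separator:

# optional: read the local file back to verify the round trip
pd.read_csv(filename, sep='\t', index_col=0).head()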

Save into Galaxy history

  • You can also use the bioblend Python package to interact with your Galaxy history directly (see the sketch after the cell below)

!put -p size_dist06May2016_0530.csv -t tabular
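If you prefer bioblend over the put helper, a minimal sketch follows; the server URL, API key, and choice of history are placeholders you need to replace with your own values:

# hedged bioblend sketch: upload the CSV into a Galaxy history
# the URL and API key below are placeholders, not real credentials
from bioblend.galaxy import GalaxyInstance

gi = GalaxyInstance(url='https://your-galaxy-server.example', key='YOUR_API_KEY')
history_id = gi.histories.get_histories()[0]['id']  # most recently updated history
gi.tools.upload_file('size_dist06May2016_0530.csv', history_id, file_type='tabular')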

Save to NIRD via s3fs

  • Make sure you have your credentials in $HOME/.aws/credentials

import s3fs

Set the path on NIRD (and add "s3://" in front of it)

s3_path = "s3://work/" + filename
print(s3_path)
s3://work/size_dist06May2016_0530.csv
fsg = s3fs.S3FileSystem(
    anon=False,
    client_kwargs={'endpoint_url': 'https://forces2021.uiogeo-apps.sigma2.no/'})
bytes_to_write = df_NILU[tsel].to_csv(None, sep='\t', index=True, header=True).encode()

with fsg.open(s3_path, 'wb') as f:
    f.write(bytes_to_write)
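A quick way to confirm the object landed on the remote store (optional; s3fs mirrors a filesystem-style API):

# optional: check that the uploaded object exists
print(fsg.exists(s3_path))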

Check your file

dfo = pd.read_csv(fsg.open(s3_path), sep='\t', index_col=0)
dfo.head()
       2016-05-06 05:30:00
10.0                 20.16
12.0                 41.16
14.0                 59.79
17.0                 52.01
21.0                 21.98
ax = dfo.plot(lw=2, color='red', marker='.', markersize=10, title='Aerosol particle number size distribution', fontsize=15)
ax.set_xlabel("particle size", fontsize=15)
ax.set_ylabel("Number of particles", fontsize=15)
ax.title.set_size(14)
[Figure: the same aerosol particle number size distribution, read back from NIRD]