Rolling mean with CMIP6

Import Python packages

import s3fs
import xarray as xr
import intake
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import numpy as np

Open CMIP6 online catalog

cat_url = "https://storage.googleapis.com/cmip6/pangeo-cmip6.json"
col = intake.open_esm_datastore(cat_url)
col

pangeo-cmip6 catalog with 7632 dataset(s) from 517667 asset(s):

unique
activity_id 18
institution_id 36
source_id 88
experiment_id 170
member_id 657
table_id 37
variable_id 709
grid_label 10
zstore 517667
dcpp_init_year 60
version 715

Search for data

cat = col.search(source_id=['CESM2-WACCM'], experiment_id=['historical'], table_id=['AERmon'], variable_id=['so2'], member_id=['r1i1p1f1'])
cat.df
activity_id institution_id source_id experiment_id member_id table_id variable_id grid_label zstore dcpp_init_year version
0 CMIP NCAR CESM2-WACCM historical r1i1p1f1 AERmon so2 gn gs://cmip6/CMIP6/CMIP/NCAR/CESM2-WACCM/histori... NaN 20190227

Create a dictionary from the list of dataset

dset_dict = cat.to_dataset_dict(zarr_kwargs={'use_cftime':True})
--> The keys in the returned dictionary of datasets are constructed as follows:
	'activity_id.institution_id.source_id.experiment_id.table_id.grid_label'
100.00% [1/1 00:00<00:00]
list(dset_dict.keys())
['CMIP.NCAR.CESM2-WACCM.historical.AERmon.gn']

Open dataset

dset = dset_dict['CMIP.NCAR.CESM2-WACCM.historical.AERmon.gn']
dset
<xarray.Dataset>
Dimensions:    (lat: 192, lev: 70, lon: 288, member_id: 1, nbnd: 2, time: 1980)
Coordinates:
  * lat        (lat) float64 -90.0 -89.06 -88.12 -87.17 ... 88.12 89.06 90.0
    lat_bnds   (lat, nbnd) float32 dask.array<chunksize=(192, 2), meta=np.ndarray>
  * lev        (lev) float64 -5.96e-06 -9.827e-06 -1.62e-05 ... -976.3 -992.6
    lev_bnds   (lev, nbnd) float32 dask.array<chunksize=(70, 2), meta=np.ndarray>
  * lon        (lon) float64 0.0 1.25 2.5 3.75 5.0 ... 355.0 356.2 357.5 358.8
    lon_bnds   (lon, nbnd) float32 dask.array<chunksize=(288, 2), meta=np.ndarray>
  * time       (time) object 1850-01-15 12:00:00 ... 2014-12-15 12:00:00
    time_bnds  (time, nbnd) object dask.array<chunksize=(1980, 2), meta=np.ndarray>
  * member_id  (member_id) <U8 'r1i1p1f1'
Dimensions without coordinates: nbnd
Data variables:
    so2        (member_id, time, lev, lat, lon) float32 dask.array<chunksize=(1, 5, 70, 192, 288), meta=np.ndarray>
Attributes: (12/48)
    Conventions:             CF-1.7 CMIP-6.2
    activity_id:             CMIP
    branch_method:           standard
    branch_time_in_child:    674885.0
    branch_time_in_parent:   20075.0
    case_id:                 4
    ...                      ...
    variable_id:             so2
    variant_info:            CMIP6 CESM2 hindcast (1850-2014) with high-top a...
    variant_label:           r1i1p1f1
    status:                  2019-11-05;created;by nhn2@columbia.edu
    intake_esm_varname:      ['so2']
    intake_esm_dataset_key:  CMIP.NCAR.CESM2-WACCM.historical.AERmon.gn
dset.so2
<xarray.DataArray 'so2' (member_id: 1, time: 1980, lev: 70, lat: 192, lon: 288)>
dask.array<broadcast_to, shape=(1, 1980, 70, 192, 288), dtype=float32, chunksize=(1, 5, 70, 192, 288), chunktype=numpy.ndarray>
Coordinates:
  * lat        (lat) float64 -90.0 -89.06 -88.12 -87.17 ... 88.12 89.06 90.0
  * lev        (lev) float64 -5.96e-06 -9.827e-06 -1.62e-05 ... -976.3 -992.6
  * lon        (lon) float64 0.0 1.25 2.5 3.75 5.0 ... 355.0 356.2 357.5 358.8
  * time       (time) object 1850-01-15 12:00:00 ... 2014-12-15 12:00:00
  * member_id  (member_id) <U8 'r1i1p1f1'
Attributes: (12/19)
    cell_measures:  area: areacella
    cell_methods:   area: time: mean
    comment:        Mole fraction is used in the construction mole_fraction_o...
    description:    Mole fraction is used in the construction mole_fraction_o...
    frequency:      mon
    id:             so2
    ...             ...
    time_label:     time-mean
    time_title:     Temporal mean
    title:          SO2 Volume Mixing Ratio
    type:           real
    units:          mol mol-1
    variable_id:    so2

Compute Weighted average

# Compute weights based on the xarray you pass
weights = np.cos(np.deg2rad(dset.lat))
weights.name = "weights"
# Compute weighted mean
var_weighted = dset.sel(lev=-1000, method="nearest").weighted(weights)
weighted_mean = var_weighted.mean(("lon", "lat"))

Rolling mean

  • Choose rolling time of 100

%%time
weighted_mean.load()
CPU times: user 2min 51s, sys: 51.6 s, total: 3min 42s
Wall time: 7min 24s
<xarray.Dataset>
Dimensions:    (member_id: 1, time: 1980)
Coordinates:
    lev        float64 -992.6
  * time       (time) object 1850-01-15 12:00:00 ... 2014-12-15 12:00:00
  * member_id  (member_id) <U8 'r1i1p1f1'
Data variables:
    so2        (member_id, time) float64 5.779e-11 4.755e-11 ... 7.229e-10
%%time
dpmean = weighted_mean.chunk(chunks={'time': 100}).rolling({'time':100},
            min_periods=1,
            center=True
           ).mean()
CPU times: user 65.7 ms, sys: 2.29 ms, total: 68 ms
Wall time: 66.3 ms

Visualize

%%time
fig = plt.figure(1, figsize=[15,7])

ax = plt.subplot(1, 1, 1)

weighted_mean.so2.plot(ax=ax,
             marker='.',
             linewidth=0,
             label = 'all points',
             alpha=.1
            )

dpmean.so2.plot(ax=ax,
             marker='',
             linewidth=1,
             label = 'running mean [100 points]'
            )
ax.legend()
ax.grid()
ax.set_ylabel(dset.so2.attrs['long_name'] + ' (' + dset.so2.attrs['units'] + ')')
ax.set_xlabel('Time (year)')
ax.set_title('CMIP6 NCAR CESM2-WACCM historical (weighted rolling mean)')
plt.savefig('CMIP_NCAR_CESM2-WACCM_historical_weighted_rolling_mean.png')
CPU times: user 570 ms, sys: 113 ms, total: 683 ms
Wall time: 682 ms
../../_images/gridded_model_data_rolling_mean_18_1.png

Save Results

To improve the reproducibility of our work (first for ourselves!), we need to keep track of all our work and ease its reuse. We will:

  • Save this Jupyter notebook

  • Save weighted mean

  • Save rolling mean

  • Save figure

We save intermediate results as netCDF files because they are small and ca be easily re-loaded (even if stored on cloud storage).

Save results locally

  • Useful for further analysis but can be lost if you close your JupyterLab or if there is any problem with your JupyterLab instance

weighted_mean.to_netcdf('CMIP_NCAR_CESM2-WACCM_historical_weighted_mean.nc')
dpmean.to_netcdf('CMIP_NCAR_CESM2-WACCM_historical_rolling_mean.nc')

Save your results on NIRD (Norwegian infrastructure for Research Data)

  • your credentials are in $HOME/.aws/credentials

  • check with your instructor to get the secret access key (replace XXX by the right key)

[default]
aws_access_key_id=forces2021-work
aws_secret_access_key=XXXXXXXXXXXX
aws_endpoint_url=https://forces2021.uiogeo-apps.sigma2.no/
It is important to save yoru results in a place that can last longer than a few days/weeks!
import s3fs
fsg = s3fs.S3FileSystem(anon=False,
      client_kwargs={
         'endpoint_url': 'https://forces2021.uiogeo-apps.sigma2.no/'
      })

Set “remote” path (update annefou by your username) and save weighted mean as netCDF file

s3_path =  "s3://work/annefou/CMIP_NCAR_CESM2-WACCM_historical_weighted_mean.nc"
print(s3_path)
s3://work/annefou/CMIP_NCAR_CESM2-WACCM_historical_weighted_mean.nc
with fsg.open(s3_path, 'wb') as f:
    f.write(weighted_mean.to_netcdf(None))

Save rolling mean to remote location (update annefou with your username)

s3_path =  "s3://work/annefou/CMIP_NCAR_CESM2-WACCM_historical_rolling_mean.nc"
print(s3_path)
s3://work/annefou/CMIP_NCAR_CESM2-WACCM_historical_rolling_mean.nc
with fsg.open(s3_path, 'wb') as f:
    f.write(dpmean.to_netcdf(None))

Upload existing png file to remote s3 location

s3_path =  "s3://work/annefou/CMIP_NCAR_CESM2-WACCM_historical_weighted_rolling_mean.png"
print(s3_path)
s3://work/annefou/CMIP_NCAR_CESM2-WACCM_historical_weighted_rolling_mean.png
fsg.put('CMIP_NCAR_CESM2-WACCM_historical_weighted_rolling_mean.png', s3_path)