Load CMIP6 Data with Intake ESM

Intake ESM is an experimental new package that aims to provide a higher-level interface for searching and loading Earth System Model data archives, such as CMIP6. The package is under very active development, and features may be unstable. Please report any issues or suggestions on GitHub.

import dask
import xarray as xr
import intake
%matplotlib inline
# When slicing creates very large chunks, split them instead of only warning
dask.config.set({'array.slicing.split_large_chunks': True})

Intake ESM works by parsing an ESM Collection Spec and converting it into an intake catalog. The collection spec is stored in a JSON file, which we open here with intake.

cat_url = "https://storage.googleapis.com/cmip6/pangeo-cmip6.json"
col = intake.open_esm_datastore(cat_url)
col  # display a summary of the catalog
pangeo-cmip6 catalog with 5630 dataset(s) from 347099 asset(s):

activity_id 17
institution_id 35
source_id 79
experiment_id 146
member_id 495
table_id 36
variable_id 659
grid_label 10
zstore 347099
dcpp_init_year 60
version 547
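In the ESM Collection Spec format, the JSON file describes the catalog columns and names a separate catalog table with one row per asset (here, one Zarr store per row in the `zstore` column). The summary above counts the unique values in each column. The sketch below mimics that with a tiny, hypothetical spec and table built in memory; the spec fields shown (`id`, `catalog_file`, `attributes`, `assets`) are a simplified assumption about the format, and the store paths are made up.

```python
import io
import json

import pandas as pd

# Hypothetical, heavily simplified collection spec; the real
# pangeo-cmip6.json contains more fields than shown here.
spec = json.loads("""
{
  "id": "toy-cmip6",
  "catalog_file": "toy-cmip6.csv",
  "attributes": [{"column_name": "source_id"}, {"column_name": "experiment_id"},
                 {"column_name": "member_id"}, {"column_name": "variable_id"}],
  "assets": {"column_name": "zstore", "format": "zarr"}
}
""")

# Stand-in for the CSV file named by spec["catalog_file"]: one row per asset.
# The store paths are invented for illustration.
csv_text = """source_id,experiment_id,member_id,variable_id,zstore
CanESM5,historical,r1i1p1f1,o2,gs://example-bucket/a
CanESM5,historical,r2i1p1f1,o2,gs://example-bucket/b
CanESM5,ssp585,r1i1p1f1,o2,gs://example-bucket/c
MRI-ESM2-0,ssp585,r1i2p1f1,o2,gs://example-bucket/d
"""
df = pd.read_csv(io.StringIO(csv_text))

# Number of unique values per column, analogous to the catalog summary
columns = [a["column_name"] for a in spec["attributes"]] + [spec["assets"]["column_name"]]
summary = df[columns].nunique()
print(f"{spec['id']} catalog with {len(df)} asset(s):")
print(summary)
```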

We can now use intake-esm methods to search the collection and, if desired, view the results as a pandas DataFrame.

cat = col.search(experiment_id=['historical', 'ssp585'], table_id='Oyr', variable_id='o2')  # annual ocean O2
cat.df  # matching catalog entries as a pandas DataFrame
activity_id institution_id source_id experiment_id member_id table_id variable_id grid_label zstore dcpp_init_year version
0 CMIP CCCma CanESM5-CanOE historical r1i1p2f1 Oyr o2 gn gs://cmip6/CMIP/CCCma/CanESM5-CanOE/historical... NaN 20190429
1 CMIP CCCma CanESM5-CanOE historical r2i1p2f1 Oyr o2 gn gs://cmip6/CMIP/CCCma/CanESM5-CanOE/historical... NaN 20190429
2 CMIP CCCma CanESM5-CanOE historical r3i1p2f1 Oyr o2 gn gs://cmip6/CMIP/CCCma/CanESM5-CanOE/historical... NaN 20190429
3 CMIP CCCma CanESM5 historical r10i1p1f1 Oyr o2 gn gs://cmip6/CMIP/CCCma/CanESM5/historical/r10i1... NaN 20190429
4 CMIP CCCma CanESM5 historical r10i1p2f1 Oyr o2 gn gs://cmip6/CMIP/CCCma/CanESM5/historical/r10i1... NaN 20190429
... ... ... ... ... ... ... ... ... ... ... ...
138 ScenarioMIP MPI-M MPI-ESM1-2-LR ssp585 r10i1p1f1 Oyr o2 gn gs://cmip6/ScenarioMIP/MPI-M/MPI-ESM1-2-LR/ssp... NaN 20190710
139 ScenarioMIP MPI-M MPI-ESM1-2-LR ssp585 r1i1p1f1 Oyr o2 gn gs://cmip6/ScenarioMIP/MPI-M/MPI-ESM1-2-LR/ssp... NaN 20190710
140 ScenarioMIP MRI MRI-ESM2-0 ssp585 r1i2p1f1 Oyr o2 gn gs://cmip6/ScenarioMIP/MRI/MRI-ESM2-0/ssp585/r... NaN 20200303
141 ScenarioMIP NCC NorESM2-LM ssp585 r1i1p1f1 Oyr o2 gn gs://cmip6/ScenarioMIP/NCC/NorESM2-LM/ssp585/r... NaN 20191108
142 ScenarioMIP NCC NorESM2-MM ssp585 r1i1p1f1 Oyr o2 gn gs://cmip6/ScenarioMIP/NCC/NorESM2-MM/ssp585/r... NaN 20191108

143 rows × 11 columns
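For intuition, a search with exact-match criteria behaves much like boolean filtering of the catalog DataFrame, where a list value matches any of its entries. The sketch below reproduces the query above with plain pandas on a small, made-up table. The helper `toy_search` is hypothetical and only an approximation: it ignores any additional matching options that intake-esm's own `search` may offer.

```python
import pandas as pd

# Hypothetical miniature catalog table (rows and store paths are made up)
df = pd.DataFrame({
    "experiment_id": ["historical", "ssp585", "piControl", "historical"],
    "table_id": ["Oyr", "Oyr", "Oyr", "Omon"],
    "variable_id": ["o2", "o2", "o2", "o2"],
    "zstore": ["gs://example-bucket/a", "gs://example-bucket/b",
               "gs://example-bucket/c", "gs://example-bucket/d"],
})


def toy_search(frame: pd.DataFrame, **query) -> pd.DataFrame:
    """Keep rows whose column values match every query term.

    A scalar term requires equality; a list term matches any of its values.
    This is a simplified stand-in, not intake-esm's implementation.
    """
    mask = pd.Series(True, index=frame.index)
    for column, value in query.items():
        values = value if isinstance(value, (list, tuple, set)) else [value]
        mask &= frame[column].isin(values)
    return frame[mask].reset_index(drop=True)


result = toy_search(df, experiment_id=["historical", "ssp585"],
                    table_id="Oyr", variable_id="o2")
print(result)
```

Only the first two rows survive: the third has the wrong experiment and the fourth the wrong table.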

Intake ESM knows how to open the datasets automatically with xarray. It also contains logic to concatenate and merge the individual results of our query into larger, aggregated xarray datasets, which are returned in a dictionary.

dset_dict = cat.to_dataset_dict(zarr_kwargs={'consolidated': True})
--> The keys in the returned dictionary of datasets are constructed as follows:
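The aggregation can be pictured as grouping the search results by a set of key columns, joining each group's values into a dot-separated key, and concatenating the group's ensemble members along a new `member_id` dimension. The sketch below does this by hand with xarray on small synthetic datasets. The key columns in `KEY_COLUMNS` are an assumption inferred from the key used below (`CMIP.CCCma.CanESM5.historical.Oyr.gn`), `fake_member_dataset` is a hypothetical stand-in for opening a Zarr store, and the real intake-esm aggregation (driven by the collection spec) does more, such as merging multiple variables.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Assumed key columns, inferred from keys like 'CMIP.CCCma.CanESM5.historical.Oyr.gn'
KEY_COLUMNS = ["activity_id", "institution_id", "source_id",
               "experiment_id", "table_id", "grid_label"]

# Hypothetical search results: three members of one model, one of another
results = pd.DataFrame({
    "activity_id": ["CMIP", "CMIP", "CMIP", "ScenarioMIP"],
    "institution_id": ["CCCma", "CCCma", "CCCma", "MRI"],
    "source_id": ["CanESM5", "CanESM5", "CanESM5", "MRI-ESM2-0"],
    "experiment_id": ["historical", "historical", "historical", "ssp585"],
    "table_id": ["Oyr"] * 4,
    "grid_label": ["gn"] * 4,
    "member_id": ["r1i1p1f1", "r2i1p1f1", "r3i1p1f1", "r1i2p1f1"],
})


def fake_member_dataset(seed: int) -> xr.Dataset:
    """Hypothetical stand-in for opening one Zarr store: a tiny synthetic o2 field."""
    rng = np.random.default_rng(seed)
    return xr.Dataset({"o2": (("time", "lev"), rng.random((3, 2)))})


dset_dict = {}
for key_values, group in results.groupby(KEY_COLUMNS):
    key = ".".join(key_values)
    members = [fake_member_dataset(i) for i in range(len(group))]
    # Stack the members along a new dimension labelled by their member_id values
    dset_dict[key] = xr.concat(
        members, dim=pd.Index(group["member_id"].tolist(), name="member_id"))

for key, ds in dset_dict.items():
    print(key, dict(ds.sizes))
```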
ds = dset_dict['CMIP.CCCma.CanESM5.historical.Oyr.gn']
ds
<xarray.Dataset>
Dimensions:    (i: 360, j: 291, lev: 45, member_id: 35, time: 165)
Coordinates:
  * i          (i) int32 0 1 2 3 4 5 6 7 8 ... 352 353 354 355 356 357 358 359
  * j          (j) int32 0 1 2 3 4 5 6 7 8 ... 283 284 285 286 287 288 289 290
    latitude   (j, i) float64 dask.array<chunksize=(291, 360), meta=np.ndarray>
  * lev        (lev) float64 3.047 9.454 16.36 ... 5.126e+03 5.375e+03 5.625e+03
    longitude  (j, i) float64 dask.array<chunksize=(291, 360), meta=np.ndarray>
  * time       (time) object 1850-07-02 12:00:00 ... 2014-07-02 12:00:00
  * member_id  (member_id) <U9 'r10i1p1f1' 'r10i1p2f1' ... 'r9i1p1f1' 'r9i1p2f1'
Data variables:
    o2         (member_id, time, lev, j, i) float32 dask.array<chunksize=(1, 12, 45, 291, 360), meta=np.ndarray>
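The repr above gives enough information to estimate memory use: `o2` is float32 (4 bytes per value) with chunks of shape (1, 12, 45, 291, 360), i.e. one member and 12 time steps of the full 3-D field per chunk. The short calculation below uses only numbers copied from that repr. It works out to roughly 216 MiB per chunk and about 101 GiB for the whole array, which is why it helps that the data stay lazy as dask arrays until something is computed. For comparison, dask's default target chunk size (the `array.chunk-size` setting) is 128 MiB.

```python
import numpy as np

# Shapes and dtype copied from the Dataset repr above
chunk_shape = (1, 12, 45, 291, 360)       # (member_id, time, lev, j, i)
array_shape = (35, 165, 45, 291, 360)
itemsize = np.dtype("float32").itemsize   # 4 bytes per value

chunk_bytes = int(np.prod(chunk_shape)) * itemsize
array_bytes = int(np.prod(array_shape)) * itemsize

print(f"one chunk:  {chunk_bytes / 2**20:.1f} MiB")   # about 215.8 MiB
print(f"full array: {array_bytes / 2**30:.1f} GiB")   # about 101.4 GiB
```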
