Load CMIP6 Data with Intake ESM

Intake ESM is an experimental new package that aims to provide a higher-level interface for searching and loading Earth System Model data archives, such as CMIP6. The package is under very active development, and features may be unstable. Please report any issues or suggestions on GitHub.

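import dask
import intake
import xarray as xr

# Render xarray objects as interactive HTML in the notebook
xr.set_options(display_style='html')
%matplotlib inline
# Let dask split chunks that would otherwise become very large when slicing
dask.config.set({'array.slicing.split_large_chunks': True})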

Intake ESM works by parsing an ESM Collection Spec and converting it to an intake catalog. The collection spec is stored in a .json file; here we open it with intake.open_esm_datastore.
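To make the idea of a collection spec more concrete, here is a minimal sketch that parses a trimmed-down spec with Python's standard json module. The field names (catalog_file, assets, aggregation_control, groupby_attrs, aggregations) follow the ESM Collection Spec as we understand it, but the spec contents below are an illustrative assumption, not a copy of pangeo-cmip6.json, and describe_spec is a hypothetical helper.

```python
import json

# Illustrative, trimmed-down collection spec (assumed contents, NOT the real
# pangeo-cmip6.json). It points to a CSV table of assets and describes how
# individual assets are aggregated into datasets.
SPEC_TEXT = """
{
  "esmcat_version": "0.1.0",
  "id": "example-cmip6",
  "catalog_file": "example-cmip6.csv",
  "assets": {"column_name": "zstore", "format": "zarr"},
  "aggregation_control": {
    "variable_column_name": "variable_id",
    "groupby_attrs": ["activity_id", "institution_id", "source_id",
                      "experiment_id", "table_id", "grid_label"],
    "aggregations": [
      {"type": "union", "attribute_name": "variable_id"},
      {"type": "join_new", "attribute_name": "member_id"}
    ]
  }
}
"""


def describe_spec(text):
    """Hypothetical helper: pull the key pieces out of a collection spec."""
    spec = json.loads(text)
    agg = spec["aggregation_control"]
    return {
        "catalog_file": spec["catalog_file"],
        "asset_column": spec["assets"]["column_name"],
        "asset_format": spec["assets"]["format"],
        "groupby_attrs": agg["groupby_attrs"],
        # Attributes that become a brand-new dimension when assets are combined
        "new_dims": [a["attribute_name"] for a in agg["aggregations"]
                     if a["type"] == "join_new"],
    }


info = describe_spec(SPEC_TEXT)
print(info["asset_column"], info["asset_format"])  # zstore zarr
print(".".join(info["groupby_attrs"]))
```

The joined groupby_attrs string has the same layout as the dataset keys that to_dataset_dict reports later in this notebook.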

cat_url = "https://storage.googleapis.com/cmip6/pangeo-cmip6.json"
col = intake.open_esm_datastore(cat_url)
col

pangeo-cmip6 catalog with 5630 dataset(s) from 347099 asset(s):

                unique
activity_id         17
institution_id      35
source_id           79
experiment_id      146
member_id          495
table_id            36
variable_id        659
grid_label          10
zstore          347099
dcpp_init_year      60
version            547
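The summary above counts the distinct values in each catalog column. A rough stdlib sketch of the same bookkeeping follows, run on five rows copied from the search result shown below (zstore, dcpp_init_year and version dropped for brevity). It assumes that one "dataset" corresponds to one unique combination of the six grouping columns that appear in the dataset keys later on; summarize is a hypothetical helper, not part of Intake ESM.

```python
import csv
import io

# Five rows copied from the search result below (some columns dropped);
# a tiny stand-in for the full catalog CSV.
CATALOG_CSV = """activity_id,institution_id,source_id,experiment_id,member_id,table_id,variable_id,grid_label
CMIP,CCCma,CanESM5-CanOE,historical,r1i1p2f1,Oyr,o2,gn
CMIP,CCCma,CanESM5-CanOE,historical,r2i1p2f1,Oyr,o2,gn
CMIP,CCCma,CanESM5,historical,r10i1p1f1,Oyr,o2,gn
ScenarioMIP,MRI,MRI-ESM2-0,ssp585,r1i2p1f1,Oyr,o2,gn
ScenarioMIP,NCC,NorESM2-LM,ssp585,r1i1p1f1,Oyr,o2,gn
"""

# Assumed grouping columns (same order as the dataset keys)
GROUPBY_ATTRS = ["activity_id", "institution_id", "source_id",
                 "experiment_id", "table_id", "grid_label"]


def summarize(rows, groupby_attrs):
    """Return (asset count, dataset count, unique values per column)."""
    columns = rows[0].keys()
    unique = {col: len({row[col] for row in rows}) for col in columns}
    n_datasets = len({tuple(row[a] for a in groupby_attrs) for row in rows})
    return len(rows), n_datasets, unique


rows = list(csv.DictReader(io.StringIO(CATALOG_CSV)))
n_assets, n_datasets, unique = summarize(rows, GROUPBY_ATTRS)
print(f"{n_datasets} dataset(s) from {n_assets} asset(s)")  # 4 dataset(s) from 5 asset(s)
for col, count in unique.items():
    print(f"{col:15s} {count}")
```

Every row is one asset (one Zarr store in the real catalog), while several assets that differ only in member_id collapse into a single dataset.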

We can now use the catalog's search method to query the collection and, if desired, inspect the matching entries as a pandas DataFrame through the .df attribute.
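Conceptually, a query keeps the rows whose columns match every search term, where a list means "any of these values". The function below is a simplified stdlib sketch of that behaviour on a hypothetical mini-catalog; it is not Intake ESM's implementation, which works on a pandas DataFrame and may support further matching options that we do not model here.

```python
def search(rows, **query):
    """Return the rows whose columns match every query term.

    A scalar must match exactly; a list or tuple matches any of its elements.
    (A simplified model of catalog search, not the library's code.)
    """
    def matches(row):
        for column, wanted in query.items():
            allowed = wanted if isinstance(wanted, (list, tuple)) else [wanted]
            if row.get(column) not in allowed:
                return False
        return True

    return [row for row in rows if matches(row)]


def entry(experiment, table, variable, grid, source):
    """Build one hypothetical catalog row."""
    return {"experiment_id": experiment, "table_id": table,
            "variable_id": variable, "grid_label": grid, "source_id": source}


# Hypothetical mini-catalog; only the first two rows satisfy the query below.
rows = [
    entry("historical", "Oyr", "o2", "gn", "CanESM5"),
    entry("ssp585", "Oyr", "o2", "gn", "NorESM2-LM"),
    entry("ssp585", "Oyr", "o2", "gr", "HypotheticalModel"),
    entry("historical", "Omon", "thetao", "gn", "CanESM5"),
    entry("piControl", "Oyr", "o2", "gn", "CanESM5"),
]

hits = search(rows, experiment_id=["historical", "ssp585"], table_id="Oyr",
              variable_id="o2", grid_label="gn")
print([row["source_id"] for row in hits])  # ['CanESM5', 'NorESM2-LM']
```

This mirrors the shape of the query in the next cell: experiment_id accepts a list, while the other terms are single values that must all match.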

cat = col.search(experiment_id=['historical', 'ssp585'], table_id='Oyr', variable_id='o2',
                 grid_label='gn')
cat.df
     activity_id  institution_id  source_id      experiment_id  member_id  table_id  variable_id  grid_label  zstore                                             dcpp_init_year  version
0    CMIP         CCCma           CanESM5-CanOE  historical     r1i1p2f1   Oyr       o2           gn          gs://cmip6/CMIP/CCCma/CanESM5-CanOE/historical...  NaN             20190429
1    CMIP         CCCma           CanESM5-CanOE  historical     r2i1p2f1   Oyr       o2           gn          gs://cmip6/CMIP/CCCma/CanESM5-CanOE/historical...  NaN             20190429
2    CMIP         CCCma           CanESM5-CanOE  historical     r3i1p2f1   Oyr       o2           gn          gs://cmip6/CMIP/CCCma/CanESM5-CanOE/historical...  NaN             20190429
3    CMIP         CCCma           CanESM5        historical     r10i1p1f1  Oyr       o2           gn          gs://cmip6/CMIP/CCCma/CanESM5/historical/r10i1...  NaN             20190429
4    CMIP         CCCma           CanESM5        historical     r10i1p2f1  Oyr       o2           gn          gs://cmip6/CMIP/CCCma/CanESM5/historical/r10i1...  NaN             20190429
...  ...          ...             ...            ...            ...        ...       ...          ...         ...                                                ...             ...
138  ScenarioMIP  MPI-M           MPI-ESM1-2-LR  ssp585         r10i1p1f1  Oyr       o2           gn          gs://cmip6/ScenarioMIP/MPI-M/MPI-ESM1-2-LR/ssp...  NaN             20190710
139  ScenarioMIP  MPI-M           MPI-ESM1-2-LR  ssp585         r1i1p1f1   Oyr       o2           gn          gs://cmip6/ScenarioMIP/MPI-M/MPI-ESM1-2-LR/ssp...  NaN             20190710
140  ScenarioMIP  MRI             MRI-ESM2-0     ssp585         r1i2p1f1   Oyr       o2           gn          gs://cmip6/ScenarioMIP/MRI/MRI-ESM2-0/ssp585/r...  NaN             20200303
141  ScenarioMIP  NCC             NorESM2-LM     ssp585         r1i1p1f1   Oyr       o2           gn          gs://cmip6/ScenarioMIP/NCC/NorESM2-LM/ssp585/r...  NaN             20191108
142  ScenarioMIP  NCC             NorESM2-MM     ssp585         r1i1p1f1   Oyr       o2           gn          gs://cmip6/ScenarioMIP/NCC/NorESM2-MM/ssp585/r...  NaN             20191108

143 rows × 11 columns

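Intake can open the datasets automatically with xarray. In addition, Intake ESM contains logic to concatenate and merge the individual assets returned by our query into larger, aggregated xarray datasets: here the 143 matching assets are combined into 22 datasets, one per combination of the key attributes listed below, with ensemble members stacked along a new member_id dimension.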
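One way to picture the aggregation step: assets are grouped by the columns that make up the dataset key, and within each group the member_id values become the new ensemble dimension. The sketch below does only that grouping bookkeeping with the standard library. It assumes the six grouping columns that to_dataset_dict reports, does not open or concatenate any data, and group_assets / make_row are hypothetical helpers.

```python
from collections import defaultdict

# Assumed grouping columns, in the order used for the dataset keys
GROUPBY_ATTRS = ("activity_id", "institution_id", "source_id",
                 "experiment_id", "table_id", "grid_label")


def group_assets(rows, groupby_attrs=GROUPBY_ATTRS, new_dim="member_id"):
    """Map each dot-joined dataset key to the sorted values of new_dim."""
    groups = defaultdict(set)
    for row in rows:
        key = ".".join(row[attr] for attr in groupby_attrs)
        groups[key].add(row[new_dim])
    return {key: sorted(members) for key, members in groups.items()}


def make_row(activity, institution, source, experiment, member):
    """Build one catalog row (table_id and grid_label fixed as in the query)."""
    return {"activity_id": activity, "institution_id": institution,
            "source_id": source, "experiment_id": experiment,
            "member_id": member, "table_id": "Oyr", "grid_label": "gn"}


# A few rows taken from the search result above (other columns omitted).
rows = [
    make_row("CMIP", "CCCma", "CanESM5-CanOE", "historical", "r1i1p2f1"),
    make_row("CMIP", "CCCma", "CanESM5-CanOE", "historical", "r2i1p2f1"),
    make_row("CMIP", "CCCma", "CanESM5-CanOE", "historical", "r3i1p2f1"),
    make_row("ScenarioMIP", "NCC", "NorESM2-LM", "ssp585", "r1i1p1f1"),
]

for key, members in group_assets(rows).items():
    print(key, members)
```

Note that the member labels are sorted as strings, so 'r10i1p1f1' comes before 'r2i1p1f1'; the member_id coordinate of the CanESM5 dataset further down shows the same ordering.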

dset_dict = cat.to_dataset_dict(zarr_kwargs={'consolidated': True})
list(dset_dict.keys())
--> The keys in the returned dictionary of datasets are constructed as follows:
	'activity_id.institution_id.source_id.experiment_id.table_id.grid_label'
['ScenarioMIP.DWD.MPI-ESM1-2-HR.ssp585.Oyr.gn',
 'CMIP.NCC.NorESM2-MM.historical.Oyr.gn',
 'ScenarioMIP.DKRZ.MPI-ESM1-2-HR.ssp585.Oyr.gn',
 'ScenarioMIP.CSIRO.ACCESS-ESM1-5.ssp585.Oyr.gn',
 'ScenarioMIP.MIROC.MIROC-ES2L.ssp585.Oyr.gn',
 'ScenarioMIP.MRI.MRI-ESM2-0.ssp585.Oyr.gn',
 'CMIP.IPSL.IPSL-CM6A-LR.historical.Oyr.gn',
 'ScenarioMIP.CCCma.CanESM5-CanOE.ssp585.Oyr.gn',
 'ScenarioMIP.IPSL.IPSL-CM6A-LR.ssp585.Oyr.gn',
 'ScenarioMIP.CCCma.CanESM5.ssp585.Oyr.gn',
 'CMIP.MRI.MRI-ESM2-0.historical.Oyr.gn',
 'CMIP.NCC.NorESM2-LM.historical.Oyr.gn',
 'CMIP.MPI-M.MPI-ESM1-2-HR.historical.Oyr.gn',
 'ScenarioMIP.MPI-M.MPI-ESM1-2-LR.ssp585.Oyr.gn',
 'CMIP.MPI-M.MPI-ESM1-2-LR.historical.Oyr.gn',
 'ScenarioMIP.NCC.NorESM2-LM.ssp585.Oyr.gn',
 'CMIP.MIROC.MIROC-ES2L.historical.Oyr.gn',
 'CMIP.CSIRO.ACCESS-ESM1-5.historical.Oyr.gn',
 'ScenarioMIP.NCC.NorESM2-MM.ssp585.Oyr.gn',
 'CMIP.CCCma.CanESM5.historical.Oyr.gn',
 'CMIP.HAMMOZ-Consortium.MPI-ESM-1-2-HAM.historical.Oyr.gn',
 'CMIP.CCCma.CanESM5-CanOE.historical.Oyr.gn']
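Because every key has the same dot-separated layout, the dictionary is easy to slice in plain Python. A common next step is to pair each model's historical run with its ssp585 continuation. The sketch below assumes that no facet value contains a dot (true for every key listed above); parse_key and pair_experiments are hypothetical helpers, not part of Intake ESM.

```python
from collections import defaultdict

# Facet order of the dataset keys, as printed by to_dataset_dict above
KEY_FIELDS = ("activity_id", "institution_id", "source_id",
              "experiment_id", "table_id", "grid_label")


def parse_key(key):
    """Split a dataset key into a dict of its facets."""
    parts = key.split(".")
    if len(parts) != len(KEY_FIELDS):
        raise ValueError(f"unexpected key layout: {key!r}")
    return dict(zip(KEY_FIELDS, parts))


def pair_experiments(keys, first="historical", second="ssp585"):
    """Return {source_id: (first_keys, second_keys)} for models that have
    at least one key for each of the two experiments."""
    by_model = defaultdict(lambda: defaultdict(list))
    for key in keys:
        facets = parse_key(key)
        by_model[facets["source_id"]][facets["experiment_id"]].append(key)
    pairs = {}
    for source, runs in sorted(by_model.items()):
        if first in runs and second in runs:
            pairs[source] = (sorted(runs[first]), sorted(runs[second]))
    return pairs


# A subset of the keys listed above.
keys = [
    "CMIP.MPI-M.MPI-ESM1-2-HR.historical.Oyr.gn",
    "ScenarioMIP.DWD.MPI-ESM1-2-HR.ssp585.Oyr.gn",
    "ScenarioMIP.DKRZ.MPI-ESM1-2-HR.ssp585.Oyr.gn",
    "CMIP.NCC.NorESM2-MM.historical.Oyr.gn",
    "ScenarioMIP.NCC.NorESM2-MM.ssp585.Oyr.gn",
    "CMIP.HAMMOZ-Consortium.MPI-ESM-1-2-HAM.historical.Oyr.gn",
]
for source, (hist_keys, ssp_keys) in pair_experiments(keys).items():
    print(source, hist_keys, ssp_keys)
```

The sketch matches on source_id alone because, as the list above shows, the same model can appear under different institutions: MPI-ESM1-2-HR's historical run is listed under MPI-M, while its ssp585 runs are listed under DWD and DKRZ. With the real dictionary you would call pair_experiments(dset_dict.keys()).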
ds = dset_dict['CMIP.CCCma.CanESM5.historical.Oyr.gn']
ds
<xarray.Dataset>
Dimensions:    (i: 360, j: 291, lev: 45, member_id: 35, time: 165)
Coordinates:
  * i          (i) int32 0 1 2 3 4 5 6 7 8 ... 352 353 354 355 356 357 358 359
  * j          (j) int32 0 1 2 3 4 5 6 7 8 ... 283 284 285 286 287 288 289 290
    latitude   (j, i) float64 dask.array<chunksize=(291, 360), meta=np.ndarray>
  * lev        (lev) float64 3.047 9.454 16.36 ... 5.126e+03 5.375e+03 5.625e+03
    longitude  (j, i) float64 dask.array<chunksize=(291, 360), meta=np.ndarray>
  * time       (time) object 1850-07-02 12:00:00 ... 2014-07-02 12:00:00
  * member_id  (member_id) <U9 'r10i1p1f1' 'r10i1p2f1' ... 'r9i1p1f1' 'r9i1p2f1'
Data variables:
    o2         (member_id, time, lev, j, i) float32 dask.array<chunksize=(1, 12, 45, 291, 360), meta=np.ndarray>
Attributes:
    experiment:                  all-forcing simulation of the recent past
    parent_time_units:           days since 1850-01-01 0:0:0.0
    mip_era:                     CMIP6
    realization_index:           9
    sub_experiment_id:           none
    creation_date:               2019-05-30T08:58:45Z
    license:                     CMIP6 model data produced by The Government ...
    nominal_resolution:          100 km
    experiment_id:               historical
    forcing_index:               1
    status:                      2019-10-25;created;by nhn2@columbia.edu
    parent_activity_id:          CMIP
    intake_esm_varname:          ['o2']
    sub_experiment:              none
    cmor_version:                3.4.0
    title:                       CanESM5 output prepared for CMIP6
    parent_experiment_id:        piControl
    institution_id:              CCCma
    data_specs_version:          01.00.29
    Conventions:                 CF-1.7 CMIP-6.2
    references:                  Geophysical Model Development Special issue ...
    YMDH_branch_time_in_parent:  5950:01:01:00
    further_info_url:            https://furtherinfo.es-doc.org/CMIP6.CCCma.C...
    history:                     2019-05-02T13:53:53Z ;rewrote data to be con...
    realm:                       ocnBgchem
    table_id:                    Oyr
    source_id:                   CanESM5
    CCCma_model_hash:            Unknown
    initialization_index:        1
    grid:                        ORCA1 tripolar grid, 1 deg with refinement t...
    source_type:                 AOGCM
    CCCma_parent_runid:          p2-pictrl
    version:                     v20190429
    branch_time_in_child:        0.0
    tracking_id:                 hdl:21.14100/41426118-701c-482b-ae16-82932e4...
    YMDH_branch_time_in_child:   1850:01:01:00
    CCCma_runid:                 p2-his09
    parent_mip_era:              CMIP6
    table_info:                  Creation Date:(20 February 2019) MD5:374fbe5...
    source:                      CanESM5 (2019): \naerosol: interactive\natmo...
    parent_source_id:            CanESM5
    external_variables:          areacello volcello
    contact:                     ec.cccma.info-info.ccmac.ec@canada.ca
    variant_label:               r9i1p2f1
    activity_id:                 CMIP
    branch_time_in_parent:       1496500.0
    variable_id:                 o2
    frequency:                   yr
    institution:                 Canadian Centre for Climate Modelling and An...
    branch_method:               Spin-up documentation
    grid_label:                  gn
    product:                     model-output
    intake_esm_dataset_key:      CMIP.CCCma.CanESM5.historical.Oyr.gn
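The member_id coordinate holds CMIP6 variant labels of the form r<k>i<l>p<m>f<n> (realization, initialization, physics and forcing indices). The attributes above show the correspondence for one member: variant_label r9i1p2f1 alongside realization_index 9, initialization_index 1 and forcing_index 1. Since those global attributes describe a single member while the dataset holds 35, per-member information is better read from the labels themselves, for instance to separate the p1 and p2 physics variants that are mixed in this ensemble. A small stdlib sketch follows; parse_variant_label is a hypothetical helper, and the member list is a subset of the values shown above.

```python
import re

# r<realization>i<initialization>p<physics>f<forcing>, e.g. "r9i1p2f1"
VARIANT_RE = re.compile(r"^r(\d+)i(\d+)p(\d+)f(\d+)$")


def parse_variant_label(label):
    """Split a CMIP6 variant label into its four integer indices."""
    match = VARIANT_RE.match(label)
    if match is None:
        raise ValueError(f"not a CMIP6 variant label: {label!r}")
    realization, initialization, physics, forcing = map(int, match.groups())
    return {"realization_index": realization,
            "initialization_index": initialization,
            "physics_index": physics,
            "forcing_index": forcing}


members = ["r10i1p1f1", "r10i1p2f1", "r9i1p1f1", "r9i1p2f1"]
p2_members = [m for m in members if parse_variant_label(m)["physics_index"] == 2]
print(p2_members)  # ['r10i1p2f1', 'r9i1p2f1']
```

On the real dataset, members could come from ds.member_id.values.tolist(), and the filtered list could then be passed to ds.sel(member_id=p2_members).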