{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Load CMIP6 Data with Intake ESM\n", "\n", "[Intake ESM](https://intake-esm.readthedocs.io/en/latest/) is an experimental new package that aims to provide a higher-level interface for searching and loading Earth System Model data archives, such as CMIP6. The package is under very active development, and features may be unstable. Please report any issues or suggestions [on GitHub](https://github.com/NCAR/intake-esm/issues)." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import xarray as xr\n", "xr.set_options(display_style='html')\n", "import intake\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Intake ESM works by parsing an [ESM Collection Spec](https://github.com/NCAR/esm-collection-spec/) and converting it to an [intake catalog](https://intake.readthedocs.io/en/latest). The collection spec is stored in a .json file. Here we open it using intake." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "pangeo-cmip6-ESM Collection with 235624 entries:\n", "\t> 15 activity_id(s)\n", "\n", "\t> 32 institution_id(s)\n", "\n", "\t> 69 source_id(s)\n", "\n", "\t> 101 experiment_id(s)\n", "\n", "\t> 135 member_id(s)\n", "\n", "\t> 29 table_id(s)\n", "\n", "\t> 313 variable_id(s)\n", "\n", "\t> 10 grid_label(s)\n", "\n", "\t> 235624 zstore(s)\n", "\n", "\t> 60 dcpp_init_year(s)" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cat_url = \"https://storage.googleapis.com/cmip6/pangeo-cmip6.json\"\n", "col = intake.open_esm_datastore(cat_url)\n", "col" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now use intake methods to search the collection and, if desired, export the results as a pandas DataFrame." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
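<table><thead><tr><th></th><th>activity_id</th><th>institution_id</th><th>source_id</th><th>experiment_id</th><th>member_id</th><th>table_id</th><th>variable_id</th><th>grid_label</th><th>zstore</th><th>dcpp_init_year</th></tr></thead><tbody>
<tr><th>0</th><td>CMIP</td><td>CCCma</td><td>CanESM5-CanOE</td><td>historical</td><td>r1i1p2f1</td><td>Oyr</td><td>o2</td><td>gn</td><td>gs://cmip6/CMIP/CCCma/CanESM5-CanOE/historical...</td><td>NaN</td></tr>
<tr><th>1</th><td>CMIP</td><td>CCCma</td><td>CanESM5-CanOE</td><td>historical</td><td>r2i1p2f1</td><td>Oyr</td><td>o2</td><td>gn</td><td>gs://cmip6/CMIP/CCCma/CanESM5-CanOE/historical...</td><td>NaN</td></tr>
<tr><th>2</th><td>CMIP</td><td>CCCma</td><td>CanESM5-CanOE</td><td>historical</td><td>r3i1p2f1</td><td>Oyr</td><td>o2</td><td>gn</td><td>gs://cmip6/CMIP/CCCma/CanESM5-CanOE/historical...</td><td>NaN</td></tr>
<tr><th>3</th><td>CMIP</td><td>CCCma</td><td>CanESM5</td><td>historical</td><td>r10i1p1f1</td><td>Oyr</td><td>o2</td><td>gn</td><td>gs://cmip6/CMIP/CCCma/CanESM5/historical/r10i1...</td><td>NaN</td></tr>
<tr><th>4</th><td>CMIP</td><td>CCCma</td><td>CanESM5</td><td>historical</td><td>r10i1p2f1</td><td>Oyr</td><td>o2</td><td>gn</td><td>gs://cmip6/CMIP/CCCma/CanESM5/historical/r10i1...</td><td>NaN</td></tr>
<tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>
<tr><th>132</th><td>ScenarioMIP</td><td>IPSL</td><td>IPSL-CM6A-LR</td><td>ssp585</td><td>r4i1p1f1</td><td>Oyr</td><td>o2</td><td>gn</td><td>gs://cmip6/ScenarioMIP/IPSL/IPSL-CM6A-LR/ssp58...</td><td>NaN</td></tr>
<tr><th>133</th><td>ScenarioMIP</td><td>IPSL</td><td>IPSL-CM6A-LR</td><td>ssp585</td><td>r6i1p1f1</td><td>Oyr</td><td>o2</td><td>gn</td><td>gs://cmip6/ScenarioMIP/IPSL/IPSL-CM6A-LR/ssp58...</td><td>NaN</td></tr>
<tr><th>134</th><td>ScenarioMIP</td><td>MIROC</td><td>MIROC-ES2L</td><td>ssp585</td><td>r1i1p1f2</td><td>Oyr</td><td>o2</td><td>gn</td><td>gs://cmip6/ScenarioMIP/MIROC/MIROC-ES2L/ssp585...</td><td>NaN</td></tr>
<tr><th>135</th><td>ScenarioMIP</td><td>MPI-M</td><td>MPI-ESM1-2-LR</td><td>ssp585</td><td>r10i1p1f1</td><td>Oyr</td><td>o2</td><td>gn</td><td>gs://cmip6/ScenarioMIP/MPI-M/MPI-ESM1-2-LR/ssp...</td><td>NaN</td></tr>
<tr><th>136</th><td>ScenarioMIP</td><td>MPI-M</td><td>MPI-ESM1-2-LR</td><td>ssp585</td><td>r1i1p1f1</td><td>Oyr</td><td>o2</td><td>gn</td><td>gs://cmip6/ScenarioMIP/MPI-M/MPI-ESM1-2-LR/ssp...</td><td>NaN</td></tr></tbody></table>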
\n", "

137 rows × 10 columns

\n", "
" ], "text/plain": [ " activity_id institution_id source_id experiment_id member_id \\\n", "0 CMIP CCCma CanESM5-CanOE historical r1i1p2f1 \n", "1 CMIP CCCma CanESM5-CanOE historical r2i1p2f1 \n", "2 CMIP CCCma CanESM5-CanOE historical r3i1p2f1 \n", "3 CMIP CCCma CanESM5 historical r10i1p1f1 \n", "4 CMIP CCCma CanESM5 historical r10i1p2f1 \n", ".. ... ... ... ... ... \n", "132 ScenarioMIP IPSL IPSL-CM6A-LR ssp585 r4i1p1f1 \n", "133 ScenarioMIP IPSL IPSL-CM6A-LR ssp585 r6i1p1f1 \n", "134 ScenarioMIP MIROC MIROC-ES2L ssp585 r1i1p1f2 \n", "135 ScenarioMIP MPI-M MPI-ESM1-2-LR ssp585 r10i1p1f1 \n", "136 ScenarioMIP MPI-M MPI-ESM1-2-LR ssp585 r1i1p1f1 \n", "\n", " table_id variable_id grid_label \\\n", "0 Oyr o2 gn \n", "1 Oyr o2 gn \n", "2 Oyr o2 gn \n", "3 Oyr o2 gn \n", "4 Oyr o2 gn \n", ".. ... ... ... \n", "132 Oyr o2 gn \n", "133 Oyr o2 gn \n", "134 Oyr o2 gn \n", "135 Oyr o2 gn \n", "136 Oyr o2 gn \n", "\n", " zstore dcpp_init_year \n", "0 gs://cmip6/CMIP/CCCma/CanESM5-CanOE/historical... NaN \n", "1 gs://cmip6/CMIP/CCCma/CanESM5-CanOE/historical... NaN \n", "2 gs://cmip6/CMIP/CCCma/CanESM5-CanOE/historical... NaN \n", "3 gs://cmip6/CMIP/CCCma/CanESM5/historical/r10i1... NaN \n", "4 gs://cmip6/CMIP/CCCma/CanESM5/historical/r10i1... NaN \n", ".. ... ... \n", "132 gs://cmip6/ScenarioMIP/IPSL/IPSL-CM6A-LR/ssp58... NaN \n", "133 gs://cmip6/ScenarioMIP/IPSL/IPSL-CM6A-LR/ssp58... NaN \n", "134 gs://cmip6/ScenarioMIP/MIROC/MIROC-ES2L/ssp585... NaN \n", "135 gs://cmip6/ScenarioMIP/MPI-M/MPI-ESM1-2-LR/ssp... NaN \n", "136 gs://cmip6/ScenarioMIP/MPI-M/MPI-ESM1-2-LR/ssp... NaN \n", "\n", "[137 rows x 10 columns]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cat = col.search(experiment_id=['historical', 'ssp585'], table_id='Oyr', variable_id='o2',\n", " grid_label='gn')\n", "cat.df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Intake knows how to automatically open the datasets using xarray. 
Furthermore, Intake ESM contains special logic to concatenate and merge the individual results of our query into larger, higher-level aggregated xarray datasets." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "--> The keys in the returned dictionary of datasets are constructed as follows:\n", "\t'activity_id.institution_id.source_id.experiment_id.table_id.grid_label'\n", " \n", "--> There is/are 17 group(s)\n", "[########################################] | 100% Completed | 1min 27.1s\n" ] }, { "data": { "text/plain": [ "['CMIP.CCCma.CanESM5.historical.Oyr.gn',\n", " 'CMIP.CCCma.CanESM5-CanOE.historical.Oyr.gn',\n", " 'CMIP.CSIRO.ACCESS-ESM1-5.historical.Oyr.gn',\n", " 'CMIP.HAMMOZ-Consortium.MPI-ESM-1-2-HAM.historical.Oyr.gn',\n", " 'CMIP.IPSL.IPSL-CM6A-LR.historical.Oyr.gn',\n", " 'CMIP.MIROC.MIROC-ES2L.historical.Oyr.gn',\n", " 'CMIP.MPI-M.MPI-ESM1-2-HR.historical.Oyr.gn',\n", " 'CMIP.MPI-M.MPI-ESM1-2-LR.historical.Oyr.gn',\n", " 'CMIP.NCC.NorESM2-LM.historical.Oyr.gn',\n", " 'ScenarioMIP.CCCma.CanESM5.ssp585.Oyr.gn',\n", " 'ScenarioMIP.CCCma.CanESM5-CanOE.ssp585.Oyr.gn',\n", " 'ScenarioMIP.CSIRO.ACCESS-ESM1-5.ssp585.Oyr.gn',\n", " 'ScenarioMIP.DKRZ.MPI-ESM1-2-HR.ssp585.Oyr.gn',\n", " 'ScenarioMIP.DWD.MPI-ESM1-2-HR.ssp585.Oyr.gn',\n", " 'ScenarioMIP.IPSL.IPSL-CM6A-LR.ssp585.Oyr.gn',\n", " 'ScenarioMIP.MIROC.MIROC-ES2L.ssp585.Oyr.gn',\n", " 'ScenarioMIP.MPI-M.MPI-ESM1-2-LR.ssp585.Oyr.gn']" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dset_dict = cat.to_dataset_dict(zarr_kwargs={'consolidated': True})\n", "list(dset_dict.keys())" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
xarray.Dataset
" ], "text/plain": [ "\n", "Dimensions: (bnds: 2, i: 360, j: 291, lev: 45, member_id: 35, time: 165, vertices: 4)\n", "Coordinates:\n", " longitude (j, i) float64 dask.array\n", " lev_bnds (lev, bnds) float64 dask.array\n", " latitude (j, i) float64 dask.array\n", " time_bnds (time, bnds) object dask.array\n", " * j (j) int32 0 1 2 3 4 5 6 ... 284 285 286 287 288 289 290\n", " * time (time) object 1850-07-02 12:00:00 ... 2014-07-02 12:00:00\n", " * i (i) int32 0 1 2 3 4 5 6 ... 353 354 355 356 357 358 359\n", " * lev (lev) float64 3.047 9.454 16.36 ... 5.375e+03 5.625e+03\n", " * member_id (member_id) \n", " vertices_longitude (j, i, vertices) float64 dask.array\n", " o2 (member_id, time, lev, j, i) float32 dask.array\n", "Attributes:\n", " contact: ec.cccma.info-info.ccmac.ec@canada.ca\n", " parent_activity_id: CMIP\n", " license: CMIP6 model data produced by The Government ...\n", " intake_esm_varname: o2\n", " mip_era: CMIP6\n", " Conventions: CF-1.7 CMIP-6.2\n", " variable_id: o2\n", " version: v20190429\n", " source_type: AOGCM\n", " variant_label: r9i1p2f1\n", " table_id: Oyr\n", " external_variables: areacello volcello\n", " cmor_version: 3.4.0\n", " references: Geophysical Model Development Special issue ...\n", " institution_id: CCCma\n", " forcing_index: 1\n", " source: CanESM5 (2019): \\naerosol: interactive\\natmo...\n", " realization_index: 9\n", " CCCma_parent_runid: p2-pictrl\n", " tracking_id: hdl:21.14100/41426118-701c-482b-ae16-82932e4...\n", " initialization_index: 1\n", " branch_method: Spin-up documentation\n", " status: 2019-10-25;created;by nhn2@columbia.edu\n", " table_info: Creation Date:(20 February 2019) MD5:374fbe5...\n", " activity_id: CMIP\n", " YMDH_branch_time_in_parent: 5950:01:01:00\n", " creation_date: 2019-05-30T08:58:45Z\n", " product: model-output\n", " YMDH_branch_time_in_child: 1850:01:01:00\n", " frequency: yr\n", " data_specs_version: 01.00.29\n", " experiment: all-forcing simulation of the recent past\n", " 
CCCma_model_hash: Unknown\n", " branch_time_in_parent: 1496500.0\n", " institution: Canadian Centre for Climate Modelling and An...\n", " parent_source_id: CanESM5\n", " sub_experiment_id: none\n", " nominal_resolution: 100 km\n", " parent_experiment_id: piControl\n", " further_info_url: https://furtherinfo.es-doc.org/CMIP6.CCCma.C...\n", " source_id: CanESM5\n", " history: 2019-05-02T13:53:53Z ;rewrote data to be con...\n", " CCCma_runid: p2-his09\n", " grid_label: gn\n", " title: CanESM5 output prepared for CMIP6\n", " parent_time_units: days since 1850-01-01 0:0:0.0\n", " grid: ORCA1 tripolar grid, 1 deg with refinement t...\n", " experiment_id: historical\n", " branch_time_in_child: 0.0\n", " sub_experiment: none\n", " realm: ocnBgchem\n", " parent_mip_era: CMIP6" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ds = dset_dict['CMIP.CCCma.CanESM5.historical.Oyr.gn']\n", "ds" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "pangeo", "language": "python", "name": "pangeo" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }