How to handle large files

Example: reducing the size of a large file with pyaerocom to avoid memory errors

import pyaerocom as pya
from warnings import filterwarnings
filterwarnings('ignore')
pya.change_verbosity('critical', log=pya.const.print_log)
pya.__version__
'0.8.1.dev4'

NOTE: This example requires pyaerocom v0.8.1.dev4 or later.

DATA_DIR = '/home/notebook/shared-ns1000k/inputs/pyaerocom-testdata/modeldata/MIROC-SPRINTARS_AP3-CTRL/renamed/'
LARGE_FILE = 'aerocom3_MIROC-SPRINTARS_AP3-CTRL_ec550aer_ModelLevel_2010_6hourly.nc'

This file is 64 GB in size.

data = pya.GriddedData(DATA_DIR + LARGE_FILE, var_name='ec550aer')
data
pyaerocom.GriddedData
Grid data: <iris 'Cube' of volume_extinction_coefficient_in_air_due_to_ambient_aerosol_particles / (1/Mm) (time: 1460; atmosphere_sigma_coordinate: 56; latitude: 320; longitude: 640)>
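
To see why a file like this can cause memory errors, it helps to estimate how much memory the full array would need. The sketch below multiplies the dimensions printed above by an assumed 4 bytes per value (32-bit floats; the actual data type in the file is not shown here, so treat this as an assumption).

```python
from math import prod

# Dimensions as printed for the cube above
shape = {
    "time": 1460,
    "atmosphere_sigma_coordinate": 56,
    "latitude": 320,
    "longitude": 640,
}

# Assumption: values are stored as 32-bit floats (4 bytes each)
BYTES_PER_VALUE = 4

n_values = prod(shape.values())
n_bytes = n_values * BYTES_PER_VALUE

print(f"{n_values:,} values")
print(f"about {n_bytes / 1e9:.1f} GB ({n_bytes / 2**30:.1f} GiB) if fully loaded")
```

Under the float32 assumption the estimate is of the same order as the 64 GB file on disk, which is more than most machines can hold in memory at once.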

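Extract the surface level, then crop to the summer months and the Arctic region

Note: the cropped cube below has only 4 time steps, so the integer pair time_range=(4, 8) appears to be applied as index positions along the time axis rather than as month numbers.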

data_surf = data.extract_surface_level()
data_surf_cropped = data_surf.crop(time_range=(4, 8), lat_range=(66, 90))
data_surf_cropped
pyaerocom.GriddedData
Grid data: <iris 'Cube' of volume_extinction_coefficient_in_air_due_to_ambient_aerosol_particles / (1/Mm) (time: 4; latitude: 43; longitude: 640)>
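
The cropped cube has 4 time steps, which is what one would get if the integer pair (4, 8) were used as slice positions on the time axis. The following sketch uses only the standard library to contrast index-based and month-based selection on a 6-hourly axis. It assumes the axis starts at 2010-01-01 00:00 (consistent with 1460 steps for 2010, but not confirmed by the output above), and the June to August selection is a hypothetical illustration, not the author's choice of months.

```python
from datetime import datetime, timedelta

# Assumption: 6-hourly time steps starting at 2010-01-01 00:00
start = datetime(2010, 1, 1)
times = [start + timedelta(hours=6 * i) for i in range(1460)]

# Integer pair interpreted as slice positions on the time axis
by_index = times[4:8]

# Hypothetical month-based selection (June to August)
by_month = [t for t in times if 6 <= t.month <= 8]

print(len(by_index), by_index[0], by_index[-1])
print(len(by_month), by_month[0], by_month[-1])
```

If a full season is wanted, it may be safer to pass explicit start and stop time stamps to crop, provided the installed pyaerocom version accepts them; check its documentation before relying on this.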

Save to NetCDF

data_surf_cropped.to_netcdf('.', 'SPRINTARS_extract.nc')
'./SPRINTARS_extract.nc'
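
A quick way to confirm that the extract is small enough to work with is to check its size on disk. This is a minimal sketch using only the standard library; format_size is a hypothetical helper written for this example and is not part of pyaerocom.

```python
import os


def format_size(nbytes):
    """Return a byte count as a human-readable string (binary units)."""
    size = float(nbytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TiB"


# File name as written by to_netcdf above
path = "SPRINTARS_extract.nc"

if os.path.exists(path):
    print(path, format_size(os.path.getsize(path)))
else:
    print(f"{path} not found; run the export step first")
```

The size on disk includes coordinates and metadata, so it will differ somewhat from the raw size of the data array.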

Reload the subset that was just saved

reloaded_subset = pya.GriddedData('SPRINTARS_extract.nc')
reloaded_subset
pyaerocom.GriddedData
Grid data: <iris 'Cube' of volume_extinction_coefficient_in_air_due_to_ambient_aerosol_particles / (1/Mm) (time: 4; latitude: 43; longitude: 640)>
reloaded_subset.quickplot_map();