NeuralGCM simulation datasets

Before running NeuralGCM yourself, consider whether you can use one of our pre-computed simulation datasets. All are available in Zarr format on Google Cloud Storage under the Creative Commons CC-BY 4.0 license.

We currently have two types of datasets available:

  • Medium-range weather forecasts for 2020

  • Yearly to decadal climate simulations

Below, we show how to open and inspect these datasets. You’ll need Xarray, Zarr and GCSFS installed:

! pip install -q -U zarr xarray gcsfs
import xarray
import zarr

Details on coordinate axes:

  • time: The date and time when the simulation was initialized.

  • prediction_timedelta: Time delta (stored as timedelta64[ns]) since the simulation was initialized.

  • longitude: Longitude in degrees.

  • latitude: Latitude in degrees.

  • level: Pressure level (in hPa) as the vertical coordinate.
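
Because `time` marks when a forecast was initialized and `prediction_timedelta` is the lead time, the valid time of each field is their sum. The NumPy sketch below illustrates this on made-up coordinates: the two initialization dates are hypothetical, and the 12-hour lead spacing is an assumption chosen because 31 steps (the forecast dataset's `prediction_timedelta: 31`) then span exactly 15 days. On an opened dataset, the xarray expression `ds.time + ds.prediction_timedelta` should broadcast to the same 2-D `(time, prediction_timedelta)` layout.

```python
import numpy as np

# Hypothetical initialization times (the real datasets store these in `time`).
init_times = np.array(
    ["2020-01-01T00:00", "2020-01-01T12:00"], dtype="datetime64[ns]"
)

# Assumed lead times: 0 to 15 days every 12 hours, i.e. 31 steps.
lead_times = np.arange(31) * np.timedelta64(12, "h")

# Valid time = initialization time + lead time, broadcast to (time, lead).
valid_times = init_times[:, None] + lead_times[None, :]

print(valid_times.shape)   # (2, 31)
print(valid_times[0, -1])  # 15 days after the first initialization
```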

Medium-range weather forecasts for 2020

15-day weather forecasts from the 0.7° deterministic model (NeuralGCM-0.7)

Corresponding model checkpoint: gs://neuralgcm/models/v1/deterministic_0_7_deg.pkl

# output available at different resolutions:
# `2020-512x256.zarr`  (original)
# `2020-240x121_equiangular_with_poles_conservative.zarr`
# `2020-64x32_equiangular_conservative.zarr`
path = 'gs://weatherbench2/datasets/neuralgcm_deterministic/2020-512x256.zarr'
ngcm_0_7 = xarray.open_zarr(path, storage_options=dict(token='anon'))
ngcm_0_7
<xarray.Dataset> Size: 4TB
Dimensions:                              (time: 797, prediction_timedelta: 31,
                                          longitude: 512, latitude: 256,
                                          level: 37)
Coordinates:
  * latitude                             (latitude) float64 2kB -89.46 ... 89.46
  * level                                (level) int64 296B 1 2 3 ... 975 1000
  * longitude                            (longitude) float64 4kB 0.0 ... 359.3
  * prediction_timedelta                 (prediction_timedelta) timedelta64[ns] 248B ...
  * time                                 (time) datetime64[ns] 6kB 2019-12-15...
Data variables:
    P_minus_E_cumulative                 (time, prediction_timedelta, longitude, latitude) float32 13GB dask.array<chunksize=(1, 1, 512, 256), meta=np.ndarray>
    geopotential                         (time, prediction_timedelta, level, longitude, latitude) float32 479GB dask.array<chunksize=(1, 1, 37, 512, 256), meta=np.ndarray>
    specific_cloud_ice_water_content     (time, prediction_timedelta, level, longitude, latitude) float32 479GB dask.array<chunksize=(1, 1, 37, 512, 256), meta=np.ndarray>
    specific_cloud_liquid_water_content  (time, prediction_timedelta, level, longitude, latitude) float32 479GB dask.array<chunksize=(1, 1, 37, 512, 256), meta=np.ndarray>
    specific_humidity                    (time, prediction_timedelta, level, longitude, latitude) float32 479GB dask.array<chunksize=(1, 1, 37, 512, 256), meta=np.ndarray>
    temperature                          (time, prediction_timedelta, level, longitude, latitude) float32 479GB dask.array<chunksize=(1, 1, 37, 512, 256), meta=np.ndarray>
    u_component_of_wind                  (time, prediction_timedelta, level, longitude, latitude) float32 479GB dask.array<chunksize=(1, 1, 37, 512, 256), meta=np.ndarray>
    v_component_of_wind                  (time, prediction_timedelta, level, longitude, latitude) float32 479GB dask.array<chunksize=(1, 1, 37, 512, 256), meta=np.ndarray>
    wind_speed                           (time, prediction_timedelta, level, longitude, latitude) float32 479GB dask.array<chunksize=(1, 1, 37, 512, 256), meta=np.ndarray>
Attributes:
    experiment_id:  67001173
    worker_id:      1
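
The printed latitudes stop at ±89.46 rather than at the poles, which is consistent with a Gaussian grid, where latitudes are the arcsines of Gauss-Legendre quadrature nodes (the climate-run file names further down carry a `gauss` tag). Assuming that layout, the NumPy sketch below reconstructs both axes of a `512x256` grid and shows where the "0.7°" label comes from: 360° / 512 ≈ 0.703° of longitude per cell. It is only a consistency check against the repr, not the code used to produce the dataset.

```python
import numpy as np


def gaussian_grid(n_lon: int, n_lat: int):
    """Return (longitudes, latitudes) in degrees for an assumed Gaussian grid."""
    # Longitudes: evenly spaced, starting at 0 degrees.
    lon = np.arange(n_lon) * (360.0 / n_lon)
    # Latitudes: arcsine of the Gauss-Legendre nodes, ordered south to north.
    nodes, _weights = np.polynomial.legendre.leggauss(n_lat)
    lat = np.degrees(np.arcsin(nodes))
    return lon, lat


lon, lat = gaussian_grid(512, 256)
print(360.0 / 512)               # 0.703125 degrees of longitude per cell
print(lon[-1], lat[0], lat[-1])  # compare with the latitude/longitude rows above
```

Calling `gaussian_grid(256, 128)` or `gaussian_grid(128, 64)` gives the axes of the 1.4° and 2.8° datasets below in the same way.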

15-day weather forecasts from the 1.4° stochastic model (NeuralGCM-ENS)

Corresponding model checkpoint: gs://neuralgcm/models/v1/stochastic_1_4_deg.pkl

# output available at different resolutions:
# `2020-256x128.zarr`  (original)
# `2020-240x121_equiangular_with_poles_conservative.zarr`
# `2020-240x121_equiangular_with_poles_conservative_mean.zarr`
# `2020-64x32_equiangular_conservative.zarr`
# `2020-64x32_equiangular_conservative_mean.zarr`
path = 'gs://weatherbench2/datasets/neuralgcm_ens/2020-256x128.zarr'
ngcm_ens = xarray.open_zarr(path, storage_options=dict(token='anon'))
ngcm_ens
<xarray.Dataset> Size: 49TB
Dimensions:                              (realization: 50, time: 797,
                                          prediction_timedelta: 32, level: 37,
                                          longitude: 256, latitude: 128)
Coordinates:
  * latitude                             (latitude) float64 1kB -88.93 ... 88.93
  * level                                (level) int64 296B 1 2 3 ... 975 1000
  * longitude                            (longitude) float64 2kB 0.0 ... 358.6
  * prediction_timedelta                 (prediction_timedelta) timedelta64[ns] 256B ...
  * realization                          (realization) int64 400B 0 1 ... 48 49
  * time                                 (time) datetime64[ns] 6kB 2019-12-15...
Data variables:
    geopotential                         (realization, time, prediction_timedelta, level, longitude, latitude) float32 6TB dask.array<chunksize=(50, 1, 2, 1, 256, 128), meta=np.ndarray>
    specific_cloud_ice_water_content     (realization, time, prediction_timedelta, level, longitude, latitude) float32 6TB dask.array<chunksize=(50, 1, 2, 1, 256, 128), meta=np.ndarray>
    specific_cloud_liquid_water_content  (realization, time, prediction_timedelta, level, longitude, latitude) float32 6TB dask.array<chunksize=(50, 1, 2, 1, 256, 128), meta=np.ndarray>
    specific_humidity                    (realization, time, prediction_timedelta, level, longitude, latitude) float32 6TB dask.array<chunksize=(50, 1, 2, 1, 256, 128), meta=np.ndarray>
    temperature                          (realization, time, prediction_timedelta, level, longitude, latitude) float32 6TB dask.array<chunksize=(50, 1, 2, 1, 256, 128), meta=np.ndarray>
    u_component_of_wind                  (realization, time, prediction_timedelta, level, longitude, latitude) float32 6TB dask.array<chunksize=(50, 1, 2, 1, 256, 128), meta=np.ndarray>
    v_component_of_wind                  (realization, time, prediction_timedelta, level, longitude, latitude) float32 6TB dask.array<chunksize=(50, 1, 2, 1, 256, 128), meta=np.ndarray>
    wind_speed                           (realization, time, prediction_timedelta, level, longitude, latitude) float32 6TB dask.array<chunksize=(50, 1, 2, 1, 256, 128), meta=np.ndarray>
Attributes:
    experiment_id:  73974210
    worker_id:      3
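
Every variable in the ensemble dataset has a `realization` dimension with 50 members, and the printed chunk shape `(50, 1, 2, 1, 256, 128)` keeps all members for one initialization, two lead times and one level in a single chunk, so an ensemble statistic at one initialization, lead time and level comes from one chunk. In xarray this would be written `ngcm_ens.temperature.mean('realization')` and `ngcm_ens.temperature.std('realization')`; the NumPy sketch below shows the same reduction on small random stand-in data (the shape and values are made up).

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for one field: (realization, longitude, latitude) = (50, 8, 4).
members = 280.0 + rng.normal(scale=2.0, size=(50, 8, 4))

REALIZATION_AXIS = 0
ensemble_mean = members.mean(axis=REALIZATION_AXIS)
# Population standard deviation (ddof=0), matching xarray's default for .std().
ensemble_spread = members.std(axis=REALIZATION_AXIS)

print(ensemble_mean.shape, ensemble_spread.shape)  # (8, 4) (8, 4)
```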

Yearly to decadal climate simulations

2-year runs of the 1.4° deterministic NeuralGCM model

Corresponding model checkpoint: gs://neuralgcm/models/v1/deterministic_1_4_deg.pkl

path = 'gs://neuralgcm/amip_runs/v1_deterministic_1_4_deg/2019-to-2021_256x128_gauss_37-level_stride12h.zarr/'
ngcm_1_4 = xarray.open_zarr(path, storage_options=dict(token='anon'))
ngcm_1_4
<xarray.Dataset> Size: 2TB
Dimensions:                              (time: 37, prediction_timedelta: 1461,
                                          longitude: 256, latitude: 128,
                                          level: 37)
Coordinates:
  * latitude                             (latitude) float64 1kB -88.93 ... 88.93
  * level                                (level) int64 296B 1 2 3 ... 975 1000
  * longitude                            (longitude) float64 2kB 0.0 ... 358.6
  * prediction_timedelta                 (prediction_timedelta) timedelta64[ns] 12kB ...
  * time                                 (time) datetime64[ns] 296B 2019-01-0...
Data variables:
    P_minus_E_cumulative                 (time, prediction_timedelta, longitude, latitude) float32 7GB dask.array<chunksize=(1, 1, 256, 128), meta=np.ndarray>
    geopotential                         (time, prediction_timedelta, level, longitude, latitude) float32 262GB dask.array<chunksize=(1, 1, 37, 256, 128), meta=np.ndarray>
    specific_cloud_ice_water_content     (time, prediction_timedelta, level, longitude, latitude) float32 262GB dask.array<chunksize=(1, 1, 37, 256, 128), meta=np.ndarray>
    specific_cloud_liquid_water_content  (time, prediction_timedelta, level, longitude, latitude) float32 262GB dask.array<chunksize=(1, 1, 37, 256, 128), meta=np.ndarray>
    specific_humidity                    (time, prediction_timedelta, level, longitude, latitude) float32 262GB dask.array<chunksize=(1, 1, 37, 256, 128), meta=np.ndarray>
    temperature                          (time, prediction_timedelta, level, longitude, latitude) float32 262GB dask.array<chunksize=(1, 1, 37, 256, 128), meta=np.ndarray>
    u_component_of_wind                  (time, prediction_timedelta, level, longitude, latitude) float32 262GB dask.array<chunksize=(1, 1, 37, 256, 128), meta=np.ndarray>
    v_component_of_wind                  (time, prediction_timedelta, level, longitude, latitude) float32 262GB dask.array<chunksize=(1, 1, 37, 256, 128), meta=np.ndarray>
Attributes:
    experiment_id:  66950211
    worker_id:      1
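
The `stride12h` suffix in the path suggests output every 12 hours. Taking that at face value, the length of `prediction_timedelta` fixes how much simulated time each run covers: (1461 - 1) × 12 h = 730 days, the 2 years in the heading. The sketch below applies the same arithmetic to the `stride6h` and `stride3h` runs later in this section; the strides are read from the file names and are assumptions rather than documented metadata.

```python
from datetime import timedelta


def simulated_span(n_steps: int, stride_hours: int) -> timedelta:
    """Time covered by n_steps outputs spaced stride_hours apart (first at lead 0)."""
    return timedelta(hours=(n_steps - 1) * stride_hours)


# (prediction_timedelta length, stride in hours parsed from the file name)
runs = {
    "v1_deterministic_1_4_deg (stride12h)": (1461, 12),
    "v1_deterministic_2_8_deg (stride6h)": (58449, 6),
    "v1_precip_stochastic_2_8_deg (stride3h)": (58408, 3),
}

for name, (n_steps, stride) in runs.items():
    span = simulated_span(n_steps, stride)
    years = span.total_seconds() / (365.25 * 86400)
    print(f"{name}: {span.days} days, about {years:.1f} years")
```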

40-year runs of the 2.8° deterministic NeuralGCM model

Corresponding model checkpoint: gs://neuralgcm/models/v1/deterministic_2_8_deg.pkl

path = 'gs://neuralgcm/amip_runs/v1_deterministic_2_8_deg/1980-to-2020_128x64_gauss_37-level_stride6h.zarr/'
ngcm_2_8 = xarray.open_zarr(path, storage_options=dict(token='anon'))
ngcm_2_8
<xarray.Dataset> Size: 18TB
Dimensions:                              (time: 37,
                                          prediction_timedelta: 58449,
                                          level: 37, longitude: 128,
                                          latitude: 64)
Coordinates:
  * latitude                             (latitude) float64 512B -87.86 ... 8...
  * level                                (level) int64 296B 1 2 3 ... 975 1000
  * longitude                            (longitude) float64 1kB 0.0 ... 357.2
  * prediction_timedelta                 (prediction_timedelta) timedelta64[ns] 468kB ...
  * time                                 (time) datetime64[ns] 296B 1980-01-0...
Data variables:
    geopotential                         (time, prediction_timedelta, level, longitude, latitude) float32 3TB dask.array<chunksize=(1, 1, 37, 128, 64), meta=np.ndarray>
    specific_cloud_ice_water_content     (time, prediction_timedelta, level, longitude, latitude) float32 3TB dask.array<chunksize=(1, 1, 37, 128, 64), meta=np.ndarray>
    specific_cloud_liquid_water_content  (time, prediction_timedelta, level, longitude, latitude) float32 3TB dask.array<chunksize=(1, 1, 37, 128, 64), meta=np.ndarray>
    specific_humidity                    (time, prediction_timedelta, level, longitude, latitude) float32 3TB dask.array<chunksize=(1, 1, 37, 128, 64), meta=np.ndarray>
    temperature                          (time, prediction_timedelta, level, longitude, latitude) float32 3TB dask.array<chunksize=(1, 1, 37, 128, 64), meta=np.ndarray>
    u_component_of_wind                  (time, prediction_timedelta, level, longitude, latitude) float32 3TB dask.array<chunksize=(1, 1, 37, 128, 64), meta=np.ndarray>
    v_component_of_wind                  (time, prediction_timedelta, level, longitude, latitude) float32 3TB dask.array<chunksize=(1, 1, 37, 128, 64), meta=np.ndarray>
Attributes:
    experiment_id:  70001146
    worker_id:      1
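
The chunk shape `(1, 1, 37, 128, 64)` means each chunk holds all 37 levels of one variable at a single initialization and lead time. The back-of-envelope sketch below estimates what reading a full 40-year series implies, assuming 4-byte float32 values and ignoring Zarr compression, so actual transfers are likely smaller.

```python
import math

FLOAT32_BYTES = 4


def chunk_bytes(chunk_shape, itemsize=FLOAT32_BYTES):
    """Uncompressed size in bytes of one chunk."""
    return math.prod(chunk_shape) * itemsize


chunk = (1, 1, 37, 128, 64)  # (time, prediction_timedelta, level, longitude, latitude)
n_leads = 58449              # prediction_timedelta length of the 40-year run

per_chunk = chunk_bytes(chunk)
one_run_one_var = per_chunk * n_leads  # every lead time of one initialization

print(f"{per_chunk / 1e6:.2f} MB per chunk")
print(f"{n_leads} chunks, {one_run_one_var / 1e9:.1f} GB for one variable of one run")
```

Roughly 58,000 chunk reads and about 71 GB (uncompressed) for a single variable of a single run is why it pays to subset with `.sel`/`.isel` before computing.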

20-year runs of the 2.8° stochastic NeuralGCM model with precipitation

Corresponding model checkpoint: gs://neuralgcm/models/v1_precip/stochastic_precip_2_8_deg.pkl

path = 'gs://neuralgcm/amip_runs/v1_precip_stochastic_2_8_deg/2001-to-2021_128x64_gauss_37-level_stride3h.zarr'
ngcm_2_8_precip = xarray.open_zarr(path, storage_options=dict(token='anon'))
ngcm_2_8_precip
<xarray.Dataset> Size: 2TB
Dimensions:                              (time: 37,
                                          prediction_timedelta: 58408,
                                          surface: 1, longitude: 128,
                                          latitude: 64, level: 3)
Coordinates:
  * latitude                             (latitude) float64 512B -87.86 ... 8...
  * level                                (level) int64 24B 500 700 850
  * longitude                            (longitude) float64 1kB 0.0 ... 357.2
  * prediction_timedelta                 (prediction_timedelta) timedelta64[ns] 467kB ...
  * surface                              (surface) int64 8B 1
  * time                                 (time) datetime64[ns] 296B 2001-01-0...
Data variables:
    evaporation                          (time, prediction_timedelta, surface, longitude, latitude) float32 71GB dask.array<chunksize=(1, 8, 1, 128, 64), meta=np.ndarray>
    geopotential                         (time, prediction_timedelta, level, longitude, latitude) float32 212GB dask.array<chunksize=(1, 8, 3, 128, 64), meta=np.ndarray>
    precipitation_cumulative_mean        (time, prediction_timedelta, surface, longitude, latitude) float32 71GB dask.array<chunksize=(1, 8, 1, 128, 64), meta=np.ndarray>
    specific_cloud_ice_water_content     (time, prediction_timedelta, level, longitude, latitude) float32 212GB dask.array<chunksize=(1, 8, 3, 128, 64), meta=np.ndarray>
    specific_cloud_liquid_water_content  (time, prediction_timedelta, level, longitude, latitude) float32 212GB dask.array<chunksize=(1, 8, 3, 128, 64), meta=np.ndarray>
    specific_humidity                    (time, prediction_timedelta, level, longitude, latitude) float32 212GB dask.array<chunksize=(1, 8, 3, 128, 64), meta=np.ndarray>
    temperature                          (time, prediction_timedelta, level, longitude, latitude) float32 212GB dask.array<chunksize=(1, 8, 3, 128, 64), meta=np.ndarray>
    u_component_of_wind                  (time, prediction_timedelta, level, longitude, latitude) float32 212GB dask.array<chunksize=(1, 8, 3, 128, 64), meta=np.ndarray>
    v_component_of_wind                  (time, prediction_timedelta, level, longitude, latitude) float32 212GB dask.array<chunksize=(1, 8, 3, 128, 64), meta=np.ndarray>
Attributes:
    experiment_id:  121052169
    worker_id:      2

In this dataset, precipitation_cumulative_mean holds the precipitation (in meters) accumulated since the start of the simulation (i.e., since prediction_timedelta=0).

This makes it straightforward to calculate mean precipitation over a whole run from a single lead-time slice, without loading the entire 71 GB variable from storage:

# Mean daily precipitation for the last simulation, from its final cumulative value
m_to_mm = 1000
measurement_freq = 3  # output interval in hours
measurements_in_a_day = 24 / measurement_freq
# The number of 3-hour intervals is one less than the number of lead times.
n_intervals = len(ngcm_2_8_precip.prediction_timedelta) - 1
daily_precip = ngcm_2_8_precip.precipitation_cumulative_mean.isel(
    time=-1, surface=0, prediction_timedelta=-1
) * (m_to_mm * measurements_in_a_day / n_intervals)
daily_precip.plot(x="longitude", cbar_kwargs={"label": "(mm/day)"})
[Figure: map of mean daily precipitation (mm/day) for the last simulation, longitude on the x-axis]
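
Why the final value is enough: the per-interval amounts are differences of consecutive cumulative values, so their mean telescopes to (last - first) / (N - 1), and the first value is zero at prediction_timedelta=0 (an assumption the code above also makes). The NumPy sketch below checks this on a made-up cumulative series of random 3-hourly amounts in meters, applying the same unit conversion.

```python
import numpy as np

rng = np.random.default_rng(0)

M_TO_MM = 1000
STEPS_PER_DAY = 24 // 3  # 3-hourly output

# Made-up 3-hourly precipitation amounts (meters) for one grid point.
amounts = rng.uniform(0.0, 5e-4, size=100)
# Cumulative series that starts at 0 at prediction_timedelta=0.
cumulative = np.concatenate([[0.0], np.cumsum(amounts)])

n = len(cumulative)
from_last_value = cumulative[-1] * M_TO_MM * STEPS_PER_DAY / (n - 1)
from_all_diffs = np.diff(cumulative).mean() * M_TO_MM * STEPS_PER_DAY

print(np.isclose(from_last_value, from_all_diffs))  # True
```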

To calculate precipitation over shorter intervals, use .diff('prediction_timedelta'):

ngcm_2_8_precip.precipitation_cumulative_mean.isel(
    time=-1, surface=0, prediction_timedelta=slice(10, 15)
).diff('prediction_timedelta').plot(
    x='longitude',
    col='prediction_timedelta',
    cbar_kwargs={'label': '(meter/3-hour)'},
)  # In m per 3 hours
[Figure: precipitation per 3-hour interval (m) for four consecutive lead times, one panel per prediction_timedelta]
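
The differenced fields are in meters per 3-hour interval. To compare them with the mm/day map, multiply by 1000 mm/m and by 8 intervals per day; in xarray that would be `.diff('prediction_timedelta') * 1000 * 8`. The helper below, `interval_to_mm_per_day`, is a hypothetical NumPy sketch that assumes evenly spaced output along the last axis.

```python
import numpy as np


def interval_to_mm_per_day(cumulative_m, stride_hours=3):
    """Convert cumulative precipitation (meters) into per-interval rates in mm/day.

    Assumes evenly spaced output every `stride_hours` hours along the last axis.
    """
    per_interval_m = np.diff(cumulative_m, axis=-1)
    return per_interval_m * 1000.0 * (24.0 / stride_hours)


cumulative = np.array([0.0, 0.001, 0.0015, 0.0015])  # made-up values in meters
rates = interval_to_mm_per_day(cumulative)
print(rates)  # approximately 8, 4 and 0 mm/day
```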