9  Meteorology Inputs

We are downloading, subsetting, and processing forecasted meteorology drivers for each NEON site. Currently, we have NOAA’s Global Ensemble Forecasting System (GEFS) V12 output available at the native time resolution and a 1 hr time resolution for each NEON site.

The following are important considerations when using the NOAA GEFS forecasts

The following meteorological variables are included:

The weather forecasts are available through an s3 bucket (see NOAA Global Ensemble Forecasting System below) and we provide an R functions code in the neon4cast package for downloading all the ensemble members for particular location, forecast cycle (00, 06, 12, or 18), and NEON site

remotes::install_github("eco4cast/neon4cast")

9.1 NOAA Global Ensemble Forecasting System

9.1.1 Stage 1

At each site, 31 ensemble member forecasts are provided at 3 hr intervals for the first 10 days, and 6 hr intervals for up to 35 days (840 hr horizon).

Ensemble member 31 only extends 16-days in the future

Forecasts include the following variables:

  • TMP: temperature (C)
  • RH: Relative humidity (%)
  • PRES: Atmospheric pressure (Pa)
  • UGRD: U-component of wind speed (m/s)
  • VGRD: V-component of wind speed (m/s)
  • APCP: Total precipitation in interval (kg/m^2)
  • DSWRF: Downward shortwave radiation flux in interval (W/m^2)
  • DLWRF: Downward longwave radiation flux in interval (W/m^2)

GEFS forecasts are issued four times a day, as indicated by the start_date and cycle. Only forecasts at midnight, cycle = "0" extend for the full 840 hour horizon. Other cycles 6, 12, 18 are provided only 16-day ahead, as mostly being of interest for short-term forecasts. (Though users should note that other NOAA products provide more much accurate and higher resolution short term forecasts than GEFS - though without uncertainty from ensemble members)

All variables are given at height 2m above ground, surface, or 10 m above ground as indicated in height column. See https://www.nco.ncep.noaa.gov/pmb/products/gens/ for more details on GEFS variables and intervals.

Common ways to filter the data before running collect() include start_date, site_id, variable, and horizon.

weather <- neon4cast::noaa_stage1()
# 5.7M rows of data:
weather |> 
dplyr::filter(start_date == "2022-04-01") |>
dplyr::collect()

Note: that noaa_stage1() defaults to the 00 cycle.

If you are using python tools (pyarrow) the endpoint is data.ecoforecast.org and the bucket is neon4cast-drivers/noaa/gefs-v12/stage1

Stage 1 has the following columns:

site_id: string : NEON site ID
prediction: double : forecasted value
variable: string : weather variable
height: string : height above ground for weather variable
horizon: double : number of hours in the future  family: string: class of uncertainty (ensemble)
parameter: int32 : ensemble member number  reference_datetime: timestamp[us, tz=UTC]: datetime of horizon 0  forecast_valid: string: period of time (in hours) that the predicted value applies  datetime: timestamp[us, tz=UTC] : datetime of forecast  longitude: double : location of NEON site  latitude: double : location of NEON site  start_date: string: date of horizon 0 (the initiation of the forecast)  cycle: int32: the UTC hour of the start_date of horizon 0 (the initiation of the forecast), possible values are 0, 6, 12, and 18.

9.1.2 Stage 2

Stage 2 is a processed version of Stage 1 and involves the following transforms of the data that may be useful for some modeling approaches:

  • Fluxes are standardized to per second rates
  • Fluxes and states are interpolated to 1 hour intervals

Ensemble member 31 only extends 16-days in the future

Variables are renamed to match CF conventions:

  • TMP -> air_temperature (K)
  • PRES -> air_pressure (Pa)
  • RH -> relative_humidity (proportion)
  • DLWRF -> surface_downwelling_longwave_flux_in_air (W/m^2)
  • DSWRF -> surface_downwelling_shortwave_flux_in_air (W/m^2) 
  • APCP -> precipitation_flux (kg/(m^2 s))
  • VGRD -> eastward_wind (m/s)
  • UGRD -> northward_wind (m/s)
weather_1hr <- neon4cast::noaa_stage2()
weather_1hr |> 
dplyr::filter(start_date == "2022-04-01" & site_id == "BART") |>
dplyr::collect()

Note: that noaa_stage2() defaults to the 00 cycle.

If you are using python tools (pyarrow) the endpoint is data.ecoforecast.org and the bucket is neon4cast-drivers/noaa/gefs-v12/stage2/parquet

Stage 2 has the following columns:

site_id: string : NEON site ID
prediction: double : forecasted value
variable: string : weather variable
height: string : height above ground for weather variable
horizon: double : number of hours in the future  family: string: class of uncertainty (ensemble)
parameter: int32 : ensemble member number  reference_datetime: timestamp[us, tz=UTC]: datetime of horizon 0  forecast_valid: string: period of time (in hours) that the predicted value applies  datetime: timestamp[us, tz=UTC] : datetime of forecast  longitude: double : location of NEON site  latitude: double : location of NEON site  start_date: string: date of horizon 0 (the initiation of the forecast)  cycle: int32: the UTC hour of the start_date of horizon 0 (the initiation of the forecast), possible values are 0, 6, 12, and 18.

9.1.3 Stage 3

Stage 3 processing presents a ‘nowcast’ product by combining the most recent predictions from each available cycle. Product uses CF variable names and 1 hr interval. The resulting produce is single time-series for each ensemble member, rather than a set of time series from all the forecasts. Stage 3 can be viewed as the “historical” weather for site as simulated by NOAA GEFS. Stage 3 is useful for model training because it ensures that the magnitude and variability of the weather data used to train your model is similar to that in the NOAA GEFS weather forecast you may use as inputs to your forecast.

Stage 3 uses CF variable names and 1 hr interval

  • air_temperature (K)
  • air_pressure (Pa)
  • relative_humidity (proportion)
  • surface_downwelling_longwave_flux_in_air (W/m^2)
  • surface_downwelling_shortwave_flux_in_air (W/m^2) 
  • precipitation_flux (kg/(m^2 s))
  • eastward_wind (m/s)
  • northward_wind (m/s)
weather_stage3 <- neon4cast::noaa_stage3()
weather_stage3 |> 
dplyr::filter(site_id == "BART") |>
dplyr::collect()

If you are using python tools (pyarrow) the endpoint is data.ecoforecast.org and the bucket is neon4cast-drivers/noaa/gefs-v12/stage3/parquet

Stage 3 has the following columns

site_id: string : NEON site ID
prediction: double : forecasted value
variable: string : weather variable
height: string : height above ground for weather variable
family: string: class of uncertainty (ensemble)
parameter: int32 : ensemble member number
datetime: timestamp[us, tz=UTC] : datetime of forecast
longitude: double : location of NEON site
latitude: double : location of NEON site

9.2 NEON Observed

The observed weather at each NEON site as also available in the monthly released data products by NEON. We are downloading these data and providing through the following function

neon <- arrow::s3_bucket("neon4cast-targets/neon",
                  endpoint_override = "data.ecoforecast.org",
                  anonymous = TRUE)

The list of data tables that are available can be found using this command.

neon$ls()

And here is an example for accessing the triple aspirated temperature data product. See the data product (i.e., DP1.00003.001) in NEON data portal for more information about the table names.

taat <- arrow::open_dataset(neon$path("TAAT_30min-basic-DP1.00003.001")) 

Here is an example showing how to download particular subsets of the data table.

neon_temp <- taat |>
  mutate(time = as.Date(startDateTime)) |>
  group_by(siteID, time) |>
  summarise(mean_tmp = mean(tempTripleMean, na.rm = TRUE),
            min_tmp = min(tempTripleMinimum, na.rm = TRUE),
            max_tmp = max(tempTripleMaximum, na.rm = TRUE)) |>
  rename(site_id = siteID) |>
  collect()

9.3 Downloading of raw NOAA GEFS Forecasts

The documentation above describes how to access NOAA GEFS forecasts that we have subseted for NEON sites. We recommend using this approach.

The code used to download and process the raw NOAA GEFS forecast from Amazon Web Services Registry of Open Data is available in the gefs4cast package found at github.com/eco4cast/gefs4cast

9.4 Timing of data availability

See Appendix B for information on the time day that the NOAA data become available on our servers for use in the Challenge.