EFI-TERN Challenge - Instructions

What: Net ecosystem exchange of CO2 and evapotranspiration in terrestrial ecosystems

Where: 2 TERN sites in Australia

When: Daily forecasts for at least 30 days in the future are accepted at any time. New forecast submissions are accepted daily. The only requirement is that submissions are predictions of the future at the time the forecast is submitted.

Why: Carbon and water cycling are fundamental for climate and water regulation services provided by ecosystems

Who: Open to any individual or team that registers

How: Upload to our submissions system (instructions below)

Overview

The exchange of water and carbon dioxide between the atmosphere and the land is akin to earth’s terrestrial ecosystems breathing rate and lung capacity. One of the best ways to monitor changes in the amount of carbon and water in an ecosystem is the eddy-covariance method. This method observes the net amount of carbon and water entering and exiting ecosystems at half-hourly timesteps, which is important because it can provide information on ecosystem processes such as photosynthesis, respiration, and transpiration, their sensitivities to ongoing climate and land use change, and greenhouse gas budgets for carbon accounting and natural climate solutions. Forecasts of carbon uptake and release along with water use can provide insights into the future production of food, fiber, timber, and carbon credits. Additionally, forecasts will highlight the influence that stress and disturbance have on carbon and water cycling.

Challenge

This forecasting challenge asks teams to forecast net ecosystem exchange of carbon dioxide (NEE) and latent heat flux of evapotranspiration (LE) across 2 TERN sites with differing climates. Forecasts of daily time steps over the next 30 days. Weather forecasts from NOAA Global Ensemble Forecast System are provided to use as model drivers (if the forecasting model uses meteorological inputs). Forecasts can be submitted daily

Teams are asked to submit their forecast of NEE and LE, along with uncertainty estimates. Any existing NEE and LE may be used to build and improve the forecast models. Other data can be used as inputs to the forecasting model.

Data: Targets

The challenge uses the following TERN data products:

https://portal.tern.org.au/results?topicTerm=flux

A file with previously released TERN data that has been processed into “targets” is provided below. The same processing will be applied to new data that are used for forecast evaluation.

Net ecosystem exchange

Definition

Net ecosystem exchange (NEE) is the net movement of carbon dioxide from the atmosphere to the ecosystem. At the 30-minute time resolution it is reported as \(\mu\)mol CO₂ m^-2 s^-1. At the daily time resolution, it is reported as g C m^-2 day^-1. Negative values correspond to an ecosystem absorbing CO² from the atmosphere, positive values correspond to an ecosystem emitting CO₂ to the atmosphere.

Motivation

NEE quantifies the net exchange of CO₂ between the ecosystem and the atmosphere over a daily time period. Making daily predictions will allow us to rapidly assess skill and provide information in a timeframe pertinent to inform and implement natural resource management.

Latent heat flux

Definition

Latent heat flux is the movement of water as water vapor from the ecosystem to the atmosphere. It is reported as W m^-2 (equivalent to J m^-2 s^-1). At the daily time resolution, it is reported as mean W m^-2. Positive values correspond to a transfer of water vapor from the ecosystem to the atmosphere.

Motivation

Latent heat measures the water loss from an ecosystem to the atmosphere through evapotranspiration (transpiration through plants + evaporation from surfaces).

Forecasting latent heat (evapotranspiration) can provide insights into water stress for plants and the efficiency that plants use water relative to NEE, and to the amount of liquid water remaining in the soil for soil moisture forecasting

Focal sites

Information on the sites can be found here:

site_data <- readr::read_csv("https://raw.githubusercontent.com/eco4cast/tern4cast/main/tern_field_site_metadata.csv") |> 
  dplyr::filter(!is.na(data_url))

site	site_id	fluxnet_id	latitude	longitude	elevation	mat	map	world_ecoregion	data_url
Boyagin	ABOY	AU-Boy	-32.4771	116.9386	484	NA	NA	NA	https://dap.ozflux.org.au/thredds/dodsC/ozflux/sites/Boyagin/L3/neon_efi/Boyagin_L3_EFI.nc
Calperum	ACPR	AU-Cpr	-34.0027	140.5877	60	17.9	301	Mediterranean woodlands	https://dap.ozflux.org.au/thredds/dodsC/ozflux/sites/Calperum/L3/neon_efi/Calperum_L3_EFI.nc

Target data calculation

To create the data for evaluation (and training) for NEE and LE we extract NEE and LE that pass the turbulence quality control flags provided by TERN and has flux values between -50 and 50 umol CO2 m^-2 s^-1.

To evaluate daily flux forecasts, we select only days with at least 24 of 48 half hours that pass the quality control flags. For these days, we average the half-hours and convert carbon to daily units (gC/m2/day). The daily data table has the following columns.

datetime: YYYY-MM-DD (the day is determined using UTC time)
site_id: NEON site code (e.g., BART)
variable: nee (g C m^-2 day^-1) or le (W m^-2)
observation: value for the variable

Here is the download link and format of the terrestrial_daily target file

readr::read_csv("https://data.ecoforecast.org/tern4cast-targets/terrestrial_daily/terrestrial_daily-targets.csv.gz", guess_max = 1e6) |> 
  na.omit()

# A tibble: 13,857 × 4
   datetime   site_id variable observation
   <date>     <chr>   <chr>          <dbl>
 1 2010-07-30 ACPR    le            48.8  
 2 2010-07-31 ACPR    le            22.6  
 3 2010-07-31 ACPR    nee           -0.296
 4 2010-08-01 ACPR    le            69.6  
 5 2010-08-01 ACPR    nee           -1.27 
 6 2010-08-02 ACPR    le            21.1  
 7 2010-08-02 ACPR    nee           -0.375
 8 2010-08-03 ACPR    le            15.7  
 9 2010-08-03 ACPR    nee           -0.563
10 2010-08-04 ACPR    le             9.48 
# ℹ 13,847 more rows

Timeline

Forecasts for a minimum of 30 days in the forecast can be submitted daily. New forecasts can be submitted daily as new weather forecasts and observations (e.g., NEE) become available. The key is that submissions are predictions of the future.

Daily submissions are allowed and encouraged as new observations and weather forecasts become available, therefore the automation of forecast generation may be ideal. There are many ways to automate scripts that are written to download observations and meteorology drivers, generate forecasts, and submit forecasts.

Flux data latency

There is approximately 2 days between data collection and availability in the targets file.

Submissions

The required names for forecasted variables: nee or le.

The required time unit: YYYY-MM-DD

The following provides the requirements for the format of the forecasts that will be submitted. It is important to follow these format guidelines in order for your submitted forecasts to pass a set of internal checks that allow the forecast to be visualized and scored correctly.

Steps to submitting

We provide an overview of the steps for submitting with the details below:

Submit registration and model overview for the model using a web form.
Generate forecast with required columns. There are three options for the file format described below.
Write the forecast to a file that follows the standardized naming format.
Submit forecast (preferably using the submission function that we provide)/

Step 1: Forecast file format

The file is a csv format with the following columns:

datetime: forecast timestamp. Format %Y-%m-%d.
reference_datetime: The start of the forecast; this should be 0 times steps in the future. This should only be one value of reference_datetime in the file. Format is %Y-%m-%d %H:%M:%S for terrestrial_30min theme and %Y-%m-%d for all other themes.
site_id: NEON code for site
family name of the probability distribution that is described by the parameter values in the parameter column; only normal or ensemble are currently allowed.
parameter required to be the string mu or sigma (see note below about parameter column) or the number of the ensemble member.
variable: standardized variable name for the theme
prediction: forecasted value for the parameter in the parameter column
model_id: the short name of the model defined as the model_id in the file name (see below) and in your metadata. The model_id should have no spaces.

The time unit (i.e., date or date-time) should correspond to the time unit of the theme-specific target file (see Theme description). All datetimes with time zones should use UTC as the time zone.

For a parametric forecast, the family column uses the word normal to designate a normal distribution and the parameter column must have values of mu and sigma for each forecasted variable, site_id, and time.

If you are submitting a forecast that does not have a normal distribution we recommend using the ensemble format and sampling from your non-normal distribution to generate a set of ensemble members that represents your distribution.

We aim to support more distributions beyond the normal in the distribution format file (e.g., log-normal distribution to support zero-bounded forecast submissions). Please check back at this site for updates on the list of supported distributions.

Here is an example of a forecast that uses a normal distribution:

readr::read_csv("https://data.ecoforecast.org/neon4cast-forecasts/raw/aquatics/aquatics-2022-11-07-climatology.csv.gz", show_col_types = FALSE)

# A tibble: 68 × 8
   datetime   variable    parameter family model_id   reference_datetime site_id
   <date>     <chr>       <chr>     <chr>  <chr>      <date>             <chr>  
 1 2022-11-07 temperature mu        normal GLEON_phy… 2022-11-07         BARC   
 2 2022-11-07 temperature sigma     normal GLEON_phy… 2022-11-07         BARC   
 3 2022-11-08 temperature mu        normal GLEON_phy… 2022-11-07         BARC   
 4 2022-11-08 temperature sigma     normal GLEON_phy… 2022-11-07         BARC   
 5 2022-11-09 temperature mu        normal GLEON_phy… 2022-11-07         BARC   
 6 2022-11-09 temperature sigma     normal GLEON_phy… 2022-11-07         BARC   
 7 2022-11-10 temperature mu        normal GLEON_phy… 2022-11-07         BARC   
 8 2022-11-10 temperature sigma     normal GLEON_phy… 2022-11-07         BARC   
 9 2022-11-11 temperature mu        normal GLEON_phy… 2022-11-07         BARC   
10 2022-11-11 temperature sigma     normal GLEON_phy… 2022-11-07         BARC   
# ℹ 58 more rows
# ℹ 1 more variable: prediction <dbl>

For an ensemble (or sample) forecast, the family column uses the word ensemble to design that it is a ensemble forecast and the parameter column is the ensemble member number (1, 2, 3 …)

readr::read_csv("https://data.ecoforecast.org/neon4cast-forecasts/raw/aquatics/aquatics-2022-11-07-persistenceRW.csv.gz", show_col_types = FALSE)

# A tibble: 523,600 × 8
   model_id      datetime   reference_datetime site_id family parameter variable
   <chr>         <date>     <date>             <chr>   <chr>      <dbl> <chr>   
 1 persistenceRW 2022-11-08 2022-11-07         BARC    ensem…         1 chla    
 2 persistenceRW 2022-11-09 2022-11-07         BARC    ensem…         1 chla    
 3 persistenceRW 2022-11-10 2022-11-07         BARC    ensem…         1 chla    
 4 persistenceRW 2022-11-11 2022-11-07         BARC    ensem…         1 chla    
 5 persistenceRW 2022-11-12 2022-11-07         BARC    ensem…         1 chla    
 6 persistenceRW 2022-11-13 2022-11-07         BARC    ensem…         1 chla    
 7 persistenceRW 2022-11-14 2022-11-07         BARC    ensem…         1 chla    
 8 persistenceRW 2022-11-15 2022-11-07         BARC    ensem…         1 chla    
 9 persistenceRW 2022-11-16 2022-11-07         BARC    ensem…         1 chla    
10 persistenceRW 2022-11-17 2022-11-07         BARC    ensem…         1 chla    
# ℹ 523,590 more rows
# ℹ 1 more variable: prediction <dbl>

Step 3: Submission process

Individual forecast files can be uploaded any time.

The correct file name and format is critical for the automated processing of submissions

Teams can submit their forecast csv files through an R function.

We have developed a function called submit() that is available in this GitHub repo

library(aws.s3)
source("https://raw.githubusercontent.com/eco4cast/tern4cast/main/R/submit.R")
forecast_file = "theme_name-forecast-year-month-day-model_id.csv"

submit(forecast_file = forecast_file,
                  metadata = NULL,
                 ask = FALSE)

Alternatively, if you using another programming language, you can submit using AWS S3-like tools (i.e., aws.s3 R package) to the tern4cast-submissions bucket at the data.ecoforecast.org endpoint.

Submissions need to adhere to the forecast format that is provided above, including the file naming convention. Our cyberinfrastructure automatically evaluates forecasts and relies on the expected formatting. Contact eco4cast.initiative@gmail.com if you experience technical issues with submitting.

Meteorological inputs for modeling

s3 <- arrow::s3_bucket("tern4cast-drivers/noaa/gefs-v12/stage1", endpoint_override = "data.ecoforecast.org", anonymous = TRUE) 
weather <- arrow::open_dataset(s3) |> 
  dplyr::filter(reference_datetime == "2023-05-25", 
         site_id == "ABOY") |>  
  dplyr::collect()

Useful functions

We run a validator script when processing the submissions. If your submission does not meet the file standards above, you can run a function that provides information describing potential issues. The forecast file needs to be in your local working directory or you need to provide a full path to the file

source("https://raw.githubusercontent.com/eco4cast/tern4cast/main/R/forecast_output_validator.R")
forecast_output_validator("phenology-2022-09-01-persistenceRW.csv.gz")

Null models

Two null models are automatically generated each day - these are simple baseline models. The persistence null model uses the most recent measurement of nee or le and predicts that the values will be constant in the future. The climatology null model forecasts that the nee or lee will be equal to the historical mean of that day of the year. We apply both the persistence and climatology models to the daily fluxes.

Evaluation

Forecasts will be scored using the continuous ranked probability score (CRPS), a proper scoring rule for evaluating forecasts presented as distributions or ensembles (Gneiting & Raftery 2007). The CRPS compares the forecast probability distribution to that of the validation observation and assigns a score based on both the accuracy and precision of the forecast. We will use the ‘crps_sample’ function from the scoringRules package in R to calculate the CRPS for each forecast.

Design team

Terrestrial Ecosystem Research Network (TERN) has been involved in the design of the challenge. ````