site_data <- readr::read_csv("https://raw.githubusercontent.com/eco4cast/tern4cast/main/tern_field_site_metadata.csv") |>
  dplyr::filter(!is.na(data_url))
What: Net ecosystem exchange of CO2 and evapotranspiration in terrestrial ecosystems
Where: 2 TERN sites in Australia
When: Daily forecasts for at least 30 days in the future are accepted at any time. New forecast submissions are accepted daily. The only requirement is that submissions are predictions of the future at the time the forecast is submitted.
Why: Carbon and water cycling are fundamental for climate and water regulation services provided by ecosystems
Who: Open to any individual or team that registers
How: Upload to our submissions system (instructions below)
Overview
The exchange of water and carbon dioxide between the atmosphere and the land is akin to the breathing rate and lung capacity of Earth’s terrestrial ecosystems. One of the best ways to monitor changes in the amount of carbon and water in an ecosystem is the eddy-covariance method. This method observes the net amount of carbon and water entering and exiting ecosystems at half-hourly timesteps, which is important because it can provide information on ecosystem processes such as photosynthesis, respiration, and transpiration, their sensitivities to ongoing climate and land use change, and greenhouse gas budgets for carbon accounting and natural climate solutions. Forecasts of carbon uptake and release along with water use can provide insights into the future production of food, fiber, timber, and carbon credits. Additionally, forecasts will highlight the influence that stress and disturbance have on carbon and water cycling.
Challenge
This forecasting challenge asks teams to forecast net ecosystem exchange of carbon dioxide (NEE) and latent heat flux of evapotranspiration (LE) across 2 TERN sites with differing climates. Forecasts are made at daily time steps over at least the next 30 days. Weather forecasts from the NOAA Global Ensemble Forecast System are provided for use as model drivers (if the forecasting model uses meteorological inputs). Forecasts can be submitted daily.
Teams are asked to submit their forecast of NEE and LE, along with uncertainty estimates. Any existing NEE and LE data may be used to build and improve the forecast models. Other data can be used as inputs to the forecasting model.
Data: Targets
The challenge uses the following TERN data products:
https://portal.tern.org.au/results?topicTerm=flux
A file with previously released TERN data that has been processed into “targets” is provided below. The same processing will be applied to new data that are used for forecast evaluation.
Net ecosystem exchange
Definition
Net ecosystem exchange (NEE) is the net movement of carbon dioxide from the atmosphere to the ecosystem. At the 30-minute time resolution it is reported as \(\mu\)mol CO2 m-2 s-1. At the daily time resolution, it is reported as g C m-2 day-1. Negative values correspond to an ecosystem absorbing CO2 from the atmosphere, positive values correspond to an ecosystem emitting CO2 to the atmosphere.
Motivation
NEE quantifies the net exchange of CO2 between the ecosystem and the atmosphere over a daily time period. Making daily predictions will allow us to rapidly assess skill and provide information in a timeframe pertinent to inform and implement natural resource management.
Latent heat flux
Definition
Latent heat flux is the movement of water as water vapor from the ecosystem to the atmosphere. It is reported as W m-2 (equivalent to J m-2 s-1). At the daily time resolution, it is reported as mean W m-2. Positive values correspond to a transfer of water vapor from the ecosystem to the atmosphere.
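Because LE is an energy flux, it can help to see what a daily mean value implies as a depth of water. The sketch below converts a daily mean LE in W m-2 to evapotranspiration in mm per day. It assumes a constant latent heat of vaporization of about 2.45 MJ kg-1 (a common approximation near 20 °C). The function name le_to_mm_day is hypothetical, and this conversion is not part of the challenge's target processing.

```r
# Latent heat of vaporization of water, assumed constant (~2.45 MJ kg-1 near 20 C).
lambda <- 2.45e6          # J kg-1
seconds_per_day <- 86400

# W m-2 is J m-2 s-1, so LE * seconds_per_day is J m-2 day-1.
# Dividing by lambda gives kg of water m-2 day-1, and 1 kg m-2 is a 1 mm layer.
le_to_mm_day <- function(le_w_m2) {
  le_w_m2 * seconds_per_day / lambda
}

# A daily mean LE of 48.8 W m-2 corresponds to roughly 1.7 mm of water per day.
et_example <- le_to_mm_day(48.8)
```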
Motivation
Latent heat measures the water loss from an ecosystem to the atmosphere through evapotranspiration (transpiration through plants + evaporation from surfaces).
Forecasting latent heat (evapotranspiration) can provide insights into plant water stress, the efficiency with which plants use water relative to NEE, and the amount of liquid water remaining in the soil, which is useful for soil moisture forecasting.
Focal sites
Information on the sites can be found here:
site | site_id | fluxnet_id | latitude | longitude | elevation | mat | map | world_ecoregion | data_url |
---|---|---|---|---|---|---|---|---|---|
Boyagin | ABOY | AU-Boy | -32.4771 | 116.9386 | 484 | NA | NA | NA | https://dap.ozflux.org.au/thredds/dodsC/ozflux/sites/Boyagin/L3/neon_efi/Boyagin_L3_EFI.nc |
Calperum | ACPR | AU-Cpr | -34.0027 | 140.5877 | 60 | 17.9 | 301 | Mediterranean woodlands | https://dap.ozflux.org.au/thredds/dodsC/ozflux/sites/Calperum/L3/neon_efi/Calperum_L3_EFI.nc |
Target data calculation
To create the data for evaluation (and training) for NEE and LE, we extract NEE and LE values that pass the turbulence quality control flags provided by TERN and have flux values between -50 and 50 umol CO2 m-2 s-1.
To evaluate daily flux forecasts, we select only days with at least 24 of 48 half hours that pass the quality control flags. For these days, we average the half-hours and convert carbon to daily units (gC/m2/day). The daily data table has the following columns.
datetime
: YYYY-MM-DD (the day is determined using UTC time)
site_id
: site code (e.g., ACPR)
variable
: nee (g C m-2 day-1) or le (W m-2)
observation
: value for the variable
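The aggregation described above can be sketched in base R on made-up half-hourly data. This is an illustration of the stated rules (QC filter, -50 to 50 range, at least 24 valid half-hours, averaging, unit conversion), not the challenge's actual processing script. The column names nee_umol and qc_pass are hypothetical, and the conversion assumes a molar mass of 12.011 g per mol of carbon.

```r
# Toy half-hourly record for one site: two UTC days of 48 half-hours each.
set.seed(1)
half_hourly <- data.frame(
  datetime = seq(as.POSIXct("2023-01-01 00:00:00", tz = "UTC"),
                 by = "30 min", length.out = 96),
  nee_umol = rnorm(96, mean = -1, sd = 2),   # umol CO2 m-2 s-1
  qc_pass  = TRUE
)
# Make the second day fail the coverage rule: only 20 half-hours pass QC.
half_hourly$qc_pass[69:96] <- FALSE

# Keep values that pass QC and lie within -50 to 50 umol CO2 m-2 s-1.
ok <- half_hourly$qc_pass & abs(half_hourly$nee_umol) <= 50
valid <- half_hourly[ok, ]
valid$date <- as.Date(valid$datetime, tz = "UTC")

# Count and average the valid half-hours for each UTC day.
n_valid  <- tapply(valid$nee_umol, valid$date, length)
mean_nee <- tapply(valid$nee_umol, valid$date, mean)

# umol CO2 m-2 s-1 -> g C m-2 day-1:
# 1e-6 mol per umol * 12.011 g C per mol * 86400 s per day.
umol_to_gC_day <- 1e-6 * 12.011 * 86400

# Only days with at least 24 of 48 valid half-hours are retained.
keep <- as.vector(n_valid >= 24)
daily <- data.frame(
  datetime    = as.Date(names(mean_nee)[keep]),
  site_id     = "ACPR",
  variable    = "nee",
  observation = as.numeric(mean_nee[keep]) * umol_to_gC_day
)
```

In this toy example only the first day survives, because the second day has 20 valid half-hours. LE would follow the same steps without the unit conversion, since it is reported as a daily mean in W m-2.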
Here is the download link and format of the terrestrial_daily target file:
readr::read_csv("https://data.ecoforecast.org/tern4cast-targets/terrestrial_daily/terrestrial_daily-targets.csv.gz", guess_max = 1e6) |>
  na.omit()
# A tibble: 13,857 × 4
datetime site_id variable observation
<date> <chr> <chr> <dbl>
1 2010-07-30 ACPR le 48.8
2 2010-07-31 ACPR le 22.6
3 2010-07-31 ACPR nee -0.296
4 2010-08-01 ACPR le 69.6
5 2010-08-01 ACPR nee -1.27
6 2010-08-02 ACPR le 21.1
7 2010-08-02 ACPR nee -0.375
8 2010-08-03 ACPR le 15.7
9 2010-08-03 ACPR nee -0.563
10 2010-08-04 ACPR le 9.48
# ℹ 13,847 more rows
Timeline
Forecasts for a minimum of 30 days into the future can be submitted daily. New forecasts can be submitted daily as new weather forecasts and observations (e.g., NEE) become available. The key is that submissions are predictions of the future.
Daily submissions are allowed and encouraged as new observations and weather forecasts become available, therefore the automation of forecast generation may be ideal. There are many ways to automate scripts that are written to download observations and meteorology drivers, generate forecasts, and submit forecasts.
Flux data latency
There is a latency of approximately 2 days between data collection and availability in the targets file.
Submissions
The required names for forecasted variables: nee or le.
The required time unit: YYYY-MM-DD
The following provides the requirements for the format of the forecasts that will be submitted. It is important to follow these format guidelines in order for your submitted forecasts to pass a set of internal checks that allow the forecast to be visualized and scored correctly.
Steps to submitting
We provide an overview of the steps for submitting with the details below:
- Submit registration and model overview for the model using a web form.
- Generate forecast with required columns. There are three options for the file format described below.
- Write the forecast to a file that follows the standardized naming format.
- Submit forecast (preferably using the submission function that we provide).
Step 1: Forecast file format
The file is in csv format with the following columns:
datetime
: forecast timestamp. Format %Y-%m-%d.
reference_datetime
: The start of the forecast; this should be 0 time steps in the future. There should be only one value of reference_datetime in the file. Format is %Y-%m-%d %H:%M:%S for terrestrial_30min theme and %Y-%m-%d for all other themes.
site_id
: site code
family
: name of the probability distribution that is described by the parameter values in the parameter column; only normal or ensemble are currently allowed.
parameter
: required to be the string mu or sigma (see note below about parameter column) or the number of the ensemble member.
variable
: standardized variable name for the theme
prediction
: forecasted value for the parameter in the parameter column
model_id
: the short name of the model defined as the model_id in the file name (see below) and in your metadata. The model_id should have no spaces.
The time unit (i.e., date or date-time) should correspond to the time unit of the theme-specific target file (see Theme description). All datetimes with time zones should use UTC as the time zone.
For a parametric forecast, the family column uses the word normal to designate a normal distribution and the parameter column must have values of mu and sigma for each forecasted variable, site_id, and time.
If you are submitting a forecast that does not have a normal distribution we recommend using the ensemble format and sampling from your non-normal distribution to generate a set of ensemble members that represents your distribution.
We aim to support more distributions beyond the normal in the distribution format file (e.g., log-normal distribution to support zero-bounded forecast submissions). Please check back at this site for updates on the list of supported distributions.
Here is an example of a forecast that uses a normal distribution:
readr::read_csv("https://data.ecoforecast.org/neon4cast-forecasts/raw/aquatics/aquatics-2022-11-07-climatology.csv.gz", show_col_types = FALSE)
# A tibble: 68 × 8
datetime variable parameter family model_id reference_datetime site_id
<date> <chr> <chr> <chr> <chr> <date> <chr>
1 2022-11-07 temperature mu normal GLEON_phy… 2022-11-07 BARC
2 2022-11-07 temperature sigma normal GLEON_phy… 2022-11-07 BARC
3 2022-11-08 temperature mu normal GLEON_phy… 2022-11-07 BARC
4 2022-11-08 temperature sigma normal GLEON_phy… 2022-11-07 BARC
5 2022-11-09 temperature mu normal GLEON_phy… 2022-11-07 BARC
6 2022-11-09 temperature sigma normal GLEON_phy… 2022-11-07 BARC
7 2022-11-10 temperature mu normal GLEON_phy… 2022-11-07 BARC
8 2022-11-10 temperature sigma normal GLEON_phy… 2022-11-07 BARC
9 2022-11-11 temperature mu normal GLEON_phy… 2022-11-07 BARC
10 2022-11-11 temperature sigma normal GLEON_phy… 2022-11-07 BARC
# ℹ 58 more rows
# ℹ 1 more variable: prediction <dbl>
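As a rough illustration of the parametric layout, the sketch below builds a 30-day normal-distribution forecast table for nee in base R. The mu and sigma values are placeholders rather than output from a real model, and the model_id "example_model" is hypothetical.

```r
reference_datetime <- as.Date("2023-05-25")
horizon <- 30
dates <- reference_datetime + seq_len(horizon)

# Placeholder predictions: a constant mean with uncertainty growing with lead time.
mu    <- rep(-0.5, horizon)
sigma <- 0.2 + 0.02 * seq_len(horizon)

# Two rows per forecast date: one for mu and one for sigma.
forecast <- data.frame(
  datetime           = rep(dates, each = 2),
  reference_datetime = reference_datetime,
  site_id            = "ACPR",
  family             = "normal",
  parameter          = rep(c("mu", "sigma"), times = horizon),
  variable           = "nee",
  # rbind() then as.vector() interleaves mu[1], sigma[1], mu[2], sigma[2], ...
  prediction         = as.vector(rbind(mu, sigma)),
  model_id           = "example_model"
)
```

A table like this could then be written with write.csv(forecast, file_name, row.names = FALSE), where file_name follows the naming convention required for submissions.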
For an ensemble (or sample) forecast, the family column uses the word ensemble to designate that it is an ensemble forecast and the parameter column is the ensemble member number (1, 2, 3 …)
readr::read_csv("https://data.ecoforecast.org/neon4cast-forecasts/raw/aquatics/aquatics-2022-11-07-persistenceRW.csv.gz", show_col_types = FALSE)
# A tibble: 523,600 × 8
model_id datetime reference_datetime site_id family parameter variable
<chr> <date> <date> <chr> <chr> <dbl> <chr>
1 persistenceRW 2022-11-08 2022-11-07 BARC ensem… 1 chla
2 persistenceRW 2022-11-09 2022-11-07 BARC ensem… 1 chla
3 persistenceRW 2022-11-10 2022-11-07 BARC ensem… 1 chla
4 persistenceRW 2022-11-11 2022-11-07 BARC ensem… 1 chla
5 persistenceRW 2022-11-12 2022-11-07 BARC ensem… 1 chla
6 persistenceRW 2022-11-13 2022-11-07 BARC ensem… 1 chla
7 persistenceRW 2022-11-14 2022-11-07 BARC ensem… 1 chla
8 persistenceRW 2022-11-15 2022-11-07 BARC ensem… 1 chla
9 persistenceRW 2022-11-16 2022-11-07 BARC ensem… 1 chla
10 persistenceRW 2022-11-17 2022-11-07 BARC ensem… 1 chla
# ℹ 523,590 more rows
# ℹ 1 more variable: prediction <dbl>
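The recommendation to sample from a non-normal distribution can be sketched as follows. Here a hypothetical log-normal forecast of le is drawn 100 times per day and reshaped into the ensemble layout. The distribution parameters, the number of members, and the model_id are all assumptions made for illustration.

```r
set.seed(42)
reference_datetime <- as.Date("2023-05-25")
dates <- reference_datetime + 1:30
n_members <- 100

# Hypothetical skewed forecast: log-normal draws, one matrix column per forecast day,
# with spread that widens as lead time increases.
draws <- sapply(seq_along(dates), function(i) {
  rlnorm(n_members, meanlog = log(40), sdlog = 0.1 + 0.01 * i)
})

# as.vector() reads the matrix column by column, so the rows run through
# members 1..100 for day 1, then members 1..100 for day 2, and so on.
ensemble <- data.frame(
  datetime           = rep(dates, each = n_members),
  reference_datetime = reference_datetime,
  site_id            = "ABOY",
  family             = "ensemble",
  parameter          = rep(seq_len(n_members), times = length(dates)),
  variable           = "le",
  prediction         = as.vector(draws),
  model_id           = "example_model"
)
```

Note that drawing each day independently, as done here, means a given member number does not trace a temporally coherent trajectory. A process model that simulates each member forward in time would preserve that structure.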
Step 3: Submission process
Individual forecast files can be uploaded any time.
The correct file name and format are critical for the automated processing of submissions.
Teams can submit their forecast csv files through an R function.
We have developed a function called submit() that is available in this GitHub repo:
library(aws.s3)
source("https://raw.githubusercontent.com/eco4cast/tern4cast/main/R/submit.R")
forecast_file <- "theme_name-forecast-year-month-day-model_id.csv"
submit(forecast_file = forecast_file,
       metadata = NULL,
       ask = FALSE)
Alternatively, if you are using another programming language, you can submit using AWS S3-like tools (e.g., the aws.s3 R package) to the tern4cast-submissions bucket at the data.ecoforecast.org endpoint.
Submissions need to adhere to the forecast format that is provided above, including the file naming convention. Our cyberinfrastructure automatically evaluates forecasts and relies on the expected formatting. Contact eco4cast.initiative@gmail.com if you experience technical issues with submitting.
Meteorological inputs for modeling
s3 <- arrow::s3_bucket("tern4cast-drivers/noaa/gefs-v12/stage1", endpoint_override = "data.ecoforecast.org", anonymous = TRUE)
weather <- arrow::open_dataset(s3) |>
  dplyr::filter(reference_datetime == "2023-05-25",
                site_id == "ABOY") |>
  dplyr::collect()
Useful functions
We run a validator script when processing the submissions. If your submission does not meet the file standards above, you can run a function that provides information describing potential issues. The forecast file needs to be in your local working directory, or you need to provide a full path to the file.
source("https://raw.githubusercontent.com/eco4cast/tern4cast/main/R/forecast_output_validator.R")
forecast_output_validator("phenology-2022-09-01-persistenceRW.csv.gz")
Null models
Two null models are automatically generated each day; these are simple baseline models. The persistence null model uses the most recent measurement of nee or le and predicts that the values will be constant in the future. The climatology null model forecasts that the nee or le will be equal to the historical mean of that day of the year. We apply both the persistence and climatology models to the daily fluxes.
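The two baselines can be sketched in a few lines of base R. This is a minimal illustration of the ideas described above, using made-up observations for a single site and variable. It produces point values only and is not the code the challenge runs, which may also attach uncertainty estimates.

```r
# Toy daily targets for one site and variable (values are made up).
targets <- data.frame(
  datetime = seq(as.Date("2021-01-01"), as.Date("2022-12-31"), by = "day")
)
doy <- as.integer(format(targets$datetime, "%j"))
targets$observation <- -1 + sin(2 * pi * doy / 365)

forecast_dates <- as.Date("2023-01-01") + 0:29

# Persistence: carry the most recent observation forward unchanged.
last_obs <- targets$observation[which.max(targets$datetime)]
persistence <- data.frame(datetime = forecast_dates, prediction = last_obs)

# Climatology: mean of past observations that share the same day of year.
doy_mean <- tapply(targets$observation, doy, mean)
forecast_doy <- as.integer(format(forecast_dates, "%j"))
climatology <- data.frame(
  datetime   = forecast_dates,
  prediction = as.numeric(doy_mean[as.character(forecast_doy)])
)
```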
Evaluation
Forecasts will be scored using the continuous ranked probability score (CRPS), a proper scoring rule for evaluating forecasts presented as distributions or ensembles (Gneiting & Raftery 2007). The CRPS compares the forecast probability distribution to that of the validation observation and assigns a score based on both the accuracy and precision of the forecast. We will use the crps_sample function from the scoringRules package in R to calculate the CRPS for each forecast.
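To build intuition for how the score rewards accuracy and precision, the sketch below implements the standard sample-based CRPS estimator in base R: the mean absolute error of the ensemble members minus half the mean absolute difference between all pairs of members. The function name crps_from_sample is hypothetical. It is meant to illustrate the quantity being estimated; exact agreement with the default method of crps_sample is an assumption, not something verified here.

```r
crps_from_sample <- function(y, x) {
  # Accuracy term: mean absolute error of the members against the observation y.
  accuracy <- mean(abs(x - y))
  # Spread term: mean absolute difference over all pairs of members.
  spread <- mean(abs(outer(x, x, "-")))
  accuracy - 0.5 * spread
}

obs <- -0.5
centered_members <- c(-0.6, -0.5, -0.4)   # narrow ensemble centered on the observation
biased_members   <- c(0.9, 1.0, 1.1)      # equally narrow ensemble far from it

score_centered <- crps_from_sample(obs, centered_members)
score_biased   <- crps_from_sample(obs, biased_members)
```

Lower scores are better, so the centered ensemble scores better than the biased one even though both have the same spread. For a single-member ensemble the spread term is zero and the score reduces to the absolute error.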
Design team
Terrestrial Ecosystem Research Network (TERN) has been involved in the design of the challenge.