2  Submission Instructions

The following provides the requirements for the format of the forecasts that will be submitted. It is important to follow these format guidelines in order for your submitted forecasts to pass a set of internal checks that allow the forecast to be visualized on the NEON Ecological Forecast Challenge dashboard and evaluated with the scoring process.

2.1 Steps to submitting

We provide an overview of the steps for submitting with the details below:

  1. Submit registration and model overview for the model using a web form. 
  2. Generate forecast with required columns. There are three options for the file format described below.
  3. Write forecast to a file that follows the standardized naming format.
  4. Submit forecast (preferably using the submission function that we provide)/

2.2 Step 1: Forecast file format

There are two key options for the format. First, the file can be either in a csv or a netcdf format. Second the file can represent uncertainty using a family and parameter column.

For an ensemble (or sample) forecast, the family column uses the word ensemble to design that it is a ensemble forecast and the parameter column is the ensemble member number (1, 2, 3 …)

For a parametric forecast, the family column uses the word normal to designate a normal distribution and the parameter column must have values of mu and sigma for each forecasted variable, site_id, and time.

If you are submitting a forecast that does not have a normal distribution we recommend using the ensemble format and sampling from your non-normal distribution to generate a set of ensemble members that represents your distribution.

We aim to support more distributions beyond the normal in the distribution format file (e.g., log-normal distribution to support zero-bounded forecast submissions). Please check back at this site for updates on the list of supported distributions.

2.2.1 CSV

  • datetime: forecast timestamp. Format is %Y-%m-%d %H:%M:%S for terrestrial_30min theme and %Y-%m-%d for all other themes.
  • reference_datetime: The start of the forecast; this should be 0 times steps in the future. This should only be one value of reference_datetime in the file. Format is %Y-%m-%d %H:%M:%S for terrestrial_30min theme and %Y-%m-%d for all other themes.
  • site_id: NEON code for site
  • family name of probability distribution that is described by the parameter values in the parameter column; only normal or ensemble are currently allowed.
  • parameter required to be the string mu or sigma (see note below about parameter column) or the number of the ensemble member.
  • variable: standardized variable name for the theme
  • prediction: forecasted value for parameter in the parameter column
  • model_id: the short name of model defined as the model_id in the file name (see below) and in your metadata. The model_id should have no spaces.

The time unit (i.e., date or date-time) should correspond to the time unit of the theme specific target file (see Theme description). All datetimes with time zones should use UTC as the time zone.

Here is an example of a forecast that uses a normal distribution:

readr::read_csv("https://data.ecoforecast.org/neon4cast-forecasts/raw/aquatics/aquatics-2022-11-07-climatology.csv.gz", show_col_types = FALSE)
# A tibble: 68 × 8
   datetime   variable    parameter family model_id   reference_datetime site_id
   <date>     <chr>       <chr>     <chr>  <chr>      <date>             <chr>  
 1 2022-11-07 temperature mu        normal GLEON_phy… 2022-11-07         BARC   
 2 2022-11-07 temperature sigma     normal GLEON_phy… 2022-11-07         BARC   
 3 2022-11-08 temperature mu        normal GLEON_phy… 2022-11-07         BARC   
 4 2022-11-08 temperature sigma     normal GLEON_phy… 2022-11-07         BARC   
 5 2022-11-09 temperature mu        normal GLEON_phy… 2022-11-07         BARC   
 6 2022-11-09 temperature sigma     normal GLEON_phy… 2022-11-07         BARC   
 7 2022-11-10 temperature mu        normal GLEON_phy… 2022-11-07         BARC   
 8 2022-11-10 temperature sigma     normal GLEON_phy… 2022-11-07         BARC   
 9 2022-11-11 temperature mu        normal GLEON_phy… 2022-11-07         BARC   
10 2022-11-11 temperature sigma     normal GLEON_phy… 2022-11-07         BARC   
# ℹ 58 more rows
# ℹ 1 more variable: prediction <dbl>

Here is an example of a forecast that uses ensembles:

readr::read_csv("https://data.ecoforecast.org/neon4cast-forecasts/raw/aquatics/aquatics-2022-11-07-persistenceRW.csv.gz", show_col_types = FALSE)
# A tibble: 523,600 × 8
   model_id      datetime   reference_datetime site_id family parameter variable
   <chr>         <date>     <date>             <chr>   <chr>      <dbl> <chr>   
 1 persistenceRW 2022-11-08 2022-11-07         BARC    ensem…         1 chla    
 2 persistenceRW 2022-11-09 2022-11-07         BARC    ensem…         1 chla    
 3 persistenceRW 2022-11-10 2022-11-07         BARC    ensem…         1 chla    
 4 persistenceRW 2022-11-11 2022-11-07         BARC    ensem…         1 chla    
 5 persistenceRW 2022-11-12 2022-11-07         BARC    ensem…         1 chla    
 6 persistenceRW 2022-11-13 2022-11-07         BARC    ensem…         1 chla    
 7 persistenceRW 2022-11-14 2022-11-07         BARC    ensem…         1 chla    
 8 persistenceRW 2022-11-15 2022-11-07         BARC    ensem…         1 chla    
 9 persistenceRW 2022-11-16 2022-11-07         BARC    ensem…         1 chla    
10 persistenceRW 2022-11-17 2022-11-07         BARC    ensem…         1 chla    
# ℹ 523,590 more rows
# ℹ 1 more variable: prediction <dbl>

2.2.2 Netcdf

A netcdf should have the following variables

  • datetime: The start of the forecast
  • site_id: NEON code for site
  • parameter integer value for forecast replicate member (i.e. ensemble member or MCMC sample);

and include additional variables with names that correspond to the theme-specific standardized variables. For example, a netcdf from the aquatics theme would have temperature, oxygen, and chla variables.

The order of dimensions on forecast variables should be: time, site, and parameter. The netcdf format should only be used for ensemble-based forecasts.

The time unit (i.e., date or date-time) should correspond to the time unit of the theme specific target file (see Theme description)

A netcdf should have the following global attributes:

  • the reference_datetime variable, which is the start of the forecast (should be 0 times steps in the future).
  • the model_id, the short name of model defined as the model_id in the file name (see below) and in your metadata. The model_id should have no spaces.
netcdf terrestrial_30min-2022-10-07-climatology {
dimensions:
        datetime = 1681 ;
        site = 47 ;
        parameter = 200 ;
        nchar = 4 ;
variables:
        double datetime(datetime) ;
                datetime:units = "seconds since 2022-10-07" ;
                datetime:long_name = "datetime" ;
        int site(site) ;
                site:long_name = "NEON siteID" ;
        int parameter(parameter) ;
                parameter:long_name = "ensemble member" ;
        double nee(parameter, site, datetime) ;
                nee:units = "umol CO2 m-2 s-1" ;
                nee:_FillValue = 1.e+32 ;
                nee:long_name = "net ecosystem exchange of CO2" ;
        double le(parameter, site, datetime) ;
                le:units = "W m-2" ;
                le:_FillValue = 1.e+32 ;
                le:long_name = "latent heat flux" ;
        char site_id(site, nchar) ;
                site_id:long_name = "NEON site codes" ;

// global attributes:
                :reference_datetime = "2022-10-07 00:00:00" ;
                :model_id = "climatology" ;
}

2.3 Step 2: File name

the correct naming convention is critical for the automated processing of submissions

Teams will submit their forecasts as a single netCDF or csv file with the following naming convention:

theme_name-year-month-day-model_id.csv (or .nc if a netcdf). Compressed csv files with the csv.gz extension are accepted. R will automatically compress if you add the csv.gz as the extension of your file name when running write.csv or write_csv.

  1. The theme_name options are: terrestrial_daily, terrestrial_30min, aquatics, beetles, ticks, or phenology.

  2. The year, month, and day are the year, month, and day the reference_datetime (horizon = 0). For example, if a forecast starts today and tomorrow is the first forecasted day, horizon = 0 would be today, and used in the file name.

  3. model_id is the code for the model name that you specified in the model metadata Google Form (model_id has no spaces in it).

2.4 Step 3: Metadata format

For multi-model analysis, it is important to provide a description of your forecasting approach. There is a required f orm and an optional EML metadata

  1. Required: Complete the registration/model overview form that describes your model and provide your model_id. You need to fill out the form once per model_id. If you revise your modeling approach associated with a model_id then you will need to resubmit the form. A revised model does not change the project_id

  2. Optional: You can submit an xml file that provides metadata that follow the proposed EFI standards. If you chose to submit metadata will your forecast submission, the metadata file must have the same name as the submitted forecast, but with the .xml extension (theme_name-year-month-day-model_id.xml). Metadata files should be uploaded with the forecast files. The metadata standard has been designed by the Ecological Forecasting Initiative and is built off the widely used Ecological Metadata Language (EML). A guide for completing the metadata is in Appendix A.

2.5 Step 4: Submission process

Individual forecast (csv, csv.gz, netCDF) files can be uploaded any time.

The correct file name and format is critical for the automated processing of submissions

Teams will submit their forecast netCDF or csv files through an R function.

We have developed a function called submit() in the neon4cast package handles submission process.

neon4cast::submit(forecast_file = "theme_name-forecast-year-month-day-model_id.csv")

Alternatively, if you using another programming language, you can submit using AWS S3-like tools (i.e., aws.s3 R package) to the neon4cast-submissions bucket at the data.ecoforecast.org endpoint.

Submissions need to adhere to the forecast format that is provided above, including the file naming convention. Our cyberinfastructure automatically evaluates forecasts and relies on the expected formatting. Contact eco4cast.initiative@gmail.com if you experience technical issues with submitting.

Note: If you have used AWS in the past you might have credential files in an .aws folder in your home directory that will cause an error when you try to upload to a non-AWS bucket. If you encounter this error you may need to rename your credentials files so put_object doesn’t try to read them.

2.6 Validating submission

You can check the status of your submission using the following function in the neon4cast package

neon4cast::check_submission("phenology-2022-02-07-persistence.nc")

A successful submission can be found at the following links within 2 hours of submissions

We run a validator script when processing the submissions. If your submission does not meet the file standards above, you can run a function that provides information describing potential issues. The forecast file needs to be in your local working directory or you need to provide a full path to the file

neon4cast::forecast_output_validator("phenology-2022-09-01-persistenceRW.csv.gz")

2.7 Visualizing submissions

Plots of submissions and table of scores can be found at our dashboard

2.8 Video Describing How to Submit to the Challenge

This video was recorded for the 2021 Early Career Annual Meeting