3 Theme: Tick Populations
What: Amblyomma americanum and Ixodes scapularis nymphal tick abundance per sampled area
Where: 22 plots at 7 NEON sites
When: Weekly forecasts for 34 weeks into the future. Forecasts are submitted monthly from March 31 through October 31, 2021, with training data available January 31, 2021. Teams may make their first submission after the March 31 start.
Why: There is a correlation between tick population abundance and disease incidence, meaning forecasts for tick abundance have the potential to aid in our understanding of disease risk through time and space.
Who: Open to any individual or team that registers
How: Register your team and submit a forecast
We held a Q&A session on March 24, 2021. You can find a recording from that session HERE.
Target species for the population forecasts are Amblyomma americanum and Ixodes scapularis nymphal ticks. A. americanum is a vector of ehrlichiosis, tularemia, and southern tick-associated rash illness, while I. scapularis is a vector for Lyme disease, the most prevalent tick-borne disease in North America. Both species are present in the eastern United States, and have been collected at numerous NEON sites. There is a correlation between tick population abundance and disease incidence, meaning forecasts for tick abundance have the potential to aid in our understanding of disease risk through time and space.
The challenge is open to any individual, group, or institution that may want to participate. The goals of this challenge are to forecast total Ixodes scapularis and Amblyomma americanum nymphs each epidemiological week (Sun-Sat) per sampled area at a set of NEON plots within NEON sites. Due to challenges in data collected in 2020, this round of the forecasting challenge will simulate a true forecasting challenge by focusing on data from the 2019 field season.
Teams must post information about any additional data they wish to use on the theme Slack channel so that other teams can potentially use the data as well.
3.3 Data: Targets
The challenge uses the following NEON data products:
DP1.10093.001: Ticks sampled using drag cloths
Total Ixodes scapularis will be forecast for the following plots (siteID_plotID):
BLAN_012, BLAN_005, SCBI_013, SCBI_002, SERC_001, SERC_005, SERC_006, SERC_012, ORNL_007
Total Amblyomma americanum will be forecast for the following plots (siteID_plotID):
SCBI_013, SERC_001, SERC_005, SERC_006, SERC_002, SERC_012, KONZ_025, UKFS_001, UKFS_004, UKFS_003, ORNL_002, ORNL_040, ORNL_008, ORNL_007, ORNL_009, ORNL_003, TALL_001, TALL_008, TALL_002
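As a quick consistency check, the two plot lists above can be combined and split on the siteID_plotID naming pattern. This is a minimal sketch in base R; the variable names (`ixodes_plots`, `amblyomma_plots`, `site_of`) are illustrative and not part of any challenge tooling.

```r
# Plot lists copied from the text above (format: siteID_plotID).
ixodes_plots <- c("BLAN_012", "BLAN_005", "SCBI_013", "SCBI_002", "SERC_001",
                  "SERC_005", "SERC_006", "SERC_012", "ORNL_007")
amblyomma_plots <- c("SCBI_013", "SERC_001", "SERC_005", "SERC_006", "SERC_002",
                     "SERC_012", "KONZ_025", "UKFS_001", "UKFS_004", "UKFS_003",
                     "ORNL_002", "ORNL_040", "ORNL_008", "ORNL_007", "ORNL_009",
                     "ORNL_003", "TALL_001", "TALL_008", "TALL_002")

# Plots that appear in either list, without duplicates.
all_plots <- union(ixodes_plots, amblyomma_plots)

# The site code is everything before the underscore.
site_of <- sub("_.*$", "", all_plots)
plots_per_site <- table(site_of)

print(length(all_plots))        # number of distinct plots
print(length(unique(site_of)))  # number of distinct sites
print(plots_per_site)
```

The counts agree with the summary at the top of the page: 22 distinct plots across 7 sites, with several plots shared between the two species.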
A file with previously released NEON data that has been processed into “targets” is provided below. The same processing will be applied to new data that are used for forecast evaluation. This processing script is available in the neon4cast-ticks GitHub repository.
3.3.1 Amblyomma americanum nymphs
Total Amblyomma americanum nymphs per week per plot. This is determined by the number of individuals caught and identified to species each epidemiological week at each plot. Each tick caught is identified to the lowest taxonomic level possible, and we are only interested in nymphal ticks identified to species (rather than only to family, order, genus, etc.).
This species is a vector of disease, so forecasting tick abundance can potentially aid in assessing disease risk.
3.3.2 Ixodes scapularis nymphs
Total Ixodes scapularis nymphs per week per plot. This is determined by the number of individuals caught and identified to species each epidemiological week at each plot. Each tick caught is identified to the lowest taxonomic level possible, and we are only interested in nymphal ticks identified to species (rather than only to family, order, genus, etc.).
This species is a vector of disease, so forecasting tick abundance can potentially aid in assessing disease risk.
3.3.3 Focal sites
|Site Name||SiteID||NEON Domain||Latitude||Longitude||Ixodes scapularis Plots||Amblyomma americanum Plots|
|Blandy Experimental Farm, VA||BLAN||D02: Mid-Atlantic||39.06026||-78.07164||BLAN_012, BLAN_005||none|
|Smithsonian Conservation Biology Institute, VA||SCBI||D02: Mid-Atlantic||38.89292||-78.1395||SCBI_013, SCBI_002||SCBI_013|
|Smithsonian Environmental Research Center, MD||SERC||D02: Mid-Atlantic||38.89008||-76.56001||SERC_001, SERC_005, SERC_006, SERC_012||SERC_001, SERC_002, SERC_005, SERC_006, SERC_012|
|Oak Ridge, TN||ORNL||D07: Appalachians & Cumberland Plateau||35.96412||-84.2826||ORNL_007||ORNL_002, ORNL_003, ORNL_007, ORNL_008, ORNL_009, ORNL_040|
|Konza Prairie Biological Station, KS||KONZ||D06: Prairie Peninsula||39.10077||-96.56309||none||KONZ_025|
|The University of Kansas Field Station, KS||UKFS||D06: Prairie Peninsula||39.04043||-95.19215||none||UKFS_001, UKFS_003, UKFS_004|
|Talladega National Forest, AL||TALL||D08: Ozarks Complex||32.95046||-87.39327||none||TALL_001, TALL_002, TALL_008|
3.3.4 Target data calculation
The data used for this challenge is a subset of the full NEON tick data set. While ticks of multiple species have been identified at most NEON sites, not all species-by-site combinations have enough non-zero observations to build adequate population models. Therefore, the targets for this challenge are A. americanum and I. scapularis nymphs, which represent the two most abundant species observed at NEON. Additionally, the plots for which forecasts will be made are those where these ticks were identified at least three times each year from 2016 to 2018. The latency for taxonomic identifications of the NEON tick field data is roughly one year (meaning forecasts for 2021 won’t be validated until 2022), and the 2020 field season was irregular due to the COVID-19 pandemic. Therefore, the target year of 2019 was chosen so that forecasts can be evaluated in a timely manner for a regular field season.
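The plot-selection rule (at least three identifications in each year from 2016 to 2018) can be sketched in base R as follows. This is an illustration on made-up data, not the design team's processing script; the data frame `obs`, the plot names, and the helper `plots_meeting_rule()` are hypothetical, and "identified" is assumed here to mean a plot-week with a non-zero count.

```r
# Toy observations: one row per plot-week with a species count (made-up values).
obs <- data.frame(
  plotID = c(rep("AAAA_001", 9), rep("BBBB_002", 7)),
  Year   = c(rep(2016:2018, each = 3), 2016, 2016, 2016, 2017, 2017, 2018, 2018),
  count  = c(2, 1, 4, 3, 1, 1, 5, 2, 1, 1, 2, 3, 1, 1, 2, 0)
)

# Hypothetical helper: keep plots with at least `min_detections` non-zero
# counts in every one of `years` (assumed reading of the rule in the text).
plots_meeting_rule <- function(obs, years = 2016:2018, min_detections = 3) {
  detected <- obs[!is.na(obs$count) & obs$count > 0 & obs$Year %in% years, ]
  # Rows are plots, columns are years; combinations with no detections count as 0.
  n <- table(factor(detected$plotID, levels = unique(obs$plotID)),
             factor(detected$Year, levels = years))
  keep <- apply(n >= min_detections, 1, all)
  names(keep)[keep]
}

selected <- plots_meeting_rule(obs)
print(selected)
```

In this toy example only the first plot is retained, because the second has fewer than three detections in 2017 and 2018.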
Use of 2019 data: The forecasting challenge is for the 2019 field season, so tick observations and environmental covariates are already known. However, in the spirit of keeping this as much of a “forecasting” challenge as possible, 2019 data (tick and environmental covariates) can only be used according to the schedule described in the timeline section below. For example, if a forecast is submitted on May 31st and a team uses temperature as a covariate in their model, it is up to the team to forecast temperature from May 31st through the end of the season. This policy is in place because forecasts that use the observed temperature from May 31 through the end of the season would be overconfident.
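One way a team might enforce this rule in its own workflow is to blank out any 2019 covariate values dated after the cutoff before fitting or forecasting. The sketch below uses base R and made-up temperatures; `mask_after()` is a hypothetical helper, and choosing the correct cutoff for each submission remains the team's responsibility under the timeline.

```r
# Toy weekly covariate series for part of 2019 (values are made up).
weekly <- data.frame(
  time            = as.Date("2019-05-05") + 7 * 0:7,
  airTempMax_degC = c(24.1, 25.3, 27.0, 28.4, 29.9, 30.2, 31.5, 31.1)
)

# Hypothetical helper: set the listed columns to NA for rows after `cutoff`.
mask_after <- function(df, cutoff, cols) {
  future <- df$time > as.Date(cutoff)
  df[future, cols] <- NA
  df
}

# Mirror the May 31 example from the text.
masked <- mask_after(weekly, cutoff = "2019-05-31", cols = "airTempMax_degC")
print(masked)
```

The masked weeks would then be filled by the team's own covariate forecast, with its uncertainty propagated into the tick forecast, rather than by the observed values.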
3.3.5 Target file
Here is the format of the target file:
readr::read_csv("https://data.ecoforecast.org/targets/ticks/ticks-targets.csv.gz", guess_max = 1e6)
## # A tibble: 3,024 x 21
##     Year epiWeek yearWeek plotID   siteID nlcdClass       decimalLatitude
##    <dbl>   <dbl>    <dbl> <chr>    <chr>  <chr>                     <dbl>
##  1  2015      37   201537 BLAN_005 BLAN   deciduousForest            39.1
##  2  2015      38   201538 BLAN_005 BLAN   deciduousForest            39.1
##  3  2015      39   201539 BLAN_005 BLAN   deciduousForest            39.1
##  4  2015      40   201540 BLAN_005 BLAN   deciduousForest            39.1
##  5  2015      41   201541 BLAN_005 BLAN   deciduousForest            39.1
##  6  2015      42   201542 BLAN_005 BLAN   deciduousForest            39.1
##  7  2015      43   201543 BLAN_005 BLAN   deciduousForest            39.1
##  8  2015      44   201544 BLAN_005 BLAN   deciduousForest            39.1
##  9  2015      45   201545 BLAN_005 BLAN   deciduousForest            39.1
## 10  2015      46   201546 BLAN_005 BLAN   deciduousForest            39.1
## # … with 3,014 more rows, and 14 more variables: decimalLongitude <dbl>,
## #   elevation <dbl>, totalSampledArea <dbl>, amblyomma_americanum <dbl>,
## #   ixodes_scapularis <dbl>, time <date>, RHMin_precent <dbl>,
## #   RHMin_variance <dbl>, RHMax_precent <dbl>, RHMax_variance <dbl>,
## #   airTempMin_degC <dbl>, airTempMin_variance <dbl>, airTempMax_degC <dbl>,
## #   airTempMax_variance <dbl>
Year: Year of observation
epiWeek: Week of the year (integer, WW), following the same rules as the ISO week but starting on Sunday, consistent with the CDC version of the epidemiological week
yearWeek: (YYYYWW) Year week, combination of year and epidemiological week
time: (YYYY-MM-DD), the first day of the epidemiological week, as defined by
ixodes_scapularis: Count. If no observation in the associated yearWeek: NA
amblyomma_americanum: Count. If no observation in the associated yearWeek: NA
plotID: Plot where ticks are observed (HARV_002)
siteID: Site where ticks are observed (HARV)
nlcdClass: Land cover classification (mixedForest)
decimalLatitude: Latitude of the site
decimalLongitude: Longitude of the site
elevation: Elevation of the plot (meters)
totalSampledArea: Area sampled by drag cloth (sq. m). If there is no sampling event in a given week: NA
RHMin_percent: The minimum relative humidity percent recorded in the associated yearWeek. If no observation: NA
RHMin_variance: Variance (percent squared) of the minimum relative humidity recorded in the associated yearWeek. If no observation: NA
RHMax_percent: The maximum relative humidity percent recorded in the associated yearWeek. If no observation: NA
RHMax_variance: Variance (percent squared) of the maximum relative humidity recorded in the associated yearWeek. If no observation: NA
airTempMin_degC: The minimum air temperature, in degrees Celsius, recorded in the associated yearWeek. If no observation: NA
airTempMin_variance: Variance (degrees Celsius squared) of the minimum air temperature recorded in the associated yearWeek. If no observation: NA
airTempMax_degC: The maximum air temperature, in degrees Celsius, recorded in the associated yearWeek. If no observation: NA
airTempMax_variance: Variance (degrees Celsius squared) of the maximum air temperature recorded in the associated yearWeek. If no observation: NA
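The epiWeek, yearWeek, and time columns can be related to a calendar date with a short base R sketch. It assumes the CDC (MMWR) convention: weeks run Sunday to Saturday, and week 1 is the first week with at least four days in the new year. The helper `epi_week_info()` and its `epiYear` column are illustrative and not taken from the challenge's processing script, which may treat the year differently around January 1. The lubridate package offers `epiweek()` for the same convention.

```r
# Hypothetical helper: CDC/MMWR-style epidemiological week for a vector of dates.
epi_week_info <- function(date) {
  date <- as.Date(date)
  # Sunday that starts the week containing `date` (%w gives 0 for Sunday).
  week_start <- date - as.integer(format(date, "%w"))
  # A week belongs to the year that contains its Wednesday (4 of its 7 days).
  epi_year <- as.integer(format(week_start + 3, "%Y"))
  # Week 1 is the Sunday-to-Saturday week containing January 4 of that year.
  jan4 <- as.Date(paste0(epi_year, "-01-04"))
  first_start <- jan4 - as.integer(format(jan4, "%w"))
  epi_week <- as.integer(week_start - first_start) %/% 7L + 1L
  data.frame(
    time     = week_start,                   # first day of the epidemiological week
    epiYear  = epi_year,
    epiWeek  = epi_week,
    yearWeek = epi_year * 100L + epi_week    # YYYYWW
  )
}

weeks <- epi_week_info(c("2015-09-15", "2019-03-03", "2019-01-01"))
print(weeks)
```

Under these assumptions, 2015-09-15 falls in the week starting Sunday 2015-09-13, which gives yearWeek 201537, the value shown in the first row of the target file printout.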
Environmental data (weekly relative humidity and air temperature) in the challenge data set are provided as a starting point for teams that may not want to look for other environmental data. The challenge design team does not recommend one of these variables over another (from NEON or otherwise) or guarantee that their use will improve forecast accuracy. Furthermore, the environmental data provided is only available for the core terrestrial sites (KONZ, ORNL, SCBI, TALL), and is from NEON’s Summary weather statistics data product DP4.00001.001.
3.4 Timeline
The timeline for this challenge will be monthly, which is how often new data will be released by the EFI RCN.
The final data set containing the training data will be available no later than January 31st, 2021. The challenge will begin (first forecast submission) on March 31st, 2021 at 11:59 PM Eastern Standard Time, and will run through October 31st, 2021 (last forecast submission).
2019 data will be released on the first of the month following a submission deadline, which gives teams a month to assimilate new data. For example, the forecasts submitted on March 31st, 2021 will be for every epidemiological week starting at the beginning of March 2019 through the end of November 2019. Then, on April 1st, 2021, tick counts from March 2019 will be released. The next forecast submission is April 30th, 2021, which will be for every epidemiological week starting at the beginning of April 2019 through the end of November 2019. The table below shows which epidemiological weeks are to be forecasted for each submission date.
|2021 Forecast Submission date||2019 Target Epidemiological weeks|
Evaluation will occur shortly after each forecast submission.
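The monthly cycle described above can be laid out as a small schedule in base R. This sketch extrapolates the pattern from the two examples in the text (March 31 and April 30) to every month-end deadline through October 31, 2021; the column names are illustrative, and the forecast window is given as calendar bounds only. The official list of target epidemiological weeks takes precedence over anything computed here.

```r
# First day of each submission month and of the month that follows it.
month_start <- seq(as.Date("2021-03-01"), as.Date("2021-10-01"), by = "month")
next_month  <- seq(as.Date("2021-04-01"), as.Date("2021-11-01"), by = "month")

schedule <- data.frame(
  # Deadlines fall on the last day of each month, March through October 2021.
  submission_deadline = next_month - 1,
  # 2019 data for the matching month are released the following day.
  data_release        = next_month,
  released_2019_month = month.name[as.integer(format(month_start, "%m"))],
  # Assumed forecast window: start of the matching 2019 month to end of November 2019.
  forecast_from       = as.Date(sub("^2021", "2019", format(month_start))),
  forecast_to         = as.Date("2019-11-30")
)

print(schedule)
```

Each row pairs a 2021 deadline with the 2019 month whose tick counts become available the next day, which is the data a team can assimilate before the following submission.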
3.5 Design team
John Foster, Boston University
Matt Bitters, University of Colorado, Boulder
Melissa Chen, University of Colorado, Boulder
Leah Johnson, Virginia Tech
Shannon LaDeau, Cary Institute of Ecosystem Studies
Cat Lippi, University of Florida
Brett Melbourne, University of Colorado, Boulder
Wynne Moss, University of Colorado, Boulder
Sadie Ryan, University of Florida
Data used in the challenge are collected by the National Ecological Observatory Network (NEON; https://www.neonscience.org/).