STEM (Spatio-Temporal Exploratory Model) is an environmental niche model developed specifically for eBird data by statisticians and researchers at the Cornell Lab of Ornithology. The first iterations were shared on eBird in 2009, and a description of the process was published by Fink et al. (2010). The new 2017 versions have a number of improvements, including:
- Instead of modeling occurrence (e.g., frequency or likelihood of encountering species) we are now modeling relative abundance, making use of the species counts from your eBird checklists.
- The geographic extent has been expanded from the Lower 48 states to the entire Western Hemisphere, including both North America and South America.
- In order to expand beyond the borders of the U.S., we now use habitat data from NASA's MODIS satellite, which provides global habitat information. These models previously used the U.S. National Land Cover Dataset (NLCD).
- Model results are finer-scale than before, making predictions at 8 km instead of 30 km, which makes for more site-specific predictions and better resolution on the maps.
- New statistical techniques automatically mitigate the effects of over-extrapolation at the edges of predicted ranges.
- Critically important for us, many improvements to efficiency and the processing workflow now allow each species model to be run much more easily than before.
Understanding STEM maps
There are many sources of variability in eBird data, and STEM does its best to account for most of them. The effort information that we collect in eBird is one of the main sources of variability: how long did you spend birding and how much distance did you cover? Time of day is another major source of variability: think about how your chances of finding a calling owl, a singing thrush, and a soaring vulture might vary across the course of the day. All of these variables are standardized in the STEM output, and you can think of the predictions as representing the expected count for a one-hour bird walk at 07:00 covering one kilometer of distance.
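To make the idea of standardized effort concrete, here is a minimal Python sketch. It is not the STEM model: the function `toy_expected_count`, its coefficients, and the shape of the time-of-day effect are all invented for illustration. The only values taken from the text are the standard effort settings (one hour, 07:00, one kilometer). The sketch shows why raw expectations from checklists with different effort are not comparable until the effort variables are fixed at common values.

```python
import math

# Standard effort described in the text: a one-hour walk at 07:00 covering 1 km.
STANDARD_EFFORT = {"duration_hours": 1.0, "start_hour": 7.0, "distance_km": 1.0}

def toy_expected_count(site_abundance, duration_hours, start_hour, distance_km):
    """Hypothetical toy model (NOT STEM): expected count for one checklist.

    The count rises with duration and distance, and is highest for
    checklists that start near 07:00. All coefficients are made up.
    """
    effort = math.log1p(duration_hours) * (1.0 + 0.2 * math.log1p(distance_km))
    time_of_day = math.exp(-((start_hour - 7.0) ** 2) / 18.0)
    return site_abundance * effort * time_of_day

def standardized_prediction(site_abundance):
    """Evaluate the toy model at the standard effort so sites are comparable."""
    return toy_expected_count(site_abundance, **STANDARD_EFFORT)

# Same site, different effort: the raw expectations differ...
afternoon_walk = toy_expected_count(2.0, 1.0, 14.0, 1.0)
long_morning_walk = toy_expected_count(2.0, 3.0, 7.0, 1.0)
# ...but the standardized value depends only on the site itself.
standard = standardized_prediction(2.0)
```

In this toy setup the afternoon walk expects fewer birds than the standard walk and the three-hour walk expects more, even though all three describe the same site.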
Research is ongoing to better account for another source of variation: the individual birder. One paper highlights research on individual birders' detection rates. A more recently published paper takes us closer to being able to use this information in these models and shows that incorporating it significantly improves the models' predictive power.
STEM map abundance predictions are displayed at 750,000 points to generate a given map for a specific date. Each point is the centroid of an 8-km square, and the predictions are based on an average of the habitat classes that occur in that square. For example, if a given 8-km square had 75% forest, 5% open water, and 20% urban habitat, then the predictions for Wood Thrush (a forest species), Common Loon (an open water species), and House Sparrow (an urban species) would all be adjusted to account for the prevalence of those habitats in the area. The color ramp goes from gray to orange to white, so areas that are hotter (more white or more orange) indicate higher predicted relative abundance. The scale is the same throughout the animation.
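The habitat example above can be worked through with a little arithmetic. The Python sketch below uses the habitat percentages from the text, but the per-habitat abundance numbers are invented, and the simple weighted average is only an assumption about how "adjusted to account for the prevalence of those habitats" might look in the simplest case. The real model's treatment of habitat is not described here and may be quite different.

```python
# Habitat mix for the example 8-km square in the text.
habitat_fraction = {"forest": 0.75, "open_water": 0.05, "urban": 0.20}

# Hypothetical relative abundance of each species in each pure habitat.
# These numbers are invented for illustration; they are not model output.
species_rates = {
    "Wood Thrush": {"forest": 2.0, "open_water": 0.0, "urban": 0.1},
    "Common Loon": {"forest": 0.0, "open_water": 3.0, "urban": 0.0},
    "House Sparrow": {"forest": 0.1, "open_water": 0.0, "urban": 5.0},
}

def square_prediction(rates, fractions):
    """Average the per-habitat rates, weighted by each habitat's share of the square."""
    return sum(rates[habitat] * share for habitat, share in fractions.items())

predictions = {
    species: square_prediction(rates, habitat_fraction)
    for species, rates in species_rates.items()
}
```

With these made-up rates, the mostly forested square gives the highest value for Wood Thrush (about 1.52), a moderate value for House Sparrow (about 1.08), and a low value for Common Loon (0.15), because open water covers only 5% of the square.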
The current versions of STEM make predictions to ~750,000 map pixels across the Western Hemisphere, composing the maps of abundance for each species. Watch for certain habitats to "pop out", such as mountaintops (e.g., Adirondacks, Rocky Mountains, Andes), shorelines, river valleys (e.g., Mississippi River, Amazon River), prairies, and even individual cities. Brighter colors indicate that the species is more abundant in those areas, and colors closer to gray indicate that the species is found in low abundance there.
For the animations, we give separate predictions for each week of the year (e.g., 4 Jan, 11 Jan, 18 Jan, etc.) and in the animation the sequential maps reveal bird migration as species flow generally north in spring and south in fall in the Northern Hemisphere (and the reverse for Southern Hemisphere migrants). Please keep in mind that while some species (e.g., Blackpoll Warbler, Hudsonian Godwit) may migrate over the open ocean, the model won't reveal these patterns: even if we included observations from the open ocean, these species are so rarely detected in active migration over the open ocean that they would appear as "zeros". This absence of birds in active migration may appear in other species (such as certain thrushes) that overfly parts of the continents on their migration and only stop over very rarely.
In order to generate these predictions, STEM uses what we call "stixels", short for spatio-temporal pixels. Each stixel includes eBird data from within its borders and overlaps with other stixels. This overlap allows a balance between having enough data to make robust predictions and being able to identify fine-scale associations between habitat and bird abundance. Stixels are smaller in North America than in South America, since North America tends to have denser data. Each stixel also contains several weeks of data, and by averaging multiple overlapping stixels, the model strikes a balance between having enough data and still being able to identify the specific migration timing for species.
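The overlap-and-average idea can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the STEM implementation: it works in two spatial dimensions only (no time dimension), uses a fixed box size, and uses the plain mean count inside each box where the real model fits a local model. The function names and the `box_size` value are hypothetical. The default of 100 replicates matches the number of randomized replicates mentioned for the 2017 models.

```python
import random
from collections import defaultdict

def fit_replicate(observations, box_size, rng):
    """One randomized partition: shift a square grid by a random offset,
    then average the counts of the observations that fall in each box."""
    offset_x = rng.uniform(0, box_size)
    offset_y = rng.uniform(0, box_size)
    boxes = defaultdict(list)
    for x, y, count in observations:
        key = (int((x + offset_x) // box_size), int((y + offset_y) // box_size))
        boxes[key].append(count)
    means = {key: sum(counts) / len(counts) for key, counts in boxes.items()}
    return offset_x, offset_y, means

def ensemble_predict(observations, point, box_size=10.0, replicates=100, seed=0):
    """Average the local estimates from every replicate whose box covers `point`.

    Returns None when no replicate has any data in the box around the point.
    """
    rng = random.Random(seed)
    x, y = point
    estimates = []
    for _ in range(replicates):
        offset_x, offset_y, means = fit_replicate(observations, box_size, rng)
        key = (int((x + offset_x) // box_size), int((y + offset_y) // box_size))
        if key in means:  # boxes with no data contribute nothing
            estimates.append(means[key])
    return sum(estimates) / len(estimates) if estimates else None

# Two checklists with very different counts, and a point halfway between them.
observations = [(0.0, 0.0, 0), (8.0, 0.0, 10)]
midpoint_estimate = ensemble_predict(observations, (4.0, 0.0))
```

Because the box boundaries fall in a different place in each replicate, the midpoint is sometimes grouped with one checklist, sometimes with the other, and sometimes with both. Averaging the replicates gives an intermediate value and smooths away the hard edges that any single partition would produce.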
Fig. 1. Stixels for North America and South America. The randomized lat-long boxes (each representing an individual stixel) vary in size between North and South America and partly overlap. This image shows a randomized set of stixels with three replications; the 2017 STEM models use 100 randomized replicates.
An important piece to understand is the distinction between a prediction of 0.0 versus "no prediction". On the maps, this is displayed as a boundary between a light gray (0.0 predicted abundance) and a slightly darker gray (no prediction), and can easily be seen by pausing the Barn Swallow animation on the 4 January map: most of eastern Brazil, Guyana, French Guiana, and Suriname show "no prediction" whereas southern Canada and the United States predict that you would find 0.0 Barn Swallows on a 7am bird walk there on that date. We are confident that Barn Swallows are absent in most of the U.S. and Canada on those dates. But in those regions of South America, we would need more eBird data to confidently state that the species is absent, and as more data are accumulated from South America our ability to predict in those areas will improve.
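The difference between 0.0 and "no prediction" is the difference between a number and a missing value. The Python sketch below illustrates that distinction only. The rule STEM actually uses to decide when to withhold a prediction is not described here, so the `min_checklists` threshold and both function names are assumptions made for the example.

```python
def cell_prediction(counts, min_checklists=5):
    """Return a mean count for a map cell, or None when data are too sparse.

    `counts` holds the species count from each checklist in the cell.
    The threshold of 5 checklists is hypothetical.
    """
    if len(counts) < min_checklists:
        return None  # "no prediction": not enough data to say anything
    return sum(counts) / len(counts)

def describe(prediction):
    """Turn a prediction into the wording used for the map legend."""
    if prediction is None:
        return "no prediction"
    if prediction == 0.0:
        return "predicted absent (0.0)"
    return f"predicted abundance {prediction:.2f}"

# Many checklists, none with the species: a confident zero.
well_covered = cell_prediction([0] * 40)
# Two checklists, none with the species: too little data to call it absent.
poorly_covered = cell_prediction([0, 0])
```

Both cells have only zero counts, but only the well-covered one supports a statement that the species is absent.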
Your eBird data, STEM, and Conservation
eBird is designed to mobilize birder data to put those observations to use for science and conservation. Since STEM models need to account for effort (time of day, duration, and distance) we only use effort-based protocols (i.e., Stationary and Traveling Counts). To have a close tie between the species observations and the habitat on the ground, there are limits to the distance that is used in these models. Since we need to understand not only what species were observed, but what species were not detected, we only use complete checklists for these models (i.e., no Incidental observations). In eBird we ask you to honestly report whether the checklist is complete or not and also the effort you put in: shorter distances and shorter duration lists improve our ability to associate bird observations with specific areas and specific times of day. With adjustment to the statistical techniques used, we may be able to incorporate more data in the future, but for now these basic divisions still give enough data for these models and limit some of the extreme sources of variability (such as counts that cover huge distances or span an entire day). Please remember that eBird data are used for all kinds of scientific output, and data that are excluded from STEM models still may be critically important for other studies.
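The selection rules described above amount to a simple filter, sketched here in Python. The dictionary field names (`protocol`, `complete`, `distance_km`, `duration_hours`) are hypothetical and are not the column names of any eBird data product. The distance and duration caps are also placeholders, since the actual limits are not stated in this article.

```python
# Effort-based protocols named in the text.
EFFORT_PROTOCOLS = {"Stationary", "Traveling"}

def usable_for_stem(checklist, max_distance_km=5.0, max_duration_hours=5.0):
    """Return True if a checklist meets the criteria described in the text.

    The two caps are hypothetical values chosen for illustration only.
    """
    return (
        checklist["protocol"] in EFFORT_PROTOCOLS
        and checklist["complete"]
        and checklist.get("distance_km", 0.0) <= max_distance_km
        and checklist["duration_hours"] <= max_duration_hours
    )

checklists = [
    {"protocol": "Stationary", "complete": True, "duration_hours": 0.25},
    {"protocol": "Traveling", "complete": True, "distance_km": 1.5, "duration_hours": 1.0},
    {"protocol": "Traveling", "complete": False, "distance_km": 1.0, "duration_hours": 1.0},
    {"protocol": "Traveling", "complete": True, "distance_km": 60.0, "duration_hours": 10.0},
    {"protocol": "Incidental", "complete": False, "duration_hours": 0.0},
]

kept = [c for c in checklists if usable_for_stem(c)]
```

Only the first two checklists pass: the third is incomplete, the fourth covers too much distance and time, and the fifth is Incidental. The excluded checklists remain valuable for other uses of eBird data.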
In addition to following best data submission practices, STEM really benefits from checklists from areas that are not well-covered. Try checking out eBird maps for a common species and plan a birding trip to submit some checklists from sites where there is not much data. The hotspot map is another good resource to find undercovered areas: refine to a season of interest and zoom in to see the fine scale (20 km grid). Areas with lower species tallies probably need your help and if you zoom in further until hotspots appear as points, you can target some hotspots that are gray, blue or green since those indicate very low species totals and probably are undercovered.
Every checklist you submit helps. Try doing an effort-based submission from your yard for 15 minutes or more at least once a week. Or if you really want a challenge, try to do a checklist a day from wherever you are. Even short checklists in spots that don't seem great for birding help us understand bird abundance in those habitats which helps strengthen STEM and other science.
Scientific output from STEM feeds directly into conservation. Results from these models have been used in three State of the Birds Reports. These reports help highlight specific conservation issues with birds and often guide federal policy to better address the needs of birds. The 2011 State of the Birds Report was the first to use eBird data. By combining STEM output for species with similar habitats (e.g., aridlands, western forests) this report was able to summarize distribution for a suite of species with similar habitat requirements. This information was then combined with a map identifying federal protected lands, revealing how important federal protected lands were for different species. This report led to followup reports for the U.S. Bureau of Land Management (important for aridland species) and U.S. Forest Service (important for Western forest species) and helped to guide policies that covered land holdings across the country. The 2013 State of the Birds Report provided the alternate view, using the same habitat suites to identify the stewardship responsibilities for private lands. For grassland and eastern forest birds, private lands are quite important. The 2016 State of the Birds Report was trinational for the first time, partnering with Canada and Mexico to look at bird abundance across international borders. STEM data averaged across breeding, winter, and migration seasons in novel ways helped to highlight specific regions (e.g., Yucatan Peninsula) that are especially important for bird conservation.
The ability of STEM to compensate for the biases and variability in eBird data makes it useful for other scientific applications as well. Check out the eBird publications list to see some of the other papers using eBird data.
A full description of STEM has been published by Fink et al. (2010), although it is worth noting that the 2017 model has advanced significantly and the paper describing these improvements is still in preparation.