Frequently Asked Questions
How long have you been forecasting crop yields?
CropProphet was developed in 2008/2009 and was first used in a beta test to predict U.S. corn and soybean yields in 2009. We consider 2010 as the first year of “operational” CropProphet forecasts but have improved the model each year since then.
What is being forecasted?
We predict the final USDA yield and production numbers released early in each new year after the crop is grown and harvested. These “final” numbers are sometimes revised slightly in future years, and we update our historical database accordingly. We treat the USDA numbers as the “ground truth” because the long historical record allows for statistical modeling.
Are you forecasting the USDA monthly updated yield estimates?
No. See above.
How much history is used to train your models?
At the county level, we use weather and crop data since 1981. At the state and national levels, we add further predictors, such as the USDA crop condition reports, which are available starting in 1986.
Is satellite data used in your predictions?
No, we no longer use satellite NDVI as a direct input to our model to predict yields and production. Satellite data is a lagging indicator of the impact weather has on crops. We do, however, provide a visualization of NDVI during the crop season.
What data is used as inputs to the model?
The inputs are:
– Historical (USDA) end-of-season yield, production, and acreage data (county, state, national).
– Daily historical and recent gridded weather analysis data, derived from both weather observations and National Weather Service weather models. We convert the gridded values into daily area-average values for each county.
– Weekly state-level USDA crop progress and crop condition data.
– USDA state-level acreage estimates, obtained from the prospective plantings report (prior to June 30) and then from the acreage report (June 30 and later).
– For the weather outlook component of CropProphet, we use gridded ensemble weather model forecast data from the National Weather Service and the European Centre for Medium-Range Weather Forecasts (ECMWF). The gridded values are converted into daily area-average values for each county.
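The conversion from gridded values to daily county area averages can be pictured as a weighted mean over the grid cells that overlap a county. The sketch below is illustrative only: the function name, the cell identifiers, and the precomputed overlap weights are assumptions, not a description of the CropProphet code.

```python
def county_area_average(grid_values, overlap_weights):
    """Weighted mean of grid-cell values for one county and one day.

    grid_values: dict mapping cell id -> value (e.g. daily max temperature)
    overlap_weights: dict mapping cell id -> area of that cell inside the
        county (hypothetical input, assumed to be precomputed)
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for cell, weight in overlap_weights.items():
        if weight <= 0 or cell not in grid_values:
            continue  # skip cells outside the county or without data
        weighted_sum += weight * grid_values[cell]
        total_weight += weight
    if total_weight == 0:
        raise ValueError("no grid cells with data overlap this county")
    return weighted_sum / total_weight


# Hypothetical example: three grid cells partially covering one county.
temps = {(0, 0): 30.0, (0, 1): 32.0, (1, 0): 28.0}
weights = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0}
county_temp = county_area_average(temps, weights)
```

The same function would apply to both the historical analysis grids and each ensemble member of the forecast grids, one day at a time.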
How does the data in Modeler relate to the daily, updated forecasts?
During the crop season, all forecasts are made available at about 8 AM Eastern Time in the United States. The Point-in-Time data were all produced to align with this daily release schedule. The cross-validated and rolling forecast data in Modeler are designed to emulate the same schedule, so customers can build models on the Modeler data and then trade on the daily in-season crop forecasts with consistent timing.
How do you handle the US counties that don’t regularly report yield and production?
We create models and forecasts for all counties with no more than 10 years of data missing from the past 40 years. The aggregation of the county forecasts to the state and national forecasts accounts for the remaining, unmodeled counties by adjusting the acreage weights.
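As a minimal sketch of the eligibility rule, assuming a county's history is stored as a mapping from year to yield (the function name and data layout are hypothetical):

```python
def is_modeled(yields_by_year, last_year, window=40, max_missing=10):
    """Return True if at most `max_missing` of the `window` years ending
    at `last_year` have no reported yield."""
    years = range(last_year - window + 1, last_year + 1)
    missing = sum(1 for year in years if yields_by_year.get(year) is None)
    return missing <= max_missing


# Hypothetical counties: one with a full record, one that stopped reporting.
full_record = {year: 150.0 for year in range(1981, 2021)}
stopped_reporting = {year: 150.0 for year in range(1981, 2010)}  # 11 years missing

full_ok = is_modeled(full_record, last_year=2020)
stopped_ok = is_modeled(stopped_reporting, last_year=2020)
```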
How do you handle the technology trend?
We use a forward-looking approach to generate a linear technology trend for each crop and each geography (county, state, national). Specifically, we combine linear trend estimates based on the past 20, 30, and 40 years (when available) to obtain a single best estimate of the trend. We also use a median regression approach, rather than least squares regression, so that the trend estimates are not overly affected by outlier years.
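The sketch below illustrates one way such a trend could be computed. It is not the CropProphet implementation: the function names are hypothetical, the three window estimates are combined with a simple average (the text above does not say how they are combined), and "forward-looking" is interpreted as evaluating each fitted line at the upcoming year. The median regression uses the fact that, with a single predictor, an optimal least-absolute-deviations line passes through at least two data points, so trying every pair of years is feasible for a few dozen points.

```python
from itertools import combinations


def median_regression_line(xs, ys):
    """Least-absolute-deviations (median regression) fit of y = a + b*x.

    Tries the line through every pair of points and keeps the one with the
    smallest sum of absolute residuals.
    """
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue
        slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
        intercept = ys[i] - slope * xs[i]
        loss = sum(abs(y - (intercept + slope * x)) for x, y in zip(xs, ys))
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    if best is None:
        raise ValueError("need at least two distinct years")
    return best[1], best[2]


def trend_yield(yields_by_year, target_year, windows=(20, 30, 40)):
    """Average the trend value at `target_year` over the available windows.

    Averaging the window estimates is an assumption made for illustration.
    """
    first_year = min(yields_by_year)
    estimates = []
    for window in windows:
        start = target_year - window
        if first_year > start:
            continue  # history does not reach back far enough for this window
        years = [y for y in range(start, target_year) if y in yields_by_year]
        if len(years) < 2:
            continue
        intercept, slope = median_regression_line(
            years, [yields_by_year[y] for y in years]
        )
        estimates.append(intercept + slope * target_year)
    if not estimates:
        raise ValueError("not enough history to estimate a trend")
    return sum(estimates) / len(estimates)


# Hypothetical history: a steady 2 units/year trend with one poor year.
history = {year: 100.0 + 2.0 * (year - 1980) for year in range(1981, 2021)}
history[2012] = 120.0  # outlier year, well below trend
trend_2021 = trend_yield(history, 2021)
```

In this synthetic case the single outlier year does not pull the fitted line away from the underlying trend, which is the behavior that motivates median regression over least squares.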
How do you combine the county forecasts to the US national yield and production forecasts?
The state- and national-level forecasts are generated using acreage-weighted aggregation of all the county-level forecasts. However, not all counties have CropProphet forecasts, so the acreage weights are computed after excluding the acreage contribution of the missing counties from the state and national totals. We also perform a separate aggregation of the county technology trends and compare the results to the state and national technology trends, to ensure that the aggregated forecasts show consistent departures from trend.
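A minimal sketch of the acreage-weighted aggregation, with unmodeled counties excluded from the weights, might look like the following. The function name, county labels, and numbers are hypothetical.

```python
def aggregate_yield(county_values, county_acres):
    """Acreage-weighted mean over the counties present in `county_values`.

    county_values: dict county -> forecast yield or trend yield
        (counties without a forecast are simply absent)
    county_acres: dict county -> acres for every county in the state
    """
    modeled = [c for c in county_values if county_acres.get(c, 0) > 0]
    total_acres = sum(county_acres[c] for c in modeled)
    if total_acres == 0:
        raise ValueError("no modeled counties with acreage")
    return sum(county_values[c] * county_acres[c] for c in modeled) / total_acres


# Hypothetical state with three counties; county "C" has no forecast.
acres = {"A": 300.0, "B": 100.0, "C": 50.0}
forecasts = {"A": 180.0, "B": 160.0}
trends = {"A": 170.0, "B": 165.0}

state_yield = aggregate_yield(forecasts, acres)
state_trend = aggregate_yield(trends, acres)
departure_from_trend = state_yield - state_trend
```

Aggregating the county trends with the same weights gives a departure from trend that can be compared with the departure implied by the separately fitted state trend.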
Where does your acreage data come from?
We use the USDA state-level acreage estimates provided in the March prospective plantings report and the June 30 acreage report. To obtain county estimates, we downscale the state values by using the historical ratio of county to state acres in the past 5 years.
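The downscaling step can be sketched as follows. This is an illustration under assumptions: the function name and data layout are hypothetical, and the historical ratio is computed here as pooled county acres divided by pooled state acres over the last five years, which is only one possible reading of "historical ratio".

```python
def downscale_state_acres(state_estimate, county_history, state_history, n_years=5):
    """Split a state acreage estimate across counties by historical share.

    county_history: dict county -> {year: acres}
    state_history: dict year -> state acres
    """
    recent_years = sorted(state_history)[-n_years:]
    state_total = sum(state_history[year] for year in recent_years)
    if state_total == 0:
        raise ValueError("state has no acreage in the recent years")
    estimates = {}
    for county, acres_by_year in county_history.items():
        county_total = sum(acres_by_year.get(year, 0.0) for year in recent_years)
        estimates[county] = state_estimate * county_total / state_total
    return estimates


# Hypothetical state with two reporting counties.
state_history = {year: 1000.0 for year in range(2016, 2021)}
county_history = {
    "A": {year: 400.0 for year in range(2016, 2021)},
    "B": {year: 100.0 for year in range(2016, 2021)},
}
county_acres = downscale_state_acres(1200.0, county_history, state_history)
```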
How do you account for different soil types?
The impacts of soil type are implicitly captured in the county-level regression models, which are developed for each county separately. Also, as part of the modeling process, we use a different technology trend for each county, so the local baseline for the CropProphet forecasts is tuned to each county’s historical yield normal.
When does your model make predictions during the crop year?
Forecasts for winter wheat are updated daily from early April through mid-June, and the corn and soybean forecasts are updated daily from early May through early November.
Does your model take planting delays into account?
During the 2020 model update process, USDA state-level planting delays were integrated into CropProphet. Satellite NDVI data was used to estimate planting dates over the 1986 to 2019 history, and these estimates were integrated into the model as a predictor.
How do you account for abandoned acres?
The effect of acreage abandonment is included implicitly in the county models. This is easy to explain for the yield models, which predict yield per harvested acre. If acres are abandoned, the yield does not change except insofar as the worst acreage may be abandoned preferentially, leading to a slight upward shift in yield; but this effect will be captured in the historical yield regression modeling.
Acreage abandonment is captured in the production forecasts by using regression models that predict production per planted acre rather than production per harvested acre. The resulting forecasts are then scaled by estimated planted acres to obtain production. This avoids the need to make an explicit estimate of harvested acres, which can change dramatically through the season depending on abandonment. Instead, the effects of abandonment are captured within the production per planted acre regression; for example, when poor weather favors high abandonment, the production per planted acre will decrease more than the production per harvested acre, and the production forecasts will reflect this change.
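A small worked example with made-up numbers shows why production per planted acre absorbs abandonment, and why dividing production by yield does not recover planted acres. The function names and figures are hypothetical, and the example is idealized: in practice the yield and production models are fitted independently.

```python
def production_from_planted(production_per_planted_acre, planted_acres):
    """Scale a per-planted-acre forecast by estimated planted acres."""
    return production_per_planted_acre * planted_acres


def implied_acres(production, yield_per_harvested_acre):
    """Acres implied by dividing production by yield per harvested acre."""
    return production / yield_per_harvested_acre


# Hypothetical county: 1000 acres planted, 10% abandoned before harvest.
planted_acres = 1000.0
harvested_acres = 900.0
yield_per_harvested_acre = 50.0  # bushels per harvested acre

actual_production = yield_per_harvested_acre * harvested_acres
production_per_planted_acre = actual_production / planted_acres

forecast_production = production_from_planted(production_per_planted_acre, planted_acres)
backed_out_acres = implied_acres(forecast_production, yield_per_harvested_acre)
```

Here the production per planted acre (45) is lower than the yield per harvested acre (50) because of abandonment, and the backed-out acreage (900) matches harvested rather than planted acres.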
Can you add international forecasts?
In principle, yes, and we plan to add yield and production forecasts for Brazil and Argentina in time for the southern hemisphere summer of 2020/2021. We have excellent weather and satellite data for international yield modeling, but it is challenging to obtain long-term local yield data for building our statistical models. We expect to bridge this gap in data resources by applying CropProphet to downscale existing yield data, with additional constraints based on satellite crop analysis.
Can a forecast of planted acres be “backed out” of CropProphet by dividing Production by Yield for a county, state, or nationally?
CropProphet uses the acreage from the USDA prospective plantings report as the initial acreage estimate each season. That data is not updated again until the late June acreage report. CropProphet does not forecast planted acreage.
It is important to recognize that CropProphet models production and yield independently, so it may not be valid to back out acreage from the two numbers. Independent modeling is necessary because production is modeled using yield per planted acre, which implicitly accounts for field abandonment. The yield forecasts are yield per harvested acre, the same yield definition that USDA uses. So not planting or abandoning a field does not affect yield, but it obviously impacts production.
Do you forecast planted acreage?
CropProphet does not forecast planted acreage for each crop. The CropProphet models have been designed to be systematic and fully objective. Inputs into the system are not manually adjusted during the season. The primary reason for this design is that each year we generate and provide a long-term history of model forecasts and performance statistics in order to demonstrate the skill and value of the CropProphet forecasts. Consequently, for our production forecasts, we have always used the USDA acreage estimates, and the cross-validated verification statistics show that the production model skill is exceptional.