Modelling agricultural data for harvest date prediction is one of many projects showcasing Matogen Applied Insights’ (MAI) analytics capabilities within the agricultural sector.
“Precision farming”, also known as “smart farming”, is an integral part of the Third Agricultural Revolution and refers to the use of digital technologies as a crop management tool to monitor and optimise agricultural production. Vast amounts of farm data are collected from devices such as sensors, satellites and probes, and from leaf, soil and fruit sampling. The data is combined, analysed and fed into machine learning models to derive insights that guide decision-making. The ultimate objective is to optimise food production efficiency through increased yields at lower costs.
MAI concluded a project for a major client in the South African agriculture sector which entailed using several sources of data to predict the harvest dates at which fruit would be at its “peak vegetative stage”. Significant inputs into this process were Normalised Difference Vegetation Index (NDVI) and Enhanced Vegetation Index (EVI) readings.
The Normalised Difference Vegetation Index (NDVI) is calculated from satellite imagery. It compares the amount of near-infrared light reflected by plants with the amount of red visible light they reflect, producing a value between -1 and 1. Healthy plants reflect more near-infrared light and absorb more red light than stressed or sparse vegetation, so they produce higher NDVI values.
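As a minimal sketch of the calculation described above, NDVI is commonly computed per pixel as (NIR - Red) / (NIR + Red). The function name, the reflectance values and the zero-denominator convention below are illustrative assumptions, not details from the MAI project.

```python
def ndvi(nir: float, red: float) -> float:
    """Return NDVI = (NIR - Red) / (NIR + Red) for surface reflectances in [0, 1]."""
    denominator = nir + red
    if denominator == 0:
        # No signal in either band; returning 0.0 is a convention chosen for this sketch.
        return 0.0
    return (nir - red) / denominator


# Illustrative reflectance values only (not measured data).
dense_canopy = ndvi(nir=0.50, red=0.08)   # strong NIR reflection, strong red absorption
sparse_canopy = ndvi(nir=0.30, red=0.20)  # weaker contrast between the two bands

print(f"dense canopy NDVI:  {dense_canopy:.3f}")
print(f"sparse canopy NDVI: {sparse_canopy:.3f}")
```

The larger the gap between near-infrared and red reflectance, the closer the value moves to 1.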
The Enhanced Vegetation Index (EVI) is also derived from satellite imagery and is based on how plants reflect light, but it additionally corrects for atmospheric aerosols and for the soil background beneath the canopy. As such, EVI conveys more information about canopy structure, whereas NDVI mainly indicates the presence of chlorophyll. The two indices are therefore complementary in assessing vegetation from satellite image data.
Vegetation indices are important indicators of “peak vegetative stage” and are therefore very useful in predicting optimal harvest dates to maximise crop yield.
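To illustrate how EVI differs from a simple two-band ratio, the sketch below uses the widely published formulation EVI = G * (NIR - Red) / (NIR + C1 * Red - C2 * Blue + L). The coefficient values (G = 2.5, C1 = 6, C2 = 7.5, L = 1) are the ones commonly used for MODIS products and are assumed here; the source does not state which satellite product or coefficients the client's data used.

```python
def evi(nir: float, red: float, blue: float,
        gain: float = 2.5, c1: float = 6.0, c2: float = 7.5,
        canopy_l: float = 1.0) -> float:
    """Return EVI from surface reflectances.

    c1 and c2 weight the red and blue bands to correct for aerosols;
    canopy_l adjusts for the soil background. Defaults are the commonly
    used MODIS coefficients (an assumption for this sketch).
    """
    denominator = nir + c1 * red - c2 * blue + canopy_l
    if denominator == 0:
        raise ValueError("EVI is undefined when the denominator is zero")
    return gain * (nir - red) / denominator


# Illustrative reflectance values only (not measured data).
value = evi(nir=0.50, red=0.08, blue=0.04)
print(f"EVI: {value:.3f}")
```

The blue band is what lets EVI compensate for aerosol scattering, which NDVI cannot do with only the near-infrared and red bands.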
Data wrangling, feature engineering and modelling
The client provided MAI with data detailing both the NDVI and EVI for more than a dozen grape cultivars. In addition, almost a decade’s worth of growth stage data was supplied for a selection of farm blocks of differing ages and locations. Specifically, the time series data covered the “bud”, “bloom”, “veraison” (when grapes turn from green to yellow or red) and “harvest” growth stages, as well as the “initial planting date”.
The different types of data were combined and new variables created in order to produce time series plots, identify trends in the crop growth stages and examine differences with respect to farm block age and location.
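The feature engineering step can be sketched roughly as follows, using only the Python standard library. Apart from “days_until_harvest”, every field name, the record layout and the example dates are hypothetical; the source does not describe the actual schema or the variables MAI created.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Observation:
    """One vegetation index reading for a farm block (hypothetical layout)."""
    block_id: str
    observed_on: date
    ndvi: float
    planted_on: date   # "initial planting date"
    bloom_on: date     # bloom date for the season of the reading
    harvest_on: date   # known harvest date for that season


def engineer_features(obs: Observation) -> dict:
    """Derive date-based variables from the raw growth stage dates."""
    return {
        "block_id": obs.block_id,
        "ndvi": obs.ndvi,
        "block_age_years": (obs.observed_on - obs.planted_on).days / 365.25,
        "days_since_bloom": (obs.observed_on - obs.bloom_on).days,
        # Target variable named in the article.
        "days_until_harvest": (obs.harvest_on - obs.observed_on).days,
    }


# Made-up example record, for illustration only.
example = Observation(
    block_id="A1",
    observed_on=date(2020, 1, 10),
    ndvi=0.71,
    planted_on=date(2012, 9, 1),
    bloom_on=date(2019, 11, 5),
    harvest_on=date(2020, 2, 20),
)
row = engineer_features(example)
print(row)
```

Expressing growth stages as day offsets rather than calendar dates makes seasons and blocks directly comparable, which is what allows trends by block age and location to be plotted on a shared axis.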
After exploratory data analysis, correlations between the variables were examined to determine which relationships were most significant with regard to the outcome variable “days_until_harvest”. Subsequently, a variety of techniques were applied, including Multiple Linear Regression, XGBoost Regressor and Random Forest Regressor. Multiple Linear Regression extends Simple Linear Regression to predict the outcome variable from two or more predictor variables. XGBoost Regression and Random Forest Regression have the same objective but combine many decision trees: XGBoost builds its trees sequentially through gradient boosting, while Random Forest averages trees trained independently on random subsets of the data.
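The correlation screening that preceded modelling might look something like the sketch below, which ranks candidate variables by the absolute value of their Pearson correlation with “days_until_harvest”. The feature names other than the target, and all of the numbers, are invented for illustration; the source does not say which correlation measure MAI used, so Pearson is an assumption.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant column carries no linear signal
    return covariance / (spread_x * spread_y)


def rank_features(columns: dict[str, list[float]], target: str) -> list[tuple[str, float]]:
    """Sort features by the strength (absolute value) of their correlation with the target."""
    y = columns[target]
    scores = {name: pearson(values, y)
              for name, values in columns.items() if name != target}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Toy data, invented for illustration only.
toy = {
    "days_since_bloom": [10, 30, 50, 70, 90],
    "block_age_years": [3, 12, 7, 5, 9],
    "days_until_harvest": [95, 75, 55, 35, 15],
}
for name, score in rank_features(toy, target="days_until_harvest"):
    print(f"{name}: {score:+.3f}")
```

Ranking by absolute value matters because a strong negative correlation is just as informative to a regression model as a strong positive one.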
This process produced a model that achieved an R² score of 0.981 on test set data when predicting optimal harvesting dates for maximum crop yield, meaning that it explained 98.1% of the variance in “days_until_harvest”. In other words, given data the model had not seen during training, but for which the actual harvest dates were known, its predictions of “days_until_harvest” were very close to the true values.
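For readers unfamiliar with the metric, R² (the coefficient of determination) is 1 minus the ratio of the model's squared error to the squared error of always predicting the mean. The sketch below computes it on a handful of made-up values; these numbers are not the project's results and the function name is an illustrative choice.

```python
def r_squared(actual: list[float], predicted: list[float]) -> float:
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_actual = sum(actual) / len(actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    if ss_total == 0:
        raise ValueError("R² is undefined when all actual values are identical")
    return 1 - ss_residual / ss_total


# Made-up held-out values of days_until_harvest and model predictions.
actual_days = [40, 30, 20, 10]
predicted_days = [38, 31, 21, 9]

score = r_squared(actual_days, predicted_days)
print(f"R² on the toy test set: {score:.3f}")
```

A score of 1 means perfect predictions, while a score of 0 means the model does no better than predicting the average for every observation.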
Harvest date prediction use case
The insights gained from applying machine learning techniques to a combination of disparate types of agricultural data enable improved planning and management of the planting and harvesting of crops. Better harvest scheduling would reduce crop losses and optimise the application of crop protection products, i.e. applying them at the most impactful phase whilst limiting residues at the point of export or retail. As more data is obtained, the algorithms can be fine-tuned and improved to the point where future labour costs and time are reduced, ultimately increasing efficiency in farming.