Olive oil price prediction with machine learning algorithms

The main objective of this study is to develop a machine learning model that accurately predicts the price at origin of Extra Virgin Olive Oil (EVOO) in Jaén (southern Spain), addressing the growing importance of price prediction in the food sector. The methodology integrates an exhaustive set of predictor variables obtained from official sources between 2011 and 2024. These variables range from the historical base price, seasonal factors and production costs to climatic conditions, economic indicators, global oil production and early forecasts of olive crop production. Several regression algorithms are applied and compared, including Linear Regression, Support Vector Machines (SVM), Neural Networks, Random Forest, Gradient Boosting and K-Nearest Neighbors. The results show that ensemble models, particularly Gradient Boosting and Random Forest, have superior predictive ability and are more effective at capturing the complex non-linear relationships and market inflections that affect the price of EVOO. Cross-validation confirms that using the full multivariate set of predictors consistently improves accuracy over using the base price alone, validating the relevance of the selected attributes and their ability to reflect market dynamics. The detected pattern of price behaviour is significantly influenced by weather conditions, crop production, energy costs, speculation and global markets, factors that the ensemble models capture effectively.
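The cross-validated comparison between the base price alone and a multivariate feature set can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the study's pipeline: the data are synthetic, the column names (`base_price`, `crop_forecast`) are hypothetical, K-Nearest Neighbors is used only because it is one of the listed algorithms and is short to write in plain Python, and an expanding-window split is assumed because the data are time-ordered (the abstract does not state which cross-validation scheme was used).

```python
import math
import random

def make_synthetic_series(n=120, seed=0):
    """Hypothetical monthly data: rows of (base_price, crop_forecast) and next-month price."""
    rng = random.Random(seed)
    rows, targets = [], []
    price = 3.0
    for t in range(n):
        # Synthetic crop signal with a two-year cycle plus noise (illustrative only).
        crop = 1.0 + 0.5 * math.sin(2 * math.pi * t / 24) + rng.gauss(0, 0.05)
        # Next price depends on the current price and, inversely, on the crop signal.
        next_price = 0.7 * price + 0.3 * (4.5 - 1.5 * crop) + rng.gauss(0, 0.05)
        rows.append((price, crop))
        targets.append(next_price)
        price = next_price
    return rows, targets

def knn_predict(train_x, train_y, query, k=5):
    """Average the targets of the k training rows closest to the query (Euclidean distance)."""
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

def expanding_window_splits(n, n_folds=5, min_train=40):
    """Yield (train_indices, test_indices); training rows always precede test rows."""
    fold = (n - min_train) // n_folds
    for f in range(n_folds):
        start = min_train + f * fold
        yield range(0, start), range(start, start + fold)

def cv_mae(rows, targets, columns, n_folds=5):
    """Mean absolute error per fold, using only the selected feature columns."""
    x = [tuple(r[c] for c in columns) for r in rows]
    scores = []
    for train_idx, test_idx in expanding_window_splits(len(x), n_folds):
        tx = [x[i] for i in train_idx]
        ty = [targets[i] for i in train_idx]
        errs = [abs(knn_predict(tx, ty, x[i]) - targets[i]) for i in test_idx]
        scores.append(sum(errs) / len(errs))
    return scores

rows, targets = make_synthetic_series()
mae_base = cv_mae(rows, targets, columns=(0,))      # base price only
mae_multi = cv_mae(rows, targets, columns=(0, 1))   # base price + crop forecast
print("base price only:           ", [round(s, 3) for s in mae_base])
print("base price + crop forecast:", [round(s, 3) for s in mae_multi])
```

Comparing the two lists fold by fold mirrors the kind of check described above. With real data, the features would typically be scaled before using a distance-based model, and a library implementation of the ensemble regressors would replace the hand-written predictor.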