Prediction of wine sensorial quality: a classification problem

Maurizio Carpita, University of Brescia, Italy
Silvia Golia, University of Brescia, Italy

This is a section of ASA 2021 Statistics and Information Systems for Policy Evaluation (DOI: 10.36253/978-88-5518-461-8), edited by Alessandra Petrucci, Bruno Bertaccini and Luigi Fabbris, Firenze University Press, Firenze, 2021. https://doi.org/10.36253/978-88-5518-461-8.44

Open Access

Copyright Author(s)

Content licence CC BY 4.0

Metadata licence CC0 1.0

When dealing with a wine, it is useful to be able to predict its quality from chemical and/or sensory variables. There is no agreement on what wine quality means or how it should be assessed; it is often viewed in intrinsic (physicochemical, sensory) or extrinsic (price, prestige, context) terms (Jackson, 2017). In this paper, wine quality was evaluated by experienced judges who scored each wine on a 0-10 scale, with 0 meaning very bad and 10 excellent, so the resulting variable is an ordered categorical one. The models applied to predict this variable provide the predicted probability of occurrence of each of its categories.

Alongside this record of probabilities, however, practitioners need a single predicted value (category) of the variable. The statistical problem to be addressed is therefore how this record of probabilities is transformed into a single value. In this paper we compare the predictive performance of the default method, the Bayes Classifier (BC), which assigns a unit to its most likely category, with two other methods: the Maximum Difference Classifier (MDC) and the Maximum Ratio Classifier (MRC). The BC is the optimal criterion if one is interested in classification accuracy. However, because it favors the most prevalent category, it is not the best choice when there is no specific category of interest.

The data under study concern the quality of the red variant of the Portuguese "Vinho Verde" wine (Cortez et al., 2009), measured on a 0-10 scale. Only 6 of the possible scores were actually used, and 2 of them have very few observations, which makes this a suitable setting for comparing predictive performance. In the study, we investigated different ways of merging categories, and we used 11 explanatory variables to estimate the record of probabilities of the wine quality variable.
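A minimal Python sketch of how the three rules could turn a record of probabilities into a single category follows. The abstract does not define the MDC and MRC. The sketch therefore assumes that the MDC picks the category with the largest difference between its predicted probability and that category's relative frequency in the sample. It also assumes that the MRC picks the category with the largest ratio of the two, in the spirit of the sample-proportion cut-off discussed by Cramer (1999). The function name `classify`, the rule labels and the toy probabilities are hypothetical and serve only as an illustration.

```python
import numpy as np


def classify(probs, class_freqs, rule="bayes"):
    """Map each row of predicted category probabilities to one category index.

    Hypothetical helper. `probs` has shape (n_units, n_categories);
    `class_freqs` holds the relative frequency of each category in the
    training sample (assumed benchmark for the MDC and MRC rules).
    """
    probs = np.asarray(probs, dtype=float)
    freqs = np.asarray(class_freqs, dtype=float)
    if rule == "bayes":
        # Bayes Classifier: pick the most likely category.
        scores = probs
    elif rule == "max_diff":
        # Assumed MDC: largest excess of predicted probability over the base rate.
        scores = probs - freqs
    elif rule == "max_ratio":
        # Assumed MRC: largest predicted probability relative to the base rate.
        scores = probs / freqs
    else:
        raise ValueError(f"unknown rule: {rule!r}")
    # Broadcasting applies the same benchmark to every row; argmax picks the category.
    return scores.argmax(axis=1)


if __name__ == "__main__":
    # Toy example (not from the paper): three classes, the middle one dominant.
    freqs = [0.1, 0.8, 0.1]  # low, medium, high
    probs = [[0.05, 0.60, 0.35],
             [0.30, 0.65, 0.05]]
    for rule in ("bayes", "max_diff", "max_ratio"):
        print(rule, classify(probs, freqs, rule).tolist())
    # bayes [1, 1]
    # max_diff [2, 0]
    # max_ratio [2, 0]
```

In this toy case the BC assigns both units to the dominant middle class. Under the assumed definitions, the MDC and MRC move each unit to a rarer class whose predicted probability clearly exceeds its base rate. This illustrates why a rule other than the BC can be preferable when no single category is of primary interest.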

Keywords: wine quality; categorical classifier; Bayes classifier

This paper is available online at https://doi.org/10.36253/978-88-5518-461-8.44

References
Agresti, A. (2010). Analysis of Ordinal Categorical Data, 2nd ed. John Wiley & Sons, Hoboken, New Jersey.
Cortez, P., Cerdeira, A., Almeida, F., Matos, T., Reis, J. (2009). Modeling wine preferences by data mining from physicochemical properties. Decision Support Systems, 47, pp. 547–553.
Cramer, J.S. (1999). Predictive performance of the binary logit model in unbalanced samples. The Statistician, 48(1), pp. 85–94.
Golia, S., Brentari, E., Carpita, M. (2017). Causal reasoning applied to sensory analysis: The case of the Italian wine. Food Quality and Preference, 59, pp. 97–108.
Golia, S., Carpita, M. (2018). On classifiers to predict soccer match results. In S. Capecchi, F. Di Iorio, R. Simone (eds.), ASMOD 2018: Proceedings of the International Conference on Advances in Statistical Modelling of Ordinal Data, FedOAPress, pp. 125–132.
Golia, S., Carpita, M. (2020). Comparing classifiers for ordinal variables. In A. Pollice, N. Salvati, F. Schirripa Spagnolo (eds.), Book of short papers SIS 2020, Pearson, pp. 1160–1165.
Jackson, R.S. (2017). Wine Tasting, 3rd ed. Academic Press.
James, G., Witten, D., Hastie, T., Tibshirani, R. (2013). An Introduction to Statistical Learning with Applications in R. Springer, New York.
Raschka, S., Mirjalili, V. (2019). Python Machine Learning: Machine Learning and Deep Learning with Python, scikit-learn, and TensorFlow 2. Packt Publishing, Birmingham.