Wine Quality Analysis Using Machine Learning with SMOTE and Feature Selection Approaches

Authors

  • Triandes Sinaga, Universitas Pelita Harapan
  • Kevin Bastian Sirait, Universitas Pelita Harapan
  • Jefri Junifer Pangaribuan, Universitas Pelita Harapan
  • Okky Putra Barus, Universitas Pelita Harapan
  • Romindo Romindo, Universitas Pelita Harapan

DOI:

https://doi.org/10.55123/insologi.v4i3.5436

Keywords:

Machine Learning, Wine Quality, SMOTE, Feature Selection, Random Forest, Classification

Abstract

Conventional wine quality assessment relies on subjective expert judgment, which can introduce bias and inconsistency into quality control. This study develops an objective, automated machine learning classification model to improve the accuracy of wine quality prediction. To address class imbalance, the Synthetic Minority Over-sampling Technique (SMOTE) was applied, together with ANOVA F-test feature selection to optimize model performance. The White Wine Quality dataset from the UCI Machine Learning Repository (4,898 samples, 11 numerical features) was used to evaluate five classification algorithms: Naïve Bayes, Decision Tree, Random Forest, Support Vector Machine (SVM), and K-Nearest Neighbors (KNN). Before SMOTE was applied, the Random Forest model achieved an accuracy of only 67.55%. After SMOTE and hyperparameter tuning, the tuned Random Forest model performed best, with 90.29% accuracy, 89.99% precision, 90.29% recall, and an 89.97% F1-score. The Decision Tree and KNN models also improved notably. SMOTE effectively balanced the extremely underrepresented minority classes (quality levels 3 and 9). The most influential features for quality classification were alcohol content, density, and chlorides. These findings indicate that the proposed framework offers a reliable, objective, and scalable solution for automated wine quality control in industrial production.
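
The abstract names two preprocessing steps: SMOTE oversampling and ANOVA F-test feature selection. The block below is a minimal pure-Python sketch of both, written for illustration only and not taken from the authors' code. The function names `smote`, `anova_f`, and `select_k_best`, the toy data, the choice to oversample every class up to the majority-class count, and the fallback of shrinking k for classes with fewer than k+1 samples are all assumptions. SMOTE creates each synthetic row as x_i + λ(x_nn − x_i), where x_nn is a random same-class nearest neighbour of x_i and λ ~ U(0, 1). The F score for a feature is the between-class mean square divided by the within-class mean square. In practice one would normally use imbalanced-learn's `SMOTE` and scikit-learn's `SelectKBest` with `f_classif` instead.

```python
import math
import random
from collections import Counter


def _euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def smote(X, y, k=5, seed=0):
    """Minimal SMOTE sketch: oversample every class up to the majority count.

    Each synthetic row is x_i + lam * (x_nn - x_i), where x_nn is one of the
    k nearest same-class neighbours of x_i and lam ~ U(0, 1).
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = [list(row) for row in X], list(y)
    for label, n in counts.items():
        if n == target or n < 2:
            continue  # already at majority size, or no neighbour to interpolate with
        idx = [i for i, lab in enumerate(y) if lab == label]
        k_eff = min(k, n - 1)  # assumption: shrink k for very small classes
        for _ in range(target - n):
            i = rng.choice(idx)
            # k nearest neighbours of X[i] within the same class (excluding itself)
            neighbours = sorted((j for j in idx if j != i),
                                key=lambda j: _euclidean(X[i], X[j]))[:k_eff]
            j = rng.choice(neighbours)
            lam = rng.random()
            X_out.append([a + lam * (b - a) for a, b in zip(X[i], X[j])])
            y_out.append(label)
    return X_out, y_out


def anova_f(values, labels):
    """One-way ANOVA F statistic of a single feature across class labels."""
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    n, k = len(values), len(groups)
    if k < 2:
        raise ValueError("need at least two classes")
    grand_mean = sum(values) / n
    means = {lab: sum(g) / len(g) for lab, g in groups.items()}
    ss_between = sum(len(g) * (means[lab] - grand_mean) ** 2 for lab, g in groups.items())
    ss_within = sum((v - means[lab]) ** 2 for lab, g in groups.items() for v in g)
    if ss_within == 0:
        return math.inf if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_k_best(X, y, k):
    """Indices of the k features with the highest ANOVA F scores, plus all scores."""
    n_features = len(X[0])
    scores = [anova_f([row[j] for row in X], y) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k]), scores


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
X = [[1.0, 5.0], [1.1, 3.0], [0.9, 4.0], [1.2, 6.0], [3.0, 4.5], [3.2, 5.5]]
y = [0, 0, 0, 0, 1, 1]

X_res, y_res = smote(X, y, k=5)
print(Counter(y_res))  # Counter({0: 4, 1: 4})
selected, scores = select_k_best(X, y, k=1)
print(selected)        # [0]
```

Oversampling is usually fitted on the training split only, so that synthetic rows derived from test samples cannot inflate the evaluation; the abstract does not state which split was resampled.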

Published

2025-06-15

How to Cite

Sinaga, T., Sirait, K. B., Pangaribuan, J. J., Barus, O. P., & Romindo, R. (2025). Analisis Kualitas Wine Menggunakan Machine Learning dengan Pendekatan SMOTE dan Seleksi Fitur. INSOLOGI: Jurnal Sains Dan Teknologi, 4(3), 656–668. https://doi.org/10.55123/insologi.v4i3.5436

Section

Articles