
K-Means Clustering with Wine
Nicholas Arndt-Kohlway
DATA630: Machine Learning (2215)
Professor Ami Gates

Introduction

K-Means clustering is used in this assignment to identify the factors associated with higher- and lower-quality red wines. The analysis looks for characteristics that red wines share and examines how those characteristics relate to differences in quality. For example, which characteristics affect the final quality of a wine? How can manufacturers ensure a higher-quality wine from the inputs used? And, crucially, which red wines are statistical anomalies that might distort the analysis? Identifying these anomalies, or outliers, is necessary because they affect the overall result of the clustering technique.

For background, red wine is an alcoholic beverage produced by fermenting the juice of dark grapes. The difference between red wine and white wine is the type of grape used: red wine uses dark-skinned grapes, whereas white wine uses light-skinned grapes. The pressed grape juice is fermented in contact with the dark grape skins, which add color, flavor, and tannin to the wine. Alcohol is produced when yeast converts the sugars in the grapes to ethanol and carbon dioxide.

Wine has four main characteristics: color, tannin, flavor, and acid. The color of red wine ranges from deep purple to light pink, depending on the grapes and the age of the wine. Tannins come from the skins, seeds, and even the stems of the grapes. They add texture, structure, and ageability to the wine. Tannins determine the dryness of the wine and soften over time, which makes red wine best consumed after a few years of aging. “Different grape varieties produce aromas of fruits, flowers, herbs, spices, and earthy characteristics. For example, Pinot Noir tends to have raspberry, cherry, and forest floor notes while Cabernet Sauvignon generally boasts notes of cassis, licorice, and wet gravel.”
Acid provides freshness and structure by acting as a preservative. The acidity produces tart and sour notes that balance the sweet and bitter tannin components.

K-Means clustering is the chosen method for this analysis because it groups wines with similar characteristics, which helps show which characteristics are most strongly associated with the quality of a red wine. It also helps identify outliers, which may weaken the resulting clusters. These outliers can also point to red wines that are exceptionally good or poor. From this, winemakers can identify which ingredients to use to create a better-quality wine and increase profitability.

Analysis

This dataset was provided by the UC Irvine Machine Learning Repository through Paulo Cortez at the University of Minho in Guimarães, Portugal. The dataset describes red variants of the Portuguese “Vinho Verde” wine. It includes the following variables: fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulfur dioxide, total sulfur dioxide, density, pH, sulphates, alcohol, and quality. Quality is the dependent variable in the dataset, while the rest are independent variables.

Fixed acidity had a minimum value of 4.6 and a maximum value of 15.9 grams per liter (g/L). Volatile acidity had a minimum value of 0.12 and a maximum value of 1.58 grams per liter (g/L). Citric acid had a minimum value of 0 and a maximum value of 1 gram per liter (g/L). Residual sugar had a minimum value of 0.9 and a maximum value of 15.5 grams per liter (g/L). Chlorides had a minimum value of 0.012 and a maximum value of 0.611 grams per liter (g/L). Free sulfur dioxide had a minimum value of 1 and a maximum value of 72 milligrams per liter (mg/L). Total sulfur dioxide had a minimum value of 6 and a maximum value of 289 milligrams per liter (mg/L). Density had a minimum value of 0.9901 and a maximum value of 1.0037 grams per milliliter (g/mL). pH had a minimum value
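The clustering and outlier screening described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in the assignment: the function names (`standardize`, `kmeans`, `flag_outliers`), the choice of two clusters, the two-standard-deviation cutoff, and the sample rows are all hypothetical. The variables are standardized first because their ranges differ widely (total sulfur dioxide spans 6 to 289, while density spans roughly 0.99 to 1.00), and unscaled Euclidean distances would be dominated by the wide-ranging variables.

```python
import math
import random


def standardize(rows):
    """Rescale each column to mean 0 and standard deviation 1 (z-scores)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has std 0; fall back to 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def kmeans(rows, k, iterations=100, seed=0):
    """Lloyd's algorithm: alternate the assignment and centroid-update steps."""
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]  # random initial centroids
    labels = None
    for _ in range(iterations):
        # Assignment step: each row joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(row, centroids[j]))
                      for row in rows]
        if new_labels == labels:  # no row changed cluster, so we have converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids


def flag_outliers(rows, labels, centroids, z=2.0):
    """Flag rows whose distance to their own centroid is unusually large.

    The cutoff (mean distance + z standard deviations) is an assumed rule
    of thumb, not a threshold taken from the assignment.
    """
    dists = [math.dist(row, centroids[lab]) for row, lab in zip(rows, labels)]
    mean = sum(dists) / len(dists)
    std = math.sqrt(sum((d - mean) ** 2 for d in dists) / len(dists))
    return [d > mean + z * std for d in dists]


# Hypothetical rows: (alcohol in % vol, volatile acidity in g/L).
# These values are illustrative placeholders, not records from the dataset.
wines = [
    [9.4, 0.70], [9.8, 0.88], [9.5, 0.76], [9.6, 0.66], [9.3, 0.72],
    [12.0, 0.30], [12.5, 0.28], [11.8, 0.35], [12.2, 0.32], [12.4, 0.26],
    [14.9, 1.50],
]
scaled = standardize(wines)
labels, centroids = kmeans(scaled, k=2)
flags = flag_outliers(scaled, labels, centroids)
for wine, label, flagged in zip(wines, labels, flags):
    print(wine, "cluster", label, "possible outlier" if flagged else "")
```

On the full dataset, the same steps would be applied to all of the independent variables, and the number of clusters would normally be chosen by comparing several values of `k` rather than fixed in advance. Quality would be left out of the distance calculation and compared against the cluster labels afterward.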
