Final_Project_milestone1_CharanThota

School: Northeastern University
Course: 6010
Subject: Statistics
Date: Jan 9, 2024

Wine Tasting Report

Introduction

The goal is to obtain a wine tasting data set and apply hypothesis testing, visualization, inferential statistical techniques, and basic descriptive statistics to it. To start any analysis, we must first understand the provided data set. The wine tasting data set contains examinations and reviews of wines based on their sensory qualities. While wine tasting is as old as wine itself, a more organized technique now exists: professional wine tasters describe the range of perceived aromas, tastes, and overall qualities of a wine. This data collection has 1100 observations separated into 14 categories. Before any analysis can begin, the data set needs to be transformed from raw data into usable data. This is done in the Data Analysis section using R.

Data Analysis Section

1. Import the wine tasting data set into R

The wine tasting data set is imported with the read.csv function and contains 13 columns. Of these, country, description, designation, province, region_1, region_2, taster_name, taster_twitter_handle, title, variety, and winery are categorical variables, and price and points are numerical variables.

2. Cleaning of the wine tasting data set:

Here, the raw data is converted into usable data for further analysis. Rows with null values are removed using the function na.omit(). Among the columns with null values, region_2 has the most, with 633. The cleaned data set is named winedata.[1]
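The import and cleaning steps can be sketched roughly as follows. This is a minimal illustration, not the report's actual script: the inline CSV text stands in for wine_tasting.csv, its rows are made up, and the na.strings setting (treating blank cells as missing) is an assumption about how the file encodes nulls.

```r
# Illustrative stand-in for wine_tasting.csv; these rows are made up.
csv_text <- "country,province,region_2,price,points
US,California,Napa,45,90
US,Oregon,,30,88
US,California,Sonoma,,91
US,New York,,25,86"

# read.csv parses the data; na.strings = c("", "NA") is an assumption
# that blank cells should count as missing values.
raw <- read.csv(text = csv_text, na.strings = c("", "NA"),
                stringsAsFactors = FALSE)

# Count missing values per column to see which column has the most.
missing_per_column <- colSums(is.na(raw))
print(missing_per_column)

# na.omit drops every row that has at least one missing value.
winedata <- na.omit(raw)
print(nrow(winedata))
```

Because na.omit() removes a row when any column is missing, a sparsely filled column such as region_2 can account for most of the rows that are dropped, so checking the per-column counts first shows how much data the cleaning step costs.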

3. Basic Descriptive Statistics of Wine tasting:

Observation: The basic descriptive statistics table covers only price and points because these are the only two numerical variables in the data set. The table shows the mean, median, range, and standard deviation. The mean wine price is 45.83 and the mean points are 89.63; the highest wine price is 200 and the lowest is 13; the maximum points are reported as 82 and the minimum as 86.
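A table like the one described could be built along these lines. This is a sketch under stated assumptions: the data frame values are made up for illustration, and describe is a hypothetical helper name, not a function from the report. The final check confirms that each mean lies between its minimum and maximum, which helps catch transposed values when copying figures out of a summary table.

```r
# Made-up values for illustration only; not the report's data.
winedata <- data.frame(
  price  = c(15, 25, 40, 60, 110),
  points = c(84, 87, 90, 92, 95)
)

# Hypothetical helper: summarise one numeric vector with the
# measures listed in the descriptive statistics table.
describe <- function(x) {
  c(mean = mean(x), median = median(x), sd = sd(x),
    min = min(x), max = max(x), range = max(x) - min(x))
}

# One column of statistics per numeric variable.
stats <- sapply(winedata[c("price", "points")], describe)
print(round(stats, 2))

# Sanity check: each mean must lie between the minimum and the maximum.
stopifnot(all(stats["min", ] <= stats["mean", ]),
          all(stats["mean", ] <= stats["max", ]))
```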


4. Number of reviews by Tasters

Observation: The bar plot shows the number of wine reviews given by each taster. Virginie Boone has given the highest number of reviews, and Roger and Kostrzewa have given the fewest.

5. Comparison of Price and Points

Observation: The scatter plot shows the relationship between the price and points of the wines. Most of the marks lie at prices between 10 and 100, and they are densest for points between 85 and 95 at prices near 50. A high-priced wine receives approximately 96 points, while a wine priced around 75 gets 95 points. Hence price and points appear to be positively correlated.

6. Number of reviews in different provinces

Observation: The bar plot shows the total number of reviews in each province: California, New York, Oregon, and Washington. California has the most reviews and New York has the fewest. Hence I use California for the next analysis.

7. Basic Descriptive Statistics of Wine tasting in California province:
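One way to produce the per-province review counts and the subset summarized in this section is sketched below. The rows are made up for illustration, and base R's subset() is used here so the example has no package dependencies; the report's own script may have used a different approach, such as dplyr's filter().

```r
# Made-up rows for illustration only; not the report's data.
winedata <- data.frame(
  province = c("California", "California", "Oregon",
               "California", "Washington", "New York"),
  price    = c(45, 60, 30, 28, 35, 25),
  points   = c(90, 92, 88, 89, 91, 86)
)

# Count reviews per province, largest first.
reviews <- sort(table(winedata$province), decreasing = TRUE)
print(reviews)

# Keep only the rows for the province with the most reviews.
top_province <- names(reviews)[1]
california <- subset(winedata, province == top_province)

# Minimum, quartiles, mean, and maximum for the numeric columns,
# followed by the standard deviation of each.
print(summary(california[c("price", "points")]))
print(sapply(california[c("price", "points")], sd))
```

Passing the same counts to barplot(reviews) would draw a bar plot of reviews per province.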

Observation: After plotting the number of reviews per province, we found that California had the most reviews. I therefore created a California subset for further analysis with inferential statistical methods. The table shows the mean, median, range, and standard deviation. The mean wine price is 49.58 and the mean points are 90.02; the median price is 45 and the median points are 90; the highest wine price in California is 200 and the lowest is 15; the maximum points are reported as 82 and the minimum as 86.

Summary:

• We imported the wine tasting data set and cleaned the raw data into usable data. We obtained a basic descriptive statistics table for price and points because these two are the numerical variables.
• Virginie Boone gave the highest number of wine reviews, while Roger and Kostrzewa gave the fewest.
• California receives the most reviews. The mean wine price in California (49.58) is higher than the overall mean price (45.83). California also has the highest wine price, 200, among the provinces.
• In conclusion, this report presents graphs and the observations drawn from them as groundwork for hypothesis testing and inferential statistics.

References:

1. Kabacoff, R. I. (2021). Quick-R: Missing Data. Statmethods. https://www.statmethods.net/input/missingdata.html
2. Toth, M. (2019, April 8). How to Filter in R: A Detailed Introduction to the dplyr Filter Function. R-Bloggers. https://www.r-bloggers.com/2019/04/how-to-filter-in-r-a-detailed-introduction-to-the-dplyr-filter-function/
3. R Coder. (2021, May 24). Boxplot in R. R CODER. https://r-coder.com/boxplot-r/

Appendix:

1. I used the wine_tasting.csv file for my analysis.
2. The R Markdown file Final_Project_milestone1_CharanThota.Rmd is attached.
