Final_Project_milestone2_CharanThota

School

Northeastern University

Course

6010

Subject

Statistics

Date

Jan 9, 2024

CHARAN THOTA Probability Theory and Intro to Statistics Final Project - Milestone 2 - 70525

Wine Tasting Dataset Analysis

Introduction

This project applies hypothesis testing and visualization to a wine tasting dataset. Before beginning any study, we must first understand the dataset. Wine tasting data records the evaluation and analysis of wine based on sensory characteristics. Although wine tasting is as ancient as wine itself, it can be done in a systematic manner by professional wine tasters, who describe a wine's range of smells, flavors, and overall attributes. The raw data must be converted into usable data before any analysis can occur, which can be done in R. After cleaning, the dataset contains 194 observations across 13 columns.

Data Analysis

1. Importing and cleaning of a wine tasting dataset in R

The wine tasting dataset is imported into R using read.csv. It contains 13 columns. The dataset is cleaned with the na.omit function, and the analysis is carried out on the cleaned data, which contains 194 observations. We can then perform hypothesis tests on the price of the wine in the dataset.

2. Visualize the Central Coast reviews by tasters and the prices of the wine
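The import, cleaning, and plotting steps for this section can be sketched roughly as follows. This is a minimal illustration and not the code from the attached R Markdown file: the column names (taster_name, region_1, price), the taster labels other than Matt Kettmann, and all data values are assumptions made up for the example.

```r
# Made-up rows standing in for Wine_tasting.csv. The column names and the
# values are assumptions for illustration, not taken from the report.
csv_text <- "taster_name,region_1,price
Matt Kettmann,Central Coast,38
Matt Kettmann,Central Coast,45
Matt Kettmann,Central Coast,30
Taster B,Central Coast,
Taster B,Central Coast,60
Taster C,Central Coast,25
Taster C,Napa,90"

# Import; an empty numeric field is read as NA.
wine_raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Drop every row that contains a missing value.
wine <- na.omit(wine_raw)

# Keep only the Central Coast reviews and count reviews per taster.
central <- subset(wine, region_1 == "Central Coast")
reviews_by_taster <- table(central$taster_name)

# Draw to a null device so the sketch also runs non-interactively;
# remove the pdf(NULL) and dev.off() lines to see the plots on screen.
pdf(NULL)
barplot(reviews_by_taster,
        main = "Central Coast reviews by taster",
        ylab = "Number of reviews")
boxplot(central$price, horizontal = TRUE,
        main = "Central Coast wine prices", xlab = "Price")
invisible(dev.off())

# Median price of the filtered wines.
median_price <- median(central$price)
```

With the real file, the read.csv(text = ...) call would be replaced by read.csv("Wine_tasting.csv"), and the column names would need to match those in the file.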
Observation: From the bar plot of reviews by taster in the Central Coast region, we can observe that Matt Kettmann has the highest number of reviews compared with the other two tasters. From the boxplot of wine prices in the Central Coast region, we can say that the price distribution is right-skewed, with a few outliers present. The median price of the wines is approximately 38.

3. Scatter plot of price and points of wine in Pinot Noir Variety

Observation: This scatter plot shows the relationship between price and points for wines of the Pinot Noir variety. Most of the data points are concentrated at prices between 20 and 85. A high-priced wine receives approximately 92 points, while a low-priced wine receives fewer points, approximately 84. We can conclude that price and points for Pinot Noir are positively correlated.

4. One-sided T-test of wine price in a wine tasting data set

Observation: From the one-sided test of wine price, the p-value is 0.4618, which is greater than the threshold of 0.05, so we fail to reject the null hypothesis that the mean price is 46. The sample mean is 45.82, and the hypothesized mean lies within the 95 percent confidence interval.

5. Two-sided T-test of wine price in a wine tasting data set
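The one-sided test from section 4 and the two-sided test in this section could be run along the following lines. This is a sketch under stated assumptions: the price vector is invented, and the direction of the one-sided alternative ("less") is a guess, since the report does not state it. Only the hypothesized mean of 46 and the 0.05 threshold come from the report.

```r
# Hypothetical prices; the report's tests used the 194 cleaned observations.
price <- c(38, 52, 45, 61, 29, 47, 55, 40, 44, 48)

mu0 <- 46      # hypothesized mean from the report
alpha <- 0.05  # significance threshold from the report

# One-sided test: H0: mean = 46 versus H1: mean < 46.
# The direction "less" is an assumption made for this sketch.
one_sided <- t.test(price, mu = mu0, alternative = "less")

# Two-sided test: H0: mean = 46 versus H1: mean != 46.
two_sided <- t.test(price, mu = mu0, alternative = "two.sided")

# Fail to reject H0 when the p-value exceeds the threshold.
reject_one_sided <- one_sided$p.value < alpha
reject_two_sided <- two_sided$p.value < alpha

# The 95 percent confidence interval from the two-sided test.
ci <- two_sided$conf.int
```

When the sample mean falls on the side named by the alternative, the one-sided p-value is half the two-sided p-value. The reported values of 0.4618 and 0.923 are consistent with that relationship up to rounding.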
Observation: From the two-sided test of wine price, the p-value is 0.923, which is greater than the threshold of 0.05, so we fail to reject the null hypothesis that the mean price is 46. The sample mean is 45.82, and the 95 percent confidence interval, 42.33174 to 49.32805, contains the hypothesized mean.

6. One-sided T-test of wine price in California province from the wine tasting dataset

Observation: From the one-sided test of wine price in California, the p-value is 0.432, which is greater than the threshold of 0.05, so we fail to reject the null hypothesis that the mean price is 50. The sample mean is 49.58, and the hypothesized mean lies within the 95 percent confidence interval.
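To make the California test more concrete, the sketch below filters a small made-up data frame by province, runs the one-sided test, and then recomputes the t statistic and p-value by hand. The column names (province, price), the data values, and the direction of the alternative are assumptions for illustration; only the hypothesized mean of 50 comes from the report.

```r
# Made-up rows; `province` and `price` are assumed column names.
wine <- data.frame(
  province = c("California", "California", "Oregon", "California",
               "California", "Washington", "California", "California"),
  price    = c(42, 55, 30, 48, 60, 35, 49, 44)
)

# Keep only the California prices.
ca_price <- subset(wine, province == "California")$price

mu0 <- 50  # hypothesized mean from the report

# One-sided test; the direction "less" is an assumption.
ca_test <- t.test(ca_price, mu = mu0, alternative = "less")

# The same statistic by hand: t = (mean - mu0) / (sd / sqrt(n)).
n <- length(ca_price)
t_manual <- (mean(ca_price) - mu0) / (sd(ca_price) / sqrt(n))

# Lower-tail probability of the t distribution with n - 1 degrees of freedom.
p_manual <- pt(t_manual, df = n - 1)
```

The manual values agree with the output of t.test, which shows that the reported p-value depends only on the sample mean, the sample standard deviation, and the number of observations.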
7. Two-sided T-test of wine price in California province from the wine tasting dataset

Observation: From the two-sided test of wine price in California, the p-value is 0.864, which is greater than the threshold of 0.05, so we fail to reject the null hypothesis that the mean price is 50. The sample mean is 49.58, and the 95 percent confidence interval, 44.73010 to 54.43383, contains the hypothesized mean.

Summary

After cleaning the dataset, we visualized the number of reviews by each taster in the Central Coast region, where Matt Kettmann has the most reviews. The boxplot of prices in the Central Coast region is right-skewed, with a median of approximately 38. We then performed one-sided and two-sided t-tests for the price of wines in the full dataset; in both cases the p-value is greater than the threshold, so we fail to reject the null hypothesis. Finally, we performed one-sided and two-sided t-tests for the price of wines in California province; again, both p-values are greater than the threshold, so we fail to reject the null hypothesis in both cases.

References

1. Toth, M. (2019, April 8). How to Filter in R: A Detailed Introduction to the dplyr Filter Function. R-Bloggers. https://www.r-bloggers.com/2019/04/how-to-filter-in-r-a-detailed-introduction-to-the-dplyr-filter-function/
2. Spector, P. (2021). Using t-tests in R | Department of Statistics. Berkeley. https://statistics.berkeley.edu/computing/r-t-tests
3. Kabacoff, R. I. (2021). Quick-R: Missing Data. Statmethods. https://www.statmethods.net/input/missingdata.html

Appendix

1. The data set used for the analysis is named Wine_tasting.csv.
2. The attached R Markdown file is named Final_Project_milestone2_CharanThota.Rmd.