Week 3 Project 2

School

Cumberland University

Course

441

Subject

Statistics

Date

Apr 3, 2024

Week 3 Project
Manisha Reddy Yerla
Masters in data science, University of Cumberland’s
2023 Fall - Statistics for Data Science (MSDS-531-B02) - Second Bi-term
Dr. Mina Richards
November 10, 2023

Project 2: Decision making based on historical data

1. Explain the variance and skewness

a. Show a simple example of how to calculate variance and then explain the meaning of it.

Variance measures the spread, or dispersion, of a set of values. It quantifies how far the values in a dataset lie from the mean (average) of the dataset. A low variance indicates that the data points tend to be close to the mean, while a high variance indicates that the data points are spread out over a wider range.

The formula for the (population) variance of a dataset with n values $x_1, x_2, \ldots, x_n$ is:

$$\mathrm{Variance} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

where n is the number of data points, $x_i$ represents the individual data points, and $\bar{x}$ is the mean.

Example of Calculating Variance: Let's say we have the dataset 3, 5, 7, 10, 12. To calculate the variance, first find the mean:

$$\bar{x} = \frac{3 + 5 + 7 + 10 + 12}{5} = \frac{37}{5} = 7.4$$

Then calculate the squared differences from the mean and average them:

$$\mathrm{Variance} = \frac{(3 - 7.4)^2 + (5 - 7.4)^2 + (7 - 7.4)^2 + (10 - 7.4)^2 + (12 - 7.4)^2}{5} = \frac{19.36 + 5.76 + 0.16 + 6.76 + 21.16}{5} = \frac{53.2}{5} = 10.64$$

So the population variance of the dataset is 10.64.

Calculating variance in R: Variance is calculated using the var() function in R. Note that var() divides by n − 1 rather than n, so it returns the sample variance, 53.2/4 = 13.3, not 10.64.

    # Create a sample numeric vector
    data <- c(3, 5, 7, 10, 12)
    # Calculate the sample variance (denominator n - 1); gives 13.3
    variance <- var(data)
    # Rescale to the population variance from the formula above; gives 10.64
    pop_variance <- variance * (length(data) - 1) / length(data)

b. Show a simple example of how to calculate skewness and then explain the meaning of it.

Skewness indicates whether the data is skewed to the left (negatively skewed), symmetrical, or skewed to the right (positively skewed).

Negative Skewness: If skewness is negative, the left tail of the distribution is longer or fatter than the right tail. In simpler terms, most of the data points are concentrated on the right side of the mean, and some unusually small values pull the mean to the left.

Positive Skewness: If skewness is positive, the right tail of the distribution is longer or fatter than the left tail. In this case, most of the data points are concentrated on the left side of the mean, and some unusually large values pull the mean to the right.

Understanding skewness helps in assessing the shape of the distribution and provides insight into the patterns within the data. Skewness can be calculated using the following formula:

$$\mathrm{Skewness} = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^3$$

where n is the number of data points, $x_i$ represents the individual data points, $\bar{x}$ is the mean, and s is the sample standard deviation of the dataset.

Example of Calculating Skewness: Let's use the same dataset as before: 3, 5, 7, 10, 12. We have already calculated the mean ($\bar{x} = 7.4$) and the population variance (10.64).
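As a cross-check on the hand arithmetic, the following minimal sketch evaluates the variance and skewness formulas stated above directly in base R, without any add-on package. The variable names (x, pop_var, samp_var, skew) are chosen here for illustration and do not come from the assignment.

```r
# Dataset from the worked example
x <- c(3, 5, 7, 10, 12)
n <- length(x)
x_bar <- mean(x)                      # 7.4

# Population variance: squared deviations divided by n
pop_var <- sum((x - x_bar)^2) / n     # 10.64

# Sample variance: squared deviations divided by n - 1 (what var() returns)
samp_var <- var(x)                    # 13.3

# Skewness formula from the text, using the sample standard deviation sd(x)
z <- (x - x_bar) / sd(x)
skew <- n / ((n - 1) * (n - 2)) * sum(z^3)

print(c(pop_var = pop_var, samp_var = samp_var, skew = skew))
```

The gap between pop_var and samp_var comes only from the choice of denominator, which is worth keeping in mind when comparing a hand calculation with the output of var().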
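The skewness formula uses the sample standard deviation s, which divides the sum of squared deviations from the mean (53.2 for this dataset) by n − 1 rather than n:

$$s = \sqrt{\frac{53.2}{4}} = \sqrt{13.3} \approx 3.647$$

Using the skewness formula:

$$\mathrm{Skewness} = \frac{5}{(5-1)(5-2)} \left[ \left(\frac{3 - 7.4}{3.647}\right)^3 + \left(\frac{5 - 7.4}{3.647}\right)^3 + \left(\frac{7 - 7.4}{3.647}\right)^3 + \left(\frac{10 - 7.4}{3.647}\right)^3 + \left(\frac{12 - 7.4}{3.647}\right)^3 \right]$$

$$= \frac{5}{4 \times 3} \left[ -1.756 - 0.285 - 0.001 + 0.362 + 2.007 \right] = \frac{5}{12} \times 0.327 \approx 0.136$$

So the skewness of the dataset is approximately 0.136, which means the data is only slightly right-skewed.

Calculating Skewness in R: Base R does not include a skewness function. A skewness() function is available in add-on packages such as e1071 and moments; the example below assumes e1071, where type = 2 corresponds to the formula above.

    # Load a package that provides skewness() (assumed here: e1071)
    library(e1071)
    # Create a sample numeric vector
    data <- c(3, 5, 7, 10, 12)
    # Calculate skewness; type = 2 matches the formula above
    skew <- skewness(data, type = 2)

2. After loading dataG2.csv into R or Octave, explain the meaning of each column or what the attributes explain. Columns are for skewness, median, mean, standard deviation, and the last price (each row describes, with these numbers, the distribution of the stock prices).

The data in the "dataG2.csv" file describes distributions of stock prices. Each row represents one distribution, and there are five columns with the following attributes: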
Skewness: This is the first column. Skewness measures the asymmetry of a distribution. A positive value indicates a right-skewed distribution (tail on the right), while a negative value indicates a left-skewed distribution (tail on the left).

Median: This is the second column and contains the median of each distribution. The median is a measure of central tendency and represents the middle value of the sorted data. It is less affected by extreme outliers than the mean.

Mean: This is the third column and contains the mean (average) of each distribution. The mean is another measure of central tendency, calculated by summing all values and dividing by the number of values.

Standard Deviation: This is the fourth column and contains the standard deviation of each distribution. Standard deviation measures the spread or dispersion of data points in a distribution. Smaller values indicate less spread, while larger values indicate more spread.

Last Price: This is the fifth column and contains the last price of the stock. This value could be the most recent closing price of the stock in each distribution.

Overall, this dataset provides statistical information about different distributions of stock prices. It includes information about skewness, central tendency (median and mean), spread (standard deviation), and the most recent price. These statistics can be used to analyse and compare the characteristics of different stock price distributions.

3. Draw your own conclusions based on what you learned under 1. and 2.

a. Explain the meaning of variables 'I_1' and 'I_2' after you execute (after dataG2.csv is loaded in R or Octave)

    imported_data <- read.csv("dataG2.csv")
    S = imported_data[, 5] - imported_data[, 3]
    I_1 = which.min(S) # use figure I_1 (see attached)
    I_2 = which.max(S) # use figure I_2 (see attached)

S is the signed difference between the last price (column 5) and the mean (column 3) for each row.

I_1 = which.min(S): This line finds the index of the minimum value in the vector S and assigns it to the variable I_1. Because S is signed, the minimum is the most negative difference when negative values are present, so I_1 identifies the row whose last price lies furthest below its mean (not the row whose last price is closest to the mean).

I_2 = which.max(S): This line finds the index of the maximum value in the vector S and assigns it to the variable I_2. It identifies the row whose last price lies furthest above its mean.

These indices identify the rows (distributions) in the dataset with the lowest and highest last price relative to the historical mean.

b. Based on the results in a., which row (stock) would you buy and sell and why (if you believe history repeats)?

Given the information provided, we would consider buying and selling based on the rows represented by indices I_1 and I_2, as calculated in the R code.

Buy Decision (I_1): The row at index I_1 is the one where the last price minus the mean is the smallest. If that value is negative, the stock is currently trading below its historical mean, which we might interpret as a potential buying opportunity. If history repeats, the price can be expected to move back towards the mean, so we may anticipate that the price could increase in the future.
Sell Decision (I_2): The row at index I_2 is the one where the last price minus the mean is the largest, which we might interpret as a potential selling opportunity. The current price is high relative to the historical mean, so if history repeats we may anticipate that the price could decrease in the future.

c. Explain how would you use the skewness (first column attribute) to decide about buying or selling a stock.

Positive Skewness (Right-skewed distribution): If the skewness is positive, the distribution has a longer right tail, indicating the potential for positive outliers or higher returns. A right-skewed distribution might be interpreted as having the potential for favourable returns. In this case, we might consider buying the stock, expecting the possibility of higher returns.

Negative Skewness (Left-skewed distribution): If the skewness is negative, the distribution has a longer left tail, indicating the potential for negative outliers or lower returns. A left-skewed distribution might be interpreted as having the potential for unfavourable returns. In this case, we might consider selling the stock, anticipating the possibility of lower returns.

4. If you want to decide, based on the historical data, which row (stock) to buy or sell, would you base your decision on skewness attribute (1st column) or the differences between the last prices with mean (differences between 5th attribute and 3rd attribute)? Explain.

To decide whether to buy or sell a stock based on historical data, it is important to consider the factors that provide meaningful insight into the stock's behaviour. Let's analyse both the skewness (1st column) and the differences between the last prices and mean (5th attribute minus 3rd attribute).

Skewness (1st Column): If skewness is significantly positive, it might suggest that the stock has experienced occasional unusually high prices, while a significantly negative skewness might suggest occasional unusually low prices. If we are looking for signs of a potential trend in the stock's performance, skewness might be more indicative: a positive skewness could suggest an upward trend, while a negative skewness could suggest a downward trend. Skewness describes only the shape of the historical distribution, however, and says nothing about where the current price sits.

Differences between Last Prices and Mean (5th Attribute - 3rd Attribute): A positive difference means that the last price is above the historical average, which might indicate positive market sentiment but also that the stock may be overvalued. A negative difference means that the stock is trading below its historical average, possibly indicating a dip in performance or that the stock is undervalued. If we are more interested in the current position of the stock relative to its historical average, focusing on the differences between the last prices and mean could provide insight into whether the stock is overvalued or undervalued.

The decision on whether to buy or sell a stock based on historical data may involve considering both skewness and the differences between the last prices and mean.
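To make the comparison concrete, here is a minimal sketch that puts both signals side by side for the rows selected in part 3. Because dataG2.csv is not reproduced in this report, the data frame below is a hypothetical stand-in: its numbers and column names are invented for illustration, and only the column order (skewness, median, mean, standard deviation, last price) follows the description in part 2.

```r
# Hypothetical stand-in for dataG2.csv; values and column names are
# made up for illustration only
imported_data <- data.frame(
  skewness   = c(0.8, -0.3, 0.1, -1.2),
  median     = c(50, 42, 75, 30),
  mean       = c(52, 41, 75, 28),
  sd         = c(4, 6, 3, 5),
  last_price = c(47, 49, 76, 27)
)

# Signed gap between the last price and the historical mean
S <- imported_data[, 5] - imported_data[, 3]

I_1 <- which.min(S)   # row whose last price is furthest below its mean
I_2 <- which.max(S)   # row whose last price is furthest above its mean

# Show the gap and the skewness together for the two candidate rows
summary_tbl <- data.frame(
  row      = c(I_1, I_2),
  action   = c("buy", "sell"),
  gap      = S[c(I_1, I_2)],
  skewness = imported_data[c(I_1, I_2), 1]
)
print(summary_tbl)
```

With the real file, the data.frame() call would be replaced by read.csv("dataG2.csv"). When the two signals disagree for a row (for example, a large positive gap together with positive skewness), the table makes that conflict visible rather than resolving it.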