6501 hw 02_15_23

School: Georgia Institute Of Technology
Course: 6501
Subject: Statistics
Date: Feb 20, 2024
Type: pdf
Pages: 6
Uploaded by MajorOtterMaster1158
Question 8.1

Describe a situation or problem from your job, everyday life, current events, etc., for which a linear regression model would be appropriate. List some (up to 5) predictors that you might use.

I would use a linear regression model to predict Tesla's stock price from Elon Musk's tweets. Using Twitter metrics such as likes, replies, and retweets (or other metrics Twitter may provide) as predictors, we could test whether tweet engagement has a measurable relationship with the stock price. For a more detailed perspective, we would pair the metrics of Musk's most recent tweet with the stock price at that moment. The main programmatic issue is the frequency of his tweets: sometimes he tweets 20 times in 10 minutes, and sometimes he does not tweet for hours, so exponential smoothing might help even out the series. With enough past data, we could estimate the likelihood of Tesla stock rising or falling after a given tweet.

Question 8.2

Using crime data from http://www.statsci.org/data/general/uscrime.txt (file uscrime.txt, description at http://www.statsci.org/data/general/uscrime.html), use regression (a useful R function is lm or glm) to predict the observed crime rate in a city with the following data:

M = 14.0, So = 0, Ed = 10.0, Po1 = 12.0, Po2 = 15.5, LF = 0.640, M.F = 94.0, Pop = 150, NW = 1.1, U1 = 0.120, U2 = 3.6, Wealth = 3200, Ineq = 20.1, Prob = 0.04, Time = 39.0

Show your model (factors used and their coefficients), the software output, and the quality of fit. Note that because there are only 47 data points and 15 predictors, you'll probably notice some overfitting. We'll see ways of dealing with this sort of problem later in the course.

We used the lm function in R to fit a linear regression model to this data. The summary of the model is as follows:

Call:
lm(formula = Crime ~ M + So + Ed + Po1 + Po2 + LF + M.F + Pop + NW + U1 + U2 + Wealth + Ineq + Prob + Time, data = crime_data)
Residuals:
    Min      1Q  Median      3Q     Max
-395.74  -98.09   -6.69  112.99  512.67

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -5.984e+03  1.628e+03  -3.675 0.000893 ***
M            8.783e+01  4.171e+01   2.106 0.043443 *
So          -3.803e+00  1.488e+02  -0.026 0.979765
Ed           1.883e+02  6.209e+01   3.033 0.004861 **
Po1          1.928e+02  1.061e+02   1.817 0.078892 .
Po2         -1.094e+02  1.175e+02  -0.931 0.358830
LF          -6.638e+02  1.470e+03  -0.452 0.654654
M.F          1.741e+01  2.035e+01   0.855 0.398995
Pop         -7.330e-01  1.290e+00  -0.568 0.573845
NW           4.204e+00  6.481e+00   0.649 0.521279
U1          -5.827e+03  4.210e+03  -1.384 0.176238
U2           1.678e+02  8.234e+01   2.038 0.050161 .
Wealth       9.617e-02  1.037e-01   0.928 0.360754
Ineq         7.067e+01  2.272e+01   3.111 0.003983 **
Prob        -4.855e+03  2.272e+03  -2.137 0.040627 *
Time        -3.479e+00  7.165e+00  -0.486 0.630708
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 209.1 on 31 degrees of freedom
Multiple R-squared: 0.8031, Adjusted R-squared: 0.7078
F-statistic: 8.429 on 15 and 31 DF, p-value: 3.539e-07

This output shows the multiple linear regression model we fit in R. The residuals are the differences between the observed values of the dependent variable (here, the crime rate) and the values predicted by the model; more specifically, a residual is the vertical distance between an observed data point and the regression line. Ideally, residuals should be small and randomly distributed around zero. Ours are centered near zero but quite large, which suggests one of two things: either the model does not fit the data well and something is missing, or the crime rate and its predictors are simply on very different orders of magnitude, making the residuals large in absolute terms. Scaling all of the variables to similar orders of magnitude would help determine which of these is the case. The coefficient table has four columns. An explanation of the full output is as follows:

1. Estimate - the estimated value of the regression coefficient for each predictor variable. It represents the change in the dependent variable (crime rate) associated with a one-unit change in that predictor, holding all other variables constant.
2. Std. Error - the standard error of the coefficient estimate. It represents how much the estimate is expected to vary from the true population value across different samples; a smaller standard error indicates a more precise estimate.

3. t value - the t-statistic for the coefficient estimate: the ratio of the estimated coefficient to its standard error. It measures how far the estimate deviates from zero relative to its variability; a larger t value indicates a larger deviation from zero and a higher degree of statistical significance.

4. Pr(>|t|) - the p-value associated with the t-statistic. It is the probability of observing a t-statistic as extreme or more extreme than the observed value, assuming the null hypothesis (that the coefficient is zero) is true. A smaller p-value means the estimate is less likely to be observed by chance, i.e., a higher degree of statistical significance.

5. Signif. codes - the symbols that summarize each coefficient's statistical significance, based on its p-value. Per the legend in the output, '***' means p < 0.001, '**' means p < 0.01, '*' means p < 0.05, and '.' means p < 0.1.

6. Residual standard error - the estimated standard deviation of the error terms (residuals) in the regression model. Here it is 209.1, meaning that on average the predicted crime rate is expected to differ from the observed crime rate by about 209.1.

7. Multiple R-squared - the proportion of the variance in the dependent variable (crime rate) that is explained by the independent variables in the model.
In this case, the multiple R-squared value is 0.8031, meaning the independent variables in the model explain 80.31% of the variance in the crime rate.

8. Adjusted R-squared - a modified version of the multiple R-squared that adjusts for the number of independent variables in the model, giving a more conservative estimate of the variance explained. Here it is 0.7078, somewhat lower than the multiple R-squared because the model includes 15 predictor variables.

9. F-statistic - a test of the overall significance of the regression model, comparing the variance explained by the model to the variance left unexplained. A larger F-statistic indicates a more significant fit of the model to the data. In this case, the F-statistic is
8.429 with 15 and 31 degrees of freedom, which means the regression model as a whole is statistically significant.

10. p-value - the probability of obtaining an F-statistic as extreme or more extreme than the observed one, assuming the null hypothesis that all regression coefficients are zero. Here the p-value is 3.539e-07, which is very small and indicates strong evidence against that null hypothesis.

Plotting the data shows that it follows an approximately normal distribution, meaning scaling isn't strictly necessary for this data set, though scaling may still have beneficial effects.
The data also holds its linearity fairly well, meaning a linear regression is appropriate. The plots, produced with the car package, each graph a single predictor variable against the response; a fitted line with a slope further from zero indicates a stronger relationship between the two variables.
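As a quick sanity check on the explanation of the output above, several of the reported quantities can be reproduced by hand from the coefficient table. This is an illustrative sketch (not part of the original submission); the numbers are copied from the summary output, and 31 is the residual degrees of freedom (47 observations minus 15 predictors minus 1).

```r
# t value = Estimate / Std. Error, e.g. for predictor M:
t_M <- 8.783e+01 / 4.171e+01          # ~2.106, as reported

# Two-sided p-value from the t distribution with 31 residual df:
p_M <- 2 * pt(-abs(t_M), df = 31)     # close to the reported 0.043443

# F-statistic from R-squared, 15 predictors, 47 observations:
r2 <- 0.8031
f_stat <- (r2 / 15) / ((1 - r2) / (47 - 15 - 1))   # ~8.429, as reported
```

These identities are why the t value, p-value, and F-statistic columns move together: each is a deterministic function of the estimates, standard errors, and R-squared.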
For the prediction, we loaded the new data point manually as a one-row data frame. With the given values, the model predicts a crime rate of about 155 offenses per 100,000 population (the units of the 1960 data). If I were to do this again, I would scale the data: I don't think it would change the prediction at all, but it would make it much more obvious which factors were affecting the crime rate.
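The workflow described above can be sketched as follows. This reconstruction assumes the data is read directly from the statsci.org URL given in the question (the original submission's code is not shown, though the formula and the data frame name crime_data match the Call in the output above).

```r
# Read the crime data (tab-delimited with a header row).
crime_data <- read.table("http://www.statsci.org/data/general/uscrime.txt",
                         header = TRUE)

# Fit the regression on all 15 predictors, as in the summary output above.
model <- lm(Crime ~ M + So + Ed + Po1 + Po2 + LF + M.F + Pop + NW +
              U1 + U2 + Wealth + Ineq + Prob + Time, data = crime_data)
summary(model)

# The new city, loaded manually as a one-row data frame:
new_city <- data.frame(M = 14.0, So = 0, Ed = 10.0, Po1 = 12.0, Po2 = 15.5,
                       LF = 0.640, M.F = 94.0, Pop = 150, NW = 1.1,
                       U1 = 0.120, U2 = 3.6, Wealth = 3200, Ineq = 20.1,
                       Prob = 0.04, Time = 39.0)
predict(model, new_city)  # ~155 offenses per 100,000 population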
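The scaling idea raised twice above can be illustrated on R's built-in mtcars data (used here only so the example is self-contained; the crime data behaves the same way): standardizing the predictors changes the coefficient magnitudes, making them directly comparable, but leaves the fitted values and predictions unchanged.

```r
# Fit on raw predictors, then on standardized (mean 0, sd 1) predictors.
raw_fit    <- lm(mpg ~ wt + hp + disp, data = mtcars)
scaled     <- data.frame(mpg = mtcars$mpg,
                         scale(mtcars[, c("wt", "hp", "disp")]))
scaled_fit <- lm(mpg ~ wt + hp + disp, data = scaled)

# Coefficients differ (now in "per standard deviation" units) ...
coef(raw_fit)["hp"]; coef(scaled_fit)["hp"]
# ... but the fitted values are identical:
max(abs(fitted(raw_fit) - fitted(scaled_fit)))  # effectively 0
```

This is why scaling would not change the ~155 prediction above while still making it clearer which factors drive the response.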