19BCE1567_LAB1
19BCE1567    13/01/2022    SARA KULKARNI
L21+L22: PROF LAKSHMI PATHI
ESSENTIALS OF DATA ANALYTICS LAB 1

Tasks for Week-1: Regression
Understand the following operations/functions on a random dataset and perform similar operations on the mtcars and 'data.csv' datasets based on the given instructions.

Aim: To develop a linear regression model for the given data using R programming and to test the null hypothesis.

1. MTCARS DATASET

ALGORITHM:
1. Start
2. Load the mtcars data
3. Split the data into training and testing data
4. Use the lm command to fit a linear model for the target variable with respect to the independent (predictor) variable
5. Print the lmModel
6. Print the summary of the model
7. Predict the target variable using the linear model and compare with the actual values in the test dataset

STATISTICS:
Call:
lm(formula = mpg ~ wt, data = train1)

Coefficients:
(Intercept)           wt
     37.490       -5.422

Call:
lm(formula = mpg ~ wt, data = train1)

Residuals:
    Min      1Q  Median      3Q     Max
-4.5012 -2.3686 -0.2967  1.3515  6.8391

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  37.4900     2.4144  15.528 2.44e-13 ***
wt           -5.4224     0.7193  -7.538 1.56e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.272 on 22 degrees of freedom
Multiple R-squared: 0.7209, Adjusted R-squared: 0.7082
F-statistic: 56.83 on 1 and 22 DF, p-value: 1.561e-07

INFERENCE:
The p-value for wt (1.56e-07) is less than 0.05, so we reject the null hypothesis that the slope is zero: there is a statistically significant linear relationship between mpg and wt. The R-squared of 0.7209 shows that wt explains about 72% of the variance in mpg in the training data. Hence this model is selected. [ACCEPTED]

Comparing the predicted values with actual values in the test dataset

Program:
data1 <- mtcars
library(dplyr)
library(caTools)
set.seed(123)
# Split on the response vector so that rows are sampled;
# passing the whole data frame makes sample.split() sample over columns
sample <- sample.split(data1$mpg, SplitRatio = 0.75)
# Training dataset: rows marked TRUE
train1 <- subset(data1, sample == TRUE)
# Testing dataset: rows marked FALSE
test1 <- subset(data1, sample == FALSE)
dim(test1)
lmmodel <- lm(mpg ~ wt, data = train1)
print(lmmodel)
summary(lmmodel)
test1$Predictedmpg <- predict(lmmodel, test1)
# Print the top 6 rows of actual and predicted mpg
head(test1[, c("mpg", "Predictedmpg")])

2. SAMPLE DATASET

ALGORITHM:
1. Start
2. Read data.csv from files
3. Split the data into training and testing data
4. Use the lm command to fit a linear model for the target variable with respect to the independent (predictor) variable
5. Print the lmModel
6. Print the summary of the model
7. Check the p-value; since it is greater than 0.05, reject the model

STATISTICS:
Call:
lm(formula = Weight ~ Height, data = train)

Coefficients:
(Intercept)       Height
    17.0547       0.5347

Call:
lm(formula = Weight ~ Height, data = train)

Residuals:
    Min      1Q  Median      3Q     Max
-64.904 -25.591  -0.766  29.361  56.206

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  17.0547    49.5133   0.344   0.7320
Height        0.5347     0.2911   1.837   0.0722 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 31.59 on 50 degrees of freedom
Multiple R-squared: 0.0632, Adjusted R-squared: 0.04446
F-statistic: 3.373 on 1 and 50 DF, p-value: 0.07222

INFERENCE:
Since the p-value for Height (0.0722) is greater than 0.05, we fail to reject the null hypothesis that the slope is zero: at the 5% level there is no statistically significant linear relationship between Weight and Height in our dataset. The R-squared of 0.0632 shows that Height explains only about 6% of the variance in Weight. Hence this model is rejected. [REJECTED]
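The inference above rests on the t value and p-value reported for Height. As a rough check, they can be recomputed from the printed estimate, standard error and residual degrees of freedom alone. The sketch below assumes only the rounded figures from the summary (0.5347, 0.2911, 50); the variable names are illustrative and not part of the original program.

```r
# Values copied from the summary output above (rounded as printed)
est <- 0.5347   # slope estimate for Height
se  <- 0.2911   # standard error of the slope
df  <- 50       # residual degrees of freedom

# t statistic for H0: slope = 0
t_stat <- est / se

# Two-sided p-value from the t distribution with df degrees of freedom
p_val <- 2 * pt(-abs(t_stat), df)

# With a single predictor, the F statistic equals the squared t statistic
f_stat <- t_stat^2

# Decision at the 5% significance level
reject_h0 <- p_val < 0.05

print(c(t = t_stat, p = p_val, F = f_stat))
print(reject_h0)
```

Because the inputs are rounded, the results should agree with the reported t value, p-value and F-statistic only up to rounding. The decision (fail to reject at 0.05) does not depend on that rounding.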
PROGRAM:
data1 <- read.csv("data.csv", header = TRUE, sep = ",")
head(data1)
library(caret)
# Split data into train and test
index <- createDataPartition(data1$Weight, p = .10, list = FALSE)
train <- data1[index, ]
test <- data1[-index, ]
# Check the dimensions of train
dim(train)
lmModel <- lm(Weight ~ Height, data = train)
# Print the model object
print(lmModel)
summary(lmModel)
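The program above stops at the model summary, whereas the first task also compares predictions with the actual values in the test set. Since data.csv is not available here, the following is a minimal sketch of that comparison step on the built-in mtcars data. It assumes a base-R random split in place of caTools or caret, and it adds an RMSE summary that the original report does not compute. All object names are illustrative.

```r
# Built-in dataset, so the sketch runs without external files or packages
data(mtcars)
set.seed(123)

# Base-R split: 75% of the rows for training, the rest for testing
n <- nrow(mtcars)
train_idx <- sample(seq_len(n), size = floor(0.75 * n))
train <- mtcars[train_idx, ]
test  <- mtcars[-train_idx, ]

# Fit the simple linear regression on the training rows only
fit <- lm(mpg ~ wt, data = train)

# Predict mpg for the held-out rows
pred <- predict(fit, newdata = test)

# Side-by-side comparison of actual and predicted values
comparison <- data.frame(actual = test$mpg, predicted = pred)
print(head(comparison))

# Root mean squared error on the test set (in mpg units)
rmse <- sqrt(mean((test$mpg - pred)^2))
print(rmse)
```

The same pattern should carry over to the Weight ~ Height model by swapping in the train and test objects from the program above, although with a rejected model the test error is mainly useful as a baseline.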