Assignment_1
DANA4800 – SPRING 2024
GETTING TO KNOW R LANGUAGE ASSIGNMENT 1.1
Name : Komalpreet
Student Number : 100420375
Question 1:
1.
Does the dataset have missing values? Impute the missing values using median and mode.
The dataset contains missing values in the 'horsepower' column: five entries are recorded as the '?' symbol. Because of these entries, the 'horsepower' column was initially read with the 'character' data type. To perform numeric operations, I converted the column to 'numeric', which replaces the '?' values with NA.
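As an alternative to converting the column afterwards, the '?' marker can be declared as missing when the data is read. The following is only a minimal sketch: the inline sample text and the names csv_text and sample_data are hypothetical and stand in for the real data file.

```r
# Hypothetical inline sample standing in for the real data file.
csv_text <- "mpg,horsepower
18,130
25,?
22,95
21,?"

# na.strings tells read.csv to treat "?" as missing, so the column
# is parsed as a number instead of character text.
sample_data <- read.csv(text = csv_text, na.strings = "?")

class(sample_data$horsepower)       # column type after reading
sum(is.na(sample_data$horsepower))  # number of missing entries
```

With this approach no coercion warning is raised, because the '?' entries never reach as.numeric().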
Here is the code snippet for checking for missing values and imputing them with the median or the mode. When running it, I used the median to impute the missing values.
    # Change the column type to numeric so that "?" becomes NA
    dataset$horsepower <- as.numeric(dataset$horsepower)
    class(dataset$horsepower)

    # Check for missing values using is.na() and sum()
    missing_values <- sum(is.na(dataset))

    # Display the result
    print(paste("Number of missing values in the dataset:", missing_values))

    # Replace missing values with the median
    hp_median <- median(dataset$horsepower, na.rm = TRUE)
    dataset$horsepower[is.na(dataset$horsepower)] <- hp_median

    # Alternative: replace missing values with the mode (most frequent value).
    # Base R's mode() returns the storage type, not the statistical mode,
    # so the mode is taken from a frequency table instead.
    hp_counts <- table(dataset$horsepower)
    hp_mode <- as.numeric(names(hp_counts)[which.max(hp_counts)])
    hp_mode
    dataset$horsepower[is.na(dataset$horsepower)] <- hp_mode
2.
Which of the predictors are quantitative, and which are qualitative?
Predictors      Classification
mpg             quantitative
cylinders       qualitative
displacement    quantitative
horsepower      quantitative
weight          quantitative
acceleration    quantitative
year            quantitative
origin          qualitative
name            qualitative
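One way to support such a classification is to look at how R stores each column and how many distinct values it takes. This is a minimal sketch on a hypothetical toy data frame (the values are made up and only the column names mirror the dataset); it is a heuristic, not a rule.

```r
# Hypothetical toy data frame; only the column names mirror the dataset.
toy <- data.frame(
  mpg       = c(18, 15, 36.1, 24),
  cylinders = c(8, 8, 4, 4),
  origin    = c(1, 1, 3, 2),
  name      = c("car a", "car b", "car c", "car d")
)

# Storage type of each column as R sees it
sapply(toy, class)

# Number of distinct values per column; very few distinct values
# suggests that a numeric column is really a coded category
sapply(toy, function(x) length(unique(x)))

# Mark the coded categories as factors so R treats them as qualitative
toy$cylinders <- factor(toy$cylinders)
toy$origin    <- factor(toy$origin)
str(toy)
```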
3.
What is the range of each quantitative predictor? You can answer this using the range() function.
The quantitative predictors are mpg, displacement, horsepower, weight, acceleration and year.
Here is the code snippet for this:
    range(dataset$mpg)
    range(dataset$displacement)
    range(dataset$horsepower)
    range(dataset$weight)
    range(dataset$acceleration)
    range(dataset$year)

Quantitative Predictors   Range (min)   Range (max)
mpg                       9.0           46.6
displacement              68            455
horsepower                46            230
weight                    1613          5140
acceleration              8.0           24.8
year                      70            82
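The six separate range() calls can also be written as one call over the selected columns. The sketch below uses a hypothetical toy data frame with made-up values; quant_cols is an illustrative name, not part of the original code.

```r
# Hypothetical toy data frame with made-up values.
toy <- data.frame(
  mpg    = c(18, 15, 36.1, 24),
  weight = c(3504, 3693, 1800, 2600),
  name   = c("car a", "car b", "car c", "car d")
)

# Names of the columns treated as quantitative
quant_cols <- c("mpg", "weight")

# Apply range() to every selected column; the result is a matrix
# with one column per predictor and rows for the minimum and maximum
ranges <- sapply(toy[quant_cols], range)
rownames(ranges) <- c("min", "max")
ranges
```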
4.
What is the mean and standard deviation of each quantitative predictor?
The quantitative predictors are mpg, displacement, horsepower, weight, acceleration and year.
Here is the code snippet for this:
    mean(dataset$mpg)
    mean(dataset$displacement)
    mean(dataset$horsepower)
    mean(dataset$weight)
    mean(dataset$acceleration)
    mean(dataset$year)

    sd(dataset$mpg)
    sd(dataset$displacement)
    sd(dataset$horsepower)
    sd(dataset$weight)
    sd(dataset$acceleration)
    sd(dataset$year)

Quantitative Predictors   Mean        Standard Deviation
mpg                       23.51587    7.825804
displacement              193.5327    104.3796
horsepower                104.3312    38.26699
weight                    2970.262    847.9041
acceleration              15.55567    2.749995
year                      75.99496    3.690005
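For reference, sd() in R computes the sample standard deviation, which divides by n - 1 rather than n. The sketch below checks this by hand on a hypothetical vector of made-up values.

```r
# Hypothetical values, not taken from the dataset.
x <- c(18, 15, 36.1, 24)
n <- length(x)

# Mean: sum of the values divided by their count
x_mean <- sum(x) / n

# Sample standard deviation: squared deviations from the mean,
# summed, divided by (n - 1), then square-rooted
x_sd <- sqrt(sum((x - x_mean)^2) / (n - 1))

# Both comparisons should report TRUE
all.equal(x_mean, mean(x))
all.equal(x_sd, sd(x))
```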
5.
Now remove the 10th through 85th observations. What is the range, mean, and standard deviation of each predictor in the subset of the data that remains?
Here is the code snippet for this:
    # Remove observations 10 through 85
    data_subset <- dataset[-c(10:85), ]
    data_subset$horsepower <- as.numeric(data_subset$horsepower)

    # Range (reported as max - min), mean and standard deviation
    # of the first seven columns
    summary_stats <- sapply(data_subset[, 1:7], function(x) c(
      Range = diff(range(x, na.rm = TRUE)),
      Mean  = mean(x, na.rm = TRUE),
      SD    = sd(x, na.rm = TRUE)
    ))
    print(summary_stats)

Predictors      Range (max - min)   Mean        Standard deviation
mpg             35.600000           24.438629   7.908184
cylinders       5.000000            5.370717    1.653486
displacement    387.00000           187.04984   99.63539
horsepower      184.00000           100.99962   35.67265
weight          3348.0000           2933.9626   810.6429
acceleration    16.300000           15.723053   2.680514
year            12.00000            77.15265    3.11123
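The row removal relies on negative indexing, and the Range value above is the width (max minus min) rather than the pair returned by range(). A minimal sketch on a hypothetical 100-row data frame illustrates both points; toy and toy_subset are illustrative names.

```r
# Hypothetical data frame with 100 rows.
toy <- data.frame(id = 1:100, value = seq(0.5, 50, by = 0.5))

# A negative index drops rows; rows 10 through 85 (76 rows) are removed
toy_subset <- toy[-(10:85), ]
nrow(toy_subset)  # 100 - 76 = 24 rows remain

# range() returns the minimum and maximum as a pair
range(toy_subset$value)

# diff(range()) collapses that pair into a single width
diff(range(toy_subset$value))
```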
6.
Using the full data set, investigate the predictors graphically, using scatterplots or other tools of your choice. Create some plots highlighting the relationships among the predictors. Comment on your findings.
Code Snippet:
    library(car)
    color_codes <- ifelse(dataset$origin == 1, "#3498db",
                          ifelse(dataset$origin == 2, "#e74c3c", "#2ecc71"))

    # Create a scatterplot matrix with color-coded points
    pairs(dataset[, 1:7], col = color_codes, pch = 16,
          cex = 0.4, main = "Pairplot Matrix")
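The nested ifelse() used for the colours can also be written as a lookup into a vector of colours, indexed by the origin code. This is a sketch under the assumption that origin only takes the integer codes 1, 2 and 3; the origin vector here is hypothetical.

```r
# Hypothetical origin codes; assumed to contain only 1, 2 or 3.
origin <- c(1, 2, 3, 1, 3)

# One colour per origin code, in code order
palette_by_origin <- c("#3498db", "#e74c3c", "#2ecc71")

# Indexing the palette by the code picks the matching colour per row
color_lookup <- palette_by_origin[origin]

# The nested ifelse() form, for comparison
color_nested <- ifelse(origin == 1, "#3498db",
                       ifelse(origin == 2, "#e74c3c", "#2ecc71"))

identical(color_lookup, color_nested)
```

The lookup form scales more easily if further origin codes are added, since only the palette vector changes.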
Findings:
In the pair plot above, the colour of each point is assigned based on the origin column.
In the pair plot, year and acceleration show a positive correlation with mpg (miles per gallon): as their values increase, mpg tends to increase. The relationship with year is strong, while the relationship with acceleration is modest.
However, weight, horsepower, cylinders and displacement show a negative correlation with mpg: as these variables increase, mpg decreases.
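Visual impressions like these can be backed up with correlation coefficients from cor(). The sketch below uses R's built-in mtcars data purely as a stand-in, since it is a different dataset with different column names (wt, hp, cyl, disp, qsec); the numbers it prints do not apply to the assignment's dataset.

```r
# mtcars ships with R and is used here only as a stand-in dataset.
data(mtcars)

# Correlation of each selected predictor with mpg;
# [, 1] turns the one-column result matrix into a named vector
predictors <- c("wt", "hp", "cyl", "disp", "qsec")
cor_with_mpg <- cor(mtcars[, predictors], mtcars$mpg)[, 1]

# Sorted from most negative to most positive
sort(cor_with_mpg)
```

Values near -1 or 1 indicate a strong linear relationship, and values near 0 a weak one.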
7.
Suppose that we wish to predict gas mileage (mpg) on the basis of the other variables. Do your plots suggest that any of the other variables might be useful in predicting mpg? Justify your answer.
In this dataset, the predictors are cylinders, displacement, horsepower, weight, acceleration, year and origin.
The graphs below describe the relationship between mpg and the other variables in the dataset.
Miles per Gallon by Cylinders
Overall, there is a negative correlation between the number of cylinders and miles per gallon (mpg). As the number of cylinders increases, there is a corresponding decrease in mpg.
Miles per Gallon vs. Displacement
Overall, there is a negative correlation between displacement and miles per gallon (mpg). As engine displacement increases, there is a corresponding decrease in mpg.
Miles per Gallon vs. Horsepower
Overall, there is a negative correlation between horsepower and miles per gallon (mpg). As the horsepower of a vehicle increases, there is generally a corresponding decrease in mpg.
Miles per Gallon vs. Weight
Overall, there is a negative correlation between the weight of a vehicle and miles per gallon (mpg). As the weight of a vehicle goes up, there is typically a corresponding decrease in mpg.
Miles per Gallon vs. Acceleration
The miles per gallon (mpg) variable shows a modest positive correlation with acceleration. As the value of acceleration increases, there is a gradual increase in mpg.
Year vs. Mean Miles per Gallon (mpg)
Overall, there is a positive correlation between year and miles per gallon (mpg). As the year increases, there is a corresponding increase in mpg.
Miles per Gallon by Origin
Origin is coded as 1, 2 and 3. Origin 3 exhibits a higher miles per gallon (mpg) than Origin 2, which in turn has a higher mpg than Origin 1.
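A comparison across groups like this one can be summarised with group means from tapply(). The sketch below uses a hypothetical toy data frame; the mpg values are made up for illustration and are not results from the dataset.

```r
# Hypothetical toy data; the mpg values are made up for illustration.
toy <- data.frame(
  mpg    = c(18, 15, 26, 30, 33, 36),
  origin = c(1, 1, 2, 2, 3, 3)
)

# Mean mpg within each origin code
mean_by_origin <- tapply(toy$mpg, toy$origin, mean)
mean_by_origin

# Origin codes ordered from lowest to highest mean mpg
names(sort(mean_by_origin))
```

The same pattern works for mean mpg by year by replacing the grouping column.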
Each of the predictors above has a positive or negative relationship with mpg, so they can be used to predict mpg, for example with machine learning algorithms.
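As one simple example of such a prediction, a linear regression can be fitted with lm(). The sketch below uses R's built-in mtcars data as a stand-in (columns wt and hp), so the fitted numbers do not apply to the assignment's dataset; the new_car values are hypothetical.

```r
# mtcars ships with R and is used here only as a stand-in dataset.
data(mtcars)

# Linear model predicting mpg from weight and horsepower
fit <- lm(mpg ~ wt + hp, data = mtcars)

# The sign of each coefficient shows the direction of the relationship
coef(fit)

# Share of the variance in mpg explained by the model
summary(fit)$r.squared

# Predicted mpg for a hypothetical car
new_car <- data.frame(wt = 3.0, hp = 110)
predict(fit, newdata = new_car)
```

A linear model is only one choice; it is used here because the plots suggest roughly monotonic relationships with mpg.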