Assignment_1

School

Langara College

Course

4800

Subject

Mathematics

Date

Feb 20, 2024

Pages

9

Uploaded by kamo_0

DANA4800 – SPRING 2024
GETTING TO KNOW R LANGUAGE
ASSIGNMENT 1.1

Name: Komalpreet
Student Number: 100420375

Question 1:

1. Does the dataset have missing values? Impute the missing values using median and mode.

The dataset contains missing values in the 'horsepower' column: five entries are recorded as the '?' symbol. Because of these entries, the column initially had the character data type. To perform numeric operations, I converted the column to numeric, which replaces the '?' values with NA. The code snippet below checks for missing values and shows imputation with both the median and the mode. When executing, I used the median to impute the missing values.

    # Convert the column to numeric; the "?" entries become NA (with a coercion warning)
    dataset$horsepower <- as.numeric(dataset$horsepower)
    class(dataset$horsepower)

    # Count missing values using is.na() and sum()
    missing_values <- sum(is.na(dataset))

    # Display the result
    print(paste("Number of missing values in the dataset:", missing_values))

    # Option 1 (the one actually used): replace missing values with the median
    hp_median <- median(dataset$horsepower, na.rm = TRUE)
    dataset$horsepower[is.na(dataset$horsepower)] <- hp_median

    # Option 2: replace missing values with the mode (most frequent value).
    # Base R's mode() returns the storage mode of an object, not the
    # statistical mode, so the most frequent value is taken from a frequency table.
    hp_mode <- as.numeric(names(which.max(table(dataset$horsepower))))
    hp_mode
    dataset$horsepower[is.na(dataset$horsepower)] <- hp_mode

2. Which of the predictors are quantitative, and which are qualitative?

    Predictor      Classification
    mpg            quantitative
    cylinders      qualitative
    displacement   quantitative
    horsepower     quantitative
    weight         quantitative
    acceleration   quantitative
    year           quantitative
    origin         qualitative
    name           qualitative
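One way to sanity-check a classification like the one above is to look at each column's storage class and its number of distinct values: columns with only a handful of distinct values (such as cylinders or origin) are natural candidates to treat as qualitative. The sketch below is only an illustration. It uses a small made-up data frame named `toy` (hypothetical values, not the assignment data) so that it runs on its own.

```r
# Toy data frame with hypothetical values; NOT the assignment dataset.
toy <- data.frame(
  mpg        = c(18, 15, 26, 24, 32, 27),
  cylinders  = c(8, 8, 4, 4, 4, 4),
  horsepower = c(130, 165, 46, 95, 65, 88),
  origin     = c(1, 1, 2, 3, 3, 1),
  name       = c("a", "b", "c", "d", "e", "f"),
  stringsAsFactors = FALSE
)

# Storage class of each column
col_class <- sapply(toy, class)

# Number of distinct values in each column
n_distinct <- sapply(toy, function(x) length(unique(x)))

# Side-by-side overview to support the quantitative/qualitative decision
overview <- data.frame(class = col_class, distinct = n_distinct)
print(overview)

# Columns judged qualitative can then be stored as factors
toy$cylinders <- factor(toy$cylinders)
toy$origin    <- factor(toy$origin)
```

Storing qualitative columns as factors means that later plotting and modelling functions treat them as groups rather than as numbers.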
3. What is the range of each quantitative predictor? You can answer this using the range() function.

The quantitative predictors are mpg, displacement, horsepower, weight, acceleration and year. Here is the code snippet for this:

    range(dataset$mpg)
    range(dataset$displacement)
    range(dataset$horsepower)
    range(dataset$weight)
    range(dataset$acceleration)
    range(dataset$year)

    Quantitative predictor   Min    Max
    mpg                      9.0    46.6
    displacement             68     455
    horsepower               46     230
    weight                   1613   5140
    acceleration             8.0    24.8
    year                     70     82

4. What is the mean and standard deviation of each quantitative predictor?

The quantitative predictors are mpg, displacement, horsepower, weight, acceleration and year. Here is the code snippet for this:

    mean(dataset$mpg)
    mean(dataset$displacement)
    mean(dataset$horsepower)
    mean(dataset$weight)
    mean(dataset$acceleration)
    mean(dataset$year)

    sd(dataset$mpg)
    sd(dataset$displacement)
    sd(dataset$horsepower)
    sd(dataset$weight)
    sd(dataset$acceleration)
    sd(dataset$year)

    Quantitative predictor   Mean       Standard deviation
    mpg                      23.51587   7.825804
    displacement             193.5327   104.3796
    horsepower               104.3312   38.26699
    weight                   2970.262   847.9041
    acceleration             15.55567   2.749995
    year                     75.99496   3.690005

5. Now remove the 10th through 85th observations. What is the range, mean, and standard deviation of each predictor in the subset of the data that remains?

Here is the code snippet for this. Because the code applies diff() to range(), the Range column reports the width of the range (maximum minus minimum) rather than the minimum and maximum themselves.

    # Drop observations 10 through 85
    data_subset <- dataset[-c(10:85), ]
    data_subset$horsepower <- as.numeric(data_subset$horsepower)

    # Range width, mean and standard deviation for the first seven columns
    summary_stats <- sapply(data_subset[, 1:7], function(x) c(
      Range = diff(range(x, na.rm = TRUE)),
      Mean  = mean(x, na.rm = TRUE),
      SD    = sd(x, na.rm = TRUE)
    ))
    print(summary_stats)

    Predictor      Range        Mean         Standard deviation
    mpg            35.600000    24.438629    7.908184
    cylinders      5.000000     5.370717     1.653486
    displacement   387.00000    187.04984    99.63539
    horsepower     184.00000    100.99962    35.67265
    weight         3348.0000    2933.9626    810.6429
    acceleration   16.300000    15.723053    2.680514
    year           12.00000     77.15265     3.11123

6. Using the full data set, investigate the predictors graphically, using scatterplots or other tools of your choice. Create some plots highlighting the relationships among the predictors. Comment on your findings.

Code Snippet:

    # Colour each point by origin (1, 2 or 3).
    # pairs() is part of base graphics, so no extra package needs to be loaded.
    color_codes <- ifelse(dataset$origin == 1, "#3498db",
                          ifelse(dataset$origin == 2, "#e74c3c", "#2ecc71"))

    # Create a scatterplot matrix with color-coded points
    pairs(dataset[, 1:7], col = color_codes, pch = 16, cex = 0.4, main = "Pairplot Matrix")
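The relationships visible in a scatterplot matrix can also be summarised numerically with a correlation matrix. The sketch below is only an illustration under stated assumptions: the data frame `toy` holds made-up values (not the assignment data), and Pearson correlation is one reasonable choice among several.

```r
# Toy data with hypothetical values; NOT the assignment dataset.
toy <- data.frame(
  mpg        = c(18, 15, 26, 24, 32, 27, 14, 36),
  weight     = c(3504, 3693, 1835, 2372, 1985, 2130, 4312, 1800),
  horsepower = c(130, 165, 46, 95, 65, 88, 215, 58),
  year       = c(70, 70, 70, 71, 76, 78, 70, 82)
)

# Pearson correlation between every pair of columns,
# using only rows with no missing values
cor_matrix <- cor(toy, use = "complete.obs")
print(round(cor_matrix, 2))

# Correlation of each predictor with mpg, sorted from most negative to most positive
mpg_cor <- sort(cor_matrix["mpg", colnames(cor_matrix) != "mpg"])
print(mpg_cor)
```

The sign of each entry gives the direction of the linear relationship and the magnitude gives its strength, which offers a check on what the plots suggest.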
Findings:

In the pair plot, the colour of each point is assigned based on the origin column. Year and acceleration show a positive correlation with mpg (miles per gallon): as their values increase, mpg tends to increase. The relationship with year is strong, while the relationship with acceleration is modest. In contrast, weight, horsepower, cylinders and displacement show a negative correlation with mpg: as these variables increase, mpg decreases.

7. Suppose that we wish to predict gas mileage (mpg) on the basis of the other variables. Do your plots suggest that any of the other variables might be useful in predicting mpg? Justify your answer.

In this dataset, the predictors are cylinders, displacement, horsepower, weight, acceleration, year and origin. The graphs below describe the relationship between mpg and each of the other variables in the dataset.

Miles per Gallon by Cylinders:
Overall, there is a negative correlation between the number of cylinders and miles per gallon (mpg). As the number of cylinders increases, mpg decreases.

Miles per Gallon vs. Displacement:

Overall, there is a negative correlation between displacement and miles per gallon (mpg). As engine displacement increases, mpg decreases.
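Plots of the two kinds discussed above could be drawn in base R roughly as follows. This is a minimal sketch, not the code used for the assignment: the data frame `toy` contains made-up values, and the choice of a boxplot for cylinders and a scatterplot with a least-squares line for displacement is an assumption.

```r
# Toy data with hypothetical values; NOT the assignment dataset.
toy <- data.frame(
  mpg          = c(18, 15, 14, 21, 19, 26, 24, 32),
  cylinders    = c(8, 8, 8, 6, 6, 4, 4, 4),
  displacement = c(307, 350, 455, 200, 232, 97, 113, 85)
)

# Boxplot: distribution of mpg within each cylinder count
bp <- boxplot(mpg ~ cylinders, data = toy,
              xlab = "Cylinders", ylab = "Miles per gallon",
              main = "Miles per Gallon by Cylinders")

# Scatterplot of mpg against displacement
plot(toy$displacement, toy$mpg, pch = 16,
     xlab = "Displacement", ylab = "Miles per gallon",
     main = "Miles per Gallon vs. Displacement")

# Overlay a least-squares line to make the direction of the trend visible
abline(lm(mpg ~ displacement, data = toy), col = "#e74c3c")
```

A boxplot suits cylinders because it takes only a few distinct values, while a scatterplot suits a continuous variable such as displacement.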
Miles per Gallon vs. Horsepower:

Overall, there is a negative correlation between horsepower and miles per gallon (mpg). As the horsepower of a vehicle increases, mpg generally decreases.
Miles per Gallon vs. Weight:

Overall, there is a negative correlation between vehicle weight and miles per gallon (mpg). As the weight of a vehicle goes up, mpg typically decreases.

Miles per Gallon vs. Acceleration:

Miles per gallon (mpg) shows a modest positive correlation with acceleration. As acceleration increases, mpg increases gradually.
Year vs. Mean Miles per Gallon (mpg):

Overall, there is a positive correlation between year and miles per gallon (mpg). As the year increases, mean mpg increases.
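A year-versus-mean-mpg view like the one described above can be built by first averaging mpg within each year. The sketch below shows one way to do that with aggregate(); the data frame `toy` and its values are hypothetical and stand in for the assignment data.

```r
# Toy data with hypothetical values; NOT the assignment dataset.
toy <- data.frame(
  year = c(70, 70, 70, 76, 76, 82, 82),
  mpg  = c(18, 15, 14, 24, 28, 36, 32)
)

# Mean mpg for each year
mean_mpg_by_year <- aggregate(mpg ~ year, data = toy, FUN = mean)
print(mean_mpg_by_year)

# Line plot of the yearly means
plot(mean_mpg_by_year$year, mean_mpg_by_year$mpg, type = "b", pch = 16,
     xlab = "Year", ylab = "Mean miles per gallon",
     main = "Year vs. Mean Miles per Gallon")
```

Averaging by year removes the vehicle-to-vehicle scatter, so the trend over time is easier to see than in a raw scatterplot.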
Miles per Gallon by Origin:

Origin is coded as 1, 2 and 3. Origin 3 exhibits higher miles per gallon (mpg) than Origin 2, which in turn has higher mpg than Origin 1.

Each of the predictors above shows a positive or negative relationship with mpg, so the plots suggest that all of them could be useful in predicting mpg, for example with a regression or another machine learning model.
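As an illustration of how such predictors might be used, the sketch below fits an ordinary linear regression with lm() and predicts mpg for one new vehicle. It is a minimal sketch under stated assumptions: the data frame `toy`, the chosen predictors, and the `new_car` values are all hypothetical, and linear regression is just one of many possible models.

```r
# Toy data with hypothetical values; NOT the assignment dataset.
toy <- data.frame(
  mpg    = c(18, 15, 26, 24, 32, 27, 14, 36, 22, 30),
  weight = c(3504, 3693, 1835, 2372, 1985, 2130, 4312, 1800, 2833, 2065),
  year   = c(70, 70, 70, 71, 76, 78, 70, 82, 74, 80),
  origin = factor(c(1, 1, 2, 3, 3, 1, 1, 3, 2, 3))
)

# Fit mpg as a linear function of weight, year and origin
# (origin is a factor, so lm() creates indicator variables for it)
fit <- lm(mpg ~ weight + year + origin, data = toy)
print(summary(fit))

# Predict mpg for a hypothetical new vehicle
new_car <- data.frame(
  weight = 2500,
  year   = 79,
  origin = factor(3, levels = levels(toy$origin))
)
predicted_mpg <- predict(fit, newdata = new_car)
print(predicted_mpg)
```

On real data, the coefficient signs and their standard errors in the model summary would indicate whether the relationships seen in the plots still hold once the other predictors are accounted for.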