40203478_Assignment_3

School: Concordia University
Course: 280
Subject: Statistics
Date: Jan 9, 2024

Introduction to Statistical Programming: Assignment 3
Marissa Gonçalves (Student ID: 40203478)

Textbook Problems

Section 2.7 on Page 34

Problem 1

a)
# Create a vector containing solar radiation observation data, then display the values
# from the vector sample.
solar.radiation <- c(11.1, 10.6, 6.3, 8.8, 10.7, 11.2, 8.9, 12.2)
solar.radiation
## [1] 11.1 10.6  6.3  8.8 10.7 11.2  8.9 12.2

b)
# Determine the mean, median, range and variance for solar radiation observations by
# utilizing appropriate functions.
mean(solar.radiation)
## [1] 9.975
median(solar.radiation)
## [1] 10.65
range(solar.radiation)
## [1]  6.3 12.2
var(solar.radiation)
## [1] 3.525

c)
The mean, median and range values for sr10 increased by 10, but the variance remains the same as the variance value obtained from solar.radiation data.
# Create a variable sr10, which includes all solar radiation observation values added
# by 10, then use the mean, median, range and variance functions to calculate needed
# values for comparison.
sr10 <- solar.radiation + 10
mean(sr10)
## [1] 19.975
median(sr10)
## [1] 20.65
range(sr10)
## [1] 16.3 22.2

var(sr10)
## [1] 3.525

d)
All four statistics for srm2 differ from those of solar.radiation: the mean and median are multiplied by -2, the range endpoints are multiplied by -2 and swap order, and the variance is multiplied by 4 (from 3.525 to 14.1).
# Create a variable srm2, which holds every solar radiation observation multiplied
# by -2, then use the mean, median, range and variance functions to calculate the
# values needed for comparison.
srm2 <- solar.radiation * (-2)
mean(srm2)
## [1] -19.95
median(srm2)
## [1] -21.3
range(srm2)
## [1] -24.4 -12.6
var(srm2)
## [1] 14.1

e)
# Create three histograms with appropriate titles and distinct colours to compare
# the data stored in the solar.radiation, sr10 and srm2 variables with one another.
par(mfrow = c(1, 3))
hist(solar.radiation, main = "Solar Radiation Graph I", col = "blue",
     ylab = "Number of Solar Radiation Observations",
     xlab = "Solar Radiation Values")
hist(sr10, main = "Solar Radiation Graph II", col = "gold",
     ylab = "Number of Solar Radiation Observations",
     xlab = "Solar Radiation Values Added by 10")
hist(srm2, main = "Solar Radiation Graph III", col = "purple",
     ylab = "Number of Solar Radiation Observations",
     xlab = "Solar Radiation Values Multiplied by -2")
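The pattern seen in parts c) and d) of Problem 1 follows from how a linear transformation y = a*x + b acts on summary statistics: the mean and median become a times the original value plus b, while the variance becomes a^2 times the original variance. The following is a minimal sketch that checks this on the same data. The helper name check.linear is hypothetical and is not part of the assignment.

```r
# Same observations as in Problem 1 a).
solar.radiation <- c(11.1, 10.6, 6.3, 8.8, 10.7, 11.2, 8.9, 12.2)

# Hypothetical helper: compares the statistics of a * x + b with the values
# predicted from the statistics of x.
check.linear <- function(x, a, b) {
  y <- a * x + b
  c(mean     = isTRUE(all.equal(mean(y), a * mean(x) + b)),
    median   = isTRUE(all.equal(median(y), a * median(x) + b)),
    # A negative multiplier swaps the endpoints of the range, hence sort().
    range    = isTRUE(all.equal(range(y), sort(a * range(x) + b))),
    variance = isTRUE(all.equal(var(y), a^2 * var(x))))
}

check.linear(solar.radiation, a = 1, b = 10)   # the sr10 case
check.linear(solar.radiation, a = -2, b = 0)   # the srm2 case
```

Each call should report TRUE for all four entries, which matches the unchanged variance of 3.525 for sr10 and the variance of 4 * 3.525 = 14.1 for srm2.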

[Figure: three side-by-side histograms titled "Solar Radiation Graph I", "Solar Radiation Graph II" and "Solar Radiation Graph III", with x-axes "Solar Radiation Values", "Solar Radiation Values Added by 10" and "Solar Radiation Values Multiplied by -2", each with y-axis "Number of Solar Radiation Observations".]

Problem 3

# Create variable n as a sequence from 1 to 15, use the pmax() function to take the
# pairwise maxima of 2^n and n^3, then add those maxima together with sum().
n <- 1:15
sum(pmax(2^n, n^3))
## [1] 66538

Section 2.9 on Page 46

Problem 2

a)
# Utilize the nrow() function to determine the number of rows and the ncol() function
# to find the number of columns in the USArrests data frame. According to the results,
# there are 50 rows and 4 columns in the built-in USArrests data frame.
nrow(USArrests)
## [1] 50
ncol(USArrests)
## [1] 4

b)
# Utilize the vapply() function to determine the median of each column in the USArrests
# data frame. The third argument states that each result is a single numeric value.
vapply(USArrests, median, numeric(1))


##   Murder  Assault UrbanPop     Rape
##     7.25   159.00    66.00    20.10

c)
When the urban population exceeds 77%, the average per capita murder rate is 8.5, which is 0.25 higher than the average of 8.25 when the urban population is below 50%.
# Utilize the mean() function to determine the average per capita murder rate over the
# rows of the USArrests data frame where the urban population is more than 77%.
average.greater.than.77 <- mean(USArrests$Murder[USArrests$UrbanPop > 77])
average.greater.than.77
## [1] 8.5
# Then, use the same strategy to determine the average per capita murder rate over the
# rows of the USArrests data frame where the urban population is less than 50%.
average.less.than.50 <- mean(USArrests$Murder[USArrests$UrbanPop < 50])
average.less.than.50
## [1] 8.25

d)
# Utilize the sample() function to draw 12 row indices without replacement from the
# 50 records of the USArrests data frame, and the data.frame() function to construct
# a new data frame from the sampled rows.
sample.id <- sample(1:50, size = 12, replace = FALSE)
records.sample <- data.frame(USArrests[sample.id, ])
records.sample
##                Murder Assault UrbanPop Rape
## Kansas            6.0     115       66 18.0
## Arizona           8.1     294       80 31.0
## South Carolina   14.4     279       48 22.5
## New York         11.1     254       86 26.1
## Oklahoma          6.6     151       68 20.0
## Tennessee        13.2     188       59 26.9
## Florida          15.4     335       80 31.9
## Mississippi      16.1     259       44 17.1
## Idaho             2.6     120       54 14.2
## California        9.0     276       91 40.6
## Oregon            4.9     159       67 29.3
## Alaska           10.0     263       48 44.5

Chapter 2 Exercises on Page 50 and 51

Problem 3

a)
# Use the subset() function to construct a subset of the chickwts data frame where the
# weight of chicks is greater than 300.
chickwts300p <- subset(chickwts, weight > 300)
chickwts300p
##    weight      feed
## 11    309   linseed
## 26    327   soybean
## 27    329   soybean
## 31    316   soybean

## 37    423 sunflower
## 38    340 sunflower
## 39    392 sunflower
## 40    339 sunflower
## 41    341 sunflower
## 43    320 sunflower
## 45    334 sunflower
## 46    322 sunflower
## 48    318 sunflower
## 49    325  meatmeal
## 51    303  meatmeal
## 52    315  meatmeal
## 53    380  meatmeal
## 58    344  meatmeal
## 60    368    casein
## 61    390    casein
## 62    379    casein
## 64    404    casein
## 65    318    casein
## 66    352    casein
## 67    359    casein
## 71    332    casein

b)
# Use the subset() function to construct a subset of the chickwts data frame where the
# chicks are fed with linseed.
chickwtslinseed <- subset(chickwts, feed == "linseed")
chickwtslinseed
##    weight    feed
## 11    309 linseed
## 12    229 linseed
## 13    181 linseed
## 14    141 linseed
## 15    260 linseed
## 16    203 linseed
## 17    148 linseed
## 18    169 linseed
## 19    213 linseed
## 20    257 linseed
## 21    244 linseed
## 22    271 linseed

c)
# Utilize the mean() function to determine the average weight of all chicks eating
# linseed from the chickwtslinseed subset.
mean(chickwtslinseed$weight)
## [1] 218.75

d)
# Create a new subset from the chickwts data frame where the chicks are not fed with
# linseed and use the mean() function to determine the average weight of all chicks not
# eating linseed from the chickwtsnolinseed subset.
chickwtsnolinseed <- subset(chickwts, feed != "linseed")

mean(chickwtsnolinseed$weight)
## [1] 269.9661

Problem 6

# Create a sequence called x using the seq() function, running from 0 to 6 in
# increments of 0.01.
x <- seq(0, 6, by = 0.01)
# Define a function called f() which takes x as its parameter. If x is greater
# than 3, the y-value is given by (2*x) - (0.5*(x^2)). Otherwise, when x is less
# than or equal to 3, the y-value is given by (3*x) + 2.
f <- function(x) {
  if (x > 3) {
    (2 * x) - (0.5 * (x^2))
  } else {
    (3 * x) + 2
  }
}
# The if statement only handles a single value, so wrap f() with the Vectorize()
# function to make it accept the whole vector x.
f <- Vectorize(f)
# Utilize the plot() function to produce a graph of the piecewise function, where the
# y-axis ranges from -10 to 15, the plotted points are drawn in orange and an
# appropriate title is included to describe the function.
plot(x, f(x), ylim = c(-10, 15),
     main = "Piecewise Function on the Interval from 0 to 6", col = "orange")
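As an alternative to the Vectorize() wrapper in Problem 6, the same piecewise function can be sketched with ifelse(), which chooses between the two branch expressions element by element. The names f.ifelse, f.scalar and f.vectorized below are hypothetical and exist only for this comparison; this is one possible approach, not the method the assignment requires.

```r
x <- seq(0, 6, by = 0.01)

# Hypothetical alternative: ifelse() evaluates both expressions on the whole
# vector and picks the first where x > 3 and the second elsewhere.
f.ifelse <- function(x) {
  ifelse(x > 3, 2 * x - 0.5 * x^2, 3 * x + 2)
}

# Reference version in the style of the assignment, for comparison.
f.scalar <- function(x) {
  if (x > 3) 2 * x - 0.5 * x^2 else 3 * x + 2
}
f.vectorized <- Vectorize(f.scalar)

# Both approaches should agree on the whole grid.
all.equal(f.ifelse(x), f.vectorized(x))

# type = "l" draws a connected line instead of the default points.
plot(x, f.ifelse(x), type = "l", ylim = c(-10, 15), col = "orange",
     main = "Piecewise Function on the Interval from 0 to 6")
```

With type = "l" the line is also drawn across the jump just after x = 3, where the function drops from 11 to roughly 1.5, so the default point plot arguably shows the discontinuity more honestly.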


[Figure: plot of f(x) against x, titled "Piecewise Function on the Interval from 0 to 6", with the x-axis running from 0 to 6 and the y-axis from -10 to 15.]

Problem 7

a)
# Utilize the sample() function to create a vector called dieRolls which holds
# 1000000 simulated tosses of a 6-sided die, sampled with replacement.
dieRolls <- sample(1:6, 1000000, replace = TRUE)

b)
# Utilize the factor() function to create a factor from the dieRolls vector, where
# the level names are changed to the non-numerical names of the die sides.
dieRollsFactor <- factor(dieRolls)
levels(dieRollsFactor) <- c("One", "Two", "Three", "Four", "Five", "Six")

c)
# Convert the dieRollsFactor factor to a character vector using the
# as.character() function.
dieRollsChar <- as.character(dieRollsFactor)

d)
# Generate tables utilizing the table() function to compare and display results from
# all three die roll objects. The names in the dieRolls table are numerical, while
# the names in the dieRollsFactor and dieRollsChar tables are non-numerical. The
# counts are identical in all three tables. The dieRolls and dieRollsFactor tables
# list them in die order, while the dieRollsChar table lists them in alphabetical
# order of the side names.

table(dieRolls)
## dieRolls
##      1      2      3      4      5      6
## 166848 166627 166230 166901 166930 166464
table(dieRollsFactor)
## dieRollsFactor
##    One    Two  Three   Four   Five    Six
## 166848 166627 166230 166901 166930 166464
table(dieRollsChar)
## dieRollsChar
##   Five   Four    One    Six  Three    Two
## 166930 166901 166848 166464 166230 166627

e)
The dieRollsFactor object produces its table fastest (0.02 s elapsed). The dieRolls and dieRollsChar objects each take about 0.05 s elapsed, with dieRollsChar slightly higher in user time (0.05 s versus 0.04 s).
# Use the built-in system.time() function to determine how long it takes to
# generate a table from each die roll object.
system.time(table(dieRolls))
##    user  system elapsed
##    0.04    0.00    0.05
system.time(table(dieRollsFactor))
##    user  system elapsed
##    0.02    0.00    0.02
system.time(table(dieRollsChar))
##    user  system elapsed
##    0.05    0.00    0.05

f)
The dieRolls.R file is the smallest at approximately 4.1 MB (dieRollsFactor.R is only 95 bytes larger), while the dieRollsChar.R file is the largest at approximately 7.9 MB.
# Utilize the dump() function to write each object named in the first argument to an
# R file. In addition, use the file.info() function to return the sizes in bytes of
# the newly-created R files for comparison.
dump("dieRolls", "dieRolls.R")
file.info("dieRolls.R")$size
## [1] 4125016
dump("dieRollsFactor", "dieRollsFactor.R")
file.info("dieRollsFactor.R")$size
## [1] 4125111
dump("dieRollsChar", "dieRollsChar.R")
file.info("dieRollsChar.R")$size
## [1] 7905541
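The different column order in table(dieRollsChar) comes from table() converting a character vector to a factor whose levels are sorted alphabetically. The sketch below illustrates this on a small, fixed set of rolls rather than the random 1000000 tosses. The names rolls, side.names, rolls.char and rolls.ordered are hypothetical and not part of the assignment.

```r
# Small, fixed example instead of the random 1,000,000 rolls.
rolls <- c(1, 2, 2, 3, 5, 6, 6, 6, 4)
side.names <- c("One", "Two", "Three", "Four", "Five", "Six")

# Character version: index the side names by the roll value.
rolls.char <- side.names[rolls]

# table() on a character vector sorts the categories alphabetically.
names(table(rolls.char))

# Supplying the levels explicitly keeps the die order in the table.
rolls.ordered <- factor(rolls.char, levels = side.names)
table(rolls.ordered)
```

Passing levels = side.names to factor() is one way to get a table in die order from character data; the counts themselves are the same either way.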