University of California, Santa Barbara

PSTAT 10: Homework 1
Due 07/05/2023 11:59 PM on Canvas

Problem 1: The R Ecosystem (* optional)

One of the main advantages of using R is that the language is fully open source, and we are able to download and view the source code of most R packages. Five basic R packages are beepr, fun, fortunes, cowsay, and praise. Choose one of these packages and download its source code. The list of all packages on CRAN is here: https://mirror.las.iastate.edu/CRAN/. On a package’s page, you can find the package source to download, which will be in tar.gz format. This is called a tarball.

Unzip the tarball, navigate to an R script, and copy and paste a function’s name and arguments (but not the body) to your homework solution. Comment on the name of the function and how many arguments it has. For example, in cowsay, the script cowsay/R/utils.R contains a function check_color:

    check_color <- function(clr) {

It takes one argument, clr. Note that to include an incomplete piece of code in an R Markdown file, you must add eval = FALSE to your code chunk, like this: {r, eval=FALSE}.

Problem 2: Vector Manipulation

You have been hired as a data analyst for a sports team, and your task is to analyze the performance of players over the last season to determine next season’s player contracts. You have been provided with the following dataset of seven players’ average points, rebounds, and assists per game:

    Player    Points  Rebounds  Assists
    Avery         12         3        4
    Parker         3         2        6
    Blake          6        14        2
    Riley          8         8        2
    Cameron        6         4        1
    Aubrey         1         1        4
    Finn          11         2        8

i. Create and display four vectors called player_names, points_scored, rebounds, and assists with the corresponding data values. Use the player_names vector to assign index names to the other three vectors.

ii. Create and display a vector called player_totals that contains the total number of points, rebounds, and assists recorded for each player. Based on these totals, which player is most valuable (i.e., which player has the highest combined total)?

iii. New analytics have shown that offensive stats (points and assists) are on average 1.2 times more important than defensive stats (rebounds) for a team’s winning chances. Create and display a new vector called updated_totals which re-weights the points_scored and assists values to generate a new total score for each player (rebounds remain unchanged).

iv. Using the result in (iii), determine the three most valuable players. Report your findings.
v. Use the which.max() and which.min() functions to determine which player’s score increased by the most and which player’s score increased by the least between the original weighting in (ii) and the reweighted values in (iii).

Problem 3: Matrix Manipulation

You have been given a dataset that contains the Math, English, and Science test scores of three students. This data is displayed below.

                Math  English  Science
    Student1      85       92       78
    Student2      88       90       95
    Student3      79       83       87

i. Create and display a matrix called test_scores that contains the data in the above table. Use the dimnames argument to set the row and column index names (note: recall that dimnames takes a list argument).

ii. Determine the maximum score any student achieved in each subject, and report the results in a named vector with indices Math, English, and Science. Hint: do so using the apply() function.

iii. Using the result from (ii), create and display a new matrix called scaled_scores that divides each student’s scores by the maximum score in each subject. Hint: slicing the matrix into vectors may be helpful.

iv. Which student had the highest average score across all three subjects? Which student had the highest scaled average score across all three subjects?

Problem 4: Motivating Lists: Iris Data

Recall that the iris dataset used in worksheets (1) and (2) contains data on various characteristics of three different species of irises. In this dataset, the first 50 entries correspond to the setosa species, the next 50 correspond to versicolor, and the final 50 correspond to virginica.

i. Load the iris dataset and run the code below to generate a 150 x 4 matrix that omits the species column in the original (non-matrix) data. Make sure you understand why iris is not a matrix, as discussed in lecture 3 (use ?iris if unclear).

    # iris[1:4] alone is still a data frame; as.matrix() converts it to a numeric matrix
    iris_matrix <- as.matrix(iris[1:4])

ii. Use the iris_matrix data to generate three separate matrices containing the 50 sets of Sepal.Width and Petal.Width data from the three species. Call these matrices setosa_width, versicolor_width, and virginica_width.

iii. Create a new column called Total in each of the matrices in (ii) that contains the sum of the sepal and petal widths.

iv. Create a list called iris_species which consists (in order) of the setosa, versicolor, and virginica width matrices.

v. Using either the list in (iv) or the matrices in (iii), determine the mean total width of each of the three species. Save these widths as a numeric vector with appropriately named indices. Append this vector to the iris_species list using the append() function. Note: read the append() documentation carefully.

Problem 5: Functions without input parameters

So far, most of the functions that we’ve worked with have required specific input parameters. Note, however, that we’ve seen a handful that do not, for example the getwd() function. Another useful function requiring no inputs is Sys.time().
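As a brief illustration of the syntax only, a function with no input parameters is defined with an empty argument list and called with empty parentheses. The function name greet below is hypothetical and was chosen just for this sketch.

```r
# Built-in functions that need no arguments
wd  <- getwd()      # current working directory, returned as a character string
now <- Sys.time()   # current date and time, an object inheriting from "POSIXct"

# A user-defined function with an empty argument list (hypothetical name)
greet <- function() {
  paste0("Hello from ", "R")
}

# Calling it still requires the parentheses, even though nothing goes inside
greet()
```

Typing greet without the parentheses would print the function definition rather than run it.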
Write a function current_time that takes no arguments and prints the character string "The current time is " followed by the current time but not the date. It may be helpful to look at the documentation for the Sys.time(), format(), and paste0() functions.

Problem 6: Geometric and harmonic means

The arithmetic mean (commonly referred to as just “mean” or “average”) is a concept regularly used in statistics and data analysis. Two less common, but still useful, types of means are the geometric and harmonic means. For a set of $n$ data points $x_1, x_2, \cdots, x_n$, the geometric and harmonic means take the following forms:

$$\bar{x}_{\text{geom}} = \left( \prod_{i=1}^{n} x_i \right)^{1/n}$$

$$\bar{x}_{\text{harmonic}} = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$

where $\prod_{i=1}^{n} x_i$ is the product of the $n$ values $x_i$ and $\sum_{i=1}^{n} \frac{1}{x_i}$ is the sum of the $n$ values $\frac{1}{x_i}$.

Write the function unusual_means which takes a nonzero numeric vector as an input and returns a vector containing the geometric and harmonic means of the input. If the input vector is not numeric, print Input must be numeric!. If the input vector contains a zero, print Input must be nonzero! (hint: use the == comparison to check if the input vector contains zeros).

It is important to make sure functions act as intended. We can run unit tests to check their behavior. Run your function on the inputs below to see if you obtain the desired results.

    unusual_means(c(1:9))
    ## [1] 4.147166 3.181372
    unusual_means(c(TRUE, TRUE, FALSE))
    ## [1] "Input must be numeric!"
    unusual_means(c("cat", "dog", "mouse", "horse"))
    ## [1] "Input must be numeric!"
    unusual_means(c(1, 5, 4, 3, 8, 0, 9))
    ## [1] "Input must be nonzero!"
    unusual_means(seq(1, 100))
    ## [1] 37.99269 19.27756

Problem 7: Manhattan Distance

Many students will have encountered the notion of Euclidean distance, or the length of a line segment between two points, in previous algebra and geometry courses. Many other measures of distance exist, including the Manhattan distance (also called the Taxicab distance). This metric defines the distance between two length-$n$ vectors $p$ and $q$ as the sum of the absolute differences of their components:

$$\text{dist}_{\text{Manhattan}}(p, q) = \sum_{i=1}^{n} |p_i - q_i|$$

Write a function called manhattan that takes two numeric vectors as inputs and outputs the Manhattan distance between them. Make sure to include appropriate error messages for non-numeric input vectors and different length input vectors. Test the code using the following examples:
    manhattan(c(1, 2, 5), c(2, 6, 1))
    ## [1] 9
    manhattan(c("dog", "cat", "mouse"), c(TRUE, FALSE, FALSE))
    ## [1] "Error: both vectors must be numeric"
    manhattan(c(1, 2, 8, 10), c(2, 1, 6, 5, 3))
    ## [1] "Error: Manhattan distance requires inputs of the same length"
    manhattan(seq(1, 500), rep(500, 500))
    ## [1] 124750

Write two unit tests of your own and include these with your function.
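One way such a function might be sketched is shown below. It is not an official solution. It assumes that errors are reported by returning the message as a character string, which matches the quoted output in the examples; a stricter version could call stop() instead. The two extra checks at the end use stopifnot() with inputs chosen here purely for illustration.

```r
# A minimal sketch of the Manhattan distance between two numeric vectors.
# Assumption: error conditions return a message string instead of raising an error.
manhattan <- function(p, q) {
  # is.numeric() is FALSE for character and logical vectors
  if (!is.numeric(p) || !is.numeric(q)) {
    return("Error: both vectors must be numeric")
  }
  # The component-wise difference is only defined for equal lengths
  if (length(p) != length(q)) {
    return("Error: Manhattan distance requires inputs of the same length")
  }
  # Sum of the absolute component-wise differences
  sum(abs(p - q))
}

# Two illustrative unit tests (inputs are assumptions made for this sketch)
stopifnot(manhattan(c(0, 0), c(3, 4)) == 7)            # |0 - 3| + |0 - 4| = 7
stopifnot(manhattan(c(-1, 2.5), c(-1, 2.5)) == 0)      # identical vectors give 0
```

The type check comes before the length check so that non-numeric input of mismatched length still reports the type problem first. stopifnot() stays silent when every condition holds and stops with an error otherwise, which makes it convenient for quick checks inside a script.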