University of California, Santa Barbara
PSTAT 10: Homework 1
Due 07/05/2023 11:59 PM on Canvas
Problem 1: The R Ecosystem (* optional)
One of the main advantages of using R is that the language is fully open source, and we are able to download
and view the source code of most R packages. Five basic R packages are beepr, fun, fortunes, cowsay, and praise.
Choose one of these packages and download its source code.
The list of all packages on CRAN is available at https://mirror.las.iastate.edu/CRAN/. On a package’s page,
you can find the package source to download, which will be in tar.gz format. This is called a tarball.
Unzip the tarball, navigate to an R script, and copy and paste a function’s name and arguments (but not the
body) to your homework solution. Comment on the name of the function and how many arguments it has.
For example, in cowsay, the script in cowsay/R/utils.R contains a function check_color:
check_color <- function(clr) {
It takes one argument, clr. Note that to include an incomplete piece of code in an R Markdown file, you
must add eval = FALSE to your code chunk, like this: {r, eval=FALSE}.
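Once a function has been defined or sourced in an R session, its arguments can also be inspected programmatically. The sketch below uses a hypothetical function `greet` (it does not come from any of the packages listed above) and the base R function `formals()` to list and count its arguments.

```r
# Hypothetical function, used only for illustration.
greet <- function(name, punctuation = "!") {
  paste0("Hello, ", name, punctuation)
}

# formals() returns the argument list of a function as a named list.
arg_names <- names(formals(greet))
print(arg_names)          # "name" "punctuation"
print(length(arg_names))  # 2
```

Here `greet` takes two arguments, one of which has a default value.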
Problem 2: Vector Manipulation
You have been hired as a data analyst for a sports team, and your task is to analyze the performance of
players over the last season to determine next season’s player contracts. You have been provided with the
following dataset of seven players’ average points, rebounds, and assists per game:
Player    Points  Rebounds  Assists
Avery     12      3         4
Parker    3       2         6
Blake     6       14        2
Riley     8       8         2
Cameron   6       4         1
Aubrey    1       1         4
Finn      11      2         8
i. Create and display four vectors called player_names, points_scored, rebounds, and assists with
the corresponding data values. Use the player_names vector to assign index names to the other three
vectors.
ii. Create and display a vector called player_totals that contains the total number of points, rebounds,
and assists recorded for each player. Based on these totals, which player is most valuable? (i.e., which
player has the highest combined total)
iii. New analytics have shown that offensive stats (points and assists) are on average 1.2 times more
important than defensive stats (rebounds) for a team’s winning chances. Create and display a new
vector called updated_totals which re-weights the points_scored and assists values to generate a
new total score for each player (rebounds remain unchanged).
iv. Using the result in (iii), determine the three most valuable players. Report your findings.
v. Use the which.max() and which.min() functions to determine which player’s score increased by the
most and which player’s score increased by the least between the original weighting in (ii) and the
reweighted values in (iii).
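The vector operations this problem relies on can be sketched on a small made-up example. The data and the names `fruit`, `sold_mon`, `sold_tue`, and the weight 1.5 are hypothetical and are not the homework dataset; the sketch only shows how names, elementwise arithmetic, `sort()`, `which.max()`, and `which.min()` behave.

```r
# Hypothetical data, not the homework dataset.
fruit <- c("apple", "pear", "plum")
sold_mon <- c(5, 2, 7)
sold_tue <- c(1, 6, 3)

# names<-() attaches index names to a vector.
names(sold_mon) <- fruit
names(sold_tue) <- fruit

# Arithmetic on vectors is elementwise, and the names are carried along.
totals <- sold_mon + sold_tue
weighted <- 1.5 * sold_mon + sold_tue  # re-weight one component only

print(totals)
print(sort(weighted, decreasing = TRUE))  # ranking, largest first

# which.max() and which.min() return the named index of the extreme value.
gain <- weighted - totals
print(names(which.max(gain)))  # "plum"
print(names(which.min(gain)))  # "pear"
```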
Problem 3: Matrix Manipulation
You have been given a dataset that contains the Math, English, and Science test scores of three students.
This data is displayed below.
          Math  English  Science
Student1  85    92       78
Student2  88    90       95
Student3  79    83       87
i. Create and display a matrix called test_scores that contains the data in the above table. Use the
dimnames argument to set the row and column index names (note: recall that dimnames takes a list
argument).
ii. Determine the maximum score any student achieved in each subject, and report the results in a named
vector with indices Math, English, and Science. hint: do so using the apply() function.
iii. Using the result from (ii), create and display a new matrix called scaled_scores that divides each
student’s scores by the maximum score in each subject. hint: slicing the matrix into vectors may be
helpful.
iv. Which student had the highest average score across all three subjects? Which student had the highest
scaled average score across all three subjects?
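As a rough illustration of the matrix tools mentioned above, the sketch below builds a small made-up matrix (hypothetical values and names, not the test score data) with `dimnames`, takes column maxima with `apply()`, and rescales each column. It uses `sweep()` for the rescaling, which is one possible approach and an assumption here; slicing the matrix into vectors, as the hint suggests, works as well.

```r
# Hypothetical data, not the homework dataset.
m <- matrix(c(2, 4,
              8, 5),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("r1", "r2"), c("a", "b")))

# apply(m, 2, f) applies f to each column; MARGIN = 1 would use rows.
col_max <- apply(m, 2, max)
print(col_max)

# sweep() divides each column by the matching entry of col_max.
scaled <- sweep(m, 2, col_max, "/")
print(scaled)

# Row means of the scaled matrix.
print(rowMeans(scaled))
```

The result of `apply()` inherits the column names, so `col_max` is already a named vector.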
Problem 4: Motivating Lists: Iris Data
Recall the iris dataset used in worksheets (1) and (2) contains data on various characteristics of three
different species of irises. In this dataset, the first 50 entries correspond to setosa species, the next 50
correspond to versicolor, and the final 50 correspond to virginica.
i. Load the iris dataset and run the code below to generate a 150 x 4 matrix that omits the species
column in the original (non-matrix) data. Make sure you understand why iris is not a matrix as
discussed in lecture 3 (use ?iris if unclear).
iris_matrix <- as.matrix(iris[1:4])
ii. Use the iris_matrix data to generate three separate matrices containing the 50 sets of Sepal.Width
and Petal.Width data from the three species. Call these matrices setosa_width, versicolor_width,
and virginica_width.
iii. Create a new column called Total in each of the matrices in (ii) that contains the sum of the sepal and
petal widths.
iv. Create a list called iris_species which consists (in order) of the setosa, versicolor, and virginica width
matrices.
v. Using either the list in (iv) or the matrices in (iii), determine the mean total width of each of the three
species. Save these widths as a numeric vector with appropriately named indices. Append this vector to
the iris_species list using the append() function. note: read the append() documentation carefully.
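The note about `append()` matters because `append()` concatenates its arguments. A minimal sketch on made-up matrices (the names `a`, `b`, `groups`, and `mean_totals` are hypothetical and unrelated to the iris data) shows the difference between appending a bare vector and appending a vector wrapped in `list()`.

```r
# Hypothetical matrices, not the iris data.
a <- cbind(w1 = c(1, 2), w2 = c(3, 4))
b <- cbind(w1 = c(5, 6), w2 = c(7, 8))

# cbind() adds a column; rowSums() gives the per-row total.
a <- cbind(a, Total = rowSums(a))
b <- cbind(b, Total = rowSums(b))

groups <- list(a = a, b = b)

# sapply() over a named list returns a named numeric vector.
mean_totals <- sapply(groups, function(m) mean(m[, "Total"]))
print(mean_totals)

# A bare vector is spliced in element by element;
# wrapping it in list() adds it as a single new list element.
spliced <- append(groups, mean_totals)
wrapped <- append(groups, list(mean_totals))
print(length(spliced))  # 4
print(length(wrapped))  # 3
```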
Problem 5: Functions without input parameters
So far, most of the functions that we’ve worked with have required specific input parameters. Note, however,
that we’ve seen a handful that do not, such as the getwd() function. Another useful function requiring
no inputs is Sys.time().
Write a function current_time that takes no arguments and prints the character string "The current time
is " followed by the current time but not the date. It may be helpful to look at documentation for the
Sys.time(), format(), and paste0() functions.
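To see how `format()` and `paste0()` behave on a date-time object, the sketch below uses a fixed, made-up time point instead of `Sys.time()` so that the output is reproducible. The variable names and the label "Formatted: " are hypothetical.

```r
# A fixed time point is used so the output is reproducible;
# Sys.time() returns an object of the same class (POSIXct).
t <- as.POSIXct("2023-07-05 14:30:15", tz = "UTC")

# format() with a strftime-style pattern keeps only the requested fields.
time_only <- format(t, "%H:%M:%S", tz = "UTC")
print(time_only)  # "14:30:15"

# paste0() concatenates strings without inserting a separator.
print(paste0("Formatted: ", time_only))
```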
Problem 6: Geometric and harmonic means
The arithmetic mean (commonly referred to as just “mean” or “average”) is a concept regularly used in
statistics and data analysis.
Two less common, but still useful types of means are the geometric and harmonic means. For a set of $n$
data points $x_1, x_2, \cdots, x_n$, the geometric and harmonic means take the following forms:

$$\bar{x}_{\text{geom}} = \left( \prod_{i=1}^{n} x_i \right)^{1/n}$$

$$\bar{x}_{\text{harmonic}} = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$

Where $\prod_{i=1}^{n} x_i$ is the product of the $n$ $x_i$’s and $\sum_{i=1}^{n} \frac{1}{x_i}$ is the sum of the $n$ values $\frac{1}{x_i}$.
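As a quick check of the two formulas, here is a worked example on a made-up vector (hypothetical data, chosen so the results can be verified by hand). It evaluates the formulas directly and does no input checking.

```r
# Hypothetical data chosen so the results are easy to check by hand.
x <- c(1, 2, 4)
n <- length(x)

geom <- prod(x)^(1 / n)  # (1 * 2 * 4)^(1/3) = 8^(1/3) = 2
harm <- n / sum(1 / x)   # 3 / (1 + 0.5 + 0.25) = 3 / 1.75

print(geom)
print(harm)
```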
Write the function unusual_means which takes a nonzero numeric vector as an input and returns a vector
containing the geometric and harmonic means of the input. If the input vector is not numeric, print
"Input must be numeric!". If the input vector contains a zero, print "Input must be nonzero!"
(hint: use the == comparison to check if the input vector contains zeros).
It is important to make sure functions act as intended. We can run unit tests to check their behavior. Run
your function on the input below to see if you obtain the desired results.
unusual_means(c(1:9))
## [1] 4.147166 3.181372
unusual_means(c(TRUE, TRUE, FALSE))
## [1] "Input must be numeric!"
unusual_means(c("cat", "dog", "mouse", "horse"))
## [1] "Input must be numeric!"
unusual_means(c(1, 5, 4, 3, 8, 0, 9))
## [1] "Input must be nonzero!"
unusual_means(seq(1, 100))
## [1] 37.99269 19.27756
Problem 7: Manhattan Distance
Many students will have encountered the notion of Euclidean distance, or the length of a line segment between
two points, in previous algebra and geometry courses. Many other measures of distance exist, including the
Manhattan Distance (also called the Taxicab distance). This metric measures the distance between two
length $n$ vectors $p$ and $q$ as the sum of the absolute differences of their components.

$$\text{dist}_{\text{Manhattan}}(p, q) = \sum_{i=1}^{n} |p_i - q_i|$$

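A worked instance of the formula on two made-up vectors (hypothetical data, with no input validation) may help make the definition concrete:

```r
# Hypothetical vectors chosen so the result is easy to check by hand.
p <- c(0, 3, 10)
q <- c(4, 1, 10)

# abs(p - q) is elementwise: |0 - 4|, |3 - 1|, |10 - 10| = 4, 2, 0
d <- sum(abs(p - q))
print(d)  # 6
```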
Write a function called manhattan that takes two numeric vectors as inputs and outputs the Manhattan distance
between them. Make sure to include appropriate error messages for non-numeric input vectors and different
length input vectors.
Test the code using the following examples:
manhattan(c(1, 2, 5), c(2, 6, 1))
## [1] 9
manhattan(c("dog", "cat", "mouse"), c(TRUE, FALSE, FALSE))
## [1] "Error: both vectors must be numeric"
manhattan(c(1, 2, 8, 10), c(2, 1, 6, 5, 3))
## [1] "Error: Manhattan distance requires inputs of the same length"
manhattan(seq(1, 500), rep(500, 500))
## [1] 124750
Write two unit tests of your own and include these with your function.
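One lightweight way to write such tests in base R is with `stopifnot()`, sketched below on a hypothetical function `range_width` that is unrelated to the homework functions. Using `stopifnot()` together with `identical()` and `all.equal()` is an assumption about style, not a requirement of the assignment.

```r
# Hypothetical function under test, used only for illustration.
range_width <- function(x) {
  if (!is.numeric(x)) {
    return("Error: input must be numeric")
  }
  max(x) - min(x)
}

# stopifnot() raises an error if any condition is not TRUE,
# and is silent when every check passes.
stopifnot(
  range_width(c(3, 9, 5)) == 6,
  identical(range_width("a"), "Error: input must be numeric"),
  # all.equal() compares doubles with a small tolerance.
  isTRUE(all.equal(range_width(c(0.1, 0.3)), 0.2))
)
print("all tests passed")
```

Each test pairs a chosen input with the output worked out by hand, and it is useful to cover both a normal case and an error case.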