Assignment-2_F2023 (Rmd)
School: Toronto Metropolitan University
Course: 123
Subject: Medicine
Date: Dec 6, 2023
Uploaded by AmbassadorValorFrog11
---
title: 'CIND 123 - Data Analytics: Basic Methods'
author:
output:
  html_document: default
  word_document: default
  pdf_document: default
---
<center> <h1> Assignment 2 (10%) </h1> </center>
<center> <h3> [Insert your full name] </h3> </center>
<center> <h3> [Insert course section & student number] </h3> </center>
---
## Instructions
This is an R Markdown document. Markdown is a simple formatting
syntax for authoring HTML, PDF, and MS Word documents. Review
this website for more details on using R Markdown
<http://rmarkdown.rstudio.com>.
Use RStudio for this assignment. Complete the assignment by
inserting your R code wherever you see the string "#INSERT YOUR
ANSWER HERE".
When you click the **Knit** button, a document (PDF, Word, or
HTML format) will be generated that includes both the assignment
content and the output of any embedded R code chunks.
Submit **both**
the Rmd file and the generated output file. Failing to
submit both files will result in a mark deduction.
## Sample Question and Solution
Use `seq()` to create the vector $(100, 97, \ldots, 4)$.
```{r}
seq(100, 3, -3)
```
## Question 1 (40 points)
The Titanic Passenger Survival Data Set provides information on
the fate of passengers on the fatal maiden voyage of the ocean
liner "Titanic." The dataset is available in several formats from the
Department of Biostatistics at the Vanderbilt University School of
Medicine (https://biostat.app.vumc.org/wiki/pub/Main/DataSets/titanic3.csv).
Load the Titanic data set into the data frame `titanicDataset`
using the following commands.
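```{r}
# Install the titanic package only if it is missing, then load it
if (!requireNamespace("titanic", quietly = TRUE)) install.packages("titanic")
library(titanic)
# Read the titanic3 CSV; keep text columns as character rather than factors
titanicDataset <- read.csv(
  file = "https://biostat.app.vumc.org/wiki/pub/Main/DataSets/titanic3.csv",
  stringsAsFactors = FALSE
)
str(titanicDataset)
```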
a) Extract the columns `cabin`, `age`, `embarked`, and `pclass`
into a new data frame named `titanicSubset` and display it.
(5 points)
```{r}
#INSERT YOUR ANSWER HERE
titanicSubset <- titanicDataset[, c("cabin", "age", "embarked", "pclass")]
titanicSubset
```
b) Numerical data: Use the count() function from the `dplyr`
package to display the number of passengers who survived and
the number who did not. (5 points)
HINT: To count the occurrences of each `survived` value in the
`titanicDataset` data frame with the `dplyr` package, you can use
the pipe operator (`%>%`) to chain operations.
```{r}
#INSERT YOUR ANSWER HERE
library(dplyr)
titanicDataset %>% count(survived)
```
c) Categorical data: Use the count() and group_by() functions from
the `dplyr` package to calculate the number of passengers for each
`embarked` value. (5 points)
HINT: Use group_by() first, then pipe the result to count() to
calculate the number of passengers.
```{r}
#INSERT YOUR ANSWER HERE
library(dplyr)
titanicDataset %>% group_by(embarked) %>%
count(embarked)
```
d) Find the passengers in the data frame whose `embarked` value
is an empty string (""), and replace it with the most frequent
`embarked` value. (3 points)
```{r}
#INSERT YOUR ANSWER HERE
```
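A minimal sketch of part (d), run on a small hypothetical data frame `toyEmbarked` instead of the downloaded data so that it works offline. It assumes the most frequent value should be computed over the non-empty entries only; the same indexing applies to `titanicDataset$embarked`.

```{r}
# Hypothetical toy data standing in for titanicDataset$embarked
toyEmbarked <- data.frame(embarked = c("S", "C", "", "S", "Q", "", "S"),
                          stringsAsFactors = FALSE)

# Mode of the non-empty values: tabulate, then take the largest count
embarkedCounts <- table(toyEmbarked$embarked[toyEmbarked$embarked != ""])
mostFrequent <- names(embarkedCounts)[which.max(embarkedCounts)]

# Show the affected rows, then overwrite the empty strings with the mode
toyEmbarked[toyEmbarked$embarked == "", , drop = FALSE]
toyEmbarked$embarked[toyEmbarked$embarked == ""] <- mostFrequent
table(toyEmbarked$embarked)
```

Computing the mode before the replacement matters: if the empty strings outnumbered every port, counting them would make `""` its own "most frequent" value.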
e) Use the aggregate() function to calculate the `survivalCount`
for each `embarked` value, then calculate the survival rate for each
port of embarkation. Conclude which port has the
highest survival rate. (5 points)
```{r}
#INSERT YOUR ANSWER HERE
```
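A sketch of the aggregate() pattern for part (e), using a hypothetical toy data frame `toySurvival` (not real passenger records) with the same `embarked` and `survived` columns as `titanicDataset`. It assumes `survived` is coded 0/1, so summing gives the survivor count and dividing by the group size gives the rate.

```{r}
# Hypothetical toy data standing in for titanicDataset (embarked, survived)
toySurvival <- data.frame(embarked = c("C", "C", "Q", "S", "S", "S", "Q", "C"),
                          survived = c(1, 1, 0, 0, 1, 0, 1, 0))

# survivalCount: number of survivors (survived == 1) per port
survivalCount <- aggregate(survived ~ embarked, data = toySurvival, FUN = sum)
names(survivalCount)[2] <- "survivalCount"

# Passengers per port, then survival rate = survivors / passengers
passengers <- aggregate(survived ~ embarked, data = toySurvival, FUN = length)
survivalCount$survivalRate <- survivalCount$survivalCount / passengers$survived
survivalCount

# Port with the highest survival rate
survivalCount$embarked[which.max(survivalCount$survivalRate)]
```

Both aggregate() calls group by the same variable, so their rows come back in the same sorted order and can be divided element by element.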
f) Use a box plot to display the distribution of `fare` for each
`pclass` and infer which passenger class is the most expensive.
(5 points)
```{r}
#INSERT YOUR ANSWER HERE
```
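A sketch of the formula interface `boxplot(fare ~ pclass, ...)` for part (f), drawn from a hypothetical toy data frame `toyFare` with three made-up fares per class. boxplot() also returns the statistics it plotted, which makes the medians easy to read off.

```{r}
# Hypothetical toy fares (not real Titanic records), three per class
toyFare <- data.frame(pclass = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                      fare = c(80, 120, 211, 13, 26, 30, 7.25, 8.05, 15.5))

# One box per passenger class; boxplot() also returns the plotted statistics
fareBox <- boxplot(fare ~ pclass, data = toyFare,
                   xlab = "Passenger class (pclass)", ylab = "Fare",
                   main = "Fare distribution by passenger class")

# Row 3 of $stats holds the medians, one column per class
fareBox$stats[3, ]
```

The class whose box (and median line) sits highest is the most expensive one; on the real data, swap `toyFare` for `titanicDataset`.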
g) Calculate the average fare for each of the three `pclass` values
and describe whether the result agrees with the box plot. (5 points)
```{r}
#INSERT YOUR ANSWER HERE
```
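A sketch for part (g) using tapply() on the same kind of hypothetical toy data frame `toyFare` (made-up fares, not real records). `na.rm = TRUE` is included in case any fare is missing in the real data.

```{r}
# Hypothetical toy fares standing in for titanicDataset
toyFare <- data.frame(pclass = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                      fare = c(80, 120, 211, 13, 26, 30, 7.25, 8.05, 15.5))

# Mean fare per class; na.rm = TRUE skips missing fares if there are any
avgFare <- tapply(toyFare$fare, toyFare$pclass, mean, na.rm = TRUE)
avgFare
```

When comparing with the box plot, keep in mind that a box plot shows medians: with right-skewed fares the means can sit well above the median lines, so the ranking of the classes is the more natural thing to check.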
h) Use the for loop and if control statements to list the menss
failure is 0.05. We know that if one engine fails, the whole
system stops.
a) What is the probability that the system operates without
failure? (5 points)
```{r}
#INSERT YOUR ANSWER HERE
```
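The number of engines is not visible in this excerpt, so this sketch keeps it as a parameter `n`. Assuming each engine fails independently with probability 0.05 and the system runs only if every engine works, the probability is $(1 - 0.05)^n$, which equals `dbinom(0, n, 0.05)`. The helper name `pNoFailure` is hypothetical.

```{r}
# Hypothetical helper: P(all n independent engines work) for a series system
pNoFailure <- function(n, p = 0.05) {
  (1 - p)^n  # same value as dbinom(0, size = n, prob = p)
}

# Illustration only: n = 2 is NOT the number of engines from this question
pNoFailure(2)
```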
b) Use the Binomial approximation to calculate the probability
that at least 3 engines are defective. (5 points)
```{r}
#INSERT YOUR ANSWER HERE
```
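A sketch for part (b), again with the number of engines left as a parameter `n` because it is not shown in this excerpt. It assumes the count of defective engines is Binomial(n, 0.05), so $P(X \ge 3) = 1 - P(X \le 2)$. The helper name `pAtLeast3Defective` is hypothetical.

```{r}
# Hypothetical helper: P(X >= 3) for X ~ Binomial(n, p) defective engines
pAtLeast3Defective <- function(n, p = 0.05) {
  # P(X >= 3) = 1 - P(X <= 2); lower.tail = FALSE returns P(X > 2) directly
  pbinom(2, size = n, prob = p, lower.tail = FALSE)
}

# Illustration only: with n = 3, only "all three defective" counts, i.e. 0.05^3
pAtLeast3Defective(3)
```

Using `lower.tail = FALSE` avoids the small rounding loss of computing `1 - pbinom(...)` when the tail probability is tiny.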
c) What is the probability that the second engine (B) is
defective given that the first engine (A) is not defective, i.e.,
P(B is defective | A is not defective), knowing that the first and
second engines are independent? (5 points)
```{r}
#INSERT YOUR ANSWER HERE
```
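A sketch for part (c) that follows the definition of conditional probability step by step. It assumes the 0.05 failure probability stated in the question is also the probability that a single engine is defective. Because A and B are independent, the $P(A^c)$ terms cancel and the answer reduces to $P(B)$.

```{r}
pDefect <- 0.05                  # assumed per-engine defect probability
pNotA <- 1 - pDefect             # P(A is not defective)
pBandNotA <- pDefect * pNotA     # independence: P(B and not A) = P(B) * P(not A)
pBgivenNotA <- pBandNotA / pNotA # definition of conditional probability
pBgivenNotA                      # equals P(B), i.e. pDefect
```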
## Question 3 (25 points)
On average, John visits his parents 4 times a month.
a) Find the probabilities that John visits his parents 1, 2, ..., 6
times in a month. (5 points)
```{r}
#INSERT YOUR ANSWER HERE
```
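A sketch for part (a), assuming the monthly number of visits X is Poisson with $\lambda = 4$ (the model part (c) of this question points to). It reads "1 to 6 times" as each count $k = 1, \ldots, 6$ and also prints the summed probability $P(1 \le X \le 6)$ in case that reading is intended.

```{r}
lambda <- 4                                   # mean visits per month
visits <- 1:6
visitProbs <- dpois(visits, lambda = lambda)  # P(X = k) for k = 1, ..., 6
data.frame(visits, probability = round(visitProbs, 4))

# If the question means P(1 <= X <= 6), sum the same terms
sum(visitProbs)
```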
b) Find the probability that John visits his parents 3 or more
times in a month. (5 points)
```{r}
#INSERT YOUR ANSWER HERE
```
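A sketch for part (b) under the same assumption that visits are Poisson with $\lambda = 4$: "3 or more" is the complement of "at most 2", so the upper tail of `ppois()` starting above 2 gives it directly.

```{r}
# P(X >= 3) = 1 - P(X <= 2) for X ~ Poisson(4)
pAtLeast3Visits <- ppois(2, lambda = 4, lower.tail = FALSE)
pAtLeast3Visits
```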
c) Compare the Binomial and Poisson distributions. (15 points,
5 points each)
1) Create 100,000 samples of a Binomial random variable using the
parameters described in Question 2.
2) Create 100,000 samples of a Poisson random variable using the
parameters described in Question 3.
3) Illustrate how well the Poisson probability distribution
approximates the Binomial probability distribution.
HINT: use multhist() from the `plotrix` package.
```{r}
#INSERT YOUR ANSWER HERE
```
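A sketch of the simulation in part (c). The number of engines from Question 2 is not visible in this excerpt, so `nEngines <- 80` is a hypothetical placeholder chosen only so that $n p = 4$ matches $\lambda = 4$. The hinted `multhist()` from `plotrix` takes a list of vectors and should give a similar side-by-side histogram; this sketch uses base `barplot()` on relative frequencies so it runs without extra packages.

```{r}
set.seed(123)    # reproducible draws
nSamples <- 100000
nEngines <- 80   # HYPOTHETICAL placeholder, not the value from Question 2
pFail <- 0.05    # engine failure probability from Question 2
lambda <- 4      # mean monthly visits from Question 3

binomSamples <- rbinom(nSamples, size = nEngines, prob = pFail)
poisSamples <- rpois(nSamples, lambda = lambda)

# Relative frequencies on a common support 0..maxVal (shift by 1 for tabulate)
maxVal <- max(binomSamples, poisSamples)
support <- 0:maxVal
binomFreq <- tabulate(binomSamples + 1, nbins = maxVal + 1) / nSamples
poisFreq <- tabulate(poisSamples + 1, nbins = maxVal + 1) / nSamples

# Side-by-side bars: the closer each pair, the better the approximation
barplot(rbind(binomFreq, poisFreq), beside = TRUE, names.arg = support,
        legend.text = c("Binomial", "Poisson"),
        xlab = "Count", ylab = "Relative frequency",
        main = "Simulated Binomial vs Poisson")
```

Matching the means ($n p = \lambda$) is what makes the comparison meaningful; the Poisson approximation is usually described as good when $n$ is large and $p$ is small.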
## Question 4 (20 points)
Write a script in R to compute the following probabilities for a
normal random variable with mean 9 and variance 25:
a) The probability that it lies between 8.2 and 17.3 (inclusive)
(5 points)
```{r}
#INSERT YOUR ANSWER HERE
```
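A sketch for part (a). `pnorm()` takes the standard deviation, so variance 25 becomes `sd = 5`. For a continuous variable the endpoints have probability zero, so "inclusive" does not change the result.

```{r}
mu <- 9
sigma <- sqrt(25)  # pnorm() expects the standard deviation, not the variance
# P(8.2 <= X <= 17.3) = F(17.3) - F(8.2)
pBetween <- pnorm(17.3, mean = mu, sd = sigma) - pnorm(8.2, mean = mu, sd = sigma)
pBetween
```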
b) The probability that it is greater than 15.02 (5 points)
```{r}
#INSERT YOUR ANSWER HERE
```
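A sketch for part (b): with `lower.tail = FALSE`, `pnorm()` returns the upper-tail probability $P(X > 15.02)$ directly (standard deviation $\sqrt{25} = 5$).

```{r}
# P(X > 15.02) for X ~ N(9, 25)
pAbove <- pnorm(15.02, mean = 9, sd = sqrt(25), lower.tail = FALSE)
pAbove
```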
c) The probability that it is less than or equal to 11.8 (5
points)
```{r}
#INSERT YOUR ANSWER HERE
```
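A sketch for part (c): the default lower tail of `pnorm()` is exactly $P(X \le 11.8)$, again with standard deviation $\sqrt{25} = 5$.

```{r}
# P(X <= 11.8) for X ~ N(9, 25)
pAtMost <- pnorm(11.8, mean = 9, sd = sqrt(25))
pAtMost
```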
d) The probability that it is less than 10 or greater than 13 (5
points)
```{r}
#INSERT YOUR ANSWER HERE
```
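A sketch for part (d): "less than 10" and "greater than 13" are disjoint events, so their probabilities add; the upper tail comes from `lower.tail = FALSE`.

```{r}
# P(X < 10) + P(X > 13) for X ~ N(9, 25)
pOutside <- pnorm(10, mean = 9, sd = sqrt(25)) +
  pnorm(13, mean = 9, sd = sqrt(25), lower.tail = FALSE)
pOutside
```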
END of Assignment #2.