Assignment-6-Introduction-to-working-with-R-RStudio
School: University of Saskatchewan
Course: 311
Subject: Statistics
Date: Apr 3, 2024
Type: docx
Pages: 4
Uploaded by MagistrateStar1002
Assignment #6: Introduction to Working with R/RStudio
Submission Instructions
Due: Friday, April 6, 2018 at 11:59 PM.
Submit the following four files through Canvas > Assignments > To-Do:
(1) The completed, working R script that produced the analysis in Steps 1 through 9
(2) The output file – descriptivesOutput.txt
(3) Another output file – histogram.pdf
(4) The completed answer sheet, provided on the last page and also as a separate Word file
If you do not follow the instructions, your assignment will be counted late.
Late assignment policy: Same as before.
Evaluation
Your submission will be graded on the correctness of the completed answer sheet, with the other files serving as supporting documents.
Before you start
For this assignment, you’ll run simple analyses by modifying the R script you used in ICA #11 (Descriptives.r). You will also need a new data set, OnTimeAirport2017Dec.csv, which contains actual on-time statistics for 83,915 flights, by airline and airport, for December 2017, collected from the Bureau of Transportation Statistics.¹
IMPORTANT! When downloading the .csv file, make sure that its name doesn’t change and that it is saved in the same folder as the Descriptives.r file you are modifying.
The metadata for the OnTimeAirport2017Dec.csv spreadsheet are shown below:
Variable Name | Variable Description
FlightDate | The date of the flight (mm/dd/yyyy)
UniqueCarrier | The unique carrier code
CarrierlName | The name of the carrier
FlightNum | The flight number
Origin | The origin airport of the flight
OriginCity | The origin city of the flight
Dest | The destination airport of the flight
DestCity | The destination city of the flight
DepDelay | The delay in departing from the origin gate (in minutes)
TaxiOut | The time spent taxiing out to the runway at the origin (in minutes)
TaxiIn | The time spent taxiing in from the runway at the destination (in minutes)
ArrDelay | The delay in arriving at the destination gate (in minutes)
Cancelled | Whether the flight was cancelled (0 = no, 1 = yes)
AirTime | The flight time (in minutes)
Distance | The total distance of the flight (in miles)
¹ https://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=236
Modifying the Descriptives.r script
To complete the assignment, modify the Descriptives.r script (used in ICA #11) to analyze flight delays and taxi times by airline and airport, following the instructions below, and then complete the answer sheet on the last page.
1) Use OnTimeAirport2017Dec.csv as the input file.
HINT: Line 21 of the Descriptives.r script reads:
INPUT_FILENAME <- "NBA14Salaries.csv"
Change that line to:
INPUT_FILENAME <- "OnTimeAirport2017Dec.csv"
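Because read.csv() resolves a bare file name against the current working directory, the CSV has to sit in the folder the script runs from, which is why the file must stay next to Descriptives.r. The sketch below is a hypothetical illustration that assumes the script reads the file with read.csv() (the actual Descriptives.r may use a different call). It writes a two-row stand-in file with invented values to a temporary folder so that it runs anywhere; in the assignment you would instead set the working directory to the folder holding Descriptives.r and the real data.

```r
# Hypothetical stand-in: a tiny CSV written to a temp folder (invented values,
# NOT the real BTS data) so the sketch runs without the assignment files.
demo_dir <- tempfile("assignment6_")
dir.create(demo_dir)
writeLines(c("Dest,ArrDelay", "PHL,5", "EWR,-3"),
           file.path(demo_dir, "OnTimeAirport2017Dec.csv"))

old_wd <- setwd(demo_dir)                # setwd() returns the previous directory
INPUT_FILENAME <- "OnTimeAirport2017Dec.csv"

# Fail early with a clear message if the file name or folder is wrong
if (!file.exists(INPUT_FILENAME)) {
  stop("Cannot find ", INPUT_FILENAME, " in ", getwd())
}

# stringsAsFactors = TRUE makes text columns factors (the default before R 4.0)
dataSet <- read.csv(INPUT_FILENAME, stringsAsFactors = TRUE)
setwd(old_wd)                            # restore the original working directory

str(dataSet)
```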
2) Present the number of flights, grouped by destination airport (using Dest).
HINT: Change line 61 to read:
summary(dataSet$Dest)
This presents the number of observations/rows (flights) for each destination airport, provided Dest was read in as a factor. You will need the output from this command to answer the first question on the answer sheet on the last page.
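Whether summary() actually returns per-airport counts depends on how Dest was read in. On a factor it returns one count per level (levels beyond its default maxsum = 100 are lumped into "(Other)"); on a plain character vector, which read.csv() produces by default since R 4.0, it returns only Length, Class and Mode. The sketch below uses a few invented rows to contrast the two cases and shows table() as a base-R alternative that counts every value either way.

```r
# Invented toy rows for illustration only (not the December 2017 data)
dataSet <- data.frame(Dest = c("PHL", "EWR", "PHL", "ORD", "PHL"),
                      stringsAsFactors = FALSE)

summary(dataSet$Dest)            # character column: Length / Class / Mode only
summary(factor(dataSet$Dest))    # factor: one count per destination airport

# table() counts every distinct value, whether character or factor
dest_counts <- table(dataSet$Dest)
dest_counts["PHL"]               # number of rows (flights) with PHL as Dest
```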
3) Present summary statistics for arrival delay (using ArrDelay).
HINT: In line 66, replace Salary with ArrDelay so that the line reads:
describe(dataSet$ArrDelay)
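Cancelled flights never arrive, so, assuming the file stores their ArrDelay as blank cells (which read.csv() turns into NA), the column will contain missing values. psych's describe() drops them by default (na.rm = TRUE) and reports the non-missing count as n, but base-R functions such as mean() do not. A quick base-R cross-check on invented values:

```r
# Invented arrival delays in minutes; NA stands in for a cancelled flight
ArrDelay <- c(-5, 12, NA, 40, 3)

mean(ArrDelay)                   # NA: a single missing value propagates
mean(ArrDelay, na.rm = TRUE)     # 12.5, the mean of the four non-missing values
sum(!is.na(ArrDelay))            # 4, comparable to the n column from describe()
max(ArrDelay, na.rm = TRUE)      # 40, the longest delay
```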
4) Present summary statistics for arrival delay (using ArrDelay), grouped by airline carrier (using UniqueCarrier).
HINT: Check line 73 in the script:
describeBy(dataSet$Salary,dataSet$Position)
This presents summary statistics for salary by position (for the NBA salary data). Now that we are using a different data set, you should be able to figure out how to change line 73 to present summary statistics for arrival delay (ArrDelay), grouped by airline carrier (UniqueCarrier).
Once you have that working, you will be able to answer questions 2 through 4 on the answer sheet!
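By analogy with line 73, the grouped call would presumably become describeBy(dataSet$ArrDelay, dataSet$UniqueCarrier). To double-check the means and maxima it reports, base R's tapply() computes one statistic per group. The rows below are invented, and the sketch deliberately avoids loading psych so that it runs on a bare R install.

```r
# Invented toy rows (not real BTS data)
dataSet <- data.frame(
  UniqueCarrier = c("AA", "AA", "UA", "UA", "UA"),
  ArrDelay      = c(10, 20, 5, NA, 95)
)

# With psych loaded, the grouped summary would be:
#   describeBy(dataSet$ArrDelay, dataSet$UniqueCarrier)

# Base-R cross-check: one statistic per carrier, ignoring missing delays
mean_by_carrier <- tapply(dataSet$ArrDelay, dataSet$UniqueCarrier, mean, na.rm = TRUE)
max_by_carrier  <- tapply(dataSet$ArrDelay, dataSet$UniqueCarrier, max,  na.rm = TRUE)

mean_by_carrier   # AA 15, UA 50
max_by_carrier    # AA 20, UA 95
```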
5) Using a t-test, compare the arrival delays of two airline carriers (using UniqueCarrier): American Airlines (AA) and United Airlines (UA).
HINT: This time, change lines 87 and 93 on your own. The first few steps should get you started!
Check line 87:
subset <- dataSet[ which(dataSet$Position=='PG' | dataSet$Position=='SF'), ]
This creates a subset with only two positions, PG and SF (for the NBA salary data). Now that we are using a different data set, you should be able to change this line to create a subset with only two airline carriers: AA and UA.
Check line 93:
t.test(subset$Salary~subset$Position)
This runs a t-test using Salary as the dependent variable and Position as the grouping variable (for the NBA salary data). With the airport data, you should be able to change this line to use ArrDelay as the dependent variable and UniqueCarrier as the grouping variable.
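Following the same pattern as lines 87 and 93, a sketch of the AA-versus-UA comparison might look like this. The delays are invented, and the subset is stored as twoCarriers (a hypothetical name) rather than subset, to avoid masking base R's subset() function. By default t.test() runs a two-sided Welch test, and the formula interface drops rows whose delay is NA.

```r
# Invented toy delays in minutes (not real BTS data)
dataSet <- data.frame(
  UniqueCarrier = c("AA", "AA", "AA", "AA", "UA", "UA", "UA", "UA", "DL"),
  ArrDelay      = c(10, 20, 30, 25, 5, 15, 8, 12, 40)
)

# Keep only the two carriers being compared (same pattern as line 87)
twoCarriers <- dataSet[which(dataSet$UniqueCarrier == "AA" |
                             dataSet$UniqueCarrier == "UA"), ]

# Dependent variable ~ grouping variable (same pattern as line 93)
result <- t.test(twoCarriers$ArrDelay ~ twoCarriers$UniqueCarrier)

result$estimate   # mean in group AA = 21.25, mean in group UA = 10
result$p.value    # compare against your significance level, e.g. 0.05
```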
6) Create a properly labeled histogram of the overall distribution of arrival delays (using ArrDelay) for all flights.
HINT: You will need to change the hist() function on both line 106 and line 112. You also need to change lines 25 and 27 for the label and title of the histogram. In addition, on line 24, change the number of breaks (NUM_BREAKS) to 50 so that the histogram shows more vertical bars.
Once you’ve completed this part, add several new lines to the script that do the following (steps 7, 8, and 9):
NOTE: Add these lines immediately before the sink() function (line 96) so that their results are included in your text output file.
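sink() works as a pair: the first call starts copying printed output into a file, and a bare sink() stops it, so anything added after the closing call reaches only the console. The sketch below assumes that is how the script brackets its output (the actual file handling in Descriptives.r may differ) and writes to a temporary file. One further caveat: if the script is run with source() without echo, top-level results are not printed automatically, so wrapping each call in print() makes sure it reaches the file.

```r
out_file <- file.path(tempdir(), "descriptivesOutput.txt")

sink(out_file)                       # start sending printed output to the file
print(summary(c(4, 8, 15, 16, 23, 42)))
# ... new lines for steps 7, 8 and 9 belong here, wrapped in print() ...
sink()                               # stop; later output goes to the console only

print("This line is not written to the file")

captured <- readLines(out_file)
captured
```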
7) Use describeBy() to compare the flight distance (Distance) across airlines (using UniqueCarrier).
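The answer sheet allows ties when naming the airlines with the longest and shortest average distance. Alongside describeBy(dataSet$Distance, dataSet$UniqueCarrier) (the presumable call for this step), a base-R cross-check can list every carrier that shares the top or bottom mean rather than only the first one. The rows are invented.

```r
# Invented toy rows (not real BTS data)
dataSet <- data.frame(
  UniqueCarrier = c("AA", "AA", "UA", "DL", "DL"),
  Distance      = c(1000, 2000, 1500, 800, 600)
)

# With psych loaded: describeBy(dataSet$Distance, dataSet$UniqueCarrier)

avg_dist <- tapply(dataSet$Distance, dataSet$UniqueCarrier, mean)

# Keep every carrier tied for the extreme value, not just the first one
longest  <- names(avg_dist)[avg_dist == max(avg_dist)]
shortest <- names(avg_dist)[avg_dist == min(avg_dist)]
longest    # "AA" "UA" (a tie at 1500 miles)
shortest   # "DL"
```

On real data the printed describeBy() means are rounded, so carriers that look tied may still differ in later decimals; comparing the unrounded tapply() values settles it.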
8) Use describeBy() to compare the taxiing-out time (TaxiOut) across origin airports (Origin).
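For step 8, aggregate() with a formula is another base-R way to get one mean per origin airport; its default na.action drops rows with a missing TaxiOut, so its group sizes should match the n column that describeBy() reports. The rows are invented.

```r
# Invented toy rows (not real BTS data)
dataSet <- data.frame(
  Origin  = c("EWR", "EWR", "PHL", "PHL", "ORD"),
  TaxiOut = c(30, 24, 18, NA, 15)
)

# With psych loaded: describeBy(dataSet$TaxiOut, dataSet$Origin)

# One row per origin airport; rows with NA TaxiOut are dropped first
taxi_by_origin <- aggregate(TaxiOut ~ Origin, data = dataSet, FUN = mean)
taxi_by_origin   # EWR 27, ORD 15, PHL 18
```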
9) Answer this question using a t-test: Do planes spend more time taxiing out to the runway at Newark (EWR) or at Philadelphia (PHL) as the origin airport? (Use TaxiOut as the taxiing-out time and Origin as the origin airport.)
Once you’ve completed all nine steps, set the working directory and run the script. Based on your script output, answer the 11 questions on the answer sheet on the next page.
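For step 9, the t-test by itself does not say which airport is slower: read the direction from the two group means and the significance from the p-value. A sketch with simulated taxi times (not real data), using %in% as a shorter equivalent of the which(... | ...) pattern from line 87; taxiSubset and taxiTest are hypothetical names.

```r
# Simulated taxi-out times in minutes (NOT real BTS data)
set.seed(42)
dataSet <- data.frame(
  Origin  = rep(c("EWR", "PHL", "ORD"), each = 30),
  TaxiOut = c(rnorm(30, mean = 25, sd = 5),
              rnorm(30, mean = 20, sd = 5),
              rnorm(30, mean = 15, sd = 5))
)

# Keep only the two airports being compared
taxiSubset <- dataSet[dataSet$Origin %in% c("EWR", "PHL"), ]

# Two-sided Welch t-test of TaxiOut between the two origin airports
taxiTest <- t.test(TaxiOut ~ Origin, data = taxiSubset)

taxiTest$estimate   # the two group means: which one is larger?
taxiTest$p.value    # how strong is the evidence of a difference?
if (taxiTest$p.value < 0.05) "significant at the 5% level" else "not significant at the 5% level"
```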
Answer sheet on the next page.
Answer Sheet for Assignment: Introduction to Working with R/RStudio
Name __________________________________
Answer the questions below based on your script output.
No. | Question | Answer
1 | How many total flights (including cancelled flights) had Philadelphia (PHL) as the destination airport during December 2017? | ______
2 | What was the average arrival delay (in minutes) across all flights during December 2017? | ______
3 | What was the average arrival delay (in minutes) for American Airlines (UniqueCarrier code AA) during December 2017? | ______
4 | What was the longest arrival delay for United Airlines (UniqueCarrier code UA) during December 2017? | ______
5 | On average, which airline (using UniqueCarrier) experienced greater arrival delays: American Airlines (AA) or United Airlines (UA)? | ______
6 | For question 5, was this difference statistically significant? What is the p-value? (Answer both questions in the blank to the right.) | ______
7 | Which airline(s) had the longest average flight distance? (You can list more than one if there is a tie.) | ______
8 | Which airline(s) had the shortest average flight distance? (You can list more than one if there is a tie.) | ______
9 | On average, which origin airport (using Origin) experienced longer taxi-out times: Newark (EWR) or Philadelphia (PHL)? | ______
10 | For question 9, was this difference statistically significant? What is the p-value? (Answer both questions in the blank to the right.) | ______
11 | Looking at the histogram, is the distribution symmetric? Are most flights delayed less than 50 minutes or more than 50 minutes? | ______