GV900
Homework 1
Fall 2023
Homework 1: Answer Key
Question 1
Load the ggplot2 library and then the midwest dataset.
Display the column
names for the midwest dataset.
[3 marks]
library(ggplot2)
data("midwest")
# OR data(midwest) is also fine
?midwest
names(midwest)
 [1] "PID"                  "county"               "state"
 [4] "area"                 "poptotal"             "popdensity"
 [7] "popwhite"             "popblack"             "popamerindian"
[10] "popasian"             "popother"             "percwhite"
[13] "percblack"            "percamerindan"        "percasian"
[16] "percother"            "popadults"            "perchsd"
[19] "percollege"           "percprof"             "poppovertyknown"
[22] "percpovertyknown"     "percbelowpoverty"     "percchildbelowpovert"
[25] "percadultpoverty"     "percelderlypoverty"   "inmetro"
[28] "category"
# or colnames(midwest) is also fine
Question 2
What type of variable is state? How can you tell? How many different states
are there in this dataset? What is the modal state in the dataset? [2 marks]
# create a frequency table for variable state
table(midwest$state)
##
##  IL  IN  MI  OH  WI
## 102  92  83  88  72
The variable ‘state’ is a nominal/categorical variable, as there is no inherent
order among the states and no numbers attached to them either. Within this
dataset, there are five distinct states. The modal state, meaning the state with
the highest frequency, is IL (Illinois).
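The answer above can also be checked programmatically. The following is a minimal sketch in base R that rebuilds a stand-in for the state column from the counts in the frequency table, so it runs without ggplot2. The names counts, state, n_states, and modal_state are illustrative assumptions, not part of the original answer key.

```r
# Counts copied from the frequency table above (not recomputed from midwest).
counts <- c(IL = 102, IN = 92, MI = 83, OH = 88, WI = 72)

# Rebuild a stand-in for midwest$state: one label per county.
state <- rep(names(counts), times = counts)

# A character (or factor) vector with no ordering is treated as nominal.
var_class <- class(state)

# Number of distinct states.
n_states <- length(unique(state))

# Modal state: the label with the highest frequency.
freq <- table(state)
modal_state <- names(freq)[which.max(freq)]

print(var_class)
print(n_states)
print(modal_state)
```

With the real data loaded, the same calls should work on midwest$state in place of the reconstructed state vector.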
Question 3
Look at the summary statistics for the variable popwhite. Just based on looking
at these numbers, describe the likely distribution of the variable. That is, in your
own words and in 2-3 sentences, describe the range, and the likely skewness of
the variable (if any). Make sure to explain why you think the variable is/isn’t
skewed a certain way based on just these summary statistics numbers [4 marks]
summary(midwest$popwhite)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##     416   18630   34471   81840   72968 3204947
range(midwest$popwhite)
## [1]     416 3204947
# for the range, it's also fine to use max(midwest$popwhite) - min(midwest$pop
Based on the summary statistics, the variable has a substantial range, extending
from a minimum of 416 white people in a county to a maximum of 3,204,947. This
wide range indicates a high degree of variability in the data. The median
(34,471) is notably less than the mean (81,840), and the mean even exceeds the
third quartile (72,968). This suggests a right-skewed distribution, as the mean
is pulled towards the higher values by a few extremely high data points.
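The skewness argument rests on a few simple comparisons between the summary numbers. The sketch below, in base R, makes them explicit using the values from the summary() output rather than the dataset itself. The vector stats and the derived names are hypothetical helpers introduced only for illustration.

```r
# Summary statistics copied from the summary() output above.
stats <- c(min = 416, q1 = 18630, median = 34471,
           mean = 81840, q3 = 72968, max = 3204947)

# Range as a single number (max minus min) and the interquartile range.
value_range <- unname(stats["max"] - stats["min"])
iqr <- unname(stats["q3"] - stats["q1"])

# Two informal signs of right skew:
# 1. the mean is well above the median (here it even exceeds Q3);
# 2. the upper tail (max - median) is far longer than the lower tail.
mean_to_median <- unname(stats["mean"] / stats["median"])
mean_above_q3 <- unname(stats["mean"] > stats["q3"])
upper_tail <- unname(stats["max"] - stats["median"])
lower_tail <- unname(stats["median"] - stats["min"])

print(value_range)
print(iqr)
print(round(mean_to_median, 2))
print(mean_above_q3)
print(upper_tail / lower_tail)
```

These checks are heuristics, not a formal test of skewness, but they are enough to justify the description from summary statistics alone.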
Question 4
Now, make a histogram for the popwhite variable. In doing so, make sure that
you change the default binwidth, color, and axes labels, and that you give your
histogram a relevant title. Also make sure that you save your histogram as an
object rather than displaying it directly. If you were to describe the distribution
of this variable based on the histogram, would your description change in any
way from the previous answer? If yes, in what way? If no, why not? Explain
why you think this variable is skewed or not, i.e., think about what the variable
measures and explain why it probably looks the way it does. (Note that this
last part is not about how you can tell whether the variable is skewed or not
but, rather, asks you to think about what the variable measures and why it makes
sense, or doesn’t make sense, that the variable is skewed or not in the way that
it is.) [6 marks]
# create histogram for 'popwhite'
popwhite_hist <- ggplot(data = midwest, aes(x = popwhite)) +
  geom_histogram(binwidth = 100000, color = "purple", fill = "skyblue") +
  labs(x = "Number of white people",
       y = "Frequency",
       title = "Histogram of white people in every county")
popwhite_hist
[Figure: "Histogram of white people in every county". x-axis: Number of white people (0e+00 to 3e+06); y-axis: Frequency (0 to 300).]
Observing the histogram, I would draw a similar conclusion to before: the
variable is very much right-skewed. As the figure shows, most counties have up
to 200,000 white people, with some having more but still under approximately
1 million. There are a few large outliers, especially the maximum value, which
give the distribution a very long tail. Thus the description of the variable
would be similar to before. As for why the variable looks this way, that is
likely because it is a raw count of the number of white people in a given county
without taking into account the size of the county or the overall population. In
other words, geographically larger counties or densely populated counties (such
as counties containing big cities) will have a higher number of people living
there and also a high number of white people living there. However, there won’t
be very many such counties, as not every county has a huge urban center in it,
so the dataset