Week 4
Shraddha Bijukchhe
2024-05-30
1. Loading Libraries: We will start by loading the socviz and tidyverse libraries, which provide useful
functions and datasets for data visualization and manipulation.
# Load the socviz library for social science data visualization
library(socviz)
# Load the tidyverse library for data science, which includes ggplot2 for plotting
library(tidyverse)
## Warning: package ’tidyverse’ was built under R version 4.3.3
## Warning: package ’ggplot2’ was built under R version 4.3.3
## Warning: package ’tidyr’ was built under R version 4.3.2
## Warning: package ’readr’ was built under R version 4.3.2
## Warning: package ’purrr’ was built under R version 4.3.2
## Warning: package ’dplyr’ was built under R version 4.3.2
## Warning: package ’stringr’ was built under R version 4.3.2
## Warning: package ’lubridate’ was built under R version 4.3.2
## -- Attaching core tidyverse packages ------------------------ tidyverse 2.0.0 --
## v dplyr     1.1.4     v readr     2.1.5
## v forcats   1.0.0     v stringr   1.5.1
## v ggplot2   3.5.0     v tibble    3.2.1
## v lubridate 1.9.3     v tidyr     1.3.1
## v purrr     1.0.2
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
## i Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
#socviz: This library provides datasets and functions specifically useful for social science data visualization.
#tidyverse: This is a collection of R packages designed for data science. It includes ggplot2 for data
visualization, dplyr for data manipulation, tidyr for data tidying, and others.
Summarizing mpg DataFrame: We will summarize the mpg dataframe (which ships with ggplot2) using the
summary() function to understand its structure and the summary statistics of its variables.
# Display summary statistics for the mpg dataframe
summary(mpg)
##  manufacturer          model               displ            year
##  Length:234         Length:234         Min.   :1.600   Min.   :1999
##  Class :character   Class :character   1st Qu.:2.400   1st Qu.:1999
##  Mode  :character   Mode  :character   Median :3.300   Median :2004
##                                        Mean   :3.472   Mean   :2004
##                                        3rd Qu.:4.600   3rd Qu.:2008
##                                        Max.   :7.000   Max.   :2008
##       cyl           trans               drv                 cty
##  Min.   :4.000   Length:234         Length:234         Min.   : 9.00
##  1st Qu.:4.000   Class :character   Class :character   1st Qu.:14.00
##  Median :6.000   Mode  :character   Mode  :character   Median :17.00
##  Mean   :5.889                                         Mean   :16.86
##  3rd Qu.:8.000                                         3rd Qu.:19.00
##  Max.   :8.000                                         Max.   :35.00
##       hwy             fl               class
##  Min.   :12.00   Length:234         Length:234
##  1st Qu.:18.00   Class :character   Class :character
##  Median :24.00   Mode  :character   Mode  :character
##  Mean   :23.44
##  3rd Qu.:27.00
##  Max.   :44.00
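#summary(mpg): This function provides summary statistics for each numeric variable in the dataframe, such as
the mean, median, minimum, and maximum. For character variables such as manufacturer or class it reports
only the length, class, and mode; counts per category appear only when a variable is stored as a factor.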
The mpg dataframe describes car models through eleven variables: manufacturer, model, engine displacement
(displ), year, number of cylinders (cyl), transmission type (trans), drive type (drv), city miles per gallon
(cty), highway miles per gallon (hwy), fuel type (fl), and vehicle class. The summary shows 234 rows, engine
displacements ranging from 1.6 to 7.0 liters (median 3.3), model years from 1999 to 2008, and cylinder counts
from 4 to 8 (median 6). Fuel efficiency varies, with city miles per gallon ranging from 9 to 35 (median 17)
and highway miles per gallon from 12 to 44 (median 24). These statistics give the range and central tendency
of the numeric variables; for the character columns the summary reports only their length and type.
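Because summary() says little about the character columns, a short sketch of how their categories could be counted may help. It assumes ggplot2 is installed (it provides mpg); the names class_counts and mpg_df are introduced here for illustration and are not part of the assignment.

```r
library(ggplot2)  # provides the mpg dataset

# Count how many rows fall into each vehicle class
class_counts <- table(mpg$class)
print(class_counts)

# Work on a plain data.frame copy and convert every character column to a factor,
# so that summary() reports counts per level instead of Length/Class/Mode
mpg_df <- as.data.frame(mpg)
is_chr <- vapply(mpg_df, is.character, logical(1))
mpg_df[is_chr] <- lapply(mpg_df[is_chr], factor)

summary(mpg_df[c("drv", "fl", "class")])
```

Converting to factors is only one option; calling table() on a single column is lighter when only one variable is of interest.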
2. Summarizing gapminder DataFrame: Similarly, we will summarize the gapminder dataframe using the
summary() function to understand its structure and the summary statistics of its variables.
# Load the library
library(gapminder)
## Warning: package ’gapminder’ was built under R version 4.3.3
# Display summary statistics for the gapminder dataframe
summary(gapminder)
##         country        continent        year         lifeExp
##  Afghanistan:  12   Africa  :624   Min.   :1952   Min.   :23.60
##  Albania    :  12   Americas:300   1st Qu.:1966   1st Qu.:48.20
##  Algeria    :  12   Asia    :396   Median :1980   Median :60.71
##  Angola     :  12   Europe  :360   Mean   :1980   Mean   :59.47
##  Argentina  :  12   Oceania : 24   3rd Qu.:1993   3rd Qu.:70.85
##  Australia  :  12                  Max.   :2007   Max.   :82.60
##  (Other)    :1632
##       pop              gdpPercap
##  Min.   :6.001e+04   Min.   :   241.2
##  1st Qu.:2.794e+06   1st Qu.:  1202.1
##  Median :7.024e+06   Median :  3531.8
##  Mean   :2.960e+07   Mean   :  7215.3
##  3rd Qu.:1.959e+07   3rd Qu.:  9325.5
##  Max.   :1.319e+09   Max.   :113523.1
##
#summary(gapminder): This function provides summary statistics for each variable in the dataframe, giving
insights into the range and distribution of the data.
The summary of the gapminder dataframe gives a concise statistical overview of its demographic and economic
variables. The dataset covers observations from 142 countries across five continents, spanning 1952 to 2007.
Life expectancy ranges from 23.6 to 82.6 years, with a median of 60.71 years. Population ranges from about
60,000 to about 1.319 billion, a spread of more than four orders of magnitude. GDP per capita ranges from
241.2 to 113,523.1, and its mean (7,215.3) is roughly twice its median (3,531.8), which points to a
right-skewed distribution. These figures describe the spread and central tendency of the indicators and
motivate the log-scaled axis used in the plot that follows.
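The country and year coverage quoted above can be checked directly rather than read off the summary. The following is a small sketch, assuming the gapminder package is installed; the variable names n_countries, obs_per_country, and years are chosen here for illustration.

```r
library(gapminder)

n_countries     <- nlevels(gapminder$country)   # number of distinct countries
obs_per_country <- table(gapminder$country)     # rows recorded for each country
years           <- sort(unique(gapminder$year)) # distinct observation years

n_countries
range(obs_per_country)
years

# If every country is observed in every year, rows = countries x years
nrow(gapminder) == n_countries * length(years)
```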
3. Creating a ggplot Object: We assign a ggplot object to the variable ‘p’. This object is the foundation
for visualizations based on the gapminder dataset. We map GDP per capita (gdpPercap) to the x-axis and
life expectancy (lifeExp) to the y-axis, and additionally map continent to point color and population
(pop) to point size. The aim is to explore the relationship between a country’s economic prosperity, as
indicated by GDP per capita, and the life expectancy of its population. Further layers, scales, labels,
and theme settings are then added to the same object with the + operator.
# Create a ggplot object 'p' with gapminder dataset
p <- ggplot(data = gapminder,
            mapping = aes(x = gdpPercap, y = lifeExp, color = continent, size = pop)) +
  # Add points to the plot with alpha (opacity) set to 0.7
  geom_point(alpha = 0.7) +
  # Set x-axis scale to log10 and format labels as dollar values
  scale_x_log10(labels = scales::dollar_format()) +
  # Manually specify colors for different continents
  scale_color_manual(values = c("#F8766D", "#00BA38", "#619CFF", "#FFC100", "#A3A3A3", "#E76BF3")) +
  # Set size range for points and specify breaks for size legend
  scale_size(range = c(2, 12), breaks = c(1e+06, 5e+07, 1e+09)) +
  # Add axis and legend labels
  labs(x = "GDP per capita", y = "Life expectancy", size = "Population", color = "Continent") +
  # Add plot title
  ggtitle("Life expectancy and GDP per capita by continent") +
  # Customize plot theme to minimal
  theme_minimal() +
  # Customize various text elements in the plot
  theme(plot.title = element_text(hjust = 0.5, size = 16, face = "bold"),
        axis.text = element_text(size = 12),
        axis.title = element_text(size = 14),
        legend.title = element_text(size = 14),
        legend.text = element_text(size = 12))
# Print the plot
print(p)
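[Figure: scatter plot titled "Life expectancy and GDP per capita by continent". x-axis: GDP per capita (log scale, dollar labels at $1,000, $10,000, $100,000); y-axis: Life expectancy (ticks at 40, 60, 80); point size: Population (1e+06, 5e+07, 1e+09); color: Continent (Africa, Americas, Asia, Europe, Oceania).]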
#ggplot Object Creation: A ggplot object named ‘p’ is created using the ggplot() function. Data from the
gapminder dataset is mapped to aesthetics (x, y, color, size) using the aes() function.
#Geometric Elements: Points are added to the plot using geom_point(), with alpha (opacity) set to 0.7 so that
points are slightly transparent and overlapping points remain distinguishable.
#Scale Transformations: The x-axis scale is set to log10 using scale_x_log10() to better visualize data with
a wide range. Labels on the x-axis are formatted as dollar values using scales::dollar_format().
#Color and Size Scales: Colors for different continents are manually specified using scale_color_manual(),
while the size of points is adjusted using scale_size() with a specified range and breaks.
#Axis and Legend Labels: Labels for x-axis, y-axis, point size (population), and color (continent) are added
using the labs() function.
#Themes and Text Customization: The plot theme is set to minimal using theme_minimal(), and various
text elements (plot title, axis labels, legend titles, legend text) are customized using the theme() function.
#Printing the Plot: The plot object ‘p’ is printed using the print() function to display the visualization.
The output of the code is a scatter plot visualizing the relationship between GDP per capita and life
expectancy across continents. Each point represents one country in one year, with the size of the point
corresponding to the population and the color representing the continent. The plot shows a general trend of
higher GDP per capita being associated with longer life expectancy, with notable variation among continents:
European countries tend to cluster at higher GDP per capita and life expectancy, while African countries tend
to cluster lower. Note that the dataset groups North and South America together as "Americas".
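The visual trend can also be given a rough numerical summary. The sketch below, which assumes the gapminder package is installed, compares the Pearson correlation of life expectancy with raw GDP per capita and with log10 GDP per capita; r_raw and r_log are names introduced here. It pools all countries and years, so it is a descriptive check rather than a model.

```r
library(gapminder)

# Correlation on the raw GDP scale
r_raw <- cor(gapminder$gdpPercap, gapminder$lifeExp)

# Correlation after the same log10 transform used for the x-axis of the plot
r_log <- cor(log10(gapminder$gdpPercap), gapminder$lifeExp)

c(raw = r_raw, log10 = r_log)
```

A larger value on the log scale would be consistent with the choice of scale_x_log10(): the relationship is closer to a straight line when GDP per capita is measured in orders of magnitude.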
4. Checking the Structure of p: This step involves using the str() function to examine the internal structure
of the ggplot object ‘p’. By inspecting its components, such as data, aesthetics, geometries, scales, and
themes, we gain insight into how the plot is constructed. Understanding the structure of ‘p’ allows for
better customization and manipulation of the plot, facilitating effective data visualization and analysis.
# Display the internal structure of the ggplot object p
str(p)
## List of 11
##  $ data       : tibble [1,704 x 6] (S3: tbl_df/tbl/data.frame)
##   ..$ country  : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
##   ..$ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
##   ..$ year     : int [1:1704] 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
##   ..$ lifeExp  : num [1:1704] 28.8 30.3 32 34 36.1 ...
##   ..$ pop      : int [1:1704] 8425333 9240934 10267083 11537966 13079460 14880372 12881816 13867957 1
##   ..$ gdpPercap: num [1:1704] 779 821 853 836 740 ...
##  $ layers     :List of 1
##   ..$ :Classes 'LayerInstance', 'Layer', 'ggproto', 'gg' <ggproto object: Class LayerInstance, Layer,
##     aes_params: list
##     compute_aesthetics: function
##     compute_geom_1: function
##     compute_geom_2: function
##     compute_position: function
##     compute_statistic: function
##     computed_geom_params: list
##     computed_mapping: uneval
##     computed_stat_params: list
##     constructor: call
##     data: waiver
##     draw_geom: function
##     finish_statistics: function
##     geom: <ggproto object: Class GeomPoint, Geom, gg>
##         aesthetics: function
##         default_aes: uneval
##         draw_group: function
##         draw_key: function
##         draw_layer: function
##         draw_panel: function
##         extra_params: na.rm
##         handle_na: function
##         non_missing_aes: size shape colour
##         optional_aes:
##         parameters: function
##         rename_size: FALSE
##         required_aes: x y
##         setup_data: function
##         setup_params: function
##         use_defaults: function
##         super:  <ggproto object: Class Geom, gg>
##     geom_params: list
##     inherit.aes: TRUE
##     layer_data: function
##     map_statistic: function
##     mapping: NULL
##     position: <ggproto object: Class PositionIdentity, Position, gg>
##         compute_layer: function
##         compute_panel: function
##         required_aes:
##         setup_data: function
##         setup_params: function
##         super:  <ggproto object: Class Position, gg>
##     print: function
##     setup_layer: function
##     show.legend: NA
##     stat: <ggproto object: Class StatIdentity, Stat, gg>
##         aesthetics: function
##         compute_group: function
##         compute_layer: function
##         compute_panel: function
##         default_aes: uneval
##         dropped_aes:
##         extra_params: na.rm
##         finish_layer: function
##         non_missing_aes:
##         optional_aes:
##         parameters: function
##         required_aes:
##         retransform: TRUE
##         setup_data: function
##         setup_params: function
##         super:  <ggproto object: Class Stat, gg>
##     stat_params: list
##     super:  <ggproto object: Class Layer, gg>
##  $ scales     :Classes 'ScalesList', 'ggproto', 'gg' <ggproto object: Class ScalesList, gg>
##     add: function
##     add_defaults: function
##     add_missing: function
##     backtransform_df: function
##     clone: function
##     find: function
##     get_scales: function
##     has_scale: function
##     input: function
##     map_df: function
##     n: function
##     non_position_scales: function
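The full str() listing is long, so it can be easier to query individual components of the plot object. The sketch below rebuilds a reduced plot (p_small, a name introduced here, with only the point layer) so that it runs on its own, assuming ggplot2 and gapminder are installed. Accessing components with $ reflects how ggplot objects are stored in the ggplot2 3.5.x series used above and should be treated as an assumption for other versions.

```r
library(ggplot2)
library(gapminder)

# Reduced stand-in for the plot object from step 3: base mapping plus one point layer
p_small <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.7)

class(p_small)                      # classes of the plot object
names(p_small$mapping)              # aesthetics set in ggplot(): x and y
length(p_small$layers)              # number of layers added with `+`
class(p_small$layers[[1]]$geom)[1]  # geom used by the first layer
dim(p_small$data)                   # rows and columns of the attached data
```

Each call corresponds to one branch of the str() output: the data tibble, the layers list with its GeomPoint entry, and the aesthetic mapping.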