Week-4-Assignment.pdf

School

Bunker Hill Community College

Course

IT471

Subject

Statistics

Date

Jun 24, 2024

Type

pdf

Pages

36

Week 4
Shraddha Bijukchhe
2024-05-30

1. Loading Libraries: We will start by loading the socviz and tidyverse libraries, which provide useful functions and datasets for data visualization and manipulation.

# Load the socviz library for social science data visualization
library(socviz)
# Load the tidyverse library for data science, which includes ggplot2 for plotting
library(tidyverse)

## Warning: package 'tidyverse' was built under R version 4.3.3
## Warning: package 'ggplot2' was built under R version 4.3.3
## Warning: package 'tidyr' was built under R version 4.3.2
## Warning: package 'readr' was built under R version 4.3.2
## Warning: package 'purrr' was built under R version 4.3.2
## Warning: package 'dplyr' was built under R version 4.3.2
## Warning: package 'stringr' was built under R version 4.3.2
## Warning: package 'lubridate' was built under R version 4.3.2
## -- Attaching core tidyverse packages ------------------------ tidyverse 2.0.0 --
## v dplyr     1.1.4     v readr     2.1.5
## v forcats   1.0.0     v stringr   1.5.1
## v ggplot2   3.5.0     v tibble    3.2.1
## v lubridate 1.9.3     v tidyr     1.3.1
## v purrr     1.0.2
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
## i Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

#socviz: This library provides datasets and functions specifically useful for social science data visualization.
#tidyverse: This is a collection of R packages designed for data science. It includes ggplot2 for data visualization, dplyr for data manipulation, tidyr for data tidying, and others.

Summarizing mpg DataFrame: We will display summary information for the mpg dataframe (which ships with ggplot2) using the summary() function to understand its structure and the summary statistics of its variables.
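Before running summary(), the shape and column types of mpg can be checked directly. The following is a minimal sketch, not part of the original assignment: it loads only ggplot2 (where mpg lives), and the name col_types is introduced here purely for illustration.

```r
library(ggplot2)  # mpg ships with ggplot2

# Number of rows and columns in the dataframe
dim(mpg)

# First class of every column (col_types is a hypothetical helper name)
col_types <- vapply(mpg, function(col) class(col)[1], character(1))
col_types

# For character columns, a frequency table shows how values are distributed
table(mpg$drv)
```

Tabulating a character column with table() is one simple way to see category counts, which summary() does not report for character data.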
# Display summary statistics for the mpg dataframe
summary(mpg)

##  manufacturer          model               displ            year
##  Length:234         Length:234         Min.   :1.600   Min.   :1999
##  Class :character   Class :character   1st Qu.:2.400   1st Qu.:1999
##  Mode  :character   Mode  :character   Median :3.300   Median :2004
##                                        Mean   :3.472   Mean   :2004
##                                        3rd Qu.:4.600   3rd Qu.:2008
##                                        Max.   :7.000   Max.   :2008
##       cyl           trans               drv                 cty
##  Min.   :4.000   Length:234         Length:234         Min.   : 9.00
##  1st Qu.:4.000   Class :character   Class :character   1st Qu.:14.00
##  Median :6.000   Mode  :character   Mode  :character   Median :17.00
##  Mean   :5.889                                         Mean   :16.86
##  3rd Qu.:8.000                                         3rd Qu.:19.00
##  Max.   :8.000                                         Max.   :35.00
##       hwy             fl               class
##  Min.   :12.00   Length:234         Length:234
##  1st Qu.:18.00   Class :character   Class :character
##  Median :24.00   Mode  :character   Mode  :character
##  Mean   :23.44
##  3rd Qu.:27.00
##  Max.   :44.00

#summary(mpg): This function provides summary statistics for each variable in the dataframe. For numeric variables it reports the minimum, quartiles, median, mean, and maximum. For character variables such as manufacturer and class it reports only the length, class, and mode, not the distribution of categories.

The mpg dataframe provides data on various car models, including manufacturer, model, engine displacement (displ), year, number of cylinders (cyl), transmission type (trans), drive type (drv), city miles per gallon (cty), highway miles per gallon (hwy), fuel type (fl), and vehicle class. The summary shows 234 entries with engine displacements ranging from 1.6 to 7.0 liters, model years from 1999 to 2008 (median 2004), and cylinder counts from 4 to 8. Fuel efficiency varies, with city miles per gallon ranging from 9 to 35 (median 17) and highway miles per gallon from 12 to 44 (median 24). These statistics give an overview of the range and central tendency of the numeric variables.

2. Summarizing gapminder DataFrame: Similarly, we will display summary information for the gapminder dataframe using the summary() function to understand its structure and the summary statistics of its variables.

# Load the library
library(gapminder)

## Warning: package 'gapminder' was built under R version 4.3.3

# Display summary statistics for the gapminder dataframe
summary(gapminder)

##         country        continent        year         lifeExp
##  Afghanistan:  12   Africa  :624   Min.   :1952   Min.   :23.60
##  Albania    :  12   Americas:300   1st Qu.:1966   1st Qu.:48.20
##  Algeria    :  12   Asia    :396   Median :1980   Median :60.71
##  Angola     :  12   Europe  :360   Mean   :1980   Mean   :59.47
##  Argentina  :  12   Oceania : 24   3rd Qu.:1993   3rd Qu.:70.85
##  Australia  :  12                  Max.   :2007   Max.   :82.60
##  (Other)    :1632
##       pop              gdpPercap
##  Min.   :6.001e+04   Min.   :   241.2
##  1st Qu.:2.794e+06   1st Qu.:  1202.1
##  Median :7.024e+06   Median :  3531.8
##  Mean   :2.960e+07   Mean   :  7215.3
##  3rd Qu.:1.959e+07   3rd Qu.:  9325.5
##  Max.   :1.319e+09   Max.   :113523.1
##

#summary(gapminder): This function provides summary statistics for each variable in the dataframe. Because country and continent are factors, it also reports how many rows fall in each level.

The summary of the gapminder dataframe gives a concise statistical overview of key demographic and economic variables. The dataset holds 1,704 observations, 12 per country, covering 142 countries on five continents from 1952 to 2007. Life expectancy ranges from 23.6 to 82.6 years, with a median of 60.71 years. Population ranges from about 60,000 (6.001e+04) to 1.319 billion, indicating large disparities among countries. GDP per capita ranges from 241.2 to 113,523.1, with a median of 3,531.8 that sits well below the mean of 7,215.3, which points to a right-skewed distribution.

3. Creating a ggplot Object: We are going to assign a ggplot object to the variable 'p'. This ggplot object will serve as the foundation for creating visualizations based on the gapminder dataset. Specifically, we are mapping GDP per capita (gdpPercap) to the x-axis and life expectancy (lifeExp) to the y-axis. By doing so, we aim to explore the relationship between a country's economic prosperity, as indicated by GDP per capita, and the life expectancy of its population.
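The x and y mapping described here can be sketched on its own. This is a minimal illustration, not the assignment's full plot: p_base and p_points are hypothetical names, and it assumes ggplot2 and gapminder are installed. It shows that ggplot() alone creates an object with no layers, and that a geom has to be joined to it with `+` before anything is drawn.

```r
library(ggplot2)
library(gapminder)

# Base object: data plus aesthetic mappings, but nothing to draw yet
p_base <- ggplot(data = gapminder,
                 mapping = aes(x = gdpPercap, y = lifeExp))
length(p_base$layers)    # number of layers on the base object

# Layers are attached with `+`; the operator must end the line
# so that R keeps reading the expression on the next line
p_points <- p_base +
  geom_point()
length(p_points$layers)  # number of layers after adding the geom
```

If the `+` after ggplot() is left out, R treats the following geom_point() call as a separate expression and the stored object keeps zero layers.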
This assignment sets the stage for further customization and layering of graphical elements to build visualizations that show trends and patterns in global demographics and economics.

# Create a ggplot object 'p' with gapminder dataset
p <- ggplot(data = gapminder,
            mapping = aes(x = gdpPercap, y = lifeExp, color = continent, size = pop)) +
  # Add points to the plot with alpha (opacity) set to 0.7
  geom_point(alpha = 0.7) +
  # Set x-axis scale to log10 and format labels as dollar values
  scale_x_log10(labels = scales::dollar_format()) +
  # Manually specify colors for continents (gapminder has five, so the sixth value is unused)
  scale_color_manual(values = c("#F8766D", "#00BA38", "#619CFF", "#FFC100", "#A3A3A3", "#E76BF3")) +
  # Set size range for points and specify breaks for size legend
  scale_size(range = c(2, 12), breaks = c(1e+06, 5e+07, 1e+09)) +
  # Add axis and legend labels
  labs(x = "GDP per capita", y = "Life expectancy", size = "Population", color = "Continent") +
  # Add plot title
  ggtitle("Life expectancy and GDP per capita by continent") +
  # Customize plot theme to minimal
  theme_minimal() +
  # Customize various text elements in the plot
  theme(plot.title = element_text(hjust = 0.5, size = 16, face = "bold"),
        axis.text = element_text(size = 12),
        axis.title = element_text(size = 14),
        legend.title = element_text(size = 14),
        legend.text = element_text(size = 12))

# Print the plot
print(p)

[Figure: "Life expectancy and GDP per capita by continent". Scatter plot; x-axis: GDP per capita (log scale, labeled $1,000 to $100,000); y-axis: Life expectancy (40 to 80); point size: Population (legend breaks 1e+06, 5e+07, 1e+09); color: Continent (Africa, Americas, Asia, Europe, Oceania).]

#ggplot Object Creation: A ggplot object named 'p' is created using the ggplot() function. Data from the gapminder dataset is mapped to aesthetics (x, y, color, size) using the aes() function.
#Geometric Elements: Points are added to the plot using geom_point(), with alpha set to 0.7 (slightly transparent) so that overlapping points remain visible.
#Scale Transformations: The x-axis scale is set to log10 using scale_x_log10() to better visualize data with a wide range. Labels on the x-axis are formatted as dollar values using scales::dollar_format().
#Color and Size Scales: Colors for different continents are manually specified using scale_color_manual(), while the size of points is adjusted using scale_size() with a specified range and breaks.
#Axis and Legend Labels: Labels for x-axis, y-axis, point size (population), and color (continent) are added using the labs() function.
#Themes and Text Customization: The plot theme is set to minimal using theme_minimal(), and various text elements (plot title, axis labels, legend titles, legend text) are customized using the theme() function.
#Printing the Plot: The plot object 'p' is printed using the print() function to display the visualization.

The output of the code is a scatter plot of GDP per capita against life expectancy. Each point represents one country in one survey year, with point size corresponding to population and color representing the continent. The plot shows a general trend of higher GDP per capita being associated with longer life expectancy, with notable variation among continents: European countries tend to have higher GDP per capita and life expectancy than many countries in Africa and Asia. Note that gapminder groups North and South America together as "Americas".

4. Checking the Structure of p: This step uses the str() function to examine the internal structure of the ggplot object 'p'. Inspecting its components, such as data, layers, scales, mappings, and theme, shows how the plot is constructed, which makes later customization easier.

# Display the internal structure of the ggplot object p
str(p)

## List of 11
##  $ data       : tibble [1,704 x 6] (S3: tbl_df/tbl/data.frame)
##   ..$ country  : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
##   ..$ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
##   ..$ year     : int [1:1704] 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
##   ..$ lifeExp  : num [1:1704] 28.8 30.3 32 34 36.1 ...
##   ..$ pop      : int [1:1704] 8425333 9240934 10267083 11537966 13079460 14880372 12881816 13867957 1
##   ..$ gdpPercap: num [1:1704] 779 821 853 836 740 ...
##  $ layers     :List of 1
##   ..$ :Classes 'LayerInstance', 'Layer', 'ggproto', 'gg' <ggproto object: Class LayerInstance, Layer,
##     aes_params: list
##     compute_aesthetics: function
##     compute_geom_1: function
##     compute_geom_2: function
##     compute_position: function
##     compute_statistic: function
##     computed_geom_params: list
##     computed_mapping: uneval
##     computed_stat_params: list
##     constructor: call
##     data: waiver
##     draw_geom: function
##     finish_statistics: function
##     geom: <ggproto object: Class GeomPoint, Geom, gg>
##         aesthetics: function
##         default_aes: uneval
##         draw_group: function
##         draw_key: function
##         draw_layer: function
##         draw_panel: function
##         extra_params: na.rm
##         handle_na: function
##         non_missing_aes: size shape colour
##         optional_aes:
##         parameters: function
##         rename_size: FALSE
##         required_aes: x y
##         setup_data: function
##         setup_params: function
##         use_defaults: function
##         super: <ggproto object: Class Geom, gg>
##     geom_params: list
##     inherit.aes: TRUE
##     layer_data: function
##     map_statistic: function
##     mapping: NULL
##     position: <ggproto object: Class PositionIdentity, Position, gg>
##         compute_layer: function
##         compute_panel: function
##         required_aes:
##         setup_data: function
##         setup_params: function
##         super: <ggproto object: Class Position, gg>
##     print: function
##     setup_layer: function
##     show.legend: NA
##     stat: <ggproto object: Class StatIdentity, Stat, gg>
##         aesthetics: function
##         compute_group: function
##         compute_layer: function
##         compute_panel: function
##         default_aes: uneval
##         dropped_aes:
##         extra_params: na.rm
##         finish_layer: function
##         non_missing_aes:
##         optional_aes:
##         parameters: function
##         required_aes:
##         retransform: TRUE
##         setup_data: function
##         setup_params: function
##         super: <ggproto object: Class Stat, gg>
##     stat_params: list
##     super: <ggproto object: Class Layer, gg>
##  $ scales     :Classes 'ScalesList', 'ggproto', 'gg' <ggproto object: Class ScalesList, gg>
##     add: function
##     add_defaults: function
##     add_missing: function
##     backtransform_df: function
##     clone: function
##     find: function
##     get_scales: function
##     has_scale: function
##     input: function
##     map_df: function
##     n: function
##     non_position_scales: function
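
Because str() on a ggplot object prints every ggproto method, it can be easier to pull out individual components. The sketch below is an illustration under assumptions rather than part of the original assignment: p_demo is a hypothetical stand-in for p with fewer customizations, and the field names (data, mapping, layers) are those of ggplot2 3.5.x as loaded earlier; internal structure can differ in other versions.

```r
library(ggplot2)
library(gapminder)

# Hypothetical stand-in for the assignment's plot object
p_demo <- ggplot(gapminder,
                 aes(x = gdpPercap, y = lifeExp, color = continent, size = pop)) +
  geom_point(alpha = 0.7)

dim(p_demo$data)                    # rows and columns of the stored data
names(p_demo$mapping)               # aesthetics mapped in ggplot()
length(p_demo$layers)               # number of layers added with `+`
class(p_demo$layers[[1]]$geom)[1]   # geom class of the first layer
class(p_demo$layers[[1]]$stat)[1]   # stat class of the first layer
```

Note that ggplot2 stores the color aesthetic under the British spelling, so it appears as colour in the mapping names.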