Week-4-Assignment


School

Bunker Hill Community College

Course

IT471

Subject

Statistics

Date

Jun 24, 2024

Uploaded by Sushilghimiray639

Week 4
Shraddha Bijukchhe
2024-05-30

1. Loading Libraries: We start by loading the socviz and tidyverse libraries, which provide useful functions and datasets for data visualization and manipulation.

# Load the socviz library for social science data visualization
library(socviz)
# Load the tidyverse library for data science, which includes ggplot2 for plotting
library(tidyverse)

## Warning: package ‘tidyverse’ was built under R version 4.3.3
## Warning: package ‘ggplot2’ was built under R version 4.3.3
## Warning: package ‘tidyr’ was built under R version 4.3.2
## Warning: package ‘readr’ was built under R version 4.3.2
## Warning: package ‘purrr’ was built under R version 4.3.2
## Warning: package ‘dplyr’ was built under R version 4.3.2
## Warning: package ‘stringr’ was built under R version 4.3.2
## Warning: package ‘lubridate’ was built under R version 4.3.2
## -- Attaching core tidyverse packages ------------------------ tidyverse 2.0.0 --
## v dplyr     1.1.4     v readr     2.1.5
## v forcats   1.0.0     v stringr   1.5.1
## v ggplot2   3.5.0     v tibble    3.2.1
## v lubridate 1.9.3     v tidyr     1.3.1
## v purrr     1.0.2
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
## i Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

socviz: This library provides datasets and functions that are useful for social science data visualization.
tidyverse: This is a collection of R packages designed for data science. It includes ggplot2 for data visualization, dplyr for data manipulation, tidyr for data tidying, and others.

Summarizing mpg DataFrame: The mpg data frame ships with ggplot2, so it is available once tidyverse is loaded. We display its metadata with the summary() function. This function reports the type of each variable and gives summary statistics for the numeric variables.
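summary() reports only the length, class, and mode of character columns, so a direct look at column types can complement it. The sketch below is an illustrative addition, not part of the original assignment. It assumes the mpg data bundled with ggplot2. It lists each column's class with sapply(), counts the rows, and shows the distinct values of year.

```r
# Illustrative check, not part of the original assignment:
# mpg is bundled with ggplot2, which library(tidyverse) attaches.
library(ggplot2)

# Class of every column: "character" for text labels,
# "integer" or "numeric" for measurements
col_types <- sapply(mpg, class)
print(col_types)

# Number of rows and the distinct model years recorded
n_rows <- nrow(mpg)
model_years <- sort(unique(mpg$year))
print(n_rows)
print(model_years)
```

The year check matters when you read the summary. If year holds only a few distinct values, its median and mean describe a midpoint rather than a typical year.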

# Display summary statistics for the mpg dataframe
summary(mpg)

##  manufacturer          model               displ            year     
##  Length:234         Length:234         Min.   :1.600   Min.   :1999  
##  Class :character   Class :character   1st Qu.:2.400   1st Qu.:1999  
##  Mode  :character   Mode  :character   Median :3.300   Median :2004  
##                                        Mean   :3.472   Mean   :2004  
##                                        3rd Qu.:4.600   3rd Qu.:2008  
##                                        Max.   :7.000   Max.   :2008  
##       cyl           trans               drv                 cty       
##  Min.   :4.000   Length:234         Length:234         Min.   : 9.00  
##  1st Qu.:4.000   Class :character   Class :character   1st Qu.:14.00  
##  Median :6.000   Mode  :character   Mode  :character   Median :17.00  
##  Mean   :5.889                                         Mean   :16.86  
##  3rd Qu.:8.000                                         3rd Qu.:19.00  
##  Max.   :8.000                                         Max.   :35.00  
##       hwy             fl               class          
##  Min.   :12.00   Length:234         Length:234        
##  1st Qu.:18.00   Class :character   Class :character  
##  Median :24.00   Mode  :character   Mode  :character  
##  Mean   :23.44                                        
##  3rd Qu.:27.00                                        
##  Max.   :44.00                                        

summary(mpg): For each numeric variable, this function reports the minimum, first quartile, median, mean, third quartile, and maximum. For character variables such as manufacturer, trans, and class, it reports only the length, class, and mode. It therefore does not show how often each category occurs.

The mpg data frame describes a range of car models. Its variables are manufacturer, model, engine displacement (displ), year, number of cylinders (cyl), transmission type (trans), drive type (drv), city miles per gallon (cty), highway miles per gallon (hwy), fuel type (fl), and vehicle class.

The summary shows 234 rows. Engine displacements range from 1.6 to 7.0 liters, and every vehicle has between 4 and 8 cylinders (median 6).

The year variable takes only two values, 1999 and 2008. The reported median and mean of 2004 are therefore the rounded midpoint of these two model years, not a typical year in the data.

Fuel efficiency varies widely. City mileage ranges from 9 to 35 miles per gallon (median 17), and highway mileage ranges from 12 to 44 (median 24). These statistics give an overview of the range and central tendency of each variable, a useful first step before plotting.

2. Summarizing gapminder DataFrame: Similarly, we display metadata for the gapminder data frame with the summary() function to understand its structure and the summary statistics of its variables.

# Load the library
library(gapminder)

## Warning: package ‘gapminder’ was built under R version 4.3.3

# Display summary statistics for the gapminder dataframe
summary(gapminder)

##         country        continent        year         lifeExp     
##  Afghanistan:  12   Africa  :624   Min.   :1952   Min.   :23.60  
##  Albania    :  12   Americas:300   1st Qu.:1966   1st Qu.:48.20  
##  Algeria    :  12   Asia    :396   Median :1980   Median :60.71  
##  Angola     :  12   Europe  :360   Mean   :1980   Mean   :59.47  
##  Argentina  :  12   Oceania : 24   3rd Qu.:1993   3rd Qu.:70.85  
##  Australia  :  12                  Max.   :2007   Max.   :82.60  
##  (Other)    :1632                                                
##       pop              gdpPercap       
##  Min.   :6.001e+04   Min.   :   241.2  
##  1st Qu.:2.794e+06   1st Qu.:  1202.1  
##  Median :7.024e+06   Median :  3531.8  
##  Mean   :2.960e+07   Mean   :  7215.3  
##  3rd Qu.:1.959e+07   3rd Qu.:  9325.5  
##  Max.   :1.319e+09   Max.   :113523.1  

summary(gapminder): country and continent are factors, so summary() counts the observations in each level. For the numeric variables, it reports the range, quartiles, median, and mean.

The summary gives a concise overview of key demographic and economic variables. Each country appears 12 times, and 1,632 rows fall under (Other), so the data cover 6 + 1,632 / 12 = 142 countries. These countries span five continents and are observed at five-year intervals from 1952 to 2007.

Life expectancy ranges from 23.6 to 82.6 years, with a median of 60.71 years. Population ranges from about 60,000 to 1.319 billion, reflecting the very different sizes of the countries.

GDP per capita (inflation-adjusted US dollars) ranges from 241.2 to 113,523.1, with a median of 3,531.8. Its mean of 7,215.3 is about twice the median, which shows a long right tail of very rich country-years. These figures describe the distribution and central tendency of the key indicators and prepare the ground for analyzing global trends over time.

3. Creating a ggplot Object: We assign a ggplot object to the variable p. This object serves as the foundation for visualizations based on the gapminder dataset. We map GDP per capita (gdpPercap) to the x-axis and life expectancy (lifeExp) to the y-axis. This lets us explore the relationship between a country's economic prosperity, as measured by GDP per capita, and the life expectancy of its population.
Storing the plot in p makes it easy to add further customization and graphical layers that illustrate trends and patterns in global demographics and economics. Besides the x and y aesthetics, the mapping also sends continent to point color and population (pop) to point size.

# Create a ggplot object 'p' with the gapminder dataset
p <- ggplot(data = gapminder,
            mapping = aes(x = gdpPercap, y = lifeExp,
                          color = continent, size = pop)) +
  # Add points to the plot with transparency set to 0.7
  geom_point(alpha = 0.7) +
  # Set x-axis scale to log10 and format labels as dollar values
  scale_x_log10(labels = scales::dollar_format()) +
  # Manually specify colors for the continents (gapminder has five, so the sixth color is unused)
  scale_color_manual(values = c("#F8766D", "#00BA38", "#619CFF",
                                "#FFC100", "#A3A3A3", "#E76BF3")) +
  # Set size range for points and specify breaks for the size legend
  scale_size(range = c(2, 12), breaks = c(1e+06, 5e+07, 1e+09)) +
  # Add axis and legend labels
  labs(x = "GDP per capita", y = "Life expectancy",
       size = "Population", color = "Continent") +
  # Add plot title
  ggtitle("Life expectancy and GDP per capita by continent")
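The plot puts GDP per capita on a log10 scale with scale_x_log10(). A short calculation from the data shows why this helps. The sketch below is an illustrative addition, not part of the original assignment, and the variable names are hypothetical. It computes three quantities from the gapminder data:

- the ratio between the largest and smallest gdpPercap values;
- the number of powers of ten that ratio spans;
- how far the mean sits above the median.

```r
# Illustrative calculation, not part of the original assignment
library(gapminder)

gdp <- gapminder$gdpPercap

# Ratio between the richest and the poorest country-year
gdp_ratio <- max(gdp) / min(gdp)

# Powers of ten spanned by the data: the width of the x-axis on a log10 scale
gdp_decades <- log10(gdp_ratio)

# A mean well above the median signals a long right tail
gdp_skew <- mean(gdp) / median(gdp)

print(round(c(ratio = gdp_ratio, decades = gdp_decades,
              mean_over_median = gdp_skew), 2))
```

The values run from 241.2 to 113,523.1, roughly two and a half orders of magnitude, and the mean is about twice the median. On a linear x-axis, most countries would therefore be crowded against the left edge. The log10 scale spreads them out evenly by relative differences.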
