Lab 2 Two Variables at the Same Time

School: University of California, Los Angeles
Course: 10
Subject: Statistics
Date: Feb 20, 2024

Lab 2 Worksheet and Assignment
Working on Two Variables at the Same Time

Individual Assignment: Lab 2 Report is due Thursday, February 1st, 2024 @ Noon

Objectives:
In this lab we will:
  • Learn how to report summary statistics for two variables at the same time.
  • Learn how to visualize and graph two variables at the same time.

Cases:
A. Categorical-Categorical Variables.
B. Quantitative-Categorical Variables.
C. Quantitative-Quantitative Variables.

Collaboration Policy
In lab you are encouraged to work in pairs or small groups to discuss the concepts on the assignments. However, DO NOT copy each other's work, as this constitutes cheating. The work you submit must be entirely your own. If you get stuck in lab, feel free to reach out to other groups or talk to your TA.

Life Expectancy Data: Kaggle Data Source.
In this section we are going to consider the LifeExp.csv data file posted on BruinLearn Week 3. The data has 1649 observations and 25 variables. The following list names the variables in the data:

 [1] "Country"
 [2] "Year"
 [3] "Status"
 [4] "Life expectancy"
 [5] "Adult Mortality"
 [6] "infant deaths"
 [7] "Alcohol"
 [8] "percentage expenditure"
 [9] "Hepatitis B"
[10] "Measles"
[11] "BMI"
[12] "under-five deaths"
[13] "Polio"
[14] "Total expenditure"
[15] "Diphtheria"
[16] "HIV/AIDS"
[17] "GDP"
[18] "Population"
[19] "thinness 1-19 years"
[20] "thinness 5-9 years"
[21] "Income composition of resources"
[22] "Schooling"
[23] "BMI.C"
[24] "Alcohol.C"
[25] "SmokerPCT"

Load library(readr) and use the function read_csv to read the data into your RStudio session; call the data frame Life. Run dim(Life) to confirm the size of the data.

Part A: Summarizing Two Categorical Variables

Displaying Frequencies and Proportions

Syntax: use the function
    table(dataname$variable1, dataname$variable2)
For the Life data:
    table(Life$variable1, Life$variable2)
Example:
    table(Life$Alcohol.C, Life$Status)

To display the proportions of a table for two categorical variables, use:
    prop.table(table(dataname$variable1, dataname$variable2))
Example:
    prop.table(table(Life$Alcohol.C, Life$Status), 1)  # Row Proportions
    prop.table(table(Life$Alcohol.C, Life$Status), 2)  # Column Proportions
    prop.table(table(Life$Alcohol.C, Life$Status))     # Cell Proportions

Visualization of Two Categorical Variables: Stacked Bar Chart Using ggplot2
    library(ggplot2)
    ggplot(Life, aes(Status, group = Alcohol.C, color = Alcohol.C, fill = Alcohol.C)) + geom_bar()
    ggplot(Life, aes(Status, group = Alcohol.C, color = Alcohol.C, fill = Alcohol.C)) + geom_bar(position = "fill")
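To see how the three prop.table() variants in Part A differ, here is a minimal sketch on a tiny made-up data frame. The data frame toy and its values are invented for illustration only and are not taken from LifeExp.csv.

```r
# Toy data, invented for illustration only; this is NOT the LifeExp.csv data.
toy <- data.frame(
  Status  = c("Developed", "Developed", "Developing",
              "Developing", "Developing", "Developed"),
  Alcohol = c("High", "Low", "Low", "Low", "High", "High")
)

# Two-way frequency table: rows are Alcohol levels, columns are Status levels.
counts <- table(toy$Alcohol, toy$Status)
print(counts)

# Row proportions: each row sums to 1.
row_props <- prop.table(counts, 1)
print(rowSums(row_props))

# Column proportions: each column sums to 1.
col_props <- prop.table(counts, 2)
print(colSums(col_props))

# Cell proportions: all cells together sum to 1.
cell_props <- prop.table(counts)
print(sum(cell_props))
```

The second argument of prop.table() picks the margin that is scaled to 1: 1 for rows, 2 for columns, and nothing for the whole table.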
Question 1:
A) Report the frequency tables for the categorical variables Status and BMI.C separately in your data.
B) Report the proportions tables for the categorical variables Status and BMI.C separately in your data.
C) Report the frequency table for the categorical variables Status and BMI.C simultaneously in your data.
D) Report the proportions table for the categorical variables Status and BMI.C simultaneously in your data.
E) Create a stacked bar chart of the two categorical variables Status and BMI.C (Status on the X-axis). Comment on your graph. Are the two variables associated?

Part B: Summarizing Categorical and Quantitative Variables Simultaneously

Displaying Statistical Summaries

Syntax: use the following functions:
    summary(Life$`Life expectancy`)
    table(Life$Status)

Reporting Statistical Summaries Per Group
    aggregate(`Life expectancy` ~ Status, data = Life, FUN = mean)
    aggregate(`Life expectancy` ~ Status, data = Life, FUN = var)
    aggregate(`Life expectancy` ~ Status, data = Life, FUN = sd)

Visualization of a Categorical Variable and a Quantitative Variable Simultaneously: Histograms and Side-By-Side Boxplots Using ggplot2
Example:
    ggplot(Life, aes(`Life expectancy`, color = Status, fill = Status)) + geom_histogram(alpha = 0.3)
    ggplot(Life, aes(`Life expectancy`, color = Status, fill = Status)) + geom_boxplot(alpha = 0.3)

Question 2:
Consider the following two variables: Adult Mortality and Status.
A) Report the summary statistics of Adult Mortality per group using the variable Status in your data.
B) Compare the variance of Adult Mortality per group using the variable Status in your data.
C) Create histograms of the variable Adult Mortality per group using the variable Status. Comment on your plot.
D) Create side-by-side boxplots of the variable Adult Mortality per group using the variable Status. Comment on your plot. Are these variables associated with each other?
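The per-group summaries in Part B can be tried out on a small example before running them on the full data. The sketch below uses a made-up data frame (toy, with invented values, not the LifeExp.csv data) so that the group means, variances, and standard deviations are easy to check by hand.

```r
# Toy data, invented for illustration only; this is NOT the LifeExp.csv data.
# check.names = FALSE keeps the space in the column name "Life expectancy".
toy <- data.frame(
  Status = c("Developed", "Developed", "Developed",
             "Developing", "Developing", "Developing"),
  `Life expectancy` = c(80, 82, 84, 60, 65, 70),
  check.names = FALSE
)

# One row per Status group; backticks are needed because of the space in the name.
group_means <- aggregate(`Life expectancy` ~ Status, data = toy, FUN = mean)
group_vars  <- aggregate(`Life expectancy` ~ Status, data = toy, FUN = var)
group_sds   <- aggregate(`Life expectancy` ~ Status, data = toy, FUN = sd)

print(group_means)
print(group_vars)
print(group_sds)
```

In the formula, the quantitative variable goes on the left of ~ and the grouping variable on the right; the standard deviation in each group is the square root of that group's variance.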
Part C: Visualization of Two Quantitative Variables

Displaying Graphs: Scatter Plots

Syntax: use the following function:
    ggplot(data_name, aes(x = independent_variable, y = dependent_variable)) + geom_point()

Correlation Coefficient r:
    cor(variable1, variable2)

Example: Plotting `Life expectancy` vs. `Income composition of resources`
    ggplot(Life, aes(x = `Life expectancy`, y = `Income composition of resources`)) + geom_point()
    cor(Life$`Life expectancy`, Life$`Income composition of resources`)

Question 3:
A) Consider the two variables: x = `Life expectancy` and y = Schooling.
B) Report summary statistics for each variable.
C) Create a scatter plot for the two variables. Comment on your plot. Do you notice any pattern?
D) Calculate the correlation coefficient between the two variables.
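As a quick check on what cor() reports in Part C, the sketch below computes the correlation coefficient r two ways on a small made-up pair of vectors: once with cor() and once from the definition of Pearson's r. The vectors x and y are invented for illustration and are not columns of the Life data.

```r
# Toy vectors, invented for illustration only; NOT columns of the Life data.
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

# Built-in Pearson correlation coefficient.
r <- cor(x, y)

# The same quantity from its definition:
# sum of cross-deviations divided by the square root of the
# product of the two sums of squared deviations.
dx <- x - mean(x)
dy <- y - mean(y)
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

print(r)
print(r_manual)
```

Both values agree, and r always lies between -1 and 1; values near 1 or -1 indicate a strong linear pattern in the scatter plot, while values near 0 indicate little linear association.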