Lab 1 Summaries and Visualizations


School: University of California, Los Angeles
Course: Statistics 10
Date: Feb 20, 2024

Lab 1 Worksheet and Assignment
Working on Summaries and Visualization

Individual Assignment: the Lab 1 report is due Wednesday, January 24th, 2024 @ 11:59 AM.

Objectives: In this lab we will:
1. Learn how to report summary statistics for Qualitative/Categorical and Quantitative/Numerical variables.
2. Learn how to visualize and graph Qualitative/Categorical and Quantitative/Numerical variables.

Collaboration Policy
In lab you are encouraged to work in pairs or small groups to discuss the concepts on the assignments. However, DO NOT copy each other's work, as this constitutes cheating. The work you submit must be entirely your own. If you have a question in lab, feel free to reach out to other groups, or talk to your TA if you get stuck.

CDC Data: Centers for Disease Control and Prevention
In this section we consider the CDC data posted on BruinLearn (Week 2). The data has 20,000 observations and 11 variables. The following table summarizes the variables in the data:

Variable name   Description                                                  Type
"state"         States numbered from 1 to 56                                 Categorical
"genhlth"       General health                                               Categorical
"physhlth"      Physical health: a score from 0 to 30                        Quantitative
"exerany"       Exercise: 0 means NO, 1 means YES                            Categorical
"hlthplan"      Health plan: 0 means NO, 1 means YES                         Categorical
"smoke100"      Smoked more than 100 cigarettes: 0 means NO, 1 means YES     Categorical
"height"        Height in inches                                             Quantitative
"weight"        Weight in lbs.                                               Quantitative
"wtdesire"      Desired weight                                               Quantitative
"age"           Age in years                                                 Quantitative
"gender"        Gender: M means male, F means female                         Categorical

Use the readr package and its function read_csv to load the data into your RStudio session (the path must point to wherever cdc.csv is saved):

library(readr)
cdc <- read_csv("Statistics Department Files/Shared Documents 2011/STAT 10/New Labs/cdc.csv")

Then run:
dim(cdc) to confirm the size of the data.
names(cdc) to display the names of the variables in the data.

Part A: Summarizing Categorical Variables

Displaying frequencies
Syntax: table(dataname$name_of_the_variable)
Example: table(cdc$state)

Displaying the proportions table for a categorical variable
Syntax: prop.table(table(dataname$name_of_the_variable))
Example: prop.table(table(cdc$state))

Question 1:
A) Report the frequency tables for each of the categorical variables in your data.
B) Report the proportions table for each of the categorical variables in your data.
C) Does the data report a higher percentage of people who exercise than of people who do not?
D) Is this data balanced based on gender? Balanced means a 50-50 split between M and F.
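A minimal base-R sketch of the Part A workflow. The data frame toy and its values are hypothetical stand-ins made up for illustration (not the CDC data); with the real data you would replace toy by cdc and list the categorical variable names from the table above.

```r
# Hypothetical toy data standing in for cdc (made-up values, NOT the CDC data)
toy <- data.frame(
  exerany = c(1, 0, 1, 1, 0, 1),
  gender  = c("M", "F", "F", "M", "F", "F")
)

# Frequency table: how many rows fall in each category
freq_exer <- table(toy$exerany)
print(freq_exer)

# Proportions table: each count divided by the total number of rows
prop_exer <- prop.table(freq_exer)
print(prop_exer)

# Percentages rounded to one decimal place are easier to read
# when judging "higher percentage" or "50-50" questions
pct_gender <- round(100 * prop.table(table(toy$gender)), 1)
print(pct_gender)

# Repeat both tables for several categorical variables at once
cat_vars <- c("exerany", "gender")
for (v in cat_vars) {
  cat("\nVariable:", v, "\n")
  print(table(toy[[v]]))
  print(prop.table(table(toy[[v]])))
}
```

The loop uses toy[[v]] rather than toy$v because $ does not accept a variable holding a column name; [[ ]] looks the column up by the string stored in v.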
Part B: Summarizing Quantitative Variables

Displaying statistical summaries
Syntax: use the following functions:
1. min(dataname$name_of_the_variable) for the minimum
2. max(dataname$name_of_the_variable) for the maximum
3. mean(dataname$name_of_the_variable) for the mean (average)
4. median(dataname$name_of_the_variable) for the median
5. sd(dataname$name_of_the_variable) for the standard deviation
6. var(dataname$name_of_the_variable) for the variance
7. summary(dataname$name_of_the_variable) for the summary statistics
Example: summary(cdc$height)

Question 2:
A) Report the summary statistics for each of the quantitative variables in your data.
B) Compare the mean vs. median for each of the quantitative variables in your data.
C) Calculate the range of each of the quantitative variables in your data: Range = Max - Min.
D) Compare the summary statistics of weight vs. desired weight. Any comment?
E) What can you tell about the summary statistics of the variable age?

Part C: Visualization of Categorical and Quantitative Variables

Displaying graphs
Syntax: use the following functions:
1. barplot(table(cdc$name_of_the_variable))   # Bar chart of a categorical variable
2. pie(table(cdc$name_of_the_variable))       # Pie chart of a categorical variable
3. hist(cdc$name_of_the_variable)             # Histogram of a quantitative variable
4. boxplot(cdc$name_of_the_variable)          # Boxplot of a quantitative variable
5. dotPlot(cdc$name_of_the_variable)          # Dot plot of a quantitative variable (requires the mosaic package: install.packages("mosaic"), then library(mosaic))

Examples:
barplot(table(cdc$gender))
pie(table(cdc$gender))
hist(cdc$weight)
boxplot(cdc$weight, horizontal = TRUE)
library(mosaic)
dotPlot(cdc$weight)

Question 3:
A) Report a pie chart for the variable smoke100. Comment on your plot.
B) Report a bar chart for the variable genhlth (general health). Comment on your plot.
C) Create a histogram for the variable age. Comment on your plot.
D) Create a boxplot for the variable height. Comment on your plot.
E) Create a dot plot for the variable wtdesire. Comment on your plot.
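For Question 2 (Part B), a minimal base-R sketch of the range and mean-vs-median calculations. The vectors weight and wtdesire below are hypothetical made-up values, not columns of the CDC data; with the real data you would use cdc$weight and cdc$wtdesire instead. One detail worth knowing: R's range() returns the minimum and maximum as two numbers, so the single-number Range = Max - Min needs max() - min() or diff(range()).

```r
# Hypothetical values standing in for cdc$weight and cdc$wtdesire (NOT CDC data)
weight   <- c(150, 180, 135, 210, 165)
wtdesire <- c(140, 170, 135, 180, 150)

# Min, 1st quartile, median, mean, 3rd quartile, max
print(summary(weight))

# Range as a single number: Max - Min
weight_range <- max(weight) - min(weight)
print(weight_range)
print(diff(range(weight)))   # same value; range() alone returns c(min, max)

# Mean vs median: a mean noticeably above the median can hint at right skew
print(c(mean = mean(weight), median = median(weight)))

# The variance is the square of the standard deviation
print(all.equal(var(weight), sd(weight)^2))

# Weight vs desired weight, compared person by person
gap <- weight - wtdesire
print(summary(gap))
```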
Extra Credit:
Install the ggplot2 package (install.packages("ggplot2")) and use the function ggplot to create plots of the variables.
Syntax:
library(ggplot2)
ggplot(data, aes(x = name_of_the_variable)) + geom_bar()
ggplot(data, aes(x = name_of_the_variable)) + geom_histogram()
ggplot(data, aes(x = name_of_the_variable)) + geom_boxplot()
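A minimal sketch of the three extra-credit plots, assuming ggplot2 is installed and is version 3.3.0 or later (where a boxplot can take a single y aesthetic). The data frame toy is a hypothetical stand-in with made-up values; with the real data you would pass cdc and its variable names.

```r
library(ggplot2)

# Hypothetical toy data standing in for cdc (made-up values, NOT the CDC data)
toy <- data.frame(
  genhlth = c("good", "very good", "excellent", "good", "fair", "good"),
  weight  = c(150, 180, 135, 210, 165, 172)
)

# Bar chart of a categorical variable: geom_bar() counts rows per category
p_bar <- ggplot(toy, aes(x = genhlth)) + geom_bar()

# Histogram of a quantitative variable; binwidth = 20 is an illustrative choice
p_hist <- ggplot(toy, aes(x = weight)) + geom_histogram(binwidth = 20)

# Vertical boxplot of a quantitative variable
p_box <- ggplot(toy, aes(y = weight)) + geom_boxplot()

print(p_bar)
print(p_hist)
print(p_box)
```

Unlike barplot(), which needs table() first, geom_bar() does the counting itself, so the categorical variable goes straight into aes().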