Week 3 Assignment: Drills with R
Week 3 Assignment
Manisha Reddy Yerla
Masters in Data Science, University of Cumberland’s 2023 Fall - Statistics for Data Science (MSDS-531-B02) - Second Bi-term
Dr. Mina Richards
November 10, 2023
Drills with R on point estimates and confidence intervals

Question 1: A study investigates the distribution of annual income for heads of households living in public housing in Chicago. For a random sample of size 30, the annual incomes (in thousands of dollars) are in the Chicago data file.

a. Based on a descriptive graphic, describe the shape of the sample data distribution. Find and interpret point estimates of the population mean and standard deviation.

The R code below analyzes a dataset of annual incomes for heads of households living in public housing in Chicago. It first loads the dataset from the specified URL, then creates a histogram of household income and calculates the mean and median income. The histogram shows a roughly symmetric, bell-shaped distribution, consistent with an approximately normal population. The mean and median are also nearly equal, which supports symmetry, although by itself that does not establish normality. Finally, the code calculates point estimates of the population mean and standard deviation. The point estimate of the population mean household income is the sample mean, approximately $20,333; it estimates the average income of all heads of households in Chicago public housing. The point estimate of the population standard deviation is the sample standard deviation, approximately $3,681.11; it indicates how much individual household incomes typically vary around the mean.
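A minimal R sketch of the Question 1 calculations follows: the part a point estimates and, looking ahead to part b, the t-based 95% interval built from the standard error, n - 1 degrees of freedom, and the t critical value. The data URL is not reproduced in this report, so the helper `summarize_income` (a hypothetical name) takes any numeric vector, and the vector `x` below is made-up illustrative data, not the Chicago file.

```r
# Hypothetical helper: point estimates (part a) and a manual t interval (part b)
# for a numeric vector of incomes in thousands of dollars.
summarize_income <- function(income, conf_level = 0.95) {
  n <- length(income)
  xbar <- mean(income)                # point estimate of the population mean
  s <- sd(income)                     # point estimate of the population SD (n - 1 divisor)
  se <- s / sqrt(n)                   # estimated standard error of the mean
  deg_free <- n - 1                   # degrees of freedom for the t distribution
  t_crit <- qt(1 - (1 - conf_level) / 2, deg_free)  # about 2.045 when n = 30
  moe <- t_crit * se                  # margin of error
  list(n = n, mean = xbar, median = median(income), sd = s, se = se,
       deg_free = deg_free, t_crit = t_crit,
       ci = c(lower = xbar - moe, upper = xbar + moe))
}

# Illustrative values only; these are NOT the Chicago data file.
x <- c(18.2, 21.5, 19.9, 23.1, 20.4, 17.8, 22.0, 20.9)
res <- summarize_income(x)

# Shape check: histogram (drawn only in an interactive session)
if (interactive()) hist(x, main = "Annual income", xlab = "Thousands of dollars")

cat("mean =", res$mean, " median =", res$median, " sd =", res$sd, "\n")
print(res$ci)
# Cross-check with R's built-in one-sample t interval
print(t.test(x, conf.level = 0.95)$conf.int)
```

Computing the interval by hand mirrors the steps described for part b; the `t.test` line is only a consistency check, since both should give the same bounds.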
b. Construct a 95% confidence interval for μ, using R software.

The R code below analyzes the same dataset of annual incomes for heads of households living in public housing in Chicago. It starts by loading the dataset from the specified URL and then calculates the sample mean, the sample standard deviation, and the standard error (s/√n). Next, it determines the degrees of freedom for the t distribution (n - 1 = 29) and finds the t critical value. Finally, it calculates the margin of error (t critical value × standard error) and constructs the 95% confidence interval for the population mean μ as the sample mean ± the margin of error. The interval gives a range of plausible values for μ: if this procedure were repeated over many random samples, about 95% of the resulting intervals would contain the true population mean.

Question 2: The Anorexia data file contains results for the cognitive behavioral and family therapies and the control group. Using data for the 17 girls who received the family therapy:

a. Conduct a descriptive statistical analysis using graphs and numerical summaries.

The R code below analyzes a dataset containing results for the cognitive behavioral therapy group, the family therapy group, and the control group. It first loads the dataset from the specified URL and then creates a subset: a new data frame called family_therapy_data that contains only the girls who received family therapy. Next, it performs the descriptive analysis, calculating and printing summary statistics, along with the mean and standard deviation, for the "before" and "after" columns. These summaries describe each distribution through the minimum, first quartile, median, mean, third quartile, and maximum, covering common measures of central tendency and variability. Finally, it generates box plots of the "before" and "after" columns to visualize the distribution of weights for the girls who received family therapy, before and after treatment.
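Because the course data URL is not reproduced here, the sketch below substitutes the `anorexia` data frame from the MASS package (a recommended package included with standard R installations). It has the same three-group design, with treatment labels `CBT`, `Cont`, and `FT` and weight columns `Prewt` and `Postwt` (in pounds). Mapping these onto the report's `therapy`, `before`, and `after` columns is an assumption; rename accordingly if you load the course file instead.

```r
library(MASS)  # ASSUMPTION: MASS::anorexia stands in for the course Anorexia file

# Keep only the girls who received family therapy (Treat == "FT")
family_therapy_data <- subset(anorexia, Treat == "FT")
weights <- family_therapy_data[, c("Prewt", "Postwt")]  # "before" and "after"

# Five-number summaries plus the mean for each column
print(summary(weights))

# Mean and standard deviation of weight before and after treatment
fam_means <- sapply(weights, mean)
fam_sds   <- sapply(weights, sd)
print(rbind(mean = fam_means, sd = fam_sds))

# Weight change for each girl (after minus before)
family_change <- family_therapy_data$Postwt - family_therapy_data$Prewt
print(summary(family_change))

# Side-by-side box plots (drawn only in an interactive session)
if (interactive()) {
  boxplot(weights$Prewt, weights$Postwt,
          names = c("Before", "After"), ylab = "Weight (lb)",
          main = "Family therapy group")
}
```

Looking at the change scores alongside the before and after summaries is a design choice: it anticipates part b, which compares mean weight changes rather than raw weights.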
b. Construct a 95% confidence interval for the difference between the population mean weight changes for the family therapy and the control.

The R code below calculates a 95% confidence interval for the difference between the population mean weight changes of two groups in the dataset: the family therapy group (therapy = 'f') and the control group (therapy = 'c'). It first loads the dataset from the specified URL and then uses the subset function to split the data on the 'therapy' column, creating the family_data and control_data data frames. For each group, it calculates the weight change (final weight - initial weight), storing the sample means in mean_family and mean_control and the sample standard deviations in sd_family and sd_control.

To compare the two group means, the code runs a two-sample t-test with the t.test function, passing the weight changes for the family therapy and control groups as arguments. Setting var.equal = FALSE means the code does not assume equal population variances (Welch's t-test), which is often a reasonable choice. The result is stored in the t_test variable. The code then extracts the 95% confidence interval for the difference in means from t_test$conf.int, which holds the lower and upper bounds of the interval, and prints it using the cat function.
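The steps above can be sketched in R as follows. As in part a, this assumes the MASS package's `anorexia` data frame in place of the course file, with `Treat == "FT"` for family therapy and `Treat == "Cont"` for the control group. Alongside `t.test`, the sketch rebuilds the Welch interval by hand with the Welch-Satterthwaite degrees of freedom, to show what `var.equal = FALSE` computes.

```r
library(MASS)  # ASSUMPTION: MASS::anorexia stands in for the course Anorexia file

family_data  <- subset(anorexia, Treat == "FT")    # family therapy
control_data <- subset(anorexia, Treat == "Cont")  # control

# Weight change = final weight - initial weight
change_family  <- family_data$Postwt  - family_data$Prewt
change_control <- control_data$Postwt - control_data$Prewt

mean_family  <- mean(change_family)
sd_family    <- sd(change_family)
mean_control <- mean(change_control)
sd_control   <- sd(change_control)

# Welch two-sample t-test (var.equal = FALSE is also t.test's default)
t_test <- t.test(change_family, change_control, var.equal = FALSE,
                 conf.level = 0.95)

# The same interval by hand, using Welch-Satterthwaite degrees of freedom
n_f <- length(change_family)
n_c <- length(change_control)
v_f <- sd_family^2 / n_f
v_c <- sd_control^2 / n_c
se_diff  <- sqrt(v_f + v_c)
df_welch <- (v_f + v_c)^2 / (v_f^2 / (n_f - 1) + v_c^2 / (n_c - 1))
moe <- qt(0.975, df_welch) * se_diff
manual_ci <- (mean_family - mean_control) + c(-1, 1) * moe

cat("95% CI (t.test):", round(t_test$conf.int, 2), "\n")
cat("95% CI (manual):", round(manual_ci, 2), "\n")
```

The two printed intervals should agree; the manual version makes explicit that the Welch degrees of freedom are generally not an integer and need not equal n_f + n_c - 2.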