Highland_Week5 Discussion_Statistical Inference

DISCUSSION

Part 1. The (Student's) t distribution converges to the normal distribution as the degrees of freedom increase (beyond 120). Please plot a normal distribution and a few t distributions on the same chart with 2, 5, 15, 30, and 120 degrees of freedom.

> library(ggplot2)
Warning message:
package ‘ggplot2’ was built under R version 4.3.2
> library(gridExtra)
Warning message:
package ‘gridExtra’ was built under R version 4.3.2
> # Define a sequence of x values for plotting
> x <- seq(-4, 4, length.out = 100)
> # Calculate densities for the normal distribution
> normal_density <- dnorm(x)
> # Calculate densities for t-distributions with various degrees of freedom
> t_density_2 <- dt(x, df = 2)
> t_density_5 <- dt(x, df = 5)
> t_density_15 <- dt(x, df = 15)
> t_density_30 <- dt(x, df = 30)
> t_density_120 <- dt(x, df = 120)
> # Combine the data into a data frame for plotting
> # (check.names = FALSE keeps the "t, df=..." column names intact for the legend)
> plot_data <- data.frame(x, Normal = normal_density,
+                         `t, df=2` = t_density_2,
+                         `t, df=5` = t_density_5,
+                         `t, df=15` = t_density_15,
+                         `t, df=30` = t_density_30,
+                         `t, df=120` = t_density_120,
+                         check.names = FALSE)
> # Melt the data frame into long format for use with ggplot
> library(reshape2)
Warning message:
package ‘reshape2’ was built under R version 4.3.2
> plot_data_melted <- melt(plot_data, id.vars = 'x')
> # Plot
> ggplot(plot_data_melted, aes(x = x, y = value, color = variable)) +
+   geom_line() +
+   labs(title = "Normal and t Distributions",
+        x = "Value",
+        y = "Density") +
+   scale_color_brewer(palette = "Dark2") +
+   theme_minimal()
> setwd("C:/Users/SharonHighland/OneDrive - Barak Asset Management LLC/Data Analytics _Spring 2024")

This plot visually demonstrates how the t distribution approaches the normal distribution as the degrees of freedom increase, with the t distribution at 120 degrees of freedom closely resembling the normal distribution.
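A quick numerical companion to the plot (my addition, not part of the submitted code): comparing the two-sided 95% critical value of each t distribution with the standard normal value of about 1.96 shows the gap shrinking as the degrees of freedom grow.

# Supplementary check: distance between t and normal critical values by df
dfs <- c(2, 5, 15, 30, 120)
t_crit <- qt(0.975, df = dfs)   # 97.5th percentile of each t distribution
z_crit <- qnorm(0.975)          # 97.5th percentile of the standard normal (about 1.96)
round(data.frame(df = dfs,
                 t_critical = t_crit,
                 normal_critical = z_crit,
                 difference = t_crit - z_crit), 4)

The difference column shrinks toward zero as df increases, echoing numerically what the density plot shows visually.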
Part 2. Let's work with normal data below (1,000 observations with a mean of 108 and a standard deviation of 7.2).

set.seed(123)  # Set seed for reproducibility
mu <- 108
sigma <- 7.2
data_values <- rnorm(n = 1000, mean = mu, sd = sigma)

Plot two charts - the normally distributed data (above) and the Z-score distribution of the same data. Do they have the same distributional shape? Why or why not?

> # Set seed for reproducibility
> set.seed(123)
> # Parameters
> mu <- 108
> sigma <- 7.2
> # Generate normally distributed data
> data_values <- rnorm(n = 1000, mean = mu, sd = sigma)
> # Calculate Z scores for the data
> z_scores <- (data_values - mu) / sigma
> # Load the plotting library
> library(ggplot2)
> # Plot for the normally distributed data
> p1 <- ggplot() +
+   geom_histogram(aes(x = data_values), bins = 30, fill = "blue", alpha = 0.7) +
+   labs(title = "Normally Distributed Data", x = "Values", y = "Frequency") +
+   theme_minimal()
> # Plot for the Z scores
> p2 <- ggplot() +
+   geom_histogram(aes(x = z_scores), bins = 30, fill = "red", alpha = 0.7) +
+   labs(title = "Z Score Distribution", x = "Z Scores", y = "Frequency") +
+   theme_minimal()
> # Display the plots side by side
> library(gridExtra)
> grid.arrange(p1, p2, ncol = 2)

Both the original data and its Z-score distribution have the same distributional shape (normal), but they are centered at different means (108 for the original data and 0 for the Z scores) and have different scales (a standard deviation of 7.2 for the original data and 1 for the Z scores). This demonstrates an important property of normal distributions: transforming them into Z scores does not change their shape, only their location and scale.
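As a small sanity check I have added here (it assumes the data_values, mu, sigma, and z_scores objects created above), the summary statistics confirm that standardizing only relocates and rescales the data, and base R's scale() reproduces the manual Z scores when the known parameters are supplied.

# Sanity check (assumes data_values, mu, sigma, z_scores from the code above)
round(c(mean_original = mean(data_values), sd_original = sd(data_values)), 3)
round(c(mean_z = mean(z_scores), sd_z = sd(z_scores)), 3)  # roughly 0 and 1

# scale() with the known center and spread matches the manual calculation exactly
all.equal(as.numeric(scale(data_values, center = mu, scale = sigma)), z_scores)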
Part 3. In your own words, please explain: what is a p-value?

The p-value is a tool for making decisions about hypotheses. It helps researchers determine whether their results fall within the range of variation expected under the null hypothesis or whether they are statistically significant enough to warrant further consideration of the alternative hypothesis. For example, imagine you're a detective investigating a case, and your null hypothesis is that the suspect is innocent. Finding evidence that would be highly unusual or unexpected if the suspect were truly innocent (like the suspect's fingerprints at the crime scene) would lead you to doubt that innocence. In statistical terms, this unexpected evidence corresponds to a lower p-value. The lower the p-value, the more surprising the evidence is under the assumption of innocence (the null hypothesis), and thus the stronger your reason to reject the null hypothesis in favor of the alternative hypothesis (the suspect is not innocent).
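To connect the analogy back to the R work above, here is a short illustration I have added (not part of the original discussion) that reuses the data_values simulated in Part 2. Testing a null hypothesis that is clearly false for these data produces a tiny p-value, while testing the true mean of 108 produces a large one.

# Illustration: one-sample t-tests on the Part 2 data (true mean = 108)

# H0: mu = 110 is far from the truth, so the observed mean sits many standard
# errors away from 110 and the p-value is essentially zero -- reject H0.
t.test(data_values, mu = 110)$p.value

# H0: mu = 108 is the true mean, so the data look ordinary under H0 and the
# p-value is large -- no reason to reject H0.
t.test(data_values, mu = 108)$p.value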