Module 5 Assignment – Non Parametric Methods and Sampling
Intermediate Analytics
Ayeshabi W Tigdikar
Master of Science in Project Management, Northeastern University
Professor Richard He
October 24th, 2023
Table of Contents
INTRODUCTION
ANALYSIS
CONCLUSION
REFERENCES
APPENDIX
INTRODUCTION
In this series of statistical tests and simulations, we examined different approaches to analyzing data, investigating claims, and making decisions. The tests used both parametric and non-parametric methods and covered a variety of scenarios, including correlation analysis and hypothesis testing. These analyses are useful to researchers, analysts, and decision-makers who want to draw meaningful information from data and make decisions that affect the real world.
For each test, we first established the context by stating the claim, the hypotheses, and the significance level (α). We then carried out the test: we calculated the test statistic and p-value, compared the test statistic with the critical value, and made a decision. The outcomes show how the variables are related, whether differences or correlations exist, and whether each claim is supported. These assessments let us make evidence-based judgments, whether we are examining baseball league performance, comparing mathematics literacy across regions, or estimating the number of purchases needed to win a lottery.
ANALYSIS
1. Game Attendance
a. State the hypothesis and identify the claim.
The null hypothesis (H0) is that the median attendance is 3000.
The alternative hypothesis (H1) is that the median attendance is not 3000.
H0: median = 3000
H1: median ≠ 3000 (two-tailed test)
Claim: Median attendance is 3000.
b. Find the critical value.
According to the table, the critical value for n = 20 and α = 0.05 is 5.
Comparing each value with 3000 gives 10 positive signs and 10 negative signs, with no zeros.
c. Compute the test value.
When n ≤ 25, the test value is the smaller of the number of positive and negative signs. Here the test value is 10.
d. Make the decision.
Since the test value (10) exceeds the critical value (5), the null hypothesis cannot be rejected.
e. Summarize
The binomial test gave a p-value of 1, which is larger than α = 0.05, so we fail to reject the null hypothesis. There is insufficient evidence to conclude that the median attendance differs from 3000.
2. Lottery Ticket Sales
a. State the hypothesis and identify the claim.
The null hypothesis (H0) is that the median sale is greater than or equal to 200.
The alternative hypothesis (H1) is that the median sale is less than 200.
H0: median ≥ 200
H1: median < 200 (one-tailed test)
Claim: Median sales are below 200.
b. Find the critical value.
The critical value for a left-tailed test at α = 0.05 is -1.65, as stated in Allan G. Bluman's statistics book.
c. Compute the test value.
Because the number of trials is greater than 25, the test value is computed with the normal approximation:
z = [(x + 0.5) - 0.5n] / (sqrt(n)/2)
= ((15 + 0.5) - 0.5(40)) / (sqrt(40)/2)
= -1.42
d. Make the decision.
The p-value is 0.9597, which is greater than α.
The test value (-1.42) is also greater than the critical value (-1.65), so it does not fall in the rejection region.
Hence, we fail to reject the null hypothesis.
e. Summarize
In conclusion, we fail to reject the null hypothesis; there is not enough evidence to support the claim that the median is below 200 tickets.
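The normal-approximation step in part c can be reproduced with a few lines of R. This is a minimal sketch that assumes the values quoted in the analysis (n = 40 and x = 15); the variable names are illustrative only. Note that `qnorm` returns the unrounded critical value, about -1.645, where the table gives -1.65.

```r
# Sign test via the normal approximation (used when n > 25).
n <- 40        # number of sampled days, as used in the analysis
x <- 15        # sign count used as the test value in the analysis
alpha <- 0.05  # significance level

# z = ((x + 0.5) - 0.5 * n) / (sqrt(n) / 2)
z <- ((x + 0.5) - 0.5 * n) / (sqrt(n) / 2)

# Left-tailed critical value from the standard normal distribution
z_crit <- qnorm(alpha)

# Reject H0 only if z falls below the left-tail critical value
reject <- z < z_crit

round(z, 2)
reject
```

Under these assumptions z is larger than the critical value, which matches the decision not to reject the null hypothesis.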
3. Lengths of Prison Sentences
The Wilcoxon rank sum test, a nonparametric test that uses ranks to determine whether two independent samples were drawn from populations with identical distributions, is used for this task.
a. State the hypothesis and identify the claim.
Null Hypothesis: There is no difference in the sentence received by each gender.
Alternative Hypothesis: There is a difference in the sentence received by each gender.
Claim: There is no difference in the sentence received by each gender.
b. Find the critical value.
The critical value is determined using the cumulative standard normal distribution table. Because the test is two-tailed and α = 0.05, the critical values are +1.96 and -1.96.
c. Compute the test value.
Order the data in ascending order and rank them (M = male, F = female; tied values share the average rank).
Value: 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 12, 13, 14, 15, 16, 17, 19, 21, 22, 23, 24, 26, 26, 27, 30, 32
Group: F, F, F, F, M, F, M, F, F, F, M, M, M, M, F, F, M, F, M, F, M, M, F, M, F, M
Rank: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.5, 10.5, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22.5, 22.5, 24, 25, 26
R (sum of ranks for the smaller sample, males) = 191
μR = 12(12 + 14 + 1)/2 = 162
σR = sqrt((12)(14)(27)/12) = 19.44
z = (191 - 162)/19.44 = 1.49
Since z lies between the critical values -1.96 and +1.96, we fail to reject the null hypothesis.
d. Make the decision.
A two-sided test gives a p-value of 0.1425, which is higher than the 0.05 significance level. This implies that we fail to reject the null hypothesis.
e. Summarize
In conclusion, there is insufficient evidence to reject the claim that there is no difference in the sentences received by each gender.
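The hand calculation of the rank sum and z value can be checked in R. The sketch below assumes the male and female sentence data listed in the Appendix and uses `rank()`, which assigns average ranks to ties by default; the variable names are illustrative.

```r
# Wilcoxon rank sum test statistic computed step by step.
male   <- c(8, 12, 6, 14, 22, 27, 32, 24, 26, 19, 15, 13)
female <- c(7, 5, 2, 3, 21, 26, 30, 9, 4, 17, 23, 12, 11, 16)

n1 <- length(male)    # smaller sample
n2 <- length(female)  # larger sample

# Rank the pooled data; tied values receive the average rank
ranks <- rank(c(male, female))

# Sum of the ranks belonging to the smaller sample (males come first)
R <- sum(ranks[seq_len(n1)])

# Mean and standard deviation of R under the null hypothesis
mu_R    <- n1 * (n1 + n2 + 1) / 2
sigma_R <- sqrt(n1 * n2 * (n1 + n2 + 1) / 12)

z <- (R - mu_R) / sigma_R

c(R = R, mu_R = mu_R, sigma_R = round(sigma_R, 2), z = round(z, 2))
```

The p-value reported by `wilcox.test` with `exact = FALSE` differs slightly from the one implied by this z value because that function applies a continuity correction and a tie adjustment.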
4. Winning Baseball Games
a. State the hypothesis and identify the claim.
Null Hypothesis (H0): There is no difference in the number of wins between the NL and AL Eastern Divisions.
Alternative Hypothesis (H1): There is a difference in the number of wins between the NL and AL Eastern Divisions.
Claim: From 1970 to 1993, the National League (NL) and American League (AL) Eastern Divisions had different numbers of victories.
b. Find the critical value.
The critical value is determined using the cumulative standard normal distribution table. Because the test is two-tailed and α = 0.05, the critical values are +1.96 and -1.96.
c. Compute the test value.
Order the data in ascending order and compute the z value as in the previous task.
The z value is -0.431, which lies between -1.96 and +1.96, so we cannot reject the null hypothesis.
d. Make the decision.
The Wilcoxon rank sum test (wilcox.test in R) gives a p-value of 0.7357, which is greater than α = 0.05, so we fail to reject the null hypothesis.
e. Summarize
Both the comparison of the test value with the critical values and the Wilcoxon rank sum test lead us to fail to reject the null hypothesis. There is not enough evidence to support the claim that there is a difference in the number of wins.
5. Section 13-4
In each part, the null hypothesis is rejected when the test value ws is less than or equal to the critical value (CV) from Table K.
a. ws = 13, n = 15, α = 0.01, two-tailed
According to Table K, the critical value is 16.
Since ws = 13 ≤ 16, the null hypothesis should be rejected.
b. ws = 32, n = 28, α = 0.025, one-tailed
According to Table K, the critical value is 117.
Since ws = 32 ≤ 117, the null hypothesis should be rejected.
c. ws = 65, n = 20, α = 0.05, one-tailed
According to Table K, the critical value is 60.
Since ws = 65 > 60, the null hypothesis should not be rejected.
d. ws = 22, n = 14, α = 0.10, two-tailed
According to Table K, the critical value is 26.
Since ws = 22 ≤ 26, the null hypothesis should be rejected.
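The four decisions above all follow one rule: reject the null hypothesis when ws is less than or equal to the Table K critical value. A minimal R sketch of that rule, using only the ws values and critical values quoted in parts a to d (the data frame and column names are illustrative), might look like this:

```r
# Decision rule for the Section 13-4 exercises:
# reject H0 when the test value ws is <= the Table K critical value.
cases <- data.frame(
  part = c("a", "b", "c", "d"),
  ws   = c(13, 32, 65, 22),    # test values given in the exercise
  cv   = c(16, 117, 60, 26)    # critical values read from Table K
)

cases$reject_h0 <- cases$ws <= cases$cv

cases
```

Only part c, where ws exceeds the critical value, results in a decision not to reject.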
6. Mathematics Literacy Scores
a. State the hypothesis and identify the claim.
Null Hypothesis (H0): There is no significant difference in means among the three regions (Western Hemisphere, Europe, Eastern Asia).
Alternative Hypothesis (H1): There is a significant difference in means among the three regions.
Claim: There is a difference in means of mathematics literacy scores among different regions (Western Hemisphere, Europe, and Eastern Asia).
b. Find the critical value.
The critical value comes from the chi-square table with d.f. = k - 1, where k is the number of groups. For α = 0.05 and d.f. = 3 - 1 = 2, the critical value is 5.991.
c. Compute the test value.
Sum of ranks (the two tied scores of 547 each receive the average rank 12.5):
Western Hemisphere = 1 + 3 + 4 + 5 + 11 = 24 (R1)
Eastern Asia = 2 + 10 + 12.5 + 12.5 + 15 = 52 (R2)
Europe = 6 + 7 + 8 + 9 + 14 = 44 (R3)
N = 15
Kruskal-Wallis test statistic:
$$H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1)$$
After substituting the values,
H = 4.16
This is below the critical value of 5.991; hence, we do not reject the null hypothesis.
d. Make the decision.
The p-value is 0.1245, which is greater than α = 0.05; hence we fail to reject the null hypothesis.
e. Summarize
There is not enough evidence to support the claim that mathematics literacy scores differ among the regions. The differences are therefore not significant at α = 0.05.
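The Kruskal-Wallis statistic in part c can be recomputed from the rank sums in R. The sketch below assumes the scores listed in the Appendix and computes H without a tie correction, as in the hand calculation; variable names are illustrative.

```r
# Kruskal-Wallis H statistic computed from rank sums (no tie correction).
scores <- c(527, 406, 474, 381, 411,   # Western Hemisphere
            520, 510, 513, 548, 496,   # Europe
            523, 547, 547, 391, 549)   # Eastern Asia
region <- rep(c("Western Hemisphere", "Europe", "Eastern Asia"), each = 5)

N <- length(scores)

# Rank all scores together, then sum the ranks within each region
rank_sums <- tapply(rank(scores), region, sum)
n_i       <- tapply(scores, region, length)

# H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1)
H <- 12 / (N * (N + 1)) * sum(rank_sums^2 / n_i) - 3 * (N + 1)

# Chi-square critical value with k - 1 degrees of freedom at alpha = 0.05
crit <- qchisq(1 - 0.05, df = length(rank_sums) - 1)

rank_sums
c(H = H, critical_value = round(crit, 3))
```

`kruskal.test` adjusts H for ties, so the statistic it prints is slightly larger than the hand-computed value; the decision is the same here.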
7. Subway and Commuter Rail Passengers
The Spearman rank correlation coefficient and a hypothesis test are used to see whether there is a correlation between commuter rail service and the number of daily subway passenger trips in six randomly chosen cities. The Spearman rank correlation coefficient is a nonparametric measure of the direction and strength of the association between two variables.
a. Find the Spearman rank correlation coefficient.
The Spearman rank correlation coefficient (ρ) is 0.6.
b. State the hypothesis and identify the claim.
Null Hypothesis (H0): There is no correlation between commuter rail service and the number of daily subway passenger trips in the six cities.
Alternative Hypothesis (H1): There is a correlation between commuter rail service and the number of daily subway passenger trips in the six cities.
c. Find the critical value.
From the rank correlation coefficient table, the critical value for n = 6 and α = 0.05 is 0.886.
d. Make the decision.
The p-value is 0.2417, which is greater than the significance level of 0.05. The coefficient of 0.6 is also below the critical value of 0.886. Hence, we fail to reject the null hypothesis.
e. Summarize
There is insufficient evidence to conclude that commuter rail service and the number of daily subway passenger trips in the six cities are related. In other words, the data do not support the claim that there is a correlation between these two variables.
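The coefficient in part a can be derived from the rank differences with the usual formula 1 - 6Σd²/(n(n² - 1)), which applies here because neither variable has ties. The sketch below assumes the subway and rail figures from the Appendix; the variable names are illustrative.

```r
# Spearman rank correlation computed from rank differences.
subway <- c(845, 494, 425, 313, 108, 41)
rail   <- c(39, 291, 142, 103, 33, 38)

n <- length(subway)

# Difference between the ranks of each city on the two variables
d <- rank(subway) - rank(rail)

# rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))
rho <- 1 - 6 * sum(d^2) / (n * (n^2 - 1))

# Critical value quoted in the analysis for n = 6 and alpha = 0.05
crit <- 0.886
reject <- abs(rho) > crit

c(sum_d2 = sum(d^2), rho = rho)
reject
```

The same coefficient is returned by `cor(subway, rail, method = "spearman")`.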
8. Prizes in Caramel Corn Boxes
Based on 100 simulated runs, the average number of boxes a person needs to buy to collect all seven prizes is about 17.
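As a rough check on the simulated average, the theoretical mean can be computed directly. This sketch assumes, as the simulation in the Appendix does, seven prizes that are equally likely in every box; under that assumption the expected number of boxes is k(1 + 1/2 + ... + 1/k).

```r
# Expected number of boxes needed to collect all k equally likely prizes.
k <- 7  # number of distinct prizes assumed in the simulation

# After j distinct prizes are held, the wait for a new one averages
# k / (k - j) boxes; summing over j gives k * (1 + 1/2 + ... + 1/k).
expected_boxes <- k * sum(1 / seq_len(k))

expected_boxes
```

A simulation with only 100 runs can easily land a box or so away from this value; increasing the number of runs should bring the estimate closer.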
9. Lottery Winner
Based on 100 simulated runs, a person must purchase about 7 tickets on average to win the prize.
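The simulated average can be compared with the exact expectation. This sketch assumes the letter probabilities used in the Appendix simulation (0.50, 0.30, and 0.20 for three letters) and applies inclusion-exclusion to the waiting time until every letter has appeared; the variable names are illustrative.

```r
# Expected number of tickets until all three letters have been seen,
# with unequal letter probabilities (inclusion-exclusion formula).
p <- c(c = 0.50, a = 0.30, t = 0.20)

# Combined probability of each pair of letters
pair_sums <- combn(p, 2, FUN = sum)

# E = sum(1 / p_i) - sum(1 / (p_i + p_j)) + 1 / (p_1 + p_2 + p_3)
expected_tickets <- sum(1 / p) - sum(1 / pair_sums) + 1 / sum(p)

expected_tickets
```

Under these assumptions the exact mean is a little under 7, which agrees with the rounded simulation result.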
CONCLUSION
The statistical tests and simulations in this series provided analysis and conclusions for each scenario described. The hypothesis tests showed whether there was enough evidence to support or reject each claim, and whether differences or correlations exist between the variables. Results of this kind support evidence-based decisions in fields ranging from sports management to education and transportation planning. More generally, statistical tests and simulations make it possible to use data efficiently and reach reliable conclusions, which makes them essential for informed decision-making in research, business, and policymaking.
REFERENCES
Sign Test | R Tutorial. (n.d.). www.r-Tutor.com. http://www.r-tutor.com/elementary-statistics/non-parametric-methods/sign-test
Wilcoxon Signed-Rank Test | R Tutorial. (n.d.). www.r-Tutor.com. http://www.r-tutor.com/elementary-statistics/non-parametric-methods/wilcoxon-signed-rank-test
Binom.test: Exact binomial test. RDocumentation. (n.d.). https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/binom.test
Critical values: Find a critical value in any tail. Statistics How To. (2023, February 22). https://www.statisticshowto.com/probability-and-statistics/find-critical-values/#find%20critical%20value%20two%20tailed
Kruskal Wallis. Kruskal-Wallis Test. (n.d.). https://www.itl.nist.gov/div898/software/dataplot/refman1/auxillar/kruskwal.htm
Nonparametric tests. Tests with Matched Samples. (n.d.). https://sphweb.bumc.bu.edu/otlt/mph-modules/bs/bs704_nonparametric/BS704_Nonparametric5.html#:~:text=The%20appropriate%20critical%20value%20for,level%20of%20significance%20%CE%B1%3D0.05.
Spearman ranked Correlation Table - Shippensburg University. (n.d.). http://webspace.ship.edu/pgmarr/Geo441/Tables/Spearman%20Ranked%20Correlation%20Table.pdf
APPENDIX
# Problem 1: sign test for the median game attendance
gamedata <- c(6210, 3150, 2700, 3012, 4875, 3540, 6127, 2581, 2642, 2573, 2792, 2800, 2500, 3700, 6030, 5437, 2758, 3490, 2851, 2720)
median <- 3000
alpha <- 0.05
diff <- gamedata - median
# count of paid attendance values above 3000
p <- length(diff[diff>0])
p
# count of paid attendance values below 3000
n <- length(diff[diff<0])
n
res <- binom.test(x = c(p,n), alternative = "two.sided")
# check whether to reject the null hypothesis
ifelse(res$p.value > alpha, "Fail to Reject Null Hypothesis", "Reject Null Hypothesis")
#####################################################################################
n <- 15
p <- 40-n
res <- binom.test(x = c(p,n), alternative = "less")
res
ifelse(res$p.value > alpha, "Fail to Reject Null Hypothesis", "Reject Null Hypothesis")
############################################################################
male<- c(8,12,6,14,22,27,32,24,26,19,15,13)
female<- c(7,5,2,3,21,26,30,9,4,17,23,12,11,16)
res <- wilcox.test(x=male, y=female, alternative = "two.sided", exact = FALSE)
res
############################################################################
nl_wins <- c(89, 9, 8, 101, 90, 91, 9, 96, 108, 100, 9, 6, 8, 2, 5)
al_wins <- c(108, 8, 9, 97, 100, 102, 9, 104, 95, 89, 8, 101, 6, 1, 5, 8)
res <- wilcox.test(x=nl_wins, y=al_wins, alternative = "two.sided", exact = FALSE)
res
# Sort the data in ascending order
N <- sort(nl_wins)
A <- sort(al_wins)
##########################################################
alpha = 0.05
WestHemi <- data.frame(means = c(527, 406, 474, 381, 411), group = rep("Western Hemisphere", 5))
Europe <- data.frame(means = c(520, 510, 513, 548, 496), group = rep("Europe", 5))
EastAsia <- data.frame(means = c(523, 547, 547, 391, 549), group = rep("Eastern Asia", 5))
data <- rbind(WestHemi, Europe, EastAsia)
res <- kruskal.test(means~group, data = data)
res
ifelse(res$p.value>alpha, "fail to reject null hypothesis", "reject null hypothesis")
data[order(data$means),]
###########################################################
city <- c(1,2,3,4,5,6)
subway <- c(845,494,425,313,108,41)
rail <- c(39,291,142,103,33,38)
passengers <- data.frame(City=city, Subway=subway, Rail=rail)
passengers
res <- cor.test(passengers$Subway, passengers$Rail, method = "spearman")
res
############################################################
set.seed(70442)
num_sim <- 100 #number of simulations
results <- numeric(num_sim)
for (i in 1:num_sim) {
  prizes_collection <- c()
  box_pur <- 0
  while (length(prizes_collection) < 7) {
    new_prize <- sample(1:7, 1)
    # Check if the prize is one that hasn't been collected yet
    if (!(new_prize %in% prizes_collection)) {
      prizes_collection <- c(prizes_collection, new_prize)
    }
    box_pur <- box_pur + 1
  }
  results[i] <- box_pur
}
average_box <- mean(results)
average_box
###########################################################
set.seed(70442)
num_sim <- 100
results <- numeric(num_sim)
p_c <- 0.50
p_a <- 0.30
p_t <- 0.20
for (i in 1:num_sim) {
  letters_collect <- c()
  tickets_pur <- 0
  while (!all(c("c", "a", "t") %in% letters_collect)) {
    r <- runif(1)
    if (r <= p_c) {
      letters_collect <- c(letters_collect, "c")
    } else if (r <= p_c + p_a) {
      letters_collect <- c(letters_collect, "a")
    } else {
      letters_collect <- c(letters_collect, "t")
    }
    tickets_pur <- tickets_pur + 1
  }
  results[i] <- tickets_pur
}
average_ticket <- mean(results)
average_ticket