1. It is assumed that achievement test scores should be correlated with students' classroom performance. One would expect that students who consistently perform well in the classroom (tests, quizzes, etc.) would also perform well on a standardized achievement test (0-100, with 100 indicating high achievement). A teacher decides to examine this hypothesis. At the end of the academic year, she computes a correlation between the students' achievement test scores (she purposefully did not look at these data until after she submitted the students' grades) and the overall G.P.A. for each student computed over the entire year. The data for her class are provided below.

Achievement   G.P.A.
98            3.6
96            2.7
94            3.1
88            4.0
91            3.2
77            3.0
86            3.8
71            2.6
59            3.0
63            2.2
84            1.7
79            3.1
75            2.6
72            2.9
86            2.4
85            3.4
71            2.8
93            3.7
90            3.2
62            1.6

Compute the correlation coefficient. Is this correlation statistically significant at significance level 0.1? Implement in Python.
Please help with the Python program. I am providing a program below, some parts of which you can reuse. Question number 1 is attached as an image.
# call the appropriate libraries
import numpy as np
from numpy import mean
from numpy import std
from numpy.random import randn
from numpy.random import seed
from matplotlib import pyplot
from numpy import cov
from scipy.stats import pearsonr
from scipy.stats import spearmanr
import scipy.stats as stats
import pandas as pd
# generate data
# seed random number generator
seed(1)
# prepare data
data1 = 20 * randn(1000) + 100
data2 = data1 + (10 * randn(1000) + 50)
# print mean and std
print('data1: mean=%.3f stdv=%.3f' % (mean(data1), std(data1)))
print('data2: mean=%.3f stdv=%.3f' % (mean(data2), std(data2)))
# plot the sample data
pyplot.scatter(data1, data2)
pyplot.show()
# calculate covariance
covariance = cov(data1, data2)
print('The covariance matrix')
print(covariance)
# calculate Pearson's correlation
corr, _ = pearsonr(data1, data2)
print('Pearsons correlation: %.3f' % corr)
##calculate spearman correlation
corr, _ = spearmanr(data1, data2)
print('Spearmans correlation: %.3f' % corr)
######################chi square test
####Input as contingency table
####consider different pets bought by male and female
#######dog cat bird total
##men 207 282 241 730
##women 234 242 232 708
##total 441 524 473 1438
###The aim of the test is to conclude whether the two variables( gender and choice of pet ) are related to each other.
from scipy.stats import chi2_contingency
print("CHI SQUARE TEST with HYPOTHESIS TESTING")
# defining the table
data = [[207, 282, 241], [234, 242, 232]]
stat, p, dof, expected = chi2_contingency(data)
######hypothesis testing
# interpret p-value (H0: the two variables are independent)
alpha = 0.05
print("p value is " + str(p))
if p <= alpha:
    print('reject H0 - the variables are associated at the 95% confidence level')
else:
    print('fail to reject H0 - no evidence of association at the 95% confidence level')
np.random.seed(6)
####generate Poisson-distributed samples shifted by loc=18 (lowest possible value 18), with rate mu and the given sample size
population_ages1 = stats.poisson.rvs(loc=18, mu=35, size=150000)
population_ages2 = stats.poisson.rvs(loc=18, mu=10, size=100000)
population_ages = np.concatenate((population_ages1, population_ages2))
minnesota_ages1 = stats.poisson.rvs(loc=18, mu=30, size=30)
minnesota_ages2 = stats.poisson.rvs(loc=18, mu=10, size=20)
minnesota_ages = np.concatenate((minnesota_ages1, minnesota_ages2))
print( population_ages.mean(), ' is the population mean' )
print( minnesota_ages.mean(), ' is the sample mean' )
### we know that the two samples come from different distributions
##### Let's conduct a t-test at a 95% confidence level and see if it correctly rejects the null hypothesis
# that the sample comes from the same distribution as the population.
s, p = stats.ttest_1samp(a = minnesota_ages, # Sample data
popmean = population_ages.mean())
print(s, 'is the test statistic')
### interpret the t-statistic
if s >= 0:
    print('Sample mean is larger than the population mean')
else:
    print('Sample mean is smaller than the population mean')
### if the p-value is less than 0.05 we reject the null hypothesis that the sample mean equals the population mean
if p < 0.05:
    print('This observation is statistically significant with 95% confidence.')
else:
    print('This observation is not statistically significant with 95% confidence.')
###performing min max normalization
data = {'weight':[300, 250, 800],
'price':[3, 2, 5]}
df = pd.DataFrame(data)
print(df)
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
normalized_data = scaler.fit_transform(df)
print('Normalized data')
print(normalized_data)
###standardization converts each column to z-scores, so the data are centered on mean 0 with standard deviation 1
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
standardized_data = scaler.fit_transform(df)
print('standardized data value')
print(standardized_data)
####normality check using Q-Q plot
np.random.seed(0)
data = np.random.normal(0,1, 1000)
import statsmodels.api as sm
import matplotlib.pyplot as plt
#create Q-Q plot with 45-degree line added to plot
fig = sm.qqplot(data, line='45')
plt.show()
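For question 1, a minimal sketch that reuses the `pearsonr` call from the program above might look like the following. It assumes the 20 score pairs are matched in the order they are listed in the question (the pairing of the last eleven rows is inferred from the column layout), and it assumes a two-sided test of H0: no linear correlation. The variable names are illustrative only.

```python
import numpy as np
from scipy import stats
from scipy.stats import pearsonr

# Class data from question 1.
# Assumption: values are paired row by row in the order listed.
achievement = [98, 96, 94, 88, 91, 77, 86, 71, 59, 63,
               84, 79, 75, 72, 86, 85, 71, 93, 90, 62]
gpa = [3.6, 2.7, 3.1, 4.0, 3.2, 3.0, 3.8, 2.6, 3.0, 2.2,
       1.7, 3.1, 2.6, 2.9, 2.4, 3.4, 2.8, 3.7, 3.2, 1.6]

# Pearson correlation coefficient and its two-sided p-value
r, p_value = pearsonr(achievement, gpa)
print('Pearson correlation: %.3f' % r)
print('p value: %.4f' % p_value)

# Cross-check the p-value by hand with the t statistic for a correlation:
# t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom
n = len(achievement)
t_stat = r * np.sqrt((n - 2) / (1 - r ** 2))
p_manual = 2 * stats.t.sf(abs(t_stat), df=n - 2)
print('t statistic: %.3f, manual p value: %.4f' % (t_stat, p_manual))

# Decision at significance level 0.1 (H0: no linear correlation)
alpha = 0.1
significant = p_value < alpha
if significant:
    print('reject H0 - the correlation is statistically significant at alpha = 0.1')
else:
    print('fail to reject H0 - the correlation is not statistically significant at alpha = 0.1')
```

The manual t-based p-value is included only as a sanity check on `pearsonr`; the two should agree closely. If a one-sided test (positive correlation only) is what the course expects, halving the two-sided p-value when `r` is positive would be the usual adjustment.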