Lab KX - Jupyter Notebook.pdf

School: University of California, Berkeley
Course: 88
Subject: Statistics
Date: Feb 20, 2024
Type: pdf
Pages: 17
Uploaded by: DeanBookKookabura6

Lab KX: Chi Squared Tests and AB Tests

Setup

In [99]:
# Import some useful functions
from numpy import *
from numpy.random import *
from datascience import *

# Customize look of graphics
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
%matplotlib inline

# Force display of all values
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

# Handle some obnoxious warning messages
import warnings
warnings.filterwarnings("ignore")

T-shirt sales

Business Decision

You have four different t-shirt designs and need a demand forecast so you know how many t-shirts to print for your next production run. Having already sold to Unit 1, you wonder whether Unit 2 will ultimately express similar preferences.

Is this a Chi Squared "Goodness of Fit" test or a Chi Squared "Test of Independence"?

Assuming that students can purchase multiple designs and multiple quantities, we can construct the total distribution of sales to Unit 1 as:

Style 1: 20%, Style 2: 35%, Style 3: 30%, Style 4: 15%

Data

Construct a table from your Unit 1 sample.

In [100]:
unit1 = Table().with_columns("Style", make_array(1, 2, 3, 4), "Demand Forecast", make_array(0.2, 0.35, 0
unit1

Out[100]:
Style    Demand Forecast
1        0.2
2        0.35
3        0.3
4        0.15

Suppose that current sales to Unit 2 look like the following:

Style 1: 102, Style 2: 121, Style 3: 120, Style 4: 57

Show the sample information for Unit 2 as a table.

In [101]:
unit2 = Table().with_columns("Style", make_array(1, 2, 3, 4), "Actual Sales", make_array(102, 121, 120,
unit2

Out[101]:
Style    Actual Sales
1        102
2        121
3        120
4        57

Analysis

Knowing how many t-shirts were actually sold in Unit 2, add a column to your data for Unit 2 that contains the "expected sales" for Unit 2 if the current sales in Unit 2 had been distributed in the same proportions (the same percentages) as those proportions in Unit 1.

In [102]:
unit2 = unit2.with_column("Expected Sales", unit1.column("Demand Forecast") * sum(unit2.column("Actual
unit2

Out[102]:
Style    Actual Sales    Expected Sales
1        102             80
2        121             140
3        120             120
4        57              60

Compute the Chi-squared Statistic

In [103]:
# compute chi-squared
# this is a Goodness of Fit test
unit2 = unit2.with_column("diff", unit2.column("Actual Sales") - unit2.column("Expected Sales"))
unit2 = unit2.with_columns('diff^2', unit2.column("diff") ** 2)
unit2 = unit2.with_columns('relative', unit2.column('diff^2') / unit2.column('Expected Sales'))
unit2
chi_s = sum(unit2.column('relative'))
chi_s

Out[103]:
Style    Actual Sales    Expected Sales    diff    diff^2    relative
1        102             80                22      484       6.05
2        121             140               -19     361       2.57857
3        120             120               0       0         0
4        57              60                -3      9         0.15

Out[103]: 8.778571428571428
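As a cross-check on Out[103], the same goodness-of-fit statistic can be sketched in plain Python without the datascience library. The proportions and observed counts are copied from the lab text; the variable names are illustrative only.

```python
# Unit 1 proportions and Unit 2 observed sales, copied from the lab text.
proportions = [0.20, 0.35, 0.30, 0.15]
observed = [102, 121, 120, 57]

# Total Unit 2 sales so far, and the sales expected under Unit 1's proportions.
total = sum(observed)
expected = [p * total for p in proportions]

# Chi-squared statistic: sum over styles of (observed - expected)^2 / expected.
chi_s = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print(total)
print([round(e, 2) for e in expected])
print(round(chi_s, 4))
```

The statistic should agree with the notebook's value up to floating-point rounding.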

Generate the chi-squared distribution for the appropriate degrees of freedom.

In [104]:
df = unit2.num_rows - 1
df
dist_array = chisquare(df, 1000000)
dist = Table().with_column('chisquared', dist_array)
dist.hist(bins=50, range=make_array(0, 25))

Out[104]: 3

Calculate and show the critical value at significance level (α) = 0.05 based on the chi-squared distribution.

In [105]:
alpha = 0.05
cv = percentile((1 - alpha) * 100, dist.column('chisquared'))
cv

Out[105]: 7.823004259600657

Compute the P-value (Calculate and show the probability the chi-squared statistic is greater than or equal to the computed value of the statistic).

In [106]:
p_value = dist.where('chisquared', are.above_or_equal_to(chi_s)).num_rows / dist.num_rows
p_value
dist.hist(bins=50, range=make_array(0, 35), left_end=cv, right_end=35)
plt.axvline(cv, color="red")
plt.axvline(chi_s, color='green')
plt.legend
plt.show()

Out[106]: 0.032371
Out[106]: <matplotlib.lines.Line2D at 0x7f0034c25790>
Out[106]: <matplotlib.lines.Line2D at 0x7f0034c25880>
Out[106]: <function matplotlib.pyplot.legend(*args, **kwargs)>
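The simulated p-value and critical value can be compared against exact values. The sketch below is not part of the lab: it assumes the standard closed form for the upper tail of a chi-squared distribution with 3 degrees of freedom, and finds the critical value by bisection. The function names `chi2_sf_df3` and `critical_value_df3` are hypothetical helpers introduced here.

```python
import math

def chi2_sf_df3(x):
    """Upper-tail probability P(X >= x) for chi-squared with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

def critical_value_df3(alpha, lo=0.0, hi=100.0):
    """Bisection search for the x whose upper-tail probability equals alpha."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_sf_df3(mid) > alpha:
            lo = mid   # tail still too large, so move right
        else:
            hi = mid   # tail too small, so move left
    return (lo + hi) / 2

chi_s = 8.778571428571428   # statistic from Out[103]
alpha = 0.05

p_value = chi2_sf_df3(chi_s)
cv = critical_value_df3(alpha)

print(round(p_value, 4), round(cv, 3))
print(p_value > alpha, chi_s < cv)
```

Both values should land close to the simulated 0.032371 and 7.823; small gaps are expected because the notebook estimates them from one million random draws.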

Conclusion

What do you conclude about using Unit 1 to estimate demand for Unit 2?

In [107]:
p_value > alpha
chi_s < cv

Out[107]: False
Out[107]: False

Quiz

What type of test are we conducting?
    Goodness of Fit
    Test of Independence
How many degrees of freedom are there: ___
How many total sales are there in Unit 2 so far: _____
Assuming that Unit 2 sales were to follow the same proportions as those of Unit 1, how many sales-to-date of Style 2 would you have expected? ____
What was your computed p_value? _____
What was your critical value based upon a significance level of 5% using the lookup table from in-class? _____
What was the value of your sample chi squared statistic? _____
What do you conclude about using Unit 1 to estimate demand for Unit 2? _____
    You can reject the null hypothesis and conclude that Unit 1 is a good guideline for Unit 2, because the p-value is large.
    Unit 2 sales are not consistent with Unit 1 sales because the statistic is more extreme than the cv.
    Unit 2 sales are not consistent with Unit 1 sales because the cv is greater than the significance level.
    Unit 1 sales are a good guideline for Unit 2 because the p-value is less than the significance level.

Financial Advice

Business Decision

A financial advisor wants to determine the relationship between the type of fund and client satisfaction across all its clients. A fund can be made up of either stocks or bonds. Client satisfaction can be high, medium, or low.

Data

Here are the numbers of clients reporting satisfaction level, according to what type of fund the client owns:
    stocks: 15 high, 12 medium, 3 low
    bonds: 24 high, 4 medium, 2 low

Show the count of each fund type-client satisfaction pair.

In [108]:
data = Table().with_columns("Fund Type", make_array("stocks", "stocks", "stocks", "bonds", "bonds", "bon
data

Out[108]:
Fund Type    Satisfaction Level    Count
stocks       high                  15
stocks       medium                12
stocks       low                   3
bonds        high                  24
bonds        medium                4
bonds        low                   2

Analysis

Calculate and show the count of each fund type.

In [109]:
fund_type_freq = data.select("Fund Type", "Count").group("Fund Type", sum)
fund_type_freq

Out[109]:
Fund Type    Count sum
bonds        30
stocks       30

Calculate and show the count of each client satisfaction level.

In [110]:
satisfaction_freq = data.select("Satisfaction Level", "Count").group("Satisfaction Level", sum)
satisfaction_freq

Out[110]:
Satisfaction Level    Count sum
high                  39
low                   5
medium                16

Add a column to your table for the expected frequencies for each pairwise combination.

In [111]:
# expected frequencies for each pairwise combination
total_obs = sum(fund_type_freq.column("Count sum"))
total_obs
pairwise_freq = data.join('Fund Type', fund_type_freq)
pairwise_freq = pairwise_freq.relabeled("Count sum", "Fund Type Count")
pairwise_freq = pairwise_freq.join("Satisfaction Level", satisfaction_freq)
pairwise_freq = pairwise_freq.relabeled("Count sum", "Satisfaction Level Count")
pairwise_freq = pairwise_freq.with_columns(
    'Expected',
    pairwise_freq.column('Fund Type Count') * pairwise_freq.column('Satisfaction Level Count') / total_obs)
pairwise_freq

Out[111]: 60

Out[111]:
Satisfaction Level    Fund Type    Count    Fund Type Count    Satisfaction Level Count    Expected
high                  bonds        24       30                 39                          19.5
high                  stocks       15       30                 39                          19.5
low                   bonds        2        30                 5                           2.5
low                   stocks       3        30                 5                           2.5
medium                bonds        4        30                 16                          8
medium                stocks       12       30                 16                          8

Calculate and show the sample chisquared.
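As a cross-check on the Expected column in Out[111], the expected frequency of each cell under independence is (fund type total × satisfaction level total) / grand total. A minimal sketch in plain Python, using the counts from the lab text and illustrative variable names:

```python
# Observed counts from the lab text, keyed by (fund type, satisfaction level).
observed = {
    ("stocks", "high"): 15, ("stocks", "medium"): 12, ("stocks", "low"): 3,
    ("bonds", "high"): 24, ("bonds", "medium"): 4, ("bonds", "low"): 2,
}

total = sum(observed.values())

# Marginal totals per fund type and per satisfaction level.
fund_totals, level_totals = {}, {}
for (fund, level), count in observed.items():
    fund_totals[fund] = fund_totals.get(fund, 0) + count
    level_totals[level] = level_totals.get(level, 0) + count

# Under independence: expected = fund total * level total / grand total.
expected = {
    (fund, level): fund_totals[fund] * level_totals[level] / total
    for (fund, level) in observed
}

for cell, value in expected.items():
    print(cell, value)
```

The dictionary plays the role of the two `join` calls in the notebook: each cell looks up its own row and column totals.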

In [112]:
# Compute Chi-Squared statistic
# add column for difference between observed and expected
pairwise_freq = pairwise_freq.with_column(
    'diff',
    pairwise_freq.column('Count') - pairwise_freq.column('Expected'))
# square the difference
pairwise_freq = pairwise_freq.with_column('diff^2', pairwise_freq.column('diff') ** 2)
# find relative difference by dividing squared differences by 'expected'
pairwise_freq = pairwise_freq.with_column(
    'rel diff',
    pairwise_freq.column('diff^2') / pairwise_freq.column('Expected'))
pairwise_freq
chi_s = sum(pairwise_freq.column('rel diff'))
chi_s

Out[112]:
Satisfaction Level    Fund Type    Count    Fund Type Count    Satisfaction Level Count    Expected    diff    diff^2    rel diff
high                  bonds        24       30                 39                          19.5        4.5     20.25     1.03846
high                  stocks       15       30                 39                          19.5        -4.5    20.25     1.03846
low                   bonds        2        30                 5                           2.5         -0.5    0.25      0.1
low                   stocks       3        30                 5                           2.5         0.5     0.25      0.1
medium                bonds        4        30                 16                          8           -4      16        2
medium                stocks       12       30                 16                          8           4       16        2

Out[112]: 6.276923076923078

Get 1,000,000 values from the chi squared distribution for the appropriate degrees of freedom. Show the degrees of freedom and a few of the values and a histogram of all the values (50 bins, range 0 to 25).
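The statistic in Out[112] and the degrees of freedom needed for the next step can be sketched from the raw contingency table. This is an illustrative plain-Python version, not the lab's code; the row and column order is an arbitrary choice made here.

```python
# Rows: stocks, bonds. Columns: high, medium, low (counts from the lab text).
observed = [
    [15, 12, 3],
    [24, 4, 2],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Sum (observed - expected)^2 / expected over every cell of the table.
chi_s = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        expected = row_totals[i] * col_totals[j] / total
        chi_s += (count - expected) ** 2 / expected

# Degrees of freedom for a test of independence: (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(round(chi_s, 4), df)
```

With two fund types and three satisfaction levels, the degrees of freedom come out as (2 - 1) × (3 - 1).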

In [113]:
df = 2 * 1
df
dist_array = chisquare(df, 1000000)
dist = Table().with_column('chisquared', dist_array)
dist
dist.hist(bins=50, range=make_array(0, 25))

Out[113]: 2

Out[113]:
chisquared
0.0544898
0.386128
0.390457
2.5022
0.615491
0.388123
0.492173
1.57597
0.272731
1.53969
... (999990 rows omitted)
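The notebook draws its values with `chisquare` from `numpy.random`. To illustrate what those draws represent, the sketch below builds each value as a sum of `df` squared standard normal draws using only the standard library. The helper name `simulate_chisquare`, the fixed seed, and the smaller sample size are assumptions made here to keep the example quick.

```python
import random

def simulate_chisquare(df, size, seed=0):
    """Draw `size` values, each the sum of `df` squared standard normal draws."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0, 1) ** 2 for _ in range(df)) for _ in range(size)]

draws = simulate_chisquare(df=2, size=100_000)

# The mean of a chi-squared distribution equals its degrees of freedom.
mean = sum(draws) / len(draws)

# Empirical 95th percentile, the simulated critical value at alpha = 0.05.
cv = sorted(draws)[int(0.95 * len(draws))]

print(round(mean, 2), round(cv, 2))
```

With fewer draws than the notebook's one million, the estimates are noisier, but the mean should sit near 2 and the percentile near the notebook's critical value of about 6.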

Calculate and show the probability of the sample chisquared (or above) if the hypothesis is correct (this is the p-value). Also show the sample chisquared and histogram of chisquared distribution with the area corresponding to the probability highlighted.

In [114]:
p_value = dist.where('chisquared', are.above_or_equal_to(chi_s)).num_rows / dist.num_rows
p_value

Out[114]: 0.04351

Calculate and show the critical value at significance level 0.05 based on the chisquared distribution. Also show the significance level and histogram of chisquared distribution with the area corresponding to the significance level highlighted.

In [115]:
alpha = 0.05
cv = percentile((1 - alpha) * 100, dist.column('chisquared'))
cv
alpha
cv
dist.hist(bins=50, range=make_array(0, 35), left_end=cv, right_end=35)
plt.axvline(cv, color="red")
plt.axvline(chi_s, color='green')
plt.legend
plt.show()

Out[115]: 6.0029736001240765
Out[115]: 0.05
Out[115]: 6.0029736001240765
Out[115]: <matplotlib.lines.Line2D at 0x7f0034ba97f0>
Out[115]: <matplotlib.lines.Line2D at 0x7f0034bda220>
Out[115]: <function matplotlib.pyplot.legend(*args, **kwargs)>

Calculate and show whether you should conclude that the hypothesis is correct, at significance level 0.05.
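For 2 degrees of freedom the chi-squared upper tail has a simple closed form, P(X ≥ x) = exp(-x/2), so the simulated p-value and critical value can be checked exactly. The sketch below relies on that standard identity rather than on anything in the lab, and the variable names are illustrative.

```python
import math

chi_s = 6.276923076923078   # statistic from Out[112]
alpha = 0.05

# Exact upper-tail probability for chi-squared with 2 degrees of freedom.
p_value = math.exp(-chi_s / 2)

# Solving exp(-cv / 2) = alpha for cv gives the exact critical value.
cv = -2 * math.log(alpha)

print(round(p_value, 4), round(cv, 4))
print(p_value > alpha, chi_s < cv)
```

The exact values should sit close to the simulated 0.04351 and 6.003, and both comparisons lead to the same decision as the simulation-based ones.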

In [116]:
p_value > alpha
chi_s < cv

Out[116]: False
Out[116]: False

Quiz

The financial advisory firm has reports from ___ of its clients.
___ of its clients own funds that comprise bonds.
___ of its clients are highly satisfied.
If type of fund were independent of satisfaction level, then we would expect ____ of clients to be highly satisfied bond fund owners. In other words, we would expect with this probability that a client has high satisfaction and owns a bond fund.
The sample chi squared is ____.
The p-value is ____.
The critical value is ____.
Based on this analysis and assuming 5% significance level, the financial advisor should conclude that a client's satisfaction level ____
    depends on whether it owns stocks or bonds, because the sample chi squared is less than the critical value
    does not depend on whether it owns stocks or bonds, because the sample chi squared is less than the critical value
    depends on whether it owns stocks or bonds, because the sample chi squared is greater than the critical value
    does not depend on whether it owns stocks or bonds, because the sample chi squared is greater than the critical value

Document revised 10 April 2023
Copyright (c) Huntsinger and Lee
