HW 6-AB Testing

November 17, 2023

1 Homework 6

Make sure you run the two code blocks below. The first initializes key Python tools. The second imports the dataset, which includes personality variables collected from UNG students.

[2]: from datascience import *
     import numpy as np
     %matplotlib inline
     import matplotlib.pyplot as plots
     plots.style.use('fivethirtyeight')
     from scipy import stats

[3]: pers = Table.read_table('personality.csv')
     pers.show(5)

<IPython.core.display.HTML object>

For details about all the variables in the dataset, you may download the following PDF: Personality_Variables

1.1 A/B Testing

At UNG, are males more narcissistic than females? If we have the correct data, we can answer this question with an A/B test. We need a numeric variable and a grouping variable. The A/B refers to two different groups. In this dataset, biological sex has two options: male or female. Narcissism is the numeric variable.

Our textbook covers A/B testing in Chapter 18. In statistics, the same test is called an independent samples t-test.

If you are unfamiliar with the personality trait of narcissism, Psychology Today has an overview in the first three paragraphs.

1.1.1 Question 1.

Select answer choice 1, 2 or 3.

1. There will be no difference in the levels of narcissism between males and females.
2. Males will exhibit higher levels of narcissism.
3. Females will exhibit higher levels of narcissism.

In the cell below, indicate your answer choice of 1, 2 or 3 followed by an explanation of your reasoning.

#2 I think males will exhibit higher levels of narcissism because that’s what I infer.

1.2 Tools

Here is a toolkit to help you. We need three programs:

• ab_diff
• ab_shuffle
• ab_hist

The first finds the difference between the means for groups A and B. The second shuffles the grouping variable column. The third shows a picture of the results.

1.2.1 ab_diff

[4]: def ab_diff(tab):
         # Mean of the numeric column (column 1) for each group in column 0
         group_means = tab.group(0, np.average).column(1)
         # Groups are ordered alphabetically: first group minus second group
         return group_means.item(0) - group_means.item(1)

Notes
- The input for our ab_diff function is a table.
- The function finds the difference in A/B group means.
- The function expects the first column to be the grouping variable.
- The function expects the second column to be the numeric variable.
- The output of the function is a number indicating the mean difference between groups A and B.

To demonstrate how it works, let’s create a 2-column table called narc where narcissism is the numeric variable and biological sex is the grouping variable.

[5]: narc = pers.select('Sex', 'Narc')
     narc.show(5)

<IPython.core.display.HTML object>

Now, let’s apply ab_diff to our new table.

[6]: ab_diff(narc)

[6]: -1.7449631449631444

Notice that ab_diff proceeds alphabetically, so this is the “female mean” minus the “male mean”.

1.3 ab_shuffle

Let’s create the ab_shuffle function. Simply execute the code block below.
[7]: def ab_shuffle(tab):
         # Shuffle the rows without replacement and keep only the group labels
         shuffle_group = tab.sample(with_replacement=False).column(0)
         # Attach the shuffled labels, then keep (shuffled labels, numeric variable)
         shuffled_tab = tab.with_column("Shuffled Grouping", shuffle_group).select(2, 1)
         return shuffled_tab

Notes
- The input for our ab_shuffle function is a table.
- The function shuffles the first column.
- The function then returns a table with two columns.
- The returned table’s first column is the shuffled grouping variable.
- The returned table’s second column is the numeric variable.

To illustrate, we reuse the table narc created above from the personality dataset. Now, we can apply our ab_shuffle function to narc.

[8]: ab_shuffle(narc).show(5)

<IPython.core.display.HTML object>

Note that the grouping variable column is shuffled. Try executing the code block several times, and notice how the shuffling of the first column labels is different each time. The data in the second column is unchanged.

We can use both of our new functions at once to find the difference in means in our shuffled table. Execute the code block below to see that in action.

[9]: ab_diff(ab_shuffle(narc))

[9]: -0.03341523341523356

Run the code block several times and note how the output changes.

1.4 ab_hist

Let’s create our last function, one that will display a visual representation of our results from the A/B test. Execute the code block below.

[10]: def ab_hist(myArray, observed_value):
          tab = Table().with_column('A/B Differences', myArray)
          tab.hist(0)
          # Red vertical line marking the observed difference
          _ = plots.plot([observed_value, observed_value], [0, 0.1], color='red', lw=2)

Notes
- The function input requires two things: an array and an observed value.
- We run a “for loop” to generate the array.
- The observed value is the ab_diff from the original data in narc.

1.5 The for loop

First, let’s create a variable observed_diff which will be the difference between groups A and B in the original narc dataset.
1.5.1 Question 2

In the code block below, define observed_diff. Hint: use ab_diff to make life easy.

[11]: observed_diff = ab_diff(narc)
      observed_diff

[11]: -1.7449631449631444

Now, we need to create an array of output values. We want to shuffle the grouping variable over and over, and the array will record the mean difference from each shuffle. The code block below is our for loop. You will need to set the reps variable based on your computer and your internet connection.

[20]: narc_diffs = make_array()
      # Set reps at 1,000 or less especially if running this in the cloud.
      reps = 2500
      for i in range(reps):
          new_diff = ab_diff(ab_shuffle(narc))
          narc_diffs = np.append(narc_diffs, new_diff)

To use the above code on another data set, change the variable name narc_diffs and change the table from narc to the table you wish to read data from.

[25]: ab_hist(narc_diffs, observed_diff)
View the output above. The red vertical line shows the observed value. The histogram shows the distribution of the shuffled differences.

1.5.2 Question 3

What is the probability that the red line is a value in the distribution? Estimate the likelihood for a value like this one to appear at random in the distribution shown in the histogram. Write your answer and a brief justification below.

My estimate of the probability is .5

We can calculate the exact probability. Execute the two code blocks below.

[26]: sum(narc_diffs <= observed_diff)

[26]: 0

[23]: p_value = sum(narc_diffs <= observed_diff) / reps
      p_value

[23]: 0.0

We check to see how often the values in the narc_diffs distribution are at least as low as the observed_diff. The sum function counts how many values meet this criterion. The first code block shows this count. The second code block divides the count by reps to get the p-value. Notice that the probability is p = 0.
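The counting step above can be illustrated without the notebook's data. The following is a minimal sketch in plain Python, assuming a small made-up list of shuffled differences (`shuffled_diffs`, standing in for narc_diffs) and made-up observed values; the helper name `empirical_p_value` is hypothetical and not part of the homework toolkit. It mirrors `sum(narc_diffs <= observed_diff) / reps`.

```python
# Hypothetical shuffled differences, standing in for narc_diffs.
shuffled_diffs = [-0.9, -0.4, -0.1, 0.0, 0.2, 0.3, 0.5, 0.8, -0.6, 0.1]

def empirical_p_value(diffs, observed):
    """Fraction of shuffled differences at least as low as the observed one."""
    count = sum(1 for d in diffs if d <= observed)
    return count / len(diffs)

# An observed value inside the left tail: 2 of the 10 values are <= -0.5.
p_mid = empirical_p_value(shuffled_diffs, -0.5)

# An observed value far to the left of every shuffled difference.
p_far = empirical_p_value(shuffled_diffs, -1.7)

print(p_mid, p_far)
```

A result of 0.0 means that none of the shuffles produced a difference as low as the observed one. With a finite number of shuffles, this is best read as "smaller than about 1/reps" rather than as a probability of exactly zero.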
1.5.3 Question 4

Where do you think the red line would fall in the histogram if the p-value were p = 0.25? What if p = 0.8 or p = 0.9? Explain your reasoning in the block below.

If the p-value were p = 0.25, I believe it would go up with the difference.

1.5.4 Question 5

Given the p-value, what do you think of your guess in Question 1? Is there a difference in the levels of narcissism between males and females? Why do you think so? Are males or females (or neither) more narcissistic?

After seeing the results of A/B testing, I believe that there is a difference and males are in fact more narcissistic.

1.6 Practice Problems

Pick at least one of the following practice problems. In the code blocks below, run the indicated independent samples t-test. Calculate an exact p-value, and say what you think the output means in the real world.

1. Using the exact same procedure, test for a difference between male and female perfectionism scores. The following code will help extract the table you need to work with.

     perf = pers.select('Sex', 'Perf')
     perf.show(5)

2. Use an A/B test to determine if older students drink more caffeine than younger students. The G21 variable has Yes/No responses to the question, “Are you 21 years old or older?” The following code will help get the needed table.

     caff = pers.select('G21', 'Caffeine')
     caff.show(5)

3. Use an A/B test to determine if older students are less naive about relationships, that is, if younger students are more naive. The variable TxRel is based on the “Toxic Relationship Beliefs” scale. Higher TxRel scores indicate the person is more naive about relationships. The G21 variable has Yes/No responses to the question, “Are you 21 years old or older?” The following code will help get the needed table.

     rel = pers.select('G21', 'TxRel')
     rel.show(5)

4. Use an A/B test to determine if the thrill-seeking behaviors of males and females differ. The numeric variable Thrill indicates a person’s interest in thrill-seeking behaviors, with higher scores indicating more interest. The following code will help get the needed table.

     thrill = pers.select('Sex', 'Thrill')
     thrill.show(5)

[27]: perf = personality.select('Sex', 'Perf')
      perf.show(5)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-27-da41168dab84> in <module>
----> 1 perf = personality.select('Sex','Perf')
      2 perf.show(5)

NameError: name 'personality' is not defined

[28]: thrill = personality.select('Sex', 'Thrill')
      thrill.show(5)

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-28-386c3c38ef0b> in <module>
----> 1 thrill = personality.select('Sex','Thrill')
      2 thrill.show(5)

NameError: name 'personality' is not defined
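The NameError above most likely arises because the dataset was loaded into a table named pers in cell [3], so the name personality was never defined; replacing personality with pers in these cells should resolve it. Since the dataset is not available here, the full procedure (difference in group means, shuffling the labels, repeating, and computing a p-value) is sketched below in plain Python on made-up scores. The labels, the scores, and the helper names `mean_diff` and `permutation_p_value` are illustrative assumptions, not part of the homework toolkit or the datascience library.

```python
import random

def mean_diff(labels, values):
    """Mean of the alphabetically first group minus mean of the second."""
    groups = sorted(set(labels))
    assert len(groups) == 2, "an A/B test needs exactly two groups"
    a = [v for l, v in zip(labels, values) if l == groups[0]]
    b = [v for l, v in zip(labels, values) if l == groups[1]]
    return sum(a) / len(a) - sum(b) / len(b)

def permutation_p_value(labels, values, reps=1000, seed=0):
    """Share of shuffled differences at least as low as the observed one."""
    rng = random.Random(seed)          # seeded so the sketch is repeatable
    observed = mean_diff(labels, values)
    shuffled = list(labels)            # copy, so the original labels survive
    count = 0
    for _ in range(reps):
        rng.shuffle(shuffled)          # reassign group labels at random
        if mean_diff(shuffled, values) <= observed:
            count += 1
    return observed, count / reps

# Made-up scores: six "Female" and six "Male" respondents.
labels = ["Female"] * 6 + ["Male"] * 6
values = [10, 11, 9, 12, 10, 11, 14, 15, 13, 16, 14, 15]

observed, p_value = permutation_p_value(labels, values)
print(observed, p_value)
```

As with ab_diff, the groups are taken in alphabetical order, so a negative observed difference here means the second group ("Male") has the higher mean in the made-up data. The comparison `<=` tests only the lower tail, matching the p-value cells earlier in the notebook.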