Stat 217 - Homework 6
Parker Balestri
Due: Friday, March 4th, 2022 by 11pm in Gradescope

Background

The Oxford English Dictionary defines plagiarism as “the practice of taking someone else’s work or ideas and passing them off as one’s own.” Plagiarism is an unfortunate, but very complex, problem on college campuses, both nationwide and around the world. There are many motivations for this form of cheating, many of which have been extensively examined by researchers, including sex/gender, work ethic, institutional factors (how plagiarism is handled by universities and their faculty), and contextual factors (peer approval/disapproval or perceived severity of punishments). Some researchers also posit that there are cultural components as well, such as how external factors like honor codes are valued.

Researchers in Europe wanted to further explore motivations for students to commit plagiarism. In particular, they wanted to examine how easy access to and increased use of technology may influence the decision to plagiarize. They sampled 485 college student volunteers in Central Europe during the 2017/2018 academic year and asked them to complete a survey consisting of several general demographic questions (age, area of study, whether they had a job, whether they had a scholarship, whether their classes were primarily blended or in person) and 56 “motivation to plagiarize” questions. For each motivation question, the student was provided a statement and asked whether they strongly disagreed, disagreed, were unsure, agreed, or strongly agreed. These responses were presented on a Likert scale, where 1 = “strongly disagree” and 5 = “strongly agree.” After removal of incomplete records, there were 417 respondents.
For this analysis, we will examine a potential relationship between student opinions regarding the statement “It is easy for me to copy/paste due to contemporary technology” and the statement “I am afraid to fail.” Variables of interest in this dataset include the following:

Area: The area of science a student’s degree program fell under (Technological, Social, Natural)
Work: Whether a student had a job at the time of the study (Yes or No)
Method: Whether a student attended via blended learning (Blended learning) or fully in-person (Classic learning)
Scholarship: Whether the student was receiving a scholarship at the time of the study (Yes or No)
Pressure: How much a student agreed with the statement “I am afraid to fail,” measured on a Likert scale with five possible responses ranging from “Strongly disagree” to “Strongly agree”
ICT: How much a student agreed with the statement “It is easy for me to copy/paste due to contemporary technology,” measured on a Likert scale with five possible responses ranging from “Strongly disagree” to “Strongly agree.” (Note: ICT stands for “Information Communication Technology”)

Data courtesy of: Jereb E, Perc M, Lämmlein B, Jerebic J, Urh M, Podbregar I, et al. (2018) Factors influencing plagiarism in higher education: A comparison of German and Slovene students. PLoS ONE 13(8): e0202252.

Instructions

Include all output and generated plots to receive full credit for each question. Be sure to bold your answers to questions where relevant, and write in complete sentences. Answers to multiple choice questions may be indicated by typing the selected letter below the choices and bolding that. You should spell check your RMD file by going to “edit-> check spelling”. Also, be sure to proofread your knitted word document to ensure all images and formatting are correctly implemented. Once finished, save as a pdf and submit to the appropriate assignment folder in Gradescope. If you work in a group, please make sure ALL group member names are listed in line 3 and in the Gradescope submission!!
    ## Code to read in data
    library(readr)
    library(tidyverse)
    cheat <- read_csv("plagiarism.csv")
    # Convert the coded survey columns to labeled factors and keep complete records
    cheat_sub <- cheat %>%
      mutate(
        Area = factor(Area, labels = c("Technological", "Social", "Natural")),
        Work = factor(Work, labels = c("Yes", "No")),
        Method = factor(Method, labels = c("Classic learning", "Blended learning")),
        Scholarship = factor(Scholarship, labels = c("Yes", "No")),
        Pressure = factor(VAR_5_6, labels = c("Strongly disagree", "Disagree", "Unsure",
                                              "Agree", "Strongly agree")),
        ICT = factor(VAR_1_1, labels = c("Strongly disagree", "Disagree", "Unsure",
                                         "Agree", "Strongly agree"))
      ) %>%
      select(c(Area, Work, Method, Scholarship, Pressure, ICT)) %>%
      drop_na()

Variables and Analysis

1. (2pts) Thinking about the types of variables of interest (categorical or quantitative), why are Chi-Square methods appropriate to analyze these data?

Both the ICT and Pressure variables are categorical. Chi-square methods are appropriate for these categorical responses.

2. (2 pts) Why is the Chi-Square Test of Independence the most appropriate test for this research question? Select all that apply.

A. There is one large sample from a single population.
B. Both the explanatory and response variables are categorical.
C. We have several samples from multiple populations.
D. The motivation prompts were randomly assigned.

Exploratory Data Analysis

3. (4 pts) A tableplot allows us to visually explore trends/patterns across multiple variables at once. Run the code below to create a tableplot for these data. Referencing the plot created, select all of the following statements that are true:

    library(remotes)
    remotes::install_github("mtennekes/tabplot") # Comment out before knitting
    library(tabplot)
    cols_b <- c("#053061", "#2166AC", "#4393C3", "#92C5DE", "#D1E5F0")
    set.seed(101120213)
    tableplot(cheat_sub,
              select = c(Area, Work, Method, Scholarship, Pressure, ICT),
              pals = list(cols_b), sample = FALSE, nBins = dim(cheat_sub)[1],
              colorNA_num = "orange", numMode = "MB-ML")
A. Natural science students were more likely to have jobs at the time of the study than social science students.
B. Social science students were more likely to be in blended learning classes and have jobs at the time of the study than either natural or technological sciences students.
C. Natural and technological sciences students were more likely to agree with the statement “It is easy for me to copy/paste due to contemporary technology” than social science students.
D. Natural science students were more likely to disagree with the statement “Afraid to fail” than technological science students.

4. (2pts) The code below will create a plot for these data. Based on this plot, is there evidence of a relationship between a student’s opinion of the ease of copy/pasting and how much they rated themselves as “afraid to fail”? Reference features of this plot to justify your answer.

Those who agree with the Pressure statement are more likely to agree with the ICT statement.

    library(mosaic)
    plot(tally(~ Pressure + ICT, data = cheat_sub), las = 1, main = "Plagiarism Motivations")
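The pattern a mosaic plot displays can also be read as numbers. The following is a minimal sketch in base R, assuming the counts reported in the contingency table for question 6; the names `lvls`, `counts`, and `row_props` are illustrative and not part of the assignment code. It computes the share of each ICT response within each Pressure level, which is what the tile heights of the plot encode.

```r
# Counts copied by hand from the contingency table in question 6
# (rows = Pressure, columns = ICT). Object names are illustrative.
lvls <- c("Strongly disagree", "Disagree", "Unsure", "Agree", "Strongly agree")
counts <- matrix(
  c(1,  2,  4, 29, 21,
    5,  6,  5, 44, 24,
    2,  6, 15, 39, 17,
    2, 12, 23, 64, 52,
    2,  1, 10, 16, 15),
  nrow = 5, byrow = TRUE,
  dimnames = list(Pressure = lvls, ICT = lvls)
)

# Proportion of each ICT response within each Pressure level (each row sums to 1)
row_props <- prop.table(counts, margin = 1)
round(row_props, 3)
```

If the rows of `row_props` look alike, the ICT distribution barely changes with Pressure, which would suggest little evidence of a relationship. Rows that differ noticeably point the other way.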
Hypothesis Testing

5. (2 pts) Which is the correct alternative hypothesis for the research question? Select one.

A. The population distributions of opinion regarding the ease of copy/pasting vary across levels of opinion about fear of failing for the college students studied.
B. There is a relationship between opinion regarding the ease of copy/pasting and opinion about fear of failing for the population of European college students in the 2017/2018 academic year.
C. The population distributions of levels of opinions about fear of failing is the same regardless of opinion about fear of failing.
D. There is no association between levels of opinions about fear of failing and opinion regarding the ease of copy/pasting for the population of European college students in the 2017/2018 academic year.
6. (2 pt) The code below will create a contingency table with totals to summarize the variables of interest for the research question. Use this to calculate the expected number of students who strongly agree that they are afraid to fail but who are unsure about the ease of copy/pasting another person’s work, under the assumption of the null hypothesis. Do not use the table of expected counts to find this value; show all work based on values in the contingency table.

    library(mosaic)
    tally(~ Pressure + ICT, data = cheat_sub, margins = TRUE)

    ##                    ICT
    ## Pressure            Strongly disagree Disagree Unsure Agree Strongly agree
    ##   Strongly disagree                 1        2      4    29             21
    ##   Disagree                          5        6      5    44             24
    ##   Unsure                            2        6     15    39             17
    ##   Agree                             2       12     23    64             52
    ##   Strongly agree                    2        1     10    16             15
    ##   Total                            12       27     57   192            129
    ##                    ICT
    ## Pressure            Total
    ##   Strongly disagree    57
    ##   Disagree             84
    ##   Unsure               79
    ##   Agree               153
    ##   Strongly agree       44
    ##   Total               417

(44 × 57) / 417 = 6.0144

7. (2 pts) Calculate the standardized residual for students who strongly agree that they are afraid to fail but who are unsure about the ease of copy/pasting another person’s work. Show all work (do not use the table of standardized residuals to find this value).

(10 - 6.01438) / √6.01438 = 1.625

8. (2 pts) Calculate the contribution to the X^2 statistic for students who strongly agree that they are afraid to fail but who are unsure about the ease of copy/pasting another person’s work. Show all work.

X^2 contribution = 1.625^2 = 2.641
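The hand calculations in questions 6 to 8 can be checked with a few lines of base R. This is a sketch rather than part of the assignment: it assumes the counts are typed in from the contingency table above, and the names `counts`, `expected`, `std_resid`, and `contrib` are illustrative.

```r
# Counts copied by hand from the contingency table above
# (rows = Pressure, columns = ICT). Object names are illustrative.
lvls <- c("Strongly disagree", "Disagree", "Unsure", "Agree", "Strongly agree")
counts <- matrix(
  c(1,  2,  4, 29, 21,
    5,  6,  5, 44, 24,
    2,  6, 15, 39, 17,
    2, 12, 23, 64, 52,
    2,  1, 10, 16, 15),
  nrow = 5, byrow = TRUE,
  dimnames = list(Pressure = lvls, ICT = lvls)
)

# Expected count under independence: (row total * column total) / grand total
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# (observed - expected) / sqrt(expected), the residual used in question 7
std_resid <- (counts - expected) / sqrt(expected)

# Each cell's contribution to the X^2 statistic is its squared residual
contrib <- std_resid^2

round(expected["Strongly agree", "Unsure"], 4)   # 6.0144
round(std_resid["Strongly agree", "Unsure"], 3)  # 1.625
round(contrib["Strongly agree", "Unsure"], 3)    # 2.641

# Smallest expected count in the table, relevant when checking test conditions
min(expected)
```

R's `chisq.test()` returns the same residual in its `residuals` component (the Pearson residuals), and `sum(contrib)` reproduces the overall X^2 statistic. The smallest expected count, 44 × 12 / 417 ≈ 1.27, belongs to the cell for students who strongly agree with the Pressure statement and strongly disagree with the ICT statement.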
9. (1 pts) True/False: A non-parametric hypothesis test would be appropriate as all of the expected counts are greater than 0, we have two categorical variables, and there is no evidence that observations (students) are dependent on one another.

True

Regardless of your previous answer, let’s perform the permutation test for these data. The code below will simulate and plot the permutation distribution.

    set.seed(1009210871)
    # Observed X^2 statistic
    Tobs <- chisq.test(tally(~ Pressure + ICT, data = cheat_sub))$statistic
    B <- 1000
    Tstar <- matrix(NA, nrow = B)
    set.seed(2020)
    # Shuffle ICT to break any association with Pressure, then recompute X^2
    for (b in 1:B) {
      Tstar[b] <- chisq.test(tally(~ shuffle(ICT) + Pressure,
                                   data = cheat_sub, margins = FALSE))$statistic
    }
    tibble(Tstar) %>%
      ggplot(aes(x = Tstar)) +
      geom_histogram(fill = "grey", col = "black", bins = 21, aes(y = ..ncount..)) +
      geom_density(fill = "blue", alpha = 0.1, aes(y = ..scaled..)) +
      theme_light() +
      labs(y = "Density", title = "Permutation Distribution", x = expression(X^2)) +
      stat_bin(aes(y = ..ncount.., label = ..count..), bins = 21,
               geom = "text", vjust = -0.5) +
      geom_vline(xintercept = c(Tobs), col = "red", lwd = 1) +
      theme(plot.title = element_text(hjust = 0.5, face = "bold"))
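A permutation p-value is the proportion of shuffled X^2 statistics that are at least as large as the observed one. The sketch below shows that calculation in base R under stated assumptions: since `plagiarism.csv` is not reproduced here, it rebuilds one row per respondent from the counts in the question 6 contingency table, and it uses `sample()` in place of `mosaic::shuffle()`. The names `resp` and `chisq_stat` are illustrative, and the shuffles will not match the assignment's run exactly, so the result is an estimate only.

```r
# Counts copied by hand from the question 6 contingency table
# (rows = Pressure, columns = ICT). Object names are illustrative.
lvls <- c("Strongly disagree", "Disagree", "Unsure", "Agree", "Strongly agree")
counts <- matrix(
  c(1,  2,  4, 29, 21,
    5,  6,  5, 44, 24,
    2,  6, 15, 39, 17,
    2, 12, 23, 64, 52,
    2,  1, 10, 16, 15),
  nrow = 5, byrow = TRUE,
  dimnames = list(Pressure = lvls, ICT = lvls)
)

# Rebuild one row per respondent by repeating each (Pressure, ICT) pair
long <- as.data.frame(as.table(counts))
resp <- long[rep(seq_len(nrow(long)), long$Freq), c("Pressure", "ICT")]

# X^2 statistic for a pair of categorical vectors
# (warnings about small expected counts are suppressed)
chisq_stat <- function(x, y) {
  unname(suppressWarnings(chisq.test(table(x, y))$statistic))
}

Tobs <- chisq_stat(resp$Pressure, resp$ICT)

set.seed(2020)
B <- 1000
# Permute ICT to mimic the null hypothesis of no relationship
Tstar <- replicate(B, chisq_stat(resp$Pressure, sample(resp$ICT)))

# Proportion of permuted statistics at least as extreme as the observed one
p_value <- mean(Tstar >= Tobs)
p_value
```

In the assignment's own code the equivalent step would be `mean(Tstar >= Tobs)` applied to the `Tstar` and `Tobs` objects it already creates. On the plot, this is the share of the histogram lying at or to the right of the red line.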
10. (2pts) Based on the graph of the permutation distribution above, which of the following best estimates the p-value for this study?

A. 0.920
B. 0.08
C. 0.095
D. 1.4068

Conclusion

11. (2 pts) Which conclusion regarding the null hypothesis is the most appropriate for these data, in the context of this study? Select one.

A. There is evidence of a relationship between opinion regarding the ease of copy/pasting and opinion about fear of failing for the population of European college students.
B. There is little evidence against that the population distributions of opinion regarding the ease of copy/pasting and opinion about fear of failing for the population of European college students in the 2017/2018 academic year.
C. There is very strong evidence that the observed distribution of opinion regarding the ease of copy/pasting and opinion about fear of failing for the population of European college students in this study.
D. There is moderate evidence against there being no relationship between opinion regarding the ease of copy/pasting and opinion about fear of failing for the population of European college students in the 2017/2018 academic year.
E. There is some evidence that the population distributions of opinion about fear of failing and opinion regarding the ease of copy/pasting for all college students.

12. (1 pt) True or false: Since we randomly sampled college students, we can generalize back to all college students in the 2017/2018 academic year.

False. The respondents were volunteers rather than a random sample, so the results cannot be generalized to all college students.

General Scenario Questions

For each of the following scenarios, determine whether a Chi-Square test of Independence or Homogeneity should be performed. Explain/justify your choice for each.

13. (2 pts) In an experiment conducted in Finland, researchers investigated whether the regular use of chewing gum containing xylitol could reduce the risk of a middle ear infection for children in daycare centers (Uhari et al., 1998). The investigators randomly divided 533 children in daycare centers into three groups: one group regularly chewed gum that contained xylitol, another group regularly took xylitol lozenges, and the third group regularly chewed gum that did not contain xylitol. The experiment lasted for 3 months and for each child the researchers recorded whether the child had an ear infection during that period. (Uttz, 2013)

A chi-square test of homogeneity should be performed because the data are collected separately for three treatment groups (xylitol gum, xylitol lozenges, and gum without xylitol) and the distribution of ear infections is compared across them.

14. (2 pts) 1500 participants in a study about alcohol abuse by college students at an unnamed university were asked how often they experienced hangover symptoms after a night of drinking within the last year (a list was provided). Responses were categorized as 0 times, 1-2 times, 3-11 times, 12-51 times, and ≥52 times. The researchers want to know if the responses differ between students who live on campus and those who live off campus.

A chi-square test of independence should be performed because the observational units come from a single sample of one population and two categorical variables are recorded for each unit.
15. (2 pts) Frequent computer use may be a cause of carpal tunnel syndrome, a painful condition that affects people’s hands and wrists. Researchers hypothesized that people who took typing classes in school would be less likely to suffer from carpal tunnel syndrome than those who did not take typing classes. To examine this hypothesis, the researchers asked a random sample of 3000 people who frequently used computers in their jobs whether they had ever taken typing classes and whether they had ever suffered from carpal tunnel syndrome. (Uttz, 2013)
A chi-square test of independence should be performed because observational units are collected randomly from one population and there are two categorical variables for each unit.

16. (2 pts) In a 1989 study, researchers wanted to test the efficacy of treatments for cocaine addiction. They assigned 72 cocaine abusers to one of three treatments for six weeks. 24 people received lithium carbonate (considered the “usual treatment”), 24 received an antidepressant drug, and the remaining 24 received a placebo. After the six week trial period, participants were classified as having relapsed or not.

A chi-square test of homogeneity should be performed because participants were assigned to three separate treatment groups of fixed size and the relapse outcome is compared across those groups.

17. (5 pts) Please list anyone that you worked on any part of this activity with if their name is not already part of this group’s submission. For tutors, give their name and affiliation (MSC, SmartyCats, etc.). For fellow students, give their name and section number. If you did not utilize any outside resources, simply enter “NA.”

N/A