Lab Report Statistical Analysis on Genetic Data - Rayana Rankoussi (1)
School: Northern Virginia Community College
Course: 311
Subject: Statistics
Date: Apr 3, 2024
Rayana Rankoussi
Lab Report : Statistical Analysis on Genetic Data
02/15/2024
Table 4-1 served as the basis for the data analysis and calculations in the tables that follow. It contained information taken directly from the provided ears of corn. Unlike the other tables, it had no predetermined or expected data; it functioned primarily as a repository for the raw data collected during the experiment.
Table 4-2, like Table 4-1, supplied the data for the later tables' calculations. It compiled information gathered directly from the provided ears of corn. As with Table 4-1, it included no expected data and simply presented the collected data in a structured format for further analysis.
4-3a
For this table the expected value was 64 for each category, which I found by dividing the total corn count by the number of categories: 256 total corn / 4 categories. Table 4-3a therefore tests a 1:1:1:1 ratio, with expected counts of 64:64:64:64 for red smooth, red wrinkled, yellow smooth, and yellow wrinkled corn. The observed values were off by a bit; for instance, Red and Smooth had an observed total of 70. Although the observed and expected counts were not exactly equal, the differences were small. The decision to “fail to reject” depends on the size of the chi-square total, not on whether each observed count is below or above its expected count: a small total means the deviations are consistent with chance. The chi-square total for this table was 2.37, so we chose to fail to reject.
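As a worked illustration of the arithmetic in this section, using only the figures quoted in the text (the symbols O and E for observed and expected counts are notation introduced here, not taken from the lab handout):

```latex
E_i = \frac{256}{4} = 64, \qquad
\chi^2 = \sum_{i=1}^{4} \frac{(O_i - E_i)^2}{E_i}, \qquad
\frac{(70 - 64)^2}{64} = \frac{36}{64} \approx 0.56 \quad \text{(Red and Smooth term)}
```

The Red and Smooth category alone therefore contributes about 0.56 to the chi-square total; the other three terms are computed the same way from their observed counts.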
4-3b
Similarly, in 4-3b the total count of the small sample of corn was 30. To determine the expected values, I divided the total count by the number of categories, giving an expected 1:1:1:1 ratio for red smooth, red wrinkled, yellow smooth, and yellow wrinkled corn, with each category ideally having 7.5 instances. The observed values deviated slightly from this. The decision to "fail to reject" rests on the chi-square total being small, not on whether individual observed counts fall below the expected counts. The lack of a significant difference gives no grounds to reject, while acknowledging the possibility of errors in data collection such as miscounting or miscalculating.
Comparing 4-3a and 4-3b, the larger sample came closer to the expected data. For example, the Yellow and Smooth data was off by a calculated 3.07. The total chi-square for the large sample was 2.37, versus 3.96 for the small sample. Both totals are small enough to fail to reject, and the lower value for the larger sample supports the conclusion that it matched the expected data more closely.
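The comparison of the two chi-square totals can be sketched in code. This is a minimal illustration, not the lab's own procedure: it assumes 3 degrees of freedom (four categories minus one) and a 0.05 significance level, neither of which is stated in the report, and the function name `chi2_sf_df3` is made up for this example.

```python
import math

def chi2_sf_df3(x: float) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with 3 degrees of freedom."""
    # Closed form valid for df = 3 only: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

ALPHA = 0.05  # assumed significance level; the report does not state one

# Chi-square totals quoted in the report for Tables 4-3a and 4-3b
totals = {"4-3a (n = 256)": 2.37, "4-3b (n = 30)": 3.96}

for table, chi2 in totals.items():
    p = chi2_sf_df3(chi2)
    decision = "reject" if p < ALPHA else "fail to reject"
    print(f"{table}: chi-square = {chi2}, p = {p:.3f} -> {decision}")
```

Under these assumptions both probabilities are well above 0.05, which agrees with the decision to fail to reject both tables.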
4-4a
The results in this table were based on Ear #2 (large sample). The total number of corn counted was 242. To get the expected values, we used the ratio 9:3:3:1: each part of the ratio was divided by the total of 16 and then multiplied by the total count. For instance, the expected Red and Smooth count was 242 × 9/16 ≈ 136.1, which is compared against the 128 observed Red and Smooth.
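The expected counts for the 9:3:3:1 ratio can be reproduced with a short sketch. The report only states that the 9 belongs to Red and Smooth; assigning 3:3:1 to red wrinkled, yellow smooth, and yellow wrinkled is an assumption made here, as are the names `RATIO` and `chi2_term`.

```python
# Expected counts under a 9:3:3:1 ratio for the 242 kernels counted on Ear #2.
RATIO = {
    "red smooth": 9,       # stated in the report
    "red wrinkled": 3,     # assumed assignment
    "yellow smooth": 3,    # assumed assignment
    "yellow wrinkled": 1,  # assumed assignment
}
TOTAL = 242

parts = sum(RATIO.values())  # 9 + 3 + 3 + 1 = 16
expected = {phenotype: TOTAL * r / parts for phenotype, r in RATIO.items()}

for phenotype, count in expected.items():
    print(f"{phenotype}: expected {count:.3f}")

def chi2_term(observed: float, expected_count: float) -> float:
    """Contribution of one category to the chi-square total."""
    return (observed - expected_count) ** 2 / expected_count

# Only the Red and Smooth observed count (128) is quoted in the report.
red_smooth_term = chi2_term(128, expected["red smooth"])
print(f"red smooth contribution: {red_smooth_term:.3f}")
```

The other three contributions would be computed the same way from the observed counts recorded in the table.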
The data I compiled were very similar to the expected values, with each observed count roughly matching its expected count. We failed to reject the null hypothesis because the observed data were not far from the expected data: the chi-square total for this table was 2.13.
4-4b
For this table the situation was different and led to rejecting the null hypothesis. The a priori chi-square test used an expected value of 60.5 for each category, and the observed values were far from it. Although the sample was large, the hypothesized 1:1:1:1 ratio was far from the ratio we observed. The null hypothesis was rejected because the chi-square total of 114.2 corresponds to a probability well below 0.01 on the table of chi-square values. Errors in data collection, such as miscounting or miscalculating, could also have contributed to this rejection.
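The numbers behind this rejection can be laid out as a short worked check. It assumes the same 242 kernels from Ear #2 (242 / 4 matches the expected value of 60.5) and 3 degrees of freedom; the critical value 11.345 is the standard chi-square table entry for 3 degrees of freedom at the 0.01 level.

```latex
E_i = \frac{242}{4} = 60.5, \qquad
\chi^2 = 114.2 \;>\; \chi^2_{0.01,\,3} = 11.345
\quad \Longrightarrow \quad p < 0.01
```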
The difference in probabilities can be viewed in a number of ways, but in terms of correct versus false hypotheses, Table 4-4a tested the correct expected ratio, whereas Table 4-4b tested a false hypothesis based on a false ratio. We expected to "Fail to reject" table 4-4b and "Reject" table 4-4a; in the end it was the complete opposite.
4-5
This table was a contingency chi-square analysis. The null hypothesis I formulated was: “The number of bead colors and bag number are independent of each other.” The expected value for each cell was found by multiplying the column total by the row total and dividing by the Group Total, so there is a different expected value for each observed value. The expected values were far from our observations, so we proceeded to reject the null hypothesis. There could be a number of reasons for the rejection, but I think the most likely mistake was a miscalculation, as the table was very large and had a high chance of error. There could also have been a simple miscount of the data we received.
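The expected-value rule described above (row total times column total, divided by the Group Total) can be sketched as follows. The bead counts below are hypothetical and chosen only to keep the arithmetic easy; they are not the data from Table 4-5, and the function names are illustrative.

```python
def expected_counts(observed):
    """Expected cell counts: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over every cell."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

# HYPOTHETICAL bead counts (rows = bags, columns = colors), not the report's data.
observed = [
    [10, 20, 30],
    [20, 20, 20],
]

expected = expected_counts(observed)
statistic = chi_square(observed, expected)
# Degrees of freedom for a contingency table: (rows - 1) * (columns - 1)
dof = (len(observed) - 1) * (len(observed[0]) - 1)

print("expected:", expected)
print(f"chi-square = {statistic:.3f}, degrees of freedom = {dof}")
```

Because every cell gets its own expected value, a larger table means more cells to compute, which is where a calculation slip is most likely to enter.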
4-6a
For this contingency table the null hypothesis was: “The texture and color of Corn Ear #1 are independent of each other.” We failed to reject this null hypothesis. The expected values were calculated in the same way as in table 4-5, and this time they were much closer to what we had observed. The chi-square value was 1.62, which falls at a probability of around 0.20-0.10 on the chi-square table, supporting the decision to fail to reject.
4-6b
The last table was a contingency chi-square analysis of Corn Ear #2. The expected values were found in the same way as in the previous two tables: multiplying the column total by the row total and dividing by the Group Total. These values were different for each observed value and were almost the same as the observed values. The null hypothesis was: “The texture and the color of Corn Ear #2 are independent of each other.” The chi-square value was 0.48, which is very low and gives a probability of around 50-30%. These results led us to fail to reject the null hypothesis.
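The probabilities read from the chi-square table for the two corn contingency tables can be cross-checked with a short sketch. It assumes each table is 2 × 2 (two colors by two textures), which gives 1 degree of freedom; the report does not state the degrees of freedom, and `chi2_sf_df1` is a name made up for this example.

```python
import math

def chi2_sf_df1(x: float) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with 1 degree of freedom."""
    # For df = 1 the chi-square variable is a squared standard normal,
    # so the tail probability reduces to erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2))

# Chi-square values quoted in the report for Tables 4-6a and 4-6b
values = {"4-6a (Ear #1)": 1.62, "4-6b (Ear #2)": 0.48}

for table, chi2 in values.items():
    print(f"{table}: chi-square = {chi2}, p = {chi2_sf_df1(chi2):.3f}")
```

Under the 1-degree-of-freedom assumption, both probabilities are far above the usual 0.05 cutoff, consistent with failing to reject independence of texture and color for both ears.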