Tutorial Assign 1 - Desc Stats1

HTH SCI 2S03 Tutorial Assignment #1

1. The food frequency questionnaire (FFQ) is a tool often used in epidemiology studies to assess food consumption. A person is asked to write down the number of servings per day typically eaten in the past year of over 100 individual food items. A food-consumption table is then used to compute nutrient intakes for key nutrients (e.g., protein, fat, calories) by aggregating the responses for individual foods. The FFQ is inexpensive to administer, but is considered less accurate than the diet record (DR), the ‘gold standard’ for diet epidemiological studies. For the DR, the participant writes down the amount of each specific food eaten over the past week in a food diary, and a nutritionist uses a special computer program to compute nutrient intakes from the food diaries.

To validate the FFQ, 173 nurses participating in the Nurses’ Health Study completed 4 weeks of diet recording about equally spaced over a 12-month period and an FFQ at the end of diet recording. The Excel data file VALID.xlsx contains the data for this study and is located in the Data Sets in the Contents of A2L. It contains the following measures: saturated fat (grams) from the DR (sfat_dr) and FFQ (sfat_ffq), total fat (grams) from the DR (tfat_dr) and FFQ (tfat_ffq), alcohol consumption (ounces) from the DR (alco_dr) and FFQ (alco_ffq), and total calories from the DR (cal_dr) and FFQ (cal_ffq). Use this data set to answer the questions below.

a. Create an appropriate graphical display to show the distribution of saturated fat data (sfat_ffq) for the FFQ.

A histogram is an appropriate graphical display for the distribution of the saturated fat data (sfat_ffq) from the FFQ. It shows how many saturated fat values fall within each interval, and therefore the overall shape of the distribution.
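A histogram is a picture of binned counts, so the idea behind the answer to part a can be sketched in a few lines of Python. This is only an illustration: the values in `sample` are hypothetical and are not taken from VALID.xlsx, and the helper `bin_counts`, the bin width of 10 and the starting point of 0 are assumptions chosen for the example.

```python
from collections import Counter


def bin_counts(values, width, start):
    """Count how many values fall in each half-open bin [lo, lo + width)."""
    counts = Counter()
    for v in values:
        index = int((v - start) // width)  # index of the bin containing v
        counts[start + index * width] += 1
    return dict(sorted(counts.items()))


# Hypothetical saturated-fat values in grams (NOT the VALID.xlsx data).
sample = [8.2, 12.5, 14.1, 15.9, 17.3, 18.8, 18.8, 19.4,
          21.0, 22.7, 24.6, 27.9, 33.5, 41.2]

histogram = bin_counts(sample, width=10, start=0)

# Print one row of '#' marks per bin, a crude text histogram.
for lo, n in histogram.items():
    print(f"{lo:>3}-{lo + 10:<3} {'#' * n}")
```

With real data, the same counts would feed a charting tool; the longer run of bars to the right of the tallest bin is what a right-skewed distribution looks like.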
b. Compute the most appropriate measure of central tendency and accompanying measure of dispersion for the saturated fat data (sfat_ffq) that you graphed in question a. Explain why you chose these measures and show your calculations.

To determine the most appropriate measure of central tendency and accompanying measure of dispersion for the saturated fat data (sfat_ffq), we need to consider the characteristics of the data and the goal of the analysis. Here I chose the mean and the standard deviation, respectively. Using descriptive statistics, I found the values shown in the table below.

From descriptive statistics
Mean                 21.91561   (measure of central tendency)
Standard Deviation   9.275395   (accompanying measure of dispersion)

Calculations

$$\bar{X} = \frac{\sum X}{N} = \frac{\text{sum of sfat\_ffq}}{173} = \frac{3791.4}{173} = 21.91561$$

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$

where
x_i = each value from the population
μ = mean of the population = 21.91561
N = size of the population = 173

Replacing the values in the equation, we get a standard deviation of 9.275395. Excel was used to carry out this calculation.

Why I chose these measures

Since the data are continuous (measured in grams), the mean is a commonly used measure of central tendency. For data measured on a continuous scale, the standard deviation is the measure of dispersion that accompanies the mean.
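The two formulas in part b can be checked by hand on a small list. The sketch below uses hypothetical values, not the VALID.xlsx data, and computes the standard deviation both with divisor N (the population formula written above) and with divisor N − 1 (the sample formula). Excel's STDEV.P divides by N and STDEV.S by N − 1, so it is worth checking which one a reported value corresponds to.

```python
import math
import statistics

# Hypothetical intake values in grams (NOT the VALID.xlsx data).
values = [12.0, 15.0, 18.0, 21.0, 24.0, 30.0]

n = len(values)
mean = sum(values) / n  # X-bar = (sum of X) / N

# Sum of squared deviations from the mean, shared by both formulas.
squared_deviations = sum((x - mean) ** 2 for x in values)

population_sd = math.sqrt(squared_deviations / n)    # divisor N
sample_sd = math.sqrt(squared_deviations / (n - 1))  # divisor N - 1

print(f"mean = {mean:.4f}")
print(f"population SD = {population_sd:.4f}")
print(f"sample SD = {sample_sd:.4f}")

# Cross-check against the standard library implementations.
assert math.isclose(population_sd, statistics.pstdev(values))
assert math.isclose(sample_sd, statistics.stdev(values))
```

For 173 observations the two versions differ only slightly, but the difference is visible on small samples like this one.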
c. Compute an alternate measure of central tendency for sfat_ffq other than the one you chose for question b. Is this value higher than, lower than, or equal to the measure you calculated for question b? Is the direction of the difference for the two measures as expected, given the graphical display that you created in question a?

Here we can choose the mode as the alternate measure of central tendency for sfat_ffq.

From descriptive statistics
Mode   18.8

The mode is the value that occurs most frequently in a set of data. In this case, the mode is 18.80, which occurs 4 times. The mode (18.80) is lower than the mean (21.92) calculated in question b. This is expected because the data are positively skewed: the distribution has a longer tail to the right, and the relatively few large values in that tail pull the mean upward, while the mode stays at the peak of the distribution. This difference is reflected in the histogram, where the mode falls in the highest bar and the mean lies slightly to the right of that bar.

d. Create an appropriate graphical display to relate alcohol consumption measured using the dietary record (alco_dr) with alcohol consumption measured using the FFQ (alco_ffq).

A scatter plot is an appropriate graphical display. It allows us to see the relationship between the two variables and to identify any patterns or trends.
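The comparison in part c, with the mode below the mean in a positively skewed distribution, can be illustrated with a short Python sketch. The values below are hypothetical and were chosen only to have a long right tail; they are not the sfat_ffq data, although the mode was set to 18.8 to echo the answer above.

```python
import statistics

# Hypothetical right-skewed values in grams (NOT the VALID.xlsx data).
# Most values sit near 19 g; a few large ones form the right tail.
values = [14.0, 16.5, 18.8, 18.8, 18.8, 20.1, 22.4, 25.0, 31.7, 48.9]

mode = statistics.mode(values)      # most frequent value
median = statistics.median(values)  # middle of the sorted values
mean = statistics.mean(values)      # pulled upward by the large values

print(f"mode = {mode}, median = {median}, mean = {mean:.2f}")
```

In this sample the ordering is mode < median < mean, which is the typical pattern when a few large values stretch the right tail.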
[Figure: scatter plot of alco_ffq by alco_dr. Axes: alco_dr and alco_ffq.]

e. Based on your assessment of the graphical display created for question d, do you think that the FFQ is a reasonably accurate approximation of the DR for alcohol consumption? Why or why not?

Yes. The scatter plot comparing alcohol consumption measured using the dietary record (alco_dr) with alcohol consumption measured using the FFQ (alco_ffq) shows a moderate positive correlation between the two measures. The points cluster relatively close to a straight line, which indicates that FFQ values track DR values reasonably well. Based on the graphical display, the FFQ can therefore be considered a reasonably accurate approximation of the DR for alcohol consumption, although further analysis and validation would be needed to reach a definitive conclusion.

2. Using the VALID.xlsx Excel data set described in question 1, answer the following questions relating to total fat consumption (tfat_dr, tfat_ffq).

a. There are 173 values in the data set for tfat_dr. You would like to summarize these data more succinctly, and decide to use a frequency table to do this. Create a frequency table that consists of between five and ten intervals to summarize the tfat_dr data. Show both the frequency and relative frequency for the intervals you have chosen.

Solution

i. Sort the values from smallest to largest in Excel.
ii. Find the range: `=MAX(D:D)-MIN(D:D)` gives 83.93.
iii. Determine the class width: 83.93 / 10 = 8.393, rounded up to 10.
iv. Determine where to start: 30, according to our data.
v. Find the frequency for each interval.
vi. Divide each frequency by the total number of observations (173) to get the relative frequency.

Interval    Frequency   Relative Frequency
30-40           4             2%
40-50          18            10%
50-60          31            18%
60-70          44            25%
70-80          39            23%
80-90          17            10%
90-100         13             8%
100-110         5             3%
110-120         2             1%
Total         173           100%

b. What % of the tfat_dr values in the data set are within 1 standard deviation of the mean for tfat_dr?

Descriptive statistics
Mean                 68.61538
Standard Error       1.23857
Median               68.28
Mode                 49.65
Standard Deviation   16.29084
Sample Variance      265.3915
Kurtosis             0.300981
Skewness             0.544292
Range                83.93
Minimum              35.9
Maximum              119.83
Sum                  11870.46
Count                173

i. The data set has a mean of 68.62 and a standard deviation of 16.29. To find the values that lie one standard deviation from the mean, we add the standard deviation to the mean and subtract it from the mean.
ii. Adding the standard deviation to the mean gives 68.62 + 16.29 = 84.91. In the sorted data, this corresponds to the 151st observation.
iii. Subtracting the standard deviation from the mean gives 68.62 - 16.29 = 52.33. In the sorted data, this corresponds to the 29th observation.
iv. To find out how many values lie between these two observations, we subtract the position of the 29th observation from the position of the 151st observation: 151 - 29 = 122.
v. Since there are 173 observations in the data set, the proportion of values between the 29th and 151st observations is 122/173 = 0.7052, or approximately 70.52%.

c. Create an appropriate graphical display to relate total fat measured using the dietary record (tfat_dr) with total fat measured using the FFQ (tfat_ffq). Does it look like the FFQ is a reasonable approximation of the DR for total fat?

[Figure: scatter plot of tfat_ffq by tfat_dr. Axes: tfat_dr and tfat_ffq.]

Based on the graphical display, there is a strong correlation between total fat measured using the dietary record (DR) and total fat measured using the FFQ. The points are concentrated around the middle of the plot, indicating that the FFQ is a reasonable approximation of the DR for total fat.

3. A study was conducted to examine the association between lead exposure and developmental features in children. The Excel file containing the data on 99 children is
called LEAD_MAXFWT.xlsx and is located in the Data Sets folder under Contents in A2L. Children were divided into two lead exposure groups: group 1 (GROUP=1, n=64) were children who lived less than 25 km away from a lead smelter, and group 2 (GROUP=2, n=35) were children who lived more than 25 km away from a lead smelter. One important outcome measure that was captured was the Wechsler IQ score (IQF). Use these data to answer the questions below.

a. Create one box and whisker plot comparing the distribution of the IQF scores for the two groups of children. Does there appear to be a significant difference between IQ scores for the two groups?

First we need to separate the IQF values in column 1 into Group 1 and Group 2. We will use the formulas `=IF(B2=1, A2, "")` and `=IF(B2=2, A2, "")`, returning an empty string rather than 0 for the rows that belong to the other group so that placeholder zeros are not plotted as IQ scores.

Yes, there appears to be a difference. In the box and whisker plot, the maximum and minimum values for the two groups are notably different, indicating a disparity in IQ scores. A plot alone cannot establish statistical significance, so a formal test would be needed to confirm the difference.

b. As discussed in Session 1, an outlier can be considered any observation that lies above Q3 + 1.5 x IQR or below Q1 - 1.5 x IQR. Using this definition of an outlier, how many outliers exist in the IQF data (ignore groups)? Show your calculations.

To determine the outliers in the IQF data, we need to calculate the upper and lower bounds using the formulas:

Upper bound = Q3 + 1.5 * IQR
Lower bound = Q1 - 1.5 * IQR

Find Q1 (first quartile), Q3 (third quartile), and IQR (interquartile range).

Q1 = 84.5
Q3 = 99.5
IQR = Q3 - Q1 = 99.5 - 84.5 = 15

Next, we can calculate the upper and lower bounds:

Upper bound = 99.5 + 1.5 * 15 = 99.5 + 22.5 = 122
Lower bound = 84.5 - 1.5 * 15 = 84.5 - 22.5 = 62

An observation greater than 122 or less than 62 is an outlier. From the provided data, we can see the following outliers:

125, 128, 141 (greater than the upper bound of 122)
46, 50, 56 (less than the lower bound of 62)

Thus, there are 6 outliers in the IQF data.

4. Determine the data types or measurement scales for the following:

a. Determine whether the following variables are discrete or continuous:

i. Length of stay for hospitalizations (in days): Continuous
ii. Any emergency department visit in last 6 months (Yes/No): Discrete
iii. HbA1c level (in %): Continuous
iv. Height (in cm): Continuous
v. Concentration of mercury in the blood (ug/L): Continuous
vi. Pulse rate (beats per minute): Continuous
vii. Temperature (degrees Celsius): Continuous
viii. Pain scale (0-9): Discrete
ix. Grade level (A+, A, A-,…C-): Discrete
x. Diabetes ‘cut-off’ (non-diabetic: fasting glucose < 110 mg/dL; diabetic: fasting glucose ≥ 110 mg/dL): Discrete
xi. Number of missing teeth: Discrete
xii. Arm length (cm): Continuous

b. Determine whether the following variables are nominal, ordinal, interval or ratio:

i. Ethnicity (e.g., German, Italian, Iranian): Nominal
ii. Temperature (degrees Kelvin): Ratio (the Kelvin scale has a true zero)
iii. pH level (degree of acidity in a solution): Interval
iv. Satisfaction rating (strongly like, like,…strongly dislike): Ordinal
v. Income (in dollars): Ratio
vi. Cause of death (injury, chronic illness, infection etc.): Nominal
vii. BMI category (underweight, normal, overweight, obese): Ordinal
viii. Fasting glucose level (mg/dL): Ratio (a concentration of 0 mg/dL is a true zero)
ix. Socio-economic status (low, middle, upper): Ordinal
x. Blood group (e.g., A, B, AB): Nominal
xi. Disease symptoms (e.g., fever, pain, incontinence): Nominal
xii. Disease severity (mild, moderate, severe): Ordinal
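The outlier rule from question 3b can be sketched in Python as a quick check of the arithmetic. The quartiles Q1 = 84.5 and Q3 = 99.5 come from the answer to 3b. The helper name `tukey_fences` and the list `scores` are assumptions for illustration: `scores` holds the six values flagged in 3b plus a few hypothetical in-range values, not the full LEAD_MAXFWT.xlsx data.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) outlier bounds: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles reported in the answer to question 3b.
lower, upper = tukey_fences(84.5, 99.5)

# The six flagged values from 3b plus hypothetical in-range scores.
scores = [46, 50, 56, 70, 85, 92, 99, 110, 125, 128, 141]

# Keep only the values that fall outside the two bounds.
outliers = [s for s in scores if s < lower or s > upper]

print(f"lower bound = {lower}, upper bound = {upper}")
print(f"outliers = {outliers}")
```

Both bounds use the same multiple of the IQR (1.5 * 15 = 22.5), which is why the lower bound is 84.5 - 22.5 = 62.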
References:

The questions in this tutorial assignment were adapted from practice questions and companion data sets from Chapter 2 of the textbook by Rosner, B. (2011). Fundamentals of Biostatistics, 7th Edition. Brooks/Cole Cengage Learning, Boston, MA.