HW2_SOHAIB_KHURAM_09122023

School

Virginia Commonwealth University

Course

500

Subject

Mechanical Engineering

Date

Dec 6, 2023

Type

docx

Pages

7

Uploaded by SohaibK7

BIOS 500 Homework 2
Name: Sohaib Khuram

Although you are encouraged to work together in groups, collaboration does not mean copying a classmate's response to a question. What you turn in must represent your understanding of the concepts learned.

1. Choose the correct answer.
(a) Which requires the smallest sample for rare diseases: case-control, cohort or RCT?
(b) Which is better suited for rare exposures: case-control or cohort?

2. Read the following paper published in the Annals of Internal Medicine: https://www.acpjournals.org/doi/pdf/10.7326/0003-4819-134-10-200105150-00009
(a) PICOT is an acronym for population, intervention, comparison, outcome and time. P for population. Why is it inadequate to simply say that the population in this study consists of actors and actresses?
Because the target population is actors and actresses in movies that were nominated for Academy Awards, and the comparison group is other actors in the same film who were not nominated. Performers in high-profile movies are a different population from actors as a whole. The health behaviors of someone who acts in home movies or small independent projects are likely to differ vastly from those of Academy Award-winning stars.
(b) I for intervention. Was there an intervention in the study? If 'yes', identify the intervention. If 'no', identify the exposure.
There is no controlled intervention. Instead there is an exposure, which is being an actor in a movie that won an Academy Award.
(c) C for comparison. Describe the groups that were compared.
Other actors in an Academy Award-winning movie who did not receive any awards or recognition for their role.
(d) O for outcome. What was the outcome?
All-cause mortality and life expectancy.
(e) T for time. What was baseline time? How long were the subjects tracked?
This was a retrospective cohort analysis, and the median duration of follow-up was 66 years, which suggests that subjects were tracked from birth for a median of 66 years.
(f) The researchers compared the life expectancy of the two groups and found that life expectancy was 3.9 years longer for Academy Award nominees than for other, less recognized performers (79.7 vs 75.8 years; p = 0.003). Even after adjusting for birth year, sex, ethnicity, birth country, possible name change, age at release of first film, and total films in career, the survival advantage of Academy Award nominees remained unchanged. It is very important to pay attention to the role of time in every longitudinal study, as that understanding of time may affect the data analysis and thus the study conclusions. Briefly explain why the results of this published research are questionable. (This is similar to the heart transplant example discussed in the slides.)
One possible consideration is survivorship bias: the behaviors that led an individual to become an Academy Award nominee may be the deciding factor, rather than the nomination itself. In particular, nominees must have lived long enough to be nominated, so counting their pre-nomination years as nominee survival time gives them a built-in advantage. Another consideration is that the study tracks health outcomes for over 60 years, so changes in technology and health practices over that period might account for some of the difference in life expectancy.

3. Shape of distribution. Do Problem 2.98b (p. 76).
(a) The distribution is right-skewed.

4. (a) To show the distribution of a categorical variable (e.g., race), we draw bar graphs or pie charts.
(b) To show the distribution of a quantitative variable (e.g., body temperature), we draw box plots, dot plots, stem-and-leaf plots or histograms.

5. Quantitative variables may be ratio or interval. Shoe size is an interval variable. Why is shoe size not a ratio variable?
(a) While the differences between shoe sizes form an established interval, there is no "true zero" shoe size that indicates the absence of size.

6. Relative-frequency histograms are better than frequency histograms when comparing two data sets. Why?
A relative-frequency histogram normalizes the data, so each bar shows a class's proportion of the total count. This makes visual comparison easier and avoids the pitfalls of comparing raw counts from data sets of very different sizes.
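The normalization described in the answer to question 6 can be sketched in a few lines of Python. This is a minimal illustration, not part of the assignment: the function name `relative_frequencies` and both data sets are hypothetical, chosen so that the two samples have the same shape but very different sizes.

```python
from collections import Counter


def relative_frequencies(values):
    """Return each class's share of the total count (shares sum to 1)."""
    counts = Counter(values)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Hypothetical data: same shape, very different sample sizes.
small_sample = ["low"] * 2 + ["mid"] * 6 + ["high"] * 2        # n = 10
large_sample = ["low"] * 200 + ["mid"] * 600 + ["high"] * 200  # n = 1000

small_rel = relative_frequencies(small_sample)
large_rel = relative_frequencies(large_sample)

# Raw counts differ by a factor of 100, but the proportions match.
print(small_rel)
print(large_rel)
```

On raw counts the larger sample would dwarf the smaller one on a shared axis; after dividing by each sample's total, the two sets of bars are directly comparable.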

7. What is the main point of Example 2.24 in the textbook (p. 74)?
(a) To show the power of sampling. We cannot always know the population distribution, but the example shows that random samples taken from a known population distribution reproduce the same trend. In other research we can therefore rely on sampling part of the population, rather than the whole, to understand trends in the data more easily.

8. Does the graph below suggest old people drive safer? Why or why not? How do you fix the graph?
It is inappropriate to conclude from this graph that old people drive safer, because it appears to show raw counts rather than crashes as a proportion of the drivers in each age group. There are likely fewer older drivers, so their smaller counts might represent a larger share of that group getting into crashes. To fix the graph, turn it into a relative-frequency bar chart showing the proportion of drivers in each age group who get into crashes.

9. Airlines are often compared for safety. It would be silly to simply rank these airlines based on the number of plane crashes each had in the past 10 years or so. We need to divide the number of plane crashes by some other variable. Will this variable be the number of flights flown, the number of aircraft the airline has, the number of passengers flown, or something else? Explain briefly. Is the ratio you obtain a proportion or a rate?
(a) I would use the total number of flights flown, since this is the primary exposure for plane crashes. More flights flown means more chances for a crash. For example, an airline with 1,000,000 flights and 10 crashes has a lower crash rate (1 in 100,000) than an airline with 10,000 flights and 5
crashes (1 in 2,000). The ratio obtained from this metric would be considered a rate, specifically an incidence rate, because it describes the frequency of an event rather than the proportion of a population having a certain experience.

10. Fill in each blank with 'mean' or 'median'.
(a) When calculating the mean, the value of every observation is taken into account.
(b) Many of the tests that statisticians conduct are based on the mean.
(c) The mean is sensitive to outliers, particularly in small samples.
(d) When you suspect that one or two values in the data are not accurate and you have no way of fixing the problem, it is better to use the median as a measure of center.
(e) Lab instruments are reliable when they generate values within specified ranges. Outside the range, the values are said to be inaccurate and they are set by the lab technician to a default minimum or maximum value. In this situation, it is better to use the median as a measure of center.

11. Write a LIBNAME statement to read the BODYTEMP data set into SAS.
(a)
    LIBNAME BIOS500 'S:\course\BIOS500\Binongo\2023_Lab\Datasets\';
    PROC CONTENTS DATA=BIOS500.bodytemp;
    RUN;
(b) Draw side-by-side box plots showing the distribution of TEMPERATURE for each sex category. Attach the box plot.
(c) Draw side-by-side box plots showing the distribution of HEARTRATE for each sex category. Attach the box plot.
(d) Complete the table below by filling in the mean ± standard deviation in the appropriate column.

Summary Statistic
Variable                        Entire Sample (n = 130)   Females (n = 65)   Males (n = 65)
Heart rate (beats per minute)   73.76 ± 7.06              74.15 ± 8.11       73.37 ± 5.88
Body temperature (°F)           98.25 ± 0.73              98.4 ± 0.74        98.1 ± 0.70
Variables are summarized using mean ± standard deviation.

(e) Is there reason to believe that TEMPERATURE is not normally distributed? Is there reason to believe that HEARTRATE is not normally distributed? Use the box plot and the skewness and kurtosis statistics to justify your answers.
Given the following data, we see that heart rate is slightly left-skewed (negative skewness value) and has a slightly flatter peak than a normal curve, but is approximately normal. Temperature is also slightly left-skewed and has a sharper peak, as indicated by its positive kurtosis value, but is also approximately normal, since these skewness and kurtosis values are very close to zero.
(f) Calculate the TEMPERATURE and HEARTRATE coefficients of variation. Can you compare the two statistics? Explain briefly.
CV(heartrate) = 9.57% and CV(temp) = 0.75%. Based on this, heart rate has greater variability relative to its mean than temperature does: the standard deviation of heart rate is a larger fraction of its mean than the standard deviation of temperature is of its mean.
(g) Do you think I would object if you were to replace the mean ± standard deviation with median (quartile 1, quartile 3)? Explain briefly.
While the median with Q1 and Q3 would certainly give a picture of the bulk of the distribution, it does not carry as much information as the mean ± standard deviation. Q1, median and Q3 give the 25th percentile, the middle value and the 75th percentile, while the mean ± standard deviation describes what proportion of the distribution falls within given ranges around the mean.

12. Did you use an artificial intelligence (AI) software to complete this assignment? If you did, describe how you used the tool and write the prompts you used to generate results.
I used AI to quickly look up SAS code and the values to alter in it. Prompts include: how to create a box plot in SAS, how to get
kurtosis and skewness values in PROC MEANS in SAS, and what the general rules of thumb are for using kurtosis and skewness to decide whether a distribution is normal.
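
As a check on the arithmetic in question 11(f), the coefficients of variation can be recomputed from the rounded entire-sample means and standard deviations in the summary table. This is a minimal sketch in Python rather than SAS, and the function name `coefficient_of_variation` is an illustrative choice. It assumes the usual definition CV = 100 × standard deviation / mean. Because the inputs are already rounded to two decimals, the temperature value comes out near 0.74% rather than the reported 0.75%, which was presumably computed from unrounded SAS output.

```python
def coefficient_of_variation(mean, sd):
    """Return the coefficient of variation as a percentage of the mean."""
    return 100.0 * sd / mean


# Entire-sample values (n = 130) taken from the summary table, already rounded.
heart_rate_cv = coefficient_of_variation(mean=73.76, sd=7.06)
temperature_cv = coefficient_of_variation(mean=98.25, sd=0.73)

print(f"CV(heart rate)  = {heart_rate_cv:.2f}%")
print(f"CV(temperature) = {temperature_cv:.2f}%")
```

Either way the comparison in the answer holds: heart rate varies far more relative to its mean than temperature does.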