Final Exam for DATA1001: Foundations of Data Science

Final Exam A
Semester 1 2023

The University of Sydney
School of Mathematics and Statistics

DATA1001/1901 Foundations of Data Science

June 2023
Lecturers: Di Warren

Time Allowed: Reading time: 10 minutes; Writing time: 1.5 hours

Exam Conditions: This is a closed-book examination; no material permitted. Writing is not permitted at all during reading time.

Family Name: ....................    SID: ....................
Other Names: ....................    Seat Number: ....................

Please check that your examination paper is complete (23 pages) and indicate by signing below.

I have checked the examination paper and affirm it is complete.

Signature: ....................    Date: ....................

This examination has two sections: Multiple Choice and Extended Answer.

The Multiple Choice Section is worth 50% of the total examination. There are 20 questions. The questions are of equal value. All questions may be attempted. Answers to the Multiple Choice questions must be entered on the Multiple Choice Answer Sheet before the end of the examination.

The Extended Answer Section is worth 50% of the total examination. There are 3 questions. The questions are of equal value. All questions may be attempted. Working must be shown.

Concept Sheet & Calculators: There is a concept sheet after the last question in this booklet. Calculators may NOT be used.

THE QUESTION PAPER MUST NOT BE REMOVED FROM THE EXAMINATION ROOM.

Marker’s use only

Multiple Choice Section

In each question, choose at most one option. Your answers must be entered on the Multiple Choice Answer Sheet.

1. What is a complexity that is commonly associated with data linkage of human subjects?
   (a) Ensuring the privacy of participants
   (b) Data wrangling
   (c) Getting ethics approval
   (d) All of the other answers

2. Which of the following scenarios would most likely be conducted as a randomised controlled trial?
   (a) An Australian clinical trial for a new drug
   (b) Interviews for all new workers at Woolworths
   (c) Feedback on a new teaching method
   (d) A study of Sydney’s air pollution over 5 years

3. What graphical summary could represent 1 qualitative variable and 1 quantitative variable?
   (a) Q-Q plot
   (b) Scatter plot
   (c) Clustered bar chart
   (d) Comparative boxplot

4. A company decreases all their food prices by 2%. By how much will the mean and standard deviation of food prices change, respectively?
   (a) 2% and 4%
   (b) 2% and 2%
   (c) 0% and 2%
   (d) 2% and 0%

5. Given univariate, quantitative data, which of the following is impossible?
   (a) Mean = -1
   (b) Median = -1
   (c) Standard deviation = -1
   (d) Lower threshold = -1

6. Which R command works out this area under the curve for $X \sim N(1, 2^2)$?
   (a) pnorm(2,1,2)-pnorm(0,1,2)
   (b) pnorm(2,1,2)-pnorm(-2,1,2)
   (c) pnorm(2,1,4)-pnorm(0,1,4)
   (d) pnorm(2)-pnorm(0)

7. Measurement error is defined as follows:
   Individual measurement = exact value + chance error + bias.
   How could we estimate the chance error?
   (a) Remove any outliers and calculate the RMS.
   (b) Find the systematic error (related to the bias).
   (c) Replicate the measurements under the same conditions, and calculate the standard deviation.
   (d) Find the exact value and bias, and subtract them from the individual measurements.
