Shah_Random Sampling f21

School

Rowan University

Course

02280

Subject

Statistics

Date

Feb 20, 2024

Uploaded by Ishashah221022

Biometry, Random Sampling

Text: Sections 1.3, 2.7

Objectives
o How to take a random sample from a population
o Discussion and questions on why random samples are necessary
o Create and interpret scatterplots, time plots
o Linear transformations on data sets: when, why, and how to do them

Terminology

simple random sample (SRS) of size n = a sample of n items in which (a) every member of the population has the same chance of being included; and (b) the members of the sample are chosen independently of each other.

sampling bias = a bias resulting from a faulty sampling method.

sampling frame = a list of individuals from which the sample was actually selected.

population distribution of a variable = the distribution of its values for all members of the population.

sampling variability (error) = arises because the observed value of a statistic depends on the particular sample selected, and typically varies from sample to sample. We cannot eliminate this variability, but we can learn how it works.

sampling distribution (of a statistic) = the theoretical probability distribution of ALL of the possible values of the sample statistic. Note that changing the sample size (n) has an important effect on this.

parameter = a number describing the population, or a population characteristic.

statistic = a number describing the sample, or a sample characteristic.

                     Parameter   Statistic
Mean                 μ           ȳ
Standard deviation   σ           s
Proportion           p           p̂
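The terms parameter, statistic, and sampling variability can be illustrated with a small simulation. This is only a sketch: the population below is invented for illustration (it is not the lab's rectangle data), and the names `population`, `sample_mean`, `means_n5`, and `means_n25` are hypothetical. It draws many simple random samples and shows that the sample mean varies from sample to sample, and varies less when n is larger.

```python
import random
import statistics

# Hypothetical population: number of squares in each of 100 rectangles.
# These values are invented for illustration; they are not the lab's data.
rng = random.Random(21)  # fixed seed so the run is repeatable
population = [rng.randint(1, 18) for _ in range(100)]

# Parameter: describes the whole population (normally unknown in practice).
mu = statistics.mean(population)

def sample_mean(n):
    """Draw one simple random sample of size n and return its mean (a statistic)."""
    return statistics.mean(rng.sample(population, n))

# Sampling variability: the statistic changes from sample to sample.
means_n5 = [sample_mean(5) for _ in range(1000)]
means_n25 = [sample_mean(25) for _ in range(1000)]

# The spread of these sample means approximates the spread of the
# sampling distribution; it shrinks as the sample size n grows.
spread_n5 = statistics.stdev(means_n5)
spread_n25 = statistics.stdev(means_n25)

print(f"population mean (parameter):  {mu:.2f}")
print(f"spread of sample means, n=5:  {spread_n5:.2f}")
print(f"spread of sample means, n=25: {spread_n25:.2f}")
```

The 1000 simulated means only approximate the sampling distribution, which is defined over all possible samples.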
Parameters vs. statistics: in the science of statistics we need to use sample statistics (like the sample mean number of squares in a rectangle, ȳ) to estimate the population parameter (like the population mean number of squares in a rectangle, μ), because often a census is impossible or too expensive.

PART 1: Random Sampling and Experimental Design

Random rectangles

For this exercise, we are trying to estimate the mean number of squares in a rectangle. For example, each of the rectangles below has 4 squares, so the mean number of squares per rectangle is 4.

Use this table to record your estimates of the mean number of squares:

                                      Number of squares in rectangle   Mean # of squares (ȳ)
First estimate (guess)
Second estimate (judgment)            8    6    3    5    18           8
Third estimate (random # table)       1    4    4    14   16           7.8
Fourth estimate (calculator or JMP)   12   1    1    5    2            4.2

1. First estimate (guess): When the instructor says “Go”, you will look at the “random rectangle” document for 5 seconds and from this come up with an estimate (guess) of the average number of squares. When the instructor says “Stop”, write your estimate in the table above in the row labeled “First estimate”.

2. Second estimate (judgment): Look at the “random rectangle” document and select five (5) different rectangles that, in your judgment, are representative of the population of rectangles. Write down the number of squares for those five rectangles in the table above, and calculate the mean number of squares (ȳ).
3. Third estimate (random sample using the random number table above). We will use a random number table to pick a simple random sample (SRS; see “Terminology” at the beginning of the lab) of 5 different random numbers between 01 and 100. Skip over any repeats, so that you get 5 different numbers. Your instructor will demonstrate the use of the random number table.

Write down 5 random numbers from the random number table on the lines below:

______ ______ ______ ______ ______

You will now use these random numbers to identify rectangles in the “random rectangle” document and count the squares in those rectangles. Write down the number of squares in each of the 5 rectangles in the table above. Calculate the mean number of squares in the 5 rectangles and write it into the table.

DO NOT RECORD YOUR RANDOM NUMBERS FROM THE RANDOM NUMBER TABLE IN THE TABLE OF YOUR ESTIMATES OF THE MEAN!!

4. Fourth estimate (random sample using JMP). Generate 5 random integers between 1 and 100 using JMP (instructions below). Skip over any repeats, so that you get 5 different numbers.

Write down 5 random numbers from JMP on the lines below:

______ ______ ______ ______ ______

You will now use these random numbers to identify rectangles in the “random rectangle” document and count the squares in those rectangles. Write down the number of squares in each of the 5 rectangles in the table above. Calculate the mean number of squares in the 5 rectangles and write it into the table.

DO NOT RECORD YOUR RANDOM NUMBERS FROM JMP IN THE TABLE OF YOUR ESTIMATES OF THE MEAN!!

5. When you are done, record your four estimates on the spreadsheet at the front of the class.
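The procedure in steps 3 and 4 (read random numbers in order, skip repeats, stop at 5 distinct labels, then average the square counts) can be sketched in Python. This is an illustrative sketch rather than part of the lab, which uses a random number table and JMP; the helper names `random_labels` and `first_distinct` are hypothetical. The square counts are the ones recorded in the "Third estimate" row of the table above.

```python
import random
import statistics

def random_labels(rng, low, high):
    """Endless stream of random integers from low to high (repeats possible)."""
    while True:
        yield rng.randint(low, high)

def first_distinct(stream, k):
    """Read numbers in order and keep the first k distinct values, skipping repeats."""
    chosen = []
    for number in stream:
        if number not in chosen:
            chosen.append(number)
        if len(chosen) == k:
            break
    return chosen

rng = random.Random()  # unseeded: every run gives a different sample

# Five different rectangle labels between 1 and 100 (sampling without replacement).
labels = first_distinct(random_labels(rng, 1, 100), 5)
print("rectangles to look up:", labels)

# Square counts recorded for the third estimate in the table above.
third_counts = [1, 4, 4, 14, 16]
y_bar = statistics.mean(third_counts)  # (1 + 4 + 4 + 14 + 16) / 5
print("sample mean:", y_bar)
```

Python's standard library can also draw the labels in one call, `random.sample(range(1, 101), 5)`, which samples without replacement directly; the loop above is written out only because it mirrors the skip-the-repeats instruction.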
How to generate random numbers in JMP

1. Create a new column in JMP.
2. Right-click on the top of the column. Select “Column Info…”, and give it a name.
3. Pull down the menu next to “Initial Data Values” and pick “Random”.
4. The number of rows you pick will be the number of random numbers generated.
5. Leave “Random Integer” selected, then pick the range of numbers you want generated. For this purpose, we want numbers between 1 and 100. Note: the random number generator may pick the same number twice, even in a small sample (e.g. 5 numbers). If you’re sampling without replacement, you may want to have it generate more than you need, and just use the first 5 that are not the same number. If you are sampling with replacement, then repeats are ok and you’ll use them.
6. Hit “OK”, and JMP will populate the column with your random numbers. Always start with the one on top and work down; don’t pick numbers, use them in order. Only skip a number if it repeats one already used above it (again, if you want numbers that are sampled without replacement).

Questions: please answer these using the class data

a) Compare the class’ first and second estimates (guess and judgment). Are these two estimates close to each other? Please answer YES or NO and explain briefly why they might be similar or different.

b) Compare the class’ first and third estimates (guess and simple random sample using the random number table). Are these two estimates close to each other? Answer YES or NO and explain briefly why they might be similar or different.

c) Compare the class’ second and third estimates (judgment and simple random sample using the random number table). Are these two estimates close to each other? Answer YES or NO and explain briefly why they might be similar or different.

d) Compare the class’ third and fourth estimates (simple random samples, one using the random number table, the other using JMP to generate random numbers). Are these two estimates close to each other? Answer YES or NO and explain briefly why they might be similar or different.

Make separate histograms for each of the four class estimates to answer the following questions:
e) Compare the histograms for each of the four estimates by looking at their means (point of balance). Do you see differences among them? What might explain any differences you see, if present? Explain briefly.

f) Is there a sampling bias in the first two estimates, when everyone guessed or used their judgment? Explain briefly. (Reminder: the definition of sampling bias is at the beginning of the lab, under “Terminology”.)
PART 2: SCATTERPLOTS, TIME PLOTS, LINEAR TRANSFORMATIONS

Tips on graphing
o The independent variable goes on the x-axis; the dependent variable goes on the y-axis.
o Double-check your graph once it’s done. Does it look like you expected it should, from the data? This final check will help you catch a lot of errors. Don’t just make a graph and consider it done: always ask if it looks right!!!

Terminology

Scatterplot: a plot of two related, quantitative variables on the Cartesian coordinate system (the x-y plane). If you think one of the variables explains the other, the former, or explanatory (independent) variable, goes on the x-axis, and the latter, or dependent variable, goes on the y-axis.

Data are from: Samuels et al., 2012, Problem 12.3.8

Time plot or time series plot: a scatterplot where one variable is time. Plot the values of the other variable on the vertical axis (y-axis) against time on the horizontal axis.

Tips on interpretation of time plots: look for the overall pattern (trend: a general upward or downward ‘movement’ over time), ‘seasonal’ variation (a pattern that repeats every 12 months, 4 quarters, or other fixed time period), and striking deviations from the overall pattern (peculiar bumps, valleys, etc., over a short period of time).
Data are from NOAA, at: ftp://ftp.cmdl.noaa.gov/ccg/co2/trends/co2_mm_mlo.txt, accessed 9.18.11

Lab Exercises

Scatterplots

1. Make a scatterplot in JMP using the Survey Data for height vs. footprint length. Before making the graph, think about which variable goes on each axis.

Questions:

g) Of the two variables (height, footprint length), which is the explanatory variable, if either, that should go on the x-axis? Explain your choice.

h) Interpret the pattern in the scatterplot you just made: what does the graph tell you about the relationship between height and footprint length?

i) Now focus on the variability in the relationship between height and footprint length. Is this a strong or weak relationship (i.e. little or quite a bit of scatter in the pattern)? Why is there any variability (why don’t the data fall along a straight line)?
Points to help with the interpretation of scatterplots

Look for the overall pattern and striking deviations from that pattern (from Moore & McCabe’s Introduction to the Practice of Statistics, Freeman):

o The overall pattern is made up of direction, form, and strength:
  direction – is there a positive association, a negative association, or none?
  form – is it linear or curved?
  strength – is the pattern strong, moderate, or weak? Look for how tight the overall pattern is. (Hint: If you cannot see a pattern, then what is the strength?)
o Striking deviations from the overall pattern: usually we are talking about outliers, which are points outside the overall pattern.

Time plots, or time series plots

2. Construct a time plot (or time series plot) of the data provided below (these data are entered into a JMP spreadsheet posted on Canvas), which represent the number of bacterial colony-forming units (CFUs) per mL over time. Connect the points on this graph to make it easier to view the trend over time. You can do this by selecting the red arrow by “Bivariate Fit…” and then choosing “Flexible” and “Fit each Value”.

Time (h)   CFUs / mL      Time (h)   CFUs / mL
0          12000000       5.0        750000000
0.5        11300000       5.5        2000000000
1.0        10300000       6.0        1700000000
1.5        24800000       6.5        2000000000
2.0        60000000       7.5        3480000000
2.5        84000000       8.0        4000000000
3.0        166000000      12         4100000000
3.5        330000000      16         3700000000
4.0        2900000000     20         800000000
4.5        620000000      24.5       125000000

Questions:

j) Interpret the time plot you just created. What do the data tell you about how bacterial colony-forming units (CFUs) change over time? Is this what you expected the graph to look like?

k) Imagine we could do another experiment related to the one that generated these bacterial growth data. What new experiment could we do that would be likely to
change the shape of the graph? Explain one variable you could change, how it would change the graph, and why you think it would change the graph in the way you predict (i.e. what’s your rationale?).

3. Construct a time plot (or time series plot) of the data provided below (they are entered into a JMP spreadsheet posted on Canvas), which represent the power load used over time, where each time period represents a 3-month period. Connect the points on this graph to make it easier to see a pattern in the data over time.

Time period   Power load (MW)   Time period   Power load (MW)
1             68.8              26            116.8
2             65                27            144.2
3             88.4              28            123.3
4             69                29            142.3
5             83.6              30            124
6             69.7              31            146.1
7             90.2              32            135.5
8             72.5              33            147.1
9             106.8             34            119.3
10            89.2              35            138.2
11            110.7             36            127.6
12            91.7              37            143.4
13            108.6             38            134
14            98.9              39            159.6
15            120.1             40            135.1
16            102.1             41            149.5
17            113.1             42            123.3
18            94.2              43            154.4
19            120.5             44            139.4
20            107.4             45            151.6
21            116.2             46            133.7
22            104.4             47            154.5
23            131.7             48            135.1
24            117.9
25            130.6

Question:

l) Interpret the time plot you just created. How does the power load change over time? Why do you think the time plot looks the way it does (i.e. why does it change over time with the pattern it does)? Remember that each time point represents a 3-month interval.
Biometry, Lab 3 – scatterplots, time plots, linear transformations, random sampling and design

Linear transformations

Many transformations are linear and their effects on ȳ and s are predictable. Let Y' be the ‘new’ value and Y be the ‘old’ value. Linear transformations change an old variable into a new one by the equation:

Y' = mY + b

Under a linear transformation:

o The effect on ȳ is ‘natural’. That is, ȳ changes like Y: ȳ' = mȳ + b. It works for the median, Q1, and Q3, too.
o The effect on s is just multiplicative. That is, s' = |m|s, which is simply s' = ms when m is positive. It works for the IQR, too. Because we are talking about measures of dispersion, or variability, the addition of the constant b gets ‘wiped out.’

4. Using the Survey Data, do the following calculations on the footprint length data, which were measured in cm:

o Write out the equation in the form above for the linear transformation of footprint length in centimeters to footprint length in inches. (Hint: think about what is “m” or “b” in the equation above, for your transformation.)
o Do this calculation for 2 footprint lengths. Show your work and show the final result of footprint lengths in inches.
o Now, have JMP do this calculation for you for all the footprint length data in the Survey Data file. To do this, create a new column by scrolling in JMP to the right until there is an empty column. Double click in the top of the column and type in a column name (e.g. “Footprint length (in.)”). Hit Enter. Then right click on the new column title and select “Formula”. In the window that appears, re-create the equation you wrote above for the linear transformation from footprint length in cm to inches. Click “Apply” or “Ok”. The new column should populate with the values of footprint length in inches.
o To calculate the mean and standard deviation of footprint length in inches, create a histogram of the data in this new column. The descriptive statistics you need will be displayed.
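The rules for how a linear transformation affects the mean and standard deviation can be checked numerically. The sketch below uses invented data and an arbitrary transformation (m = -2, b = 10), both hypothetical and unrelated to the Survey Data or to the cm-to-inches exercise. A negative m is chosen on purpose: a standard deviation can never be negative, so the scaling factor that applies to s is |m|.

```python
import math
import statistics

# Hypothetical measurements (invented for illustration, not the Survey Data).
y = [2.0, 4.0, 4.0, 5.0, 10.0]

# Hypothetical transformation Y' = mY + b with a negative slope.
m, b = -2.0, 10.0
y_new = [m * value + b for value in y]

mean_old, mean_new = statistics.mean(y), statistics.mean(y_new)
sd_old, sd_new = statistics.stdev(y), statistics.stdev(y_new)
median_old, median_new = statistics.median(y), statistics.median(y_new)

# Measures of center follow the transformation itself: new = m * old + b.
print("mean:  ", mean_new, "vs", m * mean_old + b)
print("median:", median_new, "vs", m * median_old + b)

# Measures of spread are only rescaled by |m|; the constant b drops out.
print("sd:    ", sd_new, "vs", abs(m) * sd_old)

assert math.isclose(mean_new, m * mean_old + b)
assert math.isclose(sd_new, abs(m) * sd_old)
```

Setting b to any other value leaves `sd_new` unchanged, which is the sense in which the added constant is wiped out.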
Question: m) When would doing a linear transformation be useful? Give an example other than the calculation you just did on footprint lengths, and then explain why this other transformation would be useful.
Submission Details

Submit this complete assignment on Canvas before the due date, which is the beginning of next week’s lab. Your assignment should include all of the graphs/charts you were asked to make, with figure legends, and correctly formatted as figures. You will also need to turn in your answers to questions a-m. Do not turn in any tables or lists of raw data.

This assignment should be completed with your lab partner. If you are at a table with only three students, you may complete the assignment as a group of three. Only one assignment should be turned in per group.

Participation statement removed and updated for canvas F21 NAR
Participation statement s21 by nar and tjo
Reformatted and revised by nar F19
V6 based on cer: tjo revised 2.4.13; dcw & chd statistics revised 12.30.13; NR proofed and assignment change 9/19/2014.