Entity Academy Lesson 8 Other Common Distributions (AutoRecovered)

docx

School

Liberty University

Course

BASIC

Subject

Statistics

Date

Jan 9, 2024

Type

docx

Pages

11

Uploaded by JusticeFogPrairieDog37

Sindy Saintclair
Monday, November 28 2021

Lesson 8 – Other Common Distributions

Learning Objectives and Questions    Notes and Answers

Poisson distribution

When to Use Poisson?
- To model the number of discrete events that occur in a fixed interval.
- For instance, the number of students who come to an office hour: the count of students is the discrete quantity (there can only be a whole number of students), and the office hour is the fixed interval of time.
- The interval can also be a fixed space rather than a fixed time, for example the number of shooting stars you see in the night sky.
- The Poisson distribution is simple because it has only one parameter. The mean of the distribution is the same as the variance (the standard deviation squared).

Assumptions of the Poisson Distribution
1. Events must be independent of each other: one event occurring does not change the chance of another occurring.
2. The number of events must be countable and a whole number; for a continuous quantity, such as the time between events, use the exponential distribution instead.
3. The long-term rate of events occurring is constant.
4. Two events can’t happen in the same instant.

Lambda is the mean for the Poisson Distribution.
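A minimal sketch of the "one parameter" idea, using only the Python standard library. The function name `poisson_pmf` and the choice of lambda = 2 are illustrative assumptions, not part of the lesson; the formula itself is the standard Poisson probability, e^(-lambda) * lambda^k / k!.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events when the mean count is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam = 2.0  # assumed value, chosen only for illustration

# Probabilities for 0, 1, 2, ... events (the tail beyond 30 is negligible here)
probs = [poisson_pmf(k, lam) for k in range(30)]

# The mean and the variance should both come out equal to lambda
mean = sum(k * p for k, p in enumerate(probs))
variance = sum((k - mean) ** 2 * p for k, p in enumerate(probs))

for k in range(7):
    print(f"P({k}) = {probs[k]:.4f}")
print(f"mean = {mean:.4f}, variance = {variance:.4f}")
```

Changing `lam` and re-running shows the mean and variance moving together, which is why lambda alone is enough to describe the distribution.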
Examples Using the Poisson Distribution
- the number of bugs of a particular type in a square foot patch of lawn
- the number of pieces of mail received at your home in a typical day
- the number of customers arriving at a store between 10 AM and 10:30 AM
- the number of defects found in 30 feet of pipe

How to understand this graph:
For a Poisson distribution with a lambda of 2, if you want to determine the probability of a 0, look vertically above the 0 and see where that line hits the curve. In this case, the curve sits above the 0 at about 0.14, meaning that the probability of getting a zero when something is distributed as a Poisson with a lambda of 2 is about 0.14. How about a 1? By looking straight up from the 1 on the horizontal axis, and then reading that point across to the vertical axis, you can see the probability of a 1 is about 0.26. If you find the probabilities using the eyeball method and the graph, you can construct the following:

0 => p ~ 0.14
1 => p ~ 0.26
2 => p ~ 0.28
3 => p ~ 0.16
4 => p ~ 0.09
5 => p ~ 0.04
6 => p ~ 0.02
7 or higher => p ~ 0.00

If you add all these probabilities up, they should be pretty close to 1, because that forms a complete set of possible outcomes. Based on your eyeball analysis, the total of the probabilities is 0.99, which is not bad for a quick and dirty analysis.

Bathtub curve
- used in the manufacturing industry to explain the rate of failures and non-working items

The bathtub curve is the blue curve in the picture above. You can see two dashed lines, one in red and one in yellow, that added together make up the bathtub curve.

Infant Mortality Rate Failure
The red dashed line represents the “infant mortality” failure rate of an electronic device. Depending on the complexity of the device, a couple of quality assurance (QA) inspectors may be all that is needed to remove the failures before the product goes on sale. On the other hand, electronics are hard to evaluate by visual means. A manufacturing fail and a manufacturing success can look exactly the same. No manufacturer wants to send a bunch of manufacturing fails to a retailer for sale. It is in everybody’s best interest to separate out the manufacturing fails before sending inventory to the retailer. To prevent shipping fails as much as possible, the manufacturer will “burn in” the parts: hundreds of testers are placed on a test floor, each holding as many as 250K die to be burned in. The goal is to run the semiconductors under a simulated workload long enough to cull all of the infant fails. On the red dashed curve, that is the point at which the “steepness” of the failure rate flattens out.
Wear Out Failure
On the other side of the distribution, pretty much everything wears out eventually. The ‘wear out failures’, indicated by the dashed yellow line, are relatively flat to begin with. Most stuff doesn’t wear out immediately after manufacture. But as time goes on, the wear out failure rate will begin to increase, and the curve gets steeper. Even in industries where manufactured goods are not terribly complex, a good manufacturer wants to know when the wear curve starts to steepen, because they don’t want their warranties or guarantees to extend into the steep part of the wear out curve. That costs them too much. Both of these curves together create the bathtub curve.

Observed Failure Rate
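As a rough sketch of how the two dashed curves add up to the observed (bathtub) failure rate: the curve shapes and every constant below are hypothetical, invented only to produce a falling curve and a rising curve. They are not taken from the lesson or from real manufacturing data.

```python
import math

def infant_mortality_rate(t):
    """Hypothetical early-failure rate: high at first, fading quickly."""
    return 0.05 * math.exp(-t / 2.0)

def wear_out_rate(t):
    """Hypothetical wear-out rate: nearly flat at first, then steepening."""
    return 0.0005 * math.exp(t / 4.0)

def observed_failure_rate(t):
    """The bathtub curve: the two component rates added together."""
    return infant_mortality_rate(t) + wear_out_rate(t)

# t is product age in arbitrary time units (an assumption for illustration)
for t in range(0, 21, 2):
    print(f"age {t:2d}: observed failure rate = {observed_failure_rate(t):.4f}")
```

The printed rates start high, dip in the middle of the product's life, and climb again at the end, which is the bathtub shape.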
The job of the data scientist in this scenario could be complex modelling in great detail for a highly sophisticated piece of electronic equipment with lots of electronic parts, or it could be a pretty simple “connect the dots” type of curve if the complexity of the manufactured goods is low.

There is also a concept called “planned obsolescence.” This means that a manufacturer doesn’t want your product to last forever. They want it to wear out, or become outdated, so that you will have to replace it. The manufacturer will sometimes shorten the width of a bathtub curve by using plastic parts instead of metal parts at a critical point in a piece of equipment.

Exponential distribution

Exponential Distribution Example
- time spent shoe shopping
- similar to the Poisson distribution
- can be a continuous number

An Example
A local fast-food chain has a drive thru window that is open 24 hours a day. Between 10:30 am and 11:00 pm, there is more likely than not a car sitting in the drive thru lane wanting some food. But in the dead of night, there usually aren’t that many cars that come by. Suppose you are studying the hours of 2:00 am to 4:00 am. Historical data tells you that between those hours, you will typically have 17.8 customers in the drive thru window. With this information, you could create a Poisson distribution with mean (lambda) = 17.8.
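A minimal sketch of the drive thru scenario in standard-library Python. It assumes cars arrive as a Poisson process, in which case the waiting time until the next car is exponential with the same rate and the area under the exponential curve between 0 and x is 1 - e^(-lambda * x). The function names, the count of 18 cars, and the 12-minute wait are illustrative assumptions.

```python
import math

LAMBDA_PER_PERIOD = 17.8  # cars per 2-hour period (2:00 am to 4:00 am), from the notes

def poisson_pmf(k, lam):
    """Probability of exactly k cars in one period."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def exponential_cdf(x, lam):
    """Area under the exponential curve between 0 and x, with x measured in periods."""
    return 1.0 - math.exp(-lam * x)

# Chance of seeing exactly 18 cars in the 2-hour window
p_18_cars = poisson_pmf(18, LAMBDA_PER_PERIOD)

# Chance the next car shows up within 12 minutes, which is 0.1 of a 2-hour period
p_within_12_min = exponential_cdf(0.1, LAMBDA_PER_PERIOD)

print(f"P(exactly 18 cars in the period) = {p_18_cars:.4f}")
print(f"P(next car within 12 minutes)    = {p_within_12_min:.4f}")
```

Note that `x` has to be expressed in the same unit as lambda: here lambda is per 2-hour period, so 12 minutes is entered as 0.1.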
At this point, it is probably a good idea to keep an eye on units. The mean of 17.8 represents 17.8 cars per 2-hour period, or 17.8 cars per period.

Shape of the Exponential Distribution
Please note that the larger the lambda is, the steeper the curve is on the left side. The three curves shown here are for lambda of 0.5, 1, and 1.5. Imagine what the curve would look like for lambda = 17.8, as in the above scenario. It would be so steep on the left side that it would look almost vertical, and then approach the horizontal axis very quickly.

What you did with the last calculation is similar to what you
were doing earlier with the web applets for finding probabilities based on “area under the curve” between various values of z or t. In this case, you use the above equation to calculate the area under the curve between x = 0 and whatever value of x you choose.

Chi-square distribution

Types of Chi-Square Tests
- goodness of fit – compares one piece of categorical data to the population
- independent – compares categorical data to each other

Chi-Square Distribution – its shape changes drastically with the degrees of freedom. Data that is suitable for chi-square is categorical data with more than one level. It is not really suitable for modeling; it is mostly used for hypothesis testing.

Suppose you are polling for a general election. You have three candidates: Abernathy, Boykins and Calendario. As you are polling, you ask two questions. First, who is your preferred candidate, and second, what age group are you in. After you have collected a bunch of data, you construct a table that looks like this:

              18-29   30-49   50 and older   Total
Abernathy        47      25             18      90
Boykins          22      68             19     109
Calendario       15      54              4      73
Total            84     147             41     272

When examining this table, you will realize that Abernathy is much more popular among the younger voters than Boykins, whereas Boykins appeals to the 30–49-year-old group. However, if the differences were a bit more subtle, how would you decide if there was a candidate preference by voter age?

In order to tackle a problem like this, you need to assume that there is no difference between age groups regarding candidate preference. With this assumption, you then calculate the ‘expected’ counts in each cell. For example, 90 out of 272 voters overall like Abernathy, or about 33%. If each age group also liked Abernathy at that rate, then for the 18–29-year-olds you would expect about (90 / 272) x 84 = 27.8 voters for Abernathy. Here is what the table of ‘expected’ values would be:

              18-29   30-49   50 and older   Total
Abernathy      27.8    48.6           13.6      90
Boykins        33.7    58.9           16.4     109
Calendario     22.5    39.5           11.0      73
Total            84     147             41     272

If you notice, all of the column totals and row totals are still the same; it is just that the expected counts in each cell have been re-balanced.

Without talking about a lot of theory, it turns out that if you do a bunch of math on the ‘observed’ and ‘expected’ counts in each cell, the total will be distributed as a Chi-Square variable. In fact, it is called the Chi-Square statistic. The formula for the calculation will not be presented here. There are tons of web applications that will calculate it for you, so there is no need to learn how to do it by hand. In this case, the Chi-Square statistic value is 44.4. Much like has been done with the z-score and the t-score, you can use a web-based calculator to determine the probability that a Chi-Square statistic would be as large or larger than 44.4, and it turns out the probability is less than 0.000001. In other words, a statistic this large should essentially never happen if there were truly no difference between age groups.
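For readers who want to see the arithmetic a web calculator performs, here is a minimal sketch in standard-library Python using the observed polling counts above. The variable names are illustrative, and the formula used, the sum over all cells of (observed - expected)^2 / expected, is the standard Pearson chi-square formula rather than something stated in the lesson.

```python
# Observed counts from the polling table: one row per candidate,
# columns are the age groups 18-29, 30-49, 50 and older.
observed = {
    "Abernathy": [47, 25, 18],
    "Boykins": [22, 68, 19],
    "Calendario": [15, 54, 4],
}

row_totals = {name: sum(counts) for name, counts in observed.items()}
col_totals = [sum(counts[j] for counts in observed.values()) for j in range(3)]
grand_total = sum(row_totals.values())

# Expected count in a cell = row total x column total / grand total
expected = {
    name: [row_totals[name] * col_totals[j] / grand_total for j in range(3)]
    for name in observed
}

# Chi-square statistic: add up (observed - expected)^2 / expected over every cell
chi_square = sum(
    (o - e) ** 2 / e
    for name in observed
    for o, e in zip(observed[name], expected[name])
)

for name, values in expected.items():
    print(name, [round(v, 1) for v in values])
print(f"chi-square statistic = {chi_square:.1f}")
```

The expected counts printed by this sketch match the ‘expected’ table, and the statistic rounds to the 44.4 quoted in the notes.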
Donner Pass Example – Donner Pass, in California’s Sierra Nevada, is the site of one of the most interesting and more tragic pieces of US history. A group of pioneers were traveling when they got trapped in an early blizzard in a mountain pass. The whole party was stuck for quite some time, and many did not survive.

This is a contingency table, which shows the frequencies of one categorical variable by the frequencies of another categorical variable. 0 = did not survive and 1 = survived.

Observed vs Expected Data
The idea of a Chi-square is to test the observed counts against expected counts: the counts you would see in each cell if the two variables were unrelated.

Calculating Independent Chi-Squares

What does independence mean?
- variables do not influence each other
- observed: real data
- expected: hypothesized data

Assumptions of the Independent Chi-Square
- Two categorical variables with 2 or more levels
- Data in the contingency table (usually)
- Simple random sampling from a population
- Expected cell count >= 5

Create Hypotheses
Null
- variables are independent
- gender and survival rate are not related
Alternative
- variables are not independent
- gender and survival rate are connected
Question to Answer – Does survival rate depend on the gender of the pioneer in Donner Pass?
1. Enter the Donner Pass data into the chi-square blue box.
2. The χ² calculator will provide the formula and the answers in the box below.
3. The top right box, labeled chi-square test, gives the df ((# of rows - 1) x (# of columns - 1)), the p value, and the test statistic, which is chi-square.

When the p value is less than 0.05, the result is significant, which means you reject the null hypothesis and state that yes, the survival rate depends on the gender of the pioneer in Donner Pass.

The Test for Independence
To conduct a test for independence of categorical variables, the χ² (Chi-Square) Test of Independence will be used. The fancy ‘X’ looking character (χ) is the Greek letter Chi. It is pronounced ‘ki’ (rhymes with ‘sky’).
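The df rule and the 0.05 decision rule can be sketched in standard-library Python. Because the Donner Pass counts are not reproduced in these notes, the example call uses the election table from earlier (3 rows, 3 columns, statistic 44.4). The function names are illustrative assumptions, and `chi_square_p_value` uses the textbook closed-form series for whole-number degrees of freedom in place of a web calculator.

```python
import math

def degrees_of_freedom(n_rows, n_cols):
    """df = (# of rows - 1) x (# of columns - 1)."""
    return (n_rows - 1) * (n_cols - 1)

def chi_square_p_value(statistic, df):
    """Probability of a chi-square value at least this large, for integer df >= 1."""
    if statistic <= 0:
        return 1.0
    half = statistic / 2.0
    if df % 2 == 0:
        # Even df: e^(-x/2) times the sum of (x/2)^i / i! for i = 0 .. df/2 - 1
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: two-sided normal tail plus a finite correction series
    chi = math.sqrt(statistic)
    normal_density = math.exp(-half) / math.sqrt(2.0 * math.pi)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= statistic / (2 * r - 1)
        total += term
    return math.erfc(chi / math.sqrt(2.0)) + 2.0 * normal_density * total

def is_significant(p_value, alpha=0.05):
    """True means: reject the null hypothesis that the variables are independent."""
    return p_value < alpha

df = degrees_of_freedom(3, 3)         # election table: 3 candidates x 3 age groups
p_value = chi_square_p_value(44.4, df)
print(f"df = {df}, p value = {p_value:.2e}, reject null: {is_significant(p_value)}")
```

A 2 x 2 table such as gender by survival would give `degrees_of_freedom(2, 2)`, which is 1.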