IE6400_Day15
IE6400 Foundations for Data Analytics Engineering
Fall 2023
Module 2: Probability Distribution

A probability distribution is a fundamental concept in statistics and probability theory that describes how the probabilities of different outcomes or events are distributed within a random experiment or random variable. It provides a mathematical framework for understanding uncertainty and randomness in various fields such as science, engineering, economics, and more.
Types of Probability Distributions

There are two main types of probability distributions:
• Discrete Probability Distribution: This type of distribution deals with random variables that can only take on a finite or countable number of distinct values. Examples include the binomial distribution, Poisson distribution, and geometric distribution.
• Continuous Probability Distribution: Continuous distributions apply to random variables that can take on an infinite number of values within a certain range. Examples include the normal distribution, exponential distribution, and uniform distribution.

Common Probability Distributions
• Normal Distribution: Also known as the Gaussian distribution, it is widely used to model continuous data and is characterized by its bell-shaped curve.
• Binomial Distribution: Used for modeling the number of successes in a fixed number of independent Bernoulli trials.
• Poisson Distribution: Used to model the number of events occurring in a fixed interval of time or space when events are rare and random.
• Exponential Distribution: Models the time between events in a Poisson process.
• Uniform Distribution: Assigns equal probability to all values within a specified range.
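As a quick orientation, a minimal sketch of how these common distributions can be created with scipy.stats; the parameter values are arbitrary examples, and every frozen distribution object shares the same interface (pmf/pdf, cdf, mean, rvs):

from scipy import stats
# Frozen distribution objects, one per distribution named above
normal = stats.norm(loc=0, scale=1)      # mean 0, standard deviation 1
binomial = stats.binom(n=10, p=0.3)      # 10 trials, success probability 0.3
poisson = stats.poisson(mu=5)            # expected number of events: 5
exponential = stats.expon(scale=5)       # mean time between events: 5
uniform = stats.uniform(loc=2, scale=8)  # equal probability on [2, 10]
print(binomial.pmf(3))                   # P(exactly 3 successes)
print(normal.cdf(1.96))                  # P(X <= 1.96), about 0.975
print(exponential.mean())                # 5.0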
Applications

Probability distributions are used in a wide range of fields, including statistics, finance, engineering, science, and machine learning, to model and analyze uncertainty and randomness in data. Understanding probability distributions is crucial for making informed decisions, conducting statistical analysis, and solving various real-world problems that involve randomness and uncertainty. Different types of probability distributions are chosen based on the characteristics of the data and the specific problem at hand.
Exercise 1 Probability Distribution of the Sum of Two Fair Six-Sided Dice Rolls
In this example, we will calculate and visualize the probability distribution for the sum of two fair six-sided dice rolls. The possible outcomes range from 2 (the minimum sum)
to 12 (the maximum sum).
Step 1: Define the Sample Space
The sample space consists of all possible outcomes when rolling two fair six-sided dice.
Each die can land on any number from 1 to 6. So, there are 6 possible outcomes for each die, and the total number of outcomes is 6 * 6 = 36.
Step 2: Calculate the Probability for Each Outcome
To calculate the probability distribution, we need to determine the probability of each possible sum from 2 to 12.
• There is only one way to get a sum of 2 (rolling two ones), so the probability is 1/36.
• There are two ways to get a sum of 3 (rolling a 1 and a 2 or a 2 and a 1), so the probability is 2/36 = 1/18.
• Continue this process for all possible sums up to 12.

Let's use Python to calculate these probabilities.
In [1]:
import numpy as np
# Enumerate all 36 equally likely outcomes of rolling two dice
outcomes = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]
# For each possible sum, count the outcomes that produce it and divide by 36
probabilities = {}
for sum_value in range(2, 13):
    count = sum(1 for d1, d2 in outcomes if d1 + d2 == sum_value)
    probabilities[sum_value] = count / 36.0
probabilities
Out[1]:
{2: 0.027777777777777776,
 3: 0.05555555555555555,
 4: 0.08333333333333333,
 5: 0.1111111111111111,
 6: 0.1388888888888889,
 7: 0.16666666666666666,
 8: 0.1388888888888889,
 9: 0.1111111111111111,
 10: 0.08333333333333333,
 11: 0.05555555555555555,
 12: 0.027777777777777776}
The calculated probabilities will give us the probability distribution for the sum of two dice rolls.
Step 3: Visualize the Probability Distribution
Now that we have calculated the probabilities for each possible sum, let's visualize the probability distribution using a bar chart.
In [2]:
import matplotlib.pyplot as plt
# Extract sums and corresponding probabilities
sums = list(probabilities.keys())
probs = list(probabilities.values())
# Create a bar chart
plt.bar(sums, probs, tick_label=sums, color='green')
plt.xlabel('Sum of Two Dice Rolls')
plt.ylabel('Probability')
plt.title('Probability Distribution of the Sum of Two Dice Rolls')
plt.show()
This bar chart will show the probability of each sum, ranging from 2 to 12.
Interpretation
The probability distribution and the bar chart show the following:
• The most likely sum is 7, as there are more ways to obtain a sum of 7 than any other sum.
• The probabilities decrease as we move away from 7, forming a symmetric distribution.
• The least likely sums are 2 and 12, each with a probability of about 0.0278 (1/36), as there is only one way to achieve each.

This analysis provides insights into the likelihood of different outcomes when rolling two dice, which is useful in various games and probabilistic scenarios.
Discrete Probability Distributions

Binomial Distribution

Exercise 2 Generating and Analyzing a Binomial Distribution with SciPy
Objective:
In this exercise, you will use the SciPy library in Python to generate and analyze a binomial distribution. The binomial distribution is commonly used to model the number of successes in a fixed number of independent Bernoulli trials.
Instructions:
1. Import the necessary libraries:
In [3]:
import numpy as np
from scipy.stats import binom
import matplotlib.pyplot as plt
2. Define the parameters of the binomial distribution:
• n (number of trials): Choose a value such as 10, representing the number of trials or experiments.
• p (probability of success): Choose a value between 0 and 1, representing the probability of success in each trial.
In [4]:
n = 10 # Number of trials
p = 0.3 # Probability of success
3. Use SciPy's binom function to create a binomial distribution object. Pass the values of n and p as arguments to the function:
In [5]:
binomial_dist = binom(n, p)
4. Generate a list of possible outcomes (number of successes) from 0 to n using numpy. These will be the x-values for your probability distribution:
In [6]:
x_values = np.arange(0, n+1)
5. Calculate the corresponding probabilities for each outcome using the pmf (probability mass function) method of the binomial distribution object:
In [7]:
probabilities = binomial_dist.pmf(x_values)
6. Create a bar chart to visualize the binomial distribution using matplotlib.pyplot. Plot the x-values (number of successes) on the x-axis and the probabilities on the y-axis:
In [8]:
plt.bar(x_values, probabilities)
plt.xlabel('Number of Successes')
plt.ylabel('Probability')
plt.title('Binomial Distribution')
plt.show()
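As an optional check (reusing the binomial_dist and probabilities objects created above), the frozen distribution exposes its theoretical moments, which should match the binomial formulas mean = n*p and variance = n*p*(1-p), and the PMF values should sum to 1:

print(binomial_dist.mean())   # 3.0  (10 * 0.3)
print(binomial_dist.var())    # 2.1  (10 * 0.3 * 0.7)
print(probabilities.sum())    # 1.0, up to floating-point rounding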
Exercise 3 Calculating and Visualizing Binomial Probability Mass Function in Python
In this exercise, you will use Python to calculate and visualize the probability mass function (PMF) for a binomial distribution. The binomial PMF allows you to determine the probability of obtaining a specific number of successes in a fixed number of independent Bernoulli trials.
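For reference, the binomial PMF evaluated in this exercise is

$$P(X = k) = \binom{n}{k}\, p^{k} (1-p)^{n-k}, \qquad k = 0, 1, \dots, n.$$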
1. Import the necessary libraries:
In [9]:
from scipy.stats import binom
import matplotlib.pyplot as plt
2. Define the parameters of the binomial distribution:
• n (number of trials): Choose a value such as 10, representing the number of trials or experiments.
• p (probability of success): Choose a value between 0 and 1, representing the probability of success in each trial.
• k_range (range of numbers of successes): Create a range of values of k for which you want to calculate the probabilities.
In [10]:
n = 10 # Number of trials
p = 0.3 # Probability of success
k_range = range(0, n+1) # Range of possible numbers of successes
3. Use SciPy's binom function to create a binomial distribution object. Pass the values of n and p as arguments to the function:
In [11]:
binomial_dist = binom(n, p)
4. Calculate the probabilities of obtaining different numbers of successes within the specified range k_range using a list comprehension:
In [12]:
probabilities = [binomial_dist.pmf(k) for k in k_range]
5. Create a bar chart to visualize the binomial PMF. Plot the values of k_range on the x-axis and their respective probabilities on the y-axis:
In [13]:
plt.bar(k_range, probabilities)
plt.xlabel('Number of Successes (k)')
plt.ylabel('Probability')
plt.title('Binomial Probability Mass Function (PMF)')
plt.show()
Negative Binomial Distribution

Exercise 4 Understanding the Negative Binomial Distribution

The negative binomial distribution is a discrete probability distribution that models the number of failures in a sequence of independent and identically distributed Bernoulli trials before a specified (non-random) number of successes (denoted n) occurs.
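Under this convention (the one NumPy's sampler below uses), the probability of observing k failures before the n-th success, with success probability p on each trial, is

$$P(X = k) = \binom{k+n-1}{k}\, p^{n} (1-p)^{k}, \qquad k = 0, 1, 2, \dots$$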
Objective:
In this exercise, we will:
1. Generate random samples from a negative binomial distribution.
2. Visualize the distribution.
3. Interpret the results.

Step 1: Importing Necessary Libraries
In [14]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
Step 2: Generating Random Samples
We'll use numpy to generate random samples from a negative binomial distribution. The function np.random.negative_binomial(n, p, size) is used for this purpose, where:
• n is the number of successes.
• p is the probability of a success.
• size is the number of samples to generate.
In [15]:
n = 5 # number of successes
p = 0.5 # probability of a success
size = 1000 # number of samples
samples = np.random.negative_binomial(n, p, size)
Step 3: Visualization
We'll use seaborn
to visualize the distribution of our generated samples.
In [16]:
sns.histplot(samples, bins=30, kde=True)
plt.title('Negative Binomial Distribution')
plt.xlabel('Number of Failures before 5 Successes')
plt.ylabel('Frequency')
plt.show()
Step 4: Interpretation
From the visualization, we can observe the distribution of the number of failures before
achieving 5 successes. The peak of the distribution indicates the most likely number of
failures before 5 successes are achieved, given a success probability of 0.5.
The spread of the distribution provides insight into the variability of the number of failures. A wider spread indicates greater variability, while a narrower spread indicates more consistency in the number of failures before achieving the desired number of successes.
Conclusion
The negative binomial distribution provides a way to model the number of failures before a specified number of successes occur. By understanding and visualizing this distribution, we can gain insights into the variability and likelihood of different outcomes in scenarios that fit this model.
Exercise 5 Applying the Negative Binomial Distribution to a Dataset
In this exercise, we will:
1. Generate a dataset with a known negative binomial distribution.
2. Apply the negative binomial distribution to estimate the parameters.
3. Visualize the actual vs. estimated distribution.
4. Interpret the results.

Step 1: Importing Necessary Libraries
In [17]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import nbinom
Step 2: Generating the Dataset
We'll use numpy
to generate a dataset with a known negative binomial distribution. This dataset will simulate the number of failures before a certain number of successes are achieved.
In [18]:
n_actual = 7 # actual number of successes
p_actual = 0.4 # actual probability of a success
size = 5000 # number of samples
dataset = np.random.negative_binomial(n_actual, p_actual, size)
Step 3: Estimating Parameters
We'll use the mean and variance of the dataset to estimate the parameters n (number of successes) and p (probability of success) for the negative binomial distribution.
Given:
• Mean = n * (1 - p) / p
• Variance = n * (1 - p) / p^2
We can rearrange the formulas to solve for n and p.
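Dividing the mean by the variance eliminates n, which yields the estimators used in the code below:

$$\hat{p} = \frac{\text{mean}}{\text{variance}}, \qquad \hat{n} = \frac{\text{mean} \cdot \hat{p}}{1 - \hat{p}}.$$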
In [19]:
mean = np.mean(dataset)
variance = np.var(dataset)
# Estimating p using the relationship between mean and variance
p_estimated = mean / variance
# Estimating n using the estimated p
n_estimated = mean * p_estimated / (1 - p_estimated)
Step 4: Visualization
We'll visualize the actual vs. estimated distribution using histograms and probability mass functions (PMFs).
In [20]:
# Plotting the actual dataset histogram
sns.histplot(dataset, bins=30, kde=False, label='Actual Data', color='blue', alpha=0.5)
# Plotting the estimated PMF
x = np.arange(0, max(dataset)+1)
plt.plot(x, nbinom.pmf(x, n_estimated, p_estimated) * size, 'o-', label='Estimated PMF', color='red')
plt.title('Actual vs. Estimated Negative Binomial Distribution')
plt.xlabel('Number of Failures before Successes')
plt.ylabel('Frequency')
plt.legend()
plt.show()
Interpretation
From the visualization, we can compare the actual data distribution with the estimated
negative binomial distribution. The red dots represent the estimated probability mass function (PMF) based on the parameters we derived from the dataset.
If the estimation is accurate, the red dots should align closely with the peaks of the blue histogram bars. Discrepancies between the two might suggest that the dataset doesn't perfectly follow a negative binomial distribution or that there's variability inherent in the sample.
Conclusion
By applying the negative binomial distribution to a generated dataset, we can estimate its parameters and visualize how well the estimated distribution fits the actual data. This exercise demonstrates the practical application of the negative binomial distribution in analyzing real-world datasets.
Poisson Distribution

Exercise 6 Understanding the Poisson Distribution
The Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space. These events must occur with a known constant mean rate and be independent of the time since the last event.
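For reference, the probability of observing exactly k events in the interval, given rate λ, is

$$P(X = k) = \frac{\lambda^{k} e^{-\lambda}}{k!}, \qquad k = 0, 1, 2, \dots$$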
Objective:
In this exercise, we will:
1. Generate random samples from a Poisson distribution.
2. Visualize the distribution.
3. Interpret the results.

Step 1: Importing Necessary Libraries
In [21]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
Step 2: Generating Random Samples
We'll use numpy to generate random samples from a Poisson distribution. The function np.random.poisson(lam, size) is used for this purpose, where:
• lam is the expected number of events in the interval (also known as the rate of occurrence, λ); the parameter is named lam because lambda is a reserved word in Python.
• size is the number of samples to generate.
In [22]:
lambda_val = 5 # expected number of events in the interval
size = 1000 # number of samples
samples = np.random.poisson(lambda_val, size)
Step 3: Visualization
We'll use seaborn
to visualize the distribution of our generated samples.
In [23]:
sns.histplot(samples, bins=30, kde=True)
plt.title('Poisson Distribution')
plt.xlabel('Number of Events')
plt.ylabel('Frequency')
plt.show()
Interpretation
From the visualization, we can observe the distribution of the number of events occurring in the fixed interval. The peak of the distribution indicates the most likely number of events to occur in the interval, given the expected rate of occurrence (λ).
The spread of the distribution provides insight into the variability of the number of events. A wider spread indicates greater variability, while a narrower spread suggests more consistency in the number of events in the interval.
Conclusion
The Poisson distribution is a useful tool for modeling the number of events that occur in a fixed interval of time or space. By understanding and visualizing this distribution, we can gain insights into the likelihood and variability of different outcomes in scenarios that fit this model.
Hypergeometric Distribution

Exercise 7 Understanding the Hypergeometric Distribution
The hypergeometric distribution is a discrete probability distribution that describes the
probability of k successes in n draws, without replacement, from a finite population of size N that contains exactly K successes.
For example, imagine you have a deck of cards, and you want to know the probability of drawing a certain number of aces in a fixed number of draws.
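In the notation above, the probability of drawing exactly k successes is

$$P(X = k) = \frac{\binom{K}{k} \binom{N-K}{n-k}}{\binom{N}{n}}.$$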
Objective:
In this exercise, we will:
1. Generate random samples from a hypergeometric distribution.
2. Visualize the distribution.
3. Interpret the results.

Step 1: Importing Necessary Libraries
In [24]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
Step 2: Generating Random Samples
We'll use numpy to generate random samples from a hypergeometric distribution. The function np.random.hypergeometric(NGood, NBad, nsample, size) is used for this purpose, where:
• NGood is the number of successes in the population.
• NBad is the number of failures in the population.
• nsample is the number of draws.
• size is the number of samples to generate.
In [25]:
NGood = 10 # number of successes in the population
NBad = 20 # number of failures in the population
nsample = 5 # number of draws
size = 1000 # number of samples
samples = np.random.hypergeometric(NGood, NBad, nsample, size)
Step 3: Visualization
We'll use seaborn
to visualize the distribution of our generated samples.
In [26]:
sns.histplot(samples, bins=30, kde=True)
plt.title('Hypergeometric Distribution')
plt.xlabel('Number of Successes in Sample')
plt.ylabel('Frequency')
plt.show()
Interpretation
From the visualization, we can observe the distribution of the number of successes in our sample. The peak of the distribution indicates the most likely number of successes to be drawn in the sample, given the number of successes and failures in the population.
The spread of the distribution provides insight into the variability of the number of successes. A wider spread indicates greater variability, while a narrower spread suggests more consistency in the number of successes in the sample.
Conclusion
The hypergeometric distribution is a useful tool for modeling the number of successes in a sample drawn without replacement from a finite population. By understanding and
visualizing this distribution, we can gain insights into the likelihood and variability of different outcomes in scenarios that fit this model.
Multivariate Hypergeometric Distribution

Exercise 8 Understanding the Multivariate Hypergeometric Distribution
The multivariate hypergeometric distribution is a generalization of the hypergeometric distribution. It describes probabilities when sampling without replacement from a population consisting of several classes. For instance, consider drawing cards from a deck and wanting to know the probability of drawing a certain number of each suit.
Objective:
In this exercise, we will:
1. Generate random samples from a multivariate hypergeometric distribution.
2. Visualize the distribution.
3. Interpret the results.

Step 1: Importing Necessary Libraries
In [27]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import hypergeom
Step 2: Generating Random Samples
We'll use scipy.stats to generate random samples from a hypergeometric distribution as an approximation to the multivariate case. Note that each suit is sampled from its marginal distribution independently, so the four counts within a single sample are not constrained to sum to the number of draws.
In [28]:
colors = [13, 13, 13, 13] # 13 cards of each suit in a deck: Hearts, Diamonds, Clubs, Spades
nsample = 10 # number of draws
size = 1000 # number of samples
M = sum(colors) # total number of cards
N = nsample # number of draws
# Generate samples for each suit
samples = np.array([hypergeom.rvs(M, color, N, size=size) for color in colors]).T
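As an aside, SciPy 1.4 and later also provide scipy.stats.multivariate_hypergeom, which samples the joint distribution exactly rather than suit by suit. A minimal sketch, reusing the colors, nsample, and size values defined above:

from scipy.stats import multivariate_hypergeom
# Each row is one joint draw; the four suit counts sum exactly to nsample
exact_samples = multivariate_hypergeom.rvs(colors, nsample, size=size)
print(exact_samples.shape)     # (1000, 4)
print(exact_samples[0].sum())  # 10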
Step 3: Visualization
We'll visualize the distribution of our generated samples for each class (suit in our example).
In [29]:
# Plot the distribution for each suit
suits = ['Hearts', 'Diamonds', 'Clubs', 'Spades']
for idx, suit in enumerate(suits):
    sns.histplot(samples[:, idx], bins=np.arange(-0.5, nsample + 1.5), kde=False, label=suit)
plt.title('Multivariate Hypergeometric Distribution')
plt.xlabel('Number of Cards Drawn')
plt.ylabel('Frequency')
plt.legend()
plt.show()
Interpretation
From the visualization, we can observe the distribution of the number of cards drawn for each suit. The histograms represent the likelihood of drawing a specific number of cards for each suit in the given number of draws.
The spread of each distribution provides insight into the variability of the number of cards drawn for each suit. A wider spread indicates greater variability, while a narrower spread suggests more consistency in the number of cards drawn for that suit.
Conclusion
The multivariate hypergeometric distribution is a powerful tool for modeling the number of items drawn from multiple classes in a sample without replacement. By understanding and visualizing this distribution, we can gain insights into the likelihood and variability of different outcomes in scenarios that fit this model.
Continuous Probability Distributions

Uniform Distribution

Exercise 9 Understanding the Uniform Distribution
The uniform distribution is a type of probability distribution in which all outcomes are equally likely. A deck of cards has a uniform distribution because the likelihood of drawing any particular card is the same.
In this exercise, we will simulate a scenario where we measure the time (in hours) it takes for a computer system to process a batch of tasks. We assume that the processing time is uniformly distributed between 2 and 10 hours.
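For reference, the uniform density is constant over the interval; with the bounds used below it is

$$f(x) = \frac{1}{\text{high} - \text{low}} = \frac{1}{10 - 2} = 0.125, \qquad 2 \le x \le 10.$$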
Objective:
In this exercise, we will:
1. Generate random samples from a uniform distribution.
2. Visualize the distribution.
3. Interpret the results.

Step 1: Importing Necessary Libraries
In [30]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
Step 2: Generating Random Samples
We'll use numpy to generate random samples from a uniform distribution. The function np.random.uniform(low, high, size) is used for this purpose, where:
• low is the lower boundary of the output interval.
• high is the upper boundary of the output interval.
• size is the number of samples to generate.
In [31]:
low = 2 # 2 hours
high = 10 # 10 hours
size = 1000 # number of samples
samples = np.random.uniform(low, high, size)
Step 3: Visualization
We'll use seaborn
to visualize the distribution of our generated samples.
In [32]:
sns.histplot(samples, bins=30, kde=True)
plt.title('Uniform Distribution of Processing Times')
plt.xlabel('Processing Time (hours)')
plt.ylabel('Frequency')
plt.show()
Interpretation
From the visualization, we can observe that the processing time for the tasks is uniformly distributed between 2 and 10 hours. This means that any specific time within this range is just as likely as any other, making it a fair and equal distribution.
In practical scenarios, a uniform distribution might not always be realistic, but it serves
as a useful starting point or baseline model in many situations.
Conclusion
The uniform distribution provides a model where every outcome in a specified range is
equally likely. By understanding and visualizing this distribution, we can gain insights into scenarios where all outcomes have an equal chance of occurring.
Normal Distribution

Exercise 10 Understanding the Normal Distribution
The normal distribution, also known as the Gaussian distribution, is a probability distribution that is symmetric about the mean, showing that data near the mean are more frequent in occurrence than data far from the mean.
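For reference, the normal density with mean μ and standard deviation σ is

$$f(x) = \frac{1}{\sigma \sqrt{2\pi}}\, e^{-\frac{(x - \mu)^{2}}{2\sigma^{2}}}.$$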
Imagine a scenario where we are analyzing the scores of students in a national examination. Typically, a large number of students will score around the average, while fewer students will score very high or very low. This distribution of scores often follows a normal distribution.
Objective:
In this exercise, we will:
1. Generate random samples from a normal distribution.
2. Visualize the distribution.
3. Interpret the results.

Step 1: Importing Necessary Libraries
In [33]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
Step 2: Generating Random Samples
We'll use numpy to generate random samples from a normal distribution. The function np.random.normal(mean, std, size) is used for this purpose, where:
• mean is the mean (center) of the distribution.
• std is the standard deviation (spread or width) of the distribution.
• size is the number of samples to generate.
In [34]:
mean = 50 # average score
std = 10 # standard deviation
size = 1000 # number of samples
samples = np.random.normal(mean, std, size)
Step 3: Visualization
We'll use seaborn
to visualize the distribution of our generated samples.
In [35]:
sns.histplot(samples, bins=30, kde=True)
plt.title('Normal Distribution of Examination Scores')
plt.xlabel('Score')
plt.ylabel('Frequency')
plt.show()
Interpretation
From the visualization, we can observe that the examination scores are normally distributed around the mean score of 50. The spread of the scores is determined by the standard deviation, which in this case is 10. This means that most students scored within a range of 40 to 60.
The bell shape of the normal distribution indicates that scores close to the mean are more frequent in occurrence than scores far from the mean. As we move further from the mean in either direction, the frequency of scores decreases, which is a characteristic property of the normal distribution.
Conclusion
The normal distribution is one of the most important and widely used distributions in statistics. It's essential in various fields, from finance to natural sciences. Understanding the properties and behavior of the normal distribution is crucial for anyone working with data.
Lognormal Distribution

Exercise 11 Understanding the Log-Normal Distribution
The log-normal distribution is a probability distribution of a random variable whose logarithm is normally distributed. It is useful in describing variables that are always positive and have a long tail, such as the distribution of incomes, stock prices, or even the size of particles generated by a process.
Imagine a scenario where we are analyzing the distribution of incomes in a city. While most people might earn an average income, there will be a few who earn significantly more, leading to a skewed distribution. The incomes in such scenarios can often be modeled using a log-normal distribution.
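For reference, if ln X is normally distributed with parameters μ and σ, the log-normal density is

$$f(x) = \frac{1}{x \sigma \sqrt{2\pi}}\, e^{-\frac{(\ln x - \mu)^{2}}{2\sigma^{2}}}, \qquad x > 0.$$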
Objective:
In this exercise, we will:
1. Generate random samples from a log-normal distribution.
2. Visualize the distribution.
3. Interpret the results.

Step 1: Importing Necessary Libraries
In [36]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
Step 2: Generating Random Samples
We'll use numpy to generate random samples from a log-normal distribution. The function np.random.lognormal(mean, sigma, size) is used for this purpose, where:
• mean is the mean of the logarithm of the distribution.
• sigma is the standard deviation of the logarithm of the distribution.
• size is the number of samples to generate.
In [37]:
mean = 0 # mean of the logarithm
sigma = 0.5 # standard deviation of the logarithm
size = 1000 # number of samples
samples = np.random.lognormal(mean, sigma, size)
Step 3: Visualization
We'll use seaborn
to visualize the distribution of our generated samples.
In [38]:
sns.histplot(samples, bins=50, kde=True)
plt.title('Log-Normal Distribution of Incomes')
plt.xlabel('Income')
plt.ylabel('Frequency')
plt.show()
Interpretation
From the visualization, we can observe that the incomes are log-normally distributed. Most people earn an average income, but there's a long tail on the right, indicating that there are a few people who earn significantly more. This right-skewed distribution is characteristic of the log-normal distribution.
The log-normal distribution is particularly useful for describing variables that can't take
negative values and have a skewed distribution. The long tail on the right indicates the
presence of outliers or extreme values that are significantly higher than the mean.
Conclusion
The log-normal distribution is a versatile tool for modeling skewed distributions in various fields. Understanding its properties and behavior is crucial for analyzing datasets where the majority of observations are clustered around the lower values, but
a few extreme values pull the mean upwards.
Gamma Distribution

Exercise 12 Understanding the Gamma Distribution
The gamma distribution is a continuous probability distribution that represents the waiting time until the k-th event in a Poisson process with a known average rate of occurrence. It's often used in various fields such as finance, insurance, and natural sciences to model continuous variables that are always positive and have skewed distributions.
Imagine a scenario where we are analyzing the time (in hours) it takes for a certain chemical reaction to complete k times. Given that the reaction follows a Poisson process, the waiting times can be modeled using a gamma distribution.
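For reference, the gamma density with shape k and scale θ is

$$f(x) = \frac{x^{k-1} e^{-x/\theta}}{\Gamma(k)\, \theta^{k}}, \qquad x > 0.$$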
Objective:
In this exercise, we will:
1. Generate random samples from a gamma distribution.
2. Visualize the distribution.
3. Interpret the results.

Step 1: Importing Necessary Libraries
In [39]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
Step 2: Generating Random Samples
We'll use numpy to generate random samples from a gamma distribution. The function np.random.gamma(shape, scale, size) is used for this purpose, where:
• shape (often denoted as k) is the shape parameter, which is the number of events.
• scale (often denoted as θ) is the scale parameter, which is the average interval between events.
• size is the number of samples to generate.
In [40]:
shape = 2 # number of events
scale = 1 # average interval between events
size = 1000 # number of samples
samples = np.random.gamma(shape, scale, size)
Step 3: Visualization
We'll use seaborn
to visualize the distribution of our generated samples.
In [41]:
sns.histplot(samples, bins=50, kde=True)
plt.title('Gamma Distribution of Waiting Times')
plt.xlabel('Waiting Time (hours)')
plt.ylabel('Frequency')
plt.show()
Interpretation

From the visualization, we can observe that the waiting times are gamma distributed. Most of the reactions take a certain average time to complete, but there's a tail on the right, indicating that some reactions take significantly longer. This right skew is characteristic of the gamma distribution; it is most pronounced for small shape parameters and diminishes as the shape parameter increases.
The gamma distribution is particularly useful for modeling the amount of time until the
next event in scenarios where events occur at a known average rate. The shape and scale parameters determine the form and spread of the distribution, allowing it to model a wide range of scenarios.
Conclusion
The gamma distribution is a powerful tool for modeling waiting times in various fields. Understanding its properties and behavior is crucial for analyzing datasets where the time until the next event is of interest, especially in scenarios that follow a Poisson process.
Exponential Distribution

Exercise 13 Understanding the Exponential Distribution
The exponential distribution is a continuous probability distribution that represents the
time between events in a Poisson process. It's often used to model the time between rare events, such as the time between customer arrivals or the time between equipment failures.
Imagine a scenario where we are analyzing the time (in hours) between successive breakdowns of a machine in a factory. Given that the breakdowns follow a Poisson process, the time intervals between these breakdowns can be modeled using an exponential distribution.
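For reference, with scale β = 1/λ (the average time between events), the exponential density is

$$f(x) = \frac{1}{\beta}\, e^{-x/\beta}, \qquad x \ge 0.$$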
Objective:
In this exercise, we will:
1. Generate random samples from an exponential distribution.
2. Visualize the distribution.
3. Interpret the results.

Step 1: Importing Necessary Libraries
In [42]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
Step 2: Generating Random Samples
We'll use numpy to generate random samples from an exponential distribution. The function np.random.exponential(scale, size) is used for this purpose, where:
• scale (often denoted as β) is the inverse of the rate parameter (λ) and represents the average time between events.
• size is the number of samples to generate.
In [43]:
scale = 5 # average time (in hours) between breakdowns
size = 1000 # number of samples
samples = np.random.exponential(scale, size)
Step 3: Visualization
We'll use seaborn
to visualize the distribution of our generated samples.
In [44]:
sns.histplot(samples, bins=50, kde=True)
plt.title('Exponential Distribution of Time Between Breakdowns')
plt.xlabel('Time (hours)')
plt.ylabel('Frequency')
plt.show()
Step 4: Interpretation
From the visualization, we can observe that the time intervals between machine breakdowns are exponentially distributed. Most of the breakdowns occur within a shorter time frame, but there's a long tail on the right, indicating that occasionally, the
machine can operate for a significantly longer time without breaking down. This decreasing nature is characteristic of the exponential distribution.
The exponential distribution is particularly useful for modeling the time between
events in scenarios where events occur independently and at a constant average rate. The scale parameter determines the average time between events, which in turn shapes the distribution.
Conclusion
The exponential distribution is a key tool for modeling the time between events in various fields. Understanding its properties and behavior is crucial for analyzing datasets where the time until the next event is of interest, especially in scenarios that follow a Poisson process.
Revised Date: October 28, 2023