assignment12fda
Northeastern University, Course 6400 (Industrial Engineering)
pdf, 10 pages, uploaded by vishalbunty01 on Jan 9, 2024
assignment12fda (November 3, 2023)

Question 1: Monty Hall Problem Simulation and Analysis

Task 1: Data Loading and Preprocessing

[2]:
```python
import pandas as pd
import numpy as np

np.random.seed(999)

def simulate_monty_hall(num_trials):
    doors = ['A', 'B', 'C']
    results = []
    for i in range(num_trials):
        car_location = np.random.choice(doors)
        initial_choice = np.random.choice(doors)
        # Monty opens a door that is neither the contestant's pick nor the car
        remaining_doors = [door for door in doors
                           if door != initial_choice and door != car_location]
        monty_reveal = np.random.choice(remaining_doors)
        # Simulate equal probability of sticking or switching
        final_decision = np.random.choice(['Stick', 'Switch'])
        if final_decision == 'Stick':
            win = 1 if initial_choice == car_location else 0
        else:
            switch_to = [door for door in doors
                         if door != initial_choice and door != monty_reveal][0]
            win = 1 if switch_to == car_location else 0
        results.append([i + 1, initial_choice, monty_reveal,
                        car_location, final_decision, win])
    return pd.DataFrame(results,
                        columns=['trial', 'initial_choice', 'monty_reveal',
                                 'actual_car_location', 'final_decision', 'win'])

df = simulate_monty_hall(1000)
df.to_csv('monty_hall_trials.csv', index=False)
```
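The trial loop above can also be vectorized with NumPy, which makes much larger runs cheap. A minimal sketch (variable names and the trial count are my own), exploiting the fact that sticking wins exactly when the initial pick is the car, and switching wins exactly when it is not:

```python
import numpy as np

rng = np.random.default_rng(999)
n = 100_000
car = rng.integers(0, 3, size=n)    # car location per trial (0, 1, 2 for doors A, B, C)
pick = rng.integers(0, 3, size=n)   # contestant's initial pick

# Sticking wins iff the first pick was right; switching wins iff it was wrong,
# because Monty's reveal leaves the car behind the single remaining door.
p_stick = np.mean(pick == car)
p_switch = np.mean(pick != car)
print(p_stick, p_switch)            # close to 1/3 and 2/3 respectively
```

This avoids simulating Monty's reveal explicitly; the reveal only matters through the switch/stick logic, which collapses to the two comparisons above.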
[44]:
```python
import matplotlib.pyplot as plt

df = pd.read_csv('monty_hall_trials.csv')

# Basic data-quality checks: missing values, impossible rows, summary stats.
# A row is inconsistent if Monty "revealed" the door hiding the car.
missing_data = df.isnull().sum()
inconsistent_data = (df['actual_car_location'] == df['monty_reveal']).sum()
summary = df.describe()

print(missing_data)
print("\nInconsistent rows:", inconsistent_data)
print("\nSummary:")
print(summary)
```

Output:

```
trial                  0
initial_choice         0
monty_reveal           0
actual_car_location    0
final_decision         0
win                    0
dtype: int64

Inconsistent rows: 0

Summary:
             trial          win
count  1000.000000  1000.000000
mean    500.500000     0.515000
std     288.819436     0.500025
min       1.000000     0.000000
25%     250.750000     0.000000
50%     500.500000     1.000000
75%     750.250000     1.000000
max    1000.000000     1.000000
```

Task 2: Simulation Analysis

[46]:
```python
stick_total = (df['final_decision'] == 'Stick').sum()
switch_total = (df['final_decision'] == 'Switch').sum()
stick_w = df[df['final_decision'] == 'Stick']['win'].sum()
switch_w = df[df['final_decision'] == 'Switch']['win'].sum()

# Condition on the strategy actually played, not on all trials,
# so these are P(win | Stick) and P(win | Switch)
probability_stick = stick_w / stick_total
probability_switch = switch_w / switch_total

print("Probability of Winning with Stick:", probability_stick)
print("Probability of Winning with Switch:", probability_switch)
```

The original cell divided both win counts by all 1000 trials and printed 0.16 and 0.355; those are the joint probabilities P(win and Stick) and P(win and Switch), not conditional win rates. Conditioned on the strategy actually played, the win rates come out near the theoretical 1/3 and 2/3.

Task 3: Visualization

[48]:
```python
strategies = ['Stick', 'Switch']   # avoid shadowing the builtin name `str`
win_p = [probability_stick, probability_switch]

plt.bar(strategies, win_p, color=['red', 'skyblue'])
plt.xlabel('Strategy')
plt.ylabel('Winning Probability')
plt.show()
```

Task 4: Interpretation

Probability of winning by sticking: 1/3. Probability of winning by switching: 2/3. Sticking wins only when the initial pick was the car, which happens 1/3 of the time; switching wins in every other case, because Monty's reveal leaves the car behind the one remaining door.

Question 2: Poisson Process Analysis of Website Hits

Task 1: Data Loading and Preprocessing
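The hourly counts below are drawn directly from a Poisson distribution. As background, the same process can equivalently be simulated from its exponential interarrival times, which is useful when individual hit timestamps matter. A sketch under that equivalence (variable names and the seed are my own):

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 6.0   # rate: hits per hour
T = 24      # observation window in hours

# Gaps between events in a Poisson(lam) process are Exponential with mean 1/lam;
# oversample gaps, accumulate arrival times, and keep those inside the window.
gaps = rng.exponential(scale=1 / lam, size=int(lam * T * 3))
arrivals = np.cumsum(gaps)
arrivals = arrivals[arrivals < T]

# Binning the arrival times by hour recovers Poisson-distributed hourly counts.
hits_per_hour, _ = np.histogram(arrivals, bins=np.arange(0, T + 1))
print(hits_per_hour)
```

The oversampling factor of 3 is a safety margin so that the cumulative arrival times comfortably cover the 24-hour window before truncation.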
[8]:
```python
import pandas as pd
import numpy as np

np.random.seed(12345)

# Simulate 24 hourly intervals with Poisson-distributed hit counts (rate = 6/hour)
hits_per_hour = np.random.poisson(lam=6, size=24)
time_intervals = [f"{i}-{i + 1}" for i in range(24)]

df = pd.DataFrame({'time_interval': time_intervals, 'hits': hits_per_hour})
df.to_csv('website_hits.csv', index=False)
```

[50]:
```python
import pandas as pd

df_hits = pd.read_csv('website_hits.csv')
missing_data = df_hits.isnull().sum()
summary = df_hits.describe()

print(missing_data)
print("Summary:")
print(summary)
```

Output:

```
time_interval    0
hits             0
dtype: int64
Summary:
            hits
count  24.000000
mean    6.500000
std     2.484736
min     2.000000
25%     5.000000
50%     6.000000
75%     8.000000
max    12.000000
```

Task 2: Poisson Distribution Fitting

[52]:
```python
mean_hr = df_hits['hits'].mean()

# Draw a comparison sample from a Poisson distribution with the fitted rate.
# (Note: these are random draws, not true expected values; the expected count
# under the fitted model is simply mean_hr for every interval.)
exp_hits = [np.random.poisson(lam=mean_hr) for _ in range(24)]

print("Mean hits per hour:", mean_hr)
print("Sampled hits:", exp_hits)
```

Output:

```
Mean hits per hour: 6.5
Sampled hits: [8, 6, 9, 7, 3, 6, 6, 5, 6, 11, 5, 3, 5, 8, 2, 7, 5, 13, 6, 5, 5, 7, 3, 7]
```

Task 3: Visualization

[53]:
```python
ti = df_hits['time_interval']
obs_hits = df_hits['hits']

fig, ax = plt.subplots(figsize=(12, 6))
plt.bar(ti, obs_hits, label='Observed Hits', color='red', alpha=0.7)
plt.bar(ti, exp_hits, label='Expected Hits', color='skyblue', alpha=0.5)
plt.xlabel('Time Interval')
plt.ylabel('Hits')
plt.title('Observed vs Expected Hits')
plt.legend()
plt.xticks(rotation=45)
plt.show()
```

Task 4: Hypothesis Testing

[54]:
```python
from scipy.stats import chisquare, poisson

# Goodness-of-fit: bin the hourly counts and compare observed bin frequencies
# with those expected under Poisson(mean_hr). The original cell applied
# chi2_contingency to a crosstab of time_interval vs. hits, which tests
# independence of two categorical variables, not Poisson fit.
bins = [(0, 4), (5, 5), (6, 6), (7, 7), (8, 100)]   # merge sparse tails
obs = np.array([df_hits['hits'].between(lo, hi).sum() for lo, hi in bins])
exp = np.array([poisson.cdf(hi, mean_hr) - poisson.cdf(lo - 1, mean_hr)
                for lo, hi in bins]) * len(df_hits)
exp *= obs.sum() / exp.sum()        # match totals exactly, as chisquare requires

chi2, p_value = chisquare(obs, f_exp=exp, ddof=1)   # 1 df lost estimating the rate

alpha = 0.05
if p_value <= alpha:
    decision = "Reject the null hypothesis"
    interpretation = ("The observed hits significantly differ from a Poisson "
                      "distribution with the calculated mean rate.")
else:
    decision = "Do not reject the null hypothesis"
    interpretation = ("The observed hits do not significantly differ from a "
                      "Poisson distribution with the calculated mean rate.")

print({"decision": decision, "chi2": chi2, "p_value": p_value,
       "interpretation": interpretation})
```

The original run of the independence test printed chi2 = 552.0, p = 0.2365 and "Do not reject the null hypothesis"; note those numbers do not measure goodness of fit, which is why the cell above replaces the test.

Question 3: Bayesian Analysis of Product Review Sentiments

Task 1: Data Loading and Preprocessing

[27]:
```python
import pandas as pd
import numpy as np

np.random.seed(56789)

# Generating review sentiments
sentiments = ["Positive", "Neutral", "Negative"]
probabilities = [0.55, 0.25, 0.2]
reviews_count = 1000
generated_sentiments = np.random.choice(sentiments, size=reviews_count,
                                        p=probabilities)

review_texts = [
    "Loved it! Amazing product.",
    "It's okay. Does the job.",
    "Not what I expected. Disappointed.",
    "Works like a charm!",
    "Mediocre experience.",
    "Wouldn't recommend to anyone.",
]

df = pd.DataFrame({
    'review_id': range(1, reviews_count + 1),
    'text': np.random.choice(review_texts, reviews_count),
    'sentiment': generated_sentiments,
})
df.to_csv('product_reviews.csv', index=False)
```

[56]:
```python
import pandas as pd

df_reviews = pd.read_csv('product_reviews.csv')
missing_data = df_reviews.isnull().sum()
sentiment_counts = df_reviews['sentiment'].value_counts(normalize=True)

print(missing_data)
print("Sentiment Distribution")
print(sentiment_counts)
```

(The original cell printed `missing_data` twice; the duplicate print is removed.)

Output:

```
review_id    0
text         0
sentiment    0
dtype: int64
Sentiment Distribution
Positive    0.525
Neutral     0.272
Negative    0.203
Name: sentiment, dtype: float64
```

Task 2: Bayesian Updating
[58]:
```python
pr_pos = 0.5
pr_neu = 0.3
pr_neg = 0.2

# Observed sentiment frequencies, used here as the "likelihoods".
# (The original cell also computed obs_*_prob via len() ratios; those are
# identical to the means below, so the duplicates are removed.)
like_pos = (df_reviews['sentiment'] == 'Positive').mean()
like_neu = (df_reviews['sentiment'] == 'Neutral').mean()
like_neg = (df_reviews['sentiment'] == 'Negative').mean()

# Unnormalized posteriors: prior times likelihood, then normalize
unnor_post_pos = pr_pos * like_pos
unnor_post_neu = pr_neu * like_neu
unnor_post_neg = pr_neg * like_neg
nor_fac = unnor_post_pos + unnor_post_neu + unnor_post_neg

post_pos = unnor_post_pos / nor_fac
post_neu = unnor_post_neu / nor_fac
post_neg = unnor_post_neg / nor_fac

print(like_pos)
print(like_neu)
print(like_neg)
print("\nPosterior Probabilities:")
print("Positive:", post_pos)
print("Neutral:", post_neu)
print("Negative:", post_neg)
```

Output:

```
0.525
0.272
0.203

Posterior Probabilities:
Positive: 0.6823498830257343
Neutral: 0.2121133350662854
Negative: 0.10553678190798024
```

Task 3: Visualization

[59]:
```python
prior_beliefs = [pr_pos, pr_neu, pr_neg]
observed_frequencies = [like_pos, like_neu, like_neg]
posterior_probabilities = [post_pos, post_neu, post_neg]

sentiments = ['Positive', 'Neutral', 'Negative']
x = np.arange(len(sentiments))
width = 0.2

plt.bar(x - width, prior_beliefs, width=width, label='Prior Beliefs')
plt.bar(x, observed_frequencies, width=width, label='Observed Frequencies')
plt.bar(x + width, posterior_probabilities, width=width,
        label='Posterior Probabilities')
plt.xlabel('Sentiment Category')
plt.ylabel('Probability')
plt.title('Sentiment Analysis')
plt.xticks(x, sentiments)
plt.legend()
plt.tight_layout()
plt.show()
```

Task 4: Interpretation
The shift from prior to posterior beliefs shows how the observed reviews have updated the company's view of customer sentiment. Here the posterior probability of Positive sentiment rose from the prior of 0.5 to about 0.68, while Neutral and Negative both fell, so the perceived quality of the product is higher than the initial assumptions implied: the product appears well received by customers.
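The update above multiplies prior probabilities by observed frequencies and renormalizes, which is a heuristic. A more standard treatment of unknown category probabilities is the conjugate Dirichlet-multinomial model, where the prior is expressed as pseudo-counts and the posterior is simply prior pseudo-counts plus observed counts. A sketch using this notebook's priors and observed counts (the prior strength of 10 pseudo-observations is my assumption, not from the assignment):

```python
import numpy as np

prior = np.array([0.5, 0.3, 0.2])   # Positive, Neutral, Negative prior beliefs
alpha0 = prior * 10                 # prior encoded as 10 pseudo-observations (assumed)
counts = np.array([525, 272, 203])  # observed sentiment counts out of 1000 reviews

alpha_post = alpha0 + counts        # Dirichlet posterior parameters
post_mean = alpha_post / alpha_post.sum()
print(post_mean.round(4))           # posterior mean, approx. [0.525, 0.272, 0.203]
```

With 1000 observations a prior this weak barely moves the posterior away from the observed frequencies; a larger prior strength would pull it toward the prior beliefs.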