assignment12fda
November 3, 2023
Question 1: Monty Hall Problem Simulation and Analysis
Task 1: Data Loading and Preprocessing
[2]:
import pandas as pd
import numpy as np

np.random.seed(999)

def simulate_monty_hall(num_trials):
    doors = ['A', 'B', 'C']
    results = []
    for i in range(num_trials):
        # Place the car and let the contestant pick a door at random.
        car_location = np.random.choice(doors)
        initial_choice = np.random.choice(doors)
        # Monty opens a door that is neither the contestant's pick nor the car.
        remaining_doors = [door for door in doors
                           if door != initial_choice and door != car_location]
        monty_reveal = np.random.choice(remaining_doors)
        # Simulating equal probability of sticking or switching
        final_decision = np.random.choice(['Stick', 'Switch'])
        if final_decision == 'Stick':
            win = 1 if initial_choice == car_location else 0
        else:
            switch_to = [door for door in doors
                         if door != initial_choice and door != monty_reveal][0]
            win = 1 if switch_to == car_location else 0
        results.append([i + 1, initial_choice, monty_reveal, car_location,
                        final_decision, win])
    return pd.DataFrame(results, columns=['trial', 'initial_choice', 'monty_reveal',
                                          'actual_car_location', 'final_decision', 'win'])

df = simulate_monty_hall(1000)
df.to_csv('monty_hall_trials.csv', index=False)
[44]:
import matplotlib.pyplot as plt

df = pd.read_csv('monty_hall_trials.csv')
missing_data = df.isnull().sum()
# Monty should never reveal the door hiding the car.
inconsistent_data = (df['actual_car_location'] == df['monty_reveal']).sum()
summary = df.describe()
print(missing_data)
print("\nInconsistent", inconsistent_data)
print("\nSummary:")
print(summary)
trial                  0
initial_choice         0
monty_reveal           0
actual_car_location    0
final_decision         0
win                    0
dtype: int64

Inconsistent 0

Summary:
             trial          win
count  1000.000000  1000.000000
mean    500.500000     0.515000
std     288.819436     0.500025
min       1.000000     0.000000
25%     250.750000     0.000000
50%     500.500000     1.000000
75%     750.250000     1.000000
max    1000.000000     1.000000
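Beyond the missing-value count and the reveal-versus-car check above, a couple of further consistency checks can be run on the simulated trials. The sketch below assumes the DataFrame df and the column names created in the simulation cell are still in memory; it verifies that Monty never opens the contestant's own door and that each win flag agrees with the recorded locations and decision.

# Additional consistency checks on the simulated trials (sketch).
# Monty should never reveal the door the contestant initially chose.
reveal_equals_choice = (df['monty_reveal'] == df['initial_choice']).sum()

# The win flag should follow from the recorded locations and the final decision:
# sticking wins iff the initial pick was the car; switching wins iff it was not.
stick_rows = df['final_decision'] == 'Stick'
expected_win = np.where(
    stick_rows,
    (df['initial_choice'] == df['actual_car_location']).astype(int),
    (df['initial_choice'] != df['actual_car_location']).astype(int),
)
win_mismatches = (df['win'] != expected_win).sum()

print("Reveal equals initial choice:", reveal_equals_choice)
print("Win flag mismatches:", win_mismatches)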
Task 2: Simulation Analysis
[46]:
tot = len(df)
stick_w = df[df['final_decision'] == 'Stick']['win'].sum()
switch_w = df[df['final_decision'] == 'Switch']['win'].sum()
probability_stick = stick_w / tot
probability_switch = switch_w / tot
print("Probability of Winning with Stick:", probability_stick)
print("Probability of Winning with Switch:", probability_switch)
Probability of Winning with Stick: 0.16
Probability of Winning with Switch: 0.355
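Both figures above divide each strategy's wins by all 1000 trials, so each is roughly halved by the random Stick/Switch split. Conditioning on the strategy actually played, as in the short sketch below (it reuses df from Task 1), recovers win rates close to the theoretical 1/3 and 2/3.

# Win rate conditioned on the strategy actually played in each trial (sketch).
stick_trials = df[df['final_decision'] == 'Stick']
switch_trials = df[df['final_decision'] == 'Switch']

p_win_given_stick = stick_trials['win'].mean()
p_win_given_switch = switch_trials['win'].mean()

print("P(win | Stick): ", p_win_given_stick)
print("P(win | Switch):", p_win_given_switch)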
Task 3: Visualization
[48]:
# Renamed from `str` to avoid shadowing the builtin.
strategies = ['Stick', 'Switch']
win_p = [probability_stick, probability_switch]
plt.bar(strategies, win_p, color=['red', 'skyblue'])
plt.xlabel('Strategy')
plt.ylabel('Winning Probability')
plt.show()
Task 4: Interpretation
The theoretical probability of winning by sticking with the initial choice is 1/3, because the first pick lands on the car one time in three. Switching wins exactly when the first pick was wrong, so its probability of winning is 2/3. Conditioned on the strategy actually played, the simulated win rates are consistent with these values.
Question 2: Poisson Process Analysis of Website Hits
Task 1: Data Loading and Preprocessing
[8]:
import pandas as pd
import numpy as np

np.random.seed(12345)
hits_per_hour = np.random.poisson(lam=6, size=24)
time_intervals = [f"{i}-{i+1}" for i in range(24)]
df = pd.DataFrame({
    'time_interval': time_intervals,
    'hits': hits_per_hour
})
df.to_csv('website_hits.csv', index=False)
[50]:
import pandas as pd

df_hits = pd.read_csv('website_hits.csv')
missing_data = df_hits.isnull().sum()
summary = df_hits.describe()
print(missing_data)
print("Summary:")
print(summary)
time_interval    0
hits             0
dtype: int64
Summary:
            hits
count  24.000000
mean    6.500000
std     2.484736
min     2.000000
25%     5.000000
50%     6.000000
75%     8.000000
max    12.000000
Task 2: Poisson Distribution Fitting
[52]:
mean_hr = df_hits['hits'].mean()
exp_hits = [np.random.poisson(lam=mean_hr) for _ in range(24)]
print("Mean HR:", mean_hr)
print("Exp Hits:", exp_hits)
Mean HR: 6.5
Exp Hits: [8, 6, 9, 7, 3, 6, 6, 5, 6, 11, 5, 3, 5, 8, 2, 7, 5, 13, 6, 5, 5, 7,
3, 7]
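Drawing a fresh Poisson sample, as above, produces one random realization rather than the expected value for each hour. As an alternative sketch (assuming scipy is available; the variable names are illustrative), the fitted rate mean_hr itself is the expected count for every interval, and scipy.stats.poisson gives the expected number of hours showing each hit count, which is the quantity a goodness-of-fit comparison would use.

from scipy.stats import poisson

# Deterministic "expected" values under Poisson(lam = mean_hr) (sketch).
lam = mean_hr

# Expected hits in each hourly interval is simply the fitted rate.
expected_per_interval = [lam] * 24

# Expected number of hours showing k hits, for k = 0 .. max observed.
max_hits = int(df_hits['hits'].max())
expected_per_count = [24 * poisson.pmf(k, lam) for k in range(max_hits + 1)]

print("Expected hits per interval:", expected_per_interval[:3], "...")
print("Expected hours with k hits:", [round(e, 2) for e in expected_per_count])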
Task 3: Visualization
[53]:
ti = df_hits['time_interval']
obs_hits = df_hits['hits']
fig, ax = plt.subplots(figsize=(12, 6))
plt.bar(ti, obs_hits, label='Observed Hits', color='red', alpha=0.7)
plt.bar(ti, exp_hits, label='Expected Hits', color='skyblue', alpha=0.5)
plt.xlabel('Time Interval')
plt.ylabel('Hits')
plt.title('Obs vs Exp Hits')
plt.legend()
plt.xticks(rotation=45)
plt.show()
Task 4: Hypothesis Testing
[54]:
from scipy.stats import chi2_contingency

contingency_table = pd.crosstab(df_hits['time_interval'],
                                columns=[df_hits['hits'], df_hits['time_interval']])
chi2, p_value, _, expected = chi2_contingency(contingency_table)
alpha = 0.05
if p_value <= alpha:
    result = {
        "decision": "Reject the null hypothesis",
        "chi2": chi2,
        "p_value": p_value,
        "interpretation": "The observed hits significantly differ from a Poisson distribution with the calculated mean rate."
    }
else:
    result = {
        "decision": "Do not reject the null hypothesis",
        "chi2": chi2,
        "p_value": p_value,
        "interpretation": "The observed hits do not significantly differ from a Poisson distribution with the calculated mean rate."
    }
print(result)
{'decision': 'Do not reject the null hypothesis', 'chi2': 552.0000000000002,
'p_value': 0.23652183054456624, 'interpretation': 'The observed hits do not
significantly differ from a Poisson distribution with the calculated mean
rate.'}
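Note that the crosstab above pairs each hourly interval with its own hit count, so every cell holds a single observation and the contingency test has little power as a check of the Poisson assumption. A more conventional chi-square goodness-of-fit is sketched below, under the assumption that scipy is available and that df_hits and its columns are still in memory; in practice, bins with expected counts below about 5 would be merged before testing.

from scipy.stats import poisson, chisquare
import numpy as np

# Chi-square goodness-of-fit of the 24 hourly counts against Poisson(lam) (sketch).
lam = df_hits['hits'].mean()
values = np.arange(df_hits['hits'].max() + 1)

# Observed number of hours with each hit count.
observed = df_hits['hits'].value_counts().reindex(values, fill_value=0).to_numpy()

# Expected number of hours with each hit count under the fitted Poisson,
# rescaled so observed and expected totals match.
expected = poisson.pmf(values, lam) * len(df_hits)
expected = expected * observed.sum() / expected.sum()

# ddof=1 accounts for estimating lam from the same data.
chi2_stat, p_val = chisquare(f_obs=observed, f_exp=expected, ddof=1)
print("chi2:", chi2_stat, "p-value:", p_val)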
Question 3: Bayesian Analysis of Product Review Sentiments
Task 1: Data Loading and Preprocessing
[27]:
import pandas as pd
import numpy as np

np.random.seed(56789)

# Generating review sentiments
sentiments = ["Positive", "Neutral", "Negative"]
probabilities = [0.55, 0.25, 0.2]
reviews_count = 1000
generated_sentiments = np.random.choice(sentiments, size=reviews_count, p=probabilities)

review_texts = [
    "Loved it! Amazing product.",
    "It's okay. Does the job.",
    "Not what I expected. Disappointed.",
    "Works like a charm!",
    "Mediocre experience.",
    "Wouldn't recommend to anyone."
]

df = pd.DataFrame({
    'review_id': range(1, reviews_count + 1),
    'text': np.random.choice(review_texts, reviews_count),
    'sentiment': generated_sentiments
})
df.to_csv('product_reviews.csv', index=False)
[56]:
import pandas as pd

df_reviews = pd.read_csv('product_reviews.csv')
missing_data = df_reviews.isnull().sum()
print(missing_data)
sentiment_counts = df_reviews['sentiment'].value_counts(normalize=True)
print("Sentiment Distribution")
print(sentiment_counts)
review_id    0
text         0
sentiment    0
dtype: int64
Sentiment Distribution
Positive    0.525
Neutral     0.272
Negative    0.203
Name: sentiment, dtype: float64
Task 2: Bayesian Updating
[58]:
pr_pos = 0.5
pr_neu = 0.3
pr_neg = 0.2

tot_len = len(df_reviews)

like_pos = (df_reviews['sentiment'] == 'Positive').mean()
like_neu = (df_reviews['sentiment'] == 'Neutral').mean()
like_neg = (df_reviews['sentiment'] == 'Negative').mean()

obs_pos_prob = len(df_reviews[df_reviews['sentiment'] == "Positive"]) / tot_len
obs_neu_prob = len(df_reviews[df_reviews['sentiment'] == "Neutral"]) / tot_len
obs_neg_prob = len(df_reviews[df_reviews['sentiment'] == "Negative"]) / tot_len

unnor_post_pos = pr_pos * like_pos
unnor_post_neu = pr_neu * like_neu
unnor_post_neg = pr_neg * like_neg

nor_fac = unnor_post_pos + unnor_post_neu + unnor_post_neg

post_pos = unnor_post_pos / nor_fac
post_neu = unnor_post_neu / nor_fac
post_neg = unnor_post_neg / nor_fac

print(obs_pos_prob)
print(obs_neu_prob)
print(obs_neg_prob)
print("\nPosterior Probabilities:")
print("Positive:", post_pos)
print("Neutral:", post_neu)
print("Negative:", post_neg)
0.525
0.272
0.203
Posterior Probabilities:
Positive: 0.6823498830257343
Neutral: 0.2121133350662854
Negative: 0.10553678190798024
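As a check on the updating step, the Positive posterior follows directly from Bayes' rule applied to the values above: posterior(Positive) = (0.5 × 0.525) / (0.5 × 0.525 + 0.3 × 0.272 + 0.2 × 0.203) = 0.2625 / 0.3847 ≈ 0.682, matching the printed value. Likewise, the Neutral and Negative posteriors are 0.0816 / 0.3847 ≈ 0.212 and 0.0406 / 0.3847 ≈ 0.106.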
Task 3: Visualization
[59]:
prior_beliefs = [pr_pos, pr_neu, pr_neg]
observed_frequencies = [like_pos, like_neu, like_neg]
posterior_probabilities = [post_pos, post_neu, post_neg]

sentiments = ['Positive', 'Neutral', 'Negative']
x = np.arange(len(sentiments))
width = 0.2

plt.bar(x - width, prior_beliefs, width=0.2, label='Prior Beliefs')
plt.bar(x, observed_frequencies, width=0.2, label='Observed Frequencies')
plt.bar(x + width, posterior_probabilities, width=0.2, label='Posterior Probabilities')
plt.xlabel('Sentiment Category')
plt.ylabel('Probability')
plt.title('Sentiment Analysis')
plt.xticks(x, sentiments)
plt.legend()
plt.tight_layout()
plt.show()
Task 4: Interpretation
The shift from prior to posterior beliefs shows how the observed reviews update the company's view of customer sentiment. The prior probability of a Positive review (0.5) rises to a posterior of roughly 0.68, while the Neutral and Negative posteriors fall to about 0.21 and 0.11. Because the posterior probability of Positive sentiment increases substantially, the data suggest the product is better received than the initial assumptions implied.