School: Northeastern University
Course: 6400
Subject: Industrial Engineering
Date: Jan 9, 2024
Uploaded by vishalbunty01
assignment7fda
October 15, 2023
Question 1: Load the Dataset
[2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('all_month.csv')
df.head()
[2]:
                       time   latitude   longitude  depth   mag magType   nst  \
0  2023-10-14T00:40:16.200Z  19.340666 -155.118332  -0.64  1.91      ml  39.0
1  2023-10-14T00:29:02.360Z  37.570000 -119.555336  14.76  2.18      md  12.0
2  2023-10-14T00:13:31.008Z  64.205700 -150.031900  15.60  1.50      ml   NaN
3  2023-10-13T23:56:32.410Z  17.989333  -66.946667  13.40  2.24      md   8.0
4  2023-10-13T23:53:12.670Z  17.966333  -66.943000  12.78  3.08      md  28.0

     gap     dmin   rms  ...                   updated  \
0  132.0      NaN  0.25  ...  2023-10-14T00:45:48.260Z
1  131.0  0.21210  0.04  ...  2023-10-14T00:35:17.439Z
2    NaN      NaN  0.77  ...  2023-10-14T00:15:19.771Z
3  157.0  0.06571  0.13  ...  2023-10-14T00:16:37.480Z
4  183.0  0.03322  0.17  ...  2023-10-14T00:52:32.381Z

                            place        type  horizontalError  depthError  \
0  13 km S of Fern Forest, Hawaii  earthquake             0.46        0.18
1  19 km S of Yosemite Valley, CA  earthquake             0.39        0.97
2        41 km W of Clear, Alaska  earthquake              NaN        0.40
3     3 km W of Fuig, Puerto Rico  earthquake             1.01        0.57
4    3 km SW of Fuig, Puerto Rico  earthquake             0.53        0.25

   magError  magNst     status locationSource magSource
0  0.420000     6.0  automatic             hv        hv
1  0.200000     4.0  automatic             nc        nc
2       NaN     NaN  automatic             ak        ak
3  0.108381     8.0   reviewed             pr        pr
4  0.126826    11.0   reviewed             pr        pr

[5 rows x 22 columns]
Question 2: Summary and Info
[3]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9995 entries, 0 to 9994
Data columns (total 22 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   time             9995 non-null   object
 1   latitude         9995 non-null   float64
 2   longitude        9995 non-null   float64
 3   depth            9995 non-null   float64
 4   mag              9994 non-null   float64
 5   magType          9994 non-null   object
 6   nst              7359 non-null   float64
 7   gap              7359 non-null   float64
 8   dmin             6036 non-null   float64
 9   rms              9995 non-null   float64
 10  net              9995 non-null   object
 11  id               9995 non-null   object
 12  updated          9995 non-null   object
 13  place            9995 non-null   object
 14  type             9995 non-null   object
 15  horizontalError  6674 non-null   float64
 16  depthError       9995 non-null   float64
 17  magError         7327 non-null   float64
 18  magNst           7345 non-null   float64
 19  status           9995 non-null   object
 20  locationSource   9995 non-null   object
 21  magSource        9995 non-null   object
dtypes: float64(12), object(10)
memory usage: 1.7+ MB
1) The time, magType, net, id, updated, place, type, status, locationSource, and magSource columns are of type ‘object’ (10 columns).
2) The latitude, longitude, depth, mag, nst, gap, dmin, rms, horizontalError, depthError, magError, and magNst columns are of type ‘float64’ (12 columns).
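The column types listed above can also be pulled out programmatically instead of being read off the info() output. The following is a minimal sketch, assuming a small made-up frame named toy (it is not the real dataset) and using select_dtypes to split numeric from non-numeric columns:

```python
import pandas as pd

# Toy frame standing in for the earthquake data; the name `toy` and the
# choice of four columns are assumptions made for illustration only.
toy = pd.DataFrame({
    "time": ["2023-10-14T00:40:16.200Z", "2023-10-14T00:29:02.360Z"],
    "mag": [1.91, 2.18],
    "magType": ["ml", "md"],
    "depth": [-0.64, 14.76],
})

# Numeric columns (float64 here) versus everything else (text columns).
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
non_numeric_cols = toy.select_dtypes(exclude="number").columns.tolist()

print("numeric:", numeric_cols)
print("non-numeric:", non_numeric_cols)
```

Selecting on "number" rather than on a specific dtype name keeps the split stable if a column is stored as an integer or if text columns use a dedicated string dtype.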
Question 3: Handling Missing Values
[4]:
print(df.isnull().sum())
time                  0
latitude              0
longitude             0
depth                 0
mag                   1
magType               1
nst                2636
gap                2636
dmin               3959
rms                   0
net                   0
id                    0
updated               0
place                 0
type                  0
horizontalError    3321
depthError            0
magError           2668
magNst             2650
status                0
locationSource        0
magSource             0
dtype: int64
[5]:
df = df.dropna(subset=['mag', 'magType'])
df.isnull().sum()
[5]:
time                  0
latitude              0
longitude             0
depth                 0
mag                   0
magType               0
nst                2636
gap                2636
dmin               3959
rms                   0
net                   0
id                    0
updated               0
place                 0
type                  0
horizontalError    3321
depthError            0
magError           2667
magNst             2649
status                0
locationSource        0
magSource             0
dtype: int64
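[6]:
columns_to_fill = ['nst', 'gap', 'dmin', 'horizontalError', 'magError', 'magNst']
for column in columns_to_fill:
    # Assign the result back: fillna(..., inplace=True) on df[column] can act
    # on a temporary copy in recent pandas versions and leave df unchanged.
    df[column] = df[column].fillna(df[column].mean())
df.isnull().sum()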
[6]:
time               0
latitude           0
longitude          0
depth              0
mag                0
magType            0
nst                0
gap                0
dmin               0
rms                0
net                0
id                 0
updated            0
place              0
type               0
horizontalError    0
depthError         0
magError           0
magNst             0
status             0
locationSource     0
magSource          0
dtype: int64
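To make the effect of mean imputation concrete, here is a minimal sketch on a toy column. It reuses the first five nst values from the dataset preview; the frame name toy and the extra nst_was_missing flag column are assumptions added for illustration and are not part of the assignment.

```python
import numpy as np
import pandas as pd

# First five `nst` values from the preview; row 2 is missing.
toy = pd.DataFrame({"nst": [39.0, 12.0, np.nan, 8.0, 28.0]})

# Series.mean() skips NaN, so this is the mean of the four observed values.
mean_before = toy["nst"].mean()
std_before = toy["nst"].std()

# Keep a flag recording which rows were imputed, then fill with the mean.
toy["nst_was_missing"] = toy["nst"].isna()
toy["nst_filled"] = toy["nst"].fillna(mean_before)

mean_after = toy["nst_filled"].mean()
std_after = toy["nst_filled"].std()

print(f"mean before: {mean_before:.2f}, after: {mean_after:.2f}")
print(f"std before:  {std_before:.2f}, after: {std_after:.2f}")
```

Filling with the mean leaves the column mean unchanged but shrinks the spread, because every imputed row sits exactly at the center. With roughly a quarter to 40% of some columns imputed here, that shrinkage is worth keeping in mind when those columns are analyzed later.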
[7]:
df
[7]:
                          time   latitude   longitude   depth   mag magType  \
0     2023-10-14T00:40:16.200Z  19.340666 -155.118332   -0.64  1.91      ml
1     2023-10-14T00:29:02.360Z  37.570000 -119.555336   14.76  2.18      md
2     2023-10-14T00:13:31.008Z  64.205700 -150.031900   15.60  1.50      ml
3     2023-10-13T23:56:32.410Z  17.989333  -66.946667   13.40  2.24      md
4     2023-10-13T23:53:12.670Z  17.966333  -66.943000   12.78  3.08      md
...                        ...        ...         ...     ...   ...     ...
9990  2023-09-14T01:33:37.321Z  62.077400 -149.468100   33.80  2.00      ml
9991  2023-09-14T01:32:23.440Z  35.841833  -97.860500   16.33  1.23      ml
9992  2023-09-14T01:31:57.210Z  38.824667 -122.791333    3.87  0.48      md
9993  2023-09-14T01:17:05.007Z  63.166300 -150.561500  111.00  1.10      ml
9994  2023-09-14T01:16:33.369Z  62.398500 -152.304000  118.10  1.10      ml

            nst         gap     dmin   rms  ...                   updated  \
0     39.000000  132.000000  0.67781  0.25  ...  2023-10-14T00:45:48.260Z
1     12.000000  131.000000  0.21210  0.04  ...  2023-10-14T00:35:17.439Z
2     24.048926  116.580645  0.67781  0.77  ...  2023-10-14T00:15:19.771Z
3      8.000000  157.000000  0.06571  0.13  ...  2023-10-14T00:16:37.480Z
4     28.000000  183.000000  0.03322  0.17  ...  2023-10-14T00:52:32.381Z
...         ...         ...      ...   ...  ...                       ...
9990  24.048926  116.580645  0.67781  0.60  ...  2023-09-27T22:30:14.717Z
9991  58.000000   40.000000  0.15297  0.17  ...  2023-09-15T13:00:36.577Z
9992  20.000000   64.000000  0.01374  0.09  ...  2023-09-14T19:24:43.874Z
9993  24.048926  116.580645  0.67781  0.46  ...  2023-09-27T22:30:19.808Z
9994  24.048926  116.580645  0.67781  0.27  ...  2023-09-28T11:40:30.503Z

                                         place        type  horizontalError  \
0               13 km S of Fern Forest, Hawaii  earthquake         0.460000
1               19 km S of Yosemite Valley, CA  earthquake         0.390000
2                     41 km W of Clear, Alaska  earthquake         1.859097
3                  3 km W of Fuig, Puerto Rico  earthquake         1.010000
4                 3 km SW of Fuig, Puerto Rico  earthquake         0.530000
...                                        ...         ...              ...
9990        22 km ESE of Susitna North, Alaska  earthquake         1.859097
9991          6 km ESE of Kingfisher, Oklahoma  earthquake         1.859097
9992                        6 km W of Cobb, CA  earthquake         0.440000
9993  71 km SE of Denali National Park, Alaska  earthquake         1.859097
9994              65 km NW of Skwentna, Alaska  earthquake         1.859097

      depthError  magError     magNst     status locationSource magSource
0           0.18  0.420000   6.000000  automatic             hv        hv
1           0.97  0.200000   4.000000  automatic             nc        nc
2           0.40  0.226387  17.680054  automatic             ak        ak
3           0.57  0.108381   8.000000   reviewed             pr        pr
4           0.25  0.126826  11.000000   reviewed             pr        pr
...          ...       ...        ...        ...            ...       ...
9990        0.60  0.226387  17.680054   reviewed             ak        ak
9991        0.70  0.200000  23.000000   reviewed             ok        ok
9992        0.64  0.128000  22.000000   reviewed             nc        nc
9993        0.70  0.226387  17.680054   reviewed             ak        ak
9994        1.10  0.226387  17.680054   reviewed             ak        ak

[9994 rows x 22 columns]
Question 4: Time Analysis
[8]:
df['time'] = pd.to_datetime(df['time'])
df['year'] = df['time'].dt.year
df['month'] = df['time'].dt.month
df['day'] = df['time'].dt.day

dist_yearly = df['year'].value_counts().sort_index()
dist_monthly = df['month'].value_counts().sort_index()

print("Dist. of Earthquakes over Years:")
print(dist_yearly)
print("\nDist. of Earthquakes over Months:")
print(dist_monthly)

plt.figure(figsize=(10, 5))
plt.bar(dist_yearly.index, dist_yearly.values, color='green')
plt.xlabel('Years')
plt.ylabel('No of Earthquakes')
plt.title('Dist. of Earthquakes over Years')
plt.xticks(dist_yearly.index.astype(int))
plt.show()

plt.figure(figsize=(10, 5))
plt.bar(dist_monthly.index, dist_monthly.values, color='blue')
plt.xlabel('Months')
plt.ylabel('No of Earthquakes')
plt.title('Dist. of Earthquakes over Months')
plt.xticks(dist_monthly.index.astype(int))
plt.show()
Dist. of Earthquakes over Years:
2023    9994
Name: year, dtype: int64

Dist. of Earthquakes over Months:
9     6140
10    3854
Name: month, dtype: int64
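Because every record falls in 2023 and in only two months, the yearly and monthly counts say little about how events are spread over time. One finer-grained option, sketched here as an assumption rather than as part of the assignment, is to count events per calendar day. The toy frame below reuses four timestamps from the dataset preview; the names toy and daily_counts are illustrative.

```python
import pandas as pd

# Four timestamps copied from the preview rows (UTC, ISO 8601 with a "Z").
toy = pd.DataFrame({"time": [
    "2023-10-14T00:40:16.200Z",
    "2023-10-14T00:29:02.360Z",
    "2023-10-13T23:56:32.410Z",
    "2023-09-14T01:33:37.321Z",
]})

# Parse as timezone-aware UTC datetimes.
toy["time"] = pd.to_datetime(toy["time"], utc=True)

# Truncate each timestamp to midnight of its day, then count per day.
daily_counts = toy["time"].dt.floor("D").value_counts().sort_index()

# Same month-level count as in the cell above, for comparison.
month_counts = toy["time"].dt.month.value_counts().sort_index()

print(daily_counts)
print(month_counts)
```

On the full dataset the same two lines would give one count per day across the month-long window, which can be passed to plt.bar in the same way as dist_monthly.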
Question 5: Magnitude Analysis
[9]:
dist_magnitude = df['mag']

plt.figure(figsize=(10, 5))
plt.hist(dist_magnitude, bins=20, edgecolor='k', color='green')
plt.xlabel('Magnitude')
plt.ylabel('Number of earthquakes')
plt.title('Dist. of Earthquake Magnitudes')
plt.show()
Question 6: Depth Analysis
[10]:
dist_depth = df['depth']

plt.figure(figsize=(10, 5))
plt.hist(dist_depth, bins=20, edgecolor='k', color='red')
plt.xlabel('Depth')
plt.ylabel('Number of earthquakes')
plt.title('Dist. of Earthquake Depths')
plt.show()
Question 7: Location Analysis
[11]:
freq_loc = df.groupby(['latitude', 'longitude']).size().reset_index(name='frequency')
Top10 = freq_loc.sort_values(by='frequency', ascending=False).head(10)

plt.figure(figsize=(10, 6))
plt.scatter(Top10['longitude'], Top10['latitude'], s=Top10['frequency']*10,
            c='blue', alpha=0.5)
plt.xlim(Top10['longitude'].min() - 5, Top10['longitude'].max() + 5)
plt.ylim(Top10['latitude'].min() - 5, Top10['latitude'].max() + 5)

# i is the row's index label in freq_loc, not its rank within Top10
for i, row in Top10.iterrows():
    plt.text(row['longitude'], row['latitude'], f'Location {i+1}', fontsize=14,
             color='red')

plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.title('10 Highest Frequency Locations')
plt.show()
# Note: There are nine points at the same location in the plot.
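Grouping on exact latitude/longitude pairs only counts events whose coordinates match to the last decimal, so nearby events land in separate groups. One possible alternative, sketched here under the assumption that rounding to one decimal degree is an acceptable cell size (that choice is illustrative and not from the assignment), is to bin the coordinates before counting. The toy frame reuses four coordinate pairs from the dataset preview.

```python
import pandas as pd

# Coordinates of rows 3, 4, 0 and 1 from the preview.
toy = pd.DataFrame({
    "latitude":  [17.989333, 17.966333, 19.340666, 37.570000],
    "longitude": [-66.946667, -66.943000, -155.118332, -119.555336],
})

# Exact grouping: every distinct coordinate pair is its own group.
exact_groups = toy.groupby(["latitude", "longitude"]).size()

# Binned grouping: round to 1 decimal degree (assumed cell size) first.
binned = toy.assign(
    lat_bin=toy["latitude"].round(1),
    lon_bin=toy["longitude"].round(1),
)
freq = (
    binned.groupby(["lat_bin", "lon_bin"])
    .size()
    .reset_index(name="frequency")
    .sort_values("frequency", ascending=False)
    .reset_index(drop=True)  # index 0 is now the most frequent cell
)

print(len(exact_groups), "exact groups")
print(freq)
```

Here the two Puerto Rico events fall into the same cell, while exact grouping keeps them apart. Resetting the index after sorting also means that index + 1 is the rank of each cell, which is convenient for labelling points on a plot.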
Question 8: Correlation Analysis
[12]:
mag_earthquake = df['mag']
depth_earthquake = df['depth']

plt.figure(figsize=(8, 6))
plt.scatter(mag_earthquake, depth_earthquake, alpha=0.5)
plt.title('Scatter Plot')
plt.xlabel('Magnitude')
plt.ylabel('Depth')

cor_coef = np.corrcoef(mag_earthquake, depth_earthquake)[0, 1]
print(f'Correlation Coefficient: {cor_coef:.2f}')
plt.show()
Correlation Coefficient: 0.35
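For reference, the value returned by np.corrcoef is the Pearson coefficient: the sum of products of the centered values, divided by the square root of the product of their sums of squares. A minimal sketch, using the magnitude and depth values of the first five preview rows purely as sample input (the resulting number describes these five points only and is not the dataset's 0.35):

```python
import numpy as np

# mag and depth from preview rows 0-4; a tiny sample used for illustration.
mag = np.array([1.91, 2.18, 1.50, 2.24, 3.08])
depth = np.array([-0.64, 14.76, 15.60, 13.40, 12.78])

# Center each variable on its own mean.
mag_c = mag - mag.mean()
depth_c = depth - depth.mean()

# Pearson r = sum(x_c * y_c) / sqrt(sum(x_c^2) * sum(y_c^2))
r_manual = (mag_c * depth_c).sum() / np.sqrt((mag_c ** 2).sum() * (depth_c ** 2).sum())

# Same quantity from NumPy's correlation matrix.
r_numpy = np.corrcoef(mag, depth)[0, 1]

print(f"manual: {r_manual:.4f}, numpy: {r_numpy:.4f}")
```

Pearson's coefficient measures linear association only and is sensitive to outliers, so the scatter plot remains the better guide to the shape of the relationship.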
Question 9: Advanced Visualization
[35]:
import plotly.graph_objects as go

lat = df['latitude'].values
long = df['longitude'].values
mag = df['mag'].values

fig = go.Figure(data=go.Densitymapbox(
    lat=lat, lon=long, z=mag,
    radius=10,
    colorscale='Viridis',
    colorbar=dict(title='Magnitude')
))
fig.update_layout(
    mapbox_style="stamen-terrain",
    mapbox_center_lon=long.mean(),
    mapbox_center_lat=lat.mean(),
    mapbox_zoom=3
)
fig.show()
Question 10: Insights and Observations
1) The dataset had missing values in the ‘nst’, ‘gap’, ‘dmin’, ‘horizontalError’, ‘magError’, and ‘magNst’ columns, plus one missing value each in ‘mag’ and ‘magType’.
2) Earthquakes occur in many parts of the world, with some regions showing higher levels of seismic activity than others.
3) Most earthquakes are of lower magnitude, although higher-magnitude events also occur.
4) The correlation coefficient between earthquake depth and magnitude is 0.35, a moderate positive value.
Key Takeaways
1) The moderate positive correlation between magnitude and depth suggests that, in this dataset, deeper earthquakes tend to have higher magnitudes.
2) Earthquake occurrences are not evenly distributed across time and location.
3) The dataset contains earthquakes of varying magnitudes, ranging from minor tremors to larger seismic events.
4) The data is valuable for scientific research, disaster preparedness, and policy-making related to earthquake mitigation.
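One way to put the 0.35 coefficient in perspective is to square it: under a simple linear fit, r squared is the share of variance in one variable that the other accounts for. A short worked calculation, assuming the 0.35 is the Pearson value reported in Question 8:

```python
# Correlation between magnitude and depth as reported in Question 8.
r = 0.35

# Coefficient of determination for a simple linear relationship.
r_squared = r ** 2

# Share of variance left unexplained by the linear relationship.
unexplained = 1 - r_squared

print(f"r^2 = {r_squared:.4f}, unexplained share = {unexplained:.4f}")
```

So depth would account for only about 12% of the variation in magnitude under a linear model, which is why the relationship is better described as a tendency than as a rule.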