assignment7fda


School

Northeastern University

Course

6400

Subject

Industrial Engineering

Date

Jan 9, 2024

Type

pdf

Pages

12

Uploaded by vishalbunty01

October 15, 2023

Question 1: Load the Dataset

[2]:
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.read_csv('all_month.csv')
    df.head()

[2]:
                           time   latitude   longitude  depth   mag magType   nst  \
    0  2023-10-14T00:40:16.200Z  19.340666 -155.118332  -0.64  1.91      ml  39.0
    1  2023-10-14T00:29:02.360Z  37.570000 -119.555336  14.76  2.18      md  12.0
    2  2023-10-14T00:13:31.008Z  64.205700 -150.031900  15.60  1.50      ml   NaN
    3  2023-10-13T23:56:32.410Z  17.989333  -66.946667  13.40  2.24      md   8.0
    4  2023-10-13T23:53:12.670Z  17.966333  -66.943000  12.78  3.08      md  28.0

         gap     dmin   rms  ...                   updated  \
    0  132.0      NaN  0.25  ...  2023-10-14T00:45:48.260Z
    1  131.0  0.21210  0.04  ...  2023-10-14T00:35:17.439Z
    2    NaN      NaN  0.77  ...  2023-10-14T00:15:19.771Z
    3  157.0  0.06571  0.13  ...  2023-10-14T00:16:37.480Z
    4  183.0  0.03322  0.17  ...  2023-10-14T00:52:32.381Z

                                place        type  horizontalError  depthError  \
    0  13 km S of Fern Forest, Hawaii  earthquake             0.46        0.18
    1  19 km S of Yosemite Valley, CA  earthquake             0.39        0.97
    2        41 km W of Clear, Alaska  earthquake              NaN        0.40
    3     3 km W of Fuig, Puerto Rico  earthquake             1.01        0.57
    4    3 km SW of Fuig, Puerto Rico  earthquake             0.53        0.25

       magError  magNst     status locationSource magSource
    0  0.420000     6.0  automatic             hv        hv
    1  0.200000     4.0  automatic             nc        nc
    2       NaN     NaN  automatic             ak        ak
    3  0.108381     8.0   reviewed             pr        pr
    4  0.126826    11.0   reviewed             pr        pr

    [5 rows x 22 columns]

Question 2: Summary and Info

[3]:
    df.info()

    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 9995 entries, 0 to 9994
    Data columns (total 22 columns):
     #   Column           Non-Null Count  Dtype
    ---  ------           --------------  -----
     0   time             9995 non-null   object
     1   latitude         9995 non-null   float64
     2   longitude        9995 non-null   float64
     3   depth            9995 non-null   float64
     4   mag              9994 non-null   float64
     5   magType          9994 non-null   object
     6   nst              7359 non-null   float64
     7   gap              7359 non-null   float64
     8   dmin             6036 non-null   float64
     9   rms              9995 non-null   float64
     10  net              9995 non-null   object
     11  id               9995 non-null   object
     12  updated          9995 non-null   object
     13  place            9995 non-null   object
     14  type             9995 non-null   object
     15  horizontalError  6674 non-null   float64
     16  depthError       9995 non-null   float64
     17  magError         7327 non-null   float64
     18  magNst           7345 non-null   float64
     19  status           9995 non-null   object
     20  locationSource   9995 non-null   object
     21  magSource        9995 non-null   object
    dtypes: float64(12), object(10)
    memory usage: 1.7+ MB

1) The time, magType, net, id, updated, place, type, status, locationSource, and magSource columns (10 in total) are of type 'object'.
2) The latitude, longitude, depth, mag, nst, gap, dmin, rms, horizontalError, depthError, magError, and magNst columns (12 in total) are of type 'float64'.

Question 3: Handling Missing Values

[4]:
    print(df.isnull().sum())

    time                  0
    latitude              0
    longitude             0
    depth                 0
    mag                   1
    magType               1
    nst                2636
    gap                2636
    dmin               3959
    rms                   0
    net                   0
    id                    0
    updated               0
    place                 0
    type                  0
    horizontalError    3321
    depthError            0
    magError           2668
    magNst             2650
    status                0
    locationSource        0
    magSource             0
    dtype: int64

[5]:
    # Drop the single row that is missing both 'mag' and 'magType'
    df = df.dropna(subset=['mag', 'magType'])
    df.isnull().sum()

[5]:
    time                  0
    latitude              0
    longitude             0
    depth                 0
    mag                   0
    magType               0
    nst                2636
    gap                2636
    dmin               3959
    rms                   0
    net                   0
    id                    0
    updated               0
    place                 0
    type                  0
    horizontalError    3321
    depthError            0
    magError           2667
    magNst             2649
    status                0
    locationSource        0
    magSource             0
    dtype: int64
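Before choosing between dropping and imputing, it can help to express each column's gaps as a percentage of the rows. The sketch below is an illustration rather than part of the original notebook: `missing_summary` is a hypothetical helper, and the toy frame only mimics a few columns of the USGS feed. From the counts in [4], for example, 'dmin' is missing in 3959 of 9995 rows, roughly 39.6%.

```python
import numpy as np
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: count and percentage of missing values per column."""
    counts = frame.isna().sum()
    percent = (counts / len(frame) * 100).round(1)
    summary = pd.DataFrame({"missing": counts, "percent": percent})
    # Keep only columns that actually have gaps, largest first
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


# Toy frame mimicking a few USGS columns (values are illustrative only)
toy = pd.DataFrame({
    "mag": [1.91, 2.18, np.nan, 2.24],
    "magType": ["ml", "md", np.nan, "md"],
    "nst": [39.0, 12.0, np.nan, np.nan],
    "rms": [0.25, 0.04, 0.77, 0.13],
})

summary = missing_summary(toy)
print(summary)
```

Applied to the notebook's df before the dropna in [5], the same helper would list the six sparse columns plus 'mag' and 'magType'.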
[6]:
    columns_to_fill = ['nst', 'gap', 'dmin', 'horizontalError', 'magError', 'magNst']
    for column in columns_to_fill:
        # Impute with the column mean and assign the result back; calling
        # df[column].fillna(..., inplace=True) is chained assignment, which
        # newer pandas versions warn about and will stop applying to df
        df[column] = df[column].fillna(df[column].mean())
    df.isnull().sum()

[6]:
    time               0
    latitude           0
    longitude          0
    depth              0
    mag                0
    magType            0
    nst                0
    gap                0
    dmin               0
    rms                0
    net                0
    id                 0
    updated            0
    place              0
    type               0
    horizontalError    0
    depthError         0
    magError           0
    magNst             0
    status             0
    locationSource     0
    magSource          0
    dtype: int64

[7]:
    df

[7]:
                              time   latitude   longitude   depth   mag magType  \
    0     2023-10-14T00:40:16.200Z  19.340666 -155.118332   -0.64  1.91      ml
    1     2023-10-14T00:29:02.360Z  37.570000 -119.555336   14.76  2.18      md
    2     2023-10-14T00:13:31.008Z  64.205700 -150.031900   15.60  1.50      ml
    3     2023-10-13T23:56:32.410Z  17.989333  -66.946667   13.40  2.24      md
    4     2023-10-13T23:53:12.670Z  17.966333  -66.943000   12.78  3.08      md
    ...                        ...        ...         ...     ...   ...     ...
    9990  2023-09-14T01:33:37.321Z  62.077400 -149.468100   33.80  2.00      ml
    9991  2023-09-14T01:32:23.440Z  35.841833  -97.860500   16.33  1.23      ml
    9992  2023-09-14T01:31:57.210Z  38.824667 -122.791333    3.87  0.48      md
    9993  2023-09-14T01:17:05.007Z  63.166300 -150.561500  111.00  1.10      ml
    9994  2023-09-14T01:16:33.369Z  62.398500 -152.304000  118.10  1.10      ml

                nst         gap     dmin   rms  ...                   updated  \
    0     39.000000  132.000000  0.67781  0.25  ...  2023-10-14T00:45:48.260Z
    1     12.000000  131.000000  0.21210  0.04  ...  2023-10-14T00:35:17.439Z
    2     24.048926  116.580645  0.67781  0.77  ...  2023-10-14T00:15:19.771Z
    3      8.000000  157.000000  0.06571  0.13  ...  2023-10-14T00:16:37.480Z
    4     28.000000  183.000000  0.03322  0.17  ...  2023-10-14T00:52:32.381Z
    ...         ...         ...      ...   ...  ...                       ...
    9990  24.048926  116.580645  0.67781  0.60  ...  2023-09-27T22:30:14.717Z
    9991  58.000000   40.000000  0.15297  0.17  ...  2023-09-15T13:00:36.577Z
    9992  20.000000   64.000000  0.01374  0.09  ...  2023-09-14T19:24:43.874Z
    9993  24.048926  116.580645  0.67781  0.46  ...  2023-09-27T22:30:19.808Z
    9994  24.048926  116.580645  0.67781  0.27  ...  2023-09-28T11:40:30.503Z

                                             place        type  horizontalError  \
    0               13 km S of Fern Forest, Hawaii  earthquake         0.460000
    1               19 km S of Yosemite Valley, CA  earthquake         0.390000
    2                     41 km W of Clear, Alaska  earthquake         1.859097
    3                  3 km W of Fuig, Puerto Rico  earthquake         1.010000
    4                 3 km SW of Fuig, Puerto Rico  earthquake         0.530000
    ...                                        ...         ...              ...
    9990        22 km ESE of Susitna North, Alaska  earthquake         1.859097
    9991          6 km ESE of Kingfisher, Oklahoma  earthquake         1.859097
    9992                        6 km W of Cobb, CA  earthquake         0.440000
    9993  71 km SE of Denali National Park, Alaska  earthquake         1.859097
    9994              65 km NW of Skwentna, Alaska  earthquake         1.859097

          depthError  magError     magNst     status locationSource magSource
    0           0.18  0.420000   6.000000  automatic             hv        hv
    1           0.97  0.200000   4.000000  automatic             nc        nc
    2           0.40  0.226387  17.680054  automatic             ak        ak
    3           0.57  0.108381   8.000000   reviewed             pr        pr
    4           0.25  0.126826  11.000000   reviewed             pr        pr
    ...          ...       ...        ...        ...            ...       ...
    9990        0.60  0.226387  17.680054   reviewed             ak        ak
    9991        0.70  0.200000  23.000000   reviewed             ok        ok
    9992        0.64  0.128000  22.000000   reviewed             nc        nc
    9993        0.70  0.226387  17.680054   reviewed             ak        ak
    9994        1.10  0.226387  17.680054   reviewed             ak        ak

    [9994 rows x 22 columns]

Question 4: Time Analysis

[8]:
    # Parse the ISO-8601 timestamps and extract calendar components
    df['time'] = pd.to_datetime(df['time'])
    df['year'] = df['time'].dt.year
    df['month'] = df['time'].dt.month
    df['day'] = df['time'].dt.day

    dist_yearly = df['year'].value_counts().sort_index()
    dist_monthly = df['month'].value_counts().sort_index()

    print("Dist. of Earthquakes over Years:")
    print(dist_yearly)
    print("\nDist. of Earthquakes over Months:")
    print(dist_monthly)

    plt.figure(figsize=(10, 5))
    plt.bar(dist_yearly.index, dist_yearly.values, color='green')
    plt.xlabel('Year')
    plt.ylabel('Number of earthquakes')
    plt.title('Dist. of Earthquakes over Years')
    plt.xticks(dist_yearly.index.astype(int))
    plt.show()

    plt.figure(figsize=(10, 5))
    plt.bar(dist_monthly.index, dist_monthly.values, color='blue')
    plt.xlabel('Month')
    plt.ylabel('Number of earthquakes')
    plt.title('Dist. of Earthquakes over Months')
    plt.xticks(dist_monthly.index.astype(int))
    plt.show()

    Dist. of Earthquakes over Years:
    2023    9994
    Name: year, dtype: int64

    Dist. of Earthquakes over Months:
    9     6140
    10    3854
    Name: month, dtype: int64
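Because every event in this file falls in 2023 and in either September or October, the year and month bar charts carry little information. A per-day count is one possible alternative. The sketch below is not from the original notebook; it uses the five timestamps shown in df.head() and `resample('D')`, which requires a DatetimeIndex. The parsed timestamps are UTC because of the trailing "Z".

```python
import pandas as pd

# The five 'time' values from df.head(), in the feed's ISO-8601 format
stamps = pd.to_datetime([
    "2023-10-14T00:40:16.200Z",
    "2023-10-14T00:29:02.360Z",
    "2023-10-14T00:13:31.008Z",
    "2023-10-13T23:56:32.410Z",
    "2023-10-13T23:53:12.670Z",
])

# One row per event, then count events per calendar day (UTC)
daily = pd.Series(1, index=stamps).resample("D").sum()
print(daily)
```

On the notebook's DataFrame, after the `pd.to_datetime` conversion in [8], a comparable daily series could be obtained with `df.set_index('time').resample('D').size()`.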
Question 5: Magnitude Analysis

[9]:
    dist_magnitude = df['mag']
    plt.figure(figsize=(10, 5))
    plt.hist(dist_magnitude, bins=20, edgecolor='k', color='green')
    plt.xlabel('Magnitude')
    plt.ylabel('Number of earthquakes')
    plt.title('Dist. of Earthquake Magnitudes')
    plt.show()

Question 6: Depth Analysis

[10]:
    dist_depth = df['depth']
    plt.figure(figsize=(10, 5))
    plt.hist(dist_depth, bins=20, edgecolor='k', color='red')
    plt.xlabel('Depth (km)')
    plt.ylabel('Number of earthquakes')
    plt.title('Dist. of Earthquake Depths')
    plt.show()
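The histograms show the shape of both distributions; a small numeric summary can complement them. The sketch below is an illustration, not the notebook's method: it uses only the ten magnitude/depth values visible in the [7] display, and the whole-unit magnitude bands are an arbitrary choice.

```python
import pandas as pd

# Magnitudes and depths (km) of the ten rows shown in the [7] display
mag = pd.Series([1.91, 2.18, 1.50, 2.24, 3.08, 2.00, 1.23, 0.48, 1.10, 1.10])
depth = pd.Series([-0.64, 14.76, 15.60, 13.40, 12.78, 33.80, 16.33, 3.87, 111.00, 118.10])

# Bin magnitudes into left-closed whole-unit bands; the edges are an assumption
bands = pd.cut(mag, bins=[-2, 1, 2, 3, 4, 10], right=False)
band_counts = bands.value_counts(sort=False)  # keeps band order, including empty bands

# Quartiles summarise the long right tail that the depth histogram shows visually
depth_quartiles = depth.quantile([0.25, 0.5, 0.75])

print(band_counts)
print(depth_quartiles)
```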
Question 7: Location Analysis

[11]:
    # Count events at each exact (latitude, longitude) pair
    freq_loc = df.groupby(['latitude', 'longitude']).size().reset_index(name='frequency')
    Top10 = freq_loc.sort_values(by='frequency', ascending=False).head(10)

    plt.figure(figsize=(10, 6))
    plt.scatter(Top10['longitude'], Top10['latitude'], s=Top10['frequency'] * 10, c='blue', alpha=0.5)
    plt.xlim(Top10['longitude'].min() - 5, Top10['longitude'].max() + 5)
    plt.ylim(Top10['latitude'].min() - 5, Top10['latitude'].max() + 5)
    # Label points by rank 1-10; iterrows() yields the original row index, not the rank
    for rank, (_, row) in enumerate(Top10.iterrows(), start=1):
        plt.text(row['longitude'], row['latitude'], f'Location {rank}', fontsize=14, color='red')
    plt.xlabel('Longitude')
    plt.ylabel('Latitude')
    plt.title('10 Highest Frequency Locations')
    plt.show()

    # Note: There are nine points at the same location in the plot.
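Grouping on exact floating-point coordinates only merges events recorded at identical positions, so nearby epicentres, such as the two Puerto Rico events in rows 3 and 4 of df.head(), end up in separate groups. One alternative, sketched here as an assumption rather than the notebook's approach, is to round coordinates to 0.1 degree (about 11 km in latitude) before counting. The coordinates are four rows from df.head().

```python
import pandas as pd

# Coordinates of rows 3, 4, 0 and 1 from df.head()
events = pd.DataFrame({
    "latitude": [17.989333, 17.966333, 19.340666, 37.570000],
    "longitude": [-66.946667, -66.943000, -155.118332, -119.555336],
})

# Round to 1 decimal degree so nearby epicentres share a bin (resolution is an assumption)
binned = events.assign(lat_bin=events["latitude"].round(1),
                       lon_bin=events["longitude"].round(1))
freq = (binned.groupby(["lat_bin", "lon_bin"]).size()
        .reset_index(name="frequency")
        .sort_values("frequency", ascending=False, ignore_index=True))

# enumerate yields rank labels 1..N independent of the DataFrame index
for rank, row in enumerate(freq.itertuples(index=False), start=1):
    print(f"Location {rank}: ({row.lat_bin}, {row.lon_bin}) count={row.frequency}")
```

The bin size is a trade-off: coarser bins merge more events into clusters but blur distinct nearby sources.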
Question 8: Correlation Analysis

[12]:
    mag_earthquake = df['mag']
    depth_earthquake = df['depth']

    plt.figure(figsize=(8, 6))
    plt.scatter(mag_earthquake, depth_earthquake, alpha=0.5)
    plt.title('Magnitude vs. Depth')
    plt.xlabel('Magnitude')
    plt.ylabel('Depth (km)')

    # Pearson correlation coefficient between magnitude and depth
    cor_coef = np.corrcoef(mag_earthquake, depth_earthquake)[0, 1]
    print(f'Correlation Coefficient: {cor_coef:.2f}')
    plt.show()

    Correlation Coefficient: 0.35
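`np.corrcoef` returns the Pearson coefficient, which measures linear association and can be pulled around by the long tail of deep events. A rank-based Spearman coefficient is a common complementary check; it is not part of the original notebook. The sketch computes it as the Pearson correlation of the ranks (average ranks for ties), which is the definition of Spearman's rho, and uses only the ten magnitude/depth pairs visible in the [7] display, so its numbers say nothing about the full dataset.

```python
import numpy as np
import pandas as pd

# Magnitude/depth pairs from the ten rows shown in the [7] display (illustrative only)
sample = pd.DataFrame({
    "mag": [1.91, 2.18, 1.50, 2.24, 3.08, 2.00, 1.23, 0.48, 1.10, 1.10],
    "depth": [-0.64, 14.76, 15.60, 13.40, 12.78, 33.80, 16.33, 3.87, 111.00, 118.10],
})

# Pearson: linear association, the same quantity np.corrcoef reports
pearson = sample["mag"].corr(sample["depth"])

# Spearman: Pearson correlation of the ranks (ties receive their average rank)
spearman = sample["mag"].rank().corr(sample["depth"].rank())

print(f"Pearson: {pearson:.2f}, Spearman: {spearman:.2f}")
```

A large gap between the two coefficients on the full data would suggest that a few extreme points drive the linear relationship.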
Question 9: Advanced Visualization

[35]:
    import plotly.graph_objects as go

    lat = df['latitude'].values
    long = df['longitude'].values
    mag = df['mag'].values

    # Density heatmap of epicentres, weighted by magnitude (z)
    fig = go.Figure(data=go.Densitymapbox(
        lat=lat,
        lon=long,
        z=mag,
        radius=10,
        colorscale='Viridis',
        colorbar=dict(title='Magnitude'),
    ))
    fig.update_layout(
        # The notebook used "stamen-terrain"; Stamen-hosted tiles were retired in
        # late 2023, so a token-free OpenStreetMap base layer is used instead
        mapbox_style="open-street-map",
        mapbox_center_lon=long.mean(),
        mapbox_center_lat=lat.mean(),
        mapbox_zoom=3,
    )
    fig.show()

Question 10: Insights and Observations

1) The dataset had missing values in 'nst', 'gap', 'dmin', 'horizontalError', 'magError', and 'magNst' (filled with column means), plus one row missing 'mag' and 'magType' (dropped).
2) Earthquakes occur in many parts of the world, and some regions show much higher seismic activity than others.
3) Most earthquakes are of low magnitude, although higher-magnitude events also occur.
4) Depth and magnitude show a moderate positive correlation (coefficient 0.35).

Key Takeaways

1) The moderate positive correlation between magnitude and depth suggests that deeper earthquakes in this dataset tend to have somewhat higher magnitudes; a coefficient of 0.35 leaves most of the variation unexplained and does not by itself establish a causal link.
2) Earthquake occurrences are not evenly distributed across locations or time. Note, however, that the feed covers only about one month (14 September to 14 October 2023), so the year and month counts largely reflect this collection window.
3) The dataset contains earthquakes of varying magnitudes, ranging from minor tremors to larger seismic events.
4) The data is valuable for scientific research, disaster preparedness, and policy-making related to earthquake mitigation.