EXAM 2

School

University of North Texas

Course

5502

Subject

Economics

Date

Feb 20, 2024

Pages

9

Uploaded by venkatasai1999

In [1]:
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns

In [2]:
    df = pd.read_csv("titanic_individuals.csv")
    df

Out[2]:
          Unnamed: 0  Name                                           PClass  Age    Sex     Survived
    0     1           Allen, Miss Elisabeth Walton                   1st     29.00  female  1
    1     2           Allison, Miss Helen Loraine                    1st     2.00   female  0
    2     3           Allison, Mr Hudson Joshua Creighton            1st     30.00  male    0
    3     4           Allison, Mrs Hudson JC (Bessie Waldo Daniels)  1st     25.00  female  0
    4     5           Allison, Master Hudson Trevor                  1st     0.92   male    1
    ...   ...         ...                                            ...     ...    ...     ...
    1308  1309        Zakarian, Mr Artun                             3rd     27.00  male    0
    1309  1310        Zakarian, Mr Maprieder                         3rd     26.00  male    0
    1310  1311        Zenni, Mr Philip                               3rd     22.00  male    0
    1311  1312        Lievens, Mr Rene                               3rd     24.00  male    0
    1312  1313        Zimmerman, Leo                                 3rd     29.00  male    0

    1313 rows × 6 columns

In [3]:
    df.isnull().sum()

Out[3]:
    Unnamed: 0      0
    Name            0
    PClass          0
    Age           557
    Sex             0
    Survived        0
    dtype: int64
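Out[3] shows that Age is the only column with missing values. The sketch below expresses the same check as a share of rows and restricts the drop to that one column through `subset`. It runs on a small made-up frame named `toy`, which is an assumption for illustration and not the exam data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for titanic_individuals.csv
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Age": [29.0, np.nan, 30.0, np.nan],
    "Survived": [1, 0, 0, 1],
})

# Fraction of missing values per column (isnull().sum() gives the counts)
missing_share = toy.isnull().mean()

# Drop rows only where Age is missing, leaving other columns unchecked
toy_clean = toy.dropna(subset=["Age"])

print(missing_share)
print(toy_clean.shape)
```

On the exam data a plain `dropna()` and `dropna(subset=["Age"])` would keep the same rows, since no other column has gaps.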
In [4]:
    df = df.dropna()

In [5]:
    df.isnull().sum()

Out[5]:
    Unnamed: 0    0
    Name          0
    PClass        0
    Age           0
    Sex           0
    Survived      0
    dtype: int64

In [6]:
    df.shape

Out[6]:
    (756, 6)

In [7]:
    df["SexCode"] = df["Sex"].replace({"female": 1, "male": 0})

    C:\Users\16036\AppData\Local\Temp\ipykernel_21576\1913066519.py:1: SettingWithCopyWarning:
    A value is trying to be set on a copy of a slice from a DataFrame.
    Try using .loc[row_indexer,col_indexer] = value instead

    See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
      df["SexCode"]=df["Sex"].replace({"female": 1, "male": 0})

In [8]:
    titanic_df = df[['PClass', 'Age', 'Survived', 'Sex', 'SexCode']].copy()

In [9]:
    titanic_df.head()

Out[9]:
       PClass  Age    Survived  Sex     SexCode
    0  1st     29.00  1         female  1
    1  1st     2.00   0         female  1
    2  1st     30.00  0         male    0
    3  1st     25.00  0         female  1
    4  1st     0.92   1         male    0
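The SettingWithCopyWarning in In [7] is raised when a column is assigned on a frame that pandas considers derived from another one. One common way to avoid it is to take an explicit copy right after filtering. The sketch below shows this on a made-up frame (`toy` and `clean` are illustrative names, not from the notebook); whether the warning appears at all depends on the pandas version.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; column names follow the notebook
toy = pd.DataFrame({
    "Age": [29.0, np.nan, 30.0],
    "Sex": ["female", "male", "male"],
})

# .copy() makes the filtered frame an independent object, so the later
# column assignment is not treated as a write to a slice of the original.
clean = toy.dropna().copy()

# map() translates each label through the dictionary
clean["SexCode"] = clean["Sex"].map({"female": 1, "male": 0})

print(clean)
```

The original frame `toy` is left unchanged; only `clean` gains the `SexCode` column.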
In [20]:
    # Group by PClass and Sex, then calculate the mean of the Survived column
    grouped_data = titanic_df.groupby(['PClass', 'Sex'])['Survived'].mean().reset_index()
    grouped_data

Out[20]:
       PClass  Sex     Survived
    0  1st     female  0.950495
    1  1st     male    0.344000
    2  2nd     female  0.882353
    3  2nd     male    0.165354
    4  3rd     female  0.450980
    5  3rd     male    0.148148

In [17]:
    # Pivot the data for plotting
    pivot_grouped_data = grouped_data.pivot(index='PClass', columns='Sex', values='Survived')
    pivot_grouped_data

Out[17]:
    Sex     female    male
    PClass
    1st     0.950495  0.344000
    2nd     0.882353  0.165354
    3rd     0.450980  0.148148
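The group-then-pivot steps above can also be written as a single `pivot_table` call. The sketch below is one possible equivalent, run on six made-up rows (`toy` is an assumption, not the exam data), so the resulting rates are illustrative only.

```python
import pandas as pd

# Hypothetical toy rows; column names follow the notebook
toy = pd.DataFrame({
    "PClass": ["1st", "1st", "1st", "3rd", "3rd", "3rd"],
    "Sex": ["female", "female", "male", "female", "male", "male"],
    "Survived": [1, 1, 0, 1, 0, 0],
})

# Mean of Survived per (PClass, Sex), laid out with classes as rows
# and sexes as columns, in one step
rates = toy.pivot_table(index="PClass", columns="Sex",
                        values="Survived", aggfunc="mean")

print(rates)
```

Because `Survived` is coded 0/1, its mean within each group is the share of survivors in that group.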
In [18]:
    # Plotting
    fig, ax = plt.subplots(figsize=(10, 6))
    pivot_grouped_data.plot(kind='bar', ax=ax)
    plt.title('Survival Percentage by Class and Gender')
    plt.ylabel('Survival Percentage')
    plt.xlabel('Class')
    plt.xticks(rotation=0)
    plt.legend(title='Gender')
    plt.show()
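The plotted values are fractions between 0 and 1, while the axis label reads "Survival Percentage". If a percentage scale is wanted, the rates can be multiplied by 100 before plotting. The sketch below does this with the values copied from Out[17]; the names `rates` and `percent` are illustrative.

```python
import pandas as pd

# Survival rates copied from Out[17]
rates = pd.DataFrame(
    {"female": [0.950495, 0.882353, 0.450980],
     "male": [0.344000, 0.165354, 0.148148]},
    index=pd.Index(["1st", "2nd", "3rd"], name="PClass"),
)

# Convert fractions to percentages, rounded to one decimal place
percent = (rates * 100).round(1)

print(percent)
```

Passing `percent` to `.plot(kind='bar', ax=ax)` in place of `pivot_grouped_data` would then make the bar heights agree with the label.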
In [33]:
    # Provided training data and target as mentioned in the document
    training_data = np.array([[1, 29, 1],
                              [2, 29, 0],
                              [3, 36, 1],
                              [1, 47, 0],
                              [2, 38, 1],
                              [3, 19, 1],
                              [1, 71, 0],
                              [2, 45, 0],
                              [3, 30, 1]])

In [34]:
    df = pd.DataFrame(data=training_data, columns=["Class", "Age", "Sex"])
    print(df)

       Class  Age  Sex
    0      1   29    1
    1      2   29    0
    2      3   36    1
    3      1   47    0
    4      2   38    1
    5      3   19    1
    6      1   71    0
    7      2   45    0
    8      3   30    1

In [36]:
    training_target = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1])
    df_survived = pd.DataFrame(data=training_target, columns=["Survived"])
    df_survived

Out[36]:
       Survived
    0         1
    1         0
    2         1
    3         1
    4         0
    5         1
    6         0
    7         0
    8         1
In [38]:
    from sklearn import metrics
    from sklearn.neighbors import KNeighborsClassifier

    k = 1
    model = KNeighborsClassifier(n_neighbors=k)
    model.fit(df, df_survived)

    C:\Users\16036\anaconda3\Lib\site-packages\sklearn\neighbors\_classification.py:228: DataConversionWarning:
    A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel().
      return self._fit(X, y)

Out[38]:
    KNeighborsClassifier(n_neighbors=1)

In [39]:
    # Apply the classifier to a simple test case: A 50-year-old female in second class
    test_data = [[2, 50, 1]]

In [40]:
    df_test = pd.DataFrame(data=test_data, columns=["Class", "Age", "Sex"])
    print(df_test)

       Class  Age  Sex
    0      2   50    1

In [41]:
    prediction = model.predict(df_test)
    prediction

Out[41]:
    array([1])
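The DataConversionWarning in In [38] appears because the target is passed as a one-column DataFrame. A minimal sketch of the fix the warning itself suggests, flattening the target with `ravel()`, is shown below. It reuses the nine training rows from the notebook; the variable names `X`, `y`, and `test` are illustrative.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

# Training data copied from the notebook
X = pd.DataFrame(
    [[1, 29, 1], [2, 29, 0], [3, 36, 1], [1, 47, 0], [2, 38, 1],
     [3, 19, 1], [1, 71, 0], [2, 45, 0], [3, 30, 1]],
    columns=["Class", "Age", "Sex"],
)
y = pd.DataFrame({"Survived": [1, 0, 1, 1, 0, 1, 0, 0, 1]})

model = KNeighborsClassifier(n_neighbors=1)

# ravel() turns the (9, 1) column into a 1-D array of shape (9,)
model.fit(X, y.to_numpy().ravel())

# The test case keeps the same column names as the training frame
test = pd.DataFrame([[2, 50, 1]], columns=["Class", "Age", "Sex"])
prediction = model.predict(test)

print(prediction)
```

The prediction is unchanged from Out[41]; only the shape of the target passed to `fit` differs.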
In [51]:
    # Create the test_data array with the specified test cases
    test_data = np.array([
        [1, 4, 0],    # A 4 year old boy in first class
        [1, 90, 0],   # A 90 year old man in first class
        [2, 43, 1],   # A 43 year old woman in second class
        [2, 40, 0],   # A 40 year old man in second class
        [3, 17, 1],   # A 17 year old woman in third class
        [3, 29, 0]    # A 29 year old man in third class
    ])

    # Predict the outcomes using the trained k-NN classifier
    predictions = model.predict(test_data)
    predictions

    # 1 means survived
    # 0 means not survived

    C:\Users\16036\anaconda3\Lib\site-packages\sklearn\base.py:464: UserWarning: X does not have valid feature names, but KNeighborsClassifier was fitted with feature names
      warnings.warn(

Out[51]:
    array([1, 0, 0, 0, 1, 0])

1. 4-year-old boy in first class: He was predicted to survive.
2. 90-year-old man in first class: He was predicted to not survive.
3. 43-year-old woman in second class: She was predicted to not survive.
4. 40-year-old man in second class: He was predicted to not survive.
5. 17-year-old woman in third class: She was predicted to survive.
6. 29-year-old man in third class: He was predicted to not survive.
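The six predictions above can be traced by hand. The sketch below recomputes them with NumPy alone, assuming Euclidean distance (the default metric of `KNeighborsClassifier`) and k = 1: each test case takes the label of its single closest training row. The names `X_train`, `y_train`, and `X_test` are illustrative; the values are copied from the notebook.

```python
import numpy as np

# Training rows, targets, and test cases copied from the notebook
X_train = np.array([[1, 29, 1], [2, 29, 0], [3, 36, 1], [1, 47, 0], [2, 38, 1],
                    [3, 19, 1], [1, 71, 0], [2, 45, 0], [3, 30, 1]])
y_train = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1])
X_test = np.array([[1, 4, 0], [1, 90, 0], [2, 43, 1],
                   [2, 40, 0], [3, 17, 1], [3, 29, 0]])

# Pairwise squared Euclidean distances, shape (n_test, n_train)
diff = X_test[:, None, :] - X_train[None, :, :]
sq_dist = (diff ** 2).sum(axis=2)

# Index of the closest training row for each test case
nearest = sq_dist.argmin(axis=1)

# With k = 1 the prediction is simply that row's label
manual_predictions = y_train[nearest]

print(nearest)             # [5 6 7 4 5 1]
print(manual_predictions)  # [1 0 0 0 1 0]
```

Age spans a much wider range than Class or Sex, so it contributes most of each distance. The 4-year-old boy, for example, is matched to the 19-year-old woman in third class because she is the youngest person in the training set.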
In [45]:
    # Extracting the 'PClass', 'Age', and 'SexCode' columns as our training data
    training_data_from_df = titanic_df[['PClass', 'Age', 'SexCode']].values

    # Map PClass values to numeric: 1st -> 1, 2nd -> 2, 3rd -> 3
    training_data_from_df[:, 0] = [1 if x == '1st' else 2 if x == '2nd' else 3
                                   for x in training_data_from_df[:, 0]]

    # Extracting the 'Survived' column as our training target
    training_target_from_df = titanic_df['Survived'].tolist()

    training_data_from_df[:5], training_target_from_df[:5]

Out[45]:
    (array([[1, 29.0, 1],
            [1, 2.0, 1],
            [1, 30.0, 0],
            [1, 25.0, 1],
            [1, 0.92, 0]], dtype=object),
     [1, 0, 0, 0, 1])

In [52]:
    # Initialize and train the k-NN classifier with k=2 using the data from the dataframe
    knn_from_df = KNeighborsClassifier(n_neighbors=2)
    knn_from_df.fit(training_data_from_df, training_target_from_df)

    # Apply the classifier to the test_data from Step 4
    predictions_from_df = knn_from_df.predict(test_data)
    predictions_from_df

Out[52]:
    array([1, 0, 0, 0, 0, 0])

A 4-year-old boy in first class: Predicted to survive.
A 90-year-old man in first class: Predicted to not survive.
A 43-year-old woman in second class: Predicted to not survive.
A 40-year-old man in second class: Predicted to not survive.
A 17-year-old woman in third class: Predicted to not survive.
A 29-year-old man in third class: Predicted to not survive.
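Out[45] reports `dtype=object` because `.values` was taken while `PClass` still held strings. The sketch below shows one possible alternative: mapping the class labels to numbers first, so the feature array is numeric from the start. It uses only the five rows visible in Out[9]; `titanic_head` and `features` are illustrative names.

```python
import pandas as pd

# First five rows copied from Out[9]
titanic_head = pd.DataFrame({
    "PClass": ["1st", "1st", "1st", "1st", "1st"],
    "Age": [29.00, 2.00, 30.00, 25.00, 0.92],
    "Survived": [1, 0, 0, 0, 1],
    "SexCode": [1, 1, 0, 1, 0],
})

features = titanic_head[["PClass", "Age", "SexCode"]].copy()

# Map PClass labels to numbers before extracting the array
features["PClass"] = features["PClass"].map({"1st": 1, "2nd": 2, "3rd": 3})

# All columns are numeric now, so the array can be plain float
X = features.to_numpy(dtype=float)
y = titanic_head["Survived"].to_numpy()

print(X.dtype, X.shape)
```

Separately from the dtype, k = 2 allows the two neighbors to disagree in a two-class problem; with unweighted voting an odd k avoids such split votes.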