Lab-05 - Jupyter Notebook

pdf

School

University of North Texas

Course

5502

Subject

Computer Science

Date

Feb 20, 2024

Type

pdf

Pages

28

Uploaded by venkatasai1999

Part 1: Data Visualizations

1. Grouped Bar Plots: Make both a side-by-side bar plot and a stacked bar plot that displays the number of child visitors and the number of adult visitors at a waterpark in the months of April, May, June and July. Be sure to include titles, legends and appropriate labels sufficiently sized for readability.

April   Children: 780    Adults: 315
May     Children: 1050   Adults: 400
June    Children: 3056   Adults: 1000
July    Children: 5025   Adults: 1500

In [179]:
    import numpy as np
    import matplotlib.pyplot as plt
In [180]:
    # Side-by-side (grouped) bar plot
    children = (780, 1050, 3056, 5025)
    adult = (315, 400, 1000, 1500)
    ind = np.arange(4)   # one x position per month
    width = 0.35         # width of each bar

    fig, ax = plt.subplots()
    rects1 = ax.bar(ind, children, width)
    rects2 = ax.bar(ind + width, adult, width)

    ax.set_ylabel('Number of Visitors', fontsize=13)
    ax.set_title('Number of Visitors in 4 Months (April - July)', fontsize=13)
    ax.set_xticks(ind + width / 2)
    ax.set_xticklabels(('April', 'May', 'June', 'July'), fontsize=10)
    ax.legend((rects1[0], rects2[0]), ('children', 'adult'))

    def autolabel(rects):
        # Write each bar's height just above the bar
        for rect in rects:
            height = rect.get_height()
            ax.text(rect.get_x() + rect.get_width() / 2., height,
                    '%d' % int(height), ha='center', va='bottom')

    autolabel(rects1)
    autolabel(rects2)
    plt.show()
In [181]:
    # Stacked bar plot: children are drawn on top of adults
    children = [780, 1050, 3056, 5025]
    adults = [315, 400, 1000, 1500]
    months = ['April', 'May', 'June', 'July']
    width = 0.5
    ind = np.arange(len(months))
    tick_pos = [i + (width / 50) for i in ind]

    p1 = plt.bar(ind, adults, width, align='center')
    p2 = plt.bar(ind, children, width, bottom=adults, align='center')

    plt.ylabel('Number of Visitors', fontsize=12)
    plt.xlabel('Months', fontsize=12, labelpad=15)
    plt.title('Number of Visitors in 4 months (April - July)', fontsize=13)
    plt.xticks(tick_pos, months, fontsize=10)
    plt.legend((p1[0], p2[0]), ('Adults', 'Children'), loc="best")

    for r1, r2 in zip(p1, p2):
        h1 = r1.get_height()
        h2 = r2.get_height()
        # Label each segment near its vertical midpoint
        # (the fontsize argument is cut off in the source and is omitted here)
        plt.text(r1.get_x() + r1.get_width() / 2., h1 / 2., "%d" % h1,
                 ha="center", va="bottom", color="white")
        plt.text(r2.get_x() + r2.get_width() / 2., h1 + h2 / 2., "%d" % h2,
                 ha="center", va="bottom", color="white")

    plt.show()
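The label placement in the stacked plot relies on a small piece of arithmetic: the lower segment's label sits at half its height, and the upper segment's label sits at the lower height plus half the upper height. A minimal sketch of that calculation in plain Python, without any plotting, is shown here. The function name `stacked_label_heights` is an illustrative assumption, not part of the notebook.

```python
# Visitor counts from the exercise (adults form the lower segment).
adults = [315, 400, 1000, 1500]
children = [780, 1050, 3056, 5025]
months = ['April', 'May', 'June', 'July']


def stacked_label_heights(lower, upper):
    """Return the y-positions of the midpoints of both segments of each bar.

    Hypothetical helper for illustration only.
    """
    lower_mid = [lo / 2 for lo in lower]
    upper_mid = [lo + up / 2 for lo, up in zip(lower, upper)]
    return lower_mid, upper_mid


adult_y, child_y = stacked_label_heights(adults, children)
for month, a, c in zip(months, adult_y, child_y):
    print(month, a, c)
```

Computing the positions separately like this makes it easier to check that a label falls inside its own segment before passing the value to `plt.text`.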
2. Histogram: Make a histogram of the following scores from the Fall 2017 Data Structures course at Loyola University Chicago. Feel free to experiment on the best number of histogram bins for visualization.

114.8, 98.8, 97.3, 96, 94.1, 93.1, 93.1, 91.6, 91.5, 91.3, 90.3, 89.2, 87.5, 87.4, 85.2, 81.7, 81.6, 81.5, 80, 79.3, 78.2, 77.6, 77.1, 76.7, 75.1, 73.9, 72, 71, 64.6, 63.3, 47.2, 38.7

In [129]:
    Fall_2017 = np.array([114.8, 98.8, 97.3, 96, 94.1, 93.1, 93.1, 91.6,
                          91.5, 91.3, 90.3, 89.2, 87.5, 87.4, 85.2, 81.7,
                          81.6, 81.5, 80, 79.3, 78.2, 77.6, 77.1, 76.7,
                          75.1, 73.9, 72, 71, 64.6, 63.3, 47.2, 38.7])
    print(min(Fall_2017))

    38.7

In [130]:
    print(max(Fall_2017))

    114.8

In [131]:
    # Bin edges chosen by hand; note that the first bin spans 0 to 20
    t = [0, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]
    plt.hist(Fall_2017, bins=t, ec="black", color='green')
    plt.xticks(t)
    plt.xlabel("scores")
    plt.ylabel("count")
    plt.grid(axis='y', linestyle='--', alpha=0.6)
    plt.title("Fall 2017 - Data Structures Course Scores")

Out[131]: Text(0.5, 1.0, 'Fall 2017 - Data Structures Course Scores')
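When experimenting with bin choices, it can help to look at the counts per bin as numbers before drawing anything. The sketch below tallies the same scores into equal-width bins using only the standard library. The 10-point `bin_width` is an assumption made for illustration, and a different width may suit the data better.

```python
from collections import Counter

# Scores from the exercise.
scores = [114.8, 98.8, 97.3, 96, 94.1, 93.1, 93.1, 91.6, 91.5, 91.3, 90.3,
          89.2, 87.5, 87.4, 85.2, 81.7, 81.6, 81.5, 80, 79.3, 78.2, 77.6,
          77.1, 76.7, 75.1, 73.9, 72, 71, 64.6, 63.3, 47.2, 38.7]

bin_width = 10  # assumed width; try other values to compare

# Map each score to the left edge of its bin, then count scores per bin.
bin_counts = Counter(int(s // bin_width) * bin_width for s in scores)

for left in sorted(bin_counts):
    print(f"{left}-{left + bin_width}: {bin_counts[left]}")
```

Bins with no scores simply do not appear in the tally, which makes gaps such as the empty 50 to 60 range easy to spot.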
3. Line Plot: Create a line plot of sin(x) and cos(x + π/2) for -2π < x < 2π where x increases at intervals of π/4. Make the sin(x) graph red and make the cos(x+π/2) graph green. Put both lines onto the same plot.
In [132]:
    # x runs from -2π up to (but not including) 2π in steps of π/4
    x = np.arange(-2 * np.pi, 2 * np.pi, np.pi / 4)
    y = np.sin(x)
    z = np.cos(x + np.pi / 2)
    plt.plot(x, y, color='red', label="sin(x)")
    plt.plot(x, z, color='green', label="cos(x+π/2)")
    plt.xlabel('-2π to 2π')
    plt.ylabel('sin(x) and cos(x+ π/2)')
    plt.title('sin(x) and cos(x+ π/2) line plot')
    plt.grid(True)
    plt.legend()

Out[132]: <matplotlib.legend.Legend at 0x184f1b56410>

Using the same info as above, make a subplot with 2 different graphs: one graph for sin(x) and one for cos(x + π/2).

In [133]:
    plt.figure(figsize=(10, 4))

    # Left panel: sin(x)
    plt.subplot(1, 2, 1)
    plt.plot(x, y, color='red', label='sin(x)')
    plt.xlabel('x')
    plt.ylabel('Value')
    plt.title('sin(x)')

    # Right panel: cos(x + π/2)
    plt.subplot(1, 2, 2)
    plt.plot(x, z, color='green', label='cos(x + π/2)')
    plt.xlabel('x')
    plt.ylabel('Value')
    plt.title('cos(x + π/2)')

    plt.tight_layout()
    plt.show()

4. Scatter Plot: Using the following data about winter temperatures affecting the number of days for lake ice at Lake Superior, construct a scatter plot to display the data. Include a line of best fit.

Mean Temperature (in Fahrenheit): 22.94, 23.02, 25.68, 19.96, 24.80, 23.98, 22.10, 20.30, 24.20, 22.74, 24.16, 24.94, 22.40, 22.14, 20.84, 25.66, 21.73, 24.49, 24.13, 22.17, 21.73, 20.41, 24.41, 23.95, 20.95, 26.71, 22.81, 23.11, 23.33, 28.83, 23.11, 21.47, 23.97, 24.75, 23.61, 23.08, 21.24,
In [134]:
    import numpy as np
    import matplotlib.pyplot as plt
    from scipy import stats

    mean_temperature = [22.94, 23.02, 25.68, 19.96, 24.80, 23.98, 22.10, 20.30,
                        24.20, 22.74, 24.16, 24.94, 22.40, 22.14, 20.84, 25.66,
                        21.73, 24.49, 24.13, 22.17, 21.73, 20.41, 24.41, 23.95,
                        20.95, 26.71, 22.81, 23.11, 23.33, 28.83, 23.11, 21.47,
                        23.97, 24.75, 23.61, 23.08, 21.24, 26.63, 23.88]
    days_of_ice = [87, 137, 106, 97, 105, 118, 118, 136, 91, 107, 96, 114, 125,
                   115, 118, 82, 115, 97, 104, 146, 126, 141, 111, 123, 118, 83,
                   48, 118, 116, 81, 116, 123, 112, 99, 102, 118, 63, 62, 132]

    # Fit a straight line by ordinary least squares
    slope, intercept, r_value, p_value, std_err = stats.linregress(mean_temperature, days_of_ice)
    line = slope * np.array(mean_temperature) + intercept

    plt.figure(figsize=(10, 6))
    plt.scatter(mean_temperature, days_of_ice, label='Data Points', color='blue')
    plt.plot(mean_temperature, line, label='Line of Best Fit', color='red')
    plt.xlabel('Mean Temperature (Fahrenheit)')
    plt.ylabel('Days of Ice')
    plt.title('Relationship between Mean Temperature and Days of Ice')
    plt.legend()
    plt.grid(True)
    plt.show()
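For readers who want to see what a line of best fit computes, the slope and intercept of an ordinary least-squares line can be worked out by hand from the means of the data. The following is a minimal sketch in plain Python. The function name `least_squares_line` and the toy data are assumptions for illustration, and the toy points deliberately lie on y = 2x + 1 so the result is easy to check.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line.

    Hypothetical helper: slope = Sxy / Sxx, intercept = mean_y - slope * mean_x.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared x deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data (not from the lab): points exactly on y = 2x + 1.
slope, intercept = least_squares_line([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)
```

Passing the temperature and ice-day lists to the same function should give a slope and intercept close to those from `stats.linregress`, up to floating-point rounding.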
Part 2: Basic Data Structure

1: Lists

1. Make a list with the spelled-out number strings ‘one’, ‘two’, ‘three’, ‘four’, and ‘five’ in that order and call it myList.

In [135]:
    myList = ["one", "two", "three", "four", "five"]
    myList

Out[135]: ['one', 'two', 'three', 'four', 'five']

2. Remove ‘three’ from the list using positional indexing.

In [136]:
    del myList[2]
    print(myList)

    ['one', 'two', 'four', 'five']

3. Check if ‘four’ is in the list.

In [137]:
    print("four" in myList)

    True

4. Append ‘six’ to the end of the list, then print the length of the list.

In [138]:
    myList.append("six")
    print(myList)
    print(len(myList))

    ['one', 'two', 'four', 'five', 'six']
    5

5. Print the contents of the list, but also next to each item print the length of the string (e.g. one is 3, four is 4) using a for loop.

In [139]:
    for i in myList:
        print(i, 'is', len(i))

    one is 3
    two is 3
    four is 4
    five is 4
    six is 3
6. Create a list only of the lengths of the strings and show your result. You can use the loop before to fill the list.

In [140]:
    length_string = [len(word) for word in myList]
    length_string

Out[140]: [3, 3, 4, 4, 3]

2: Dictionaries

1. Make a dictionary whose keys are the English words below and whose values are the translations. You can use this language example (German) or choose your own. Note: you need to make sure all of these words are represented as strings, in quotes.

apple - Apfel
apples - Äpfel
I - Ich
and - und
like - mag
strawberries - Erdbeeren

In [141]:
    dictionary_words = {"apple": "Apfel", "apples": "Äpfel", "I": "Ich",
                        "and": "und", "like": "mag", "strawberries": "Erdbeeren"}
    dictionary_words

Out[141]: {'apple': 'Apfel', 'apples': 'Äpfel', 'I': 'Ich', 'and': 'und', 'like': 'mag', 'strawberries': 'Erdbeeren'}

2. Use the dictionary to look up the translation for ‘apple’ and ‘like’.

In [142]:
    print(dictionary_words["apple"])
    print(dictionary_words["like"])

    Apfel
    mag

3. Make a variable var with the string “I like apples and strawberries”.

In [143]:
    var = "I like apples and strawberries"
    var

Out[143]: 'I like apples and strawberries'
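Square-bracket lookup raises a `KeyError` when the word is not a key in the dictionary. A minimal sketch of a more forgiving lookup uses `dict.get` with the word itself as the fallback, so unknown words pass through unchanged. The helper name `translate_word` and the test word "banana" are illustrative assumptions.

```python
# Same English-to-German pairs as in the exercise.
translations = {"apple": "Apfel", "apples": "Äpfel", "I": "Ich",
                "and": "und", "like": "mag", "strawberries": "Erdbeeren"}


def translate_word(word, table):
    """Return the translation of word, or word itself if it is not in table.

    Hypothetical helper for illustration only.
    """
    return table.get(word, word)


print(translate_word("apple", translations))   # known word
print(translate_word("banana", translations))  # unknown word is returned as-is
```

The fallback keeps a sentence intact when only some of its words have translations, which is the situation an explicit membership check would otherwise handle.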
4. Now create a list from var with each word a separate item (this is a string split operation).

In [144]:
    split_string = var.split()
    split_string

Out[144]: ['I', 'like', 'apples', 'and', 'strawberries']

5. Iterate through the list you’ve created and replace any word in your dictionary with the translation.

In [145]:
    replace_list = []
    for item in split_string:
        if item in dictionary_words:
            replace_list.append(dictionary_words[item])
        else:
            replace_list.append(item)
    replace_list

Out[145]: ['Ich', 'mag', 'Äpfel', 'und', 'Erdbeeren']

6. Now take your new list and turn it into a string with spaces between the words.

In [146]:
    replace_string = ' '.join(replace_list)
    replace_string

Out[146]: 'Ich mag Äpfel und Erdbeeren'

3: Arrays

1. Create an array of zeros of size 8 x 8 and print the data type of the array.
In [147]:
    array = np.zeros((8, 8))
    print(array)

    [[0. 0. 0. 0. 0. 0. 0. 0.]
     [0. 0. 0. 0. 0. 0. 0. 0.]
     [0. 0. 0. 0. 0. 0. 0. 0.]
     [0. 0. 0. 0. 0. 0. 0. 0.]
     [0. 0. 0. 0. 0. 0. 0. 0.]
     [0. 0. 0. 0. 0. 0. 0. 0.]
     [0. 0. 0. 0. 0. 0. 0. 0.]
     [0. 0. 0. 0. 0. 0. 0. 0.]]

In [148]:
    print(type(array))

    <class 'numpy.ndarray'>

2. Fill the array with the numbers 1 to 64 first by row, then by column. You may want to use a for loop inside a for loop to do this.

In [149]:
    s = len(array)   # number of columns
    k = len(array)   # number of rows
    # Entry (i, j) holds (j + 1) + s * i, so each row continues the count
    fill_array = [[(j + 1) + s * i for j in range(s)] for i in range(k)]
    fill_array = np.array(fill_array)
    fill_array

Out[149]:
    array([[ 1,  2,  3,  4,  5,  6,  7,  8],
           [ 9, 10, 11, 12, 13, 14, 15, 16],
           [17, 18, 19, 20, 21, 22, 23, 24],
           [25, 26, 27, 28, 29, 30, 31, 32],
           [33, 34, 35, 36, 37, 38, 39, 40],
           [41, 42, 43, 44, 45, 46, 47, 48],
           [49, 50, 51, 52, 53, 54, 55, 56],
           [57, 58, 59, 60, 61, 62, 63, 64]])

3. Transpose the array.
In [150]:
    array = np.transpose(fill_array)
    array

Out[150]:
    array([[ 1,  9, 17, 25, 33, 41, 49, 57],
           [ 2, 10, 18, 26, 34, 42, 50, 58],
           [ 3, 11, 19, 27, 35, 43, 51, 59],
           [ 4, 12, 20, 28, 36, 44, 52, 60],
           [ 5, 13, 21, 29, 37, 45, 53, 61],
           [ 6, 14, 22, 30, 38, 46, 54, 62],
           [ 7, 15, 23, 31, 39, 47, 55, 63],
           [ 8, 16, 24, 32, 40, 48, 56, 64]])

4. Print only the top 4 rows and columns.

In [151]:
    array[0:4, 0:4]

Out[151]:
    array([[ 1,  9, 17, 25],
           [ 2, 10, 18, 26],
           [ 3, 11, 19, 27],
           [ 4, 12, 20, 28]])

5. Make a 1D array out of your 2D array with the numbers 1 to 64 in order (note the column vs row issue, you may need transposes.)

In [152]:
    array_1 = array.flatten()
    array_1

Out[152]:
    array([ 1,  9, 17, 25, 33, 41, 49, 57,  2, 10, 18, 26, 34, 42, 50, 58,
            3, 11, 19, 27, 35, 43, 51, 59,  4, 12, 20, 28, 36, 44, 52, 60,
            5, 13, 21, 29, 37, 45, 53, 61,  6, 14, 22, 30, 38, 46, 54, 62,
            7, 15, 23, 31, 39, 47, 55, 63,  8, 16, 24, 32, 40, 48, 56, 64])

In [153]:
    array_1.ndim

Out[153]: 1

6. Now take that 1D array you made from before and reshape it back to the original 2D array.
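Steps 5 and 6 depend on the order in which NumPy reads elements when flattening. By default `flatten` reads row by row, so flattening the transposed array yields 1, 9, 17, ... rather than 1 to 64 in order. A minimal sketch of one way to get the ordered sequence is to flatten in column-major order with `order='F'`. The variable names are illustrative assumptions, and `np.arange(1, 65).reshape(8, 8)` stands in for the loop-filled array.

```python
import numpy as np

# Stand-in for the array filled by row with 1..64.
original = np.arange(1, 65).reshape(8, 8)
transposed = original.T

# Default (row-major) flatten of the transposed array is not in counting order.
row_major = transposed.flatten()

# Column-major flatten of the transposed array reads 1, 2, 3, ..., 64.
in_order = transposed.flatten(order='F')

# Reshaping the ordered 1D array row by row recovers the original 2D array.
restored = in_order.reshape(8, 8)

print(row_major[:4])
print(in_order[:4])
print(np.array_equal(restored, original))
```

Transposing back before flattening, as in `transposed.T.flatten()`, should give the same ordered result, so either approach addresses the column versus row issue the exercise mentions.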
In [154]:
    array_2 = array_1.reshape(8, 8)
    array_2

Out[154]:
    array([[ 1,  9, 17, 25, 33, 41, 49, 57],
           [ 2, 10, 18, 26, 34, 42, 50, 58],
           [ 3, 11, 19, 27, 35, 43, 51, 59],
           [ 4, 12, 20, 28, 36, 44, 52, 60],
           [ 5, 13, 21, 29, 37, 45, 53, 61],
           [ 6, 14, 22, 30, 38, 46, 54, 62],
           [ 7, 15, 23, 31, 39, 47, 55, 63],
           [ 8, 16, 24, 32, 40, 48, 56, 64]])

In [155]:
    array_2.ndim

Out[155]: 2

4: Application - Word Counts

Word counts are often used in text processing to automatically classify documents by topic. They are also used to automatically measure the “sentiment” by counting, for example, the number of positive or negative words used in the comment or essay. Write code to count the number of unique words in a very large string using the following steps.

A) First convert the string to a list with each word a separate item in the list. Hint: use a string split function for your language, and make sure it separates by “ ”.
B) Then use a dictionary to associate each word with a count. Note, the dictionary won’t be able to increment a key unless you add it first, so you may have to check to see if it exists before setting the original count of a word to 1.
C) Print each word and its count afterwards, and test with an interesting block of text that will have multiple words counted multiple times. (Note, the words don’t have to be in any particular order.)

For example: “how much wood would a woodchuck chuck if a woodchuck could chuck wood”

Expected output:
how - 1
much - 1
wood - 2
would - 1
a - 2
woodchuck - 2
chuck - 2
if - 1
could - 1

In [156]:
    Input_String = "how much wood would a woodchuck chuck if a woodchuck could chuck wood"
    Input_String

Out[156]: 'how much wood would a woodchuck chuck if a woodchuck could chuck wood'
In [157]:
    List_Of_words = Input_String.split()
    List_Of_words

Out[157]: ['how', 'much', 'wood', 'would', 'a', 'woodchuck', 'chuck', 'if', 'a', 'woodchuck', 'could', 'chuck', 'wood']

In [158]:
    word_counts = {}
    for word in List_Of_words:
        # Add the word on first sight, otherwise increment its count
        if word in word_counts:
            word_counts[word] += 1
        else:
            word_counts[word] = 1

    for word, count in word_counts.items():
        print(f"{word} - {count}")

    how - 1
    much - 1
    wood - 2
    would - 1
    a - 2
    woodchuck - 2
    chuck - 2
    if - 1
    could - 1
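The check-then-increment pattern above is what the exercise asks for. As a point of comparison, the standard library's `collections.Counter` performs the same tally without the explicit existence check. The sketch below is an alternative under that assumption, not a replacement for the dictionary-based answer, and the variable name `text` is illustrative.

```python
from collections import Counter

text = "how much wood would a woodchuck chuck if a woodchuck could chuck wood"

# Split on single spaces, then let Counter tally each distinct word.
word_counts = Counter(text.split(" "))

for word, count in word_counts.items():
    print(f"{word} - {count}")
```

A `Counter` is a dictionary subclass, so lookups such as `word_counts["wood"]` work as before, and a word that never appeared returns 0 instead of raising a `KeyError`.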
Part 3: Data Frames

In this part, we will study a classic data set - the survivors in the sinking of the Titanic. As there were limited lifeboats, decisions were made prioritizing who would and would not survive. We will observe how different factors such as age, sex, and class affected a person’s chance of survival using data frames.

Steps:

1. Input the following data into a data frame called titanic, and display the entire data frame:

Sex, Class, Survived, Died
Children, First, 6, 0
Children, Second, 24, 0
Children, Third, 27, 52
Men, First, 57, 118
Men, Second, 14, 154
Men, Third, 75, 387
Men, Crew, 192, 693
Women, First, 140, 4
Women, Second, 80, 13
Women, Third, 76, 89
Women, Crew, 20, 3
In [159]:
    import pandas as pd

    titanic_data = {
        'Sex': ['Children', 'Children', 'Children', 'Men', 'Men', 'Men', 'Men',
                'Women', 'Women', 'Women', 'Women'],
        'Class': ['First', 'Second', 'Third', 'First', 'Second', 'Third', 'Crew',
                  'First', 'Second', 'Third', 'Crew'],
        'Survived': [6, 24, 27, 57, 14, 75, 192, 140, 80, 76, 20],
        'Died': [0, 0, 52, 118, 154, 387, 693, 4, 13, 89, 3]
    }
    data_frame = pd.DataFrame(titanic_data)
    data_frame

Out[159]:
             Sex   Class  Survived  Died
    0   Children   First         6     0
    1   Children  Second        24     0
    2   Children   Third        27    52
    3        Men   First        57   118
    4        Men  Second        14   154
    5        Men   Third        75   387
    6        Men    Crew       192   693
    7      Women   First       140     4
    8      Women  Second        80    13
    9      Women   Third        76    89
    10     Women    Crew        20     3

2. Now only show the data of the people in first class.

In [160]:
    data_frame.loc[data_frame["Class"] == "First"]

Out[160]:
            Sex  Class  Survived  Died
    0  Children  First         6     0
    3       Men  First        57   118
    7     Women  First       140     4

3. Delete the crew members from the data.
In [161]:
    data_frame = data_frame[data_frame['Class'] != 'Crew']
    data_frame

Out[161]:
            Sex   Class  Survived  Died
    0  Children   First         6     0
    1  Children  Second        24     0
    2  Children   Third        27    52
    3       Men   First        57   118
    4       Men  Second        14   154
    5       Men   Third        75   387
    7     Women   First       140     4
    8     Women  Second        80    13
    9     Women   Third        76    89

4. Create a new column that is the total number of people for that group (those who survived + died).
In [162]:
    data_frame["Total_Number"] = data_frame["Survived"] + data_frame["Died"]
    data_frame

    C:\Users\16036\AppData\Local\Temp\ipykernel_15412\83928789.py:1: SettingWithCopyWarning:
    A value is trying to be set on a copy of a slice from a DataFrame.
    Try using .loc[row_indexer,col_indexer] = value instead

    See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
      data_frame["Total_Number"] = data_frame["Survived"]+data_frame["Died"]

Out[162]:
            Sex   Class  Survived  Died  Total_Number
    0  Children   First         6     0             6
    1  Children  Second        24     0            24
    2  Children   Third        27    52            79
    3       Men   First        57   118           175
    4       Men  Second        14   154           168
    5       Men   Third        75   387           462
    7     Women   First       140     4           144
    8     Women  Second        80    13            93
    9     Women   Third        76    89           165

5. Create a new column with the percentage of people who survived.
In [163]:
    data_frame['Survival Percentage'] = (data_frame['Survived'] / data_frame['Total_Number']) * 100
    data_frame

    C:\Users\16036\AppData\Local\Temp\ipykernel_15412\601391844.py:1: SettingWithCopyWarning:
    A value is trying to be set on a copy of a slice from a DataFrame.
    Try using .loc[row_indexer,col_indexer] = value instead

    See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
      data_frame['Survival Percentage'] = (data_frame['Survived'] / data_frame['Total_Number']) * 100

Out[163]:
            Sex   Class  Survived  Died  Total_Number  Survival Percentage
    0  Children   First         6     0             6           100.000000
    1  Children  Second        24     0            24           100.000000
    2  Children   Third        27    52            79            34.177215
    3       Men   First        57   118           175            32.571429
    4       Men  Second        14   154           168             8.333333
    5       Men   Third        75   387           462            16.233766
    7     Women   First       140     4           144            97.222222
    8     Women  Second        80    13            93            86.021505
    9     Women   Third        76    89           165            46.060606

6. Delete the column indicating the total number of people in that group.
In [164]:
    data_frame.drop(labels="Total_Number", axis=1, inplace=True)
    data_frame

    C:\Users\16036\AppData\Local\Temp\ipykernel_15412\75101291.py:1: SettingWithCopyWarning:
    A value is trying to be set on a copy of a slice from a DataFrame

    See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
      data_frame.drop(labels ="Total_Number",axis=1,inplace=True)

Out[164]:
            Sex   Class  Survived  Died  Survival Percentage
    0  Children   First         6     0           100.000000
    1  Children  Second        24     0           100.000000
    2  Children   Third        27    52            34.177215
    3       Men   First        57   118            32.571429
    4       Men  Second        14   154             8.333333
    5       Men   Third        75   387            16.233766
    7     Women   First       140     4            97.222222
    8     Women  Second        80    13            86.021505
    9     Women   Third        76    89            46.060606

7. Only show the rows where more than 80% of the people survived.

In [165]:
    data_frame[data_frame['Survival Percentage'] > 80]

Out[165]:
            Sex   Class  Survived  Died  Survival Percentage
    0  Children   First         6     0           100.000000
    1  Children  Second        24     0           100.000000
    7     Women   First       140     4            97.222222
    8     Women  Second        80    13            86.021505

8. Then only show the rows where less than 40% of the people survived.
In [166]:
    data_frame[data_frame["Survival Percentage"] < 40]

Out[166]:
            Sex   Class  Survived  Died  Survival Percentage
    2  Children   Third        27    52            34.177215
    3       Men   First        57   118            32.571429
    4       Men  Second        14   154             8.333333
    5       Men   Third        75   387            16.233766

In [167]:
    data_frame.drop(columns="Survival Percentage", axis=1, inplace=True)
    data_frame

    C:\Users\16036\AppData\Local\Temp\ipykernel_15412\2717960705.py:1: SettingWithCopyWarning:
    A value is trying to be set on a copy of a slice from a DataFrame

    See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
      data_frame.drop(columns="Survival Percentage",axis=1,inplace=True)

Out[167]:
            Sex   Class  Survived  Died
    0  Children   First         6     0
    1  Children  Second        24     0
    2  Children   Third        27    52
    3       Men   First        57   118
    4       Men  Second        14   154
    5       Men   Third        75   387
    7     Women   First       140     4
    8     Women  Second        80    13
    9     Women   Third        76    89

9. Calculate the total number of people that survived and died for each class, then report the percentages. (Hint: Use a grouped calculation.)
In [168]:
    data_frame["total"] = data_frame["Survived"] + data_frame["Died"]
    data_frame

    C:\Users\16036\AppData\Local\Temp\ipykernel_15412\4153513455.py:1: SettingWithCopyWarning:
    A value is trying to be set on a copy of a slice from a DataFrame.
    Try using .loc[row_indexer,col_indexer] = value instead

    See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
      data_frame["total"]=data_frame["Survived"]+data_frame["Died"]

Out[168]:
            Sex   Class  Survived  Died  total
    0  Children   First         6     0      6
    1  Children  Second        24     0     24
    2  Children   Third        27    52     79
    3       Men   First        57   118    175
    4       Men  Second        14   154    168
    5       Men   Third        75   387    462
    7     Women   First       140     4    144
    8     Women  Second        80    13     93
    9     Women   Third        76    89    165

In [169]:
    # Sum within each class first, then convert the class totals to percentages
    percentage_survived = round((data_frame.groupby("Class").Survived.sum() / data_frame.groupby("Class").total.sum()) * 100)
    percentage_died = round((data_frame.groupby("Class").Died.sum() / data_frame.groupby("Class").total.sum()) * 100)
    print(percentage_survived)
    print(percentage_died)

    Class
    First     62.0
    Second    41.0
    Third     25.0
    dtype: float64
    Class
    First     38.0
    Second    59.0
    Third     75.0
    dtype: float64
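The repeated `SettingWithCopyWarning` messages arise because new columns are assigned to a data frame that was itself produced by filtering another one. A minimal sketch of one common way to avoid the warning is to take an explicit `.copy()` after filtering, and then do the grouped calculation in a single `groupby`. The names `titanic`, `passengers` and `by_class` are illustrative assumptions, and the data is the table from step 1.

```python
import pandas as pd

titanic = pd.DataFrame({
    'Sex': ['Children', 'Children', 'Children', 'Men', 'Men', 'Men', 'Men',
            'Women', 'Women', 'Women', 'Women'],
    'Class': ['First', 'Second', 'Third', 'First', 'Second', 'Third', 'Crew',
              'First', 'Second', 'Third', 'Crew'],
    'Survived': [6, 24, 27, 57, 14, 75, 192, 140, 80, 76, 20],
    'Died': [0, 0, 52, 118, 154, 387, 693, 4, 13, 89, 3],
})

# An explicit copy makes the filtered frame independent of the original,
# so adding columns to it no longer triggers the warning.
passengers = titanic[titanic['Class'] != 'Crew'].copy()
passengers['Total'] = passengers['Survived'] + passengers['Died']

# One grouped sum per class, then percentages from the class totals.
by_class = passengers.groupby('Class')[['Survived', 'Died', 'Total']].sum()
by_class['%Survived'] = (by_class['Survived'] / by_class['Total'] * 100).round()
by_class['%Died'] = (by_class['Died'] / by_class['Total'] * 100).round()

print(by_class)
```

Grouping once and deriving both percentages from the same summed frame also avoids calling `groupby` four times, though the result should match the output above.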
In [170]:
    finalResult = pd.concat([percentage_survived, percentage_died], axis=1, keys=["%Survived", "%Died"])
    finalResult

Out[170]:
            %Survived  %Died
    Class
    First        62.0   38.0
    Second       41.0   59.0
    Third        25.0   75.0

10. Save your table in CSV format (as e.g. titanic_data.csv) with the first line as headers for the columns.

In [171]:
    finalResult.to_csv("titanic_data.csv")

11. Duplicate the CSV file on your computer since you will be editing the copied version (e.g. titanic_data2.csv). Open the new CSV file in a text editor. Note the way the data is organized. Now, in the text editor, add new lines including the data for the crew that was removed earlier. (Help: the percentage of male crew and female crew that survived was 21.69% and 86.96%.)

In [172]:
    finalResult.to_csv("titanic_data2.csv")

12. Now read that updated CSV file into a new data frame called titanic2, and display the data.

In [173]:
    titanic2 = pd.read_csv("titanic_data2.csv")
    titanic2

Out[173]:
        Class  %Survived  %Died
    0   First       62.0   38.0
    1  Second       41.0   59.0
    2   Third       25.0   75.0
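Steps 10 to 12 describe a save, copy, edit and reload cycle in which the crew rows are added back to the copied file. The sketch below walks through that cycle with the standard library's `csv` module instead of a text editor, writing into a temporary directory so no existing files are touched. It assumes the saved table is the per-group data with a survival percentage column; the helper `with_percentage`, the column name `Survival Percentage` and the temporary paths are illustrative assumptions.

```python
import csv
import os
import shutil
import tempfile

header = ["Sex", "Class", "Survived", "Died", "Survival Percentage"]
passenger_rows = [
    ["Children", "First", 6, 0], ["Children", "Second", 24, 0],
    ["Children", "Third", 27, 52], ["Men", "First", 57, 118],
    ["Men", "Second", 14, 154], ["Men", "Third", 75, 387],
    ["Women", "First", 140, 4], ["Women", "Second", 80, 13],
    ["Women", "Third", 76, 89],
]
crew_rows = [["Men", "Crew", 192, 693], ["Women", "Crew", 20, 3]]


def with_percentage(row):
    """Append the survival percentage, rounded to 2 places (hypothetical helper)."""
    survived, died = row[2], row[3]
    return row + [round(100 * survived / (survived + died), 2)]


workdir = tempfile.mkdtemp()
original_path = os.path.join(workdir, "titanic_data.csv")
copy_path = os.path.join(workdir, "titanic_data2.csv")

# Step 10: save the table with the header as the first line.
with open(original_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(header)
    writer.writerows(with_percentage(r) for r in passenger_rows)

# Step 11: duplicate the file, then append the crew rows to the copy.
shutil.copy(original_path, copy_path)
with open(copy_path, "a", newline="") as f:
    csv.writer(f).writerows(with_percentage(r) for r in crew_rows)

# Step 12: read the updated copy back (all values come back as strings).
with open(copy_path, newline="") as f:
    titanic2_rows = list(csv.DictReader(f))

for row in titanic2_rows:
    print(row)
```

The same file could instead be loaded with `pd.read_csv` to obtain the `titanic2` data frame the exercise asks for; the crew percentages computed here agree with the 21.69% and 86.96% given in the hint.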