RahulVarma_LAB5

pdf

School

University of North Texas

Course

5709

Subject

Statistics

Date

Feb 20, 2024

Type

pdf

Pages

18

Uploaded by venkatasai1999

3/9/23, 11:32 AM RahulVarma_LAB5 (2) - Jupyter Notebook localhost:8889/notebooks/Downloads/RahulVarma_LAB5 (2).ipynb

Part 1: Data Visualizations

1. Grouped Bar Plots: Make both a side-by-side bar plot and a stacked bar plot that display the number of child visitors and the number of adult visitors at a waterpark in the months of April, May, June and July. Be sure to include titles, legends and appropriate labels sufficiently sized for readability.

    April   Children: 780    Adults: 315
    May     Children: 1050   Adults: 400
    June    Children: 3056   Adults: 1000
    July    Children: 5025   Adults: 1500

In [93]:
    import numpy as np
    import matplotlib.pyplot as plt
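The prompt asks for a stacked bar plot as well as a side-by-side one. The following is a minimal sketch of the stacked variant, assuming the same visitor counts as above; the figure size, font sizes and title wording are illustrative choices, not part of the assignment.

```python
import numpy as np
import matplotlib.pyplot as plt

months = ["April", "May", "June", "July"]
children = np.array([780, 1050, 3056, 5025])
adults = np.array([315, 400, 1000, 1500])

fig, ax = plt.subplots(figsize=(8, 5))

# Children form the lower segment of each bar
child_bars = ax.bar(months, children, label="Children")
# bottom=children makes each adult segment start where the child segment ends
adult_bars = ax.bar(months, adults, bottom=children, label="Adults")

ax.set_xlabel("Month", fontsize=12)
ax.set_ylabel("Number of visitors", fontsize=12)
ax.set_title("Waterpark visitors by month (stacked)", fontsize=14)
ax.legend(fontsize=12)

# Total height of each stacked bar
stack_heights = children + adults
```

The only difference from the side-by-side version is the `bottom` argument: both series share the same x positions, and the second series is offset vertically instead of horizontally.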
In [164]:
    months = ["April", "May", "June", "July"]
    children = [780, 1050, 3056, 5025]
    adults = [315, 400, 1000, 1500]

    # Side-by-side bars: shift each series half a bar width away from the tick
    width = 0.4
    bars = np.arange(len(months))
    plt.bar(bars - 0.2, children, width, label="children")
    plt.bar(bars + 0.2, adults, width, label="adults")
    plt.xticks(bars, months)
    plt.xlabel("Months")
    plt.ylabel("No. of visitors")
    plt.title("Number of child visitors vs. number of adult visitors")
    plt.legend()

Out[164]: <matplotlib.legend.Legend at 0x1f2e1d20e80>

2. Histogram: Make a histogram of the following scores from the Fall 2017 Data Structures course at Loyola University Chicago. Feel free to experiment on the best number of histogram bins for visualization.

    114.8, 98.8, 97.3, 96, 94.1, 93.1, 93.1, 91.6, 91.5, 91.3, 90.3, 89.2, 87.5, 87.4, 85.2, 81.7, 81.6, 81.5, 80, 79.3, 78.2, 77.6, 77.1, 76.7, 75.1, 73.9, 72, 71, 64.6, 63.3, 47.2, 38.7
In [165]:
    FallScores_2017 = np.array([114.8, 98.8, 97.3, 96, 94.1, 93.1, 93.1, 91.6, 91.5, 91.3,
                                90.3, 89.2, 87.5, 87.4, 85.2, 81.7, 81.6, 81.5, 80, 79.3,
                                78.2, 77.6, 77.1, 76.7, 75.1, 73.9, 72, 71, 64.6, 63.3,
                                47.2, 38.7])
    print(min(FallScores_2017))
    print(max(FallScores_2017))

    # Bin edges every 10 points, wide enough to cover the minimum and maximum
    t = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]
    plt.hist(FallScores_2017, bins=t, ec="black", color='#4e7eb5')
    plt.xticks(t)
    plt.xlabel("scores")
    plt.ylabel("count")
    plt.title("Fall 2017 - Data Structures Course Scores")

38.7
114.8

Out[165]: Text(0.5, 1.0, 'Fall 2017 - Data Structures Course Scores')
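When experimenting with bins, it can help to look at the counts per bin before plotting. The sketch below tabulates the same scores with `np.histogram` using 10-point bins; the variable names are illustrative and the bin width is just one reasonable choice.

```python
import numpy as np

scores = np.array([114.8, 98.8, 97.3, 96, 94.1, 93.1, 93.1, 91.6, 91.5, 91.3,
                   90.3, 89.2, 87.5, 87.4, 85.2, 81.7, 81.6, 81.5, 80, 79.3,
                   78.2, 77.6, 77.1, 76.7, 75.1, 73.9, 72, 71, 64.6, 63.3,
                   47.2, 38.7])

# Edges 0, 10, ..., 120; each bin is [left, right) except the last, which is closed
bin_edges = np.arange(0, 121, 10)
counts, edges = np.histogram(scores, bins=bin_edges)

# Print one line per bin so sparse or empty bins are easy to spot
for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{int(left):3d}-{int(right):3d}: {int(n)}")
```

Empty bins (for example 50-60 and 100-110 here) suggest that either wider bins or a narrower x-range may give a cleaner picture.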
3. Line Plot: Create a line plot of sin(x) and cos(x + π/2) for -2π < x < 2π where x increases at intervals of π/4.

1) Make the sin(x) graph red and make the cos(x + π/2) graph green
   a) Put both lines onto the same plot

In [100]:
    x = np.arange(-2 * np.pi, 2 * np.pi, np.pi / 4)
    y = np.sin(x)
    z = np.cos(x + np.pi / 2)
    plt.plot(x, y, color='red', label="sin(x)")        # red, as the prompt requires
    plt.plot(x, z, color='green', label="cos(x+π/2)")
    plt.xlabel('-2π to 2π')
    plt.ylabel('sin(x) and cos(x+ π/2)')
    plt.title('sin(x) and cos(x+ π/2) line plot')
    plt.legend()

Out[100]: <matplotlib.legend.Legend at 0x1f2dbfb96a0>

2) Using the same info as above, make a subplot with 2 different graphs
   a) one graph for sin(x) and
   b) one graph for cos(x + π/2)

In [101]:
    x = np.arange(-2 * np.pi, 2 * np.pi, np.pi / 4)
    y = np.sin(x)
    plt.plot(x, y, color='red', label="sin(x)")   # same color as in part 1)
    plt.xlabel('-2π to 2π')
    plt.ylabel('sin(x)')
    plt.title('sin(x) line plot, ranging from -2π to 2π')
    plt.legend()

Out[101]: <matplotlib.legend.Legend at 0x1f2dc0ab0d0>
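Step 2) asks for the two curves as subplots of one figure, while the notebook cells draw each curve in a separate figure. A minimal sketch of the subplot version using `plt.subplots` follows; the stacked 2 x 1 layout and the shared x-axis are assumptions, since the prompt does not specify an arrangement.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(-2 * np.pi, 2 * np.pi, np.pi / 4)
sin_values = np.sin(x)
cos_values = np.cos(x + np.pi / 2)

# Two rows, one column; sharex keeps both panels on the same x range
fig, (ax_sin, ax_cos) = plt.subplots(2, 1, sharex=True, figsize=(8, 6))

ax_sin.plot(x, sin_values, color="red", label="sin(x)")
ax_sin.set_ylabel("sin(x)")
ax_sin.set_title("sin(x) from -2π to 2π")
ax_sin.legend()

ax_cos.plot(x, cos_values, color="green", label="cos(x + π/2)")
ax_cos.set_xlabel("x")
ax_cos.set_ylabel("cos(x + π/2)")
ax_cos.set_title("cos(x + π/2) from -2π to 2π")
ax_cos.legend()

fig.tight_layout()
```

Because cos(x + π/2) = -sin(x), the lower panel is the mirror image of the upper one, which is a quick visual check that both curves were computed correctly.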
In [166]:
    x = np.arange(-2 * np.pi, 2 * np.pi, np.pi / 4)
    y = np.cos(x + np.pi / 2)
    plt.plot(x, y, color='green', label="cos(x+π/2)")   # plot y from this cell, in the color part 1) asks for
    plt.xlabel('-2π to 2π')
    plt.ylabel('cos(x+ π/2)')
    plt.title('cos(x+ π/2) line plot, ranging from -2π to 2π')
    plt.legend()

Out[166]: <matplotlib.legend.Legend at 0x1f2e1e28220>

Using the following data about winter temperatures affecting the number of days for lake ice at Lake Superior, construct a scatter plot to display the data. Include a line of best fit.

Mean Temperature (in Fahrenheit): 22.94, 23.02, 25.68, 19.96, 24.80, 23.98, 22.10, 20.30, 24.20, 22.74, 24.16, 24.94, 22.40, 22.14, 20.84, 25.66, 21.73, 24.49, 24.13, 22.17, 21.73, 20.41, 24.41, 23.95, 20.95, 26.71, 22.81, 23.11, 23.33, 28.83, 23.11, 21.47, 23.97, 24.75, 23.61, 23.08, 21.24, 26.63, 23.88

Days of Ice: 87, 137, 106, 97, 105, 118, 118, 136, 91, 107, 96, 114, 125, 115, 118, 82, 115, 97, 104, 146, 126, 141, 111, 123, 118, 83, 48, 118, 116, 81, 116, 123, 112, 99, 102, 118, 63, 62, 132

In [103]:
    Mean_Temperature = np.array([22.94, 23.02, 25.68, 19.96, 24.80, 23.98, 22.10, 20.30, 24.20, 22.74,
                                 24.16, 24.94, 22.40, 22.14, 20.84, 25.66, 21.73, 24.49, 24.13, 22.17,
                                 21.73, 20.41, 24.41, 23.95, 20.95, 26.71, 22.81, 23.11, 23.33, 28.83,
                                 23.11, 21.47, 23.97, 24.75, 23.61, 23.08, 21.24, 26.63, 23.88])
    Days_of_Ice = np.array([87, 137, 106, 97, 105, 118, 118, 136, 91, 107,
                            96, 114, 125, 115, 118, 82, 115, 97, 104, 146,
                            126, 141, 111, 123, 118, 83, 48, 118, 116, 81,
                            116, 123, 112, 99, 102, 118, 63, 62, 132])
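The line of best fit can also be computed explicitly rather than left to a plotting library. The sketch below fits a degree-1 polynomial with `np.polyfit` and overlays it on a scatter plot; it assumes an ordinary least-squares line is what the prompt means by "best fit", and the lowercase variable names are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

mean_temperature = np.array([22.94, 23.02, 25.68, 19.96, 24.80, 23.98, 22.10, 20.30, 24.20, 22.74,
                             24.16, 24.94, 22.40, 22.14, 20.84, 25.66, 21.73, 24.49, 24.13, 22.17,
                             21.73, 20.41, 24.41, 23.95, 20.95, 26.71, 22.81, 23.11, 23.33, 28.83,
                             23.11, 21.47, 23.97, 24.75, 23.61, 23.08, 21.24, 26.63, 23.88])
days_of_ice = np.array([87, 137, 106, 97, 105, 118, 118, 136, 91, 107,
                        96, 114, 125, 115, 118, 82, 115, 97, 104, 146,
                        126, 141, 111, 123, 118, 83, 48, 118, 116, 81,
                        116, 123, 112, 99, 102, 118, 63, 62, 132])

# Least-squares fit of days = slope * temperature + intercept
slope, intercept = np.polyfit(mean_temperature, days_of_ice, deg=1)
fitted = slope * mean_temperature + intercept

fig, ax = plt.subplots()
ax.scatter(mean_temperature, days_of_ice, label="observations")

# Sort by temperature so the fitted line is drawn left to right
order = np.argsort(mean_temperature)
ax.plot(mean_temperature[order], fitted[order], color="red", label="least-squares fit")

ax.set_xlabel("Mean temperature (°F)")
ax.set_ylabel("Days of ice")
ax.set_title("Mean Temperature vs Days of Ice")
ax.legend()
```

Having `slope` and `intercept` as plain numbers makes it easy to report the fitted equation alongside the plot.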
In [145]:
    import seaborn as sns
    # Recent seaborn releases require x and y to be passed as keyword arguments
    plot = sns.regplot(x=Mean_Temperature, y=Days_of_Ice)
    plot.set(xlabel="Mean temperature", ylabel="Days of ice", title="Mean Temperature vs Days of Ice")

Out[145]: [Text(0.5, 0, 'Mean temperature'), Text(0, 0.5, 'Days of ice'), Text(0.5, 1.0, 'Mean Temperature vs Days of Ice')]

Part 2: Basic Data Structure

1: Lists

1) Make a list with the spelled-out number strings ‘one’, ‘two’, ‘three’, ‘four’, and ‘five’ in that order and call it myList.

In [2]:
    mylist = ["one", "two", "three", "four", "five"]

2) Remove ‘three’ from the list using positional indexing.
In [3]:
    mylist
    del mylist[2]
    mylist

Out[3]: ['one', 'two', 'four', 'five']

3) Check if ‘four’ is in the list.

In [4]:
    print("four" in mylist)

True

4) Append ‘six’ to the end of the list, then print the length of the list.

In [5]:
    mylist.append("six")
    mylist

Out[5]: ['one', 'two', 'four', 'five', 'six']

In [6]:
    len(mylist)

Out[6]: 5

5) Print the contents of the list, but also next to each item print the length of the string (e.g. one is 3, four is 4) using a for loop.

In [109]:
    for i in mylist:
        print(i, "is", len(i))

one is 3
two is 3
four is 4
five is 4
six is 3

6) Create a list only of the lengths of the strings and show your result. You can use the loop before to fill the list.

In [167]:
    length_string = [len(word) for word in mylist]
    length_string

Out[167]: [3, 3, 4, 4, 3]
2: Dictionaries

1) Make a dictionary whose keys are the English words below and whose values are their translations. You can use this language example (German) or choose your own. Note: you need to make sure all of these words are represented as strings, in quotes.

    apple - Apfel
    apples - Äpfel
    I - Ich
    and - und
    like - mag
    strawberries - Erdbeeren

In [168]:
    dictionary_words = {"apple": "Apfel", "apples": "Äpfel", "I": "Ich", "and": "und",
                        "like": "mag", "strawberries": "Erdbeeren"}
    dictionary_words

Out[168]: {'apple': 'Apfel',
 'apples': 'Äpfel',
 'I': 'Ich',
 'and': 'und',
 'like': 'mag',
 'strawberries': 'Erdbeeren'}

2) Use the dictionary to look up the translation for ‘apple’ and ‘like’.

In [169]:
    print(dictionary_words["apple"])
    print(dictionary_words["like"])

Apfel
mag

3) Make a variable var with the string “I like apples and strawberries”.

In [173]:
    var = "I like apples and strawberries"
    var

Out[173]: 'I like apples and strawberries'

4) Now create a list from var with each word a separate item (this is a string split operation).

In [175]:
    split_string = var.split()
    split_string

Out[175]: ['I', 'like', 'apples', 'and', 'strawberries']

5) Iterate through the list you’ve created and replace any word in your dictionary with the translation.
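The replacement in step 5 can be written compactly with `dict.get`, which returns a fallback when the key is missing. This is a sketch of one possible approach, not the only one; `translated_words` and `translated` are illustrative names.

```python
dictionary_words = {"apple": "Apfel", "apples": "Äpfel", "I": "Ich",
                    "and": "und", "like": "mag", "strawberries": "Erdbeeren"}

var = "I like apples and strawberries"

# get(word, word) returns the translation if present, otherwise the word itself
translated_words = [dictionary_words.get(word, word) for word in var.split()]

# Join the list back into a single space-separated string
translated = " ".join(translated_words)
print(translated)
```

Words that are not in the dictionary pass through unchanged, so a sentence such as "I like pears" would keep "pears" as-is.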
In [177]:
    replace_list = []
    for item in split_string:
        if item in dictionary_words:
            replace_list.append(dictionary_words[item])
        else:
            replace_list.append(item)
    replace_list

Out[177]: ['Ich', 'mag', 'Äpfel', 'und', 'Erdbeeren']

6) Now take your new list and turn it into a string with spaces between the words.

In [178]:
    replace_string = ' '.join(replace_list)
    replace_string

Out[178]: 'Ich mag Äpfel und Erdbeeren'

3: Arrays

1) Create an array of zeros of size 8 x 8 and print the data type of the array.

In [179]:
    array = np.zeros((8, 8))
    array

Out[179]: array([[0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0.]])

In [180]:
    type(array)

Out[180]: numpy.ndarray

2) Fill the array with the numbers 1 to 64 first by row, then by column. You may want to use a for loop inside a for loop to do this.
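The nested-loop approach the prompt suggests can be sketched as follows. It fills one array row by row and a second one column by column in the same pass; the names `by_row` and `by_col` and the integer dtype are illustrative assumptions.

```python
import numpy as np

n = 8
by_row = np.zeros((n, n), dtype=int)
by_col = np.zeros((n, n), dtype=int)

value = 1
for i in range(n):
    for j in range(n):
        by_row[i, j] = value   # consecutive numbers run along each row
        by_col[j, i] = value   # consecutive numbers run down each column
        value += 1

print(by_row)
print(by_col)
```

The column-filled array is simply the transpose of the row-filled one, and `np.arange(1, 65).reshape(8, 8)` produces the row-filled array without any explicit loop.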
In [182]:
    s = len(array)
    k = len(array)
    fill_array = [[(j + 1) + s * i for j in range(s)] for i in range(k)]
    fill_array = np.array(fill_array)
    fill_array

Out[182]: array([[ 1,  2,  3,  4,  5,  6,  7,  8],
       [ 9, 10, 11, 12, 13, 14, 15, 16],
       [17, 18, 19, 20, 21, 22, 23, 24],
       [25, 26, 27, 28, 29, 30, 31, 32],
       [33, 34, 35, 36, 37, 38, 39, 40],
       [41, 42, 43, 44, 45, 46, 47, 48],
       [49, 50, 51, 52, 53, 54, 55, 56],
       [57, 58, 59, 60, 61, 62, 63, 64]])

3) Transpose the array.

In [183]:
    array = np.transpose(fill_array)
    array

Out[183]: array([[ 1,  9, 17, 25, 33, 41, 49, 57],
       [ 2, 10, 18, 26, 34, 42, 50, 58],
       [ 3, 11, 19, 27, 35, 43, 51, 59],
       [ 4, 12, 20, 28, 36, 44, 52, 60],
       [ 5, 13, 21, 29, 37, 45, 53, 61],
       [ 6, 14, 22, 30, 38, 46, 54, 62],
       [ 7, 15, 23, 31, 39, 47, 55, 63],
       [ 8, 16, 24, 32, 40, 48, 56, 64]])

4) Print only the top 4 rows and columns.

In [184]:
    array[0:4, 0:4]

Out[184]: array([[ 1,  9, 17, 25],
       [ 2, 10, 18, 26],
       [ 3, 11, 19, 27],
       [ 4, 12, 20, 28]])

5) Make a 1D array out of your 2D array with the numbers 1 to 64 in order (note the column vs row issue, you may need transposes.)

In [187]:
    array_1 = array.flatten()
    array_1

Out[187]: array([ 1,  9, 17, 25, 33, 41, 49, 57,  2, 10, 18, 26, 34, 42, 50, 58,
        3, 11, 19, 27, 35, 43, 51, 59,  4, 12, 20, 28, 36, 44, 52, 60,
        5, 13, 21, 29, 37, 45, 53, 61,  6, 14, 22, 30, 38, 46, 54, 62,
        7, 15, 23, 31, 39, 47, 55, 63,  8, 16, 24, 32, 40, 48, 56, 64])
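Flattening the transposed array row by row yields 1, 9, 17, ... rather than 1 to 64 in order, which is the "column vs row issue" the prompt mentions. The sketch below shows two ways to obtain the in-order result; the array is rebuilt here with `np.arange` only to keep the example self-contained.

```python
import numpy as np

fill_array = np.arange(1, 65).reshape(8, 8)   # 1..64 filled by row
array = fill_array.T                          # transposed: 1..64 runs down the columns

# Default flatten is row-major (C order), so the transposed array comes out interleaved
row_major = array.flatten()

# Option 1: read the transposed array column by column (Fortran order)
in_order = array.flatten(order="F")

# Option 2: transpose back first, then flatten row by row
also_in_order = array.T.flatten()

print(row_major[:4])
print(in_order[:8])
```

Both options give the same 1D array; which one reads better is a matter of taste.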
In [189]:
    array_1.ndim

Out[189]: 1

6) Now take that 1D array you made from before and reshape it back to the original 2D array.

In [193]:
    array_2 = array_1.reshape(8, 8)
    array_2

Out[193]: array([[ 1,  9, 17, 25, 33, 41, 49, 57],
       [ 2, 10, 18, 26, 34, 42, 50, 58],
       [ 3, 11, 19, 27, 35, 43, 51, 59],
       [ 4, 12, 20, 28, 36, 44, 52, 60],
       [ 5, 13, 21, 29, 37, 45, 53, 61],
       [ 6, 14, 22, 30, 38, 46, 54, 62],
       [ 7, 15, 23, 31, 39, 47, 55, 63],
       [ 8, 16, 24, 32, 40, 48, 56, 64]])

In [194]:
    array_2.ndim

Out[194]: 2

Part 3: Data Frames

In this part, we will study a classic data set: the survivors of the sinking of the Titanic. As there were limited lifeboats, decisions were made prioritizing who would and would not survive. We will observe how different factors such as age, sex, and class affected a person’s chance of survival using data frames.

Steps:

1. Input the following data into a data frame called titanic, and display the entire data frame:

    Sex, Class, Survived, Died
    Children, First, 6, 0
    Children, Second, 24, 0
    Children, Third, 27, 52
    Men, First, 57, 118
    Men, Second, 14, 154
    Men, Third, 75, 387
    Men, Crew, 192, 693
    Women, First, 140, 4
    Women, Second, 80, 13
    Women, Third, 76, 89
    Women, Crew, 20, 3

In [127]:
    import pandas as pd

In [196]:
    titanic_data = {
        "sex": ["Children", "Children", "Children", "Men", "Men", "Men", "Men",
                "Women", "Women", "Women", "Women"],
        "Class": ["First", "Second", "Third", "First", "Second", "Third", "Crew",
                  "First", "Second", "Third", "Crew"],
        "Survived": [6, 24, 27, 57, 14, 75, 192, 140, 80, 76, 20],
        "Died": [0, 0, 52, 118, 154, 387, 693, 4, 13, 89, 3]}
In [197]:
    data_frame = pd.DataFrame(data=titanic_data, columns=["sex", "Class", "Survived", "Died"])
    data_frame

Out[197]:
             sex   Class  Survived  Died
    0   Children   First         6     0
    1   Children  Second        24     0
    2   Children   Third        27    52
    3        Men   First        57   118
    4        Men  Second        14   154
    5        Men   Third        75   387
    6        Men    Crew       192   693
    7      Women   First       140     4
    8      Women  Second        80    13
    9      Women   Third        76    89
    10     Women    Crew        20     3

2. Now only show the data of the people in first class.

In [198]:
    data_frame.loc[data_frame["Class"] == "First"]

Out[198]:
            sex  Class  Survived  Died
    0  Children  First         6     0
    3       Men  First        57   118
    7     Women  First       140     4

3. Delete the crew members from the data.
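Crew rows can be removed by condition instead of by hard-coded row labels, which keeps working if the rows are reordered. The following is a sketch under the assumption that the data frame is named `titanic`, as the prompt requests, with the column names from the prompt; `passengers` is an illustrative name.

```python
import pandas as pd

titanic = pd.DataFrame({
    "Sex": ["Children", "Children", "Children", "Men", "Men", "Men", "Men",
            "Women", "Women", "Women", "Women"],
    "Class": ["First", "Second", "Third", "First", "Second", "Third", "Crew",
              "First", "Second", "Third", "Crew"],
    "Survived": [6, 24, 27, 57, 14, 75, 192, 140, 80, 76, 20],
    "Died": [0, 0, 52, 118, 154, 387, 693, 4, 13, 89, 3],
})

# Boolean mask: True for every row whose Class is not "Crew"
not_crew = titanic["Class"] != "Crew"

# Selecting with the mask keeps the original index labels of the remaining rows
passengers = titanic[not_crew]
print(passengers)
```

An equivalent in-place form is `titanic.drop(titanic[titanic["Class"] == "Crew"].index, inplace=True)`.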
In [199]:
    data_frame.drop(labels=[6, 10], inplace=True)
    data_frame

Out[199]:
            sex   Class  Survived  Died
    0  Children   First         6     0
    1  Children  Second        24     0
    2  Children   Third        27    52
    3       Men   First        57   118
    4       Men  Second        14   154
    5       Men   Third        75   387
    7     Women   First       140     4
    8     Women  Second        80    13
    9     Women   Third        76    89

4. Create a new column that is the total number of people for that group (those who survived + died).

In [200]:
    data_frame["Total_Number"] = data_frame["Survived"] + data_frame["Died"]
    data_frame

Out[200]:
            sex   Class  Survived  Died  Total_Number
    0  Children   First         6     0             6
    1  Children  Second        24     0            24
    2  Children   Third        27    52            79
    3       Men   First        57   118           175
    4       Men  Second        14   154           168
    5       Men   Third        75   387           462
    7     Women   First       140     4           144
    8     Women  Second        80    13            93
    9     Women   Third        76    89           165

5. Create a new column with the percentage of people who survived.
In [201]:
    data_frame["percentage_survived"] = round((data_frame["Survived"] / data_frame["Total_Number"]) * 100)
    data_frame

Out[201]:
            sex   Class  Survived  Died  Total_Number  percentage_survived
    0  Children   First         6     0             6                100.0
    1  Children  Second        24     0            24                100.0
    2  Children   Third        27    52            79                 34.0
    3       Men   First        57   118           175                 33.0
    4       Men  Second        14   154           168                  8.0
    5       Men   Third        75   387           462                 16.0
    7     Women   First       140     4           144                 97.0
    8     Women  Second        80    13            93                 86.0
    9     Women   Third        76    89           165                 46.0

6. Delete the column indicating the total number of people in that group.

In [202]:
    data_frame.drop(labels="Total_Number", axis=1, inplace=True)
    data_frame

Out[202]:
            sex   Class  Survived  Died  percentage_survived
    0  Children   First         6     0                100.0
    1  Children  Second        24     0                100.0
    2  Children   Third        27    52                 34.0
    3       Men   First        57   118                 33.0
    4       Men  Second        14   154                  8.0
    5       Men   Third        75   387                 16.0
    7     Women   First       140     4                 97.0
    8     Women  Second        80    13                 86.0
    9     Women   Third        76    89                 46.0

7. Only show the rows where more than 80% of the people survived.
In [203]:
    data_frame[data_frame["percentage_survived"] > 80]

Out[203]:
            sex   Class  Survived  Died  percentage_survived
    0  Children   First         6     0                100.0
    1  Children  Second        24     0                100.0
    7     Women   First       140     4                 97.0
    8     Women  Second        80    13                 86.0

8. Then only show the rows where less than 40% of the people survived.

In [204]:
    data_frame[data_frame["percentage_survived"] < 40]

Out[204]:
            sex   Class  Survived  Died  percentage_survived
    2  Children   Third        27    52                 34.0
    3       Men   First        57   118                 33.0
    4       Men  Second        14   154                  8.0
    5       Men   Third        75   387                 16.0

9. Calculate the total number of people that survived and died for each class, then report the percentages. (Hint: Use a grouped calculation.)

In [205]:
    # columns= already names the axis, so a separate axis argument is not needed
    data_frame.drop(columns="percentage_survived", inplace=True)
    data_frame

Out[205]:
            sex   Class  Survived  Died
    0  Children   First         6     0
    1  Children  Second        24     0
    2  Children   Third        27    52
    3       Men   First        57   118
    4       Men  Second        14   154
    5       Men   Third        75   387
    7     Women   First       140     4
    8     Women  Second        80    13
    9     Women   Third        76    89
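The grouped calculation that step 9 hints at can be sketched with a single `groupby`. The example assumes the crew rows have already been removed and rebuilds the table so that it is self-contained; `totals` and `percentages` are illustrative names.

```python
import pandas as pd

passengers = pd.DataFrame({
    "Sex": ["Children", "Children", "Children", "Men", "Men", "Men",
            "Women", "Women", "Women"],
    "Class": ["First", "Second", "Third", "First", "Second", "Third",
              "First", "Second", "Third"],
    "Survived": [6, 24, 27, 57, 14, 75, 140, 80, 76],
    "Died": [0, 0, 52, 118, 154, 387, 4, 13, 89],
})

# Sum survivors and deaths within each class
totals = passengers.groupby("Class")[["Survived", "Died"]].sum()

# Divide each row by its own total (survived + died) and convert to whole percentages
percentages = totals.div(totals.sum(axis=1), axis=0).mul(100).round()
percentages.columns = ["% survived", "% died"]

print(totals)
print(percentages)
```

Grouping once and dividing the resulting table avoids repeating the `groupby` for every numerator and denominator.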
In [206]:
    # Use data_frame throughout; no variable named df is defined in this notebook
    data_frame["total"] = data_frame["Survived"] + data_frame["Died"]
    data_frame

Out[206]:
            sex   Class  Survived  Died  total
    0  Children   First         6     0      6
    1  Children  Second        24     0     24
    2  Children   Third        27    52     79
    3       Men   First        57   118    175
    4       Men  Second        14   154    168
    5       Men   Third        75   387    462
    7     Women   First       140     4    144
    8     Women  Second        80    13     93
    9     Women   Third        76    89    165

In [207]:
    percentage_survived = round((data_frame.groupby("Class").Survived.sum() / data_frame.groupby("Class").total.sum()) * 100)
    percentage_died = round((data_frame.groupby("Class").Died.sum() / data_frame.groupby("Class").total.sum()) * 100)
    print(percentage_survived)
    print(percentage_died)

Class
First     62.0
Second    41.0
Third     25.0
dtype: float64
Class
First     38.0
Second    59.0
Third     75.0
dtype: float64

In [208]:
    results = pd.concat([percentage_survived, percentage_died], axis=1, keys=["% survived", "% died"])
    results

Out[208]:
            % survived  % died
    Class
    First         62.0    38.0
    Second        41.0    59.0
    Third         25.0    75.0

10. Save your table in CSV format (as e.g. titanic_data.csv) with the first line as headers for the columns.
In [209]:
    results.to_csv("titanic_data.csv")

11. Duplicate the CSV file on your computer since you will be editing the copied version (e.g. titanic_data2.csv). Open the new CSV file in a text editor. Note the way the data is organized. Now, in the text editor, add new lines including the data for the crew that was removed earlier. (Help: the percentage of male crew and female crew that survived was 21.69% and 86.96%.)

In [210]:
    results.to_csv("titanic_data2.csv")

12. Now read that updated CSV file into a new data frame called titanic2, and display the data.

In [212]:
    titanic2 = pd.read_csv("titanic_data2.csv")
    titanic2

Out[212]:
        Class  % survived  % died
    0   First        62.0    38.0
    1  Second        41.0    59.0
    2   Third        25.0    75.0
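Steps 10 to 12 describe a round trip: save the per-group table, copy the file, append the crew rows by hand, and read the copy back. The sketch below imitates the manual edit in code, under several assumptions: the table saved is the per-group counts without the crew (not the per-class percentages), the files live in a temporary directory, and the appended lines use the crew counts from the original data (192 of 885 men and 20 of 23 women survived, matching the 21.69% and 86.96% in the hint).

```python
import shutil
import tempfile
from pathlib import Path

import pandas as pd

passengers = pd.DataFrame({
    "Sex": ["Children", "Children", "Children", "Men", "Men", "Men",
            "Women", "Women", "Women"],
    "Class": ["First", "Second", "Third", "First", "Second", "Third",
              "First", "Second", "Third"],
    "Survived": [6, 24, 27, 57, 14, 75, 140, 80, 76],
    "Died": [0, 0, 52, 118, 154, 387, 4, 13, 89],
})

workdir = Path(tempfile.mkdtemp())            # scratch directory for the example
original = workdir / "titanic_data.csv"
copy = workdir / "titanic_data2.csv"

# index=False writes the header as the first line and no extra index column
passengers.to_csv(original, index=False)

# Duplicate the file, then append the crew rows as a text editor would
shutil.copy(original, copy)
with open(copy, "a", encoding="utf-8") as f:
    f.write("Men,Crew,192,693\n")
    f.write("Women,Crew,20,3\n")

titanic2 = pd.read_csv(copy)
print(titanic2)
```

Writing with `index=False` matters here: without it the file gains an unnamed index column, and hand-written lines would need a matching extra field.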