project3.pdf

School: University of California, Berkeley
Course: C8
Subject: Computer Science
Date: Jul 2, 2024
Type: pdf
Pages: 12
Uploaded by CoachSheepPerson165

Question 1.2.2 Choose two different words in the dataset with a magnitude (absolute value) of correlation higher than 0.2 and plot a scatter plot with a line of best fit for them. Please do not pick “outer” and “space” or “san” and “francisco”. The code to plot the scatter plot and line of best fit is given for you; you just need to calculate the correct values of r, slope, and intercept.

Hint 1: It’s easier to think of words with a positive correlation, i.e. words that are often mentioned together. Try to think of common phrases or idioms.

Hint 2: Refer to Section 15.2 of the textbook for the formulas. For additional past examples of regression, see Homework 9.

In [62]:
    word_x = 'blue'
    word_y = 'moon'

    # These arrays should make your code cleaner!
    arr_x = movies.column(word_x)
    arr_y = movies.column(word_y)

    # Convert each word's frequencies to standard units.
    x_su = standard_units(arr_x)
    y_su = standard_units(arr_y)

    # r is the mean of the products of the paired standard units.
    r = np.mean(x_su * y_su)
    # The slope converts r back into original units: r * SD(y) / SD(x).
    slope = r * np.std(arr_y) / np.std(arr_x)
    # The line of best fit passes through the point (mean of x, mean of y).
    intercept = np.mean(arr_y) - slope * np.mean(arr_x)

    # DON'T CHANGE THESE LINES OF CODE
    movies.scatter(word_x, word_y)
    max_x = max(movies.column(word_x))
    plots.title(f"Correlation: {r}, magnitude greater than .2: {abs(r) >= 0.2}")
    plots.plot([0, max_x * 1.3], [intercept, intercept + slope * (max_x * 1.3)], color='gold');
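
The cell above uses three formulas:

- r is the mean of the product of x and y in standard units.
- slope = r * SD(y) / SD(x).
- intercept = mean(y) - slope * mean(x).

The sketch below applies these formulas to small made-up word-frequency arrays. The numbers are hypothetical, not taken from the movies table. It then cross-checks the results against NumPy's own correlation and least-squares fit. Here `standard_units` is a local stand-in written to match the usual (value - mean) / SD definition; that this matches the notebook's helper exactly is an assumption.

```python
import numpy as np

def standard_units(arr):
    """Convert an array to standard units: (value - mean) / SD.
    Hypothetical stand-in for the notebook's helper of the same name."""
    arr = np.asarray(arr, dtype=float)
    return (arr - np.mean(arr)) / np.std(arr)

def regression_params(arr_x, arr_y):
    """Return (r, slope, intercept) of the least-squares line for y on x."""
    arr_x = np.asarray(arr_x, dtype=float)
    arr_y = np.asarray(arr_y, dtype=float)
    # Correlation: average product of the two variables in standard units.
    r = np.mean(standard_units(arr_x) * standard_units(arr_y))
    # Slope rescales r from standard units back to the original units.
    slope = r * np.std(arr_y) / np.std(arr_x)
    # The regression line passes through (mean of x, mean of y).
    intercept = np.mean(arr_y) - slope * np.mean(arr_x)
    return r, slope, intercept

# Hypothetical per-movie frequencies for two words (made-up data).
x = np.array([0.0, 0.001, 0.002, 0.0, 0.003, 0.001])
y = np.array([0.0, 0.002, 0.003, 0.001, 0.004, 0.001])

r, slope, intercept = regression_params(x, y)
print(f"r = {r:.3f}, slope = {slope:.3f}, intercept = {intercept:.5f}")

# Cross-check: np.corrcoef gives r; np.polyfit(deg=1) returns [slope, intercept].
fit_slope, fit_intercept = np.polyfit(x, y, 1)
print(np.isclose(r, np.corrcoef(x, y)[0, 1]),
      np.isclose(slope, fit_slope),
      np.isclose(intercept, fit_intercept))
```

The cross-check agrees because np.std uses the population SD (ddof=0) for both SD(x) and SD(y). As a result, r * SD(y) / SD(x) equals the covariance divided by the variance of x, which is the least-squares slope.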

Question 1.3.1 Draw a horizontal bar chart with two bars that show the proportion of Comedy movies in each dataset (train_movies and test_movies). The two bars should be labeled “Training” and “Test”. Complete the function comedy_proportion first; it should help you create the bar chart.

Hint: Refer to Section 7.1 of the textbook if you need a refresher on bar charts.

In [66]:
    def comedy_proportion(table):
        # Return the proportion of movies in a table that have the comedy genre.
        movie_len = table.num_rows
        # Group by genre, keep the 'comedy' row, and pull out its count as a number.
        movie_group = table.group('Genre').where('Genre', are.equal_to('comedy')).column('count').item(0)
        return movie_group / movie_len

    # The staff solution took multiple lines. Start by creating a table.
    # If you get stuck, think about what sort of table you need for barh to work
    comedy_proportion_t = comedy_proportion(train_movies)
    comedy_proportion_test = comedy_proportion(test_movies)
    comedy_tbl = Table().with_columns(
        'Categories', make_array('Training', 'Test'),
        'Proportions', make_array(comedy_proportion_t, comedy_proportion_test)
    )
    comedy_tbl.barh('Categories')
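
The proportion itself is just the number of comedy rows divided by the total number of rows. It can therefore be computed without the group/where/item chain. The plain-Python sketch below makes a simplifying assumption: each dataset is represented as a list of genre labels, a hypothetical stand-in for the 'Genre' column of train_movies and test_movies, which are datascience Tables. It then builds the two labeled values the bar chart needs.

Counting directly has one further advantage. It returns 0 when a dataset has no comedies, whereas indexing into a grouped count raises an error in that case.

```python
def comedy_proportion(genres):
    """Return the fraction of labels equal to 'comedy'.

    `genres` is a hypothetical stand-in for a table's 'Genre' column:
    any sequence of genre strings.
    """
    if len(genres) == 0:
        raise ValueError("cannot compute a proportion of an empty dataset")
    # Count matching labels, then divide by the total number of movies.
    comedy_count = sum(1 for g in genres if g == 'comedy')
    return comedy_count / len(genres)

# Made-up genre labels for illustration only.
train_genres = ['comedy', 'thriller', 'comedy', 'comedy', 'thriller']
test_genres = ['thriller', 'comedy', 'thriller', 'thriller']

# One (category label, proportion) pair per bar in the chart.
bar_rows = [
    ('Training', comedy_proportion(train_genres)),
    ('Test', comedy_proportion(test_genres)),
]
for label, proportion in bar_rows:
    print(f"{label:<8} {proportion:.2f}")
```

In the notebook itself, the same count can be taken with `table.where('Genre', are.equal_to('comedy')).num_rows`, divided by `table.num_rows`.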

Question 3.1.7 In two sentences or less, describe how you selected your features.

I selected these features by looking at the slope and at other words around the middle of the slope, which helps ensure that each of these words appears at least once in each movie.