project3

School: University of California, Berkeley
Course: C8
Subject: Computer Science
Date: Jul 2, 2024
Type: pdf
Pages: 12
Uploaded by CoachSheepPerson165

Question 1.2.2 Choose two different words in the dataset with a magnitude (absolute value) of correlation higher than 0.2 and plot a scatter plot with a line of best fit for them. Please do not pick “outer” and “space” or “san” and “francisco”. The code to plot the scatter plot and line of best fit is given for you, you just need to calculate the correct values to r, slope and intercept.

Hint 1: It’s easier to think of words with a positive correlation, i.e. words that are often mentioned together. Try to think of common phrases or idioms.

Hint 2: Refer to Section 15.2 of the textbook for the formulas. For additional past examples of regression, see Homework 9.

In [62]:
word_x = 'blue'
word_y = 'moon'

# These arrays should make your code cleaner!
arr_x = movies.column(word_x)
arr_y = movies.column(word_y)

x_su = standard_units(arr_x)
y_su = standard_units(arr_y)

r = np.mean(x_su * y_su)

slope = r * np.std(arr_y) / np.std(arr_x)
intercept = np.mean(arr_y) - slope * np.mean(arr_x)

# DON'T CHANGE THESE LINES OF CODE
movies.scatter(word_x, word_y)
max_x = max(movies.column(word_x))
plots.title(f"Correlation: {r}, magnitude greater than .2: {abs(r) >= 0.2}")
plots.plot([0, max_x * 1.3], [intercept, intercept + slope * (max_x * 1.3)], color='gold');

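The three quantities computed in the cell above (r, slope, intercept) can be checked outside the notebook. The following is a minimal sketch using only NumPy; the helper names `correlation` and `fit_line` and the sample arrays are made up for illustration, and `standard_units` is re-defined here on the assumption that it subtracts the mean and divides by the population standard deviation.

```python
import numpy as np

def standard_units(values):
    # Convert an array to standard units: (value - mean) / SD.
    values = np.asarray(values, dtype=float)
    return (values - np.mean(values)) / np.std(values)

def correlation(x, y):
    # r is the mean of the products of x and y in standard units.
    return np.mean(standard_units(x) * standard_units(y))

def fit_line(x, y):
    # Regression line: slope = r * SD(y) / SD(x),
    # intercept = mean(y) - slope * mean(x).
    r = correlation(x, y)
    slope = r * np.std(y) / np.std(x)
    intercept = np.mean(y) - slope * np.mean(x)
    return r, slope, intercept

# Hypothetical word frequencies, not taken from the movies table.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

r, slope, intercept = fit_line(x, y)
print(f"r={r:.3f}, slope={slope:.3f}, intercept={intercept:.3f}")
```

Because `np.std` defaults to the population SD, the slope and intercept from these formulas should agree with a least-squares fit such as `np.polyfit(x, y, 1)`.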
Question 1.3.1 Draw a horizontal bar chart with two bars that show the proportion of Comedy movies in each dataset (train_movies and test_movies). The two bars should be labeled “Training” and “Test”. Complete the function comedy_proportion first; it should help you create the bar chart.

Hint: Refer to Section 7.1 of the textbook if you need a refresher on bar charts.

In [66]:
def comedy_proportion(table):
    # Return the proportion of movies in a table that have the comedy genre.
    movie_len = table.num_rows
    movie_group = table.group('Genre').where('Genre', are.equal_to('comedy')).column('count').i
    return movie_group / movie_len

# The staff solution took multiple lines. Start by creating a table.
# If you get stuck, think about what sort of table you need for barh to work
comedy_proportion_t = comedy_proportion(train_movies)
comedy_proportion_test = comedy_proportion(test_movies)
comedy_tbl = Table().with_columns('Categories', make_array('Training', 'Test'), 'Proportions',
comedy_tbl.barh('Categories')
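Two lines of the cell above are cut off at the page margin, so the following is only a minimal, library-free sketch of the same idea, not a reconstruction of the submitted code. It assumes the genres are available as a plain list of strings; the sample lists and the `bars` variable are made up for illustration.

```python
def comedy_proportion(genres):
    """Return the fraction of entries in `genres` equal to 'comedy'."""
    genres = list(genres)
    if not genres:
        raise ValueError("need at least one movie")
    comedy_count = sum(1 for genre in genres if genre == 'comedy')
    return comedy_count / len(genres)

# Hypothetical genre labels standing in for the two datasets.
train_genres = ['comedy', 'thriller', 'comedy', 'thriller']
test_genres = ['thriller', 'comedy', 'thriller', 'thriller']

# One (label, proportion) pair per bar, the shape a two-bar chart needs.
bars = [
    ('Training', comedy_proportion(train_genres)),
    ('Test', comedy_proportion(test_genres)),
]

for label, proportion in bars:
    print(f"{label}: {proportion:.2f}")
```

Counting matches directly also returns 0 when a dataset contains no comedies, whereas a group-then-filter approach would produce an empty result in that case.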
Question 3.1.7 In two sentences or less, describe how you selected your features.

I selected these features based on the slope, looking at words around the middle of the slope, which helps ensure that each word appears at least once in each movie.
Question 3.3.3 Do you see a pattern in the types of movies your classifier misclassifies? In two sentences or less, describe any patterns you see in the results or any other interesting findings from the table above. If you need some help, try looking up the movies that your classifier got wrong on Wikipedia.

One pattern I see in the data is that comedy movies cover topics that would also appear in a horror or thriller movie.
Question 4.2 Do you see a pattern in the mistakes your new classifier makes? How good an accuracy were you able to get with your limited classifier? Did you notice an improvement from your first classifier to the second one? Describe in two sentences or less.

Hint: You may not be able to see a pattern.

I did not really notice a pattern at first. When I double-checked, I saw that the proportion for the new classifier is higher than it was for my first classifier.
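The comparison described in this answer can be made concrete by computing each classifier's accuracy as the proportion of predictions that match the true labels. This is a minimal sketch under assumptions: the function names and all label lists below are hypothetical and do not come from the project's data or results.

```python
def accuracy(predicted, actual):
    """Return the proportion of predictions that equal the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one label")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)

def mistakes(predicted, actual):
    """Return the positions where the prediction differs from the truth."""
    return [i for i, (p, a) in enumerate(zip(predicted, actual)) if p != a]

# Made-up labels for five test movies and two classifiers.
actual = ['comedy', 'thriller', 'comedy', 'thriller', 'comedy']
first = ['comedy', 'comedy', 'thriller', 'thriller', 'comedy']
second = ['comedy', 'thriller', 'thriller', 'thriller', 'comedy']

print("first classifier:", accuracy(first, actual), mistakes(first, actual))
print("second classifier:", accuracy(second, actual), mistakes(second, actual))
```

Listing the misclassified positions next to the accuracy makes it easier to look up those movies and check whether the errors share a pattern.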
Question 4.3 Given the constraint of five words, how did you select those five? Describe in two sentences or less.

I chose my words by zooming into the