main mt 1


School

Georgia Institute of Technology

Course

6040X

Subject

Computer Science

Date

Dec 6, 2023

Uploaded by ChefStraw5566

11/28/23, 8:41 PM main file:///Users/dannie/Downloads/pmt1-sample-solutions-su21/problem24-sample-solutions.html 1/28

Midterm 1, Spring 2021: Music recommender

Version 1.0

This problem builds on your knowledge of basic Python data structures and string processing. It has seven (7) exercises, numbered 0 to 6. There are eleven (11) available points. However, to earn 100%, the threshold is just 10 points. (Therefore, once you hit 10 points, you can stop. There is no extra credit for exceeding this threshold.)

Each exercise builds logically on the previous one, but you may solve them in any order. That is, if you can't solve an exercise, you can still move on and try the next one. However, if you see a code cell introduced by the phrase "Sample result(s) for ...", please run it. Some demo cells in the notebook may depend on these precomputed results.

The point values of individual exercises are as follows:

Exercise 0: 1 point
Exercise 1: 1 point
Exercise 2: 2 points
Exercise 3: 2 points
Exercise 4: 2 points
Exercise 5: 1 point
Exercise 6: 2 points

Pro-tips.

- Many or all test cells use randomly generated inputs. Therefore, try your best to write solutions that do not assume too much. To help you debug, when a test cell does fail, it will often tell you exactly what inputs it was using and what output it expected, compared to yours.
- If you need a complex SQL query, remember that you can define one using a triple-quoted (multiline) string (https://docs.python.org/3.7/tutorial/introduction.html#strings).
- If your program's behavior seems strange, try resetting the kernel and rerunning everything.
- If you mess up this notebook or just want to start from scratch, save copies of all your partial responses and use Actions → Reset Assignment to get a fresh, original copy of this notebook. (Resetting will wipe out any answers you've written so far, so be sure to stash those somewhere safe if you intend to keep or reuse them!)
- If you generate excessive output that causes the notebook to load slowly or not at all (e.g., from an ill-placed print statement), use Actions → Clear Notebook Output to get a clean copy. The clean copy will retain your code but remove any generated output. However, it will also rename the notebook to clean.xxx.ipynb. Since the autograder expects a notebook file with the original name, you'll need to rename the clean notebook accordingly. Be forewarned: we won't manually grade "cleaned" notebooks if you forget!

Good luck!
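This short sketch illustrates the pro-tip about triple-quoted strings. It uses Python's built-in sqlite3 module with an in-memory database. The table `plays` and its rows are hypothetical, and nothing here is part of the exercises. The rows borrow artist names from the sample user record printed later in the notebook.

```python
import sqlite3

# Hypothetical in-memory table, used only to demonstrate a multiline query string.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plays (artist TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO plays VALUES (?, ?)",
    [("Vico C", "Desahogo"), ("Vico C", "Quieren"), ("Strike 3", "Enamorado De Ti")],
)

# A triple-quoted string lets the query span several lines without escapes.
query = """
    SELECT artist, COUNT(*) AS num_tracks
    FROM plays
    GROUP BY artist
    ORDER BY num_tracks DESC, artist
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('Vico C', 2), ('Strike 3', 1)]
conn.close()
```

The surrounding whitespace and newlines inside the string are harmless to SQLite, so the query can be indented to match the Python code around it.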

Background and overview: Spotify playlist data

Suppose you are running a musical service and would like to help your users discover artists based on artists they already like. In this problem, you'll prototype a simple recommender by mining a dataset of user-generated playlists from Spotify, circa 2015.

Your overall workflow will be as follows:

1. Manually inspect the data and how it is stored
2. Gather some preliminary statistics to get a "feel" for the data
3. Clean the data a bit, namely by "normalizing" artist names
4. Use ideas from Notebook 2 to analyze artist co-occurrences in playlists

With that in mind, let's start!

Modules and data. Run the following two code cells, which load some modules this notebook needs as well as the data itself. The data for this problem are several hundred megabytes in size and so may take a minute to load.

In [1]:
    ### BEGIN HIDDEN TESTS
    %load_ext autoreload
    %autoreload 2
    ### END HIDDEN TESTS
    from pprint import pprint
    from testing_tools import load_pickle
    print("Ready!")

    Opening pickle from './resource/asnlib/publicdata/user_ids.pickle' ...
    Opening pickle from './resource/asnlib/publicdata/artist_names.pickle' ...
    Opening pickle from './resource/asnlib/publicdata/playlist_names.pickle' ...
    Opening pickle from './resource/asnlib/publicdata/track_titles.pickle' ...
    Opening pickle from './resource/asnlib/publicdata/artist_translation_table.pickle' ...
    Ready!
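This sketch previews workflow steps 3 and 4 on made-up data. It assumes the nested layout that `spotify_users` has in the sample record below. That layout is a list of user dicts, each with a 'playlists' list whose 'tracks' carry an 'artist' field.

The trim-and-lowercase rule in `normalize_artist` and the helper name `count_artist_pairs` are hypothetical. They are not the notebook's required definitions, which may normalize names differently (possibly using the artist translation table whose pickle is opened above). The counting uses one common technique: for each playlist, take the set of distinct artists and tally every unordered pair.

```python
from collections import defaultdict
from itertools import combinations

def normalize_artist(name):
    """Hypothetical normalization: trim surrounding whitespace and lowercase."""
    return name.strip().lower()

def count_artist_pairs(users):
    """Count how many playlists contain each unordered pair of distinct artists."""
    pair_counts = defaultdict(int)
    for user in users:
        for playlist in user["playlists"]:
            # A set ensures each artist counts at most once per playlist.
            artists = {normalize_artist(t["artist"]) for t in playlist["tracks"]}
            # Sorting makes (a, b) and (b, a) map to the same key.
            for a, b in combinations(sorted(artists), 2):
                pair_counts[(a, b)] += 1
    return dict(pair_counts)

# Toy input in the same nested layout as spotify_users (made-up data).
toy_users = [
    {"user_id": "u1",
     "playlists": [
         {"name": "p1", "tracks": [{"artist": "Vico C", "title": "x"},
                                   {"artist": "Strike 3", "title": "y"}]},
         {"name": "p2", "tracks": [{"artist": "vico c ", "title": "z"},
                                   {"artist": "Strike 3", "title": "w"},
                                   {"artist": "Walk the Moon", "title": "v"}]},
     ]},
]
pairs = count_artist_pairs(toy_users)
print(pairs)
# {('strike 3', 'vico c'): 2, ('strike 3', 'walk the moon'): 1, ('vico c', 'walk the moon'): 1}
```

Because "Vico C" and "vico c " normalize to the same key, the pair with Strike 3 is counted in both playlists. A playlist with k distinct artists contributes k(k-1)/2 pairs, so the count grows quickly on long playlists.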

In [2]:
    !date
    spotify_users = load_pickle('user_playlists.pickle')
    print("==> Finished loading the data.")
    !date

    Fri 26 Feb 2021 12:27:54 AM PST
    Opening pickle from './resource/asnlib/publicdata/user_playlists.pickle' ...
    ==> Finished loading the data.
    Fri 26 Feb 2021 12:28:03 AM PST

Familiarize yourself with these data

The variable spotify_users holds the data you'll need. It consists of a list of nearly 16,000 users:

In [3]:
    print(f"`spotify_users`: type == {type(spotify_users)}, number of elements == {len(spotify_users):,}.")

    `spotify_users`: type == <class 'list'>, number of elements == 15,918.

Each element of this list corresponds to a distinct user. Have a look at the user at position 2526 of this list:

In [4]:
    pprint(spotify_users[2526])

    {'playlists': [{'name': 'Favoritas de la radio',
                    'tracks': [{'artist': 'Vico C', 'title': 'Desahogo'},
                               {'artist': 'Vico C',
                                'title': 'El Bueno, El Malo Y El Feo (The Good, '
                                         'The Bad & The Ugly) - Feat. Tego '
                                         'Calderón And Eddie Dee'},
                               {'artist': 'Vico C', 'title': 'Quieren'},
                               {'artist': 'Vico C', 'title': "Vamonos Po' Encima"}]},
                   {'name': 'Starred',
                    'tracks': [{'artist': 'Vico C', 'title': 'El'},
                               {'artist': 'Strike 3', 'title': 'Enamorado De Ti'},
                               {'artist': 'Strike 3', 'title': 'Es Por Ti'}]},
                   {'name': 'Two',
                    'tracks': [{'artist': 'Walk the Moon', 'title': 'Quesadilla'},
                               {'artist': 'Two Door Cinema Club',
                                'title': 'Sleep Alone'},
                               {'artist': 'Two Door Cinema Club',
                                'title': 'Something Good Can Work'},
                               {'artist': 'Two Door Cinema Club', 'title': 'Sun'}]}],
     'user_id': '22c5af0c50b557327894d0c9ea6aa5fa'}
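A first preliminary statistic (workflow step 2) is simply how much each record holds. The sketch below walks the user → playlists → tracks nesting shown above. The function name `summarize_user` is hypothetical. `sample_user` is a hand-copied version of the record printed for position 2526.

```python
def summarize_user(user):
    """Return (num_playlists, num_tracks, num_distinct_artists) for one user record."""
    playlists = user["playlists"]
    num_tracks = sum(len(p["tracks"]) for p in playlists)
    # Distinct artists across all of this user's playlists.
    artists = {t["artist"] for p in playlists for t in p["tracks"]}
    return len(playlists), num_tracks, len(artists)

# Hand-copied from the sample record for position 2526 above.
sample_user = {
    "user_id": "22c5af0c50b557327894d0c9ea6aa5fa",
    "playlists": [
        {"name": "Favoritas de la radio",
         "tracks": [{"artist": "Vico C", "title": "Desahogo"},
                    {"artist": "Vico C",
                     "title": ("El Bueno, El Malo Y El Feo (The Good, The Bad & The Ugly) "
                               "- Feat. Tego Calderón And Eddie Dee")},
                    {"artist": "Vico C", "title": "Quieren"},
                    {"artist": "Vico C", "title": "Vamonos Po' Encima"}]},
        {"name": "Starred",
         "tracks": [{"artist": "Vico C", "title": "El"},
                    {"artist": "Strike 3", "title": "Enamorado De Ti"},
                    {"artist": "Strike 3", "title": "Es Por Ti"}]},
        {"name": "Two",
         "tracks": [{"artist": "Walk the Moon", "title": "Quesadilla"},
                    {"artist": "Two Door Cinema Club", "title": "Sleep Alone"},
                    {"artist": "Two Door Cinema Club", "title": "Something Good Can Work"},
                    {"artist": "Two Door Cinema Club", "title": "Sun"}]},
    ],
}

print(summarize_user(sample_user))  # (3, 11, 4)
```

In the notebook, the same function could be mapped over the full list, for example `[summarize_user(u) for u in spotify_users]`. Those totals are not reported in this excerpt.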
