main mt 1
School
Georgia Institute Of Technology *
*We aren’t endorsed by this school
Course
6040X
Subject
Computer Science
Date
Dec 6, 2023
Type
Pages
28
Uploaded by ChefStraw5566
11/28/23, 8:41 PM
main
file:///Users/dannie/Downloads/pmt1-sample-solutions-su21/problem24-sample-solutions.html
1/28
Midterm 1, Spring 2021: Music recommender
Version 1.0
This problem builds on your knowledge of basic Python data structures and string processing. It has seven (7) exercises, numbered 0 to 6. There are eleven (11) available points. However, to earn 100%, the threshold is just 10 points. (Therefore, once you hit 10 points, you can stop. There is no extra credit for exceeding this threshold.)
Each exercise builds logically on the previous one, but you may solve them in any order. That is, if you can't solve an exercise, you can still move on and try the next one. However, if you see a code cell introduced by the phrase, "Sample result(s) for ...", please run it. Some demo cells in the notebook may depend on these precomputed results.
The point values of individual exercises are as follows:
Exercise 0: 1 point
Exercise 1: 1 point
Exercise 2: 2 points
Exercise 3: 2 points
Exercise 4: 2 points
Exercise 5: 1 point
Exercise 6: 2 points
Pro-tips.
Many or all test cells use randomly generated inputs. Therefore, try your best to write solutions that do not assume too much. To help you debug, when a test cell does fail, it will often tell you exactly what inputs it was using and what output it expected, compared to yours.
If you need a complex SQL query, remember that you can define one using a triple-quoted (multiline) string (https://docs.python.org/3.7/tutorial/introduction.html#strings).
If your program behavior seems strange, try resetting the kernel and rerunning everything.
If you mess up this notebook or just want to start from scratch, save copies of all your partial responses and use Actions → Reset Assignment to get a fresh, original copy of this notebook. (Resetting will wipe out any answers you've written so far, so be sure to stash those somewhere safe if you intend to keep or reuse them!)
If you generate excessive output that causes the notebook to load slowly or not at all (e.g., from an ill-placed print statement), use Actions → Clear Notebook Output to get a clean copy. The clean copy will retain your code but remove any generated output. However, it will also rename the notebook to clean.xxx.ipynb. Since the autograder expects a notebook file with the original name, you'll need to rename the clean notebook accordingly. Be forewarned: we won't manually grade "cleaned" notebooks if you forget!
Good luck!
Background and overview: Spotify playlist data
Suppose you are running a musical service and would like to help your users discover artists based on artists
they already like. In this problem, you'll prototype a simple recommender by mining a dataset of user-generated
playlists from Spotify, circa 2015.
Your overall workflow will be as follows:
1. Manually inspect the data and how it is stored
2. Gather some preliminary statistics to get a "feel" for the data
3. Clean the data a bit, namely by "normalizing" artist names
4. Use ideas from Notebook 2 to analyze artist co-occurrences in playlists
With that in mind, let's start!
Modules and data. Run the following two code cells, which load some modules this notebook needs as well as the data itself. The data for this problem are several hundred megabytes in size and so may take a minute to load.
In [1]:
### BEGIN HIDDEN TESTS
%load_ext autoreload
%autoreload 2
### END HIDDEN TESTS

from pprint import pprint
from testing_tools import load_pickle
print("Ready!")
Opening pickle from './resource/asnlib/publicdata/user_ids.pickle' ...
Opening pickle from './resource/asnlib/publicdata/artist_names.pickle' ...
Opening pickle from './resource/asnlib/publicdata/playlist_names.pickle' ...
Opening pickle from './resource/asnlib/publicdata/track_titles.pickle' ...
Opening pickle from './resource/asnlib/publicdata/artist_translation_table.pickle' ...
Ready!
In [2]:
!date
spotify_users = load_pickle('user_playlists.pickle')
print("==> Finished loading the data.")
!date
Familiarize yourself with these data
The variable spotify_users holds the data you'll need. It consists of a list of about 15,000 or so users:
In [3]:
print(f"`spotify_users`: type == {type(spotify_users)}, number of elements == {len(spotify_users):,}.")
Each element of this list corresponds to a distinct user. Have a look at the user at position 2526 of this list:
In [4]:
pprint(spotify_users[2526])
Fri 26 Feb 2021 12:27:54 AM PST
Opening pickle from './resource/asnlib/publicdata/user_playlists.pickle' ...
==> Finished loading the data.
Fri 26 Feb 2021 12:28:03 AM PST
`spotify_users`: type == <class 'list'>, number of elements == 15,918.
{'playlists': [{'name': 'Favoritas de la radio',
                'tracks': [{'artist': 'Vico C', 'title': 'Desahogo'},
                           {'artist': 'Vico C',
                            'title': 'El Bueno, El Malo Y El Feo (The Good, '
                                     'The Bad & The Ugly) - Feat. Tego '
                                     'Calderón And Eddie Dee'},
                           {'artist': 'Vico C', 'title': 'Quieren'},
                           {'artist': 'Vico C',
                            'title': "Vamonos Po' Encima"}]},
               {'name': 'Starred',
                'tracks': [{'artist': 'Vico C', 'title': 'El'},
                           {'artist': 'Strike 3', 'title': 'Enamorado De Ti'},
                           {'artist': 'Strike 3', 'title': 'Es Por Ti'}]},
               {'name': 'Two',
                'tracks': [{'artist': 'Walk the Moon', 'title': 'Quesadilla'},
                           {'artist': 'Two Door Cinema Club',
                            'title': 'Sleep Alone'},
                           {'artist': 'Two Door Cinema Club',
                            'title': 'Something Good Can Work'},
                           {'artist': 'Two Door Cinema Club',
                            'title': 'Sun'}]}],
 'user_id': '22c5af0c50b557327894d0c9ea6aa5fa'}
Every user has a unique user ID (a hex string) as well as a list of playlists that they have created. Each playlist is named and consists of a list of songs or tracks. Each track has a title and is performed by an artist (musician or group).
Take a minute to understand how this data is stored: note what data structures are being used (e.g., dictionaries versus lists), for what purpose, and how they are nested.
If you understand the storage scheme, you should be able to verify the following facts about the above user:
1. The user's ID is '22c5af0c50b557327894d0c9ea6aa5fa'.
2. The user has three playlists, one named 'Favoritas de la radio', another named 'Starred', and the last named 'Two'.
3. The 'Favoritas de la radio' playlist has four songs, all of which were performed by the same artist, 'Vico C'.
4. The 'Starred' playlist has one song also by 'Vico C', but includes two songs by a different artist, 'Strike 3'.
5. The 'Two' playlist has four songs: one by 'Walk the Moon' and three by 'Two Door Cinema Club'.
Other users may have only one playlist with just one song, or many playlists with many songs by many artists.
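As a quick way to check the facts listed above, here is a minimal sketch that walks the nested structure of that one user record. The variable names (`user`, `playlist_names`, `artists_per_playlist`) are illustrative choices, not part of the notebook, and the record is retyped by hand from the printed output rather than loaded from the pickle.

```python
from collections import Counter

# Hand-copied version of the record shown for spotify_users[2526].
user = {
    'user_id': '22c5af0c50b557327894d0c9ea6aa5fa',
    'playlists': [
        {'name': 'Favoritas de la radio',
         'tracks': [{'artist': 'Vico C', 'title': 'Desahogo'},
                    {'artist': 'Vico C',
                     'title': 'El Bueno, El Malo Y El Feo (The Good, '
                              'The Bad & The Ugly) - Feat. Tego '
                              'Calderón And Eddie Dee'},
                    {'artist': 'Vico C', 'title': 'Quieren'},
                    {'artist': 'Vico C', 'title': "Vamonos Po' Encima"}]},
        {'name': 'Starred',
         'tracks': [{'artist': 'Vico C', 'title': 'El'},
                    {'artist': 'Strike 3', 'title': 'Enamorado De Ti'},
                    {'artist': 'Strike 3', 'title': 'Es Por Ti'}]},
        {'name': 'Two',
         'tracks': [{'artist': 'Walk the Moon', 'title': 'Quesadilla'},
                    {'artist': 'Two Door Cinema Club', 'title': 'Sleep Alone'},
                    {'artist': 'Two Door Cinema Club',
                     'title': 'Something Good Can Work'},
                    {'artist': 'Two Door Cinema Club', 'title': 'Sun'}]},
    ],
}

# A user is a dict; its 'playlists' value is a list of dicts.
playlist_names = [p['name'] for p in user['playlists']]

# Each playlist dict holds a list of track dicts with 'artist' and 'title'.
artists_per_playlist = {
    p['name']: Counter(t['artist'] for t in p['tracks'])
    for p in user['playlists']
}

print(user['user_id'])
print(playlist_names)
for name, counts in artists_per_playlist.items():
    print(name, dict(counts))
```

Reading the printed counts against facts 2 through 5 is a useful sanity check before attempting the exercises.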
Part A: Preliminary analysis
To make sure you know how to navigate these data, let's start with two basic exercises.
Exercise 0: count_playlists (1 point)
Given a user playlist dataset, users, complete the function, count_playlists(users), so that it returns the total number of playlists.
For instance, suppose the user dataset consists of the following two users:
In [5]:
ex0_demo_users = [{'user_id': '0c8435917bd098dce8df8f62b736c0ed',
                   'playlists': [{'name': 'Starred',
                                  'tracks': [{'artist': 'André Rieu',
                                              'title': 'Once Upon A Time In The West - Main Title Theme'},
                                             {'artist': 'André Rieu',
                                              'title': 'The Second Waltz - From Eyes Wide Shut'}]}]},
                  {'user_id': 'fc799d71e8d2004377d6d8e861479559',
                   'playlists': [{'name': 'Liked from Radio',
                                  'tracks': [{'artist': 'The Police',
                                              'title': 'Every Breath You Take'},
                                             {'artist': 'Lucio Battisti', 'title': 'Per Una Lira'},
                                             {'artist': 'Alicia Keys ft. Jay-Z', 'title': 'Empire State of Mind'}]},
                                 {'name': 'Starred', 'tracks': [{'artist': 'U2', 'title': 'With Or Without You'}]}]}]
Then count_playlists(ex0_demo_users) would return 1+2=3, because the first user has one playlist (named 'Starred') and the second has two playlists (one named 'Liked from Radio' and the other also named 'Starred').
In [6]:
def count_playlists(users):
    ### BEGIN SOLUTION
    from random import randint
    return count_playlists__soln1(users)

def count_playlists__soln0(users):
    num_playlists = 0
    for user in users:
        num_playlists += len(user['playlists'])
    return num_playlists

def count_playlists__soln1(users):
    return sum(len(user['playlists']) for user in users)
### END SOLUTION
In [7]:
# Demo cell
count_playlists(ex0_demo_users) # should return 3
Out[7]:
3
In [8]:
# Test cell 0: `mt1_ex0_count_playlists` (1 point)

### BEGIN HIDDEN TESTS
global_overwrite = False

def tracks_iterator(users, offsets=False):
    for i, u in enumerate(users):
        for j, p in enumerate(u['playlists']):
            for k, t in enumerate(p['tracks']):
                if offsets:
                    yield u, p, t, i, j, k
                else:
                    yield u, p, t

def randomly_error(threshold=0.05):
    from random import random
    return random() < threshold

def mt1_ex0__gen_soln():
    print(f"The Spotify dataset consists of {count_playlists(spotify_users):,} playlists in total.")

#!date
#mt1_ex0__gen_soln()
#!date
### END HIDDEN TESTS

#from testing_tools import mt1_ex0__check
#print("Testing...")
assert count_playlists(spotify_users) == 231_844

from testing_tools import mt1_ex0__check
for trial in range(250):
    mt1_ex0__check(count_playlists)

print("\n(Passed!)")
(Passed!)
Exercise 1: count_artist_strings (1 point)
For your next task, suppose we wish to count how many distinct case-insensitive artist strings are in the dataset (across all users and playlists). By "distinct case-insensitive," we mean two strings a and b would be "equal" if, after conversion to lowercase, they are equal in the Python sense of a == b. For example, we would treat 'Jay-Z' and 'JAY-Z' as equal, but we would regard 'Jay-Z' (with a hyphen) and 'Jay Z' (without a hyphen) as unequal.
In a subsequent exercise, we will try to normalize names in a different way.
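To make the definition of "distinct case-insensitive" concrete, here is a small sketch on a hand-made list of strings. The list `names` is an illustrative input, not data from the notebook.

```python
# Four raw strings; three differ only by case, one differs by punctuation.
names = ['Jay-Z', 'JAY-Z', 'jay-z', 'Jay Z']

# Lowercasing before inserting into a set merges the case variants,
# but the hyphenated and unhyphenated spellings remain different strings.
distinct = {s.lower() for s in names}

print(distinct)
print(len(distinct))  # 2
```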
Your task. Given a user playlist dataset, users, complete the function count_artist_strings(users) below so that it counts the number of distinct case-insensitive artist strings contained in users.
For example, recall the demo dataset from Exercise 0:
In [9]:
pprint(ex0_demo_users)
Looking across all users and playlists, this dataset has five (5) distinct artist strings: 'André Rieu', 'The Police', 'Lucio Battisti', 'Alicia Keys ft. Jay-Z', and 'U2'. Observe that 'André Rieu' appears twice, but for our tally, we would count it just once. And if 'the POLICE' had been in the data, then it would be considered the same as 'The Police'.
Note: Your function must not modify the input dataset. Even if your code returns a correct result, if it changes the input data, the autograder will mark it as incorrect.
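One way to check the "do not modify the input" requirement yourself is to compare the data against a deep copy taken before the call. The sketch below is an assumption about how such a check could be written; the notebook does not show how the autograder actually does it, and all three function names are hypothetical.

```python
from copy import deepcopy

def leaves_input_unchanged(fn, data):
    """Return True if calling fn(data) leaves data equal to a prior deep copy."""
    snapshot = deepcopy(data)
    fn(data)
    return data == snapshot

def count_tracks_safely(users):
    # Hypothetical helper that only reads the nested structure.
    return sum(len(p['tracks']) for u in users for p in u['playlists'])

def count_tracks_destructively(users):
    # Hypothetical helper that returns the same count but empties each
    # user's playlist list as a side effect.
    total = 0
    for u in users:
        while u['playlists']:
            total += len(u['playlists'].pop()['tracks'])
    return total

demo = [{'user_id': 'u0',
         'playlists': [{'name': 'Starred',
                        'tracks': [{'artist': 'U2',
                                    'title': 'With Or Without You'}]}]}]

print(leaves_input_unchanged(count_tracks_safely, demo))         # True
print(leaves_input_unchanged(count_tracks_destructively, demo))  # False
```

Both helpers return the same count on fresh data; only the second one would fail a no-modification check.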
[{'playlists': [{'name': 'Starred',
                 'tracks': [{'artist': 'André Rieu',
                             'title': 'Once Upon A Time In The West - Main '
                                      'Title Theme'},
                            {'artist': 'André Rieu',
                             'title': 'The Second Waltz - From Eyes Wide '
                                      'Shut'}]}],
  'user_id': '0c8435917bd098dce8df8f62b736c0ed'},
 {'playlists': [{'name': 'Liked from Radio',
                 'tracks': [{'artist': 'The Police',
                             'title': 'Every Breath You Take'},
                            {'artist': 'Lucio Battisti',
                             'title': 'Per Una Lira'},
                            {'artist': 'Alicia Keys ft. Jay-Z',
                             'title': 'Empire State of Mind'}]},
                {'name': 'Starred',
                 'tracks': [{'artist': 'U2', 'title': 'With Or Without You'}]}],
  'user_id': 'fc799d71e8d2004377d6d8e861479559'}]
In [10]:
def count_artist_strings(users):
    ### BEGIN SOLUTION
    artist_strings = set()
    for user in users:
        for playlist in user['playlists']:
            for track in playlist['tracks']:
                artist_strings |= {track['artist'].lower()}
    return len(artist_strings)
    ### END SOLUTION
In [11]:
# Demo: Should return '5'
count_artist_strings(ex0_demo_users)
In [12]:
# Test cell 0: `mt1_ex1_count_artist_strings` (1 point)

### BEGIN HIDDEN TESTS
def mt1_ex1__gen_soln():
    print(f"The Spotify dataset contains a total of {count_artist_strings(spotify_users):,} "
          "distinct case-insensitive artist strings.")

#!date
#mt1_ex1__gen_soln()
#!date
### END HIDDEN TESTS

#assert count_artist_strings(spotify_users) == 282_555

from testing_tools import mt1_ex1__check
print("Testing...")
for trial in range(250):
    mt1_ex1__check(count_artist_strings)

print("\n(Passed!)")
Answer for this dataset. If your function works correctly, running it on the full Spotify dataset would result in 282,555 distinct case-insensitive artist strings. That's a lot of artists! (We have omitted this check to reduce the running time of the notebook.)
Out[11]:
5
Testing...
(Passed!)
Part B: Data cleaning
Unfortunately, artist names are encoded in a messy fashion. Here are some examples:
The artist "Jay-Z" is written as "Jay-Z" and "JAY Z", with several other variations having different capitalization.
Worse, there is no consistent standard for encoding multiple artists who worked together on a song. For example, here is how several of Jay-Z's collaborations appear:
    'Alicia Keys ft. Jay-Z' ('ft.' used as an artist-separator)
    'A-Trak x Kanye x Jay-Z' (' x ' used as an artist-separator)
    'JAY Z Featuring Beyoncé' (variation on "Jay-Z" and yet another variation on "featuring" to separate artists)
    'Jay-Z Featuring Beyoncé Knowles' (Beyoncé's last name included in this variation)
    'Jay-Z/Kanye West/Lil Wayne/T.I.' (... you get the idea ...)
    'Jay Z (Dr. Dre, Rakim, & Truth Hurts)'
    'Young Jeezy Ft. Jay-Z & Fat Joe'
    'Lil Wayne Drake Jay-Z And Gif Majorz' (spaces used ambiguously: there are four artists in this example!)
    'Timbaland & Magoo feat Jay-Z'
    'OutKast/Jay-Z/Killer Mike'
    'Jay-Z Ft.Rihanna And Kanye West'
    'Pat Benetar vs. Beyonce vs. 3OH!3 Feat. Britney Spears, Christina Aguilera, & M.I.A.' ("Benatar" is misspelled as 'Benetar')
    'jay z with the roots.     s' (yes, excess spaces between . and trailing s are real)
It is difficult to design a robust algorithm to extract individual artist names. Instead, let's use the following approximate algorithm, given an artist-string as input.
1. Lowercase: First, convert all characters to lowercase.
2. Space-equivalents: Next, convert any hyphen ('-'), period ('.'), question mark ('?'), exclamation point ('!'), or underscore ('_') into a space character.
3. Separators: Then split the string, treating the following patterns as artist-name separators:
    A. All of the following words, but only when there are spaces both before and after: 'and', 'with', 'ft', 'feat', 'featuring', 'vs', and 'x'.
    B. All of the following symbols: '/' (forward slash), '&' (ampersand), comma (','), semicolon (';'), and each enclosing parenthesis or bracket ('(', ')', '[', ']', '{', '}').
4. Whitespace compression: Lastly, for any artist name-string following the above separation steps, strip out any preceding and trailing whitespace and collapse multiple consecutive whitespace characters into a single space.
When applying this algorithm, we'll perform steps 1-4 in the exact same sequence as shown above.
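To see what each of the four steps does to one input, here is a minimal tracing sketch built on regular expressions. It is one possible reading of the steps, not the notebook's sample solution; the names `WORD_SEPARATORS`, `SYMBOL_SEPARATORS`, and `trace_steps` are hypothetical, and edge cases such as two separator words back to back may be handled differently by other implementations.

```python
import re

# Step 3A: separator words count only with a literal space on both sides.
WORD_SEPARATORS = r'(?<= )(?:and|with|ft|feat|featuring|vs|x)(?= )'
# Step 3B: separator symbols split wherever they occur.
SYMBOL_SEPARATORS = r'[/&,;()\[\]{}]'

def trace_steps(artist):
    """Return the intermediate result of each of the four steps."""
    lowered = artist.lower()                                   # step 1
    spaced = re.sub(r'[-.?!_]', ' ', lowered)                  # step 2
    pieces = re.split(WORD_SEPARATORS + '|' + SYMBOL_SEPARATORS, spaced)  # step 3
    names = {re.sub(r'\s+', ' ', p).strip() for p in pieces}   # step 4
    names.discard('')  # drop pieces that held only whitespace
    return lowered, spaced, pieces, names

for step in trace_steps('Young Jeezy Ft. Jay-Z & Fat Joe'):
    print(repr(step))
```

Note how 'Ft.' only becomes a separator because step 2 has already turned the period into a space; running the steps in a different order would give a different result.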
Exercise 2: extract_artists (2 points)
Complete the function extract_artists(artist) so that it applies the artist name-separation algorithm described above, returning a Python set consisting of the separate artist names. For example:
'Alicia Keys ft. Jay-Z' ==> {'jay z', 'alicia keys'}
'A-Trak x Kanye x Jay-Z' ==> {'a trak', 'jay z', 'kanye'}
'JAY Z Featuring Beyoncé' ==> {'jay z', 'beyoncé'}
'Jay-Z Featuring Beyoncé Knowles' ==> {'beyoncé knowles', 'jay z'}
'Jay-Z/Kanye West/Lil Wayne/T.I.' ==> {'lil wayne', 'jay z', 't i', 'kanye west'}
'Young Jeezy Ft. Jay-Z & Fat Joe' ==> {'fat joe', 'jay z', 'young jeezy'}
'Lil Wayne Drake Jay-Z And Gif Majorz' ==> {'gif majorz', 'lil wayne drake jay z'}
'Timbaland & Magoo feat Jay-Z' ==> {'jay z', 'timbaland', 'magoo'}
'OutKast/Jay-Z/Killer Mike' ==> {'outkast', 'jay z', 'killer mike'}
'Jay-Z Ft.Rihanna And Kanye West' ==> {'rihanna', 'jay z', 'kanye west'}
'Pat Benetar vs. Beyonce vs. 3OH!3 Feat. Britney Spears, Christina Aguilera, & M.I.A.' ==> {'m i a', 'beyonce', 'christina aguilera', 'pat benetar', 'britney spears', '3oh 3'}
'jay z with the roots.     s' ==> {'jay z', 'the roots s'}
Note 0: Pay close attention to the target output.
Note 1: This procedure is imperfect. For example, observe that 'Lil Wayne Drake Jay-Z And Gif Majorz' is, in reality, four artists (Lil' Wayne, Drake, Jay-Z, and Gif Majorz), but the algorithm cannot disambiguate the intention of spaces. Also, in the last example, even though in reality 'the roots.     s' should resolve to 'the roots', it instead becomes 'the roots s'. And a band like 'Tom Petty and the Heartbreakers' will be erroneously split into two artists ('Tom Petty' and 'the Heartbreakers'). But it is what it is.
In [13]:
def extract_artists(artist):
    ### BEGIN SOLUTION
    import re
    artist = artist.lower() # convert to lowercase
    for space_equivalent in '-.?!_':
        artist = artist.replace(space_equivalent, ' ')
    for separator_word in ['featuring', 'feat', 'ft', 'and', 'with', 'vs', 'x']:
        artist = artist.replace(f' {separator_word} ', ' & ')
    for and_equivalent in '/&,;()[]{}':
        artist = artist.replace(and_equivalent, ' & ')
    artists = artist.split('&')
    # raw string avoids an invalid-escape warning for '\s'
    artists = set(re.sub(r'\s+', ' ', a).strip() for a in artists)
    return {a for a in artists if a} # prune empty strings
    ### END SOLUTION
In [14]:
# Demo
ex0_inputs = ['Alicia Keys ft. Jay-Z',
              'A-Trak x Kanye x Jay-Z',
              'JAY Z Featuring Beyoncé',
              'Jay-Z Featuring Beyoncé Knowles',
              'Jay-Z/Kanye West/Lil Wayne/T.I.',
              'Young Jeezy Ft. Jay-Z & Fat Joe',
              'Lil Wayne Drake Jay-Z And Gif Majorz',
              'Timbaland & Magoo feat Jay-Z',
              'OutKast/Jay-Z/Killer Mike',
              'Jay-Z Ft.Rihanna And Kanye West',
              'Pat Benetar vs. Beyonce vs. 3OH!3 Feat. Britney Spears, Christina Aguilera, & M.I.A.',
              'jay z with the roots.     s']
for a in ex0_inputs:
    print(f"'{a}' ==> {extract_artists(a)}")
'Alicia Keys ft. Jay-Z' ==> {'alicia keys', 'jay z'}
'A-Trak x Kanye x Jay-Z' ==> {'kanye', 'a trak', 'jay z'}
'JAY Z Featuring Beyoncé' ==> {'jay z', 'beyoncé'}
'Jay-Z Featuring Beyoncé Knowles' ==> {'beyoncé knowles', 'jay z'}
'Jay-Z/Kanye West/Lil Wayne/T.I.' ==> {'kanye west', 't i', 'lil wayne', 'jay z'}
'Young Jeezy Ft. Jay-Z & Fat Joe' ==> {'fat joe', 'young jeezy', 'jay z'}
'Lil Wayne Drake Jay-Z And Gif Majorz' ==> {'gif majorz', 'lil wayne drake jay z'}
'Timbaland & Magoo feat Jay-Z' ==> {'timbaland', 'magoo', 'jay z'}
'OutKast/Jay-Z/Killer Mike' ==> {'killer mike', 'outkast', 'jay z'}
'Jay-Z Ft.Rihanna And Kanye West' ==> {'rihanna', 'jay z', 'kanye west'}
'Pat Benetar vs. Beyonce vs. 3OH!3 Feat. Britney Spears, Christina Aguilera, & M.I.A.' ==> {'beyonce', '3oh 3', 'pat benetar', 'christina aguilera', 'm i a', 'britney spears'}
'jay z with the roots.     s' ==> {'the roots s', 'jay z'}
In [15]:
# Test cell: `mt1_ex2_extract_artists` (2 points)

### BEGIN HIDDEN TESTS
def mt1_ex2__gen_soln(fn_base="artist_translation_table", fn_ext="pickle", overwrite=False):
    from testing_tools import file_exists, load_pickle, save_pickle
    fn = f"{fn_base}.{fn_ext}"
    if file_exists(fn) and not overwrite:
        print(f"'{fn}' exists; skipping...")
    else: # not file_exists(fn) or overwrite
        print(f"'{fn}' does not exist or needs to be overwritten; generating...")
        artist_translation_table = {}
        for _, _, t in tracks_iterator(spotify_users):
            artist_translation_table[t['artist']] = extract_artists(t['artist'])
        save_pickle(artist_translation_table, fn)

!date
mt1_ex2__gen_soln(overwrite=False or global_overwrite)
!date
### END HIDDEN TESTS

from testing_tools import mt1_ex2__check
print("Testing...")
for trial in range(250):
    mt1_ex2__check(extract_artists)

extract_artists__passed = True
print("\n(Passed!)")
Sample results for Exercise 2: artist_translation_table
If you had a working solution to Exercise 2, then in principle you could use it to normalize and separate the artist names. We have precomputed these translations for you, for every artist name that appears in the data; run the cell below to load a name-translation table, stored in the variable, artist_translation_table.
Read and run this cell even if you skipped or otherwise did not complete Exercise 2.
Fri 26 Feb 2021 12:28:04 AM PST
'artist_translation_table.pickle' exists; skipping...
Fri 26 Feb 2021 12:28:04 AM PST
Testing...
(Passed!)
In [16]:
from testing_tools import mt1_artist_translation_table as artist_translation_table

print("\n=== Examples ===")
for q in ex0_inputs[:5]:
    print(f"artist_translation_table['{q}'] \\\n    == {artist_translation_table[q]}")
Part C: Gathering playlists
The data structure has a complicated nesting. Let's "flatten" it by collecting just the playlists. And for each
playlist, let's keep only the artist names.
For example, recall the demo dataset from before:
In [17]:
pprint(ex0_demo_users)
=== Examples ===
artist_translation_table['Alicia Keys ft. Jay-Z'] \
== {'alicia keys', 'jay z'}
artist_translation_table['A-Trak x Kanye x Jay-Z'] \
== {'kanye', 'a trak', 'jay z'}
artist_translation_table['JAY Z Featuring Beyoncé'] \
== {'beyoncé', 'jay z'}
artist_translation_table['Jay-Z Featuring Beyoncé Knowles'] \
== {'beyoncé knowles', 'jay z'}
artist_translation_table['Jay-Z/Kanye West/Lil Wayne/T.I.'] \
== {'jay z', 't i', 'lil wayne', 'kanye west'}
[{'playlists': [{'name': 'Starred',
                 'tracks': [{'artist': 'André Rieu',
                             'title': 'Once Upon A Time In The West - Main '
                                      'Title Theme'},
                            {'artist': 'André Rieu',
                             'title': 'The Second Waltz - From Eyes Wide '
                                      'Shut'}]}],
  'user_id': '0c8435917bd098dce8df8f62b736c0ed'},
 {'playlists': [{'name': 'Liked from Radio',
                 'tracks': [{'artist': 'The Police',
                             'title': 'Every Breath You Take'},
                            {'artist': 'Lucio Battisti',
                             'title': 'Per Una Lira'},
                            {'artist': 'Alicia Keys ft. Jay-Z',
                             'title': 'Empire State of Mind'}]},
                {'name': 'Starred',
                 'tracks': [{'artist': 'U2', 'title': 'With Or Without You'}]}],
  'user_id': 'fc799d71e8d2004377d6d8e861479559'}]
The first user has one playlist with two tracks by the same artist. The second user has two playlists, one playlist
with three tracks and four artists (since one track has a compound artist name), and the other playlist with one
track.
For our next task, we'd like to construct a copy of this data with the following simpler structure:
In [18]:
ex3_demo_output = [{'André Rieu'},
                   {'The Police', 'Lucio Battisti', 'Alicia Keys ft. Jay-Z'},
                   {'U2'}]
This object is simply a Python list of Python sets, with the "outer" list containing playlists and each playlist consisting only of distinct artist strings (without postprocessing per Exercise 2; we'll handle that later).
Exercise 3: extract_playlists (2 points)
Complete the function, extract_playlists(users), so that it returns the simplified list of artist names as shown above. For instance, calling extract_playlists(ex0_demo_users) should return an object that matches ex3_demo_output.
Note 0: You should not process the artist names per Exercise 2; that step comes later.
Note 1: You should preserve the exact order of playlists from the input. That is, you should loop over users and playlists in the order that they appear in the input and produce the corresponding output in that same order.
Note 2: Do not forget that the final output should be a Python list (holding playlists) of Python sets (unprocessed artist names).
Note 3: Your function should not modify the input dataset.
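Notes 1 and 2 interact: the order of elements inside a set does not affect equality, but the order of sets inside a list does. A small sketch with hand-made values (the variable names are illustrative, not from the notebook):

```python
expected = [{'André Rieu'}, {'The Police', 'Lucio Battisti'}, {'U2'}]

# Same playlists, with elements written in a different order inside one set.
same_sets = [{'André Rieu'}, {'Lucio Battisti', 'The Police'}, {'U2'}]

# Same sets, but the playlists themselves appear in a different order.
reordered = [{'U2'}, {'André Rieu'}, {'The Police', 'Lucio Battisti'}]

print(expected == same_sets)  # True: set equality ignores element order
print(expected == reordered)  # False: list equality is positional
```

This is why the output must follow the input's user and playlist order even though each playlist is an unordered set.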
In [19]:
def extract_playlists(users):
    ### BEGIN SOLUTION
    playlists = []
    for user in users:
        for playlist in user['playlists']:
            artists = set()
            for tracks in playlist['tracks']:
                artists |= {tracks['artist']}
            playlists.append(artists)
    return playlists
    ### END SOLUTION
In [20]:
# Demo cell
ex3_your_output = extract_playlists(ex0_demo_users)
print("=== Your output ===")
pprint(ex3_your_output)

assert all(a == b for a, b in zip(ex3_your_output, ex3_demo_output)), \
    "Your output does not match the demo output!"
print("\n(Your output matches the demo output — so far, so good!)")
In [21]:
# Test cell: `mt1_ex3_extract_playlists` (2 points)

### BEGIN HIDDEN TESTS
def mt1_ex3__gen_soln(fn_base="simple_playlists", fn_ext="pickle", overwrite=False):
    from testing_tools import file_exists, load_pickle, save_pickle
    fn = f"{fn_base}.{fn_ext}"
    if file_exists(fn) and not overwrite:
        print(f"'{fn}' exists; skipping...")
    else: # not file_exists(fn) or overwrite
        print(f"'{fn}' does not exist or needs to be overwritten; generating...")
        simple_playlists = extract_playlists(spotify_users)
        save_pickle(simple_playlists, fn)

!date
mt1_ex3__gen_soln(overwrite=False or global_overwrite)
!date
### END HIDDEN TESTS

from testing_tools import mt1_ex3__check
print("Testing...")
for trial in range(250):
    mt1_ex3__check(extract_playlists)

extract_playlists__passed = True
print("\n(Passed!)")
=== Your output ===
[{'André Rieu'},
{'Lucio Battisti', 'Alicia Keys ft. Jay-Z', 'The Police'},
{'U2'}]
(Your output matches the demo output — so far, so good!)
Fri 26 Feb 2021 12:28:04 AM PST
'simple_playlists.pickle' exists; skipping...
Fri 26 Feb 2021 12:28:05 AM PST
Testing...
(Passed!)
Sample results for Exercise 3: simple_playlists
If you had a working solution to Exercise 3, then in principle you could use it to construct simplified playlists for the full Spotify dataset. Instead, we have precomputed these for you, for every playlist in that dataset; run the cell below to load them into a variable named simple_playlists.
Read and run this cell even if you skipped or otherwise did not complete Exercise 3.
In [22]:
simple_playlists = load_pickle('simple_playlists.pickle')
print("\n=== Examples (first three playlists) ===")
pprint(simple_playlists[:3])
Opening pickle from './resource/asnlib/publicdata/simple_playlists.pickle' ...
=== Examples (first three playlists) ===
[{'Cocktail Slippers',
'Crosby, Stills & Nash',
'Crowded House',
'Elvis Costello',
'Elvis Costello & The Attractions',
'Joe Echo',
'Joshua Radin',
'Lissie',
'Paul McCartney',
'Paul McCartney & Eric Clapton',
'The Breakers',
'The Coronas',
'The Len Price 3',
'Tiffany Page'},
{'Biffy Clyro',
'Bruce Springsteen',
'Elbow',
'Madness',
'Miles Kane',
'Noah And The Whale',
"Noel Gallagher's High Flying Birds",
'Oasis',
'Pearl Jam',
'Spector',
'Thunderclap Newman',
'Tom Petty',
'Tom Petty And The Heartbreakers'},
{'2080'}]
Part D: An itemset representation
Our artist-recommender system will reuse ideas from Notebook 2 (pairwise association rule mining). The next two exercises do so.
But first, we'll need to identify analogues of baskets (or receipts) and items for our artist-recommender problem. Here is how we'll do that.
Receipts (baskets): Let's consider each playlist to be a receipt.
Items: Let's consider each distinct artist (after name normalization per Exercise 2!) to be an item.
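To illustrate why the receipt/item analogy is useful, here is a rough sketch of counting how often two artists appear in the same playlist. Notebook 2's actual code is not reproduced in this document, so the function `count_pairs` and the toy data below are assumptions made purely for illustration.

```python
from collections import Counter
from itertools import combinations

def count_pairs(itemsets):
    """Count, for each unordered pair of items, how many itemsets contain both."""
    pair_counts = Counter()
    for itemset in itemsets:
        # Sorting gives every pair one canonical (a, b) ordering.
        for a, b in combinations(sorted(itemset), 2):
            pair_counts[(a, b)] += 1
    return pair_counts

# Three toy "receipts" (playlists) of normalized artist names.
toy_playlists = [{'jay z', 'alicia keys'},
                 {'jay z', 'alicia keys', 'u2'},
                 {'u2'}]

pairs = count_pairs(toy_playlists)
print(pairs[('alicia keys', 'jay z')])  # 2
print(pairs[('jay z', 'u2')])           # 1
```

Pairs that co-occur in many playlists are natural candidates for "if you like this artist, try that one" suggestions.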
Example. Recall the simplified playlists example, ex3_demo_output, from Exercise 3:
In [23]:
print(ex3_demo_output)
Since there are three playlists, there are three "receipts." We want to treat each one as an itemset consisting of normalized artist names, per Exercise 2.
In [24]:
ex4_demo_output = [{'andré rieu'}, {'the police', 'lucio battisti', 'alicia keys', 'jay z'}, {'u2'}]
Observe that the second playlist includes one track having a compound artist name, 'Alicia Keys ft. Jay-Z'. In these instances, each collaborating artist should become an element of the itemset. Here, both 'alicia keys' and 'jay z' appear in the output.
Whether your Exercise 2 works or not, recall that we precomputed translations from raw artist name strings to itemsets. These are stored in artist_translation_table, e.g.:
In [25]:
artist_translation_table['Alicia Keys ft. Jay-Z']
Code reuse from Notebook 2. In addition to its concepts, Notebook 2 also has a lot of code we want you to reuse.
For example, recall the make_itemsets(receipts) function. Given a bunch of receipts, it converts each receipt into an itemset, a Python set of its items. Here is a generalized version of that code, which allows the user to supply a function, make_set, for converting one receipt into an itemset.
In [26]:
def make_itemsets(receipts, make_set=set):
    return [make_set(r) for r in receipts]
[{'André Rieu'}, {'Lucio Battisti', 'Alicia Keys ft. Jay-Z', 'The Police'}, {'U2'}]
Out[25]:
{'alicia keys', 'jay z'}
For example, recall how this function worked in the case where "words" are receipts and the individual letters are items. In that case, simply calling the default, set, on one receipt creates an itemset:
In [27]:
make_itemsets(['hello', 'world'])
To use make_itemsets for our problem, we need to create a function that is compatible with the requirements of the make_set argument. That is your next task.
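Concretely, a compatible make_set is any function that takes one receipt and returns a Python set. The sketch below passes a hypothetical function, `vowels_only`, to show the mechanism on the same word example; it is unrelated to artist names and only illustrates how the argument is used.

```python
def make_itemsets(receipts, make_set=set):
    return [make_set(r) for r in receipts]

def vowels_only(word):
    """Hypothetical make_set: keep only the vowels of one receipt (a word)."""
    return {c for c in word if c in 'aeiou'}

# The default converts each word into the set of all its letters;
# the custom function converts each word into the set of its vowels.
print(make_itemsets(['hello', 'world']))
print(make_itemsets(['hello', 'world'], make_set=vowels_only))
```

In Exercise 4, the function you write plays the role that `vowels_only` plays here, with a set of raw artist strings as the receipt.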
Exercise 4: normalize_artist_set (2 points)
Complete the function, normalize_artist_set(artist_set), where artist_set is a Python set of unprocessed artist names. It should return a Python set of normalized artist names, per Exercise 2.
For instance,
normalize_artist_set({'Alicia Keys ft. Jay-Z', 'Lucio Battisti', 'The Police'})
should return
{'the police', 'lucio battisti', 'alicia keys', 'jay z'}
Note: You may reuse your function from Exercise 2, if you are confident it is bug-free; otherwise, we recommend using the precomputed values in artist_translation_table.
In [28]:
def normalize_artist_set(artist_set):
    ### BEGIN SOLUTION
    global artist_translation_table # not strictly necessary, but self-documenting
    output_set = set()
    for a in artist_set:
        output_set |= artist_translation_table[a]
    return output_set
    ### END SOLUTION
In [29]:
# Demo cell:
normalize_artist_set({'Alicia Keys ft. Jay-Z', 'Lucio Battisti', 'The Police'})
# expected output: `{'alicia keys', 'jay z', 'lucio battisti', 'the police'}`
Out[27]:
[{'e', 'h', 'l', 'o'}, {'d', 'l', 'o', 'r', 'w'}]
Out[29]:
{'alicia keys', 'jay z', 'lucio battisti', 'the police'}
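The same union-of-itemsets idea can also be written without an explicit loop. The sketch below is one possible variant, not the reference solution; `toy_translation_table` is a made-up three-entry stand-in for `artist_translation_table`, and the function name `normalize_artist_set_union` is ours.

```python
# Toy stand-in for the precomputed table (assumption: the real table
# maps every raw artist string in the dataset to its itemset).
toy_translation_table = {
    'Alicia Keys ft. Jay-Z': {'alicia keys', 'jay z'},
    'Lucio Battisti': {'lucio battisti'},
    'The Police': {'the police'},
}

def normalize_artist_set_union(artist_set, table=toy_translation_table):
    # set().union(...) merges all per-artist itemsets into a new set;
    # when artist_set is empty, it simply returns an empty set.
    return set().union(*(table[a] for a in artist_set))

result = normalize_artist_set_union({'Alicia Keys ft. Jay-Z', 'Lucio Battisti', 'The Police'})
print(result)
```

Because `set().union(...)` builds a fresh set, neither the input set nor the table entries are modified.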
In [30]:
# Test cell: `mt1_ex4_normalize_artist_set` (2 points)

### BEGIN HIDDEN TESTS
def mt1_ex4__gen_soln(fn_base="normalized_artist_sets", fn_ext="pickle", overwrite=False):
    from testing_tools import file_exists, load_pickle, save_pickle
    def make_itemsets(receipts, make_set=set):
        return [make_set(r) for r in receipts]

    fn = f"{fn_base}.{fn_ext}"
    if file_exists(fn) and not overwrite:
        print(f"'{fn}' exists; skipping...")
    else: # not file_exists(fn) or overwrite
        print(f"'{fn}' does not exist or needs to be overwritten; generating...")
        simple_playlists = load_pickle('simple_playlists.pickle')
        normalized_artist_sets = make_itemsets(simple_playlists, make_set=normalize_artist_set)
        save_pickle(normalized_artist_sets, fn)

!date
mt1_ex4__gen_soln(overwrite=False or global_overwrite)
!date
### END HIDDEN TESTS

from testing_tools import mt1_ex4__check
print("Testing...")
for trial in range(250):
    mt1_ex4__check(normalize_artist_set)

normalize_artist_set__passed = True
print("\n(Passed!)")
Sample results for Exercise 4:
artist_itemsets
If you had a working solution to Exercise 4, then in principle you could use it to construct artist itemsets for all of
the playlists. Instead, we have precomputed these for you; run the cell below to load them into a variable named
artist_itemsets
.
Read and run this cell even if you skipped or otherwise did not complete Exercise 4.
Fri 26 Feb 2021 12:28:08 AM PST
'normalized_artist_sets.pickle' exists; skipping...
Fri 26 Feb 2021 12:28:08 AM PST
Testing...
(Passed!)
In [31]:
artist_itemsets = load_pickle('normalized_artist_sets.pickle')
print("\n=== Examples (first three playlists) ===")
pprint(artist_itemsets[:3])
Opening pickle from './resource/asnlib/publicdata/normalized_artist_sets.pickle' ...
=== Examples (first three playlists) ===
[{'cocktail slippers',
'crosby',
'crowded house',
'elvis costello',
'eric clapton',
'joe echo',
'joshua radin',
'lissie',
'nash',
'paul mccartney',
'stills',
'the attractions',
'the breakers',
'the coronas',
'the len price 3',
'tiffany page'},
{'biffy clyro',
'bruce springsteen',
'elbow',
'madness',
'miles kane',
'noah',
"noel gallagher's high flying birds",
'oasis',
'pearl jam',
'spector',
'the heartbreakers',
'the whale',
'thunderclap newman',
'tom petty'},
{'2080'}]
Exercise 5:
get_artist_counts
(1 point)
For the Notebook 2 analysis, we also needed a way to count in how many receipts each item occurred. That's
your next task.
Given a collection of artist itemsets, complete the function
get_artist_counts(itemsets)
so that it returns
a dictionary-like object with artist names as keys and the number of occurrences as values.
For example, suppose you start with these three itemsets:
itemsets = [{'alicia keys', 'jay z', 'lucio battisti', 'the police'}, {'u2', 'the police'}, {'jay z'}]
Then
get_artist_counts(itemsets)
should return:
{'alicia keys': 1, 'jay z': 2, 'lucio battisti': 1, 'the police': 2, 'u2': 1}
Note:
By "dictionary-like," we mean either a conventional Python dictionary or a
collections.defaultdict
, as you prefer.
Hint:
Recall
update_item_counts
from Notebook 2, which we've provided again in the code
cell below.
In [32]:
def update_item_counts(item_counts, itemset):
    for a in itemset:
        item_counts[a] += 1

def get_artist_counts(itemsets):
    ### BEGIN SOLUTION
    from collections import defaultdict
    counts = defaultdict(int)
    for s in itemsets:
        update_item_counts(counts, s)
    return counts
    ### END SOLUTION
In [33]:
# Demo cell:
itemsets = [{'alicia keys', 'jay z', 'lucio battisti', 'the police'},
{'u2', 'the police'}, {'jay z'}]
get_artist_counts(itemsets)
Out[33]:
defaultdict(int,
{'alicia keys': 1,
'lucio battisti': 1,
'jay z': 2,
'the police': 2,
'u2': 1})
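For comparison, here is a sketch of the same tally using `collections.Counter`, which is a `dict` subclass. This is an alternative we are adding for illustration, under the assumption that a `Counter` is acceptable wherever a "dictionary-like" object is; the notebook itself only names plain dictionaries and `collections.defaultdict`. The name `get_artist_counts_counter` is ours.

```python
from collections import Counter
from itertools import chain

def get_artist_counts_counter(itemsets):
    # Each artist appears at most once per itemset (they are sets), so
    # counting every element across all itemsets counts itemsets per artist.
    return Counter(chain.from_iterable(itemsets))

itemsets = [{'alicia keys', 'jay z', 'lucio battisti', 'the police'},
            {'u2', 'the police'},
            {'jay z'}]
counts = get_artist_counts_counter(itemsets)
print(dict(counts))
```

Like `defaultdict(int)`, a `Counter` returns 0 when you look up an artist it has never seen; unlike `defaultdict`, that lookup does not add a new key.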
In [34]:
# Test cell: `mt1_ex5_get_artist_counts` (1 point)

### BEGIN HIDDEN TESTS
def mt1_ex5__gen_soln(fn_base="artist_counts", fn_ext="pickle", overwrite=False):
    from testing_tools import file_exists, load_pickle, save_pickle

    fn = f"{fn_base}.{fn_ext}"
    if file_exists(fn) and not overwrite:
        print(f"'{fn}' exists; skipping...")
    else: # not file_exists(fn) or overwrite
        print(f"'{fn}' does not exist or needs to be overwritten; generating...")
        artist_sets = load_pickle('normalized_artist_sets.pickle')
        artist_counts = get_artist_counts(artist_sets)
        save_pickle(artist_counts, fn)

!date
mt1_ex5__gen_soln(overwrite=False or global_overwrite)
!date
### END HIDDEN TESTS

from testing_tools import mt1_ex5__check
print("Testing...")
for trial in range(250):
    mt1_ex5__check(get_artist_counts)

get_artist_counts__passed = True
print("\n(Passed!)")
Sample results for Exercise 5:
artist_counts
If you had a working solution to Exercise 5, then in principle you could run
get_artist_counts(artist_itemsets)
to count the number of occurrences of all artists. Instead, we
have precomputed these for you; run the cell below to load it into the object,
artist_counts
.
Read and run this cell even if you skipped or otherwise did not complete Exercise 5.
Fri 26 Feb 2021 12:28:10 AM PST
'artist_counts.pickle' exists; skipping...
Fri 26 Feb 2021 12:28:11 AM PST
Testing...
(Passed!)
In [35]:
artist_counts = load_pickle('artist_counts.pickle')
print("Examples:")
for a in ['lady gaga', 'fats domino', 'kishi bashi']:
    print(f"* Artist '{a}' appears in {artist_counts[a]:,} playlists.")
Part E: A simple artist-recommender system
We now have all the pieces we need to build a recommender system to help users find artists they might like,
building on Notebook 2's pairwise association-rule miner. However, we'll need a modified procedure.
Why? Recall how many artists there are (run the cell below):
In [36]:
print(f'The dataset has {len(artist_counts):,} artists! (After name normalization per Exercise 2.)')
That's a lot! So rather than finding all association rules, let's use the following procedure instead.
1. First, suppose a user has given us the name of one artist they already like. Call that the
root artist
.
2. Filter all playlists to only those containing the root artist. Call these the
root playlists
(or
root itemsets
).
3. For each root playlist, remove any artists that are "uncommon," based on a given threshold. However,
do not remove the root artist; it should always be kept, whether common or not. Call the
resulting playlists the pruned playlists.
4. Run the pairwise association rule miner on these pruned playlists, which should be smaller and thus
faster to process, and report the top result(s).
For your last exercise, we'll give you code for Step 2 and need you to combine it with Step 3. We will provide
the rest, and if your procedure works, you'll be able to try it out!
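To put the artist count in perspective, here is a back-of-the-envelope calculation. It assumes that a pairwise miner could, in the worst case, consider every ordered pair of distinct artists as a candidate rule `a => b`; the actual number of pairs that co-occur in some playlist is smaller and is not reported in this notebook.

```python
n_artists = 258_036  # artist count reported by the notebook for this dataset

# Upper bound on candidate rules `a => b` with a != b (ordered pairs).
n_ordered_pairs = n_artists * (n_artists - 1)
print(f"{n_ordered_pairs:,} candidate ordered pairs")
```

That is on the order of tens of billions of candidates, which is why the procedure above restricts attention to one root artist and prunes uncommon artists first.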
Filtering step.
Here is code we are providing for Step 2 of this proposed recommender algorithm (filter
playlists).
In [37]:
def filter_itemsets(root_item, itemsets):
    return [s for s in itemsets if root_item in s]
Opening pickle from './resource/asnlib/publicdata/artist_counts.pickle' ...
Examples:
* Artist 'lady gaga' appears in 5,121 playlists.
* Artist 'fats domino' appears in 327 playlists.
* Artist 'kishi bashi' appears in 427 playlists.
The dataset has 258,036 artists! (After name normalization per Exercise 2.)
Here is a demo of
filter_itemsets
, which generates "root playlists" for the artist, "
Kishi Bashi
(https://www.kishibashi.com/)."
Pop-up Video / Behind The Lyrics trivia:
At the time of this exam (Spring 2021), Kishi Bashi lives
in Athens, Georgia, USA, about 90 minutes outside Atlanta!
In [38]:
root_playlists_for_kishi_bashi = filter_itemsets('kishi bashi', artist_itemsets)
print(f"Found {len(root_playlists_for_kishi_bashi)} playlists containing 'kishi bashi.'")
print("Example:", root_playlists_for_kishi_bashi[2])
Found 427 playlists containing 'kishi bashi.'
Example: {'hozier', 'the new pornographers', 'plushgun', 'the smashing pumpkins', 'rockabye baby', 'discovery', "chris o'brien", 'matt nathanson', 'ed sheeran', 'the xx', 'lisa hannigan', 'first aid kit', 'clap your hands say yeah', 'kishi bashi', 'stars'}
Exercise 6:
prune_itemsets
(2 points)
Complete the function,
def
prune_itemsets(root_item, itemsets, item_counts, min_count):
...
so that it implements Step 2
and
Step 3 of the recommender. That is, the inputs are:
root_item
: The root item (i.e., the root artist name)
itemsets
: A collection of itemsets
item_counts
: A pre-tabulated count of the number of itemsets in which each item appears
min_count
: The minimum number of itemsets in which an item should appear to be considered a
recommendation
Your function should return the playlists pruned as follows:
1. Filter the itemsets to only those containing
root_item
. The resulting itemsets are the
filtered
itemsets.
2. For each filtered itemset, remove any item, a, where
item_counts[a] < min_count
. However, do
not
remove
root_item
, regardless of its count.
3. The resulting itemsets are the
pruned itemsets.
Discard any pruned itemsets that contain only the root
item. Return the remaining pruned itemsets as a Python list of sets.
Note 0:
Although the procedure above is written as though your function will modify its input
arguments,
it must not do so.
Use copies as needed instead. The test cell will not pass if you
modify the input arguments.
Note 1:
You can return pruned itemsets in any order. (So if the test cell does not pass, it is
not
because it assumes results in a particular order.)
Example.
Suppose the itemsets and item counts are given as follows:
In [39]:
ex6_demo_itemsets = [{'alicia keys', 'jay z', 'lucio battisti', 'the police'}, {'u2', 'the police'}, {'jay z'}]
ex6_demo_item_counts = {'alicia keys': 1, 'jay z': 2, 'lucio battisti': 1, 'the police': 2, 'u2': 1}
Then
prune_itemsets('the police', ex6_demo_itemsets, ex6_demo_item_counts, 2)
will end up returning a list with just one itemset,
[{'the police', 'jay z'}]
. That's because only two
itemsets have
'the police'
in them, and of those, only one has at least one item other than the root whose count is at least
min_count=2
.
In [40]:
def prune_itemsets(root_item, itemsets, item_counts, min_count):
    ### BEGIN SOLUTION
    filtered_itemsets = filter_itemsets(root_item, itemsets)
    pruned_itemsets = []
    for s in filtered_itemsets:
        s_pruned = set()
        for x in s:
            if item_counts[x] >= min_count or x == root_item:
                s_pruned |= {x}
        if len(s_pruned) >= 2:
            pruned_itemsets.append(s_pruned)
    return pruned_itemsets
    ### END SOLUTION
In [41]:
# Demo cell:
prune_itemsets('the police', ex6_demo_itemsets, ex6_demo_item_counts, 2)
Out[41]:
[{'jay z', 'the police'}]
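The same filter-then-prune logic can be sketched with a set comprehension, which also makes the "do not modify the inputs" requirement of Note 0 easy to check. This is an illustrative variant, not the reference solution; the name `prune_itemsets_sketch` and the snapshot check are ours.

```python
def prune_itemsets_sketch(root_item, itemsets, item_counts, min_count):
    pruned = []
    for s in itemsets:
        if root_item not in s:
            continue  # filtering step: keep only itemsets containing the root
        # Build a new set rather than removing items from `s` in place.
        kept = {x for x in s if x == root_item or item_counts[x] >= min_count}
        if len(kept) >= 2:  # discard itemsets holding only the root item
            pruned.append(kept)
    return pruned

demo_itemsets = [{'alicia keys', 'jay z', 'lucio battisti', 'the police'},
                 {'u2', 'the police'},
                 {'jay z'}]
demo_counts = {'alicia keys': 1, 'jay z': 2, 'lucio battisti': 1, 'the police': 2, 'u2': 1}

snapshot = [set(s) for s in demo_itemsets]  # copy of the inputs for comparison
result = prune_itemsets_sketch('the police', demo_itemsets, demo_counts, 2)
print(result)
assert demo_itemsets == snapshot  # the inputs were left untouched
```

Note that the comparison is `>= min_count`, so `'jay z'`, whose count equals the threshold of 2, survives the pruning.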
In [42]:
# Test cell: `mt1_ex6_prune_itemsets` (2 points)

### BEGIN HIDDEN TESTS
def mt1_ex6__gen_soln(fn_base="pruned_playlists", root='kishi bashi', threshold=1000, fn_ext="pickle", overwrite=False):
    from testing_tools import file_exists, load_pickle, save_pickle

    fn = f"{fn_base}--{root.replace(' ', '-')}--{threshold}.{fn_ext}"
    if file_exists(fn) and not overwrite:
        print(f"'{fn}' exists; skipping...")
    else: # not file_exists(fn) or overwrite
        print(f"'{fn}' does not exist or needs to be overwritten; generating...")
        artist_itemsets = load_pickle('normalized_artist_sets.pickle')
        artist_counts = load_pickle('artist_counts.pickle')
        pruned_playlists = prune_itemsets(root, artist_itemsets, artist_counts, threshold)
        save_pickle(pruned_playlists, fn)

!date
mt1_ex6__gen_soln(overwrite=False or global_overwrite)
!date
### END HIDDEN TESTS

from testing_tools import mt1_ex6__check
print("Testing...")
for trial in range(250):
    mt1_ex6__check(prune_itemsets)

prune_itemsets__passed = True
print("\n(Passed!)")
Fin!
If you passed the preceding exercise, then you have all the pieces necessary to try your recommendation
algorithm! It is optional to do so, but if you have any time left, pick your favorite artist (assuming they are in the
dataset) and see if you get reasonable results.
Otherwise, you’ve reached the end of this problem. Don’t forget to restart and run all cells again to make sure
your code works when running all code cells in sequence; and make sure your work passes the submission
process. Good luck!
Fri 26 Feb 2021 12:28:14 AM PST
'pruned_playlists--kishi-bashi--1000.pickle' exists; skipping...
Fri 26 Feb 2021 12:28:14 AM PST
Testing...
(Passed!)
In [43]:
assert prune_itemsets__passed == True, "Are you sure you passed Exercise 6?"

# `recommend` implements the complete recommender algorithm
def recommend(root_artist, conf=0.2, min_count=1000, verbose=True):
    from cse6040nb2 import find_assoc_rules, print_rules
    global artist_itemsets, artist_counts

    print("Pruning...")
    pruned_playlists = prune_itemsets(root_artist, artist_itemsets, artist_counts, min_count)
    num_artists = sum(len(p) for p in pruned_playlists)
    print("\t", len(pruned_playlists), "itemsets remain with", num_artists, "artists.")

    print("Finding association rules...")
    rules = find_assoc_rules(pruned_playlists, conf)
    rules = {(a, b): c for (a, b), c in rules.items() if a == root_artist}
    print("\t", len(rules), f"rules of the form `conf('{root_artist}' => x) >= {conf}")
    print(f"\n=== Our top recommendations for '{root_artist}' ===")
    print_rules(rules, limit=20)

# DEMO: 'kishi bashi' produces some spurious results because
# both "Of Monsters and Men" and "Mumford and Sons" are
# erroneously split into two.
recommend('kishi bashi')
Pruning...
411 itemsets remain with 19530 artists.
Finding association rules...
20 rules of the form `conf('kishi bashi' => x) >= 0.2
=== Our top recommendations for 'kishi bashi' ===
conf(kishi bashi => alt j) = 0.265
conf(kishi bashi => passion pit) = 0.258
conf(kishi bashi => men) = 0.255
conf(kishi bashi => of monsters) = 0.255
conf(kishi bashi => vampire weekend) = 0.231
conf(kishi bashi => bon iver) = 0.229
conf(kishi bashi => grizzly bear) = 0.224
conf(kishi bashi => the lumineers) = 0.219
conf(kishi bashi => two door cinema club) = 0.219
conf(kishi bashi => the xx) = 0.219
conf(kishi bashi => m83) = 0.219
conf(kishi bashi => the shins) = 0.219
conf(kishi bashi => first aid kit) = 0.217
conf(kishi bashi => sons) = 0.214
conf(kishi bashi => mumford) = 0.212
conf(kishi bashi => the black keys) = 0.212
conf(kishi bashi => grouplove) = 0.212
conf(kishi bashi => lana del rey) = 0.204
conf(kishi bashi => imagine dragons) = 0.204
conf(kishi bashi => phantogram) = 0.202
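The code for `find_assoc_rules` is not shown in this notebook. As a rough guide to reading the numbers above, the sketch below assumes the usual definition of confidence, conf(a => b) = (number of itemsets containing both a and b) / (number of itemsets containing a), restricted to rules whose left-hand side is the root. The function name `confidence_from_root` and the toy itemsets are ours and purely illustrative.

```python
def confidence_from_root(root, itemsets):
    # Number of itemsets that contain the root item.
    n_root = sum(1 for s in itemsets if root in s)
    # For every other item b, count the itemsets containing both root and b.
    pair_counts = {}
    for s in itemsets:
        if root in s:
            for b in s - {root}:
                pair_counts[b] = pair_counts.get(b, 0) + 1
    # conf(root => b) = count(root, b) / count(root)
    return {b: c / n_root for b, c in pair_counts.items()}

# Made-up toy itemsets; 'r' plays the role of the root artist.
toy_itemsets = [{'r', 'x'}, {'r', 'x', 'y'}, {'r', 'y'}, {'r', 'x'}, {'x', 'y'}]
toy_conf = confidence_from_root('r', toy_itemsets)
print(toy_conf)
```

Under this reading, a value such as 0.265 would mean that roughly a quarter of the pruned playlists containing the root artist also contain the recommended artist.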