Counting hashtags
Write Python code to count the frequency of hashtags in a Twitter feed.
Your code may assume that a variable tweets already exists. It is a list of strings, and each element of the list is a single tweet. For example, tweets may look like:
tweets = ["Happy #IlliniFriday!", "It is a pretty campus, isn't it, #illini?", "Diving into the last weekend of winter break like... #ILLINI #JoinTheFight", "Are you wearing your Orange and Blue today, #Illini Nation?"]
Your code should produce a sorted list of tuples stored in hashtag_counts, where each tuple looks like (hashtag, count), hashtag is a string and count is an integer. The list should be sorted by count in descending order, and if there are hashtags with identical counts, these should be sorted alphabetically, in ascending order, by hashtag.
From the above example, our unsorted hashtag_counts might look like:
[('#illini', 2), ('#jointhefight', 1), ('#illinifriday!', 1), ('#illini?', 1)]
The sorted hashtag_counts should look like:
[('#illini', 2), ('#illini?', 1), ('#illinifriday!', 1), ('#jointhefight', 1)]
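The ordering rule can be sketched with a single sort key. This is a minimal illustration that assumes the counts are already available as a list of tuples; unsorted_counts is a name introduced here for the example. Note that Python compares strings by code point, which is why '#illini?' sorts before '#illinifriday!'.

```python
# Unsorted (hashtag, count) pairs taken from the example above.
unsorted_counts = [
    ('#illini', 2),
    ('#jointhefight', 1),
    ('#illinifriday!', 1),
    ('#illini?', 1),
]

# Negating the count sorts it in descending order, while the hashtag
# itself breaks ties in ascending order.
hashtag_counts = sorted(unsorted_counts, key=lambda pair: (-pair[1], pair[0]))

print(hashtag_counts)
```

Because sorted() returns a new list, the result is already the list of tuples the problem asks for.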
You may use str.split() to split each tweet into a list of words. A hashtag is any word that starts with a hash mark (#). (That means that the hash mark # should be included in the hashtag value above.)
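As a small sketch of that definition, the snippet below splits one of the example tweets on whitespace and keeps the words that start with a hash mark. The names tweet, words, and hashtags are chosen here for illustration only.

```python
tweet = "Diving into the last weekend of winter break like... #ILLINI #JoinTheFight"

# split() with no arguments breaks the tweet on any run of whitespace.
words = tweet.split()

# Keep only the words that begin with '#'; the hash mark stays in the value.
hashtags = [word for word in words if word.startswith("#")]

print(hashtags)  # ['#ILLINI', '#JoinTheFight']
```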
Steps/Hints:
- Preprocessing: You will need to convert each hashtag to lower case before you count it. For example, in this question #UIUC and #Uiuc both add to the count of the same hashtag (#uiuc).
- Do not further process the tweets or hashtags beyond using .split(), such as attempting to remove punctuation. While in the 'real world' you would absolutely do this, in this problem the autograder will be unhappy with you if you do.
- If using .split(), do not pass any arguments (with no arguments, it splits on every kind of whitespace).
- You may find it helpful to use an intermediate data structure for this problem to count the frequency of each hashtag.
- If you aren't sure how to sort or convert to lowercase, you may find the Python docs on how to sort and the Python docs for string methods useful.
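The hints above can be sketched as a counting loop that uses a dictionary as the intermediate data structure. This is one possible approach, not necessarily the intended one; the name counts is an assumption made for the example.

```python
tweets = [
    "Happy #IlliniFriday!",
    "It is a pretty campus, isn't it, #illini?",
    "Diving into the last weekend of winter break like... #ILLINI #JoinTheFight",
    "Are you wearing your Orange and Blue today, #Illini Nation?",
]

counts = {}  # maps lower-cased hashtag -> number of occurrences
for tweet in tweets:
    for word in tweet.split():          # no arguments: split on any whitespace
        if word.startswith("#"):
            tag = word.lower()          # '#ILLINI' and '#Illini' count together
            counts[tag] = counts.get(tag, 0) + 1

print(counts)
```

The dictionary is only an intermediate step. It would still need to be turned into a sorted list of (hashtag, count) tuples, for example starting from counts.items().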
import re

def extract_hashtags(tweet):
    return re.findall(r'#\w+', tweet.lower())

def count_hashtags(tweets):
    hashtag_counts = {}
    for tweet in tweets:
        hashtags = extract_hashtags(tweet)
        for hashtag in hashtags:
            hashtag_counts[hashtag] = hashtag_counts.get(hashtag, 0) + 1
    return hashtag_counts

def sort_hashtags(hashtag_counts):
    sorted_hashtag_counts = sorted(hashtag_counts.items(), key=lambda x: (-x[1], x[0]))
    return sorted_hashtag_counts

def main():
    # Example tweets
    tweets = [
        "Happy #IlliniFriday!",
        "It is a pretty campus, isn't it, #illini?",
        "Diving into the last weekend of winter break like... #ILLINI #JoinTheFight",
        "Are you wearing your Orange and Blue today, #Illini Nation?"
    ]
    # Counting hashtags
    hashtag_counts = count_hashtags(tweets)
    # Sorting and converting to list of tuples
    sorted_hashtag_counts = sort_hashtags(hashtag_counts)
    hashtag_counts = list(sorted_hashtag_counts)
    # Displaying the result
    for hashtag, count in sorted_hashtag_counts:
        print(f"{hashtag}: {count}")

if __name__ == "__main__":
    main()
Score: 0/30 (0%)
Test Results
- Max points: 30
- Earned points: 0
- Message:
  Feedback for case 0: hashtag_counts is not a list.
  Feedback for case 1: hashtag_counts is not a list.
  Feedback for case 2: hashtag_counts is not a list.
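One possible reading of this feedback, offered as a guess since the autograder's internals are not shown: in the posted code hashtag_counts is only ever assigned inside main() and count_hashtags(), so no list with that name exists at the top level of the script, and the autograder's own tweets variable is never used. The sketch below illustrates the scoping difference; visible_before and visible_after are names introduced here for the demonstration.

```python
def main():
    # Assigned inside a function, so this name is local to main().
    hashtag_counts = [("#illini", 2)]

main()
# After main() returns, no top-level variable called hashtag_counts exists.
visible_before = "hashtag_counts" in globals()

# Assigned at module level, so code that inspects the script afterwards can see it.
hashtag_counts = [("#illini", 2)]
visible_after = "hashtag_counts" in globals()

print(visible_before, visible_after, isinstance(hashtag_counts, list))
```

If this guess is right, computing hashtag_counts at the top level of the script, directly from the provided tweets, would address the "not a list" message.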
I get the same error. I have no idea what I am missing.
- Max points: 30
- Earned points: 0
- Message:
  Feedback for case 0: hashtag_counts is not a list.
  Feedback for case 1: hashtag_counts is not a list.
  Feedback for case 2: hashtag_counts is not a list.
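Something else that may be worth checking in the posted code is the regular expression. The pattern #\w+ stops at the first non-word character, which strips trailing punctuation, whereas the hints ask for hashtags exactly as .split() produces them. A minimal comparison on one of the example tweets, with regex_tags and split_tags as names introduced for illustration:

```python
import re

tweet = "It is a pretty campus, isn't it, #illini?"

# Pattern from the posted code: \w+ stops before the '?'.
regex_tags = re.findall(r'#\w+', tweet.lower())

# Approach described in the hints: split on whitespace only, keep punctuation.
split_tags = [word.lower() for word in tweet.split() if word.startswith("#")]

print(regex_tags)  # ['#illini']
print(split_tags)  # ['#illini?']
```

With the regular expression, '#illini?' would be merged into '#illini', so the counts would differ from the expected output shown in the problem statement.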