Write a Python program that reads a large number of tweets from a file and prints the top 5 hashtags.
Approach
- Parse each word in the file.
- Find a way to isolate the hashtags from the rest of the tweet.
- Compute the frequency of each unique hashtag.
- Find the top 5 hashtags.
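The parsing, isolating, and counting steps above can be sketched as follows. This is only one possible approach, not a reference solution: the function names extract_hashtags and count_hashtags are hypothetical, and the sketch assumes a hashtag is any whitespace-separated word that starts with '#' and has at least one character after it.

```python
def extract_hashtags(line):
    """Return the words in one tweet that look like hashtags."""
    # Assumption: a hashtag is a whitespace-delimited word starting with '#'
    # that has at least one more character after the '#'.
    return [word for word in line.split()
            if word.startswith('#') and len(word) > 1]


def count_hashtags(lines):
    """Map each hashtag to the number of times it appears across all lines."""
    counts = {}
    for line in lines:
        for tag in extract_hashtags(line):
            # dict.get supplies 0 the first time a hashtag is seen.
            counts[tag] = counts.get(tag, 0) + 1
    return counts


# Small made-up sample standing in for lines read from the tweet file.
sample = ["loving the weather #sunny #happy", "#sunny days ahead", "no tags here"]
print(count_hashtags(sample))
```

Splitting on whitespace keeps the sketch short, but it leaves trailing punctuation attached to a hashtag (for example "#sunny!"), so a stricter solution may want to strip punctuation as well.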
Caveats
1. Because this is a public dataset, many tweets contain special characters that may cause errors when the file is read. To avoid this, explicitly specify the encoding in open when reading the file:
   with open('twitter_data.txt', 'r', encoding='utf8') as f:
2. When parsing hashtags, be sure to convert everything to lowercase. The file contains hashtags that are identical except for letter case, and these should be counted as the same hashtag.
3. Dictionaries have no built-in method for sorting by value. Python's built-in sorted function accepts a collection along with a key function that determines how the items are ordered (pass reverse=True for descending order). The key can be written as a one-line lambda function.
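The three caveats can be combined in one short function. The sketch below is an illustration under stated assumptions rather than the expected answer: the name top_hashtags and its parameter n are hypothetical, a hashtag is again assumed to be any whitespace-separated word starting with '#', and the file is assumed to be valid UTF-8.

```python
def top_hashtags(path, n=5):
    """Return the n most frequent hashtags in the file as (hashtag, count) pairs."""
    counts = {}
    # Caveat 1: name the encoding explicitly so special characters
    # are decoded the same way on every platform.
    with open(path, 'r', encoding='utf8') as f:
        for line in f:
            for word in line.split():
                if word.startswith('#') and len(word) > 1:
                    # Caveat 2: lowercase so '#Python' and '#python' share a count.
                    tag = word.lower()
                    counts[tag] = counts.get(tag, 0) + 1
    # Caveat 3: sort the (hashtag, count) pairs by count, largest first.
    # The lambda is the key function; item[1] is the count.
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]
```

Calling top_hashtags('twitter_data.txt') and printing each returned pair would give the top 5 list. Because sorted is stable, hashtags with equal counts keep the order in which they were first seen in the file.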