In this assignment, we will analyze some real-world data using lists and tuples. Kaggle is a good resource for datasets that have been collected and are available for you to analyze. We are going to use a collection of posts from Reddit's r/VaccineMyths (note: Reddit is unfiltered and may contain offensive content) to demonstrate that we can compute statistics over lists.
What to do
Rewrite/finish the analyze function to print out:
- the average number of comments across all posts
- the average score across all posts
- what the highest score is and the title for that post
- what the lowest score is and the title for that post
- what the most commented post is with its title and number of comments
Starting Resources
Here is the file you will need: reddit_vm.csv
The following is a starter program that you can run using Python on your own computer. Note that the CSV file must be in the same directory as your Python script for this to work correctly.
import csv

def analyze(entries):
    print(f'first entry: {entries[0]}')

with open("reddit_vm.csv", "r", encoding='UTF-8', errors="ignore") as input:
    entries = [(e['id'], int(e['score']), int(e['comms_num']), e['title']) for e in csv.DictReader(input)]

avgScore = analyze(entries)
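To make the shape of `entries` concrete, here is a minimal sketch of the same loading step run against a few made-up rows instead of the real file. The sample data and the `load_entries` helper are assumptions for illustration only; the real reddit_vm.csv has many more rows and additional columns.

```python
import csv
import io

# Hypothetical sample rows; only the four columns the starter program reads.
SAMPLE_CSV = """id,score,comms_num,title
a1,7,2,First post
b2,-3,10,Second post
c3,15,0,Third post
"""

def load_entries(handle):
    """Build (id, score, comms_num, title) tuples, as the starter program does."""
    return [
        (row["id"], int(row["score"]), int(row["comms_num"]), row["title"])
        for row in csv.DictReader(handle)
    ]

entries = load_entries(io.StringIO(SAMPLE_CSV))

# Tuple unpacking gives each field a name instead of a bare index.
post_id, score, comms_num, title = entries[0]
print(post_id, score, comms_num, title)
```

Each element of `entries` is a tuple, so index 1 is the score, index 2 is the comment count, and index 3 is the title.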
Here's what I've got so far (I'm stuck on how to find the title of the high or low score posts):
import csv

# Introduce function
print('The following is an analysis of a collection of posts from the subreddit r/VaccineMyths.')

def analyze(entries):
    # Calculate average number of comments across all posts.
    total_comments = sum(entry[2] for entry in entries)
    avg_comments = total_comments / len(entries)
    print(f'The average number of comments across all posts is {avg_comments:.2f}')

    # Average score across all posts.
    total_score = sum(entry[1] for entry in entries)
    avg_score = total_score / len(entries)
    print(f'The average score across all posts is {avg_score:.2f}')

    # Highest score and title for that post.
    high_score = max(entry[1] for entry in entries)
    hs_index = entries.index(high_score)
    high_score_title = entries[hs_index][3]
    print(f'The post with the highest score was "{high_score_title}" with a score of {high_score}.')

    # Lowest score and title for that post.

    # Most commented post with its title and number of comments.

# Open and read file
with open("reddit_vm.csv", "r", encoding='UTF-8', errors="ignore") as input_file:
    entries = [(e['id'], int(e['score']), int(e['comms_num']), e['title']) for e in csv.DictReader(input_file)]

# Call analyze function
analyze(entries)
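On the part where the attempt gets stuck: `entries.index(high_score)` raises a `ValueError`, because `entries` holds tuples while `high_score` is a bare integer, so no element ever equals it. One possible way around this is to let `max` and `min` pick the whole tuple by passing a `key`, which keeps the title attached to the score. The sketch below assumes the same `(id, score, comms_num, title)` layout; the `summarize` helper and the sample data are hypothetical, not part of the assignment.

```python
from operator import itemgetter

# Hypothetical sample in the same (id, score, comms_num, title) layout.
sample_entries = [
    ("a1", 7, 2, "First post"),
    ("b2", -3, 10, "Second post"),
    ("c3", 15, 0, "Third post"),
]

def summarize(entries):
    """Return the posts with the highest score, lowest score, and most comments."""
    # key= tells max/min which field to compare; the result is the whole tuple,
    # so the title comes along with the number.
    highest = max(entries, key=itemgetter(1))
    lowest = min(entries, key=itemgetter(1))
    most_commented = max(entries, key=itemgetter(2))
    return highest, lowest, most_commented

highest, lowest, most_commented = summarize(sample_entries)
print(f'Highest score: {highest[1]} ("{highest[3]}")')
print(f'Lowest score: {lowest[1]} ("{lowest[3]}")')
print(f'Most commented: "{most_commented[3]}" with {most_commented[2]} comments')
```

If several posts tie, `max` and `min` return the first one they encounter. The same three calls could be dropped into `analyze` in place of the `index` lookup.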