In this assignment, we will analyze some real-world data using lists and tuples. Kaggle is a good resource for datasets that have been collected and are available for you to analyze. We are going to use a collection of posts from Reddit's r/VaccineMyths (note: Reddit is unfiltered and may contain offensive content) to demonstrate that we can perform statistical functions on lists.

What to do

Rewrite/finish the analyze function to print out:

the average number of comments across all posts
the average score across all posts
what the highest score is and the title for that post
what the lowest score is and the title for that post
what the most commented post is with its title and number of comments

Starting Resources

Here is the file you will need: reddit_vm.csv

The following is a starter program that you can run using Python on your own computer. Note that the CSV file must be in the same directory as your Python script for this to work correctly.

import csv

def analyze(entries):
    print(f'first entry: {entries[0]}')

with open("reddit_vm.csv", "r", encoding='UTF-8', errors="ignore") as input:
    entries = [(e['id'], int(e['score']), int(e['comms_num']), e['title']) for e in csv.DictReader(input)]
avgScore = analyze(entries)
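To make the shape of `entries` concrete, here is a minimal sketch of what the starter's list comprehension produces. It assumes a made-up two-row CSV held in memory instead of the real reddit_vm.csv; the sample rows and the `load_entries` helper name are hypothetical and only for illustration.

```python
import csv
import io

# Hypothetical sample data standing in for reddit_vm.csv; only the four
# columns the starter program reads are included.
SAMPLE_CSV = """id,score,comms_num,title
a1,7,3,First post
b2,12,0,Second post
"""

def load_entries(file_obj):
    """Build (id, score, comms_num, title) tuples, as the starter does."""
    return [
        (e['id'], int(e['score']), int(e['comms_num']), e['title'])
        for e in csv.DictReader(file_obj)
    ]

entries = load_entries(io.StringIO(SAMPLE_CSV))

# Each entry is a tuple, so its fields can be unpacked by position.
post_id, score, comms_num, title = entries[0]
print(f'{title!r} has a score of {score} and {comms_num} comments')
```

Position 1 of each tuple is the score, position 2 is the comment count, and position 3 is the title, which is why the attempt further down indexes with `entry[1]`, `entry[2]`, and `[3]`.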

Here's what I've got so far (I'm stuck on how to find the title of the high or low score posts):

import csv

# Introduce function
print('The following is an analysis of a collection of posts from the subreddit r/VaccineMyths.')

def analyze(entries):
    # Calculate average number of comments across all posts.
    total_comments = sum(entry[2] for entry in entries)
    avg_comments = total_comments / len(entries)
    print(f'The average number of comments across all posts is {avg_comments:.2f}')

    # Average score across all posts.
    total_score = sum(entry[1] for entry in entries)
    avg_score = total_score / len(entries)
    print(f'The average score across all posts is {avg_score:.2f}')

    # Highest score and title for that post.
    high_score = max(entry[1] for entry in entries)
    hs_index = entries.index(high_score)
    high_score_title = entries[hs_index][3]
    print(f'The post with the highest score was "{high_score_title}" with a score of {high_score}.')

    # Lowest score and title for that post.

    # Most commented post with its title and number of comments.

# Open and read file
with open("reddit_vm.csv", "r", encoding='UTF-8', errors="ignore") as input_file:
    entries = [(e['id'], int(e['score']), int(e['comms_num']), e['title']) for e in csv.DictReader(input_file)]

# Call analyze function
analyze(entries)
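One possible way past the sticking point, offered as a sketch rather than the expected solution: `entries.index(high_score)` raises a `ValueError` because `entries` holds tuples while `high_score` is a bare integer, so no element ever compares equal to it. Asking `max` and `min` for the whole tuple, with a `key` function that picks the field to compare, returns the score and the title together. The sample entries and the `extremes` helper name below are hypothetical.

```python
# Hypothetical sample entries in the same (id, score, comms_num, title) layout.
entries = [
    ('a1', 7, 3, 'First post'),
    ('b2', 12, 0, 'Second post'),
    ('c3', -2, 9, 'Third post'),
]

def extremes(entries):
    """Return the highest-scored, lowest-scored, and most-commented entries."""
    highest = max(entries, key=lambda entry: entry[1])          # compare by score
    lowest = min(entries, key=lambda entry: entry[1])           # compare by score
    most_commented = max(entries, key=lambda entry: entry[2])   # compare by comms_num
    return highest, lowest, most_commented

highest, lowest, most_commented = extremes(entries)
print(f'The post with the highest score was "{highest[3]}" with a score of {highest[1]}.')
print(f'The post with the lowest score was "{lowest[3]}" with a score of {lowest[1]}.')
print(f'The most commented post was "{most_commented[3]}" with {most_commented[2]} comments.')
```

If several posts tie, `max` and `min` return the first one encountered, so only one title is reported. The same three lines could sit inside `analyze` in place of the `index` lookup.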
