
Question

In this assignment, we will analyze some real-world data using lists and tuples. Kaggle is a good resource for datasets that have been collected and are available for you to analyze. We are going to use a collection of posts from Reddit's r/VaccineMyths (note: Reddit is unfiltered and may contain offensive content) to demonstrate that we can perform statistical functions on lists.

What to do

Rewrite/finish the analyze function to print out:

  • the average number of comments across all posts
  • the average score across all posts
  • what the highest score is and the title for that post
  • what the lowest score is and the title for that post
  • what the most commented post is with its title and number of comments

Starting Resources

Here is the file you will need: reddit_vm.csv

The following is a starter program that you can run using Python on your own computer. Note that the CSV file must be in the same directory as your Python script for this to work correctly.

import csv

def analyze(entries):
    print(f'first entry: {entries[0]}')

with open("reddit_vm.csv", "r", encoding='UTF-8', errors="ignore") as input:
    entries = [(e['id'], int(e['score']), int(e['comms_num']), e['title']) for e in csv.DictReader(input)]
    avgScore = analyze(entries)
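
To make the tuple layout concrete, here is a small sketch that runs the same comprehension over an in-memory CSV instead of reddit_vm.csv. The two sample rows and the helper name load_entries are made up for illustration; only the column names id, score, comms_num, and title come from the starter program.

```python
import csv
import io

# Made-up sample rows; the real file has many more rows and may have more columns.
SAMPLE_CSV = """id,score,comms_num,title
a1,7,2,First post
b2,-3,5,Second post
"""


def load_entries(handle):
    """Build (id, score, comms_num, title) tuples, as in the starter program."""
    return [
        (e['id'], int(e['score']), int(e['comms_num']), e['title'])
        for e in csv.DictReader(handle)
    ]


entries = load_entries(io.StringIO(SAMPLE_CSV))
print(f'first entry: {entries[0]}')
```

In each tuple, index 1 is the score, index 2 is the number of comments, and index 3 is the title.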

 

Here's what I've got so far (I'm stuck on how to find the title of the high or low score posts):

import csv

# Introduce function
print('The following is an analysis of a collection of posts from the subreddit r/VaccineMyths.')


def analyze(entries):
    # Calculate average number of comments across all posts.
    total_comments = sum(entry[2] for entry in entries)
    avg_comments = total_comments / len(entries)
    print(f'The average number of comments across all posts is {avg_comments:.2f}')

    # Average score across all posts.
    total_score = sum(entry[1] for entry in entries)
    avg_score = total_score / len(entries)
    print(f'The average score across all posts is {avg_score:.2f}')

    # Highest score and title for that post.
    high_score = max(entry[1] for entry in entries)
    hs_index = entries.index(high_score)
    high_score_title = entries[hs_index][3]
    print(f'The post with the highest score was "{high_score_title}" with a score of {high_score}.')

    # Lowest score and title for that post.

    # Most commented post with its title and number of comments.


# Open and read file
with open("reddit_vm.csv", "r", encoding='UTF-8', errors="ignore") as input_file:
    entries = [(e['id'], int(e['score']), int(e['comms_num']), e['title']) for e in csv.DictReader(input_file)]

# Call analyze function
analyze(entries)
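
One possible way past the sticking point: entries.index(high_score) searches the list for an element equal to the integer high_score, but every element is a tuple, so the call raises ValueError. Passing a key function to max and min returns the whole tuple instead, so the title comes along with the score. The sketch below assumes the tuple layout from the starter program; the helper name summarize and the sample data are hypothetical, and when several posts tie, the first one in the list is returned.

```python
def summarize(entries):
    """Return the extreme posts as whole tuples so titles stay attached."""
    highest = max(entries, key=lambda entry: entry[1])         # by score
    lowest = min(entries, key=lambda entry: entry[1])          # by score
    most_commented = max(entries, key=lambda entry: entry[2])  # by comment count
    return highest, lowest, most_commented


# Made-up sample data in (id, score, comms_num, title) order.
sample = [
    ('a1', 7, 2, 'First post'),
    ('b2', -3, 5, 'Second post'),
    ('c3', 12, 1, 'Third post'),
]

highest, lowest, most_commented = summarize(sample)
print(f'The post with the highest score was "{highest[3]}" with a score of {highest[1]}.')
print(f'The post with the lowest score was "{lowest[3]}" with a score of {lowest[1]}.')
print(f'The most commented post was "{most_commented[3]}" with {most_commented[2]} comments.')
```

Inside analyze, the same three calls can replace the index lookup, with the print statements reading the title from index 3 of each returned tuple.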