STA220_Assignment_Instructions

STA220H1 The Practice of Statistics I (Winter 2024)
Assignment Instructions

Due Date: March 24, 2024 at 11:59 on Crowdmark

Instructions

This is an individual assignment, and you are expected to work on it independently. You may discuss ideas and concepts with others, but please do not share your code or written answers. All code and written work must be your own. Because this assignment is fairly open-ended, the content of most of your work should not match that of your peers.

Submission Format and Instructions

Your final submission will be a PDF file, submitted on Crowdmark. There will be a different upload box for each question, so it is recommended that you place each question on a separate page or in a separate file. After uploading your file(s) on Crowdmark, you need to click the submit button to submit your work.

Your PDF file(s) must show (1) R code, (2) R output/figures, and (3) your written answers. Here are some suggested ways to create your final submission:
  • Use Microsoft Word to type out your answers. Screenshot your R output and place these images throughout the document. For the R code, either copy/paste it as text or screenshot it.
  • Use an app like Notability, OneNote, etc., where you can write/type your answers and include screenshots of your R code and output.
  • Use RMarkdown and knit to a PDF. Alternatively, you can knit to an HTML file and then save it as a PDF.

How you create the final file is up to you, as long as it is clear and organized. You don’t want the TA to be frustrated while marking your work!

Late or Missing Submissions

If the assignment is not submitted by the due date, it will be subject to a late penalty of 20% per day. No extensions will be provided for the assignment. If the assignment is missed due to an illness or personal emergency, please fill out the form listed in the syllabus.
Data for this Assignment

Consider the Spotify dataset provided for this assignment. This dataset contains a comprehensive list of the most famous songs of 2023 as listed on Spotify as of August 2023. It provides insights into each song's attributes, popularity, and presence on various music platforms. The following variables are provided in the data:
  • track_name: Name of the song
  • artist: Name of the artist(s) of the song
  • artist_count: Number of artists contributing to the song
  • released_year: Year when the song was released
  • released_month: Month when the song was released
  • released_day: Day of the month when the song was released
  • in_spotify_playlists: Number of Spotify playlists the song is included in
  • in_spotify_charts: Presence and rank of the song on Spotify charts
  • streams: Total number of streams on Spotify
  • in_apple_playlists: Number of Apple Music playlists the song is included in
  • in_apple_charts: Presence and rank of the song on Apple Music charts
  • in_deezer_playlists: Number of Deezer playlists the song is included in
  • in_deezer_charts: Presence and rank of the song on Deezer charts
  • in_shazam_charts: Presence and rank of the song on Shazam charts
  • bpm: Beats per minute, a measure of song tempo
  • key: Key of the song
  • mode: Mode of the song (major or minor)
  • danceability_percent: Percentage indicating how suitable the song is for dancing
  • valence_percent: Positivity of the song's musical content
  • energy_percent: Perceived energy level of the song
  • acousticness_percent: Amount of acoustic sound in the song
  • instrumentalness_percent: Amount of instrumental content in the song
  • liveness_percent: Presence of live performance elements
  • speechiness_percent: Amount of spoken words in the song
Tips with R

Depending on how you want to go about your assignment and code, here are a few tips that might help you out.

In this assignment, you may wish to create new binary variables based on existing variables. For example, the following code creates a new variable called “c_sharp” in the dataset “spotify_data”. It is equal to 1 if the key is C# and 0 otherwise.

    spotify_data$c_sharp <- as.numeric(spotify_data$key == 'C#')

In another example, the following code creates a new variable called “after_2020”. It is equal to 1 if the release year is after 2020 and 0 otherwise.

    spotify_data$after_2020 <- as.numeric(spotify_data$released_year > 2020)

You may also wish to remove observations that have an NA (i.e., a missing value) in a variable of interest. For example, this code creates a new dataset called “spotify_data2” in which all the observations where “c_sharp” is missing are removed.

    spotify_data2 <- spotify_data[!is.na(spotify_data$c_sharp), ]
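The tips above can be tried out on a small made-up data frame before applying them to the real data. The sketch below assumes a hypothetical data frame called `toy` (it is not the assignment's dataset, and its values are invented), with two column names borrowed from the variable list. It also illustrates that a missing `key` carries through as an NA in the derived indicator, which is when the filtering step becomes useful.

```r
# Hypothetical toy data standing in for spotify_data (values are made up)
toy <- data.frame(
  key           = c("C#", "A", NA, "C#", "G"),
  released_year = c(2019, 2021, 2022, 2020, 2023),
  stringsAsFactors = FALSE
)

# Indicator: 1 if the key is C#, 0 otherwise; NA where key is missing
toy$c_sharp <- as.numeric(toy$key == "C#")

# Indicator: 1 if released after 2020, 0 otherwise
toy$after_2020 <- as.numeric(toy$released_year > 2020)

# Keep only the rows where c_sharp is not missing
toy2 <- toy[!is.na(toy$c_sharp), ]

# Quick checks: counts per category (including NAs) and remaining rows
table(toy$c_sharp, useNA = "ifany")
nrow(toy2)
```

Tabulating a new indicator with `useNA = "ifany"` is a cheap way to confirm that it was coded as intended and to see how many observations would be dropped.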
Question 1 (22 marks)

Your task is to explore and analyze the Spotify dataset using graphical summaries created in R. Create a minimum of 3 properly labelled plots/graphs/figures that summarize some of the patterns in the data. It is recommended that you use at least 2 different types of plots. For each plot, write a paragraph explaining the patterns and/or trends you see in the plot. You will be graded not only on your ability to create plots in R, but also on your ability to use the plots to highlight important and/or compelling information in the data.

Question 2 (6 marks)

We are interested in the proportion of songs that are in a major key. Answer the following questions using the Spotify dataset. All calculations should be done in R; please show your code and output.
a) Test whether 50% of songs are in a major key using α = 0.1. State the hypotheses, calculate the Z-statistic and p-value using R, and state the relevant conclusion. (4 marks)
b) Compute a 99% confidence interval for the proportion of songs that are in a major key. You may use p ≈ 0.5. Interpret the interval. (2 marks)

Question 3 (6 marks)

We are interested in the average beats per minute of a song. Answer the following questions using the Spotify dataset. All calculations should be done in R; please show your code and output.
a) Test whether the average beats per minute of a song is equal to 124 bpm using α = 0.05. State the hypotheses, calculate the test statistic and p-value using R, and state the relevant conclusion. (4 marks)
b) Compute a 90% confidence interval for the average beats per minute of a song. Interpret the interval. (2 marks)
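As a general reminder of the mechanics behind tests like these, here is a minimal sketch of a one-sample z-test and confidence interval for a proportion, and a one-sample t-test for a mean, computed by hand in R. It runs on simulated stand-in data, not the assignment's dataset; the vectors `x` and `bpm_sim`, the seed, and the simulation parameters are all made up for illustration. The z-test uses the usual large-sample normal approximation with the null value in the standard error; your course notes may present the formulas slightly differently.

```r
# Simulated stand-in data (hypothetical; NOT the assignment's dataset)
set.seed(1)
x       <- rbinom(200, size = 1, prob = 0.55)  # 1 = "success", 0 = otherwise
bpm_sim <- rnorm(200, mean = 122, sd = 28)     # made-up numeric measurements

## One-sample z-test for a proportion: H0: p = p0 vs Ha: p != p0
p0    <- 0.5
n     <- length(x)
p_hat <- mean(x)                                 # sample proportion
z     <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)  # SE uses p0 under H0
p_val <- 2 * pnorm(-abs(z))                      # two-sided p-value

## Confidence interval for the proportion
conf    <- 0.99
z_star  <- qnorm(1 - (1 - conf) / 2)             # critical value
se_hat  <- sqrt(p_hat * (1 - p_hat) / n)         # plug-in standard error
ci_prop <- p_hat + c(-1, 1) * z_star * se_hat
# A conservative alternative replaces p_hat with 0.5 in the standard error
ci_cons <- p_hat + c(-1, 1) * z_star * sqrt(0.5 * 0.5 / n)

## One-sample t-test for a mean: H0: mu = mu0 vs Ha: mu != mu0
mu0    <- 124
m      <- length(bpm_sim)
t_stat <- (mean(bpm_sim) - mu0) / (sd(bpm_sim) / sqrt(m))
t_pval <- 2 * pt(-abs(t_stat), df = m - 1)       # two-sided p-value

# The built-in t.test() should agree with the manual calculation,
# and also reports a confidence interval at the requested level
tt <- t.test(bpm_sim, mu = mu0, conf.level = 0.90)
c(manual = t_stat, builtin = unname(tt$statistic))
tt$conf.int
```

Computing the statistic by hand and then comparing it with `t.test()` is a simple consistency check; in each case the conclusion comes from comparing the p-value with the chosen significance level.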
Grading

Question 1 will be graded based on the rubric below. Questions 2 and 3 will be graded based on correctness.

Plots (10 marks)
  • Inadequate (0-4 marks): Does not meet the requirement of 3+ plots.
  • Fair (5-6 marks): Required plots are provided, but the plots do not highlight the important information or show a variety of trends. The types of plots chosen are not ideal for the situation.
  • Good (7-8 marks): Required plots are provided and mostly show that the student is able to create a plot relevant for the situation. A variety of plots are used, and plots are labelled properly.
  • Excellent (9-10 marks): Required plots are provided, and a lot of thought was put into creating them. Plots are interesting, compelling, and communicate well to the viewer.

Written descriptions (10 marks)
  • Inadequate (0-4 marks): Written descriptions are not sufficient in describing the plots. Writing is unclear.
  • Fair (5-6 marks): Written descriptions are provided but contain major errors. The descriptions do not accurately describe the plots or are misleading. Writing is somewhat unclear.
  • Good (7-8 marks): Written descriptions are provided and show that the student is able to properly interpret plots. Writing is generally clear.
  • Excellent (9-10 marks): Written descriptions are excellent. The student exceeds expectations and highlights the important trends within the data. Writing is clear and compelling.

R code (2 marks)
  • 0 marks: R code is not shown.
  • 1 mark: R code is provided but is difficult to follow and/or has major errors.
  • 2 marks: R code is provided. A variety of plot types are shown.