Exploring a Comprehensive Dataset of 2023's Top Songs

Dataset Exploration Project Part 1
BDAT1005-23F Mathematics for Data Analytics
Submitted By: Darpan Aryal
Student ID: 200569576
Submitted To: Prof. Eshan Pourjavad

About Dataset: The dataset used for this project was taken from Kaggle. According to Spotify, it is a comprehensive list of the most popular songs of 2023. It provides many features that are not generally present in similar datasets, and it offers information about the attributes, popularity, and presence of each song across different music platforms. The details include track name, artist name(s), release date, Spotify playlists and charts, streaming statistics, Apple Music presence, Deezer presence, Shazam charts, and numerous audio attributes [Kag23]. In line with the project requirements, the dataset includes 24 variables and more than 1,000 records.

Assumptions of the Dataset: When working with a dataset, it is critical to be aware of any assumptions that must be made about it. The following assumptions apply to this music dataset.

1) Data Collection: The data are assumed to have been gathered correctly and fairly. Understanding the data's origin and any possible mistakes in the collection process is important, because knowing how and when the data were collected makes it possible to evaluate their reliability and quality.
2) Sample vs. Population: The dataset appears to contain only a sample of all the music tracks currently in existence. Because it is frequently impossible to collect data on every member of a population, this assumption is common in data analysis.
3) Units of Measurement: It is critical to define and fully understand the unit of measurement of each variable in the dataset. For instance, "streams" is taken to mean the total number of streams on a streaming music service, "bpm" stands for beats per minute, and percentages such as "valence_%" and "danceability_%" should be measured on a scale of 0 to 100. Understanding the units ensures that the variables are interpreted accurately.

4) Data Integrity: The data are assumed to be accurate and complete. However, data quality problems such as missing values or anomalous entries can occur, and data cleaning or imputation may be necessary to resolve them.
5) Categorical Variables: The categories of categorical variables, such as "key" and "mode", are assumed to be clearly specified and to follow accepted conventions in music theory. It is critical to confirm that the categories reliably represent the recordings.
6) Time Assumptions: The release dates are assumed to accurately reflect when the tracks became publicly available. However, there may be inconsistencies, such as pre-release promotion or re-releases of earlier songs.
7) Accuracy of Popularity Analytics: Variables linked to popularity (such as "in_spotify_charts" and "streams") rely on the idea that these metrics accurately measure how well liked a song is. These measurements may nevertheless be influenced by factors such as marketing campaigns and other outside circumstances, so they may not always reflect the quality of the music.
8) Creators: For songs with multiple artists ("artist_count" > 1), the dataset is assumed to adequately represent the collaboration, including the role each artist played in the track. The type of collaboration may differ, which may affect the analysis.
9) Genre and Listener Groups: The dataset does not appear to contain explicit data on music genres or listener groups. Additional information may be needed for any analysis involving audience characteristics or genre preferences.
10) Sampling Errors: There may be sampling errors if the dataset was not created using random sampling. For instance, if it consists mostly of hit songs, it may not adequately represent lesser-known or up-and-coming musicians.
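
Some of the assumptions above (units of measurement, data integrity, and categorical variables) can be checked directly against the data. The following is a minimal sketch in Python using only the standard library, not the method used in this project. The three sample rows are hypothetical and are included only so the example runs on its own. The helper name check_rows, the "track_name" column, and the accepted spellings for "key" and "mode" are assumptions that should be confirmed against the actual Kaggle file.

```python
import csv
import io

# Hypothetical sample rows for illustration only; NOT taken from the real dataset.
SAMPLE_CSV = """track_name,streams,bpm,key,mode,danceability_%,valence_%
Song A,1000000,120,C#,Major,80,65
Song B,,95,,Minor,55,40
Song C,250000,140,H,Major,130,20
"""

# Columns expected to lie on a 0-100 scale (assumption 3).
PERCENT_COLUMNS = ["danceability_%", "valence_%"]

# Assumed category spellings (assumption 5); confirm against the actual file.
VALID_KEYS = {"C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"}
VALID_MODES = {"Major", "Minor"}


def check_rows(rows):
    """Return a dict of data-quality findings for a list of row dicts."""
    report = {"missing": {}, "percent_out_of_range": [], "bad_category": []}
    for index, row in enumerate(rows):
        # Assumption 4: count empty cells per column.
        for column, value in row.items():
            if value is None or value.strip() == "":
                report["missing"][column] = report["missing"].get(column, 0) + 1
        # Assumption 3: percentage columns must fall between 0 and 100.
        for column in PERCENT_COLUMNS:
            value = (row.get(column) or "").strip()
            if value and not 0 <= float(value) <= 100:
                report["percent_out_of_range"].append((index, column, value))
        # Assumption 5: non-empty categories must use a recognised label.
        key = (row.get("key") or "").strip()
        if key and key not in VALID_KEYS:
            report["bad_category"].append((index, "key", key))
        mode = (row.get("mode") or "").strip()
        if mode and mode not in VALID_MODES:
            report["bad_category"].append((index, "mode", mode))
    return report


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
report = check_rows(rows)
print(report)
```

To run the same checks on the real data, the io.StringIO(SAMPLE_CSV) source could be replaced with an open file handle for the downloaded CSV. Rows flagged by the report would then be candidates for the cleaning or imputation mentioned under assumption 4.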
