Investigation 2: Songs from Spotify – Genre and Mode (revisited)
From our first Data Analysis Assignment, a sample of 3439 songs was selected from the Spotify
database, and twelve variables were measured for each song. In Data Analysis 1, Investigation 1,
we analyzed the Genre and Mode of each song. Remember, the Mode variable indicates the type of
scale (Major or Minor) from which the song’s melodic content is derived. Let us now investigate
whether the proportion of Rock songs differs between Major songs and Minor songs. In the random
sample of songs, there were 1964 Major songs, of which 365 were Rock songs, and 1475 Minor
songs, of which 147 were Rock songs.

a) Calculate and label the two sample proportions separately, rounding each value to four
decimal places. Next, calculate the difference between these sample proportions by
subtracting (Major – Minor). Type these calculations, label each one, and present each
value in your solutions document.
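A quick way to check the arithmetic in part (a) is a few lines of Python. This sketch uses only the counts given above (365 of 1964 Major songs and 147 of 1475 Minor songs are Rock); the variable names are illustrative. Following the instructions, it subtracts the rounded proportions. Note that subtracting the unrounded proportions and rounding afterward gives 0.0862 rather than 0.0861, so the fourth decimal depends on when you round.

```python
# Counts from the sample described above
major_rock, major_n = 365, 1964   # Group 1: Major songs
minor_rock, minor_n = 147, 1475   # Group 2: Minor songs

# Sample proportions of Rock songs within each Mode, rounded to four decimals
p_hat_major = round(major_rock / major_n, 4)
p_hat_minor = round(minor_rock / minor_n, 4)

# Difference (Major - Minor) from the rounded proportions;
# the outer round() removes floating-point noise from the subtraction
diff_rounded = round(p_hat_major - p_hat_minor, 4)

# For comparison: subtract the unrounded proportions first, then round
diff_unrounded = round(major_rock / major_n - minor_rock / minor_n, 4)

print(f"p-hat (Major) = {p_hat_major}")
print(f"p-hat (Minor) = {p_hat_minor}")
print(f"p-hat (Major) - p-hat (Minor) = {diff_rounded}")
print(f"difference without intermediate rounding = {diff_unrounded}")
```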

b) What is the parameter of interest? Use words and symbol(s) in context in your answer.
Define any subscripts that you use.
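One way to write the notation for part (b), assuming subscript 1 = Major and subscript 2 = Minor (the same convention the assignment uses in part (j)); the sample statistic from part (a) estimates this parameter:

```latex
\begin{aligned}
p_1 &= \text{proportion of all Major songs in the Spotify database that are Rock} \\
p_2 &= \text{proportion of all Minor songs in the Spotify database that are Rock} \\
\text{Parameter of interest} &: \; p_1 - p_2 \\
\text{Sample statistic} &: \; \hat{p}_1 - \hat{p}_2 = \frac{365}{1964} - \frac{147}{1475}
\end{aligned}
```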

c) Create a bootstrap distribution by following these instructions. In StatKey, under the
middle pane labeled ‘Bootstrap Confidence Intervals’, click ‘CI for Difference in
Proportions’. Click ‘Edit Data’, then enter the count and sample size for each group.
Make Group 1 Major and Group 2 Minor. Next, click ‘Generate 1000 Samples’. Take a
screenshot of your bootstrap distribution, including the mean and standard error, and paste
it into your solutions document.
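StatKey does the resampling for you, but the idea behind part (c) can be sketched in Python. This is a minimal sketch, assuming the bootstrap amounts to resampling each group with replacement at its original size; the helper name `bootstrap_diffs`, the seed, and the use of Python's standard `random` module are choices made here (hypothetical, not StatKey's code), so the mean and standard error will differ slightly from your screenshot.

```python
import random
import statistics

def bootstrap_diffs(count1, n1, count2, n2, reps=1000, seed=1):
    """Bootstrap distribution of p-hat1 - p-hat2 (hypothetical helper).

    Each group is rebuilt as a list of 1s (Rock) and 0s (not Rock) and
    resampled with replacement at its original size.
    """
    rng = random.Random(seed)
    group1 = [1] * count1 + [0] * (n1 - count1)
    group2 = [1] * count2 + [0] * (n2 - count2)
    diffs = []
    for _ in range(reps):
        p1 = sum(rng.choices(group1, k=n1)) / n1
        p2 = sum(rng.choices(group2, k=n2)) / n2
        diffs.append(p1 - p2)
    return diffs

# Group 1 = Major (365 Rock of 1964), Group 2 = Minor (147 Rock of 1475)
diffs = bootstrap_diffs(365, 1964, 147, 1475)
boot_mean = statistics.mean(diffs)
boot_se = statistics.stdev(diffs)   # bootstrap standard error
print(f"mean = {boot_mean:.4f}, SE = {boot_se:.4f}")
```

The mean should sit close to the observed difference, because the bootstrap resamples from the observed groups rather than from a null model.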

d) Describe the shape of the bootstrap distribution in a complete sentence.

e) Construct a 95% bootstrap confidence interval using the original sample statistic and the
±2SE method (statistic ± 2·SE). Show all work and present your answer as (lower value, upper value).
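The ±2SE arithmetic can be written as a small function. In this sketch the standard error is a placeholder: it uses the usual large-sample formula sqrt(p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2) as a stand-in, which should be close to, but is not the same as, the bootstrap SE on your StatKey screenshot. The function name `two_se_interval` is hypothetical; for the assignment, substitute the SE that StatKey reports.

```python
import math

def two_se_interval(statistic, se):
    """Return (lower, upper) = statistic -/+ 2 * SE."""
    return statistic - 2 * se, statistic + 2 * se

# Observed difference from part (a), Major - Minor
stat = 0.0861

# Stand-in SE from the large-sample formula (an assumption; replace it
# with the bootstrap SE reported by StatKey)
p1, n1 = 365 / 1964, 1964
p2, n2 = 147 / 1475, 1475
se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

lower, upper = two_se_interval(stat, se)
print(f"SE = {se:.4f}, 95% CI = ({lower:.4f}, {upper:.4f})")
```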

f) Interpret the meaning of the confidence interval you obtained in part (e) in context.

g) Does your 95% confidence interval capture 0? Based on your answer, what can we infer
about whether a difference exists between the Modes? Answer these questions in the
context of the problem in one or two sentences.

h) Using your bootstrap distribution from part (c), construct a 90% confidence interval using
the percentile method. Go to the top left corner of the distribution, click ‘Two-Tail’,
and then enter the tail proportions that correspond to the 10% significance level. Present a
screenshot of your bootstrap distribution (with all five blue boxes visible) and write your
answer as (lower value, upper value).
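The percentile method in part (h) keeps the middle 90% of the bootstrap statistics, cutting 5% from each tail. A minimal sketch, assuming a plain resampling loop with a fixed seed (reproducible, but it will not match StatKey exactly) and reading off the 5th and 95th percentiles with Python's `statistics.quantiles`; StatKey may compute percentiles slightly differently.

```python
import random
import statistics

rng = random.Random(2)
major = [1] * 365 + [0] * (1964 - 365)   # 1 = Rock, 0 = not Rock
minor = [1] * 147 + [0] * (1475 - 147)

# 1000 bootstrap differences, Major - Minor
diffs = [
    sum(rng.choices(major, k=len(major))) / len(major)
    - sum(rng.choices(minor, k=len(minor))) / len(minor)
    for _ in range(1000)
]

# n=20 gives 19 cut points at 5%, 10%, ..., 95%; keep the first and last
cuts = statistics.quantiles(diffs, n=20)
lower, upper = cuts[0], cuts[-1]
print(f"90% percentile CI = ({lower:.4f}, {upper:.4f})")
```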

i) Does your 90% confidence interval capture 0? Based on your answer, what can we infer
about whether a difference exists between the Modes? Answer these questions in the
context of the problem in one or two sentences.

j) If the analyst were testing whether a difference exists between the proportion of Major
songs that are Rock and the proportion of Minor songs that are Rock, state the null and
alternative hypotheses using correct notation. Consider Major as Population 1 and Minor
as Population 2.
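With Major as Population 1 and Minor as Population 2, two-sided hypotheses for part (j) can be typeset as follows, where p_1 and p_2 denote the proportions of Rock songs among all Major and all Minor songs, respectively:

```latex
\begin{aligned}
H_0 &: p_1 = p_2 \quad (\text{equivalently } p_1 - p_2 = 0) \\
H_a &: p_1 \neq p_2 \quad (\text{equivalently } p_1 - p_2 \neq 0)
\end{aligned}
```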

k) Create a randomization distribution by following these instructions. In StatKey, go to the
right pane labeled ‘Randomization Hypothesis Tests’ and click ‘Test for Difference in
Proportions’. Click ‘Edit Data’, enter the count and sample size for each group, and
click ‘Generate 1000 Samples’. Take a screenshot of your distribution and paste it into
your solutions document.
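A randomization distribution for a difference in proportions can be sketched by pooling both groups (assuming, as the null hypothesis does, that Mode makes no difference), shuffling, and re-splitting into groups of the original sizes. The sketch below follows that idea with Python's `random` module and a fixed seed; it illustrates the technique and is not StatKey's implementation, so the exact numbers will differ from your screenshot.

```python
import random
import statistics

rng = random.Random(3)
n_major, n_minor = 1964, 1475
total_rock = 365 + 147                      # 512 Rock songs overall

# Pool all songs: 1 = Rock, 0 = not Rock (Mode labels are ignored under H0)
pool = [1] * total_rock + [0] * (n_major + n_minor - total_rock)

rand_diffs = []
for _ in range(1000):
    rng.shuffle(pool)                       # reassign songs to groups at random
    rock_major = sum(pool[:n_major])        # first 1964 play the role of Major
    rock_minor = total_rock - rock_major    # the remaining 1475 are Minor
    rand_diffs.append(rock_major / n_major - rock_minor / n_minor)

print(f"mean = {statistics.mean(rand_diffs):.4f}, "
      f"SE = {statistics.stdev(rand_diffs):.4f}")
```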

l) Why is your randomization distribution centered at zero? Answer in one sentence.

m) Calculate the p-value from your randomization distribution using the observed statistic you
calculated in part 2(a). First, click the ‘Right Tail’ button and enter the value of your
observed statistic in the blue box below the x-axis. Next, click the ‘Left Tail’ button and
enter the negative of your observed statistic in the blue box below the x-axis (to the
left of zero). Then, if necessary, readjust the lower blue box to the right of zero so that it
correctly displays the value of the observed statistic. Finally, add the values in the two
blue boxes above their corresponding red x’s to obtain the p-value.
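Adding the right-tail and left-tail proportions, as described above, is the same as counting the randomization statistics at least as far from zero as the observed difference. A minimal sketch, assuming an observed statistic of 0.0861 from part (a) and regenerating a seeded randomization distribution (so the counts will not match StatKey exactly):

```python
import random

rng = random.Random(4)
n_major, n_minor = 1964, 1475
total_rock = 365 + 147                           # 512 Rock songs overall
pool = [1] * total_rock + [0] * (n_major + n_minor - total_rock)

# Randomization distribution of p-hat(Major) - p-hat(Minor) under H0
rand_diffs = []
for _ in range(1000):
    rng.shuffle(pool)
    rock_major = sum(pool[:n_major])
    rand_diffs.append(rock_major / n_major - (total_rock - rock_major) / n_minor)

observed = 0.0861                                # Major - Minor, from part (a)
right_tail = sum(d >= observed for d in rand_diffs) / len(rand_diffs)
left_tail = sum(d <= -observed for d in rand_diffs) / len(rand_diffs)
p_value = right_tail + left_tail                 # two-tailed p-value
print(f"right = {right_tail}, left = {left_tail}, p-value = {p_value}")
```

If none of the 1000 randomization statistics reaches the observed value, the estimate is 0 out of 1000; that is better reported as a very small p-value (roughly below 0.001) than as exactly zero.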

n) Is this p-value significant at the 10% significance level? Is it significant at the 5%
significance level? Compare these answers with your answers to parts (g) and (i) in two
complete sentences.