Assignment 2 Solved(1)
School: University Of Connecticut
Course: 5604
Subject: Statistics
Date: Feb 20, 2024
Type: docx
Pages: 8
Uploaded by EarlMusicCrab33
Part 1
– Refer to page 89 of your textbook and answer the following problems. Use the attached RidingMowers and LaptopSalesJanuary2008 datasets to answer the following questions. You may complete this part in either JMP or Python.
3.2 a – Most of the owners (the blue dots) fall in the top right area of the data space, so people seem more likely to own a riding lawnmower when they have a larger lot and a higher income. This makes sense because riding lawnmowers are more expensive than push mowers, so a higher income would be required to justify that expensive purchase. Also, riding lawnmowers are more useful on a larger lot. If a lot is too small, it might be hard to navigate a riding lawnmower in the small space. In fact, when the lot is smaller than 16,000 sqft, nobody owns a riding lawnmower. When the lot is large, you can relax and ride your mower around instead of doing the hard work of pushing it across the yard. The data show that everyone with a lot size over 21,000 sqft owns a riding lawnmower.
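The two lot-size observations above can also be checked numerically rather than read off the scatterplot. The following is only a minimal sketch: the field names `lot_size` and `ownership`, the label `"Owner"`, the assumption that lot size is stored in thousands of square feet, and the sample rows are all made up for illustration and are not taken from the RidingMowers file.

```python
def owner_share(rows, lot_min=float("-inf"), lot_max=float("inf")):
    """Return the fraction of owners among rows with lot_min <= lot_size < lot_max."""
    in_range = [r for r in rows if lot_min <= r["lot_size"] < lot_max]
    if not in_range:
        return None  # no households fall in this range
    owners = sum(1 for r in in_range if r["ownership"] == "Owner")
    return owners / len(in_range)


# Made-up illustrative rows (NOT the real dataset); lot_size in thousands of sqft.
sample = [
    {"lot_size": 14.8, "ownership": "Nonowner"},
    {"lot_size": 15.6, "ownership": "Nonowner"},
    {"lot_size": 18.4, "ownership": "Owner"},
    {"lot_size": 19.2, "ownership": "Nonowner"},
    {"lot_size": 21.6, "ownership": "Owner"},
    {"lot_size": 23.0, "ownership": "Owner"},
]

small_lots = owner_share(sample, lot_max=16.0)  # lots under 16,000 sqft
large_lots = owner_share(sample, lot_min=21.0)  # lots of 21,000 sqft or more
print(small_lots, large_lots)
```

With the real file, the same function could be applied to rows read with `csv.DictReader`, after converting the lot-size column to numbers and adjusting the field names to match the file.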
3.3 a – The store in postcode N17 6OA has the highest average retail price, 495. The store in postcode W4 3PH has the lowest average retail price, 481.
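For readers working in Python, the per-store averages can be computed with a simple group-and-average step. This is a hedged sketch only: the field names `store_postcode` and `retail_price`, the placeholder store labels, and the sample prices are assumptions invented for illustration, not values from the LaptopSalesJanuary2008 file.

```python
from collections import defaultdict


def mean_price_by_store(rows):
    """Map each store postcode to the mean retail price of its sales."""
    totals = defaultdict(lambda: [0.0, 0])  # postcode -> [sum of prices, count]
    for r in rows:
        acc = totals[r["store_postcode"]]
        acc[0] += r["retail_price"]
        acc[1] += 1
    return {store: total / count for store, (total, count) in totals.items()}


# Made-up illustrative rows (NOT the real dataset); store labels are placeholders.
sample = [
    {"store_postcode": "STORE-A", "retail_price": 500.0},
    {"store_postcode": "STORE-A", "retail_price": 490.0},
    {"store_postcode": "STORE-B", "retail_price": 480.0},
    {"store_postcode": "STORE-B", "retail_price": 470.0},
    {"store_postcode": "STORE-C", "retail_price": 485.0},
]

means = mean_price_by_store(sample)
highest = max(means, key=means.get)  # store with the highest average price
lowest = min(means, key=means.get)   # store with the lowest average price
print(highest, means[highest], lowest, means[lowest])
```

Reporting both the store and its mean, as done here, makes it easy to see that the gap between the highest and lowest stores may be small relative to the prices themselves.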
Part 2
– Continue working in the LaptopSalesJanuary2008 dataset to answer the following questions. You may complete this part in either JMP or Python.
1. Assuming the dataset includes all laptop models sold by the stores, would you want to use the Screen Size column to predict Retail Price? Why or why not? Justify your choice with an appropriate visualization.
When I look at the distribution of the Screen Size column, I see that every value is the same, exactly 15 inches; both the max and min are 15 inches. Because screen size has no variability, it can’t explain any differences in price. When a column is all one value, we don’t use it for modeling.
2. What has a bigger impact on Retail Price – RAM or Processor Speed? Make comparative box plots to support your answer. Include screen shots of the visualizations you made. (Hint: For boxplots you should have one continuous variable and one categorical variable.)
When looking at the boxplots of the distribution of retail price grouped by RAM, we can see that laptops with 2 GB of RAM are generally priced higher than laptops with 1 GB of RAM. The median price of laptops with 2 GB of RAM is 500, compared to 470 for laptops with 1 GB of RAM. The quartiles, min, and max are all higher when the RAM is 2 GB.
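A comparative box plot draws five numbers per group (min, first quartile, median, third quartile, max), so the comparison can be sketched numerically as well. The code below is a minimal illustration under stated assumptions: the field names `ram_gb` and `retail_price` and the sample prices are made up, and the "inclusive" quartile method is one of several conventions, so the quartiles may differ slightly from what JMP reports.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the values a box plot draws."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


def summaries_by_group(rows, group_key, value_key):
    """Compute the five-number summary of value_key within each group_key level."""
    groups = {}
    for r in rows:
        groups.setdefault(r[group_key], []).append(r[value_key])
    return {g: five_number_summary(v) for g, v in groups.items()}


# Made-up illustrative rows (NOT the real dataset).
sample = (
    [{"ram_gb": 1, "retail_price": p} for p in (450, 460, 470, 480, 490)]
    + [{"ram_gb": 2, "retail_price": p} for p in (480, 490, 500, 510, 520)]
)

summaries = summaries_by_group(sample, "ram_gb", "retail_price")
median_gap = summaries[2][2] - summaries[1][2]  # difference between group medians
print(summaries, median_gap)
```

Running the same function with a different categorical column as `group_key` gives a second median gap, and comparing the two gaps is a rough numeric counterpart to comparing the two sets of box plots by eye.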
When looking at the boxplots of the distribution of retail price grouped by processor speed, we can see that the medians are almost the same; they differ by only 15. The higher processor speed group is priced higher at each key point in the box plot, but only by a little. Comparing the two variables as potential predictors, RAM drives more differentiation in price than processor speed, so I would expect RAM to be the better predictor.
3. Check the correlations among Configuration, Retail Price, and CustomerStoreDistance. Make three observations about the correlations beyond just the numerical value. What do the numbers and patterns indicate about the meaning of the data? Include a screenshot of the correlations and scatterplot matrices.
Here are some observations that occur to me from the output above:
- The configuration variable is not truly continuous. That is why you see the banding in the scatterplots that include the configuration column. There is data in the ranges 1-80, 145-224, and 289-368.
- As the configuration number increases, the laptop probably also has higher-end features, which drive up price. A little extra exploration shows that the configuration ranges are based on battery life.
- At first I thought that customers seemed a little more willing to travel farther to purchase the higher battery life laptop configurations. This is a very weak pattern though, as evidenced by the 0.0021 correlation. I just noticed that there seem to be a few more dots on the right side of the scatterplot for the highest group of configurations. So I binned the configurations into the ranges that indicate the different battery life and RAM groupings and then looked at the distribution of the distance traveled for each. There is no noteworthy pattern here, so the configuration does not appear to influence the distance people travel to purchase.
- Price doesn’t seem to be a factor in how far people travel to purchase a laptop. They travel about the same distances at all price points. To show this further, I binned the distances (assuming that within each range the person’s perception of the distance isn’t that different) and then looked at the distribution of the price for each distance bin. The only variability that appears is at the upper end, where the data gets very thin.
NOTE: The extra visualizations I made to justify my answers for this question aren’t necessary for this assignment. Just interpreting the correlations and scatterplot matrices is enough. I just wanted to go ahead with the next step to show you how I approach what I’m seeing.
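The two steps used in these observations, checking a correlation and then binning one variable to compare the other across bins, can be sketched in Python as follows. This is an illustration under assumptions, not the workflow used for the screenshots: the helper names `pearson` and `median_by_bin`, the bin edges, and the distance and price values are all made up.

```python
import math
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def median_by_bin(xs, ys, edges):
    """Group ys by the half-open bin [lo, hi) that x falls in; return the median y per bin."""
    bins = {}
    for x, y in zip(xs, ys):
        for lo, hi in zip(edges, edges[1:]):
            if lo <= x < hi:
                bins.setdefault((lo, hi), []).append(y)
                break  # each x belongs to at most one bin
    return {b: statistics.median(v) for b, v in bins.items()}


# Made-up illustrative values (NOT the real dataset).
distance = [1, 2, 3, 4]
price = [480, 500, 500, 480]

r = pearson(distance, price)                          # near zero: no linear relationship
medians = median_by_bin(distance, price, [0, 2.5, 5])  # median price per distance bin
print(r, medians)
```

A correlation near zero together with similar medians in every bin is the numeric version of the "no noteworthy pattern" conclusion. Bins with very few observations should be read cautiously, since their medians can swing on a handful of rows.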