Assignment 2 Solved

School

University of Connecticut

Course

5604

Subject

Statistics

Date

Feb 20, 2024

Uploaded by EarlMusicCrab33

Part 1 – Refer to page 89 of your textbook and answer the following problems. Use the attached RidingMowers and LaptopSalesJanuary2008 datasets to answer the following questions. You may complete this part in either JMP or Python.
3.2 a – Most of the owners (indicated by the blue dots) fall in the top right area of the data space, so people seem more likely to own a riding lawnmower when they have a larger lot and a higher income. This makes sense because riding lawnmowers are more expensive than push mowers, so a higher income makes that purchase easier to justify. Riding lawnmowers are also more useful on a larger lot. If a lot is too small, it can be hard to navigate a riding lawnmower in the space. In fact, when the lot is smaller than 16,000 sqft, nobody owns a riding lawnmower. When the lot is large, you can relax and ride your mower around instead of doing the hard work of pushing it across the yard. The data show that everyone with a lot size over 21,000 sqft owns a riding lawnmower.
3.3 a – The store in postcode N17 6QA has the highest average retail price of 495. The store in postcode W4 3PH has the lowest average retail price of 481.
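The answer to 3.3 a comes down to averaging retail price per store and picking the extremes. Below is a minimal sketch of that step in plain Python. The column names `Store Postcode` and `Retail Price` are assumptions about the dataset's headers, and the rows are hypothetical toy values, not data from LaptopSalesJanuary2008.

```python
from collections import defaultdict
from statistics import mean


def mean_by_group(rows, group_key, value_key):
    """Return {group: mean of value_key} for a list of dict rows."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(float(row[value_key]))
    return {group: mean(values) for group, values in groups.items()}


# Hypothetical toy rows (NOT from the real dataset); column names are assumed.
rows = [
    {"Store Postcode": "A", "Retail Price": 500},
    {"Store Postcode": "A", "Retail Price": 490},
    {"Store Postcode": "B", "Retail Price": 480},
    {"Store Postcode": "B", "Retail Price": 470},
    {"Store Postcode": "C", "Retail Price": 485},
]

averages = mean_by_group(rows, "Store Postcode", "Retail Price")
highest = max(averages, key=averages.get)  # store with the highest average price
lowest = min(averages, key=averages.get)   # store with the lowest average price
print("highest:", highest, averages[highest])
print("lowest:", lowest, averages[lowest])
```

With the real file, the same function could be fed rows from `csv.DictReader`, provided the header names match.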
Part 2 – Continue working in the LaptopSalesJanuary2008 dataset to answer the following questions. You may complete this part in either JMP or Python.
1. Assuming the dataset includes all laptop models sold by the stores, would you want to use the Screen Size column to predict Retail Price? Why or why not? Justify your choice with an appropriate visualization.
When I look at the distribution of the Screen Size column, I see that every value is the same: exactly 15 inches (both the min and max are 15). Because screen size has no variability, it cannot explain any differences in price, so I would not use it as a predictor. A column with a single value is dropped before modeling.
2. What has a bigger impact on Retail Price – RAM or Processor Speed? Make comparative box plots to support your answer. Include screenshots of the visualizations you made. (Hint: For boxplots you should have one continuous variable and one categorical variable.)
Looking at the boxplots of retail price grouped by RAM, laptops with 2 GB of RAM are generally priced higher than laptops with 1 GB of RAM. The median price with 2 GB of RAM is 500, compared to 470 with 1 GB. The quartiles, min, and max are all higher when the RAM is 2 GB.
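The comparison read off the boxplots can also be checked numerically by computing the five-number summary of price within each category and comparing the gap between group medians. The sketch below does this with the standard library. The column names `RAM (GB)` and `Processor Speeds (GHz)` are assumptions, the rows are hypothetical toy values rather than the real data, and the "inclusive" quartile method is one of several conventions, so quartiles may differ slightly from what JMP reports.

```python
from collections import defaultdict
from statistics import quantiles


def five_number_summary(values):
    """Min, Q1, median, Q3, max: the points a box plot draws."""
    ordered = sorted(values)
    q1, med, q3 = quantiles(ordered, n=4, method="inclusive")
    return {"min": ordered[0], "q1": q1, "median": med,
            "q3": q3, "max": ordered[-1]}


def summary_by_group(rows, group_key, value_key):
    """Five-number summary of value_key within each level of group_key."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(float(row[value_key]))
    return {g: five_number_summary(v) for g, v in groups.items()}


def median_gap(rows, group_key, value_key):
    """Difference between the largest and smallest group median."""
    medians = [s["median"] for s in
               summary_by_group(rows, group_key, value_key).values()]
    return max(medians) - min(medians)


# Hypothetical toy rows (NOT from the real dataset); column names are assumed.
columns = ("RAM (GB)", "Processor Speeds (GHz)", "Retail Price")
rows = [dict(zip(columns, r)) for r in [
    (1, 1.5, 440), (1, 2.0, 450), (1, 1.5, 460), (1, 2.0, 470),
    (2, 1.5, 480), (2, 2.0, 490), (2, 1.5, 500), (2, 2.0, 510),
]]

ram_gap = median_gap(rows, "RAM (GB)", "Retail Price")
speed_gap = median_gap(rows, "Processor Speeds (GHz)", "Retail Price")
print("median gap by RAM:", ram_gap)
print("median gap by processor speed:", speed_gap)
```

A larger median gap for one variable than for the other is the numeric counterpart of boxes that sit further apart in the plot.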
Looking at the boxplots of retail price grouped by processor speed, the medians are almost the same, differing by only 15. The higher processor speed is higher at each key point in the box plot (min, quartiles, median, max), but only by a little. Comparing the two variables as potential predictors, RAM drives more differentiation in price than processor speed (a median gap of 30 versus 15), so I would expect RAM to be the better predictor.
3. Check the correlations among Configuration, Retail Price, and CustomerStoreDistance. Make three observations about the correlations beyond just the numerical value. What do the numbers and patterns indicate about the meaning of the data? Include a screenshot of the correlations and scatterplot matrices.
Here are some observations that occur to me from the output above:
- The configuration variable is not truly continuous. That is why you see the banding in the scatterplots that include the configuration column. There is data in the ranges 1-80, 145-224, and 289-368.
- As the configuration number increases, the laptop probably also has higher-end features, which drive up the price. A little extra exploration shows that the configuration ranges are based on battery life.
- At first I thought that customers seemed a little more willing to travel further to purchase the higher battery life laptop configurations. This is a very weak pattern, though, as evidenced by the 0.0021 correlation; I was just noticing a few more dots on the right side of the scatterplot for the highest group of configurations. So I binned the configurations into the ranges that indicate the different battery life and RAM groupings and then looked at the distribution of the distance traveled for each bin. There is no noteworthy pattern there, so the configuration does not appear to influence the distance people travel to purchase.
- Price doesn’t seem to be a factor in how far people travel to purchase a laptop. They will travel about the same distances at all price points. To show this relationship further, I binned the distances (assuming that within each range the person’s perception of the distance isn’t that different) and then looked at the distribution of the price for each distance bin. The only variability that appears is at the upper end, where the data gets very thin.
NOTE: Where I did an extra visualization to justify my answers for this question, that isn’t necessary for this assignment. Just interpreting the correlations and scatterplot matrices is enough. I just wanted to go ahead with the next step to show you how I approach what I’m seeing.
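The two steps used in question 3, computing a pairwise correlation and then binning Configuration into its bands to compare distance across bins, can be sketched as follows. This is an illustration under stated assumptions: the band edges are the ranges reported in the observations above, the helper names are hypothetical, and the numbers are toy values chosen so the pattern is easy to see, not values from the real dataset.

```python
from math import sqrt
from statistics import mean, median


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Configuration bands as reported in the text.
BANDS = [(1, 80), (145, 224), (289, 368)]


def band_of(configuration):
    """Label of the band containing a configuration number, else None."""
    for low, high in BANDS:
        if low <= configuration <= high:
            return f"{low}-{high}"
    return None


def median_by_band(configurations, values):
    """Median of `values` within each configuration band."""
    grouped = {}
    for config, value in zip(configurations, values):
        grouped.setdefault(band_of(config), []).append(value)
    return {band: median(vals) for band, vals in grouped.items()}


# Hypothetical toy values (NOT from the real dataset).
configs = [10, 70, 150, 220, 300, 360]
prices = [300, 320, 450, 470, 600, 620]
distances = [3.0, 5.0, 4.0, 4.0, 5.0, 3.0]

r_price = pearson(configs, prices)        # strong positive in this toy data
r_distance = pearson(configs, distances)  # essentially zero in this toy data
distance_medians = median_by_band(configs, distances)
print(round(r_price, 3), round(r_distance, 3))
print(distance_medians)
```

When the per-band medians are about equal, as in this toy case, the binned view agrees with a near-zero correlation: the band does not shift the typical distance traveled.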