Classwork #9-ExplainVariation-TrumpVote

School

California State University, Los Angeles

Course

3020

Subject

Statistics

Date

Apr 3, 2024

Uploaded by PostMalonFalling

Classwork #9: Predicting Presidents by Explaining Variation
March 28, 2024

# This code will load the R packages we will use
suppressPackageStartupMessages({
  library(coursekata)
})

# Updated USStates data with election data
USStates <- read.csv(
  "https://docs.google.com/spreadsheets/d/e/2PACX-1vSEc6kO1zrL_3Jlc_cA7cMgk6E2xcIjuUbTL50y-0ENwWby36EFj1MpWZLVKud8YMTtqb1zsef_a8Ss/pub?gid=1275513973&single=true&output=csv",
  header = TRUE
)

1.0 - Which states might vote for Trump in 2024?

Former President Trump made these remarks at CPAC 2021:

"Actually, as you know, they just lost the White House. But it's one of those things. But who knows, who knows? I may even decide to beat them for a third time. Okay?"

Today we will consider this question: If President Trump decided to run again in 2024, what kind of states would vote for him?

1.1 - One of the biggest uses of statistics is for the purpose of prediction. Why might it be useful to predict voting results of presidential elections?

head(USStates)

A data.frame: 6 × 19 (only the first six columns are shown here)
  State       HouseholdIncome  IQ     Region  Population  EighthGradeMath
  <chr>       <int>            <dbl>  <chr>   <dbl>       <dbl>
1 Alabama     38160             95.7  S        5.024279   262.21
2 Alaska      57071             99.0  W        0.733391   278.96
3 Arizona     46693             97.4  W        7.151502   274.31
4 Arkansas    37458             97.5  S        3.011524   271.64
5 California  54385             95.5  W       39.538223   268.56
6 Colorado    53900            101.6  W        5.773714   280.82

We're going to look at a data frame called USStates. Remember that you can use functions like head() and glimpse() to get different kinds of information about the data. In addition to the officially documented data, we added a variable called TrumpVote20.

• State: Name of state
• HouseholdIncome: Mean household income (in dollars)
• IQ: Mean IQ score of residents
• McCainVote: Percentage of votes for John McCain in the 2008 Presidential election
• Region: Area of the country (MW = Midwest, NE = Northeast, S = South, W = West)
• Pres2008: Which candidate won that state in 2008 (McCain or Obama)
• Population: Number of residents (in millions)
• EighthGradeMath: Average score on a standardized test administered to 8th graders
• HighSchool: Percentage of high school graduates
• GSP: Gross State Product (dollars per capita)
• FiveVegetables: Percentage of residents who eat at least five servings of fruits/vegetables per day
• Smokers: Percentage of residents who smoke
• PhysicalActivity: Percentage of residents who have competed in a physical activity in the past month
• Obese: Percentage of residents classified as obese
• College: Percentage of residents with college degrees
• NonWhite: Percentage of residents who are not white
• HeavyDrinkers: Percentage of residents who drink heavily
• TrumpVote16: Percentage of votes for Donald Trump in the 2016 Presidential election
• TrumpVote20: Percentage of votes for Donald Trump in the 2020 Presidential election
• BidenVote20: Percentage of votes for Joe Biden in the 2020 Presidential election

1.2 - Take a look at the variable TrumpVote20 at the very end of the data frame. Does the TrumpVote20 variable tell you how many people voted for Trump? Why or why not?

It doesn't show exactly how many people voted for Trump, but it gives us a percentage.

1.3 - To explore variation in how the states voted, make a visualization of TrumpVote20. What do you notice? Is there anything surprising about this distribution?

gf_histogram(~ TrumpVote20, data = USStates)
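A numeric summary can sit alongside the histogram when describing the distribution. The base R sketch below uses only the six TrumpVote20 values printed by the select() call in 3.1 (Alabama through Colorado), so its numbers describe those six states, not all 50. In the notebook, the same calls would be applied to USStates$TrumpVote20.

```r
# TrumpVote20 for the first six states, copied from the
# select(USStates, State, TrumpVote20, NonWhite) output in 3.1.
# Only six of the 50 states are included: an illustration, not the full data.
trump_vote_20 <- c(
  Alabama = 62.03, Alaska = 52.83, Arizona = 49.06,
  Arkansas = 62.40, California = 34.32, Colorado = 41.90
)

# Each value is a percentage of that state's votes (0 to 100),
# not a count of voters, which is the point behind question 1.2.
print(range(trump_vote_20))

# Minimum, quartiles, median, mean, and maximum:
# a numeric companion to the histogram
print(summary(trump_vote_20))

# How many of these six states gave Trump a majority of the vote?
print(sum(trump_vote_20 > 50))
```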

2.0 - Explaining Variation in TrumpVote20

2.1 - You might suppose that states that had a large share of votes for Trump in 2020 might also have larger shares of Trump votes in 2024. It might not be exactly the same, but similar. What kind of states might tend to vote for Trump? Take a look at some of the variables in the data frame for some ideas.

Population, religion

2.2 - Quick Review: What does it mean to "explain variation"?

Explaining variation means that the variation is understood by using the variable.

2.3 - Which of these two variables do you think will likely explain more of the variation in TrumpVote20: FiveVegetables or NonWhite? Why?

I think NonWhite, because I don't think eating five vegetables will show that much of a relationship.

2.4 - Let's apply our casual definition of "explaining variation." What would it mean for NonWhite to explain the variation in TrumpVote20? What would it mean for FiveVegetables to explain the variation in TrumpVote20?

NonWhite shows non-white people who voted for Trump, and FiveVegetables means that people who eat five vegetables a day voted for Trump.

2.5 - What we have are two little theories about the world. Let's write these theories as word equations to represent the relationship between variables. These word equations will serve as our first attempt to model the variation we see in TrumpVote20. How would we interpret these word equations?

TrumpVote20 = FiveVegetables + Other Stuff
TrumpVote20 = NonWhite + Other Stuff

2.6 - What if we find out that neither of our models helps us explain variation in TrumpVote20? How would we update our word equation? How would we interpret it in words?

TrumpVote20 = Other Stuff

3.0 - Exploring TrumpVote20 = NonWhite + Other Stuff

3.1 - Let's take a look at a few states, and their levels of TrumpVote20 and NonWhite percentages. Is there a way to look at just those variables in this data frame?

Yes, by using select().

select(USStates, State, TrumpVote20, NonWhite) %>% head()

A data.frame: 6 × 3
  State       TrumpVote20  NonWhite
  <chr>       <dbl>        <dbl>
1 Alabama     62.03        29.4
2 Alaska      52.83        26.2
3 Arizona     49.06        31.1
4 Arkansas    62.40        17.8
5 California  34.32        53.0
6 Colorado    41.90        22.5

3.2 - Let's just take a look at the state of Alabama. What do these numbers mean?

filter(USStates, State == "Alabama")

A data.frame: 1 × 19 (only the first seven columns are shown here)
  State    HouseholdIncome  IQ     Region  Population  EighthGradeMath  HighSchool
  <chr>    <int>            <dbl>  <chr>   <dbl>       <dbl>            <dbl>
  Alabama  38160            95.7   S       5.024279    262.21           82.4

3.3 - Let's make a visualization to explore this model: some of the variation in the percentage of votes for Trump is explained by the proportion of NonWhite individuals in that state. If I run the code below, I get a very unfortunate-looking plot. Why?

gf_histogram(~ TrumpVote20, data = USStates) %>%
  gf_facet_grid(NonWhite ~ .)
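The unfortunate plot in 3.3 comes from NonWhite being a quantitative variable. gf_facet_grid(NonWhite ~ .) creates a separate panel row for every distinct NonWhite value, and nearly every state has its own value. The base R sketch below counts those values using the 50 NonWhite percentages printed from USStates$NonWhite in this notebook. Binning with cut() is shown only as one possible workaround; it is not the approach the notebook takes, which moves to a scatterplot instead.

```r
# NonWhite for all 50 states, copied from the USStates$NonWhite output
# printed in this notebook (same row order as the data frame).
non_white <- c(
  29.4, 26.2, 31.1, 17.8, 53,   22.5, 16.4, 22.1, 35.7, 36.8,
  73.3, 13.5, 32,   14.1, 6.8,  13.6, 9.4,  36.9, 4.8,  37.6,
  17.2, 21.1, 9.8,  37.7, 15.7, 9,    12.4, 37.4, 5.4,  35.2,
  48.9, 39.1, 29.5, 7,    14.9, 28.1, 16.8, 15.4, 16.5, 33.3,
  8.3,  20.3, 43.1, 12.1, 5.6,  22.9, 18.1, 7.7,  9.1,  9.6
)

# Faceting by NonWhite draws one panel per distinct value
n_panels <- length(unique(non_white))
cat("States:", length(non_white), " Facet panels:", n_panels, "\n")

# Every value is distinct, so each panel holds a single state's
# TrumpVote20 and each "histogram" is a single bar.
# One possible workaround: bin NonWhite into a few intervals first.
non_white_bins <- cut(non_white, breaks = 3)
print(table(non_white_bins))
```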

USStates$NonWhite

29.4, 26.2, 31.1, 17.8, 53, 22.5, 16.4, 22.1, 35.7, 36.8,
73.3, 13.5, 32, 14.1, 6.8, 13.6, 9.4, 36.9, 4.8, 37.6,
17.2, 21.1, 9.8, 37.7, 15.7, 9, 12.4, 37.4, 5.4, 35.2,
48.9, 39.1, 29.5, 7, 14.9, 28.1, 16.8, 15.4, 16.5, 33.3,
8.3, 20.3, 43.1, 12.1, 5.6, 22.9, 18.1, 7.7, 9.1, 9.6

3.4 - Try to find a more effective way of visualizing the relationship between the NonWhite and TrumpVote20 variables.

gf_point(TrumpVote20 ~ NonWhite, data = USStates) %>%
  gf_lm()
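gf_lm() adds a least-squares regression line to the scatterplot. As a rough numeric check, the base R sketch below fits the same kind of line with lm(), using only the six states printed in 3.1. The slope it produces is therefore illustrative, not the 50-state result. The sketch also bears on question 3.6, under the assumption that one variable is stored as a proportion: dividing NonWhite by 100 multiplies the slope by 100, but the correlation stays the same. The pattern in the plot therefore keeps its shape, and only the axis scale changes.

```r
# Six states copied from the 3.1 select() output (illustration only)
six_states <- data.frame(
  State       = c("Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado"),
  TrumpVote20 = c(62.03, 52.83, 49.06, 62.40, 34.32, 41.90),
  NonWhite    = c(29.4, 26.2, 31.1, 17.8, 53.0, 22.5)
)

# Least-squares line, the same kind of line gf_lm() draws
fit_pct <- lm(TrumpVote20 ~ NonWhite, data = six_states)
slope_pct <- unname(coef(fit_pct)["NonWhite"])
cat("Slope, NonWhite in percent:", round(slope_pct, 3), "\n")

# Rescale NonWhite from a 0-100 percentage to a 0-1 proportion
six_states$NonWhiteProp <- six_states$NonWhite / 100
fit_prop <- lm(TrumpVote20 ~ NonWhiteProp, data = six_states)
slope_prop <- unname(coef(fit_prop)["NonWhiteProp"])
cat("Slope, NonWhite as proportion:", round(slope_prop, 3), "\n")

# Correlation does not depend on the units of either variable
r_pct  <- cor(six_states$TrumpVote20, six_states$NonWhite)
r_prop <- cor(six_states$TrumpVote20, six_states$NonWhiteProp)
cat("Correlation:", round(r_pct, 3), "vs", round(r_prop, 3), "\n")
```

A negative slope means that, among these six states, a higher NonWhite percentage goes with a lower TrumpVote20.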

3.5 - What do you notice in this visualization? Are you surprised by anything you see here?

NonWhite votes decrease and are low. I'm not surprised, because the narrative that Trump is a racist and misogynist was heavy at the time.

3.6 - Hmmm… TrumpVote20 and NonWhite are both percentages, but one is out of 1.00 and the other is out of 100. What can we do to make them both consistent? Then, try making the visualization again. Does it change? What changes? What stays the same?

States with high NonWhite percentages show lower Trump votes. For a state with a low NonWhite percentage, there's a higher percentage.

3.7 - Based on this, how would you adjust your prediction of TrumpVote20 for a hypothetical state that had a very high NonWhite percentage? How about for a low NonWhite state?

States with a high NonWhite percentage show lower Trump votes. For states with a low NonWhite percentage, there's a higher percentage.

4.0 - Exploring TrumpVote20 = FiveVegetables + Other Stuff

4.1 - Make a visualization to explore the idea that some of the variation in TrumpVote20 is explained by FiveVegetables.

gf_point(TrumpVote20 ~ FiveVegetables, data = USStates) %>%
  gf_lm()

4.2 - What do you notice in this visualization? Are you surprised by anything you see here?

I noticed a decrease of people who eat FiveVegetables among all 50 states.

4.3 - Based on this, how would you adjust your prediction of TrumpVote20 for a hypothetical state that had a very high FiveVegetables percentage? For a low FiveVegetables state?

5.0 - Comparing our Two Models

5.1 - Based on our visualizations, what kinds of states seem to have a lower TrumpVote20?

The states that ate more vegetables throughout the day, and the states with more non-white residents, had lower Trump votes.

5.2 - If we didn't know anything about a state, what should we predict its TrumpVote20 to be?

That's hard to tell, because not a lot of variables are used except the states.

5.3 - As you eyeball the visualizations you've made so far, which variable seems to explain more variation in TrumpVote20: FiveVegetables or NonWhite? What aspect of the visualizations are you looking at to make that judgment?

gf_point(TrumpVote20 ~ FiveVegetables, data = USStates) %>%
  gf_lm()
gf_point(TrumpVote20 ~ NonWhite, data = USStates) %>%
  gf_lm()
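Questions 5.2 and 5.3 can be made concrete with numbers. The word equation TrumpVote20 = Other Stuff corresponds to an empty model that predicts the same value for every state, usually the mean. One way to judge "which variable explains more" is to check how much closer the points sit to a regression line than to that mean. The base R sketch below again uses only the six states from 3.1. It compares the sum of squared errors around the mean with the sum around the NonWhite line, and reports the proportional reduction in error (PRE). The helper pre_vs_mean() is hypothetical. FiveVegetables values are not printed in this notebook, so in the full data you would call the helper once with USStates$NonWhite and once with USStates$FiveVegetables, then compare the two PRE values.

```r
# Six states copied from the 3.1 select() output (illustration only)
trump_vote_20 <- c(62.03, 52.83, 49.06, 62.40, 34.32, 41.90)
non_white     <- c(29.4, 26.2, 31.1, 17.8, 53.0, 22.5)

# Hypothetical helper: compare the empty model (predict the mean for
# every state) with a one-predictor regression line.
pre_vs_mean <- function(outcome, explanatory) {
  # Empty model: outcome = Other Stuff, so predict the mean everywhere
  empty_prediction <- mean(outcome)
  ss_empty <- sum((outcome - empty_prediction)^2)

  # One-predictor model: outcome = explanatory + Other Stuff
  fit <- lm(outcome ~ explanatory)
  ss_model <- sum(residuals(fit)^2)

  list(
    empty_prediction = empty_prediction,
    ss_empty = ss_empty,
    ss_model = ss_model,
    # Share of the empty model's squared error removed by the line
    pre = (ss_empty - ss_model) / ss_empty
  )
}

result <- pre_vs_mean(trump_vote_20, non_white)
str(result)
```

Visually, a larger PRE corresponds to points that hug the gf_lm() line more tightly, which is the aspect of the plots that 5.3 asks about.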

5.4 - Now that we have explored this data, consider this tweet that someone wrote. What's wrong with it?

"Eating some kale salad? You probably aren't a Trump supporter! Data proves that people who eat unhealthy are more likely to vote for Trump."

6.0 - Reflect and Connect

6.1 - In our Jupyter notebook lesson 4A, we looked at the gamesales data and explored whether a video game's platform can explain variation in critic and user ratings. How could we write those models of the data with our word equations?

Critic_Scores = Platform + Other Stuff
User_Scores = Platform + Other Stuff

6.2 - Compare those to the models we explored today:
• TrumpVote20 = NonWhite + Other Stuff
• TrumpVote20 = FiveVegetables + Other Stuff
Aside from the variable names, what makes our models in 4A different from our models in 4B? What makes them similar?

The outcome variables are quantitative, which makes them similar.

6.3 - In both 4A and this lesson, how did we decide whether the explanatory variables were explaining variation in the outcome variables, even though we were using different visualizations? Why did we need to use different visualizations?

7.0 - Data in the News

7.1 - If you are interested in further reading on political leanings and food preferences, check out this article: https://recipes.howstuffworks.com/do-food-choices-demonstrate-political-preferences.htm

7.2 - Can you tell a Trump fridge from a Biden fridge? Try your luck in the game in this article: https://www.nytimes.com/interactive/2020/10/27/upshot/biden-trump-poll-quiz.html?action=click&module=Editors%20Picks&pgtype=Homepage
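Question 6.3 turns on the type of the explanatory variable. Platform in 4A is categorical, so the natural comparison is between groups, for example with faceted histograms like the one attempted in 3.3. NonWhite and FiveVegetables are quantitative, so a scatterplot is used instead. In both cases the underlying question is whether the outcome shifts as the explanatory variable changes. The base R sketch below shows the categorical version using Region, which is part of USStates. It covers only the six states shown by head(USStates) and the 3.1 output. Comparing group means is a rough numeric stand-in for comparing faceted histograms, not a replacement for looking at the full distributions.

```r
# Region and TrumpVote20 for the six states shown by head(USStates)
# and the 3.1 select() output (illustration only, not all 50 states).
six_states <- data.frame(
  State       = c("Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado"),
  Region      = c("S", "W", "W", "S", "W", "W"),
  TrumpVote20 = c(62.03, 52.83, 49.06, 62.40, 34.32, 41.90)
)

# Categorical explanatory variable: TrumpVote20 = Region + Other Stuff.
# Compare the outcome across groups, a numeric analogue of faceting.
group_means <- tapply(six_states$TrumpVote20, six_states$Region, mean)
print(group_means)

# Group sizes: with only two or four states per group, any difference
# here is far too noisy to draw conclusions from.
print(table(six_states$Region))
```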