Assignment 2 2023W1-1

School: University of British Columbia
Course: 380
Subject: Statistics
Date: Apr 3, 2024

Assignment 2

There are FIVE questions for a total of 25 points. The points for each question (and for each part of Question 1) are shown next to it.

Do not paste raw R output into your answers to give us a string of numbers to look at. Even if it looks OK when you paste it, it does not look good when we grade it. Write sentences that include and explain the numbers.

You may work on the assignment with other students, but you must write up all answers individually. Answers that are substantially identical to another student's will be treated as plagiarism.

This assignment uses TWO different datasets!
1) For Questions 1 and 2, the same data as last time: your individual sample of the Census Microdata File.
2) For Questions 3 to 5, the 2021 Canadian National Election Study. It is posted on Canvas under /modules/Data Sets and Codebooks

After you have written your answers in a document that you save for yourself, submit them in the appropriate question box on Canvas.

Q1: Proportion mean & CI using census data. [2pts]

(1 point) a. Use your census sample to estimate the NUMBER (not the percentage) of people living in Canada in 2016 who were born outside the country. Use the variable "pob". Assume the total population of Canada is exactly 35 million. (You have no other source to help you estimate the number, just your sample.)

(1 point) b. Indicate how far from the true number of people born outside Canada in the full population (a number of people, not a percentage) you would expect your estimate to be, 19 times out of 20. Say: ± ______ number of people (NOT ± %). You can calculate this using the formula for the standard error of a proportion and then converting that result into a number of Canadians, as you did in part a.

In Canvas, just enter one number for a) and another number for b). For this question you do not need to 'write up' the answers.
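For Q1, here is a minimal base-R sketch of the arithmetic, run on made-up numbers (440 of 2,000 respondents born abroad) rather than on anyone's census sample. It assumes you have already turned pob into a TRUE/FALSE indicator using the codebook, and it uses 1.96, the normal critical value that corresponds to "19 times out of 20" (a 95% interval).

```r
# Made-up stand-in for a census sample: 440 of 2,000 people born outside
# Canada. Recode your own "pob" variable into TRUE/FALSE first.
born_abroad <- c(rep(TRUE, 440), rep(FALSE, 1560))

pop_total <- 35e6                      # population size given in the question
n <- length(born_abroad)
p_hat <- mean(born_abroad)             # sample proportion born outside Canada

# a. Estimated NUMBER of people born outside Canada
est_count <- p_hat * pop_total

# b. 95% ("19 times out of 20") margin of error, converted to people
se_p <- sqrt(p_hat * (1 - p_hat) / n)  # standard error of a proportion
moe_count <- 1.96 * se_p * pop_total

cat(sprintf("Estimate: %.0f people, +/- %.0f people\n", est_count, moe_count))
```

Multiplying the standard error by the population size converts the margin of error from a proportion into a number of people, which is what part b asks for.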
Q2. Means from the census [5 pts]

Find the average 'income from government transfers' for people born inside and outside Canada. For the income variable, use gtrfs, and check the codebook to find values that you need to exclude from your analysis. In addition, restrict your analysis to those aged 25-64. Report the means in a smoothly worded paragraph that summarizes the findings for a reader. Be sure to account for missing values.

Questions 3 to 5: Canadian National Election Study

Now switch to the 2021 Canadian National Election Study.

DO THIS FIRST: Draw a random sample of 3,500 cases from the dataset. That way each of you will get a different sample that I can have my computer replicate. First, set the random number seed by typing set.seed(courseidnumber), replacing 'courseidnumber' with your course ID number (the same number as on your census dataset), NOT your real student number. Then run sample_n(data, 3500), replacing 'data' with the name of the election study data frame in your R environment. You will need to load the dplyr package to run it.

Note: when you open the dataset, it is called ces21.data (CES is Canadian Election Study and 21 is… 2021). You might want to rename the data frame. All the variables start with cps21_ or pes21_ (cps = Campaign Survey, so these questions were asked during the campaign; pes = Post-election Survey, so these questions were asked after the election). It can get tricky to write things like ces21.data %>% summarize(cps21_variable4), because it's easy to mix up ces21 and cps21.

Q3. Attitudes toward immigrants [10 points]

We are going to look at Canadians' attitudes toward immigrants. We know, of course, that many Canadians are also immigrants. To start, prepare the following variables for use in an index: pes21_fitin, pes21_immigjobs, and cps21_groups_therm_2. For each variable, code people who did not answer the question or said "don't know" as NA. Then recode each variable so that it ranges from 0 to 4
with higher values being more positive evaluations of immigrants. For the first two variables, there are five values, so your recoded variables will take the values 0, 1, 2, 3, 4. Recode the feeling thermometer so that it also ranges from 0 to 4; people will have many different values within that range (e.g., if someone rated immigrants at 70 out of 100, their score will be 2.8 on the 0-to-4 scale). Combine the three variables into an index that also ranges from 0 to 4. You'll also need to use cps21_age to answer this question.

In a single, concise, and engaging paragraph, summarize the distribution of the index of feelings toward immigrants for a) all survey respondents, b) respondents under 30 years old, and c) respondents over 60 years old. Write your response for someone who has not seen this question. Be sure to explain how these concepts were measured and to provide descriptive information about these distributions (i.e., mean, range, spread, and shape). Your audience is someone reading a newspaper article or op-ed, so explain in general terms what the index measures and what different scores mean.

At the end of your answer, paste the R code you used to recode the variables and create the index, as well as the output of the command ur.data %>% tabyl(ur.index).

Q4: Economic concerns, migrants, and crosstabs [5 points]

What is the relationship between people's expectations about their own economic situation and vote choice? Start with the variable cps21_own_fin_future. Exclude anyone who did not answer 'better', 'same', or 'worse'. Then use pes21_votechoice2021 to measure vote. To make things a bit easier, focus only on the Liberals, CPC, and NDP (so you can exclude people who voted for another party or didn't vote, or code them as NA). Now run a crosstab showing how vote choice is distributed within each category of the 'financial future' question.
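A base-R sketch of the Q4 crosstab, using a dozen invented responses. The category labels are placeholders, not the CES codes; with the real data you would first recode cps21_own_fin_future and pes21_votechoice2021 into three categories each, setting everything else to NA (table() leaves NA values out by default).

```r
# Invented responses standing in for the recoded CES variables.
fin_future <- c("Better", "Better", "Same", "Same", "Same", "Worse",
                "Worse", "Better", "Same", "Worse", "Better", "Worse")
vote <- c("Liberal", "NDP", "Liberal", "CPC", "CPC", "CPC",
          "NDP", "Liberal", "NDP", "CPC", "Liberal", "Liberal")

# Fix the category order so the table reads Better / Same / Worse
fin_future <- factor(fin_future, levels = c("Better", "Same", "Worse"))
vote <- factor(vote, levels = c("Liberal", "CPC", "NDP"))

counts <- table(fin_future, vote)                # 3 by 3 table of counts
row_pct <- 100 * prop.table(counts, margin = 1)  # % within each outlook row
round(row_pct, 1)
```

Setting margin = 1 makes each row (each financial-outlook group) sum to 100%, which matches "how vote choice is distributed within each category of the financial future question". If you prefer the janitor package's tabyl(), adorn_percentages("row") also gives row percentages.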
Report the results of this crosstab (as percentages) in a clear and compelling paragraph. Use some, but not all, of the results from the table when explaining what you found. You do not need to run a chi-square test or report a p-value for this question.
Hint: both variables will have three categories, so you should end up with a 3 by 3 table.

Q5: Interpreting a p-value [3pts]

Run the same crosstab as in Question 4, but only among people who answered 'yes' to "Do you belong to a union?" (cps21_union). This time, be sure to use the chisq.test function to get a p-value. In a single sentence, report and interpret the p-value for an intelligent reader who knows little about statistics. Your answer should talk only about the p-value and what it means; don't discuss the percentages in your crosstab for this question.
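For Q5, a sketch of the chisq.test() call on an invented 3 by 3 table of counts; the numbers are not from the CES. With real data you would keep only union members first and pass a table() of the two recoded variables instead of a typed-in matrix.

```r
# Invented counts: financial outlook (rows) by vote (columns) among union
# members. These are illustration values only.
union_counts <- matrix(c(40, 25, 35,
                         30, 30, 40,
                         20, 45, 35),
                       nrow = 3, byrow = TRUE,
                       dimnames = list(fin_future = c("Better", "Same", "Worse"),
                                       vote = c("Liberal", "CPC", "NDP")))

# With real (hypothetical) vectors this would be something like:
# chisq.test(table(fin_future[union == "Yes"], vote[union == "Yes"]))
result <- chisq.test(union_counts)

# Probability of a chi-square statistic at least this large if financial
# outlook and vote were actually unrelated among union members
result$p.value
```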
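For Q2, a base-R sketch of the filtering and group means, run on a tiny invented data frame. The column names age and born_canada, the transfer amounts, and the special codes 88888888 and 99999999 are hypothetical placeholders; take the real exclusion codes for gtrfs from the codebook, and note that your file may record age in groups, in which case you would filter on the relevant group codes instead.

```r
# Invented census-like data. Column names, values, and missing-value codes
# are HYPOTHETICAL placeholders, not taken from the codebook.
census <- data.frame(
  age         = c(30, 45, 22, 60, 70, 35, 50, 28),
  born_canada = c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE),
  gtrfs       = c(1200, 0, 500, 99999999, 3000, 2500, 88888888, 800)
)
missing_codes <- c(88888888, 99999999)   # replace with the codebook's codes

# Turn special codes into NA so they cannot inflate the mean
census$gtrfs[census$gtrfs %in% missing_codes] <- NA

# Keep respondents aged 25 to 64
working_age <- subset(census, age >= 25 & age <= 64)

# Mean transfer income by place of birth, skipping NA values
group_means <- tapply(working_age$gtrfs, working_age$born_canada,
                      mean, na.rm = TRUE)
group_means
```

Recoding the special codes to NA before averaging matters: left in, a single 99999999 would swamp every real dollar amount in its group.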
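A sketch of the CES "DO THIS FIRST" sampling step, assuming the dplyr package is installed. A placeholder data frame with 20,000 numbered rows stands in for ces21.data, and 12345 stands in for your course ID number. Setting the seed immediately before sampling is what makes the draw reproducible, and assigning the result to a shorter name such as ces is one way to avoid mixing up ces21 and cps21.

```r
library(dplyr)

# Placeholder for the real CES file: 20,000 numbered cases (made up).
ces21.data <- data.frame(case_id = 1:20000)

set.seed(12345)                    # replace 12345 with your course ID number
ces <- sample_n(ces21.data, 3500)  # 3,500 cases drawn without replacement
nrow(ces)
```

In recent dplyr versions sample_n() is marked as superseded by slice_sample(), but it still runs; use sample_n() as the assignment instructs, since a different function is not guaranteed to pick the same cases.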
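For Q3, a base-R sketch of the recoding and index on five invented respondents. The response codes are assumptions made for illustration (1 = strongly disagree through 5 = strongly agree, 6 = don't know, and -99 for a missing thermometer rating), as is the premise that agreeing with the fitin and immigjobs statements signals a less positive view of immigrants; confirm both against the CES codebook before reusing anything.

```r
# Invented stand-ins for the three CES items. All codes are ASSUMED:
# 1-5 = strongly disagree ... strongly agree, 6 = don't know,
# thermometer 0-100 with -99 = don't know.
ces <- data.frame(
  pes21_fitin          = c(1, 2, 5, 3, 6),
  pes21_immigjobs      = c(1, 1, 4, 2, 5),
  cps21_groups_therm_2 = c(90, 70, 20, 50, -99)
)

# 1. Non-substantive answers become NA
ces$pes21_fitin[ces$pes21_fitin == 6]                  <- NA
ces$pes21_immigjobs[ces$pes21_immigjobs == 6]          <- NA
ces$cps21_groups_therm_2[ces$cps21_groups_therm_2 < 0] <- NA

# 2. Rescale to 0-4, higher = more positive toward immigrants.
#    Agreement is assumed to be negative here, so reverse: 5 -> 0, 1 -> 4.
ces$fitin_04 <- 5 - ces$pes21_fitin
ces$jobs_04  <- 5 - ces$pes21_immigjobs
ces$therm_04 <- ces$cps21_groups_therm_2 / 25   # e.g. 70 / 25 = 2.8

# 3. Index = average of the three 0-4 items, so it also runs 0-4
ces$imm_index <- rowMeans(ces[, c("fitin_04", "jobs_04", "therm_04")])
ces$imm_index
```

Averaging the three 0-4 items keeps the index on the same 0-4 scale. rowMeans() returns NA for anyone missing one of the items; adding na.rm = TRUE would instead average whatever items a respondent answered, which is a choice worth making deliberately and describing in your write-up.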