Tut2023T1_Week9_Answers_v2
COMM1190 Data, Insights and Decisions
Week 9 Workshop
• Read over the lecture notes thoroughly.
R
• One problem requires R, using the data file housing.xls.
Problem Set (these will be discussed in tutorial classes)
Q1. Jenny Craig is a weight-loss intervention. Their commercials show photos of a celebrity before and after joining Jenny Craig.
a) What are the control and treatment in this experiment?
• This is a (within-person) before-and-after design. The weight of the same person is compared before (control) and after (treatment).
b) Are you worried about any selection problems? Explain.
• Jenny Craig controls who appears in its ads, so it is likely to choose only success stories, and presumably the more successful the better. What’s more, it chooses when to take the after photo. What if the success lasts only a short period and the person then reverts to old habits, so that an after photo taken later would show no change?
• These are sample selection problems. We need to distinguish these from the endogenous selection issue, where people choose whether to be treated or not. This latter selection issue is also relevant: people choose to join Jenny Craig, and those who do will typically be motivated to join and to succeed in losing weight.
Q2. A certain school typically has two Year 7 classes, each occupying one of two rooms. These rooms have different capacities (20 or 25 students), presenting the opportunity to exploit a natural experiment to investigate the impact of class size on student performance. Suppose you have data from this school’s Year 7 cohorts collected over many years, comprising student grades and whether each student was in the small or large class. Under what conditions would you expect the difference in Year 7 grades between the large class and the small class to best reflect the causal effect of class size on student performance?
• Allocation to classes needs to be random and not based on, say, date of birth (splitting students into younger and older groups) or Year 6 performance (splitting them into better and poorer performers).
• The allocation of teachers needs to be random.
• Performance should be measured on a standardized test that is comparable over time.
• The population of students needs to be relatively stable over time, again to ensure comparability of results.
Q3.
Your statistically naïve friend, Denzil, is very interested in the impact on housing prices of being located under the flight path. Given data on sales over a month, the regression of housing price on flightpath (a dummy variable indicating whether the house was under the Kingsford-Smith airport flight path) provides a difference-in-means estimate.
(a) Explain why this estimate would be a poor indication of the impact on sale prices of being under the flight path.
• This is a “difference-in-means” regression, as flightpath is a binary variable that divides sales according to whether they were under the flight path or not. This is fine as a descriptive tool but not if you want to infer causality.
• There are likely to be many confounding factors that are correlated with flightpath and that would help explain the price difference.
(b)
After the Western Sydney International Airport at Badgerys Creek, Sydney, opens, suppose the decision is made to close the east-west runway at the Kingsford-Smith airport, leaving just the two north-south runways in operation. This is purely hypothetical and highly unlikely to happen, but suppose it does. Denzil now revisits his original research question, but this time uses two samples of houses that were located under the east-west flight path. One sample represents sales in a year before the closure of the runway, while the second represents sales in a year after the runway was closed. Now he runs a regression of the following form:
$price_i = \beta_0 + \beta_1\, after_i + u_i$
where $price$ = housing sale price and $after$ = 1 if the house was sold after the runway closure and zero otherwise. How do you interpret the regression parameters for this model?
• This is a “difference-in-means” regression, as after is a binary variable that divides sales according to whether they were sold after the closure of the runway or not.
• The estimated intercept will be the mean sale price for houses under the east-west flight path that were sold before the runway was closed. The estimated after parameter will be the difference in means associated with being sold after the closure.
• Because the negative features of being under the flight path (notably aircraft noise) have been removed, you would expect the prices of houses previously under the flight path to increase, and hence the estimate of $\beta_1$ to be positive.
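The claim that a regression on a single dummy variable is just a difference in means can be checked numerically. The following is a minimal Python sketch, not part of the workshop materials: the five prices are made-up illustrative values (in $000), and `ols_on_dummy` is a hypothetical helper written for this example.

```python
from statistics import mean


def ols_on_dummy(y, d):
    """OLS of y on a constant and a 0/1 dummy d; returns (intercept, slope)."""
    d_bar, y_bar = mean(d), mean(y)
    sxy = sum((di - d_bar) * (yi - y_bar) for di, yi in zip(d, y))
    sxx = sum((di - d_bar) ** 2 for di in d)
    slope = sxy / sxx
    intercept = y_bar - slope * d_bar
    return intercept, slope


# Hypothetical sale prices in $000 (illustrative only, not from any data set)
price = [900.0, 1000.0, 1100.0, 1150.0, 1250.0]
after = [0, 0, 0, 1, 1]  # 1 = sold after the runway closure

b0, b1 = ols_on_dummy(price, after)

# Group means computed directly, for comparison with the regression output
mean_before = mean(p for p, a in zip(price, after) if a == 0)
mean_after = mean(p for p, a in zip(price, after) if a == 1)

print(f"intercept = {b0:.1f}, mean before = {mean_before:.1f}")
print(f"slope = {b1:.1f}, difference in means = {mean_after - mean_before:.1f}")
```

The intercept coincides with the mean price of the before group and the slope with the after-minus-before difference in means, which is why the estimate says nothing by itself about causality.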
(c) Do you think this is a good research design to address this problem? Explain.
• As always, the interpretation of $\beta_1$ relies on the likely unobservables captured in the disturbance and whether they are likely to be correlated with after.
• Here you would worry about what is happening in the housing market in general. You may find a positive estimate of $\beta_1$, but is that attributable to the reduced aircraft noise or simply to an active housing market in which all houses are increasing in value over time?
• This is an example of a natural experiment. In such cases you worry about whether the treatment is truly exogenous. For example, if the closure has been mooted for a while, then developers may buy up houses in the affected area and renovate in anticipation of the expected post-closure appreciation in value. This again would distort the before-and-after estimate as a reflection of just being under the flight path or not.
Q4.
Underquoting is an illegal business practice in which real estate agents understate the valuation of the houses they have for sale. In their advertising and in email responses to enquiries, these agents quote prices that are low relative to an accurate valuation in the hope of luring more potential buyers. These practices are unethical and contrary to the professional standards expected of real estate agents. Such agents would also be in breach of various consumer protection laws related to false advertising and misleading or deceptive conduct, and could be prosecuted; see below. Suppose there is a current court case and you have been commissioned as an independent expert witness. Note that you will report directly to the judge. The judge expects you to use available data to provide evidence useful in evaluating claims of underquoting in the case being brought against Agent X. The judge has given you authority to collect whatever data you see fit.
(a) Formulate the proposed approach in terms of estimating a causal effect. Explain what this causal effect is in this situation.
• Underquoting occurs when sale prices are systematically higher than the quoted prices. Think of $pricediff = sale\ price - quoted\ price$ as the outcome of interest.
• Conceptually, we would want to compare sales of houses under two counterfactual scenarios: one where the sale was made by Agent X and one where it was made by another (control) agent.
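One way to write down this counterfactual comparison is in potential-outcomes notation. The notation below is an assumption introduced for illustration and does not come from the workshop: $Y_i(1)$ is the gap between sale price and quoted price for house $i$ if it were sold by Agent X, and $Y_i(0)$ is the same gap if it were sold by a control agent.

```latex
\[
  \tau \;=\; E\bigl[\, Y_i(1) - Y_i(0) \,\bigr]
\]
```

A positive $\tau$ would be consistent with Agent X underquoting. Only one of $Y_i(1)$ and $Y_i(0)$ is ever observed for a given house, which is why the comparison has to be made across houses.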
(b)
Suppose you have collected data from Domain to provide a sample of recent house sales where the variables include house characteristics, selling price, quoted price (if a range was provided, this is the average of the upper and lower prices), and whether the house was sold by Agent X or not (a dummy variable denoted by AX). Discuss whether these data could be used to provide a good estimate of the relevant causal effect.
• You could run the following regression using these data to obtain an estimate of $\beta_1$:
$pricediff_i = \beta_0 + \beta_1\, AX_i + controls + u_i$
• $\beta_1$ would represent the difference in the means of $pricediff$, comparing the sales of Agent X with the sales of all other agents after controlling for house characteristics. A positive estimate of $\beta_1$ would be consistent with Agent X underquoting.
• There are numerous threats to interpreting this estimate as causal. The main ones are:
o A cursory look at Domain indicates that selling and quoted prices are not always disclosed. We immediately have a “selected” sample that could be a source of bias.
o Possibly a more serious sample selection problem is that you can only observe selling prices if the owner first puts their house on the market (which will often depend on prevailing market conditions; think of headlines about properties selling way above the reserve price) and then only if the owner subsequently accepts the offer. Both make it highly likely that the difference between the selling and quoted price will be positive even absent any underquoting problem!
o There is another pervasive problem in that owners choose agents, and one of the main criteria they use is the ability of the agent to get a good price. This is an endogenous treatment problem. A positive estimate of $\beta_1$ may simply reflect that Agent X is good at selling houses. Again, there may be no underquoting. Their quoted price might be an accurate reflection of market value, but Agent X is able to get an even better price, and this in turn may be why they are chosen.
(c)
Discuss how you would design an experiment to estimate the relevant causal effect. Explain why such an experiment would not be feasible.
• In an experiment, a sample of houses would be chosen to be put up for sale. They would then need to be randomly assigned to agents, including Agent X.
• Just articulating what is involved should be enough to indicate this would never happen. But even if you got this far, a less-than-ethical agent is unlikely to reveal their usual practices when being closely monitored in the experimental setting. (This is called a Hawthorne effect.)
(d) This type of independent expert witness approach to providing data-based evidence to the court contrasts with the usual approach. More commonly, the defendant and the prosecution employ their own experts to provide such evidence, so that the judge needs to evaluate possibly different conclusions drawn by different experts. Compare these alternative approaches of employing expert witnesses to provide objective evidence to the court. (A useful reference is section C of the ASA guidelines for statistical practice.)
• You would hope that experts in data analysis adhere to their professional standards, in which case it would not matter which of the two approaches is used to provide evidence. Nevertheless, having an independent expert answerable directly to the court allays any fears that experts are shading their analysis towards the interests of their respective clients.
• By “shading their analysis”, we don’t necessarily mean that the analyst is lying or doing anything illegal. But we do know from earlier in the course that there are often various options in presenting statistical material and hence opportunities to slant the evidence in a certain direction. In this context, that would typically mean not presenting statistical evidence according to best practice, which would contravene ethical guidelines for presenting such evidence.
The following problem involves the use of R.
Q5: Upgrading infrastructure, such as freeways, has the potential to increase property values for impacted dwellings. Here the claim is that a major freeway upgrade has led to an increase in the value of residential houses.
(a) Explain how you would design an experiment to test this causal claim. Is the experiment you have proposed feasible to conduct?
• Randomly allocate houses to either benefit from the upgrade or not, and compare average selling prices between the two groups.
• Alternatively, upgrade the freeway and observe the selling prices of a sample of houses. Now rewind the clock, don’t upgrade the freeway, and observe the selling prices of the same houses. Attribute any difference in average selling prices to the upgrade.
• Clearly neither is feasible.
(b)
Instead of the experiment outlined in (a), consider investigating the claim by using available observational data contained in housing.xls. This is a sample of house sales over a 12-month period covering 9 months before and 3 months after the completion of the freeway upgrade. The data include information on dwelling characteristics, including a dummy variable, $upgrade$, that indicates whether the house was in an area with ready access to the freeway. Using only the data where $upgrade = 1$, estimate the regression
$price = \beta_0 + \beta_1\, after + u$,
where $after$ is a dummy variable that is equal to one when the sale is made in the three months after the completion of the upgrade and zero otherwise. Interpret these results and discuss whether they are consistent with the claim of improved property values.
• Results for the remaining questions have been collected in the following table.
• Houses sold after the upgrade attracted a selling price that was $122,400 higher on average. You would reject the null hypothesis of no effect at the 5% level with a one-tailed test (p-value is 0.037), although not with a two-tailed test, as evidenced by the confidence interval that includes zero.
• The results are consistent with the claim of improved property values BUT see the next part.
(c)
Identify the potential problems associated with the approach in (b) by contrasting it with the experiment you described in (a).
• You worry that this estimate reflects, at least in part, other factors: a confounding problem.

Variables        Part (b)                  Part (d)                  Part (e)
after            122.4 (-11.9, 256.8)      80.3 (22.6, 138.1)        74.6 (18.7, 130.5)
upgrade          n/a                       -87.9 (-176.0, 0.1)       137.7 (48.9, 226.5)
upgrade*after    n/a                       42.1 (-203.5, 287.6)      63.5 (-174.2, 301.2)
distance         n/a                       n/a                       -52.7 (-58.6, -46.9)
constant         1212.8 (1164.8, 1260.8)   1300.7 (1278.9, 1322.6)   1768.5 (1712.7, 1824.4)
Sample size (n)  282                       4663                      4663
R^2              0.011                     0.003                     0.066

Estimates are in $000, with confidence intervals in parentheses; n/a marks a variable not included in that regression.
• Maybe it is just an overall increase in property values irrespective of the upgrade.
• Such problems would be avoided by either winding back the clock or random assignment.
(d)
Consider a difference-in-differences (DiD) approach, using the entire sample to estimate the following regression model. What do you conclude from these results?
$price = \beta_0 + \beta_1\, upgrade + \beta_2\, after + \beta_3\, upgrade \times after + u$
• Notice that the coefficient of interest, which represents the impact of the upgrade on housing prices (the upgrade*after row in the table), is now the coefficient on the interaction term. This DiD estimate of the effect, $42,100, is still positive but much smaller than in (b).
• Moreover, the estimate is very imprecise (the CI is wide) and formally you would not reject the hypothesis of no effect at the 5% level.
(e)
Someone evaluating your results notes that the freeway extension only impacts people living in the outer suburbs, where prices are lower. They are concerned that the results in (d) may be biased and your conclusions problematic. Run the following regression to explore this concern. What do you conclude?
$price = \beta_0 + \beta_1\, upgrade + \beta_2\, after + \beta_3\, upgrade \times after + \beta_4\, distance + u$
• It is true that greater distance from the CBD implies lower prices (on average $52,700 less for each km, ceteris paribus).
• Controlling for distance changes the estimated coefficient on upgrade, which is now positive and statistically significant, but this is not the parameter of primary interest.
• Controlling for distance does not change the qualitative conclusion about the parameter of interest. The DiD estimate of $63,500 is positive but still imprecise.
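The reported estimates can be cross-checked with some simple arithmetic. The Python sketch below is not part of the workshop's R output. It takes the part (b) and part (d) numbers from the results table, rebuilds the four group means implied by the DiD regression, and backs out an approximate standard error for part (b). That last step assumes the reported interval is a 95% interval and uses a normal approximation, so the p-values it gives are only rough.

```python
from math import erfc, sqrt

# Part (d) estimates as reported in the results table ($000)
b0, b_upgrade, b_after, b_did = 1300.7, -87.9, 80.3, 42.1

# The four group means implied by the DiD regression
control_before = b0
control_after = b0 + b_after
treated_before = b0 + b_upgrade
treated_after = b0 + b_upgrade + b_after + b_did

# DiD = change for the upgrade group minus change for the comparison group
change_treated = treated_after - treated_before
change_control = control_after - control_before
did = change_treated - change_control

# Part (b): estimate and reported confidence interval ($000)
est_b, lo_b, hi_b = 122.4, -11.9, 256.8

# Assumption: 95% interval and normal approximation (critical value 1.96)
se_b = (hi_b - lo_b) / (2 * 1.96)
z_b = est_b / se_b
p_one_tail = 0.5 * erfc(z_b / sqrt(2))  # P(Z > z_b)
p_two_tail = 2 * p_one_tail

print(f"before-after change, upgrade group: {change_treated:.1f}")
print(f"before-after change, comparison group: {change_control:.1f}")
print(f"DiD estimate: {did:.1f}")
print(f"approx. one-tailed p: {p_one_tail:.3f}, two-tailed p: {p_two_tail:.3f}")
```

The before-after change for the upgrade group reproduces the part (b) estimate of 122.4, and subtracting the 80.3 change in the comparison group leaves the DiD estimate of 42.1. Most of the raw before-after difference is therefore a general rise in prices. The approximate one-tailed p-value comes out at roughly 0.037, in line with the value quoted in part (b), while the two-tailed value is above 0.05.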