BUSN5000 HW1 Part B
School: Northwestern University
Course: 5000
Subject: Statistics
Date: Feb 20, 2024
Pages: 10
Uploaded by arushic12
8/28/23, 9:33 PM
Homework 1: Data Fundamentals
Page 1 of 10
https://chris-cornwell.shinyapps.io/Assignment_1/#section-b.-empirical-exercise
B. Empirical exercise
In this exercise you will demonstrate basic knowledge about data structures
and data documentation. Along the way, you will be introduced to a few R
commands. You will do this using a data set constructed by David Card
(https://davidcard.berkeley.edu/) for his well-known study analyzing the
effect of education on wages:
Card, D., “Using Geographic Variation in College Proximity to Estimate the
Return to Schooling”
(https://davidcard.berkeley.edu/papers/geo_var_schooling.pdf), in Aspects
of Labour Market Behavior: Essays in Honour of John Vanderkamp, E.
Christophides, et al., eds, Toronto: University of Toronto Press (1995).
These data will be featured prominently in Part II of the course, when we will
replicate some of Card’s analysis. For now, we’ll take the opportunity to
describe the key features of his data, as we would if we had conducted his
analysis ourselves.
The version of Card’s data we will use comes from the wooldridge package.
You don’t have to worry about installing packages in this environment because
that has been taken care of for you. If you were replicating this exercise on
your own machine, you would need to install the package first with this
command:
install.packages("wooldridge")
Then you would load the package using the library() function:
library(wooldridge)
Once loaded, all of the package’s exported functions and objects become
directly accessible in your R session. This means you can use those functions
and objects as if they were part of the base R distribution without needing to
reference the package name.
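As a minimal sketch of what attaching a package changes, the snippet below uses the tools package, which ships with R but is not attached by default, as a stand-in for wooldridge (an assumption made so the example runs without installing anything). The same function is called first with the package name spelled out and then, after library(), by its bare name.

```r
# 'tools' is only a stand-in here; any installed package behaves the same way.

# Without attaching: qualify the function with the package name.
ext_qualified <- tools::file_ext("card.csv")

# After attaching: the bare function name resolves.
library(tools)
ext_attached <- file_ext("card.csv")

# Both calls reach the same function and return the same value.
identical(ext_qualified, ext_attached)
```

With wooldridge installed, the same pattern should apply to wooldridge::card versus card after library(wooldridge).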
Finally, you may want to explicitly load the Card data into your R
environment. The Card data set is named card in the wooldridge package.
(That’s card, all lower case. Case matters in R.) It’s generally not
necessary to explicitly load the data with the data() function after the
relevant package is attached, but it will be helpful with the project.
Question 1:
First, use the library() and data() functions to load the wooldridge
package and card data set.
There will be a few coding questions in the homework assignments that we
need to grade so that you are on track to continue. This is one of them. So,
before moving on make sure you click Submit Answer. If you have completed
the code chunk correctly, you will get a “Correct” response in a green-shaded
box below the chunk. Errors will be indicated in a red box.
Provenance
Before we look at the structure of the data, let’s do a little provenance
work. (Just a little.) Go to the paper linked above to answer the next six
questions.
Question 2:
Card obtained the data from the _____.
NLSYM
Correct!
R Code (Question 1):
library(wooldridge)
data(card)
I couldn’t have done it better myself. Correct!
Question 3:
The source of Card’s data is a survey that began in _____ with
_____ young men age 14-24.
1966, 5525
Correct!
Question 4:
The same young men were surveyed again in selected years
through _____, effectively creating a _____ data set where the
unit of observation is the person-_____.
1981, panel, year
Correct!
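To make the person-year idea concrete, here is a small sketch with made-up values: two hypothetical people observed in three hypothetical survey years. The ids, years, and educ values are illustrative assumptions, not NLSYM data.

```r
# Toy panel: each row is one person in one year (all values are made up).
panel <- data.frame(
  id   = c(1L, 1L, 1L, 2L, 2L, 2L),
  year = c(1966L, 1976L, 1981L, 1966L, 1976L, 1981L),
  educ = c(10L, 12L, 12L, 9L, 11L, 12L)
)

# In a panel, a row is identified by the (id, year) pair, not by id alone.
no_duplicate_person_years <- anyDuplicated(panel[, c("id", "year")]) == 0

# Keeping a single year leaves a cross-section with one row per person.
cross_1976 <- panel[panel$year == 1976L, ]
nrow(cross_1976)
```

Restricting to one survey year, as Card does with 1976, turns the person-year panel into a data set with one observation per person.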
Question 5:
The survey was not a random sample of the US population
because men from neighborhoods with a high concentration of
_____ residents were over-sampled.
non-white
Correct!
Question 6:
Card’s analysis is based on the 1976 survey when the youngest
respondents are _____. By 1976, attrition had reduced the
sample size to _____ observations. After filtering the sample on
observations with valid education and wage data, Card is left
with an analysis sample of _____ young men.
24, 3964, 97
Incorrect
Data Documentation and Structure
Now let’s turn to documentation and structure.
The wooldridge vignette
(https://cran.r-project.org/web/packages/wooldridge/wooldridge.pdf) provides
descriptions of the variables contained in the data set. Use the vignette to
answer the next few questions.
Question 7:
The key variable in the data set is _____.
id
Correct!
Question 8:
The wage variable is measured in _____. The lwage variable
is the _____ transformation of wage.
cents, log
Correct!
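A quick way to see the relationship between the two variables is to apply the transformation by hand. The sketch below uses three illustrative wage values in cents; 100 is the minimum of wage reported in the skim() output, and the other two are placeholders. The natural log of 100 rounds to 4.61, which matches the minimum reported for lwage.

```r
# Hourly wages in cents: 100 is the sample minimum from the skim() output,
# the other two values are placeholders for illustration.
wage_cents <- c(100, 548, 721)

# lwage is the natural log of wage.
lwage <- log(wage_cents)
round(lwage[1], 2)

# Exponentiating recovers the wage, and dividing by 100 converts to dollars.
wage_dollars <- exp(lwage) / 100
```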
Question 9:
The variable exper measures labor-market experience as ______.
education
Incorrect
The str() function provides an overview of the data type, size, and
content in a data set. Apply it to determine the structure of the card
data set and answer the questions that follow.
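For readers without the wooldridge package at hand, the sketch below builds a three-row toy data frame and inspects it the same way. The id and educ values mirror the first entries shown in the str(card) output; the lwage values are placeholders chosen for illustration.

```r
# Toy data frame: id and educ follow the first rows of str(card),
# lwage values are placeholders.
toy <- data.frame(
  id    = c(2L, 3L, 4L),
  educ  = c(7L, 12L, 12L),
  lwage = c(6.31, 6.18, 6.58)
)

# str() prints the object class, the dimensions, and each column's type
# together with its first few values.
str(toy)

# The same facts can be pulled out individually.
toy_dim   <- dim(toy)            # rows, then columns
toy_types <- sapply(toy, class)  # storage type of each column
```

The L suffix creates integer values, which is why id and educ are reported as int while lwage is num.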
Question 10:
The card data set contains _____ observations and _____ variables.
3010, 34
Correct!
R Code:
str(card)
'data.frame': 3010 obs. of 34 variables:
 $ id      : int 2 3 4 5 6 7 8 9 10 11 ...
 $ nearc2  : int 0 0 0 1 1 1 1 1 1 1 ...
 $ nearc4  : int 0 0 0 1 1 1 1 1 1 1 ...
 $ educ    : int 7 12 12 11 12 12 18 14 12 12 ...
 $ age     : int 29 27 34 27 34 26 33 29 28 29 ...
 $ fatheduc: int NA 8 14 11 8 9 14 14 12 12 ...
 $ motheduc: int NA 8 12 12 7 12 14 14 12 12 ...
 $ weight  : num 158413 380166 367470 380166 367470 ...
 $ momdad14: int 1 1 1 1 1 1 1 1 1 1 ...
 $ sinmom14: int 0 0 0 0 0 0 0 0 0 0 ...
 $ step14  : int 0 0 0 0 0 0 0 0 0 0 ...
 $ reg661  : int 1 1 1 0 0 0 0 0 0 0 ...
 $ reg662  : int 0 0 0 1 1 1 1 1 1 1 ...
 $ reg663  : int 0 0 0 0 0 0 0 0 0 0 ...
 $ reg664  : int 0 0 0 0 0 0 0 0 0 0 ...
 $ reg665  : int 0 0 0 0 0 0 0 0 0 0 ...
 $ reg666  : int 0 0 0 0 0 0 0 0 0 0 ...
 $ reg667  : int 0 0 0 0 0 0 0 0 0 0 ...
 $ reg668  : int 0 0 0 0 0 0 0 0 0 0 ...
 $ reg669  : int 0 0 0 0 0 0 0 0 0 0 ...
 $ south66 : int 0 0 0 0 0 0 0 0 0 0 ...
 $ black   : int 1 0 0 0 0 0 0 0 0 0 ...
 $ smsa    : int 1 1 1 1 1 1 1 1 1 1 ...
 $ south   : int 0 0 0 0 0 0 0 0 0 0 ...
 $ smsa66  : int 1 1 1 1 1 1 1 1 1 1 ...
Question 11:
What data type is lwage? _____. How about wage? ______.
(Use the full-name description of the data type in your answers.)
numeric, integer
Correct!
Question 12:
The third person in the data set is _____ years old, has _____
years of education, has _____ years of experience, and reported
a wage of $ ______ .
34, 12, 16, 7.21
Correct!
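Reading off one person's record comes down to row indexing plus a unit conversion. The sketch below rebuilds the first three rows as a toy data frame: age and educ follow the str(card) output, the third row matches the answers to Question 12 (with the wage stored in cents), and the exper and wage entries for the first two rows are placeholders.

```r
# Toy copy of the first three rows. Row 3 matches Question 12;
# exper and wage in rows 1 and 2 are placeholders.
people <- data.frame(
  age   = c(29L, 27L, 34L),
  educ  = c(7L, 12L, 12L),
  exper = c(16L, 9L, 16L),
  wage  = c(548L, 481L, 721L)   # hourly wage in cents
)

# Select the third row and all columns.
third <- people[3, ]

# Convert the wage from cents to dollars.
third_wage_dollars <- third$wage / 100
```

On the real data, the equivalent lookup would be card[3, c("age", "educ", "exper", "wage")].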
The skim() function provided by the skimr package is another useful
tool for data documentation. Load skimr via a library() command and
then “skim” the card data. Answer a few more questions based on the
skim() output.
R Code:
library(skimr)
skim(card)
Data summary
Name: card
Number of rows: 3010
Number of columns: 34
Column type frequency:
numeric: 34
Group variables: None
Variable type: numeric
skim_variable  n_missing  complete_rate       mean         sd        p0
id                     0           1.00    2581.75    1500.54      2.00
nearc2                 0           1.00       0.44       0.50      0.00
nearc4                 0           1.00       0.68       0.47      0.00
educ                   0           1.00      13.26       2.68      1.00
age                    0           1.00      28.12       3.14     24.00
fatheduc             690           0.77      10.00       3.72      0.00
motheduc             353           0.88      10.35       3.18      0.00
weight                 0           1.00  321185.26  170645.80  75607.00
momdad14               0           1.00       0.79       0.41      0.00
sinmom14               0           1.00       0.10       0.30      0.00
step14                 0           1.00       0.04       0.19      0.00
reg661                 0           1.00       0.05       0.21      0.00
reg662                 0           1.00       0.16       0.37      0.00
reg663                 0           1.00       0.20       0.40      0.00
reg664                 0           1.00       0.06       0.25      0.00
reg665                 0           1.00       0.21       0.41      0.00
reg666                 0           1.00       0.10       0.29      0.00
reg667                 0           1.00       0.11       0.31      0.00
reg668                 0           1.00       0.03       0.17      0.00
reg669                 0           1.00       0.09       0.29      0.00
south66                0           1.00       0.41       0.49      0.00
black                  0           1.00       0.23       0.42      0.00
smsa                   0           1.00       0.71       0.45      0.00
south                  0           1.00       0.40       0.49      0.00
smsa66                 0           1.00       0.65       0.48      0.00
wage                   0           1.00     577.28     262.96    100.00
enroll                 0           1.00       0.09       0.29      0.00
KWW                   47           0.98      33.54       8.61      4.00
IQ                   949           0.68     102.45      15.42     50.00
married                7           1.00       2.27       2.07      1.00
libcrd14              13           1.00       0.67       0.47      0.00
exper                  0           1.00       8.86       4.14      0.00
lwage                  0           1.00       6.26       0.44      4.61
expersq                0           1.00      95.58      84.62      0.00
Question 13:
How many variables have missing data? _____ .
6
Correct!
Question 14:
What percentage of young men in the sample are missing IQ
test scores? _____ % . (Answer to 1 decimal place, for example:
“99.9” percent)
31.5
Correct!
Question 15:
What percentage of the sample are Black? _____ %. Is that
representative of the US population in 1976? (Yes/No) _____ .
23, no
Correct!
Finally, use the object.size function to estimate the amount of memory
allocated to store the Card data.
Question 16:
Based on object.size, the Card data take up _____ MB in
memory. (Round to 3 digits)
0.438
Correct!
R Code:
object.size(card)
438416 bytes
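The 0.438 answer corresponds to dividing the reported byte count by one million. The short check below assumes that decimal definition of a megabyte; for comparison it also shows the binary convention (1024^2 bytes per megabyte), which gives a slightly smaller figure.

```r
# Byte count reported by object.size(card) above.
bytes <- 438416

# Decimal megabytes: 1 MB = 1,000,000 bytes.
mb_decimal <- round(bytes / 1e6, 3)

# Binary megabytes: 1 MB = 1024^2 bytes (shown only for comparison).
mb_binary <- round(bytes / 1024^2, 3)

c(decimal = mb_decimal, binary = mb_binary)
```

Which convention applies matters only in the third decimal place here, but it is worth stating when reporting memory use.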