Assignment 1

School

McGill University *

Course

665

Subject

Economics

Date

Feb 20, 2024

Pages

16

Uploaded by SuperBookPigeon37

ECON665 Assignment 1

A. Household characteristics (15%)

Look at how different the household characteristics are between the participants and nonparticipants of microfinance programs. Open hh_98.dta, which consists of household-level variables. Please fill in the following table.

                                               Full sample             Female participants     Households without female participants
                                               Mean        Std. Dev.   Mean        Std. Dev.   Mean        Std. Dev.
Average household size                         5.300266    2.20517     5.205042    2.033799    5.406367    2.379091
Average household assets                       155576.4    849719.9    75395.11    172979.1    244917      1216354
Average household landholding                  76.83225    204.0166    42.35855    79.51883    115.244     279.7059
Average age of household head                  46.0124     12.67865    46.0958     11.77622    45.91948    13.62452
Average years of education of household head   2.317095    3.47617     1.752941    3.015304    2.945693    3.832669
Percentage of households with male head        .9078831    .2893191    .9008403    .2991277    .9157303    .2780523

Are the sampled households very different among the full sample, participants, and nonparticipants?

It depends on which variable we look at. Household size (famsize), age of household head (agehead), and the percentage of households with a male head are not very different among the full sample, households with female participants, and households without female participants. Household assets (hhasset) and household landholding (hhland) differ widely: households without female participants hold on average about three times the assets and landholding of households with female participants. Years of education of the household head (educhead) differ only modestly in absolute terms (about 1.2 years).

Is it possible that the gender of household heads may also affect household characteristics? Check by filling in the following table.

                                  Male-headed households   Female-headed households
                                  Mean        Std. Dev.    Mean        Std. Dev.
Average household size            5.413659    2.146629     4.182692    2.460422
Average years of head schooling   2.466341    3.539825     .8461538    2.314032
Average head age                  45.63024    12.71268     49.77885    11.74482
Average household assets          159943.1    888157       112539.8    250752.1
Average household landholding     76.69499    204.8297     78.1851     196.7733
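As a consistency check on the first table in Part A: a full-sample mean has to be a weighted average of the two subgroup means, so the share of households with female participants can be backed out from any row. The sketch below is only an arithmetic check on the reported means. The function name `implied_share` is made up for illustration, and the result is an implied share, not a count taken from hh_98.dta.

```python
# Means copied from the first table in Part A, in the order
# (full sample, female participants, households without female participants).
means = {
    "famsize": (5.300266, 5.205042, 5.406367),
    "agehead": (46.0124, 46.0958, 45.91948),
    "hhasset": (155576.4, 75395.11, 244917.0),
    "hhland": (76.83225, 42.35855, 115.244),
}


def implied_share(full, part, nonpart):
    """Solve full = w * part + (1 - w) * nonpart for the participant share w.

    Hypothetical helper for illustration; it is not part of the assignment.
    """
    return (full - nonpart) / (part - nonpart)


shares = {name: implied_share(*vals) for name, vals in means.items()}
for name, w in shares.items():
    print(f"{name}: implied share of female-participant households = {w:.3f}")
```

If the rows are internally consistent, every variable should give roughly the same share. A row that gives a very different value would point to a typo or to a different sample being used for that row.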
Are the sampled households headed by males very different from those headed by females?

Yes. The means differ by gender for every variable, although by very different amounts. We can hypothesize that male-headed households tend to have a larger average household size, more years of head schooling, and higher average household assets. Female-headed households, on the other hand, tend to have a higher average head age and slightly higher average household landholding.

B. Village characteristics (10%)

Please find information on:

                                       Mean        Std. Dev.
If village is accessible by road       .8352524    .371117
Percentage of village land irrigated   .5603686    .3320238

C. Prices (15%)

             Full sample             Participants            Non-participants
             Mean        Std. Dev.   Mean        Std. Dev.   Mean        Std. Dev.
Rice         10.28298    1.566328    10.31295    1.598671    10.23062    1.508652
Wheat        7.466872    .8467278    7.45622     .8090694    7.485481    .9095028
Edible oil   39.40337    4.008882    39.47184    4.141778    39.28375    3.76743
Milk         10.89583    3.381805    10.93529    3.382613    10.8269     3.383408
Potato       6.958305    1.059905    6.978586    1.081791    6.922874    1.020877

D. Expenditure (20%)

The data set has household-level consumption expenditure information. Please look at the consumption patterns.

                                      Per capita expenditure   Per capita food expenditure   Per capita non-food expenditure
                                      Mean        Std. Dev.    Mean        Std. Dev.         Mean        Std. Dev.
By head gender                        5473.268    4140.221     3660.191    1558.638          1813.078    3316.891
  Male-headed households              5442.213    4041.49      3658.636    1554.719          1783.577    3246.287
  Female-headed households            5779.345    5023.394     3675.514    1604.364          2103.831    3952.394
By head education level               5473.268    4140.221     3660.191    1558.638          1813.078    3316.891
  Head has some education             6603.109    4917.749     4150.296    1736.203          2452.813    4054.352
  Head has no education               4676.236    3265.596     3314.451    1315.732          1361.784    2587.073
By household size                     5473.268    4140.221     3660.191    1558.638          1813.078    3316.891
  Large household (>5)                5089.209    3414.951     3424.852    1393.446          1664.357    2579.855
  Small household (<= 5)              5738.315    4557.831     3822.602    1644.572          1915.713    3740.287
By land ownership                     5473.268    4140.221     3660.191    1558.638          1813.078    3316.891
  Large land ownership (>50/person)   8335.404    6243.355     5171.166    2551.548          3164.238    5285.045
  Small land ownership or landless    5260.855    3860.389     3548.053    1396.497          1712.801    3102.424

                                  Full sample             Female participants     Households without female participants
                                  Mean        Std. Dev.   Mean        Std. Dev.   Mean        Std. Dev.
Per capita expenditure            5473.268    4140.221    5439.074    4118.625    5511.369    4167.686
Per capita food expenditure       3660.191    1558.638    3627.243    1371.99     3696.901    1743.831
Per capita non-food expenditure   1813.078    3316.891    1811.831    3397.786    1814.467    3227.547

Please summarize your findings on per capita expenditure comparison. Any particular insight?

- By head gender, there is not much difference in per capita total, food, and non-food expenditure among male-headed households, female-headed households, and the full sample.
- By head education level, there is a clear difference in per capita total, food, and non-food expenditure among the some-education group, the no-education group, and the full sample. When the household head has some education (years of education > 0), expenditure is higher in all 3 categories.
- By household size, there is not much difference in per capita total, food, and non-food expenditure among large households (famsize > 5), small households (famsize <= 5), and the full sample.
- By land ownership, there is a clear difference in per capita total, food, and non-food expenditure among households with large land ownership (hhland/famsize > 50), households with small land ownership or no land (hhland/famsize <= 50), and the full sample. When each person owns more than 50 units of land, the household also spends more in all 3 categories.
- There is almost no difference in expenditure between households with and without female participants.

E. Statistical Analysis (20%)

Use these commands to support the descriptive analysis that you provided in the previous sections. In using these commands, be careful to understand what the testing entails: i.e. what the null hypothesis is, what the p-value reveals, and how the tests and graphs support (or not) the initial analysis you provided previously.

1. Finding the correlation of different explanatory factors to microfinance participants

- We first created a new binary variable, participant, as the logical OR of the male-participant and female-participant indicators: 0 = the household has no microfinance participants, 1 = the household has at least one microfinance participant.
- We look at the descriptive statistics of demographic factors such as agehead, sexhead, educhead, and famsize, for which Part A shows little difference between households with female participants, households without female participants, and the full sample.
  ⇒ We see the same pattern here: these 4 factors are not strongly associated with the dummy variable participant.
- We look at the other factors that Part A shows to vary more strongly between sampled households: hhland and hhasset.
  ⇒ We see the same pattern here: hhland and hhasset differ widely between households that have and households that do not have microfinance participants.
  ⇒ Hypothesis 1: hhland and hhasset are negatively correlated with the decision to participate in microfinance.
- To test this hypothesis, we use a logistic regression model with y = participant, x1 = hhland, x2 = hhasset.
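When reading the output of a logit model like this one, note that each coefficient is a change in log odds per one unit of its own variable, and hhland and hhasset are measured in very different units. One rough way to put them on a common scale is to multiply each coefficient by the standard deviation of its variable. The sketch below does this with the two coefficient estimates reported for this model and the full-sample standard deviations from Part A. Using full-sample standard deviations is an assumption made here for illustration, and `per_sd_effect` is a made-up helper name.

```python
import math

# Coefficient estimates reported for the logit of participant on hhland and hhasset.
coef = {"hhland": -0.0029013, "hhasset": -1.32e-06}
# Full-sample standard deviations from the first table in Part A
# (assumption: these are a reasonable scale for the comparison).
sd = {"hhland": 204.0166, "hhasset": 849719.9}


def per_sd_effect(name):
    """Change in log odds for a one-standard-deviation increase in the variable."""
    return coef[name] * sd[name]


for name in coef:
    delta = per_sd_effect(name)
    print(
        f"{name}: per-unit odds ratio = {math.exp(coef[name]):.6f}, "
        f"per-SD change in log odds = {delta:.3f}, "
        f"per-SD odds ratio = {math.exp(delta):.3f}"
    )
```

On this scale, a one-standard-deviation increase lowers the log odds by about 0.59 for hhland and about 1.12 for hhasset. The ranking of the two raw coefficients therefore does not carry over once units are taken into account.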
  + The model has a likelihood ratio chi-square of 86.78 with 2 degrees of freedom. The probability of obtaining a chi-square statistic at least this large if the null hypothesis (i.e., the independent variables have no effect) is true is very small, so the model as a whole is statistically significant.
  + Both hhland and hhasset have negative coefficients. This is consistent with our hypothesis that they are negatively correlated with the response variable participant. We would expect a decrease of 0.0029013 in the log odds of participation for every one-unit increase in hhland, holding all other variables in the model constant, and a decrease of 1.32e-06 in the log odds of participation for every one-unit increase in hhasset. Because hhland and hhasset are measured in different units, the larger raw coefficient on hhland does not by itself show that hhland has the larger impact on the response.
  + Both hhland and hhasset have very small p-values, indicating statistical significance. We can reject the null hypothesis and say that the coefficients are significantly different from 0.

2. Finding the correlation of microfinance participant factors to household characteristics

⇒ Hypothesis 2: Households that have microfinance participants have lower landholding and assets compared to households that don't have microfinance participants.
  + The hhland-participant model has a low R-squared value of 4.3%, indicating that 4.3% of the variance in hhland can be explained by the participant factor. The p-value associated with the F statistic is very small (0.0000), indicating that participant is statistically significant in predicting hhland. Similar observations can be made for the hhasset-participant model.
  + Looking at the coefficient itself, the negative coefficient of participant is consistent with our hypothesis that households with participants have lower hhland and hhasset. The p-value is very small, so we can reject the null hypothesis and say that the coefficient of participant is significantly different from 0.

⇒ Hypothesis 3: Households that have female microfinance participants have lower landholding and assets compared to households that don't have female microfinance participants.
  + The hhland-female participant model has a low R-squared value of 3.18%, indicating that 3.18% of the variance in hhland can be explained by the female participant factor. The p-value associated with the F statistic is very small (0.0000), indicating that female participant is statistically significant in predicting hhland. Similar observations can be made for the hhasset-female participant model.
  + Looking at the coefficient itself, the negative coefficient of female participant is consistent with our hypothesis that households with female participants have lower hhland and hhasset. The p-value is very small, so we can reject the null hypothesis and say that the coefficient of female participant is significantly different from 0.
  + We can also compare the quality of fit of the 2 sets of models, those using the female participant factor and those using the total participant factor. Because the models using participant have higher R-squared and adjusted R-squared values, participant is the better explanatory factor for hhland and hhasset.

3. Finding the correlation of gender of household heads to household characteristics
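Each model in this part regresses an outcome on the single dummy sexhead. In that setting the OLS slope is just the mean of the group coded 1 minus the mean of the group coded 0, so the sign and size of each coefficient can be anticipated from the male-headed and female-headed means in Part A. The sketch below assumes sexhead = 1 for male-headed households; that coding is an assumption, although it agrees with the coefficient signs reported in this part. `dummy_slope` is a made-up helper name.

```python
# Group means from the second table in Part A, in the order
# (male-headed households, female-headed households).
group_means = {
    "famsize": (5.413659, 4.182692),
    "educhead": (2.466341, 0.8461538),
    "agehead": (45.63024, 49.77885),
    "hhasset": (159943.1, 112539.8),
    "hhland": (76.69499, 78.1851),
}


def dummy_slope(male_mean, female_mean):
    """OLS slope on a 0/1 dummy: mean of group coded 1 minus mean of group coded 0.

    Assumption: sexhead = 1 means male-headed.
    """
    return male_mean - female_mean


slopes = {name: dummy_slope(*m) for name, m in group_means.items()}
for name, b in slopes.items():
    print(f"{name} on sexhead: expected slope = {b:.4f}")
```

The expected slopes are positive for famsize, educhead, and hhasset and negative for agehead and hhland. The differences in means say nothing about significance, which still depends on the standard errors.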
⇒ Hypothesis 4: Households led by males have a larger average household size, more years of head schooling, and higher average household assets than female-led households.
  + The famsize-sexhead model has a low R-squared value of 2.61%, indicating that 2.61% of the variance in famsize can be explained by the gender of the household head. The p-value associated with the F statistic is very small (0.0000), indicating that the gender of the household head is statistically significant in predicting household size. Similar observations can be made for the educhead-sexhead model.
  + Looking at the coefficient itself, the positive coefficient of sexhead is consistent with our hypothesis that households led by males have higher famsize and educhead. The p-value is very small, so we can reject the null hypothesis and say that the coefficient of sexhead is significantly different from 0.
  + The hhasset-sexhead model has a low R-squared value of 0.03%, indicating that 0.03% of the variance in household assets can be explained by the gender of the household head. The p-value associated with the F statistic is large (0.588), indicating that the gender of the household head is not statistically significant in predicting household assets.
  + The coefficient of sexhead is positive, but its p-value indicates statistical insignificance. We fail to reject the null hypothesis that the coefficient of sexhead is 0. A relationship between the gender of the household head and household assets is not supported by the data.

⇒ Hypothesis 5: Households led by males have a lower average head age and lower landholding.
  + The agehead-sexhead model has a low R-squared value of 0.9%, indicating that 0.9% of the variance in age of household head can be explained by the gender of the household head. The p-value associated with the F statistic is very small (0.0000), indicating that the gender of the household head is statistically significant in predicting agehead.
  + Looking at the coefficient itself, the negative coefficient of sexhead is consistent with our hypothesis that households led by males have a lower agehead. The p-value is very small, so we
can reject the null hypothesis and say that the coefficient of sexhead is significantly different from 0.
  + The hhland-sexhead model has an R-squared value of 0%, indicating that essentially none of the variance in household landholding can be explained by the gender of the household head. The p-value associated with the F statistic is very large (0.9435), indicating that the gender of the household head is not statistically significant in predicting household landholding.
  + The coefficient of sexhead is negative, but its p-value indicates statistical insignificance. We fail to reject the null hypothesis that the coefficient of sexhead is 0. A relationship between the gender of the household head and household landholding is not supported by the data.

4. Finding the correlation of different dummy variables to household expenditure

⇒ Hypothesis 6: Households whose head has some education have higher total, food, and non-food expenditure.
  + The exptot-edu model has a low R-squared value of 0.53%, indicating that 0.53% of the variance in total expenditure can be explained by the education factor. The p-value associated with the F statistic is very small (0.0000), indicating that education is statistically significant in predicting total expenditure. Similar patterns can be observed for the food and non-food expenditure models.
  + Looking at the coefficient itself, the positive coefficient of edu is consistent with our hypothesis that households whose head has some level of education (educhead > 0) have higher total, food, and non-food expenditure. The p-value is very small, so we can reject the null hypothesis and say that the coefficient of edu is significantly different from 0.

⇒ Hypothesis 7: Households with higher per capita land ownership spend more in total, on food, and on non-food items.
  + The exptot-land model has a low R-squared value of 0.35%, indicating that 0.35% of the variance in total expenditure can be explained by the per capita land ownership factor. The p-value associated with the F statistic is very small (0.0000), indicating that per capita land ownership is statistically significant in predicting total expenditure. Similar patterns can be observed for the food and non-food expenditure models.
  + Looking at the coefficient itself, the positive coefficient of land is consistent with our hypothesis that households with higher land ownership per capita (hhland/famsize > 50) have higher total, food, and non-food expenditure. The p-value is very small, so we can reject the null hypothesis and say that the coefficient of land is significantly different from 0.

F. Another look at Exploratory Data Analysis (20%)

For this question, you should go on the Mycourses page and download two data sets from the Gabor book: hotels-europe_price.dta and hotels-europe_features.dta. You may remember the example on hotels in Vienna from the Gabor book in Chapter 3 that looked at the ratings and the prices of hotel rooms in Vienna. This question asks you to describe the relationships between the star ratings of hotels and the prices they charge for the rooms in two different cities: Rome and Paris. In other words, do something like case study 3.B1 for both the prices and the star ratings for the hotels in both Rome and Paris and present your descriptive analysis, using some of the EDA principles discussed in class and in Chapter 3 of the Gabor book.

- We first merge the 2 data sets using the hotel_id variable.
- We then look at the star ratings of hotels in Rome and Paris.
  + We can see that the lowest hotel star rating is 2 in Rome and 1.5 in Paris. The max, mean, and SD values of star ratings in the 2 cities are similar.
  + Looking at the histogram, the distribution of star ratings in both cities is skewed left, meaning there are more hotels with ratings >= 3.5. The skewness is more pronounced in Paris, meaning that even though Paris has hotels with a lower star rating, the number of hotels with a star rating <= the mean value is smaller than in Rome.
- We then look at the price of hotels in Rome and Paris.
The descriptive statistics of price have a large range and SD because of the weekend and number-of-nights factors, so I filter on these 2 and only look at weekdays and nnights = 1.
We also see that after filtering out weekend data, there are no 2018 price records for hotels in the 2 cities.
  + We can see that the weekday price per night of hotels in 2017 has lower summary statistics in Rome than in Paris, indicating a lower price range with less variability.
  + Looking at the histogram, the price distribution in both cities is strongly right-skewed. However, Paris has higher variability in price, shown by a higher SD value as well as a larger share of hotels priced around the mean value.
- We take a closer look at the relationship between weekday price and star ratings of hotels in Rome and Paris.
  + We can see that for hotels of 1.5 and 2 stars, both cities have a similar price distribution. The higher the star rating, the more right-skewed the distribution of price in both cities. Paris has a more extreme right skew than Rome. For both cities, the most right-skewed distribution is in the 4.5-star hotel group.
  + In Rome, the minimum hotel price stays the same regardless of star rating for hotels of <= 4.5 stars, while in Paris the minimum price tends to increase with star rating. The maximum price in both cities lies in the 4.5-star hotel group.
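The merge, filter, and group steps described in this part can be sketched as follows. This is a minimal standard-library illustration, not the commands used for the assignment. The rows are made up and are not values from the Gabor data sets, and the column names city, stars, price, and weekend are assumptions; only hotel_id and nnights are named in the text above.

```python
from collections import defaultdict
from statistics import mean, median

# Made-up rows for illustration only; NOT values from the Gabor data sets.
features = [
    {"hotel_id": 1, "city": "Rome", "stars": 3.0},
    {"hotel_id": 2, "city": "Rome", "stars": 4.0},
    {"hotel_id": 3, "city": "Paris", "stars": 3.0},
    {"hotel_id": 4, "city": "Paris", "stars": 4.0},
]
prices = [
    {"hotel_id": 1, "price": 80, "weekend": 0, "nnights": 1},
    {"hotel_id": 1, "price": 300, "weekend": 1, "nnights": 4},
    {"hotel_id": 2, "price": 120, "weekend": 0, "nnights": 1},
    {"hotel_id": 3, "price": 100, "weekend": 0, "nnights": 1},
    {"hotel_id": 4, "price": 150, "weekend": 0, "nnights": 1},
    {"hotel_id": 4, "price": 250, "weekend": 0, "nnights": 1},
]


def weekday_one_night_prices(features, prices):
    """Merge on hotel_id, keep weekday one-night offers, group by (city, stars)."""
    by_id = {row["hotel_id"]: row for row in features}
    groups = defaultdict(list)
    for offer in prices:
        hotel = by_id.get(offer["hotel_id"])
        if hotel is None:
            continue  # price row without a matching features row
        if offer["weekend"] == 0 and offer["nnights"] == 1:
            groups[(hotel["city"], hotel["stars"])].append(offer["price"])
    return groups


groups = weekday_one_night_prices(features, prices)
summary = {key: (mean(vals), median(vals)) for key, vals in groups.items()}
for (city, stars), (avg, med) in sorted(summary.items()):
    print(f"{city} {stars}-star: mean = {avg}, median = {med}")
```

Comparing the mean with the median within each city and star group is a quick numeric companion to the histograms: a mean well above the median is a common sign of a right-skewed price distribution.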