HW3SOL

School: Texas A&M University
Course: 652
Subject: Sociology
Date: Apr 3, 2024

Homework 3

Problem 1. A sociologist is studying what factors may affect whether college students would support new laws that would make it a crime for students to purchase papers from the Internet and then turn in the papers as their own work. A random sample of 45 students at a large state university is interviewed and asked to provide a measure of their strength of support for criminalizing the purchase of term papers. A CRIME score is obtained from each student, with values ranging from 0 to 25, where 0 means totally opposed to criminal penalties and 25 means totally in favor of criminal penalties. The following explanatory variables were also obtained from each student: age of student (A), number of years of college (N), income of parents (I) (in 1,000s of dollars), and gender (G) (0=female). Run JMP to answer the following questions.

(1) Are there any collinearity problems based on the above data? Please look at the scatter plot matrix to answer this question. Yes.
Which pairs have statistically significant correlations? Be 95% confident.
Options: none of the pairs; college and income; all of the pairs; college and age; income and gender; age and gender; college and gender; age and income.
Answer: college and age; income and gender; college and income.

(2) Use the output from the complete regression to determine which explanatory variables should be included in the model.
Options: gender; income; college; age.
Answer: gender; age; income.
Student's grade, student's parents' working status, the type of courses that the student is taking.

(3) Use the output from a stepwise regression to determine which explanatory variables should be included in the model. Compare the results of your conclusions from the stepwise regression to your results from the complete regression. Are they different? No.

(4) Use the Cp value to determine which explanatory variables should be included in the model. Does it agree with the previous models? What is the corresponding Cp value in decision making? Yes. No.
0.9360. Yes.

Problem 2. The basic process of making paper has not changed in more than 2,000 years. It involves two stages: the breaking up of raw material in water to form a suspension of individual fibers, and the formation of felted sheets by spreading this suspension on a suitable porous surface, through which excess water can drain. Most paper is made from wood pulp that has been bleached with chlorine. This bleaching takes place for two reasons: to remove the last traces of a material called lignin from the raw pulp in order to make the paper stronger, and to create a brilliant white writing surface. Chlorine is an ideal chemical for these tasks, but unfortunately its use in paper mills also results in a wide variety of toxic substances being released into the environment. Studies have been conducted to determine which factors in the paper process are most highly correlated with the brightness of finished paper. The article "Advantages of CE-HDP bleaching for high brightness kraft pulp production," Tappi 47 (1964): 170A-175A, contains the following data on the variables: $y$ = brightness of finished paper, $x_1$ = hydrogen peroxide (% by weight), $x_2$ = sodium hydroxide (% by weight), $x_3$ = silicate (% by weight), $x_4$ = process temperature (in °F). There were 32 runs in the study.

(1) Use scatter plots and VIF to determine if there is evidence of collinearity in the explanatory variables.
(a) Is there evidence of collinearity according to the scatter plots? No.
(b) Is there evidence of collinearity according to the VIF? No.

(2) Use a variable selection procedure with maximum R square as the criterion to formulate a model. Which variables are in the model? $x_1$; $x_2$; $x_3$; $x_4$.

(3) Fit the model with all the given independent variables.
(a) What are the coefficients in the regression model $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4$? $\beta_0 = 80.42675$, $\beta_1 = 6.1811694$, $\beta_2 = 2.902164$, $\beta_3 = 0.1162983$, $\beta_4 = 0.0177532$.
(b) Is there evidence in the residual plots of a violation of the constant variance condition? No.

Problem 3. The cotton aphid is pale to dark green in cool seasons and yellow in hot, dry summers. Generally distributed throughout temperate, subtropic, and tropic zones, the cotton aphid occurs in all cotton-producing areas of the world. These insects congregate on lower leaf surfaces and on terminal buds, extracting plant sap. If weather is cool during the spring, populations of natural enemies will be slow in building up and heavy infestations of aphids may result. When this occurs, leaves begin to curl and pucker; seedling plants become stunted and may die. Most aphid damage is of this type. If honeydew resulting from late-season aphid infestations falls onto open cotton, it can act as a growing medium for sooty mold. Cotton stained by this black fungus is reduced in quality and brings a low price for the grower. Entomologists studied the aphids to determine weather conditions that may result in increased aphid density on cotton plants. The following data were reported in Statistics and Data Analysis (2005) by Peck, Olsen, and Devore and come from an extensive study as reported in the article "Estimation of the economic threshold of infestation for cotton aphid," Mesopotamia Journal of Agriculture (1982): 71-75. In the data, $y$ = infestation rate (aphids/100
leaves), $x_1$ = mean temperature (centigrade), $x_2$ = mean relative humidity. Run JMP to answer the following questions.

(1) Fit the linear model with both independent variables.
(a) What are the coefficients of the regression model $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$? $\beta_0 = 20.002552$, $\beta_1 = 0.290153$, $\beta_2 = 1.4009541$.
(b) Is the model significant? Be 95% confident. Yes.
(c) Now look at the individual parameters.
i. Is $x_1$ a significant predictor? No.
ii. Is $x_2$ a significant predictor? Yes.
iii. Is there a collinearity problem according to VIF? No.

(2) Fit the following new model, $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1^2 + \beta_4 x_2^2 + \beta_5 x_1 x_2 + \varepsilon$, to the aphid data. Compare the linear model (reduced model) with the new model (complete model).
(a) What are the values of the test statistic and the p-value? Test statistic = 1.64179003770589, p-value = 0.20137.
(b) Did you reject $H_0$? Be 95% confident. No.
(c) Which model has a higher adjusted R square? $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1^2 + \beta_4 x_2^2 + \beta_5 x_1 x_2 + \varepsilon$.

(4) Calculate a new response variable $ty = \log(y)$, the natural logarithm of the aphid count. Fit the new model, $ty = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$, to the aphid data. Compare the fit in this log model to the previous new model.
(a) Which has the highest adjusted R square?
Options: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$; $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1^2 + \beta_4 x_2^2 + \beta_5 x_1 x_2 + \varepsilon$; $ty = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$.
Answer: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1^2 + \beta_4 x_2^2 + \beta_5 x_1 x_2 + \varepsilon$.
(b) Which has the lowest RMSE?
Options: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$; $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1^2 + \beta_4 x_2^2 + \beta_5 x_1 x_2 + \varepsilon$; $ty = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$.
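Answer: $ty = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$.
(c) Which has the lowest PRESS?
Options: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$; $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1^2 + \beta_4 x_2^2 + \beta_5 x_1 x_2 + \varepsilon$; $ty = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$.
Answer: $ty = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$.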
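
The VIF checks asked for in Problems 2 and 3 were run in JMP. For readers who want to see what the number measures, here is a minimal Python sketch, assuming NumPy is available. The function name `vif` and the predictors are hypothetical and synthetic; they are not the homework data. Each predictor is regressed on the others, and its VIF is 1 / (1 - R²) from that regression.

```python
import numpy as np


def vif(X):
    """Variance inflation factor for each column of X (n rows, p predictors)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    factors = []
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress predictor j on an intercept plus all other predictors.
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        ss_res = resid @ resid
        ss_tot = np.sum((target - target.mean()) ** 2)
        r2 = 1.0 - ss_res / ss_tot
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)


# Synthetic predictors for illustration only (NOT the homework data).
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + 0.1 * rng.normal(size=200)  # deliberately almost a copy of x1

print("independent pair:", vif(np.column_stack([x1, x2])))
print("with near-duplicate:", vif(np.column_stack([x1, x2, x3])))
```

In this sketch the unrelated pair should give values close to 1, while `x1` and `x3` should give large values. A common rule of thumb treats a VIF above about 10 as a sign of a collinearity problem.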
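
The model comparisons in Problem 3 (the reduced-versus-complete F test, adjusted R square, RMSE, and PRESS) can be sketched outside JMP as follows, assuming NumPy and SciPy are available. The helper names `fit_ols` and `partial_f_test` are hypothetical, and the data are synthetic stand-ins, not the aphid data, so the printed numbers will not match the answers above.

```python
import numpy as np
from scipy import stats


def fit_ols(X, y):
    """Least-squares fit with an intercept; returns summary quantities."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    k = A.shape[1]  # number of parameters, including the intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    sse = float(resid @ resid)
    sst = float(np.sum((y - y.mean()) ** 2))
    # Diagonal of the hat matrix A (A'A)^-1 A', i.e. the leverages.
    leverage = np.einsum("ij,ji->i", A, np.linalg.pinv(A))
    # PRESS: sum of squared leave-one-out prediction errors.
    press = float(np.sum((resid / (1.0 - leverage)) ** 2))
    return {
        "coef": coef,
        "sse": sse,
        "df_resid": n - k,
        "adj_r2": 1.0 - (sse / (n - k)) / (sst / (n - 1)),
        "rmse": float(np.sqrt(sse / (n - k))),
        "press": press,
    }


def partial_f_test(reduced, complete):
    """F test of H0: all extra terms in the complete model are zero."""
    df_num = reduced["df_resid"] - complete["df_resid"]
    df_den = complete["df_resid"]
    f = ((reduced["sse"] - complete["sse"]) / df_num) / (complete["sse"] / df_den)
    return f, float(stats.f.sf(f, df_num, df_den))


# Synthetic data for illustration only (NOT the aphid data).
rng = np.random.default_rng(1)
n = 40
x1 = rng.uniform(20, 35, n)   # stand-in for mean temperature
x2 = rng.uniform(20, 90, n)   # stand-in for mean relative humidity
y = 30 + 0.5 * x1 + 1.2 * x2 + rng.normal(0, 10, n)

X_lin = np.column_stack([x1, x2])
X_quad = np.column_stack([x1, x2, x1**2, x2**2, x1 * x2])

reduced = fit_ols(X_lin, y)           # y on x1, x2
complete = fit_ols(X_quad, y)         # adds x1^2, x2^2, x1*x2
log_fit = fit_ols(X_lin, np.log(y))   # log(y) on x1, x2

f_stat, p_value = partial_f_test(reduced, complete)
print(f"F = {f_stat:.4f}, p = {p_value:.4f}")
for name, m in [("linear", reduced), ("quadratic", complete), ("log", log_fit)]:
    print(name, round(m["adj_r2"], 4), round(m["rmse"], 4), round(m["press"], 4))
```

One caution when reading such a table: the RMSE and PRESS of the log model are measured in log units, so they are not on the same scale as the values from the two models fitted to $y$ itself.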