IDS 270 CH2 Lab

School

University of Illinois, Chicago

Course

270

Subject

Statistics

Date

Feb 20, 2024

Uploaded by DeaconTeam13422
Lab CH2: Introduction to Least-Squares Regression

The data set CSDATA used for this assignment contains information about all 224 students who entered a large university in a single year and who planned to major in computer science. We are interested in predicting GPA (grade point average) after three semesters of college from information available before the student enters college.

In this assignment you will look at a number of potential explanatory variables for predicting a college student’s GPA. You will use the variables HSM (High School Math), HSS (High School Science), HSE (High School English), SATM (SAT Math score), SATV (SAT Verbal score), and SEX to predict the response variable GPA.

First, generate the correlation coefficients between all pairs of variables and, on the basis of sign and magnitude, determine the direction and strength of each linear relationship. Next, produce a scatterplot of the response variable (GPA) against each of the predictor variables. Examine each scatterplot for extreme values (outliers) and discuss them. You should also produce scatterplots between the predictor variables to display the relationships among them. Once you have the scatterplots, examine them to determine which relationships look most promising; the most promising relationships have the data clustered closely about a potential straight line. Finally, determine the form of the linear relationship between the response variable and each predictor variable in turn. Produce a business report covering your analysis and conclusions, using the software results as supporting evidence.

Time

The lab exercise takes approximately 2-3 hours of dedicated time to complete.

Objectives

Upon completing this assignment, you should be able to:
  • Construct a regression analysis using a single predictor variable based on a business problem statement.
  • Identify the response (dependent) variable and the explanatory (predictor, independent) variable, and differentiate between quantitative and categorical variables in the regression model.
  • Perform preliminary analysis, including scatterplots, descriptive statistics, and correlation analysis, to visualize the data and determine whether a linear relationship exists.
  • Compute the linear regression model parameters using software, and relate the software results to the material presented in the book.
  • Use your knowledge of the t-test statistic and its associated p-value to examine the model parameters (intercept and slope) and decide whether they are statistically significant.
  • Use the model to predict.
  • Prepare a business report presenting the conclusions of the analysis, based on your statistical results.

You are not required to submit a report, but creating one is highly recommended.

Instructions

Download the needed data set.
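The correlation, least-squares, and t-statistic computations named in the objectives can be sketched in a few lines of code. The lab itself uses Excel, so the Python below is only an illustrative stand-in, not the course's method. The data are six made-up (x, y) pairs, not values from CSDATA, and the function name `fit_simple_regression` is hypothetical. The formulas are the standard ones for simple linear regression: slope = Sxy/Sxx, intercept = ȳ − slope·x̄, and t = slope / SE(slope) with n − 2 degrees of freedom.

```python
import math

# Hypothetical toy data, NOT taken from CSDATA: x plays the role of a
# predictor such as HSM, y plays the role of the response GPA.
x = [4.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y = [1.8, 2.4, 2.5, 3.0, 3.1, 3.6]


def fit_simple_regression(x, y):
    """Return correlation, slope, intercept, SE of slope, and t statistic."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Sums of squares and cross-products about the means.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    r = sxy / math.sqrt(sxx * syy)      # Pearson correlation coefficient
    slope = sxy / sxx                   # least-squares slope
    intercept = y_bar - slope * x_bar   # least-squares intercept

    # Residual standard error uses n - 2 degrees of freedom.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    se_slope = s / math.sqrt(sxx)
    t_slope = slope / se_slope          # compare with a t(n - 2) distribution

    return {"r": r, "slope": slope, "intercept": intercept,
            "se_slope": se_slope, "t_slope": t_slope, "df": n - 2}


result = fit_simple_regression(x, y)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

The sign of `r` gives the direction of the linear relationship and its magnitude gives the strength. The p-value for `t_slope` is left to the statistical software (or a t table with `df` degrees of freedom), since the Python standard library has no t-distribution function.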
TASK 1: Identify the types of variables (quantitative or categorical).

TASK 2: Generate correlations between all quantitative variables. On the basis of sign and magnitude, determine the direction and strength of each linear relationship obtained. Discuss the two best predictor variables for GPA. Note that the correlations between the response variable (GPA) and the predictor variables (all other variables except obs and the categorical variable(s)) will help you identify the most promising predictors. The correlations between the predictor variables will help you identify highly correlated predictor variables.

TASK 3: Produce scatterplots with GPA as the response variable and each of the two best predictor variables (each scatterplot should have the least-squares line added). Discuss how well each of these variables predicts GPA. Also discuss whether the scatterplots contain unusual observations.

TASK 4: Perform regression analyses to obtain two linear models for the response variable (GPA), one for each of the two best predictor variables. Examine each of the regression results, including the model parameters (intercept and slope), the standard error of each parameter, the t-test statistic, and the associated p-value. Note that this is the same concept we used in inference using the t-test statistic with one population. Interpret these results accordingly to understand the statistical significance of the model parameters.

TASK 5: To gain a deeper understanding of the process, use one of the linear models (i.e., the model from one of the two best predictors) with each value of the related predictor variable to generate model-predicted responses. Excel will automatically create a new column to record the model-predicted response for each observation (each row). Rename this new column to PGPA. Excel also calculates the error for each observation, which is the difference between an observed response (the GPA value) and a model-predicted response (the PGPA value). Excel will automatically create another new column to hold the standardized errors for all observations. Rename this column to Standardized Errors. Generate descriptive statistics for either the Residuals/Errors or the standardized Residuals/Errors, and you will notice that the mean is zero (or very close to zero due to rounding). Plot the histogram of the Errors and examine its shape.

TASK 6: Prepare a business report presenting the conclusions of the analysis, incorporating software results as evidence. Summarize your results using the measures learned this week as a guide. You are not required to submit this report, but creating one is highly recommended.
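The predicted-value and error columns described in Task 5 can be reproduced by hand to see why the errors average to zero. The sketch below is in Python rather than Excel and uses made-up numbers, not CSDATA. The names `pgpa`, `errors`, and `standardized_errors` simply mirror the column names in the task. Standardizing by the sample standard deviation of the errors is an assumption about the scaling and may not match Excel's output exactly.

```python
import math
from collections import Counter

# Hypothetical toy data, NOT taken from CSDATA.
hsm = [4.0, 6.0, 7.0, 8.0, 9.0, 10.0]   # predictor values
gpa = [1.8, 2.4, 2.5, 3.0, 3.1, 3.6]    # observed responses
n = len(gpa)

# Least-squares intercept and slope for GPA on the predictor.
x_bar = sum(hsm) / n
y_bar = sum(gpa) / n
sxx = sum((xi - x_bar) ** 2 for xi in hsm)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(hsm, gpa))
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# PGPA: the model-predicted response for every observation.
pgpa = [intercept + slope * xi for xi in hsm]

# Error (residual) = observed response - predicted response.
errors = [yi - pi for yi, pi in zip(gpa, pgpa)]
mean_error = sum(errors) / n

# Assumed scaling: divide each error by the sample standard deviation
# of the errors (n - 1 in the denominator).
sd_error = math.sqrt(sum((e - mean_error) ** 2 for e in errors) / (n - 1))
standardized_errors = [e / sd_error for e in errors]

print(f"mean of errors: {mean_error:.2e}")

# Crude text histogram: bins of width 1 on the standardized scale.
counts = Counter(math.floor(e) for e in standardized_errors)
for left_edge in sorted(counts):
    print(f"[{left_edge:+d}, {left_edge + 1:+d}): {'*' * counts[left_edge]}")
```

The mean of the errors is zero up to floating-point rounding because a least-squares line with an intercept always passes through the point (x̄, ȳ). A roughly symmetric, bell-shaped histogram is the pattern to look for when examining the errors.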