ALY-6015 – Intermediate Analytics
Module 3 Assignment – GLM and Logistic Regression
Bhavana Bhariday
College of Professional Studies, Northeastern University
Mykhaylo Trubskyy
January 30, 2024
Introduction:

In this analysis we examine the College dataset with the goal of developing a logistic regression model that classifies a university as private or public. We begin with exploratory data analysis, using descriptive statistics and visualizations to find patterns. To train and evaluate the model, we split the dataset into train and test sets in R. We then predict university classifications with logistic regression and evaluate model performance with ROC curves and classification metrics. As a summary metric, the Area Under the Curve (AUC) measures the discriminatory strength of the model. This analysis offers insight into how well logistic regression predicts a college's type from its key characteristics.

Analysis:
1) Loading the Dataset

Exploratory Data Analysis: The College dataset contains 777 observations describing characteristics of both public and private universities. Important variables include application, acceptance, and enrollment counts, in addition to financial measures such as room and board costs and out-of-state tuition. Descriptive statistics draw attention to important distinctions between the two groups: private universities far outnumber public ones in this dataset, while public institutions tend to receive more applications and enroll more students. The dataset's mean values are also informative, showing an average enrollment of about 780 students and an average of about 3,002 applications per institution. Furthermore, the average cost of room and board is about $4,358, and the average instructional expenditure per student is about $9,660. These numbers highlight the costs associated with pursuing higher education. Overall, the dataset provides researchers and policymakers with a wealth of information about the characteristics and financial profiles of colleges.

A) Barplot of Private and Public Universities
For the first plot we created a bar chart with the count of universities on the y-axis and the type of university on the x-axis. From the results we can conclude that there are more private universities than public universities.

B) Barplot of Elite vs. Non-Elite Universities

For the second visualization we plotted the frequency of elite vs. non-elite universities (defined in the appendix code as Top10perc >= 50). We can conclude that non-elite universities far outnumber elite ones.

C) Boxplot of Universities and Their Out-of-State Fees
We generated a boxplot with university type on the x-axis and out-of-state tuition on the y-axis. We can see that public universities' tuition is significantly lower than private universities' tuition, with more outliers among the public institutions and comparatively fewer among the private ones.
2) Splitting the Data into Train and Test Sets

We use the caret library, which provides a function for data partitioning. For reproducibility, we set the seed with set.seed(123). We then generate a random split of the data stratified on the "Private" variable using the createDataPartition function; the p = 0.8 argument assigns eighty percent of the data to the training set. Finally, we divide the original dataset into the training and test sets using the generated index, giving train_data with 80% of the observations and test_data with the remaining 20% for evaluation. Setting list = FALSE makes the function return the indices of the chosen samples combined into a single numeric vector rather than a list.
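The corresponding code, as given in the appendix (step 2), is:

library(caret)
set.seed(123)
# Stratified 80/20 split on the Private variable
train_index <- createDataPartition(College$Private, p = 0.8, list = FALSE)
train_data <- College[train_index, ]
test_data <- College[-train_index, ]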
3) Fitting the Logistic Regression Model

The formula we used is Private ~ Apps + Enroll, in which Apps and Enroll are the predictor variables and Private is the response variable. The data argument receives the training dataset train_data, and we pass family = binomial to indicate that we wish to fit a logistic regression model. The key results are as follows. Intercept: the intercept coefficient is roughly 2.973. Apps has a coefficient of about 0.0002549: holding all other predictors fixed, the log-odds of the response falling into the "Private" category rise by about 0.0002549 for every additional application. Enroll has a coefficient of about -0.003569, so the log-odds of being private fall as enrollment grows. With p-values under 0.05, every coefficient is a statistically significant predictor of the response variable. Deviance: the null deviance, i.e., the deviance of the model without any predictors, is 729.64; the residual deviance of the fitted model is 461.05, and the drop between the two indicates the improvement in fit gained by adding the predictors. AIC: the AIC is 467.05, and lower AIC values indicate better models. Taken together, these figures suggest that Apps and Enroll are both important predictors of Private: the deviance falls substantially relative to the null model and the AIC is comparatively low.
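The model was fit with the following call (appendix, step 3):

# Logistic regression: Private as response, Apps and Enroll as predictors
glm_model <- glm(Private ~ Apps + Enroll, data = train_data, family = binomial)
summary(glm_model)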
4) Confusion Matrix for the Train Set
Based on the confusion matrix for the train set:

Interpretation:
- The confusion matrix provides a breakdown of model predictions compared to the actual classes in the train set.
- True negatives (TN) are 97, true positives (TP) are 435, false negatives (FN) are 17, and false positives (FP) are 73.
- Accuracy, precision, recall, and specificity can be calculated from these values.
- False negatives (17) occur when the model predicts a university as public when it is actually private. False positives (73) occur when the model predicts a university as private when it is actually public.
- In this context, false negatives (misclassifying private universities as public) may be more damaging, as they could lead to missed opportunities or misallocation of resources such as financial aid.
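The matrix is built by thresholding the fitted probabilities at 0.5, as in the appendix (step 4, using the glm_model object fit above):

# Predicted probabilities on the training set
train_predictions <- predict(glm_model, train_data, type = "response")
# Converting predicted probabilities to binary predictions (1 = private)
train_predicted_classes <- ifelse(train_predictions > 0.5, 1, 0)
# Confusion matrix: actual vs. predicted classes
train_conf_matrix <- table(Actual = train_data$Private,
                           Predicted = train_predicted_classes)
print(train_conf_matrix)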
5) Classification Metrics on the Train Set

Interpretation: The logistic regression model achieves an accuracy of 85.53%, a precision of 85.63%, and a recall of 96.24% on the train set. However, its specificity is lower, at 57.06%, indicating potential difficulty in correctly identifying negative (public) cases.
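These values follow directly from the confusion matrix counts above: accuracy = (435 + 97) / 622 ≈ 0.8553, precision = 435 / (435 + 73) ≈ 0.8563, recall = 435 / (435 + 17) ≈ 0.9624, and specificity = 97 / (97 + 73) ≈ 0.5706. In R (appendix, step 5):

# Extracting counts from the train confusion matrix
TP <- train_conf_matrix[2, 2]   # private correctly predicted private
TN <- train_conf_matrix[1, 1]   # public correctly predicted public
FP <- train_conf_matrix[1, 2]   # public predicted private
FN <- train_conf_matrix[2, 1]   # private predicted public
accuracy    <- (TP + TN) / sum(train_conf_matrix)
precision   <- TP / (TP + FP)
recall      <- TP / (TP + FN)
specificity <- TN / (TN + FP)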
6) Confusion Matrix for the Test Set

The confusion matrix for the test set can be summarized as follows: true positives (TP) 106, true negatives (TN) 24, false positives (FP) 7, and false negatives (FN) 18. This matrix summarizes the model's predictions against the actual classes in the test set, and it can be used to evaluate the model's performance and assess its strengths and weaknesses in classifying universities as private or public.
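The test-set predictions are generated the same way (appendix, step 6):

# Predicted probabilities on the held-out test set
probabilities.test <- predict(glm_model, test_data, type = "response")
test_predicted_classes <- ifelse(probabilities.test > 0.5, "Yes", "No")
test_conf_matrix <- table(test_predicted_classes, test_data$Private)
print("Confusion Matrix for Test Set:")
print(test_conf_matrix)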
7) ROC Curve

The Receiver Operating Characteristic (ROC) curve is a graphical depiction of how well a binary classification model performs across different threshold values: it plots the True Positive Rate (TPR) against the False Positive Rate (FPR) as the classification threshold varies between 0 and 1. In the plot produced here, specificity appears on the x-axis (reversed) and sensitivity on the y-axis, which is equivalent to the usual TPR-vs-FPR presentation.
8) Area Under the Curve (AUC)

In R, we can obtain the Area Under the Curve (AUC) with the auc function. The obtained AUC of 0.980 suggests that the model is good at binary classification: it excels at differentiating between positive and negative examples. In the context of a ROC curve, an AUC near 1 indicates a strong classifier, with a high True Positive Rate (sensitivity) and a low False Positive Rate (1 - specificity) across a range of threshold values.
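The curve and the AUC are computed from the training probabilities (appendix, steps 7-8; note that roc() and auc() come from the pROC package, not ROCR):

library(pROC)
probabilities.train <- predict(glm_model, train_data, type = "response")
# ROC object from actual classes and fitted probabilities
ROC <- roc(train_data$Private, probabilities.train)
plot(ROC, main = "ROC Curve for Logistic Regression Model",
     col = "brown", lwd = 6)
auc_value <- auc(ROC)
print(paste("Area Under the Curve (AUC):", auc_value))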
APPENDIX:

library(dplyr)
library(tidyverse)
library(pacman)
library(ggplot2)
library(ISLR)

#1 ----
data("College")
glimpse(College)
summary(College)
view(College)

# Bar chart of the frequency of private and public universities
ggplot(College, aes(x = Private, fill = Private)) +
  geom_bar() +
  labs(x = "Type of University", y = "Count",
       title = "Count of Private and Public Universities") +
  theme_minimal()

# Boxplot of out-of-state tuition by the type of university
ggplot(College, aes(x = Private, y = Outstate, fill = Private)) +
  geom_boxplot() +
  labs(x = "University type", y = "Outstate Tuition",
       title = "Outstate Tuition by Private/Public") +
  theme_minimal()

# Barplot of elite vs. non-elite universities using the Top10perc variable
ggplot(College, aes(x = Top10perc >= 50, fill = factor(Top10perc >= 50))) +
  geom_bar() +
  scale_fill_manual(values = c("skyblue", "salmon"),
                    labels = c("Non-Elite", "Elite")) +
  labs(x = "Elite", y = "Count",
       title = "Count of Elite and Non-Elite Universities") +
  theme_minimal()

# Descriptive statistics of numerical variables
summary(College[, c("Apps", "Accept", "Enroll", "Top10perc", "Top25perc",
                    "F.Undergrad", "P.Undergrad", "Outstate", "Room.Board",
                    "Books", "Personal", "PhD", "Terminal", "S.F.Ratio",
                    "perc.alumni", "Expend", "Grad.Rate")])

#2 ----
library(caret)
set.seed(123)
# Splitting the data into train and test sets
train_index <- createDataPartition(College$Private, p = 0.8, list = FALSE)
train_data <- College[train_index, ]
test_data <- College[-train_index, ]

#3 ----
library(stats)
# 'Private' is the response variable and 'Apps' and 'Enroll' are predictors
glm_model <- glm(Private ~ Apps + Enroll, data = train_data, family = binomial)
summary(glm_model)

#4 ----
train_predictions <- predict(glm_model, train_data, type = "response")
# Converting predicted probabilities to binary predictions (0 or 1)
train_predicted_classes <- ifelse(train_predictions > 0.5, 1, 0)
# Creating confusion matrix for train_data
train_conf_matrix <- table(Actual = train_data$Private,
                           Predicted = train_predicted_classes)
print(train_conf_matrix)

#5 ----
# Extracting TP, TN, FP, FN from the train confusion matrix
TP <- train_conf_matrix[2, 2]
TN <- train_conf_matrix[1, 1]
FP <- train_conf_matrix[1, 2]
FN <- train_conf_matrix[2, 1]
accuracy <- (TP + TN) / sum(train_conf_matrix)
precision <- TP / (TP + FP)
recall <- TP / (TP + FN)
specificity <- TN / (TN + FP)
# Printing metrics
print(paste("Accuracy:", accuracy))
print(paste("Precision:", precision))
print(paste("Recall:", recall))
print(paste("Specificity:", specificity))

#6 ----
probabilities.test <- predict(glm_model, test_data, type = "response")
test_predicted_classes <- ifelse(probabilities.test > 0.5, "Yes", "No")
test_conf_matrix <- table(test_predicted_classes, test_data$Private)
print("Confusion Matrix for Test Set:")
print(test_conf_matrix)

#7 ----
# roc() and auc() are provided by the pROC package
library(pROC)
probabilities.train <- predict(glm_model, train_data, type = "response")
ROC <- roc(train_data$Private, probabilities.train)
plot(ROC, main = "ROC Curve for Logistic Regression Model",
     col = "brown", lwd = 6)

#8 ----
auc_value <- auc(ROC)
print(paste("Area Under the Curve (AUC):", auc_value))
The ANOVA test fails to reject the null hypothesis (p = 0.543 > 0.05), indicating no significant difference in mean expenditures among regions. The Tukey HSD results support this, as none of the pairwise comparisons have adjusted p-values below the chosen significance level. In summary, there is insufficient evidence to conclude a significant difference in mean expenditures among regions at the 0.05 significance level.

12-3.10. Increasing Plant Growth

A two-way ANOVA can be used to assess whether there is an interaction between the two factors (plant food and grow light) and whether the mean growth varies with each factor; here the analysis is run as a one-way ANOVA on the four light-food combinations, as shown below.
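The appendix fits this by combining each light-food pair into a single group factor and running a one-way ANOVA across the four groups:

# Each light/food combination becomes one level of a single factor
light1_foodA <- data.frame(growth = c(9.2, 9.4, 8.9), group = "Light 1 - Food A")
light1_foodB <- data.frame(growth = c(8.5, 9.2, 8.9), group = "Light 1 - Food B")
light2_foodA <- data.frame(growth = c(7.1, 7.2, 8.5), group = "Light 2 - Food A")
light2_foodB <- data.frame(growth = c(5.5, 5.8, 7.6), group = "Light 2 - Food B")
plant_growth <- rbind(light1_foodA, light1_foodB, light2_foodA, light2_foodB)
plant_growth$group <- as.factor(plant_growth$group)
anv <- aov(growth ~ group, data = plant_growth)
summary(anv)   # p ≈ 0.00455 for the group effect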
Since the p-value of 0.00455 is less than the alpha value, we reject the null hypothesis: mean growth differs across the light-food groups.

ON YOUR OWN
1. Importing the Baseball Dataset
2. Descriptive Statistics

The baseball dataset contains 1,232 rows and 15 columns with data on numerous baseball teams across various years. Runs Scored (RS) and Runs Allowed (RA): runs scored range from a minimum of 463 to a maximum of 1,009, and the mean of runs allowed is quite similar to the mean of runs scored, with a comparable range. Wins (W): wins range from a minimum of 40 to a maximum of 116, with roughly 81 wins on average. Offensive measures (OBP, SLG, BA): the distributions of the offensive metrics, such as on-base percentage, slugging percentage, and batting average, follow typical patterns.

Visualizations to support EDA:
Histogram of Runs Scored

In the above plot, we generated a histogram of Runs Scored (RS) from the dataset, with the number of runs scored on the x-axis and the associated frequency on the y-axis. The distribution is approximately normal, with a peak frequency above 150.
Scatter Plot of Runs Allowed vs. Wins

In the above scatter plot, Runs Allowed (RA) is on the x-axis and Wins (W) is on the y-axis. The figure shows a strong negative correlation between RA and W: as runs allowed increase, wins decrease in a clear downward trend.
Scatter Plot of OOBP vs. OSLG

In the above scatter plot, opponents' on-base percentage (OOBP) is on the x-axis and opponents' slugging percentage (OSLG) is on the y-axis. It shows a strong positive, upward-trending correlation, along with quite a few outliers.
3. Wins by Decade

We conducted a Chi-Square goodness-of-fit test to determine whether the number of wins differs by decade.

(H0): The distribution of wins by decade is equal.
(H1): The distribution of wins by decade is not equal.

The analysis involved creating a 'Decade' variable, summarizing total wins by decade, and applying the Chi-Square goodness-of-fit test. The test statistic far exceeds the critical value and the p-value is negligible (near zero), so the null hypothesis is rejected: the number of wins varies significantly by decade.
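The test compares the observed decade totals against equal expected counts; this is a condensed version of the appendix code ("ON YOUR OWN" step 3):

library(dplyr)
# Total wins per decade
wins <- baseb %>%
  mutate(Decade = Year - (Year %% 10)) %>%
  group_by(Decade) %>%
  summarize(TotalWins = sum(W, na.rm = TRUE))
k <- nrow(wins)
observedf <- wins$TotalWins
# Equal expected counts under H0
expectedf <- rep(sum(observedf) / k, k)
chisquare <- sum((observedf - expectedf)^2 / expectedf)
p_val <- 1 - pchisq(chisquare, df = k - 1)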
4. Crop Data Analysis
5. To analyze the crop dataset, we first imported the data, converted the relevant variables to factors, conducted a two-way ANOVA of yield on density and fertilizer, and evaluated the results.
Using an alpha of 0.05 as the significance level, the p-value comparisons are as follows. Density: the null hypothesis is rejected since the p-value (0.000186) is less than 0.05, so density has a significant impact on yield. Fertilizer: we reject the null hypothesis since the p-value (0.000273) is less than 0.05, so fertilizer has a significant impact on yield. Interaction (Density x Fertilizer): we fail to reject the null hypothesis since the p-value (0.5325) is greater than 0.05, so the interaction effect is negligible. In conclusion, yield is strongly affected by density and fertilizer individually, but not by their interaction.
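The model with the interaction term is fit as follows (appendix, step 4):

cropd <- read.csv("crop_data-1.csv")
cropd$density <- as.factor(cropd$density)
cropd$fertilizer <- as.factor(cropd$fertilizer)
# Two-way ANOVA with interaction term
aov_result <- aov(yield ~ density * fertilizer, data = cropd)
summary(aov_result)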
APPENDIX:

library(ggplot2)
library(dplyr)
library(pacman)
library(tidyverse)
library(correlation)

#11-1.6. ----
ap <- 0.10
obs <- c(12, 8, 24, 6)
df <- length(obs) - 1
# Critical value
cv <- qchisq(1 - ap, df)
cv
p <- c(0.2, 0.28, 0.36, 0.16)
result <- chisq.test(x = obs, p = p)
ifelse(result$p.value > ap, "Accept (H0)", "Reject (H0)")
print(result)

#11-1.8. ----
ap <- 0.05
obs <- c(125, 10, 25, 40)
df <- length(obs) - 1
cv <- qchisq(1 - ap, df)
cv
# Hypothesized proportions (already sum to 1)
p <- c(70.8, 8.2, 9.0, 12.0) / 100
result_01 <- chisq.test(x = obs, p = p)
ifelse(result_01$p.value > ap, "Accept (H0)", "Reject (H0)")
print(result_01)

#11-2.8. ----
ap <- 0.05
# Creating vectors
cau <- c(724, 370)
his <- c(335, 292)
afam <- c(174, 152)
oth <- c(107, 140)
mtx <- matrix(c(cau, his, afam, oth), nrow = 4, byrow = TRUE)
rownames(mtx) <- c("Caucasian", "Hispanic", "African American", "Other")
colnames(mtx) <- c("2013", "2014")
result_02 <- chisq.test(mtx)
cv <- qchisq(1 - ap, df = result_02$parameter)
cv
ifelse(result_02$p.value > ap, "Accept (H0)", "Reject (H0)")
print(result_02)

#11-2.10. ----
ap <- 0.05
# Create vectors of rows
a_rmy <- c(10791, 62491)
nav_y <- c(7816, 42750)
marcorps <- c(932, 9525)
aforce <- c(11819, 54344)
mtx_01 <- matrix(c(a_rmy, nav_y, marcorps, aforce), nrow = 4, byrow = TRUE)
rownames(mtx_01) <- c("Army", "Navy", "Marine Corps", "Air Force")
colnames(mtx_01) <- c("Officers", "Enlisted")
result_03 <- chisq.test(mtx_01)
cv <- qchisq(1 - ap, df = result_03$parameter)
cv
ifelse(result_03$p.value > ap, "Accept (H0)", "Reject (H0)")
print(result_03)

#12-1.8. ----
ap <- 0.05
cond <- data.frame('sodium' = c(270, 130, 230, 180, 80, 70, 200),
                   'food' = rep('condiments', 7),
                   stringsAsFactors = FALSE)
cer <- data.frame('sodium' = c(260, 220, 290, 290, 200, 320, 140),
                  'food' = rep('cereals', 7),
                  stringsAsFactors = FALSE)
dess <- data.frame('sodium' = c(100, 180, 250, 250, 300, 360, 300, 160),
                   'food' = rep('desserts', 8),
                   stringsAsFactors = FALSE)
# Binding the groups into one data frame
sodium <- rbind(cond, cer, dess)
sodium$food <- as.factor(sodium$food)
anovares <- aov(sodium$sodium ~ sodium$food)
summary(anovares)
a.summary <- summary(anovares)
# Degrees of freedom
# k - 1
df.numerator <- a.summary[[1]][1, "Df"]
df.numerator
# N - k
df.denominator <- a.summary[[1]][2, "Df"]
df.denominator
# F test value from summary
F.value <- a.summary[[1]][[1, "F value"]]
F.value
# p-value from summary
p.value <- a.summary[[1]][[1, "Pr(>F)"]]
p.value
ifelse(p.value > ap, "Accept (H0)", "Reject (H0)")
cv <- qf(1 - ap, df.numerator, df.denominator)
cv
TukeyHSD(anovares)

#12-2.10. Sales for Leading Companies ----
ap <- 0.01
cer <- data.frame("Sales" = c(578, 320, 264, 249, 237),
                  "Company" = rep("Cereal", 5),
                  stringsAsFactors = FALSE)
choco <- data.frame("Sales" = c(311, 106, 109, 125, 173),
                    "Company" = rep("Chocolate Candy", 5),
                    stringsAsFactors = FALSE)
cof <- data.frame("Sales" = c(261, 185, 302, 689),
                  "Company" = rep("Coffee", 4),
                  stringsAsFactors = FALSE)
sales <- rbind(cer, choco, cof)
sales$Company <- as.factor(sales$Company)
anova <- aov(Sales ~ Company, data = sales)
summary(anova)
a.summary <- summary(anova)
df_num <- a.summary[[1]][1, "Df"]
df_num
df_denom <- a.summary[[1]][2, "Df"]
df_denom
F.value <- a.summary[[1]][[1, "F value"]]
F.value
p.value <- a.summary[[1]][[1, "Pr(>F)"]]
p.value
cvalue <- qf(1 - ap, df_num, df_denom)
cvalue
TukeyHSD(anova)

#12-2.12. ----
ap <- 0.05
easthir <- data.frame('expenditure' = c(4946, 5953, 6202, 7243, 6113),
                      'region' = rep('Eastern', 5),
                      stringsAsFactors = FALSE)
midthir <- data.frame('expenditure' = c(6149, 7451, 6000, 6479),
                      'region' = rep('Middle', 4),
                      stringsAsFactors = FALSE)
westhir <- data.frame('expenditure' = c(5282, 8605, 6528, 6911),
                      'region' = rep('Western', 4),
                      stringsAsFactors = FALSE)
expenditures <- rbind(easthir, midthir, westhir)
expenditures$region <- as.factor(expenditures$region)
av <- aov(expenditure ~ region, data = expenditures)
summary(av)
a.su <- summary(av)
# k - 1
df_num <- a.su[[1]][1, "Df"]
df_num
# N - k
df_denom <- a.su[[1]][2, "Df"]
df_denom
# F test value
F.value <- a.su[[1]][[1, "F value"]]
F.value
# p-value from the summary
p.value <- a.su[[1]][[1, "Pr(>F)"]]
p.value
ifelse(p.value > ap, "Accept (H0)", "Reject (H0)")
cvalue <- qf(1 - ap, df_num, df_denom)
cvalue
TukeyHSD(av)

#12-3.10. ----
ap <- 0.05
# Each light/food combination is treated as one group level
light1_foodA <- data.frame('growth' = c(9.2, 9.4, 8.9),
                           'group' = rep('Light 1 - Food A', 3),
                           stringsAsFactors = FALSE)
light1_foodB <- data.frame('growth' = c(8.5, 9.2, 8.9),
                           'group' = rep('Light 1 - Food B', 3),
                           stringsAsFactors = FALSE)
light2_foodA <- data.frame('growth' = c(7.1, 7.2, 8.5),
                           'group' = rep('Light 2 - Food A', 3),
                           stringsAsFactors = FALSE)
light2_foodB <- data.frame('growth' = c(5.5, 5.8, 7.6),
                           'group' = rep('Light 2 - Food B', 3),
                           stringsAsFactors = FALSE)
plant_growth <- rbind(light1_foodA, light1_foodB, light2_foodA, light2_foodB)
plant_growth$group <- as.factor(plant_growth$group)
# One-way ANOVA across the four light/food groups
anv <- aov(growth ~ group, data = plant_growth)
summary(anv)
a.suy <- summary(anv)
df.num <- a.suy[[1]][1, "Df"]
df.num
df.den <- a.suy[[1]][2, "Df"]
df.den
F.val <- a.suy[[1]][[1, "F value"]]
F.val
p.val <- a.suy[[1]][[1, "Pr(>F)"]]
p.val
ifelse(p.val > ap, "Accept (H0)", "Reject (H0)")
TukeyHSD(anv)

#self ----
# 1 ----
baseb <- read.csv("baseball-1.csv")
glimpse(baseb)
summary(baseb)
View(baseb)

# 2 ----
# EDA
mean(baseb$W)
# Histogram of Runs Scored (RS)
ggplot(baseb, aes(x = RS)) +
  geom_histogram(binwidth = 30, fill = "skyblue", color = "black", alpha = 0.7) +
  labs(title = "Runs Scored", x = "Runs Scored", y = "Frequency") +
  theme_classic()

# Scatter plot for Runs Allowed (RA) vs. Wins (W)
ggplot(baseb, aes(x = RA, y = W)) +
  geom_point(color = "red", alpha = 0.7) +
  labs(title = "Scatter Plot of Runs Allowed vs. Wins",
       x = "Runs Allowed", y = "Wins") +
  theme_minimal()

# Scatter plot for opponents' OBP (OOBP) vs. opponents' SLG (OSLG)
ggplot(baseb, aes(x = OOBP, y = OSLG)) +
  geom_point(color = "green", alpha = 0.7) +
  labs(title = "Scatter Plot: OOBP vs. OSLG",
       x = "Opponent's OBP", y = "Opponent's SLG") +
  theme_minimal()

#view(baseb)
# 3 ----
# Wins by decade
library(dplyr)
baseb$Decade <- baseb$Year - (baseb$Year %% 10)
print(unique(baseb$Decade))
# Total wins per decade (Decade recomputed here via cut())
wins <- baseb %>%
  mutate(Decade = cut(Year,
                      breaks = seq(min(Year), max(Year) + 10, by = 10),
                      labels = FALSE)) %>%
  group_by(Decade) %>%
  summarize(TotalWins = sum(W, na.rm = TRUE))
k <- nrow(wins)
df <- k - 1
cv <- qchisq(0.95, df)
cv
observedf <- wins$TotalWins
# Equal expected counts under H0
expectedf <- rep(sum(observedf) / k, k)
chisquare <- sum((observedf - expectedf)^2 / expectedf)
if (chisquare > cv) {
  decision_critical <- "Reject (H0)"
} else {
  decision_critical <- "Accept (H0)"
}
p_val <- 1 - pchisq(chisquare, df)
if (p_val <= 0.05) {
  conclusion <- "Reject (H0)"
} else {
  conclusion <- "Accept (H0)"
}
cat("Chi-Square Statistic:", chisquare, "\n")
cat("Degrees of Freedom:", df, "\n")
cat("Critical Value:", cv, "\n")
cat("Decision based on Critical Value:", decision_critical, "\n")
cat("P-Value:", p_val, "\n")
cat("Decision based on P-Value:", conclusion, "\n")
# 4 ----
ap <- 0.05   # significance level
cropd <- read.csv("crop_data-1.csv")
# View(cropd)
summary(cropd)
glimpse(cropd)
cropd$density <- as.factor(cropd$density)
cropd$fertilizer <- as.factor(cropd$fertilizer)
cropd$block <- as.factor(cropd$block)
# Two-way ANOVA with interaction
aov_res <- aov(yield ~ density * fertilizer, data = cropd)
summary(aov_res)
suy_table <- summary(aov_res)

# Hypotheses for Density:
h0_density <- "There is no significant difference in yield between different density levels."
h1_density <- "There is a significant difference in yield between at least two density levels."

# Hypotheses for Fertilizer:
h0_fertilizer <- "There is no significant difference in yield between different fertilizer levels."
h1_fertilizer <- "There is a significant difference in yield between fertilizer levels."

# Hypotheses for Interaction (Density x Fertilizer):
h0_interaction <- "The effect of density on yield is the same for all levels of fertilizer."
h1_interaction <- "The effect of density on yield is not the same for all levels of fertilizer."

# p-values: rows of the ANOVA table are density, fertilizer,
# density:fertilizer, and residuals, in that order
p_density <- suy_table[[1]][[1, "Pr(>F)"]]
p_fertilizer <- suy_table[[1]][[2, "Pr(>F)"]]
p_interaction <- suy_table[[1]][[3, "Pr(>F)"]]

# Critical values: df1 is the effect's Df, df2 is the residual Df
critival_density <- qf(1 - ap, df1 = suy_table[[1]][1, "Df"], df2 = suy_table[[1]][4, "Df"])
critival_density
critival_fertilizer <- qf(1 - ap, df1 = suy_table[[1]][2, "Df"], df2 = suy_table[[1]][4, "Df"])
critival_fertilizer
critval_interaction <- qf(1 - ap, df1 = suy_table[[1]][3, "Df"], df2 = suy_table[[1]][4, "Df"])
critval_interaction

result_density <- ifelse(p_density > ap,
                         "Accept (H0) for the effect of density on yield",
                         "Reject (H0) for the effect of density on yield")
result_fertilizer <- ifelse(p_fertilizer > ap,
                            "Accept (H0) for the effect of fertilizer on yield",
                            "Reject (H0) for the effect of fertilizer on yield")
result_interaction <- ifelse(p_interaction > ap,
                             "Accept (H0) for the interaction",
                             "Reject (H0) for the interaction")