Project #2

School

Pennsylvania State University

Course

380

Subject

Communications

Date

Jan 9, 2024

Returning to MIDFIELD
Author: Neil J. Hatfield
Published: October 15, 2023
Modified: October 16, 2023

For your second project, you’ll continue to work with the publicly available data from MIDFIELD (Multiple-Institution Database for Investigating Engineering Longitudinal Development). You can learn more about the MIDFIELD project from their website: https://midfield.online/.

Getting the Data

You will need to install the {midfielddata} package in order to complete this project. This package is not located on CRAN. You’ll need to use the following command to get the package:

    install.packages("midfielddata", repos = "https://MIDFIELDR.github.io/drat/", type = "source")

The version of the package that I explored is version 0.2.0. You can learn more about the data in this package from their package page: https://midfieldr.github.io/midfielddata/.

There is a companion package, {midfieldr}, that could provide you with some additional tools for working with the MIDFIELD data.

Central Research Question

When a student starts at any university or college, there are several potential outcomes:

- They could graduate with a degree from their initial university/college.
- They could transfer to another university/college and graduate from there.
- They could cease progress towards a college degree at their initial university/college (i.e., “drop out of college”).
- They could transfer to another university/college and then cease progressing towards a degree.

Within the context of the MIDFIELD data, the last two outcomes are indistinguishable from each other. The second option has two forms: transferring out of a MIDFIELD institution and transferring into a MIDFIELD institution. Students who transfer out of a MIDFIELD institution are also indistinguishable from the last two potential outcomes. However, students who transfer into a MIDFIELD institution can be traced through the data.

Against this backdrop, you are tasked with coming up with a potential answer to the following question:

What predictors/factors impact the probability that a student will graduate with a degree?

To answer this question, you’ll need to build an appropriate regression model. Further, you’ll need to be able to explain your model, the process you used, and the decisions you made, as well as discuss your evaluation of your model(s).

Project Format

You will need to prepare and submit a typed report that includes all necessary narrative text, visualizations, values of statistics, and end material (i.e., references, appendices). Your report should

1. Have a coherent structure that assists the reader.
2. Build upon your prior explorations with the MIDFIELD data. You do not need to include all of what you submitted from Project #1, but you should use that project as a launch pad for your explorations here.
3. Address the Central Research Question.
4. Provide a clear explanation of your decision process, complete with evidence.
5. Be submitted as a knitted PDF or Word Document. (HTML files saved as PDFs will be returned ungraded.)
How you choose to carry out your project is up to you. Here’s what you’ll need to submit to the appropriate submission portal:

- A knitted/rendered document, either a Word document or PDF.
  - There should be a coherent structure.
  - There should be a Code Appendix.
  - The code should show evidence of being reproducible.

If you use code written by another individual and/or generated for you by a large-language model/generative AI, you MUST flag and document all such instances.

You may work with other students in the class, but each person is responsible for submitting their own report. This means you can do things such as helping each other out with coding issues, bouncing ideas off of each other, running interpretations by each other, and brainstorming ideas for further analyses. What you can’t do includes submitting a single report as a team or submitting reports that are functionally identical.

Targeted Learning Outcomes

The following learning outcomes will be assessed via your project submission.

- SRT.1: The student will be able to determine which underlying perspective (e.g., Exploratory Data Analysis) they or someone else is working from for a particular analysis.
- SRT.2: The student will be able to differentiate between the goals of prediction and inference.
- Tech.2: The student will learn to use technology to create data visualizations.
- Comm.4: The student will learn to meaningfully discuss data visualizations to support others in their learning about the current context.
- Tech.3: The student will learn to use technology to perform calculations on data sets.
- Comm.5: The student will learn to interpret the values of statistics (both descriptive/incisive and inferential) within the current context.
- Algo.1: The student will be able to explain how an algorithm works (e.g., linear regression, logistic regression, k-nearest neighbors regression, ridge regression, LASSO regression, k-nearest neighbors classification, classification/decision trees, random forest, k-means clustering) so that an individual can decide whether it would be appropriate in a given context.
- Algo.2: The student will be able to compare and critique different algorithms/modeling approaches.
- SRT.5: The student will be able to describe various methods for assessing the closeness of predictions to the observed data.
- Algo.5: The student will learn to evaluate implemented models through a variety of tools (e.g., MSE, RMSE, Misclassification Rate, Confusion Matrix, Gini Index, ROCs, AUC).
- Algo.6: The student will be able to apply various techniques meant to improve a model (e.g., subset selection methods, shrinkage/regularization, tuning, cross validation).
- Tech.4: The student will learn to use technology to implement different types of algorithms (e.g., regression methods, k-nearest neighbors, regression and classification trees, clustering) to gain insight about an underlying sample, population, or phenomenon.
- DW.1: The student will demonstrate a workable knowledge base from Stat184 that functions as a basis for Stat380.
- DW.3: The student will create/modify data frames using sub-setting and other transformational functions to assist in data analysis.
- Comm.1: The student will learn to generate materials (e.g., presentations, posters, reports, etc.) that tell a coherent story, incorporate visualizations and statistics, and provide a basis for making informed decisions.
- Tech.1: The student will learn to use technology to his/her advantage when engaged in data analysis.
- Tech.5: The student will learn to use technology to analyze data to answer research questions.
- Comm.6: The student will learn to produce insights, grounded in the present context, based upon their analytical work using various statistical models.
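
The Central Research Question asks about the probability of a binary outcome (graduated or not), and the outcomes above name logistic regression, the misclassification rate, and the confusion matrix. The following is only a minimal sketch of how those pieces might fit together in R, not a prescribed approach. It runs on simulated data: the data frame `sim`, its columns `entry_score`, `transfer_in`, and `graduated`, and the coefficients used to generate them are all hypothetical and do not come from {midfielddata}.

```r
# Simulated stand-in data. Every column here is hypothetical and is NOT
# taken from {midfielddata}; it exists only so the example runs.
set.seed(380)
n <- 1000
sim <- data.frame(
  entry_score = rnorm(n, mean = 24, sd = 4),
  transfer_in = factor(sample(c("no", "yes"), n, replace = TRUE,
                              prob = c(0.8, 0.2)))
)

# Made-up data-generating process for the binary outcome (1 = graduated).
log_odds <- -3 + 0.12 * sim$entry_score + 0.4 * (sim$transfer_in == "yes")
sim$graduated <- rbinom(n, size = 1, prob = plogis(log_odds))

# Hold out 30% of the rows so the model is evaluated on unseen data.
test_rows <- sample(n, size = 0.3 * n)
train <- sim[-test_rows, ]
test <- sim[test_rows, ]

# Logistic regression: models the log-odds of graduating as a linear
# function of the predictors.
fit <- glm(graduated ~ entry_score + transfer_in,
           data = train, family = binomial)

# Predicted probabilities on the held-out rows, then a 0.5 cutoff
# (the cutoff is itself a choice worth justifying in a report).
pred_prob <- predict(fit, newdata = test, type = "response")
pred_class <- as.integer(pred_prob >= 0.5)

# Confusion matrix and misclassification rate on the held-out rows.
conf_mat <- table(observed = test$graduated, predicted = pred_class)
misclass_rate <- mean(pred_class != test$graduated)

print(summary(fit)$coefficients)
print(conf_mat)
print(misclass_rate)
```

With the real data, the outcome and predictors would have to be constructed from the MIDFIELD tables first, and the choice of predictors, cutoff, and evaluation tools remains a decision to explain and support with evidence.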