Project #2
Returning to MIDFIELD
AUTHOR: Neil J. Hatfield
PUBLISHED: October 15, 2023
MODIFIED: October 16, 2023

For your second project, you’ll continue to work with the publicly available data from MIDFIELD (the Multiple-Institution Database for Investigating Engineering Longitudinal Development). You can learn more about the MIDFIELD project from its website: https://midfield.online/

Getting the Data
You will need to install the {midfielddata} package to complete this project. This package is not on CRAN, so you’ll need to use the following command to get the package:

    install.packages(
      "midfielddata",
      repos = "https://MIDFIELDR.github.io/drat/",
      type = "source"
    )

The version of the package that I explored is version 0.2.0. You can learn more about the data in this package from the package page: https://midfieldr.github.io/midfielddata/

Tip: There is a companion package, {midfieldr}, that could provide you with some additional tools for working with the MIDFIELD data.

Central Research Question
When a student starts at any university or college, there are several potential outcomes:
1. They could graduate with a degree from their initial university/college.
2. They could transfer to another university/college and graduate from there.
3. They could cease progress towards a college degree at their initial university/college (i.e., “drop out of college”).
4. They could transfer to another university/college and then cease progressing towards a degree.
Within the context of the MIDFIELD data, the last two outcomes are
indistinguishable from each other. The second outcome has two forms:
transferring out of a MIDFIELD institution and transferring into a MIDFIELD
institution. Students who transfer out of a MIDFIELD institution are also
indistinguishable from students in the last two outcomes. However, students
who transfer into a MIDFIELD institution can be traced through the data.
Against this backdrop, you are tasked with coming up with a potential
answer to the following question:
What predictors/factors impact the probability that a student will
graduate with a degree?
To answer this question, you’ll need to build an appropriate regression
model. Further, you’ll need to be able to explain your model, the process
you used and the decisions you made, as well as discuss your evaluation
of your model(s).
Project Format
You will need to prepare and submit a typed report that includes all
necessary narrative text, visualizations, values of statistics, and end
material (i.e., references, appendices).
Your report should:
1. Have a coherent structure that assists the reader.
2. Build upon your prior explorations with the MIDFIELD data.
   You do not need to include all of what you submitted for Project #1,
   but you should use that project as a launch pad for your explorations
   here.
3. Address the Central Research Question.
4. Provide a clear explanation of your decision process, complete with
   evidence.
5. Be submitted as a knitted PDF or Word document. (HTML files saved
   as PDFs will be returned ungraded.)
How you choose to carry out your project is up to you. Here’s what you’ll
need to submit to the appropriate submission portal:
- A knitted/rendered document, either a Word document or PDF.
  - There should be a coherent structure.
  - There should be a Code Appendix.
  - The code should show evidence of being reproducible.
If you use code written by another individual and/or generated for you by a
large language model/generative AI, you MUST flag and document all such
instances.
You may work with other students in the class, but each person is
responsible for submitting their own report. This means you can do things
such as helping each other out with coding issues, bouncing ideas off of
each other, running interpretations by each other, and brainstorming ideas
for further analyses. What you can’t do includes submitting a single report
as a team or submitting reports that are functionally identical.
Targeted Learning Outcomes
The following learning outcomes will be assessed via your project
submission.
SRT.1: The student will be able to determine which underlying
perspective (e.g., Exploratory Data Analysis) they or someone else is
working from for a particular analysis.
SRT.2: The student will be able to differentiate between the goals of
prediction and inference.
Tech.2: The student will learn to use technology to create data
visualizations.
Comm.4: The student will learn to meaningfully discuss data
visualizations to support others in their learning about the current
context.
Tech.3: The student will learn to use technology to perform
calculations on data sets.
Comm.5: The student will learn to interpret the values of statistics
(both descriptive/incisive and inferential) within the current context.
Algo.1: The student will be able to explain how an algorithm works
(e.g., linear regression, logistic regression, k-nearest neighbors
regression, ridge regression, LASSO regression, k-nearest neighbors
classification, classification/decision trees, random forest, k-means
clustering) so that an individual can decide whether it would be
appropriate in a given context.
Algo.2: The student will be able to compare and critique different
algorithms/modeling approaches.
SRT.5: The student will be able to describe various methods for
assessing the closeness of predictions to the observed data.
Algo.5: The student will learn to evaluate implemented models
through a variety of tools (e.g., MSE, RMSE, Misclassification Rate,
Confusion Matrix, Gini Index, ROCs, AUC).
Algo.6: The student will be able to apply various techniques meant to
improve a model (e.g., subset selection methods,
shrinkage/regularization, tuning, cross validation).
Tech.4: The student will learn to use technology to implement
different types of algorithms (e.g., regression methods, k-nearest
neighbors, regression and classification trees, clustering) to gain
insight about an underlying sample, population, or phenomenon.
DW.1: The student will demonstrate a workable knowledge base from
Stat184 that functions as a basis for Stat380.
DW.3: The student will create/modify data frames using sub-setting
and other transformational functions to assist in data analysis.
Comm.1: The student will learn to generate materials (e.g.,
presentations, posters, reports, etc.) that tell a coherent story,
incorporating visualizations and statistics, and provide a basis for
making informed decisions.
Tech.1: The student will learn to use technology to his/her advantage
when engaged in data analysis.
Tech.5: The student will learn to use technology to analyze data to
answer research questions.
Comm.6: The student will learn to produce insights, grounded in the
present context, based upon their analytical work using various
statistical models.