Quiz L04 Submit - HW 4
School: Pennsylvania State University
Course: 365
Subject: Industrial Engineering
Date: Feb 20, 2024
Pages: 7
Uploaded by Jordanmchampion
L04: Submit - HW 4
Started: Feb 7 at 1:27pm
Quiz Instructions
Please be sure to read L04 before attempting this assignment.
The widely used Statlog German credit dataset (click here: https://psu.instructure.com/courses/2313011/files/158251342?wrap=1; download: https://psu.instructure.com/courses/2313011/files/158251342/download?download_frd=1), named 'South German Credit' data, contains some important features (i.e., variables) related to a debtor, which can be used to predict whether there is a credit risk once the debtor has applied for a loan from the bank.
ID#: Users’ ID
duration: Credit duration in months (in the range of 0 and 1200)
purpose: Purpose for which the credit is needed (Ten levels: furniture, car (used), car (new), retraining, repairs, domestic appliances, business, television, vacation, others)
employment_duration: Duration of debtor's employment with the current
employer (Five levels: unemployed, less than 1 year, 1-4 years, 4-7 years, more
than 7 yrs)
age: Age expressed in years (in the range of 18-75)
housing: Type of housing the debtor lives in (Three levels: rent, own, for free)
foreign_worker: Is the debtor a foreign worker? (Two levels: Yes, No)
number_credits: Number of credits including the current one the debtor has (or
had) at this bank (Four levels: 1, 2-3, 4-5, equal or more than 6)
credit_history: The credit history of the debtor (Five levels)
• 0 : delay in paying off in the past
• 1 : critical account/other credits elsewhere
• 2 : no credits taken/all credits paid back duly
• 3 : existing credits paid back duly till now
Quiz: L04: Submit - HW 4
https://psu.instructure.com/courses/2313011/quizzes/4976401/take
1 of 7
2/7/2024, 1:29 PM
• 4 : all credits at this bank paid back duly
amount: Amount requested by the debtor (in the range of 1K and MAX)
installment_rate: Credit installments as a percentage of debtor's disposable income (Four levels):
• 1 : >= 35
• 2 : 25 <= … < 35
• 3 : 20 <= … < 25
• 4 : < 20
credit_risk: Credit risk assessed as the potential that a borrower fails to meet its obligations (Two levels: good, bad)
2 pts
Question 1
Use the appropriate function to read the data into R as a data frame named hw4.data. Complete the blanks below with the missing syntax you would use to complete this import. Please assume that you've already set your working directory to the folder containing the file.
____ = ____(____, header = ____)
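As a minimal sketch of this kind of import, assuming the data file is comma-separated with a header row: the quiz text does not name the file or its format, so the temporary file and the three rows below are made up purely for illustration. Base R's `read.csv` returns a data frame.

```r
# Hypothetical stand-in file: the quiz does not give the real file name
# or format, and these rows are invented for illustration.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "duration,age,amount",
  "18,21,1049",
  "9,36,2799",
  "12,23,841"
), tmp)

# read.csv() parses a comma-separated file; header = TRUE treats the
# first row as column names rather than as data.
hw4.data <- read.csv(tmp, header = TRUE)

# Inspect column names and types before plotting anything.
str(hw4.data)
```

With the working directory already set, the path argument would simply be the file name as a string instead of `tmp`.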
4 pts
Question 2
Use R and the ggplot package to create the plot shown below.
Complete the blanks below with the missing syntax you would use to generate
this plot in R.
plot1 <- ____(____, aes(____, ____))
plot1 + ____
1 pts
Question 3
Based on your plot, which subcategory of data type has the most outliers?
<1 yr
>=7 yrs
1<=...<4 yrs
4<=...<7 yrs
unemployed
1.5 pts
Question 4
Based on your plot, which subcategory of data type has the most normal
distribution?
<1 yr
>=7 yrs
1<=...<4 yrs
4<=...<7 yrs
unemployed
1 pts
Question 5
Based on your plot, which subcategory/subcategories of employment duration
has/have a median less than or equal to 30? (Select all that apply)
<1 yr
>=7 yrs
1<=...<4 yrs
4<=...<7 yrs
unemployed
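Questions 3 through 5 read outliers, symmetry, and medians off a per-category plot, which is what a box plot is designed for. The following is only a sketch on made-up toy data: the plot image is not part of this text, so the choice of `age` as the y variable, the object name `plot_box`, and the three categories shown are assumptions.

```r
library(ggplot2)

# Made-up toy data. The variable on the y-axis of the quiz's plot is not
# stated in the text, so `age` here is only an assumption.
toy <- data.frame(
  employment_duration = rep(c("unemployed", "<1 yr", ">=7 yrs"), each = 5),
  age = c(22, 25, 31, 40, 68,
          19, 23, 24, 27, 30,
          35, 41, 44, 52, 60)
)

# One box per category: x is the grouping variable, y the numeric one.
plot_box <- ggplot(toy, aes(x = employment_duration, y = age))
plot_box <- plot_box + geom_boxplot()

# Group medians, i.e. the line drawn inside each box.
medians <- tapply(toy$age, toy$employment_duration, median)
```

Calling `print(plot_box)` draws the plot. Points drawn beyond the whiskers are the outliers, and a box with its median line near the middle and whiskers of similar length suggests a roughly symmetric distribution.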
4 pts
Question 6
Create a histogram for the continuous variable 'age'. Be sure to label your x-axis
as "Age" and your y-axis as "Frequency".
Complete the blanks below with the missing syntax you would use to generate this
plot in R.
hist <- ____(____, aes(____))
hist + ____(binwidth = 0.5) + labs(x = ____, y = ____)
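For reference, a labeled histogram in ggplot2 can be sketched as follows. The ages are made up for illustration and are not taken from the credit data. The object name `hist_age` is hypothetical, and `binwidth = 0.5` is copied from the quiz template.

```r
library(ggplot2)

# Made-up ages for illustration only; not taken from the credit data.
toy <- data.frame(age = c(19, 22, 22, 25, 25, 25, 31, 38, 47, 63))

# A histogram needs only an x aesthetic; geom_histogram() computes the
# count in each bin. binwidth is expressed in the units of x (years).
hist_age <- ggplot(toy, aes(x = age)) +
  geom_histogram(binwidth = 0.5) +
  labs(x = "Age", y = "Frequency")

# The bin counts add up to the number of observations.
bin_counts <- ggplot_build(hist_age)$data[[1]]$count
```

Calling `print(hist_age)` draws the plot; the tallest bar marks the most frequent value range.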
1.5 pts
Question 7
Based on your histogram, the most frequently reported age can be found:
at or near 20
at or near 25
at or near 30
at or near 35
1 pts
Question 8
Create two additional histograms in R. One should be for the continuous variable
"amount", and the other should be for the continuous variable "duration".
When examining all the histograms created in R, which of these distributions
appears to be the most normal?
age
amount
duration
3 pts
Question 9
Create a scatterplot to show the relationship between "age" and "amount."
Complete the blanks below with the missing syntax you would use to generate this
plot in R.
scatter1 <- ____(____, aes(____, ____))
scatter1 + ____
1 pts
Question 10
Create another scatterplot in R. It should show the relationship between "duration" and "amount."
When examining all the scatterplots created in R, which variable seems to be most strongly related to "amount"?
age
duration
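Comparing two scatterplots by eye can be backed up with a correlation coefficient. The sketch below uses made-up numbers, so its result says nothing about the real dataset. The object names are hypothetical, and using Pearson correlation as the measure of "strongly related" is an assumption, not something the quiz specifies.

```r
library(ggplot2)

# Made-up values for illustration only; not taken from the credit data.
toy <- data.frame(
  age      = c(23, 35, 41, 29, 52, 60),
  duration = c(6, 12, 24, 36, 48, 60),
  amount   = c(900, 1500, 3100, 4200, 6100, 7400)
)

# One point per row: the candidate predictor on x, amount on y.
scatter_age      <- ggplot(toy, aes(x = age, y = amount)) + geom_point()
scatter_duration <- ggplot(toy, aes(x = duration, y = amount)) + geom_point()

# Pearson correlation as a numeric companion to the visual impression;
# values closer to 1 or -1 indicate a tighter linear relationship.
cor_age      <- cor(toy$age, toy$amount)
cor_duration <- cor(toy$duration, toy$amount)
```

A tighter, more line-like cloud of points corresponds to a correlation of larger magnitude.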