DAT 640 Practical R Activity Eight
School: Southern New Hampshire University
Course: 640
Subject: Statistics
Date: Feb 20, 2024
Uploaded by: DeaconSeaLionMaster750
Performance Evaluation With an R Confusion Matrix
Overview:
During this activity, you will build a classification model and then use a confusion matrix to help evaluate it.
Instructions: Complete the lab activities below. Provide responses to the questions and screenshots when prompted. Please note: This assignment will be submitted and graded in Brightspace.
Part 1: Complete uCertify Lab 11.9.1, Analyzing Cost-benefit using Data-driven Misclassification Costs. Take a screenshot illustrating successful execution of the lab R commands.
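The lab's own commands are not reproduced here. The block below is a minimal sketch of the idea behind cost-benefit analysis with misclassification costs. Multiply a model's confusion matrix element-wise by a cost matrix, sum the result, and divide by the number of cases to get an average cost per case. The counts, the 1-versus-5 unit costs, and the helper names `accuracy()` and `mean_cost()` are hypothetical and do not come from the lab.

```r
# Hypothetical confusion matrices for two models scored on the same 100 cases
# (rows = actual class, columns = predicted class)
labs <- list(actual = c("no", "yes"), predicted = c("no", "yes"))
cm_a <- matrix(c(50, 10,
                  5, 35), nrow = 2, byrow = TRUE, dimnames = labs)
cm_b <- matrix(c(40, 20,
                  2, 38), nrow = 2, byrow = TRUE, dimnames = labs)

# Hypothetical cost matrix in the same layout: correct calls cost 0,
# a false positive costs 1 unit, a false negative costs 5 units
cost <- matrix(c(0, 1,
                 5, 0), nrow = 2, byrow = TRUE, dimnames = labs)

# Share of cases on the diagonal (correct predictions)
accuracy <- function(cm) sum(diag(cm)) / sum(cm)

# Average cost per case: element-wise product of counts and costs, summed, divided by n
mean_cost <- function(cm, cost) sum(cm * cost) / sum(cm)

results <- data.frame(
  model     = c("A", "B"),
  accuracy  = c(accuracy(cm_a), accuracy(cm_b)),
  mean_cost = c(mean_cost(cm_a, cost), mean_cost(cm_b, cost))
)
print(results)
```

With these made-up numbers, model A is more accurate (0.85 vs 0.78). Model B has the lower average cost (0.30 vs 0.35), because false negatives are priced five times higher than false positives. Accuracy alone can therefore point to the costlier model.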
Part 2: You will continue to use the uCertify lab environment for the second part of this assignment. Visit the R Data website and choose a data set from the list below. Then choose a classification algorithm, such as a decision tree, random forest, or logistic regression, and build a model on that data set. Finally, build a confusion matrix and an ROC chart to evaluate the model. To complete this activity, follow the steps below:
1. Visit R Data and select one of the following data sets:
   - MASS birthwt (Target: low [indicator of birth weight less than 2.5 kg])
   - boot urine (Target: r [indicator of the presence of calcium oxalate crystals])
   - carData TitanicSurvival (Target: survived [indicates whether the passenger survived the sinking])
   - Ecdat Car (Target: choice [vehicle of choice from six options])
   - Ecdat Fishing (Target: mode [recreational fishing mode choice: beach, pier, boat, or charter])
   - Ecdat Ketchup (Target: choice [choice of ketchup brand {heinz, hunts, delmonte, stb}])
2. Describe the data set you chose and the algorithm used to predict the target classes.
3. Build a classification model using a decision tree, logistic regression, or random forest. Provide screenshots of the confusion matrix and discuss the results.
4. Provide screenshots of the ROC chart along with two paragraphs discussing the results. Include details on how the confusion matrix and ROC chart can assist with model evaluation.
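The block below is a minimal sketch of steps 1 through 3. It assumes MASS birthwt is the chosen data set and logistic regression is the algorithm. The 70/30 train/test split, the seed, and the 0.5 probability cutoff are illustrative choices, not requirements of the assignment. The names `bw`, `train`, `test`, and `fit` are hypothetical. One detail matters for birthwt: `low` is defined from `bwt` (birth weight in grams), so `bwt` has to be dropped before modeling or the model would simply read off the answer.

```r
# Steps 1-3 sketch: MASS::birthwt, logistic regression, confusion matrix
data(birthwt, package = "MASS")   # MASS is a recommended package bundled with R

bw <- birthwt
bw$low  <- factor(bw$low, levels = c(0, 1), labels = c("no", "yes"))
bw$race <- factor(bw$race, levels = 1:3, labels = c("white", "black", "other"))
bw$bwt  <- NULL   # 'low' is defined from bwt (< 2.5 kg); keeping bwt leaks the target

# Step 2: describe the data and the class balance of the target
str(bw)
class_props <- prop.table(table(bw$low))
print(round(class_props, 3))
baseline_accuracy <- max(class_props)   # accuracy of always predicting the majority class

# Step 3: 70/30 train/test split (the seed is arbitrary, for reproducibility)
set.seed(640)
train_idx <- sample(seq_len(nrow(bw)), size = round(0.7 * nrow(bw)))
train <- bw[train_idx, ]
test  <- bw[-train_idx, ]

# Logistic regression; with a factor response, glm() models P(low == "yes")
fit <- glm(low ~ ., data = train, family = binomial)

# Classify held-out cases with a 0.5 probability cutoff (an assumption)
p_test <- predict(fit, newdata = test, type = "response")
pred   <- factor(ifelse(p_test >= 0.5, "yes", "no"), levels = c("no", "yes"))

# Confusion matrix: rows = actual class, columns = predicted class
cm <- table(actual = test$low, predicted = pred)
print(cm)

accuracy    <- sum(diag(cm)) / sum(cm)
sensitivity <- cm["yes", "yes"] / sum(cm["yes", ])   # true positive rate
specificity <- cm["no", "no"] / sum(cm["no", ])      # true negative rate
print(round(c(accuracy = accuracy, sensitivity = sensitivity,
              specificity = specificity, baseline = baseline_accuracy), 3))
```

Comparing accuracy with the majority-class baseline shows whether the model does better than always predicting "no". Sensitivity and specificity show which kind of error dominates. For a decision tree or random forest, mainly the fitting and prediction lines would change. For the multi-class targets (Ecdat Car, Fishing, Ketchup), the same `table()` call yields a k-by-k matrix, but `glm()` with a binomial family would have to be replaced by a multi-class method.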
The following websites may be of assistance in your submission:
   - A Beginner’s Guide to Learning R With the Titanic Dataset
   - Titanic Tutorial in R!
   - Confusion Matrix in R: A Complete Guide
   - How to Create a ROC Curve in R
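Item 4 asks for an ROC chart. Packages such as pROC provide `roc()` and `auc()` for this. The block below is a dependency-free sketch of what those functions compute:

- It sweeps every distinct predicted score as a cutoff.
- At each cutoff it records the true positive rate and false positive rate.
- It integrates the resulting curve with the trapezoidal rule.

The function names `roc_points()` and `auc_trapezoid()` and the toy scores and labels are hypothetical.

```r
# ROC curve built by sweeping every distinct score as a classification cutoff.
# scores: predicted probability of the positive class; labels: TRUE/1 = positive
roc_points <- function(scores, labels) {
  labels  <- as.logical(labels)
  n_pos   <- sum(labels)
  n_neg   <- sum(!labels)
  cutoffs <- sort(unique(scores), decreasing = TRUE)
  tpr <- vapply(cutoffs, function(k) sum(scores >= k & labels) / n_pos, numeric(1))
  fpr <- vapply(cutoffs, function(k) sum(scores >= k & !labels) / n_neg, numeric(1))
  # Prepend (0, 0): a cutoff above every score labels nothing as positive
  data.frame(cutoff = c(Inf, cutoffs), fpr = c(0, fpr), tpr = c(0, tpr))
}

# Area under the curve by the trapezoidal rule
auc_trapezoid <- function(fpr, tpr) {
  sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
}

# Toy example (hypothetical scores and labels)
scores <- c(0.90, 0.80, 0.70, 0.60, 0.55, 0.40)
labels <- c(1, 1, 0, 1, 0, 0)

roc <- roc_points(scores, labels)
auc <- auc_trapezoid(roc$fpr, roc$tpr)
print(auc)   # 8/9, about 0.889

plot(roc$fpr, roc$tpr, type = "l",
     xlab = "False positive rate (1 - specificity)",
     ylab = "True positive rate (sensitivity)",
     main = sprintf("ROC curve (AUC = %.3f)", auc))
abline(0, 1, lty = 2)   # diagonal = random guessing
```

With a fitted binary model, `scores` would be its predicted probabilities on held-out data, for example `predict(fit, newdata = test, type = "response")` for a glm. `labels` would be a logical such as `test$low == "yes"`. Here `fit` and `test` are placeholder names. Each point on the curve is the confusion matrix at one cutoff, which is how the two evaluation tools connect. The AUC summarizes ranking quality across all cutoffs rather than at a single one.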