Normalization is the process that makes the numerical data independent of scale

School: Slippery Rock University of Pennsylvania
Course: MISC
Subject: Computer Science
Date: Dec 6, 2023
Type: docx
Uploaded by: DrClover9807

Normalization is the process that makes numerical data independent of scale.
Answer: True

Jaccard's coefficient is appropriate when it is more informative to match negative outcomes between observations.
Answer: False

Using the Manhattan distance between pairwise observations, which pair of observations is most similar?
Observation 1: (2, 3)
Observation 2: (6, 4)
Observation 3: (8, 2)
Answer: Observations 2 and 3

Consider the partial data set in the table, which represents online hours spent shopping by age and income. The average and standard deviation of income for the full data set are $47,667 and $14,292, respectively. Using z-scores to standardize the observations, what is the average z-score of income for the three observations provided?
ID    Income    Age    Online hours
1     62000     48     2
2     58000     52     4
3     53000     22     5
Answer: 0.6997

The process of applying a set of analytical techniques for the development of machine learning is called data mining.
Answer: True

The key distinction between supervised and unsupervised data mining is that the target variable is identified in supervised data mining.
Answer: True

The Cross-Industry Standard Process for Data Mining (CRISP-DM) consists of six phases. In which of the six phases does data wrangling occur?
Answer: Data preparation

Oversampling involves intentionally selecting more samples from one class than from other classes to adjust the class distribution of a data set.
Answer: True
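As a rough check on the Manhattan distance and z-score answers, the sketch below recomputes both in plain Python. The function names manhattan and z_score are illustrative and not part of the original quiz. The observation values, mean, and standard deviation are the ones quoted in the questions.

```python
from itertools import combinations


def manhattan(a, b):
    """Sum of absolute coordinate differences between two observations."""
    return sum(abs(x - y) for x, y in zip(a, b))


def z_score(value, mean, std):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / std


# Observations from the Manhattan distance question.
observations = {1: (2, 3), 2: (6, 4), 3: (8, 2)}
distances = {
    (i, j): manhattan(observations[i], observations[j])
    for i, j in combinations(observations, 2)
}
# The most similar pair is the one with the smallest distance.
closest_pair = min(distances, key=distances.get)

# Incomes from the z-score question, standardized with the full-data-set
# mean (47,667) and standard deviation (14,292) given in the question.
incomes = [62000, 58000, 53000]
z_scores = [z_score(x, 47667, 14292) for x in incomes]
average_z = sum(z_scores) / len(z_scores)

print(distances)             # pairwise Manhattan distances
print(closest_pair)          # (2, 3)
print(round(average_z, 4))   # 0.6997
```

The smallest pairwise distance is 4, between observations 2 and 3, and the three z-scores average to about 0.6997, which agrees with both quiz answers.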

A diagram that represents the information in equal-sized intervals, deciles, is called a cumulative lift chart.
Answer: False

The process of dividing a data set into a training, a validation, and an optional test data set is called:
Answer: Data partitioning

When a predictive model is made overly complex to fit the quirks of given sample data, it is called:
Answer: Overfitting

Principal component analysis (PCA) is a dimension reduction technique used to reduce the number of variables without removing variables.
Answer: True

In real-world situations, data sets contain many variables. If some variables are eliminated, valuable information may be lost.
Answer: True

Which is the best-fit definition for the use of principal component analysis (PCA)?
Answer: The transformation of a large number of correlated variables into a smaller number of uncorrelated variables

Of the following selections, which is NOT a descriptor of principal component analysis?
Answer: The first principal account is not suitable for analysis

Classifying or predicting the value of an outcome for a record is called scoring a record.
Answer: True

KNN is a simple data mining tool, known for developing personalized recommendations for many online company applications.
Answer: True

Which chart allows for the categorization of large data sets from high to low values, dividing sets of observations into an easy visual representation of the data?
Answer: Decile-wise chart

This chart measures the effectiveness of a predictive model and contains both a baseline and a lift curve.
Answer: Cumulative lift chart

KNN belongs to a category of mining techniques called computer-based reasoning.
Answer: False

While k-nearest neighbors is effective as a classifier, it provides no information on predictor importance.
Answer: True

Using the table below, find the k-nearest neighbors for record 4 using k=3 for age.
Answer: 24, 31, and 34

What is the estimated probability that the cheese sample tested in NW will be Gouda? k=3
Answer: 67%

The naive Bayes method is an unsupervised data mining technique that uses partitioning to assess model performance.
Answer: False

When performing a naive Bayes analysis, all predictor variables must be categorical.
Answer: True

An issue with the naive Bayes classifier is handling rare outcomes, because the estimated probability is 0. To overcome this problem, the algorithm allows a zero probability to be replaced with a nonzero value. This technique is called:
Answer: Smoothing

The following table reflects observations of the color and type of vehicle, whether a speeding ticket (1) or a warning (0) was received, and whether there was a prior driving violation (yes or no). Using the naive Bayes calculation, what is the conditional probability of receiving a ticket with a red vehicle?
Answer: 0.53

A pure subset contains leaf nodes where cases have contradicting values to the target variable, to enhance the variable case outcomes and allow for further splits.
Answer: False

Decision trees produced by the CART algorithm are binary, meaning that there are two branches for each decision node
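The KNN probability and naive Bayes smoothing questions can be illustrated with a short sketch. The tables those questions refer to are not reproduced here, so the neighbor labels and counts below are hypothetical, chosen only to show the arithmetic. The function names are also illustrative, and additive (Laplace) smoothing is assumed as one common form of the smoothing the question describes.

```python
from collections import Counter


def knn_class_probability(neighbor_labels, target):
    """Share of the k nearest neighbors that carry the target label."""
    return Counter(neighbor_labels)[target] / len(neighbor_labels)


def smoothed_probability(count, total, n_categories, alpha=1.0):
    """Additive (Laplace) smoothing: the result is nonzero when alpha > 0."""
    return (count + alpha) / (total + alpha * n_categories)


# Hypothetical labels of the 3 nearest neighbors (not taken from the quiz table).
neighbors = ["Gouda", "Gouda", "Cheddar"]
p_gouda = knn_class_probability(neighbors, "Gouda")

# Hypothetical counts: a predictor value never observed together with a class
# in 10 training records, for a predictor with 2 possible categories.
unsmoothed = 0 / 10
smoothed = smoothed_probability(count=0, total=10, n_categories=2)

print(round(p_gouda * 100))  # 67
print(unsmoothed)            # 0.0
print(round(smoothed, 4))    # 0.0833
```

With k=3, an estimated probability of 67% corresponds to two of the three nearest neighbors belonging to the class. The smoothed estimate shows how a zero count stops wiping out the product of conditional probabilities in naive Bayes.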
