Normalization is the process that makes the numerical data independent of scale

School

Slippery Rock University of Pennsylvania

Course

MISC

Subject

Computer Science

Date

Dec 6, 2023

Type

docx

Pages

3

Uploaded by DrClover9807

Normalization is the process that makes numerical data independent of scale.
Answer: True

Jaccard's coefficient is appropriate when it is more informative to match negative outcomes between observations.
Answer: False

Using the Manhattan distance between pairwise observations, which pair of observations is most similar?
Observation 1: (2, 3)
Observation 2: (6, 4)
Observation 3: (8, 2)
Answer: Observations 2 and 3

Consider the partial data set in the table, which represents online hours spent shopping by age and income. The average and standard deviation of income for the full data set are $47,667 and $14,292, respectively. Using z-scores to standardize the observations, what is the average standardized income (in standard deviations from the mean) for the three observations provided?
ID  Income  Age  Hours
1   62000   48   2
2   58000   52   4
3   53000   22   5
Answer: 0.6997

The process of applying a set of analytical techniques for the development of machine learning is called data mining.
Answer: True

The key distinction between supervised and unsupervised data mining is that a target variable is identified in supervised data mining.
Answer: True

The Cross-Industry Standard Process for Data Mining (CRISP-DM) consists of six phases. Which of the six is the phase where data wrangling occurs?
Answer: Data preparation

Oversampling involves intentionally selecting more samples from one class than from other classes to adjust the class distribution of a data set.
Answer: True
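
The two numerical answers above (the Manhattan-distance pair and the average z-score) can be checked with a few lines of Python. This is only a minimal sketch: the observation values, the mean, and the standard deviation are taken from the questions, while the function and variable names are illustrative choices, not part of the original material.

```python
from itertools import combinations

# Observations from the Manhattan-distance question.
observations = {1: (2, 3), 2: (6, 4), 3: (8, 2)}


def manhattan(a, b):
    """Manhattan distance: the sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Distance for every pair of observations.
pair_distances = {
    (i, j): manhattan(observations[i], observations[j])
    for i, j in combinations(sorted(observations), 2)
}
# The most similar pair is the one with the smallest distance.
closest_pair = min(pair_distances, key=pair_distances.get)

# Incomes from the z-score question; the mean and standard deviation
# are the full-data-set values quoted in the question.
incomes = [62000, 58000, 53000]
mean_income, sd_income = 47667, 14292
z_scores = [(x - mean_income) / sd_income for x in incomes]
average_z = sum(z_scores) / len(z_scores)

print(pair_distances)        # distances for pairs (1, 2), (1, 3), (2, 3)
print(closest_pair)          # pair with the smallest distance
print(round(average_z, 4))   # average z-score of the three incomes
```

The smallest distance belongs to observations 2 and 3 (distance 4), and the average z-score rounds to 0.6997, matching the answers given.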

A diagram that represents the information in equal-sized intervals (deciles) is called a cumulative lift chart.
Answer: False

The process of dividing a data set into a training, a validation, and an optional test data set is called:
Answer: Data partitioning

When a predictive model is made overly complex to fit the quirks of the given sample data, it is called:
Answer: Overfitting

Principal component analysis (PCA) is a dimension reduction technique used to reduce the number of variables without removing variables.
Answer: True

In real-world situations, data sets contain many variables. If some variables are eliminated, valuable information may be lost.
Answer: True

Which is the best-fit definition for the use of principal component analysis (PCA)?
Answer: The transformation of a large number of correlated variables into a smaller number of uncorrelated variables

Of the following selections, which is NOT a descriptor of principal component analysis?
Answer: The first principal component is not suitable for analysis

Classifying or predicting the outcome value of a record is called scoring a record.
Answer: True

KNN is a simple data mining tool, known for developing personalized recommendations in many online company applications.
Answer: True

Which chart allows for the categorization of large data sets from high to low values, dividing sets of observations into an easy visual representation of the data?
Answer: Decile-wise chart

This chart measures the effectiveness of a predictive model and contains both a baseline and a lift curve.
Answer: Cumulative lift chart

KNN belongs to a category of mining techniques called computer-based reasoning.
Answer: False

While k-nearest neighbors is effective as a classifier, it provides no information on predictor importance.
Answer: True

Using the table below, find the k nearest neighbors of record 4 by age, using k = 3.
Answer: 24, 31, and 34

What is the estimated probability that the cheese sample tested in NW will be Gouda, using k = 3?
Answer: 67%

The naive Bayes method is an unsupervised data mining technique that uses partitioning to assess model performance.
Answer: False

When performing a naive Bayes analysis, all predictor variables must be categorical.
Answer: True

An issue with the naive Bayes classifier is determining rare outcomes because the estimate is 0. To overcome this problem, the algorithm allows a replacement of zero probability with a nonzero value. This technique is called:
Answer: Smoothing

The following table reflects the observations made on the color and type of vehicle, whether a speeding ticket (1) or a warning (0) was received, and whether there was a prior driving violation (yes or no). Using the naive Bayes calculation, what is the conditional probability of receiving a ticket with a red vehicle?
Answer: 0.53

A pure subset contains leaf nodes where cases have contradicting values of the target variable, to enhance the variable case outcomes and allow for further splits.
Answer: False

Decision trees produced by the CART algorithm are binary, meaning that there are two branches for each decision node
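
Two of the calculations referenced in the KNN and naive Bayes questions above can be sketched in Python. The source tables are not reproduced in this document, so the neighbor labels and counts below are hypothetical and chosen only for illustration. The smoothing shown is additive (Laplace) smoothing, which is one common choice; the original does not say which variant is meant, and the function names are assumptions.

```python
from collections import Counter


def knn_class_probability(neighbor_labels, target):
    """Estimated class probability: share of the k nearest neighbors with the target label."""
    return Counter(neighbor_labels)[target] / len(neighbor_labels)


def smoothed_probability(count, class_total, n_categories, alpha=1.0):
    """Additive (Laplace) smoothing, so a zero count never yields a zero probability."""
    return (count + alpha) / (class_total + alpha * n_categories)


# Hypothetical labels of the k = 3 nearest neighbors (not the source table).
neighbors = ["Gouda", "Cheddar", "Gouda"]
p_gouda = knn_class_probability(neighbors, "Gouda")

# Hypothetical counts: a predictor value never observed among 10 records of a
# class, for a predictor with 3 possible categories.
p_unseen = smoothed_probability(count=0, class_total=10, n_categories=3)

print(f"{p_gouda:.0%}")   # two of three neighbors share the target label
print(p_unseen)           # small but nonzero after smoothing
```

With k = 3, an estimate of 67% corresponds to two of the three nearest neighbors belonging to the target class. In the smoothing example, the unsmoothed estimate would be 0/10 = 0, while the smoothed estimate is 1/13.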