
Question
Question 4 (PROGRAMMING)
A classification algorithm is a kind of machine learning algorithm that assigns a label to
an object based on knowledge of similar objects. Write a C program that implements the
1-NN classification algorithm.
The program receives as input two files <train> and <test>, whose names are passed as
arguments from the command line:
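A minimal sketch of the command-line handling described above. The helper name `open_inputs` and the usage message are assumptions made for illustration; the problem only states that the two file names arrive as arguments.

```c
#include <stdio.h>

/* Hypothetical helper: checks the argument count and opens both files.
 * Returns 1 on success, 0 on any error (after printing a message). */
int open_inputs(int argc, char *argv[], FILE **train, FILE **test)
{
    if (argc != 3) {
        fprintf(stderr, "Usage: %s <train> <test>\n",
                argc > 0 ? argv[0] : "classify");
        return 0;
    }
    *train = fopen(argv[1], "r");
    if (*train == NULL) {
        perror(argv[1]);
        return 0;
    }
    *test = fopen(argv[2], "r");
    if (*test == NULL) {
        perror(argv[2]);
        fclose(*train);   /* do not leak the file that did open */
        return 0;
    }
    return 1;
}
```

A `main(int argc, char *argv[])` would typically call this first and return a non-zero exit status when it fails.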
The <train> file contains a set of N objects (one per line), each with its corresponding label. This
file is used to train the classification algorithm, which learns how to recognize the known objects. Each line of
the file contains the name of the object, represented as a string of at most 5 characters, a set of M real numbers
T1 T2 T3 ... TM, followed by an integer L. The space character is used to separate the elements. The M real
numbers are the features that represent an object. The integer L is the object label.
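One possible way to represent and parse such a line is sketched below. The type name `Object`, the field names, and the helper `parse_object` are assumptions made for illustration; `M` is set to 2 only because that is the value used in the example.

```c
#include <stdio.h>

#define M 2          /* number of features (value taken from the example) */
#define NAME_LEN 5   /* maximum name length stated in the problem */

typedef struct {
    char name[NAME_LEN + 1];  /* +1 for the terminating '\0' */
    double feat[M];           /* the M real-valued features */
    int label;                /* the integer label L */
} Object;

/* Hypothetical helper: parses "name f1 ... fM label" from one text line.
 * Returns 1 on success, 0 if the line is incomplete or malformed. */
int parse_object(const char *line, Object *obj)
{
    int used = 0;

    /* "%5s" limits the name to NAME_LEN characters; %n records how many
     * characters were consumed so the next call can continue from there. */
    if (sscanf(line, "%5s%n", obj->name, &used) != 1)
        return 0;
    line += used;

    for (int i = 0; i < M; i++) {
        if (sscanf(line, "%lf%n", &obj->feat[i], &used) != 1)
            return 0;
        line += used;
    }
    return sscanf(line, "%d", &obj->label) == 1;
}
```

Since N is a compile-time constant, the training set can be stored in a fixed array such as `Object train[N]`, filled by reading each line with `fgets` and passing it to this function.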
The <test> file contains a set of objects to be classified. The format is similar to the <train> file, but the
number of rows is NOT KNOWN IN ADVANCE.
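Because the number of rows is unknown, the test file does not need to be stored: each row can be handled as soon as it is read, and the loop simply ends at end of file. The sketch below shows only that loop shape. The name `process_rows` and the 256-character line limit are assumptions, and the classification step is left as a comment.

```c
#include <stdio.h>
#include <string.h>

#define LINE_MAX_LEN 256  /* assumed upper bound on the length of one line */

/* Hypothetical helper: walks the file row by row until EOF and returns
 * how many non-blank rows were seen. No array of rows is ever allocated. */
int process_rows(FILE *fp)
{
    char line[LINE_MAX_LEN];
    int rows = 0;

    while (fgets(line, sizeof line, fp) != NULL) {
        /* skip lines that contain only whitespace */
        if (line[strspn(line, " \t\r\n")] == '\0')
            continue;
        /* a full program would parse and classify the row here */
        rows++;
    }
    return rows;
}
```

The returned count is the total number of classified objects, which is the denominator of the accuracy.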
Assume that the values M and N are known in advance and are defined as symbolic constants by means of
the #define directive.
The distance between two objects O and T is defined as follows:
$D_{O-T} = \sqrt{\sum_{i=1}^{M} (O_i - T_i)^2}$, where $O_i$ and $T_i$ are the real numbers used to represent the objects.
The program shall predict the label of each object O contained in the file <test>, according to the following
rules:
• Calculate the distance between the object O and every object T of the file <train>.
• The label to be assigned to O is the one of the object T that is the nearest to O.
For each classification, the program shall print a message on the screen indicating the name of the object, the
label that has been predicted for the object, and its actual label (as reported in the file <test>).
Finally, the program shall print on screen the accuracy of the classification algorithm, which is calculated as
the ratio of the number of correctly predicted labels to the total number of classified objects.
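The reporting and the accuracy can be sketched with two counters that are updated after every classification. The names `Tally`, `tally_record`, and `tally_accuracy` are assumptions; the message format follows the sample output in the example. Returning 0.0 for an empty test file is also an assumption, since the problem does not cover that case.

```c
#include <stdio.h>

/* Running counters for the accuracy computation. */
typedef struct {
    int correct;   /* predictions that matched the actual label */
    int total;     /* all classified objects */
} Tally;

/* Prints the per-object message and updates the counters. */
void tally_record(Tally *t, const char *name, int predicted, int actual)
{
    printf("Object %s: predicted label %d - actual label %d\n",
           name, predicted, actual);
    t->total++;
    if (predicted == actual)
        t->correct++;
}

/* Ratio of correct predictions to classified objects.
 * Assumption: an empty test file yields 0.0 rather than dividing by zero. */
double tally_accuracy(const Tally *t)
{
    /* the cast avoids integer division, which would truncate to 0 or 1 */
    return t->total > 0 ? (double)t->correct / t->total : 0.0;
}
```

After the last row, the result can be printed with `printf("The accuracy is equal to %.2f\n", tally_accuracy(&t));`.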
Example:
M=2 N=9

Train.txt
T1 1.3 3.8 1
T2 1.6 3.9 1
T3 1.5 3.7 1
T4 4.0 1.0 3
T5 4.1 1.1 3
T6 4.2 1.4 3
T7 2.5 2.5 2
T8 2.3 2.4 2
T9 2.6 2.6 2

Test.txt
O1 1.3 2.5 2
O2 3.5 4 3

Distances             Distances
O1 - T1  1.30         O2 - T1  2.21
O1 - T2  1.43         O2 - T2  1.90
O1 - T3  1.22         O2 - T3  2.02
O1 - T4  3.09         O2 - T4  3.04
O1 - T5  3.13         O2 - T5  2.96
O1 - T6  3.10         O2 - T6  2.69
O1 - T7  1.20         O2 - T7  1.80
O1 - T8  1.00         O2 - T8  2.00
O1 - T9  1.30         O2 - T9  1.66

O1 label = 2 (correct)
O2 label = 2 (wrong)

C:\>classify.exe train.txt test.txt
Object O1: predicted label 2 - actual label 2
Object O2: predicted label 2 - actual label 3
The accuracy is equal to 0.50