ask (C language) Natural language processing (NLP) is a field of artificial intelligence that seeks to develop the ability of a computer program to understand human language. Usually, the first step of an NLP system is to convert words into numeric codes. Thus, the system converts an input text into a sequence of numeric codes before any high-level analysis. This process is known as text preprocessing. We can only perform text preprocessing if we have a vocabulary of words and their associated numeric codes. Your task is to create a vocabulary of unique words for a given text file and assign a different number from 1 to N to each unique word, with N being the total number of unique words. You must perform this assignment so that the first word in alphabetical order gets the number 1, the second word in alphabetical order gets the number 2, and so on. A word is a sequence of letters (uppercase or lowercase). The file is composed of letters and white spaces (spaces, tabs, newlines). White spaces serve as word separators and cannot be part of any word. A file can have multiple consecutive separators. Different case variations of the same word (The, the, and THE) must be considered the same. All vocabulary words must contain uppercase letters only. Your program will receive two command-line arguments, the name of the input text file and the name of the file where the vocabulary must be saved. Example: $ ./a.out inputX.txt vocabularyX.txt Each line of the output file must contain a number (the numeric code) and a word (a unique word) separated by a space, and the words must be in alphabetical order. Below are some examples of input and expected output.

Database System Concepts
7th Edition
ISBN:9780078022159
Author:Abraham Silberschatz Professor, Henry F. Korth, S. Sudarshan
Publisher:Abraham Silberschatz Professor, Henry F. Korth, S. Sudarshan
Chapter1: Introduction
Section: Chapter Questions
Problem 1PE
icon
Related questions
Question

Vocabulary

Task (C language)

Natural language processing (NLP) is a field of artificial intelligence that seeks to develop the ability of a computer program to understand human language. Usually, the first step of an NLP system is to convert words into numeric codes. Thus, the system converts an input text into a sequence of numeric codes before any high-level analysis. This process is known as text preprocessing.

We can only perform text preprocessing if we have a vocabulary of words and their associated numeric codes. Your task is to create a vocabulary of unique words for a given text file and assign a different number from 1 to N to each unique word, with N being the total number of unique words. You must perform this assignment so that the first word in alphabetical order gets the number 1, the second word in alphabetical order gets the number 2, and so on.

A word is a sequence of letters (uppercase or lowercase). The file is composed of letters and white spaces (spaces, tabs, newlines). White spaces serve as word separators and cannot be part of any word. A file can have multiple consecutive separators. Different case variations of the same word (The, the, and THE) must be considered the same. All vocabulary words must contain uppercase letters only.

Your program will receive two command-line arguments, the name of the input text file and the name of the file where the vocabulary must be saved. Example:

$ ./a.out inputX.txt vocabularyX.txt

Each line of the output file must contain a number (the numeric code) and a word (a unique word) separated by a space, and the words must be in alphabetical order. Below are some examples of input and expected output.

Examples (your program must follow this format precisely)

Example #1

input0.txt

the THE The ha Ha HA

vocabulary0.txt

1 HA
2 THE

Example #2

input1.txt

Lorem ipsum dolor sit amet consectetur adipiscing elit

Ut commodo nec magna et sodales

vocabulary1.txt

1 ADIPISCING
2 AMET
3 COMMODO
4 CONSECTETUR
5 DOLOR
6 ELIT
7 ET
8 IPSUM
9 LOREM
10 MAGNA
11 NEC
12 SIT
13 SODALES
14 UT

Requirements

  • Name your program project6_vocabulary.c.
  • Your program must read the input text from the file specified by the first command-line argument, and must write the vocabulary to a file specified by the second command-line argument.
  • Assume the input text does not have more than 1000 words, and each words does not have more than 100 characters.
  • Follow the format of the examples above.
  • Hint: use an array of strings to store the words from the input file.
  1.  

Programming Style Guidelines 

The major purpose of programming style guidelines is to make programs easy to read and understand. Good programming style helps make it possible for a person knowledgeable in the application area to quickly read a program and understand how it works. 

  • Your program should begin with a comment that briefly summarizes what it does.  This comment should also include your name. 
  • In most cases, a function should have a brief comment above its definition describing what it does.  Other than that, comments should be written only needed in order for a reader to understand what is happening. 
  • Variable names and function names should be sufficiently descriptive that a knowledgeable reader can easily understand what the variable means and what the function does. If this is not possible, comments should be added to make the meaning clear. 
  • Use consistent indentation to emphasize block structure. 
  • Full line comments inside function bodies should conform to the indentation of the code where they appear.   
  • Macro definitions (#define) should be used for defining symbolic names for numeric constants. For example: #define PI 3.141592 
  • Use names of moderate length for variables.  Most names should be between 2 and 12 letters long. 
  • Use underscores to make compound names easier to read:  tot_vol and total_volumn are clearer than totalvolumn
Expert Solution
trending now

Trending now

This is a popular solution!

steps

Step by step

Solved in 2 steps

Blurred answer
Knowledge Booster
Array
Learn more about
Need a deep-dive on the concept behind this application? Look no further. Learn more about this topic, computer-science and related others by exploring similar questions and additional content below.
Similar questions
Recommended textbooks for you
Database System Concepts
Database System Concepts
Computer Science
ISBN:
9780078022159
Author:
Abraham Silberschatz Professor, Henry F. Korth, S. Sudarshan
Publisher:
McGraw-Hill Education
Starting Out with Python (4th Edition)
Starting Out with Python (4th Edition)
Computer Science
ISBN:
9780134444321
Author:
Tony Gaddis
Publisher:
PEARSON
Digital Fundamentals (11th Edition)
Digital Fundamentals (11th Edition)
Computer Science
ISBN:
9780132737968
Author:
Thomas L. Floyd
Publisher:
PEARSON
C How to Program (8th Edition)
C How to Program (8th Edition)
Computer Science
ISBN:
9780133976892
Author:
Paul J. Deitel, Harvey Deitel
Publisher:
PEARSON
Database Systems: Design, Implementation, & Manag…
Database Systems: Design, Implementation, & Manag…
Computer Science
ISBN:
9781337627900
Author:
Carlos Coronel, Steven Morris
Publisher:
Cengage Learning
Programmable Logic Controllers
Programmable Logic Controllers
Computer Science
ISBN:
9780073373843
Author:
Frank D. Petruzella
Publisher:
McGraw-Hill Education