Vocabulary
Task (C language)
Please also include how to create a .txt file and where it will be saved on the computer.
Natural language processing (NLP) is a field of artificial intelligence that seeks to develop the ability of a computer program to understand human language. Usually, the first step of an NLP system is to convert words into numeric codes. Thus, the system converts an input text into a sequence of numeric codes before any high-level analysis. This process is known as text preprocessing.
We can only perform text preprocessing if we have a vocabulary of words and their associated numeric codes. Your task is to create a vocabulary of unique words for a given text file and assign a different number from 1 to N to each unique word, with N being the total number of unique words. You must perform this assignment so that the first word in alphabetical order gets the number 1, the second word in alphabetical order gets the number 2, and so on.
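The numbering rule above amounts to sorting the words and dropping repeats, so that a word's position in the sorted list (counting from 1) is its code. The following is a minimal sketch of that step, not the required solution. It assumes the words are already stored in uppercase in a two-dimensional char array; the names sort_unique, compare_words, and MAX_WORD_LEN are illustrative choices, not part of the assignment.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_WORD_LEN 100   /* assumed limit: longest word, excluding '\0' */

/* qsort comparator: each element is one fixed-size row of the word array. */
static int compare_words(const void *lhs, const void *rhs)
{
    return strcmp((const char *)lhs, (const char *)rhs);
}

/*
 * Sorts the first `count` words alphabetically and removes duplicates
 * in place. Returns N, the number of unique words; afterwards words[0]
 * would get code 1, words[1] code 2, and so on.
 */
size_t sort_unique(char words[][MAX_WORD_LEN + 1], size_t count)
{
    size_t unique = 0;   /* index of the last unique word kept so far */
    size_t i;

    if (count == 0) {
        return 0;
    }

    qsort(words, count, sizeof words[0], compare_words);

    /* After sorting, equal words are adjacent, so one pass is enough. */
    for (i = 1; i < count; i++) {
        if (strcmp(words[unique], words[i]) != 0) {
            unique++;
            if (unique != i) {
                strcpy(words[unique], words[i]);
            }
        }
    }
    return unique + 1;
}
```

Sorting first and then removing adjacent duplicates keeps the code short. Checking for duplicates while reading would also work, at the cost of a search per word.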
A word is a sequence of letters (uppercase or lowercase). The file is composed of letters and white spaces (spaces, tabs, newlines). White spaces serve as word separators and cannot be part of any word. A file can have multiple consecutive separators. Different case variations of the same word (The, the, and THE) must be considered the same. All vocabulary words must contain uppercase letters only.
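One way to apply these rules is to read the file one character at a time, collecting letters into the current word and treating anything else as a separator. The sketch below is one possible approach under that assumption. The function name read_words and the macro MAX_WORD_LEN are hypothetical, and silently dropping letters beyond the length limit is a defensive choice of this sketch, not something the assignment asks for.

```c
#include <ctype.h>
#include <stdio.h>

#define MAX_WORD_LEN 100   /* assumed limit: longest word, excluding '\0' */

/*
 * Reads words from `in` into `words`, converting every letter to
 * uppercase. Runs of white space count as a single separator.
 * Stores at most `max_words` words and returns how many were stored.
 */
size_t read_words(FILE *in, char words[][MAX_WORD_LEN + 1], size_t max_words)
{
    size_t count = 0;   /* number of complete words stored */
    size_t len = 0;     /* length of the word being built */
    int ch;

    while ((ch = fgetc(in)) != EOF) {
        if (isalpha(ch)) {
            /* Ignore letters that would overflow either limit. */
            if (count < max_words && len < MAX_WORD_LEN) {
                words[count][len++] = (char)toupper(ch);
            }
        } else if (len > 0) {
            /* A separator ends the current word. */
            words[count][len] = '\0';
            count++;
            len = 0;
        }
    }

    /* The file may end without a trailing separator. */
    if (len > 0) {
        words[count][len] = '\0';
        count++;
    }
    return count;
}
```

Because every stored word is already uppercase, The, the, and THE become identical strings and can be compared with strcmp.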
Your program will receive two command-line arguments, the name of the input text file and the name of the file where the vocabulary must be saved. Example:
$ ./a.out inputX.txt vocabularyX.txt
Each line of the output file must contain a number (the numeric code) and a word (a unique word) separated by a space, and the words must be in alphabetical order. Below are some examples of input and expected output.
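On creating the files: the input file can be typed in any plain-text editor and saved under a name such as input0.txt. The output file does not need to exist beforehand, because fopen with mode "w" creates it (or empties it if it already exists). When a bare name like vocabulary0.txt is passed, the file is resolved relative to the program's current working directory, which is normally the folder the terminal is in when ./a.out is run; an IDE may use a different working directory. The sketch below shows one way the writing step could look. The name write_vocabulary and the macro MAX_WORD_LEN are assumptions made for illustration, and the words are assumed to be sorted and unique already.

```c
#include <stdio.h>

#define MAX_WORD_LEN 100   /* assumed limit: longest word, excluding '\0' */

/*
 * Writes one "code word" pair per line to the file at `path`,
 * numbering the words from 1. Returns 0 on success and 1 if the
 * file could not be opened or closed properly.
 */
int write_vocabulary(const char *path, char words[][MAX_WORD_LEN + 1],
                     size_t count)
{
    FILE *out = fopen(path, "w");   /* creates the file if it is missing */
    size_t i;

    if (out == NULL) {
        perror(path);
        return 1;
    }

    for (i = 0; i < count; i++) {
        fprintf(out, "%zu %s\n", i + 1, words[i]);
    }

    if (fclose(out) != 0) {
        return 1;
    }
    return 0;
}
```

To save the vocabulary somewhere else, pass a path instead of a bare name as the second argument, for example a folder name followed by the file name.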
Examples (your program must follow this format precisely)
Example #1
input0.txt
the THE The ha Ha HA
vocabulary0.txt
1 HA
2 THE
Example #2
input1.txt
Lorem ipsum dolor sit amet consectetur adipiscing elit
Ut commodo nec magna et sodales
vocabulary1.txt
1 ADIPISCING
2 AMET
3 COMMODO
4 CONSECTETUR
5 DOLOR
6 ELIT
7 ET
8 IPSUM
9 LOREM
10 MAGNA
11 NEC
12 SIT
13 SODALES
14 UT
Requirements
- Name your program project6_vocabulary.c.
- Your program must read the input text from the file specified by the first command-line argument, and must write the vocabulary to a file specified by the second command-line argument.
- Assume the input text does not have more than 1000 words, and each word does not have more than 100 characters.
- Follow the format of the examples above.
- Hint: use an array of strings to store the words from the input file.
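Since both file names come from the command line, it can help to verify the argument count before opening anything. The helper below is a small sketch of that check; the name check_usage and the wording of the usage message are assumptions, not part of the requirements.

```c
#include <stdio.h>

#define MAX_WORDS    1000   /* limit stated in the requirements */
#define MAX_WORD_LEN 100    /* limit stated in the requirements */

/*
 * Returns 0 when exactly two file names follow the program name.
 * Otherwise prints a usage message to stderr and returns 1.
 */
int check_usage(int argc, char *argv[])
{
    if (argc != 3) {
        fprintf(stderr, "Usage: %s input_file vocabulary_file\n",
                argc > 0 ? argv[0] : "project6_vocabulary");
        return 1;
    }
    return 0;
}
```

In main, argv[1] would then be the input file name and argv[2] the vocabulary file name. The array of strings from the hint could be declared as static char words[MAX_WORDS][MAX_WORD_LEN + 1]; the extra byte per word holds the terminating '\0'.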
Programming Style Guidelines
The major purpose of programming style guidelines is to make programs easy to read and understand. Good programming style helps make it possible for a person knowledgeable in the application area to quickly read a program and understand how it works.
- Your program should begin with a comment that briefly summarizes what it does. This comment should also include your name.
- In most cases, a function should have a brief comment above its definition describing what it does. Other than that, comments should be written only where needed for a reader to understand what is happening.
- Variable names and function names should be sufficiently descriptive that a knowledgeable reader can easily understand what the variable means and what the function does. If this is not possible, comments should be added to make the meaning clear.
- Use consistent indentation to emphasize block structure.
- Full line comments inside function bodies should conform to the indentation of the code where they appear.
- Macro definitions (#define) should be used for defining symbolic names for numeric constants. For example: #define PI 3.141592
- Use names of moderate length for variables. Most names should be between 2 and 12 letters long.
- Use underscores to make compound names easier to read: tot_vol and total_volume are clearer than totalvolume.
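The guidelines above can be illustrated with a short fragment. This is only a sketch of the style, unrelated to the vocabulary task itself; the function cylinder_volume and its parameter names are made up for the example.

```c
/*
 * Summary comment: computes the volume of a cylinder.
 * Author: (your name here)
 */

#define PI 3.141592   /* symbolic name for a numeric constant */

/* Returns the volume of a cylinder with the given radius and height. */
double cylinder_volume(double radius, double height)
{
    double base_area = PI * radius * radius;
    double tot_vol = base_area * height;

    /* Full-line comments follow the indentation of the code around them. */
    return tot_vol;
}
```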