Write a Python program to count the words in the corpus. The output of the program should be a text file named 'word_count.txt'. The first line contains the total document frequency of each of the five classes, in the order 'crude', 'grain', 'money-fx', 'acq' and 'earn', separated by spaces. Each following line contains a word and its frequency in the five classes, in the same class order, separated by spaces.


I need the solution in Python; please show screenshots of the code and its output.

# [the header of this first function is not visible in the screenshot]
    # Convert all lett…
    # Use nltk.word_t…
    # Use nltk.PorterStemmer to stem the words
    return

def count_word(inputfile, outputfile):
    # TODO: Count the words from the corpus, and output the result to the output file in the format required.
    # A dictionary object may help you with this work.
    return

def feature_selection(inputfile, threshold, outputfile):
    # TODO: Choose the most frequent 10000 words (defined by threshold) as the feature words.
    # * Use the frequency obtained in 'word_count.txt' to calculate the total word frequency in each class.
    #   Notice that when calculating the word frequency, only words recognized as features are taken into consideration.
    # * Output the result to the output file in the format required.
    return

def calculate_probability(word_count, word_dict, outputfile):
    # TODO: Calculate the posterior probability of each feature word, and the prior probability of the class.
    # * Output the result to the output file in the format required.
    # * Use 'word_count.txt' and 'word_dict.txt' jointly.
    return

def classify(probability, testset, outputfile):
    # TODO: Implement the naive Bayes classifier to assign class labels to the documents in the test set.
    # * Output the result to the output file in the format required.
    return

def f1_score(testset, classification_result):
    # TODO: Use the F1 score to assess the performance of the implemented classification model.
    # * The return value should be a float object.
    return
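The classify and f1_score stubs could be filled in along the following lines. This is only a sketch under stated assumptions: the probabilities are assumed to be already loaded into a list of priors and a dict mapping each feature word to its five per-class probabilities (all strictly positive), class labels are assumed to be integer indices, and the F1 score is assumed to be macro-averaged over classes, since the assignment text does not say which averaging it expects. The names classify_tokens and macro_f1 are hypothetical.

```python
import math


def classify_tokens(tokens, priors, likelihoods):
    """Return the index of the class with the highest log-posterior score.

    priors: list of P(class); likelihoods: dict word -> list of P(word | class).
    Both layouts are assumptions of this sketch, and every probability is
    assumed to be > 0 (for example because it was smoothed).
    """
    scores = [math.log(p) for p in priors]
    for tok in tokens:
        probs = likelihoods.get(tok)
        if probs is None:  # words that are not features are ignored
            continue
        for i, p in enumerate(probs):
            scores[i] += math.log(p)  # sum logs instead of multiplying to avoid underflow
    return max(range(len(scores)), key=scores.__getitem__)


def macro_f1(true_labels, predicted_labels, n_classes):
    """Macro-averaged F1: the unweighted mean of the per-class F1 scores (assumed averaging)."""
    f1_values = []
    for c in range(n_classes):
        pairs = list(zip(true_labels, predicted_labels))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        denom = 2 * tp + fp + fn
        f1_values.append(2 * tp / denom if denom else 0.0)
    return sum(f1_values) / n_classes
```

Working in log space is a common choice for naive Bayes because multiplying many small probabilities quickly underflows to zero.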
1. Write a Python program to count the words in the corpus. The output of the program should
be a text file named 'word_count.txt'. The first line contains the total document frequency
of each of the five classes, in the order 'crude', 'grain', 'money-fx', 'acq' and 'earn', separated
by spaces. Each following line contains a word and its frequency in the five classes, in
the same class order, separated by spaces.
word_count.txt:
2300 1400 768 520 123
apple 5 10 78 6
banana 10 9 16 24 6
candy 102 0 400 450 200
silly 30 20 15 23 64
the 350 120 260 730 110
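A minimal sketch of the counting step in requirement 1 follows. The corpus format is not given in the question, so the function assumes the documents are already preprocessed into (label, tokens) pairs; that input shape and the name count_words are hypothetical.

```python
from collections import Counter, defaultdict

CLASSES = ["crude", "grain", "money-fx", "acq", "earn"]


def count_words(docs, output_path):
    """Write per-class document and word counts in the word_count.txt layout.

    docs: iterable of (label, tokens) pairs, an assumed input format.
    """
    doc_freq = Counter()  # number of documents seen per class
    word_freq = defaultdict(lambda: [0] * len(CLASSES))  # word -> per-class counts
    for label, tokens in docs:
        idx = CLASSES.index(label)
        doc_freq[label] += 1
        for tok in tokens:
            word_freq[tok][idx] += 1
    with open(output_path, "w", encoding="utf-8") as out:
        # First line: document frequency per class, in the fixed class order.
        out.write(" ".join(str(doc_freq[c]) for c in CLASSES) + "\n")
        # Remaining lines: word followed by its five per-class frequencies.
        for word in sorted(word_freq):
            out.write(word + " " + " ".join(map(str, word_freq[word])) + "\n")
    return doc_freq, dict(word_freq)
```

Words are written in alphabetical order here only to make the output deterministic; the assignment does not appear to require a particular order.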
2. Write a program to choose the 10000 most frequent words as the feature words, and use the
frequencies obtained in requirement 1 to calculate the total word frequency in each class.
Notice that only words recognized as features are taken into consideration. The output
of the program should be a text file named 'word_dict.txt', containing the information about
the feature words. The first line contains the total feature word frequency of each of the five
classes, in the order 'crude', 'grain', 'money-fx', 'acq' and 'earn', separated by spaces. Each
following line contains a word and its frequency in the five classes, in the same order as above,
separated by spaces.
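One way requirement 2 might be approached is sketched below. It assumes "most frequent" means the highest total frequency summed over the five classes, which the assignment does not state explicitly, and it skips any row that does not carry exactly five counts (the 'apple' row in the sample above has only four). The name select_features is hypothetical.

```python
N_CLASSES = 5


def select_features(count_path, threshold, output_path):
    """Keep the `threshold` most frequent words and write them in the word_dict.txt layout."""
    rows = []
    with open(count_path, encoding="utf-8") as f:
        f.readline()  # skip the document-frequency header line
        for line in f:
            parts = line.split()
            if len(parts) != N_CLASSES + 1:  # word + five counts; skip malformed rows
                continue
            rows.append((parts[0], [int(x) for x in parts[1:]]))
    # Rank by total frequency across classes (an assumption), ties broken alphabetically.
    rows.sort(key=lambda r: (-sum(r[1]), r[0]))
    features = rows[:threshold]
    # Per-class totals are computed over the selected feature words only.
    totals = [sum(counts[i] for _, counts in features) for i in range(N_CLASSES)]
    with open(output_path, "w", encoding="utf-8") as out:
        out.write(" ".join(map(str, totals)) + "\n")
        for word, counts in features:
            out.write(word + " " + " ".join(map(str, counts)) + "\n")
    return totals, features
```

With the sample file above and a threshold of 2, this sketch would keep 'the' and 'candy'.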
3. Use the above formula to calculate the posterior probability of each feature word and the
prior probability of each class. The output of your program should be a text file named
'word_probability.txt'. The first line contains the prior probabilities of the five classes, in the
order 'crude', 'grain', 'money-fx', 'acq' and 'earn', separated by spaces. Each following
line contains a word and its posterior probability in the five classes, separated by
spaces.
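The formula referred to in requirement 3 is not reproduced in the text, so the sketch below substitutes a common choice and should be checked against the actual formula: the prior of a class is its share of all documents, and the per-word probability is read as P(word | class) with add-one (Laplace) smoothing over the feature vocabulary. Both of these, and the six-significant-digit output format, are assumptions.

```python
def calculate_probability(word_count_path, word_dict_path, output_path):
    """Write class priors and smoothed word probabilities in the word_probability.txt layout."""
    with open(word_count_path, encoding="utf-8") as f:
        doc_freq = [int(x) for x in f.readline().split()]  # documents per class
    features = {}
    with open(word_dict_path, encoding="utf-8") as f:
        totals = [int(x) for x in f.readline().split()]  # feature-word total per class
        for line in f:
            parts = line.split()
            if parts:
                features[parts[0]] = [int(x) for x in parts[1:]]
    n_docs = sum(doc_freq)
    priors = [d / n_docs for d in doc_freq]
    vocab_size = len(features)
    # Add-one smoothing (assumed): (count + 1) / (class total + vocabulary size).
    likelihoods = {
        word: [(c + 1) / (t + vocab_size) for c, t in zip(counts, totals)]
        for word, counts in features.items()
    }
    with open(output_path, "w", encoding="utf-8") as out:
        out.write(" ".join(f"{p:.6g}" for p in priors) + "\n")
        for word, probs in likelihoods.items():
            out.write(word + " " + " ".join(f"{p:.6g}" for p in probs) + "\n")
    return priors, likelihoods
```

Under this smoothing, the probabilities of all feature words within one class sum to 1, and no word receives probability zero in any class.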