Context for remaining questions

Natural Language Processing (NLP) is the field of computer science concerned with using computers to make sense of language as it is naturally spoken. One of the most commonly used formulas in NLP is "tf-idf". We use tf-idf to quantify how important a particular word (called a "term" in this context) is to the document it appears in.

"tf" stands for "term frequency". We calculate it by taking the number of times a term appears in the document and dividing it by the document's total word count (just like the word count of an essay):

tf = (number of times the term appears in the document) / (total word count for the document)

"idf" stands for "inverse document frequency". Document frequency is the fraction of documents in your entire collection that contain the term at least once. A document collection could be a set of books or articles, all of the webpages returned by a search, all of the reviews of a single product on Amazon, and so on. Inverse document frequency lets us lower the importance of words that are so common in the collection that their presence in a document tells us nothing. For instance, most documents contain common words like "the" or "and" many times. That doesn't mean the document is about those words.

Document frequency is calculated like so:

df = (number of documents the term appears in at least once) / (total number of documents in the collection)

To get inverse document frequency, divide one by the document frequency:

idf = 1 / df = (total number of documents in the collection) / (number of documents the term appears in at least once)

Finally, to calculate tf-idf, multiply tf by idf. This is mathematically identical to dividing tf by df.
Question 2

a) Write code that opens the file "term_data.txt" and loads its data into the following variables, in this order:

termCount = number of times the term appeared in the document
length = total word count for the document
docCount = number of documents the term appears in at least once
totalDocs = total number of documents in the collection

Hint: You will need to include the right header file to complete this question.

b) Continue by adding code that calculates tf, idf, and tf-idf, and prints all three to the console.