
Question

Based on the requirements in the image, make sure the code below matches the description and runs. I have already written some code; please incorporate the code below into the solution.

#include <stdio.h>
#include <stdlib.h>

//Represent a node of the singly linked list
struct node{
    int data;
    struct node *next;
};

//Represent the head and tail of the singly linked list
struct node *head = NULL, *tail = NULL;

//addNode() will add a new node to the list
void addNode(int data) {
    //Create a new node
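    struct node *newNode = (struct node*)malloc(sizeof(struct node));
    //Stop if the allocation failed instead of dereferencing NULL
    if(newNode == NULL) {
        fprintf(stderr, "Out of memory\n");
        exit(EXIT_FAILURE);
    }
    newNode->data = data;
    newNode->next = NULL;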

    //Checks if the list is empty
    if(head == NULL) {
        //If list is empty, both head and tail will point to new node
        head = newNode;
        tail = newNode;
    }
    else {
        //newNode will be added after tail such that tail's next will point to newNode
        tail->next = newNode;
        //newNode will become new tail of the list
        tail = newNode;
    }
}

//removeDuplicate() will remove duplicate nodes from the list
void removeDuplicate() {
    //Node current will point to head
    struct node *current = head, *index = NULL, *temp = NULL;

    if(head == NULL) {
        return;
    }
    else {
        while(current != NULL){
            //Node temp will point to previous node to index.
            temp = current;
            //Index will point to node next to current
            index = current->next;

            while(index != NULL) {
                //If current node's data is equal to index node's data
                if(current->data == index->data) {
                    //index is a duplicate of current: unlink it from the list
                    struct node *dup = index;
                    temp->next = index->next;
                    //If the duplicate was the tail, temp becomes the new tail
                    if(dup == tail) {
                        tail = temp;
                    }
                    index = index->next;
                    //Release the unlinked node so it is not leaked
                    free(dup);
                }
                else {
                    //Temp will point to previous node of index.
                    temp = index;
                    index = index->next;
                }
            }
            current = current->next;
        }
    }
}

//display() will display all the nodes present in the list
void display() {
    //Node current will point to head
    struct node *current = head;
    if(head == NULL) {
        printf("List is empty \n");
        return;
    }
    while(current != NULL) {
        //Prints each node by incrementing pointer
        printf("%d ", current->data);
        current = current->next;
    }
    printf("\n");
}

struct node* findMiddle(struct node* b, struct node* e)
{
    if(b == NULL)
        return NULL;
    struct node* slow = b;
    struct node* fast = b->next;
    while(fast != e)
    {
        fast=fast->next;
        if(fast != e)
        {
            slow = slow->next;
            fast = fast->next;
        }
    }
    return slow;
}

//binarysearch() assumes the list is sorted in ascending order
struct node* binarysearch(int searchItem)
{
    struct node* beg = head;
    struct node* end = NULL;
    do
    {
        struct node* middle = findMiddle(beg, end);
        if(middle == NULL)
            return middle;
        else if(middle->data == searchItem)
            return middle;
        else if(middle->data < searchItem)
            beg = middle->next;
        else
            end = middle;
    }while(end==NULL || end != beg);
    return NULL;
}

int main()
{
    //Adds data to the list
    addNode(1);
    addNode(2);
    addNode(3);
    addNode(2);
    addNode(2);
    addNode(4);
    addNode(1);

    printf("Original list: \n");
    display();

    //Removes duplicate nodes
    removeDuplicate();

    printf("List after removing duplicates: \n");
    display();

    printf("Search for 3 in the linked list:\n");
    struct node* n = binarysearch(3);
    if(n==NULL)
    {
        printf("Item not Found\n");
    }
    else
    {
        printf("Item Found\n");
        printf("%d\n",n->data);
    }
    printf("Search for 7 in the linked list:\n");
    n = binarysearch(7);
    if(n==NULL)
    {
        printf("Item not Found\n");
    }
    else
    {
        printf("Item Found\n");
        printf("%d\n",n->data);
    }
    return 0;
}

Submission details: Make a directory and name it, using your first and last names,
applying the camel notation, for example haniGirgis. Please name the main file,
where execution starts, as hw2.cpp. Compress your directory by the zip utility, and
submit the compressed file using Blackboard.
Introduction
Suppose you had a document (perhaps a newspaper article, or an unattributed
manuscript for a book), and you were interested in knowing who wrote it. One way
to try to determine the authorship of the anonymous document is by comparing
properties of the anonymous document with properties of known documents, and
seeing if there is enough similarity to make a judgment of authorship.
Some simple properties one might use to distinguish different authors include:
• Vocabulary (i.e. the set of words an author uses).
• Word frequencies (i.e. the frequencies with which an author uses words).
• Bigram frequencies (i.e. the frequencies of two consecutive words).
• Bigram probabilities (i.e. the probability that one word follows another
word).
Terminology
• A unigram is a sequence of words of length one (i.e. a single word).
• A bigram is a sequence of words of length two.
• The conditional probability of an event E2 given another event E1, written
p(E2|E1), is the probability that E2 will occur given that event E1 has already
occurred.
We write p(w(k)|w(k-1)) for the conditional probability of a word w in position k,
w(k), given the immediately preceding word, w(k-1). You determine the conditional
probabilities by determining unigram counts (the number of times each word
appears, written c(w(k))), bigram counts (the number of times each pair of words
appears, written c(w(k-1) w(k))), and then dividing each bigram count by the
unigram count of the first word in the bigram:
p(WORD(k)|WORD(k-1)) = c(WORD(k-1) WORD(k)) / c(WORD(k-1))
For example, if the word "time" occurs seven times in a text, and "time of" occurs
three times, then the probability of "of" occurring after "time" is 3/7.
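The division in the formula above can be sketched in C as follows. This is a minimal illustration, not part of the assignment text: the function name conditional_probability and its parameter names are hypothetical. The one detail worth noting is the cast to double, since dividing two integer counts directly would truncate 3/7 to 0.

```c
#include <assert.h>
#include <stdio.h>

/* Conditional probability p(w(k)|w(k-1)) computed from raw counts.
   bigram_count is c(w(k-1) w(k)); first_count is c(w(k-1)).
   Both names are hypothetical, chosen for this sketch. */
double conditional_probability(long bigram_count, long first_count)
{
    /* A bigram can only have been counted if its first word occurred,
       and it cannot occur more often than that first word. */
    assert(first_count > 0);
    assert(bigram_count >= 0 && bigram_count <= first_count);

    /* Cast before dividing so the result is not truncated to an integer. */
    return (double)bigram_count / (double)first_count;
}
```

For the example above, conditional_probability(3, 7) returns 3/7, approximately 0.428571.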
Project description
In this project, you will be determining conditional probabilities of bigrams. To do
this, you will write a C program, which reads in a file of text and produces three
output files, as described below.
To compute the conditional probabilities you need to determine unigram and
bigram counts first (you can do this in a single pass through a file if you do things
carefully) and store them in a Binary Search Tree (BST). After that, you can compute
the conditional probabilities.
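One possible shape for the BST mentioned above is sketched below. It is an illustration under stated assumptions, not the required design: the names count_node, bst_add, bst_count, and bst_free are hypothetical, keys are compared with strcmp, and a bigram is assumed to be stored as a single string such as "time of". The same tree type can then hold either unigram or bigram counts.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One BST node: a key (a unigram, or a bigram such as "time of") and its count. */
struct count_node {
    char *key;
    long count;
    struct count_node *left, *right;
};

/* Insert key with count 1, or increment its count if it is already present.
   Returns the (possibly new) root of the subtree. */
struct count_node *bst_add(struct count_node *root, const char *key)
{
    assert(key != NULL);
    if (root == NULL) {
        struct count_node *n = (struct count_node *)malloc(sizeof *n);
        if (n == NULL) exit(EXIT_FAILURE);
        n->key = (char *)malloc(strlen(key) + 1);
        if (n->key == NULL) exit(EXIT_FAILURE);
        strcpy(n->key, key);          /* the node keeps its own copy of the key */
        n->count = 1;
        n->left = n->right = NULL;
        return n;
    }
    int cmp = strcmp(key, root->key);
    if (cmp == 0)
        root->count++;                /* seen before: just count it */
    else if (cmp < 0)
        root->left = bst_add(root->left, key);
    else
        root->right = bst_add(root->right, key);
    return root;
}

/* Return the count stored for key, or 0 if key is not in the tree. */
long bst_count(const struct count_node *root, const char *key)
{
    while (root != NULL) {
        int cmp = strcmp(key, root->key);
        if (cmp == 0) return root->count;
        root = (cmp < 0) ? root->left : root->right;
    }
    return 0;
}

/* Free every node and its key. */
void bst_free(struct count_node *root)
{
    if (root == NULL) return;
    bst_free(root->left);
    bst_free(root->right);
    free(root->key);
    free(root);
}
```

Calling root = bst_add(root, word) once per word read builds the unigram tree, and a second tree built the same way from "previous current" strings holds the bigram counts. The tree is not balanced, so input that arrives in sorted order would degrade it to a linked list; ordinary prose is unlikely to do that.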
Input files
Test files can be found on (http://www.gutenberg.org/ebooks/). For example,
search for "Mark Twain." Then click on any of his books. Next download the "Plain
Text UTF-8" format.
In addition, you should test your program on other input files as well for which you
can hand-compute the correct answer.
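For a small hand-checkable input, the single pass mentioned earlier can be sketched as below. The assignment does not define what counts as a word, so this sketch assumes words are separated by whitespace and leaves case and punctuation untouched. The names MAX_WORD, next_word, and collect_bigrams are hypothetical, and the text is taken from a string rather than a file to keep the example short; a full program would read from the input file and update its counts instead of storing bigrams in an array.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_WORD 64                    /* hypothetical limit on word length */
#define MAX_BIGRAM (2 * MAX_WORD + 1)  /* two words plus one separating space */

/* Copy the next whitespace-delimited word from *cursor into buf and advance
   *cursor past it. Returns 1 if a word was read, 0 at the end of the text.
   Words longer than MAX_WORD - 1 characters are truncated. */
int next_word(const char **cursor, char buf[MAX_WORD])
{
    const char *p = *cursor;
    size_t n = 0;
    assert(p != NULL);
    while (*p != '\0' && isspace((unsigned char)*p))
        p++;                           /* skip separators */
    while (*p != '\0' && !isspace((unsigned char)*p)) {
        if (n < MAX_WORD - 1)
            buf[n++] = *p;
        p++;
    }
    buf[n] = '\0';
    *cursor = p;
    return n > 0;
}

/* Walk the text once, remembering only the previous word. Every word is a
   unigram; every (previous, current) pair is a bigram. Bigrams are stored as
   "previous current" in out, up to max_out entries; returns how many were stored. */
size_t collect_bigrams(const char *text, char out[][MAX_BIGRAM], size_t max_out)
{
    char prev[MAX_WORD] = "";
    char cur[MAX_WORD];
    size_t count = 0;
    while (next_word(&text, cur)) {
        /* A full program would increment the unigram count of cur here. */
        if (prev[0] != '\0' && count < max_out) {
            snprintf(out[count], MAX_BIGRAM, "%s %s", prev, cur);
            count++;
        }
        strcpy(prev, cur);             /* current word becomes the previous one */
    }
    return count;
}
```

On the text "the time of the time" this yields the four bigrams "the time", "time of", "of the", and "the time", which is easy to verify by hand.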
Output files
Your program must accept the name of an input file as a command line argument.
Let's call the file name of this file fn. Your program must then produce as output the
following set of files:
• Your program must write the unigram counts to a file named fn.uni in which
each unigram is listed on a separate line, and each line contains just the
unigram and its count (an integer), separated by a single space.
• Your program must write the bigram counts to a file named fn.bi in which
each bigram is listed on a separate line, and each line contains just the
bigram and its count (an integer), separated by a single space.
• Your program must write the conditional probabilities to a file named fn.cp,
reported in the form P(WORD(k)|WORD(k-1)) = p, where p is the conditional
probability of WORD(k) given WORD(k-1).
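The file names and line formats listed above can be sketched as follows. Several details are assumptions of this sketch rather than requirements from the text: the suffix is appended to the full input name (so twain.txt becomes twain.txt.uni), probabilities are printed with the six decimals that %f gives by default, and the helper names make_output_name, format_count_line, and format_cp_line are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build an output file name by appending a suffix such as ".uni" to fn.
   Returns 1 on success, 0 if the result does not fit in out. */
int make_output_name(char *out, size_t cap, const char *fn, const char *suffix)
{
    assert(out != NULL && fn != NULL && suffix != NULL);
    int n = snprintf(out, cap, "%s%s", fn, suffix);
    return n >= 0 && (size_t)n < cap;
}

/* Format one line of fn.uni or fn.bi: the n-gram, a single space, its count. */
int format_count_line(char *out, size_t cap, const char *ngram, long count)
{
    assert(out != NULL && ngram != NULL);
    int n = snprintf(out, cap, "%s %ld\n", ngram, count);
    return n >= 0 && (size_t)n < cap;
}

/* Format one line of fn.cp in the form P(WORD(k)|WORD(k-1)) = p.
   prev is WORD(k-1) and cur is WORD(k). */
int format_cp_line(char *out, size_t cap, const char *prev, const char *cur, double p)
{
    assert(out != NULL && prev != NULL && cur != NULL);
    int n = snprintf(out, cap, "P(%s|%s) = %f\n", cur, prev, p);
    return n >= 0 && (size_t)n < cap;
}
```

With the earlier example, format_count_line produces "time of 3" for the bigram file and format_cp_line produces "P(of|time) = 0.428571" for the probability file. Each formatted line can be written with fputs to a file opened with fopen on the name from make_output_name.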