In this problem you will create a simple implementation of the Flajolet-Martin algorithm in Python. The stream will be the contents of a text file, and your program will produce the algorithm's approximation of the number of unique words in the file. You must process the file one line at a time and may not store any part of it. You can obtain words by splitting each line on whitespace.
Your code will be run from a terminal with the following command: cat filename | python your_code.py. If you are running OSX, you can run this from the built-in terminal. If you are using Windows, use the Ubuntu terminal installed for running Spark. Your code will need to read from sys.stdin.
You may not use lists, dictionaries, or any other container, as these are not employed by the algorithm. You may use Python's built-in hash and bin functions: hash produces a number from a string, and bin gives the binary representation of that number as a string, in which you can count the consecutive zeros starting from the right end. You may also use the hash functions defined in the hashlib module.
Write your code in the file problem1.py. You may import hashlib, but nothing else beyond the modules already imported in the file. Importing hashlib is not necessary if you use the built-in hash function mentioned above. If you prefer, you do not need to use the starter code in the file, but your solution must conform to the requirements given above.
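One way the requirements above might be met is sketched below. It is not the official solution or the starter code. It assumes the plain textbook form of the estimate, 2**R, where R is the largest number of trailing zero bits seen in any word's hash, using a single hash function and no averaging. The names trailing_zeros, word_hash, and estimate_unique are hypothetical, and MD5 from hashlib is an arbitrary choice made here because Python 3 randomizes the built-in hash of strings per process by default, so results from hash can differ between runs. The short-lived list returned by split() is assumed to be acceptable because the problem itself suggests splitting lines on whitespace.

```python
import sys
import hashlib


def trailing_zeros(n):
    """Count consecutive zero bits at the right end of n's binary form."""
    if n == 0:
        return 0  # assumed convention: an all-zero hash contributes nothing
    bits = bin(n)  # for example, bin(8) == '0b1000'
    return len(bits) - len(bits.rstrip('0'))


def word_hash(word):
    """Deterministic integer hash of a word (MD5 is an assumed choice)."""
    return int(hashlib.md5(word.encode('utf-8')).hexdigest(), 16)


def estimate_unique(lines):
    """Single pass over the stream, keeping only one integer of state."""
    max_zeros = 0
    for line in lines:
        for word in line.split():
            max_zeros = max(max_zeros, trailing_zeros(word_hash(word)))
    return 2 ** max_zeros


if __name__ == '__main__':
    # Intended invocation: cat filename | python problem1.py
    print(estimate_unique(sys.stdin))
```

Only max_zeros persists between words, so memory use does not grow with the file. Because the estimate is always a power of two and depends on a single hash function, it can be far from the true count; an empty stream yields 1 under the convention assumed here.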