Write a Python program to create a Markov model of order k, and then use that model to generate text. Our Markov model will be stored in a dictionary. The keys of the dictionary will be k-grams, and the value for each key will also be a dictionary, storing the number of occurrences of each character that follows the k-gram.
Note how, for instance, the key 'ga' in the dictionary has the value {'g': 4, 'a': 1}. That is because, following 'ga' in the input text, the letter 'g' appears four times and the letter 'a' appears one time.
Write the following functions:
get_grams(text, k): Returns a dictionary of k-grams as described above, using the input string text and the given positive integer k. Do not form k-grams for the last k characters of the text.
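One possible sketch of get_grams follows. It assumes that "do not form k-grams for the last k characters" means a k-gram is recorded only when at least one character follows it, so the loop stops k characters before the end of the text. The variable names are illustrative, not part of the assignment.

```python
def get_grams(text, k):
    """Map each k-gram in text to a dict counting the characters that follow it."""
    grams = {}
    # Stop k characters before the end so that every recorded k-gram
    # has a character following it (assumed reading of the spec).
    for i in range(len(text) - k):
        gram = text[i:i + k]
        next_char = text[i + k]
        followers = grams.setdefault(gram, {})
        followers[next_char] = followers.get(next_char, 0) + 1
    return grams


if __name__ == "__main__":
    print(get_grams("gagag", 2))  # {'ga': {'g': 2}, 'ag': {'a': 1}}
```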
combine_grams(grams1, grams2): Takes two k-gram dictionaries and combines them, returning the new combined dictionary. All key-value pairs from both dictionaries should be added into the new dictionary. If a key exists in both dictionaries, then combine the values (if the two values are dictionaries, then combine the dictionaries as above; if the two values are integers, then add them together). The input dictionaries should not be modified.
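A minimal sketch of combine_grams, assuming that values sharing a key are always of the same kind (both dictionaries or both integers), as the description implies. Nested dictionaries are rebuilt by a recursive call, so the result shares no inner dictionaries with the inputs.

```python
def combine_grams(grams1, grams2):
    """Return a new dict merging grams1 and grams2 without modifying either."""
    combined = {}
    for source in (grams1, grams2):
        for key, value in source.items():
            if key not in combined:
                # Copy nested dicts so the result never aliases an input.
                combined[key] = combine_grams(value, {}) if isinstance(value, dict) else value
            elif isinstance(value, dict):
                # Both values are dicts: merge them the same way.
                combined[key] = combine_grams(combined[key], value)
            else:
                # Both values are integer counts: add them.
                combined[key] += value
    return combined


if __name__ == "__main__":
    a = {'a': {'b': 3}}
    b = {'a': {'b': 1, 'c': 2}, 'd': {'e': 1}}
    print(combine_grams(a, b))  # {'a': {'b': 4, 'c': 2}, 'd': {'e': 1}}
```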
get_grams_from_files(filenames, k): This function will take a list of strings filenames and a positive integer k. It will read in the files at the given filenames, and create a k-grams dictionary for each file. It will combine all such k-grams dictionaries and return the combined dictionary. Note: When opening the files, use the keyword argument encoding='utf-8', which tells Python to interpret the text file using a particular character encoding. Otherwise, Python may use a different encoding depending on your OS which can result in errors.
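A sketch of get_grams_from_files under the same assumptions. So that the block runs on its own, it repeats compact versions of get_grams and combine_grams; in a full solution you would reuse your own definitions instead.

```python
def get_grams(text, k):
    """Compact k-gram counter (stops k characters before the end of text)."""
    grams = {}
    for i in range(len(text) - k):
        followers = grams.setdefault(text[i:i + k], {})
        followers[text[i + k]] = followers.get(text[i + k], 0) + 1
    return grams


def combine_grams(grams1, grams2):
    """Compact merge of two k-gram dicts; inputs are left unmodified."""
    combined = {}
    for source in (grams1, grams2):
        for key, value in source.items():
            if isinstance(value, dict):
                combined[key] = combine_grams(combined.get(key, {}), value)
            else:
                combined[key] = combined.get(key, 0) + value
    return combined


def get_grams_from_files(filenames, k):
    """Build one k-gram dict per file and fold them into a single dict."""
    combined = {}
    for filename in filenames:
        # encoding='utf-8' avoids depending on the OS default encoding.
        with open(filename, encoding='utf-8') as f:
            combined = combine_grams(combined, get_grams(f.read(), k))
    return combined
```

Each file is processed separately, so no k-gram spans the boundary between two files.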
generate_next_char(grams, cur_gram): This function returns the prediction of the next character to follow the k-gram cur_gram, given the dictionary of k-grams grams. To do so, it must determine the probability of each character that can follow cur_gram. Probability is beyond the scope of this course, so you may use a function from the random module, random.choices(), to help here. random.choices(population, weights) takes two lists as arguments. The first list holds the items to be chosen from at random (in this case, the characters that can possibly follow the given k-gram). The second list holds the weight for each item in the first list. That is, the element (character) at population[i] will be chosen with probability weights[i]. All you have to do is create these two lists, then call random.choices() to obtain the predicted character; note that it returns a list, so take its single element. The weight for a character can be found by taking the number of occurrences of that character following the k-gram and dividing by the total number of occurrences of any character following the k-gram. For example, if k = 1, the current k-gram is 'a', and the k-gram dictionary is {'a': {'b': 3, 'c': 9}, 'c': {'d': 4}}, then either 'b' or 'c' could follow, with 'b' having weight 3/12 and 'c' having weight 9/12. (So the function would be much more likely to return 'c' than 'b'.) If cur_gram is not present in the grams dictionary, or if it has a different number of characters than the k-grams in the dictionary, then raise an AssertionError with an appropriate error message in each case.
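A sketch of generate_next_char along these lines. It assumes every key in grams has the same length, so it infers k from an arbitrary key, and it checks the length before the membership so that each failure gets its own message. The message wording is illustrative.

```python
import random


def generate_next_char(grams, cur_gram):
    """Pick the next character after cur_gram, weighted by observed counts."""
    # Infer k from any key (assumes all keys have the same length).
    k = len(next(iter(grams))) if grams else None
    assert k is None or len(cur_gram) == k, (
        f"cur_gram {cur_gram!r} has length {len(cur_gram)}, expected {k}"
    )
    assert cur_gram in grams, f"cur_gram {cur_gram!r} is not in the grams dictionary"

    followers = grams[cur_gram]
    population = list(followers)
    total = sum(followers.values())
    weights = [followers[char] / total for char in population]
    # random.choices returns a list (of length 1 by default).
    return random.choices(population, weights)[0]


if __name__ == "__main__":
    grams = {'a': {'b': 3, 'c': 9}, 'c': {'d': 4}}
    print(generate_next_char(grams, 'a'))  # 'b' or 'c'; 'c' is three times as likely
```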
generate_text(grams, start_gram, k, n): This function generates a piece of text of length n (a positive integer), given the k-grams dictionary grams, the positive integer k, and the starting k-gram start_gram. That is, starting with start_gram, continue generating characters until you have a text of length n. Then, cut off the text at the last whitespace character (space or newline), and return the text. (Cutting off the text may result in the text being a few characters shorter than n, but that's OK.) Note: If the start_gram string is longer than k characters, then use only the first k characters (discard the rest).
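A sketch of generate_text under a few stated assumptions: the text is cut just before the last whitespace character, so that whitespace is dropped too; if the text contains no whitespace at all, it is returned uncut; and a compact, unvalidated stand-in for generate_next_char is included only so the block runs on its own. The description does not say what to do when a generated k-gram is missing from grams; here that simply raises a KeyError.

```python
import random


def generate_next_char(grams, cur_gram):
    """Compact stand-in: weighted choice among followers, no validation."""
    followers = grams[cur_gram]
    population = list(followers)
    # Raw counts work as relative weights for random.choices.
    weights = [followers[char] for char in population]
    return random.choices(population, weights)[0]


def generate_text(grams, start_gram, k, n):
    """Generate up to n characters, then cut at the last space or newline."""
    text = start_gram[:k]  # use only the first k characters
    while len(text) < n:
        # The current k-gram is always the last k characters generated.
        text += generate_next_char(grams, text[-k:])
    cut = max(text.rfind(' '), text.rfind('\n'))
    # Assumption: keep the text as-is when it has no whitespace.
    return text[:cut] if cut != -1 else text


if __name__ == "__main__":
    # Each k-gram here has exactly one follower, so the output is deterministic.
    grams = {'ab': {' ': 1}, 'b ': {'a': 1}, ' a': {'b': 1}}
    print(generate_text(grams, 'abX', 2, 8))  # ab ab
```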