CSC280 – Fall, 2018
Homework 7: Markov gibberish
Due: Sunday, December 2nd
Total points: 50pts
Objective
Your goal for this project is to take a block of text, analyze it, and produce random sentences in the style of
the original text. For example, your program, given Wizard of Oz, might produce:
how quite lion bread them no mighty us said then for you or would and boots her wizard arts and injury
with they the he nose said them
How?
I generated the above text using a two-step process.
My program first analyzed the story and recorded word sequences. For example, it observed that the word
so was followed by the following words: you, you, do, get, you, goodbye. I recorded this by making a
Word Dictionary. Each unique word in the book has a dictionary entry. Under each entry is a list of all the
words in the book that were found to follow that word.
Next my program randomly chose a “starter word” from the dictionary. It used that word to generate
gibberish as follows:
- it looked up the entry in the dictionary under the starter word and got the list of words that followed it.
wordDictionary["so"] ==> ["you", "you", "do", "get", "you", "goodbye"]
- it chose a random word from the list. Notice that words occurring more than once in the list are more
likely to be chosen.
["you", "you", "do", "get", "you", "goodbye"] ==> "you"
- it added that random word to the output string
"this was sadly one for his so you"
- it repeated these steps, now using this random word to index the dictionary.
wordDictionary["you"] ==> ...
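
The weighting mentioned in the second step falls out of the list representation itself: repeated followers are stored repeatedly, so a uniform pick over the list is a frequency-weighted pick over the distinct words. Here is a minimal sketch using the followers of "so" from the example above. The variable names followers, counts, and probabilities are illustrative and are not part of the assignment.

```python
import random
from collections import Counter

# Followers of "so", copied from the example above.
followers = ["you", "you", "do", "get", "you", "goodbye"]

# A word's chance of being picked is its count divided by the list length.
counts = Counter(followers)
probabilities = {word: count / len(followers) for word, count in counts.items()}
print(probabilities["you"])  # 0.5, because "you" fills 3 of the 6 slots

# random.choice picks uniformly over list positions, so duplicates are favored.
next_word = random.choice(followers)
print(next_word)
```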
Steps
I recommend you complete the project in the following steps:
1. Go to Project Gutenberg and download some books. Alice in Wonderland, Wizard of Oz, and the Bible
tend to work well. Write a Python program to read a book and store it as a string.
For example, you can analyze Obama’s speech:
http://central.gutenberg.org/articles/the_speech:_race_and_barack_obama's_%22a_more_perfect_union%22
2. Write a function generateDictionary. This function will take a string containing the book as a parameter
and will return the word dictionary. You can do this as follows:
- make an empty dictionary
- split the string into words
- for each word, check if it is in the dictionary
  - if it is not, make a dictionary entry for the word: an empty list
  - add the next word in the story to this list
3. Make a function pickStartingWord. It should take the dictionary as a parameter, choose a word at
random from it, and return that word.
4. Make a function randomWalk. This function will take the dictionary and a single word as parameters.
It will look the word up in the dictionary, pick a next word randomly from the list of next words, and return
that word.
For example, given a word from the dictionary, you randomly pick one word from that entry's value, which
is the list of next words.
5. In your main function, call pickStartingWord to get a first word. Then call randomWalk repeatedly to
generate your gibberish text. Every time you call randomWalk again, pass in the word it just returned, so
that each call looks up a different entry in the dictionary.
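
Steps 2 through 5 can be sketched as follows. This is one possible reading of the steps, not a reference solution. The function names generateDictionary, pickStartingWord, and randomWalk come from the steps above. The helper generateGibberish, the short sample string standing in for a book, and the choice to restart from a new starter word when a word has no recorded followers are all assumptions made for illustration.

```python
import random

def generateDictionary(text):
    """Map each word to the list of words that follow it in the text."""
    wordDictionary = {}
    words = text.split()
    # Pair every word with its successor; the last word has no successor.
    for current, following in zip(words, words[1:]):
        if current not in wordDictionary:
            wordDictionary[current] = []
        wordDictionary[current].append(following)
    return wordDictionary

def pickStartingWord(wordDictionary):
    """Return a random key from the dictionary."""
    return random.choice(list(wordDictionary))

def randomWalk(wordDictionary, word):
    """Return a random follower of word, or None if it has none (assumed convention)."""
    followers = wordDictionary.get(word)
    if not followers:
        return None
    return random.choice(followers)

def generateGibberish(wordDictionary, numWords):  # hypothetical helper name
    """Chain randomWalk calls, feeding each result back in as the next key."""
    word = pickStartingWord(wordDictionary)
    output = [word]
    while len(output) < numWords:
        word = randomWalk(wordDictionary, word)
        if word is None:
            # Dead end: restart from a fresh starter word (an assumption).
            word = pickStartingWord(wordDictionary)
        output.append(word)
    return " ".join(output)

sample = "so you do so you get so goodbye"  # stand-in for a real book
wordDictionary = generateDictionary(sample)
print(wordDictionary["so"])  # ['you', 'you', 'goodbye']
print(generateGibberish(wordDictionary, 10))
```

Note that "goodbye" never gets an entry here because nothing follows it in the sample, which is why the walk needs some rule for dead ends.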
Add preprocessing
Try removing punctuation and capitalization before generating your dictionary. Does this improve your
results?
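
One way to do this preprocessing is with str.lower and str.translate from the standard library. The sketch below assumes that "punctuation" means the ASCII characters in string.punctuation; the function name preprocess is hypothetical.

```python
import string

def preprocess(text):  # hypothetical helper name
    """Lowercase the text and delete ASCII punctuation characters."""
    removePunctuation = str.maketrans("", "", string.punctuation)
    return text.lower().translate(removePunctuation)

print(preprocess('"Oh, dear!" said Dorothy.'))  # oh dear said dorothy
```

Two limitations are worth keeping in mind: apostrophes are deleted too, so "don't" becomes "dont", and typographic characters such as curly quotes are not in string.punctuation and would need to be handled separately.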
Make a second-order generator
You should now have completed a first-order gibberish generator. It is called this because it uses only a
single word to index the dictionary. If you use a two-word sequence instead of a single word, your output
text will capture more of the sentence structure of the original text. Try repeating the steps above, but now
use two-word sequences instead of single words.
Hint: each key of the dictionary should be a two-word phrase.
Your results should look like this:
the scarecrow began to tremble with fear for she was so green and then another groan reached their ears
and looked everywhere to see if she ever took off her old friend again and solder him together where he
was always careful not to kill the wicked witch
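
A minimal sketch of the second-order idea follows, assuming each key is a two-word phrase stored as a single space-separated string. The names generatePairDictionary and secondOrderGibberish, the sample sentence, and the restart-on-dead-end rule are illustrative assumptions, not part of the assignment.

```python
import random

def generatePairDictionary(text):  # hypothetical name
    """Map each two-word phrase to the list of words that follow it."""
    words = text.split()
    pairDictionary = {}
    for first, second, following in zip(words, words[1:], words[2:]):
        key = first + " " + second
        pairDictionary.setdefault(key, []).append(following)
    return pairDictionary

def secondOrderGibberish(pairDictionary, numWords):  # hypothetical name
    """Walk the dictionary, sliding the two-word key forward one word at a time."""
    key = random.choice(list(pairDictionary))
    output = key.split()
    while len(output) < numWords:
        followers = pairDictionary.get(key)
        if not followers:
            # Dead end: restart from a random phrase (an assumption).
            key = random.choice(list(pairDictionary))
            output.extend(key.split())
            continue
        nextWord = random.choice(followers)
        output.append(nextWord)
        # New key: the previous second word plus the word just chosen.
        key = output[-2] + " " + nextWord
    return " ".join(output[:numWords])

sample = "the cat sat on the cat mat"  # stand-in for a real book
pairDictionary = generatePairDictionary(sample)
print(pairDictionary["the cat"])  # ['sat', 'mat']
print(secondOrderGibberish(pairDictionary, 12))
```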
Make an n-order generator
Now try rewriting your program so that it takes an additional parameter n, which chooses how many words
will index the dictionary.
For this purpose you might want to use if __name__ == "__main__": and run your code from the command line.
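
The n-order version and the command-line entry point might be sketched as below. The signature gibberish(filename, numwords, n) comes from the Requirements section. Everything else is an assumption made for illustration: using tuples of n words as keys, returning the text rather than printing it inside the function, restarting at dead ends, reading the file as UTF-8, and the order of the command-line arguments.

```python
import random
import sys

def generateDictionary(text, n):
    """Map every n-word tuple to the list of words that follow it."""
    words = text.split()
    wordDictionary = {}
    for i in range(len(words) - n):
        key = tuple(words[i:i + n])
        wordDictionary.setdefault(key, []).append(words[i + n])
    return wordDictionary

def gibberish(filename, numwords, n):
    """Read the book and return numwords words of order-n gibberish (n >= 1)."""
    with open(filename, encoding="utf-8") as bookFile:
        text = bookFile.read()
    wordDictionary = generateDictionary(text, n)
    if not wordDictionary:
        return ""  # the text has fewer than n + 1 words
    key = random.choice(list(wordDictionary))
    output = list(key)
    while len(output) < numwords:
        followers = wordDictionary.get(key)
        if not followers:
            # Dead end: restart from a random key (an assumption).
            key = random.choice(list(wordDictionary))
            output.extend(key)
            continue
        output.append(random.choice(followers))
        key = tuple(output[-n:])  # the last n words form the next key
    return " ".join(output[:numwords])

if __name__ == "__main__":
    if len(sys.argv) == 4:
        print(gibberish(sys.argv[1], int(sys.argv[2]), int(sys.argv[3])))
    else:
        print("usage: python FirstName_LastName.py <filename> <numwords> <n>")
```

With n = 1 this behaves like the first-order generator, and with n = 2 like the second-order one, so a single dictionary builder can serve all three parts.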
Grading
Grading for this project will be as follows:
10%  Did you make an effort and submit something?
30%  Does your project compile?
10%  Did you comment your code?
15%  Do you generate the first-order dictionary correctly (step 2)?
15%  Do you generate the first-order gibberish correctly (step 5)?
10%  Do you generate second-order gibberish correctly?
10%  Do you generate n-order gibberish correctly?
Requirements
• Your program must be written in Python.
• Your code must include comments.
• Your submitted code must compile without errors to receive any partial credit.
• The main function in your program should be called gibberish(filename, numwords, n) and should take
as parameters the name of the text file (a string), the number of words in your output (an int), and the
order (an int).
• You should submit your .py file on Blackboard. Please name your file FirstName_LastName.py
• You must make sure your code runs in a command-line environment.
• You must include the book as a .txt file in your submitted folder, so that I can run your code automatically
without having to download anything extra. If you mess this up, you will lose points.