North Lake College Spell checker Program

Content Type

User Generated

User

Evx11

Subject

Computer Science

School

North Lake College

Unformatted Attachment Preview

Program requirements (220 points)

Unless otherwise specified, you should always assume that every function you are asked to implement has to work for all sizes and variations of the data.

This assignment has 2 main parts:

• Part 1: Allow the user to repeatedly enter pairs of words to compute the edit distance for. The code that reads the words and calls the edit distance function is provided in spell_checker.c. It will work once you implement the edit distance function. The edit distance function will build and print the table for the edit distance and also the distance itself. It should be CASE SENSITIVE: EditDistance("dog","DOG") = 3.
  (I consider it important to be able to print the data from your program in a formatted, easily readable way that allows you to check and verify that the program does what you want it to do. Printing the table for the edit distance along with the indices and corresponding letters from the strings allows you to check that the program generates the same table as we did in class, or as any other test case you work out on paper.)
• Part 2: Simulate a simple version of a spell checker program.
• You will write all your code in the file spell.c (provided). A client file, spell_checker.c, is also provided. It implements the high level behavior of the program and calls specific functions that you must implement in spell.c.

Details:

1. Implement the Edit Distance between 2 strings as shown in class. It must be the BOTTOM-UP DYNAMIC PROGRAMMING method (i.e. the one that has NO recursion). Simply write the loop(s) to populate the 2D table.
   - 20 points

       Dist(0,0) = 0
       Dist(0,j) = j
       Dist(i,0) = i
       Dist(i,j) = Dist(i-1,j-1)                                        if X[i-1] = Y[j-1]
       Dist(i,j) = 1 + min { Dist(i-1,j), Dist(i,j-1), Dist(i-1,j-1) }  if X[i-1] ≠ Y[j-1]

   It should be CASE SENSITIVE: EditDistance("dog","DOG") = 3.

2. Print the distance matrix as a formatted table. - 25 points

3. Allow the user to repeatedly compute the edit distance between pairs of words given as input. It stops when the user enters -1 -1. (Implementation already provided.)

4. Spell check.
   1. If the user selects verbose mode, print the dictionary words before and after sorting and also the words touched on during binary search. It should match the sample output perfectly: index number and word.
   2. Load a dictionary file. Sort the data in the file in alphabetical order. (If in verbose mode, print the dictionary before and after sorting.) You can assume all the words in the dictionary are in lowercase.
      I STRONGLY encourage you to use the qsort function from the C library. E.g. read http://www.cplusplus.com/reference/cstdlib/qsort/. Note that the compar function that it uses (and that you need to write) takes POINTERS to whatever type of data is in your array. That means that if your array already holds pointers, that function takes pointers to pointers. It may take a bit of careful trial and error, but it is worth the price to learn to use the qsort function. The compar argument is a function pointer. If you are not familiar with function pointers, you can read about them here: https://www.geeksforgeeks.org/function-pointer-in-c.
      Based on how you store the dictionary words (array of pointers or 2D array of chars), it may be a bit tricky to set up the compar function, or to give the correct element size to the qsort function. If you write a good function to be passed for the compar argument, qsort will work and you do not need to implement a sorting function.
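The pointer-to-pointer remark about compar can be sketched as follows. This is a minimal illustration, assuming the dictionary is stored as an array of char * pointers; the names compare_words and sort_dictionary are hypothetical and are not part of the provided files.

```c
#include <stdlib.h>
#include <string.h>

/* Comparator for qsort when the dictionary is an array of char * pointers.
   qsort hands the comparator pointers to the array ELEMENTS, so each
   argument is really a pointer to a char *, i.e. a pointer to a pointer. */
static int compare_words(const void *a, const void *b)
{
    const char *left  = *(const char * const *)a;
    const char *right = *(const char * const *)b;
    return strcmp(left, right);
}

/* Sorts `count` words in place in strcmp (alphabetical for lowercase) order.
   The element size is the size of one pointer, not of one word. */
void sort_dictionary(char **words, size_t count)
{
    qsort(words, count, sizeof words[0], compare_words);
}
```

If the words were stored in a 2D array of chars instead, each element would be a whole row: the element size passed to qsort would be the row width, and the comparator would cast its arguments directly to const char * without the extra dereference.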
      This is a great opportunity to learn how to use a library function, as opposed to writing everything ourselves. Function pointers are also so cool...
   3. Open a file to write the corrected text to. That is the output file. It will have the same name as the text file, but with the prefix "out_" added to it. E.g. if processing the text file "text1.txt", a new file named "out_text1.txt" will be created and will hold the spell-checked version of the paragraph from "text1.txt".
   4. Open the text file (e.g. text1.txt) and process it as follows:
      1. Any separator is just copied to the output file. List of symbols to be recognized as separators: space (one white space), comma, dot, exclamation mark, question mark ( ,.!?). You do not need to worry about other separators. You can assume that the file has only English letters and the supported separators. Make sure all separators are copied, even if there are several consecutive separators. You can assume that there is NO new [...]
      2. Extract a word. You can assume that the file starts with a word (and never with a separator). The last symbol in the file may be a separator or a letter. Make sure you extract the last word correctly even when it does not have a separator after it.
      3. For each extracted word do:
         - Print it to the screen. Put two vertical bars around it to be able to tell whether you read any extra space with the word. E.g. print |Can|, not just Can.
         - Search for it in the sorted dictionary using binary search. Keep the count of how many words were touched on during binary search (or how many times the binary search loop executed) and print it. If in verbose mode, print the dictionary words that were used during binary search.
         - If the word is found, it means the spelling is correct. Write it to the output file.
         - If the word is not found, identify the most similar words in the dictionary and give the user these options for the correction to use for this word in the output file:
              -1 - the user will type the correct spelling
               0 - leave the word as is (do not apply any correction)
               a list of the most similar words from the dictionary; the user will select a word from this list
           Print the corrected word to the output file.
   5. In order to find the most similar words in the dictionary file do:
      1. Compute the edit distance between the misspelled word and all the dictionary words. You can store all these distances. Here you can make an improvement and not compute the distance to all words, but you do NOT have to. It is fine to compute the distance to all words in the dictionary even though some may clearly be too different (because they are too big or too small).
      2. Find the smallest distance.
      3. Print all the dictionary words that have that edit distance. Print an index as well to allow the user to easily select the correct word.
   6. Get the user's choice. If the choice is (-1), the program will also allow the user to type in a word. See sample runs.
   7. Calculate the worst case time complexity to find the most similar words in the dictionary for the misspelled words. Assume there are T misspelled words in the text file, D words in the dictionary, and that each word can be at most MAX_LEN chars. What is the time complexity to compute the edit distance from each test word to each dictionary word? Since the word length can vary, you should assume the worst case, that is, assume that every test word and every dictionary word has size MAX_LEN. What is the O for this worst case scenario? Give your answer as a function of T, D and MAX_LEN.
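Steps 5.1 to 5.3 can be sketched roughly as below. This is only an illustration under stated assumptions: MAX_LEN is taken to be 100, the dictionary is an array of char * pointers, the function names edit_distance and closest_words are hypothetical, the required table printing is left out, and the minimum is tracked in a single pass instead of storing every distance first.

```c
#include <stddef.h>
#include <string.h>

#define MAX_LEN 100  /* assumed upper bound on word length */

static int min3(int a, int b, int c)
{
    int m = a < b ? a : b;
    return m < c ? m : c;
}

/* Bottom-up, case-sensitive edit distance (no recursion).
   Both words are assumed to have at most MAX_LEN characters. */
int edit_distance(const char *x, const char *y)
{
    static int dist[MAX_LEN + 1][MAX_LEN + 1];  /* static: kept off the stack */
    size_t n = strlen(x), m = strlen(y);

    for (size_t i = 0; i <= n; i++) dist[i][0] = (int)i;  /* Dist(i,0) = i */
    for (size_t j = 0; j <= m; j++) dist[0][j] = (int)j;  /* Dist(0,j) = j */

    for (size_t i = 1; i <= n; i++) {
        for (size_t j = 1; j <= m; j++) {
            if (x[i - 1] == y[j - 1])
                dist[i][j] = dist[i - 1][j - 1];
            else
                dist[i][j] = 1 + min3(dist[i - 1][j], dist[i][j - 1],
                                      dist[i - 1][j - 1]);
        }
    }
    return dist[n][m];
}

/* Writes to `out` the indices of the dictionary words closest to `word`
   and returns how many there are. *best receives the smallest distance
   (or -1 if the dictionary is empty). `out` needs room for `count` entries. */
size_t closest_words(const char *word, char **dict, size_t count,
                     size_t *out, int *best)
{
    size_t found = 0;
    *best = -1;
    for (size_t k = 0; k < count; k++) {
        int d = edit_distance(word, dict[k]);
        if (*best < 0 || d < *best) {  /* new minimum: restart the list */
            *best = d;
            found = 0;
        }
        if (d == *best)
            out[found++] = k;
    }
    return found;
}
```

A caller would then print each returned index with its dictionary word so the user can pick one. Storing all D distances first and scanning them twice, as step 5.1 allows, gives the same list.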
      For example, if T = 10 extracted words, D = 222 dictionary words, and MAX_LEN = 100, you would assume that each of those (10+222) words has 100 characters. Write the time complexity at the top of your file as a comment. (You do not need to worry about the time to read the words from files. Assume they are already in memory for this calculation.)
   8. Calculate the time complexity to search for a word in the dictionary (to see if it is correctly spelled) using binary search. Assume the case where the word is not found and the words have length MAX_LEN.
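The binary search with a count of touched words can be sketched as below. This is a minimal version, assuming the sorted dictionary is an array of char * pointers; the name find_word and its signature are hypothetical and not taken from the provided files.

```c
#include <stddef.h>
#include <string.h>

/* Binary search over a sorted array of words.
   Returns the index of `word`, or -1 if it is not in the dictionary.
   *probes receives the number of dictionary words compared against. */
long find_word(char **dict, size_t count, const char *word, size_t *probes)
{
    size_t lo = 0, hi = count;  /* search the half-open range [lo, hi) */
    *probes = 0;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int cmp = strcmp(word, dict[mid]);  /* one word touched per iteration */
        (*probes)++;
        if (cmp == 0)
            return (long)mid;
        if (cmp < 0)
            hi = mid;       /* continue in the left half */
        else
            lo = mid + 1;   /* continue in the right half */
    }
    return -1;
}
```

Which words are touched depends on how the midpoint and the bounds are chosen, so a version that must match the sample output exactly may need the same convention as the one shown in class. In verbose mode, dict[mid] would be printed inside the loop.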

Tags: spell checker pair of words edit distance spell c bottom up dynamic programming