Programming Languages and Paradigms

User Generated

zhmhob

Programming

Description

Please follow the requirements in the attached file. Check the file and its requirements before accepting it.

Unformatted Attachment Preview

COMP3109 Assignment 2: Parsing LL(1) Grammars (10 marks)

This second assignment is an individual assignment, and is due Friday, 6pm, in Week 8. Please submit this assignment via eLearning. Your assignment will not be assessed unless the following criteria are met:

(1) Each task has at least 15 test cases.
(2) The documentation is written in the form of comments. Please provide plenty of them.

In this exercise, you will implement a table-driven predictive parser¹ for LL(1) grammars. The parser constructs a table in a pre-processing step using FIRST and FOLLOW sets. After the table has been constructed, the parser reads the input terminal by terminal and uses a stack and the predictive parsing table to check whether the input is well-formed.

This assignment is split into four tasks. The first task is to compute the “first and follow” sets of a given grammar, which are necessary for parsing sentences. The second task is to compute a predictive parsing table based on the FIRST and FOLLOW sets. The third task is to implement a table-driven predictive parser which checks whether an input sentence is a member of a language. The fourth task rewrites an EBNF grammar into a BNF grammar.

We can define a context-free grammar (CFG) as a 4-tuple ⟨T, V, P, S⟩, where
• T is a finite set of terminals,
• V is a finite set of non-terminals,
• P is a set of production rules of the form V → (T ∪ V)*, and
• S ∈ V is the start symbol for the grammar.

Here is an example of a CFG in BNF:

    S ::= A B B A
    A ::= a
    A ::= epsilon
    B ::= b

where epsilon denotes the empty string ε. For the sake of simplicity, we further assume that lowercase letters are terminals and capital letters are non-terminals. The alternatives of a non-terminal are spelled out in separate production rules rather than being split by a separator symbol.
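The 4-tuple and the BNF example above can be made concrete with a small loader. This is a minimal sketch, not the sample implementation: the name `load_grammar` is hypothetical, and it assumes that a symbol is a non-terminal exactly when it appears on the left-hand side of some rule and that the start symbol is the left-hand side of the first rule.

```python
EPSILON = "epsilon"  # literal spelling of the empty string in grammar files


def load_grammar(text):
    """Parse BNF text into (terminals, non_terminals, productions, start)."""
    productions = []
    for line in text.splitlines():
        lhs, rhs = line.split(" ::= ")
        productions.append((lhs, rhs.split(" ")))
    # Assumption: anything that is ever defined by a rule is a non-terminal.
    non_terminals = {lhs for lhs, _ in productions}
    terminals = {sym for _, rhs in productions for sym in rhs
                 if sym not in non_terminals and sym != EPSILON}
    # Assumption: the first rule defines the start symbol.
    start = productions[0][0]
    return terminals, non_terminals, productions, start


example = "S ::= A B B A\nA ::= a\nA ::= epsilon\nB ::= b"
T, V, P, S = load_grammar(example)
```

Keeping the productions as an ordered list preserves the rule numbering, which is convenient when a later step has to refer to rules by index.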
Note that the set of terminals T is {a, b}, the set of non-terminals V is {S, A, B}, and the start symbol is S for the above grammar.

¹ Some of you may be familiar with parsing CFGs via a method called recursive descent; the table-based method is different.

Sentences of the language which is generated by the grammar above are:

    bb
    abb
    bba
    abba

Your programs for accomplishing the following tasks must run in the ED workspaces, and should be implemented in Python. The symbol for derivation should be ::=. Note that your program(s) will have to work out the start symbol for the grammar; the LHS of the first grammar rule is the start symbol for the grammar. Note also that the case of each symbol does not define whether or not the symbol is a terminal or non-terminal; your program will have to work out whether each symbol is a terminal or not.

You should have one (or more) common files which do all of the “first and follow” logic, and your three front end programs (one for each task) should be calling functions from this common library. For example, our sample implementation consists of four files:

    bash$ wc -l *.py
    120 common.py
    20 question1.py
    20 question2.py
    19 question3.py
    179 total

Task 1 (3 marks)

For the construction of parsers you need to compute two functions called FIRST and FOLLOW associated with a grammar G. During top-down parsing, the two functions determine which production is used for expanding the left-most non-terminal in the sentential form.

FIRST sets

We define FIRST(α) as the set of terminal symbols that can appear as the first symbol of a string derived from the sentential form α. If ε is in the language of α, then ε is also in FIRST(α). E.g., FIRST(B) is {b} for our example grammar because B → b, and FIRST(A) is the set {ε, a} because A derives either ε or a.
We define FOLLOW(A) for a non-terminal A to be the set of terminals a that can appear immediately to the right of A, i.e., the set of terminal symbols a such that there exists a derivation S →* αAaβ for some α and β. For parsing we introduce a special symbol called the input end marker, denoted by $. This input end marker is used as a stop symbol for the input and needs to be considered in parsing since we cannot really check for ε as an input for the language.

First, we compute FIRST(X) for all grammar symbols X ∈ V ∪ T by applying the following rules until no more terminals or ε can be added to any FIRST set:

1. If X ∈ T, then FIRST(X) = {X}.
2. If X ∈ V and X → Y1 … Yk ∈ P for some k ≥ 1, then place a in FIRST(X) if a ∈ FIRST(Y1), or if for some i, a ∈ FIRST(Yi) and ε is in all of FIRST(Y1), …, FIRST(Yi−1). If ε is in all of FIRST(Y1), …, FIRST(Yk), place ε in FIRST(X).
3. If X ∈ V and X → ε, then place ε in FIRST(X).

Example 1. For our grammar the FIRST sets are given below:

    Symbol   FIRST
    a        {a}
    b        {b}
    A        {a, ε}
    B        {b}
    S        {a, b}

We extend the definition of FIRST to arbitrary sentential forms, α = Y1 … Yk for some k > 1, as follows:

    FIRST(Y1 … Yk) = FIRST(Y1),                              if ε ∉ FIRST(Y1),
    FIRST(Y1 … Yk) = (FIRST(Y1) \ {ε}) ∪ FIRST(Y2 … Yk),     otherwise,

and FIRST(ε) is {ε}. For example, FIRST(ABBA) is {a, b}.

FOLLOW sets

Second, we compute FOLLOW(B) for all non-terminals by applying the following rules until nothing can be added to any FOLLOW set:

1. Place $ in FOLLOW(S) for start symbol S.
2. If there is a production A → αBβ, then all symbols in FIRST(β) except ε are in FOLLOW(B).
3. If there is a production A → αB, or ε is in FIRST(β) of a production A → αBβ, then all symbols in FOLLOW(A) are in FOLLOW(B).

Example 2.
For our grammar the FOLLOW sets are given below:

    Non-terminal   FOLLOW
    S              {$}
    A              {b, $}
    B              {a, b, $}

Your task is to write a program which implements the “first and follow” algorithm. Your program should take a filename as a command line argument, which will contain a grammar definition in the same BNF format as our example grammars. In this file there will be one rule per line, with production alternatives split over multiple lines. There will be no blank lines in the input file. Each symbol in the production of a rule will be separated by a single space character. The literal string “epsilon” will be the symbol representing ε. All other symbols will consist of a single character only (strings of length 1).

Your program should determine the FIRST and FOLLOW sets for all terminals and non-terminals in the provided grammar file, and then output the values for all non-terminals in the format shown below.

    bash$ cat example.grammar
    S ::= A B B A
    A ::= a
    A ::= epsilon
    B ::= b
    bash$ ./question1.py example.grammar
    First:
    A -> a epsilon
    B -> b
    S -> a b
    Follow:
    A -> b $
    B -> a b $
    S -> $

Task 2 (2 marks)

Algorithm 1 collects the information from FIRST and FOLLOW into a predictive parsing table M[A, a], where A is a non-terminal and a is a terminal symbol or the input end marker $. Parsing with the predictive table rests on the idea that the production rule A → α is chosen if the next input symbol a is in FIRST(α). The only complication arises if ε can be derived from the sequence α. In this case, we choose A → α if the current symbol is in FOLLOW(A), or if the input end marker $ has been reached and $ is in FOLLOW(A). If there is no production at entry M[A, a], then we set M[A, a] to error.
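Since the parsing table is built entirely from FIRST and FOLLOW, it may help to see the fixed-point rules from Task 1 as code. The following is a minimal sketch, not the sample implementation: the function names are hypothetical, the grammar is hard-coded as a list of (left-hand side, right-hand side) pairs, and the set of non-terminals is assumed to be known already.

```python
EPSILON, END = "epsilon", "$"


def first_of_sequence(symbols, first):
    """FIRST of a sentential form Y1...Yk, given FIRST of single symbols."""
    result = set()
    for sym in symbols:
        result |= first[sym] - {EPSILON}
        if EPSILON not in first[sym]:
            return result
    result.add(EPSILON)  # every symbol (or the empty sequence) can vanish
    return result


def compute_first(productions, non_terminals):
    """Apply the FIRST rules until no set grows any more."""
    symbols = {s for _, rhs in productions for s in rhs} | non_terminals
    first = {s: set() if s in non_terminals else {s} for s in symbols}
    first[EPSILON] = {EPSILON}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in productions:
            new = first_of_sequence(rhs, first)
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def compute_follow(productions, non_terminals, start, first):
    """Apply the FOLLOW rules until no set grows any more."""
    follow = {nt: set() for nt in non_terminals}
    follow[start].add(END)                      # rule 1
    changed = True
    while changed:
        changed = False
        for lhs, rhs in productions:
            for i, sym in enumerate(rhs):
                if sym not in non_terminals:
                    continue
                tail = first_of_sequence(rhs[i + 1:], first)
                new = tail - {EPSILON}          # rule 2
                if EPSILON in tail:             # rule 3 (also covers an empty tail)
                    new |= follow[lhs]
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    return follow


P = [("S", ["A", "B", "B", "A"]), ("A", ["a"]), ("A", [EPSILON]), ("B", ["b"])]
V = {"S", "A", "B"}
first = compute_first(P, V)
follow = compute_follow(P, V, "S", first)
```

Treating `epsilon` as a pseudo-symbol whose FIRST set is {ε} lets the same `first_of_sequence` helper serve rule 3 of the FIRST computation and the tail check in the FOLLOW computation.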
Algorithm 1 Construction of the Parsing Table

    for all A → α ∈ P do
        for all a ∈ FIRST(α) do
            let M[A, a] := A → α
        end for
        if ε ∈ FIRST(α) then
            for all b ∈ FOLLOW(A) do
                let M[A, b] := A → α
            end for
            if $ ∈ FOLLOW(A) then
                let M[A, $] := A → α
            end if
        end if
    end for

Algorithm 1 can be applied to any LL(1) grammar, and for such a grammar every table entry is either a single production or signals an error. It can be shown that if a table entry is assigned more than once, the grammar is not LL(1).

Example 3. The predictive parsing table of our grammar is given below:

    M    a           b           $
    S    S → ABBA    S → ABBA    error
    A    A → a       A → ε       A → ε
    B    error       B → b       error

Your task is to write an algorithm that generates a predictive parsing table and reports an error if the grammar is not LL(1). Your program should produce as output one of two things. If the grammar is not LL(1), then it should output “Grammar is not LL(1)!”. Otherwise, your program should output a readable representation of the parsing table for the grammar. The output format should be lines of the format R[A, a] = n, where A is the non-terminal, a is the terminal, and n is the rule number (counting from zero). The order of these lines of output does not matter. An example usage of your program is as follows:

    bash$ cat isLL1.grammar
    S ::= A B B A
    A ::= a
    A ::= epsilon
    B ::= b
    bash$ ./question2.py isLL1.grammar
    R[A, a] = 1
    R[A, b] = 2
    R[A, $] = 2
    R[S, a] = 0
    R[S, b] = 0
    R[B, b] = 3
    bash$ cat notLL1.grammar
    S ::= A B B A
    A ::= a
    A ::= epsilon
    B ::= b
    B ::= epsilon
    bash$ ./question2.py notLL1.grammar
    Grammar is not LL(1)!

Task 3 (3 marks)

We construct a non-recursive predictive parser by utilising a stack that contains either non-terminals or terminals.
The contents of the stack represent a sequence of non-terminals and terminals α (read from the top of the stack to the bottom) such that wα is a derivable sentential form of the start symbol, i.e., S →* wα, where w ∈ T* is the input that has been matched so far. Initially, the stack is set to the value ⟨S, $⟩, where S is the start symbol and $ is the input end marker. This stack configuration denotes the state in which we have not yet consumed any input symbols from the input stream.

Algorithm 2 Table-driven Predictive Parser

    push S$
    let a be the first symbol in the input stream
    while stack is not empty do
        let X be the symbol on top of the stack
        if X is a non-terminal then
            if M[X, a] = A → α then
                pop
                push α
            else
                report syntax error
            end if
        else
            if X = a then
                pop
                if X ≠ $ then
                    let a be the next symbol in the input stream
                end if
            else
                report syntax error
            end if
        end if
    end while
    if a ≠ $ then
        report syntax error
    else
        report sentence is in the language
    end if

The parser considers the symbol X on top of the stack and the current input symbol a. If X is a non-terminal symbol, then X is replaced by the right-hand side of the entry M[X, a] of the predictive table. Note that if M[X, a] is an error entry, a syntax error is reported. Otherwise, X is a terminal symbol, and if X matches the current input symbol a, we pop X from the stack and advance to the next terminal in the input stream. If X does not match a, we report a syntax error. The parsing is successful if the stack is empty and we have consumed all symbols in the input stream.

The parsing of a table-driven predictive parser is summarised in Algorithm 2. In the algorithm we use a pop operation that pops one symbol from the stack. The push operation pushes a sequence of terminals and non-terminals onto the stack, whereby the right-most symbol of the sequence is pushed first. Note that the order matters; otherwise the right-hand sides of the productions would be processed in reverse.
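Algorithms 1 and 2 can be sketched together as follows. This is an illustrative sketch under stated assumptions, not the sample implementation: `build_table` and `parse` are hypothetical names, and the FIRST set of each right-hand side and the FOLLOW sets are copied from the worked examples for the example grammar rather than computed.

```python
EPSILON, END = "epsilon", "$"

# Example grammar, numbered from zero as in the R[A, a] = n output format.
RULES = [("S", ["A", "B", "B", "A"]), ("A", ["a"]), ("A", [EPSILON]), ("B", ["b"])]
NON_TERMINALS = {"S", "A", "B"}
# Assumption: FIRST of each right-hand side and FOLLOW of each non-terminal
# are hard-coded from the worked examples instead of being computed here.
FIRST_RHS = [{"a", "b"}, {"a"}, {EPSILON}, {"b"}]
FOLLOW = {"S": {END}, "A": {"b", END}, "B": {"a", "b", END}}


def build_table(rules, first_rhs, follow):
    """Algorithm 1: map (non-terminal, lookahead) to a rule number."""
    table = {}
    for n, (lhs, _) in enumerate(rules):
        lookaheads = first_rhs[n] - {EPSILON}
        if EPSILON in first_rhs[n]:
            lookaheads |= follow[lhs]   # FOLLOW already holds $ where relevant
        for a in lookaheads:
            if (lhs, a) in table:       # a second assignment is a conflict
                raise ValueError("Grammar is not LL(1)!")
            table[lhs, a] = n
    return table


def parse(tokens, rules, table, start, non_terminals):
    """Algorithm 2: return True if the token sequence is in the language."""
    stream = list(tokens) + [END]
    pos = 0
    stack = [END, start]                # the top of the stack is the list's end
    while stack:
        top, a = stack.pop(), stream[pos]
        if top in non_terminals:
            if (top, a) not in table:
                return False            # error entry
            rhs = rules[table[top, a]][1]
            # Push in reverse so the first symbol of the rule ends up on top.
            stack.extend(s for s in reversed(rhs) if s != EPSILON)
        elif top == a:
            if top != END:
                pos += 1                # consume the matched terminal
        else:
            return False
    return stream[pos] == END


TABLE = build_table(RULES, FIRST_RHS, FOLLOW)
results = [parse(s, RULES, TABLE, "S", NON_TERMINALS)
           for s in ["abba", "aba", "ab", "bb", "bba", "bbb", "chicken"]]
```

Because every symbol is a single character, each input string can be passed directly as a sequence of tokens. Raising an exception on the first conflicting table entry is one possible way to surface the “Grammar is not LL(1)!” case to a front-end program.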
Your task here is to write a predictive table-driven parser that reads in an LL(1) grammar and a sentence and either reports that the sentence is in the language or reports a syntax error. Your program should now take two command line arguments: the names of two files. The first file will contain the grammar, in the same format as in the previous question. The second file will contain a set of strings, one string per line. Your program should use the parse table constructed in the previous question to parse the strings. For each string in the second file, your program should output either “accept” or “reject” depending on whether or not the string is accepted by the grammar. If the grammar is not LL(1), then your program should only output “Grammar is not LL(1)!”.

    bash$ cat isLL1.input
    abba
    aba
    ab
    bb
    bba
    bbb
    chicken
    bash$ ./question3.py isLL1.grammar isLL1.input
    accept
    reject
    reject
    accept
    accept
    reject
    reject
    bash$ ./question3.py notLL1.grammar anything_here
    Grammar is not LL(1)!

Task 4 (2 marks)

This task is to transform a grammar given in EBNF into an equivalent BNF grammar. EBNF and BNF have two main differences (for the purpose of this question). Firstly, EBNF has curly brackets ({, }), which indicate zero or more repetitions of the enclosed content. Additionally, EBNF has square brackets ([, ]), which indicate zero or one occurrence of the enclosed content. For example, the following is a grammar in EBNF:

    S ::= A { B b } e
    A ::= a | epsilon
    B ::= [ c ] d

Its equivalent BNF form is:

    S ::= A T e
    T ::= T B b
    T ::= epsilon
    A ::= a
    A ::= epsilon
    B ::= C d
    C ::= c
    C ::= epsilon
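The rewriting shown in the example can be sketched as a small transformation that replaces each bracketed group with a fresh non-terminal. This is a minimal sketch under stated assumptions, not the sample implementation: the function names are hypothetical, tokens are assumed to be separated by single spaces, brackets are assumed to be balanced, and fresh non-terminals are taken from unused capital letters, so the generated names differ from the T and C used above. Repetition is expanded left-recursively, mirroring the example.

```python
import string


def split_alternatives(tokens):
    """Split a token list on '|' symbols that are not inside brackets."""
    alts, current, depth = [], [], 0
    for tok in tokens:
        if tok in ("{", "["):
            depth += 1
        elif tok in ("}", "]"):
            depth -= 1
        if tok == "|" and depth == 0:
            alts.append(current)
            current = []
        else:
            current.append(tok)
    alts.append(current)
    return alts


def ebnf_to_bnf(lines):
    """Return BNF rules as (lhs, [symbols]) pairs for the given EBNF lines."""
    parsed = [line.split() for line in lines]      # [lhs, '::=', tok, ...]
    used = {tok for toks in parsed for tok in toks}
    fresh = (c for c in string.ascii_uppercase if c not in used)
    bnf = []

    def rewrite(lhs, tokens, prefix=()):
        """Emit rules for lhs; return bracket groups still to be expanded."""
        helpers = []                               # (name, body, is_repetition)
        for alt in split_alternatives(tokens):
            body, i = list(prefix), 0
            while i < len(alt):
                tok = alt[i]
                if tok in ("{", "["):
                    depth, j = 1, i + 1
                    while depth:                   # find the matching bracket
                        depth += (alt[j] in ("{", "[")) - (alt[j] in ("}", "]"))
                        j += 1
                    name = next(fresh)
                    helpers.append((name, alt[i + 1:j - 1], tok == "{"))
                    body.append(name)
                    i = j
                else:
                    body.append(tok)
                    i += 1
            bnf.append((lhs, body))
        return helpers

    for toks in parsed:
        pending = rewrite(toks[0], toks[2:])
        while pending:
            name, inner, is_repeat = pending.pop(0)
            # Repetition:  N ::= N inner | epsilon.  Option:  N ::= inner | epsilon.
            pending = rewrite(name, inner, [name] if is_repeat else []) + pending
            bnf.append((name, ["epsilon"]))
    return bnf


rules = ebnf_to_bnf(["S ::= A { B b } e", "A ::= a | epsilon", "B ::= [ c ] d"])
```

Emitting each original rule before its helper rules keeps the start symbol on the left-hand side of the first rule, which matters because the earlier tasks identify the start symbol that way.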

Explanation & Answer

Hi, please find the completed answer attached. Please give me a positive rating if it is satisfactory. Thanks.

TASK 1: A PYTHON PROGRAM WHICH IMPLEMENTS THE “FIRST AND FOLLOW” ALGORITHM
#!/bin/python3

import sys
from collections import defaultdict

class GrammarParser:
    epsilon = 'EPSILON'

    def __init__(self, grammar):
        self.nt = set()
        self.productions = defaultdict(list)
        self.non_entry_nt = set()
        self.first = defaultdict(set)
        self.follow = defaultdict(set)
        self.predict = defaultdict(set)
        self.symbols = set()
        self.eps = defaultdict(lambda: False)
        self.nt_order = []
        grammar = grammar.replace("→", "->")
        grammar = grammar....

