World Conference on Futuristic Trends in Research and Innovation for Social Welfare (WCFTR)
Ambiguity of Context Free Grammar Using the CYK Algorithm
R.C. Dharmik
Associate Professor,
Dept. of IT, YCCE, Nagpur
{raj_dharmik@yahoo.com}
R. S. Bhanuse
Assistant Professor,
Dept. of IT, YCCE, Nagpur
{roshanbhanuse15@gmail.com}
A.D. Gaikwad
Assistant Professor,
Dept. of IT, YCCE, Nagpur
{amolgaikwad.ag@gmail.com}
Abstract
The syntax analysis phase of a compiler checks the syntactic structure of programming-language constructs using a Context Free Grammar. A string of a given language is parsed using either a top-down or a bottom-up parsing technique; if the parser for the Context Free Grammar parses the string successfully, the string is syntactically correct. The CYK algorithm is a membership algorithm: it decides whether or not a string is a member of the language generated by a Context Free Grammar. In this paper we use the CYK algorithm to find out whether a Context Free Grammar is ambiguous.
Keywords: Syntax analysis, Context Free Grammar, Parsing, CYK algorithm, Ambiguity
1. Introduction
A Context Free Grammar is represented by G = (V, T, P, S), where V is a finite set of variables (non-terminal symbols), T is a finite set of terminal symbols, P is a finite set of rules (productions) and S is the start symbol of the grammar. Context Free Grammars are used to recognize programming-language constructs with top-down and bottom-up parsing techniques. The CYK algorithm is used to test whether a given string belongs to a given language, i.e. whether the string is a member of that language. The membership of a string w in a CFL L can be decided by an efficient technique based on dynamic programming, also called the "table-filling algorithm" or "tabulation". This algorithm is known as the CYK (Cocke–Younger–Kasami) algorithm [1].
The algorithm works only if the grammar is in Chomsky Normal Form (CNF). It breaks one problem into a sequence of smaller ones: the table entry Xi,j, the set of variables that derive the substring wi…wj, is computed by comparing at most n pairs of previously computed sets [2]:
$$X_{i,j}:\ (X_{i,i}, X_{i+1,j}),\ (X_{i,i+1}, X_{i+2,j}),\ (X_{i,i+2}, X_{i+3,j}),\ \ldots,\ (X_{i,j-1}, X_{j,j})$$
that is,
$$X_{i,j} = \bigcup_{k=i}^{j-1} \{\, A \mid A \to BC \in P,\ B \in X_{i,k},\ C \in X_{k+1,j} \,\}$$
[Figure: CYK triangular table]
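The table-filling recurrence above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the rule encoding (a list of (head, body) pairs, with a 1-tuple body for A → a and a 2-tuple body for A → BC) and the names `cyk_table`, `member` and `EXAMPLE_1` are hypothetical. Indices are 1-based so that `X[i, j]` matches Xi,j in the text.

```python
from itertools import product

# Sketch only: the rule encoding and all names below are hypothetical choices.
# A CNF rule is (head, body): body is (terminal,) for A -> a, or (B, C) for A -> BC.
Rule = tuple[str, tuple[str, ...]]


def cyk_table(rules: list[Rule], w: list[str]) -> dict[tuple[int, int], set[str]]:
    """Fill X[i, j], the set of variables deriving w_i ... w_j (1-based, inclusive)."""
    n = len(w)
    X: dict[tuple[int, int], set[str]] = {}
    # Diagonal cells: variables A with a production A -> a, where a is the ith symbol.
    for i in range(1, n + 1):
        X[i, i] = {head for head, body in rules if body == (w[i - 1],)}
    # Longer substrings: compare the pairs (X[i, k], X[k+1, j]) for k = i .. j-1.
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            X[i, j] = set()
            for k in range(i, j):
                for b, c in product(X[i, k], X[k + 1, j]):
                    X[i, j] |= {head for head, body in rules if body == (b, c)}
    return X


def member(rules: list[Rule], start: str, w: list[str]) -> bool:
    """w is in L(G) iff the start symbol is in the top cell X[1, n]."""
    return len(w) > 0 and start in cyk_table(rules, w)[1, len(w)]


# CNF grammar of Example 1 (Section 4), in the hypothetical encoding.
EXAMPLE_1: list[Rule] = [
    ("E", ("E", "R3")), ("E", ("E", "R4")),
    ("R3", ("R1", "E")), ("R4", ("R2", "E")),
    ("R1", ("+",)), ("R2", ("*",)), ("E", ("id",)),
]
```

For the CNF grammar of Example 1 and w = id + id * id, the top cell `X[1, 5]` is {E}, so `member(EXAMPLE_1, "E", "id + id * id".split())` is true, while the prefix id + is rejected.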
978-1-4673-9214-3/16/$31.00 © 2016 IEEE
2. Ambiguous Context Free Grammar
A Context Free Grammar is said to be ambiguous if some string of its language can be generated from the grammar in more than one way, i.e. the string has two (or more) distinct derivation (parse) trees, or equivalently two distinct leftmost derivations. Let the Context Free Grammar be G = (V, T, P, S) with the productions
E → E + E
E → E * E
E → id
Consider the string w = id + id * id. Two leftmost derivations of w from the given grammar are [2]:
1) E ⇒ E + E ⇒ id + E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id
2) E ⇒ E * E ⇒ E + E * E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id
Since the string “id + id * id” has two distinct leftmost derivations from the given grammar, the grammar is ambiguous. Fig 1 and Fig 2 show the two derivation trees (parse trees) that generate this same string from the given grammar.
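For this particular grammar the definition can be checked mechanically. The following minimal Python sketch (the function name `parse_trees` and the fully parenthesized output format are hypothetical) enumerates every parse tree of E → E + E | E * E | id for a token list by trying each operator inside a span as the root; more than one tree for a string demonstrates ambiguity. The number of trees can grow exponentially with the string length, so it is meant only as an illustration.

```python
from functools import lru_cache


def parse_trees(tokens: list[str]) -> list[str]:
    """Enumerate all parse trees of E -> E + E | E * E | id (hypothetical helper).

    Each tree is rendered as a fully parenthesized string.
    """

    @lru_cache(maxsize=None)
    def trees(i: int, j: int) -> tuple[str, ...]:
        # All trees rooted at E that derive tokens[i:j].
        found: list[str] = []
        if j - i == 1 and tokens[i] == "id":
            found.append("id")  # E -> id
        # E -> E op E: every operator strictly inside the span can be the root.
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                for left in trees(i, k):
                    for right in trees(k + 1, j):
                        found.append(f"({left} {tokens[k]} {right})")
        return tuple(found)

    return list(trees(0, len(tokens)))
```

For w = id + id * id it returns the two trees (id + (id * id)) and ((id + id) * id), which correspond to the two leftmost derivations above.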
2.1 Removal of Ambiguity of CFG
The ambiguous grammar is converted into an unambiguous one, i.e. a grammar with only one way to generate each string of the language: only one derivation tree for every string generated from the grammar [2].
In the production
E → E + E
the same non-terminal symbol appears twice on the right-hand side. Replacing one occurrence by a new non-terminal symbol T, we get the production
E → E + T | T
Each non-terminal symbol E in
E → E * E
is then replaced by T:
T → T * T
Again the same non-terminal symbol T appears twice on the right-hand side, so one occurrence is replaced by a new non-terminal symbol F, giving the production
T → T * F | F
Finally, in
E → id
the non-terminal E is replaced by T and T is replaced by F, giving the production
F → id
The unambiguous Context Free Grammar is
E → E + T | T
T → T * F | F
F → id
2.2 Chomsky Normal Form of Grammar
A Context Free Grammar is in Chomsky Normal Form (CNF) if the right-hand side of every production consists of either exactly two non-terminal symbols or a single terminal symbol, for example
S → AB | a
For every CFG whose language does not contain the empty string, there is an equivalent grammar G in Chomsky Normal Form [3].
Construction of grammar in CNF
Step 1: Eliminate null productions and unit productions.
Step 2: Eliminate terminals from the right-hand sides of productions as follows:
i) All productions in P of the form A → a and A → BC are included unchanged.
ii) Consider a production A → w1w2…wn (n ≥ 2) with some terminal on the right-hand side. Each terminal wi is replaced by a new non-terminal symbol X, and the new production X → wi is added. Repeat the same for all terminal symbols.
Step 3: Restrict the number of non-terminal symbols on the right-hand side to two as follows:
i) Consider a production A → A1A2…An with n > 2. Introduce a new non-terminal symbol T for two adjacent symbols, e.g. T → A1A2, and replace that pair by T in the production (A → TA3…An). Repeat until every right-hand side has only two non-terminals.
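3. CYK Algorithm
The Cocke–Younger–Kasami (CYK) algorithm is a parsing algorithm for Context Free Grammars. It requires the rules (productions) of the grammar to be in Chomsky Normal Form (CNF) and is a dynamic-programming, or table-filling, algorithm. The CYK algorithm determines whether a given string is a member of the language generated by the grammar [1][2]. For an input string w = a1a2…an, Vij denotes the set of variables that derive the substring of w of length j beginning at position i (so Vij corresponds to the table entry Xi,i+j-1):
begin
  for i = 1 to n do
    Vi1 = {A | A → a is a production and the ith symbol of w is a};
  for j = 2 to n do
    for i = 1 to n - j + 1 do
    begin
      Vij = Ø;
      for k = 1 to j - 1 do
        Vij = Vij ∪ {A | A → BC is a production, B is in Vik and C is in Vi+k,j-k}
    end
end
The string w is in L(G) if and only if the start symbol S is in V1n.
4. Ambiguity of CFG using CYK
Example 1:
The Context Free Grammar is
E → E + E
E → E * E
E → id
String: “id + id * id” (n = 5)
The grammar converted into CNF is
E → E R3
E → E R4
R3 → R1 E
R4 → R2 E
R1 → +
R2 → *
E → id
In the computations below, a product such as {E}{R1} forms every pair BC with B from the first set and C from the second; each pair is replaced by every variable A with a production A → BC, and a pair that is not the right-hand side of any production contributes nothing (Ø). The cells for single symbols are X1,1 = {E}, X2,2 = {R1}, X3,3 = {E}, X4,4 = {R2} and X5,5 = {E}.
$$
\begin{aligned}
X_{1,2} &= (X_{1,1}, X_{2,2}) = \{E\}\{R1\} = \{E\,R1\} = \varnothing\\
X_{2,3} &= (X_{2,2}, X_{3,3}) = \{R1\}\{E\} = \{R1\,E\} = \{R3\}\\
X_{3,4} &= (X_{3,3}, X_{4,4}) = \{E\}\{R2\} = \{E\,R2\} = \varnothing\\
X_{4,5} &= (X_{4,4}, X_{5,5}) = \{R2\}\{E\} = \{R2\,E\} = \{R4\}\\
X_{1,3} &= (X_{1,1}, X_{2,3}) \cup (X_{1,2}, X_{3,3}) = \{E\}\{R3\} \cup \varnothing\{E\} = \{E\,R3\} = \{E\}\\
X_{2,4} &= (X_{2,2}, X_{3,4}) \cup (X_{2,3}, X_{4,4}) = \{R1\}\varnothing \cup \{R3\}\{R2\} = \{R3\,R2\} = \varnothing\\
X_{3,5} &= (X_{3,3}, X_{4,5}) \cup (X_{3,4}, X_{5,5}) = \{E\}\{R4\} \cup \varnothing\{E\} = \{E\,R4\} = \{E\}\\
X_{1,4} &= (X_{1,1}, X_{2,4}) \cup (X_{1,2}, X_{3,4}) \cup (X_{1,3}, X_{4,4}) = \{E\}\varnothing \cup \varnothing\varnothing \cup \{E\}\{R2\} = \{E\,R2\} = \varnothing\\
X_{2,5} &= (X_{2,2}, X_{3,5}) \cup (X_{2,3}, X_{4,5}) \cup (X_{2,4}, X_{5,5}) = \{R1\}\{E\} \cup \{R3\}\{R4\} \cup \varnothing\{E\} = \{R1\,E,\ R3\,R4\} = \{R3\}\\
X_{1,5} &= (X_{1,1}, X_{2,5}) \cup (X_{1,2}, X_{3,5}) \cup (X_{1,3}, X_{4,5}) \cup (X_{1,4}, X_{5,5})\\
&= \{E\}\{R3\} \cup \varnothing\{E\} \cup \{E\}\{R4\} \cup \varnothing\{E\} = \{E\,R3,\ E\,R4\} = \{E, E\}
\end{aligned}
$$
Here X1,5 is kept as a multiset: E is obtained once from the pair E R3 and once from the pair E R4. The grammar is ambiguous because the start symbol E appears two times in the cell X1,n (n is the length of the string); the two occurrences correspond to the two parse trees of Fig 1 and Fig 2.
Example 2:
The Context Free Grammar is
S → i C t S | i C t S e S | a
C → b
String: “i b t i b t a e a” (n = 9)
The grammar converted into CNF is
S → R4 R5 | R4 R7 | a
R4 → R1 C
R5 → R2 S
R6 → R3 S
R7 → R5 R6
R1 → i
R2 → t
R3 → e
C → b
The grammar is ambiguous because the start symbol S appears two times in the cell X1,n (n is the length of the string): the substring “e a” can be attached either to the inner or to the outer i C t S.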
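The ambiguity test of this section can be sketched by extending the CYK table to count derivations. A set cannot hold E twice, so instead of writing X1,5 = {E, E} the sketch stores, for each variable A in cell Xi,j, the number of distinct parse trees of the CNF grammar that derive wi…wj from A (a multiset). The rule encoding and the names `cyk_counts`, `shows_ambiguity`, `EXAMPLE_1` and `EXAMPLE_2` are hypothetical. A count of at least 2 for the start symbol in X1,n shows that this particular string has two parse trees; it does not test other strings.

```python
from collections import Counter, defaultdict

# Hypothetical encoding: a CNF rule is (head, body) with body = (terminal,) or (B, C).
Rule = tuple[str, tuple[str, ...]]


def cyk_counts(rules: list[Rule], w: list[str]) -> dict[tuple[int, int], Counter]:
    """X[i, j][A] = number of parse trees deriving w_i ... w_j from A (1-based)."""
    n = len(w)
    by_pair: dict[tuple[str, ...], list[str]] = defaultdict(list)
    for head, body in rules:
        if len(body) == 2:
            by_pair[body].append(head)  # A -> B C, indexed by (B, C)
    X: dict[tuple[int, int], Counter] = {}
    for i in range(1, n + 1):
        X[i, i] = Counter(head for head, body in rules if body == (w[i - 1],))
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            cell: Counter = Counter()
            for k in range(i, j):  # split point, as in the pairs (X[i,k], X[k+1,j])
                for b, count_b in X[i, k].items():
                    for c, count_c in X[k + 1, j].items():
                        for a in by_pair.get((b, c), []):
                            cell[a] += count_b * count_c
            X[i, j] = cell
    return X


def shows_ambiguity(rules: list[Rule], start: str, w: list[str]) -> bool:
    """The paper's test: the start symbol occurs at least twice in X[1, n]."""
    return len(w) > 0 and cyk_counts(rules, w)[1, len(w)][start] >= 2


# CNF grammars of Example 1 and Example 2.
EXAMPLE_1: list[Rule] = [
    ("E", ("E", "R3")), ("E", ("E", "R4")), ("R3", ("R1", "E")),
    ("R4", ("R2", "E")), ("R1", ("+",)), ("R2", ("*",)), ("E", ("id",)),
]
EXAMPLE_2: list[Rule] = [
    ("S", ("R4", "R5")), ("S", ("R4", "R7")), ("S", ("a",)),
    ("R4", ("R1", "C")), ("R5", ("R2", "S")), ("R6", ("R3", "S")),
    ("R7", ("R5", "R6")), ("R1", ("i",)), ("R2", ("t",)), ("R3", ("e",)),
    ("C", ("b",)),
]
```

For Example 1 the sketch finds E twice in `X[1, 5]` (from the pairs E R3 and E R4), and for Example 2 it finds S twice in `X[1, 9]`; for the shorter string id + id, E occurs only once in the top cell.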
5. Conclusion
The CYK algorithm checks whether a given string is a member of the language generated by a grammar. The string is parsed using dynamic programming (a table-filling algorithm). If the start symbol of the grammar appears in the top cell of the first column of the triangular table, the string is a member of the language generated by the grammar; by itself, CYK is only a membership algorithm. In this paper we have used the CYK algorithm to find out whether a given Context Free Grammar is ambiguous: if the start symbol of the grammar appears two times in the top cell of the first column of the triangular table (X1,n), i.e. it is derived in two different ways, then the given Context Free Grammar is ambiguous. The test applies to the chosen string: if the start symbol appears only once, that string has a single parse tree, which does not by itself show that the grammar is unambiguous.
6. References
1. Shamshad Ali, “CYK Algorithm”, International Journal of Scientific Research Engineering & Technology (IJSRET), Volume 1, Issue 5, pp. 001–004, August 2012.
2. Hopcroft, Ullman, “Introduction to Automata Theory, Languages and Computation”, Pearson Education.
3. K.L.P. Mishra and N. Chandrasekaran, “Theory of Computer Science: Automata, Languages and Computation”, PHI.
4. Yuqiang Sun, Lei Zhou, Qiwei He, Yuwan Gu, Liang Jia, “Algorithm of Word-Lattice Parsing Based on Improved CYK-algorithm”, 2010 International Conference on Web Information Systems and Mining.
5. Nathan Bodenstab, “Efficient Implementation of the CYK Algorithm”.
6. Xinying Song, Shilin Ding, Chin-Yew Lin, “Better Binarization for the CKY Parsing”.
7. Zsolt Tóth, László Kovács, “CFG Extension for META Framework”, INES 2012, IEEE 16th International Conference on Intelligent Engineering Systems, June 13–15, 2012, Lisbon, Portugal.
8. Xiao Yang, Jiancheng Wan, Ling Zhang, “Arithmetic Computing Based Chinese Automatic Parsing Method”, Eighth ACIS International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing.
9. Huong Thanh Le, Lam Ba Do, Nhung Thi Pham, “Efficient Syntactic Parsing with Beam Search”.
10. Gend Lal Prajapati, Aditya Jain, Mayank Khandelwal, Pooja Nema, Priyanka Shukla, “On The Inference of Context-Free Grammars Based On Bottom-Up Parsing and Search”, Second International Conference on Emerging Trends in Engineering and Technology, ICETET-09.
11. Paweł Skórzewski, “Effective natural language parsing with probabilistic Grammars”, Proceedings of the International Multiconference on Computer Science and Information Technology, pp. 501–504, ISBN 978-83-60810-27-9, ISSN 1896-7094.
12. Richard E. Stearns and Harry B. Hunt, “On the equivalence and Containment Problems for unambiguous Regular Expression, Grammar and Automata”, CH16956/81/0000/0074$00.75, 1981 IEEE.