PRINCIPLES OF COMPILER DESIGN – CS1352
Class: III CSE Batch: 2004 - 2008
UNIT II – SYNTAX ANALYSIS
2 mark Questions
1.What is BNF?
Every programming language has rules that prescribe the syntactic structure of well-formed programs.
The syntax of programming language constructs can be described by context free grammars or Backus-Naur Form (BNF) notation.
2. Advantages of grammars
Grammars offer significant advantages to both language designers and compiler writers.
Ø Easy to understand.
Ø Automatically construct an efficient parser that determines if a source program is syntactically well formed.
Ø Useful for the translation of source programs into correct object code and for the detection of errors.
Ø Languages evolve over a period of time, acquiring new constructs and performing additional tasks.
3. Goals of Error Handler in a Parser
The Error Handler in a parser has simple to state goals
Ø It should report the presence of errors clearly and accurately.
Ø It should recover from each error quickly enough to be able to detect subsequent errors.
Ø It should not significantly slow down the processing of correct programs.
4. Define Viable Prefix Property.
Viable Prefix property meaning they detect that an error has occurred as soon as they see a prefix of the input that is not a prefix of any string in the language.
5.Error Recovery Strategies in a Parser.
There are many different general strategies that a parser can employ to recover from a syntactic error.
Ø Panic mode.
Ø Phrase level
Ø Error productions
Ø Global correction
6. Define
context free grammar:
A context free grammar is a formal notation for expressing such recursive definitions of languages.G=(V,T,P,S)
A grammar consists of one or more variables that represent classes of strings i.e., languages.
production:
The productions of a grammar specify the manner in which the terminals and nonterminals can be combined to form strings.
Each production consists of a nonterminals can be combined to form strings. Each production consists of a nonterminal, followed by an arrow, followed by a string of nonterminals and terminals.
Production is a finite set of rules that represent the recursive definition of a language.
Sentence of a grammer:
Set of all possible strings of terminal that can be accepted by the grammar. A sentence is a sentential form with no nonterminal
sentential form:
Derivations from the start symbol produce strings that have a special role. These
strings consists of both terminal and nonterminal which are intermediateries of derivation. This is called sentential forms.
There are left and right sentential forms.
Equivalent grammer:
Two grammar that are able to derive same set of strings.
context free language;
If G(V,T,P,S) is a CFG, the language of G,denoted L(G), is the set of terminal
strings that have a derivations from the start symbol.
If a language L is the language of some context free grammar, then L is said to be a context free language.
derivation:
Expand the start symbol using one of its productions and further expand the resulting string by replacing one of the variables by the body of one of its productions and so on, until we derive a string consisting entirely of terminals.
The language of the grammar is all strings of terminals that we can obtain in this way. This use of grammar is called Derivation
Left Most (LM) derivation:
To restrict the number of choices we have in deriving a string, it is often useful to require that at each step we replace the leftmost variable by one of its production bodies. Such a derivation is called a leftmost derivation.
Right Most (RM) derivation (canonical derivations):
To restrict the number of choices we have in deriving a string, it is often useful to require that at each step we replace the rightmost variable by one of its production bodies. Such a derivation is called a rightmost derivation.
Parse tree or derivation tree:
A parse tree may be viewed as a graphical representation for a derivation that filters out the choice regarding replacement order.
There is a representation for derivations that has proved extremely useful. The symbols of a terminal string are grouped into substrings, each of which belongs to the language of one of the variables of the grammar. The tree is known as parse tree.
yield of a parse tree or frontier of the tree
The leaves of the parse tree are labeled by nonterminals or terminals and read from left to right, they constitute a sentential form, called the yield or frontier of the tree.
The yield is a terminal string. that is all leaves are labeled either with a terminal or
with є.
Parsing:
Process of finding a parse tree for a given string of tokens
7. Define the four components of CFG
The four components of CFG are (V, T, P, S).
Non terminal(V)-finite set of variable. Each represents set of strings.
Terminal(T)-finite setoff symbols that form the strings of the language being defined.
Production(P)-finite set of production specify the manner in which the terminal and nonterminal can be combined to form the string
Start symbol(S)-one of the variables represents the language being defined; it is called the start symbol.
8.List out the properties of parse tree.
Ø Each interior node of a parse tree is labeled by some nonterminal A.
Ø The children of the node are labeled, from left to right, by the symbols in the right side of the production by which this A was replaced in the derivation.
The leaves of the parse tree are labeled by nonterminals or terminals and read from left to right, they constitute a sentential form, called the yield or frontier of the tree.
9.Define Ambiguous grammar.
A grammar that produces more than one parse tree for some sentence is said to be ambiguous.
An ambiguous grammar is one that produces more than one leftmost or more than one rightmost derivation for the same sentence.
10. How will u show a grammer is ambiguous?
Find a token string that has more than one parse tree
11. Why use regular expressions to define the lexical syntax of the language?
Ø The lexical rules of a language are frequently quite simple, and to describe them we donot need a notation as powerful grammars.
Ø Regular expressions generally provide a more concise and easier to understand notation for token than grammars.
Ø More efficient lexical analyzers can be constructed automatically from regular expressions than from arbitrary grammars.
Ø Separation the syntactic structure of a language into lexical and nonlexical parts provides a convenient way of modularizing the front end of a compiler into two manageable sized components.
12. How the language generated by a grammar is verified?
A proof that a grammar G generates a language L has two parts: we must show that every string generated by G is in L, and conversely that every string in L can indeed be generated by G.
13. Define left recursion and right recursion? draw diagram
Left Recursion: A ->Aa. In this production leftmost symbol on the right side is same as the nonterminal on the left side of the production. Here the procedure ‘A’ is called recursively and the parser loops forever causes the infinite loop is known as left recursion.
Right Recursion: R -> aR. In this production leftmost symbol on the right side is same as the nonterminal on the Right side of the production. Here the procedure ‘R’ is called recursively and the parser loops forever causes the infinite loop is known as Right recursion.
14. How the Ambiguity is removed from the grammar?
By doing
- State a rule that specifies in each ambiguous case which of the parse tree is the correct one. Such a rule is called a disambiguating rule. It corrects the ambiguity without changing the grammar
- Associativity and precedence of the arithmetic operator
- Change the grammar into a form that forces the construction of the correct parse tree.
- left recursion elimination
- left factoring.
15. How the left recursion is eliminated from the grammar?
Top down parsing methods cannot handle left recursive grammars, so a transformation that eliminates left recursion is needed.
Left recursive pair of productions A->Aα | β could be replaced by the non left recursive productions
A-> βA’
Á-> α A’ | є
Without changing the set of string derivable from A.
16.Define left factoring with examples.
Left factoring is a grammar transformation that is useful for producing a grammar suitable for predictive parsing. The basic idea is that when it is not clear which of two alternative productions to use to expand a nonterminal A, we may be able to rewrite the A-productions to defer the decision until we have seen enough of the input to make the right choice.
Case i) A-> ab1 | ab2
which is replaced as A-> aA’
A’->b1 | b2
Case ii)A-> ab1 | ab2 |......| abn | g
which is replaced as A-> aA’| g
A’->b1 | b2 |.....| bn
Eg: S->iEtS | iEtSeS | a
E->b
Ans: S->iEtSS’ | a
S’->eS | Є
E->b
17. Give the two different types of parsing and its definition.
Two different types of parsing are
Top down parsing:
In this type of parsing construction starts at root that is start symbol and proceeds towards the leaves that is terminal of the given string.
Bottom up parsing:
In this type of parsing construction starts at the leaves (given string) and proceeds towards the root (Start symbol).
18)List out the types of top down parsing.
The different types of top down parsing are
➔ Recursive Descent parsing – may involve backtracking
➔ Predictive parsing
- Recursive Predictive parsing
Uses a set of recursive procedures to recognize its input with no
backtracking
- Non recursive Predictive Parsing or predictive parsing
Uses an input buffer, stack and parsing table with predictive
parsing algorithm
19)Define back tracking.
Returning to a previous state in a computation in order to try an alternative strategy. In terms of a search tree, backtracking occurs when the end of a branch is reached without a solution being found, and a different branch is chosen.
Selection of a production for nonterminal may involve trial and error ie try a production and backtrack to try another production if the first is found to be unsuitable.
20)Define
recursivedescentparsing:
It is a general form of top down parsing ,that involve backtracking ,that is making repeated scans of the input. It is an attempt to construct a parse tree for the input starting from the root and creating the nodes of the parse tree in preorder.
recursive predictive parsing:
It is a general form of top down parsing ,that involve no backtracking in which we execute a set of recursive procedures to process the input.
non recursive predictive parsing:
It is possible to build a non recursive predictive parser by maintaining a stack explicitly rather than via recursive calls.
FIRST of a grammmar symbol:
If
a is any string of grammar symbols, let FIRST(
a) be the set of terminals that begins the strings derived from
a.
If a * e, then e is also in FIRST(a).
FOLLOW of a grammmar symbol:
FOLLOW(A),for nonterminal A,to be set of terminals a, that can appear immediately to the right of A in some sentential form, that is, the set of terminals a such that there exists a derivation of the form S => aAab for some a and b.
LL(1) grammar:
A Grammar G which produces a parsing table M using predictive parser with no multiply defined entry is said to be LL(1) grammar. The first “L” in LL(1) stands for scanning the input from left to right ,the second “L” for producing a leftmost derivation ,and the 1 for using one input symbol of lookahead at each step to make parsing action decision.
21)List out the steps in predictive parser.
a. Find unambiguous grammar by using elimination of left recursion and left factoring
b. Find FIRST & FOLLOW for every Nonterminal from the unambiguous grammar.
c. Construct the parsing table by using FIRST & FOLLOW
d. Then parse the given string using parsing table and predictive parsing algorithm.
e. Give the result that the given string was accepted by the grammar or not.
22)List out the properties of LL(1) or when the grammar is said to be LL(1).
a. Predictive parsing table should not have multiply defined entries
b. No ambiguous or left recursive grammar can be LL(1).
c. For no terminal ‘a’ do both a and b derive strings beginning with ‘a’.
d. At most one of a and b can derive the empty string.
e. If b -> e, then a does not derive any string beginning with a terminal in FOLLOW(A).
23)”Multiply defined entries in parsing table” what it mean?
If there are more than one production in parsing table at the the intersection a row and a column then it is said that there exist multiply defined entries in parsing table. For example, if grammar G is left recursive or ambiguous, then parsing table M will have atleast one multiply-defined entry.
24)What are the difficulties with the top down parsing (or) predictive parsing technique.
The difficulties are:
- In writing a grammar for the source language such that a predictive parser can be constructed from the grammar.
- Although left-recursion elimination and left factoring are easy to do ,they make the resulting grammar hard to read and difficult to use for translation purposes.
25. List out the error recovery actions in predictive parsing technique.
The error recovery actions in predictive parsing are:
Panic mode error recovery: It is based on the idea of skipping symbols on input until a token in a selected set of synchronizing tokens appears.
- Add all symbols in FOLLOW(A) in to the synchronizing set for nonterminal A
- Add key words that begin statements to the synchronizing set for the nonterminal A
- Add to the synchronizing set of a lower construct the symbol that begin higher construct
- If a nonterminal can generate the empty string, then the production deriving e can be used as default.
- If a terminal on top of the stack cannot be matched, a simple idea is to pop the terminal. This approach takes the synchronizing set of a token to consist of all other tokens.
Phrase level Recovery: Phrase level recovery is implemented by filling in the blank entries in the predictive parsing table with pointers to error routines.
26.What type of derivation used in top down and bottom up parsing method..
Top down parsing uses leftmost derivation and bottom up parsing uses Rightmost derivation.
27. Define
Handle:
A “handle” of a right sentential form g is a position A->B and a position of g where the string b may be found and replaced by A to produce the previous right sentential form in a right most derivation g.
Eg: Grammar: s->aABe
A->Abc|b
B->d
Sentential form
=>abbcde
=>aAbcde
=>aAde
=>aABe
=>S
In the example above, abbcde is a right-sentential firm whose handle is A->b at position 2.
Pruning the Handle: Reducing b to A in abw is called handle pruning. A right most derivation in reverse can be obtained by handle pruning.
Viable prefix
The set of prefixes of right sentential forms that can appear on the stack of a shift reduce parser are called viable prefixes
Operator grammar
- No production should have e on right side
- It should not have two adjacent non terminal
A grammar with the above property is called operator grammar
28.List out the types of bottom up parsing method.
- Shift reduce parsing
- Operator precedence parsing
- LR parsing
- Simple LR (SLR)
- Canonical LR (CALR)
- Look ahead LR (LALR)
29.What are the actions available in stack implementation of SR parser?
There are 4 possible actions a shift-reduce parser can make :
1. shift :In a shift action , the next input symbol is shifted onto the top of the stack.
2. reduce : In a reduce action , the parser knows the right end of the handle is at the top of the stack.It must then locate the left end of the handle within the stack and decide with what nonterminal to replace the handle.
3. accept :In accept action, the parser announces successful completion of parsing.
4. error :In error action,the parser discovers that a syntax error has occurred and calls an error recovery routine.
30. In what way stack is suitable for SR parser?
- Handle will always eventually appear on top of the stack, never inside.
- This becomes obvious when we consider the possible forms of two successive steps in any right most derivation
- It is not required to in to the stack to find the handle.
This aspect of handle pruning makes a stack particularly convenient data structure for implementing a shift reduce parser.
31.List out the conflicts in SR parsing and define
Parser, knowing the entire stack contents and the next input symbol, but
- cannot decide whether to shift or to reduce which is known as shift/reduce conflict
- cannot decide which of several reductions to make is known as reduce/reduce conflict
32.Give the two methods of determining the precedence relations between a pair of terminals.
- Based on associativity and precedence relations
- Construct an unambiguous grammar for the language, which reflects the correct associativity and precedence in its parse tree.
33. Advantage and disadvantage of operator precedence parser.
DisAdvantage:
- Hard to handle token like unary minus
- Worse, since relation between a grammar for the language being parsed and the operator precedence parser itself is week. That is cannot always be sure the parser accepts exactly the desired language.
Advantage:
- Because of its simplicity numeraous compiler using operator precedence parsig technique.
- Can have been built for the entire language.
34. What is the use of precedence function?
compilers using operator-precedence parsers need not store the table of precedence relations.
In most cases the table can be encoded by two precedence functions f and g that map terminal symbols to integers.we attempt to select f and g so that , for symbols a and b,
· f (a)<g(b) whenever a<·b,
· f(a)=g(b) whenever a= b,and
· f(a)>g(b)whenever a·>b.
Thus the precedence relation between a and b can be determined by a numerical comparison between f(a) and g(b).
35.How will you find the handle in operator precedence parser?
- Scan the string from the left end until the first > is encountered.
- Then scan backwards over any = until a < is encountered.
- The handle contains everything to the left of the first ·> and to the right of the <·, including any intervening or surrounding nonterminals.
36. Error Recovery actions in operator precedence parsing.
It detect error when
- No precedence relation holds between the terminal on top of the stack and the current input
- A handle has been found, there is no production with this handle as right side.
Handling error during reduction:
As there is no production to reduce by no semantic actions ae taken; a diagnostic message is printed.
To determine what the diagnostic should say the routine handling must decide what production the right side being popped “looks like”
Handling error during Shift reduce: (Blank entry in the table)
When there is no relation between the top of the stack and the current symbol, modify the stack, input or both.
Change the input symbol, insert symbols on to the input or stack, or delete the symbols from the input or stack.
36.Give the advantages and disadvantages of LR parser
Advantage:
a. can be constructed to recognize virtually all programming language constructs for which context free grammars can be written
b. The class of grammars that can be parsed is a proper superset of the class of grammars that can be parsed with predictive parsers.
c. It is the most general non backtracking shift reduce parsing method, can be implemented efficiently.
d. It can detect a syntactic error
Disadvantage:
a. Too much work to construct an LR parser by hand for a typical programming language grammar
b. Need an specialized tool like an LR parser generator.
37.Give the 3 techniques to implement LR parser
- Simple LR
- Canonical LR
- Look ahead LR
38. Compare the types of LR parsers.
Simple LR:
- Easiest to implement
- Least powerful of three
- May fail to produce a parsing table for certain grammar on which other method succeed.
- No of states are less so table size will be small
Canonical LR:
- Most expensive to implement
- Most powerful
- No of state are large there by increasing Table size
Look ahead LR:
- Easy to implement
- Intermediate in power and cost
- Work on most programming language grammars and with some effort can be implemented efficiently.
- LALR tables will be same as SLR table and considerably smaller then CALR table
39. Steps in LR parser
- Write all productions separately and Give number for all productions
- Find Augmented grammar.
- Find Closure for the augmented grammar.
- Find set of items finding goto
- Find FIRST and FOLLOW for the given grammar. (only in case of Simple LR)
- Construct LR parsing table by filling actions and goto entries in the table.
- Parse the given string using LR parsing algorithm.
- Give the result whether the given string can be accepted by the grammar or not.
40. Define
LR grammar:
LR(k) stands for Left to right Right most derivation with ‘k’ look ahead symbols.
A grammar for which we can construct a LR parsing table is said to be an LR grammar.
A grammar that can be parsed by an LR parser examining up to k input symbols on each move is called an LR(k) grammar
LR(0) item: of a grammar G is a production of G with a dot at some position of the right side. So A ® XYZ yields
A ® ×XYZ
A ® X×YZ
A® XY×Z
A® XYZ ×
Item: It can be represented by a pair integers, the first giving the number of the production and the second giving the position of the dot.
Augmented grammar:
If G is grammar with start symbol S, then G’, the augmented grammar for the G is the G with new start symbol S’ and production S’ ® S.
The purpose of new start symbol is to indicate to the parser when it should stop parsing and announce acceptance of the string.
Closure:
Go To:
Kernel item: Which includes the initial item S’ ® ×S and all items whose dots are not at the left end.
Non Kernel item: It includes the items which have their dots at the left end.
LR parser:
It is the parsing the technique that does the left to right scanning of the input symbols for constructing a right most derivation in reverse and the k input symbols of lookahead symbols that are used in making parsing decisions
38. Error recovery actions in LR parser.
39. Define
LL(1) grammar:
If there is no parsing action conflicts or multiply defined entries in predictive parsing table, then the given grammar is called an LL(1) grammar
LR(1) grammar or CALR(1) grammar:
If there is no parsing action conflicts or multiply defined entries in canonical LR(1) parsing table, then the given grammar is called an LR(1) grammar.
An LR parser using this table is called a canonical LR(1) parser.
SLR(1) grammar:
If there is no parsing action conflicts or multiply defined entries in SLR(1) parsing table, then the given grammar is called an SLR(1) grammar.
LALR(1) grammar:
If there is no parsing action conflicts or multiply defined entries in LALR(1) parsing table, then the given grammar is called an LALR(1) grammar.
40. problems in
- Find unambiguous grammar for the given ambiguous grammar.
- Find the right derivation of the string using the given grammar
- Find the left derivation of the string using the given grammar.
- Find FIRST & FOLLOW for the given grammar.
41. Give the algorithm for
- FOLLOW
- FIRST
- GOTO
- CLOSURE
- Elimination of left recursion
- Left factoring
PART B
- problems in
- Construct predictive parser for the given grammar and parse the given string
- Parse the given string using Shift reduce parser using the given grammar
- Construct Operator precedence parser for the given grammar and parse the given string
- Construct Precedence function for the given grammar and parse the given string
- Construct SLR / LALR / CALR parser for the given grammar and parse the given string
- Check whether the given grammar is LL(1) / LR(1) / SLR(1) / CALR(1) / LALR(1).
- Explain error recovery actions in
- Predictive parser.
- Operator precedence parser.
- LR parser.
- What are the preliminary steps that are to be carried out during parsing? Explain with suitable example.
- What are the necessary conditions to be carried before the construction of predictive parser?
- What is shift reduce parser? Explain in detail about the conflicts that may occur during shift reduce parsing?
- Explain following parsing algorithm with example
a. Predictive parsing or LL parsing
b. Recursive descent parsing
c. Operator precedence parsing
d. Shift reduce parsing
e. LR parsing