participate


Generics - Context-free LR(k) parsing of generics?
<<   Back to Forum  |   Give us Feedback
This topic has 10 replies on 1 page.
eblake
Posts:15
Registered: 6/24/99
Context-free LR(k) parsing of generics?   
Nov 6, 2002 4:30 PM

 
Has anyone been able to develop a syntax-free LR(k) (for finite k) grammar for parsing generics? The proposed grammar in spec8.pdf is hideous, as it has several ambiguities and even some typos. The two biggest problems I have in a context-free parser is deciding whether >> should be one or two tokens, and whether a<b starts a parameterized type name or is a relational expression. I know these two problems can be solved by writing a context-sensitive parser, but I would prefer not to do that if possible. I assume (without looking at the sources, because I'm trying to do a clean-room implementation) that the prototype compiler is using a context-sensitive parser.

I also am interested if any of the JSR folks would be willing to let the public know where the current syntax of the JSR or Java 1.5 differs from the ideas in the public draft. Remember the last minute change of the assert syntax (from "assert(a, b)" to "assert a:b"), 11 months after the public draft closed, made in order to acheive LR(1) parsing?

I've come close to writing an LR(2) grammar that gets everything except for casts. And I may even be able to trim that to LR(1) without too much headache. The problem with casts, however, is shown below:

Suppose an LR(2) parser is at "a" on this sequence:
class A { void m() { Object o = ( a < b

We could reduce "a" to ClassOrInterface, and ultimately have a cast expression, as in:
class A { void m() { Object o = ( a < b > ) null; } }

CastExpression : ( ReferenceType ) UnaryExpressionNotPlusMinus
ReferencType : ClassOrInterfaceType
ClassOrIterfaceType : ClassOrInterface TypeArguments
ClassOrInterface : Identifier(a)
TypeArguments : < ReferenceTypeList >
ReferenceTypeList : ReferenceType
ReferenceType : ClassOrInterfaceType
ClassOrInterfaceType : ClassOrInterface
ClassOrInterface : Identifier(b)

Or we could reduce "a" to Name, and ultimately have some expression, as in:
class A { void m() { Object o = ( a < b ) ""; } }

AdditiveExpression : AdditiveExpression
MultiplicativeExpression
AdditiveExpression : MultiplicativeExpression
MultiplicativeExpression : UnaryExpression
UnaryExpression : UnaryExpressionNotPlusMinus
UnaryExpressionNotPlusMinus : PostfixExpression
PostfixExpression : Primary
Primary : ( Expression )
Expression : AssignmentExpression
AssignmentExpression : ConditionalExpression
ConditionalExpression : ConditionalOrExpression
ConditionalOrExpression : ConditionalAndExpression
ConditionalAndExpression : InclusiveOrExpression
InclusiveOrExpression : ExclusiveOrExpression
ExclusiveOrExpression : AndExpression
AndExpression : EqualityExpression
EqualityExpression : RelationalExpression
RelationalExpression : RelationalExpression < ShiftExpression
RelationalExpression : ShiftExpression
ShiftExpression : AdditiveExpression
AdditiveExpression : MultiplicativeExpression
MultiplicativeExpression : UnaryExpression
UnaryExpression : UnaryExpressionNotPlusMinus
UnaryExpressionNotPlusMinus : PostfixExpression
PostfixExpression : Name
Name : Identifier(a)
ShiftExpression : AdditiveExpression
AdditiveExpression : MultiplicativeExpression
MultiplicativeExpression : UnaryExpression
UnaryExpression : UnaryExpressionNotPlusMinus
UnaryExpressionNotPlusMinus : PostfixExpression
PostfixExpression : Name
Name : Identifier(b)

And since an infinite number of tokens can appear in both ReferenceTypeList and ShiftExpression, there is no finite way to determine which reduction to choose.
 
gafter
Posts:669
Registered: 6/25/98
Re: Context-free LR(k) parsing of generics?   
Nov 6, 2002 5:43 PM (reply 1 of 10)  (In reply to original post )

 
Has anyone been able to develop a syntax-free LR(k)
(for finite k) grammar for parsing generics? The
proposed grammar in spec8.pdf is hideous, as it has
several ambiguities and even some typos. The two
biggest problems I have in a context-free parser is
deciding whether >> should be one or two tokens,

This question is explicitly handled in the spec. It is always
one token.

See http://www.cs.princeton.edu/~appel/modern/java/CUP/
for how Scott Ananian handled things.
 
eblake
Posts:15
Registered: 6/24/99
Re: Context-free LR(k) parsing of generics?   
Nov 7, 2002 12:12 PM (reply 2 of 10)  (In reply to #1 )

 
I am not interested in a parser generator targetting Java (I'm working with the Jikes compiler, where the parser generator targets C++). However, the ideas in CUP might be useful, if I ever look at it.

I concede that the grammar in the public draft, WHEN CORRECTED, will correctly parse a<b<c>> when the tokenizer always treats >> as one token (it makes the grammar messy, but it worked). Below, I list the LALR(1) grammar that I have so far, with comments on the rules I had to fix so far. This grammar still does not allow (a<b>)o (as I stated earlier, this is highly ambiguous), nor does it allow a<b>.class (but the public draft did not mention this as being legal, and I think it would return the erased type anyway, so the user should have just used a.class).

If anyone can produce an LR(k) way to allow parameterized type casts, I'd love to see it!

By the way, there was a BugParade entry I once saw that suggested an alternate syntax for casts. Of course, the original syntax would have to be preserved for casts to raw types, so that the new compiler can still parse old code. But since parameterized types are new syntax, casting to a parameterized type could safely require new syntax. The proposal was that casts be allowed with this syntax: myvar.(MyClass). The semantics are that the compile type of expression myvar is cast to type MyClass. Notice that you can then write CastExpression as:

CastExpression : '(' PrimitiveType Dimsopt ')' UnaryExpression
| '(' Expression ')' UnaryExpressionNotPlusMinus
| '(' Name Dims ')' UnaryExpressionNotPlusMinus
| UnaryExpressionNotPlusMinus '.' '(' ReferenceType ')'
| UnaryExpression '.' '(' PrimitiveType ')'

Now parameterized type cases would have to use the new form, but you no longer have grammar ambiguity. And since casts to parameterized types should be rare (after all, the point of generics is to reduce the number of explicit casts), it is not too much to ask for this new syntax.


This grammar started from the LALR(1) version in JLS1 chapter 18. It
has been tweaked to correctly parse everything of JLS2, and to add
generics (JSR 14). Everything added or modified for generics is
preceded by the comment GENERIC. Some rules of JLS1 were simplified in the interest of a simpler parse tree with code reuse, which occaisionally requires more work during semantic processing.

Goal : CompilationUnit
Literal : 'IntLiteral'
| 'LongLiteral'
| 'FloatLiteral'
| 'DoubleLiteral'
| BooleanLiteral
| 'CharLiteral'
| 'StringLiteral'
| 'null'
BooleanLiteral : 'true'
| 'false'
Type : PrimitiveType
| ReferenceType
PrimitiveType : NumericType
| 'boolean'
NumericType : IntegralType
| FloatingPointType
IntegralType : 'byte'
| 'short'
| 'int'
| 'long'
| 'char'
FloatingPointType : 'float'
| 'double'
ReferenceType : ClassOrInterfaceType
| ArrayType

// I used ClassOrInterface instead of Name, to allow things like A<B>.C
// GENERIC
ClassOrInterfaceType : ClassOrInterface
| ClassOrInterface '<'Identifier' instead of Name, to have a distinction between
// expression names and type names (since only type names can have type
// arguments).
// GENERIC
ClassOrInterface : 'Identifier'
| ClassOrInterfaceType '.' 'Identifier'

// The ArrayType rule makes more sense like this.
ArrayType : PrimitiveType Dims
| ClassOrInterfaceType Dims

Name : 'Identifier'
| Name '.' 'Identifier'
CompilationUnit : PackageDeclarationopt ImportDeclarationsopt
TypeDeclarationsopt
ImportDeclarations : ImportDeclaration
| ImportDeclarations ImportDeclaration
TypeDeclarations : TypeDeclaration
| TypeDeclarations TypeDeclaration
PackageDeclaration : 'package' Name ';'
ImportDeclaration : SingleTypeImportDeclaration
| TypeImportOnDemandDeclaration
SingleTypeImportDeclaration : 'import' Name ';'
TypeImportOnDemandDeclaration : 'import' Name '.' '' ';'
TypeDeclaration : ClassDeclaration
| InterfaceDeclaration
| ';'
Modifiers : Modifier
| Modifiers Modifier
Modifier : 'public'
| 'protected'
| 'private'
| 'static'
| 'abstract'
| 'final'
| 'native'
| 'strictfp'
| 'synchronized'
| 'transient'
| 'volatile'

// GENERIC
ClassDeclaration : Modifiersopt 'class' 'Identifier' TypeParametersopt
Superopt Interfacesopt ClassBody

// I deleted ClassType and InterfaceType for grammar simplicity.
Super : 'extends' ClassOrInterfaceType
Interfaces : 'implements' TypeList

// TypeList is the combination of InterfaceTypeList and ClassTypeList
TypeList : ClassOrInterfaceType
| TypeList ',' ClassOrInterfaceType
ClassBody : '{' ClassBodyDeclarationsopt '}'
ClassBodyDeclarations : ClassBodyDeclaration
| ClassBodyDeclarations ClassBodyDeclaration

// Since initializers are similar, I lumped them together into
// InitializerDeclaration. Likewise, since interface and class members
// are similar, I lumped them into MemberDeclaration. Of course, this
// requires more work on the semantic side of things, but makes
// parsing easier.
ClassBodyDeclaration : MemberDeclaration
| ConstructorDeclaration
| InitializerDeclaration
// Note that type declaration includes ';', the empty member declaration
MemberDeclaration : FieldDeclaration
| MethodDeclaration
| TypeDeclaration
FieldDeclaration : Modifiersopt Type VariableDeclarators ';'
VariableDeclarators : VariableDeclarator
| VariableDeclarators ',' VariableDeclarator
VariableDeclarator : VariableDeclaratorId
| VariableDeclaratorId '=' VariableInitializer
VariableDeclaratorId : 'Identifier' Dimsopt
VariableInitializer : Expression
| ArrayInitializer

// Similar to JLS2 chapter 18, I treat ExplicitConstructorInvocations
// as statements, then perform semantic filtering to make sure they
// only appear in the correct places. Then MethodBody can be shared in
// more places.
MethodDeclaration : MethodHeader MethodBody
| MethodHeader ';'
// To avoid ambiguity with FieldDeclarations, TypeParametersopt and
// ResultType must be expanded inline.
// GENERIC
MethodHeader : Modifiersopt TypeParameters Type MethodDeclarator Throwsopt
| Modifiersopt Type MethodDeclarator Throwsopt
| Modifiersopt TypeParameters 'void' MethodDeclarator Throwsopt
| Modifiersopt 'void' MethodDeclarator Throwsopt
MethodDeclarator : 'Identifier' '(' FormalParameterListopt ')' Dimsopt
FormalParameterList : FormalParameter
| FormalParameterList ',' FormalParameter
FormalParameter : 'final'opt Type VariableDeclaratorId
Throws : 'throws' TypeList
// See comment for MethodDeclaration
MethodBody : Block
InitializerDeclaration : 'static'opt MethodBody

// The public draft is not clear whether constructors can be
// parameterized, but it makes sense to me. TypeParametersopt needs to
// be expanded inline to avoid ambiguities with field declarations.
// GENERIC
ConstructorDeclaration : Modifiersopt ConstructorDeclarator Throwsopt
MethodBody
| Modifiersopt TypeParameters ConstructorDeclarator
Throwsopt MethodBody
ConstructorDeclarator : 'Identifier' '(' FormalParameterListopt ')'
ExplicitConstructorInvocation : 'this' '(' ArgumentListopt ')' ';'
| 'super' '(' ArgumentListopt ')' ';'
| Primary '.' 'super' '(' ArgumentListopt ')' ';'
| Name '.' 'super' '(' ArgumentListopt
')' ';'
// GENERIC
InterfaceDeclaration : Modifiersopt 'interface' 'Identifier' TypeParametersopt
ExtendsInterfacesopt InterfaceBody
ExtendsInterfaces : 'extends' TypeList
InterfaceBody : '{' InterfaceMemberDeclarationsopt '}'
// See comment on MemberDeclaration.
InterfaceMemberDeclarations : MemberDeclaration
| InterfaceMemberDeclarations MemberDeclaration
ArrayInitializer : '{' ','opt '}'
| '{' VariableInitializers ','opt '}'
VariableInitializers : VariableInitializer
| VariableInitializers ',' VariableInitializer
Block : '{' BlockStatementsopt '}'
BlockStatements : BlockStatement
| BlockStatements BlockStatement
// I added ExplicitConstructorInvocation here to make MethodBody more
// portable - this requires extra semantic work for a simpler grammar.
BlockStatement : LocalVariableDeclarationStatement
| Statement
| ClassDeclaration
| ExplicitConstructorInvocation
LocalVariableDeclarationStatement : LocalVariableDeclaration ';'
// To avoid ambiguities between Type and Name, I expanded this
// inline. Quite tricky to get right, especially for things like
// "a<b>.c<d> e;"
// GENERIC
LocalVariableDeclaration : PrimitiveType Dimsopt VariableDeclarators
| Name VariableDeclarators
| Name Dims VariableDeclarators
| Name TypeArguments Dimsopt VariableDeclarators
| Name TypeArguments '.' ClassOrInterfaceType
Dimsopt VariableDeclarators
| 'final' Type VariableDeclarators
Statement : StatementWithoutTrailingSubstatement
| LabeledStatement
| IfThenStatement
| IfThenElseStatement
| WhileStatement
| ForStatement
StatementNoShortIf : StatementWithoutTrailingSubstatement
| LabeledStatementNoShortIf
| IfThenElseStatementNoShortIf
| WhileStatementNoShortIf
| ForStatementNoShortIf
StatementWithoutTrailingSubstatement : Block
| EmptyStatement
| ExpressionStatement
| SwitchStatement
| DoStatement
| BreakStatement
| ContinueStatement
| ReturnStatement
| SynchronizedStatement
| ThrowStatement
| TryStatement
| AssertStatement
EmptyStatement : ';'
LabeledStatement : 'Identifier' ':' Statement
LabeledStatementNoShortIf : 'Identifier' ':' StatementNoShortIf
ExpressionStatement : StatementExpression ';'
StatementExpression : Assignment
| PreIncrementExpression
| PreDecrementExpression
| PostIncrementExpression
| PostDecrementExpression
| MethodInvocation
| ClassInstanceCreationExpression
IfThenStatement : 'if' '(' Expression ')' Statement
IfThenElseStatement : 'if' '(' Expression ')' StatementNoShortIf
'else' Statement
IfThenElseStatementNoShortIf : 'if' '(' Expression ')' StatementNoShortIf
'else' StatementNoShortIf
SwitchStatement : 'switch' '(' Expression ')' SwitchBlock
SwitchBlock : '{' SwitchBlockStatements SwitchLabelsopt '}'
| '{' SwitchLabelsopt '}'
SwitchBlockStatements : SwitchBlockStatement
| SwitchBlockStatements SwitchBlockStatement
SwitchBlockStatement : SwitchLabels BlockStatements
SwitchLabels : SwitchLabel
| SwitchLabels SwitchLabel
SwitchLabel : 'case' ConstantExpression ':'
| 'default' ':'
WhileStatement : 'while' '(' Expression ')' Statement
WhileStatementNoShortIf : 'while' '(' Expression ')' StatementNoShortIf
DoStatement : 'do' Statement 'while' '(' Expression ')' ';'
ForStatement : 'for' '(' ForInitopt ';' Expressionopt ';' ForUpdateopt ')'
Statement
ForStatementNoShortIf : 'for' '(' ForInitopt ';' Expressionopt ';'
ForUpdateopt ')' StatementNoShortIf
ForInit : StatementExpressionList
| LocalVariableDeclaration
ForUpdate : StatementExpressionList
StatementExpressionList : StatementExpression
| StatementExpressionList ',' StatementExpression
AssertStatement : 'assert' Expression ';'
| 'assert' Expression ':' Expression ';'
BreakStatement : 'break' 'Identifier'opt ';'
ContinueStatement : 'continue' 'Identifier'opt ';'
ReturnStatement : 'return' Expressionopt ';'
ThrowStatement : 'throw' Expression ';'
SynchronizedStatement : 'synchronized' '(' Expression ')' Block
TryStatement : 'try' Block Catches
| 'try' Block Catchesopt Finally
Catches : CatchClause
| Catches CatchClause
CatchClause : 'catch' '(' FormalParameter ')' Block
Finally : 'finally' Block

// Because "new int[]{0}[0]" is legal syntax, I split
// ArrayCreationExpression in two.
Primary : PrimaryNoNewArray
| ArrayCreationUninitialized
| ArrayCreationInitialized
PrimaryNoNewArray : Literal
| 'this'
| '(' Expression ')'
| ClassInstanceCreationExpression
| FieldAccess
| Name '.' 'this'
| PrimitiveType Dimsopt '.' 'class'
| Name Dims '.' 'class'
| Name '.' 'class'
| 'void' '.' 'class'
| MethodInvocation
| ArrayAccess
// GENERIC
ClassInstanceCreationExpression : 'new' ClassOrInterfaceType '('
ArgumentListopt ')' ClassBodyopt
| Primary '.' 'new' 'Identifier'
TypeArgumentsopt '(' ArgumentListopt
')' ClassBodyopt
| Name '.' 'new' 'Identifier'
TypeArgumentsopt '(' ArgumentListopt
')' ClassBodyopt
ArgumentList : Expression
| ArgumentList ',' Expression
ArrayCreationUninitialized : 'new' PrimitiveType DimExprs Dimsopt
| 'new' ClassOrInterfaceType DimExprs Dimsopt
ArrayCreationInitialized : 'new' PrimitiveType Dims ArrayInitializer
| 'new' ClassOrInterfaceType Dims ArrayInitializer
DimExprs : DimExpr
| DimExprs DimExpr
DimExpr : '[' Expression ']'
Dims : '[' ']'
| Dims '[' ']'
FieldAccess : Primary '.' 'Identifier'
| 'super' '.' 'Identifier'
| Name '.' 'super' '.' 'Identifier'
MethodInvocation : Name '(' ArgumentListopt ')'
| Primary '.' 'Identifier' '(' ArgumentListopt ')'
| 'super' '.' 'Identifier' '(' ArgumentListopt ')'
| Name '.' 'super' '.' 'Identifier' '(' ArgumentListopt ')'
ArrayAccess : Name '[' Expression ']'
| PrimaryNoNewArray '[' Expression ']'
| ArrayCreationInitialized '[' Expression ']'
PostfixExpression : Primary
| Name
| PostIncrementExpression
| PostDecrementExpression
PostIncrementExpression : PostfixExpression ''
PostDecrementExpression : PostfixExpression '--'
UnaryExpression : PreIncrementExpression
| PreDecrementExpression
| '' UnaryExpression
| '-' UnaryExpression
| UnaryExpressionNotPlusMinus
PreIncrementExpression : '+
' UnaryExpression
PreDecrementExpression : '--' UnaryExpression
UnaryExpressionNotPlusMinus : PostfixExpression
| '~' UnaryExpression
| '!' UnaryExpression
| CastExpression
// GENERIC : I don't know how to cast to a parameterized type - this
// part of the grammar is too ambiguous to easily make LR(k).
CastExpression : '(' PrimitiveType Dimsopt ')' UnaryExpression
| '(' Expression ')' UnaryExpressionNotPlusMinus
| '(' Name Dims ')' UnaryExpressionNotPlusMinus
MultiplicativeExpression : UnaryExpression
| MultiplicativeExpression '
' UnaryExpression
| MultiplicativeExpression '/' UnaryExpression
| MultiplicativeExpression '%' UnaryExpression
AdditiveExpression : MultiplicativeExpression
| AdditiveExpression '' MultiplicativeExpression
| AdditiveExpression '-' MultiplicativeExpression
ShiftExpression : AdditiveExpression
| ShiftExpression '<<' AdditiveExpression
| ShiftExpression '>>' AdditiveExpression
| ShiftExpression '>>>' AdditiveExpression
// Note that since < never operates on booleans, but always produces a
// boolean result, we can rewrite this rule. If we don't rewrite it, then
// the following would be ambiguous: "a instanceof b<c>"
// GENERIC
RelationalExpression : ShiftExpression
| ShiftExpression '<' ShiftExpression
| RelationalExpression '>' ShiftExpression
| RelationalExpression '<=' ShiftExpression
| RelationalExpression '>=' ShiftExpression
| RelationalExpression 'instanceof' ReferenceType
EqualityExpression : RelationalExpression
| EqualityExpression '==' RelationalExpression
| EqualityExpression '!=' RelationalExpression
AndExpression : EqualityExpression
| AndExpression '&' EqualityExpression
ExclusiveOrExpression : AndExpression
| ExclusiveOrExpression '^' AndExpression
InclusiveOrExpression : ExclusiveOrExpression
| InclusiveOrExpression '|' ExclusiveOrExpression
ConditionalAndExpression : InclusiveOrExpression
| ConditionalAndExpression '&&'
InclusiveOrExpression
ConditionalOrExpression : ConditionalAndExpression
| ConditionalOrExpression '||'
ConditionalAndExpression
ConditionalExpression : ConditionalOrExpression
| ConditionalOrExpression '?' Expression ':'
ConditionalExpression
AssignmentExpression : ConditionalExpression
| Assignment
// Because "(i) = 1" is legal, I rewrote this to allow all variables
// on the left hand side. This requires some semantic checks that it is
// actually a variable being assigned.
Assignment : PostfixExpression AssignmentOperator AssignmentExpression
AssignmentOperator : '='
| '*='
| '/='
| '%='
| '
='
| '-='
| '<<='
| '>>='
| '>>>='
| '&='
| '^='
| '|='
Expression : AssignmentExpression
ConstantExpression : Expression

// All rules below deal with generics.
// GENERIC
TypeArguments : '<' ReferenceTypeList '>'
ReferenceTypeList : ReferenceType
| ReferenceTypeList ',' ReferenceType
ReferenceTypeList1 : ReferenceType1
| ReferenceTypeList ',' ReferenceType1
// Note that I used ClassOrInterface instead of Name, to allow "a<b>.c<d>"
ReferenceType1 : ReferenceType '>'
| ClassOrInterface '<' ReferenceTypeList2
ReferenceTypeList2 : ReferenceType2
| ReferenceTypeList ',' ReferenceType2
ReferenceType2 : ReferenceType '>>'
| ClassOrInterface '<' ReferenceTypeList3
ReferenceTypeList3 : ReferenceType3
| ReferenceTypeList ',' ReferenceType3
ReferenceType3 : ReferenceType '>>>'
TypeParameters : '<' TypeParameterList1
TypeParameterList : TypeParameter
| TypeParameterList ',' TypeParameter
TypeParameterList1 : TypeParameter1
| TypeParameterList ',' TypeParameter1
TypeParameter : 'Identifier' TypeBoundopt
// I had to fix TypeParameter1, including adding new rules.
TypeParameter1 : 'Identifier' '>'
| 'Identifier' TypeBound1
TypeBound : 'extends' ClassOrInterfaceType AdditionalBoundListopt
TypeBound1 : 'extends' ReferenceType1
| 'extends' ClassOrInterfaceType AdditionalBoundList1
AdditionalBoundList : AdditionalBound
| AdditionalBoundList AdditionalBound
AdditionalBoundList1 : AdditionalBound1
| AdditionalBoundList AdditionalBound1
AdditionalBound : '&' ClassOrInterfaceType
AdditionalBound1 : '&' ReferenceType1
 
eblake
Posts:15
Registered: 6/24/99
Re: Context-free LR(k) parsing of generics?   
Nov 7, 2002 4:30 PM (reply 3 of 10)  (In reply to #2 )

 
Correction. This rule needs to change too:

TypeArguments : '<' ReferenceTypeList1
 
schapel
Posts:2,586
Registered: 6/22/99
Re: Context-free LR(k) parsing of generics?   
Nov 11, 2002 6:21 AM (reply 4 of 10)  (In reply to #2 )

 
I am not interested in a parser generator targetting Java (I'm working
with the Jikes compiler, where the parser generator targets C++).

I think you're missing the main point. CUP is an LALR(1) parser, and there's a CUP grammar for the Java language (including generics support) on that site. That means there's an LALR(1) grammar for generics there.
 
eblake
Posts:15
Registered: 6/24/99
Re: Context-free LR(k) parsing of generics?   
Nov 11, 2002 9:38 AM (reply 5 of 10)  (In reply to #4 )

 
You missed my point. I looked at CUP, and it is a CONTEXT-SENSITIVE LALR grammar. I am interested in a CONTEXT-FREE LALR grammar.

In CUP, Scott invented the terminal PLT, a LT with lookahead for differentiating whether "(a<b" starts an expression (as in "(a<b)") or a cast to parameterized type (as in "(a<b>)o"). Yes, I could do that with my parser (and in the end I will probably have to, since no one else has come up with a better solution), but it is context sensitive - it requires the lexer to do lookahead!
 
eblake
Posts:15
Registered: 6/24/99
Re: Context-free LR(k) parsing of generics?   
Nov 11, 2002 11:36 AM (reply 6 of 10)  (In reply to #5 )

 
Also, Scott's sample grammar does not (yet) parse the following, while the prototype compiler does.

class A<T> {
  class B<S> {}
  void m() {
    int i;
    (i) = new int[]{0}[0];
    A<Integer>.B<Integer> c;
  }
}
 
eblake
Posts:15
Registered: 6/24/99
Re: Context-free LR(k) parsing of generics?   
Nov 21, 2002 3:52 PM (reply 7 of 10)  (In reply to #6 )

 
I have FINALLY managed to write an LALR(1) grammar for generics, with NO LOOKAHEAD in the lexer. The solution for casts to parameterized types is rather ugly, but it works. If you are interested in seeing the grammar, it is in the CVS repository of jikes, under the generics-branch tag, in the file src/java.g: http://www-124.ibm.com/developerworks/oss/cvs/jikes/~checkout~/jikes/src/java.g?rev=1.31.2.17&content-type=text/plain

I hope to translate that grammar to CUP, and inform Scott Ananian of my success, in the next few days.
 
fredastaire
Posts:97
Registered: 1/3/00
Re: Context-free LR(k) parsing of generics?   
Apr 24, 2003 8:07 AM (reply 8 of 10)  (In reply to #6 )

 
Just a quick note to this forum that I've updated my CUP grammar to fix the deficiencies Eric noted. It is still available from
http://www.cs.princeton.edu/~appel/modern/java/CUP/
and linked to from
http://cag.lcs.mit.edu/~cananian/Projects/GJ/

Thanks for the bug reports, Eric!

My grammar still uses the "smart lexer" look-ahead feature, but I've looked over Eric's work and it appears quite clever.
 
gafter
Posts:669
Registered: 6/25/98
Re: Context-free LR(k) parsing of generics?   
Apr 28, 2003 11:15 PM (reply 9 of 10)  (In reply to #2 )

 
the rules I had to fix so far. This grammar still
does not allow (a<b>)o (as I stated earlier, this is
highly ambiguous), nor does it allow a<b>.class (but
the public draft did not mention this as being legal,
and I think it would return the erased type anyway, so
the user should have just used a.class).

Neither of those forms are supposed to be legal.
 
gafter
Posts:669
Registered: 6/25/98
Re: Context-free LR(k) parsing of generics?   
Apr 28, 2003 11:17 PM (reply 10 of 10)  (In reply to #9 )

 
I spoke too soon. The cast form is legal. The class literal
form, I believe, is not.
 
This topic has 10 replies on 1 page.
Back to Forum
 
Read the Developer Forums Code of Conduct

Click to email this message Email this Topic

Edit this Topic
  
 
 
Forums Statistics
    Users Online : 63
  • Guests : 118

About Sun forums
  • Sun Forums is a large collection of user generated discussions. It is here to help you ask questions, find answers, and participate in discussions.

    Check out our guide on Getting started with Sun Forums for a full walkthrough of how to best leverage the benefits of this community.

Powered by Jive Forums