antlr4 - ANTLR 4 lexer tokens inside other tokens
I have the following grammar for ANTLR 4:

    grammar pattern;

    // parser rules
    parse  : string LBRACK CHAR DASH CHAR RBRACK ;
    string : (CHAR | DASH)+ ;

    // lexer rules
    DASH   : '-' ;
    LBRACK : '[' ;
    RBRACK : ']' ;
    CHAR   : [A-Za-z0-9] ;

and I'm trying to parse the following string:

    ab-cd[0-9]

The code parses out the ab-cd on the left, which is treated as a literal string in my application. It then parses out [0-9] as a character set, which in this case translates to a digit. The grammar works for me, except that I don't like having (CHAR | DASH)+ as a parser rule when it's really being treated as a token. I would rather the lexer create a STRING token and give me the following tokens:

    "ab-cd" "[" "0" "-" "9" "]"

instead of these:

    "ab" "-" "cd" "[" "0" "-" "9" "]"

I have looked at other examples, but haven't been able to figure it out. The other examples either have quotes around such string literals or use whitespace to delimit the input. I'd like to avoid both. Can this be accomplished with lexer rules, or do I need to continue handling it in the parser rules like I'm doing?
In ANTLR 4, you can use lexer modes for this.

    STRING : [a-z-]+ ;
    LBRACK : '[' -> pushMode(CharSet) ;

    mode CharSet;

    DASH   : '-' ;
    NUMBER : [0-9]+ ;
    RBRACK : ']' -> popMode ;

After matching a [ character, the lexer operates in mode CharSet until a ] character is reached and the popMode command is executed.
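To make the mode switching concrete, here is a minimal plain-Python sketch of a lexer with a mode stack. It is only a simulation and not ANTLR-generated code: the `RULES` table and the `tokenize` function are hypothetical names chosen for illustration, and the rule names simply mirror the answer's grammar. It also picks the first matching rule rather than ANTLR's longest match, which gives the same result here only because the rules within each mode cannot match the same input.

```python
import re

# Rules per mode, tried in order: (token name, regex, mode action).
# Token names mirror the answer's grammar; the actions imitate
# pushMode/popMode. This table is an illustration, not an ANTLR API.
RULES = {
    "DEFAULT": [
        ("STRING", re.compile(r"[a-z-]+"), None),
        ("LBRACK", re.compile(r"\["), ("push", "CharSet")),
    ],
    "CharSet": [
        ("DASH", re.compile(r"-"), None),
        ("NUMBER", re.compile(r"[0-9]+"), None),
        ("RBRACK", re.compile(r"\]"), ("pop", None)),
    ],
}


def tokenize(text):
    modes = ["DEFAULT"]  # mode stack; the top entry selects the active rules
    pos, tokens = 0, []
    while pos < len(text):
        # Try the active mode's rules in order and stop at the first match.
        for name, pattern, action in RULES[modes[-1]]:
            m = pattern.match(text, pos)
            if m:
                break
        else:
            raise ValueError(
                f"no rule in mode {modes[-1]} matches at offset {pos}"
            )
        tokens.append((name, m.group()))
        pos = m.end()
        # Apply the rule's mode action, if it has one.
        if action is not None:
            if action[0] == "push":
                modes.append(action[1])
            else:
                modes.pop()
    return tokens


print(tokenize("ab-cd[0-9]"))
```

Because the dash is only a separate DASH token inside CharSet mode, the leading ab-cd comes out as a single STRING token, which is the token stream the question asks for. Note that ANTLR 4 accepts mode declarations only in a lexer grammar, so the combined `grammar pattern;` from the question would have to be split into separate lexer and parser grammars to use this approach.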