The optimized Java grammar
This grammar, based on the optimized Java7 grammar by Terence Parr and Sam Harwell, is meant to parse the latest version of the Java language, and is optimized for performance, practical usage, and clarity.
It does not correspond exactly to the Java Language Specification (JLS). The java8, java9, and java20 grammars follow the JLS, but are slower than this grammar due to ambiguity and max-k problems in the published JLS EBNF.
This grammar parses the file ManyStringsConcat.java faster than the unoptimized java grammars, for two reasons. First, it implements operator precedence
using Antlr4-style alt ordering instead of a separate rule for each precedence level. Thus, it avoids
creating parse trees with long, single-child chains for each string literal constant in
ManyStringsConcat.java. Second, it avoids the large ATN-config set construction in the
AdaptivePredict() parsing engine.
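The difference between the two styles can be sketched with a small combined grammar. This is an illustration only: the grammar name PrecedenceSketch and all rule names below are hypothetical and are not copied from JavaParser.g4 or from the JLS-based grammars.

```antlr
grammar PrecedenceSketch; // hypothetical name, not part of this repo

// Alt-ordering style: one left-recursive rule. Alternatives listed
// earlier bind tighter, so '*' takes precedence over '+'.
// A lone string literal parses as: expression -> STRING_LITERAL.
expression
    : expression ('*' | '/' | '%') expression
    | expression ('+' | '-') expression
    | STRING_LITERAL
    ;

// Rule-per-level style: one rule for each precedence level.
// A lone string literal parses as the single-child chain:
// additiveExpression -> multiplicativeExpression -> primaryExpr -> STRING_LITERAL.
additiveExpression
    : multiplicativeExpression (('+' | '-') multiplicativeExpression)*
    ;

multiplicativeExpression
    : primaryExpr (('*' | '/' | '%') primaryExpr)*
    ;

primaryExpr
    : STRING_LITERAL
    ;

STRING_LITERAL : '"' (~["\\\r\n] | '\\' .)* '"' ;
WS             : [ \t\r\n]+ -> skip ;
```

With only two precedence levels the chain is short. A grammar that has a rule for every Java operator level produces a much longer chain for every operand, which is the overhead the alt-ordering style avoids.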
JDK Enhancement Proposals (JEPs) are not implemented in this grammar.
Currently supported Java version
- Java 24 (latest)
Main contributors
- Terence Parr, 2013
- Sam Harwell, 2013
- Ivan Kochurkin (Positive Technologies), 2017
- Michał Lorek, 2021
Tests
- See examples/
- OpenJDK 24, src/**/*.java (using Trash trgen to create app, then find ~/jdk-jdk-23-ga/src/ -name '*.java' | cygpath -w -f - | ./Test -x)
Benchmarks
Grammar performance has been tested on the following Java projects:
- OpenJDK 24
- Spring Framework
- Elasticsearch
- RxJava
- JUnit4
- Guava
- Log4j
See the benchmarks page for details.
Grammar style
Please use antlr-format with the formatting style config to reformat the grammar to the coding standard for the repo.
String literals
Generally, you can use either a string literal or the corresponding lexer rule name
(TOKEN_REF) directly in a parser rule for a token. It makes no difference because the
java/java/ grammar is a split Antlr4 grammar, and the Antlr Tool prevents you from defining
a token using a string literal in a parser rule (it outputs
"cannot create implicit token for string literal in non-combined grammar" if you try).
Every string literal in a parser rule therefore already has a matching lexer rule.
When writing an Antlr listener or visitor, use the corresponding lexer rule name for the
string literal used in the parser rule.
Currently, the grammar contains a mixture of string literals and lexer rule names in parser rules. If you want a parser grammar that removes all string literals from parser rules, use Trash trfoldlit. If you want a parser grammar that uses string literals where a lexer rule exists for the string literal, use Trash trunfoldlit.
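As a minimal sketch of the two spellings discussed in this section, consider the split grammar below. The file names, grammar names, and rule names are hypothetical and chosen for illustration; they are not taken from JavaLexer.g4 or JavaParser.g4.

```antlr
// --- SketchLexer.g4 (hypothetical file) ---
lexer grammar SketchLexer;

ADD            : '+' ;
SUB            : '-' ;
STRING_LITERAL : '"' (~["\\\r\n] | '\\' .)* '"' ;
WS             : [ \t\r\n]+ -> skip ;

// --- SketchParser.g4 (hypothetical file) ---
parser grammar SketchParser;

options {
    tokenVocab = SketchLexer;
}

// String-literal spelling: '+' and '-' resolve to the ADD and SUB
// tokens already defined in the lexer grammar.
sumWithLiterals
    : STRING_LITERAL (('+' | '-') STRING_LITERAL)*
    ;

// Lexer-rule-name spelling: accepts exactly the same input.
sumWithTokenNames
    : STRING_LITERAL ((ADD | SUB) STRING_LITERAL)*
    ;

// A literal with no lexer rule, such as '*' here, would be rejected by
// the Antlr Tool with the "cannot create implicit token" error.
```

Both parser rules see the same token types, so a listener or visitor compares against the lexer rule name (for example SketchParser.ADD in this sketch) regardless of which spelling the parser rule uses.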