How the javac compiler works

Everyone configures environment variables when starting to learn Java. To check whether the configuration succeeded, we always type two commands on the command line: java and javac. At first we did not know what these two commands meant. As we learned more, we came to know that the java command launches the runtime environment (so running it confirms that the runtime can be found), while javac is the compiler that compiles Java source code. Now let's take a closer look at javac, this seemingly magical compiler, and see what lies behind it.

The javac process

javac is a compiler: a program that translates one language specification into another. Its task is to translate Java source code into a language that the JVM can recognize; the JVM then converts that language into machine language that the current machine can execute. In this way the Java language shields developers from many details of the target machine, which makes the execution of Java programs platform-independent.

As shown in the following figure, the task of javac is to compile Java source code into Java bytecode (a binary stream), which is the language that the JVM can recognize.
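The source-to-bytecode step can be tried directly from Java code. The following is a minimal sketch, assuming the program runs on a JDK (not just a JRE) so that ToolProvider.getSystemJavaCompiler() returns a compiler. The names CompileToBytecode, compileAndReadMagic and the sample class Hello are made up for illustration. It compiles a one-line source file and reads the first four bytes of the resulting class file, which for a valid class file are the magic number 0xCAFEBABE.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class CompileToBytecode {

    /** Compiles a tiny (hypothetical) Hello class and returns the first four bytes of Hello.class. */
    static int compileAndReadMagic() {
        try {
            // Write the sample source into a fresh temporary directory.
            Path dir = Files.createTempDirectory("javac-demo");
            Path source = dir.resolve("Hello.java");
            String code = "public class Hello { int answer() { return 42; } }";
            Files.write(source, code.getBytes(StandardCharsets.UTF_8));

            // The system compiler is only available when running on a JDK.
            JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
            if (compiler == null) {
                throw new IllegalStateException("No system Java compiler found; run this on a JDK");
            }

            // Equivalent to: javac -d <dir> <dir>/Hello.java
            int exitCode = compiler.run(null, null, null, "-d", dir.toString(), source.toString());
            if (exitCode != 0) {
                throw new IllegalStateException("Compilation failed with exit code " + exitCode);
            }

            // Class files are big-endian, which is also ByteBuffer's default byte order.
            byte[] classBytes = Files.readAllBytes(dir.resolve("Hello.class"));
            return ByteBuffer.wrap(classBytes).getInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.printf("magic = 0x%08X%n", compileAndReadMagic()); // prints "magic = 0xCAFEBABE"
    }
}
```

The bytes after the magic number hold the class file version, the constant pool and the method bytecode, which is the part the JVM later interprets or compiles to machine code.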


To understand which working modules make up the javac compiler and how it is structured, we must first know its workflow and then analyze the details through that workflow. The javac workflow is shown in the following chart:


  • First step, lexical analysis: read the source code character by character and find the keywords defined by the language in the source file, for example if, else, for and while. This step also decides which occurrences of if are really keywords and which are not (for example, when the letters appear inside an identifier or a string literal).

    The result of lexical analysis is a normalized token stream extracted from the source code. It is like picking out the punctuation, subject, predicate, object and verb in a sentence.

  • Second step, syntax analysis: parse the token stream and check whether the tokens are combined in a way that conforms to the Java language specification, for example whether a for loop is well formed.

    The result of syntax analysis is an abstract syntax tree that conforms to the Java language specification. An abstract syntax tree is a structured syntactic representation that organizes the main lexical parts of the program into a tree. This is similar to discrete mathematics, where structures such as graphs and trees are used to express things in the real world that have complex relationships.

  • Third step, semantic analysis: the main task of semantic analysis is to transform difficult, complicated syntax into simpler syntax. It is like rewriting hard-to-understand classical Chinese as easy-to-understand vernacular.

    The result of semantic analysis is that complex syntax has been translated into the simplest syntax, for example by turning a lambda expression into a simpler structure with annotations attached. What comes out is an annotated abstract syntax tree, which is closer to the grammar rules of the target language.

  • Fourth step, code generation: the code generator produces bytecode from the annotated abstract syntax tree, that is, it transforms one data structure into another. It is like translating Chinese into another language.

Together, the modules of javac complete the task of converting Java source code into Java bytecode, so javac has four main modules: the lexical analyzer, the parser, the semantic analyzer and the code generator.
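These phases can be observed one at a time through the compiler tree API. The following is a rough sketch, assuming a JDK 9 or later (on JDK 8 the com.sun.source classes come from tools.jar, which must be on the class path) and assuming the system compiler returns a task that can be cast to com.sun.source.util.JavacTask, as the standard javac does. JavacPhases, StringSource and runPhases are illustrative names. Note that parse() covers both lexical and syntax analysis, because javac's parser pulls tokens from the lexer as it goes.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.util.JavacTask;

import javax.lang.model.element.Element;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class JavacPhases {

    /** A source file held in memory, so the demo needs no .java file on disk. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Runs parse, analyze and generate separately and records what each phase produced. */
    static List<String> runPhases(String className, String code) {
        List<String> log = new ArrayList<>();
        try {
            JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
            String outDir = Files.createTempDirectory("javac-phases").toString();
            JavacTask task = (JavacTask) compiler.getTask(
                    null, null, null,
                    Arrays.asList("-d", outDir),
                    null,
                    Collections.singletonList(new StringSource(className, code)));

            // Lexical + syntax analysis: source text -> abstract syntax tree.
            for (CompilationUnitTree unit : task.parse()) {
                log.add("parsed: " + unit.getTypeDecls().size() + " type declaration(s)");
            }
            // Semantic analysis: the tree is attributed with symbols and types.
            for (Element element : task.analyze()) {
                log.add("analyzed: " + element.getKind() + " " + element.getSimpleName());
            }
            // Code generation: annotated tree -> class files.
            for (JavaFileObject output : task.generate()) {
                log.add("generated: " + output.getKind());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        runPhases("Demo", "class Demo { int twice(int x) { return x * 2; } }")
                .forEach(System.out::println);
    }
}
```

Each log line corresponds to the output of one stage in the workflow above, so a syntax error in the sample source stops the run at the parse stage and a type error stops it at the analyze stage.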

The roles of rt.jar, tools.jar and dt.jar in the JDK (JDK 8 and earlier)
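  • rt.jar is the Java base class library. It is located in the /jre/lib directory and is loaded by the boot class loader (Bootstrap ClassLoader).

  • dt.jar is a class library of design-time information (mainly BeanInfo classes for Swing components) used by development tools. It is located in the /lib directory and, when it is on the class path, is loaded by the application class loader (Application ClassLoader).

  • tools.jar is a tool library that is required for compiling and for running the JDK tools. Its classes include sun.tools.javac.*, and the current javac implementation lives in com.sun.tools.javac.*. It is located in the /lib directory and, when it is on the class path, is loaded by the application class loader (Application ClassLoader).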
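Which class loader loaded a given class can be checked at run time. The following is a small sketch; WhoLoadedMe and loaderOf are illustrative names. It relies on the documented behaviour that Class.getClassLoader() returns null for classes loaded by the bootstrap class loader, which is the case for core classes such as java.lang.String.

```java
public class WhoLoadedMe {

    /** Returns "bootstrap" for bootstrap-loaded classes, otherwise the loader's class name. */
    static String loaderOf(Class<?> type) {
        ClassLoader loader = type.getClassLoader();
        // A null loader is how the JDK reports the bootstrap class loader.
        return loader == null ? "bootstrap" : loader.getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("java.lang.String -> " + loaderOf(String.class));
        System.out.println("WhoLoadedMe      -> " + loaderOf(WhoLoadedMe.class));
        System.out.println("system loader    -> "
                + ClassLoader.getSystemClassLoader().getClass().getName());
    }
}
```

The exact class name printed for the application class loader differs between JDK versions, so only the "bootstrap" result for core classes should be relied on.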