Parsing and Modifying Code

Parsing code is a well understood problem. Grammar rules match the input text and when matched produce values created from the text they match. These values are assembled into an "abstract syntax tree" (or AST) which stores the information you need in a structured representation.

But when it comes to taking the AST and turning it back into code, with most parsers you start from scratch and write new code that expresses the inverse of those rules. That process produces a new file which works the same but looks different since it omits comments, and whitespace - information not preserved in the AST.

StrataCode's parser, Parselets, parses code using a standard PEG parser. And like like a few of the newer parsers, you can declaratively create the AST directly from the grammar. But unlike other parsers, the same grammar lets you go in the reverse direction as well. You can take the AST and re-generate the code. This is particularly powerful when you are making incremental changes to the AST - as is required by a code-processing engine - to keep source code changes localized. When you add or replace a statement, you don't affect the code around it.

Parselets additionally supports error recovery which is robust enough to be used in an IDE for partial file parsing and completion with reasonable error highlighting.

It also includes the ability to take diffs between one version of a file and the next and to do a fast 'reparse' of the AST - useful for fast editing of large files in the IDE.

The features behind parselets power a good deal of the IntelliJ plugin. Any language written in parselets can use those features making it much easier to build IDEs for new languages. Unlike most plugins, the most challenging IDE features were built using parselets, and the same code that powers the management UI framework, and the dynamic runtime. This makes the same features available in the code editing and navigation features found in the management UI and available for future IDE support.