Update grammars
Each programming language needs to be parsed in order to extract its
syntax and semantics; the rules that drive this parsing are the so-called
grammar of a language.
In big-code-analysis, we use
tree-sitter as the parsing library, since
it provides a distinct grammar for each of our supported
programming languages. But a grammar is not a static monolith: it
changes over time and can be affected by bugs, so it
needs to be updated every now and then.
For now, because the update operations are automated with bash scripts,
grammars can be updated natively only on Linux and macOS
systems, but these scripts can also run on Windows using WSL.
In big-code-analysis we use both third-party and internal grammars.
The former are published on crates.io and maintained by external developers,
while the latter have been designed and defined inside the project to manage variants of some languages
used in Firefox.
The following sections explain how to update both kinds.
Third-party grammars
Update the grammar version in Cargo.toml and enums/Cargo.toml.
Below is an example for the tree-sitter-java grammar:
tree-sitter-java = "x.xx.x"
where each x represents a digit.
Run ./recreate-grammars.sh to recreate and refresh all grammar
structures and data:
./recreate-grammars.sh
Once the script above has finished, fix any failing tests and other problems introduced by the grammar changes.
Commit your changes and create a new pull request.
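Because the first step touches two manifests, it is easy to update one and forget the other. The following is a minimal sketch of a pre-flight check, not part of the project's tooling: `dep_version` and `same_version` are hypothetical helper names, and the sketch assumes the dependency is written in the simple `crate = "version"` form shown above.

```shell
# Hypothetical helper: print the version requirement that a manifest
# declares for a crate. Assumes the simple `crate = "version"` form.
dep_version() {
    # $1 = path to a Cargo.toml, $2 = crate name
    sed -n "s/^$2[[:space:]]*=[[:space:]]*\"\([^\"]*\)\".*/\1/p" "$1" | head -n 1
}

# Hypothetical check: succeed only when both manifests declare the same,
# non-empty version for the crate.
same_version() {
    # $1 = crate name, $2 and $3 = manifest paths
    first=$(dep_version "$2" "$1")
    second=$(dep_version "$3" "$1")
    [ -n "$first" ] && [ "$first" = "$second" ]
}

# Possible usage from the repository root:
#   same_version tree-sitter-java Cargo.toml enums/Cargo.toml \
#       && ./recreate-grammars.sh
```

A dependency declared in table form (for example with a `version` key inside braces) would not be matched by this sketch and would need a real TOML parser.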
Internal grammars
Update the version of tree-sitter-cli in the package.json file of
the internal grammar and then install the updated version.
The five vendored grammars are published under the bca-tree-sitter-*
namespace (see RELEASING.md for the rename rationale), but consumer
call sites still reference them as tree-sitter-<lang> via Cargo's
package = ... alias. A grammar refresh does not bump the leaf's
version on its own — every crate in this repository shares one
workspace-wide version, and bumping the leaves out of step with the
parent is not allowed (see the "Lockstep version policy" in
RELEASING.md). Regenerate the parser tables, accept the resulting
test-snapshot drift, and ship the change under the current version.
The next workspace release picks up the new grammars at whatever
shared version the next tag declares.
If a regeneration also needs an updated tree-sitter runtime
dependency, bump the dev-dependency line inside the leaf's
Cargo.toml:
[dev-dependencies]
tree-sitter = "=x.x.x"
Leave [package] name = "bca-tree-sitter-<lang>",
[package] version, and [lib] name = "tree_sitter_<lang>"
untouched — the rename trick in [lib] is what keeps Rust import
paths stable, and the version line is managed by the lockstep
bump at release time.
Run the appropriate script to update the grammar by recreating and refreshing every file and script.
For tree-sitter-ccomment and tree-sitter-preproc run
./generate-grammars/generate-grammar.sh followed by the name of the
grammar.
Below is an example using the tree-sitter-ccomment grammar:
./generate-grammars/generate-grammar.sh tree-sitter-ccomment
For tree-sitter-mozcpp and tree-sitter-mozjs, use their dedicated scripts instead.
For tree-sitter-mozcpp, run
./generate-grammars/generate-mozcpp.sh
For tree-sitter-mozjs, run
./generate-grammars/generate-mozjs.sh
Once the script above has finished, fix any failing tests and other problems introduced by the grammar changes.
Commit your changes and create a new pull request.
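The choice of script in the steps above depends only on the grammar name, so it can be captured in a small dispatcher. This is an illustrative sketch rather than an existing project script: `generator_command` is a hypothetical name, and it covers only the four grammars named in this section.

```shell
# Hypothetical dispatcher: print the documented command for an internal grammar.
generator_command() {
    # $1 = grammar name, e.g. tree-sitter-ccomment
    case "$1" in
        tree-sitter-ccomment|tree-sitter-preproc)
            # These two share the generic script, which takes the grammar name.
            echo "./generate-grammars/generate-grammar.sh $1" ;;
        tree-sitter-mozcpp)
            echo "./generate-grammars/generate-mozcpp.sh" ;;
        tree-sitter-mozjs)
            echo "./generate-grammars/generate-mozjs.sh" ;;
        *)
            # Any other name is rejected instead of guessing a script.
            echo "unknown internal grammar: $1" >&2
            return 1 ;;
    esac
}

# Possible usage from the repository root:
#   cmd=$(generator_command tree-sitter-mozjs) && $cmd
```

Printing the command instead of running it keeps the function free of side effects, so the mapping can be inspected before anything is regenerated.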