Per-language Cargo features

Every tree-sitter grammar this library bundles is gated behind its own Cargo feature. The default feature set is all-languages, so a plain dependency declaration such as

[dependencies]
big-code-analysis = "1.1.0"

pulls every grammar in — matching the library's historical behaviour and what the bca / bca-web binaries themselves ship with. The cost is concrete: every grammar crate compiles when the library compiles, and every grammar's parsing tables stay live in the final binary.

Library consumers that only need a subset of languages can opt out of the defaults and re-enable just the grammars they care about.

A worked example

A downstream service that only analyses Rust and TypeScript:

[dependencies]
big-code-analysis = { version = "1.1.0", default-features = false, features = ["rust", "typescript"] }

The library still compiles, the LANG enum still has every variant, and analyze / metrics_from_tree / the rest of the dispatch surface still work for the enabled languages.

Supported features

The following per-language features are available. Each feature pulls in the matching grammar crate (and any helper grammars the per-language pipeline depends on).

| Feature    | Grammar crates pulled in |
|------------|--------------------------|
| bash       | tree-sitter-bash |
| cpp        | bca-tree-sitter-mozcpp, bca-tree-sitter-ccomment, bca-tree-sitter-preproc (covers the Cpp, Ccomment, and Preproc variants) |
| csharp     | tree-sitter-c-sharp |
| elixir     | tree-sitter-elixir |
| go         | tree-sitter-go |
| groovy     | dekobon-tree-sitter-groovy |
| java       | tree-sitter-java |
| javascript | tree-sitter-javascript |
| kotlin     | tree-sitter-kotlin-ng |
| lua        | tree-sitter-lua |
| mozjs      | bca-tree-sitter-mozjs |
| perl       | tree-sitter-perl |
| php        | tree-sitter-php |
| python     | tree-sitter-python |
| ruby       | tree-sitter-ruby |
| rust       | tree-sitter-rust |
| tcl        | bca-tree-sitter-tcl |
| typescript | tree-sitter-typescript (used by both the Typescript and Tsx variants) |

The umbrella all-languages feature enables every entry in this table. The bca-tree-sitter-* crates are in-tree forks of the upstream Mozilla / community grammars; the Rust import path remains tree_sitter_<lang> regardless. See RELEASING.md for the rename rationale and the workspace package = ... alias trick that keeps consumer call sites unchanged.

What happens when a feature is off

The LANG enum keeps every variant defined regardless of the active feature set — disabling a feature does not change the enum surface, the per-language *Code / *Parser type aliases, or any of the file-extension / emacs-mode detection helpers. Selecting a LANG whose feature is off only affects the dispatch path.

Every dispatch entry point that returns a Result surfaces the disabled state as Err(MetricsError::LanguageDisabled(LANG)).
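A caller will usually want to treat this error as "skip this input" rather than as a fatal failure. The sketch below illustrates that pattern with stand-in types so that it compiles on its own: Lang, the single-variant MetricsError, the analyze function (which only counts lines), and analyze_or_skip are all hypothetical and do not reproduce the library's real signatures. Only the LanguageDisabled variant name and the is_enabled idea are taken from this page.

```rust
// Stand-in types so this sketch compiles on its own. In real code, LANG and
// MetricsError come from big_code_analysis; everything else is hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Lang {
    Rust,
    Typescript,
    Python,
}

#[derive(Debug, PartialEq, Eq)]
enum MetricsError {
    LanguageDisabled(Lang),
}

impl Lang {
    // Assumption: pretend this build enabled only `rust` and `typescript`.
    fn is_enabled(self) -> bool {
        matches!(self, Lang::Rust | Lang::Typescript)
    }
}

// Hypothetical dispatcher; it counts lines as a placeholder for real metrics.
fn analyze(lang: Lang, source: &str) -> Result<usize, MetricsError> {
    if !lang.is_enabled() {
        return Err(MetricsError::LanguageDisabled(lang));
    }
    Ok(source.lines().count())
}

// Treat a disabled language as "skip this input" rather than a hard failure.
fn analyze_or_skip(lang: Lang, source: &str) -> Option<usize> {
    match analyze(lang, source) {
        Ok(line_count) => Some(line_count),
        Err(MetricsError::LanguageDisabled(disabled)) => {
            eprintln!("skipping input: {:?} is not compiled into this build", disabled);
            None
        }
    }
}
```

With these stand-ins, analyze_or_skip(Lang::Python, ...) returns None and logs a message, while the Rust and Typescript inputs are analysed normally. Matching on the variant explicitly keeps the "feature is off" case separate from any other error the real MetricsError may carry.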

Callers can query the compiled-in set without going through a dispatcher:

use big_code_analysis::LANG;

fn main() {
    // Print every language whose grammar is compiled into this build.
    for lang in LANG::into_enum_iter() {
        if lang.is_enabled() {
            println!("{:?} is compiled in", lang);
        }
    }
}

This pairs well with the get_language_for_file / guess_language helpers, which still hand back any LANG variant for a recognised extension — callers walking a directory may want to skip files whose language is not enabled in the current build.
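The filtering step described above might look like the following sketch. It is deliberately self-contained: Lang, its is_enabled method, detect_language, and analysable_files are hypothetical stand-ins, since the real signatures of get_language_for_file and guess_language are not shown on this page. The assumption baked into the stand-in is a build with only the rust and typescript features enabled.

```rust
use std::path::{Path, PathBuf};

// Stand-in for the library's LANG enum; hypothetical and reduced to three variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Lang {
    Rust,
    Typescript,
    Python,
}

impl Lang {
    // Assumption: pretend this build enabled only `rust` and `typescript`.
    fn is_enabled(self) -> bool {
        matches!(self, Lang::Rust | Lang::Typescript)
    }
}

// Hypothetical extension lookup. Like the real detection helpers, it returns
// a variant for any recognised extension, whether or not it is enabled.
fn detect_language(path: &Path) -> Option<Lang> {
    match path.extension()?.to_str()? {
        "rs" => Some(Lang::Rust),
        "ts" => Some(Lang::Typescript),
        "py" => Some(Lang::Python),
        _ => None,
    }
}

// Keep only the files whose language is both recognised and compiled in.
fn analysable_files(paths: &[PathBuf]) -> Vec<(PathBuf, Lang)> {
    paths
        .iter()
        .filter_map(|path| detect_language(path).map(|lang| (path.clone(), lang)))
        .filter(|(_, lang)| lang.is_enabled())
        .collect()
}
```

Given src/main.rs, tool.py, ui/app.ts, and README as input, this keeps only the Rust and TypeScript files: tool.py is recognised but its language is disabled, and README has no extension to match. Filtering before dispatch avoids producing one LanguageDisabled error per skipped file.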

Stability

Per-language features are themselves stable. Adding or removing a language feature in the future is a minor-bump break (it changes which LANG variants the default build covers); changes to the default feature set will be flagged in the changelog under (breaking).