Walking the AST directly
Ast::parse gives you a parsed
tree_sitter::Tree together with the source bytes it was
parsed from; Ast::as_tree_sitter hands that tree out as a
borrowed reference. This chapter shows how to use it to drive your own
syntax-tree analysis — counting node kinds, finding constructs by name,
detecting parse errors, or pulling out a symbol table — without paying
for a second parse.
When to use this
Reach for direct AST traversal when:
- You want to count or find syntactic constructs in-process. The CLI
equivalents (
bca count <kind>,bca find <kind>, recipe) shell out per file; the library path is one parse and one Rust loop. - You want to detect parse errors programmatically. Tree-sitter
emits a synthetic
ERRORnode anywhere the grammar could not match;Node::has_erroris O(1) — tree-sitter caches the error bit on every node — so the check is free even on a multi-MB source file. - You want to mix metrics with custom analysis in one parse — e.g. capture metric values and a list of function names for a coverage mapping, an IDE outline, or a code-owner report.
If you only need standard metrics, stay with analyze or
Ast::metrics — they walk the tree for you. The direct
path is for things the metric walker does not already compute.
Use the re-exported tree_sitter
Import tree_sitter from
big_code_analysis::tree_sitter rather than adding a sibling
tree-sitter dependency. The re-export is pinned to the exact version
the metric walker was built against, so the Tree types agree by
definition. See Reusing an existing tree-sitter
Tree and
Stability and versioning for the value-not-stable
posture this re-export carries.
A reusable DFS walker
Most of the examples below need a depth-first traversal of every
descendant. Tree-sitter ships a TreeCursor that does
this in O(1) per step (no allocations beyond the cursor itself). The
canonical walk is short enough to inline:
#![allow(unused)] fn main() { use big_code_analysis::tree_sitter; /// Visit every node in `tree` in pre-order, root first, passing each /// node to `visit`. Allocation-free apart from the cursor itself. fn walk_preorder<F: FnMut(tree_sitter::Node<'_>)>( tree: &tree_sitter::Tree, mut visit: F, ) { let mut cursor = tree.walk(); 'walk: loop { visit(cursor.node()); if cursor.goto_first_child() { continue; } loop { if cursor.goto_next_sibling() { continue 'walk; } if !cursor.goto_parent() { return; } } } } }
The pattern is: visit, descend, climb back up while there is no next
sibling, repeat. Every example in this chapter is a thin wrapper around
this walker — the code fences below are marked ignore because they
assume walk_preorder is already in scope; the matching set of tests
in tests/book_ast_traversal_examples.rs keeps them
honest, so a refactor that broke an example would fail cargo test.
Count nodes by kind
Library equivalent of bca count if_expression for_expression while_expression from the
AST-queries recipe:
use big_code_analysis::{Ast, LANG, Source};
use std::collections::HashMap;
let ast = Ast::parse(Source::new(
LANG::Rust,
b"fn a() { if true { 1 } else { 2 } } fn b() { for _ in 0..10 {} }",
))
.expect("rust feature enabled");
let mut counts: HashMap<&str, usize> = HashMap::new();
walk_preorder(ast.as_tree_sitter(), |node| {
*counts.entry(node.kind()).or_default() += 1;
});
assert_eq!(counts.get("if_expression").copied().unwrap_or(0), 1);
assert_eq!(counts.get("for_expression").copied().unwrap_or(0), 1);
The string keys ("if_expression", "for_expression", …) are the
tree-sitter grammar's node-type names. The fastest way to discover them
for a new language is bca --paths sample.rs dump, which prints the
full AST.
Anonymous tokens. The walker visits every node tree-sitter emits, including anonymous tokens like
"{",";", and keyword literals. The targetedcounts.get("if_expression")lookups above are unaffected — anonymous tokens have different kind names — butcounts.values().sum()would be much larger than the count of named grammar productions. Filter withtree_sitter::Node::is_named()inside the visitor if you only want named nodes.
Find nodes by kind
Library equivalent of bca find unsafe_block:
use big_code_analysis::{Ast, LANG, Source};
let ast = Ast::parse(Source::new(
LANG::Rust,
b"fn safe() {} fn risky() { unsafe { } }",
))
.expect("rust feature enabled");
let source = ast.source();
// Captured slices borrow from `source` — no per-hit `String` allocation.
let mut hits: Vec<((usize, usize), &str)> = Vec::new();
walk_preorder(ast.as_tree_sitter(), |node| {
if node.kind() == "unsafe_block" {
let span = (node.start_position().row, node.end_position().row);
let text = node
.utf8_text(source)
.expect("source is valid utf-8");
hits.push((span, text));
}
});
assert_eq!(hits.len(), 1);
Node::utf8_text(&source[..]) slices the source bytes by the node's
byte range. Pair it with Ast::source — for C++ with
preprocessor inputs supplied to Ast::parse, source is
the expanded buffer the parser actually saw, not the original input
(see the C++ preprocessor note).
Detect parse errors
Tree-sitter is lossless: even on malformed input it returns a tree, but nodes that could not be matched are tagged as errors. The cheapest check is on the root:
#![allow(unused)] fn main() { use big_code_analysis::{Ast, LANG, Source}; let ast = Ast::parse(Source::new(LANG::Rust, b"fn broken(")) .expect("rust feature enabled"); // Walks far enough to confirm something went wrong, but does not // enumerate every error site. assert!(ast.as_tree_sitter().root_node().has_error()); }
To list the offending nodes, walk the tree and check each:
use big_code_analysis::{Ast, LANG, Source};
let ast = Ast::parse(Source::new(LANG::Rust, b"fn broken("))
.expect("rust feature enabled");
let mut error_lines = Vec::new();
walk_preorder(ast.as_tree_sitter(), |node| {
if node.is_error() || node.is_missing() {
error_lines.push(node.start_position().row);
}
});
assert!(!error_lines.is_empty());
Node::is_error() flags the synthetic ERROR node tree-sitter inserts
where it could not match the grammar; Node::is_missing() flags
phantom nodes the parser invented to recover from a missing token. The
CLI's bca find ERROR recipe uses the same nodes.
Combine metrics with a custom walk
The whole point of Ast is parse-once / compute-many. A
realistic pipeline computes metrics and extracts a symbol table from
the same parse:
use big_code_analysis::{Ast, LANG, MetricsOptions, Source};
let ast = Ast::parse(Source::new(
LANG::Rust,
b"fn outer() { fn inner() {} } fn alone() {}",
))
.expect("rust feature enabled");
// One parse: metrics walker uses it…
let space = ast
.metrics(MetricsOptions::default())
.expect("walker succeeds");
// …and so does the custom walk, against the very same tree. The
// captured names borrow from `source` rather than allocating a fresh
// `String` per function — the same pattern as `find_unsafe_blocks`
// above.
let source = ast.source();
let mut functions: Vec<&str> = Vec::new();
walk_preorder(ast.as_tree_sitter(), |node| {
if node.kind() == "function_item"
&& let Some(name_node) = node.child_by_field_name("name")
{
let name = name_node
.utf8_text(source)
.expect("source is valid utf-8");
functions.push(name);
}
});
assert_eq!(space.metrics.nom.functions_sum(), 3.0);
assert_eq!(functions, ["outer", "inner", "alone"]);
Node::child_by_field_name walks the named grammar fields — the same
fields that show up in the FieldName column when you run
bca --paths sample.rs dump. Field-based lookup is more robust than
positional indexing because it does not depend on which children the
grammar emits for anonymous tokens (commas, parentheses, …).
Want a serializable JSON tree?
For pipelines that want a structured AST as data — diffing, queries on
the wire, language-agnostic schema work — the
AstCallback / AstNode family materializes
the tree as a Serialize-able struct. This is what the REST /ast
endpoint produces (bca dump uses a separate Dump callback that
writes a human-readable form to stdout). Library consumers can call
the JSON-shaped callback directly:
#![allow(unused)] fn main() { use std::path::PathBuf; use big_code_analysis::{ AstCallback, AstCfg, AstPayload, LANG, action, }; let payload = AstPayload { id: "snippet".to_owned(), file_name: "snippet.rs".to_owned(), code: "fn f() {}".to_owned(), comment: false, span: true, }; let cfg = AstCfg { id: payload.id.clone(), comment: payload.comment, span: payload.span, }; let response = action::<AstCallback>( &LANG::Rust, payload.code.into_bytes(), &PathBuf::from(&payload.file_name), None, cfg, ); let json = serde_json::to_string(&response).expect("AstResponse serializes"); println!("{json}"); }
For one-off in-process work, the as_tree_sitter() walker above is
cheaper (no allocation per node). Reach for AstCallback when you
need a serializable owned tree.
Out of scope
- Incremental reparse — tree-sitter supports
tree_sitter::InputEditfor incremental updates, butAstis a snapshot. To reflect a source edit, build a freshAst::parseor callParser::parse(&new_source, Some(&old_tree))directly via the re-exportedtree_sitterand feed the result throughAst::from_tree_sitter. - The crate-internal
big_code_analysis::Nodewrapper. It is exposed for the metric walker's traversal needs, but most of its traversal methods (kind,child_count,children,cursor, …) staypub(crate). Library consumers should reach the tree-sitterNodethroughas_tree_sitter().root_node()— that is the documented seam.