Operating bca-web

bca-web is the HTTP daemon that wraps the big-code-analysis library, exposing comment removal, function spans, AST dumps, maintainability metrics, and change-history (VCS) metrics over a REST API. This page is for the operator running the daemon: how to build and start it, which flags and environment variables tune it, and the trust boundaries to respect before exposing it.

It is the operations companion to two reference pages. The REST API reference documents every endpoint, its request and response shapes, and the error contract. The "Driving the REST API" recipe shows end-to-end curl calls. This page covers the process itself and links to those two rather than repeating them.

Build and run

bca-web is the binary of the big-code-analysis-web crate. From a checkout, run it through Cargo:

cargo run -p big-code-analysis-web -- --host 127.0.0.1 --port 8080

To install the binary on your PATH, build a release artifact and copy it out, or install from the crate:

cargo install big-code-analysis-web   # installs the `bca-web` command
bca-web --host 127.0.0.1 --port 8080

bca-web binds the requested address, serves until interrupted, and exits non-zero if it cannot bind the port or hits an I/O error, so a supervisor (systemd, a container orchestrator, or a CI smoke check) sees the failure and can restart or alert.
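Where no supervisor is available, the restart behaviour described above can be approximated with a small wrapper. This is a minimal sketch, not part of bca-web: the supervise function and the RESTART_DELAY variable are hypothetical names, and it assumes that an exit status of 0 means a clean stop that should not be restarted.

```shell
# Run a command and restart it when it exits non-zero.
# $1 is the maximum number of restarts; the rest is the command line.
# Returns 0 if the command exits 0, otherwise the last failing status.
supervise() {
    sv_max=$1
    shift
    sv_attempt=0
    while :; do
        "$@"
        sv_status=$?
        [ "$sv_status" -eq 0 ] && return 0
        sv_attempt=$((sv_attempt + 1))
        if [ "$sv_attempt" -gt "$sv_max" ]; then
            echo "giving up after $sv_attempt failed starts (exit $sv_status)" >&2
            return "$sv_status"
        fi
        echo "exit $sv_status; restart $sv_attempt of $sv_max" >&2
        # RESTART_DELAY is a hypothetical knob: seconds to wait between starts.
        sleep "${RESTART_DELAY:-1}"
    done
}

# Usage (not executed here):
#   supervise 3 bca-web --host 127.0.0.1 --port 8080
```

A real supervisor such as systemd adds back-off, logging, and alerting that this loop does not attempt.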

Building with a subset of languages does not work

The shipped bca-web binary compiles in every supported tree-sitter grammar. The big-code-analysis-web crate pins the library's all-languages feature set explicitly, so passing --no-default-features or a custom --features list to cargo build -p big-code-analysis-web does not drop grammars from the resulting binary. Silently dropping a grammar from a user-facing daemon would surface as "language X stopped working" at request time rather than as a build error, so the crate forbids it (issue #252).

If you need a reduced grammar set, embed the big-code-analysis library in your own Rust code and select features in your own Cargo.toml. The per-language Cargo features chapter lists every feature with a worked example.

Command-line flags

The full flag set, with defaults:

Flag                           Default      Purpose
-j, --num-jobs <N|auto>        auto         Worker-thread count. auto resolves to the OS-reported effective CPU count.
--host <HOST>                  127.0.0.1    Address to bind.
-p, --port <PORT>              8080         TCP port.
--parse-timeout-secs <SECS>    30           Per-parse deadline. 0 disables it.
--cors <ORIGINS>               off          Enable CORS for a comma-separated origin allow-list.
-h, --help                                  Print help and exit.
-V, --version                               Print version and exit.

--num-jobs auto is cgroup-quota- and cpuset-aware on Linux: in a container with a CPU quota it resolves to the quota rather than the host's physical core count, matching the bca CLI's --num-jobs. This count sizes the worker pool and the parse-admission semaphore, so it caps how many parses run concurrently. The minimum is 1; 0 is rejected when the command line is parsed.
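To see which quota a container imposes before relying on auto, you can read the cgroup v2 cpu.max file, which holds a quota and a period on one line. The sketch below only reports the quota-to-period ratio; the cpu_quota helper is a hypothetical name, the file path may differ on your system, and how bca-web rounds a fractional quota is not stated here, so treat the result as an estimate.

```shell
# Print the CPU quota encoded in one cgroup v2 cpu.max line, "<quota> <period>".
# A quota field of "max" means no limit is set.
cpu_quota() {
    printf '%s\n' "$1" | awk '{
        if ($1 == "max") { print "unlimited" } else { printf "%.2f\n", $1 / $2 }
    }'
}

# On a cgroup v2 host (not executed here):
#   cpu_quota "$(cat /sys/fs/cgroup/cpu.max)"
cpu_quota "200000 100000"   # prints 2.00
cpu_quota "max 100000"      # prints unlimited
```

If the estimate does not match the concurrency you want, pass an explicit --num-jobs value instead of auto.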

--parse-timeout-secs bounds how long a single parse may run before the request returns 504 Gateway Timeout. The default of 30 guards against a pathological input wedging a worker indefinitely. Setting it to 0 removes the deadline and, with it, the load-shedding described below; use 0 only when an unbounded parse is acceptable. See the REST API reference for the response body the timeout returns.
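A client script can branch on the status codes this page describes: 504 for a parse that hit the deadline, 503 for load-shedding, and 413 for an oversized body. The helper below is a hypothetical sketch; the labels it prints and the suggested reactions are illustrative choices, not part of the bca-web contract.

```shell
# Map an HTTP status from bca-web to a short label a script can act on.
classify_status() {
    case "$1" in
        2??) echo "ok" ;;
        413) echo "too-large" ;;   # body over the size limit: shrink or split the input
        503) echo "shed" ;;        # orphaned-task cap reached: back off and retry later
        504) echo "timeout" ;;     # parse exceeded --parse-timeout-secs
        *)   echo "other" ;;       # consult the REST API reference
    esac
}

# Usage with curl (not executed here; $url and body.json are placeholders):
#   status=$(curl -s -o /dev/null -w '%{http_code}' --data-binary @body.json "$url")
#   classify_status "$status"
classify_status 504   # prints timeout
```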

CORS is off by default. The CORS section of the reference documents it in full, covering preflight handling, the wildcard form, and the absence of credentials. The short version: pass --cors with an explicit origin allow-list to let browser tooling read responses; omit it to emit no Access-Control-* headers at all.

Environment variables

Variable                  Default                 Purpose
BCA_MAX_ORPHANED_TASKS    max(num_jobs * 2, 4)    Cap on orphaned (timed-out but still-running) parse tasks before new requests are shed with 503.
RUST_LOG                  info                    Log filter for the tracing subscriber.

RUST_LOG uses the EnvFilter syntax (for example RUST_LOG=big_code_analysis_web=debug). The daemon emits one access-log line per completed request, carrying the method, route, status, and latency.

Resource limits and back-pressure

Two limits protect the daemon from a single client exhausting it.

Request body size. Every endpoint rejects a request body larger than 4 MiB with 413 Payload Too Large. The limit is the same for the JSON and raw-octet-stream content types.
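A client can check the body size before sending rather than waiting for a 413. The sketch below assumes the threshold is exactly 4 * 1024 * 1024 bytes and that a body of exactly that size is accepted, which follows from "larger than 4 MiB"; body_fits and MAX_BODY_BYTES are hypothetical names. Note that for the JSON content type the limit applies to the whole request body, not just the embedded source text.

```shell
MAX_BODY_BYTES=$((4 * 1024 * 1024))   # 4 MiB = 4194304 bytes

# Succeed when a body of the given byte count is within the limit.
body_fits() {
    [ "$1" -le "$MAX_BODY_BYTES" ]
}

# Usage with a request body stored in a file (not executed here; the file
# name is a placeholder):
#   body_fits $(wc -c < request.json) || echo "body would be rejected with 413" >&2
```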

Orphaned-task admission control. When a parse exceeds --parse-timeout-secs, the request returns 504, but the blocking thread keeps running on the pool until the parse finishes on its own, because tree-sitter cannot be interrupted mid-parse. To stop sustained pathological input from piling up unbounded background work, new requests are rejected with 503 Service Unavailable once the count of orphaned tasks reaches a soft cap. The cap defaults to max(num_jobs * 2, 4) and is overridable through BCA_MAX_ORPHANED_TASKS (parsed as an unsigned integer; an invalid or zero value falls back to the default). Setting --parse-timeout-secs 0 disables this mechanism entirely, since with no deadline no task is ever orphaned.
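The fallback rule for the cap can be expressed as a short function, useful for predicting the effective value in a deployment script. This is a sketch of the rule as described above, not the daemon's code: the function names are hypothetical, and it treats anything other than a plain run of digits as invalid, which may be stricter than the daemon's own integer parser.

```shell
# Default cap: max(num_jobs * 2, 4).
default_orphan_cap() {
    oc_cap=$(($1 * 2))
    [ "$oc_cap" -lt 4 ] && oc_cap=4
    echo "$oc_cap"
}

# Effective cap for a BCA_MAX_ORPHANED_TASKS value ($1) and a job count ($2).
# An empty, non-numeric, or zero value falls back to the default.
resolve_orphan_cap() {
    case "$1" in
        ''|*[!0-9]*)
            default_orphan_cap "$2"
            ;;
        *)
            if [ "$1" -eq 0 ]; then
                default_orphan_cap "$2"
            else
                echo "$1"
            fi
            ;;
    esac
}

resolve_orphan_cap ""    8   # prints 16 (unset: default)
resolve_orphan_cap "0"   8   # prints 16 (zero: default)
resolve_orphan_cap "abc" 1   # prints 4  (invalid: default, floor of 4)
resolve_orphan_cap "32"  8   # prints 32
```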

Security and trust boundaries

bca-web has no authentication, authorization, or rate limiting of its own. The defaults are chosen for a local, single-operator deployment; widen them deliberately.

Default bind is loopback. The server binds 127.0.0.1 unless --host says otherwise. Keep it there; if the daemon must be reachable over a network, put an authenticating proxy in front of it first. Binding 0.0.0.0 makes every capability below reachable by anyone who can route to the port.
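A launch script can make widening the bind address a deliberate act. The guard below is a hypothetical sketch: is_loopback_host, check_bind, and the ALLOW_PUBLIC_BIND variable are illustrative names, and the pattern match is deliberately crude (it recognises only 127.x.x.x-style addresses and ::1).

```shell
# Succeed when the address looks like a loopback address.
is_loopback_host() {
    case "$1" in
        127.[0-9]*|::1) return 0 ;;
        *) return 1 ;;
    esac
}

# Refuse a non-loopback bind unless the operator opts in explicitly.
check_bind() {
    if is_loopback_host "$1" || [ "${ALLOW_PUBLIC_BIND:-0}" = "1" ]; then
        return 0
    fi
    echo "refusing to bind $1 without ALLOW_PUBLIC_BIND=1" >&2
    return 1
}

# Usage (not executed here):
#   check_bind "$host" && exec bca-web --host "$host" --port 8080
```

The opt-in variable is no substitute for the authenticating proxy; it only stops an accidental 0.0.0.0 from slipping through.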

CORS is off by default. With no --cors flag, a browser script from another origin cannot read API responses, so a page the operator happens to visit cannot quietly drive a loopback bca-web. The wildcard form (--cors '*') answers every origin and exposes the server's metrics and repository paths to any site; use it only on trusted networks. Full semantics are under CORS.
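To keep the wildcard out of a deployment by construction, a launch script can build the --cors value from explicit origins only. This is a sketch under assumptions: cors_allow_list is a hypothetical helper, and the origins shown are placeholders.

```shell
# Join explicit origins into the comma-separated value --cors expects.
# Fails if no origin is given, or if any argument is the wildcard or
# already contains a comma.
cors_allow_list() {
    [ "$#" -gt 0 ] || return 1
    cl_list=""
    for cl_origin in "$@"; do
        case "$cl_origin" in
            '*'|*,*)
                echo "rejected origin: $cl_origin" >&2
                return 1
                ;;
        esac
        cl_list="${cl_list:+$cl_list,}$cl_origin"
    done
    echo "$cl_list"
}

cors_allow_list https://tools.example.com http://localhost:3000
# prints https://tools.example.com,http://localhost:3000

# Usage (not executed here):
#   bca-web --cors "$(cors_allow_list https://tools.example.com)"
```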

The VCS endpoints read server-side repositories. Unlike the source-in-body endpoints, /v1/vcs, /v1/vcs/trend, and /v1/vcs/jit analyze a git repository already on the server's filesystem, named by the request's repo_path. A caller who can reach these endpoints can make the server walk any git repository it can read and learn that repository's file paths, churn, and author signals. The VCS trust-boundary warning in the reference covers this in full; do not expose these endpoints to untrusted clients without an authorization layer.

The VCS cache directory is a client-controlled write path. The VCS endpoints accept an optional cache_dir field that overrides where the persistent change-history cache is written, defaulting to the platform cache location ($XDG_CACHE_HOME/big-code-analysis/vcs). A caller who can set cache_dir chooses a directory the server process writes into, so an untrusted client could direct cache writes to an attacker-chosen path. This is one more reason the VCS endpoints belong behind an authorization layer, never open to untrusted input.
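One check an authorization layer might apply to repo_path and cache_dir is a root allow-list. The function below is a hypothetical sketch of that idea, not something bca-web provides: it accepts only absolute paths under a given root and rejects any ".." segment. It does not resolve symlinks, so a real deployment needs a stronger check. The example roots are placeholders.

```shell
# Succeed only when path $1 lies under root $2 and has no ".." segment.
path_under_root() {
    case "$1" in
        */../*|*/..) return 1 ;;   # reject parent-directory traversal
        "$2"/*)      return 0 ;;   # must sit below the allowed root
        *)           return 1 ;;
    esac
}

# Usage in a request filter (not executed here; variables are placeholders):
#   path_under_root "$repo_path" /srv/repos || reject_request
#   path_under_root "$cache_dir" /var/cache/bca || reject_request
```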