SARIF output
bca.to_sarif(result, *, thresholds=None) renders an analysis
result (or an iterable of them) into a SARIF 2.1.0 JSON document,
ready for upload to GitHub Code Scanning or any
other SARIF consumer. The output is produced by the same Rust
writer that backs bca check -O sarif, so the schema URL, tool
driver name / version, and rule descriptions match the CLI
byte-for-byte.
from collections.abc import Iterable, Mapping
from pathlib import Path

import big_code_analysis as bca


def run(
    paths: Iterable[Path],
    sarif_path: Path,
    thresholds: Mapping[str, float],
) -> str:
    """Analyse ``paths`` and write a SARIF document to ``sarif_path``.

    Returns the rendered SARIF JSON so the caller (or the test) can
    inspect it without re-reading the file.
    """
    # Analyse every path in one batch; failed files come back as AnalysisError entries.
    batch = bca.analyze_batch([str(p) for p in paths])
    # Render the batch to SARIF, gating on the caller's thresholds.
    sarif = bca.to_sarif(batch, thresholds=dict(thresholds))
    sarif_path.parent.mkdir(parents=True, exist_ok=True)
    sarif_path.write_text(sarif, encoding="utf-8")
    print(f"wrote {sarif_path} ({len(sarif)} bytes)")
    return sarif
to_sarif accepts:
- A single dict returned by bca.analyze or bca.analyze_source.
- Any iterable yielding such dicts and / or bca.AnalysisError instances (the natural shape of bca.analyze_batch's return value). AnalysisError entries are skipped silently: they represent files that could not be analysed, not findings.
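Because error entries are dropped silently from the SARIF output, a caller may want to report them separately before rendering. The following is a minimal sketch, not part of the bindings: split_batch and FakeAnalysisError are hypothetical names, and the error class is passed in as a parameter so the example runs without the library installed. With the real bindings the second argument would presumably be bca.AnalysisError.

```python
from typing import Any, Iterable


def split_batch(results: Iterable[Any], error_type: type) -> tuple[list[Any], list[Any]]:
    """Separate successful results from error entries (hypothetical helper)."""
    ok: list[Any] = []
    failed: list[Any] = []
    for item in results:
        # Anything that is an instance of the error class is a failed file.
        (failed if isinstance(item, error_type) else ok).append(item)
    return ok, failed


class FakeAnalysisError:
    """Stand-in for the bindings' error class, for illustration only."""

    def __init__(self, message: str) -> None:
        self.message = message


# Placeholder dicts; real analysis results have a richer shape.
batch = [{"placeholder": 1}, FakeAnalysisError("could not parse"), {"placeholder": 2}]
ok, failed = split_batch(batch, FakeAnalysisError)
print(f"{len(ok)} analysed, {len(failed)} failed")
```

The ok list can then be handed to to_sarif, while failed can be logged or turned into a non-zero exit code, depending on how strict the pipeline should be.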
Thresholds
Accepted threshold names mirror the CLI's EXTRACTORS table in
big-code-analysis-cli/src/thresholds.rs:
- cognitive, cyclomatic, cyclomatic.modified
- halstead.volume, halstead.difficulty, halstead.effort, halstead.time, halstead.bugs
- loc.sloc, loc.ploc, loc.lloc, loc.cloc, loc.blank
- nom, tokens, nexits, nargs
- mi.original, mi.sei, mi.visual_studio
- abc, wmc, npm, npa
An unknown name raises ValueError listing the accepted set, so
a typo fails fast instead of silently producing an empty SARIF
run.
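The bindings already reject unknown names, but a project that loads thresholds from a config file may want to catch a typo before any analysis runs. A minimal sketch, assuming the names listed above are the complete accepted set; ACCEPTED_THRESHOLDS and unknown_threshold_names are hypothetical, and a hand-copied set like this can drift from the CLI's table.

```python
# Local copy of the threshold names listed in this section (an assumption:
# the authoritative set lives in the CLI's EXTRACTORS table).
ACCEPTED_THRESHOLDS = frozenset({
    "cognitive", "cyclomatic", "cyclomatic.modified",
    "halstead.volume", "halstead.difficulty", "halstead.effort",
    "halstead.time", "halstead.bugs",
    "loc.sloc", "loc.ploc", "loc.lloc", "loc.cloc", "loc.blank",
    "nom", "tokens", "nexits", "nargs",
    "mi.original", "mi.sei", "mi.visual_studio",
    "abc", "wmc", "npm", "npa",
})


def unknown_threshold_names(thresholds):
    """Return the sorted threshold names that are not in the accepted set."""
    return sorted(set(thresholds) - ACCEPTED_THRESHOLDS)


# "cylomatic.modified" is a deliberate typo.
print(unknown_threshold_names({"cyclomatic": 15, "cylomatic.modified": 10}))
# prints ['cylomatic.modified']
```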
thresholds=None (the default) and thresholds={} both produce
a well-formed SARIF document with empty results and rules
arrays. This matches the CLI's posture: there are no built-in
default thresholds; every check run supplies its own limits.
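Since an empty thresholds mapping yields a valid but empty document, a CI job can pass without checking anything. The sketch below detects that case from the rendered JSON. It assumes the rules array sits at the standard SARIF 2.1.0 location (runs[].tool.driver.rules); is_empty_sarif is a hypothetical helper, and the sample document is hand-written rather than real to_sarif output.

```python
import json


def is_empty_sarif(sarif_text: str) -> bool:
    """Return True when no run carries any results or rules."""
    doc = json.loads(sarif_text)
    for run in doc.get("runs", []):
        if run.get("results"):
            return False
        # Rules live under tool.driver.rules in SARIF 2.1.0.
        if run.get("tool", {}).get("driver", {}).get("rules"):
            return False
    return True


# Hand-written minimal document; "example-tool" is a placeholder driver name.
EMPTY_SARIF = json.dumps({
    "version": "2.1.0",
    "runs": [{"tool": {"driver": {"name": "example-tool", "rules": []}}, "results": []}],
})

print(is_empty_sarif(EMPTY_SARIF))
```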
Upload to GitHub Code Scanning
# .github/workflows/code-scanning.yml (excerpt)
- name: Compute metric SARIF
  run: |
    python - <<'PY'
    import big_code_analysis as bca
    with open("paths.txt", encoding="utf-8") as paths_fh:
        results = bca.analyze_batch(paths_fh.read().splitlines())
    with open("metrics.sarif", "w", encoding="utf-8") as fh:
        fh.write(bca.to_sarif(results, thresholds={"cyclomatic": 15}))
    PY
- name: Upload to Code Scanning
  uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: metrics.sarif
The upload action is documented under
github/codeql-action/upload-sarif.
The bindings produce one SARIF run per call; the action handles
the upload to the repository's Code Scanning alerts.
What "Unit" findings mean
to_sarif emits file-scope (unit-space) findings for every
metric whose JSON headline at the unit space matches the CLI's
per-space accessor (loc.*, halstead.*, mi.*, nom,
nargs, nexits, tokens, abc, wmc, npm, npa). The
three exceptions — cyclomatic, cyclomatic.modified,
cognitive — are skipped at the unit level because the JSON
exposes the aggregate sum across children while the CLI's
per-space accessor returns just the unit's own scalar.
Unit findings carry logicalLocations: [{"fullyQualifiedName": "<file>"}]. Nameless non-unit spaces (rare parse-failure case)
carry "<unnamed>" — both matching the CLI's function_token
placeholders.
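A consumer that wants to treat file-scope findings differently from function-scope ones can key on the placeholder described above. This is a minimal sketch under two assumptions: that logicalLocations is nested under each entry of a result's locations array, as the SARIF 2.1.0 location object defines it, and that the sample results, which are hand-written, resemble what the writer emits. The helper name, the function name parse_config, and the message texts are made up for illustration.

```python
UNIT_NAME = "<file>"  # placeholder carried by unit-space findings


def is_unit_finding(result: dict) -> bool:
    """Return True if any logical location of the result is the file-scope placeholder."""
    for location in result.get("locations", []):
        for logical in location.get("logicalLocations", []):
            if logical.get("fullyQualifiedName") == UNIT_NAME:
                return True
    return False


# Hand-written sample results, not real to_sarif output.
sample_results = [
    {
        "message": {"text": "file-level finding (placeholder text)"},
        "locations": [{"logicalLocations": [{"fullyQualifiedName": "<file>"}]}],
    },
    {
        "message": {"text": "function-level finding (placeholder text)"},
        "locations": [{"logicalLocations": [{"fullyQualifiedName": "parse_config"}]}],
    },
]

unit = [r for r in sample_results if is_unit_finding(r)]
other = [r for r in sample_results if not is_unit_finding(r)]
print(f"{len(unit)} unit finding(s), {len(other)} other finding(s)")
```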
See also
- Batch processing: the natural source of input iterables for to_sarif; AnalysisError entries are skipped silently.
- Metric selection: threshold names are a closed set independent of metrics=; requesting a narrower metric suite while gating on a dropped threshold yields an empty SARIF run.
- Error handling: the typed exceptions to_sarif raises for bad caller input (TypeError / ValueError).