SARIF output

bca.to_sarif(result, *, thresholds=None) renders an analysis result (or an iterable of them) into a SARIF 2.1.0 JSON document, ready for upload to GitHub Code Scanning or any other SARIF consumer. The output is produced by the same Rust writer that backs bca check --report-format sarif, so the schema URL, tool driver name / version, and rule descriptions match the CLI byte-for-byte.

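from collections.abc import Iterable, Mapping
from pathlib import Path

import big_code_analysis as bca


def run(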
    paths: Iterable[Path],
    sarif_path: Path,
    thresholds: Mapping[str, float],
) -> str:
    """Analyse ``paths`` and write a SARIF document to ``sarif_path``.

    Returns the rendered SARIF JSON so the caller (or the test) can
    inspect it without re-reading the file.
    """
    batch = bca.analyze_batch(paths)
    sarif = bca.to_sarif(batch, thresholds=dict(thresholds))

    sarif_path.parent.mkdir(parents=True, exist_ok=True)
    sarif_path.write_text(sarif, encoding="utf-8")
    print(f"wrote {sarif_path} ({len(sarif.encode('utf-8'))} bytes)")
    return sarif

to_sarif accepts:

A single dict returned by bca.analyze or bca.analyze_source.
Any iterable yielding such dicts and / or bca.AnalysisFailure instances (the natural shape of bca.analyze_batch's return value). AnalysisFailure entries are skipped silently — they represent files that could not be analysed, not findings.
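
Because failures are skipped silently, a caller may want to surface them before rendering. The following is a minimal sketch that relies only on what is stated above (results are dicts, anything else in the batch is a failure). split_batch is a hypothetical helper, not part of the bindings.

```python
from typing import Any, Iterable


def split_batch(batch: Iterable[Any]) -> tuple[list[dict], list[Any]]:
    """Separate analysis results (dicts) from failure entries.

    Hypothetical helper: every non-dict entry is treated as a failure.
    """
    results: list[dict] = []
    failures: list[Any] = []
    for entry in batch:
        if isinstance(entry, dict):
            results.append(entry)
        else:
            failures.append(entry)
    return results, failures
```

The failures list can then be logged or counted, while the results list (itself an iterable of dicts) is passed on to to_sarif.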

Thresholds

Accepted threshold names mirror the CLI's EXTRACTORS table in big-code-analysis-cli/src/thresholds.rs:

cognitive, cyclomatic, cyclomatic.modified
halstead.volume, halstead.difficulty, halstead.effort, halstead.time, halstead.bugs
loc.sloc, loc.ploc, loc.lloc, loc.cloc, loc.blank
nom, tokens, nexits, nargs
mi.original, mi.sei, mi.visual_studio
abc, wmc, npm, npa

An unknown name raises ValueError listing the accepted set, so a typo fails fast instead of silently producing an empty SARIF run.
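
If thresholds come from a configuration file, it can be convenient to catch typos before any analysis runs. The sketch below is an assumption-laden pre-check: ACCEPTED is copied by hand from the list above and may drift from the bindings, and check_threshold_names is a hypothetical helper. The ValueError raised by to_sarif itself remains authoritative.

```python
import difflib
from collections.abc import Mapping

# Copied from the list above; the bindings remain the source of truth.
ACCEPTED = frozenset({
    "cognitive", "cyclomatic", "cyclomatic.modified",
    "halstead.volume", "halstead.difficulty", "halstead.effort",
    "halstead.time", "halstead.bugs",
    "loc.sloc", "loc.ploc", "loc.lloc", "loc.cloc", "loc.blank",
    "nom", "tokens", "nexits", "nargs",
    "mi.original", "mi.sei", "mi.visual_studio",
    "abc", "wmc", "npm", "npa",
})


def check_threshold_names(thresholds: Mapping[str, float]) -> None:
    """Raise ValueError for names outside ACCEPTED, with a close-match hint."""
    unknown = sorted(set(thresholds) - ACCEPTED)
    if not unknown:
        return
    hints = []
    for name in unknown:
        # Suggest the nearest accepted name, if any is reasonably close.
        close = difflib.get_close_matches(name, sorted(ACCEPTED), n=1)
        hint = f" (did you mean {close[0]!r}?)" if close else ""
        hints.append(f"{name!r}{hint}")
    raise ValueError("unknown threshold name(s): " + ", ".join(hints))
```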

thresholds=None (the default) and thresholds={} both produce a well-formed SARIF document with empty results and rules arrays. This matches the CLI's posture: there are no built-in default thresholds; every check run supplies its own limits.
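
A test or CI step can confirm the "no thresholds, no findings" case by inspecting the rendered JSON. This is a sketch using only the standard SARIF 2.1.0 layout (runs[].results and runs[].tool.driver.rules); sarif_is_empty is a hypothetical helper name.

```python
import json


def sarif_is_empty(sarif_text: str) -> bool:
    """Return True when no run carries any results or rule descriptors."""
    doc = json.loads(sarif_text)
    for run in doc.get("runs", []):
        if run.get("results"):
            return False
        if run.get("tool", {}).get("driver", {}).get("rules"):
            return False
    return True
```

For example, sarif_is_empty(bca.to_sarif(batch)) would be expected to hold for any batch, since no thresholds are supplied.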

Upload to GitHub Code Scanning

# .github/workflows/code-scanning.yml (excerpt)
- name: Compute metric SARIF
  run: |
    python - <<'PY'
    import big_code_analysis as bca
    with open("paths.txt", encoding="utf-8") as paths_fh:
        results = bca.analyze_batch(paths_fh.read().splitlines())
    with open("metrics.sarif", "w", encoding="utf-8") as fh:
        fh.write(bca.to_sarif(results, thresholds={"cyclomatic": 15}))
    PY
- name: Upload to Code Scanning
  uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: metrics.sarif

The upload action is documented under github/codeql-action/upload-sarif. The bindings produce one SARIF run per call; the action handles the upload to the repository's Code Scanning alerts.

What "Unit" findings mean

to_sarif emits a finding at every space (the file unit, each container, and each leaf function or closure) whose own value breaches its limit, exactly matching bca check --report-format sarif. For most metrics the JSON headline at a space is already that space's own value. Four subtree-aggregate metrics (cyclomatic, cyclomatic.modified, cognitive, and abc) additionally expose a sum / magnitude rolled up across child spaces. For these the binding reads the per-space value field instead of the aggregate, so it reports an interior breach (for example, a function whose own complexity breaches even though a nested closure's does not) without being fooled by the larger aggregate.

Before the value field existed, the binding could read only the aggregate, so it emitted these four metrics only at leaf spaces and missed genuine interior breaches that the CLI reports (#958).
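
The difference between own values and rolled-up aggregates can be illustrated with a toy model. Everything here is an assumption made for illustration: the Space class is not the bindings' JSON layout, the aggregate is modelled as a plain sum, and a breach is taken to mean "strictly greater than the limit".

```python
from dataclasses import dataclass, field


@dataclass
class Space:
    """Toy stand-in for a space; not the bindings' real data shape."""
    name: str
    own: float  # this space's own metric value
    children: list["Space"] = field(default_factory=list)

    def aggregate(self) -> float:
        """Own value plus everything rolled up from child spaces."""
        return self.own + sum(child.aggregate() for child in self.children)


def own_value_breaches(space: Space, limit: float) -> list[str]:
    """Names of spaces whose own value exceeds the limit, at any depth."""
    found = [space.name] if space.own > limit else []
    for child in space.children:
        found.extend(own_value_breaches(child, limit))
    return found


def leaf_aggregate_breaches(space: Space, limit: float) -> list[str]:
    """Older approach: check the aggregate, but only at leaf spaces."""
    if not space.children:
        return [space.name] if space.aggregate() > limit else []
    found: list[str] = []
    for child in space.children:
        found.extend(leaf_aggregate_breaches(child, limit))
    return found


closure = Space("closure", own=2)
outer = Space("outer", own=16, children=[closure])
unit = Space("<file>", own=1, children=[outer])
```

With a limit of 15, own_value_breaches(unit, 15) reports only outer, the interior function that genuinely breaches. leaf_aggregate_breaches(unit, 15) reports nothing, because the only leaf is the small closure. The file's aggregate of 19 also exceeds the limit even though the file's own value is 1, which is why the aggregate cannot simply be checked at every level.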

Unit findings carry logicalLocations: [{"fullyQualifiedName": "<file>"}]. Nameless non-unit spaces (a rare parse-failure case) carry "<unnamed>". Both values match the CLI's function_token placeholders.
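
A consumer that wants to treat file-level findings differently from function-level ones can key off the "<file>" placeholder. The sketch below assumes logicalLocations sits inside each entry of a result's locations array, as SARIF 2.1.0 specifies; split_unit_findings is a hypothetical helper.

```python
import json


def split_unit_findings(sarif_text: str) -> tuple[list[dict], list[dict]]:
    """Partition SARIF results into (unit findings, all other findings)."""
    unit: list[dict] = []
    other: list[dict] = []
    doc = json.loads(sarif_text)
    for run in doc.get("runs", []):
        for result in run.get("results", []):
            # Collect every fully qualified name attached to this result.
            names = [
                logical.get("fullyQualifiedName")
                for location in result.get("locations", [])
                for logical in location.get("logicalLocations", [])
            ]
            if "<file>" in names:
                unit.append(result)
            else:
                other.append(result)
    return unit, other
```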
