Flat-record iteration

bca.flatten_spaces(result) walks the nested FuncSpace tree in pre-order and yields one flat, scalar-only dict per node — ready for sqlite3.executemany, pandas.DataFrame.from_records, or any other tabular consumer.

Metric keys use the same dotted convention as the CLI's CSV writer (cyclomatic.modified.sum, halstead.volume, loc.lloc_average, …). Identity keys (path, name, kind, start_line, end_line, parent_name, depth) are added to every record.
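
To make the record shape concrete, the sketch below mimics the pre-order walk on a hand-written tree. It is a stand-in, not the library's implementation: the nested input layout (the `metrics` and `spaces` keys), the sample values, the helper names `flatten_sketch` and `_flatten_metrics`, and the choice of `depth == 0` with `parent_name is None` for the root are all assumptions made for illustration.

```python
from __future__ import annotations

from collections.abc import Iterator, Mapping
from typing import Any

IDENTITY_KEYS = ("name", "kind", "start_line", "end_line")


def _flatten_metrics(
    node: Mapping[str, Any], prefix: str = ""
) -> Iterator[tuple[str, Any]]:
    """Yield (dotted_key, scalar) pairs from a nested metrics mapping."""
    for key, value in node.items():
        dotted = f"{prefix}{key}"
        if isinstance(value, Mapping):
            yield from _flatten_metrics(value, prefix=f"{dotted}.")
        else:
            yield dotted, value


def flatten_sketch(
    space: Mapping[str, Any],
    path: str,
    parent_name: str | None = None,
    depth: int = 0,
) -> Iterator[dict[str, Any]]:
    """Pre-order walk: emit the node's own record, then its children."""
    record: dict[str, Any] = {
        "path": path,
        "parent_name": parent_name,
        "depth": depth,
    }
    record.update({k: space[k] for k in IDENTITY_KEYS})
    # A missing metric subtree simply contributes no keys.
    record.update(_flatten_metrics(space.get("metrics", {})))
    yield record
    for child in space.get("spaces", []):
        yield from flatten_sketch(
            child, path, parent_name=space["name"], depth=depth + 1
        )


# Hypothetical two-node tree: a file-level space with one function.
tree = {
    "name": "lib.rs", "kind": "unit", "start_line": 1, "end_line": 20,
    "metrics": {"cyclomatic": {"sum": 3.0}, "loc": {"lloc": 12.0}},
    "spaces": [
        {
            "name": "parse", "kind": "function",
            "start_line": 3, "end_line": 18,
            "metrics": {"cyclomatic": {"sum": 2.0}},
            "spaces": [],
        },
    ],
}

records = list(flatten_sketch(tree, "src/lib.rs"))
for r in records:
    print(r)
```

Every value in each printed record is a scalar, and the second record has no `loc.lloc` key at all because its metric subtree was absent in the input.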

SQLite via `executemany`

The example below analyses one file and inserts one row per FuncSpace into a sqlite table whose columns are the union of all flattened keys.

"""Flatten a FuncSpace tree into scalar rows for sqlite / pandas.

Demonstrates ``bca.flatten_spaces`` + ``sqlite3.executemany``. The
pandas equivalent is shown in the book as a non-executed snippet so
this example stays dependency-free (sqlite ships with the stdlib).

Tied to the book's ``python/flat-records.md`` page.
"""

from __future__ import annotations

import sqlite3
from contextlib import closing
from pathlib import Path

import big_code_analysis as bca


def run(path: Path, db_path: Path) -> int:
    """Analyse ``path`` and insert one row per FuncSpace into ``db_path``.

    Returns the number of rows inserted so the test can assert on it.
    """
    result = bca.analyze(path)
    if result is None:
        msg = f"{path} was skipped (looks generated)"
        raise SystemExit(msg)

    # The flattened keys are dotted, lowercase names
    # (`halstead.unique_operators`, `halstead.total_operators`, …) that
    # are unique under SQLite's case-insensitive column comparison (the
    # old `N1`/`n1` Halstead collision was removed in #511), so each
    # lands on its own column without renaming.
    records = [dict(r) for r in bca.flatten_spaces(result)]
    if not records:
        return 0

    columns = sorted({k for r in records for k in r})
    cols_sql = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    rows = [tuple(r.get(c) for c in columns) for r in records]

    # `closing(sqlite3.connect(...))` is the documented idiom — the
    # bare ``with sqlite3.connect(...)`` context manager only commits
    # / rolls back the transaction; it does NOT close the connection,
    # so a long-running consumer leaks file descriptors (and on
    # Windows holds an exclusive write lock on the db file).
    with closing(sqlite3.connect(db_path)) as db, db:
        db.execute(f"CREATE TABLE IF NOT EXISTS metrics ({cols_sql})")
        db.executemany(
            f"INSERT INTO metrics ({cols_sql}) VALUES ({placeholders})",
            rows,
        )

    return len(rows)


if __name__ == "__main__":
    import sys

    if len(sys.argv) != 3:
        sys.exit("usage: python flat_records.py <source-file> <out.db>")
    inserted = run(Path(sys.argv[1]), Path(sys.argv[2]))
    print(f"inserted {inserted} rows into {sys.argv[2]}")
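
Because the column names contain dots, any later query has to double-quote them as well; an unquoted cyclomatic.sum would be read by SQLite as a table-qualified column. The sketch below shows a read-back query against an in-memory database. The three rows are hypothetical values standing in for real flattened records, and the table is reduced to three columns for brevity.

```python
import sqlite3
from contextlib import closing

# Hypothetical rows standing in for flattened records.
rows = [
    ("parse", "function", 2.0),
    ("render", "function", 4.0),
    ("lib.rs", "unit", 6.0),
]

with closing(sqlite3.connect(":memory:")) as db:
    db.execute('CREATE TABLE metrics ("name", "kind", "cyclomatic.sum")')
    db.executemany("INSERT INTO metrics VALUES (?, ?, ?)", rows)
    # Dotted identifiers must stay inside double quotes in every
    # statement, not just in CREATE TABLE / INSERT.
    worst = db.execute(
        'SELECT "name", "cyclomatic.sum" FROM metrics '
        'WHERE "kind" = ? ORDER BY "cyclomatic.sum" DESC',
        ("function",),
    ).fetchall()

print(worst)
```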

The iterator is lazy and single-use: it walks the input once and never materialises the whole list itself. Iterating the same iterator a second time yields nothing, so wrap it in list() once and reuse that list if you need more than one pass.
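
The single-use behaviour is easy to trip over, so here is a minimal illustration. `fake_flatten` is a hypothetical generator standing in for bca.flatten_spaces; it only assumes the lazy, single-pass semantics described above.

```python
from __future__ import annotations

from collections.abc import Iterator


def fake_flatten(names: list[str]) -> Iterator[dict[str, str]]:
    """Hypothetical stand-in for bca.flatten_spaces: a lazy generator."""
    for name in names:
        yield {"name": name}


it = fake_flatten(["lib.rs", "parse"])
first_pass = [r["name"] for r in it]
second_pass = [r["name"] for r in it]  # the iterator is already exhausted

# Materialise once when more than one pass is needed.
records = list(fake_flatten(["lib.rs", "parse"]))
names = [r["name"] for r in records]
count = len(records)

print(first_pass, second_pass, names, count)
```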

Pandas

flatten_spaces is the natural input to pandas.DataFrame.from_records. Pandas is not a dependency of the bindings; install it separately if you want the DataFrame view.

import big_code_analysis as bca
import pandas as pd

result = bca.analyze("src/lib.rs")
if result is not None:
    df = pd.DataFrame.from_records(bca.flatten_spaces(result))
    print(df.head())
    # Group by space kind to inspect the average cyclomatic per
    # function vs. per class vs. per file.
    by_kind = df.groupby("kind")["cyclomatic.sum"].mean()
    print(by_kind)

Identity columns vs CLI CSV

The flat-record schema is mostly aligned with the CLI's CSV writer, with a couple of intentional deltas:

Identity columns use name / kind here; the CSV writer uses space_name / space_kind. Flat records also add parent_name / depth; the CSV writer omits those.
tokens.* flattens to the JSON shape (tokens.tokens, tokens.average, tokens.min, tokens.max). Only the sum leaf differs: flat records spell it tokens.tokens, while the CSV writer spells it tokens.sum. The average / min / max leaves now match (#590). Rename the sum leaf in the consumer if you need exact CSV alignment.
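
If a consumer needs the CSV spelling, the deltas listed above can be applied with a small rename pass. This is a sketch under assumptions: `to_csv_schema`, `CSV_RENAMES`, and `CSV_DROPPED` are hypothetical names, the sample record is made up, and the mapping covers only the differences named in this section.

```python
from __future__ import annotations

from typing import Any

# Renames taken from the deltas described in this section.
CSV_RENAMES = {
    "name": "space_name",
    "kind": "space_kind",
    "tokens.tokens": "tokens.sum",
}
# Identity keys that the CSV writer omits.
CSV_DROPPED = {"parent_name", "depth"}


def to_csv_schema(record: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of ``record`` using the CSV writer's key names."""
    return {
        CSV_RENAMES.get(key, key): value
        for key, value in record.items()
        if key not in CSV_DROPPED
    }


# Hypothetical flat record.
flat = {
    "path": "src/lib.rs",
    "name": "parse",
    "kind": "function",
    "parent_name": "lib.rs",
    "depth": 1,
    "tokens.tokens": 120.0,
    "tokens.average": 40.0,
}
aligned = to_csv_schema(flat)
print(aligned)
```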

Anonymous spaces (Rust closures, JavaScript function expressions / arrows) keep their name == "<anonymous>" marker verbatim — flatten_spaces does not normalise.

Caveats

parent_name alone cannot disambiguate same-named spaces nested under different parents (e.g. two Inner classes under two different outer classes both surface as parent_name == "Inner" for their own children). Pair it with depth and source-order position, or rebuild the qualified name in your consumer, if you need a fully-qualified path.
Do not mutate the input result while iterating: the walker keeps references into it, so mutations to not-yet-yielded subtrees will be observed in later records.
Missing metric subtrees produce no keys (absent, not None), matching the "Halstead disabled" edge case for metric selection.
flatten_spaces raises TypeError if the input is not a mapping, so callers must filter out None returns from bca.analyze (e.g. generated files with skip_generated=True) before passing the result in.
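
For the first caveat, one way to rebuild a qualified name in the consumer is to keep a stack of ancestor names while reading records in pre-order. The sketch below assumes that depth is 0 for the root and grows by one per nesting level; the helper name `with_qualified_names`, the `::` separator, and the sample records are hypothetical.

```python
from __future__ import annotations

from collections.abc import Iterable, Iterator
from typing import Any


def with_qualified_names(
    records: Iterable[dict[str, Any]], sep: str = "::"
) -> Iterator[dict[str, Any]]:
    """Add a ``qualified_name`` key using depth and pre-order position."""
    stack: list[str] = []
    for record in records:
        # Pre-order means every ancestor was seen earlier, so trimming
        # the stack to ``depth`` leaves exactly the current ancestors.
        del stack[record["depth"]:]
        stack.append(record["name"])
        yield {**record, "qualified_name": sep.join(stack)}


# Hypothetical records: two ``Inner`` classes under different parents.
records = [
    {"name": "mod.py", "depth": 0, "parent_name": None},
    {"name": "A", "depth": 1, "parent_name": "mod.py"},
    {"name": "Inner", "depth": 2, "parent_name": "A"},
    {"name": "run", "depth": 3, "parent_name": "Inner"},
    {"name": "B", "depth": 1, "parent_name": "mod.py"},
    {"name": "Inner", "depth": 2, "parent_name": "B"},
    {"name": "run", "depth": 3, "parent_name": "Inner"},
]

qualified = [r["qualified_name"] for r in with_qualified_names(records)]
for name in qualified:
    print(name)
```

The two `run` records share name, parent_name, and depth, yet they end up with distinct qualified names because the stack tracks which outer class was visited most recently.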
