Flat-record iteration
bca.flatten_spaces(result) walks the nested FuncSpace tree in
pre-order and yields one flat, scalar-only dict per node —
ready for sqlite3.executemany,
pandas.DataFrame.from_records, or any other tabular consumer.
Metric keys use the same dotted convention as the CLI's CSV
writer (cyclomatic.modified.sum, halstead.volume,
loc.lloc_average, …). Identity keys (path, name, kind,
start_line, end_line, parent_name, depth) are added on
every record.
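To make the record shape concrete, here is a minimal sketch of a pre-order walk that emits dotted metric keys plus identity keys. It is not the library's implementation: `toy_flatten`, the `metrics` / `spaces` field names, the `kind` values, and the sample tree are all hypothetical stand-ins chosen for illustration.

```python
from __future__ import annotations

from collections.abc import Iterator, Mapping
from typing import Any


def _dotted(prefix: str, value: Any) -> Iterator[tuple[str, Any]]:
    """Yield (dotted_key, scalar) pairs for a nested metrics mapping."""
    if isinstance(value, Mapping):
        for key, child in value.items():
            yield from _dotted(f"{prefix}.{key}" if prefix else key, child)
    else:
        yield prefix, value


def toy_flatten(
    node: Mapping[str, Any],
    path: str,
    parent_name: str | None = None,
    depth: int = 0,
) -> Iterator[dict[str, Any]]:
    """Pre-order walk: emit the node's own record, then recurse into children."""
    record: dict[str, Any] = {
        "path": path,
        "name": node["name"],
        "kind": node["kind"],
        "start_line": node["start_line"],
        "end_line": node["end_line"],
        "parent_name": parent_name,
        "depth": depth,
    }
    # Nested metric mappings collapse to dotted keys such as "cyclomatic.sum".
    record.update(_dotted("", node.get("metrics", {})))
    yield record
    for child in node.get("spaces", []):
        yield from toy_flatten(child, path, node["name"], depth + 1)


# Hypothetical tree layout (assumed field names, not the real FuncSpace schema).
tree = {
    "name": "lib.rs", "kind": "unit", "start_line": 1, "end_line": 40,
    "metrics": {"cyclomatic": {"sum": 5.0}, "loc": {"lloc": 20.0}},
    "spaces": [
        {
            "name": "Parser", "kind": "impl", "start_line": 3, "end_line": 30,
            "metrics": {"cyclomatic": {"sum": 3.0}},
            "spaces": [
                {
                    "name": "parse", "kind": "function",
                    "start_line": 5, "end_line": 28,
                    "metrics": {"cyclomatic": {"sum": 3.0}},
                    "spaces": [],
                },
            ],
        },
        {
            "name": "main", "kind": "function", "start_line": 32, "end_line": 40,
            "metrics": {"cyclomatic": {"sum": 1.0}},
            "spaces": [],
        },
    ],
}

rows = list(toy_flatten(tree, "src/lib.rs"))
for row in rows:
    print(row["depth"], row["name"], row["parent_name"], row.get("cyclomatic.sum"))
```

Every row carries only scalars, so it can be handed to a tabular consumer without further unpacking.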
SQLite via executemany
The example below analyses one file and inserts one row per
FuncSpace into a SQLite table whose columns are the union of
all flattened keys.

"""Flatten a FuncSpace tree into scalar rows for sqlite / pandas.

Demonstrates ``bca.flatten_spaces`` + ``sqlite3.executemany``. The
pandas equivalent is shown in the book as a non-executed snippet so
this example stays dependency-free (sqlite ships with the stdlib).

Tied to the book's ``python/flat-records.md`` page.
"""

from __future__ import annotations

import sqlite3
from contextlib import closing
from pathlib import Path

import big_code_analysis as bca

# SQLite identifier names are case-insensitive, so the Halstead
# pair `N1` / `n1` (and `N2` / `n2`) collide on one column. Rewrite
# the uppercase totals to a distinct name before insertion. The
# explicit map (not a `.replace(".N", "...")` substring rewrite)
# means a hypothetical future `halstead.NN_metric` would not be
# silently mangled.
_RENAME_FOR_SQLITE: dict[str, str] = {
    "halstead.N1": "halstead.total_1",
    "halstead.N2": "halstead.total_2",
}


def _safe_column(key: str) -> str:
    return _RENAME_FOR_SQLITE.get(key, key)


def run(path: Path, db_path: Path) -> int:
    """Analyse ``path`` and insert one row per FuncSpace into ``db_path``.

    Returns the number of rows inserted so the test can assert on it.
    """
    result = bca.analyze(path)
    if result is None:
        msg = f"{path} was skipped (looks generated)"
        raise SystemExit(msg)

    records = [{_safe_column(k): v for k, v in r.items()} for r in bca.flatten_spaces(result)]
    if not records:
        return 0

    # Columns are the union of all keys; a record that lacks a key
    # contributes None (SQL NULL) for that column.
    columns = sorted({k for r in records for k in r})
    cols_sql = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    rows = [tuple(r.get(c) for c in columns) for r in records]

    # `closing(sqlite3.connect(...))` is the documented idiom: the
    # bare ``with sqlite3.connect(...)`` context manager only commits
    # / rolls back the transaction; it does NOT close the connection,
    # so a long-running consumer leaks file descriptors (and on
    # Windows holds an exclusive write lock on the db file).
    with closing(sqlite3.connect(db_path)) as db, db:
        db.execute(f"CREATE TABLE IF NOT EXISTS metrics ({cols_sql})")
        db.executemany(
            f"INSERT INTO metrics ({cols_sql}) VALUES ({placeholders})",
            rows,
        )
    return len(rows)


if __name__ == "__main__":
    import sys

    if len(sys.argv) != 3:
        sys.exit("usage: python flat_records.py <source-file> <out.db>")
    inserted = run(Path(sys.argv[1]), Path(sys.argv[2]))
    print(f"inserted {inserted} rows into {sys.argv[2]}")

The iterator is lazy and single-use: it walks the input once
without materialising the whole list. A second iteration of the
same iterator yields nothing — call list() once if you need to
re-iterate.
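The single-use behaviour is the same as for any Python generator. The sketch below uses a hypothetical `fake_records` generator as a stand-in for `bca.flatten_spaces(result)` so that it runs without the bindings installed; it assumes the real iterator exhausts in the same way, as described above.

```python
from __future__ import annotations

from collections.abc import Iterator


def fake_records() -> Iterator[dict[str, object]]:
    """Hypothetical stand-in for bca.flatten_spaces(result)."""
    yield {"name": "lib.rs", "depth": 0}
    yield {"name": "main", "depth": 1}


# Iterating the same iterator object twice: the second pass sees nothing.
it = fake_records()
first_pass = [r["name"] for r in it]
second_pass = [r["name"] for r in it]  # already exhausted, so this is []

# Materialise once with list() when several passes are needed.
records = list(fake_records())
names = [r["name"] for r in records]
depths = [r["depth"] for r in records]

print(first_pass, second_pass, names, depths)
```

Materialising costs memory proportional to the number of spaces, so prefer streaming straight into the consumer when one pass is enough.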
Pandas
flatten_spaces is the natural input to
pandas.DataFrame.from_records. Pandas is not a dependency of
the bindings; install it separately if you want the DataFrame
view.
import big_code_analysis as bca
import pandas as pd

result = bca.analyze("src/lib.rs")
if result is not None:
    df = pd.DataFrame.from_records(bca.flatten_spaces(result))
    print(df.head())
    # Group by space kind to inspect the average cyclomatic per
    # function vs. per class vs. per file.
    by_kind = df.groupby("kind")["cyclomatic.sum"].mean()
Identity columns vs CLI CSV
The flat-record schema is mostly aligned with the CLI's CSV writer, with a couple of intentional deltas:
- Identity columns use name/kind here; the CSV writer uses space_name/space_kind. Flat records also add parent_name/depth; the CSV writer omits those.
- tokens.* flattens to the JSON shape (tokens.tokens, tokens.tokens_average, …), while CSV renames those to tokens.sum/.average/.min/.max. Rename in the consumer if you need exact CSV alignment.
Anonymous spaces (Rust closures, JavaScript function expressions /
arrows) keep their name == "<anonymous>" marker verbatim —
flatten_spaces does not normalise.
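A consumer-side rename for CSV alignment could look like the sketch below. The helper `to_csv_shape` and both tables are hypothetical. The two `tokens.*` pairings are an assumption inferred from the order in which the keys are listed above, and the min/max pairings are left out because their flat-record names are not spelled out here. Check the keys against real output before relying on them.

```python
from __future__ import annotations

from typing import Any

# Flat-record key -> CSV column name. The identity renames come from
# the list above; the tokens.* pairings are ASSUMED, not confirmed.
_CSV_RENAMES: dict[str, str] = {
    "name": "space_name",
    "kind": "space_kind",
    "tokens.tokens": "tokens.sum",              # assumed pairing
    "tokens.tokens_average": "tokens.average",  # assumed pairing
}

# Keys that exist only in flat records; the CSV writer omits them.
_FLAT_ONLY: frozenset[str] = frozenset({"parent_name", "depth"})


def to_csv_shape(record: dict[str, Any]) -> dict[str, Any]:
    """Rename known keys and drop flat-record-only keys; pass the rest through."""
    return {
        _CSV_RENAMES.get(key, key): value
        for key, value in record.items()
        if key not in _FLAT_ONLY
    }


# Hand-written sample record (values are made up for illustration).
sample = {
    "path": "src/lib.rs",
    "name": "<anonymous>",
    "kind": "closure",
    "parent_name": "main",
    "depth": 2,
    "tokens.tokens": 42.0,
    "halstead.volume": 10.5,
}
aligned = to_csv_shape(sample)
print(aligned)
```

An explicit table keeps unknown keys untouched, so metrics added later pass through unchanged instead of being mangled by a pattern-based rewrite.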
Caveats
- parent_name alone cannot disambiguate same-named siblings nested under different parents (e.g. two Inner classes under two different outer classes both surface as parent_name == "Inner" for their own children). Pair with depth and source-order position, or rebuild the qualified name in your consumer, if you need a fully-qualified path.
- Do not mutate the input result while iterating: the walker keeps references into it, so mutations to not-yet-yielded subtrees will be observed in later records.
- Missing metric subtrees produce no keys (absent, not None), matching the "Halstead disabled" edge case for metric selection.
- flatten_spaces raises TypeError if the input is not a mapping; callers must filter None returns from bca.analyze (e.g. generated files with skip_generated=True) before passing.
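Rebuilding a qualified name from depth and source order can be sketched with a stack of ancestors. The helper `with_qualified_names`, the `qualified_name` key, and the `::` separator are hypothetical. The sketch assumes only what is stated above: records arrive in pre-order, and a child's depth is greater than its parent's.

```python
from __future__ import annotations

from collections.abc import Iterable, Iterator
from typing import Any


def with_qualified_names(
    records: Iterable[dict[str, Any]], sep: str = "::"
) -> Iterator[dict[str, Any]]:
    """Add a 'qualified_name' key built from the chain of ancestors."""
    stack: list[tuple[int, str]] = []  # (depth, name) of the open ancestors
    for record in records:
        depth = record["depth"]
        # Pop siblings and deeper nodes: they are not ancestors of this record.
        while stack and stack[-1][0] >= depth:
            stack.pop()
        stack.append((depth, record["name"]))
        yield {**record, "qualified_name": sep.join(name for _, name in stack)}


# Hand-written pre-order records reproducing the two-Inner-classes case.
flat = [
    {"name": "a.py", "depth": 0, "parent_name": None},
    {"name": "Outer1", "depth": 1, "parent_name": "a.py"},
    {"name": "Inner", "depth": 2, "parent_name": "Outer1"},
    {"name": "run", "depth": 3, "parent_name": "Inner"},
    {"name": "Outer2", "depth": 1, "parent_name": "a.py"},
    {"name": "Inner", "depth": 2, "parent_name": "Outer2"},
    {"name": "run", "depth": 3, "parent_name": "Inner"},
]

qualified = [r["qualified_name"] for r in with_qualified_names(flat)]
for name in qualified:
    print(name)
```

Both `run` records share parent_name == "Inner", yet their qualified names differ because the stack remembers which outer class is currently open. The helper copies each record instead of mutating it, so the input stays unchanged.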