perf(native): scope node loading in call-edge builder for incremental builds #976

carlos-alm wants to merge 3 commits into main from
Conversation
On incremental builds, `build_and_insert_call_edges` was loading every node in the graph (~13k rows in codegraph itself) just to resolve a handful of changed files' call sites. The deserialization cost dominates the actual edge-building work for small edits.

Scope the SQL to only the files being processed plus their resolved import targets. The import context already knows the resolved target paths, so we stage them in a temp table and `INNER JOIN` against the nodes table. Full builds still load every node (there is no smaller set to work with).

Guarded on `is_incremental && file_symbols.len() < 200` so the scoped path only kicks in when it actually wins — very large incremental builds fall through to the full-load path to avoid temp-table overhead.

No behavior change: the scoped set is a strict superset of the nodes the call-edge builder needs (changed files + their import targets). Cross-file call resolution still sees every target it could previously.
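The scoping described above can be sketched as a pure set construction. This is a minimal sketch: `scoped_files` and its toy types are hypothetical stand-ins for the real import context, not the actual implementation.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-in for the import context's resolver: maps
// (importing file, import source) -> resolved target path.
type Resolved = HashMap<(String, String), String>;

// Build the scoped file set: every changed file plus every resolved import
// target. By construction this is a superset of the files whose nodes the
// call-edge builder can look up for this change set.
fn scoped_files(
    changed: &HashMap<String, Vec<String>>, // changed file -> its import sources
    resolved: &Resolved,
) -> HashSet<String> {
    let mut relevant: HashSet<String> = changed.keys().cloned().collect();
    for (file, sources) in changed {
        for src in sources {
            if let Some(target) = resolved.get(&(file.clone(), src.clone())) {
                relevant.insert(target.clone());
            }
        }
    }
    relevant
}
```

The "strict superset" safety argument falls out directly: the set starts from all changed files and only ever grows.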
Claude finished @carlos-alm's task — View job

PR Review Analysis

Root Cause Analysis

Problem Being Solved:

Why This Approach: The optimization scopes node loading on incremental builds to only the files being processed plus their resolved import targets. The import context already has resolved paths, so the solution stages them in a temp table and uses an `INNER JOIN`.

Risk Assessment: Low risk - the scoped set is a strict superset of the nodes the call-edge builder needs. Full builds and large incremental builds (>200 files) fall through to the existing full-load behavior.

Backlog Compliance

Critical Concerns

Minor Issue: Line 862 uses a hardcoded

Final Recommendation

Condition: Consider using
Greptile Summary

This PR adds an incremental fast path to

Confidence Score: 5/5 - Safe to merge. All prior P1 concerns are resolved and only two minor P2 style nits remain. The barrel-chain fix, schema qualification, and index addition correctly address the three previous blocking comments. The scoped path is a strict superset of the required node set by construction. Remaining findings are cosmetic and do not affect correctness or data integrity. No files require special attention.

Important Files Changed
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[build_and_insert_call_edges] --> B{is_incremental?}
    B -- No --> FULL[Full load: SELECT all callable nodes]
    B -- Yes --> C{file_symbols.len <= 5\nAND existing_files > 20?}
    C -- No --> FULL
    C -- Yes --> D[Build relevant_files\nfrom file_symbols keys]
    D --> E[For each changed file's imports:\nresolve target path]
    E --> F{Target is barrel file?}
    F -- Yes --> G[resolve_barrel_export\nfor each imported name\nadd ultimate def file]
    F -- No --> H[Add resolved target to relevant_files]
    G --> H
    H --> I[Add all barrel_only_files\nto relevant_files]
    I --> J[Stage relevant_files into\ntemp._edge_files]
    J --> K[INNER JOIN nodes ON _edge_files.file\nwhere kind in callable types]
    K --> L[DROP temp._edge_files]
    L --> M[all_nodes = scoped result]
    FULL --> M
    M --> N[build_call_edges\nfile_entries + all_nodes]
    N --> O[Insert computed edges into DB]
```
Reviews (2): Last reviewed commit: "fix(native): include barrel-ultimate tar..."
```rust
let mut relevant_files: HashSet<String> = file_symbols.keys().cloned().collect();
for (rel_path, symbols) in file_symbols {
    let abs_file = Path::new(&import_ctx.root_dir).join(rel_path);
    let abs_str = abs_file.to_str().unwrap_or("");
    for imp in &symbols.imports {
        let resolved = import_ctx.get_resolved(abs_str, &imp.source);
        if !resolved.is_empty() {
            relevant_files.insert(resolved);
        }
    }
```
Barrel-resolved targets missing from `relevant_files`

The scoped set only adds direct import targets, but the `FileEdgeInput` construction (lines 997–1004) resolves barrel re-exports and sets `imported_names[].file` to the ultimate source file (2+ hops away). When the edge builder then queries `nodes_by_name_and_file` for (name, ultimate_source_file), that file's nodes are absent from `all_nodes`, so the import-aware lookup returns nothing and the call edge is silently dropped — or falls through to an unscoped name-only lookup that may pick the wrong target.
Concretely: if file A (changed) imports { foo } from @/barrel (file B), and the barrel re-exports foo from file C, then relevant_files contains A and B but not C. File C's nodes are never loaded, so the A→C call edge is lost on every incremental run that touches A.
Fix: after the barrel-resolution pass, collect all target_file values from imported_names and add them to relevant_files before staging the temp table, or traverse the barrel chain during the scoping step.
Fixed in b1bdee9. The scoping loop now walks the barrel chain: when a direct import target is a barrel file, we call `resolve_barrel_export` for each imported name and add the ultimate definition file to `relevant_files`. Tested via the existing `barrel_resolution_simple` / `barrel_chain_two_levels` unit tests.
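The chain walk this fix describes can be sketched independently of the real import context. This is a toy model: `BarrelExports` and `ultimate_def_file` are hypothetical names, not the actual codegraph API.

```rust
use std::collections::{HashMap, HashSet};

// Toy barrel table standing in for the real import context: maps
// (barrel file, exported name) -> the file that re-export pulls from.
type BarrelExports = HashMap<(String, String), String>;

// Follow re-export hops until we land on a file that does not itself
// re-export `name`; that file is the one whose nodes must be loaded.
fn ultimate_def_file(barrels: &BarrelExports, start: &str, name: &str) -> String {
    let mut file = start.to_string();
    let mut seen = HashSet::new(); // guard against re-export cycles
    while let Some(next) = barrels.get(&(file.clone(), name.to_string())) {
        if !seen.insert(file.clone()) {
            break;
        }
        file = next.clone();
    }
    file
}
```

With a two-hop chain (barrel re-exports from a mid file which re-exports from the definition), the walk returns the definition file, which is exactly the file that must land in `relevant_files`.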
```rust
let _ = conn.execute_batch(
    "CREATE TEMP TABLE IF NOT EXISTS _edge_files (file TEXT NOT NULL)",
);
let _ = conn.execute("DELETE FROM _edge_files", []);
{
    let mut ins = match conn.prepare("INSERT INTO _edge_files (file) VALUES (?)") {
        Ok(s) => s,
        Err(_) => return,
    };
    for f in &relevant_files {
        let _ = ins.execute([f]);
    }
}

let sql = format!(
    "SELECT n.id, n.name, n.kind, n.file, n.line FROM nodes n \
     INNER JOIN _edge_files ef ON n.file = ef.file \
     WHERE n.{node_kind_filter}",
);
let nodes: Vec<NodeInfo> = match conn.prepare(&sql) {
    Ok(mut stmt) => stmt
        .query_map([], |row| {
            Ok(NodeInfo {
                id: row.get::<_, i64>(0)? as u32,
                name: row.get(1)?,
                kind: row.get(2)?,
                file: row.get(3)?,
                line: row.get::<_, i64>(4)? as u32,
            })
        })
        .map(|rows| rows.filter_map(|r| r.ok()).collect())
        .unwrap_or_default(),
    Err(_) => Vec::new(),
};
let _ = conn.execute_batch("DROP TABLE IF EXISTS _edge_files");
```
Inconsistent temp-table schema qualification

The existing `_analysis_files` pattern explicitly qualifies every statement with `temp._analysis_files` (lines 1096, 1106, 1127). The new `_edge_files` uses unqualified names everywhere. SQLite resolves unqualified names to the temp schema first, so this is functionally correct today, but consistency with the established pattern would reduce future confusion.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Fixed in b1bdee9. Aligned with the `_analysis_files` pattern: unqualified `CREATE TEMP TABLE` (the temp schema is the default for TEMP tables), then `temp._edge_files` on every subsequent op (DELETE, INSERT, SELECT JOIN, DROP). Parameter style is now `?1` + `rusqlite::params![...]` to match.
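As a sketch, the qualified lifecycle described here looks like this in plain SQL (table and column names are taken from the snippet above; the exact statement set is illustrative, not the diff verbatim):

```sql
-- CREATE TEMP TABLE lands in the temp schema by default, so it stays unqualified
CREATE TEMP TABLE IF NOT EXISTS _edge_files (file TEXT NOT NULL);

-- every subsequent operation names the schema explicitly
DELETE FROM temp._edge_files;
INSERT INTO temp._edge_files (file) VALUES (?1);
SELECT n.id, n.name, n.kind, n.file, n.line
  FROM nodes n
  INNER JOIN temp._edge_files ef ON n.file = ef.file;
DROP TABLE IF EXISTS temp._edge_files;
```

Qualifying with `temp.` makes the intent explicit even though SQLite would resolve the unqualified name to the temp schema first anyway.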
```rust
    Vec::new()
} else {
    let _ = conn.execute_batch(
        "CREATE TEMP TABLE IF NOT EXISTS _edge_files (file TEXT NOT NULL)",
```
Fixed in b1bdee9. Added `CREATE INDEX IF NOT EXISTS _edge_files_file_idx ON _edge_files (file)` in the same `execute_batch` as the `CREATE TABLE`, so the `INNER JOIN` against `nodes.file` becomes an index lookup instead of a linear scan.
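A sketch of the batched DDL this describes (identifier names are taken from the comment; presented for illustration, not verbatim from the diff):

```sql
CREATE TEMP TABLE IF NOT EXISTS _edge_files (file TEXT NOT NULL);
CREATE INDEX IF NOT EXISTS _edge_files_file_idx ON _edge_files (file);
```

Both statements go through one `execute_batch` call, so the table is never visible without its index.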
Codegraph Impact Analysis

2 functions changed → 1 caller affected across 1 file
Mirror the JS `loadNodes` gate in `src/domain/graph/builder/stages/build-edges.ts`: only scope call-edge node loading when the codebase is large enough (existing file-node count > 20) and the incremental change set is small (<= 5 files). Tiny fixtures skip the scoped path entirely: the savings are negligible at that scale, and the scoped set can miss transitively-required nodes needed for receiver-type resolution, breaking the incremental-edge-parity tests (e.g. `main -> Calculator.compute`).

Also include `barrel_only_files` in the scoped set to match JS exactly. Fixes the "new export added" parity regression introduced by #976.

Impact: 1 function changed, 1 affected
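A minimal sketch of that gate as a pure predicate. The thresholds come from the commit message; the function and parameter names are hypothetical, not the actual code.

```rust
// Scope node loading only when the repo is big enough for the savings to
// matter and the change set is small enough that the temp table stays tiny;
// everything else falls through to the full-load path.
fn should_scope_node_loading(
    is_incremental: bool,
    changed_files: usize,
    existing_file_nodes: usize,
) -> bool {
    is_incremental && changed_files <= 5 && existing_file_nodes > 20
}
```

Note this is a stricter gate than the `< 200` threshold in the original PR description; the commit tightened it to keep tiny fixtures on the full-load path.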
Addresses review feedback on #976:

- greptile P1: `relevant_files` only included direct import targets. Barrel re-exports resolve `imported_names[].file` to the ultimate definition file 2+ hops away, so that file's nodes were absent from `all_nodes` and the A->C call edge was silently dropped. Now we walk the barrel chain during scoping and add every resolved ultimate file.
- greptile P2: align temp-table usage with the existing `_analysis_files` pattern (qualified `temp.` on all non-CREATE ops) and add a covering index on `_edge_files.file` so the `INNER JOIN` is a lookup.

Impact: 1 function changed, 1 affected
@greptileai addressed all three findings in b1bdee9 (barrel ultimate-target scoping + temp schema qualification + covering index on `_edge_files.file`).
Summary
`build_and_insert_call_edges` loads every node in the graph on every invocation, including incremental builds where only a handful of files changed. On the codegraph repo (~13k nodes), that deserialization alone dominates the work the builder actually does. The fix stages the relevant files in a temp table and uses an `INNER JOIN` against `nodes`.

Why
On incremental builds, the JS side already has scoped node loading in `build-edges.ts` (the `relevantFiles` pattern with `addLazyFallback`), but the Rust orchestrator still loads all nodes here. This closes that gap in the native fast path.

Safety
The scoped set is a strict superset of the nodes the call-edge builder needs: every changed/reverse-dep file plus every file any of them imports. Cross-file call resolution still sees every target it could previously — the SQL filter only drops unrelated files, not reachable ones.
Test plan
- `cargo check -p codegraph-core` passes
- `cargo test --lib build_pipeline` / `cargo test --lib edge_builder` pass
- Edge parity vs `main` — should be unchanged
- Build time vs `main` — should drop by the node-load portion (call-edge phase)