
perf(native): scope node loading in call-edge builder for incremental builds #976

Open
carlos-alm wants to merge 3 commits into main from perf/rust-scoped-node-loading

Conversation

@carlos-alm
Contributor

Summary

  • build_and_insert_call_edges loads every node in the graph on every invocation, including incremental builds where only a handful of files changed. On the codegraph repo (~13k nodes), that deserialization alone dominates the work the builder actually does.
  • This PR scopes node loading on incremental builds to files being processed + their resolved import targets. The import context already knows the resolved paths, so we stage the set in a temp table and INNER JOIN against nodes.
  • Full builds are unchanged — there is no smaller set to work with. Very large incremental builds (>200 changed files) also fall through to the full-load path to avoid temp-table overhead outweighing the savings.
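The temp-table + INNER JOIN approach from the summary can be illustrated with an in-memory SQLite database. This is a minimal Python sketch of the idea, not the Rust implementation; the file names and node rows are made up, while the `nodes` / `_edge_files` table shapes follow the PR:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (id INTEGER, name TEXT, kind TEXT, file TEXT, line INTEGER);
INSERT INTO nodes VALUES
  (1, 'main',    'function', 'src/app.ts',   1),
  (2, 'compute', 'method',   'src/calc.ts',  5),
  (3, 'helper',  'function', 'src/other.ts', 9);
""")

# Incremental build: only src/app.ts changed, and it imports src/calc.ts.
relevant_files = {"src/app.ts", "src/calc.ts"}

# Stage the scoped set in a temp table and INNER JOIN against nodes
# instead of deserializing every node in the graph.
conn.execute("CREATE TEMP TABLE _edge_files (file TEXT NOT NULL)")
conn.executemany("INSERT INTO _edge_files (file) VALUES (?)",
                 [(f,) for f in relevant_files])
rows = conn.execute(
    "SELECT n.name, n.file FROM nodes n "
    "INNER JOIN temp._edge_files ef ON n.file = ef.file"
).fetchall()
print(sorted(rows))  # src/other.ts is filtered out
```

The unrelated `src/other.ts` node never leaves the database, which is the entire saving on a small edit.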

Why

On incremental builds, the JS side already has scoped node loading in build-edges.ts (the relevantFiles pattern with addLazyFallback), but the Rust orchestrator still loads all nodes here. This closes that gap in the native fast path.

Safety

The scoped set is a strict superset of the nodes the call-edge builder needs: every changed/reverse-dep file plus every file any of them imports. Cross-file call resolution still sees every target it could previously — the SQL filter only drops unrelated files, not reachable ones.
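The superset argument reduces to a small set construction. A sketch under assumed data shapes (the `imports` map and `scoped_set` helper are hypothetical names for illustration, not the Rust code):

```python
# Hypothetical import map: file -> files it imports (resolved paths).
imports = {
    "a.ts": ["lib.ts"],
    "b.ts": ["lib.ts", "util.ts"],
    "c.ts": [],
}

def scoped_set(processed_files, imports):
    # Every changed / reverse-dep file being processed...
    relevant = set(processed_files)
    # ...plus every resolved import target of those files.
    for f in processed_files:
        relevant.update(imports.get(f, []))
    return relevant

s = scoped_set(["a.ts", "b.ts"], imports)
print(sorted(s))  # ['a.ts', 'b.ts', 'lib.ts', 'util.ts']
```

Every cross-file call lookup starts from a processed file and targets one of its import targets, so each lookup target is in the set by construction.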

Test plan

  • cargo check -p codegraph-core passes
  • cargo test --lib build_pipeline / cargo test --lib edge_builder pass
  • Benchmark full build vs current main — should be unchanged
  • Benchmark 1-file incremental vs current main — should drop by the node-load portion (call-edge phase)

… builds

On incremental builds, `build_and_insert_call_edges` was loading every
node in the graph (~13k rows in codegraph itself) just to resolve a
handful of changed files' call sites. The deserialization cost dominates
the actual edge-building work for small edits.

Scope the SQL to only the files being processed plus their resolved
import targets. The import context already knows the resolved target
paths, so we stage them in a temp table and `INNER JOIN` against the
nodes table. Full builds still load every node (there is no smaller
set to work with).

Guarded on `is_incremental && file_symbols.len() < 200` so the scoped
path only kicks in when it actually wins — very large incremental
builds fall through to the full-load path to avoid temp-table overhead.

No behavior change: the scoped set is a strict superset of the nodes
the call-edge builder needs (changed files + their import targets).
Cross-file call resolution still sees every target it could previously.
@claude

claude bot commented Apr 20, 2026

Claude finished @carlos-alm's task


PR Review Analysis

Todo List

  • Phase 1: Root Cause Analysis - understand the performance problem
  • Phase 2: Configuration Change Detection - check for config modifications
  • Phase 3: Test Integrity - verify test completeness
  • Phase 4: Code Quality - review the Rust implementation
  • Phase 5: Backlog Compliance - evaluate against project criteria
  • Final Assessment - structured recommendation

Root Cause Analysis

Problem Being Solved: build_and_insert_call_edges loads every node in the graph (~13k nodes) on every invocation, including incremental builds where only a handful of files changed. Node deserialization dominates the actual work on small incremental builds.

Why This Approach: The optimization scopes node loading on incremental builds to only files being processed + their resolved import targets. The import context already has resolved paths, so the solution stages them in a temp table and uses INNER JOIN for efficient filtering. This mirrors the existing JS-side optimization (relevantFiles pattern) but closes the gap in the Rust native fast path.

Risk Assessment: Low risk - the scoped set is a strict superset of nodes the call-edge builder needs. Full builds and large incremental builds (>200 files) fall through to existing full-load behavior.

Backlog Compliance

  • Zero-dep: ✓ - No new runtime dependencies, builds on existing SQLite/Rust infrastructure
  • Foundation-aligned: ✓ - Pure performance optimization maintaining local-first architecture
  • Problem-fit: 4/5 - Directly reduces wasteful work that impacts AI agent performance during incremental analysis
  • Breaking: No - Internal optimization with no API or behavior changes
  • Tier: 1

Critical Concerns

Minor Issue: Line 862 uses a hardcoded 200 threshold instead of the FAST_PATH_MAX_CHANGED_FILES constant, which would be more consistent.

Final Recommendation

  • Rating: ⭐⭐⭐⭐☆ (4/5)
  • Action: APPROVE WITH CONDITIONS
  • Reasoning: Excellent performance optimization targeting a real bottleneck. Well-implemented with proper fallbacks and safety. Only minor concern is hardcoded threshold constant. The optimization directly benefits AI agent workflows by reducing unnecessary node loading overhead during incremental builds.

Condition: Consider using FAST_PATH_MAX_CHANGED_FILES constant instead of hardcoded 200 on line 862.


@greptile-apps
Contributor

greptile-apps bot commented Apr 20, 2026

Greptile Summary

This PR adds an incremental fast path to build_and_insert_call_edges that scopes node loading to the changed files plus their resolved import targets (including barrel chain resolution), instead of loading every node on every incremental run. The gate mirrors the existing JS build-edges.ts logic and is bounded to ≤ 5 changed files on codebases with > 20 file nodes. All three issues from the previous review round — barrel-ultimate-file inclusion, temp-table schema qualification, and the missing file-column index — are addressed in this revision.

Confidence Score: 5/5

Safe to merge — all prior P1 concerns are resolved and only two minor P2 style nits remain

The barrel-chain fix, schema qualification, and index addition correctly address the three previous blocking comments. The scoped path is a strict superset of the required node set by construction. Remaining findings are cosmetic and do not affect correctness or data integrity.

No files require special attention

Important Files Changed

crates/codegraph-core/src/build_pipeline.rs: Scopes node loading in build_and_insert_call_edges to changed files + resolved import targets on small incremental builds; barrel chain resolution, temp-table schema qualification, and the file-column index are all correctly implemented

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[build_and_insert_call_edges] --> B{is_incremental?}
    B -- No --> FULL[Full load: SELECT all callable nodes]
    B -- Yes --> C{file_symbols.len <= 5\nAND existing_files > 20?}
    C -- No --> FULL
    C -- Yes --> D[Build relevant_files\nfrom file_symbols keys]
    D --> E[For each changed file's imports:\nresolve target path]
    E --> F{Target is barrel file?}
    F -- Yes --> G[resolve_barrel_export\nfor each imported name\nadd ultimate def file]
    F -- No --> H[Add resolved target to relevant_files]
    G --> H
    H --> I[Add all barrel_only_files\nto relevant_files]
    I --> J[Stage relevant_files into\ntemp._edge_files]
    J --> K[INNER JOIN nodes ON _edge_files.file\nwhere kind in callable types]
    K --> L[DROP temp._edge_files]
    L --> M[all_nodes = scoped result]
    FULL --> M
    M --> N[build_call_edges\nfile_entries + all_nodes]
    N --> O[Insert computed edges into DB]

Reviews (2): Last reviewed commit: "fix(native): include barrel-ultimate tar..."

Comment on lines +863 to +872
let mut relevant_files: HashSet<String> = file_symbols.keys().cloned().collect();
for (rel_path, symbols) in file_symbols {
    let abs_file = Path::new(&import_ctx.root_dir).join(rel_path);
    let abs_str = abs_file.to_str().unwrap_or("");
    for imp in &symbols.imports {
        let resolved = import_ctx.get_resolved(abs_str, &imp.source);
        if !resolved.is_empty() {
            relevant_files.insert(resolved);
        }
    }
Contributor

P1 Barrel-resolved targets missing from relevant_files

The scoped set only adds direct import targets, but the FileEdgeInput construction (lines 997–1004) resolves barrel re-exports and sets imported_names[].file to the ultimate source file (2+ hops away). When the edge builder then queries nodes_by_name_and_file for (name, ultimate_source_file), that file's nodes are absent from all_nodes, so the import-aware lookup returns nothing and the call edge is silently dropped — or falls through to an unscoped name-only lookup that may pick the wrong target.

Concretely: if file A (changed) imports { foo } from @/barrel (file B), and the barrel re-exports foo from file C, then relevant_files contains A and B but not C. File C's nodes are never loaded, so the A→C call edge is lost on every incremental run that touches A.

Fix: after the barrel-resolution pass, collect all target_file values from imported_names and add them to relevant_files before staging the temp table, or traverse the barrel chain during the scoping step.
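The suggested fix amounts to following the re-export chain to the ultimate definition file during scoping. A minimal sketch with hypothetical data shapes (the `re_exports` map is invented for illustration; `resolve_barrel_export` reuses the name from the thread):

```python
# Hypothetical barrel map: (barrel_file, exported_name) -> file it re-exports from.
re_exports = {
    ("barrel.ts", "foo"): "c.ts",  # barrel re-exports foo from c.ts
}

def resolve_barrel_export(file, name):
    # Follow re-export hops until we land on the defining file.
    while (file, name) in re_exports:
        file = re_exports[(file, name)]
    return file

# a.ts imports { foo } from barrel.ts: the scoped set must include a.ts,
# the barrel, AND the ultimate definition file, or the a -> c call edge
# is silently dropped on every incremental run that touches a.ts.
relevant_files = {"a.ts", "barrel.ts"}
relevant_files.add(resolve_barrel_export("barrel.ts", "foo"))
print(sorted(relevant_files))  # ['a.ts', 'barrel.ts', 'c.ts']
```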


Contributor Author

Fixed in b1bdee9. The scoping loop now walks the barrel chain: when a direct import target is a barrel file, we call resolve_barrel_export for each imported name and add the ultimate definition file to relevant_files. Tested via the existing barrel_resolution_simple / barrel_chain_two_levels unit tests.

Comment on lines +878 to +912
let _ = conn.execute_batch(
    "CREATE TEMP TABLE IF NOT EXISTS _edge_files (file TEXT NOT NULL)",
);
let _ = conn.execute("DELETE FROM _edge_files", []);
{
    let mut ins = match conn.prepare("INSERT INTO _edge_files (file) VALUES (?)") {
        Ok(s) => s,
        Err(_) => return,
    };
    for f in &relevant_files {
        let _ = ins.execute([f]);
    }
}

let sql = format!(
    "SELECT n.id, n.name, n.kind, n.file, n.line FROM nodes n \
     INNER JOIN _edge_files ef ON n.file = ef.file \
     WHERE n.{node_kind_filter}",
);
let nodes: Vec<NodeInfo> = match conn.prepare(&sql) {
    Ok(mut stmt) => stmt
        .query_map([], |row| {
            Ok(NodeInfo {
                id: row.get::<_, i64>(0)? as u32,
                name: row.get(1)?,
                kind: row.get(2)?,
                file: row.get(3)?,
                line: row.get::<_, i64>(4)? as u32,
            })
        })
        .map(|rows| rows.filter_map(|r| r.ok()).collect())
        .unwrap_or_default(),
    Err(_) => Vec::new(),
};
let _ = conn.execute_batch("DROP TABLE IF EXISTS _edge_files");
Contributor

P2 Inconsistent temp-table schema qualification

The existing _analysis_files pattern explicitly qualifies every statement with temp._analysis_files (lines 1096, 1106, 1127). The new _edge_files uses unqualified names everywhere. SQLite resolves unqualified names to the temp schema first, so this is functionally correct today, but consistency with the established pattern would reduce future confusion.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
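The qualification convention the reviewer references can be shown directly against SQLite. A minimal Python sqlite3 sketch of the pattern (not the Rust code; table names follow the PR):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (file TEXT)")
conn.execute("INSERT INTO nodes VALUES ('a.ts'), ('b.ts')")

# CREATE TEMP TABLE lands in the temp schema by default; every later
# statement then names temp._edge_files explicitly, matching the
# established _analysis_files convention.
conn.execute("CREATE TEMP TABLE IF NOT EXISTS _edge_files (file TEXT NOT NULL)")
conn.execute("DELETE FROM temp._edge_files")
conn.execute("INSERT INTO temp._edge_files (file) VALUES (?)", ("a.ts",))
rows = conn.execute(
    "SELECT n.file FROM nodes n "
    "INNER JOIN temp._edge_files ef ON n.file = ef.file"
).fetchall()
print(rows)  # [('a.ts',)]
conn.execute("DROP TABLE IF EXISTS temp._edge_files")
```

SQLite resolves unqualified names to the temp schema first, so both spellings behave identically; the explicit `temp.` prefix only makes the intent visible.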


Contributor Author

Fixed in b1bdee9. Aligned with the _analysis_files pattern: unqualified CREATE TEMP TABLE (temp schema is the default for TEMP tables), then temp._edge_files on every subsequent op (DELETE, INSERT, SELECT JOIN, DROP). Parameter style is now ?1 + rusqlite::params![...] to match.

    Vec::new()
} else {
    let _ = conn.execute_batch(
        "CREATE TEMP TABLE IF NOT EXISTS _edge_files (file TEXT NOT NULL)",
Contributor

P2 No index on _edge_files.file

The INNER JOIN scans _edge_files once per row in nodes. For fewer than 200 files this is negligible, but adding a covering index takes one line and removes any future scaling concern.
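The one-line index looks like this in SQLite (a Python sqlite3 illustration; the index name is the one given in the author's follow-up reply):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (file TEXT);
INSERT INTO nodes VALUES ('a.ts'), ('b.ts'), ('c.ts');
CREATE TEMP TABLE _edge_files (file TEXT NOT NULL);
-- Covering index so the JOIN probe is a lookup, not a linear scan.
CREATE INDEX IF NOT EXISTS _edge_files_file_idx ON _edge_files (file);
INSERT INTO _edge_files VALUES ('a.ts'), ('c.ts');
""")
rows = conn.execute(
    "SELECT n.file FROM nodes n "
    "INNER JOIN temp._edge_files ef ON n.file = ef.file "
    "ORDER BY n.file"
).fetchall()
print(rows)  # [('a.ts',), ('c.ts',)]
```

An index on a TEMP table is itself created in the temp schema and is dropped with the table, so the cleanup path is unchanged.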


Contributor Author

Fixed in b1bdee9. Added CREATE INDEX IF NOT EXISTS _edge_files_file_idx ON _edge_files (file) in the same execute_batch as the CREATE TABLE, so the INNER JOIN against nodes.file becomes an index lookup instead of a linear scan.

@github-actions
Contributor

github-actions bot commented Apr 20, 2026

Codegraph Impact Analysis

2 functions changed, 1 caller affected across 1 file

  • run_pipeline in crates/codegraph-core/src/build_pipeline.rs:102 (0 transitive callers)
  • build_and_insert_call_edges in crates/codegraph-core/src/build_pipeline.rs:857 (1 transitive caller)

Mirror the JS `loadNodes` gate in `src/domain/graph/builder/stages/build-edges.ts`:
only scope call-edge node loading when the codebase is large enough
(existing file-node count > 20) and the incremental change set is small
(<= 5 files). Tiny fixtures skip the scoped path entirely — the savings
are negligible at that scale and the scoped set can miss transitively-
required nodes needed for receiver-type resolution, breaking the
incremental-edge-parity tests (e.g. `main -> Calculator.compute`).

Also include `barrel_only_files` in the scoped set to match JS exactly.

Fixes the "new export added" parity regression introduced by #976.

Impact: 1 function changed, 1 affected
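The gate described in that commit reduces to a small predicate. A sketch with the constant names taken from the thread (the helper name is hypothetical):

```python
FAST_PATH_MAX_CHANGED_FILES = 5    # incremental change set must be small
FAST_PATH_MIN_EXISTING_FILES = 20  # codebase must be large enough to bother

def use_scoped_path(is_incremental, changed_files, existing_file_nodes):
    # Scoped node loading only when it actually wins: small edit, large graph.
    return (is_incremental
            and changed_files <= FAST_PATH_MAX_CHANGED_FILES
            and existing_file_nodes > FAST_PATH_MIN_EXISTING_FILES)

print(use_scoped_path(True, 3, 500))   # small edit on a large codebase
print(use_scoped_path(True, 3, 10))    # tiny fixture: skip scoping
print(use_scoped_path(False, 3, 500))  # full build: load everything
```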
Addresses review feedback on #976:

- greptile P1: relevant_files only included direct import targets. Barrel
  re-exports resolve imported_names[].file to the ultimate definition
  file 2+ hops away, so that file's nodes were absent from all_nodes and
  the A->C call edge was silently dropped. Now we walk the barrel chain
  during scoping and add every resolved ultimate file.
- greptile P2: align temp-table usage with the existing _analysis_files
  pattern (qualified temp. on all non-CREATE ops) and add a covering
  index on _edge_files.file so the INNER JOIN is a lookup.

Impact: 1 function changed, 1 affected
@carlos-alm
Contributor Author

@greptileai addressed all three findings in b1bdee9 (barrel ultimate-target scoping + temp schema qualification + _edge_files index) and 5cadf97 (gate scoped path on FAST_PATH_MAX_CHANGED_FILES + FAST_PATH_MIN_EXISTING_FILES to match the JS loadNodes gates). Please re-review.
