
perf(native): scope node loading in call-edge builder for incremental builds #976

Open
carlos-alm wants to merge 3 commits into main from perf/rust-scoped-node-loading

Conversation

@carlos-alm
Contributor

Summary

  • build_and_insert_call_edges loads every node in the graph on every invocation, including incremental builds where only a handful of files changed. On the codegraph repo (~13k nodes), that deserialization alone dominates the work the builder actually does.
  • This PR scopes node loading on incremental builds to files being processed + their resolved import targets. The import context already knows the resolved paths, so we stage the set in a temp table and INNER JOIN against nodes.
  • Full builds are unchanged — there is no smaller set to work with. Very large incremental builds (>200 changed files) also fall through to the full-load path to avoid temp-table overhead outweighing the savings.
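The temp-table + INNER JOIN approach from the summary can be illustrated with an in-memory SQLite database. This is a minimal Python sketch of the idea, not the Rust implementation; the file names and node rows are made up, while the `nodes` / `_edge_files` table shapes follow the PR:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (id INTEGER, name TEXT, kind TEXT, file TEXT, line INTEGER);
INSERT INTO nodes VALUES
  (1, 'main',    'function', 'src/app.ts',   1),
  (2, 'compute', 'method',   'src/calc.ts',  5),
  (3, 'helper',  'function', 'src/other.ts', 9);
""")

# Incremental build: only src/app.ts changed, and it imports src/calc.ts.
relevant_files = {"src/app.ts", "src/calc.ts"}

# Stage the scoped set in a temp table and INNER JOIN against nodes
# instead of deserializing every node in the graph.
conn.execute("CREATE TEMP TABLE _edge_files (file TEXT NOT NULL)")
conn.executemany("INSERT INTO _edge_files (file) VALUES (?)",
                 [(f,) for f in relevant_files])
rows = conn.execute(
    "SELECT n.name, n.file FROM nodes n "
    "INNER JOIN temp._edge_files ef ON n.file = ef.file"
).fetchall()
print(sorted(rows))  # src/other.ts is filtered out
```

The unrelated `src/other.ts` node never leaves the database, which is the entire saving on a small edit.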

Why

On incremental builds, the JS side already has scoped node loading in build-edges.ts (the relevantFiles pattern with addLazyFallback), but the Rust orchestrator still loads all nodes here. This closes that gap in the native fast path.

Safety

The scoped set is a strict superset of the nodes the call-edge builder needs: every changed/reverse-dep file plus every file any of them imports. Cross-file call resolution still sees every target it could previously — the SQL filter only drops unrelated files, not reachable ones.
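The superset argument reduces to a small set construction. A sketch under assumed data shapes (the `imports` map and `scoped_set` helper are hypothetical names for illustration, not the Rust code):

```python
# Hypothetical import map: file -> files it imports (resolved paths).
imports = {
    "a.ts": ["lib.ts"],
    "b.ts": ["lib.ts", "util.ts"],
    "c.ts": [],
}

def scoped_set(processed_files, imports):
    # Every changed / reverse-dep file being processed...
    relevant = set(processed_files)
    # ...plus every resolved import target of those files.
    for f in processed_files:
        relevant.update(imports.get(f, []))
    return relevant

s = scoped_set(["a.ts", "b.ts"], imports)
print(sorted(s))  # ['a.ts', 'b.ts', 'lib.ts', 'util.ts']
```

Every cross-file call lookup starts from a processed file and targets one of its import targets, so each lookup target is in the set by construction.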

Test plan

  • cargo check -p codegraph-core passes
  • cargo test --lib build_pipeline / cargo test --lib edge_builder pass
  • Benchmark full build vs current main — should be unchanged
  • Benchmark 1-file incremental vs current main — should drop by the node-load portion (call-edge phase)

… builds

On incremental builds, `build_and_insert_call_edges` was loading every
node in the graph (~13k rows in codegraph itself) just to resolve a
handful of changed files' call sites. The deserialization cost dominates
the actual edge-building work for small edits.

Scope the SQL to only the files being processed plus their resolved
import targets. The import context already knows the resolved target
paths, so we stage them in a temp table and `INNER JOIN` against the
nodes table. Full builds still load every node (there is no smaller
set to work with).

Guarded on `is_incremental && file_symbols.len() < 200` so the scoped
path only kicks in when it actually wins — very large incremental
builds fall through to the full-load path to avoid temp-table overhead.

No behavior change: the scoped set is a strict superset of the nodes
the call-edge builder needs (changed files + their import targets).
Cross-file call resolution still sees every target it could previously.
@claude

claude bot commented Apr 20, 2026

Claude finished @carlos-alm's task


PR Review Analysis

Todo List

  • Phase 1: Root Cause Analysis - understand the performance problem
  • Phase 2: Configuration Change Detection - check for config modifications
  • Phase 3: Test Integrity - verify test completeness
  • Phase 4: Code Quality - review the Rust implementation
  • Phase 5: Backlog Compliance - evaluate against project criteria
  • Final Assessment - structured recommendation

Root Cause Analysis

Problem Being Solved: build_and_insert_call_edges loads every node in the graph (~13k nodes) on every invocation, including incremental builds where only a handful of files changed. Node deserialization dominates the actual work on small incremental builds.

Why This Approach: The optimization scopes node loading on incremental builds to only files being processed + their resolved import targets. The import context already has resolved paths, so the solution stages them in a temp table and uses INNER JOIN for efficient filtering. This mirrors the existing JS-side optimization (relevantFiles pattern) but closes the gap in the Rust native fast path.

Risk Assessment: Low risk - the scoped set is a strict superset of nodes the call-edge builder needs. Full builds and large incremental builds (>200 files) fall through to existing full-load behavior.

Backlog Compliance

  • Zero-dep: ✓ - No new runtime dependencies, builds on existing SQLite/Rust infrastructure
  • Foundation-aligned: ✓ - Pure performance optimization maintaining local-first architecture
  • Problem-fit: 4/5 - Directly reduces wasteful work that impacts AI agent performance during incremental analysis
  • Breaking: No - Internal optimization with no API or behavior changes
  • Tier: 1

Critical Concerns

Minor Issue: Line 862 uses a hardcoded 200 threshold instead of the FAST_PATH_MAX_CHANGED_FILES constant, which would be more consistent.

Final Recommendation

  • Rating: ⭐⭐⭐⭐☆ (4/5)
  • Action: APPROVE WITH CONDITIONS
  • Reasoning: Excellent performance optimization targeting a real bottleneck. Well-implemented with proper fallbacks and safety. Only minor concern is hardcoded threshold constant. The optimization directly benefits AI agent workflows by reducing unnecessary node loading overhead during incremental builds.

Condition: Consider using FAST_PATH_MAX_CHANGED_FILES constant instead of hardcoded 200 on line 862.


@greptile-apps
Contributor

greptile-apps bot commented Apr 20, 2026

Greptile Summary

This PR adds an incremental fast path to build_and_insert_call_edges that scopes node loading to the changed files plus their resolved import targets (including barrel chain resolution), instead of loading every node on every incremental run. The gate mirrors the existing JS build-edges.ts logic and is bounded to ≤ 5 changed files on codebases with > 20 file nodes. All three issues from the previous review round — barrel-ultimate-file inclusion, temp-table schema qualification, and the missing file-column index — are addressed in this revision.

Confidence Score: 5/5

Safe to merge — all prior P1 concerns are resolved and only two minor P2 style nits remain

The barrel-chain fix, schema qualification, and index addition correctly address the three previous blocking comments. The scoped path is a strict superset of the required node set by construction. Remaining findings are cosmetic and do not affect correctness or data integrity.

No files require special attention

Important Files Changed

crates/codegraph-core/src/build_pipeline.rs: Scopes node loading in build_and_insert_call_edges to changed files + resolved import targets on small incremental builds; barrel chain resolution, temp-table schema qualification, and the file-column index are all correctly implemented

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[build_and_insert_call_edges] --> B{is_incremental?}
    B -- No --> FULL[Full load: SELECT all callable nodes]
    B -- Yes --> C{file_symbols.len <= 5\nAND existing_files > 20?}
    C -- No --> FULL
    C -- Yes --> D[Build relevant_files\nfrom file_symbols keys]
    D --> E[For each changed file's imports:\nresolve target path]
    E --> F{Target is barrel file?}
    F -- Yes --> G[resolve_barrel_export\nfor each imported name\nadd ultimate def file]
    F -- No --> H[Add resolved target to relevant_files]
    G --> H
    H --> I[Add all barrel_only_files\nto relevant_files]
    I --> J[Stage relevant_files into\ntemp._edge_files]
    J --> K[INNER JOIN nodes ON _edge_files.file\nwhere kind in callable types]
    K --> L[DROP temp._edge_files]
    L --> M[all_nodes = scoped result]
    FULL --> M
    M --> N[build_call_edges\nfile_entries + all_nodes]
    N --> O[Insert computed edges into DB]

Reviews (2): Last reviewed commit: "fix(native): include barrel-ultimate tar..."

Comment on lines +863 to +872
let mut relevant_files: HashSet<String> = file_symbols.keys().cloned().collect();
for (rel_path, symbols) in file_symbols {
    let abs_file = Path::new(&import_ctx.root_dir).join(rel_path);
    let abs_str = abs_file.to_str().unwrap_or("");
    for imp in &symbols.imports {
        let resolved = import_ctx.get_resolved(abs_str, &imp.source);
        if !resolved.is_empty() {
            relevant_files.insert(resolved);
        }
    }
Contributor

P1 Barrel-resolved targets missing from relevant_files

The scoped set only adds direct import targets, but the FileEdgeInput construction (lines 997–1004) resolves barrel re-exports and sets imported_names[].file to the ultimate source file (2+ hops away). When the edge builder then queries nodes_by_name_and_file for (name, ultimate_source_file), that file's nodes are absent from all_nodes, so the import-aware lookup returns nothing and the call edge is silently dropped — or falls through to an unscoped name-only lookup that may pick the wrong target.

Concretely: if file A (changed) imports { foo } from @/barrel (file B), and the barrel re-exports foo from file C, then relevant_files contains A and B but not C. File C's nodes are never loaded, so the A→C call edge is lost on every incremental run that touches A.

Fix: after the barrel-resolution pass, collect all target_file values from imported_names and add them to relevant_files before staging the temp table, or traverse the barrel chain during the scoping step.
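The suggested fix amounts to following the re-export chain to the ultimate definition file during scoping. A minimal sketch with hypothetical data shapes (the `re_exports` map is invented for illustration; `resolve_barrel_export` reuses the name from the thread):

```python
# Hypothetical barrel map: (barrel_file, exported_name) -> file it re-exports from.
re_exports = {
    ("barrel.ts", "foo"): "c.ts",  # barrel re-exports foo from c.ts
}

def resolve_barrel_export(file, name):
    # Follow re-export hops until we land on the defining file.
    while (file, name) in re_exports:
        file = re_exports[(file, name)]
    return file

# a.ts imports { foo } from barrel.ts: the scoped set must include a.ts,
# the barrel, AND the ultimate definition file, or the a -> c call edge
# is silently dropped on every incremental run that touches a.ts.
relevant_files = {"a.ts", "barrel.ts"}
relevant_files.add(resolve_barrel_export("barrel.ts", "foo"))
print(sorted(relevant_files))  # ['a.ts', 'barrel.ts', 'c.ts']
```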


Contributor Author

Fixed in b1bdee9. The scoping loop now walks the barrel chain: when a direct import target is a barrel file, we call resolve_barrel_export for each imported name and add the ultimate definition file to relevant_files. Tested via the existing barrel_resolution_simple / barrel_chain_two_levels unit tests.

Comment on lines +878 to +912
let _ = conn.execute_batch(
    "CREATE TEMP TABLE IF NOT EXISTS _edge_files (file TEXT NOT NULL)",
);
let _ = conn.execute("DELETE FROM _edge_files", []);
{
    let mut ins = match conn.prepare("INSERT INTO _edge_files (file) VALUES (?)") {
        Ok(s) => s,
        Err(_) => return,
    };
    for f in &relevant_files {
        let _ = ins.execute([f]);
    }
}

let sql = format!(
    "SELECT n.id, n.name, n.kind, n.file, n.line FROM nodes n \
     INNER JOIN _edge_files ef ON n.file = ef.file \
     WHERE n.{node_kind_filter}",
);
let nodes: Vec<NodeInfo> = match conn.prepare(&sql) {
    Ok(mut stmt) => stmt
        .query_map([], |row| {
            Ok(NodeInfo {
                id: row.get::<_, i64>(0)? as u32,
                name: row.get(1)?,
                kind: row.get(2)?,
                file: row.get(3)?,
                line: row.get::<_, i64>(4)? as u32,
            })
        })
        .map(|rows| rows.filter_map(|r| r.ok()).collect())
        .unwrap_or_default(),
    Err(_) => Vec::new(),
};
let _ = conn.execute_batch("DROP TABLE IF EXISTS _edge_files");
Contributor

P2 Inconsistent temp-table schema qualification

The existing _analysis_files pattern explicitly qualifies every statement with temp._analysis_files (lines 1096, 1106, 1127). The new _edge_files uses unqualified names everywhere. SQLite resolves unqualified names to the temp schema first, so this is functionally correct today, but consistency with the established pattern would reduce future confusion.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
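The qualification convention the reviewer references can be shown directly against SQLite. A minimal Python sqlite3 sketch of the pattern (not the Rust code; table names follow the PR):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (file TEXT)")
conn.execute("INSERT INTO nodes VALUES ('a.ts'), ('b.ts')")

# CREATE TEMP TABLE lands in the temp schema by default; every later
# statement then names temp._edge_files explicitly, matching the
# established _analysis_files convention.
conn.execute("CREATE TEMP TABLE IF NOT EXISTS _edge_files (file TEXT NOT NULL)")
conn.execute("DELETE FROM temp._edge_files")
conn.execute("INSERT INTO temp._edge_files (file) VALUES (?)", ("a.ts",))
rows = conn.execute(
    "SELECT n.file FROM nodes n "
    "INNER JOIN temp._edge_files ef ON n.file = ef.file"
).fetchall()
print(rows)  # [('a.ts',)]
conn.execute("DROP TABLE IF EXISTS temp._edge_files")
```

SQLite resolves unqualified names to the temp schema first, so both spellings behave identically; the explicit `temp.` prefix only makes the intent visible.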


Contributor Author

Fixed in b1bdee9. Aligned with the _analysis_files pattern: unqualified CREATE TEMP TABLE (temp schema is the default for TEMP tables), then temp._edge_files on every subsequent op (DELETE, INSERT, SELECT JOIN, DROP). Parameter style is now ?1 + rusqlite::params![...] to match.

    Vec::new()
} else {
    let _ = conn.execute_batch(
        "CREATE TEMP TABLE IF NOT EXISTS _edge_files (file TEXT NOT NULL)",
Contributor

P2 No index on _edge_files.file

The INNER JOIN scans _edge_files once per row in nodes. For fewer than 200 files this is negligible, but adding a covering index takes one line and removes any future scaling concern.
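The one-line index looks like this in SQLite (a Python sqlite3 illustration; the index name is the one given in the author's follow-up reply):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (file TEXT);
INSERT INTO nodes VALUES ('a.ts'), ('b.ts'), ('c.ts');
CREATE TEMP TABLE _edge_files (file TEXT NOT NULL);
-- Covering index so the JOIN probe is a lookup, not a linear scan.
CREATE INDEX IF NOT EXISTS _edge_files_file_idx ON _edge_files (file);
INSERT INTO _edge_files VALUES ('a.ts'), ('c.ts');
""")
rows = conn.execute(
    "SELECT n.file FROM nodes n "
    "INNER JOIN temp._edge_files ef ON n.file = ef.file "
    "ORDER BY n.file"
).fetchall()
print(rows)  # [('a.ts',), ('c.ts',)]
```

An index on a TEMP table is itself created in the temp schema and is dropped with the table, so the cleanup path is unchanged.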


Contributor Author

Fixed in b1bdee9. Added CREATE INDEX IF NOT EXISTS _edge_files_file_idx ON _edge_files (file) in the same execute_batch as the CREATE TABLE, so the INNER JOIN against nodes.file becomes an index lookup instead of a linear scan.

@github-actions
Contributor

github-actions bot commented Apr 20, 2026

Codegraph Impact Analysis

2 functions changed, 1 caller affected across 1 file

  • run_pipeline in crates/codegraph-core/src/build_pipeline.rs:102 (0 transitive callers)
  • build_and_insert_call_edges in crates/codegraph-core/src/build_pipeline.rs:857 (1 transitive caller)

Mirror the JS `loadNodes` gate in `src/domain/graph/builder/stages/build-edges.ts`:
only scope call-edge node loading when the codebase is large enough
(existing file-node count > 20) and the incremental change set is small
(<= 5 files). Tiny fixtures skip the scoped path entirely — the savings
are negligible at that scale and the scoped set can miss transitively-
required nodes needed for receiver-type resolution, breaking the
incremental-edge-parity tests (e.g. `main -> Calculator.compute`).

Also include `barrel_only_files` in the scoped set to match JS exactly.

Fixes the "new export added" parity regression introduced by #976.

Impact: 1 function changed, 1 affected
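The gate described in that commit reduces to a small predicate. A sketch with the constant names taken from the thread (the helper name is hypothetical):

```python
FAST_PATH_MAX_CHANGED_FILES = 5    # incremental change set must be small
FAST_PATH_MIN_EXISTING_FILES = 20  # codebase must be large enough to bother

def use_scoped_path(is_incremental, changed_files, existing_file_nodes):
    # Scoped node loading only when it actually wins: small edit, large graph.
    return (is_incremental
            and changed_files <= FAST_PATH_MAX_CHANGED_FILES
            and existing_file_nodes > FAST_PATH_MIN_EXISTING_FILES)

print(use_scoped_path(True, 3, 500))   # small edit on a large codebase
print(use_scoped_path(True, 3, 10))    # tiny fixture: skip scoping
print(use_scoped_path(False, 3, 500))  # full build: load everything
```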
Addresses review feedback on #976:

- greptile P1: relevant_files only included direct import targets. Barrel
  re-exports resolve imported_names[].file to the ultimate definition
  file 2+ hops away, so that file's nodes were absent from all_nodes and
  the A->C call edge was silently dropped. Now we walk the barrel chain
  during scoping and add every resolved ultimate file.
- greptile P2: align temp-table usage with the existing _analysis_files
  pattern (qualified temp. on all non-CREATE ops) and add a covering
  index on _edge_files.file so the INNER JOIN is a lookup.

Impact: 1 function changed, 1 affected
@carlos-alm
Contributor Author

@greptileai addressed all three findings in b1bdee9 (barrel ultimate-target scoping + temp schema qualification + _edge_files index) and 5cadf97 (gate scoped path on FAST_PATH_MAX_CHANGED_FILES + FAST_PATH_MIN_EXISTING_FILES to match the JS loadNodes gates). Please re-review.
