fix(wasm): isolate tree-sitter parsing in worker thread (#965) #975

Open
carlos-alm wants to merge 4 commits into main from fix/965-wasm-worker-isolation

Conversation

@carlos-alm
Contributor

Fixes #965.

Summary

  • WASM grammar parses now run in a worker_threads worker so a V8 fatal error kills only the worker, not the whole build.
  • The worker pool detects worker exit, marks the in-flight file as skipped with a warning, respawns the worker, and continues with the rest of the batch.
  • Per-file extractor exceptions are also caught and turned into skips (previously they could leak out and abort the build).
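The isolation mechanism can be sketched with plain worker_threads. This is a minimal stand-in, not the actual pool: `parseInWorker` and the eval'd worker body are illustrative, with a file containing `CRASH` standing in for one that triggers a V8 fatal.

```typescript
import { Worker } from 'node:worker_threads';

// Run a "parse" in a worker; if the worker dies, resolve null instead
// of letting the parent process abort. The eval'd worker body simulates
// wasm-worker-entry.ts.
function parseInWorker(code: string): Promise<string | null> {
  return new Promise((resolve) => {
    const worker = new Worker(
      `const { parentPort, workerData } = require('node:worker_threads');
       if (workerData.includes('CRASH')) process.exit(1); // simulated V8 fatal
       parentPort.postMessage(workerData.toUpperCase());`,
      { eval: true, workerData: code },
    );
    worker.once('message', (result: string) => {
      resolve(result);
      void worker.terminate();
    });
    worker.once('exit', (exitCode) => {
      // Worker died before replying: skip the file instead of crashing.
      if (exitCode !== 0) resolve(null);
    });
  });
}
```

A crash in the eval'd worker kills only that thread; the parent's `exit` listener turns it into a `null` (skip) result, which is exactly the skip-and-continue behavior described above.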

Changes

  • src/domain/wasm-worker-entry.ts — worker entry that owns all tree-sitter WASM calls.
  • src/domain/wasm-worker-pool.ts — single-worker pool with crash/respawn + timeout handling.
  • src/domain/wasm-worker-protocol.ts — typed message protocol (serializable ExtractorOutput, no _tree field).
  • src/domain/parser.ts — routes WASM parses through the worker pool; preserves single-file + batch APIs.
  • tests/parsers/wasm-hardening.test.ts — coverage for the skip-and-continue behavior (single file + batch).

Why

The WASM engine reproducibly kills V8 on Windows + Node 22 when building the codegraph source repo (see #965). A fatal error in the main thread leaves the user with zero progress. Isolating parses in a worker gives us graceful degradation: the file that triggers the crash is skipped with a warning, and every other file parses normally.

Test plan

The WASM grammar can trigger uncatchable V8 fatal errors that kill
whichever thread runs it. Running parses in a worker_threads worker
means the crash kills only the worker — the pool detects the exit,
marks the in-flight file as skipped, respawns the worker, and
continues with the rest of the build.

- wasm-worker-entry.ts: worker entry that owns all tree-sitter WASM
  calls; runs extractor + AST visitor walk before tree.delete() so
  the main thread never holds a live Tree
- wasm-worker-pool.ts: single-worker pool with crash-respawn logic
- wasm-worker-protocol.ts: typed message protocol with serializable
  ExtractorOutput (no _tree field)
- parser.ts: route WASM parses through the worker pool
- tests/parsers/wasm-hardening.test.ts: coverage for skip-and-continue
  behavior on extractor errors

docs check acknowledged: internal plumbing only — no new CLI surface,
no new language support, no architectural tier changes visible to users

Impact: 80 functions changed, 30 affected
@claude

claude bot commented Apr 20, 2026



Reviewing PR: WASM Worker Isolation

Root Cause Analysis

Problem Being Solved: WASM grammar parses cause V8 fatal errors on Windows + Node 22, killing the entire build process and leaving users with zero progress.

Why This Approach: Worker thread isolation is the correct solution. It contains fatal V8 crashes to the worker thread, allowing graceful degradation: the triggering file is skipped and the build continues for all other files.

Risk Assessment: Low-risk defensive enhancement. Preserves all existing functionality while adding crash resilience.

Backlog Compliance

  • Zero-dep: ✓ — Uses Node.js built-in worker_threads, no new runtime dependencies
  • Foundation-aligned: ✓ — Fully aligned, improves reliability without violating any core principles
  • Problem-fit: 5/5 — Directly addresses AI agent reliability by preventing build crashes that waste tokens and break workflows
  • Breaking: Minimal — disposeParsers() changed sync→async, but analysis shows no external usage
  • Tier: 1 (zero-dep ✓ + foundation-aligned ✓)

Code Quality Analysis

Strengths:

  • Excellent error handling: Comprehensive try/catch/finally blocks with proper WASM memory cleanup
  • Well-structured worker protocol: Clean separation between main thread and worker with typed message interfaces
  • Proper resource management: Worker pool disposal, WASM tree cleanup, crashed file tracking
  • Backward compatibility: Preserves all existing APIs except one internal function
  • Comprehensive testing: New tests cover both single-file and batch crash scenarios

Architecture Quality:

  • Single-worker pool: appropriate initial implementation; sequential dispatch simplifies crash attribution
  • Crash isolation: onExit() handler properly detects crashes, marks problematic files, respawns worker
  • Memory safety: tree.delete() called in finally blocks to prevent WASM heap accumulation
  • Protocol design: Serializable ExtractorOutput without _tree field prevents worker boundary issues
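The "no _tree field" point can be illustrated: a live tree-sitter handle cannot be structured-cloned across the worker boundary, so only plain data crosses it. The exact `ExtractorOutput` shape here is hypothetical; the real type is richer.

```typescript
// Hypothetical shape: _tree holds a live WASM-backed handle (functions,
// pointers) that structured clone rejects at the worker boundary.
interface ExtractorOutput {
  symbols: Array<{ name: string; line: number }>;
  _tree?: unknown; // live tree-sitter Tree — must never cross postMessage
}

// Strip the non-cloneable handle before posting back to the main thread.
function toSerializable(out: ExtractorOutput): Omit<ExtractorOutput, '_tree'> {
  const { _tree, ...rest } = out;
  return rest;
}
```

`structuredClone(toSerializable(out))` succeeds, while cloning the raw object (with a function in `_tree`) throws `DataCloneError`, which is why the protocol excludes the field entirely.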

Technical Implementation:

  • Worker entry resolution: Smart handling of both compiled .js and source .ts contexts
  • Local language registry: Avoids process-global cache conflicts in worker threads
  • Error propagation: Extractor exceptions properly caught and converted to skips (not crashes)
  • Performance preservation: Lazy grammar loading, parser caching, minimal overhead

Critical Concerns

None identified. This is a well-engineered defensive enhancement.

Final Recommendation

  • Rating: ⭐⭐⭐⭐☆ (4/5)
  • Action: APPROVE
  • Reasoning: High-quality implementation solving a real production issue. Excellent error handling, proper resource management, comprehensive testing, and maintains backward compatibility. Minor deduction only because it's a defensive fix rather than core functionality improvement.

This PR directly improves AI agent reliability by preventing build crashes that waste tokens and break coding workflows. The worker isolation approach is sound and well-implemented.


@greptile-apps
Contributor

greptile-apps bot commented Apr 20, 2026

Greptile Summary

This PR isolates all tree-sitter WASM calls in a worker_threads worker so that uncatchable V8 fatal errors (reproducible on Windows + Node 22) kill only the worker, not the build process. The pool detects worker exit, skips the in-flight file with a warning, respawns lazily, and continues; a 60s watchdog prevents hangs from non-crashing infinite loops. A CLI teardown hook (disposeParsers) was also added to prevent the process from hanging after the build completes.

Confidence Score: 4/5

Safe to merge; the single P2 race only manifests when a parse completes at almost exactly the 60-second deadline.

All previously raised concerns (dead workerReady field, missing timeout, stale shutdown branch, test mocking wrong registry) have been fully addressed. The remaining finding is a theoretical race in onTimeout → onExit where the wrong in-flight job could be skipped, but it requires a >60s parse that completes at exactly the timeout boundary — rare in practice and causing only a silent file skip, not data corruption.

src/domain/wasm-worker-pool.ts — the onTimeout/onExit interaction around line 188–269.
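One way to close the onTimeout/onExit race described above is to tag each dispatched job with an id and have every completion path check it before resolving. This is a sketch of the guard idea, not the PR's actual code:

```typescript
// Each job gets a monotonically increasing id; a reply or abort only
// acts if the job it captured is still the in-flight one, so a response
// landing at the 60s deadline cannot cause the *next* job to be skipped.
type Job = { id: number; resolve: (r: string | null) => void };

class RaceGuardedSlot {
  private nextId = 0;
  private inFlight: Job | null = null;

  dispatch(resolve: (r: string | null) => void): number {
    const job: Job = { id: this.nextId++, resolve };
    this.inFlight = job;
    return job.id;
  }

  // Called when the worker replies.
  complete(id: number, result: string): boolean {
    const job = this.inFlight;
    if (!job || job.id !== id) return false; // stale reply: ignore
    this.inFlight = null;
    job.resolve(result);
    return true;
  }

  // Called by the watchdog timer or the exit handler.
  abort(id: number): boolean {
    const job = this.inFlight;
    if (!job || job.id !== id) return false; // job already completed: no-op
    this.inFlight = null;
    job.resolve(null);
    return true;
  }
}
```

With this guard, a watchdog that fires after the reply was processed becomes a no-op instead of skipping whichever job happens to be in flight.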

Important Files Changed

Filename Overview
src/domain/wasm-worker-pool.ts Core crash-isolation pool — timeout watchdog and respawn logic are correct for the common case, but a rare race between the 60s timer firing and the worker's already-queued response message could cause the wrong job to be skipped.
src/domain/wasm-worker-entry.ts Worker entry isolates all WASM/tree-sitter calls; test-crash hook is properly env-gated; shutdown branch removed cleanly per previous feedback.
src/domain/wasm-worker-protocol.ts Typed message protocol is clean; WorkerShutdownRequest removed; SerializedExtractorOutput correctly excludes _tree.
src/domain/parser.ts WASM path correctly delegates to the worker pool; disposeParsers calls disposeWasmWorkerPool; single-file and batch APIs preserved.
src/cli.ts Shutdown hook correctly calls disposeParsers on both success and error paths, preventing the event-loop hang caused by an un-terminated worker thread.
tests/parsers/wasm-hardening.test.ts Tests use the worker-side crash hook (CODEGRAPH_WASM_WORKER_TEST_CRASH + magic token) to simulate real V8 fatals; covers single crash, batch-with-poisoned-file, and repeated-crash respawn scenarios.

Sequence Diagram

sequenceDiagram
    participant CLI as CLI
    participant P as parser.ts
    participant Pool as WasmWorkerPool
    participant W as Worker Thread

    CLI->>P: parseFilesAuto(files)
    P->>Pool: getWasmWorkerPool()
    loop per file
        P->>Pool: pool.parse(filePath, code, opts)
        Pool->>Pool: enqueue job and pump
        Pool->>W: postMessage WorkerParseRequest
        Pool->>Pool: arm 60s watchdog
        alt Normal parse
            W-->>Pool: postMessage WorkerParseResponseOk
            Pool->>Pool: clear watchdog, resolve result, pump next
        else Worker crashes with V8 fatal
            W--xPool: exit non-zero
            Pool->>Pool: onExit, warn, skip file, resolve null, respawn worker
        else Worker hangs past 60s
            Pool->>Pool: onTimeout, terminate worker
            W--xPool: exit zero
            Pool->>Pool: onExit, warn, skip file, resolve null, pump next
        end
    end
    P-->>CLI: Map of relPath to ExtractorOutput
    CLI->>P: disposeParsers
    P->>Pool: disposeWasmWorkerPool
    Pool->>W: worker.terminate

Reviews (3): Last reviewed commit: "fix(wasm): dispose parser pool on CLI sh..."

Comment thread tests/parsers/wasm-hardening.test.ts Outdated
Comment on lines +34 to +47
jsEntry.extractor = () => {
  throw new Error('simulated extractor failure');
};

const symbols = await parseFileAuto('boom.js', 'function hello() {}', { engine: 'wasm' });

expect(symbols).toBeNull();

const warnings = stderrSpy.mock.calls
  .map((args) => String(args[0]))
  .filter((s) => s.includes('[codegraph WARN]'));
expect(warnings.some((w) => w.includes('boom.js'))).toBe(true);
expect(warnings.some((w) => w.includes('simulated extractor failure'))).toBe(true);
expect(warnings.some((w) => /skipping/i.test(w))).toBe(true);
Contributor


P1 Test mocks wrong registry — worker path is never exercised

The tests mutate jsEntry.extractor on the LANGUAGE_REGISTRY exported from parser.ts, but after this PR the WASM parse path goes through the worker. The worker (wasm-worker-entry.ts) builds its own independent LANGUAGE_REGISTRY and its own _extToLang map at module load time — they are completely separate objects in a different V8 isolate. Changing the extractor in the main thread has zero effect on the worker.

As a result, all three tests in this file will fail: parseFileAuto('boom.js', ...) will return a non-null result (the worker parses successfully with the real extractor), causing the expect(symbols).toBeNull() assertion to fail, and no warning containing 'simulated extractor failure' will be emitted. The tests do not verify the skip-and-continue behaviour they are titled for.

Contributor Author


Fixed. Rewrote tests/parsers/wasm-hardening.test.ts to stop mocking the main-thread extractor (which the worker never sees — it has its own registry in a separate V8 isolate). The new tests use a gated worker-side crash hook (CODEGRAPH_WASM_WORKER_TEST_CRASH=1 + magic token in the source text) that calls process.exit(1) mid-parse, so we verify the actual skip-and-continue behavior: single-file crash + warn, batch with one poisoned file continuing through the rest, and repeated crashes with worker respawn.
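The double gate (env flag plus in-band token) keeps the hook inert outside tests. A sketch of the predicate, with the token value hypothetical — the real magic string lives in the test file:

```typescript
// Hypothetical token value; production source will never contain it.
const CRASH_TOKEN = '__CODEGRAPH_TEST_CRASH__';

// Both gates must be open: the env flag (set only by the test runner)
// AND the token embedded in the source text of the poisoned file.
function shouldSimulateCrash(
  code: string,
  env: Record<string, string | undefined>,
): boolean {
  return env.CODEGRAPH_WASM_WORKER_TEST_CRASH === '1' && code.includes(CRASH_TOKEN);
}

// In the worker entry this would guard a process.exit(1) standing in
// for an uncatchable V8 fatal:
//   if (shouldSimulateCrash(code, process.env)) process.exit(1);
```

Because `process.exit(1)` runs inside the worker, the test exercises the same exit path a real V8 abort would take, without depending on a platform-specific crash.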

Comment thread src/domain/wasm-worker-pool.ts Outdated

export class WasmWorkerPool {
  private worker: Worker | null = null;
  private workerReady: Promise<void> | null = null;
Contributor


P2 Dead workerReady field

workerReady is declared and reset to null in onExit (line 170) but is never assigned to an actual Promise and is never awaited or read anywhere. It appears to be scaffolding left over from an earlier design. This is dead code that adds confusion about whether there is an intended start-up gate for the worker.

Contributor Author


Fixed. Removed the dead workerReady field and the corresponding no-op reset in onExit.

Comment on lines +117 to +132
private pump(): void {
  if (this.disposed) return;
  if (this.inFlight) return;
  const next = this.queue.shift();
  if (!next) return;
  this.inFlight = next;
  const worker = this.ensureWorker();
  const req: WorkerRequest = {
    type: 'parse',
    id: next.id,
    filePath: next.filePath,
    code: next.code,
    opts: next.opts,
  };
  worker.postMessage(req);
}
Contributor


P2 No timeout for a hung worker

The PR description mentions "crash/respawn + timeout handling," but no timeout is implemented. pump() posts a message to the worker and sets this.inFlight, but nothing ever fires if the worker hangs (e.g., WASM enters an infinite loop instead of crashing). In that case the PendingJob.resolve is never called and the parse() Promise hangs forever, stalling the entire build — a worse outcome than the crash the PR is designed to prevent.

Consider adding a per-job timer (e.g. setTimeout) after posting that calls worker.terminate() and resolves the promise with null if the deadline is exceeded.

Contributor Author


Fixed. Added a per-job watchdog in WasmWorkerPool — each parse arms a 60s setTimeout before postMessage; if it fires, we call worker.terminate(), blacklist the file in crashedFiles, and onExit then resolves the in-flight job with null and respawns. 60s sits comfortably above the worst-case real-parse time seen in CI (~12s) while guaranteeing a build can no longer hang forever on a stuck WASM grammar.
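The watchdog pattern the fix describes looks roughly like this. The 60s constant comes from the comment above; the terminate/resolve wiring is a sketch, not the pool's actual code:

```typescript
// Arm a deadline alongside each request; whichever of {reply, deadline}
// happens first wins, and a settled flag plus clearTimeout keep the
// loser inert.
const PARSE_TIMEOUT_MS = 60_000;

function withWatchdog<T>(
  work: Promise<T>,
  onTimeout: () => void, // e.g. () => worker.terminate()
  ms: number = PARSE_TIMEOUT_MS,
): Promise<T | null> {
  return new Promise((resolve) => {
    let settled = false;
    const timer = setTimeout(() => {
      if (settled) return;
      settled = true;
      onTimeout();   // kill the stuck worker
      resolve(null); // skip the file instead of hanging the build
    }, ms);
    work.then((value) => {
      if (settled) return;
      settled = true;
      clearTimeout(timer);
      resolve(value);
    });
  });
}
```

In the real pool the timer is armed per job before `postMessage`, and the termination path funnels through `onExit` so the in-flight job is resolved with null and the worker respawned.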

Comment thread src/domain/wasm-worker-pool.ts Outdated
Comment on lines +167 to +170
private onExit(code: number): void {
  const crashed = this.inFlight;
  this.worker = null;
  this.workerReady = null;
Contributor


P2 workerReady reset is a no-op

this.workerReady = null here resets a field that was never set to anything other than null. Once the workerReady dead field is removed, this line can be removed too.

Contributor Author


Fixed — removed alongside the workerReady field.

Comment thread src/domain/wasm-worker-entry.ts Outdated
Comment on lines +756 to +758
if (msg.type === 'shutdown') {
  process.exit(0);
}
Contributor


P2 shutdown message handler is unreachable from the current pool

The worker handles a 'shutdown' message type by calling process.exit(0), but WasmWorkerPool.dispose() never posts this message — it calls worker.terminate() directly. The shutdown branch in the worker is dead code. This is harmless but could mislead future readers into thinking there is a graceful-shutdown handshake.

Contributor Author


Fixed. Removed the unreachable shutdown branch from the worker's message handler and dropped the unused WorkerShutdownRequest from the protocol — the pool disposes workers via Worker.terminate() directly, so there's no graceful-shutdown handshake.

…solation

Impact: 3 functions changed, 8 affected
The worker pool previously tried to load the worker entry from src/*.ts
under Node's strip-types runtime, but Node's worker_threads loader does
not rewrite .js import specifiers to .ts — so the worker exited with
"module not found" for every file, breaking all WASM engine tests and
the engine-parity CI jobs.

Changes:

- wasm-worker-pool.ts: resolveWorkerEntry now always points at the
  compiled .js. If the sibling .js exists (dist build) use it; otherwise
  walk up to dist/domain/wasm-worker-entry.js (tests/dev from src/).
  Throws a clear error instead of silently crashing the worker.

- wasm-worker-pool.ts: add per-job timeout (60s) so a hung WASM grammar
  (infinite loop instead of crash) can no longer stall the whole build;
  remove dead `workerReady` field and its no-op reset in onExit.

- wasm-worker-entry.ts: add a gated test-crash hook
  (CODEGRAPH_WASM_WORKER_TEST_CRASH=1 + magic token in code) so we can
  unit-test pool crash recovery without a real V8 abort. Remove dead
  `shutdown` message handler — the pool uses Worker.terminate() directly.

- wasm-worker-protocol.ts: drop unused WorkerShutdownRequest.

- tests/parsers/wasm-hardening.test.ts: rewrite. The old tests mutated
  jsEntry.extractor on the main-thread registry, but the worker has its
  own independent registry in a separate V8 isolate, so the mock was
  never exercised. New tests use the worker-side crash hook to verify
  single-file skip+warn, batch skip-and-continue, and repeated-crash
  recovery with worker respawn.
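The resolveWorkerEntry strategy above can be sketched with the filesystem check injected, which also makes it testable without touching disk. Paths and the error message are illustrative:

```typescript
import path from 'node:path';

// Prefer the sibling compiled .js (running from dist/); otherwise walk
// up to dist/domain/ (running from src/ in tests/dev); otherwise fail
// loudly instead of letting the worker die with "module not found".
function resolveWorkerEntry(dir: string, exists: (p: string) => boolean): string {
  const sibling = path.join(dir, 'wasm-worker-entry.js');
  if (exists(sibling)) return sibling;
  const fromDist = path.resolve(dir, '..', '..', 'dist', 'domain', 'wasm-worker-entry.js');
  if (exists(fromDist)) return fromDist;
  throw new Error(`wasm-worker-entry.js not found near ${dir}; build the project first`);
}
```

Injecting `exists` (in production, `fs.existsSync`) keeps the lookup order explicit and lets unit tests exercise all three branches with a fake filesystem.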

Impact: 10 functions changed, 10 affected
@carlos-alm
Contributor Author

@greptileai

@github-actions
Contributor

Codegraph Impact Analysis

82 functions changed, 33 callers affected across 11 files

  • shutdown in src/cli.ts:18 (1 transitive callers)
  • disposeParsers in src/domain/parser.ts:283 (2 transitive callers)
  • ensureWasmTrees in src/domain/parser.ts:320 (3 transitive callers)
  • mergeAnalysisData in src/domain/parser.ts:354 (3 transitive callers)
  • backfillTypeMap in src/domain/parser.ts:785 (5 transitive callers)
  • parseFileAuto in src/domain/parser.ts:848 (4 transitive callers)
  • backfillTypeMapBatch in src/domain/parser.ts:878 (9 transitive callers)
  • parseFilesWasm in src/domain/parser.ts:917 (9 transitive callers)
  • grammarPath in src/domain/wasm-worker-entry.ts:102 (3 transitive callers)
  • initRuntime in src/domain/wasm-worker-entry.ts:399 (2 transitive callers)
  • loadLanguageLazy in src/domain/wasm-worker-entry.ts:410 (2 transitive callers)
  • ComplexityFuncResult.funcNode in src/domain/wasm-worker-entry.ts:440 (0 transitive callers)
  • ComplexityFuncResult.funcName in src/domain/wasm-worker-entry.ts:441 (0 transitive callers)
  • ComplexityFuncResult.metrics in src/domain/wasm-worker-entry.ts:442 (0 transitive callers)
  • CfgFuncResult.funcNode in src/domain/wasm-worker-entry.ts:451 (0 transitive callers)
  • CfgFuncResult.blocks in src/domain/wasm-worker-entry.ts:452 (0 transitive callers)
  • CfgFuncResult.edges in src/domain/wasm-worker-entry.ts:453 (0 transitive callers)
  • CfgFuncResult.cyclomatic in src/domain/wasm-worker-entry.ts:454 (0 transitive callers)
  • hasFuncBody in src/domain/wasm-worker-entry.ts:459 (3 transitive callers)
  • indexByLine in src/domain/wasm-worker-entry.ts:474 (4 transitive callers)



Development

Successfully merging this pull request may close these issues.

bug: WASM engine crashes V8 reproducibly on Windows + Node 22 when building codegraph source
