feat(lang): add Rust and TypeScript analyzer support #1
Open
donnfelker wants to merge 38 commits into main from
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Introduce internal/lang/ with the per-language interfaces (FileFilter, FunctionExtractor, ComplexityCalculator, ComplexityScorer, ImportResolver, MutantGenerator, MutantApplier, AnnotationScanner, TestRunner) plus the top-level Language interface and the shared data types the analyzers and language back-ends pass to each other (FunctionInfo, FunctionSize, FileSize, FunctionComplexity, MutantSite, TestRunConfig).
Also adds the process-wide registry (Register/Get/All with deterministic sorted ordering) and a manifest-file / custom-detector-based auto-detection hook (Detect) that the CLI will use in Part B.
No behavior change yet: the analyzers still use their hardcoded Go AST paths. Part A2 extracts those paths into an internal/lang/goanalyzer/ package that implements these interfaces.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Implement all 9 lang.* sub-interfaces in internal/lang/goanalyzer/, with
one file per concern (parse, complexity, sizes, deps,
mutation_generate, mutation_apply, mutation_annotate, testrunner,
goanalyzer). The three duplicated funcName helpers from sizes.go,
complexity.go, and churn.go collapse into the single definition in
parse.go.
goanalyzer.init() calls lang.Register(&Language{}) and
lang.RegisterManifest("go.mod", "go") so Go is auto-detected by
manifest and ready to serve once the package is imported. The old
analyzer packages still hold the orchestration code in this commit —
A3 parameterizes diff, A4 routes the analyzers through the interfaces,
and the old embedded AST paths get deleted there.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the hardcoded isAnalyzableGoFile, "*.go" diff glob, and the
"+++" path suffix checks in internal/diff/diff.go with a caller-supplied
diff.Filter. Parse() and CollectPaths() now take the filter as an
explicit parameter; parseHunkHeader and parseUnifiedDiff are unchanged
at the shape level but thread the filter through.
cmd/diffguard/main.go looks up the Go language through the registry
(lang.Get("go")) and converts its lang.FileFilter into the narrower
diff.Filter shape via a small diffFilter() helper, so the diff package
doesn't have to import lang (which would create an import cycle once
analyzers plug through interfaces in A4).
Blank-import _ "internal/lang/goanalyzer" lands in this commit so the
init() registration fires; Part A4 deletes the old embedded AST code in
the analyzer packages and routes through the interfaces instead.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
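The import-cycle workaround can be sketched as a narrow, function-valued `diff.Filter` that `main` builds from a richer language filter. Only the names `diff.Filter`, `FileFilter`, and `diffFilter()` come from the commit; the fields, the `Extensions()` method, and the single-file layout (package boundaries marked in comments) are assumptions. Excluding `_test.go` files at this stage is inferred from the old hardcoded check being removed.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// ---- would live in internal/diff (no import of internal/lang) ----

// Filter is the narrow shape the diff package consumes (fields assumed).
type Filter struct {
	DiffGlobs      []string               // pathspecs for git diff, e.g. "*.go"
	IncludesSource func(path string) bool // keep this "+++ b/" path?
}

func (f Filter) keep(path string) bool {
	return f.IncludesSource != nil && f.IncludesSource(path)
}

// ---- would live in internal/lang ----

// FileFilter is a hypothetical, cut-down stand-in for lang.FileFilter.
type FileFilter interface {
	Extensions() []string
	IsTestFile(path string) bool
	DiffGlobs() []string
}

// ---- would live in cmd/diffguard ----

// diffFilter adapts a FileFilter into the diff package's Filter.
func diffFilter(ff FileFilter) Filter {
	return Filter{
		DiffGlobs: ff.DiffGlobs(),
		IncludesSource: func(path string) bool {
			ext := filepath.Ext(path)
			for _, e := range ff.Extensions() {
				if ext == e {
					return !ff.IsTestFile(path)
				}
			}
			return false
		},
	}
}

type goFilter struct{}

func (goFilter) Extensions() []string     { return []string{".go"} }
func (goFilter) IsTestFile(p string) bool { return strings.HasSuffix(p, "_test.go") }
func (goFilter) DiffGlobs() []string      { return []string{"*.go"} }

func main() {
	f := diffFilter(goFilter{})
	fmt.Println(f.keep("internal/diff/diff.go"))      // true
	fmt.Println(f.keep("internal/diff/diff_test.go")) // false
	fmt.Println(f.keep("README.md"))                  // false
}
```

Because `Filter` holds plain values and a closure, the diff package depends on nothing but the standard library; the dependency arrow points from `cmd/diffguard` into both packages.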
Each analyzer package now takes the relevant lang.* interface as a parameter and orchestrates it; the embedded Go-AST walks are gone.
- internal/complexity: take lang.ComplexityCalculator, delete the AST walker (moved to goanalyzer/complexity.go; walker unit tests moved to goanalyzer/complexity_walker_test.go).
- internal/sizes: take lang.FunctionExtractor, delete analyzeFile / collectFunctionSizes / funcName.
- internal/churn: take lang.ComplexityScorer, delete the simplified duplicate computeComplexity; keep git log --oneline counting (language-agnostic).
- internal/deps: split into graph.go (pure algorithms) and deps.go (orchestration via lang.ImportResolver). detectModulePath + scanPackageImports + collectImports move to goanalyzer/deps.go.
- internal/mutation: route through the four mutation-related interfaces (MutantGenerator, MutantApplier, AnnotationScanner, TestRunner). Delete apply.go, generate.go, annotations.go; tiers.go untouched. go test -overlay scaffolding (writeOverlayJSON, buildTestArgs) moves to goanalyzer/testrunner.go.
cmd/diffguard/main.go pulls the Go language out of the registry and threads it into each analyzer call.
Tests migrated with their code: AST-level tests live next to the AST code in goanalyzer/*_test.go; orchestration tests stay in the analyzer packages but exercise the Go back-end via the registry (blank import).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…unter
The churn analyzer's pre-split computeComplexity was a coarse "+1 per branching node" counter, not the full cognitive-complexity walker. Moving to lang.ComplexityScorer had caused the churn section to reuse AnalyzeFile, producing higher scores and breaking byte-identical output on the regression baseline.
Restore the coarser counter as computeSimpleComplexity in goanalyzer and have ComplexityScorer.ScoreFile use it. Lock in the difference between the two scorers with a test (nested ifs = 2 via the counter, 3 via the cognitive walker).
Regression gate commands used:
    go build -o /tmp/diffguard-baseline ./cmd/diffguard   (pre-refactor HEAD)
    go build -o /tmp/diffguard-after ./cmd/diffguard      (post-refactor)
    # text
    /tmp/diffguard-baseline --base 6f359df --skip-mutation --fail-on none /repo > base.txt
    /tmp/diffguard-after --base 6f359df --skip-mutation --fail-on none /repo > after.txt
    diff base.txt after.txt -> byte-identical
    # json (after normalizing metrics[] ordering, which was already
    # non-deterministic pre-refactor due to sort ties + map iteration)
    -> normalized-identical
    # wall clock
    baseline median ~0.491s, after median ~0.484s -> within 5%
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add --language CLI flag (comma-separated, empty = auto-detect) and rewire
run() to loop over the resolved language set, running the analyzer
pipeline once per language and merging sections into one report.
Section naming: when only one language contributes (the common case
today, since only Go is registered), section names stay unsuffixed so
the output is byte-identical to pre-multi-language diffguard. When two
or more languages contribute, each section name gets an "[<lang>]"
suffix and the merged sections are sorted (language, metric)
lexicographically for stable report ordering.
Empty-diff UX preserved for the single-language case ("No Go files
found.") and generalized per-language in the multi-language case. An
unknown --language value is a hard error listing the registered names.
Includes the B6 smoke test (TestRun_SingleLanguageGo) using a temp git
repo, plus resolveLanguages unit tests and a B5 checkExitCode
escalation test. A TODO on the smoke test notes it will extend to
assert per-language section suffixes when Rust/TS land.
Regression gate re-run: byte-identical text output, wall-clock median
0.462s vs baseline 0.491s (-6%).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
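A sketch of the flag-resolution and section-naming rules described above. The `resolveLanguages` signature, the `Section` type, and the exact `"<metric> [<lang>]"` spacing are assumptions; the behavior follows the commit message: an empty flag means auto-detect, unknown names are a hard error listing the registered names, and suffixing plus (language, metric) sorting only apply when two or more languages contribute.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Section is a hypothetical, cut-down report section.
type Section struct {
	Lang, Metric string
}

// resolveLanguages turns a comma-separated --language value into a sorted
// list; an empty value falls back to the auto-detected list.
func resolveLanguages(flag string, registered, detected []string) ([]string, error) {
	if strings.TrimSpace(flag) == "" {
		return detected, nil
	}
	known := map[string]bool{}
	for _, r := range registered {
		known[r] = true
	}
	var out []string
	for _, name := range strings.Split(flag, ",") {
		name = strings.TrimSpace(name)
		if name == "" {
			continue
		}
		if !known[name] {
			return nil, fmt.Errorf("unknown --language %q (registered: %s)",
				name, strings.Join(registered, ", "))
		}
		out = append(out, name)
	}
	sort.Strings(out)
	return out, nil
}

// sectionNames leaves single-language output untouched (order and names), and
// otherwise sorts by (language, metric) and appends a "[<lang>]" suffix.
func sectionNames(sections []Section) []string {
	langs := map[string]bool{}
	for _, s := range sections {
		langs[s.Lang] = true
	}
	multi := len(langs) > 1
	ordered := append([]Section(nil), sections...)
	if multi {
		sort.SliceStable(ordered, func(i, j int) bool {
			if ordered[i].Lang != ordered[j].Lang {
				return ordered[i].Lang < ordered[j].Lang
			}
			return ordered[i].Metric < ordered[j].Metric
		})
	}
	names := make([]string, len(ordered))
	for i, s := range ordered {
		names[i] = s.Metric
		if multi {
			names[i] += " [" + s.Lang + "]"
		}
	}
	return names
}

func main() {
	langs, err := resolveLanguages("rust, go", []string{"go", "rust", "typescript"}, nil)
	fmt.Println(langs, err) // [go rust] <nil>
	_, err = resolveLanguages("cobol", []string{"go"}, nil)
	fmt.Println(err) // unknown --language "cobol" (registered: go)
	fmt.Println(sectionNames([]Section{{"go", "Sizes"}, {"go", "Churn"}}))   // [Sizes Churn]
	fmt.Println(sectionNames([]Section{{"rust", "Sizes"}, {"go", "Sizes"}})) // [Sizes [go] Sizes [rust]]
}
```

Skipping the sort in the single-language case matters: sorting there would reorder sections and break the byte-identical regression gate.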
Pass explicit --src-prefix=a/ --dst-prefix=b/ to git diff so the unified diff output always carries the "+++ b/" prefix that parseUnifiedDiff expects. Without this, users with diff.mnemonicPrefix=true (or the diff.noPrefix / custom prefix variants) see git emit "+++ w/<path>", handleFileLine rejects every path, and Parse returns zero files. This bug predates the Filter refactor, but exposing it via a parameter made TestParse_SuccessDetectsChangedGoFile flake under local git configs that the previous CI-only run never saw. The baseline regression gate (pre-refactor vs HEAD, --skip-mutation, --fail-on none) remains byte-identical in a clean env where the prefix default is already a/b/.
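The fix amounts to always passing both prefix flags and accepting only `+++ b/` headers. A hedged sketch; `gitDiffArgs` and `newFilePath` are hypothetical helpers, and passing the base as a single revision followed by `--` and the globs is an assumption about how diffguard invokes git.

```go
package main

import (
	"fmt"
	"strings"
)

// gitDiffArgs pins the a/ and b/ prefixes so settings such as
// diff.mnemonicPrefix or diff.noPrefix cannot change the "+++ b/" header.
func gitDiffArgs(base string, globs []string) []string {
	args := []string{"diff", "--src-prefix=a/", "--dst-prefix=b/", base, "--"}
	return append(args, globs...)
}

// newFilePath extracts <path> from a "+++ b/<path>" header and rejects
// anything else, e.g. "+++ w/<path>" emitted under diff.mnemonicPrefix.
func newFilePath(line string) (string, bool) {
	const prefix = "+++ b/"
	if !strings.HasPrefix(line, prefix) || len(line) == len(prefix) {
		return "", false
	}
	return strings.TrimPrefix(line, prefix), true
}

func main() {
	// Would be run as exec.Command("git", gitDiffArgs(base, globs)...).
	fmt.Println(strings.Join(gitDiffArgs("main", []string{"*.go"}), " "))
	// diff --src-prefix=a/ --dst-prefix=b/ main -- *.go
	p, ok := newFilePath("+++ b/internal/diff/diff.go")
	fmt.Println(p, ok) // internal/diff/diff.go true
	_, ok = newFilePath("+++ w/internal/diff/diff.go")
	fmt.Println(ok) // false
}
```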
Adds the rustanalyzer package directory, depends on tree-sitter-rust via go-tree-sitter, and registers a minimal Rust Language with diffguard so Part C checklist items C0 (research prerequisites) and C1 (FileFilter) are complete.
* FileFilter: .rs extension, IsTestFile treats any `tests` path segment as a test file, DiffGlobs = [*.rs].
* init() registers the language and its Cargo.toml manifest.
* Stub implementations for the remaining sub-interfaces are in place so the package compiles; subsequent commits fill them out.
* Rust-specific operators registered in internal/mutation/tiers.go: unwrap_removal + some_to_none (Tier 1), question_mark_removal (Tier 2).
* Blank import added in cmd/diffguard/main.go.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
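A sketch of the Rust `FileFilter` rules listed above, written as free functions for brevity (the real code presumably exposes them as interface methods). A segment-based `IsTestFile` like this one does not see in-file `#[cfg(test)]` modules.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// rustIncludes reports whether path is a Rust source file.
func rustIncludes(path string) bool {
	return filepath.Ext(path) == ".rs"
}

// rustIsTestFile treats any path segment named exactly "tests" as a test
// location, e.g. "tests/integration.rs" or "crates/core/tests/util.rs".
func rustIsTestFile(path string) bool {
	for _, seg := range strings.Split(filepath.ToSlash(path), "/") {
		if seg == "tests" {
			return true
		}
	}
	return false
}

// rustDiffGlobs are the pathspecs handed to git diff.
var rustDiffGlobs = []string{"*.rs"}

func main() {
	for _, p := range []string{"src/lib.rs", "tests/it.rs", "src/tests_util.rs", "build.py"} {
		fmt.Printf("%-18s source=%v test=%v\n", p, rustIncludes(p), rustIsTestFile(p))
	}
}
```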
Extracts standalone functions, impl methods (inherent and trait-impl), and
nested functions as separate entries. Method names are prefixed with their
impl type: `impl Counter { fn new }` -> "Counter::new",
`impl Named for Counter { fn name }` -> "Counter::name". Nested functions
inside method bodies do NOT inherit the impl prefix (the walk-up stops at
the nearest function boundary), matching the spec's "treated as separate"
requirement.
Line ranges are 1-based and inclusive, consistent with the Go analyzer.
Testdata fixture covers every function form plus the filter-by-changed-
region path; countLines is unit-tested for edge cases (empty, trailing
newline, bare text).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
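The naming and line-count rules can be sketched in a few lines. `countLines` and `qualifiedName` here are hypothetical reconstructions from the description above (empty input is 0 lines, and a final line without a newline still counts), not the package's own code.

```go
package main

import (
	"bytes"
	"fmt"
)

// countLines returns the number of lines in src: 0 for empty input, and a
// final line without a trailing newline still counts.
func countLines(src []byte) int {
	if len(src) == 0 {
		return 0
	}
	n := bytes.Count(src, []byte("\n"))
	if src[len(src)-1] != '\n' {
		n++ // last line has no terminating newline
	}
	return n
}

// qualifiedName prefixes impl methods with their impl type ("Counter::new");
// free functions and nested functions (implType == "") keep the bare name.
func qualifiedName(implType, fn string) string {
	if implType == "" {
		return fn
	}
	return implType + "::" + fn
}

func main() {
	for _, s := range []string{"", "fn a() {}\n", "fn a() {}", "a\nb"} {
		fmt.Printf("%q -> %d\n", s, countLines([]byte(s)))
	}
	fmt.Println(qualifiedName("Counter", "new"), qualifiedName("", "helper")) // Counter::new helper
}
```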
Implements the cognitive complexity walker per the design doc:
* Base +1 for if / while / for / loop / match / if-let / while-let.
* +1 per guarded match arm (detected via either a `match_arm_guard` child or the newer `match_pattern.condition` field; both grammar shapes are accepted for resilience against tree-sitter-rust upgrades).
* +1 per logical-operator run change in a `&&`/`||` chain.
* +1 nesting penalty per scope-introducing ancestor.
* `?` and `unsafe` do NOT contribute.
* Closures start a fresh nesting context (matching the Go analyzer's FuncLit behavior); nested `function_item`s are reported separately.
ComplexityScorer reuses the calculator: tree-sitter walks are cheap enough that a separate approximation isn't worth the divergence.
Testdata fixture documents expected scores per function; unit tests assert each case plus the logical-op counter directly.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
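The `&&`/`||` rule can be read as: each run of identical operators in a chain adds 1, so a change of operator starts a new run. The sketch below follows that common cognitive-complexity convention over a hypothetical `Expr` tree; the real walker operates on tree-sitter `binary_expression` nodes.

```go
package main

import "fmt"

// Expr is a hypothetical, minimal binary-expression tree; Op is "&&", "||",
// or "" for an operand leaf.
type Expr struct {
	Op          string
	Left, Right *Expr
}

// flattenOps lists the logical operators of a chain in source order.
func flattenOps(e *Expr, out []string) []string {
	if e == nil || (e.Op != "&&" && e.Op != "||") {
		return out
	}
	out = flattenOps(e.Left, out)
	out = append(out, e.Op)
	return flattenOps(e.Right, out)
}

// logicalRuns adds 1 for each run of identical operators: "a && b && c"
// scores 1, "a && b || c" scores 2.
func logicalRuns(ops []string) int {
	runs, prev := 0, ""
	for _, op := range ops {
		if op != prev {
			runs++
		}
		prev = op
	}
	return runs
}

func main() {
	leaf := &Expr{}
	// (a && b) || c: how a parser groups "a && b || c"
	chain := &Expr{Op: "||", Left: &Expr{Op: "&&", Left: leaf, Right: leaf}, Right: leaf}
	fmt.Println(flattenOps(chain, nil), logicalRuns(flattenOps(chain, nil))) // [&& ||] 2
}
```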
DetectModulePath parses [package] name from Cargo.toml via a line-based scanner; no TOML dependency is needed since we only pull two tokens.
ScanPackageImports walks every .rs file in the package, looks for `use_declaration` and `mod_item` nodes via tree-sitter, and classifies each as internal iff the leading segment is `crate`, `self`, or `super`. The resolver maps each internal path to a directory-level graph node (e.g. `crate::foo::bar::Baz` -> `src/foo/bar`), matching the Go analyzer's directory-granularity edges. External crates and std imports are dropped.
Tests cover crate-root detection, relative-path resolution (super/self/crate), and end-to-end scanning on a fixture crate with mixed internal + external imports.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
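A sketch of the two string-level pieces: a line-based `[package]` name scan and the `crate::` to directory mapping. Function names are hypothetical, and the sketch deliberately skips `self::`/`super::` resolution (which needs the importing file's location) and grouped `use` trees.

```go
package main

import (
	"bufio"
	"fmt"
	"path"
	"strings"
)

// packageName returns `name` from the [package] table of a Cargo.toml using
// a line scanner. Only the simple `name = "..."` form is handled.
func packageName(cargoToml string) (string, bool) {
	inPackage := false
	sc := bufio.NewScanner(strings.NewReader(cargoToml))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if strings.HasPrefix(line, "[") {
			inPackage = line == "[package]"
			continue
		}
		key, val, ok := strings.Cut(line, "=")
		if inPackage && ok && strings.TrimSpace(key) == "name" {
			return strings.Trim(strings.TrimSpace(val), `"`), true
		}
	}
	return "", false
}

// crateUseToDir maps a `crate::` path to a directory-level node by dropping
// the final item: crate::foo::bar::Baz -> src/foo/bar.
func crateUseToDir(usePath string) (string, bool) {
	segs := strings.Split(usePath, "::")
	if len(segs) < 2 || segs[0] != "crate" {
		return "", false // external crate, std, or self::/super:: (not handled here)
	}
	mods := segs[1 : len(segs)-1]
	return path.Join(append([]string{"src"}, mods...)...), true
}

func main() {
	manifest := "[package]\nname = \"demo\"\nversion = \"0.1.0\"\n\n[dependencies]\nserde = \"1\"\n"
	fmt.Println(packageName(manifest))                 // demo true
	fmt.Println(crateUseToDir("crate::foo::bar::Baz")) // src/foo/bar true
	fmt.Println(crateUseToDir("std::fmt::Display"))    // prints an empty dir and false
}
```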
Scans line_comment and block_comment tokens for `mutator-disable-next-line` (suppresses the following source line) and `mutator-disable-func` (suppresses every line of the enclosing function, including the signature).
Function ranges are sourced from the same function_item walk used by the FunctionExtractor, so `mutator-disable-func` can attach to comments that live inside the function body OR directly precede it (one blank line tolerated, matching the Go analyzer's behavior).
Tests cover next-line annotations, func-wide annotations from both preceding and internal positions, and the negative case of ordinary comments that must not toggle anything.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ators
Implements all canonical operators (conditional_boundary, negate_conditional, math_operator, return_value, boolean_substitution, branch_removal, statement_deletion) plus the three Rust-specific ones:
* `unwrap_removal`: strips a trailing `.unwrap()` / `.expect(...)` call, leaving its receiver (Tier 1).
* `some_to_none`: flips `return Some(x)` to `return None` (Tier 1).
* `question_mark_removal`: strips the trailing `?` from a try_expression (Tier 2).
`incdec` is deliberately absent because Rust has no `++`/`--` operators.
Mutants are filtered to changed regions and disabled-line maps, then sorted by (line, operator, description) so repeated runs produce byte-identical output, a critical property for the exit-code gate.
Unit tests exercise every operator on minimal source snippets plus the determinism and filtering paths.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
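The determinism guarantee comes down to filtering and then sorting on a total key. A sketch with a hypothetical, cut-down `Mutant` type; `filterMutants` and `sortMutants` are illustrative names, not the generator's API.

```go
package main

import (
	"fmt"
	"sort"
)

// Mutant is a hypothetical, cut-down mutation site.
type Mutant struct {
	Line        int
	Operator    string
	Description string
}

// filterMutants keeps mutants on changed lines that are not disabled.
func filterMutants(ms []Mutant, changed func(line int) bool, disabled map[int]bool) []Mutant {
	var out []Mutant
	for _, m := range ms {
		if changed(m.Line) && !disabled[m.Line] {
			out = append(out, m)
		}
	}
	return out
}

// sortMutants orders by (line, operator, description), so output no longer
// depends on tree-walk or map iteration order.
func sortMutants(ms []Mutant) {
	sort.Slice(ms, func(i, j int) bool {
		a, b := ms[i], ms[j]
		if a.Line != b.Line {
			return a.Line < b.Line
		}
		if a.Operator != b.Operator {
			return a.Operator < b.Operator
		}
		return a.Description < b.Description
	})
}

func main() {
	mutants := []Mutant{
		{12, "unwrap_removal", "drop .unwrap()"},
		{4, "negate_conditional", "== -> !="},
		{4, "conditional_boundary", "< -> <="},
	}
	sortMutants(mutants)
	for _, m := range mutants {
		fmt.Println(m.Line, m.Operator)
	}
	// 4 conditional_boundary
	// 4 negate_conditional
	// 12 unwrap_removal
}
```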
Uses tree-sitter byte offsets to do surgical text replacements, which is simpler than AST rewriting and preserves whitespace and formatting exactly. Each operator has a dedicated helper:
* binary: swaps the operator token within a binary_expression.
* bool: flips true <-> false.
* return_value: substitutes Default::default() for the return expression.
* some_to_none: rewrites `return Some(x)` to `return None`.
* branch_removal: empties the consequence block of an if_expression.
* statement_deletion: replaces a call statement with `();`.
* unwrap_removal: drops `.unwrap()` / `.expect(...)` from a call.
* question_mark_removal: strips the trailing `?` from a try_expression.
After every successful edit we re-parse with tree-sitter and check for HasError on the root; corrupt mutants return nil so the test runner never exercises invalid source. Unknown operators and line mismatches also return nil cleanly.
Tests cover each operator's success path plus the two nil-return paths (unknown op, site mismatch) and direct coverage of the re-parse gate.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
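The byte-offset approach reduces each operator to one splice over the original source. A hedged sketch; `splice` is hypothetical, the offsets here come from `bytes.IndexByte` rather than tree-sitter node byte ranges, and the re-parse gate is not shown.

```go
package main

import (
	"bytes"
	"fmt"
)

// splice returns a copy of src with src[start:end] replaced by repl. Invalid
// offsets yield nil, in the spirit of "no mutant rather than a corrupt one".
func splice(src []byte, start, end int, repl string) []byte {
	if start < 0 || end < start || end > len(src) {
		return nil
	}
	out := make([]byte, 0, len(src)-(end-start)+len(repl))
	out = append(out, src[:start]...)
	out = append(out, repl...)
	return append(out, src[end:]...)
}

func main() {
	src := []byte("fn lt(a: i32, b: i32) -> bool { a < b }")
	i := bytes.IndexByte(src, '<')
	fmt.Println(string(splice(src, i, i+1, "<="))) // fn lt(a: i32, b: i32) -> bool { a <= b }
}
```

Since only the targeted byte range changes, everything around it (comments, spacing, line breaks) survives untouched, which keeps mutant line numbers aligned with the original file.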
Temp-copy strategy per MULTI_LANGUAGE_SUPPORT.md §Mutation isolation:
1. Acquire a per-file mutex so concurrent mutants on the same file serialize; different files run in parallel.
2. Back up the original bytes in memory.
3. Write the mutant over the original in place.
4. Exec cargo test with context.WithTimeout and CARGO_INCREMENTAL=0.
5. Restore the original bytes via defer, always, even on panic or unexpected process failure.
TestPattern is passed as a positional filter (`cargo test <pattern>`). Timeouts promote to a killed verdict (the mutant made the tests too slow to finish, which is itself a test-gap signal).
Tests simulate the full kill / survive / timeout / process-failure matrix with /bin/sh scripts so the suite stays hermetic; no actual cargo or Rust toolchain is required. `go test -race -count=3` passes, which is the critical assurance for the per-file lock.
Also register the Rust-specific operators in tiers.go unit tests.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The design doc describes some_to_none as a generic `Some(x) -> None` rewrite, not specifically a return-position swap. Broaden the generator and applier so any Some(x) call expression becomes a mutation site. This lets the operator fire on `.map(|x| Some(x))`, `Ok(Some(x))`, etc., which is where it carries real test-quality signal. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirrors the rustanalyzer layout 1:1. Wires:
- tsanalyzer.go: Language struct, FileFilter (.ts/.tsx, test-suffix and __tests__/__mocks__ detection), composite detector requiring package.json AND at least one .ts/.tsx file so JS-only repos don't misfire.
- parse.go: tree-sitter setup that picks the typescript vs tsx grammar by file extension; shared walk/nodeLine/nodeText helpers.
- sub-component impls (sizes, complexity, deps, mutation_*, testrunner, annotation) implementing the lang.Language interface end-to-end.
- mutation/tiers.go: append TS operators strict_equality (Tier 1), nullish_to_logical_or and optional_chain_removal (Tier 2).
- cmd/diffguard/main.go: blank-import tsanalyzer so init() runs at process start.
Tests cover FileFilter edge cases (including utils.test-helper.ts NOT being a test file), the JS-only repo detection negative case, and the appended operator tier classifications.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
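A sketch of the TypeScript `FileFilter` rules, as free functions. The `.test`/`.spec` suffix set is an assumption (the commit only says "test-suffix"); the `utils.test-helper.ts` case shows why the match is anchored to the extension.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// tsIncludes accepts only .ts and .tsx, so .js, .mjs, .cjs, .mts, and .cts
// files are never analyzed.
func tsIncludes(path string) bool {
	ext := filepath.Ext(path)
	return ext == ".ts" || ext == ".tsx"
}

// tsIsTestFile flags files under __tests__/ or __mocks__/, and files whose
// name (minus extension) ends in ".test" or ".spec". Requiring the suffix to
// sit right before the extension keeps "utils.test-helper.ts" out.
func tsIsTestFile(path string) bool {
	p := filepath.ToSlash(path)
	for _, seg := range strings.Split(p, "/") {
		if seg == "__tests__" || seg == "__mocks__" {
			return true
		}
	}
	name := strings.TrimSuffix(filepath.Base(p), filepath.Ext(p))
	return strings.HasSuffix(name, ".test") || strings.HasSuffix(name, ".spec")
}

func main() {
	for _, p := range []string{"src/add.ts", "src/add.test.ts", "src/__mocks__/fs.ts",
		"src/utils.test-helper.ts", "src/App.tsx", "lib/x.mjs"} {
		fmt.Printf("%-26s source=%v test=%v\n", p, tsIncludes(p), tsIsTestFile(p))
	}
}
```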
- testdata/functions.ts: standalone fn, arrow-const, fn-expr-const, class methods (constructor/public/static/private), nested arrow in a method, and a generator.
- testdata/component.tsx: minimal JSX component + arrow component to exercise the tsx grammar path end-to-end (the plain typescript grammar rejects JSX, so successful extraction proves parse.go routes by extension).
- sizes_test.go: asserts every expected function name is extracted, line ranges are well-formed, region filtering works, an empty file is tolerated, and .tsx parses.
- helpers_test.go: shared writeFile helper matching the rustanalyzer pattern, returning errors so callers can t.Fatal on setup failures.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Handles the TS-specific control flow set: if_statement with +1 per else branch, switch_statement with +1 per non-empty case, ternary_expression, try/catch (+1 try, +1 per catch_clause), promise-chain .catch() calls, &&/|| run-switch counting, and correct non-counting of optional chain (`?.`), nullish coalescing (`??`), `await`, and `async`. Unwraps parenthesized_expression around if/while conditions so logical chain detection works on the real grammar shape (tree-sitter TS wraps `if (a && b)` as `if ( (a && b) )`). Fixture complexity.ts documents expected scores inline. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- DetectModulePath: parses package.json's name field via encoding/json.
- ScanPackageImports: handles both ES module `import ... from '...'`
and CommonJS `require('...')` via tree-sitter node kinds
(import_statement, call_expression with function identifier `require`).
- Internal classification: specifiers starting with `.`, `@/`, or `~/`
are internal; bare specifiers (npm packages) are external.
- Directory-level folding: `./dir/index` collapses to `./dir` so the
graph uses the same node as `./dir`.
Alias handling pins `@/` node naming to retain the `@/` prefix (so
the graph shape is self-documenting) while `~/` strips the prefix,
matching the common convention that `~` points at the project root.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
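The classification and node-naming rules above fit in two small functions. They are hypothetical reconstructions; this version also folds `/index` on `@/` specifiers (the behavior a later commit in this PR settles on), and it does not resolve file extensions or `tsconfig` paths.

```go
package main

import (
	"fmt"
	"strings"
)

// isInternal: relative specifiers and the @/ and ~/ aliases are internal;
// bare specifiers ("react", "@scope/pkg", "node:fs") are external.
func isInternal(spec string) bool {
	return strings.HasPrefix(spec, ".") ||
		strings.HasPrefix(spec, "@/") ||
		strings.HasPrefix(spec, "~/")
}

// nodeName folds a trailing "/index" so "./dir/index" and "./dir" share one
// graph node, keeps the "@/" prefix visible, and strips "~/" (project root).
func nodeName(spec string) string {
	spec = strings.TrimSuffix(spec, "/index")
	return strings.TrimPrefix(spec, "~/")
}

func main() {
	for _, s := range []string{"./dir/index", "@/lib/index", "~/util", "react", "@scope/pkg"} {
		fmt.Printf("%-12s internal=%v node=%s\n", s, isInternal(s), nodeName(s))
	}
}
```

Checking for `@/` rather than a bare `@` is what keeps scoped npm packages such as `@scope/pkg` classified as external.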
Mirrors rustanalyzer/mutation_annotate_test.go: next-line, func-wide (both preceding the function and inside its body), and a negative control that ordinary comments don't toggle anything. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Covers every operator the generator emits:
- canonical: conditional_boundary, negate_conditional, math_operator, boolean_substitution, incdec (TS has ++/--), return_value, branch_removal, statement_deletion
- TS-specific: strict_equality (both === and == sides), nullish_to_logical_or (?? -> ||), optional_chain_removal (foo?.bar)
Plus the cross-cutting checks: mutants outside the changed region are dropped, disabled lines are honored, repeated runs are deterministic, and mutation works on .tsx files (exercising the tsx grammar).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tests cover every operator the applier handles:
- canonical: binary operator swap, boolean flip, incdec, return_value (both null and undefined target paths), branch_removal, statement_deletion
- TS-specific: strict_equality toggle, nullish_to_logical_or, optional_chain_removal
Plus the edge cases: unknown operator returns nil (no-op, not error), site mismatch (predicate finds no matching node) returns nil, and the re-parse gate correctly distinguishes well-formed from malformed output under both .ts and .tsx grammars.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirrors rustanalyzer/testrunner_test.go with a fake /bin/sh runner so we exercise killed / survived / timeout / restore-after-failure / per-file serialization without needing real vitest/jest binaries or network access.
Adds coverage for TS-specific concerns:
- CI=true is set in the child env so interactive prompts don't stall jest/vitest.
- detectTSRunner precedence (vitest > jest > npm test) via package.json devDependencies / dependencies.
- argv shape for each runner, including TestPattern translation (vitest -t, jest --testNamePattern).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
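A sketch of the precedence rule and argv translation. `detectRunner` and `runnerArgv` are hypothetical, and invoking the runners through `npx` is an assumption; `vitest run -t` and `jest --testNamePattern` are the runners' own name-filter options.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// packageJSON decodes only the two dependency maps the precedence rule reads.
type packageJSON struct {
	Dependencies    map[string]string `json:"dependencies"`
	DevDependencies map[string]string `json:"devDependencies"`
}

// detectRunner applies the vitest > jest > npm test precedence.
func detectRunner(raw []byte) (string, error) {
	var pj packageJSON
	if err := json.Unmarshal(raw, &pj); err != nil {
		return "", err
	}
	has := func(name string) bool {
		_, dev := pj.DevDependencies[name]
		_, dep := pj.Dependencies[name]
		return dev || dep
	}
	switch {
	case has("vitest"):
		return "vitest", nil
	case has("jest"):
		return "jest", nil
	default:
		return "npm", nil
	}
}

// runnerArgv builds a hypothetical command line. The child would also get
// CI=true, e.g. cmd.Env = append(os.Environ(), "CI=true"). Forwarding a
// pattern to plain `npm test` is not modeled here.
func runnerArgv(runner, pattern string) []string {
	switch runner {
	case "vitest":
		argv := []string{"npx", "vitest", "run"}
		if pattern != "" {
			argv = append(argv, "-t", pattern)
		}
		return argv
	case "jest":
		argv := []string{"npx", "jest"}
		if pattern != "" {
			argv = append(argv, "--testNamePattern", pattern)
		}
		return argv
	default:
		return []string{"npm", "test"}
	}
}

func main() {
	r, err := detectRunner([]byte(`{"devDependencies": {"jest": "^29.0.0", "vitest": "^1.0.0"}}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(r, runnerArgv(r, "adds numbers")) // vitest [npx vitest run -t adds numbers]
}
```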
Treat the final path segment of an @/ specifier as a directory component, only folding /index — identical to how ~/ imports are handled. Updates existing test expectations and adds three new resolveInternal cases. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add .next, .nuxt, .output, .svelte-kit, .turbo, and .cache to the hasTSFile directory prune list so generated .ts files inside framework output directories don't trigger TypeScript detection. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add TestApply_BranchRemoval_BareForm to lock in that a braceless if body
is replaced with {} (parseable). Add TestFileFilter_MjsCjsExcluded to
assert .mjs and .cjs are rejected by both IncludesSource and MatchesExtension.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extend conditionLogicalOps calls to the if_let_expression and while_let_expression branches in walkComplexity, using the `value` field (what follows `=`). Add if_let_simple fixture and TestIfLetLogicalOps unit test covering the if-let code paths. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace for-loop-then-break patterns in returnMutants, exprStmtMutants, applyReturnValue, and applyQuestionMarkRemoval with direct NamedChildCount > 0 / NamedChild(0) equivalents. No behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds cmd/diffguard/testdata/mixed-repo/{violations,clean} fixtures with a
Go + Rust + TS project layout sharing a single repo root. Extends
cmd/diffguard/main_test.go with two tests that build the binary and exec
it against each variant in --paths mode:
- TestMixedRepo_ViolationsHasAllThreeLanguageSections asserts the [go]
[rust] [typescript] section suffixes all appear and that Cognitive
Complexity fails in each language.
- TestMixedRepo_CleanAllPass asserts the negative control produces
WorstSeverity() == PASS while still populating every language section.
Refactoring mode (--paths .) is used so no .git history is required and
the tests don't depend on cargo / node toolchains. Mutation is skipped
for the same reason; EVAL-2/3 exercise the mutation path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds internal/lang/evalharness with the shared binary-build / fixture-copy
/ JSON-decode helpers and a semantic Expectation/AssertMatches matcher
(matches on section name + severity and finding file/function/operator
rather than line-exact counts).
Adds internal/lang/rustanalyzer/eval_test.go plus fixtures under
evaldata/ for:
- complexity positive + negative (nested match vs flat helpers)
- sizes (function) positive + negative (long body vs split helpers)
- deps (cycle) positive + negative (a<->b vs shared types mod)
- mutation (kill) positive + negative (well-tested vs under-tested
classify())
- mutation (Rust-specific) positive + negative (some_to_none /
unwrap_removal)
Mutation tests are gated behind exec.LookPath("cargo") and honor -short.
Follow-up TODOs for sizes(file), deps(SDP), churn, and mutation
annotation respect are recorded as a comment block at the top of
eval_test.go.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds internal/lang/tsanalyzer/eval_test.go plus fixtures under evaldata/:
- complexity positive + negative (nested ternary/try-catch vs flat helpers)
- sizes (function) positive + negative (long const arrow vs split)
- deps (cycle) positive + negative (a<->b vs shared ./types)
- mutation (kill) positive + negative (classify() with strong/loose tests)
- mutation (TS-specific) positive + negative (strict_equality + nullish_to_logical_or with distinguishing vs indistinguishable inputs)
Mutation fixtures use `npm test`, which shells out to `node --experimental-strip-types`, available by default in Node 22.6+. requireNode() skips cleanly when node/npm are absent.
Follow-up TODOs for sizes(file), deps(internal-vs-external), churn, and mutation annotation respect are recorded in eval_test.go.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the cross-language evaluation suite (eval4_test.go) with four fixture families under testdata/cross/:
- rust_fail_ts_pass / rust_pass_ts_fail: severity propagation; a FAIL in any language escalates the overall WorstSeverity.
- concurrency: 3+3 multi-file Rust + TS fixture exercising --mutation-workers 4. Asserts the on-disk tree stays bit-identical after the run (temp-copy restore safety) and that workers=1 reports are deterministic across repeat runs.
- disabled: mutator-disable-func annotations on Rust + TS files; asserts disabled functions never produce mutation findings, even under concurrency.
- known_clean: false-positive ceiling; a well-tested small fixture must produce WorstSeverity == PASS.
Mutation-dependent tests gate behind -short and cargo/node LookPath so go test ./... stays green on dev boxes without the full toolchain.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the three evaluation Make targets that wrap the eval Go tests with the deterministic env (CI=true, CARGO_INCREMENTAL=0) the mutation runners depend on:
- eval-rust: EVAL-2 Rust correctness suite
- eval-ts: EVAL-3 TypeScript correctness suite
- eval-mixed: EVAL-4 cross-cutting + E1 mixed-repo integration
`make eval` runs all three.
Extends the existing ci.yml with dtolnay/rust-toolchain@stable and actions/setup-node@v4 so the mutation-gated evals actually run in CI (locally they t.Skip cleanly when cargo/node aren't on PATH). Caches ~/.cargo, target/, and ~/.npm keyed on hashed lockfiles to keep steady-state CI runs under a minute.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Opening line reframed from Go-only to "Go, Rust, TypeScript".
- New Supported Languages table with extensions, detection signals, and the native test runner each language's mutation section uses.
- Per-language runtime dependencies subsection documents cargo and node+npm install paths (rustup, mise/nvm) and the Node 22.6 floor.
- CLI reference gains --language and clarifies --test-pattern semantics across languages.
- Mutation operators table gains per-language operator lists (unwrap_removal / some_to_none for Rust; strict_equality / nullish_to_logical_or for TS) and documents the per-language isolation strategies (overlay / temp-copy / in-place restore).
- Annotation syntax section now shows Go, Rust, and TS variants side by side: same directive names, language-native comment syntax.
- Observability tier text mentions JS / Rust logging patterns.
- Cross-links MULTI_LANGUAGE_SUPPORT.md and docs/rust-typescript-support.md for deeper context.
- Per-PR GitHub Actions sample shows Rust + Node toolchain install steps alongside Go setup.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…llow-ups README supported-languages table previously listed .mts and .cts for TypeScript; the tsanalyzer FileFilter only accepts .ts and .tsx per spec D1. Trim the table to match actual behavior. Also add docs/multi-lang-followups.md capturing deferred work (EVAL-5 pre-flight calibration, Rust workspace-crate path resolution, per-analyzer eval carve-outs) so nothing becomes invisible tech debt. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per spec D1, tsanalyzer's FileFilter accepts only .ts and .tsx. The README supported-languages table overclaimed .mts and .cts support. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# (cargo test is how survived-vs-killed is decided). Without this
# step, the mutation evals t.Skip via exec.LookPath("cargo").
- name: Install Rust toolchain
  uses: dtolnay/rust-toolchain@stable
Collaborator
Thanks for adding Rust and TypeScript support. Looks like the CI failed due to a timeout. The PR is too big to review manually. I am fine with it as long as the PR passes the diffguard gating itself.
Summary
- `Language` interface (`internal/lang/`) and ports Go to it as one of three registered languages, with no byte-level regression on Go-only diffs.
- `internal/lang/rustanalyzer/`) and TypeScript (`internal/lang/tsanalyzer/`) analyzers via tree-sitter, covering all five existing analyses (complexity, sizes, deps, churn, mutation) plus language-specific mutation operators: `unwrap_removal`, `some_to_none`, `question_mark_removal` (Rust); `strict_equality`, `nullish_to_logical_or`, `optional_chain_removal` (TypeScript).
- `cmd/diffguard` with auto-detection from manifest files (`go.mod` / `Cargo.toml` / `package.json` + `.ts`), a new `--language` flag for override, and per-language `[lang]`-suffixed report sections when more than one language is active.
- `internal/lang/evalharness/`) with positive/negative control fixture pairs per analyzer per language, plus cross-cutting tests for severity propagation, mutation concurrency safety, disabled-line respect, and false-positive ceiling.
What's in the box
Language abstraction (Parts A + B)
- `internal/lang/{lang,registry,detect}.go`: 9 sub-interfaces (`FileFilter`, `FunctionExtractor`, `ComplexityCalculator`, `ComplexityScorer`, `ImportResolver`, `MutantGenerator`, `MutantApplier`, `AnnotationScanner`, `TestRunner`) behind a top-level `Language`, plus a deterministic registry and manifest-based detection.
- `internal/lang/goanalyzer/`: existing Go pipeline extracted unchanged behind the interfaces; three duplicated `funcName` helpers consolidated; byte-identical output verified against the pre-refactor baseline on a Go-only diff.
- `internal/diff` parameterized with a `diff.Filter` built from the language's `FileFilter` (no more hardcoded `*.go`/`_test.go` checks); `internal/deps` split into pure graph math (`graph.go`) + orchestration (`deps.go`).
- `cmd/diffguard/main.go` loops over the resolved language set and merges sections into one `Report`; single-language output stays suffix-free for backwards compatibility.
Rust analyzer (Part C)
`// mutator-disable-*`), mutation generator/applier with re-parse gate, `cargo test` runner with temp-copy isolation, per-file mutex, `CARGO_INCREMENTAL=0`.
TypeScript analyzer (Part D)
`typescript` and `tsx` grammars routed by extension; composite detector requiring `package.json` AND at least one `.ts`/`.tsx` file (rejects JS-only repos); runner auto-detection prefers vitest > jest > npm test with `CI=true`; same temp-copy + per-file-mutex isolation as Rust.
Evaluation suites (Part E + EVAL-1..4)
Positive/negative control pairs prove the analyzers catch real issues rather than just parsing cleanly:
- `internal/lang/evalharness/`: builds the binary once per test run, copies fixtures to temp dirs, runs diffguard with deterministic flags (`--mutation-sample-rate 100`, fixed workers), and matches the live `report.Report` semantically (section prefix + severity + required `{file, function, severity, operator}` findings).
- `internal/lang/rustanalyzer/evaldata/` and `internal/lang/tsanalyzer/evaldata/`: five positive/negative fixture pairs (10 fixtures) per language covering complexity, sizes (fn), deps (cycle), mutation kill rate, and language-specific mutation operators. Each fixture has a `README.md` documenting the seeded issue and an `expected.json` encoding the required verdict.
- `cmd/diffguard/eval4_test.go` + `testdata/cross/`: severity propagation across languages (Rust FAIL + TS PASS → overall FAIL, and the reverse), mutation concurrency tree-cleanliness at `--mutation-workers 4`, disabled-line respect under concurrency, and a false-positive ceiling on a known-clean multi-language fixture.
- `cmd/diffguard/testdata/mixed-repo/{violations,clean}/`: end-to-end integration for the full orchestrator; asserts `[go]`/`[rust]`/`[typescript]` sections all appear and that clean inputs produce `WorstSeverity() == PASS`.
- `exec.LookPath("cargo")`/`npx` so `go test ./...` stays green on dev machines without the toolchains; CI installs both before running eval targets.
CI + docs
- `.github/workflows/ci.yml` installs Rust (`dtolnay/rust-toolchain`) and Node 22 (`setup-node`), caches `~/.cargo`, `target/`, `~/.npm`, and runs `make eval-rust`, `make eval-ts`, `make eval-mixed` as dedicated steps.
- `Makefile` gained `eval`, `eval-rust`, `eval-ts`, `eval-mixed` targets with `CI=true CARGO_INCREMENTAL=0`.
- `README.md` rewritten: supported-languages table, per-language runtime dependencies, `--language` in the CLI reference, per-language annotation syntax, mutation operator lists, isolation strategies, cross-links to `MULTI_LANGUAGE_SUPPORT.md` and `docs/rust-typescript-support.md`.
- `docs/multi-lang-followups.md` tracks deferred work (EVAL-5 pre-flight calibration, Rust workspace-crate path resolution, per-analyzer eval carve-outs: sizes-file, SDP, churn end-to-end, annotation respect end-to-end).
Breaking changes
- `diff.Parse(repoPath, baseBranch)` → `diff.Parse(repoPath, baseBranch, filter)`, where `filter` is a `diff.Filter` (the diff package does not import `lang`; `cmd/diffguard` adapts the language's `FileFilter`). Same for `diff.CollectPaths`.
- `Report.Section.Name` now carries a `[<lang>]` suffix when two or more languages contribute sections. Single-language runs are unchanged.
- The exported `Analyze` functions in `internal/sizes`, `internal/complexity`, `internal/deps`, `internal/churn`, and `internal/mutation` now delegate through `lang.*` interfaces; callers outside `cmd/diffguard/main.go` need to pass a `Language`.
Test plan
- `go test ./... -count=1` green locally
- `go vet ./...` clean
- `make eval-rust` green (cargo installed)
- `make eval-ts` green (node 22 + vitest available)
- `make eval-mixed` green
- `diffguard --language go` on a Go-only diff produces byte-identical output to pre-refactor `main`
- `diffguard` (no flag) on the `mixed-repo/violations` fixture reports `[go]` + `[rust]` + `[typescript]` sections