Merged
11 changes: 11 additions & 0 deletions web/src/server/free-session/abuse-detection.ts
@@ -297,6 +297,17 @@ async function enrichWithGithubAge(
} else if (ageDays < 90) {
s.flags.push(`gh-new<90d:${ageDays.toFixed(0)}d`)
s.score += 10
} else if (ageDays >= 365 * 3) {
// Established GitHub accounts are a strong counter-signal: buying
// a 3+ year old account is rare at our abuse scale. Subtract enough
// to pull a day-1 heavy user (new-acct<1d + very-heavy = 90) back
// below the high-tier threshold without fully clearing them —
// genuine 24/7 patterns still surface.
s.flags.push(`gh-established:${(ageDays / 365).toFixed(1)}y`)
s.score -= 40
} else if (ageDays >= 365) {
s.flags.push(`gh-established:${(ageDays / 365).toFixed(1)}y`)
s.score -= 20
}
}
}
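To make the arithmetic in the code comment concrete, here is a minimal standalone sketch of the age-based adjustment. It models only the three branches visible in this hunk; the branches for very young accounts sit above the hunk and are not reproduced. The names `AgeAdjustment` and `githubAgeAdjustment` are hypothetical and do not appear in the PR, and the starting score of 90 is taken from the comment (new-acct<1d + very-heavy).

```typescript
// Hypothetical standalone model of the branches shown in the hunk.
// Assumption: branches for very young accounts exist above the hunk
// and are intentionally left out here.
interface AgeAdjustment {
  delta: number
  flag: string | null
}

function githubAgeAdjustment(ageDays: number): AgeAdjustment {
  if (ageDays < 90) {
    return { delta: 10, flag: `gh-new<90d:${ageDays.toFixed(0)}d` }
  } else if (ageDays >= 365 * 3) {
    // 3+ years: the stronger counter-signal
    return { delta: -40, flag: `gh-established:${(ageDays / 365).toFixed(1)}y` }
  } else if (ageDays >= 365) {
    // 1 to 3 years: the weaker counter-signal
    return { delta: -20, flag: `gh-established:${(ageDays / 365).toFixed(1)}y` }
  }
  // 90 days to 1 year: no adjustment in the visible branches
  return { delta: 0, flag: null }
}

// Worked example from the code comment: a day-1 heavy user scores 90
// before the GitHub age adjustment is applied.
const dayOneHeavyScore = 90
const fourYearAccount = githubAgeAdjustment(4 * 365)
console.log(dayOneHeavyScore + fourYearAccount.delta, fourYearAccount.flag)
// prints: 50 gh-established:4.0y
```

Note that both established tiers share the `gh-established:` flag prefix, so a reader of the flags has to look at the year value to tell a 20-point reduction from a 40-point one.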
2 changes: 2 additions & 0 deletions web/src/server/free-session/abuse-review.ts
@@ -41,6 +41,8 @@ You will see:
A very young GitHub account (gh_age < 7d, especially < 1d) combined with heavy usage is one of the strongest bot signals we have: real developers almost never create a GitHub account on the same day they start running an agent. Weigh this heavily in tiering.
Conversely, an established GitHub account (gh_age ≥ 1 year, especially ≥ 3 years) is a strong counter-signal. Account-age spoofing by buying old accounts is possible but uncommon at our abuse scale. An established GitHub + a natural agent mix (basher, code-reviewer, file-picker alongside the root agent) + some activity gaps during the day reads like an excited first-day power user, not a bot. Don't tier these as HIGH unless there's a second independent signal (creation cluster membership, true 24/7 distinct_hours, suspicious email pattern).
P2 Inconsistent field name in LLM guidance

The new guidance uses distinct_hours to describe the field that appears in the suspect data as distinct_hrs24 (line 79). The existing TIER 1 description (line 49) uses yet another variant: distinct_hours_24h. The model sees three names for the same field, which could lead it to misread or misquote a value when justifying a tier decision.

Consider aligning all three occurrences to the actual field name the model will see in the data (distinct_hrs24).

Suggested change
Conversely, an established GitHub account (gh_age ≥ 1 year, especially ≥ 3 years) is a strong counter-signal. Account-age spoofing by buying old accounts is possible but uncommon at our abuse scale. An established GitHub + a natural agent mix (basher, code-reviewer, file-picker alongside the root agent) + some activity gaps during the day reads like an excited first-day power user, not a bot. Don't tier these as HIGH unless there's a second independent signal (creation cluster membership, true 24/7 distinct_hours, suspicious email pattern).
Don't tier these as HIGH unless there's a second independent signal (creation cluster membership, true 24/7 distinct_hrs24, suspicious email pattern).
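
This kind of naming drift could also be caught mechanically. The sketch below scans prompt text for identifiers that share a stem with a known data field but do not match it exactly. The helper `findFieldNameDrift` and the sample prompt lines are hypothetical, written only for illustration; only the three field spellings come from the comment above.

```typescript
// Hypothetical helper: report identifiers in `prompt` that start with
// `stem` but are not one of the field names the model actually sees.
// Assumption: `stem` is non-empty and contains no regex metacharacters.
function findFieldNameDrift(
  prompt: string,
  knownFields: string[],
  stem: string,
): string[] {
  const pattern = new RegExp("\\b" + stem + "\\w*", "g")
  const drift: string[] = []
  let match: RegExpExecArray | null
  while ((match = pattern.exec(prompt)) !== null) {
    const name = match[0]
    if (knownFields.indexOf(name) === -1 && drift.indexOf(name) === -1) {
      drift.push(name)
    }
  }
  return drift.sort()
}

// Invented sample lines using the three spellings named in the review.
const promptExcerpt = [
  "TIER 1 looks for distinct_hours_24h close to 24.",
  "A second signal could be true 24/7 distinct_hours.",
  "distinct_hrs24: 23",
].join("\n")

console.log(findFieldNameDrift(promptExcerpt, ["distinct_hrs24"], "distinct_h"))
// prints: [ 'distinct_hours', 'distinct_hours_24h' ]
```

A check like this could run as a unit test against the prompt string, so a renamed field fails the build instead of silently confusing the model.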

Produce a markdown report with three sections:
## TIER 1 — HIGH CONFIDENCE (ban)