Hourly freebuff bot-sweep dry-run endpoint #527
Conversation
Adds an admin endpoint (/api/admin/bot-sweep) plus hourly GitHub Action that scans active/queued free_session rows for likely bots and emails a ranked report to james@codebuff.com. Also checks in the inspect and unban helper scripts used during the 2026-04-21 ban sweep, and promotes FREEBUFF_ROOT_AGENT_IDS to a shared constant. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
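The "ranked report" above comes from a tiered additive score. As a rough sketch of the idea (field names, weights, and thresholds here are illustrative only, not the PR's actual heuristics):

```typescript
// Hypothetical additive bot-score. The real scoring lives in the PR's
// route handler; these weights and cutoffs are made up for illustration.
type SessionStats = {
  msgs24h: number // messages finished in the last 24h
  distinctHours24h: number // distinct hours with activity in the last 24h
  accountAgeDays: number
  email: string
}

type Tier = 'low' | 'medium' | 'high'

function scoreSuspect(s: SessionStats): { score: number; tier: Tier } {
  let score = 0
  if (s.distinctHours24h >= 20) score += 40 // near-24/7 usage
  if (s.msgs24h >= 200) score += 30 // heavy message volume
  if (s.accountAgeDays < 7) score += 20 // very young account
  if (/\+[^@]+@/.test(s.email)) score += 10 // plus-alias email pattern
  const tier: Tier = score >= 60 ? 'high' : score >= 30 ? 'medium' : 'low'
  return { score, tier }
}
```

Each flag contributes independently, so suspects sort naturally by score and the tier cutoffs give the "high/medium" buckets the report groups by.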
Greptile Summary

This PR adds a read-only hourly bot-detection sweep for freebuff abuse. Key findings:
Confidence Score: 4/5

Safe to merge with low risk — read-only reporting feature with no automated bans; issues are operational reliability concerns only. The P1 concern (silent email failure returning 200) is an operational reliability gap but causes no data loss, incorrect bans, or security exposure. The P2 heuristic issue causes false negatives in edge cases only. Auth, scoring logic, cluster detection, env schema, and GH workflow are all solid.

Important Files Changed

web/src/app/api/admin/bot-sweep/route.ts — silent email failure and unconditional email send
Sequence Diagram

sequenceDiagram
participant GHA as GitHub Actions (hourly)
participant API as POST /api/admin/bot-sweep
participant DB as PostgreSQL (free_session + messages)
participant Loops as Loops (email)
participant Reviewer as james@codebuff.com
GHA->>API: POST with Bearer BOT_SWEEP_SECRET
API->>API: timingSafeEqual auth check
alt secret missing
API-->>GHA: 503
else invalid token
API-->>GHA: 401
else auth OK
API->>DB: SELECT free_session LEFT JOIN user
DB-->>API: sessions[]
API->>DB: SELECT message stats (24h FILTER + lifetime COUNT)
DB-->>API: msgStats[]
API->>API: score & tier suspects, findCreationClusters()
API->>Loops: sendBasicEmail(subject, message)
Loops-->>API: { success: true|false }
API-->>GHA: 200 { ok, suspectCount, emailSent }
Loops-->>Reviewer: ranked suspect report (if success)
end
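The "timingSafeEqual auth check" step in the diagram can be sketched as follows (function and variable names here are illustrative; the PR's exact handler code may differ):

```typescript
import { timingSafeEqual } from 'node:crypto'

// Constant-time bearer-token check, mirroring the diagram's three outcomes:
// 503 when the secret is unset, 401 on a bad token, 200 when it matches.
function checkBearer(
  authHeader: string | null,
  secret: string | undefined,
): 200 | 401 | 503 {
  if (!secret) return 503 // secret missing -> service unavailable
  const token = authHeader?.startsWith('Bearer ') ? authHeader.slice(7) : ''
  const a = Buffer.from(token)
  const b = Buffer.from(secret)
  // timingSafeEqual throws on length mismatch, so gate on length first.
  if (a.length !== b.length || !timingSafeEqual(a, b)) return 401
  return 200
}
```

timingSafeEqual compares the full buffers regardless of where they differ, which avoids leaking the secret's contents through response-time differences.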
Reviews (1): Last reviewed commit: "Hourly freebuff bot-sweep dry run + ban ..."
if (!emailResult.success) {
  logger.error(
    { error: emailResult.error },
    'Failed to email bot-sweep report',
  )
}

return NextResponse.json({
  ok: true,
  totalSessions: report.totalSessions,
  suspectCount: report.suspects.length,
  highTierCount: report.suspects.filter((s) => s.tier === 'high').length,
  emailSent: emailResult.success,
})
Email failure silently returns HTTP 200
When sendBasicEmail fails (e.g., LOOPS_API_KEY is unset or the Loops API is down), the route logs the error but still returns { ok: true, emailSent: false } with HTTP 200. The GitHub Actions workflow only checks for a non-200 status to decide whether to fail the run:
if [ "$status" != "200" ]; then
  exit 1
fi

This means a configuration problem (missing API key, wrong transactional template ID, etc.) would cause the sweep to run for weeks with no emails delivered, while GitHub Actions reports every run as green.
Consider returning a non-2xx status when the email fails:
if (!emailResult.success) {
logger.error({ error: emailResult.error }, 'Failed to email bot-sweep report')
return NextResponse.json(
{ ok: false, error: 'report generated but email failed', emailSent: false },
{ status: 502 },
  )
}

try {
  const report = await identifyBotSuspects({ logger })
  const { subject, message } = formatSweepReport(report)

  const emailResult = await sendBasicEmail({
    email: REPORT_RECIPIENT,
    data: { subject, message },
    logger,
  })
Email sent every hour even with zero suspects
The email is dispatched unconditionally after every sweep. When there are no active/queued sessions or no suspects match any flag, the email subject will be [freebuff bot-sweep] 0 medium suspects (0 active+queued). Over time this produces hourly noise in the inbox even during quiet periods.
Consider skipping the email when there is nothing actionable:
const report = await identifyBotSuspects({ logger })
if (report.suspects.length === 0 && report.creationClusters.length === 0) {
logger.info({ totalSessions: report.totalSessions }, 'bot-sweep: no suspects found, skipping email')
return NextResponse.json({ ok: true, totalSessions: report.totalSessions, suspectCount: 0, emailSent: false })
}
const { subject, message } = formatSweepReport(report)
// ... rest of email send

msgs24h: sql<number>`COUNT(*) FILTER (WHERE ${schema.message.finished_at} >= ${cutoff})`,
distinctHours24h: sql<number>`COUNT(DISTINCT EXTRACT(HOUR FROM ${schema.message.finished_at})) FILTER (WHERE ${schema.message.finished_at} >= ${cutoff})`,
lifetime: sql<number>`COUNT(*)`,
EXTRACT(HOUR) deduplicates across calendar-day boundaries — distinctHours24h can be undercounted
EXTRACT(HOUR FROM finished_at) returns an integer 0–23 (the clock-hour component). Within the rolling 24-hour window the same clock-hour can appear twice — e.g., messages at yesterday 14:00 and today 14:05 both extract to 14, so COUNT(DISTINCT EXTRACT(HOUR …)) counts them as a single distinct hour.
The 24-7-usage flag fires at distinctHours24h >= 20, so borderline bots near that threshold might be missed. DATE_TRUNC('hour', …) produces a unique value per calendar-hour slot and avoids the undercount:
distinctHours24h: sql<number>`COUNT(DISTINCT DATE_TRUNC('hour', ${schema.message.finished_at})) FILTER (WHERE ${schema.message.finished_at} >= ${cutoff})`,

postgres-js can't encode a raw JS Date as an ad-hoc template parameter (it only knows the target type when drizzle recognises the column). The FILTER (WHERE finished_at >= $cutoff) clauses were throwing ERR_INVALID_ARG_TYPE at query time. Switch to an ISO string with an explicit ::timestamptz cast. Verified end-to-end against prod: identified 39 suspects / 17 creation clusters across 244 active+queued sessions, email delivered. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
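The undercount is easy to reproduce outside SQL. A small TypeScript analogue of the two expressions — clock-hour (like EXTRACT(HOUR ...)) versus calendar-hour bucket (like DATE_TRUNC('hour', ...)) — using timestamps a day apart:

```typescript
// Clock-hour collapses across calendar days; calendar-hour slots do not.
const clockHour = (d: Date) => d.getUTCHours()
const calendarHour = (d: Date) =>
  new Date(
    Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate(), d.getUTCHours()),
  ).toISOString()

const msgs = [
  new Date('2026-04-20T14:00:00Z'), // yesterday 14:00
  new Date('2026-04-21T14:05:00Z'), // today 14:05, same clock hour
]

const distinctClockHours = new Set(msgs.map(clockHour)).size // undercounted: 1
const distinctCalendarHours = new Set(msgs.map(calendarHour)).size // correct: 2
```

Both messages fall inside a rolling 24-hour window, yet EXTRACT-style deduplication treats them as one hour of activity.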
Rules produce the deterministic shortlist; the agent then writes a tiered ban recommendation with cluster reasoning over just that shortlist. Advisory only — fails open to the rule-only report, and still no auto-ban. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The agent's tiered recommendation is the actionable part — surface it above the raw rule-based data in the email. Also return the full agent review text in the API JSON response so the GitHub Action can log it. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
For every suspect on the rule-based shortlist we now look up the linked GitHub login's created_at via the public GitHub API and fold age into scoring: <7d GH → +60, <30d → +30, <90d → +10. The agent prompt is taught that a fresh GitHub account paired with heavy usage is one of the strongest bot signals we have. Optional BOT_SWEEP_GITHUB_TOKEN env var lifts the unauthenticated 60 req/hr rate limit. Failures are logged but don't break the sweep. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
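The age-to-bonus mapping described in that commit is a simple threshold ladder. A minimal sketch (the function name is hypothetical; the thresholds are the ones stated above):

```typescript
// GitHub-account-age bonus as described: <7d -> +60, <30d -> +30,
// <90d -> +10, otherwise 0. Name and signature are illustrative.
function githubAgeBonus(createdAt: Date, now: Date = new Date()): number {
  const ageDays = (now.getTime() - createdAt.getTime()) / 86_400_000
  if (ageDays < 7) return 60
  if (ageDays < 30) return 30
  if (ageDays < 90) return 10
  return 0
}
```

Because the bonus is folded into the same additive score, a days-old GitHub account paired with heavy usage jumps straight into the high tier.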
Summary
POST /api/admin/bot-sweep, guarded by a BOT_SWEEP_SECRET bearer token (timing-safe compare). A new hourly GitHub Action hits it; the endpoint scans active/queued free_session rows, ranks likely-bot users by a tiered score (24-7 usage, heavy msg volume, young accounts, alias-email patterns, etc.), and emails a report to james@codebuff.com. It's a DRY RUN — no bans are issued automatically.

Also checks in scripts/inspect-freebuff-active.ts (snapshots current sessions with clustering signals) and scripts/unban-freebuff-users.ts (reverse of the ban script), and promotes FREEBUFF_ROOT_AGENT_IDS to a shared constant so both the CLI script and the production sweep count the same agent IDs.

Test plan
- BOT_SWEEP_SECRET in Infisical (prod) and as a GitHub repo secret
- workflow_dispatch

🤖 Generated with Claude Code