Skip to content

fix: security and correctness fixes for worker and bot#17

Merged
abrichr merged 3 commits intofeat/worker-core-and-testsfrom
fix/pr15-security-and-correctness
Mar 2, 2026
Merged

fix: security and correctness fixes for worker and bot#17
abrichr merged 3 commits intofeat/worker-core-and-testsfrom
fix/pr15-security-and-correctness

Conversation

@abrichr
Copy link
Member

@abrichr abrichr commented Mar 2, 2026

Summary

Fixes 4 critical/high issues identified in code review of PR #15 (wright worker pipeline):

  • Env var allowlist: Replace { ...process.env } in Claude subprocess with explicit allowlist — prevents leaking SUPABASE_SERVICE_ROLE_KEY, BOT_TOKEN, GITHUB_TOKEN, etc.
  • AbortController wiring: Pass AbortController from queue-pollerdev-loopclaude-session so SIGTERM actually stops the Claude session and clears dangling timers
  • Bot github_token: Add github_token to insertJob() — the job_queue table has github_token TEXT NOT NULL, so inserts were failing with a constraint error
  • Telegram auth middleware: Add ALLOWED_TELEGRAM_USERS env var to restrict bot access to known user IDs
  • Build fix: Add @types/node to @wright/shared and @wright/bot (pre-existing build failures)

Test plan

  • pnpm build — all 3 packages compile (shared, worker, bot)
  • pnpm --filter worker test — 53/53 tests pass
  • Deploy to Fly.io, submit a job via Telegram, verify end-to-end
  • Confirm Claude session logs don't show SUPABASE/BOT_TOKEN vars
  • Send Telegram message from unauthorized user, verify rejection

🤖 Generated with Claude Code

abrichr and others added 3 commits March 2, 2026 15:18
- Allowlist env vars passed to Claude subprocess (prevent leaking
  SUPABASE_SERVICE_ROLE_KEY, BOT_TOKEN, GITHUB_TOKEN, etc.)
- Wire AbortController from queue-poller through dev-loop to
  claude-session for graceful SIGTERM cancellation
- Add github_token to bot insertJob (fixes NOT NULL constraint failure)
- Add Telegram chat ID allowlist middleware (ALLOWED_TELEGRAM_USERS)
- Add @types/node to shared and bot packages (fixes pre-existing
  build failures)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add TMPDIR/TMP/TEMP to allowed env vars (git/npm need temp dirs)
- Use { once: true } on abort signal listener to prevent accumulation
  across loop iterations

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Pass abortController directly instead of conditional spread
- Filter non-finite values from ALLOWED_TELEGRAM_USERS parsing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@abrichr abrichr merged commit 3062c20 into feat/worker-core-and-tests Mar 2, 2026
abrichr added a commit that referenced this pull request Mar 2, 2026
* feat: implement worker core, tests, bot, CI/CD, and Dockerfile

Worker (apps/worker):
- Queue poller with atomic claiming, stale recovery, graceful SIGTERM requeue
- Dev loop (Ralph Loop): clone → detect → install → Claude loop → test → PR
- Claude Agent SDK wrapper for spawning Claude Code sessions
- Test runner auto-detection for pytest, jest, vitest, playwright, go, cargo
- Git operations: clone, branch, commit, push, create PR
- HTTP server with health, drain, cancel endpoints + scale-to-zero
- 53 tests across 6 test files, all passing

Telegram Bot (apps/bot):
- grammY-based bot with /start, /task, /status, /cancel commands
- Supabase realtime subscription for streaming job events to chat
- Inline keyboard for PR approve/reject actions

Infrastructure:
- Production multi-stage Dockerfile (Node 22, Python/uv, Go, Rust)
- GitHub Actions CI (lint + test + build) and deploy (Fly.io)
- Updated shared types (retry fields, DevLoopConfig, DevLoopResult)
- Updated SQL migration (attempt, max_attempts, github_token columns)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: security issues, deployment blockers, and build config

- Fix command injection in createPullRequest (use execFileSync instead of execSync)
- Add try-catch to installDependencies with proper error propagation
- Use WORKSPACE_DIR env var instead of hardcoded /tmp path in dev-loop
- Add GitHub CLI (gh) to Dockerfile for PR creation
- Copy .dockerignore to repo root for proper Docker build context
- Add composite: true to shared tsconfig for project references
- Fix migration to use gen_random_uuid() instead of uuid_generate_v4()
- Add Supabase config files

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: reduce Fly.io memory to 4GB (shared CPU limit)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: install Claude Code CLI in Docker for agent SDK

The @anthropic-ai/claude-agent-sdk spawns a Claude Code subprocess
via the query() function. The CLI must be globally installed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: update URLs for repo rename to openadapt-wright

Update stale OpenAdaptAI/wright URLs to OpenAdaptAI/openadapt-wright
in both README.md and dev-loop.ts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: strip ANSI escape codes before parsing test output

Test runners emit colored output that broke our regex parsers,
causing 0/0 results even when tests were actually passing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add vitest output format parser for correct test result extraction

Vitest uses "Tests  2 passed (2)" format while Jest uses "Tests: 2 failed, 5 passed, 7 total".
The parser now handles both formats, fixing the 0/0 results seen in production.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: clone specified branch and add vitest output parser

- cloneRepo now accepts an optional branch parameter to checkout the
  correct base branch instead of always cloning the default branch
- dev-loop passes job.branch to cloneRepo so auto-detection works
  against the right codebase
- parseJest now handles vitest's output format ("Tests  2 passed (2)")
  in addition to jest's format ("Tests: 2 failed, 5 passed, 7 total")

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: disable Fly.io auto-stop to prevent job interruption

Fly.io's auto_stop_machines was killing the worker during long Claude
sessions because execSync blocks the event loop, preventing health
checks from responding. Now the worker manages its own lifecycle
via the 5-minute idle timer (process.exit(0)).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: security and correctness fixes for worker and bot (#17)

* fix: security and correctness fixes for worker and bot

- Allowlist env vars passed to Claude subprocess (prevent leaking
  SUPABASE_SERVICE_ROLE_KEY, BOT_TOKEN, GITHUB_TOKEN, etc.)
- Wire AbortController from queue-poller through dev-loop to
  claude-session for graceful SIGTERM cancellation
- Add github_token to bot insertJob (fixes NOT NULL constraint failure)
- Add Telegram chat ID allowlist middleware (ALLOWED_TELEGRAM_USERS)
- Add @types/node to shared and bot packages (fixes pre-existing
  build failures)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add TMPDIR to env allowlist and use once for abort listener

- Add TMPDIR/TMP/TEMP to allowed env vars (git/npm need temp dirs)
- Use { once: true } on abort signal listener to prevent accumulation
  across loop iterations

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: simplify abortController pass-through and filter NaN from allowlist

- Pass abortController directly instead of conditional spread
- Filter non-finite values from ALLOWED_TELEGRAM_USERS parsing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
@abrichr abrichr deleted the fix/pr15-security-and-correctness branch March 2, 2026 21:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant