feat(sandcastle): make strategy dispatch registry-driven (#1862)
* feat(sandcastle): make strategy dispatch registry-driven
Replace the hardcoded `sandcastle` label / single-strategy implementation
with a registry mapping each strategy key to its actor/critic loop and
finalization config. Labels (`sandcastle-<key>`) and branch prefixes
(`agent/<key>`) are derived from the key, so adding a strategy is one
folder + one registry line.
- New `.sandcastle/strategies/index.ts` (registry, indexed view with
load-time uniqueness assertion, `labelOf`/`branchPrefixOf` helpers).
- `TaskSpec.strategyKey` propagates the chosen strategy through the pipeline.
- `GithubIssueSource` fetches issues per registered label, deduplicates
with a multi-label warning, validates planner output (strategy echo and
branch number bound to issue id), and builds its prompt sanitizer once
from the registry (universal core + per-strategy `controlTags`).
- `GITHUB_ISSUE_LABEL` and `GIT_BRANCH_PREFIX` constants removed; no
back-compat shim.
Cutover (one-time, also noted in `.github/workflows/sandcastle.yml`):
1. `gh label edit sandcastle --name sandcastle-implement`
2. Close or merge open PRs whose head branches start with `agent/issue-`.
* fix(sandcastle): validate strategy key and controlTags at registry load
Extend `indexByKey` to enforce the kebab-case contract on `StrategyEntry.key`
(documented but unchecked) and an XML-name-safe contract on `controlTags`.
Failures throw at module load with a precise message, matching the existing
duplicate-key behaviour, so registry mistakes (typos, wrong casing, empty
tag) cannot silently produce undiscoverable GitHub labels, invalid git
branches or empty regex alternatives in the prompt sanitizer.
* fix(sandcastle): tighten control-tag sanitizer against prefix matches
The deny-list regex `</?(?:tag1|tag2)[^>]*>` matched tag-name prefixes
because `[^>]*` swallowed the trailing characters: `<plant>` matched
alternative `plan`, `<reviewer>` matched `review`, and so on. Insert a
zero-width lookahead `(?=[\\s/>])` after the alternation so only XML
tag-name terminators (whitespace, `/`, or `>`) are accepted. The
deny-list remains entirely registry-driven.
* refactor(sandcastle): parallelise issue fetch and clarify multi-label warning
- Fan out per-strategy `gh issue list` calls with `Promise.all`. The
dedup pass still iterates the resolved array in registry order, so the
"first registered wins" semantics is preserved. Failure semantics are
unchanged: any rejected fetch propagates as before.
- Render labels via `labelOf` (no more hard-coded `sandcastle-*` glob),
name both winner and dropped GitHub labels, and append a ready-to-paste
`gh issue edit --remove-label` remediation hint.
* fix(sandcastle): reject prefix-overlapping strategy keys at registry load
Two registered keys related by a kebab-prefix (e.g. 'foo' and 'foo-1') yield
overlapping branch-detection regexes: a branch 'agent/foo-1-42-slug' is matched
by both '^agent/foo-(\d+)-' (capturing '1' as the issue id) and
'^agent/foo-1-(\d+)-' (capturing '42'), and the registry-order tie-break in
`fetchIssuesWithOpenPRs` then attributes the open PR to the wrong issue.
Extend `indexByKey` with a pairwise prefix-incomparable check that throws at
module load with a precise message, in line with the existing
`STRATEGY_KEY_PATTERN`, `CONTROL_TAG_PATTERN`, and duplicate-key checks
introduced in
89be4bd3. Today's single-entry registry passes unchanged.
* refactor(sandcastle): unify prompt taxonomy and decouple planner from orchestrator
Adopt the Inputs/Task/Output/Rules/Done section structure across the
plan, actor and critic prompts; drop `Agent` from the role headers;
deduplicate intra-prompt boilerplate (verbatim instructions, repeated
quality-gate blocks, multiply-explained confidence levels). Total prompt
size: 217 \u2192 142 lines.
Decouple the planner output from registry-known data: the planner now
emits a kebab-case `slug` only; the orchestrator builds
`branch = \`\${source.branchPrefix}-\${id}-\${slug}\`` and copies
`strategyKey` from the registry-resolved source. The validator drops the
`strategyKey` echo check and the full-branch regex, validates the slug
shape via `SLUG_PATTERN` and a `MAX_SLUG_CHARS` cap. The planner JSON
view is also projected to `{ body, labels, number, title }` so
`branchPrefix`/`strategyKey` cannot leak back into the prompt.
Rename actor variable `TASK_ID` \u2192 `ISSUE_NUMBER` for precision; drop the
`## Planner Analysis` section header from `buildPlanContext` so
`{{PLAN_CONTEXT}}` interpolates cleanly inside the actor's Inputs
section.
* fix(sandcastle): de-duplicate multi-line prompt placeholders
The sandcastle library's PromptArgumentSubstitution
(`@ai-hero/sandcastle/dist/PromptArgumentSubstitution.js`) applies the
regex \`/\\{\\{\\s*([A-Za-z_][A-Za-z0-9_]*)\\s*\\}\\}/g\` globally with no
Markdown awareness, so a placeholder appearing both inside an
`## Inputs` bullet (`\`{{KEY}}\` \u2014 description.`) and on a bare line
below was substituted twice. For multi-line values (`ISSUE_BODY`,
`PLAN_CONTEXT`, `FINDINGS`, `ISSUES_JSON`, `ACCEPTANCE_CRITERIA`)
this corrupted the inline-code span and doubled the prompt token cost
on every actor and critic round.
Reference variables by bare name (`\`KEY\``) in the `## Inputs` bullets
and in narratives that mention multi-line values; keep the bare-line
`{{KEY}}` injections downstream as the single substitution sites.
Scalar variables (`ISSUE_NUMBER`, `ISSUE_TITLE`, `BRANCH`,
`BASE_BRANCH`, `NONCE`) remain interpolated where their single-line
value belongs in the rendered prose or commands.
* refactor(sandcastle): centralize MAX_SLUG_CHARS in constants module
`MAX_SLUG_CHARS` is a tunable peer of `MAX_TITLE_CHARS`,
`CONTEXT_HASH_RADIUS`, and `HASH_PREFIX_LENGTH`, all already in
`constants.ts`. Move it there to honour the AGENTS.md "single source of
truth: canonical defaults" rule. `SLUG_PATTERN` stays local in
`task-source.ts`: it is a structural validator regex, not a tunable.
* fix(sandcastle): align PR-coverage branch regex with slug grammar
The PR-coverage regex used to dedup open PRs in `fetchIssuesWithOpenPRs`
omitted the trailing slug, while `validatePlanEntry` enforces a strict
kebab-case slug shape via `SLUG_PATTERN`. A stale or malformed branch
like `agent/implement-42-` would match the dedup regex and suppress
retries forever.
Lift the slug body into a shared `SLUG_PATTERN_BODY` constant reused by
both `SLUG_PATTERN` (the plan-entry validator) and the per-strategy
`branchPatterns` so the two definitions stay in lock-step.
* fix(sandcastle): sanitize planner-supplied issue title at trust boundary
`validatePlanEntry` validated typeof, length and control characters on
`item.title` but stored the raw value verbatim in `TaskSpec`, exposing
both the actor prompt (`{{ISSUE_TITLE}}`) and the GitHub PR title (via
`finalizer.ts`) to control tags such as `<system>` or `<plan>` that a
prompt-injected planner could echo from an issue body.
Route `item.title` through the same registry-derived `sanitizeForPrompt`
already used for `body`, `rootCauseHypothesis` and `acceptanceCriteria`,
trim whitespace, and reject the entry if the sanitised title is empty.
Length and control-character checks remain on the raw input as a fast
fail-fast path before sanitisation.
* fix(sandcastle): retry on all-invalid plan and dedupe planner ids
`validatePlan` returned `[]` both for legitimate empty plans and for
plans where every entry failed validation. The caller treats `[]`
identically ("No actionable issues. Exiting."), so a systemic planner
mistake silently terminated the nightly run as if there were genuinely
no work, wasting the entire retry budget without diagnostic.
It also accepted duplicate ids: a planner emitting two entries for the
same issue number produced two `TaskSpec`s sharing one branch, so
`main.ts` would spawn two concurrent sandboxes and create duplicate PRs.
Restructure `validatePlan` to:
- Track `seenIds: Set<string>`; on duplicate, warn and drop the second
occurrence (first registered wins, mirrors the multi-label dedup
pattern in `fetchAndSanitizeIssues`).
- Detect `parsed.issues.length > 0 && validated.length === 0`: log an
error and return `null` so the existing retry loop at lines 119-177
re-invokes the planner instead of exiting silently.