Piment Noir Git Repositories - e-mobility-charging-stations-simulator.git/commit

feat(sandcastle): make strategy dispatch registry-driven (#1862)

* feat(sandcastle): make strategy dispatch registry-driven

Replace the hardcoded `sandcastle` label / single-strategy implementation
with a registry mapping each strategy key to its actor/critic loop and
finalization config. Labels (`sandcastle-<key>`) and branch prefixes
(`agent/<key>`) are derived from the key, so adding a strategy is one
folder + one registry line.

- New `.sandcastle/strategies/index.ts` (registry, indexed view with
  load-time uniqueness assertion, `labelOf`/`branchPrefixOf` helpers).
- `TaskSpec.strategyKey` propagates the chosen strategy through the pipeline.
- `GithubIssueSource` fetches issues per registered label, deduplicates
  with a multi-label warning, validates planner output (strategy echo and
  branch number bound to issue id), and builds its prompt sanitizer once
  from the registry (universal core + per-strategy `controlTags`).
- `GITHUB_ISSUE_LABEL` and `GIT_BRANCH_PREFIX` constants removed; no
  back-compat shim.

Cutover (one-time, also noted in `.github/workflows/sandcastle.yml`):
1. `gh label edit sandcastle --name sandcastle-implement`
2. Close or merge open PRs whose head branches start with `agent/issue-`.

* fix(sandcastle): validate strategy key and controlTags at registry load

Extend `indexByKey` to enforce the kebab-case contract on `StrategyEntry.key`
(documented but unchecked) and an XML-name-safe contract on `controlTags`.
Failures throw at module load with a precise message, matching the existing
duplicate-key behaviour, so registry mistakes (typos, wrong casing, empty
tag) cannot silently produce undiscoverable GitHub labels, invalid git
branches or empty regex alternatives in the prompt sanitizer.

* fix(sandcastle): tighten control-tag sanitizer against prefix matches

The deny-list regex `</?(?:tag1|tag2)[^>]*>` matched tag-name prefixes
because `[^>]*` swallowed the trailing characters: `<plant>` matched
alternative `plan`, `<reviewer>` matched `review`, and so on. Insert a
zero-width lookahead `(?=[\\s/>])` after the alternation so only XML
tag-name terminators (whitespace, `/`, or `>`) are accepted. The
deny-list remains entirely registry-driven.

* refactor(sandcastle): parallelise issue fetch and clarify multi-label warning

- Fan out per-strategy `gh issue list` calls with `Promise.all`. The
  dedup pass still iterates the resolved array in registry order, so the
  "first registered wins" semantics is preserved. Failure semantics are
  unchanged: any rejected fetch propagates as before.
- Render labels via `labelOf` (no more hard-coded `sandcastle-*` glob),
  name both winner and dropped GitHub labels, and append a ready-to-paste
  `gh issue edit --remove-label` remediation hint.

* fix(sandcastle): reject prefix-overlapping strategy keys at registry load

Two registered keys related by a kebab-prefix (e.g. 'foo' and 'foo-1') yield
overlapping branch-detection regexes: a branch 'agent/foo-1-42-slug' is matched
by both '^agent/foo-(\d+)-' (capturing '1' as the issue id) and
'^agent/foo-1-(\d+)-' (capturing '42'), and the registry-order tie-break in
`fetchIssuesWithOpenPRs` then attributes the open PR to the wrong issue.

Extend `indexByKey` with a pairwise prefix-incomparable check that throws at
module load with a precise message, in line with the existing
`STRATEGY_KEY_PATTERN`, `CONTROL_TAG_PATTERN`, and duplicate-key checks
introduced in 89be4bd3. Today's single-entry registry passes unchanged.

* refactor(sandcastle): unify prompt taxonomy and decouple planner from orchestrator

Adopt the Inputs/Task/Output/Rules/Done section structure across the
plan, actor and critic prompts; drop `Agent` from the role headers;
deduplicate intra-prompt boilerplate (verbatim instructions, repeated
quality-gate blocks, multiply-explained confidence levels). Total prompt
size: 217 \u2192 142 lines.

Decouple the planner output from registry-known data: the planner now
emits a kebab-case `slug` only; the orchestrator builds
`branch = \`\${source.branchPrefix}-\${id}-\${slug}\`` and copies
`strategyKey` from the registry-resolved source. The validator drops the
`strategyKey` echo check and the full-branch regex, validates the slug
shape via `SLUG_PATTERN` and a `MAX_SLUG_CHARS` cap. The planner JSON
view is also projected to `{ body, labels, number, title }` so
`branchPrefix`/`strategyKey` cannot leak back into the prompt.

Rename actor variable `TASK_ID` \u2192 `ISSUE_NUMBER` for precision; drop the
`## Planner Analysis` section header from `buildPlanContext` so
`{{PLAN_CONTEXT}}` interpolates cleanly inside the actor's Inputs
section.

* fix(sandcastle): de-duplicate multi-line prompt placeholders

The sandcastle library's PromptArgumentSubstitution
(`@ai-hero/sandcastle/dist/PromptArgumentSubstitution.js`) applies the
regex \`/\\{\\{\\s*([A-Za-z_][A-Za-z0-9_]*)\\s*\\}\\}/g\` globally with no
Markdown awareness, so a placeholder appearing both inside an
`## Inputs` bullet (`\`{{KEY}}\` \u2014 description.`) and on a bare line
below was substituted twice. For multi-line values (`ISSUE_BODY`,
`PLAN_CONTEXT`, `FINDINGS`, `ISSUES_JSON`, `ACCEPTANCE_CRITERIA`)
this corrupted the inline-code span and doubled the prompt token cost
on every actor and critic round.

Reference variables by bare name (`\`KEY\``) in the `## Inputs` bullets
and in narratives that mention multi-line values; keep the bare-line
`{{KEY}}` injections downstream as the single substitution sites.
Scalar variables (`ISSUE_NUMBER`, `ISSUE_TITLE`, `BRANCH`,
`BASE_BRANCH`, `NONCE`) remain interpolated where their single-line
value belongs in the rendered prose or commands.

* refactor(sandcastle): centralize MAX_SLUG_CHARS in constants module

`MAX_SLUG_CHARS` is a tunable peer of `MAX_TITLE_CHARS`,
`CONTEXT_HASH_RADIUS`, and `HASH_PREFIX_LENGTH`, all already in
`constants.ts`. Move it there to honour the AGENTS.md "single source of
truth: canonical defaults" rule. `SLUG_PATTERN` stays local in
`task-source.ts`: it is a structural validator regex, not a tunable.

* fix(sandcastle): align PR-coverage branch regex with slug grammar

The PR-coverage regex used to dedup open PRs in `fetchIssuesWithOpenPRs`
omitted the trailing slug, while `validatePlanEntry` enforces a strict
kebab-case slug shape via `SLUG_PATTERN`. A stale or malformed branch
like `agent/implement-42-` would match the dedup regex and suppress
retries forever.

Lift the slug body into a shared `SLUG_PATTERN_BODY` constant reused by
both `SLUG_PATTERN` (the plan-entry validator) and the per-strategy
`branchPatterns` so the two definitions stay in lock-step.

* fix(sandcastle): sanitize planner-supplied issue title at trust boundary

`validatePlanEntry` validated typeof, length and control characters on
`item.title` but stored the raw value verbatim in `TaskSpec`, exposing
both the actor prompt (`{{ISSUE_TITLE}}`) and the GitHub PR title (via
`finalizer.ts`) to control tags such as `<system>` or `<plan>` that a
prompt-injected planner could echo from an issue body.

Route `item.title` through the same registry-derived `sanitizeForPrompt`
already used for `body`, `rootCauseHypothesis` and `acceptanceCriteria`,
trim whitespace, and reject the entry if the sanitised title is empty.
Length and control-character checks remain on the raw input as a fast
fail-fast path before sanitisation.

* fix(sandcastle): retry on all-invalid plan and dedupe planner ids

`validatePlan` returned `[]` both for legitimate empty plans and for
plans where every entry failed validation. The caller treats `[]`
identically ("No actionable issues. Exiting."), so a systemic planner
mistake silently terminated the nightly run as if there were genuinely
no work, wasting the entire retry budget without diagnostic.

It also accepted duplicate ids: a planner emitting two entries for the
same issue number produced two `TaskSpec`s sharing one branch, so
`main.ts` would spawn two concurrent sandboxes and create duplicate PRs.

Restructure `validatePlan` to:
- Track `seenIds: Set<string>`; on duplicate, warn and drop the second
  occurrence (first registered wins, mirrors the multi-label dedup
  pattern in `fetchAndSanitizeIssues`).
- Detect `parsed.issues.length > 0 && validated.length === 0`: log an
  error and return `null` so the existing retry loop at lines 119-177
  re-invokes the planner instead of exiting silently.

author	Jérôme Benoit <jerome.benoit@piment-noir.org>
	Sun, 17 May 2026 22:45:10 +0000 (00:45 +0200)
committer	GitHub <noreply@github.com>
	Sun, 17 May 2026 22:45:10 +0000 (00:45 +0200)
commit	5d430ad4b09351c66730e58283efb0a1e9d7cd83
tree	5af8d56ba1feabe99c647ba11d0107b088cd81b7	tree \| snapshot (zip tar.gz tar.bz2)
parent	8bb806dc4ae42aafe100009dfd34be4809ebc921	commit \| diff

.github/workflows/sandcastle.yml		diff \| blob \| blame \| history
.sandcastle/constants.ts		diff \| blob \| blame \| history
.sandcastle/main.ts		diff \| blob \| blame \| history
.sandcastle/plan-prompt.md		diff \| blob \| blame \| history
.sandcastle/strategies/implement/actor-prompt.md		diff \| blob \| blame \| history
.sandcastle/strategies/implement/critic-prompt.md		diff \| blob \| blame \| history
.sandcastle/strategies/implement/strategy.ts		diff \| blob \| blame \| history
.sandcastle/strategies/index.ts	[new file with mode: 0644]	blob
.sandcastle/task-source.ts		diff \| blob \| blame \| history
.sandcastle/types.ts		diff \| blob \| blame \| history