From f2d20320ecef33ac08bb5e001eedb71ab58ffaba Mon Sep 17 00:00:00 2001 From: =?utf8?q?J=C3=A9r=C3=B4me=20Benoit?= Date: Fri, 14 Nov 2025 17:46:56 +0100 Subject: [PATCH] docs(reforcexy): refine README MIME-Version: 1.0 Content-Type: text/plain; charset=utf8 Content-Transfer-Encoding: 8bit Signed-off-by: Jérôme Benoit --- ReforceXY/reward_space_analysis/README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/ReforceXY/reward_space_analysis/README.md b/ReforceXY/reward_space_analysis/README.md index 3ad0397..901514f 100644 --- a/ReforceXY/reward_space_analysis/README.md +++ b/ReforceXY/reward_space_analysis/README.md @@ -433,9 +433,9 @@ done Combine with other overrides cautiously; use distinct `out_dir` per configuration. -### PBRS Rationale +### PBRS Configuration -Canonical mode seeks near zero-sum shaping (Φ terminal ≈ 0) ensuring invariance: reward differences reflect environment performance, not potential leakage. Non-canonical modes or additives (entry/exit) trade strict invariance for potential extra signal shaping. Progressive release & spike cancel adjust temporal release of Φ. Choose canonical for theory alignment; use non-canonical or additives only when empirical gain outweighs invariance guarantees. Symbol Φ denotes potential. See invariance condition and drift correction mechanics under PBRS section. +Canonical mode enforces zero-sum shaping (Φ terminal ≈ 0) for theoretical invariance. Non-canonical modes or additives modify this behavior. Choose canonical for standard PBRS compliance; use non-canonical when specific shaping behavior is required. ### Real Data Comparison -- 2.43.0