fix(ReforceXY): compute exit factor properly

author Jérôme Benoit <jerome.benoit@piment-noir.org>

Wed, 17 Dec 2025 16:09:10 +0000 (17:09 +0100)

committer Jérôme Benoit <jerome.benoit@piment-noir.org>

Wed, 17 Dec 2025 16:09:10 +0000 (17:09 +0100)
diff --git a/ReforceXY/reward_space_analysis/README.md b/ReforceXY/reward_space_analysis/README.md

index 180cc9981c2eeb6127e23df55ddcbc8dd30ee1cd..52d14e5f9a74967be526d9139f2b33132700e95a 100644
--- a/ReforceXY/reward_space_analysis/README.md
+++ b/ReforceXY/reward_space_analysis/README.md
@@ -42,6 +42,10 @@ Full test documentation: [tests/README.md](./tests/README.md).
  - [Quick Start](#quick-start)
  - [Prerequisites](#prerequisites)
  - [Common Use Cases](#common-use-cases)
+  - [1. Validate Reward Logic](#1-validate-reward-logic)
+  - [2. Parameter Sensitivity](#2-parameter-sensitivity)
+  - [3. Debug Anomalies](#3-debug-anomalies)
+  - [4. Real vs Synthetic](#4-real-vs-synthetic)
  - [CLI Parameters](#cli-parameters)
    - [Simulation & Environment](#simulation--environment)
    - [Hybrid Simulation Scalars](#hybrid-simulation-scalars)
@@ -56,13 +60,21 @@ Full test documentation: [tests/README.md](./tests/README.md).
    - [Overrides vs --params](#overrides-vs--params)
  - [Examples](#examples)
  - [Outputs](#outputs)
+  - [Main Report (`statistical_analysis.md`)](#main-report-statistical_analysismd)
+  - [Data Exports](#data-exports)
+  - [Manifest (`manifest.json`)](#manifest-manifestjson)
+  - [Distribution Shift Metrics](#distribution-shift-metrics)
  - [Advanced Usage](#advanced-usage)
    - [Parameter Sweeps](#parameter-sweeps)
-  - [PBRS Rationale](#pbrs-rationale)
+  - [PBRS Configuration](#pbrs-configuration)
    - [Real Data Comparison](#real-data-comparison)
    - [Batch Analysis](#batch-analysis)
  - [Testing](#testing)
  - [Troubleshooting](#troubleshooting)
+  - [No Output Files](#no-output-files)
+  - [Unexpected Reward Values](#unexpected-reward-values)
+  - [Slow Execution](#slow-execution)
+  - [Memory Errors](#memory-errors)
  
  ## Prerequisites
  
@@ -191,8 +203,7 @@ be overridden via `--params`.
  - **`--strict_diagnostics`** (flag, default: false) – Fail-fast on degenerate
    statistical diagnostics (zero-width CIs, undefined distribution metrics)
    instead of graceful fallbacks.
-- **`--exit_factor_threshold`** (float, default: 10000.0) – Warn if exit factor
-  exceeds threshold.
+- **`--exit_factor_threshold`** (float, default: 1000.0) – Emits a warning if the absolute value of the exit factor exceeds the threshold.
  - **`--pvalue_adjust`** (none|benjamini_hochberg, default: none) – Multiple
    testing p-value adjustment method.
  - **`--bootstrap_resamples`** (int, default: 10000) – Bootstrap iterations for
@@ -215,63 +226,93 @@ be overridden via `--params`.
  
  #### Core
  
-| Parameter           | Default | Description                 |
-| ------------------- | ------- | --------------------------- |
-| `base_factor`       | 100.0   | Base reward scale           |
-| `invalid_action`    | -2.0    | Penalty for invalid actions |
-| `win_reward_factor` | 2.0     | Profit overshoot multiplier |
-| `pnl_factor_beta`   | 0.5     | PnL amplification beta      |
+| Parameter        | Default | Description                 |
+| ---------------- | ------- | --------------------------- |
+| `base_factor`    | 100.0   | Base reward scale           |
+| `invalid_action` | -2.0    | Penalty for invalid actions |
  
-#### Duration Penalties
+#### Exit Factor
  
-| Parameter                    | Default | Description                |
-| ---------------------------- | ------- | -------------------------- |
-| `max_trade_duration_candles` | 128     | Trade duration cap         |
-| `max_idle_duration_candles`  | None    | Fallback 4× trade duration |
-| `idle_penalty_scale`         | 0.5     | Idle penalty scale         |
-| `idle_penalty_power`         | 1.025   | Idle penalty exponent      |
-| `hold_penalty_scale`         | 0.25    | Hold penalty scale         |
-| `hold_penalty_power`         | 1.025   | Hold penalty exponent      |
+The exit factor is computed as:
  
-#### Exit Attenuation
+`exit_factor` = `base_factor` × `time_attenuation_coefficient` × `pnl_coefficient`,
+where `pnl_coefficient` = max(0, `pnl_target_coefficient` × `efficiency_coefficient`),
+and the exit reward is `pnl` × `exit_factor`.
  
-| Parameter               | Default | Description                    |
-| ----------------------- | ------- | ------------------------------ |
-| `exit_attenuation_mode` | linear  | Kernel mode                    |
-| `exit_plateau`          | true    | Flat region before attenuation |
-| `exit_plateau_grace`    | 1.0     | Plateau grace ratio            |
-| `exit_linear_slope`     | 1.0     | Linear slope                   |
-| `exit_power_tau`        | 0.5     | Power kernel tau (0,1]         |
-| `exit_half_life`        | 0.5     | Half-life for half_life kernel |
+##### PnL Target
+
+| Parameter           | Default | Description                   |
+| ------------------- | ------- | ----------------------------- |
+| `profit_target`     | 0.03    | Target profit threshold       |
+| `risk_reward_ratio` | 1.0     | Risk/reward multiplier        |
+| `win_reward_factor` | 2.0     | Profit overshoot bonus factor |
+| `pnl_factor_beta`   | 0.5     | PnL amplification sensitivity |
  
-#### Efficiency
+**Note:** In ReforceXY, `profit_target` maps to `profit_aim` and `risk_reward_ratio` maps to `rr`.
+
+**Formula:**
+
+Let `pnl_target = profit_target × risk_reward_ratio`, `pnl_ratio = pnl / pnl_target`.
+
+- If `pnl_target ≤ 0`: `pnl_target_coefficient = 1.0` (the combined `pnl_coefficient` is then 0.0)
+- If `pnl_ratio > 1.0`:
+  `pnl_target_coefficient = 1.0 + win_reward_factor × tanh(pnl_factor_beta × (pnl_ratio − 1.0))`
+- If `pnl_ratio < min(−1.0, −1.0 / risk_reward_ratio)`:
+  `pnl_target_coefficient = 1.0 + (win_reward_factor × risk_reward_ratio) × tanh(pnl_factor_beta × (|pnl_ratio| − 1.0))`
+- Else: `pnl_target_coefficient = 1.0`
+
+##### Efficiency
  
  | Parameter           | Default | Description                    |
  | ------------------- | ------- | ------------------------------ |
  | `efficiency_weight` | 1.0     | Efficiency contribution weight |
  | `efficiency_center` | 0.5     | Efficiency pivot in [0,1]      |
  
-**Formula (unrealized profit normalization):**
+**Formula:**
  
  Let `max_u = max_unrealized_profit`, `min_u = min_unrealized_profit`,
  `range = max_u - min_u`, `ratio = (pnl - min_u)/range`. Then:
  
  - If `pnl > 0`:
-  `efficiency_factor = 1 + efficiency_weight * (ratio - efficiency_center)`
+  `efficiency_coefficient = 1 + efficiency_weight * (ratio - efficiency_center)`
  - If `pnl < 0`:
-  `efficiency_factor = 1 + efficiency_weight * (efficiency_center - ratio)`
-- Else: `efficiency_factor = 1`
+  `efficiency_coefficient = 1 + efficiency_weight * (efficiency_center - ratio)`
+- Else (`pnl = 0`, or `range` is zero or non-finite): `efficiency_coefficient = 1`
+
+##### Exit Attenuation
  
-Final exit multiplier path: `exit_reward = pnl * exit_factor`, where
-`exit_factor = kernel(base_factor, duration_ratio_adjusted) * pnl_factor` and
-`pnl_factor` includes the `efficiency_factor` above.
+| Parameter               | Default | Description                    |
+| ----------------------- | ------- | ------------------------------ |
+| `exit_attenuation_mode` | linear  | Kernel mode                    |
+| `exit_plateau`          | true    | Flat region before attenuation |
+| `exit_plateau_grace`    | 1.0     | Plateau grace ratio            |
+| `exit_linear_slope`     | 1.0     | Linear slope                   |
+| `exit_power_tau`        | 0.5     | Power kernel tau (0,1]         |
+| `exit_half_life`        | 0.5     | Half-life for half_life kernel |
+
+**Formula:**
+
+`time_attenuation_coefficient = kernel_function(r*)`
+
+where `r*` is the plateau-adjusted `duration_ratio` and `kernel_function` depends on `exit_attenuation_mode`. See [Exit Attenuation Kernels](#exit-attenuation-kernels) for detailed formulas.
+
+#### Duration Penalties
+
+| Parameter                    | Default | Description                |
+| ---------------------------- | ------- | -------------------------- |
+| `max_trade_duration_candles` | 128     | Trade duration cap         |
+| `max_idle_duration_candles`  | None    | Fallback 4× trade duration |
+| `idle_penalty_scale`         | 0.5     | Idle penalty scale         |
+| `idle_penalty_power`         | 1.025   | Idle penalty exponent      |
+| `hold_penalty_scale`         | 0.25    | Hold penalty scale         |
+| `hold_penalty_power`         | 1.025   | Hold penalty exponent      |
  
  #### Validation
  
  | Parameter               | Default | Description                       |
  | ----------------------- | ------- | --------------------------------- |
  | `check_invariants`      | true    | Invariant enforcement (see above) |
-| `exit_factor_threshold` | 10000.0 | Warn on excessive factor          |
+| `exit_factor_threshold` | 1000.0  | Warn on excessive factor          |
  
  #### PBRS (Potential-Based Reward Shaping)
  
@@ -327,13 +368,13 @@ r* = r - grace    if exit_plateau and r >  grace
  r* = r            if not exit_plateau
  ```
  
-| Mode      | Multiplier applied to base_factor \* pnl \* pnl_factor \* efficiency_factor | Monotonic | Notes                                       | Use Case                             |
-| --------- | --------------------------------------------------------------------------- | --------- | ------------------------------------------- | ------------------------------------ |
-| legacy    | step: ×1.5 if r\* ≤ 1 else ×0.5                                             | No        | Non-monotonic legacy mode (not recommended) | Backward compatibility only          |
-| sqrt      | 1 / sqrt(1 + r\*)                                                           | Yes       | Sub-linear decay                            | Gentle long-trade penalty            |
-| linear    | 1 / (1 + slope \* r\*)                                                      | Yes       | slope = `exit_linear_slope`                 | Balanced duration penalty (default)  |
-| power     | (1 + r\*)^(-alpha)                                                          | Yes       | alpha = -ln(tau)/ln(2); tau=1 ⇒ alpha=0     | Tunable decay rate via tau parameter |
-| half_life | 2^(- r\* / hl)                                                              | Yes       | hl = `exit_half_life`; r\*=hl ⇒ factor ×0.5 | Time-based exponential discount      |
+| Mode      | Formula                         | Monotonic | Notes                                       | Use Case                             |
+| --------- | ------------------------------- | --------- | ------------------------------------------- | ------------------------------------ |
+| legacy    | step: ×1.5 if r\* ≤ 1 else ×0.5 | No        | Non-monotonic legacy mode (not recommended) | Backward compatibility only          |
+| sqrt      | 1 / sqrt(1 + r\*)               | Yes       | Sub-linear decay                            | Gentle long-trade penalty            |
+| linear    | 1 / (1 + slope \* r\*)          | Yes       | slope = `exit_linear_slope`                 | Balanced duration penalty (default)  |
+| power     | (1 + r\*)^(-alpha)              | Yes       | alpha = -ln(tau)/ln(2); tau=1 ⇒ alpha=0     | Tunable decay rate via tau parameter |
+| half_life | 2^(- r\* / hl)                  | Yes       | hl = `exit_half_life`; r\*=hl ⇒ factor ×0.5 | Time-based exponential discount      |
  
  ### Transform Functions
  
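The PnL coefficient formulas in the README section above can be sketched as standalone Python. This is a minimal illustration, not the module's implementation: the helper names are hypothetical, the defaults are taken from the parameter tables (win_reward_factor=2.0, pnl_factor_beta=0.5, efficiency_weight=1.0, efficiency_center=0.5), and validation and warnings are omitted. The loss branch follows `_compute_pnl_target_coefficient`, which only applies the penalty when |pnl_ratio| > 1 as well.

```python
import math


def pnl_target_coefficient(pnl, profit_target, risk_reward_ratio,
                           win_reward_factor=2.0, pnl_factor_beta=0.5):
    """Tanh-shaped multiplier based on pnl / pnl_target (hypothetical helper)."""
    pnl_target = profit_target * risk_reward_ratio
    if pnl_target <= 0.0:
        return 1.0
    rr = risk_reward_ratio if risk_reward_ratio > 0 else 1.0
    pnl_ratio = pnl / pnl_target
    if pnl_ratio > 1.0:
        # Profit overshoot: the bonus saturates at 1 + win_reward_factor.
        return 1.0 + win_reward_factor * math.tanh(pnl_factor_beta * (pnl_ratio - 1.0))
    if pnl_ratio < min(-1.0, -1.0 / rr):
        # Loss beyond both -1 and -1/rr: amplification scaled by rr.
        return 1.0 + win_reward_factor * rr * math.tanh(pnl_factor_beta * (abs(pnl_ratio) - 1.0))
    return 1.0  # neutral band


def efficiency_coefficient(pnl, max_u, min_u, efficiency_weight=1.0, efficiency_center=0.5):
    """Grades where pnl sits between the unrealized extremes (not clamped here)."""
    rng = max_u - min_u
    # abs_tol=1e-8 mimics the default atol of numpy.isclose.
    if not math.isfinite(rng) or math.isclose(rng, 0.0, abs_tol=1e-8):
        return 1.0
    ratio = (pnl - min_u) / rng
    if pnl > 0.0:
        return 1.0 + efficiency_weight * (ratio - efficiency_center)
    if pnl < 0.0:
        return 1.0 + efficiency_weight * (efficiency_center - ratio)
    return 1.0


def pnl_coefficient(pnl, max_u, min_u, profit_target, risk_reward_ratio):
    """Composite coefficient, clamped at 0 like _get_pnl_coefficient."""
    if profit_target * risk_reward_ratio <= 0.0:
        return 0.0
    return max(0.0, pnl_target_coefficient(pnl, profit_target, risk_reward_ratio)
               * efficiency_coefficient(pnl, max_u, min_u))


# A trade closing at twice the target (profit_target=0.03, rr=1.0), at its peak:
print(pnl_coefficient(0.06, max_u=0.06, min_u=0.0, profit_target=0.03, risk_reward_ratio=1.0))  # ≈ 2.886
```

With risk_reward_ratio=2.0, a pnl_ratio of -0.75 is below -1/rr but not below -1, so the sketch (like the module) leaves pnl_target_coefficient at 1.0.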
diff --git a/ReforceXY/reward_space_analysis/reward_space_analysis.py b/ReforceXY/reward_space_analysis/reward_space_analysis.py

index f8a2cc6b52277dbc923fd28fa08d25a022d9b450..5594ebf6daec80cb314e117f8a83157f3a4cdd79 100644
--- a/ReforceXY/reward_space_analysis/reward_space_analysis.py
+++ b/ReforceXY/reward_space_analysis/reward_space_analysis.py
@@ -133,7 +133,7 @@ DEFAULT_MODEL_REWARD_PARAMETERS: RewardParams = {
      "pnl_factor_beta": 0.5,
      # Invariant / safety (env defaults)
      "check_invariants": True,
-    "exit_factor_threshold": 10000.0,
+    "exit_factor_threshold": 1000.0,
      # === PBRS PARAMETERS ===
      # Potential-based reward shaping core parameters
      # Discount factor γ for potential term (0 ≤ γ ≤ 1)
@@ -663,17 +663,15 @@ class RewardBreakdown:
      invariance_correction: float = 0.0
  
  
-def _get_exit_factor(
-    base_factor: float,
-    pnl: float,
-    pnl_factor: float,
+def _compute_time_attenuation_coefficient(
      duration_ratio: float,
      params: RewardParams,
  ) -> float:
-    """Exit factor (kernel + optional plateau) * pnl_factor with invariants."""
-    if not np.isfinite(base_factor) or not np.isfinite(pnl) or not np.isfinite(duration_ratio):
-        return _fail_safely("non_finite_exit_factor_inputs")
+    """
+    Calculate the time attenuation coefficient using the kernel selected by exit_attenuation_mode.
  
+    Returns a coefficient to multiply with base_factor: in (0, 1] for the monotonic kernels, 1.5 or 0.5 for legacy.
+    """
      if duration_ratio < 0.0:
          duration_ratio = 0.0
  
@@ -713,16 +711,16 @@ def _get_exit_factor(
          )
          exit_linear_slope = 1.0
  
-    def _legacy_kernel(f: float, dr: float) -> float:
-        return f * (1.5 if dr <= 1.0 else 0.5)
+    def _legacy_kernel(dr: float) -> float:
+        return 1.5 if dr <= 1.0 else 0.5
  
-    def _sqrt_kernel(f: float, dr: float) -> float:
-        return f / math.sqrt(1.0 + dr)
+    def _sqrt_kernel(dr: float) -> float:
+        return 1.0 / math.sqrt(1.0 + dr)
  
-    def _linear_kernel(f: float, dr: float) -> float:
-        return f / (1.0 + exit_linear_slope * dr)
+    def _linear_kernel(dr: float) -> float:
+        return 1.0 / (1.0 + exit_linear_slope * dr)
  
-    def _power_kernel(f: float, dr: float) -> float:
+    def _power_kernel(dr: float) -> float:
          tau = _get_float_param(
              params,
              "exit_power_tau",
@@ -739,9 +737,9 @@ def _get_exit_factor(
                  stacklevel=2,
              )
              alpha = 1.0
-        return f / math.pow(1.0 + dr, alpha)
+        return 1.0 / math.pow(1.0 + dr, alpha)
  
-    def _half_life_kernel(f: float, dr: float) -> float:
+    def _half_life_kernel(dr: float) -> float:
          hl = _get_float_param(
              params,
              "exit_half_life",
@@ -756,7 +754,7 @@ def _get_exit_factor(
                  stacklevel=2,
              )
              return 1.0
-        return f * math.pow(2.0, -dr / hl)
+        return math.pow(2.0, -dr / hl)
  
      kernels = {
          "legacy": _legacy_kernel,
@@ -785,16 +783,49 @@ def _get_exit_factor(
          kernel = _linear_kernel
  
      try:
-        attenuation_factor = kernel(base_factor, effective_dr)
+        time_attenuation_coefficient = kernel(effective_dr)
      except Exception as e:
          warnings.warn(
              f"exit_attenuation_mode '{exit_attenuation_mode}' failed ({e!r}); fallback linear (effective_dr={effective_dr:.5f})",
              RewardDiagnosticsWarning,
              stacklevel=2,
          )
-        attenuation_factor = _linear_kernel(base_factor, effective_dr)
+        time_attenuation_coefficient = _linear_kernel(effective_dr)
+
+    return time_attenuation_coefficient
+
+
+def _get_exit_factor(
+    base_factor: float,
+    pnl: float,
+    pnl_coefficient: float,
+    duration_ratio: float,
+    params: RewardParams,
+) -> float:
+    """
+    Compute exit reward factor by applying multiplicative coefficients to base_factor.
+
+    Formula: exit_factor = base_factor × time_attenuation_coefficient × pnl_coefficient
+
+    The time_attenuation_coefficient reduces rewards for longer trades, and the
+    pnl_coefficient adjusts rewards based on profit/target ratio and exit timing efficiency.
+
+    Args:
+        base_factor: Base reward value before coefficient adjustments
+        pnl: Realized profit/loss
+        pnl_coefficient: PnL scaling coefficient (already calculated)
+        duration_ratio: Trade duration relative to max_trade_duration_candles
+        params: Reward configuration parameters
+
+    Returns:
+        float: Final exit factor; the sign of the exit reward comes from pnl, since the caller returns pnl * exit_factor
+    """
+    if not np.isfinite(base_factor) or not np.isfinite(pnl) or not np.isfinite(duration_ratio):
+        return _fail_safely("non_finite_exit_factor_inputs")
+
+    time_attenuation_coefficient = _compute_time_attenuation_coefficient(duration_ratio, params)
  
-    exit_factor = attenuation_factor * pnl_factor
+    exit_factor = base_factor * time_attenuation_coefficient * pnl_coefficient
  
      if _get_bool_param(
          params,
@@ -808,7 +839,7 @@ def _get_exit_factor(
          exit_factor_threshold = _get_float_param(
              params,
              "exit_factor_threshold",
-            DEFAULT_MODEL_REWARD_PARAMETERS.get("exit_factor_threshold", 10000.0),
+            DEFAULT_MODEL_REWARD_PARAMETERS.get("exit_factor_threshold", 1000.0),
          )
          if exit_factor_threshold > 0 and np.isfinite(exit_factor_threshold):
              if abs(exit_factor) > exit_factor_threshold:
@@ -823,42 +854,78 @@ def _get_exit_factor(
      return exit_factor
  
  
-def _get_pnl_factor(
+def _compute_pnl_target_coefficient(
      params: RewardParams,
-    context: RewardContext,
+    pnl: float,
      profit_target: float,
      risk_reward_ratio: float,
  ) -> float:
-    """PnL factor: tanh overshoot/loss modulation + efficiency tilt (non-negative)."""
-    pnl = context.pnl
-    if not np.isfinite(pnl) or not np.isfinite(profit_target) or not np.isfinite(risk_reward_ratio):
-        return _fail_safely("non_finite_inputs_pnl_factor")
-    if profit_target <= 0.0:
-        return 0.0
+    """
+    Compute PnL target coefficient based on PnL/target ratio using tanh.
  
-    win_reward_factor = _get_float_param(
-        params,
-        "win_reward_factor",
-        DEFAULT_MODEL_REWARD_PARAMETERS.get("win_reward_factor", 2.0),
-    )
-    pnl_factor_beta = _get_float_param(
-        params,
-        "pnl_factor_beta",
-        DEFAULT_MODEL_REWARD_PARAMETERS.get("pnl_factor_beta", 0.5),
-    )
-    rr = risk_reward_ratio if risk_reward_ratio > 0 else 1.0
-
-    pnl_ratio = pnl / profit_target
-    pnl_target_factor = 1.0
-    if abs(pnl_ratio) > 1.0:
-        base_pnl_target_factor = math.tanh(pnl_factor_beta * (abs(pnl_ratio) - 1.0))
-        if pnl_ratio > 1.0:
-            pnl_target_factor = 1.0 + win_reward_factor * base_pnl_target_factor
-        elif pnl_ratio < -(1.0 / rr):
-            loss_penalty_factor = win_reward_factor * rr
-            pnl_target_factor = 1.0 + loss_penalty_factor * base_pnl_target_factor
-
-    efficiency_factor = 1.0
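+    Returns 1.0 when -max(1, 1/rr) <= pnl_ratio <= 1.0, and a coefficient >= 1.0 otherwise
+    (for non-negative win_reward_factor and pnl_factor_beta). Since the exit reward is
+    pnl * exit_factor, it amplifies both the reward of trades that overshoot the profit
+    target and the penalty of losses beyond the risk/reward threshold.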
+
+    Args:
+        params: Reward configuration parameters
+        pnl: Realized profit/loss
+        profit_target: Target profit threshold
+        risk_reward_ratio: Risk/reward ratio for loss penalty calculation
+
+    Returns:
+        float: Coefficient ≥ 1.0 for non-negative win_reward_factor and pnl_factor_beta (1.0 if profit_target ≤ 0)
+    """
+    pnl_target_coefficient = 1.0
+
+    if profit_target > 0.0:
+        win_reward_factor = _get_float_param(
+            params,
+            "win_reward_factor",
+            DEFAULT_MODEL_REWARD_PARAMETERS.get("win_reward_factor", 2.0),
+        )
+        pnl_factor_beta = _get_float_param(
+            params,
+            "pnl_factor_beta",
+            DEFAULT_MODEL_REWARD_PARAMETERS.get("pnl_factor_beta", 0.5),
+        )
+        rr = risk_reward_ratio if risk_reward_ratio > 0 else 1.0
+
+        pnl_ratio = pnl / profit_target
+        if abs(pnl_ratio) > 1.0:
+            base_pnl_target_coefficient = math.tanh(pnl_factor_beta * (abs(pnl_ratio) - 1.0))
+            if pnl_ratio > 1.0:
+                pnl_target_coefficient = 1.0 + win_reward_factor * base_pnl_target_coefficient
+            elif pnl_ratio < -(1.0 / rr):
+                loss_penalty_factor = win_reward_factor * rr
+                pnl_target_coefficient = 1.0 + loss_penalty_factor * base_pnl_target_coefficient
+
+    return pnl_target_coefficient
+
+
+def _compute_efficiency_coefficient(
+    params: RewardParams,
+    context: RewardContext,
+    pnl: float,
+) -> float:
+    """
+    Compute exit efficiency coefficient based on PnL position relative to unrealized extremes.
+
+    Returns a coefficient (typically 0.5-1.5 with default weight and center) that rewards
+    exits closer to optimal timing. For profitable trades, the coefficient is higher when
+    exiting near max_unrealized_profit. For losing trades, it is higher when exiting near
+    min_unrealized_profit (the deepest drawdown), which, since pnl < 0, enlarges the penalty.
+
+    Args:
+        params: Reward configuration parameters containing:
+            - efficiency_weight: Amplification factor for efficiency adjustment
+            - efficiency_center: Target efficiency ratio (0.0-1.0)
+        context: Trade context with unrealized profit/loss extremes
+        pnl: Realized profit/loss
+
+    Returns:
+        float: Unclamped coefficient (typically 0.5-1.5); _get_pnl_coefficient clamps the product at 0.0
+    """
+    efficiency_coefficient = 1.0
      efficiency_weight = _get_float_param(
          params,
          "efficiency_weight",
@@ -876,11 +943,51 @@ def _get_pnl_factor(
          if np.isfinite(range_pnl) and not np.isclose(range_pnl, 0.0):
              efficiency_ratio = (pnl - min_pnl) / range_pnl
              if pnl > 0.0:
-                efficiency_factor = 1.0 + efficiency_weight * (efficiency_ratio - efficiency_center)
+                efficiency_coefficient = 1.0 + efficiency_weight * (
+                    efficiency_ratio - efficiency_center
+                )
              elif pnl < 0.0:
-                efficiency_factor = 1.0 + efficiency_weight * (efficiency_center - efficiency_ratio)
+                efficiency_coefficient = 1.0 + efficiency_weight * (
+                    efficiency_center - efficiency_ratio
+                )
+
+    return efficiency_coefficient
  
-    return max(0.0, pnl_target_factor * efficiency_factor)
+
+def _get_pnl_coefficient(
+    params: RewardParams,
+    context: RewardContext,
+    profit_target: float,
+    risk_reward_ratio: float,
+) -> float:
+    """
+    Compute combined PnL coefficient from target and efficiency components.
+
+    Multiplies the PnL target coefficient (based on profit/target ratio) with
+    the efficiency coefficient (based on exit timing quality) to produce a
+    single composite coefficient applied to the base reward factor.
+
+    Args:
+        params: Reward configuration parameters
+        context: Trade context with PnL and unrealized extremes
+        profit_target: Target profit threshold
+        risk_reward_ratio: Risk/reward ratio for loss penalty calculation
+
+    Returns:
+        float: Composite coefficient clamped to ≥ 0.0 (0.0 when profit_target ≤ 0)
+    """
+    pnl = context.pnl
+    if not np.isfinite(pnl) or not np.isfinite(profit_target) or not np.isfinite(risk_reward_ratio):
+        return _fail_safely("non_finite_inputs_pnl_coefficient")
+    if profit_target <= 0.0:
+        return 0.0
+
+    pnl_target_coefficient = _compute_pnl_target_coefficient(
+        params, pnl, profit_target, risk_reward_ratio
+    )
+    efficiency_coefficient = _compute_efficiency_coefficient(params, context, pnl)
+
+    return max(0.0, pnl_target_coefficient * efficiency_coefficient)
  
  
  def _is_valid_action(
@@ -946,7 +1053,7 @@ def _hold_penalty(context: RewardContext, hold_factor: float, params: RewardPara
  
  def _compute_exit_reward(
      base_factor: float,
-    pnl_factor: float,
+    pnl_coefficient: float,
      context: RewardContext,
      params: RewardParams,
  ) -> float:
@@ -957,7 +1064,9 @@ def _compute_exit_reward(
          DEFAULT_MODEL_REWARD_PARAMETERS.get("max_trade_duration_candles", 128),
      )
      duration_ratio = _compute_duration_ratio(context.trade_duration, max_trade_duration_candles)
-    exit_factor = _get_exit_factor(base_factor, context.pnl, pnl_factor, duration_ratio, params)
+    exit_factor = _get_exit_factor(
+        base_factor, context.pnl, pnl_coefficient, duration_ratio, params
+    )
      return context.pnl * exit_factor
  
  
@@ -999,7 +1108,7 @@ def calculate_reward(
      pnl_target = float(profit_target * risk_reward_ratio)
  
      idle_factor = factor * pnl_target / 4.0
-    pnl_factor = _get_pnl_factor(
+    pnl_coefficient = _get_pnl_coefficient(
          params,
          context,
          pnl_target,
@@ -1019,10 +1128,10 @@ def calculate_reward(
          base_reward = _hold_penalty(context, hold_factor, params)
          breakdown.hold_penalty = base_reward
      elif context.action == Actions.Long_exit and context.position == Positions.Long:
-        base_reward = _compute_exit_reward(factor, pnl_factor, context, params)
+        base_reward = _compute_exit_reward(factor, pnl_coefficient, context, params)
          breakdown.exit_component = base_reward
      elif context.action == Actions.Short_exit and context.position == Positions.Short:
-        base_reward = _compute_exit_reward(factor, pnl_factor, context, params)
+        base_reward = _compute_exit_reward(factor, pnl_coefficient, context, params)
          breakdown.exit_component = base_reward
      else:
          base_reward = 0.0
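The refactored `_compute_time_attenuation_coefficient` and `_get_exit_factor` above can be mirrored by the standalone sketch below. It assumes the README defaults (exit_linear_slope=1.0, exit_power_tau=0.5, exit_half_life=0.5), takes an already plateau-adjusted duration ratio, and omits parameter parsing, warnings and the exit_factor_threshold invariant check; the function names are hypothetical.

```python
import math


def time_attenuation_coefficient(duration_ratio, mode="linear", exit_linear_slope=1.0,
                                 exit_power_tau=0.5, exit_half_life=0.5):
    """Kernel value for a plateau-adjusted duration ratio (plateau handling omitted)."""
    r = max(duration_ratio, 0.0)  # negative ratios are clamped to 0, as in the module
    if mode == "legacy":
        return 1.5 if r <= 1.0 else 0.5  # non-monotonic step
    if mode == "sqrt":
        return 1.0 / math.sqrt(1.0 + r)
    if mode == "power":
        # alpha = -ln(tau)/ln(2); validity range (0, 1] assumed from the README table,
        # with the module's alpha = 1.0 fallback outside it.
        alpha = -math.log(exit_power_tau) / math.log(2.0) if 0.0 < exit_power_tau <= 1.0 else 1.0
        return 1.0 / math.pow(1.0 + r, alpha)
    if mode == "half_life":
        if exit_half_life <= 0.0:
            return 1.0  # the module returns 1.0 for an invalid half-life (assumed: <= 0)
        return math.pow(2.0, -r / exit_half_life)
    # "linear", and in this sketch any unknown mode as well.
    return 1.0 / (1.0 + exit_linear_slope * r)


def exit_reward(base_factor, pnl, pnl_coefficient, duration_ratio, mode="linear"):
    """exit_factor = base_factor * time coefficient * pnl coefficient; reward = pnl * exit_factor."""
    exit_factor = base_factor * time_attenuation_coefficient(duration_ratio, mode) * pnl_coefficient
    return pnl * exit_factor


for mode in ("legacy", "sqrt", "linear", "power", "half_life"):
    print(mode, round(time_attenuation_coefficient(1.0, mode), 4))
```

Because every kernel returns a non-negative value and pnl_coefficient is clamped at 0, the sign of the exit reward in this sketch comes from pnl alone for a positive base_factor.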
diff --git a/ReforceXY/reward_space_analysis/tests/components/test_reward_components.py b/ReforceXY/reward_space_analysis/tests/components/test_reward_components.py

index ed0511aedceba042afd5009087819fc71114e761..373cd0133d14019e7a44ac0eae6d2367a8a1c33f 100644
--- a/ReforceXY/reward_space_analysis/tests/components/test_reward_components.py
+++ b/ReforceXY/reward_space_analysis/tests/components/test_reward_components.py
@@ -12,7 +12,7 @@ from reward_space_analysis import (
      _compute_hold_potential,
      _get_exit_factor,
      _get_float_param,
-    _get_pnl_factor,
+    _get_pnl_coefficient,
      calculate_reward,
  )
  
@@ -194,11 +194,11 @@ class TestRewardComponents(RewardSpaceTestBase):
          )
  
      def test_efficiency_zero_policy(self):
-        """Test efficiency zero policy produces expected PnL factor.
+        """Test efficiency zero policy produces expected PnL coefficient.
  
          Verifies:
-        - efficiency_weight = 0 → pnl_factor ≈ 1.0
-        - Factor is finite and positive
+        - efficiency_weight = 0 → pnl_coefficient ≈ 1.0
+        - Coefficient is finite and positive
          """
          ctx = self.make_ctx(
              pnl=0.0,
@@ -210,9 +210,9 @@ class TestRewardComponents(RewardSpaceTestBase):
          )
          params = self.base_params()
          profit_target = self.TEST_PROFIT_TARGET * self.TEST_RR
-        pnl_factor = _get_pnl_factor(params, ctx, profit_target, self.TEST_RR)
-        self.assertFinite(pnl_factor, name="pnl_factor")
-        self.assertAlmostEqualFloat(pnl_factor, 1.0, tolerance=self.TOL_GENERIC_EQ)
+        pnl_coefficient = _get_pnl_coefficient(params, ctx, profit_target, self.TEST_RR)
+        self.assertFinite(pnl_coefficient, name="pnl_coefficient")
+        self.assertAlmostEqualFloat(pnl_coefficient, 1.0, tolerance=self.TOL_GENERIC_EQ)
  
      def test_max_idle_duration_candles_logic(self):
          """Test max idle duration candles parameter affects penalty magnitude.
@@ -267,7 +267,11 @@ class TestRewardComponents(RewardSpaceTestBase):
          for mode in modes_to_test:
              test_params = self.base_params(exit_attenuation_mode=mode)
              factor = _get_exit_factor(
-                base_factor=1.0, pnl=0.02, pnl_factor=1.5, duration_ratio=0.3, params=test_params
+                base_factor=1.0,
+                pnl=0.02,
+                pnl_coefficient=1.5,
+                duration_ratio=0.3,
+                params=test_params,
              )
              self.assertFinite(factor, name=f"exit_factor[{mode}]")
              self.assertGreater(factor, 0, f"Exit factor for {mode} should be positive")
@@ -282,7 +286,7 @@ class TestRewardComponents(RewardSpaceTestBase):
              _get_exit_factor,
              base_factor=1.0,
              pnl=0.02,
-            pnl_factor=1.5,
+            pnl_coefficient=1.5,
              plateau_params=plateau_params,
              grace=0.5,
              tolerance_strict=self.TOL_IDENTITY_STRICT,
@@ -508,6 +512,48 @@ class TestRewardComponents(RewardSpaceTestBase):
              msg="invariance_correction should be ~0 in canonical mode",
          )
  
+    def test_efficiency_center_extremes(self):
+        """Lower efficiency_center yields a larger pnl_coefficient for a profitable exit when pnl_target_coefficient=1."""
+        context = self.make_ctx(
+            pnl=0.05,
+            trade_duration=10,
+            idle_duration=0,
+            max_unrealized_profit=0.10,
+            min_unrealized_profit=0.00,
+            position=Positions.Long,
+            action=Actions.Long_exit,
+        )
+        profit_target = 0.20
+        base_params = self.base_params(efficiency_weight=2.0)
+        params_center0 = dict(base_params, efficiency_center=0.0)
+        params_center1 = dict(base_params, efficiency_center=1.0)
+        coef_c0 = _get_pnl_coefficient(params_center0, context, profit_target, self.TEST_RR)
+        coef_c1 = _get_pnl_coefficient(params_center1, context, profit_target, self.TEST_RR)
+        self.assertFinite(coef_c0, name="coef_center0")
+        self.assertFinite(coef_c1, name="coef_center1")
+        self.assertGreater(coef_c0, coef_c1)
+
+    def test_efficiency_weight_zero_vs_two(self):
+        """Efficiency weight 0 yields ~1; weight 2 amplifies pnl_coefficient when center < ratio."""
+        context = self.make_ctx(
+            pnl=0.05,
+            trade_duration=10,
+            idle_duration=0,
+            max_unrealized_profit=0.10,
+            min_unrealized_profit=0.00,
+            position=Positions.Long,
+            action=Actions.Long_exit,
+        )
+        profit_target = 0.20
+        params_w0 = self.base_params(efficiency_weight=0.0, efficiency_center=0.2)
+        params_w2 = self.base_params(efficiency_weight=2.0, efficiency_center=0.2)
+        c0 = _get_pnl_coefficient(params_w0, context, profit_target, self.TEST_RR)
+        c2 = _get_pnl_coefficient(params_w2, context, profit_target, self.TEST_RR)
+        self.assertFinite(c0, name="coef_w0")
+        self.assertFinite(c2, name="coef_w2")
+        self.assertAlmostEqualFloat(c0, 1.0, tolerance=self.TOL_GENERIC_EQ)
+        self.assertGreater(c2, c0)
+
  
  if __name__ == "__main__":
      unittest.main()
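The expectations in the two new efficiency tests can be checked by hand. With pnl=0.05 and profit_target=0.20, pnl / profit_target = 0.25 lies in the neutral band, so pnl_target_coefficient = 1 whatever TEST_RR is, and the efficiency ratio is (0.05 - 0.00) / (0.10 - 0.00) = 0.5. The arithmetic below replays the README efficiency formula for a profitable exit; it is a standalone sketch, not a call into the test suite.

```python
# Inputs shared by test_efficiency_center_extremes and test_efficiency_weight_zero_vs_two.
pnl, max_u, min_u, profit_target = 0.05, 0.10, 0.00, 0.20


def efficiency_coefficient(weight, center):
    """README efficiency formula for a profitable exit (pnl > 0)."""
    ratio = (pnl - min_u) / (max_u - min_u)  # 0.5 for these inputs
    return 1.0 + weight * (ratio - center)


# pnl / profit_target = 0.25 < 1, so pnl_target_coefficient = 1 and
# pnl_coefficient reduces to max(0, efficiency_coefficient).
coef_c0 = max(0.0, efficiency_coefficient(2.0, 0.0))  # center 0.0, weight 2.0
coef_c1 = max(0.0, efficiency_coefficient(2.0, 1.0))  # center 1.0, weight 2.0
c0 = max(0.0, efficiency_coefficient(0.0, 0.2))       # weight 0.0
c2 = max(0.0, efficiency_coefficient(2.0, 0.2))       # weight 2.0
print(coef_c0, coef_c1, c0, c2)
```

Note that coef_c1 = 1 + 2 × (0.5 - 1.0) lands exactly on the max(0, ...) clamp, so the assertGreater in test_efficiency_center_extremes compares 2.0 against 0.0.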
diff --git a/ReforceXY/reward_space_analysis/tests/helpers/assertions.py b/ReforceXY/reward_space_analysis/tests/helpers/assertions.py

index 40e20d0a54fcc54a566f4572a3c980eeb63066b0..30ee7914645224adf9496a0f19f3a564bfdfde8f 100644
--- a/ReforceXY/reward_space_analysis/tests/helpers/assertions.py
+++ b/ReforceXY/reward_space_analysis/tests/helpers/assertions.py
@@ -10,7 +10,7 @@ import numpy as np
  
  from reward_space_analysis import (
      _get_exit_factor,
-    _get_pnl_factor,
+    _get_pnl_coefficient,
      calculate_reward,
  )
  
@@ -518,7 +518,7 @@ def assert_exit_factor_attenuation_modes(
      test_case,
      base_factor: float,
      pnl: float,
-    pnl_factor: float,
+    pnl_coefficient: float,
      attenuation_modes: Sequence[str],
      base_params_fn,
      tolerance_relaxed: float,
@@ -532,7 +532,7 @@ def assert_exit_factor_attenuation_modes(
          test_case: Test case instance with assertion methods
          base_factor: Base scaling factor
          pnl: Profit/loss value
-        pnl_factor: PnL amplification factor
+        pnl_coefficient: PnL amplification coefficient
          attenuation_modes: List of mode names to test
          base_params_fn: Factory function for creating parameter dicts
          tolerance_relaxed: Numerical tolerance for monotonicity checks
@@ -572,7 +572,7 @@ def assert_exit_factor_attenuation_modes(
                  mode_params = base_params_fn(exit_attenuation_mode="sqrt")
              ratios = np.linspace(0, 2, 15)
              values = [
-                _get_exit_factor(base_factor, pnl, pnl_factor, r, mode_params) for r in ratios
+                _get_exit_factor(base_factor, pnl, pnl_coefficient, r, mode_params) for r in ratios
              ]
              if mode == "plateau_linear":
                  grace = float(mode_params["exit_plateau_grace"])
@@ -649,12 +649,12 @@ def assert_exit_mode_mathematical_validation(
          short_allowed=True,
          action_masking=True,
      )
-    pnl_factor_hl = _get_pnl_factor(params, context, profit_target, risk_reward_ratio)
+    pnl_coefficient_hl = _get_pnl_coefficient(params, context, profit_target, risk_reward_ratio)
      observed_exit_factor = _get_exit_factor(
-        base_factor, context.pnl, pnl_factor_hl, duration_ratio, params
+        base_factor, context.pnl, pnl_coefficient_hl, duration_ratio, params
      )
      observed_half_life_factor = observed_exit_factor / (
-        base_factor * max(pnl_factor_hl, np.finfo(float).eps)
+        base_factor * max(pnl_coefficient_hl, np.finfo(float).eps)
      )
      expected_half_life_factor = 2 ** (-duration_ratio / params["exit_half_life"])
      test_case.assertAlmostEqual(
@@ -1008,7 +1008,7 @@ def assert_exit_factor_invariant_suite(
          suite_cases: List of scenario dicts with keys:
              - base_factor: Base scaling factor
              - pnl: Profit/loss value
-            - pnl_factor: PnL amplification factor
+            - pnl_coefficient: PnL amplification coefficient
              - duration_ratio: Duration ratio (0-2)
              - params: Parameter dictionary
              - expectation: Expected invariant ("non_negative", "safe_zero", "clamped")
@@ -1018,12 +1018,12 @@ def assert_exit_factor_invariant_suite(
      Example:
          cases = [
              {
-                "base_factor": 90.0, "pnl": 0.08, "pnl_factor": 1.5,
+                "base_factor": 90.0, "pnl": 0.08, "pnl_coefficient": 1.5,
                  "duration_ratio": 0.5, "params": {...},
                  "expectation": "non_negative", "tolerance": 1e-09
              },
              {
-                "base_factor": 90.0, "pnl": 0.0, "pnl_factor": 0.0,
+                "base_factor": 90.0, "pnl": 0.0, "pnl_coefficient": 0.0,
                  "duration_ratio": 0.5, "params": {...},
                  "expectation": "safe_zero"
              },
@@ -1035,7 +1035,7 @@ def assert_exit_factor_invariant_suite(
              f_val = exit_factor_fn(
                  case["base_factor"],
                  case["pnl"],
-                case["pnl_factor"],
+                case["pnl_coefficient"],
                  case["duration_ratio"],
                  case["params"],
              )
@@ -1055,7 +1055,7 @@ def assert_exit_factor_kernel_fallback(
      exit_factor_fn,
      base_factor: float,
      pnl: float,
-    pnl_factor: float,
+    pnl_coefficient: float,
      duration_ratio: float,
      bad_params: Dict[str, Any],
      reference_params: Dict[str, Any],
@@ -1071,7 +1071,7 @@ def assert_exit_factor_kernel_fallback(
          exit_factor_fn: Exit factor calculation function
          base_factor: Base scaling factor
          pnl: Profit/loss value
-        pnl_factor: PnL amplification factor
+        pnl_coefficient: PnL amplification coefficient
          duration_ratio: Duration ratio
          bad_params: Parameters that trigger kernel failure
          reference_params: Reference linear mode parameters for comparison
@@ -1092,8 +1092,8 @@ def assert_exit_factor_kernel_fallback(
          )
      """
  
-    f_bad = exit_factor_fn(base_factor, pnl, pnl_factor, duration_ratio, bad_params)
-    f_ref = exit_factor_fn(base_factor, pnl, pnl_factor, duration_ratio, reference_params)
+    f_bad = exit_factor_fn(base_factor, pnl, pnl_coefficient, duration_ratio, bad_params)
+    f_ref = exit_factor_fn(base_factor, pnl, pnl_coefficient, duration_ratio, reference_params)
      test_case.assertAlmostEqual(f_bad, f_ref, delta=TOLERANCE.IDENTITY_STRICT)
      test_case.assertGreaterEqual(f_bad, 0.0)
  
@@ -1212,7 +1212,7 @@ def assert_exit_factor_plateau_behavior(
      exit_factor_fn,
      base_factor: float,
      pnl: float,
-    pnl_factor: float,
+    pnl_coefficient: float,
      plateau_params: dict,
      grace: float,
      tolerance_strict: float,
@@ -1224,7 +1224,7 @@ def assert_exit_factor_plateau_behavior(
          exit_factor_fn: Exit factor calculation function (_get_exit_factor)
          base_factor: Base factor for exit calculation
          pnl: PnL value
-        pnl_factor: PnL factor multiplier
+        pnl_coefficient: PnL coefficient multiplier
          plateau_params: Parameters dict with plateau configuration
          grace: Grace period threshold (exit_plateau_grace value)
          tolerance_strict: Tolerance for numerical comparisons
@@ -1236,14 +1236,14 @@ def assert_exit_factor_plateau_behavior(
      plateau_factor_pre = exit_factor_fn(
          base_factor=base_factor,
          pnl=pnl,
-        pnl_factor=pnl_factor,
+        pnl_coefficient=pnl_coefficient,
          duration_ratio=duration_ratio_pre,
          params=plateau_params,
      )
      plateau_factor_post = exit_factor_fn(
          base_factor=base_factor,
          pnl=pnl,
-        pnl_factor=pnl_factor,
+        pnl_coefficient=pnl_coefficient,
          duration_ratio=duration_ratio_post,
          params=plateau_params,
      )
diff --git a/ReforceXY/reward_space_analysis/tests/helpers/configs.py b/ReforceXY/reward_space_analysis/tests/helpers/configs.py
index 36a1cb856039fc9f447faaaf0a2f3d72627df29c..e379c18422765cd0073c33ba0d0288622f7e12cd 100644
--- a/ReforceXY/reward_space_analysis/tests/helpers/configs.py
+++ b/ReforceXY/reward_space_analysis/tests/helpers/configs.py
@@ -119,7 +119,7 @@ class ExitFactorConfig:
      Attributes:
          base_factor: Base scaling factor
          pnl: Profit/loss value
-        pnl_factor: PnL amplification factor
+        pnl_coefficient: PnL amplification coefficient
          duration_ratio: Ratio of current to maximum duration
          attenuation_mode: Mode of attenuation ("linear", "power", etc.)
          plateau_enabled: Whether plateau behavior is active
@@ -129,7 +129,7 @@ class ExitFactorConfig:
  
      base_factor: float
      pnl: float
-    pnl_factor: float
+    pnl_coefficient: float
      duration_ratio: float
      attenuation_mode: str
      plateau_enabled: bool = False
diff --git a/ReforceXY/reward_space_analysis/tests/robustness/test_branch_coverage.py b/ReforceXY/reward_space_analysis/tests/robustness/test_branch_coverage.py
index fc062af2681528f0a0a583f3f218dab99e8f6485..7ef6b2eae69f4887953bc5f6b404c2d5ed444136 100644
--- a/ReforceXY/reward_space_analysis/tests/robustness/test_branch_coverage.py
+++ b/ReforceXY/reward_space_analysis/tests/robustness/test_branch_coverage.py
@@ -65,7 +65,7 @@ def test_get_exit_factor_negative_plateau_grace_warning():
          factor = _get_exit_factor(
              base_factor=10.0,
              pnl=0.01,
-            pnl_factor=1.0,
+            pnl_coefficient=1.0,
              duration_ratio=0.5,
              params=params,
          )
@@ -79,7 +79,7 @@ def test_get_exit_factor_negative_linear_slope_warning():
          factor = _get_exit_factor(
              base_factor=10.0,
              pnl=0.01,
-            pnl_factor=1.0,
+            pnl_coefficient=1.0,
              duration_ratio=2.0,
              params=params,
          )
@@ -93,7 +93,7 @@ def test_get_exit_factor_invalid_power_tau_relaxed():
          factor = _get_exit_factor(
              base_factor=5.0,
              pnl=0.02,
-            pnl_factor=1.0,
+            pnl_coefficient=1.0,
              duration_ratio=1.5,
              params=params,
          )
@@ -111,7 +111,7 @@ def test_get_exit_factor_half_life_near_zero_relaxed():
          factor = _get_exit_factor(
              base_factor=5.0,
              pnl=0.02,
-            pnl_factor=1.0,
+            pnl_coefficient=1.0,
              duration_ratio=2.0,
              params=params,
          )
@@ -141,7 +141,7 @@ def test_exit_factor_invariant_suite_grouped():
          {
              "base_factor": 15.0,
              "pnl": 0.02,
-            "pnl_factor": 1.0,
+            "pnl_coefficient": 1.0,
              "duration_ratio": -5.0,
              "params": {
                  "exit_attenuation_mode": "linear",
@@ -153,7 +153,7 @@ def test_exit_factor_invariant_suite_grouped():
          {
              "base_factor": 15.0,
              "pnl": 0.02,
-            "pnl_factor": 1.0,
+            "pnl_coefficient": 1.0,
              "duration_ratio": 0.0,
              "params": {
                  "exit_attenuation_mode": "linear",
@@ -165,7 +165,7 @@ def test_exit_factor_invariant_suite_grouped():
          {
              "base_factor": float("nan"),
              "pnl": 0.01,
-            "pnl_factor": 1.0,
+            "pnl_coefficient": 1.0,
              "duration_ratio": 0.2,
              "params": {"exit_attenuation_mode": "linear", "exit_linear_slope": 0.5},
              "expectation": "safe_zero",
@@ -173,7 +173,7 @@ def test_exit_factor_invariant_suite_grouped():
          {
              "base_factor": 10.0,
              "pnl": float("nan"),
-            "pnl_factor": 1.0,
+            "pnl_coefficient": 1.0,
              "duration_ratio": 0.2,
              "params": {"exit_attenuation_mode": "linear", "exit_linear_slope": 0.5},
              "expectation": "safe_zero",
@@ -181,7 +181,7 @@ def test_exit_factor_invariant_suite_grouped():
          {
              "base_factor": 10.0,
              "pnl": 0.01,
-            "pnl_factor": 1.0,
+            "pnl_coefficient": 1.0,
              "duration_ratio": float("nan"),
              "params": {"exit_attenuation_mode": "linear", "exit_linear_slope": 0.5},
              "expectation": "safe_zero",
@@ -189,7 +189,7 @@ def test_exit_factor_invariant_suite_grouped():
          {
              "base_factor": 10.0,
              "pnl": 0.02,
-            "pnl_factor": float("inf"),
+            "pnl_coefficient": float("inf"),
              "duration_ratio": 0.5,
              "params": {
                  "exit_attenuation_mode": "linear",
@@ -201,7 +201,7 @@ def test_exit_factor_invariant_suite_grouped():
          {
              "base_factor": 10.0,
              "pnl": 0.015,
-            "pnl_factor": -2.5,
+            "pnl_coefficient": -2.5,
              "duration_ratio": 2.0,
              "params": {
                  "exit_attenuation_mode": "legacy",
diff --git a/ReforceXY/reward_space_analysis/tests/robustness/test_robustness.py b/ReforceXY/reward_space_analysis/tests/robustness/test_robustness.py
index e6176a3a2aea10741776ab43f9fbb1db657a9085..496b908fd96bd1d978906dbd06794dba849d933e 100644
--- a/ReforceXY/reward_space_analysis/tests/robustness/test_robustness.py
+++ b/ReforceXY/reward_space_analysis/tests/robustness/test_robustness.py
@@ -189,7 +189,7 @@ class TestRewardRobustnessAndBoundaries(RewardSpaceTestBase):
              self,
              base_factor=self.TEST_BASE_FACTOR,
              pnl=0.05,
-            pnl_factor=1.0,
+            pnl_coefficient=1.0,
              attenuation_modes=modes,
              base_params_fn=self.base_params,
              tolerance_relaxed=self.TOL_IDENTITY_RELAXED,
@@ -249,7 +249,7 @@ class TestRewardRobustnessAndBoundaries(RewardSpaceTestBase):
          """Negative exit_linear_slope is sanitized to 1.0; resulting exit factors must match slope=1.0 within tolerance."""
          base_factor = 100.0
          pnl = 0.03
-        pnl_factor = 1.0
+        pnl_coefficient = 1.0
          duration_ratios = [0.0, 0.2, 0.5, 1.0, 1.5]
          params_bad = self.base_params(
              exit_attenuation_mode="linear", exit_linear_slope=-5.0, exit_plateau=False
@@ -258,8 +258,8 @@ class TestRewardRobustnessAndBoundaries(RewardSpaceTestBase):
              exit_attenuation_mode="linear", exit_linear_slope=1.0, exit_plateau=False
          )
          for dr in duration_ratios:
-            f_bad = _get_exit_factor(base_factor, pnl, pnl_factor, dr, params_bad)
-            f_ref = _get_exit_factor(base_factor, pnl, pnl_factor, dr, params_ref)
+            f_bad = _get_exit_factor(base_factor, pnl, pnl_coefficient, dr, params_bad)
+            f_ref = _get_exit_factor(base_factor, pnl, pnl_coefficient, dr, params_ref)
              self.assertAlmostEqualFloat(
                  f_bad,
                  f_ref,
@@ -271,15 +271,15 @@ class TestRewardRobustnessAndBoundaries(RewardSpaceTestBase):
          """Power mode attenuation: ratio f(dr=1)/f(dr=0) must equal 1/(1+1)^alpha with alpha=-log(tau)/log(2)."""
          base_factor = 200.0
          pnl = 0.04
-        pnl_factor = 1.0
+        pnl_coefficient = 1.0
          duration_ratio = 1.0
          taus = [0.9, 0.5, 0.25, 1.0]
          for tau in taus:
              params = self.base_params(
                  exit_attenuation_mode="power", exit_power_tau=tau, exit_plateau=False
              )
-            f0 = _get_exit_factor(base_factor, pnl, pnl_factor, 0.0, params)
-            f1 = _get_exit_factor(base_factor, pnl, pnl_factor, duration_ratio, params)
+            f0 = _get_exit_factor(base_factor, pnl, pnl_coefficient, 0.0, params)
+            f1 = _get_exit_factor(base_factor, pnl, pnl_coefficient, duration_ratio, params)
              if 0.0 < tau <= 1.0:
                  alpha = -math.log(tau) / math.log(2.0)
              else:
@@ -347,14 +347,14 @@ class TestRewardRobustnessAndBoundaries(RewardSpaceTestBase):
          """Test parameter edge cases: tau extrema, plateau grace edges, slope zero."""
          base_factor = 50.0
          pnl = 0.02
-        pnl_factor = 1.0
+        pnl_coefficient = 1.0
          params_hi = self.base_params(exit_attenuation_mode="power", exit_power_tau=0.999999)
          params_lo = self.base_params(
              exit_attenuation_mode="power", exit_power_tau=self.MIN_EXIT_POWER_TAU
          )
          r = 1.5
-        hi_val = _get_exit_factor(base_factor, pnl, pnl_factor, r, params_hi)
-        lo_val = _get_exit_factor(base_factor, pnl, pnl_factor, r, params_lo)
+        hi_val = _get_exit_factor(base_factor, pnl, pnl_coefficient, r, params_hi)
+        lo_val = _get_exit_factor(base_factor, pnl, pnl_coefficient, r, params_lo)
          self.assertGreater(
              hi_val, lo_val, "Power mode: higher tau (≈1) should attenuate less than tiny tau"
          )
@@ -370,8 +370,8 @@ class TestRewardRobustnessAndBoundaries(RewardSpaceTestBase):
              exit_plateau_grace=1.0,
              exit_linear_slope=1.0,
          )
-        val_g0 = _get_exit_factor(base_factor, pnl, pnl_factor, 0.5, params_g0)
-        val_g1 = _get_exit_factor(base_factor, pnl, pnl_factor, 0.5, params_g1)
+        val_g0 = _get_exit_factor(base_factor, pnl, pnl_coefficient, 0.5, params_g0)
+        val_g1 = _get_exit_factor(base_factor, pnl, pnl_coefficient, 0.5, params_g1)
          self.assertGreater(
              val_g1, val_g0, "Plateau grace=1.0 should delay attenuation vs grace=0.0"
          )
@@ -381,8 +381,8 @@ class TestRewardRobustnessAndBoundaries(RewardSpaceTestBase):
          params_lin1 = self.base_params(
              exit_attenuation_mode="linear", exit_linear_slope=2.0, exit_plateau=False
          )
-        val_lin0 = _get_exit_factor(base_factor, pnl, pnl_factor, 1.0, params_lin0)
-        val_lin1 = _get_exit_factor(base_factor, pnl, pnl_factor, 1.0, params_lin1)
+        val_lin0 = _get_exit_factor(base_factor, pnl, pnl_coefficient, 1.0, params_lin0)
+        val_lin1 = _get_exit_factor(base_factor, pnl, pnl_coefficient, 1.0, params_lin1)
          self.assertGreater(
              val_lin0, val_lin1, "Linear slope=0 should yield no attenuation vs slope>0"
          )
@@ -397,9 +397,9 @@ class TestRewardRobustnessAndBoundaries(RewardSpaceTestBase):
          )
          base_factor = self.TEST_BASE_FACTOR
          pnl = 0.04
-        pnl_factor = 1.2
+        pnl_coefficient = 1.2
          ratios = [0.3, 0.6, 1.0, 1.4]
-        values = [_get_exit_factor(base_factor, pnl, pnl_factor, r, params) for r in ratios]
+        values = [_get_exit_factor(base_factor, pnl, pnl_coefficient, r, params) for r in ratios]
          first = values[0]
          for v in values[1:]:
              self.assertAlmostEqualFloat(
@@ -422,9 +422,9 @@ class TestRewardRobustnessAndBoundaries(RewardSpaceTestBase):
          )
          base_factor = 80.0
          pnl = self.TEST_PROFIT_TARGET
-        pnl_factor = 1.1
+        pnl_coefficient = 1.1
          ratios = [0.8, 1.0, 1.2, 1.4, 1.6]
-        vals = [_get_exit_factor(base_factor, pnl, pnl_factor, r, params) for r in ratios]
+        vals = [_get_exit_factor(base_factor, pnl, pnl_coefficient, r, params) for r in ratios]
          ref = vals[0]
          for i, r in enumerate(ratios[:-1]):
              self.assertAlmostEqualFloat(
@@ -442,7 +442,7 @@ class TestRewardRobustnessAndBoundaries(RewardSpaceTestBase):
          eps = self.CONTINUITY_EPS_SMALL
          base_factor = self.TEST_BASE_FACTOR
          pnl = 0.01
-        pnl_factor = 1.0
+        pnl_coefficient = 1.0
          tau = 0.5
          half_life = 0.5
          slope = 1.3
@@ -459,9 +459,9 @@ class TestRewardRobustnessAndBoundaries(RewardSpaceTestBase):
                          "exit_half_life": half_life,
                      }
                  )
-                left = _get_exit_factor(base_factor, pnl, pnl_factor, grace - eps, params)
-                boundary = _get_exit_factor(base_factor, pnl, pnl_factor, grace, params)
-                right = _get_exit_factor(base_factor, pnl, pnl_factor, grace + eps, params)
+                left = _get_exit_factor(base_factor, pnl, pnl_coefficient, grace - eps, params)
+                boundary = _get_exit_factor(base_factor, pnl, pnl_coefficient, grace, params)
+                right = _get_exit_factor(base_factor, pnl, pnl_coefficient, grace + eps, params)
                  self.assertAlmostEqualFloat(
                      left,
                      boundary,
@@ -532,12 +532,14 @@ class TestRewardRobustnessAndBoundaries(RewardSpaceTestBase):
          )
          base_factor = 75.0
          pnl = 0.05
-        pnl_factor = 1.0
+        pnl_coefficient = 1.0
          duration_ratio = 0.8
          with assert_diagnostic_warning(["Unknown exit_attenuation_mode"]):
-            f_unknown = _get_exit_factor(base_factor, pnl, pnl_factor, duration_ratio, params)
+            f_unknown = _get_exit_factor(base_factor, pnl, pnl_coefficient, duration_ratio, params)
          linear_params = self.base_params(exit_attenuation_mode="linear", exit_plateau=False)
-        f_linear = _get_exit_factor(base_factor, pnl, pnl_factor, duration_ratio, linear_params)
+        f_linear = _get_exit_factor(
+            base_factor, pnl, pnl_coefficient, duration_ratio, linear_params
+        )
          self.assertAlmostEqualFloat(
              f_unknown,
              f_linear,
@@ -556,10 +558,10 @@ class TestRewardRobustnessAndBoundaries(RewardSpaceTestBase):
          )
          base_factor = PARAMS.BASE_FACTOR
          pnl = 0.03
-        pnl_factor = 1.0
+        pnl_coefficient = 1.0
          duration_ratio = 0.5
          with assert_diagnostic_warning(["exit_plateau_grace < 0"]):
-            f_neg = _get_exit_factor(base_factor, pnl, pnl_factor, duration_ratio, params)
+            f_neg = _get_exit_factor(base_factor, pnl, pnl_coefficient, duration_ratio, params)
          # Reference with grace=0.0 (since negative should clamp)
          ref_params = self.base_params(
              exit_attenuation_mode="linear",
@@ -567,7 +569,7 @@ class TestRewardRobustnessAndBoundaries(RewardSpaceTestBase):
              exit_plateau_grace=0.0,
              exit_linear_slope=1.2,
          )
-        f_ref = _get_exit_factor(base_factor, pnl, pnl_factor, duration_ratio, ref_params)
+        f_ref = _get_exit_factor(base_factor, pnl, pnl_coefficient, duration_ratio, ref_params)
          self.assertAlmostEqualFloat(
              f_neg,
              f_ref,
@@ -581,7 +583,7 @@ class TestRewardRobustnessAndBoundaries(RewardSpaceTestBase):
          invalid_taus = [0.0, -0.5, 2.0, float("nan")]
          base_factor = 120.0
          pnl = 0.04
-        pnl_factor = 1.0
+        pnl_coefficient = 1.0
          duration_ratio = 1.0
          # Explicit alpha=1 expected ratio: f(dr)/f(0)=1/(1+dr)^1 with plateau disabled to observe attenuation.
          expected_ratio_alpha1 = 1.0 / (1.0 + duration_ratio)
@@ -590,8 +592,8 @@ class TestRewardRobustnessAndBoundaries(RewardSpaceTestBase):
                  exit_attenuation_mode="power", exit_power_tau=tau, exit_plateau=False
              )
              with assert_diagnostic_warning(["exit_power_tau"]):
-                f0 = _get_exit_factor(base_factor, pnl, pnl_factor, 0.0, params)
-                f1 = _get_exit_factor(base_factor, pnl, pnl_factor, duration_ratio, params)
+                f0 = _get_exit_factor(base_factor, pnl, pnl_coefficient, 0.0, params)
+                f1 = _get_exit_factor(base_factor, pnl, pnl_coefficient, duration_ratio, params)
              ratio = f1 / max(f0, self.TOL_NUMERIC_GUARD)
              self.assertAlmostEqual(
                  ratio,
@@ -605,17 +607,19 @@ class TestRewardRobustnessAndBoundaries(RewardSpaceTestBase):
          """Invariant 105: Near-zero exit_half_life warns and returns factor≈base_factor (no attenuation)."""
          base_factor = 60.0
          pnl = 0.02
-        pnl_factor = 1.0
+        pnl_coefficient = 1.0
          duration_ratio = 0.7
          near_zero_values = [1e-15, 1e-12, 5e-14]
          for hl in near_zero_values:
              params = self.base_params(exit_attenuation_mode="half_life", exit_half_life=hl)
              with assert_diagnostic_warning(["exit_half_life", "close to 0"]):
-                _ = _get_exit_factor(base_factor, pnl, pnl_factor, 0.0, params)
-                fdr = _get_exit_factor(base_factor, pnl, pnl_factor, duration_ratio, params)
+                _ = _get_exit_factor(base_factor, pnl, pnl_coefficient, 0.0, params)
+                fdr = _get_exit_factor(base_factor, pnl, pnl_coefficient, duration_ratio, params)
              self.assertAlmostEqualFloat(
                  fdr,
-                1.0 * pnl_factor,  # Kernel returns 1.0 then * pnl_factor
+                base_factor
+                * 1.0
+                * pnl_coefficient,  # base_factor * time_coefficient (1.0) * pnl_coefficient
                  tolerance=self.TOL_IDENTITY_RELAXED,
                  msg=f"Near-zero half-life attenuation mismatch hl={hl} fdr={fdr}",
              )
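A minimal Python sketch of the decomposition introduced in `ReforceXY.py`: each attenuation strategy now returns a bare time coefficient instead of scaling a factor in place, and the exit factor becomes `base_factor × time_attenuation_coefficient × pnl_coefficient`. The function names `time_attenuation_coefficient` and `exit_factor` are hypothetical. `pnl_coefficient` is passed in, as in the `reward_space_analysis` `_get_exit_factor` signature used by the tests. The plateau grace period is omitted. The clamping of negative duration ratios and the 1.0 fallback values for `exit_linear_slope` and `exit_half_life` are assumptions, because the corresponding code and `DEFAULT_*` constants are not shown.

```python
import math
from typing import Any, Callable, Dict, Mapping


def time_attenuation_coefficient(
    mode: str, duration_ratio: float, params: Mapping[str, Any]
) -> float:
    """Return the multiplicative time coefficient for `duration_ratio` (no plateau)."""
    # Assumption: negative ratios are clamped to 0.0; the original handling is elided.
    dr = max(duration_ratio, 0.0)

    def _legacy(d: float) -> float:
        return 1.5 if d <= 1.0 else 0.5

    def _sqrt(d: float) -> float:
        return 1.0 / math.sqrt(1.0 + d)

    def _linear(d: float) -> float:
        # Hypothetical default of 1.0; the real DEFAULT_EXIT_LINEAR_SLOPE is not shown.
        slope = float(params.get("exit_linear_slope", 1.0))
        if slope < 0.0:
            slope = 1.0  # negative slope is sanitized to 1.0
        return 1.0 / (1.0 + slope * d)

    def _power(d: float) -> float:
        tau = params.get("exit_power_tau")
        # alpha = -log(tau)/log(2) for tau in (0, 1]; a missing or invalid tau gives alpha = 1.
        if isinstance(tau, (int, float)) and 0.0 < tau <= 1.0:
            alpha = -math.log(tau) / math.log(2.0)
        else:
            alpha = 1.0
        return 1.0 / math.pow(1.0 + d, alpha)

    def _half_life(d: float) -> float:
        # Hypothetical default of 1.0; the real DEFAULT_EXIT_HALF_LIFE is not shown.
        hl = float(params.get("exit_half_life", 1.0))
        if math.isclose(hl, 0.0, abs_tol=1e-8) or hl < 0.0:
            return 1.0  # degenerate half-life: no attenuation
        return math.pow(2.0, -d / hl)

    strategies: Dict[str, Callable[[float], float]] = {
        "legacy": _legacy,
        "sqrt": _sqrt,
        "linear": _linear,
        "power": _power,
        "half_life": _half_life,
    }
    # Unknown modes fall back to linear.
    return strategies.get(mode, _linear)(dr)


def exit_factor(
    base_factor: float,
    pnl: float,
    duration_ratio: float,
    pnl_coefficient: float,
    mode: str,
    params: Mapping[str, Any],
) -> float:
    """base_factor * time_attenuation_coefficient * pnl_coefficient, with invariant guards."""
    if not (
        math.isfinite(base_factor)
        and math.isfinite(pnl)
        and math.isfinite(duration_ratio)
    ):
        return 0.0
    time_coefficient = time_attenuation_coefficient(mode, duration_ratio, params)
    value = base_factor * time_coefficient * pnl_coefficient
    # Invariant guards, always on here (the original gates them on check_invariants).
    if not math.isfinite(value):
        return 0.0
    if value < 0.0 and pnl >= 0.0:
        return 0.0
    return value
```

For example, `exit_factor(100.0, 0.03, 1.0, 1.0, "linear", {"exit_linear_slope": 1.0})` gives `50.0`, because the linear coefficient at `duration_ratio=1.0` with slope 1.0 is `1 / (1 + 1) = 0.5`. With a near-zero half-life the time coefficient is 1.0, so the result is `base_factor * pnl_coefficient`. That is the value the updated half-life test now expects, where the old assertion omitted `base_factor`.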
diff --git a/ReforceXY/user_data/freqaimodels/ReforceXY.py b/ReforceXY/user_data/freqaimodels/ReforceXY.py
index 43ae8f77a7baf585c2729212b3a02644a931ebd0..3d924942e74f72604c48fdf728357e0303cfcf75 100644
--- a/ReforceXY/user_data/freqaimodels/ReforceXY.py
+++ b/ReforceXY/user_data/freqaimodels/ReforceXY.py
@@ -182,7 +182,7 @@ class ReforceXY(BaseReinforcementLearningModel):
      DEFAULT_HOLD_PENALTY_POWER: Final[float] = 1.025
  
      DEFAULT_CHECK_INVARIANTS: Final[bool] = True
-    DEFAULT_EXIT_FACTOR_THRESHOLD: Final[float] = 10_000.0
+    DEFAULT_EXIT_FACTOR_THRESHOLD: Final[float] = 1_000.0
  
      _MODEL_TYPES: Final[Tuple[ModelType, ...]] = (
          "PPO",
@@ -2364,14 +2364,13 @@ class MyRLEnv(Base5ActionRLEnv):
          self._last_exit_reward = 0.0
          return observation, history
  
-    def _compute_time_attenuation_factor(
+    def _compute_time_attenuation_coefficient(
          self,
-        factor: float,
          duration_ratio: float,
          model_reward_parameters: Mapping[str, Any],
      ) -> float:
          """
-        Apply time-based decay to reward factor using configurable strategy
+        Calculate time-based attenuation coefficient using configurable strategy
          (legacy/sqrt/linear/power/half_life). Optionally apply plateau grace period.
          """
          if duration_ratio < 0.0:
@@ -2391,23 +2390,25 @@ class MyRLEnv(Base5ActionRLEnv):
              )
          )
          if exit_plateau_grace < 0.0:
+            logger.warning("exit_plateau_grace < 0; falling back to 0.0")
              exit_plateau_grace = 0.0
  
-        def _legacy(f: float, dr: float, p: Mapping[str, Any]) -> float:
-            return f * (1.5 if dr <= 1.0 else 0.5)
+        def _legacy(dr: float, p: Mapping[str, Any]) -> float:
+            return 1.5 if dr <= 1.0 else 0.5
  
-        def _sqrt(f: float, dr: float, p: Mapping[str, Any]) -> float:
-            return f / math.sqrt(1.0 + dr)
+        def _sqrt(dr: float, p: Mapping[str, Any]) -> float:
+            return 1.0 / math.sqrt(1.0 + dr)
  
-        def _linear(f: float, dr: float, p: Mapping[str, Any]) -> float:
+        def _linear(dr: float, p: Mapping[str, Any]) -> float:
              slope = float(
                  p.get("exit_linear_slope", ReforceXY.DEFAULT_EXIT_LINEAR_SLOPE)
              )
              if slope < 0.0:
+                logger.warning("exit_linear_slope < 0; falling back to 1.0")
                  slope = 1.0
-            return f / (1.0 + slope * dr)
+            return 1.0 / (1.0 + slope * dr)
  
-        def _power(f: float, dr: float, p: Mapping[str, Any]) -> float:
+        def _power(dr: float, p: Mapping[str, Any]) -> float:
              tau = p.get("exit_power_tau")
              if isinstance(tau, (int, float)):
                  tau = float(tau)
@@ -2417,15 +2418,15 @@ class MyRLEnv(Base5ActionRLEnv):
                      alpha = 1.0
              else:
                  alpha = 1.0
-            return f / math.pow(1.0 + dr, alpha)
+            return 1.0 / math.pow(1.0 + dr, alpha)
  
-        def _half_life(f: float, dr: float, p: Mapping[str, Any]) -> float:
+        def _half_life(dr: float, p: Mapping[str, Any]) -> float:
              hl = float(p.get("exit_half_life", ReforceXY.DEFAULT_EXIT_HALF_LIFE))
              if np.isclose(hl, 0.0) or hl < 0.0:
                  return 1.0
-            return f * math.pow(2.0, -dr / hl)
+            return math.pow(2.0, -dr / hl)
  
-        strategies: Dict[str, Callable[[float, float, Mapping[str, Any]], float]] = {
+        strategies: Dict[str, Callable[[float, Mapping[str, Any]], float]] = {
              ReforceXY._EXIT_ATTENUATION_MODES[0]: _legacy,
              ReforceXY._EXIT_ATTENUATION_MODES[1]: _sqrt,
              ReforceXY._EXIT_ATTENUATION_MODES[2]: _linear,
@@ -2452,7 +2453,9 @@ class MyRLEnv(Base5ActionRLEnv):
              strategy_fn = _linear
  
          try:
-            factor = strategy_fn(factor, effective_dr, model_reward_parameters)
+            time_attenuation_coefficient = strategy_fn(
+                effective_dr, model_reward_parameters
+            )
          except Exception as e:
              logger.warning(
                  "exit_attenuation_mode '%s' failed (%r); fallback to %s (effective_dr=%.5f)",
@@ -2461,34 +2464,39 @@ class MyRLEnv(Base5ActionRLEnv):
                  ReforceXY._EXIT_ATTENUATION_MODES[2],  # "linear"
                  effective_dr,
              )
-            factor = _linear(factor, effective_dr, model_reward_parameters)
+            time_attenuation_coefficient = _linear(
+                effective_dr, model_reward_parameters
+            )
  
-        return factor
+        return time_attenuation_coefficient
  
      def _get_exit_factor(
          self,
-        factor: float,
+        base_factor: float,
          pnl: float,
          duration_ratio: float,
          model_reward_parameters: Mapping[str, Any],
      ) -> float:
          """
-        Compute exit reward factor combining time attenuation and PnL factors
+        Compute exit factor: base_factor × time_attenuation_coefficient × pnl_coefficient.
          """
          if not (
-            np.isfinite(factor) and np.isfinite(pnl) and np.isfinite(duration_ratio)
+            np.isfinite(base_factor)
+            and np.isfinite(pnl)
+            and np.isfinite(duration_ratio)
          ):
              return 0.0
-        time_attenuation_factor = self._compute_time_attenuation_factor(
-            factor,
+
+        time_attenuation_coefficient = self._compute_time_attenuation_coefficient(
              duration_ratio,
              model_reward_parameters,
          )
-
-        factor *= time_attenuation_factor * self._get_pnl_factor(
+        pnl_coefficient = self._get_pnl_coefficient(
              pnl, self._pnl_target, model_reward_parameters
          )
  
+        exit_factor = base_factor * time_attenuation_coefficient * pnl_coefficient
+
          check_invariants = model_reward_parameters.get(
              "check_invariants", ReforceXY.DEFAULT_CHECK_INVARIANTS
          )
@@ -2496,39 +2504,39 @@ class MyRLEnv(Base5ActionRLEnv):
              check_invariants if isinstance(check_invariants, bool) else True
          )
          if check_invariants:
-            if not np.isfinite(factor):
+            if not np.isfinite(exit_factor):
                  logger.debug(
                      "_get_exit_factor produced non-finite factor; resetting to 0.0"
                  )
                  return 0.0
-            if factor < 0.0 and pnl >= 0.0:
+            if exit_factor < 0.0 and pnl >= 0.0:
                  logger.debug(
-                    "_get_exit_factor negative with positive pnl (factor=%.5f, pnl=%.5f); clamping to 0.0",
-                    factor,
+                    "_get_exit_factor negative with positive pnl (exit_factor=%.5f, pnl=%.5f); clamping to 0.0",
+                    exit_factor,
                      pnl,
                  )
-                factor = 0.0
+                exit_factor = 0.0
              exit_factor_threshold = float(
                  model_reward_parameters.get(
                      "exit_factor_threshold", ReforceXY.DEFAULT_EXIT_FACTOR_THRESHOLD
                  )
              )
-            if exit_factor_threshold > 0 and abs(factor) > exit_factor_threshold:
+            if exit_factor_threshold > 0 and abs(exit_factor) > exit_factor_threshold:
                  logger.warning(
-                    "_get_exit_factor |factor|=%.2f exceeds threshold %.2f",
-                    factor,
+                    "_get_exit_factor |exit_factor|=%.2f exceeds threshold %.2f",
+                    exit_factor,
                      exit_factor_threshold,
                  )
  
-        return factor
+        return exit_factor
  
-    def _compute_pnl_target_factor(
+    def _compute_pnl_target_coefficient(
          self, pnl: float, pnl_target: float, model_reward_parameters: Mapping[str, Any]
      ) -> float:
          """
-        Scale reward based on PnL/target ratio using tanh (≥ 1.0 for good trades).
+        Compute PnL target coefficient (typically 0.5-2.0) using tanh on PnL/target ratio.
          """
-        pnl_target_factor = 1.0
+        pnl_target_coefficient = 1.0
  
          if pnl_target > 0.0:
              pnl_factor_beta = float(
@@ -2539,7 +2547,7 @@ class MyRLEnv(Base5ActionRLEnv):
              pnl_ratio = pnl / pnl_target
  
              if abs(pnl_ratio) > 1.0:
-                base_pnl_target_factor = math.tanh(
+                base_pnl_target_coefficient = math.tanh(
                      pnl_factor_beta * (abs(pnl_ratio) - 1.0)
                  )
                  win_reward_factor = float(
@@ -2549,20 +2557,22 @@ class MyRLEnv(Base5ActionRLEnv):
                  )
  
                  if pnl_ratio > 1.0:
-                    pnl_target_factor = 1.0 + win_reward_factor * base_pnl_target_factor
+                    pnl_target_coefficient = (
+                        1.0 + win_reward_factor * base_pnl_target_coefficient
+                    )
                  elif pnl_ratio < -(1.0 / self.rr):
                      loss_penalty_factor = win_reward_factor * self.rr
-                    pnl_target_factor = (
-                        1.0 + loss_penalty_factor * base_pnl_target_factor
+                    pnl_target_coefficient = (
+                        1.0 + loss_penalty_factor * base_pnl_target_coefficient
                      )
  
-        return pnl_target_factor
+        return pnl_target_coefficient
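The branch structure of `_compute_pnl_target_coefficient` can be reproduced as a pure function. The lines that read `pnl_factor_beta` and `win_reward_factor` from `model_reward_parameters` are elided from the hunks, so this sketch takes `beta`, `win_reward_factor` and `rr` (standing in for `self.rr`) as explicit arguments rather than guessing the parameter keys or defaults. The function name is illustrative only.

```python
import math


def pnl_target_coefficient(
    pnl: float, pnl_target: float, beta: float, win_reward_factor: float, rr: float
) -> float:
    """Standalone sketch of MyRLEnv._compute_pnl_target_coefficient."""
    coefficient = 1.0
    if pnl_target > 0.0:
        ratio = pnl / pnl_target
        # Only trades beyond the target magnitude are rescaled.
        if abs(ratio) > 1.0:
            base = math.tanh(beta * (abs(ratio) - 1.0))
            if ratio > 1.0:
                # Winning trade past the target: bonus bounded by win_reward_factor.
                coefficient = 1.0 + win_reward_factor * base
            elif ratio < -(1.0 / rr):
                # Losing trade past the risk/reward-adjusted bound: the penalty
                # multiplier is scaled by rr.
                coefficient = 1.0 + win_reward_factor * rr * base
    return coefficient
```

With non-negative `beta`, `win_reward_factor` and `rr`, the `tanh` term lies in `[0, 1)`. The coefficient is therefore never below `1.0`, and it stays below `1 + win_reward_factor` for wins and below `1 + win_reward_factor * rr` for losses.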
  
-    def _compute_efficiency_factor(
+    def _compute_efficiency_coefficient(
          self, pnl: float, model_reward_parameters: Mapping[str, Any]
      ) -> float:
          """
-        Scale reward based on exit efficiency (distance from max unrealized PnL).
+        Compute exit efficiency coefficient (typically 0.5-1.5) based on exit timing quality.
          """
          efficiency_weight = float(
              model_reward_parameters.get(
@@ -2575,7 +2585,7 @@ class MyRLEnv(Base5ActionRLEnv):
              )
          )
  
-        efficiency_factor = 1.0
+        efficiency_coefficient = 1.0
          if efficiency_weight != 0.0 and not np.isclose(pnl, 0.0):
              max_pnl = max(self.get_max_unrealized_profit(), pnl)
              min_pnl = min(self.get_min_unrealized_profit(), pnl)
@@ -2583,30 +2593,30 @@ class MyRLEnv(Base5ActionRLEnv):
              if np.isfinite(range_pnl) and not np.isclose(range_pnl, 0.0):
                  efficiency_ratio = (pnl - min_pnl) / range_pnl
                  if pnl > 0.0:
-                    efficiency_factor = 1.0 + efficiency_weight * (
+                    efficiency_coefficient = 1.0 + efficiency_weight * (
                          efficiency_ratio - efficiency_center
                      )
                  elif pnl < 0.0:
-                    efficiency_factor = 1.0 + efficiency_weight * (
+                    efficiency_coefficient = 1.0 + efficiency_weight * (
                          efficiency_center - efficiency_ratio
                      )
  
-        return efficiency_factor
+        return efficiency_coefficient
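The following standalone sketch shows how `_compute_efficiency_coefficient` maps the exit's position within the trade's unrealized PnL range to a multiplier. Several names and details are assumptions. The line computing `range_pnl` is elided from the hunk and is assumed here to be `max_pnl - min_pnl`. The environment accessors `get_max_unrealized_profit()` and `get_min_unrealized_profit()` are replaced by plain arguments. `math.isclose(..., abs_tol=1e-8)` approximates the default `np.isclose` tolerance near zero.

```python
import math


def efficiency_coefficient(
    pnl: float,
    max_unrealized: float,
    min_unrealized: float,
    weight: float,
    center: float,
) -> float:
    """Standalone sketch of MyRLEnv._compute_efficiency_coefficient."""
    coefficient = 1.0
    if weight != 0.0 and not math.isclose(pnl, 0.0, abs_tol=1e-8):
        max_pnl = max(max_unrealized, pnl)
        min_pnl = min(min_unrealized, pnl)
        range_pnl = max_pnl - min_pnl  # assumed: this line is elided in the diff
        if math.isfinite(range_pnl) and not math.isclose(range_pnl, 0.0, abs_tol=1e-8):
            # 0.0 = exited at the worst point seen, 1.0 = exited at the best.
            ratio = (pnl - min_pnl) / range_pnl
            if pnl > 0.0:
                # Wins: reward exits close to the peak.
                coefficient = 1.0 + weight * (ratio - center)
            elif pnl < 0.0:
                # Losses: amplify the penalty for exits close to the trough.
                coefficient = 1.0 + weight * (center - ratio)
    return coefficient
```

With `weight=1.0` and `center=0.5`, the coefficient spans `[0.5, 1.5]`. A win closed at its peak gets `1.5`, and a loss closed at its trough also gets `1.5`, which magnifies the negative reward.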
  
-    def _get_pnl_factor(
+    def _get_pnl_coefficient(
          self, pnl: float, pnl_target: float, model_reward_parameters: Mapping[str, Any]
      ) -> float:
          """
-        Combine PnL target and efficiency factors (>= 0.0)
+        Combine PnL target and efficiency coefficients (typically 0.25-4.0).
          """
-        pnl_target_factor = self._compute_pnl_target_factor(
+        pnl_target_coefficient = self._compute_pnl_target_coefficient(
              pnl, pnl_target, model_reward_parameters
          )
-        efficiency_factor = self._compute_efficiency_factor(
+        efficiency_coefficient = self._compute_efficiency_coefficient(
              pnl, model_reward_parameters
          )
  
-        return max(0.0, pnl_target_factor * efficiency_factor)
+        return max(0.0, pnl_target_coefficient * efficiency_coefficient)
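The `max(0.0, ...)` floor in `_get_pnl_coefficient` matters only when one factor goes negative. That can happen, for example, when the efficiency weight exceeds `1 / center`. The small sketch below isolates that combination step. The helper name is hypothetical, and the inputs are illustrative values, not defaults taken from the source.

```python
def combine_pnl_coefficients(target_coefficient: float, efficiency_coefficient: float) -> float:
    """Mirror of the last line of _get_pnl_coefficient: product floored at 0.0."""
    return max(0.0, target_coefficient * efficiency_coefficient)


# A profitable exit at the bottom of its excursion (ratio 0.0) with weight 3.0
# and center 0.5 yields an efficiency term of 1.0 + 3.0 * (0.0 - 0.5) = -0.5.
weight, center, ratio = 3.0, 0.5, 0.0
efficiency = 1.0 + weight * (ratio - center)
print(combine_pnl_coefficients(1.2, efficiency))  # 0.0
print(combine_pnl_coefficients(1.5, 1.5))  # 2.25
```

Without the floor, a winning trade could receive a negative coefficient, which would flip the sign of its reward.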
  
      def calculate_reward(self, action: int) -> float:
          """Compute per-step reward and apply potential-based reward shaping (PBRS).
author	Jérôme Benoit <jerome.benoit@piment-noir.org>
	Wed, 17 Dec 2025 16:09:10 +0000 (17:09 +0100)
committer	Jérôme Benoit <jerome.benoit@piment-noir.org>
	Wed, 17 Dec 2025 16:09:10 +0000 (17:09 +0100)
ReforceXY/reward_space_analysis/README.md
ReforceXY/reward_space_analysis/reward_space_analysis.py
ReforceXY/reward_space_analysis/tests/components/test_reward_components.py
ReforceXY/reward_space_analysis/tests/helpers/assertions.py
ReforceXY/reward_space_analysis/tests/helpers/configs.py
ReforceXY/reward_space_analysis/tests/robustness/test_branch_coverage.py
ReforceXY/reward_space_analysis/tests/robustness/test_robustness.py
ReforceXY/user_data/freqaimodels/ReforceXY.py