fix: eliminate data leakage in extrema weighting normalization (#30)
* fix: eliminate data leakage in extrema weighting normalization
Move dataset-dependent scaling from strategy (pre-split) to model label
pipeline (post-split) to prevent train/test data leakage.
Changes:
- Add ExtremaWeightingTransformer (datasieve BaseTransform) in Utils.py
that fits standardization/normalization stats on training data only
- Add define_label_pipeline() in QuickAdapterRegressorV3 that replaces
FreqAI's default MinMaxScaler with our configurable transformer
- Simplify strategy's set_freqai_targets() to pass raw weighted extrema
without any normalization (normalization now happens post-split)
- Remove pre-split normalization functions from Utils.py (~107 lines)
Signed-off-by: Jérôme Benoit <jerome.benoit@piment-noir.org>
* refactor: align ExtremaWeightingTransformer with BaseTransform API
- Call super().__init__() with name parameter
- Match method signatures exactly (npt.ArrayLike, ArrayOrNone, ListOrNone)
- Return tuple from fit() instead of self
- Import types from same namespaces as BaseTransform
* refactor: cleanup type hints
Signed-off-by: Jérôme Benoit <jerome.benoit@piment-noir.org>
* refactor: remove unnecessary type casts and annotations
Let numpy types flow naturally without explicit float()/int() casts.
Signed-off-by: Jérôme Benoit <jerome.benoit@piment-noir.org>
* refactor: use scipy.special.logit for inverse sigmoid transformation
Replace manual inverse sigmoid calculation (-np.log(1.0 / values - 1.0))
with scipy.special.logit() for better code clarity and consistency.
- Uses official scipy function that is the documented inverse of expit
- Mathematically equivalent to the previous implementation
- Improves code readability and maintainability
- Maintains symmetry: sp.special.expit() <-> sp.special.logit()
Also improve comment clarity for standardization identity function.
The _n_train attribute was being set during fit() but never used
elsewhere in the class or by the BaseTransform interface. Removing
it to reduce code clutter and improve maintainability.
* fix: import paths correction
Signed-off-by: Jérôme Benoit <jerome.benoit@piment-noir.org>
* fix: add Bessel correction and ValueError consistency in ExtremaWeightingTransformer
- Use ddof=1 for std computation (sample std instead of population std)
- Add ValueError in _inverse_standardize for unknown methods
- Add ValueError in _inverse_normalize for unknown methods
* chore: refine config-template.json for extrema weighting options