From 05044a7a4f1f2cbc44300d5823ac64be2f5a0059 Mon Sep 17 00:00:00 2001
From: =?utf8?q?J=C3=A9r=C3=B4me=20Benoit?=
Date: Wed, 27 May 2026 02:40:10 +0200
Subject: [PATCH] feat(weights): add uniform pivot weighting strategy (#75)

* feat(weights): add uniform pivot weighting strategy

Adds "uniform" to WEIGHT_STRATEGIES (between "none" and the metric
names) which assigns weight=1.0 to every detected pivot. Off-pivot rows
remain governed by the existing fill_method (zero / epsilon / gaussian),
so uniform + gaussian collapses cleanly to a pure proximity kernel
around each pivot.

Naming follows sklearn convention (KNeighbors(weights="uniform"),
DummyClassifier(strategy="uniform")).

* chore(quickadapter): bump strategy and regressor version 3.11.11 -> 3.11.12

* refactor(weights): hoist indices_array and valid_mask in compute_label_weights

Compute indices_array and valid_mask once at the top of the function
instead of after the strategy dispatch. The uniform branch can now use
indices_array.size instead of len(indices), and the duplicate
np.asarray / valid_mask construction lower in the function is removed.

Saves one np.asarray and one mask computation per call.

* refactor(weights): consolidate _scatter_weights signature

Drop the redundant indices: list[int] parameter now that the only
caller (compute_label_weights) hoists indices_array and valid_mask. The
function takes them positionally and uses indices_array.size for size
checks, removing three len(indices) calls and the optional-kwarg
fallback paths.

* refactor(weights): pipeline API consolidation pass

- compute_label_weights: drop Optional placeholder, accept
  Sequence[int] | NDArray[np.integer] for indices
- standardize Optional[X] -> X | None across module (PEP 604)
- _impute_weights: positional call instead of keyword on single arg
- _pivot_equivalent_count: remove unreachable threshold <= 0 branch
  (survivors.size > 0 implies survivors.max() > 0 because the input has
  been sanitized to non-negative values upstream)
- _scatter_weights: drop dead 'if not np.any(valid_mask)' early return;
  vectorized assignment is a no-op when the mask is all-False
- sanitize_and_renormalize: clarify empty-input semantics in docstring

* fix(weights): zero leading and trailing non-finite runs in _impute_weights

The boundary mask only covered the strict tip positions (index 0 and
-1), so multi-element non-finite runs at the boundary were
median-imputed instead of zeroed. With input
[NaN, NaN, 1.0, 2.0, NaN, NaN] the function returned
[0.0, 1.5, 1.0, 2.0, 1.5, 0.0] instead of [0, 0, 1.0, 2.0, 0, 0],
silently extending pivot weight to the unconfirmed boundary candles.

Use np.argmax on the finite mask to detect the leading and trailing
non-finite runs and zero the entire run, matching the docstring
contract.

* fix(weights): floor stacked metrics in geometric and harmonic aggregation

Power means with p<=0 collapse to 0 on a single zero in the stack:
pmean([1, 0, 3], p=-1) = 0.0 and pmean([1, 0, 3], p=0) = 0.0. Combined
with compose_sample_weights' (arr <= 0) drop_mask predicate, a single
metric returning 0 on a pivot silently drops that row entirely.

Floor stacked_metrics at np.finfo(float).tiny only inside the
geometric_mean and harmonic_mean branches so all-positive pivots
survive aggregation. arithmetic_mean, quadratic_mean, weighted_median
and softmax branches are untouched.

* fix(weights): log when out-of-range pivot indices are dropped

compute_label_weights silently filters out pivot indices outside
[0, n_values) via valid_mask.
This made upstream contract violations invisible: a stale or off-by-one
index list would simply produce zero training weight on those rows with
no diagnostic.

Emit logger.warning with the count and dropped fraction whenever
n_dropped > 0 so the upstream caller can spot the issue.

* refactor(weights): collapse 4x label-config validators into a registry

Replace the four near-identical _validate_*_params +
get_label_*_config pairs with a single _LABEL_KIND_REGISTRY mapping
each kind name to (specs, defaults). _label_kind_validator builds the
validator on the fly and get_label_kind_config dispatches to
_get_label_config with the appropriate spec/default pair.

The four public get_label_*_config helpers remain as thin wrappers so
existing callers in QuickAdapterV3 and QuickAdapterRegressorV3 are
unaffected.

_LabelTransformerConfig.from_dict (LabelTransformer.py) is
intentionally out of scope: it would require propagating a logger
through BaseTransform's freqtrade-side interface, which is
upstream-controlled.

* docs(weights): drop verbose empty-input note from sanitize_and_renormalize

The added sentence paraphrased the existing collapse line and restated
obvious facts about zero-length vectors without contractual
information. Revert to the concise pre-W1 docstring.

* docs(weights): drop get_label_kind_config docstring for consistency

The 4 sibling get_label_*_config wrappers have no docstrings; their
parameter names and the registry name self-document the contract. Drop
the redundant docstring on get_label_kind_config to match the family
style.

* fix(weights): drop tiny floor in geometric/harmonic aggregation

The floor at np.finfo(float).tiny added in b1f86a0 preserved pivots
whose metrics included an exact zero, but a zero metric is itself a
'signal absent' marker that downstream compose_sample_weights drops via
the (arr <= 0) mask. Floor was masking the intended drop.
Restore the upstream pmean behavior so a zero in any geometric or harmonic input produces an exact 0.0 combined weight, allowing drop_mask to drop the pivot as designed. * docs(readme): document uniform label weighting strategy * style(readme): re-align tunables table columns --- README.md | 196 ++++++++--------- .../freqaimodels/QuickAdapterRegressorV3.py | 2 +- .../user_data/strategies/LabelTransformer.py | 3 +- .../user_data/strategies/QuickAdapterV3.py | 2 +- quickadapter/user_data/strategies/Utils.py | 200 ++++++++---------- 5 files changed, 186 insertions(+), 217 deletions(-) diff --git a/README.md b/README.md index cb0cb38..5947dc3 100644 --- a/README.md +++ b/README.md @@ -37,104 +37,104 @@ docker compose up -d --build ### Configuration tunables -| Path | Default | Type / Range | Description | -| -------------------------------------------------------------- | ----------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| _Protections_ | | | | -| custom_protections.trade_duration_candles | 72 | int >= 1 | Estimated trade duration in candles. Scales protections stop duration candles and trade limit. | -| custom_protections.lookback_period_fraction | 0.5 | float (0,1] | Fraction of `fit_live_predictions_candles` used to calculate `lookback_period_candles` for _MaxDrawdown_ and _StoplossGuard_ protections. 
| -| custom_protections.cooldown.enabled | true | bool | Enable/disable _CooldownPeriod_ protection. | -| custom_protections.cooldown.stop_duration_candles | 4 | int >= 1 | Number of candles to wait before allowing new trades after a trade is closed. | -| custom_protections.drawdown.enabled | true | bool | Enable/disable _MaxDrawdown_ protection. | -| custom_protections.drawdown.max_allowed_drawdown | 0.2 | float (0,1) | Maximum allowed drawdown. | -| custom_protections.stoploss.enabled | true | bool | Enable/disable _StoplossGuard_ protection. | -| _Leverage_ | | | | -| leverage | `proposed_leverage` | float [1.0, max_leverage] | Leverage. Fallback to `proposed_leverage` for the pair. | -| _Exit pricing_ | | | | -| exit_pricing.trade_price_target_method | `moving_average` | enum {`moving_average`,`quantile_interpolation`,`weighted_average`} | Trade NATR computation method. | -| exit_pricing.thresholds_calibration.decline_quantile | 0.75 | float (0,1) | PnL decline quantile threshold. | -| _Reversal confirmation_ | | | | -| reversal_confirmation.lookback_period_candles | 0 | int >= 0 | Prior confirming candles; 0 = none. | -| reversal_confirmation.decay_fraction | 0.5 | float (0,1] | Geometric per-candle volatility adjusted reversal threshold relaxation factor. | -| reversal_confirmation.min_natr_multiplier_fraction | 0.0095 | float [0,1] | Lower bound fraction for volatility adjusted reversal threshold. | -| reversal_confirmation.max_natr_multiplier_fraction | 0.075 | float [0,1] | Upper bound fraction (>= lower bound) for volatility adjusted reversal threshold. | -| _Regressor model_ | | | | -| freqai.regressor | `xgboost` | enum {`xgboost`,`lightgbm`,`histgradientboostingregressor`,`ngboost`,`catboost`} | Machine learning regressor algorithm. | -| _Model training parameters_ | | | | -| freqai.model_training_parameters.gpu_vram_gb | 80 | enum {8,10,12,16,24,32,40,48,64,80} | Available GPU VRAM (GB) for CatBoost, not total. 
Constrains `depth`, `border_count`, and `max_ctr_complexity` ranges. | -| _Data split parameters_ | | | | -| freqai.data_split_parameters.method | `train_test_split` | enum {`train_test_split`,`timeseries_split`} | Data splitting strategy. `train_test_split` for sequential split, `timeseries_split` for chronological split with configurable gap. | -| freqai.data_split_parameters.test_size | 0.1 / None | float (0,1) \| int >= 1 \| None | Test set size. Float for fraction, int for count. Default: 0.1 for `train_test_split`, None for `timeseries_split` (sklearn dynamic sizing). | -| freqai.data_split_parameters.n_splits | 5 | int >= 2 | Controls train/test proportions for `timeseries_split` (higher = larger train set). | -| freqai.data_split_parameters.gap | 0 | int >= 0 | Samples to exclude between train/test for `timeseries_split`. When 0, auto-calculated from `label_period_candles` to prevent look-ahead bias. | -| freqai.data_split_parameters.max_train_size | None | int >= 1 \| None | Maximum training set size for `timeseries_split`. When set, creates a sliding window instead of expanding train set. None = no limit. | -| _Label smoothing_ | | | | -| freqai.label_smoothing.method | `gaussian` | enum {`none`,`gaussian`,`kaiser`,`triang`,`smm`,`sma`,`savgol`,`gaussian_filter1d`} | Label smoothing method (`smm`=median, `sma`=mean, `savgol`=Savitzky–Golay). | -| freqai.label_smoothing.window_candles | 5 | int >= 3 | Smoothing window length (candles). | -| freqai.label_smoothing.beta | 8.0 | float > 0 | Shape parameter for `kaiser` kernel. | -| freqai.label_smoothing.polyorder | 3 | int >= 0 | Polynomial order for `savgol` smoothing. | -| freqai.label_smoothing.mode | `mirror` | enum {`mirror`,`constant`,`nearest`,`wrap`,`interp`} | Boundary mode for `savgol` and `gaussian_filter1d`. | -| freqai.label_smoothing.sigma | 1.0 | float > 0 | Gaussian `sigma` for `gaussian_filter1d` smoothing. 
| -| _Label weighting_ | | | | -| freqai.label_weighting.strategy | `none` | enum {`none`,`amplitude`,`amplitude_threshold_ratio`,`volume_rate`,`speed`,`efficiency_ratio`,`volume_weighted_efficiency_ratio`,`combined`} | Label weighting metric: none (`none`), swing amplitude (`amplitude`), swing amplitude / median volatility-threshold ratio (`amplitude_threshold_ratio`), swing volume per candle (`volume_rate`), swing speed (`speed`), swing efficiency ratio (`efficiency_ratio`), swing volume-weighted efficiency ratio (`volume_weighted_efficiency_ratio`), or combined metrics aggregation (`combined`). Switching between `none` and any other strategy requires deleting trained models to realign training emphasis. | -| freqai.label_weighting.metric_coefficients | {} | dict[str, float] | Per-metric coefficients for `combined` strategy. Keys: `amplitude`, `amplitude_threshold_ratio`, `volume_rate`, `speed`, `efficiency_ratio`, `volume_weighted_efficiency_ratio`. | -| freqai.label_weighting.aggregation | `arithmetic_mean` | enum {`arithmetic_mean`,`geometric_mean`,`harmonic_mean`,`quadratic_mean`,`weighted_median`,`softmax`} | Metric aggregation method for `combined` strategy. `arithmetic_mean`=(Σ(w·m)/Σ(w)), `geometric_mean`=(∏(m^w))^(1/Σw), `harmonic_mean`=Σ(w)/(Σ(w/m)), `quadratic_mean`=(Σ(w·m²)/Σ(w))^(1/2), `weighted_median`=Q₀.₅(m,w), `softmax`=Σ(m·s_i) where s_i=w_i·exp(m_i/T)/Σ(w_j·exp(m_j/T)). | -| freqai.label_weighting.softmax_temperature | 1.0 | float > 0 | Temperature T for `softmax` aggregation, controls distribution sharpness. | -| freqai.label_weighting.fill_method | `zero` | enum {`zero`,`epsilon`,`gaussian`} | Off-pivot weighting scheme. `zero` hard-zeros off-pivot rows; `epsilon` applies a flat baseline `fill_epsilon * (pivot_weights)`; `gaussian` applies heatmap-style decay around each pivot. Switching away from `zero` may require retuning tree-leaf regularization (`min_child_weight`, `lambda`) and resetting any prior Optuna study. 
Changing this parameter requires deleting trained models. | -| freqai.label_weighting.fill_epsilon | 0.001 | float [0,1] | Off-pivot fraction of the pivot baseline. Ignored when `fill_method != "epsilon"`. | -| freqai.label_weighting.fill_epsilon_baseline | `mean` | enum {`mean`,`median`} | Pivot baseline statistic. `mean` tracks central tendency; `median` is robust against pivot-weight skew. Ignored when `fill_method != "epsilon"`. | -| freqai.label_weighting.fill_sigma_candles | 3.0 | float >= 0.5 | Gaussian standard deviation in candles for `fill_method == "gaussian"`. Lower bound 0.5 prevents underflow that silently degrades to `zero` mode. Ignored when `fill_method != "gaussian"`. | -| _Label pipeline_ | | | | -| freqai.label_pipeline.standardization | `none` | enum {`none`,`zscore`,`robust`,`mmad`,`power_yj`} | Standardization method applied to labels before normalization. `none`=w, `zscore`=(w-μ)/σ, `robust`=(w-median)/(Q₃-Q₁), `mmad`=(w-median)/(MAD·k), `power_yj`=YJ(w). | -| freqai.label_pipeline.robust_quantiles | [0.25, 0.75] | list[float] where 0 <= Q1 < Q3 <= 1 | Quantile range for robust standardization, Q1 and Q3. | -| freqai.label_pipeline.mmad_scaling_factor | 1.4826 | float > 0 | Scaling factor for MMAD standardization. | -| freqai.label_pipeline.normalization | `maxabs` | enum {`maxabs`,`minmax`,`sigmoid`,`none`} | Normalization method applied to labels. `maxabs`=w/max(\|w\|), `minmax`=low+(w-min)/(max-min)·(high-low), `sigmoid`=2·σ(scale·w)-1, `none`=w. | -| freqai.label_pipeline.minmax_range | [-1.0, 1.0] | list[float] | Target range for `minmax` normalization, min and max. | -| freqai.label_pipeline.sigmoid_scale | 1.0 | float > 0 | Scale parameter for `sigmoid` normalization, controls steepness. | -| freqai.label_pipeline.gamma | 1.0 | float (0,10] | Contrast exponent applied to labels after normalization: >1 emphasizes extrema, values between 0 and 1 soften. 
| -| _Feature parameters_ | | | | -| freqai.feature_parameters.label_period_candles | min/max midpoint | int >= 1 | Zigzag labeling NATR horizon. | -| freqai.feature_parameters.min_label_period_candles | 12 | int >= 1 | Minimum labeling NATR horizon used for reversals labeling HPO. | -| freqai.feature_parameters.max_label_period_candles | 24 | int >= 1 | Maximum labeling NATR horizon used for reversals labeling HPO. | -| freqai.feature_parameters.label_natr_multiplier | min/max midpoint | float > 0 | Zigzag labeling NATR multiplier. | -| freqai.feature_parameters.min_label_natr_multiplier | 9.0 | float > 0 | Minimum labeling NATR multiplier used for reversals labeling HPO. | -| freqai.feature_parameters.max_label_natr_multiplier | 12.0 | float > 0 | Maximum labeling NATR multiplier used for reversals labeling HPO. | -| freqai.feature_parameters.label_frequency_candles | `auto` | int >= 2 \| `auto` | Reversals labeling frequency. `auto` = max(2, 2 \* number of whitelisted pairs). | -| freqai.feature_parameters.label_weights | [1/7,1/7,1/7,1/7,1/7,1/7,1/7] | list[float] | Per-objective weights for trial selection methods. Objectives: (1) number of detected reversals, (2) median swing amplitude, (3) median (swing amplitude / median volatility-threshold ratio), (4) median swing volume per candle, (5) median swing speed, (6) median swing efficiency ratio, (7) median swing volume-weighted efficiency ratio. | -| freqai.feature_parameters.label_p_order | None | float \| None | Lp exponent for parameterized metrics. Used by `minkowski` distance (default 2.0) and `power_mean` aggregation (default 1.0). Ignored by other metrics. | -| freqai.feature_parameters.label_method | `compromise_programming` | enum {`compromise_programming`,`topsis`,`kmeans`,`kmeans2`,`kmedoids`,`knn`,`medoid`} | HPO `label` Pareto front trial selection method. 
| -| freqai.feature_parameters.label_distance_metric | `euclidean` | string | Distance metric for `compromise_programming` and `topsis` methods. | -| freqai.feature_parameters.label_cluster_metric | `euclidean` | string | Distance metric for `kmeans`, `kmeans2`, and `kmedoids` methods. | -| freqai.feature_parameters.label_cluster_selection_method | `topsis` | enum {`compromise_programming`,`topsis`} | Cluster selection method for clustering-based label methods. | -| freqai.feature_parameters.label_cluster_trial_selection_method | `topsis` | enum {`compromise_programming`,`topsis`} | Best cluster trial selection method for clustering-based label methods. | -| freqai.feature_parameters.label_density_metric | method-dependent | string | Distance metric for `knn` and `medoid` methods. | -| freqai.feature_parameters.label_density_aggregation | `power_mean` | enum {`power_mean`,`quantile`,`min`,`max`} | Aggregation method for KNN neighbor distances. | -| freqai.feature_parameters.label_density_n_neighbors | 5 | int >= 1 | Number of neighbors for KNN. | -| freqai.feature_parameters.label_density_aggregation_param | aggregation-dependent | float \| None | Tunable for KNN neighbor distance aggregation: Lp exponent (`power_mean`) or quantile value (`quantile`). | -| freqai.feature_parameters.scaler | `minmax` | enum {`minmax`,`maxabs`,`standard`,`robust`} | Feature scaling method. `minmax`=MinMaxScaler, `maxabs`=MaxAbsScaler, `standard`=StandardScaler, `robust`=RobustScaler. Changing this parameter requires deleting trained models. | -| freqai.feature_parameters.range | [-1.0, 1.0] | list[float] | Target range for `minmax` scaler, min and max. Changing this parameter requires deleting trained models. | -| _Label prediction_ | | | | -| freqai.label_prediction.method | `thresholding` | enum {`none`,`thresholding`} | Prediction method. `none` disables threshold computation, `thresholding` enables adaptive threshold calculation. 
| -| freqai.label_prediction.selection_method | `rank_extrema` | enum {`rank_extrema`,`rank_peaks`,`partition`} | Extrema selection method. `rank_extrema` ranks extrema values, `rank_peaks` ranks detected peak values, `partition` uses sign-based partitioning. | -| freqai.label_prediction.threshold_method | `mean` | enum {`mean`,`isodata`,`li`,`minimum`,`otsu`,`triangle`,`yen`,`median`,`soft_extremum`} | Thresholding method for prediction thresholds. | -| freqai.label_prediction.soft_extremum_alpha | 12.0 | float >= 0 | Alpha for `soft_extremum` threshold method. | -| freqai.label_prediction.outlier_quantile | 0.999 | float (0,1) | Quantile threshold for predictions outlier filtering. | -| freqai.label_prediction.keep_fraction | 0.5 | float (0,1] | Fraction of extrema used for thresholds. 1 uses all, lower values keep only most significant. Applies to `rank_extrema` and `rank_peaks`; ignored for `partition`. | -| _Optuna / HPO_ | | | | -| freqai.optuna_hyperopt.enabled | false | bool | Enables HPO. | -| freqai.optuna_hyperopt.sampler | `tpe` | enum {`tpe`,`auto`} | HPO sampler algorithm for `hp` namespace. `tpe` uses [TPESampler](https://optuna.readthedocs.io/en/stable/reference/samplers/generated/optuna.samplers.TPESampler.html) with multivariate, group, and constant_liar (when multiple workers), `auto` uses [AutoSampler](https://hub.optuna.org/samplers/auto_sampler). | -| freqai.optuna_hyperopt.label_sampler | `auto` | enum {`auto`,`tpe`,`nsgaii`,`nsgaiii`} | HPO sampler algorithm for multi-objective `label` namespace. `nsgaii` uses [NSGAIISampler](https://optuna.readthedocs.io/en/stable/reference/samplers/generated/optuna.samplers.NSGAIISampler.html), `nsgaiii` uses [NSGAIIISampler](https://optuna.readthedocs.io/en/stable/reference/samplers/generated/optuna.samplers.NSGAIIISampler.html). | -| freqai.optuna_hyperopt.storage | `file` | enum {`file`,`sqlite`} | HPO storage backend. | -| freqai.optuna_hyperopt.continuous | true | bool | Continuous HPO. 
| -| freqai.optuna_hyperopt.warm_start | true | bool | Warm start HPO with previous best value(s). | -| freqai.optuna_hyperopt.n_startup_trials | 15 | int >= 0 | HPO startup trials. | -| freqai.optuna_hyperopt.n_trials | 50 | int >= 1 | Maximum HPO trials. | -| freqai.optuna_hyperopt.n_jobs | CPU threads / 4 | int >= 1 | Parallel HPO workers. | -| freqai.optuna_hyperopt.timeout | 7200 | int >= 0 | HPO wall-clock timeout in seconds. | -| freqai.optuna_hyperopt.label_candles_step | 1 | int >= 1 | Step for Zigzag NATR horizon `label` search space. | -| freqai.optuna_hyperopt.space_reduction | false | bool | Enable/disable `hp` search space reduction based on previous best parameters. | -| freqai.optuna_hyperopt.space_fraction | 0.4 | float [0,1] | Fraction of the `hp` search space to use with `space_reduction`. Lower values create narrower search ranges around the best parameters. | -| freqai.optuna_hyperopt.min_resource | 3 | int >= 1 | Minimum resource per [HyperbandPruner](https://optuna.readthedocs.io/en/stable/reference/generated/optuna.pruners.HyperbandPruner.html) rung. | -| freqai.optuna_hyperopt.seed | 1 | int >= 0 | HPO RNG seed. 
| +| Path | Default | Type / Range | Description | +| -------------------------------------------------------------- | ----------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | +| _Protections_ | | | | +| custom_protections.trade_duration_candles | 72 | int >= 1 | Estimated trade duration in candles. Scales protections stop duration candles and trade limit. | +| custom_protections.lookback_period_fraction | 0.5 | float (0,1] | Fraction of `fit_live_predictions_candles` used to calculate `lookback_period_candles` for _MaxDrawdown_ and _StoplossGuard_ protections. | +| custom_protections.cooldown.enabled | true | bool | Enable/disable _CooldownPeriod_ protection. | +| custom_protections.cooldown.stop_duration_candles | 4 | int >= 1 | Number of candles to wait before allowing new trades after a trade is closed. | +| custom_protections.drawdown.enabled | true | bool | Enable/disable _MaxDrawdown_ protection. | +| custom_protections.drawdown.max_allowed_drawdown | 0.2 | float (0,1) | Maximum allowed drawdown. | +| custom_protections.stoploss.enabled | true | bool | Enable/disable _StoplossGuard_ protection. | +| _Leverage_ | | | | +| leverage | `proposed_leverage` | float [1.0, max_leverage] | Leverage. Fallback to `proposed_leverage` for the pair. 
| +| _Exit pricing_ | | | | +| exit_pricing.trade_price_target_method | `moving_average` | enum {`moving_average`,`quantile_interpolation`,`weighted_average`} | Trade NATR computation method. | +| exit_pricing.thresholds_calibration.decline_quantile | 0.75 | float (0,1) | PnL decline quantile threshold. | +| _Reversal confirmation_ | | | | +| reversal_confirmation.lookback_period_candles | 0 | int >= 0 | Prior confirming candles; 0 = none. | +| reversal_confirmation.decay_fraction | 0.5 | float (0,1] | Geometric per-candle volatility adjusted reversal threshold relaxation factor. | +| reversal_confirmation.min_natr_multiplier_fraction | 0.0095 | float [0,1] | Lower bound fraction for volatility adjusted reversal threshold. | +| reversal_confirmation.max_natr_multiplier_fraction | 0.075 | float [0,1] | Upper bound fraction (>= lower bound) for volatility adjusted reversal threshold. | +| _Regressor model_ | | | | +| freqai.regressor | `xgboost` | enum {`xgboost`,`lightgbm`,`histgradientboostingregressor`,`ngboost`,`catboost`} | Machine learning regressor algorithm. | +| _Model training parameters_ | | | | +| freqai.model_training_parameters.gpu_vram_gb | 80 | enum {8,10,12,16,24,32,40,48,64,80} | Available GPU VRAM (GB) for CatBoost, not total. Constrains `depth`, `border_count`, and `max_ctr_complexity` ranges. | +| _Data split parameters_ | | | | +| freqai.data_split_parameters.method | `train_test_split` | enum {`train_test_split`,`timeseries_split`} | Data splitting strategy. `train_test_split` for sequential split, `timeseries_split` for chronological split with configurable gap. | +| freqai.data_split_parameters.test_size | 0.1 / None | float (0,1) \| int >= 1 \| None | Test set size. Float for fraction, int for count. Default: 0.1 for `train_test_split`, None for `timeseries_split` (sklearn dynamic sizing). | +| freqai.data_split_parameters.n_splits | 5 | int >= 2 | Controls train/test proportions for `timeseries_split` (higher = larger train set). 
| +| freqai.data_split_parameters.gap | 0 | int >= 0 | Samples to exclude between train/test for `timeseries_split`. When 0, auto-calculated from `label_period_candles` to prevent look-ahead bias. | +| freqai.data_split_parameters.max_train_size | None | int >= 1 \| None | Maximum training set size for `timeseries_split`. When set, creates a sliding window instead of expanding train set. None = no limit. | +| _Label smoothing_ | | | | +| freqai.label_smoothing.method | `gaussian` | enum {`none`,`gaussian`,`kaiser`,`triang`,`smm`,`sma`,`savgol`,`gaussian_filter1d`} | Label smoothing method (`smm`=median, `sma`=mean, `savgol`=Savitzky–Golay). | +| freqai.label_smoothing.window_candles | 5 | int >= 3 | Smoothing window length (candles). | +| freqai.label_smoothing.beta | 8.0 | float > 0 | Shape parameter for `kaiser` kernel. | +| freqai.label_smoothing.polyorder | 3 | int >= 0 | Polynomial order for `savgol` smoothing. | +| freqai.label_smoothing.mode | `mirror` | enum {`mirror`,`constant`,`nearest`,`wrap`,`interp`} | Boundary mode for `savgol` and `gaussian_filter1d`. | +| freqai.label_smoothing.sigma | 1.0 | float > 0 | Gaussian `sigma` for `gaussian_filter1d` smoothing. | +| _Label weighting_ | | | | +| freqai.label_weighting.strategy | `none` | enum {`none`,`uniform`,`amplitude`,`amplitude_threshold_ratio`,`volume_rate`,`speed`,`efficiency_ratio`,`volume_weighted_efficiency_ratio`,`combined`} | Label weighting metric: none (`none`), uniform unit weight on every detected pivot (`uniform`), swing amplitude (`amplitude`), swing amplitude / median volatility-threshold ratio (`amplitude_threshold_ratio`), swing volume per candle (`volume_rate`), swing speed (`speed`), swing efficiency ratio (`efficiency_ratio`), swing volume-weighted efficiency ratio (`volume_weighted_efficiency_ratio`), or combined metrics aggregation (`combined`). Switching between `none` and any other strategy requires deleting trained models to realign training emphasis. 
| +| freqai.label_weighting.metric_coefficients | {} | dict[str, float] | Per-metric coefficients for `combined` strategy. Keys: `amplitude`, `amplitude_threshold_ratio`, `volume_rate`, `speed`, `efficiency_ratio`, `volume_weighted_efficiency_ratio`. | +| freqai.label_weighting.aggregation | `arithmetic_mean` | enum {`arithmetic_mean`,`geometric_mean`,`harmonic_mean`,`quadratic_mean`,`weighted_median`,`softmax`} | Metric aggregation method for `combined` strategy. `arithmetic_mean`=(Σ(w·m)/Σ(w)), `geometric_mean`=(∏(m^w))^(1/Σw), `harmonic_mean`=Σ(w)/(Σ(w/m)), `quadratic_mean`=(Σ(w·m²)/Σ(w))^(1/2), `weighted_median`=Q₀.₅(m,w), `softmax`=Σ(m·s_i) where s_i=w_i·exp(m_i/T)/Σ(w_j·exp(m_j/T)). | +| freqai.label_weighting.softmax_temperature | 1.0 | float > 0 | Temperature T for `softmax` aggregation, controls distribution sharpness. | +| freqai.label_weighting.fill_method | `zero` | enum {`zero`,`epsilon`,`gaussian`} | Off-pivot weighting scheme. `zero` hard-zeros off-pivot rows; `epsilon` applies a flat baseline `fill_epsilon * (pivot_weights)`; `gaussian` applies heatmap-style decay around each pivot. Switching away from `zero` may require retuning tree-leaf regularization (`min_child_weight`, `lambda`) and resetting any prior Optuna study. Changing this parameter requires deleting trained models. | +| freqai.label_weighting.fill_epsilon | 0.001 | float [0,1] | Off-pivot fraction of the pivot baseline. Ignored when `fill_method != "epsilon"`. | +| freqai.label_weighting.fill_epsilon_baseline | `mean` | enum {`mean`,`median`} | Pivot baseline statistic. `mean` tracks central tendency; `median` is robust against pivot-weight skew. Ignored when `fill_method != "epsilon"`. | +| freqai.label_weighting.fill_sigma_candles | 3.0 | float >= 0.5 | Gaussian standard deviation in candles for `fill_method == "gaussian"`. Lower bound 0.5 prevents underflow that silently degrades to `zero` mode. Ignored when `fill_method != "gaussian"`. 
| +| _Label pipeline_ | | | | +| freqai.label_pipeline.standardization | `none` | enum {`none`,`zscore`,`robust`,`mmad`,`power_yj`} | Standardization method applied to labels before normalization. `none`=w, `zscore`=(w-μ)/σ, `robust`=(w-median)/(Q₃-Q₁), `mmad`=(w-median)/(MAD·k), `power_yj`=YJ(w). | +| freqai.label_pipeline.robust_quantiles | [0.25, 0.75] | list[float] where 0 <= Q1 < Q3 <= 1 | Quantile range for robust standardization, Q1 and Q3. | +| freqai.label_pipeline.mmad_scaling_factor | 1.4826 | float > 0 | Scaling factor for MMAD standardization. | +| freqai.label_pipeline.normalization | `maxabs` | enum {`maxabs`,`minmax`,`sigmoid`,`none`} | Normalization method applied to labels. `maxabs`=w/max(\|w\|), `minmax`=low+(w-min)/(max-min)·(high-low), `sigmoid`=2·σ(scale·w)-1, `none`=w. | +| freqai.label_pipeline.minmax_range | [-1.0, 1.0] | list[float] | Target range for `minmax` normalization, min and max. | +| freqai.label_pipeline.sigmoid_scale | 1.0 | float > 0 | Scale parameter for `sigmoid` normalization, controls steepness. | +| freqai.label_pipeline.gamma | 1.0 | float (0,10] | Contrast exponent applied to labels after normalization: >1 emphasizes extrema, values between 0 and 1 soften. | +| _Feature parameters_ | | | | +| freqai.feature_parameters.label_period_candles | min/max midpoint | int >= 1 | Zigzag labeling NATR horizon. | +| freqai.feature_parameters.min_label_period_candles | 12 | int >= 1 | Minimum labeling NATR horizon used for reversals labeling HPO. | +| freqai.feature_parameters.max_label_period_candles | 24 | int >= 1 | Maximum labeling NATR horizon used for reversals labeling HPO. | +| freqai.feature_parameters.label_natr_multiplier | min/max midpoint | float > 0 | Zigzag labeling NATR multiplier. | +| freqai.feature_parameters.min_label_natr_multiplier | 9.0 | float > 0 | Minimum labeling NATR multiplier used for reversals labeling HPO. 
| +| freqai.feature_parameters.max_label_natr_multiplier | 12.0 | float > 0 | Maximum labeling NATR multiplier used for reversals labeling HPO. | +| freqai.feature_parameters.label_frequency_candles | `auto` | int >= 2 \| `auto` | Reversals labeling frequency. `auto` = max(2, 2 \* number of whitelisted pairs). | +| freqai.feature_parameters.label_weights | [1/7,1/7,1/7,1/7,1/7,1/7,1/7] | list[float] | Per-objective weights for trial selection methods. Objectives: (1) number of detected reversals, (2) median swing amplitude, (3) median (swing amplitude / median volatility-threshold ratio), (4) median swing volume per candle, (5) median swing speed, (6) median swing efficiency ratio, (7) median swing volume-weighted efficiency ratio. | +| freqai.feature_parameters.label_p_order | None | float \| None | Lp exponent for parameterized metrics. Used by `minkowski` distance (default 2.0) and `power_mean` aggregation (default 1.0). Ignored by other metrics. | +| freqai.feature_parameters.label_method | `compromise_programming` | enum {`compromise_programming`,`topsis`,`kmeans`,`kmeans2`,`kmedoids`,`knn`,`medoid`} | HPO `label` Pareto front trial selection method. | +| freqai.feature_parameters.label_distance_metric | `euclidean` | string | Distance metric for `compromise_programming` and `topsis` methods. | +| freqai.feature_parameters.label_cluster_metric | `euclidean` | string | Distance metric for `kmeans`, `kmeans2`, and `kmedoids` methods. | +| freqai.feature_parameters.label_cluster_selection_method | `topsis` | enum {`compromise_programming`,`topsis`} | Cluster selection method for clustering-based label methods. | +| freqai.feature_parameters.label_cluster_trial_selection_method | `topsis` | enum {`compromise_programming`,`topsis`} | Best cluster trial selection method for clustering-based label methods. | +| freqai.feature_parameters.label_density_metric | method-dependent | string | Distance metric for `knn` and `medoid` methods. 
| +| freqai.feature_parameters.label_density_aggregation | `power_mean` | enum {`power_mean`,`quantile`,`min`,`max`} | Aggregation method for KNN neighbor distances. | +| freqai.feature_parameters.label_density_n_neighbors | 5 | int >= 1 | Number of neighbors for KNN. | +| freqai.feature_parameters.label_density_aggregation_param | aggregation-dependent | float \| None | Tunable for KNN neighbor distance aggregation: Lp exponent (`power_mean`) or quantile value (`quantile`). | +| freqai.feature_parameters.scaler | `minmax` | enum {`minmax`,`maxabs`,`standard`,`robust`} | Feature scaling method. `minmax`=MinMaxScaler, `maxabs`=MaxAbsScaler, `standard`=StandardScaler, `robust`=RobustScaler. Changing this parameter requires deleting trained models. | +| freqai.feature_parameters.range | [-1.0, 1.0] | list[float] | Target range for `minmax` scaler, min and max. Changing this parameter requires deleting trained models. | +| _Label prediction_ | | | | +| freqai.label_prediction.method | `thresholding` | enum {`none`,`thresholding`} | Prediction method. `none` disables threshold computation, `thresholding` enables adaptive threshold calculation. | +| freqai.label_prediction.selection_method | `rank_extrema` | enum {`rank_extrema`,`rank_peaks`,`partition`} | Extrema selection method. `rank_extrema` ranks extrema values, `rank_peaks` ranks detected peak values, `partition` uses sign-based partitioning. | +| freqai.label_prediction.threshold_method | `mean` | enum {`mean`,`isodata`,`li`,`minimum`,`otsu`,`triangle`,`yen`,`median`,`soft_extremum`} | Thresholding method for prediction thresholds. | +| freqai.label_prediction.soft_extremum_alpha | 12.0 | float >= 0 | Alpha for `soft_extremum` threshold method. | +| freqai.label_prediction.outlier_quantile | 0.999 | float (0,1) | Quantile threshold for predictions outlier filtering. | +| freqai.label_prediction.keep_fraction | 0.5 | float (0,1] | Fraction of extrema used for thresholds. 
1 uses all, lower values keep only most significant. Applies to `rank_extrema` and `rank_peaks`; ignored for `partition`. | +| _Optuna / HPO_ | | | | +| freqai.optuna_hyperopt.enabled | false | bool | Enables HPO. | +| freqai.optuna_hyperopt.sampler | `tpe` | enum {`tpe`,`auto`} | HPO sampler algorithm for `hp` namespace. `tpe` uses [TPESampler](https://optuna.readthedocs.io/en/stable/reference/samplers/generated/optuna.samplers.TPESampler.html) with multivariate, group, and constant_liar (when multiple workers), `auto` uses [AutoSampler](https://hub.optuna.org/samplers/auto_sampler). | +| freqai.optuna_hyperopt.label_sampler | `auto` | enum {`auto`,`tpe`,`nsgaii`,`nsgaiii`} | HPO sampler algorithm for multi-objective `label` namespace. `nsgaii` uses [NSGAIISampler](https://optuna.readthedocs.io/en/stable/reference/samplers/generated/optuna.samplers.NSGAIISampler.html), `nsgaiii` uses [NSGAIIISampler](https://optuna.readthedocs.io/en/stable/reference/samplers/generated/optuna.samplers.NSGAIIISampler.html). | +| freqai.optuna_hyperopt.storage | `file` | enum {`file`,`sqlite`} | HPO storage backend. | +| freqai.optuna_hyperopt.continuous | true | bool | Continuous HPO. | +| freqai.optuna_hyperopt.warm_start | true | bool | Warm start HPO with previous best value(s). | +| freqai.optuna_hyperopt.n_startup_trials | 15 | int >= 0 | HPO startup trials. | +| freqai.optuna_hyperopt.n_trials | 50 | int >= 1 | Maximum HPO trials. | +| freqai.optuna_hyperopt.n_jobs | CPU threads / 4 | int >= 1 | Parallel HPO workers. | +| freqai.optuna_hyperopt.timeout | 7200 | int >= 0 | HPO wall-clock timeout in seconds. | +| freqai.optuna_hyperopt.label_candles_step | 1 | int >= 1 | Step for Zigzag NATR horizon `label` search space. | +| freqai.optuna_hyperopt.space_reduction | false | bool | Enable/disable `hp` search space reduction based on previous best parameters. 
| +| freqai.optuna_hyperopt.space_fraction | 0.4 | float [0,1] | Fraction of the `hp` search space to use with `space_reduction`. Lower values create narrower search ranges around the best parameters. | +| freqai.optuna_hyperopt.min_resource | 3 | int >= 1 | Minimum resource per [HyperbandPruner](https://optuna.readthedocs.io/en/stable/reference/generated/optuna.pruners.HyperbandPruner.html) rung. | +| freqai.optuna_hyperopt.seed | 1 | int >= 0 | HPO RNG seed. | ## ReforceXY diff --git a/quickadapter/user_data/freqaimodels/QuickAdapterRegressorV3.py b/quickadapter/user_data/freqaimodels/QuickAdapterRegressorV3.py index df540b4..70a6424 100644 --- a/quickadapter/user_data/freqaimodels/QuickAdapterRegressorV3.py +++ b/quickadapter/user_data/freqaimodels/QuickAdapterRegressorV3.py @@ -102,7 +102,7 @@ class QuickAdapterRegressorV3(BaseRegressionModel): https://github.com/sponsors/robcaulk """ - version = "3.11.11" + version = "3.11.12" _TEST_SIZE: Final[float] = 0.1 diff --git a/quickadapter/user_data/strategies/LabelTransformer.py b/quickadapter/user_data/strategies/LabelTransformer.py index d4c9caf..16fb833 100644 --- a/quickadapter/user_data/strategies/LabelTransformer.py +++ b/quickadapter/user_data/strategies/LabelTransformer.py @@ -56,9 +56,10 @@ COMBINED_AGGREGATIONS: Final[tuple[CombinedAggregation, ...]] = ( "softmax", ) -WeightStrategy = Literal["none", "combined"] | CombinedMetric +WeightStrategy = Literal["none", "uniform", "combined"] | CombinedMetric WEIGHT_STRATEGIES: Final[tuple[WeightStrategy, ...]] = ( "none", + "uniform", *COMBINED_METRICS, "combined", ) diff --git a/quickadapter/user_data/strategies/QuickAdapterV3.py b/quickadapter/user_data/strategies/QuickAdapterV3.py index 1ebb0b1..c5a5e62 100644 --- a/quickadapter/user_data/strategies/QuickAdapterV3.py +++ b/quickadapter/user_data/strategies/QuickAdapterV3.py @@ -115,7 +115,7 @@ class QuickAdapterV3(IStrategy): _ANNOTATION_LINE_OFFSET_CANDLES: Final[int] = 10 def version(self) -> str: - return 
"3.11.11" + return "3.11.12" timeframe = "5m" timeframe_minutes = timeframe_to_minutes(timeframe) diff --git a/quickadapter/user_data/strategies/Utils.py b/quickadapter/user_data/strategies/Utils.py index 72edcdf..aa28d7c 100644 --- a/quickadapter/user_data/strategies/Utils.py +++ b/quickadapter/user_data/strategies/Utils.py @@ -7,6 +7,7 @@ import re from dataclasses import dataclass from enum import IntEnum from functools import lru_cache, singledispatch +from collections.abc import Sequence from logging import Logger from pathlib import Path from typing import ( @@ -15,7 +16,6 @@ from typing import ( Callable, Final, Literal, - Optional, TypeVar, Union, ) @@ -597,96 +597,69 @@ def _get_label_config( return {"default": validated_default, "columns": {}} -def _validate_weighting_params( - config: dict[str, Any], - logger: Logger, - config_name: str = "label_weighting", -) -> dict[str, Any]: - return _validate_params( - config, logger, config_name, _WEIGHTING_SPECS, DEFAULTS_LABEL_WEIGHTING - ) +_LABEL_KIND_REGISTRY: Final[dict[str, tuple[dict[str, _ParamSpec], dict[str, Any]]]] = { + "label_weighting": (_WEIGHTING_SPECS, DEFAULTS_LABEL_WEIGHTING), + "label_pipeline": (_PIPELINE_SPECS, DEFAULTS_LABEL_PIPELINE), + "label_smoothing": (_SMOOTHING_SPECS, DEFAULTS_LABEL_SMOOTHING), + "label_prediction": (_PREDICTION_SPECS, DEFAULTS_LABEL_PREDICTION), +} -def get_label_weighting_config( - config: dict[str, Any], - logger: Logger, -) -> dict[str, Any]: - return _get_label_config( - config, - logger, - "label_weighting", - _validate_weighting_params, - DEFAULTS_LABEL_WEIGHTING, - ) +def _label_kind_validator(kind: str) -> ValidateParamsFn: + specs, defaults = _LABEL_KIND_REGISTRY[kind] + def validate( + config: dict[str, Any], + logger: Logger, + config_name: str = kind, + ) -> dict[str, Any]: + return _validate_params(config, logger, config_name, specs, defaults) -def _validate_pipeline_params( - config: dict[str, Any], - logger: Logger, - config_name: str = 
"label_pipeline", -) -> dict[str, Any]: - return _validate_params( - config, logger, config_name, _PIPELINE_SPECS, DEFAULTS_LABEL_PIPELINE - ) + return validate -def get_label_pipeline_config( +def get_label_kind_config( + kind: str, config: dict[str, Any], logger: Logger, ) -> dict[str, Any]: + if kind not in _LABEL_KIND_REGISTRY: + raise ValueError( + f"Unknown label kind {kind!r}: supported values are " + f"{', '.join(_LABEL_KIND_REGISTRY)}" + ) + _, defaults = _LABEL_KIND_REGISTRY[kind] return _get_label_config( - config, - logger, - "label_pipeline", - _validate_pipeline_params, - DEFAULTS_LABEL_PIPELINE, + config, logger, kind, _label_kind_validator(kind), defaults ) -def _validate_smoothing_params( +def get_label_weighting_config( config: dict[str, Any], logger: Logger, - config_name: str = "label_smoothing", ) -> dict[str, Any]: - return _validate_params( - config, logger, config_name, _SMOOTHING_SPECS, DEFAULTS_LABEL_SMOOTHING - ) + return get_label_kind_config("label_weighting", config, logger) -def get_label_smoothing_config( +def get_label_pipeline_config( config: dict[str, Any], logger: Logger, ) -> dict[str, Any]: - return _get_label_config( - config, - logger, - "label_smoothing", - _validate_smoothing_params, - DEFAULTS_LABEL_SMOOTHING, - ) + return get_label_kind_config("label_pipeline", config, logger) -def _validate_prediction_params( +def get_label_smoothing_config( config: dict[str, Any], logger: Logger, - config_name: str = "label_prediction", ) -> dict[str, Any]: - return _validate_params( - config, logger, config_name, _PREDICTION_SPECS, DEFAULTS_LABEL_PREDICTION - ) + return get_label_kind_config("label_smoothing", config, logger) def get_label_prediction_config( config: dict[str, Any], logger: Logger, ) -> dict[str, Any]: - return _get_label_config( - config, - logger, - "label_prediction", - _validate_prediction_params, - DEFAULTS_LABEL_PREDICTION, - ) + return get_label_kind_config("label_prediction", config, logger) _EPOCH_MS_MIN = 
1_262_304_000_000 # 2010-01-01T00:00:00Z @@ -814,8 +787,6 @@ def _pivot_equivalent_count( if survivors.size == 0: return 0 threshold = _PIVOT_EQUIVALENT_MAX_FRACTION * float(survivors.max()) - if threshold <= 0.0: - return 0 return int((survivors >= threshold).sum()) @@ -1076,15 +1047,18 @@ def _impute_weights( if weights.size == 0: return np.full_like(weights, default_weight, dtype=float) - # Zigzag emits NaN at unconfirmed boundary pivots; zero them out and - # exclude from the median so they don't drag interior imputation. + finite_mask = np.isfinite(weights) + if not finite_mask.any(): + return np.full_like(weights, default_weight, dtype=float) + + # Zigzag emits NaN at unconfirmed boundary pivots; zero out the leading + # and trailing non-finite runs so they don't drag interior imputation. boundary_mask = np.zeros(weights.size, dtype=bool) - if not np.isfinite(weights[0]): - boundary_mask[0] = True - if not np.isfinite(weights[-1]): - boundary_mask[-1] = True + first_finite = int(np.argmax(finite_mask)) + last_finite = weights.size - 1 - int(np.argmax(finite_mask[::-1])) + boundary_mask[:first_finite] = True + boundary_mask[last_finite + 1 :] = True - finite_mask = np.isfinite(weights) interior_finite_mask = finite_mask & ~boundary_mask if not interior_finite_mask.any(): weights[~finite_mask] = default_weight @@ -1129,8 +1103,7 @@ def _gaussian_fill_weights( return np.zeros(n_values, dtype=float) if np.any(pivot_weights < 0.0): raise ValueError( - f"Invalid pivot_weights min={float(pivot_weights.min())!r}: " - f"must be >= 0" + f"Invalid pivot_weights min={float(pivot_weights.min())!r}: must be >= 0" ) pivot_indices_array = pivot_indices.astype(float) pivot_weights_row = pivot_weights.astype(float)[np.newaxis, :] @@ -1173,41 +1146,30 @@ def _gaussian_fill_weights( def _scatter_weights( n_values: int, - indices: list[int], + indices_array: NDArray[np.integer], + valid_mask: NDArray[np.bool_], weights: NDArray[np.floating], fill_weights: NDArray[np.floating], - 
*, - indices_array: NDArray[np.integer] | None = None, - valid_mask: NDArray[np.bool_] | None = None, ) -> NDArray[np.floating]: """Scatter per-pivot weights into a full-length array. Pivot rows (validated via ``valid_mask``) receive ``weights``; off-pivot rows receive the corresponding entry of ``fill_weights`` (shape - ``(n_values,)``). Callers may pre-compute ``indices_array`` and - ``valid_mask`` and pass them in to avoid recomputation when the dispatch - needs the same mask for both filtered pivot extraction and the scatter. + ``(n_values,)``). """ if fill_weights.shape != (n_values,): raise ValueError( - f"Invalid fill_weights shape {fill_weights.shape!r}: " - f"must be ({n_values},)" + f"Invalid fill_weights shape {fill_weights.shape!r}: must be ({n_values},)" ) # Empty-input early return precedes the length-mismatch check on purpose. - if len(indices) == 0 or weights.size == 0: + if indices_array.size == 0 or weights.size == 0: return fill_weights.astype(float, copy=True) - if len(indices) != weights.size: + if indices_array.size != weights.size: raise ValueError( - f"Invalid indices/weights values: length mismatch, " - f"got {len(indices)} indices but {weights.size} weights" + f"Invalid indices_array/weights values: length mismatch, " + f"got {indices_array.size} indices but {weights.size} weights" ) - if indices_array is None: - indices_array = np.asarray(indices, dtype=int) - if valid_mask is None: - valid_mask = (indices_array >= 0) & (indices_array < n_values) weights_array = fill_weights.astype(float, copy=True) - if not np.any(valid_mask): - return weights_array weights_array[indices_array[valid_mask]] = weights[valid_mask] return weights_array @@ -1300,7 +1262,7 @@ def _compute_combined_label_weights( values_array = np.asarray(metric_values, dtype=float) if values_array.size == 0: continue - imputed_metrics.append(_impute_weights(weights=values_array)) + imputed_metrics.append(_impute_weights(values_array)) 
coefficients_list.append(float(coefficient)) if len(imputed_metrics) == 0: @@ -1316,7 +1278,7 @@ def _compute_combined_label_weights( def compute_label_weights( n_values: int, - indices: list[int], + indices: Sequence[int] | NDArray[np.integer], metrics: dict[str, list[float]], weighting_config: dict[str, Any], *, @@ -1338,11 +1300,25 @@ def compute_label_weights( "callers must skip invocation when weighting is disabled" ) - weights: Optional[NDArray[np.floating]] = None + indices_array = np.asarray(indices, dtype=int) + valid_mask = (indices_array >= 0) & (indices_array < n_values) + n_indices = indices_array.size + n_dropped = n_indices - int(valid_mask.sum()) + if n_dropped > 0: + logger.warning( + "compute_label_weights: %d/%d pivot indices out of range [0, %d); dropped", + n_dropped, + n_indices, + n_values, + ) - if strategy in metrics: + weights: NDArray[np.floating] + + if strategy == WEIGHT_STRATEGIES[1]: # "uniform" + weights = np.ones(n_indices, dtype=float) + elif strategy in metrics: weights = np.asarray(metrics[strategy], dtype=float) - elif strategy == WEIGHT_STRATEGIES[7]: # "combined" + elif strategy == WEIGHT_STRATEGIES[8]: # "combined" weights = _compute_combined_label_weights( metrics=metrics, metric_coefficients=label_weighting["metric_coefficients"], @@ -1355,16 +1331,11 @@ def compute_label_weights( f"supported values are {', '.join(WEIGHT_STRATEGIES)} or metric names {', '.join(metrics.keys())}" ) - weights = _impute_weights( - weights=weights, - ) + weights = _impute_weights(weights) if weights.size == 0: return np.zeros(n_values, dtype=float) - indices_array = np.asarray(indices, dtype=int) - valid_mask = (indices_array >= 0) & (indices_array < n_values) - fill_method = label_weighting["fill_method"] if fill_method == FILL_METHODS[0]: # "zero" @@ -1379,9 +1350,7 @@ def compute_label_weights( elif baseline == FILL_EPSILON_BASELINES[1]: # "median" pivot_baseline = float(np.nanmedian(pivot_values)) else: - raise ValueError( - f"Invalid 
fill_epsilon_baseline value {baseline!r}" - ) + raise ValueError(f"Invalid fill_epsilon_baseline value {baseline!r}") if not np.isfinite(pivot_baseline): pivot_baseline = 0.0 else: @@ -1400,11 +1369,10 @@ def compute_label_weights( return _scatter_weights( n_values=n_values, - indices=indices, - weights=weights, - fill_weights=fill_weights, indices_array=indices_array, valid_mask=valid_mask, + weights=weights, + fill_weights=fill_weights, ) @@ -2527,13 +2495,13 @@ def fit_regressor( X: pd.DataFrame, y: pd.DataFrame, train_weights: NDArray[np.floating], - eval_set: Optional[list[tuple[pd.DataFrame, pd.DataFrame]]], - eval_weights: Optional[list[NDArray[np.floating]]], + eval_set: list[tuple[pd.DataFrame, pd.DataFrame]] | None, + eval_weights: list[NDArray[np.floating]] | None, model_training_parameters: dict[str, Any], init_model: Any = None, - callbacks: Optional[list[RegressorCallback]] = None, - model_path: Optional[Path] = None, - trial: Optional[optuna.trial.Trial] = None, + callbacks: list[RegressorCallback] | None = None, + model_path: Path | None = None, + trial: optuna.trial.Trial | None = None, ) -> Any: fit_callbacks = list(callbacks) if callbacks else [] @@ -2839,8 +2807,8 @@ def make_test_set_and_weights( test_weights: NDArray[np.floating], test_size: float, ) -> tuple[ - Optional[list[tuple[pd.DataFrame, pd.DataFrame]]], - Optional[list[NDArray[np.floating]]], + list[tuple[pd.DataFrame, pd.DataFrame]] | None, + list[NDArray[np.floating]] | None, ]: """Wrap test data for ``model.fit`` ``eval_set`` when ``test_size > 0``. 
@@ -2876,7 +2844,7 @@ def _optuna_suggest_int_from_range( def optuna_load_best_params( base_path: Path, pair: str, namespace: str -) -> Optional[dict[str, Any]]: +) -> dict[str, Any] | None: best_params_path = ( base_path / f"optuna-{namespace}-best-params-{pair.split('/')[0]}.json" ) @@ -3533,7 +3501,7 @@ def get_optuna_study_model_parameters( @lru_cache(maxsize=128) -def largest_divisor_to_step(integer: int, step: int) -> Optional[int]: +def largest_divisor_to_step(integer: int, step: int) -> int | None: if not isinstance(integer, int) or integer <= 0: raise ValueError( f"Invalid integer value {integer!r}: must be a positive integer" @@ -3544,7 +3512,7 @@ def largest_divisor_to_step(integer: int, step: int) -> Optional[int]: if step == 1 or integer % step == 0: return integer - best_divisor: Optional[int] = None + best_divisor: int | None = None max_divisor = int(math.isqrt(integer)) for i in range(1, max_divisor + 1): if integer % i != 0: -- 2.53.0
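Reviewer note on the `_impute_weights` hunk: the leading/trailing run detection can be checked in isolation with the minimal sketch below. It is not the code from `Utils.py`. The helper name `boundary_nonfinite_mask` is hypothetical, and treating an all-non-finite input as one boundary run is an assumption made only for this sketch (the patched function returns `default_weight` for that case instead). The sample input is the one quoted in the commit message.

```python
import numpy as np
from numpy.typing import NDArray


def boundary_nonfinite_mask(values: NDArray[np.floating]) -> NDArray[np.bool_]:
    """Mark the leading and trailing runs of non-finite entries (hypothetical helper)."""
    finite_mask = np.isfinite(values)
    boundary_mask = np.zeros(values.size, dtype=bool)
    if not finite_mask.any():
        # Assumption for this sketch only: no finite value means the whole
        # array is treated as a single boundary run.
        boundary_mask[:] = True
        return boundary_mask
    # np.argmax on a boolean array returns the position of the first True.
    first_finite = int(np.argmax(finite_mask))
    # Same trick on the reversed mask locates the last finite position.
    last_finite = values.size - 1 - int(np.argmax(finite_mask[::-1]))
    boundary_mask[:first_finite] = True
    boundary_mask[last_finite + 1:] = True
    return boundary_mask


# Input taken from the commit message of the _impute_weights fix.
weights = np.array([np.nan, np.nan, 1.0, 2.0, np.nan, np.nan])
mask = boundary_nonfinite_mask(weights)
zeroed = np.where(mask, 0.0, weights)
print(zeroed.tolist())
```

Interior non-finite values are deliberately left unmarked by this mask, so a caller can still impute them separately, which matches the split between boundary zeroing and interior imputation described in the commit message.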