Jérôme Benoit [Sun, 24 May 2026 00:40:05 +0000 (02:40 +0200)]
refactor(weights): align _train with BaseRegressionModel.train
Rename _train_default to _train to match the upstream method name (with
underscore prefix to mark it as the internal mirror, since the public
train() method routes between data split paths).
Mirror BaseRegressionModel.train line-for-line with _compose_train_weights
as the single intentional insertion between make_train_test_datasets and
the pipeline application:
- Drop ensure_datetime_series wrapper around unfiltered_df['date']:
upstream calls .iloc[].strftime() directly.
- Drop **kwargs from self.fit(dd, dk) to match upstream signature.
- Use dk.data_dictionary['train_features'].columns for feature count log,
matching upstream source of truth.
- Apply the same cosmetic alignment to the timeseries_split path for
consistency between both train code paths.
- Add docstring documenting the mirror relationship and the single
functional difference.
Jérôme Benoit [Sun, 24 May 2026 00:10:39 +0000 (02:10 +0200)]
feat(weights): integrate sample_weight composition into both train() data split paths
Add _train_default(), mirroring BaseRegressionModel.train() with
_compose_train_weights inserted between make_train_test_datasets and
_apply_pipelines:
- Route the train_test_split path through _train_default instead of
super().train().
- Insert _compose_train_weights before _apply_pipelines in the
timeseries_split path.
- Call _strip_label_weight_columns(dk) at the top of train() for both
branches.
Validated locally with pytest + structural AST checks (evidence:
.omo/evidence/task-9-{pytest,structural}.txt).
Jérôme Benoit [Sat, 23 May 2026 23:55:34 +0000 (01:55 +0200)]
fix(weights): persist label weights into <label>_weight column instead of rescaling target
Removes statistically incorrect target rescaling (label = direction × weight). Persists raw direction labels and a separate <label>_weight column for downstream sample_weight composition. Validated locally with pytest (evidence: .omo/evidence/task-6-{red,green}.txt).
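The difference between rescaling the target and persisting a separate weight column can be sketched as follows. This is a minimal illustration, not the project's code: the label name `extrema`, the row layout, and the composition rule (element-wise product rescaled to mean 1) are all assumptions; only the `<label>_weight` column convention comes from the message above.

```python
# Hypothetical sketch: names and the composition rule are assumptions.
LABEL = "extrema"  # hypothetical label column name

rows = [
    {"direction": 1.0, "weight": 0.2},
    {"direction": -1.0, "weight": 0.9},
    {"direction": 0.0, "weight": 0.5},
]


def rescaled_targets(rows):
    """Old behaviour: the regression target itself is scaled by the weight."""
    return [r["direction"] * r["weight"] for r in rows]


def persisted_columns(rows, label=LABEL):
    """New behaviour: raw direction plus a separate <label>_weight column."""
    return {
        label: [r["direction"] for r in rows],
        f"{label}_weight": [r["weight"] for r in rows],
    }


def compose_sample_weight(recency, label_weight):
    """Assumed composition: element-wise product, rescaled to mean 1."""
    product = [a * b for a, b in zip(recency, label_weight)]
    mean = sum(product) / len(product)
    return [p / mean for p in product] if mean > 0 else product


columns = persisted_columns(rows)
sample_weight = compose_sample_weight([0.5, 1.0, 1.0], columns[f"{LABEL}_weight"])
```

With the separate column, the model still regresses on the raw direction, and the weight only changes how much each sample contributes to the loss.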
Jérôme Benoit [Sat, 23 May 2026 22:23:35 +0000 (00:23 +0200)]
fix(zigzag): default normalize to False to prevent label magnitude leak (#71)
When set_freqai_targets is invoked by FreqAI's backtesting loop, the dataframe
passed to _generate_extrema_label spans the full historical window
(right-truncated to the current train-window stop), not just train_period_days.
With normalize=True, zigzag applies a global minmax scaling across all detected
pivots in that wider window to amplitudes, amplitude_threshold_ratios,
volume_rates and speeds. The resulting label magnitudes therefore depend on the
global pivot distribution, including pivots outside the current training slice
— a magnitude leak from out-of-train data into training labels.
Switching the zigzag default to normalize=False emits raw log-amplitude values
(|log(P2/P1)|) and defers any scaling to LabelTransformer, which is fitted
strictly on the train slice and is therefore leak-free. The two existing call
sites — _generate_extrema_label (label generation) and label_objective (Optuna
hyperopt) — both want the unnormalized output, so the redundant
normalize=False kwargs are dropped at the call sites in favor of the default.
Strategy and regressor patch versions are bumped to 3.11.8.
Caveat: with apply_label_weighting strategy="combined", the metrics now sit on
heterogeneous scales (raw log-amplitudes ~[0.005, 0.5] mix with bounded ratios
in [0, 1] like efficiency_ratio). Users relying on "combined" aggregation
(power means, weighted_median, softmax) may need to introduce metric-specific
rescaling on the train slice before aggregation. Direction-only (strategy=
"none") and single-metric strategies (e.g. strategy="amplitude") are
unaffected.
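The leak described above can be reproduced with a toy example. This is a sketch under stated assumptions: the pivot prices are made up, and `minmax` is a plain min-max scaler standing in for the zigzag normalization, not the project's implementation.

```python
import math


def log_amplitude(p1, p2):
    """Raw log-amplitude |log(P2/P1)| between two pivot prices."""
    return abs(math.log(p2 / p1))


def minmax(values):
    """Plain min-max scaling to [0, 1] (stand-in for normalize=True)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up pivot pairs: one older pivot lies outside the training slice.
older_pivots = [(80.0, 100.0)]
train_pivots = [(100.0, 102.0), (102.0, 99.0), (99.0, 104.0)]

train_amps = [log_amplitude(a, b) for a, b in train_pivots]
all_amps = [log_amplitude(a, b) for a, b in older_pivots] + train_amps

# Scaling fitted on the train slice only versus on the wider window.
scaled_train_only = minmax(train_amps)
scaled_global = minmax(all_amps)[len(older_pivots):]
```

The raw values in `train_amps` are identical in both cases; only the scaled values change once the out-of-train pivot enters the min-max fit, which is the dependency the commit removes by emitting raw amplitudes.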
Jérôme Benoit [Fri, 1 May 2026 19:03:15 +0000 (21:03 +0200)]
fix: pin pandas>=3.0 in Dockerfile and bump version to 3.11.7
Prevent silent pandas downgrade to 2.x during pip install, which
causes dtype mismatches with freqtrade 2026.4 date handling code.
Includes epoch-ms range validation in ensure_datetime_series.
Jérôme Benoit [Fri, 1 May 2026 14:03:12 +0000 (16:03 +0200)]
fix: validate epoch-ms range before converting int64 date columns
Reject int64 values outside the epoch-ms range spanning 2010 to 2035 to fail fast
on corrupted data instead of silently producing wrong dates. Catches
nanosecond/microsecond values that would pass the int64 dtype check
but produce garbage timestamps if interpreted as milliseconds.
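A range check of this kind might look like the sketch below. The function name `validate_epoch_ms` is hypothetical, and the exact bounds are an assumption (January 1st of 2010 and of 2035, UTC); the message above only gives the years.

```python
from datetime import datetime, timezone


def _epoch_ms(year):
    """Epoch milliseconds for January 1st of the given year, UTC."""
    return int(datetime(year, 1, 1, tzinfo=timezone.utc).timestamp() * 1000)


# Assumed bounds: the commit states only the years 2010 and 2035.
EPOCH_MS_MIN = _epoch_ms(2010)
EPOCH_MS_MAX = _epoch_ms(2035)


def validate_epoch_ms(values):
    """Raise ValueError if any integer is outside the plausible epoch-ms range."""
    bad = [v for v in values if not (EPOCH_MS_MIN <= v <= EPOCH_MS_MAX)]
    if bad:
        raise ValueError(
            f"Invalid epoch-ms value {bad[0]!r}: expected a value in "
            f"[{EPOCH_MS_MIN}, {EPOCH_MS_MAX}]"
        )
    return values
```

A microsecond or nanosecond timestamp is three or six orders of magnitude larger than its millisecond equivalent, so it falls far above the upper bound and is rejected instead of being converted to a wrong date.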
Jérôme Benoit [Fri, 1 May 2026 13:46:19 +0000 (15:46 +0200)]
fix: add 30min stop_grace_period to prevent data corruption on shutdown
FreqAI training can take minutes to hours. Docker's default 10s grace
period causes SIGKILL mid-write, corrupting feather/pickle files.
Give freqtrade up to 30 minutes to finish training and flush data
before Docker sends SIGKILL.
Jérôme Benoit [Fri, 1 May 2026 10:32:33 +0000 (12:32 +0200)]
fix: align ensure_datetime_series with freqtrade data handler pattern
Chain .dt.as_unit("ms") to guarantee datetime64[ms, UTC] output
resolution regardless of pandas version, matching the contract
established in freqtrade commit 2c5dc72.
refactor: extract ensure_datetime_series helper for date dtype workaround
Centralizes the int64 epoch-ms vs datetime detection logic into a shared
helper. Handles both formats correctly: unit='ms' for int64, passthrough
for existing datetime columns.
fix: workaround freqtrade 2026.4 date column dtype regression
Freqtrade 2026.4 (commit 2c5dc72) changed feather/parquet handlers to
use .dt.as_unit("ms") instead of to_datetime(col, unit="ms", utc=True).
This breaks when data files store dates as int64 epoch-ms, causing
AttributeError in feature_engineering_standard.
Use pd.to_datetime(col, utc=True) defensively to handle both int64 and
datetime inputs.
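Taken together, the three messages above describe a helper along the following lines. This is a sketch, not the project's code: the function name carries a `_sketch` suffix to mark it as hypothetical, and it assumes pandas 2.0 or newer, where `Series.dt.as_unit` is available.

```python
import pandas as pd


def ensure_datetime_series_sketch(series: pd.Series) -> pd.Series:
    """Return a datetime64[ms, UTC] series from int64 epoch-ms or datetime input."""
    if pd.api.types.is_integer_dtype(series.dtype):
        # int64 input is interpreted as milliseconds since the Unix epoch.
        converted = pd.to_datetime(series, unit="ms", utc=True)
    else:
        # Existing datetime input is passed through and made UTC-aware.
        converted = pd.to_datetime(series, utc=True)
    # Pin the resolution so the output dtype does not depend on the input.
    return converted.dt.as_unit("ms")


from_int = ensure_datetime_series_sketch(pd.Series([1700000000000], dtype="int64"))
from_dt = ensure_datetime_series_sketch(
    pd.Series(pd.to_datetime(["2023-11-14 22:13:20"], utc=True))
)
```

Both inputs end up with the same dtype and the same timestamp, so downstream code can call `.dt` accessors without checking how the date column was stored.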
Jérôme Benoit [Tue, 31 Mar 2026 00:29:28 +0000 (02:29 +0200)]
docs: fix semantic accuracy of README configuration tunables
- polyorder: correct range from int >= 1 to int >= 0 (savgol accepts degree-0)
- robust standardization: replace 'IQR' with '(Q₃-Q₁)' (quantiles are configurable)
- label_weights: broaden scope from 'distance calculations to ideal point' to 'trial selection methods'
- label_p_order: replace 'p-order parameter for distance metrics' with 'Lp exponent for parameterized metrics'
- label_density_aggregation_param: replace 'p-order' with 'Lp exponent' for consistency
Jérôme Benoit [Thu, 12 Feb 2026 23:10:08 +0000 (00:10 +0100)]
fix(ReforceXY): add context-aware guard for efficiency coefficient division
Prevent division explosion in _compute_efficiency_coefficient() when
max_unrealized_profit ≈ min_unrealized_profit by requiring a minimum
meaningful range based on pnl_target. Also adds validation warnings
for potential_gamma=0 and pnl_target<=0 edge cases.
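The kind of guard described above can be sketched as follows. Everything here is illustrative: the function name, the ratio being computed, and the `min_range_fraction` parameter are assumptions, since the message states only that the minimum meaningful range is derived from pnl_target.

```python
from typing import Optional


def guarded_efficiency_ratio(
    pnl: float,
    max_unrealized_profit: float,
    min_unrealized_profit: float,
    pnl_target: float,
    min_range_fraction: float = 0.01,  # hypothetical tunable
) -> Optional[float]:
    """Position of pnl within the unrealized range, or None if the range is too small."""
    if pnl_target <= 0:
        # No meaningful scale to compare the range against.
        return None
    range_pnl = max_unrealized_profit - min_unrealized_profit
    if range_pnl < min_range_fraction * pnl_target:
        # max ≈ min: dividing would blow the ratio up, so skip it.
        return None
    return (pnl - min_unrealized_profit) / range_pnl
```

Tying the threshold to pnl_target rather than to a fixed epsilon keeps the guard meaningful across configurations with very different profit scales.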
Jérôme Benoit [Thu, 12 Feb 2026 14:17:10 +0000 (15:17 +0100)]
feat(ReforceXY): tune reward sensitivity and extend training period
- Increase pnl_amplification_sensitivity from 0.5 to 2.0 for stronger
reward signal differentiation
- Extend train_period_days from 60 to 120 for more training data
Jérôme Benoit [Mon, 9 Feb 2026 21:04:23 +0000 (22:04 +0100)]
fix(quickadapter): use Optuna params for TimeSeriesSplit gap calculation
Previously the gap was calculated from ft_params with a hardcoded default, which could yield incorrect values when the parameters had been optimized by Optuna. Also standardizes the log message format to use a [pair] prefix.
- Use test_size parameter in TimeSeriesSplit
- Remove unused dk parameter from _make_timeseries_split_datasets()
- Assign dk.data_dictionary = dd before logging
- Fix typo: train_test_test -> train_test_split in README
* docs: integrate data_split_parameters into tunables table
Remove standalone section and add parameters to existing table
with freqai. prefix for consistency.
* refactor: use FreqAI APIs for weight calculation and data dictionary
- Use dk.set_weights_higher_recent() instead of duplicating weight formula
- Use dk.build_data_dictionary() for consistent data structure
- Respects feature_parameters.weight_factor configuration
- Fix bug: was using data_kitchen_thread_count instead of weight_factor
* refactor: extract _apply_pipelines() to reduce code duplication
- Move pipeline definition and application logic to helper method
- Reduces train() override complexity while keeping same behavior
- Helper can be reused by future custom split implementations
* style: harmonize namespace and remove inline comments
- Rename DATA_SPLIT_METHODS to _DATA_SPLIT_METHODS (private tuple pattern)
- Reference DATA_SPLIT_METHOD_DEFAULT from _DATA_SPLIT_METHODS[0]
- Remove 22 inline comments to match self-documenting codebase style
* fix: align TimeSeriesSplit weight calculation with FreqAI semantics
Calculate weights on combined train+test set before splitting to maintain
temporal weight continuity, matching FreqAI's make_train_test_datasets behavior.
* feat: add gap=0 warning and improve TimeSeriesSplit validation
- Warn when gap=0 about look-ahead bias risk (reference label_period_candles)
- Add _compute_timeseries_min_samples() for accurate minimum sample calculation
- Account for gap and test_size in minimum sample validation
- Improve error message with all relevant parameters
* style: harmonize error messages with codebase conventions
- Use 'Invalid {param} value {value!r}: {constraint}' pattern
- Align with existing validation error format (lines 718, 1145)
* style: add cached set accessor for data split methods
- Add _data_split_methods_set() with @staticmethod @lru_cache
- Use QuickAdapterRegressorV3 prefix for class attribute access
- Use cached set for O(1) membership check in validation
* fix: address PR review comments for TimeSeriesSplit
- Use dd consistently in training logs instead of dk.data_dictionary
- Use self.data_split_parameters consistently in _apply_pipelines
- Add explicit type coercion for n_splits, gap, max_train_size
- Add validation for gap >= 0 and max_train_size >= 1
- Improve test_size validation: float in (0,1) as fraction, int >= 1 as count
- Fix _compute_timeseries_min_samples formula: (n_splits+1)*test_size + n_splits*gap
- Optimize tscv.split() iteration to avoid unnecessary list materialization
* fix: correct min_samples formula to match sklearn validation
sklearn validates: n_samples - gap - (test_size * n_splits) > 0
Correct formula: test_size * n_splits + gap + 1
* feat: auto-calculate TimeSeriesSplit gap from label_period_candles
When gap=0 is configured, automatically set gap to label_period_candles
to prevent look-ahead bias from overlapping label windows. This ensures
temporal separation between train and test sets without requiring manual
configuration.
* fix: remove redundant time import shadowing module
* fix: correct min_samples formula for dynamic test_size and document test_size param
* docs: clarify test_size default per split method
* refactor: move DependencyException import to file header
* style: use class name for class constant access
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* docs: use Python None instead of null in README
* docs: fix train_test_split description (sequential, not random)
* fix: use explicit None check for max_train_size validation
* docs: clarify timeseries_split as chronological split, not cross-validation
* refactor(quickadapter): shorten log prefixes and tailor empty test set error by split method
* refactor(quickadapter): use index pattern for timeseries_split method constant
Replace string literals with index access pattern following existing
codebase convention for _DATA_SPLIT_METHODS.
Also renames variables for semantic clarity:
- test_size_param -> test_size
- feat_dict -> feature_parameters
* refactor(quickadapter): use _TEST_SIZE constant instead of hardcoded 0.1
* chore(quickadapter): bump version to 3.11.2
* fix(quickadapter): restore test_size parameter in TimeSeriesSplit
The test_size variable from data_split_parameters was being
immediately overwritten by a type annotation line, making it
always None regardless of user configuration.
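The corrected minimum-sample formula and the automatic gap from the TimeSeriesSplit entries above can be checked with a few lines of arithmetic. This is a sketch with hypothetical function names; `sklearn_accepts` mirrors only the inequality quoted in the message and does not call scikit-learn.

```python
def min_samples(n_splits: int, test_size: int, gap: int) -> int:
    """Smallest n_samples satisfying n_samples - gap - test_size * n_splits > 0."""
    return test_size * n_splits + gap + 1


def sklearn_accepts(n_samples: int, n_splits: int, test_size: int, gap: int) -> bool:
    """The validation inequality quoted in the commit message."""
    return n_samples - gap - test_size * n_splits > 0


def effective_gap(gap: int, label_period_candles: int) -> int:
    """When gap=0 is configured, fall back to label_period_candles."""
    return label_period_candles if gap == 0 else gap


# Example values (made up): 5 splits, 100 test samples, gap derived from labels.
gap = effective_gap(0, label_period_candles=24)
needed = min_samples(n_splits=5, test_size=100, gap=gap)
```

With these values `needed` is the first sample count that passes the inequality, and one sample fewer fails it, which is the boundary the corrected formula is meant to capture.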