PR #95 kept the historical column name `<label>_known_at_index` for what is now a per-row label lookahead (in candles), so that the hotfix stayed strictly minimal. This PR moves the column suffix, the helper, the dataclass field, the static method, and the call-site locals onto the `known_at_lookahead` naming. It keeps a backward-compatible alias for the only public helper that other modules import by name (`label_known_at_column_name = label_known_at_lookahead_column_name`).
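The naming helper and its alias can be sketched as a standalone illustration. This is not the module's code. The sigil regex `^&-?` is an assumption inferred from the docstring examples (`&s-extrema` becomes `s-extrema_...`, `&-amplitude` becomes `amplitude_...`). The `ValueError` raised for sigil-only names is a hypothetical stand-in for error handling that the diff elides.

```python
import re
from typing import Final

# Assumption: strip a leading FreqAI "&" sigil plus an optional "-".
# The real _FREQAI_LABEL_SIGIL_PATTERN is not shown in this PR excerpt.
_SIGIL_PATTERN = re.compile(r"^&-?")
_LABEL_KNOWN_AT_LOOKAHEAD_SUFFIX: Final[str] = "_known_at_lookahead"


def _label_aux_column_name(label_col: str, suffix: str) -> str:
    """Strip the label sigil and append ``suffix`` (illustrative sketch)."""
    stripped = _SIGIL_PATTERN.sub("", label_col, count=1)
    if not stripped or not any(c.isalpha() for c in stripped):
        # Hypothetical: the real helper's error handling is elided in the diff.
        raise ValueError(f"invalid label column: {label_col!r}")
    return f"{stripped}{suffix}"


def label_known_at_lookahead_column_name(label_col: str) -> str:
    """Return the per-row label-lookahead column name for ``label_col``."""
    return _label_aux_column_name(label_col, _LABEL_KNOWN_AT_LOOKAHEAD_SUFFIX)


# Backward-compatible alias: both names bind the same function object,
# so existing imports of the old name keep working unchanged.
label_known_at_column_name = label_known_at_lookahead_column_name

print(label_known_at_column_name("&s-extrema"))  # s-extrema_known_at_lookahead
```

The alias is a plain module-level assignment, so `label_known_at_column_name is label_known_at_lookahead_column_name` holds and costs nothing at call time. A wrapper that emits `DeprecationWarning` would be the alternative if callers should be pushed to migrate.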
The auxiliary `<label>_known_at_*` column is regenerated inside `set_freqai_targets` on every training run. FreqAI persists the fitted model and its metadata (e.g. `extra_returns_per_train`) but never auxiliary dataframe columns, so the rename invalidates no on-disk artifact.
Reviewed in three parallel Oracle passes (math and claims coherence; modern Python idioms and naming harmonization; documentation, terminology, and PR-description coherence). Each pass cited upstream evidence from `freqtrade/freqai/freqai_interface.py`, `data_kitchen.py`, and `data_drawer.py`. Consensus fix applied: the README `causal_mode` formula now names the column token directly (`row-wise max(<label>_known_at_lookahead)`), so the symbol is defined where it is used.
The local variables in both causal guards were also renamed to the `train_<noun>` convention (`train_known_at_lookahead`, `train_known_at_position`) already used by the surrounding `_make_*_datasets` methods.
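The arithmetic shared by the two guards can be sketched in isolation. This simplified sketch assumes the train rows form a contiguous prefix of the frame and that the test split starts immediately after them. `known_at_lookahead` and `causal_keep_mask` are hypothetical stand-ins for `QuickAdapterRegressorV3._known_at_lookahead` and the `train_test_split` guard. Column names are passed explicitly rather than derived from `LABEL_COLUMNS`.

```python
from typing import Optional

import numpy as np
import pandas as pd


def known_at_lookahead(
    df: pd.DataFrame, lookahead_cols: list[str]
) -> Optional[pd.Series]:
    """Row-wise max of the usable per-label lookahead columns (in candles).

    Columns that are missing or contain any NaN are skipped; returns None
    when no column is usable so the caller keeps only the position purge.
    """
    usable = [
        pd.to_numeric(df[col], errors="raise")
        for col in lookahead_cols
        if col in df.columns and not df[col].isna().any()
    ]
    if not usable:
        return None
    return pd.concat(usable, axis=1).max(axis=1)


def causal_keep_mask(
    df: pd.DataFrame,
    n_train: int,
    label_horizon_candles: int,
    lookahead_cols: list[str],
) -> np.ndarray:
    """Boolean keep mask over the first ``n_train`` (train) rows of ``df``."""
    if not 0 < n_train < len(df):
        raise ValueError("need at least one train row and one test row")
    # Local positions are recomputed per frame, so they restart at 0 after slicing.
    positions = np.arange(len(df), dtype=np.int64)
    train_positions = positions[:n_train]
    first_test_position = int(positions[n_train])
    # Position-based purge: drop the last label_horizon_candles train rows.
    keep_mask = train_positions < first_test_position - label_horizon_candles
    lookahead = known_at_lookahead(df, lookahead_cols)
    if lookahead is not None:
        train_known_at_lookahead = lookahead.iloc[:n_train]
        # Local position at which each train row's label becomes known.
        train_known_at_position = train_positions + train_known_at_lookahead.to_numpy(
            dtype=np.int64
        )
        keep_mask &= train_known_at_position < first_test_position
    return keep_mask


cols = ["s-extrema_known_at_lookahead", "amplitude_known_at_lookahead"]
df = pd.DataFrame({cols[0]: [3] * 10, cols[1]: [1, 1, 1, 1, 1, 5, 1, 1, 1, 1]})
print(causal_keep_mask(df, 8, 2, cols).tolist())
# [True, True, True, True, True, False, False, False]
print(causal_keep_mask(df.iloc[2:], 6, 2, cols).tolist())
# [True, True, True, False, False, False]
```

Row 5 survives the position purge but is dropped because its row-wise max lookahead (5) reaches the first test position. Slicing off the first two rows leaves every per-row lookahead unchanged while local positions restart at 0, so the same original rows (2 to 4) survive. Absolute indexes stored before slicing would be off by the number of dropped rows, which is why the column holds lookaheads.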
| freqai.label_pipeline.gamma | 1.0 | float (0,10] | Contrast exponent applied to labels after normalization: values >1 emphasize extrema, values in (0, 1) soften them. |
| _Feature parameters_ | | | |
| freqai.feature_parameters.label_period_candles | min/max midpoint | int >= 1 | Zigzag labeling NATR horizon. |
-| freqai.feature_parameters.label_horizon_candles | `label_period_candles` | int >= 1 | Number of candles after a label row before the label is considered known by causal split guards. Recommended: cover the zigzag pivot confirmation lag (the smoothing kernel half-width is added automatically by `set_freqai_targets`). Used by causal split guards and `<label>_known_at_index` metadata. When unset, falls back to `label_period_candles`. |
-| freqai.feature_parameters.causal_mode | true | bool | Causal split guard toggle. When `true` (default): rejects `data_split_parameters.shuffle=true`, `shuffle_after_split=true`, `reverse_train_test_order=true`; for `timeseries_split` auto-sets `gap=label_horizon_candles` when unset/`0` (rejects explicit `gap<label_horizon_candles`); for `train_test_split` drops train rows where position `>=first_test_position-label_horizon_candles`; with `<label>_known_at_index` columns (per-row label lookahead in candles), additionally drops rows where `local_position + row-wise max(lookahead) >= first_test_position`. `false` is deprecated; acausal baselines only. |
+| freqai.feature_parameters.label_horizon_candles | `label_period_candles` | int >= 1 | Number of candles after a label row before causal split guards treat the label as known. Recommended: cover the zigzag pivot confirmation lag (`set_freqai_targets` adds the smoothing kernel half-width automatically). Also used as the base value of the `<label>_known_at_lookahead` metadata. When unset, falls back to `label_period_candles`. |
+| freqai.feature_parameters.causal_mode | true | bool | Causal split guard toggle. When `true` (default): rejects `data_split_parameters.shuffle=true`, `shuffle_after_split=true`, `reverse_train_test_order=true`; for `timeseries_split` auto-sets `gap=label_horizon_candles` when unset/`0` (rejects explicit `gap<label_horizon_candles`); for `train_test_split` drops train rows where position `>=first_test_position-label_horizon_candles`; with `<label>_known_at_lookahead` columns (per-row label lookahead in candles), additionally drops train rows where `local_position + row-wise max(<label>_known_at_lookahead) >= first_test_position`. `false` is deprecated; acausal baselines only. |
| freqai.feature_parameters.min_label_period_candles | 12 | int >= 1 | Minimum labeling NATR horizon used for reversals labeling HPO. |
| freqai.feature_parameters.max_label_period_candles | 24 | int >= 1 | Maximum labeling NATR horizon used for reversals labeling HPO. |
| freqai.feature_parameters.label_natr_multiplier | min/max midpoint | float > 0 | Zigzag labeling NATR multiplier. |
get_label_weighting_config,
get_min_max_label_period_candles,
get_optuna_study_model_parameters,
- label_known_at_column_name,
+ label_known_at_lookahead_column_name,
label_weight_column_name,
migrate_config,
optuna_load_best_params,
return
_KNOWN_AT_NONE_LOGGED.add(key)
logger.info(
- f"[{pair}] {context}: no <label>_known_at_index column present; "
+ f"[{pair}] {context}: no <label>_known_at_lookahead column present; "
"causal guards use position-based purge only (label-aware filtering disabled)"
)
return positions.loc[filtered_dataframe.index]
@staticmethod
- def _known_at_index(
+ def _known_at_lookahead(
filtered_dataframe: pd.DataFrame,
unfiltered_df: pd.DataFrame,
) -> pd.Series | None:
"""Per-row label lookahead (in candles) across all registered labels.
- See ``LabelData.known_at_index`` for the lookahead-vs-position
+ See ``LabelData.known_at_lookahead`` for the lookahead-vs-position
contract and the slice-invariance rationale; callers must add the
row's LOCAL position in ``unfiltered_df`` to recover the local
index at which the label becomes causally available.
- Row-wise ``max`` of every present ``<label>_known_at_index``
+ Row-wise ``max`` of every present ``<label>_known_at_lookahead``
column; labels with a missing column or any NaN are skipped
silently (opt-in by emission). Returns ``None`` when no label is
usable; callers then fall back to the position-based purge.
)
series_list: list[pd.Series] = []
for label_col in LABEL_COLUMNS:
- known_at_col = label_known_at_column_name(label_col)
- if known_at_col not in unfiltered_df.columns:
+ known_at_lookahead_col = label_known_at_lookahead_column_name(label_col)
+ if known_at_lookahead_col not in unfiltered_df.columns:
continue
- known_at = unfiltered_df.loc[filtered_dataframe.index, known_at_col]
- if known_at.isna().any():
+ lookahead = unfiltered_df.loc[
+ filtered_dataframe.index, known_at_lookahead_col
+ ]
+ if lookahead.isna().any():
continue
- series_list.append(pd.to_numeric(known_at, errors="raise"))
+ series_list.append(pd.to_numeric(lookahead, errors="raise"))
if not series_list:
return None
if len(series_list) == 1:
train_positions.to_numpy(dtype=np.int64)
< first_test_position - label_horizon_candles
)
- known_at_index = QuickAdapterRegressorV3._known_at_index(
+ known_at_lookahead = QuickAdapterRegressorV3._known_at_lookahead(
features, unfiltered_df
)
- if known_at_index is not None:
- known_at_train_delta = known_at_index.loc[train_features.index]
- known_at_train_position = (
+ if known_at_lookahead is not None:
+ train_known_at_lookahead = known_at_lookahead.loc[
+ train_features.index
+ ]
+ train_known_at_position = (
train_positions.to_numpy(dtype=np.int64)
- + known_at_train_delta.to_numpy(dtype=np.int64)
+ + train_known_at_lookahead.to_numpy(dtype=np.int64)
)
- keep_mask &= known_at_train_position < first_test_position
+ keep_mask &= train_known_at_position < first_test_position
else:
_log_known_at_none_once(dk.pair, "train_test_split causal guard")
(
)
first_test_position = int(row_positions.iloc[test_idx].min())
train_positions = row_positions.iloc[train_idx]
- known_at_index = QuickAdapterRegressorV3._known_at_index(
+ known_at_lookahead = QuickAdapterRegressorV3._known_at_lookahead(
filtered_dataframe, unfiltered_df
)
- if known_at_index is not None:
- known_at_train_delta = known_at_index.iloc[train_idx]
- known_at_train_position = (
+ if known_at_lookahead is not None:
+ train_known_at_lookahead = known_at_lookahead.iloc[train_idx]
+ train_known_at_position = (
train_positions.to_numpy(dtype=np.int64)
- + known_at_train_delta.to_numpy(dtype=np.int64)
+ + train_known_at_lookahead.to_numpy(dtype=np.int64)
)
- keep_mask = known_at_train_position < first_test_position
+ keep_mask = train_known_at_position < first_test_position
(
train_features,
train_labels,
get_label_smoothing_config,
get_label_weighting_config,
get_zl_ma_fn,
- label_known_at_column_name,
+ label_known_at_lookahead_column_name,
label_weight_column_name,
migrate_config,
nan_average,
dataframe[label_col] = label_data.series
- if label_data.known_at_index is not None:
- dataframe[label_known_at_column_name(label_col)] = (
- label_data.known_at_index
+ if label_data.known_at_lookahead is not None:
+ dataframe[label_known_at_lookahead_column_name(label_col)] = (
+ label_data.known_at_lookahead
)
label_weight_col = label_weight_column_name(label_col)
# Zero-phase smoothing reads future candles within the kernel
# half-width; extend the per-row label lookahead so causal
# split guards account for the smoothing lookahead.
- known_at_column = label_known_at_column_name(label_col)
- if known_at_column in dataframe.columns:
+ known_at_lookahead_column = label_known_at_lookahead_column_name(label_col)
+ if known_at_lookahead_column in dataframe.columns:
kernel_half_width = get_smoothing_kernel_half_width(
col_smoothing_config, series_length=series_length
)
if kernel_half_width > 0:
- dataframe[known_at_column] = (
- dataframe[known_at_column] + kernel_half_width
+ dataframe[known_at_lookahead_column] = (
+ dataframe[known_at_lookahead_column] + kernel_half_width
)
if label_col == EXTREMA_COLUMN:
EXTREMA_DIRECTION_SMOOTHED_COLUMN: Final[str] = "extrema_direction_smoothed"
EXTREMA_WEIGHT_COLUMN: Final[str] = "extrema_weight"
EXTREMA_WEIGHT_SMOOTHED_COLUMN: Final[str] = "extrema_weight_smoothed"
-# Suffix is historical; stored values are per-row label lookaheads
-# (in candles), not absolute indexes. See ``LabelData.known_at_index``.
-_LABEL_KNOWN_AT_SUFFIX: Final[str] = "_known_at_index"
+_LABEL_KNOWN_AT_LOOKAHEAD_SUFFIX: Final[str] = "_known_at_lookahead"
LABEL_WEIGHT_SUFFIX: Final[str] = "_weight"
Examples:
``("&s-extrema", "_weight")`` -> ``"s-extrema_weight"``
``("&-amplitude", "_weight")`` -> ``"amplitude_weight"``
- ``("&s-extrema", "_known_at_index")`` -> ``"s-extrema_known_at_index"``
+ ``("&s-extrema", "_known_at_lookahead")`` -> ``"s-extrema_known_at_lookahead"``
"""
stripped = _FREQAI_LABEL_SIGIL_PATTERN.sub("", label_col, count=1)
if not stripped or not any(c.isalpha() for c in stripped):
return _label_aux_column_name(label_col, LABEL_WEIGHT_SUFFIX)
-def label_known_at_column_name(label_col: str) -> str:
- """Return the per-row label-lookahead column name for a label column.
+def label_known_at_lookahead_column_name(label_col: str) -> str:
+ """Return the lookahead column name for ``label_col`` (see ``LabelData.known_at_lookahead``)."""
+ return _label_aux_column_name(label_col, _LABEL_KNOWN_AT_LOOKAHEAD_SUFFIX)
- Column values are lookaheads in candles, not absolute positions; see
- ``LabelData.known_at_index``.
- """
- return _label_aux_column_name(label_col, _LABEL_KNOWN_AT_SUFFIX)
+
+label_known_at_column_name = label_known_at_lookahead_column_name
@dataclass
series: per-row label values aligned to ``dataframe.index``.
indices: positions of detected pivots in ``series``.
metrics: per-pivot metric lists (parallel to ``indices``).
- known_at_index: optional per-row label lookahead in candles
+ known_at_lookahead: optional per-row label lookahead in candles
(NOT an absolute position). Invariant under
``dk.slice_dataframe``. Causal split guards recover the
local availability position as ``row_local_position +
- known_at_index[row]``. ``None`` opts the label out of
+ known_at_lookahead[row]``. ``None`` opts the label out of
label-aware causal filtering.
"""
series: pd.Series
indices: list[int]
metrics: dict[str, list[float]]
- known_at_index: pd.Series | None = None
+ known_at_lookahead: pd.Series | None = None
LabelGenerator = Callable[[pd.DataFrame, dict[str, Any], Logger | None], LabelData]
# freqtrade's ``dk.slice_dataframe`` runs AFTER ``set_freqai_targets``,
# so any pre-slice absolute position would no longer match the causal
# guard's local ``np.arange(len(unfiltered_df))`` coordinate system.
- known_at_index = pd.Series(
+ known_at_lookahead = pd.Series(
int(label_horizon_candles),
index=dataframe.index,
dtype=np.int64,
series=series,
indices=pivots_indices,
metrics=metrics,
- known_at_index=known_at_index,
+ known_at_lookahead=known_at_lookahead,
)
) -> int:
"""Half-width (in candles) of the smoothing kernel's lookahead.
- Equals the lookahead applied to ``known_at_index`` after smoothing.
+ Equals the lookahead applied to ``known_at_lookahead`` after smoothing.
Mirrors ``smooth()`` window normalization and short-series gating
via shared primitives (``get_odd_window``, ``get_even_window``,
``get_savgol_params``).