Codex inline review (P2) on PR #102 flagged that
`_optuna_journal_has_corrupt_tail` catches only
`json.JSONDecodeError` from `json.loads(last_line)`, but
`json.loads` raises `UnicodeDecodeError` (subclass of
`ValueError`, NOT of `JSONDecodeError`) when the trailing record
contains invalid UTF-8 bytes, a common crash pattern when a write
is cut off mid-multibyte before `fsync` completes. The exception
escapes the
helper, propagates out of `optuna_create_storage` (the helper
runs BEFORE the recoverable try/except), reaches
`optuna_create_study`'s broad outer handler, and reproduces the
silent-HPO-skip symptom under a different corruption class.

Empirically reproduced:

    >>> json.loads(b'\xc3\x28')
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in
    position 0: invalid continuation byte

Broaden the except clause from `json.JSONDecodeError` to
`ValueError` — the common parent of both `JSONDecodeError` and
`UnicodeDecodeError` — so the helper treats any
`json.loads`-unparseable trailing record as corruption and routes
it through the same quarantine path. Other `ValueError` subclasses
are not plausibly raised by `json.loads` on bytes input.
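
The difference between the two except clauses can be sketched as follows. This is a minimal illustration, not the helper from the PR: `parse_narrow`, `parse_broad`, and the two sample records are hypothetical names introduced only for this example.

```python
import json

# Invalid UTF-8: 0xc3 opens a two-byte sequence, 0x28 is not a continuation byte.
BAD_UTF8 = b'{"op": "\xc3\x28"}'
# Valid UTF-8 but malformed JSON (record cut off mid-object).
TRUNCATED = b'{"op": 1'


def parse_narrow(raw: bytes) -> str:
    """Classify a record while catching only json.JSONDecodeError."""
    try:
        json.loads(raw)
    except json.JSONDecodeError:
        return "corrupt"
    return "ok"


def parse_broad(raw: bytes) -> str:
    """Classify a record while catching ValueError, the common parent."""
    try:
        json.loads(raw)
    except ValueError:
        return "corrupt"
    return "ok"


if __name__ == "__main__":
    print(parse_broad(TRUNCATED))  # JSONDecodeError is caught
    print(parse_broad(BAD_UTF8))   # UnicodeDecodeError is caught too
    try:
        parse_narrow(BAD_UTF8)
    except UnicodeDecodeError as exc:
        # The narrow clause lets the decode error escape to the caller.
        print("escaped:", type(exc).__name__)
```

Both functions agree on the truncated record; only the broad clause classifies the invalid UTF-8 record instead of raising.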
Reproducer at `/tmp/quickadapter-tests/test_optuna_journal_quarantine.py`
extended from 17 to 19 scenarios (Class C4 detection + end-to-end
quarantine on invalid UTF-8). All pass.
Bounded tail probe (last ``_OPTUNA_JOURNAL_TAIL_PROBE_BYTES``).
Return True iff the file is non-empty AND its trailing record
is (a) missing the newline, (b) empty (bare ``\\n``), or
- (c) malformed JSON. Fail-open when the probe window cuts a
- single line larger than the window; defer to the
+ (c) ``json.loads``-unparseable (malformed JSON raises
+ ``JSONDecodeError``, invalid UTF-8 raises ``UnicodeDecodeError``;
+ both are ``ValueError`` subclasses). Fail-open when the probe window
+ cuts a single line larger than the window; defer to the
post-construction handler.
"""
if not journal_path.exists():
return True
try:
json.loads(last_line)
- except json.JSONDecodeError:
+ except ValueError:
return True
return False
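
The bounded tail probe described in the docstring might look roughly like the sketch below. It is written under stated assumptions and is not the code from the PR: `journal_has_corrupt_tail` and `TAIL_PROBE_BYTES` are hypothetical stand-ins, the window size is arbitrary, and a missing or empty file is treated as "not corrupt" here, following the docstring's "non-empty" wording.

```python
import json
from pathlib import Path

TAIL_PROBE_BYTES = 4096  # hypothetical window size, chosen for illustration


def journal_has_corrupt_tail(journal_path: Path,
                             probe_bytes: int = TAIL_PROBE_BYTES) -> bool:
    """Return True if the last record of a newline-delimited JSON file looks corrupt."""
    # Assumption: a missing or empty file has no trailing record to judge.
    if not journal_path.exists():
        return False
    size = journal_path.stat().st_size
    if size == 0:
        return False

    # Read at most probe_bytes from the end of the file.
    with journal_path.open("rb") as fh:
        fh.seek(max(0, size - probe_bytes))
        tail = fh.read()

    # (a) The trailing record is missing its newline terminator.
    if not tail.endswith(b"\n"):
        return True

    body = tail[:-1]
    newline_at = body.rfind(b"\n")
    # Fail open: the window starts inside a single line larger than the
    # window, so the start of the record is not visible from here.
    if newline_at == -1 and size > len(tail):
        return False

    last_line = body[newline_at + 1:]
    # (b) The trailing record is empty (a bare newline).
    if not last_line.strip():
        return True

    # (c) The trailing record cannot be parsed by json.loads. ValueError
    # covers both json.JSONDecodeError and UnicodeDecodeError.
    try:
        json.loads(last_line)
    except ValueError:
        return True
    return False
```

Parsing only the bytes after the last interior newline keeps the check independent of where the window happens to start, so a window that begins mid-character in an earlier record does not trigger a false positive.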