Make recovery_target_time reloadable without server restart#22
Detailed plan covering GUC context change from PGC_POSTMASTER to PGC_SIGHUP, fixing assign hook error-throwing, re-parsing timestamps on reload, handling paused-recovery resume, backward target changes, and test strategy. https://claude.ai/code/session_01MWrPG3xKvaiEEXmrCrTDLE
Key changes: split into two-patch series, fixed unsafe DirectFunctionCall3 in check hook, replaced goto with outer while loop, added explicit RecoveryPauseReason state, check raw GUC strings instead of derived enum, simplified backward-target semantics, added pg_control/crash-recovery and recovery_min_apply_delay analysis, expanded test plan to 14 cases. https://claude.ai/code/session_01MWrPG3xKvaiEEXmrCrTDLE
This patch enables changing recovery_target_time through pg_reload_conf() without requiring a full server restart, improving operational flexibility for standby servers during point-in-time recovery (PITR).

Changes:
- Change recovery_target_time GUC context from PGC_POSTMASTER to PGC_SIGHUP
- Move mutual-exclusion validation from assign hook to check hook, using raw GUC string inspection to avoid order-dependent intermediate state during SIGHUP processing
- Add parse_recovery_target_time_safe() shared helper for safe timestamp parsing without ereport(ERROR) in check hooks
- Simplify assign hook to be purely mechanical (no error throwing)
- Replace DirectFunctionCall3(timestamptz_in) in validateRecoveryParameters() with the shared safe parser
- Add RecoveryPauseReason enum to distinguish "paused at target" from "paused via pg_wal_replay_pause()"
- Add target-change detection in recoveryPausesHere(): when recovery_target_time advances past the paused position, recovery automatically resumes toward the new target
- Add goto redo mechanism in PerformWalRecovery() to re-enter the replay loop when the target is advanced during a pause
- Ensure pg_wal_replay_resume() still proceeds to promotion (not re-enter replay) by clearing the pause reason on manual resume

The feature is scoped to recovery_target_time with recovery_target_action='pause'. Other recovery_target_* parameters remain PGC_POSTMASTER. Changing target type still requires restart.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
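The mutual-exclusion check described in this commit can be illustrated with a small standalone model. This is only a sketch, not the patch's code: the struct RawRecoveryTargets and the functions count_set_targets() and recovery_targets_are_exclusive() are hypothetical names, and the sketch assumes the check only needs to count how many raw target strings are non-empty. Working on the raw strings rather than a derived enum means the result does not depend on the order in which values are processed during a reload, and the function reports failure through its return value instead of raising an error.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical snapshot of the raw recovery target GUC strings as they
 * would look after the proposed new value has been applied.
 * These are illustrative fields, not PostgreSQL's actual variables.
 */
typedef struct RawRecoveryTargets
{
    const char *target;       /* recovery_target ("immediate" or empty) */
    const char *target_lsn;   /* recovery_target_lsn */
    const char *target_name;  /* recovery_target_name */
    const char *target_time;  /* recovery_target_time */
    const char *target_xid;   /* recovery_target_xid */
} RawRecoveryTargets;

/* A raw setting counts as "set" when it is neither NULL nor empty. */
static bool
is_set(const char *value)
{
    return value != NULL && value[0] != '\0';
}

/* Count how many target settings are set at the same time. */
int
count_set_targets(const RawRecoveryTargets *t)
{
    return (int) is_set(t->target)
         + (int) is_set(t->target_lsn)
         + (int) is_set(t->target_name)
         + (int) is_set(t->target_time)
         + (int) is_set(t->target_xid);
}

/*
 * Check-hook style validation: return false to reject the new value
 * instead of throwing an error. At most one target may be set.
 */
bool
recovery_targets_are_exclusive(const RawRecoveryTargets *t)
{
    return count_set_targets(t) <= 1;
}
```

In this model a reload that leaves both a time target and an LSN target set would be rejected by the check, and the previous values would stay in effect.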
Add comprehensive TAP test (053_recovery_target_time_reload.pl) covering:
- Pause at target T1, advance to T2 via reload, verify resume and re-pause
- Reload with same target time (no-op verification)
- Reload with earlier target time (no-op verification)
- pg_wal_replay_resume() proceeds to promotion, not re-enter replay
- Mutual exclusion of recovery target types at startup

Update config.sgml documentation to reflect that recovery_target_time is now reloadable via SIGHUP, including timezone re-parsing semantics and the automatic resume behavior when target is advanced during pause.

Update implementation plan with completion status checkboxes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implementation Complete: All Tests Pass ✅
What was implemented
Commit 1: Core implementation (
Commit 2: Tests and docs (
Files changed
Docker test evidence
Built from source and ran TAP tests in
Test scenarios verified
Design decisions (per spec v2)
🤖 Generated with Claude Code
Prior Art Research: No Previous Attempts Found
Searched PostgreSQL mailing lists (pgsql-hackers, pgsql-general), commitfest, and community resources. No prior patch or proposal exists for making
Related history
Conclusion
Strong precedent exists for loosening recovery GUCs case by case. The original conservatism was explicitly framed as temporary. This patch is the first to address the
🤖 Generated with Claude Code
…line)

Make these additional parameters reloadable via SIGHUP without restart:
- recovery_target (immediate)
- recovery_target_lsn
- recovery_target_xid
- recovery_target_name
- recovery_target_inclusive
- recovery_target_action

recovery_target_timeline intentionally stays PGC_POSTMASTER as changing timeline mid-recovery is unsafe.

Changes:
- Change GUC context from PGC_POSTMASTER to PGC_SIGHUP for 6 parameters
- Add target_type_conflict_exists() checks to all check hooks
- Remove error_multiple_recovery_targets() (no longer needed)
- Simplify all assign hooks to be purely mechanical
- Generalize recoveryPausesHere() to detect target changes for all types: TIME (forward), LSN (forward), XID (any change), NAME (any change)
- Handle recovery_target_action change during pause (pause→promote triggers promotion, pause→shutdown triggers shutdown)
- Extend TAP test from 9 to 14 assertions covering LSN target advance, action change, and inclusive toggle scenarios
- Fix timeline contamination in tests by using recovery_target_timeline = 'current' for standbys created after earlier promotions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
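The per-type change detection listed in this commit (TIME forward, LSN forward, XID any change, NAME any change) can be sketched as a standalone comparison. The types TargetKind and TargetSnapshot and the function target_change_resumes_replay() are hypothetical names for illustration, not the patch's actual definitions. The sketch assumes the comparison is made between the target in effect when the pause began and the target after the reload, and it does not model a switch from one target type to another.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical target kinds for this sketch. */
typedef enum TargetKind
{
    TARGET_TIME,
    TARGET_LSN,
    TARGET_XID,
    TARGET_NAME
} TargetKind;

/* Hypothetical snapshot of one recovery target value. */
typedef struct TargetSnapshot
{
    TargetKind  kind;
    int64_t     time_us;  /* timestamp in microseconds, for TARGET_TIME */
    uint64_t    lsn;      /* WAL position, for TARGET_LSN */
    uint32_t    xid;      /* transaction id, for TARGET_XID */
    const char *name;     /* restore point name, for TARGET_NAME */
} TargetSnapshot;

/*
 * Decide whether a reloaded target should wake a paused recovery.
 * TIME and LSN resume only when the target moves forward; XID and NAME
 * resume on any change, because those values have no useful ordering.
 */
bool
target_change_resumes_replay(const TargetSnapshot *paused_at,
                             const TargetSnapshot *reloaded)
{
    /* Assumption: a change of target kind is out of scope for this sketch. */
    if (paused_at->kind != reloaded->kind)
        return false;

    switch (reloaded->kind)
    {
        case TARGET_TIME:
            return reloaded->time_us > paused_at->time_us;
        case TARGET_LSN:
            return reloaded->lsn > paused_at->lsn;
        case TARGET_XID:
            return reloaded->xid != paused_at->xid;
        case TARGET_NAME:
            return strcmp(reloaded->name ? reloaded->name : "",
                          paused_at->name ? paused_at->name : "") != 0;
    }
    return false;
}
```

XID and NAME are treated as "any change" here because transaction ids do not commit in numeric order and restore point names carry no ordering at all, so a forward-only test would not be meaningful for them.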
Phase 2 Complete: All recovery_target_* Parameters Now Reloadable ✅
Scope expanded
In addition to
Key implementation details
Docker test evidence
14 assertions across 8 test scenarios:
Prior art
No previous attempt to make any
🤖 Generated with Claude Code
Summary
This change makes the recovery_target_time parameter reloadable via SIGHUP without requiring a full server restart, improving operational flexibility for standby/replica servers during point-in-time recovery operations.
Key Changes
- GUC Context Change: Changed recovery_target_time from PGC_POSTMASTER to PGC_SIGHUP context, allowing dynamic reload via pg_reload_conf() or a SIGHUP signal
- Validation Hook Refactoring: Moved mutual-exclusion validation from the assign hook (which cannot safely throw errors in PGC_SIGHUP context) to the check hook, which can safely reject invalid values
- Dynamic Timestamp Parsing: Modified the assign hook to re-parse recovery_target_time_string into recoveryTargetTime on each reload, ensuring the parsed timestamp is always in sync with the GUC value
- Resume-on-Change Logic: Added a mechanism to automatically resume recovery when recovery_target_time is changed to a later value while recovery is paused at the previous target, eliminating the need for manual intervention
- Backward Target Protection: Added validation to prevent setting recovery_target_time to an earlier value than already-replayed WAL during active recovery
- Documentation & Tests: Updated configuration documentation and added comprehensive TAP tests covering reload scenarios, edge cases, and interaction with recovery_target_action
Implementation Details
The solution maintains backward compatibility by:
- Keeping other recovery_target_* parameters as PGC_POSTMASTER (preventing target type switches without restart)
The most significant architectural change is the resume-on-change logic in the pause handler, which detects when the recovery target has been updated to a point beyond the current replay position and automatically resumes replay to reach the new target.
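The interaction between the pause reason and the resume-on-change logic can be sketched as a small decision function. The names PauseReason, NextStep, and next_step_after_wakeup() are hypothetical, and the sketch assumes timestamps are compared as plain microsecond counts. It models three cases from this description: a manual resume while paused at the target ends recovery, a manual resume after a user-requested pause continues replay, and a target moved beyond the replay position resumes replay without intervention.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pause reasons, modeled on the distinction the patch describes. */
typedef enum PauseReason
{
    PAUSE_NONE,
    PAUSE_AT_TARGET,   /* recovery reached its target and paused */
    PAUSE_MANUAL       /* user paused replay explicitly */
} PauseReason;

/* Hypothetical outcomes of one wakeup of the pause loop. */
typedef enum NextStep
{
    STEP_STAY_PAUSED,
    STEP_CONTINUE_REPLAY,
    STEP_END_RECOVERY
} NextStep;

/*
 * Decide what the paused startup process does next. The pause reason is
 * cleared whenever the pause ends, so a manual resume at the target leads
 * to the end of recovery rather than back into the replay loop.
 */
NextStep
next_step_after_wakeup(PauseReason *reason,
                       bool resume_requested,
                       int64_t reloaded_target_us,
                       int64_t last_replayed_us)
{
    if (resume_requested)
    {
        bool was_at_target = (*reason == PAUSE_AT_TARGET);

        *reason = PAUSE_NONE;
        return was_at_target ? STEP_END_RECOVERY : STEP_CONTINUE_REPLAY;
    }

    /* Target advanced past the replay position: resume toward it. */
    if (*reason == PAUSE_AT_TARGET && reloaded_target_us > last_replayed_us)
    {
        *reason = PAUSE_NONE;
        return STEP_CONTINUE_REPLAY;
    }

    return STEP_STAY_PAUSED;
}
```

Reloading with the same or an earlier target leaves the function in the STEP_STAY_PAUSED branch, which matches the no-op behavior the tests are described as verifying.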
https://claude.ai/code/session_01MWrPG3xKvaiEEXmrCrTDLE