app_main: fix crash in OnSrpClientNotification on FailSafe revert#43
app_main: fix crash in OnSrpClientNotification on FailSafe revert#43
Conversation
Two bugs in srpServiceRemove(): 1. kFabricRemoved fires for uncommitted (FailSafe-reverted) pending fabric entries, not only for true "Remove Device" decommissions. The function was unconditionally clearing the SRP service even when a committed fabric still existed, wiping the operational advertisement during every failed re-commissioning attempt. Fix: check FabricTable for committed entries before clearing; return early if any remain. 2. otSrpClientRemoveHostAndServices() triggers the CHIP SDK's own OnSrpClientNotification callback (registered internally via otSrpClientSetCallback). That callback crashes at GenericThreadStackManagerImpl_OpenThread.hpp:1310 with MEPC=0x00000000 (null function pointer) when it receives a removed service that was never registered through the CHIP SDK's internal DNS-SD layer. The ESP32 DNS-SD layer uses esp_mdns, not SRP, so our manually registered _matter._tcp service has no matching internal callback record. Fix: switch to otSrpClientClearHostAndServices() which discards local SRP state immediately with no network message and no callback. The OTBR holds the record until its SRP lease expires, which is acceptable since the instance name is fabric-specific and a new commissioner always sees the new fabric's name. https://claude.ai/code/session_01SxqAiQApiRTwXefk9aQ6bR
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c5d8826e08
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| ESP_LOGI(kTag, "SRP: withdrew _matter._tcp service '%s'", s_srp.instanceName); | ||
| // Clear local SRP state without sending a network unregister message. | ||
| // Avoids triggering OnSrpClientNotification with our non-SDK service. | ||
| otSrpClientClearHostAndServices(ot); |
There was a problem hiding this comment.
Preserve SRP host info when clearing SRP services
Replacing otSrpClientRemoveHostAndServices() with otSrpClientClearHostAndServices() clears all host info in OpenThread, not just the service list. In this file, the host name is only configured once during startup (otSrpClientSetHostName() in app_main), and it is never re-set after srpServiceRemove(). After a real decommission event, the next recommission attempt in the same uptime can add a service object but lacks host info for SRP registration, so operational _matter._tcp advertisement may never come back until reboot.
Useful? React with 👍 / 👎.
Two bugs in srpServiceRemove():
kFabricRemoved fires for uncommitted (FailSafe-reverted) pending fabric entries, not only for true "Remove Device" decommissions. The function was unconditionally clearing the SRP service even when a committed fabric still existed, wiping the operational advertisement during every failed re-commissioning attempt. Fix: check FabricTable for committed entries before clearing; return early if any remain.
otSrpClientRemoveHostAndServices() triggers the CHIP SDK's own OnSrpClientNotification callback (registered internally via otSrpClientSetCallback). That callback crashes at GenericThreadStackManagerImpl_OpenThread.hpp:1310 with MEPC=0x00000000 (null function pointer) when it receives a removed service that was never registered through the CHIP SDK's internal DNS-SD layer. The ESP32 DNS-SD layer uses esp_mdns, not SRP, so our manually registered _matter._tcp service has no matching internal callback record. Fix: switch to otSrpClientClearHostAndServices() which discards local SRP state immediately with no network message and no callback. The OTBR holds the record until its SRP lease expires, which is acceptable since the instance name is fabric-specific and a new commissioner always sees the new fabric's name.