Skip to content

app_main: fix crash in OnSrpClientNotification on FailSafe revert#43

Merged
GuvHas merged 1 commit intomainfrom
claude/review-esp32-thread-code-EsKdn
Mar 20, 2026
Merged

app_main: fix crash in OnSrpClientNotification on FailSafe revert#43
GuvHas merged 1 commit intomainfrom
claude/review-esp32-thread-code-EsKdn

Conversation

@GuvHas
Copy link
Copy Markdown
Owner

@GuvHas GuvHas commented Mar 19, 2026

Two bugs in srpServiceRemove():

  1. kFabricRemoved fires for uncommitted (FailSafe-reverted) pending fabric entries, not only for true "Remove Device" decommissions. The function was unconditionally clearing the SRP service even when a committed fabric still existed, wiping the operational advertisement during every failed re-commissioning attempt. Fix: check FabricTable for committed entries before clearing; return early if any remain.

  2. otSrpClientRemoveHostAndServices() triggers the CHIP SDK's own OnSrpClientNotification callback (registered internally via otSrpClientSetCallback). That callback crashes at GenericThreadStackManagerImpl_OpenThread.hpp:1310 with MEPC=0x00000000 (null function pointer) when it receives a removed service that was never registered through the CHIP SDK's internal DNS-SD layer. The ESP32 DNS-SD layer uses esp_mdns, not SRP, so our manually registered _matter._tcp service has no matching internal callback record. Fix: switch to otSrpClientClearHostAndServices() which discards local SRP state immediately with no network message and no callback. The OTBR holds the record until its SRP lease expires, which is acceptable since the instance name is fabric-specific and a new commissioner always sees the new fabric's name.

Two bugs in srpServiceRemove():

1. kFabricRemoved fires for uncommitted (FailSafe-reverted) pending fabric
   entries, not only for true "Remove Device" decommissions.  The function
   was unconditionally clearing the SRP service even when a committed fabric
   still existed, wiping the operational advertisement during every failed
   re-commissioning attempt.  Fix: check FabricTable for committed entries
   before clearing; return early if any remain.

2. otSrpClientRemoveHostAndServices() triggers the CHIP SDK's own
   OnSrpClientNotification callback (registered internally via
   otSrpClientSetCallback).  That callback crashes at
   GenericThreadStackManagerImpl_OpenThread.hpp:1310 with MEPC=0x00000000
   (null function pointer) when it receives a removed service that was never
   registered through the CHIP SDK's internal DNS-SD layer.  The ESP32
   DNS-SD layer uses esp_mdns, not SRP, so our manually registered
   _matter._tcp service has no matching internal callback record.
   Fix: switch to otSrpClientClearHostAndServices() which discards local
   SRP state immediately with no network message and no callback.  The OTBR
   holds the record until its SRP lease expires, which is acceptable since
   the instance name is fabric-specific and a new commissioner always sees
   the new fabric's name.

https://claude.ai/code/session_01SxqAiQApiRTwXefk9aQ6bR
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c5d8826e08

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

ESP_LOGI(kTag, "SRP: withdrew _matter._tcp service '%s'", s_srp.instanceName);
// Clear local SRP state without sending a network unregister message.
// Avoids triggering OnSrpClientNotification with our non-SDK service.
otSrpClientClearHostAndServices(ot);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve SRP host info when clearing SRP services

Replacing otSrpClientRemoveHostAndServices() with otSrpClientClearHostAndServices() clears all host info in OpenThread, not just the service list. In this file, the host name is only configured once during startup (otSrpClientSetHostName() in app_main), and it is never re-set after srpServiceRemove(). After a real decommission event, the next recommission attempt in the same uptime can add a service object but lacks host info for SRP registration, so operational _matter._tcp advertisement may never come back until reboot.

Useful? React with 👍 / 👎.

@GuvHas GuvHas merged commit 68cebe4 into main Mar 20, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants