app_main: fix commissioning failure — SRP record not updated after ad… by GuvHas · Pull Request #44 · GuvHas/tr

GuvHas · 2026-03-20T07:40:26Z

During commissioning the controller's "Reconnect" step (step 18) needs to resolve the device's NEW operational address via mDNS before it can open a CASE session. The mDNS record is sourced from the OTBR's SRP proxy.

Root cause: trySrpServiceAdd() ran when Thread first attached (via the onThreadStateChanged callback) and built the SRP instance name from the FIRST fabric in the table at that time (either an old committed fabric from a previous session, or nothing if no fabric existed yet). When addNoc subsequently committed a new fabric with a different CompressedFabricId / NodeId, the OTBR SRP record still carried the old instance name. DNS-SD resolution for the new -._matter._tcp name timed out (~35 s) and commissioning failed with "something went wrong".

Fix: add a FabricTable::Delegate (SrpFabricDelegate) that overrides OnFabricCommitted. It fires synchronously on the CHIP/OpenThread task immediately after addNoc persists the new fabric, giving enough time to update the SRP record before the CASE reconnect DNS-SD lookup is issued.

The delegate:

Clears any stale SRP service (otSrpClientClearHostAndServices — avoids the OnSrpClientNotification crash fixed in the previous commit).
Calls trySrpServiceAdd() with the specific newly committed FabricIndex so FindFabricWithIndex() picks the right fabric even when an older fabric still occupies index 1.

trySrpServiceAdd() gains an optional preferIndex parameter for this path; the onThreadStateChanged / startup paths continue to use the first-fabric fallback as before.

The delegate is registered in the startup ScheduleWork after chip::Server is initialised.

…dNoc During commissioning the controller's "Reconnect" step (step 18) needs to resolve the device's NEW operational address via mDNS before it can open a CASE session. The mDNS record is sourced from the OTBR's SRP proxy. Root cause: trySrpServiceAdd() ran when Thread first attached (via the onThreadStateChanged callback) and built the SRP instance name from the FIRST fabric in the table at that time (either an old committed fabric from a previous session, or nothing if no fabric existed yet). When addNoc subsequently committed a new fabric with a different CompressedFabricId / NodeId, the OTBR SRP record still carried the old instance name. DNS-SD resolution for the new <CFID>-<NodeId>._matter._tcp name timed out (~35 s) and commissioning failed with "something went wrong". Fix: add a FabricTable::Delegate (SrpFabricDelegate) that overrides OnFabricCommitted. It fires synchronously on the CHIP/OpenThread task immediately after addNoc persists the new fabric, giving enough time to update the SRP record before the CASE reconnect DNS-SD lookup is issued. The delegate: 1. Clears any stale SRP service (otSrpClientClearHostAndServices — avoids the OnSrpClientNotification crash fixed in the previous commit). 2. Calls trySrpServiceAdd() with the specific newly committed FabricIndex so FindFabricWithIndex() picks the right fabric even when an older fabric still occupies index 1. trySrpServiceAdd() gains an optional preferIndex parameter for this path; the onThreadStateChanged / startup paths continue to use the first-fabric fallback as before. The delegate is registered in the startup ScheduleWork after chip::Server is initialised. https://claude.ai/code/session_01SxqAiQApiRTwXefk9aQ6bR

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 22a77f48a0

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-03-20T07:43:45Z

main/app_main.cpp

+        // Use Clear (not RemoveHostAndServices) — see srpServiceRemove() for why.
+        if (s_srp.added) {
+            s_srp.added = false;
+            otSrpClientClearHostAndServices(ot);


Reapply SRP host config after clearing host/services

In SrpFabricDelegate::OnFabricCommitted, calling otSrpClientClearHostAndServices() drops the SRP host metadata along with services, but the follow-up path only calls trySrpServiceAdd() (which just adds a service). That means recommissioning cases where s_srp.added is already true can lose the hostname/host-address state set at startup and fail to publish the new operational _matter._tcp instance, so DNS-SD/CASE reconnect can still time out.

Useful? React with 👍 / 👎.

otSrpClientClearHostAndServices() resets the SRP client's host state (hostname pointer → null, auto-host-address mode disabled) in addition to removing services. The previous commit called it in SrpFabricDelegate::OnFabricCommitted but then only called trySrpServiceAdd(), which goes straight to otSrpClientAddService(). Without a configured host, the SRP client rejects AddService or silently fails to send an update, so the new <CFID>-<NodeId>._matter._tcp record never reaches OTBR and CASE reconnect still times out. Fix: - Extract setupSrpHost(otInstance*): sets hostname and re-enables auto-host-address mode using the buffer in s_srp. - Move the hostname buffer from a lambda-local static into SrpCtx (s_srp.hostname) so setupSrpHost() and the delegate share one buffer. - Call setupSrpHost() in OnFabricCommitted immediately after Clear and before trySrpServiceAdd(). - Startup lambda uses setupSrpHost() too, eliminating the duplicate snprintf/SetHostName/EnableAutoHostAddress block. https://claude.ai/code/session_01SxqAiQApiRTwXefk9aQ6bR

chatgpt-codex-connector bot reviewed Mar 20, 2026

View reviewed changes

GuvHas closed this Mar 20, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

app_main: fix commissioning failure — SRP record not updated after ad…#44

app_main: fix commissioning failure — SRP record not updated after ad…#44
GuvHas wants to merge 2 commits intomainfrom
claude/review-esp32-thread-code-EsKdn

GuvHas commented Mar 20, 2026

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

chatgpt-codex-connector bot Mar 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

GuvHas commented Mar 20, 2026

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Mar 20, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants