dax: use atomic flags to avoid race conditions by checkupup · Pull Request #10580 · thesofproject/sof

checkupup · 2026-02-26T06:07:37Z

In DP domain, sof_dax_process runs in a DP thread, other sof_dax_* functions run in a different LL thread. Using atomic flags instead of original dax_ctx->update_flags to avoid potential race conditions when updating flags.

Since dax_inf.h should not have any dependencies on SOF, we define the new atomic flags in dax.c

sofci · 2026-02-26T06:08:21Z

Can one of the admins verify this patch?

reply test this please to run this test once

checkupup · 2026-02-26T06:10:07Z

Hello @johnylin76 @lgirdwood , let us continue our discussion on race condition issue here.

wjablon1 · 2026-02-26T11:22:45Z

src/audio/module_adapter/module/dolby/dax.c

+	DAX_FLAG_READ_AND_CLEAR,
+};
+
+static int32_t flag_process(struct dax_adapter_data *adapter_data,


I think zephyr already has an implementation of atomic mask (e.g. atomic_test_and_set_bit). Did you consider it?

Yes, Zephyr provides a wealth of compelling bitwise operations functions, but for detailed reasons, please refer to #10580 (comment)

wjablon1 · 2026-02-26T11:26:22Z

src/audio/module_adapter/module/dolby/dax.c

+{
+	switch (opt_mode) {
+	case DAX_FLAG_READ:
+		return atomic_read(&adapter_data->proc_flags[ffs(flag) - 1]);


You could just define indexes of individual bits instead of masks:
define DAX_DEVICE_MASK 0x4 -> define DAX_DEVICE_IDX 2
This would eliminate ffs

I prefer to preserve the bitwise operation characteristics of masks, which allows me to:

Retain the original code logic as much as possible

Conveniently pass multiple masks within the same variable when needed, e.g., masks = (DAX_DEVICE_MASK | DAX_PROFILE_MASK)

Easily modify the code to use atomic_or and atomic_and operations if we fully transition to the Zephyr environment in the future

src/audio/module_adapter/module/dolby/dax.c

lgirdwood · 2026-02-26T13:43:44Z

src/audio/module_adapter/module/dolby/dax.c


+struct dax_adapter_data {
+	struct sof_dax dax_ctx;
+	atomic_t proc_flags[32];


@checkupup are you able to provide a little more context, as I can now see an array of data instead of a global flag. My understanding was that we would only have 2 instances and had to protect a single shared resource between both DP instances ?
i.e. we can use https://docs.zephyrproject.org/latest/doxygen/html/group__atomic__apis.html#gaef275bd78e093e5b5177ef725d0f570a

bool atomic_cas(atomic_t *target, atomic_val_t old_value, atomic_val_t new_value); /* is resource mine ? */ owner_id = atomic_get(atomic_var); if (owner_id == MY_INSTANCE_ID_VALUE) { // I can freely use the resource } else { /* i dont own it, try and get resource */ is_resource_mine = atomic_cas(atomic_var, FREE_VALUE, MY_INSTANCE_ID_VALUE); if (!is_resource_mine) yield(); // try again when we are next scheduled } // I can use resource at this point ... /* put resource */ is_resource_mine = atomic_cas(atomic_var, MY_INSTANCE_ID_VALUE, FREE_VALUE); if (!is_resource_mine) // should never fail - handle.

where each instance would have a unique id for the atomic variable.

This change does not involve modifications to shared instances; it aims to prevent race condition on dax_ctx->update_flags.

For example, when executing dax_ctx->update_flags |= DAX_PROCESSING_MASK; in sof_dax_process, the DP thread may be preempted by the LL thread. At this point, the LL thread executes the sof_dax_reset operation and compares dax_ctx->update_flags & DAX_PROCESSING_MASK. At this point, the previous |= operation may not have completed its assignment. The result of dax_ctx->update_flags & DAX_PROCESSING_MASK in LL thread could be false, causing the instance resource to be released. When the DP thread resumes execution, it completes the assignment of DAX_PROCESSING_MASK and continues calling dax_process to handle data. Since the instance resource has already been released, this call will result in undefined behavior. The same issue may also occur in check_and_update_settings and check_and_update_state.

As mentioned earlier, once sof_dax_prepare, sof_dax_set_configuration, and sof_dax_reset are moved to userspace, this race protection is no longer needed since they run on the same thread as sof_dax_process. The main branch has already merged the userspace changes, but to align with the ptl-*-drop-stable branch, this change still needs to be merged first.

@checkupup can it be that you're testing with a pre-merge version of SOF userspace DP support? In "main" .reset() should only be called from the same DP thread:

sof/src/schedule/zephyr_dp_schedule_application.c

Line 138 in a32d983

flat->ret = ops->reset(pmod);

checkupup · 2026-02-27T03:09:15Z

@lgirdwood @wjablon1 Zephyr supports numerous compelling operations such as atomic_or and atomic_and, theoretically requiring only a single atomic_t variable. Previous discussions indicated that subsequent SOF updates would be built upon Zephyr. However, legacy platforms based on XTOS and the x86 GCC version within the testbench suite involve non-Zephyr runtime environments. Therefore, to ensure broad compatibility, I avoid adopting Zephyr operations entirely.

checkupup · 2026-03-04T07:07:01Z

Currently, the flag_process function only supports processing a single mask at a time. If Zephyr's atomic functions are used, atomic_t proc_flags[32]; can be changed to atomic_t proc_flags;. As @wjablon1 mentioned, for non-Zephyr systems, an equivalent atomic interface function can be defined. However, how to implement an equivalent atomic_or interface may remain to be determined.

lgirdwood · 2026-03-04T12:23:05Z

Currently, the flag_process function only supports processing a single mask at a time. If Zephyr's atomic functions are used, atomic_t proc_flags[32]; can be changed to atomic_t proc_flags;. As @wjablon1 mentioned, for non-Zephyr systems, an equivalent atomic interface function can be defined. However, how to implement an equivalent atomic_or interface may remain to be determined.

The reason for the ask to atomic ops is that there is a cost around mutex and spinlocks when built as a userspace module, i.e. there needs to be a context switch between user->kernel and back again. However, the atomic API is ISA non privileged so there is no cost between user/kernel and its cheaper than a mutex.

Please see below, it look like we should add this into Zephyr.

#include <zephyr/types.h>
#include <zephyr/toolchain.h>

/**
 * @brief Atomic bitwise OR for Xtensa
 * * @param target Pointer to the atomic variable
 * @param value  The mask to OR with the target
 * @return The value of the target BEFORE the OR operation
 */
static ALWAYS_INLINE atomic_val_t xtensa_atomic_or(atomic_t *target, atomic_val_t value)
{
    atomic_val_t old_val, new_val;
    int success;

    __asm__ volatile (
        "1:     l32i    %0, %2, 0          \n\t" // Load current value from target
        "       or      %1, %0, %3         \n\t" // new_val = old_val | value
        "       wsr     %0, scompare1      \n\t" // Set SCOMPARE1 for the comparison
        "       s32c1i  %1, %2, 0          \n\t" // Conditionally store new_val if target == SCOMPARE1
        "       bne     %1, %0, 1b         \n\t" // If store failed (value changed), retry loop
        : "=&a" (old_val), "=&a" (new_val)
        : "a" (target), "a" (value)
        : "memory"
    );

    return old_val;
}

johnylin76 · 2026-03-04T12:26:15Z

@lgirdwood @wjablon1 Zephyr supports numerous compelling operations such as atomic_or and atomic_and, theoretically requiring only a single atomic_t variable. Previous discussions indicated that subsequent SOF updates would be built upon Zephyr. However, legacy platforms based on XTOS and the x86 GCC version within the testbench suite involve non-Zephyr runtime environments. Therefore, to ensure broad compatibility, I avoid adopting Zephyr operations entirely.

I would not expect the compatibility strictly as yours due to the fact that we only consider the DP mode is available on Zephyr, not on XTOS. The scheduling strategies are different. IIUC, the DP mode (the multi-threading scheme to be precise) cannot be supported on XTOS, hence the scheduling strategies are different from the basis.

Take GoogleRTCAudioProc (a.k.a. AEC) as an example. The pain point is that requires 10ms processing window internally, so DP mode is used on Zephyr-OS platform. As for the XTOS solution, we have to elongate the entire pipeline period to 10ms to make AEC work. There is no synchronous problem since all module APIs are synchronous.

checkupup · 2026-03-05T06:26:08Z

Thank you all. From the above discussion, I understand that:

Only a single atomic_t variable is required
Use Zephyr bit manipulation functions
Synchronization issues need not be considered for non-Zephyr systems

Below is an implementation example. Could you provide a quick review?

struct dax_adapter_data {
	struct sof_dax dax_ctx;
	atomic_t proc_flags;
};

static int32_t flag_process(struct dax_adapter_data *adapter_data,
			    uint32_t flag,
			    enum dax_flag_opt_mode opt_mode)
{
#ifdef __ZEPHYR__
	int32_t bit = ffs(flag) - 1;

	switch (opt_mode) {
	case DAX_FLAG_READ:
		return atomic_test_bit(&adapter_data->proc_flags, bit);
	case DAX_FLAG_SET:
		atomic_set_bit(&adapter_data->proc_flags, bit);
		break;
	case DAX_FLAG_CLEAR:
		atomic_clear_bit(&adapter_data->proc_flags, bit);
		break;
	case DAX_FLAG_READ_AND_CLEAR:
		return atomic_test_and_clear_bit(&adapter_data->proc_flags, bit);
	default:
		break;
	}
#else
	/* Non-Zephyr builds run single-threaded (no DP mode), there is no synchronous problem */
	int32_t old_flags = atomic_read(&adapter_data->proc_flags);

	switch (opt_mode) {
	case DAX_FLAG_READ:
		return (old_flags & flag) != 0;
	case DAX_FLAG_SET:
		atomic_set(&adapter_data->proc_flags, old_flags | flag);
		break;
	case DAX_FLAG_CLEAR:
		atomic_set(&adapter_data->proc_flags, old_flags & ~flag);
		break;
	case DAX_FLAG_READ_AND_CLEAR:
		atomic_set(&adapter_data->proc_flags, old_flags & ~flag);
		return (old_flags & flag) != 0;
	default:
		break;
	}
#endif
	return 0;
}

johnylin76 · 2026-03-05T07:00:49Z

Thank you all. From the above discussion, I understand that:

Only a single atomic_t variable is required

Use Zephyr bit manipulation functions

Synchronization issues need not be considered for non-Zephyr systems

The new design looks good to me. Let's see if @lgirdwood and @wjablon1 have comments

In DP domain, sof_dax_process runs in a DP thread, other sof_dax_* functions run in a different LL thread. Using atomic flags instead of original dax_ctx->update_flags to avoid potential race conditions when updating flags. Since dax_inf.h should not have any dependencies on SOF, we define the new atomic flags in dax.c Signed-off-by: Jun Lai <jun.lai@dolby.com>

wjablon1 · 2026-03-05T09:55:45Z

Looks better. Personally I would just add non-Zephyr counterparts of atomic_*_bit (may come in handy in other modules)... But won't argue about this. Ultimately @lgirdwood or other maintainers should decide.

EDIT:

Also you can just eliminate atomics for non-Zephyr build if there is no synchronization problem. Something like:

struct dax_adapter_data {
struct sof_dax dax_ctx;
#ifdef ZEPHYR
atomic_t proc_flags;
#else
uint32_t proc_flags;
#endif
};

And use regular bitwise ops later on.

lyakh · 2026-03-05T11:48:52Z

src/audio/module_adapter/module/dolby/dax.c


+struct dax_adapter_data {
+	struct sof_dax dax_ctx;
+	atomic_t proc_flags[32];


@checkupup can it be that you're testing with a pre-merge version of SOF userspace DP support? In "main" .reset() should only be called from the same DP thread:

sof/src/schedule/zephyr_dp_schedule_application.c

Line 138 in a32d983

flat->ret = ops->reset(pmod);

lgirdwood · 2026-03-05T13:19:27Z

Thank you all. From the above discussion, I understand that:

Only a single atomic_t variable is required

Use Zephyr bit manipulation functions

Synchronization issues need not be considered for non-Zephyr systems

The new design looks good to me. Let's see if @lgirdwood and @wjablon1 have comments

LGTM, pls proceed @checkupup

checkupup · 2026-03-06T02:57:06Z

@checkupup can it be that you're testing with a pre-merge version of SOF userspace DP support? In "main" .reset() should only be called from the same DP thread:

@lyakh I have verified the status both before and after the merge of 5f4e724 on the PTL platform and have not found anomalies at this time.

lyakh · 2026-03-06T07:24:03Z

@checkupup can it be that you're testing with a pre-merge version of SOF userspace DP support? In "main" .reset() should only be called from the same DP thread:

@lyakh I have verified the status both before and after the merge of 5f4e724 on the PTL platform and have not found anomalies at this time.

@checkupup so this PR isn't needed any more and can be closed?

checkupup · 2026-03-06T07:38:35Z

@checkupup so this PR isn't needed any more and can be closed?

no, see the comment in #10526 by @johnylin76 :

Although the task scheduler in user space sounds phenomenal, I am afraid that is not supposed to be the solution for the project FSI (should be based on ptl-007-drop-stable. Note the image update after launch will still consider moving to the newer drop-stable branches.

This change need to be merged to main first.

lyakh · 2026-03-06T10:50:15Z

third_party/include/dax_inf.h

 	int32_t content_processing_enable;
 	int32_t volume;
-	uint32_t update_flags;
+	uint32_t update_flags; /* Deprecated */


is it still needed or has this commit removed all the users of this flag and it can simply be removed now?

lyakh · 2026-03-06T10:55:31Z

src/audio/module_adapter/module/dolby/dax.c

-			dax_ctx->update_flags |= DAX_DEVICE_MASK;
-			dax_ctx->update_flags |= DAX_VOLUME_MASK;
+			flag_process(adapter_data, DAX_DEVICE_MASK, DAX_FLAG_SET);
+			flag_process(adapter_data, DAX_VOLUME_MASK, DAX_FLAG_SET);


you have many locations now with if (flag_process(...)) {...; flag_proccess(...);...}. You do realise, that between these two calls to flag_process() the flag isn't protected, so it can be changed? This might or might not be a problem in your algorithm, but just as I mentioned before - atomic variables are likely not the best way to protect data against races. It should be possible, but often it's difficult.

lyakh · 2026-03-06T11:00:30Z

src/audio/module_adapter/module/dolby/dax.c

+			    uint32_t flag,
+			    enum dax_flag_opt_mode opt_mode)
+{
+#ifdef __ZEPHYR__


I suppose this is mostly for testbench, right?

lyakh · 2026-03-06T11:13:05Z

no, see the comment in #10526 by @johnylin76 :

@checkupup ah, /me slowly returning from vacation... You mean DP builds for non-PTL platforms, where userspace isn't used and calling module methods is performed differently. Ok, let me remove the request for change then.

The race, that this PR is addressing, is with non-PTL and therefore non-userspace builds.

checkupup requested review from dbaluta, kv2019i, lbetlej, lgirdwood, mmaka1 and plbossart as code owners February 26, 2026 06:07

wjablon1 reviewed Feb 26, 2026

View reviewed changes

src/audio/module_adapter/module/dolby/dax.c Outdated Show resolved Hide resolved

lgirdwood reviewed Feb 26, 2026

View reviewed changes

checkupup force-pushed the main_dax_race_condition branch from e8f2f72 to 4fa9389 Compare March 4, 2026 06:59

lyakh previously requested changes Mar 5, 2026

View reviewed changes

checkupup force-pushed the main_dax_race_condition branch from 4fa9389 to 9e8f614 Compare March 6, 2026 01:49

lyakh reviewed Mar 6, 2026

View reviewed changes

Conversation

checkupup commented Feb 26, 2026

Uh oh!

sofci commented Feb 26, 2026

Uh oh!

checkupup commented Feb 26, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

wjablon1 Feb 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

checkupup commented Feb 27, 2026

Uh oh!

checkupup commented Mar 4, 2026

Uh oh!

lgirdwood commented Mar 4, 2026

Uh oh!

johnylin76 commented Mar 4, 2026

Uh oh!

checkupup commented Mar 5, 2026

Uh oh!

johnylin76 commented Mar 5, 2026

Uh oh!

wjablon1 commented Mar 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

lgirdwood commented Mar 5, 2026

Uh oh!

checkupup commented Mar 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lyakh commented Mar 6, 2026

Uh oh!

checkupup commented Mar 6, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

lyakh Mar 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

lyakh commented Mar 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

wjablon1 Feb 26, 2026 •

edited

Loading

wjablon1 commented Mar 5, 2026 •

edited

Loading

checkupup commented Mar 6, 2026 •

edited

Loading

lyakh Mar 6, 2026 •

edited

Loading