public inbox for arm-scmi@vger.kernel.org
 help / color / mirror / Atom feed
From: Artem Shimko <a.shimko.dev@gmail.com>
To: Sudeep Holla <sudeep.holla@arm.com>,
	Cristian Marussi <cristian.marussi@arm.com>
Cc: Artem Shimko <a.shimko.dev@gmail.com>,
	arm-scmi@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org
Subject: [PATCH] firmware: arm_scmi: Fix raw mode async completion race
Date: Fri, 19 Dec 2025 14:46:16 +0300	[thread overview]
Message-ID: <20251219114617.2057576-1-a.shimko.dev@gmail.com> (raw)

A race condition exists in scmi_xfer_raw_worker() where async_done
completion can be accessed after being nullified by
scmi_xfer_raw_waiter_put() in scmi_handle_response(). This happens
because the worker skips wait_for_completion_timeout() when ret has
an error value and goes directly to nullify the async_done.

Fix by waiting for async_done unconditionally when it exists, without
checking the result of the ret variable. This ensures the worker
always synchronizes properly with the scmi_handle_response()'s complete
for async_done.

Fixes: 3c3d818a9317a ("firmware: arm_scmi: Add core raw transmission support")
Signed-off-by: Artem Shimko <a.shimko.dev@gmail.com>
---
Hello maintainers and reviewers,

This patch fixes a race condition in the SCMI raw mode implementation
that can lead to kernel crashes when handling asynchronous delayed
responses.

I temporarily added the trace_printk("%s\n", __func__) to track the problem.

# mount -t tracefs nodev /sys/kernel/tracing
# mount -t debugfs debugfs /sys/kernel/debug
# cd /sys/kernel/debug/tracing
# echo 1 > options/trace_printk
# echo 1 > tracing_on

Doing that until Oops
# echo -e -n 'sorry, but NDA raw msg' > /sys/kernel/debug/scmi/0/raw/message_async 

Without the changes:
[   19.513750] Unable to handle kernel NULL pointer dereference at virtual address NDA
[   19.524756] Oops [#1]
[   19.527034] Modules linked in:
[   19.530097] CPU: 0 UID: 0 PID: 124 Comm: irq/12-1e200000 Not tainted 6.12.0-NDA
...
[   19.638262] [<ffffffff8007af3c>] do_raw_spin_lock+0xa/0x10a
[   19.643843] [<ffffffff80b4e42e>] _raw_spin_lock_irqsave+0x20/0x2c
[   19.649941] [<ffffffff80072dc4>] complete+0x1e/0x76
[   19.654826] [<ffffffff808056bc>] scmi_rx_callback+0x66e/0x7cc
[   19.660589] [<ffffffff80800204>] transport_rx_callback+0x4e/0x56
[   19.666534] [<ffffffff8082f664>] mbox_chan_received_data+0x10/0x1a
[   19.672730] [<ffffffff80830624>] transport_chan_do_rx+0xea/0x136
[   19.678311] [<ffffffff808306c6>] transport_mbox_threaded_isr+0x42/0x9c
[   19.684408] [<ffffffff8008e8ee>] irq_thread_fn+0x1a/0x5a
[   19.689733] [<ffffffff8008e724>] irq_thread+0x16c/0x20a
[   19.694962] [<ffffffff80044032>] kthread+0xda/0xf6
[   19.699761] [<ffffffff80b4f8d6>] ret_from_fork+0xe/0x18
[   19.712403] ---[ end trace 0000000000000000 ]---

With the changes: we dont have the Oops

# echo 0 > tracing_on
# cat trace

Without the changes:
 irq/12-1e200000-120     [000] .....    23.368262: scmi_rx_callback: scmi_handle_response 
   kworker/u45:0-95      [003] .....    23.394836: scmi_xfer_raw_waiter_put: scmi_xfer_raw_waiter_put 
 irq/12-1e200000-120     [000] .....    25.625926: scmi_rx_callback: scmi_handle_response 
   kworker/u44:0-94      [002] .....    25.653884: scmi_xfer_raw_waiter_put: scmi_xfer_raw_waiter_put 
 irq/12-1e200000-120     [000] .....    27.202031: scmi_rx_callback: scmi_handle_response 
   kworker/u45:0-95      [001] .....    27.228216: scmi_xfer_raw_waiter_put: scmi_xfer_raw_waiter_put 
 irq/12-1e200000-120     [000] .....    28.504546: scmi_rx_callback: scmi_handle_response 
   kworker/u45:0-95      [002] .....    28.531534: scmi_xfer_raw_waiter_put: scmi_xfer_raw_waiter_put 
 irq/12-1e200000-120     [000] .....    30.102729: scmi_rx_callback: scmi_handle_response 
   kworker/u44:0-94      [003] .....    30.129688: scmi_xfer_raw_waiter_put: scmi_xfer_raw_waiter_put 
 irq/12-1e200000-120     [000] .....    31.108407: scmi_rx_callback: scmi_handle_response 
   kworker/u44:0-94      [003] .....    31.136012: scmi_xfer_raw_waiter_put: scmi_xfer_raw_waiter_put 
 irq/12-1e200000-120     [000] .....    32.388953: scmi_rx_callback: scmi_handle_response 
   kworker/u44:0-94      [003] .....    32.415700: scmi_xfer_raw_waiter_put: scmi_xfer_raw_waiter_put 
 irq/12-1e200000-120     [000] .....    33.737014: scmi_rx_callback: scmi_handle_response 
   kworker/u44:0-94      [002] .....    33.764977: scmi_xfer_raw_waiter_put: scmi_xfer_raw_waiter_put 
 irq/12-1e200000-120     [000] .....    34.979096: scmi_rx_callback: scmi_handle_response 
   kworker/u45:0-95      [002] .....    35.005377: scmi_xfer_raw_waiter_put: scmi_xfer_raw_waiter_put 
   kworker/u45:0-95      [003] .....    36.758043: scmi_xfer_raw_waiter_put: scmi_xfer_raw_waiter_put <- RC
 irq/12-1e200000-119     [000] .....    37.561734: scmi_rx_callback: scmi_handle_response 

Withthe changes:
irq/12-1e200000-120     [000] .....    23.368262: scmi_rx_callback: scmi_handle_response
   kworker/u45:0-95      [003] .....    23.394836: scmi_xfer_raw_waiter_put: scmi_xfer_raw_waiter_put
 irq/12-1e200000-120     [000] .....    25.625926: scmi_rx_callback: scmi_handle_response
   kworker/u44:0-94      [002] .....    25.653884: scmi_xfer_raw_waiter_put: scmi_xfer_raw_waiter_put
 irq/12-1e200000-120     [000] .....    27.202031: scmi_rx_callback: scmi_handle_response
   kworker/u45:0-95      [001] .....    27.228216: scmi_xfer_raw_waiter_put: scmi_xfer_raw_waiter_put
 irq/12-1e200000-120     [000] .....    28.504546: scmi_rx_callback: scmi_handle_response
   kworker/u45:0-95      [002] .....    28.531534: scmi_xfer_raw_waiter_put: scmi_xfer_raw_waiter_put
 irq/12-1e200000-120     [000] .....    30.102729: scmi_rx_callback: scmi_handle_response
   kworker/u44:0-94      [003] .....    30.129688: scmi_xfer_raw_waiter_put: scmi_xfer_raw_waiter_put
 irq/12-1e200000-120     [000] .....    31.108407: scmi_rx_callback: scmi_handle_response
   kworker/u44:0-94      [003] .....    31.136012: scmi_xfer_raw_waiter_put: scmi_xfer_raw_waiter_put
 irq/12-1e200000-120     [000] .....    32.388953: scmi_rx_callback: scmi_handle_response
   kworker/u44:0-94      [003] .....    32.415700: scmi_xfer_raw_waiter_put: scmi_xfer_raw_waiter_put
 irq/12-1e200000-120     [000] .....    33.737014: scmi_rx_callback: scmi_handle_response
   kworker/u44:0-94      [002] .....    33.764977: scmi_xfer_raw_waiter_put: scmi_xfer_raw_waiter_put
 irq/12-1e200000-120     [000] .....    34.979096: scmi_rx_callback: scmi_handle_response
   kworker/u45:0-95      [002] .....    35.005377: scmi_xfer_raw_waiter_put: scmi_xfer_raw_waiter_put
 irq/12-1e200000-120     [000] .....    36.028136: scmi_rx_callback: scmi_handle_response
   kworker/u45:0-95      [003] .....    36.055553: scmi_xfer_raw_waiter_put: scmi_xfer_raw_waiter_put
 irq/12-1e200000-120     [000] .....    36.906953: scmi_rx_callback: scmi_handle_response
   kworker/u45:0-95      [002] .....    36.933304: scmi_xfer_raw_waiter_put: scmi_xfer_raw_waiter_put
 irq/12-1e200000-120     [000] .....    37.548706: scmi_rx_callback: scmi_handle_response
   kworker/u44:0-94      [003] .....    37.577260: scmi_xfer_raw_waiter_put: scmi_xfer_raw_waiter_put
 irq/12-1e200000-120     [000] .....    38.079993: scmi_rx_callback: scmi_handle_response
   kworker/u44:0-94      [002] .....    38.108648: scmi_xfer_raw_waiter_put: scmi_xfer_raw_waiter_put
 irq/12-1e200000-120     [000] .....    38.512822: scmi_rx_callback: scmi_handle_response
   kworker/u44:0-94      [002] .....    38.540403: scmi_xfer_raw_waiter_put: scmi_xfer_raw_waiter_put
   ...

--
Regards,
Artem

 drivers/firmware/arm_scmi/raw_mode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/firmware/arm_scmi/raw_mode.c b/drivers/firmware/arm_scmi/raw_mode.c
index 73db5492ab44..362773f3114d 100644
--- a/drivers/firmware/arm_scmi/raw_mode.c
+++ b/drivers/firmware/arm_scmi/raw_mode.c
@@ -479,7 +479,7 @@ static void scmi_xfer_raw_worker(struct work_struct *work)
 				    ret, scmi_inflight_count(raw->handle));
 
 		/* Wait also for an async delayed response if needed */
-		if (!ret && xfer->async_done) {
+		if (xfer->async_done) {
 			unsigned long tmo = msecs_to_jiffies(SCMI_MAX_RESPONSE_TIMEOUT);
 
 			if (!wait_for_completion_timeout(xfer->async_done, tmo))
-- 
2.43.0


                 reply	other threads:[~2025-12-19 11:46 UTC|newest]

Thread overview: [no followups] expand[flat|nested]  mbox.gz  Atom feed

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20251219114617.2057576-1-a.shimko.dev@gmail.com \
    --to=a.shimko.dev@gmail.com \
    --cc=arm-scmi@vger.kernel.org \
    --cc=cristian.marussi@arm.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=sudeep.holla@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox