From: Tudor Ambarus <tudor.ambarus@linaro.org>
To: Krzysztof Kozlowski <krzk@kernel.org>,
Alim Akhtar <alim.akhtar@samsung.com>
Cc: linux-kernel@vger.kernel.org, linux-samsung-soc@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, peter.griffin@linaro.org,
andre.draszik@linaro.org, jyescas@google.com,
kernel-team@android.com, stable@vger.kernel.org
Subject: Re: [PATCH v4 5/7] firmware: samsung: acpm: Fix false timeouts in polling path
Date: Tue, 5 May 2026 15:14:42 +0300 [thread overview]
Message-ID: <366fc343-0cac-4e03-a08e-aa50a6e17314@linaro.org> (raw)
In-Reply-To: <20260504-acpm-fixes-sashiko-reports-v4-5-529246be6b2b@linaro.org>
Hi all,
Before sending out the v5 of this series, I wanted to share what Sashiko
found when reviewing this patch, I think they are great examples of
tricky concurrency bugs. I'll also give some hints on how I'll be fixing
the problems in v5.
> @@ -261,6 +264,12 @@ static int acpm_get_rx(struct acpm_chan *achan, const struct acpm_xfer *xfer)
> if (rx_seqnum == tx_seqnum) {
> __ioread32_copy(xfer->rxd, addr, xfer->rxcnt);
> rx_set = true;
> + /*
> + * Signal completion to the polling thread.
> + * Pairs with smp_load_acquire() in polling
> + * loop.
> + */
> + smp_store_release(&rx_data->completed, true);
> clear_bit(seqnum, achan->bitmap_seqnum);
Sashiko pointed out that calling clear_bit() immediately after setting
completed = true creates a fatal Use-After-Free window.
If Thread A natively finds its message, it copies the payload, sets
completed = true, and clears the sequence bit. If Thread A is preempted
right before evaluating smp_load_acquire() in the polling loop, a
concurrent Thread C can claim that exact same sequence number, overwrite
the buffer, and reset completed = false. When Thread A finally resumes,
it reads false, misses its completion signal, and eventually times out.
> } else {
> /*
> @@ -271,8 +280,19 @@ static int acpm_get_rx(struct acpm_chan *achan, const struct acpm_xfer *xfer)
> */
> __ioread32_copy(rx_data->cmd, addr,
> rx_data->rxcnt);
> + /*
> + * Signal completion to the polling thread.
> + * Pairs with smp_load_acquire() in polling
> + * loop.
> + */
> + smp_store_release(&rx_data->completed, true);
The second issue occurs when dealing with cross-thread responses. If
Thread B drains the queue and finds Thread A's response, it saves it to
the cache and sets completed = true. It intentionally skips calling
clear_bit(), expecting Thread A to do it.
However, if Thread A calls acpm_get_rx() (finding nothing) and is
preempted right before evaluating smp_load_acquire(), Thread B can
step in, cache the data, and set the completed flag. When Thread A
resumes, it sees completed == true and immediately returns 0 (success).
Because Thread A returns immediately, it completely bypasses calling
acpm_get_saved_rx(). The result: the sequence bit and the response are
permanently leaked.
Both of these bugs stem from a lack of strict ownership over the
sequence slot. To fix this in v5, I will be rewriting the polling loop
to enforce the following invariant: Only the polling thread that
allocated the TX sequence number is allowed to free it, and only right
before it exits the polling loop. Specifically:
- clear_bit() will be completely stripped out of the RX draining paths
(acpm_get_rx and acpm_get_saved_rx).
- acpm_get_rx() will now return a native_match boolean, which evaluates
to true only if the thread natively processed its own sequence number
during the call
- the polling loop will use native_match to determine if it must pull
data from the cache.
- the polling loop will execute clear_bit() on its own tx_seqnum only
after the payload is safely in the caller's buffer.
Cheers,
ta
> }
> } else {
> + /*
> + * Signal completion to the polling thread.
> + * Pairs with smp_load_acquire() in polling loop.
> + */
> + smp_store_release(&rx_data->completed, true);
> clear_bit(seqnum, achan->bitmap_seqnum);
> }
>
> @@ -318,7 +338,13 @@ static int acpm_dequeue_by_polling(struct acpm_chan *achan,
> if (ret)
> return ret;
>
> - if (!test_bit(seqnum - 1, achan->bitmap_seqnum))
> + /*
> + * Safely check if our specific transaction has been processed.
> + * smp_load_acquire prevents the CPU from speculatively
> + * executing subsequent instructions before the transaction is
> + * synchronized.
> + */
> + if (smp_load_acquire(&achan->rx_data[seqnum - 1].completed))
> return 0;
>
> /* Determined experimentally. */
> @@ -384,6 +410,7 @@ static void acpm_prepare_xfer(struct acpm_chan *achan,
>
> /* Clear data for upcoming responses */
> rx_data = &achan->rx_data[achan->seqnum - 1];
> + rx_data->completed = false;
> memset(rx_data->cmd, 0, sizeof(*rx_data->cmd) * rx_data->n_cmd);
> /* zero means no response expected */
> rx_data->rxcnt = xfer->rxcnt;
>
next prev parent reply other threads:[~2026-05-05 12:14 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-04 10:15 [PATCH v4 0/7] firmware: samsung: acpm: Various fixes for sashiko bug reports Tudor Ambarus
2026-05-04 10:15 ` [PATCH v4 1/7] firmware: samsung: acpm: Fix cross-thread RX length corruption Tudor Ambarus
2026-05-04 18:36 ` Krzysztof Kozlowski
2026-05-05 9:14 ` Tudor Ambarus
2026-05-04 10:15 ` [PATCH v4 2/7] firmware: samsung: acpm: Fix mailbox channel leak on probe error Tudor Ambarus
2026-05-04 10:15 ` [PATCH v4 3/7] firmware: samsung: acpm: Fix dummy stubs to return ERR_PTR Tudor Ambarus
2026-05-04 10:15 ` [PATCH v4 4/7] firmware: samsung: acpm: Add memory barrier before advancing RX pointer Tudor Ambarus
2026-05-04 10:15 ` [PATCH v4 5/7] firmware: samsung: acpm: Fix false timeouts in polling path Tudor Ambarus
2026-05-05 12:14 ` Tudor Ambarus [this message]
2026-05-04 10:15 ` [PATCH v4 6/7] firmware: samsung: acpm: Fix missing LKMM barriers in RX and TX paths Tudor Ambarus
2026-05-04 10:15 ` [PATCH v4 7/7] firmware: samsung: acpm: Fix infinite loop on sequence number exhaustion Tudor Ambarus
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=366fc343-0cac-4e03-a08e-aa50a6e17314@linaro.org \
--to=tudor.ambarus@linaro.org \
--cc=alim.akhtar@samsung.com \
--cc=andre.draszik@linaro.org \
--cc=jyescas@google.com \
--cc=kernel-team@android.com \
--cc=krzk@kernel.org \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-samsung-soc@vger.kernel.org \
--cc=peter.griffin@linaro.org \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox