* [PATCH v2 0/6] firmware: samsung: acpm: Various fixes for sashiko bug reports
@ 2026-04-27 15:04 Tudor Ambarus
2026-04-27 15:04 ` [PATCH v2 1/6] firmware: samsung: acpm: Fix cross-thread RX length corruption Tudor Ambarus
` (5 more replies)
0 siblings, 6 replies; 11+ messages in thread
From: Tudor Ambarus @ 2026-04-27 15:04 UTC (permalink / raw)
To: Krzysztof Kozlowski, Alim Akhtar
Cc: linux-kernel, linux-samsung-soc, linux-arm-kernel, peter.griffin,
andre.draszik, jyescas, kernel-team, Tudor Ambarus, stable
Fixes for bugs that were identified by sashiko when proposing the
GS101 ACPM TMU addition.
While the bugs are sane, we haven't hit them yet, maybe because we
don't have enough ACPM clients upstreamed. The fixes can go either
as fixes at -rc phase, or as regular patches for the next merge window.
If the later, we'll need a dedicated branch, as these patches toghether
with the other ACPM thermal preparatory patches will be needed by the
GS101 ACPM thermal driver. I'm thinking a dedicated branch and a tag
will do. I will respin the GS101 ACPM thermal driver series once this
fixes set gets in.
Thanks,
ta
Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
---
Changes in v2:
- drop patch "firmware: samsung: acpm: Fix sequence number leak and infinite loop"
The patch freed sequence numbers on mailbox failures or timeouts. Because
the message is already in SRAM and tx.front was advanced, a delayed
firmware wake-up will process that abandoned message, stealing the
sequence number from a new thread and causing silent data corruption.
- fix mailbox channel leak when `acpm_achan_alloc_cmds()` failed. Did it
by moving the `devm_add_action_or_reset()` call.
- new patches, last 3 in the set, they fix some more sashiko reports.
- Link to v1: https://lore.kernel.org/r/20260423-acpm-fixes-sashiko-reports-v1-0-2217b790925e@linaro.org
---
Tudor Ambarus (6):
firmware: samsung: acpm: Fix cross-thread RX length corruption
firmware: samsung: acpm: Fix mailbox channel leak on probe error
firmware: samsung: acpm: Fix dummy stubs to return ERR_PTR
firmware: samsung: acpm: Fix memory ordering race in RX path
firmware: samsung: acpm: Fix out-of-bounds read and infinite loop in RX path
firmware: samsung: acpm: Fix infinite loop on sequence number exhaustion
drivers/firmware/samsung/exynos-acpm-dvfs.c | 3 +
drivers/firmware/samsung/exynos-acpm.c | 77 ++++++++++++++--------
.../linux/firmware/samsung/exynos-acpm-protocol.h | 3 +-
3 files changed, 56 insertions(+), 27 deletions(-)
---
base-commit: 254f49634ee16a731174d2ae34bc50bd5f45e731
change-id: 20260423-acpm-fixes-sashiko-reports-ae28b6ed5581
Best regards,
--
Tudor Ambarus <tudor.ambarus@linaro.org>
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v2 1/6] firmware: samsung: acpm: Fix cross-thread RX length corruption
2026-04-27 15:04 [PATCH v2 0/6] firmware: samsung: acpm: Various fixes for sashiko bug reports Tudor Ambarus
@ 2026-04-27 15:04 ` Tudor Ambarus
2026-04-27 15:04 ` [PATCH v2 2/6] firmware: samsung: acpm: Fix mailbox channel leak on probe error Tudor Ambarus
` (4 subsequent siblings)
5 siblings, 0 replies; 11+ messages in thread
From: Tudor Ambarus @ 2026-04-27 15:04 UTC (permalink / raw)
To: Krzysztof Kozlowski, Alim Akhtar
Cc: linux-kernel, linux-samsung-soc, linux-arm-kernel, peter.griffin,
andre.draszik, jyescas, kernel-team, Tudor Ambarus, stable
Sashiko identified a cross-thread RX length corruption bug when
reviewing the thermal addition to ACPM [1].
When multiple threads concurrently send IPC requests, the ACPM polling
mechanism can encounter responses belonging to other threads. To drain
the queue, the driver saves these concurrent responses into an internal
cache (`rx_data->cmd`) to be retrieved later by the owning thread.
Previously, the driver incorrectly used `xfer->rxcnt` (the expected
receive length of the *current* polling thread) when copying data for
*other* threads into this cache. If the threads expected responses of
different lengths, this resulted in buffer underflows (leading to reads
of uninitialized memory) or potential buffer overflows.
Fix this by replacing the boolean `response` flag in
`struct acpm_rx_data` with `rxcnt`, caching the exact expected receive
length for each specific transaction during transfer preparation. Use
this cached length when saving concurrent responses.
Consequently, ensure that `xfer->rxcnt` is explicitly zeroed in driver
helpers (e.g., `acpm_dvfs_set_xfer`) for fire-and-forget messages to
prevent uninitialized stack garbage from being interpreted as a massive
expected receive length.
Cc: stable@vger.kernel.org
Fixes: a88927b534ba ("firmware: add Exynos ACPM protocol driver")
Closes: https://sashiko.dev/#/patchset/20260420-acpm-tmu-v3-0-3dc8e93f0b26%40linaro.org [1]
Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
---
drivers/firmware/samsung/exynos-acpm-dvfs.c | 3 +++
drivers/firmware/samsung/exynos-acpm.c | 15 ++++++++-------
2 files changed, 11 insertions(+), 7 deletions(-)
diff --git a/drivers/firmware/samsung/exynos-acpm-dvfs.c b/drivers/firmware/samsung/exynos-acpm-dvfs.c
index 06bdf62dea1f..fdea7aa24ca0 100644
--- a/drivers/firmware/samsung/exynos-acpm-dvfs.c
+++ b/drivers/firmware/samsung/exynos-acpm-dvfs.c
@@ -31,6 +31,9 @@ static void acpm_dvfs_set_xfer(struct acpm_xfer *xfer, u32 *cmd, size_t cmdlen,
if (response) {
xfer->rxcnt = cmdlen;
xfer->rxd = cmd;
+ } else {
+ xfer->rxcnt = 0;
+ xfer->rxd = NULL;
}
}
diff --git a/drivers/firmware/samsung/exynos-acpm.c b/drivers/firmware/samsung/exynos-acpm.c
index 16c46ed60837..e95edc350efa 100644
--- a/drivers/firmware/samsung/exynos-acpm.c
+++ b/drivers/firmware/samsung/exynos-acpm.c
@@ -104,12 +104,12 @@ struct acpm_queue {
*
* @cmd: pointer to where the data shall be saved.
* @n_cmd: number of 32-bit commands.
- * @response: true if the client expects the RX data.
+ * @rxcnt: expected length of the response in 32-bit words.
*/
struct acpm_rx_data {
u32 *cmd;
size_t n_cmd;
- bool response;
+ size_t rxcnt;
};
#define ACPM_SEQNUM_MAX 64
@@ -199,7 +199,7 @@ static void acpm_get_saved_rx(struct acpm_chan *achan,
const struct acpm_rx_data *rx_data = &achan->rx_data[tx_seqnum - 1];
u32 rx_seqnum;
- if (!rx_data->response)
+ if (!rx_data->rxcnt)
return;
rx_seqnum = FIELD_GET(ACPM_PROTOCOL_SEQNUM, rx_data->cmd[0]);
@@ -256,7 +256,7 @@ static int acpm_get_rx(struct acpm_chan *achan, const struct acpm_xfer *xfer)
seqnum = rx_seqnum - 1;
rx_data = &achan->rx_data[seqnum];
- if (rx_data->response) {
+ if (rx_data->rxcnt) {
if (rx_seqnum == tx_seqnum) {
__ioread32_copy(xfer->rxd, addr, xfer->rxcnt);
rx_set = true;
@@ -268,7 +268,8 @@ static int acpm_get_rx(struct acpm_chan *achan, const struct acpm_xfer *xfer)
* clear yet the bitmap. It will be cleared
* after the response is copied to the request.
*/
- __ioread32_copy(rx_data->cmd, addr, xfer->rxcnt);
+ __ioread32_copy(rx_data->cmd, addr,
+ rx_data->rxcnt);
}
} else {
clear_bit(seqnum, achan->bitmap_seqnum);
@@ -380,8 +381,8 @@ static void acpm_prepare_xfer(struct acpm_chan *achan,
/* Clear data for upcoming responses */
rx_data = &achan->rx_data[achan->seqnum - 1];
memset(rx_data->cmd, 0, sizeof(*rx_data->cmd) * rx_data->n_cmd);
- if (xfer->rxd)
- rx_data->response = true;
+ /* zero means no response expected */
+ rx_data->rxcnt = xfer->rxcnt;
/* Flag the index based on seqnum. (seqnum: 1~63, bitmap: 0~62) */
set_bit(achan->seqnum - 1, achan->bitmap_seqnum);
--
2.54.0.rc2.544.gc7ae2d5bb8-goog
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v2 2/6] firmware: samsung: acpm: Fix mailbox channel leak on probe error
2026-04-27 15:04 [PATCH v2 0/6] firmware: samsung: acpm: Various fixes for sashiko bug reports Tudor Ambarus
2026-04-27 15:04 ` [PATCH v2 1/6] firmware: samsung: acpm: Fix cross-thread RX length corruption Tudor Ambarus
@ 2026-04-27 15:04 ` Tudor Ambarus
2026-04-27 15:04 ` [PATCH v2 3/6] firmware: samsung: acpm: Fix dummy stubs to return ERR_PTR Tudor Ambarus
` (3 subsequent siblings)
5 siblings, 0 replies; 11+ messages in thread
From: Tudor Ambarus @ 2026-04-27 15:04 UTC (permalink / raw)
To: Krzysztof Kozlowski, Alim Akhtar
Cc: linux-kernel, linux-samsung-soc, linux-arm-kernel, peter.griffin,
andre.draszik, jyescas, kernel-team, Tudor Ambarus, stable
Sashiko identified the leak at [1].
The ACPM driver allocates hardware mailbox channels using
`mbox_request_channel()` during `acpm_channels_init()`. However, the
driver lacked a `.remove` callback and did not free these channels on
subsequent error paths inside `acpm_probe()`.
Additionally, if `acpm_achan_alloc_cmds()` failed during the channel
initialization loop, the function returned immediately, bypassing the
manual cleanup and permanently leaking any channels successfully
requested in previous loop iterations.
Fix this by modifying `acpm_free_mbox_chans()` to match the `devres`
action signature and registering it via `devm_add_action_or_reset()`.
Cc: stable@vger.kernel.org
Fixes: a88927b534ba ("firmware: add Exynos ACPM protocol driver")
Closes: https://sashiko.dev/#/patchset/20260420-acpm-tmu-v3-0-3dc8e93f0b26%40linaro.org [1]
Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
---
drivers/firmware/samsung/exynos-acpm.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/drivers/firmware/samsung/exynos-acpm.c b/drivers/firmware/samsung/exynos-acpm.c
index e95edc350efa..bd0d48e9d157 100644
--- a/drivers/firmware/samsung/exynos-acpm.c
+++ b/drivers/firmware/samsung/exynos-acpm.c
@@ -529,8 +529,9 @@ static int acpm_achan_alloc_cmds(struct acpm_chan *achan)
* acpm_free_mbox_chans() - free mailbox channels.
* @acpm: pointer to driver data.
*/
-static void acpm_free_mbox_chans(struct acpm_info *acpm)
+static void acpm_free_mbox_chans(void *data)
{
+ struct acpm_info *acpm = data;
int i;
for (i = 0; i < acpm->num_chans; i++)
@@ -558,6 +559,10 @@ static int acpm_channels_init(struct acpm_info *acpm)
if (!acpm->chans)
return -ENOMEM;
+ ret = devm_add_action_or_reset(dev, acpm_free_mbox_chans, acpm);
+ if (ret)
+ return dev_err_probe(dev, ret, "Failed to add mbox free action.\n");
+
chans_shmem = acpm->sram_base + readl(&shmem->chans);
for (i = 0; i < acpm->num_chans; i++) {
@@ -579,10 +584,8 @@ static int acpm_channels_init(struct acpm_info *acpm)
cl->dev = dev;
achan->chan = mbox_request_channel(cl, 0);
- if (IS_ERR(achan->chan)) {
- acpm_free_mbox_chans(acpm);
+ if (IS_ERR(achan->chan))
return PTR_ERR(achan->chan);
- }
}
return 0;
--
2.54.0.rc2.544.gc7ae2d5bb8-goog
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v2 3/6] firmware: samsung: acpm: Fix dummy stubs to return ERR_PTR
2026-04-27 15:04 [PATCH v2 0/6] firmware: samsung: acpm: Various fixes for sashiko bug reports Tudor Ambarus
2026-04-27 15:04 ` [PATCH v2 1/6] firmware: samsung: acpm: Fix cross-thread RX length corruption Tudor Ambarus
2026-04-27 15:04 ` [PATCH v2 2/6] firmware: samsung: acpm: Fix mailbox channel leak on probe error Tudor Ambarus
@ 2026-04-27 15:04 ` Tudor Ambarus
2026-04-27 15:04 ` [PATCH v2 4/6] firmware: samsung: acpm: Fix memory ordering race in RX path Tudor Ambarus
` (2 subsequent siblings)
5 siblings, 0 replies; 11+ messages in thread
From: Tudor Ambarus @ 2026-04-27 15:04 UTC (permalink / raw)
To: Krzysztof Kozlowski, Alim Akhtar
Cc: linux-kernel, linux-samsung-soc, linux-arm-kernel, peter.griffin,
andre.draszik, jyescas, kernel-team, Tudor Ambarus, stable
Sashiko identified a potential NULL pointer dereference [1].
The dummy stub implementation for devm_acpm_get_by_node() returns NULL
when CONFIG_EXYNOS_ACPM_PROTOCOL is disabled.
However, the active implementation of this function returns an ERR_PTR
on failure, and the consumer driver checks the return value using
IS_ERR(). Because IS_ERR(NULL) evaluates to false, returning NULL from
the stub tricks consumer drivers into treating the NULL return as a
valid handle. Subsequent attempts to access handle->ops result in a
fatal NULL pointer dereference.
Fix this by returning ERR_PTR(-ENODEV) in the disabled configuration
to correctly propagate the disabled state and match the API contract.
Cc: stable@vger.kernel.org
Fixes: 6837c006d4e7 ("firmware: exynos-acpm: add empty method to allow compile test")
Closes: https://sashiko.dev/#/patchset/20260420-acpm-tmu-v3-0-3dc8e93f0b26%40linaro.org [1]
Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
---
include/linux/firmware/samsung/exynos-acpm-protocol.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/include/linux/firmware/samsung/exynos-acpm-protocol.h b/include/linux/firmware/samsung/exynos-acpm-protocol.h
index 13f17dc4443b..d4db2796a6fb 100644
--- a/include/linux/firmware/samsung/exynos-acpm-protocol.h
+++ b/include/linux/firmware/samsung/exynos-acpm-protocol.h
@@ -8,6 +8,7 @@
#ifndef __EXYNOS_ACPM_PROTOCOL_H
#define __EXYNOS_ACPM_PROTOCOL_H
+#include <linux/err.h>
#include <linux/types.h>
struct acpm_handle;
@@ -57,7 +58,7 @@ struct acpm_handle *devm_acpm_get_by_node(struct device *dev,
static inline struct acpm_handle *devm_acpm_get_by_node(struct device *dev,
struct device_node *np)
{
- return NULL;
+ return ERR_PTR(-ENODEV);
}
#endif
--
2.54.0.rc2.544.gc7ae2d5bb8-goog
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v2 4/6] firmware: samsung: acpm: Fix memory ordering race in RX path
2026-04-27 15:04 [PATCH v2 0/6] firmware: samsung: acpm: Various fixes for sashiko bug reports Tudor Ambarus
` (2 preceding siblings ...)
2026-04-27 15:04 ` [PATCH v2 3/6] firmware: samsung: acpm: Fix dummy stubs to return ERR_PTR Tudor Ambarus
@ 2026-04-27 15:04 ` Tudor Ambarus
2026-04-28 9:52 ` Krzysztof Kozlowski
2026-04-27 15:04 ` [PATCH v2 5/6] firmware: samsung: acpm: Fix out-of-bounds read and infinite loop " Tudor Ambarus
2026-04-27 15:04 ` [PATCH v2 6/6] firmware: samsung: acpm: Fix infinite loop on sequence number exhaustion Tudor Ambarus
5 siblings, 1 reply; 11+ messages in thread
From: Tudor Ambarus @ 2026-04-27 15:04 UTC (permalink / raw)
To: Krzysztof Kozlowski, Alim Akhtar
Cc: linux-kernel, linux-samsung-soc, linux-arm-kernel, peter.griffin,
andre.draszik, jyescas, kernel-team, Tudor Ambarus, stable
Sashiko identified a memory ordering race in RX path [1].
When draining the RX queue or reading saved responses, the driver uses
clear_bit() to release the sequence number back to the available pool.
However, on weakly ordered architectures like ARM64, clear_bit() does
not provide implicit memory barriers.
This allows the CPU to reorder instructions, making the cleared bit
globally visible before the preceding memory operations (memcpy() or
__ioread32_copy()) have completed. If a concurrent thread allocates the
newly freed sequence number, it can execute acpm_prepare_xfer() and
zero out the buffer via memset() while the RX thread is still actively
reading from it, leading to silent data corruption.
Fix this by replacing clear_bit() with clear_bit_unlock() across the
RX path. This provides release semantics, guaranteeing that all prior
memory reads and writes are fully completed and visible before the
sequence number is marked as free.
Cc: stable@vger.kernel.org
Fixes: a88927b534ba ("firmware: add Exynos ACPM protocol driver")
Closes: https://sashiko.dev/#/patchset/20260423-acpm-fixes-sashiko-reports-v1-0-2217b790925e%40linaro.org [1]
Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
---
drivers/firmware/samsung/exynos-acpm.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/firmware/samsung/exynos-acpm.c b/drivers/firmware/samsung/exynos-acpm.c
index bd0d48e9d157..c9aa79c2faa4 100644
--- a/drivers/firmware/samsung/exynos-acpm.c
+++ b/drivers/firmware/samsung/exynos-acpm.c
@@ -7,7 +7,7 @@
#include <linux/bitfield.h>
#include <linux/bitmap.h>
-#include <linux/bits.h>
+#include <linux/bitops.h>
#include <linux/cleanup.h>
#include <linux/container_of.h>
#include <linux/delay.h>
@@ -206,7 +206,7 @@ static void acpm_get_saved_rx(struct acpm_chan *achan,
if (rx_seqnum == tx_seqnum) {
memcpy(xfer->rxd, rx_data->cmd, xfer->rxcnt * sizeof(*xfer->rxd));
- clear_bit(rx_seqnum - 1, achan->bitmap_seqnum);
+ clear_bit_unlock(rx_seqnum - 1, achan->bitmap_seqnum);
}
}
@@ -260,7 +260,7 @@ static int acpm_get_rx(struct acpm_chan *achan, const struct acpm_xfer *xfer)
if (rx_seqnum == tx_seqnum) {
__ioread32_copy(xfer->rxd, addr, xfer->rxcnt);
rx_set = true;
- clear_bit(seqnum, achan->bitmap_seqnum);
+ clear_bit_unlock(seqnum, achan->bitmap_seqnum);
} else {
/*
* The RX data corresponds to another request.
@@ -272,7 +272,7 @@ static int acpm_get_rx(struct acpm_chan *achan, const struct acpm_xfer *xfer)
rx_data->rxcnt);
}
} else {
- clear_bit(seqnum, achan->bitmap_seqnum);
+ clear_bit_unlock(seqnum, achan->bitmap_seqnum);
}
i = (i + 1) % achan->qlen;
--
2.54.0.rc2.544.gc7ae2d5bb8-goog
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v2 5/6] firmware: samsung: acpm: Fix out-of-bounds read and infinite loop in RX path
2026-04-27 15:04 [PATCH v2 0/6] firmware: samsung: acpm: Various fixes for sashiko bug reports Tudor Ambarus
` (3 preceding siblings ...)
2026-04-27 15:04 ` [PATCH v2 4/6] firmware: samsung: acpm: Fix memory ordering race in RX path Tudor Ambarus
@ 2026-04-27 15:04 ` Tudor Ambarus
2026-04-28 13:04 ` Tudor Ambarus
2026-04-27 15:04 ` [PATCH v2 6/6] firmware: samsung: acpm: Fix infinite loop on sequence number exhaustion Tudor Ambarus
5 siblings, 1 reply; 11+ messages in thread
From: Tudor Ambarus @ 2026-04-27 15:04 UTC (permalink / raw)
To: Krzysztof Kozlowski, Alim Akhtar
Cc: linux-kernel, linux-samsung-soc, linux-arm-kernel, peter.griffin,
andre.draszik, jyescas, kernel-team, Tudor Ambarus, stable
Sashiko identified these bugs in [1].
The ACPM driver reads the rx_front and rx_rear pointers directly from
SRAM and uses them to calculate SRAM offsets and loop termination
conditions.
If a firmware bug writes a value greater than or equal to the queue
length (achan->qlen) at those addresses, two failures occur:
1. Out-of-bounds read: The rear pointer ('i') is used to calculate the
MMIO address before the modulo operation is applied, leading to an
immediate out-of-bounds memory access.
2. Infinite loop: The loop iterates using 'i = (i + 1) % achan->qlen'.
Because 'i' is mathematically capped below qlen, if 'rx_front' is
greater than or equal to qlen, 'i' will never equal 'rx_front'.
The CPU will spin forever, holding the rx_lock and deadlocking the
polling thread.
Protect the kernel by strictly validating the MMIO queue offsets
immediately after reading them.
Cc: stable@vger.kernel.org
Fixes: a88927b534ba ("firmware: add Exynos ACPM protocol driver")
Closes: https://sashiko.dev/#/patchset/20260420-acpm-tmu-v3-0-3dc8e93f0b26%40linaro.org [1]
Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
---
drivers/firmware/samsung/exynos-acpm.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/firmware/samsung/exynos-acpm.c b/drivers/firmware/samsung/exynos-acpm.c
index c9aa79c2faa4..43658cc1347a 100644
--- a/drivers/firmware/samsung/exynos-acpm.c
+++ b/drivers/firmware/samsung/exynos-acpm.c
@@ -230,6 +230,13 @@ static int acpm_get_rx(struct acpm_chan *achan, const struct acpm_xfer *xfer)
rx_front = readl(achan->rx.front);
i = readl(achan->rx.rear);
+ if (rx_front >= achan->qlen || i >= achan->qlen) {
+ dev_err(achan->acpm->dev,
+ "Invalid RX queue pointers from firmware: front=%u rear=%u qlen=%u\n",
+ rx_front, i, achan->qlen);
+ return -EIO;
+ }
+
tx_seqnum = FIELD_GET(ACPM_PROTOCOL_SEQNUM, xfer->txd[0]);
if (i == rx_front) {
--
2.54.0.rc2.544.gc7ae2d5bb8-goog
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v2 6/6] firmware: samsung: acpm: Fix infinite loop on sequence number exhaustion
2026-04-27 15:04 [PATCH v2 0/6] firmware: samsung: acpm: Various fixes for sashiko bug reports Tudor Ambarus
` (4 preceding siblings ...)
2026-04-27 15:04 ` [PATCH v2 5/6] firmware: samsung: acpm: Fix out-of-bounds read and infinite loop " Tudor Ambarus
@ 2026-04-27 15:04 ` Tudor Ambarus
5 siblings, 0 replies; 11+ messages in thread
From: Tudor Ambarus @ 2026-04-27 15:04 UTC (permalink / raw)
To: Krzysztof Kozlowski, Alim Akhtar
Cc: linux-kernel, linux-samsung-soc, linux-arm-kernel, peter.griffin,
andre.draszik, jyescas, kernel-team, Tudor Ambarus, stable
Sashiko identified a possible infinite loop [1].
ACPM IPC sequence numbers are tracked via a 64-bit bitmap. Previously,
acpm_prepare_xfer() used a do...while loop to search for a free
sequence number.
If all 63 available sequence numbers are leaked due to transient
hardware timeouts or mailbox failures, the bitmap becomes full.
The next call to acpm_prepare_xfer() would enter an infinite loop.
Fix this by utilizing the kernel's optimized bitmap search functions
(find_next_zero_bit / find_first_zero_bit). If the pool is completely
exhausted, log the failure and return -EBUSY to allow the kernel to
fail gracefully instead of hanging.
Cc: stable@vger.kernel.org
Fixes: a88927b534ba ("firmware: add Exynos ACPM protocol driver")
Closes: https://sashiko.dev/#/patchset/20260420-acpm-tmu-v3-0-3dc8e93f0b26%40linaro.org [1]
Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
---
drivers/firmware/samsung/exynos-acpm.c | 36 +++++++++++++++++++++++-----------
1 file changed, 25 insertions(+), 11 deletions(-)
diff --git a/drivers/firmware/samsung/exynos-acpm.c b/drivers/firmware/samsung/exynos-acpm.c
index 43658cc1347a..f086084202fb 100644
--- a/drivers/firmware/samsung/exynos-acpm.c
+++ b/drivers/firmware/samsung/exynos-acpm.c
@@ -12,6 +12,7 @@
#include <linux/container_of.h>
#include <linux/delay.h>
#include <linux/device.h>
+#include <linux/find.h>
#include <linux/firmware/samsung/exynos-acpm-protocol.h>
#include <linux/io.h>
#include <linux/iopoll.h>
@@ -370,29 +371,40 @@ static int acpm_wait_for_queue_slots(struct acpm_chan *achan, u32 next_tx_front)
* TX queue.
* @achan: ACPM channel info.
* @xfer: reference to the transfer being prepared.
+ *
+ * Return: 0 on success, -EBUSY if the sequence number pool is exhausted.
*/
-static void acpm_prepare_xfer(struct acpm_chan *achan,
- const struct acpm_xfer *xfer)
+static int acpm_prepare_xfer(struct acpm_chan *achan,
+ const struct acpm_xfer *xfer)
{
struct acpm_rx_data *rx_data;
u32 *txd = (u32 *)xfer->txd;
+ unsigned long size = ACPM_SEQNUM_MAX - 1;
+ unsigned long bit;
+
+ bit = find_next_zero_bit(achan->bitmap_seqnum, size, achan->seqnum);
+ if (bit >= size) {
+ bit = find_first_zero_bit(achan->bitmap_seqnum, size);
+ if (bit >= size) {
+ dev_err_ratelimited(achan->acpm->dev,
+ "ACPM sequence number pool exhausted\n");
+ return -EBUSY;
+ }
+ }
- /* Prevent chan->seqnum from being re-used */
- do {
- if (++achan->seqnum == ACPM_SEQNUM_MAX)
- achan->seqnum = 1;
- } while (test_bit(achan->seqnum - 1, achan->bitmap_seqnum));
+ /* Flag the index based on seqnum. (seqnum: 1~63, bitmap: 0~62) */
+ achan->seqnum = bit + 1;
+ set_bit(bit, achan->bitmap_seqnum);
txd[0] |= FIELD_PREP(ACPM_PROTOCOL_SEQNUM, achan->seqnum);
/* Clear data for upcoming responses */
- rx_data = &achan->rx_data[achan->seqnum - 1];
+ rx_data = &achan->rx_data[bit];
memset(rx_data->cmd, 0, sizeof(*rx_data->cmd) * rx_data->n_cmd);
/* zero means no response expected */
rx_data->rxcnt = xfer->rxcnt;
- /* Flag the index based on seqnum. (seqnum: 1~63, bitmap: 0~62) */
- set_bit(achan->seqnum - 1, achan->bitmap_seqnum);
+ return 0;
}
/**
@@ -452,7 +464,9 @@ int acpm_do_xfer(struct acpm_handle *handle, const struct acpm_xfer *xfer)
if (ret)
return ret;
- acpm_prepare_xfer(achan, xfer);
+ ret = acpm_prepare_xfer(achan, xfer);
+ if (ret)
+ return ret;
/* Write TX command. */
__iowrite32_copy(achan->tx.base + achan->mlen * tx_front,
--
2.54.0.rc2.544.gc7ae2d5bb8-goog
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH v2 4/6] firmware: samsung: acpm: Fix memory ordering race in RX path
2026-04-27 15:04 ` [PATCH v2 4/6] firmware: samsung: acpm: Fix memory ordering race in RX path Tudor Ambarus
@ 2026-04-28 9:52 ` Krzysztof Kozlowski
2026-04-28 12:57 ` Tudor Ambarus
0 siblings, 1 reply; 11+ messages in thread
From: Krzysztof Kozlowski @ 2026-04-28 9:52 UTC (permalink / raw)
To: Tudor Ambarus, Alim Akhtar
Cc: linux-kernel, linux-samsung-soc, linux-arm-kernel, peter.griffin,
andre.draszik, jyescas, kernel-team, stable
On 27/04/2026 17:04, Tudor Ambarus wrote:
> Sashiko identified a memory ordering race in RX path [1].
>
> When draining the RX queue or reading saved responses, the driver uses
> clear_bit() to release the sequence number back to the available pool.
> However, on weakly ordered architectures like ARM64, clear_bit() does
> not provide implicit memory barriers.
And it does not have to if entire access is synchronized by other locks.
You need to analyze also this and mention here path which is not
synchronized and uses these weakly ordered atomic operations.
>
> This allows the CPU to reorder instructions, making the cleared bit
> globally visible before the preceding memory operations (memcpy() or
> __ioread32_copy()) have completed. If a concurrent thread allocates the
> newly freed sequence number, it can execute acpm_prepare_xfer() and
> zero out the buffer via memset() while the RX thread is still actively
> reading from it, leading to silent data corruption.
>
> Fix this by replacing clear_bit() with clear_bit_unlock() across the
> RX path. This provides release semantics, guaranteeing that all prior
> memory reads and writes are fully completed and visible before the
> sequence number is marked as free.
Barriers should be paired and release is paired with acquire.
bitmap_seqnum() is used with test_bit() and a separate set_bit(), which
do not have acquire semantics, although in some calls it is within lock.
Problem is I guess acpm_dequeue_by_polling() which is called without any
locks.
This means that other thread won't see updated values.
I think you also need to investigate and fix that acquire path.
Best regards,
Krzysztof
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2 4/6] firmware: samsung: acpm: Fix memory ordering race in RX path
2026-04-28 9:52 ` Krzysztof Kozlowski
@ 2026-04-28 12:57 ` Tudor Ambarus
2026-04-29 11:05 ` Tudor Ambarus
0 siblings, 1 reply; 11+ messages in thread
From: Tudor Ambarus @ 2026-04-28 12:57 UTC (permalink / raw)
To: Krzysztof Kozlowski, Alim Akhtar
Cc: linux-kernel, linux-samsung-soc, linux-arm-kernel, peter.griffin,
andre.draszik, jyescas, kernel-team, stable
Hi, Krzysztof,
Thanks for the review! I indeed missed something in the acquire path,
details below.
On 4/28/26 12:52 PM, Krzysztof Kozlowski wrote:
> On 27/04/2026 17:04, Tudor Ambarus wrote:
>> Sashiko identified a memory ordering race in RX path [1].
>>
>> When draining the RX queue or reading saved responses, the driver uses
>> clear_bit() to release the sequence number back to the available pool.
>> However, on weakly ordered architectures like ARM64, clear_bit() does
>> not provide implicit memory barriers.
>
> And it does not have to if entire access is synchronized by other locks.
Right.
> You need to analyze also this and mention here path which is not
> synchronized and uses these weakly ordered atomic operations.
>
The TX path is protected by tx_lock, and the RX path is protected by
rx_lock. Because the bitmap_seqnum is modified by the RX thread
(clearing the bit) and the TX thread (setting the bit), the bitmap is
accessed across two different lock domains. Therefore, from the
bitmap's perspective, the synchronization is entirely lockless, and
explicit memory barriers are required. I'll add a comment in the
commit message.
>>
>> This allows the CPU to reorder instructions, making the cleared bit
>> globally visible before the preceding memory operations (memcpy() or
>> __ioread32_copy()) have completed. If a concurrent thread allocates the
>> newly freed sequence number, it can execute acpm_prepare_xfer() and
>> zero out the buffer via memset() while the RX thread is still actively
>> reading from it, leading to silent data corruption.
>>
>> Fix this by replacing clear_bit() with clear_bit_unlock() across the
>> RX path. This provides release semantics, guaranteeing that all prior
>> memory reads and writes are fully completed and visible before the
>> sequence number is marked as free.
>
> Barriers should be paired and release is paired with acquire.
> bitmap_seqnum() is used with test_bit() and a separate set_bit(), which
> do not have acquire semantics, although in some calls it is within lock.
> Problem is I guess acpm_dequeue_by_polling() which is called without any
> locks.
>
> This means that other thread won't see updated values.
>
> I think you also need to investigate and fix that acquire path.
:) thanks for the challenge!
In acpm_dequeue_by_polling(), zero-lenght messages (rxcnt == 0) can have
their bits cleared by a concurrent thread draining the queue. If we have
our thread sitting in the do...while loop and calling test_bit() we risk
speculative execution of the caller's subsequent instructions before the
bit actually flips to zero. The fix is to s/test_bit()/test_bit_acquire().
In what concerns acpm_prepare_xfer(), where we use set_bit(), I find it
safe as it is, I don't think we need an acquire barrier. By the time the
test_bit() loop observes a 0, the RX's clear_bit_unlock() has guaranteed
that the payload was copied out. The rx_data->cmd buffer is dead.
Another thing that I thought about was about the reordering of memset
and set_bit in acpm_prepare_xfer(), but even there, the internal
execution order doesn't matter. Both are guaranteed to be completed
before writel (wmb). One may notice that I even reordered the memset and
set_bit in patch 6/6.
Also, test_bit() in acpm_prepare_xfer() will be replaced in patch 6/6
by find_next_zero_bit(), they are functionally equivalent.
I'll send a v3, s/test_bit()/test_bit_acquire()/ in RX path and update
the commit message with the details described above.
Cheers,
ta
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2 5/6] firmware: samsung: acpm: Fix out-of-bounds read and infinite loop in RX path
2026-04-27 15:04 ` [PATCH v2 5/6] firmware: samsung: acpm: Fix out-of-bounds read and infinite loop " Tudor Ambarus
@ 2026-04-28 13:04 ` Tudor Ambarus
0 siblings, 0 replies; 11+ messages in thread
From: Tudor Ambarus @ 2026-04-28 13:04 UTC (permalink / raw)
To: Krzysztof Kozlowski, Alim Akhtar
Cc: linux-kernel, linux-samsung-soc, linux-arm-kernel, peter.griffin,
andre.draszik, jyescas, kernel-team, stable
FYI, I checked sashiko's review on the set, and I shall act on this patch
and extend the checks for tx_front >= achan->qlen and add zero length checks
for qlen and mlen that were read from SRAM. I'll do that in v3.
The rest of the review feedback doesn't apply, so together with the changes
proposed in patch 4/6 that will be all.
Cheers,
ta
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2 4/6] firmware: samsung: acpm: Fix memory ordering race in RX path
2026-04-28 12:57 ` Tudor Ambarus
@ 2026-04-29 11:05 ` Tudor Ambarus
0 siblings, 0 replies; 11+ messages in thread
From: Tudor Ambarus @ 2026-04-29 11:05 UTC (permalink / raw)
To: Krzysztof Kozlowski, Alim Akhtar
Cc: linux-kernel, linux-samsung-soc, linux-arm-kernel, peter.griffin,
andre.draszik, jyescas, kernel-team, stable
On 4/28/26 3:57 PM, Tudor Ambarus wrote:
> Another thing that I thought about was about the reordering of memset
> and set_bit in acpm_prepare_xfer(), but even there, the internal
> execution order doesn't matter. Both are guaranteed to be completed
> before writel (wmb).
I need to correct myself here. While the wmb() inside writel() does
prevent the hardware from seeing incomplete state, my previous
statement was slightly misleading regarding the local CPU pipeline.
The wmb() alone does not prevent the CPU from speculatively trying
to wipe the memory before it actually finds the first zero bit in
the bitmap.
What truly prevents speculative execution here is a strict
Address Dependency (implicit barrier). The CPU mathematically cannot
calculate the destination pointer for the memset() until the bit in
the bitmap is identified. I will add a comment in the code describing
this dependency.
In what concerns that set_bit() in acpm_prepare_xfer(), we only need
to ensure it is visible before the next TX thread tries to allocate
a sequence number. That is completely protected by the tx_lock
boundaries. The RX thread does not care about set_bit() at all — it
only blind-clears bits based on the rx_seqnum it gets back from the
firmware. I'll add a comment documenting the set_bit() safety as well.
Finally, regarding test_bit() and find_next_zero_bit() in
acpm_prepare_xfer(), they are functionally equivalent. Both are
relaxed, barrier-less reads. The non-atomic find_next_zero_bit()
introduces zero concurrency problems because this phase is strictly
a read-only search (if we read the bitmap just before the RX thread
frees a bit, we simply skip to the next available one).
I'll reorder the patches and put this one as last in the set, I want
to have the find_next_zero_bit() before it, to not touch the same
code twice.
Thanks,
ta
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2026-04-29 11:06 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-27 15:04 [PATCH v2 0/6] firmware: samsung: acpm: Various fixes for sashiko bug reports Tudor Ambarus
2026-04-27 15:04 ` [PATCH v2 1/6] firmware: samsung: acpm: Fix cross-thread RX length corruption Tudor Ambarus
2026-04-27 15:04 ` [PATCH v2 2/6] firmware: samsung: acpm: Fix mailbox channel leak on probe error Tudor Ambarus
2026-04-27 15:04 ` [PATCH v2 3/6] firmware: samsung: acpm: Fix dummy stubs to return ERR_PTR Tudor Ambarus
2026-04-27 15:04 ` [PATCH v2 4/6] firmware: samsung: acpm: Fix memory ordering race in RX path Tudor Ambarus
2026-04-28 9:52 ` Krzysztof Kozlowski
2026-04-28 12:57 ` Tudor Ambarus
2026-04-29 11:05 ` Tudor Ambarus
2026-04-27 15:04 ` [PATCH v2 5/6] firmware: samsung: acpm: Fix out-of-bounds read and infinite loop " Tudor Ambarus
2026-04-28 13:04 ` Tudor Ambarus
2026-04-27 15:04 ` [PATCH v2 6/6] firmware: samsung: acpm: Fix infinite loop on sequence number exhaustion Tudor Ambarus
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox