public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 0/7] firmware: samsung: acpm: Various fixes for sashiko bug reports
@ 2026-05-04 10:15 Tudor Ambarus
  2026-05-04 10:15 ` [PATCH v4 1/7] firmware: samsung: acpm: Fix cross-thread RX length corruption Tudor Ambarus
                   ` (6 more replies)
  0 siblings, 7 replies; 9+ messages in thread
From: Tudor Ambarus @ 2026-05-04 10:15 UTC (permalink / raw)
  To: Krzysztof Kozlowski, Alim Akhtar
  Cc: linux-kernel, linux-samsung-soc, linux-arm-kernel, peter.griffin,
	andre.draszik, jyescas, kernel-team, Tudor Ambarus, stable

Fixes for concurrency and memory ordering bugs that were identified by
the Sashiko review tool when proposing the GS101 ACPM TMU addition.

While these bugs are genuine flaws, we haven't hit them yet, likely
because we don't have enough ACPM clients upstreamed to trigger the
race conditions.

These fixes can go in either at the -rc phase or as regular patches for
the next merge window. If the latter, we'll need a dedicated branch, as
these patches, together with the other ACPM thermal preparatory patches
will be needed by the upcoming GS101 ACPM thermal driver.

Thanks,
ta

Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
---
Changes in v4:
- Drop the SRAM boundary checks patch, incomplete band-aid.
- Split the concurrency and memory ordering into dedicated logical
  patches. It involved reordering of the last patches to avoid
  modifying the same code twice.
- Add a missing memory barrier in acpm_get_rx() to prevent weakly
  ordered CPUs from advancing the hardware RX pointer before the
  payload reads have completed.
- Fix a false-timeout race in the polling path by decoupling the
  polling thread from the global allocator bitmap.
- Use test_and_set_bit_lock. address dependency was not enforced
  when using the plain non-atomic read with find_next_zero_bit().
- Fix kernel doc.
- Link to v3: https://lore.kernel.org/r/20260429-acpm-fixes-sashiko-reports-v3-0-47cf74ab09ad@linaro.org

Changes in v3:
- validate more SRAM parameters and queue pointers (sashiko)
- consider/fix the acquire path (Krzysztof) - patch was moved
  last in the series to avoid touching the same code twice.
- Link to v2: https://lore.kernel.org/r/20260427-acpm-fixes-sashiko-reports-v2-0-1ff8de94a997@linaro.org

Changes in v2:
- drop patch "firmware: samsung: acpm: Fix sequence number leak and infinite loop"
  The patch freed sequence numbers on mailbox failures or timeouts. Because
  the message is already in SRAM and tx.front was advanced, a delayed
  firmware wake-up will process that abandoned message, stealing the
  sequence number from a new thread and causing silent data corruption.
- fix mailbox channel leak when `acpm_achan_alloc_cmds()` failed. Did it
  by  moving the `devm_add_action_or_reset()` call.
- new patches, last 3 in the set, they fix some more sashiko reports.
- Link to v1: https://lore.kernel.org/r/20260423-acpm-fixes-sashiko-reports-v1-0-2217b790925e@linaro.org

---
Tudor Ambarus (7):
      firmware: samsung: acpm: Fix cross-thread RX length corruption
      firmware: samsung: acpm: Fix mailbox channel leak on probe error
      firmware: samsung: acpm: Fix dummy stubs to return ERR_PTR
      firmware: samsung: acpm: Add memory barrier before advancing RX pointer
      firmware: samsung: acpm: Fix false timeouts in polling path
      firmware: samsung: acpm: Fix missing LKMM barriers in RX and TX paths
      firmware: samsung: acpm: Fix infinite loop on sequence number exhaustion

 drivers/firmware/samsung/exynos-acpm-dvfs.c        |   3 +
 drivers/firmware/samsung/exynos-acpm.c             | 119 ++++++++++++++++-----
 .../linux/firmware/samsung/exynos-acpm-protocol.h  |   3 +-
 3 files changed, 96 insertions(+), 29 deletions(-)
---
base-commit: 254f49634ee16a731174d2ae34bc50bd5f45e731
change-id: 20260423-acpm-fixes-sashiko-reports-ae28b6ed5581

Best regards,
-- 
Tudor Ambarus <tudor.ambarus@linaro.org>


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v4 1/7] firmware: samsung: acpm: Fix cross-thread RX length corruption
  2026-05-04 10:15 [PATCH v4 0/7] firmware: samsung: acpm: Various fixes for sashiko bug reports Tudor Ambarus
@ 2026-05-04 10:15 ` Tudor Ambarus
  2026-05-04 18:36   ` Krzysztof Kozlowski
  2026-05-04 10:15 ` [PATCH v4 2/7] firmware: samsung: acpm: Fix mailbox channel leak on probe error Tudor Ambarus
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 9+ messages in thread
From: Tudor Ambarus @ 2026-05-04 10:15 UTC (permalink / raw)
  To: Krzysztof Kozlowski, Alim Akhtar
  Cc: linux-kernel, linux-samsung-soc, linux-arm-kernel, peter.griffin,
	andre.draszik, jyescas, kernel-team, Tudor Ambarus, stable

Sashiko identified a cross-thread RX length corruption bug when
reviewing the thermal addition to ACPM [1].

When multiple threads concurrently send IPC requests, the ACPM polling
mechanism can encounter responses belonging to other threads. To drain
the queue, the driver saves these concurrent responses into an internal
cache (`rx_data->cmd`) to be retrieved later by the owning thread.

Previously, the driver incorrectly used `xfer->rxcnt` (the expected
receive length of the *current* polling thread) when copying data for
*other* threads into this cache. If the threads expected responses of
different lengths, this resulted in buffer underflows (leading to reads
of uninitialized memory) or potential buffer overflows.

Fix this by replacing the boolean `response` flag in
`struct acpm_rx_data` with `rxcnt`, caching the exact expected receive
length for each specific transaction during transfer preparation. Use
this cached length when saving concurrent responses.

Consequently, ensure that `xfer->rxcnt` is explicitly zeroed in driver
helpers (e.g., `acpm_dvfs_set_xfer`) for fire-and-forget messages to
prevent uninitialized stack garbage from being interpreted as a massive
expected receive length.

Cc: stable@vger.kernel.org
Fixes: a88927b534ba ("firmware: add Exynos ACPM protocol driver")
Closes: https://sashiko.dev/#/patchset/20260420-acpm-tmu-v3-0-3dc8e93f0b26%40linaro.org [1]
Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
---
 drivers/firmware/samsung/exynos-acpm-dvfs.c |  3 +++
 drivers/firmware/samsung/exynos-acpm.c      | 15 ++++++++-------
 2 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/drivers/firmware/samsung/exynos-acpm-dvfs.c b/drivers/firmware/samsung/exynos-acpm-dvfs.c
index 06bdf62dea1f..fdea7aa24ca0 100644
--- a/drivers/firmware/samsung/exynos-acpm-dvfs.c
+++ b/drivers/firmware/samsung/exynos-acpm-dvfs.c
@@ -31,6 +31,9 @@ static void acpm_dvfs_set_xfer(struct acpm_xfer *xfer, u32 *cmd, size_t cmdlen,
 	if (response) {
 		xfer->rxcnt = cmdlen;
 		xfer->rxd = cmd;
+	} else {
+		xfer->rxcnt = 0;
+		xfer->rxd = NULL;
 	}
 }
 
diff --git a/drivers/firmware/samsung/exynos-acpm.c b/drivers/firmware/samsung/exynos-acpm.c
index 16c46ed60837..e95edc350efa 100644
--- a/drivers/firmware/samsung/exynos-acpm.c
+++ b/drivers/firmware/samsung/exynos-acpm.c
@@ -104,12 +104,12 @@ struct acpm_queue {
  *
  * @cmd:	pointer to where the data shall be saved.
  * @n_cmd:	number of 32-bit commands.
- * @response:	true if the client expects the RX data.
+ * @rxcnt:	expected length of the response in 32-bit words.
  */
 struct acpm_rx_data {
 	u32 *cmd;
 	size_t n_cmd;
-	bool response;
+	size_t rxcnt;
 };
 
 #define ACPM_SEQNUM_MAX    64
@@ -199,7 +199,7 @@ static void acpm_get_saved_rx(struct acpm_chan *achan,
 	const struct acpm_rx_data *rx_data = &achan->rx_data[tx_seqnum - 1];
 	u32 rx_seqnum;
 
-	if (!rx_data->response)
+	if (!rx_data->rxcnt)
 		return;
 
 	rx_seqnum = FIELD_GET(ACPM_PROTOCOL_SEQNUM, rx_data->cmd[0]);
@@ -256,7 +256,7 @@ static int acpm_get_rx(struct acpm_chan *achan, const struct acpm_xfer *xfer)
 		seqnum = rx_seqnum - 1;
 		rx_data = &achan->rx_data[seqnum];
 
-		if (rx_data->response) {
+		if (rx_data->rxcnt) {
 			if (rx_seqnum == tx_seqnum) {
 				__ioread32_copy(xfer->rxd, addr, xfer->rxcnt);
 				rx_set = true;
@@ -268,7 +268,8 @@ static int acpm_get_rx(struct acpm_chan *achan, const struct acpm_xfer *xfer)
 				 * clear yet the bitmap. It will be cleared
 				 * after the response is copied to the request.
 				 */
-				__ioread32_copy(rx_data->cmd, addr, xfer->rxcnt);
+				__ioread32_copy(rx_data->cmd, addr,
+						rx_data->rxcnt);
 			}
 		} else {
 			clear_bit(seqnum, achan->bitmap_seqnum);
@@ -380,8 +381,8 @@ static void acpm_prepare_xfer(struct acpm_chan *achan,
 	/* Clear data for upcoming responses */
 	rx_data = &achan->rx_data[achan->seqnum - 1];
 	memset(rx_data->cmd, 0, sizeof(*rx_data->cmd) * rx_data->n_cmd);
-	if (xfer->rxd)
-		rx_data->response = true;
+	/* zero means no response expected */
+	rx_data->rxcnt = xfer->rxcnt;
 
 	/* Flag the index based on seqnum. (seqnum: 1~63, bitmap: 0~62) */
 	set_bit(achan->seqnum - 1, achan->bitmap_seqnum);

-- 
2.54.0.545.g6539524ca2-goog


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v4 2/7] firmware: samsung: acpm: Fix mailbox channel leak on probe error
  2026-05-04 10:15 [PATCH v4 0/7] firmware: samsung: acpm: Various fixes for sashiko bug reports Tudor Ambarus
  2026-05-04 10:15 ` [PATCH v4 1/7] firmware: samsung: acpm: Fix cross-thread RX length corruption Tudor Ambarus
@ 2026-05-04 10:15 ` Tudor Ambarus
  2026-05-04 10:15 ` [PATCH v4 3/7] firmware: samsung: acpm: Fix dummy stubs to return ERR_PTR Tudor Ambarus
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Tudor Ambarus @ 2026-05-04 10:15 UTC (permalink / raw)
  To: Krzysztof Kozlowski, Alim Akhtar
  Cc: linux-kernel, linux-samsung-soc, linux-arm-kernel, peter.griffin,
	andre.draszik, jyescas, kernel-team, Tudor Ambarus, stable

Sashiko identified the leak at [1].

The ACPM driver allocates hardware mailbox channels using
`mbox_request_channel()` during `acpm_channels_init()`. However, the
driver lacked a `.remove` callback and did not free these channels on
subsequent error paths inside `acpm_probe()`.

Additionally, if `acpm_achan_alloc_cmds()` failed during the channel
initialization loop, the function returned immediately, bypassing the
manual cleanup and permanently leaking any channels successfully
requested in previous loop iterations.

Fix this by modifying `acpm_free_mbox_chans()` to match the `devres`
action signature and registering it via `devm_add_action_or_reset()`.

Cc: stable@vger.kernel.org
Fixes: a88927b534ba ("firmware: add Exynos ACPM protocol driver")
Closes: https://sashiko.dev/#/patchset/20260420-acpm-tmu-v3-0-3dc8e93f0b26%40linaro.org [1]
Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
---
 drivers/firmware/samsung/exynos-acpm.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/firmware/samsung/exynos-acpm.c b/drivers/firmware/samsung/exynos-acpm.c
index e95edc350efa..9766425a44ab 100644
--- a/drivers/firmware/samsung/exynos-acpm.c
+++ b/drivers/firmware/samsung/exynos-acpm.c
@@ -527,10 +527,11 @@ static int acpm_achan_alloc_cmds(struct acpm_chan *achan)
 
 /**
  * acpm_free_mbox_chans() - free mailbox channels.
- * @acpm:	pointer to driver data.
+ * @data:	pointer to driver data.
  */
-static void acpm_free_mbox_chans(struct acpm_info *acpm)
+static void acpm_free_mbox_chans(void *data)
 {
+	struct acpm_info *acpm = data;
 	int i;
 
 	for (i = 0; i < acpm->num_chans; i++)
@@ -558,6 +559,10 @@ static int acpm_channels_init(struct acpm_info *acpm)
 	if (!acpm->chans)
 		return -ENOMEM;
 
+	ret = devm_add_action_or_reset(dev, acpm_free_mbox_chans, acpm);
+	if (ret)
+		return dev_err_probe(dev, ret, "Failed to add mbox free action.\n");
+
 	chans_shmem = acpm->sram_base + readl(&shmem->chans);
 
 	for (i = 0; i < acpm->num_chans; i++) {
@@ -579,10 +584,8 @@ static int acpm_channels_init(struct acpm_info *acpm)
 		cl->dev = dev;
 
 		achan->chan = mbox_request_channel(cl, 0);
-		if (IS_ERR(achan->chan)) {
-			acpm_free_mbox_chans(acpm);
+		if (IS_ERR(achan->chan))
 			return PTR_ERR(achan->chan);
-		}
 	}
 
 	return 0;

-- 
2.54.0.545.g6539524ca2-goog


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v4 3/7] firmware: samsung: acpm: Fix dummy stubs to return ERR_PTR
  2026-05-04 10:15 [PATCH v4 0/7] firmware: samsung: acpm: Various fixes for sashiko bug reports Tudor Ambarus
  2026-05-04 10:15 ` [PATCH v4 1/7] firmware: samsung: acpm: Fix cross-thread RX length corruption Tudor Ambarus
  2026-05-04 10:15 ` [PATCH v4 2/7] firmware: samsung: acpm: Fix mailbox channel leak on probe error Tudor Ambarus
@ 2026-05-04 10:15 ` Tudor Ambarus
  2026-05-04 10:15 ` [PATCH v4 4/7] firmware: samsung: acpm: Add memory barrier before advancing RX pointer Tudor Ambarus
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Tudor Ambarus @ 2026-05-04 10:15 UTC (permalink / raw)
  To: Krzysztof Kozlowski, Alim Akhtar
  Cc: linux-kernel, linux-samsung-soc, linux-arm-kernel, peter.griffin,
	andre.draszik, jyescas, kernel-team, Tudor Ambarus, stable

Sashiko identified a potential NULL pointer dereference [1].

The dummy stub implementation for devm_acpm_get_by_node() returns NULL
when CONFIG_EXYNOS_ACPM_PROTOCOL is disabled.

However, the active implementation of this function returns an ERR_PTR
on failure, and the consumer driver checks the return value using
IS_ERR(). Because IS_ERR(NULL) evaluates to false, returning NULL from
the stub tricks consumer drivers into treating the NULL return as a
valid handle. Subsequent attempts to access handle->ops result in a
fatal NULL pointer dereference.

Fix this by returning ERR_PTR(-ENODEV) in the disabled configuration
to correctly propagate the disabled state and match the API contract.

Cc: stable@vger.kernel.org
Fixes: 6837c006d4e7 ("firmware: exynos-acpm: add empty method to allow compile test")
Closes: https://sashiko.dev/#/patchset/20260420-acpm-tmu-v3-0-3dc8e93f0b26%40linaro.org [1]
Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
---
 include/linux/firmware/samsung/exynos-acpm-protocol.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/firmware/samsung/exynos-acpm-protocol.h b/include/linux/firmware/samsung/exynos-acpm-protocol.h
index 13f17dc4443b..d4db2796a6fb 100644
--- a/include/linux/firmware/samsung/exynos-acpm-protocol.h
+++ b/include/linux/firmware/samsung/exynos-acpm-protocol.h
@@ -8,6 +8,7 @@
 #ifndef __EXYNOS_ACPM_PROTOCOL_H
 #define __EXYNOS_ACPM_PROTOCOL_H
 
+#include <linux/err.h>
 #include <linux/types.h>
 
 struct acpm_handle;
@@ -57,7 +58,7 @@ struct acpm_handle *devm_acpm_get_by_node(struct device *dev,
 static inline struct acpm_handle *devm_acpm_get_by_node(struct device *dev,
 							struct device_node *np)
 {
-	return NULL;
+	return ERR_PTR(-ENODEV);
 }
 #endif
 

-- 
2.54.0.545.g6539524ca2-goog


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v4 4/7] firmware: samsung: acpm: Add memory barrier before advancing RX pointer
  2026-05-04 10:15 [PATCH v4 0/7] firmware: samsung: acpm: Various fixes for sashiko bug reports Tudor Ambarus
                   ` (2 preceding siblings ...)
  2026-05-04 10:15 ` [PATCH v4 3/7] firmware: samsung: acpm: Fix dummy stubs to return ERR_PTR Tudor Ambarus
@ 2026-05-04 10:15 ` Tudor Ambarus
  2026-05-04 10:15 ` [PATCH v4 5/7] firmware: samsung: acpm: Fix false timeouts in polling path Tudor Ambarus
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 9+ messages in thread
From: Tudor Ambarus @ 2026-05-04 10:15 UTC (permalink / raw)
  To: Krzysztof Kozlowski, Alim Akhtar
  Cc: linux-kernel, linux-samsung-soc, linux-arm-kernel, peter.griffin,
	andre.draszik, jyescas, kernel-team, Tudor Ambarus, stable

Sashiko identified a silent data corruption in [1].

In acpm_get_rx(), the driver reads the response payload from SRAM using
__ioread32_copy() and subsequently updates the hardware RX rear pointer
via writel().

On weakly ordered architectures like ARM64, writel() provides a write
memory barrier (wmb()), which strictly orders prior writes against
subsequent writes. However, it does not order prior reads against
subsequent writes. Consequently, the CPU is permitted to reorder the
writel() store to become globally visible before the payload reads
have completed.

If this reordering occurs, the firmware may observe the updated rear
pointer, assume the queue slot is available, and overwrite the SRAM
payload while the kernel is still actively reading from it, leading
to silent data corruption.

Fix this by inserting a full memory barrier (mb()) before the writel()
to guarantee that all payload reads have completed before the hardware
queue pointer is advanced.

Cc: stable@vger.kernel.org
Fixes: a88927b534ba ("firmware: add Exynos ACPM protocol driver")
Closes: https://sashiko.dev/#/patchset/20260429-acpm-fixes-sashiko-reports-v3-0-47cf74ab09ad%40linaro.org
Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
---
 drivers/firmware/samsung/exynos-acpm.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/firmware/samsung/exynos-acpm.c b/drivers/firmware/samsung/exynos-acpm.c
index 9766425a44ab..a9449bc33bd0 100644
--- a/drivers/firmware/samsung/exynos-acpm.c
+++ b/drivers/firmware/samsung/exynos-acpm.c
@@ -5,6 +5,7 @@
  * Copyright 2024 Linaro Ltd.
  */
 
+#include <asm/barrier.h>
 #include <linux/bitfield.h>
 #include <linux/bitmap.h>
 #include <linux/bits.h>
@@ -278,6 +279,9 @@ static int acpm_get_rx(struct acpm_chan *achan, const struct acpm_xfer *xfer)
 		i = (i + 1) % achan->qlen;
 	} while (i != rx_front);
 
+	/* Ensure all payload reads complete before advancing the rear pointer */
+	mb();
+
 	/* We saved all responses, mark RX empty. */
 	writel(rx_front, achan->rx.rear);
 

-- 
2.54.0.545.g6539524ca2-goog


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v4 5/7] firmware: samsung: acpm: Fix false timeouts in polling path
  2026-05-04 10:15 [PATCH v4 0/7] firmware: samsung: acpm: Various fixes for sashiko bug reports Tudor Ambarus
                   ` (3 preceding siblings ...)
  2026-05-04 10:15 ` [PATCH v4 4/7] firmware: samsung: acpm: Add memory barrier before advancing RX pointer Tudor Ambarus
@ 2026-05-04 10:15 ` Tudor Ambarus
  2026-05-04 10:15 ` [PATCH v4 6/7] firmware: samsung: acpm: Fix missing LKMM barriers in RX and TX paths Tudor Ambarus
  2026-05-04 10:15 ` [PATCH v4 7/7] firmware: samsung: acpm: Fix infinite loop on sequence number exhaustion Tudor Ambarus
  6 siblings, 0 replies; 9+ messages in thread
From: Tudor Ambarus @ 2026-05-04 10:15 UTC (permalink / raw)
  To: Krzysztof Kozlowski, Alim Akhtar
  Cc: linux-kernel, linux-samsung-soc, linux-arm-kernel, peter.griffin,
	andre.draszik, jyescas, kernel-team, Tudor Ambarus, stable

Sashiko identified the bug in [1].

In the ACPM driver's polling mode, the polling thread waits for a
response by monitoring the globally shared 'bitmap_seqnum' using
test_bit().

This creates a severe bit-reuse race condition. If the RX thread
successfully processes the response and frees the sequence number,
a concurrent TX thread can immediately reallocate that same sequence
number and set the bit back to 1. The original polling thread will
wake up from its udelay, observe the bit is 1, falsely assume its
transaction is still pending, and eventually timeout (-ETIME).

Fix this by decoupling the polling completion signal from the global
sequence allocator. Introduce a dedicated 'bool completed' flag per
sequence number slot. The RX thread sets this flag using
smp_store_release() once the data is safely copied, and the polling
thread waits on it using smp_load_acquire().

Cc: stable@vger.kernel.org
Fixes: a88927b534ba ("firmware: add Exynos ACPM protocol driver")
Closes: https://sashiko.dev/#/patchset/20260429-acpm-fixes-sashiko-reports-v3-0-47cf74ab09ad%40linaro.org [1]
Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
---
 drivers/firmware/samsung/exynos-acpm.c | 29 ++++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/drivers/firmware/samsung/exynos-acpm.c b/drivers/firmware/samsung/exynos-acpm.c
index a9449bc33bd0..ad677962d10b 100644
--- a/drivers/firmware/samsung/exynos-acpm.c
+++ b/drivers/firmware/samsung/exynos-acpm.c
@@ -106,11 +106,14 @@ struct acpm_queue {
  * @cmd:	pointer to where the data shall be saved.
  * @n_cmd:	number of 32-bit commands.
  * @rxcnt:	expected length of the response in 32-bit words.
+ * @completed:	flag indicating if the firmware response has been fully
+ *		processed.
  */
 struct acpm_rx_data {
 	u32 *cmd;
 	size_t n_cmd;
 	size_t rxcnt;
+	bool completed;
 };
 
 #define ACPM_SEQNUM_MAX    64
@@ -261,6 +264,12 @@ static int acpm_get_rx(struct acpm_chan *achan, const struct acpm_xfer *xfer)
 			if (rx_seqnum == tx_seqnum) {
 				__ioread32_copy(xfer->rxd, addr, xfer->rxcnt);
 				rx_set = true;
+				/*
+				 * Signal completion to the polling thread.
+				 * Pairs with smp_load_acquire() in polling
+				 * loop.
+				 */
+				smp_store_release(&rx_data->completed, true);
 				clear_bit(seqnum, achan->bitmap_seqnum);
 			} else {
 				/*
@@ -271,8 +280,19 @@ static int acpm_get_rx(struct acpm_chan *achan, const struct acpm_xfer *xfer)
 				 */
 				__ioread32_copy(rx_data->cmd, addr,
 						rx_data->rxcnt);
+				/*
+				 * Signal completion to the polling thread.
+				 * Pairs with smp_load_acquire() in polling
+				 * loop.
+				 */
+				smp_store_release(&rx_data->completed, true);
 			}
 		} else {
+			/*
+			 * Signal completion to the polling thread.
+			 * Pairs with smp_load_acquire() in polling loop.
+			 */
+			smp_store_release(&rx_data->completed, true);
 			clear_bit(seqnum, achan->bitmap_seqnum);
 		}
 
@@ -318,7 +338,13 @@ static int acpm_dequeue_by_polling(struct acpm_chan *achan,
 		if (ret)
 			return ret;
 
-		if (!test_bit(seqnum - 1, achan->bitmap_seqnum))
+		/*
+		 * Safely check if our specific transaction has been processed.
+		 * smp_load_acquire prevents the CPU from speculatively
+		 * executing subsequent instructions before the transaction is
+		 * synchronized.
+		 */
+		if (smp_load_acquire(&achan->rx_data[seqnum - 1].completed))
 			return 0;
 
 		/* Determined experimentally. */
@@ -384,6 +410,7 @@ static void acpm_prepare_xfer(struct acpm_chan *achan,
 
 	/* Clear data for upcoming responses */
 	rx_data = &achan->rx_data[achan->seqnum - 1];
+	rx_data->completed = false;
 	memset(rx_data->cmd, 0, sizeof(*rx_data->cmd) * rx_data->n_cmd);
 	/* zero means no response expected */
 	rx_data->rxcnt = xfer->rxcnt;

-- 
2.54.0.545.g6539524ca2-goog


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v4 6/7] firmware: samsung: acpm: Fix missing LKMM barriers in RX and TX paths
  2026-05-04 10:15 [PATCH v4 0/7] firmware: samsung: acpm: Various fixes for sashiko bug reports Tudor Ambarus
                   ` (4 preceding siblings ...)
  2026-05-04 10:15 ` [PATCH v4 5/7] firmware: samsung: acpm: Fix false timeouts in polling path Tudor Ambarus
@ 2026-05-04 10:15 ` Tudor Ambarus
  2026-05-04 10:15 ` [PATCH v4 7/7] firmware: samsung: acpm: Fix infinite loop on sequence number exhaustion Tudor Ambarus
  6 siblings, 0 replies; 9+ messages in thread
From: Tudor Ambarus @ 2026-05-04 10:15 UTC (permalink / raw)
  To: Krzysztof Kozlowski, Alim Akhtar
  Cc: linux-kernel, linux-samsung-soc, linux-arm-kernel, peter.griffin,
	andre.draszik, jyescas, kernel-team, Tudor Ambarus, stable

Sashiko identified memory ordering races in [1].

The ACPM driver relies on a globally shared 'bitmap_seqnum' to
synchronize sequence number allocations across the TX and RX lock
domains. The TX thread allocates bits, while the RX thread frees them.

Because these operations cross lock domains, they are effectively
lockless and require explicit memory barriers. Previously, the driver
used plain bitwise operators (test_bit, set_bit, clear_bit), which lack
LKMM ordering guarantees. This creates two severe race conditions on
weakly ordered architectures like ARM64:

1. RX Release Violation: The RX thread reads the response payload and
   calls clear_bit(). Without a release barrier, the CPU can make the
   cleared bit globally visible before the memory reads complete.
2. TX Acquire Violation: The TX thread loops on test_bit() and then
   writes to the payload buffer. Without an acquire barrier, the CPU
   can speculatively execute the buffer wipe (memset) before the
   sequence number is safely claimed.

If these reorderings overlap, a TX thread can overwrite the buffer
while the RX thread is still actively reading from it.

Fix this by upgrading the bitwise operators. Wrap the TX allocation in
test_and_set_bit_lock() to establish formal LKMM Acquire semantics, and
pair it with clear_bit_unlock() in the RX path to enforce Release
semantics.

Cc: stable@vger.kernel.org
Fixes: a88927b534ba ("firmware: add Exynos ACPM protocol driver")
Closes: https://sashiko.dev/#/patchset/20260423-acpm-fixes-sashiko-reports-v1-0-2217b790925e%40linaro.org [1]
Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
---
 drivers/firmware/samsung/exynos-acpm.c | 29 ++++++++++++++++++++---------
 1 file changed, 20 insertions(+), 9 deletions(-)

diff --git a/drivers/firmware/samsung/exynos-acpm.c b/drivers/firmware/samsung/exynos-acpm.c
index ad677962d10b..6fc6175b8924 100644
--- a/drivers/firmware/samsung/exynos-acpm.c
+++ b/drivers/firmware/samsung/exynos-acpm.c
@@ -8,7 +8,7 @@
 #include <asm/barrier.h>
 #include <linux/bitfield.h>
 #include <linux/bitmap.h>
-#include <linux/bits.h>
+#include <linux/bitops.h>
 #include <linux/cleanup.h>
 #include <linux/container_of.h>
 #include <linux/delay.h>
@@ -210,7 +210,12 @@ static void acpm_get_saved_rx(struct acpm_chan *achan,
 
 	if (rx_seqnum == tx_seqnum) {
 		memcpy(xfer->rxd, rx_data->cmd, xfer->rxcnt * sizeof(*xfer->rxd));
-		clear_bit(rx_seqnum - 1, achan->bitmap_seqnum);
+		/*
+		 * Enforce release semantics. Ensures the payload memcpy
+		 * completes before the sequence number is globally visible as
+		 * free.
+		 */
+		clear_bit_unlock(rx_seqnum - 1, achan->bitmap_seqnum);
 	}
 }
 
@@ -270,7 +275,8 @@ static int acpm_get_rx(struct acpm_chan *achan, const struct acpm_xfer *xfer)
 				 * loop.
 				 */
 				smp_store_release(&rx_data->completed, true);
-				clear_bit(seqnum, achan->bitmap_seqnum);
+				/* Enforce Release semantics for payload reads */
+				clear_bit_unlock(seqnum, achan->bitmap_seqnum);
 			} else {
 				/*
 				 * The RX data corresponds to another request.
@@ -293,7 +299,8 @@ static int acpm_get_rx(struct acpm_chan *achan, const struct acpm_xfer *xfer)
 			 * Pairs with smp_load_acquire() in polling loop.
 			 */
 			smp_store_release(&rx_data->completed, true);
-			clear_bit(seqnum, achan->bitmap_seqnum);
+			/* Enforce Release semantics when dropping unneeded payloads */
+			clear_bit_unlock(seqnum, achan->bitmap_seqnum);
 		}
 
 		i = (i + 1) % achan->qlen;
@@ -400,11 +407,18 @@ static void acpm_prepare_xfer(struct acpm_chan *achan,
 	struct acpm_rx_data *rx_data;
 	u32 *txd = (u32 *)xfer->txd;
 
-	/* Prevent chan->seqnum from being re-used */
+	/*
+	 * Prevent chan->seqnum from being re-used.
+	 * test_and_set_bit_lock() provides formal LKMM Acquire semantics.
+	 * It pairs with the RX thread's clear_bit_unlock() to ensure the CPU
+	 * does not speculatively execute the rx_data buffer wipe (memset)
+	 * before the sequence number is safely claimed.
+	 */
 	do {
 		if (++achan->seqnum == ACPM_SEQNUM_MAX)
 			achan->seqnum = 1;
-	} while (test_bit(achan->seqnum - 1, achan->bitmap_seqnum));
+		/* Flag the index based on seqnum. (seqnum: 1~63, bitmap: 0~62) */
+	} while (test_and_set_bit_lock(achan->seqnum - 1, achan->bitmap_seqnum));
 
 	txd[0] |= FIELD_PREP(ACPM_PROTOCOL_SEQNUM, achan->seqnum);
 
@@ -414,9 +428,6 @@ static void acpm_prepare_xfer(struct acpm_chan *achan,
 	memset(rx_data->cmd, 0, sizeof(*rx_data->cmd) * rx_data->n_cmd);
 	/* zero means no response expected */
 	rx_data->rxcnt = xfer->rxcnt;
-
-	/* Flag the index based on seqnum. (seqnum: 1~63, bitmap: 0~62) */
-	set_bit(achan->seqnum - 1, achan->bitmap_seqnum);
 }
 
 /**

-- 
2.54.0.545.g6539524ca2-goog


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v4 7/7] firmware: samsung: acpm: Fix infinite loop on sequence number exhaustion
  2026-05-04 10:15 [PATCH v4 0/7] firmware: samsung: acpm: Various fixes for sashiko bug reports Tudor Ambarus
                   ` (5 preceding siblings ...)
  2026-05-04 10:15 ` [PATCH v4 6/7] firmware: samsung: acpm: Fix missing LKMM barriers in RX and TX paths Tudor Ambarus
@ 2026-05-04 10:15 ` Tudor Ambarus
  6 siblings, 0 replies; 9+ messages in thread
From: Tudor Ambarus @ 2026-05-04 10:15 UTC (permalink / raw)
  To: Krzysztof Kozlowski, Alim Akhtar
  Cc: linux-kernel, linux-samsung-soc, linux-arm-kernel, peter.griffin,
	andre.draszik, jyescas, kernel-team, Tudor Ambarus, stable

Sashiko identified a possible infinite loop [1].

ACPM IPC sequence numbers are tracked via a 64-bit bitmap. Previously,
acpm_prepare_xfer() used a do...while loop to search for a free
sequence number.

If all 63 available sequence numbers are leaked due to transient
hardware timeouts or mailbox failures, the bitmap becomes full.
The next call to acpm_prepare_xfer() would enter an infinite loop.

Fix this by utilizing the kernel's optimized bitmap search functions
(find_next_zero_bit / find_first_zero_bit). If the pool is completely
exhausted, log the failure and return -EBUSY to allow the kernel to
fail gracefully instead of hanging.

Furthermore, drop the allocation loop entirely. Because
acpm_prepare_xfer() is strictly called under the 'tx_lock' mutex,
sequence number allocations are perfectly serialized. If
find_next_zero_bit() locates a free bit, a single
test_and_set_bit_lock() is mathematically guaranteed to succeed.

To enforce this locking invariant, wrap the allocation in a
WARN_ON_ONCE. If the atomic set fails, it indicates the driver's
mutex serialization is fundamentally broken. The warning generates a
stack trace for debugging, while returning -EIO immediately aborts the
transfer to prevent silent payload corruption.

Cc: stable@vger.kernel.org
Fixes: a88927b534ba ("firmware: add Exynos ACPM protocol driver")
Closes: https://sashiko.dev/#/patchset/20260420-acpm-tmu-v3-0-3dc8e93f0b26%40linaro.org [1]
Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
---
 drivers/firmware/samsung/exynos-acpm.c | 45 +++++++++++++++++++++++-----------
 1 file changed, 31 insertions(+), 14 deletions(-)

diff --git a/drivers/firmware/samsung/exynos-acpm.c b/drivers/firmware/samsung/exynos-acpm.c
index 6fc6175b8924..e31a1b616391 100644
--- a/drivers/firmware/samsung/exynos-acpm.c
+++ b/drivers/firmware/samsung/exynos-acpm.c
@@ -13,6 +13,7 @@
 #include <linux/container_of.h>
 #include <linux/delay.h>
 #include <linux/device.h>
+#include <linux/find.h>
 #include <linux/firmware/samsung/exynos-acpm-protocol.h>
 #include <linux/io.h>
 #include <linux/iopoll.h>
@@ -400,34 +401,48 @@ static int acpm_wait_for_queue_slots(struct acpm_chan *achan, u32 next_tx_front)
  * TX queue.
  * @achan:	ACPM channel info.
  * @xfer:	reference to the transfer being prepared.
+ *
+ * Return: 0 on success, -errno otherwise.
  */
-static void acpm_prepare_xfer(struct acpm_chan *achan,
-			      const struct acpm_xfer *xfer)
+static int acpm_prepare_xfer(struct acpm_chan *achan,
+			     const struct acpm_xfer *xfer)
 {
 	struct acpm_rx_data *rx_data;
 	u32 *txd = (u32 *)xfer->txd;
+	unsigned long size = ACPM_SEQNUM_MAX - 1;
+	unsigned long bit = achan->seqnum;
+
+	bit = find_next_zero_bit(achan->bitmap_seqnum, size, bit);
+	if (bit >= size) {
+		bit = find_first_zero_bit(achan->bitmap_seqnum, size);
+		if (bit >= size) {
+			dev_err_ratelimited(achan->acpm->dev,
+					    "ACPM sequence number pool exhausted\n");
+			return -EBUSY;
+		}
+	}
 
 	/*
-	 * Prevent chan->seqnum from being re-used.
-	 * test_and_set_bit_lock() provides formal LKMM Acquire semantics.
-	 * It pairs with the RX thread's clear_bit_unlock() to ensure the CPU
-	 * does not speculatively execute the rx_data buffer wipe (memset)
-	 * before the sequence number is safely claimed.
+	 * Execute the atomic set to formally claim the bit and establish
+	 * LKMM Acquire semantics against the RX thread's clear_bit_unlock().
+	 * A loop is unnecessary because allocations are strictly serialized
+	 * by tx_lock.
 	 */
-	do {
-		if (++achan->seqnum == ACPM_SEQNUM_MAX)
-			achan->seqnum = 1;
-		/* Flag the index based on seqnum. (seqnum: 1~63, bitmap: 0~62) */
-	} while (test_and_set_bit_lock(achan->seqnum - 1, achan->bitmap_seqnum));
+	if (WARN_ON_ONCE(test_and_set_bit_lock(bit, achan->bitmap_seqnum)))
+		return -EIO;
 
+	/* Flag the index based on seqnum. (seqnum: 1~63, bitmap: 0~62) */
+	achan->seqnum = bit + 1;
 	txd[0] |= FIELD_PREP(ACPM_PROTOCOL_SEQNUM, achan->seqnum);
 
 	/* Clear data for upcoming responses */
-	rx_data = &achan->rx_data[achan->seqnum - 1];
+	rx_data = &achan->rx_data[bit];
 	rx_data->completed = false;
 	memset(rx_data->cmd, 0, sizeof(*rx_data->cmd) * rx_data->n_cmd);
 	/* zero means no response expected */
 	rx_data->rxcnt = xfer->rxcnt;
+
+	return 0;
 }
 
 /**
@@ -487,7 +502,9 @@ int acpm_do_xfer(struct acpm_handle *handle, const struct acpm_xfer *xfer)
 		if (ret)
 			return ret;
 
-		acpm_prepare_xfer(achan, xfer);
+		ret = acpm_prepare_xfer(achan, xfer);
+		if (ret)
+			return ret;
 
 		/* Write TX command. */
 		__iowrite32_copy(achan->tx.base + achan->mlen * tx_front,

-- 
2.54.0.545.g6539524ca2-goog


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v4 1/7] firmware: samsung: acpm: Fix cross-thread RX length corruption
  2026-05-04 10:15 ` [PATCH v4 1/7] firmware: samsung: acpm: Fix cross-thread RX length corruption Tudor Ambarus
@ 2026-05-04 18:36   ` Krzysztof Kozlowski
  0 siblings, 0 replies; 9+ messages in thread
From: Krzysztof Kozlowski @ 2026-05-04 18:36 UTC (permalink / raw)
  To: Tudor Ambarus, Alim Akhtar
  Cc: linux-kernel, linux-samsung-soc, linux-arm-kernel, peter.griffin,
	andre.draszik, jyescas, kernel-team, stable

On 04/05/2026 12:15, Tudor Ambarus wrote:
> Sashiko identified a cross-thread RX length corruption bug when
> reviewing the thermal addition to ACPM [1].
> 
> When multiple threads concurrently send IPC requests, the ACPM polling
> mechanism can encounter responses belonging to other threads. To drain
> the queue, the driver saves these concurrent responses into an internal
> cache (`rx_data->cmd`) to be retrieved later by the owning thread.
> 
> Previously, the driver incorrectly used `xfer->rxcnt` (the expected
> receive length of the *current* polling thread) when copying data for
> *other* threads into this cache. If the threads expected responses of
> different lengths, this resulted in buffer underflows (leading to reads
> of uninitialized memory) or potential buffer overflows.
> 
> Fix this by replacing the boolean `response` flag in
> `struct acpm_rx_data` with `rxcnt`, caching the exact expected receive
> length for each specific transaction during transfer preparation. Use
> this cached length when saving concurrent responses.
> 
> Consequently, ensure that `xfer->rxcnt` is explicitly zeroed in driver
> helpers (e.g., `acpm_dvfs_set_xfer`) for fire-and-forget messages to
> prevent uninitialized stack garbage from being interpreted as a massive
> expected receive length.
> 
> Cc: stable@vger.kernel.org
> Fixes: a88927b534ba ("firmware: add Exynos ACPM protocol driver")
> Closes: https://sashiko.dev/#/patchset/20260420-acpm-tmu-v3-0-3dc8e93f0b26%40linaro.org [1]
> Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>

I think parallel credits for Titouan Ameline would be suitable here.
If there is going to be new version, please also add:

Reported-by: Titouan Ameline <titouan.ameline@gmail.com>
Closes: https://lore.kernel.org/r/20260426210255.73674-1-titouan.ameline@gmail.com/


Best regards,
Krzysztof

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2026-05-04 18:36 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-04 10:15 [PATCH v4 0/7] firmware: samsung: acpm: Various fixes for sashiko bug reports Tudor Ambarus
2026-05-04 10:15 ` [PATCH v4 1/7] firmware: samsung: acpm: Fix cross-thread RX length corruption Tudor Ambarus
2026-05-04 18:36   ` Krzysztof Kozlowski
2026-05-04 10:15 ` [PATCH v4 2/7] firmware: samsung: acpm: Fix mailbox channel leak on probe error Tudor Ambarus
2026-05-04 10:15 ` [PATCH v4 3/7] firmware: samsung: acpm: Fix dummy stubs to return ERR_PTR Tudor Ambarus
2026-05-04 10:15 ` [PATCH v4 4/7] firmware: samsung: acpm: Add memory barrier before advancing RX pointer Tudor Ambarus
2026-05-04 10:15 ` [PATCH v4 5/7] firmware: samsung: acpm: Fix false timeouts in polling path Tudor Ambarus
2026-05-04 10:15 ` [PATCH v4 6/7] firmware: samsung: acpm: Fix missing LKMM barriers in RX and TX paths Tudor Ambarus
2026-05-04 10:15 ` [PATCH v4 7/7] firmware: samsung: acpm: Fix infinite loop on sequence number exhaustion Tudor Ambarus

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox