* [PATCH] cxl/test: Remove ret_limit race condition in mock_get_event()
@ 2025-11-16 1:37 Alison Schofield
2025-11-18 2:12 ` Ira Weiny
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: Alison Schofield @ 2025-11-16 1:37 UTC (permalink / raw)
To: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Alison Schofield,
Vishal Verma, Ira Weiny, Dan Williams
Cc: linux-cxl, Itaru Kitayama, Fabio M . De Francesco
Commit 364ee9f3265e ("cxl/test: Enhance event testing") changed the
loop bound in mock_get_event() from a static constant,
CXL_TEST_EVENT_CNT, to a dynamic global variable, ret_limit. The
intent was to vary the number of events returned per call to simulate
events occurring while logs are being read.
However, ret_limit is modified without synchronization. When multiple
threads call mock_get_event() concurrently, one thread may read
ret_limit, another thread may increment it, and the first thread's
loop condition and size calculation see and use the updated value.
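The interleaving described above can be replayed deterministically in userspace. The sketch below is a hypothetical, single-threaded model, not the cxl_test code: RET_MAX stands in for CXL_TEST_EVENT_RET_MAX, old_bump_limit() copies the removed update rule, and replay_race() runs call B's update between call A's size check and A's loop.

```c
#include <assert.h>

/* Stand-in for CXL_TEST_EVENT_RET_MAX (4 in the patch). */
#define RET_MAX 4

/* Models the old file-scope "static int ret_limit": shared, no locking. */
static int ret_limit;

/* The removed update rule; it only ever produces 1, 2, or 3. */
static void old_bump_limit(void)
{
	ret_limit = (ret_limit + 1) % RET_MAX;
	if (!ret_limit)
		ret_limit = 1;
	assert(ret_limit >= 1 && ret_limit < RET_MAX);
}

/*
 * Hypothetical single-threaded replay of one bad interleaving:
 * call A updates ret_limit and sizes its output check from it,
 * call B updates the global, then A's loop re-reads the global.
 */
static void replay_race(int *checked_for, int *loop_bound)
{
	ret_limit = 0;			/* fresh module state */

	old_bump_limit();		/* A: update, ret_limit == 1 */
	*checked_for = ret_limit;	/* A: size_out check uses 1 */

	old_bump_limit();		/* B: runs in between, ret_limit == 2 */

	*loop_bound = ret_limit;	/* A: loop condition now sees 2 */
}
```

Starting from 0, call A checks the output size for 1 record while its loop bound has already moved to 2, which is the inconsistent state the commit message describes.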
This is visible during cxl_test module load when all memdevs are
initializing simultaneously, which includes getting event records. It
is not tied to the cxl-events.sh unit test specifically, as that
operates on a single memdev.
While no actual harm results (the buffer is always large enough and
the record count fields correctly reflect what was written), this is
a correctness issue. The race creates an inconsistent state within
mock_get_event(), and adding variability based on a race appears
unintended.
Make ret_limit a local variable populated from an atomic counter. Each
call gets a stable value that won't change during execution. That
preserves the intended behavior of varying the return counts across
calls while eliminating the race condition.
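A rough userspace analogue of the fix, assuming C11 <stdatomic.h> and POSIX threads in place of the kernel's atomic_t (build with -pthread). atomic_fetch_add() returns the old value, so adding 1 mirrors atomic_inc_return(). The names next_ret_limit(), struct worker, and run_workers() are illustrative only, not part of the patch. Each simulated call reads the counter once into a local ret_limit, so its value cannot change mid-call.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define RET_MAX 4		/* stand-in for CXL_TEST_EVENT_RET_MAX */
#define NR_THREADS 8		/* hypothetical: one thread per "memdev" */
#define CALLS_PER_THREAD 1000

/* Userspace analogue of the patch's "static atomic_t event_counter". */
static atomic_uint event_counter;

/*
 * One atomic increment per call. atomic_fetch_add() returns the old
 * value, so "+ 1" mirrors the kernel's atomic_inc_return().
 */
static int next_ret_limit(void)
{
	unsigned int n = atomic_fetch_add(&event_counter, 1) + 1;

	return (int)(n % RET_MAX) + 1;
}

struct worker {
	pthread_t tid;
	unsigned long hist[RET_MAX + 1];	/* hist[k]: calls that got limit k */
};

static void *worker_fn(void *arg)
{
	struct worker *w = arg;

	for (int i = 0; i < CALLS_PER_THREAD; i++) {
		/* Local copy: stable for the rest of this simulated call. */
		int ret_limit = next_ret_limit();

		assert(ret_limit >= 1 && ret_limit <= RET_MAX);
		w->hist[ret_limit]++;
	}
	return NULL;
}

/* Run all workers concurrently and sum their histograms into total[]. */
static void run_workers(unsigned long total[RET_MAX + 1])
{
	static struct worker w[NR_THREADS];

	atomic_store(&event_counter, 0);
	for (int k = 0; k <= RET_MAX; k++)
		total[k] = 0;

	for (int t = 0; t < NR_THREADS; t++) {
		for (int k = 0; k <= RET_MAX; k++)
			w[t].hist[k] = 0;
		/* Error handling omitted for brevity. */
		pthread_create(&w[t].tid, NULL, worker_fn, &w[t]);
	}
	for (int t = 0; t < NR_THREADS; t++) {
		pthread_join(w[t].tid, NULL);
		for (int k = 0; k <= RET_MAX; k++)
			total[k] += w[t].hist[k];
	}
}
```

Because every counter value from 1 to 8000 is handed out exactly once, each limit from 1 to 4 is returned exactly 2000 times no matter how the threads interleave; an unsynchronized read-modify-write on a plain int gives no such guarantee. The sketch uses an unsigned counter so the modulo result stays non-negative if the counter ever wraps.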
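This implementation uses "+ 1" to produce the full range of 1 to
CXL_TEST_EVENT_RET_MAX (4) records: the modulo yields 0 to 3, and
"+ 1" shifts that to 1 to 4. Previously only 1, 2, 3 were produced,
because the old code remapped 0 to 1 instead of shifting the range.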
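To compare the two ranges, the hypothetical helpers below evaluate the old and new formulas for successive calls, single-threaded. old_next() reproduces the removed lines, and new_limit() takes the value atomic_inc_return() would return (the counter starts at 0, so the first call sees 1). RET_MAX again stands in for CXL_TEST_EVENT_RET_MAX.

```c
#include <assert.h>

/* Stand-in for CXL_TEST_EVENT_RET_MAX (4 in the patch). */
#define RET_MAX 4

/* Old rule (the removed lines): (prev + 1) % RET_MAX, with 0 remapped to 1. */
static int old_next(int prev)
{
	int v = (prev + 1) % RET_MAX;

	if (!v)
		v = 1;
	assert(v >= 1 && v < RET_MAX);	/* RET_MAX itself is unreachable */
	return v;
}

/* New rule: the value atomic_inc_return() would hand back, mod RET_MAX, plus 1. */
static int new_limit(unsigned int counter)
{
	return (int)(counter % RET_MAX) + 1;
}

/* Record the limits each rule yields for n successive calls. */
static void first_limits(int n, int old_out[], int new_out[])
{
	int prev = 0;	/* the old static ret_limit started at 0 */

	for (int i = 0; i < n; i++) {
		prev = old_next(prev);
		old_out[i] = prev;
		/* Counter starts at 0, so call i sees i + 1. */
		new_out[i] = new_limit((unsigned int)i + 1);
	}
}
```

For eight calls the old rule gives 1, 2, 3, 1, 2, 3, 1, 2 and never reaches 4, while the new rule gives 2, 3, 4, 1, 2, 3, 4, 1. The new sequence starts at 2 rather than 1 because atomic_inc_return() returns the incremented value.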
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
---
This was found while chasing a NULL payload_out issue in mock_get_event()
that Itaru reported here [1], and that Fabio and I have both seen but have
not been able to reproduce reliably. Although the accounting with respect
to ret_limit can be wrong, no potential overflow was found.
[1] https://lore.kernel.org/linux-cxl/49A4B521-AB66-4037-A23D-1D0D7AF0F42F@linux.dev/
tools/testing/cxl/test/mem.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index d533481672b7..6809c4a26f5e 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -250,22 +250,22 @@ static void mes_add_event(struct mock_event_store *mes,
* Vary the number of events returned to simulate events occuring while the
* logs are being read.
*/
-static int ret_limit = 0;
+static atomic_t event_counter = ATOMIC_INIT(0);
static int mock_get_event(struct device *dev, struct cxl_mbox_cmd *cmd)
{
struct cxl_get_event_payload *pl;
struct mock_event_log *log;
u16 nr_overflow;
+ int ret_limit;
u8 log_type;
int i;
if (cmd->size_in != sizeof(log_type))
return -EINVAL;
- ret_limit = (ret_limit + 1) % CXL_TEST_EVENT_RET_MAX;
- if (!ret_limit)
- ret_limit = 1;
+ /* Vary return limit from 1 to CXL_TEST_EVENT_RET_MAX */
+ ret_limit = (atomic_inc_return(&event_counter) % CXL_TEST_EVENT_RET_MAX) + 1;
if (cmd->size_out < struct_size(pl, records, ret_limit))
return -EINVAL;
base-commit: e9a6fb0bcdd7609be6969112f3fbfcce3b1d4a7c
--
2.37.3
* Re: [PATCH] cxl/test: Remove ret_limit race condition in mock_get_event()
From: Ira Weiny @ 2025-11-18 2:12 UTC (permalink / raw)
To: Alison Schofield, Davidlohr Bueso, Jonathan Cameron, Dave Jiang,
Vishal Verma, Ira Weiny, Dan Williams
Cc: linux-cxl, Itaru Kitayama, Fabio M . De Francesco
Alison Schofield wrote:
> Commit 364ee9f3265e ("cxl/test: Enhance event testing") changed the
> loop iterator in mock_get_event() from a static constant,
> CXL_TEST_EVENT_CNT, to a dynamic global variable, ret_limit. The
> intent was to vary the number of events returned per call to simulate
> events occurring while logs are being read.
>
> However, ret_limit is modified without synchronization. When multiple
> threads call mock_get_event() concurrently, one thread may read
> ret_limit, another thread may increment it, and the first thread's
> loop condition and size calculation see and use the updated value.
>
> This is visible during cxl_test module load when all memdevs are
> initializing simultaneously, which includes getting event records. It
> is not tied to the cxl-events.sh unit test specifically, as that
> operates on a single memdev.
>
> While no actual harm results (the buffer is always large enough and
> the record count fields correctly reflect what was written), this is
> a correctness issue. The race creates an inconsistent state within
> mock_get_event() and adding variability based on a race appears
> unintended.
>
> Make ret_limit a local variable populated from an atomic counter.
Ah yea... good catch.
>
> Each
> call gets a stable value that won't change during execution. That
> preserves the intended behavior of varying the return counts across
> calls while eliminating the race condition.
>
> This implementation uses "+ 1" to produce the full range of 1 to
> CXL_TEST_EVENT_RET_MAX (4) records. Previously only 1, 2, 3 were
> produced.
Does cxl-events.sh need the limits to increment? Would it be better to use
a random number of events?
Regardless.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
[snip]
* Re: [PATCH] cxl/test: Remove ret_limit race condition in mock_get_event()
From: Dave Jiang @ 2025-11-18 23:17 UTC (permalink / raw)
To: Alison Schofield, Davidlohr Bueso, Jonathan Cameron, Vishal Verma,
Ira Weiny, Dan Williams
Cc: linux-cxl, Itaru Kitayama, Fabio M . De Francesco
On 11/15/25 6:37 PM, Alison Schofield wrote:
[snip]
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
[snip]
* Re: [PATCH] cxl/test: Remove ret_limit race condition in mock_get_event()
From: Dave Jiang @ 2025-11-18 23:35 UTC (permalink / raw)
To: Alison Schofield, Davidlohr Bueso, Jonathan Cameron, Vishal Verma,
Ira Weiny, Dan Williams
Cc: linux-cxl, Itaru Kitayama, Fabio M . De Francesco
On 11/15/25 6:37 PM, Alison Schofield wrote:
[snip]
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
applied to cxl/next
b6369daf0d6a cxl/test: Remove ret_limit race condition in mock_get_event()
[snip]