* [PATCH] cxl/test: Remove ret_limit race condition in mock_get_event()
@ 2025-11-16  1:37 Alison Schofield
  2025-11-18  2:12 ` Ira Weiny
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Alison Schofield @ 2025-11-16  1:37 UTC (permalink / raw)
  To: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Alison Schofield,
	Vishal Verma, Ira Weiny, Dan Williams
  Cc: linux-cxl, Itaru Kitayama, Fabio M . De Francesco

Commit 364ee9f3265e ("cxl/test: Enhance event testing") changed the
loop iterator in mock_get_event() from a static constant,
CXL_TEST_EVENT_CNT, to a dynamic global variable, ret_limit. The
intent was to vary the number of events returned per call to simulate
events occurring while logs are being read.

However, ret_limit is modified without synchronization. When multiple
threads call mock_get_event() concurrently, one thread may read
ret_limit, another thread may increment it, and the first thread's
loop condition and size calculation see and use the updated value.

This is visible during cxl_test module load when all memdevs are
initializing simultaneously, which includes getting event records. It
is not tied to the cxl-events.sh unit test specifically, as that
operates on a single memdev.

While no actual harm results (the buffer is always large enough and
the record count fields correctly reflect what was written), this is
a correctness issue. The race creates an inconsistent state within
mock_get_event() and adding variability based on a race appears
unintended.

Make ret_limit a local variable populated from an atomic counter. Each
call gets a stable value that won't change during execution. That
preserves the intended behavior of varying the return counts across
calls while eliminating the race condition.

This implementation uses "+ 1" to produce the full range of 1 to
CXL_TEST_EVENT_RET_MAX (4) records. Previously only 1, 2, 3 were
produced.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
---

This was found while chasing a NULL payload_out issue in mock_get_event()
that Itaru reported here [1] and Fabio and I have both seen but not been
able to reliably reproduce. Although the accounting can be wrong wrt
ret_limit, no potential overflow was found.

[1] https://lore.kernel.org/linux-cxl/49A4B521-AB66-4037-A23D-1D0D7AF0F42F@linux.dev/


 tools/testing/cxl/test/mem.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index d533481672b7..6809c4a26f5e 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -250,22 +250,22 @@ static void mes_add_event(struct mock_event_store *mes,
  * Vary the number of events returned to simulate events occuring while the
  * logs are being read.
  */
-static int ret_limit = 0;
+static atomic_t event_counter = ATOMIC_INIT(0);
 
 static int mock_get_event(struct device *dev, struct cxl_mbox_cmd *cmd)
 {
 	struct cxl_get_event_payload *pl;
 	struct mock_event_log *log;
 	u16 nr_overflow;
+	int ret_limit;
 	u8 log_type;
 	int i;
 
 	if (cmd->size_in != sizeof(log_type))
 		return -EINVAL;
 
-	ret_limit = (ret_limit + 1) % CXL_TEST_EVENT_RET_MAX;
-	if (!ret_limit)
-		ret_limit = 1;
+	/* Vary return limit from 1 to CXL_TEST_EVENT_RET_MAX */
+	ret_limit = (atomic_inc_return(&event_counter) % CXL_TEST_EVENT_RET_MAX) + 1;
 
 	if (cmd->size_out < struct_size(pl, records, ret_limit))
 		return -EINVAL;

base-commit: e9a6fb0bcdd7609be6969112f3fbfcce3b1d4a7c
-- 
2.37.3



* Re: [PATCH] cxl/test: Remove ret_limit race condition in mock_get_event()
  2025-11-16  1:37 [PATCH] cxl/test: Remove ret_limit race condition in mock_get_event() Alison Schofield
@ 2025-11-18  2:12 ` Ira Weiny
  2025-11-18 23:17 ` Dave Jiang
  2025-11-18 23:35 ` Dave Jiang
  2 siblings, 0 replies; 4+ messages in thread
From: Ira Weiny @ 2025-11-18  2:12 UTC (permalink / raw)
  To: Alison Schofield, Davidlohr Bueso, Jonathan Cameron, Dave Jiang,
	Vishal Verma, Ira Weiny, Dan Williams
  Cc: linux-cxl, Itaru Kitayama, Fabio M . De Francesco

Alison Schofield wrote:
> Commit 364ee9f3265e ("cxl/test: Enhance event testing") changed the
> loop iterator in mock_get_event() from a static constant,
> CXL_TEST_EVENT_CNT, to a dynamic global variable, ret_limit. The
> intent was to vary the number of events returned per call to simulate
> events occurring while logs are being read.
> 
> However, ret_limit is modified without synchronization. When multiple
> threads call mock_get_event() concurrently, one thread may read
> ret_limit, another thread may increment it, and the first thread's
> loop condition and size calculation see and use the updated value.
> 
> This is visible during cxl_test module load when all memdevs are
> initializing simultaneously, which includes getting event records. It
> is not tied to the cxl-events.sh unit test specifically, as that
> operates on a single memdev.
> 
> While no actual harm results (the buffer is always large enough and
> the record count fields correctly reflect what was written), this is
> a correctness issue. The race creates an inconsistent state within
> mock_get_event() and adding variability based on a race appears
> unintended.
> 
> Make ret_limit a local variable populated from an atomic counter.

Ah yea...  good catch.

>
> Each
> call gets a stable value that won't change during execution. That
> preserves the intended behavior of varying the return counts across
> calls while eliminating the race condition.
> 
> This implementation uses "+ 1" to produce the full range of 1 to
> CXL_TEST_EVENT_RET_MAX (4) records. Previously only 1, 2, 3 were
> produced.

Does cxl-events.sh need the limits to increment?  Would it be better to use
a random number of events?

Regardless.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>

[snip]


* Re: [PATCH] cxl/test: Remove ret_limit race condition in mock_get_event()
  2025-11-16  1:37 [PATCH] cxl/test: Remove ret_limit race condition in mock_get_event() Alison Schofield
  2025-11-18  2:12 ` Ira Weiny
@ 2025-11-18 23:17 ` Dave Jiang
  2025-11-18 23:35 ` Dave Jiang
  2 siblings, 0 replies; 4+ messages in thread
From: Dave Jiang @ 2025-11-18 23:17 UTC (permalink / raw)
  To: Alison Schofield, Davidlohr Bueso, Jonathan Cameron, Vishal Verma,
	Ira Weiny, Dan Williams
  Cc: linux-cxl, Itaru Kitayama, Fabio M . De Francesco



On 11/15/25 6:37 PM, Alison Schofield wrote:
> Commit 364ee9f3265e ("cxl/test: Enhance event testing") changed the
> loop iterator in mock_get_event() from a static constant,
> CXL_TEST_EVENT_CNT, to a dynamic global variable, ret_limit. The
> intent was to vary the number of events returned per call to simulate
> events occurring while logs are being read.
> 
> However, ret_limit is modified without synchronization. When multiple
> threads call mock_get_event() concurrently, one thread may read
> ret_limit, another thread may increment it, and the first thread's
> loop condition and size calculation see and use the updated value.
> 
> This is visible during cxl_test module load when all memdevs are
> initializing simultaneously, which includes getting event records. It
> is not tied to the cxl-events.sh unit test specifically, as that
> operates on a single memdev.
> 
> While no actual harm results (the buffer is always large enough and
> the record count fields correctly reflect what was written), this is
> a correctness issue. The race creates an inconsistent state within
> mock_get_event() and adding variability based on a race appears
> unintended.
> 
> Make ret_limit a local variable populated from an atomic counter. Each
> call gets a stable value that won't change during execution. That
> preserves the intended behavior of varying the return counts across
> calls while eliminating the race condition.
> 
> This implementation uses "+ 1" to produce the full range of 1 to
> CXL_TEST_EVENT_RET_MAX (4) records. Previously only 1, 2, 3 were
> produced.
> 
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>

Reviewed-by: Dave Jiang <dave.jiang@intel.com>

> ---
> 
> This was found while chasing a NULL payload_out issue in mock_get_event()
> that Itaru reported here [1] and Fabio and I have both seen but not been
> able to reliably reproduce. Although the accounting can be wrong wrt
> ret_limit, no potential overflow was found.
> 
> [1] https://lore.kernel.org/linux-cxl/49A4B521-AB66-4037-A23D-1D0D7AF0F42F@linux.dev/
> 
> 
>  tools/testing/cxl/test/mem.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
> index d533481672b7..6809c4a26f5e 100644
> --- a/tools/testing/cxl/test/mem.c
> +++ b/tools/testing/cxl/test/mem.c
> @@ -250,22 +250,22 @@ static void mes_add_event(struct mock_event_store *mes,
>   * Vary the number of events returned to simulate events occuring while the
>   * logs are being read.
>   */
> -static int ret_limit = 0;
> +static atomic_t event_counter = ATOMIC_INIT(0);
>  
>  static int mock_get_event(struct device *dev, struct cxl_mbox_cmd *cmd)
>  {
>  	struct cxl_get_event_payload *pl;
>  	struct mock_event_log *log;
>  	u16 nr_overflow;
> +	int ret_limit;
>  	u8 log_type;
>  	int i;
>  
>  	if (cmd->size_in != sizeof(log_type))
>  		return -EINVAL;
>  
> -	ret_limit = (ret_limit + 1) % CXL_TEST_EVENT_RET_MAX;
> -	if (!ret_limit)
> -		ret_limit = 1;
> +	/* Vary return limit from 1 to CXL_TEST_EVENT_RET_MAX */
> +	ret_limit = (atomic_inc_return(&event_counter) % CXL_TEST_EVENT_RET_MAX) + 1;
>  
>  	if (cmd->size_out < struct_size(pl, records, ret_limit))
>  		return -EINVAL;
> 
> base-commit: e9a6fb0bcdd7609be6969112f3fbfcce3b1d4a7c



* Re: [PATCH] cxl/test: Remove ret_limit race condition in mock_get_event()
  2025-11-16  1:37 [PATCH] cxl/test: Remove ret_limit race condition in mock_get_event() Alison Schofield
  2025-11-18  2:12 ` Ira Weiny
  2025-11-18 23:17 ` Dave Jiang
@ 2025-11-18 23:35 ` Dave Jiang
  2 siblings, 0 replies; 4+ messages in thread
From: Dave Jiang @ 2025-11-18 23:35 UTC (permalink / raw)
  To: Alison Schofield, Davidlohr Bueso, Jonathan Cameron, Vishal Verma,
	Ira Weiny, Dan Williams
  Cc: linux-cxl, Itaru Kitayama, Fabio M . De Francesco



On 11/15/25 6:37 PM, Alison Schofield wrote:
> Commit 364ee9f3265e ("cxl/test: Enhance event testing") changed the
> loop iterator in mock_get_event() from a static constant,
> CXL_TEST_EVENT_CNT, to a dynamic global variable, ret_limit. The
> intent was to vary the number of events returned per call to simulate
> events occurring while logs are being read.
> 
> However, ret_limit is modified without synchronization. When multiple
> threads call mock_get_event() concurrently, one thread may read
> ret_limit, another thread may increment it, and the first thread's
> loop condition and size calculation see and use the updated value.
> 
> This is visible during cxl_test module load when all memdevs are
> initializing simultaneously, which includes getting event records. It
> is not tied to the cxl-events.sh unit test specifically, as that
> operates on a single memdev.
> 
> While no actual harm results (the buffer is always large enough and
> the record count fields correctly reflect what was written), this is
> a correctness issue. The race creates an inconsistent state within
> mock_get_event() and adding variability based on a race appears
> unintended.
> 
> Make ret_limit a local variable populated from an atomic counter. Each
> call gets a stable value that won't change during execution. That
> preserves the intended behavior of varying the return counts across
> calls while eliminating the race condition.
> 
> This implementation uses "+ 1" to produce the full range of 1 to
> CXL_TEST_EVENT_RET_MAX (4) records. Previously only 1, 2, 3 were
> produced.
> 
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>

applied to cxl/next
b6369daf0d6a cxl/test: Remove ret_limit race condition in mock_get_event()

> ---
> 
> This was found while chasing a NULL payload_out issue in mock_get_event()
> that Itaru reported here [1] and Fabio and I have both seen but not been
> able to reliably reproduce. Although the accounting can be wrong wrt
> ret_limit, no potential overflow was found.
> 
> [1] https://lore.kernel.org/linux-cxl/49A4B521-AB66-4037-A23D-1D0D7AF0F42F@linux.dev/
> 
> 
>  tools/testing/cxl/test/mem.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
> index d533481672b7..6809c4a26f5e 100644
> --- a/tools/testing/cxl/test/mem.c
> +++ b/tools/testing/cxl/test/mem.c
> @@ -250,22 +250,22 @@ static void mes_add_event(struct mock_event_store *mes,
>   * Vary the number of events returned to simulate events occuring while the
>   * logs are being read.
>   */
> -static int ret_limit = 0;
> +static atomic_t event_counter = ATOMIC_INIT(0);
>  
>  static int mock_get_event(struct device *dev, struct cxl_mbox_cmd *cmd)
>  {
>  	struct cxl_get_event_payload *pl;
>  	struct mock_event_log *log;
>  	u16 nr_overflow;
> +	int ret_limit;
>  	u8 log_type;
>  	int i;
>  
>  	if (cmd->size_in != sizeof(log_type))
>  		return -EINVAL;
>  
> -	ret_limit = (ret_limit + 1) % CXL_TEST_EVENT_RET_MAX;
> -	if (!ret_limit)
> -		ret_limit = 1;
> +	/* Vary return limit from 1 to CXL_TEST_EVENT_RET_MAX */
> +	ret_limit = (atomic_inc_return(&event_counter) % CXL_TEST_EVENT_RET_MAX) + 1;
>  
>  	if (cmd->size_out < struct_size(pl, records, ret_limit))
>  		return -EINVAL;
> 
> base-commit: e9a6fb0bcdd7609be6969112f3fbfcce3b1d4a7c

