* [PATCH 1/2] cxl/memdev: Replace ENXIO with EBUSY for inject poison limit reached
2024-07-04 0:38 [PATCH 0/2] Return EBUSY on inject poison limit reached alison.schofield
@ 2024-07-04 0:38 ` alison.schofield
2024-07-05 6:59 ` Xingtao Yao (Fujitsu)
2024-07-04 0:38 ` [PATCH 2/2] cxl/test: " alison.schofield
` (2 subsequent siblings)
3 siblings, 1 reply; 8+ messages in thread
From: alison.schofield @ 2024-07-04 0:38 UTC (permalink / raw)
To: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Alison Schofield,
Vishal Verma, Ira Weiny, Dan Williams
Cc: linux-cxl
From: Alison Schofield <alison.schofield@intel.com>
The CXL driver provides a debugfs interface offering users the
ability to inject and clear poison to a memdev. Once a user has
injected up to the device's limit, further injection requests fail
with ENXIO until a clear poison is issued.
Users may not have device specs in hand or may want to intentionally
hit the limit and then clear. Replace the usual ENXIO return status
with EBUSY so users can recognize this failure.
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
---
Documentation/ABI/testing/debugfs-cxl | 7 ++++---
drivers/cxl/cxlmem.h | 2 +-
2 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/Documentation/ABI/testing/debugfs-cxl b/Documentation/ABI/testing/debugfs-cxl
index c61f9b813973..12488c14be64 100644
--- a/Documentation/ABI/testing/debugfs-cxl
+++ b/Documentation/ABI/testing/debugfs-cxl
@@ -14,9 +14,10 @@ Description:
event to its internal Informational Event log, updates the
Event Status register, and if configured, interrupts the host.
It is not an error to inject poison into an address that
- already has poison present and no error is returned. The
- inject_poison attribute is only visible for devices supporting
- the capability.
+ already has poison present and no error is returned. If the
+ device returns 'Inject Poison Limit Reached' an -EBUSY error
+ is returned to the user. The inject_poison attribute is only
+ visible for devices supporting the capability.
What: /sys/kernel/debug/memX/clear_poison
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 19aba81cdf13..942063c09459 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -161,7 +161,7 @@ struct cxl_mbox_cmd {
C(FWRESET, -ENXIO, "FW failed to activate, needs cold reset"), \
C(HANDLE, -ENXIO, "one or more Event Record Handles were invalid"), \
C(PADDR, -EFAULT, "physical address specified is invalid"), \
- C(POISONLMT, -ENXIO, "poison injection limit has been reached"), \
+ C(POISONLMT, -EBUSY, "poison injection limit has been reached"), \
C(MEDIAFAILURE, -ENXIO, "permanent issue with the media"), \
C(ABORT, -ENXIO, "background cmd was aborted by device"), \
C(SECURITY, -ENXIO, "not valid in the current security state"), \
--
2.37.3
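For readers following along, here is a minimal user-space sketch of how a
validation tool might tell "limit reached" apart from other inject failures
once this change is applied. The helper names (inject_poison,
classify_inject_result) and the choice to pass the debugfs path in as an
argument are assumptions made for illustration; only the EBUSY versus
ENXIO distinction comes from the patch itself.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

enum inject_result {
	INJECT_OK,
	INJECT_LIMIT_REACHED,	/* -EBUSY: clear some poison, then retry */
	INJECT_FAILED,		/* anything else, e.g. -ENXIO or -EFAULT */
};

/* Map a negative errno-style return to a result (hypothetical helper). */
static enum inject_result classify_inject_result(int rc)
{
	if (rc == 0)
		return INJECT_OK;
	if (rc == -EBUSY)
		return INJECT_LIMIT_REACHED;
	return INJECT_FAILED;
}

/*
 * Write a DPA to an inject_poison debugfs attribute (hypothetical helper).
 * Returns 0 on success or a negative errno. The caller supplies the path;
 * it is deliberately not hard-coded here.
 */
static int inject_poison(const char *path, uint64_t dpa)
{
	char buf[32];
	int fd, len, rc = 0;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	len = snprintf(buf, sizeof(buf), "0x%llx\n", (unsigned long long)dpa);
	if (write(fd, buf, len) < 0)
		rc = -errno;

	close(fd);
	return rc;
}
```

A caller would pass the result of inject_poison() to
classify_inject_result() and, on INJECT_LIMIT_REACHED, write to the matching
clear_poison attribute before retrying.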
* RE: [PATCH 1/2] cxl/memdev: Replace ENXIO with EBUSY for inject poison limit reached
2024-07-04 0:38 ` [PATCH 1/2] cxl/memdev: Replace ENXIO with EBUSY for " alison.schofield
@ 2024-07-05 6:59 ` Xingtao Yao (Fujitsu)
0 siblings, 0 replies; 8+ messages in thread
From: Xingtao Yao (Fujitsu) @ 2024-07-05 6:59 UTC (permalink / raw)
To: alison.schofield@intel.com, Davidlohr Bueso, Jonathan Cameron,
Dave Jiang, Vishal Verma, Ira Weiny, Dan Williams
Cc: linux-cxl@vger.kernel.org
Tested-by: Xingtao Yao <yaoxt.fnst@fujitsu.com>
> -----Original Message-----
> From: alison.schofield@intel.com <alison.schofield@intel.com>
> Sent: Thursday, July 4, 2024 8:38 AM
> To: Davidlohr Bueso <dave@stgolabs.net>; Jonathan Cameron
> <Jonathan.Cameron@huawei.com>; Dave Jiang <dave.jiang@intel.com>; Alison
> Schofield <alison.schofield@intel.com>; Vishal Verma
> <vishal.l.verma@intel.com>; Ira Weiny <ira.weiny@intel.com>; Dan Williams
> <dan.j.williams@intel.com>
> Cc: linux-cxl@vger.kernel.org
> Subject: [PATCH 1/2] cxl/memdev: Replace ENXIO with EBUSY for inject poison
> limit reached
>
> From: Alison Schofield <alison.schofield@intel.com>
>
> The CXL driver provides a debugfs interface offering users the
> ability to inject and clear poison to a memdev. Once a user has
> injected up to the device's limit, further injection requests fail
> with ENXIO until a clear poison is issued.
>
> Users may not have device specs in hand or may want to intentionally
> hit the limit and then clear. Replace the usual ENXIO return status
> with EBUSY so users can recognize this failure.
>
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> ---
> Documentation/ABI/testing/debugfs-cxl | 7 ++++---
> drivers/cxl/cxlmem.h | 2 +-
> 2 files changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/ABI/testing/debugfs-cxl
> b/Documentation/ABI/testing/debugfs-cxl
> index c61f9b813973..12488c14be64 100644
> --- a/Documentation/ABI/testing/debugfs-cxl
> +++ b/Documentation/ABI/testing/debugfs-cxl
> @@ -14,9 +14,10 @@ Description:
> event to its internal Informational Event log, updates the
> Event Status register, and if configured, interrupts the host.
> It is not an error to inject poison into an address that
> - already has poison present and no error is returned. The
> - inject_poison attribute is only visible for devices supporting
> - the capability.
> + already has poison present and no error is returned. If the
> + device returns 'Inject Poison Limit Reached' an -EBUSY error
> + is returned to the user. The inject_poison attribute is only
> + visible for devices supporting the capability.
>
>
> What: /sys/kernel/debug/memX/clear_poison
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 19aba81cdf13..942063c09459 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -161,7 +161,7 @@ struct cxl_mbox_cmd {
> C(FWRESET, -ENXIO, "FW failed to activate, needs cold reset"),
> \
> C(HANDLE, -ENXIO, "one or more Event Record Handles were invalid"),
> \
> C(PADDR, -EFAULT, "physical address specified is invalid"),
> \
> - C(POISONLMT, -ENXIO, "poison injection limit has been reached"),
> \
> + C(POISONLMT, -EBUSY, "poison injection limit has been reached"),
> \
> C(MEDIAFAILURE, -ENXIO, "permanent issue with the media"),
> \
> C(ABORT, -ENXIO, "background cmd was aborted by device"),
> \
> C(SECURITY, -ENXIO, "not valid in the current security state"), \
> --
> 2.37.3
>
* [PATCH 2/2] cxl/test: Replace ENXIO with EBUSY for inject poison limit reached
2024-07-04 0:38 [PATCH 0/2] Return EBUSY on inject poison limit reached alison.schofield
2024-07-04 0:38 ` [PATCH 1/2] cxl/memdev: Replace ENXIO with EBUSY for " alison.schofield
@ 2024-07-04 0:38 ` alison.schofield
2024-07-05 6:15 ` Xingtao Yao (Fujitsu)
2024-07-04 1:00 ` [PATCH 0/2] Return EBUSY on " Dan Williams
2024-07-06 2:05 ` Davidlohr Bueso
3 siblings, 1 reply; 8+ messages in thread
From: alison.schofield @ 2024-07-04 0:38 UTC (permalink / raw)
To: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Alison Schofield,
Vishal Verma, Ira Weiny, Dan Williams
Cc: linux-cxl
From: Alison Schofield <alison.schofield@intel.com>
The CXL driver was recently updated to return EBUSY rather than
ENXIO when the device reports that an injection request exceeds
the device's limit. That change to EBUSY allows debug users to
differentiate between limit reached and inject failures for any
other reason.
Do the same here in cxl-test.
Reminder: the cxl-test per device injection limit is a configurable
attribute: /sys/bus/platform/drivers/cxl_mock_mem/poison_inject_max
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
---
tools/testing/cxl/test/mem.c | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index eaf091a3d331..5e0c84d4d9f8 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -1131,27 +1131,28 @@ static bool mock_poison_dev_max_injected(struct cxl_dev_state *cxlds)
 	return (count >= poison_inject_dev_max);
 }
 
-static bool mock_poison_add(struct cxl_dev_state *cxlds, u64 dpa)
+static int mock_poison_add(struct cxl_dev_state *cxlds, u64 dpa)
 {
+	/* Return EBUSY to match the CXL driver handling */
 	if (mock_poison_dev_max_injected(cxlds)) {
 		dev_dbg(cxlds->dev,
 			"Device poison injection limit has been reached: %d\n",
 			MOCK_INJECT_DEV_MAX);
-		return false;
+		return -EBUSY;
 	}
 
 	for (int i = 0; i < MOCK_INJECT_TEST_MAX; i++) {
 		if (!mock_poison_list[i].cxlds) {
 			mock_poison_list[i].cxlds = cxlds;
 			mock_poison_list[i].dpa = dpa;
-			return true;
+			return 0;
 		}
 	}
 	dev_dbg(cxlds->dev,
 		"Mock test poison injection limit has been reached: %d\n",
 		MOCK_INJECT_TEST_MAX);
 
-	return false;
+	return -ENXIO;
 }
 
 static bool mock_poison_found(struct cxl_dev_state *cxlds, u64 dpa)
@@ -1175,10 +1176,8 @@ static int mock_inject_poison(struct cxl_dev_state *cxlds,
 		dev_dbg(cxlds->dev, "DPA: 0x%llx already poisoned\n", dpa);
 		return 0;
 	}
-	if (!mock_poison_add(cxlds, dpa))
-		return -ENXIO;
 
-	return 0;
+	return mock_poison_add(cxlds, dpa);
 }
 
 static bool mock_poison_del(struct cxl_dev_state *cxlds, u64 dpa)
--
2.37.3
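The return convention introduced above can be illustrated with a small
stand-alone model. This is a simplified sketch, not the cxl-test code: the
device is reduced to an opaque pointer, the table size and default limit
are values picked for the example, and mock_poison_count() is a
hypothetical stand-in for the counting done by
mock_poison_dev_max_injected().

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_INJECT_TEST_MAX 4	/* table size; value chosen for the sketch */

struct mock_poison {
	const void *dev;	/* NULL marks a free slot */
	uint64_t dpa;
};

static struct mock_poison mock_poison_list[MOCK_INJECT_TEST_MAX];
static int poison_inject_dev_max = 2;	/* per-device limit, runtime tunable */

/* Count the entries currently held by one device (hypothetical helper). */
static int mock_poison_count(const void *dev)
{
	int count = 0;

	for (int i = 0; i < MOCK_INJECT_TEST_MAX; i++)
		if (mock_poison_list[i].dev == dev)
			count++;
	return count;
}

static int mock_poison_add(const void *dev, uint64_t dpa)
{
	/* Per-device limit reached: recoverable by clearing poison */
	if (mock_poison_count(dev) >= poison_inject_dev_max)
		return -EBUSY;

	for (int i = 0; i < MOCK_INJECT_TEST_MAX; i++) {
		if (!mock_poison_list[i].dev) {
			mock_poison_list[i].dev = dev;
			mock_poison_list[i].dpa = dpa;
			return 0;
		}
	}

	/* Shared table full: a limit of the test harness, not the device */
	return -ENXIO;
}
```

Returning an int rather than a bool lets the caller pass the code straight
through, which is what the second hunk does in mock_inject_poison().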
* RE: [PATCH 2/2] cxl/test: Replace ENXIO with EBUSY for inject poison limit reached
2024-07-04 0:38 ` [PATCH 2/2] cxl/test: " alison.schofield
@ 2024-07-05 6:15 ` Xingtao Yao (Fujitsu)
2024-07-07 1:48 ` Alison Schofield
0 siblings, 1 reply; 8+ messages in thread
From: Xingtao Yao (Fujitsu) @ 2024-07-05 6:15 UTC (permalink / raw)
To: alison.schofield@intel.com, Davidlohr Bueso, Jonathan Cameron,
Dave Jiang, Vishal Verma, Ira Weiny, Dan Williams
Cc: linux-cxl@vger.kernel.org
> -----Original Message-----
> From: alison.schofield@intel.com <alison.schofield@intel.com>
> Sent: Thursday, July 4, 2024 8:38 AM
> To: Davidlohr Bueso <dave@stgolabs.net>; Jonathan Cameron
> <Jonathan.Cameron@huawei.com>; Dave Jiang <dave.jiang@intel.com>; Alison
> Schofield <alison.schofield@intel.com>; Vishal Verma
> <vishal.l.verma@intel.com>; Ira Weiny <ira.weiny@intel.com>; Dan Williams
> <dan.j.williams@intel.com>
> Cc: linux-cxl@vger.kernel.org
> Subject: [PATCH 2/2] cxl/test: Replace ENXIO with EBUSY for inject poison limit
> reached
>
> From: Alison Schofield <alison.schofield@intel.com>
>
> The CXL driver was recently updated to return EBUSY rather than
> ENXIO when the device reports that an injection request exceeds
> the device's limit. That change to EBUSY allows debug users to
> differentiate between limit reached and inject failures for any
> other reason.
>
> Do the same here in cxl-test.
>
> Reminder: the cxl-test per device injection limit is a configurable
> attribute: /sys/bus/platform/drivers/cxl_mock_mem/poison_inject_max
>
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> ---
> tools/testing/cxl/test/mem.c | 13 ++++++-------
> 1 file changed, 6 insertions(+), 7 deletions(-)
>
> diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
> index eaf091a3d331..5e0c84d4d9f8 100644
> --- a/tools/testing/cxl/test/mem.c
> +++ b/tools/testing/cxl/test/mem.c
> @@ -1131,27 +1131,28 @@ static bool mock_poison_dev_max_injected(struct
> cxl_dev_state *cxlds)
> return (count >= poison_inject_dev_max);
> }
>
> -static bool mock_poison_add(struct cxl_dev_state *cxlds, u64 dpa)
> +static int mock_poison_add(struct cxl_dev_state *cxlds, u64 dpa)
> {
> + /* Return EBUSY to match the CXL driver handling */
> if (mock_poison_dev_max_injected(cxlds)) {
> dev_dbg(cxlds->dev,
> "Device poison injection limit has been reached: %d\n",
> MOCK_INJECT_DEV_MAX);
There is a tiny issue here: it would be better to replace MOCK_INJECT_DEV_MAX with
poison_inject_dev_max, as this value can be configured through
/sys/bus/platform/drivers/cxl_mock_mem/poison_inject_max
like:
# echo 128 > /sys/bus/platform/drivers/cxl_mock_mem/poison_inject_max
After injecting 129 poisons, “Device or resource busy” was returned, but the debug message
still printed the default limit:
[ 7664.280587] cxl_mock_mem cxl_mem.0: Device poison injection limit has been reached: 8
[ 7664.280591] cxl_mock_mem cxl_mem.0: opcode: 0x4301 sz_in: 8 sz_out: 0 rc: -16
Tested-by: Xingtao Yao <yaoxt.fnst@fujitsu.com>
> - return false;
> + return -EBUSY;
> }
>
> for (int i = 0; i < MOCK_INJECT_TEST_MAX; i++) {
> if (!mock_poison_list[i].cxlds) {
> mock_poison_list[i].cxlds = cxlds;
> mock_poison_list[i].dpa = dpa;
> - return true;
> + return 0;
> }
> }
> dev_dbg(cxlds->dev,
> "Mock test poison injection limit has been reached: %d\n",
> MOCK_INJECT_TEST_MAX);
>
> - return false;
> + return -ENXIO;
> }
>
> static bool mock_poison_found(struct cxl_dev_state *cxlds, u64 dpa)
> @@ -1175,10 +1176,8 @@ static int mock_inject_poison(struct cxl_dev_state
> *cxlds,
> dev_dbg(cxlds->dev, "DPA: 0x%llx already poisoned\n", dpa);
> return 0;
> }
> - if (!mock_poison_add(cxlds, dpa))
> - return -ENXIO;
>
> - return 0;
> + return mock_poison_add(cxlds, dpa);
> }
>
> static bool mock_poison_del(struct cxl_dev_state *cxlds, u64 dpa)
> --
> 2.37.3
>
* Re: [PATCH 2/2] cxl/test: Replace ENXIO with EBUSY for inject poison limit reached
2024-07-05 6:15 ` Xingtao Yao (Fujitsu)
@ 2024-07-07 1:48 ` Alison Schofield
0 siblings, 0 replies; 8+ messages in thread
From: Alison Schofield @ 2024-07-07 1:48 UTC (permalink / raw)
To: Xingtao Yao (Fujitsu)
Cc: Davidlohr Bueso, Jonathan Cameron, Dave Jiang, Vishal Verma,
Ira Weiny, Dan Williams, linux-cxl@vger.kernel.org
On Fri, Jul 05, 2024 at 06:15:24AM +0000, Xingtao Yao (Fujitsu) wrote:
> > -----Original Message-----
> > From: alison.schofield@intel.com <alison.schofield@intel.com>
snip
> >
> > -static bool mock_poison_add(struct cxl_dev_state *cxlds, u64 dpa)
> > +static int mock_poison_add(struct cxl_dev_state *cxlds, u64 dpa)
> > {
> > + /* Return EBUSY to match the CXL driver handling */
> > if (mock_poison_dev_max_injected(cxlds)) {
> > dev_dbg(cxlds->dev,
> > "Device poison injection limit has been reached: %d\n",
> > MOCK_INJECT_DEV_MAX);
> There is a tiny issue here: it would be better to replace MOCK_INJECT_DEV_MAX with
> poison_inject_dev_max, as this value can be configured through
> /sys/bus/platform/drivers/cxl_mock_mem/poison_inject_max
>
> like:
> # echo 128 > /sys/bus/platform/drivers/cxl_mock_mem/poison_inject_max
>
> After injecting 129 poisons, “Device or resource busy” was returned, but the debug message
> still printed the default limit:
> [ 7664.280587] cxl_mock_mem cxl_mem.0: Device poison injection limit has been reached: 8
> [ 7664.280591] cxl_mock_mem cxl_mem.0: opcode: 0x4301 sz_in: 8 sz_out: 0 rc: -16
>
>
> Tested-by: Xingtao Yao <yaoxt.fnst@fujitsu.com>
>
Thanks for testing and thanks for finding this issue.
I've folded this fixup into v2 of this patch. Since it's cxl-test, and
it's a dev_dbg() message, and it's directly adjacent to what this patch
touches, I expect that will be OK. (as opposed to a separate fixup)
--Alison
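v2 is not part of this thread, so the following is only a sketch of what
the folded-in fixup presumably amounts to: the debug message is formatted
from the runtime value poison_inject_dev_max instead of the compile-time
default MOCK_INJECT_DEV_MAX. The default of 8 is taken from the log output
quoted above; format_limit_msg() is a hypothetical user-space stand-in for
the dev_dbg() call.

```c
#include <stdio.h>

#define MOCK_INJECT_DEV_MAX 8	/* compile-time default, per the quoted log */

/* Runtime limit; in cxl-test it is changed via poison_inject_max */
static int poison_inject_dev_max = MOCK_INJECT_DEV_MAX;

/* Format the limit message from the runtime value, not the default. */
static int format_limit_msg(char *buf, size_t len)
{
	return snprintf(buf, len,
			"Device poison injection limit has been reached: %d\n",
			poison_inject_dev_max);
}
```

With the limit raised to 128 as in the report, the message would then show
128 rather than 8.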
* Re: [PATCH 0/2] Return EBUSY on inject poison limit reached
2024-07-04 0:38 [PATCH 0/2] Return EBUSY on inject poison limit reached alison.schofield
2024-07-04 0:38 ` [PATCH 1/2] cxl/memdev: Replace ENXIO with EBUSY for " alison.schofield
2024-07-04 0:38 ` [PATCH 2/2] cxl/test: " alison.schofield
@ 2024-07-04 1:00 ` Dan Williams
2024-07-06 2:05 ` Davidlohr Bueso
3 siblings, 0 replies; 8+ messages in thread
From: Dan Williams @ 2024-07-04 1:00 UTC (permalink / raw)
To: alison.schofield, Davidlohr Bueso, Jonathan Cameron, Dave Jiang,
Vishal Verma, Ira Weiny, Dan Williams
Cc: linux-cxl
alison.schofield@ wrote:
> From: Alison Schofield <alison.schofield@intel.com>
>
> Validation users are asking for the ability to recognize when
> their injection testing has hit the limit of the device.
>
> Change the driver's error code for this failure to EBUSY and do
> the same in the cxl-test mock of inject poison.
>
>
> Alison Schofield (2):
> cxl/memdev: Replace ENXIO with EBUSY for inject poison limit reached
> cxl/test: Replace ENXIO with EBUSY for inject poison limit reached
>
> Documentation/ABI/testing/debugfs-cxl | 7 ++++---
> drivers/cxl/cxlmem.h | 2 +-
> tools/testing/cxl/test/mem.c | 13 ++++++-------
> 3 files changed, 11 insertions(+), 11 deletions(-)
Looks good to me:
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
* Re: [PATCH 0/2] Return EBUSY on inject poison limit reached
2024-07-04 0:38 [PATCH 0/2] Return EBUSY on inject poison limit reached alison.schofield
` (2 preceding siblings ...)
2024-07-04 1:00 ` [PATCH 0/2] Return EBUSY on " Dan Williams
@ 2024-07-06 2:05 ` Davidlohr Bueso
3 siblings, 0 replies; 8+ messages in thread
From: Davidlohr Bueso @ 2024-07-06 2:05 UTC (permalink / raw)
To: alison.schofield
Cc: Jonathan Cameron, Dave Jiang, Vishal Verma, Ira Weiny,
Dan Williams, linux-cxl
On Wed, 03 Jul 2024, alison.schofield@intel.com wrote:
>From: Alison Schofield <alison.schofield@intel.com>
>
>Validation users are asking for the ability to recognize when
>their injection testing has hit the limit of the device.
>
>Change the driver's error code for this failure to EBUSY and do
>the same in the cxl-test mock of inject poison.
>
>
>Alison Schofield (2):
> cxl/memdev: Replace ENXIO with EBUSY for inject poison limit reached
> cxl/test: Replace ENXIO with EBUSY for inject poison limit reached
>
> Documentation/ABI/testing/debugfs-cxl | 7 ++++---
> drivers/cxl/cxlmem.h | 2 +-
> tools/testing/cxl/test/mem.c | 13 ++++++-------
> 3 files changed, 11 insertions(+), 11 deletions(-)
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>