All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
To: <alison.schofield@intel.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>,
	Dave Jiang <dave.jiang@intel.com>,
	Vishal Verma <vishal.l.verma@intel.com>,
	Ira Weiny <ira.weiny@intel.com>,
	Dan Williams <dan.j.williams@intel.com>,
	<linux-cxl@vger.kernel.org>
Subject: Re: [PATCH v2 4/4] cxl/region: Add inject and clear poison by region offset
Date: Thu, 3 Jul 2025 15:31:38 +0100	[thread overview]
Message-ID: <20250703153138.00000a00@huawei.com> (raw)
In-Reply-To: <4f33bcb42217139e0884784e60139e29c647f62b.1751513505.git.alison.schofield@intel.com>

On Wed,  2 Jul 2025 21:03:23 -0700
alison.schofield@intel.com wrote:

> From: Alison Schofield <alison.schofield@intel.com>
> 
> Add CXL region debugfs attributes to inject and clear poison based
> on an offset into the region. These new interfaces allow users to
> operate on poison at the region level without needing to resolve
> Device Physical Addresses (DPA) or target individual memdevs.
> 
> The implementation uses a new helper, region_offset_to_dpa_result()
> that applies decoder interleave logic, including XOR-based address
> decoding when applicable. Note that XOR decodes rely on driver
> internal xormaps which are not exposed to userspace. So, this support
> is not only a simplification of poison operations that could be done
> using existing per memdev operations, but also it enables this
> functionality for XOR interleaved regions for the first time.
> 
> New debugfs attributes are added in /sys/kernel/debug/cxl/regionX/:
> inject_poison and clear_poison. These are only exposed if all memdevs
> participating in the region support both inject and clear commands,
> ensuring consistent and reliable behavior across multi-device regions.
> 
> If tracing is enabled, these operations are logged as cxl_poison
> events in /sys/kernel/tracing/trace.
> 
> The ABI documentation warns users of the significant risks that
> come with using these capabilities.
> 
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> ---
> 
> Changes in v2:
> Simplify return using test_bit() in cxl_memdev_has_poison() (Jonathan)
> Use ACQUIRE() in the debugfs inject and clear funcs (DaveJ, Jonathan)
> Fail if offset is in the extended linear cache (Jonathan)
> Added 'cxl' to an existing ABI for memdev clear_poison
> Redefine ABI to take a region offset, not SPA (Dan)
> Remove KernelVersion field in ABI doc (Dan)
> Warn against misuse in ABI doc (Dan)
> Add validate_region_offset() helper
> 
> 
>  Documentation/ABI/testing/debugfs-cxl |  89 ++++++++++++++++-
>  drivers/cxl/core/core.h               |   4 +
>  drivers/cxl/core/memdev.c             |   8 ++
>  drivers/cxl/core/region.c             | 131 +++++++++++++++++++++++++-
>  4 files changed, 228 insertions(+), 4 deletions(-)
> 
> diff --git a/Documentation/ABI/testing/debugfs-cxl b/Documentation/ABI/testing/debugfs-cxl
> index 12488c14be64..2989d4da96c1 100644
> --- a/Documentation/ABI/testing/debugfs-cxl
> +++ b/Documentation/ABI/testing/debugfs-cxl
> @@ -19,8 +19,22 @@ Description:
>  		is returned to the user. The inject_poison attribute is only
>  		visible for devices supporting the capability.
>  
> +		TEST-ONLY INTERFACE: This interface is intended for testing
> +		and validation purposes only. It is not a data repair mechanism
> +		and should never be used on production systems or live data.
>  
> -What:		/sys/kernel/debug/memX/clear_poison
> +		DATA LOSS RISK: For CXL persistent memory (PMEM) devices,
> +		poison injection can result in permanent data loss. Injected
> +		poison may render data permanently inaccessible even after
> +		clearing, as the clear operation writes zeros and does not
> +		recover original data.
> +
> +		SYSTEM STABILITY RISK: For volatile memory, poison injection
> +		can cause kernel crashes, system instability, or unpredictable
> +		behavior if the poisoned addresses are accessed by running code
> +		or critical kernel structures.
> +
> +What:		/sys/kernel/debug/cxl/memX/clear_poison

Moved or previously a bug?  Looks like a bug, so separate patch.
Not sure we will bother asking for a backport, but having it in here made
me stare at that line for an unhealthy game of spot the difference.

>  Date:		April, 2023
>  KernelVersion:	v6.4
>  Contact:	linux-cxl@vger.kernel.org



> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index d965f07ba8a8..cc26623250bf 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -2,6 +2,7 @@



> +static int cxl_region_debugfs_poison_clear(void *data, u64 offset)
> +{
> +	struct dpa_result result = { .dpa = ULLONG_MAX, .cxlmd = NULL };
> +	struct cxl_region *cxlr = data;
> +	int rc;
> +
> +	ACQUIRE(rwsem_read_intr, region_rwsem)(&cxl_rwsem.region);
> +	if ((rc = ACQUIRE_ERR(rwsem_read_intr, &region_rwsem)))
> +		return rc;
> +
> +	ACQUIRE(rwsem_read_intr, dpa_rwsem)(&cxl_rwsem.dpa);
> +	if ((rc = ACQUIRE_ERR(rwsem_read_intr, &dpa_rwsem)))
> +		return rc;
> +
> +	if (validate_region_offset(cxlr, offset))
> +		return -EINVAL;
> +
> +	rc = region_offset_to_dpa_result(cxlr, offset, &result);
> +	if (rc || !result.cxlmd || result.dpa == ULLONG_MAX) {
> +		dev_dbg(&cxlr->dev,
> +			"Failed to resolve DPA for region offset %#llx\n",
> +			offset);
> +
> +		return -EINVAL;

If rc was set, preferable to return that. I'd split the condition
into two parts to avoid this.

> +	}
> +


  reply	other threads:[~2025-07-03 14:31 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-03  4:03 [PATCH v2 0/4] cxl: Support Poison Inject & Clear by Region Offset alison.schofield
2025-07-03  4:03 ` [PATCH v2 1/4] cxl: Define a SPA->CXL HPA root decoder callback for XOR Math alison.schofield
2025-07-03 13:43   ` Jonathan Cameron
2025-07-03  4:03 ` [PATCH v2 2/4] cxl/region: Introduce SPA to DPA address translation alison.schofield
2025-07-03 14:23   ` Jonathan Cameron
2025-07-12  4:17     ` Alison Schofield
2025-07-03  4:03 ` [PATCH v2 3/4] cxl/core: Add locked variants of the poison inject and clear funcs alison.schofield
2025-07-03 14:25   ` Jonathan Cameron
2025-07-12  4:23     ` Alison Schofield
2025-07-03  4:03 ` [PATCH v2 4/4] cxl/region: Add inject and clear poison by region offset alison.schofield
2025-07-03 14:31   ` Jonathan Cameron [this message]
2025-07-12  4:40     ` Alison Schofield
2025-07-15 15:20       ` Jonathan Cameron

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250703153138.00000a00@huawei.com \
    --to=jonathan.cameron@huawei.com \
    --cc=alison.schofield@intel.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=dave@stgolabs.net \
    --cc=ira.weiny@intel.com \
    --cc=linux-cxl@vger.kernel.org \
    --cc=vishal.l.verma@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.