Re: [PATCH v11 27/31] cxl/region: Read existing extents on region creation

Linux CXL
 help / color / mirror / Atom feed

From: sashiko-bot@kernel.org
To: "Anisa Su" <anisa.su887@gmail.com>
Cc: nvdimm@lists.linux.dev, linux-cxl@vger.kernel.org
Subject: Re: [PATCH v11 27/31] cxl/region: Read existing extents on region creation
Date: Thu, 25 Jun 2026 18:32:02 +0000	[thread overview]
Message-ID: <20260625183203.3042C1F000E9@smtp.kernel.org> (raw)
In-Reply-To: <20260625112638.550691-28-anisa.su@samsung.com>

Thank you for your contribution! Sashiko AI review found 8 potential issue(s) to consider:

New issues:
- [High] Cross-region lock assertion failure when processing existing extents.
- [High] Out-of-bounds read due to unvalidated `nr_returned` payload size.
- [High] Runtime DC add events are permanently dropped and lost during region probe, contradicting the design intent.
- [High] Permanent device capacity leak when existing extents are dropped.
- [High] AB-BA deadlock between `mds->add_ctx.lock` and `cxlr_dax->dev` lock.
- [High] Shared `pending_extents` list corruption aborts in-flight runtime events.
- [Medium] Mixing `goto`-based error handling with `__free()` and `guard()` violates subsystem cleanup guidelines.

Pre-existing issues:
- [High] Integer overflow in extent boundary calculation bypasses range validation.
--

commit ae344fab96ea05c143bd5e8ab47ce8059fc778b6
Author: Ira Weiny <iweiny@kernel.org>
Subject: cxl/region: Read existing extents on region creation

Dynamic capacity device extents may be left in an accepted state on a
device due to an unexpected host crash. This patch reads the 'devices
extent list' during the asynchronous cxl_dax_region probe to reveal
previously accepted extents and add them to the new region, avoiding
races with probe.

> diff --git a/drivers/cxl/core/extent.c b/drivers/cxl/core/extent.c
> --- a/drivers/cxl/core/extent.c
> +++ b/drivers/cxl/core/extent.c
> @@ -261,7 +261,7 @@ static int cxl_validate_extent(struct cxl_memdev_state *mds,
>  			       struct cxl_extent *extent,
>  			       struct cxl_endpoint_decoder **out_cxled,
>  			       struct cxl_dax_region **out_cxlr_dax,
> -			       struct range *out_ext_range)
> +			       struct range *out_ext_range, bool existing)
>  {
>  	u64 start_dpa = le64_to_cpu(extent->start_dpa);

[Severity: High]
This is a pre-existing issue, but could the extent boundary calculation
in cxl_validate_extent() overflow?

A few lines down from this signature change, the end boundary is calculated:

    struct range ext_range = (struct range) {
        .start = start_dpa,
        .end = start_dpa + le64_to_cpu(extent->length) - 1,
    };

If a device provides a massive start_dpa and a small length, could this
calculation wrap around and produce a small ext_range.end?

If so, the bounds check later in the function:

    if (!range_contains(&ed_range, &ext_range)) {

would evaluate r1->start <= r2->start && r1->end >= r2->end. With a wrapped
near-zero ext_range.end, the check might incorrectly pass, allowing
an out-of-bounds extent.

> [ ... ]
> @@ -300,6 +300,13 @@ static int cxl_validate_extent(struct cxl_memdev_state *mds,
>  	if (!cxlr || !cxlr->cxlr_dax)
>  		return -ENXIO;
>  
> +	/*
> +	 * Pre-existing extents must be read before any new extent is added so a
> +	 * tag already in use is never added twice; defer new adds until then.
> +	 */
> +	if (!existing && !smp_load_acquire(&cxlr->cxlr_dax->extents_scanned))
> +		return -EBUSY;
> +

[Severity: High]
Does returning -EBUSY here cause runtime Add Capacity events to be permanently
lost?

If cxl_validate_extent() returns -EBUSY for a new runtime event, its caller
cxl_realize_group() treats it as a failure, logging an error and deleting the
extent node:

    if (cxl_add_extent(mds, pos->extent, seq_num, existing) < 0) {
        dev_dbg(dev, "Tag %pUb: failed to add extent ...");
        delete_extent_node(pos);
        continue;
    }

Since the extent is destroyed rather than deferred, it seems it will be omitted
from the ADD_DC_RESPONSE, meaning the host officially rejects it. Does this
prevent the runtime capacity from being added?

> [ ... ]
> @@ -376,16 +383,22 @@ dc_extent_build(struct cxl_endpoint_decoder *cxled,
>  	return dc_extent;
>  }
>  
> -int cxlr_notify_extent(struct cxl_region *cxlr, enum dc_event event,
> -		       struct cxl_dc_tag_group *group)
> +/*
> + * Core notify: the caller must hold device_lock(&cxlr->cxlr_dax->dev).  Used by
> + * the existing-extent path that runs inside cxl_dax_region_probe(), where the
> + * async device-attach already holds the dax_region's device_lock — taking it
> + * again (as cxlr_notify_extent() does) would deadlock the probe against itself.
> + */
> +int __cxlr_notify_extent(struct cxl_region *cxlr, enum dc_event event,
> +			 struct cxl_dc_tag_group *group)
>  {
>  	struct device *dev = &cxlr->cxlr_dax->dev;
>  	struct cxl_notify_data notify_data;
>  	struct cxl_driver *driver;
>  
> -	dev_dbg(dev, "Trying notify: type %d tag %pUb\n", event, &group->uuid);
> +	device_lock_assert(dev);

[Severity: High]
Could this lock assertion fail when processing extents from other regions?

When __cxl_process_extent_list() queries the device for existing extents,
the device returns all extents for all partitions, which may belong to
different regions.

For an extent belonging to a different region, cxl_realize_group() will call
__cxlr_notify_extent() targeting that other region. However, the probe thread
only holds the device lock for the region currently being probed. Will this
cause the assertion to fail for extents on other regions?

> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -1503,7 +1503,7 @@ static int cxl_realize_group(struct cxl_memdev_state *mds, const uuid_t *tag,
>  		else
>  			seq_num++;
>  
> -		if (cxl_add_extent(mds, pos->extent, seq_num) < 0) {
> +		if (cxl_add_extent(mds, pos->extent, seq_num, existing) < 0) {

[Severity: High]
If cxl_add_extent() fails for an existing extent, could this cause a permanent
capacity leak?

If existing=true, but cxl_validate_extent() or cxl_add_extent() fails
(e.g., due to an overlap or memory allocation failure), the host drops the
extent by calling delete_extent_node(pos) inside cxl_realize_group().

Because existing=true, cxl_add_pending() skips sending an ADD_DC_RESPONSE.
Since the device considers the extent accepted and the host never sends a
Release-DC command for the dropped extent, does the capacity remain leaked
until reboot?

> [ ... ]
> @@ -2199,6 +2209,145 @@ int cxl_dev_dc_identify(struct cxl_mailbox *mbox,
>  }
>  EXPORT_SYMBOL_NS_GPL(cxl_dev_dc_identify, "CXL");
>  
> +/* Return -EAGAIN if the extent list changes while reading */
> +static int __cxl_process_extent_list(struct cxl_endpoint_decoder *cxled)
> +{
> +	u32 current_index, total_read, total_expected, initial_gen_num;
> +	struct cxl_memdev_state *mds = cxled_to_mds(cxled);
> +	struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
> +	struct device *dev = mds->cxlds.dev;
> +	struct cxl_mbox_cmd mbox_cmd;
> +	u32 max_extent_count;
> +	int rc = 0;
> +	bool first = true;
> +
> +	struct cxl_mbox_get_extent_out *extents __free(kvfree) =
> +				kvmalloc(cxl_mbox->payload_size, GFP_KERNEL);
> +	if (!extents)
> +		return -ENOMEM;
> +
> +	/*
> +	 * Build and consume add_ctx.pending_extents under add_ctx.lock, the
> +	 * same lock the DC event path (handle_add_event()) holds, so the two
> +	 * cannot corrupt the shared pending list.
> +	 */
> +	guard(mutex)(&mds->add_ctx.lock);

[Severity: High]
Could this lock acquisition lead to an AB-BA deadlock between the
mds->add_ctx.lock and the region device lock?

In __cxl_process_extent_list(), the driver core is probing the region,
so the region's device lock is already held. Here we acquire mds->add_ctx.lock,
establishing a (device_lock -> add_ctx.lock) ordering.

However, in the asynchronous event path (e.g., handle_add_event()), the driver
holds mds->add_ctx.lock, and then cxl_realize_group() calls cxlr_notify_extent()
which acquires the device lock:

    guard(device)(&cxlr->cxlr_dax->dev);

This establishes an (add_ctx.lock -> device_lock) ordering. If a DC event
arrives while a region is being probed, could these paths deadlock?

[Severity: High]
Could appending to mds->add_ctx.pending_extents here disrupt in-flight
multi-part runtime Add Capacity events?

Runtime DC Add Capacity events with the 'More' flag set accumulate partial
extents in mds->add_ctx.pending_extents across multiple interrupts, with the
lock released in between.

If __cxl_process_extent_list() is called concurrently during a region probe,
it unconditionally appends to this list. Later, it clears the list:

    out:
        clear_pending_extents(mds);

If there was an in-flight runtime event, would its accumulated extents be
processed as existing=true (meaning no response sent) and then destroyed,
leaving the device stalled waiting for a response?

> +
> +	total_read = 0;
> +	current_index = 0;
> +	total_expected = 0;
> +	max_extent_count = (cxl_mbox->payload_size - sizeof(*extents)) /
> +			    sizeof(struct cxl_extent);
> +	do {
> +		u32 nr_returned, current_total, current_gen_num;
> +		struct cxl_mbox_get_extent_in get_extent;
> +
> +		get_extent = (struct cxl_mbox_get_extent_in) {
> +			.extent_cnt = cpu_to_le32(max_extent_count),
> +			.start_extent_index = cpu_to_le32(current_index),
> +		};
> +
> +		mbox_cmd = (struct cxl_mbox_cmd) {
> +			.opcode = CXL_MBOX_OP_GET_DC_EXTENT_LIST,
> +			.payload_in = &get_extent,
> +			.size_in = sizeof(get_extent),
> +			.size_out = cxl_mbox->payload_size,
> +			.payload_out = extents,
> +			.min_out = 1,
> +		};
> +
> +		rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
> +		if (rc < 0)
> +			goto out;
> +
> +		/* Save initial data */
> +		if (first) {
> +			total_expected = le32_to_cpu(extents->total_extent_count);
> +			initial_gen_num = le32_to_cpu(extents->generation_num);
> +			first = false;
> +		}
> +
> +		nr_returned = le32_to_cpu(extents->returned_extent_count);

[Severity: High]
Could a malformed device response cause an out-of-bounds read here?

Because min_out is hardcoded to 1, truncated mailbox responses are accepted.
The code then extracts nr_returned:

    nr_returned = le32_to_cpu(extents->returned_extent_count);

and iterates over it:

    for (int i = 0; i < nr_returned ; i++) {
        struct cxl_extent *extent = &extents->extent[i];

Since nr_returned is not validated against max_extent_count or the actual
payload size returned by the hardware, could a large nr_returned value cause
the loop to read past the end of the kvmalloc buffer into unrelated kernel
heap memory?

> [ ... ]
> +	} while (total_expected > total_read);
> +
> +	if (!list_empty(&mds->add_ctx.pending_extents)) {
> +		/*
> +		 * Reached only on the success path (every error does goto out),
> +		 * so rc is 0 here.  These extents are already accepted on the
> +		 * device (recovered from a prior boot).  Pass existing=true so
> +		 * they are not re-reported in an Add-DC-Response (the device
> +		 * would reject a DPA already added by a prior response), and so
> +		 * a failed online releases them rather than silently dropping
> +		 * them.
> +		 */
> +		rc = cxl_add_pending(mds, true);
> +	}
> +out:
> +	clear_pending_extents(mds);

[Severity: Medium]
Does mixing goto statements with scope-based cleanup violate the cleanup
subsystem guidelines?

This function declares resources using scope-based cleanup:

    struct cxl_mbox_get_extent_out *extents __free(kvfree) = ...
    guard(mutex)(&mds->add_ctx.lock);

But it also uses explicit goto out statements that jump to a manual cleanup
label:

    out:
        clear_pending_extents(mds);

The cleanup subsystem documentation expects that usage of "goto" and cleanup
helpers is never mixed in the same function to avoid confusing ownership
semantics. Should this be refactored?

> +
> +	return rc;
> +}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260625112638.550691-1-anisa.su@samsung.com?part=27

next prev parent reply	other threads:[~2026-06-25 18:32 UTC|newest]

Thread overview: 61+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-25 11:04 DCD: Add support for Dynamic Capacity Devices (DCD) Anisa Su
2026-06-25 11:04 ` [PATCH v11 01/31] cxl/mbox: Flag " Anisa Su
2026-06-26 21:43   ` Dave Jiang
2026-06-25 11:04 ` [PATCH v11 02/31] cxl/mem: Read dynamic capacity configuration from the device Anisa Su
2026-06-25 18:16   ` sashiko-bot
2026-06-26 22:26   ` Dave Jiang
2026-06-25 11:04 ` [PATCH v11 04/31] cxl/core: Enforce partition order/simplify partition calls Anisa Su
2026-06-26 22:37   ` Dave Jiang
2026-06-25 11:04 ` [PATCH v11 05/31] cxl/mem: Expose dynamic ram 1 partition in sysfs Anisa Su
2026-06-25 18:12   ` sashiko-bot
2026-06-26 23:08   ` Dave Jiang
2026-06-25 11:04 ` [PATCH v11 06/31] cxl/port: Add 'dynamic_ram_1' to endpoint decoder mode Anisa Su
2026-06-25 11:04 ` [PATCH v11 07/31] cxl/region: Add DC DAX region support Anisa Su
2026-06-25 18:16   ` sashiko-bot
2026-06-26 23:18   ` Dave Jiang
2026-06-25 11:04 ` [PATCH v11 08/31] cxl/events: Split event msgnum configuration from irq setup Anisa Su
2026-06-25 11:04 ` [PATCH v11 09/31] cxl/pci: Factor out interrupt policy check Anisa Su
2026-06-25 11:04 ` [PATCH v11 10/31] cxl/mem: Configure dynamic capacity interrupts Anisa Su
2026-06-25 18:14   ` sashiko-bot
2026-06-25 11:04 ` [PATCH v11 11/31] cxl/core: Return endpoint decoder information from region search Anisa Su
2026-06-25 11:04 ` [PATCH v11 12/31] cxl/mem: Set up framework for handling DC Events Anisa Su
2026-06-25 18:12   ` sashiko-bot
2026-06-26 21:54   ` Dave Jiang
2026-06-25 11:04 ` [PATCH v11 13/31] cxl/mem: Add 20 second timeout for stalled DC_ADD_CAPACITY chains Anisa Su
2026-06-25 18:15   ` sashiko-bot
2026-06-25 11:04 ` [PATCH v11 14/31] cxl/extent: Handle DC Add Capacity events Anisa Su
2026-06-25 18:16   ` sashiko-bot
2026-06-25 11:04 ` [PATCH v11 15/31] cxl/mem: Drop misaligned DCD extent groups Anisa Su
2026-06-25 18:19   ` sashiko-bot
2026-06-25 11:04 ` [PATCH v11 16/31] cxl/extent: Validate DC extent partition Anisa Su
2026-06-25 18:20   ` sashiko-bot
2026-06-25 11:04 ` [PATCH v11 17/31] cxl/mem: Enforce tag-group semantics Anisa Su
2026-06-25 18:24   ` sashiko-bot
2026-06-25 11:04 ` [PATCH v11 18/31] cxl/extent: Handle DC Release Capacity events Anisa Su
2026-06-25 18:23   ` sashiko-bot
2026-06-25 11:04 ` [PATCH v11 19/31] cxl/extent: Enforce cross-region tag uniqueness Anisa Su
2026-06-25 18:23   ` sashiko-bot
2026-06-25 11:04 ` [PATCH v11 20/31] cxl/region/extent: Expose dc_extent information in sysfs Anisa Su
2026-06-25 18:33   ` sashiko-bot
2026-06-25 11:04 ` [PATCH v11 21/31] cxl + dax: Surface dax_resources on DCD Add Capacity events Anisa Su
2026-06-25 18:29   ` sashiko-bot
2026-06-25 11:04 ` [PATCH v11 22/31] cxl + dax: Release dax_resources on DCD Release " Anisa Su
2026-06-25 18:36   ` sashiko-bot
2026-06-25 11:05 ` [PATCH v11 23/31] dax/bus: Factor out dev dax resize logic Anisa Su
2026-06-25 18:27   ` sashiko-bot
2026-06-25 11:05 ` [PATCH v11 24/31] dax/bus: Add uuid sysfs attribute to dax devices Anisa Su
2026-06-25 11:05 ` [PATCH v11 25/31] dax/bus: Reject resize on DC dax devices and enforce 0-size creation Anisa Su
2026-06-25 11:05 ` [PATCH v11 26/31] dax/bus: Tag-aware uuid claim and show on DC dax devices Anisa Su
2026-06-25 18:26   ` sashiko-bot
2026-06-25 11:05 ` [PATCH v11 27/31] cxl/region: Read existing extents on region creation Anisa Su
2026-06-25 18:32   ` sashiko-bot [this message]
2026-06-25 11:05 ` [PATCH v11 28/31] cxl/mem: Trace Dynamic capacity Event Record Anisa Su
2026-06-25 18:29   ` sashiko-bot
2026-06-25 11:05 ` [PATCH v11 29/31] tools/testing/cxl: Make event logs dynamic Anisa Su
2026-06-25 18:31   ` sashiko-bot
2026-06-25 11:05 ` [PATCH v11 30/31] tools/testing/cxl: Add DC Regions to mock mem data Anisa Su
2026-06-25 18:34   ` sashiko-bot
2026-06-25 11:05 ` [PATCH v11 31/31] Documentation/cxl: Document DCD extent handling and DC-backed DAX regions Anisa Su
2026-06-25 18:24   ` sashiko-bot
2026-06-25 18:00 ` [PATCH v11 03/31] cxl/cdat: Gather DSMAS data for DCD partitions Anisa Su
2026-06-26 22:30   ` Dave Jiang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260625183203.3042C1F000E9@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=anisa.su887@gmail.com \
    --cc=linux-cxl@vger.kernel.org \
    --cc=nvdimm@lists.linux.dev \
    --cc=sashiko-reviews@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox