From: Ira Weiny <ira.weiny@intel.com>
To: Anisa Su <anisa.su887@gmail.com>, Ira Weiny <ira.weiny@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>, Fan Ni <fan.ni@samsung.com>,
"Jonathan Cameron" <Jonathan.Cameron@huawei.com>,
Dan Williams <dan.j.williams@intel.com>,
Davidlohr Bueso <dave@stgolabs.net>,
"Alison Schofield" <alison.schofield@intel.com>,
Vishal Verma <vishal.l.verma@intel.com>,
<linux-cxl@vger.kernel.org>, <nvdimm@lists.linux.dev>,
<linux-kernel@vger.kernel.org>, Li Ming <ming.li@zohomail.com>
Subject: Re: [PATCH v9 12/19] cxl/extent: Process dynamic partition events and realize region extents
Date: Thu, 5 Mar 2026 16:00:55 -0600
Message-ID: <69a9fd172cb8d_15c95d100a9@iweiny-mobl.notmuch>
In-Reply-To: <aZz9qi1b1DsOETa_@4470NRD-ASU.ssi.samsung.com>
Anisa Su wrote:
> On Sun, Apr 13, 2025 at 05:52:20PM -0500, Ira Weiny wrote:
> A few notes while going through and removing sparse dax semantics and plumbing
> for fs-dax mode:
> > A dynamic capacity device (DCD) sends events to signal the host for
> > changes in the availability of Dynamic Capacity (DC) memory. These
> > events contain extents describing a DPA range and meta data for memory
> > to be added or removed. Events may be sent from the device at any time.
> >
> > Three types of events can be signaled, Add, Release, and Force Release.
> >
> > On add, the host may accept or reject the memory being offered. If no
> > region exists, or the extent is invalid, the extent should be rejected.
> > Add extent events may be grouped by a 'more' bit which indicates those
> > extents should be processed as a group.
> >
> > On remove, the host can delay the response until the host is safely not
> > using the memory. If no region exists the release can be sent
> > immediately. The host may also release extents (or partial extents) at
> > any time.
> Partial release is no longer valid for tagged release iirc from the calls
Tags were not supported in this version:
	if (!uuid_is_null((const uuid_t *)extent->uuid)) {
		dev_err_ratelimited(dev,
			"DC extent DPA %pra (%pU); tags not supported\n",
			&ext_range, extent->uuid);
		return -ENXIO;
	}
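For readers following along, the check above amounts to testing whether the 16-byte tag is all zeros. A minimal userspace sketch, assuming the extent is reduced to its tag bytes; `TAG_LEN`, `tag_is_null()` and `check_untagged()` are hypothetical stand-ins, not the kernel helpers:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TAG_LEN 16	/* a UUID is 16 bytes */

/* Hypothetical userspace stand-in for the kernel's uuid_is_null(). */
static bool tag_is_null(const uint8_t tag[TAG_LEN])
{
	for (size_t i = 0; i < TAG_LEN; i++)
		if (tag[i])
			return false;
	return true;
}

/* Return 0 for an untagged extent, -ENXIO for any non-zero tag. */
static int check_untagged(const uint8_t tag[TAG_LEN])
{
	return tag_is_null(tag) ? 0 : -ENXIO;
}
```

An all-zero tag passes; a tag with any byte set is rejected with -ENXIO, which mirrors the "tags not supported" path quoted above.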
>
> > Thus the 'more' bit grouping of release events is of less
> > value and can be ignored in favor of sending multiple release capacity
> > responses for groups of release events.
> >
> [snip]
> > +
> > +static int cxl_send_dc_response(struct cxl_memdev_state *mds, int opcode,
> > +				struct xarray *extent_array, int cnt)
> > +{
> > +	struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
> > +	struct cxl_mbox_dc_response *p;
> > +	struct cxl_extent *extent;
> > +	unsigned long index;
> > +	u32 pl_index;
> > +
> > +	size_t pl_size = struct_size(p, extent_list, cnt);
> > +	u32 max_extents = cnt;
> > +
> > +	/* May have to use more bit on response. */
> > +	if (pl_size > cxl_mbox->payload_size) {
> > +		max_extents = (cxl_mbox->payload_size - sizeof(*p)) /
> > +			      sizeof(struct updated_extent_list);
> > +		pl_size = struct_size(p, extent_list, max_extents);
> > +	}
> > +
> > +	struct cxl_mbox_dc_response *response __free(kfree) =
> > +		kzalloc(pl_size, GFP_KERNEL);
> > +	if (!response)
> > +		return -ENOMEM;
> > +
> > +	if (cnt == 0)
> > +		return send_one_response(cxl_mbox, response, opcode, 0, 0);
> > +
> > +	pl_index = 0;
> I was wondering why xarray is used here instead of a list? I didn't see anywhere
> that we need to look up a specific index to benefit from the log complexity and
> afaict, simply used to iterate over all elements.
xarray was just easier than a list.
>
> > +	xa_for_each(extent_array, index, extent) {
> > +		response->extent_list[pl_index].dpa_start = extent->start_dpa;
> > +		response->extent_list[pl_index].length = extent->length;
> > +		pl_index++;
> > +
> > +		if (pl_index == max_extents) {
> > +			u8 flags = 0;
> > +			int rc;
> > +
> > +			if (pl_index < cnt)
> > +				flags |= CXL_DCD_EVENT_MORE;
> > +			rc = send_one_response(cxl_mbox, response, opcode,
> > +					       pl_index, flags);
> > +			if (rc)
> > +				return rc;
> > +			cnt -= pl_index;
> > +			pl_index = 0;
> > +		}
> > +	}
> > +
> > +	if (!pl_index) /* nothing more to do */
> > +		return 0;
> > +	return send_one_response(cxl_mbox, response, opcode, pl_index, 0);
> > +}
> > +
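The batching above can be reduced to plain arithmetic. The following is a userspace sketch only, not the patch's code: the header layout, `SKETCH_MORE` and every `sketch_*` name are made up for illustration, and responses are recorded in a log instead of going through a mailbox. The loop is restructured, but it aims for the same rule as the quoted function: fill each response with as many extents as the payload allows, and set the more flag on every response except the last.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MORE 0x1	/* hypothetical stand-in for CXL_DCD_EVENT_MORE */

/* Hypothetical wire layout: 16 bytes per extent, 8-byte response header. */
struct sketch_extent { uint64_t dpa_start; uint64_t length; };
struct sketch_hdr { uint32_t count; uint8_t flags; uint8_t rsvd[3]; };

/* What one "sent" response looked like, so the batching can be inspected. */
struct sketch_sent { uint32_t count; uint8_t flags; };

/* Number of extents per response: all of them, or as many as fit. */
static size_t sketch_max_extents(size_t payload_size, size_t cnt)
{
	size_t fit;

	/* Assumption: the payload holds the header plus at least one extent. */
	assert(payload_size >= sizeof(struct sketch_hdr) +
				sizeof(struct sketch_extent));
	fit = (payload_size - sizeof(struct sketch_hdr)) /
	      sizeof(struct sketch_extent);
	return cnt < fit ? cnt : fit;
}

/* Split cnt extents into responses; returns how many responses were made. */
static size_t sketch_send_all(size_t payload_size, size_t cnt,
			      struct sketch_sent *log, size_t log_len)
{
	size_t max = sketch_max_extents(payload_size, cnt);
	size_t done = 0, n = 0;

	do {
		size_t left = cnt - done;
		size_t batch = left < max ? left : max;
		/* More flag only when extents remain after this batch. */
		uint8_t flags = batch < left ? SKETCH_MORE : 0;

		if (n < log_len)
			log[n] = (struct sketch_sent){ .count = (uint32_t)batch,
						       .flags = flags };
		n++;
		done += batch;
	} while (done < cnt);	/* cnt == 0 still yields one empty response */

	return n;
}
```

With a 72-byte payload in this made-up layout, four extents fit per response, so ten extents go out as batches of 4, 4 and 2 with the more flag set on the first two only.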
[snip]
> > +static int validate_add_extent(struct cxl_memdev_state *mds,
> > +			       struct cxl_extent *extent)
> > +{
> > +	int rc;
> > +
> > +	rc = cxl_validate_extent(mds, extent);
> > +	if (rc)
> > +		return rc;
> > +
> > +	return cxl_add_extent(mds, extent);
> > +}
> > +
> > +static int cxl_add_pending(struct cxl_memdev_state *mds)
> > +{
> > +	struct device *dev = mds->cxlds.dev;
> > +	struct cxl_extent *extent;
> > +	unsigned long cnt = 0;
> > +	unsigned long index;
> > +	int rc;
> > +
> Also according to the spec:
> "In response to an Add Capacity Event Record, or multiple Add Capacity Event
> records grouped via the More flag (see Table 8-229), the host is expected to
> respond with exactly one Add Dynamic Capacity Response acknowledgment,
> corresponding to the order of the Add Capacity Events received. If the order
> does not match, the device shall return Invalid Input. The Add Dynamic Capacity
> Response acknowledgment must be sent in the same order as the Add Capacity
> Event Records."
hmmm... yea that might be wrong, I don't recall.
>
> Using xarray does not preserve the order of the extents, which requires a fifo
> queue.
It could if the index was the order.
But in the end I'm not opposed to using a list.
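To illustrate the ordering point: xa_for_each() walks entries in ascending index order, so whether arrival order survives depends entirely on what is used as the index. The userspace sketch below emulates an index-ordered walk by sorting on the index. It is not xarray code; `sketch_walk()` and the choice of DPA as the alternative index are assumptions made only for the comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_MAX 16	/* arbitrary cap for this sketch */

struct sketch_entry {
	unsigned long index;	/* key the store is ordered by */
	uint64_t dpa;		/* payload: start DPA of the extent */
};

static int sketch_cmp(const void *a, const void *b)
{
	const struct sketch_entry *x = a, *y = b;

	return (x->index > y->index) - (x->index < y->index);
}

/*
 * Emulate an index-ordered walk over n extents given in arrival order.
 * index_is_arrival != 0: the index is the arrival sequence number.
 * index_is_arrival == 0: the index is the DPA (hypothetical alternative).
 * out[] receives the DPAs in the order the walk would visit them.
 */
static void sketch_walk(const uint64_t *dpas, size_t n, int index_is_arrival,
			uint64_t *out)
{
	struct sketch_entry e[SKETCH_MAX];
	size_t i;

	assert(n <= SKETCH_MAX);
	for (i = 0; i < n; i++) {
		e[i].index = index_is_arrival ? (unsigned long)i
					      : (unsigned long)dpas[i];
		e[i].dpa = dpas[i];
	}
	qsort(e, n, sizeof(e[0]), sketch_cmp);
	for (i = 0; i < n; i++)
		out[i] = e[i].dpa;
}
```

For extents arriving as 0x3000, 0x1000, 0x2000, indexing by arrival sequence visits them in that same order, while indexing by DPA visits 0x1000, 0x2000, 0x3000 and loses the order the device sent them in. A plain list appended at the tail gives the FIFO behaviour without needing a counter.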
Ira
[snip]