From: Fan Ni <nifan.cxl@gmail.com>
To: Ira Weiny <ira.weiny@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>,
Jonathan Cameron <Jonathan.Cameron@huawei.com>,
Dan Williams <dan.j.williams@intel.com>,
Davidlohr Bueso <dave@stgolabs.net>,
Alison Schofield <alison.schofield@intel.com>,
Vishal Verma <vishal.l.verma@intel.com>,
linux-cxl@vger.kernel.org, nvdimm@lists.linux.dev,
linux-kernel@vger.kernel.org, Li Ming <ming.li@zohomail.com>
Subject: Re: [PATCH v9 00/19] DCD: Add support for Dynamic Capacity Devices (DCD)
Date: Mon, 14 Apr 2025 09:11:20 -0700
Message-ID: <Z_0v-iFQpWlgG7oT@debian>
In-Reply-To: <20250413-dcd-type2-upstream-v9-0-1d4911a0b365@intel.com>
On Sun, Apr 13, 2025 at 05:52:08PM -0500, Ira Weiny wrote:
> A git tree of this series can be found here:
>
> https://github.com/weiny2/linux-kernel/tree/dcd-v6-2025-04-13
>
> This is now based on 6.15-rc2.
>
> Due to the stagnation of solid requirements for users of DCD I do not
> plan to rev this work in Q2 of 2025 and possibly beyond.
>
> It is anticipated that this will support at least the initial
> implementation of DCD devices, if and when they appear in the ecosystem.
> The patch set should be reviewed with the limited set of functionality in
> mind. Additional functionality can be added as devices support them.
>
> It is strongly encouraged for individuals or companies wishing to bring
> DCD devices to market review this set with the customer use cases they
> have in mind.
Hi Ira,
Thanks for sending it out.
I have not had a chance to check the code or test it extensively yet.
I tried to test one specific case and hit an issue.
I pre-populated the device's extent list with a DC extent at VM launch
by hacking qemu as below:
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 87fa308495..4049fc8dd9 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -826,6 +826,11 @@ static bool cxl_create_dc_regions(CXLType3Dev *ct3d, Error **errp)
QTAILQ_INIT(&ct3d->dc.extents);
QTAILQ_INIT(&ct3d->dc.extents_pending);
+ cxl_insert_extent_to_extent_list(&ct3d->dc.extents, 0,
+ CXL_CAPACITY_MULTIPLIER, NULL, 0);
+ ct3d->dc.total_extent_count = 1;
+ ct3_set_region_block_backed(ct3d, 0, CXL_CAPACITY_MULTIPLIER);
+
return true;
}
Then, after the VM was launched, I created a DC region with the
command:
cxl create-region -m mem0 -d decoder0.0 -s 1G -t dynamic_ram_a
That works fine. As you can see below, the region is created and the
extent shows up correctly.
root@debian:~# cxl list -r region0 -N
[
  {
    "region":"region0",
    "resource":79725330432,
    "size":1073741824,
    "interleave_ways":1,
    "interleave_granularity":256,
    "decode_state":"commit",
    "extents":[
      {
        "offset":0,
        "length":268435456,
        "uuid":"00000000-0000-0000-0000-000000000000"
      }
    ]
  }
]
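As a quick sanity check on the numbers above: the extent length of
268435456 bytes is 256 MiB, which is what the hack should produce if
CXL_CAPACITY_MULTIPLIER in qemu is 256 MiB (my assumption, based on
this output). A minimal sketch of that arithmetic, with hypothetical
helper names that are not taken from qemu or the kernel:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: mirrors qemu's CXL_CAPACITY_MULTIPLIER (256 MiB). */
#define CAPACITY_MULTIPLIER (256ULL * 1024 * 1024)

/* Hypothetical helper: inclusive last byte covered by an extent. */
static uint64_t extent_last_byte(uint64_t offset, uint64_t length)
{
    assert(length > 0); /* a zero-length extent has no last byte */
    return offset + length - 1;
}

/* Hypothetical helper: nonzero if the extent lies fully inside the region. */
static int extent_fits_region(uint64_t offset, uint64_t length,
                              uint64_t region_size)
{
    return length > 0 && offset <= region_size &&
           length <= region_size - offset;
}
```

With offset 0 and length 268435456, extent_last_byte() gives 0x0fffffff,
and the extent fits inside the 1073741824-byte region, covering a
quarter of it.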
However, when I then tried to create a dax device as below, it failed.
root@debian:~# daxctl create-device -r region0 -v
libdaxctl: __dax_regions_init: no dax regions found via: /sys/class/dax
error creating devices: No such device or address
created 0 devices
root@debian:~#
root@debian:~# ls /sys/class/dax
ls: cannot access '/sys/class/dax': No such file or directory
The dmesg output shows that really_probe() returns early because
resources are present before probing, as below:
[ 1745.505068] cxl_core:devm_cxl_add_dax_region:3251: cxl_region region0: region0: register dax_region0
[ 1745.506063] cxl_pci:__cxl_pci_mbox_send_cmd:263: cxl_pci 0000:0d:00.0: Sending command: 0x4801
[ 1745.506953] cxl_pci:cxl_pci_mbox_wait_for_doorbell:74: cxl_pci 0000:0d:00.0: Doorbell wait took 0ms
[ 1745.507911] cxl_core:__cxl_process_extent_list:1802: cxl_pci 0000:0d:00.0: Got extent list 0-0 of 1 generation Num:0
[ 1745.508958] cxl_core:__cxl_process_extent_list:1815: cxl_pci 0000:0d:00.0: Processing extent 0/1
[ 1745.509843] cxl_core:cxl_validate_extent:975: cxl_pci 0000:0d:00.0: DC extent DPA [range 0x0000000000000000-0x000000000fffffff] (DCR:[range 0x0000000000000000-0x000000007fffffff])(00000000-0000-0000-0000-000000000000)
[ 1745.511748] cxl_core:__cxl_dpa_to_region:2869: cxl decoder2.0: dpa:0x0 mapped in region:region0
[ 1745.512626] cxl_core:cxl_add_extent:460: cxl decoder2.0: Checking ED ([mem 0x00000000-0x3fffffff flags 0x80000200]) for extent [range 0x0000000000000000-0x000000000fffffff]
[ 1745.514143] cxl_core:cxl_add_extent:492: cxl decoder2.0: Add extent [range 0x0000000000000000-0x000000000fffffff] (00000000-0000-0000-0000-000000000000)
[ 1745.515485] cxl_core:online_region_extent:176: extent0.0: region extent HPA [range 0x0000000000000000-0x000000000fffffff]
[ 1745.516576] cxl_core:cxlr_notify_extent:285: cxl dax_region0: Trying notify: type 0 HPA [range 0x0000000000000000-0x000000000fffffff]
[ 1745.517768] cxl_core:cxl_bus_probe:2087: cxl_region region0: probe: 0
[ 1745.524984] cxl dax_region0: Resources present before probing
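For reference, my understanding is that this message comes from the
driver core check in really_probe() that refuses to probe a device
which already has managed (devres) resources attached, returning
-EBUSY. Going by the ordering in the log, my guess (not verified
against the code) is that something gets attached to dax_region0 while
the existing extent is processed, before its driver is probed. A
simplified user-space model of that check, where the struct and
function names are hypothetical and this is not kernel code:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct device: only counts devres entries. */
struct model_device {
    const char *name;
    size_t devres_count; /* managed resources attached so far */
};

/* Models a devm-style registration against the device. */
static void model_devm_add(struct model_device *dev)
{
    assert(dev != NULL);
    dev->devres_count++;
}

/*
 * Models the early check in really_probe(): a device that already owns
 * managed resources is refused with -EBUSY before the driver's probe
 * callback is ever called.
 */
static int model_really_probe(const struct model_device *dev)
{
    assert(dev != NULL);
    if (dev->devres_count != 0)
        return -EBUSY; /* "Resources present before probing" */
    return 0;
}
```

In this model a fresh device probes fine, but a single
model_devm_add() call before model_really_probe() is enough to get
-EBUSY, which would also explain why /sys/class/dax never shows up.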
btw, I hit the same issue with the previous version as well.
Fan
>
> Series info
> ===========
>
> This series has 2 parts:
>
> Patch 1-17: Core DCD support
> Patch 18-19: cxl_test support
>
> Background
> ==========
>
> A Dynamic Capacity Device (DCD) (CXL 3.1 sec 9.13.3) is a CXL memory
> device that allows memory capacity within a region to change
> dynamically without the need for resetting the device, reconfiguring
> HDM decoders, or reconfiguring software DAX regions.
>
> One of the biggest anticipated use cases for Dynamic Capacity is to
> allow hosts to dynamically add or remove memory from a host within a
> data center without physically changing the per-host attached memory nor
> rebooting the host.
>
> The general flow for the addition or removal of memory is to have an
> orchestrator coordinate the use of the memory. Generally there are 5
> actors in such a system, the Orchestrator, Fabric Manager, the Logical
> device, the Host Kernel, and a Host User.
>
> An example work flow is shown below.
>
> Orchestrator FM Device Host Kernel Host User
>
> | | | | |
> |-------------- Create region ------------------------>|
> | | | | |
> | | | |<-- Create ----|
> | | | | Region |
> | | | |(dynamic_ram_a)|
> |<------------- Signal done ---------------------------|
> | | | | |
> |-- Add ----->|-- Add --->|--- Add --->| |
> | Capacity | Extent | Extent | |
> | | | | |
> | |<- Accept -|<- Accept -| |
> | | Extent | Extent | |
> | | | |<- Create ---->|
> | | | | DAX dev |-- Use memory
> | | | | | |
> | | | | | |
> | | | |<- Release ----| <-+
> | | | | DAX dev |
> | | | | |
> |<------------- Signal done ---------------------------|
> | | | | |
> |-- Remove -->|- Release->|- Release ->| |
> | Capacity | Extent | Extent | |
> | | | | |
> | |<- Release-|<- Release -| |
> | | Extent | Extent | |
> | | | | |
> |-- Add ----->|-- Add --->|--- Add --->| |
> | Capacity | Extent | Extent | |
> | | | | |
> | |<- Accept -|<- Accept -| |
> | | Extent | Extent | |
> | | | |<- Create -----|
> | | | | DAX dev |-- Use memory
> | | | | | |
> | | | |<- Release ----| <-+
> | | | | DAX dev |
> |<------------- Signal done ---------------------------|
> | | | | |
> |-- Remove -->|- Release->|- Release ->| |
> | Capacity | Extent | Extent | |
> | | | | |
> | |<- Release-|<- Release -| |
> | | Extent | Extent | |
> | | | | |
> |-- Add ----->|-- Add --->|--- Add --->| |
> | Capacity | Extent | Extent | |
> | | | |<- Create -----|
> | | | | DAX dev |-- Use memory
> | | | | | |
> |-- Remove -->|- Release->|- Release ->| | |
> | Capacity | Extent | Extent | | |
> | | | | | |
> | | | (Release Ignored) | |
> | | | | | |
> | | | |<- Release ----| <-+
> | | | | DAX dev |
> |<------------- Signal done ---------------------------|
> | | | | |
> | |- Release->|- Release ->| |
> | | Extent | Extent | |
> | | | | |
> | |<- Release-|<- Release -| |
> | | Extent | Extent | |
> | | | |<- Destroy ----|
> | | | | Region |
> | | | | |
>
> Implementation
> ==============
>
> This series requires the creation of regions and DAX devices to be
> closely synchronized with the Orchestrator and Fabric Manager. The host
> kernel will reject extents if a region is not yet created. It also
> ignores extent release if memory is in use (DAX device created). These
> synchronizations are not anticipated to be an issue with real
> applications.
>
> Only a single dynamic ram partition is supported (dynamic_ram_a). The
> requirements, use cases, and existence of actual hardware devices to
> support more than one DC partition is unknown at this time. So a less
> complex implementation was chosen.
>
> In order to allow for capacity to be added and removed a new concept of
> a sparse DAX region is introduced. A sparse DAX region may have 0 or
> more bytes of available space. The total space depends on the number
> and size of the extents which have been added.
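
To check my reading of the sparse DAX region description above: the
usable space is simply the sum of the lengths of the extents accepted
so far, and is zero when none are present. A minimal sketch, assuming
non-overlapping extents and using hypothetical names that do not come
from the patches:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one accepted extent inside a sparse region. */
struct model_extent {
    uint64_t offset; /* offset from the start of the region */
    uint64_t length; /* size in bytes */
};

/* Total space a sparse region offers: the sum of its extent lengths. */
static uint64_t sparse_region_capacity(const struct model_extent *ext,
                                       size_t count)
{
    uint64_t total = 0;

    assert(ext != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        total += ext[i].length;
    return total;
}
```

A region with no extents reports 0 bytes here, which matches the
"0 or more bytes of available space" wording.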
>
> It is anticipated that users of the memory will carefully coordinate the
> surfacing of capacity with the creation of DAX devices which use that
> capacity. Therefore, the allocation of the memory to DAX devices does
> not allow for specific associations between DAX device and extent. This
> keeps allocations of DAX devices similar to existing DAX region
> behavior.
>
> To keep the DAX memory allocation aligned with the existing DAX devices
> which do not have tags, extents are not allowed to have tags in this
> implementation. Future support for tags can be added when real use
> cases surface.
>
> Great care was taken to keep the extent tracking simple. Some xarray's
> needed to be added but extra software objects are kept to a minimum.
>
> Region extents are tracked as sub-devices of the DAX region. This
> ensures that region destruction cleans up all extent allocations
> properly.
>
> The major functionality of this series includes:
>
> - Getting the dynamic capacity (DC) configuration information from cxl
> devices
>
> - Configuring a DC partition found in hardware.
>
> - Enhancing the CXL and DAX regions for dynamic capacity support
> a. Maintain a logical separation between hardware extents and
> software managed extents. This provides an abstraction
> between the layers and should allow for interleaving in the
> future
>
> - Get existing hardware extent lists for endpoint decoders upon region
> creation.
>
> - Respond to DC capacity events and adjust available region memory.
> a. Add capacity Events
> b. Release capacity events
>
> - Host response for add capacity
> a. do not accept the extent if:
> If the region does not exist
> or an error occurs realizing the extent
> b. If the region does exist
> realize a DAX region extent with 1:1 mapping (no
> interleave yet)
> c. Support the event more bit by processing a list of extents
> marked with the more bit together before setting up a
> response.
>
> - Host response for remove capacity
> a. If no DAX device references the extent; release the extent
> b. If a reference does exist, ignore the request.
> (Require FM to issue release again.)
> c. Release extents flagged with the 'more' bit individually as
> the specification allows for the asynchronous release of
> memory and the implementation is simplified by doing so.
>
> - Modify DAX device creation/resize to account for extents within a
> sparse DAX region
>
> - Trace Dynamic Capacity events for debugging
>
> - Add cxl-test infrastructure to allow for faster unit testing
> (See new ndctl branch for cxl-dcd.sh test[1])
>
> - Only support 0 value extent tags
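
To confirm I read the add and release capacity responses in the list
above correctly, here is the policy as a pair of small decision
functions. This is only a sketch of the description; the enum and
function names are hypothetical and are not taken from the patches:

```c
#include <assert.h>
#include <stdbool.h>

enum extent_response {
    EXTENT_ACCEPT,  /* realize a DAX region extent (1:1 mapping) */
    EXTENT_REJECT,  /* do not accept the offered extent */
    EXTENT_RELEASE, /* give the extent back to the device */
    EXTENT_IGNORE,  /* keep it; the FM has to issue the release again */
};

/* Add capacity: reject without a region or if realizing the extent fails. */
static enum extent_response respond_to_add(bool region_exists,
                                           bool realize_ok)
{
    if (!region_exists || !realize_ok)
        return EXTENT_REJECT;
    return EXTENT_ACCEPT;
}

/* Release capacity: only release when no DAX device references the extent. */
static enum extent_response respond_to_release(unsigned int dax_dev_refs)
{
    return dax_dev_refs == 0 ? EXTENT_RELEASE : EXTENT_IGNORE;
}
```

For example, respond_to_release(1) yields EXTENT_IGNORE, which
corresponds to the "(Release Ignored)" step in the flow diagram.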
>
> Fan Ni's upstream of Qemu DCD was used for testing.
>
> Remaining work:
>
> 1) Allow mapping to specific extents (perhaps based on
> label/tag)
> 1a) devise region size reporting based on tags
> 2) Interleave support
>
> Possible additional work depending on requirements:
>
> 1) Accept a new extent which extends (but overlaps) already
> accepted extent(s)
> 2) Rework DAX device interfaces, memfd has been explored a bit
> 3) Support more than 1 DC partition
>
> [1] https://github.com/weiny2/ndctl/tree/dcd-region3-2025-04-13
>
> ---
> Changes in v9:
> - djbw: pare down support to only a single DC parition
> - djbw: adjust to the new core partition processing which aligns with
> new type2 work.
> - iweiny: address smaller comments from v8
> - iweiny: rebase off of 6.15-rc1
> - Link to v8: https://patch.msgid.link/20241210-dcd-type2-upstream-v8-0-812852504400@intel.com
>
> ---
> Ira Weiny (19):
> cxl/mbox: Flag support for Dynamic Capacity Devices (DCD)
> cxl/mem: Read dynamic capacity configuration from the device
> cxl/cdat: Gather DSMAS data for DCD partitions
> cxl/core: Enforce partition order/simplify partition calls
> cxl/mem: Expose dynamic ram A partition in sysfs
> cxl/port: Add 'dynamic_ram_a' to endpoint decoder mode
> cxl/region: Add sparse DAX region support
> cxl/events: Split event msgnum configuration from irq setup
> cxl/pci: Factor out interrupt policy check
> cxl/mem: Configure dynamic capacity interrupts
> cxl/core: Return endpoint decoder information from region search
> cxl/extent: Process dynamic partition events and realize region extents
> cxl/region/extent: Expose region extent information in sysfs
> dax/bus: Factor out dev dax resize logic
> dax/region: Create resources on sparse DAX regions
> cxl/region: Read existing extents on region creation
> cxl/mem: Trace Dynamic capacity Event Record
> tools/testing/cxl: Make event logs dynamic
> tools/testing/cxl: Add DC Regions to mock mem data
>
> Documentation/ABI/testing/sysfs-bus-cxl | 100 ++-
> drivers/cxl/core/Makefile | 2 +-
> drivers/cxl/core/cdat.c | 11 +
> drivers/cxl/core/core.h | 33 +-
> drivers/cxl/core/extent.c | 495 +++++++++++++++
> drivers/cxl/core/hdm.c | 13 +-
> drivers/cxl/core/mbox.c | 632 ++++++++++++++++++-
> drivers/cxl/core/memdev.c | 87 ++-
> drivers/cxl/core/port.c | 5 +
> drivers/cxl/core/region.c | 76 ++-
> drivers/cxl/core/trace.h | 65 ++
> drivers/cxl/cxl.h | 61 +-
> drivers/cxl/cxlmem.h | 134 +++-
> drivers/cxl/mem.c | 2 +-
> drivers/cxl/pci.c | 115 +++-
> drivers/dax/bus.c | 356 +++++++++--
> drivers/dax/bus.h | 4 +-
> drivers/dax/cxl.c | 71 ++-
> drivers/dax/dax-private.h | 40 ++
> drivers/dax/hmem/hmem.c | 2 +-
> drivers/dax/pmem.c | 2 +-
> include/cxl/event.h | 31 +
> include/linux/ioport.h | 3 +
> tools/testing/cxl/Kbuild | 3 +-
> tools/testing/cxl/test/mem.c | 1021 +++++++++++++++++++++++++++----
> 25 files changed, 3102 insertions(+), 262 deletions(-)
> ---
> base-commit: 8ffd015db85fea3e15a77027fda6c02ced4d2444
> change-id: 20230604-dcd-type2-upstream-0cd15f6216fd
>
> Best regards,
> --
> Ira Weiny <ira.weiny@intel.com>
>