* DCD: Add support for Dynamic Capacity Devices (DCD)
From: Anisa Su @ 2026-06-25 11:04 UTC (permalink / raw)
  To: linux-cxl, linux-kernel
  Cc: nvdimm, Dan Williams, Jonathan Cameron, Davidlohr Bueso,
	Dave Jiang, Vishal Verma, Ira Weiny, Alison Schofield,
	John Groves, Gregory Price, Anisa Su

Table of Contents
=================
  1. Changes since v10
  2. Background
  3. Patch organization
  4. Notable
  5. Testing

This series branch: https://github.com/anisa-su993/anisa-linux-kernel/tree/dcd-v11-06-23-26
NDCTL branch: https://github.com/anisa-su993/anisa-ndctl/tree/dcd-2026-06-24

v10: https://lore.kernel.org/linux-cxl/ajuMJi5nTQRB_ZP0@AnisaLaptop.localdomain/T/#mfdfc28c829071204333824c542ca3af4170dafb4

Changes since v10
=================
The overall architecture and semantics are unchanged; v11 consists of
review fixes, naming/ABI corrections, and fixes for locking/concurrency
edge cases between the CXL and DAX layers.

Naming / ABI:
 - Renamed dynamic_ram_a to dynamic_ram_1 throughout (endpoint-decoder
   mode, the partition sysfs name, and enum CXL_PARTMODE_DYNAMIC_RAM_1),
   matching the numbered-partition convention.
 - Sharable extent sequence numbers are now a dense 0..n-1 (previously
   1..n); the CXL validation path and the DAX claim path enforce the same
   0..n-1 invariant.
 - The DAX 'uuid' attribute reads back the null UUID (all-zeroes) when
   untagged rather than "0".
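
As an illustration of the dense 0..n-1 invariant (a sketch only, not
code from the series), the check below accepts a set of sequence
numbers only if it is exactly {0, ..., n-1}. The helper name, the
16-bit sequence type, the 64-entry limit, and the acceptance of any
arrival order are all assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SEQ_CHECK_MAX 64	/* illustrative limit imposed by the bitmap */

/*
 * Hypothetical helper: returns true when the n sequence numbers in
 * seq[] are exactly the set {0, 1, ..., n-1}, in any order, with no
 * gaps and no duplicates.
 */
static bool seq_is_dense(const uint16_t *seq, size_t n)
{
	uint64_t seen = 0;

	if (n > SEQ_CHECK_MAX)
		return false;

	for (size_t i = 0; i < n; i++) {
		if (seq[i] >= n)		/* out of range, e.g. old 1..n numbering */
			return false;
		if (seen & (1ULL << seq[i]))	/* duplicate sequence number */
			return false;
		seen |= 1ULL << seq[i];
	}
	/* n distinct values in [0, n) necessarily cover the whole range */
	return true;
}
```

With this check a group numbered {1, 2, 3} (the v10 convention) is
rejected because 3 is out of range for n = 3.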

Recovery and lifecycle:
 - Creating a region over a DC partition now reads the device's
   already-accepted extents at cxl_dax_region probe time. Recovered
   extents are not re-acknowledged via Add-DC-Response. New add events
   are deferred until the initial scan completes so a tag already in
   use is never registered twice.
 - Per-tag-group add and release of DAX resources are atomic
   (all-or-none). Previously, adding a tag group took the lock only
   around each individual extent addition; the lock now covers the
   entire group.
 - Added an upper bound of 100 pending extents so that the 20-second
   timeout for closing a More chain cannot be refreshed indefinitely
   (unlikely unless the device is malicious).
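
The all-or-none group semantics can be pictured with the following
sketch. It is not the series' implementation: struct pool,
pool_add_one() and pool_add_group() are hypothetical names, and the
fixed-size array stands in for whatever resource tracking the DAX
layer really uses. The point is only that a failure partway through a
group rolls back the extents already added, and that a single lock
would be held across the whole function rather than per extent.

```c
#include <stdbool.h>
#include <stddef.h>

#define POOL_SLOTS 4	/* illustrative capacity */

struct pool {
	unsigned long long start[POOL_SLOTS];
	size_t used;
};

/* Add one extent; fails when the pool is full. */
static bool pool_add_one(struct pool *p, unsigned long long start)
{
	if (p->used == POOL_SLOTS)
		return false;
	p->start[p->used++] = start;
	return true;
}

/*
 * Add every extent of a group, or none of them. A real driver would
 * hold one lock across this entire function so that no other path can
 * observe a partially added group.
 */
static bool pool_add_group(struct pool *p, const unsigned long long *starts,
			   size_t n)
{
	size_t mark = p->used;	/* state to restore on failure */

	for (size_t i = 0; i < n; i++) {
		if (!pool_add_one(p, starts[i])) {
			p->used = mark;	/* roll back partial additions */
			return false;
		}
	}
	return true;
}
```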

Robustness (device-supplied data is treated as untrusted):
 - Added checks on device-supplied payloads (sizing, overflow/underflow,
   etc.).
 - Added native_cxl checks where they were missing, to avoid overriding
   BIOS-owned events.
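
To show the kind of overflow-safe check meant here, the sketch below
validates that a device-supplied extent lies inside a partition
without ever computing start + len, which could wrap. The function
name and parameters are hypothetical and chosen for the example; the
actual checks in the series may differ.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true when [start, start + len) lies entirely
 * within [part_base, part_base + part_size). All comparisons are done
 * on differences that cannot wrap, so untrusted values are safe.
 */
static bool extent_in_partition(uint64_t start, uint64_t len,
				uint64_t part_base, uint64_t part_size)
{
	uint64_t off;

	if (len == 0)
		return false;		/* empty extents are rejected */
	if (start < part_base)
		return false;		/* would underflow below */

	off = start - part_base;	/* offset inside the partition */
	if (off > part_size)
		return false;

	return len <= part_size - off;	/* no wrap: off <= part_size */
}
```

A naive "start + len <= end" test would accept start = UINT64_MAX - 1
with len = 16, because the sum wraps to a small value; the form above
rejects it.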

Documentation:
 - Small changes to reflect the dynamic_ram_a to dynamic_ram_1 rename and
   the sequence number change (0..n-1 instead of 1..n).
 - Bumped kver to 7.3 and updated the date in the sysfs attribute
   documentation.

Signoffs/Tags:
 - Updated Ira's signoffs and authored-by to use iweiny@kernel.org
 - Updated Jonathan Cameron's email to jic23@kernel.org for various
   review tags
 - Updated Fan's email to nifan.cxl@gmail.com
 - Updated Dan's email to djbw@kernel.org

Background
==========
A Dynamic Capacity Device (DCD) (CXL 3.1 sec 9.13.3) is a CXL memory
device that allows memory capacity within a region to change
dynamically without the need for resetting the device, reconfiguring
HDM decoders, or reconfiguring software DAX regions.

One of the biggest anticipated use cases for Dynamic Capacity is to
dynamically add memory to, or remove it from, a host within a data
center without physically changing the per-host attached memory or
rebooting the host.

The general flow for the addition or removal of memory is to have an
orchestrator coordinate the use of the memory.  Generally there are
five actors in such a system: the Orchestrator, the Fabric Manager
(FM), the Logical Device, the Host Kernel, and a Host User.

An example workflow is shown below.

Orchestrator      FM         Device       Host Kernel    Host User
    |             |           |            |               |
    |-------------- Create region ------------------------>|
    |             |           |            |               |
    |             |           |            |<-- Create ----|
    |             |           |            |    Region     |
    |             |           |            |(dynamic_ram_1)|
    |<------------- Signal done ---------------------------|
    |             |           |            |               |
    |-- Add ----->|-- Add --->|--- Add --->|               |
    |  Capacity   |  Extent   |   Extent   |               |
    |             |           |            |               |
    |             |<- Accept -|<- Accept  -|               |
    |             |   Extent  |   Extent   |               |
    |             |           |            |<- Create -----|
    |             |           |            |   DAX dev     |-- Use memory
    |             |           |            |               |   |
    |             |           |            |               |   |
    |             |           |            |<- Release ----| <-+
    |             |           |            |   DAX dev     |
    |             |           |            |               |
    |<------------- Signal done ---------------------------|
    |             |           |            |               |
    |-- Remove -->|- Release->|- Release ->|               |
    |  Capacity   |  Extent   |   Extent   |               |
    |             |           |            |               |
    |             |<- Release-|<- Release -|               |
    |             |   Extent  |   Extent   |               |
    |             |           |            |               |
    |-- Add ----->|-- Add --->|--- Add --->|               |
    |  Capacity   |  Extent   |   Extent   |               |
    |             |           |            |               |
    |             |<- Accept -|<- Accept  -|               |
    |             |   Extent  |   Extent   |               |
    |             |           |            |<- Create -----|
    |             |           |            |   DAX dev     |-- Use memory
    |             |           |            |               |   |
    |             |           |            |<- Release ----| <-+
    |             |           |            |   DAX dev     |
    |<------------- Signal done ---------------------------|
    |             |           |            |               |
    |-- Remove -->|- Release->|- Release ->|               |
    |  Capacity   |  Extent   |   Extent   |               |
    |             |           |            |               |
    |             |<- Release-|<- Release -|               |
    |             |   Extent  |   Extent   |               |
    |             |           |            |               |
    |-- Add ----->|-- Add --->|--- Add --->|               |
    |  Capacity   |  Extent   |   Extent   |               |
    |             |           |            |<- Create -----|
    |             |           |            |   DAX dev     |-- Use memory
    |             |           |            |               |   |
    |-- Remove -->|- Release->|- Release ->|               |   |
    |  Capacity   |  Extent   |   Extent   |               |   |
    |             |           |            |               |   |
    |             |           |     (Release Ignored)      |   |
    |             |           |            |               |   |
    |             |           |            |<- Release ----| <-+
    |             |           |            |   DAX dev     |
    |<------------- Signal done ---------------------------|
    |             |           |            |               |
    |             |- Release->|- Release ->|               |
    |             |  Extent   |   Extent   |               |
    |             |           |            |               |
    |             |<- Release-|<- Release -|               |
    |             |   Extent  |   Extent   |               |
    |             |           |            |<- Destroy ----|
    |             |           |            |   Region      |
    |             |           |            |               |


Patch organization
==================
Device enablement and partition configuration:
 - cxl/mbox: Flag support for Dynamic Capacity Devices (DCD)
 - cxl/mem: Read dynamic capacity configuration from the device
 - cxl/cdat: Gather DSMAS data for DCD partitions
 - cxl/core: Enforce partition order/simplify partition calls
 - cxl/mem: Expose dynamic ram 1 partition in sysfs
 - cxl/port: Add 'dynamic_ram_1' to endpoint decoder mode
 - cxl/region: Add DC DAX region support

Event and interrupt plumbing:
 - cxl/events: Split event msgnum configuration from irq setup
 - cxl/pci: Factor out interrupt policy check
 - cxl/mem: Configure dynamic capacity interrupts
 - cxl/core: Return endpoint decoder information from region search
 - cxl/mem: Set up framework for handling DC Events
 - cxl/mem: Add 20 second timeout for stalled DC_ADD_CAPACITY chains

Extent handling - add, release, and validation:
 - cxl/extent: Handle DC Add Capacity events
 - cxl/mem: Drop misaligned DCD extent groups
 - cxl/extent: Validate DC extent partition
 - cxl/mem: Enforce tag-group semantics
 - cxl/extent: Handle DC Release Capacity events
 - cxl/extent: Enforce cross-region tag uniqueness
 - cxl/region/extent: Expose dc_extent information in sysfs

DAX resource surfacing and device model:
 - cxl + dax: Surface dax_resources on DCD Add Capacity events
 - cxl + dax: Release dax_resources on DCD Release Capacity events
 - dax/bus: Factor out dev dax resize logic
 - dax/bus: Add uuid sysfs attribute to dax devices
 - dax/bus: Reject resize on DC dax devices and enforce 0-size creation
 - dax/bus: Tag-aware uuid claim and show on DC dax devices
 - cxl/region: Read existing extents on region creation

Tracing, test infrastructure, and documentation:
 - cxl/mem: Trace Dynamic capacity Event Record
 - tools/testing/cxl: Make event logs dynamic
 - tools/testing/cxl: Add DC Regions to mock mem data
 - Documentation/cxl: Document DCD extent handling and DC-backed DAX regions


Notable
=======
 - A More=1 add chain is bounded by both the 20s timeout and
   CXL_DC_MAX_PENDING_EXTENTS (set to 100). The cap was suggested by
   Sashiko as a defense against a fabric manager that never closes the
   chain.  The value is arbitrary; feedback on it is welcome.

 - Several Sashiko review comments assumed that multiple host threads
   could process a single DCD add event, or mutate one tag group,
   concurrently. I don't think that can happen: DCD events for a memdev
   are delivered and handled serially by that device's event-interrupt
   thread, and a tag group is owned by exactly one memory device. I
   therefore did not act on those comments. Please correct me if this
   assumption is wrong so I can fix them.
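
For readers weighing the cap in the first item, here is a rough model
of how a count limit and a refreshed timeout interact. This is a
sketch under stated assumptions, not the driver code: struct
add_chain, chain_add(), the millisecond clock, and the choice to drop
the whole chain when the 101st extent arrives are all hypothetical.
Only the constant name and the 100 / 20-second values come from this
cover letter.

```c
#include <stdbool.h>
#include <stdint.h>

#define CXL_DC_MAX_PENDING_EXTENTS 100
#define CHAIN_TIMEOUT_MS 20000	/* 20 seconds */

enum chain_result { CHAIN_QUEUED, CHAIN_COMPLETE, CHAIN_DROPPED };

struct add_chain {
	unsigned int pending;	/* extents queued while More=1 */
	uint64_t deadline_ms;	/* refreshed by every queued extent */
};

/* Process one add event carrying a single extent. */
static enum chain_result chain_add(struct add_chain *c, uint64_t now_ms,
				   bool more)
{
	/* A stalled chain is discarded before the new extent is handled. */
	if (c->pending > 0 && now_ms >= c->deadline_ms)
		c->pending = 0;

	/* Without this cap, a steady trickle could refresh the deadline forever. */
	if (c->pending == CXL_DC_MAX_PENDING_EXTENTS) {
		c->pending = 0;
		return CHAIN_DROPPED;
	}

	c->pending++;
	if (!more) {			/* More=0 closes the chain */
		c->pending = 0;
		return CHAIN_COMPLETE;
	}
	c->deadline_ms = now_ms + CHAIN_TIMEOUT_MS;
	return CHAIN_QUEUED;
}
```

In this model a device sending one More=1 extent per second never
trips the timeout, but is cut off once 100 extents are pending.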

Testing
=======
ndctl unit suite: built and run against the QEMU cxl_test mock with the
ndctl 'cxl' suite (branch dcd-2026-06-24). 16 of 17 tests pass,
including cxl-dcd.sh and the cxl-region-replay.sh crash-recovery test
that exercises reading pre-existing extents on region creation;
cxl-features is skipped as unsupported.

QEMU end-to-end: used Ali's QEMU patchset that adds tag support [1],
with the topology below:

TOPO='-object memory-backend-file,id=cxl-mem1,mem-path=/tmp/t3_cxl1.raw,size=12G \
     -object memory-backend-file,id=cxl-lsa1,mem-path=/tmp/t3_lsa1.raw,size=1G \
     -device usb-ehci,id=ehci \
     -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1,hdm_for_passthrough=true \
     -device cxl-rp,port=0,bus=cxl.1,id=cxl_rp_port0,chassis=0,slot=2 \
     -device cxl-type3,bus=cxl_rp_port0,id=cxl-dcd0,dc-regions-total-size=12G,num-dc-regions=1,sn=99 \
     -device usb-cxl-mctp,bus=ehci.0,id=usb1,target=cxl-dcd0 \
     -machine cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=12G,cxl-fmw.0.interleave-granularity=1k'

The exact instructions are the same as in the previous version, so I've
abbreviated some details.

  1. Boot the guest.
  2. QMP object-add a tagged 8G memory-backend-ram
     (tag 5be13bce-ae34-4a77-b6c3-16df975fcf1a).
  3. cxl create-region -m -d decoder0.0 -w 1 -s 8G mem0 -t dynamic_ram_1
  4. QMP cxl-add-dynamic-capacity (prescriptive, region 0, same tag)
     injecting an 8G extent at offset 0.
  5. The extent surfaces under the region: dax_region0/extent0.0 reports
     offset 0x0, length 0x200000000, uuid 5be13bce-...
  6. daxctl create-device -r region0 --uuid 5be13bce-... creates the 8G
     devdax device.
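
As a quick sanity check on step 5, the reported length 0x200000000 is
simply 8 GiB expressed in bytes. The helper below is illustrative
only; its name is made up for this example.

```c
#include <stdint.h>

/* Convert a size in GiB to bytes (1 GiB = 2^30 bytes). */
static uint64_t gib_to_bytes(uint64_t gib)
{
	return gib << 30;
}
```

gib_to_bytes(8) is 2^33, i.e. 0x200000000, matching the extent length
shown under dax_region0/extent0.0.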

We are also working with some internal teams to test on real hardware, so
I'll report any findings as we go.

References:
[1] https://lore.kernel.org/linux-cxl/20260325184259.366-1-alireza.sanaee@huawei.com/T/#t

This series applies on the v7.1 tag (Linus' tree).

base-commit: 8cd9520d35a6c38db6567e97dd93b1f11f185dc6

Anisa Su (6):
  cxl/mem: Add 20 second timeout for stalled DC_ADD_CAPACITY chains
  cxl/mem: Enforce tag-group semantics
  cxl/extent: Enforce cross-region tag uniqueness
  dax/bus: Add uuid sysfs attribute to dax devices
  dax/bus: Tag-aware uuid claim and show on DC dax devices
  Documentation/cxl: Document DCD extent handling and DC-backed DAX
    regions

Ira Weiny (25):
  cxl/mbox: Flag support for Dynamic Capacity Devices (DCD)
  cxl/mem: Read dynamic capacity configuration from the device
  cxl/cdat: Gather DSMAS data for DCD partitions
  cxl/core: Enforce partition order/simplify partition calls
  cxl/mem: Expose dynamic ram 1 partition in sysfs
  cxl/port: Add 'dynamic_ram_1' to endpoint decoder mode
  cxl/region: Add DC DAX region support
  cxl/events: Split event msgnum configuration from irq setup
  cxl/pci: Factor out interrupt policy check
  cxl/mem: Configure dynamic capacity interrupts
  cxl/core: Return endpoint decoder information from region search
  cxl/mem: Set up framework for handling DC Events
  cxl/extent: Handle DC Add Capacity events
  cxl/mem: Drop misaligned DCD extent groups
  cxl/extent: Validate DC extent partition
  cxl/extent: Handle DC Release Capacity events
  cxl/region/extent: Expose dc_extent information in sysfs
  cxl + dax: Surface dax_resources on DCD Add Capacity events
  cxl + dax: Release dax_resources on DCD Release Capacity events
  dax/bus: Factor out dev dax resize logic
  dax/bus: Reject resize on DC dax devices and enforce 0-size creation
  cxl/region: Read existing extents on region creation
  cxl/mem: Trace Dynamic capacity Event Record
  tools/testing/cxl: Make event logs dynamic
  tools/testing/cxl: Add DC Regions to mock mem data

 Documentation/ABI/testing/sysfs-bus-cxl       |  100 +-
 Documentation/ABI/testing/sysfs-bus-dax       |   18 +
 .../driver-api/cxl/linux/cxl-driver.rst       |  149 +++
 .../driver-api/cxl/linux/dax-driver.rst       |  169 +++
 drivers/cxl/core/Makefile                     |    2 +-
 drivers/cxl/core/cdat.c                       |   12 +
 drivers/cxl/core/core.h                       |   67 +-
 drivers/cxl/core/extent.c                     |  783 ++++++++++++
 drivers/cxl/core/hdm.c                        |   14 +-
 drivers/cxl/core/mbox.c                       | 1107 +++++++++++++++-
 drivers/cxl/core/memdev.c                     |   87 +-
 drivers/cxl/core/port.c                       |    9 +
 drivers/cxl/core/region.c                     |   53 +-
 drivers/cxl/core/region_dax.c                 |   49 +-
 drivers/cxl/core/trace.h                      |   75 ++
 drivers/cxl/cxl.h                             |  114 +-
 drivers/cxl/cxlmem.h                          |  162 ++-
 drivers/cxl/mem.c                             |    2 +-
 drivers/cxl/pci.c                             |  136 +-
 drivers/dax/bus.c                             |  653 +++++++++-
 drivers/dax/bus.h                             |    4 +-
 drivers/dax/cxl.c                             |  115 +-
 drivers/dax/dax-private.h                     |   63 +
 drivers/dax/hmem/hmem.c                       |    2 +-
 drivers/dax/pmem.c                            |    2 +-
 include/cxl/cxl.h                             |    7 +-
 include/cxl/event.h                           |   38 +
 tools/testing/cxl/Kbuild                      |    5 +-
 tools/testing/cxl/test/cxl.c                  |   12 +
 tools/testing/cxl/test/mem.c                  | 1109 +++++++++++++++--
 tools/testing/cxl/test/mock.h                 |    9 +
 31 files changed, 4858 insertions(+), 269 deletions(-)
 create mode 100644 drivers/cxl/core/extent.c

-- 
2.43.0

