* [RFC PATCH 00/20] DCD: Remove support for sparse regions & add tags
@ 2026-04-11  1:22 Anisa Su
From: Anisa Su @ 2026-04-11  1:22 UTC
  To: linux-cxl
  Cc: john, dave.jiang, gourry, dave, jonathan.cameron,
	alison.schofield, ira.weiny, dan.j.williams, Anisa Su

This RFC modifies Ira's DCD patchset to enforce tags and remove sparse DAX region
support. These are the requirements I've understood from the community meetings,
with the end goal of FAMFS as the end user for DCD. Feedback on whether this is on
the right track, or totally off the mark, would be greatly appreciated.

Everything is the same as before except:
- extents must have tags (uuids)
- 1 tag per region
- regions must be contiguous (no more sparse regions)

To achieve this, the main change is to the relationship between
cxl_dax_region and region_extent, which goes from 1 : many to 1 : 1. Each
region_extent consists of one or more contiguous device extents with the same
tag. Contiguity is enforced by sorting the device extents in DPA order. They are
then re-sorted into the original order in which the device sent them before the
response is built, as required by the spec.
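A minimal user-space sketch of this contiguity check follows. It assumes a
hypothetical struct dev_extent that records each extent's DPA, length, tag and
arrival sequence number (none of these names come from the driver): sort by DPA,
verify that every extent carries the same tag and begins exactly where the
previous one ends, then sort back into arrival order for the response.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a device extent; not the driver's struct. */
struct dev_extent {
	uint64_t dpa;          /* device physical address of the extent start */
	uint64_t len;          /* extent length in bytes */
	unsigned char tag[16]; /* tag UUID as raw bytes */
	unsigned int seq;      /* order in which the device sent the extent */
};

static int cmp_dpa(const void *a, const void *b)
{
	const struct dev_extent *x = a, *y = b;

	return (x->dpa > y->dpa) - (x->dpa < y->dpa);
}

static int cmp_seq(const void *a, const void *b)
{
	const struct dev_extent *x = a, *y = b;

	return (x->seq > y->seq) - (x->seq < y->seq);
}

/*
 * Returns 0 if all extents share one tag and form a single contiguous
 * DPA range, -1 otherwise. On return the array is back in arrival order.
 */
static int check_region_extents(struct dev_extent *ext, size_t n)
{
	int rc = 0;

	if (n == 0)
		return -1;

	/* Sort by DPA so neighbours in the array are neighbours in DPA space. */
	qsort(ext, n, sizeof(*ext), cmp_dpa);
	for (size_t i = 1; i < n; i++) {
		if (memcmp(ext[i].tag, ext[0].tag, sizeof(ext[0].tag)) ||
		    ext[i - 1].dpa + ext[i - 1].len != ext[i].dpa) {
			rc = -1;
			break;
		}
	}

	/* Restore the order the device used, for the Add DC Response. */
	qsort(ext, n, sizeof(*ext), cmp_seq);
	return rc;
}
```

Sorting a copy would avoid the second qsort(), but re-sorting in place keeps the
sketch close to the description above.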

Once the valid extents have been collected, they are passed to the DAX layer as
one contiguous capacity via cxl_dax_region notify(). Once notified, the region
cannot be added to again unless all of its extents are released.

For release: upon receiving a release event record, if the extent is within the
bounds of a cxl_region and carries that region's tag, all extents in the region
are released, so the "More" flag is still ignored. I'm not sure this is the right
way to do it, but it was the simplest.
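The add-once and release rules above can be modeled roughly as follows. This is
a hypothetical sketch, not the driver's code: struct dc_region, region_add() and
region_release() are illustrative names, only a single region is modeled, and
the tag is assumed to be recorded when capacity is first added.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one DC region; names are illustrative only. */
struct dc_region {
	uint64_t start, len;   /* DPA bounds of the region */
	unsigned char tag[16]; /* the single tag the region accepts */
	bool populated;        /* capacity already handed to the DAX layer */
};

/* Accept capacity only if nothing has been added since the last release. */
static int region_add(struct dc_region *r, const unsigned char tag[16])
{
	if (r->populated)
		return -1;
	memcpy(r->tag, tag, 16);
	r->populated = true;
	return 0;
}

/*
 * A release record for [dpa, dpa + len) that lies inside the region and
 * carries the region's tag releases every extent in the region; the
 * "More" flag is not consulted.
 */
static bool region_release(struct dc_region *r, uint64_t dpa, uint64_t len,
			   const unsigned char tag[16])
{
	if (!r->populated)
		return false;
	if (dpa < r->start || dpa + len > r->start + r->len)
		return false;
	if (memcmp(r->tag, tag, 16))
		return false;
	r->populated = false;	/* whole region released at once */
	return true;
}
```

Releasing everything on the first matching record is what makes the "More" flag
irrelevant here: there is never a partially released region to track.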

The DAX layer changes remain untouched, as all of this extra validation is done
in the CXL layer. And since FAMFS already takes care of the devdax -> fsdev
conversion, there was no need to add anything there.

Most of the series remains unchanged, as I've tried not to make too many big
changes right off the bat. Only the following commits were modified:
- cxl/extent: Process dynamic partition events and realize region extents
- dax/region: Create resources on DAX regions
- cxl/region: Read existing extents on region creation

I've tacked on one commit at the end that changes the driver type of DC regions
from DAXDRV_KMEM_TYPE to DAXDRV_DEVICE_TYPE so that they can be bound to the new
fsdev driver.

Also, the commit messages of the modified commits describe in more detail
exactly what was changed, so I hope that's clear.

================================================================================
Git History

This series is based on cxl-next, with base commit:
3939dba00f98 Merge branch 'for-7.1/cxl-misc' into cxl-for-next
+ bug fix: https://lore.kernel.org/linux-cxl/20260411011137.43545-1-anisa.su@samsung.com/T/#u

GH Branch: https://github.com/anisa-su993/anisa-linux-kernel/tree/dcd-rfc-04-10-26

It doesn't apply cleanly onto famfs-v9, but a version applied onto famfs-v9 is
available here: https://github.com/anisa-su993/anisa-linux-kernel/tree/famfs-v9-dcd
- famfs-v9 for reference: https://github.com/jagalactic/linux/tree/famfs-v9

I've tested the current series without famfs as well as the series applied on
famfs-v9 with famfs.
================================================================================
Testing:

This patchset was tested with Ali's QEMU patchset adding tag support:
https://lore.kernel.org/linux-cxl/20260325184259.366-1-alireza.sanaee@huawei.com/T/#t

Details:
Topology: '-object memory-backend-file,id=cxl-mem1,mem-path=/tmp/t3_cxl1.raw,size=12G \
     -object memory-backend-file,id=cxl-lsa1,mem-path=/tmp/t3_lsa1.raw,size=1G \
     -device usb-ehci,id=ehci \
     -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1,hdm_for_passthrough=true \
     -device cxl-rp,port=0,bus=cxl.1,id=cxl_rp_port0,chassis=0,slot=2 \
     -device cxl-type3,bus=cxl_rp_port0,id=cxl-dcd0,dc-regions-total-size=12G,num-dc-regions=1,sn=99 \
     -device usb-cxl-mctp,bus=ehci.0,id=usb1,target=cxl-dcd0\
     -machine cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=12G,cxl-fmw.0.interleave-granularity=1k'

1. Start VM (12GB)
2. Issue QMP to add tagged backend (8GB):
{ "execute": "qmp_capabilities" }
{
    "execute": "object-add",
    "arguments": {
        "qom-type": "memory-backend-ram",
        "id": "tm0",
        "size": 8589934592,
        "share": true,
        "tag": "5be13bce-ae34-4a77-b6c3-16df975fcf1a"
    }
}
3. Create region on the VM: cxl create-region -m -d decoder0.0 -w 1 -s 8G mem0 -t dynamic_ram_a
4. Issue QMP to add an 8GB extent:
{ "execute": "qmp_capabilities" }
{
    "execute": "cxl-add-dynamic-capacity",
    "arguments": {
        "path": "/machine/peripheral/cxl-dcd0",
        "host-id": 0,
        "selection-policy": "prescriptive",
        "region": 0,
        "tag": "5be13bce-ae34-4a77-b6c3-16df975fcf1a",
        "extents": [
            {
                "offset": 0,
                "len": 8589934592
            }
        ]
    }
}
5. Verify with sysfs:
root@bgt-140510-bm03:~# cat /sys/bus/cxl/devices/dax_region0/extent0.0/offset
0x0
root@bgt-140510-bm03:~# cat /sys/bus/cxl/devices/dax_region0/extent0.0/length
0x200000000
root@bgt-140510-bm03:~# cat /sys/bus/cxl/devices/dax_region0/extent0.0/uuid
5be13bce-ae34-4a77-b6c3-16df975fcf1a

6. daxctl create-device -r region0
[
  {
    "chardev":"dax0.1",
    "size":8589934592,
    "target_node":1,
    "align":2097152,
    "mode":"devdax"
  }
]
created 1 device

Currently, QEMU only supports sending one extent per add/release request, which
limits what I can test. However, I was able to verify that once extents have
been added to a region, the region can't be added to again (its size cannot be
increased).

Everything up to this point is what I tested with this patchset. Below are the
additional famfs tests for the version applied on famfs-v9.
================================================================================
7. Install famfs userspace tool:
https://github.com/cxl-micron-reskit/famfs

8. mkfs.famfs --v /dev/dax0.1 output:
devsize: 8589934592
Famfs Superblock:
  Filesystem UUID:   c33b4525-a2c7-4d64-9204-e8ed273b4ffb
  Device UUID:       ae887e6b-f886-45f9-bc55-f0696f3cd91d
  System UUID:       da314140-12e7-45c2-98b2-753d3bfe4f46
  role of this node: Owner
  alloc_unit:        0x200000
  OMF major version: 2
  OMF minor version: 1
  sizeof superblock: 200
  log size (bytes):  8388608
  primary: /dev/dax0.1   8589934592

Log stats:
  # of log entries in use: 0 of 15420
  Log size in use:          48
  Log size (total bytes)    8388608
  No allocation errors found

Capacity:
  Device capacity:        8.00G
  Bitmap capacity:        8.00G
  Sum of file sizes:      0.00G
  Allocated space:        0.01G
  Free space:             7.99G
  Space amplification:     inf
  Percent used:            0.1%

Famfs log:
  0 of 15420 entries used
  0 bad log entries detected
  0 files
  0 directories

9. The famfs smoke tests also succeed. They include some fio tests, which run
simulated workloads:

:== Test Timing Summary
:==-------------------------------------------------------------------
:==  prepare              0:10
:==  test0                0:07
:==  test_shadow_yaml     0:04
:==  test1                0:22
:==  test2                0:12
:==  test3                0:03
:==  test4                0:10
:==  test_errors          0:01
:==  stripe_test          0:58
:==  test_pcq             1:26
:==  test_fio             0:34
:==-------------------------------------------------------------------
:==  TOTAL                4:27
:==-------------------------------------------------------------------
:==run_smoke completed successfully (Thu Apr  9 10:33:28 PM UTC 2026)

Anisa Su (1):
  dax/bus.c: make DC regions driver type DAXDRV_DEVICE_TYPE

Ira Weiny (19):
  cxl/mbox: Flag support for Dynamic Capacity Devices (DCD)
  cxl/mem: Read dynamic capacity configuration from the device
  cxl/cdat: Gather DSMAS data for DCD partitions
  cxl/core: Enforce partition order/simplify partition calls
  cxl/mem: Expose dynamic ram A partition in sysfs
  cxl/port: Add 'dynamic_ram_a' to endpoint decoder mode
  cxl/region: Add sparse DAX region support
  cxl/events: Split event msgnum configuration from irq setup
  cxl/pci: Factor out interrupt policy check
  cxl/mem: Configure dynamic capacity interrupts
  cxl/core: Return endpoint decoder information from region search
  cxl/extent: Process dynamic partition events and realize region
    extents
  cxl/region/extent: Expose region extent information in sysfs
  dax/bus: Factor out dev dax resize logic
  dax/region: Create resources on DAX regions
  cxl/region: Read existing extents on region creation
  cxl/mem: Trace Dynamic capacity Event Record
  tools/testing/cxl: Make event logs dynamic
  tools/testing/cxl: Add DC Regions to mock mem data

 Documentation/ABI/testing/sysfs-bus-cxl |  100 ++-
 drivers/cxl/core/Makefile               |    2 +-
 drivers/cxl/core/cdat.c                 |   11 +
 drivers/cxl/core/core.h                 |   47 +-
 drivers/cxl/core/extent.c               |  471 +++++++++++
 drivers/cxl/core/hdm.c                  |   13 +-
 drivers/cxl/core/mbox.c                 |  770 ++++++++++++++++-
 drivers/cxl/core/memdev.c               |   87 +-
 drivers/cxl/core/port.c                 |    5 +
 drivers/cxl/core/region.c               |   43 +-
 drivers/cxl/core/region_dax.c           |    6 +
 drivers/cxl/core/trace.h                |   65 ++
 drivers/cxl/cxl.h                       |   60 +-
 drivers/cxl/cxlmem.h                    |  124 ++-
 drivers/cxl/mem.c                       |    2 +-
 drivers/cxl/pci.c                       |  115 ++-
 drivers/dax/bus.c                       |  360 ++++++--
 drivers/dax/bus.h                       |    4 +-
 drivers/dax/cxl.c                       |   71 +-
 drivers/dax/dax-private.h               |   40 +
 drivers/dax/hmem/hmem.c                 |    2 +-
 drivers/dax/pmem.c                      |    2 +-
 include/cxl/cxl.h                       |    6 +
 include/cxl/event.h                     |   39 +
 include/linux/ioport.h                  |    3 +
 tools/testing/cxl/Kbuild                |    5 +-
 tools/testing/cxl/test/mem.c            | 1018 ++++++++++++++++++++---
 27 files changed, 3210 insertions(+), 261 deletions(-)
 create mode 100644 drivers/cxl/core/extent.c

-- 
2.43.0



* [PATCH 01/20] cxl/mbox: Flag support for Dynamic Capacity Devices (DCD)
From: Anisa Su @ 2026-04-11  1:22 UTC
  To: linux-cxl
  Cc: john, dave.jiang, gourry, dave, jonathan.cameron,
	alison.schofield, ira.weiny, dan.j.williams

From: Ira Weiny <ira.weiny@intel.com>

Per the CXL 3.1 specification, software must check the Command Effects
Log (CEL) for dynamic capacity command support.

Detect support for the DCD commands while reading the CEL, including:

	Get DC Config
	Get DC Extent List
	Add DC Response
	Release DC
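As an illustration of the detection logic, here is a user-space sketch that
walks a list of CEL opcodes and reports DCD support only if all four commands
appear. The helper names and the plain unsigned mask are assumptions standing in
for the kernel bitmap helpers; note that the mask must start out cleared before
any bits are set.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Opcode values as listed in this patch's enum cxl_opcode. */
enum {
	OP_GET_DC_CONFIG      = 0x4800,
	OP_GET_DC_EXTENT_LIST = 0x4801,
	OP_ADD_DC_RESPONSE    = 0x4802,
	OP_RELEASE_DC         = 0x4803,
};

/* DCD commands share the 0x48 command-set byte. */
static bool is_dcd_command(uint16_t opcode)
{
	return (opcode >> 8) == 0x48;
}

/*
 * Walk CEL opcodes and report whether all four DCD commands were seen.
 * A plain unsigned mask stands in for the kernel bitmap.
 */
static bool dcd_supported(const uint16_t *cel, size_t n)
{
	unsigned int seen = 0;	/* must start cleared */

	for (size_t i = 0; i < n; i++) {
		if (!is_dcd_command(cel[i]))
			continue;
		switch (cel[i]) {
		case OP_GET_DC_CONFIG:
		case OP_GET_DC_EXTENT_LIST:
		case OP_ADD_DC_RESPONSE:
		case OP_RELEASE_DC:
			/* low byte 0..3 selects the bit for this command */
			seen |= 1u << (cel[i] & 0xff);
			break;
		default:
			break;	/* other 0x48xx opcodes are ignored */
		}
	}
	return seen == 0xf;
}
```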

Based on an original patch by Navneet Singh.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes:
[anisa: rebase]
---
 drivers/cxl/core/mbox.c | 43 +++++++++++++++++++++++++++++++++++++++++
 drivers/cxl/cxlmem.h    | 15 ++++++++++++++
 2 files changed, 58 insertions(+)

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index aaa5c6277ebf..7ef5708bf210 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -165,6 +165,42 @@ static void cxl_set_security_cmd_enabled(struct cxl_security_state *security,
 	}
 }
 
+static bool cxl_is_dcd_command(u16 opcode)
+{
+#define CXL_MBOX_OP_DCD_CMDS 0x48
+
+	return (opcode >> 8) == CXL_MBOX_OP_DCD_CMDS;
+}
+
+static void cxl_set_dcd_cmd_enabled(struct cxl_memdev_state *mds, u16 opcode,
+				    unsigned long *cmd_mask)
+{
+	switch (opcode) {
+	case CXL_MBOX_OP_GET_DC_CONFIG:
+		set_bit(CXL_DCD_ENABLED_GET_CONFIG, cmd_mask);
+		break;
+	case CXL_MBOX_OP_GET_DC_EXTENT_LIST:
+		set_bit(CXL_DCD_ENABLED_GET_EXTENT_LIST, cmd_mask);
+		break;
+	case CXL_MBOX_OP_ADD_DC_RESPONSE:
+		set_bit(CXL_DCD_ENABLED_ADD_RESPONSE, cmd_mask);
+		break;
+	case CXL_MBOX_OP_RELEASE_DC:
+		set_bit(CXL_DCD_ENABLED_RELEASE, cmd_mask);
+		break;
+	default:
+		break;
+	}
+}
+
+static bool cxl_verify_dcd_cmds(struct cxl_memdev_state *mds, unsigned long *cmds_seen)
+{
+	DECLARE_BITMAP(all_cmds, CXL_DCD_ENABLED_MAX);
+
+	bitmap_fill(all_cmds, CXL_DCD_ENABLED_MAX);
+	return bitmap_equal(cmds_seen, all_cmds, CXL_DCD_ENABLED_MAX);
+}
+
 static bool cxl_is_poison_command(u16 opcode)
 {
 #define CXL_MBOX_OP_POISON_CMDS 0x43
@@ -757,6 +793,7 @@ static void cxl_walk_cel(struct cxl_memdev_state *mds, size_t size, u8 *cel)
 	struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
 	struct cxl_cel_entry *cel_entry;
 	const int cel_entries = size / sizeof(*cel_entry);
+	DECLARE_BITMAP(dcd_cmds, CXL_DCD_ENABLED_MAX) = { 0 };
 	struct device *dev = mds->cxlds.dev;
 	int i, ro_cmds = 0, wr_cmds = 0;
 
@@ -785,11 +822,17 @@ static void cxl_walk_cel(struct cxl_memdev_state *mds, size_t size, u8 *cel)
 			enabled++;
 		}
 
+		if (cxl_is_dcd_command(opcode)) {
+			cxl_set_dcd_cmd_enabled(mds, opcode, dcd_cmds);
+			enabled++;
+		}
+
 		dev_dbg(dev, "Opcode 0x%04x %s\n", opcode,
 			enabled ? "enabled" : "unsupported by driver");
 	}
 
 	set_features_cap(cxl_mbox, ro_cmds, wr_cmds);
+	mds->dcd_supported = cxl_verify_dcd_cmds(mds, dcd_cmds);
 }
 
 static struct cxl_mbox_get_supported_logs *cxl_get_gsl(struct cxl_memdev_state *mds)
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 776c50d1db51..53444af448d7 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -230,6 +230,15 @@ struct cxl_event_state {
 	struct mutex log_lock;
 };
 
+/* Device enabled DCD commands */
+enum dcd_cmd_enabled_bits {
+	CXL_DCD_ENABLED_GET_CONFIG,
+	CXL_DCD_ENABLED_GET_EXTENT_LIST,
+	CXL_DCD_ENABLED_ADD_RESPONSE,
+	CXL_DCD_ENABLED_RELEASE,
+	CXL_DCD_ENABLED_MAX
+};
+
 /* Device enabled poison commands */
 enum poison_cmd_enabled_bits {
 	CXL_POISON_ENABLED_LIST,
@@ -405,6 +414,7 @@ static inline struct cxl_dev_state *mbox_to_cxlds(struct cxl_mailbox *cxl_mbox)
  * @partition_align_bytes: alignment size for partition-able capacity
  * @active_volatile_bytes: sum of hard + soft volatile
  * @active_persistent_bytes: sum of hard + soft persistent
+ * @dcd_supported: all DCD commands are supported
  * @event: event log driver state
  * @poison: poison driver state info
  * @security: security driver state info
@@ -424,6 +434,7 @@ struct cxl_memdev_state {
 	u64 partition_align_bytes;
 	u64 active_volatile_bytes;
 	u64 active_persistent_bytes;
+	bool dcd_supported;
 
 	struct cxl_event_state event;
 	struct cxl_poison_state poison;
@@ -485,6 +496,10 @@ enum cxl_opcode {
 	CXL_MBOX_OP_UNLOCK		= 0x4503,
 	CXL_MBOX_OP_FREEZE_SECURITY	= 0x4504,
 	CXL_MBOX_OP_PASSPHRASE_SECURE_ERASE	= 0x4505,
+	CXL_MBOX_OP_GET_DC_CONFIG	= 0x4800,
+	CXL_MBOX_OP_GET_DC_EXTENT_LIST	= 0x4801,
+	CXL_MBOX_OP_ADD_DC_RESPONSE	= 0x4802,
+	CXL_MBOX_OP_RELEASE_DC		= 0x4803,
 	CXL_MBOX_OP_MAX			= 0x10000
 };
 
-- 
2.43.0



* [PATCH 02/20] cxl/mem: Read dynamic capacity configuration from the device
From: Anisa Su @ 2026-04-11  1:22 UTC
  To: linux-cxl
  Cc: john, dave.jiang, gourry, dave, jonathan.cameron,
	alison.schofield, ira.weiny, dan.j.williams

From: Ira Weiny <ira.weiny@intel.com>

Devices which optionally support Dynamic Capacity (DC) are configured
via mailbox commands.  CXL 3.2 section 9.13.3 requires the host to issue
the Get DC Configuration command in order to properly configure DCDs.
Without the Get DC Configuration command, DCD can't be supported.

Implement the DC mailbox commands as specified in CXL 3.2 section
8.2.10.9.9 (opcodes 48XXh) to read and store the DCD configuration
information.  Disable DCD if an invalid configuration is found.

Linux has no support for more than one dynamic capacity partition.  Read
and validate all the partitions but configure only the first partition
as 'dynamic ram A'.  Additional partitions can be added in the future if
such a device ever materializes.  Additionally, it is anticipated that no
skip will be present between the end of the pmem partition and the start
of the dynamic partition.  Check for and disallow this configuration as well.
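The partition checks can be sketched in user space as below. struct dc_part,
dc_part_valid() and dc_no_skip() are hypothetical names; the conditions follow
cxl_dc_check() and the skip check in cxl_configure_dcd() in this patch, except
that the block size is validated first so the modulo operations never divide by
zero.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SZ_256M       (256ULL << 20)
#define DCD_LINE_SIZE 0x40ULL	/* CXL_DCD_BLOCK_LINE_SIZE in the patch */

/* Hypothetical, simplified view of one DC partition. */
struct dc_part {
	uint64_t start;		/* base DPA */
	uint64_t size;		/* decode length in bytes */
	uint64_t len;		/* usable length in bytes */
	uint64_t blk_size;	/* block size in bytes */
};

static bool is_pow2(uint64_t x)
{
	return x && !(x & (x - 1));
}

/* Mirrors the checks in cxl_dc_check(); prev is NULL for the first entry. */
static bool dc_part_valid(const struct dc_part *p, const struct dc_part *prev)
{
	if (prev && prev->start + prev->size > p->start)
		return false;	/* DPA ordering violation */
	if (!is_pow2(p->blk_size) || p->blk_size % DCD_LINE_SIZE)
		return false;	/* bad block size */
	if (p->start % SZ_256M || p->start % p->blk_size)
		return false;	/* misaligned start */
	if (!p->size || !p->len || p->size < p->len || p->len % p->blk_size)
		return false;	/* bad length */
	return true;
}

/* Only the first DC partition is used; it must start where pmem ends. */
static bool dc_no_skip(uint64_t pmem_end, const struct dc_part *first)
{
	return first->start == pmem_end;
}
```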

Linux has no use for the trailing fields of the Get Dynamic Capacity
Configuration Output Payload (Total number of supported extents, number
of available extents, total number of supported tags, and number of
available tags).  Leave those fields undefined so that the partition
entries can be declared as a C flexible array member.
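Because a flexible array member must be the last member of a struct, fields that
follow the variable-length partition array in the payload cannot be declared
after it. The sketch below is a simplified user-space mirror of the output
payload (plain integers instead of __le types, hypothetical struct names). It
also shows that the receive buffer has to be sized from the partition record
type itself: for eight partitions that is 8 + 8 * 40 = 328 bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified mirror of one DC partition record (40 bytes when packed). */
struct dc_partition {
	uint64_t base, decode_length, length, block_size;
	uint32_t dsmad_handle;
	uint8_t flags, rsvd[3];
} __attribute__((packed));

/* Fixed 8-byte header followed by the variable-length record array. */
struct dc_config_out {
	uint8_t avail_partition_count;
	uint8_t partitions_returned;
	uint8_t rsvd[6];
	struct dc_partition partition[];	/* flexible array member */
} __attribute__((packed));

/* Bytes needed to receive up to max_parts partition records. */
static size_t dc_config_out_size(size_t max_parts)
{
	return sizeof(struct dc_config_out) +
	       max_parts * sizeof(struct dc_partition);
}
```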

Based on an original patch by Navneet Singh.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes:
[anisa: rebase]
---
 drivers/cxl/core/hdm.c  |   2 +
 drivers/cxl/core/mbox.c | 182 ++++++++++++++++++++++++++++++++++++++++
 drivers/cxl/cxlmem.h    |  45 ++++++++++
 drivers/cxl/pci.c       |   3 +
 include/cxl/cxl.h       |   2 +
 5 files changed, 234 insertions(+)

diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index 3930e130d6b6..28974adaab75 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -453,6 +453,8 @@ static const char *cxl_mode_name(enum cxl_partition_mode mode)
 		return "ram";
 	case CXL_PARTMODE_PMEM:
 		return "pmem";
+	case CXL_PARTMODE_DYNAMIC_RAM_A:
+		return "dynamic_ram_a";
 	default:
 		return "";
 	};
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 7ef5708bf210..71b29cd6abfe 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -1351,6 +1351,156 @@ int cxl_mem_sanitize(struct cxl_memdev *cxlmd, u16 cmd)
 	return -EBUSY;
 }
 
+static int cxl_dc_check(struct device *dev, struct cxl_dc_partition_info *part_array,
+			u8 index, struct cxl_dc_partition *dev_part)
+{
+	size_t blk_size = le64_to_cpu(dev_part->block_size);
+	size_t len = le64_to_cpu(dev_part->length);
+
+	part_array[index].start = le64_to_cpu(dev_part->base);
+	part_array[index].size = le64_to_cpu(dev_part->decode_length);
+	part_array[index].size *= CXL_CAPACITY_MULTIPLIER;
+
+	/* Check partitions are in increasing DPA order */
+	if (index > 0) {
+		struct cxl_dc_partition_info *prev_part = &part_array[index - 1];
+
+		if ((prev_part->start + prev_part->size) >
+		     part_array[index].start) {
+			dev_err(dev,
+				"DPA ordering violation for DC partition %d and %d\n",
+				index - 1, index);
+			return -EINVAL;
+		}
+	}
+
+	if (!IS_ALIGNED(part_array[index].start, SZ_256M) ||
+	    !IS_ALIGNED(part_array[index].start, blk_size)) {
+		dev_err(dev, "DC partition %d invalid start %zu blk size %zu\n",
+			index, part_array[index].start, blk_size);
+		return -EINVAL;
+	}
+
+	if (part_array[index].size == 0 || len == 0 ||
+	    part_array[index].size < len || !IS_ALIGNED(len, blk_size)) {
+		dev_err(dev, "DC partition %d invalid length; size %zu len %zu blk size %zu\n",
+			index, part_array[index].size, len, blk_size);
+		return -EINVAL;
+	}
+
+	if (blk_size == 0 || blk_size % CXL_DCD_BLOCK_LINE_SIZE ||
+	    !is_power_of_2(blk_size)) {
+		dev_err(dev, "DC partition %d invalid block size; %zu\n",
+			index, blk_size);
+		return -EINVAL;
+	}
+
+	dev_dbg(dev, "DC partition %d start %zu size %zu blk size %zu\n",
+		index, part_array[index].start, part_array[index].size,
+		blk_size);
+
+	return 0;
+}
+
+/* Returns the number of partitions in dc_resp or -ERRNO */
+static int cxl_get_dc_config(struct cxl_mailbox *mbox, u8 start_partition,
+			     struct cxl_mbox_get_dc_config_out *dc_resp,
+			     size_t dc_resp_size)
+{
+	struct cxl_mbox_get_dc_config_in get_dc = (struct cxl_mbox_get_dc_config_in) {
+		.partition_count = CXL_MAX_DC_PARTITIONS,
+		.start_partition_index = start_partition,
+	};
+	struct cxl_mbox_cmd mbox_cmd = (struct cxl_mbox_cmd) {
+		.opcode = CXL_MBOX_OP_GET_DC_CONFIG,
+		.payload_in = &get_dc,
+		.size_in = sizeof(get_dc),
+		.size_out = dc_resp_size,
+		.payload_out = dc_resp,
+		.min_out = 8,
+	};
+	int rc;
+
+	rc = cxl_internal_send_cmd(mbox, &mbox_cmd);
+	if (rc < 0)
+		return rc;
+
+	dev_dbg(mbox->host, "Read %d/%d DC partitions\n",
+		dc_resp->partitions_returned, dc_resp->avail_partition_count);
+	return dc_resp->partitions_returned;
+}
+
+/**
+ * cxl_dev_dc_identify() - Reads the dynamic capacity information from the
+ *                         device.
+ * @mbox: Mailbox to query
+ * @dc_info: The dynamic partition information to return
+ *
+ * Read Dynamic Capacity information from the device and return the partition
+ * information.
+ *
+ * Return: 0 if identify was executed successfully, -ERRNO on error.
+ *         On error, @dc_info is left unchanged.
+ */
+int cxl_dev_dc_identify(struct cxl_mailbox *mbox,
+			struct cxl_dc_partition_info *dc_info)
+{
+	struct cxl_dc_partition_info partitions[CXL_MAX_DC_PARTITIONS];
+	struct device *dev = mbox->host;
+	size_t dc_resp_size =
+		struct_size_t(struct cxl_mbox_get_dc_config_out, partition, CXL_MAX_DC_PARTITIONS);
+	u8 start_partition;
+	u8 num_partitions;
+
+	struct cxl_mbox_get_dc_config_out *dc_resp __free(kfree) =
+					kmalloc(dc_resp_size, GFP_KERNEL);
+	if (!dc_resp)
+		return -ENOMEM;
+
+	/*
+	 * Read and check all partition information for validity and potential
+	 * debugging; see debug output in cxl_dc_check()
+	 */
+	start_partition = 0;
+	num_partitions = 0;
+	do {
+		int rc, i, j;
+
+		rc = cxl_get_dc_config(mbox, start_partition, dc_resp, dc_resp_size);
+		if (rc < 0) {
+			dev_err(dev, "Failed to get DC config: %d\n", rc);
+			return rc;
+		}
+
+		num_partitions += rc;
+
+		if (num_partitions < 1 || num_partitions > CXL_MAX_DC_PARTITIONS) {
+			dev_err(dev, "Invalid num of dynamic capacity partitions %d\n",
+				num_partitions);
+			return -EINVAL;
+		}
+
+		for (i = start_partition, j = 0; i < num_partitions; i++, j++) {
+			rc = cxl_dc_check(dev, partitions, i,
+					  &dc_resp->partition[j]);
+			if (rc)
+				return rc;
+		}
+
+		start_partition = num_partitions;
+
+	} while (num_partitions < dc_resp->avail_partition_count);
+
+	/* Return 1st partition */
+	dc_info->start = partitions[0].start;
+	dc_info->size = partitions[0].size;
+	dev_dbg(dev, "Returning partition 0 %zu size %zu\n",
+		dc_info->start, dc_info->size);
+
+	return 0;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_dev_dc_identify, "CXL");
+
 static void add_part(struct cxl_dpa_info *info, u64 start, u64 size, enum cxl_partition_mode mode)
 {
 	int i = info->nr_partitions;
@@ -1421,6 +1571,38 @@ int cxl_get_dirty_count(struct cxl_memdev_state *mds, u32 *count)
 }
 EXPORT_SYMBOL_NS_GPL(cxl_get_dirty_count, "CXL");
 
+void cxl_configure_dcd(struct cxl_memdev_state *mds, struct cxl_dpa_info *info)
+{
+	struct cxl_dc_partition_info dc_info = { 0 };
+	struct device *dev = mds->cxlds.dev;
+	size_t skip;
+	int rc;
+
+	rc = cxl_dev_dc_identify(&mds->cxlds.cxl_mbox, &dc_info);
+	if (rc) {
+		dev_warn(dev,
+			 "Failed to read Dynamic Capacity config: %d\n", rc);
+		cxl_disable_dcd(mds);
+		return;
+	}
+
+	/* Skips between pmem and the dynamic partition are not supported */
+	skip = dc_info.start - info->size;
+	if (skip) {
+		dev_warn(dev,
+			 "Dynamic Capacity skip from pmem not supported: %zu\n",
+			 skip);
+		cxl_disable_dcd(mds);
+		return;
+	}
+
+	info->size += dc_info.size;
+	dev_dbg(dev, "Adding dynamic ram partition A; %zu size %zu\n",
+		dc_info.start, dc_info.size);
+	add_part(info, dc_info.start, dc_info.size, CXL_PARTMODE_DYNAMIC_RAM_A);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_configure_dcd, "CXL");
+
 int cxl_arm_dirty_shutdown(struct cxl_memdev_state *mds)
 {
 	struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 53444af448d7..2fef6bb373b9 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -664,6 +664,31 @@ struct cxl_mbox_set_shutdown_state_in {
 	u8 state;
 } __packed;
 
+/* See CXL 3.2 Table 8-178 get dynamic capacity config Input Payload */
+struct cxl_mbox_get_dc_config_in {
+	u8 partition_count;
+	u8 start_partition_index;
+} __packed;
+
+/* See CXL 3.2 Table 8-179 get dynamic capacity config Output Payload */
+struct cxl_mbox_get_dc_config_out {
+	u8 avail_partition_count;
+	u8 partitions_returned;
+	u8 rsvd[6];
+	/* See CXL 3.2 Table 8-180 */
+	struct cxl_dc_partition {
+		__le64 base;
+		__le64 decode_length;
+		__le64 length;
+		__le64 block_size;
+		__le32 dsmad_handle;
+		u8 flags;
+		u8 rsvd[3];
+	} __packed partition[] __counted_by(partitions_returned);
+	/* Trailing fields unused */
+} __packed;
+#define CXL_DCD_BLOCK_LINE_SIZE 0x40
+
 /* Set Timestamp CXL 3.0 Spec 8.2.9.4.2 */
 struct cxl_mbox_set_timestamp_in {
 	__le64 timestamp;
@@ -787,11 +812,20 @@ enum {
 int cxl_internal_send_cmd(struct cxl_mailbox *cxl_mbox,
 			  struct cxl_mbox_cmd *cmd);
 int cxl_dev_state_identify(struct cxl_memdev_state *mds);
+
+struct cxl_dc_partition_info {
+	size_t start;
+	size_t size;
+};
+
+int cxl_dev_dc_identify(struct cxl_mailbox *mbox,
+			struct cxl_dc_partition_info *dc_info);
 int cxl_await_media_ready(struct cxl_dev_state *cxlds);
 int cxl_enumerate_cmds(struct cxl_memdev_state *mds);
 int cxl_mem_dpa_fetch(struct cxl_memdev_state *mds, struct cxl_dpa_info *info);
 struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev, u64 serial,
 						 u16 dvsec);
+void cxl_configure_dcd(struct cxl_memdev_state *mds, struct cxl_dpa_info *info);
 void set_exclusive_cxl_commands(struct cxl_memdev_state *mds,
 				unsigned long *cmds);
 void clear_exclusive_cxl_commands(struct cxl_memdev_state *mds,
@@ -803,6 +837,17 @@ void cxl_event_trace_record(struct cxl_memdev *cxlmd,
 			    const uuid_t *uuid, union cxl_event *evt);
 int cxl_get_dirty_count(struct cxl_memdev_state *mds, u32 *count);
 int cxl_arm_dirty_shutdown(struct cxl_memdev_state *mds);
+
+static inline bool cxl_dcd_supported(struct cxl_memdev_state *mds)
+{
+	return mds->dcd_supported;
+}
+
+static inline void cxl_disable_dcd(struct cxl_memdev_state *mds)
+{
+	mds->dcd_supported = false;
+}
+
 int cxl_set_timestamp(struct cxl_memdev_state *mds);
 int cxl_poison_state_init(struct cxl_memdev_state *mds);
 int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index bace662dc988..60f9fa05d9ef 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -870,6 +870,9 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	if (rc)
 		return rc;
 
+	if (cxl_dcd_supported(mds))
+		cxl_configure_dcd(mds, &range_info);
+
 	rc = cxl_dpa_setup(cxlds, &range_info);
 	if (rc)
 		return rc;
diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
index fa7269154620..0d3f144016f2 100644
--- a/include/cxl/cxl.h
+++ b/include/cxl/cxl.h
@@ -133,6 +133,7 @@ struct cxl_dpa_perf {
 enum cxl_partition_mode {
 	CXL_PARTMODE_RAM,
 	CXL_PARTMODE_PMEM,
+	CXL_PARTMODE_DYNAMIC_RAM_A
 };
 
 /**
@@ -148,6 +149,7 @@ struct cxl_dpa_partition {
 };
 
 #define CXL_NR_PARTITIONS_MAX 2
+#define CXL_MAX_DC_PARTITIONS 8
 
 /**
  * struct cxl_dev_state - The driver device state
-- 
2.43.0



* [PATCH 03/20] cxl/cdat: Gather DSMAS data for DCD partitions
From: Anisa Su @ 2026-04-11  1:22 UTC
  To: linux-cxl
  Cc: john, dave.jiang, gourry, dave, jonathan.cameron,
	alison.schofield, ira.weiny, dan.j.williams

From: Ira Weiny <ira.weiny@intel.com>

Additional DCD partition (AKA region) information is contained in the
DSMAS CDAT tables, including performance, read-only, and shareable
attributes.

Match DCD partitions with DSMAS tables and store the metadata.
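A rough user-space sketch of the matching step, using hypothetical struct
names: each DSMAS range is attached to the partition whose DPA range contains
it, and for a dynamic partition the DSMAD handle from Get DC Configuration is
compared against the DSMAS handle. As in the patch, a mismatch is only
reported; the sketch assumes the first containing partition wins.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Inclusive DPA range, in the spirit of the kernel's struct range. */
struct dpa_range {
	uint64_t start, end;
};

struct dsmas {
	struct dpa_range range;
	uint8_t handle;
	bool shareable;
};

struct partition {
	struct dpa_range range;
	bool dynamic;		/* dynamic RAM A partition */
	uint8_t handle;		/* DSMAD handle reported by Get DC Config */
	bool shareable;		/* copied from the matching DSMAS entry */
};

static bool range_contains(const struct dpa_range *outer,
			   const struct dpa_range *inner)
{
	return outer->start <= inner->start && outer->end >= inner->end;
}

/*
 * Attach a DSMAS entry to the partition that contains it. Returns the
 * partition index, or -1 if none matches. A handle mismatch on a dynamic
 * partition is reported but does not stop the match.
 */
static int match_dsmas(struct partition *parts, int nr, const struct dsmas *d)
{
	for (int i = 0; i < nr; i++) {
		if (!range_contains(&parts[i].range, &d->range))
			continue;
		if (parts[i].dynamic && parts[i].handle != d->handle)
			fprintf(stderr, "dynamic RAM perf mismatch: %u vs %u\n",
				(unsigned int)parts[i].handle,
				(unsigned int)d->handle);
		parts[i].shareable = d->shareable;
		return i;
	}
	return -1;
}
```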

Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes:
[anisa: rebase]
---
 drivers/cxl/core/cdat.c | 11 +++++++++++
 drivers/cxl/core/mbox.c |  7 +++++++
 drivers/cxl/cxlmem.h    |  2 ++
 include/cxl/cxl.h       |  4 ++++
 4 files changed, 24 insertions(+)

diff --git a/drivers/cxl/core/cdat.c b/drivers/cxl/core/cdat.c
index 5c9f07262513..c5f3d2ebea55 100644
--- a/drivers/cxl/core/cdat.c
+++ b/drivers/cxl/core/cdat.c
@@ -17,6 +17,7 @@ struct dsmas_entry {
 	struct access_coordinate cdat_coord[ACCESS_COORDINATE_MAX];
 	int entries;
 	int qos_class;
+	bool shareable;
 };
 
 static u32 cdat_normalize(u16 entry, u64 base, u8 type)
@@ -74,6 +75,7 @@ static int cdat_dsmas_handler(union acpi_subtable_headers *header, void *arg,
 		return -ENOMEM;
 
 	dent->handle = dsmas->dsmad_handle;
+	dent->shareable = dsmas->flags & ACPI_CDAT_DSMAS_SHAREABLE;
 	dent->dpa_range.start = le64_to_cpu((__force __le64)dsmas->dpa_base_address);
 	dent->dpa_range.end = le64_to_cpu((__force __le64)dsmas->dpa_base_address) +
 			      le64_to_cpu((__force __le64)dsmas->dpa_length) - 1;
@@ -244,6 +246,7 @@ static void update_perf_entry(struct device *dev, struct dsmas_entry *dent,
 		dpa_perf->coord[i] = dent->coord[i];
 		dpa_perf->cdat_coord[i] = dent->cdat_coord[i];
 	}
+	dpa_perf->shareable = dent->shareable;
 	dpa_perf->dpa_range = dent->dpa_range;
 	dpa_perf->qos_class = dent->qos_class;
 	dev_dbg(dev,
@@ -266,13 +269,21 @@ static void cxl_memdev_set_qos_class(struct cxl_dev_state *cxlds,
 		bool found = false;
 
 		for (int i = 0; i < cxlds->nr_partitions; i++) {
+			enum cxl_partition_mode mode = cxlds->part[i].mode;
 			struct resource *res = &cxlds->part[i].res;
+			u8 handle = cxlds->part[i].handle;
 			struct range range = {
 				.start = res->start,
 				.end = res->end,
 			};
 
 			if (range_contains(&range, &dent->dpa_range)) {
+				if (mode == CXL_PARTMODE_DYNAMIC_RAM_A &&
+				    dent->handle != handle)
+					dev_warn(dev,
+						"Dynamic RAM perf mismatch; %pra (%u) vs %pra (%u)\n",
+						&range, handle, &dent->dpa_range, dent->handle);
+
 				update_perf_entry(dev, dent,
 						  &cxlds->part[i].perf);
 				found = true;
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 71b29cd6abfe..f9a5e21f5d09 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -1356,10 +1356,16 @@ static int cxl_dc_check(struct device *dev, struct cxl_dc_partition_info *part_a
 {
 	size_t blk_size = le64_to_cpu(dev_part->block_size);
 	size_t len = le64_to_cpu(dev_part->length);
+	u32 handle = le32_to_cpu(dev_part->dsmad_handle);
 
 	part_array[index].start = le64_to_cpu(dev_part->base);
 	part_array[index].size = le64_to_cpu(dev_part->decode_length);
 	part_array[index].size *= CXL_CAPACITY_MULTIPLIER;
+	if (handle & ~0xFF) {
+		dev_warn(dev, "DSMAD handle 0x%x has non-zero reserved bits\n", handle);
+		return -EINVAL;
+	}
+	part_array[index].handle = handle;
 
 	/* Check partitions are in increasing DPA order */
 	if (index > 0) {
@@ -1494,6 +1500,7 @@ int cxl_dev_dc_identify(struct cxl_mailbox *mbox,
 	/* Return 1st partition */
 	dc_info->start = partitions[0].start;
 	dc_info->size = partitions[0].size;
+	dc_info->handle = partitions[0].handle;
 	dev_dbg(dev, "Returning partition 0 %zu size %zu\n",
 		dc_info->start, dc_info->size);
 
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 2fef6bb373b9..314d37ca9bdb 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -118,6 +118,7 @@ struct cxl_dpa_info {
 	struct cxl_dpa_part_info {
 		struct range range;
 		enum cxl_partition_mode mode;
+		u8 handle;
 	} part[CXL_NR_PARTITIONS_MAX];
 	int nr_partitions;
 };
@@ -816,6 +817,7 @@ int cxl_dev_state_identify(struct cxl_memdev_state *mds);
 struct cxl_dc_partition_info {
 	size_t start;
 	size_t size;
+	u8 handle;
 };
 
 int cxl_dev_dc_identify(struct cxl_mailbox *mbox,
diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
index 0d3f144016f2..91964272f392 100644
--- a/include/cxl/cxl.h
+++ b/include/cxl/cxl.h
@@ -122,12 +122,14 @@ struct cxl_register_map {
  * @coord: QoS performance data (i.e. latency, bandwidth)
  * @cdat_coord: raw QoS performance data from CDAT
  * @qos_class: QoS Class cookies
+ * @shareable: Whether the range is shareable
  */
 struct cxl_dpa_perf {
 	struct range dpa_range;
 	struct access_coordinate coord[ACCESS_COORDINATE_MAX];
 	struct access_coordinate cdat_coord[ACCESS_COORDINATE_MAX];
 	int qos_class;
+	bool shareable;
 };
 
 enum cxl_partition_mode {
@@ -141,11 +143,13 @@ enum cxl_partition_mode {
  * @res: shortcut to the partition in the DPA resource tree (cxlds->dpa_res)
  * @perf: performance attributes of the partition from CDAT
  * @mode: operation mode for the DPA capacity, e.g. ram, pmem, dynamic...
+ * @handle: DSMAS handle intended to represent this partition
  */
 struct cxl_dpa_partition {
 	struct resource res;
 	struct cxl_dpa_perf perf;
 	enum cxl_partition_mode mode;
+	u8 handle;
 };
 
 #define CXL_NR_PARTITIONS_MAX 2
-- 
2.43.0
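The DSMAD handle validation that this patch adds to cxl_dc_check() rejects any handle with bits above bit 7 set, and keeps only the low byte in the new u8 'handle' fields. cxl_memdev_set_qos_class() then warns when a DSMAS entry inside the dynamic RAM A partition carries a different handle. The following is a minimal userspace sketch of both checks. dc_handle_to_u8() and dc_handle_mismatch() are hypothetical names, not kernel functions.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the cxl_dc_check() test: only the low 8 bits of
 * the 32-bit DSMAD handle may be set. Anything above is treated as
 * reserved and rejected; on success the low byte is stored.
 */
static int dc_handle_to_u8(uint32_t raw, uint8_t *out)
{
	if (raw & ~0xFFu)
		return -EINVAL;	/* reserved bits set, *out left untouched */
	*out = (uint8_t)raw;
	return 0;
}

/*
 * Hypothetical model of the cxl_memdev_set_qos_class() test: a DSMAS
 * entry that lands in a dynamic RAM A partition should carry that
 * partition's handle. The kernel only warns on a mismatch and still
 * applies the perf data; this helper just reports the condition.
 */
static bool dc_handle_mismatch(bool is_dynamic_ram_a, uint8_t part_handle,
			       uint8_t dsmas_handle)
{
	return is_dynamic_ram_a && part_handle != dsmas_handle;
}
```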



* [PATCH 04/20] cxl/core: Enforce partition order/simplify partition calls
  2026-04-11  1:22 [RFC PATCH 00/20] DCD: Remove support for sparse regions & add tags Anisa Su
                   ` (2 preceding siblings ...)
  2026-04-11  1:22 ` [PATCH 03/20] cxl/cdat: Gather DSMAS data for DCD partitions Anisa Su
@ 2026-04-11  1:22 ` Anisa Su
  2026-04-11  1:23 ` [PATCH 05/20] cxl/mem: Expose dynamic ram A partition in sysfs Anisa Su
                   ` (18 subsequent siblings)
  22 siblings, 0 replies; 30+ messages in thread
From: Anisa Su @ 2026-04-11  1:22 UTC (permalink / raw)
  To: linux-cxl
  Cc: john, dave.jiang, gourry, dave, jonathan.cameron,
	alison.schofield, ira.weiny, dan.j.williams

From: Ira Weiny <ira.weiny@intel.com>

Device partitions have an implied order, which the addition of a
dynamic partition makes more complex.

Replace the RAM and PMEM special-case helpers (cxl_ram_size(),
cxl_pmem_size(), to_ram_perf(), to_pmem_perf()) with the generic,
mode-based cxl_part_size() and part_perf() calls, and add an up-front
check in cxl_dpa_setup() to ensure the implied partition order is
preserved.
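As a rough illustration, here is a userspace sketch of an ordering check over a partition array and a mode-keyed size lookup in the spirit of cxl_part_size(). The enum and struct are simplified, hypothetical stand-ins for enum cxl_partition_mode and the partition info, and the numeric enum order (RAM, then PMEM, then dynamic RAM A) is assumed to encode the expected DPA order.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical mirror of enum cxl_partition_mode; value order is
 * assumed to match the expected DPA order of partitions. */
enum part_mode { PART_RAM, PART_PMEM, PART_DYNAMIC_RAM_A };

struct part {
	enum part_mode mode;
	uint64_t size;
};

/* Reject any partition whose mode sorts before its predecessor's. */
static int check_part_order(const struct part *p, int nr)
{
	for (int i = 1; i < nr; i++)
		if (p[i].mode < p[i - 1].mode)
			return -EINVAL;
	return 0;
}

/* Generic lookup replacing per-mode helpers: size of the first
 * partition with the requested mode, or 0 if there is none. */
static uint64_t part_size(const struct part *p, int nr, enum part_mode mode)
{
	for (int i = 0; i < nr; i++)
		if (p[i].mode == mode)
			return p[i].size;
	return 0;
}
```

One generic lookup keyed by mode means a new partition type needs no new helper, only a new enum value.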

Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes:
[anisa: rebase]
---
 drivers/cxl/core/hdm.c    | 11 ++++++++++-
 drivers/cxl/core/memdev.c | 32 +++++++++-----------------------
 drivers/cxl/cxlmem.h      |  9 +++------
 drivers/cxl/mem.c         |  2 +-
 4 files changed, 23 insertions(+), 31 deletions(-)

diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index 28974adaab75..7a5812971f8f 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -464,6 +464,7 @@ static const char *cxl_mode_name(enum cxl_partition_mode mode)
 int cxl_dpa_setup(struct cxl_dev_state *cxlds, const struct cxl_dpa_info *info)
 {
 	struct device *dev = cxlds->dev;
+	int i;
 
 	guard(rwsem_write)(&cxl_rwsem.dpa);
 
@@ -476,9 +477,17 @@ int cxl_dpa_setup(struct cxl_dev_state *cxlds, const struct cxl_dpa_info *info)
 		return 0;
 	}
 
+	/* Verify partitions are in expected order. */
+	for (i = 1; i < info->nr_partitions; i++) {
+		if (info->part[i].mode < info->part[i - 1].mode) {
+			dev_err(dev, "Partition order mismatch\n");
+			return -EINVAL;
+		}
+	}
+
 	cxlds->dpa_res = DEFINE_RES_MEM(0, info->size);
 
-	for (int i = 0; i < info->nr_partitions; i++) {
+	for (i = 0; i < info->nr_partitions; i++) {
 		const struct cxl_dpa_part_info *part = &info->part[i];
 		int rc;
 
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index 80e65690eb77..71602820f896 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -75,20 +75,12 @@ static ssize_t label_storage_size_show(struct device *dev,
 }
 static DEVICE_ATTR_RO(label_storage_size);
 
-static resource_size_t cxl_ram_size(struct cxl_dev_state *cxlds)
-{
-	/* Static RAM is only expected at partition 0. */
-	if (cxlds->part[0].mode != CXL_PARTMODE_RAM)
-		return 0;
-	return resource_size(&cxlds->part[0].res);
-}
-
 static ssize_t ram_size_show(struct device *dev, struct device_attribute *attr,
 			     char *buf)
 {
 	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
 	struct cxl_dev_state *cxlds = cxlmd->cxlds;
-	unsigned long long len = cxl_ram_size(cxlds);
+	unsigned long long len = cxl_part_size(cxlds, CXL_PARTMODE_RAM);
 
 	return sysfs_emit(buf, "%#llx\n", len);
 }
@@ -101,7 +93,7 @@ static ssize_t pmem_size_show(struct device *dev, struct device_attribute *attr,
 {
 	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
 	struct cxl_dev_state *cxlds = cxlmd->cxlds;
-	unsigned long long len = cxl_pmem_size(cxlds);
+	unsigned long long len = cxl_part_size(cxlds, CXL_PARTMODE_PMEM);
 
 	return sysfs_emit(buf, "%#llx\n", len);
 }
@@ -424,10 +416,11 @@ static struct attribute *cxl_memdev_attributes[] = {
 	NULL,
 };
 
-static struct cxl_dpa_perf *to_pmem_perf(struct cxl_dev_state *cxlds)
+static struct cxl_dpa_perf *part_perf(struct cxl_dev_state *cxlds,
+				      enum cxl_partition_mode mode)
 {
 	for (int i = 0; i < cxlds->nr_partitions; i++)
-		if (cxlds->part[i].mode == CXL_PARTMODE_PMEM)
+		if (cxlds->part[i].mode == mode)
 			return &cxlds->part[i].perf;
 	return NULL;
 }
@@ -438,7 +431,7 @@ static ssize_t pmem_qos_class_show(struct device *dev,
 	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
 	struct cxl_dev_state *cxlds = cxlmd->cxlds;
 
-	return sysfs_emit(buf, "%d\n", to_pmem_perf(cxlds)->qos_class);
+	return sysfs_emit(buf, "%d\n", part_perf(cxlds, CXL_PARTMODE_PMEM)->qos_class);
 }
 
 static struct device_attribute dev_attr_pmem_qos_class =
@@ -450,20 +443,13 @@ static struct attribute *cxl_memdev_pmem_attributes[] = {
 	NULL,
 };
 
-static struct cxl_dpa_perf *to_ram_perf(struct cxl_dev_state *cxlds)
-{
-	if (cxlds->part[0].mode != CXL_PARTMODE_RAM)
-		return NULL;
-	return &cxlds->part[0].perf;
-}
-
 static ssize_t ram_qos_class_show(struct device *dev,
 				  struct device_attribute *attr, char *buf)
 {
 	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
 	struct cxl_dev_state *cxlds = cxlmd->cxlds;
 
-	return sysfs_emit(buf, "%d\n", to_ram_perf(cxlds)->qos_class);
+	return sysfs_emit(buf, "%d\n", part_perf(cxlds, CXL_PARTMODE_RAM)->qos_class);
 }
 
 static struct device_attribute dev_attr_ram_qos_class =
@@ -499,7 +485,7 @@ static umode_t cxl_ram_visible(struct kobject *kobj, struct attribute *a, int n)
 {
 	struct device *dev = kobj_to_dev(kobj);
 	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
-	struct cxl_dpa_perf *perf = to_ram_perf(cxlmd->cxlds);
+	struct cxl_dpa_perf *perf = part_perf(cxlmd->cxlds, CXL_PARTMODE_RAM);
 
 	if (a == &dev_attr_ram_qos_class.attr &&
 	    (!perf || perf->qos_class == CXL_QOS_CLASS_INVALID))
@@ -518,7 +504,7 @@ static umode_t cxl_pmem_visible(struct kobject *kobj, struct attribute *a, int n
 {
 	struct device *dev = kobj_to_dev(kobj);
 	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
-	struct cxl_dpa_perf *perf = to_pmem_perf(cxlmd->cxlds);
+	struct cxl_dpa_perf *perf = part_perf(cxlmd->cxlds, CXL_PARTMODE_PMEM);
 
 	if (a == &dev_attr_pmem_qos_class.attr &&
 	    (!perf || perf->qos_class == CXL_QOS_CLASS_INVALID))
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 314d37ca9bdb..a933b6f5454f 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -381,14 +381,11 @@ struct cxl_security_state {
 	struct kernfs_node *sanitize_node;
 };
 
-static inline resource_size_t cxl_pmem_size(struct cxl_dev_state *cxlds)
+static inline resource_size_t cxl_part_size(struct cxl_dev_state *cxlds,
+					    enum cxl_partition_mode mode)
 {
-	/*
-	 * Static PMEM may be at partition index 0 when there is no static RAM
-	 * capacity.
-	 */
 	for (int i = 0; i < cxlds->nr_partitions; i++)
-		if (cxlds->part[i].mode == CXL_PARTMODE_PMEM)
+		if (cxlds->part[i].mode == mode)
 			return resource_size(&cxlds->part[i].res);
 	return 0;
 }
diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
index fcffe24dcb42..f19e08279ec7 100644
--- a/drivers/cxl/mem.c
+++ b/drivers/cxl/mem.c
@@ -114,7 +114,7 @@ static int cxl_mem_probe(struct device *dev)
 		return -ENXIO;
 	}
 
-	if (cxl_pmem_size(cxlds) && IS_ENABLED(CONFIG_CXL_PMEM)) {
+	if (cxl_part_size(cxlds, CXL_PARTMODE_PMEM) && IS_ENABLED(CONFIG_CXL_PMEM)) {
 		rc = devm_cxl_add_nvdimm(dev, parent_port, cxlmd);
 		if (rc) {
 			if (rc == -ENODEV)
-- 
2.43.0



* [PATCH 05/20] cxl/mem: Expose dynamic ram A partition in sysfs
  2026-04-11  1:22 [RFC PATCH 00/20] DCD: Remove support for sparse regions & add tags Anisa Su
                   ` (3 preceding siblings ...)
  2026-04-11  1:22 ` [PATCH 04/20] cxl/core: Enforce partition order/simplify partition calls Anisa Su
@ 2026-04-11  1:23 ` Anisa Su
  2026-04-11  1:23 ` [PATCH 06/20] cxl/port: Add 'dynamic_ram_a' to endpoint decoder mode Anisa Su
                   ` (17 subsequent siblings)
  22 siblings, 0 replies; 30+ messages in thread
From: Anisa Su @ 2026-04-11  1:23 UTC (permalink / raw)
  To: linux-cxl
  Cc: john, dave.jiang, gourry, dave, jonathan.cameron,
	alison.schofield, ira.weiny, dan.j.williams

From: Ira Weiny <ira.weiny@intel.com>

To properly configure CXL regions, user space needs to know the
details of the dynamic RAM partition.

Expose the size and QoS class of the first dynamic RAM partition
through sysfs.
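The size attribute is emitted with sysfs_emit(buf, "%#llx\n", len), so user space reads a hex string such as "0x40000000". The following is a minimal sketch of parsing that text in C. parse_sysfs_size() is a hypothetical helper, and opening and reading /sys/bus/cxl/devices/memX/dynamic_ram_a/size is left to the caller.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse the "%#llx\n" text of a sysfs size
 * attribute into a byte count. Base 0 lets strtoull() accept the "0x"
 * prefix (and a bare "0"); one trailing newline is tolerated.
 */
static int parse_sysfs_size(const char *text, unsigned long long *out)
{
	char *end;

	errno = 0;
	unsigned long long val = strtoull(text, &end, 0);
	if (errno || end == text)
		return -EINVAL;	/* overflow or no digits at all */
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;	/* trailing garbage */
	*out = val;
	return 0;
}
```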

Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes:
[anisa: Update kernel version to 7.0]
[davidlohr: Remove "persistent" from description of
/sys/bus/cxl/devices/memX/dynamic_ram_a/qos_class]
---
 Documentation/ABI/testing/sysfs-bus-cxl | 24 +++++++++++
 drivers/cxl/core/memdev.c               | 57 +++++++++++++++++++++++++
 2 files changed, 81 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
index 16a9b3d2e2c0..3d95c325f6e0 100644
--- a/Documentation/ABI/testing/sysfs-bus-cxl
+++ b/Documentation/ABI/testing/sysfs-bus-cxl
@@ -89,6 +89,30 @@ Description:
 		and there are platform specific performance related
 		side-effects that may result. First class-id is displayed.
 
+What:		/sys/bus/cxl/devices/memX/dynamic_ram_a/size
+Date:		May, 2025
+KernelVersion:	v7.0
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		(RO) Capacity of the first Dynamic RAM partition, in bytes.
+
+
+What:		/sys/bus/cxl/devices/memX/dynamic_ram_a/qos_class
+Date:		May, 2025
+KernelVersion:	v7.0
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		(RO) For CXL host platforms that support "QoS Telemetry"
+		this attribute conveys a comma delimited list of platform
+		specific cookies that identify a QoS performance class
+		for the dynamic RAM partition of the CXL mem device. These
+		class-ids can be compared against a similar "qos_class"
+		published for a root decoder. While it is not required
+		that the endpoints map their local memory-class to a
+		matching platform class, mismatches are not recommended
+		and there are platform specific performance related
+		side-effects that may result. First class-id is displayed.
+
 
 What:		/sys/bus/cxl/devices/memX/serial
 Date:		January, 2022
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index 71602820f896..064cfd628577 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -101,6 +101,19 @@ static ssize_t pmem_size_show(struct device *dev, struct device_attribute *attr,
 static struct device_attribute dev_attr_pmem_size =
 	__ATTR(size, 0444, pmem_size_show, NULL);
 
+static ssize_t dynamic_ram_a_size_show(struct device *dev,
+				       struct device_attribute *attr, char *buf)
+{
+	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
+	struct cxl_dev_state *cxlds = cxlmd->cxlds;
+	unsigned long long len = cxl_part_size(cxlds, CXL_PARTMODE_DYNAMIC_RAM_A);
+
+	return sysfs_emit(buf, "%#llx\n", len);
+}
+
+static struct device_attribute dev_attr_dynamic_ram_a_size =
+	__ATTR(size, 0444, dynamic_ram_a_size_show, NULL);
+
 static ssize_t serial_show(struct device *dev, struct device_attribute *attr,
 			   char *buf)
 {
@@ -443,6 +456,25 @@ static struct attribute *cxl_memdev_pmem_attributes[] = {
 	NULL,
 };
 
+static ssize_t dynamic_ram_a_qos_class_show(struct device *dev,
+					    struct device_attribute *attr, char *buf)
+{
+	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
+	struct cxl_dev_state *cxlds = cxlmd->cxlds;
+
+	return sysfs_emit(buf, "%d\n",
+			  part_perf(cxlds, CXL_PARTMODE_DYNAMIC_RAM_A)->qos_class);
+}
+
+static struct device_attribute dev_attr_dynamic_ram_a_qos_class =
+	__ATTR(qos_class, 0444, dynamic_ram_a_qos_class_show, NULL);
+
+static struct attribute *cxl_memdev_dynamic_ram_a_attributes[] = {
+	&dev_attr_dynamic_ram_a_size.attr,
+	&dev_attr_dynamic_ram_a_qos_class.attr,
+	NULL,
+};
+
 static ssize_t ram_qos_class_show(struct device *dev,
 				  struct device_attribute *attr, char *buf)
 {
@@ -519,6 +551,29 @@ static struct attribute_group cxl_memdev_pmem_attribute_group = {
 	.is_visible = cxl_pmem_visible,
 };
 
+static umode_t cxl_dynamic_ram_a_visible(struct kobject *kobj, struct attribute *a, int n)
+{
+	struct device *dev = kobj_to_dev(kobj);
+	struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
+	struct cxl_dpa_perf *perf = part_perf(cxlmd->cxlds, CXL_PARTMODE_DYNAMIC_RAM_A);
+
+	if (a == &dev_attr_dynamic_ram_a_qos_class.attr &&
+	    (!perf || perf->qos_class == CXL_QOS_CLASS_INVALID))
+		return 0;
+
+	if (a == &dev_attr_dynamic_ram_a_size.attr &&
+	    (!cxl_part_size(cxlmd->cxlds, CXL_PARTMODE_DYNAMIC_RAM_A)))
+		return 0;
+
+	return a->mode;
+}
+
+static struct attribute_group cxl_memdev_dynamic_ram_a_attribute_group = {
+	.name = "dynamic_ram_a",
+	.attrs = cxl_memdev_dynamic_ram_a_attributes,
+	.is_visible = cxl_dynamic_ram_a_visible,
+};
+
 static umode_t cxl_memdev_security_visible(struct kobject *kobj,
 					   struct attribute *a, int n)
 {
@@ -547,6 +602,7 @@ static const struct attribute_group *cxl_memdev_attribute_groups[] = {
 	&cxl_memdev_attribute_group,
 	&cxl_memdev_ram_attribute_group,
 	&cxl_memdev_pmem_attribute_group,
+	&cxl_memdev_dynamic_ram_a_attribute_group,
 	&cxl_memdev_security_attribute_group,
 	NULL,
 };
@@ -555,6 +611,7 @@ void cxl_memdev_update_perf(struct cxl_memdev *cxlmd)
 {
 	sysfs_update_group(&cxlmd->dev.kobj, &cxl_memdev_ram_attribute_group);
 	sysfs_update_group(&cxlmd->dev.kobj, &cxl_memdev_pmem_attribute_group);
+	sysfs_update_group(&cxlmd->dev.kobj, &cxl_memdev_dynamic_ram_a_attribute_group);
 }
 EXPORT_SYMBOL_NS_GPL(cxl_memdev_update_perf, "CXL");
 
-- 
2.43.0
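In this patch, cxl_dynamic_ram_a_visible() hides 'qos_class' when part_perf() finds no dynamic RAM A partition or its class is invalid, and it hides 'size' when the partition is empty. That guard is what keeps dynamic_ram_a_qos_class_show() from dereferencing a NULL part_perf() result. The following is a simplified userspace model of that decision. The types are hypothetical, and QOS_CLASS_INVALID is assumed to mirror CXL_QOS_CLASS_INVALID.

```c
#include <stdbool.h>

#define QOS_CLASS_INVALID (-1)	/* assumed to mirror CXL_QOS_CLASS_INVALID */

enum dyn_attr { DYN_ATTR_SIZE, DYN_ATTR_QOS_CLASS };

/* Hypothetical summary of what the kernel looks up for the partition. */
struct dyn_part {
	bool present;			/* part_perf() returned non-NULL */
	int qos_class;			/* perf->qos_class when present */
	unsigned long long size;	/* cxl_part_size() result */
};

/* Return 0 to hide an attribute, otherwise its own permission bits. */
static unsigned int dyn_ram_a_visible(const struct dyn_part *p,
				      enum dyn_attr attr, unsigned int mode)
{
	if (attr == DYN_ATTR_QOS_CLASS &&
	    (!p->present || p->qos_class == QOS_CLASS_INVALID))
		return 0;
	if (attr == DYN_ATTR_SIZE && !p->size)
		return 0;
	return mode;
}
```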



* [PATCH 06/20] cxl/port: Add 'dynamic_ram_a' to endpoint decoder mode
  2026-04-11  1:22 [RFC PATCH 00/20] DCD: Remove support for sparse regions & add tags Anisa Su
                   ` (4 preceding siblings ...)
  2026-04-11  1:23 ` [PATCH 05/20] cxl/mem: Expose dynamic ram A partition in sysfs Anisa Su
@ 2026-04-11  1:23 ` Anisa Su
  2026-04-11  1:23 ` [PATCH 07/20] cxl/region: Add sparse DAX region support Anisa Su
                   ` (16 subsequent siblings)
  22 siblings, 0 replies; 30+ messages in thread
From: Anisa Su @ 2026-04-11  1:23 UTC (permalink / raw)
  To: linux-cxl
  Cc: john, dave.jiang, gourry, dave, jonathan.cameron,
	alison.schofield, ira.weiny, dan.j.williams

From: Ira Weiny <ira.weiny@intel.com>

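Endpoints can now support a single dynamic RAM partition following the
persistent memory partition.

Extend the endpoint decoder 'mode' attribute to accept 'dynamic_ram_a'
so that a decoder can point to the first dynamic RAM partition, and add
a 'cap_dynamic_ram_a' capability attribute to root decoders.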
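The following is a userspace sketch of the extended mode_store() parsing. streq_nl() is a hypothetical re-implementation of the kernel's sysfs_streq() semantics, in which the strings must match exactly except that either side may end in a single trailing newline. The enum is a simplified stand-in for enum cxl_partition_mode.

```c
#include <errno.h>
#include <stdbool.h>

enum part_mode { PART_RAM, PART_PMEM, PART_DYNAMIC_RAM_A };

/* Model of sysfs_streq(): equal, modulo one trailing newline. */
static bool streq_nl(const char *s1, const char *s2)
{
	while (*s1 && *s1 == *s2) {
		s1++;
		s2++;
	}
	if (*s1 == *s2)
		return true;
	if (!*s1 && *s2 == '\n' && !s2[1])
		return true;
	if (*s1 == '\n' && !s1[1] && !*s2)
		return true;
	return false;
}

/* Hypothetical mirror of mode_store() with the new keyword;
 * *mode is only written on success. */
static int parse_decoder_mode(const char *buf, enum part_mode *mode)
{
	if (streq_nl(buf, "pmem"))
		*mode = PART_PMEM;
	else if (streq_nl(buf, "ram"))
		*mode = PART_RAM;
	else if (streq_nl(buf, "dynamic_ram_a"))
		*mode = PART_DYNAMIC_RAM_A;
	else
		return -EINVAL;
	return 0;
}
```

A shell write such as `echo dynamic_ram_a > mode` arrives with a trailing newline, which is why the newline-tolerant comparison matters.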

Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes:
[anisa: rebase]
---
 Documentation/ABI/testing/sysfs-bus-cxl | 18 +++++++++---------
 drivers/cxl/core/port.c                 |  4 ++++
 2 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
index 3d95c325f6e0..c604c7ca6432 100644
--- a/Documentation/ABI/testing/sysfs-bus-cxl
+++ b/Documentation/ABI/testing/sysfs-bus-cxl
@@ -358,22 +358,22 @@ Description:
 
 
 What:		/sys/bus/cxl/devices/decoderX.Y/mode
-Date:		May, 2022
-KernelVersion:	v6.0
+Date:		May, 2022, May 2025
+KernelVersion:	v6.0, v6.16 (dynamic_ram_a)
 Contact:	linux-cxl@vger.kernel.org
 Description:
 		(RW) When a CXL decoder is of devtype "cxl_decoder_endpoint" it
 		translates from a host physical address range, to a device
 		local address range. Device-local address ranges are further
-		split into a 'ram' (volatile memory) range and 'pmem'
-		(persistent memory) range. The 'mode' attribute emits one of
-		'ram', 'pmem', or 'none'. The 'none' indicates the decoder is
-		not actively decoding, or no DPA allocation policy has been
-		set.
+		split into 'ram' (volatile memory), 'pmem' (persistent
+		memory), and 'dynamic_ram_a' (first Dynamic RAM) ranges. The
+		'mode' attribute emits one of 'ram', 'pmem', 'dynamic_ram_a',
+		or 'none'. The 'none' value indicates the decoder is not
+		actively decoding, or no DPA allocation policy has been set.
 
 		'mode' can be written, when the decoder is in the 'disabled'
-		state, with either 'ram' or 'pmem' to set the boundaries for the
-		next allocation.
+		state, with 'ram', 'pmem', or 'dynamic_ram_a' to set the
+		boundaries for the next allocation.
 
 
 What:		/sys/bus/cxl/devices/decoderX.Y/dpa_resource
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 0c5957d1d329..a7f71f36531f 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -128,6 +128,7 @@ static DEVICE_ATTR_RO(name)
 
 CXL_DECODER_FLAG_ATTR(cap_pmem, CXL_DECODER_F_PMEM);
 CXL_DECODER_FLAG_ATTR(cap_ram, CXL_DECODER_F_RAM);
+CXL_DECODER_FLAG_ATTR(cap_dynamic_ram_a, CXL_DECODER_F_RAM);
 CXL_DECODER_FLAG_ATTR(cap_type2, CXL_DECODER_F_TYPE2);
 CXL_DECODER_FLAG_ATTR(cap_type3, CXL_DECODER_F_TYPE3);
 CXL_DECODER_FLAG_ATTR(locked, CXL_DECODER_F_LOCK);
@@ -222,6 +223,8 @@ static ssize_t mode_store(struct device *dev, struct device_attribute *attr,
 		mode = CXL_PARTMODE_PMEM;
 	else if (sysfs_streq(buf, "ram"))
 		mode = CXL_PARTMODE_RAM;
+	else if (sysfs_streq(buf, "dynamic_ram_a"))
+		mode = CXL_PARTMODE_DYNAMIC_RAM_A;
 	else
 		return -EINVAL;
 
@@ -327,6 +330,7 @@ static struct attribute_group cxl_decoder_base_attribute_group = {
 static struct attribute *cxl_decoder_root_attrs[] = {
 	&dev_attr_cap_pmem.attr,
 	&dev_attr_cap_ram.attr,
+	&dev_attr_cap_dynamic_ram_a.attr,
 	&dev_attr_cap_type2.attr,
 	&dev_attr_cap_type3.attr,
 	&dev_attr_target_list.attr,
-- 
2.43.0



* [PATCH 07/20] cxl/region: Add sparse DAX region support
  2026-04-11  1:22 [RFC PATCH 00/20] DCD: Remove support for sparse regions & add tags Anisa Su
                   ` (5 preceding siblings ...)
  2026-04-11  1:23 ` [PATCH 06/20] cxl/port: Add 'dynamic_ram_a' to endpoint decoder mode Anisa Su
@ 2026-04-11  1:23 ` Anisa Su
  2026-04-11  1:23 ` [PATCH 08/20] cxl/events: Split event msgnum configuration from irq setup Anisa Su
                   ` (15 subsequent siblings)
  22 siblings, 0 replies; 30+ messages in thread
From: Anisa Su @ 2026-04-11  1:23 UTC (permalink / raw)
  To: linux-cxl
  Cc: john, dave.jiang, gourry, dave, jonathan.cameron,
	alison.schofield, ira.weiny, dan.j.williams

From: Ira Weiny <ira.weiny@intel.com>

Dynamic Capacity CXL regions must allow memory to be added or removed
dynamically.  In addition to the quantity of memory available, the
location of the memory within a DC partition is dynamic, based on the
extents offered by a device.  CXL DAX regions must accommodate the
sparseness of this memory in the management of DAX regions and devices.

Introduce the concept of a sparse DAX region.  Introduce the
create_dynamic_ram_a_region sysfs entry to create such regions.
Special-case dynamic capacity regions to create a 0-sized seed DAX
device.  This maintains compatibility, since a default DAX device is
required to hold a region reference.

Report 0 bytes of available capacity until capacity is added.

Sparse regions complicate the range mapping of DAX devices.  There is no
known use case for range mapping on sparse regions.  Avoid the
complication by preventing range mapping of DAX devices on sparse
regions (the 'mapping' attribute is hidden).

Interleaving is deferred for now.  Add checks: dynamic RAM regions
reject interleave_ways values other than 1, and store_targetN() rejects
endpoint decoders whose device does not support DCD.
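The following is a simplified userspace model of the sparse-region rules described above. It covers probe-time flag selection, the empty seed device, zero reported available capacity, and the 1-way interleave restriction. The struct and flag macros are hypothetical stand-ins whose bit values mirror the IORESOURCE_DAX_* definitions in drivers/dax/bus.h. None of these helpers exist in the kernel.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flags mirroring IORESOURCE_DAX_KMEM / _SPARSE_CAP. */
#define DAX_KMEM       (1UL << 1)
#define DAX_SPARSE_CAP (1UL << 2)

/* Simplified, hypothetical stand-in for struct dax_region. */
struct model_dax_region {
	unsigned long flags;
	uint64_t size;		/* length of the region's HPA range */
	uint64_t allocated;	/* sum of child resource sizes */
};

/* Probe-time flags: only dynamic RAM regions are marked sparse. */
static unsigned long region_flags(bool dynamic_ram_a)
{
	unsigned long flags = DAX_KMEM;

	if (dynamic_ram_a)
		flags |= DAX_SPARSE_CAP;
	return flags;
}

/* Seed device size: empty for dynamic RAM, the whole range otherwise. */
static uint64_t seed_dev_size(bool dynamic_ram_a, uint64_t range_len)
{
	return dynamic_ram_a ? 0 : range_len;
}

/* Like dax_region_avail_size(): a sparse region reports 0 bytes free. */
static uint64_t avail_size(const struct model_dax_region *r)
{
	if (r->flags & DAX_SPARSE_CAP)
		return 0;
	return r->size - r->allocated;
}

/* Like the interleave_ways_store() check: DCD regions are 1-way only. */
static int check_interleave_ways(bool dynamic_ram_a, unsigned int ways)
{
	if (dynamic_ram_a && ways != 1)
		return -EINVAL;
	return 0;
}
```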

Based on an original patch by Navneet Singh.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes:
[anisa: rebase]
[fan: fix return logic when validating DCD mode in store_targetN]
---
 Documentation/ABI/testing/sysfs-bus-cxl | 22 ++++++++---------
 drivers/cxl/core/core.h                 | 11 +++++++++
 drivers/cxl/core/port.c                 |  1 +
 drivers/cxl/core/region.c               | 33 +++++++++++++++++++++++--
 drivers/dax/bus.c                       | 10 ++++++++
 drivers/dax/bus.h                       |  1 +
 drivers/dax/cxl.c                       | 16 ++++++++++--
 7 files changed, 79 insertions(+), 15 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
index c604c7ca6432..3080aef9ad67 100644
--- a/Documentation/ABI/testing/sysfs-bus-cxl
+++ b/Documentation/ABI/testing/sysfs-bus-cxl
@@ -434,20 +434,20 @@ Description:
 		interleave_granularity).
 
 
-What:		/sys/bus/cxl/devices/decoderX.Y/create_{pmem,ram}_region
-Date:		May, 2022, January, 2023
-KernelVersion:	v6.0 (pmem), v6.3 (ram)
+What:		/sys/bus/cxl/devices/decoderX.Y/create_{pmem,ram,dynamic_ram_a}_region
+Date:		May, 2022, January, 2023, May 2025
+KernelVersion:	v6.0 (pmem), v6.3 (ram), v6.16 (dynamic_ram_a)
 Contact:	linux-cxl@vger.kernel.org
 Description:
 		(RW) Write a string in the form 'regionZ' to start the process
-		of defining a new persistent, or volatile memory region
-		(interleave-set) within the decode range bounded by root decoder
-		'decoderX.Y'. The value written must match the current value
-		returned from reading this attribute. An atomic compare exchange
-		operation is done on write to assign the requested id to a
-		region and allocate the region-id for the next creation attempt.
-		EBUSY is returned if the region name written does not match the
-		current cached value.
+		of defining a new persistent, volatile, or dynamic RAM memory
+		region (interleave-set) within the decode range bounded by root
+		decoder 'decoderX.Y'. The value written must match the current
+		value returned from reading this attribute.  An atomic compare
+		exchange operation is done on write to assign the requested id
+		to a region and allocate the region-id for the next creation
+		attempt.  EBUSY is returned if the region name written does not
+		match the current cached value.
 
 
 What:		/sys/bus/cxl/devices/decoderX.Y/delete_region
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index 82ca3a476708..8881cc9323e0 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -6,6 +6,7 @@
 
 #include <cxl/mailbox.h>
 #include <linux/rwsem.h>
+#include <cxlmem.h>
 
 extern const struct device_type cxl_nvdimm_bridge_type;
 extern const struct device_type cxl_nvdimm_type;
@@ -18,6 +19,15 @@ enum cxl_detach_mode {
 	DETACH_INVALIDATE,
 };
 
+static inline struct cxl_memdev_state *
+cxled_to_mds(struct cxl_endpoint_decoder *cxled)
+{
+	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
+	struct cxl_dev_state *cxlds = cxlmd->cxlds;
+
+	return container_of(cxlds, struct cxl_memdev_state, cxlds);
+}
+
 #ifdef CONFIG_CXL_REGION
 
 struct cxl_region_context {
@@ -29,6 +39,7 @@ struct cxl_region_context {
 
 extern struct device_attribute dev_attr_create_pmem_region;
 extern struct device_attribute dev_attr_create_ram_region;
+extern struct device_attribute dev_attr_create_dynamic_ram_a_region;
 extern struct device_attribute dev_attr_delete_region;
 extern struct device_attribute dev_attr_region;
 extern const struct device_type cxl_pmem_region_type;
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index a7f71f36531f..2d33001dac26 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -337,6 +337,7 @@ static struct attribute *cxl_decoder_root_attrs[] = {
 	&dev_attr_qos_class.attr,
 	SET_CXL_REGION_ATTR(create_pmem_region)
 	SET_CXL_REGION_ATTR(create_ram_region)
+	SET_CXL_REGION_ATTR(create_dynamic_ram_a_region)
 	SET_CXL_REGION_ATTR(delete_region)
 	NULL,
 };
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index edc267c6cf77..d2ce844a142f 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -533,6 +533,11 @@ static ssize_t interleave_ways_store(struct device *dev,
 	if (rc)
 		return rc;
 
+	if (cxlr->mode == CXL_PARTMODE_DYNAMIC_RAM_A && val != 1) {
+		dev_err(dev, "Interleaving and DCD not supported\n");
+		return -EINVAL;
+	}
+
 	ACQUIRE(rwsem_write_kill, rwsem)(&cxl_rwsem.region);
 	if ((rc = ACQUIRE_ERR(rwsem_write_kill, &rwsem)))
 		return rc;
@@ -2389,6 +2394,7 @@ static size_t store_targetN(struct cxl_region *cxlr, const char *buf, int pos,
 	if (sysfs_streq(buf, "\n"))
 		rc = detach_target(cxlr, pos);
 	else {
+		struct cxl_endpoint_decoder *cxled;
 		struct device *dev;
 
 		dev = bus_find_device_by_name(&cxl_bus_type, NULL, buf);
@@ -2400,8 +2406,14 @@ static size_t store_targetN(struct cxl_region *cxlr, const char *buf, int pos,
 			goto out;
 		}
 
-		rc = attach_target(cxlr, to_cxl_endpoint_decoder(dev), pos,
-				   TASK_INTERRUPTIBLE);
+		cxled = to_cxl_endpoint_decoder(dev);
+		if (cxlr->mode == CXL_PARTMODE_DYNAMIC_RAM_A &&
+		    !cxl_dcd_supported(cxled_to_mds(cxled))) {
+			dev_dbg(dev, "DCD unsupported\n");
+			rc = -EINVAL;
+			goto out;
+		}
+		rc = attach_target(cxlr, cxled, pos, TASK_INTERRUPTIBLE);
 out:
 		put_device(dev);
 	}
@@ -2750,6 +2762,7 @@ static struct cxl_region *__create_region(struct cxl_root_decoder *cxlrd,
 	switch (mode) {
 	case CXL_PARTMODE_RAM:
 	case CXL_PARTMODE_PMEM:
+	case CXL_PARTMODE_DYNAMIC_RAM_A:
 		break;
 	default:
 		dev_err(&cxlrd->cxlsd.cxld.dev, "unsupported mode %d\n", mode);
@@ -2802,6 +2815,21 @@ static ssize_t create_ram_region_store(struct device *dev,
 }
 DEVICE_ATTR_RW(create_ram_region);
 
+static ssize_t create_dynamic_ram_a_region_show(struct device *dev,
+						struct device_attribute *attr,
+						char *buf)
+{
+	return __create_region_show(to_cxl_root_decoder(dev), buf);
+}
+
+static ssize_t create_dynamic_ram_a_region_store(struct device *dev,
+						 struct device_attribute *attr,
+						 const char *buf, size_t len)
+{
+	return create_region_store(dev, buf, len, CXL_PARTMODE_DYNAMIC_RAM_A);
+}
+DEVICE_ATTR_RW(create_dynamic_ram_a_region);
+
 static ssize_t region_show(struct device *dev, struct device_attribute *attr,
 			   char *buf)
 {
@@ -4081,6 +4109,7 @@ static int cxl_region_probe(struct device *dev)
 
 		return devm_cxl_add_pmem_region(cxlr);
 	case CXL_PARTMODE_RAM:
+	case CXL_PARTMODE_DYNAMIC_RAM_A:
 		rc = devm_cxl_region_edac_register(cxlr);
 		if (rc)
 			dev_dbg(&cxlr->dev, "CXL EDAC registration for region_id=%d failed\n",
diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 68437c05e21d..e41e36747111 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -179,6 +179,11 @@ static bool is_static(struct dax_region *dax_region)
 	return (dax_region->res.flags & IORESOURCE_DAX_STATIC) != 0;
 }
 
+static bool is_sparse(struct dax_region *dax_region)
+{
+	return (dax_region->res.flags & IORESOURCE_DAX_SPARSE_CAP) != 0;
+}
+
 bool static_dev_dax(struct dev_dax *dev_dax)
 {
 	return is_static(dev_dax->region);
@@ -302,6 +307,9 @@ static unsigned long long dax_region_avail_size(struct dax_region *dax_region)
 
 	lockdep_assert_held(&dax_region_rwsem);
 
+	if (is_sparse(dax_region))
+		return 0;
+
 	for_each_dax_region_resource(dax_region, res)
 		size -= resource_size(res);
 	return size;
@@ -1387,6 +1395,8 @@ static umode_t dev_dax_visible(struct kobject *kobj, struct attribute *a, int n)
 		return 0;
 	if (a == &dev_attr_mapping.attr && is_static(dax_region))
 		return 0;
+	if (a == &dev_attr_mapping.attr && is_sparse(dax_region))
+		return 0;
 	if ((a == &dev_attr_align.attr ||
 	     a == &dev_attr_size.attr) && is_static(dax_region))
 		return 0444;
diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h
index 7b1a83f1ce1f..7abdd5a403dc 100644
--- a/drivers/dax/bus.h
+++ b/drivers/dax/bus.h
@@ -15,6 +15,7 @@ struct dax_region;
 /* dax bus specific ioresource flags */
 #define IORESOURCE_DAX_STATIC BIT(0)
 #define IORESOURCE_DAX_KMEM BIT(1)
+#define IORESOURCE_DAX_SPARSE_CAP BIT(2)
 
 struct dax_region *alloc_dax_region(struct device *parent, int region_id,
 		struct range *range, int target_node, unsigned int align,
diff --git a/drivers/dax/cxl.c b/drivers/dax/cxl.c
index 3ab39b77843d..9ebe974d25c3 100644
--- a/drivers/dax/cxl.c
+++ b/drivers/dax/cxl.c
@@ -13,19 +13,31 @@ static int cxl_dax_region_probe(struct device *dev)
 	struct cxl_region *cxlr = cxlr_dax->cxlr;
 	struct dax_region *dax_region;
 	struct dev_dax_data data;
+	resource_size_t dev_size;
+	unsigned long flags;
 
 	if (nid == NUMA_NO_NODE)
 		nid = memory_add_physaddr_to_nid(cxlr_dax->hpa_range.start);
 
+	flags = IORESOURCE_DAX_KMEM;
+	if (cxlr->mode == CXL_PARTMODE_DYNAMIC_RAM_A)
+		flags |= IORESOURCE_DAX_SPARSE_CAP;
+
 	dax_region = alloc_dax_region(dev, cxlr->id, &cxlr_dax->hpa_range, nid,
-				      PMD_SIZE, IORESOURCE_DAX_KMEM);
+				      PMD_SIZE, flags);
 	if (!dax_region)
 		return -ENOMEM;
 
+	if (cxlr->mode == CXL_PARTMODE_DYNAMIC_RAM_A)
+		/* Add empty seed dax device */
+		dev_size = 0;
+	else
+		dev_size = range_len(&cxlr_dax->hpa_range);
+
 	data = (struct dev_dax_data) {
 		.dax_region = dax_region,
 		.id = -1,
-		.size = range_len(&cxlr_dax->hpa_range),
+		.size = dev_size,
 		.memmap_on_memory = true,
 	};
 
-- 
2.43.0



* [PATCH 08/20] cxl/events: Split event msgnum configuration from irq setup
  2026-04-11  1:22 [RFC PATCH 00/20] DCD: Remove support for sparse regions & add tags Anisa Su
                   ` (6 preceding siblings ...)
  2026-04-11  1:23 ` [PATCH 07/20] cxl/region: Add sparse DAX region support Anisa Su
@ 2026-04-11  1:23 ` Anisa Su
  2026-04-11  1:23 ` [PATCH 09/20] cxl/pci: Factor out interrupt policy check Anisa Su
                   ` (14 subsequent siblings)
  22 siblings, 0 replies; 30+ messages in thread
From: Anisa Su @ 2026-04-11  1:23 UTC (permalink / raw)
  To: linux-cxl
  Cc: john, dave.jiang, gourry, dave, jonathan.cameron,
	alison.schofield, ira.weiny, dan.j.williams, Jonathan Cameron,
	Fan Ni, Li Ming

From: Ira Weiny <ira.weiny@intel.com>

Dynamic Capacity Devices (DCD) require event interrupts to process
memory addition or removal.  BIOS may have control over non-DCD event
processing, so DCD interrupt configuration needs to be separate from
memory event interrupt configuration.

Split cxl_event_config_msgnums() from irq setup in preparation for
separate DCD interrupt configuration.
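After the split, cxl_event_irqsetup() only consumes a policy that cxl_event_config() obtained earlier. The following is a userspace sketch of decoding that policy. It assumes the CXL Event Interrupt Settings layout in which bits 1:0 hold the interrupt mode (01b = MSI/MSI-X) and bits 7:4 hold the message number. It also assumes that, like cxl_event_req_irq(), a non-MSI/MSI-X setting fails with -ENXIO. The struct and helper names are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

#define INT_MODE_MASK   0x03u	/* bits 1:0: interrupt mode */
#define INT_MODE_MSI    0x01u	/* 01b: MSI/MSI-X */
#define INT_MSGNUM_MASK 0xf0u	/* bits 7:4: interrupt message number */

/* Hypothetical mirror of struct cxl_event_interrupt_policy. */
struct event_policy {
	uint8_t info, warn, failure, fatal;
};

/* Decode one settings byte; -ENXIO unless it selects MSI/MSI-X. */
static int settings_to_msgnum(uint8_t setting)
{
	if ((setting & INT_MODE_MASK) != INT_MODE_MSI)
		return -ENXIO;
	return (setting & INT_MSGNUM_MASK) >> 4;
}

/*
 * Setup only reads the policy passed in; it never queries or programs
 * the device policy itself. Fills msgnums[] in info, warn, failure,
 * fatal order and stops at the first unusable setting.
 */
static int event_irqsetup(const struct event_policy *policy, int msgnums[4])
{
	const uint8_t settings[4] = {
		policy->info, policy->warn, policy->failure, policy->fatal,
	};

	for (int i = 0; i < 4; i++) {
		int rc = settings_to_msgnum(settings[i]);

		if (rc < 0)
			return rc;
		msgnums[i] = rc;
	}
	return 0;
}
```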

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Li Ming <ming.li@zohomail.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes:
[anisa: rebase]
---
 drivers/cxl/pci.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 60f9fa05d9ef..35942b2ace53 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -599,35 +599,31 @@ static int cxl_event_config_msgnums(struct cxl_memdev_state *mds,
 	return cxl_event_get_int_policy(mds, policy);
 }
 
-static int cxl_event_irqsetup(struct cxl_memdev_state *mds)
+static int cxl_event_irqsetup(struct cxl_memdev_state *mds,
+			      struct cxl_event_interrupt_policy *policy)
 {
 	struct cxl_dev_state *cxlds = &mds->cxlds;
-	struct cxl_event_interrupt_policy policy;
 	int rc;
 
-	rc = cxl_event_config_msgnums(mds, &policy);
-	if (rc)
-		return rc;
-
-	rc = cxl_event_req_irq(cxlds, policy.info_settings);
+	rc = cxl_event_req_irq(cxlds, policy->info_settings);
 	if (rc) {
 		dev_err(cxlds->dev, "Failed to get interrupt for event Info log\n");
 		return rc;
 	}
 
-	rc = cxl_event_req_irq(cxlds, policy.warn_settings);
+	rc = cxl_event_req_irq(cxlds, policy->warn_settings);
 	if (rc) {
 		dev_err(cxlds->dev, "Failed to get interrupt for event Warn log\n");
 		return rc;
 	}
 
-	rc = cxl_event_req_irq(cxlds, policy.failure_settings);
+	rc = cxl_event_req_irq(cxlds, policy->failure_settings);
 	if (rc) {
 		dev_err(cxlds->dev, "Failed to get interrupt for event Failure log\n");
 		return rc;
 	}
 
-	rc = cxl_event_req_irq(cxlds, policy.fatal_settings);
+	rc = cxl_event_req_irq(cxlds, policy->fatal_settings);
 	if (rc) {
 		dev_err(cxlds->dev, "Failed to get interrupt for event Fatal log\n");
 		return rc;
@@ -674,11 +670,15 @@ static int cxl_event_config(struct pci_host_bridge *host_bridge,
 		return -EBUSY;
 	}
 
+	rc = cxl_event_config_msgnums(mds, &policy);
+	if (rc)
+		return rc;
+
 	rc = cxl_mem_alloc_event_buf(mds);
 	if (rc)
 		return rc;
 
-	rc = cxl_event_irqsetup(mds);
+	rc = cxl_event_irqsetup(mds, &policy);
 	if (rc)
 		return rc;
 
-- 
2.43.0


* [PATCH 09/20] cxl/pci: Factor out interrupt policy check
  2026-04-11  1:22 [RFC PATCH 00/20] DCD: Remove support for sparse regions & add tags Anisa Su
                   ` (7 preceding siblings ...)
  2026-04-11  1:23 ` [PATCH 08/20] cxl/events: Split event msgnum configuration from irq setup Anisa Su
@ 2026-04-11  1:23 ` Anisa Su
  2026-04-11  1:23 ` [PATCH 10/20] cxl/mem: Configure dynamic capacity interrupts Anisa Su
                   ` (13 subsequent siblings)
  22 siblings, 0 replies; 30+ messages in thread
From: Anisa Su @ 2026-04-11  1:23 UTC (permalink / raw)
  To: linux-cxl
  Cc: john, dave.jiang, gourry, dave, jonathan.cameron,
	alison.schofield, ira.weiny, dan.j.williams, Jonathan Cameron,
	Fan Ni, Li Ming

From: Ira Weiny <ira.weiny@intel.com>

Dynamic Capacity Devices (DCD) require event interrupts to process
memory addition or removal.  BIOS may have control over non-DCD event
processing.  DCD interrupt configuration needs to be separate from
memory event interrupt configuration.

Factor out event interrupt setting validation.
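
The factored-out check can be modelled in userspace as below.  The mask
and mode values are assumed to mirror CXLDEV_EVENT_INT_MODE_MASK
(bits 1:0) and CXL_INT_FW; the struct and function names are simplified
stand-ins.  Only the mode bits are compared, so a setting that also
carries a message number in its upper bits is still recognized as
firmware-owned.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror the driver's definitions: mode lives in bits [1:0] */
#define EVENT_INT_MODE_MASK 0x03u
enum int_mode { INT_NONE = 0x00, INT_MSI_MSIX = 0x01, INT_FW = 0x02 };

struct policy {
	uint8_t info, warn, failure, fatal;
};

static bool int_is_fw(uint8_t setting)
{
	return (setting & EVENT_INT_MODE_MASK) == INT_FW;
}

/* Returns false if firmware still owns any of the four memory event logs */
bool validate_mem_policy(const struct policy *p)
{
	return !(int_is_fw(p->info) || int_is_fw(p->warn) ||
		 int_is_fw(p->failure) || int_is_fw(p->fatal));
}
```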

Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Reviewed-by: Li Ming <ming.li@zohomail.com>
Link: https://lore.kernel.org/all/663922b475e50_d54d72945b@dwillia2-xfh.jf.intel.com.notmuch/ [1]
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes:
[anisa: rebase]
---
 drivers/cxl/pci.c | 23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 35942b2ace53..8d12c684d670 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -639,6 +639,21 @@ static bool cxl_event_int_is_fw(u8 setting)
 	return mode == CXL_INT_FW;
 }
 
+static bool cxl_event_validate_mem_policy(struct cxl_memdev_state *mds,
+					  struct cxl_event_interrupt_policy *policy)
+{
+	if (cxl_event_int_is_fw(policy->info_settings) ||
+	    cxl_event_int_is_fw(policy->warn_settings) ||
+	    cxl_event_int_is_fw(policy->failure_settings) ||
+	    cxl_event_int_is_fw(policy->fatal_settings)) {
+		dev_err(mds->cxlds.dev,
+			"FW still in control of Event Logs despite _OSC settings\n");
+		return false;
+	}
+
+	return true;
+}
+
 static int cxl_event_config(struct pci_host_bridge *host_bridge,
 			    struct cxl_memdev_state *mds, bool irq_avail)
 {
@@ -661,14 +676,8 @@ static int cxl_event_config(struct pci_host_bridge *host_bridge,
 	if (rc)
 		return rc;
 
-	if (cxl_event_int_is_fw(policy.info_settings) ||
-	    cxl_event_int_is_fw(policy.warn_settings) ||
-	    cxl_event_int_is_fw(policy.failure_settings) ||
-	    cxl_event_int_is_fw(policy.fatal_settings)) {
-		dev_err(mds->cxlds.dev,
-			"FW still in control of Event Logs despite _OSC settings\n");
+	if (!cxl_event_validate_mem_policy(mds, &policy))
 		return -EBUSY;
-	}
 
 	rc = cxl_event_config_msgnums(mds, &policy);
 	if (rc)
-- 
2.43.0


* [PATCH 10/20] cxl/mem: Configure dynamic capacity interrupts
  2026-04-11  1:22 [RFC PATCH 00/20] DCD: Remove support for sparse regions & add tags Anisa Su
                   ` (8 preceding siblings ...)
  2026-04-11  1:23 ` [PATCH 09/20] cxl/pci: Factor out interrupt policy check Anisa Su
@ 2026-04-11  1:23 ` Anisa Su
  2026-04-11  1:23 ` [PATCH 11/20] cxl/core: Return endpoint decoder information from region search Anisa Su
                   ` (12 subsequent siblings)
  22 siblings, 0 replies; 30+ messages in thread
From: Anisa Su @ 2026-04-11  1:23 UTC (permalink / raw)
  To: linux-cxl
  Cc: john, dave.jiang, gourry, dave, jonathan.cameron,
	alison.schofield, ira.weiny, dan.j.williams, Jonathan Cameron,
	Li Ming, Fan Ni

From: Ira Weiny <ira.weiny@intel.com>

Dynamic Capacity Devices (DCD) support extent change notifications
through the event log mechanism.  The interrupt mailbox commands were
extended in CXL 3.1 to support these notifications.  Firmware can't
configure DCD events to be FW controlled but can retain control of
memory events.

Configure DCD event log interrupts on devices supporting dynamic
capacity.  Disable DCD if interrupts are not supported.

Care is taken to preserve the interrupt policy set by the FW if
FW-first error handling has been selected by the BIOS.
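
A sketch of how the payload length and settings could be derived,
assuming a simplified five-byte policy struct and a hypothetical
build_policy() helper: the memory event bytes are overwritten only when
the OS owns error handling, and the DCD byte is appended (growing the
payload from 4 to 5 bytes) only when the device supports dynamic
capacity.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define INT_MSI_MSIX 0x01
#define POLICY_BASE_SIZE 4	/* info, warn, failure, fatal */

/* Stand-in for the packed policy with the CXL 3.1 DCD byte appended */
struct policy {
	uint8_t info, warn, failure, fatal, dcd;
};

/*
 * Fill the policy and return the payload length to send.  When firmware
 * owns error handling (!native_cxl) the four memory event bytes are left
 * as the caller previously read them from the device.
 */
size_t build_policy(struct policy *p, bool native_cxl, bool dcd_supported)
{
	size_t size_in = POLICY_BASE_SIZE;

	if (native_cxl)
		*p = (struct policy){ INT_MSI_MSIX, INT_MSI_MSIX,
				      INT_MSI_MSIX, INT_MSI_MSIX, 0 };
	if (dcd_supported) {
		p->dcd = INT_MSI_MSIX;
		size_in += sizeof(p->dcd);
	}
	return size_in;
}
```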

Based on an original patch by Navneet Singh.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Li Ming <ming.li@zohomail.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes:
[anisa: rebase]
---
 drivers/cxl/cxlmem.h |  2 ++
 drivers/cxl/pci.c    | 73 ++++++++++++++++++++++++++++++++++++--------
 2 files changed, 62 insertions(+), 13 deletions(-)

diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index a933b6f5454f..88bcc69f22c2 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -218,7 +218,9 @@ struct cxl_event_interrupt_policy {
 	u8 warn_settings;
 	u8 failure_settings;
 	u8 fatal_settings;
+	u8 dcd_settings;
 } __packed;
+#define CXL_EVENT_INT_POLICY_BASE_SIZE 4 /* info, warn, failure, fatal */
 
 /**
  * struct cxl_event_state - Event log driver state
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 8d12c684d670..622252ae0ed8 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -569,23 +569,34 @@ static int cxl_event_get_int_policy(struct cxl_memdev_state *mds,
 }
 
 static int cxl_event_config_msgnums(struct cxl_memdev_state *mds,
-				    struct cxl_event_interrupt_policy *policy)
+				    struct cxl_event_interrupt_policy *policy,
+				    bool native_cxl)
 {
 	struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
+	size_t size_in = CXL_EVENT_INT_POLICY_BASE_SIZE;
 	struct cxl_mbox_cmd mbox_cmd;
 	int rc;
 
-	*policy = (struct cxl_event_interrupt_policy) {
-		.info_settings = CXL_INT_MSI_MSIX,
-		.warn_settings = CXL_INT_MSI_MSIX,
-		.failure_settings = CXL_INT_MSI_MSIX,
-		.fatal_settings = CXL_INT_MSI_MSIX,
-	};
+	/* memory event policy is left unchanged if FW has control */
+	if (native_cxl) {
+		*policy = (struct cxl_event_interrupt_policy) {
+			.info_settings = CXL_INT_MSI_MSIX,
+			.warn_settings = CXL_INT_MSI_MSIX,
+			.failure_settings = CXL_INT_MSI_MSIX,
+			.fatal_settings = CXL_INT_MSI_MSIX,
+			.dcd_settings = 0,
+		};
+	}
+
+	if (cxl_dcd_supported(mds)) {
+		policy->dcd_settings = CXL_INT_MSI_MSIX;
+		size_in += sizeof(policy->dcd_settings);
+	}
 
 	mbox_cmd = (struct cxl_mbox_cmd) {
 		.opcode = CXL_MBOX_OP_SET_EVT_INT_POLICY,
 		.payload_in = policy,
-		.size_in = sizeof(*policy),
+		.size_in = size_in,
 	};
 
 	rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
@@ -632,6 +643,30 @@ static int cxl_event_irqsetup(struct cxl_memdev_state *mds,
 	return 0;
 }
 
+static int cxl_irqsetup(struct cxl_memdev_state *mds,
+			struct cxl_event_interrupt_policy *policy,
+			bool native_cxl)
+{
+	struct cxl_dev_state *cxlds = &mds->cxlds;
+	int rc;
+
+	if (native_cxl) {
+		rc = cxl_event_irqsetup(mds, policy);
+		if (rc)
+			return rc;
+	}
+
+	if (cxl_dcd_supported(mds)) {
+		rc = cxl_event_req_irq(cxlds, policy->dcd_settings);
+		if (rc) {
+			dev_err(cxlds->dev, "Failed to get interrupt for DCD event log\n");
+			cxl_disable_dcd(mds);
+		}
+	}
+
+	return 0;
+}
+
 static bool cxl_event_int_is_fw(u8 setting)
 {
 	u8 mode = FIELD_GET(CXLDEV_EVENT_INT_MODE_MASK, setting);
@@ -657,18 +692,26 @@ static bool cxl_event_validate_mem_policy(struct cxl_memdev_state *mds,
 static int cxl_event_config(struct pci_host_bridge *host_bridge,
 			    struct cxl_memdev_state *mds, bool irq_avail)
 {
-	struct cxl_event_interrupt_policy policy;
+	struct cxl_event_interrupt_policy policy = { 0 };
+	bool native_cxl = host_bridge->native_cxl_error;
 	int rc;
 
 	/*
 	 * When BIOS maintains CXL error reporting control, it will process
 	 * event records.  Only one agent can do so.
+	 *
+	 * If BIOS has control of events and DCD is not supported, skip event
+	 * configuration.
 	 */
-	if (!host_bridge->native_cxl_error)
+	if (!native_cxl && !cxl_dcd_supported(mds))
 		return 0;
 
 	if (!irq_avail) {
 		dev_info(mds->cxlds.dev, "No interrupt support, disable event processing.\n");
+		if (cxl_dcd_supported(mds)) {
+			dev_info(mds->cxlds.dev, "DCD requires interrupts, disable DCD\n");
+			cxl_disable_dcd(mds);
+		}
 		return 0;
 	}
 
@@ -676,10 +719,10 @@ static int cxl_event_config(struct pci_host_bridge *host_bridge,
 	if (rc)
 		return rc;
 
-	if (!cxl_event_validate_mem_policy(mds, &policy))
+	if (native_cxl && !cxl_event_validate_mem_policy(mds, &policy))
 		return -EBUSY;
 
-	rc = cxl_event_config_msgnums(mds, &policy);
+	rc = cxl_event_config_msgnums(mds, &policy, native_cxl);
 	if (rc)
 		return rc;
 
@@ -687,12 +730,16 @@ static int cxl_event_config(struct pci_host_bridge *host_bridge,
 	if (rc)
 		return rc;
 
-	rc = cxl_event_irqsetup(mds, &policy);
+	rc = cxl_irqsetup(mds, &policy, native_cxl);
 	if (rc)
 		return rc;
 
 	cxl_mem_get_event_records(mds, CXLDEV_EVENT_STATUS_ALL);
 
+	dev_dbg(mds->cxlds.dev, "Event config : %s DCD %s\n",
+		native_cxl ? "OS" : "BIOS",
+		cxl_dcd_supported(mds) ? "supported" : "not supported");
+
 	return 0;
 }
 
-- 
2.43.0


* [PATCH 11/20] cxl/core: Return endpoint decoder information from region search
  2026-04-11  1:22 [RFC PATCH 00/20] DCD: Remove support for sparse regions & add tags Anisa Su
                   ` (9 preceding siblings ...)
  2026-04-11  1:23 ` [PATCH 10/20] cxl/mem: Configure dynamic capacity interrupts Anisa Su
@ 2026-04-11  1:23 ` Anisa Su
  2026-04-11  1:23 ` [PATCH 12/20] cxl/extent: Process dynamic partition events and realize region extents Anisa Su
                   ` (11 subsequent siblings)
  22 siblings, 0 replies; 30+ messages in thread
From: Anisa Su @ 2026-04-11  1:23 UTC (permalink / raw)
  To: linux-cxl
  Cc: john, dave.jiang, gourry, dave, jonathan.cameron,
	alison.schofield, ira.weiny, dan.j.williams, Jonathan Cameron,
	Fan Ni, Li Ming

From: Ira Weiny <ira.weiny@intel.com>

cxl_dpa_to_region() finds the region from a <DPA, device> tuple.
The search involves finding the device endpoint decoder as well.

Dynamic capacity extent processing uses the endpoint decoder HPA
information to calculate the HPA offset.  In addition, well-behaved
extents should be contained within an endpoint decoder.

Return the endpoint decoder that was found so that subsequent DCD code
can use it.
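
Converting an extent's DPA into an HPA offset within the region needs
both the decoder's DPA base and its HPA base, which is why the decoder
is returned.  The sketch below mirrors the arithmetic in
calc_hpa_range() from patch 12 of this series, assuming a
non-interleaved decoder and using hypothetical stand-in structs.

```c
#include <stdint.h>

struct range { uint64_t start, end; };	/* inclusive, like the kernel's */

/* Hypothetical stand-in for the endpoint decoder fields that matter */
struct decoder {
	uint64_t dpa_start;	/* start of the decoder's DPA resource */
	uint64_t hpa_start;	/* start of the decoder's HPA range */
};

/* Map a DPA range to an HPA range relative to the region base */
struct range dpa_to_region_offset(const struct decoder *d,
				  uint64_t region_hpa_start,
				  struct range dpa)
{
	uint64_t dpa_offset = dpa.start - d->dpa_start;
	uint64_t hpa = d->hpa_start + dpa_offset;
	struct range out;

	out.start = hpa - region_hpa_start;
	out.end = out.start + (dpa.end - dpa.start + 1) - 1;
	return out;
}
```

For example, a 1 MiB extent at DPA 0x10200000 behind a decoder whose
DPA base is 0x10000000 lands at offset 0x200000 (ending at 0x2fffff)
from the region base when the decoder and region share an HPA base.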

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Li Ming <ming.li@zohomail.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes:
[anisa: rebase]
---
 drivers/cxl/core/core.h   | 6 ++++--
 drivers/cxl/core/mbox.c   | 2 +-
 drivers/cxl/core/memdev.c | 4 ++--
 drivers/cxl/core/region.c | 8 +++++++-
 4 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index 8881cc9323e0..14723cfd05f0 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -58,7 +58,8 @@ int cxl_decoder_detach(struct cxl_region *cxlr,
 int cxl_region_init(void);
 void cxl_region_exit(void);
 int cxl_get_poison_by_endpoint(struct cxl_port *port);
-struct cxl_region *cxl_dpa_to_region(const struct cxl_memdev *cxlmd, u64 dpa);
+struct cxl_region *cxl_dpa_to_region(const struct cxl_memdev *cxlmd, u64 dpa,
+				     struct cxl_endpoint_decoder **cxled);
 u64 cxl_dpa_to_hpa(struct cxl_region *cxlr, const struct cxl_memdev *cxlmd,
 		   u64 dpa);
 int devm_cxl_add_dax_region(struct cxl_region *cxlr);
@@ -71,7 +72,8 @@ static inline u64 cxl_dpa_to_hpa(struct cxl_region *cxlr,
 	return ULLONG_MAX;
 }
 static inline
-struct cxl_region *cxl_dpa_to_region(const struct cxl_memdev *cxlmd, u64 dpa)
+struct cxl_region *cxl_dpa_to_region(const struct cxl_memdev *cxlmd, u64 dpa,
+				     struct cxl_endpoint_decoder **cxled)
 {
 	return NULL;
 }
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index f9a5e21f5d09..01b1a318f34f 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -968,7 +968,7 @@ void cxl_event_trace_record(struct cxl_memdev *cxlmd,
 		guard(rwsem_read)(&cxl_rwsem.dpa);
 
 		dpa = le64_to_cpu(evt->media_hdr.phys_addr) & CXL_DPA_MASK;
-		cxlr = cxl_dpa_to_region(cxlmd, dpa);
+		cxlr = cxl_dpa_to_region(cxlmd, dpa, NULL);
 		if (cxlr) {
 			u64 cache_size = cxlr->params.cache_size;
 
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index 064cfd628577..b8b3489f69e5 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -320,7 +320,7 @@ int cxl_inject_poison_locked(struct cxl_memdev *cxlmd, u64 dpa)
 	if (rc)
 		return rc;
 
-	cxlr = cxl_dpa_to_region(cxlmd, dpa);
+	cxlr = cxl_dpa_to_region(cxlmd, dpa, NULL);
 	if (cxlr)
 		dev_warn_once(cxl_mbox->host,
 			      "poison inject dpa:%#llx region: %s\n", dpa,
@@ -389,7 +389,7 @@ int cxl_clear_poison_locked(struct cxl_memdev *cxlmd, u64 dpa)
 	if (rc)
 		return rc;
 
-	cxlr = cxl_dpa_to_region(cxlmd, dpa);
+	cxlr = cxl_dpa_to_region(cxlmd, dpa, NULL);
 	if (cxlr)
 		dev_warn_once(cxl_mbox->host,
 			      "poison clear dpa:%#llx region: %s\n", dpa,
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index d2ce844a142f..e00fdb74589c 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2991,6 +2991,7 @@ int cxl_get_poison_by_endpoint(struct cxl_port *port)
 struct cxl_dpa_to_region_context {
 	struct cxl_region *cxlr;
 	u64 dpa;
+	struct cxl_endpoint_decoder *cxled;
 };
 
 static int __cxl_dpa_to_region(struct device *dev, void *arg)
@@ -3024,11 +3025,13 @@ static int __cxl_dpa_to_region(struct device *dev, void *arg)
 			dev_name(dev));
 
 	ctx->cxlr = cxlr;
+	ctx->cxled = cxled;
 
 	return 1;
 }
 
-struct cxl_region *cxl_dpa_to_region(const struct cxl_memdev *cxlmd, u64 dpa)
+struct cxl_region *cxl_dpa_to_region(const struct cxl_memdev *cxlmd, u64 dpa,
+				     struct cxl_endpoint_decoder **cxled)
 {
 	struct cxl_dpa_to_region_context ctx;
 	struct cxl_port *port = cxlmd->endpoint;
@@ -3042,6 +3045,9 @@ struct cxl_region *cxl_dpa_to_region(const struct cxl_memdev *cxlmd, u64 dpa)
 	if (cxl_num_decoders_committed(port))
 		device_for_each_child(&port->dev, &ctx, __cxl_dpa_to_region);
 
+	if (cxled)
+		*cxled = ctx.cxled;
+
 	return ctx.cxlr;
 }
 
-- 
2.43.0


* [PATCH 12/20] cxl/extent: Process dynamic partition events and realize region extents
  2026-04-11  1:22 [RFC PATCH 00/20] DCD: Remove support for sparse regions & add tags Anisa Su
                   ` (10 preceding siblings ...)
  2026-04-11  1:23 ` [PATCH 11/20] cxl/core: Return endpoint decoder information from region search Anisa Su
@ 2026-04-11  1:23 ` Anisa Su
  2026-04-11  1:23 ` [PATCH 13/20] cxl/region/extent: Expose region extent information in sysfs Anisa Su
                   ` (10 subsequent siblings)
  22 siblings, 0 replies; 30+ messages in thread
From: Anisa Su @ 2026-04-11  1:23 UTC (permalink / raw)
  To: linux-cxl
  Cc: john, dave.jiang, gourry, dave, jonathan.cameron,
	alison.schofield, ira.weiny, dan.j.williams, Anisa Su

From: Ira Weiny <ira.weiny@intel.com>

A dynamic capacity device (DCD) sends events to signal the host for
changes in the availability of Dynamic Capacity (DC) memory.  These
events contain extents describing a DPA range and metadata for memory
to be added or removed.  Events may be sent from the device at any time.

Three types of events can be signaled: Add, Release, and Force Release.

On add, the host may accept or reject the memory being offered. If no
region exists or the extent DPA range is invalid, the extent is
rejected. Additionally, this implementation rejects extents that lack
a tag or that would form a non-contiguous range; this is an
implementation choice, not a spec requirement.

Add extent events may be grouped by a 'more' bit, which indicates that
those extents should be processed as a group. Use the first extent with
a valid DPA range, region, and non-null tag as the reference: each
subsequent extent must carry the same tag and extend the range
contiguously.
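
A simplified userspace model of the grouping rule, assuming each
pending extent carries an inclusive DPA range, a 16-byte tag, and its
original position in the event group (hypothetical struct pending and
select_contiguous()): sort by DPA, take the first tagged extent as the
reference, and accept a later extent only if it has the same tag and
starts right after the accepted run.  DPA-range and region validation,
and the duplicate handling from restriction 3, are left out.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical pending-extent record; ranges are inclusive */
struct pending {
	uint64_t start, end;
	uint8_t tag[16];
	int index;	/* position in the original event group */
	bool accepted;
};

static int cmp_dpa(const void *a, const void *b)
{
	const struct pending *x = a, *y = b;

	return (x->start > y->start) - (x->start < y->start);
}

static bool tag_is_null(const uint8_t tag[16])
{
	static const uint8_t zero[16];

	return memcmp(tag, zero, sizeof(zero)) == 0;
}

/* Mark which extents form one contiguous, single-tag run; returns count */
int select_contiguous(struct pending *ext, size_t n)
{
	const struct pending *ref = NULL;
	uint64_t run_end = 0;
	int count = 0;

	qsort(ext, n, sizeof(*ext), cmp_dpa);
	for (size_t i = 0; i < n; i++) {
		ext[i].accepted = false;
		if (tag_is_null(ext[i].tag))
			continue;	/* untagged extents are always rejected */
		if (!ref) {
			ref = &ext[i];	/* first valid extent is the reference */
		} else if (memcmp(ext[i].tag, ref->tag, 16) != 0 ||
			   ext[i].start != run_end + 1) {
			continue;	/* wrong tag, gap, or overlap: reject */
		}
		ext[i].accepted = true;
		run_end = ext[i].end;
		count++;
	}
	return count;
}
```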

On remove, the host can delay the response until it is safely no
longer using the memory. If no region exists, the release can be sent
immediately. The host may also release extents at any time. Thus the
'more' bit grouping of release events is of less value and can be
ignored in favor of sending multiple release capacity responses for
groups of release events. Upon receiving a release event for any
extent that falls within a region and whose tag matches the region's
tag, release all extents in that region.
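
A sketch of that release decision, assuming a region holds at most one
committed tagged HPA range (hypothetical struct region_state and
handle_release()): a matching tag and a contained range release the
whole region, not just the named extent, and every other case leaves
the region untouched.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

struct range { uint64_t start, end; };	/* inclusive */

/* Hypothetical per-region state: at most one committed tagged capacity */
struct region_state {
	bool committed;
	uint8_t tag[16];
	struct range hpa;
};

enum release_result { REL_ALL, REL_NO_CAPACITY, REL_BAD_TAG, REL_OUT_OF_RANGE };

static bool contains(const struct range *outer, const struct range *inner)
{
	return outer->start <= inner->start && inner->end <= outer->end;
}

enum release_result handle_release(struct region_state *r,
				   const uint8_t tag[16],
				   const struct range *hpa)
{
	if (!r->committed)
		return REL_NO_CAPACITY;
	if (memcmp(tag, r->tag, 16) != 0)
		return REL_BAD_TAG;
	if (!contains(&r->hpa, hpa))
		return REL_OUT_OF_RANGE;
	r->committed = false;	/* drop the entire capacity, not just hpa */
	return REL_ALL;
}
```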

Force removal is intended as a mechanism between the FM and the device,
and is intended only for when the host is unresponsive, out of sync, or
otherwise broken.  Purposely ignore force removal events.

Regions are made up of one or more devices which may be surfacing memory
to the host. Once all devices in a region have surfaced an extent, the
region can expose a corresponding extent for the user to consume.
A region extent is composed of one or more contiguous device extents
with the same tag. It is surfaced after all extents from the DC event
are processed (events grouped together with the "More" flag).
Interleaving is unsupported.

Per the specification the device is allowed to offer or remove extents
at any time.  However, anticipated use cases can expect extents to be
offered, accepted, and removed in well-defined chunks.

Simplify extent tracking with the following restrictions.

	1) Flag for removal any extent which overlaps a requested
	   release range.
	2) Refuse the offer of extents which overlap already accepted
	   memory ranges.
	3) Accept again a range which has already been accepted by the
	   host.  Eating duplicates serves three purposes.
	   3a) This simplifies the code if the device should get out of
	       sync with the host.  And it should be safe to acknowledge
	       the extent again.
	   3b) This simplifies the code to process existing extents if
	       the extent list should change while the extent list is
	       being read.
	   3c) Duplicates for a given partition which are seen during a
	       race between the hardware surfacing an extent and the cxl
	       dax driver scanning for existing extents will be ignored.

	   NOTE: Processing existing extents is done in a later patch.
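
Restrictions 2 and 3 reduce to two range tests against the extents a
region already holds, in the spirit of extents_contain() and
extents_overlap() in this patch.  A userspace sketch with inclusive
ranges and a hypothetical classify_offer(); containment is checked
first so that an exact or nested duplicate is accepted again instead of
being refused as an overlap.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct range { uint64_t start, end; };	/* inclusive */

enum offer { OFFER_NEW, OFFER_DUPLICATE, OFFER_OVERLAP };

static bool contains(const struct range *a, const struct range *b)
{
	return a->start <= b->start && b->end <= a->end;
}

static bool overlaps(const struct range *a, const struct range *b)
{
	return a->start <= b->end && b->start <= a->end;
}

/* Classify an offered DPA range against ranges already accepted */
enum offer classify_offer(const struct range *accepted, size_t n,
			  const struct range *offered)
{
	for (size_t i = 0; i < n; i++)
		if (contains(&accepted[i], offered))
			return OFFER_DUPLICATE;	/* rule 3: accept again */
	for (size_t i = 0; i < n; i++)
		if (overlaps(&accepted[i], offered))
			return OFFER_OVERLAP;	/* rule 2: refuse */
	return OFFER_NEW;
}
```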

Management of the region extent devices must be synchronized with
potential uses of the memory within the DAX layer.  Create region extent
devices as children of the cxl_dax_region device such that the DAX
region driver can co-drive them and synchronize with the DAX layer.
Synchronization and management is handled in a subsequent patch.

Process DCD events and create region devices.

Based on an original patch by Navneet Singh.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Anisa Su <anisa.su@samsung.com>
---
Changes:
[anisa] replace mds->pending_extents xarray with list

The spec requires the DC add response to return extents in the same
order they were received in the DC add event record. Extents are sorted
by DPA to check for contiguity, then re-sorted by their original index
to build the DC response.
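
The second sort can be sketched as follows, assuming each pending
extent records its position in the event group and that only accepted
extents go into the response (hypothetical struct pending and
build_response(); the real response carries full extent records, not
bare DPAs).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct pending {
	uint64_t start;	/* DPA; the list was sorted by this for validation */
	int index;	/* position in the original DC add event group */
	bool accepted;
};

static int cmp_index(const void *a, const void *b)
{
	const struct pending *x = a, *y = b;

	return (x->index > y->index) - (x->index < y->index);
}

/*
 * Put extents back in arrival order and copy the accepted DPAs into
 * resp[], which must have room for n entries.  Returns entries written.
 */
size_t build_response(struct pending *ext, size_t n, uint64_t *resp)
{
	size_t out = 0;

	qsort(ext, n, sizeof(*ext), cmp_index);
	for (size_t i = 0; i < n; i++)
		if (ext[i].accepted)
			resp[out++] = ext[i].start;
	return out;
}
```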

[anisa] replace cxl_dax_region->extent_ida with pointer to single region_extent

Each region is now associated with 1 tagged capacity. Make
cxl_dax_region:region_extent 1:1 instead of 1:many.

[anisa] enforce contiguous tagged add
Collect extents in pending_list and sort them by DPA to enforce
contiguity. Drop non-contiguous extents.

Build one region_extent incrementally while iterating over
pending_extents. The first valid extent (valid DPA, non-null tag, etc.)
is the reference that all other extents are checked against; every
following extent must have the same tag to be added to the
region_extent.

"Commit" the region by onlining the region extent (which sets
cxlr_dax->region_extent = region_ext). Once it is set, adding further
extents to the region is disallowed. Capacity can only be added again
after a release event clears it, which removes the entire tagged
capacity and sets cxlr_dax->region_extent back to NULL.
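
The one-shot commit rule is a small state check.  In this sketch a
non-NULL pointer stands in for cxlr_dax->region_extent, and
struct dax_region_state, commit(), and release_all() are hypothetical
names.

```c
#include <stdbool.h>
#include <stddef.h>

struct region_extent_model { int id; };	/* hypothetical placeholder */

struct dax_region_state {
	struct region_extent_model *committed;	/* models region_extent */
};

/* Adds are refused while a tagged capacity is already committed */
bool can_add(const struct dax_region_state *r)
{
	return r->committed == NULL;
}

/* "Commit" publishes the fully built extent; fails if one exists */
bool commit(struct dax_region_state *r, struct region_extent_model *ext)
{
	if (!can_add(r))
		return false;
	r->committed = ext;
	return true;
}

/* Release drops the whole capacity, re-opening the region for adds */
void release_all(struct dax_region_state *r)
{
	r->committed = NULL;
}
```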

[anisa] enforce tagged release

Upon receiving any DC release event, check that the extent is within the
bounds of a cxl_dax_region and its tag matches the region's tag.

Then release the entire capacity of the region by removing the
cxlr_dax->region_extent.

Delete the functions that removed individual decoder_extents and
recalculated the region_extent HPA range, as partial release is no
longer supported.
---
 drivers/cxl/core/Makefile     |   2 +-
 drivers/cxl/core/core.h       |  18 +-
 drivers/cxl/core/extent.c     | 368 +++++++++++++++++++++++++++++++
 drivers/cxl/core/mbox.c       | 401 +++++++++++++++++++++++++++++++++-
 drivers/cxl/core/region_dax.c |   6 +
 drivers/cxl/cxl.h             |  55 ++++-
 drivers/cxl/cxlmem.h          |  30 +++
 include/cxl/event.h           |  39 ++++
 tools/testing/cxl/Kbuild      |   5 +-
 9 files changed, 919 insertions(+), 5 deletions(-)
 create mode 100644 drivers/cxl/core/extent.c

diff --git a/drivers/cxl/core/Makefile b/drivers/cxl/core/Makefile
index ce7213818d3c..208917ad8aac 100644
--- a/drivers/cxl/core/Makefile
+++ b/drivers/cxl/core/Makefile
@@ -15,7 +15,7 @@ cxl_core-y += hdm.o
 cxl_core-y += pmu.o
 cxl_core-y += cdat.o
 cxl_core-$(CONFIG_TRACING) += trace.o
-cxl_core-$(CONFIG_CXL_REGION) += region.o region_pmem.o region_dax.o
+cxl_core-$(CONFIG_CXL_REGION) += region.o region_pmem.o region_dax.o extent.o
 cxl_core-$(CONFIG_CXL_MCE) += mce.o
 cxl_core-$(CONFIG_CXL_FEATURES) += features.o
 cxl_core-$(CONFIG_CXL_EDAC_MEM_FEATURES) += edac.o
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index 14723cfd05f0..14d91dd52b02 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -64,13 +64,28 @@ u64 cxl_dpa_to_hpa(struct cxl_region *cxlr, const struct cxl_memdev *cxlmd,
 		   u64 dpa);
 int devm_cxl_add_dax_region(struct cxl_region *cxlr);
 int devm_cxl_add_pmem_region(struct cxl_region *cxlr);
-
+int cxl_add_extent(struct cxl_memdev_state *mds, struct cxl_extent *extent);
+int cxl_rm_extent(struct cxl_memdev_state *mds, struct cxl_extent *extent);
+int online_region_extent(struct region_extent *region_extent);
 #else
 static inline u64 cxl_dpa_to_hpa(struct cxl_region *cxlr,
 				 const struct cxl_memdev *cxlmd, u64 dpa)
 {
 	return ULLONG_MAX;
 }
+static inline int cxl_add_extent(struct cxl_memdev_state *mds,
+				   struct cxl_extent *extent) {
+	return 0;
+}
+static inline int cxl_rm_extent(struct cxl_memdev_state *mds,
+				struct cxl_extent *extent)
+{
+	return 0;
+}
+static inline int online_region_extent(struct region_extent *region_extent)
+{
+	return 0;
+}
 static inline
 struct cxl_region *cxl_dpa_to_region(const struct cxl_memdev *cxlmd, u64 dpa,
 				     struct cxl_endpoint_decoder **cxled)
@@ -166,6 +181,7 @@ long cxl_pci_get_latency(struct pci_dev *pdev);
 int cxl_pci_get_bandwidth(struct pci_dev *pdev, struct access_coordinate *c);
 int cxl_port_get_switch_dport_bandwidth(struct cxl_port *port,
 					struct access_coordinate *c);
+void memdev_release_extent(struct cxl_memdev_state *mds, struct range *range);
 
 static inline struct device *port_to_host(struct cxl_port *port)
 {
diff --git a/drivers/cxl/core/extent.c b/drivers/cxl/core/extent.c
new file mode 100644
index 000000000000..ecdd7717f7d7
--- /dev/null
+++ b/drivers/cxl/core/extent.c
@@ -0,0 +1,368 @@
+// SPDX-License-Identifier: GPL-2.0
+/*  Copyright(c) 2024 Intel Corporation. All rights reserved. */
+
+#include <linux/device.h>
+#include <cxl.h>
+
+#include "core.h"
+
+static void cxled_release_extent(struct cxl_endpoint_decoder *cxled,
+				 struct cxled_extent *ed_extent)
+{
+	struct cxl_memdev_state *mds = cxled_to_mds(cxled);
+	struct device *dev = &cxled->cxld.dev;
+
+	dev_dbg(dev, "Remove extent %pra (%pU)\n",
+		&ed_extent->dpa_range, &ed_extent->uuid);
+	memdev_release_extent(mds, &ed_extent->dpa_range);
+	kfree(ed_extent);
+}
+
+static void free_region_extent(struct region_extent *region_extent)
+{
+	struct cxled_extent *ed_extent;
+	unsigned long index;
+
+	/*
+	 * Remove from each endpoint decoder the extent which backs this region
+	 * extent
+	 */
+	xa_for_each(&region_extent->decoder_extents, index, ed_extent)
+		cxled_release_extent(ed_extent->cxled, ed_extent);
+	xa_destroy(&region_extent->decoder_extents);
+	region_extent->cxlr_dax->region_extent = NULL;
+	kfree(region_extent);
+}
+
+static void region_extent_release(struct device *dev)
+{
+	struct region_extent *region_extent = to_region_extent(dev);
+
+	free_region_extent(region_extent);
+}
+
+static const struct device_type region_extent_type = {
+	.name = "extent",
+	.release = region_extent_release,
+};
+
+bool is_region_extent(struct device *dev)
+{
+	return dev->type == &region_extent_type;
+}
+EXPORT_SYMBOL_NS_GPL(is_region_extent, "CXL");
+
+static void region_extent_unregister(void *ext)
+{
+	struct region_extent *region_extent = ext;
+
+	dev_dbg(&region_extent->dev, "DAX region rm extent HPA %pra\n",
+		&region_extent->hpa_range);
+	device_unregister(&region_extent->dev);
+}
+
+static void region_rm_extent(struct region_extent *region_extent)
+{
+	struct device *region_dev = region_extent->dev.parent;
+
+	devm_release_action(region_dev, region_extent_unregister, region_extent);
+}
+
+static struct region_extent *
+alloc_region_extent(struct cxl_dax_region *cxlr_dax, struct range *hpa_range,
+		    uuid_t *uuid)
+{
+	struct region_extent *region_extent __free(kfree) =
+				kzalloc(sizeof(*region_extent), GFP_KERNEL);
+	if (!region_extent)
+		return ERR_PTR(-ENOMEM);
+
+	region_extent->hpa_range = *hpa_range;
+	region_extent->cxlr_dax = cxlr_dax;
+	uuid_copy(&region_extent->uuid, uuid);
+	region_extent->dev.id = 0;
+	xa_init(&region_extent->decoder_extents);
+	return no_free_ptr(region_extent);
+}
+
+int online_region_extent(struct region_extent *region_extent)
+{
+	struct cxl_dax_region *cxlr_dax = region_extent->cxlr_dax;
+	struct device *dev = &region_extent->dev;
+	int rc;
+
+	device_initialize(dev);
+	device_set_pm_not_required(dev);
+	dev->parent = &cxlr_dax->dev;
+	dev->type = &region_extent_type;
+	rc = dev_set_name(dev, "extent%d.%d", cxlr_dax->cxlr->id, dev->id);
+	if (rc)
+		goto err;
+
+	rc = device_add(dev);
+	if (rc)
+		goto err;
+
+	cxlr_dax->region_extent = region_extent;
+	dev_dbg(dev, "region extent HPA %pra\n", &region_extent->hpa_range);
+	return devm_add_action_or_reset(&cxlr_dax->dev, region_extent_unregister,
+					region_extent);
+
+err:
+	dev_err(&cxlr_dax->dev, "Failed to initialize region extent HPA %pra\n",
+		&region_extent->hpa_range);
+
+	put_device(dev);
+	return rc;
+}
+
+static bool extents_contain(struct cxl_dax_region *cxlr_dax,
+			    struct cxl_endpoint_decoder *cxled,
+			    struct range *new_range)
+{
+	struct region_extent *re = cxlr_dax->region_extent;
+	struct cxled_extent *entry;
+	unsigned long index;
+
+	if (!re)
+		return false;
+
+	xa_for_each(&re->decoder_extents, index, entry) {
+		if (cxled == entry->cxled &&
+		    range_contains(&entry->dpa_range, new_range))
+			return true;
+	}
+	return false;
+}
+
+static bool extents_overlap(struct cxl_dax_region *cxlr_dax,
+			    struct cxl_endpoint_decoder *cxled,
+			    struct range *new_range)
+{
+	struct region_extent *re = cxlr_dax->region_extent;
+	struct cxled_extent *entry;
+	unsigned long index;
+
+	if (!re)
+		return false;
+
+	xa_for_each(&re->decoder_extents, index, entry) {
+		if (cxled == entry->cxled &&
+		    range_overlaps(&entry->dpa_range, new_range))
+			return true;
+	}
+	return false;
+}
+
+static void calc_hpa_range(struct cxl_endpoint_decoder *cxled,
+			   struct cxl_dax_region *cxlr_dax,
+			   struct range *dpa_range,
+			   struct range *hpa_range)
+{
+	resource_size_t dpa_offset, hpa;
+
+	dpa_offset = dpa_range->start - cxled->dpa_res->start;
+	hpa = cxled->cxld.hpa_range.start + dpa_offset;
+
+	hpa_range->start = hpa - cxlr_dax->hpa_range.start;
+	hpa_range->end = hpa_range->start + range_len(dpa_range) - 1;
+}
+
+int cxl_rm_extent(struct cxl_memdev_state *mds, struct cxl_extent *extent)
+{
+	u64 start_dpa = le64_to_cpu(extent->start_dpa);
+	struct cxl_memdev *cxlmd = mds->cxlds.cxlmd;
+	struct cxl_endpoint_decoder *cxled;
+	struct range dpa_range, hpa_range;
+	struct cxl_dax_region *cxlr_dax;
+	struct region_extent *reg_ext;
+	struct cxl_region *cxlr;
+	uuid_t tag;
+
+	dpa_range = (struct range) {
+		.start = start_dpa,
+		.end = start_dpa + le64_to_cpu(extent->length) - 1,
+	};
+
+	guard(rwsem_read)(&cxl_rwsem.region);
+	cxlr = cxl_dpa_to_region(cxlmd, start_dpa, &cxled);
+	if (!cxlr) {
+		/*
+		 * No region can happen here for a few reasons:
+		 *
+		 * 1) Extents were accepted and the host crashed/rebooted
+		 *    leaving them in an accepted state.  On reboot the host
+		 *    has not yet created a region to own them.
+		 *
+		 * 2) Region destruction won the race with the device releasing
+		 *    all the extents.  Here the release will be a duplicate of
+		 *    the one sent via region destruction.
+		 *
+		 * 3) The device is confused and releasing extents for which no
+		 *    region ever existed.
+		 *
+		 * In all these cases make sure the device knows we are not
+		 * using this extent.
+		 */
+		memdev_release_extent(mds, &dpa_range);
+		return -ENXIO;
+	}
+
+	cxlr_dax = cxlr->cxlr_dax;
+	reg_ext = cxlr_dax->region_extent;
+	if (!reg_ext) {
+		dev_err(&cxlr->cxlr_dax->dev,
+			"no capacity has been added to the region\n");
+		return -ENXIO;
+	}
+
+	import_uuid(&tag, extent->uuid);
+	if (!uuid_equal(&tag, &reg_ext->uuid)) {
+		dev_err(&cxlr->cxlr_dax->dev,
+			"extent tag %pU doesn't match region tag %pU\n",
+			&tag, &reg_ext->uuid);
+		return -EINVAL;
+	}
+
+	calc_hpa_range(cxled, cxlr_dax, &dpa_range, &hpa_range);
+	if (!range_contains(&reg_ext->hpa_range, &hpa_range)) {
+		dev_err(&cxlr_dax->dev,
+			"extent HPA %pra exceeds region HPA %pra\n",
+			&hpa_range, &reg_ext->hpa_range);
+		return -EINVAL;
+	}
+
+	/* Release entire capacity of the region */
+	region_rm_extent(reg_ext);
+	return 0;
+}
+
+static int cxlr_add_extent(struct cxl_memdev_state *mds,
+			   struct cxl_dax_region *cxlr_dax,
+			   struct cxl_endpoint_decoder *cxled,
+			   struct cxled_extent *ed_extent)
+{
+	struct region_extent *reg_ext = mds->add_ctx.region_extent;
+	struct range hpa_range;
+	int rc;
+
+	calc_hpa_range(cxled, cxlr_dax, &ed_extent->dpa_range, &hpa_range);
+
+	if (reg_ext) {
+		/* Add decoder extent to existing region extent */
+		dev_dbg(&cxlr_dax->dev,
+			"Append decoder extent to region extent\n");
+		rc = xa_insert(&reg_ext->decoder_extents,
+			       ed_extent->dpa_range.start, ed_extent,
+			       GFP_KERNEL);
+		if (rc) {
+			kfree(ed_extent);
+			return rc;
+		}
+		reg_ext->hpa_range.start = min(reg_ext->hpa_range.start,
+					       hpa_range.start);
+		reg_ext->hpa_range.end = max(reg_ext->hpa_range.end,
+					     hpa_range.end);
+		return 0;
+	}
+
+	/* First decoder extent - create new region extent */
+	dev_dbg(&cxlr_dax->dev, "Alloc new region_extent\n");
+	reg_ext = alloc_region_extent(cxlr_dax, &hpa_range,
+				      &ed_extent->uuid);
+	if (IS_ERR(reg_ext)) {
+		kfree(ed_extent);
+		return PTR_ERR(reg_ext);
+	}
+
+	rc = xa_insert(&reg_ext->decoder_extents,
+		       ed_extent->dpa_range.start, ed_extent, GFP_KERNEL);
+	if (rc) {
+		free_region_extent(reg_ext);
+		kfree(ed_extent);
+		return rc;
+	}
+
+	/* Publish only a fully set up extent, never an ERR_PTR */
+	mds->add_ctx.region_extent = reg_ext;
+	return 0;
+}
+
+/* Callers are expected to ensure cxled has been attached to a region */
+int cxl_add_extent(struct cxl_memdev_state *mds, struct cxl_extent *extent)
+{
+	u64 start_dpa = le64_to_cpu(extent->start_dpa);
+	struct cxl_memdev *cxlmd = mds->cxlds.cxlmd;
+	struct region_extent *pending_region_ext = mds->add_ctx.region_extent;
+	struct cxl_endpoint_decoder *cxled;
+	struct range ed_range, ext_range;
+	struct cxl_dax_region *cxlr_dax;
+	struct cxled_extent *ed_extent;
+	struct cxl_region *cxlr;
+	struct device *dev;
+
+	ext_range = (struct range) {
+		.start = start_dpa,
+		.end = start_dpa + le64_to_cpu(extent->length) - 1,
+	};
+
+	guard(rwsem_read)(&cxl_rwsem.region);
+	cxlr = cxl_dpa_to_region(cxlmd, start_dpa, &cxled);
+	if (!cxlr)
+		return -ENXIO;
+
+	cxlr_dax = cxlr->cxlr_dax;
+	/* Cannot add to a region_extent once it's been onlined */
+	if (cxlr_dax->region_extent) {
+		dev_err(&cxlr_dax->dev, "Can no longer add to region %d\n",
+			cxlr->id);
+		return -EINVAL;
+	}
+
+	if (pending_region_ext &&
+	    !uuid_equal((uuid_t *)extent->uuid, &pending_region_ext->uuid)) {
+		return -EINVAL;
+	}
+
+	dev = &cxled->cxld.dev;
+	ed_range = (struct range) {
+		.start = cxled->dpa_res->start,
+		.end = cxled->dpa_res->end,
+	};
+
+	dev_dbg(&cxled->cxld.dev, "Checking ED (%pr) for extent %pra\n",
+		cxled->dpa_res, &ext_range);
+
+	if (!range_contains(&ed_range, &ext_range)) {
+		dev_err_ratelimited(dev,
+				    "DC extent DPA %pra (%pU) is not fully in ED %pra\n",
+				    &ext_range, extent->uuid, &ed_range);
+		return -ENXIO;
+	}
+
+	/*
+	 * Allowing duplicates or extents which are already in an accepted
+	 * range simplifies extent processing, especially when dealing with the
+	 * cxl dax driver scanning for existing extents.
+	 */
+	if (extents_contain(cxlr_dax, cxled, &ext_range)) {
+		dev_warn_ratelimited(dev, "Extent %pra exists; accept again\n",
+				     &ext_range);
+		return 0;
+	}
+
+	if (extents_overlap(cxlr_dax, cxled, &ext_range))
+		return -ENXIO;
+
+	ed_extent = kzalloc(sizeof(*ed_extent), GFP_KERNEL);
+	if (!ed_extent)
+		return -ENOMEM;
+
+	ed_extent->cxled = cxled;
+	ed_extent->dpa_range = ext_range;
+	import_uuid(&ed_extent->uuid, extent->uuid);
+
+	dev_dbg(dev, "Add extent %pra (%pU)\n", &ed_extent->dpa_range, &ed_extent->uuid);
+
+	return cxlr_add_extent(mds, cxlr_dax, cxled, ed_extent);
+}
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 01b1a318f34f..56d8b2a9974c 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -5,6 +5,8 @@
 #include <linux/ktime.h>
 #include <linux/mutex.h>
 #include <linux/unaligned.h>
+#include <linux/list.h>
+#include <linux/list_sort.h>
 #include <cxlpci.h>
 #include <cxlmem.h>
 #include <cxl.h>
@@ -936,6 +938,74 @@ int cxl_enumerate_cmds(struct cxl_memdev_state *mds)
 }
 EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, "CXL");
 
+static int cxl_validate_extent_partition(struct cxl_memdev_state *mds,
+					 struct cxl_extent *extent,
+					 struct range *ext_range)
+{
+	struct cxl_dev_state *cxlds = &mds->cxlds;
+	struct device *dev = mds->cxlds.dev;
+
+	/* Extents must be within the DC partition boundary */
+	for (int i = 0; i < cxlds->nr_partitions; i++) {
+		struct cxl_dpa_partition *part = &cxlds->part[i];
+		struct range partition_range = {
+			.start = part->res.start,
+			.end = part->res.end,
+		};
+
+		if (part->mode != CXL_PARTMODE_DYNAMIC_RAM_A)
+			continue;
+
+		if (range_contains(&partition_range, ext_range)) {
+			dev_dbg(dev, "DC extent DPA %pra (DCR:%pra)(%pU)\n",
+				ext_range, &partition_range, extent->uuid);
+			return 0;
+		}
+	}
+
+	dev_err_ratelimited(dev,
+			    "DC extent DPA %pra (%pU) is not in a valid DC partition\n",
+			    ext_range, extent->uuid);
+	return -ENXIO;
+}
+
+/*
+ * Check that the extent has a non-null tag, is not shared
+ * (shared_extn_seq == 0), and lies within the bounds of a DC partition.
+ *
+ * Region membership, tag consistency with other pending extents, and
+ * overlap with already-accepted extents are checked by cxl_add_extent().
+ */
+static int cxl_validate_extent(struct cxl_memdev_state *mds,
+			       struct cxl_extent_list_node *pos)
+{
+	struct device *dev = mds->cxlds.dev;
+	struct cxl_extent *extent = pos->extent;
+	struct range ext_range = (struct range) {
+		.start = le64_to_cpu(extent->start_dpa),
+		.end = le64_to_cpu(extent->start_dpa) +
+			le64_to_cpu(extent->length) - 1,
+	};
+	uuid_t *uuid = (uuid_t *)extent->uuid;
+
+	if (uuid_is_null(uuid)) {
+		dev_dbg(dev, "no tag for extent: %pra\n", &ext_range);
+		return -EINVAL;
+	}
+
+	if (le16_to_cpu(extent->shared_extn_seq) != 0) {
+		dev_dbg(dev,
+			"DC extent DPA %pra (%pU) can not be shared\n",
+			&ext_range, uuid);
+		return -ENXIO;
+	}
+
+	if (cxl_validate_extent_partition(mds, extent, &ext_range))
+		return -ENXIO;
+
+	return 0;
+}
+
 void cxl_event_trace_record(struct cxl_memdev *cxlmd,
 			    enum cxl_event_log_type type,
 			    enum cxl_event_type event_type,
@@ -1102,6 +1172,324 @@ static int cxl_clear_event_record(struct cxl_memdev_state *mds,
 	return rc;
 }
 
+static int send_one_response(struct cxl_mailbox *cxl_mbox,
+			     struct cxl_mbox_dc_response *response,
+			     int opcode, u32 extent_list_size, u8 flags)
+{
+	struct cxl_mbox_cmd mbox_cmd = (struct cxl_mbox_cmd) {
+		.opcode = opcode,
+		.size_in = struct_size(response, extent_list, extent_list_size),
+		.payload_in = response,
+	};
+
+	response->extent_list_size = cpu_to_le32(extent_list_size);
+	response->flags = flags;
+	return cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
+}
+
+static int cxl_send_dc_response(struct cxl_memdev_state *mds, int opcode,
+				struct list_head *extent_list, int cnt)
+{
+	struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
+	struct cxl_mbox_dc_response *p;
+	struct cxl_extent_list_node *pos, *tmp;
+	struct cxl_extent *extent;
+	u32 pl_index;
+
+	size_t pl_size = struct_size(p, extent_list, cnt);
+	u32 max_extents = cnt;
+
+	/* The response may need to be split using the More flag. */
+	if (pl_size > cxl_mbox->payload_size) {
+		max_extents = (cxl_mbox->payload_size - sizeof(*p)) /
+			      sizeof(struct updated_extent_list);
+		pl_size = struct_size(p, extent_list, max_extents);
+	}
+
+	struct cxl_mbox_dc_response *response __free(kfree) =
+						kzalloc(pl_size, GFP_KERNEL);
+	if (!response)
+		return -ENOMEM;
+
+	if (cnt == 0)
+		return send_one_response(cxl_mbox, response, opcode, 0, 0);
+
+	pl_index = 0;
+	list_for_each_entry_safe(pos, tmp, extent_list, list) {
+		extent = pos->extent;
+		response->extent_list[pl_index].dpa_start = extent->start_dpa;
+		response->extent_list[pl_index].length = extent->length;
+		pl_index++;
+
+		if (pl_index == max_extents) {
+			u8 flags = 0;
+			int rc;
+
+			if (pl_index < cnt)
+				flags |= CXL_DCD_EVENT_MORE;
+			rc = send_one_response(cxl_mbox, response, opcode,
+					       pl_index, flags);
+			if (rc)
+				return rc;
+			cnt -= pl_index;
+			if (cnt < max_extents)
+				max_extents = cnt;
+			pl_index = 0;
+		}
+	}
+
+	if (!pl_index) /* nothing more to do */
+		return 0;
+	return send_one_response(cxl_mbox, response, opcode, pl_index, 0);
+}
+
+static void delete_extent_node(struct cxl_extent_list_node *node)
+{
+	list_del(&node->list);
+	kfree(node->extent);
+	kfree(node);
+}
+
+void memdev_release_extent(struct cxl_memdev_state *mds, struct range *range)
+{
+	struct device *dev = mds->cxlds.dev;
+	struct cxl_extent_list_node *node;
+	LIST_HEAD(extent_list);
+
+	dev_dbg(dev, "Release response dpa %pra\n", range);
+
+	node = kzalloc(sizeof(*node), GFP_KERNEL);
+	if (!node)
+		return;
+
+	node->extent = kzalloc(sizeof(*node->extent), GFP_KERNEL);
+	if (!node->extent) {
+		kfree(node);
+		return;
+	}
+
+	node->extent->start_dpa = cpu_to_le64(range->start);
+	node->extent->length = cpu_to_le64(range_len(range));
+	list_add_tail(&node->list, &extent_list);
+
+	if (cxl_send_dc_response(mds, CXL_MBOX_OP_RELEASE_DC, &extent_list, 1))
+		dev_dbg(dev, "Failed to release %pra\n", range);
+
+	delete_extent_node(node);
+}
+
+static void clear_pending_extents(void *_mds)
+{
+	struct cxl_memdev_state *mds = _mds;
+	struct cxl_extent_list_node *pos, *tmp;
+
+	list_for_each_entry_safe(pos, tmp, &mds->add_ctx.pending_extents, list)
+		delete_extent_node(pos);
+	mds->add_ctx.region_extent = NULL;
+}
+
+static int dpa_compare(void *priv,
+		       const struct list_head *a,
+		       const struct list_head *b)
+{
+	const struct cxl_extent_list_node *ea =
+		list_entry(a, struct cxl_extent_list_node, list);
+	const struct cxl_extent_list_node *eb =
+		list_entry(b, struct cxl_extent_list_node, list);
+	u64 a_dpa = le64_to_cpu(ea->extent->start_dpa);
+	u64 b_dpa = le64_to_cpu(eb->extent->start_dpa);
+
+	if (a_dpa < b_dpa)
+		return -1;
+	if (a_dpa > b_dpa)
+		return 1;
+
+	return 0;
+}
+
+static int idx_compare(void *priv,
+		       const struct list_head *a,
+		       const struct list_head *b)
+{
+	const struct cxl_extent_list_node *ea =
+		list_entry(a, struct cxl_extent_list_node, list);
+	const struct cxl_extent_list_node *eb =
+		list_entry(b, struct cxl_extent_list_node, list);
+
+	if (ea->idx < eb->idx)
+		return -1;
+	if (ea->idx > eb->idx)
+		return 1;
+
+	return 0;
+}
+
+/*
+ * Validate and add contiguous extents. Removes invalid, non-contiguous, or
+ * mismatched extents from pending_list. Sorts by DPA for processing, then
+ * restores original order for response.
+ */
+static int cxl_add_pending(struct cxl_memdev_state *mds)
+{
+	struct device *dev = mds->cxlds.dev;
+	struct cxl_extent_list_node *pos, *tmp;
+	struct cxl_extent *extent;
+	u64 prev_end = 0, start, len;
+	int cnt = 0, rc;
+
+	list_sort(NULL, &mds->add_ctx.pending_extents, dpa_compare);
+	list_for_each_entry_safe(pos, tmp, &mds->add_ctx.pending_extents, list) {
+		extent = pos->extent;
+		start = le64_to_cpu(extent->start_dpa);
+		len = le64_to_cpu(extent->length);
+
+		/* Start enforcing contiguity after accepting first extent */
+		if (cnt && start != prev_end) {
+			dev_dbg(dev,
+				"Non-contiguous extent DPA:%#llx LEN:%#llx\n",
+				start, len);
+			delete_extent_node(pos);
+			continue;
+		}
+
+		if (cxl_validate_extent(mds, pos)) {
+			delete_extent_node(pos);
+			continue;
+		}
+
+		if (cxl_add_extent(mds, extent)) {
+			dev_dbg(dev,
+				"Failed to add extent DPA:%#llx LEN:%#llx\n",
+				start, len);
+			delete_extent_node(pos);
+			continue;
+		}
+
+		prev_end = start + len;
+		cnt++;
+	}
+
+	/* No extents accepted: tell the device with an empty response */
+	if (!mds->add_ctx.region_extent) {
+		dev_dbg(dev, "No valid extents in list; accept none\n");
+		return cxl_send_dc_response(mds, CXL_MBOX_OP_ADD_DC_RESPONSE,
+					    &mds->add_ctx.pending_extents, 0);
+	}
+
+	/* device model handles freeing region_extent */
+	rc = online_region_extent(mds->add_ctx.region_extent);
+	if (rc)
+		return rc;
+
+	/* Restore remaining extents to original order and send rsp */
+	list_sort(NULL, &mds->add_ctx.pending_extents, idx_compare);
+	return cxl_send_dc_response(mds, CXL_MBOX_OP_ADD_DC_RESPONSE,
+				    &mds->add_ctx.pending_extents, cnt);
+}
+
+static int add_to_pending_list(struct list_head *pending_list,
+			       struct cxl_extent *to_add)
+{
+	struct cxl_extent_list_node *node, *prev;
+	struct cxl_extent *extent;
+
+	node = kzalloc(sizeof(*node), GFP_KERNEL);
+	if (!node)
+		return -ENOMEM;
+	extent = kmemdup(to_add, sizeof(*extent), GFP_KERNEL);
+	if (!extent) {
+		kfree(node);
+		return -ENOMEM;
+	}
+
+	node->extent = extent;
+	list_add_tail(&node->list, pending_list);
+
+	/*
+	 * The list is sorted by DPA before processing. Save the original index
+	 * so the order can be restored when sending the DC response, as
+	 * required by the spec.
+	 */
+	if (list_is_first(&node->list, pending_list)) {
+		node->idx = 0;
+	} else {
+		prev = list_prev_entry(node, list);
+		node->idx = prev->idx + 1;
+	}
+
+	return 0;
+}
+
+static int handle_add_event(struct cxl_memdev_state *mds,
+			    struct cxl_event_dcd *event)
+{
+	struct device *dev = mds->cxlds.dev;
+	int rc;
+
+	rc = add_to_pending_list(&mds->add_ctx.pending_extents, &event->extent);
+	if (rc)
+		return rc;
+
+	if (event->flags & CXL_DCD_EVENT_MORE) {
+		dev_dbg(dev, "more bit set; delay the surfacing of extent\n");
+		return 0;
+	}
+
+	rc = cxl_add_pending(mds);
+	clear_pending_extents(mds);
+	return rc;
+}
+
+static char *cxl_dcd_evt_type_str(u8 type)
+{
+	switch (type) {
+	case DCD_ADD_CAPACITY:
+		return "add";
+	case DCD_RELEASE_CAPACITY:
+		return "release";
+	case DCD_FORCED_CAPACITY_RELEASE:
+		return "force release";
+	default:
+		break;
+	}
+
+	return "<unknown>";
+}
+
+static void cxl_handle_dcd_event_records(struct cxl_memdev_state *mds,
+					struct cxl_event_record_raw *raw_rec)
+{
+	struct cxl_event_dcd *event = &raw_rec->event.dcd;
+	struct cxl_extent *extent = &event->extent;
+	struct device *dev = mds->cxlds.dev;
+	uuid_t *id = &raw_rec->id;
+	int rc;
+
+	if (!uuid_equal(id, &CXL_EVENT_DC_EVENT_UUID))
+		return;
+
+	dev_dbg(dev, "DCD event %s : DPA:%#llx LEN:%#llx\n",
+		cxl_dcd_evt_type_str(event->event_type),
+		le64_to_cpu(extent->start_dpa), le64_to_cpu(extent->length));
+
+	switch (event->event_type) {
+	case DCD_ADD_CAPACITY:
+		rc = handle_add_event(mds, event);
+		break;
+	case DCD_RELEASE_CAPACITY:
+		rc = cxl_rm_extent(mds, &event->extent);
+		break;
+	case DCD_FORCED_CAPACITY_RELEASE:
+		dev_err_ratelimited(dev, "Forced release event ignored.\n");
+		rc = 0;
+		break;
+	default:
+		rc = -EINVAL;
+		break;
+	}
+
+	if (rc)
+		dev_err_ratelimited(dev, "dcd event failed: %d\n", rc);
+}
+
 static void cxl_mem_get_records_log(struct cxl_memdev_state *mds,
 				    enum cxl_event_log_type type)
 {
@@ -1138,9 +1526,13 @@ static void cxl_mem_get_records_log(struct cxl_memdev_state *mds,
 		if (!nr_rec)
 			break;
 
-		for (i = 0; i < nr_rec; i++)
+		for (i = 0; i < nr_rec; i++) {
 			__cxl_event_trace_record(cxlmd, type,
 						 &payload->records[i]);
+			if (type == CXL_EVENT_TYPE_DCD)
+				cxl_handle_dcd_event_records(mds,
+							&payload->records[i]);
+		}
 
 		if (payload->flags & CXL_GET_EVENT_FLAG_OVERFLOW)
 			trace_cxl_overflow(cxlmd, type, payload);
@@ -1172,6 +1564,8 @@ void cxl_mem_get_event_records(struct cxl_memdev_state *mds, u32 status)
 {
 	dev_dbg(mds->cxlds.dev, "Reading event logs: %x\n", status);
 
+	if (cxl_dcd_supported(mds) && (status & CXLDEV_EVENT_STATUS_DCD))
+		cxl_mem_get_records_log(mds, CXL_EVENT_TYPE_DCD);
 	if (status & CXLDEV_EVENT_STATUS_FATAL)
 		cxl_mem_get_records_log(mds, CXL_EVENT_TYPE_FATAL);
 	if (status & CXLDEV_EVENT_STATUS_FAIL)
@@ -1769,6 +2163,11 @@ struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev, u64 serial,
 	}
 
 	mutex_init(&mds->event.log_lock);
+	INIT_LIST_HEAD(&mds->add_ctx.pending_extents);
+
+	rc = devm_add_action_or_reset(dev, clear_pending_extents, mds);
+	if (rc)
+		return ERR_PTR(rc);
 
 	rc = devm_cxl_register_mce_notifier(dev, &mds->mce_notifier);
 	if (rc == -EOPNOTSUPP)
diff --git a/drivers/cxl/core/region_dax.c b/drivers/cxl/core/region_dax.c
index 068690d02c8c..b2853494aee2 100644
--- a/drivers/cxl/core/region_dax.c
+++ b/drivers/cxl/core/region_dax.c
@@ -90,6 +90,12 @@ int devm_cxl_add_dax_region(struct cxl_region *cxlr)
 	if (IS_ERR(cxlr_dax))
 		return PTR_ERR(cxlr_dax);
 
+	if (cxlr->mode == CXL_PARTMODE_DYNAMIC_RAM_A &&
+	    cxlr->params.interleave_ways != 1) {
+		dev_err(&cxlr->dev, "Interleaving DC not supported\n");
+		return -EINVAL;
+	}
+
 	dev = &cxlr_dax->dev;
 	rc = dev_set_name(dev, "dax_region%d", cxlr->id);
 	if (rc)
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 1297594beaec..477e91bc2ab7 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -13,6 +13,10 @@
 #include <linux/io.h>
 #include <linux/range.h>
 #include <cxl/cxl.h>
+#include <linux/xarray.h>
+
+/* FIXME needed? */
+#include <cxl/event.h>
 
 extern const struct nvdimm_security_ops *cxl_security_ops;
 
@@ -180,11 +184,13 @@ static inline int ways_to_eiw(unsigned int ways, u8 *eiw)
 #define CXLDEV_EVENT_STATUS_WARN		BIT(1)
 #define CXLDEV_EVENT_STATUS_FAIL		BIT(2)
 #define CXLDEV_EVENT_STATUS_FATAL		BIT(3)
+#define CXLDEV_EVENT_STATUS_DCD			BIT(4)
 
 #define CXLDEV_EVENT_STATUS_ALL (CXLDEV_EVENT_STATUS_INFO |	\
 				 CXLDEV_EVENT_STATUS_WARN |	\
 				 CXLDEV_EVENT_STATUS_FAIL |	\
-				 CXLDEV_EVENT_STATUS_FATAL)
+				 CXLDEV_EVENT_STATUS_FATAL |	\
+				 CXLDEV_EVENT_STATUS_DCD)
 
 /* CXL rev 3.0 section 8.2.9.2.4; Table 8-52 */
 #define CXLDEV_EVENT_INT_MODE_MASK	GENMASK(1, 0)
@@ -306,6 +312,18 @@ enum cxl_decoder_state {
 	CXL_DECODER_STATE_AUTO_STAGED,
 };
 
+/**
+ * struct cxled_extent - Extent within an endpoint decoder
+ * @cxled: Reference to the endpoint decoder
+ * @dpa_range: DPA range this extent covers within the decoder
+ * @uuid: uuid from device for this extent
+ */
+struct cxled_extent {
+	struct cxl_endpoint_decoder *cxled;
+	struct range dpa_range;
+	uuid_t uuid;
+};
+
 /**
  * struct cxl_endpoint_decoder - Endpoint  / SPA to DPA decoder
  * @cxld: base cxl_decoder_object
@@ -457,6 +475,7 @@ struct cxl_region_params {
  * @type: Endpoint decoder target type
  * @cxl_nvb: nvdimm bridge for coordinating @cxlr_pmem setup / shutdown
  * @cxlr_pmem: (for pmem regions) cached copy of the nvdimm bridge
+ * @cxlr_dax: (for DC regions) cached copy of the CXL DAX region
  * @flags: Region state flags
  * @params: active + config params for the region
  * @coord: QoS access coordinates for the region
@@ -472,6 +491,7 @@ struct cxl_region {
 	enum cxl_decoder_type type;
 	struct cxl_nvdimm_bridge *cxl_nvb;
 	struct cxl_pmem_region *cxlr_pmem;
+	struct cxl_dax_region *cxlr_dax;
 	unsigned long flags;
 	struct cxl_region_params params;
 	struct access_coordinate coord[ACCESS_COORDINATE_MAX];
@@ -518,12 +538,45 @@ struct cxl_pmem_region {
 	struct cxl_pmem_region_mapping mapping[];
 };
 
+/* See CXL 3.1 8.2.9.2.1.6 */
+enum dc_event {
+	DCD_ADD_CAPACITY,
+	DCD_RELEASE_CAPACITY,
+	DCD_FORCED_CAPACITY_RELEASE,
+	DCD_REGION_CONFIGURATION_UPDATED,
+};
+
 struct cxl_dax_region {
 	struct device dev;
 	struct cxl_region *cxlr;
 	struct range hpa_range;
+	struct region_extent *region_extent;
+};
+
+/**
+ * struct region_extent - CXL DAX region extent
+ * @dev: device representing this extent
+ * @cxlr_dax: back reference to parent region device
+ * @hpa_range: HPA range of this extent
+ * @uuid: uuid of the extent
+ * @decoder_extents: Endpoint decoder extents which make up this region extent
+ */
+struct region_extent {
+	struct device dev;
+	struct cxl_dax_region *cxlr_dax;
+	struct range hpa_range;
+	uuid_t uuid;
+	struct xarray decoder_extents;
 };
 
+bool is_region_extent(struct device *dev);
+static inline struct region_extent *to_region_extent(struct device *dev)
+{
+	if (!is_region_extent(dev))
+		return NULL;
+	return container_of(dev, struct region_extent, dev);
+}
+
 /**
  * struct cxl_port - logical collection of upstream port devices and
  *		     downstream port devices to construct a CXL memory
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 88bcc69f22c2..6da958352afd 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -7,6 +7,7 @@
 #include <linux/cdev.h>
 #include <linux/uuid.h>
 #include <linux/node.h>
+#include <linux/list.h>
 #include <cxl/event.h>
 #include <cxl/mailbox.h>
 #include "cxl.h"
@@ -415,6 +416,7 @@ static inline struct cxl_dev_state *mbox_to_cxlds(struct cxl_mailbox *cxl_mbox)
  * @active_volatile_bytes: sum of hard + soft volatile
  * @active_persistent_bytes: sum of hard + soft persistent
  * @dcd_supported: all DCD commands are supported
+ * @add_ctx: pending Add Capacity context (extent list, region extent)
  * @event: event log driver state
  * @poison: poison driver state info
  * @security: security driver state info
@@ -435,6 +437,10 @@ struct cxl_memdev_state {
 	u64 active_volatile_bytes;
 	u64 active_persistent_bytes;
 	bool dcd_supported;
+	struct pending_add_ctx {
+		struct list_head pending_extents;
+		struct region_extent *region_extent;
+	} add_ctx;
 
 	struct cxl_event_state event;
 	struct cxl_poison_state poison;
@@ -511,6 +517,21 @@ enum cxl_opcode {
 	UUID_INIT(0x5e1819d9, 0x11a9, 0x400c, 0x81, 0x1f, 0xd6, 0x07, 0x19,     \
 		  0x40, 0x3d, 0x86)
 
+/*
+ * Add Dynamic Capacity Response
+ * CXL rev 3.1 section 8.2.9.9.9.3; Table 8-168 & Table 8-169
+ */
+struct cxl_mbox_dc_response {
+	__le32 extent_list_size;
+	u8 flags;
+	u8 reserved[3];
+	struct updated_extent_list {
+		__le64 dpa_start;
+		__le64 length;
+		u8 reserved[8];
+	} __packed extent_list[] __counted_by(extent_list_size);
+} __packed;
+
 struct cxl_mbox_get_supported_logs {
 	__le16 entries;
 	u8 rsvd[6];
@@ -581,6 +602,14 @@ struct cxl_mbox_identify {
 	UUID_INIT(0xe71f3a40, 0x2d29, 0x4092, 0x8a, 0x39, 0x4d, 0x1c, 0x96, \
 		  0x6c, 0x7c, 0x65)
 
+/*
+ * Dynamic Capacity Event Record
+ * CXL rev 3.1 section 8.2.9.2.1; Table 8-43
+ */
+#define CXL_EVENT_DC_EVENT_UUID                                             \
+	UUID_INIT(0xca95afa7, 0xf183, 0x4018, 0x8c, 0x2f, 0x95, 0x26, 0x8e, \
+		  0x10, 0x1a, 0x2a)
+
 /*
  * Get Event Records output payload
  * CXL rev 3.0 section 8.2.9.2.2; Table 8-50
@@ -606,6 +635,7 @@ enum cxl_event_log_type {
 	CXL_EVENT_TYPE_WARN,
 	CXL_EVENT_TYPE_FAIL,
 	CXL_EVENT_TYPE_FATAL,
+	CXL_EVENT_TYPE_DCD,
 	CXL_EVENT_TYPE_MAX
 };
 
diff --git a/include/cxl/event.h b/include/cxl/event.h
index ff97fea718d2..fbd95e381e41 100644
--- a/include/cxl/event.h
+++ b/include/cxl/event.h
@@ -6,6 +6,7 @@
 #include <linux/types.h>
 #include <linux/uuid.h>
 #include <linux/workqueue_types.h>
+#include <linux/list.h>
 
 /*
  * Common Event Record Format
@@ -141,12 +142,50 @@ struct cxl_event_mem_sparing {
 	u8 reserved2[0x25];
 } __packed;
 
+/*
+ * CXL rev 3.1 section 8.2.9.2.1.6; Table 8-51
+ */
+struct cxl_extent {
+	__le64 start_dpa;
+	__le64 length;
+	u8 uuid[UUID_SIZE];
+	__le16 shared_extn_seq;
+	u8 reserved[0x6];
+} __packed;
+
+struct cxl_extent_list_node {
+	struct cxl_extent *extent;
+	struct list_head list;
+	int idx;
+	int rid;
+};
+
+/*
+ * Dynamic Capacity Event Record
+ * CXL rev 3.1 section 8.2.9.2.1.6; Table 8-50
+ */
+#define CXL_DCD_EVENT_MORE			BIT(0)
+struct cxl_event_dcd {
+	struct cxl_event_record_hdr hdr;
+	u8 event_type;
+	u8 validity_flags;
+	__le16 host_id;
+	u8 partition_index;
+	u8 flags;
+	u8 reserved1[0x2];
+	struct cxl_extent extent;
+	u8 reserved2[0x18];
+	__le32 num_avail_extents;
+	__le32 num_avail_tags;
+} __packed;
+
 union cxl_event {
 	struct cxl_event_generic generic;
 	struct cxl_event_gen_media gen_media;
 	struct cxl_event_dram dram;
 	struct cxl_event_mem_module mem_module;
 	struct cxl_event_mem_sparing mem_sparing;
+	struct cxl_event_dcd dcd;
 	/* dram & gen_media event header */
 	struct cxl_event_media_hdr media_hdr;
 } __packed;
diff --git a/tools/testing/cxl/Kbuild b/tools/testing/cxl/Kbuild
index 2be1df80fcc9..8941cf187462 100644
--- a/tools/testing/cxl/Kbuild
+++ b/tools/testing/cxl/Kbuild
@@ -63,7 +63,10 @@ cxl_core-y += $(CXL_CORE_SRC)/hdm.o
 cxl_core-y += $(CXL_CORE_SRC)/pmu.o
 cxl_core-y += $(CXL_CORE_SRC)/cdat.o
 cxl_core-$(CONFIG_TRACING) += $(CXL_CORE_SRC)/trace.o
-cxl_core-$(CONFIG_CXL_REGION) += $(CXL_CORE_SRC)/region.o $(CXL_CORE_SRC)/region_pmem.o $(CXL_CORE_SRC)/region_dax.o
+cxl_core-$(CONFIG_CXL_REGION) += $(CXL_CORE_SRC)/region.o \
+				 $(CXL_CORE_SRC)/region_pmem.o \
+				 $(CXL_CORE_SRC)/region_dax.o \
+				 $(CXL_CORE_SRC)/extent.o
 cxl_core-$(CONFIG_CXL_MCE) += $(CXL_CORE_SRC)/mce.o
 cxl_core-$(CONFIG_CXL_FEATURES) += $(CXL_CORE_SRC)/features.o
 cxl_core-$(CONFIG_CXL_EDAC_MEM_FEATURES) += $(CXL_CORE_SRC)/edac.o
-- 
2.43.0
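The pending-extent handling in cxl_add_pending() and add_to_pending_list() above (sort by DPA, accept one contiguous run, then restore arrival order for the response) can be modelled in user space. This is a minimal sketch under simplifying assumptions: `struct pending_ext` and `accept_contiguous()` are hypothetical names, DPAs are assumed to be in CPU byte order already, and the tag, partition and region checks done by cxl_validate_extent() and cxl_add_extent() are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for struct cxl_extent_list_node */
struct pending_ext {
	uint64_t start;		/* DPA, CPU byte order */
	uint64_t len;
	int idx;		/* arrival order, as in add_to_pending_list() */
	int accepted;
};

static int cmp_dpa(const void *a, const void *b)
{
	const struct pending_ext *x = a, *y = b;

	return (x->start > y->start) - (x->start < y->start);
}

static int cmp_idx(const void *a, const void *b)
{
	const struct pending_ext *x = a, *y = b;

	return (x->idx > y->idx) - (x->idx < y->idx);
}

/*
 * Accept the contiguous run that starts at the lowest DPA, reject
 * anything that leaves a gap or overlaps, then restore arrival order.
 * Returns the number of accepted extents.
 */
static int accept_contiguous(struct pending_ext *e, size_t n)
{
	uint64_t prev_end = 0;
	int cnt = 0;

	assert(e || n == 0);
	qsort(e, n, sizeof(*e), cmp_dpa);
	for (size_t i = 0; i < n; i++) {
		e[i].accepted = 0;
		if (cnt && e[i].start != prev_end)
			continue;	/* non-contiguous: rejected */
		e[i].accepted = 1;
		prev_end = e[i].start + e[i].len;
		cnt++;
	}
	qsort(e, n, sizeof(*e), cmp_idx);
	return cnt;
}
```

Because every node carries a unique idx, qsort() not being stable does not matter when restoring the original order; the kernel code uses list_sort(), which is a stable sort.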



* [PATCH 13/20] cxl/region/extent: Expose region extent information in sysfs
  2026-04-11  1:22 [RFC PATCH 00/20] DCD: Remove support for sparse regions & add tags Anisa Su
                   ` (11 preceding siblings ...)
  2026-04-11  1:23 ` [PATCH 12/20] cxl/extent: Process dynamic partition events and realize region extents Anisa Su
@ 2026-04-11  1:23 ` Anisa Su
  2026-04-11  1:23 ` [PATCH 14/20] dax/bus: Factor out dev dax resize logic Anisa Su
                   ` (9 subsequent siblings)
  22 siblings, 0 replies; 30+ messages in thread
From: Anisa Su @ 2026-04-11  1:23 UTC (permalink / raw)
  To: linux-cxl
  Cc: john, dave.jiang, gourry, dave, jonathan.cameron,
	alison.schofield, ira.weiny, dan.j.williams, Jonathan Cameron,
	Fan Ni

From: Ira Weiny <ira.weiny@intel.com>

Extent information can be helpful to the user to coordinate memory usage
with the external orchestrator and FM.

Expose the details of region extents by creating the following
sysfs entries.

        /sys/bus/cxl/devices/dax_regionX/extentX.Y
        /sys/bus/cxl/devices/dax_regionX/extentX.Y/offset
        /sys/bus/cxl/devices/dax_regionX/extentX.Y/length
        /sys/bus/cxl/devices/dax_regionX/extentX.Y/uuid
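As a rough illustration of consuming these attributes from user space, the sketch below reads and parses the offset and length files. It relies only on what this patch shows: both values are printed with "%#llx", so zero reads back as "0" and other values carry a "0x" prefix. parse_extent_attr() and read_extent_attr() are hypothetical helper names, not part of any existing library.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Parse an extent "offset" or "length" attribute.  The kernel prints
 * both with "%#llx\n", so zero reads back as "0" and any other value
 * has a "0x" prefix; strtoull() with base 0 accepts either form.
 */
static int parse_extent_attr(const char *buf, unsigned long long *val)
{
	char *end;

	assert(buf && val);
	errno = 0;
	*val = strtoull(buf, &end, 0);
	if (errno || end == buf || (*end != '\n' && *end != '\0'))
		return -EINVAL;
	return 0;
}

/* Read and parse one attribute file; returns 0 or a negative errno. */
static int read_extent_attr(const char *path, unsigned long long *val)
{
	char buf[64];
	FILE *f = fopen(path, "r");

	if (!f)
		return -errno;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -EIO;
	}
	fclose(f);
	return parse_extent_attr(buf, val);
}
```

A caller would pass a path such as /sys/bus/cxl/devices/dax_region0/extent0.0/length; the region and extent numbers there are only examples.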

Based on an original patch by Navneet Singh.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Tested-by: Fan Ni <fan.ni@samsung.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes:
[anisa: rebase]
---
 Documentation/ABI/testing/sysfs-bus-cxl | 36 +++++++++++++++
 drivers/cxl/core/extent.c               | 58 +++++++++++++++++++++++++
 2 files changed, 94 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
index 3080aef9ad67..38cf0a2894b9 100644
--- a/Documentation/ABI/testing/sysfs-bus-cxl
+++ b/Documentation/ABI/testing/sysfs-bus-cxl
@@ -661,3 +661,39 @@ Description:
 		The count is persistent across power loss and wraps back to 0
 		upon overflow. If this file is not present, the device does not
 		have the necessary support for dirty tracking.
+
+
+What:		/sys/bus/cxl/devices/dax_regionX/extentX.Y/offset
+Date:		May, 2025
+KernelVersion:	v6.16
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		(RO) [For Dynamic Capacity regions only] Offset of this
+		extent within the region.  Users can use the extent
+		information to create DAX devices on specific extents by
+		creating and destroying DAX devices in specific sequences
+		and examining the resulting mappings.
+
+
+What:		/sys/bus/cxl/devices/dax_regionX/extentX.Y/length
+Date:		May, 2025
+KernelVersion:	v6.16
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		(RO) [For Dynamic Capacity regions only] Length of this
+		extent.  Users can use the extent information to create DAX
+		devices on specific extents by creating and destroying DAX
+		devices in specific sequences and examining the resulting
+		mappings.
+
+
+What:		/sys/bus/cxl/devices/dax_regionX/extentX.Y/uuid
+Date:		May, 2025
+KernelVersion:	v6.16
+Contact:	linux-cxl@vger.kernel.org
+Description:
+		(RO) [For Dynamic Capacity regions only] Tag (UUID) of this
+		extent.  Users can use the extent information to create DAX
+		devices on specific extents by creating and destroying DAX
+		devices in specific sequences and examining the resulting
+		mappings.
diff --git a/drivers/cxl/core/extent.c b/drivers/cxl/core/extent.c
index ecdd7717f7d7..17fad51f084d 100644
--- a/drivers/cxl/core/extent.c
+++ b/drivers/cxl/core/extent.c
@@ -6,6 +6,63 @@
 
 #include "core.h"
 
+static ssize_t offset_show(struct device *dev, struct device_attribute *attr,
+			   char *buf)
+{
+	struct region_extent *region_extent = to_region_extent(dev);
+
+	return sysfs_emit(buf, "%#llx\n", region_extent->hpa_range.start);
+}
+static DEVICE_ATTR_RO(offset);
+
+static ssize_t length_show(struct device *dev, struct device_attribute *attr,
+			   char *buf)
+{
+	struct region_extent *region_extent = to_region_extent(dev);
+	u64 length = range_len(&region_extent->hpa_range);
+
+	return sysfs_emit(buf, "%#llx\n", length);
+}
+static DEVICE_ATTR_RO(length);
+
+static ssize_t uuid_show(struct device *dev, struct device_attribute *attr,
+			 char *buf)
+{
+	struct region_extent *region_extent = to_region_extent(dev);
+
+	return sysfs_emit(buf, "%pUb\n", &region_extent->uuid);
+}
+static DEVICE_ATTR_RO(uuid);
+
+static struct attribute *region_extent_attrs[] = {
+	&dev_attr_offset.attr,
+	&dev_attr_length.attr,
+	&dev_attr_uuid.attr,
+	NULL
+};
+
+static uuid_t empty_uuid = { 0 };
+
+static umode_t region_extent_visible(struct kobject *kobj,
+				     struct attribute *a, int n)
+{
+	struct device *dev = kobj_to_dev(kobj);
+	struct region_extent *region_extent = to_region_extent(dev);
+
+	if (a == &dev_attr_uuid.attr &&
+	    uuid_equal(&region_extent->uuid, &empty_uuid))
+		return 0;
+
+	return a->mode;
+}
+
+static const struct attribute_group region_extent_attribute_group = {
+	.attrs = region_extent_attrs,
+	.is_visible = region_extent_visible,
+};
+
+__ATTRIBUTE_GROUPS(region_extent_attribute);
+
 static void cxled_release_extent(struct cxl_endpoint_decoder *cxled,
 				 struct cxled_extent *ed_extent)
 {
@@ -44,6 +101,7 @@ static void region_extent_release(struct device *dev)
 static const struct device_type region_extent_type = {
 	.name = "extent",
 	.release = region_extent_release,
+	.groups = region_extent_attribute_groups,
 };
 
 bool is_region_extent(struct device *dev)
-- 
2.43.0



* [PATCH 14/20] dax/bus: Factor out dev dax resize logic
  2026-04-11  1:22 [RFC PATCH 00/20] DCD: Remove support for sparse regions & add tags Anisa Su
                   ` (12 preceding siblings ...)
  2026-04-11  1:23 ` [PATCH 13/20] cxl/region/extent: Expose region extent information in sysfs Anisa Su
@ 2026-04-11  1:23 ` Anisa Su
  2026-04-11  1:23 ` [PATCH 15/20] dax/region: Create resources on DAX regions Anisa Su
                   ` (8 subsequent siblings)
  22 siblings, 0 replies; 30+ messages in thread
From: Anisa Su @ 2026-04-11  1:23 UTC (permalink / raw)
  To: linux-cxl
  Cc: john, dave.jiang, gourry, dave, jonathan.cameron,
	alison.schofield, ira.weiny, dan.j.williams, Jonathan Cameron

From: Ira Weiny <ira.weiny@intel.com>

Dynamic Capacity regions must limit dev dax resources to those areas
which have extents backing real memory.  Such DAX regions are dubbed
'sparse' regions.  To manage where memory is available, four
alternatives were considered:

1) Create a single region resource child on region creation which
   reserves the entire region.  Then as extents are added punch holes in
   this reservation.  This requires new resource manipulation to punch
   the holes and still requires an additional iteration over the extent
   areas which may already have existing dev dax resources used.

2) Maintain an ordered xarray of extents which can be queried while
   processing the resize logic.  The issue is that existing region->res
   children may artificially limit the allocation size sent to
   alloc_dev_dax_range().  I.e., the resource children cannot be used
   directly in the resize logic to find free space in the region.  This
   also means the available size must be managed in two places.

3) Maintain a separate resource tree with extents.  This option is the
   same as 2) but with a different data structure.  Ideally there
   should be a single, unified representation of the resource tree
   rather than two places to look for space.

4) Create region resource children for each extent.  Manage the dax dev
   resize logic in the same way as before, but use a region child
   (extent) resource as the parent to find space within each extent.

Option 4 can leverage the existing resize algorithm to find space within
the extents.  It manages the available space in a singular resource tree
which is less complicated for finding space.
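To make the "single resource tree" point concrete, here is a minimal userspace sketch, assuming a simplified resource node. The struct sketch_res type and the helper names are hypothetical, not kernel APIs. Extents are children of the region and dev dax ranges are children of an extent, so the available space in a sparse region falls out of one tree walk:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical, simplified stand-in for struct resource: an inclusive
 * [start, end] range with a singly linked list of children.
 */
struct sketch_res {
	uint64_t start, end;
	struct sketch_res *child;   /* first child, sorted by start */
	struct sketch_res *sibling; /* next sibling */
};

static uint64_t res_size(const struct sketch_res *r)
{
	return r->end - r->start + 1;
}

/*
 * Free space inside one extent: its size minus the dev dax ranges
 * already carved out of it.
 */
static uint64_t extent_avail(const struct sketch_res *extent)
{
	uint64_t avail = res_size(extent);

	for (const struct sketch_res *used = extent->child; used;
	     used = used->sibling) {
		assert(res_size(used) <= avail);
		avail -= res_size(used);
	}
	return avail;
}

/*
 * In a sparse region the region's children are extents, so the
 * available space is the sum of the free space inside each extent.
 */
static uint64_t sparse_region_avail(const struct sketch_res *region)
{
	uint64_t total = 0;

	for (const struct sketch_res *ext = region->child; ext;
	     ext = ext->sibling)
		total += extent_avail(ext);
	return total;
}
```

With options 2) or 3), the same question would need a second structure kept in sync with the resource tree.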

In preparation for this change, factor out the dev_dax_resize logic.
For static regions use dax_region->res as the parent to find space for
the dax ranges.  Future patches will use the same algorithm with
individual extent resources as the parent.
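The shape of the refactor can be sketched in userspace C as follows. This is an illustration under stated assumptions, not the patch code: struct sketch_parent, resize_one() and resize() are hypothetical names, children are kept in a sorted array instead of a struct resource list, and the helper always adds a new range rather than extending an existing one as adjust_dev_dax_range() may. The helper works on whatever parent it is given and reports how much it allocated, and the caller retries until the request is satisfied:

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_USED 16

/*
 * Hypothetical stand-in for a parent struct resource: an inclusive
 * [start, end] range plus its used children, kept sorted by start.
 */
struct sketch_parent {
	uint64_t start, end;
	int nr_used;
	uint64_t used_start[SKETCH_MAX_USED];
	uint64_t used_end[SKETCH_MAX_USED];
};

/* Insert a used child at index @idx, shifting later children up. */
static void insert_used(struct sketch_parent *p, int idx, uint64_t s, uint64_t e)
{
	assert(p->nr_used < SKETCH_MAX_USED);
	for (int i = p->nr_used; i > idx; i--) {
		p->used_start[i] = p->used_start[i - 1];
		p->used_end[i] = p->used_end[i - 1];
	}
	p->used_start[idx] = s;
	p->used_end[idx] = e;
	p->nr_used++;
}

/*
 * Per-parent helper: take up to @to_alloc bytes from the first free gap
 * in @p and return how many bytes were taken (0 if @p is full).
 */
static uint64_t resize_one(struct sketch_parent *p, uint64_t to_alloc)
{
	uint64_t gap_start = p->start;

	assert(to_alloc > 0);
	for (int i = 0; i <= p->nr_used; i++) {
		/* The gap runs up to the next used child or the parent's end. */
		uint64_t limit = i < p->nr_used ? p->used_start[i] : p->end + 1;

		if (limit > gap_start) {
			uint64_t len = limit - gap_start;
			uint64_t alloc = len < to_alloc ? len : to_alloc;

			insert_used(p, i, gap_start, gap_start + alloc - 1);
			return alloc;
		}
		if (i < p->nr_used)
			gap_start = p->used_end[i] + 1;
	}
	return 0;
}

/* Caller: keep calling the helper until the whole request is placed. */
static int resize(struct sketch_parent *p, uint64_t to_alloc)
{
	while (to_alloc) {
		uint64_t alloc = resize_one(p, to_alloc);

		if (!alloc)
			return -1; /* out of space */
		to_alloc -= alloc;
	}
	return 0;
}
```

Because the helper only looks at the parent it is handed, passing an extent resource instead of the whole region is enough to confine allocations to that extent.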

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes:
[anisa: rebase]
---
 drivers/dax/bus.c | 130 ++++++++++++++++++++++++++++------------------
 1 file changed, 80 insertions(+), 50 deletions(-)

diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index e41e36747111..40f11fda66b5 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -858,11 +858,9 @@ static int devm_register_dax_mapping(struct dev_dax *dev_dax, int range_id)
 	return 0;
 }
 
-static int alloc_dev_dax_range(struct dev_dax *dev_dax, u64 start,
-		resource_size_t size)
+static int alloc_dev_dax_range(struct resource *parent, struct dev_dax *dev_dax,
+			       u64 start, resource_size_t size)
 {
-	struct dax_region *dax_region = dev_dax->region;
-	struct resource *res = &dax_region->res;
 	struct device *dev = &dev_dax->dev;
 	struct dev_dax_range *ranges;
 	unsigned long pgoff = 0;
@@ -880,14 +878,14 @@ static int alloc_dev_dax_range(struct dev_dax *dev_dax, u64 start,
 		return 0;
 	}
 
-	alloc = __request_region(res, start, size, dev_name(dev), 0);
+	alloc = __request_region(parent, start, size, dev_name(dev), 0);
 	if (!alloc)
 		return -ENOMEM;
 
 	ranges = krealloc(dev_dax->ranges, sizeof(*ranges)
 			* (dev_dax->nr_range + 1), GFP_KERNEL);
 	if (!ranges) {
-		__release_region(res, alloc->start, resource_size(alloc));
+		__release_region(parent, alloc->start, resource_size(alloc));
 		return -ENOMEM;
 	}
 
@@ -1040,50 +1038,45 @@ static bool adjust_ok(struct dev_dax *dev_dax, struct resource *res)
 	return true;
 }
 
-static ssize_t dev_dax_resize(struct dax_region *dax_region,
-		struct dev_dax *dev_dax, resource_size_t size)
+/**
+ * dev_dax_resize_static - Expand the device into the unused portion of the
+ * region. This may involve adjusting the end of an existing resource, or
+ * allocating a new resource.
+ *
+ * @parent: parent resource to allocate this range in
+ * @dev_dax: DAX device to be expanded
+ * @to_alloc: amount of space to alloc; must be <= space available in @parent
+ *
+ * Return the amount of space allocated or -ERRNO on failure
+ */
+static ssize_t dev_dax_resize_static(struct resource *parent,
+				     struct dev_dax *dev_dax,
+				     resource_size_t to_alloc)
 {
-	resource_size_t avail = dax_region_avail_size(dax_region), to_alloc;
-	resource_size_t dev_size = dev_dax_size(dev_dax);
-	struct resource *region_res = &dax_region->res;
-	struct device *dev = &dev_dax->dev;
 	struct resource *res, *first;
-	resource_size_t alloc = 0;
 	int rc;
 
-	if (dev->driver)
-		return -EBUSY;
-	if (size == dev_size)
-		return 0;
-	if (size > dev_size && size - dev_size > avail)
-		return -ENOSPC;
-	if (size < dev_size)
-		return dev_dax_shrink(dev_dax, size);
-
-	to_alloc = size - dev_size;
-	if (dev_WARN_ONCE(dev, !alloc_is_aligned(dev_dax, to_alloc),
-			"resize of %pa misaligned\n", &to_alloc))
-		return -ENXIO;
-
-	/*
-	 * Expand the device into the unused portion of the region. This
-	 * may involve adjusting the end of an existing resource, or
-	 * allocating a new resource.
-	 */
-retry:
-	first = region_res->child;
-	if (!first)
-		return alloc_dev_dax_range(dev_dax, dax_region->res.start, to_alloc);
+	first = parent->child;
+	if (!first) {
+		rc = alloc_dev_dax_range(parent, dev_dax,
+					   parent->start, to_alloc);
+		if (rc)
+			return rc;
+		return to_alloc;
+	}
 
-	rc = -ENOSPC;
 	for (res = first; res; res = res->sibling) {
 		struct resource *next = res->sibling;
+		resource_size_t alloc;
 
 		/* space at the beginning of the region */
-		if (res == first && res->start > dax_region->res.start) {
-			alloc = min(res->start - dax_region->res.start, to_alloc);
-			rc = alloc_dev_dax_range(dev_dax, dax_region->res.start, alloc);
-			break;
+		if (res == first && res->start > parent->start) {
+			alloc = min(res->start - parent->start, to_alloc);
+			rc = alloc_dev_dax_range(parent, dev_dax,
+						 parent->start, alloc);
+			if (rc)
+				return rc;
+			return alloc;
 		}
 
 		alloc = 0;
@@ -1092,21 +1085,56 @@ static ssize_t dev_dax_resize(struct dax_region *dax_region,
 			alloc = min(next->start - (res->end + 1), to_alloc);
 
 		/* space at the end of the region */
-		if (!alloc && !next && res->end < region_res->end)
-			alloc = min(region_res->end - res->end, to_alloc);
+		if (!alloc && !next && res->end < parent->end)
+			alloc = min(parent->end - res->end, to_alloc);
 
 		if (!alloc)
 			continue;
 
 		if (adjust_ok(dev_dax, res)) {
 			rc = adjust_dev_dax_range(dev_dax, res, resource_size(res) + alloc);
-			break;
+			if (rc)
+				return rc;
+			return alloc;
 		}
-		rc = alloc_dev_dax_range(dev_dax, res->end + 1, alloc);
-		break;
+		rc = alloc_dev_dax_range(parent, dev_dax, res->end + 1, alloc);
+		if (rc)
+			return rc;
+		return alloc;
 	}
-	if (rc)
-		return rc;
+
+	/* available was already calculated and should never be an issue */
+	dev_WARN_ONCE(&dev_dax->dev, 1, "space not found?");
+	return 0;
+}
+
+static ssize_t dev_dax_resize(struct dax_region *dax_region,
+		struct dev_dax *dev_dax, resource_size_t size)
+{
+	resource_size_t avail = dax_region_avail_size(dax_region);
+	resource_size_t dev_size = dev_dax_size(dev_dax);
+	struct device *dev = &dev_dax->dev;
+	resource_size_t to_alloc;
+	resource_size_t alloc;
+
+	if (dev->driver)
+		return -EBUSY;
+	if (size == dev_size)
+		return 0;
+	if (size > dev_size && size - dev_size > avail)
+		return -ENOSPC;
+	if (size < dev_size)
+		return dev_dax_shrink(dev_dax, size);
+
+	to_alloc = size - dev_size;
+	if (dev_WARN_ONCE(dev, !alloc_is_aligned(dev_dax, to_alloc),
+			"resize of %pa misaligned\n", &to_alloc))
+		return -ENXIO;
+
+retry:
+	alloc = dev_dax_resize_static(&dax_region->res, dev_dax, to_alloc);
+	if (alloc <= 0)
+		return alloc;
 	to_alloc -= alloc;
 	if (to_alloc)
 		goto retry;
@@ -1212,7 +1240,8 @@ static ssize_t mapping_store(struct device *dev, struct device_attribute *attr,
 
 	to_alloc = range_len(&r);
 	if (alloc_is_aligned(dev_dax, to_alloc))
-		rc = alloc_dev_dax_range(dev_dax, r.start, to_alloc);
+		rc = alloc_dev_dax_range(&dax_region->res, dev_dax, r.start,
+					 to_alloc);
 	up_write(&dax_dev_rwsem);
 	up_write(&dax_region_rwsem);
 
@@ -1480,7 +1509,8 @@ static struct dev_dax *__devm_create_dev_dax(struct dev_dax_data *data)
 	device_initialize(dev);
 	dev_set_name(dev, "dax%d.%d", dax_region->id, dev_dax->id);
 
-	rc = alloc_dev_dax_range(dev_dax, dax_region->res.start, data->size);
+	rc = alloc_dev_dax_range(&dax_region->res, dev_dax, dax_region->res.start,
+				 data->size);
 	if (rc)
 		goto err_range;
 
-- 
2.43.0



* [PATCH 15/20] dax/region: Create resources on DAX regions
From: Anisa Su @ 2026-04-11  1:23 UTC (permalink / raw)
  To: linux-cxl
  Cc: john, dave.jiang, gourry, dave, jonathan.cameron,
	alison.schofield, ira.weiny, dan.j.williams, Jonathan Cameron

From: Ira Weiny <ira.weiny@intel.com>

DAX regions which map dynamic capacity partitions require that memory be
allowed to come and go.  Recall that struct dax_resource was created for
this purpose.  Now that extents can be realized within DAX regions, the
DAX region driver can start tracking sub-resource information.

The tight relationship between DAX region operations and extent
operations requires memory changes to be controlled synchronously with
the user of the region.  Synchronize through the dax_region_rwsem and by
having the region driver drive both the region device as well as the
extent sub-devices.

Recall that requests to remove extents can happen at any time and that
a host is not obligated to release the memory until it is no longer
being used.  If an extent is not in use, allow a release response.
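A minimal sketch of the "release only when unused" rule, assuming a per-extent use count like the use_cnt field this patch adds to struct dax_resource. The struct and function names here are hypothetical, and the locking (dax_region_rwsem in the patch) is omitted because the sketch is single-threaded:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-extent state: live dev dax ranges and presence. */
struct sk_dax_resource {
	unsigned int use_cnt;
	bool present;
};

/* Resize carved a dev dax range out of this extent. */
static void sk_get(struct sk_dax_resource *r)
{
	assert(r->present);
	r->use_cnt++;
}

/* Shrink trimmed a dev dax range that lived in this extent. */
static void sk_put(struct sk_dax_resource *r)
{
	assert(r->use_cnt > 0);
	r->use_cnt--;
}

/* Device asked for the extent back: refuse while any range uses it. */
static int sk_release(struct sk_dax_resource *r)
{
	if (!r->present)
		return 0;	/* nothing left to release */
	if (r->use_cnt)
		return -EBUSY;
	r->present = false;
	return 0;
}
```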

When extents are eligible for release, no mappings exist, but data may
still reside in caches not yet written to the device.  Call
cxl_region_invalidate_memregion() to write back data to the device prior
to signaling the release complete.

Speculative writes after a release may dirty the cache such that a read
from a newly surfaced extent may not come from the device.  Call
cxl_region_invalidate_memregion() prior to bringing a new extent online
to ensure the cache is marked invalid.
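The two invalidation points amount to an ordering constraint. The sketch below is hypothetical: it only records step names in a log rather than touching caches, and the step list is a simplified reading of region_extent_unregister() and cxl_add_pending() in this patch:

```c
#include <assert.h>

/* Hypothetical step names standing in for the real calls. */
enum sk_step {
	SK_INVALIDATE,	/* cxl_region_invalidate_memregion() */
	SK_UNREGISTER,	/* tear down the extent device */
	SK_ONLINE,	/* bring the new extent device online */
	SK_NOTIFY,	/* tell the DAX layer */
	SK_RESPOND,	/* mailbox response to the device */
};

struct sk_log {
	enum sk_step steps[8];
	int n;
};

static void record(struct sk_log *log, enum sk_step s)
{
	assert(log->n < 8);
	log->steps[log->n++] = s;
}

/* Position of the first occurrence of @s, or -1 if absent. */
static int first_index(const struct sk_log *log, enum sk_step s)
{
	for (int i = 0; i < log->n; i++)
		if (log->steps[i] == s)
			return i;
	return -1;
}

/* Release: no mappings remain, so flush dirty lines before reporting. */
static void release_flow(struct sk_log *log)
{
	record(log, SK_INVALIDATE);
	record(log, SK_UNREGISTER);
	record(log, SK_RESPOND);
}

/* Add: drop stale (possibly speculative) lines before exposing memory. */
static void add_flow(struct sk_log *log)
{
	record(log, SK_INVALIDATE);
	record(log, SK_ONLINE);
	record(log, SK_NOTIFY);
	record(log, SK_RESPOND);
}
```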

While these invalidate calls are inefficient, they are the best we can
do to ensure cache consistency without back-invalidate.  Furthermore,
with sufficiently large extents this should occur infrequently enough
that real workloads are not significantly impacted.

The DAX layer has no need for the details of the CXL memory extent
devices.  Expose extents to the DAX layer as device children of the DAX
region device.  A single callback from the driver helps the DAX layer
determine if a child device is an extent.  The DAX layer also registers
a devres function to automatically clean up when the device is removed
from the region.
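A sketch of how a single predicate lets the DAX layer pick out extent children without knowing CXL types, loosely modeled on struct dax_sparse_ops and find_free_extent() in this patch. The types, the kind field, and sk_find_free_extent() are hypothetical simplifications (a plain array stands in for device_find_child()):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SK_KIND_EXTENT 1	/* hypothetical marker used by the CXL side */

/* A child device as the DAX layer sees it: kind is opaque to DAX. */
struct sk_dev {
	int kind;
	uint64_t avail;	/* free bytes, meaningful only for extents */
};

/* Ops supplied by the CXL driver; DAX only needs this one predicate. */
struct sk_sparse_ops {
	bool (*is_extent)(const struct sk_dev *dev);
};

/* DAX side: return the first extent child with free space, or NULL. */
static const struct sk_dev *
sk_find_free_extent(const struct sk_dev *children, size_t n,
		    const struct sk_sparse_ops *ops)
{
	for (size_t i = 0; i < n; i++) {
		if (!ops->is_extent(&children[i]))
			continue;
		if (children[i].avail)
			return &children[i];
	}
	return NULL;
}

/* CXL side: the only place that knows what an extent looks like. */
static bool sk_cxl_is_extent(const struct sk_dev *dev)
{
	return dev->kind == SK_KIND_EXTENT;
}
```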

There is a race between extents being surfaced and the dax_cxl driver
being loaded.  Synchronize the driver during probe by scanning for
existing extents while under the device lock.
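Why scanning at probe closes the race can be shown with a single-threaded model, assuming, as the patch does, that the notify path and probe both run under the same device lock and therefore never interleave. All names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

#define SK_MAX_EXT 8

/*
 * Hypothetical model of the probe/notify race.  Both functions below
 * are assumed to run with the same device lock held, so they are
 * serialized; this sketch relies on that instead of real locking.
 */
struct sk_region {
	bool driver_bound;
	int nr_ext;
	int add_count[SK_MAX_EXT]; /* times each extent became a resource */
};

/* Notify path: deliver only if a driver is bound (else probe will). */
static void surface_extent(struct sk_region *r)
{
	int id;

	assert(r->nr_ext < SK_MAX_EXT);
	id = r->nr_ext++;
	if (r->driver_bound)
		r->add_count[id]++;
}

/* Probe path: bind, then adopt every extent that already exists. */
static void probe(struct sk_region *r)
{
	r->driver_bound = true;
	for (int i = 0; i < r->nr_ext; i++)
		r->add_count[i]++;
}
```

Whichever order the events arrive in, each extent is turned into a resource exactly once: either probe adopts it or the notification delivers it.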

Respond to extent notifications.  Manage the DAX region resource tree
based on the extents' lifetime.  Return the status of remove
notifications to lower layers so that they can manage the hardware
appropriately.
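The notify/response contract can be sketched as follows, assuming simplified event names modeled on enum dc_event and a hypothetical struct sk_extent. As in cxl_rm_extent() in this patch, only -EBUSY stops the CXL core from releasing the extent:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Simplified stand-ins for DCD_ADD_CAPACITY and friends. */
enum sk_event { SK_ADD, SK_RELEASE, SK_FORCED_RELEASE };

struct sk_extent {
	bool online;        /* CXL core still holds the extent */
	bool has_resource;  /* DAX layer has a resource for it */
	unsigned int users; /* dev dax ranges inside it */
};

/* DAX side: act on the event and report a status to the caller. */
static int sk_dax_notify(struct sk_extent *e, enum sk_event ev)
{
	switch (ev) {
	case SK_ADD:
		e->has_resource = true;
		return 0;
	case SK_RELEASE:
		if (e->users)
			return -EBUSY;
		e->has_resource = false;
		return 0;
	default:
		return -ENXIO;	/* forced release is not handled */
	}
}

/* CXL side of a release request: -EBUSY means keep the extent. */
static void sk_cxl_release(struct sk_extent *e)
{
	if (sk_dax_notify(e, SK_RELEASE) == -EBUSY)
		return;
	e->online = false;
}
```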

Based on an original patch by Navneet Singh.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes:
[anisa: notify once for region_extent_online and region_rm_extent]

Instead of notifying the DAX layer for each device extent, notify only
once after processing the entire pending list. This ensures the tagged
capacity is added/removed all at once.
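A minimal sketch of "collect, then notify once", assuming integer tags in place of UUIDs and a hypothetical add_pending() helper. It sorts the pending extents by start address, rejects gaps or mixed tags, and issues a single notification for the merged range. The real code also restores the original order afterwards for the mailbox response, which this sketch skips:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical pending device extent: range plus a small integer tag. */
struct sketch_ext {
	uint64_t start, len;
	int tag;
};

static int by_start(const void *a, const void *b)
{
	const struct sketch_ext *x = a, *y = b;

	return (x->start > y->start) - (x->start < y->start);
}

static int sk_notify_calls; /* notifications sent to the DAX layer */

/*
 * Merge all pending extents into one contiguous, single-tag range and
 * notify once.  Returns 0 on success, -1 on a gap or tag mismatch.
 */
static int add_pending(struct sketch_ext *ext, size_t n,
		       uint64_t *out_start, uint64_t *out_len)
{
	if (!n)
		return -1;
	qsort(ext, n, sizeof(*ext), by_start);
	for (size_t i = 1; i < n; i++) {
		if (ext[i].tag != ext[0].tag)
			return -1;
		if (ext[i].start != ext[i - 1].start + ext[i - 1].len)
			return -1;
	}
	*out_start = ext[0].start;
	*out_len = ext[n - 1].start + ext[n - 1].len - ext[0].start;
	sk_notify_calls++; /* one notification for the whole capacity */
	return 0;
}
```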
---
 drivers/cxl/core/core.h   |  11 ++
 drivers/cxl/core/extent.c |  47 +++++++-
 drivers/cxl/core/mbox.c   |  16 ++-
 drivers/cxl/core/region.c |   2 +-
 drivers/cxl/cxl.h         |   5 +
 drivers/dax/bus.c         | 247 ++++++++++++++++++++++++++++++++++----
 drivers/dax/bus.h         |   3 +-
 drivers/dax/cxl.c         |  61 +++++++++-
 drivers/dax/dax-private.h |  40 ++++++
 drivers/dax/hmem/hmem.c   |   2 +-
 drivers/dax/pmem.c        |   2 +-
 include/linux/ioport.h    |   3 +
 12 files changed, 404 insertions(+), 35 deletions(-)

diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index 14d91dd52b02..cb3f46a1152a 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -28,6 +28,8 @@ cxled_to_mds(struct cxl_endpoint_decoder *cxled)
 	return container_of(cxlds, struct cxl_memdev_state, cxlds);
 }
 
+int cxl_region_invalidate_memregion(struct cxl_region *cxlr);
+
 #ifdef CONFIG_CXL_REGION
 
 struct cxl_region_context {
@@ -67,6 +69,9 @@ int devm_cxl_add_pmem_region(struct cxl_region *cxlr);
 int cxl_add_extent(struct cxl_memdev_state *mds, struct cxl_extent *extent);
 int cxl_rm_extent(struct cxl_memdev_state *mds, struct cxl_extent *extent);
 int online_region_extent(struct region_extent *region_extent);
+void region_rm_extent(struct region_extent *region_extent);
+int cxlr_notify_extent(struct cxl_region *cxlr, enum dc_event event,
+		       struct region_extent *region_extent);
 #else
 static inline u64 cxl_dpa_to_hpa(struct cxl_region *cxlr,
 				 const struct cxl_memdev *cxlmd, u64 dpa)
@@ -86,6 +91,12 @@ static inline int online_region_extent(struct region_extent *region_extent)
 {
 	return 0;
 }
+static inline void region_rm_extent(struct region_extent *region_extent) { }
+static inline int cxlr_notify_extent(struct cxl_region *cxlr, enum dc_event event,
+				     struct region_extent *region_extent)
+{
+	return 0;
+}
 static inline
 struct cxl_region *cxl_dpa_to_region(const struct cxl_memdev *cxlmd, u64 dpa,
 				     struct cxl_endpoint_decoder **cxled)
diff --git a/drivers/cxl/core/extent.c b/drivers/cxl/core/extent.c
index 17fad51f084d..c4ad81814fb4 100644
--- a/drivers/cxl/core/extent.c
+++ b/drivers/cxl/core/extent.c
@@ -116,10 +116,16 @@ static void region_extent_unregister(void *ext)
 
 	dev_dbg(&region_extent->dev, "DAX region rm extent HPA %pra\n",
 		&region_extent->hpa_range);
+	/*
+	 * Extent is not in use or an error has occurred.  No mappings
+	 * exist at this point.  Write and invalidate caches to ensure
+	 * the device has all data prior to final release.
+	 */
+	cxl_region_invalidate_memregion(region_extent->cxlr_dax->cxlr);
 	device_unregister(&region_extent->dev);
 }
 
-static void region_rm_extent(struct region_extent *region_extent)
+void region_rm_extent(struct region_extent *region_extent)
 {
 	struct device *region_dev = region_extent->dev.parent;
 
@@ -226,6 +232,38 @@ static void calc_hpa_range(struct cxl_endpoint_decoder *cxled,
 	hpa_range->end = hpa_range->start + range_len(dpa_range) - 1;
 }
 
+int cxlr_notify_extent(struct cxl_region *cxlr, enum dc_event event,
+		       struct region_extent *region_extent)
+{
+	struct device *dev = &cxlr->cxlr_dax->dev;
+	struct cxl_notify_data notify_data;
+	struct cxl_driver *driver;
+
+	dev_dbg(dev, "Trying notify: type %d HPA %pra\n", event,
+		&region_extent->hpa_range);
+
+	guard(device)(dev);
+
+	/*
+	 * The lack of a driver indicates a notification has failed.  No user
+	 * space coordination was possible.
+	 */
+	if (!dev->driver)
+		return 0;
+	driver = to_cxl_drv(dev->driver);
+	if (!driver->notify)
+		return 0;
+
+	notify_data = (struct cxl_notify_data) {
+		.event = event,
+		.region_extent = region_extent,
+	};
+
+	dev_dbg(dev, "Notify: type %d HPA %pra\n", event,
+		&region_extent->hpa_range);
+	return driver->notify(dev, &notify_data);
+}
+
 int cxl_rm_extent(struct cxl_memdev_state *mds, struct cxl_extent *extent)
 {
 	u64 start_dpa = le64_to_cpu(extent->start_dpa);
@@ -236,6 +274,7 @@ int cxl_rm_extent(struct cxl_memdev_state *mds, struct cxl_extent *extent)
 	struct region_extent *reg_ext;
 	struct cxl_region *cxlr;
 	uuid_t tag;
+	int rc;
 
 	dpa_range = (struct range) {
 		.start = start_dpa,
@@ -290,6 +329,12 @@ int cxl_rm_extent(struct cxl_memdev_state *mds, struct cxl_extent *extent)
 		return -EINVAL;
 	}
 
+	rc = cxlr_notify_extent(cxlr,
+				DCD_RELEASE_CAPACITY,
+				cxlr_dax->region_extent);
+	if (rc == -EBUSY)
+		return 0;
+
 	/* Release entire capacity of the region */
 	region_rm_extent(reg_ext);
 	return 0;
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 56d8b2a9974c..c256081d2c65 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -1375,12 +1375,26 @@ static int cxl_add_pending(struct cxl_memdev_state *mds)
 	}
 
 	pending_reg_ext = mds->add_ctx.region_extent;
+	/* Ensure caches are clean prior to onlining */
+	rc = cxl_region_invalidate_memregion(pending_reg_ext->cxlr_dax->cxlr);
+	if (rc)
+		return rc;
 
 	/* device model handles freeing region_extent */
-	rc = online_region_extent(mds->add_ctx.region_extent);
+	rc = online_region_extent(pending_reg_ext);
 	if (rc)
 		return rc;
 
+	rc = cxlr_notify_extent(pending_reg_ext->cxlr_dax->cxlr,
+				DCD_ADD_CAPACITY,
+				pending_reg_ext);
+	/*
+	 * The region device was briefly live, but the DAX layer ensures it
+	 * was not used.
+	 */
+	if (rc)
+		region_rm_extent(pending_reg_ext);
+
 	/* Restore remaining extents to original order and send rsp */
 	list_sort(NULL, &mds->add_ctx.pending_extents, idx_compare);
 	return cxl_send_dc_response(mds, CXL_MBOX_OP_ADD_DC_RESPONSE,
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index e00fdb74589c..7b869c8ada22 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -222,7 +222,7 @@ static struct cxl_region_ref *cxl_rr_load(struct cxl_port *port,
 	return xa_load(&port->regions, (unsigned long)cxlr);
 }
 
-static int cxl_region_invalidate_memregion(struct cxl_region *cxlr)
+int cxl_region_invalidate_memregion(struct cxl_region *cxlr)
 {
 	if (!cpu_cache_has_invalidate_memregion()) {
 		if (IS_ENABLED(CONFIG_CXL_REGION_INVALIDATION_TEST)) {
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 477e91bc2ab7..6fda10e25043 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -858,12 +858,17 @@ extern const struct bus_type cxl_bus_type;
  * by @id and the add_dport() op only defined for the CXL_DEVICE_PORT driver
  * template.
  */
+struct cxl_notify_data {
+	enum dc_event event;
+	struct region_extent *region_extent;
+};
 struct cxl_driver {
 	const char *name;
 	int (*probe)(struct device *dev);
 	void (*remove)(struct device *dev);
 	struct cxl_dport *(*add_dport)(struct cxl_port *port,
 				       struct device *dport_dev);
+	int (*notify)(struct device *dev, struct cxl_notify_data *notify_data);
 	struct device_driver drv;
 	int id;
 };
diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 40f11fda66b5..2ada38fa7dca 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -184,6 +184,93 @@ static bool is_sparse(struct dax_region *dax_region)
 	return (dax_region->res.flags & IORESOURCE_DAX_SPARSE_CAP) != 0;
 }
 
+static void __dax_release_resource(struct dax_resource *dax_resource)
+{
+	struct dax_region *dax_region = dax_resource->region;
+
+	lockdep_assert_held_write(&dax_region_rwsem);
+	dev_dbg(dax_region->dev, "Extent release resource %pr\n",
+		dax_resource->res);
+	if (dax_resource->res)
+		__release_region(&dax_region->res, dax_resource->res->start,
+				 resource_size(dax_resource->res));
+	dax_resource->res = NULL;
+}
+
+static void dax_release_resource(void *res)
+{
+	struct dax_resource *dax_resource = res;
+
+	guard(rwsem_write)(&dax_region_rwsem);
+	__dax_release_resource(dax_resource);
+	kfree(dax_resource);
+}
+
+int dax_region_add_resource(struct dax_region *dax_region,
+			    struct device *device,
+			    resource_size_t start, resource_size_t length)
+{
+	struct resource *new_resource;
+	int rc;
+
+	struct dax_resource *dax_resource __free(kfree) =
+				kzalloc(sizeof(*dax_resource), GFP_KERNEL);
+	if (!dax_resource)
+		return -ENOMEM;
+
+	guard(rwsem_write)(&dax_region_rwsem);
+
+	dev_dbg(dax_region->dev, "DAX region resource %pr\n", &dax_region->res);
+	new_resource = __request_region(&dax_region->res, start, length, "extent", 0);
+	if (!new_resource) {
+		dev_err(dax_region->dev, "Failed to add region s:%pa l:%pa\n",
+			&start, &length);
+		return -ENOSPC;
+	}
+
+	dev_dbg(dax_region->dev, "add resource %pr\n", new_resource);
+	dax_resource->region = dax_region;
+	dax_resource->res = new_resource;
+
+	/*
+	 * open code devm_add_action_or_reset() to avoid recursive write lock
+	 * of dax_region_rwsem in the error case.
+	 */
+	rc = devm_add_action(device, dax_release_resource, dax_resource);
+	if (rc) {
+		__dax_release_resource(dax_resource);
+		return rc;
+	}
+
+	dev_set_drvdata(device, no_free_ptr(dax_resource));
+	return 0;
+}
+EXPORT_SYMBOL_GPL(dax_region_add_resource);
+
+int dax_region_rm_resource(struct dax_region *dax_region,
+			   struct device *dev)
+{
+	struct dax_resource *dax_resource;
+
+	guard(rwsem_write)(&dax_region_rwsem);
+
+	dax_resource = dev_get_drvdata(dev);
+	if (!dax_resource)
+		return 0;
+
+	if (dax_resource->use_cnt)
+		return -EBUSY;
+
+	/*
+	 * release the resource under dax_region_rwsem to avoid races with
+	 * users trying to use the extent
+	 */
+	__dax_release_resource(dax_resource);
+	dev_set_drvdata(dev, NULL);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(dax_region_rm_resource);
+
 bool static_dev_dax(struct dev_dax *dev_dax)
 {
 	return is_static(dev_dax->region);
@@ -297,19 +384,41 @@ static ssize_t region_align_show(struct device *dev,
 static struct device_attribute dev_attr_region_align =
 		__ATTR(align, 0400, region_align_show, NULL);
 
+resource_size_t
+dax_avail_size(struct resource *dax_resource)
+{
+	resource_size_t rc;
+	struct resource *used_res;
+
+	rc = resource_size(dax_resource);
+	for_each_child_resource(dax_resource, used_res)
+		rc -= resource_size(used_res);
+	return rc;
+}
+EXPORT_SYMBOL_GPL(dax_avail_size);
+
 #define for_each_dax_region_resource(dax_region, res) \
 	for (res = (dax_region)->res.child; res; res = res->sibling)
 
 static unsigned long long dax_region_avail_size(struct dax_region *dax_region)
 {
-	resource_size_t size = resource_size(&dax_region->res);
+	resource_size_t size;
 	struct resource *res;
 
 	lockdep_assert_held(&dax_region_rwsem);
 
-	if (is_sparse(dax_region))
-		return 0;
+	if (is_sparse(dax_region)) {
+		/*
+		 * Children of a sparse region represent available space not
+		 * used space.
+		 */
+		size = 0;
+		for_each_dax_region_resource(dax_region, res)
+			size += dax_avail_size(res);
+		return size;
+	}
 
+	size = resource_size(&dax_region->res);
 	for_each_dax_region_resource(dax_region, res)
 		size -= resource_size(res);
 	return size;
@@ -450,15 +559,26 @@ EXPORT_SYMBOL_GPL(kill_dev_dax);
 static void trim_dev_dax_range(struct dev_dax *dev_dax)
 {
 	int i = dev_dax->nr_range - 1;
-	struct range *range = &dev_dax->ranges[i].range;
+	struct dev_dax_range *dev_range = &dev_dax->ranges[i];
+	struct range *range = &dev_range->range;
 	struct dax_region *dax_region = dev_dax->region;
+	struct resource *res = &dax_region->res;
 
 	lockdep_assert_held_write(&dax_region_rwsem);
 	dev_dbg(&dev_dax->dev, "delete range[%d]: %#llx:%#llx\n", i,
 		(unsigned long long)range->start,
 		(unsigned long long)range->end);
 
-	__release_region(&dax_region->res, range->start, range_len(range));
+	if (dev_range->dax_resource) {
+		res = dev_range->dax_resource->res;
+		dev_dbg(&dev_dax->dev, "Trim sparse extent %pr\n", res);
+	}
+
+	__release_region(res, range->start, range_len(range));
+
+	if (dev_range->dax_resource)
+		dev_range->dax_resource->use_cnt--;
+
 	if (--dev_dax->nr_range == 0) {
 		kfree(dev_dax->ranges);
 		dev_dax->ranges = NULL;
@@ -642,7 +762,7 @@ static void dax_region_unregister(void *region)
 
 struct dax_region *alloc_dax_region(struct device *parent, int region_id,
 		struct range *range, int target_node, unsigned int align,
-		unsigned long flags)
+		unsigned long flags, struct dax_sparse_ops *sparse_ops)
 {
 	struct dax_region *dax_region;
 	int rc;
@@ -661,12 +781,16 @@ struct dax_region *alloc_dax_region(struct device *parent, int region_id,
 			|| !IS_ALIGNED(range_len(range), align))
 		return NULL;
 
+	if (!sparse_ops && (flags & IORESOURCE_DAX_SPARSE_CAP))
+		return NULL;
+
 	dax_region = kzalloc_obj(*dax_region);
 	if (!dax_region)
 		return NULL;
 
 	dev_set_drvdata(parent, dax_region);
 	kref_init(&dax_region->kref);
+	dax_region->sparse_ops = sparse_ops;
 	dax_region->id = region_id;
 	dax_region->align = align;
 	dax_region->dev = parent;
@@ -859,7 +983,8 @@ static int devm_register_dax_mapping(struct dev_dax *dev_dax, int range_id)
 }
 
 static int alloc_dev_dax_range(struct resource *parent, struct dev_dax *dev_dax,
-			       u64 start, resource_size_t size)
+			       u64 start, resource_size_t size,
+			       struct dax_resource *dax_resource)
 {
 	struct device *dev = &dev_dax->dev;
 	struct dev_dax_range *ranges;
@@ -898,6 +1023,7 @@ static int alloc_dev_dax_range(struct resource *parent, struct dev_dax *dev_dax,
 			.start = alloc->start,
 			.end = alloc->end,
 		},
+		.dax_resource = dax_resource,
 	};
 
 	dev_dbg(dev, "alloc range[%d]: %pa:%pa\n", dev_dax->nr_range - 1,
@@ -980,7 +1106,8 @@ static int dev_dax_shrink(struct dev_dax *dev_dax, resource_size_t size)
 	int i;
 
 	for (i = dev_dax->nr_range - 1; i >= 0; i--) {
-		struct range *range = &dev_dax->ranges[i].range;
+		struct dev_dax_range *dev_range = &dev_dax->ranges[i];
+		struct range *range = &dev_range->range;
 		struct dax_mapping *mapping = dev_dax->ranges[i].mapping;
 		struct resource *adjust = NULL, *res;
 		resource_size_t shrink;
@@ -996,12 +1123,21 @@ static int dev_dax_shrink(struct dev_dax *dev_dax, resource_size_t size)
 			continue;
 		}
 
-		for_each_dax_region_resource(dax_region, res)
-			if (strcmp(res->name, dev_name(dev)) == 0
-					&& res->start == range->start) {
-				adjust = res;
-				break;
-			}
+		if (dev_range->dax_resource) {
+			for_each_child_resource(dev_range->dax_resource->res, res)
+				if (strcmp(res->name, dev_name(dev)) == 0
+						&& res->start == range->start) {
+					adjust = res;
+					break;
+				}
+		} else {
+			for_each_dax_region_resource(dax_region, res)
+				if (strcmp(res->name, dev_name(dev)) == 0
+						&& res->start == range->start) {
+					adjust = res;
+					break;
+				}
+		}
 
 		if (dev_WARN_ONCE(dev, !adjust || i != dev_dax->nr_range - 1,
 					"failed to find matching resource\n"))
@@ -1039,19 +1175,21 @@ static bool adjust_ok(struct dev_dax *dev_dax, struct resource *res)
 }
 
 /**
- * dev_dax_resize_static - Expand the device into the unused portion of the
- * region. This may involve adjusting the end of an existing resource, or
- * allocating a new resource.
+ * __dev_dax_resize - Expand the device into the unused portion of the region.
+ * This may involve adjusting the end of an existing resource, or allocating a
+ * new resource.
  *
  * @parent: parent resource to allocate this range in
  * @dev_dax: DAX device to be expanded
  * @to_alloc: amount of space to alloc; must be <= space available in @parent
+ * @dax_resource: if sparse, the parent resource
  *
  * Return the amount of space allocated or -ERRNO on failure
  */
-static ssize_t dev_dax_resize_static(struct resource *parent,
-				     struct dev_dax *dev_dax,
-				     resource_size_t to_alloc)
+static ssize_t __dev_dax_resize(struct resource *parent,
+				struct dev_dax *dev_dax,
+				resource_size_t to_alloc,
+				struct dax_resource *dax_resource)
 {
 	struct resource *res, *first;
 	int rc;
@@ -1059,7 +1197,8 @@ static ssize_t dev_dax_resize_static(struct resource *parent,
 	first = parent->child;
 	if (!first) {
 		rc = alloc_dev_dax_range(parent, dev_dax,
-					   parent->start, to_alloc);
+					   parent->start, to_alloc,
+					   dax_resource);
 		if (rc)
 			return rc;
 		return to_alloc;
@@ -1073,7 +1212,8 @@ static ssize_t dev_dax_resize_static(struct resource *parent,
 		if (res == first && res->start > parent->start) {
 			alloc = min(res->start - parent->start, to_alloc);
 			rc = alloc_dev_dax_range(parent, dev_dax,
-						 parent->start, alloc);
+						 parent->start, alloc,
+						 dax_resource);
 			if (rc)
 				return rc;
 			return alloc;
@@ -1097,7 +1237,8 @@ static ssize_t dev_dax_resize_static(struct resource *parent,
 				return rc;
 			return alloc;
 		}
-		rc = alloc_dev_dax_range(parent, dev_dax, res->end + 1, alloc);
+		rc = alloc_dev_dax_range(parent, dev_dax, res->end + 1, alloc,
+					 dax_resource);
 		if (rc)
 			return rc;
 		return alloc;
@@ -1108,6 +1249,51 @@ static ssize_t dev_dax_resize_static(struct resource *parent,
 	return 0;
 }
 
+static ssize_t dev_dax_resize_static(struct dax_region *dax_region,
+				     struct dev_dax *dev_dax,
+				     resource_size_t to_alloc)
+{
+	return __dev_dax_resize(&dax_region->res, dev_dax, to_alloc, NULL);
+}
+
+static int find_free_extent(struct device *dev, const void *data)
+{
+	const struct dax_region *dax_region = data;
+	struct dax_resource *dax_resource;
+
+	if (!dax_region->sparse_ops->is_extent(dev))
+		return 0;
+
+	dax_resource = dev_get_drvdata(dev);
+	if (!dax_resource || !dax_avail_size(dax_resource->res))
+		return 0;
+	return 1;
+}
+
+static ssize_t dev_dax_resize_sparse(struct dax_region *dax_region,
+				     struct dev_dax *dev_dax,
+				     resource_size_t to_alloc)
+{
+	struct dax_resource *dax_resource;
+	ssize_t alloc;
+
+	struct device *extent_dev __free(put_device) =
+			device_find_child(dax_region->dev, dax_region,
+					  find_free_extent);
+	if (!extent_dev)
+		return 0;
+
+	dax_resource = dev_get_drvdata(extent_dev);
+	if (!dax_resource)
+		return 0;
+
+	to_alloc = min(dax_avail_size(dax_resource->res), to_alloc);
+	alloc = __dev_dax_resize(dax_resource->res, dev_dax, to_alloc, dax_resource);
+	if (alloc > 0)
+		dax_resource->use_cnt++;
+	return alloc;
+}
+
 static ssize_t dev_dax_resize(struct dax_region *dax_region,
 		struct dev_dax *dev_dax, resource_size_t size)
 {
@@ -1132,7 +1318,10 @@ static ssize_t dev_dax_resize(struct dax_region *dax_region,
 		return -ENXIO;
 
 retry:
-	alloc = dev_dax_resize_static(&dax_region->res, dev_dax, to_alloc);
+	if (is_sparse(dax_region))
+		alloc = dev_dax_resize_sparse(dax_region, dev_dax, to_alloc);
+	else
+		alloc = dev_dax_resize_static(dax_region, dev_dax, to_alloc);
 	if (alloc <= 0)
 		return alloc;
 	to_alloc -= alloc;
@@ -1241,7 +1430,7 @@ static ssize_t mapping_store(struct device *dev, struct device_attribute *attr,
 	to_alloc = range_len(&r);
 	if (alloc_is_aligned(dev_dax, to_alloc))
 		rc = alloc_dev_dax_range(&dax_region->res, dev_dax, r.start,
-					 to_alloc);
+					 to_alloc, NULL);
 	up_write(&dax_dev_rwsem);
 	up_write(&dax_region_rwsem);
 
@@ -1480,6 +1669,11 @@ static struct dev_dax *__devm_create_dev_dax(struct dev_dax_data *data)
 	struct device *dev;
 	int rc;
 
+	if (is_sparse(dax_region) && data->size) {
+		dev_err(parent, "Sparse DAX region devices must be created initially with 0 size\n");
+		return ERR_PTR(-EINVAL);
+	}
+
 	dev_dax = kzalloc_obj(*dev_dax);
 	if (!dev_dax)
 		return ERR_PTR(-ENOMEM);
@@ -1510,7 +1704,7 @@ static struct dev_dax *__devm_create_dev_dax(struct dev_dax_data *data)
 	dev_set_name(dev, "dax%d.%d", dax_region->id, dev_dax->id);
 
 	rc = alloc_dev_dax_range(&dax_region->res, dev_dax, dax_region->res.start,
-				 data->size);
+				 data->size, NULL);
 	if (rc)
 		goto err_range;
 
diff --git a/drivers/dax/bus.h b/drivers/dax/bus.h
index 7abdd5a403dc..d41cb36e8868 100644
--- a/drivers/dax/bus.h
+++ b/drivers/dax/bus.h
@@ -11,6 +11,7 @@ struct dev_dax;
 struct resource;
 struct dax_device;
 struct dax_region;
+struct dax_sparse_ops;
 
 /* dax bus specific ioresource flags */
 #define IORESOURCE_DAX_STATIC BIT(0)
@@ -19,7 +20,7 @@ struct dax_region;
 
 struct dax_region *alloc_dax_region(struct device *parent, int region_id,
 		struct range *range, int target_node, unsigned int align,
-		unsigned long flags);
+		unsigned long flags, struct dax_sparse_ops *sparse_ops);
 
 struct dev_dax_data {
 	struct dax_region *dax_region;
diff --git a/drivers/dax/cxl.c b/drivers/dax/cxl.c
index 9ebe974d25c3..159fb4e82529 100644
--- a/drivers/dax/cxl.c
+++ b/drivers/dax/cxl.c
@@ -5,6 +5,57 @@
 
 #include "../cxl/cxl.h"
 #include "bus.h"
+#include "dax-private.h"
+
+static int __cxl_dax_add_resource(struct dax_region *dax_region,
+				  struct region_extent *region_extent)
+{
+	struct device *dev = &region_extent->dev;
+	resource_size_t start, length;
+
+	start = dax_region->res.start + region_extent->hpa_range.start;
+	length = range_len(&region_extent->hpa_range);
+	return dax_region_add_resource(dax_region, dev, start, length);
+}
+
+static int cxl_dax_add_resource(struct device *dev, void *data)
+{
+	struct dax_region *dax_region = data;
+	struct region_extent *region_extent;
+
+	region_extent = to_region_extent(dev);
+	if (!region_extent)
+		return 0;
+
+	dev_dbg(dax_region->dev, "Adding resource HPA %pra\n",
+		&region_extent->hpa_range);
+
+	return __cxl_dax_add_resource(dax_region, region_extent);
+}
+
+static int cxl_dax_region_notify(struct device *dev,
+				 struct cxl_notify_data *notify_data)
+{
+	struct cxl_dax_region *cxlr_dax = to_cxl_dax_region(dev);
+	struct dax_region *dax_region = dev_get_drvdata(dev);
+	struct region_extent *region_extent = notify_data->region_extent;
+
+	switch (notify_data->event) {
+	case DCD_ADD_CAPACITY:
+		return __cxl_dax_add_resource(dax_region, region_extent);
+	case DCD_RELEASE_CAPACITY:
+		return dax_region_rm_resource(dax_region, &region_extent->dev);
+	case DCD_FORCED_CAPACITY_RELEASE:
+	default:
+		dev_err(&cxlr_dax->dev, "Unknown DC event %d\n",
+			notify_data->event);
+		return -ENXIO;
+	}
+}
+
+static struct dax_sparse_ops sparse_ops = {
+	.is_extent = is_region_extent,
+};
 
 static int cxl_dax_region_probe(struct device *dev)
 {
@@ -24,15 +75,18 @@ static int cxl_dax_region_probe(struct device *dev)
 		flags |= IORESOURCE_DAX_SPARSE_CAP;
 
 	dax_region = alloc_dax_region(dev, cxlr->id, &cxlr_dax->hpa_range, nid,
-				      PMD_SIZE, flags);
+				      PMD_SIZE, flags, &sparse_ops);
 	if (!dax_region)
 		return -ENOMEM;
 
-	if (cxlr->mode == CXL_PARTMODE_DYNAMIC_RAM_A)
+	if (cxlr->mode == CXL_PARTMODE_DYNAMIC_RAM_A) {
+		device_for_each_child(&cxlr_dax->dev, dax_region,
+				      cxl_dax_add_resource);
 		/* Add empty seed dax device */
 		dev_size = 0;
-	else
+	} else {
 		dev_size = range_len(&cxlr_dax->hpa_range);
+	}
 
 	data = (struct dev_dax_data) {
 		.dax_region = dax_region,
@@ -47,6 +101,7 @@ static int cxl_dax_region_probe(struct device *dev)
 static struct cxl_driver cxl_dax_region_driver = {
 	.name = "cxl_dax_region",
 	.probe = cxl_dax_region_probe,
+	.notify = cxl_dax_region_notify,
 	.id = CXL_DEVICE_DAX_REGION,
 	.drv = {
 		.suppress_bind_attrs = true,
diff --git a/drivers/dax/dax-private.h b/drivers/dax/dax-private.h
index c6ae27c982f4..83b198068597 100644
--- a/drivers/dax/dax-private.h
+++ b/drivers/dax/dax-private.h
@@ -16,6 +16,14 @@ struct inode *dax_inode(struct dax_device *dax_dev);
 int dax_bus_init(void);
 void dax_bus_exit(void);
 
+/**
+ * struct dax_sparse_ops - Operations for sparse regions
+ * @is_extent: return true if the device is an extent
+ */
+struct dax_sparse_ops {
+	bool (*is_extent)(struct device *dev);
+};
+
 /**
  * struct dax_region - mapping infrastructure for dax devices
  * @id: kernel-wide unique region for a memory range
@@ -27,6 +35,7 @@ void dax_bus_exit(void);
  * @res: resource tree to track instance allocations
  * @seed: allow userspace to find the first unbound seed device
  * @youngest: allow userspace to find the most recently created device
+ * @sparse_ops: operations required for sparse regions
  */
 struct dax_region {
 	int id;
@@ -38,6 +47,7 @@ struct dax_region {
 	struct resource res;
 	struct device *seed;
 	struct device *youngest;
+	struct dax_sparse_ops *sparse_ops;
 };
 
 /**
@@ -57,11 +67,13 @@ struct dax_mapping {
  * @pgoff: page offset
  * @range: resource-span
  * @mapping: reference to the dax_mapping for this range
+ * @dax_resource: if not NULL, the dax sparse resource containing this range
  */
 struct dev_dax_range {
 	unsigned long pgoff;
 	struct range range;
 	struct dax_mapping *mapping;
+	struct dax_resource *dax_resource;
 };
 
 /**
@@ -102,6 +114,34 @@ struct dev_dax {
  */
 void run_dax(struct dax_device *dax_dev);
 
+/**
+ * struct dax_resource - For sparse regions; an active resource
+ * @region: dax_region this resource is in
+ * @res: resource
+ * @use_cnt: count the number of uses of this resource
+ *
+ * Changes to the dax_region and the dax_resources within it are protected by
+ * dax_region_rwsem.
+ *
+ * dax_resources are not intended to be used outside the dax layer.
+ */
+struct dax_resource {
+	struct dax_region *region;
+	struct resource *res;
+	unsigned int use_cnt;
+};
+
+/*
+ * Similar to run_dax(), dax_region_{add,rm}_resource() and dax_avail_size() are
+ * exported but are not intended to be generic operations outside the dax
+ * subsystem.  They are only generic between the dax layer and the dax drivers.
+ */
+int dax_region_add_resource(struct dax_region *dax_region, struct device *dev,
+			    resource_size_t start, resource_size_t length);
+int dax_region_rm_resource(struct dax_region *dax_region,
+			   struct device *dev);
+resource_size_t dax_avail_size(struct resource *dax_resource);
+
 static inline struct dev_dax *to_dev_dax(struct device *dev)
 {
 	return container_of(dev, struct dev_dax, dev);
diff --git a/drivers/dax/hmem/hmem.c b/drivers/dax/hmem/hmem.c
index af21f66bf872..be938c2a73f8 100644
--- a/drivers/dax/hmem/hmem.c
+++ b/drivers/dax/hmem/hmem.c
@@ -28,7 +28,7 @@ static int dax_hmem_probe(struct platform_device *pdev)
 
 	mri = dev->platform_data;
 	dax_region = alloc_dax_region(dev, pdev->id, &mri->range,
-				      mri->target_node, PMD_SIZE, flags);
+				      mri->target_node, PMD_SIZE, flags, NULL);
 	if (!dax_region)
 		return -ENOMEM;
 
diff --git a/drivers/dax/pmem.c b/drivers/dax/pmem.c
index bee93066a849..5b5be86768f3 100644
--- a/drivers/dax/pmem.c
+++ b/drivers/dax/pmem.c
@@ -53,7 +53,7 @@ static struct dev_dax *__dax_pmem_probe(struct device *dev)
 	range.start += offset;
 	dax_region = alloc_dax_region(dev, region_id, &range,
 			nd_region->target_node, le32_to_cpu(pfn_sb->align),
-			IORESOURCE_DAX_STATIC);
+			IORESOURCE_DAX_STATIC, NULL);
 	if (!dax_region)
 		return ERR_PTR(-ENOMEM);
 
diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index 5533a5debf3f..ab364e25f479 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -28,6 +28,9 @@ struct resource {
 	struct resource *parent, *sibling, *child;
 };
 
+#define for_each_child_resource(parent, res) \
+	for (res = (parent)->child; res; res = res->sibling)
+
 /*
  * IO resources have these defined flags.
  *
-- 
2.43.0
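The for_each_child_resource() helper added to ioport.h walks a resource's direct children through the ->sibling chain. The sketch below reuses the same macro shape over a hypothetical, simplified struct rnode in user space; the "unclaimed bytes" computation is only an illustration of iterating children and is not claimed to be how dax_avail_size() is implemented.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the kernel's struct resource. */
struct rnode {
	uint64_t start, end;                 /* inclusive range */
	struct rnode *parent, *sibling, *child;
};

/* Same shape as the macro added to include/linux/ioport.h. */
#define for_each_child_resource(parent, res) \
	for (res = (parent)->child; res; res = res->sibling)

static uint64_t rnode_size(const struct rnode *r)
{
	return r->end - r->start + 1;
}

/* Illustrative only: bytes of @parent not covered by any direct child. */
static uint64_t unclaimed_bytes(struct rnode *parent)
{
	uint64_t used = 0;
	struct rnode *r;

	for_each_child_resource(parent, r)
		used += rnode_size(r);
	assert(used <= rnode_size(parent));  /* children must fit in the parent */
	return rnode_size(parent) - used;
}
```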



* [PATCH 16/20] cxl/region: Read existing extents on region creation
  2026-04-11  1:22 [RFC PATCH 00/20] DCD: Remove support for sparse regions & add tags Anisa Su
                   ` (14 preceding siblings ...)
  2026-04-11  1:23 ` [PATCH 15/20] dax/region: Create resources on DAX regions Anisa Su
@ 2026-04-11  1:23 ` Anisa Su
  2026-04-11  1:23 ` [PATCH 17/20] cxl/mem: Trace Dynamic capacity Event Record Anisa Su
                   ` (6 subsequent siblings)
  22 siblings, 0 replies; 30+ messages in thread
From: Anisa Su @ 2026-04-11  1:23 UTC (permalink / raw)
  To: linux-cxl
  Cc: john, dave.jiang, gourry, dave, jonathan.cameron,
	alison.schofield, ira.weiny, dan.j.williams, Jonathan Cameron,
	Fan Ni

From: Ira Weiny <ira.weiny@intel.com>

Dynamic capacity device extents may be left in an accepted state on a
device due to an unexpected host crash.  In this case, creating a new
region on top of a DC partition is expected to read those extents and
surface them for continued use.

Once all endpoint decoders are part of a region and the region is being
realized, a read of the device's extent list can reveal these
previously accepted extents.

CXL r3.1 specifies the Get Dynamic Capacity Extent List mailbox command
for this purpose.  The command returns all the extents for all dynamic
capacity partitions.  If the fabric manager is adding extents to any DCD
partition, the extent list for the recovered region may change.  In this
case the query must retry.  Upon retry the query could encounter extents
which were accepted on a previous list query.  Adding such extents is not
treated as an error because they lie entirely within a previously
accepted extent.  Instead, warn in this case to help differentiate bad
devices from this normal condition.

Latch any errors and bubble them up so the user is notified even if
individual errors are rate limited or otherwise ignored.
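A minimal sketch of the latching pattern: every item is still processed, and the most recent failure is remembered and returned at the end, which matches how latched_rc is used in __cxl_process_extent_list(). handle_item() and process_all() are hypothetical names, and rejecting odd values is an arbitrary stand-in for a per-extent failure.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical per-item handler: rejects odd values. */
static int handle_item(int v)
{
	return (v % 2) ? -EINVAL : 0;
}

/*
 * Process every item even if some fail, remembering ("latching") the most
 * recent error so the caller still learns that something went wrong.
 */
static int process_all(const int *items, int n, int *nr_ok)
{
	int latched_rc = 0;

	*nr_ok = 0;
	for (int i = 0; i < n; i++) {
		int rc = handle_item(items[i]);

		if (rc)
			latched_rc = rc;   /* keep going; report at the end */
		else
			(*nr_ok)++;
	}
	return latched_rc;
}
```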

The scan for existing extents races with the dax_cxl driver.  This is
synchronized through the region device lock.  Extents found after the
driver has loaded surface through the normal notification path, while
extents seen before the driver loads are read during driver load.
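A user-space model of that rule, assuming a pthread mutex in place of the region device lock; region_model, scan_found_extent() and driver_probe() are hypothetical names, not kernel APIs. Whichever side takes the lock first decides the path, so each extent surfaces exactly once: either it is left pending and consumed at probe, or it goes straight through the notification path.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of a region whose lock serializes scan vs. driver load. */
struct region_model {
	pthread_mutex_t lock;      /* stands in for the region device lock */
	bool driver_bound;
	int pending;               /* extents found before the driver loaded */
	int surfaced;              /* extents handed to the (mock) DAX driver */
};

/* Scan path: called for each extent read from the device. */
static void scan_found_extent(struct region_model *r)
{
	pthread_mutex_lock(&r->lock);
	if (r->driver_bound)
		r->surfaced++;     /* normal notification path */
	else
		r->pending++;      /* driver picks it up during probe */
	pthread_mutex_unlock(&r->lock);
}

/* Driver probe: consume everything found so far, then go live. */
static void driver_probe(struct region_model *r)
{
	pthread_mutex_lock(&r->lock);
	r->surfaced += r->pending;
	r->pending = 0;
	r->driver_bound = true;
	pthread_mutex_unlock(&r->lock);
}
```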

Based on an original patch by Navneet Singh.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes:
[anisa]: enforce the same tag-based add rules when processing existing extents

Instead of onlining and notifying the dax layer as we go, process the
entire list, create a single region extent from it, then online it and
notify the dax layer.
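A simplified user-space sketch of the add rules this series applies before building the single region extent: sort by DPA, require one tag and no gaps, report the merged span, then restore the device's reporting order for the response. The struct ext type, the int tag standing in for the 16-byte UUID, and merge_extents() are assumptions for illustration; this is not the series' code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified extent; the real record carries a UUID tag. */
struct ext {
	uint64_t dpa;      /* start DPA */
	uint64_t len;
	int tag;           /* stand-in for the UUID tag */
	int seq;           /* order in which the device reported it */
};

static int cmp_dpa(const void *a, const void *b)
{
	const struct ext *x = a, *y = b;

	return (x->dpa > y->dpa) - (x->dpa < y->dpa);
}

static int cmp_seq(const void *a, const void *b)
{
	const struct ext *x = a, *y = b;

	return (x->seq > y->seq) - (x->seq < y->seq);
}

/*
 * Sort by DPA, require a single tag and no gaps or overlaps, and report the
 * merged span.  The original reporting order is restored before returning.
 */
static bool merge_extents(struct ext *e, size_t n, uint64_t *start, uint64_t *len)
{
	bool ok = n > 0;

	qsort(e, n, sizeof(*e), cmp_dpa);
	for (size_t i = 1; ok && i < n; i++) {
		if (e[i].tag != e[0].tag ||
		    e[i].dpa != e[i - 1].dpa + e[i - 1].len)
			ok = false;
	}
	if (ok) {
		*start = e[0].dpa;
		*len = e[n - 1].dpa + e[n - 1].len - e[0].dpa;
	}
	qsort(e, n, sizeof(*e), cmp_seq);
	return ok;
}
```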
---
 drivers/cxl/core/core.h |   1 +
 drivers/cxl/core/mbox.c | 116 ++++++++++++++++++++++++++++++++++++++++
 drivers/cxl/cxlmem.h    |  21 ++++++++
 3 files changed, 138 insertions(+)

diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index cb3f46a1152a..24121734b2d5 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -28,6 +28,7 @@ cxled_to_mds(struct cxl_endpoint_decoder *cxled)
 	return container_of(cxlds, struct cxl_memdev_state, cxlds);
 }
 
+int cxl_process_extent_list(struct cxl_endpoint_decoder *cxled);
 int cxl_region_invalidate_memregion(struct cxl_region *cxlr);
 
 #ifdef CONFIG_CXL_REGION
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index c256081d2c65..9ef38a81933d 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -1916,6 +1916,122 @@ int cxl_dev_dc_identify(struct cxl_mailbox *mbox,
 }
 EXPORT_SYMBOL_NS_GPL(cxl_dev_dc_identify, "CXL");
 
+/* Return -EAGAIN if the extent list changes while reading */
+static int __cxl_process_extent_list(struct cxl_endpoint_decoder *cxled)
+{
+	u32 current_index, total_read, total_expected, initial_gen_num;
+	struct cxl_memdev_state *mds = cxled_to_mds(cxled);
+	struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
+	struct device *dev = mds->cxlds.dev;
+	struct cxl_mbox_cmd mbox_cmd;
+	u32 max_extent_count;
+	int latched_rc = 0;
+	bool first = true;
+
+	struct cxl_mbox_get_extent_out *extents __free(kvfree) =
+				kvmalloc(cxl_mbox->payload_size, GFP_KERNEL);
+	if (!extents)
+		return -ENOMEM;
+
+	total_read = 0;
+	current_index = 0;
+	total_expected = 0;
+	max_extent_count = (cxl_mbox->payload_size - sizeof(*extents)) /
+				sizeof(struct cxl_extent);
+	do {
+		u32 nr_returned, current_total, current_gen_num;
+		struct cxl_mbox_get_extent_in get_extent;
+		int rc;
+
+		get_extent = (struct cxl_mbox_get_extent_in) {
+			.extent_cnt = cpu_to_le32(max(max_extent_count,
+						  total_expected - current_index)),
+			.start_extent_index = cpu_to_le32(current_index),
+		};
+
+		mbox_cmd = (struct cxl_mbox_cmd) {
+			.opcode = CXL_MBOX_OP_GET_DC_EXTENT_LIST,
+			.payload_in = &get_extent,
+			.size_in = sizeof(get_extent),
+			.size_out = cxl_mbox->payload_size,
+			.payload_out = extents,
+			.min_out = 1,
+		};
+
+		rc = cxl_internal_send_cmd(cxl_mbox, &mbox_cmd);
+		if (rc < 0)
+			return rc;
+
+		/* Save initial data */
+		if (first) {
+			total_expected = le32_to_cpu(extents->total_extent_count);
+			initial_gen_num = le32_to_cpu(extents->generation_num);
+			first = false;
+		}
+
+		nr_returned = le32_to_cpu(extents->returned_extent_count);
+		total_read += nr_returned;
+		current_total = le32_to_cpu(extents->total_extent_count);
+		current_gen_num = le32_to_cpu(extents->generation_num);
+
+		dev_dbg(dev, "Got extent list %d-%d of %d generation Num:%d\n",
+			current_index, total_read - 1, current_total, current_gen_num);
+
+		if (current_gen_num != initial_gen_num || total_expected != current_total) {
+			dev_warn(dev, "Extent list change detected; gen %u != %u : cnt %u != %u\n",
+				 current_gen_num, initial_gen_num,
+				 total_expected, current_total);
+			return -EAGAIN;
+		}
+
+		for (int i = 0; i < nr_returned; i++) {
+			struct cxl_extent *extent = &extents->extent[i];
+
+			dev_dbg(dev, "Processing extent %d/%d\n",
+				current_index + i, total_expected);
+
+			rc = add_to_pending_list(&mds->add_ctx.pending_extents,
+						 extent);
+			if (rc) {
+				latched_rc = rc;
+			}
+		}
+
+		current_index += nr_returned;
+	} while (total_expected > total_read);
+
+	if (!latched_rc && !list_empty(&mds->add_ctx.pending_extents)) {
+		latched_rc = cxl_add_pending(mds);
+	}
+	clear_pending_extents(mds);
+
+	return latched_rc;
+}
+
+#define CXL_READ_EXTENT_LIST_RETRY 10
+
+/**
+ * cxl_process_extent_list() - Read existing extents
+ * @cxled: Endpoint decoder which is part of a region
+ *
+ * Issue the Get Dynamic Capacity Extent List command to the device
+ * and add existing extents if found.
+ *
+ * A retry of 10 is somewhat arbitrary; however, extent changes should be
+ * relatively rare while bringing up a region.  So 10 should be plenty.
+ */
+int cxl_process_extent_list(struct cxl_endpoint_decoder *cxled)
+{
+	int retry = CXL_READ_EXTENT_LIST_RETRY;
+	int rc;
+
+	do {
+		rc = __cxl_process_extent_list(cxled);
+	} while (rc == -EAGAIN && retry--);
+
+	return rc;
+}
+
 static void add_part(struct cxl_dpa_info *info, u64 start, u64 size, enum cxl_partition_mode mode)
 {
 	int i = info->nr_partitions;
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 6da958352afd..3346da309be3 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -532,6 +532,27 @@ struct cxl_mbox_dc_response {
 	} __packed extent_list[] __counted_by(extent_list_size);
 } __packed;
 
+/*
+ * Get Dynamic Capacity Extent List; Input Payload
+ * CXL rev 3.1 section 8.2.9.9.9.2; Table 8-166
+ */
+struct cxl_mbox_get_extent_in {
+	__le32 extent_cnt;
+	__le32 start_extent_index;
+} __packed;
+
+/*
+ * Get Dynamic Capacity Extent List; Output Payload
+ * CXL rev 3.1 section 8.2.9.9.9.2; Table 8-167
+ */
+struct cxl_mbox_get_extent_out {
+	__le32 returned_extent_count;
+	__le32 total_extent_count;
+	__le32 generation_num;
+	u8 rsvd[4];
+	struct cxl_extent extent[];
+} __packed;
+
 struct cxl_mbox_get_supported_logs {
 	__le16 entries;
 	u8 rsvd[6];
-- 
2.43.0
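The output payload added to cxlmem.h starts with a fixed 16-byte header (three __le32 counters plus 4 reserved bytes) followed by the extent records. A user-space parser for that header is sketched below; it decodes the little-endian fields byte by byte so it works on any host. struct extent_list_hdr, get_le32() and parse_extent_list_hdr() are illustrative names, not kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-order copy of the fixed header of struct cxl_mbox_get_extent_out. */
struct extent_list_hdr {
	uint32_t returned;     /* returned_extent_count */
	uint32_t total;        /* total_extent_count */
	uint32_t generation;   /* generation_num */
};

/* Assemble a little-endian u32 byte by byte (independent of host order). */
static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Returns 0 on success, -1 if @len is too short to hold the header. */
static int parse_extent_list_hdr(const uint8_t *buf, size_t len,
				 struct extent_list_hdr *h)
{
	if (len < 16)
		return -1;
	h->returned = get_le32(buf + 0);
	h->total = get_le32(buf + 4);
	h->generation = get_le32(buf + 8);
	/* bytes 12..15 are reserved; extent records start at offset 16 */
	return 0;
}
```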



* [PATCH 17/20] cxl/mem: Trace Dynamic capacity Event Record
  2026-04-11  1:22 [RFC PATCH 00/20] DCD: Remove support for sparse regions & add tags Anisa Su
                   ` (15 preceding siblings ...)
  2026-04-11  1:23 ` [PATCH 16/20] cxl/region: Read existing extents on region creation Anisa Su
@ 2026-04-11  1:23 ` Anisa Su
  2026-04-11  1:23 ` [PATCH 18/20] tools/testing/cxl: Make event logs dynamic Anisa Su
                   ` (5 subsequent siblings)
  22 siblings, 0 replies; 30+ messages in thread
From: Anisa Su @ 2026-04-11  1:23 UTC (permalink / raw)
  To: linux-cxl
  Cc: john, dave.jiang, gourry, dave, jonathan.cameron,
	alison.schofield, ira.weiny, dan.j.williams, Jonathan Cameron,
	Fan Ni

From: Ira Weiny <ira.weiny@intel.com>

CXL rev 3.1 section 8.2.9.2.1 adds Dynamic Capacity Event Records.
User space can use trace events to debug dynamic capacity changes.

Add DC trace points to the trace log.
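When reading the trace output, the event type codes this patch defines for the trace point map to strings as below. This is a plain user-space switch mirroring show_dc_evt_type(); the enum names and dc_evt_type_str() are hypothetical, and returning "Unknown" for other values is a choice of this sketch (__print_symbolic() itself prints the raw value when nothing matches).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Event type codes as defined in the patch (CXL r3.1 Table 8-50). */
enum {
	DC_ADD_CAPACITY        = 0x00,
	DC_REL_CAPACITY        = 0x01,
	DC_FORCED_REL_CAPACITY = 0x02,
	DC_REG_CONF_UPDATED    = 0x03,
};

/* User-space equivalent of the show_dc_evt_type() symbolic mapping. */
static const char *dc_evt_type_str(uint8_t type)
{
	switch (type) {
	case DC_ADD_CAPACITY:        return "Add capacity";
	case DC_REL_CAPACITY:        return "Release capacity";
	case DC_FORCED_REL_CAPACITY: return "Forced capacity release";
	case DC_REG_CONF_UPDATED:    return "Region Configuration Updated";
	default:                     return "Unknown";
	}
}
```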

Based on an original patch by Navneet Singh.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes:
[djbw: s/region/partition/]
[iweiny: s/tag/uuid/]
---
 drivers/cxl/core/mbox.c  |  5 ++++
 drivers/cxl/core/trace.h | 65 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 70 insertions(+)

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 9ef38a81933d..81a1d7afced9 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -1097,6 +1097,11 @@ static void __cxl_event_trace_record(struct cxl_memdev *cxlmd,
 		ev_type = CXL_CPER_EVENT_MEM_MODULE;
 	else if (uuid_equal(uuid, &CXL_EVENT_MEM_SPARING_UUID))
 		ev_type = CXL_CPER_EVENT_MEM_SPARING;
+	else if (uuid_equal(uuid, &CXL_EVENT_DC_EVENT_UUID)) {
+/* FIXME still valid? */
+		trace_cxl_dynamic_capacity(cxlmd, type, &record->event.dcd);
+		return;
+	}
 
 	cxl_event_trace_record(cxlmd, type, ev_type, uuid, &record->event);
 }
diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
index a972e4ef1936..421e492d1b3f 100644
--- a/drivers/cxl/core/trace.h
+++ b/drivers/cxl/core/trace.h
@@ -1099,6 +1099,71 @@ TRACE_EVENT(cxl_poison,
 	)
 );
 
+/*
+ * Dynamic Capacity Event Record - DER
+ *
+ * CXL rev 3.1 section 8.2.9.2.1.6 Table 8-50
+ */
+
+#define CXL_DC_ADD_CAPACITY			0x00
+#define CXL_DC_REL_CAPACITY			0x01
+#define CXL_DC_FORCED_REL_CAPACITY		0x02
+#define CXL_DC_REG_CONF_UPDATED			0x03
+#define show_dc_evt_type(type)	__print_symbolic(type,		\
+	{ CXL_DC_ADD_CAPACITY,	"Add capacity"},		\
+	{ CXL_DC_REL_CAPACITY,	"Release capacity"},		\
+	{ CXL_DC_FORCED_REL_CAPACITY,	"Forced capacity release"},	\
+	{ CXL_DC_REG_CONF_UPDATED,	"Region Configuration Updated"	} \
+)
+
+TRACE_EVENT(cxl_dynamic_capacity,
+
+	TP_PROTO(const struct cxl_memdev *cxlmd, enum cxl_event_log_type log,
+		 struct cxl_event_dcd *rec),
+
+	TP_ARGS(cxlmd, log, rec),
+
+	TP_STRUCT__entry(
+		CXL_EVT_TP_entry
+
+		/* Dynamic capacity Event */
+		__field(u8, event_type)
+		__field(u16, hostid)
+		__field(u8, partition_id)
+		__field(u64, dpa_start)
+		__field(u64, length)
+		__array(u8, uuid, UUID_SIZE)
+		__field(u16, sh_extent_seq)
+	),
+
+	TP_fast_assign(
+		CXL_EVT_TP_fast_assign(cxlmd, log, rec->hdr);
+
+		/* Dynamic_capacity Event */
+		__entry->event_type = rec->event_type;
+
+		/* DCD event record data */
+		__entry->hostid = le16_to_cpu(rec->host_id);
+		__entry->partition_id = rec->partition_index;
+		__entry->dpa_start = le64_to_cpu(rec->extent.start_dpa);
+		__entry->length = le64_to_cpu(rec->extent.length);
+		memcpy(__entry->uuid, &rec->extent.uuid, UUID_SIZE);
+		__entry->sh_extent_seq = le16_to_cpu(rec->extent.shared_extn_seq);
+	),
+
+	CXL_EVT_TP_printk("event_type='%s' host_id='%d' partition_id='%d' " \
+		"starting_dpa=%llx length=%llx tag=%pU " \
+		"shared_extent_sequence=%d",
+		show_dc_evt_type(__entry->event_type),
+		__entry->hostid,
+		__entry->partition_id,
+		__entry->dpa_start,
+		__entry->length,
+		__entry->uuid,
+		__entry->sh_extent_seq
+	)
+);
+
 #endif /* _CXL_EVENTS_H */
 
 #define TRACE_INCLUDE_FILE trace
-- 
2.43.0



* [PATCH 18/20] tools/testing/cxl: Make event logs dynamic
  2026-04-11  1:22 [RFC PATCH 00/20] DCD: Remove support for sparse regions & add tags Anisa Su
                   ` (16 preceding siblings ...)
  2026-04-11  1:23 ` [PATCH 17/20] cxl/mem: Trace Dynamic capacity Event Record Anisa Su
@ 2026-04-11  1:23 ` Anisa Su
  2026-04-11  1:23 ` [PATCH 19/20] tools/testing/cxl: Add DC Regions to mock mem data Anisa Su
                   ` (4 subsequent siblings)
  22 siblings, 0 replies; 30+ messages in thread
From: Anisa Su @ 2026-04-11  1:23 UTC (permalink / raw)
  To: linux-cxl
  Cc: john, dave.jiang, gourry, dave, jonathan.cameron,
	alison.schofield, ira.weiny, dan.j.williams, Jonathan Cameron

From: Ira Weiny <ira.weiny@intel.com>

The event log test was built on static arrays as an easy way to mock
events.  Dynamic Capacity Device (DCD) test support requires that events
be generated dynamically when extents are created or destroyed.

The current event log test has specific checks for the number of events
seen, including log overflow.

Modify mock event logs to be dynamically allocated.  Adjust array size
and mock event entry data to match the output expected by the existing
event test.

Use the static event data to create the dynamic events in the new logs
without inventing complex event injection for the previous tests.
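The static records have to be copied because mes_add_event() now writes a handle into each record and frees it with devm_kfree() on overflow, so every log entry needs its own allocation. A minimal user-space analogue follows, with malloc()/memcpy() standing in for devm_kmemdup() and a trimmed, hypothetical struct rec in place of struct cxl_event_record_raw.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, trimmed-down event record. */
struct rec {
	unsigned short handle;
	unsigned char data[30];
};

/* Read-only template, like the static mock events in mem.c. */
static const struct rec template_rec = { .handle = 0, .data = { 0xab } };

/* Heap copy of a template so each log entry can be modified and freed. */
static struct rec *rec_from_static(const struct rec *tmpl)
{
	struct rec *r = malloc(sizeof(*r));

	if (r)
		memcpy(r, tmpl, sizeof(*r));
	return r;
}
```

Stamping a handle into one copy leaves the template and the other copies untouched.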

Simplify log processing by using the event log array index as the
handle.  Add a lock to manage the concurrency that arises once user
space is allowed to control DCD extents.
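The handle scheme amounts to a ring buffer indexed by 1-based handles, with one handle always left empty so a full ring can be told apart from an empty one. The sketch below mirrors event_inc_handle() and the full check in mes_add_event() in user space; struct ring, ring_add(), ring_clear() and the small ARRAY_SIZE_SLOTS value are hypothetical and chosen only to make wrap-around easy to follow.

```c
#include <assert.h>
#include <stdbool.h>

#define ARRAY_SIZE_SLOTS 5   /* index 0 is never used as a handle */

/* Mirrors event_inc_handle(): advance, wrapping past the reserved 0. */
static unsigned inc_handle(unsigned h)
{
	h = (h + 1) % ARRAY_SIZE_SLOTS;
	return h ? h : 1;
}

struct ring {
	unsigned current;   /* oldest un-cleared handle */
	unsigned last;      /* next handle to fill */
	unsigned overflow;  /* events dropped because the ring was full */
	int slot[ARRAY_SIZE_SLOTS];
};

static void ring_init(struct ring *r)
{
	r->current = r->last = 1;
	r->overflow = 0;
}

/* Returns the handle assigned to @v, or 0 if the ring was full. */
static unsigned ring_add(struct ring *r, int v)
{
	unsigned h = r->last;

	if (inc_handle(r->last) == r->current) {
		r->overflow++;
		return 0;
	}
	r->slot[h] = v;
	r->last = inc_handle(h);
	return h;
}

/* Clear the oldest event; returns false if the ring is empty. */
static bool ring_clear(struct ring *r, int *v)
{
	if (r->current == r->last)
		return false;
	*v = r->slot[r->current];
	r->current = inc_handle(r->current);
	return true;
}
```

With 5 slots the ring holds 3 events. By the same arithmetic, a CXL_TEST_EVENT_ARRAY_SIZE of 17 leaves room for 15 pending events per mock log.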

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes:
[iweiny: rebase to 6.15-rc1]
---
 tools/testing/cxl/test/mem.c | 265 +++++++++++++++++++++--------------
 1 file changed, 161 insertions(+), 104 deletions(-)

diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index 271c7ad8cc32..fe1dadddd18e 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -142,18 +142,26 @@ static struct {
 
 #define PASS_TRY_LIMIT 3
 
-#define CXL_TEST_EVENT_CNT_MAX 15
+#define CXL_TEST_EVENT_CNT_MAX 16
+/* 1 extra slot to accommodate that handles can't be 0 */
+#define CXL_TEST_EVENT_ARRAY_SIZE (CXL_TEST_EVENT_CNT_MAX + 1)
 
 /* Set a number of events to return at a time for simulation.  */
 #define CXL_TEST_EVENT_RET_MAX 4
 
+/*
+ * @last_handle: last handle (index) to have an entry stored
+ * @current_handle: current handle (index) to be returned to the user on get_event
+ * @nr_overflow: number of events added past the log size
+ * @lock: protect these state variables
+ * @events: array of pending events to be returned.
+ */
 struct mock_event_log {
-	u16 clear_idx;
-	u16 cur_idx;
-	u16 nr_events;
+	u16 last_handle;
+	u16 current_handle;
 	u16 nr_overflow;
-	u16 overflow_reset;
-	struct cxl_event_record_raw *events[CXL_TEST_EVENT_CNT_MAX];
+	rwlock_t lock;
+	struct cxl_event_record_raw *events[CXL_TEST_EVENT_ARRAY_SIZE];
 };
 
 struct mock_event_store {
@@ -194,56 +202,65 @@ static struct mock_event_log *event_find_log(struct device *dev, int log_type)
 	return &mdata->mes.mock_logs[log_type];
 }
 
-static struct cxl_event_record_raw *event_get_current(struct mock_event_log *log)
-{
-	return log->events[log->cur_idx];
-}
-
-static void event_reset_log(struct mock_event_log *log)
-{
-	log->cur_idx = 0;
-	log->clear_idx = 0;
-	log->nr_overflow = log->overflow_reset;
-}
-
 /* Handle can never be 0 use 1 based indexing for handle */
-static u16 event_get_clear_handle(struct mock_event_log *log)
+static u16 event_inc_handle(u16 handle)
 {
-	return log->clear_idx + 1;
+	handle = (handle + 1) % CXL_TEST_EVENT_ARRAY_SIZE;
+	if (handle == 0)
+		handle = 1;
+	return handle;
 }
 
-/* Handle can never be 0 use 1 based indexing for handle */
-static __le16 event_get_cur_event_handle(struct mock_event_log *log)
-{
-	u16 cur_handle = log->cur_idx + 1;
-
-	return cpu_to_le16(cur_handle);
-}
-
-static bool event_log_empty(struct mock_event_log *log)
-{
-	return log->cur_idx == log->nr_events;
-}
-
-static void mes_add_event(struct mock_event_store *mes,
+/* Add the event or free it on overflow */
+static void mes_add_event(struct cxl_mockmem_data *mdata,
 			  enum cxl_event_log_type log_type,
 			  struct cxl_event_record_raw *event)
 {
+	struct device *dev = mdata->mds->cxlds.dev;
 	struct mock_event_log *log;
 
 	if (WARN_ON(log_type >= CXL_EVENT_TYPE_MAX))
 		return;
 
-	log = &mes->mock_logs[log_type];
+	log = &mdata->mes.mock_logs[log_type];
+
+	guard(write_lock)(&log->lock);
 
-	if ((log->nr_events + 1) > CXL_TEST_EVENT_CNT_MAX) {
+	dev_dbg(dev, "Add log %d cur %d last %d\n",
+		log_type, log->current_handle, log->last_handle);
+
+	/* Check next buffer */
+	if (event_inc_handle(log->last_handle) == log->current_handle) {
 		log->nr_overflow++;
-		log->overflow_reset = log->nr_overflow;
+		dev_dbg(dev, "Overflowing log %d nr %d\n",
+			log_type, log->nr_overflow);
+		devm_kfree(dev, event);
 		return;
 	}
 
-	log->events[log->nr_events] = event;
-	log->nr_events++;
+	dev_dbg(dev, "Log %d; handle %u\n", log_type, log->last_handle);
+	event->event.generic.hdr.handle = cpu_to_le16(log->last_handle);
+	log->events[log->last_handle] = event;
+	log->last_handle = event_inc_handle(log->last_handle);
+}
+
+static void mes_del_event(struct device *dev,
+			  struct mock_event_log *log,
+			  u16 handle)
+{
+	struct cxl_event_record_raw *record;
+
+	lockdep_assert(lockdep_is_held(&log->lock));
+
+	dev_dbg(dev, "Clearing event %u; record %u\n",
+		handle, log->current_handle);
+	record = log->events[handle];
+	if (!record)
+		dev_err(dev, "Mock event index %u empty?\n", handle);
+
+	log->events[handle] = NULL;
+	log->current_handle = event_inc_handle(log->current_handle);
+	devm_kfree(dev, record);
 }
 
 /*
@@ -257,6 +274,7 @@ static int mock_get_event(struct device *dev, struct cxl_mbox_cmd *cmd)
 	struct cxl_get_event_payload *pl;
 	struct mock_event_log *log;
 	int ret_limit;
+	u16 handle;
 	u8 log_type;
 	int i;
 
@@ -276,22 +294,31 @@ static int mock_get_event(struct device *dev, struct cxl_mbox_cmd *cmd)
 	memset(cmd->payload_out, 0, struct_size(pl, records, 0));
 
 	log = event_find_log(dev, log_type);
-	if (!log || event_log_empty(log))
+	if (!log)
 		return 0;
 
 	pl = cmd->payload_out;
 
-	for (i = 0; i < ret_limit && !event_log_empty(log); i++) {
-		memcpy(&pl->records[i], event_get_current(log),
-		       sizeof(pl->records[i]));
-		pl->records[i].event.generic.hdr.handle =
-				event_get_cur_event_handle(log);
-		log->cur_idx++;
+	guard(read_lock)(&log->lock);
+
+	handle = log->current_handle;
+	dev_dbg(dev, "Get log %d handle %u last %u\n",
+		log_type, handle, log->last_handle);
+	for (i = 0; i < ret_limit && handle != log->last_handle;
+	     i++, handle = event_inc_handle(handle)) {
+		struct cxl_event_record_raw *cur;
+
+		cur = log->events[handle];
+		dev_dbg(dev, "Sending event log %d handle %d idx %u\n",
+			log_type, le16_to_cpu(cur->event.generic.hdr.handle),
+			handle);
+		memcpy(&pl->records[i], cur, sizeof(pl->records[i]));
+		pl->records[i].event.generic.hdr.handle = cpu_to_le16(handle);
 	}
 
 	cmd->size_out = struct_size(pl, records, i);
 	pl->record_count = cpu_to_le16(i);
-	if (!event_log_empty(log))
+	if (handle != log->last_handle)
 		pl->flags |= CXL_GET_EVENT_FLAG_MORE_RECORDS;
 
 	if (log->nr_overflow) {
@@ -313,8 +340,8 @@ static int mock_get_event(struct device *dev, struct cxl_mbox_cmd *cmd)
 static int mock_clear_event(struct device *dev, struct cxl_mbox_cmd *cmd)
 {
 	struct cxl_mbox_clear_event_payload *pl = cmd->payload_in;
-	struct mock_event_log *log;
 	u8 log_type = pl->event_log;
+	struct mock_event_log *log;
 	u16 handle;
 	int nr;
 
@@ -325,23 +352,20 @@ static int mock_clear_event(struct device *dev, struct cxl_mbox_cmd *cmd)
 	if (!log)
 		return 0; /* No mock data in this log */
 
-	/*
-	 * This check is technically not invalid per the specification AFAICS.
-	 * (The host could 'guess' handles and clear them in order).
-	 * However, this is not good behavior for the host so test it.
-	 */
-	if (log->clear_idx + pl->nr_recs > log->cur_idx) {
-		dev_err(dev,
-			"Attempting to clear more events than returned!\n");
-		return -EINVAL;
-	}
+	guard(write_lock)(&log->lock);
 
 	/* Check handle order prior to clearing events */
-	for (nr = 0, handle = event_get_clear_handle(log);
-	     nr < pl->nr_recs;
-	     nr++, handle++) {
+	handle = log->current_handle;
+	for (nr = 0; nr < pl->nr_recs && handle != log->last_handle;
+	     nr++, handle = event_inc_handle(handle)) {
+
+		dev_dbg(dev, "Checking clear of %d handle %u plhandle %u\n",
+			log_type, handle,
+			le16_to_cpu(pl->handles[nr]));
+
 		if (handle != le16_to_cpu(pl->handles[nr])) {
-			dev_err(dev, "Clearing events out of order\n");
+			dev_err(dev, "Clearing events out of order %u %u\n",
+				handle, le16_to_cpu(pl->handles[nr]));
 			return -EINVAL;
 		}
 	}
@@ -350,25 +374,12 @@ static int mock_clear_event(struct device *dev, struct cxl_mbox_cmd *cmd)
 		log->nr_overflow = 0;
 
 	/* Clear events */
-	log->clear_idx += pl->nr_recs;
-	return 0;
-}
-
-static void cxl_mock_event_trigger(struct device *dev)
-{
-	struct cxl_mockmem_data *mdata = dev_get_drvdata(dev);
-	struct mock_event_store *mes = &mdata->mes;
-	int i;
-
-	for (i = CXL_EVENT_TYPE_INFO; i < CXL_EVENT_TYPE_MAX; i++) {
-		struct mock_event_log *log;
+	for (nr = 0; nr < pl->nr_recs; nr++)
+		mes_del_event(dev, log, le16_to_cpu(pl->handles[nr]));
+	dev_dbg(dev, "Delete log %d cur %d last %d\n",
+		log_type, log->current_handle, log->last_handle);
 
-		log = event_find_log(dev, i);
-		if (log)
-			event_reset_log(log);
-	}
-
-	cxl_mem_get_event_records(mdata->mds, mes->ev_status);
+	return 0;
 }
 
 struct cxl_event_record_raw maint_needed = {
@@ -509,8 +520,27 @@ static int mock_set_timestamp(struct cxl_dev_state *cxlds,
 	return 0;
 }
 
-static void cxl_mock_add_event_logs(struct mock_event_store *mes)
+/* Create a dynamically allocated event out of a statically defined event. */
+static void add_event_from_static(struct cxl_mockmem_data *mdata,
+				  enum cxl_event_log_type log_type,
+				  struct cxl_event_record_raw *raw)
 {
+	struct device *dev = mdata->mds->cxlds.dev;
+	struct cxl_event_record_raw *rec;
+
+	rec = devm_kmemdup(dev, raw, sizeof(*rec), GFP_KERNEL);
+	if (!rec) {
+		dev_err(dev, "Failed to alloc event for log\n");
+		return;
+	}
+	mes_add_event(mdata, log_type, rec);
+}
+
+static void cxl_mock_add_event_logs(struct cxl_mockmem_data *mdata)
+{
+	struct mock_event_store *mes = &mdata->mes;
+	struct device *dev = mdata->mds->cxlds.dev;
+
 	put_unaligned_le16(CXL_GMER_VALID_CHANNEL | CXL_GMER_VALID_RANK |
 			   CXL_GMER_VALID_COMPONENT | CXL_GMER_VALID_COMPONENT_ID_FORMAT,
 			   &gen_media.rec.media_hdr.validity_flags);
@@ -523,43 +553,60 @@ static void cxl_mock_add_event_logs(struct mock_event_store *mes)
 	put_unaligned_le16(CXL_MMER_VALID_COMPONENT | CXL_MMER_VALID_COMPONENT_ID_FORMAT,
 			   &mem_module.rec.validity_flags);
 
-	mes_add_event(mes, CXL_EVENT_TYPE_INFO, &maint_needed);
-	mes_add_event(mes, CXL_EVENT_TYPE_INFO,
+	dev_dbg(dev, "Generating fake event logs %d\n",
+		CXL_EVENT_TYPE_INFO);
+	add_event_from_static(mdata, CXL_EVENT_TYPE_INFO, &maint_needed);
+	add_event_from_static(mdata, CXL_EVENT_TYPE_INFO,
 		      (struct cxl_event_record_raw *)&gen_media);
-	mes_add_event(mes, CXL_EVENT_TYPE_INFO,
+	add_event_from_static(mdata, CXL_EVENT_TYPE_INFO,
 		      (struct cxl_event_record_raw *)&mem_module);
 	mes->ev_status |= CXLDEV_EVENT_STATUS_INFO;
 
-	mes_add_event(mes, CXL_EVENT_TYPE_FAIL, &maint_needed);
-	mes_add_event(mes, CXL_EVENT_TYPE_FAIL, &hardware_replace);
-	mes_add_event(mes, CXL_EVENT_TYPE_FAIL,
+	dev_dbg(dev, "Generating fake event logs %d\n",
+		CXL_EVENT_TYPE_FAIL);
+	add_event_from_static(mdata, CXL_EVENT_TYPE_FAIL, &maint_needed);
+	add_event_from_static(mdata, CXL_EVENT_TYPE_FAIL,
+		      (struct cxl_event_record_raw *)&mem_module);
+	add_event_from_static(mdata, CXL_EVENT_TYPE_FAIL, &hardware_replace);
+	add_event_from_static(mdata, CXL_EVENT_TYPE_FAIL,
 		      (struct cxl_event_record_raw *)&dram);
-	mes_add_event(mes, CXL_EVENT_TYPE_FAIL,
+	add_event_from_static(mdata, CXL_EVENT_TYPE_FAIL,
 		      (struct cxl_event_record_raw *)&gen_media);
-	mes_add_event(mes, CXL_EVENT_TYPE_FAIL,
+	add_event_from_static(mdata, CXL_EVENT_TYPE_FAIL,
 		      (struct cxl_event_record_raw *)&mem_module);
-	mes_add_event(mes, CXL_EVENT_TYPE_FAIL, &hardware_replace);
-	mes_add_event(mes, CXL_EVENT_TYPE_FAIL,
+	add_event_from_static(mdata, CXL_EVENT_TYPE_FAIL, &hardware_replace);
+	add_event_from_static(mdata, CXL_EVENT_TYPE_FAIL,
 		      (struct cxl_event_record_raw *)&dram);
 	/* Overflow this log */
-	mes_add_event(mes, CXL_EVENT_TYPE_FAIL, &hardware_replace);
-	mes_add_event(mes, CXL_EVENT_TYPE_FAIL, &hardware_replace);
-	mes_add_event(mes, CXL_EVENT_TYPE_FAIL, &hardware_replace);
-	mes_add_event(mes, CXL_EVENT_TYPE_FAIL, &hardware_replace);
-	mes_add_event(mes, CXL_EVENT_TYPE_FAIL, &hardware_replace);
-	mes_add_event(mes, CXL_EVENT_TYPE_FAIL, &hardware_replace);
-	mes_add_event(mes, CXL_EVENT_TYPE_FAIL, &hardware_replace);
-	mes_add_event(mes, CXL_EVENT_TYPE_FAIL, &hardware_replace);
-	mes_add_event(mes, CXL_EVENT_TYPE_FAIL, &hardware_replace);
-	mes_add_event(mes, CXL_EVENT_TYPE_FAIL, &hardware_replace);
+	add_event_from_static(mdata, CXL_EVENT_TYPE_FAIL, &hardware_replace);
+	add_event_from_static(mdata, CXL_EVENT_TYPE_FAIL, &hardware_replace);
+	add_event_from_static(mdata, CXL_EVENT_TYPE_FAIL, &hardware_replace);
+	add_event_from_static(mdata, CXL_EVENT_TYPE_FAIL, &hardware_replace);
+	add_event_from_static(mdata, CXL_EVENT_TYPE_FAIL, &hardware_replace);
+	add_event_from_static(mdata, CXL_EVENT_TYPE_FAIL, &hardware_replace);
+	add_event_from_static(mdata, CXL_EVENT_TYPE_FAIL, &hardware_replace);
+	add_event_from_static(mdata, CXL_EVENT_TYPE_FAIL, &hardware_replace);
+	add_event_from_static(mdata, CXL_EVENT_TYPE_FAIL, &hardware_replace);
+	add_event_from_static(mdata, CXL_EVENT_TYPE_FAIL, &hardware_replace);
 	mes->ev_status |= CXLDEV_EVENT_STATUS_FAIL;
 
-	mes_add_event(mes, CXL_EVENT_TYPE_FATAL, &hardware_replace);
-	mes_add_event(mes, CXL_EVENT_TYPE_FATAL,
+	dev_dbg(dev, "Generating fake event logs %d\n",
+		CXL_EVENT_TYPE_FATAL);
+	add_event_from_static(mdata, CXL_EVENT_TYPE_FATAL, &hardware_replace);
+	add_event_from_static(mdata, CXL_EVENT_TYPE_FATAL,
 		      (struct cxl_event_record_raw *)&dram);
 	mes->ev_status |= CXLDEV_EVENT_STATUS_FATAL;
 }
 
+static void cxl_mock_event_trigger(struct device *dev)
+{
+	struct cxl_mockmem_data *mdata = dev_get_drvdata(dev);
+	struct mock_event_store *mes = &mdata->mes;
+
+	cxl_mock_add_event_logs(mdata);
+	cxl_mem_get_event_records(mdata->mds, mes->ev_status);
+}
+
 static int mock_gsl(struct cxl_mbox_cmd *cmd)
 {
 	if (cmd->size_out < sizeof(mock_gsl_payload))
@@ -1684,6 +1731,14 @@ static void cxl_mock_test_feat_init(struct cxl_mockmem_data *mdata)
 	mdata->test_feat.data = cpu_to_le32(0xdeadbeef);
 }
 
+static void init_event_log(struct mock_event_log *log)
+{
+	rwlock_init(&log->lock);
+	/* Handle can never be 0; use 1-based indexing for handle */
+	log->current_handle = 1;
+	log->last_handle = 1;
+}
+
 static int cxl_mock_mem_probe(struct platform_device *pdev)
 {
 	struct device *dev = &pdev->dev;
@@ -1767,7 +1822,9 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
 	if (rc)
 		dev_dbg(dev, "No CXL Features discovered\n");
 
-	cxl_mock_add_event_logs(&mdata->mes);
+	for (int i = 0; i < CXL_EVENT_TYPE_MAX; i++)
+		init_event_log(&mdata->mes.mock_logs[i]);
+	cxl_mock_add_event_logs(mdata);
 
 	cxlmd = devm_cxl_add_memdev(cxlds, NULL);
 	if (IS_ERR(cxlmd))
-- 
2.43.0


* [PATCH 19/20] tools/testing/cxl: Add DC Regions to mock mem data
  2026-04-11  1:22 [RFC PATCH 00/20] DCD: Remove support for sparse regions & add tags Anisa Su
                   ` (17 preceding siblings ...)
  2026-04-11  1:23 ` [PATCH 18/20] tools/testing/cxl: Make event logs dynamic Anisa Su
@ 2026-04-11  1:23 ` Anisa Su
  2026-04-11  1:23 ` [PATCH 20/20] dax/bus.c: make DC regions driver type DAXDRV_DEVICE_TYPE Anisa Su
                   ` (3 subsequent siblings)
  22 siblings, 0 replies; 30+ messages in thread
From: Anisa Su @ 2026-04-11  1:23 UTC (permalink / raw)
  To: linux-cxl
  Cc: john, dave.jiang, gourry, dave, jonathan.cameron,
	alison.schofield, ira.weiny, dan.j.williams, Jonathan Cameron

From: Ira Weiny <ira.weiny@intel.com>

cxl_test provides a good way to ensure quick smoke and regression
testing.  The complexity of Dynamic Capacity (DC) extent processing as
well as the complexity of the new sparse DAX regions can mostly be
tested through cxl_test.  This includes management of sparse regions and
DAX devices on those regions; the management of extent device lifetimes;
and the processing of DCD events.

The only missing functionality from this test is actual interrupt
processing.

Mock memory devices can easily mock DC information and manage fake
extent data.
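As a reading aid, the three extent states the mock tracks (FM, sent, accepted, per the comment in struct cxl_mockmem_data) can be modeled as a small state machine. This is a hypothetical userspace sketch, not cxl_test code: enum ext_state, struct mock_ext, ext_transition() and ext_clear_sent() are illustrative names, and a flat array stands in for the xarrays the patch uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the mock extent states; not cxl_test code. */
enum ext_state {
	EXT_NONE,	/* slot unused or extent dropped */
	EXT_FM,		/* injected via sysfs, not yet sent to the host */
	EXT_SENT,	/* add-capacity event logged, awaiting host response */
	EXT_ACCEPTED,	/* listed by the host in its Add DC Response */
};

struct mock_ext {
	uint64_t start;
	uint64_t len;
	enum ext_state state;
};

/* Perform one transition; refuse it if the extent is not in 'from'. */
static bool ext_transition(struct mock_ext *e, enum ext_state from,
			   enum ext_state to)
{
	if (e->state != from)
		return false;
	e->state = to;
	return true;
}

/*
 * Once the host has responded, anything still in EXT_SENT was not
 * accepted and is dropped. Returns the number of extents dropped.
 */
static size_t ext_clear_sent(struct mock_ext *exts, size_t n)
{
	size_t dropped = 0;

	for (size_t i = 0; i < n; i++) {
		if (exts[i].state == EXT_SENT) {
			exts[i].state = EXT_NONE;
			dropped++;
		}
	}
	return dropped;
}
```

In the patch, the host releasing an accepted extent (release_accepted_extent()) corresponds to the EXT_ACCEPTED -> EXT_NONE transition in this model.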

Define mock_dc_partition information within the mock memory data.  Add
sysfs entries on the mock device to inject and delete extents.

The inject format (dc_inject_extent, dc_inject_shared_extent) is
<start>:<length>:<tag>:<more_flag>.
The delete format (dc_del_extent, dc_force_del_extent) is <start>:<length>.
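A minimal userspace sketch of parsing these two formats, assuming base-0 numbers (decimal or 0x-prefixed hex, as kstrtoull(..., 0, ...) accepts in the patch), an optional trailing newline from echo, and a tag of at most 16 bytes. parse_inject(), parse_delete() and struct inject_req are hypothetical helpers. Unlike the patch, which copies min(16, strlen(tag)) bytes, this sketch rejects longer tags; also note that strtoull() is more lenient than kstrtoull() (it skips leading whitespace and accepts a sign).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define TAG_MAX 16	/* "Tag can be any string up to 16 bytes." */

struct inject_req {
	unsigned long long start;
	unsigned long long length;
	char tag[TAG_MAX + 1];
	bool more;
};

/* Parse the base-0 number occupying exactly [s, end). */
static int parse_num(const char *s, const char *end, unsigned long long *out)
{
	char *stop;

	if (s == end)
		return -EINVAL;
	errno = 0;
	*out = strtoull(s, &stop, 0);
	if (errno || stop != end)
		return -EINVAL;
	return 0;
}

/* End of the last field: stop at an optional trailing newline. */
static const char *field_end(const char *s)
{
	return s + strcspn(s, "\n");
}

/* <start>:<length>:<tag>:<more_flag> */
static int parse_inject(const char *buf, struct inject_req *req)
{
	const char *len_s, *tag_s, *more_s;
	unsigned long long more;
	size_t tag_len;

	len_s = strchr(buf, ':');
	tag_s = len_s ? strchr(len_s + 1, ':') : NULL;
	more_s = tag_s ? strchr(tag_s + 1, ':') : NULL;
	if (!more_s)
		return -EINVAL;

	if (parse_num(buf, len_s, &req->start) ||
	    parse_num(len_s + 1, tag_s, &req->length) ||
	    parse_num(more_s + 1, field_end(more_s + 1), &more))
		return -EINVAL;

	tag_len = (size_t)(more_s - (tag_s + 1));
	if (tag_len > TAG_MAX)
		return -EINVAL;
	memcpy(req->tag, tag_s + 1, tag_len);
	req->tag[tag_len] = '\0';
	req->more = more != 0;
	return 0;
}

/* <start>:<length> */
static int parse_delete(const char *buf, unsigned long long *start,
			unsigned long long *length)
{
	const char *len_s = strchr(buf, ':');

	if (!len_s)
		return -EINVAL;
	if (parse_num(buf, len_s, start) ||
	    parse_num(len_s + 1, field_end(len_s + 1), length))
		return -EINVAL;
	return 0;
}
```

With this sketch, the string "0x80000000:0x10000000:fam1:0" parses to start 0x80000000, length 256 MiB, tag "fam1" and more = false.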

Simulate interrupts by calling the event record processing path
(cxl_mem_get_event_records()) directly to process the test extents.

Add DC mailbox commands to the CEL and implement those commands.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes:
[iweiny: rebase]
[djbw: s/region/partition/]
[iweiny: s/tag/uuid/]
---
 tools/testing/cxl/test/mem.c | 753 +++++++++++++++++++++++++++++++++++
 1 file changed, 753 insertions(+)

diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index fe1dadddd18e..6e3d97dce09e 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -20,6 +20,7 @@
 #define FW_SLOTS 3
 #define DEV_SIZE SZ_2G
 #define EFFECT(x) (1U << x)
+#define BASE_DYNAMIC_CAP_DPA DEV_SIZE
 
 #define MOCK_INJECT_DEV_MAX 8
 #define MOCK_INJECT_TEST_MAX 128
@@ -113,6 +114,22 @@ static struct cxl_cel_entry mock_cel[] = {
 				      EFFECT(SECURITY_CHANGE_IMMEDIATE) |
 				      EFFECT(BACKGROUND_OP)),
 	},
+	{
+		.opcode = cpu_to_le16(CXL_MBOX_OP_GET_DC_CONFIG),
+		.effect = CXL_CMD_EFFECT_NONE,
+	},
+	{
+		.opcode = cpu_to_le16(CXL_MBOX_OP_GET_DC_EXTENT_LIST),
+		.effect = CXL_CMD_EFFECT_NONE,
+	},
+	{
+		.opcode = cpu_to_le16(CXL_MBOX_OP_ADD_DC_RESPONSE),
+		.effect = cpu_to_le16(EFFECT(CONF_CHANGE_IMMEDIATE)),
+	},
+	{
+		.opcode = cpu_to_le16(CXL_MBOX_OP_RELEASE_DC),
+		.effect = cpu_to_le16(EFFECT(CONF_CHANGE_IMMEDIATE)),
+	},
 };
 
 /* See CXL 2.0 Table 181 Get Health Info Output Payload */
@@ -173,6 +190,8 @@ struct vendor_test_feat {
 	__le32 data;
 } __packed;
 
+#define NUM_MOCK_DC_REGIONS 2
+
 struct cxl_mockmem_data {
 	void *lsa;
 	void *fw;
@@ -191,6 +210,20 @@ struct cxl_mockmem_data {
 	unsigned long sanitize_timeout;
 	struct vendor_test_feat test_feat;
 	u8 shutdown_state;
+
+	struct cxl_dc_partition dc_partitions[NUM_MOCK_DC_REGIONS];
+	u32 dc_ext_generation;
+	struct mutex ext_lock;
+
+	/*
+	 * Extents are in 1 of 3 states
+	 * FM (sysfs added but not sent to the host yet)
+	 * sent (sent to the host but not accepted)
+	 * accepted (by the host)
+	 */
+	struct xarray dc_fm_extents;
+	struct xarray dc_sent_extents;
+	struct xarray dc_accepted_exts;
 };
 
 static struct mock_event_log *event_find_log(struct device *dev, int log_type)
@@ -607,6 +640,251 @@ static void cxl_mock_event_trigger(struct device *dev)
 	cxl_mem_get_event_records(mdata->mds, mes->ev_status);
 }
 
+struct cxl_extent_data {
+	u64 dpa_start;
+	u64 length;
+	u8 uuid[UUID_SIZE];
+	bool shared;
+};
+
+static int __devm_add_extent(struct device *dev, struct xarray *array,
+			     u64 start, u64 length, const char *uuid,
+			     bool shared)
+{
+	struct cxl_extent_data *extent;
+
+	extent = devm_kzalloc(dev, sizeof(*extent), GFP_KERNEL);
+	if (!extent)
+		return -ENOMEM;
+
+	extent->dpa_start = start;
+	extent->length = length;
+	memcpy(extent->uuid, uuid, min(sizeof(extent->uuid), strlen(uuid)));
+	extent->shared = shared;
+
+	if (xa_insert(array, start, extent, GFP_KERNEL)) {
+		devm_kfree(dev, extent);
+		dev_err(dev, "Failed xarray insert %#llx\n", start);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int devm_add_fm_extent(struct device *dev, u64 start, u64 length,
+			      const char *uuid, bool shared)
+{
+	struct cxl_mockmem_data *mdata = dev_get_drvdata(dev);
+
+	guard(mutex)(&mdata->ext_lock);
+	return __devm_add_extent(dev, &mdata->dc_fm_extents, start, length,
+				 uuid, shared);
+}
+
+/* It is known that ext and the new range are not equal */
+static struct cxl_extent_data *
+split_ext(struct device *dev, struct xarray *array,
+	  struct cxl_extent_data *ext, u64 start, u64 length)
+{
+	u64 new_start, new_length;
+
+	if (ext->dpa_start == start) {
+		new_start = start + length;
+		new_length = (ext->dpa_start + ext->length) - new_start;
+
+		if (__devm_add_extent(dev, array, new_start, new_length,
+				      ext->uuid, false))
+			return NULL;
+
+		ext = xa_erase(array, ext->dpa_start);
+		if (__devm_add_extent(dev, array, start, length, ext->uuid,
+				      false))
+			return NULL;
+
+		return xa_load(array, start);
+	}
+
+	/* ext->dpa_start != start */
+
+	if (__devm_add_extent(dev, array, start, length, ext->uuid, false))
+		return NULL;
+
+	new_start = ext->dpa_start;
+	new_length = start - ext->dpa_start;
+
+	ext = xa_erase(array, ext->dpa_start);
+	if (__devm_add_extent(dev, array, new_start, new_length, ext->uuid,
+			      false))
+		return NULL;
+
+	return xa_load(array, start);
+}
+
+/*
+ * Do not handle extents which are not inside a single extent sent to
+ * the host.
+ */
+static struct cxl_extent_data *
+find_create_ext(struct device *dev, struct xarray *array, u64 start, u64 length)
+{
+	struct cxl_extent_data *ext;
+	unsigned long index;
+
+	xa_for_each(array, index, ext) {
+		u64 end = start + length;
+
+		/* start < [ext) <= start */
+		if (start < ext->dpa_start ||
+		    (ext->dpa_start + ext->length) <= start)
+			continue;
+
+		if (end <= ext->dpa_start ||
+		    (ext->dpa_start + ext->length) < end) {
+			dev_err(dev, "Invalid range %#llx-%#llx\n", start,
+				end);
+			return NULL;
+		}
+
+		break;
+	}
+
+	if (!ext)
+		return NULL;
+
+	if (start == ext->dpa_start && length == ext->length)
+		return ext;
+
+	return split_ext(dev, array, ext, start, length);
+}
+
+static int dc_accept_extent(struct device *dev, u64 start, u64 length)
+{
+	struct cxl_mockmem_data *mdata = dev_get_drvdata(dev);
+	struct cxl_extent_data *ext;
+
+	dev_dbg(dev, "Host accepting extent %#llx\n", start);
+	mdata->dc_ext_generation++;
+
+	lockdep_assert_held(&mdata->ext_lock);
+	ext = find_create_ext(dev, &mdata->dc_sent_extents, start, length);
+	if (!ext) {
+		dev_err(dev, "Extent %#llx-%#llx not found\n",
+			start, start + length);
+		return -ENOMEM;
+	}
+	ext = xa_erase(&mdata->dc_sent_extents, ext->dpa_start);
+	return xa_insert(&mdata->dc_accepted_exts, start, ext, GFP_KERNEL);
+}
+
+static void release_dc_ext(void *md)
+{
+	struct cxl_mockmem_data *mdata = md;
+
+	xa_destroy(&mdata->dc_fm_extents);
+	xa_destroy(&mdata->dc_sent_extents);
+	xa_destroy(&mdata->dc_accepted_exts);
+}
+
+/* Pretend to have some previous accepted extents */
+struct pre_ext_info {
+	u64 offset;
+	u64 length;
+} pre_ext_info[] = {
+	{
+		.offset = SZ_128M,
+		.length = SZ_64M,
+	},
+	{
+		.offset = SZ_256M,
+		.length = SZ_64M,
+	},
+};
+
+static int devm_add_sent_extent(struct device *dev, u64 start, u64 length,
+				const char *tag, bool shared)
+{
+	struct cxl_mockmem_data *mdata = dev_get_drvdata(dev);
+
+	lockdep_assert_held(&mdata->ext_lock);
+	return __devm_add_extent(dev, &mdata->dc_sent_extents, start, length,
+				 tag, shared);
+}
+
+static int inject_prev_extents(struct device *dev, u64 base_dpa)
+{
+	struct cxl_mockmem_data *mdata = dev_get_drvdata(dev);
+	int rc;
+
+	dev_dbg(dev, "Adding %ld pre-extents for testing\n",
+		ARRAY_SIZE(pre_ext_info));
+
+	guard(mutex)(&mdata->ext_lock);
+	for (int i = 0; i < ARRAY_SIZE(pre_ext_info); i++) {
+		u64 ext_dpa = base_dpa + pre_ext_info[i].offset;
+		u64 ext_len = pre_ext_info[i].length;
+
+		dev_dbg(dev, "Adding pre-extent DPA:%#llx LEN:%#llx\n",
+			ext_dpa, ext_len);
+
+		rc = devm_add_sent_extent(dev, ext_dpa, ext_len, "", false);
+		if (rc) {
+			dev_err(dev, "Failed to add pre-extent DPA:%#llx LEN:%#llx; %d\n",
+				ext_dpa, ext_len, rc);
+			return rc;
+		}
+
+		rc = dc_accept_extent(dev, ext_dpa, ext_len);
+		if (rc)
+			return rc;
+	}
+	return 0;
+}
+
+static int cxl_mock_dc_partition_setup(struct device *dev)
+{
+	struct cxl_mockmem_data *mdata = dev_get_drvdata(dev);
+	u64 base_dpa = BASE_DYNAMIC_CAP_DPA;
+	u32 dsmad_handle = 0xFADE;
+	u64 decode_length = SZ_512M;
+	u64 block_size = SZ_512;
+	u64 length = SZ_512M;
+	int rc;
+
+	mutex_init(&mdata->ext_lock);
+	xa_init(&mdata->dc_fm_extents);
+	xa_init(&mdata->dc_sent_extents);
+	xa_init(&mdata->dc_accepted_exts);
+
+	rc = devm_add_action_or_reset(dev, release_dc_ext, mdata);
+	if (rc)
+		return rc;
+
+	for (int i = 0; i < NUM_MOCK_DC_REGIONS; i++) {
+		struct cxl_dc_partition *part = &mdata->dc_partitions[i];
+
+		dev_dbg(dev, "Creating DC partition DC%d DPA:%#llx LEN:%#llx\n",
+			i, base_dpa, length);
+
+		part->base = cpu_to_le64(base_dpa);
+		part->decode_length = cpu_to_le64(decode_length /
+						  CXL_CAPACITY_MULTIPLIER);
+		part->length = cpu_to_le64(length);
+		part->block_size = cpu_to_le64(block_size);
+		part->dsmad_handle = cpu_to_le32(dsmad_handle);
+		dsmad_handle++;
+
+		rc = inject_prev_extents(dev, base_dpa);
+		if (rc) {
+			dev_err(dev, "Failed to add pre-extents for DC%d\n", i);
+			return rc;
+		}
+
+		base_dpa += decode_length;
+	}
+
+	return 0;
+}
+
 static int mock_gsl(struct cxl_mbox_cmd *cmd)
 {
 	if (cmd->size_out < sizeof(mock_gsl_payload))
@@ -1582,6 +1860,192 @@ static int mock_get_supported_features(struct cxl_mockmem_data *mdata,
 	return 0;
 }
 
+static int mock_get_dc_config(struct device *dev,
+			      struct cxl_mbox_cmd *cmd)
+{
+	struct cxl_mbox_get_dc_config_in *dc_config = cmd->payload_in;
+	struct cxl_mockmem_data *mdata = dev_get_drvdata(dev);
+	u8 partition_requested, partition_start_idx, partition_ret_cnt;
+	struct cxl_mbox_get_dc_config_out *resp;
+	int i;
+
+	partition_requested = min(dc_config->partition_count, NUM_MOCK_DC_REGIONS);
+
+	if (cmd->size_out < struct_size(resp, partition, partition_requested))
+		return -EINVAL;
+
+	memset(cmd->payload_out, 0, cmd->size_out);
+	resp = cmd->payload_out;
+
+	partition_start_idx = dc_config->start_partition_index;
+	partition_ret_cnt = 0;
+	for (i = 0; i < NUM_MOCK_DC_REGIONS; i++) {
+		if (i >= partition_start_idx) {
+			memcpy(&resp->partition[partition_ret_cnt],
+				&mdata->dc_partitions[i],
+				sizeof(resp->partition[partition_ret_cnt]));
+			partition_ret_cnt++;
+		}
+	}
+	resp->avail_partition_count = NUM_MOCK_DC_REGIONS;
+	resp->partitions_returned = partition_ret_cnt;
+
+	dev_dbg(dev, "Returning %d dc partitions\n", partition_ret_cnt);
+	return 0;
+}
+
+static int mock_get_dc_extent_list(struct device *dev,
+				   struct cxl_mbox_cmd *cmd)
+{
+	struct cxl_mbox_get_extent_out *resp = cmd->payload_out;
+	struct cxl_mockmem_data *mdata = dev_get_drvdata(dev);
+	struct cxl_mbox_get_extent_in *get = cmd->payload_in;
+	u32 total_avail = 0, total_ret = 0;
+	struct cxl_extent_data *ext;
+	u32 ext_count, start_idx;
+	unsigned long i;
+
+	ext_count = le32_to_cpu(get->extent_cnt);
+	start_idx = le32_to_cpu(get->start_extent_index);
+
+	memset(resp, 0, sizeof(*resp));
+
+	guard(mutex)(&mdata->ext_lock);
+	/*
+	 * Total available needs to be calculated and returned regardless of
+	 * how many can actually be returned.
+	 */
+	xa_for_each(&mdata->dc_accepted_exts, i, ext)
+		total_avail++;
+
+	if (start_idx > total_avail)
+		return -EINVAL;
+
+	xa_for_each(&mdata->dc_accepted_exts, i, ext) {
+		if (total_ret >= ext_count)
+			break;
+
+		if (total_ret >= start_idx) {
+			resp->extent[total_ret].start_dpa =
+						cpu_to_le64(ext->dpa_start);
+			resp->extent[total_ret].length =
+						cpu_to_le64(ext->length);
+			memcpy(&resp->extent[total_ret].uuid, ext->uuid,
+					sizeof(resp->extent[total_ret].uuid));
+			total_ret++;
+		}
+	}
+
+	resp->returned_extent_count = cpu_to_le32(total_ret);
+	resp->total_extent_count = cpu_to_le32(total_avail);
+	resp->generation_num = cpu_to_le32(mdata->dc_ext_generation);
+
+	dev_dbg(dev, "Returning %d extents of %d total\n",
+		total_ret, total_avail);
+
+	return 0;
+}
+
+static void dc_clear_sent(struct device *dev)
+{
+	struct cxl_mockmem_data *mdata = dev_get_drvdata(dev);
+	struct cxl_extent_data *ext;
+	unsigned long index;
+
+	lockdep_assert_held(&mdata->ext_lock);
+
+	/* Any extents not accepted must be cleared */
+	xa_for_each(&mdata->dc_sent_extents, index, ext) {
+		dev_dbg(dev, "Host rejected extent %#llx\n", ext->dpa_start);
+		xa_erase(&mdata->dc_sent_extents, ext->dpa_start);
+	}
+}
+
+static int mock_add_dc_response(struct device *dev,
+				struct cxl_mbox_cmd *cmd)
+{
+	struct cxl_mockmem_data *mdata = dev_get_drvdata(dev);
+	struct cxl_mbox_dc_response *req = cmd->payload_in;
+	u32 list_size = le32_to_cpu(req->extent_list_size);
+
+	guard(mutex)(&mdata->ext_lock);
+	for (int i = 0; i < list_size; i++) {
+		u64 start = le64_to_cpu(req->extent_list[i].dpa_start);
+		u64 length = le64_to_cpu(req->extent_list[i].length);
+		int rc;
+
+		rc = dc_accept_extent(dev, start, length);
+		if (rc)
+			return rc;
+	}
+
+	dc_clear_sent(dev);
+	return 0;
+}
+
+static void dc_delete_extent(struct device *dev, unsigned long long start,
+			     unsigned long long length)
+{
+	struct cxl_mockmem_data *mdata = dev_get_drvdata(dev);
+	unsigned long long end = start + length;
+	struct cxl_extent_data *ext;
+	unsigned long index;
+
+	dev_dbg(dev, "Deleting extent at %#llx len:%#llx\n", start, length);
+
+	guard(mutex)(&mdata->ext_lock);
+	xa_for_each(&mdata->dc_fm_extents, index, ext) {
+		u64 extent_end = ext->dpa_start + ext->length;
+
+		/*
+		 * Any extent which 'touches' the released delete range will be
+		 * removed.
+		 */
+		if ((start <= ext->dpa_start && ext->dpa_start < end) ||
+		    (start <= extent_end && extent_end < end))
+			xa_erase(&mdata->dc_fm_extents, ext->dpa_start);
+	}
+
+	/*
+	 * If the extent was accepted let it be for the host to drop
+	 * later.
+	 */
+}
+
+static int release_accepted_extent(struct device *dev, u64 start, u64 length)
+{
+	struct cxl_mockmem_data *mdata = dev_get_drvdata(dev);
+	struct cxl_extent_data *ext;
+
+	guard(mutex)(&mdata->ext_lock);
+	ext = find_create_ext(dev, &mdata->dc_accepted_exts, start, length);
+	if (!ext) {
+		dev_err(dev, "Extent %#llx not in accepted state\n", start);
+		return -EINVAL;
+	}
+	xa_erase(&mdata->dc_accepted_exts, ext->dpa_start);
+	mdata->dc_ext_generation++;
+
+	return 0;
+}
+
+static int mock_dc_release(struct device *dev,
+			   struct cxl_mbox_cmd *cmd)
+{
+	struct cxl_mbox_dc_response *req = cmd->payload_in;
+	u32 list_size = le32_to_cpu(req->extent_list_size);
+
+	for (int i = 0; i < list_size; i++) {
+		u64 start = le64_to_cpu(req->extent_list[i].dpa_start);
+		u64 length = le64_to_cpu(req->extent_list[i].length);
+
+		dev_dbg(dev, "Extent %#llx released by host\n", start);
+		release_accepted_extent(dev, start, length);
+	}
+
+	return 0;
+}
+
 static int cxl_mock_mbox_send(struct cxl_mailbox *cxl_mbox,
 			      struct cxl_mbox_cmd *cmd)
 {
@@ -1673,6 +2137,18 @@ static int cxl_mock_mbox_send(struct cxl_mailbox *cxl_mbox,
 	case CXL_MBOX_OP_GET_SUPPORTED_FEATURES:
 		rc = mock_get_supported_features(mdata, cmd);
 		break;
+	case CXL_MBOX_OP_GET_DC_CONFIG:
+		rc = mock_get_dc_config(dev, cmd);
+		break;
+	case CXL_MBOX_OP_GET_DC_EXTENT_LIST:
+		rc = mock_get_dc_extent_list(dev, cmd);
+		break;
+	case CXL_MBOX_OP_ADD_DC_RESPONSE:
+		rc = mock_add_dc_response(dev, cmd);
+		break;
+	case CXL_MBOX_OP_RELEASE_DC:
+		rc = mock_dc_release(dev, cmd);
+		break;
 	case CXL_MBOX_OP_GET_FEATURE:
 		rc = mock_get_feature(mdata, cmd);
 		break;
@@ -1758,6 +2234,10 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
 		return -ENOMEM;
 	dev_set_drvdata(dev, mdata);
 
+	rc = cxl_mock_dc_partition_setup(dev);
+	if (rc)
+		return rc;
+
 	mdata->lsa = vmalloc(LSA_SIZE);
 	if (!mdata->lsa)
 		return -ENOMEM;
@@ -1814,6 +2294,9 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
 	if (rc)
 		return rc;
 
+	if (cxl_dcd_supported(mds))
+		cxl_configure_dcd(mds, &range_info);
+
 	rc = cxl_dpa_setup(cxlds, &range_info);
 	if (rc)
 		return rc;
@@ -1921,11 +2404,281 @@ static ssize_t sanitize_timeout_store(struct device *dev,
 
 static DEVICE_ATTR_RW(sanitize_timeout);
 
+/* Return if the proposed extent would break the test code */
+static bool new_extent_valid(struct device *dev, size_t new_start,
+			     size_t new_len)
+{
+	struct cxl_mockmem_data *mdata = dev_get_drvdata(dev);
+	struct cxl_extent_data *extent;
+	size_t new_end, i;
+
+	if (!new_len)
+		return false;
+
+	new_end = new_start + new_len;
+
+	dev_dbg(dev, "New extent %zx-%zx\n", new_start, new_end);
+
+	guard(mutex)(&mdata->ext_lock);
+	dev_dbg(dev, "Checking extents starts...\n");
+	xa_for_each(&mdata->dc_fm_extents, i, extent) {
+		if (extent->dpa_start == new_start)
+			return false;
+	}
+
+	dev_dbg(dev, "Checking sent extents starts...\n");
+	xa_for_each(&mdata->dc_sent_extents, i, extent) {
+		if (extent->dpa_start == new_start)
+			return false;
+	}
+
+	dev_dbg(dev, "Checking accepted extents starts...\n");
+	xa_for_each(&mdata->dc_accepted_exts, i, extent) {
+		if (extent->dpa_start == new_start)
+			return false;
+	}
+
+	return true;
+}
+
+struct cxl_test_dcd {
+	uuid_t id;
+	struct cxl_event_dcd rec;
+} __packed;
+
+struct cxl_test_dcd dcd_event_rec_template = {
+	.id = CXL_EVENT_DC_EVENT_UUID,
+	.rec = {
+		.hdr = {
+			.length = sizeof(struct cxl_test_dcd),
+		},
+	},
+};
+
+static int log_dc_event(struct cxl_mockmem_data *mdata, enum dc_event type,
+			u64 start, u64 length, const char *tag_str, bool more)
+{
+	struct device *dev = mdata->mds->cxlds.dev;
+	struct cxl_test_dcd *dcd_event;
+
+	dev_dbg(dev, "mock device log event %d\n", type);
+
+	dcd_event = devm_kmemdup(dev, &dcd_event_rec_template,
+				     sizeof(*dcd_event), GFP_KERNEL);
+	if (!dcd_event)
+		return -ENOMEM;
+
+	dcd_event->rec.flags = 0;
+	if (more)
+		dcd_event->rec.flags |= CXL_DCD_EVENT_MORE;
+	dcd_event->rec.event_type = type;
+	dcd_event->rec.extent.start_dpa = cpu_to_le64(start);
+	dcd_event->rec.extent.length = cpu_to_le64(length);
+	memcpy(dcd_event->rec.extent.uuid, tag_str,
+	       min(sizeof(dcd_event->rec.extent.uuid),
+		   strlen(tag_str)));
+
+	mes_add_event(mdata, CXL_EVENT_TYPE_DCD,
+		      (struct cxl_event_record_raw *)dcd_event);
+
+	/* Fake the irq */
+	cxl_mem_get_event_records(mdata->mds, CXLDEV_EVENT_STATUS_DCD);
+
+	return 0;
+}
+
+static void mark_extent_sent(struct device *dev, unsigned long long start)
+{
+	struct cxl_mockmem_data *mdata = dev_get_drvdata(dev);
+	struct cxl_extent_data *ext;
+
+	guard(mutex)(&mdata->ext_lock);
+	ext = xa_erase(&mdata->dc_fm_extents, start);
+	if (xa_insert(&mdata->dc_sent_extents, ext->dpa_start, ext, GFP_KERNEL))
+		dev_err(dev, "Failed to mark extent %#llx sent\n", ext->dpa_start);
+}
+
+/*
+ * Format <start>:<length>:<tag>:<more_flag>
+ *
+ * start and length must be a multiple of the configured partition block size.
+ * Tag can be any string up to 16 bytes.
+ *
+ * Extents must be exclusive of other extents
+ *
+ * If the more flag is specified it is expected that an additional extent will
+ * be specified without the more flag to complete the test transaction with the
+ * host.
+ */
+static ssize_t __dc_inject_extent_store(struct device *dev,
+					struct device_attribute *attr,
+					const char *buf, size_t count,
+					bool shared)
+{
+	struct cxl_mockmem_data *mdata = dev_get_drvdata(dev);
+	unsigned long long start, length, more;
+	char *len_str, *uuid_str, *more_str;
+	size_t buf_len = count;
+	int rc;
+
+	char *start_str __free(kfree) = kstrdup(buf, GFP_KERNEL);
+	if (!start_str)
+		return -ENOMEM;
+
+	len_str = strnchr(start_str, buf_len, ':');
+	if (!len_str) {
+		dev_err(dev, "Extent failed to find len_str: %s\n", start_str);
+		return -EINVAL;
+	}
+
+	*len_str = '\0';
+	len_str += 1;
+	buf_len -= strlen(start_str);
+
+	uuid_str = strnchr(len_str, buf_len, ':');
+	if (!uuid_str) {
+		dev_err(dev, "Extent failed to find uuid_str: %s\n", len_str);
+		return -EINVAL;
+	}
+	*uuid_str = '\0';
+	uuid_str += 1;
+
+	more_str = strnchr(uuid_str, buf_len, ':');
+	if (!more_str) {
+		dev_err(dev, "Extent failed to find more_str: %s\n", uuid_str);
+		return -EINVAL;
+	}
+	*more_str = '\0';
+	more_str += 1;
+
+	if (kstrtoull(start_str, 0, &start)) {
+		dev_err(dev, "Extent failed to parse start: %s\n", start_str);
+		return -EINVAL;
+	}
+
+	if (kstrtoull(len_str, 0, &length)) {
+		dev_err(dev, "Extent failed to parse length: %s\n", len_str);
+		return -EINVAL;
+	}
+
+	if (kstrtoull(more_str, 0, &more)) {
+		dev_err(dev, "Extent failed to parse more: %s\n", more_str);
+		return -EINVAL;
+	}
+
+	if (!new_extent_valid(dev, start, length))
+		return -EINVAL;
+
+	rc = devm_add_fm_extent(dev, start, length, uuid_str, shared);
+	if (rc) {
+		dev_err(dev, "Failed to add extent DPA:%#llx LEN:%#llx; %d\n",
+			start, length, rc);
+		return rc;
+	}
+
+	mark_extent_sent(dev, start);
+	rc = log_dc_event(mdata, DCD_ADD_CAPACITY, start, length, uuid_str, more);
+	if (rc) {
+		dev_err(dev, "Failed to add event %d\n", rc);
+		return rc;
+	}
+
+	return count;
+}
+
+static ssize_t dc_inject_extent_store(struct device *dev,
+				      struct device_attribute *attr,
+				      const char *buf, size_t count)
+{
+	return __dc_inject_extent_store(dev, attr, buf, count, false);
+}
+static DEVICE_ATTR_WO(dc_inject_extent);
+
+static ssize_t dc_inject_shared_extent_store(struct device *dev,
+					     struct device_attribute *attr,
+					     const char *buf, size_t count)
+{
+	return __dc_inject_extent_store(dev, attr, buf, count, true);
+}
+static DEVICE_ATTR_WO(dc_inject_shared_extent);
+
+static ssize_t __dc_del_extent_store(struct device *dev,
+				     struct device_attribute *attr,
+				     const char *buf, size_t count,
+				     enum dc_event type)
+{
+	struct cxl_mockmem_data *mdata = dev_get_drvdata(dev);
+	unsigned long long start, length;
+	char *len_str;
+	int rc;
+
+	char *start_str __free(kfree) = kstrdup(buf, GFP_KERNEL);
+	if (!start_str)
+		return -ENOMEM;
+
+	len_str = strnchr(start_str, count, ':');
+	if (!len_str) {
+		dev_err(dev, "Failed to find len_str: %s\n", start_str);
+		return -EINVAL;
+	}
+	*len_str = '\0';
+	len_str += 1;
+
+	if (kstrtoull(start_str, 0, &start)) {
+		dev_err(dev, "Failed to parse start: %s\n", start_str);
+		return -EINVAL;
+	}
+
+	if (kstrtoull(len_str, 0, &length)) {
+		dev_err(dev, "Failed to parse length: %s\n", len_str);
+		return -EINVAL;
+	}
+
+	dc_delete_extent(dev, start, length);
+
+	if (type == DCD_FORCED_CAPACITY_RELEASE)
+		dev_dbg(dev, "Forcing delete of extent %#llx len:%#llx\n",
+			start, length);
+
+	rc = log_dc_event(mdata, type, start, length, "", false);
+	if (rc) {
+		dev_err(dev, "Failed to add event %d\n", rc);
+		return rc;
+	}
+
+	return count;
+}
+
+/*
+ * Format <start>:<length>
+ */
+static ssize_t dc_del_extent_store(struct device *dev,
+				   struct device_attribute *attr,
+				   const char *buf, size_t count)
+{
+	return __dc_del_extent_store(dev, attr, buf, count,
+				     DCD_RELEASE_CAPACITY);
+}
+static DEVICE_ATTR_WO(dc_del_extent);
+
+static ssize_t dc_force_del_extent_store(struct device *dev,
+					 struct device_attribute *attr,
+					 const char *buf, size_t count)
+{
+	return __dc_del_extent_store(dev, attr, buf, count,
+				     DCD_FORCED_CAPACITY_RELEASE);
+}
+static DEVICE_ATTR_WO(dc_force_del_extent);
+
 static struct attribute *cxl_mock_mem_attrs[] = {
 	&dev_attr_security_lock.attr,
 	&dev_attr_event_trigger.attr,
 	&dev_attr_fw_buf_checksum.attr,
 	&dev_attr_sanitize_timeout.attr,
+	&dev_attr_dc_inject_extent.attr,
+	&dev_attr_dc_inject_shared_extent.attr,
+	&dev_attr_dc_del_extent.attr,
+	&dev_attr_dc_force_del_extent.attr,
 	NULL
 };
 ATTRIBUTE_GROUPS(cxl_mock_mem);
-- 
2.43.0


* [PATCH 20/20] dax/bus.c: make DC regions driver type DAXDRV_DEVICE_TYPE
  2026-04-11  1:22 [RFC PATCH 00/20] DCD: Remove support for sparse regions & add tags Anisa Su
                   ` (18 preceding siblings ...)
  2026-04-11  1:23 ` [PATCH 19/20] tools/testing/cxl: Add DC Regions to mock mem data Anisa Su
@ 2026-04-11  1:23 ` Anisa Su
  2026-04-11  5:05 ` [RFC PATCH 00/20] DCD: Remove support for sparse regions & add tags Gregory Price
                   ` (2 subsequent siblings)
  22 siblings, 0 replies; 30+ messages in thread
From: Anisa Su @ 2026-04-11  1:23 UTC (permalink / raw)
  To: linux-cxl
  Cc: john, dave.jiang, gourry, dave, jonathan.cameron,
	alison.schofield, ira.weiny, dan.j.williams, Anisa Su

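Change the driver type of DC regions from DAXDRV_KMEM_TYPE to
DAXDRV_DEVICE_TYPE so they can be bound to the fsdev driver.
dax_match_type() now treats a region as DAXDRV_KMEM_TYPE only if
IORESOURCE_DAX_KMEM is set and IORESOURCE_DAX_SPARSE_CAP is not.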
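The resulting match rule can be sketched in userspace as below. The flag values and names (DAX_KMEM_FLAG, DAX_SPARSE_CAP_FLAG, region_driver_type(), driver_matches()) are illustrative stand-ins for IORESOURCE_DAX_KMEM, IORESOURCE_DAX_SPARSE_CAP and dax_match_type(); the sketch assumes, as the hunk implies, that DC regions carry the sparse-capacity flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bits standing in for IORESOURCE_DAX_KMEM / _SPARSE_CAP. */
#define DAX_KMEM_FLAG		(1u << 0)
#define DAX_SPARSE_CAP_FLAG	(1u << 1)

enum drv_type { DRV_DEVICE, DRV_KMEM };

/* Mirrors the patched rule: KMEM only if KMEM is set and SPARSE_CAP is not. */
static enum drv_type region_driver_type(unsigned int flags)
{
	if ((flags & DAX_KMEM_FLAG) && !(flags & DAX_SPARSE_CAP_FLAG))
		return DRV_KMEM;
	return DRV_DEVICE;
}

/* A driver binds only when its type equals the region's type. */
static bool driver_matches(enum drv_type drv, unsigned int flags)
{
	return drv == region_driver_type(flags);
}
```

Defaulting to the device type and promoting to KMEM only on an explicit combination keeps any region that carries the sparse-capacity flag bindable by device-type drivers such as fsdev.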

Signed-off-by: Anisa Su <anisa.su@samsung.com>
---
 drivers/dax/bus.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 2ada38fa7dca..a9b25046d845 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -71,7 +71,8 @@ static int dax_match_type(const struct dax_device_driver *dax_drv, struct device
 	enum dax_driver_type type = DAXDRV_DEVICE_TYPE;
 	struct dev_dax *dev_dax = to_dev_dax(dev);
 
-	if (dev_dax->region->res.flags & IORESOURCE_DAX_KMEM)
+	if (dev_dax->region->res.flags & IORESOURCE_DAX_KMEM &&
+	    !(dev_dax->region->res.flags & IORESOURCE_DAX_SPARSE_CAP))
 		type = DAXDRV_KMEM_TYPE;
 
 	if (dax_drv->type == type)
-- 
2.43.0


* Re: [RFC PATCH 00/20] DCD: Remove support for sparse regions & add tags
  2026-04-11  1:22 [RFC PATCH 00/20] DCD: Remove support for sparse regions & add tags Anisa Su
                   ` (19 preceding siblings ...)
  2026-04-11  1:23 ` [PATCH 20/20] dax/bus.c: make DC regions driver type DAXDRV_DEVICE_TYPE Anisa Su
@ 2026-04-11  5:05 ` Gregory Price
  2026-04-21 18:48   ` Anisa Su
  2026-04-21 15:30 ` John Groves
  2026-04-21 21:02 ` Alison Schofield
  22 siblings, 1 reply; 30+ messages in thread
From: Gregory Price @ 2026-04-11  5:05 UTC (permalink / raw)
  To: Anisa Su
  Cc: linux-cxl, john, dave.jiang, dave, jonathan.cameron,
	alison.schofield, ira.weiny, dan.j.williams, Anisa Su

On Fri, Apr 10, 2026 at 06:22:55PM -0700, Anisa Su wrote:
> This RFC modifies Ira's DCD patchset to enforce tags and remove sparse DAX region
> support, which are the requirements that I've understood from the community
> meetings, with the end goal of FAMFS as the end user for DCD. Feedback would be
> greatly appreciated to let me know if this is on the right track? Or totally off
> the mark...
> 
> Everything is the same as before except:
> - extents must have tags (uuids)
> - 1 tag per region
> - regions must be contiguous (no more sparse regions)
> 
> To achieve this, the main thing is to change the relationship between
> cxl_dax_region : region_extent from 1 : many to 1 : 1. Each region_extent is
> comprised of 1+ contiguous device extents with the same tag. Contiguity
> is enforced by sorting device extents by DPA order.

Are you saying they need to be contiguous in DPA? I don't think that's
what we want; in fact, that somewhat defeats the entire point
of DCD being allocator-like.

We don't much care what DPA space the extents take up so long as the
total size of the set of extents matches the size of the HPA region.

Sorting by DPA at least gets you extent-ordering for free as a
pre-agreed upon mechanism, so that's fine.

Basically: we don't want sparse regions where the HPA space has holes
(i.e. an extent hasn't been sent by the device to back the hole yet),
but we don't really care what DPA actually backs those HPA regions.

Unless my take on how DCD "should" work is wildly different from the
general understanding.

~Gregory
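A hypothetical sketch of the model described in this reply: device extents may sit anywhere in DPA space, sorting them by DPA gives both sides a pre-agreed order, and the region is usable only when the extent lengths add up exactly to the HPA region size, so the HPA range has no holes. struct dpa_extent and layout_extents() are illustrative names; the overlap check is an added assumption that the thread does not discuss.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct dpa_extent {
	uint64_t dpa;
	uint64_t len;
};

static int cmp_dpa(const void *a, const void *b)
{
	const struct dpa_extent *x = a, *y = b;

	return (x->dpa > y->dpa) - (x->dpa < y->dpa);
}

/*
 * Sort extents by DPA (the pre-agreed order), then lay them out back to
 * back in HPA space. hpa_off[i] receives the offset of the i-th sorted
 * extent within the region. Returns 0 only if the extents exactly fill
 * region_size, i.e. the HPA range has no holes; -1 otherwise.
 */
static int layout_extents(struct dpa_extent *ext, size_t n,
			  uint64_t region_size, uint64_t *hpa_off)
{
	uint64_t off = 0;

	qsort(ext, n, sizeof(*ext), cmp_dpa);
	for (size_t i = 0; i < n; i++) {
		/* Assumption: overlapping device extents are invalid. */
		if (i > 0 && ext[i].dpa < ext[i - 1].dpa + ext[i - 1].len)
			return -1;
		/* Reject zero-length extents and ones that overrun the region. */
		if (ext[i].len == 0 || ext[i].len > region_size - off)
			return -1;
		hpa_off[i] = off;
		off += ext[i].len;
	}
	return off == region_size ? 0 : -1;
}
```

Note that the two extents need not be adjacent in DPA: extents at DPA 0x10000000 and 0x40000000, each 256 MiB, fill a 512 MiB HPA region at offsets 0 and 0x10000000.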

* Re: [RFC PATCH 00/20] DCD: Remove support for sparse regions & add tags
  2026-04-11  1:22 [RFC PATCH 00/20] DCD: Remove support for sparse regions & add tags Anisa Su
                   ` (20 preceding siblings ...)
  2026-04-11  5:05 ` [RFC PATCH 00/20] DCD: Remove support for sparse regions & add tags Gregory Price
@ 2026-04-21 15:30 ` John Groves
  2026-04-21 17:42   ` Anisa Su
  2026-04-22  3:14   ` John Groves
  2026-04-21 21:02 ` Alison Schofield
  22 siblings, 2 replies; 30+ messages in thread
From: John Groves @ 2026-04-21 15:30 UTC (permalink / raw)
  To: Anisa Su
  Cc: linux-cxl, dave.jiang, gourry, dave, jonathan.cameron,
	alison.schofield, ira.weiny, dan.j.williams, Anisa Su

On 26/04/10 06:22PM, Anisa Su wrote:
> This RFC modifies Ira's DCD patchset to enforce tags and remove sparse DAX region
> support, which are the requirements I've understood from the community
> meetings, with FAMFS as the intended end user of DCD. Feedback on whether
> this is on the right track (or totally off the mark) would be greatly
> appreciated.
> 
> Everything is the same as before except:
> - extents must have tags (uuids)
> - 1 tag per region
> - regions must be contiguous (no more sparse regions)
> 
> To achieve this, the main change is to make the relationship between
> cxl_dax_region : region_extent 1 : 1 instead of 1 : many. Each region_extent
> is composed of one or more contiguous device extents with the same tag.
> Contiguity is enforced by sorting the device extents by DPA. For the response,
> they are re-sorted into the order in which they were originally sent, as
> required by the spec.
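A sketch of the DPA-contiguity rule as the cover letter describes it: sort by DPA, require each extent to start exactly where the previous one ends, then restore arrival order so the response can be built in that order. `struct dc_rec`, its `seq` field and `dpa_contiguous()` are hypothetical, assuming the arrival order is recorded when each event record is received.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record; seq is the order in which the device sent it. */
struct dc_rec {
	uint64_t dpa;
	uint64_t len;
	unsigned int seq;
};

static int by_dpa(const void *a, const void *b)
{
	const struct dc_rec *x = a, *y = b;

	return (x->dpa > y->dpa) - (x->dpa < y->dpa);
}

static int by_seq(const void *a, const void *b)
{
	const struct dc_rec *x = a, *y = b;

	return (x->seq > y->seq) - (x->seq < y->seq);
}

/*
 * Returns true if the records form one DPA-contiguous run. In both cases
 * the array is left in its original arrival order for the response.
 */
static bool dpa_contiguous(struct dc_rec *r, size_t n)
{
	bool ok = true;

	if (n == 0)
		return false;
	qsort(r, n, sizeof(*r), by_dpa);
	for (size_t i = 1; i < n; i++)
		if (r[i - 1].dpa + r[i - 1].len != r[i].dpa)
			ok = false; /* gap or overlap in DPA */
	qsort(r, n, sizeof(*r), by_seq); /* restore arrival order */
	return ok;
}
```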
> 
> Once valid extents have been collected, they are passed as one contiguous
> capacity to the DAX layer via cxl_dax_region notify(). Once notified, the
> region cannot be added to again unless all of its extents are released.
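A sketch of the add-once rule, assuming a hypothetical per-region `struct dc_region_state` (not the driver's structures): the first validated capacity is handed over, and later adds fail with `-EBUSY` until everything is released.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-region bookkeeping, not the driver's structures. */
struct dc_region_state {
	bool notified; /* capacity already handed to the DAX layer */
	uint64_t size; /* bytes handed over in that single notification */
};

/* Hand a validated, contiguous capacity to DAX exactly once. */
static int dc_region_add(struct dc_region_state *s, uint64_t size)
{
	if (s->notified)
		return -EBUSY; /* already populated: the region cannot grow */
	if (size == 0)
		return -EINVAL;
	s->notified = true;
	s->size = size;
	return 0;
}

/* Releasing all extents returns the region to its empty state. */
static void dc_region_release_all(struct dc_region_state *s)
{
	s->notified = false;
	s->size = 0;
}
```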
> 
> For release: upon receiving a release event record, if the extent is within the
> bounds of any cxl_region and has the correct tag, then all extents in that
> region are released, so the "More" flag is still ignored. I'm not sure this is
> the right way to do it, but it was the simplest.
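A sketch of the release check as described, with hypothetical types `struct dc_tag` and `struct dc_region_desc`. It assumes the released extent and the region bounds are expressed in the same address space. A `true` result means the caller would release every extent in the region, not just the one named in the record.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte tag (UUID), as carried by tagged extents. */
struct dc_tag {
	uint8_t b[16];
};

/* Hypothetical view of a region: one address range and one tag. */
struct dc_region_desc {
	uint64_t start;    /* first address covered by the region */
	uint64_t len;      /* bytes covered */
	struct dc_tag tag; /* the single tag this region accepts */
};

/* True if [start, start + len) lies inside the region and the tags match. */
static bool release_hits_region(const struct dc_region_desc *r,
				uint64_t start, uint64_t len,
				const struct dc_tag *tag)
{
	uint64_t off;

	if (len == 0 || start < r->start)
		return false;
	off = start - r->start;
	if (off >= r->len || len > r->len - off)
		return false; /* not fully inside the region */
	return memcmp(r->tag.b, tag->b, sizeof(tag->b)) == 0;
}
```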
> 
> The DAX layer changes remain untouched, as all of this extra validation is done
> in the CXL layer. And since FAMFS already handles the devdax -> fsdev conversion,
> nothing needed to be added there.
> 
> Most of the series remains unchanged, as I've tried not to make too many big
> changes right off the bat. Only the following commits were modified:
> - cxl/extent: Process dynamic partition events and realize region extents
> - dax/region: Create resources on DAX regions
> - cxl/region: Read existing extents on region creation
> 
> I've tacked on 1 commit at the end to change the driver type of DC regions from
> DAXDRV_KMEM_TYPE to DAXDRV_DEVICE_TYPE so it can be bound to the new fsdev driver.
> 
> Also, the commit messages of the modified commits document in more detail
> exactly what was changed, so I hope that's clear.
> 
> ================================================================================
> Git History
> 
> This series is based on cxl-next, with base commit:
> 3939dba00f98 Merge branch 'for-7.1/cxl-misc' into cxl-for-next
> + bug fix: https://lore.kernel.org/linux-cxl/20260411011137.43545-1-anisa.su@samsung.com/T/#u
> 
> GH Branch: https://github.com/anisa-su993/anisa-linux-kernel/tree/dcd-rfc-04-10-26
> 
> It doesn't apply cleanly onto famfs-v9, although I have the version that's applied
> onto famfs-v9 here: https://github.com/anisa-su993/anisa-linux-kernel/tree/famfs-v9-dcd
> - famfs-v9 for reference: https://github.com/jagalactic/linux/tree/famfs-v9
> 
> I've tested the current series without famfs as well as the series applied on
> famfs-v9 with famfs.
> ================================================================================
> Testing:
> 
> This patchset was tested with Ali's QEMU patchset adding tag support:
> https://lore.kernel.org/linux-cxl/20260325184259.366-1-alireza.sanaee@huawei.com/T/#t
> 
> Details:
> Topology: '-object memory-backend-file,id=cxl-mem1,mem-path=/tmp/t3_cxl1.raw,size=12G \
>      -object memory-backend-file,id=cxl-lsa1,mem-path=/tmp/t3_lsa1.raw,size=1G \
>      -device usb-ehci,id=ehci \
>      -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1,hdm_for_passthrough=true \
>      -device cxl-rp,port=0,bus=cxl.1,id=cxl_rp_port0,chassis=0,slot=2 \
>      -device cxl-type3,bus=cxl_rp_port0,id=cxl-dcd0,dc-regions-total-size=12G,num-dc-regions=1,sn=99 \
>      -device usb-cxl-mctp,bus=ehci.0,id=usb1,target=cxl-dcd0\
>      -machine cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=12G,cxl-fmw.0.interleave-granularity=1k'
> 
> 1. Start VM (12GB)
> 2. Issue QMP to add tagged backend (8GB):
> { "execute": "qmp_capabilities" }
> {
>     "execute": "object-add",
>     "arguments": {
>         "qom-type": "memory-backend-ram",
>         "id": "tm0",
>         "size": 8589934592,
>         "share": true,
>         "tag": "5be13bce-ae34-4a77-b6c3-16df975fcf1a"
>     }
> }
> 3. Create region on the VM: cxl create-region -m -d decoder0.0 -w 1 -s 8G mem0 -t dynamic_ram_a
> 4. Issue QMP to add an 8GB extent:
> { "execute": "qmp_capabilities" }
> {
>     "execute": "cxl-add-dynamic-capacity",
>     "arguments": {
>         "path": "/machine/peripheral/cxl-dcd0",
>         "host-id": 0,
>         "selection-policy": "prescriptive",
>         "region": 0,
>         "tag": "5be13bce-ae34-4a77-b6c3-16df975fcf1a",
>         "extents": [
>             {
>                 "offset": 0,
>                 "len": 8589934592
>             }
>         ]
>     }
> }
> 5. Verify with sysfs:
> root@bgt-140510-bm03:~# cat /sys/bus/cxl/devices/dax_region0/extent0.0/offset
> 0x0
> root@bgt-140510-bm03:~# cat /sys/bus/cxl/devices/dax_region0/extent0.0/length
> 0x200000000
> root@bgt-140510-bm03:~# cat /sys/bus/cxl/devices/dax_region0/extent0.0/uuid
> 5be13bce-ae34-4a77-b6c3-16df975fcf1a
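A small userspace sketch for checking these attributes programmatically. The helper names `parse_sysfs_u64()` and `read_sysfs_u64()` are hypothetical; the path follows the sysfs layout shown above. Note that 0x200000000 is 8589934592 bytes (8 GiB), matching the `len` in the QMP request.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse a sysfs value such as "0x200000000\n" (base auto-detected). */
static bool parse_sysfs_u64(const char *s, uint64_t *out)
{
	char *end;
	unsigned long long v;

	errno = 0;
	v = strtoull(s, &end, 0);
	if (errno || end == s || (*end != '\0' && *end != '\n'))
		return false;
	*out = v;
	return true;
}

/* Read one attribute, e.g. .../dax_region0/extent0.0/length. */
static bool read_sysfs_u64(const char *path, uint64_t *out)
{
	char buf[64];
	FILE *f = fopen(path, "r");
	bool ok;

	if (!f)
		return false;
	ok = fgets(buf, sizeof(buf), f) && parse_sysfs_u64(buf, out);
	fclose(f);
	return ok;
}
```

On the test VM, `read_sysfs_u64("/sys/bus/cxl/devices/dax_region0/extent0.0/length", &len)` would be expected to yield a value equal to `8ULL << 30`.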
> 
> 6. daxctl create-device -r region0
> [
>   {
>     "chardev":"dax0.1",
>     "size":8589934592,
>     "target_node":1,
>     "align":2097152,
>     "mode":"devdax"
>   }
> ]
> created 1 device
> 
> Currently, QEMU only supports sending 1 extent in an add/release request, which
> limits what I can test. However, I was able to verify that once extent(s) have
> been added to a region, it can't be added to again (size cannot be increased).
> 
> Everything up to this point is what I tested with this patchset alone. Below
> are the additional famfs tests for the version applied on famfs-v9.
> ================================================================================
> 7. Install famfs userspace tool:
> https://github.com/cxl-micron-reskit/famfs
> 
> 8. mkfs.famfs --v /dev/dax0.1 output:
> devsize: 8589934592
> Famfs Superblock:
>   Filesystem UUID:   c33b4525-a2c7-4d64-9204-e8ed273b4ffb
>   Device UUID:       ae887e6b-f886-45f9-bc55-f0696f3cd91d
>   System UUID:       da314140-12e7-45c2-98b2-753d3bfe4f46
>   role of this node: Owner
>   alloc_unit:        0x200000
>   OMF major version: 2
>   OMF minor version: 1
>   sizeof superblock: 200
>   log size (bytes):  8388608
>   primary: /dev/dax0.1   8589934592
> 
> Log stats:
>   # of log entries in use: 0 of 15420
>   Log size in use:          48
>   Log size (total bytes)    8388608
>   No allocation errors found
> 
> Capacity:
>   Device capacity:        8.00G
>   Bitmap capacity:        8.00G
>   Sum of file sizes:      0.00G
>   Allocated space:        0.01G
>   Free space:             7.99G
>   Space amplification:     inf
>   Percent used:            0.1%
> 
> Famfs log:
>   0 of 15420 entries used
>   0 bad log entries detected
>   0 files
>   0 directories
> 
> 9. The famfs smoke tests also succeed. They include some fio tests,
> which run simulated workloads:
> 
> :== Test Timing Summary
> :==-------------------------------------------------------------------
> :==  prepare              0:10
> :==  test0                0:07
> :==  test_shadow_yaml     0:04
> :==  test1                0:22
> :==  test2                0:12
> :==  test3                0:03
> :==  test4                0:10
> :==  test_errors          0:01
> :==  stripe_test          0:58
> :==  test_pcq             1:26
> :==  test_fio             0:34
> :==-------------------------------------------------------------------
> :==  TOTAL                4:27
> :==-------------------------------------------------------------------
> :==run_smoke completed successfully (Thu Apr  9 10:33:28 PM UTC 2026)
> 
> Anisa Su (1):
>   dax/bus.c: make DC regions driver type DAXDRV_DEVICE_TYPE
> 
> Ira Weiny (19):
>   cxl/mbox: Flag support for Dynamic Capacity Devices (DCD)
>   cxl/mem: Read dynamic capacity configuration from the device
>   cxl/cdat: Gather DSMAS data for DCD partitions
>   cxl/core: Enforce partition order/simplify partition calls
>   cxl/mem: Expose dynamic ram A partition in sysfs
>   cxl/port: Add 'dynamic_ram_a' to endpoint decoder mode
>   cxl/region: Add sparse DAX region support
>   cxl/events: Split event msgnum configuration from irq setup
>   cxl/pci: Factor out interrupt policy check
>   cxl/mem: Configure dynamic capacity interrupts
>   cxl/core: Return endpoint decoder information from region search
>   cxl/extent: Process dynamic partition events and realize region
>     extents
>   cxl/region/extent: Expose region extent information in sysfs
>   dax/bus: Factor out dev dax resize logic
>   dax/region: Create resources on DAX regions
>   cxl/region: Read existing extents on region creation
>   cxl/mem: Trace Dynamic capacity Event Record
>   tools/testing/cxl: Make event logs dynamic
>   tools/testing/cxl: Add DC Regions to mock mem data
> 
>  Documentation/ABI/testing/sysfs-bus-cxl |  100 ++-
>  drivers/cxl/core/Makefile               |    2 +-
>  drivers/cxl/core/cdat.c                 |   11 +
>  drivers/cxl/core/core.h                 |   47 +-
>  drivers/cxl/core/extent.c               |  471 +++++++++++
>  drivers/cxl/core/hdm.c                  |   13 +-
>  drivers/cxl/core/mbox.c                 |  770 ++++++++++++++++-
>  drivers/cxl/core/memdev.c               |   87 +-
>  drivers/cxl/core/port.c                 |    5 +
>  drivers/cxl/core/region.c               |   43 +-
>  drivers/cxl/core/region_dax.c           |    6 +
>  drivers/cxl/core/trace.h                |   65 ++
>  drivers/cxl/cxl.h                       |   60 +-
>  drivers/cxl/cxlmem.h                    |  124 ++-
>  drivers/cxl/mem.c                       |    2 +-
>  drivers/cxl/pci.c                       |  115 ++-
>  drivers/dax/bus.c                       |  360 ++++++--
>  drivers/dax/bus.h                       |    4 +-
>  drivers/dax/cxl.c                       |   71 +-
>  drivers/dax/dax-private.h               |   40 +
>  drivers/dax/hmem/hmem.c                 |    2 +-
>  drivers/dax/pmem.c                      |    2 +-
>  include/cxl/cxl.h                       |    6 +
>  include/cxl/event.h                     |   39 +
>  include/linux/ioport.h                  |    3 +
>  tools/testing/cxl/Kbuild                |    5 +-
>  tools/testing/cxl/test/mem.c            | 1018 ++++++++++++++++++++---
>  27 files changed, 3210 insertions(+), 261 deletions(-)
>  create mode 100644 drivers/cxl/core/extent.c
> 
> -- 
> 2.43.0
> 

What commit or branch does this apply on top of? Not v7.0...

Thanks!
John

* Re: [RFC PATCH 00/20] DCD: Remove support for sparse regions & add tags
  2026-04-21 15:30 ` John Groves
@ 2026-04-21 17:42   ` Anisa Su
  2026-04-22  3:14   ` John Groves
  1 sibling, 0 replies; 30+ messages in thread
From: Anisa Su @ 2026-04-21 17:42 UTC (permalink / raw)
  To: John Groves
  Cc: Anisa Su, linux-cxl, dave.jiang, gourry, dave, jonathan.cameron,
	alison.schofield, ira.weiny, dan.j.williams

On Tue, Apr 21, 2026 at 10:30:33AM -0500, John Groves wrote:
> On 26/04/10 06:22PM, Anisa Su wrote:
> > [...]
> 
> what commit or branch does this apply on top of? Not v7.0...
> 
This series was based on cxl-for-next: 3939dba00f98 Merge branch 'for-7.1/cxl-misc'.
But I made a mistake while resolving merge conflicts during the rebase and
dropped one line of code, which I need to add back in. Sorry about that!

But I also have a version of this that's based on famfs-v9, which I used to run
the famfs smoke tests. It's on github:
https://github.com/anisa-su993/anisa-linux-kernel/tree/famfs-v9-dcd

Or should I send you the patches separately?

- Anisa
> Thanks!
> John

* Re: [RFC PATCH 00/20] DCD: Remove support for sparse regions & add tags
  2026-04-11  5:05 ` [RFC PATCH 00/20] DCD: Remove support for sparse regions & add tags Gregory Price
@ 2026-04-21 18:48   ` Anisa Su
  2026-04-23 20:43     ` Ira Weiny
  0 siblings, 1 reply; 30+ messages in thread
From: Anisa Su @ 2026-04-21 18:48 UTC (permalink / raw)
  To: Gregory Price
  Cc: Anisa Su, linux-cxl, john, dave.jiang, dave, jonathan.cameron,
	alison.schofield, ira.weiny, dan.j.williams

On Sat, Apr 11, 2026 at 01:05:07AM -0400, Gregory Price wrote:
> On Fri, Apr 10, 2026 at 06:22:55PM -0700, Anisa Su wrote:
> > This RFC modifies Ira's DCD patchset to enforce tags and remove sparse DAX region
> > support, which are the requirements that I've understood from the community
> > meetings, with the end goal of FAMFS as the end user for DCD. Feedback would be
> > greatly appreciated to let me know if this is on the right track? Or totally off
> > the mark...
> > 
> > Everything is the same as before except:
> > - extents must have tags (uuids)
> > - 1 tag per region
> > - regions must be contiguous (no more sparse regions)
> > 
> > To achieve this, the main thing is to change the relationship between
> > cxl_dax_region : region_extent from 1 : many to 1 : 1. Each region_extent is
> > comprised of 1+ contiguous device extents with the same tag. Contiguity
> > is enforced by sorting device extents by DPA order.
> 
> Are you saying they need to be contiguous in DPA?  I don't think that's
> what we want - in fact that actually somewhat defeats the entire point
> of DCD being allocator-like.
> 
> We don't much care what DPA space the extents take up so long as the
> total size of the set of extents matches the size of the HPA region.
> 
> Sorting by DPA at least gets you extent-ordering for free as a
> pre-agreed upon mechanism, so that's fine.
> 
> Basically: we don't want sparse regions where the HPA space has holes
> (i.e. an extent hasn't been sent by the device to back the hole yet),
> but we don't really care what DPA actually backs those HPA regions.
> 
> Unless my take on how DCD "should" work is wildly different from the
> general understanding.
> 
> ~Gregory

Belated follow-up to document some of the discussion from last Monday's DAX call:

Instead of enforcing contiguous DPA, is what we mean by "no sparse regions"
that each DAX *device* needs to be fully backed by extents?

- The CXL region/DAX region itself can be made of multiple DAX devices, and
  there can be HPA gaps between the devices, so the CXL region can be "sparse"
- But each DAX device must be fully backed (non-sparse)
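A sketch of the per-device check implied by these bullets: holes are allowed between DAX devices in a region, but the extents assigned to one device must tile its range exactly. `struct dev_span` and `dax_dev_fully_backed()` are hypothetical, and span offsets are assumed to already be relative to the start of the device.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical: an extent already mapped to an offset within a DAX device. */
struct dev_span {
	uint64_t off;
	uint64_t len;
};

static int by_off(const void *a, const void *b)
{
	const struct dev_span *x = a, *y = b;

	return (x->off > y->off) - (x->off < y->off);
}

/*
 * True if the spans tile [0, dev_size) exactly: no holes, no overlaps.
 * Holes may exist between DAX devices in the region, never inside one.
 */
static bool dax_dev_fully_backed(struct dev_span *s, size_t n,
				 uint64_t dev_size)
{
	uint64_t cursor = 0;

	if (n == 0)
		return dev_size == 0;
	qsort(s, n, sizeof(*s), by_off);
	for (size_t i = 0; i < n; i++) {
		if (s[i].off != cursor)
			return false; /* hole (off > cursor) or overlap (off < cursor) */
		cursor += s[i].len;
	}
	return cursor == dev_size;
}
```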

We potentially want to do something like "daxctl prepare-device daxX.Y" to set
up an "empty" DAX device? Then as extents come in, make sure they cover the full
range of the device?
    - Add some kind of timeout for receiving the extents that back the device
    - If the device is not fully backed within the timeout, release all extents and destroy the device
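One way the proposed prepare-device timeout could be modeled, as a hypothetical sketch only: the prepare-device flow is just a proposal in this thread, and every name below is made up for illustration. The caller supplies a monotonic timestamp, and `backed` is assumed to count bytes already verified as non-overlapping.

```c
#include <stdint.h>

/* Hypothetical lifecycle of a DAX device created by "prepare-device". */
enum dev_state {
	DEV_PENDING, /* created empty, waiting for extents */
	DEV_BACKED,  /* every byte of the device range is backed */
	DEV_EXPIRED, /* timed out: release all extents, destroy the device */
};

struct pending_dev {
	uint64_t size;        /* requested device size in bytes */
	uint64_t backed;      /* bytes covered so far (non-overlapping) */
	uint64_t deadline_ns; /* monotonic time by which backing must finish */
	enum dev_state state;
};

/* Advance the state on each extent arrival or timer tick. */
static enum dev_state pending_dev_update(struct pending_dev *d, uint64_t now_ns)
{
	if (d->state != DEV_PENDING)
		return d->state; /* terminal states do not change */
	if (d->backed == d->size)
		d->state = DEV_BACKED;
	else if (now_ns >= d->deadline_ns)
		d->state = DEV_EXPIRED; /* caller releases extents, destroys dev */
	return d->state;
}
```

Checking for full backing before the deadline means an extent that completes the device exactly at the deadline still wins; a real design would need to pick that tie-break deliberately.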

That's roughly my understanding after letting it marinate in my brain... please
correct me if that's wrong.

One quick question: in this scenario, do we expect the orchestrator to
coordinate the "daxctl prepare-device daxX.Y"?

Thanks,
Anisa

* Re: [RFC PATCH 00/20] DCD: Remove support for sparse regions & add tags
  2026-04-11  1:22 [RFC PATCH 00/20] DCD: Remove support for sparse regions & add tags Anisa Su
                   ` (21 preceding siblings ...)
  2026-04-21 15:30 ` John Groves
@ 2026-04-21 21:02 ` Alison Schofield
  2026-04-23  1:20   ` Anisa Su
  22 siblings, 1 reply; 30+ messages in thread
From: Alison Schofield @ 2026-04-21 21:02 UTC (permalink / raw)
  To: Anisa Su
  Cc: linux-cxl, john, dave.jiang, gourry, dave, jonathan.cameron,
	ira.weiny, dan.j.williams, Anisa Su

On Fri, Apr 10, 2026 at 06:22:55PM -0700, Anisa Su wrote:
> [...]
> Testing:

Where's the CXL unit testing...

big skip


>   tools/testing/cxl: Make event logs dynamic
>   tools/testing/cxl: Add DC Regions to mock mem data

Hi Anisa,

I'm inserting here and referencing the last 2 patches above
that add cxl/test support for DCD testing.

Let's use it! We'd really like to see the ndctl patches sync'd up
with this kernel set. That would give us a CXL unit test, 'cxl-dcd.sh',
to run on each revision.

This is one link, perhaps the most recent posting of the ndctl set, but I
think you may have posted something long ago too:
https://lore.kernel.org/nvdimm/20250413-dcd-region2-v5-0-fbd753a2e0e8@intel.com/

-- Alison

* Re: [RFC PATCH 00/20] DCD: Remove support for sparse regions & add tags
  2026-04-21 15:30 ` John Groves
  2026-04-21 17:42   ` Anisa Su
@ 2026-04-22  3:14   ` John Groves
  2026-04-23  1:24     ` Anisa Su
  1 sibling, 1 reply; 30+ messages in thread
From: John Groves @ 2026-04-22  3:14 UTC (permalink / raw)
  To: Anisa Su
  Cc: linux-cxl@vger.kernel.org, Dave Jiang, Gregory Price, dave,
	Jonathan Cameron, Alison Schofield, Ira Weiny, Dan Williams,
	Anisa Su, Aravind Ramesh, Ajay Joshi, dev.srinivasulu



On Tue, Apr 21, 2026, at 10:30 AM, John Groves wrote:
> On 26/04/10 06:22PM, Anisa Su wrote:
> > This RFC modifies Ira's DCD patchset to enforce tags and remove sparse DAX region
> > support, which are the requirements that I've understood from the community
> > meetings, with the end goal of FAMFS as the end user for DCD. Feedback would be
> > greatly appreciated to let me know if this is on the right track? Or totally off
> > the mark...
> > 
> > Everything is the same as before except:
> > - extents must have tags (uuids)
> > - 1 tag per region
> > - regions must be contiguous (no more sparse regions)
> > 
> > To achieve this, the main thing is to change the relationship between
> > cxl_dax_region : region_extent from 1 : many to 1 : 1. Each region_extent is
> > comprised of 1+ contiguous device extents with the same tag. Contiguity
> > is enforced by sorting device extents by DPA order. They're re-sorted by the
> > original order in which they were sent for the response, which is required by
> > the spec.
> > 
> > Once valid extents have been collected, they are passed as 1 contiguous
> > capacity to the DAX layer via cxl_dax_region notify(). Once notified, the
> > same region cannot be added to again unless all extents are released.
> > 
> > For release: upon receiving a release event record, if the extent is within the
> > bounds of any cxl_region, and it has the correct tag, then all extents in the
> > region are released, so the "More" flag is still ignored. Not sure if this is the
> > right way to do it but it was the simplest.
> > 
> > The changes to the DAX layer remain untouched, as all of this extra validation is done
> > in the CXL layer. And since FAMFS already takes care of the devdax -> fsdev conversion,
> > there was no need to add anything there.
> > 
> > Most of the series remains unchanged as I've tried not to make too many big changes
> > right off the bat. Only the following commits were modified:
> > - cxl/extent: Process dynamic partition events and realize region extents
> > - dax/region: Create resources on DAX regions
> > - cxl/region: Read existing extents on region creation
> > 
> > I've tacked on 1 commit at the end to change the driver type of DC regions from
> > DAXDRV_KMEM_TYPE to DAXDRV_DEVICE_TYPE so it can be bound to the new fsdev driver.
> > 
> > Also, the commit messages of the modified commits document in more detail
> > exactly what was changed, so I hope that's clear.
> > 
> > ================================================================================
> > Git History
> > 
> > This series is based on cxl-next, with base commit:
> > 3939dba00f98 Merge branch 'for-7.1/cxl-misc' into cxl-for-next
> > + bug fix: https://lore.kernel.org/linux-cxl/20260411011137.43545-1-anisa.su@samsung.com/T/#u
> > 
> > GH Branch: https://github.com/anisa-su993/anisa-linux-kernel/tree/dcd-rfc-04-10-26
> > 
> > It doesn't apply cleanly onto famfs-v9, although I have the version that's applied
> > onto famfs-v9 here: https://github.com/anisa-su993/anisa-linux-kernel/tree/famfs-v9-dcd
> > - famfs-v9 for reference: https://github.com/jagalactic/linux/tree/famfs-v9
> > 
> > I've tested the current series without famfs as well as the series applied on
> > famfs-v9 with famfs.
> > ================================================================================
> > Testing:
> > 
> > This patchset was tested with Ali's QEMU patchset adding tag support:
> > https://lore.kernel.org/linux-cxl/20260325184259.366-1-alireza.sanaee@huawei.com/T/#t
> > 
> > Details:
> > Topology: '-object memory-backend-file,id=cxl-mem1,mem-path=/tmp/t3_cxl1.raw,size=12G \
> >      -object memory-backend-file,id=cxl-lsa1,mem-path=/tmp/t3_lsa1.raw,size=1G \
> >      -device usb-ehci,id=ehci \
> >      -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1,hdm_for_passthrough=true \
> >      -device cxl-rp,port=0,bus=cxl.1,id=cxl_rp_port0,chassis=0,slot=2 \
> >      -device cxl-type3,bus=cxl_rp_port0,id=cxl-dcd0,dc-regions-total-size=12G,num-dc-regions=1,sn=99 \
> >      -device usb-cxl-mctp,bus=ehci.0,id=usb1,target=cxl-dcd0\
> >      -machine cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=12G,cxl-fmw.0.interleave-granularity=1k'
> > 
> > 1. Start VM (12GB)
> > 2. Issue QMP to add tagged backend (8GB):
> > { "execute": "qmp_capabilities" }
> > {
> >     "execute": "object-add",
> >     "arguments": {
> >         "qom-type": "memory-backend-ram",
> >         "id": "tm0",
> >         "size": 8589934592,
> >         "share": true,
> >         "tag": "5be13bce-ae34-4a77-b6c3-16df975fcf1a"
> >     }
> > }
> > 3. Create region on the VM: cxl create-region -m -d decoder0.0 -w 1 -s 8G mem0 -t dynamic_ram_a
> > 4. Issue QMP to add an 8GB extent:
> > { "execute": "qmp_capabilities" }
> > {
> >     "execute": "cxl-add-dynamic-capacity",
> >     "arguments": {
> >         "path": "/machine/peripheral/cxl-dcd0",
> >         "host-id": 0,
> >         "selection-policy": "prescriptive",
> >         "region": 0,
> >         "tag": "5be13bce-ae34-4a77-b6c3-16df975fcf1a",
> >         "extents": [
> >             {
> >                 "offset": 0,
> >                 "len": 8589934592
> >             }
> >         ]
> >     }
> > }
> > 5. Verify with sysfs:
> > root@bgt-140510-bm03:~# cat /sys/bus/cxl/devices/dax_region0/extent0.0/offset
> > 0x0
> > root@bgt-140510-bm03:~# cat /sys/bus/cxl/devices/dax_region0/extent0.0/length
> > 0x200000000
> > root@bgt-140510-bm03:~# cat /sys/bus/cxl/devices/dax_region0/extent0.0/uuid
> > 5be13bce-ae34-4a77-b6c3-16df975fcf1a
> > 
> > 6. daxctl create-device -r region0
> > [
> >   {
> >     "chardev":"dax0.1",
> >     "size":8589934592,
> >     "target_node":1,
> >     "align":2097152,
> >     "mode":"devdax"
> >   }
> > ]
> > created 1 device
> > 
> > Currently, QEMU only supports sending 1 extent in an add/release request, which
> > limits what I can test. However, I was able to verify that once extent(s) have
> > been added to a region, it can't be added to again (size cannot be increased).
> > 
> > Up to this point is what I tested with this patchset alone. Below are the
> > additional famfs tests for the version applied on famfs.
> > ================================================================================
> > 7. Install famfs userspace tool:
> > https://github.com/cxl-micron-reskit/famfs
> > 
> > 8. mkfs.famfs --v /dev/dax0.1 output:
> > devsize: 8589934592
> > Famfs Superblock:
> >   Filesystem UUID:   c33b4525-a2c7-4d64-9204-e8ed273b4ffb
> >   Device UUID:       ae887e6b-f886-45f9-bc55-f0696f3cd91d
> >   System UUID:       da314140-12e7-45c2-98b2-753d3bfe4f46
> >   role of this node: Owner
> >   alloc_unit:        0x200000
> >   OMF major version: 2
> >   OMF minor version: 1
> >   sizeof superblock: 200
> >   log size (bytes):  8388608
> >   primary: /dev/dax0.1   8589934592
> > 
> > Log stats:
> >   # of log entries in use: 0 of 15420
> >   Log size in use:          48
> >   Log size (total bytes)    8388608
> >   No allocation errors found
> > 
> > Capacity:
> >   Device capacity:        8.00G
> >   Bitmap capacity:        8.00G
> >   Sum of file sizes:      0.00G
> >   Allocated space:        0.01G
> >   Free space:             7.99G
> >   Space amplification:     inf
> >   Percent used:            0.1%
> > 
> > Famfs log:
> >   0 of 15420 entries used
> >   0 bad log entries detected
> >   0 files
> >   0 directories
> > 
> > 9. famfs smoke tests also succeed. The smoke tests include
> > some fio tests, which run simulated workloads.
> > 
> > :== Test Timing Summary
> > :==-------------------------------------------------------------------
> > :==  prepare              0:10
> > :==  test0                0:07
> > :==  test_shadow_yaml     0:04
> > :==  test1                0:22
> > :==  test2                0:12
> > :==  test3                0:03
> > :==  test4                0:10
> > :==  test_errors          0:01
> > :==  stripe_test          0:58
> > :==  test_pcq             1:26
> > :==  test_fio             0:34
> > :==-------------------------------------------------------------------
> > :==  TOTAL                4:27
> > :==-------------------------------------------------------------------
> > :==run_smoke completed successfully (Thu Apr  9 10:33:28 PM UTC 2026)
> > 
> > Anisa Su (1):
> >   dax/bus.c: make DC regions driver type DAXDRV_DEVICE_TYPE
> > 
> > Ira Weiny (19):
> >   cxl/mbox: Flag support for Dynamic Capacity Devices (DCD)
> >   cxl/mem: Read dynamic capacity configuration from the device
> >   cxl/cdat: Gather DSMAS data for DCD partitions
> >   cxl/core: Enforce partition order/simplify partition calls
> >   cxl/mem: Expose dynamic ram A partition in sysfs
> >   cxl/port: Add 'dynamic_ram_a' to endpoint decoder mode
> >   cxl/region: Add sparse DAX region support
> >   cxl/events: Split event msgnum configuration from irq setup
> >   cxl/pci: Factor out interrupt policy check
> >   cxl/mem: Configure dynamic capacity interrupts
> >   cxl/core: Return endpoint decoder information from region search
> >   cxl/extent: Process dynamic partition events and realize region
> >     extents
> >   cxl/region/extent: Expose region extent information in sysfs
> >   dax/bus: Factor out dev dax resize logic
> >   dax/region: Create resources on DAX regions
> >   cxl/region: Read existing extents on region creation
> >   cxl/mem: Trace Dynamic capacity Event Record
> >   tools/testing/cxl: Make event logs dynamic
> >   tools/testing/cxl: Add DC Regions to mock mem data
> > 
> >  Documentation/ABI/testing/sysfs-bus-cxl |  100 ++-
> >  drivers/cxl/core/Makefile               |    2 +-
> >  drivers/cxl/core/cdat.c                 |   11 +
> >  drivers/cxl/core/core.h                 |   47 +-
> >  drivers/cxl/core/extent.c               |  471 +++++++++++
> >  drivers/cxl/core/hdm.c                  |   13 +-
> >  drivers/cxl/core/mbox.c                 |  770 ++++++++++++++++-
> >  drivers/cxl/core/memdev.c               |   87 +-
> >  drivers/cxl/core/port.c                 |    5 +
> >  drivers/cxl/core/region.c               |   43 +-
> >  drivers/cxl/core/region_dax.c           |    6 +
> >  drivers/cxl/core/trace.h                |   65 ++
> >  drivers/cxl/cxl.h                       |   60 +-
> >  drivers/cxl/cxlmem.h                    |  124 ++-
> >  drivers/cxl/mem.c                       |    2 +-
> >  drivers/cxl/pci.c                       |  115 ++-
> >  drivers/dax/bus.c                       |  360 ++++++--
> >  drivers/dax/bus.h                       |    4 +-
> >  drivers/dax/cxl.c                       |   71 +-
> >  drivers/dax/dax-private.h               |   40 +
> >  drivers/dax/hmem/hmem.c                 |    2 +-
> >  drivers/dax/pmem.c                      |    2 +-
> >  include/cxl/cxl.h                       |    6 +
> >  include/cxl/event.h                     |   39 +
> >  include/linux/ioport.h                  |    3 +
> >  tools/testing/cxl/Kbuild                |    5 +-
> >  tools/testing/cxl/test/mem.c            | 1018 ++++++++++++++++++++---
> >  27 files changed, 3210 insertions(+), 261 deletions(-)
> >  create mode 100644 drivers/cxl/core/extent.c
> > 
> > -- 
> > 2.43.0
> > 
> 
> what commit or branch does this apply on top of? Not v7.0...
> 
> Thanks!
> John
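
For comparison with the sequence-number approach below, here is a minimal userspace sketch of the rule in the quoted cover letter: sort same-tag device extents by DPA, require each one to start where the previous one ends, then restore the order in which the device sent them. `struct dc_extent` and `extents_dpa_contiguous()` are hypothetical names, not the driver's types in drivers/cxl/core/extent.c, and tag filtering is assumed to have happened already.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for one device extent from an Add Capacity event. */
struct dc_extent {
	uint64_t dpa;		/* device physical address of the extent start */
	uint64_t len;		/* extent length in bytes */
	unsigned int seq;	/* order in which the device sent the extent */
};

static int cmp_dpa(const void *a, const void *b)
{
	const struct dc_extent *x = a, *y = b;

	return (x->dpa > y->dpa) - (x->dpa < y->dpa);
}

static int cmp_seq(const void *a, const void *b)
{
	const struct dc_extent *x = a, *y = b;

	return (x->seq > y->seq) - (x->seq < y->seq);
}

/*
 * Return true if the extents tile a single DPA range with no gaps or
 * overlaps. The array is sorted by DPA for the check, then restored to
 * the order in which the device sent the extents, which the response
 * must preserve.
 */
static bool extents_dpa_contiguous(struct dc_extent *ext, size_t n)
{
	bool ok = true;

	assert(n == 0 || ext != NULL);
	if (n == 0)
		return false;

	qsort(ext, n, sizeof(*ext), cmp_dpa);
	for (size_t i = 1; i < n; i++) {
		if (ext[i].dpa != ext[i - 1].dpa + ext[i - 1].len) {
			ok = false;
			break;
		}
	}
	qsort(ext, n, sizeof(*ext), cmp_seq);
	return ok;
}
```

Under this rule two extents sent as {DPA 0x2000, len 0x1000} then {DPA 0x0, len 0x2000} pass, while any DPA hole between extents fails, which is exactly the restriction questioned later in the thread.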

I've bounced around about the best way to do this, but I'm working on 
one patch at the end of this series that fixes up the tagged extent handling
to concatenate together all extents that share the same tag into one 
daxdev, in sequence number order (and fail if any of the extent boundaries 
are misaligned). The DPA ranges of the extents don't matter, provided the
boundary alignment is good. And the extents get assembled in sequence
number order, not DPA order.
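
A minimal userspace sketch of the assembly rule described above, under stated assumptions: "misaligned" is taken to mean that an extent's DPA start or length is not a multiple of some required alignment `align` (no value is given here; the example uses 2 MiB, the `align` daxctl reported earlier in the thread, purely for illustration). `struct tagged_extent` and `assemble_by_seq()` are hypothetical names, not code from the pending patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a tagged device extent. */
struct tagged_extent {
	uint8_t tag[16];	/* extent tag (UUID bytes) */
	uint64_t dpa;		/* device physical address; not used for ordering */
	uint64_t len;		/* length in bytes */
	unsigned int seq;	/* sequence number from the device */
};

static int cmp_seq(const void *a, const void *b)
{
	const struct tagged_extent *x = a, *y = b;

	return (x->seq > y->seq) - (x->seq < y->seq);
}

/*
 * Move the extents that carry @tag to the front of @ext, order them by
 * sequence number, and lay them out back to back in one DAX device. Each
 * extent's DPA and length must be multiples of @align. On success, return
 * the device size and store each extent's device offset in @dev_off (in
 * sequence order). Return 0 if nothing matches or any boundary is misaligned.
 */
static uint64_t assemble_by_seq(struct tagged_extent *ext, size_t n,
				const uint8_t tag[16], uint64_t align,
				uint64_t *dev_off)
{
	uint64_t total = 0;
	size_t m = 0;

	assert(align != 0);
	for (size_t i = 0; i < n; i++) {
		if (memcmp(ext[i].tag, tag, 16) == 0) {
			struct tagged_extent tmp = ext[m];

			ext[m] = ext[i];
			ext[i] = tmp;
			m++;
		}
	}
	if (m == 0)
		return 0;

	qsort(ext, m, sizeof(*ext), cmp_seq);	/* DPA order is irrelevant */
	for (size_t i = 0; i < m; i++) {
		if (ext[i].len == 0 || ext[i].dpa % align || ext[i].len % align)
			return 0;
		dev_off[i] = total;
		total += ext[i].len;
	}
	return total;
}
```

For example, a tag whose extents arrive as seq 1 at DPA 8 MiB (2 MiB long) and seq 0 at DPA 2 MiB (4 MiB long) yields a 6 MiB device with the seq 0 extent at offset 0 and the seq 1 extent at offset 4 MiB, even though the DPAs are neither adjacent nor in that order.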

Once we've agreed it's right, I'll help with squashing it sensibly if you like.

I'm close, but it's not quite ready. I should be able to send it Wednesday.

Then I'll try to channel Jonathan and do a general review pass on the series.
Actually, maybe I'll try to program AI Jonathan :D

Thanks for what you do Anisa,
John

* Re: [RFC PATCH 00/20] DCD: Remove support for sparse regions & add tags
  2026-04-21 21:02 ` Alison Schofield
@ 2026-04-23  1:20   ` Anisa Su
  0 siblings, 0 replies; 30+ messages in thread
From: Anisa Su @ 2026-04-23  1:20 UTC (permalink / raw)
  To: Alison Schofield
  Cc: Anisa Su, linux-cxl, john, dave.jiang, gourry, dave,
	jonathan.cameron, ira.weiny, dan.j.williams

On Tue, Apr 21, 2026 at 02:02:23PM -0700, Alison Schofield wrote:
> On Fri, Apr 10, 2026 at 06:22:55PM -0700, Anisa Su wrote:
> > [snip]
> > Testing:
> 
> Where's the CXL unit testing...
> 
> big skip
> 
> 
> >   tools/testing/cxl: Make event logs dynamic
> >   tools/testing/cxl: Add DC Regions to mock mem data
> 
> Hi Anisa,
> 
> I'm inserting here and referencing the last 2 patches above
> that add cxl/test support for DCD testing.
> 
> Let's use it! We'd really like to see the ndctl patches sync'd up
> with this kernel set. That would give us a CXL Unit test 'cxl-dcd.sh'
> to run on each revision. 
> 

Will do for the next version! I wasn't sure whether the behavior in this RFC
was what famfs wanted, so I left the tests alone. Once that's cleared up
with John, those will be updated :3

Thanks,
Anisa

> This is one link, perhaps the last to the ndctl set, but I think you
> may have posted something long ago too:
> https://lore.kernel.org/nvdimm/20250413-dcd-region2-v5-0-fbd753a2e0e8@intel.com/
> 
> -- Alison
> 

* Re: [RFC PATCH 00/20] DCD: Remove support for sparse regions & add tags
  2026-04-22  3:14   ` John Groves
@ 2026-04-23  1:24     ` Anisa Su
  0 siblings, 0 replies; 30+ messages in thread
From: Anisa Su @ 2026-04-23  1:24 UTC (permalink / raw)
  To: John Groves
  Cc: Anisa Su, linux-cxl@vger.kernel.org, Dave Jiang, Gregory Price,
	dave, Jonathan Cameron, Alison Schofield, Ira Weiny, Dan Williams,
	Aravind Ramesh, Ajay Joshi, dev.srinivasulu

On Tue, Apr 21, 2026 at 10:14:42PM -0500, John Groves wrote:
> 
> 
> On Tue, Apr 21, 2026, at 10:30 AM, John Groves wrote:
> > On 26/04/10 06:22PM, Anisa Su wrote:
> > > [snip]
> > > 
> > 
> > what commit or branch does this apply on top of? Not v7.0...
> > 
> > Thanks!
> > John
> 
> I've bounced around about the best way to do this, but I'm working on 
> one patch at the end of this series that fixes up the tagged extent handling
> to concatenate together all extents that share the same tag into one 
> daxdev, in sequence number order (and fail if any of the extent boundaries 
> are misaligned). The DPA ranges of the extents don't matter, provided the
> boundary alignment is good. And the extents get assembled in sequence
> number order, not DPA order.
> 
> Once we've agreed it's right, I'll help with squashing it sensibly if you like.
> 
> I'm close, but it's not quite ready. I should be able to send it Wednesday.
> 
Ok sounds good!

Thanks,
Anisa
> Then I'll try to channel Jonathan and do a general review pass on the series.
> Actually, maybe I'll try to program AI Jonathan :D
> 
> Thanks for what you do Anisa,
> John

* Re: [RFC PATCH 00/20] DCD: Remove support for sparse regions & add tags
  2026-04-21 18:48   ` Anisa Su
@ 2026-04-23 20:43     ` Ira Weiny
  0 siblings, 0 replies; 30+ messages in thread
From: Ira Weiny @ 2026-04-23 20:43 UTC (permalink / raw)
  To: Anisa Su, Gregory Price
  Cc: Anisa Su, linux-cxl, john, dave.jiang, dave, jonathan.cameron,
	alison.schofield, ira.weiny, dan.j.williams

Anisa Su wrote:
> On Sat, Apr 11, 2026 at 01:05:07AM -0400, Gregory Price wrote:
> > On Fri, Apr 10, 2026 at 06:22:55PM -0700, Anisa Su wrote:
> > > This RFC modifies Ira's DCD patchset to enforce tags and remove sparse DAX region
> > > support, which are the requirements that I've understood from the community
> > > meetings, with the end goal of FAMFS as the end user for DCD. Feedback would be
> > > greatly appreciated to let me know if this is on the right track? Or totally off
> > > the mark...
> > > 
> > > Everything is the same as before except:
> > > - extents must have tags (uuids)
> > > - 1 tag per region
> > > - regions must be contiguous (no more sparse regions)
> > > 
> > > To achieve this, the main thing is to change the relationship between
> > > cxl_dax_region : region_extent from 1 : many to 1 : 1. Each region_extent is
> > > comprised of 1+ contiguous device extents with the same tag. Contiguity
> > > is enforced by sorting device extents by DPA order.
> > 
> > Are you saying they need to be contiguous in DPA?  I don't think that's
> > what we want - in fact that somewhat defeats the entire point
> > of DCD being allocator-like.
> > 
> > We don't much care what DPA space the extents take up so long as the
> > total size of the set of extents matches the size of the HPA region.
> > 
> > Sorting by DPA at least gets you extent-ordering for free as a
> > pre-agreed upon mechanism, so that's fine.
> > 
> > Basically: we don't want sparse regions where the HPA space has holes
> > (i.e. an extent hasn't been sent by the device to back the hole yet),
> > but we don't really care what DPA actually backs those HPA regions.
> > 
> > Unless my take on how DCD "should" work is wildly different from the
> > general understanding.
> > 
> > ~Gregory
> 
> Super belated follow up to document some discussion from last Monday's DAX call:
> 
> Instead of enforcing contiguous DPA, what we mean by not supporting sparse
> regions is that each DAX *device* needs to be fully backed by extents?
> 
> - The CXL region/DAX region itself can be made of multiple DAX devices, and
>   there can be HPA gaps between the devices, so the CXL region can be "sparse"
> - But each DAX device must be fully backed (non-sparse)
> 
> We potentially want to do something like "daxctl prepare-device daxX.Y" to set
> up an "empty" DAX device? Then as extents come in, make sure they cover the full
> range of the device?
>     - Add some kind of timeout to receive extents that back the device
>     - If not fully backed within timeout, release all and destroy device

I don't think a strict timeout is needed here.  If prepare-device does not
actually surface a dax device until all extents are there, a subsequent
destroy-device (prior to all the extents being there) could release all
the extents which were there.  Basically allow user space to handle any
'timeout'.  It could even decide to ask the orchestrator what to do at
that point and either wait or unwind.
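The flow Ira describes (no kernel-side timer; the proposed prepare-device hides the device until it is fully backed, and destroy-device releases whatever arrived) can be modeled as a small state machine. The states, struct, and functions below are hypothetical. The sketch also simplifies "fully backed" to a byte count and assumes non-overlapping extents; a real implementation would check coverage of the device's range instead.

```c
#include <stdint.h>

/* Hypothetical states for a device set up via a prepare-device style call. */
enum pending_state {
	PENDING,   /* prepared, not yet visible to users */
	SURFACED,  /* fully backed; DAX device exposed */
	DESTROYED, /* torn down; any received extents released */
};

struct pending_dev {
	enum pending_state state;
	uint64_t size;     /* requested device size in bytes */
	uint64_t backed;   /* bytes of extents received so far */
	uint64_t released; /* bytes handed back to the device on destroy */
};

/* An extent arrived for this device; surface it only once fully backed. */
static void pending_add_extent(struct pending_dev *d, uint64_t len)
{
	if (d->state != PENDING)
		return;
	d->backed += len;
	if (d->backed >= d->size)
		d->state = SURFACED;
}

/*
 * User space gave up (its own "timeout", possibly after asking the
 * orchestrator) or is tearing the device down: release whatever extents
 * arrived. No kernel-side timer is involved.
 */
static void pending_destroy(struct pending_dev *d)
{
	d->released = d->backed;
	d->backed = 0;
	d->state = DESTROYED;
}
```

Keeping the policy in user space means the kernel only needs the two transitions above; whether to wait longer or unwind is decided by whoever calls destroy.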

> 
> That's kind of my understanding after letting it marinate in my brain... please
> correct me if that's wrong
> 
> One quick question I currently have is: in this scenario, we expect the orchestrator to
> coordinate the "daxctl prepare-device daxX.Y"? Is that correct?

I would think so yes.

Ira

> 
> Thanks,
> Anisa



Thread overview: 30+ messages
2026-04-11  1:22 [RFC PATCH 00/20] DCD: Remove support for sparse regions & add tags Anisa Su
2026-04-11  1:22 ` [PATCH 01/20] cxl/mbox: Flag support for Dynamic Capacity Devices (DCD) Anisa Su
2026-04-11  1:22 ` [PATCH 02/20] cxl/mem: Read dynamic capacity configuration from the device Anisa Su
2026-04-11  1:22 ` [PATCH 03/20] cxl/cdat: Gather DSMAS data for DCD partitions Anisa Su
2026-04-11  1:22 ` [PATCH 04/20] cxl/core: Enforce partition order/simplify partition calls Anisa Su
2026-04-11  1:23 ` [PATCH 05/20] cxl/mem: Expose dynamic ram A partition in sysfs Anisa Su
2026-04-11  1:23 ` [PATCH 06/20] cxl/port: Add 'dynamic_ram_a' to endpoint decoder mode Anisa Su
2026-04-11  1:23 ` [PATCH 07/20] cxl/region: Add sparse DAX region support Anisa Su
2026-04-11  1:23 ` [PATCH 08/20] cxl/events: Split event msgnum configuration from irq setup Anisa Su
2026-04-11  1:23 ` [PATCH 09/20] cxl/pci: Factor out interrupt policy check Anisa Su
2026-04-11  1:23 ` [PATCH 10/20] cxl/mem: Configure dynamic capacity interrupts Anisa Su
2026-04-11  1:23 ` [PATCH 11/20] cxl/core: Return endpoint decoder information from region search Anisa Su
2026-04-11  1:23 ` [PATCH 12/20] cxl/extent: Process dynamic partition events and realize region extents Anisa Su
2026-04-11  1:23 ` [PATCH 13/20] cxl/region/extent: Expose region extent information in sysfs Anisa Su
2026-04-11  1:23 ` [PATCH 14/20] dax/bus: Factor out dev dax resize logic Anisa Su
2026-04-11  1:23 ` [PATCH 15/20] dax/region: Create resources on DAX regions Anisa Su
2026-04-11  1:23 ` [PATCH 16/20] cxl/region: Read existing extents on region creation Anisa Su
2026-04-11  1:23 ` [PATCH 17/20] cxl/mem: Trace Dynamic capacity Event Record Anisa Su
2026-04-11  1:23 ` [PATCH 18/20] tools/testing/cxl: Make event logs dynamic Anisa Su
2026-04-11  1:23 ` [PATCH 19/20] tools/testing/cxl: Add DC Regions to mock mem data Anisa Su
2026-04-11  1:23 ` [PATCH 20/20] dax/bus.c: make DC regions driver type DAXDRV_DEVICE_TYPE Anisa Su
2026-04-11  5:05 ` [RFC PATCH 00/20] DCD: Remove support for sparse regions & add tags Gregory Price
2026-04-21 18:48   ` Anisa Su
2026-04-23 20:43     ` Ira Weiny
2026-04-21 15:30 ` John Groves
2026-04-21 17:42   ` Anisa Su
2026-04-22  3:14   ` John Groves
2026-04-23  1:24     ` Anisa Su
2026-04-21 21:02 ` Alison Schofield
2026-04-23  1:20   ` Anisa Su
