* [PATCH v4 00/28] DCD: Add support for Dynamic Capacity Devices (DCD)
@ 2024-10-07 23:16 Ira Weiny
2024-10-07 23:16 ` [PATCH v4 12/28] cxl/cdat: Gather DSMAS data for DCD regions Ira Weiny
` (2 more replies)
0 siblings, 3 replies; 14+ messages in thread
From: Ira Weiny @ 2024-10-07 23:16 UTC (permalink / raw)
To: Dave Jiang, Fan Ni, Jonathan Cameron, Navneet Singh,
Jonathan Corbet, Andrew Morton
Cc: Dan Williams, Davidlohr Bueso, Alison Schofield, Vishal Verma,
Ira Weiny, linux-btrfs, linux-cxl, linux-doc, nvdimm,
linux-kernel, Petr Mladek, Steven Rostedt, Andy Shevchenko,
Rasmus Villemoes, Sergey Senozhatsky, Chris Mason, Josef Bacik,
David Sterba, Johannes Thumshirn, Li, Ming, Jonathan Cameron,
Robert Moore, Rafael J. Wysocki, Len Brown, linux-acpi,
acpica-devel
A git tree of this series can be found here:
https://github.com/weiny2/linux-kernel/tree/dcd-v4-2024-10-04
Series info
===========
This series has 5 parts:
Patch 1-3: Add %pra printk format for struct range
Patch 4: Add core range_overlaps() function
Patch 5-6: CXL clean up/prelim patches
Patch 7-26: Core DCD support
Patch 27-28: cxl_test support
Background
==========
A Dynamic Capacity Device (DCD) (CXL 3.1 sec 9.13.3) is a CXL memory
device that allows memory capacity within a region to change
dynamically without the need for resetting the device, reconfiguring
HDM decoders, or reconfiguring software DAX regions.
One of the biggest use cases for Dynamic Capacity is to allow hosts to
share memory dynamically within a data center without increasing the
per-host attached memory.
The general flow for the addition or removal of memory is to have an
orchestrator coordinate the use of the memory. Generally there are 5
actors in such a system: the Orchestrator, the Fabric Manager, the
Logical Device, the Host Kernel, and a Host User.
Typical workflows are shown below.
Orchestrator FM Device Host Kernel Host User
| | | | |
|-------------- Create region ----------------------->|
| | | | |
| | | |<-- Create ---|
| | | | Region |
|<------------- Signal done --------------------------|
| | | | |
|-- Add ----->|-- Add --->|--- Add --->| |
| Capacity | Extent | Extent | |
| | | | |
| |<- Accept -|<- Accept -| |
| | Extent | Extent | |
| | | |<- Create --->|
| | | | DAX dev |-- Use memory
| | | | | |
| | | | | |
| | | |<- Release ---| <-+
| | | | DAX dev |
| | | | |
|<------------- Signal done --------------------------|
| | | | |
|-- Remove -->|- Release->|- Release ->| |
| Capacity | Extent | Extent | |
| | | | |
| |<- Release-|<- Release -| |
| | Extent | Extent | |
| | | | |
|-- Add ----->|-- Add --->|--- Add --->| |
| Capacity | Extent | Extent | |
| | | | |
| |<- Accept -|<- Accept -| |
| | Extent | Extent | |
| | | |<- Create ----|
| | | | DAX dev |-- Use memory
| | | | | |
| | | |<- Release ---| <-+
| | | | DAX dev |
|<------------- Signal done --------------------------|
| | | | |
|-- Remove -->|- Release->|- Release ->| |
| Capacity | Extent | Extent | |
| | | | |
| |<- Release-|<- Release -| |
| | Extent | Extent | |
| | | | |
|-- Add ----->|-- Add --->|--- Add --->| |
| Capacity | Extent | Extent | |
| | | |<- Create ----|
| | | | DAX dev |-- Use memory
| | | | | |
|-- Remove -->|- Release->|- Release ->| | |
| Capacity | Extent | Extent | | |
| | | | | |
| | | (Release Ignored) | |
| | | | | |
| | | |<- Release ---| <-+
| | | | DAX dev |
|<------------- Signal done --------------------------|
| | | | |
| |- Release->|- Release ->| |
| | Extent | Extent | |
| | | | |
| |<- Release-|<- Release -| |
| | Extent | Extent | |
| | | |<- Destroy ---|
| | | | Region |
| | | | |
Implementation
==============
The series still requires the creation of regions and DAX devices to be
closely synchronized with the Orchestrator and Fabric Manager. The host
kernel will reject extents if a region is not yet created. It also
ignores extent release if memory is in use (DAX device created). These
synchronizations are not anticipated to be an issue with real
applications.
In order to allow capacity to be added and removed, a new concept of
a sparse DAX region is introduced. A sparse DAX region may have 0 or
more bytes of available space. The total space depends on the number
and size of the extents which have been added.
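To make that accounting concrete, the available space of a sparse region can be modeled with a small sketch (illustrative Python only; the type and field names below are invented for the sketch and do not correspond to the kernel implementation):

```python
# Illustrative model of sparse DAX region capacity accounting.
# All names here are invented; this is not the kernel code.

from dataclasses import dataclass, field

@dataclass
class Extent:
    start: int   # device physical address (DPA) of the extent
    length: int  # extent size in bytes

@dataclass
class SparseDaxRegion:
    extents: list = field(default_factory=list)

    def available_bytes(self) -> int:
        # A sparse region may have 0 or more bytes of space; the total
        # depends on the number and size of the surfaced extents.
        return sum(e.length for e in self.extents)

region = SparseDaxRegion()
assert region.available_bytes() == 0           # empty until capacity is added
region.extents.append(Extent(0x0, 256 << 20))          # 256 MiB extent
region.extents.append(Extent(0x40000000, 128 << 20))   # 128 MiB extent
assert region.available_bytes() == (384 << 20)
```

The point of the model is only that the region's size is not fixed at creation time; it grows and shrinks as extents are accepted and released.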
Initially it is anticipated that users of the memory will carefully
coordinate the surfacing of additional capacity with the creation of DAX
devices which use that capacity. Therefore, the allocation of the
memory to DAX devices does not allow for specific associations between
DAX device and extent. This keeps allocations very similar to existing
DAX region behavior.
To keep DAX memory allocation aligned with existing DAX devices,
which do not have tags, extents are not allowed to have tags. Future
support for tags is planned.
Great care was taken to keep the extent tracking simple. Some xarrays
needed to be added, but extra software objects were kept to a minimum.
Region extents continue to be tracked as sub-devices of the DAX region.
This ensures that region destruction cleans up all extent allocations
properly.
Some review tags were kept if a patch did not change.
The major functionality of this series includes:
- Getting the dynamic capacity (DC) configuration information from cxl
devices
- Configuring the DC partitions reported by hardware
- Enhancing the CXL and DAX regions for dynamic capacity support
a. Maintain a logical separation between hardware extents and
software managed region extents. This provides an
abstraction between the layers and should allow for
interleaving in the future
- Get hardware extent lists for endpoint decoders upon
region creation.
- Adjust extent/region memory available on the following events:
a. Add capacity events
b. Release capacity events
- Host response for add capacity:
a. Do not accept the extent if the region does not exist
or an error occurs realizing the extent
b. If the region does exist, realize a DAX region extent
with a 1:1 mapping (no interleave yet)
c. Support the event "more" bit by processing a list of extents
marked with the more bit together before setting up a
response
- Host response for remove capacity:
a. If no DAX device references the extent, release the extent
b. If a reference does exist, ignore the request
(Require the FM to issue the release again.)
- Modify DAX device creation/resize to account for extents within a
sparse DAX region
- Trace Dynamic Capacity events for debugging
- Add cxl-test infrastructure to allow for faster unit testing
(See new ndctl branch for cxl-dcd.sh test[1])
- Only support 0 value extent tags
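The add/release handshake in the list above can be summarized with a small model (illustrative Python only; not the kernel implementation, and the function names are invented):

```python
# Illustrative model of the host responses described in the cover
# letter. Editorial sketch only; names are invented, not kernel code.

def handle_add_capacity(region_exists: bool, realize_ok: bool) -> str:
    # a. Do not accept the extent if the region does not exist or an
    #    error occurs realizing the extent.
    if not region_exists or not realize_ok:
        return "reject"
    # b. Otherwise realize a DAX region extent with a 1:1 mapping.
    return "accept"

def handle_remove_capacity(dax_refs: int) -> str:
    # a. No DAX device references the extent -> release it.
    # b. A reference exists -> ignore; the FM must issue release again.
    return "release" if dax_refs == 0 else "ignore"

assert handle_add_capacity(region_exists=False, realize_ok=True) == "reject"
assert handle_add_capacity(region_exists=True, realize_ok=True) == "accept"
assert handle_remove_capacity(dax_refs=1) == "ignore"
assert handle_remove_capacity(dax_refs=0) == "release"
```

This mirrors the third workflow in the diagram, where a release arriving while a DAX device still exists is ignored until the device is destroyed and the FM re-issues the release.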
Fan Ni's upstream of Qemu DCD was used for testing.
Remaining work:
1) Allow mapping to specific extents (perhaps based on
label/tag)
1a) devise region size reporting based on tags
2) Interleave support
Possible additional work depending on requirements:
1) Accept a new extent which extends (but overlaps with) one or
more existing extents
2) Release extents when DAX devices are released if a release
was previously seen from the device
3) Rework DAX device interfaces, memfd has been explored a bit
[1] https://github.com/weiny2/ndctl/tree/dcd-region2-2024-10-01
---
Major changes in v4:
- iweiny: rebase to 6.12-rc
- iweiny: Add qos data to regions
- Jonathan: Fix up shared region detection
- Jonathan/jgroves/djbw/iweiny: Ignore 0 value tags
- iweiny: Change DCD partition sysfs entries to allow for qos class and
additional parameters per partition
- Petr/Andy: s/%par/%pra/
- Andy: Share logic between printing struct resource and struct range
- Link to v3: https://patch.msgid.link/20240816-dcd-type2-upstream-v3-0-7c9b96cba6d7@intel.com
---
Ira Weiny (14):
test printk: Add very basic struct resource tests
printk: Add print format (%pra) for struct range
cxl/cdat: Use %pra for dpa range outputs
range: Add range_overlaps()
dax: Document dax dev range tuple
cxl/pci: Delay event buffer allocation
cxl/cdat: Gather DSMAS data for DCD regions
cxl/region: Refactor common create region code
cxl/events: Split event msgnum configuration from irq setup
cxl/pci: Factor out interrupt policy check
cxl/core: Return endpoint decoder information from region search
dax/bus: Factor out dev dax resize logic
tools/testing/cxl: Make event logs dynamic
tools/testing/cxl: Add DC Regions to mock mem data
Navneet Singh (14):
cxl/mbox: Flag support for Dynamic Capacity Devices (DCD)
cxl/mem: Read dynamic capacity configuration from the device
cxl/core: Separate region mode from decoder mode
cxl/region: Add dynamic capacity decoder and region modes
cxl/hdm: Add dynamic capacity size support to endpoint decoders
cxl/mem: Expose DCD partition capabilities in sysfs
cxl/port: Add endpoint decoder DC mode support to sysfs
cxl/region: Add sparse DAX region support
cxl/mem: Configure dynamic capacity interrupts
cxl/extent: Process DCD events and realize region extents
cxl/region/extent: Expose region extent information in sysfs
dax/region: Create resources on sparse DAX regions
cxl/region: Read existing extents on region creation
cxl/mem: Trace Dynamic capacity Event Record
Documentation/ABI/testing/sysfs-bus-cxl | 120 +++-
Documentation/core-api/printk-formats.rst | 13 +
drivers/cxl/core/Makefile | 2 +-
drivers/cxl/core/cdat.c | 52 +-
drivers/cxl/core/core.h | 33 +-
drivers/cxl/core/extent.c | 486 +++++++++++++++
drivers/cxl/core/hdm.c | 213 ++++++-
drivers/cxl/core/mbox.c | 605 ++++++++++++++++++-
drivers/cxl/core/memdev.c | 130 +++-
drivers/cxl/core/port.c | 13 +-
drivers/cxl/core/region.c | 170 ++++--
drivers/cxl/core/trace.h | 65 ++
drivers/cxl/cxl.h | 122 +++-
drivers/cxl/cxlmem.h | 131 +++-
drivers/cxl/pci.c | 123 +++-
drivers/dax/bus.c | 352 +++++++++--
drivers/dax/bus.h | 4 +-
drivers/dax/cxl.c | 72 ++-
drivers/dax/dax-private.h | 47 +-
drivers/dax/hmem/hmem.c | 2 +-
drivers/dax/pmem.c | 2 +-
fs/btrfs/ordered-data.c | 10 +-
include/acpi/actbl1.h | 2 +
include/cxl/event.h | 32 +
include/linux/range.h | 7 +
lib/test_printf.c | 70 +++
lib/vsprintf.c | 55 +-
tools/testing/cxl/Kbuild | 3 +-
tools/testing/cxl/test/mem.c | 960 ++++++++++++++++++++++++++----
29 files changed, 3576 insertions(+), 320 deletions(-)
---
base-commit: 9852d85ec9d492ebef56dc5f229416c925758edc
change-id: 20230604-dcd-type2-upstream-0cd15f6216fd
Best regards,
--
Ira Weiny <ira.weiny@intel.com>
^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v4 12/28] cxl/cdat: Gather DSMAS data for DCD regions
  2024-10-07 23:16 [PATCH v4 00/28] DCD: Add support for Dynamic Capacity Devices (DCD) Ira Weiny
@ 2024-10-07 23:16 ` Ira Weiny
  2024-10-09 14:42   ` Rafael J. Wysocki
    ` (2 more replies)
  2024-10-08 22:57 ` [PATCH v4 00/28] DCD: Add support for Dynamic Capacity Devices (DCD) Fan Ni
  2024-10-21 16:47 ` Fan Ni
  2 siblings, 3 replies; 14+ messages in thread
From: Ira Weiny @ 2024-10-07 23:16 UTC (permalink / raw)
  To: Dave Jiang, Fan Ni, Jonathan Cameron, Navneet Singh,
	Jonathan Corbet, Andrew Morton
  Cc: Dan Williams, Davidlohr Bueso, Alison Schofield, Vishal Verma,
	Ira Weiny, linux-btrfs, linux-cxl, linux-doc, nvdimm,
	linux-kernel, Robert Moore, Rafael J. Wysocki, Len Brown,
	linux-acpi, acpica-devel

Additional DCD region (partition) information is contained in the DSMAS
CDAT tables, including performance, read only, and shareable attributes.

Match DCD partitions with DSMAS tables and store the meta data.

To: Robert Moore <robert.moore@intel.com>
To: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
To: Len Brown <lenb@kernel.org>
Cc: linux-acpi@vger.kernel.org
Cc: acpica-devel@lists.linux.dev
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes:
[iweiny: new patch]
[iweiny: Gather shareable/read-only flags for later use]
---
 drivers/cxl/core/cdat.c | 38 ++++++++++++++++++++++++++++++++++++++
 drivers/cxl/core/mbox.c |  2 ++
 drivers/cxl/cxlmem.h    |  3 +++
 include/acpi/actbl1.h   |  2 ++
 4 files changed, 45 insertions(+)

diff --git a/drivers/cxl/core/cdat.c b/drivers/cxl/core/cdat.c
index bd50bb655741..9b2f717a16e5 100644
--- a/drivers/cxl/core/cdat.c
+++ b/drivers/cxl/core/cdat.c
@@ -17,6 +17,8 @@ struct dsmas_entry {
 	struct access_coordinate cdat_coord[ACCESS_COORDINATE_MAX];
 	int entries;
 	int qos_class;
+	bool shareable;
+	bool read_only;
 };
 
 static u32 cdat_normalize(u16 entry, u64 base, u8 type)
@@ -74,6 +76,8 @@ static int cdat_dsmas_handler(union acpi_subtable_headers *header, void *arg,
 		return -ENOMEM;
 
 	dent->handle = dsmas->dsmad_handle;
+	dent->shareable = dsmas->flags & ACPI_CDAT_DSMAS_SHAREABLE;
+	dent->read_only = dsmas->flags & ACPI_CDAT_DSMAS_READ_ONLY;
 	dent->dpa_range.start = le64_to_cpu((__force __le64)dsmas->dpa_base_address);
 	dent->dpa_range.end = le64_to_cpu((__force __le64)dsmas->dpa_base_address) +
 			      le64_to_cpu((__force __le64)dsmas->dpa_length) - 1;
@@ -255,6 +259,38 @@ static void update_perf_entry(struct device *dev, struct dsmas_entry *dent,
 		dent->coord[ACCESS_COORDINATE_CPU].write_latency);
 }
 
+
+static void update_dcd_perf(struct cxl_dev_state *cxlds,
+			    struct dsmas_entry *dent)
+{
+	struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
+	struct device *dev = cxlds->dev;
+
+	for (int i = 0; i < mds->nr_dc_region; i++) {
+		/* CXL defines a u32 handle while cdat defines u8, ignore upper bits */
+		u8 dc_handle = mds->dc_region[i].dsmad_handle & 0xff;
+
+		if (resource_size(&cxlds->dc_res[i])) {
+			struct range dc_range = {
+				.start = cxlds->dc_res[i].start,
+				.end = cxlds->dc_res[i].end,
+			};
+
+			if (range_contains(&dent->dpa_range, &dc_range)) {
+				if (dent->handle != dc_handle)
+					dev_warn(dev, "DC Region/DSMAS mis-matched handle/range; region %pra (%u); dsmas %pra (%u)\n"
+						      "     setting DC region attributes regardless\n",
+						 &dent->dpa_range, dent->handle,
+						 &dc_range, dc_handle);
+
+				mds->dc_region[i].shareable = dent->shareable;
+				mds->dc_region[i].read_only = dent->read_only;
+				update_perf_entry(dev, dent, &mds->dc_perf[i]);
+			}
+		}
+	}
+}
+
 static void cxl_memdev_set_qos_class(struct cxl_dev_state *cxlds,
 				     struct xarray *dsmas_xa)
 {
@@ -278,6 +314,8 @@ static void cxl_memdev_set_qos_class(struct cxl_dev_state *cxlds,
 		else if (resource_size(&cxlds->pmem_res) &&
 			 range_contains(&pmem_range, &dent->dpa_range))
 			update_perf_entry(dev, dent, &mds->pmem_perf);
+		else if (cxl_dcd_supported(mds))
+			update_dcd_perf(cxlds, dent);
 		else
 			dev_dbg(dev, "no partition for dsmas dpa: %pra\n",
 				&dent->dpa_range);
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 4b51ddd1ff94..3ba465823564 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -1649,6 +1649,8 @@ struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev)
 	mds->cxlds.type = CXL_DEVTYPE_CLASSMEM;
 	mds->ram_perf.qos_class = CXL_QOS_CLASS_INVALID;
 	mds->pmem_perf.qos_class = CXL_QOS_CLASS_INVALID;
+	for (int i = 0; i < CXL_MAX_DC_REGION; i++)
+		mds->dc_perf[i].qos_class = CXL_QOS_CLASS_INVALID;
 
 	return mds;
 }
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 0690b917b1e0..c3b889a586d8 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -466,6 +466,8 @@ struct cxl_dc_region_info {
 	u64 blk_size;
 	u32 dsmad_handle;
 	u8 flags;
+	bool shareable;
+	bool read_only;
 	u8 name[CXL_DC_REGION_STRLEN];
 };
 
@@ -533,6 +535,7 @@ struct cxl_memdev_state {
 
 	u8 nr_dc_region;
 	struct cxl_dc_region_info dc_region[CXL_MAX_DC_REGION];
+	struct cxl_dpa_perf dc_perf[CXL_MAX_DC_REGION];
 
 	struct cxl_event_state event;
 	struct cxl_poison_state poison;
diff --git a/include/acpi/actbl1.h b/include/acpi/actbl1.h
index 199afc2cd122..387fc821703a 100644
--- a/include/acpi/actbl1.h
+++ b/include/acpi/actbl1.h
@@ -403,6 +403,8 @@ struct acpi_cdat_dsmas {
 /* Flags for subtable above */
 
 #define ACPI_CDAT_DSMAS_NON_VOLATILE		(1 << 2)
+#define ACPI_CDAT_DSMAS_SHAREABLE		(1 << 3)
+#define ACPI_CDAT_DSMAS_READ_ONLY		(1 << 6)
 
 /* Subtable 1: Device scoped Latency and Bandwidth Information Structure (DSLBIS) */

-- 
2.46.0

^ permalink raw reply related	[flat|nested] 14+ messages in thread
* Re: [PATCH v4 12/28] cxl/cdat: Gather DSMAS data for DCD regions
  2024-10-07 23:16 ` [PATCH v4 12/28] cxl/cdat: Gather DSMAS data for DCD regions Ira Weiny
@ 2024-10-09 14:42   ` Rafael J. Wysocki
  2024-10-11 20:38     ` Ira Weiny
  2024-10-09 18:16   ` Fan Ni
  2024-10-10 12:51   ` Jonathan Cameron
  2 siblings, 1 reply; 14+ messages in thread
From: Rafael J. Wysocki @ 2024-10-09 14:42 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Dave Jiang, Fan Ni, Jonathan Cameron, Navneet Singh,
	Jonathan Corbet, Andrew Morton, Dan Williams, Davidlohr Bueso,
	Alison Schofield, Vishal Verma, linux-btrfs, linux-cxl, linux-doc,
	nvdimm, linux-kernel, Robert Moore, Rafael J. Wysocki, Len Brown,
	linux-acpi, acpica-devel

On Tue, Oct 8, 2024 at 1:17 AM Ira Weiny <ira.weiny@intel.com> wrote:
>
> Additional DCD region (partition) information is contained in the DSMAS
> CDAT tables, including performance, read only, and shareable attributes.
>
> Match DCD partitions with DSMAS tables and store the meta data.

[snip]

> diff --git a/include/acpi/actbl1.h b/include/acpi/actbl1.h
> index 199afc2cd122..387fc821703a 100644
> --- a/include/acpi/actbl1.h
> +++ b/include/acpi/actbl1.h
> @@ -403,6 +403,8 @@ struct acpi_cdat_dsmas {
>  /* Flags for subtable above */
>
>  #define ACPI_CDAT_DSMAS_NON_VOLATILE		(1 << 2)
> +#define ACPI_CDAT_DSMAS_SHAREABLE		(1 << 3)
> +#define ACPI_CDAT_DSMAS_READ_ONLY		(1 << 6)
>
>  /* Subtable 1: Device scoped Latency and Bandwidth Information Structure (DSLBIS) */
>

Is there an upstream ACPICA commit for this?

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: [PATCH v4 12/28] cxl/cdat: Gather DSMAS data for DCD regions
  2024-10-09 14:42   ` Rafael J. Wysocki
@ 2024-10-11 20:38     ` Ira Weiny
  2024-10-14 20:52       ` Wysocki, Rafael J
  0 siblings, 1 reply; 14+ messages in thread
From: Ira Weiny @ 2024-10-11 20:38 UTC (permalink / raw)
  To: Rafael J. Wysocki, Ira Weiny
  Cc: Dave Jiang, Fan Ni, Jonathan Cameron, Navneet Singh,
	Jonathan Corbet, Andrew Morton, Dan Williams, Davidlohr Bueso,
	Alison Schofield, Vishal Verma, linux-btrfs, linux-cxl, linux-doc,
	nvdimm, linux-kernel, Robert Moore, Rafael J. Wysocki, Len Brown,
	linux-acpi, acpica-devel

Rafael J. Wysocki wrote:
> On Tue, Oct 8, 2024 at 1:17 AM Ira Weiny <ira.weiny@intel.com> wrote:

[snip]

> > diff --git a/include/acpi/actbl1.h b/include/acpi/actbl1.h
> > index 199afc2cd122..387fc821703a 100644
> > --- a/include/acpi/actbl1.h
> > +++ b/include/acpi/actbl1.h
> > @@ -403,6 +403,8 @@ struct acpi_cdat_dsmas {
> >  /* Flags for subtable above */
> >
> >  #define ACPI_CDAT_DSMAS_NON_VOLATILE		(1 << 2)
> > +#define ACPI_CDAT_DSMAS_SHAREABLE		(1 << 3)
> > +#define ACPI_CDAT_DSMAS_READ_ONLY		(1 << 6)
> >
> >  /* Subtable 1: Device scoped Latency and Bandwidth Information Structure (DSLBIS) */
> >
>
> Is there an upstream ACPICA commit for this?

There is a PR for it now.

https://github.com/acpica/acpica/pull/976

Do I need to reference that in this patch?  Or wait for it to be merged
and drop this hunk?

Ira

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: [PATCH v4 12/28] cxl/cdat: Gather DSMAS data for DCD regions
  2024-10-11 20:38     ` Ira Weiny
@ 2024-10-14 20:52       ` Wysocki, Rafael J
  0 siblings, 0 replies; 14+ messages in thread
From: Wysocki, Rafael J @ 2024-10-14 20:52 UTC (permalink / raw)
  To: Ira Weiny, Rafael J. Wysocki
  Cc: Dave Jiang, Fan Ni, Jonathan Cameron, Navneet Singh,
	Jonathan Corbet, Andrew Morton, Dan Williams, Davidlohr Bueso,
	Alison Schofield, Vishal Verma, linux-btrfs, linux-cxl, linux-doc,
	nvdimm, linux-kernel, Robert Moore, Len Brown, linux-acpi,
	acpica-devel

On 10/11/2024 10:38 PM, Ira Weiny wrote:
> Rafael J. Wysocki wrote:
>> On Tue, Oct 8, 2024 at 1:17 AM Ira Weiny <ira.weiny@intel.com> wrote:
> [snip]
>
>>> diff --git a/include/acpi/actbl1.h b/include/acpi/actbl1.h
>>> index 199afc2cd122..387fc821703a 100644
>>> --- a/include/acpi/actbl1.h
>>> +++ b/include/acpi/actbl1.h
>>> @@ -403,6 +403,8 @@ struct acpi_cdat_dsmas {
>>>  /* Flags for subtable above */
>>>
>>>  #define ACPI_CDAT_DSMAS_NON_VOLATILE		(1 << 2)
>>> +#define ACPI_CDAT_DSMAS_SHAREABLE		(1 << 3)
>>> +#define ACPI_CDAT_DSMAS_READ_ONLY		(1 << 6)
>>>
>>>  /* Subtable 1: Device scoped Latency and Bandwidth Information Structure (DSLBIS) */
>>>
>> Is there an upstream ACPICA commit for this?
> There is a PR for it now.
>
> https://github.com/acpica/acpica/pull/976
>
> Do I need to reference that in this patch?  Or wait for it to be merged
> and drop this hunk?

Wait for it to be merged first.

Then either drop this hunk and wait for an ACPICA release (that may not
happen soon, though), or send a Linux patch corresponding to it with a
Link tag pointing to the above.

Thanks!

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: [PATCH v4 12/28] cxl/cdat: Gather DSMAS data for DCD regions
  2024-10-07 23:16 ` [PATCH v4 12/28] cxl/cdat: Gather DSMAS data for DCD regions Ira Weiny
  2024-10-09 14:42   ` Rafael J. Wysocki
@ 2024-10-09 18:16   ` Fan Ni
  2024-10-14  1:16     ` Ira Weiny
  2024-10-10 12:51   ` Jonathan Cameron
  2 siblings, 1 reply; 14+ messages in thread
From: Fan Ni @ 2024-10-09 18:16 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Dave Jiang, Jonathan Cameron, Navneet Singh, Jonathan Corbet,
	Andrew Morton, Dan Williams, Davidlohr Bueso, Alison Schofield,
	Vishal Verma, linux-btrfs, linux-cxl, linux-doc, nvdimm,
	linux-kernel, Robert Moore, Rafael J. Wysocki, Len Brown,
	linux-acpi, acpica-devel

On Mon, Oct 07, 2024 at 06:16:18PM -0500, Ira Weiny wrote:
> Additional DCD region (partition) information is contained in the DSMAS
> CDAT tables, including performance, read only, and shareable attributes.
>
> Match DCD partitions with DSMAS tables and store the meta data.
>
> To: Robert Moore <robert.moore@intel.com>
> To: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> To: Len Brown <lenb@kernel.org>
> Cc: linux-acpi@vger.kernel.org
> Cc: acpica-devel@lists.linux.dev
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
>

One minor comment inline.

[snip]

> @@ -255,6 +259,38 @@ static void update_perf_entry(struct device *dev, struct dsmas_entry *dent,
>  		dent->coord[ACCESS_COORDINATE_CPU].write_latency);
>  }
>
> +

Unwanted blank line.

Fan

> +static void update_dcd_perf(struct cxl_dev_state *cxlds,
> +			    struct dsmas_entry *dent)
> +{
> +	struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
> +	struct device *dev = cxlds->dev;

[snip]

-- 
Fan Ni

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: [PATCH v4 12/28] cxl/cdat: Gather DSMAS data for DCD regions
  2024-10-09 18:16   ` Fan Ni
@ 2024-10-14  1:16     ` Ira Weiny
  0 siblings, 0 replies; 14+ messages in thread
From: Ira Weiny @ 2024-10-14  1:16 UTC (permalink / raw)
  To: Fan Ni, Ira Weiny
  Cc: Dave Jiang, Jonathan Cameron, Navneet Singh, Jonathan Corbet,
	Andrew Morton, Dan Williams, Davidlohr Bueso, Alison Schofield,
	Vishal Verma, linux-btrfs, linux-cxl, linux-doc, nvdimm,
	linux-kernel, Robert Moore, Rafael J. Wysocki, Len Brown,
	linux-acpi, acpica-devel

Fan Ni wrote:
> On Mon, Oct 07, 2024 at 06:16:18PM -0500, Ira Weiny wrote:
> > Additional DCD region (partition) information is contained in the DSMAS
> > CDAT tables, including performance, read only, and shareable attributes.
> >
> > Match DCD partitions with DSMAS tables and store the meta data.
>
> One minor comment inline.

[snip]

> > @@ -255,6 +259,38 @@ static void update_perf_entry(struct device *dev, struct dsmas_entry *dent,
> >  		dent->coord[ACCESS_COORDINATE_CPU].write_latency);
> >  }
> >
> > +
> Unwanted blank line.

Fixed.
Thanks.

Ira

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: [PATCH v4 12/28] cxl/cdat: Gather DSMAS data for DCD regions 2024-10-07 23:16 ` [PATCH v4 12/28] cxl/cdat: Gather DSMAS data for DCD regions Ira Weiny 2024-10-09 14:42 ` Rafael J. Wysocki 2024-10-09 18:16 ` Fan Ni @ 2024-10-10 12:51 ` Jonathan Cameron 2 siblings, 0 replies; 14+ messages in thread From: Jonathan Cameron @ 2024-10-10 12:51 UTC (permalink / raw) To: Ira Weiny Cc: Dave Jiang, Fan Ni, Navneet Singh, Jonathan Corbet, Andrew Morton, Dan Williams, Davidlohr Bueso, Alison Schofield, Vishal Verma, linux-btrfs, linux-cxl, linux-doc, nvdimm, linux-kernel, Robert Moore, Rafael J. Wysocki, Len Brown, linux-acpi, acpica-devel On Mon, 07 Oct 2024 18:16:18 -0500 Ira Weiny <ira.weiny@intel.com> wrote: > Additional DCD region (partition) information is contained in the DSMAS > CDAT tables, including performance, read only, and shareable attributes. > > Match DCD partitions with DSMAS tables and store the meta data. > > To: Robert Moore <robert.moore@intel.com> > To: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > To: Len Brown <lenb@kernel.org> > Cc: linux-acpi@vger.kernel.org > Cc: acpica-devel@lists.linux.dev > Signed-off-by: Ira Weiny <ira.weiny@intel.com> One trivial comment from me. 
As Rafael has raised, the ACPICA dependency in here is going to be the blocker :( Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> > +static void update_dcd_perf(struct cxl_dev_state *cxlds, > + struct dsmas_entry *dent) > +{ > + struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds); > + struct device *dev = cxlds->dev; > + > + for (int i = 0; i < mds->nr_dc_region; i++) { > + /* CXL defines a u32 handle while cdat defines u8, ignore upper bits */ CDAT > + u8 dc_handle = mds->dc_region[i].dsmad_handle & 0xff; > + > + if (resource_size(&cxlds->dc_res[i])) { > + struct range dc_range = { > + .start = cxlds->dc_res[i].start, > + .end = cxlds->dc_res[i].end, > + }; > + > + if (range_contains(&dent->dpa_range, &dc_range)) { > + if (dent->handle != dc_handle) > + dev_warn(dev, "DC Region/DSMAS mis-matched handle/range; region %pra (%u); dsmas %pra (%u)\n" > + " setting DC region attributes regardless\n", > + &dent->dpa_range, dent->handle, > + &dc_range, dc_handle); > + > + mds->dc_region[i].shareable = dent->shareable; > + mds->dc_region[i].read_only = dent->read_only; > + update_perf_entry(dev, dent, &mds->dc_perf[i]); > + } > + } > + } > +} ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v4 00/28] DCD: Add support for Dynamic Capacity Devices (DCD) 2024-10-07 23:16 [PATCH v4 00/28] DCD: Add support for Dynamic Capacity Devices (DCD) Ira Weiny 2024-10-07 23:16 ` [PATCH v4 12/28] cxl/cdat: Gather DSMAS data for DCD regions Ira Weiny @ 2024-10-08 22:57 ` Fan Ni 2024-10-08 23:06 ` Fan Ni 2024-10-21 16:47 ` Fan Ni 2 siblings, 1 reply; 14+ messages in thread From: Fan Ni @ 2024-10-08 22:57 UTC (permalink / raw) To: Ira Weiny Cc: Dave Jiang, Jonathan Cameron, Navneet Singh, Jonathan Corbet, Andrew Morton, Dan Williams, Davidlohr Bueso, Alison Schofield, Vishal Verma, linux-btrfs, linux-cxl, linux-doc, nvdimm, linux-kernel, Petr Mladek, Steven Rostedt, Andy Shevchenko, Rasmus Villemoes, Sergey Senozhatsky, Chris Mason, Josef Bacik, David Sterba, Johannes Thumshirn, Li, Ming, Robert Moore, Rafael J. Wysocki, Len Brown, linux-acpi, acpica-devel On Mon, Oct 07, 2024 at 06:16:06PM -0500, Ira Weiny wrote: > A git tree of this series can be found here: > > https://github.com/weiny2/linux-kernel/tree/dcd-v4-2024-10-04 > > Series info > =========== > Hi Ira, Based on current DC extent release logic, when the extent to release is in use (for example, created a dax device), no response (4803h) will be sent. Should we send a response with empty extent list instead? Fan > This series has 5 parts: > > Patch 1-3: Add %pra printk format for struct range > Patch 4: Add core range_overlaps() function > Patch 5-6: CXL clean up/prelim patches > Patch 7-26: Core DCD support > Patch 27-28: cxl_test support > > Background > ========== > > A Dynamic Capacity Device (DCD) (CXL 3.1 sec 9.13.3) is a CXL memory > device that allows memory capacity within a region to change > dynamically without the need for resetting the device, reconfiguring > HDM decoders, or reconfiguring software DAX regions. > > One of the biggest use cases for Dynamic Capacity is to allow hosts to > share memory dynamically within a data center without increasing the > per-host attached memory. 
> > The general flow for the addition or removal of memory is to have an > orchestrator coordinate the use of the memory. Generally there are 5 > actors in such a system, the Orchestrator, Fabric Manager, the Logical > device, the Host Kernel, and a Host User. > > Typical work flows are shown below. > > Orchestrator FM Device Host Kernel Host User > > | | | | | > |-------------- Create region ----------------------->| > | | | | | > | | | |<-- Create ---| > | | | | Region | > |<------------- Signal done --------------------------| > | | | | | > |-- Add ----->|-- Add --->|--- Add --->| | > | Capacity | Extent | Extent | | > | | | | | > | |<- Accept -|<- Accept -| | > | | Extent | Extent | | > | | | |<- Create --->| > | | | | DAX dev |-- Use memory > | | | | | | > | | | | | | > | | | |<- Release ---| <-+ > | | | | DAX dev | > | | | | | > |<------------- Signal done --------------------------| > | | | | | > |-- Remove -->|- Release->|- Release ->| | > | Capacity | Extent | Extent | | > | | | | | > | |<- Release-|<- Release -| | > | | Extent | Extent | | > | | | | | > |-- Add ----->|-- Add --->|--- Add --->| | > | Capacity | Extent | Extent | | > | | | | | > | |<- Accept -|<- Accept -| | > | | Extent | Extent | | > | | | |<- Create ----| > | | | | DAX dev |-- Use memory > | | | | | | > | | | |<- Release ---| <-+ > | | | | DAX dev | > |<------------- Signal done --------------------------| > | | | | | > |-- Remove -->|- Release->|- Release ->| | > | Capacity | Extent | Extent | | > | | | | | > | |<- Release-|<- Release -| | > | | Extent | Extent | | > | | | | | > |-- Add ----->|-- Add --->|--- Add --->| | > | Capacity | Extent | Extent | | > | | | |<- Create ----| > | | | | DAX dev |-- Use memory > | | | | | | > |-- Remove -->|- Release->|- Release ->| | | > | Capacity | Extent | Extent | | | > | | | | | | > | | | (Release Ignored) | | > | | | | | | > | | | |<- Release ---| <-+ > | | | | DAX dev | > |<------------- Signal done --------------------------| > | | | | | > | 
|- Release->|- Release ->| | > | | Extent | Extent | | > | | | | | > | |<- Release-|<- Release -| | > | | Extent | Extent | | > | | | |<- Destroy ---| > | | | | Region | > | | | | | > > Implementation > ============== > > The series still requires the creation of regions and DAX devices to be > closely synchronized with the Orchestrator and Fabric Manager. The host > kernel will reject extents if a region is not yet created. It also > ignores extent release if memory is in use (DAX device created). These > synchronizations are not anticipated to be an issue with real > applications. > > In order to allow for capacity to be added and removed a new concept of > a sparse DAX region is introduced. A sparse DAX region may have 0 or > more bytes of available space. The total space depends on the number > and size of the extents which have been added. > > Initially it is anticipated that users of the memory will carefully > coordinate the surfacing of additional capacity with the creation of DAX > devices which use that capacity. Therefore, the allocation of the > memory to DAX devices does not allow for specific associations between > DAX device and extent. This keeps allocations very similar to existing > DAX region behavior. > > To keep the DAX memory allocation aligned with the existing DAX devices > which do not have tags extents are not allowed to have tags. Future > support for tags is planned. > > Great care was taken to keep the extent tracking simple. Some xarray's > needed to be added but extra software objects were kept to a minimum. > > Region extents continue to be tracked as sub-devices of the DAX region. > This ensures that region destruction cleans up all extent allocations > properly. > > Some review tags were kept if a patch did not change. 
> > The major functionality of this series includes: > > - Getting the dynamic capacity (DC) configuration information from cxl > devices > > - Configuring the DC partitions reported by hardware > > - Enhancing the CXL and DAX regions for dynamic capacity support > a. Maintain a logical separation between hardware extents and > software managed region extents. This provides an > abstraction between the layers and should allow for > interleaving in the future > > - Get hardware extent lists for endpoint decoders upon > region creation. > > - Adjust extent/region memory available on the following events. > a. Add capacity Events > b. Release capacity events > > - Host response for add capacity > a. do not accept the extent if: > If the region does not exist > or an error occurs realizing the extent > b. If the region does exist > realize a DAX region extent with 1:1 mapping (no > interleave yet) > c. Support the event more bit by processing a list of extents > marked with the more bit together before setting up a > response. > > - Host response for remove capacity > a. If no DAX device references the extent; release the extent > b. If a reference does exist, ignore the request. > (Require FM to issue release again.) > > - Modify DAX device creation/resize to account for extents within a > sparse DAX region > > - Trace Dynamic Capacity events for debugging > > - Add cxl-test infrastructure to allow for faster unit testing > (See new ndctl branch for cxl-dcd.sh test[1]) > > - Only support 0 value extent tags > > Fan Ni's upstream of Qemu DCD was used for testing. 
> > Remaining work: > > 1) Allow mapping to specific extents (perhaps based on > label/tag) > 1a) devise region size reporting based on tags > 2) Interleave support > > Possible additional work depending on requirements: > > 1) Accept a new extent which extends (but overlaps) an existing > extent(s) > 2) Release extents when DAX devices are released if a release > was previously seen from the device > 3) Rework DAX device interfaces, memfd has been explored a bit > > [1] https://github.com/weiny2/ndctl/tree/dcd-region2-2024-10-01 > > --- > Major changes in v4: > - iweiny: rebase to 6.12-rc > - iweiny: Add qos data to regions > - Jonathan: Fix up shared region detection > - Jonathan/jgroves/djbw/iweiny: Ignore 0 value tags > - iweiny: Change DCD partition sysfs entries to allow for qos class and > additional parameters per partition > - Petr/Andy: s/%par/%pra/ > - Andy: Share logic between printing struct resource and struct range > - Link to v3: https://patch.msgid.link/20240816-dcd-type2-upstream-v3-0-7c9b96cba6d7@intel.com > > --- > Ira Weiny (14): > test printk: Add very basic struct resource tests > printk: Add print format (%pra) for struct range > cxl/cdat: Use %pra for dpa range outputs > range: Add range_overlaps() > dax: Document dax dev range tuple > cxl/pci: Delay event buffer allocation > cxl/cdat: Gather DSMAS data for DCD regions > cxl/region: Refactor common create region code > cxl/events: Split event msgnum configuration from irq setup > cxl/pci: Factor out interrupt policy check > cxl/core: Return endpoint decoder information from region search > dax/bus: Factor out dev dax resize logic > tools/testing/cxl: Make event logs dynamic > tools/testing/cxl: Add DC Regions to mock mem data > > Navneet Singh (14): > cxl/mbox: Flag support for Dynamic Capacity Devices (DCD) > cxl/mem: Read dynamic capacity configuration from the device > cxl/core: Separate region mode from decoder mode > cxl/region: Add dynamic capacity decoder and region modes > cxl/hdm: 
Add dynamic capacity size support to endpoint decoders > cxl/mem: Expose DCD partition capabilities in sysfs > cxl/port: Add endpoint decoder DC mode support to sysfs > cxl/region: Add sparse DAX region support > cxl/mem: Configure dynamic capacity interrupts > cxl/extent: Process DCD events and realize region extents > cxl/region/extent: Expose region extent information in sysfs > dax/region: Create resources on sparse DAX regions > cxl/region: Read existing extents on region creation > cxl/mem: Trace Dynamic capacity Event Record > > Documentation/ABI/testing/sysfs-bus-cxl | 120 +++- > Documentation/core-api/printk-formats.rst | 13 + > drivers/cxl/core/Makefile | 2 +- > drivers/cxl/core/cdat.c | 52 +- > drivers/cxl/core/core.h | 33 +- > drivers/cxl/core/extent.c | 486 +++++++++++++++ > drivers/cxl/core/hdm.c | 213 ++++++- > drivers/cxl/core/mbox.c | 605 ++++++++++++++++++- > drivers/cxl/core/memdev.c | 130 +++- > drivers/cxl/core/port.c | 13 +- > drivers/cxl/core/region.c | 170 ++++-- > drivers/cxl/core/trace.h | 65 ++ > drivers/cxl/cxl.h | 122 +++- > drivers/cxl/cxlmem.h | 131 +++- > drivers/cxl/pci.c | 123 +++- > drivers/dax/bus.c | 352 +++++++++-- > drivers/dax/bus.h | 4 +- > drivers/dax/cxl.c | 72 ++- > drivers/dax/dax-private.h | 47 +- > drivers/dax/hmem/hmem.c | 2 +- > drivers/dax/pmem.c | 2 +- > fs/btrfs/ordered-data.c | 10 +- > include/acpi/actbl1.h | 2 + > include/cxl/event.h | 32 + > include/linux/range.h | 7 + > lib/test_printf.c | 70 +++ > lib/vsprintf.c | 55 +- > tools/testing/cxl/Kbuild | 3 +- > tools/testing/cxl/test/mem.c | 960 ++++++++++++++++++++++++++---- > 29 files changed, 3576 insertions(+), 320 deletions(-) > --- > base-commit: 9852d85ec9d492ebef56dc5f229416c925758edc > change-id: 20230604-dcd-type2-upstream-0cd15f6216fd > > Best regards, > -- > Ira Weiny <ira.weiny@intel.com> > -- Fan Ni ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v4 00/28] DCD: Add support for Dynamic Capacity Devices (DCD) 2024-10-08 22:57 ` [PATCH v4 00/28] DCD: Add support for Dynamic Capacity Devices (DCD) Fan Ni @ 2024-10-08 23:06 ` Fan Ni 2024-10-10 15:30 ` Ira Weiny 2024-10-10 15:31 ` Ira Weiny 0 siblings, 2 replies; 14+ messages in thread From: Fan Ni @ 2024-10-08 23:06 UTC (permalink / raw) To: ira.weiny Cc: Ira Weiny, Dave Jiang, Jonathan Cameron, Navneet Singh, Jonathan Corbet, Andrew Morton, Dan Williams, Davidlohr Bueso, Alison Schofield, Vishal Verma, linux-btrfs, linux-cxl, linux-doc, nvdimm, linux-kernel, Petr Mladek, Steven Rostedt, Andy Shevchenko, Rasmus Villemoes, Sergey Senozhatsky, Chris Mason, Josef Bacik, David Sterba, Johannes Thumshirn, Li, Ming, Robert Moore, Rafael J. Wysocki, Len Brown, linux-acpi, acpica-devel On Tue, Oct 08, 2024 at 03:57:13PM -0700, Fan Ni wrote: > On Mon, Oct 07, 2024 at 06:16:06PM -0500, Ira Weiny wrote: > > A git tree of this series can be found here: > > > > https://github.com/weiny2/linux-kernel/tree/dcd-v4-2024-10-04 > > > > Series info > > =========== > > > > Hi Ira, > > Based on current DC extent release logic, when the extent to release is > in use (for example, created a dax device), no response (4803h) will be sent. > Should we send a response with empty extent list instead? > > Fan Oh. my bad. 4803h does not allow an empty extent list. Fan > > > > This series has 5 parts: > > > > Patch 1-3: Add %pra printk format for struct range > > Patch 4: Add core range_overlaps() function > > Patch 5-6: CXL clean up/prelim patches > > Patch 7-26: Core DCD support > > Patch 27-28: cxl_test support > > > > Background > > ========== > > > > A Dynamic Capacity Device (DCD) (CXL 3.1 sec 9.13.3) is a CXL memory > > device that allows memory capacity within a region to change > > dynamically without the need for resetting the device, reconfiguring > > HDM decoders, or reconfiguring software DAX regions. 
> > > > One of the biggest use cases for Dynamic Capacity is to allow hosts to > > share memory dynamically within a data center without increasing the > > per-host attached memory. > > > > The general flow for the addition or removal of memory is to have an > > orchestrator coordinate the use of the memory. Generally there are 5 > > actors in such a system, the Orchestrator, Fabric Manager, the Logical > > device, the Host Kernel, and a Host User. > > > > Typical work flows are shown below. > > > > Orchestrator FM Device Host Kernel Host User > > > > | | | | | > > |-------------- Create region ----------------------->| > > | | | | | > > | | | |<-- Create ---| > > | | | | Region | > > |<------------- Signal done --------------------------| > > | | | | | > > |-- Add ----->|-- Add --->|--- Add --->| | > > | Capacity | Extent | Extent | | > > | | | | | > > | |<- Accept -|<- Accept -| | > > | | Extent | Extent | | > > | | | |<- Create --->| > > | | | | DAX dev |-- Use memory > > | | | | | | > > | | | | | | > > | | | |<- Release ---| <-+ > > | | | | DAX dev | > > | | | | | > > |<------------- Signal done --------------------------| > > | | | | | > > |-- Remove -->|- Release->|- Release ->| | > > | Capacity | Extent | Extent | | > > | | | | | > > | |<- Release-|<- Release -| | > > | | Extent | Extent | | > > | | | | | > > |-- Add ----->|-- Add --->|--- Add --->| | > > | Capacity | Extent | Extent | | > > | | | | | > > | |<- Accept -|<- Accept -| | > > | | Extent | Extent | | > > | | | |<- Create ----| > > | | | | DAX dev |-- Use memory > > | | | | | | > > | | | |<- Release ---| <-+ > > | | | | DAX dev | > > |<------------- Signal done --------------------------| > > | | | | | > > |-- Remove -->|- Release->|- Release ->| | > > | Capacity | Extent | Extent | | > > | | | | | > > | |<- Release-|<- Release -| | > > | | Extent | Extent | | > > | | | | | > > |-- Add ----->|-- Add --->|--- Add --->| | > > | Capacity | Extent | Extent | | > > | | | |<- Create ----| > > | | | | 
DAX dev |-- Use memory > > | | | | | | > > |-- Remove -->|- Release->|- Release ->| | | > > | Capacity | Extent | Extent | | | > > | | | | | | > > | | | (Release Ignored) | | > > | | | | | | > > | | | |<- Release ---| <-+ > > | | | | DAX dev | > > |<------------- Signal done --------------------------| > > | | | | | > > | |- Release->|- Release ->| | > > | | Extent | Extent | | > > | | | | | > > | |<- Release-|<- Release -| | > > | | Extent | Extent | | > > | | | |<- Destroy ---| > > | | | | Region | > > | | | | | > > > > Implementation > > ============== > > > > The series still requires the creation of regions and DAX devices to be > > closely synchronized with the Orchestrator and Fabric Manager. The host > > kernel will reject extents if a region is not yet created. It also > > ignores extent release if memory is in use (DAX device created). These > > synchronizations are not anticipated to be an issue with real > > applications. > > > > In order to allow for capacity to be added and removed a new concept of > > a sparse DAX region is introduced. A sparse DAX region may have 0 or > > more bytes of available space. The total space depends on the number > > and size of the extents which have been added. > > > > Initially it is anticipated that users of the memory will carefully > > coordinate the surfacing of additional capacity with the creation of DAX > > devices which use that capacity. Therefore, the allocation of the > > memory to DAX devices does not allow for specific associations between > > DAX device and extent. This keeps allocations very similar to existing > > DAX region behavior. > > > > To keep the DAX memory allocation aligned with the existing DAX devices > > which do not have tags extents are not allowed to have tags. Future > > support for tags is planned. > > > > Great care was taken to keep the extent tracking simple. Some xarray's > > needed to be added but extra software objects were kept to a minimum. 
> > > > Region extents continue to be tracked as sub-devices of the DAX region. > > This ensures that region destruction cleans up all extent allocations > > properly. > > > > Some review tags were kept if a patch did not change. > > > > The major functionality of this series includes: > > > > - Getting the dynamic capacity (DC) configuration information from cxl > > devices > > > > - Configuring the DC partitions reported by hardware > > > > - Enhancing the CXL and DAX regions for dynamic capacity support > > a. Maintain a logical separation between hardware extents and > > software managed region extents. This provides an > > abstraction between the layers and should allow for > > interleaving in the future > > > > - Get hardware extent lists for endpoint decoders upon > > region creation. > > > > - Adjust extent/region memory available on the following events. > > a. Add capacity Events > > b. Release capacity events > > > > - Host response for add capacity > > a. do not accept the extent if: > > If the region does not exist > > or an error occurs realizing the extent > > b. If the region does exist > > realize a DAX region extent with 1:1 mapping (no > > interleave yet) > > c. Support the event more bit by processing a list of extents > > marked with the more bit together before setting up a > > response. > > > > - Host response for remove capacity > > a. If no DAX device references the extent; release the extent > > b. If a reference does exist, ignore the request. > > (Require FM to issue release again.) > > > > - Modify DAX device creation/resize to account for extents within a > > sparse DAX region > > > > - Trace Dynamic Capacity events for debugging > > > > - Add cxl-test infrastructure to allow for faster unit testing > > (See new ndctl branch for cxl-dcd.sh test[1]) > > > > - Only support 0 value extent tags > > > > Fan Ni's upstream of Qemu DCD was used for testing. 
> > > > Remaining work: > > > > 1) Allow mapping to specific extents (perhaps based on > > label/tag) > > 1a) devise region size reporting based on tags > > 2) Interleave support > > > > Possible additional work depending on requirements: > > > > 1) Accept a new extent which extends (but overlaps) an existing > > extent(s) > > 2) Release extents when DAX devices are released if a release > > was previously seen from the device > > 3) Rework DAX device interfaces, memfd has been explored a bit > > > > [1] https://github.com/weiny2/ndctl/tree/dcd-region2-2024-10-01 > > > > --- > > Major changes in v4: > > - iweiny: rebase to 6.12-rc > > - iweiny: Add qos data to regions > > - Jonathan: Fix up shared region detection > > - Jonathan/jgroves/djbw/iweiny: Ignore 0 value tags > > - iweiny: Change DCD partition sysfs entries to allow for qos class and > > additional parameters per partition > > - Petr/Andy: s/%par/%pra/ > > - Andy: Share logic between printing struct resource and struct range > > - Link to v3: https://patch.msgid.link/20240816-dcd-type2-upstream-v3-0-7c9b96cba6d7@intel.com > > > > --- > > Ira Weiny (14): > > test printk: Add very basic struct resource tests > > printk: Add print format (%pra) for struct range > > cxl/cdat: Use %pra for dpa range outputs > > range: Add range_overlaps() > > dax: Document dax dev range tuple > > cxl/pci: Delay event buffer allocation > > cxl/cdat: Gather DSMAS data for DCD regions > > cxl/region: Refactor common create region code > > cxl/events: Split event msgnum configuration from irq setup > > cxl/pci: Factor out interrupt policy check > > cxl/core: Return endpoint decoder information from region search > > dax/bus: Factor out dev dax resize logic > > tools/testing/cxl: Make event logs dynamic > > tools/testing/cxl: Add DC Regions to mock mem data > > > > Navneet Singh (14): > > cxl/mbox: Flag support for Dynamic Capacity Devices (DCD) > > cxl/mem: Read dynamic capacity configuration from the device > > cxl/core: Separate 
region mode from decoder mode > > cxl/region: Add dynamic capacity decoder and region modes > > cxl/hdm: Add dynamic capacity size support to endpoint decoders > > cxl/mem: Expose DCD partition capabilities in sysfs > > cxl/port: Add endpoint decoder DC mode support to sysfs > > cxl/region: Add sparse DAX region support > > cxl/mem: Configure dynamic capacity interrupts > > cxl/extent: Process DCD events and realize region extents > > cxl/region/extent: Expose region extent information in sysfs > > dax/region: Create resources on sparse DAX regions > > cxl/region: Read existing extents on region creation > > cxl/mem: Trace Dynamic capacity Event Record > > > > Documentation/ABI/testing/sysfs-bus-cxl | 120 +++- > > Documentation/core-api/printk-formats.rst | 13 + > > drivers/cxl/core/Makefile | 2 +- > > drivers/cxl/core/cdat.c | 52 +- > > drivers/cxl/core/core.h | 33 +- > > drivers/cxl/core/extent.c | 486 +++++++++++++++ > > drivers/cxl/core/hdm.c | 213 ++++++- > > drivers/cxl/core/mbox.c | 605 ++++++++++++++++++- > > drivers/cxl/core/memdev.c | 130 +++- > > drivers/cxl/core/port.c | 13 +- > > drivers/cxl/core/region.c | 170 ++++-- > > drivers/cxl/core/trace.h | 65 ++ > > drivers/cxl/cxl.h | 122 +++- > > drivers/cxl/cxlmem.h | 131 +++- > > drivers/cxl/pci.c | 123 +++- > > drivers/dax/bus.c | 352 +++++++++-- > > drivers/dax/bus.h | 4 +- > > drivers/dax/cxl.c | 72 ++- > > drivers/dax/dax-private.h | 47 +- > > drivers/dax/hmem/hmem.c | 2 +- > > drivers/dax/pmem.c | 2 +- > > fs/btrfs/ordered-data.c | 10 +- > > include/acpi/actbl1.h | 2 + > > include/cxl/event.h | 32 + > > include/linux/range.h | 7 + > > lib/test_printf.c | 70 +++ > > lib/vsprintf.c | 55 +- > > tools/testing/cxl/Kbuild | 3 +- > > tools/testing/cxl/test/mem.c | 960 ++++++++++++++++++++++++++---- > > 29 files changed, 3576 insertions(+), 320 deletions(-) > > --- > > base-commit: 9852d85ec9d492ebef56dc5f229416c925758edc > > change-id: 20230604-dcd-type2-upstream-0cd15f6216fd > > > > Best regards, > > -- > > 
Ira Weiny <ira.weiny@intel.com> > > > > -- > Fan Ni -- Fan Ni ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v4 00/28] DCD: Add support for Dynamic Capacity Devices (DCD) 2024-10-08 23:06 ` Fan Ni @ 2024-10-10 15:30 ` Ira Weiny 2024-10-10 15:31 ` Ira Weiny 1 sibling, 0 replies; 14+ messages in thread From: Ira Weiny @ 2024-10-10 15:30 UTC (permalink / raw) To: Fan Ni, ira.weiny Cc: Ira Weiny, Dave Jiang, Jonathan Cameron, Navneet Singh, Jonathan Corbet, Andrew Morton, Dan Williams, Davidlohr Bueso, Alison Schofield, Vishal Verma, linux-btrfs, linux-cxl, linux-doc, nvdimm, linux-kernel, Petr Mladek, Steven Rostedt, Andy Shevchenko, Rasmus Villemoes, Sergey Senozhatsky, Chris Mason, Josef Bacik, David Sterba, Johannes Thumshirn, Li, Ming, Robert Moore, Rafael J. Wysocki, Len Brown, linux-acpi, acpica-devel Fan Ni wrote: > On Tue, Oct 08, 2024 at 03:57:13PM -0700, Fan Ni wrote: > > On Mon, Oct 07, 2024 at 06:16:06PM -0500, Ira Weiny wrote: > > > A git tree of this series can be found here: > > > > > > https://github.com/weiny2/linux-kernel/tree/dcd-v4-2024-10-04 > > > > > > Series info > > > =========== > > > > > > > Hi Ira, > > > > Based on current DC extent release logic, when the extent to release is > > in use (for example, created a dax device), no response (4803h) will be sent. > > Should we send a response with empty extent list instead? > > > > Fan > > Oh. my bad. 4803h does not allow an empty extent list. Yep. It is perfectly reasonable and I think intended that releases are ignored when in use. Thanks for reviewing though. As Ming has pointed out I've got some issues still to clean up. Thanks, Ira ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v4 00/28] DCD: Add support for Dynamic Capacity Devices (DCD) 2024-10-08 23:06 ` Fan Ni 2024-10-10 15:30 ` Ira Weiny @ 2024-10-10 15:31 ` Ira Weiny 1 sibling, 0 replies; 14+ messages in thread From: Ira Weiny @ 2024-10-10 15:31 UTC (permalink / raw) To: Fan Ni, ira.weiny Cc: Ira Weiny, Dave Jiang, Jonathan Cameron, Navneet Singh, Jonathan Corbet, Andrew Morton, Dan Williams, Davidlohr Bueso, Alison Schofield, Vishal Verma, linux-btrfs, linux-cxl, linux-doc, nvdimm, linux-kernel, Petr Mladek, Steven Rostedt, Andy Shevchenko, Rasmus Villemoes, Sergey Senozhatsky, Chris Mason, Josef Bacik, David Sterba, Johannes Thumshirn, Li, Ming, Robert Moore, Rafael J. Wysocki, Len Brown, linux-acpi, acpica-devel Fan Ni wrote: > On Tue, Oct 08, 2024 at 03:57:13PM -0700, Fan Ni wrote: > > On Mon, Oct 07, 2024 at 06:16:06PM -0500, Ira Weiny wrote: > > > A git tree of this series can be found here: > > > > > > https://github.com/weiny2/linux-kernel/tree/dcd-v4-2024-10-04 > > > > > > Series info > > > =========== > > > > > > > Hi Ira, > > > > Based on current DC extent release logic, when the extent to release is > > in use (for example, created a dax device), no response (4803h) will be sent. > > Should we send a response with empty extent list instead? > > > > Fan > > Oh. my bad. 4803h does not allow an empty extent list. Yep. It is perfectly reasonable and intended that releases are ignored when in use. Thanks for reviewing though. As Ming has pointed out I've got some issues still to clean up. Thanks, Ira ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v4 00/28] DCD: Add support for Dynamic Capacity Devices (DCD) 2024-10-07 23:16 [PATCH v4 00/28] DCD: Add support for Dynamic Capacity Devices (DCD) Ira Weiny 2024-10-07 23:16 ` [PATCH v4 12/28] cxl/cdat: Gather DSMAS data for DCD regions Ira Weiny 2024-10-08 22:57 ` [PATCH v4 00/28] DCD: Add support for Dynamic Capacity Devices (DCD) Fan Ni @ 2024-10-21 16:47 ` Fan Ni 2024-10-22 17:05 ` Jonathan Cameron 2 siblings, 1 reply; 14+ messages in thread From: Fan Ni @ 2024-10-21 16:47 UTC (permalink / raw) To: Ira Weiny Cc: Dave Jiang, Jonathan Cameron, Navneet Singh, Jonathan Corbet, Andrew Morton, Dan Williams, Davidlohr Bueso, Alison Schofield, Vishal Verma, linux-btrfs, linux-cxl, linux-doc, nvdimm, linux-kernel, Petr Mladek, Steven Rostedt, Andy Shevchenko, Rasmus Villemoes, Sergey Senozhatsky, Chris Mason, Josef Bacik, David Sterba, Johannes Thumshirn, Li, Ming, Robert Moore, Rafael J. Wysocki, Len Brown, linux-acpi, acpica-devel On Mon, Oct 07, 2024 at 06:16:06PM -0500, Ira Weiny wrote: > A git tree of this series can be found here: > > https://github.com/weiny2/linux-kernel/tree/dcd-v4-2024-10-04 > > Series info > =========== > Hi Ira, I have a question here for DCD. For CXL spec 3.0 and later, the output payload of the command "Identify memory device" has been expanded to include one extra field (dynamic capacity event log size) in Table 8-94. However, in current kernel code, we follow cxl spec 2.0 and do not have the field. If DCD is supported, it means we have a least a 3.0 device as DCD is a 3.0 feature. I think we should at lease expand the payload to align with 3.0 even we do not use it yet. What do you think? Btw, we have that already in QEMU, I do not know why it does not trigger a out-of-bound access issue in the test. 
Fan

> This series has 5 parts:
>
> Patch 1-3: Add %pra printk format for struct range
> Patch 4: Add core range_overlaps() function
> Patch 5-6: CXL clean up/prelim patches
> Patch 7-26: Core DCD support
> Patch 27-28: cxl_test support
>
> Background
> ==========
>
> A Dynamic Capacity Device (DCD) (CXL 3.1 sec 9.13.3) is a CXL memory
> device that allows memory capacity within a region to change
> dynamically without the need for resetting the device, reconfiguring
> HDM decoders, or reconfiguring software DAX regions.
>
> One of the biggest use cases for Dynamic Capacity is to allow hosts to
> share memory dynamically within a data center without increasing the
> per-host attached memory.
>
> The general flow for the addition or removal of memory is to have an
> orchestrator coordinate the use of the memory.  Generally there are 5
> actors in such a system: the Orchestrator, the Fabric Manager, the
> Logical Device, the Host Kernel, and a Host User.
>
> Typical work flows are shown below.
>
>  Orchestrator      FM        Device     Host Kernel     Host User
>
>       |             |           |            |              |
>       |-------------- Create region ----------------------->|
>       |             |           |            |              |
>       |             |           |            |<-- Create ---|
>       |             |           |            |    Region    |
>       |<------------- Signal done --------------------------|
>       |             |           |            |              |
>       |-- Add ----->|-- Add --->|--- Add --->|              |
>       |  Capacity   |  Extent   |   Extent   |              |
>       |             |           |            |              |
>       |             |<- Accept -|<- Accept --|              |
>       |             |  Extent   |   Extent   |              |
>       |             |           |            |<- Create --->|
>       |             |           |            |   DAX dev    |-- Use memory
>       |             |           |            |              |   |
>       |             |           |            |              |   |
>       |             |           |            |<- Release ---| <-+
>       |             |           |            |   DAX dev    |
>       |             |           |            |              |
>       |<------------- Signal done --------------------------|
>       |             |           |            |              |
>       |-- Remove -->|- Release->|- Release ->|              |
>       |  Capacity   |  Extent   |   Extent   |              |
>       |             |           |            |              |
>       |             |<- Release-|<- Release -|              |
>       |             |  Extent   |   Extent   |              |
>       |             |           |            |              |
>       |-- Add ----->|-- Add --->|--- Add --->|              |
>       |  Capacity   |  Extent   |   Extent   |              |
>       |             |           |            |              |
>       |             |<- Accept -|<- Accept --|              |
>       |             |  Extent   |   Extent   |              |
>       |             |           |            |<- Create ----|
>       |             |           |            |   DAX dev    |-- Use memory
>       |             |           |            |              |   |
>       |             |           |            |<- Release ---| <-+
>       |             |           |            |   DAX dev    |
>       |<------------- Signal done --------------------------|
>       |             |           |            |              |
>       |-- Remove -->|- Release->|- Release ->|              |
>       |  Capacity   |  Extent   |   Extent   |              |
>       |             |           |            |              |
>       |             |<- Release-|<- Release -|              |
>       |             |  Extent   |   Extent   |              |
>       |             |           |            |              |
>       |-- Add ----->|-- Add --->|--- Add --->|              |
>       |  Capacity   |  Extent   |   Extent   |              |
>       |             |           |            |<- Create ----|
>       |             |           |            |   DAX dev    |-- Use memory
>       |             |           |            |              |   |
>       |-- Remove -->|- Release->|- Release ->|              |   |
>       |  Capacity   |  Extent   |   Extent   |              |   |
>       |             |           |            |              |   |
>       |             |           |     (Release Ignored)     |   |
>       |             |           |            |              |   |
>       |             |           |            |<- Release ---| <-+
>       |             |           |            |   DAX dev    |
>       |<------------- Signal done --------------------------|
>       |             |           |            |              |
>       |             |- Release->|- Release ->|              |
>       |             |  Extent   |   Extent   |              |
>       |             |           |            |              |
>       |             |<- Release-|<- Release -|              |
>       |             |  Extent   |   Extent   |              |
>       |             |           |            |<- Destroy ---|
>       |             |           |            |    Region    |
>       |             |           |            |              |
>
> Implementation
> ==============
>
> The series still requires the creation of regions and DAX devices to be
> closely synchronized with the Orchestrator and Fabric Manager.  The host
> kernel will reject extents if a region is not yet created.  It also
> ignores extent release if memory is in use (DAX device created).  These
> synchronizations are not anticipated to be an issue with real
> applications.
>
> In order to allow for capacity to be added and removed, a new concept of
> a sparse DAX region is introduced.  A sparse DAX region may have 0 or
> more bytes of available space.  The total space depends on the number
> and size of the extents which have been added.
>
> Initially it is anticipated that users of the memory will carefully
> coordinate the surfacing of additional capacity with the creation of DAX
> devices which use that capacity.  Therefore, the allocation of the
> memory to DAX devices does not allow for specific associations between
> DAX device and extent.  This keeps allocations very similar to existing
> DAX region behavior.
>
> To keep the DAX memory allocation aligned with the existing DAX devices,
> which do not have tags, extents are not allowed to have tags.  Future
> support for tags is planned.
>
> Great care was taken to keep the extent tracking simple.  Some xarrays
> needed to be added, but extra software objects were kept to a minimum.
>
> Region extents continue to be tracked as sub-devices of the DAX region.
> This ensures that region destruction cleans up all extent allocations
> properly.
>
> Some review tags were kept if a patch did not change.
>
> The major functionality of this series includes:
>
> - Getting the dynamic capacity (DC) configuration information from CXL
>   devices
>
> - Configuring the DC partitions reported by hardware
>
> - Enhancing the CXL and DAX regions for dynamic capacity support
>   a. Maintain a logical separation between hardware extents and
>      software-managed region extents.  This provides an
>      abstraction between the layers and should allow for
>      interleaving in the future
>
> - Get hardware extent lists for endpoint decoders upon
>   region creation.
>
> - Adjust extent/region memory available on the following events:
>   a. Add capacity events
>   b. Release capacity events
>
> - Host response for add capacity
>   a. Do not accept the extent if
>      the region does not exist
>      or an error occurs realizing the extent
>   b. If the region does exist,
>      realize a DAX region extent with 1:1 mapping (no
>      interleave yet)
>   c. Support the event "more" bit by processing a list of extents
>      marked with the more bit together before setting up a
>      response.
>
> - Host response for remove capacity
>   a. If no DAX device references the extent, release the extent
>   b. If a reference does exist, ignore the request.
>      (Require FM to issue release again.)
>
> - Modify DAX device creation/resize to account for extents within a
>   sparse DAX region
>
> - Trace Dynamic Capacity events for debugging
>
> - Add cxl-test infrastructure to allow for faster unit testing
>   (See new ndctl branch for cxl-dcd.sh test[1])
>
> - Only support 0 value extent tags
>
> Fan Ni's upstream of QEMU DCD was used for testing.
>
> Remaining work:
>
> 1) Allow mapping to specific extents (perhaps based on
>    label/tag)
>    1a) devise region size reporting based on tags
> 2) Interleave support
>
> Possible additional work depending on requirements:
>
> 1) Accept a new extent which extends (but overlaps) an existing
>    extent(s)
> 2) Release extents when DAX devices are released if a release
>    was previously seen from the device
> 3) Rework DAX device interfaces; memfd has been explored a bit
>
> [1] https://github.com/weiny2/ndctl/tree/dcd-region2-2024-10-01
>
> ---
> Major changes in v4:
> - iweiny: rebase to 6.12-rc
> - iweiny: Add qos data to regions
> - Jonathan: Fix up shared region detection
> - Jonathan/jgroves/djbw/iweiny: Ignore 0 value tags
> - iweiny: Change DCD partition sysfs entries to allow for qos class and
>   additional parameters per partition
> - Petr/Andy: s/%par/%pra/
> - Andy: Share logic between printing struct resource and struct range
> - Link to v3: https://patch.msgid.link/20240816-dcd-type2-upstream-v3-0-7c9b96cba6d7@intel.com
>
> ---
> Ira Weiny (14):
>       test printk: Add very basic struct resource tests
>       printk: Add print format (%pra) for struct range
>       cxl/cdat: Use %pra for dpa range outputs
>       range: Add range_overlaps()
>       dax: Document dax dev range tuple
>       cxl/pci: Delay event buffer allocation
>       cxl/cdat: Gather DSMAS data for DCD regions
>       cxl/region: Refactor common create region code
>       cxl/events: Split event msgnum configuration from irq setup
>       cxl/pci: Factor out interrupt policy check
>       cxl/core: Return endpoint decoder information from region search
>       dax/bus: Factor out dev dax resize logic
>       tools/testing/cxl: Make event logs dynamic
>       tools/testing/cxl: Add DC Regions to mock mem data
>
> Navneet Singh (14):
>       cxl/mbox: Flag support for Dynamic Capacity Devices (DCD)
>       cxl/mem: Read dynamic capacity configuration from the device
>       cxl/core: Separate region mode from decoder mode
>       cxl/region: Add dynamic capacity decoder and region modes
>       cxl/hdm: Add dynamic capacity size support to endpoint decoders
>       cxl/mem: Expose DCD partition capabilities in sysfs
>       cxl/port: Add endpoint decoder DC mode support to sysfs
>       cxl/region: Add sparse DAX region support
>       cxl/mem: Configure dynamic capacity interrupts
>       cxl/extent: Process DCD events and realize region extents
>       cxl/region/extent: Expose region extent information in sysfs
>       dax/region: Create resources on sparse DAX regions
>       cxl/region: Read existing extents on region creation
>       cxl/mem: Trace Dynamic capacity Event Record
>
>  Documentation/ABI/testing/sysfs-bus-cxl   | 120 +++-
>  Documentation/core-api/printk-formats.rst |  13 +
>  drivers/cxl/core/Makefile                 |   2 +-
>  drivers/cxl/core/cdat.c                   |  52 +-
>  drivers/cxl/core/core.h                   |  33 +-
>  drivers/cxl/core/extent.c                 | 486 +++++++++++++++
>  drivers/cxl/core/hdm.c                    | 213 ++++++-
>  drivers/cxl/core/mbox.c                   | 605 ++++++++++++++++++-
>  drivers/cxl/core/memdev.c                 | 130 +++-
>  drivers/cxl/core/port.c                   |  13 +-
>  drivers/cxl/core/region.c                 | 170 ++++--
>  drivers/cxl/core/trace.h                  |  65 ++
>  drivers/cxl/cxl.h                         | 122 +++-
>  drivers/cxl/cxlmem.h                      | 131 +++-
>  drivers/cxl/pci.c                         | 123 +++-
>  drivers/dax/bus.c                         | 352 +++++++++--
>  drivers/dax/bus.h                         |   4 +-
>  drivers/dax/cxl.c                         |  72 ++-
>  drivers/dax/dax-private.h                 |  47 +-
>  drivers/dax/hmem/hmem.c                   |   2 +-
>  drivers/dax/pmem.c                        |   2 +-
>  fs/btrfs/ordered-data.c                   |  10 +-
>  include/acpi/actbl1.h                     |   2 +
>  include/cxl/event.h                       |  32 +
>  include/linux/range.h                     |   7 +
>  lib/test_printf.c                         |  70 +++
>  lib/vsprintf.c                            |  55 +-
>  tools/testing/cxl/Kbuild                  |   3 +-
>  tools/testing/cxl/test/mem.c              | 960 ++++++++++++++++++++++++++----
>  29 files changed, 3576 insertions(+), 320 deletions(-)
> ---
> base-commit: 9852d85ec9d492ebef56dc5f229416c925758edc
> change-id: 20230604-dcd-type2-upstream-0cd15f6216fd
>
> Best regards,
> --
> Ira Weiny <ira.weiny@intel.com>
>

--
Fan Ni

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: [PATCH v4 00/28] DCD: Add support for Dynamic Capacity Devices (DCD)
  2024-10-21 16:47             ` Fan Ni
@ 2024-10-22 17:05               ` Jonathan Cameron
  0 siblings, 0 replies; 14+ messages in thread
From: Jonathan Cameron @ 2024-10-22 17:05 UTC (permalink / raw)
  To: Fan Ni
  Cc: Ira Weiny, Dave Jiang, Navneet Singh, Jonathan Corbet,
	Andrew Morton, Dan Williams, Davidlohr Bueso, Alison Schofield,
	Vishal Verma, linux-btrfs, linux-cxl, linux-doc, nvdimm,
	linux-kernel, Petr Mladek, Steven Rostedt, Andy Shevchenko,
	Rasmus Villemoes, Sergey Senozhatsky, Chris Mason, Josef Bacik,
	David Sterba, Johannes Thumshirn, Li, Ming, Robert Moore,
	Rafael J. Wysocki, Len Brown, linux-acpi, acpica-devel

On Mon, 21 Oct 2024 09:47:49 -0700
Fan Ni <nifan.cxl@gmail.com> wrote:

> On Mon, Oct 07, 2024 at 06:16:06PM -0500, Ira Weiny wrote:
> > A git tree of this series can be found here:
> >
> > https://github.com/weiny2/linux-kernel/tree/dcd-v4-2024-10-04
> >
> > Series info
> > ===========
> >
>
> Hi Ira,
> I have a question here for DCD.
>
> For CXL spec 3.0 and later, the output payload of the command "Identify
> Memory Device" has been expanded to include one extra field (Dynamic
> Capacity Event Log Size) in Table 8-94.  However, in current kernel code,
> we follow CXL spec 2.0 and do not have the field.
> If DCD is supported, it means we have at least a 3.0 device, as DCD is a
> 3.0 feature.
> I think we should at least expand the payload to align with 3.0 even if
> we do not use it yet.
>
> What do you think?
>
> Btw, we have that already in QEMU; I do not know why it does not trigger
> an out-of-bounds access issue in the test.

Ignoring new fields should be fine without needing to care about the
payload size.  This stuff is supposed to be backwards compatible, so if
it isn't we have a problem.  A newer device should always 'work' with an
older kernel.  In this case QEMU is a newer device (for this command
anyway).
Jonathan

> [snip quoted cover letter]

^ permalink raw reply	[flat|nested] 14+ messages in thread
end of thread, other threads:[~2024-10-22 17:05 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --
2024-10-07 23:16 [PATCH v4 00/28] DCD: Add support for Dynamic Capacity Devices (DCD) Ira Weiny
2024-10-07 23:16 ` [PATCH v4 12/28] cxl/cdat: Gather DSMAS data for DCD regions Ira Weiny
2024-10-09 14:42   ` Rafael J. Wysocki
2024-10-11 20:38     ` Ira Weiny
2024-10-14 20:52       ` Wysocki, Rafael J
2024-10-09 18:16   ` Fan Ni
2024-10-14  1:16     ` Ira Weiny
2024-10-10 12:51   ` Jonathan Cameron
2024-10-08 22:57 ` [PATCH v4 00/28] DCD: Add support for Dynamic Capacity Devices (DCD) Fan Ni
2024-10-08 23:06   ` Fan Ni
2024-10-10 15:30     ` Ira Weiny
2024-10-10 15:31   ` Ira Weiny
2024-10-21 16:47     ` Fan Ni
2024-10-22 17:05       ` Jonathan Cameron