* [PATCH v23 00/22] Type2 device basic support
@ 2026-02-01 15:54 alejandro.lucero-palau
2026-02-01 15:54 ` [PATCH v23 01/22] cxl: Add type2 " alejandro.lucero-palau
From: alejandro.lucero-palau @ 2026-02-01 15:54 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero
From: Alejandro Lucero <alucerop@amd.com>
This patchset should be applied on the cxl next branch using the base
specified at the end of this cover letter.
The dependencies on Dan's and Terry's work are gone, as the only patch
required is now in next. There is no dependency on Smita's patchset
either, since that work will not avoid the problem with Type2 and
DAX/hmem when memory is soft reserved. This needs to be solved by the
BIOS and the Type2 UEFI driver populating the CXL.mem range as
EFI_RESERVED_TYPE instead of the default EFI_CONVENTIONAL_MEMORY with
the EFI_MEMORY_SP attribute. There exists, though, a dependency on one
of Smita's patches:
[PATCH v5 3/7] cxl/region: Skip decoder reset on detach for autodiscovered regions
This is needed for the default behaviour with the current BIOS
configuration, where the Type2 HDM decoders will be kept unreset when
the driver unloads. This is the main change introduced in v23:
committed decoders will not be reset. The previous v22 functionality
supported a first driver load finding committed decoders but reset
them at unload, then supported uncommitted decoders on subsequent
driver loads. This will be supported in follow-up work.
v23 changes:
patch 11: fixing minor issues and dropping the change in
should_emulate_decoders (Jonathan Cameron)
patch 13: refactoring unregister_region for type safety in the Type2 API
sfc changes: slight modifications to the error path
v22 changes:
patch 1-3 from Dan's branch without any changes.
patch 11: new
patch 12: moved here from v21 patch 22
patch 13-14: new
patch 23: move check ahead of type3 only checks
All patches with sfc changes adapted to support both options.
v21 changes:
patch 1-2: v20 patch 1 split up, doing the code move in the second
patch in v21 (Jonathan)
patch 1-4: adding my Signed-off-by tag along with Dan's
patch 5: fix duplication of CXL_NR_PARTITION definition
patch 7: dropped the cxl test fixes removing an unused function. It was
sent independently ahead of this version.
patch 12: optimization for max free space calculation (Jonathan)
patch 19: optimization for returning on error (Jonathan)
v20 changes:
patch 1: using release helpers (Jonathan).
patch 6: minor fix in comments (Jonathan).
patch 7 & 8: change commit mentioning sfc changes
patch 11: Fix interleave_ways setting (Jonathan)
Change assignment location (Dave)
patch 13: changing error return order (Jonathan)
removing blank line (Dave)
patch 18: Add check for only supporting uncommitted decoders
(Ben, Dave)
Add check for returned value (Dave)
v19 changes:
Removal of cxl_acquire_endpoint and driver callback for unexpected cxl
module removal. Dan's patches made them unnecessary.
patch 4: remove code already moved by Terry's patches (Ben Cheatham)
patch 6: removed unrelated change (Ben Cheatham)
patch 7: fix error report inconsistencies (Jonathan, Dave)
patch 9: remove unnecessary comment (Ben Cheatham)
patch 11: fix __free usage (Jonathan Cameron, Ben Cheatham)
patch 13: style fixes (Jonathan Cameron, Dave Jiang)
patch 14: move code to previous patch (Jonathan Cameron)
patch 18: group code in one locking section (Dave Jiang)
use __free helper (Ben Cheatham)
v18 changes:
patch 1: minor changes and fixing docs generation (Jonathan, Dan)
patch 4: merged with v17 patch 5
patch 5: merging v17 patches 6 and 7
patch 6: adding helpers for clarity
patch 9:
- minor changes (Dave)
- simplifying flags check (Dan)
patch 10: minor changes (Jonathan)
patch 11:
- minor changes (Dave)
- fix mess (Jonathan, Dave)
patch 18: minor changes (Jonathan, Dan)
v17 changes: (Dan Williams review)
- use devm for cxl_dev_state allocation
- using current cxl struct for checking capability registers found by
the driver.
- simplify dpa initialization without a mailbox not supporting pmem
- add cxl_acquire_endpoint for protection during initialization
- add callback/action to cxl_create_region so a driver is notified about
cxl core kernel module removal.
- add sfc function to disable CXL-based PIO buffers if such a callback
is invoked.
- Always manage a Type2 created region as private not allowing DAX.
v16 changes:
- rebase against rc4 (Dave Jiang)
- remove duplicate line (Ben Cheatham)
v15 changes:
- remove reference to unused header file (Jonathan Cameron)
- add proper kernel docs to exported functions (Alison Schofield)
- using an array to map the enums to strings (Alison Schofield)
- clarify comment when using bitmap_subset (Jonathan Cameron)
- specify link to type2 support in all patches (Alison Schofield)
Patches changed (minor): 4, 11
v14 changes:
- static null initialization of bitmaps (Jonathan Cameron)
- Fixing cxl tests (Alison Schofield)
- Fixing robot compilation problems
Patches changed (minor): 1, 4, 6, 13
v13 changes:
- use more consistent names for header checking (Jonathan Cameron)
- using helper for caps bit setting (Jonathan Cameron)
- provide generic function for reporting missing capabilities (Jonathan Cameron)
- rename cxl_pci_setup_memdev_regs to cxl_pci_accel_setup_memdev_regs (Jonathan Cameron)
- cxl_dpa_info size to be set by the Type2 driver (Jonathan Cameron)
- avoiding rc variable when possible (Jonathan Cameron)
- fix spelling (Simon Horman)
- use scoped_guard (Dave Jiang)
- use enum instead of bool (Dave Jiang)
- dropping patch with hardware symbols
v12 changes:
- use new macro cxl_dev_state_create in pci driver (Ben Cheatham)
- add public/private sections in now exported cxl_dev_state struct (Ben
Cheatham)
- fix cxl/pci.h regarding file name for checking if defined
- Clarify capabilities found vs expected in error message. (Ben
Cheatham)
- Clarify new CXL_DECODER_F flag (Ben Cheatham)
- Fix changes about cxl memdev creation support moving code to the
proper patch. (Ben Cheatham)
- Avoid debug and function duplications (Ben Cheatham)
v11 changes:
- Dropping the use of cxl_memdev_state and going back to using
cxl_dev_state.
- Using a helper for an accel driver to allocate its own cxl-related
struct embedding cxl_dev_state.
- Exporting the required structs in include/cxl/cxl.h for an accel
driver being able to know the cxl_dev_state size required in the
previously mentioned helper for allocation.
- Avoid using any struct for dpa initialization by the accel driver
adding a specific function for creating dpa partitions by accel
drivers without a mailbox.
v10 changes:
- Using cxl_memdev_state instead of cxl_dev_state for type2, which has
memory after all, facilitating the setup.
- Adapt core for using cxl_memdev_state allowing accel drivers to work
with them without further awareness of internal cxl structs.
- Using last DPA changes for creating DPA partitions with accel driver
hardcoding mds values when no mailbox.
- capabilities is not a new field but is built up when the current
register mapping is performed, and returned to the caller for checking.
- HPA free space supporting interleaving.
- DPA free space dropping max-min for a simple alloc size.
v9 changes:
- adding forward definitions (Jonathan Cameron)
- using set_bit instead of bitmap_set (Jonathan Cameron)
- fix rebase problem (Jonathan Cameron)
- Improve error path (Jonathan Cameron)
- fix build problems with cxl region dependency (robot)
- fix error path (Simon Horman)
v8 changes:
- Change error path labeling inside sfc cxl code (Edward Cree)
- Properly handling checks and error in sfc cxl code (Simon Horman)
- Fix bug when checking resource_size (Simon Horman)
- Avoid bisect problems reordering patches (Edward Cree)
- Fix buffer allocation size in sfc (Simon Horman)
v7 changes:
- fixing kernel test robot complaints
- fix typo with Type3 mandatory capabilities (Zhi Wang)
- optimize code in cxl_request_resource (Kalesh Anakkur Purayil)
- add sanity check when dealing with resource arithmetic (Fan Ni)
- fix typos and blank lines (Fan Ni)
- keep previous log errors/warnings in sfc driver (Martin Habets)
- add WARN_ON_ONCE if region given is NULL
v6 changes:
- update sfc mcdi_pcol.h with full hardware changes, most not related to
this patchset. This file is created automatically from hardware design
changes and is not touched by software. It is updated from time to
time, and the sfc driver CXL support required an update.
- remove CXL capabilities definitions not used by the patchset or
previous kernel code. (Dave Jiang, Jonathan Cameron)
- Use bitmap_subset instead of reinventing the wheel ... (Ben Cheatham)
- Use cxl_accel_memdev for new device_type created (Ben Cheatham)
- Fix construct_region use of rwsem (Zhi Wang)
- Obtain region range instead of region params (Alison Schofield, Dave
Jiang)
v5 changes:
- Fix SFC configuration based on kernel CXL configuration
- Add subset check for capabilities.
- fix region creation when HDM decoders are programmed by firmware/BIOS
(Ben Cheatham)
- Add option for creating a dax region based on driver decision (Ben
Cheatham)
- Using sfc probe_data struct for keeping sfc cxl data
v4 changes:
- Use bitmap for capabilities new field (Jonathan Cameron)
- Use cxl_mem attributes for sysfs based on device type (Dave Jiang)
- Add conditional cxl sfc compilation relying on kernel CXL config (kernel test robot)
- Add sfc changes in different patches for facilitating backport (Jonathan Cameron)
- Remove patch for dealing with cxl modules dependencies and using sfc kconfig plus
MODULE_SOFTDEP instead.
v3 changes:
- cxl_dev_state not defined as opaque but only manipulated by accel drivers
through accessors.
- accessors names not identified as only for accel drivers.
- move pci code from pci driver (drivers/cxl/pci.c) to generic pci code
(drivers/cxl/core/pci.c).
- capabilities field from u8 to u32 and initialised by CXL regs discovering
code.
- add capabilities check and removing current check by CXL regs discovering
code.
- Not fail if CXL Device Registers not found. Not mandatory for Type2.
- add timeout in acquire_endpoint for solving a race with the endpoint port
creation.
- handle EPROBE_DEFER by sfc driver.
- Limiting interleave ways to 1 for accel driver HPA/DPA requests.
- factoring out interleave ways and granularity helpers from type2 region
creation patch.
- restricting region_creation for type2 to one endpoint decoder.
v2 changes:
I have removed the introduction about the concerns with BIOS/UEFI after
the discussion confirming the need for the implemented functionality, at
least in some scenarios.
There are two main changes from the RFC:
1) Following concerns about drivers using the CXL core without
restrictions, the CXL structs to work with are opaque to those drivers,
so functions are implemented for modifying or reading those structs
indirectly.
2) The driver for using the added functionality is not a test driver but a real
one: the SFC ethernet network driver. It uses the CXL region mapped for PIO
buffers instead of regions inside PCIe BARs.
RFC:
Current CXL kernel code is focused on supporting Type3 CXL devices, aka
memory expanders. Type2 CXL devices, aka device accelerators, share some
functionality but require special handling.
First of all, a Type2 device is by definition tied to a vendor driver
doing something specific, not just exposing a memory expander, so that
driver is expected to handle the CXL specifics. This implies the CXL
setup needs to be done by the vendor driver instead of by a generic CXL
PCI driver as for memory expanders. Most of that setup needs current
CXL core code and therefore has to be accessible to those vendor
drivers. This is accomplished by exporting opaque CXL structs and by
adding and exporting functions for working with those structs
indirectly.
Some of the patches are based on a patchset sent by Dan Williams [1]
which was only partially integrated, mostly the parts making things
ready for Type2 but none adding specific Type2 support. Those patches
based on Dan's work carry Dan's sign-off as co-developer, and a link to
the original patch.
A final note about CXL.cache is needed. This patchset does not cover it
at all, although the emulated Type2 device advertises it. From the
kernel point of view, supporting CXL.cache will imply making sure the
CXL path supports what the Type2 device needs. A device accelerator
will likely be connected to a Root Switch, but other configurations
cannot be ruled out. Therefore the kernel will need to check not just
HPA, DPA, interleave and granularity, but also the available CXL.cache
support and resources in each switch in the CXL path to the Type2
device. I expect to contribute to this support in the following months,
and it would be good to discuss it when possible.
[1] https://lore.kernel.org/linux-cxl/98b1f61a-e6c2-71d4-c368-50d958501b0c@intel.com/T/
Alejandro Lucero (22):
cxl: Add type2 device basic support
sfc: add cxl support
cxl: Move pci generic code
cxl/sfc: Map cxl component regs
cxl/sfc: Initialize dpa without a mailbox
cxl: Prepare memdev creation for type2
sfc: create type2 cxl memdev
cxl/hdm: Add support for getting region from committed decoder
cxl: Add function for obtaining region range
cxl: Export function for unwinding cxl by accelerators
sfc: obtain decoder and region if committed by firmware
cxl: Define a driver interface for HPA free space enumeration
sfc: get root decoder
cxl: Define a driver interface for DPA allocation
sfc: get endpoint decoder
cxl: Make region type based on endpoint type
cxl/region: Factor out interleave ways setup
cxl/region: Factor out interleave granularity setup
cxl: Allow region creation by type2 drivers
cxl: Avoid dax creation for accelerators
sfc: create cxl region
sfc: support pio mapping based on cxl
drivers/cxl/core/core.h | 5 +-
drivers/cxl/core/hdm.c | 123 ++++++++
drivers/cxl/core/mbox.c | 63 +---
drivers/cxl/core/memdev.c | 113 ++++++-
drivers/cxl/core/pci.c | 63 ++++
drivers/cxl/core/port.c | 1 +
drivers/cxl/core/region.c | 434 +++++++++++++++++++++++---
drivers/cxl/core/regs.c | 2 +-
drivers/cxl/cxl.h | 125 +-------
drivers/cxl/cxlmem.h | 92 +-----
drivers/cxl/cxlpci.h | 21 +-
drivers/cxl/mem.c | 45 ++-
drivers/cxl/pci.c | 85 +----
drivers/net/ethernet/sfc/Kconfig | 10 +
drivers/net/ethernet/sfc/Makefile | 1 +
drivers/net/ethernet/sfc/ef10.c | 50 ++-
drivers/net/ethernet/sfc/efx.c | 15 +-
drivers/net/ethernet/sfc/efx_cxl.c | 186 +++++++++++
drivers/net/ethernet/sfc/efx_cxl.h | 41 +++
drivers/net/ethernet/sfc/net_driver.h | 12 +
drivers/net/ethernet/sfc/nic.h | 3 +
include/cxl/cxl.h | 287 +++++++++++++++++
include/cxl/pci.h | 21 ++
tools/testing/cxl/test/mem.c | 3 +-
24 files changed, 1376 insertions(+), 425 deletions(-)
create mode 100644 drivers/net/ethernet/sfc/efx_cxl.c
create mode 100644 drivers/net/ethernet/sfc/efx_cxl.h
create mode 100644 include/cxl/cxl.h
create mode 100644 include/cxl/pci.h
base-commit: 3f7938b1aec7f06d5b23adca83e4542fcf027001
--
2.34.1
* [PATCH v23 01/22] cxl: Add type2 device basic support
2026-02-01 15:54 [PATCH v23 00/22] Type2 device basic support alejandro.lucero-palau
@ 2026-02-01 15:54 ` alejandro.lucero-palau
2026-02-11 22:11 ` Cheatham, Benjamin
2026-02-01 15:54 ` [PATCH v23 02/22] sfc: add cxl support alejandro.lucero-palau
From: alejandro.lucero-palau @ 2026-02-01 15:54 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Jonathan Cameron, Alison Schofield,
Ben Cheatham
From: Alejandro Lucero <alucerop@amd.com>
Differentiate CXL memory expanders (type 3) from CXL device accelerators
(type 2) with a new function for initializing cxl_dev_state and a macro
helping accel drivers embed cxl_dev_state inside a private struct.

Move structs to include/cxl, as an accel driver embedding cxl_dev_state
in its private struct needs to know the size of cxl_dev_state.

Use the same new initialization with the type3 pci driver.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
---
drivers/cxl/core/mbox.c | 12 +-
drivers/cxl/core/memdev.c | 32 +++++
drivers/cxl/cxl.h | 97 +--------------
drivers/cxl/cxlmem.h | 86 +------------
drivers/cxl/pci.c | 14 +--
include/cxl/cxl.h | 226 +++++++++++++++++++++++++++++++++++
tools/testing/cxl/test/mem.c | 3 +-
7 files changed, 274 insertions(+), 196 deletions(-)
create mode 100644 include/cxl/cxl.h
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index fa6dd0c94656..bee84d0101d1 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -1514,23 +1514,21 @@ int cxl_mailbox_init(struct cxl_mailbox *cxl_mbox, struct device *host)
}
EXPORT_SYMBOL_NS_GPL(cxl_mailbox_init, "CXL");
-struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev)
+struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev, u64 serial,
+ u16 dvsec)
{
struct cxl_memdev_state *mds;
int rc;
- mds = devm_kzalloc(dev, sizeof(*mds), GFP_KERNEL);
+ mds = devm_cxl_dev_state_create(dev, CXL_DEVTYPE_CLASSMEM, serial,
+ dvsec, struct cxl_memdev_state, cxlds,
+ true);
if (!mds) {
dev_err(dev, "No memory available\n");
return ERR_PTR(-ENOMEM);
}
mutex_init(&mds->event.log_lock);
- mds->cxlds.dev = dev;
- mds->cxlds.reg_map.host = dev;
- mds->cxlds.cxl_mbox.host = dev;
- mds->cxlds.reg_map.resource = CXL_RESOURCE_NONE;
- mds->cxlds.type = CXL_DEVTYPE_CLASSMEM;
rc = devm_cxl_register_mce_notifier(dev, &mds->mce_notifier);
if (rc == -EOPNOTSUPP)
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index af3d0cc65138..22d156f25305 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -656,6 +656,38 @@ static void detach_memdev(struct work_struct *work)
static struct lock_class_key cxl_memdev_key;
+static void cxl_dev_state_init(struct cxl_dev_state *cxlds, struct device *dev,
+ enum cxl_devtype type, u64 serial, u16 dvsec,
+ bool has_mbox)
+{
+ *cxlds = (struct cxl_dev_state) {
+ .dev = dev,
+ .type = type,
+ .serial = serial,
+ .cxl_dvsec = dvsec,
+ .reg_map.host = dev,
+ .reg_map.resource = CXL_RESOURCE_NONE,
+ };
+
+ if (has_mbox)
+ cxlds->cxl_mbox.host = dev;
+}
+
+struct cxl_dev_state *_devm_cxl_dev_state_create(struct device *dev,
+ enum cxl_devtype type,
+ u64 serial, u16 dvsec,
+ size_t size, bool has_mbox)
+{
+ struct cxl_dev_state *cxlds = devm_kzalloc(dev, size, GFP_KERNEL);
+
+ if (!cxlds)
+ return NULL;
+
+ cxl_dev_state_init(cxlds, dev, type, serial, dvsec, has_mbox);
+ return cxlds;
+}
+EXPORT_SYMBOL_NS_GPL(_devm_cxl_dev_state_create, "CXL");
+
static struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds,
const struct file_operations *fops,
const struct cxl_memdev_attach *attach)
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index e1d47062e1d3..3eaa353e430b 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -12,6 +12,7 @@
#include <linux/node.h>
#include <linux/io.h>
#include <linux/range.h>
+#include <cxl/cxl.h>
extern const struct nvdimm_security_ops *cxl_security_ops;
@@ -201,97 +202,6 @@ static inline int ways_to_eiw(unsigned int ways, u8 *eiw)
#define CXLDEV_MBOX_BG_CMD_COMMAND_VENDOR_MASK GENMASK_ULL(63, 48)
#define CXLDEV_MBOX_PAYLOAD_OFFSET 0x20
-/*
- * Using struct_group() allows for per register-block-type helper routines,
- * without requiring block-type agnostic code to include the prefix.
- */
-struct cxl_regs {
- /*
- * Common set of CXL Component register block base pointers
- * @hdm_decoder: CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure
- * @ras: CXL 2.0 8.2.5.9 CXL RAS Capability Structure
- */
- struct_group_tagged(cxl_component_regs, component,
- void __iomem *hdm_decoder;
- void __iomem *ras;
- );
- /*
- * Common set of CXL Device register block base pointers
- * @status: CXL 2.0 8.2.8.3 Device Status Registers
- * @mbox: CXL 2.0 8.2.8.4 Mailbox Registers
- * @memdev: CXL 2.0 8.2.8.5 Memory Device Registers
- */
- struct_group_tagged(cxl_device_regs, device_regs,
- void __iomem *status, *mbox, *memdev;
- );
-
- struct_group_tagged(cxl_pmu_regs, pmu_regs,
- void __iomem *pmu;
- );
-
- /*
- * RCH downstream port specific RAS register
- * @aer: CXL 3.0 8.2.1.1 RCH Downstream Port RCRB
- */
- struct_group_tagged(cxl_rch_regs, rch_regs,
- void __iomem *dport_aer;
- );
-
- /*
- * RCD upstream port specific PCIe cap register
- * @pcie_cap: CXL 3.0 8.2.1.2 RCD Upstream Port RCRB
- */
- struct_group_tagged(cxl_rcd_regs, rcd_regs,
- void __iomem *rcd_pcie_cap;
- );
-};
-
-struct cxl_reg_map {
- bool valid;
- int id;
- unsigned long offset;
- unsigned long size;
-};
-
-struct cxl_component_reg_map {
- struct cxl_reg_map hdm_decoder;
- struct cxl_reg_map ras;
-};
-
-struct cxl_device_reg_map {
- struct cxl_reg_map status;
- struct cxl_reg_map mbox;
- struct cxl_reg_map memdev;
-};
-
-struct cxl_pmu_reg_map {
- struct cxl_reg_map pmu;
-};
-
-/**
- * struct cxl_register_map - DVSEC harvested register block mapping parameters
- * @host: device for devm operations and logging
- * @base: virtual base of the register-block-BAR + @block_offset
- * @resource: physical resource base of the register block
- * @max_size: maximum mapping size to perform register search
- * @reg_type: see enum cxl_regloc_type
- * @component_map: cxl_reg_map for component registers
- * @device_map: cxl_reg_maps for device registers
- * @pmu_map: cxl_reg_maps for CXL Performance Monitoring Units
- */
-struct cxl_register_map {
- struct device *host;
- void __iomem *base;
- resource_size_t resource;
- resource_size_t max_size;
- u8 reg_type;
- union {
- struct cxl_component_reg_map component_map;
- struct cxl_device_reg_map device_map;
- struct cxl_pmu_reg_map pmu_map;
- };
-};
-
void cxl_probe_component_regs(struct device *dev, void __iomem *base,
struct cxl_component_reg_map *map);
void cxl_probe_device_regs(struct device *dev, void __iomem *base,
@@ -497,11 +407,6 @@ struct cxl_region_params {
resource_size_t cache_size;
};
-enum cxl_partition_mode {
- CXL_PARTMODE_RAM,
- CXL_PARTMODE_PMEM,
-};
-
/*
* Indicate whether this region has been assembled by autodetection or
* userspace assembly. Prevent endpoint decoders outside of automatic
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index ef202b34e5ea..281546de426e 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -113,8 +113,6 @@ int devm_cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
resource_size_t base, resource_size_t len,
resource_size_t skipped);
-#define CXL_NR_PARTITIONS_MAX 2
-
struct cxl_dpa_info {
u64 size;
struct cxl_dpa_part_info {
@@ -373,87 +371,6 @@ struct cxl_security_state {
struct kernfs_node *sanitize_node;
};
-/*
- * enum cxl_devtype - delineate type-2 from a generic type-3 device
- * @CXL_DEVTYPE_DEVMEM - Vendor specific CXL Type-2 device implementing HDM-D or
- * HDM-DB, no requirement that this device implements a
- * mailbox, or other memory-device-standard manageability
- * flows.
- * @CXL_DEVTYPE_CLASSMEM - Common class definition of a CXL Type-3 device with
- * HDM-H and class-mandatory memory device registers
- */
-enum cxl_devtype {
- CXL_DEVTYPE_DEVMEM,
- CXL_DEVTYPE_CLASSMEM,
-};
-
-/**
- * struct cxl_dpa_perf - DPA performance property entry
- * @dpa_range: range for DPA address
- * @coord: QoS performance data (i.e. latency, bandwidth)
- * @cdat_coord: raw QoS performance data from CDAT
- * @qos_class: QoS Class cookies
- */
-struct cxl_dpa_perf {
- struct range dpa_range;
- struct access_coordinate coord[ACCESS_COORDINATE_MAX];
- struct access_coordinate cdat_coord[ACCESS_COORDINATE_MAX];
- int qos_class;
-};
-
-/**
- * struct cxl_dpa_partition - DPA partition descriptor
- * @res: shortcut to the partition in the DPA resource tree (cxlds->dpa_res)
- * @perf: performance attributes of the partition from CDAT
- * @mode: operation mode for the DPA capacity, e.g. ram, pmem, dynamic...
- */
-struct cxl_dpa_partition {
- struct resource res;
- struct cxl_dpa_perf perf;
- enum cxl_partition_mode mode;
-};
-
-/**
- * struct cxl_dev_state - The driver device state
- *
- * cxl_dev_state represents the CXL driver/device state. It provides an
- * interface to mailbox commands as well as some cached data about the device.
- * Currently only memory devices are represented.
- *
- * @dev: The device associated with this CXL state
- * @cxlmd: The device representing the CXL.mem capabilities of @dev
- * @reg_map: component and ras register mapping parameters
- * @regs: Parsed register blocks
- * @cxl_dvsec: Offset to the PCIe device DVSEC
- * @rcd: operating in RCD mode (CXL 3.0 9.11.8 CXL Devices Attached to an RCH)
- * @media_ready: Indicate whether the device media is usable
- * @dpa_res: Overall DPA resource tree for the device
- * @part: DPA partition array
- * @nr_partitions: Number of DPA partitions
- * @serial: PCIe Device Serial Number
- * @type: Generic Memory Class device or Vendor Specific Memory device
- * @cxl_mbox: CXL mailbox context
- * @cxlfs: CXL features context
- */
-struct cxl_dev_state {
- struct device *dev;
- struct cxl_memdev *cxlmd;
- struct cxl_register_map reg_map;
- struct cxl_regs regs;
- int cxl_dvsec;
- bool rcd;
- bool media_ready;
- struct resource dpa_res;
- struct cxl_dpa_partition part[CXL_NR_PARTITIONS_MAX];
- unsigned int nr_partitions;
- u64 serial;
- enum cxl_devtype type;
- struct cxl_mailbox cxl_mbox;
-#ifdef CONFIG_CXL_FEATURES
- struct cxl_features_state *cxlfs;
-#endif
-};
-
static inline resource_size_t cxl_pmem_size(struct cxl_dev_state *cxlds)
{
/*
@@ -858,7 +775,8 @@ int cxl_dev_state_identify(struct cxl_memdev_state *mds);
int cxl_await_media_ready(struct cxl_dev_state *cxlds);
int cxl_enumerate_cmds(struct cxl_memdev_state *mds);
int cxl_mem_dpa_fetch(struct cxl_memdev_state *mds, struct cxl_dpa_info *info);
-struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev);
+struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev, u64 serial,
+ u16 dvsec);
void set_exclusive_cxl_commands(struct cxl_memdev_state *mds,
unsigned long *cmds);
void clear_exclusive_cxl_commands(struct cxl_memdev_state *mds,
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 1cf232220873..24179cc702bf 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -911,25 +911,25 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
int rc, pmu_count;
unsigned int i;
bool irq_avail;
+ u16 dvsec;
rc = pcim_enable_device(pdev);
if (rc)
return rc;
pci_set_master(pdev);
- mds = cxl_memdev_state_create(&pdev->dev);
+ dvsec = pci_find_dvsec_capability(pdev, PCI_VENDOR_ID_CXL,
+ PCI_DVSEC_CXL_DEVICE);
+ if (!dvsec)
+ pci_warn(pdev, "Device DVSEC not present, skip CXL.mem init\n");
+
+ mds = cxl_memdev_state_create(&pdev->dev, pci_get_dsn(pdev), dvsec);
if (IS_ERR(mds))
return PTR_ERR(mds);
cxlds = &mds->cxlds;
pci_set_drvdata(pdev, cxlds);
cxlds->rcd = is_cxl_restricted(pdev);
- cxlds->serial = pci_get_dsn(pdev);
- cxlds->cxl_dvsec = pci_find_dvsec_capability(
- pdev, PCI_VENDOR_ID_CXL, PCI_DVSEC_CXL_DEVICE);
- if (!cxlds->cxl_dvsec)
- dev_warn(&pdev->dev,
- "Device DVSEC not present, skip CXL.mem init\n");
rc = cxl_pci_setup_regs(pdev, CXL_REGLOC_RBI_MEMDEV, &map);
if (rc)
diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
new file mode 100644
index 000000000000..13d448686189
--- /dev/null
+++ b/include/cxl/cxl.h
@@ -0,0 +1,226 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright(c) 2020 Intel Corporation. */
+/* Copyright(c) 2025 Advanced Micro Devices, Inc. */
+
+#ifndef __CXL_CXL_H__
+#define __CXL_CXL_H__
+
+#include <linux/node.h>
+#include <linux/ioport.h>
+#include <cxl/mailbox.h>
+
+/**
+ * enum cxl_devtype - delineate type-2 from a generic type-3 device
+ * @CXL_DEVTYPE_DEVMEM: Vendor specific CXL Type-2 device implementing HDM-D or
+ * HDM-DB, no requirement that this device implements a
+ * mailbox, or other memory-device-standard manageability
+ * flows.
+ * @CXL_DEVTYPE_CLASSMEM: Common class definition of a CXL Type-3 device with
+ * HDM-H and class-mandatory memory device registers
+ */
+enum cxl_devtype {
+ CXL_DEVTYPE_DEVMEM,
+ CXL_DEVTYPE_CLASSMEM,
+};
+
+struct device;
+
+/*
+ * Using struct_group() allows for per register-block-type helper routines,
+ * without requiring block-type agnostic code to include the prefix.
+ */
+struct cxl_regs {
+ /*
+ * Common set of CXL Component register block base pointers
+ * @hdm_decoder: CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure
+ * @ras: CXL 2.0 8.2.5.9 CXL RAS Capability Structure
+ */
+ struct_group_tagged(cxl_component_regs, component,
+ void __iomem *hdm_decoder;
+ void __iomem *ras;
+ );
+ /*
+ * Common set of CXL Device register block base pointers
+ * @status: CXL 2.0 8.2.8.3 Device Status Registers
+ * @mbox: CXL 2.0 8.2.8.4 Mailbox Registers
+ * @memdev: CXL 2.0 8.2.8.5 Memory Device Registers
+ */
+ struct_group_tagged(cxl_device_regs, device_regs,
+ void __iomem *status, *mbox, *memdev;
+ );
+
+ struct_group_tagged(cxl_pmu_regs, pmu_regs,
+ void __iomem *pmu;
+ );
+
+ /*
+ * RCH downstream port specific RAS register
+ * @aer: CXL 3.0 8.2.1.1 RCH Downstream Port RCRB
+ */
+ struct_group_tagged(cxl_rch_regs, rch_regs,
+ void __iomem *dport_aer;
+ );
+
+ /*
+ * RCD upstream port specific PCIe cap register
+ * @pcie_cap: CXL 3.0 8.2.1.2 RCD Upstream Port RCRB
+ */
+ struct_group_tagged(cxl_rcd_regs, rcd_regs,
+ void __iomem *rcd_pcie_cap;
+ );
+};
+
+struct cxl_reg_map {
+ bool valid;
+ int id;
+ unsigned long offset;
+ unsigned long size;
+};
+
+struct cxl_component_reg_map {
+ struct cxl_reg_map hdm_decoder;
+ struct cxl_reg_map ras;
+};
+
+struct cxl_device_reg_map {
+ struct cxl_reg_map status;
+ struct cxl_reg_map mbox;
+ struct cxl_reg_map memdev;
+};
+
+struct cxl_pmu_reg_map {
+ struct cxl_reg_map pmu;
+};
+
+/**
+ * struct cxl_register_map - DVSEC harvested register block mapping parameters
+ * @host: device for devm operations and logging
+ * @base: virtual base of the register-block-BAR + @block_offset
+ * @resource: physical resource base of the register block
+ * @max_size: maximum mapping size to perform register search
+ * @reg_type: see enum cxl_regloc_type
+ * @component_map: cxl_reg_map for component registers
+ * @device_map: cxl_reg_maps for device registers
+ * @pmu_map: cxl_reg_maps for CXL Performance Monitoring Units
+ */
+struct cxl_register_map {
+ struct device *host;
+ void __iomem *base;
+ resource_size_t resource;
+ resource_size_t max_size;
+ u8 reg_type;
+ union {
+ struct cxl_component_reg_map component_map;
+ struct cxl_device_reg_map device_map;
+ struct cxl_pmu_reg_map pmu_map;
+ };
+};
+
+/**
+ * struct cxl_dpa_perf - DPA performance property entry
+ * @dpa_range: range for DPA address
+ * @coord: QoS performance data (i.e. latency, bandwidth)
+ * @cdat_coord: raw QoS performance data from CDAT
+ * @qos_class: QoS Class cookies
+ */
+struct cxl_dpa_perf {
+ struct range dpa_range;
+ struct access_coordinate coord[ACCESS_COORDINATE_MAX];
+ struct access_coordinate cdat_coord[ACCESS_COORDINATE_MAX];
+ int qos_class;
+};
+
+enum cxl_partition_mode {
+ CXL_PARTMODE_RAM,
+ CXL_PARTMODE_PMEM,
+};
+
+/**
+ * struct cxl_dpa_partition - DPA partition descriptor
+ * @res: shortcut to the partition in the DPA resource tree (cxlds->dpa_res)
+ * @perf: performance attributes of the partition from CDAT
+ * @mode: operation mode for the DPA capacity, e.g. ram, pmem, dynamic...
+ */
+struct cxl_dpa_partition {
+ struct resource res;
+ struct cxl_dpa_perf perf;
+ enum cxl_partition_mode mode;
+};
+
+#define CXL_NR_PARTITIONS_MAX 2
+
+/**
+ * struct cxl_dev_state - The driver device state
+ *
+ * cxl_dev_state represents the CXL driver/device state. It provides an
+ * interface to mailbox commands as well as some cached data about the device.
+ * Currently only memory devices are represented.
+ *
+ * @dev: The device associated with this CXL state
+ * @cxlmd: The device representing the CXL.mem capabilities of @dev
+ * @reg_map: component and ras register mapping parameters
+ * @regs: Parsed register blocks
+ * @cxl_dvsec: Offset to the PCIe device DVSEC
+ * @rcd: operating in RCD mode (CXL 3.0 9.11.8 CXL Devices Attached to an RCH)
+ * @media_ready: Indicate whether the device media is usable
+ * @dpa_res: Overall DPA resource tree for the device
+ * @part: DPA partition array
+ * @nr_partitions: Number of DPA partitions
+ * @serial: PCIe Device Serial Number
+ * @type: Generic Memory Class device or Vendor Specific Memory device
+ * @cxl_mbox: CXL mailbox context
+ * @cxlfs: CXL features context
+ */
+struct cxl_dev_state {
+ /* public for Type2 drivers */
+ struct device *dev;
+ struct cxl_memdev *cxlmd;
+
+ /* private for Type2 drivers */
+ struct cxl_register_map reg_map;
+ struct cxl_regs regs;
+ int cxl_dvsec;
+ bool rcd;
+ bool media_ready;
+ struct resource dpa_res;
+ struct cxl_dpa_partition part[CXL_NR_PARTITIONS_MAX];
+ unsigned int nr_partitions;
+ u64 serial;
+ enum cxl_devtype type;
+ struct cxl_mailbox cxl_mbox;
+#ifdef CONFIG_CXL_FEATURES
+ struct cxl_features_state *cxlfs;
+#endif
+};
+
+struct cxl_dev_state *_devm_cxl_dev_state_create(struct device *dev,
+ enum cxl_devtype type,
+ u64 serial, u16 dvsec,
+ size_t size, bool has_mbox);
+
+/**
+ * devm_cxl_dev_state_create - safely create and cast a cxl_dev_state
+ * embedded in a driver-specific struct.
+ *
+ * @parent: device behind the request
+ * @type: CXL device type
+ * @serial: device identification
+ * @dvsec: dvsec capability offset
+ * @drv_struct: driver struct embedding a cxl_dev_state struct
+ * @member: drv_struct member as cxl_dev_state
+ * @mbox: true if mailbox supported
+ *
+ * Returns a pointer to the newly allocated drv_struct with its embedded
+ * cxl_dev_state initialized.
+ *
+ * Introduced for Type2 driver support.
+ */
+#define devm_cxl_dev_state_create(parent, type, serial, dvsec, drv_struct, member, mbox) \
+ ({ \
+ static_assert(__same_type(struct cxl_dev_state, \
+ ((drv_struct *)NULL)->member)); \
+ static_assert(offsetof(drv_struct, member) == 0); \
+ (drv_struct *)_devm_cxl_dev_state_create(parent, type, serial, dvsec, \
+ sizeof(drv_struct), mbox); \
+ })
+#endif /* __CXL_CXL_H__ */
diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index cb87e8c0e63c..79f42f4474d4 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -1716,7 +1716,7 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
if (rc)
return rc;
- mds = cxl_memdev_state_create(dev);
+ mds = cxl_memdev_state_create(dev, pdev->id + 1, 0);
if (IS_ERR(mds))
return PTR_ERR(mds);
@@ -1732,7 +1732,6 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
mds->event.buf = (struct cxl_get_event_payload *) mdata->event_buf;
INIT_DELAYED_WORK(&mds->security.poll_dwork, cxl_mockmem_sanitize_work);
- cxlds->serial = pdev->id + 1;
if (is_rcd(pdev))
cxlds->rcd = true;
--
2.34.1
^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH v23 02/22] sfc: add cxl support
2026-02-01 15:54 [PATCH v23 00/22] Type2 device basic support alejandro.lucero-palau
2026-02-01 15:54 ` [PATCH v23 01/22] cxl: Add type2 " alejandro.lucero-palau
@ 2026-02-01 15:54 ` alejandro.lucero-palau
2026-02-01 15:54 ` [PATCH v23 03/22] cxl: Move pci generic code alejandro.lucero-palau
` (21 subsequent siblings)
23 siblings, 0 replies; 67+ messages in thread
From: alejandro.lucero-palau @ 2026-02-01 15:54 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Jonathan Cameron, Edward Cree, Alison Schofield
From: Alejandro Lucero <alucerop@amd.com>
Add CXL initialization based on the new CXL API for accel drivers and
make it dependent on the kernel CXL configuration.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
---
drivers/net/ethernet/sfc/Kconfig | 9 +++++
drivers/net/ethernet/sfc/Makefile | 1 +
drivers/net/ethernet/sfc/efx.c | 15 ++++++-
drivers/net/ethernet/sfc/efx_cxl.c | 56 +++++++++++++++++++++++++++
drivers/net/ethernet/sfc/efx_cxl.h | 40 +++++++++++++++++++
drivers/net/ethernet/sfc/net_driver.h | 10 +++++
6 files changed, 130 insertions(+), 1 deletion(-)
create mode 100644 drivers/net/ethernet/sfc/efx_cxl.c
create mode 100644 drivers/net/ethernet/sfc/efx_cxl.h
diff --git a/drivers/net/ethernet/sfc/Kconfig b/drivers/net/ethernet/sfc/Kconfig
index c4c43434f314..979f2801e2a8 100644
--- a/drivers/net/ethernet/sfc/Kconfig
+++ b/drivers/net/ethernet/sfc/Kconfig
@@ -66,6 +66,15 @@ config SFC_MCDI_LOGGING
Driver-Interface) commands and responses, allowing debugging of
driver/firmware interaction. The tracing is actually enabled by
a sysfs file 'mcdi_logging' under the PCI device.
+config SFC_CXL
+ bool "Solarflare SFC9100-family CXL support"
+ depends on SFC && CXL_BUS >= SFC
+ default SFC
+ help
+ This enables SFC CXL support when the kernel CXL configuration
+ allows CTPIO to use CXL.mem. An SFC device with CXL support and
+ CXL-aware firmware can then minimize latency when sending through
+ CTPIO.
source "drivers/net/ethernet/sfc/falcon/Kconfig"
source "drivers/net/ethernet/sfc/siena/Kconfig"
diff --git a/drivers/net/ethernet/sfc/Makefile b/drivers/net/ethernet/sfc/Makefile
index d99039ec468d..bb0f1891cde6 100644
--- a/drivers/net/ethernet/sfc/Makefile
+++ b/drivers/net/ethernet/sfc/Makefile
@@ -13,6 +13,7 @@ sfc-$(CONFIG_SFC_SRIOV) += sriov.o ef10_sriov.o ef100_sriov.o ef100_rep.o \
mae.o tc.o tc_bindings.o tc_counters.o \
tc_encap_actions.o tc_conntrack.o
+sfc-$(CONFIG_SFC_CXL) += efx_cxl.o
obj-$(CONFIG_SFC) += sfc.o
obj-$(CONFIG_SFC_FALCON) += falcon/
diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c
index 112e55b98ed3..537668278375 100644
--- a/drivers/net/ethernet/sfc/efx.c
+++ b/drivers/net/ethernet/sfc/efx.c
@@ -34,6 +34,7 @@
#include "selftest.h"
#include "sriov.h"
#include "efx_devlink.h"
+#include "efx_cxl.h"
#include "mcdi_port_common.h"
#include "mcdi_pcol.h"
@@ -981,12 +982,15 @@ static void efx_pci_remove(struct pci_dev *pci_dev)
efx_pci_remove_main(efx);
efx_fini_io(efx);
+
+ probe_data = container_of(efx, struct efx_probe_data, efx);
+ efx_cxl_exit(probe_data);
+
pci_dbg(efx->pci_dev, "shutdown successful\n");
efx_fini_devlink_and_unlock(efx);
efx_fini_struct(efx);
free_netdev(efx->net_dev);
- probe_data = container_of(efx, struct efx_probe_data, efx);
kfree(probe_data);
};
@@ -1190,6 +1194,15 @@ static int efx_pci_probe(struct pci_dev *pci_dev,
if (rc)
goto fail2;
+ /* A successful cxl initialization implies a CXL region created to be
+ * used for PIO buffers. If there is no CXL support, or initialization
+ * fails, cxl_pio_initialised will be false and legacy PIO buffers
+ * defined at specific PCI BAR regions will be used.
+ */
+ rc = efx_cxl_init(probe_data);
+ if (rc)
+ pci_err(pci_dev, "CXL initialization failed with error %d\n", rc);
+
rc = efx_pci_probe_post_io(efx);
if (rc) {
/* On failure, retry once immediately.
diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
new file mode 100644
index 000000000000..8e0481d8dced
--- /dev/null
+++ b/drivers/net/ethernet/sfc/efx_cxl.c
@@ -0,0 +1,56 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/****************************************************************************
+ *
+ * Driver for AMD network controllers and boards
+ * Copyright (C) 2025, Advanced Micro Devices, Inc.
+ */
+
+#include <linux/pci.h>
+
+#include "net_driver.h"
+#include "efx_cxl.h"
+
+#define EFX_CTPIO_BUFFER_SIZE SZ_256M
+
+int efx_cxl_init(struct efx_probe_data *probe_data)
+{
+ struct efx_nic *efx = &probe_data->efx;
+ struct pci_dev *pci_dev = efx->pci_dev;
+ struct efx_cxl *cxl;
+ u16 dvsec;
+
+ probe_data->cxl_pio_initialised = false;
+
+ /* Is the device configured with and using CXL? */
+ if (!pcie_is_cxl(pci_dev))
+ return 0;
+
+ dvsec = pci_find_dvsec_capability(pci_dev, PCI_VENDOR_ID_CXL,
+ PCI_DVSEC_CXL_DEVICE);
+ if (!dvsec) {
+ pci_err(pci_dev, "CXL_DVSEC_PCIE_DEVICE capability not found\n");
+ return 0;
+ }
+
+ pci_dbg(pci_dev, "CXL_DVSEC_PCIE_DEVICE capability found\n");
+
+ /* Create a cxl_dev_state embedded in the efx_cxl struct using the CXL
+ * core API, specifying that no mailbox is available.
+ */
+ cxl = devm_cxl_dev_state_create(&pci_dev->dev, CXL_DEVTYPE_DEVMEM,
+ pci_dev->dev.id, dvsec, struct efx_cxl,
+ cxlds, false);
+
+ if (!cxl)
+ return -ENOMEM;
+
+ probe_data->cxl = cxl;
+
+ return 0;
+}
+
+void efx_cxl_exit(struct efx_probe_data *probe_data)
+{
+}
+
+MODULE_IMPORT_NS("CXL");
diff --git a/drivers/net/ethernet/sfc/efx_cxl.h b/drivers/net/ethernet/sfc/efx_cxl.h
new file mode 100644
index 000000000000..961639cef692
--- /dev/null
+++ b/drivers/net/ethernet/sfc/efx_cxl.h
@@ -0,0 +1,40 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/****************************************************************************
+ * Driver for AMD network controllers and boards
+ * Copyright (C) 2025, Advanced Micro Devices, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation, incorporated herein by reference.
+ */
+
+#ifndef EFX_CXL_H
+#define EFX_CXL_H
+
+#ifdef CONFIG_SFC_CXL
+
+#include <cxl/cxl.h>
+
+struct cxl_root_decoder;
+struct cxl_port;
+struct cxl_endpoint_decoder;
+struct cxl_region;
+struct efx_probe_data;
+
+struct efx_cxl {
+ struct cxl_dev_state cxlds;
+ struct cxl_memdev *cxlmd;
+ struct cxl_root_decoder *cxlrd;
+ struct cxl_port *endpoint;
+ struct cxl_endpoint_decoder *cxled;
+ struct cxl_region *efx_region;
+ void __iomem *ctpio_cxl;
+};
+
+int efx_cxl_init(struct efx_probe_data *probe_data);
+void efx_cxl_exit(struct efx_probe_data *probe_data);
+#else
+static inline int efx_cxl_init(struct efx_probe_data *probe_data) { return 0; }
+static inline void efx_cxl_exit(struct efx_probe_data *probe_data) {}
+#endif
+#endif
diff --git a/drivers/net/ethernet/sfc/net_driver.h b/drivers/net/ethernet/sfc/net_driver.h
index b98c259f672d..3964b2c56609 100644
--- a/drivers/net/ethernet/sfc/net_driver.h
+++ b/drivers/net/ethernet/sfc/net_driver.h
@@ -1197,14 +1197,24 @@ struct efx_nic {
atomic_t n_rx_noskb_drops;
};
+#ifdef CONFIG_SFC_CXL
+struct efx_cxl;
+#endif
+
/**
* struct efx_probe_data - State after hardware probe
* @pci_dev: The PCI device
* @efx: Efx NIC details
+ * @cxl: details of related cxl objects
+ * @cxl_pio_initialised: cxl initialization outcome.
*/
struct efx_probe_data {
struct pci_dev *pci_dev;
struct efx_nic efx;
+#ifdef CONFIG_SFC_CXL
+ struct efx_cxl *cxl;
+ bool cxl_pio_initialised;
+#endif
};
static inline struct efx_nic *efx_netdev_priv(struct net_device *dev)
--
2.34.1
* [PATCH v23 03/22] cxl: Move pci generic code
2026-02-01 15:54 [PATCH v23 00/22] Type2 device basic support alejandro.lucero-palau
2026-02-01 15:54 ` [PATCH v23 01/22] cxl: Add type2 " alejandro.lucero-palau
2026-02-01 15:54 ` [PATCH v23 02/22] sfc: add cxl support alejandro.lucero-palau
@ 2026-02-01 15:54 ` alejandro.lucero-palau
2026-02-01 15:54 ` [PATCH v23 04/22] cxl/sfc: Map cxl component regs alejandro.lucero-palau
` (20 subsequent siblings)
23 siblings, 0 replies; 67+ messages in thread
From: alejandro.lucero-palau @ 2026-02-01 15:54 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Ben Cheatham, Fan Ni, Jonathan Cameron,
Alison Schofield
From: Alejandro Lucero <alucerop@amd.com>
Inside cxl/core/pci.c there are helpers for CXL PCIe initialization,
while cxl/pci.c implements the functionality for Type3 device
initialization.
Move the helper functions from cxl/pci.c to cxl/core/pci.c so they can
be exported and shared with CXL Type2 device initialization.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
---
drivers/cxl/core/core.h | 3 +-
drivers/cxl/core/pci.c | 62 ++++++++++++++++++++++++++++++++++++
drivers/cxl/core/regs.c | 1 -
drivers/cxl/cxl.h | 2 --
drivers/cxl/cxlpci.h | 13 ++++++++
drivers/cxl/pci.c | 70 -----------------------------------------
6 files changed, 77 insertions(+), 74 deletions(-)
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index 422531799af2..256799d39361 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -187,5 +187,6 @@ int cxl_set_feature(struct cxl_mailbox *cxl_mbox, const uuid_t *feat_uuid,
size_t feat_data_size, u32 feat_flag, u16 offset,
u16 *return_code);
#endif
-
+resource_size_t cxl_rcd_component_reg_phys(struct device *dev,
+ struct cxl_dport *dport);
#endif /* __CXL_CORE_H__ */
diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index b838c59d7a3c..6b7e50858d56 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -696,6 +696,68 @@ bool cxl_endpoint_decoder_reset_detected(struct cxl_port *port)
}
EXPORT_SYMBOL_NS_GPL(cxl_endpoint_decoder_reset_detected, "CXL");
+static int cxl_rcrb_get_comp_regs(struct pci_dev *pdev,
+ struct cxl_register_map *map,
+ struct cxl_dport *dport)
+{
+ resource_size_t component_reg_phys;
+
+ *map = (struct cxl_register_map) {
+ .host = &pdev->dev,
+ .resource = CXL_RESOURCE_NONE,
+ };
+
+ struct cxl_port *port __free(put_cxl_port) =
+ cxl_pci_find_port(pdev, &dport);
+ if (!port)
+ return -EPROBE_DEFER;
+
+ component_reg_phys = cxl_rcd_component_reg_phys(&pdev->dev, dport);
+ if (component_reg_phys == CXL_RESOURCE_NONE)
+ return -ENXIO;
+
+ map->resource = component_reg_phys;
+ map->reg_type = CXL_REGLOC_RBI_COMPONENT;
+ map->max_size = CXL_COMPONENT_REG_BLOCK_SIZE;
+
+ return 0;
+}
+
+int cxl_pci_setup_regs(struct pci_dev *pdev, enum cxl_regloc_type type,
+ struct cxl_register_map *map)
+{
+ int rc;
+
+ rc = cxl_find_regblock(pdev, type, map);
+
+ /*
+ * If the Register Locator DVSEC does not exist, check if it
+ * is an RCH and try to extract the Component Registers from
+ * an RCRB.
+ */
+ if (rc && type == CXL_REGLOC_RBI_COMPONENT && is_cxl_restricted(pdev)) {
+ struct cxl_dport *dport;
+ struct cxl_port *port __free(put_cxl_port) =
+ cxl_pci_find_port(pdev, &dport);
+ if (!port)
+ return -EPROBE_DEFER;
+
+ rc = cxl_rcrb_get_comp_regs(pdev, map, dport);
+ if (rc)
+ return rc;
+
+ rc = cxl_dport_map_rcd_linkcap(pdev, dport);
+ if (rc)
+ return rc;
+
+ } else if (rc) {
+ return rc;
+ }
+
+ return cxl_setup_regs(map);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_pci_setup_regs, "CXL");
+
int cxl_pci_get_bandwidth(struct pci_dev *pdev, struct access_coordinate *c)
{
int speed, bw;
diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c
index a010b3214342..93710cf4f0a6 100644
--- a/drivers/cxl/core/regs.c
+++ b/drivers/cxl/core/regs.c
@@ -641,4 +641,3 @@ resource_size_t cxl_rcd_component_reg_phys(struct device *dev,
return CXL_RESOURCE_NONE;
return __rcrb_to_component(dev, &dport->rcrb, CXL_RCRB_UPSTREAM);
}
-EXPORT_SYMBOL_NS_GPL(cxl_rcd_component_reg_phys, "CXL");
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 3eaa353e430b..5d111980d879 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -222,8 +222,6 @@ int cxl_find_regblock(struct pci_dev *pdev, enum cxl_regloc_type type,
struct cxl_register_map *map);
int cxl_setup_regs(struct cxl_register_map *map);
struct cxl_dport;
-resource_size_t cxl_rcd_component_reg_phys(struct device *dev,
- struct cxl_dport *dport);
int cxl_dport_map_rcd_linkcap(struct pci_dev *pdev, struct cxl_dport *dport);
#define CXL_RESOURCE_NONE ((resource_size_t) -1)
diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
index 6f9c78886fd9..d879120b2780 100644
--- a/drivers/cxl/cxlpci.h
+++ b/drivers/cxl/cxlpci.h
@@ -74,6 +74,17 @@ static inline bool cxl_pci_flit_256(struct pci_dev *pdev)
return lnksta2 & PCI_EXP_LNKSTA2_FLIT;
}
+/*
+ * Assume the caller has already validated that @pdev has CXL capabilities.
+ * Any RCiEP with CXL capabilities is treated as a Restricted CXL Device
+ * (RCD), which finds its upstream port and endpoint registers in a Root
+ * Complex Register Block (RCRB).
+ */
+static inline bool is_cxl_restricted(struct pci_dev *pdev)
+{
+ return pci_pcie_type(pdev) == PCI_EXP_TYPE_RC_END;
+}
+
struct cxl_dev_state;
void read_cdat_data(struct cxl_port *port);
@@ -95,4 +106,6 @@ static inline void cxl_dport_init_ras_reporting(struct cxl_dport *dport,
struct device *host) { }
#endif
+int cxl_pci_setup_regs(struct pci_dev *pdev, enum cxl_regloc_type type,
+ struct cxl_register_map *map);
#endif /* __CXL_PCI_H__ */
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 24179cc702bf..668d44eb1bf5 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -465,76 +465,6 @@ static int cxl_pci_setup_mailbox(struct cxl_memdev_state *mds, bool irq_avail)
return 0;
}
-/*
- * Assume that any RCIEP that emits the CXL memory expander class code
- * is an RCD
- */
-static bool is_cxl_restricted(struct pci_dev *pdev)
-{
- return pci_pcie_type(pdev) == PCI_EXP_TYPE_RC_END;
-}
-
-static int cxl_rcrb_get_comp_regs(struct pci_dev *pdev,
- struct cxl_register_map *map,
- struct cxl_dport *dport)
-{
- resource_size_t component_reg_phys;
-
- *map = (struct cxl_register_map) {
- .host = &pdev->dev,
- .resource = CXL_RESOURCE_NONE,
- };
-
- struct cxl_port *port __free(put_cxl_port) =
- cxl_pci_find_port(pdev, &dport);
- if (!port)
- return -EPROBE_DEFER;
-
- component_reg_phys = cxl_rcd_component_reg_phys(&pdev->dev, dport);
- if (component_reg_phys == CXL_RESOURCE_NONE)
- return -ENXIO;
-
- map->resource = component_reg_phys;
- map->reg_type = CXL_REGLOC_RBI_COMPONENT;
- map->max_size = CXL_COMPONENT_REG_BLOCK_SIZE;
-
- return 0;
-}
-
-static int cxl_pci_setup_regs(struct pci_dev *pdev, enum cxl_regloc_type type,
- struct cxl_register_map *map)
-{
- int rc;
-
- rc = cxl_find_regblock(pdev, type, map);
-
- /*
- * If the Register Locator DVSEC does not exist, check if it
- * is an RCH and try to extract the Component Registers from
- * an RCRB.
- */
- if (rc && type == CXL_REGLOC_RBI_COMPONENT && is_cxl_restricted(pdev)) {
- struct cxl_dport *dport;
- struct cxl_port *port __free(put_cxl_port) =
- cxl_pci_find_port(pdev, &dport);
- if (!port)
- return -EPROBE_DEFER;
-
- rc = cxl_rcrb_get_comp_regs(pdev, map, dport);
- if (rc)
- return rc;
-
- rc = cxl_dport_map_rcd_linkcap(pdev, dport);
- if (rc)
- return rc;
-
- } else if (rc) {
- return rc;
- }
-
- return cxl_setup_regs(map);
-}
-
static int cxl_pci_ras_unmask(struct pci_dev *pdev)
{
struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
--
2.34.1
* [PATCH v23 04/22] cxl/sfc: Map cxl component regs
2026-02-01 15:54 [PATCH v23 00/22] Type2 device basic support alejandro.lucero-palau
` (2 preceding siblings ...)
2026-02-01 15:54 ` [PATCH v23 03/22] cxl: Move pci generic code alejandro.lucero-palau
@ 2026-02-01 15:54 ` alejandro.lucero-palau
2026-03-20 17:22 ` Edward Cree
2026-02-01 15:54 ` [PATCH v23 05/22] cxl/sfc: Initialize dpa without a mailbox alejandro.lucero-palau
` (19 subsequent siblings)
23 siblings, 1 reply; 67+ messages in thread
From: alejandro.lucero-palau @ 2026-02-01 15:54 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Jonathan Cameron, Ben Cheatham
From: Alejandro Lucero <alucerop@amd.com>
Export CXL core functions so a Type2 driver can discover and map the
device component registers.
Use them in the sfc driver CXL initialization.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
---
drivers/cxl/core/pci.c | 1 +
drivers/cxl/core/port.c | 1 +
drivers/cxl/core/regs.c | 1 +
drivers/cxl/cxl.h | 7 ------
drivers/cxl/cxlpci.h | 12 ----------
drivers/cxl/pci.c | 1 +
drivers/net/ethernet/sfc/efx_cxl.c | 35 ++++++++++++++++++++++++++++++
include/cxl/cxl.h | 19 ++++++++++++++++
include/cxl/pci.h | 21 ++++++++++++++++++
9 files changed, 79 insertions(+), 19 deletions(-)
create mode 100644 include/cxl/pci.h
diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index 6b7e50858d56..ba2d393c540a 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -6,6 +6,7 @@
#include <linux/delay.h>
#include <linux/pci.h>
#include <linux/pci-doe.h>
+#include <cxl/pci.h>
#include <linux/aer.h>
#include <cxlpci.h>
#include <cxlmem.h>
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 54f72452fb06..385588b8b30b 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -11,6 +11,7 @@
#include <linux/idr.h>
#include <linux/node.h>
#include <cxl/einj.h>
+#include <cxl/pci.h>
#include <cxlmem.h>
#include <cxlpci.h>
#include <cxl.h>
diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c
index 93710cf4f0a6..20c2d9fbcfe7 100644
--- a/drivers/cxl/core/regs.c
+++ b/drivers/cxl/core/regs.c
@@ -4,6 +4,7 @@
#include <linux/device.h>
#include <linux/slab.h>
#include <linux/pci.h>
+#include <cxl/pci.h>
#include <cxlmem.h>
#include <cxlpci.h>
#include <pmu.h>
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 5d111980d879..944c5d1ccceb 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -39,10 +39,6 @@ extern const struct nvdimm_security_ops *cxl_security_ops;
#define CXL_CM_CAP_HDR_ARRAY_SIZE_MASK GENMASK(31, 24)
#define CXL_CM_CAP_PTR_MASK GENMASK(31, 20)
-#define CXL_CM_CAP_CAP_ID_RAS 0x2
-#define CXL_CM_CAP_CAP_ID_HDM 0x5
-#define CXL_CM_CAP_CAP_HDM_VERSION 1
-
/* HDM decoders CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure */
#define CXL_HDM_DECODER_CAP_OFFSET 0x0
#define CXL_HDM_DECODER_COUNT_MASK GENMASK(3, 0)
@@ -206,9 +202,6 @@ void cxl_probe_component_regs(struct device *dev, void __iomem *base,
struct cxl_component_reg_map *map);
void cxl_probe_device_regs(struct device *dev, void __iomem *base,
struct cxl_device_reg_map *map);
-int cxl_map_component_regs(const struct cxl_register_map *map,
- struct cxl_component_regs *regs,
- unsigned long map_mask);
int cxl_map_device_regs(const struct cxl_register_map *map,
struct cxl_device_regs *regs);
int cxl_map_pmu_regs(struct cxl_register_map *map, struct cxl_pmu_regs *regs);
diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
index d879120b2780..93df1b1fa326 100644
--- a/drivers/cxl/cxlpci.h
+++ b/drivers/cxl/cxlpci.h
@@ -13,16 +13,6 @@
*/
#define CXL_PCI_DEFAULT_MAX_VECTORS 16
-/* Register Block Identifier (RBI) */
-enum cxl_regloc_type {
- CXL_REGLOC_RBI_EMPTY = 0,
- CXL_REGLOC_RBI_COMPONENT,
- CXL_REGLOC_RBI_VIRT,
- CXL_REGLOC_RBI_MEMDEV,
- CXL_REGLOC_RBI_PMU,
- CXL_REGLOC_RBI_TYPES
-};
-
/*
* Table Access DOE, CDAT Read Entry Response
*
@@ -106,6 +96,4 @@ static inline void cxl_dport_init_ras_reporting(struct cxl_dport *dport,
struct device *host) { }
#endif
-int cxl_pci_setup_regs(struct pci_dev *pdev, enum cxl_regloc_type type,
- struct cxl_register_map *map);
#endif /* __CXL_PCI_H__ */
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 668d44eb1bf5..7b4699fb8870 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -11,6 +11,7 @@
#include <linux/pci.h>
#include <linux/aer.h>
#include <linux/io.h>
+#include <cxl/pci.h>
#include <cxl/mailbox.h>
#include "cxlmem.h"
#include "cxlpci.h"
diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
index 8e0481d8dced..34126bc4826c 100644
--- a/drivers/net/ethernet/sfc/efx_cxl.c
+++ b/drivers/net/ethernet/sfc/efx_cxl.c
@@ -7,6 +7,8 @@
#include <linux/pci.h>
+#include <cxl/cxl.h>
+#include <cxl/pci.h>
#include "net_driver.h"
#include "efx_cxl.h"
@@ -18,6 +20,7 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
struct pci_dev *pci_dev = efx->pci_dev;
struct efx_cxl *cxl;
u16 dvsec;
+ int rc;
probe_data->cxl_pio_initialised = false;
@@ -44,6 +47,38 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
if (!cxl)
return -ENOMEM;
+ rc = cxl_pci_setup_regs(pci_dev, CXL_REGLOC_RBI_COMPONENT,
+ &cxl->cxlds.reg_map);
+ if (rc) {
+ pci_err(pci_dev, "No component registers\n");
+ return rc;
+ }
+
+ if (!cxl->cxlds.reg_map.component_map.hdm_decoder.valid) {
+ pci_err(pci_dev, "Expected HDM component register not found\n");
+ return -ENODEV;
+ }
+
+ if (!cxl->cxlds.reg_map.component_map.ras.valid) {
+ pci_err(pci_dev, "Expected RAS component register not found\n");
+ return -ENODEV;
+ }
+
+ rc = cxl_map_component_regs(&cxl->cxlds.reg_map,
+ &cxl->cxlds.regs.component,
+ BIT(CXL_CM_CAP_CAP_ID_RAS));
+ if (rc) {
+ pci_err(pci_dev, "Failed to map RAS capability.\n");
+ return rc;
+ }
+
+ /*
+ * Set media ready explicitly: there is neither a mailbox to check
+ * this state nor the CXL register involved, as both are optional for
+ * Type2.
+ */
+ cxl->cxlds.media_ready = true;
+
probe_data->cxl = cxl;
return 0;
diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
index 13d448686189..7f2e23bce1f7 100644
--- a/include/cxl/cxl.h
+++ b/include/cxl/cxl.h
@@ -70,6 +70,10 @@ struct cxl_regs {
);
};
+#define CXL_CM_CAP_CAP_ID_RAS 0x2
+#define CXL_CM_CAP_CAP_ID_HDM 0x5
+#define CXL_CM_CAP_CAP_HDM_VERSION 1
+
struct cxl_reg_map {
bool valid;
int id;
@@ -223,4 +227,19 @@ struct cxl_dev_state *_devm_cxl_dev_state_create(struct device *dev,
(drv_struct *)_devm_cxl_dev_state_create(parent, type, serial, dvsec, \
sizeof(drv_struct), mbox); \
})
+
+/**
+ * cxl_map_component_regs - map cxl component registers
+ *
+ * @map: cxl register map to update with the mappings
+ * @regs: cxl component registers to work with
+ * @map_mask: cxl component regs to map
+ *
+ * Returns: 0 on success or -ENOMEM on error.
+ *
+ * Made public for Type2 driver support.
+ */
+int cxl_map_component_regs(const struct cxl_register_map *map,
+ struct cxl_component_regs *regs,
+ unsigned long map_mask);
#endif /* __CXL_CXL_H__ */
diff --git a/include/cxl/pci.h b/include/cxl/pci.h
new file mode 100644
index 000000000000..a172439f08c6
--- /dev/null
+++ b/include/cxl/pci.h
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright(c) 2020 Intel Corporation. All rights reserved. */
+
+#ifndef __CXL_CXL_PCI_H__
+#define __CXL_CXL_PCI_H__
+
+/* Register Block Identifier (RBI) */
+enum cxl_regloc_type {
+ CXL_REGLOC_RBI_EMPTY = 0,
+ CXL_REGLOC_RBI_COMPONENT,
+ CXL_REGLOC_RBI_VIRT,
+ CXL_REGLOC_RBI_MEMDEV,
+ CXL_REGLOC_RBI_PMU,
+ CXL_REGLOC_RBI_TYPES
+};
+
+struct cxl_register_map;
+
+int cxl_pci_setup_regs(struct pci_dev *pdev, enum cxl_regloc_type type,
+ struct cxl_register_map *map);
+#endif
--
2.34.1
* [PATCH v23 05/22] cxl/sfc: Initialize dpa without a mailbox
2026-02-01 15:54 [PATCH v23 00/22] Type2 device basic support alejandro.lucero-palau
` (3 preceding siblings ...)
2026-02-01 15:54 ` [PATCH v23 04/22] cxl/sfc: Map cxl component regs alejandro.lucero-palau
@ 2026-02-01 15:54 ` alejandro.lucero-palau
2026-03-20 17:24 ` Edward Cree
2026-02-01 15:54 ` [PATCH v23 06/22] cxl: Prepare memdev creation for type2 alejandro.lucero-palau
` (18 subsequent siblings)
23 siblings, 1 reply; 67+ messages in thread
From: alejandro.lucero-palau @ 2026-02-01 15:54 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Ben Cheatham, Jonathan Cameron
From: Alejandro Lucero <alucerop@amd.com>
Type3 relies on the mailbox CXL_MBOX_OP_IDENTIFY command for
initializing memdev state params, which end up being used for DPA
initialization.
Allow a Type2 driver to initialize DPA simply by giving the size of its
volatile hardware partition.
Move related functions to memdev.
Add sfc driver as the client.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
drivers/cxl/core/core.h | 2 +
drivers/cxl/core/mbox.c | 51 +----------------------
drivers/cxl/core/memdev.c | 66 ++++++++++++++++++++++++++++++
drivers/net/ethernet/sfc/efx_cxl.c | 5 +++
include/cxl/cxl.h | 1 +
5 files changed, 75 insertions(+), 50 deletions(-)
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index 256799d39361..e3c85ceda248 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -89,6 +89,8 @@ void __iomem *devm_cxl_iomap_block(struct device *dev, resource_size_t addr,
struct dentry *cxl_debugfs_create_dir(const char *dir);
int cxl_dpa_set_part(struct cxl_endpoint_decoder *cxled,
enum cxl_partition_mode mode);
+struct cxl_memdev_state;
+int cxl_mem_get_partition_info(struct cxl_memdev_state *mds);
int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, u64 size);
int cxl_dpa_free(struct cxl_endpoint_decoder *cxled);
resource_size_t cxl_dpa_size(struct cxl_endpoint_decoder *cxled);
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index bee84d0101d1..d57a0c2d39fb 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -1144,7 +1144,7 @@ EXPORT_SYMBOL_NS_GPL(cxl_mem_get_event_records, "CXL");
*
* See CXL @8.2.9.5.2.1 Get Partition Info
*/
-static int cxl_mem_get_partition_info(struct cxl_memdev_state *mds)
+int cxl_mem_get_partition_info(struct cxl_memdev_state *mds)
{
struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
struct cxl_mbox_get_partition_info pi;
@@ -1300,55 +1300,6 @@ int cxl_mem_sanitize(struct cxl_memdev *cxlmd, u16 cmd)
return -EBUSY;
}
-static void add_part(struct cxl_dpa_info *info, u64 start, u64 size, enum cxl_partition_mode mode)
-{
- int i = info->nr_partitions;
-
- if (size == 0)
- return;
-
- info->part[i].range = (struct range) {
- .start = start,
- .end = start + size - 1,
- };
- info->part[i].mode = mode;
- info->nr_partitions++;
-}
-
-int cxl_mem_dpa_fetch(struct cxl_memdev_state *mds, struct cxl_dpa_info *info)
-{
- struct cxl_dev_state *cxlds = &mds->cxlds;
- struct device *dev = cxlds->dev;
- int rc;
-
- if (!cxlds->media_ready) {
- info->size = 0;
- return 0;
- }
-
- info->size = mds->total_bytes;
-
- if (mds->partition_align_bytes == 0) {
- add_part(info, 0, mds->volatile_only_bytes, CXL_PARTMODE_RAM);
- add_part(info, mds->volatile_only_bytes,
- mds->persistent_only_bytes, CXL_PARTMODE_PMEM);
- return 0;
- }
-
- rc = cxl_mem_get_partition_info(mds);
- if (rc) {
- dev_err(dev, "Failed to query partition information\n");
- return rc;
- }
-
- add_part(info, 0, mds->active_volatile_bytes, CXL_PARTMODE_RAM);
- add_part(info, mds->active_volatile_bytes, mds->active_persistent_bytes,
- CXL_PARTMODE_PMEM);
-
- return 0;
-}
-EXPORT_SYMBOL_NS_GPL(cxl_mem_dpa_fetch, "CXL");
-
int cxl_get_dirty_count(struct cxl_memdev_state *mds, u32 *count)
{
struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index 22d156f25305..2c5dd72f43ca 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -582,6 +582,72 @@ bool is_cxl_memdev(const struct device *dev)
}
EXPORT_SYMBOL_NS_GPL(is_cxl_memdev, "CXL");
+static void add_part(struct cxl_dpa_info *info, u64 start, u64 size, enum cxl_partition_mode mode)
+{
+ int i = info->nr_partitions;
+
+ if (size == 0)
+ return;
+
+ info->part[i].range = (struct range) {
+ .start = start,
+ .end = start + size - 1,
+ };
+ info->part[i].mode = mode;
+ info->nr_partitions++;
+}
+
+int cxl_mem_dpa_fetch(struct cxl_memdev_state *mds, struct cxl_dpa_info *info)
+{
+ struct cxl_dev_state *cxlds = &mds->cxlds;
+ struct device *dev = cxlds->dev;
+ int rc;
+
+ if (!cxlds->media_ready) {
+ info->size = 0;
+ return 0;
+ }
+
+ info->size = mds->total_bytes;
+
+ if (mds->partition_align_bytes == 0) {
+ add_part(info, 0, mds->volatile_only_bytes, CXL_PARTMODE_RAM);
+ add_part(info, mds->volatile_only_bytes,
+ mds->persistent_only_bytes, CXL_PARTMODE_PMEM);
+ return 0;
+ }
+
+ rc = cxl_mem_get_partition_info(mds);
+ if (rc) {
+ dev_err(dev, "Failed to query partition information\n");
+ return rc;
+ }
+
+ add_part(info, 0, mds->active_volatile_bytes, CXL_PARTMODE_RAM);
+ add_part(info, mds->active_volatile_bytes, mds->active_persistent_bytes,
+ CXL_PARTMODE_PMEM);
+
+ return 0;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_mem_dpa_fetch, "CXL");
+
+/**
+ * cxl_set_capacity() - initialize DPA for a driver without a mailbox
+ *
+ * @cxlds: pointer to cxl_dev_state
+ * @capacity: device volatile memory size
+ */
+int cxl_set_capacity(struct cxl_dev_state *cxlds, u64 capacity)
+{
+ struct cxl_dpa_info range_info = {
+ .size = capacity,
+ };
+
+ add_part(&range_info, 0, capacity, CXL_PARTMODE_RAM);
+ return cxl_dpa_setup(cxlds, &range_info);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_set_capacity, "CXL");
+
/**
* set_exclusive_cxl_commands() - atomically disable user cxl commands
* @mds: The device state to operate on
diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
index 34126bc4826c..0b10a2e6aceb 100644
--- a/drivers/net/ethernet/sfc/efx_cxl.c
+++ b/drivers/net/ethernet/sfc/efx_cxl.c
@@ -79,6 +79,11 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
*/
cxl->cxlds.media_ready = true;
+ if (cxl_set_capacity(&cxl->cxlds, EFX_CTPIO_BUFFER_SIZE)) {
+ pci_err(pci_dev, "dpa capacity setup failed\n");
+ return -ENODEV;
+ }
+
probe_data->cxl = cxl;
return 0;
diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
index 7f2e23bce1f7..fb2f8f2395d5 100644
--- a/include/cxl/cxl.h
+++ b/include/cxl/cxl.h
@@ -242,4 +242,5 @@ struct cxl_dev_state *_devm_cxl_dev_state_create(struct device *dev,
int cxl_map_component_regs(const struct cxl_register_map *map,
struct cxl_component_regs *regs,
unsigned long map_mask);
+int cxl_set_capacity(struct cxl_dev_state *cxlds, u64 capacity);
#endif /* __CXL_CXL_H__ */
--
2.34.1
^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH v23 06/22] cxl: Prepare memdev creation for type2
2026-02-01 15:54 [PATCH v23 00/22] Type2 device basic support alejandro.lucero-palau
` (4 preceding siblings ...)
2026-02-01 15:54 ` [PATCH v23 05/22] cxl/sfc: Initialize dpa without a mailbox alejandro.lucero-palau
@ 2026-02-01 15:54 ` alejandro.lucero-palau
2026-02-01 15:54 ` [PATCH v23 07/22] sfc: create type2 cxl memdev alejandro.lucero-palau
` (17 subsequent siblings)
23 siblings, 0 replies; 67+ messages in thread
From: alejandro.lucero-palau @ 2026-02-01 15:54 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Ben Cheatham, Jonathan Cameron,
Alison Schofield
From: Alejandro Lucero <alucerop@amd.com>
The current CXL core relies on a CXL_DEVTYPE_CLASSMEM device type when
creating a memdev, which leads to problems when obtaining
cxl_memdev_state references from a CXL_DEVTYPE_DEVMEM type device.
Modify the check for obtaining cxl_memdev_state to add
CXL_DEVTYPE_DEVMEM support.
Make devm_cxl_add_memdev accessible from an accel driver.
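The selection the patch adds can be sketched in plain userspace C. The names below are simplified stand-ins for the kernel's, not the real API:

```c
#include <string.h>

/*
 * Illustrative model: DEVMEM (accelerator) memdevs get their own
 * device_type so is_cxl_memdev() accepts both flavors while the
 * Type3-only attribute groups stay off the accelerator variant.
 */
enum devtype { DEVTYPE_CLASSMEM, DEVTYPE_DEVMEM };

static const char *memdev_type_name(enum devtype type)
{
	return type == DEVTYPE_DEVMEM ? "cxl_accel_memdev" : "cxl_memdev";
}
```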
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
---
drivers/cxl/core/memdev.c | 15 +++++++++++--
drivers/cxl/cxlmem.h | 6 ------
drivers/cxl/mem.c | 45 +++++++++++++++++++++++++++++----------
include/cxl/cxl.h | 6 ++++++
4 files changed, 53 insertions(+), 19 deletions(-)
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index 2c5dd72f43ca..1b43763b8e20 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -7,6 +7,7 @@
#include <linux/slab.h>
#include <linux/idr.h>
#include <linux/pci.h>
+#include <cxl/cxl.h>
#include <cxlmem.h>
#include "trace.h"
#include "core.h"
@@ -576,9 +577,16 @@ static const struct device_type cxl_memdev_type = {
.groups = cxl_memdev_attribute_groups,
};
+static const struct device_type cxl_accel_memdev_type = {
+ .name = "cxl_accel_memdev",
+ .release = cxl_memdev_release,
+ .devnode = cxl_memdev_devnode,
+};
+
bool is_cxl_memdev(const struct device *dev)
{
- return dev->type == &cxl_memdev_type;
+ return (dev->type == &cxl_memdev_type ||
+ dev->type == &cxl_accel_memdev_type);
}
EXPORT_SYMBOL_NS_GPL(is_cxl_memdev, "CXL");
@@ -781,7 +789,10 @@ static struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds,
dev->parent = cxlds->dev;
dev->bus = &cxl_bus_type;
dev->devt = MKDEV(cxl_mem_major, cxlmd->id);
- dev->type = &cxl_memdev_type;
+ if (cxlds->type == CXL_DEVTYPE_DEVMEM)
+ dev->type = &cxl_accel_memdev_type;
+ else
+ dev->type = &cxl_memdev_type;
device_set_pm_not_required(dev);
INIT_WORK(&cxlmd->detach_work, detach_memdev);
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 281546de426e..c98db6f18aa2 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -34,10 +34,6 @@
(FIELD_GET(CXLMDEV_RESET_NEEDED_MASK, status) != \
CXLMDEV_RESET_NEEDED_NOT)
-struct cxl_memdev_attach {
- int (*probe)(struct cxl_memdev *cxlmd);
-};
-
/**
* struct cxl_memdev - CXL bus object representing a Type-3 Memory Device
* @dev: driver core device object
@@ -103,8 +99,6 @@ static inline bool is_cxl_endpoint(struct cxl_port *port)
struct cxl_memdev *__devm_cxl_add_memdev(struct cxl_dev_state *cxlds,
const struct cxl_memdev_attach *attach);
-struct cxl_memdev *devm_cxl_add_memdev(struct cxl_dev_state *cxlds,
- const struct cxl_memdev_attach *attach);
int devm_cxl_sanitize_setup_notifier(struct device *host,
struct cxl_memdev *cxlmd);
struct cxl_memdev_state;
diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
index 0958bea915ac..39687baedd1a 100644
--- a/drivers/cxl/mem.c
+++ b/drivers/cxl/mem.c
@@ -65,6 +65,26 @@ static int cxl_debugfs_poison_clear(void *data, u64 dpa)
DEFINE_DEBUGFS_ATTRIBUTE(cxl_poison_clear_fops, NULL,
cxl_debugfs_poison_clear, "%llx\n");
+static void cxl_memdev_poison_enable(struct cxl_memdev_state *mds,
+ struct cxl_memdev *cxlmd,
+ struct dentry *dentry)
+{
+ /*
+ * Skip poison debugfs for DEVMEM aka accelerator devices: the poison
+ * commands rely on cxl_memdev_state, which accelerators do not have.
+ */
+ if (!mds)
+ return;
+
+ if (test_bit(CXL_POISON_ENABLED_INJECT, mds->poison.enabled_cmds))
+ debugfs_create_file("inject_poison", 0200, dentry, cxlmd,
+ &cxl_poison_inject_fops);
+
+ if (test_bit(CXL_POISON_ENABLED_CLEAR, mds->poison.enabled_cmds))
+ debugfs_create_file("clear_poison", 0200, dentry, cxlmd,
+ &cxl_poison_clear_fops);
+}
+
static int cxl_mem_probe(struct device *dev)
{
struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
@@ -92,12 +112,7 @@ static int cxl_mem_probe(struct device *dev)
dentry = cxl_debugfs_create_dir(dev_name(dev));
debugfs_create_devm_seqfile(dev, "dpamem", dentry, cxl_mem_dpa_show);
- if (test_bit(CXL_POISON_ENABLED_INJECT, mds->poison.enabled_cmds))
- debugfs_create_file("inject_poison", 0200, dentry, cxlmd,
- &cxl_poison_inject_fops);
- if (test_bit(CXL_POISON_ENABLED_CLEAR, mds->poison.enabled_cmds))
- debugfs_create_file("clear_poison", 0200, dentry, cxlmd,
- &cxl_poison_clear_fops);
+ cxl_memdev_poison_enable(mds, cxlmd, dentry);
rc = devm_add_action_or_reset(dev, remove_debugfs, dentry);
if (rc)
@@ -208,16 +223,24 @@ static ssize_t trigger_poison_list_store(struct device *dev,
}
static DEVICE_ATTR_WO(trigger_poison_list);
-static umode_t cxl_mem_visible(struct kobject *kobj, struct attribute *a, int n)
+static bool cxl_poison_attr_visible(struct kobject *kobj, struct attribute *a)
{
struct device *dev = kobj_to_dev(kobj);
struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
- if (a == &dev_attr_trigger_poison_list.attr)
- if (!test_bit(CXL_POISON_ENABLED_LIST,
- mds->poison.enabled_cmds))
- return 0;
+ if (!mds ||
+ !test_bit(CXL_POISON_ENABLED_LIST, mds->poison.enabled_cmds))
+ return false;
+
+ return true;
+}
+
+static umode_t cxl_mem_visible(struct kobject *kobj, struct attribute *a, int n)
+{
+ if (a == &dev_attr_trigger_poison_list.attr &&
+ !cxl_poison_attr_visible(kobj, a))
+ return 0;
return a->mode;
}
diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
index fb2f8f2395d5..6f8d365067af 100644
--- a/include/cxl/cxl.h
+++ b/include/cxl/cxl.h
@@ -153,6 +153,10 @@ struct cxl_dpa_partition {
#define CXL_NR_PARTITIONS_MAX 2
+struct cxl_memdev_attach {
+ int (*probe)(struct cxl_memdev *cxlmd);
+};
+
/**
* struct cxl_dev_state - The driver device state
*
@@ -243,4 +247,6 @@ int cxl_map_component_regs(const struct cxl_register_map *map,
struct cxl_component_regs *regs,
unsigned long map_mask);
int cxl_set_capacity(struct cxl_dev_state *cxlds, u64 capacity);
+struct cxl_memdev *devm_cxl_add_memdev(struct cxl_dev_state *cxlds,
+ const struct cxl_memdev_attach *attach);
#endif /* __CXL_CXL_H__ */
--
2.34.1
^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH v23 07/22] sfc: create type2 cxl memdev
2026-02-01 15:54 [PATCH v23 00/22] Type2 device basic support alejandro.lucero-palau
` (5 preceding siblings ...)
2026-02-01 15:54 ` [PATCH v23 06/22] cxl: Prepare memdev creation for type2 alejandro.lucero-palau
@ 2026-02-01 15:54 ` alejandro.lucero-palau
2026-02-01 15:54 ` [PATCH v23 08/22] cxl/hdm: Add support for getting region from committed decoder alejandro.lucero-palau
` (16 subsequent siblings)
23 siblings, 0 replies; 67+ messages in thread
From: alejandro.lucero-palau @ 2026-02-01 15:54 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Martin Habets, Fan Ni, Edward Cree,
Jonathan Cameron
From: Alejandro Lucero <alucerop@amd.com>
Use the CXL API for creating a CXL memory device using the Type2
cxl_dev_state struct.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
---
drivers/net/ethernet/sfc/efx_cxl.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
index 0b10a2e6aceb..a77ef4783fcb 100644
--- a/drivers/net/ethernet/sfc/efx_cxl.c
+++ b/drivers/net/ethernet/sfc/efx_cxl.c
@@ -84,6 +84,12 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
return -ENODEV;
}
+ cxl->cxlmd = devm_cxl_add_memdev(&cxl->cxlds, NULL);
+ if (IS_ERR(cxl->cxlmd)) {
+ pci_err(pci_dev, "CXL accel memdev creation failed");
+ return PTR_ERR(cxl->cxlmd);
+ }
+
probe_data->cxl = cxl;
return 0;
--
2.34.1
^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH v23 08/22] cxl/hdm: Add support for getting region from committed decoder
2026-02-01 15:54 [PATCH v23 00/22] Type2 device basic support alejandro.lucero-palau
` (6 preceding siblings ...)
2026-02-01 15:54 ` [PATCH v23 07/22] sfc: create type2 cxl memdev alejandro.lucero-palau
@ 2026-02-01 15:54 ` alejandro.lucero-palau
2026-02-11 22:11 ` Cheatham, Benjamin
2026-02-01 15:54 ` [PATCH v23 09/22] cxl: Add function for obtaining region range alejandro.lucero-palau
` (15 subsequent siblings)
23 siblings, 1 reply; 67+ messages in thread
From: alejandro.lucero-palau @ 2026-02-01 15:54 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero
From: Alejandro Lucero <alucerop@amd.com>
A Type2 device configured by the BIOS can already have its HDM
decoder committed. Add a cxl_get_committed_decoder() function for
checking this after memdev creation. A CXL region should have been
created during memdev initialization, so a Type2 driver can ask for
that region in order to work with the HPA. If the HDM decoder is not
committed, a Type2 driver will create the region itself after
obtaining proper HPA and DPA space.
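The lookup can be modeled in userspace C. The structures below are simplified stand-ins for the kernel's: an endpoint port records the id of the last committed decoder in hdm_end (-1 when nothing is committed), and the search just matches a child decoder against that id:

```c
#include <stddef.h>

/* Simplified stand-in for a child endpoint decoder (illustrative only). */
struct decoder {
	int id;
	void *region;	/* region created at memdev init, if any */
};

/*
 * Model of the cxl_get_committed_decoder() lookup: return the decoder
 * whose id equals hdm_end, or NULL when nothing is committed and the
 * driver must build the region itself.
 */
static struct decoder *find_committed(struct decoder *decoders, int n,
				      int hdm_end)
{
	for (int i = 0; i < n; i++)
		if (decoders[i].id == hdm_end)
			return &decoders[i];
	return NULL;
}
```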
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
---
drivers/cxl/core/hdm.c | 39 +++++++++++++++++++++++++++++++++++++++
include/cxl/cxl.h | 3 +++
2 files changed, 42 insertions(+)
diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index 6e516c69b2d2..a172ce4e9b19 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -686,6 +686,45 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, u64 size)
return devm_add_action_or_reset(&port->dev, cxl_dpa_release, cxled);
}
+static int find_committed_endpoint_decoder(struct device *dev, const void *data)
+{
+ struct cxl_endpoint_decoder *cxled;
+ struct cxl_port *port;
+
+ if (!is_endpoint_decoder(dev))
+ return 0;
+
+ cxled = to_cxl_endpoint_decoder(dev);
+ port = cxled_to_port(cxled);
+
+ return cxled->cxld.id == port->hdm_end;
+}
+
+struct cxl_endpoint_decoder *cxl_get_committed_decoder(struct cxl_memdev *cxlmd,
+ struct cxl_region **cxlr)
+{
+ struct cxl_port *endpoint = cxlmd->endpoint;
+ struct cxl_endpoint_decoder *cxled;
+ struct device *cxled_dev;
+
+ if (!endpoint)
+ return NULL;
+
+ guard(rwsem_read)(&cxl_rwsem.dpa);
+ cxled_dev = device_find_child(&endpoint->dev, NULL,
+ find_committed_endpoint_decoder);
+
+ if (!cxled_dev)
+ return NULL;
+
+ cxled = to_cxl_endpoint_decoder(cxled_dev);
+ *cxlr = cxled->cxld.region;
+
+ put_device(cxled_dev);
+ return cxled;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_get_committed_decoder, "CXL");
+
static void cxld_set_interleave(struct cxl_decoder *cxld, u32 *ctrl)
{
u16 eig;
diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
index 6f8d365067af..928276dba952 100644
--- a/include/cxl/cxl.h
+++ b/include/cxl/cxl.h
@@ -249,4 +249,7 @@ int cxl_map_component_regs(const struct cxl_register_map *map,
int cxl_set_capacity(struct cxl_dev_state *cxlds, u64 capacity);
struct cxl_memdev *devm_cxl_add_memdev(struct cxl_dev_state *cxlds,
const struct cxl_memdev_attach *attach);
+struct cxl_region;
+struct cxl_endpoint_decoder *cxl_get_committed_decoder(struct cxl_memdev *cxlmd,
+ struct cxl_region **cxlr);
#endif /* __CXL_CXL_H__ */
--
2.34.1
^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH v23 09/22] cxl: Add function for obtaining region range
2026-02-01 15:54 [PATCH v23 00/22] Type2 device basic support alejandro.lucero-palau
` (7 preceding siblings ...)
2026-02-01 15:54 ` [PATCH v23 08/22] cxl/hdm: Add support for getting region from committed decoder alejandro.lucero-palau
@ 2026-02-01 15:54 ` alejandro.lucero-palau
2026-02-01 15:54 ` [PATCH v23 10/22] cxl: Export function for unwinding cxl by accelerators alejandro.lucero-palau
` (14 subsequent siblings)
23 siblings, 0 replies; 67+ messages in thread
From: alejandro.lucero-palau @ 2026-02-01 15:54 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Zhi Wang, Jonathan Cameron
From: Alejandro Lucero <alucerop@amd.com>
A CXL region struct contains the physical address range to work with.
Type2 drivers can create a CXL region but have no access to the
related struct, as it is private to the kernel CXL core.
Add a function for getting the CXL region range, to be used by a
Type2 driver for mapping such a memory range.
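A minimal userspace model of what the helper hands back. Types are simplified stand-ins; the kernel keeps the real struct cxl_region private:

```c
#include <stdint.h>
#include <stddef.h>

/* Simplified stand-ins: a region holds a resource with inclusive bounds. */
struct resource {
	uint64_t start, end;
};

struct range {
	uint64_t start, end;
};

/*
 * Model of cxl_get_region_range(): copy the region's resource bounds
 * into a caller-owned range, failing when no HPA has been assigned.
 */
static int get_region_range(const struct resource *res, struct range *r)
{
	if (!res)
		return -1;
	r->start = res->start;
	r->end = res->end;
	return 0;
}
```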
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Zhi Wang <zhiw@nvidia.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
---
drivers/cxl/core/region.c | 23 +++++++++++++++++++++++
include/cxl/cxl.h | 2 ++
2 files changed, 25 insertions(+)
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 96888d87a8df..acf29ba3b205 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2621,6 +2621,29 @@ static struct cxl_region *devm_cxl_add_region(struct cxl_root_decoder *cxlrd,
return ERR_PTR(rc);
}
+/**
+ * cxl_get_region_range - obtain range linked to a CXL region
+ *
+ * @region: a pointer to struct cxl_region
+ * @range: a pointer to a struct range to be set
+ *
+ * Returns 0 or error.
+ */
+int cxl_get_region_range(struct cxl_region *region, struct range *range)
+{
+ if (WARN_ON_ONCE(!region))
+ return -ENODEV;
+
+ if (!region->params.res)
+ return -ENOSPC;
+
+ range->start = region->params.res->start;
+ range->end = region->params.res->end;
+
+ return 0;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_get_region_range, "CXL");
+
static ssize_t __create_region_show(struct cxl_root_decoder *cxlrd, char *buf)
{
return sysfs_emit(buf, "region%u\n", atomic_read(&cxlrd->region_id));
diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
index 928276dba952..906065e0d2a6 100644
--- a/include/cxl/cxl.h
+++ b/include/cxl/cxl.h
@@ -252,4 +252,6 @@ struct cxl_memdev *devm_cxl_add_memdev(struct cxl_dev_state *cxlds,
struct cxl_region;
struct cxl_endpoint_decoder *cxl_get_committed_decoder(struct cxl_memdev *cxlmd,
struct cxl_region **cxlr);
+struct range;
+int cxl_get_region_range(struct cxl_region *region, struct range *range);
#endif /* __CXL_CXL_H__ */
--
2.34.1
^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH v23 10/22] cxl: Export function for unwinding cxl by accelerators
2026-02-01 15:54 [PATCH v23 00/22] Type2 device basic support alejandro.lucero-palau
` (8 preceding siblings ...)
2026-02-01 15:54 ` [PATCH v23 09/22] cxl: Add function for obtaining region range alejandro.lucero-palau
@ 2026-02-01 15:54 ` alejandro.lucero-palau
2026-02-19 23:16 ` Dave Jiang
2026-02-21 4:48 ` Gregory Price
2026-02-01 15:54 ` [PATCH v23 11/22] sfc: obtain decoder and region if committed by firmware alejandro.lucero-palau
` (13 subsequent siblings)
23 siblings, 2 replies; 67+ messages in thread
From: alejandro.lucero-palau @ 2026-02-01 15:54 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero
From: Alejandro Lucero <alucerop@amd.com>
Add cxl_unregister_region() to the accelerator driver API
for a clean exit.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
---
drivers/cxl/core/region.c | 17 ++++++++++++-----
include/cxl/cxl.h | 1 +
2 files changed, 13 insertions(+), 5 deletions(-)
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index acf29ba3b205..954b8fcdbac6 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2438,9 +2438,8 @@ static struct cxl_region *to_cxl_region(struct device *dev)
return container_of(dev, struct cxl_region, dev);
}
-static void unregister_region(void *_cxlr)
+void cxl_unregister_region(struct cxl_region *cxlr)
{
- struct cxl_region *cxlr = _cxlr;
struct cxl_region_params *p = &cxlr->params;
int i;
@@ -2457,6 +2456,14 @@ static void unregister_region(void *_cxlr)
cxl_region_iomem_release(cxlr);
put_device(&cxlr->dev);
}
+EXPORT_SYMBOL_NS_GPL(cxl_unregister_region, "CXL");
+
+static void __unregister_region(void *_cxlr)
+{
+ struct cxl_region *cxlr = _cxlr;
+
+ return cxl_unregister_region(cxlr);
+}
static struct lock_class_key cxl_region_key;
@@ -2608,7 +2615,7 @@ static struct cxl_region *devm_cxl_add_region(struct cxl_root_decoder *cxlrd,
if (rc)
goto err;
- rc = devm_add_action_or_reset(port->uport_dev, unregister_region, cxlr);
+ rc = devm_add_action_or_reset(port->uport_dev, __unregister_region, cxlr);
if (rc)
return ERR_PTR(rc);
@@ -2762,7 +2769,7 @@ static ssize_t delete_region_store(struct device *dev,
if (IS_ERR(cxlr))
return PTR_ERR(cxlr);
- devm_release_action(port->uport_dev, unregister_region, cxlr);
+ devm_release_action(port->uport_dev, __unregister_region, cxlr);
put_device(&cxlr->dev);
return len;
@@ -3878,7 +3885,7 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
rc = __construct_region(cxlr, cxlrd, cxled);
if (rc) {
- devm_release_action(port->uport_dev, unregister_region, cxlr);
+ devm_release_action(port->uport_dev, __unregister_region, cxlr);
return ERR_PTR(rc);
}
diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
index 906065e0d2a6..92880c26b2d5 100644
--- a/include/cxl/cxl.h
+++ b/include/cxl/cxl.h
@@ -254,4 +254,5 @@ struct cxl_endpoint_decoder *cxl_get_committed_decoder(struct cxl_memdev *cxlmd,
struct cxl_region **cxlr);
struct range;
int cxl_get_region_range(struct cxl_region *region, struct range *range);
+void cxl_unregister_region(struct cxl_region *cxlr);
#endif /* __CXL_CXL_H__ */
--
2.34.1
^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH v23 11/22] sfc: obtain decoder and region if committed by firmware
2026-02-01 15:54 [PATCH v23 00/22] Type2 device basic support alejandro.lucero-palau
` (9 preceding siblings ...)
2026-02-01 15:54 ` [PATCH v23 10/22] cxl: Export function for unwinding cxl by accelerators alejandro.lucero-palau
@ 2026-02-01 15:54 ` alejandro.lucero-palau
2026-02-11 22:10 ` Cheatham, Benjamin
` (2 more replies)
2026-02-01 15:54 ` [PATCH v23 12/22] cxl: Define a driver interface for HPA free space enumeration alejandro.lucero-palau
` (12 subsequent siblings)
23 siblings, 3 replies; 67+ messages in thread
From: alejandro.lucero-palau @ 2026-02-01 15:54 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero
From: Alejandro Lucero <alucerop@amd.com>
Check whether the device HDM decoder was already committed during
firmware/BIOS initialization.
If so, a CXL region should exist after memdev allocation/initialization.
Get the HPA range from the region and map it.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
---
drivers/net/ethernet/sfc/efx_cxl.c | 28 +++++++++++++++++++++++++++-
1 file changed, 27 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
index a77ef4783fcb..3536eccf1b2a 100644
--- a/drivers/net/ethernet/sfc/efx_cxl.c
+++ b/drivers/net/ethernet/sfc/efx_cxl.c
@@ -19,6 +19,7 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
struct efx_nic *efx = &probe_data->efx;
struct pci_dev *pci_dev = efx->pci_dev;
struct efx_cxl *cxl;
+ struct range range;
u16 dvsec;
int rc;
@@ -90,13 +91,38 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
return PTR_ERR(cxl->cxlmd);
}
- probe_data->cxl = cxl;
+ cxl->cxled = cxl_get_committed_decoder(cxl->cxlmd, &cxl->efx_region);
+ if (cxl->cxled) {
+ if (!cxl->efx_region) {
+ pci_err(pci_dev, "CXL found committed decoder without a region");
+ return -ENODEV;
+ }
+ rc = cxl_get_region_range(cxl->efx_region, &range);
+ if (rc) {
+ pci_err(pci_dev,
+ "CXL getting regions params from a committed decoder failed");
+ return rc;
+ }
+
+ cxl->ctpio_cxl = ioremap(range.start, range.end - range.start + 1);
+ if (!cxl->ctpio_cxl) {
+ pci_err(pci_dev, "CXL ioremap region (%pra) failed", &range);
+ return -ENOMEM;
+ }
+
+ probe_data->cxl = cxl;
+ }
return 0;
}
void efx_cxl_exit(struct efx_probe_data *probe_data)
{
+ if (!probe_data->cxl)
+ return;
+
+ iounmap(probe_data->cxl->ctpio_cxl);
+ cxl_unregister_region(probe_data->cxl->efx_region);
}
MODULE_IMPORT_NS("CXL");
--
2.34.1
^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH v23 12/22] cxl: Define a driver interface for HPA free space enumeration
2026-02-01 15:54 [PATCH v23 00/22] Type2 device basic support alejandro.lucero-palau
` (10 preceding siblings ...)
2026-02-01 15:54 ` [PATCH v23 11/22] sfc: obtain decoder and region if committed by firmware alejandro.lucero-palau
@ 2026-02-01 15:54 ` alejandro.lucero-palau
2026-02-11 22:10 ` Cheatham, Benjamin
` (2 more replies)
2026-02-01 15:54 ` [PATCH v23 13/22] sfc: get root decoder alejandro.lucero-palau
` (11 subsequent siblings)
23 siblings, 3 replies; 67+ messages in thread
From: alejandro.lucero-palau @ 2026-02-01 15:54 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Jonathan Cameron
From: Alejandro Lucero <alucerop@amd.com>
CXL region creation involves allocating capacity from Device Physical
Address (DPA) and assigning it to decode a given Host Physical Address
(HPA). Before determining how much DPA to allocate, the amount of available
HPA must be determined. Also, not all HPA is created equal: some HPA
targets RAM, some targets PMEM, some is prepared for device-memory flows
like HDM-D and HDM-DB, and some is HDM-H (host-only).
In order to support Type2 CXL devices, wrap all of those concerns into
an API that retrieves a root decoder (platform CXL window) that fits the
specified constraints and the capacity available for a new region.
Add a complementary function for releasing the reference to such root
decoder.
Based on https://lore.kernel.org/linux-cxl/168592159290.1948938.13522227102445462976.stgit@dwillia2-xfh.jf.intel.com/
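The free-space walk described above can be modeled in plain userspace C. The structures below are simplified stand-ins for the kernel's struct resource, assuming sorted, non-overlapping, non-empty children:

```c
#include <stdint.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct resource sibling list. */
struct res {
	uint64_t start, end;	/* inclusive bounds */
	struct res *sibling;	/* next child, sorted by address */
};

/* Largest contiguous gap inside @parent not covered by its children. */
static uint64_t largest_free_gap(const struct res *parent,
				 const struct res *child)
{
	const struct res *prev = NULL, *res;
	uint64_t max = 0, gap;

	if (!child)		/* no children: everything is free */
		return parent->end - parent->start + 1;

	for (res = child; res; prev = res, res = res->sibling) {
		/* gap before the first child */
		if (!prev && res->start > parent->start) {
			gap = res->start - parent->start;
			if (gap > max)
				max = gap;
		}
		/* gap between two adjacent children */
		if (prev && res->start > prev->end + 1) {
			gap = res->start - prev->end - 1;
			if (gap > max)
				max = gap;
		}
	}
	/* tail gap after the last child */
	if (prev->end < parent->end) {
		gap = parent->end - prev->end;
		if (gap > max)
			max = gap;
	}
	return max;
}
```

With parent 0-99 and children 10-19 and 50-59, the gaps are 10, 30 and 40 bytes, so the walk reports 40.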
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
drivers/cxl/core/region.c | 164 ++++++++++++++++++++++++++++++++++++++
drivers/cxl/cxl.h | 3 +
include/cxl/cxl.h | 6 ++
3 files changed, 173 insertions(+)
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 954b8fcdbac6..bdefd088f5f1 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -705,6 +705,170 @@ static int free_hpa(struct cxl_region *cxlr)
return 0;
}
+struct cxlrd_max_context {
+ struct device * const *host_bridges;
+ int interleave_ways;
+ unsigned long flags;
+ resource_size_t max_hpa;
+ struct cxl_root_decoder *cxlrd;
+};
+
+static int find_max_hpa(struct device *dev, void *data)
+{
+ struct cxlrd_max_context *ctx = data;
+ struct cxl_switch_decoder *cxlsd;
+ struct cxl_root_decoder *cxlrd;
+ struct resource *res, *prev;
+ struct cxl_decoder *cxld;
+ resource_size_t free = 0;
+ resource_size_t max;
+ int found = 0;
+
+ if (!is_root_decoder(dev))
+ return 0;
+
+ cxlrd = to_cxl_root_decoder(dev);
+ cxlsd = &cxlrd->cxlsd;
+ cxld = &cxlsd->cxld;
+
+ if ((cxld->flags & ctx->flags) != ctx->flags) {
+ dev_dbg(dev, "flags not matching: %08lx vs %08lx\n",
+ cxld->flags, ctx->flags);
+ return 0;
+ }
+
+ for (int i = 0; i < ctx->interleave_ways; i++) {
+ for (int j = 0; j < ctx->interleave_ways; j++) {
+ if (ctx->host_bridges[i] == cxlsd->target[j]->dport_dev) {
+ found++;
+ break;
+ }
+ }
+ }
+
+ if (found != ctx->interleave_ways) {
+ dev_dbg(dev,
+ "Not enough host bridges. Found %d for %d interleave ways requested\n",
+ found, ctx->interleave_ways);
+ return 0;
+ }
+
+ /*
+ * Walk the root decoder resource range relying on cxl_rwsem.region to
+ * preclude sibling arrival/departure and find the largest free space
+ * gap.
+ */
+ lockdep_assert_held_read(&cxl_rwsem.region);
+ res = cxlrd->res->child;
+
+ /* With no resource child the whole parent resource is available */
+ if (!res)
+ max = resource_size(cxlrd->res);
+ else
+ max = 0;
+
+ for (prev = NULL; res; prev = res, res = res->sibling) {
+ if (!prev && res->start == cxlrd->res->start &&
+ res->end == cxlrd->res->end) {
+ max = resource_size(cxlrd->res);
+ break;
+ }
+ /*
+ * Sanity check to avoid arithmetic problems below: a zero-size
+ * resource has end == start - 1, which for an unsigned start of 0
+ * wraps to all f in hex.
+ */
+ if (prev && !resource_size(prev))
+ continue;
+
+ if (!prev && res->start > cxlrd->res->start) {
+ free = res->start - cxlrd->res->start;
+ max = max(free, max);
+ }
+ if (prev && res->start > prev->end + 1) {
+ free = res->start - prev->end - 1;
+ max = max(free, max);
+ }
+ }
+
+ if (prev && prev->end < cxlrd->res->end) {
+ free = cxlrd->res->end - prev->end;
+ max = max(free, max);
+ }
+
+ dev_dbg(cxlrd_dev(cxlrd), "found %pa bytes of free space\n", &max);
+ if (max > ctx->max_hpa) {
+ if (ctx->cxlrd)
+ put_device(cxlrd_dev(ctx->cxlrd));
+ get_device(cxlrd_dev(cxlrd));
+ ctx->cxlrd = cxlrd;
+ ctx->max_hpa = max;
+ }
+ return 0;
+}
+
+/**
+ * cxl_get_hpa_freespace - find a root decoder with free capacity per constraints
+ * @cxlmd: the mem device requiring the HPA
+ * @interleave_ways: number of entries in @host_bridges
+ * @flags: CXL_DECODER_F flags for selecting RAM vs PMEM, and Type2 device
+ * @max_avail_contig: output parameter of max contiguous bytes available in the
+ * returned decoder
+ *
+ * Returns a pointer to a struct cxl_root_decoder
+ *
+ * The return tuple of a 'struct cxl_root_decoder' and 'bytes available'
+ * (given in @max_avail_contig) is a point-in-time snapshot. If by the time
+ * the caller goes to use this decoder its capacity has been reduced, then
+ * the caller needs to loop and retry.
+ *
+ * The returned root decoder has an elevated reference count that needs to be
+ * put with cxl_put_root_decoder(cxlrd).
+ */
+struct cxl_root_decoder *cxl_get_hpa_freespace(struct cxl_memdev *cxlmd,
+ int interleave_ways,
+ unsigned long flags,
+ resource_size_t *max_avail_contig)
+{
+ struct cxlrd_max_context ctx = {
+ .flags = flags,
+ .interleave_ways = interleave_ways,
+ };
+ struct cxl_port *root_port;
+ struct cxl_port *endpoint;
+
+ endpoint = cxlmd->endpoint;
+ if (!endpoint) {
+ dev_dbg(&cxlmd->dev, "endpoint not linked to memdev\n");
+ return ERR_PTR(-ENXIO);
+ }
+
+ ctx.host_bridges = &endpoint->host_bridge;
+
+ struct cxl_root *root __free(put_cxl_root) = find_cxl_root(endpoint);
+ if (!root) {
+ dev_dbg(&endpoint->dev, "endpoint is not related to a root port\n");
+ return ERR_PTR(-ENXIO);
+ }
+
+ root_port = &root->port;
+ scoped_guard(rwsem_read, &cxl_rwsem.region)
+ device_for_each_child(&root_port->dev, &ctx, find_max_hpa);
+
+ if (!ctx.cxlrd)
+ return ERR_PTR(-ENOMEM);
+
+ *max_avail_contig = ctx.max_hpa;
+ return ctx.cxlrd;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_get_hpa_freespace, "CXL");
+
+void cxl_put_root_decoder(struct cxl_root_decoder *cxlrd)
+{
+ put_device(cxlrd_dev(cxlrd));
+}
+EXPORT_SYMBOL_NS_GPL(cxl_put_root_decoder, "CXL");
+
static ssize_t size_store(struct device *dev, struct device_attribute *attr,
const char *buf, size_t len)
{
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 944c5d1ccceb..c7d9b2c2908f 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -706,6 +706,9 @@ struct cxl_root_decoder *to_cxl_root_decoder(struct device *dev);
struct cxl_switch_decoder *to_cxl_switch_decoder(struct device *dev);
struct cxl_endpoint_decoder *to_cxl_endpoint_decoder(struct device *dev);
bool is_root_decoder(struct device *dev);
+
+#define cxlrd_dev(cxlrd) (&(cxlrd)->cxlsd.cxld.dev)
+
bool is_switch_decoder(struct device *dev);
bool is_endpoint_decoder(struct device *dev);
struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
index 92880c26b2d5..834dc7e78934 100644
--- a/include/cxl/cxl.h
+++ b/include/cxl/cxl.h
@@ -255,4 +255,10 @@ struct cxl_endpoint_decoder *cxl_get_committed_decoder(struct cxl_memdev *cxlmd,
struct range;
int cxl_get_region_range(struct cxl_region *region, struct range *range);
void cxl_unregister_region(struct cxl_region *cxlr);
+struct cxl_port;
+struct cxl_root_decoder *cxl_get_hpa_freespace(struct cxl_memdev *cxlmd,
+ int interleave_ways,
+ unsigned long flags,
+ resource_size_t *max);
+void cxl_put_root_decoder(struct cxl_root_decoder *cxlrd);
#endif /* __CXL_CXL_H__ */
--
2.34.1
^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH v23 13/22] sfc: get root decoder
2026-02-01 15:54 [PATCH v23 00/22] Type2 device basic support alejandro.lucero-palau
` (11 preceding siblings ...)
2026-02-01 15:54 ` [PATCH v23 12/22] cxl: Define a driver interface for HPA free space enumeration alejandro.lucero-palau
@ 2026-02-01 15:54 ` alejandro.lucero-palau
2026-02-01 15:54 ` [PATCH v23 14/22] cxl: Define a driver interface for DPA allocation alejandro.lucero-palau
` (10 subsequent siblings)
23 siblings, 0 replies; 67+ messages in thread
From: alejandro.lucero-palau @ 2026-02-01 15:54 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Martin Habets, Edward Cree, Jonathan Cameron,
Ben Cheatham
From: Alejandro Lucero <alucerop@amd.com>
Use the CXL API for obtaining the HPA (Host Physical Address) space to
use from a CXL root decoder.
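The intended pairing of the lookup and release calls in this patch can be
sketched as below. This is an illustrative fragment, not compilable on its
own: everything outside the CXL API calls (the my_accel names and
MY_ACCEL_MEM_SIZE) is hypothetical.

```c
/* Illustrative Type2 driver probe fragment (sketch only). */
static int my_accel_map_hpa(struct my_accel *priv)
{
	struct cxl_root_decoder *cxlrd;
	resource_size_t max_size;

	/* Ask for a root decoder with free volatile Type2 capacity,
	 * single interleave way.
	 */
	cxlrd = cxl_get_hpa_freespace(priv->cxlmd, 1,
				      CXL_DECODER_F_RAM | CXL_DECODER_F_TYPE2,
				      &max_size);
	if (IS_ERR(cxlrd))
		return PTR_ERR(cxlrd);

	if (max_size < MY_ACCEL_MEM_SIZE) {
		/* The lookup takes a device reference which must be
		 * dropped on every error path.
		 */
		cxl_put_root_decoder(cxlrd);
		return -ENOSPC;
	}

	priv->cxlrd = cxlrd;
	return 0;
}
```

The sfc changes in this patch follow the same pattern, holding the root
decoder reference until efx_cxl_exit().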
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
---
drivers/cxl/cxl.h | 15 ---------------
drivers/net/ethernet/sfc/Kconfig | 1 +
drivers/net/ethernet/sfc/efx_cxl.c | 26 +++++++++++++++++++++++---
drivers/net/ethernet/sfc/efx_cxl.h | 1 +
include/cxl/cxl.h | 15 +++++++++++++++
5 files changed, 40 insertions(+), 18 deletions(-)
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index c7d9b2c2908f..d1b010e5e1d0 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -220,21 +220,6 @@ int cxl_dport_map_rcd_linkcap(struct pci_dev *pdev, struct cxl_dport *dport);
#define CXL_RESOURCE_NONE ((resource_size_t) -1)
#define CXL_TARGET_STRLEN 20
-/*
- * cxl_decoder flags that define the type of memory / devices this
- * decoder supports as well as configuration lock status See "CXL 2.0
- * 8.2.5.12.7 CXL HDM Decoder 0 Control Register" for details.
- * Additionally indicate whether decoder settings were autodetected,
- * user customized.
- */
-#define CXL_DECODER_F_RAM BIT(0)
-#define CXL_DECODER_F_PMEM BIT(1)
-#define CXL_DECODER_F_TYPE2 BIT(2)
-#define CXL_DECODER_F_TYPE3 BIT(3)
-#define CXL_DECODER_F_LOCK BIT(4)
-#define CXL_DECODER_F_ENABLE BIT(5)
-#define CXL_DECODER_F_MASK GENMASK(5, 0)
-
enum cxl_decoder_type {
CXL_DECODER_DEVMEM = 2,
CXL_DECODER_HOSTONLYMEM = 3,
diff --git a/drivers/net/ethernet/sfc/Kconfig b/drivers/net/ethernet/sfc/Kconfig
index 979f2801e2a8..e959d9b4f4ce 100644
--- a/drivers/net/ethernet/sfc/Kconfig
+++ b/drivers/net/ethernet/sfc/Kconfig
@@ -69,6 +69,7 @@ config SFC_MCDI_LOGGING
config SFC_CXL
bool "Solarflare SFC9100-family CXL support"
depends on SFC && CXL_BUS >= SFC
+ depends on CXL_REGION
default SFC
help
This enables SFC CXL support if the kernel is configuring CXL for
diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
index 3536eccf1b2a..1a4c1097c315 100644
--- a/drivers/net/ethernet/sfc/efx_cxl.c
+++ b/drivers/net/ethernet/sfc/efx_cxl.c
@@ -18,6 +18,7 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
{
struct efx_nic *efx = &probe_data->efx;
struct pci_dev *pci_dev = efx->pci_dev;
+ resource_size_t max_size;
struct efx_cxl *cxl;
struct range range;
u16 dvsec;
@@ -110,9 +111,24 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
return -ENOMEM;
}
- probe_data->cxl = cxl;
+ cxl->hdm_was_committed = true;
+ } else {
+ cxl->cxlrd = cxl_get_hpa_freespace(cxl->cxlmd, 1, CXL_DECODER_F_RAM |
+ CXL_DECODER_F_TYPE2, &max_size);
+ if (IS_ERR(cxl->cxlrd)) {
+ dev_err(&pci_dev->dev, "cxl_get_hpa_freespace failed\n");
+ return PTR_ERR(cxl->cxlrd);
+ }
+
+ if (max_size < EFX_CTPIO_BUFFER_SIZE) {
+ dev_err(&pci_dev->dev, "%s: not enough free HPA space %pap < %u\n",
+ __func__, &max_size, EFX_CTPIO_BUFFER_SIZE);
+ cxl_put_root_decoder(cxl->cxlrd);
+ return -ENOSPC;
+ }
}
+ probe_data->cxl = cxl;
return 0;
}
@@ -121,8 +137,12 @@ void efx_cxl_exit(struct efx_probe_data *probe_data)
if (!probe_data->cxl)
return;
- iounmap(probe_data->cxl->ctpio_cxl);
- cxl_unregister_region(probe_data->cxl->efx_region);
+ if (probe_data->cxl->hdm_was_committed) {
+ iounmap(probe_data->cxl->ctpio_cxl);
+ cxl_unregister_region(probe_data->cxl->efx_region);
+ } else {
+ cxl_put_root_decoder(probe_data->cxl->cxlrd);
+ }
}
MODULE_IMPORT_NS("CXL");
diff --git a/drivers/net/ethernet/sfc/efx_cxl.h b/drivers/net/ethernet/sfc/efx_cxl.h
index 961639cef692..9a92e386695b 100644
--- a/drivers/net/ethernet/sfc/efx_cxl.h
+++ b/drivers/net/ethernet/sfc/efx_cxl.h
@@ -27,6 +27,7 @@ struct efx_cxl {
struct cxl_root_decoder *cxlrd;
struct cxl_port *endpoint;
struct cxl_endpoint_decoder *cxled;
+ bool hdm_was_committed;
struct cxl_region *efx_region;
void __iomem *ctpio_cxl;
};
diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
index 834dc7e78934..783ad570a6eb 100644
--- a/include/cxl/cxl.h
+++ b/include/cxl/cxl.h
@@ -153,6 +153,21 @@ struct cxl_dpa_partition {
#define CXL_NR_PARTITIONS_MAX 2
+/*
+ * cxl_decoder flags that define the type of memory / devices this
+ * decoder supports as well as configuration lock status See "CXL 2.0
+ * 8.2.5.12.7 CXL HDM Decoder 0 Control Register" for details.
+ * Additionally indicate whether decoder settings were autodetected,
+ * user customized.
+ */
+#define CXL_DECODER_F_RAM BIT(0)
+#define CXL_DECODER_F_PMEM BIT(1)
+#define CXL_DECODER_F_TYPE2 BIT(2)
+#define CXL_DECODER_F_TYPE3 BIT(3)
+#define CXL_DECODER_F_LOCK BIT(4)
+#define CXL_DECODER_F_ENABLE BIT(5)
+#define CXL_DECODER_F_MASK GENMASK(5, 0)
+
struct cxl_memdev_attach {
int (*probe)(struct cxl_memdev *cxlmd);
};
--
2.34.1
^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH v23 14/22] cxl: Define a driver interface for DPA allocation
2026-02-01 15:54 [PATCH v23 00/22] Type2 device basic support alejandro.lucero-palau
` (12 preceding siblings ...)
2026-02-01 15:54 ` [PATCH v23 13/22] sfc: get root decoder alejandro.lucero-palau
@ 2026-02-01 15:54 ` alejandro.lucero-palau
2026-02-11 22:10 ` Cheatham, Benjamin
` (2 more replies)
2026-02-01 15:54 ` [PATCH v23 15/22] sfc: get endpoint decoder alejandro.lucero-palau
` (9 subsequent siblings)
23 siblings, 3 replies; 67+ messages in thread
From: alejandro.lucero-palau @ 2026-02-01 15:54 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Jonathan Cameron
From: Alejandro Lucero <alucerop@amd.com>
Region creation involves finding available DPA (device-physical-address)
capacity to map into HPA (host-physical-address) space.
In order to support CXL Type2 devices, define an API, cxl_request_dpa(),
that tries to allocate the DPA memory the driver requires to operate.
The memory requested must not be larger than the maximum available HPA
obtained previously with cxl_get_hpa_freespace().
Based on https://lore.kernel.org/linux-cxl/168592158743.1948938.7622563891193802610.stgit@dwillia2-xfh.jf.intel.com/
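A minimal sketch of the expected call flow for the two symbols exported
here, cxl_request_dpa() and cxl_dpa_free(). This is illustrative kernel
code, not a full driver; the surrounding context is assumed:

```c
/* Reserve device-physical capacity behind an endpoint decoder.
 * The allocation must be SZ_256M aligned per cxl_request_dpa().
 */
struct cxl_endpoint_decoder *cxled;

cxled = cxl_request_dpa(cxlmd, CXL_PARTMODE_RAM, SZ_256M);
if (IS_ERR(cxled))
	return PTR_ERR(cxled);

/* ... use the decoder for region creation ... */

/* On teardown, release the DPA reservation. */
cxl_dpa_free(cxled);
```

The returned decoder is pinned; the caller owns its lifetime until
cxl_dpa_free() as described in the kernel-doc above.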
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
---
drivers/cxl/core/hdm.c | 84 ++++++++++++++++++++++++++++++++++++++++++
drivers/cxl/cxl.h | 1 +
include/cxl/cxl.h | 5 +++
3 files changed, 90 insertions(+)
diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index a172ce4e9b19..d60a697f12cc 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -3,6 +3,7 @@
#include <linux/seq_file.h>
#include <linux/device.h>
#include <linux/delay.h>
+#include <cxl/cxl.h>
#include "cxlmem.h"
#include "core.h"
@@ -546,6 +547,12 @@ bool cxl_resource_contains_addr(const struct resource *res, const resource_size_
return resource_contains(res, &_addr);
}
+/**
+ * cxl_dpa_free - release DPA (Device Physical Address)
+ * @cxled: endpoint decoder linked to the DPA
+ *
+ * Returns 0 or error.
+ */
int cxl_dpa_free(struct cxl_endpoint_decoder *cxled)
{
struct cxl_port *port = cxled_to_port(cxled);
@@ -572,6 +579,7 @@ int cxl_dpa_free(struct cxl_endpoint_decoder *cxled)
devm_cxl_dpa_release(cxled);
return 0;
}
+EXPORT_SYMBOL_NS_GPL(cxl_dpa_free, "CXL");
int cxl_dpa_set_part(struct cxl_endpoint_decoder *cxled,
enum cxl_partition_mode mode)
@@ -603,6 +611,82 @@ int cxl_dpa_set_part(struct cxl_endpoint_decoder *cxled,
return 0;
}
+static int find_free_decoder(struct device *dev, const void *data)
+{
+ struct cxl_endpoint_decoder *cxled;
+ struct cxl_port *port;
+
+ if (!is_endpoint_decoder(dev))
+ return 0;
+
+ cxled = to_cxl_endpoint_decoder(dev);
+ port = cxled_to_port(cxled);
+
+ return cxled->cxld.id == (port->hdm_end + 1);
+}
+
+static struct cxl_endpoint_decoder *
+cxl_find_free_decoder(struct cxl_memdev *cxlmd)
+{
+ struct cxl_port *endpoint = cxlmd->endpoint;
+ struct device *dev;
+
+ guard(rwsem_read)(&cxl_rwsem.dpa);
+ dev = device_find_child(&endpoint->dev, NULL,
+ find_free_decoder);
+ if (!dev)
+ return NULL;
+
+ return to_cxl_endpoint_decoder(dev);
+}
+
+/**
+ * cxl_request_dpa - search and reserve DPA given input constraints
+ * @cxlmd: memdev with an endpoint port with available decoders
+ * @mode: CXL partition mode (ram vs pmem)
+ * @alloc: dpa size required
+ *
+ * Returns a pointer to a 'struct cxl_endpoint_decoder' on success or
+ * an errno encoded pointer on failure.
+ *
+ * Given that a region needs to allocate from limited HPA capacity it
+ * may be the case that a device has more mappable DPA capacity than
+ * available HPA. The expectation is that @alloc is a driver known
+ * value based on the device capacity but which could not be fully
+ * available due to HPA constraints.
+ *
+ * Returns a pinned cxl_decoder with at least @alloc bytes of capacity
+ * reserved, or an error pointer. The caller is also expected to own the
+ * lifetime of the memdev registration associated with the endpoint to
+ * pin the decoder registered as well.
+ */
+struct cxl_endpoint_decoder *cxl_request_dpa(struct cxl_memdev *cxlmd,
+ enum cxl_partition_mode mode,
+ resource_size_t alloc)
+{
+ int rc;
+
+ if (!IS_ALIGNED(alloc, SZ_256M))
+ return ERR_PTR(-EINVAL);
+
+ struct cxl_endpoint_decoder *cxled __free(put_cxled) =
+ cxl_find_free_decoder(cxlmd);
+
+ if (!cxled)
+ return ERR_PTR(-ENODEV);
+
+ rc = cxl_dpa_set_part(cxled, mode);
+ if (rc)
+ return ERR_PTR(rc);
+
+ rc = cxl_dpa_alloc(cxled, alloc);
+ if (rc)
+ return ERR_PTR(rc);
+
+ return no_free_ptr(cxled);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_request_dpa, "CXL");
+
static int __cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, u64 size)
{
struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index d1b010e5e1d0..2b1f7d687a0e 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -667,6 +667,7 @@ struct cxl_root *find_cxl_root(struct cxl_port *port);
DEFINE_FREE(put_cxl_root, struct cxl_root *, if (_T) put_device(&_T->port.dev))
DEFINE_FREE(put_cxl_port, struct cxl_port *, if (!IS_ERR_OR_NULL(_T)) put_device(&_T->dev))
+DEFINE_FREE(put_cxled, struct cxl_endpoint_decoder *, if (!IS_ERR_OR_NULL(_T)) put_device(&_T->cxld.dev))
DEFINE_FREE(put_cxl_root_decoder, struct cxl_root_decoder *, if (!IS_ERR_OR_NULL(_T)) put_device(&_T->cxlsd.cxld.dev))
DEFINE_FREE(put_cxl_region, struct cxl_region *, if (!IS_ERR_OR_NULL(_T)) put_device(&_T->dev))
diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
index 783ad570a6eb..4802371db00e 100644
--- a/include/cxl/cxl.h
+++ b/include/cxl/cxl.h
@@ -7,6 +7,7 @@
#include <linux/node.h>
#include <linux/ioport.h>
+#include <linux/range.h>
#include <cxl/mailbox.h>
/**
@@ -276,4 +277,8 @@ struct cxl_root_decoder *cxl_get_hpa_freespace(struct cxl_memdev *cxlmd,
unsigned long flags,
resource_size_t *max);
void cxl_put_root_decoder(struct cxl_root_decoder *cxlrd);
+struct cxl_endpoint_decoder *cxl_request_dpa(struct cxl_memdev *cxlmd,
+ enum cxl_partition_mode mode,
+ resource_size_t alloc);
+int cxl_dpa_free(struct cxl_endpoint_decoder *cxled);
#endif /* __CXL_CXL_H__ */
--
2.34.1
^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH v23 15/22] sfc: get endpoint decoder
2026-02-01 15:54 [PATCH v23 00/22] Type2 device basic support alejandro.lucero-palau
` (13 preceding siblings ...)
2026-02-01 15:54 ` [PATCH v23 14/22] cxl: Define a driver interface for DPA allocation alejandro.lucero-palau
@ 2026-02-01 15:54 ` alejandro.lucero-palau
2026-02-01 15:54 ` [PATCH v23 16/22] cxl: Make region type based on endpoint type alejandro.lucero-palau
` (8 subsequent siblings)
23 siblings, 0 replies; 67+ messages in thread
From: alejandro.lucero-palau @ 2026-02-01 15:54 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Martin Habets, Edward Cree, Jonathan Cameron,
Ben Cheatham
From: Alejandro Lucero <alucerop@amd.com>
Use the CXL API for obtaining the DPA (Device Physical Address) space to
use through an endpoint decoder.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
---
drivers/net/ethernet/sfc/efx_cxl.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
index 1a4c1097c315..2cfd0a46225f 100644
--- a/drivers/net/ethernet/sfc/efx_cxl.c
+++ b/drivers/net/ethernet/sfc/efx_cxl.c
@@ -126,6 +126,14 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
cxl_put_root_decoder(cxl->cxlrd);
return -ENOSPC;
}
+
+ cxl->cxled = cxl_request_dpa(cxl->cxlmd, CXL_PARTMODE_RAM,
+ EFX_CTPIO_BUFFER_SIZE);
+ if (IS_ERR(cxl->cxled)) {
+ pci_err(pci_dev, "CXL accel request DPA failed");
+ cxl_put_root_decoder(cxl->cxlrd);
+ return PTR_ERR(cxl->cxled);
+ }
}
probe_data->cxl = cxl;
@@ -141,6 +149,7 @@ void efx_cxl_exit(struct efx_probe_data *probe_data)
iounmap(probe_data->cxl->ctpio_cxl);
cxl_unregister_region(probe_data->cxl->efx_region);
} else {
+ cxl_dpa_free(probe_data->cxl->cxled);
cxl_put_root_decoder(probe_data->cxl->cxlrd);
}
}
--
2.34.1
^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH v23 16/22] cxl: Make region type based on endpoint type
2026-02-01 15:54 [PATCH v23 00/22] Type2 device basic support alejandro.lucero-palau
` (14 preceding siblings ...)
2026-02-01 15:54 ` [PATCH v23 15/22] sfc: get endpoint decoder alejandro.lucero-palau
@ 2026-02-01 15:54 ` alejandro.lucero-palau
2026-02-11 22:11 ` Cheatham, Benjamin
2026-02-01 15:54 ` [PATCH v23 17/22] cxl/region: Factor out interleave ways setup alejandro.lucero-palau
` (7 subsequent siblings)
23 siblings, 1 reply; 67+ messages in thread
From: alejandro.lucero-palau @ 2026-02-01 15:54 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Zhi Wang, Jonathan Cameron, Ben Cheatham,
Alison Schofield, Davidlohr Bueso
From: Alejandro Lucero <alucerop@amd.com>
Current code expects Type3, i.e. CXL_DECODER_HOSTONLYMEM, devices only.
Supporting Type2 implies the region type needs to be based on the
endpoint decoder type, HDM-D[B], instead.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Zhi Wang <zhiw@nvidia.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Davidlohr Bueso <daves@stgolabs.net>
---
drivers/cxl/core/region.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index bdefd088f5f1..f53b2e9fd9e6 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2833,7 +2833,8 @@ static ssize_t create_ram_region_show(struct device *dev,
}
static struct cxl_region *__create_region(struct cxl_root_decoder *cxlrd,
- enum cxl_partition_mode mode, int id)
+ enum cxl_partition_mode mode, int id,
+ enum cxl_decoder_type target_type)
{
int rc;
@@ -2855,7 +2856,7 @@ static struct cxl_region *__create_region(struct cxl_root_decoder *cxlrd,
return ERR_PTR(-EBUSY);
}
- return devm_cxl_add_region(cxlrd, id, mode, CXL_DECODER_HOSTONLYMEM);
+ return devm_cxl_add_region(cxlrd, id, mode, target_type);
}
static ssize_t create_region_store(struct device *dev, const char *buf,
@@ -2869,7 +2870,7 @@ static ssize_t create_region_store(struct device *dev, const char *buf,
if (rc != 1)
return -EINVAL;
- cxlr = __create_region(cxlrd, mode, id);
+ cxlr = __create_region(cxlrd, mode, id, CXL_DECODER_HOSTONLYMEM);
if (IS_ERR(cxlr))
return PTR_ERR(cxlr);
@@ -4036,7 +4037,8 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
do {
cxlr = __create_region(cxlrd, cxlds->part[part].mode,
- atomic_read(&cxlrd->region_id));
+ atomic_read(&cxlrd->region_id),
+ cxled->cxld.target_type);
} while (IS_ERR(cxlr) && PTR_ERR(cxlr) == -EBUSY);
if (IS_ERR(cxlr)) {
--
2.34.1
^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH v23 17/22] cxl/region: Factor out interleave ways setup
2026-02-01 15:54 [PATCH v23 00/22] Type2 device basic support alejandro.lucero-palau
` (15 preceding siblings ...)
2026-02-01 15:54 ` [PATCH v23 16/22] cxl: Make region type based on endpoint type alejandro.lucero-palau
@ 2026-02-01 15:54 ` alejandro.lucero-palau
2026-02-11 22:11 ` Cheatham, Benjamin
2026-02-01 15:54 ` [PATCH v23 18/22] cxl/region: Factor out interleave granularity setup alejandro.lucero-palau
` (6 subsequent siblings)
23 siblings, 1 reply; 67+ messages in thread
From: alejandro.lucero-palau @ 2026-02-01 15:54 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Zhi Wang, Jonathan Cameron, Ben Cheatham,
Alison Schofield
From: Alejandro Lucero <alucerop@amd.com>
Region creation based on Type3 devices is triggered from user space,
allowing memory combination through interleaving.
In preparation for kernel-driven region creation, that is, Type2 drivers
triggering region creation backed by their advertised CXL memory, factor
out a common helper from the user-sysfs region setup for interleave ways.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Zhi Wang <zhiw@nvidia.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
---
drivers/cxl/core/region.c | 43 ++++++++++++++++++++++++---------------
1 file changed, 27 insertions(+), 16 deletions(-)
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index f53b2e9fd9e6..ece1d3df7cf1 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -485,22 +485,14 @@ static ssize_t interleave_ways_show(struct device *dev,
static const struct attribute_group *get_cxl_region_target_group(void);
-static ssize_t interleave_ways_store(struct device *dev,
- struct device_attribute *attr,
- const char *buf, size_t len)
+static int set_interleave_ways(struct cxl_region *cxlr, int val)
{
- struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev->parent);
+ struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld;
- struct cxl_region *cxlr = to_cxl_region(dev);
struct cxl_region_params *p = &cxlr->params;
- unsigned int val, save;
- int rc;
+ int save, rc;
u8 iw;
- rc = kstrtouint(buf, 0, &val);
- if (rc)
- return rc;
-
rc = ways_to_eiw(val, &iw);
if (rc)
return rc;
@@ -515,9 +507,7 @@ static ssize_t interleave_ways_store(struct device *dev,
return -EINVAL;
}
- ACQUIRE(rwsem_write_kill, rwsem)(&cxl_rwsem.region);
- if ((rc = ACQUIRE_ERR(rwsem_write_kill, &rwsem)))
- return rc;
+ lockdep_assert_held_write(&cxl_rwsem.region);
if (p->state >= CXL_CONFIG_INTERLEAVE_ACTIVE)
return -EBUSY;
@@ -525,10 +515,31 @@ static ssize_t interleave_ways_store(struct device *dev,
save = p->interleave_ways;
p->interleave_ways = val;
rc = sysfs_update_group(&cxlr->dev.kobj, get_cxl_region_target_group());
- if (rc) {
+ if (rc)
p->interleave_ways = save;
+
+ return rc;
+}
+
+static ssize_t interleave_ways_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t len)
+{
+ struct cxl_region *cxlr = to_cxl_region(dev);
+ unsigned int val;
+ int rc;
+
+ rc = kstrtouint(buf, 0, &val);
+ if (rc)
+ return rc;
+
+ ACQUIRE(rwsem_write_kill, rwsem)(&cxl_rwsem.region);
+ if ((rc = ACQUIRE_ERR(rwsem_write_kill, &rwsem)))
+ return rc;
+
+ rc = set_interleave_ways(cxlr, val);
+ if (rc)
return rc;
- }
return len;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH v23 18/22] cxl/region: Factor out interleave granularity setup
2026-02-01 15:54 [PATCH v23 00/22] Type2 device basic support alejandro.lucero-palau
` (16 preceding siblings ...)
2026-02-01 15:54 ` [PATCH v23 17/22] cxl/region: Factor out interleave ways setup alejandro.lucero-palau
@ 2026-02-01 15:54 ` alejandro.lucero-palau
2026-02-11 22:11 ` Cheatham, Benjamin
2026-02-01 15:54 ` [PATCH v23 19/22] cxl: Allow region creation by type2 drivers alejandro.lucero-palau
` (5 subsequent siblings)
23 siblings, 1 reply; 67+ messages in thread
From: alejandro.lucero-palau @ 2026-02-01 15:54 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Zhi Wang, Jonathan Cameron, Ben Cheatham,
Alison Schofield
From: Alejandro Lucero <alucerop@amd.com>
Region creation based on Type3 devices is triggered from user space,
allowing memory combination through interleaving.
In preparation for kernel-driven region creation, that is, Type2 drivers
triggering region creation backed by their advertised CXL memory, factor
out a common helper from the user-sysfs region setup for interleave
granularity.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Zhi Wang <zhiw@nvidia.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
---
drivers/cxl/core/region.c | 39 +++++++++++++++++++++++++--------------
1 file changed, 25 insertions(+), 14 deletions(-)
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index ece1d3df7cf1..63c2aeb2ee1f 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -559,21 +559,14 @@ static ssize_t interleave_granularity_show(struct device *dev,
return sysfs_emit(buf, "%d\n", p->interleave_granularity);
}
-static ssize_t interleave_granularity_store(struct device *dev,
- struct device_attribute *attr,
- const char *buf, size_t len)
+static int set_interleave_granularity(struct cxl_region *cxlr, int val)
{
- struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev->parent);
+ struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld;
- struct cxl_region *cxlr = to_cxl_region(dev);
struct cxl_region_params *p = &cxlr->params;
- int rc, val;
+ int rc;
u16 ig;
- rc = kstrtoint(buf, 0, &val);
- if (rc)
- return rc;
-
rc = granularity_to_eig(val, &ig);
if (rc)
return rc;
@@ -589,14 +582,32 @@ static ssize_t interleave_granularity_store(struct device *dev,
if (cxld->interleave_ways > 1 && val != cxld->interleave_granularity)
return -EINVAL;
- ACQUIRE(rwsem_write_kill, rwsem)(&cxl_rwsem.region);
- if ((rc = ACQUIRE_ERR(rwsem_write_kill, &rwsem)))
- return rc;
-
+ lockdep_assert_held_write(&cxl_rwsem.region);
if (p->state >= CXL_CONFIG_INTERLEAVE_ACTIVE)
return -EBUSY;
p->interleave_granularity = val;
+ return 0;
+}
+
+static ssize_t interleave_granularity_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t len)
+{
+ struct cxl_region *cxlr = to_cxl_region(dev);
+ int rc, val;
+
+ rc = kstrtoint(buf, 0, &val);
+ if (rc)
+ return rc;
+
+ ACQUIRE(rwsem_write_kill, rwsem)(&cxl_rwsem.region);
+ if ((rc = ACQUIRE_ERR(rwsem_write_kill, &rwsem)))
+ return rc;
+
+ rc = set_interleave_granularity(cxlr, val);
+ if (rc)
+ return rc;
return len;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH v23 19/22] cxl: Allow region creation by type2 drivers
2026-02-01 15:54 [PATCH v23 00/22] Type2 device basic support alejandro.lucero-palau
` (17 preceding siblings ...)
2026-02-01 15:54 ` [PATCH v23 18/22] cxl/region: Factor out interleave granularity setup alejandro.lucero-palau
@ 2026-02-01 15:54 ` alejandro.lucero-palau
2026-02-11 22:11 ` Cheatham, Benjamin
2026-02-01 15:54 ` [PATCH v23 20/22] cxl: Avoid dax creation for accelerators alejandro.lucero-palau
` (4 subsequent siblings)
23 siblings, 1 reply; 67+ messages in thread
From: alejandro.lucero-palau @ 2026-02-01 15:54 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Jonathan Cameron
From: Alejandro Lucero <alucerop@amd.com>
Creating a CXL region requires userspace intervention through the cxl
sysfs files. Type2 support should allow accelerator drivers to create
such a cxl region from kernel code.
Add that functionality and integrate it with the current support for
memory expanders.
Based on https://lore.kernel.org/linux-cxl/168592159835.1948938.1647215579839222774.stgit@dwillia2-xfh.jf.intel.com/
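Putting the series' API together, a single-way region creation from a
Type2 driver can be sketched as follows; cxlrd and cxled are assumed to
have been obtained earlier with cxl_get_hpa_freespace() and
cxl_request_dpa(), and the fragment is illustrative rather than a
complete driver:

```c
/* Build a committed region from one endpoint decoder (sketch). */
struct cxl_endpoint_decoder *cxled_list[] = { cxled };
struct cxl_region *region;

region = cxl_create_region(cxlrd, cxled_list, 1);
if (IS_ERR(region))
	return PTR_ERR(region);

/* The region is now in the commit state and bound to the
 * cxl_region driver; tear it down with cxl_unregister_region().
 */
```

Interleaved setups would pass more decoders and a matching ways count,
subject to the root decoder's interleave configuration.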
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
---
drivers/cxl/core/region.c | 131 ++++++++++++++++++++++++++++++++++++--
include/cxl/cxl.h | 3 +
2 files changed, 127 insertions(+), 7 deletions(-)
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 63c2aeb2ee1f..293e63dfef22 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2944,6 +2944,14 @@ cxl_find_region_by_name(struct cxl_root_decoder *cxlrd, const char *name)
return to_cxl_region(region_dev);
}
+static void drop_region(struct cxl_region *cxlr)
+{
+ struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
+ struct cxl_port *port = cxlrd_to_port(cxlrd);
+
+ devm_release_action(port->uport_dev, __unregister_region, cxlr);
+}
+
static ssize_t delete_region_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t len)
@@ -4047,14 +4055,12 @@ static int __construct_region(struct cxl_region *cxlr,
return 0;
}
-/* Establish an empty region covering the given HPA range */
-static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
- struct cxl_endpoint_decoder *cxled)
+static struct cxl_region *construct_region_begin(struct cxl_root_decoder *cxlrd,
+ struct cxl_endpoint_decoder *cxled)
{
struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
- struct cxl_port *port = cxlrd_to_port(cxlrd);
struct cxl_dev_state *cxlds = cxlmd->cxlds;
- int rc, part = READ_ONCE(cxled->part);
+ int part = READ_ONCE(cxled->part);
struct cxl_region *cxlr;
do {
@@ -4063,13 +4069,26 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
cxled->cxld.target_type);
} while (IS_ERR(cxlr) && PTR_ERR(cxlr) == -EBUSY);
- if (IS_ERR(cxlr)) {
+ if (IS_ERR(cxlr))
dev_err(cxlmd->dev.parent,
"%s:%s: %s failed assign region: %ld\n",
dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
__func__, PTR_ERR(cxlr));
+
+ return cxlr;
+}
+
+/* Establish an empty region covering the given HPA range */
+static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
+ struct cxl_endpoint_decoder *cxled)
+{
+ struct cxl_port *port = cxlrd_to_port(cxlrd);
+ struct cxl_region *cxlr;
+ int rc;
+
+ cxlr = construct_region_begin(cxlrd, cxled);
+ if (IS_ERR(cxlr))
return cxlr;
- }
rc = __construct_region(cxlr, cxlrd, cxled);
if (rc) {
@@ -4080,6 +4099,104 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
return cxlr;
}
+DEFINE_FREE(cxl_region_drop, struct cxl_region *, if (_T) drop_region(_T))
+
+static struct cxl_region *
+__construct_new_region(struct cxl_root_decoder *cxlrd,
+ struct cxl_endpoint_decoder **cxled, int ways)
+{
+ struct cxl_memdev *cxlmd = cxled_to_memdev(cxled[0]);
+ struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld;
+ struct cxl_region_params *p;
+ resource_size_t size = 0;
+ int rc, i;
+
+ struct cxl_region *cxlr __free(cxl_region_drop) =
+ construct_region_begin(cxlrd, cxled[0]);
+ if (IS_ERR(cxlr))
+ return cxlr;
+
+ guard(rwsem_write)(&cxl_rwsem.region);
+
+ /*
+ * Sanity check. This should not happen with an accel driver handling
+ * the region creation.
+ */
+ p = &cxlr->params;
+ if (p->state >= CXL_CONFIG_INTERLEAVE_ACTIVE) {
+ dev_err(cxlmd->dev.parent,
+ "%s:%s: %s unexpected region state\n",
+ dev_name(&cxlmd->dev), dev_name(&cxled[0]->cxld.dev),
+ __func__);
+ return ERR_PTR(-EBUSY);
+ }
+
+ rc = set_interleave_ways(cxlr, ways);
+ if (rc)
+ return ERR_PTR(rc);
+
+ rc = set_interleave_granularity(cxlr, cxld->interleave_granularity);
+ if (rc)
+ return ERR_PTR(rc);
+
+ scoped_guard(rwsem_read, &cxl_rwsem.dpa) {
+ for (i = 0; i < ways; i++) {
+ if (!cxled[i]->dpa_res)
+ return ERR_PTR(-EINVAL);
+ size += resource_size(cxled[i]->dpa_res);
+ }
+
+ rc = alloc_hpa(cxlr, size);
+ if (rc)
+ return ERR_PTR(rc);
+
+ for (i = 0; i < ways; i++) {
+ rc = cxl_region_attach(cxlr, cxled[i], 0);
+ if (rc)
+ return ERR_PTR(rc);
+ }
+ }
+
+ rc = cxl_region_decode_commit(cxlr);
+ if (rc)
+ return ERR_PTR(rc);
+
+ p->state = CXL_CONFIG_COMMIT;
+
+ return no_free_ptr(cxlr);
+}
+
+/**
+ * cxl_create_region - Establish a region given an endpoint decoder
+ * @cxlrd: root decoder to allocate HPA
+ * @cxled: endpoint decoders with reserved DPA capacity
+ * @ways: interleave ways required
+ *
+ * Returns a fully formed region in the commit state and attached to the
+ * cxl_region driver.
+ */
+struct cxl_region *cxl_create_region(struct cxl_root_decoder *cxlrd,
+ struct cxl_endpoint_decoder **cxled,
+ int ways)
+{
+ struct cxl_region *cxlr;
+
+ mutex_lock(&cxlrd->range_lock);
+ cxlr = __construct_new_region(cxlrd, cxled, ways);
+ mutex_unlock(&cxlrd->range_lock);
+ if (IS_ERR(cxlr))
+ return cxlr;
+
+ if (device_attach(&cxlr->dev) <= 0) {
+ dev_err(&cxlr->dev, "failed to create region\n");
+ drop_region(cxlr);
+ return ERR_PTR(-ENODEV);
+ }
+
+ return cxlr;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_create_region, "CXL");
+
static struct cxl_region *
cxl_find_region_by_range(struct cxl_root_decoder *cxlrd, struct range *hpa)
{
diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
index 4802371db00e..50acbd13bcf8 100644
--- a/include/cxl/cxl.h
+++ b/include/cxl/cxl.h
@@ -281,4 +281,7 @@ struct cxl_endpoint_decoder *cxl_request_dpa(struct cxl_memdev *cxlmd,
enum cxl_partition_mode mode,
resource_size_t alloc);
int cxl_dpa_free(struct cxl_endpoint_decoder *cxled);
+struct cxl_region *cxl_create_region(struct cxl_root_decoder *cxlrd,
+ struct cxl_endpoint_decoder **cxled,
+ int ways);
#endif /* __CXL_CXL_H__ */
--
2.34.1
^ permalink raw reply related [flat|nested] 67+ messages in thread
* [PATCH v23 20/22] cxl: Avoid dax creation for accelerators
2026-02-01 15:54 [PATCH v23 00/22] Type2 device basic support alejandro.lucero-palau
` (18 preceding siblings ...)
2026-02-01 15:54 ` [PATCH v23 19/22] cxl: Allow region creation by type2 drivers alejandro.lucero-palau
@ 2026-02-01 15:54 ` alejandro.lucero-palau
2026-02-11 22:10 ` Cheatham, Benjamin
2026-02-01 15:54 ` [PATCH v23 21/22] sfc: create cxl region alejandro.lucero-palau
` (3 subsequent siblings)
23 siblings, 1 reply; 67+ messages in thread
From: alejandro.lucero-palau @ 2026-02-01 15:54 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Jonathan Cameron, Davidlohr Bueso, Ben Cheatham
From: Alejandro Lucero <alucerop@amd.com>
By definition, a Type2 CXL device uses the host-managed memory for
device-specific functionality, therefore that memory should not be made
available for other uses such as device-dax.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Davidlohr Bueso <daves@stgolabs.net>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
---
drivers/cxl/core/region.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 293e63dfef22..12df717cc881 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -4441,6 +4441,13 @@ static int cxl_region_probe(struct device *dev)
if (rc)
return rc;
+ /*
+ * HDM-D[B] (device-memory) regions have accelerator specific usage.
+ * Skip device-dax registration.
+ */
+ if (cxlr->type == CXL_DECODER_DEVMEM)
+ return 0;
+
/*
* From this point on any path that changes the region's state away from
* CXL_CONFIG_COMMIT is also responsible for releasing the driver.
--
2.34.1
* [PATCH v23 21/22] sfc: create cxl region
2026-02-01 15:54 [PATCH v23 00/22] Type2 device basic support alejandro.lucero-palau
` (19 preceding siblings ...)
2026-02-01 15:54 ` [PATCH v23 20/22] cxl: Avoid dax creation for accelerators alejandro.lucero-palau
@ 2026-02-01 15:54 ` alejandro.lucero-palau
2026-02-13 16:14 ` [PATCH " Gregory Price
2026-02-01 15:54 ` [PATCH v23 22/22] sfc: support pio mapping based on cxl alejandro.lucero-palau
` (2 subsequent siblings)
23 siblings, 1 reply; 67+ messages in thread
From: alejandro.lucero-palau @ 2026-02-01 15:54 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Jonathan Cameron
From: Alejandro Lucero <alucerop@amd.com>
Use the CXL API to create a region using the endpoint decoder related
to a DPA range.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
---
drivers/net/ethernet/sfc/efx_cxl.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
index 2cfd0a46225f..4d5f3974e51d 100644
--- a/drivers/net/ethernet/sfc/efx_cxl.c
+++ b/drivers/net/ethernet/sfc/efx_cxl.c
@@ -134,6 +134,14 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
cxl_put_root_decoder(cxl->cxlrd);
return PTR_ERR(cxl->cxled);
}
+
+ cxl->efx_region = cxl_create_region(cxl->cxlrd, &cxl->cxled, 1);
+ if (IS_ERR(cxl->efx_region)) {
+ pci_err(pci_dev, "CXL accel create region failed");
+ cxl_put_root_decoder(cxl->cxlrd);
+ cxl_dpa_free(cxl->cxled);
+ return PTR_ERR(cxl->efx_region);
+ }
}
probe_data->cxl = cxl;
@@ -147,11 +155,11 @@ void efx_cxl_exit(struct efx_probe_data *probe_data)
if (probe_data->cxl->hdm_was_committed) {
iounmap(probe_data->cxl->ctpio_cxl);
- cxl_unregister_region(probe_data->cxl->efx_region);
} else {
cxl_dpa_free(probe_data->cxl->cxled);
cxl_put_root_decoder(probe_data->cxl->cxlrd);
}
+ cxl_unregister_region(probe_data->cxl->efx_region);
}
MODULE_IMPORT_NS("CXL");
--
2.34.1
* [PATCH v23 22/22] sfc: support pio mapping based on cxl
2026-02-01 15:54 [PATCH v23 00/22] Type2 device basic support alejandro.lucero-palau
` (20 preceding siblings ...)
2026-02-01 15:54 ` [PATCH v23 21/22] sfc: create cxl region alejandro.lucero-palau
@ 2026-02-01 15:54 ` alejandro.lucero-palau
2026-02-13 16:14 ` [PATCH " Gregory Price
2026-02-11 22:12 ` [PATCH v23 00/22] Type2 device basic support Cheatham, Benjamin
2026-03-09 22:43 ` PJ Waskiewicz
23 siblings, 1 reply; 67+ messages in thread
From: alejandro.lucero-palau @ 2026-02-01 15:54 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Jonathan Cameron
From: Alejandro Lucero <alucerop@amd.com>
A PIO buffer is a region of device memory to which the driver can write a
packet for TX, with the device handling the transmit doorbell without
requiring a DMA to fetch the packet data. This helps reduce latency in
certain exchanges. With the CXL.mem protocol this latency can be lowered
further.
With a device supporting CXL and successfully initialised, use the CXL
region to map the memory range and use this mapping for PIO buffers.
Disable those CXL-based PIO buffers if the CXL code invokes the callback
for potential endpoint removal.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
---
drivers/net/ethernet/sfc/ef10.c | 50 +++++++++++++++++++++++----
drivers/net/ethernet/sfc/efx_cxl.c | 33 ++++++++++++++----
drivers/net/ethernet/sfc/net_driver.h | 2 ++
drivers/net/ethernet/sfc/nic.h | 3 ++
4 files changed, 75 insertions(+), 13 deletions(-)
diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
index fcec81f862ec..2bb6d3136c7c 100644
--- a/drivers/net/ethernet/sfc/ef10.c
+++ b/drivers/net/ethernet/sfc/ef10.c
@@ -24,6 +24,7 @@
#include <linux/wait.h>
#include <linux/workqueue.h>
#include <net/udp_tunnel.h>
+#include "efx_cxl.h"
/* Hardware control for EF10 architecture including 'Huntington'. */
@@ -106,7 +107,7 @@ static int efx_ef10_get_vf_index(struct efx_nic *efx)
static int efx_ef10_init_datapath_caps(struct efx_nic *efx)
{
- MCDI_DECLARE_BUF(outbuf, MC_CMD_GET_CAPABILITIES_V4_OUT_LEN);
+ MCDI_DECLARE_BUF(outbuf, MC_CMD_GET_CAPABILITIES_V7_OUT_LEN);
struct efx_ef10_nic_data *nic_data = efx->nic_data;
size_t outlen;
int rc;
@@ -177,6 +178,12 @@ static int efx_ef10_init_datapath_caps(struct efx_nic *efx)
efx->num_mac_stats);
}
+ if (outlen < MC_CMD_GET_CAPABILITIES_V7_OUT_LEN)
+ nic_data->datapath_caps3 = 0;
+ else
+ nic_data->datapath_caps3 = MCDI_DWORD(outbuf,
+ GET_CAPABILITIES_V7_OUT_FLAGS3);
+
return 0;
}
@@ -919,6 +926,9 @@ static void efx_ef10_forget_old_piobufs(struct efx_nic *efx)
static void efx_ef10_remove(struct efx_nic *efx)
{
struct efx_ef10_nic_data *nic_data = efx->nic_data;
+#ifdef CONFIG_SFC_CXL
+ struct efx_probe_data *probe_data;
+#endif
int rc;
#ifdef CONFIG_SFC_SRIOV
@@ -949,7 +959,12 @@ static void efx_ef10_remove(struct efx_nic *efx)
efx_mcdi_rx_free_indir_table(efx);
+#ifdef CONFIG_SFC_CXL
+ probe_data = container_of(efx, struct efx_probe_data, efx);
+ if (nic_data->wc_membase && !probe_data->cxl_pio_in_use)
+#else
if (nic_data->wc_membase)
+#endif
iounmap(nic_data->wc_membase);
rc = efx_mcdi_free_vis(efx);
@@ -1140,6 +1155,9 @@ static int efx_ef10_dimension_resources(struct efx_nic *efx)
unsigned int channel_vis, pio_write_vi_base, max_vis;
struct efx_ef10_nic_data *nic_data = efx->nic_data;
unsigned int uc_mem_map_size, wc_mem_map_size;
+#ifdef CONFIG_SFC_CXL
+ struct efx_probe_data *probe_data;
+#endif
void __iomem *membase;
int rc;
@@ -1263,8 +1281,25 @@ static int efx_ef10_dimension_resources(struct efx_nic *efx)
iounmap(efx->membase);
efx->membase = membase;
- /* Set up the WC mapping if needed */
- if (wc_mem_map_size) {
+ if (!wc_mem_map_size)
+ goto skip_pio;
+
+ /* Set up the WC mapping */
+
+#ifdef CONFIG_SFC_CXL
+ probe_data = container_of(efx, struct efx_probe_data, efx);
+ if ((nic_data->datapath_caps3 &
+ (1 << MC_CMD_GET_CAPABILITIES_V7_OUT_CXL_CONFIG_ENABLE_LBN)) &&
+ probe_data->cxl_pio_initialised) {
+ /* Using PIO through CXL mapping? */
+ nic_data->pio_write_base = probe_data->cxl->ctpio_cxl +
+ (pio_write_vi_base * efx->vi_stride +
+ ER_DZ_TX_PIOBUF - uc_mem_map_size);
+ probe_data->cxl_pio_in_use = true;
+ } else
+#endif
+ {
+ /* Using legacy PIO BAR mapping */
nic_data->wc_membase = ioremap_wc(efx->membase_phys +
uc_mem_map_size,
wc_mem_map_size);
@@ -1279,12 +1314,13 @@ static int efx_ef10_dimension_resources(struct efx_nic *efx)
nic_data->wc_membase +
(pio_write_vi_base * efx->vi_stride + ER_DZ_TX_PIOBUF -
uc_mem_map_size);
-
- rc = efx_ef10_link_piobufs(efx);
- if (rc)
- efx_ef10_free_piobufs(efx);
}
+ rc = efx_ef10_link_piobufs(efx);
+ if (rc)
+ efx_ef10_free_piobufs(efx);
+
+skip_pio:
netif_dbg(efx, probe, efx->net_dev,
"memory BAR at %pa (virtual %p+%x UC, %p+%x WC)\n",
&efx->membase_phys, efx->membase, uc_mem_map_size,
diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
index 4d5f3974e51d..c13e1f2bf7ea 100644
--- a/drivers/net/ethernet/sfc/efx_cxl.c
+++ b/drivers/net/ethernet/sfc/efx_cxl.c
@@ -11,6 +11,7 @@
#include <cxl/pci.h>
#include "net_driver.h"
#include "efx_cxl.h"
+#include "efx.h"
#define EFX_CTPIO_BUFFER_SIZE SZ_256M
@@ -138,14 +139,34 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
cxl->efx_region = cxl_create_region(cxl->cxlrd, &cxl->cxled, 1);
if (IS_ERR(cxl->efx_region)) {
pci_err(pci_dev, "CXL accel create region failed");
- cxl_put_root_decoder(cxl->cxlrd);
- cxl_dpa_free(cxl->cxled);
- return PTR_ERR(cxl->efx_region);
+ rc = PTR_ERR(cxl->efx_region);
+ goto err_region;
+ }
+
+ rc = cxl_get_region_range(cxl->efx_region, &range);
+ if (rc) {
+ pci_err(pci_dev, "CXL getting regions params failed");
+ goto err_map;
+ }
+
+ cxl->ctpio_cxl = ioremap(range.start, range.end - range.start + 1);
+ if (!cxl->ctpio_cxl) {
+ pci_err(pci_dev, "CXL ioremap region (%pra) failed", &range);
+ rc = -ENOMEM;
+ goto err_map;
}
}
probe_data->cxl = cxl;
+ probe_data->cxl_pio_initialised = true;
return 0;
+
+err_map:
+ cxl_unregister_region(cxl->efx_region);
+err_region:
+ cxl_put_root_decoder(cxl->cxlrd);
+ cxl_dpa_free(cxl->cxled);
+ return rc;
}
void efx_cxl_exit(struct efx_probe_data *probe_data)
@@ -153,9 +174,9 @@ void efx_cxl_exit(struct efx_probe_data *probe_data)
if (!probe_data->cxl)
return;
- if (probe_data->cxl->hdm_was_committed) {
- iounmap(probe_data->cxl->ctpio_cxl);
- } else {
+ iounmap(probe_data->cxl->ctpio_cxl);
+
+ if (!probe_data->cxl->hdm_was_committed) {
cxl_dpa_free(probe_data->cxl->cxled);
cxl_put_root_decoder(probe_data->cxl->cxlrd);
}
diff --git a/drivers/net/ethernet/sfc/net_driver.h b/drivers/net/ethernet/sfc/net_driver.h
index 3964b2c56609..bea4eecdf842 100644
--- a/drivers/net/ethernet/sfc/net_driver.h
+++ b/drivers/net/ethernet/sfc/net_driver.h
@@ -1207,6 +1207,7 @@ struct efx_cxl;
* @efx: Efx NIC details
* @cxl: details of related cxl objects
* @cxl_pio_initialised: cxl initialization outcome.
+ * @cxl_pio_in_use: PIO using CXL mapping
*/
struct efx_probe_data {
struct pci_dev *pci_dev;
@@ -1214,6 +1215,7 @@ struct efx_probe_data {
#ifdef CONFIG_SFC_CXL
struct efx_cxl *cxl;
bool cxl_pio_initialised;
+ bool cxl_pio_in_use;
#endif
};
diff --git a/drivers/net/ethernet/sfc/nic.h b/drivers/net/ethernet/sfc/nic.h
index 9fa5c4c713ab..c87cc9214690 100644
--- a/drivers/net/ethernet/sfc/nic.h
+++ b/drivers/net/ethernet/sfc/nic.h
@@ -152,6 +152,8 @@ enum {
* %MC_CMD_GET_CAPABILITIES response)
* @datapath_caps2: Further Capabilities of datapath firmware (FLAGS2 field of
* %MC_CMD_GET_CAPABILITIES response)
+ * @datapath_caps3: Further Capabilities of datapath firmware (FLAGS3 field of
+ * %MC_CMD_GET_CAPABILITIES response)
* @rx_dpcpu_fw_id: Firmware ID of the RxDPCPU
* @tx_dpcpu_fw_id: Firmware ID of the TxDPCPU
* @must_probe_vswitching: Flag: vswitching has yet to be setup after MC reboot
@@ -186,6 +188,7 @@ struct efx_ef10_nic_data {
bool must_check_datapath_caps;
u32 datapath_caps;
u32 datapath_caps2;
+ u32 datapath_caps3;
unsigned int rx_dpcpu_fw_id;
unsigned int tx_dpcpu_fw_id;
bool must_probe_vswitching;
--
2.34.1
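The capability gate added in the patch above tests a FLAGS3 bit by its `_LBN` (bit-number) suffix, following the sfc MCDI header convention. A minimal userspace sketch of that check follows; the bit number used here is a placeholder, the real value comes from the sfc MCDI protocol headers:

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder bit number; the real value lives in the sfc MCDI headers. */
#define MC_CMD_GET_CAPABILITIES_V7_OUT_CXL_CONFIG_ENABLE_LBN 14

/*
 * Mirror of the gate in efx_ef10_dimension_resources(): CXL PIO is used
 * only when firmware advertises the capability bit AND the driver's CXL
 * init succeeded.
 */
static int cxl_pio_supported(uint32_t datapath_caps3, int cxl_pio_initialised)
{
	return (datapath_caps3 &
		(1u << MC_CMD_GET_CAPABILITIES_V7_OUT_CXL_CONFIG_ENABLE_LBN)) &&
	       cxl_pio_initialised;
}
```

Both conditions must hold; otherwise the driver falls back to the legacy `ioremap_wc()` BAR mapping, as the patch shows.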
* Re: [PATCH v23 20/22] cxl: Avoid dax creation for accelerators
2026-02-01 15:54 ` [PATCH v23 20/22] cxl: Avoid dax creation for accelerators alejandro.lucero-palau
@ 2026-02-11 22:10 ` Cheatham, Benjamin
2026-02-19 10:50 ` Alejandro Lucero Palau
0 siblings, 1 reply; 67+ messages in thread
From: Cheatham, Benjamin @ 2026-02-11 22:10 UTC (permalink / raw)
To: alejandro.lucero-palau
Cc: Alejandro Lucero, Jonathan Cameron, Davidlohr Bueso, linux-cxl,
netdev, dan.j.williams, edward.cree, davem, kuba, pabeni,
edumazet, dave.jiang
On 2/1/2026 9:54 AM, alejandro.lucero-palau@amd.com wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> By definition a type2 cxl device will use the host managed memory for
> specific functionality, therefore it should not be available to other
> uses.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Davidlohr Bueso <daves@stgolabs.net>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
> ---
> drivers/cxl/core/region.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 293e63dfef22..12df717cc881 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -4441,6 +4441,13 @@ static int cxl_region_probe(struct device *dev)
> if (rc)
> return rc;
>
> + /*
> + * HDM-D[B] (device-memory) regions have accelerator specific usage.
> + * Skip device-dax registration.
> + */
> + if (cxlr->type == CXL_DECODER_DEVMEM)
> + return 0;
Minor nit: Should probably move this to be the first thing in the function. It would save
having to acquire a lock in cxl_region_can_probe() above. Keep my reviewed-by either way,
it's really just a minor optimization.
> +
> /*
> * From this point on any path that changes the region's state away from
> * CXL_CONFIG_COMMIT is also responsible for releasing the driver.
* Re: [PATCH v23 12/22] cxl: Define a driver interface for HPA free space enumeration
2026-02-01 15:54 ` [PATCH v23 12/22] cxl: Define a driver interface for HPA free space enumeration alejandro.lucero-palau
@ 2026-02-11 22:10 ` Cheatham, Benjamin
2026-02-19 9:58 ` Alejandro Lucero Palau
2026-02-20 15:42 ` Dave Jiang
2026-02-26 16:13 ` Alejandro Lucero Palau
2 siblings, 1 reply; 67+ messages in thread
From: Cheatham, Benjamin @ 2026-02-11 22:10 UTC (permalink / raw)
To: alejandro.lucero-palau
Cc: Alejandro Lucero, Jonathan Cameron, linux-cxl, netdev,
dan.j.williams, edward.cree, davem, kuba, pabeni, edumazet,
dave.jiang
On 2/1/2026 9:54 AM, alejandro.lucero-palau@amd.com wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> CXL region creation involves allocating capacity from Device Physical
> Address (DPA) and assigning it to decode a given Host Physical Address
> (HPA). Before determining how much DPA to allocate the amount of available
> HPA must be determined. Also, not all HPA is created equal, some HPA
> targets RAM, some targets PMEM, some is prepared for device-memory flows
> like HDM-D and HDM-DB, and some is HDM-H (host-only).
>
> In order to support Type2 CXL devices, wrap all of those concerns into
> an API that retrieves a root decoder (platform CXL window) that fits the
> specified constraints and the capacity available for a new region.
>
> Add a complementary function for releasing the reference to such root
> decoder.
>
> Based on https://lore.kernel.org/linux-cxl/168592159290.1948938.13522227102445462976.stgit@dwillia2-xfh.jf.intel.com/
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
> drivers/cxl/core/region.c | 164 ++++++++++++++++++++++++++++++++++++++
> drivers/cxl/cxl.h | 3 +
> include/cxl/cxl.h | 6 ++
> 3 files changed, 173 insertions(+)
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 954b8fcdbac6..bdefd088f5f1 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -705,6 +705,170 @@ static int free_hpa(struct cxl_region *cxlr)
> return 0;
> }
>
> +struct cxlrd_max_context {
> + struct device * const *host_bridges;
> + int interleave_ways;
> + unsigned long flags;
> + resource_size_t max_hpa;
> + struct cxl_root_decoder *cxlrd;
> +};
> +
> +static int find_max_hpa(struct device *dev, void *data)
> +{
> + struct cxlrd_max_context *ctx = data;
> + struct cxl_switch_decoder *cxlsd;
> + struct cxl_root_decoder *cxlrd;
> + struct resource *res, *prev;
> + struct cxl_decoder *cxld;
> + resource_size_t free = 0;
> + resource_size_t max;
> + int found = 0;
> +
> + if (!is_root_decoder(dev))
> + return 0;
> +
> + cxlrd = to_cxl_root_decoder(dev);
> + cxlsd = &cxlrd->cxlsd;
> + cxld = &cxlsd->cxld;
> +
> + if ((cxld->flags & ctx->flags) != ctx->flags) {
> + dev_dbg(dev, "flags not matching: %08lx vs %08lx\n",
> + cxld->flags, ctx->flags);
> + return 0;
> + }
> +
> + for (int i = 0; i < ctx->interleave_ways; i++) {
> + for (int j = 0; j < ctx->interleave_ways; j++) {
> + if (ctx->host_bridges[i] == cxlsd->target[j]->dport_dev) {
> + found++;
> + break;
> + }
> + }
> + }
This may be over complicated. I'm not quite sure how it works (I'm just slow today I guess), but I understand
what the intention is based on the debug print below. My issue is that ctx->host_bridges is only set to 1 host
bridge (endpoint->host_bridge) in cxl_get_hpa_freespace(), which is the only caller of this function. At that
point, why have the outer loop at all? At that point, you could also simplify ctx->host_bridges to only
be a struct device * const.
Maybe this gets called elsewhere later on in the series? I haven't looked at the rest yet. If I'm wrong, then
I'd probably add a comment saying what the cxlsd->target[] entries are supposed to be pointing at.
> +
> + if (found != ctx->interleave_ways) {
> + dev_dbg(dev,
> + "Not enough host bridges. Found %d for %d interleave ways requested\n",
> + found, ctx->interleave_ways);
> + return 0;
> + }
> +
> + /*
> + * Walk the root decoder resource range relying on cxl_rwsem.region to
> + * preclude sibling arrival/departure and find the largest free space
> + * gap.
> + */
> + lockdep_assert_held_read(&cxl_rwsem.region);
> + res = cxlrd->res->child;
> +
> + /* With no resource child the whole parent resource is available */
> + if (!res)
> + max = resource_size(cxlrd->res);
> + else
> + max = 0;
> +
> + for (prev = NULL; res; prev = res, res = res->sibling) {
> + if (!prev && res->start == cxlrd->res->start &&
> + res->end == cxlrd->res->end) {
> + max = resource_size(cxlrd->res);
> + break;
> + }
> + /*
> + * Sanity check for preventing arithmetic problems below as a
> + * resource with size 0 could imply using the end field below
> + * when set to unsigned zero - 1 or all f in hex.
> + */
> + if (prev && !resource_size(prev))
> + continue;
> +
> + if (!prev && res->start > cxlrd->res->start) {
> + free = res->start - cxlrd->res->start;
> + max = max(free, max);
> + }
> + if (prev && res->start > prev->end + 1) {
> + free = res->start - prev->end + 1;
> + max = max(free, max);
> + }
> + }
> +
> + if (prev && prev->end + 1 < cxlrd->res->end + 1) {
> + free = cxlrd->res->end + 1 - prev->end + 1;
> + max = max(free, max);
> + }
> +
> + dev_dbg(cxlrd_dev(cxlrd), "found %pa bytes of free space\n", &max);
> + if (max > ctx->max_hpa) {
> + if (ctx->cxlrd)
> + put_device(cxlrd_dev(ctx->cxlrd));
> + get_device(cxlrd_dev(cxlrd));
> + ctx->cxlrd = cxlrd;
> + ctx->max_hpa = max;
> + }
> + return 0;
> +}
> +
> +/**
> + * cxl_get_hpa_freespace - find a root decoder with free capacity per constraints
> + * @cxlmd: the mem device requiring the HPA
> + * @interleave_ways: number of entries in @host_bridges
> + * @flags: CXL_DECODER_F flags for selecting RAM vs PMEM, and Type2 device
> + * @max_avail_contig: output parameter of max contiguous bytes available in the
> + * returned decoder
> + *
> + * Returns a pointer to a struct cxl_root_decoder
> + *
> + * The return tuple of a 'struct cxl_root_decoder' and 'bytes available given
> + * in (@max_avail_contig))' is a point in time snapshot. If by the time the
> + * caller goes to use this decoder and its capacity is reduced then caller needs
> + * to loop and retry.
> + *
> + * The returned root decoder has an elevated reference count that needs to be
> + * put with cxl_put_root_decoder(cxlrd).
> + */
> +struct cxl_root_decoder *cxl_get_hpa_freespace(struct cxl_memdev *cxlmd,
> + int interleave_ways,
> + unsigned long flags,
> + resource_size_t *max_avail_contig)
> +{
> + struct cxlrd_max_context ctx = {
> + .flags = flags,
> + .interleave_ways = interleave_ways,
> + };
> + struct cxl_port *root_port;
> + struct cxl_port *endpoint;
> +
> + endpoint = cxlmd->endpoint;
> + if (!endpoint) {
> + dev_dbg(&cxlmd->dev, "endpoint not linked to memdev\n");
> + return ERR_PTR(-ENXIO);
> + }
> +
> + ctx.host_bridges = &endpoint->host_bridge;
Mentioned earlier, interleave_ways is effectively hardcoded to 1 (unless I'm misunderstanding
something). I think what you want here is to go to the CXL root and pass in the children (i.e. host bridges)?
I'm not sure of what the fix is to get the intended behavior.
It may be worth getting rid of the interleave_ways portion of this function and
add it later when someone needs it. You could also explain it's hard coded to 1/unused
in the doc comment if you know of an immediate need for it.
> +
> + struct cxl_root *root __free(put_cxl_root) = find_cxl_root(endpoint);
> + if (!root) {
> + dev_dbg(&endpoint->dev, "endpoint is not related to a root port\n");
> + return ERR_PTR(-ENXIO);
> + }
> +
> + root_port = &root->port;
> + scoped_guard(rwsem_read, &cxl_rwsem.region)
> + device_for_each_child(&root_port->dev, &ctx, find_max_hpa);
Can just use a guard() here.
> +
> + if (!ctx.cxlrd)
> + return ERR_PTR(-ENOMEM);
> +
> + *max_avail_contig = ctx.max_hpa;
> + return ctx.cxlrd;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_get_hpa_freespace, "CXL");
> +
> +void cxl_put_root_decoder(struct cxl_root_decoder *cxlrd)
> +{
> + put_device(cxlrd_dev(cxlrd));
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_put_root_decoder, "CXL");
> +
> static ssize_t size_store(struct device *dev, struct device_attribute *attr,
> const char *buf, size_t len)
> {
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 944c5d1ccceb..c7d9b2c2908f 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -706,6 +706,9 @@ struct cxl_root_decoder *to_cxl_root_decoder(struct device *dev);
> struct cxl_switch_decoder *to_cxl_switch_decoder(struct device *dev);
> struct cxl_endpoint_decoder *to_cxl_endpoint_decoder(struct device *dev);
> bool is_root_decoder(struct device *dev);
> +
> +#define cxlrd_dev(cxlrd) (&(cxlrd)->cxlsd.cxld.dev)
> +
> bool is_switch_decoder(struct device *dev);
> bool is_endpoint_decoder(struct device *dev);
> struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
> diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
> index 92880c26b2d5..834dc7e78934 100644
> --- a/include/cxl/cxl.h
> +++ b/include/cxl/cxl.h
> @@ -255,4 +255,10 @@ struct cxl_endpoint_decoder *cxl_get_committed_decoder(struct cxl_memdev *cxlmd,
> struct range;
> int cxl_get_region_range(struct cxl_region *region, struct range *range);
> void cxl_unregister_region(struct cxl_region *cxlr);
> +struct cxl_port;
> +struct cxl_root_decoder *cxl_get_hpa_freespace(struct cxl_memdev *cxlmd,
> + int interleave_ways,
> + unsigned long flags,
> + resource_size_t *max);
> +void cxl_put_root_decoder(struct cxl_root_decoder *cxlrd);
> #endif /* __CXL_CXL_H__ */
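The largest-free-gap walk reviewed above can be modelled in plain userspace C. This is a simplified sketch, not the kernel code: `struct res` stands in for `struct resource`, the child chain is assumed sorted and non-overlapping, and bounds are inclusive as in the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct resource: [start, end] inclusive. */
struct res {
	unsigned long long start, end;
	struct res *sibling;
};

/*
 * Largest free gap inside parent given a sorted chain of child
 * allocations (models the walk in find_max_hpa()).
 */
static unsigned long long max_free_gap(const struct res *parent,
				       const struct res *child)
{
	unsigned long long max = 0, gap, cursor = parent->start;

	for (const struct res *r = child; r; r = r->sibling) {
		if (r->start > cursor) {
			gap = r->start - cursor;
			if (gap > max)
				max = gap;
		}
		cursor = r->end + 1;	/* first address after this child */
	}
	/* Tail gap between the last child and the end of the parent. */
	if (parent->end + 1 > cursor) {
		gap = parent->end + 1 - cursor;
		if (gap > max)
			max = gap;
	}
	return max;
}
```

Note that with inclusive bounds the gap between two neighbours is `next->start - (prev->end + 1)`, which is exactly the arithmetic the review is scrutinising.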
* Re: [PATCH v23 11/22] sfc: obtain decoder and region if committed by firmware
2026-02-01 15:54 ` [PATCH v23 11/22] sfc: obtain decoder and region if committed by firmware alejandro.lucero-palau
@ 2026-02-11 22:10 ` Cheatham, Benjamin
2026-02-19 8:55 ` Alejandro Lucero Palau
2026-02-19 23:31 ` Dave Jiang
2026-03-20 17:25 ` Edward Cree
2 siblings, 1 reply; 67+ messages in thread
From: Cheatham, Benjamin @ 2026-02-11 22:10 UTC (permalink / raw)
To: alejandro.lucero-palau
Cc: Alejandro Lucero, linux-cxl, netdev, dan.j.williams, edward.cree,
davem, kuba, pabeni, edumazet, dave.jiang
On 2/1/2026 9:54 AM, alejandro.lucero-palau@amd.com wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Check if device HDM is already committed during firmware/BIOS
> initialization.
>
> A CXL region should exist if so after memdev allocation/initialization.
> Get HPA from region and map it.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> ---
> drivers/net/ethernet/sfc/efx_cxl.c | 28 +++++++++++++++++++++++++++-
> 1 file changed, 27 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
> index a77ef4783fcb..3536eccf1b2a 100644
> --- a/drivers/net/ethernet/sfc/efx_cxl.c
> +++ b/drivers/net/ethernet/sfc/efx_cxl.c
> @@ -19,6 +19,7 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
> struct efx_nic *efx = &probe_data->efx;
> struct pci_dev *pci_dev = efx->pci_dev;
> struct efx_cxl *cxl;
> + struct range range;
> u16 dvsec;
> int rc;
>
> @@ -90,13 +91,38 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
> return PTR_ERR(cxl->cxlmd);
> }
>
> - probe_data->cxl = cxl;
> + cxl->cxled = cxl_get_committed_decoder(cxl->cxlmd, &cxl->efx_region);
> + if (cxl->cxled) {
> + if (!cxl->efx_region) {
> + pci_err(pci_dev, "CXL found committed decoder without a region");
> + return -ENODEV;
> + }
> + rc = cxl_get_region_range(cxl->efx_region, &range);
Missing an empty line above.
> + if (rc) {
> + pci_err(pci_dev,
> + "CXL getting regions params from a committed decoder failed");
> + return rc;
> + }
> +
> + cxl->ctpio_cxl = ioremap(range.start, range.end - range.start + 1);
Maybe use range_len() instead for the second parameter?
> + if (!cxl->ctpio_cxl) {
> + pci_err(pci_dev, "CXL ioremap region (%pra) failed", &range);
> + return -ENOMEM;
> + }
> +
> + probe_data->cxl = cxl;
> + }
>
> return 0;
> }
>
> void efx_cxl_exit(struct efx_probe_data *probe_data)
> {
> + if (!probe_data->cxl)
> + return;
> +
> + iounmap(probe_data->cxl->ctpio_cxl);
> + cxl_unregister_region(probe_data->cxl->efx_region);
> }
>
> MODULE_IMPORT_NS("CXL");
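For reference, the `range_len()` helper suggested in the review computes the length of an inclusive span, which is what the open-coded `range.end - range.start + 1` passed to `ioremap()` expresses. A minimal userspace model (the real `struct range` and helper live in `linux/range.h`):

```c
#include <assert.h>

/* Minimal model of the kernel's struct range: inclusive bounds. */
struct range {
	unsigned long long start, end;
};

/* Inclusive span length, equivalent to the kernel's range_len(). */
static unsigned long long range_len(const struct range *r)
{
	return r->end - r->start + 1;
}
```

Using the helper avoids repeating the `+ 1` inclusive-bound arithmetic at each call site.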
* Re: [PATCH v23 14/22] cxl: Define a driver interface for DPA allocation
2026-02-01 15:54 ` [PATCH v23 14/22] cxl: Define a driver interface for DPA allocation alejandro.lucero-palau
@ 2026-02-11 22:10 ` Cheatham, Benjamin
2026-02-11 22:12 ` Cheatham, Benjamin
2026-02-13 16:14 ` [PATCH " Gregory Price
2 siblings, 0 replies; 67+ messages in thread
From: Cheatham, Benjamin @ 2026-02-11 22:10 UTC (permalink / raw)
To: alejandro.lucero-palau
Cc: Alejandro Lucero, Jonathan Cameron, linux-cxl, netdev,
dan.j.williams, edward.cree, davem, kuba, pabeni, edumazet,
dave.jiang
On 2/1/2026 9:54 AM, alejandro.lucero-palau@amd.com wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Region creation involves finding available DPA (device-physical-address)
> capacity to map into HPA (host-physical-address) space.
>
> In order to support CXL Type2 devices, define an API, cxl_request_dpa(),
> that tries to allocate the DPA memory the driver requires to operate.The
> memory requested should not be bigger than the max available HPA obtained
> previously with cxl_get_hpa_freespace().
>
> Based on https://lore.kernel.org/linux-cxl/168592158743.1948938.7622563891193802610.stgit@dwillia2-xfh.jf.intel.com/
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> ---
> drivers/cxl/core/hdm.c | 84 ++++++++++++++++++++++++++++++++++++++++++
> drivers/cxl/cxl.h | 1 +
> include/cxl/cxl.h | 5 +++
> 3 files changed, 90 insertions(+)
>
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index a172ce4e9b19..d60a697f12cc 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -3,6 +3,7 @@
> #include <linux/seq_file.h>
> #include <linux/device.h>
> #include <linux/delay.h>
> +#include <cxl/cxl.h>
>
> #include "cxlmem.h"
> #include "core.h"
> @@ -546,6 +547,12 @@ bool cxl_resource_contains_addr(const struct resource *res, const resource_size_
> return resource_contains(res, &_addr);
> }
>
> +/**
> + * cxl_dpa_free - release DPA (Device Physical Address)
> + * @cxled: endpoint decoder linked to the DPA
> + *
> + * Returns 0 or error.
> + */
> int cxl_dpa_free(struct cxl_endpoint_decoder *cxled)
> {
> struct cxl_port *port = cxled_to_port(cxled);
> @@ -572,6 +579,7 @@ int cxl_dpa_free(struct cxl_endpoint_decoder *cxled)
> devm_cxl_dpa_release(cxled);
> return 0;
> }
> +EXPORT_SYMBOL_NS_GPL(cxl_dpa_free, "CXL");
>
> int cxl_dpa_set_part(struct cxl_endpoint_decoder *cxled,
> enum cxl_partition_mode mode)
> @@ -603,6 +611,82 @@ int cxl_dpa_set_part(struct cxl_endpoint_decoder *cxled,
> return 0;
> }
>
> +static int find_free_decoder(struct device *dev, const void *data)
> +{
> + struct cxl_endpoint_decoder *cxled;
> + struct cxl_port *port;
> +
> + if (!is_endpoint_decoder(dev))
> + return 0;
> +
> + cxled = to_cxl_endpoint_decoder(dev);
> + port = cxled_to_port(cxled);
> +
> + return cxled->cxld.id == (port->hdm_end + 1);
> +}
> +
> +static struct cxl_endpoint_decoder *
> +cxl_find_free_decoder(struct cxl_memdev *cxlmd)
> +{
> + struct cxl_port *endpoint = cxlmd->endpoint;
> + struct device *dev;
> +
> + guard(rwsem_read)(&cxl_rwsem.dpa);
> + dev = device_find_child(&endpoint->dev, NULL,
> + find_free_decoder);
> + if (!dev)
> + return NULL;
> +
> + return to_cxl_endpoint_decoder(dev);
> +}
> +
> +/**
> + * cxl_request_dpa - search and reserve DPA given input constraints
> + * @cxlmd: memdev with an endpoint port with available decoders
> + * @mode: CXL partition mode (ram vs pmem)
> + * @alloc: dpa size required
> + *
> + * Returns a pointer to a 'struct cxl_endpoint_decoder' on success or
> + * an errno encoded pointer on failure.
> + *
> + * Given that a region needs to allocate from limited HPA capacity it
> + * may be the case that a device has more mappable DPA capacity than
> + * available HPA. The expectation is that @alloc is a driver known
> + * value based on the device capacity but which could not be fully
> + * available due to HPA constraints.
> + *
> + * Returns a pinned cxl_decoder with at least @alloc bytes of capacity
> + * reserved, or an error pointer. The caller is also expected to own the
> + * lifetime of the memdev registration associated with the endpoint to
> + * pin the decoder registered as well.
> + */
> +struct cxl_endpoint_decoder *cxl_request_dpa(struct cxl_memdev *cxlmd,
> + enum cxl_partition_mode mode,
> + resource_size_t alloc)
> +{
> + int rc;
> +
> + if (!IS_ALIGNED(alloc, SZ_256M))
> + return ERR_PTR(-EINVAL);
> +
> + struct cxl_endpoint_decoder *cxled __free(put_cxled) =
> + cxl_find_free_decoder(cxlmd);
> +
> + if (!cxled)
> + return ERR_PTR(-ENODEV);
> +
> + rc = cxl_dpa_set_part(cxled, mode);
> + if (rc)
> + return ERR_PTR(rc);
> +
> + rc = cxl_dpa_alloc(cxled, alloc);
> + if (rc)
> + return ERR_PTR(rc);
Should cxl_dpa_set_part() be unwound here, or does it not matter? If it doesn't matter:
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
> +
> + return no_free_ptr(cxled);
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_request_dpa, "CXL");
> +
> static int __cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, u64 size)
> {
> struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index d1b010e5e1d0..2b1f7d687a0e 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -667,6 +667,7 @@ struct cxl_root *find_cxl_root(struct cxl_port *port);
>
> DEFINE_FREE(put_cxl_root, struct cxl_root *, if (_T) put_device(&_T->port.dev))
> DEFINE_FREE(put_cxl_port, struct cxl_port *, if (!IS_ERR_OR_NULL(_T)) put_device(&_T->dev))
> +DEFINE_FREE(put_cxled, struct cxl_endpoint_decoder *, if (!IS_ERR_OR_NULL(_T)) put_device(&_T->cxld.dev))
> DEFINE_FREE(put_cxl_root_decoder, struct cxl_root_decoder *, if (!IS_ERR_OR_NULL(_T)) put_device(&_T->cxlsd.cxld.dev))
> DEFINE_FREE(put_cxl_region, struct cxl_region *, if (!IS_ERR_OR_NULL(_T)) put_device(&_T->dev))
>
> diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
> index 783ad570a6eb..4802371db00e 100644
> --- a/include/cxl/cxl.h
> +++ b/include/cxl/cxl.h
> @@ -7,6 +7,7 @@
>
> #include <linux/node.h>
> #include <linux/ioport.h>
> +#include <linux/range.h>
> #include <cxl/mailbox.h>
>
> /**
> @@ -276,4 +277,8 @@ struct cxl_root_decoder *cxl_get_hpa_freespace(struct cxl_memdev *cxlmd,
> unsigned long flags,
> resource_size_t *max);
> void cxl_put_root_decoder(struct cxl_root_decoder *cxlrd);
> +struct cxl_endpoint_decoder *cxl_request_dpa(struct cxl_memdev *cxlmd,
> + enum cxl_partition_mode mode,
> + resource_size_t alloc);
> +int cxl_dpa_free(struct cxl_endpoint_decoder *cxled);
> #endif /* __CXL_CXL_H__ */
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH v23 16/22] cxl: Make region type based on endpoint type
2026-02-01 15:54 ` [PATCH v23 16/22] cxl: Make region type based on endpoint type alejandro.lucero-palau
@ 2026-02-11 22:11 ` Cheatham, Benjamin
0 siblings, 0 replies; 67+ messages in thread
From: Cheatham, Benjamin @ 2026-02-11 22:11 UTC (permalink / raw)
To: alejandro.lucero-palau
Cc: Alejandro Lucero, Zhi Wang, Jonathan Cameron, Alison Schofield,
Davidlohr Bueso, linux-cxl, netdev, dan.j.williams, edward.cree,
davem, kuba, pabeni, edumazet, dave.jiang
On 2/1/2026 9:54 AM, alejandro.lucero-palau@amd.com wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Current code is expecting Type3 or CXL_DECODER_HOSTONLYMEM devices only.
> Support for Type2 implies region type needs to be based on the endpoint
> type HDM-D[B] instead.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Zhi Wang <zhiw@nvidia.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
> Reviewed-by: Alison Schofield <alison.schofield@intel.com>
> Reviewed-by: Davidlohr Bueso <daves@stgolabs.net>
> ---
> drivers/cxl/core/region.c | 10 ++++++----
> 1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index bdefd088f5f1..f53b2e9fd9e6 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -2833,7 +2833,8 @@ static ssize_t create_ram_region_show(struct device *dev,
> }
>
> static struct cxl_region *__create_region(struct cxl_root_decoder *cxlrd,
> - enum cxl_partition_mode mode, int id)
> + enum cxl_partition_mode mode, int id,
> + enum cxl_decoder_type target_type)
> {
> int rc;
>
> @@ -2855,7 +2856,7 @@ static struct cxl_region *__create_region(struct cxl_root_decoder *cxlrd,
> return ERR_PTR(-EBUSY);
> }
>
> - return devm_cxl_add_region(cxlrd, id, mode, CXL_DECODER_HOSTONLYMEM);
> + return devm_cxl_add_region(cxlrd, id, mode, target_type);
> }
>
> static ssize_t create_region_store(struct device *dev, const char *buf,
> @@ -2869,7 +2870,7 @@ static ssize_t create_region_store(struct device *dev, const char *buf,
> if (rc != 1)
> return -EINVAL;
>
> - cxlr = __create_region(cxlrd, mode, id);
> + cxlr = __create_region(cxlrd, mode, id, CXL_DECODER_HOSTONLYMEM);
I haven't read the ABI docs, but would it be worthwhile to update the documentation for this attribute
to mention that it only creates type 3 regions? I'm flip-flopping on whether it's worth the trouble, but
thought I should mention it.
Either way:
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
> if (IS_ERR(cxlr))
> return PTR_ERR(cxlr);
>
> @@ -4036,7 +4037,8 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
>
> do {
> cxlr = __create_region(cxlrd, cxlds->part[part].mode,
> - atomic_read(&cxlrd->region_id));
> + atomic_read(&cxlrd->region_id),
> + cxled->cxld.target_type);
> } while (IS_ERR(cxlr) && PTR_ERR(cxlr) == -EBUSY);
>
> if (IS_ERR(cxlr)) {
* Re: [PATCH v23 17/22] cxl/region: Factor out interleave ways setup
2026-02-01 15:54 ` [PATCH v23 17/22] cxl/region: Factor out interleave ways setup alejandro.lucero-palau
@ 2026-02-11 22:11 ` Cheatham, Benjamin
2026-02-19 10:40 ` Alejandro Lucero Palau
0 siblings, 1 reply; 67+ messages in thread
From: Cheatham, Benjamin @ 2026-02-11 22:11 UTC (permalink / raw)
To: alejandro.lucero-palau
Cc: Alejandro Lucero, Zhi Wang, Jonathan Cameron, Alison Schofield,
linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
On 2/1/2026 9:54 AM, alejandro.lucero-palau@amd.com wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Region creation based on Type3 devices is triggered from user space
> allowing memory combination through interleaving.
>
> In preparation for kernel driven region creation, that is Type2 drivers
> triggering region creation backed with its advertised CXL memory, factor
> out a common helper from the user-sysfs region setup for interleave ways.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Zhi Wang <zhiw@nvidia.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
> Reviewed-by: Alison Schofield <alison.schofield@intel.com>
> ---
> drivers/cxl/core/region.c | 43 ++++++++++++++++++++++++---------------
> 1 file changed, 27 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index f53b2e9fd9e6..ece1d3df7cf1 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -485,22 +485,14 @@ static ssize_t interleave_ways_show(struct device *dev,
>
> static const struct attribute_group *get_cxl_region_target_group(void);
>
> -static ssize_t interleave_ways_store(struct device *dev,
> - struct device_attribute *attr,
> - const char *buf, size_t len)
> +static int set_interleave_ways(struct cxl_region *cxlr, int val)
@val should probably stay an unsigned int. You pass an unsigned int in the sysfs function, and the
function was originally coded with that in mind (same with @save below). With that cleaned up:
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
> {
> - struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev->parent);
> + struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
> struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld;
> - struct cxl_region *cxlr = to_cxl_region(dev);
> struct cxl_region_params *p = &cxlr->params;
> - unsigned int val, save;
> - int rc;
> + int save, rc;
> u8 iw;
>
> - rc = kstrtouint(buf, 0, &val);
> - if (rc)
> - return rc;
> -
> rc = ways_to_eiw(val, &iw);
> if (rc)
> return rc;
> @@ -515,9 +507,7 @@ static ssize_t interleave_ways_store(struct device *dev,
> return -EINVAL;
> }
>
> - ACQUIRE(rwsem_write_kill, rwsem)(&cxl_rwsem.region);
> - if ((rc = ACQUIRE_ERR(rwsem_write_kill, &rwsem)))
> - return rc;
> + lockdep_assert_held_write(&cxl_rwsem.region);
>
> if (p->state >= CXL_CONFIG_INTERLEAVE_ACTIVE)
> return -EBUSY;
> @@ -525,10 +515,31 @@ static ssize_t interleave_ways_store(struct device *dev,
> save = p->interleave_ways;
> p->interleave_ways = val;
> rc = sysfs_update_group(&cxlr->dev.kobj, get_cxl_region_target_group());
> - if (rc) {
> + if (rc)
> p->interleave_ways = save;
> +
> + return rc;
> +}
> +
> +static ssize_t interleave_ways_store(struct device *dev,
> + struct device_attribute *attr,
> + const char *buf, size_t len)
> +{
> + struct cxl_region *cxlr = to_cxl_region(dev);
> + unsigned int val;
> + int rc;
> +
> + rc = kstrtouint(buf, 0, &val);
> + if (rc)
> + return rc;
> +
> + ACQUIRE(rwsem_write_kill, rwsem)(&cxl_rwsem.region);
> + if ((rc = ACQUIRE_ERR(rwsem_write_kill, &rwsem)))
> + return rc;
> +
> + rc = set_interleave_ways(cxlr, val);
> + if (rc)
> return rc;
> - }
>
> return len;
> }
* Re: [PATCH v23 18/22] cxl/region: Factor out interleave granularity setup
2026-02-01 15:54 ` [PATCH v23 18/22] cxl/region: Factor out interleave granularity setup alejandro.lucero-palau
@ 2026-02-11 22:11 ` Cheatham, Benjamin
0 siblings, 0 replies; 67+ messages in thread
From: Cheatham, Benjamin @ 2026-02-11 22:11 UTC (permalink / raw)
To: alejandro.lucero-palau
Cc: Alejandro Lucero, Zhi Wang, Jonathan Cameron, Alison Schofield,
linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
On 2/1/2026 9:54 AM, alejandro.lucero-palau@amd.com wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Region creation based on Type3 devices is triggered from user space
> allowing memory combination through interleaving.
>
> In preparation for kernel driven region creation, that is Type2 drivers
> triggering region creation backed with its advertised CXL memory, factor
> out a common helper from the user-sysfs region setup for interleave
> granularity.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Zhi Wang <zhiw@nvidia.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
> Reviewed-by: Alison Schofield <alison.schofield@intel.com>
> ---
> drivers/cxl/core/region.c | 39 +++++++++++++++++++++++++--------------
> 1 file changed, 25 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index ece1d3df7cf1..63c2aeb2ee1f 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -559,21 +559,14 @@ static ssize_t interleave_granularity_show(struct device *dev,
> return sysfs_emit(buf, "%d\n", p->interleave_granularity);
> }
>
> -static ssize_t interleave_granularity_store(struct device *dev,
> - struct device_attribute *attr,
> - const char *buf, size_t len)
> +static int set_interleave_granularity(struct cxl_region *cxlr, int val)
Same thing as last patch. Assuming it's fixed:
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
> {
> - struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev->parent);
> + struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
> struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld;
> - struct cxl_region *cxlr = to_cxl_region(dev);
> struct cxl_region_params *p = &cxlr->params;
> - int rc, val;
> + int rc;
> u16 ig;
>
> - rc = kstrtoint(buf, 0, &val);
> - if (rc)
> - return rc;
> -
> rc = granularity_to_eig(val, &ig);
> if (rc)
> return rc;
> @@ -589,14 +582,32 @@ static ssize_t interleave_granularity_store(struct device *dev,
> if (cxld->interleave_ways > 1 && val != cxld->interleave_granularity)
> return -EINVAL;
>
> - ACQUIRE(rwsem_write_kill, rwsem)(&cxl_rwsem.region);
> - if ((rc = ACQUIRE_ERR(rwsem_write_kill, &rwsem)))
> - return rc;
> -
> + lockdep_assert_held_write(&cxl_rwsem.region);
> if (p->state >= CXL_CONFIG_INTERLEAVE_ACTIVE)
> return -EBUSY;
>
> p->interleave_granularity = val;
> + return 0;
> +}
> +
> +static ssize_t interleave_granularity_store(struct device *dev,
> + struct device_attribute *attr,
> + const char *buf, size_t len)
> +{
> + struct cxl_region *cxlr = to_cxl_region(dev);
> + int rc, val;
> +
> + rc = kstrtoint(buf, 0, &val);
> + if (rc)
> + return rc;
> +
> + ACQUIRE(rwsem_write_kill, rwsem)(&cxl_rwsem.region);
> + if ((rc = ACQUIRE_ERR(rwsem_write_kill, &rwsem)))
> + return rc;
> +
> + rc = set_interleave_granularity(cxlr, val);
> + if (rc)
> + return rc;
>
> return len;
> }
* Re: [PATCH v23 19/22] cxl: Allow region creation by type2 drivers
2026-02-01 15:54 ` [PATCH v23 19/22] cxl: Allow region creation by type2 drivers alejandro.lucero-palau
@ 2026-02-11 22:11 ` Cheatham, Benjamin
2026-02-19 10:48 ` Alejandro Lucero Palau
0 siblings, 1 reply; 67+ messages in thread
From: Cheatham, Benjamin @ 2026-02-11 22:11 UTC (permalink / raw)
To: alejandro.lucero-palau
Cc: Alejandro Lucero, Jonathan Cameron, linux-cxl, netdev,
dan.j.williams, edward.cree, davem, kuba, pabeni, edumazet,
dave.jiang
On 2/1/2026 9:54 AM, alejandro.lucero-palau@amd.com wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Creating a CXL region requires userspace intervention through the cxl
> sysfs files. Type2 support should allow accelerator drivers to create
> such cxl region from kernel code.
>
> Adding that functionality and integrating it with current support for
> memory expanders.
>
> Based on https://lore.kernel.org/linux-cxl/168592159835.1948938.1647215579839222774.stgit@dwillia2-xfh.jf.intel.com/
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> ---
> drivers/cxl/core/region.c | 131 ++++++++++++++++++++++++++++++++++++--
> include/cxl/cxl.h | 3 +
> 2 files changed, 127 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 63c2aeb2ee1f..293e63dfef22 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -2944,6 +2944,14 @@ cxl_find_region_by_name(struct cxl_root_decoder *cxlrd, const char *name)
> return to_cxl_region(region_dev);
> }
>
> +static void drop_region(struct cxl_region *cxlr)
> +{
> + struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
> + struct cxl_port *port = cxlrd_to_port(cxlrd);
> +
> + devm_release_action(port->uport_dev, __unregister_region, cxlr);
> +}
> +
> static ssize_t delete_region_store(struct device *dev,
> struct device_attribute *attr,
> const char *buf, size_t len)
> @@ -4047,14 +4055,12 @@ static int __construct_region(struct cxl_region *cxlr,
> return 0;
> }
>
> -/* Establish an empty region covering the given HPA range */
> -static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
> - struct cxl_endpoint_decoder *cxled)
> +static struct cxl_region *construct_region_begin(struct cxl_root_decoder *cxlrd,
> + struct cxl_endpoint_decoder *cxled)
> {
> struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> - struct cxl_port *port = cxlrd_to_port(cxlrd);
> struct cxl_dev_state *cxlds = cxlmd->cxlds;
> - int rc, part = READ_ONCE(cxled->part);
> + int part = READ_ONCE(cxled->part);
> struct cxl_region *cxlr;
>
> do {
> @@ -4063,13 +4069,26 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
> cxled->cxld.target_type);
> } while (IS_ERR(cxlr) && PTR_ERR(cxlr) == -EBUSY);
>
> - if (IS_ERR(cxlr)) {
> + if (IS_ERR(cxlr))
> dev_err(cxlmd->dev.parent,
> "%s:%s: %s failed assign region: %ld\n",
> dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
> __func__, PTR_ERR(cxlr));
> +
> + return cxlr;
> +}
> +
> +/* Establish an empty region covering the given HPA range */
> +static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
> + struct cxl_endpoint_decoder *cxled)
> +{
> + struct cxl_port *port = cxlrd_to_port(cxlrd);
> + struct cxl_region *cxlr;
> + int rc;
> +
> + cxlr = construct_region_begin(cxlrd, cxled);
> + if (IS_ERR(cxlr))
> return cxlr;
> - }
>
> rc = __construct_region(cxlr, cxlrd, cxled);
> if (rc) {
> @@ -4080,6 +4099,104 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
> return cxlr;
> }
>
> +DEFINE_FREE(cxl_region_drop, struct cxl_region *, if (_T) drop_region(_T))
This needs to be "if (!IS_ERR_OR_NULL(_T)) drop_region(_T)". If construct_region_begin() returns an
error pointer, drop_region() will be called with it as of now, leading to a garbage pointer deref.
> +
> +static struct cxl_region *
> +__construct_new_region(struct cxl_root_decoder *cxlrd,
> + struct cxl_endpoint_decoder **cxled, int ways)
> +{
> + struct cxl_memdev *cxlmd = cxled_to_memdev(cxled[0]);
> + struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld;
> + struct cxl_region_params *p;
> + resource_size_t size = 0;
> + int rc, i;
> +
> + struct cxl_region *cxlr __free(cxl_region_drop) =
> + construct_region_begin(cxlrd, cxled[0]);
> + if (IS_ERR(cxlr))
> + return cxlr;
> +
> + guard(rwsem_write)(&cxl_rwsem.region);
> +
> + /*
> + * Sanity check. This should not happen with an accel driver handling
> + * the region creation.
> + */
> + p = &cxlr->params;
> + if (p->state >= CXL_CONFIG_INTERLEAVE_ACTIVE) {
> + dev_err(cxlmd->dev.parent,
> + "%s:%s: %s unexpected region state\n",
> + dev_name(&cxlmd->dev), dev_name(&cxled[0]->cxld.dev),
> + __func__);
> + return ERR_PTR(-EBUSY);
> + }
> +
> + rc = set_interleave_ways(cxlr, ways);
> + if (rc)
> + return ERR_PTR(rc);
> +
> + rc = set_interleave_granularity(cxlr, cxld->interleave_granularity);
> + if (rc)
> + return ERR_PTR(rc);
> +
> + scoped_guard(rwsem_read, &cxl_rwsem.dpa) {
> + for (i = 0; i < ways; i++) {
> + if (!cxled[i]->dpa_res)
> + return ERR_PTR(-EINVAL);
> + size += resource_size(cxled[i]->dpa_res);
> + }
> +
> + rc = alloc_hpa(cxlr, size);
> + if (rc)
> + return ERR_PTR(rc);
> +
> + for (i = 0; i < ways; i++) {
> + rc = cxl_region_attach(cxlr, cxled[i], 0);
Position parameter is hardcoded to 0. It should be set to i, right? This kind of goes back to my
issues in patch 12/22; the interleaving functionality is there but it looks unused.
> + if (rc)
> + return ERR_PTR(rc);
> + }
> + }
> +
> + rc = cxl_region_decode_commit(cxlr);
> + if (rc)
> + return ERR_PTR(rc);
> +
> + p->state = CXL_CONFIG_COMMIT;
> +
> + return no_free_ptr(cxlr);
> +}
> +
> +/**
> + * cxl_create_region - Establish a region given an endpoint decoder
> + * @cxlrd: root decoder to allocate HPA
> + * @cxled: endpoint decoders with reserved DPA capacity
> + * @ways: interleave ways required
> + *
> + * Returns a fully formed region in the commit state and attached to the
> + * cxl_region driver.
> + */
> +struct cxl_region *cxl_create_region(struct cxl_root_decoder *cxlrd,
> + struct cxl_endpoint_decoder **cxled,
> + int ways)
> +{
> + struct cxl_region *cxlr;
> +
> + mutex_lock(&cxlrd->range_lock);
> + cxlr = __construct_new_region(cxlrd, cxled, ways);
> + mutex_unlock(&cxlrd->range_lock);
> + if (IS_ERR(cxlr))
> + return cxlr;
> +
> + if (device_attach(&cxlr->dev) <= 0) {
> + dev_err(&cxlr->dev, "failed to create region\n");
> + drop_region(cxlr);
> + return ERR_PTR(-ENODEV);
> + }
> +
> + return cxlr;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_create_region, "CXL");
> +
> static struct cxl_region *
> cxl_find_region_by_range(struct cxl_root_decoder *cxlrd, struct range *hpa)
> {
> diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
> index 4802371db00e..50acbd13bcf8 100644
> --- a/include/cxl/cxl.h
> +++ b/include/cxl/cxl.h
> @@ -281,4 +281,7 @@ struct cxl_endpoint_decoder *cxl_request_dpa(struct cxl_memdev *cxlmd,
> enum cxl_partition_mode mode,
> resource_size_t alloc);
> int cxl_dpa_free(struct cxl_endpoint_decoder *cxled);
> +struct cxl_region *cxl_create_region(struct cxl_root_decoder *cxlrd,
> + struct cxl_endpoint_decoder **cxled,
> + int ways);
> #endif /* __CXL_CXL_H__ */
* Re: [PATCH v23 01/22] cxl: Add type2 device basic support
2026-02-01 15:54 ` [PATCH v23 01/22] cxl: Add type2 " alejandro.lucero-palau
@ 2026-02-11 22:11 ` Cheatham, Benjamin
2026-02-19 8:52 ` Alejandro Lucero Palau
0 siblings, 1 reply; 67+ messages in thread
From: Cheatham, Benjamin @ 2026-02-11 22:11 UTC (permalink / raw)
To: alejandro.lucero-palau
Cc: Alejandro Lucero, Jonathan Cameron, Alison Schofield, linux-cxl,
netdev, dan.j.williams, edward.cree, davem, kuba, pabeni,
edumazet, dave.jiang
On 2/1/2026 9:54 AM, alejandro.lucero-palau@amd.com wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Differentiate CXL memory expanders (type 3) from CXL device accelerators
> (type 2) with a new function for initializing cxl_dev_state and a macro
> for helping accel drivers to embed cxl_dev_state inside a private
> struct.
>
> Move structs to include/cxl as the size of the accel driver private
> struct embedding cxl_dev_state needs to know the size of this struct.
>
> Use same new initialization with the type3 pci driver.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Reviewed-by: Alison Schofield <alison.schofield@intel.com>
> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
> ---
> drivers/cxl/core/mbox.c | 12 +-
> drivers/cxl/core/memdev.c | 32 +++++
> drivers/cxl/cxl.h | 97 +--------------
> drivers/cxl/cxlmem.h | 86 +------------
> drivers/cxl/pci.c | 14 +--
> include/cxl/cxl.h | 226 +++++++++++++++++++++++++++++++++++
> tools/testing/cxl/test/mem.c | 3 +-
> 7 files changed, 274 insertions(+), 196 deletions(-)
> create mode 100644 include/cxl/cxl.h
>
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index fa6dd0c94656..bee84d0101d1 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -1514,23 +1514,21 @@ int cxl_mailbox_init(struct cxl_mailbox *cxl_mbox, struct device *host)
> }
> EXPORT_SYMBOL_NS_GPL(cxl_mailbox_init, "CXL");
>
> -struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev)
> +struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev, u64 serial,
> + u16 dvsec)
> {
> struct cxl_memdev_state *mds;
> int rc;
>
> - mds = devm_kzalloc(dev, sizeof(*mds), GFP_KERNEL);
> + mds = devm_cxl_dev_state_create(dev, CXL_DEVTYPE_CLASSMEM, serial,
> + dvsec, struct cxl_memdev_state, cxlds,
> + true);
> if (!mds) {
> dev_err(dev, "No memory available\n");
> return ERR_PTR(-ENOMEM);
> }
>
> mutex_init(&mds->event.log_lock);
> - mds->cxlds.dev = dev;
> - mds->cxlds.reg_map.host = dev;
> - mds->cxlds.cxl_mbox.host = dev;
> - mds->cxlds.reg_map.resource = CXL_RESOURCE_NONE;
> - mds->cxlds.type = CXL_DEVTYPE_CLASSMEM;
>
> rc = devm_cxl_register_mce_notifier(dev, &mds->mce_notifier);
> if (rc == -EOPNOTSUPP)
> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index af3d0cc65138..22d156f25305 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c
> @@ -656,6 +656,38 @@ static void detach_memdev(struct work_struct *work)
>
> static struct lock_class_key cxl_memdev_key;
>
> +static void cxl_dev_state_init(struct cxl_dev_state *cxlds, struct device *dev,
> + enum cxl_devtype type, u64 serial, u16 dvsec,
> + bool has_mbox)
> +{
> + *cxlds = (struct cxl_dev_state) {
> + .dev = dev,
> + .type = type,
> + .serial = serial,
> + .cxl_dvsec = dvsec,
> + .reg_map.host = dev,
> + .reg_map.resource = CXL_RESOURCE_NONE,
> + };
> +
> + if (has_mbox)
> + cxlds->cxl_mbox.host = dev;
> +}
> +
> +struct cxl_dev_state *_devm_cxl_dev_state_create(struct device *dev,
> + enum cxl_devtype type,
> + u64 serial, u16 dvsec,
> + size_t size, bool has_mbox)
> +{
> + struct cxl_dev_state *cxlds = devm_kzalloc(dev, size, GFP_KERNEL);
> +
> + if (!cxlds)
> + return NULL;
> +
> + cxl_dev_state_init(cxlds, dev, type, serial, dvsec, has_mbox);
Nit: Having a second function to do the init seems overkill here, especially since cxl_dev_state_init() isn't called outside this
function. I'd fold it into this function instead, but I'm fine with it either way (especially if you were told otherwise before).
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
> + return cxlds;
> +}
> +EXPORT_SYMBOL_NS_GPL(_devm_cxl_dev_state_create, "CXL");
> +
> static struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds,
> const struct file_operations *fops,
> const struct cxl_memdev_attach *attach)
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index e1d47062e1d3..3eaa353e430b 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -12,6 +12,7 @@
> #include <linux/node.h>
> #include <linux/io.h>
> #include <linux/range.h>
> +#include <cxl/cxl.h>
>
> extern const struct nvdimm_security_ops *cxl_security_ops;
>
> @@ -201,97 +202,6 @@ static inline int ways_to_eiw(unsigned int ways, u8 *eiw)
> #define CXLDEV_MBOX_BG_CMD_COMMAND_VENDOR_MASK GENMASK_ULL(63, 48)
> #define CXLDEV_MBOX_PAYLOAD_OFFSET 0x20
>
> -/*
> - * Using struct_group() allows for per register-block-type helper routines,
> - * without requiring block-type agnostic code to include the prefix.
> - */
> -struct cxl_regs {
> - /*
> - * Common set of CXL Component register block base pointers
> - * @hdm_decoder: CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure
> - * @ras: CXL 2.0 8.2.5.9 CXL RAS Capability Structure
> - */
> - struct_group_tagged(cxl_component_regs, component,
> - void __iomem *hdm_decoder;
> - void __iomem *ras;
> - );
> - /*
> - * Common set of CXL Device register block base pointers
> - * @status: CXL 2.0 8.2.8.3 Device Status Registers
> - * @mbox: CXL 2.0 8.2.8.4 Mailbox Registers
> - * @memdev: CXL 2.0 8.2.8.5 Memory Device Registers
> - */
> - struct_group_tagged(cxl_device_regs, device_regs,
> - void __iomem *status, *mbox, *memdev;
> - );
> -
> - struct_group_tagged(cxl_pmu_regs, pmu_regs,
> - void __iomem *pmu;
> - );
> -
> - /*
> - * RCH downstream port specific RAS register
> - * @aer: CXL 3.0 8.2.1.1 RCH Downstream Port RCRB
> - */
> - struct_group_tagged(cxl_rch_regs, rch_regs,
> - void __iomem *dport_aer;
> - );
> -
> - /*
> - * RCD upstream port specific PCIe cap register
> - * @pcie_cap: CXL 3.0 8.2.1.2 RCD Upstream Port RCRB
> - */
> - struct_group_tagged(cxl_rcd_regs, rcd_regs,
> - void __iomem *rcd_pcie_cap;
> - );
> -};
> -
> -struct cxl_reg_map {
> - bool valid;
> - int id;
> - unsigned long offset;
> - unsigned long size;
> -};
> -
> -struct cxl_component_reg_map {
> - struct cxl_reg_map hdm_decoder;
> - struct cxl_reg_map ras;
> -};
> -
> -struct cxl_device_reg_map {
> - struct cxl_reg_map status;
> - struct cxl_reg_map mbox;
> - struct cxl_reg_map memdev;
> -};
> -
> -struct cxl_pmu_reg_map {
> - struct cxl_reg_map pmu;
> -};
> -
> -/**
> - * struct cxl_register_map - DVSEC harvested register block mapping parameters
> - * @host: device for devm operations and logging
> - * @base: virtual base of the register-block-BAR + @block_offset
> - * @resource: physical resource base of the register block
> - * @max_size: maximum mapping size to perform register search
> - * @reg_type: see enum cxl_regloc_type
> - * @component_map: cxl_reg_map for component registers
> - * @device_map: cxl_reg_maps for device registers
> - * @pmu_map: cxl_reg_maps for CXL Performance Monitoring Units
> - */
> -struct cxl_register_map {
> - struct device *host;
> - void __iomem *base;
> - resource_size_t resource;
> - resource_size_t max_size;
> - u8 reg_type;
> - union {
> - struct cxl_component_reg_map component_map;
> - struct cxl_device_reg_map device_map;
> - struct cxl_pmu_reg_map pmu_map;
> - };
> -};
> -
> void cxl_probe_component_regs(struct device *dev, void __iomem *base,
> struct cxl_component_reg_map *map);
> void cxl_probe_device_regs(struct device *dev, void __iomem *base,
> @@ -497,11 +407,6 @@ struct cxl_region_params {
> resource_size_t cache_size;
> };
>
> -enum cxl_partition_mode {
> - CXL_PARTMODE_RAM,
> - CXL_PARTMODE_PMEM,
> -};
> -
> /*
> * Indicate whether this region has been assembled by autodetection or
> * userspace assembly. Prevent endpoint decoders outside of automatic
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index ef202b34e5ea..281546de426e 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -113,8 +113,6 @@ int devm_cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
> resource_size_t base, resource_size_t len,
> resource_size_t skipped);
>
> -#define CXL_NR_PARTITIONS_MAX 2
> -
> struct cxl_dpa_info {
> u64 size;
> struct cxl_dpa_part_info {
> @@ -373,87 +371,6 @@ struct cxl_security_state {
> struct kernfs_node *sanitize_node;
> };
>
> -/*
> - * enum cxl_devtype - delineate type-2 from a generic type-3 device
> - * @CXL_DEVTYPE_DEVMEM - Vendor specific CXL Type-2 device implementing HDM-D or
> - * HDM-DB, no requirement that this device implements a
> - * mailbox, or other memory-device-standard manageability
> - * flows.
> - * @CXL_DEVTYPE_CLASSMEM - Common class definition of a CXL Type-3 device with
> - * HDM-H and class-mandatory memory device registers
> - */
> -enum cxl_devtype {
> - CXL_DEVTYPE_DEVMEM,
> - CXL_DEVTYPE_CLASSMEM,
> -};
> -
> -/**
> - * struct cxl_dpa_perf - DPA performance property entry
> - * @dpa_range: range for DPA address
> - * @coord: QoS performance data (i.e. latency, bandwidth)
> - * @cdat_coord: raw QoS performance data from CDAT
> - * @qos_class: QoS Class cookies
> - */
> -struct cxl_dpa_perf {
> - struct range dpa_range;
> - struct access_coordinate coord[ACCESS_COORDINATE_MAX];
> - struct access_coordinate cdat_coord[ACCESS_COORDINATE_MAX];
> - int qos_class;
> -};
> -
> -/**
> - * struct cxl_dpa_partition - DPA partition descriptor
> - * @res: shortcut to the partition in the DPA resource tree (cxlds->dpa_res)
> - * @perf: performance attributes of the partition from CDAT
> - * @mode: operation mode for the DPA capacity, e.g. ram, pmem, dynamic...
> - */
> -struct cxl_dpa_partition {
> - struct resource res;
> - struct cxl_dpa_perf perf;
> - enum cxl_partition_mode mode;
> -};
> -
> -/**
> - * struct cxl_dev_state - The driver device state
> - *
> - * cxl_dev_state represents the CXL driver/device state. It provides an
> - * interface to mailbox commands as well as some cached data about the device.
> - * Currently only memory devices are represented.
> - *
> - * @dev: The device associated with this CXL state
> - * @cxlmd: The device representing the CXL.mem capabilities of @dev
> - * @reg_map: component and ras register mapping parameters
> - * @regs: Parsed register blocks
> - * @cxl_dvsec: Offset to the PCIe device DVSEC
> - * @rcd: operating in RCD mode (CXL 3.0 9.11.8 CXL Devices Attached to an RCH)
> - * @media_ready: Indicate whether the device media is usable
> - * @dpa_res: Overall DPA resource tree for the device
> - * @part: DPA partition array
> - * @nr_partitions: Number of DPA partitions
> - * @serial: PCIe Device Serial Number
> - * @type: Generic Memory Class device or Vendor Specific Memory device
> - * @cxl_mbox: CXL mailbox context
> - * @cxlfs: CXL features context
> - */
> -struct cxl_dev_state {
> - struct device *dev;
> - struct cxl_memdev *cxlmd;
> - struct cxl_register_map reg_map;
> - struct cxl_regs regs;
> - int cxl_dvsec;
> - bool rcd;
> - bool media_ready;
> - struct resource dpa_res;
> - struct cxl_dpa_partition part[CXL_NR_PARTITIONS_MAX];
> - unsigned int nr_partitions;
> - u64 serial;
> - enum cxl_devtype type;
> - struct cxl_mailbox cxl_mbox;
> -#ifdef CONFIG_CXL_FEATURES
> - struct cxl_features_state *cxlfs;
> -#endif
> -};
> -
> static inline resource_size_t cxl_pmem_size(struct cxl_dev_state *cxlds)
> {
> /*
> @@ -858,7 +775,8 @@ int cxl_dev_state_identify(struct cxl_memdev_state *mds);
> int cxl_await_media_ready(struct cxl_dev_state *cxlds);
> int cxl_enumerate_cmds(struct cxl_memdev_state *mds);
> int cxl_mem_dpa_fetch(struct cxl_memdev_state *mds, struct cxl_dpa_info *info);
> -struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev);
> +struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev, u64 serial,
> + u16 dvsec);
> void set_exclusive_cxl_commands(struct cxl_memdev_state *mds,
> unsigned long *cmds);
> void clear_exclusive_cxl_commands(struct cxl_memdev_state *mds,
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 1cf232220873..24179cc702bf 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -911,25 +911,25 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> int rc, pmu_count;
> unsigned int i;
> bool irq_avail;
> + u16 dvsec;
>
> rc = pcim_enable_device(pdev);
> if (rc)
> return rc;
> pci_set_master(pdev);
>
> - mds = cxl_memdev_state_create(&pdev->dev);
> + dvsec = pci_find_dvsec_capability(pdev, PCI_VENDOR_ID_CXL,
> + PCI_DVSEC_CXL_DEVICE);
> + if (!dvsec)
> + pci_warn(pdev, "Device DVSEC not present, skip CXL.mem init\n");
> +
> + mds = cxl_memdev_state_create(&pdev->dev, pci_get_dsn(pdev), dvsec);
> if (IS_ERR(mds))
> return PTR_ERR(mds);
> cxlds = &mds->cxlds;
> pci_set_drvdata(pdev, cxlds);
>
> cxlds->rcd = is_cxl_restricted(pdev);
> - cxlds->serial = pci_get_dsn(pdev);
> - cxlds->cxl_dvsec = pci_find_dvsec_capability(
> - pdev, PCI_VENDOR_ID_CXL, PCI_DVSEC_CXL_DEVICE);
> - if (!cxlds->cxl_dvsec)
> - dev_warn(&pdev->dev,
> - "Device DVSEC not present, skip CXL.mem init\n");
>
> rc = cxl_pci_setup_regs(pdev, CXL_REGLOC_RBI_MEMDEV, &map);
> if (rc)
> diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
> new file mode 100644
> index 000000000000..13d448686189
> --- /dev/null
> +++ b/include/cxl/cxl.h
> @@ -0,0 +1,226 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/* Copyright(c) 2020 Intel Corporation. */
> +/* Copyright(c) 2025 Advanced Micro Devices, Inc. */
> +
> +#ifndef __CXL_CXL_H__
> +#define __CXL_CXL_H__
> +
> +#include <linux/node.h>
> +#include <linux/ioport.h>
> +#include <cxl/mailbox.h>
> +
> +/**
> + * enum cxl_devtype - delineate type-2 from a generic type-3 device
> + * @CXL_DEVTYPE_DEVMEM: Vendor specific CXL Type-2 device implementing HDM-D or
> + * HDM-DB, no requirement that this device implements a
> + * mailbox, or other memory-device-standard manageability
> + * flows.
> + * @CXL_DEVTYPE_CLASSMEM: Common class definition of a CXL Type-3 device with
> + * HDM-H and class-mandatory memory device registers
> + */
> +enum cxl_devtype {
> + CXL_DEVTYPE_DEVMEM,
> + CXL_DEVTYPE_CLASSMEM,
> +};
> +
> +struct device;
> +
> +/*
> + * Using struct_group() allows for per register-block-type helper routines,
> + * without requiring block-type agnostic code to include the prefix.
> + */
> +struct cxl_regs {
> + /*
> + * Common set of CXL Component register block base pointers
> + * @hdm_decoder: CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure
> + * @ras: CXL 2.0 8.2.5.9 CXL RAS Capability Structure
> + */
> + struct_group_tagged(cxl_component_regs, component,
> + void __iomem *hdm_decoder;
> + void __iomem *ras;
> + );
> + /*
> + * Common set of CXL Device register block base pointers
> + * @status: CXL 2.0 8.2.8.3 Device Status Registers
> + * @mbox: CXL 2.0 8.2.8.4 Mailbox Registers
> + * @memdev: CXL 2.0 8.2.8.5 Memory Device Registers
> + */
> + struct_group_tagged(cxl_device_regs, device_regs,
> + void __iomem *status, *mbox, *memdev;
> + );
> +
> + struct_group_tagged(cxl_pmu_regs, pmu_regs,
> + void __iomem *pmu;
> + );
> +
> + /*
> + * RCH downstream port specific RAS register
> + * @aer: CXL 3.0 8.2.1.1 RCH Downstream Port RCRB
> + */
> + struct_group_tagged(cxl_rch_regs, rch_regs,
> + void __iomem *dport_aer;
> + );
> +
> + /*
> + * RCD upstream port specific PCIe cap register
> + * @pcie_cap: CXL 3.0 8.2.1.2 RCD Upstream Port RCRB
> + */
> + struct_group_tagged(cxl_rcd_regs, rcd_regs,
> + void __iomem *rcd_pcie_cap;
> + );
> +};
> +
> +struct cxl_reg_map {
> + bool valid;
> + int id;
> + unsigned long offset;
> + unsigned long size;
> +};
> +
> +struct cxl_component_reg_map {
> + struct cxl_reg_map hdm_decoder;
> + struct cxl_reg_map ras;
> +};
> +
> +struct cxl_device_reg_map {
> + struct cxl_reg_map status;
> + struct cxl_reg_map mbox;
> + struct cxl_reg_map memdev;
> +};
> +
> +struct cxl_pmu_reg_map {
> + struct cxl_reg_map pmu;
> +};
> +
> +/**
> + * struct cxl_register_map - DVSEC harvested register block mapping parameters
> + * @host: device for devm operations and logging
> + * @base: virtual base of the register-block-BAR + @block_offset
> + * @resource: physical resource base of the register block
> + * @max_size: maximum mapping size to perform register search
> + * @reg_type: see enum cxl_regloc_type
> + * @component_map: cxl_reg_map for component registers
> + * @device_map: cxl_reg_maps for device registers
> + * @pmu_map: cxl_reg_maps for CXL Performance Monitoring Units
> + */
> +struct cxl_register_map {
> + struct device *host;
> + void __iomem *base;
> + resource_size_t resource;
> + resource_size_t max_size;
> + u8 reg_type;
> + union {
> + struct cxl_component_reg_map component_map;
> + struct cxl_device_reg_map device_map;
> + struct cxl_pmu_reg_map pmu_map;
> + };
> +};
> +
> +/**
> + * struct cxl_dpa_perf - DPA performance property entry
> + * @dpa_range: range for DPA address
> + * @coord: QoS performance data (i.e. latency, bandwidth)
> + * @cdat_coord: raw QoS performance data from CDAT
> + * @qos_class: QoS Class cookies
> + */
> +struct cxl_dpa_perf {
> + struct range dpa_range;
> + struct access_coordinate coord[ACCESS_COORDINATE_MAX];
> + struct access_coordinate cdat_coord[ACCESS_COORDINATE_MAX];
> + int qos_class;
> +};
> +
> +enum cxl_partition_mode {
> + CXL_PARTMODE_RAM,
> + CXL_PARTMODE_PMEM,
> +};
> +
> +/**
> + * struct cxl_dpa_partition - DPA partition descriptor
> + * @res: shortcut to the partition in the DPA resource tree (cxlds->dpa_res)
> + * @perf: performance attributes of the partition from CDAT
> + * @mode: operation mode for the DPA capacity, e.g. ram, pmem, dynamic...
> + */
> +struct cxl_dpa_partition {
> + struct resource res;
> + struct cxl_dpa_perf perf;
> + enum cxl_partition_mode mode;
> +};
> +
> +#define CXL_NR_PARTITIONS_MAX 2
> +
> +/**
> + * struct cxl_dev_state - The driver device state
> + *
> + * cxl_dev_state represents the CXL driver/device state. It provides an
> + * interface to mailbox commands as well as some cached data about the device.
> + * Currently only memory devices are represented.
> + *
> + * @dev: The device associated with this CXL state
> + * @cxlmd: The device representing the CXL.mem capabilities of @dev
> + * @reg_map: component and ras register mapping parameters
> + * @regs: Parsed register blocks
> + * @cxl_dvsec: Offset to the PCIe device DVSEC
> + * @rcd: operating in RCD mode (CXL 3.0 9.11.8 CXL Devices Attached to an RCH)
> + * @media_ready: Indicate whether the device media is usable
> + * @dpa_res: Overall DPA resource tree for the device
> + * @part: DPA partition array
> + * @nr_partitions: Number of DPA partitions
> + * @serial: PCIe Device Serial Number
> + * @type: Generic Memory Class device or Vendor Specific Memory device
> + * @cxl_mbox: CXL mailbox context
> + * @cxlfs: CXL features context
> + */
> +struct cxl_dev_state {
> + /* public for Type2 drivers */
> + struct device *dev;
> + struct cxl_memdev *cxlmd;
> +
> + /* private for Type2 drivers */
> + struct cxl_register_map reg_map;
> + struct cxl_regs regs;
> + int cxl_dvsec;
> + bool rcd;
> + bool media_ready;
> + struct resource dpa_res;
> + struct cxl_dpa_partition part[CXL_NR_PARTITIONS_MAX];
> + unsigned int nr_partitions;
> + u64 serial;
> + enum cxl_devtype type;
> + struct cxl_mailbox cxl_mbox;
> +#ifdef CONFIG_CXL_FEATURES
> + struct cxl_features_state *cxlfs;
> +#endif
> +};
> +
> +struct cxl_dev_state *_devm_cxl_dev_state_create(struct device *dev,
> + enum cxl_devtype type,
> + u64 serial, u16 dvsec,
> + size_t size, bool has_mbox);
> +
> +/**
> + * cxl_dev_state_create - safely create and cast a cxl dev state embedded in a
> + * driver specific struct.
> + *
> + * @parent: device behind the request
> + * @type: CXL device type
> + * @serial: device identification
> + * @dvsec: dvsec capability offset
> + * @drv_struct: driver struct embedding a cxl_dev_state struct
> + * @member: drv_struct member as cxl_dev_state
> + * @mbox: true if mailbox supported
> + *
> + * Returns a pointer to the drv_struct allocated and embedding a cxl_dev_state
> + * struct initialized.
> + *
> + * Introduced for Type2 driver support.
> + */
> +#define devm_cxl_dev_state_create(parent, type, serial, dvsec, drv_struct, member, mbox) \
> + ({ \
> + static_assert(__same_type(struct cxl_dev_state, \
> + ((drv_struct *)NULL)->member)); \
> + static_assert(offsetof(drv_struct, member) == 0); \
> + (drv_struct *)_devm_cxl_dev_state_create(parent, type, serial, dvsec, \
> + sizeof(drv_struct), mbox); \
> + })
> +#endif /* __CXL_CXL_H__ */
> diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
> index cb87e8c0e63c..79f42f4474d4 100644
> --- a/tools/testing/cxl/test/mem.c
> +++ b/tools/testing/cxl/test/mem.c
> @@ -1716,7 +1716,7 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
> if (rc)
> return rc;
>
> - mds = cxl_memdev_state_create(dev);
> + mds = cxl_memdev_state_create(dev, pdev->id + 1, 0);
> if (IS_ERR(mds))
> return PTR_ERR(mds);
>
> @@ -1732,7 +1732,6 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
> mds->event.buf = (struct cxl_get_event_payload *) mdata->event_buf;
> INIT_DELAYED_WORK(&mds->security.poll_dwork, cxl_mockmem_sanitize_work);
>
> - cxlds->serial = pdev->id + 1;
> if (is_rcd(pdev))
> cxlds->rcd = true;
>
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH v23 08/22] cxl/hdm: Add support for getting region from committed decoder
2026-02-01 15:54 ` [PATCH v23 08/22] cxl/hdm: Add support for getting region from committed decoder alejandro.lucero-palau
@ 2026-02-11 22:11 ` Cheatham, Benjamin
2026-02-12 9:16 ` Alejandro Lucero Palau
0 siblings, 1 reply; 67+ messages in thread
From: Cheatham, Benjamin @ 2026-02-11 22:11 UTC (permalink / raw)
To: alejandro.lucero-palau
Cc: Alejandro Lucero, linux-cxl, netdev, dan.j.williams, edward.cree,
davem, kuba, pabeni, edumazet, dave.jiang
On 2/1/2026 9:54 AM, alejandro.lucero-palau@amd.com wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> A Type2 device configured by the BIOS can already have its HDM
> committed. Add a cxl_get_committed_decoder() function for cheking
> so after memdev creation. A CXL region should have been created
> during memdev initialization, therefore a Type2 driver can ask for
> such a region for working with the HPA. If the HDM is not committed,
> a Type2 driver will create the region after obtaining proper HPA
> and DPA space.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> ---
> drivers/cxl/core/hdm.c | 39 +++++++++++++++++++++++++++++++++++++++
> include/cxl/cxl.h | 3 +++
> 2 files changed, 42 insertions(+)
>
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index 6e516c69b2d2..a172ce4e9b19 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -686,6 +686,45 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, u64 size)
> return devm_add_action_or_reset(&port->dev, cxl_dpa_release, cxled);
> }
>
> +static int find_committed_endpoint_decoder(struct device *dev, const void *data)
> +{
> + struct cxl_endpoint_decoder *cxled;
> + struct cxl_port *port;
> +
> + if (!is_endpoint_decoder(dev))
> + return 0;
> +
> + cxled = to_cxl_endpoint_decoder(dev);
> + port = cxled_to_port(cxled);
> +
> + return cxled->cxld.id == port->hdm_end;
Is this the way you're supposed to check if a decoder is committed? The doc comment for @hdm_end in
struct cxl_port says it's just the last allocated decoder. If allocated decoders are always committed then
I'm fine with this; otherwise I think you'd want to do a register read or something to find the commit state.
> +}
> +
> +struct cxl_endpoint_decoder *cxl_get_committed_decoder(struct cxl_memdev *cxlmd,
> + struct cxl_region **cxlr)
> +{
> + struct cxl_port *endpoint = cxlmd->endpoint;
> + struct cxl_endpoint_decoder *cxled;
> + struct device *cxled_dev;
> +
> + if (!endpoint)
> + return NULL;
> +
> + guard(rwsem_read)(&cxl_rwsem.dpa);
> + cxled_dev = device_find_child(&endpoint->dev, NULL,
> + find_committed_endpoint_decoder);
> +
> + if (!cxled_dev)
> + return NULL;
> +
> + cxled = to_cxl_endpoint_decoder(cxled_dev);
> + *cxlr = cxled->cxld.region;
> +
> + put_device(cxled_dev);
> + return cxled;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_get_committed_decoder, "CXL");
> +
> static void cxld_set_interleave(struct cxl_decoder *cxld, u32 *ctrl)
> {
> u16 eig;
> diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
> index 6f8d365067af..928276dba952 100644
> --- a/include/cxl/cxl.h
> +++ b/include/cxl/cxl.h
> @@ -249,4 +249,7 @@ int cxl_map_component_regs(const struct cxl_register_map *map,
> int cxl_set_capacity(struct cxl_dev_state *cxlds, u64 capacity);
> struct cxl_memdev *devm_cxl_add_memdev(struct cxl_dev_state *cxlds,
> const struct cxl_memdev_attach *attach);
> +struct cxl_region;
> +struct cxl_endpoint_decoder *cxl_get_committed_decoder(struct cxl_memdev *cxlmd,
> + struct cxl_region **cxlr);
> #endif /* __CXL_CXL_H__ */
* Re: [PATCH v23 00/22] Type2 device basic support
2026-02-01 15:54 [PATCH v23 00/22] Type2 device basic support alejandro.lucero-palau
` (21 preceding siblings ...)
2026-02-01 15:54 ` [PATCH v23 22/22] sfc: support pio mapping based on cxl alejandro.lucero-palau
@ 2026-02-11 22:12 ` Cheatham, Benjamin
2026-03-09 22:43 ` PJ Waskiewicz
23 siblings, 0 replies; 67+ messages in thread
From: Cheatham, Benjamin @ 2026-02-11 22:12 UTC (permalink / raw)
To: alejandro.lucero-palau
Cc: Alejandro Lucero, linux-cxl, netdev, dan.j.williams, edward.cree,
davem, kuba, pabeni, edumazet, dave.jiang
On 2/1/2026 9:54 AM, alejandro.lucero-palau@amd.com wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> This patchset should be applied on the cxl next branch using the base
> specified at the end of this cover letter.
>
> Dependencies on Dan's work has gone and also on Terry's as the only
> patch required is now in next. The other dependency is on Smita patchset
> but it does not exist such a dependency as that work will not avoid the
> problem with Type2 and DAX/hmem if soft reserved memory. This needs to
> be solved by the BIOS and Type2 UEFI driver for populating the CXL.mem
> range as EFI_RESERVED_TYPE instead of default EFI_CONVENTIONAL_MEMORY
> with the EFI_MEMORY_SP attribute. There exists though a dependency on
> one Smita's patches:
>
> [PATCH v5 3/7] cxl/region: Skip decoder reset on detach for autodiscovered regions
>
> This is needed for the default behaviour with current BIOS configuration
> where the HDM Type2 decoders will be kept unreset when driver unloads.
> This is the main change introduced in v23: committed decoders will not
> be reset. Previous v22 functionality supported first driver load finding
> committed decoders but resetting them at unload and supporting
> uncommitted decoders in next driver loads. This will be suported in
> follow-up works.
>
> v23 changes:
>
> patch 11: fixing minor issues and droping change in
> should_emulate_decoders (Jonathan Cameron)
>
> patch13: refactoring unregister_region for safety type in Type2 API
>
> sfc changes: slight modifications to error path
>
This cover letter is really long; I'd remove the change logs for anything more
than 3 revisions back (assuming a v24 is needed). After that you could leave
a lore link to older revisions if you want, but it's not needed imo.
Also, feel free to add my Reviewed-by for anything I didn't leave a comment on
(felt I should cut down on the mail).
Thanks,
Ben
* Re: [PATCH v23 14/22] cxl: Define a driver interface for DPA allocation
2026-02-01 15:54 ` [PATCH v23 14/22] cxl: Define a driver interface for DPA allocation alejandro.lucero-palau
2026-02-11 22:10 ` Cheatham, Benjamin
@ 2026-02-11 22:12 ` Cheatham, Benjamin
2026-02-19 10:26 ` Alejandro Lucero Palau
2026-02-13 16:14 ` [PATCH " Gregory Price
2 siblings, 1 reply; 67+ messages in thread
From: Cheatham, Benjamin @ 2026-02-11 22:12 UTC (permalink / raw)
To: alejandro.lucero-palau
Cc: Alejandro Lucero, Jonathan Cameron, linux-cxl, netdev,
dan.j.williams, edward.cree, davem, kuba, pabeni, edumazet,
dave.jiang
On 2/1/2026 9:54 AM, alejandro.lucero-palau@amd.com wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Region creation involves finding available DPA (device-physical-address)
> capacity to map into HPA (host-physical-address) space.
>
> In order to support CXL Type2 devices, define an API, cxl_request_dpa(),
> that tries to allocate the DPA memory the driver requires to operate.The
> memory requested should not be bigger than the max available HPA obtained
> previously with cxl_get_hpa_freespace().
>
> Based on https://lore.kernel.org/linux-cxl/168592158743.1948938.7622563891193802610.stgit@dwillia2-xfh.jf.intel.com/
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> ---
> drivers/cxl/core/hdm.c | 84 ++++++++++++++++++++++++++++++++++++++++++
> drivers/cxl/cxl.h | 1 +
> include/cxl/cxl.h | 5 +++
> 3 files changed, 90 insertions(+)
>
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index a172ce4e9b19..d60a697f12cc 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -3,6 +3,7 @@
> #include <linux/seq_file.h>
> #include <linux/device.h>
> #include <linux/delay.h>
> +#include <cxl/cxl.h>
>
> #include "cxlmem.h"
> #include "core.h"
> @@ -546,6 +547,12 @@ bool cxl_resource_contains_addr(const struct resource *res, const resource_size_
> return resource_contains(res, &_addr);
> }
>
> +/**
> + * cxl_dpa_free - release DPA (Device Physical Address)
> + * @cxled: endpoint decoder linked to the DPA
> + *
> + * Returns 0 or error.
> + */
> int cxl_dpa_free(struct cxl_endpoint_decoder *cxled)
> {
> struct cxl_port *port = cxled_to_port(cxled);
> @@ -572,6 +579,7 @@ int cxl_dpa_free(struct cxl_endpoint_decoder *cxled)
> devm_cxl_dpa_release(cxled);
> return 0;
> }
> +EXPORT_SYMBOL_NS_GPL(cxl_dpa_free, "CXL");
>
> int cxl_dpa_set_part(struct cxl_endpoint_decoder *cxled,
> enum cxl_partition_mode mode)
> @@ -603,6 +611,82 @@ int cxl_dpa_set_part(struct cxl_endpoint_decoder *cxled,
> return 0;
> }
>
> +static int find_free_decoder(struct device *dev, const void *data)
> +{
> + struct cxl_endpoint_decoder *cxled;
> + struct cxl_port *port;
> +
> + if (!is_endpoint_decoder(dev))
> + return 0;
> +
> + cxled = to_cxl_endpoint_decoder(dev);
> + port = cxled_to_port(cxled);
> +
> + return cxled->cxld.id == (port->hdm_end + 1);
> +}
> +
> +static struct cxl_endpoint_decoder *
> +cxl_find_free_decoder(struct cxl_memdev *cxlmd)
> +{
> + struct cxl_port *endpoint = cxlmd->endpoint;
> + struct device *dev;
> +
> + guard(rwsem_read)(&cxl_rwsem.dpa);
> + dev = device_find_child(&endpoint->dev, NULL,
> + find_free_decoder);
> + if (!dev)
> + return NULL;
> +
> + return to_cxl_endpoint_decoder(dev);
> +}
> +
> +/**
> + * cxl_request_dpa - search and reserve DPA given input constraints
> + * @cxlmd: memdev with an endpoint port with available decoders
> + * @mode: CXL partition mode (ram vs pmem)
> + * @alloc: dpa size required
> + *
> + * Returns a pointer to a 'struct cxl_endpoint_decoder' on success or
> + * an errno encoded pointer on failure.
> + *
> + * Given that a region needs to allocate from limited HPA capacity it
> + * may be the case that a device has more mappable DPA capacity than
> + * available HPA. The expectation is that @alloc is a driver known
> + * value based on the device capacity but which could not be fully
> + * available due to HPA constraints.
> + *
> + * Returns a pinned cxl_decoder with at least @alloc bytes of capacity
> + * reserved, or an error pointer. The caller is also expected to own the
> + * lifetime of the memdev registration associated with the endpoint to
> + * pin the decoder registered as well.
> + */
> +struct cxl_endpoint_decoder *cxl_request_dpa(struct cxl_memdev *cxlmd,
> + enum cxl_partition_mode mode,
> + resource_size_t alloc)
> +{
> + int rc;
> +
> + if (!IS_ALIGNED(alloc, SZ_256M))
> + return ERR_PTR(-EINVAL);
> +
> + struct cxl_endpoint_decoder *cxled __free(put_cxled) =
> + cxl_find_free_decoder(cxlmd);
> +
> + if (!cxled)
> + return ERR_PTR(-ENODEV);
> +
> + rc = cxl_dpa_set_part(cxled, mode);
> + if (rc)
> + return ERR_PTR(rc);
> +
> + rc = cxl_dpa_alloc(cxled, alloc);
> + if (rc)
> + return ERR_PTR(rc);
Should cxl_dpa_set_part() be unwound here, or does it not matter? If it doesn't matter:
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
> +
> + return no_free_ptr(cxled);
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_request_dpa, "CXL");
> +
> static int __cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, u64 size)
> {
> struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index d1b010e5e1d0..2b1f7d687a0e 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -667,6 +667,7 @@ struct cxl_root *find_cxl_root(struct cxl_port *port);
>
> DEFINE_FREE(put_cxl_root, struct cxl_root *, if (_T) put_device(&_T->port.dev))
> DEFINE_FREE(put_cxl_port, struct cxl_port *, if (!IS_ERR_OR_NULL(_T)) put_device(&_T->dev))
> +DEFINE_FREE(put_cxled, struct cxl_endpoint_decoder *, if (!IS_ERR_OR_NULL(_T)) put_device(&_T->cxld.dev))
> DEFINE_FREE(put_cxl_root_decoder, struct cxl_root_decoder *, if (!IS_ERR_OR_NULL(_T)) put_device(&_T->cxlsd.cxld.dev))
> DEFINE_FREE(put_cxl_region, struct cxl_region *, if (!IS_ERR_OR_NULL(_T)) put_device(&_T->dev))
>
> diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
> index 783ad570a6eb..4802371db00e 100644
> --- a/include/cxl/cxl.h
> +++ b/include/cxl/cxl.h
> @@ -7,6 +7,7 @@
>
> #include <linux/node.h>
> #include <linux/ioport.h>
> +#include <linux/range.h>
> #include <cxl/mailbox.h>
>
> /**
> @@ -276,4 +277,8 @@ struct cxl_root_decoder *cxl_get_hpa_freespace(struct cxl_memdev *cxlmd,
> unsigned long flags,
> resource_size_t *max);
> void cxl_put_root_decoder(struct cxl_root_decoder *cxlrd);
> +struct cxl_endpoint_decoder *cxl_request_dpa(struct cxl_memdev *cxlmd,
> + enum cxl_partition_mode mode,
> + resource_size_t alloc);
> +int cxl_dpa_free(struct cxl_endpoint_decoder *cxled);
> #endif /* __CXL_CXL_H__ */
* Re: [PATCH v23 08/22] cxl/hdm: Add support for getting region from committed decoder
2026-02-11 22:11 ` Cheatham, Benjamin
@ 2026-02-12 9:16 ` Alejandro Lucero Palau
2026-03-09 22:49 ` PJ Waskiewicz
0 siblings, 1 reply; 67+ messages in thread
From: Alejandro Lucero Palau @ 2026-02-12 9:16 UTC (permalink / raw)
To: Cheatham, Benjamin, alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
On 2/11/26 22:11, Cheatham, Benjamin wrote:
> On 2/1/2026 9:54 AM, alejandro.lucero-palau@amd.com wrote:
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> A Type2 device configured by the BIOS can already have its HDM
>> committed. Add a cxl_get_committed_decoder() function for cheking
>> so after memdev creation. A CXL region should have been created
>> during memdev initialization, therefore a Type2 driver can ask for
>> such a region for working with the HPA. If the HDM is not committed,
>> a Type2 driver will create the region after obtaining proper HPA
>> and DPA space.
>>
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>> ---
>> drivers/cxl/core/hdm.c | 39 +++++++++++++++++++++++++++++++++++++++
>> include/cxl/cxl.h | 3 +++
>> 2 files changed, 42 insertions(+)
>>
>> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
>> index 6e516c69b2d2..a172ce4e9b19 100644
>> --- a/drivers/cxl/core/hdm.c
>> +++ b/drivers/cxl/core/hdm.c
>> @@ -686,6 +686,45 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, u64 size)
>> return devm_add_action_or_reset(&port->dev, cxl_dpa_release, cxled);
>> }
>>
>> +static int find_committed_endpoint_decoder(struct device *dev, const void *data)
>> +{
>> + struct cxl_endpoint_decoder *cxled;
>> + struct cxl_port *port;
>> +
>> + if (!is_endpoint_decoder(dev))
>> + return 0;
>> +
>> + cxled = to_cxl_endpoint_decoder(dev);
>> + port = cxled_to_port(cxled);
>> +
>> + return cxled->cxld.id == port->hdm_end;
> Is this the way you're supposed to check if a decoder is committed? The doc comment for @hdm_end in
> struct cxl_port says it's just the last allocated decoder. If allocated decoders are always committed then
> I'm fine with this, otherwise I think you'd want to a register read or something to find the commit state.
Hi Ben,
Yes, I think you are right. This works in my tests and it is safe
because I check that the region exists before using it. But the error
inside sfc should then not be fatal for sfc CXL initialization, and
should instead fall back to the other CXL initialization path.
If I add the check for the decoder state, I guess I can keep the
function names. If I rely on the region being there, I should change
them. I will think about it.
This also raises the question of what happens if more than one HDM
decoder is present. That is not needed in my use case, and likely the
same is true for other upcoming Type2 devices, but it also requires
further thought.
Thank you!
>> +}
>> +
>> +struct cxl_endpoint_decoder *cxl_get_committed_decoder(struct cxl_memdev *cxlmd,
>> + struct cxl_region **cxlr)
>> +{
>> + struct cxl_port *endpoint = cxlmd->endpoint;
>> + struct cxl_endpoint_decoder *cxled;
>> + struct device *cxled_dev;
>> +
>> + if (!endpoint)
>> + return NULL;
>> +
>> + guard(rwsem_read)(&cxl_rwsem.dpa);
>> + cxled_dev = device_find_child(&endpoint->dev, NULL,
>> + find_committed_endpoint_decoder);
>> +
>> + if (!cxled_dev)
>> + return NULL;
>> +
>> + cxled = to_cxl_endpoint_decoder(cxled_dev);
>> + *cxlr = cxled->cxld.region;
>> +
>> + put_device(cxled_dev);
>> + return cxled;
>> +}
>> +EXPORT_SYMBOL_NS_GPL(cxl_get_committed_decoder, "CXL");
>> +
>> static void cxld_set_interleave(struct cxl_decoder *cxld, u32 *ctrl)
>> {
>> u16 eig;
>> diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
>> index 6f8d365067af..928276dba952 100644
>> --- a/include/cxl/cxl.h
>> +++ b/include/cxl/cxl.h
>> @@ -249,4 +249,7 @@ int cxl_map_component_regs(const struct cxl_register_map *map,
>> int cxl_set_capacity(struct cxl_dev_state *cxlds, u64 capacity);
>> struct cxl_memdev *devm_cxl_add_memdev(struct cxl_dev_state *cxlds,
>> const struct cxl_memdev_attach *attach);
>> +struct cxl_region;
>> +struct cxl_endpoint_decoder *cxl_get_committed_decoder(struct cxl_memdev *cxlmd,
>> + struct cxl_region **cxlr);
>> #endif /* __CXL_CXL_H__ */
* Re: [PATCH 14/22] cxl: Define a driver interface for DPA allocation
2026-02-01 15:54 ` [PATCH v23 14/22] cxl: Define a driver interface for DPA allocation alejandro.lucero-palau
2026-02-11 22:10 ` Cheatham, Benjamin
2026-02-11 22:12 ` Cheatham, Benjamin
@ 2026-02-13 16:14 ` Gregory Price
2026-02-16 12:34 ` Alejandro Lucero Palau
2 siblings, 1 reply; 67+ messages in thread
From: Gregory Price @ 2026-02-13 16:14 UTC (permalink / raw)
To: Alejandro Lucero; +Cc: linux-cxl, netdev, linux-kernel
This review was generated by kreview-0811365ff2.
This is not an automated email; the reviewer thought this report
looked valid enough to consider discussion.
----
Cursory browse, this does look like a legitimate reference leak.
~Gregory
----
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index a172ce4e9b19..d60a697f12cc 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
[ ... ]
> +static struct cxl_endpoint_decoder *
> +cxl_find_free_decoder(struct cxl_memdev *cxlmd)
> +{
> + struct cxl_port *endpoint = cxlmd->endpoint;
> + struct device *dev;
> +
> + guard(rwsem_read)(&cxl_rwsem.dpa);
> + dev = device_find_child(&endpoint->dev, NULL,
> + find_free_decoder);
> + if (!dev)
> + return NULL;
> +
> + return to_cxl_endpoint_decoder(dev);
> +}
[ ... ]
> +struct cxl_endpoint_decoder *cxl_request_dpa(struct cxl_memdev *cxlmd,
> + enum cxl_partition_mode mode,
> + resource_size_t alloc)
> +{
> + int rc;
> +
> + if (!IS_ALIGNED(alloc, SZ_256M))
> + return ERR_PTR(-EINVAL);
> +
> + struct cxl_endpoint_decoder *cxled __free(put_cxled) =
> + cxl_find_free_decoder(cxlmd);
> +
> + if (!cxled)
> + return ERR_PTR(-ENODEV);
> +
> + rc = cxl_dpa_set_part(cxled, mode);
> + if (rc)
> + return ERR_PTR(rc);
> +
> + rc = cxl_dpa_alloc(cxled, alloc);
> + if (rc)
> + return ERR_PTR(rc);
> +
> + return no_free_ptr(cxled);
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_request_dpa, "CXL");
Does this leak the device reference obtained by device_find_child()
in cxl_find_free_decoder()?
device_find_child() increments the device refcount (reference A).
Then cxl_dpa_alloc() -> __cxl_dpa_reserve() calls get_device() on
the same device (reference B).
On the success path, no_free_ptr() inhibits the put_cxled cleanup,
so reference A is transferred to the caller. The matching cleanup
function cxl_dpa_free() calls __cxl_dpa_release(), which drops
reference B via put_device(), but reference A is never released:
cxl_dpa_free()
-> devm_cxl_dpa_release()
-> __cxl_dpa_release()
-> put_device() /* drops reference B only */
/* reference A from device_find_child() is still held */
The only caller in this series (sfc efx_cxl_init/efx_cxl_exit)
never calls put_device() on the returned cxled either, so the
endpoint decoder device refcount remains permanently elevated.
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 21/22] sfc: create cxl region
2026-02-01 15:54 ` [PATCH v23 21/22] sfc: create cxl region alejandro.lucero-palau
@ 2026-02-13 16:14 ` Gregory Price
2026-02-20 8:00 ` Alejandro Lucero Palau
0 siblings, 1 reply; 67+ messages in thread
From: Gregory Price @ 2026-02-13 16:14 UTC (permalink / raw)
To: Alejandro Lucero; +Cc: linux-cxl, netdev, linux-kernel
This review was generated by kreview-0811365ff2.
This is not an automated email; the reviewer thought this report
looked valid enough to consider discussion.
----
sfc code so won't speak to the internals, but ordering question seems
legit.
~Gregory
----
On Sat, Feb 01, 2026, Alejandro Lucero wrote:
> diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
> index 2cfd0a46225f..4d5f3974e51d 100644
> --- a/drivers/net/ethernet/sfc/efx_cxl.c
> +++ b/drivers/net/ethernet/sfc/efx_cxl.c
[ ... ]
> @@ -147,11 +155,11 @@ void efx_cxl_exit(struct efx_probe_data *probe_data)
>
> if (probe_data->cxl->hdm_was_committed) {
> iounmap(probe_data->cxl->ctpio_cxl);
> - cxl_unregister_region(probe_data->cxl->efx_region);
> } else {
> cxl_dpa_free(probe_data->cxl->cxled);
> cxl_put_root_decoder(probe_data->cxl->cxlrd);
> }
> + cxl_unregister_region(probe_data->cxl->efx_region);
In the non-committed path, cxl_dpa_free() is called while the
endpoint decoder is still attached to the region. Looking at
cxl_dpa_free() in drivers/cxl/core/hdm.c:
if (cxled->cxld.region) {
dev_dbg(dev, "decoder assigned to: %s\n",
dev_name(&cxled->cxld.region->dev));
return -EBUSY;
}
Since cxl_unregister_region() has not run yet, cxled->cxld.region
is still set, and cxl_dpa_free() returns -EBUSY without freeing
the DPA. The return value is not checked.
Should cxl_unregister_region() be called before cxl_dpa_free()
and cxl_put_root_decoder() in the else branch, matching the
reverse order of allocation in efx_cxl_init()?
The cover letter notes that v23 expects committed decoders as the
primary flow, and uncommitted decoder support is deferred to
follow-up work, so this else branch may not be reachable in
practice today. Still worth fixing the ordering now so it
doesn't bite when the uncommitted path is enabled later.
This issue is not fixed by the remaining commits in the series
(through 10fe989f9e85).
* Re: [PATCH 22/22] sfc: support pio mapping based on cxl
2026-02-01 15:54 ` [PATCH v23 22/22] sfc: support pio mapping based on cxl alejandro.lucero-palau
@ 2026-02-13 16:14 ` Gregory Price
2026-02-20 8:04 ` Alejandro Lucero Palau
0 siblings, 1 reply; 67+ messages in thread
From: Gregory Price @ 2026-02-13 16:14 UTC (permalink / raw)
To: Alejandro Lucero; +Cc: linux-cxl, netdev, linux-kernel
This review was generated by kreview-0811365ff2.
This is not an automated email; the reviewer thought this report
looked valid enough to consider discussion.
----
I am completely unfamiliar with this code, but the question it poses
at least seems reasonable.
~Gregory
----
On Sat, Feb 01, 2026, Alejandro Lucero wrote:
> diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
> index fcec81f862ec..2bb6d3136c7c 100644
> --- a/drivers/net/ethernet/sfc/ef10.c
> +++ b/drivers/net/ethernet/sfc/ef10.c
[ ... ]
> @@ -1263,8 +1281,25 @@ static int efx_ef10_dimension_resources(struct efx_nic *efx)
> iounmap(efx->membase);
> efx->membase = membase;
>
> - /* Set up the WC mapping if needed */
> - if (wc_mem_map_size) {
> + if (!wc_mem_map_size)
> + goto skip_pio;
> +
> + /* Set up the WC mapping */
> +
> +#ifdef CONFIG_SFC_CXL
> + probe_data = container_of(efx, struct efx_probe_data, efx);
> + if ((nic_data->datapath_caps3 &
> + (1 << MC_CMD_GET_CAPABILITIES_V7_OUT_CXL_CONFIG_ENABLE_LBN)) &&
> + probe_data->cxl_pio_initialised) {
> + /* Using PIO through CXL mapping? */
> + nic_data->pio_write_base = probe_data->cxl->ctpio_cxl +
> + (pio_write_vi_base * efx->vi_stride +
> + ER_DZ_TX_PIOBUF - uc_mem_map_size);
> + probe_data->cxl_pio_in_use = true;
> + } else
> +#endif
> + {
> + /* Using legacy PIO BAR mapping */
> nic_data->wc_membase = ioremap_wc(efx->membase_phys +
> uc_mem_map_size,
> wc_mem_map_size);
> @@ -1279,12 +1314,13 @@ static int efx_ef10_dimension_resources(struct efx_nic *efx)
> nic_data->wc_membase +
> (pio_write_vi_base * efx->vi_stride + ER_DZ_TX_PIOBUF -
> uc_mem_map_size);
> -
> - rc = efx_ef10_link_piobufs(efx);
> - if (rc)
> - efx_ef10_free_piobufs(efx);
> }
The CXL path sets nic_data->pio_write_base but does not set
nic_data->pio_write_vi_base, while the legacy path does:
nic_data->pio_write_vi_base = pio_write_vi_base;
Since nic_data is kzalloc'd, pio_write_vi_base stays at 0 in the CXL
path. efx_ef10_link_piobufs() then uses nic_data->pio_write_vi_base
to issue MC_CMD_LINK_PIOBUF commands:
MCDI_SET_DWORD(inbuf, LINK_PIOBUF_IN_TXQ_INSTANCE,
nic_data->pio_write_vi_base + index);
and also for the special-case check:
if (tx_queue->queue == nic_data->pio_write_vi_base) {
Wouldn't this link PIO buffers to incorrect VI instances when using
CXL, since the local variable pio_write_vi_base has the correct
non-zero value but the struct field was never updated?
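The failure mode being asked about can be modeled outside the driver
with a small sketch. The names mirror the sfc code but this is not the
real driver; it only shows how a kzalloc'd field diverges from the
correct local variable when the store is skipped:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, much-simplified model of the sfc nic_data state. */
struct nic_data_model {
	unsigned int pio_write_vi_base;	/* zeroed, as with kzalloc() */
};

/* CXL path as posted: derives pio_write_base from the LOCAL
 * pio_write_vi_base but never stores it in the struct, so later
 * readers of the field see the stale zero. */
static unsigned int cxl_path_vi_base(struct nic_data_model *nd,
				     unsigned int pio_write_vi_base)
{
	(void)pio_write_vi_base;	/* used only for the write base */
	return nd->pio_write_vi_base;	/* what link_piobufs() would read */
}

/* Legacy path: stores the local value before anyone reads the field. */
static unsigned int legacy_path_vi_base(struct nic_data_model *nd,
					unsigned int pio_write_vi_base)
{
	nd->pio_write_vi_base = pio_write_vi_base;
	return nd->pio_write_vi_base;
}
```

With a non-zero VI base, the first helper returns 0 while the second
returns the intended value, which is exactly the mismatch the review
points at.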
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH 14/22] cxl: Define a driver interface for DPA allocation
2026-02-13 16:14 ` [PATCH " Gregory Price
@ 2026-02-16 12:34 ` Alejandro Lucero Palau
0 siblings, 0 replies; 67+ messages in thread
From: Alejandro Lucero Palau @ 2026-02-16 12:34 UTC (permalink / raw)
To: Gregory Price; +Cc: linux-cxl, netdev, linux-kernel
On 2/13/26 16:14, Gregory Price wrote:
> This review was generated by kreview-0811365ff2.
>
> This is not an automated email; the reviewer thought this report
> looked valid enough to consider discussion.
>
> ----
>
> Cursory browse, this does look like a legitimate reference leak.
>
> ~Gregory
>
> ----
>
>> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
>> index a172ce4e9b19..d60a697f12cc 100644
>> --- a/drivers/cxl/core/hdm.c
>> +++ b/drivers/cxl/core/hdm.c
> [ ... ]
>
>> +static struct cxl_endpoint_decoder *
>> +cxl_find_free_decoder(struct cxl_memdev *cxlmd)
>> +{
>> + struct cxl_port *endpoint = cxlmd->endpoint;
>> + struct device *dev;
>> +
>> + guard(rwsem_read)(&cxl_rwsem.dpa);
>> + dev = device_find_child(&endpoint->dev, NULL,
>> + find_free_decoder);
>> + if (!dev)
>> + return NULL;
>> +
>> + return to_cxl_endpoint_decoder(dev);
>> +}
> [ ... ]
>
>> +struct cxl_endpoint_decoder *cxl_request_dpa(struct cxl_memdev *cxlmd,
>> + enum cxl_partition_mode mode,
>> + resource_size_t alloc)
>> +{
>> + int rc;
>> +
>> + if (!IS_ALIGNED(alloc, SZ_256M))
>> + return ERR_PTR(-EINVAL);
>> +
>> + struct cxl_endpoint_decoder *cxled __free(put_cxled) =
>> + cxl_find_free_decoder(cxlmd);
>> +
>> + if (!cxled)
>> + return ERR_PTR(-ENODEV);
>> +
>> + rc = cxl_dpa_set_part(cxled, mode);
>> + if (rc)
>> + return ERR_PTR(rc);
>> +
>> + rc = cxl_dpa_alloc(cxled, alloc);
>> + if (rc)
>> + return ERR_PTR(rc);
>> +
>> + return no_free_ptr(cxled);
>> +}
>> +EXPORT_SYMBOL_NS_GPL(cxl_request_dpa, "CXL");
> Does this leak the device reference obtained by device_find_child()
> in cxl_find_free_decoder()?
>
> device_find_child() increments the device refcount (reference A).
> Then cxl_dpa_alloc() -> __cxl_dpa_reserve() calls get_device() on
> the same device (reference B).
>
> On the success path, no_free_ptr() inhibits the put_cxled cleanup,
> so reference A is transferred to the caller. The matching cleanup
> function cxl_dpa_free() calls __cxl_dpa_release(), which drops
> reference B via put_device(), but reference A is never released:
>
> cxl_dpa_free()
> -> devm_cxl_dpa_release()
> -> __cxl_dpa_release()
> -> put_device() /* drops reference B only */
>
> /* reference A from device_find_child() is still held */
>
> The only caller in this series (sfc efx_cxl_init/efx_cxl_exit)
> never calls put_device() on the returned cxled either, so the
> endpoint decoder device refcount remains permanently elevated.
This is right, and it took quite some time to debug. Was it detected
by an automated tool?
Anyway, I had a patch solving this which I forgot to apply to v23,
since the focus there was mainly on supporting the auto-discovered
region, which does not go through this path:
+ /* removing the reference from cxl_find_free_decoder ...
+ * when alloc succeeds another get happened
+ */
+
+ put_device(&cxled->cxld.dev);
I added that comment because it is not obvious whether it is right to
do the put while a new reference to the device is being taken. I will
apply it.
Thanks!
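For illustration, the refcount imbalance described above can be modeled
with a toy counter; the toy_* names are stand-ins, not kernel APIs, and
the comments map each step to the call Gregory identified:

```c
#include <assert.h>

/* Toy model of the decoder device refcount flow. */
struct toy_dev { int refcount; };

static void toy_get(struct toy_dev *d) { d->refcount++; }
static void toy_put(struct toy_dev *d) { d->refcount--; }

/* Allocation path as posted: A (device_find_child) and
 * B (get_device in __cxl_dpa_reserve) are both taken. */
static void request_dpa_leaky(struct toy_dev *d)
{
	toy_get(d);	/* A: device_find_child() */
	toy_get(d);	/* B: __cxl_dpa_reserve() */
}

/* With the proposed fix: drop A once B pins the decoder. */
static void request_dpa_fixed(struct toy_dev *d)
{
	toy_get(d);	/* A */
	toy_get(d);	/* B */
	toy_put(d);	/* put_device(&cxled->cxld.dev): releases A */
}

/* Teardown (cxl_dpa_free -> __cxl_dpa_release): drops B only. */
static void dpa_free(struct toy_dev *d)
{
	toy_put(d);
}
```

Running the leaky pair leaves the counter one above its starting value,
while the fixed pair returns it to baseline, matching the analysis in
the review.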
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH v23 01/22] cxl: Add type2 device basic support
2026-02-11 22:11 ` Cheatham, Benjamin
@ 2026-02-19 8:52 ` Alejandro Lucero Palau
0 siblings, 0 replies; 67+ messages in thread
From: Alejandro Lucero Palau @ 2026-02-19 8:52 UTC (permalink / raw)
To: Cheatham, Benjamin, alejandro.lucero-palau
Cc: Jonathan Cameron, Alison Schofield, linux-cxl, netdev,
dan.j.williams, edward.cree, davem, kuba, pabeni, edumazet,
dave.jiang
On 2/11/26 22:11, Cheatham, Benjamin wrote:
> On 2/1/2026 9:54 AM, alejandro.lucero-palau@amd.com wrote:
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> Differentiate CXL memory expanders (type 3) from CXL device accelerators
>> (type 2) with a new function for initializing cxl_dev_state and a macro
>> for helping accel drivers to embed cxl_dev_state inside a private
>> struct.
>>
>> Move structs to include/cxl as the size of the accel driver private
>> struct embedding cxl_dev_state needs to know the size of this struct.
>>
>> Use same new initialization with the type3 pci driver.
>>
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
>> Reviewed-by: Alison Schofield <alison.schofield@intel.com>
>> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
>> ---
>> drivers/cxl/core/mbox.c | 12 +-
>> drivers/cxl/core/memdev.c | 32 +++++
>> drivers/cxl/cxl.h | 97 +--------------
>> drivers/cxl/cxlmem.h | 86 +------------
>> drivers/cxl/pci.c | 14 +--
>> include/cxl/cxl.h | 226 +++++++++++++++++++++++++++++++++++
>> tools/testing/cxl/test/mem.c | 3 +-
>> 7 files changed, 274 insertions(+), 196 deletions(-)
>> create mode 100644 include/cxl/cxl.h
>>
>> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
>> index fa6dd0c94656..bee84d0101d1 100644
>> --- a/drivers/cxl/core/mbox.c
>> +++ b/drivers/cxl/core/mbox.c
>> @@ -1514,23 +1514,21 @@ int cxl_mailbox_init(struct cxl_mailbox *cxl_mbox, struct device *host)
>> }
>> EXPORT_SYMBOL_NS_GPL(cxl_mailbox_init, "CXL");
>>
>> -struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev)
>> +struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev, u64 serial,
>> + u16 dvsec)
>> {
>> struct cxl_memdev_state *mds;
>> int rc;
>>
>> - mds = devm_kzalloc(dev, sizeof(*mds), GFP_KERNEL);
>> + mds = devm_cxl_dev_state_create(dev, CXL_DEVTYPE_CLASSMEM, serial,
>> + dvsec, struct cxl_memdev_state, cxlds,
>> + true);
>> if (!mds) {
>> dev_err(dev, "No memory available\n");
>> return ERR_PTR(-ENOMEM);
>> }
>>
>> mutex_init(&mds->event.log_lock);
>> - mds->cxlds.dev = dev;
>> - mds->cxlds.reg_map.host = dev;
>> - mds->cxlds.cxl_mbox.host = dev;
>> - mds->cxlds.reg_map.resource = CXL_RESOURCE_NONE;
>> - mds->cxlds.type = CXL_DEVTYPE_CLASSMEM;
>>
>> rc = devm_cxl_register_mce_notifier(dev, &mds->mce_notifier);
>> if (rc == -EOPNOTSUPP)
>> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
>> index af3d0cc65138..22d156f25305 100644
>> --- a/drivers/cxl/core/memdev.c
>> +++ b/drivers/cxl/core/memdev.c
>> @@ -656,6 +656,38 @@ static void detach_memdev(struct work_struct *work)
>>
>> static struct lock_class_key cxl_memdev_key;
>>
>> +static void cxl_dev_state_init(struct cxl_dev_state *cxlds, struct device *dev,
>> + enum cxl_devtype type, u64 serial, u16 dvsec,
>> + bool has_mbox)
>> +{
>> + *cxlds = (struct cxl_dev_state) {
>> + .dev = dev,
>> + .type = type,
>> + .serial = serial,
>> + .cxl_dvsec = dvsec,
>> + .reg_map.host = dev,
>> + .reg_map.resource = CXL_RESOURCE_NONE,
>> + };
>> +
>> + if (has_mbox)
>> + cxlds->cxl_mbox.host = dev;
>> +}
>> +
>> +struct cxl_dev_state *_devm_cxl_dev_state_create(struct device *dev,
>> + enum cxl_devtype type,
>> + u64 serial, u16 dvsec,
>> + size_t size, bool has_mbox)
>> +{
>> + struct cxl_dev_state *cxlds = devm_kzalloc(dev, size, GFP_KERNEL);
>> +
>> + if (!cxlds)
>> + return NULL;
>> +
>> + cxl_dev_state_init(cxlds, dev, type, serial, dvsec, has_mbox);
> Nit: Having a second function to do the init seems overkill here, especially since cxl_dev_state_init() isn't called outside this
> function. I'd fold it into this function instead, but I'm fine with it either way (especially if you were told otherwise before).
Hi Ben,
I do not remember why this was done this way. Maybe there was some
initial need which disappeared later.
I cannot see a reason now, so I will fold it in for v24.
Thank you!
>
> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
>> + return cxlds;
>> +}
>> +EXPORT_SYMBOL_NS_GPL(_devm_cxl_dev_state_create, "CXL");
>> +
>> static struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds,
>> const struct file_operations *fops,
>> const struct cxl_memdev_attach *attach)
>> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
>> index e1d47062e1d3..3eaa353e430b 100644
>> --- a/drivers/cxl/cxl.h
>> +++ b/drivers/cxl/cxl.h
>> @@ -12,6 +12,7 @@
>> #include <linux/node.h>
>> #include <linux/io.h>
>> #include <linux/range.h>
>> +#include <cxl/cxl.h>
>>
>> extern const struct nvdimm_security_ops *cxl_security_ops;
>>
>> @@ -201,97 +202,6 @@ static inline int ways_to_eiw(unsigned int ways, u8 *eiw)
>> #define CXLDEV_MBOX_BG_CMD_COMMAND_VENDOR_MASK GENMASK_ULL(63, 48)
>> #define CXLDEV_MBOX_PAYLOAD_OFFSET 0x20
>>
>> -/*
>> - * Using struct_group() allows for per register-block-type helper routines,
>> - * without requiring block-type agnostic code to include the prefix.
>> - */
>> -struct cxl_regs {
>> - /*
>> - * Common set of CXL Component register block base pointers
>> - * @hdm_decoder: CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure
>> - * @ras: CXL 2.0 8.2.5.9 CXL RAS Capability Structure
>> - */
>> - struct_group_tagged(cxl_component_regs, component,
>> - void __iomem *hdm_decoder;
>> - void __iomem *ras;
>> - );
>> - /*
>> - * Common set of CXL Device register block base pointers
>> - * @status: CXL 2.0 8.2.8.3 Device Status Registers
>> - * @mbox: CXL 2.0 8.2.8.4 Mailbox Registers
>> - * @memdev: CXL 2.0 8.2.8.5 Memory Device Registers
>> - */
>> - struct_group_tagged(cxl_device_regs, device_regs,
>> - void __iomem *status, *mbox, *memdev;
>> - );
>> -
>> - struct_group_tagged(cxl_pmu_regs, pmu_regs,
>> - void __iomem *pmu;
>> - );
>> -
>> - /*
>> - * RCH downstream port specific RAS register
>> - * @aer: CXL 3.0 8.2.1.1 RCH Downstream Port RCRB
>> - */
>> - struct_group_tagged(cxl_rch_regs, rch_regs,
>> - void __iomem *dport_aer;
>> - );
>> -
>> - /*
>> - * RCD upstream port specific PCIe cap register
>> - * @pcie_cap: CXL 3.0 8.2.1.2 RCD Upstream Port RCRB
>> - */
>> - struct_group_tagged(cxl_rcd_regs, rcd_regs,
>> - void __iomem *rcd_pcie_cap;
>> - );
>> -};
>> -
>> -struct cxl_reg_map {
>> - bool valid;
>> - int id;
>> - unsigned long offset;
>> - unsigned long size;
>> -};
>> -
>> -struct cxl_component_reg_map {
>> - struct cxl_reg_map hdm_decoder;
>> - struct cxl_reg_map ras;
>> -};
>> -
>> -struct cxl_device_reg_map {
>> - struct cxl_reg_map status;
>> - struct cxl_reg_map mbox;
>> - struct cxl_reg_map memdev;
>> -};
>> -
>> -struct cxl_pmu_reg_map {
>> - struct cxl_reg_map pmu;
>> -};
>> -
>> -/**
>> - * struct cxl_register_map - DVSEC harvested register block mapping parameters
>> - * @host: device for devm operations and logging
>> - * @base: virtual base of the register-block-BAR + @block_offset
>> - * @resource: physical resource base of the register block
>> - * @max_size: maximum mapping size to perform register search
>> - * @reg_type: see enum cxl_regloc_type
>> - * @component_map: cxl_reg_map for component registers
>> - * @device_map: cxl_reg_maps for device registers
>> - * @pmu_map: cxl_reg_maps for CXL Performance Monitoring Units
>> - */
>> -struct cxl_register_map {
>> - struct device *host;
>> - void __iomem *base;
>> - resource_size_t resource;
>> - resource_size_t max_size;
>> - u8 reg_type;
>> - union {
>> - struct cxl_component_reg_map component_map;
>> - struct cxl_device_reg_map device_map;
>> - struct cxl_pmu_reg_map pmu_map;
>> - };
>> -};
>> -
>> void cxl_probe_component_regs(struct device *dev, void __iomem *base,
>> struct cxl_component_reg_map *map);
>> void cxl_probe_device_regs(struct device *dev, void __iomem *base,
>> @@ -497,11 +407,6 @@ struct cxl_region_params {
>> resource_size_t cache_size;
>> };
>>
>> -enum cxl_partition_mode {
>> - CXL_PARTMODE_RAM,
>> - CXL_PARTMODE_PMEM,
>> -};
>> -
>> /*
>> * Indicate whether this region has been assembled by autodetection or
>> * userspace assembly. Prevent endpoint decoders outside of automatic
>> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
>> index ef202b34e5ea..281546de426e 100644
>> --- a/drivers/cxl/cxlmem.h
>> +++ b/drivers/cxl/cxlmem.h
>> @@ -113,8 +113,6 @@ int devm_cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
>> resource_size_t base, resource_size_t len,
>> resource_size_t skipped);
>>
>> -#define CXL_NR_PARTITIONS_MAX 2
>> -
>> struct cxl_dpa_info {
>> u64 size;
>> struct cxl_dpa_part_info {
>> @@ -373,87 +371,6 @@ struct cxl_security_state {
>> struct kernfs_node *sanitize_node;
>> };
>>
>> -/*
>> - * enum cxl_devtype - delineate type-2 from a generic type-3 device
>> - * @CXL_DEVTYPE_DEVMEM - Vendor specific CXL Type-2 device implementing HDM-D or
>> - * HDM-DB, no requirement that this device implements a
>> - * mailbox, or other memory-device-standard manageability
>> - * flows.
>> - * @CXL_DEVTYPE_CLASSMEM - Common class definition of a CXL Type-3 device with
>> - * HDM-H and class-mandatory memory device registers
>> - */
>> -enum cxl_devtype {
>> - CXL_DEVTYPE_DEVMEM,
>> - CXL_DEVTYPE_CLASSMEM,
>> -};
>> -
>> -/**
>> - * struct cxl_dpa_perf - DPA performance property entry
>> - * @dpa_range: range for DPA address
>> - * @coord: QoS performance data (i.e. latency, bandwidth)
>> - * @cdat_coord: raw QoS performance data from CDAT
>> - * @qos_class: QoS Class cookies
>> - */
>> -struct cxl_dpa_perf {
>> - struct range dpa_range;
>> - struct access_coordinate coord[ACCESS_COORDINATE_MAX];
>> - struct access_coordinate cdat_coord[ACCESS_COORDINATE_MAX];
>> - int qos_class;
>> -};
>> -
>> -/**
>> - * struct cxl_dpa_partition - DPA partition descriptor
>> - * @res: shortcut to the partition in the DPA resource tree (cxlds->dpa_res)
>> - * @perf: performance attributes of the partition from CDAT
>> - * @mode: operation mode for the DPA capacity, e.g. ram, pmem, dynamic...
>> - */
>> -struct cxl_dpa_partition {
>> - struct resource res;
>> - struct cxl_dpa_perf perf;
>> - enum cxl_partition_mode mode;
>> -};
>> -
>> -/**
>> - * struct cxl_dev_state - The driver device state
>> - *
>> - * cxl_dev_state represents the CXL driver/device state. It provides an
>> - * interface to mailbox commands as well as some cached data about the device.
>> - * Currently only memory devices are represented.
>> - *
>> - * @dev: The device associated with this CXL state
>> - * @cxlmd: The device representing the CXL.mem capabilities of @dev
>> - * @reg_map: component and ras register mapping parameters
>> - * @regs: Parsed register blocks
>> - * @cxl_dvsec: Offset to the PCIe device DVSEC
>> - * @rcd: operating in RCD mode (CXL 3.0 9.11.8 CXL Devices Attached to an RCH)
>> - * @media_ready: Indicate whether the device media is usable
>> - * @dpa_res: Overall DPA resource tree for the device
>> - * @part: DPA partition array
>> - * @nr_partitions: Number of DPA partitions
>> - * @serial: PCIe Device Serial Number
>> - * @type: Generic Memory Class device or Vendor Specific Memory device
>> - * @cxl_mbox: CXL mailbox context
>> - * @cxlfs: CXL features context
>> - */
>> -struct cxl_dev_state {
>> - struct device *dev;
>> - struct cxl_memdev *cxlmd;
>> - struct cxl_register_map reg_map;
>> - struct cxl_regs regs;
>> - int cxl_dvsec;
>> - bool rcd;
>> - bool media_ready;
>> - struct resource dpa_res;
>> - struct cxl_dpa_partition part[CXL_NR_PARTITIONS_MAX];
>> - unsigned int nr_partitions;
>> - u64 serial;
>> - enum cxl_devtype type;
>> - struct cxl_mailbox cxl_mbox;
>> -#ifdef CONFIG_CXL_FEATURES
>> - struct cxl_features_state *cxlfs;
>> -#endif
>> -};
>> -
>> static inline resource_size_t cxl_pmem_size(struct cxl_dev_state *cxlds)
>> {
>> /*
>> @@ -858,7 +775,8 @@ int cxl_dev_state_identify(struct cxl_memdev_state *mds);
>> int cxl_await_media_ready(struct cxl_dev_state *cxlds);
>> int cxl_enumerate_cmds(struct cxl_memdev_state *mds);
>> int cxl_mem_dpa_fetch(struct cxl_memdev_state *mds, struct cxl_dpa_info *info);
>> -struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev);
>> +struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev, u64 serial,
>> + u16 dvsec);
>> void set_exclusive_cxl_commands(struct cxl_memdev_state *mds,
>> unsigned long *cmds);
>> void clear_exclusive_cxl_commands(struct cxl_memdev_state *mds,
>> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
>> index 1cf232220873..24179cc702bf 100644
>> --- a/drivers/cxl/pci.c
>> +++ b/drivers/cxl/pci.c
>> @@ -911,25 +911,25 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>> int rc, pmu_count;
>> unsigned int i;
>> bool irq_avail;
>> + u16 dvsec;
>>
>> rc = pcim_enable_device(pdev);
>> if (rc)
>> return rc;
>> pci_set_master(pdev);
>>
>> - mds = cxl_memdev_state_create(&pdev->dev);
>> + dvsec = pci_find_dvsec_capability(pdev, PCI_VENDOR_ID_CXL,
>> + PCI_DVSEC_CXL_DEVICE);
>> + if (!dvsec)
>> + pci_warn(pdev, "Device DVSEC not present, skip CXL.mem init\n");
>> +
>> + mds = cxl_memdev_state_create(&pdev->dev, pci_get_dsn(pdev), dvsec);
>> if (IS_ERR(mds))
>> return PTR_ERR(mds);
>> cxlds = &mds->cxlds;
>> pci_set_drvdata(pdev, cxlds);
>>
>> cxlds->rcd = is_cxl_restricted(pdev);
>> - cxlds->serial = pci_get_dsn(pdev);
>> - cxlds->cxl_dvsec = pci_find_dvsec_capability(
>> - pdev, PCI_VENDOR_ID_CXL, PCI_DVSEC_CXL_DEVICE);
>> - if (!cxlds->cxl_dvsec)
>> - dev_warn(&pdev->dev,
>> - "Device DVSEC not present, skip CXL.mem init\n");
>>
>> rc = cxl_pci_setup_regs(pdev, CXL_REGLOC_RBI_MEMDEV, &map);
>> if (rc)
>> diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
>> new file mode 100644
>> index 000000000000..13d448686189
>> --- /dev/null
>> +++ b/include/cxl/cxl.h
>> @@ -0,0 +1,226 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +/* Copyright(c) 2020 Intel Corporation. */
>> +/* Copyright(c) 2025 Advanced Micro Devices, Inc. */
>> +
>> +#ifndef __CXL_CXL_H__
>> +#define __CXL_CXL_H__
>> +
>> +#include <linux/node.h>
>> +#include <linux/ioport.h>
>> +#include <cxl/mailbox.h>
>> +
>> +/**
>> + * enum cxl_devtype - delineate type-2 from a generic type-3 device
>> + * @CXL_DEVTYPE_DEVMEM: Vendor specific CXL Type-2 device implementing HDM-D or
>> + * HDM-DB, no requirement that this device implements a
>> + * mailbox, or other memory-device-standard manageability
>> + * flows.
>> + * @CXL_DEVTYPE_CLASSMEM: Common class definition of a CXL Type-3 device with
>> + * HDM-H and class-mandatory memory device registers
>> + */
>> +enum cxl_devtype {
>> + CXL_DEVTYPE_DEVMEM,
>> + CXL_DEVTYPE_CLASSMEM,
>> +};
>> +
>> +struct device;
>> +
>> +/*
>> + * Using struct_group() allows for per register-block-type helper routines,
>> + * without requiring block-type agnostic code to include the prefix.
>> + */
>> +struct cxl_regs {
>> + /*
>> + * Common set of CXL Component register block base pointers
>> + * @hdm_decoder: CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure
>> + * @ras: CXL 2.0 8.2.5.9 CXL RAS Capability Structure
>> + */
>> + struct_group_tagged(cxl_component_regs, component,
>> + void __iomem *hdm_decoder;
>> + void __iomem *ras;
>> + );
>> + /*
>> + * Common set of CXL Device register block base pointers
>> + * @status: CXL 2.0 8.2.8.3 Device Status Registers
>> + * @mbox: CXL 2.0 8.2.8.4 Mailbox Registers
>> + * @memdev: CXL 2.0 8.2.8.5 Memory Device Registers
>> + */
>> + struct_group_tagged(cxl_device_regs, device_regs,
>> + void __iomem *status, *mbox, *memdev;
>> + );
>> +
>> + struct_group_tagged(cxl_pmu_regs, pmu_regs,
>> + void __iomem *pmu;
>> + );
>> +
>> + /*
>> + * RCH downstream port specific RAS register
>> + * @aer: CXL 3.0 8.2.1.1 RCH Downstream Port RCRB
>> + */
>> + struct_group_tagged(cxl_rch_regs, rch_regs,
>> + void __iomem *dport_aer;
>> + );
>> +
>> + /*
>> + * RCD upstream port specific PCIe cap register
>> + * @pcie_cap: CXL 3.0 8.2.1.2 RCD Upstream Port RCRB
>> + */
>> + struct_group_tagged(cxl_rcd_regs, rcd_regs,
>> + void __iomem *rcd_pcie_cap;
>> + );
>> +};
>> +
>> +struct cxl_reg_map {
>> + bool valid;
>> + int id;
>> + unsigned long offset;
>> + unsigned long size;
>> +};
>> +
>> +struct cxl_component_reg_map {
>> + struct cxl_reg_map hdm_decoder;
>> + struct cxl_reg_map ras;
>> +};
>> +
>> +struct cxl_device_reg_map {
>> + struct cxl_reg_map status;
>> + struct cxl_reg_map mbox;
>> + struct cxl_reg_map memdev;
>> +};
>> +
>> +struct cxl_pmu_reg_map {
>> + struct cxl_reg_map pmu;
>> +};
>> +
>> +/**
>> + * struct cxl_register_map - DVSEC harvested register block mapping parameters
>> + * @host: device for devm operations and logging
>> + * @base: virtual base of the register-block-BAR + @block_offset
>> + * @resource: physical resource base of the register block
>> + * @max_size: maximum mapping size to perform register search
>> + * @reg_type: see enum cxl_regloc_type
>> + * @component_map: cxl_reg_map for component registers
>> + * @device_map: cxl_reg_maps for device registers
>> + * @pmu_map: cxl_reg_maps for CXL Performance Monitoring Units
>> + */
>> +struct cxl_register_map {
>> + struct device *host;
>> + void __iomem *base;
>> + resource_size_t resource;
>> + resource_size_t max_size;
>> + u8 reg_type;
>> + union {
>> + struct cxl_component_reg_map component_map;
>> + struct cxl_device_reg_map device_map;
>> + struct cxl_pmu_reg_map pmu_map;
>> + };
>> +};
>> +
>> +/**
>> + * struct cxl_dpa_perf - DPA performance property entry
>> + * @dpa_range: range for DPA address
>> + * @coord: QoS performance data (i.e. latency, bandwidth)
>> + * @cdat_coord: raw QoS performance data from CDAT
>> + * @qos_class: QoS Class cookies
>> + */
>> +struct cxl_dpa_perf {
>> + struct range dpa_range;
>> + struct access_coordinate coord[ACCESS_COORDINATE_MAX];
>> + struct access_coordinate cdat_coord[ACCESS_COORDINATE_MAX];
>> + int qos_class;
>> +};
>> +
>> +enum cxl_partition_mode {
>> + CXL_PARTMODE_RAM,
>> + CXL_PARTMODE_PMEM,
>> +};
>> +
>> +/**
>> + * struct cxl_dpa_partition - DPA partition descriptor
>> + * @res: shortcut to the partition in the DPA resource tree (cxlds->dpa_res)
>> + * @perf: performance attributes of the partition from CDAT
>> + * @mode: operation mode for the DPA capacity, e.g. ram, pmem, dynamic...
>> + */
>> +struct cxl_dpa_partition {
>> + struct resource res;
>> + struct cxl_dpa_perf perf;
>> + enum cxl_partition_mode mode;
>> +};
>> +
>> +#define CXL_NR_PARTITIONS_MAX 2
>> +
>> +/**
>> + * struct cxl_dev_state - The driver device state
>> + *
>> + * cxl_dev_state represents the CXL driver/device state. It provides an
>> + * interface to mailbox commands as well as some cached data about the device.
>> + * Currently only memory devices are represented.
>> + *
>> + * @dev: The device associated with this CXL state
>> + * @cxlmd: The device representing the CXL.mem capabilities of @dev
>> + * @reg_map: component and ras register mapping parameters
>> + * @regs: Parsed register blocks
>> + * @cxl_dvsec: Offset to the PCIe device DVSEC
>> + * @rcd: operating in RCD mode (CXL 3.0 9.11.8 CXL Devices Attached to an RCH)
>> + * @media_ready: Indicate whether the device media is usable
>> + * @dpa_res: Overall DPA resource tree for the device
>> + * @part: DPA partition array
>> + * @nr_partitions: Number of DPA partitions
>> + * @serial: PCIe Device Serial Number
>> + * @type: Generic Memory Class device or Vendor Specific Memory device
>> + * @cxl_mbox: CXL mailbox context
>> + * @cxlfs: CXL features context
>> + */
>> +struct cxl_dev_state {
>> + /* public for Type2 drivers */
>> + struct device *dev;
>> + struct cxl_memdev *cxlmd;
>> +
>> + /* private for Type2 drivers */
>> + struct cxl_register_map reg_map;
>> + struct cxl_regs regs;
>> + int cxl_dvsec;
>> + bool rcd;
>> + bool media_ready;
>> + struct resource dpa_res;
>> + struct cxl_dpa_partition part[CXL_NR_PARTITIONS_MAX];
>> + unsigned int nr_partitions;
>> + u64 serial;
>> + enum cxl_devtype type;
>> + struct cxl_mailbox cxl_mbox;
>> +#ifdef CONFIG_CXL_FEATURES
>> + struct cxl_features_state *cxlfs;
>> +#endif
>> +};
>> +
>> +struct cxl_dev_state *_devm_cxl_dev_state_create(struct device *dev,
>> + enum cxl_devtype type,
>> + u64 serial, u16 dvsec,
>> + size_t size, bool has_mbox);
>> +
>> +/**
>> + * cxl_dev_state_create - safely create and cast a cxl dev state embedded in a
>> + * driver specific struct.
>> + *
>> + * @parent: device behind the request
>> + * @type: CXL device type
>> + * @serial: device identification
>> + * @dvsec: dvsec capability offset
>> + * @drv_struct: driver struct embedding a cxl_dev_state struct
>> + * @member: drv_struct member as cxl_dev_state
>> + * @mbox: true if mailbox supported
>> + *
>> + * Returns a pointer to the drv_struct allocated and embedding a cxl_dev_state
>> + * struct initialized.
>> + *
>> + * Introduced for Type2 driver support.
>> + */
>> +#define devm_cxl_dev_state_create(parent, type, serial, dvsec, drv_struct, member, mbox) \
>> + ({ \
>> + static_assert(__same_type(struct cxl_dev_state, \
>> + ((drv_struct *)NULL)->member)); \
>> + static_assert(offsetof(drv_struct, member) == 0); \
>> + (drv_struct *)_devm_cxl_dev_state_create(parent, type, serial, dvsec, \
>> + sizeof(drv_struct), mbox); \
>> + })
>> +#endif /* __CXL_CXL_H__ */
>> diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
>> index cb87e8c0e63c..79f42f4474d4 100644
>> --- a/tools/testing/cxl/test/mem.c
>> +++ b/tools/testing/cxl/test/mem.c
>> @@ -1716,7 +1716,7 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
>> if (rc)
>> return rc;
>>
>> - mds = cxl_memdev_state_create(dev);
>> + mds = cxl_memdev_state_create(dev, pdev->id + 1, 0);
>> if (IS_ERR(mds))
>> return PTR_ERR(mds);
>>
>> @@ -1732,7 +1732,6 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
>> mds->event.buf = (struct cxl_get_event_payload *) mdata->event_buf;
>> INIT_DELAYED_WORK(&mds->security.poll_dwork, cxl_mockmem_sanitize_work);
>>
>> - cxlds->serial = pdev->id + 1;
>> if (is_rcd(pdev))
>> cxlds->rcd = true;
>>
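As background for the macro at the end of the new include/cxl/cxl.h,
the offset-0 embedding trick it enforces can be sketched in plain C.
The struct and function names below are illustrative stand-ins, not the
kernel definitions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* The generic state must be the FIRST member so one allocation can be
 * viewed through either type. */
struct dev_state_model { int type; };

struct accel_dev_model {
	struct dev_state_model cxlds;	/* must stay at offset 0 */
	int priv;
};

/* Compile-time guard, as the macro's static_assert() does. */
_Static_assert(offsetof(struct accel_dev_model, cxlds) == 0,
	       "cxlds must be the first member");

/* Allocate by the outer size, initialize through the inner type, and
 * cast back to the outer type: the macro's core trick. */
static struct accel_dev_model *accel_dev_create(int type)
{
	struct dev_state_model *cxlds =
		calloc(1, sizeof(struct accel_dev_model));

	if (!cxlds)
		return NULL;
	cxlds->type = type;
	return (struct accel_dev_model *)cxlds;
}
```

Because the inner struct sits at offset 0, the cast in either direction
is just a pointer reinterpretation, which is what makes the macro's
`(drv_struct *)` return value safe.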
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH v23 11/22] sfc: obtain decoder and region if committed by firmware
2026-02-11 22:10 ` Cheatham, Benjamin
@ 2026-02-19 8:55 ` Alejandro Lucero Palau
0 siblings, 0 replies; 67+ messages in thread
From: Alejandro Lucero Palau @ 2026-02-19 8:55 UTC (permalink / raw)
To: Cheatham, Benjamin, alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
On 2/11/26 22:10, Cheatham, Benjamin wrote:
> On 2/1/2026 9:54 AM, alejandro.lucero-palau@amd.com wrote:
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> Check if device HDM is already committed during firmware/BIOS
>> initialization.
>>
>> A CXL region should exist if so after memdev allocation/initialization.
>> Get HPA from region and map it.
>>
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>> ---
>> drivers/net/ethernet/sfc/efx_cxl.c | 28 +++++++++++++++++++++++++++-
>> 1 file changed, 27 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
>> index a77ef4783fcb..3536eccf1b2a 100644
>> --- a/drivers/net/ethernet/sfc/efx_cxl.c
>> +++ b/drivers/net/ethernet/sfc/efx_cxl.c
>> @@ -19,6 +19,7 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
>> struct efx_nic *efx = &probe_data->efx;
>> struct pci_dev *pci_dev = efx->pci_dev;
>> struct efx_cxl *cxl;
>> + struct range range;
>> u16 dvsec;
>> int rc;
>>
>> @@ -90,13 +91,38 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
>> return PTR_ERR(cxl->cxlmd);
>> }
>>
>> - probe_data->cxl = cxl;
>> + cxl->cxled = cxl_get_committed_decoder(cxl->cxlmd, &cxl->efx_region);
>> + if (cxl->cxled) {
>> + if (!cxl->efx_region) {
>> + pci_err(pci_dev, "CXL found committed decoder without a region");
>> + return -ENODEV;
>> + }
>> + rc = cxl_get_region_range(cxl->efx_region, &range);
> Missing an empty line above.
Right.
>
>> + if (rc) {
>> + pci_err(pci_dev,
>> + "CXL getting regions params from a committed decoder failed");
>> + return rc;
>> + }
>> +
>> + cxl->ctpio_cxl = ioremap(range.start, range.end - range.start + 1);
> Maybe use range_len() instead for the second parameter?
Sure.
Thanks!
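For reference, the arithmetic behind the suggestion: struct range holds
an inclusive [start, end] pair, so the byte count is end - start + 1,
and range_len() wraps exactly that. A simplified restatement (not the
kernel header) under illustrative names:

```c
#include <assert.h>

/* Inclusive [start, end] pair, as in include/linux/range.h. */
struct range_model { unsigned long long start, end; };

/* Equivalent of the kernel's range_len() helper: length of an
 * inclusive range. */
static unsigned long long range_len_model(const struct range_model *r)
{
	return r->end - r->start + 1;
}
```

So `ioremap(range.start, range_len(&range))` is byte-for-byte the same
mapping as the open-coded `range.end - range.start + 1`, just harder to
get wrong.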
>
>> + if (!cxl->ctpio_cxl) {
>> + pci_err(pci_dev, "CXL ioremap region (%pra) failed", &range);
>> + return -ENOMEM;
>> + }
>> +
>> + probe_data->cxl = cxl;
>> + }
>>
>> return 0;
>> }
>>
>> void efx_cxl_exit(struct efx_probe_data *probe_data)
>> {
>> + if (!probe_data->cxl)
>> + return;
>> +
>> + iounmap(probe_data->cxl->ctpio_cxl);
>> + cxl_unregister_region(probe_data->cxl->efx_region);
>> }
>>
>> MODULE_IMPORT_NS("CXL");
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH v23 12/22] cxl: Define a driver interface for HPA free space enumeration
2026-02-11 22:10 ` Cheatham, Benjamin
@ 2026-02-19 9:58 ` Alejandro Lucero Palau
2026-02-19 17:29 ` Cheatham, Benjamin
0 siblings, 1 reply; 67+ messages in thread
From: Alejandro Lucero Palau @ 2026-02-19 9:58 UTC (permalink / raw)
To: Cheatham, Benjamin, alejandro.lucero-palau
Cc: Jonathan Cameron, linux-cxl, netdev, dan.j.williams, edward.cree,
davem, kuba, pabeni, edumazet, dave.jiang
On 2/11/26 22:10, Cheatham, Benjamin wrote:
> On 2/1/2026 9:54 AM, alejandro.lucero-palau@amd.com wrote:
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> CXL region creation involves allocating capacity from Device Physical
>> Address (DPA) and assigning it to decode a given Host Physical Address
>> (HPA). Before determining how much DPA to allocate the amount of available
>> HPA must be determined. Also, not all HPA is created equal, some HPA
>> targets RAM, some targets PMEM, some is prepared for device-memory flows
>> like HDM-D and HDM-DB, and some is HDM-H (host-only).
>>
>> In order to support Type2 CXL devices, wrap all of those concerns into
>> an API that retrieves a root decoder (platform CXL window) that fits the
>> specified constraints and the capacity available for a new region.
>>
>> Add a complementary function for releasing the reference to such root
>> decoder.
>>
>> Based on https://lore.kernel.org/linux-cxl/168592159290.1948938.13522227102445462976.stgit@dwillia2-xfh.jf.intel.com/
>>
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> ---
>> drivers/cxl/core/region.c | 164 ++++++++++++++++++++++++++++++++++++++
>> drivers/cxl/cxl.h | 3 +
>> include/cxl/cxl.h | 6 ++
>> 3 files changed, 173 insertions(+)
>>
>> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
>> index 954b8fcdbac6..bdefd088f5f1 100644
>> --- a/drivers/cxl/core/region.c
>> +++ b/drivers/cxl/core/region.c
>> @@ -705,6 +705,170 @@ static int free_hpa(struct cxl_region *cxlr)
>> return 0;
>> }
>>
>> +struct cxlrd_max_context {
>> + struct device * const *host_bridges;
>> + int interleave_ways;
>> + unsigned long flags;
>> + resource_size_t max_hpa;
>> + struct cxl_root_decoder *cxlrd;
>> +};
>> +
>> +static int find_max_hpa(struct device *dev, void *data)
>> +{
>> + struct cxlrd_max_context *ctx = data;
>> + struct cxl_switch_decoder *cxlsd;
>> + struct cxl_root_decoder *cxlrd;
>> + struct resource *res, *prev;
>> + struct cxl_decoder *cxld;
>> + resource_size_t free = 0;
>> + resource_size_t max;
>> + int found = 0;
>> +
>> + if (!is_root_decoder(dev))
>> + return 0;
>> +
>> + cxlrd = to_cxl_root_decoder(dev);
>> + cxlsd = &cxlrd->cxlsd;
>> + cxld = &cxlsd->cxld;
>> +
>> + if ((cxld->flags & ctx->flags) != ctx->flags) {
>> + dev_dbg(dev, "flags not matching: %08lx vs %08lx\n",
>> + cxld->flags, ctx->flags);
>> + return 0;
>> + }
>> +
>> + for (int i = 0; i < ctx->interleave_ways; i++) {
>> + for (int j = 0; j < ctx->interleave_ways; j++) {
>> + if (ctx->host_bridges[i] == cxlsd->target[j]->dport_dev) {
>> + found++;
>> + break;
>> + }
>> + }
>> + }
> This may be overcomplicated. I'm not quite sure how it works (I'm just slow today I guess), but I understand
> what the intention is based on the debug print below. My issue is that ctx->host_bridges is only set to 1 host
> bridge (endpoint->host_bridge) in cxl_get_hpa_freespace(), which is the only caller of this function. In that
> case, why have the outer loop at all? You could also simplify ctx->host_bridges to only
> be a struct device * const.
>
> Maybe this gets called elsewhere later on in the series? I haven't looked at the rest yet. If I'm wrong, then
> I'd probably add a comment saying what the cxlsd->target[] entries are supposed to be pointing at.
Hi Ben,
I do remember this one.
Dan's original patches had this support for interleaving; I then removed
it, as the case for Type2 with interleaving is quite unlikely, at least
right now and likely in the near future. But I was asked why not support
it, since it was trivial to do so. FWIW, if I think only about the use
case coming with this patchset, I agree with you, but because of those
previous discussions, I think I have to leave it.
Thank you
>> +
>> + if (found != ctx->interleave_ways) {
>> + dev_dbg(dev,
>> + "Not enough host bridges. Found %d for %d interleave ways requested\n",
>> + found, ctx->interleave_ways);
>> + return 0;
>> + }
>> +
>> + /*
>> + * Walk the root decoder resource range relying on cxl_rwsem.region to
>> + * preclude sibling arrival/departure and find the largest free space
>> + * gap.
>> + */
>> + lockdep_assert_held_read(&cxl_rwsem.region);
>> + res = cxlrd->res->child;
>> +
>> + /* With no resource child the whole parent resource is available */
>> + if (!res)
>> + max = resource_size(cxlrd->res);
>> + else
>> + max = 0;
>> +
>> + for (prev = NULL; res; prev = res, res = res->sibling) {
>> + if (!prev && res->start == cxlrd->res->start &&
>> + res->end == cxlrd->res->end) {
>> + max = resource_size(cxlrd->res);
>> + break;
>> + }
>> + /*
>> + * Sanity check for preventing arithmetic problems below as a
>> + * resource with size 0 could imply using the end field below
>> + * when set to unsigned zero - 1 or all f in hex.
>> + */
>> + if (prev && !resource_size(prev))
>> + continue;
>> +
>> + if (!prev && res->start > cxlrd->res->start) {
>> + free = res->start - cxlrd->res->start;
>> + max = max(free, max);
>> + }
>> + if (prev && res->start > prev->end + 1) {
>> + free = res->start - prev->end - 1;
>> + max = max(free, max);
>> + }
>> + }
>> +
>> + if (prev && prev->end + 1 < cxlrd->res->end + 1) {
>> + free = cxlrd->res->end - prev->end;
>> + max = max(free, max);
>> + }
>> +
>> + dev_dbg(cxlrd_dev(cxlrd), "found %pa bytes of free space\n", &max);
>> + if (max > ctx->max_hpa) {
>> + if (ctx->cxlrd)
>> + put_device(cxlrd_dev(ctx->cxlrd));
>> + get_device(cxlrd_dev(cxlrd));
>> + ctx->cxlrd = cxlrd;
>> + ctx->max_hpa = max;
>> + }
>> + return 0;
>> +}
>> +
>> +/**
>> + * cxl_get_hpa_freespace - find a root decoder with free capacity per constraints
>> + * @cxlmd: the mem device requiring the HPA
>> + * @interleave_ways: number of entries in @host_bridges
>> + * @flags: CXL_DECODER_F flags for selecting RAM vs PMEM, and Type2 device
>> + * @max_avail_contig: output parameter of max contiguous bytes available in the
>> + * returned decoder
>> + *
>> + * Returns a pointer to a struct cxl_root_decoder
>> + *
>> + * The return tuple of a 'struct cxl_root_decoder' and 'bytes available given
>> + * in (@max_avail_contig)' is a point in time snapshot. If by the time the
>> + * caller goes to use this decoder and its capacity is reduced then caller needs
>> + * to loop and retry.
>> + *
>> + * The returned root decoder has an elevated reference count that needs to be
>> + * put with cxl_put_root_decoder(cxlrd).
>> + */
>> +struct cxl_root_decoder *cxl_get_hpa_freespace(struct cxl_memdev *cxlmd,
>> + int interleave_ways,
>> + unsigned long flags,
>> + resource_size_t *max_avail_contig)
>> +{
>> + struct cxlrd_max_context ctx = {
>> + .flags = flags,
>> + .interleave_ways = interleave_ways,
>> + };
>> + struct cxl_port *root_port;
>> + struct cxl_port *endpoint;
>> +
>> + endpoint = cxlmd->endpoint;
>> + if (!endpoint) {
>> + dev_dbg(&cxlmd->dev, "endpoint not linked to memdev\n");
>> + return ERR_PTR(-ENXIO);
>> + }
>> +
>> + ctx.host_bridges = &endpoint->host_bridge;
> As mentioned earlier, interleave_ways is effectively hardcoded to 1 (unless I'm misunderstanding
> something). I think what you want here is to go to the CXL root and pass in the children (i.e. host bridges)?
> I'm not sure of what the fix is to get the intended behavior.
>
> It may be worth getting rid of the interleave_ways portion of this function and
> add it later when someone needs it. You could also explain it's hard coded to 1/unused
> in the doc comment if you know of an immediate need for it.
>
>> +
>> + struct cxl_root *root __free(put_cxl_root) = find_cxl_root(endpoint);
>> + if (!root) {
>> + dev_dbg(&endpoint->dev, "endpoint is not related to a root port\n");
>> + return ERR_PTR(-ENXIO);
>> + }
>> +
>> + root_port = &root->port;
>> + scoped_guard(rwsem_read, &cxl_rwsem.region)
>> + device_for_each_child(&root_port->dev, &ctx, find_max_hpa);
> Can just use a guard() here.
>
>> +
>> + if (!ctx.cxlrd)
>> + return ERR_PTR(-ENOMEM);
>> +
>> + *max_avail_contig = ctx.max_hpa;
>> + return ctx.cxlrd;
>> +}
>> +EXPORT_SYMBOL_NS_GPL(cxl_get_hpa_freespace, "CXL");
>> +
>> +void cxl_put_root_decoder(struct cxl_root_decoder *cxlrd)
>> +{
>> + put_device(cxlrd_dev(cxlrd));
>> +}
>> +EXPORT_SYMBOL_NS_GPL(cxl_put_root_decoder, "CXL");
>> +
>> static ssize_t size_store(struct device *dev, struct device_attribute *attr,
>> const char *buf, size_t len)
>> {
>> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
>> index 944c5d1ccceb..c7d9b2c2908f 100644
>> --- a/drivers/cxl/cxl.h
>> +++ b/drivers/cxl/cxl.h
>> @@ -706,6 +706,9 @@ struct cxl_root_decoder *to_cxl_root_decoder(struct device *dev);
>> struct cxl_switch_decoder *to_cxl_switch_decoder(struct device *dev);
>> struct cxl_endpoint_decoder *to_cxl_endpoint_decoder(struct device *dev);
>> bool is_root_decoder(struct device *dev);
>> +
>> +#define cxlrd_dev(cxlrd) (&(cxlrd)->cxlsd.cxld.dev)
>> +
>> bool is_switch_decoder(struct device *dev);
>> bool is_endpoint_decoder(struct device *dev);
>> struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
>> diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
>> index 92880c26b2d5..834dc7e78934 100644
>> --- a/include/cxl/cxl.h
>> +++ b/include/cxl/cxl.h
>> @@ -255,4 +255,10 @@ struct cxl_endpoint_decoder *cxl_get_committed_decoder(struct cxl_memdev *cxlmd,
>> struct range;
>> int cxl_get_region_range(struct cxl_region *region, struct range *range);
>> void cxl_unregister_region(struct cxl_region *cxlr);
>> +struct cxl_port;
>> +struct cxl_root_decoder *cxl_get_hpa_freespace(struct cxl_memdev *cxlmd,
>> + int interleave_ways,
>> + unsigned long flags,
>> + resource_size_t *max);
>> +void cxl_put_root_decoder(struct cxl_root_decoder *cxlrd);
>> #endif /* __CXL_CXL_H__ */
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH v23 14/22] cxl: Define a driver interface for DPA allocation
2026-02-11 22:12 ` Cheatham, Benjamin
@ 2026-02-19 10:26 ` Alejandro Lucero Palau
0 siblings, 0 replies; 67+ messages in thread
From: Alejandro Lucero Palau @ 2026-02-19 10:26 UTC (permalink / raw)
To: Cheatham, Benjamin, alejandro.lucero-palau
Cc: Jonathan Cameron, linux-cxl, netdev, dan.j.williams, edward.cree,
davem, kuba, pabeni, edumazet, dave.jiang
On 2/11/26 22:12, Cheatham, Benjamin wrote:
> On 2/1/2026 9:54 AM, alejandro.lucero-palau@amd.com wrote:
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> Region creation involves finding available DPA (device-physical-address)
>> capacity to map into HPA (host-physical-address) space.
>>
>> In order to support CXL Type2 devices, define an API, cxl_request_dpa(),
>> that tries to allocate the DPA memory the driver requires to operate. The
>> memory requested should not be bigger than the max available HPA obtained
>> previously with cxl_get_hpa_freespace().
>>
>> Based on https://lore.kernel.org/linux-cxl/168592158743.1948938.7622563891193802610.stgit@dwillia2-xfh.jf.intel.com/
>>
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
>> ---
>> drivers/cxl/core/hdm.c | 84 ++++++++++++++++++++++++++++++++++++++++++
>> drivers/cxl/cxl.h | 1 +
>> include/cxl/cxl.h | 5 +++
>> 3 files changed, 90 insertions(+)
>>
>> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
>> index a172ce4e9b19..d60a697f12cc 100644
>> --- a/drivers/cxl/core/hdm.c
>> +++ b/drivers/cxl/core/hdm.c
>> @@ -3,6 +3,7 @@
>> #include <linux/seq_file.h>
>> #include <linux/device.h>
>> #include <linux/delay.h>
>> +#include <cxl/cxl.h>
>>
>> #include "cxlmem.h"
>> #include "core.h"
>> @@ -546,6 +547,12 @@ bool cxl_resource_contains_addr(const struct resource *res, const resource_size_
>> return resource_contains(res, &_addr);
>> }
>>
>> +/**
>> + * cxl_dpa_free - release DPA (Device Physical Address)
>> + * @cxled: endpoint decoder linked to the DPA
>> + *
>> + * Returns 0 or error.
>> + */
>> int cxl_dpa_free(struct cxl_endpoint_decoder *cxled)
>> {
>> struct cxl_port *port = cxled_to_port(cxled);
>> @@ -572,6 +579,7 @@ int cxl_dpa_free(struct cxl_endpoint_decoder *cxled)
>> devm_cxl_dpa_release(cxled);
>> return 0;
>> }
>> +EXPORT_SYMBOL_NS_GPL(cxl_dpa_free, "CXL");
>>
>> int cxl_dpa_set_part(struct cxl_endpoint_decoder *cxled,
>> enum cxl_partition_mode mode)
>> @@ -603,6 +611,82 @@ int cxl_dpa_set_part(struct cxl_endpoint_decoder *cxled,
>> return 0;
>> }
>>
>> +static int find_free_decoder(struct device *dev, const void *data)
>> +{
>> + struct cxl_endpoint_decoder *cxled;
>> + struct cxl_port *port;
>> +
>> + if (!is_endpoint_decoder(dev))
>> + return 0;
>> +
>> + cxled = to_cxl_endpoint_decoder(dev);
>> + port = cxled_to_port(cxled);
>> +
>> + return cxled->cxld.id == (port->hdm_end + 1);
>> +}
>> +
>> +static struct cxl_endpoint_decoder *
>> +cxl_find_free_decoder(struct cxl_memdev *cxlmd)
>> +{
>> + struct cxl_port *endpoint = cxlmd->endpoint;
>> + struct device *dev;
>> +
>> + guard(rwsem_read)(&cxl_rwsem.dpa);
>> + dev = device_find_child(&endpoint->dev, NULL,
>> + find_free_decoder);
>> + if (!dev)
>> + return NULL;
>> +
>> + return to_cxl_endpoint_decoder(dev);
>> +}
>> +
>> +/**
>> + * cxl_request_dpa - search and reserve DPA given input constraints
>> + * @cxlmd: memdev with an endpoint port with available decoders
>> + * @mode: CXL partition mode (ram vs pmem)
>> + * @alloc: dpa size required
>> + *
>> + * Returns a pointer to a 'struct cxl_endpoint_decoder' on success or
>> + * an errno encoded pointer on failure.
>> + *
>> + * Given that a region needs to allocate from limited HPA capacity it
>> + * may be the case that a device has more mappable DPA capacity than
>> + * available HPA. The expectation is that @alloc is a driver known
>> + * value based on the device capacity but which could not be fully
>> + * available due to HPA constraints.
>> + *
>> + * Returns a pinned cxl_decoder with at least @alloc bytes of capacity
>> + * reserved, or an error pointer. The caller is also expected to own the
>> + * lifetime of the memdev registration associated with the endpoint to
>> + * pin the decoder registered as well.
>> + */
>> +struct cxl_endpoint_decoder *cxl_request_dpa(struct cxl_memdev *cxlmd,
>> + enum cxl_partition_mode mode,
>> + resource_size_t alloc)
>> +{
>> + int rc;
>> +
>> + if (!IS_ALIGNED(alloc, SZ_256M))
>> + return ERR_PTR(-EINVAL);
>> +
>> + struct cxl_endpoint_decoder *cxled __free(put_cxled) =
>> + cxl_find_free_decoder(cxlmd);
>> +
>> + if (!cxled)
>> + return ERR_PTR(-ENODEV);
>> +
>> + rc = cxl_dpa_set_part(cxled, mode);
>> + if (rc)
>> + return ERR_PTR(rc);
>> +
>> + rc = cxl_dpa_alloc(cxled, alloc);
>> + if (rc)
>> + return ERR_PTR(rc);
> Should cxl_dpa_set_part() be unwound here, or does it not matter?
No, I do not think that is necessary. The CXL initialization fails, and
as a result the modified struct will be released sooner or later.
> If it doesn't matter:
> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Thank you
>> +
>> + return no_free_ptr(cxled);
>> +}
>> +EXPORT_SYMBOL_NS_GPL(cxl_request_dpa, "CXL");
>> +
>> static int __cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, u64 size)
>> {
>> struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
>> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
>> index d1b010e5e1d0..2b1f7d687a0e 100644
>> --- a/drivers/cxl/cxl.h
>> +++ b/drivers/cxl/cxl.h
>> @@ -667,6 +667,7 @@ struct cxl_root *find_cxl_root(struct cxl_port *port);
>>
>> DEFINE_FREE(put_cxl_root, struct cxl_root *, if (_T) put_device(&_T->port.dev))
>> DEFINE_FREE(put_cxl_port, struct cxl_port *, if (!IS_ERR_OR_NULL(_T)) put_device(&_T->dev))
>> +DEFINE_FREE(put_cxled, struct cxl_endpoint_decoder *, if (!IS_ERR_OR_NULL(_T)) put_device(&_T->cxld.dev))
>> DEFINE_FREE(put_cxl_root_decoder, struct cxl_root_decoder *, if (!IS_ERR_OR_NULL(_T)) put_device(&_T->cxlsd.cxld.dev))
>> DEFINE_FREE(put_cxl_region, struct cxl_region *, if (!IS_ERR_OR_NULL(_T)) put_device(&_T->dev))
>>
>> diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
>> index 783ad570a6eb..4802371db00e 100644
>> --- a/include/cxl/cxl.h
>> +++ b/include/cxl/cxl.h
>> @@ -7,6 +7,7 @@
>>
>> #include <linux/node.h>
>> #include <linux/ioport.h>
>> +#include <linux/range.h>
>> #include <cxl/mailbox.h>
>>
>> /**
>> @@ -276,4 +277,8 @@ struct cxl_root_decoder *cxl_get_hpa_freespace(struct cxl_memdev *cxlmd,
>> unsigned long flags,
>> resource_size_t *max);
>> void cxl_put_root_decoder(struct cxl_root_decoder *cxlrd);
>> +struct cxl_endpoint_decoder *cxl_request_dpa(struct cxl_memdev *cxlmd,
>> + enum cxl_partition_mode mode,
>> + resource_size_t alloc);
>> +int cxl_dpa_free(struct cxl_endpoint_decoder *cxled);
>> #endif /* __CXL_CXL_H__ */
* Re: [PATCH v23 17/22] cxl/region: Factor out interleave ways setup
2026-02-11 22:11 ` Cheatham, Benjamin
@ 2026-02-19 10:40 ` Alejandro Lucero Palau
2026-02-19 17:29 ` Cheatham, Benjamin
0 siblings, 1 reply; 67+ messages in thread
From: Alejandro Lucero Palau @ 2026-02-19 10:40 UTC (permalink / raw)
To: Cheatham, Benjamin, alejandro.lucero-palau
Cc: Zhi Wang, Jonathan Cameron, Alison Schofield, linux-cxl, netdev,
dan.j.williams, edward.cree, davem, kuba, pabeni, edumazet,
dave.jiang
On 2/11/26 22:11, Cheatham, Benjamin wrote:
> On 2/1/2026 9:54 AM, alejandro.lucero-palau@amd.com wrote:
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> Region creation based on Type3 devices is triggered from user space
>> allowing memory combination through interleaving.
>>
>> In preparation for kernel driven region creation, that is Type2 drivers
>> triggering region creation backed with its advertised CXL memory, factor
>> out a common helper from the user-sysfs region setup for interleave ways.
>>
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>> Reviewed-by: Zhi Wang <zhiw@nvidia.com>
>> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
>> Reviewed-by: Alison Schofield <alison.schofield@intel.com>
>> ---
>> drivers/cxl/core/region.c | 43 ++++++++++++++++++++++++---------------
>> 1 file changed, 27 insertions(+), 16 deletions(-)
>>
>> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
>> index f53b2e9fd9e6..ece1d3df7cf1 100644
>> --- a/drivers/cxl/core/region.c
>> +++ b/drivers/cxl/core/region.c
>> @@ -485,22 +485,14 @@ static ssize_t interleave_ways_show(struct device *dev,
>>
>> static const struct attribute_group *get_cxl_region_target_group(void);
>>
>> -static ssize_t interleave_ways_store(struct device *dev,
>> - struct device_attribute *attr,
>> - const char *buf, size_t len)
>> +static int set_interleave_ways(struct cxl_region *cxlr, int val)
> @val should probably stay an unsigned int. You pass an unsigned int in the sysfs function, and the
> function was originally coded with that in mind (same with @save below).
Good catch. I wonder if I should just change the way the value is
obtained, using kstrtoint instead of kstrtouint, as those values are
used for cxl_region_params fields defined as int. In other words, doing
that seems simpler than changing all the other places you mention and
the structs involved. I cannot see a reason for using unsigned int, so
I think I will follow that approach. Tell me if you think otherwise.
Thank you
> With that cleaned up:
> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
>
>> {
>> - struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev->parent);
>> + struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
>> struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld;
>> - struct cxl_region *cxlr = to_cxl_region(dev);
>> struct cxl_region_params *p = &cxlr->params;
>> - unsigned int val, save;
>> - int rc;
>> + int save, rc;
>> u8 iw;
>>
>> - rc = kstrtouint(buf, 0, &val);
>> - if (rc)
>> - return rc;
>> -
>> rc = ways_to_eiw(val, &iw);
>> if (rc)
>> return rc;
>> @@ -515,9 +507,7 @@ static ssize_t interleave_ways_store(struct device *dev,
>> return -EINVAL;
>> }
>>
>> - ACQUIRE(rwsem_write_kill, rwsem)(&cxl_rwsem.region);
>> - if ((rc = ACQUIRE_ERR(rwsem_write_kill, &rwsem)))
>> - return rc;
>> + lockdep_assert_held_write(&cxl_rwsem.region);
>>
>> if (p->state >= CXL_CONFIG_INTERLEAVE_ACTIVE)
>> return -EBUSY;
>> @@ -525,10 +515,31 @@ static ssize_t interleave_ways_store(struct device *dev,
>> save = p->interleave_ways;
>> p->interleave_ways = val;
>> rc = sysfs_update_group(&cxlr->dev.kobj, get_cxl_region_target_group());
>> - if (rc) {
>> + if (rc)
>> p->interleave_ways = save;
>> +
>> + return rc;
>> +}
>> +
>> +static ssize_t interleave_ways_store(struct device *dev,
>> + struct device_attribute *attr,
>> + const char *buf, size_t len)
>> +{
>> + struct cxl_region *cxlr = to_cxl_region(dev);
>> + unsigned int val;
>> + int rc;
>> +
>> + rc = kstrtouint(buf, 0, &val);
>> + if (rc)
>> + return rc;
>> +
>> + ACQUIRE(rwsem_write_kill, rwsem)(&cxl_rwsem.region);
>> + if ((rc = ACQUIRE_ERR(rwsem_write_kill, &rwsem)))
>> + return rc;
>> +
>> + rc = set_interleave_ways(cxlr, val);
>> + if (rc)
>> return rc;
>> - }
>>
>> return len;
>> }
* Re: [PATCH v23 19/22] cxl: Allow region creation by type2 drivers
2026-02-11 22:11 ` Cheatham, Benjamin
@ 2026-02-19 10:48 ` Alejandro Lucero Palau
0 siblings, 0 replies; 67+ messages in thread
From: Alejandro Lucero Palau @ 2026-02-19 10:48 UTC (permalink / raw)
To: Cheatham, Benjamin, alejandro.lucero-palau
Cc: Jonathan Cameron, linux-cxl, netdev, dan.j.williams, edward.cree,
davem, kuba, pabeni, edumazet, dave.jiang
On 2/11/26 22:11, Cheatham, Benjamin wrote:
> On 2/1/2026 9:54 AM, alejandro.lucero-palau@amd.com wrote:
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> Creating a CXL region requires userspace intervention through the cxl
>> sysfs files. Type2 support should allow accelerator drivers to create
>> such cxl region from kernel code.
>>
>> Adding that functionality and integrating it with current support for
>> memory expanders.
>>
>> Based on https://lore.kernel.org/linux-cxl/168592159835.1948938.1647215579839222774.stgit@dwillia2-xfh.jf.intel.com/
>>
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
>> ---
>> drivers/cxl/core/region.c | 131 ++++++++++++++++++++++++++++++++++++--
>> include/cxl/cxl.h | 3 +
>> 2 files changed, 127 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
>> index 63c2aeb2ee1f..293e63dfef22 100644
>> --- a/drivers/cxl/core/region.c
>> +++ b/drivers/cxl/core/region.c
>> @@ -2944,6 +2944,14 @@ cxl_find_region_by_name(struct cxl_root_decoder *cxlrd, const char *name)
>> return to_cxl_region(region_dev);
>> }
>>
>> +static void drop_region(struct cxl_region *cxlr)
>> +{
>> + struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
>> + struct cxl_port *port = cxlrd_to_port(cxlrd);
>> +
>> + devm_release_action(port->uport_dev, __unregister_region, cxlr);
>> +}
>> +
>> static ssize_t delete_region_store(struct device *dev,
>> struct device_attribute *attr,
>> const char *buf, size_t len)
>> @@ -4047,14 +4055,12 @@ static int __construct_region(struct cxl_region *cxlr,
>> return 0;
>> }
>>
>> -/* Establish an empty region covering the given HPA range */
>> -static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
>> - struct cxl_endpoint_decoder *cxled)
>> +static struct cxl_region *construct_region_begin(struct cxl_root_decoder *cxlrd,
>> + struct cxl_endpoint_decoder *cxled)
>> {
>> struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
>> - struct cxl_port *port = cxlrd_to_port(cxlrd);
>> struct cxl_dev_state *cxlds = cxlmd->cxlds;
>> - int rc, part = READ_ONCE(cxled->part);
>> + int part = READ_ONCE(cxled->part);
>> struct cxl_region *cxlr;
>>
>> do {
>> @@ -4063,13 +4069,26 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
>> cxled->cxld.target_type);
>> } while (IS_ERR(cxlr) && PTR_ERR(cxlr) == -EBUSY);
>>
>> - if (IS_ERR(cxlr)) {
>> + if (IS_ERR(cxlr))
>> dev_err(cxlmd->dev.parent,
>> "%s:%s: %s failed assign region: %ld\n",
>> dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
>> __func__, PTR_ERR(cxlr));
>> +
>> + return cxlr;
>> +}
>> +
>> +/* Establish an empty region covering the given HPA range */
>> +static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
>> + struct cxl_endpoint_decoder *cxled)
>> +{
>> + struct cxl_port *port = cxlrd_to_port(cxlrd);
>> + struct cxl_region *cxlr;
>> + int rc;
>> +
>> + cxlr = construct_region_begin(cxlrd, cxled);
>> + if (IS_ERR(cxlr))
>> return cxlr;
>> - }
>>
>> rc = __construct_region(cxlr, cxlrd, cxled);
>> if (rc) {
>> @@ -4080,6 +4099,104 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
>> return cxlr;
>> }
>>
>> +DEFINE_FREE(cxl_region_drop, struct cxl_region *, if (_T) drop_region(_T))
> This needs to be "if (!IS_ERR_OR_NULL(_T)) drop_region(_T)". If construct_region_begin() returns an
> error pointer, drop_region() will be called with it as of now leading to a garbage pointer deref.
That's true. I will fix it.
Thank you!
>> +
>> +static struct cxl_region *
>> +__construct_new_region(struct cxl_root_decoder *cxlrd,
>> + struct cxl_endpoint_decoder **cxled, int ways)
>> +{
>> + struct cxl_memdev *cxlmd = cxled_to_memdev(cxled[0]);
>> + struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld;
>> + struct cxl_region_params *p;
>> + resource_size_t size = 0;
>> + int rc, i;
>> +
>> + struct cxl_region *cxlr __free(cxl_region_drop) =
>> + construct_region_begin(cxlrd, cxled[0]);
>> + if (IS_ERR(cxlr))
>> + return cxlr;
>> +
>> + guard(rwsem_write)(&cxl_rwsem.region);
>> +
>> + /*
>> + * Sanity check. This should not happen with an accel driver handling
>> + * the region creation.
>> + */
>> + p = &cxlr->params;
>> + if (p->state >= CXL_CONFIG_INTERLEAVE_ACTIVE) {
>> + dev_err(cxlmd->dev.parent,
>> + "%s:%s: %s unexpected region state\n",
>> + dev_name(&cxlmd->dev), dev_name(&cxled[0]->cxld.dev),
>> + __func__);
>> + return ERR_PTR(-EBUSY);
>> + }
>> +
>> + rc = set_interleave_ways(cxlr, ways);
>> + if (rc)
>> + return ERR_PTR(rc);
>> +
>> + rc = set_interleave_granularity(cxlr, cxld->interleave_granularity);
>> + if (rc)
>> + return ERR_PTR(rc);
>> +
>> + scoped_guard(rwsem_read, &cxl_rwsem.dpa) {
>> + for (i = 0; i < ways; i++) {
>> + if (!cxled[i]->dpa_res)
>> + return ERR_PTR(-EINVAL);
>> + size += resource_size(cxled[i]->dpa_res);
>> + }
>> +
>> + rc = alloc_hpa(cxlr, size);
>> + if (rc)
>> + return ERR_PTR(rc);
>> +
>> + for (i = 0; i < ways; i++) {
>> + rc = cxl_region_attach(cxlr, cxled[i], 0);
> Position parameter is hardcoded to 0. It should be set to i, right? This kind of goes back to my
> issues in patch 12/22; the interleaving functionality is there but it looks unused.
>
>> + if (rc)
>> + return ERR_PTR(rc);
>> + }
>> + }
>> +
>> + rc = cxl_region_decode_commit(cxlr);
>> + if (rc)
>> + return ERR_PTR(rc);
>> +
>> + p->state = CXL_CONFIG_COMMIT;
>> +
>> + return no_free_ptr(cxlr);
>> +}
>> +
>> +/**
>> + * cxl_create_region - Establish a region given an endpoint decoder
>> + * @cxlrd: root decoder to allocate HPA
>> + * @cxled: endpoint decoders with reserved DPA capacity
>> + * @ways: interleave ways required
>> + *
>> + * Returns a fully formed region in the commit state and attached to the
>> + * cxl_region driver.
>> + */
>> +struct cxl_region *cxl_create_region(struct cxl_root_decoder *cxlrd,
>> + struct cxl_endpoint_decoder **cxled,
>> + int ways)
>> +{
>> + struct cxl_region *cxlr;
>> +
>> + mutex_lock(&cxlrd->range_lock);
>> + cxlr = __construct_new_region(cxlrd, cxled, ways);
>> + mutex_unlock(&cxlrd->range_lock);
>> + if (IS_ERR(cxlr))
>> + return cxlr;
>> +
>> + if (device_attach(&cxlr->dev) <= 0) {
>> + dev_err(&cxlr->dev, "failed to create region\n");
>> + drop_region(cxlr);
>> + return ERR_PTR(-ENODEV);
>> + }
>> +
>> + return cxlr;
>> +}
>> +EXPORT_SYMBOL_NS_GPL(cxl_create_region, "CXL");
>> +
>> static struct cxl_region *
>> cxl_find_region_by_range(struct cxl_root_decoder *cxlrd, struct range *hpa)
>> {
>> diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
>> index 4802371db00e..50acbd13bcf8 100644
>> --- a/include/cxl/cxl.h
>> +++ b/include/cxl/cxl.h
>> @@ -281,4 +281,7 @@ struct cxl_endpoint_decoder *cxl_request_dpa(struct cxl_memdev *cxlmd,
>> enum cxl_partition_mode mode,
>> resource_size_t alloc);
>> int cxl_dpa_free(struct cxl_endpoint_decoder *cxled);
>> +struct cxl_region *cxl_create_region(struct cxl_root_decoder *cxlrd,
>> + struct cxl_endpoint_decoder **cxled,
>> + int ways);
>> #endif /* __CXL_CXL_H__ */
* Re: [PATCH v23 20/22] cxl: Avoid dax creation for accelerators
2026-02-11 22:10 ` Cheatham, Benjamin
@ 2026-02-19 10:50 ` Alejandro Lucero Palau
0 siblings, 0 replies; 67+ messages in thread
From: Alejandro Lucero Palau @ 2026-02-19 10:50 UTC (permalink / raw)
To: Cheatham, Benjamin, alejandro.lucero-palau
Cc: Jonathan Cameron, Davidlohr Bueso, linux-cxl, netdev,
dan.j.williams, edward.cree, davem, kuba, pabeni, edumazet,
dave.jiang
On 2/11/26 22:10, Cheatham, Benjamin wrote:
> On 2/1/2026 9:54 AM, alejandro.lucero-palau@amd.com wrote:
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> By definition a type2 cxl device will use the host managed memory for
>> specific functionality, therefore it should not be available to other
>> uses.
>>
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Reviewed-by: Davidlohr Bueso <daves@stgolabs.net>
>> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
>> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
>> ---
>> drivers/cxl/core/region.c | 7 +++++++
>> 1 file changed, 7 insertions(+)
>>
>> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
>> index 293e63dfef22..12df717cc881 100644
>> --- a/drivers/cxl/core/region.c
>> +++ b/drivers/cxl/core/region.c
>> @@ -4441,6 +4441,13 @@ static int cxl_region_probe(struct device *dev)
>> if (rc)
>> return rc;
>>
>> + /*
>> + * HDM-D[B] (device-memory) regions have accelerator specific usage.
>> + * Skip device-dax registration.
>> + */
>> + if (cxlr->type == CXL_DECODER_DEVMEM)
>> + return 0;
> Minor nit: Should probably move this to be the first thing in the function. It would save
> having to acquire a lock in cxl_region_can_probe() above. Keep my reviewed-by either way,
> it's really just a minor optimization.
It makes sense. I'll do it.
Thanks
>> +
>> /*
>> * From this point on any path that changes the region's state away from
>> * CXL_CONFIG_COMMIT is also responsible for releasing the driver.
* Re: [PATCH v23 17/22] cxl/region: Factor out interleave ways setup
2026-02-19 10:40 ` Alejandro Lucero Palau
@ 2026-02-19 17:29 ` Cheatham, Benjamin
0 siblings, 0 replies; 67+ messages in thread
From: Cheatham, Benjamin @ 2026-02-19 17:29 UTC (permalink / raw)
To: Alejandro Lucero Palau, alejandro.lucero-palau
Cc: Zhi Wang, Jonathan Cameron, Alison Schofield, linux-cxl, netdev,
dan.j.williams, edward.cree, davem, kuba, pabeni, edumazet,
dave.jiang
On 2/19/2026 4:40 AM, Alejandro Lucero Palau wrote:
>
> On 2/11/26 22:11, Cheatham, Benjamin wrote:
>> On 2/1/2026 9:54 AM, alejandro.lucero-palau@amd.com wrote:
>>> From: Alejandro Lucero <alucerop@amd.com>
>>>
>>> Region creation based on Type3 devices is triggered from user space
>>> allowing memory combination through interleaving.
>>>
>>> In preparation for kernel driven region creation, that is Type2 drivers
>>> triggering region creation backed with its advertised CXL memory, factor
>>> out a common helper from the user-sysfs region setup for interleave ways.
>>>
>>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>>> Reviewed-by: Zhi Wang <zhiw@nvidia.com>
>>> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
>>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>>> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
>>> Reviewed-by: Alison Schofield <alison.schofield@intel.com>
>>> ---
>>> drivers/cxl/core/region.c | 43 ++++++++++++++++++++++++---------------
>>> 1 file changed, 27 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
>>> index f53b2e9fd9e6..ece1d3df7cf1 100644
>>> --- a/drivers/cxl/core/region.c
>>> +++ b/drivers/cxl/core/region.c
>>> @@ -485,22 +485,14 @@ static ssize_t interleave_ways_show(struct device *dev,
>>> static const struct attribute_group *get_cxl_region_target_group(void);
>>> -static ssize_t interleave_ways_store(struct device *dev,
>>> - struct device_attribute *attr,
>>> - const char *buf, size_t len)
>>> +static int set_interleave_ways(struct cxl_region *cxlr, int val)
>> @val should probably stay an unsigned int. You pass an unsigned int in the sysfs function, and the
>> function was originally coded with that in mind (same with @save below).
>
> Good catch. I wonder if I should just change the way the value is obtained, using kstrtoint instead of kstrtouint, as those values are used for cxl_region_params fields defined as int. In other words, doing that seems simpler than changing all the other places you mention and the structs involved. I cannot see a reason for using unsigned int, so I think I will follow that approach. Tell me if you think otherwise.
>
If I had to guess, unsigned int was used because a negative interleave granularity/ways value makes no sense. I think your suggestion is fine though, since no one
in their right mind would give anything but a (relatively) small and positive value for these.
Thanks,
Ben
>
> Thank you
>
>
>> With that cleaned up:
>> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
>>
>>> {
>>> - struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev->parent);
>>> + struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
>>> struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld;
>>> - struct cxl_region *cxlr = to_cxl_region(dev);
>>> struct cxl_region_params *p = &cxlr->params;
>>> - unsigned int val, save;
>>> - int rc;
>>> + int save, rc;
>>> u8 iw;
>>> - rc = kstrtouint(buf, 0, &val);
>>> - if (rc)
>>> - return rc;
>>> -
>>> rc = ways_to_eiw(val, &iw);
>>> if (rc)
>>> return rc;
>>> @@ -515,9 +507,7 @@ static ssize_t interleave_ways_store(struct device *dev,
>>> return -EINVAL;
>>> }
>>> - ACQUIRE(rwsem_write_kill, rwsem)(&cxl_rwsem.region);
>>> - if ((rc = ACQUIRE_ERR(rwsem_write_kill, &rwsem)))
>>> - return rc;
>>> + lockdep_assert_held_write(&cxl_rwsem.region);
>>> if (p->state >= CXL_CONFIG_INTERLEAVE_ACTIVE)
>>> return -EBUSY;
>>> @@ -525,10 +515,31 @@ static ssize_t interleave_ways_store(struct device *dev,
>>> save = p->interleave_ways;
>>> p->interleave_ways = val;
>>> rc = sysfs_update_group(&cxlr->dev.kobj, get_cxl_region_target_group());
>>> - if (rc) {
>>> + if (rc)
>>> p->interleave_ways = save;
>>> +
>>> + return rc;
>>> +}
>>> +
>>> +static ssize_t interleave_ways_store(struct device *dev,
>>> + struct device_attribute *attr,
>>> + const char *buf, size_t len)
>>> +{
>>> + struct cxl_region *cxlr = to_cxl_region(dev);
>>> + unsigned int val;
>>> + int rc;
>>> +
>>> + rc = kstrtouint(buf, 0, &val);
>>> + if (rc)
>>> + return rc;
>>> +
>>> + ACQUIRE(rwsem_write_kill, rwsem)(&cxl_rwsem.region);
>>> + if ((rc = ACQUIRE_ERR(rwsem_write_kill, &rwsem)))
>>> + return rc;
>>> +
>>> + rc = set_interleave_ways(cxlr, val);
>>> + if (rc)
>>> return rc;
>>> - }
>>> return len;
>>> }
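The signed-parse approach discussed above can be sketched in userspace. This is a hypothetical stand-in built on strtol(), not the kernel's kstrtoint() helper, and the function name is made up for illustration:

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Sketch of the parsing question above: a signed parse plus a range
 * check rejects negative values up front, so the rest of the code can
 * keep plain int, matching the cxl_region_params fields. */
static int parse_interleave_ways(const char *buf, int *out)
{
        char *end;
        long v;

        errno = 0;
        v = strtol(buf, &end, 0);
        if (errno || end == buf || *end != '\0')
                return -EINVAL;
        if (v < 0 || v > INT_MAX)
                return -EINVAL; /* negative ways make no sense */
        *out = (int)v;
        return 0;
}
```

Whether the kernel parses signed or unsigned, ways_to_eiw() still bounds-checks the value afterwards, so the practical difference is only where a negative input gets rejected.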
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH v23 12/22] cxl: Define a driver interface for HPA free space enumeration
2026-02-19 9:58 ` Alejandro Lucero Palau
@ 2026-02-19 17:29 ` Cheatham, Benjamin
0 siblings, 0 replies; 67+ messages in thread
From: Cheatham, Benjamin @ 2026-02-19 17:29 UTC (permalink / raw)
To: Alejandro Lucero Palau, alejandro.lucero-palau
Cc: Jonathan Cameron, linux-cxl, netdev, dan.j.williams, edward.cree,
davem, kuba, pabeni, edumazet, dave.jiang
On 2/19/2026 3:58 AM, Alejandro Lucero Palau wrote:
>
> On 2/11/26 22:10, Cheatham, Benjamin wrote:
>> On 2/1/2026 9:54 AM, alejandro.lucero-palau@amd.com wrote:
>>> From: Alejandro Lucero <alucerop@amd.com>
>>>
>>> CXL region creation involves allocating capacity from Device Physical
>>> Address (DPA) and assigning it to decode a given Host Physical Address
>>> (HPA). Before determining how much DPA to allocate the amount of available
>>> HPA must be determined. Also, not all HPA is created equal, some HPA
>>> targets RAM, some targets PMEM, some is prepared for device-memory flows
>>> like HDM-D and HDM-DB, and some is HDM-H (host-only).
>>>
>>> In order to support Type2 CXL devices, wrap all of those concerns into
>>> an API that retrieves a root decoder (platform CXL window) that fits the
>>> specified constraints and the capacity available for a new region.
>>>
>>> Add a complementary function for releasing the reference to such root
>>> decoder.
>>>
>>> Based on https://lore.kernel.org/linux-cxl/168592159290.1948938.13522227102445462976.stgit@dwillia2-xfh.jf.intel.com/
>>>
>>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>>> ---
>>> drivers/cxl/core/region.c | 164 ++++++++++++++++++++++++++++++++++++++
>>> drivers/cxl/cxl.h | 3 +
>>> include/cxl/cxl.h | 6 ++
>>> 3 files changed, 173 insertions(+)
>>>
>>> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
>>> index 954b8fcdbac6..bdefd088f5f1 100644
>>> --- a/drivers/cxl/core/region.c
>>> +++ b/drivers/cxl/core/region.c
>>> @@ -705,6 +705,170 @@ static int free_hpa(struct cxl_region *cxlr)
>>> return 0;
>>> }
>>> +struct cxlrd_max_context {
>>> + struct device * const *host_bridges;
>>> + int interleave_ways;
>>> + unsigned long flags;
>>> + resource_size_t max_hpa;
>>> + struct cxl_root_decoder *cxlrd;
>>> +};
>>> +
>>> +static int find_max_hpa(struct device *dev, void *data)
>>> +{
>>> + struct cxlrd_max_context *ctx = data;
>>> + struct cxl_switch_decoder *cxlsd;
>>> + struct cxl_root_decoder *cxlrd;
>>> + struct resource *res, *prev;
>>> + struct cxl_decoder *cxld;
>>> + resource_size_t free = 0;
>>> + resource_size_t max;
>>> + int found = 0;
>>> +
>>> + if (!is_root_decoder(dev))
>>> + return 0;
>>> +
>>> + cxlrd = to_cxl_root_decoder(dev);
>>> + cxlsd = &cxlrd->cxlsd;
>>> + cxld = &cxlsd->cxld;
>>> +
>>> + if ((cxld->flags & ctx->flags) != ctx->flags) {
>>> + dev_dbg(dev, "flags not matching: %08lx vs %08lx\n",
>>> + cxld->flags, ctx->flags);
>>> + return 0;
>>> + }
>>> +
>>> + for (int i = 0; i < ctx->interleave_ways; i++) {
>>> + for (int j = 0; j < ctx->interleave_ways; j++) {
>>> + if (ctx->host_bridges[i] == cxlsd->target[j]->dport_dev) {
>>> + found++;
>>> + break;
>>> + }
>>> + }
>>> + }
>> This may be overcomplicated. I'm not quite sure how it works (I'm just slow today I guess), but I understand
>> what the intention is based on the debug print below. My issue is that ctx->host_bridges is only set to 1 host
>> bridge (endpoint->host_bridge) in cxl_get_hpa_freespace(), which is the only caller of this function. At that
>> point, why have the outer loop at all? If so, you could also simplify ctx->host_bridges to a plain
>> struct device * const.
>>
>> Maybe this gets called elsewhere later on in the series? I haven't looked at the rest yet. If I'm wrong, then
>> I'd probably add a comment saying what the cxlsd->target[] entries are supposed to be pointing at.
>
>
> Hi Ben,
>
>
> I do remember this one.
>
>
> Dan's original patches had this support for interleaving; then I removed it, as the case for Type2 plus interleaving is quite unlikely, at least right now and probably in the near future. But I was asked why not support it, since it was trivial to do so. FWIW, if I think only about the use case coming with this patchset, I agree with you, but because of those previous discussions, I think I have to leave it.
>
I'm fine with that, but I would at least do the fix with the decoder position in 19/22 and make a note that the
interleave_ways parameter in cxl_get_hpa_freespace() below is currently unused (unless I'm misunderstanding
the endpoint->host_bridge member).
That way, the support is mostly there and just requires a small, previously noted addition to enable. If you're
fine with that, then feel free to add my Reviewed-by after implementing it in v24.
Thanks,
Ben
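The nested matching loop being discussed reduces to a membership check: every requested host bridge must appear among the decoder's targets. A minimal userspace sketch, with plain pointers standing in for struct device * and a made-up function name:

```c
#include <assert.h>

/* Returns nonzero when every entry of bridges[0..n-1] is found among
 * targets[0..n-1]; models the found == interleave_ways test in
 * find_max_hpa(). */
static int targets_cover_bridges(const void *const *bridges,
                                 const void *const *targets, int n)
{
        int found = 0;

        for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++) {
                        if (bridges[i] == targets[j]) {
                                found++;
                                break;
                        }
                }
        }
        return found == n;
}
```

With interleave_ways effectively 1 in the current caller, the outer loop degenerates to a single linear scan, which is the simplification Ben suggests.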
>
> Thank you
>
>
>>> +
>>> + if (found != ctx->interleave_ways) {
>>> + dev_dbg(dev,
>>> + "Not enough host bridges. Found %d for %d interleave ways requested\n",
>>> + found, ctx->interleave_ways);
>>> + return 0;
>>> + }
>>> +
>>> + /*
>>> + * Walk the root decoder resource range relying on cxl_rwsem.region to
>>> + * preclude sibling arrival/departure and find the largest free space
>>> + * gap.
>>> + */
>>> + lockdep_assert_held_read(&cxl_rwsem.region);
>>> + res = cxlrd->res->child;
>>> +
>>> + /* With no resource child the whole parent resource is available */
>>> + if (!res)
>>> + max = resource_size(cxlrd->res);
>>> + else
>>> + max = 0;
>>> +
>>> + for (prev = NULL; res; prev = res, res = res->sibling) {
>>> + if (!prev && res->start == cxlrd->res->start &&
>>> + res->end == cxlrd->res->end) {
>>> + max = resource_size(cxlrd->res);
>>> + break;
>>> + }
>>> + /*
>>> + * Sanity check for preventing arithmetic problems below as a
>>> + * resource with size 0 could imply using the end field below
>>> + * when set to unsigned zero - 1 or all f in hex.
>>> + */
>>> + if (prev && !resource_size(prev))
>>> + continue;
>>> +
>>> + if (!prev && res->start > cxlrd->res->start) {
>>> + free = res->start - cxlrd->res->start;
>>> + max = max(free, max);
>>> + }
>>> + if (prev && res->start > prev->end + 1) {
>>> + free = res->start - prev->end + 1;
>>> + max = max(free, max);
>>> + }
>>> + }
>>> +
>>> + if (prev && prev->end + 1 < cxlrd->res->end + 1) {
>>> + free = cxlrd->res->end + 1 - prev->end + 1;
>>> + max = max(free, max);
>>> + }
>>> +
>>> + dev_dbg(cxlrd_dev(cxlrd), "found %pa bytes of free space\n", &max);
>>> + if (max > ctx->max_hpa) {
>>> + if (ctx->cxlrd)
>>> + put_device(cxlrd_dev(ctx->cxlrd));
>>> + get_device(cxlrd_dev(cxlrd));
>>> + ctx->cxlrd = cxlrd;
>>> + ctx->max_hpa = max;
>>> + }
>>> + return 0;
>>> +}
>>> +
>>> +/**
>>> + * cxl_get_hpa_freespace - find a root decoder with free capacity per constraints
>>> + * @cxlmd: the mem device requiring the HPA
>>> + * @interleave_ways: number of entries in @host_bridges
>>> + * @flags: CXL_DECODER_F flags for selecting RAM vs PMEM, and Type2 device
>>> + * @max_avail_contig: output parameter of max contiguous bytes available in the
>>> + * returned decoder
>>> + *
>>> + * Returns a pointer to a struct cxl_root_decoder
>>> + *
>>> + * The return tuple of a 'struct cxl_root_decoder' and 'bytes available given
>>> + * in (@max_avail_contig))' is a point in time snapshot. If by the time the
>>> + * caller goes to use this decoder and its capacity is reduced then caller needs
>>> + * to loop and retry.
>>> + *
>>> + * The returned root decoder has an elevated reference count that needs to be
>>> + * put with cxl_put_root_decoder(cxlrd).
>>> + */
>>> +struct cxl_root_decoder *cxl_get_hpa_freespace(struct cxl_memdev *cxlmd,
>>> + int interleave_ways,
>>> + unsigned long flags,
>>> + resource_size_t *max_avail_contig)
>>> +{
>>> + struct cxlrd_max_context ctx = {
>>> + .flags = flags,
>>> + .interleave_ways = interleave_ways,
>>> + };
>>> + struct cxl_port *root_port;
>>> + struct cxl_port *endpoint;
>>> +
>>> + endpoint = cxlmd->endpoint;
>>> + if (!endpoint) {
>>> + dev_dbg(&cxlmd->dev, "endpoint not linked to memdev\n");
>>> + return ERR_PTR(-ENXIO);
>>> + }
>>> +
>>> + ctx.host_bridges = &endpoint->host_bridge;
>> Mentioned earlier, interleave_ways is effectively hardcoded to 1 (unless I'm misunderstanding
>> something). I think what you want here is to go to the CXL root and pass in the children (i.e. host bridges)?
>> I'm not sure of what the fix is to get the intended behavior.
>>
>> It may be worth getting rid of the interleave_ways portion of this function and
>> add it later when someone needs it. You could also explain it's hard coded to 1/unused
>> in the doc comment if you know of an immediate need for it.
>>
>>> +
>>> + struct cxl_root *root __free(put_cxl_root) = find_cxl_root(endpoint);
>>> + if (!root) {
>>> + dev_dbg(&endpoint->dev, "endpoint is not related to a root port\n");
>>> + return ERR_PTR(-ENXIO);
>>> + }
>>> +
>>> + root_port = &root->port;
>>> + scoped_guard(rwsem_read, &cxl_rwsem.region)
>>> + device_for_each_child(&root_port->dev, &ctx, find_max_hpa);
>> Can just use a guard() here.
>>
>>> +
>>> + if (!ctx.cxlrd)
>>> + return ERR_PTR(-ENOMEM);
>>> +
>>> + *max_avail_contig = ctx.max_hpa;
>>> + return ctx.cxlrd;
>>> +}
>>> +EXPORT_SYMBOL_NS_GPL(cxl_get_hpa_freespace, "CXL");
>>> +
>>> +void cxl_put_root_decoder(struct cxl_root_decoder *cxlrd)
>>> +{
>>> + put_device(cxlrd_dev(cxlrd));
>>> +}
>>> +EXPORT_SYMBOL_NS_GPL(cxl_put_root_decoder, "CXL");
>>> +
>>> static ssize_t size_store(struct device *dev, struct device_attribute *attr,
>>> const char *buf, size_t len)
>>> {
>>> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
>>> index 944c5d1ccceb..c7d9b2c2908f 100644
>>> --- a/drivers/cxl/cxl.h
>>> +++ b/drivers/cxl/cxl.h
>>> @@ -706,6 +706,9 @@ struct cxl_root_decoder *to_cxl_root_decoder(struct device *dev);
>>> struct cxl_switch_decoder *to_cxl_switch_decoder(struct device *dev);
>>> struct cxl_endpoint_decoder *to_cxl_endpoint_decoder(struct device *dev);
>>> bool is_root_decoder(struct device *dev);
>>> +
>>> +#define cxlrd_dev(cxlrd) (&(cxlrd)->cxlsd.cxld.dev)
>>> +
>>> bool is_switch_decoder(struct device *dev);
>>> bool is_endpoint_decoder(struct device *dev);
>>> struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
>>> diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
>>> index 92880c26b2d5..834dc7e78934 100644
>>> --- a/include/cxl/cxl.h
>>> +++ b/include/cxl/cxl.h
>>> @@ -255,4 +255,10 @@ struct cxl_endpoint_decoder *cxl_get_committed_decoder(struct cxl_memdev *cxlmd,
>>> struct range;
>>> int cxl_get_region_range(struct cxl_region *region, struct range *range);
>>> void cxl_unregister_region(struct cxl_region *cxlr);
>>> +struct cxl_port;
>>> +struct cxl_root_decoder *cxl_get_hpa_freespace(struct cxl_memdev *cxlmd,
>>> + int interleave_ways,
>>> + unsigned long flags,
>>> + resource_size_t *max);
>>> +void cxl_put_root_decoder(struct cxl_root_decoder *cxlrd);
>>> #endif /* __CXL_CXL_H__ */
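With inclusive resource ranges, the free-space walk in find_max_hpa() amounts to a largest-gap scan: a head gap before the first child, gaps between siblings, and a tail gap after the last child. A userspace sketch of the intended computation, with made-up types (note that for inclusive ends the gap between two children is start - end - 1 bytes, and the tail gap is parent.end - last.end):

```c
#include <assert.h>
#include <stddef.h>

struct res { unsigned long long start, end; }; /* inclusive range */

/* Largest contiguous run of parent not covered by the sorted,
 * non-overlapping children[0..n-1]. */
static unsigned long long max_free_gap(struct res parent,
                                       const struct res *children, size_t n)
{
        unsigned long long max = 0, free;

        /* with no children the whole parent is available */
        if (n == 0)
                return parent.end - parent.start + 1;

        /* head gap before the first child */
        if (children[0].start > parent.start) {
                free = children[0].start - parent.start;
                if (free > max)
                        max = free;
        }

        /* gaps between siblings: inclusive ends, so start - end - 1 */
        for (size_t i = 1; i < n; i++) {
                if (children[i].start > children[i - 1].end + 1) {
                        free = children[i].start - children[i - 1].end - 1;
                        if (free > max)
                                max = free;
                }
        }

        /* tail gap after the last child */
        if (children[n - 1].end < parent.end) {
                free = parent.end - children[n - 1].end;
                if (free > max)
                        max = free;
        }
        return max;
}
```

This sketch only models the arithmetic; the kernel version additionally holds cxl_rwsem.region to keep the child list stable while walking it.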
* Re: [PATCH v23 10/22] cxl: Export function for unwinding cxl by accelerators
2026-02-01 15:54 ` [PATCH v23 10/22] cxl: Export function for unwinding cxl by accelerators alejandro.lucero-palau
@ 2026-02-19 23:16 ` Dave Jiang
2026-02-21 4:48 ` Gregory Price
1 sibling, 0 replies; 67+ messages in thread
From: Dave Jiang @ 2026-02-19 23:16 UTC (permalink / raw)
To: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet
Cc: Alejandro Lucero
On 2/1/26 8:54 AM, alejandro.lucero-palau@amd.com wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Add cxl_unregister_region() to the accelerator driver API
> for a clean exit.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> ---
> drivers/cxl/core/region.c | 17 ++++++++++++-----
> include/cxl/cxl.h | 1 +
> 2 files changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index acf29ba3b205..954b8fcdbac6 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -2438,9 +2438,8 @@ static struct cxl_region *to_cxl_region(struct device *dev)
> return container_of(dev, struct cxl_region, dev);
> }
>
> -static void unregister_region(void *_cxlr)
> +void cxl_unregister_region(struct cxl_region *cxlr)
> {
> - struct cxl_region *cxlr = _cxlr;
> struct cxl_region_params *p = &cxlr->params;
> int i;
>
> @@ -2457,6 +2456,14 @@ static void unregister_region(void *_cxlr)
> cxl_region_iomem_release(cxlr);
> put_device(&cxlr->dev);
> }
> +EXPORT_SYMBOL_NS_GPL(cxl_unregister_region, "CXL");
> +
> +static void __unregister_region(void *_cxlr)
> +{
> + struct cxl_region *cxlr = _cxlr;
> +
> + return cxl_unregister_region(cxlr);
> +}
>
> static struct lock_class_key cxl_region_key;
>
> @@ -2608,7 +2615,7 @@ static struct cxl_region *devm_cxl_add_region(struct cxl_root_decoder *cxlrd,
> if (rc)
> goto err;
>
> - rc = devm_add_action_or_reset(port->uport_dev, unregister_region, cxlr);
> + rc = devm_add_action_or_reset(port->uport_dev, __unregister_region, cxlr);
> if (rc)
> return ERR_PTR(rc);
>
> @@ -2762,7 +2769,7 @@ static ssize_t delete_region_store(struct device *dev,
> if (IS_ERR(cxlr))
> return PTR_ERR(cxlr);
>
> - devm_release_action(port->uport_dev, unregister_region, cxlr);
> + devm_release_action(port->uport_dev, __unregister_region, cxlr);
> put_device(&cxlr->dev);
>
> return len;
> @@ -3878,7 +3885,7 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
>
> rc = __construct_region(cxlr, cxlrd, cxled);
> if (rc) {
> - devm_release_action(port->uport_dev, unregister_region, cxlr);
> + devm_release_action(port->uport_dev, __unregister_region, cxlr);
> return ERR_PTR(rc);
> }
>
> diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
> index 906065e0d2a6..92880c26b2d5 100644
> --- a/include/cxl/cxl.h
> +++ b/include/cxl/cxl.h
> @@ -254,4 +254,5 @@ struct cxl_endpoint_decoder *cxl_get_committed_decoder(struct cxl_memdev *cxlmd,
> struct cxl_region **cxlr);
> struct range;
> int cxl_get_region_range(struct cxl_region *region, struct range *range);
> +void cxl_unregister_region(struct cxl_region *cxlr);
> #endif /* __CXL_CXL_H__ */
* Re: [PATCH v23 11/22] sfc: obtain decoder and region if committed by firmware
2026-02-01 15:54 ` [PATCH v23 11/22] sfc: obtain decoder and region if committed by firmware alejandro.lucero-palau
2026-02-11 22:10 ` Cheatham, Benjamin
@ 2026-02-19 23:31 ` Dave Jiang
2026-02-20 8:08 ` Alejandro Lucero Palau
2026-03-20 17:25 ` Edward Cree
2 siblings, 1 reply; 67+ messages in thread
From: Dave Jiang @ 2026-02-19 23:31 UTC (permalink / raw)
To: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet
Cc: Alejandro Lucero
On 2/1/26 8:54 AM, alejandro.lucero-palau@amd.com wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Check if device HDM is already committed during firmware/BIOS
> initialization.
>
> A CXL region should exist if so after memdev allocation/initialization.
> Get HPA from region and map it.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> ---
> drivers/net/ethernet/sfc/efx_cxl.c | 28 +++++++++++++++++++++++++++-
> 1 file changed, 27 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
> index a77ef4783fcb..3536eccf1b2a 100644
> --- a/drivers/net/ethernet/sfc/efx_cxl.c
> +++ b/drivers/net/ethernet/sfc/efx_cxl.c
> @@ -19,6 +19,7 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
> struct efx_nic *efx = &probe_data->efx;
> struct pci_dev *pci_dev = efx->pci_dev;
> struct efx_cxl *cxl;
> + struct range range;
> u16 dvsec;
> int rc;
>
> @@ -90,13 +91,38 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
> return PTR_ERR(cxl->cxlmd);
> }
>
> - probe_data->cxl = cxl;
> + cxl->cxled = cxl_get_committed_decoder(cxl->cxlmd, &cxl->efx_region);
> + if (cxl->cxled) {
if (!cxl->cxled)
return 0;
Should save you a level of indent.
DJ
> + if (!cxl->efx_region) {
> + pci_err(pci_dev, "CXL found committed decoder without a region");
> + return -ENODEV;
> + }
> + rc = cxl_get_region_range(cxl->efx_region, &range);
> + if (rc) {
> + pci_err(pci_dev,
> + "CXL getting regions params from a committed decoder failed");
> + return rc;
> + }
> +
> + cxl->ctpio_cxl = ioremap(range.start, range.end - range.start + 1);
> + if (!cxl->ctpio_cxl) {
> + pci_err(pci_dev, "CXL ioremap region (%pra) failed", &range);
> + return -ENOMEM;
> + }
> +
> + probe_data->cxl = cxl;
> + }
>
> return 0;
> }
>
> void efx_cxl_exit(struct efx_probe_data *probe_data)
> {
> + if (!probe_data->cxl)
> + return;
> +
> + iounmap(probe_data->cxl->ctpio_cxl);
> + cxl_unregister_region(probe_data->cxl->efx_region);
> }
>
> MODULE_IMPORT_NS("CXL");
* Re: [PATCH 21/22] sfc: create cxl region
2026-02-13 16:14 ` [PATCH " Gregory Price
@ 2026-02-20 8:00 ` Alejandro Lucero Palau
0 siblings, 0 replies; 67+ messages in thread
From: Alejandro Lucero Palau @ 2026-02-20 8:00 UTC (permalink / raw)
To: Gregory Price; +Cc: linux-cxl, netdev, linux-kernel
On 2/13/26 16:14, Gregory Price wrote:
> This review was generated by kreview-0811365ff2.
>
> This is not an automated email; the reviewer thought this report
> looked valid enough to be worth discussing.
>
> ----
> sfc code so won't speak to the internals, but ordering question seems
> legit.
Hi Gregory,
Yes, it makes sense, and it points at those changes introduced in v22
and mainly in v23.
I'll fix it.
Regarding the comment below, which, if I am not wrong, comes from kreview:
I think the patchset needs to support both cases, and therefore the code
needs to deal with both module exit paths.
Thank you
>
> ~Gregory
> ----
>
> On Sat, Feb 01, 2026, Alejandro Lucero wrote:
>
>> diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
>> index 2cfd0a46225f..4d5f3974e51d 100644
>> --- a/drivers/net/ethernet/sfc/efx_cxl.c
>> +++ b/drivers/net/ethernet/sfc/efx_cxl.c
> [ ... ]
>
>> @@ -147,11 +155,11 @@ void efx_cxl_exit(struct efx_probe_data *probe_data)
>>
>> if (probe_data->cxl->hdm_was_committed) {
>> iounmap(probe_data->cxl->ctpio_cxl);
>> - cxl_unregister_region(probe_data->cxl->efx_region);
>> } else {
>> cxl_dpa_free(probe_data->cxl->cxled);
>> cxl_put_root_decoder(probe_data->cxl->cxlrd);
>> }
>> + cxl_unregister_region(probe_data->cxl->efx_region);
> In the non-committed path, cxl_dpa_free() is called while the
> endpoint decoder is still attached to the region. Looking at
> cxl_dpa_free() in drivers/cxl/core/hdm.c:
>
> if (cxled->cxld.region) {
> dev_dbg(dev, "decoder assigned to: %s\n",
> dev_name(&cxled->cxld.region->dev));
> return -EBUSY;
> }
>
> Since cxl_unregister_region() has not run yet, cxled->cxld.region
> is still set, and cxl_dpa_free() returns -EBUSY without freeing
> the DPA. The return value is not checked.
>
> Should cxl_unregister_region() be called before cxl_dpa_free()
> and cxl_put_root_decoder() in the else branch, matching the
> reverse order of allocation in efx_cxl_init()?
>
> The cover letter notes that v23 expects committed decoders as the
> primary flow, and uncommitted decoder support is deferred to
> follow-up work, so this else branch may not be reachable in
> practice today. Still worth fixing the ordering now so it
> doesn't bite when the uncommitted path is enabled later.
>
> This issue is not fixed by the remaining commits in the series
> (through 10fe989f9e85).
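The ordering hazard described above can be modeled with a toy: a free routine that refuses while the object is still attached. This is not the kernel API, just an illustration (with invented names) of why the region must be unregistered before cxl_dpa_free() can succeed:

```c
#include <assert.h>

#define EBUSY 16

/* toy_dpa_free() mirrors cxl_dpa_free()'s -EBUSY check, and
 * toy_unregister_region() mirrors the region detach. */
struct toy_decoder {
        int region_attached;
        int dpa_allocated;
};

static int toy_dpa_free(struct toy_decoder *d)
{
        if (d->region_attached)
                return -EBUSY; /* still assigned to a region */
        d->dpa_allocated = 0;
        return 0;
}

static void toy_unregister_region(struct toy_decoder *d)
{
        d->region_attached = 0;
}
```

In the toy, calling toy_dpa_free() first fails with -EBUSY and leaks the allocation if the return value is ignored; detaching first lets the free succeed, matching the reverse-of-init ordering the review asks for.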
* Re: [PATCH 22/22] sfc: support pio mapping based on cxl
2026-02-13 16:14 ` [PATCH " Gregory Price
@ 2026-02-20 8:04 ` Alejandro Lucero Palau
0 siblings, 0 replies; 67+ messages in thread
From: Alejandro Lucero Palau @ 2026-02-20 8:04 UTC (permalink / raw)
To: Gregory Price; +Cc: linux-cxl, netdev, linux-kernel
On 2/13/26 16:14, Gregory Price wrote:
> This review was generated by kreview-0811365ff2.
>
> This is not an automated email; the reviewer thought this report
> looked valid enough to be worth discussing.
>
> ----
> I am completely unfamiliar with this code, but the question it poses
> at least seems reasonable.
Yes, and again, it makes sense. We have only tried with one VI, so that
explains why we have not hit the issue. But it needs to be fixed.
Thanks!
> ~Gregory
> ----
>
> On Sat, Feb 01, 2026, Alejandro Lucero wrote:
>
>> diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
>> index fcec81f862ec..2bb6d3136c7c 100644
>> --- a/drivers/net/ethernet/sfc/ef10.c
>> +++ b/drivers/net/ethernet/sfc/ef10.c
> [ ... ]
>
>> @@ -1263,8 +1281,25 @@ static int efx_ef10_dimension_resources(struct efx_nic *efx)
>> iounmap(efx->membase);
>> efx->membase = membase;
>>
>> - /* Set up the WC mapping if needed */
>> - if (wc_mem_map_size) {
>> + if (!wc_mem_map_size)
>> + goto skip_pio;
>> +
>> + /* Set up the WC mapping */
>> +
>> +#ifdef CONFIG_SFC_CXL
>> + probe_data = container_of(efx, struct efx_probe_data, efx);
>> + if ((nic_data->datapath_caps3 &
>> + (1 << MC_CMD_GET_CAPABILITIES_V7_OUT_CXL_CONFIG_ENABLE_LBN)) &&
>> + probe_data->cxl_pio_initialised) {
>> + /* Using PIO through CXL mapping? */
>> + nic_data->pio_write_base = probe_data->cxl->ctpio_cxl +
>> + (pio_write_vi_base * efx->vi_stride +
>> + ER_DZ_TX_PIOBUF - uc_mem_map_size);
>> + probe_data->cxl_pio_in_use = true;
>> + } else
>> +#endif
>> + {
>> + /* Using legacy PIO BAR mapping */
>> nic_data->wc_membase = ioremap_wc(efx->membase_phys +
>> uc_mem_map_size,
>> wc_mem_map_size);
>> @@ -1279,12 +1314,13 @@ static int efx_ef10_dimension_resources(struct efx_nic *efx)
>> nic_data->wc_membase +
>> (pio_write_vi_base * efx->vi_stride + ER_DZ_TX_PIOBUF -
>> uc_mem_map_size);
>> -
>> - rc = efx_ef10_link_piobufs(efx);
>> - if (rc)
>> - efx_ef10_free_piobufs(efx);
>> }
> The CXL path sets nic_data->pio_write_base but does not set
> nic_data->pio_write_vi_base, while the legacy path does:
>
> nic_data->pio_write_vi_base = pio_write_vi_base;
>
> Since nic_data is kzalloc'd, pio_write_vi_base stays at 0 in the CXL
> path. efx_ef10_link_piobufs() then uses nic_data->pio_write_vi_base
> to issue MC_CMD_LINK_PIOBUF commands:
>
> MCDI_SET_DWORD(inbuf, LINK_PIOBUF_IN_TXQ_INSTANCE,
> nic_data->pio_write_vi_base + index);
>
> and also for the special-case check:
>
> if (tx_queue->queue == nic_data->pio_write_vi_base) {
>
> Wouldn't this link PIO buffers to incorrect VI instances when using
> CXL, since the local variable pio_write_vi_base has the correct
> non-zero value but the struct field was never updated?
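The consequence of the missed assignment is easy to model: the nic state is zero-allocated, so a branch that never stores pio_write_vi_base leaves the field at 0 and every later base-plus-index computation targets the wrong VI. A toy sketch with simplified stand-ins for the sfc structures:

```c
#include <assert.h>
#include <stdlib.h>

/* Toy of the missed assignment in the CXL branch. */
struct toy_nic {
        unsigned int pio_write_vi_base;
};

/* Models the MC_CMD_LINK_PIOBUF TXQ instance computation. */
static unsigned int link_piobuf_target(const struct toy_nic *nic,
                                       unsigned int index)
{
        return nic->pio_write_vi_base + index;
}
```

With the field left at its zeroed default, the computed target collapses to the bare index, which is exactly the wrong-VI linking the review points out.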
* Re: [PATCH v23 11/22] sfc: obtain decoder and region if committed by firmware
2026-02-19 23:31 ` Dave Jiang
@ 2026-02-20 8:08 ` Alejandro Lucero Palau
0 siblings, 0 replies; 67+ messages in thread
From: Alejandro Lucero Palau @ 2026-02-20 8:08 UTC (permalink / raw)
To: Dave Jiang, alejandro.lucero-palau, linux-cxl, netdev,
dan.j.williams, edward.cree, davem, kuba, pabeni, edumazet
On 2/19/26 23:31, Dave Jiang wrote:
>
> On 2/1/26 8:54 AM, alejandro.lucero-palau@amd.com wrote:
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> Check if device HDM is already committed during firmware/BIOS
>> initialization.
>>
>> A CXL region should exist if so after memdev allocation/initialization.
>> Get HPA from region and map it.
>>
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>> ---
>> drivers/net/ethernet/sfc/efx_cxl.c | 28 +++++++++++++++++++++++++++-
>> 1 file changed, 27 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
>> index a77ef4783fcb..3536eccf1b2a 100644
>> --- a/drivers/net/ethernet/sfc/efx_cxl.c
>> +++ b/drivers/net/ethernet/sfc/efx_cxl.c
>> @@ -19,6 +19,7 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
>> struct efx_nic *efx = &probe_data->efx;
>> struct pci_dev *pci_dev = efx->pci_dev;
>> struct efx_cxl *cxl;
>> + struct range range;
>> u16 dvsec;
>> int rc;
>>
>> @@ -90,13 +91,38 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
>> return PTR_ERR(cxl->cxlmd);
>> }
>>
>> - probe_data->cxl = cxl;
>> + cxl->cxled = cxl_get_committed_decoder(cxl->cxlmd, &cxl->efx_region);
>> + if (cxl->cxled) {
> if (!cxl->cxled)
> return 0;
>
> Should save you a level of indent.
Yes, but subsequent patches add the else branch ...
Thanks
>
> DJ
>
>> + if (!cxl->efx_region) {
>> + pci_err(pci_dev, "CXL found committed decoder without a region");
>> + return -ENODEV;
>> + }
>> + rc = cxl_get_region_range(cxl->efx_region, &range);
>> + if (rc) {
>> + pci_err(pci_dev,
>> + "CXL getting regions params from a committed decoder failed");
>> + return rc;
>> + }
>> +
>> + cxl->ctpio_cxl = ioremap(range.start, range.end - range.start + 1);
>> + if (!cxl->ctpio_cxl) {
>> + pci_err(pci_dev, "CXL ioremap region (%pra) failed", &range);
>> + return -ENOMEM;
>> + }
>> +
>> + probe_data->cxl = cxl;
>> + }
>>
>> return 0;
>> }
>>
>> void efx_cxl_exit(struct efx_probe_data *probe_data)
>> {
>> + if (!probe_data->cxl)
>> + return;
>> +
>> + iounmap(probe_data->cxl->ctpio_cxl);
>> + cxl_unregister_region(probe_data->cxl->efx_region);
>> }
>>
>> MODULE_IMPORT_NS("CXL");
* Re: [PATCH v23 12/22] cxl: Define a driver interface for HPA free space enumeration
2026-02-01 15:54 ` [PATCH v23 12/22] cxl: Define a driver interface for HPA free space enumeration alejandro.lucero-palau
2026-02-11 22:10 ` Cheatham, Benjamin
@ 2026-02-20 15:42 ` Dave Jiang
2026-02-26 16:13 ` Alejandro Lucero Palau
2 siblings, 0 replies; 67+ messages in thread
From: Dave Jiang @ 2026-02-20 15:42 UTC (permalink / raw)
To: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet
Cc: Alejandro Lucero, Jonathan Cameron
On 2/1/26 8:54 AM, alejandro.lucero-palau@amd.com wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> CXL region creation involves allocating capacity from Device Physical
> Address (DPA) and assigning it to decode a given Host Physical Address
> (HPA). Before determining how much DPA to allocate the amount of available
> HPA must be determined. Also, not all HPA is created equal, some HPA
> targets RAM, some targets PMEM, some is prepared for device-memory flows
> like HDM-D and HDM-DB, and some is HDM-H (host-only).
>
> In order to support Type2 CXL devices, wrap all of those concerns into
> an API that retrieves a root decoder (platform CXL window) that fits the
> specified constraints and the capacity available for a new region.
>
> Add a complementary function for releasing the reference to such root
> decoder.
>
> Based on https://lore.kernel.org/linux-cxl/168592159290.1948938.13522227102445462976.stgit@dwillia2-xfh.jf.intel.com/
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
> drivers/cxl/core/region.c | 164 ++++++++++++++++++++++++++++++++++++++
> drivers/cxl/cxl.h | 3 +
> include/cxl/cxl.h | 6 ++
> 3 files changed, 173 insertions(+)
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 954b8fcdbac6..bdefd088f5f1 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -705,6 +705,170 @@ static int free_hpa(struct cxl_region *cxlr)
> return 0;
> }
>
> +struct cxlrd_max_context {
> + struct device * const *host_bridges;
> + int interleave_ways;
> + unsigned long flags;
> + resource_size_t max_hpa;
> + struct cxl_root_decoder *cxlrd;
> +};
> +
> +static int find_max_hpa(struct device *dev, void *data)
> +{
> + struct cxlrd_max_context *ctx = data;
> + struct cxl_switch_decoder *cxlsd;
> + struct cxl_root_decoder *cxlrd;
> + struct resource *res, *prev;
> + struct cxl_decoder *cxld;
> + resource_size_t free = 0;
> + resource_size_t max;
> + int found = 0;
> +
> + if (!is_root_decoder(dev))
> + return 0;
> +
> + cxlrd = to_cxl_root_decoder(dev);
> + cxlsd = &cxlrd->cxlsd;
> + cxld = &cxlsd->cxld;
> +
> + if ((cxld->flags & ctx->flags) != ctx->flags) {
> + dev_dbg(dev, "flags not matching: %08lx vs %08lx\n",
> + cxld->flags, ctx->flags);
> + return 0;
> + }
> +
> + for (int i = 0; i < ctx->interleave_ways; i++) {
> + for (int j = 0; j < ctx->interleave_ways; j++) {
> + if (ctx->host_bridges[i] == cxlsd->target[j]->dport_dev) {
> + found++;
> + break;
> + }
> + }
> + }
> +
> + if (found != ctx->interleave_ways) {
> + dev_dbg(dev,
> + "Not enough host bridges. Found %d for %d interleave ways requested\n",
> + found, ctx->interleave_ways);
> + return 0;
> + }
> +
> + /*
> + * Walk the root decoder resource range relying on cxl_rwsem.region to
> + * preclude sibling arrival/departure and find the largest free space
> + * gap.
> + */
> + lockdep_assert_held_read(&cxl_rwsem.region);
> + res = cxlrd->res->child;
> +
> + /* With no resource child the whole parent resource is available */
> + if (!res)
> + max = resource_size(cxlrd->res);
> + else
> + max = 0;
> +
> + for (prev = NULL; res; prev = res, res = res->sibling) {
> + if (!prev && res->start == cxlrd->res->start &&
> + res->end == cxlrd->res->end) {
> + max = resource_size(cxlrd->res);
> + break;
> + }
Can this block be pulled out of the for loop so it only needs to run once?
> + /*
> + * Sanity check for preventing arithmetic problems below as a
> + * resource with size 0 could imply using the end field below
> + * when set to unsigned zero - 1 or all f in hex.
> + */
> + if (prev && !resource_size(prev))
> + continue;
> +
> + if (!prev && res->start > cxlrd->res->start) {
> + free = res->start - cxlrd->res->start;
> + max = max(free, max);
> + }
> + if (prev && res->start > prev->end + 1) {
> + free = res->start - prev->end + 1;
> + max = max(free, max);
> + }
> + }
> +
> + if (prev && prev->end + 1 < cxlrd->res->end + 1) {
> + free = cxlrd->res->end + 1 - prev->end + 1;
> + max = max(free, max);
> + }
> +
> + dev_dbg(cxlrd_dev(cxlrd), "found %pa bytes of free space\n", &max);
> + if (max > ctx->max_hpa) {
> + if (ctx->cxlrd)
> + put_device(cxlrd_dev(ctx->cxlrd));
> + get_device(cxlrd_dev(cxlrd));
> + ctx->cxlrd = cxlrd;
> + ctx->max_hpa = max;
Is there any chance that ctx->cxlrd == cxlrd? Maybe you can do:
if (ctx->cxlrd && ctx->cxlrd != cxlrd) {
put_device(cxlrd_dev(ctx->cxlrd));
get_device(cxlrd_dev(cxlrd));
ctx->cxlrd = cxlrd;
}
ctx->max_hpa = max;
DJ
> + }
> + return 0;
> +}
> +
> +/**
> + * cxl_get_hpa_freespace - find a root decoder with free capacity per constraints
> + * @cxlmd: the mem device requiring the HPA
> + * @interleave_ways: number of entries in @host_bridges
> + * @flags: CXL_DECODER_F flags for selecting RAM vs PMEM, and Type2 device
> + * @max_avail_contig: output parameter of max contiguous bytes available in the
> + * returned decoder
> + *
> + * Returns a pointer to a struct cxl_root_decoder
> + *
> + * The return tuple of a 'struct cxl_root_decoder' and 'bytes available given
> + * in (@max_avail_contig))' is a point in time snapshot. If by the time the
> + * caller goes to use this decoder and its capacity is reduced then caller needs
> + * to loop and retry.
> + *
> + * The returned root decoder has an elevated reference count that needs to be
> + * put with cxl_put_root_decoder(cxlrd).
> + */
> +struct cxl_root_decoder *cxl_get_hpa_freespace(struct cxl_memdev *cxlmd,
> + int interleave_ways,
> + unsigned long flags,
> + resource_size_t *max_avail_contig)
> +{
> + struct cxlrd_max_context ctx = {
> + .flags = flags,
> + .interleave_ways = interleave_ways,
> + };
> + struct cxl_port *root_port;
> + struct cxl_port *endpoint;
> +
> + endpoint = cxlmd->endpoint;
> + if (!endpoint) {
> + dev_dbg(&cxlmd->dev, "endpoint not linked to memdev\n");
> + return ERR_PTR(-ENXIO);
> + }
> +
> + ctx.host_bridges = &endpoint->host_bridge;
> +
> + struct cxl_root *root __free(put_cxl_root) = find_cxl_root(endpoint);
> + if (!root) {
> + dev_dbg(&endpoint->dev, "endpoint is not related to a root port\n");
> + return ERR_PTR(-ENXIO);
> + }
> +
> + root_port = &root->port;
> + scoped_guard(rwsem_read, &cxl_rwsem.region)
> + device_for_each_child(&root_port->dev, &ctx, find_max_hpa);
> +
> + if (!ctx.cxlrd)
> + return ERR_PTR(-ENOMEM);
> +
> + *max_avail_contig = ctx.max_hpa;
> + return ctx.cxlrd;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_get_hpa_freespace, "CXL");
> +
> +void cxl_put_root_decoder(struct cxl_root_decoder *cxlrd)
> +{
> + put_device(cxlrd_dev(cxlrd));
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_put_root_decoder, "CXL");
> +
> static ssize_t size_store(struct device *dev, struct device_attribute *attr,
> const char *buf, size_t len)
> {
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 944c5d1ccceb..c7d9b2c2908f 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -706,6 +706,9 @@ struct cxl_root_decoder *to_cxl_root_decoder(struct device *dev);
> struct cxl_switch_decoder *to_cxl_switch_decoder(struct device *dev);
> struct cxl_endpoint_decoder *to_cxl_endpoint_decoder(struct device *dev);
> bool is_root_decoder(struct device *dev);
> +
> +#define cxlrd_dev(cxlrd) (&(cxlrd)->cxlsd.cxld.dev)
> +
> bool is_switch_decoder(struct device *dev);
> bool is_endpoint_decoder(struct device *dev);
> struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
> diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
> index 92880c26b2d5..834dc7e78934 100644
> --- a/include/cxl/cxl.h
> +++ b/include/cxl/cxl.h
> @@ -255,4 +255,10 @@ struct cxl_endpoint_decoder *cxl_get_committed_decoder(struct cxl_memdev *cxlmd,
> struct range;
> int cxl_get_region_range(struct cxl_region *region, struct range *range);
> void cxl_unregister_region(struct cxl_region *cxlr);
> +struct cxl_port;
> +struct cxl_root_decoder *cxl_get_hpa_freespace(struct cxl_memdev *cxlmd,
> + int interleave_ways,
> + unsigned long flags,
> + resource_size_t *max);
> +void cxl_put_root_decoder(struct cxl_root_decoder *cxlrd);
> #endif /* __CXL_CXL_H__ */
^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH v23 10/22] cxl: Export function for unwinding cxl by accelerators
2026-02-01 15:54 ` [PATCH v23 10/22] cxl: Export function for unwinding cxl by accelerators alejandro.lucero-palau
2026-02-19 23:16 ` Dave Jiang
@ 2026-02-21 4:48 ` Gregory Price
1 sibling, 0 replies; 67+ messages in thread
From: Gregory Price @ 2026-02-21 4:48 UTC (permalink / raw)
To: alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang, Alejandro Lucero
On Sun, Feb 01, 2026 at 03:54:26PM +0000, alejandro.lucero-palau@amd.com wrote:
> -static void unregister_region(void *_cxlr)
> +void cxl_unregister_region(struct cxl_region *cxlr)
> {
> - struct cxl_region *cxlr = _cxlr;
> struct cxl_region_params *p = &cxlr->params;
> int i;
>
> @@ -2457,6 +2456,14 @@ static void unregister_region(void *_cxlr)
> cxl_region_iomem_release(cxlr);
> put_device(&cxlr->dev);
> }
> +EXPORT_SYMBOL_NS_GPL(cxl_unregister_region, "CXL");
> +
kreview suggested you probably want this:
void cxl_destroy_region(struct cxl_region *cxlr)
{
struct cxl_port *port = cxlrd_to_port(cxlr->cxlrd);
devm_release_action(port->uport_dev, __unregister_region, cxlr);
}
EXPORT_SYMBOL_NS_GPL(cxl_destroy_region, "CXL");
During testing I experienced some double-releases when doing aggressive
loads and unloads of some drivers. This was one of the fixes.
~Gregory
* Re: [PATCH v23 12/22] cxl: Define a driver interface for HPA free space enumeration
2026-02-01 15:54 ` [PATCH v23 12/22] cxl: Define a driver interface for HPA free space enumeration alejandro.lucero-palau
2026-02-11 22:10 ` Cheatham, Benjamin
2026-02-20 15:42 ` Dave Jiang
@ 2026-02-26 16:13 ` Alejandro Lucero Palau
2 siblings, 0 replies; 67+ messages in thread
From: Alejandro Lucero Palau @ 2026-02-26 16:13 UTC (permalink / raw)
To: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Jonathan Cameron
On 2/1/26 15:54, alejandro.lucero-palau@amd.com wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> CXL region creation involves allocating capacity from Device Physical
> Address (DPA) and assigning it to decode a given Host Physical Address
> (HPA). Before determining how much DPA to allocate the amount of available
> HPA must be determined. Also, not all HPA is created equal, some HPA
> targets RAM, some targets PMEM, some is prepared for device-memory flows
> like HDM-D and HDM-DB, and some is HDM-H (host-only).
>
> In order to support Type2 CXL devices, wrap all of those concerns into
> an API that retrieves a root decoder (platform CXL window) that fits the
> specified constraints and the capacity available for a new region.
>
> Add a complementary function for releasing the reference to such root
> decoder.
>
> Based on https://lore.kernel.org/linux-cxl/168592159290.1948938.13522227102445462976.stgit@dwillia2-xfh.jf.intel.com/
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
> drivers/cxl/core/region.c | 164 ++++++++++++++++++++++++++++++++++++++
> drivers/cxl/cxl.h | 3 +
> include/cxl/cxl.h | 6 ++
> 3 files changed, 173 insertions(+)
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 954b8fcdbac6..bdefd088f5f1 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -705,6 +705,170 @@ static int free_hpa(struct cxl_region *cxlr)
> return 0;
> }
>
> +struct cxlrd_max_context {
> + struct device * const *host_bridges;
> + int interleave_ways;
> + unsigned long flags;
> + resource_size_t max_hpa;
> + struct cxl_root_decoder *cxlrd;
> +};
> +
> +static int find_max_hpa(struct device *dev, void *data)
> +{
> + struct cxlrd_max_context *ctx = data;
> + struct cxl_switch_decoder *cxlsd;
> + struct cxl_root_decoder *cxlrd;
> + struct resource *res, *prev;
> + struct cxl_decoder *cxld;
> + resource_size_t free = 0;
> + resource_size_t max;
> + int found = 0;
> +
> + if (!is_root_decoder(dev))
> + return 0;
> +
> + cxlrd = to_cxl_root_decoder(dev);
> + cxlsd = &cxlrd->cxlsd;
> + cxld = &cxlsd->cxld;
> +
> + if ((cxld->flags & ctx->flags) != ctx->flags) {
> + dev_dbg(dev, "flags not matching: %08lx vs %08lx\n",
> + cxld->flags, ctx->flags);
> + return 0;
> + }
> +
> + for (int i = 0; i < ctx->interleave_ways; i++) {
> + for (int j = 0; j < ctx->interleave_ways; j++) {
> + if (ctx->host_bridges[i] == cxlsd->target[j]->dport_dev) {
> + found++;
> + break;
> + }
> + }
> + }
> +
> + if (found != ctx->interleave_ways) {
> + dev_dbg(dev,
> + "Not enough host bridges. Found %d for %d interleave ways requested\n",
> + found, ctx->interleave_ways);
> + return 0;
> + }
> +
> + /*
> + * Walk the root decoder resource range relying on cxl_rwsem.region to
> + * preclude sibling arrival/departure and find the largest free space
> + * gap.
> + */
> + lockdep_assert_held_read(&cxl_rwsem.region);
> + res = cxlrd->res->child;
> +
> + /* With no resource child the whole parent resource is available */
> + if (!res)
> + max = resource_size(cxlrd->res);
> + else
> + max = 0;
> +
> + for (prev = NULL; res; prev = res, res = res->sibling) {
> + if (!prev && res->start == cxlrd->res->start &&
> + res->end == cxlrd->res->end) {
> + max = resource_size(cxlrd->res);
> + break;
> + }
While preparing to send this patch independently, as I'm doing to
facilitate the overall Type2 integration, I realized the above check
is completely wrong.
FWIW, I added it in v22, which was a rush job to get a version out
before LPC for PJ to test, and although "it works" for the second
driver load after the HDMs are reset during unload, it is
embarrassingly wrong and only "fixes" initialization for the second
and subsequent driver loads. The real problem, which I found later but
did not change here because v23 does not care, was that the driver
unload was not releasing the resources of the first region created.
It will not be there in the coming patch for this functionality.
> + /*
> + * Sanity check for preventing arithmetic problems below as a
> + * resource with size 0 could imply using the end field below
> + * when set to unsigned zero - 1 or all f in hex.
> + */
> + if (prev && !resource_size(prev))
> + continue;
> +
> + if (!prev && res->start > cxlrd->res->start) {
> + free = res->start - cxlrd->res->start;
> + max = max(free, max);
> + }
> + if (prev && res->start > prev->end + 1) {
> + free = res->start - prev->end + 1;
> + max = max(free, max);
> + }
> + }
> +
> + if (prev && prev->end + 1 < cxlrd->res->end + 1) {
> + free = cxlrd->res->end + 1 - prev->end + 1;
> + max = max(free, max);
> + }
> +
> + dev_dbg(cxlrd_dev(cxlrd), "found %pa bytes of free space\n", &max);
> + if (max > ctx->max_hpa) {
> + if (ctx->cxlrd)
> + put_device(cxlrd_dev(ctx->cxlrd));
> + get_device(cxlrd_dev(cxlrd));
> + ctx->cxlrd = cxlrd;
> + ctx->max_hpa = max;
> + }
> + return 0;
> +}
> +
> +/**
> + * cxl_get_hpa_freespace - find a root decoder with free capacity per constraints
> + * @cxlmd: the mem device requiring the HPA
> + * @interleave_ways: number of entries in @host_bridges
> + * @flags: CXL_DECODER_F flags for selecting RAM vs PMEM, and Type2 device
> + * @max_avail_contig: output parameter of max contiguous bytes available in the
> + * returned decoder
> + *
> + * Returns a pointer to a struct cxl_root_decoder
> + *
> + * The return tuple of a 'struct cxl_root_decoder' and 'bytes available given
> + * in (@max_avail_contig))' is a point in time snapshot. If by the time the
> + * caller goes to use this decoder and its capacity is reduced then caller needs
> + * to loop and retry.
> + *
> + * The returned root decoder has an elevated reference count that needs to be
> + * put with cxl_put_root_decoder(cxlrd).
> + */
> +struct cxl_root_decoder *cxl_get_hpa_freespace(struct cxl_memdev *cxlmd,
> + int interleave_ways,
> + unsigned long flags,
> + resource_size_t *max_avail_contig)
> +{
> + struct cxlrd_max_context ctx = {
> + .flags = flags,
> + .interleave_ways = interleave_ways,
> + };
> + struct cxl_port *root_port;
> + struct cxl_port *endpoint;
> +
> + endpoint = cxlmd->endpoint;
> + if (!endpoint) {
> + dev_dbg(&cxlmd->dev, "endpoint not linked to memdev\n");
> + return ERR_PTR(-ENXIO);
> + }
> +
> + ctx.host_bridges = &endpoint->host_bridge;
> +
> + struct cxl_root *root __free(put_cxl_root) = find_cxl_root(endpoint);
> + if (!root) {
> + dev_dbg(&endpoint->dev, "endpoint is not related to a root port\n");
> + return ERR_PTR(-ENXIO);
> + }
> +
> + root_port = &root->port;
> + scoped_guard(rwsem_read, &cxl_rwsem.region)
> + device_for_each_child(&root_port->dev, &ctx, find_max_hpa);
> +
> + if (!ctx.cxlrd)
> + return ERR_PTR(-ENOMEM);
> +
> + *max_avail_contig = ctx.max_hpa;
> + return ctx.cxlrd;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_get_hpa_freespace, "CXL");
> +
> +void cxl_put_root_decoder(struct cxl_root_decoder *cxlrd)
> +{
> + put_device(cxlrd_dev(cxlrd));
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_put_root_decoder, "CXL");
> +
> static ssize_t size_store(struct device *dev, struct device_attribute *attr,
> const char *buf, size_t len)
> {
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 944c5d1ccceb..c7d9b2c2908f 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -706,6 +706,9 @@ struct cxl_root_decoder *to_cxl_root_decoder(struct device *dev);
> struct cxl_switch_decoder *to_cxl_switch_decoder(struct device *dev);
> struct cxl_endpoint_decoder *to_cxl_endpoint_decoder(struct device *dev);
> bool is_root_decoder(struct device *dev);
> +
> +#define cxlrd_dev(cxlrd) (&(cxlrd)->cxlsd.cxld.dev)
> +
> bool is_switch_decoder(struct device *dev);
> bool is_endpoint_decoder(struct device *dev);
> struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
> diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
> index 92880c26b2d5..834dc7e78934 100644
> --- a/include/cxl/cxl.h
> +++ b/include/cxl/cxl.h
> @@ -255,4 +255,10 @@ struct cxl_endpoint_decoder *cxl_get_committed_decoder(struct cxl_memdev *cxlmd,
> struct range;
> int cxl_get_region_range(struct cxl_region *region, struct range *range);
> void cxl_unregister_region(struct cxl_region *cxlr);
> +struct cxl_port;
> +struct cxl_root_decoder *cxl_get_hpa_freespace(struct cxl_memdev *cxlmd,
> + int interleave_ways,
> + unsigned long flags,
> + resource_size_t *max);
> +void cxl_put_root_decoder(struct cxl_root_decoder *cxlrd);
> #endif /* __CXL_CXL_H__ */
* Re: [PATCH v23 00/22] Type2 device basic support
2026-02-01 15:54 [PATCH v23 00/22] Type2 device basic support alejandro.lucero-palau
` (22 preceding siblings ...)
2026-02-11 22:12 ` [PATCH v23 00/22] Type2 device basic support Cheatham, Benjamin
@ 2026-03-09 22:43 ` PJ Waskiewicz
2026-03-10 14:02 ` Alejandro Lucero Palau
23 siblings, 1 reply; 67+ messages in thread
From: PJ Waskiewicz @ 2026-03-09 22:43 UTC (permalink / raw)
To: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero
Hi Alejandro,
On Sun, 2026-02-01 at 15:54 +0000, alejandro.lucero-palau@amd.com
wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> This patchset should be applied on the cxl next branch using the base
> specified at the end of this cover letter.
>
> Dependencies on Dan's work has gone and also on Terry's as the only
> patch required is now in next. The other dependency is on Smita
> patchset
> but it does not exist such a dependency as that work will not avoid
> the
> problem with Type2 and DAX/hmem if soft reserved memory. This needs
> to
> be solved by the BIOS and Type2 UEFI driver for populating the
> CXL.mem
> range as EFI_RESERVED_TYPE instead of default EFI_CONVENTIONAL_MEMORY
> with the EFI_MEMORY_SP attribute. There exists though a dependency on
> one Smita's patches:
>
> [PATCH v5 3/7] cxl/region: Skip decoder reset on detach for
> autodiscovered regions
>
> This is needed for the default behaviour with current BIOS
> configuration
> where the HDM Type2 decoders will be kept unreset when driver
> unloads.
> This is the main change introduced in v23: committed decoders will
> not
> be reset. Previous v22 functionality supported first driver load
> finding
> committed decoders but resetting them at unload and supporting
> uncommitted decoders in next driver loads. This will be suported in
> follow-up works.
>
> v23 changes:
>
> patch 11: fixing minor issues and droping change in
> should_emulate_decoders (Jonathan Cameron)
>
> patch13: refactoring unregister_region for safety type in Type2 API
>
> sfc changes: slight modifications to error path
>
I've been able to mostly get these to work on a very boiled down
driver. I still need to port these into my full driver stack, but
moving that whole stack (multiple drivers) each time a new API is
proposed was becoming a blocker. So I have a very basic driver that is
testing the interface against our HW for now.
I have a slight issue that I'll address in Patch 8, or at least ask.
In the meantime, I'll start moving things over to our full stack to try
and get the same level of replication.
I'm hopeful I can start adding Tested-by:'s to the patches very soon.
-PJ
* Re: [PATCH v23 08/22] cxl/hdm: Add support for getting region from committed decoder
2026-02-12 9:16 ` Alejandro Lucero Palau
@ 2026-03-09 22:49 ` PJ Waskiewicz
2026-03-10 13:54 ` Alejandro Lucero Palau
2026-03-13 2:03 ` Dan Williams
0 siblings, 2 replies; 67+ messages in thread
From: PJ Waskiewicz @ 2026-03-09 22:49 UTC (permalink / raw)
To: Alejandro Lucero Palau, Cheatham, Benjamin,
alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
On Thu, 2026-02-12 at 09:16 +0000, Alejandro Lucero Palau wrote:
>
> On 2/11/26 22:11, Cheatham, Benjamin wrote:
> > On 2/1/2026 9:54 AM, alejandro.lucero-palau@amd.com wrote:
> > > From: Alejandro Lucero <alucerop@amd.com>
> > >
> > > A Type2 device configured by the BIOS can already have its HDM
> > > committed. Add a cxl_get_committed_decoder() function for checking
> > > so after memdev creation. A CXL region should have been created
> > > during memdev initialization, therefore a Type2 driver can ask
> > > for
> > > such a region for working with the HPA. If the HDM is not
> > > committed,
> > > a Type2 driver will create the region after obtaining proper HPA
> > > and DPA space.
> > >
> > > Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> > > ---
> > > drivers/cxl/core/hdm.c | 39
> > > +++++++++++++++++++++++++++++++++++++++
> > > include/cxl/cxl.h | 3 +++
> > > 2 files changed, 42 insertions(+)
> > >
> > > diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> > > index 6e516c69b2d2..a172ce4e9b19 100644
> > > --- a/drivers/cxl/core/hdm.c
> > > +++ b/drivers/cxl/core/hdm.c
> > > @@ -686,6 +686,45 @@ int cxl_dpa_alloc(struct
> > > cxl_endpoint_decoder *cxled, u64 size)
> > > return devm_add_action_or_reset(&port->dev,
> > > cxl_dpa_release, cxled);
> > > }
> > >
> > > +static int find_committed_endpoint_decoder(struct device *dev,
> > > const void *data)
> > > +{
> > > + struct cxl_endpoint_decoder *cxled;
> > > + struct cxl_port *port;
> > > +
> > > + if (!is_endpoint_decoder(dev))
> > > + return 0;
> > > +
> > > + cxled = to_cxl_endpoint_decoder(dev);
> > > + port = cxled_to_port(cxled);
> > > +
> > > + return cxled->cxld.id == port->hdm_end;
> > Is this the way you're supposed to check if a decoder is committed?
> > The doc comment for @hdm_end in
> > struct cxl_port says it's just the last allocated decoder. If
> > allocated decoders are always committed then
> > I'm fine with this, otherwise I think you'd want to a register read
> > or something to find the commit state.
>
>
> Hi Ben,
>
>
> Yes, I think you are right. This works in my tests and it is safe
> because I check the region does exist before using it. But the error
> inside sfc should then not be fatal for cxl sfc initialization and
> fallback to the other cxl initialization possibility.
>
So I'm running into this situation I think.
When you're testing, are you surviving a reload of the driver? Right
now, I can load and successfully create the region0 device. However,
following the same teardown path in SFC, I cannot reload my driver
afterwards and map the region. I get:
cxl_port endpoint5: failed to attach decoder5 to region0: -6 (ENXIO)
<driver> 0000:c1:00.0: CXL found committed decoder without a region
<driver> 0000:c1:00.0: CXL init failed
I'd be surprised if SFC in its current patch state would survive this
same insmod/rmmod/insmod test.
-PJ
* Re: [PATCH v23 08/22] cxl/hdm: Add support for getting region from committed decoder
2026-03-09 22:49 ` PJ Waskiewicz
@ 2026-03-10 13:54 ` Alejandro Lucero Palau
2026-03-13 2:03 ` Dan Williams
1 sibling, 0 replies; 67+ messages in thread
From: Alejandro Lucero Palau @ 2026-03-10 13:54 UTC (permalink / raw)
To: PJ Waskiewicz, Cheatham, Benjamin, alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
On 3/9/26 22:49, PJ Waskiewicz wrote:
> On Thu, 2026-02-12 at 09:16 +0000, Alejandro Lucero Palau wrote:
>> On 2/11/26 22:11, Cheatham, Benjamin wrote:
>>> On 2/1/2026 9:54 AM, alejandro.lucero-palau@amd.com wrote:
>>>> From: Alejandro Lucero <alucerop@amd.com>
>>>>
>>>> A Type2 device configured by the BIOS can already have its HDM
>>>> committed. Add a cxl_get_committed_decoder() function for checking
>>>> so after memdev creation. A CXL region should have been created
>>>> during memdev initialization, therefore a Type2 driver can ask
>>>> for
>>>> such a region for working with the HPA. If the HDM is not
>>>> committed,
>>>> a Type2 driver will create the region after obtaining proper HPA
>>>> and DPA space.
>>>>
>>>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>>>> ---
>>>> drivers/cxl/core/hdm.c | 39
>>>> +++++++++++++++++++++++++++++++++++++++
>>>> include/cxl/cxl.h | 3 +++
>>>> 2 files changed, 42 insertions(+)
>>>>
>>>> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
>>>> index 6e516c69b2d2..a172ce4e9b19 100644
>>>> --- a/drivers/cxl/core/hdm.c
>>>> +++ b/drivers/cxl/core/hdm.c
>>>> @@ -686,6 +686,45 @@ int cxl_dpa_alloc(struct
>>>> cxl_endpoint_decoder *cxled, u64 size)
>>>> return devm_add_action_or_reset(&port->dev,
>>>> cxl_dpa_release, cxled);
>>>> }
>>>>
>>>> +static int find_committed_endpoint_decoder(struct device *dev,
>>>> const void *data)
>>>> +{
>>>> + struct cxl_endpoint_decoder *cxled;
>>>> + struct cxl_port *port;
>>>> +
>>>> + if (!is_endpoint_decoder(dev))
>>>> + return 0;
>>>> +
>>>> + cxled = to_cxl_endpoint_decoder(dev);
>>>> + port = cxled_to_port(cxled);
>>>> +
>>>> + return cxled->cxld.id == port->hdm_end;
>>> Is this the way you're supposed to check if a decoder is committed?
>>> The doc comment for @hdm_end in
>>> struct cxl_port says it's just the last allocated decoder. If
>>> allocated decoders are always committed then
>>> I'm fine with this, otherwise I think you'd want to a register read
>>> or something to find the commit state.
>>
>> Hi Ben,
>>
>>
>> Yes, I think you are right. This works in my tests and it is safe
>> because I check the region does exist before using it. But the error
>> inside sfc should then not be fatal for cxl sfc initialization and
>> fallback to the other cxl initialization possibility.
>>
> So I'm running into this situation I think.
>
> When you're testing, are you surviving a reload of the driver? Right
> now, I can load and successfully create the region0 device. However,
> following the same teardown path in SFC, I cannot reload my driver
> afterwards and map the region. I get:
>
> cxl_port endpoint5: failed to attach decoder5 to region0: -6 (ENXIO)
> <driver> 0000:c1:00.0: CXL found committed decoder without a region
> <driver> 0000:c1:00.0: CXL init failed
>
> I'd be surprised if SFC in its current patch state would survive this
> same insmod/rmmod/insmod test.
Yes, I can load and unload the sfc driver and always find the HDM
decoder committed, which was the purpose behind v23.
I wonder if you applied the patch from Smita's series:
https://lore.kernel.org/linux-cxl/20260210064501.157591-1-Smita.KoralahalliChannabasappa@amd.com/T/#mdad81d3817def8baace77ead9e2e305e775cf51d
If you did, then I'm not sure what could be happening. Could you post
here what you see at /sys/bus/cxl/devices after loading your driver,
and after unloading it?
>
> -PJ
* Re: [PATCH v23 00/22] Type2 device basic support
2026-03-09 22:43 ` PJ Waskiewicz
@ 2026-03-10 14:02 ` Alejandro Lucero Palau
0 siblings, 0 replies; 67+ messages in thread
From: Alejandro Lucero Palau @ 2026-03-10 14:02 UTC (permalink / raw)
To: PJ Waskiewicz, alejandro.lucero-palau, linux-cxl, netdev,
dan.j.williams, edward.cree, davem, kuba, pabeni, edumazet,
dave.jiang
On 3/9/26 22:43, PJ Waskiewicz wrote:
> Hi Alejandrom
>
> On Sun, 2026-02-01 at 15:54 +0000, alejandro.lucero-palau@amd.com
> wrote:
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> This patchset should be applied on the cxl next branch using the base
>> specified at the end of this cover letter.
>>
>> Dependencies on Dan's work has gone and also on Terry's as the only
>> patch required is now in next. The other dependency is on Smita
>> patchset
>> but it does not exist such a dependency as that work will not avoid
>> the
>> problem with Type2 and DAX/hmem if soft reserved memory. This needs
>> to
>> be solved by the BIOS and Type2 UEFI driver for populating the
>> CXL.mem
>> range as EFI_RESERVED_TYPE instead of default EFI_CONVENTIONAL_MEMORY
>> with the EFI_MEMORY_SP attribute. There exists though a dependency on
>> one Smita's patches:
>>
>> [PATCH v5 3/7] cxl/region: Skip decoder reset on detach for
>> autodiscovered regions
>>
>> This is needed for the default behaviour with current BIOS
>> configuration
>> where the HDM Type2 decoders will be kept unreset when driver
>> unloads.
>> This is the main change introduced in v23: committed decoders will
>> not
>> be reset. Previous v22 functionality supported first driver load
>> finding
>> committed decoders but resetting them at unload and supporting
>> uncommitted decoders in next driver loads. This will be suported in
>> follow-up works.
>>
>> v23 changes:
>>
>> patch 11: fixing minor issues and droping change in
>> should_emulate_decoders (Jonathan Cameron)
>>
>> patch13: refactoring unregister_region for safety type in Type2 API
>>
>> sfc changes: slight modifications to error path
>>
> I've been able to mostly get these to work on a very boiled down
> driver. I still need to port these into my full driver stack, but
> moving that whole stack (multiple drivers) each time a new API is
> proposed was becoming a blocker. So I have a very basic driver that is
> testing the interface against our HW for now.
>
> I have a slight issue that I'll address in Patch 8, or at least ask.
>
> In the meantime, I'll start moving things over to our full stack to try
> and get the same level of replication.
>
> I'm hopeful I can start adding Tested-by:'s to the patches very soon.
Hi PJ,
I'm afraid it will not be useful ... as a new approach is being
followed now, with some of the patches sent independently. I have
already sent 5 in two patchsets, which I hope will be applied soon
after reviews and minor fixes. I'm now working on sending at least two
more patchsets. The idea is then to keep all the sfc changes for a
final patchset with only minor changes to the cxl core.
Anyway, thank you for testing this last full patchset and for the
feedback given.
>
> -PJ
* Re: [PATCH v23 08/22] cxl/hdm: Add support for getting region from committed decoder
2026-03-09 22:49 ` PJ Waskiewicz
2026-03-10 13:54 ` Alejandro Lucero Palau
@ 2026-03-13 2:03 ` Dan Williams
2026-03-13 13:10 ` Alejandro Lucero Palau
1 sibling, 1 reply; 67+ messages in thread
From: Dan Williams @ 2026-03-13 2:03 UTC (permalink / raw)
To: PJ Waskiewicz, Alejandro Lucero Palau, Cheatham, Benjamin,
alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
PJ Waskiewicz wrote:
[..]
> > Yes, I think you are right. This works in my tests and it is safe
> > because I check the region does exist before using it. But the error
> > inside sfc should then not be fatal for cxl sfc initialization and
> > fallback to the other cxl initialization possibility.
> >
>
> So I'm running into this situation I think.
>
> When you're testing, are you surviving a reload of the driver? Right
> now, I can load and successfully create the region0 device. However,
> following the same teardown path in SFC, I cannot reload my driver
> afterwards and map the region. I get:
>
> cxl_port endpoint5: failed to attach decoder5 to region0: -6 (ENXIO)
> <driver> 0000:c1:00.0: CXL found committed decoder without a region
> <driver> 0000:c1:00.0: CXL init failed
>
> I'd be surprised if SFC in its current patch state would survive this
> same insmod/rmmod/insmod test.
So over here [1] I reviewed Smita's patch to stop resetting decoders by
default if they were part of region auto-assembly. While that stops
resetting the decoders it does not allow the device to get a hint of
where it should place its HPAs if the decoders get reset while the
driver is detached.
I am going to draft some patches to allow an accelerator to mark an
address range as "designated" so that it can recall the memory it was
assigned by boot firmware.
This also dovetails with the conversation I had with Paul Blinzer at
Plumbers about an ability to designate Soft Reserve memory. So a generic
facility to designate memory allows accelerators to recall their address
range if the decoders ever lose their configuration. It also tells the
rest of the CXL subsystem "hands off, this range was accelerator
designated by platform firmware".
[1]: http://lore.kernel.org/69b1e0aacb9d0_2132100c5@dwillia2-mobl4.notmuch
* Re: [PATCH v23 08/22] cxl/hdm: Add support for getting region from committed decoder
2026-03-13 2:03 ` Dan Williams
@ 2026-03-13 13:10 ` Alejandro Lucero Palau
2026-03-16 14:33 ` Alejandro Lucero Palau
0 siblings, 1 reply; 67+ messages in thread
From: Alejandro Lucero Palau @ 2026-03-13 13:10 UTC (permalink / raw)
To: Dan Williams, PJ Waskiewicz, Cheatham, Benjamin,
alejandro.lucero-palau
Cc: linux-cxl, netdev, edward.cree, davem, kuba, pabeni, edumazet,
dave.jiang
On 3/13/26 02:03, Dan Williams wrote:
> PJ Waskiewicz wrote:
> [..]
>>> Yes, I think you are right. This works in my tests and it is safe
>>> because I check the region does exist before using it. But the error
>>> inside sfc should then not be fatal for cxl sfc initialization and
>>> fallback to the other cxl initialization possibility.
>>>
>> So I'm running into this situation I think.
>>
>> When you're testing, are you surviving a reload of the driver? Right
>> now, I can load and successfully create the region0 device. However,
>> following the same teardown path in SFC, I cannot reload my driver
>> afterwards and map the region. I get:
>>
>> cxl_port endpoint5: failed to attach decoder5 to region0: -6 (ENXIO)
>> <driver> 0000:c1:00.0: CXL found committed decoder without a region
>> <driver> 0000:c1:00.0: CXL init failed
>>
>> I'd be surprised if SFC in its current patch state would survive this
>> same insmod/rmmod/insmod test.
> So over here [1] I reviewed Smita's patch to stop resetting decoders by
> default if they were part of region auto-assembly. While that stops
> resetting the decoders it does not allow the device to get a hint of
> where it should place its HPAs if the decoders get reset while the
> driver is detached.
That is already what type2 support is about, and what was from the
beginning: to get an hpa from the root decoder. The HPA will be found
when the driver loads and the memdev is created and when the related
region is going to need such HPA, and based on what is free there.
Before v22 that was the only case contemplated, assuming the BIOS would
not configure the device decoders. v22 added support for getting the
region from autodiscovery if the decoders were committed, and v23 was
for not resetting those decoders if that was the case when the driver
unloads.
I'm pretty sure what Type2 pre-v22, v22 or v23 do in this regard is not
perfect (v23 was a quick hack for PJ to test the new functionality you
demanded); in fact I'm changing the way the hpa is allocated for Type2
because, after Gregory's concurrency tests and the pmem patchset, I
really think the approach needs to change. But as I said in Smita's
review, you are precluding the basic stuff with your never-ending
"improvements". You are not in a better position than me to have an
opinion on what Type2 drivers need, and your comment in this thread is
just a lack of respect towards me. Yes, it is a blunt assertion, and I
will repeat it as many times as necessary.
>
> I am going to draft some patches to allow an accelerator to mark an
> address range as "designated" so that it can recall the memory it was
> assigned by boot firmware.
If you do so, I will seriously start thinking about passing this work
to another engineer, not necessarily from AMD.
>
> This also dovetails with the conversation I had with Paul Blinzer at
> Plumbers about an ability to designate Soft Reserve memory. So a generic
> facility to designate memory allows accelerators to recall their address
> range if the decoders ever lose their configuration. It also tells the
> rest of the CXL subsystem "hands off, this range was accelerator
> designated by platform firmware".
>
> [1]: http://lore.kernel.org/69b1e0aacb9d0_2132100c5@dwillia2-mobl4.notmuch
* Re: [PATCH v23 08/22] cxl/hdm: Add support for getting region from committed decoder
2026-03-13 13:10 ` Alejandro Lucero Palau
@ 2026-03-16 14:33 ` Alejandro Lucero Palau
0 siblings, 0 replies; 67+ messages in thread
From: Alejandro Lucero Palau @ 2026-03-16 14:33 UTC (permalink / raw)
To: Dan Williams, PJ Waskiewicz, Cheatham, Benjamin,
alejandro.lucero-palau
Cc: linux-cxl, netdev, edward.cree, davem, kuba, pabeni, edumazet,
dave.jiang
On 3/13/26 13:10, Alejandro Lucero Palau wrote:
>
> On 3/13/26 02:03, Dan Williams wrote:
>> PJ Waskiewicz wrote:
>> [..]
>>>> Yes, I think you are right. This works in my tests and it is safe
>>>> because I check the region does exist before using it. But the error
>>>> inside sfc should then not be fatal for cxl sfc initialization and
>>>> fallback to the other cxl initialization possibility.
>>>>
>>> So I'm running into this situation I think.
>>>
>>> When you're testing, are you surviving a reload of the driver? Right
>>> now, I can load and successfully create the region0 device. However,
>>> following the same teardown path in SFC, I cannot reload my driver
>>> afterwards and map the region. I get:
>>>
>>> cxl_port endpoint5: failed to attach decoder5 to region0: -6 (ENXIO)
>>> <driver> 0000:c1:00.0: CXL found committed decoder without a region
>>> <driver> 0000:c1:00.0: CXL init failed
>>>
>>> I'd be surprised if SFC in its current patch state would survive this
>>> same insmod/rmmod/insmod test.
>> So over here [1] I reviewed Smita's patch to stop resetting decoders by
>> default if they were part of region auto-assembly. While that stops
>> resetting the decoders it does not allow the device to get a hint of
>> where it should place its HPAs if the decoders get reset while the
>> driver is detached.
>
>
> That is already what type2 support is about, and what it was from the
> beginning: getting an hpa from the root decoder. The HPA will be found
> when the driver loads and the memdev is created, when the related
> region needs such an HPA, based on what is free there.
> Before v22 that was the only case contemplated, assuming the BIOS
> would not configure the device decoders. v22 added support for getting
> the region from autodiscovery if the decoders were committed, and v23
> was for not resetting those decoders if that was the case when the
> driver unloads.
>
>
> I'm pretty sure what Type2 pre-v22, v22 or v23 do in this regard is
> not perfect (v23 was a quick hack for PJ to test the new
> functionality you demanded); in fact I'm changing the way the hpa is
> allocated for Type2 because, after Gregory's concurrency tests and
> the pmem patchset, I really think the approach needs to change. But
> as I said in Smita's review, you are precluding the basic stuff with
> your never-ending "improvements". You are not in a better position
> than me to have an opinion on what Type2 drivers need, and your
> comment in this thread is just a lack of respect towards me. Yes, it
> is a blunt assertion, and I will repeat it as many times as necessary.
>
>
After looking at the series proposing DVSEC save/restore for supporting
device resets, I think I misunderstood your comment here; if so, you
want to address such a reset and not the HDM reset triggered by
software ...
>>
>> I am going to draft some patches to allow an accelerator to mark an
>> address range as "designated" so that it can recall the memory it was
>> assigned by boot firmware.
>
>
> If you do so, I will seriously start thinking about passing this work
> to another engineer, not necessarily from AMD.
>
so this rant is missing the point, and I have to apologize. Having said
that, I neither understand why you are proposing something that the
reset series will avoid, or that would set the path to support the case
you have in mind, nor why you mention it in this thread. Is it because
supporting that reset is a requirement for type2 support? I have been
aware of having to deal with this, but not as a priority or as part of
the basic support. If that is what you want, why did you not say so
some time ago?
>
>>
>> This also dovetails with the conversation I had with Paul Blinzer at
>> Plumbers about an ability to designate Soft Reserve memory. So a generic
>> facility to designate memory allows accelerators to recall their address
>> range if the decoders ever lose their configuration. It also tells the
>> rest of the CXL subsystem "hands off, this range was accelerator
>> designated by platform firmware".
>>
>> [1]:
>> http://lore.kernel.org/69b1e0aacb9d0_2132100c5@dwillia2-mobl4.notmuch
>
>
* Re: [PATCH v23 04/22] cxl/sfc: Map cxl component regs
2026-02-01 15:54 ` [PATCH v23 04/22] cxl/sfc: Map cxl component regs alejandro.lucero-palau
@ 2026-03-20 17:22 ` Edward Cree
0 siblings, 0 replies; 67+ messages in thread
From: Edward Cree @ 2026-03-20 17:22 UTC (permalink / raw)
To: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Jonathan Cameron, Ben Cheatham
On 01/02/2026 15:54, alejandro.lucero-palau@amd.com wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Export cxl core functions for a Type2 driver being able to discover and
> map the device component registers.
>
> Use it in sfc driver cxl initialization.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
* Re: [PATCH v23 05/22] cxl/sfc: Initialize dpa without a mailbox
2026-02-01 15:54 ` [PATCH v23 05/22] cxl/sfc: Initialize dpa without a mailbox alejandro.lucero-palau
@ 2026-03-20 17:24 ` Edward Cree
0 siblings, 0 replies; 67+ messages in thread
From: Edward Cree @ 2026-03-20 17:24 UTC (permalink / raw)
To: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Ben Cheatham, Jonathan Cameron
On 01/02/2026 15:54, alejandro.lucero-palau@amd.com wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Type3 relies on mailbox CXL_MBOX_OP_IDENTIFY command for initializing
> memdev state params which end up being used for DPA initialization.
>
> Allow a Type2 driver to initialize DPA simply by giving the size of its
> volatile hardware partition.
>
> Move related functions to memdev.
>
> Add sfc driver as the client.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
* Re: [PATCH v23 11/22] sfc: obtain decoder and region if committed by firmware
2026-02-01 15:54 ` [PATCH v23 11/22] sfc: obtain decoder and region if committed by firmware alejandro.lucero-palau
2026-02-11 22:10 ` Cheatham, Benjamin
2026-02-19 23:31 ` Dave Jiang
@ 2026-03-20 17:25 ` Edward Cree
2 siblings, 0 replies; 67+ messages in thread
From: Edward Cree @ 2026-03-20 17:25 UTC (permalink / raw)
To: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero
On 01/02/2026 15:54, alejandro.lucero-palau@amd.com wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Check if device HDM is already committed during firmware/BIOS
> initialization.
>
> A CXL region should exist if so after memdev allocation/initialization.
> Get HPA from region and map it.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
end of thread, other threads:[~2026-03-20 17:25 UTC | newest]
Thread overview: 67+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-02-01 15:54 [PATCH v23 00/22] Type2 device basic support alejandro.lucero-palau
2026-02-01 15:54 ` [PATCH v23 01/22] cxl: Add type2 " alejandro.lucero-palau
2026-02-11 22:11 ` Cheatham, Benjamin
2026-02-19 8:52 ` Alejandro Lucero Palau
2026-02-01 15:54 ` [PATCH v23 02/22] sfc: add cxl support alejandro.lucero-palau
2026-02-01 15:54 ` [PATCH v23 03/22] cxl: Move pci generic code alejandro.lucero-palau
2026-02-01 15:54 ` [PATCH v23 04/22] cxl/sfc: Map cxl component regs alejandro.lucero-palau
2026-03-20 17:22 ` Edward Cree
2026-02-01 15:54 ` [PATCH v23 05/22] cxl/sfc: Initialize dpa without a mailbox alejandro.lucero-palau
2026-03-20 17:24 ` Edward Cree
2026-02-01 15:54 ` [PATCH v23 06/22] cxl: Prepare memdev creation for type2 alejandro.lucero-palau
2026-02-01 15:54 ` [PATCH v23 07/22] sfc: create type2 cxl memdev alejandro.lucero-palau
2026-02-01 15:54 ` [PATCH v23 08/22] cxl/hdm: Add support for getting region from committed decoder alejandro.lucero-palau
2026-02-11 22:11 ` Cheatham, Benjamin
2026-02-12 9:16 ` Alejandro Lucero Palau
2026-03-09 22:49 ` PJ Waskiewicz
2026-03-10 13:54 ` Alejandro Lucero Palau
2026-03-13 2:03 ` Dan Williams
2026-03-13 13:10 ` Alejandro Lucero Palau
2026-03-16 14:33 ` Alejandro Lucero Palau
2026-02-01 15:54 ` [PATCH v23 09/22] cxl: Add function for obtaining region range alejandro.lucero-palau
2026-02-01 15:54 ` [PATCH v23 10/22] cxl: Export function for unwinding cxl by accelerators alejandro.lucero-palau
2026-02-19 23:16 ` Dave Jiang
2026-02-21 4:48 ` Gregory Price
2026-02-01 15:54 ` [PATCH v23 11/22] sfc: obtain decoder and region if committed by firmware alejandro.lucero-palau
2026-02-11 22:10 ` Cheatham, Benjamin
2026-02-19 8:55 ` Alejandro Lucero Palau
2026-02-19 23:31 ` Dave Jiang
2026-02-20 8:08 ` Alejandro Lucero Palau
2026-03-20 17:25 ` Edward Cree
2026-02-01 15:54 ` [PATCH v23 12/22] cxl: Define a driver interface for HPA free space enumeration alejandro.lucero-palau
2026-02-11 22:10 ` Cheatham, Benjamin
2026-02-19 9:58 ` Alejandro Lucero Palau
2026-02-19 17:29 ` Cheatham, Benjamin
2026-02-20 15:42 ` Dave Jiang
2026-02-26 16:13 ` Alejandro Lucero Palau
2026-02-01 15:54 ` [PATCH v23 13/22] sfc: get root decoder alejandro.lucero-palau
2026-02-01 15:54 ` [PATCH v23 14/22] cxl: Define a driver interface for DPA allocation alejandro.lucero-palau
2026-02-11 22:10 ` Cheatham, Benjamin
2026-02-11 22:12 ` Cheatham, Benjamin
2026-02-19 10:26 ` Alejandro Lucero Palau
2026-02-13 16:14 ` [PATCH " Gregory Price
2026-02-16 12:34 ` Alejandro Lucero Palau
2026-02-01 15:54 ` [PATCH v23 15/22] sfc: get endpoint decoder alejandro.lucero-palau
2026-02-01 15:54 ` [PATCH v23 16/22] cxl: Make region type based on endpoint type alejandro.lucero-palau
2026-02-11 22:11 ` Cheatham, Benjamin
2026-02-01 15:54 ` [PATCH v23 17/22] cxl/region: Factor out interleave ways setup alejandro.lucero-palau
2026-02-11 22:11 ` Cheatham, Benjamin
2026-02-19 10:40 ` Alejandro Lucero Palau
2026-02-19 17:29 ` Cheatham, Benjamin
2026-02-01 15:54 ` [PATCH v23 18/22] cxl/region: Factor out interleave granularity setup alejandro.lucero-palau
2026-02-11 22:11 ` Cheatham, Benjamin
2026-02-01 15:54 ` [PATCH v23 19/22] cxl: Allow region creation by type2 drivers alejandro.lucero-palau
2026-02-11 22:11 ` Cheatham, Benjamin
2026-02-19 10:48 ` Alejandro Lucero Palau
2026-02-01 15:54 ` [PATCH v23 20/22] cxl: Avoid dax creation for accelerators alejandro.lucero-palau
2026-02-11 22:10 ` Cheatham, Benjamin
2026-02-19 10:50 ` Alejandro Lucero Palau
2026-02-01 15:54 ` [PATCH v23 21/22] sfc: create cxl region alejandro.lucero-palau
2026-02-13 16:14 ` [PATCH " Gregory Price
2026-02-20 8:00 ` Alejandro Lucero Palau
2026-02-01 15:54 ` [PATCH v23 22/22] sfc: support pio mapping based on cxl alejandro.lucero-palau
2026-02-13 16:14 ` [PATCH " Gregory Price
2026-02-20 8:04 ` Alejandro Lucero Palau
2026-02-11 22:12 ` [PATCH v23 00/22] Type2 device basic support Cheatham, Benjamin
2026-03-09 22:43 ` PJ Waskiewicz
2026-03-10 14:02 ` Alejandro Lucero Palau