* [PATCH v17 00/22] Type2 device basic support
@ 2025-06-24 14:13 alejandro.lucero-palau
From: alejandro.lucero-palau @ 2025-06-24 14:13 UTC
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero
From: Alejandro Lucero <alucerop@amd.com>
v17 changes: (Dan Williams review)
- use devm for cxl_dev_state allocation
- use the current cxl struct for checking the capability registers found by
the driver.
- simplify DPA initialization without a mailbox, not supporting pmem
- add cxl_acquire_endpoint for protection during initialization
- add a callback/action to cxl_create_region so that a driver is notified
about CXL core kernel module removal.
- add sfc function to disable CXL-based PIO buffers if such a callback
is invoked.
- Always manage a Type2-created region as private, not allowing DAX.
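A minimal user-space model of the v17 removal notification, assuming the core stores a driver-supplied action when the region is created and runs it before the region goes away. All names here (struct region, region_create(), region_teardown(), struct nic, nic_disable_cxl_pio()) are hypothetical; the sketch does not reproduce the real cxl_create_region() signature, and falling back to BAR-based PIO buffers is an assumption based on the RFC description of the sfc driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Driver-supplied action run when the core tears the region down. */
typedef void (*region_removal_fn)(void *data);

/* Hypothetical core-side region object. */
struct region {
	region_removal_fn on_removal;
	void *data;
	bool active;
};

/* Hypothetical core side: record the action at creation time. */
static void region_create(struct region *r, region_removal_fn fn, void *data)
{
	r->on_removal = fn;
	r->data = data;
	r->active = true;
}

/* Hypothetical core side: notify the driver before the region disappears. */
static void region_teardown(struct region *r)
{
	if (r->active && r->on_removal)
		r->on_removal(r->data);
	r->active = false;
}

/* Hypothetical driver state: where the PIO buffers currently live. */
struct nic {
	bool pio_in_cxl;
};

/* Driver action: stop using CXL-backed PIO buffers (assumed fallback: BAR). */
static void nic_disable_cxl_pio(void *data)
{
	struct nic *nic = data;

	nic->pio_in_cxl = false;
}
```

Passing an opaque data pointer with the action keeps the core unaware of the driver's private state, the same shape as devm_add_action().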
v16 changes:
- rebase against rc4 (Dave Jiang)
- remove duplicate line (Ben Cheatham)
v15 changes:
- remove reference to unused header file (Jonathan Cameron)
- add proper kernel docs to exported functions (Alison Schofield)
- using an array to map the enums to strings (Alison Schofield)
- clarify comment when using bitmap_subset (Jonathan Cameron)
- specify link to type2 support in all patches (Alison Schofield)
Patches changed (minor): 4, 11
v14 changes:
- static null initialization of bitmaps (Jonathan Cameron)
- Fixing cxl tests (Alison Schofield)
- Fixing robot compilation problems
Patches changed (minor): 1, 4, 6, 13
v13 changes:
- use more consistent names for header guard checks (Jonathan Cameron)
- using helper for caps bit setting (Jonathan Cameron)
- provide generic function for reporting missing capabilities (Jonathan Cameron)
- rename cxl_pci_setup_memdev_regs to cxl_pci_accel_setup_memdev_regs (Jonathan Cameron)
- cxl_dpa_info size to be set by the Type2 driver (Jonathan Cameron)
- avoiding rc variable when possible (Jonathan Cameron)
- fix spelling (Simon Horman)
- use scoped_guard (Dave Jiang)
- use enum instead of bool (Dave Jiang)
- dropping patch with hardware symbols
v12 changes:
- use new macro cxl_dev_state_create in pci driver (Ben Cheatham)
- add public/private sections in now exported cxl_dev_state struct (Ben
Cheatham)
- fix the header guard in cxl/pci.h to match the file name
- Clarify capabilities found vs expected in error message. (Ben
Cheatham)
- Clarify new CXL_DECODER_F flag (Ben Cheatham)
- Fix changes about cxl memdev creation support moving code to the
proper patch. (Ben Cheatham)
- Avoid debug and function duplications (Ben Cheatham)
- Fix robot compilation error reported by Simon Horman as well.
- Add doc about new param in cxl_create_region (Simon Horman).
v11 changes:
- Dropping the use of cxl_memdev_state and going back to using
cxl_dev_state.
- Using a helper for an accel driver to allocate its own cxl-related
struct embedding cxl_dev_state.
- Exporting the required structs in include/cxl/cxl.h for an accel
driver being able to know the cxl_dev_state size required in the
previously mentioned helper for allocation.
- Avoid using any struct for dpa initialization by the accel driver
adding a specific function for creating dpa partitions by accel
drivers without a mailbox.
v10 changes:
- Using cxl_memdev_state instead of cxl_dev_state for type2, which has
memory after all and facilitates the setup.
- Adapt core for using cxl_memdev_state allowing accel drivers to work
with them without further awareness of internal cxl structs.
- Using the latest DPA changes for creating DPA partitions, with the accel
driver hardcoding mds values when there is no mailbox.
- capabilities are not a new field but built up when the current register
mapping is performed, and returned to the caller for checking.
- HPA free space supporting interleaving.
- DPA free space dropping max-min for a simple alloc size.
v9 changes:
- adding forward definitions (Jonathan Cameron)
- using set_bit instead of bitmap_set (Jonathan Cameron)
- fix rebase problem (Jonathan Cameron)
- Improve error path (Jonathan Cameron)
- fix build problems with cxl region dependency (robot)
- fix error path (Simon Horman)
v8 changes:
- Change error path labeling inside sfc cxl code (Edward Cree)
- Properly handling checks and error in sfc cxl code (Simon Horman)
- Fix bug when checking resource_size (Simon Horman)
- Avoid bisect problems reordering patches (Edward Cree)
- Fix buffer allocation size in sfc (Simon Horman)
v7 changes:
- fixing kernel test robot complaints
- fix type with Type3 mandatory capabilities (Zhi Wang)
- optimize code in cxl_request_resource (Kalesh Anakkur Purayil)
- add sanity check when dealing with resource arithmetic (Fan Ni)
- fix typos and blank lines (Fan Ni)
- keep previous log errors/warnings in sfc driver (Martin Habets)
- add WARN_ON_ONCE if region given is NULL
v6 changes:
- update sfc mcdi_pcol.h with the full set of hardware changes, most of them
unrelated to this patchset. This file is generated automatically from
hardware design changes and is not touched by software. It is updated from
time to time, and an update was required for the sfc driver CXL support.
- remove CXL capabilities definitions not used by the patchset or
previous kernel code. (Dave Jiang, Jonathan Cameron)
- Use bitmap_subset instead of reinventing the wheel ... (Ben Cheatham)
- Use cxl_accel_memdev for new device_type created (Ben Cheatham)
- Fix construct_region use of rwsem (Zhi Wang)
- Obtain region range instead of region params (Alison Schofield, Dave
Jiang)
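The capability check referred to across several revisions (a u32 field in v3, later a bitmap compared with bitmap_subset()) asks whether every capability the driver expects was found during register discovery. The user-space sketch below models it with a single unsigned long mask; the enum values and helper names (cap_id, caps_subset(), caps_missing()) are hypothetical and are not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical capability identifiers, for illustration only. */
enum cap_id {
	CAP_HDM,
	CAP_RAS,
	CAP_STATUS,
	CAP_MBOX,
	CAP_MEMDEV,
	CAP_MAX
};

/* All capabilities must fit in one unsigned long for this sketch. */
static_assert(CAP_MAX <= 8 * sizeof(unsigned long), "too many capabilities");

#define CAP_BIT(id) (1UL << (id))

/* True if every bit set in @expected is also set in @found; this is the
 * same test bitmap_subset(expected, found, nbits) performs. */
static bool caps_subset(unsigned long expected, unsigned long found)
{
	return (expected & ~found) == 0;
}

/* Expected-but-missing capabilities, so an error message can name them. */
static unsigned long caps_missing(unsigned long expected, unsigned long found)
{
	return expected & ~found;
}
```

Because CXL Device Registers are not mandatory for Type2, a Type2 driver expecting only HDM and RAS passes the check on a device without a mailbox, while the full Type3 set would report the device registers as missing.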
v5 changes:
- Fix SFC configuration based on kernel CXL configuration
- Add subset check for capabilities.
- fix region creation when HDM decoders programmed by firmware/BIOS (Ben
Cheatham)
- Add option for creating dax region based on driver decision (Ben
Cheatham)
- Using sfc probe_data struct for keeping sfc cxl data
v4 changes:
- Use bitmap for capabilities new field (Jonathan Cameron)
- Use cxl_mem attributes for sysfs based on device type (Dave Jiang)
- Add conditional cxl sfc compilation relying on kernel CXL config (kernel test robot)
- Add sfc changes in different patches for facilitating backport (Jonathan Cameron)
- Remove patch for dealing with cxl modules dependencies and using sfc kconfig plus
MODULE_SOFTDEP instead.
v3 changes:
- cxl_dev_state not defined as opaque but only manipulated by accel drivers
through accessors.
- accessor names not identified as only for accel drivers.
- move pci code from pci driver (drivers/cxl/pci.c) to generic pci code
(drivers/cxl/core/pci.c).
- capabilities field from u8 to u32 and initialised by CXL regs discovering
code.
- add a capabilities check, removing the current check in the CXL regs
discovery code.
- Not fail if CXL Device Registers not found. Not mandatory for Type2.
- add timeout in acquire_endpoint for solving a race with the endpoint port
creation.
- handle EPROBE_DEFER by sfc driver.
- Limiting interleave ways to 1 for accel driver HPA/DPA requests.
- factoring out interleave ways and granularity helpers from type2 region
creation patch.
- restricting region_creation for type2 to one endpoint decoder.
- add accessor for release_resource.
- handle errors and error messages properly.
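The interleave helpers factored out of the region creation patch validate values that HDM decoders store in encoded form: interleave ways 1, 2, 4, 8 and 16 encode as 0 to 4, and 3, 6 and 12 as 8 to 10, while granularity is 256 bytes shifted left by its encoding. The user-space sketch below mirrors the logic of ways_to_eiw() from drivers/cxl/cxl.h; granularity_to_eig_sketch() and its 256 B to 16 KiB range are stated as assumptions rather than copied from the patches. With accel requests limited to 1 interleave way, the only ways encoding such a driver produces is 0.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

static bool is_pow2(unsigned int v)
{
	return v && !(v & (v - 1));
}

/* Integer log2 of a non-zero power of two. */
static unsigned int log2_u(unsigned int v)
{
	unsigned int r = 0;

	while (v >>= 1)
		r++;
	return r;
}

/* Encode interleave ways: 1,2,4,8,16 -> 0..4; 3,6,12 -> 8..10. */
static int ways_to_eiw_sketch(unsigned int ways, unsigned char *eiw)
{
	if (ways == 0 || ways > 16)
		return -EINVAL;
	if (is_pow2(ways)) {
		*eiw = log2_u(ways);
		return 0;
	}
	if (ways % 3)
		return -EINVAL;
	ways /= 3;
	if (!is_pow2(ways))
		return -EINVAL;
	*eiw = log2_u(ways) + 8;
	return 0;
}

/* Encode granularity (assumed range 256 B..16 KiB, power of two) -> 0..6. */
static int granularity_to_eig_sketch(unsigned int gran, unsigned short *eig)
{
	if (gran < 256 || gran > 16384 || !is_pow2(gran))
		return -EINVAL;
	*eig = log2_u(gran) - 8;
	return 0;
}
```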
v2 changes:
I have removed the introduction about the concerns with BIOS/UEFI after the
discussion that confirmed the need for the functionality implemented, at
least in some scenarios.
There are two main changes from the RFC:
1) Following concerns about drivers using CXL core without restrictions, the CXL
struct to work with is opaque to those drivers, therefore functions are
implemented for modifying or reading those structs indirectly.
2) The driver for using the added functionality is not a test driver but a real
one: the SFC ethernet network driver. It uses the CXL region mapped for PIO
buffers instead of regions inside PCIe BARs.
RFC:
Current CXL kernel code is focused on supporting Type3 CXL devices, aka memory
expanders. Type2 CXL devices, aka device accelerators, share some functionality
but require some special handling.
First of all, Type2 devices are by definition tied to drivers doing something
more than memory expansion, so those drivers are expected to deal with the CXL
specifics. This implies the CXL setup needs to be done by such a driver instead
of by a generic CXL PCI driver, as it is for memory expanders. Most of this
setup needs to use current CXL core code and therefore needs to be accessible
to those vendor drivers. This is accomplished by exporting opaque CXL structs
and by adding and exporting functions for working with those structs indirectly.
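The RFC approach described above (structs opaque to vendor drivers, reached only through exported functions) is the classic opaque-handle pattern. The single-file sketch below marks which part would live in a public header and which in the core; every name in it (core_dev, core_dev_alloc(), core_dev_set_serial(), and so on) is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* ---- public header: drivers only see a forward declaration ---- */
struct core_dev;
struct core_dev *core_dev_alloc(void);
void core_dev_free(struct core_dev *cd);
int core_dev_set_serial(struct core_dev *cd, uint64_t serial);
uint64_t core_dev_get_serial(const struct core_dev *cd);

/* ---- core implementation: the only place the layout is known ---- */
struct core_dev {
	uint64_t serial;
	int media_ready;
};

struct core_dev *core_dev_alloc(void)
{
	/* The core allocates, since drivers cannot know sizeof(struct core_dev). */
	return calloc(1, sizeof(struct core_dev));
}

void core_dev_free(struct core_dev *cd)
{
	free(cd);
}

int core_dev_set_serial(struct core_dev *cd, uint64_t serial)
{
	if (!cd)
		return -1;
	cd->serial = serial;
	return 0;
}

uint64_t core_dev_get_serial(const struct core_dev *cd)
{
	return cd ? cd->serial : 0;
}
```

The trade-off is visible in the allocator: a driver cannot embed an opaque struct in its own private struct, which is why later revisions (v11 onwards) export the struct definitions in include/cxl/cxl.h instead.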
Some of the patches are based on a patchset sent by Dan Williams [1] which was
only partially merged: mostly the parts making things ready for Type2, but none
related to specific Type2 support. The patches based on Dan's work carry Dan's
sign-off as co-developer and a link to the original patch.
A final note about CXL.cache is needed. This patchset does not cover it at all,
although the emulated Type2 device advertises it. From the kernel point of view,
supporting CXL.cache will require making sure the CXL path supports what the
Type2 device needs. A device accelerator will likely be connected to a Root
Switch, but other configurations cannot be ruled out. Therefore the kernel will
need to check not just HPA, DPA, interleave and granularity, but also the
available CXL.cache support and resources in each switch in the CXL path to the
Type2 device. I expect to contribute to this support in the following months,
and it would be good to discuss it when possible.
[1] https://lore.kernel.org/linux-cxl/98b1f61a-e6c2-71d4-c368-50d958501b0c@intel.com/T/
Alejandro Lucero (22):
cxl: Add type2 device basic support
sfc: add cxl support
cxl: Move pci generic code
cxl: allow Type2 drivers to map cxl component regs
sfc: setup cxl component regs and set media ready
cxl: Support dpa initialization without a mailbox
sfc: initialize dpa
cxl: Prepare memdev creation for type2
sfc: create type2 cxl memdev
cx/memdev: Indicate probe deferral
cxl: Define a driver interface for HPA free space enumeration
sfc: get endpoint decoder
cxl: Define a driver interface for DPA allocation
sfc: get endpoint decoder
cxl: Make region type based on endpoint type
cxl/region: Factor out interleave ways setup
cxl/region: Factor out interleave granularity setup
cxl: Allow region creation by type2 drivers
cxl: Avoid dax creation for accelerators
sfc: create cxl region
cxl: Add function for obtaining region range
sfc: support pio mapping based on cxl
drivers/cxl/core/core.h | 2 +
drivers/cxl/core/hdm.c | 93 ++++++
drivers/cxl/core/mbox.c | 29 +-
drivers/cxl/core/memdev.c | 89 ++++-
drivers/cxl/core/pci.c | 63 ++++
drivers/cxl/core/port.c | 3 +-
drivers/cxl/core/region.c | 446 +++++++++++++++++++++++---
drivers/cxl/core/regs.c | 2 +-
drivers/cxl/cxl.h | 111 +------
drivers/cxl/cxlmem.h | 87 +----
drivers/cxl/cxlpci.h | 31 --
drivers/cxl/mem.c | 32 +-
drivers/cxl/pci.c | 87 +----
drivers/cxl/port.c | 5 +-
drivers/net/ethernet/sfc/Kconfig | 10 +
drivers/net/ethernet/sfc/Makefile | 1 +
drivers/net/ethernet/sfc/ef10.c | 62 +++-
drivers/net/ethernet/sfc/efx.c | 15 +-
drivers/net/ethernet/sfc/efx.h | 1 +
drivers/net/ethernet/sfc/efx_cxl.c | 181 +++++++++++
drivers/net/ethernet/sfc/efx_cxl.h | 40 +++
drivers/net/ethernet/sfc/net_driver.h | 12 +
drivers/net/ethernet/sfc/nic.h | 3 +
include/cxl/cxl.h | 262 +++++++++++++++
include/cxl/pci.h | 51 +++
tools/testing/cxl/Kbuild | 1 -
tools/testing/cxl/test/mem.c | 3 +-
tools/testing/cxl/test/mock.c | 17 -
28 files changed, 1347 insertions(+), 392 deletions(-)
create mode 100644 drivers/net/ethernet/sfc/efx_cxl.c
create mode 100644 drivers/net/ethernet/sfc/efx_cxl.h
create mode 100644 include/cxl/cxl.h
create mode 100644 include/cxl/pci.h
base-commit: 0ff41df1cb268fc69e703a08a57ee14ae967d0ca
--
2.34.1
* [PATCH v17 01/22] cxl: Add type2 device basic support
@ 2025-06-24 14:13 alejandro.lucero-palau
From: alejandro.lucero-palau @ 2025-06-24 14:13 UTC
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Jonathan Cameron, Alison Schofield
From: Alejandro Lucero <alucerop@amd.com>
Differentiate CXL memory expanders (type 3) from CXL device accelerators
(type 2) with a new function for initializing cxl_dev_state and a macro
that helps accel drivers embed cxl_dev_state inside a private struct.
Move structs to include/cxl, because an accel driver embedding
cxl_dev_state in its private struct needs to know the size of
cxl_dev_state.
Use the same new initialization in the Type3 PCI driver.
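The embedding rule enforced by devm_cxl_dev_state_create() (the cxl_dev_state member must be the first member of the driver struct, so a single allocation can be viewed either way) can be illustrated outside the kernel. This sketch uses calloc() in place of devm_kzalloc(), _Static_assert with __builtin_types_compatible_p() in place of the kernel's static_assert()/__same_type() pair, and a GNU statement expression as the kernel macro does; struct core_state, struct accel_state and CORE_STATE_CREATE() are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for struct cxl_dev_state: core-owned, size visible to drivers. */
struct core_state {
	uint64_t serial;
	uint16_t dvsec;
	int type;
};

/* Core helper: allocate @size bytes (the whole driver struct), zeroed,
 * then initialize only the leading core_state part. */
static struct core_state *core_state_create(size_t size, int type,
					    uint64_t serial, uint16_t dvsec)
{
	struct core_state *cs = calloc(1, size);

	if (!cs)
		return NULL;
	*cs = (struct core_state){ .serial = serial, .dvsec = dvsec, .type = type };
	return cs;
}

/* Refuses to compile unless @member is a struct core_state at offset 0. */
#define CORE_STATE_CREATE(drv_struct, member, type, serial, dvsec)	\
	({								\
		_Static_assert(__builtin_types_compatible_p(		\
			struct core_state,				\
			__typeof__(((drv_struct *)NULL)->member)),	\
			#member " must be a struct core_state");	\
		_Static_assert(offsetof(drv_struct, member) == 0,	\
			#member " must be the first member");		\
		(drv_struct *)core_state_create(sizeof(drv_struct),	\
						(type), (serial), (dvsec)); \
	})

/* Hypothetical accelerator driver private struct. */
struct accel_state {
	struct core_state cs;	/* must be first */
	void *pio_base;		/* driver-private fields follow */
};
```

Because the whole allocation is zeroed first, driver-private fields such as pio_base start out cleared, and the compound-literal assignment only resets the core part, as cxl_dev_state_init() does in this patch.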
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
---
drivers/cxl/core/mbox.c | 12 +-
drivers/cxl/core/memdev.c | 32 +++++
drivers/cxl/core/pci.c | 1 +
drivers/cxl/core/regs.c | 1 +
drivers/cxl/cxl.h | 97 +--------------
drivers/cxl/cxlmem.h | 85 +------------
drivers/cxl/cxlpci.h | 21 ----
drivers/cxl/pci.c | 17 +--
include/cxl/cxl.h | 226 +++++++++++++++++++++++++++++++++++
include/cxl/pci.h | 23 ++++
tools/testing/cxl/test/mem.c | 3 +-
11 files changed, 303 insertions(+), 215 deletions(-)
create mode 100644 include/cxl/cxl.h
create mode 100644 include/cxl/pci.h
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index d72764056ce6..d78f6039f997 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -1484,23 +1484,21 @@ int cxl_mailbox_init(struct cxl_mailbox *cxl_mbox, struct device *host)
}
EXPORT_SYMBOL_NS_GPL(cxl_mailbox_init, "CXL");
-struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev)
+struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev, u64 serial,
+ u16 dvsec)
{
struct cxl_memdev_state *mds;
int rc;
- mds = devm_kzalloc(dev, sizeof(*mds), GFP_KERNEL);
+ mds = devm_cxl_dev_state_create(dev, CXL_DEVTYPE_CLASSMEM, serial,
+ dvsec, struct cxl_memdev_state, cxlds,
+ true);
if (!mds) {
dev_err(dev, "No memory available\n");
return ERR_PTR(-ENOMEM);
}
mutex_init(&mds->event.log_lock);
- mds->cxlds.dev = dev;
- mds->cxlds.reg_map.host = dev;
- mds->cxlds.cxl_mbox.host = dev;
- mds->cxlds.reg_map.resource = CXL_RESOURCE_NONE;
- mds->cxlds.type = CXL_DEVTYPE_CLASSMEM;
rc = devm_cxl_register_mce_notifier(dev, &mds->mce_notifier);
if (rc == -EOPNOTSUPP)
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index a16a5886d40a..c73582d24dd7 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -633,6 +633,38 @@ static void detach_memdev(struct work_struct *work)
static struct lock_class_key cxl_memdev_key;
+static void cxl_dev_state_init(struct cxl_dev_state *cxlds, struct device *dev,
+ enum cxl_devtype type, u64 serial, u16 dvsec,
+ bool has_mbox)
+{
+ *cxlds = (struct cxl_dev_state) {
+ .dev = dev,
+ .type = type,
+ .serial = serial,
+ .cxl_dvsec = dvsec,
+ .reg_map.host = dev,
+ .reg_map.resource = CXL_RESOURCE_NONE,
+ };
+
+ if (has_mbox)
+ cxlds->cxl_mbox.host = dev;
+}
+
+struct cxl_dev_state *_devm_cxl_dev_state_create(struct device *dev,
+ enum cxl_devtype type,
+ u64 serial, u16 dvsec,
+ size_t size, bool has_mbox)
+{
+ struct cxl_dev_state *cxlds = devm_kzalloc(dev, size, GFP_KERNEL);
+
+ if (!cxlds)
+ return NULL;
+
+ cxl_dev_state_init(cxlds, dev, type, serial, dvsec, has_mbox);
+ return cxlds;
+}
+EXPORT_SYMBOL_NS_GPL(_devm_cxl_dev_state_create, "CXL");
+
static struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds,
const struct file_operations *fops)
{
diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index 3b80e9a76ba8..0eb339c91413 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -7,6 +7,7 @@
#include <linux/pci.h>
#include <linux/pci-doe.h>
#include <linux/aer.h>
+#include <cxl/pci.h>
#include <cxlpci.h>
#include <cxlmem.h>
#include <cxl.h>
diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c
index 5ca7b0eed568..ecdb22ae6952 100644
--- a/drivers/cxl/core/regs.c
+++ b/drivers/cxl/core/regs.c
@@ -4,6 +4,7 @@
#include <linux/device.h>
#include <linux/slab.h>
#include <linux/pci.h>
+#include <cxl/pci.h>
#include <cxlmem.h>
#include <cxlpci.h>
#include <pmu.h>
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index a9ab46eb0610..844dc0782a5f 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -11,6 +11,7 @@
#include <linux/log2.h>
#include <linux/node.h>
#include <linux/io.h>
+#include <cxl/cxl.h>
extern const struct nvdimm_security_ops *cxl_security_ops;
@@ -200,97 +201,6 @@ static inline int ways_to_eiw(unsigned int ways, u8 *eiw)
#define CXLDEV_MBOX_BG_CMD_COMMAND_VENDOR_MASK GENMASK_ULL(63, 48)
#define CXLDEV_MBOX_PAYLOAD_OFFSET 0x20
-/*
- * Using struct_group() allows for per register-block-type helper routines,
- * without requiring block-type agnostic code to include the prefix.
- */
-struct cxl_regs {
- /*
- * Common set of CXL Component register block base pointers
- * @hdm_decoder: CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure
- * @ras: CXL 2.0 8.2.5.9 CXL RAS Capability Structure
- */
- struct_group_tagged(cxl_component_regs, component,
- void __iomem *hdm_decoder;
- void __iomem *ras;
- );
- /*
- * Common set of CXL Device register block base pointers
- * @status: CXL 2.0 8.2.8.3 Device Status Registers
- * @mbox: CXL 2.0 8.2.8.4 Mailbox Registers
- * @memdev: CXL 2.0 8.2.8.5 Memory Device Registers
- */
- struct_group_tagged(cxl_device_regs, device_regs,
- void __iomem *status, *mbox, *memdev;
- );
-
- struct_group_tagged(cxl_pmu_regs, pmu_regs,
- void __iomem *pmu;
- );
-
- /*
- * RCH downstream port specific RAS register
- * @aer: CXL 3.0 8.2.1.1 RCH Downstream Port RCRB
- */
- struct_group_tagged(cxl_rch_regs, rch_regs,
- void __iomem *dport_aer;
- );
-
- /*
- * RCD upstream port specific PCIe cap register
- * @pcie_cap: CXL 3.0 8.2.1.2 RCD Upstream Port RCRB
- */
- struct_group_tagged(cxl_rcd_regs, rcd_regs,
- void __iomem *rcd_pcie_cap;
- );
-};
-
-struct cxl_reg_map {
- bool valid;
- int id;
- unsigned long offset;
- unsigned long size;
-};
-
-struct cxl_component_reg_map {
- struct cxl_reg_map hdm_decoder;
- struct cxl_reg_map ras;
-};
-
-struct cxl_device_reg_map {
- struct cxl_reg_map status;
- struct cxl_reg_map mbox;
- struct cxl_reg_map memdev;
-};
-
-struct cxl_pmu_reg_map {
- struct cxl_reg_map pmu;
-};
-
-/**
- * struct cxl_register_map - DVSEC harvested register block mapping parameters
- * @host: device for devm operations and logging
- * @base: virtual base of the register-block-BAR + @block_offset
- * @resource: physical resource base of the register block
- * @max_size: maximum mapping size to perform register search
- * @reg_type: see enum cxl_regloc_type
- * @component_map: cxl_reg_map for component registers
- * @device_map: cxl_reg_maps for device registers
- * @pmu_map: cxl_reg_maps for CXL Performance Monitoring Units
- */
-struct cxl_register_map {
- struct device *host;
- void __iomem *base;
- resource_size_t resource;
- resource_size_t max_size;
- u8 reg_type;
- union {
- struct cxl_component_reg_map component_map;
- struct cxl_device_reg_map device_map;
- struct cxl_pmu_reg_map pmu_map;
- };
-};
-
void cxl_probe_component_regs(struct device *dev, void __iomem *base,
struct cxl_component_reg_map *map);
void cxl_probe_device_regs(struct device *dev, void __iomem *base,
@@ -482,11 +392,6 @@ struct cxl_region_params {
resource_size_t cache_size;
};
-enum cxl_partition_mode {
- CXL_PARTMODE_RAM,
- CXL_PARTMODE_PMEM,
-};
-
/*
* Indicate whether this region has been assembled by autodetection or
* userspace assembly. Prevent endpoint decoders outside of automatic
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 3ec6b906371b..9cc4337cacfb 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -7,6 +7,7 @@
#include <linux/cdev.h>
#include <linux/uuid.h>
#include <linux/node.h>
+#include <cxl/cxl.h>
#include <cxl/event.h>
#include <cxl/mailbox.h>
#include "cxl.h"
@@ -357,87 +358,6 @@ struct cxl_security_state {
struct kernfs_node *sanitize_node;
};
-/*
- * enum cxl_devtype - delineate type-2 from a generic type-3 device
- * @CXL_DEVTYPE_DEVMEM - Vendor specific CXL Type-2 device implementing HDM-D or
- * HDM-DB, no requirement that this device implements a
- * mailbox, or other memory-device-standard manageability
- * flows.
- * @CXL_DEVTYPE_CLASSMEM - Common class definition of a CXL Type-3 device with
- * HDM-H and class-mandatory memory device registers
- */
-enum cxl_devtype {
- CXL_DEVTYPE_DEVMEM,
- CXL_DEVTYPE_CLASSMEM,
-};
-
-/**
- * struct cxl_dpa_perf - DPA performance property entry
- * @dpa_range: range for DPA address
- * @coord: QoS performance data (i.e. latency, bandwidth)
- * @cdat_coord: raw QoS performance data from CDAT
- * @qos_class: QoS Class cookies
- */
-struct cxl_dpa_perf {
- struct range dpa_range;
- struct access_coordinate coord[ACCESS_COORDINATE_MAX];
- struct access_coordinate cdat_coord[ACCESS_COORDINATE_MAX];
- int qos_class;
-};
-
-/**
- * struct cxl_dpa_partition - DPA partition descriptor
- * @res: shortcut to the partition in the DPA resource tree (cxlds->dpa_res)
- * @perf: performance attributes of the partition from CDAT
- * @mode: operation mode for the DPA capacity, e.g. ram, pmem, dynamic...
- */
-struct cxl_dpa_partition {
- struct resource res;
- struct cxl_dpa_perf perf;
- enum cxl_partition_mode mode;
-};
-
-/**
- * struct cxl_dev_state - The driver device state
- *
- * cxl_dev_state represents the CXL driver/device state. It provides an
- * interface to mailbox commands as well as some cached data about the device.
- * Currently only memory devices are represented.
- *
- * @dev: The device associated with this CXL state
- * @cxlmd: The device representing the CXL.mem capabilities of @dev
- * @reg_map: component and ras register mapping parameters
- * @regs: Parsed register blocks
- * @cxl_dvsec: Offset to the PCIe device DVSEC
- * @rcd: operating in RCD mode (CXL 3.0 9.11.8 CXL Devices Attached to an RCH)
- * @media_ready: Indicate whether the device media is usable
- * @dpa_res: Overall DPA resource tree for the device
- * @part: DPA partition array
- * @nr_partitions: Number of DPA partitions
- * @serial: PCIe Device Serial Number
- * @type: Generic Memory Class device or Vendor Specific Memory device
- * @cxl_mbox: CXL mailbox context
- * @cxlfs: CXL features context
- */
-struct cxl_dev_state {
- struct device *dev;
- struct cxl_memdev *cxlmd;
- struct cxl_register_map reg_map;
- struct cxl_regs regs;
- int cxl_dvsec;
- bool rcd;
- bool media_ready;
- struct resource dpa_res;
- struct cxl_dpa_partition part[CXL_NR_PARTITIONS_MAX];
- unsigned int nr_partitions;
- u64 serial;
- enum cxl_devtype type;
- struct cxl_mailbox cxl_mbox;
-#ifdef CONFIG_CXL_FEATURES
- struct cxl_features_state *cxlfs;
-#endif
-};
-
static inline resource_size_t cxl_pmem_size(struct cxl_dev_state *cxlds)
{
/*
@@ -833,7 +753,8 @@ int cxl_dev_state_identify(struct cxl_memdev_state *mds);
int cxl_await_media_ready(struct cxl_dev_state *cxlds);
int cxl_enumerate_cmds(struct cxl_memdev_state *mds);
int cxl_mem_dpa_fetch(struct cxl_memdev_state *mds, struct cxl_dpa_info *info);
-struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev);
+struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev, u64 serial,
+ u16 dvsec);
void set_exclusive_cxl_commands(struct cxl_memdev_state *mds,
unsigned long *cmds);
void clear_exclusive_cxl_commands(struct cxl_memdev_state *mds,
diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
index 54e219b0049e..570e53e26f11 100644
--- a/drivers/cxl/cxlpci.h
+++ b/drivers/cxl/cxlpci.h
@@ -7,29 +7,8 @@
#define CXL_MEMORY_PROGIF 0x10
-/*
- * See section 8.1 Configuration Space Registers in the CXL 2.0
- * Specification. Names are taken straight from the specification with "CXL" and
- * "DVSEC" redundancies removed. When obvious, abbreviations may be used.
- */
#define PCI_DVSEC_HEADER1_LENGTH_MASK GENMASK(31, 20)
-/* CXL 2.0 8.1.3: PCIe DVSEC for CXL Device */
-#define CXL_DVSEC_PCIE_DEVICE 0
-#define CXL_DVSEC_CAP_OFFSET 0xA
-#define CXL_DVSEC_MEM_CAPABLE BIT(2)
-#define CXL_DVSEC_HDM_COUNT_MASK GENMASK(5, 4)
-#define CXL_DVSEC_CTRL_OFFSET 0xC
-#define CXL_DVSEC_MEM_ENABLE BIT(2)
-#define CXL_DVSEC_RANGE_SIZE_HIGH(i) (0x18 + (i * 0x10))
-#define CXL_DVSEC_RANGE_SIZE_LOW(i) (0x1C + (i * 0x10))
-#define CXL_DVSEC_MEM_INFO_VALID BIT(0)
-#define CXL_DVSEC_MEM_ACTIVE BIT(1)
-#define CXL_DVSEC_MEM_SIZE_LOW_MASK GENMASK(31, 28)
-#define CXL_DVSEC_RANGE_BASE_HIGH(i) (0x20 + (i * 0x10))
-#define CXL_DVSEC_RANGE_BASE_LOW(i) (0x24 + (i * 0x10))
-#define CXL_DVSEC_MEM_BASE_LOW_MASK GENMASK(31, 28)
-
#define CXL_DVSEC_RANGE_MAX 2
/* CXL 2.0 8.1.4: Non-CXL Function Map DVSEC */
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 785aa2af5eaa..0d3c67867965 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -11,6 +11,8 @@
#include <linux/pci.h>
#include <linux/aer.h>
#include <linux/io.h>
+#include <cxl/cxl.h>
+#include <cxl/pci.h>
#include <cxl/mailbox.h>
#include "cxlmem.h"
#include "cxlpci.h"
@@ -911,6 +913,7 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
int rc, pmu_count;
unsigned int i;
bool irq_avail;
+ u16 dvsec;
/*
* Double check the anonymous union trickery in struct cxl_regs
@@ -924,19 +927,19 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
return rc;
pci_set_master(pdev);
- mds = cxl_memdev_state_create(&pdev->dev);
+ dvsec = pci_find_dvsec_capability(pdev, PCI_VENDOR_ID_CXL,
+ CXL_DVSEC_PCIE_DEVICE);
+ if (!dvsec)
+ dev_warn(&pdev->dev,
+ "Device DVSEC not present, skip CXL.mem init\n");
+
+ mds = cxl_memdev_state_create(&pdev->dev, pci_get_dsn(pdev), dvsec);
if (IS_ERR(mds))
return PTR_ERR(mds);
cxlds = &mds->cxlds;
pci_set_drvdata(pdev, cxlds);
cxlds->rcd = is_cxl_restricted(pdev);
- cxlds->serial = pci_get_dsn(pdev);
- cxlds->cxl_dvsec = pci_find_dvsec_capability(
- pdev, PCI_VENDOR_ID_CXL, CXL_DVSEC_PCIE_DEVICE);
- if (!cxlds->cxl_dvsec)
- dev_warn(&pdev->dev,
- "Device DVSEC not present, skip CXL.mem init\n");
rc = cxl_pci_setup_regs(pdev, CXL_REGLOC_RBI_MEMDEV, &map);
if (rc)
diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
new file mode 100644
index 000000000000..9c1a82c8af3d
--- /dev/null
+++ b/include/cxl/cxl.h
@@ -0,0 +1,226 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright(c) 2020 Intel Corporation. */
+/* Copyright(c) 2025 Advanced Micro Devices, Inc. */
+
+#ifndef __CXL_CXL_H__
+#define __CXL_CXL_H__
+
+#include <linux/node.h>
+#include <linux/ioport.h>
+#include <cxl/mailbox.h>
+
+/**
+ * enum cxl_devtype - delineate type-2 from a generic type-3 device
+ * @CXL_DEVTYPE_DEVMEM - Vendor specific CXL Type-2 device implementing HDM-D or
+ * HDM-DB, no requirement that this device implements a
+ * mailbox, or other memory-device-standard manageability
+ * flows.
+ * @CXL_DEVTYPE_CLASSMEM - Common class definition of a CXL Type-3 device with
+ * HDM-H and class-mandatory memory device registers
+ */
+enum cxl_devtype {
+ CXL_DEVTYPE_DEVMEM,
+ CXL_DEVTYPE_CLASSMEM,
+};
+
+struct device;
+
+/*
+ * Using struct_group() allows for per register-block-type helper routines,
+ * without requiring block-type agnostic code to include the prefix.
+ */
+struct cxl_regs {
+ /*
+ * Common set of CXL Component register block base pointers
+ * @hdm_decoder: CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure
+ * @ras: CXL 2.0 8.2.5.9 CXL RAS Capability Structure
+ */
+ struct_group_tagged(cxl_component_regs, component,
+ void __iomem *hdm_decoder;
+ void __iomem *ras;
+ );
+ /*
+ * Common set of CXL Device register block base pointers
+ * @status: CXL 2.0 8.2.8.3 Device Status Registers
+ * @mbox: CXL 2.0 8.2.8.4 Mailbox Registers
+ * @memdev: CXL 2.0 8.2.8.5 Memory Device Registers
+ */
+ struct_group_tagged(cxl_device_regs, device_regs,
+ void __iomem *status, *mbox, *memdev;
+ );
+
+ struct_group_tagged(cxl_pmu_regs, pmu_regs,
+ void __iomem *pmu;
+ );
+
+ /*
+ * RCH downstream port specific RAS register
+ * @aer: CXL 3.0 8.2.1.1 RCH Downstream Port RCRB
+ */
+ struct_group_tagged(cxl_rch_regs, rch_regs,
+ void __iomem *dport_aer;
+ );
+
+ /*
+ * RCD upstream port specific PCIe cap register
+ * @pcie_cap: CXL 3.0 8.2.1.2 RCD Upstream Port RCRB
+ */
+ struct_group_tagged(cxl_rcd_regs, rcd_regs,
+ void __iomem *rcd_pcie_cap;
+ );
+};
+
+struct cxl_reg_map {
+ bool valid;
+ int id;
+ unsigned long offset;
+ unsigned long size;
+};
+
+struct cxl_component_reg_map {
+ struct cxl_reg_map hdm_decoder;
+ struct cxl_reg_map ras;
+};
+
+struct cxl_device_reg_map {
+ struct cxl_reg_map status;
+ struct cxl_reg_map mbox;
+ struct cxl_reg_map memdev;
+};
+
+struct cxl_pmu_reg_map {
+ struct cxl_reg_map pmu;
+};
+
+/**
+ * struct cxl_register_map - DVSEC harvested register block mapping parameters
+ * @host: device for devm operations and logging
+ * @base: virtual base of the register-block-BAR + @block_offset
+ * @resource: physical resource base of the register block
+ * @max_size: maximum mapping size to perform register search
+ * @reg_type: see enum cxl_regloc_type
+ * @component_map: cxl_reg_map for component registers
+ * @device_map: cxl_reg_maps for device registers
+ * @pmu_map: cxl_reg_maps for CXL Performance Monitoring Units
+ */
+struct cxl_register_map {
+ struct device *host;
+ void __iomem *base;
+ resource_size_t resource;
+ resource_size_t max_size;
+ u8 reg_type;
+ union {
+ struct cxl_component_reg_map component_map;
+ struct cxl_device_reg_map device_map;
+ struct cxl_pmu_reg_map pmu_map;
+ };
+};
+
+/**
+ * struct cxl_dpa_perf - DPA performance property entry
+ * @dpa_range: range for DPA address
+ * @coord: QoS performance data (i.e. latency, bandwidth)
+ * @cdat_coord: raw QoS performance data from CDAT
+ * @qos_class: QoS Class cookies
+ */
+struct cxl_dpa_perf {
+ struct range dpa_range;
+ struct access_coordinate coord[ACCESS_COORDINATE_MAX];
+ struct access_coordinate cdat_coord[ACCESS_COORDINATE_MAX];
+ int qos_class;
+};
+
+enum cxl_partition_mode {
+ CXL_PARTMODE_RAM,
+ CXL_PARTMODE_PMEM,
+};
+
+/**
+ * struct cxl_dpa_partition - DPA partition descriptor
+ * @res: shortcut to the partition in the DPA resource tree (cxlds->dpa_res)
+ * @perf: performance attributes of the partition from CDAT
+ * @mode: operation mode for the DPA capacity, e.g. ram, pmem, dynamic...
+ */
+struct cxl_dpa_partition {
+ struct resource res;
+ struct cxl_dpa_perf perf;
+ enum cxl_partition_mode mode;
+};
+
+#define CXL_NR_PARTITIONS_MAX 2
+
+/**
+ * struct cxl_dev_state - The driver device state
+ *
+ * cxl_dev_state represents the CXL driver/device state. It provides an
+ * interface to mailbox commands as well as some cached data about the device.
+ * Currently only memory devices are represented.
+ *
+ * @dev: The device associated with this CXL state
+ * @cxlmd: The device representing the CXL.mem capabilities of @dev
+ * @reg_map: component and ras register mapping parameters
+ * @regs: Parsed register blocks
+ * @cxl_dvsec: Offset to the PCIe device DVSEC
+ * @rcd: operating in RCD mode (CXL 3.0 9.11.8 CXL Devices Attached to an RCH)
+ * @media_ready: Indicate whether the device media is usable
+ * @dpa_res: Overall DPA resource tree for the device
+ * @part: DPA partition array
+ * @nr_partitions: Number of DPA partitions
+ * @serial: PCIe Device Serial Number
+ * @type: Generic Memory Class device or Vendor Specific Memory device
+ * @cxl_mbox: CXL mailbox context
+ * @cxlfs: CXL features context
+ */
+struct cxl_dev_state {
+ /* public for Type2 drivers */
+ struct device *dev;
+ struct cxl_memdev *cxlmd;
+
+ /* private for Type2 drivers */
+ struct cxl_register_map reg_map;
+ struct cxl_regs regs;
+ int cxl_dvsec;
+ bool rcd;
+ bool media_ready;
+ struct resource dpa_res;
+ struct cxl_dpa_partition part[CXL_NR_PARTITIONS_MAX];
+ unsigned int nr_partitions;
+ u64 serial;
+ enum cxl_devtype type;
+ struct cxl_mailbox cxl_mbox;
+#ifdef CONFIG_CXL_FEATURES
+ struct cxl_features_state *cxlfs;
+#endif
+};
+
+struct cxl_dev_state *_devm_cxl_dev_state_create(struct device *dev,
+ enum cxl_devtype type,
+ u64 serial, u16 dvsec,
+ size_t size, bool has_mbox);
+
+/**
+ * devm_cxl_dev_state_create - safely create and cast a cxl_dev_state embedded
+ * in a driver-specific struct.
+ *
+ * @parent: device behind the request
+ * @type: CXL device type
+ * @serial: device identification
+ * @dvsec: dvsec capability offset
+ * @drv_struct: driver struct embedding a cxl_dev_state struct
+ * @member: drv_struct member as cxl_dev_state
+ * @mbox: true if mailbox supported
+ *
+ * Returns a pointer to the allocated drv_struct, with its embedded
+ * cxl_dev_state struct initialized.
+ *
+ * Introduced for Type2 driver support.
+ */
+#define devm_cxl_dev_state_create(parent, type, serial, dvsec, drv_struct, member, mbox) \
+ ({ \
+ static_assert(__same_type(struct cxl_dev_state, \
+ ((drv_struct *)NULL)->member)); \
+ static_assert(offsetof(drv_struct, member) == 0); \
+ (drv_struct *)_devm_cxl_dev_state_create(parent, type, serial, dvsec, \
+ sizeof(drv_struct), mbox); \
+ })
+#endif /* __CXL_CXL_H__ */
diff --git a/include/cxl/pci.h b/include/cxl/pci.h
new file mode 100644
index 000000000000..5729a93b252a
--- /dev/null
+++ b/include/cxl/pci.h
@@ -0,0 +1,23 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright(c) 2020 Intel Corporation. All rights reserved. */
+
+#ifndef __CXL_CXL_PCI_H__
+#define __CXL_CXL_PCI_H__
+
+/* CXL 2.0 8.1.3: PCIe DVSEC for CXL Device */
+#define CXL_DVSEC_PCIE_DEVICE 0
+#define CXL_DVSEC_CAP_OFFSET 0xA
+#define CXL_DVSEC_MEM_CAPABLE BIT(2)
+#define CXL_DVSEC_HDM_COUNT_MASK GENMASK(5, 4)
+#define CXL_DVSEC_CTRL_OFFSET 0xC
+#define CXL_DVSEC_MEM_ENABLE BIT(2)
+#define CXL_DVSEC_RANGE_SIZE_HIGH(i) (0x18 + ((i) * 0x10))
+#define CXL_DVSEC_RANGE_SIZE_LOW(i) (0x1C + ((i) * 0x10))
+#define CXL_DVSEC_MEM_INFO_VALID BIT(0)
+#define CXL_DVSEC_MEM_ACTIVE BIT(1)
+#define CXL_DVSEC_MEM_SIZE_LOW_MASK GENMASK(31, 28)
+#define CXL_DVSEC_RANGE_BASE_HIGH(i) (0x20 + ((i) * 0x10))
+#define CXL_DVSEC_RANGE_BASE_LOW(i) (0x24 + ((i) * 0x10))
+#define CXL_DVSEC_MEM_BASE_LOW_MASK GENMASK(31, 28)
+
+#endif
diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index bf9caa908f89..e62cb5049cf5 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -1717,7 +1717,7 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
if (rc)
return rc;
- mds = cxl_memdev_state_create(dev);
+ mds = cxl_memdev_state_create(dev, pdev->id + 1, 0);
if (IS_ERR(mds))
return PTR_ERR(mds);
@@ -1733,7 +1733,6 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
mds->event.buf = (struct cxl_get_event_payload *) mdata->event_buf;
INIT_DELAYED_WORK(&mds->security.poll_dwork, cxl_mockmem_sanitize_work);
- cxlds->serial = pdev->id + 1;
if (is_rcd(pdev))
cxlds->rcd = true;
--
2.34.1
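devm_cxl_dev_state_create() relies on two compile-time checks: the embedded member must really be a struct cxl_dev_state, and it must sit at offset 0, so the pointer returned by _devm_cxl_dev_state_create() can be cast directly to the driver struct. The following is a minimal userspace sketch of that pattern, assuming GCC or Clang (it uses a statement expression and __builtin_types_compatible_p in place of the kernel's __same_type()). All names (mock_dev_state, mock_drv, mock_state_create, MOCK_STATE_CREATE) are hypothetical, and calloc() stands in for the devm allocation; it is not the CXL core implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct cxl_dev_state. */
struct mock_dev_state {
	int type;
	unsigned long long serial;
	unsigned short dvsec;
	int has_mbox;
};

/* Hypothetical driver struct, shaped like struct efx_cxl: the state
 * must be the first member. */
struct mock_drv {
	struct mock_dev_state cxlds;
	void *ctpio;
};

/*
 * Hypothetical stand-in for _devm_cxl_dev_state_create(): allocate the
 * whole driver struct (size bytes), zeroed, and initialize the state at
 * offset 0. calloc() replaces the devm-managed allocation.
 */
static struct mock_dev_state *mock_state_create(int type,
						unsigned long long serial,
						unsigned short dvsec,
						size_t size, int has_mbox)
{
	struct mock_dev_state *s = calloc(1, size);

	if (!s)
		return NULL;
	s->type = type;
	s->serial = serial;
	s->dvsec = dvsec;
	s->has_mbox = has_mbox;
	return s;
}

/* Same two build-time checks as devm_cxl_dev_state_create(). */
#define MOCK_STATE_CREATE(type, serial, dvsec, drv_struct, member, mbox)	\
	({									\
		_Static_assert(__builtin_types_compatible_p(			\
				struct mock_dev_state,				\
				__typeof__(((drv_struct *)NULL)->member)),	\
			       "member is not a mock_dev_state");		\
		_Static_assert(offsetof(drv_struct, member) == 0,		\
			       "member is not at offset 0");			\
		(drv_struct *)mock_state_create(type, serial, dvsec,		\
						sizeof(drv_struct), mbox);	\
	})
```

With this layout a driver declares its struct with the state as the first member and receives a typed pointer directly; moving the member away from offset 0 fails the build instead of producing a bad cast.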
^ permalink raw reply related [flat|nested] 112+ messages in thread
* [PATCH v17 02/22] sfc: add cxl support
2025-06-24 14:13 [PATCH v17 00/22] Type2 device basic support alejandro.lucero-palau
2025-06-24 14:13 ` [PATCH v17 01/22] cxl: Add type2 " alejandro.lucero-palau
@ 2025-06-24 14:13 ` alejandro.lucero-palau
2025-06-25 16:37 ` Jonathan Cameron
2025-07-25 22:16 ` dan.j.williams
2025-06-24 14:13 ` [PATCH v17 03/22] cxl: Move pci generic code alejandro.lucero-palau
` (21 subsequent siblings)
23 siblings, 2 replies; 112+ messages in thread
From: alejandro.lucero-palau @ 2025-06-24 14:13 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Jonathan Cameron, Edward Cree, Alison Schofield
From: Alejandro Lucero <alucerop@amd.com>
Add CXL initialization based on the new CXL API for accelerator drivers
and make it dependent on the kernel CXL configuration.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
---
drivers/net/ethernet/sfc/Kconfig | 9 +++++
drivers/net/ethernet/sfc/Makefile | 1 +
drivers/net/ethernet/sfc/efx.c | 15 +++++++-
drivers/net/ethernet/sfc/efx_cxl.c | 55 +++++++++++++++++++++++++++
drivers/net/ethernet/sfc/efx_cxl.h | 40 +++++++++++++++++++
drivers/net/ethernet/sfc/net_driver.h | 10 +++++
6 files changed, 129 insertions(+), 1 deletion(-)
create mode 100644 drivers/net/ethernet/sfc/efx_cxl.c
create mode 100644 drivers/net/ethernet/sfc/efx_cxl.h
diff --git a/drivers/net/ethernet/sfc/Kconfig b/drivers/net/ethernet/sfc/Kconfig
index c4c43434f314..979f2801e2a8 100644
--- a/drivers/net/ethernet/sfc/Kconfig
+++ b/drivers/net/ethernet/sfc/Kconfig
@@ -66,6 +66,15 @@ config SFC_MCDI_LOGGING
Driver-Interface) commands and responses, allowing debugging of
driver/firmware interaction. The tracing is actually enabled by
a sysfs file 'mcdi_logging' under the PCI device.
+config SFC_CXL
+ bool "Solarflare SFC9100-family CXL support"
+ depends on SFC && CXL_BUS >= SFC
+ default SFC
+ help
+ This enables SFC CXL support when the kernel is configured with CXL,
+ allowing CTPIO to use CXL.mem. An SFC device with CXL support and
+ CXL-aware firmware can then be used to minimize latency when
+ sending through CTPIO.
source "drivers/net/ethernet/sfc/falcon/Kconfig"
source "drivers/net/ethernet/sfc/siena/Kconfig"
diff --git a/drivers/net/ethernet/sfc/Makefile b/drivers/net/ethernet/sfc/Makefile
index d99039ec468d..bb0f1891cde6 100644
--- a/drivers/net/ethernet/sfc/Makefile
+++ b/drivers/net/ethernet/sfc/Makefile
@@ -13,6 +13,7 @@ sfc-$(CONFIG_SFC_SRIOV) += sriov.o ef10_sriov.o ef100_sriov.o ef100_rep.o \
mae.o tc.o tc_bindings.o tc_counters.o \
tc_encap_actions.o tc_conntrack.o
+sfc-$(CONFIG_SFC_CXL) += efx_cxl.o
obj-$(CONFIG_SFC) += sfc.o
obj-$(CONFIG_SFC_FALCON) += falcon/
diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c
index 112e55b98ed3..537668278375 100644
--- a/drivers/net/ethernet/sfc/efx.c
+++ b/drivers/net/ethernet/sfc/efx.c
@@ -34,6 +34,7 @@
#include "selftest.h"
#include "sriov.h"
#include "efx_devlink.h"
+#include "efx_cxl.h"
#include "mcdi_port_common.h"
#include "mcdi_pcol.h"
@@ -981,12 +982,15 @@ static void efx_pci_remove(struct pci_dev *pci_dev)
efx_pci_remove_main(efx);
efx_fini_io(efx);
+
+ probe_data = container_of(efx, struct efx_probe_data, efx);
+ efx_cxl_exit(probe_data);
+
pci_dbg(efx->pci_dev, "shutdown successful\n");
efx_fini_devlink_and_unlock(efx);
efx_fini_struct(efx);
free_netdev(efx->net_dev);
- probe_data = container_of(efx, struct efx_probe_data, efx);
kfree(probe_data);
};
@@ -1190,6 +1194,15 @@ static int efx_pci_probe(struct pci_dev *pci_dev,
if (rc)
goto fail2;
+ /* A successful CXL initialization implies that a CXL region has been
+ * created for PIO buffers. If there is no CXL support, or initialization
+ * fails, probe_data->cxl_pio_initialised will be false and the legacy PIO
+ * buffers defined at specific PCI BAR regions will be used.
+ */
+ rc = efx_cxl_init(probe_data);
+ if (rc)
+ pci_err(pci_dev, "CXL initialization failed with error %d\n", rc);
+
rc = efx_pci_probe_post_io(efx);
if (rc) {
/* On failure, retry once immediately.
diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
new file mode 100644
index 000000000000..f1db7284dee8
--- /dev/null
+++ b/drivers/net/ethernet/sfc/efx_cxl.c
@@ -0,0 +1,55 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/****************************************************************************
+ *
+ * Driver for AMD network controllers and boards
+ * Copyright (C) 2025, Advanced Micro Devices, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation, incorporated herein by reference.
+ */
+
+#include <cxl/pci.h>
+#include <linux/pci.h>
+
+#include "net_driver.h"
+#include "efx_cxl.h"
+
+#define EFX_CTPIO_BUFFER_SIZE SZ_256M
+
+int efx_cxl_init(struct efx_probe_data *probe_data)
+{
+ struct efx_nic *efx = &probe_data->efx;
+ struct pci_dev *pci_dev = efx->pci_dev;
+ struct efx_cxl *cxl;
+ u16 dvsec;
+
+ probe_data->cxl_pio_initialised = false;
+
+ dvsec = pci_find_dvsec_capability(pci_dev, PCI_VENDOR_ID_CXL,
+ CXL_DVSEC_PCIE_DEVICE);
+ if (!dvsec)
+ return 0;
+
+ pci_dbg(pci_dev, "CXL_DVSEC_PCIE_DEVICE capability found\n");
+
+ /* Create a cxl_dev_state embedded in struct efx_cxl using the CXL core
+ * API, specifying that no mailbox is available.
+ */
+ cxl = devm_cxl_dev_state_create(&pci_dev->dev, CXL_DEVTYPE_DEVMEM,
+ pci_dev->dev.id, dvsec, struct efx_cxl,
+ cxlds, false);
+
+ if (!cxl)
+ return -ENOMEM;
+
+ probe_data->cxl = cxl;
+
+ return 0;
+}
+
+void efx_cxl_exit(struct efx_probe_data *probe_data)
+{
+}
+
+MODULE_IMPORT_NS("CXL");
diff --git a/drivers/net/ethernet/sfc/efx_cxl.h b/drivers/net/ethernet/sfc/efx_cxl.h
new file mode 100644
index 000000000000..961639cef692
--- /dev/null
+++ b/drivers/net/ethernet/sfc/efx_cxl.h
@@ -0,0 +1,40 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/****************************************************************************
+ * Driver for AMD network controllers and boards
+ * Copyright (C) 2025, Advanced Micro Devices, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation, incorporated herein by reference.
+ */
+
+#ifndef EFX_CXL_H
+#define EFX_CXL_H
+
+#ifdef CONFIG_SFC_CXL
+
+#include <cxl/cxl.h>
+
+struct cxl_root_decoder;
+struct cxl_port;
+struct cxl_endpoint_decoder;
+struct cxl_region;
+struct efx_probe_data;
+
+struct efx_cxl {
+ struct cxl_dev_state cxlds;
+ struct cxl_memdev *cxlmd;
+ struct cxl_root_decoder *cxlrd;
+ struct cxl_port *endpoint;
+ struct cxl_endpoint_decoder *cxled;
+ struct cxl_region *efx_region;
+ void __iomem *ctpio_cxl;
+};
+
+int efx_cxl_init(struct efx_probe_data *probe_data);
+void efx_cxl_exit(struct efx_probe_data *probe_data);
+#else
+static inline int efx_cxl_init(struct efx_probe_data *probe_data) { return 0; }
+static inline void efx_cxl_exit(struct efx_probe_data *probe_data) {}
+#endif
+#endif
diff --git a/drivers/net/ethernet/sfc/net_driver.h b/drivers/net/ethernet/sfc/net_driver.h
index 5c0f306fb019..0e685b8a9980 100644
--- a/drivers/net/ethernet/sfc/net_driver.h
+++ b/drivers/net/ethernet/sfc/net_driver.h
@@ -1199,14 +1199,24 @@ struct efx_nic {
atomic_t n_rx_noskb_drops;
};
+#ifdef CONFIG_SFC_CXL
+struct efx_cxl;
+#endif
+
/**
* struct efx_probe_data - State after hardware probe
* @pci_dev: The PCI device
* @efx: Efx NIC details
+ * @cxl: details of related cxl objects
+ * @cxl_pio_initialised: cxl initialization outcome.
*/
struct efx_probe_data {
struct pci_dev *pci_dev;
struct efx_nic efx;
+#ifdef CONFIG_SFC_CXL
+ struct efx_cxl *cxl;
+ bool cxl_pio_initialised;
+#endif
};
static inline struct efx_nic *efx_netdev_priv(struct net_device *dev)
--
2.34.1
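efx_cxl_init() only continues when pci_find_dvsec_capability() finds the CXL PCIe Device DVSEC (vendor PCI_VENDOR_ID_CXL, DVSEC ID CXL_DVSEC_PCIE_DEVICE). As a rough illustration of that lookup, the sketch below walks a fake 4 KiB configuration space in userspace, following the PCIe extended capability list from offset 0x100 and matching DVSEC headers (extended capability ID 0x23) on vendor and DVSEC ID. find_dvsec(), cfg_read32() and cfg_write32() are hypothetical helpers, not the kernel's implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define EXT_CAP_START    0x100
#define EXT_CAP_ID_DVSEC 0x23
#define VENDOR_ID_CXL    0x1E98 /* PCI_VENDOR_ID_CXL */
#define CFG_SPACE_SIZE   4096

static uint32_t cfg_read32(const uint8_t *cfg, uint16_t off)
{
	uint32_t v;

	memcpy(&v, cfg + off, sizeof(v));
	return v;
}

static void cfg_write32(uint8_t *cfg, uint16_t off, uint32_t v)
{
	memcpy(cfg + off, &v, sizeof(v));
}

/* Return the offset of the matching DVSEC, or 0 if none is found. */
static uint16_t find_dvsec(const uint8_t *cfg, uint16_t vendor,
			   uint16_t dvsec_id)
{
	uint16_t pos = EXT_CAP_START;
	int ttl = (CFG_SPACE_SIZE - EXT_CAP_START) / 8; /* loop guard */

	while (pos >= EXT_CAP_START && pos <= CFG_SPACE_SIZE - 12 &&
	       ttl-- > 0) {
		/* Ext cap header: ID [15:0], version [19:16], next [31:20] */
		uint32_t hdr = cfg_read32(cfg, pos);

		if (hdr == 0)
			break;
		if ((hdr & 0xffff) == EXT_CAP_ID_DVSEC) {
			/* DVSEC header 1: vendor [15:0]; header 2: ID [15:0] */
			uint32_t h1 = cfg_read32(cfg, pos + 4);
			uint32_t h2 = cfg_read32(cfg, pos + 8);

			if ((h1 & 0xffff) == vendor && (h2 & 0xffff) == dvsec_id)
				return pos;
		}
		pos = (hdr >> 20) & 0xffc;
	}
	return 0;
}
```

A return of 0 corresponds to the early "no CXL" exit in efx_cxl_init(), after which the driver keeps using the legacy PIO buffers.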
* [PATCH v17 03/22] cxl: Move pci generic code
2025-06-24 14:13 [PATCH v17 00/22] Type2 device basic support alejandro.lucero-palau
2025-06-24 14:13 ` [PATCH v17 01/22] cxl: Add type2 " alejandro.lucero-palau
2025-06-24 14:13 ` [PATCH v17 02/22] sfc: add cxl support alejandro.lucero-palau
@ 2025-06-24 14:13 ` alejandro.lucero-palau
2025-07-25 22:41 ` dan.j.williams
2025-06-24 14:13 ` [PATCH v17 04/22] cxl: allow Type2 drivers to map cxl component regs alejandro.lucero-palau
` (20 subsequent siblings)
23 siblings, 1 reply; 112+ messages in thread
From: alejandro.lucero-palau @ 2025-06-24 14:13 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Ben Cheatham, Fan Ni, Jonathan Cameron,
Alison Schofield
From: Alejandro Lucero <alucerop@amd.com>
cxl/core/pci.c contains helpers for CXL PCIe initialization, while
cxl/pci.c implements Type3 device initialization.
Move helper functions from cxl/pci.c to cxl/core/pci.c so that they can
be exported and shared with CXL Type2 device initialization.
Fix the cxl mock tests affected by the code move.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
---
drivers/cxl/core/core.h | 2 +
drivers/cxl/core/pci.c | 62 +++++++++++++++++++++++++++++++
drivers/cxl/core/regs.c | 1 -
drivers/cxl/cxl.h | 2 -
drivers/cxl/cxlpci.h | 2 +
drivers/cxl/pci.c | 70 -----------------------------------
include/cxl/pci.h | 13 +++++++
tools/testing/cxl/Kbuild | 1 -
tools/testing/cxl/test/mock.c | 17 ---------
9 files changed, 79 insertions(+), 91 deletions(-)
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index 17b692eb3257..2f39944074f6 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -134,4 +134,6 @@ int cxl_set_feature(struct cxl_mailbox *cxl_mbox, const uuid_t *feat_uuid,
u16 *return_code);
#endif
+resource_size_t cxl_rcd_component_reg_phys(struct device *dev,
+ struct cxl_dport *dport);
#endif /* __CXL_CORE_H__ */
diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index 0eb339c91413..447dc8d3138f 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -1033,6 +1033,68 @@ bool cxl_endpoint_decoder_reset_detected(struct cxl_port *port)
}
EXPORT_SYMBOL_NS_GPL(cxl_endpoint_decoder_reset_detected, "CXL");
+static int cxl_rcrb_get_comp_regs(struct pci_dev *pdev,
+ struct cxl_register_map *map,
+ struct cxl_dport *dport)
+{
+ resource_size_t component_reg_phys;
+
+ *map = (struct cxl_register_map) {
+ .host = &pdev->dev,
+ .resource = CXL_RESOURCE_NONE,
+ };
+
+ struct cxl_port *port __free(put_cxl_port) =
+ cxl_pci_find_port(pdev, &dport);
+ if (!port)
+ return -EPROBE_DEFER;
+
+ component_reg_phys = cxl_rcd_component_reg_phys(&pdev->dev, dport);
+ if (component_reg_phys == CXL_RESOURCE_NONE)
+ return -ENXIO;
+
+ map->resource = component_reg_phys;
+ map->reg_type = CXL_REGLOC_RBI_COMPONENT;
+ map->max_size = CXL_COMPONENT_REG_BLOCK_SIZE;
+
+ return 0;
+}
+
+int cxl_pci_setup_regs(struct pci_dev *pdev, enum cxl_regloc_type type,
+ struct cxl_register_map *map)
+{
+ int rc;
+
+ rc = cxl_find_regblock(pdev, type, map);
+
+ /*
+ * If the Register Locator DVSEC does not exist, check if it
+ * is an RCH and try to extract the Component Registers from
+ * an RCRB.
+ */
+ if (rc && type == CXL_REGLOC_RBI_COMPONENT && is_cxl_restricted(pdev)) {
+ struct cxl_dport *dport;
+ struct cxl_port *port __free(put_cxl_port) =
+ cxl_pci_find_port(pdev, &dport);
+ if (!port)
+ return -EPROBE_DEFER;
+
+ rc = cxl_rcrb_get_comp_regs(pdev, map, dport);
+ if (rc)
+ return rc;
+
+ rc = cxl_dport_map_rcd_linkcap(pdev, dport);
+ if (rc)
+ return rc;
+
+ } else if (rc) {
+ return rc;
+ }
+
+ return cxl_setup_regs(map);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_pci_setup_regs, "CXL");
+
int cxl_pci_get_bandwidth(struct pci_dev *pdev, struct access_coordinate *c)
{
int speed, bw;
diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c
index ecdb22ae6952..fdb99d05a66c 100644
--- a/drivers/cxl/core/regs.c
+++ b/drivers/cxl/core/regs.c
@@ -642,4 +642,3 @@ resource_size_t cxl_rcd_component_reg_phys(struct device *dev,
return CXL_RESOURCE_NONE;
return __rcrb_to_component(dev, &dport->rcrb, CXL_RCRB_UPSTREAM);
}
-EXPORT_SYMBOL_NS_GPL(cxl_rcd_component_reg_phys, "CXL");
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 844dc0782a5f..b60738f5d11a 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -221,8 +221,6 @@ int cxl_find_regblock(struct pci_dev *pdev, enum cxl_regloc_type type,
struct cxl_register_map *map);
int cxl_setup_regs(struct cxl_register_map *map);
struct cxl_dport;
-resource_size_t cxl_rcd_component_reg_phys(struct device *dev,
- struct cxl_dport *dport);
int cxl_dport_map_rcd_linkcap(struct pci_dev *pdev, struct cxl_dport *dport);
#define CXL_RESOURCE_NONE ((resource_size_t) -1)
diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
index 570e53e26f11..0611d96d76da 100644
--- a/drivers/cxl/cxlpci.h
+++ b/drivers/cxl/cxlpci.h
@@ -114,4 +114,6 @@ void read_cdat_data(struct cxl_port *port);
void cxl_cor_error_detected(struct pci_dev *pdev);
pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
pci_channel_state_t state);
+int cxl_pci_setup_regs(struct pci_dev *pdev, enum cxl_regloc_type type,
+ struct cxl_register_map *map);
#endif /* __CXL_PCI_H__ */
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 0d3c67867965..57f125e39051 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -467,76 +467,6 @@ static int cxl_pci_setup_mailbox(struct cxl_memdev_state *mds, bool irq_avail)
return 0;
}
-/*
- * Assume that any RCIEP that emits the CXL memory expander class code
- * is an RCD
- */
-static bool is_cxl_restricted(struct pci_dev *pdev)
-{
- return pci_pcie_type(pdev) == PCI_EXP_TYPE_RC_END;
-}
-
-static int cxl_rcrb_get_comp_regs(struct pci_dev *pdev,
- struct cxl_register_map *map,
- struct cxl_dport *dport)
-{
- resource_size_t component_reg_phys;
-
- *map = (struct cxl_register_map) {
- .host = &pdev->dev,
- .resource = CXL_RESOURCE_NONE,
- };
-
- struct cxl_port *port __free(put_cxl_port) =
- cxl_pci_find_port(pdev, &dport);
- if (!port)
- return -EPROBE_DEFER;
-
- component_reg_phys = cxl_rcd_component_reg_phys(&pdev->dev, dport);
- if (component_reg_phys == CXL_RESOURCE_NONE)
- return -ENXIO;
-
- map->resource = component_reg_phys;
- map->reg_type = CXL_REGLOC_RBI_COMPONENT;
- map->max_size = CXL_COMPONENT_REG_BLOCK_SIZE;
-
- return 0;
-}
-
-static int cxl_pci_setup_regs(struct pci_dev *pdev, enum cxl_regloc_type type,
- struct cxl_register_map *map)
-{
- int rc;
-
- rc = cxl_find_regblock(pdev, type, map);
-
- /*
- * If the Register Locator DVSEC does not exist, check if it
- * is an RCH and try to extract the Component Registers from
- * an RCRB.
- */
- if (rc && type == CXL_REGLOC_RBI_COMPONENT && is_cxl_restricted(pdev)) {
- struct cxl_dport *dport;
- struct cxl_port *port __free(put_cxl_port) =
- cxl_pci_find_port(pdev, &dport);
- if (!port)
- return -EPROBE_DEFER;
-
- rc = cxl_rcrb_get_comp_regs(pdev, map, dport);
- if (rc)
- return rc;
-
- rc = cxl_dport_map_rcd_linkcap(pdev, dport);
- if (rc)
- return rc;
-
- } else if (rc) {
- return rc;
- }
-
- return cxl_setup_regs(map);
-}
-
static int cxl_pci_ras_unmask(struct pci_dev *pdev)
{
struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
diff --git a/include/cxl/pci.h b/include/cxl/pci.h
index 5729a93b252a..e1a1727de3b3 100644
--- a/include/cxl/pci.h
+++ b/include/cxl/pci.h
@@ -4,6 +4,19 @@
#ifndef __CXL_CXL_PCI_H__
#define __CXL_CXL_PCI_H__
+#include <linux/pci.h>
+
+/*
+ * Assume that the caller has already validated that @pdev has CXL
+ * capabilities. Any RCiEP with CXL capabilities is treated as a
+ * Restricted CXL Device (RCD), which finds its upstream port and
+ * endpoint registers in a Root Complex Register Block (RCRB).
+ */
+static inline bool is_cxl_restricted(struct pci_dev *pdev)
+{
+ return pci_pcie_type(pdev) == PCI_EXP_TYPE_RC_END;
+}
+
/* CXL 2.0 8.1.3: PCIe DVSEC for CXL Device */
#define CXL_DVSEC_PCIE_DEVICE 0
#define CXL_DVSEC_CAP_OFFSET 0xA
diff --git a/tools/testing/cxl/Kbuild b/tools/testing/cxl/Kbuild
index 387f3df8b988..2455fabc317d 100644
--- a/tools/testing/cxl/Kbuild
+++ b/tools/testing/cxl/Kbuild
@@ -12,7 +12,6 @@ ldflags-y += --wrap=cxl_await_media_ready
ldflags-y += --wrap=cxl_hdm_decode_init
ldflags-y += --wrap=cxl_dvsec_rr_decode
ldflags-y += --wrap=devm_cxl_add_rch_dport
-ldflags-y += --wrap=cxl_rcd_component_reg_phys
ldflags-y += --wrap=cxl_endpoint_parse_cdat
ldflags-y += --wrap=cxl_dport_init_ras_reporting
diff --git a/tools/testing/cxl/test/mock.c b/tools/testing/cxl/test/mock.c
index af2594e4f35d..3c6a071fbbe3 100644
--- a/tools/testing/cxl/test/mock.c
+++ b/tools/testing/cxl/test/mock.c
@@ -268,23 +268,6 @@ struct cxl_dport *__wrap_devm_cxl_add_rch_dport(struct cxl_port *port,
}
EXPORT_SYMBOL_NS_GPL(__wrap_devm_cxl_add_rch_dport, "CXL");
-resource_size_t __wrap_cxl_rcd_component_reg_phys(struct device *dev,
- struct cxl_dport *dport)
-{
- int index;
- resource_size_t component_reg_phys;
- struct cxl_mock_ops *ops = get_cxl_mock_ops(&index);
-
- if (ops && ops->is_mock_port(dev))
- component_reg_phys = CXL_RESOURCE_NONE;
- else
- component_reg_phys = cxl_rcd_component_reg_phys(dev, dport);
- put_cxl_mock_ops(index);
-
- return component_reg_phys;
-}
-EXPORT_SYMBOL_NS_GPL(__wrap_cxl_rcd_component_reg_phys, "CXL");
-
void __wrap_cxl_endpoint_parse_cdat(struct cxl_port *port)
{
int index;
--
2.34.1
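cxl_pci_setup_regs() first tries the Register Locator DVSEC and only falls back to an RCRB when component registers were requested and the device is an RCiEP, which is what is_cxl_restricted() tests via pci_pcie_type(). The branch selection can be modelled in plain C as below. The raw PCIe Capabilities register value is passed in directly, and pick_reg_source(), restricted() and the enum names are hypothetical; the sketch covers only which path is taken, not the register mapping itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define EXP_FLAGS_TYPE 0x00f0 /* mirrors PCI_EXP_FLAGS_TYPE */
#define EXP_TYPE_RC_END 0x9   /* mirrors PCI_EXP_TYPE_RC_END */

enum regloc_type { RBI_EMPTY, RBI_COMPONENT, RBI_VIRT, RBI_MEMDEV, RBI_PMU };
enum reg_source { SRC_REGLOC_DVSEC, SRC_RCRB, SRC_ERROR };

/* Device/port type lives in bits 7:4 of the PCIe Capabilities register. */
static int pcie_type(uint16_t pcie_flags)
{
	return (pcie_flags & EXP_FLAGS_TYPE) >> 4;
}

/* Model of is_cxl_restricted(): an RCiEP is treated as an RCD. */
static bool restricted(uint16_t pcie_flags)
{
	return pcie_type(pcie_flags) == EXP_TYPE_RC_END;
}

/* find_regblock_rc models the result of cxl_find_regblock(). */
static enum reg_source pick_reg_source(int find_regblock_rc,
				       enum regloc_type type,
				       uint16_t pcie_flags)
{
	if (!find_regblock_rc)
		return SRC_REGLOC_DVSEC;
	if (type == RBI_COMPONENT && restricted(pcie_flags))
		return SRC_RCRB;
	return SRC_ERROR;
}
```

Only component registers have an RCRB fallback; a missing memdev or PMU register block on an RCD is still reported as an error.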
* [PATCH v17 04/22] cxl: allow Type2 drivers to map cxl component regs
2025-06-24 14:13 [PATCH v17 00/22] Type2 device basic support alejandro.lucero-palau
` (2 preceding siblings ...)
2025-06-24 14:13 ` [PATCH v17 03/22] cxl: Move pci generic code alejandro.lucero-palau
@ 2025-06-24 14:13 ` alejandro.lucero-palau
2025-06-27 8:27 ` Jonathan Cameron
2025-07-25 22:55 ` dan.j.williams
2025-06-24 14:13 ` [PATCH v17 05/22] sfc: setup cxl component regs and set media ready alejandro.lucero-palau
` (19 subsequent siblings)
23 siblings, 2 replies; 112+ messages in thread
From: alejandro.lucero-palau @ 2025-06-24 14:13 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero
From: Alejandro Lucero <alucerop@amd.com>
Export CXL core functions so that a Type2 driver can discover and map
the device component registers.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
---
drivers/cxl/core/port.c | 1 +
drivers/cxl/cxl.h | 7 -------
drivers/cxl/cxlpci.h | 12 ------------
include/cxl/cxl.h | 8 ++++++++
include/cxl/pci.h | 15 +++++++++++++++
5 files changed, 24 insertions(+), 19 deletions(-)
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 726bd4a7de27..9acf8c7afb6b 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -11,6 +11,7 @@
#include <linux/idr.h>
#include <linux/node.h>
#include <cxl/einj.h>
+#include <cxl/pci.h>
#include <cxlmem.h>
#include <cxlpci.h>
#include <cxl.h>
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index b60738f5d11a..b35eff0977a8 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -38,10 +38,6 @@ extern const struct nvdimm_security_ops *cxl_security_ops;
#define CXL_CM_CAP_HDR_ARRAY_SIZE_MASK GENMASK(31, 24)
#define CXL_CM_CAP_PTR_MASK GENMASK(31, 20)
-#define CXL_CM_CAP_CAP_ID_RAS 0x2
-#define CXL_CM_CAP_CAP_ID_HDM 0x5
-#define CXL_CM_CAP_CAP_HDM_VERSION 1
-
/* HDM decoders CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure */
#define CXL_HDM_DECODER_CAP_OFFSET 0x0
#define CXL_HDM_DECODER_COUNT_MASK GENMASK(3, 0)
@@ -205,9 +201,6 @@ void cxl_probe_component_regs(struct device *dev, void __iomem *base,
struct cxl_component_reg_map *map);
void cxl_probe_device_regs(struct device *dev, void __iomem *base,
struct cxl_device_reg_map *map);
-int cxl_map_component_regs(const struct cxl_register_map *map,
- struct cxl_component_regs *regs,
- unsigned long map_mask);
int cxl_map_device_regs(const struct cxl_register_map *map,
struct cxl_device_regs *regs);
int cxl_map_pmu_regs(struct cxl_register_map *map, struct cxl_pmu_regs *regs);
diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
index 0611d96d76da..cb4aa5c702f0 100644
--- a/drivers/cxl/cxlpci.h
+++ b/drivers/cxl/cxlpci.h
@@ -45,16 +45,6 @@
*/
#define CXL_PCI_DEFAULT_MAX_VECTORS 16
-/* Register Block Identifier (RBI) */
-enum cxl_regloc_type {
- CXL_REGLOC_RBI_EMPTY = 0,
- CXL_REGLOC_RBI_COMPONENT,
- CXL_REGLOC_RBI_VIRT,
- CXL_REGLOC_RBI_MEMDEV,
- CXL_REGLOC_RBI_PMU,
- CXL_REGLOC_RBI_TYPES
-};
-
/*
* Table Access DOE, CDAT Read Entry Response
*
@@ -114,6 +104,4 @@ void read_cdat_data(struct cxl_port *port);
void cxl_cor_error_detected(struct pci_dev *pdev);
pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
pci_channel_state_t state);
-int cxl_pci_setup_regs(struct pci_dev *pdev, enum cxl_regloc_type type,
- struct cxl_register_map *map);
#endif /* __CXL_PCI_H__ */
diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
index 9c1a82c8af3d..0810c18d7aef 100644
--- a/include/cxl/cxl.h
+++ b/include/cxl/cxl.h
@@ -70,6 +70,10 @@ struct cxl_regs {
);
};
+#define CXL_CM_CAP_CAP_ID_RAS 0x2
+#define CXL_CM_CAP_CAP_ID_HDM 0x5
+#define CXL_CM_CAP_CAP_HDM_VERSION 1
+
struct cxl_reg_map {
bool valid;
int id;
@@ -223,4 +227,8 @@ struct cxl_dev_state *_devm_cxl_dev_state_create(struct device *dev,
(drv_struct *)_devm_cxl_dev_state_create(parent, type, serial, dvsec, \
sizeof(drv_struct), mbox); \
})
+
+int cxl_map_component_regs(const struct cxl_register_map *map,
+ struct cxl_component_regs *regs,
+ unsigned long map_mask);
#endif /* __CXL_CXL_H__ */
diff --git a/include/cxl/pci.h b/include/cxl/pci.h
index e1a1727de3b3..521d3449382c 100644
--- a/include/cxl/pci.h
+++ b/include/cxl/pci.h
@@ -34,3 +34,18 @@ static inline bool is_cxl_restricted(struct pci_dev *pdev)
#define CXL_DVSEC_MEM_BASE_LOW_MASK GENMASK(31, 28)
#endif
+
+/* Register Block Identifier (RBI) */
+enum cxl_regloc_type {
+ CXL_REGLOC_RBI_EMPTY = 0,
+ CXL_REGLOC_RBI_COMPONENT,
+ CXL_REGLOC_RBI_VIRT,
+ CXL_REGLOC_RBI_MEMDEV,
+ CXL_REGLOC_RBI_PMU,
+ CXL_REGLOC_RBI_TYPES
+};
+
+struct cxl_register_map;
+
+int cxl_pci_setup_regs(struct pci_dev *pdev, enum cxl_regloc_type type,
+ struct cxl_register_map *map);
--
2.34.1
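cxl_map_component_regs() takes a map_mask in which each component capability is selected by BIT() of its capability ID, which is why CXL_CM_CAP_CAP_ID_RAS and CXL_CM_CAP_CAP_ID_HDM are now exported. A small, hypothetical sketch of building such a mask from discovered capabilities and checking that a required set is present could look like this; struct cap_model, found_caps() and caps_subset() are not part of the CXL API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BIT(n) (1UL << (n))
#define CAP_ID_RAS 0x2 /* mirrors CXL_CM_CAP_CAP_ID_RAS */
#define CAP_ID_HDM 0x5 /* mirrors CXL_CM_CAP_CAP_ID_HDM */

/* Hypothetical view of one discovered component capability. */
struct cap_model {
	int id;
	bool valid;
};

/* Bitmask of capability IDs that were discovered as valid. */
static unsigned long found_caps(const struct cap_model *caps, size_t n)
{
	unsigned long mask = 0;

	for (size_t i = 0; i < n; i++)
		if (caps[i].valid)
			mask |= BIT(caps[i].id);
	return mask;
}

/* True if every capability in @required is also in @found. */
static bool caps_subset(unsigned long required, unsigned long found)
{
	return (required & found) == required;
}
```

A driver would pass a mask such as BIT(CAP_ID_RAS) to the mapping call only after confirming that the bits it needs are present.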
* [PATCH v17 05/22] sfc: setup cxl component regs and set media ready
2025-06-24 14:13 [PATCH v17 00/22] Type2 device basic support alejandro.lucero-palau
` (3 preceding siblings ...)
2025-06-24 14:13 ` [PATCH v17 04/22] cxl: allow Type2 drivers to map cxl component regs alejandro.lucero-palau
@ 2025-06-24 14:13 ` alejandro.lucero-palau
2025-06-27 8:39 ` Jonathan Cameron
` (2 more replies)
2025-06-24 14:13 ` [PATCH v17 06/22] cxl: Support dpa initialization without a mailbox alejandro.lucero-palau
` (18 subsequent siblings)
23 siblings, 3 replies; 112+ messages in thread
From: alejandro.lucero-palau @ 2025-06-24 14:13 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero
From: Alejandro Lucero <alucerop@amd.com>
Use the CXL core code to discover and map the CXL component registers,
and validate that the expected registers are found.
Set media ready explicitly, as there is no way to determine it without a
mailbox or the related CXL register, neither of which is mandatory for
Type2.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
---
drivers/net/ethernet/sfc/efx_cxl.c | 34 ++++++++++++++++++++++++++++++
1 file changed, 34 insertions(+)
diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
index f1db7284dee8..ea02eb82b73c 100644
--- a/drivers/net/ethernet/sfc/efx_cxl.c
+++ b/drivers/net/ethernet/sfc/efx_cxl.c
@@ -9,6 +9,7 @@
* by the Free Software Foundation, incorporated herein by reference.
*/
+#include <cxl/cxl.h>
#include <cxl/pci.h>
#include <linux/pci.h>
@@ -23,6 +24,7 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
struct pci_dev *pci_dev = efx->pci_dev;
struct efx_cxl *cxl;
u16 dvsec;
+ int rc;
probe_data->cxl_pio_initialised = false;
@@ -43,6 +45,38 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
if (!cxl)
return -ENOMEM;
+ rc = cxl_pci_setup_regs(pci_dev, CXL_REGLOC_RBI_COMPONENT,
+ &cxl->cxlds.reg_map);
+ if (rc) {
+ dev_warn(&pci_dev->dev, "No component registers (err=%d)\n", rc);
+ return rc;
+ }
+
+ if (!cxl->cxlds.reg_map.component_map.hdm_decoder.valid) {
+ dev_err(&pci_dev->dev, "Expected HDM component register not found\n");
+ return -ENODEV;
+ }
+
+ if (!cxl->cxlds.reg_map.component_map.ras.valid) {
+ dev_err(&pci_dev->dev, "Expected RAS component register not found\n");
+ return -ENODEV;
+ }
+
+ rc = cxl_map_component_regs(&cxl->cxlds.reg_map,
+ &cxl->cxlds.regs.component,
+ BIT(CXL_CM_CAP_CAP_ID_RAS));
+ if (rc) {
+ dev_err(&pci_dev->dev, "Failed to map RAS capability.\n");
+ return rc;
+ }
+
+ /*
+ * Set media ready explicitly, as there is neither a mailbox for
+ * checking this state nor the related CXL register; neither is
+ * mandatory for Type2.
+ */
+ cxl->cxlds.media_ready = true;
+
probe_data->cxl = cxl;
return 0;
--
2.34.1
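The order of checks added to efx_cxl_init() in this patch can be summarised with a simplified model: propagate a cxl_pci_setup_regs() failure, return -ENODEV if either the HDM or the RAS capability is missing, propagate a mapping failure, and only then mark the media ready. struct comp_regs_model and efx_regs_check_model() are hypothetical and only mirror the control flow, not the real register access.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified view of what efx_cxl_init() inspects. */
struct comp_regs_model {
	int setup_rc;     /* result of cxl_pci_setup_regs() */
	bool hdm_valid;   /* component_map.hdm_decoder.valid */
	bool ras_valid;   /* component_map.ras.valid */
	int map_rc;       /* result of cxl_map_component_regs() */
	bool media_ready; /* cxlds.media_ready */
};

static int efx_regs_check_model(struct comp_regs_model *m)
{
	if (m->setup_rc)
		return m->setup_rc;
	/* Both capabilities are required by the sfc driver. */
	if (!m->hdm_valid || !m->ras_valid)
		return -ENODEV;
	if (m->map_rc)
		return m->map_rc;
	/* No mailbox and no media status register: assume ready. */
	m->media_ready = true;
	return 0;
}
```

media_ready is only set on the success path, so any earlier failure leaves the device state untouched.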
* [PATCH v17 06/22] cxl: Support dpa initialization without a mailbox
2025-06-24 14:13 [PATCH v17 00/22] Type2 device basic support alejandro.lucero-palau
` (4 preceding siblings ...)
2025-06-24 14:13 ` [PATCH v17 05/22] sfc: setup cxl component regs and set media ready alejandro.lucero-palau
@ 2025-06-24 14:13 ` alejandro.lucero-palau
2025-06-27 8:42 ` Jonathan Cameron
` (2 more replies)
2025-06-24 14:13 ` [PATCH v17 07/22] sfc: initialize dpa alejandro.lucero-palau
` (17 subsequent siblings)
23 siblings, 3 replies; 112+ messages in thread
From: alejandro.lucero-palau @ 2025-06-24 14:13 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero
From: Alejandro Lucero <alucerop@amd.com>
Type3 relies on the mailbox CXL_MBOX_OP_IDENTIFY command to initialize
the memdev state parameters that are later used for DPA initialization.
Allow a Type2 driver to initialize DPA simply by providing the size of
its volatile hardware partition.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
---
drivers/cxl/core/mbox.c | 17 +++++++++++++++++
include/cxl/cxl.h | 1 +
2 files changed, 18 insertions(+)
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index d78f6039f997..d3b4ba5214d5 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -1284,6 +1284,23 @@ static void add_part(struct cxl_dpa_info *info, u64 start, u64 size, enum cxl_pa
info->nr_partitions++;
}
+/**
+ * cxl_set_capacity - initialize DPA for a driver without a mailbox
+ *
+ * @cxlds: pointer to cxl_dev_state
+ * @capacity: device volatile memory size
+ */
+void cxl_set_capacity(struct cxl_dev_state *cxlds, u64 capacity)
+{
+ struct cxl_dpa_info range_info = {
+ .size = capacity,
+ };
+
+ add_part(&range_info, 0, capacity, CXL_PARTMODE_RAM);
+ cxl_dpa_setup(cxlds, &range_info);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_set_capacity, "CXL");
+
int cxl_mem_dpa_fetch(struct cxl_memdev_state *mds, struct cxl_dpa_info *info)
{
struct cxl_dev_state *cxlds = &mds->cxlds;
diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
index 0810c18d7aef..4975ead488b4 100644
--- a/include/cxl/cxl.h
+++ b/include/cxl/cxl.h
@@ -231,4 +231,5 @@ struct cxl_dev_state *_devm_cxl_dev_state_create(struct device *dev,
int cxl_map_component_regs(const struct cxl_register_map *map,
struct cxl_component_regs *regs,
unsigned long map_mask);
+void cxl_set_capacity(struct cxl_dev_state *cxlds, u64 capacity);
#endif /* __CXL_CXL_H__ */
--
2.34.1
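cxl_set_capacity() describes the whole device capacity as a single RAM partition starting at DPA 0. The sketch below models that with hypothetical stand-ins for struct cxl_dpa_info and add_part(), assuming, as with the kernel's struct range, that the partition end is inclusive. Skipping zero-sized partitions and the -1 "full" return are choices of this model, and cxl_dpa_setup() is not modelled.

```c
#include <assert.h>
#include <stdint.h>

#define NR_PARTITIONS_MAX 2 /* mirrors CXL_NR_PARTITIONS_MAX */

enum partition_mode { PARTMODE_RAM, PARTMODE_PMEM };

/* Inclusive range, like the kernel's struct range. */
struct range64 {
	uint64_t start;
	uint64_t end;
};

struct dpa_part_model {
	struct range64 range;
	enum partition_mode mode;
};

/* Hypothetical stand-in for struct cxl_dpa_info. */
struct dpa_info_model {
	uint64_t size;
	struct dpa_part_model part[NR_PARTITIONS_MAX];
	int nr_partitions;
};

static int add_part_model(struct dpa_info_model *info, uint64_t start,
			  uint64_t size, enum partition_mode mode)
{
	int n = info->nr_partitions;

	if (size == 0)
		return 0; /* nothing to describe */
	if (n >= NR_PARTITIONS_MAX)
		return -1; /* hypothetical "table full" error */
	info->part[n].range = (struct range64){ .start = start,
						.end = start + size - 1 };
	info->part[n].mode = mode;
	info->nr_partitions++;
	return 0;
}

/* Model of cxl_set_capacity(): one RAM partition covering [0, capacity). */
static struct dpa_info_model set_capacity_model(uint64_t capacity)
{
	struct dpa_info_model info = { .size = capacity };

	add_part_model(&info, 0, capacity, PARTMODE_RAM);
	return info;
}
```

For a 256 MiB device this yields one RAM partition spanning DPA 0x0 to 0x0FFFFFFF.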
* [PATCH v17 07/22] sfc: initialize dpa
2025-06-24 14:13 [PATCH v17 00/22] Type2 device basic support alejandro.lucero-palau
` (5 preceding siblings ...)
2025-06-24 14:13 ` [PATCH v17 06/22] cxl: Support dpa initialization without a mailbox alejandro.lucero-palau
@ 2025-06-24 14:13 ` alejandro.lucero-palau
2025-07-26 0:55 ` dan.j.williams
2025-06-24 14:13 ` [PATCH v17 08/22] cxl: Prepare memdev creation for type2 alejandro.lucero-palau
` (16 subsequent siblings)
23 siblings, 1 reply; 112+ messages in thread
From: alejandro.lucero-palau @ 2025-06-24 14:13 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero
From: Alejandro Lucero <alucerop@amd.com>
Use a hardcoded capacity (EFX_CTPIO_BUFFER_SIZE) to initialize the DPA, as
there is no mailbox available to query it.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
---
drivers/net/ethernet/sfc/efx_cxl.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
index ea02eb82b73c..5d68ee4e818d 100644
--- a/drivers/net/ethernet/sfc/efx_cxl.c
+++ b/drivers/net/ethernet/sfc/efx_cxl.c
@@ -77,6 +77,8 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
*/
cxl->cxlds.media_ready = true;
+ cxl_set_capacity(&cxl->cxlds, EFX_CTPIO_BUFFER_SIZE);
+
probe_data->cxl = cxl;
return 0;
--
2.34.1
* [PATCH v17 08/22] cxl: Prepare memdev creation for type2
2025-06-24 14:13 [PATCH v17 00/22] Type2 device basic support alejandro.lucero-palau
` (6 preceding siblings ...)
2025-06-24 14:13 ` [PATCH v17 07/22] sfc: initialize dpa alejandro.lucero-palau
@ 2025-06-24 14:13 ` alejandro.lucero-palau
2025-07-26 1:05 ` dan.j.williams
2025-06-24 14:13 ` [PATCH v17 09/22] sfc: create type2 cxl memdev alejandro.lucero-palau
` (15 subsequent siblings)
23 siblings, 1 reply; 112+ messages in thread
From: alejandro.lucero-palau @ 2025-06-24 14:13 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Ben Cheatham, Jonathan Cameron,
Alison Schofield
From: Alejandro Lucero <alucerop@amd.com>
The CXL core currently assumes a CXL_DEVTYPE_CLASSMEM device when
creating a memdev, which causes problems when obtaining cxl_memdev_state
references for a CXL_DEVTYPE_DEVMEM device.
Add a dedicated memdev device type for CXL_DEVTYPE_DEVMEM devices and
skip the poison debugfs and sysfs files, which rely on cxl_memdev_state,
when no such state exists.
Make devm_cxl_add_memdev() accessible from an accelerator driver.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
---
drivers/cxl/core/memdev.c | 15 +++++++++++++--
drivers/cxl/cxlmem.h | 2 --
drivers/cxl/mem.c | 25 +++++++++++++++++++------
include/cxl/cxl.h | 2 ++
4 files changed, 34 insertions(+), 10 deletions(-)
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index c73582d24dd7..f43d2aa2928e 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -7,6 +7,7 @@
#include <linux/slab.h>
#include <linux/idr.h>
#include <linux/pci.h>
+#include <cxl/cxl.h>
#include <cxlmem.h>
#include "trace.h"
#include "core.h"
@@ -562,9 +563,16 @@ static const struct device_type cxl_memdev_type = {
.groups = cxl_memdev_attribute_groups,
};
+static const struct device_type cxl_accel_memdev_type = {
+ .name = "cxl_accel_memdev",
+ .release = cxl_memdev_release,
+ .devnode = cxl_memdev_devnode,
+};
+
bool is_cxl_memdev(const struct device *dev)
{
- return dev->type == &cxl_memdev_type;
+ return (dev->type == &cxl_memdev_type ||
+ dev->type == &cxl_accel_memdev_type);
}
EXPORT_SYMBOL_NS_GPL(is_cxl_memdev, "CXL");
@@ -689,7 +697,10 @@ static struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds,
dev->parent = cxlds->dev;
dev->bus = &cxl_bus_type;
dev->devt = MKDEV(cxl_mem_major, cxlmd->id);
- dev->type = &cxl_memdev_type;
+ if (cxlds->type == CXL_DEVTYPE_DEVMEM)
+ dev->type = &cxl_accel_memdev_type;
+ else
+ dev->type = &cxl_memdev_type;
device_set_pm_not_required(dev);
INIT_WORK(&cxlmd->detach_work, detach_memdev);
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 9cc4337cacfb..7be51f70902a 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -88,8 +88,6 @@ static inline bool is_cxl_endpoint(struct cxl_port *port)
return is_cxl_memdev(port->uport_dev);
}
-struct cxl_memdev *devm_cxl_add_memdev(struct device *host,
- struct cxl_dev_state *cxlds);
int devm_cxl_sanitize_setup_notifier(struct device *host,
struct cxl_memdev *cxlmd);
struct cxl_memdev_state;
diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
index 9675243bd05b..7f39790d9d98 100644
--- a/drivers/cxl/mem.c
+++ b/drivers/cxl/mem.c
@@ -130,12 +130,18 @@ static int cxl_mem_probe(struct device *dev)
dentry = cxl_debugfs_create_dir(dev_name(dev));
debugfs_create_devm_seqfile(dev, "dpamem", dentry, cxl_mem_dpa_show);
- if (test_bit(CXL_POISON_ENABLED_INJECT, mds->poison.enabled_cmds))
- debugfs_create_file("inject_poison", 0200, dentry, cxlmd,
- &cxl_poison_inject_fops);
- if (test_bit(CXL_POISON_ENABLED_CLEAR, mds->poison.enabled_cmds))
- debugfs_create_file("clear_poison", 0200, dentry, cxlmd,
- &cxl_poison_clear_fops);
+ /*
+ * Avoid poison debugfs files for Type2 devices as they rely on
+ * cxl_memdev_state.
+ */
+ if (mds) {
+ if (test_bit(CXL_POISON_ENABLED_INJECT, mds->poison.enabled_cmds))
+ debugfs_create_file("inject_poison", 0200, dentry, cxlmd,
+ &cxl_poison_inject_fops);
+ if (test_bit(CXL_POISON_ENABLED_CLEAR, mds->poison.enabled_cmds))
+ debugfs_create_file("clear_poison", 0200, dentry, cxlmd,
+ &cxl_poison_clear_fops);
+ }
rc = devm_add_action_or_reset(dev, remove_debugfs, dentry);
if (rc)
@@ -219,6 +225,13 @@ static umode_t cxl_mem_visible(struct kobject *kobj, struct attribute *a, int n)
struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
+ /*
+ * Avoid poison sysfs files for Type2 devices as they rely on
+ * cxl_memdev_state.
+ */
+ if (!mds)
+ return 0;
+
if (a == &dev_attr_trigger_poison_list.attr)
if (!test_bit(CXL_POISON_ENABLED_LIST,
mds->poison.enabled_cmds))
diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
index 4975ead488b4..fcdf98231ffb 100644
--- a/include/cxl/cxl.h
+++ b/include/cxl/cxl.h
@@ -232,4 +232,6 @@ int cxl_map_component_regs(const struct cxl_register_map *map,
struct cxl_component_regs *regs,
unsigned long map_mask);
void cxl_set_capacity(struct cxl_dev_state *cxlds, u64 capacity);
+struct cxl_memdev *devm_cxl_add_memdev(struct device *host,
+ struct cxl_dev_state *cxlmds);
#endif /* __CXL_CXL_H__ */
--
2.34.1
* [PATCH v17 09/22] sfc: create type2 cxl memdev
2025-06-24 14:13 [PATCH v17 00/22] Type2 device basic support alejandro.lucero-palau
` (7 preceding siblings ...)
2025-06-24 14:13 ` [PATCH v17 08/22] cxl: Prepare memdev creation for type2 alejandro.lucero-palau
@ 2025-06-24 14:13 ` alejandro.lucero-palau
2025-06-27 8:51 ` Jonathan Cameron
2025-06-24 14:13 ` [PATCH v17 10/22] cx/memdev: Indicate probe deferral alejandro.lucero-palau
` (14 subsequent siblings)
23 siblings, 1 reply; 112+ messages in thread
From: alejandro.lucero-palau @ 2025-06-24 14:13 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Martin Habets, Fan Ni, Edward Cree,
Jonathan Cameron
From: Alejandro Lucero <alucerop@amd.com>
Use the CXL API to create a CXL memory device from the Type2
cxl_dev_state struct.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
drivers/net/ethernet/sfc/efx_cxl.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
index 5d68ee4e818d..e2d52ed49535 100644
--- a/drivers/net/ethernet/sfc/efx_cxl.c
+++ b/drivers/net/ethernet/sfc/efx_cxl.c
@@ -79,6 +79,13 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
cxl_set_capacity(&cxl->cxlds, EFX_CTPIO_BUFFER_SIZE);
+ cxl->cxlmd = devm_cxl_add_memdev(&pci_dev->dev, &cxl->cxlds);
+
+ if (IS_ERR(cxl->cxlmd)) {
+ pci_err(pci_dev, "CXL accel memdev creation failed\n");
+ return PTR_ERR(cxl->cxlmd);
+ }
+
probe_data->cxl = cxl;
return 0;
--
2.34.1
* [PATCH v17 10/22] cx/memdev: Indicate probe deferral
2025-06-24 14:13 [PATCH v17 00/22] Type2 device basic support alejandro.lucero-palau
` (8 preceding siblings ...)
2025-06-24 14:13 ` [PATCH v17 09/22] sfc: create type2 cxl memdev alejandro.lucero-palau
@ 2025-06-24 14:13 ` alejandro.lucero-palau
2025-06-27 8:59 ` Jonathan Cameron
` (3 more replies)
2025-06-24 14:13 ` [PATCH v17 11/22] cxl: Define a driver interface for HPA free space enumeration alejandro.lucero-palau
` (13 subsequent siblings)
23 siblings, 4 replies; 112+ messages in thread
From: alejandro.lucero-palau @ 2025-06-24 14:13 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero
From: Alejandro Lucero <alucerop@amd.com>
The first step for a CXL accelerator driver that wants to establish new
CXL.mem regions is to register a 'struct cxl_memdev'. That kicks off
cxl_mem_probe() to enumerate all 'struct cxl_port' instances in the
topology up to the root.
If the port driver has not attached yet, the expectation is that the
driver waits until that link is established. The common cxl_pci driver
has reason to keep the 'struct cxl_memdev' device attached to the bus
until the root driver attaches. An accelerator may instead want to defer
probing until CXL resources can be acquired.
Use the @endpoint attribute of a 'struct cxl_memdev' to convey whether
an accelerator driver's probe should be deferred or failed. Provide that
indication via a new cxl_acquire_endpoint() API that can retrieve the
probe status of the memdev, and a matching cxl_release_endpoint() that
drops the device locks it takes.
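A userspace model of the decision cxl_acquire_endpoint() makes may help
show the three outcomes. It is a sketch only: err_ptr(), is_err() and
ptr_err() are hypothetical stand-ins for the kernel's ERR_PTR helpers,
EPROBE_DEFER is defined locally because it is a kernel-internal errno,
and the device locks taken by the real function are omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins for ERR_PTR(), IS_ERR() and PTR_ERR(). */
#define MAX_ERRNO 4095
#define EPROBE_DEFER 517 /* kernel-internal errno, defined here for the model */

static void *err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static int is_err(const void *p)
{
	/* errors occupy the last MAX_ERRNO addresses, as in the kernel */
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

static long ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

/* Hypothetical miniature of a memdev and its endpoint port. */
struct port {
	int driver_bound;
};

struct memdev {
	struct port *endpoint; /* NULL, an error pointer, or a port */
};

/*
 * Mirrors the checks in cxl_acquire_endpoint(), minus the device locks.
 * Returns 0 and sets *out on success, otherwise a negative errno.
 */
static int acquire_endpoint(const struct memdev *md, struct port **out)
{
	struct port *ep = md->endpoint;

	if (!ep)
		return -ENXIO; /* no endpoint recorded for this memdev */
	if (is_err(ep))
		return (int)ptr_err(ep); /* e.g. -EPROBE_DEFER from cxl_mem_probe() */
	if (!ep->driver_bound)
		return -ENXIO; /* port exists but its driver is not attached */
	*out = ep;
	return 0;
}
```

In this model only the -EPROBE_DEFER outcome is meant to be retried: an
accelerator driver would return it from its own probe routine so the
driver core tries again later, and treat -ENXIO as a hard failure.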
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
---
drivers/cxl/core/memdev.c | 42 +++++++++++++++++++++++++++++++++++++++
drivers/cxl/core/port.c | 2 +-
drivers/cxl/mem.c | 7 +++++--
include/cxl/cxl.h | 2 ++
4 files changed, 50 insertions(+), 3 deletions(-)
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index f43d2aa2928e..e2c6b5b532db 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -1124,6 +1124,48 @@ struct cxl_memdev *devm_cxl_add_memdev(struct device *host,
}
EXPORT_SYMBOL_NS_GPL(devm_cxl_add_memdev, "CXL");
+/*
+ * Try to get a locked reference on a memdev's CXL port topology
+ * connection. Be careful to observe when cxl_mem_probe() has deposited
+ * a probe deferral awaiting the arrival of the CXL root driver.
+ */
+struct cxl_port *cxl_acquire_endpoint(struct cxl_memdev *cxlmd)
+{
+ struct cxl_port *endpoint;
+ int rc = -ENXIO;
+
+ device_lock(&cxlmd->dev);
+
+ endpoint = cxlmd->endpoint;
+ if (!endpoint)
+ goto err;
+
+ if (IS_ERR(endpoint)) {
+ rc = PTR_ERR(endpoint);
+ goto err;
+ }
+
+ device_lock(&endpoint->dev);
+ if (!endpoint->dev.driver)
+ goto err_endpoint;
+
+ return endpoint;
+
+err_endpoint:
+ device_unlock(&endpoint->dev);
+err:
+ device_unlock(&cxlmd->dev);
+ return ERR_PTR(rc);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_acquire_endpoint, "CXL");
+
+void cxl_release_endpoint(struct cxl_memdev *cxlmd, struct cxl_port *endpoint)
+{
+ device_unlock(&endpoint->dev);
+ device_unlock(&cxlmd->dev);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_release_endpoint, "CXL");
+
static void sanitize_teardown_notifier(void *data)
{
struct cxl_memdev_state *mds = data;
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 9acf8c7afb6b..fa10a1643e4c 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -1563,7 +1563,7 @@ static int add_port_attach_ep(struct cxl_memdev *cxlmd,
*/
dev_dbg(&cxlmd->dev, "%s is a root dport\n",
dev_name(dport_dev));
- return -ENXIO;
+ return -EPROBE_DEFER;
}
struct cxl_port *parent_port __free(put_cxl_port) =
diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
index 7f39790d9d98..cda0b2ff73ce 100644
--- a/drivers/cxl/mem.c
+++ b/drivers/cxl/mem.c
@@ -148,14 +148,17 @@ static int cxl_mem_probe(struct device *dev)
return rc;
rc = devm_cxl_enumerate_ports(cxlmd);
- if (rc)
+ if (rc) {
+ cxlmd->endpoint = ERR_PTR(rc);
return rc;
+ }
struct cxl_port *parent_port __free(put_cxl_port) =
cxl_mem_find_port(cxlmd, &dport);
if (!parent_port) {
dev_err(dev, "CXL port topology not found\n");
- return -ENXIO;
+ cxlmd->endpoint = ERR_PTR(-EPROBE_DEFER);
+ return -EPROBE_DEFER;
}
if (cxl_pmem_size(cxlds) && IS_ENABLED(CONFIG_CXL_PMEM)) {
diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
index fcdf98231ffb..2928e16a62e2 100644
--- a/include/cxl/cxl.h
+++ b/include/cxl/cxl.h
@@ -234,4 +234,6 @@ int cxl_map_component_regs(const struct cxl_register_map *map,
void cxl_set_capacity(struct cxl_dev_state *cxlds, u64 capacity);
struct cxl_memdev *devm_cxl_add_memdev(struct device *host,
struct cxl_dev_state *cxlmds);
+struct cxl_port *cxl_acquire_endpoint(struct cxl_memdev *cxlmd);
+void cxl_release_endpoint(struct cxl_memdev *cxlmd, struct cxl_port *endpoint);
#endif /* __CXL_CXL_H__ */
--
2.34.1
* [PATCH v17 11/22] cxl: Define a driver interface for HPA free space enumeration
2025-06-24 14:13 [PATCH v17 00/22] Type2 device basic support alejandro.lucero-palau
` (9 preceding siblings ...)
2025-06-24 14:13 ` [PATCH v17 10/22] cx/memdev: Indicate probe deferral alejandro.lucero-palau
@ 2025-06-24 14:13 ` alejandro.lucero-palau
2025-06-27 22:42 ` Dave Jiang
2025-08-05 16:14 ` dan.j.williams
2025-06-24 14:13 ` [PATCH v17 12/22] sfc: get endpoint decoder alejandro.lucero-palau
` (12 subsequent siblings)
23 siblings, 2 replies; 112+ messages in thread
From: alejandro.lucero-palau @ 2025-06-24 14:13 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Jonathan Cameron
From: Alejandro Lucero <alucerop@amd.com>
CXL region creation involves allocating capacity from device DPA
(device-physical-address space) and assigning it to decode a given HPA
(host-physical-address space). Before determining how much DPA to
allocate, the amount of available HPA must be determined. Also, not all
HPA is created equal: some specifically targets RAM, some targets PMEM,
some is prepared for device-memory flows like HDM-D and HDM-DB, and some
is host-only (HDM-H).
In order to support Type2 CXL devices, wrap all of those concerns into
an API that retrieves a root decoder (platform CXL window) that fits the
specified constraints and the capacity available for a new region.
Add a complementary function for releasing the reference to such a root
decoder.
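The selection rules can be modelled in userspace as a sketch. The flag
bits, struct window and helpers below are hypothetical stand-ins, not the
kernel code: the kernel walks struct resource children under
cxl_region_rwsem, uses bitmap_subset() for the flag test, and also
requires the endpoint's host bridge to be a target of the window, which
this model omits. Ranges use inclusive ends, as struct resource does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits mirroring CXL_DECODER_F_RAM/PMEM/TYPE2. */
#define F_RAM   (1UL << 0)
#define F_PMEM  (1UL << 1)
#define F_TYPE2 (1UL << 2)

/* Inclusive [start, end] address range, as in struct resource. */
struct range {
	uint64_t start, end;
};

/* Hypothetical platform window: capabilities, extent, busy sub-ranges. */
struct window {
	unsigned long flags;
	struct range res;
	const struct range *busy; /* sorted by start, non-overlapping */
	size_t nbusy;
};

/* Every requested flag must be supported by the window. */
static int flags_match(unsigned long want, unsigned long have)
{
	return (want & ~have) == 0;
}

/* Largest free gap in @win; @cursor is the first address not known busy. */
static uint64_t max_free_gap(struct range win, const struct range *busy,
			     size_t n)
{
	uint64_t max = 0, cursor = win.start;

	for (size_t i = 0; i < n; i++) {
		/* gap before this busy range */
		if (busy[i].start > cursor && busy[i].start - cursor > max)
			max = busy[i].start - cursor;
		if (busy[i].end + 1 > cursor)
			cursor = busy[i].end + 1;
	}
	/* tail gap: end is inclusive, so the window stops at end + 1 */
	if (win.end + 1 > cursor && win.end + 1 - cursor > max)
		max = win.end + 1 - cursor;
	return max;
}

/* Index of the matching window with the most free space, or -1. */
static int pick_window(const struct window *w, size_t n, unsigned long want,
		       uint64_t *max_avail)
{
	int best = -1;

	*max_avail = 0;
	for (size_t i = 0; i < n; i++) {
		uint64_t gap;

		if (!flags_match(want, w[i].flags))
			continue;
		gap = max_free_gap(w[i].res, w[i].busy, w[i].nbusy);
		if (gap > *max_avail) {
			*max_avail = gap;
			best = (int)i;
		}
	}
	return best;
}
```

Like the kernel helper, this picks the matching window with the largest
single free gap, so the answer is a point-in-time snapshot that a caller
may need to re-check before using it.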
Based on https://lore.kernel.org/linux-cxl/168592159290.1948938.13522227102445462976.stgit@dwillia2-xfh.jf.intel.com/
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
drivers/cxl/core/region.c | 169 ++++++++++++++++++++++++++++++++++++++
drivers/cxl/cxl.h | 3 +
include/cxl/cxl.h | 11 +++
3 files changed, 183 insertions(+)
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index c3f4dc244df7..03e058ab697e 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -695,6 +695,175 @@ static int free_hpa(struct cxl_region *cxlr)
return 0;
}
+struct cxlrd_max_context {
+ struct device * const *host_bridges;
+ int interleave_ways;
+ unsigned long flags;
+ resource_size_t max_hpa;
+ struct cxl_root_decoder *cxlrd;
+};
+
+static int find_max_hpa(struct device *dev, void *data)
+{
+ struct cxlrd_max_context *ctx = data;
+ struct cxl_switch_decoder *cxlsd;
+ struct cxl_root_decoder *cxlrd;
+ struct resource *res, *prev;
+ struct cxl_decoder *cxld;
+ resource_size_t max;
+ int found = 0;
+
+ if (!is_root_decoder(dev))
+ return 0;
+
+ cxlrd = to_cxl_root_decoder(dev);
+ cxlsd = &cxlrd->cxlsd;
+ cxld = &cxlsd->cxld;
+
+ /*
+ * Flags are single unsigned longs. As CXL_DECODER_F_MAX is less than
+ * 32 bits, the bitmap functions can be used.
+ */
+ if (!bitmap_subset(&ctx->flags, &cxld->flags, CXL_DECODER_F_MAX)) {
+ dev_dbg(dev, "flags not matching: %08lx vs %08lx\n",
+ cxld->flags, ctx->flags);
+ return 0;
+ }
+
+ for (int i = 0; i < ctx->interleave_ways; i++) {
+ for (int j = 0; j < ctx->interleave_ways; j++) {
+ if (ctx->host_bridges[i] == cxlsd->target[j]->dport_dev) {
+ found++;
+ break;
+ }
+ }
+ }
+
+ if (found != ctx->interleave_ways) {
+ dev_dbg(dev,
+ "Not enough host bridges. Found %d for %d interleave ways requested\n",
+ found, ctx->interleave_ways);
+ return 0;
+ }
+
+ /*
+ * Walk the root decoder resource range relying on cxl_region_rwsem to
+ * preclude sibling arrival/departure and find the largest free space
+ * gap.
+ */
+ lockdep_assert_held_read(&cxl_region_rwsem);
+ res = cxlrd->res->child;
+
+ /* With no resource child the whole parent resource is available */
+ if (!res)
+ max = resource_size(cxlrd->res);
+ else
+ max = 0;
+
+ for (prev = NULL; res; prev = res, res = res->sibling) {
+ struct resource *next = res->sibling;
+ resource_size_t free = 0;
+
+ /*
+ * Skip when the previous resource is zero-sized: its end
+ * field is start - 1, which wraps to all ones when start is
+ * zero and would break the gap arithmetic below.
+ */
+ if (prev && !resource_size(prev))
+ continue;
+
+ if (!prev && res->start > cxlrd->res->start) {
+ free = res->start - cxlrd->res->start;
+ max = max(free, max);
+ }
+ if (prev && res->start > prev->end + 1) {
+ free = res->start - (prev->end + 1);
+ max = max(free, max);
+ }
+ if (next && res->end + 1 < next->start) {
+ free = next->start - (res->end + 1);
+ max = max(free, max);
+ }
+ if (!next && res->end < cxlrd->res->end) {
+ free = cxlrd->res->end - res->end;
+ max = max(free, max);
+ }
+ }
+
+ dev_dbg(CXLRD_DEV(cxlrd), "found %pa bytes of free space\n", &max);
+ if (max > ctx->max_hpa) {
+ if (ctx->cxlrd)
+ put_device(CXLRD_DEV(ctx->cxlrd));
+ get_device(CXLRD_DEV(cxlrd));
+ ctx->cxlrd = cxlrd;
+ ctx->max_hpa = max;
+ }
+ return 0;
+}
+
+/**
+ * cxl_get_hpa_freespace - find a root decoder with free capacity per constraints
+ * @cxlmd: the memdev whose endpoint requires the HPA
+ * @interleave_ways: number of interleave ways (host bridges) requested
+ * @flags: CXL_DECODER_F flags for selecting RAM vs PMEM, and Type2 device
+ * @max_avail_contig: output parameter of max contiguous bytes available in the
+ * returned decoder
+ *
+ * Return: a pointer to a struct cxl_root_decoder, or an error pointer.
+ *
+ * The returned pair of a 'struct cxl_root_decoder' and the bytes available
+ * (@max_avail_contig) is a point-in-time snapshot. If the capacity has been
+ * reduced by the time the caller goes to use this root decoder, the caller
+ * needs to loop and retry.
+ *
+ * The returned root decoder has an elevated reference count that needs to be
+ * put with cxl_put_root_decoder(cxlrd).
+ */
+struct cxl_root_decoder *cxl_get_hpa_freespace(struct cxl_memdev *cxlmd,
+ int interleave_ways,
+ unsigned long flags,
+ resource_size_t *max_avail_contig)
+{
+ struct cxl_port *endpoint = cxlmd->endpoint;
+ struct cxlrd_max_context ctx = {
+ .flags = flags,
+ };
+ struct cxl_port *root_port;
+
+ if (IS_ERR_OR_NULL(endpoint)) {
+ dev_dbg(&cxlmd->dev, "endpoint not linked to memdev\n");
+ return ERR_PTR(-ENXIO);
+ }
+ ctx.host_bridges = &endpoint->host_bridge;
+
+ struct cxl_root *root __free(put_cxl_root) = find_cxl_root(endpoint);
+ if (!root) {
+ dev_dbg(&endpoint->dev, "endpoint can not be related to a root port\n");
+ return ERR_PTR(-ENXIO);
+ }
+
+ root_port = &root->port;
+ scoped_guard(rwsem_read, &cxl_region_rwsem)
+ device_for_each_child(&root_port->dev, &ctx, find_max_hpa);
+
+ if (!ctx.cxlrd)
+ return ERR_PTR(-ENOMEM);
+
+ *max_avail_contig = ctx.max_hpa;
+ return ctx.cxlrd;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_get_hpa_freespace, "CXL");
+
+/*
+ * TODO: the reference released here should also prevent the root
+ * decoder from being unregistered while a caller holds it.
+ */
+void cxl_put_root_decoder(struct cxl_root_decoder *cxlrd)
+{
+ put_device(CXLRD_DEV(cxlrd));
+}
+EXPORT_SYMBOL_NS_GPL(cxl_put_root_decoder, "CXL");
+
static ssize_t size_store(struct device *dev, struct device_attribute *attr,
const char *buf, size_t len)
{
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index b35eff0977a8..3af8821f7c15 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -665,6 +665,9 @@ struct cxl_root_decoder *to_cxl_root_decoder(struct device *dev);
struct cxl_switch_decoder *to_cxl_switch_decoder(struct device *dev);
struct cxl_endpoint_decoder *to_cxl_endpoint_decoder(struct device *dev);
bool is_root_decoder(struct device *dev);
+
+#define CXLRD_DEV(cxlrd) (&(cxlrd)->cxlsd.cxld.dev)
+
bool is_switch_decoder(struct device *dev);
bool is_endpoint_decoder(struct device *dev);
struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
index 2928e16a62e2..dd37b1d88454 100644
--- a/include/cxl/cxl.h
+++ b/include/cxl/cxl.h
@@ -25,6 +25,11 @@ enum cxl_devtype {
struct device;
+#define CXL_DECODER_F_RAM BIT(0)
+#define CXL_DECODER_F_PMEM BIT(1)
+#define CXL_DECODER_F_TYPE2 BIT(2)
+#define CXL_DECODER_F_MAX 3
+
/*
* Using struct_group() allows for per register-block-type helper routines,
* without requiring block-type agnostic code to include the prefix.
@@ -236,4 +241,10 @@ struct cxl_memdev *devm_cxl_add_memdev(struct device *host,
struct cxl_dev_state *cxlmds);
struct cxl_port *cxl_acquire_endpoint(struct cxl_memdev *cxlmd);
void cxl_release_endpoint(struct cxl_memdev *cxlmd, struct cxl_port *endpoint);
+struct cxl_port;
+struct cxl_root_decoder *cxl_get_hpa_freespace(struct cxl_memdev *cxlmd,
+ int interleave_ways,
+ unsigned long flags,
+ resource_size_t *max);
+void cxl_put_root_decoder(struct cxl_root_decoder *cxlrd);
#endif /* __CXL_CXL_H__ */
--
2.34.1
* [PATCH v17 12/22] sfc: get endpoint decoder
2025-06-24 14:13 [PATCH v17 00/22] Type2 device basic support alejandro.lucero-palau
` (10 preceding siblings ...)
2025-06-24 14:13 ` [PATCH v17 11/22] cxl: Define a driver interface for HPA free space enumeration alejandro.lucero-palau
@ 2025-06-24 14:13 ` alejandro.lucero-palau
2025-06-27 9:10 ` Jonathan Cameron
2025-07-28 16:30 ` dan.j.williams
2025-06-24 14:13 ` [PATCH v17 13/22] cxl: Define a driver interface for DPA allocation alejandro.lucero-palau
` (11 subsequent siblings)
23 siblings, 2 replies; 112+ messages in thread
From: alejandro.lucero-palau @ 2025-06-24 14:13 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Martin Habets, Edward Cree, Jonathan Cameron
From: Alejandro Lucero <alucerop@amd.com>
Use the CXL API to find a root decoder with enough free HPA (Host Physical
Address) space for the CTPIO buffer, with the endpoint acquired while doing
so.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
drivers/net/ethernet/sfc/Kconfig | 1 +
drivers/net/ethernet/sfc/efx_cxl.c | 32 +++++++++++++++++++++++++++++-
2 files changed, 32 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/sfc/Kconfig b/drivers/net/ethernet/sfc/Kconfig
index 979f2801e2a8..e959d9b4f4ce 100644
--- a/drivers/net/ethernet/sfc/Kconfig
+++ b/drivers/net/ethernet/sfc/Kconfig
@@ -69,6 +69,7 @@ config SFC_MCDI_LOGGING
config SFC_CXL
bool "Solarflare SFC9100-family CXL support"
depends on SFC && CXL_BUS >= SFC
+ depends on CXL_REGION
default SFC
help
This enables SFC CXL support if the kernel is configuring CXL for
diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
index e2d52ed49535..c0adfd99cc78 100644
--- a/drivers/net/ethernet/sfc/efx_cxl.c
+++ b/drivers/net/ethernet/sfc/efx_cxl.c
@@ -22,6 +22,7 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
{
struct efx_nic *efx = &probe_data->efx;
struct pci_dev *pci_dev = efx->pci_dev;
+ resource_size_t max_size;
struct efx_cxl *cxl;
u16 dvsec;
int rc;
@@ -86,13 +87,42 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
return PTR_ERR(cxl->cxlmd);
}
+ cxl->endpoint = cxl_acquire_endpoint(cxl->cxlmd);
+ if (IS_ERR(cxl->endpoint))
+ return PTR_ERR(cxl->endpoint);
+
+ cxl->cxlrd = cxl_get_hpa_freespace(cxl->cxlmd, 1,
+ CXL_DECODER_F_RAM | CXL_DECODER_F_TYPE2,
+ &max_size);
+
+ if (IS_ERR(cxl->cxlrd)) {
+ pci_err(pci_dev, "cxl_get_hpa_freespace failed\n");
+ rc = PTR_ERR(cxl->cxlrd);
+ goto endpoint_release;
+ }
+
+ if (max_size < EFX_CTPIO_BUFFER_SIZE) {
+ pci_err(pci_dev, "%s: not enough free HPA space %pap < %u\n",
+ __func__, &max_size, EFX_CTPIO_BUFFER_SIZE);
+ rc = -ENOSPC;
+ goto put_root_decoder;
+ }
+
probe_data->cxl = cxl;
- return 0;
+ goto endpoint_release;
+
+put_root_decoder:
+ cxl_put_root_decoder(cxl->cxlrd);
+endpoint_release:
+ cxl_release_endpoint(cxl->cxlmd, cxl->endpoint);
+ return rc;
}
void efx_cxl_exit(struct efx_probe_data *probe_data)
{
+ if (probe_data->cxl)
+ cxl_put_root_decoder(probe_data->cxl->cxlrd);
}
MODULE_IMPORT_NS("CXL");
--
2.34.1
* [PATCH v17 13/22] cxl: Define a driver interface for DPA allocation
2025-06-24 14:13 [PATCH v17 00/22] Type2 device basic support alejandro.lucero-palau
` (11 preceding siblings ...)
2025-06-24 14:13 ` [PATCH v17 12/22] sfc: get endpoint decoder alejandro.lucero-palau
@ 2025-06-24 14:13 ` alejandro.lucero-palau
2025-06-27 9:06 ` Jonathan Cameron
2025-06-27 20:46 ` Dave Jiang
2025-06-24 14:13 ` [PATCH v17 14/22] sfc: get endpoint decoder alejandro.lucero-palau
` (10 subsequent siblings)
23 siblings, 2 replies; 112+ messages in thread
From: alejandro.lucero-palau @ 2025-06-24 14:13 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Ben Cheatham, Jonathan Cameron
From: Alejandro Lucero <alucerop@amd.com>
Region creation involves finding available DPA (device-physical-address)
capacity to map into HPA (host-physical-address) space.
In order to support CXL Type2 devices, define an API, cxl_request_dpa(),
that tries to allocate the DPA memory the driver requires to operate. The
memory requested should not be bigger than the maximum available HPA
obtained previously with cxl_get_hpa_freespace().
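As a hedged illustration, the size constraints can be checked up front.
check_dpa_request() is a hypothetical helper, not part of this series: the
256 MiB alignment mirrors the IS_ALIGNED(alloc, SZ_256M) test in
cxl_request_dpa(), and max_hpa stands for the value reported through
cxl_get_hpa_freespace().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SZ_256M (256ULL << 20)

/*
 * Hypothetical pre-flight check: the size must be a multiple of 256 MiB,
 * as cxl_request_dpa() enforces, and must not exceed the free HPA
 * previously reported for the chosen window.
 */
static int check_dpa_request(uint64_t alloc, uint64_t max_hpa)
{
	if (alloc & (SZ_256M - 1))
		return -EINVAL;
	if (alloc > max_hpa)
		return -ENOSPC;
	return 0;
}
```

The sfc driver performs the second comparison itself
(max_size < EFX_CTPIO_BUFFER_SIZE) before requesting DPA.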
Based on https://lore.kernel.org/linux-cxl/168592158743.1948938.7622563891193802610.stgit@dwillia2-xfh.jf.intel.com/
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
drivers/cxl/core/hdm.c | 93 ++++++++++++++++++++++++++++++++++++++++++
drivers/cxl/cxl.h | 2 +
include/cxl/cxl.h | 5 +++
3 files changed, 100 insertions(+)
diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index 70cae4ebf8a4..b17381e49836 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -3,6 +3,7 @@
#include <linux/seq_file.h>
#include <linux/device.h>
#include <linux/delay.h>
+#include <cxl/cxl.h>
#include "cxlmem.h"
#include "core.h"
@@ -546,6 +547,13 @@ resource_size_t cxl_dpa_resource_start(struct cxl_endpoint_decoder *cxled)
return base;
}
+/**
+ * cxl_dpa_free - release DPA (Device Physical Address)
+ *
+ * @cxled: endpoint decoder linked to the DPA
+ *
+ * Return: 0 on success or a negative error code.
+ */
int cxl_dpa_free(struct cxl_endpoint_decoder *cxled)
{
struct cxl_port *port = cxled_to_port(cxled);
@@ -572,6 +580,7 @@ int cxl_dpa_free(struct cxl_endpoint_decoder *cxled)
devm_cxl_dpa_release(cxled);
return 0;
}
+EXPORT_SYMBOL_NS_GPL(cxl_dpa_free, "CXL");
int cxl_dpa_set_part(struct cxl_endpoint_decoder *cxled,
enum cxl_partition_mode mode)
@@ -686,6 +695,90 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size)
return devm_add_action_or_reset(&port->dev, cxl_dpa_release, cxled);
}
+static int find_free_decoder(struct device *dev, const void *data)
+{
+ struct cxl_endpoint_decoder *cxled;
+ struct cxl_port *port;
+
+ if (!is_endpoint_decoder(dev))
+ return 0;
+
+ cxled = to_cxl_endpoint_decoder(dev);
+ port = cxled_to_port(cxled);
+
+ if (cxled->cxld.id != port->hdm_end + 1)
+ return 0;
+
+ return 1;
+}
+
+static struct cxl_endpoint_decoder *
+cxl_find_free_decoder(struct cxl_memdev *cxlmd)
+{
+ struct cxl_port *endpoint = cxlmd->endpoint;
+ struct device *dev;
+
+ scoped_guard(rwsem_read, &cxl_dpa_rwsem) {
+ dev = device_find_child(&endpoint->dev, NULL,
+ find_free_decoder);
+ }
+ if (dev)
+ return to_cxl_endpoint_decoder(dev);
+
+ return NULL;
+}
+
+/**
+ * cxl_request_dpa - search and reserve DPA given input constraints
+ * @cxlmd: memdev with an endpoint port with available decoders
+ * @mode: DPA operation mode (ram vs pmem)
+ * @alloc: DPA size required
+ *
+ * Given that a region needs to allocate from limited HPA capacity, a
+ * device may have more mappable DPA capacity than available HPA. The
+ * expectation is that @alloc is a driver-known value based on the
+ * device capacity, but that amount may not be available due to HPA
+ * constraints.
+ *
+ * The caller is also expected to own the lifetime of the memdev
+ * registration associated with the endpoint, so that the registered
+ * decoder stays pinned as well.
+ *
+ * Return: a pinned cxl_endpoint_decoder with at least @alloc bytes of
+ * capacity reserved, or an error pointer.
+ */
+struct cxl_endpoint_decoder *cxl_request_dpa(struct cxl_memdev *cxlmd,
+ enum cxl_partition_mode mode,
+ resource_size_t alloc)
+{
+ struct cxl_endpoint_decoder *cxled __free(put_cxled) =
+ cxl_find_free_decoder(cxlmd);
+ int rc;
+
+ /* HDM decoders map capacity in 256MB granules */
+ if (!IS_ALIGNED(alloc, SZ_256M))
+ return ERR_PTR(-EINVAL);
+
+ if (!cxled)
+ return ERR_PTR(-ENODEV);
+
+ rc = cxl_dpa_set_part(cxled, mode);
+ if (rc)
+ return ERR_PTR(rc);
+
+ rc = cxl_dpa_alloc(cxled, alloc);
+ if (rc)
+ return ERR_PTR(rc);
+
+ /*
+ * Hand the decoder reference over to the caller. Without
+ * no_free_ptr() the __free(put_cxled) cleanup would drop it
+ * on return and leave the caller with an unpinned decoder.
+ */
+ return no_free_ptr(cxled);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_request_dpa, "CXL");
+
static void cxld_set_interleave(struct cxl_decoder *cxld, u32 *ctrl)
{
u16 eig;
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 3af8821f7c15..6e724a8440f5 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -636,6 +636,8 @@ void put_cxl_root(struct cxl_root *cxl_root);
DEFINE_FREE(put_cxl_root, struct cxl_root *, if (_T) put_cxl_root(_T))
DEFINE_FREE(put_cxl_port, struct cxl_port *, if (!IS_ERR_OR_NULL(_T)) put_device(&_T->dev))
+DEFINE_FREE(put_cxled, struct cxl_endpoint_decoder *, if (_T) put_device(&_T->cxld.dev))
+
int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd);
void cxl_bus_rescan(void);
void cxl_bus_drain(void);
diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
index dd37b1d88454..a2f3e683724a 100644
--- a/include/cxl/cxl.h
+++ b/include/cxl/cxl.h
@@ -7,6 +7,7 @@
#include <linux/node.h>
#include <linux/ioport.h>
+#include <linux/range.h>
#include <cxl/mailbox.h>
/**
@@ -247,4 +248,8 @@ struct cxl_root_decoder *cxl_get_hpa_freespace(struct cxl_memdev *cxlmd,
unsigned long flags,
resource_size_t *max);
void cxl_put_root_decoder(struct cxl_root_decoder *cxlrd);
+struct cxl_endpoint_decoder *cxl_request_dpa(struct cxl_memdev *cxlmd,
+ enum cxl_partition_mode mode,
+ resource_size_t alloc);
+int cxl_dpa_free(struct cxl_endpoint_decoder *cxled);
#endif /* __CXL_CXL_H__ */
--
2.34.1
* [PATCH v17 14/22] sfc: get endpoint decoder
2025-06-24 14:13 [PATCH v17 00/22] Type2 device basic support alejandro.lucero-palau
` (12 preceding siblings ...)
2025-06-24 14:13 ` [PATCH v17 13/22] cxl: Define a driver interface for DPA allocation alejandro.lucero-palau
@ 2025-06-24 14:13 ` alejandro.lucero-palau
2025-06-27 9:11 ` Jonathan Cameron
2025-07-16 23:48 ` Dave Jiang
2025-06-24 14:13 ` [PATCH v17 15/22] cxl: Make region type based on endpoint type alejandro.lucero-palau
` (9 subsequent siblings)
23 siblings, 2 replies; 112+ messages in thread
From: alejandro.lucero-palau @ 2025-06-24 14:13 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Martin Habets, Edward Cree, Jonathan Cameron
From: Alejandro Lucero <alucerop@amd.com>
Use the CXL API to reserve DPA (Device Physical Address) capacity from the
RAM partition and obtain the endpoint decoder that maps it.
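The init/exit pairing this patch adds can be modelled in userspace as below. This is a hypothetical sketch: the struct, the model_* helpers and the -ENOSPC failure code are assumptions standing in for the CXL core calls. It shows the goto-based unwinding in efx_cxl_init() and the reverse-order release in efx_cxl_exit(), where the DPA is freed before the root decoder reference is put.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the two resources efx_cxl_init() holds here. */
struct model_cxl {
	bool root_decoder_held;	/* stands in for the cxlrd reference */
	bool dpa_reserved;	/* stands in for the cxled from cxl_request_dpa() */
	bool fail_dpa;		/* force the DPA request to fail */
	int step;		/* monotonically increasing event counter */
	int dpa_released_at;	/* step at which the DPA was freed */
	int root_released_at;	/* step at which the root decoder was put */
};

static void model_put_root_decoder(struct model_cxl *c)
{
	c->root_decoder_held = false;
	c->root_released_at = ++c->step;
}

static void model_dpa_free(struct model_cxl *c)
{
	c->dpa_reserved = false;
	c->dpa_released_at = ++c->step;
}

static int model_init(struct model_cxl *c)
{
	int rc;

	c->root_decoder_held = true;	/* root decoder acquired first */

	if (c->fail_dpa) {
		rc = -ENOSPC;		/* error code is an assumption */
		goto put_root_decoder;
	}
	c->dpa_reserved = true;		/* DPA reserved second */
	return 0;

put_root_decoder:
	model_put_root_decoder(c);
	return rc;
}

/* Release in reverse order of acquisition, as efx_cxl_exit() does. */
static void model_exit(struct model_cxl *c)
{
	model_dpa_free(c);
	model_put_root_decoder(c);
}
```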
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
drivers/net/ethernet/sfc/efx_cxl.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
index c0adfd99cc78..ffbf0e706330 100644
--- a/drivers/net/ethernet/sfc/efx_cxl.c
+++ b/drivers/net/ethernet/sfc/efx_cxl.c
@@ -108,6 +108,14 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
goto put_root_decoder;
}
+ cxl->cxled = cxl_request_dpa(cxl->cxlmd, CXL_PARTMODE_RAM,
+ EFX_CTPIO_BUFFER_SIZE);
+ if (IS_ERR(cxl->cxled)) {
+ pci_err(pci_dev, "CXL accel request DPA failed");
+ rc = PTR_ERR(cxl->cxled);
+ goto put_root_decoder;
+ }
+
probe_data->cxl = cxl;
goto endpoint_release;
@@ -121,8 +129,10 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
void efx_cxl_exit(struct efx_probe_data *probe_data)
{
- if (probe_data->cxl)
+ if (probe_data->cxl) {
+ cxl_dpa_free(probe_data->cxl->cxled);
cxl_put_root_decoder(probe_data->cxl->cxlrd);
+ }
}
MODULE_IMPORT_NS("CXL");
--
2.34.1
* [PATCH v17 15/22] cxl: Make region type based on endpoint type
2025-06-24 14:13 [PATCH v17 00/22] Type2 device basic support alejandro.lucero-palau
` (13 preceding siblings ...)
2025-06-24 14:13 ` [PATCH v17 14/22] sfc: get endpoint decoder alejandro.lucero-palau
@ 2025-06-24 14:13 ` alejandro.lucero-palau
2025-09-03 17:20 ` Davidlohr Bueso
2025-06-24 14:13 ` [PATCH v17 16/22] cxl/region: Factor out interleave ways setup alejandro.lucero-palau
` (8 subsequent siblings)
23 siblings, 1 reply; 112+ messages in thread
From: alejandro.lucero-palau @ 2025-06-24 14:13 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Zhi Wang, Jonathan Cameron, Ben Cheatham
From: Alejandro Lucero <alucerop@amd.com>
The current code expects only Type3 devices and creates every region as
CXL_DECODER_HOSTONLYMEM. Supporting Type2 devices requires deriving the
region type from the endpoint decoder's target type instead.
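The change can be pictured as one constructor that now takes the target type, with two callers choosing it differently. This is a simplified, hypothetical model: the model_* names are illustrative, and only the two enumerator names echo cxl.h (their numeric values are not reproduced). The sysfs path keeps creating host-only regions, while the discovery path inherits the endpoint decoder's target type.

```c
#include <assert.h>

/* Illustrative mirror of the decoder target types in cxl.h. */
enum model_decoder_type {
	MODEL_DECODER_DEVMEM,		/* HDM-D[B]: Type2 accelerator memory */
	MODEL_DECODER_HOSTONLYMEM,	/* HDM-H: Type3 memory expander */
};

struct model_endpoint_decoder {
	enum model_decoder_type target_type;
};

struct model_region {
	enum model_decoder_type type;
};

/* Models __create_region(): the caller now supplies the target type. */
static struct model_region model_create_region(enum model_decoder_type t)
{
	struct model_region r = { .type = t };

	return r;
}

/* Models create_region_store(): user space creates host-only regions. */
static struct model_region model_create_region_store(void)
{
	return model_create_region(MODEL_DECODER_HOSTONLYMEM);
}

/* Models construct_region(): the type follows the endpoint decoder. */
static struct model_region
model_construct_region(const struct model_endpoint_decoder *cxled)
{
	return model_create_region(cxled->target_type);
}
```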
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Zhi Wang <zhiw@nvidia.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
---
drivers/cxl/core/region.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 03e058ab697e..c8ef30db2157 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2717,7 +2717,8 @@ static ssize_t create_ram_region_show(struct device *dev,
}
static struct cxl_region *__create_region(struct cxl_root_decoder *cxlrd,
- enum cxl_partition_mode mode, int id)
+ enum cxl_partition_mode mode, int id,
+ enum cxl_decoder_type target_type)
{
int rc;
@@ -2739,7 +2740,7 @@ static struct cxl_region *__create_region(struct cxl_root_decoder *cxlrd,
return ERR_PTR(-EBUSY);
}
- return devm_cxl_add_region(cxlrd, id, mode, CXL_DECODER_HOSTONLYMEM);
+ return devm_cxl_add_region(cxlrd, id, mode, target_type);
}
static ssize_t create_region_store(struct device *dev, const char *buf,
@@ -2753,7 +2754,7 @@ static ssize_t create_region_store(struct device *dev, const char *buf,
if (rc != 1)
return -EINVAL;
- cxlr = __create_region(cxlrd, mode, id);
+ cxlr = __create_region(cxlrd, mode, id, CXL_DECODER_HOSTONLYMEM);
if (IS_ERR(cxlr))
return PTR_ERR(cxlr);
@@ -3525,7 +3526,8 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
do {
cxlr = __create_region(cxlrd, cxlds->part[part].mode,
- atomic_read(&cxlrd->region_id));
+ atomic_read(&cxlrd->region_id),
+ cxled->cxld.target_type);
} while (IS_ERR(cxlr) && PTR_ERR(cxlr) == -EBUSY);
if (IS_ERR(cxlr)) {
--
2.34.1
* [PATCH v17 16/22] cxl/region: Factor out interleave ways setup
2025-06-24 14:13 [PATCH v17 00/22] Type2 device basic support alejandro.lucero-palau
` (14 preceding siblings ...)
2025-06-24 14:13 ` [PATCH v17 15/22] cxl: Make region type based on endpoint type alejandro.lucero-palau
@ 2025-06-24 14:13 ` alejandro.lucero-palau
2025-06-27 9:13 ` Jonathan Cameron
2025-06-24 14:13 ` [PATCH v17 17/22] cxl/region: Factor out interleave granularity setup alejandro.lucero-palau
` (7 subsequent siblings)
23 siblings, 1 reply; 112+ messages in thread
From: alejandro.lucero-palau @ 2025-06-24 14:13 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Zhi Wang, Jonathan Cameron, Ben Cheatham
From: Alejandro Lucero <alucerop@amd.com>
Region creation for Type3 devices is triggered from user space, which can
combine memory from several devices through interleaving.
In preparation for kernel-driven region creation, where a Type2 driver
creates a region backed by the CXL memory it advertises, factor out a
common helper for setting interleave ways from the sysfs store handler.
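A userspace sketch of what the factored-out helper validates follows. It is hypothetical and simplified: model_ways_to_eiw() follows the CXL HDM decoder interleave-ways encoding (1, 2, 4, 8, 16 map to 0..4; 3, 6, 12 map to 8..10), and a boolean stands in for holding cxl_region_rwsem. The split mirrors the patch, where the caller owns the lock and the helper only asserts it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Encode ways: 1,2,4,8,16 -> 0..4 and 3,6,12 -> 8..10. */
static int model_ways_to_eiw(unsigned int ways, unsigned char *eiw)
{
	unsigned int base = ways, max = 16;
	unsigned char enc = 0;

	if (ways == 0)
		return -EINVAL;
	if (ways % 3 == 0) {
		base = ways / 3;
		max = 4;
		enc = 8;
	}
	if (base > max || (base & (base - 1)))
		return -EINVAL;		/* not a supported power of two */
	while (base > 1) {
		base >>= 1;
		enc++;
	}
	*eiw = enc;
	return 0;
}

struct model_region {
	bool lock_held;		/* stands in for cxl_region_rwsem held for write */
	bool interleave_active;	/* state >= CXL_CONFIG_INTERLEAVE_ACTIVE */
	unsigned int interleave_ways;
};

/* Shared helper: the caller (sysfs store or kernel path) owns the lock. */
static int model_set_interleave_ways(struct model_region *r, unsigned int ways)
{
	unsigned char eiw;
	int rc = model_ways_to_eiw(ways, &eiw);

	if (rc)
		return rc;
	assert(r->lock_held);	/* mirrors lockdep_assert_held_write() */
	if (r->interleave_active)
		return -EBUSY;
	r->interleave_ways = ways;
	return 0;
}
```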
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Zhi Wang <zhiw@nvidia.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
---
drivers/cxl/core/region.c | 46 +++++++++++++++++++++++----------------
1 file changed, 27 insertions(+), 19 deletions(-)
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index c8ef30db2157..c0ad6ff67977 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -464,22 +464,14 @@ static ssize_t interleave_ways_show(struct device *dev,
static const struct attribute_group *get_cxl_region_target_group(void);
-static ssize_t interleave_ways_store(struct device *dev,
- struct device_attribute *attr,
- const char *buf, size_t len)
+static int set_interleave_ways(struct cxl_region *cxlr, int val)
{
- struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev->parent);
+ struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld;
- struct cxl_region *cxlr = to_cxl_region(dev);
struct cxl_region_params *p = &cxlr->params;
- unsigned int val, save;
- int rc;
+ int save, rc;
u8 iw;
- rc = kstrtouint(buf, 0, &val);
- if (rc)
- return rc;
-
rc = ways_to_eiw(val, &iw);
if (rc)
return rc;
@@ -494,20 +486,36 @@ static ssize_t interleave_ways_store(struct device *dev,
return -EINVAL;
}
- rc = down_write_killable(&cxl_region_rwsem);
- if (rc)
- return rc;
- if (p->state >= CXL_CONFIG_INTERLEAVE_ACTIVE) {
- rc = -EBUSY;
- goto out;
- }
+ lockdep_assert_held_write(&cxl_region_rwsem);
+ if (p->state >= CXL_CONFIG_INTERLEAVE_ACTIVE)
+ return -EBUSY;
save = p->interleave_ways;
p->interleave_ways = val;
rc = sysfs_update_group(&cxlr->dev.kobj, get_cxl_region_target_group());
if (rc)
p->interleave_ways = save;
-out:
+
+ return rc;
+}
+
+static ssize_t interleave_ways_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t len)
+{
+ struct cxl_region *cxlr = to_cxl_region(dev);
+ unsigned int val;
+ int rc;
+
+ rc = kstrtouint(buf, 0, &val);
+ if (rc)
+ return rc;
+
+ rc = down_write_killable(&cxl_region_rwsem);
+ if (rc)
+ return rc;
+
+ rc = set_interleave_ways(cxlr, val);
up_write(&cxl_region_rwsem);
if (rc)
return rc;
--
2.34.1
* [PATCH v17 17/22] cxl/region: Factor out interleave granularity setup
2025-06-24 14:13 [PATCH v17 00/22] Type2 device basic support alejandro.lucero-palau
` (15 preceding siblings ...)
2025-06-24 14:13 ` [PATCH v17 16/22] cxl/region: Factor out interleave ways setup alejandro.lucero-palau
@ 2025-06-24 14:13 ` alejandro.lucero-palau
2025-06-24 14:13 ` [PATCH v17 18/22] cxl: Allow region creation by type2 drivers alejandro.lucero-palau
` (6 subsequent siblings)
23 siblings, 0 replies; 112+ messages in thread
From: alejandro.lucero-palau @ 2025-06-24 14:13 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Zhi Wang, Jonathan Cameron, Ben Cheatham
From: Alejandro Lucero <alucerop@amd.com>
Region creation for Type3 devices is triggered from user space, which can
combine memory from several devices through interleaving.
In preparation for kernel-driven region creation, where a Type2 driver
creates a region backed by the CXL memory it advertises, factor out a
common helper for setting interleave granularity from the sysfs store
handler.
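The granularity checks can be sketched the same way. This is a hypothetical userspace model: model_granularity_to_eig() applies the CXL encoding (256 bytes maps to 0, doubling up to 16 KiB maps to 6), and model_check_root_granularity() mirrors the cxld->interleave_ways check in set_interleave_granularity(), where an interleaving root decoder forces its own granularity on the region.

```c
#include <assert.h>
#include <errno.h>

/* Encode granularity: 256 B -> 0, 512 B -> 1, ..., 16 KiB -> 6. */
static int model_granularity_to_eig(int granularity, unsigned short *eig)
{
	unsigned short enc = 0;

	if (granularity < 256 || granularity > 16384 ||
	    (granularity & (granularity - 1)))
		return -EINVAL;
	while (granularity > 256) {
		granularity >>= 1;
		enc++;
	}
	*eig = enc;
	return 0;
}

/*
 * When the root decoder itself interleaves (ways > 1), the region must
 * use the root decoder's granularity.
 */
static int model_check_root_granularity(int root_ways, int root_gran, int val)
{
	if (root_ways > 1 && val != root_gran)
		return -EINVAL;
	return 0;
}
```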
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Zhi Wang <zhiw@nvidia.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
---
drivers/cxl/core/region.c | 39 +++++++++++++++++++++++----------------
1 file changed, 23 insertions(+), 16 deletions(-)
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index c0ad6ff67977..21cf8c11efe3 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -540,21 +540,14 @@ static ssize_t interleave_granularity_show(struct device *dev,
return rc;
}
-static ssize_t interleave_granularity_store(struct device *dev,
- struct device_attribute *attr,
- const char *buf, size_t len)
+static int set_interleave_granularity(struct cxl_region *cxlr, int val)
{
- struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev->parent);
+ struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld;
- struct cxl_region *cxlr = to_cxl_region(dev);
struct cxl_region_params *p = &cxlr->params;
- int rc, val;
+ int rc;
u16 ig;
- rc = kstrtoint(buf, 0, &val);
- if (rc)
- return rc;
-
rc = granularity_to_eig(val, &ig);
if (rc)
return rc;
@@ -570,16 +563,30 @@ static ssize_t interleave_granularity_store(struct device *dev,
if (cxld->interleave_ways > 1 && val != cxld->interleave_granularity)
return -EINVAL;
+ lockdep_assert_held_write(&cxl_region_rwsem);
+ if (p->state >= CXL_CONFIG_INTERLEAVE_ACTIVE)
+ return -EBUSY;
+
+ p->interleave_granularity = val;
+ return 0;
+}
+
+static ssize_t interleave_granularity_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t len)
+{
+ struct cxl_region *cxlr = to_cxl_region(dev);
+ int rc, val;
+
+ rc = kstrtoint(buf, 0, &val);
+ if (rc)
+ return rc;
+
rc = down_write_killable(&cxl_region_rwsem);
if (rc)
return rc;
- if (p->state >= CXL_CONFIG_INTERLEAVE_ACTIVE) {
- rc = -EBUSY;
- goto out;
- }
- p->interleave_granularity = val;
-out:
+ rc = set_interleave_granularity(cxlr, val);
up_write(&cxl_region_rwsem);
if (rc)
return rc;
--
2.34.1
* [PATCH v17 18/22] cxl: Allow region creation by type2 drivers
2025-06-24 14:13 [PATCH v17 00/22] Type2 device basic support alejandro.lucero-palau
` (16 preceding siblings ...)
2025-06-24 14:13 ` [PATCH v17 17/22] cxl/region: Factor out interleave granularity setup alejandro.lucero-palau
@ 2025-06-24 14:13 ` alejandro.lucero-palau
2025-06-27 9:32 ` Jonathan Cameron
2025-08-05 16:33 ` dan.j.williams
2025-06-24 14:13 ` [PATCH v17 19/22] cxl: Avoid dax creation for accelerators alejandro.lucero-palau
` (5 subsequent siblings)
23 siblings, 2 replies; 112+ messages in thread
From: alejandro.lucero-palau @ 2025-06-24 14:13 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Jonathan Cameron
From: Alejandro Lucero <alucerop@amd.com>
Creating a CXL region currently requires user space intervention through
the CXL sysfs files. Type2 support should allow accelerator drivers to
create such a region from kernel code.
Add that functionality and integrate it with the existing support for
memory expanders.
Allow the Type2 driver to link an action to the created region so the
resources it allocated are properly unwound.
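The sizing step in __construct_new_region() sums the DPA reserved behind every endpoint decoder to get the HPA capacity to allocate. A hypothetical userspace model of that step follows (model_* names and the -EINVAL code are assumptions). In the patch, the early exit for a decoder without dpa_res jumps to err while rc is still 0, so ERR_PTR(rc) yields NULL rather than an error pointer; the sketch returns an explicit error code instead.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

struct model_cxled {
	uint64_t dpa_size;	/* 0 models an endpoint decoder without dpa_res */
};

/*
 * Sum the DPA reserved behind each of the @ways endpoint decoders; the
 * total is the HPA capacity the region must allocate. Any decoder
 * without a reservation makes the whole request invalid.
 */
static int model_region_size(struct model_cxled *const *cxled, int ways,
			     uint64_t *size)
{
	uint64_t total = 0;
	int i;

	if (ways <= 0)
		return -EINVAL;
	for (i = 0; i < ways; i++) {
		if (!cxled[i]->dpa_size)
			return -EINVAL;	/* explicit error, never 0 */
		total += cxled[i]->dpa_size;
	}
	*size = total;
	return 0;
}
```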
Based on https://lore.kernel.org/linux-cxl/168592159835.1948938.1647215579839222774.stgit@dwillia2-xfh.jf.intel.com/
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
drivers/cxl/core/region.c | 152 ++++++++++++++++++++++++++++++++++++--
drivers/cxl/port.c | 5 +-
include/cxl/cxl.h | 5 ++
3 files changed, 153 insertions(+), 9 deletions(-)
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 21cf8c11efe3..4ca5ade54ad9 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2319,6 +2319,12 @@ static int cxl_region_detach(struct cxl_endpoint_decoder *cxled)
return rc;
}
+/**
+ * cxl_decoder_kill_region - detach a region from device
+ *
+ * @cxled: endpoint decoder to detach the region from.
+ *
+ */
void cxl_decoder_kill_region(struct cxl_endpoint_decoder *cxled)
{
down_write(&cxl_region_rwsem);
@@ -2326,6 +2332,7 @@ void cxl_decoder_kill_region(struct cxl_endpoint_decoder *cxled)
cxl_region_detach(cxled);
up_write(&cxl_region_rwsem);
}
+EXPORT_SYMBOL_NS_GPL(cxl_decoder_kill_region, "CXL");
static int attach_target(struct cxl_region *cxlr,
struct cxl_endpoint_decoder *cxled, int pos,
@@ -2825,6 +2832,14 @@ cxl_find_region_by_name(struct cxl_root_decoder *cxlrd, const char *name)
return to_cxl_region(region_dev);
}
+static void drop_region(struct cxl_region *cxlr)
+{
+ struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
+ struct cxl_port *port = cxlrd_to_port(cxlrd);
+
+ devm_release_action(port->uport_dev, unregister_region, cxlr);
+}
+
static ssize_t delete_region_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t len)
@@ -3529,14 +3544,12 @@ static int __construct_region(struct cxl_region *cxlr,
return 0;
}
-/* Establish an empty region covering the given HPA range */
-static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
- struct cxl_endpoint_decoder *cxled)
+static struct cxl_region *construct_region_begin(struct cxl_root_decoder *cxlrd,
+ struct cxl_endpoint_decoder *cxled)
{
struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
- struct cxl_port *port = cxlrd_to_port(cxlrd);
struct cxl_dev_state *cxlds = cxlmd->cxlds;
- int rc, part = READ_ONCE(cxled->part);
+ int part = READ_ONCE(cxled->part);
struct cxl_region *cxlr;
do {
@@ -3545,13 +3558,24 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
cxled->cxld.target_type);
} while (IS_ERR(cxlr) && PTR_ERR(cxlr) == -EBUSY);
- if (IS_ERR(cxlr)) {
+ if (IS_ERR(cxlr))
dev_err(cxlmd->dev.parent,
"%s:%s: %s failed assign region: %ld\n",
dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
__func__, PTR_ERR(cxlr));
- return cxlr;
- }
+
+ return cxlr;
+}
+
+/* Establish an empty region covering the given HPA range */
+static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
+ struct cxl_endpoint_decoder *cxled)
+{
+ struct cxl_port *port = cxlrd_to_port(cxlrd);
+ struct cxl_region *cxlr;
+ int rc;
+
+ cxlr = construct_region_begin(cxlrd, cxled);
rc = __construct_region(cxlr, cxlrd, cxled);
if (rc) {
@@ -3562,6 +3586,118 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
return cxlr;
}
+static struct cxl_region *
+__construct_new_region(struct cxl_root_decoder *cxlrd,
+ struct cxl_endpoint_decoder **cxled, int ways)
+{
+ struct cxl_memdev *cxlmd = cxled_to_memdev(cxled[0]);
+ struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld;
+ struct cxl_region_params *p;
+ resource_size_t size = 0;
+ struct cxl_region *cxlr;
+ int rc, i;
+
+ cxlr = construct_region_begin(cxlrd, cxled[0]);
+ if (IS_ERR(cxlr))
+ return cxlr;
+
+ guard(rwsem_write)(&cxl_region_rwsem);
+
+ /*
+ * Sanity check. This should not happen with an accel driver handling
+ * the region creation.
+ */
+ p = &cxlr->params;
+ if (p->state >= CXL_CONFIG_INTERLEAVE_ACTIVE) {
+ dev_err(cxlmd->dev.parent,
+ "%s:%s: %s unexpected region state\n",
+ dev_name(&cxlmd->dev), dev_name(&cxled[0]->cxld.dev),
+ __func__);
+ rc = -EBUSY;
+ goto err;
+ }
+
+ rc = set_interleave_ways(cxlr, ways);
+ if (rc)
+ goto err;
+
+ rc = set_interleave_granularity(cxlr, cxld->interleave_granularity);
+ if (rc)
+ goto err;
+
+ scoped_guard(rwsem_read, &cxl_dpa_rwsem) {
+ for (i = 0; i < ways; i++) {
+ if (!cxled[i]->dpa_res)
+ break;
+ size += resource_size(cxled[i]->dpa_res);
+ }
+ }
+
+ if (i < ways)
+ goto err;
+
+ rc = alloc_hpa(cxlr, size);
+ if (rc)
+ goto err;
+
+ scoped_guard(rwsem_read, &cxl_dpa_rwsem) {
+ for (i = 0; i < ways; i++) {
+ rc = cxl_region_attach(cxlr, cxled[i], 0);
+ if (rc)
+ goto err;
+ }
+ }
+
+ if (rc)
+ goto err;
+
+ rc = cxl_region_decode_commit(cxlr);
+ if (rc)
+ goto err;
+
+ p->state = CXL_CONFIG_COMMIT;
+
+ return cxlr;
+err:
+ drop_region(cxlr);
+ return ERR_PTR(rc);
+}
+
+/**
+ * cxl_create_region - Establish a region given an endpoint decoder
+ * @cxlrd: root decoder to allocate HPA
+ * @cxled: array of @ways endpoint decoders with reserved DPA capacity
+ * @ways: interleave ways required
+ *
+ * Returns a fully formed region in the commit state and attached to the
+ * cxl_region driver.
+ */
+struct cxl_region *cxl_create_region(struct cxl_root_decoder *cxlrd,
+ struct cxl_endpoint_decoder **cxled,
+ int ways, void (*action)(void *),
+ void *data)
+{
+ struct cxl_region *cxlr;
+
+ scoped_guard(mutex, &cxlrd->range_lock) {
+ cxlr = __construct_new_region(cxlrd, cxled, ways);
+ if (IS_ERR(cxlr))
+ return cxlr;
+ }
+
+ if (device_attach(&cxlr->dev) <= 0) {
+ dev_err(&cxlr->dev, "failed to create region\n");
+ drop_region(cxlr);
+ return ERR_PTR(-ENODEV);
+ }
+
+ if (action)
+ devm_add_action_or_reset(&cxlr->dev, action, data);
+
+ return cxlr;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_create_region, "CXL");
+
int cxl_add_to_region(struct cxl_port *root, struct cxl_endpoint_decoder *cxled)
{
struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
index a35fc5552845..69b8d8344029 100644
--- a/drivers/cxl/port.c
+++ b/drivers/cxl/port.c
@@ -33,6 +33,7 @@ static void schedule_detach(void *cxlmd)
static int discover_region(struct device *dev, void *root)
{
struct cxl_endpoint_decoder *cxled;
+ struct cxl_memdev *cxlmd;
int rc;
if (!is_endpoint_decoder(dev))
@@ -42,7 +43,9 @@ static int discover_region(struct device *dev, void *root)
if ((cxled->cxld.flags & CXL_DECODER_F_ENABLE) == 0)
return 0;
- if (cxled->state != CXL_DECODER_STATE_AUTO)
+ cxlmd = cxled_to_memdev(cxled);
+ if (cxled->state != CXL_DECODER_STATE_AUTO ||
+ cxlmd->cxlds->type == CXL_DEVTYPE_DEVMEM)
return 0;
/*
diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
index a2f3e683724a..5067f71143ef 100644
--- a/include/cxl/cxl.h
+++ b/include/cxl/cxl.h
@@ -252,4 +252,9 @@ struct cxl_endpoint_decoder *cxl_request_dpa(struct cxl_memdev *cxlmd,
enum cxl_partition_mode mode,
resource_size_t alloc);
int cxl_dpa_free(struct cxl_endpoint_decoder *cxled);
+struct cxl_region *cxl_create_region(struct cxl_root_decoder *cxlrd,
+ struct cxl_endpoint_decoder **cxled,
+ int ways, void (*action)(void *),
+ void *data);
+void cxl_decoder_kill_region(struct cxl_endpoint_decoder *cxled);
#endif /* __CXL_CXL_H__ */
--
2.34.1
* [PATCH v17 19/22] cxl: Avoid dax creation for accelerators
2025-06-24 14:13 [PATCH v17 00/22] Type2 device basic support alejandro.lucero-palau
` (17 preceding siblings ...)
2025-06-24 14:13 ` [PATCH v17 18/22] cxl: Allow region creation by type2 drivers alejandro.lucero-palau
@ 2025-06-24 14:13 ` alejandro.lucero-palau
2025-06-27 9:33 ` Jonathan Cameron
2025-09-03 17:24 ` Davidlohr Bueso
2025-06-24 14:13 ` [PATCH v17 20/22] sfc: create cxl region alejandro.lucero-palau
` (4 subsequent siblings)
23 siblings, 2 replies; 112+ messages in thread
From: alejandro.lucero-palau @ 2025-06-24 14:13 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero
From: Alejandro Lucero <alucerop@amd.com>
By definition, a Type2 CXL device uses its host-managed device memory for
device-specific functionality, so that memory should not be made available
for other uses such as device-dax.
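The resulting probe decision can be summarized with a small hypothetical model. The model_* enums are illustrative, and the RAM branch is assumed to end in device-dax registration, as the "Skip device-dax registration" comment in the hunk implies. Device-memory regions return early regardless of partition mode.

```c
#include <assert.h>

enum model_decoder_type { MODEL_DECODER_DEVMEM, MODEL_DECODER_HOSTONLYMEM };
enum model_partmode { MODEL_PARTMODE_RAM, MODEL_PARTMODE_PMEM };
enum model_probe_result {
	MODEL_PROBE_NO_DAX,		/* accelerator-owned, nothing registered */
	MODEL_PROBE_PMEM_REGION,	/* devm_cxl_add_pmem_region() path */
	MODEL_PROBE_DAX_REGION,		/* assumed device-dax path for RAM */
};

static enum model_probe_result
model_region_probe(enum model_decoder_type type, enum model_partmode mode)
{
	/* HDM-D[B] regions are private to the accelerator driver. */
	if (type == MODEL_DECODER_DEVMEM)
		return MODEL_PROBE_NO_DAX;

	switch (mode) {
	case MODEL_PARTMODE_PMEM:
		return MODEL_PROBE_PMEM_REGION;
	case MODEL_PARTMODE_RAM:
	default:
		return MODEL_PROBE_DAX_REGION;
	}
}
```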
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
---
drivers/cxl/core/region.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 4ca5ade54ad9..e933e4ebed1c 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -3857,6 +3857,13 @@ static int cxl_region_probe(struct device *dev)
if (rc)
return rc;
+ /*
+ * HDM-D[B] (device-memory) regions have accelerator specific usage.
+ * Skip device-dax registration.
+ */
+ if (cxlr->type == CXL_DECODER_DEVMEM)
+ return 0;
+
switch (cxlr->mode) {
case CXL_PARTMODE_PMEM:
return devm_cxl_add_pmem_region(cxlr);
--
2.34.1
* [PATCH v17 20/22] sfc: create cxl region
2025-06-24 14:13 [PATCH v17 00/22] Type2 device basic support alejandro.lucero-palau
` (18 preceding siblings ...)
2025-06-24 14:13 ` [PATCH v17 19/22] cxl: Avoid dax creation for accelerators alejandro.lucero-palau
@ 2025-06-24 14:13 ` alejandro.lucero-palau
2025-06-27 9:38 ` Jonathan Cameron
2025-07-28 16:20 ` dan.j.williams
2025-06-24 14:13 ` [PATCH v17 21/22] cxl: Add function for obtaining region range alejandro.lucero-palau
` (3 subsequent siblings)
23 siblings, 2 replies; 112+ messages in thread
From: alejandro.lucero-palau @ 2025-06-24 14:13 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero
From: Alejandro Lucero <alucerop@amd.com>
Use the CXL API to create a region from the endpoint decoder that holds the
reserved DPA range.
Add a callback that unwinds the sfc CXL initialization when the endpoint
port is destroyed, which can happen if the cxl_acpi or cxl_mem modules are
removed.
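The callback registered through cxl_create_region() receives an opaque void * and casts it back, as efx_release_cxl_region() does. The hypothetical model below (model_action stands in for a devres action) shows why the data argument must be the object the callback expects: in efx_cxl_init(), probe_data is already a struct efx_probe_data *, so passing its address would hand the callback a pointer-to-pointer. Here pd is a struct, so &pd is the correct argument.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal model of a devres action: a callback plus its opaque argument. */
struct model_action {
	void (*fn)(void *data);
	void *data;
};

struct model_probe_data {
	bool cxl_pio_initialised;
};

/* Same shape as efx_release_cxl_region(): recover the typed pointer. */
static void model_release_region(void *priv)
{
	struct model_probe_data *pd = priv;

	pd->cxl_pio_initialised = false;
}

/* Invoked when the region device goes away (e.g. cxl_mem removal). */
static void model_run_action(struct model_action *a)
{
	if (a->fn)
		a->fn(a->data);
}
```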
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
---
drivers/net/ethernet/sfc/efx_cxl.c | 24 +++++++++++++++++++++++-
1 file changed, 23 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
index ffbf0e706330..7365effe974e 100644
--- a/drivers/net/ethernet/sfc/efx_cxl.c
+++ b/drivers/net/ethernet/sfc/efx_cxl.c
@@ -18,6 +18,16 @@
#define EFX_CTPIO_BUFFER_SIZE SZ_256M
+static void efx_release_cxl_region(void *priv_cxl)
+{
+ struct efx_probe_data *probe_data = priv_cxl;
+ struct efx_cxl *cxl = probe_data->cxl;
+
+ iounmap(cxl->ctpio_cxl);
+ cxl_put_root_decoder(cxl->cxlrd);
+ probe_data->cxl_pio_initialised = false;
+}
+
int efx_cxl_init(struct efx_probe_data *probe_data)
{
struct efx_nic *efx = &probe_data->efx;
@@ -116,10 +126,21 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
goto put_root_decoder;
}
+ cxl->efx_region = cxl_create_region(cxl->cxlrd, &cxl->cxled, 1,
+ efx_release_cxl_region,
+ probe_data);
+ if (IS_ERR(cxl->efx_region)) {
+ pci_err(pci_dev, "CXL accel create region failed");
+ rc = PTR_ERR(cxl->efx_region);
+ goto err_region;
+ }
+
probe_data->cxl = cxl;
goto endpoint_release;
+err_region:
+ cxl_dpa_free(cxl->cxled);
put_root_decoder:
cxl_put_root_decoder(cxl->cxlrd);
endpoint_release:
@@ -129,7 +150,8 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
void efx_cxl_exit(struct efx_probe_data *probe_data)
{
- if (probe_data->cxl) {
+ if (probe_data->cxl_pio_initialised) {
+ cxl_decoder_kill_region(probe_data->cxl->cxled);
cxl_dpa_free(probe_data->cxl->cxled);
cxl_put_root_decoder(probe_data->cxl->cxlrd);
}
--
2.34.1
* [PATCH v17 21/22] cxl: Add function for obtaining region range
2025-06-24 14:13 [PATCH v17 00/22] Type2 device basic support alejandro.lucero-palau
` (19 preceding siblings ...)
2025-06-24 14:13 ` [PATCH v17 20/22] sfc: create cxl region alejandro.lucero-palau
@ 2025-06-24 14:13 ` alejandro.lucero-palau
2025-06-24 14:13 ` [PATCH v17 22/22] sfc: support pio mapping based on cxl alejandro.lucero-palau
` (2 subsequent siblings)
23 siblings, 0 replies; 112+ messages in thread
From: alejandro.lucero-palau @ 2025-06-24 14:13 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Zhi Wang, Jonathan Cameron
From: Alejandro Lucero <alucerop@amd.com>
A CXL region struct contains the host physical address range to work with.
Type2 drivers can create a CXL region but do not have access to that
struct, as it is private to the kernel CXL core.
Add a function that returns the CXL region range so a Type2 driver can map
that memory.
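Both struct resource and struct range use an inclusive end, so the length to map is end - start + 1, which is what the later ioremap() call in the sfc patch computes and what range_len() in linux/range.h returns. A hypothetical userspace model of the accessor and the length calculation (model_* names are illustrative):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct model_resource { uint64_t start, end; };	/* inclusive end */
struct model_range { uint64_t start, end; };	/* inclusive end */
struct model_region { struct model_resource *res; };

static int model_get_region_range(const struct model_region *region,
				  struct model_range *range)
{
	if (!region)
		return -ENODEV;
	if (!region->res)
		return -ENOSPC;		/* no HPA allocated yet */
	range->start = region->res->start;
	range->end = region->res->end;
	return 0;
}

/* Bytes to map: the end is inclusive, hence the + 1. */
static uint64_t model_range_len(const struct model_range *r)
{
	return r->end - r->start + 1;
}
```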
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Zhi Wang <zhiw@nvidia.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
---
drivers/cxl/core/region.c | 23 +++++++++++++++++++++++
include/cxl/cxl.h | 2 ++
2 files changed, 25 insertions(+)
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index e933e4ebed1c..8c624019bf9b 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2721,6 +2721,29 @@ static struct cxl_region *devm_cxl_add_region(struct cxl_root_decoder *cxlrd,
return ERR_PTR(rc);
}
+/**
+ * cxl_get_region_range - obtain range linked to a CXL region
+ *
+ * @region: a pointer to struct cxl_region
+ * @range: a pointer to a struct range to be set
+ *
+ * Returns 0 on success or a negative errno on failure.
+ */
+int cxl_get_region_range(struct cxl_region *region, struct range *range)
+{
+ if (WARN_ON_ONCE(!region))
+ return -ENODEV;
+
+ if (!region->params.res)
+ return -ENOSPC;
+
+ range->start = region->params.res->start;
+ range->end = region->params.res->end;
+
+ return 0;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_get_region_range, "CXL");
+
static ssize_t __create_region_show(struct cxl_root_decoder *cxlrd, char *buf)
{
return sysfs_emit(buf, "region%u\n", atomic_read(&cxlrd->region_id));
diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
index 5067f71143ef..5b4786a412b1 100644
--- a/include/cxl/cxl.h
+++ b/include/cxl/cxl.h
@@ -257,4 +257,6 @@ struct cxl_region *cxl_create_region(struct cxl_root_decoder *cxlrd,
int ways, void (*action)(void *),
void *data);
void cxl_decoder_kill_region(struct cxl_endpoint_decoder *cxled);
+struct range;
+int cxl_get_region_range(struct cxl_region *region, struct range *range);
#endif /* __CXL_CXL_H__ */
--
2.34.1
* [PATCH v17 22/22] sfc: support pio mapping based on cxl
2025-06-24 14:13 [PATCH v17 00/22] Type2 device basic support alejandro.lucero-palau
` (20 preceding siblings ...)
2025-06-24 14:13 ` [PATCH v17 21/22] cxl: Add function for obtaining region range alejandro.lucero-palau
@ 2025-06-24 14:13 ` alejandro.lucero-palau
2025-06-27 9:46 ` Jonathan Cameron
2025-08-27 17:26 ` ALOK TIWARI
2025-07-25 20:51 ` [PATCH v17 00/22] Type2 device basic support dan.j.williams
2025-08-27 16:48 ` PJ Waskiewicz
23 siblings, 2 replies; 112+ messages in thread
From: alejandro.lucero-palau @ 2025-06-24 14:13 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero
From: Alejandro Lucero <alucerop@amd.com>
A PIO buffer is a region of device memory to which the driver can write a
packet for TX. The device then handles the transmit doorbell without
needing DMA to fetch the packet data, which helps reduce latency in
certain exchanges. With the CXL.mem protocol this latency can be lowered
further.
When the device supports CXL and CXL initialization succeeds, use the CXL
region to map the memory range and use that mapping for PIO buffers.
Disable those CXL-based PIO buffers if the CXL core invokes the callback
signalling potential CXL endpoint removal.
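The PIO base selection in efx_ef10_dimension_resources() reduces to three small decisions, sketched below as a hypothetical userspace model. The register offset, VI stride and capability bit position are parameters because their real values (ER_DZ_TX_PIOBUF, efx->vi_stride, MC_CMD_GET_CAPABILITIES_V7_OUT_CXL_CONFIG_ENABLE_LBN) are not reproduced here, and the test values are arbitrary. The offset subtracts uc_mem_map_size because both the WC window and the CXL mapping start where the uncached part of the BAR ends.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Firmware replies shorter than the V7 layout carry no FLAGS3 word. */
static uint32_t model_caps3(size_t outlen, size_t v7_len, uint32_t flags3)
{
	return outlen < v7_len ? 0 : flags3;
}

/* Use the CXL mapping only if firmware advertises it and init succeeded. */
static bool model_use_cxl_pio(uint32_t caps3, unsigned int cxl_enable_lbn,
			      bool cxl_pio_initialised)
{
	return (caps3 & (1u << cxl_enable_lbn)) && cxl_pio_initialised;
}

/* Offset of the first PIO VI's buffer inside the mapped window. */
static uint64_t model_pio_offset(uint64_t pio_write_vi_base,
				 uint64_t vi_stride, uint64_t piobuf_reg,
				 uint64_t uc_mem_map_size)
{
	return pio_write_vi_base * vi_stride + piobuf_reg - uc_mem_map_size;
}
```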
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
---
drivers/net/ethernet/sfc/ef10.c | 62 ++++++++++++++++++++++++---
drivers/net/ethernet/sfc/efx.h | 1 +
drivers/net/ethernet/sfc/efx_cxl.c | 21 +++++++++
drivers/net/ethernet/sfc/net_driver.h | 2 +
drivers/net/ethernet/sfc/nic.h | 3 ++
5 files changed, 82 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
index 47349c148c0c..87904fff40fc 100644
--- a/drivers/net/ethernet/sfc/ef10.c
+++ b/drivers/net/ethernet/sfc/ef10.c
@@ -24,6 +24,7 @@
#include <linux/wait.h>
#include <linux/workqueue.h>
#include <net/udp_tunnel.h>
+#include "efx_cxl.h"
/* Hardware control for EF10 architecture including 'Huntington'. */
@@ -106,7 +107,7 @@ static int efx_ef10_get_vf_index(struct efx_nic *efx)
static int efx_ef10_init_datapath_caps(struct efx_nic *efx)
{
- MCDI_DECLARE_BUF(outbuf, MC_CMD_GET_CAPABILITIES_V4_OUT_LEN);
+ MCDI_DECLARE_BUF(outbuf, MC_CMD_GET_CAPABILITIES_V7_OUT_LEN);
struct efx_ef10_nic_data *nic_data = efx->nic_data;
size_t outlen;
int rc;
@@ -177,6 +178,12 @@ static int efx_ef10_init_datapath_caps(struct efx_nic *efx)
efx->num_mac_stats);
}
+ if (outlen < MC_CMD_GET_CAPABILITIES_V7_OUT_LEN)
+ nic_data->datapath_caps3 = 0;
+ else
+ nic_data->datapath_caps3 = MCDI_DWORD(outbuf,
+ GET_CAPABILITIES_V7_OUT_FLAGS3);
+
return 0;
}
@@ -771,6 +778,18 @@ static int efx_ef10_alloc_piobufs(struct efx_nic *efx, unsigned int n)
return rc;
}
+#ifdef CONFIG_SFC_CXL
+void efx_ef10_disable_piobufs(struct efx_nic *efx)
+{
+ struct efx_tx_queue *tx_queue;
+ struct efx_channel *channel;
+
+ efx_for_each_channel(channel, efx)
+ efx_for_each_channel_tx_queue(tx_queue, channel)
+ tx_queue->piobuf = NULL;
+}
+#endif
+
static int efx_ef10_link_piobufs(struct efx_nic *efx)
{
struct efx_ef10_nic_data *nic_data = efx->nic_data;
@@ -919,6 +938,9 @@ static void efx_ef10_forget_old_piobufs(struct efx_nic *efx)
static void efx_ef10_remove(struct efx_nic *efx)
{
struct efx_ef10_nic_data *nic_data = efx->nic_data;
+#ifdef CONFIG_SFC_CXL
+ struct efx_probe_data *probe_data;
+#endif
int rc;
#ifdef CONFIG_SFC_SRIOV
@@ -949,7 +971,12 @@ static void efx_ef10_remove(struct efx_nic *efx)
efx_mcdi_rx_free_indir_table(efx);
+#ifdef CONFIG_SFC_CXL
+ probe_data = container_of(efx, struct efx_probe_data, efx);
+ if (nic_data->wc_membase && !probe_data->cxl_pio_in_use)
+#else
if (nic_data->wc_membase)
+#endif
iounmap(nic_data->wc_membase);
rc = efx_mcdi_free_vis(efx);
@@ -1140,6 +1167,9 @@ static int efx_ef10_dimension_resources(struct efx_nic *efx)
unsigned int channel_vis, pio_write_vi_base, max_vis;
struct efx_ef10_nic_data *nic_data = efx->nic_data;
unsigned int uc_mem_map_size, wc_mem_map_size;
+#ifdef CONFIG_SFC_CXL
+ struct efx_probe_data *probe_data;
+#endif
void __iomem *membase;
int rc;
@@ -1263,8 +1293,25 @@ static int efx_ef10_dimension_resources(struct efx_nic *efx)
iounmap(efx->membase);
efx->membase = membase;
- /* Set up the WC mapping if needed */
- if (wc_mem_map_size) {
+ if (!wc_mem_map_size)
+ goto skip_pio;
+
+ /* Set up the WC mapping */
+
+#ifdef CONFIG_SFC_CXL
+ probe_data = container_of(efx, struct efx_probe_data, efx);
+ if ((nic_data->datapath_caps3 &
+ (1 << MC_CMD_GET_CAPABILITIES_V7_OUT_CXL_CONFIG_ENABLE_LBN)) &&
+ probe_data->cxl_pio_initialised) {
+ /* Using PIO through CXL mapping? */
+ nic_data->pio_write_base = probe_data->cxl->ctpio_cxl +
+ (pio_write_vi_base * efx->vi_stride +
+ ER_DZ_TX_PIOBUF - uc_mem_map_size);
+ probe_data->cxl_pio_in_use = true;
+ } else
+#endif
+ {
+ /* Using legacy PIO BAR mapping */
nic_data->wc_membase = ioremap_wc(efx->membase_phys +
uc_mem_map_size,
wc_mem_map_size);
@@ -1279,12 +1326,13 @@ static int efx_ef10_dimension_resources(struct efx_nic *efx)
nic_data->wc_membase +
(pio_write_vi_base * efx->vi_stride + ER_DZ_TX_PIOBUF -
uc_mem_map_size);
-
- rc = efx_ef10_link_piobufs(efx);
- if (rc)
- efx_ef10_free_piobufs(efx);
}
+ rc = efx_ef10_link_piobufs(efx);
+ if (rc)
+ efx_ef10_free_piobufs(efx);
+
+skip_pio:
netif_dbg(efx, probe, efx->net_dev,
"memory BAR at %pa (virtual %p+%x UC, %p+%x WC)\n",
&efx->membase_phys, efx->membase, uc_mem_map_size,
diff --git a/drivers/net/ethernet/sfc/efx.h b/drivers/net/ethernet/sfc/efx.h
index 45e191686625..37fd1cf96582 100644
--- a/drivers/net/ethernet/sfc/efx.h
+++ b/drivers/net/ethernet/sfc/efx.h
@@ -237,4 +237,5 @@ static inline bool efx_rwsem_assert_write_locked(struct rw_semaphore *sem)
int efx_xdp_tx_buffers(struct efx_nic *efx, int n, struct xdp_frame **xdpfs,
bool flush);
+void efx_ef10_disable_piobufs(struct efx_nic *efx);
#endif /* EFX_EFX_H */
diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
index 7365effe974e..a9f48946dcf5 100644
--- a/drivers/net/ethernet/sfc/efx_cxl.c
+++ b/drivers/net/ethernet/sfc/efx_cxl.c
@@ -15,14 +15,17 @@
#include "net_driver.h"
#include "efx_cxl.h"
+#include "efx.h"
#define EFX_CTPIO_BUFFER_SIZE SZ_256M
static void efx_release_cxl_region(void *priv_cxl)
{
struct efx_probe_data *probe_data = priv_cxl;
+ struct efx_nic *efx = &probe_data->efx;
struct efx_cxl *cxl = probe_data->cxl;
+ efx_ef10_disable_piobufs(efx);
iounmap(cxl->ctpio_cxl);
cxl_put_root_decoder(cxl->cxlrd);
probe_data->cxl_pio_initialised = false;
@@ -34,6 +37,7 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
struct pci_dev *pci_dev = efx->pci_dev;
resource_size_t max_size;
struct efx_cxl *cxl;
+ struct range range;
u16 dvsec;
int rc;
@@ -135,10 +139,26 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
goto err_region;
}
+ rc = cxl_get_region_range(cxl->efx_region, &range);
+ if (rc) {
+ pci_err(pci_dev, "CXL getting region params failed\n");
+ goto err_region_params;
+ }
+
+ cxl->ctpio_cxl = ioremap(range.start, range.end - range.start + 1);
+ if (!cxl->ctpio_cxl) {
+ pci_err(pci_dev, "CXL ioremap region (%pra) failed\n", &range);
+ rc = -ENOMEM;
+ goto err_region_params;
+ }
+
probe_data->cxl = cxl;
+ probe_data->cxl_pio_initialised = true;
goto endpoint_release;
+err_region_params:
+ cxl_decoder_kill_region(cxl->cxled);
err_region:
cxl_dpa_free(cxl->cxled);
put_root_decoder:
@@ -151,6 +171,7 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
void efx_cxl_exit(struct efx_probe_data *probe_data)
{
if (probe_data->cxl_pio_initialised) {
+ iounmap(probe_data->cxl->ctpio_cxl);
cxl_decoder_kill_region(probe_data->cxl->cxled);
cxl_dpa_free(probe_data->cxl->cxled);
cxl_put_root_decoder(probe_data->cxl->cxlrd);
diff --git a/drivers/net/ethernet/sfc/net_driver.h b/drivers/net/ethernet/sfc/net_driver.h
index 0e685b8a9980..894b62d6ada9 100644
--- a/drivers/net/ethernet/sfc/net_driver.h
+++ b/drivers/net/ethernet/sfc/net_driver.h
@@ -1209,6 +1209,7 @@ struct efx_cxl;
* @efx: Efx NIC details
* @cxl: details of related cxl objects
* @cxl_pio_initialised: cxl initialization outcome.
+ * @cxl_pio_in_use: PIO buffers are using the CXL mapping
*/
struct efx_probe_data {
struct pci_dev *pci_dev;
@@ -1216,6 +1217,7 @@ struct efx_probe_data {
#ifdef CONFIG_SFC_CXL
struct efx_cxl *cxl;
bool cxl_pio_initialised;
+ bool cxl_pio_in_use;
#endif
};
diff --git a/drivers/net/ethernet/sfc/nic.h b/drivers/net/ethernet/sfc/nic.h
index 9fa5c4c713ab..c87cc9214690 100644
--- a/drivers/net/ethernet/sfc/nic.h
+++ b/drivers/net/ethernet/sfc/nic.h
@@ -152,6 +152,8 @@ enum {
* %MC_CMD_GET_CAPABILITIES response)
* @datapath_caps2: Further Capabilities of datapath firmware (FLAGS2 field of
* %MC_CMD_GET_CAPABILITIES response)
+ * @datapath_caps3: Further Capabilities of datapath firmware (FLAGS3 field of
+ * %MC_CMD_GET_CAPABILITIES response)
* @rx_dpcpu_fw_id: Firmware ID of the RxDPCPU
* @tx_dpcpu_fw_id: Firmware ID of the TxDPCPU
* @must_probe_vswitching: Flag: vswitching has yet to be setup after MC reboot
@@ -186,6 +188,7 @@ struct efx_ef10_nic_data {
bool must_check_datapath_caps;
u32 datapath_caps;
u32 datapath_caps2;
+ u32 datapath_caps3;
unsigned int rx_dpcpu_fw_id;
unsigned int tx_dpcpu_fw_id;
bool must_probe_vswitching;
--
2.34.1
^ permalink raw reply related [flat|nested] 112+ messages in thread
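A worked sketch of the address arithmetic in the patch above, written as a small userspace C model (the numbers in the tests and usage note are illustrative, not real EF10 constants, and the function names are hypothetical). It shows two things: `struct range` in the kernel carries an inclusive `end`, which is why the ioremap() length is `range.end - range.start + 1` (the kernel's range_len() helper in <linux/range.h> computes the same value); and pio_write_base is offset into a mapping that begins uc_mem_map_size bytes into the VI register space. The patch applies the same offset to ctpio_cxl as to wc_membase, so this model assumes, as that code implies, that the CXL region mirrors the layout of the WC part of the BAR.

```c
#include <assert.h>
#include <stdint.h>

/* Byte length of an inclusive [start, end] range, as with the kernel's
 * struct range and range_len(). */
static uint64_t inclusive_range_len(uint64_t start, uint64_t end)
{
	assert(end >= start);
	return end - start + 1;
}

/* Offset of the first PIO buffer from the start of the WC (or CXL) mapping.
 * VI i's registers sit at i * vi_stride from the start of the register
 * space, with the TX PIO buffer window at txpiobuf_reg inside that. The
 * mapping itself begins uc_mem_map_size bytes in, so that prefix is
 * subtracted. */
static uint64_t pio_write_offset(uint64_t pio_write_vi_base, uint64_t vi_stride,
				 uint64_t txpiobuf_reg, uint64_t uc_mem_map_size)
{
	uint64_t abs_off = pio_write_vi_base * vi_stride + txpiobuf_reg;

	/* The PIO VIs must live past the UC part of the register space. */
	assert(abs_off >= uc_mem_map_size);
	return abs_off - uc_mem_map_size;
}
```

For example, a 256 MiB region from 0x100000000 to 0x10fffffff has length 0x10000000 (SZ_256M), and with pio_write_vi_base = 4, vi_stride = 8192, a PIO register offset of 0x1000 and uc_mem_map_size = 32768, the PIO buffers start 4096 bytes into the mapping.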
* Re: [PATCH v17 01/22] cxl: Add type2 device basic support
2025-06-24 14:13 ` [PATCH v17 01/22] cxl: Add type2 " alejandro.lucero-palau
@ 2025-06-25 14:06 ` Jonathan Cameron
2025-06-30 14:38 ` Alejandro Lucero Palau
2025-07-25 21:46 ` dan.j.williams
1 sibling, 1 reply; 112+ messages in thread
From: Jonathan Cameron @ 2025-06-25 14:06 UTC (permalink / raw)
To: alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang, Alejandro Lucero, Alison Schofield
On Tue, 24 Jun 2025 15:13:34 +0100
<alejandro.lucero-palau@amd.com> wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Differentiate CXL memory expanders (type 3) from CXL device accelerators
> (type 2) with a new function for initializing cxl_dev_state and a macro
> for helping accel drivers to embed cxl_dev_state inside a private
> struct.
>
> Move structs to include/cxl as the size of the accel driver private
> struct embedding cxl_dev_state needs to know the size of this struct.
>
> Use same new initialization with the type3 pci driver.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Hi Alejandro,
A few really minor comments inline.
> ---
> drivers/cxl/core/mbox.c | 12 +-
> drivers/cxl/core/memdev.c | 32 +++++
> drivers/cxl/core/pci.c | 1 +
> drivers/cxl/core/regs.c | 1 +
> drivers/cxl/cxl.h | 97 +--------------
> drivers/cxl/cxlmem.h | 85 +------------
> drivers/cxl/cxlpci.h | 21 ----
> drivers/cxl/pci.c | 17 +--
> include/cxl/cxl.h | 226 +++++++++++++++++++++++++++++++++++
> include/cxl/pci.h | 23 ++++
> tools/testing/cxl/test/mem.c | 3 +-
> 11 files changed, 303 insertions(+), 215 deletions(-)
> create mode 100644 include/cxl/cxl.h
> create mode 100644 include/cxl/pci.h
>
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index d72764056ce6..d78f6039f997 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -1484,23 +1484,21 @@ int cxl_mailbox_init(struct cxl_mailbox *cxl_mbox, struct device *host)
> }
> EXPORT_SYMBOL_NS_GPL(cxl_mailbox_init, "CXL");
>
> -struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev)
> +struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev, u64 serial,
> + u16 dvsec)
> {
> struct cxl_memdev_state *mds;
> int rc;
>
> - mds = devm_kzalloc(dev, sizeof(*mds), GFP_KERNEL);
> + mds = devm_cxl_dev_state_create(dev, CXL_DEVTYPE_CLASSMEM, serial,
> + dvsec, struct cxl_memdev_state, cxlds,
> + true);
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 3ec6b906371b..9cc4337cacfb 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -7,6 +7,7 @@
> #include <linux/cdev.h>
> #include <linux/uuid.h>
> #include <linux/node.h>
Is node still used in here? If the include was just for
struct access_coordinates, that usage is now gone from this file.
> +#include <cxl/cxl.h>
> #include <cxl/event.h>
> #include <cxl/mailbox.h>
> #include "cxl.h"
> @@ -357,87 +358,6 @@ struct cxl_security_state {
> struct kernfs_node *sanitize_node;
> };
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 785aa2af5eaa..0d3c67867965 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -924,19 +927,19 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> return rc;
> pci_set_master(pdev);
>
> - mds = cxl_memdev_state_create(&pdev->dev);
> + dvsec = pci_find_dvsec_capability(pdev, PCI_VENDOR_ID_CXL,
> + CXL_DVSEC_PCIE_DEVICE);
> + if (!dvsec)
> + dev_warn(&pdev->dev,
> + "Device DVSEC not present, skip CXL.mem init\n");
Could use pci_warn(pdev, "...") here. Not particularly important.
> +
> + mds = cxl_memdev_state_create(&pdev->dev, pci_get_dsn(pdev), dvsec);
Jonathan
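Patch 01 lets an accelerator embed struct cxl_dev_state inside its own private struct (as efx_cxl does with its cxlds member) and get a pointer to the outer struct back. Embedding needs the full definition of struct cxl_dev_state to be visible to the accelerator driver, which is why the patch moves it to include/cxl. A userspace sketch of the pattern follows; dev_state, accel_priv, dev_state_alloc() and dev_state_create() are hypothetical names, calloc() stands in for devm allocation, and the real devm_cxl_dev_state_create() signature and any checks it performs are not reproduced here. The core only needs sizeof() and offsetof() of the driver struct to allocate and initialise the embedded member.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, trimmed stand-in for struct cxl_dev_state. */
struct dev_state {
	uint64_t serial;
	uint16_t dvsec;
	int has_mbox;
};

/* Hypothetical accelerator-private struct embedding the core state;
 * deliberately not the first member so container_of() does real work. */
struct accel_priv {
	int queue_count;
	struct dev_state cxlds;
};

/* Allocate the whole driver struct and initialise only the embedded core
 * state, located by its byte offset. */
static void *dev_state_alloc(size_t drv_size, size_t member_off,
			     uint64_t serial, uint16_t dvsec, int has_mbox)
{
	char *drv = calloc(1, drv_size);
	struct dev_state *ds;

	if (!drv)
		return NULL;
	ds = (struct dev_state *)(drv + member_off);
	ds->serial = serial;
	ds->dvsec = dvsec;
	ds->has_mbox = has_mbox;
	return drv;
}

/* Wrapper: callers name their struct type and the embedded member. */
#define dev_state_create(drv_type, member, serial, dvsec, mbox)		\
	((drv_type *)dev_state_alloc(sizeof(drv_type),			\
				     offsetof(drv_type, member),	\
				     (serial), (dvsec), (mbox)))
```

A caller writes `dev_state_create(struct accel_priv, cxlds, serial, dvsec, 0)` and gets a zeroed struct accel_priv whose cxlds is initialised; container_of() recovers the outer struct from a pointer to the member.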
* Re: [PATCH v17 02/22] sfc: add cxl support
2025-06-24 14:13 ` [PATCH v17 02/22] sfc: add cxl support alejandro.lucero-palau
@ 2025-06-25 16:37 ` Jonathan Cameron
2025-06-30 14:52 ` Alejandro Lucero Palau
2025-07-25 22:16 ` dan.j.williams
1 sibling, 1 reply; 112+ messages in thread
From: Jonathan Cameron @ 2025-06-25 16:37 UTC (permalink / raw)
To: alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang, Alejandro Lucero, Edward Cree,
Alison Schofield
On Tue, 24 Jun 2025 15:13:35 +0100
<alejandro.lucero-palau@amd.com> wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Add CXL initialization based on new CXL API for accel drivers and make
> it dependent on kernel CXL configuration.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Acked-by: Edward Cree <ecree.xilinx@gmail.com>
> Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Hi Alejandro,
I think I'm missing something with respect to the relative lifetimes.
Throwing one devm_ call into the middle of a probe is normally a recipe
for hard-to-read code at the very least, if not actual bugs. It should be
done with care and accompanied by at least a comment.
Jonathan
> ---
> drivers/net/ethernet/sfc/Kconfig | 9 +++++
> drivers/net/ethernet/sfc/Makefile | 1 +
> drivers/net/ethernet/sfc/efx.c | 15 +++++++-
> drivers/net/ethernet/sfc/efx_cxl.c | 55 +++++++++++++++++++++++++++
> drivers/net/ethernet/sfc/efx_cxl.h | 40 +++++++++++++++++++
> drivers/net/ethernet/sfc/net_driver.h | 10 +++++
> 6 files changed, 129 insertions(+), 1 deletion(-)
> create mode 100644 drivers/net/ethernet/sfc/efx_cxl.c
> create mode 100644 drivers/net/ethernet/sfc/efx_cxl.h
>
> diff --git a/drivers/net/ethernet/sfc/Kconfig b/drivers/net/ethernet/sfc/Kconfig
> index c4c43434f314..979f2801e2a8 100644
> --- a/drivers/net/ethernet/sfc/Kconfig
> +++ b/drivers/net/ethernet/sfc/Kconfig
> @@ -66,6 +66,15 @@ config SFC_MCDI_LOGGING
> Driver-Interface) commands and responses, allowing debugging of
> driver/firmware interaction. The tracing is actually enabled by
> a sysfs file 'mcdi_logging' under the PCI device.
> +config SFC_CXL
> + bool "Solarflare SFC9100-family CXL support"
> + depends on SFC && CXL_BUS >= SFC
> + default SFC
> + help
> + This enables SFC CXL support when the kernel is configured with CXL,
> + allowing CTPIO to use CXL.mem. An SFC device with CXL support and
> + CXL-aware firmware can use this to minimize latency when sending
> + through CTPIO.
>
> source "drivers/net/ethernet/sfc/falcon/Kconfig"
> source "drivers/net/ethernet/sfc/siena/Kconfig"
> diff --git a/drivers/net/ethernet/sfc/Makefile b/drivers/net/ethernet/sfc/Makefile
> index d99039ec468d..bb0f1891cde6 100644
> --- a/drivers/net/ethernet/sfc/Makefile
> +++ b/drivers/net/ethernet/sfc/Makefile
> @@ -13,6 +13,7 @@ sfc-$(CONFIG_SFC_SRIOV) += sriov.o ef10_sriov.o ef100_sriov.o ef100_rep.o \
> mae.o tc.o tc_bindings.o tc_counters.o \
> tc_encap_actions.o tc_conntrack.o
>
> +sfc-$(CONFIG_SFC_CXL) += efx_cxl.o
> obj-$(CONFIG_SFC) += sfc.o
>
> obj-$(CONFIG_SFC_FALCON) += falcon/
> diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c
> index 112e55b98ed3..537668278375 100644
> --- a/drivers/net/ethernet/sfc/efx.c
> +++ b/drivers/net/ethernet/sfc/efx.c
> @@ -34,6 +34,7 @@
> #include "selftest.h"
> #include "sriov.h"
> #include "efx_devlink.h"
> +#include "efx_cxl.h"
>
> #include "mcdi_port_common.h"
> #include "mcdi_pcol.h"
> @@ -981,12 +982,15 @@ static void efx_pci_remove(struct pci_dev *pci_dev)
> efx_pci_remove_main(efx);
>
> efx_fini_io(efx);
> +
> + probe_data = container_of(efx, struct efx_probe_data, efx);
> + efx_cxl_exit(probe_data);
> +
> pci_dbg(efx->pci_dev, "shutdown successful\n");
>
> efx_fini_devlink_and_unlock(efx);
> efx_fini_struct(efx);
> free_netdev(efx->net_dev);
> - probe_data = container_of(efx, struct efx_probe_data, efx);
> kfree(probe_data);
> };
>
> @@ -1190,6 +1194,15 @@ static int efx_pci_probe(struct pci_dev *pci_dev,
> if (rc)
> goto fail2;
>
> + /* A successful cxl initialization implies a CXL region has been created
> + * for PIO buffers. If there is no CXL support, or initialization fails,
> + * probe_data->cxl_pio_initialised will be false and legacy PIO buffers
> + * defined at specific PCI BAR regions will be used.
> + */
> + rc = efx_cxl_init(probe_data);
> + if (rc)
> + pci_err(pci_dev, "CXL initialization failed with error %d\n", rc);
> +
> rc = efx_pci_probe_post_io(efx);
> if (rc) {
> /* On failure, retry once immediately.
> diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
> new file mode 100644
> index 000000000000..f1db7284dee8
> --- /dev/null
> +++ b/drivers/net/ethernet/sfc/efx_cxl.c
> @@ -0,0 +1,55 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/****************************************************************************
> + *
> + * Driver for AMD network controllers and boards
> + * Copyright (C) 2025, Advanced Micro Devices, Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License version 2 as published
> + * by the Free Software Foundation, incorporated herein by reference.
> + */
> +
> +#include <cxl/pci.h>
> +#include <linux/pci.h>
> +
> +#include "net_driver.h"
> +#include "efx_cxl.h"
> +
> +#define EFX_CTPIO_BUFFER_SIZE SZ_256M
> +
> +int efx_cxl_init(struct efx_probe_data *probe_data)
> +{
> + struct efx_nic *efx = &probe_data->efx;
> + struct pci_dev *pci_dev = efx->pci_dev;
> + struct efx_cxl *cxl;
> + u16 dvsec;
> +
> + probe_data->cxl_pio_initialised = false;
> +
> + dvsec = pci_find_dvsec_capability(pci_dev, PCI_VENDOR_ID_CXL,
> + CXL_DVSEC_PCIE_DEVICE);
> + if (!dvsec)
> + return 0;
> +
> + pci_dbg(pci_dev, "CXL_DVSEC_PCIE_DEVICE capability found\n");
> +
> + /* Create a cxl_dev_state embedded in the cxl struct using the cxl core
> + * API, specifying that no mailbox is available.
> + */
> + cxl = devm_cxl_dev_state_create(&pci_dev->dev, CXL_DEVTYPE_DEVMEM,
> + pci_dev->dev.id, dvsec, struct efx_cxl,
> + cxlds, false);
The lifetime of this will outlast everything else in the efx driver.
Is that definitely safe? We mostly avoid such late releasing of resources
because it makes the code hard to review and reason about.
Perhaps add to the comment before this call what you are doing to ensure
that it is fine to release this after everything in efx_pci_remove().
Or wrap it up in a devres group and release that group in efx_cxl_exit().
See devres_open_group() and devres_release_group().
> +
> + if (!cxl)
> + return -ENOMEM;
> +
> + probe_data->cxl = cxl;
> +
> + return 0;
> +}
> +
> +void efx_cxl_exit(struct efx_probe_data *probe_data)
> +{
> +}
> +
> +MODULE_IMPORT_NS("CXL");
> diff --git a/drivers/net/ethernet/sfc/efx_cxl.h b/drivers/net/ethernet/sfc/efx_cxl.h
> new file mode 100644
> index 000000000000..961639cef692
> --- /dev/null
> +++ b/drivers/net/ethernet/sfc/efx_cxl.h
> @@ -0,0 +1,40 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/****************************************************************************
> + * Driver for AMD network controllers and boards
> + * Copyright (C) 2025, Advanced Micro Devices, Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License version 2 as published
> + * by the Free Software Foundation, incorporated herein by reference.
> + */
> +
> +#ifndef EFX_CXL_H
> +#define EFX_CXL_H
> +
> +#ifdef CONFIG_SFC_CXL
> +
> +#include <cxl/cxl.h>
> +
> +struct cxl_root_decoder;
> +struct cxl_port;
> +struct cxl_endpoint_decoder;
> +struct cxl_region;
> +struct efx_probe_data;
> +
> +struct efx_cxl {
> + struct cxl_dev_state cxlds;
> + struct cxl_memdev *cxlmd;
> + struct cxl_root_decoder *cxlrd;
> + struct cxl_port *endpoint;
> + struct cxl_endpoint_decoder *cxled;
> + struct cxl_region *efx_region;
> + void __iomem *ctpio_cxl;
> +};
> +
> +int efx_cxl_init(struct efx_probe_data *probe_data);
> +void efx_cxl_exit(struct efx_probe_data *probe_data);
> +#else
> +static inline int efx_cxl_init(struct efx_probe_data *probe_data) { return 0; }
> +static inline void efx_cxl_exit(struct efx_probe_data *probe_data) {}
> +#endif
> +#endif
> diff --git a/drivers/net/ethernet/sfc/net_driver.h b/drivers/net/ethernet/sfc/net_driver.h
> index 5c0f306fb019..0e685b8a9980 100644
> --- a/drivers/net/ethernet/sfc/net_driver.h
> +++ b/drivers/net/ethernet/sfc/net_driver.h
> @@ -1199,14 +1199,24 @@ struct efx_nic {
> atomic_t n_rx_noskb_drops;
> };
>
> +#ifdef CONFIG_SFC_CXL
> +struct efx_cxl;
> +#endif
> +
> /**
> * struct efx_probe_data - State after hardware probe
> * @pci_dev: The PCI device
> * @efx: Efx NIC details
> + * @cxl: details of related cxl objects
> + * @cxl_pio_initialised: cxl initialization outcome.
> */
> struct efx_probe_data {
> struct pci_dev *pci_dev;
> struct efx_nic efx;
> +#ifdef CONFIG_SFC_CXL
> + struct efx_cxl *cxl;
> + bool cxl_pio_initialised;
> +#endif
> };
>
> static inline struct efx_nic *efx_netdev_priv(struct net_device *dev)
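The devres group suggestion above would collect the CXL devm resources so that efx_cxl_init() can tear them down on error and efx_cxl_exit() can release them before the rest of the efx teardown, rather than leaving them until the device is unbound. In the kernel that is devres_open_group() and devres_release_group() (with devres_close_group() to stop adding to an open group). The userspace model below uses hypothetical names (res_list, res_add(), res_open_group(), res_release_group(), res_release_all()) and only illustrates the semantics being relied on: resources are released newest first, and releasing a group touches only what was added after its marker.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RES 16

typedef void (*release_fn)(void *data);

/* One tracked resource; release == NULL marks a group boundary. */
struct res_entry {
	release_fn release;
	void *data;
	const void *group_id;
};

/* Per-device resource list, newest entry last. */
struct res_list {
	struct res_entry e[MAX_RES];
	int n;
};

static int release_log[MAX_RES];
static int release_count;

/* Example release action: record which resource was freed. */
static void record(void *data)
{
	release_log[release_count++] = *(int *)data;
}

static int res_push(struct res_list *l, release_fn fn, void *data,
		    const void *gid)
{
	if (l->n == MAX_RES)
		return -1;
	l->e[l->n].release = fn;
	l->e[l->n].data = data;
	l->e[l->n].group_id = gid;
	l->n++;
	return 0;
}

/* Roughly devm_add_action(): track a release callback. */
static int res_add(struct res_list *l, release_fn fn, void *data)
{
	assert(fn != NULL);
	return res_push(l, fn, data, NULL);
}

/* Roughly devres_open_group(): push a marker; later entries belong to it. */
static int res_open_group(struct res_list *l, const void *id)
{
	assert(id != NULL);
	return res_push(l, NULL, NULL, id);
}

/* Roughly devres_release_group(): release, newest first, everything added
 * after the group's marker, then drop the marker. -1 if not found. */
static int res_release_group(struct res_list *l, const void *id)
{
	int i, marker = -1;

	for (i = l->n - 1; i >= 0; i--) {
		if (!l->e[i].release && l->e[i].group_id == id) {
			marker = i;
			break;
		}
	}
	if (marker < 0)
		return -1;
	for (i = l->n - 1; i > marker; i--)
		if (l->e[i].release)
			l->e[i].release(l->e[i].data);
	l->n = marker;
	return 0;
}

/* What driver unbind does: release everything, newest first. */
static void res_release_all(struct res_list *l)
{
	while (l->n > 0) {
		struct res_entry *e = &l->e[--l->n];

		if (e->release)
			e->release(e->data);
	}
}
```

With one resource added before the group and two inside it, releasing the group frees the two inner ones in reverse order and leaves the outer one for unbind, which is the ordering efx_cxl_exit() would want.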
* Re: [PATCH v17 04/22] cxl: allow Type2 drivers to map cxl component regs
2025-06-24 14:13 ` [PATCH v17 04/22] cxl: allow Type2 drivers to map cxl component regs alejandro.lucero-palau
@ 2025-06-27 8:27 ` Jonathan Cameron
2025-07-25 22:55 ` dan.j.williams
1 sibling, 0 replies; 112+ messages in thread
From: Jonathan Cameron @ 2025-06-27 8:27 UTC (permalink / raw)
To: alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang, Alejandro Lucero
On Tue, 24 Jun 2025 15:13:37 +0100
alejandro.lucero-palau@amd.com wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Export cxl core functions so that a Type2 driver can discover and
> map the device component registers.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
I'll see how this is used in later patches, but on its own it looks reasonable to me.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
* Re: [PATCH v17 05/22] sfc: setup cxl component regs and set media ready
2025-06-24 14:13 ` [PATCH v17 05/22] sfc: setup cxl component regs and set media ready alejandro.lucero-palau
@ 2025-06-27 8:39 ` Jonathan Cameron
2025-06-30 15:57 ` Alejandro Lucero Palau
2025-06-27 8:45 ` Jonathan Cameron
2025-07-25 23:04 ` dan.j.williams
2 siblings, 1 reply; 112+ messages in thread
From: Jonathan Cameron @ 2025-06-27 8:39 UTC (permalink / raw)
To: alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang, Alejandro Lucero
On Tue, 24 Jun 2025 15:13:38 +0100
alejandro.lucero-palau@amd.com wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Use cxl code for registers discovery and mapping regarding cxl component
> regs and validate registers found are as expected.
>
> Set media ready explicitly as there is no means for doing so without
> a mailbox, and without the related cxl register, not mandatory for type2.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Perhaps add a brief note to the description on why you chose warn vs
err messages for the different conditions.
Superficially, there is a call in here that can defer. If it can't,
add a comment explaining why. If it can, you should be failing the main
driver probe until it no longer defers (or adding a bunch of descriptive
comments on why that doesn't make sense!)
> ---
> drivers/net/ethernet/sfc/efx_cxl.c | 34 ++++++++++++++++++++++++++++++
> 1 file changed, 34 insertions(+)
>
> diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
> index f1db7284dee8..ea02eb82b73c 100644
> --- a/drivers/net/ethernet/sfc/efx_cxl.c
> +++ b/drivers/net/ethernet/sfc/efx_cxl.c
> @@ -9,6 +9,7 @@
> * by the Free Software Foundation, incorporated herein by reference.
> */
>
> +#include <cxl/cxl.h>
> #include <cxl/pci.h>
> #include <linux/pci.h>
>
> @@ -23,6 +24,7 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
> struct pci_dev *pci_dev = efx->pci_dev;
> struct efx_cxl *cxl;
> u16 dvsec;
> + int rc;
>
> probe_data->cxl_pio_initialised = false;
>
> @@ -43,6 +45,38 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
> if (!cxl)
> return -ENOMEM;
>
> + rc = cxl_pci_setup_regs(pci_dev, CXL_REGLOC_RBI_COMPONENT,
> + &cxl->cxlds.reg_map);
> + if (rc) {
> + dev_warn(&pci_dev->dev, "No component registers (err=%d)\n", rc);
> + return rc;
I haven't checked the code paths to see if we might hit them, but this
might defer. In that case, return dev_err_probe() is appropriate, as it
stashes away the cause of the deferral for debugging purposes and doesn't
print if deferral is what happened, since we'll be back later.
If we can hit the deferral, then you should catch that in the caller of
efx_cxl_init() and fail the probe (we'll be back a bit later and should
then succeed).
> + }
> +
> + if (!cxl->cxlds.reg_map.component_map.hdm_decoder.valid) {
> + dev_err(&pci_dev->dev, "Expected HDM component register not found\n");
> + return -ENODEV;
Trivial, but given this is new code, maybe differ from the style of
existing sfc and use
return dev_err_probe(&pci_dev->dev, -ENODEV, "Expected HDM component register not found\n");
That would be a nice-to-have. Given deferral isn't a thing for this call,
it just saves about 2 lines of code for each use.
Or use pci_err() and pci_warn()?
> + }
> +
> + if (!cxl->cxlds.reg_map.component_map.ras.valid) {
> + dev_err(&pci_dev->dev, "Expected RAS component register not found\n");
> + return -ENODEV;
> + }
> +
> + rc = cxl_map_component_regs(&cxl->cxlds.reg_map,
> + &cxl->cxlds.regs.component,
> + BIT(CXL_CM_CAP_CAP_ID_RAS));
> + if (rc) {
> + dev_err(&pci_dev->dev, "Failed to map RAS capability.\n");
> + return rc;
> + }
> +
> + /*
> + * Set media ready explicitly as there is neither a mailbox for checking
> + * this state nor the related CXL register; neither is mandatory for
> + * type2.
> + */
> + cxl->cxlds.media_ready = true;
> +
> probe_data->cxl = cxl;
>
> return 0;
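The deferral policy described above can be modeled in a few lines: a -EPROBE_DEFER from the optional CXL setup fails the whole efx probe so the driver core retries it later, while any other error just means falling back to legacy PIO BAR buffers. This is a minimal userspace sketch; probe_with_optional_cxl() and struct probe_result are hypothetical, and EPROBE_DEFER is defined by hand with the kernel's internal value (517) because userspace <errno.h> does not provide it. As a simplification, a zero return is taken to mean the CXL region is usable; in the patch, probe_data->cxl_pio_initialised carries that.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Kernel-internal errno value; not exported to userspace headers. */
#define EPROBE_DEFER 517

struct probe_result {
	int rc;		/* value the main probe would return */
	bool cxl_pio;	/* whether CXL-backed PIO buffers would be used */
};

/* Decide how the main probe proceeds given efx_cxl_init()'s return code. */
static struct probe_result probe_with_optional_cxl(int cxl_init_rc)
{
	struct probe_result r = { .rc = 0, .cxl_pio = false };

	if (cxl_init_rc == -EPROBE_DEFER) {
		/* A dependency is not ready: fail now, be re-probed later. */
		r.rc = -EPROBE_DEFER;
		return r;
	}
	/* Any other error: carry on with legacy PIO BAR buffers. */
	r.cxl_pio = (cxl_init_rc == 0);
	return r;
}
```

In the kernel itself, returning dev_err_probe(dev, rc, "...") from the failing step keeps the deferral reason available for debugging without printing it on every retry.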
* Re: [PATCH v17 06/22] cxl: Support dpa initialization without a mailbox
2025-06-24 14:13 ` [PATCH v17 06/22] cxl: Support dpa initialization without a mailbox alejandro.lucero-palau
@ 2025-06-27 8:42 ` Jonathan Cameron
2025-06-27 16:43 ` Dave Jiang
2025-07-01 15:23 ` Alejandro Lucero Palau
2025-06-27 8:43 ` Jonathan Cameron
2025-07-26 0:54 ` dan.j.williams
2 siblings, 2 replies; 112+ messages in thread
From: Jonathan Cameron @ 2025-06-27 8:42 UTC (permalink / raw)
To: alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang, Alejandro Lucero
On Tue, 24 Jun 2025 15:13:39 +0100
alejandro.lucero-palau@amd.com wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Type3 relies on mailbox CXL_MBOX_OP_IDENTIFY command for initializing
> memdev state params which end up being used for DPA initialization.
>
> Allow a Type2 driver to initialize DPA simply by giving the size of its
> volatile hardware partition.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> º
? Looks like an accidental degree symbol.
> ---
> drivers/cxl/core/mbox.c | 17 +++++++++++++++++
Does this location make sense? I'd like some reasoning for that in the
patch description. After all, the whole point is that this isn't a mailbox thing!
Maybe moving add_part() and this to somewhere more general makes sense?
> include/cxl/cxl.h | 1 +
> 2 files changed, 18 insertions(+)
>
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index d78f6039f997..d3b4ba5214d5 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -1284,6 +1284,23 @@ static void add_part(struct cxl_dpa_info *info, u64 start, u64 size, enum cxl_pa
> info->nr_partitions++;
> }
>
> +/**
> + * cxl_set_capacity: initialize dpa by a driver without a mailbox.
> + *
> + * @cxlds: pointer to cxl_dev_state
> + * @capacity: device volatile memory size
> + */
> +void cxl_set_capacity(struct cxl_dev_state *cxlds, u64 capacity)
> +{
> + struct cxl_dpa_info range_info = {
> + .size = capacity,
> + };
> +
> + add_part(&range_info, 0, capacity, CXL_PARTMODE_RAM);
> + cxl_dpa_setup(cxlds, &range_info);
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_set_capacity, "CXL");
> +
> int cxl_mem_dpa_fetch(struct cxl_memdev_state *mds, struct cxl_dpa_info *info)
> {
> struct cxl_dev_state *cxlds = &mds->cxlds;
> diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
> index 0810c18d7aef..4975ead488b4 100644
> --- a/include/cxl/cxl.h
> +++ b/include/cxl/cxl.h
> @@ -231,4 +231,5 @@ struct cxl_dev_state *_devm_cxl_dev_state_create(struct device *dev,
> int cxl_map_component_regs(const struct cxl_register_map *map,
> struct cxl_component_regs *regs,
> unsigned long map_mask);
> +void cxl_set_capacity(struct cxl_dev_state *cxlds, u64 capacity);
> #endif /* __CXL_CXL_H__ */
* Re: [PATCH v17 06/22] cxl: Support dpa initialization without a mailbox
2025-06-24 14:13 ` [PATCH v17 06/22] cxl: Support dpa initialization without a mailbox alejandro.lucero-palau
2025-06-27 8:42 ` Jonathan Cameron
@ 2025-06-27 8:43 ` Jonathan Cameron
2025-07-01 15:25 ` Alejandro Lucero Palau
2025-07-26 0:54 ` dan.j.williams
2 siblings, 1 reply; 112+ messages in thread
From: Jonathan Cameron @ 2025-06-27 8:43 UTC (permalink / raw)
To: alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang, Alejandro Lucero
On Tue, 24 Jun 2025 15:13:39 +0100
alejandro.lucero-palau@amd.com wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Type3 relies on mailbox CXL_MBOX_OP_IDENTIFY command for initializing
> memdev state params which end up being used for DPA initialization.
>
> Allow a Type2 driver to initialize DPA simply by giving the size of its
> volatile hardware partition.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> º
> ---
> drivers/cxl/core/mbox.c | 17 +++++++++++++++++
> include/cxl/cxl.h | 1 +
> 2 files changed, 18 insertions(+)
>
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index d78f6039f997..d3b4ba5214d5 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -1284,6 +1284,23 @@ static void add_part(struct cxl_dpa_info *info, u64 start, u64 size, enum cxl_pa
> info->nr_partitions++;
> }
>
> +/**
> + * cxl_set_capacity: initialize dpa by a driver without a mailbox.
> + *
> + * @cxlds: pointer to cxl_dev_state
> + * @capacity: device volatile memory size
> + */
> +void cxl_set_capacity(struct cxl_dev_state *cxlds, u64 capacity)
> +{
> + struct cxl_dpa_info range_info = {
> + .size = capacity,
> + };
> +
> + add_part(&range_info, 0, capacity, CXL_PARTMODE_RAM);
> + cxl_dpa_setup(cxlds, &range_info);
I missed that this function can fail in general. If failure either can't
occur here for some reason or we don't care if it does, add a comment.
Otherwise, handle the error.
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_set_capacity, "CXL");
> +
> int cxl_mem_dpa_fetch(struct cxl_memdev_state *mds, struct cxl_dpa_info *info)
> {
> struct cxl_dev_state *cxlds = &mds->cxlds;
> diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
> index 0810c18d7aef..4975ead488b4 100644
> --- a/include/cxl/cxl.h
> +++ b/include/cxl/cxl.h
> @@ -231,4 +231,5 @@ struct cxl_dev_state *_devm_cxl_dev_state_create(struct device *dev,
> int cxl_map_component_regs(const struct cxl_register_map *map,
> struct cxl_component_regs *regs,
> unsigned long map_mask);
> +void cxl_set_capacity(struct cxl_dev_state *cxlds, u64 capacity);
> #endif /* __CXL_CXL_H__ */
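One way to address the comment on cxl_set_capacity() is to have it return int and pass on the result of the DPA setup, since a void helper leaves efx_cxl_init() no way to react to a failure. The userspace sketch below uses hypothetical names throughout (dpa_info, add_part(), dpa_setup(), set_capacity()); in particular, the validation rules in dpa_setup() are invented for illustration and are not cxl_dpa_setup()'s actual checks. It also shows the layout the helper builds: a single volatile (RAM) partition starting at DPA 0 and spanning the whole capacity.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MAX_PARTS 2	/* hypothetical limit */

enum part_mode { PARTMODE_RAM, PARTMODE_PMEM };

struct dpa_part {
	uint64_t start;
	uint64_t size;
	enum part_mode mode;
};

struct dpa_info {
	uint64_t size;
	int nr_partitions;
	struct dpa_part part[MAX_PARTS];
};

struct dev_state {
	uint64_t dpa_size;	/* set only once the layout is accepted */
};

/* Append one partition; this model skips empty ones. */
static void add_part(struct dpa_info *info, uint64_t start, uint64_t size,
		     enum part_mode mode)
{
	if (size == 0 || info->nr_partitions == MAX_PARTS)
		return;
	info->part[info->nr_partitions].start = start;
	info->part[info->nr_partitions].size = size;
	info->part[info->nr_partitions].mode = mode;
	info->nr_partitions++;
}

/* Hypothetical stand-in for the DPA setup step: rejects an empty layout
 * or a partition that runs past the device size. */
static int dpa_setup(struct dev_state *ds, const struct dpa_info *info)
{
	int i;

	if (info->nr_partitions == 0)
		return -EINVAL;
	for (i = 0; i < info->nr_partitions; i++)
		if (info->part[i].start + info->part[i].size > info->size)
			return -EINVAL;
	ds->dpa_size = info->size;
	return 0;
}

/* Error-propagating variant: one volatile partition covering capacity. */
static int set_capacity(struct dev_state *ds, uint64_t capacity)
{
	struct dpa_info info = { .size = capacity };

	add_part(&info, 0, capacity, PARTMODE_RAM);
	return dpa_setup(ds, &info);
}
```

The caller can then treat a non-zero return like any other CXL setup failure and fall back to legacy PIO.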
* Re: [PATCH v17 05/22] sfc: setup cxl component regs and set media ready
2025-06-24 14:13 ` [PATCH v17 05/22] sfc: setup cxl component regs and set media ready alejandro.lucero-palau
2025-06-27 8:39 ` Jonathan Cameron
@ 2025-06-27 8:45 ` Jonathan Cameron
2025-08-08 13:14 ` Alejandro Lucero Palau
2025-07-25 23:04 ` dan.j.williams
2 siblings, 1 reply; 112+ messages in thread
From: Jonathan Cameron @ 2025-06-27 8:45 UTC (permalink / raw)
To: alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang, Alejandro Lucero
On Tue, 24 Jun 2025 15:13:38 +0100
alejandro.lucero-palau@amd.com wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Use cxl code for registers discovery and mapping regarding cxl component
> regs and validate registers found are as expected.
>
> Set media ready explicitly as there is no means for doing so without
> a mailbox, and without the related cxl register, not mandatory for type2.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
A few things came to mind reading later patches...
Some of the calls in here register extra devm resources. Given that we
just swallow any errors in this CXL setup, in my mind we should clean
them up. The devres group approach suggested earlier deals with that for
you, as all the CXL devm resources will end up in that group and you can
tear it down on error in efx_cxl_init().
* Re: [PATCH v17 09/22] sfc: create type2 cxl memdev
2025-06-24 14:13 ` [PATCH v17 09/22] sfc: create type2 cxl memdev alejandro.lucero-palau
@ 2025-06-27 8:51 ` Jonathan Cameron
2025-07-01 15:30 ` Alejandro Lucero Palau
0 siblings, 1 reply; 112+ messages in thread
From: Jonathan Cameron @ 2025-06-27 8:51 UTC (permalink / raw)
To: alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang, Alejandro Lucero, Martin Habets,
Fan Ni, Edward Cree
On Tue, 24 Jun 2025 15:13:42 +0100
<alejandro.lucero-palau@amd.com> wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Use cxl API for creating a cxl memory device using the type2
> cxl_dev_state struct.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
> Reviewed-by: Fan Ni <fan.ni@samsung.com>
> Acked-by: Edward Cree <ecree.xilinx@gmail.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
> drivers/net/ethernet/sfc/efx_cxl.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
> index 5d68ee4e818d..e2d52ed49535 100644
> --- a/drivers/net/ethernet/sfc/efx_cxl.c
> +++ b/drivers/net/ethernet/sfc/efx_cxl.c
> @@ -79,6 +79,13 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
>
> cxl_set_capacity(&cxl->cxlds, EFX_CTPIO_BUFFER_SIZE);
>
> + cxl->cxlmd = devm_cxl_add_memdev(&pci_dev->dev, &cxl->cxlds);
> +
Trivial style thing, but common practice (I haven't checked local style)
is no blank line between a call and its associated error check.
> + if (IS_ERR(cxl->cxlmd)) {
> + pci_err(pci_dev, "CXL accel memdev creation failed\n");
> + return PTR_ERR(cxl->cxlmd);
> + }
> +
> probe_data->cxl = cxl;
>
> return 0;
* Re: [PATCH v17 10/22] cx/memdev: Indicate probe deferral
2025-06-24 14:13 ` [PATCH v17 10/22] cx/memdev: Indicate probe deferral alejandro.lucero-palau
@ 2025-06-27 8:59 ` Jonathan Cameron
2025-06-27 9:42 ` Jonathan Cameron
` (2 subsequent siblings)
3 siblings, 0 replies; 112+ messages in thread
From: Jonathan Cameron @ 2025-06-27 8:59 UTC (permalink / raw)
To: alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang, Alejandro Lucero
On Tue, 24 Jun 2025 15:13:43 +0100
<alejandro.lucero-palau@amd.com> wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> The first step for a CXL accelerator driver that wants to establish new
> CXL.mem regions is to register a 'struct cxl_memdev'. That kicks off
> cxl_mem_probe() to enumerate all 'struct cxl_port' instances in the
> topology up to the root.
>
> If the port driver has not attached yet the expectation is that the
> driver waits until that link is established. The common cxl_pci driver
> has reason to keep the 'struct cxl_memdev' device attached to the bus
> until the root driver attaches. An accelerator may want to instead defer
> probing until CXL resources can be acquired.
>
> Use the @endpoint attribute of a 'struct cxl_memdev' to convey when an
> accelerator driver probing should be deferred vs failed. Provide that
> indication via a new cxl_acquire_endpoint() API that can retrieve the
> probe status of the memdev.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
This looks fine to me, but it needs more eyes.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
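The commit message describes three states for @endpoint: NULL while cxl_mem_probe() has not run, an ERR_PTR() carrying -EPROBE_DEFER or another error when it failed, and a valid port otherwise. A minimal userspace sketch of how a caller can tell them apart follows. ERR_PTR()/IS_ERR()/PTR_ERR() are re-implemented here in the spirit of include/linux/err.h; struct fake_memdev, struct fake_port and endpoint_status() are hypothetical stand-ins, not the CXL API, and the device locking done by cxl_acquire_endpoint() is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace re-implementation of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095
/* Kernel-internal errno value; not exported in userspace headers. */
#define EPROBE_DEFER 517

static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
	/* Error pointers occupy the top MAX_ERRNO addresses. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-ins for struct cxl_port / struct cxl_memdev. */
struct fake_port { int driver_bound; };
struct fake_memdev { struct fake_port *endpoint; };

/*
 * Hypothetical helper mirroring the checks in cxl_acquire_endpoint():
 * returns 0 when the endpoint is usable, otherwise a negative errno.
 */
static int endpoint_status(const struct fake_memdev *md)
{
	if (!md->endpoint)
		return -ENXIO;			/* cxl_mem_probe() has not run */
	if (IS_ERR(md->endpoint))
		return (int)PTR_ERR(md->endpoint); /* e.g. -EPROBE_DEFER */
	if (!md->endpoint->driver_bound)
		return -ENXIO;			/* port exists, no driver yet */
	return 0;
}
```

A driver would turn -EPROBE_DEFER into a deferred probe of its own and treat any other error as a hard failure.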
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH v17 13/22] cxl: Define a driver interface for DPA allocation
2025-06-24 14:13 ` [PATCH v17 13/22] cxl: Define a driver interface for DPA allocation alejandro.lucero-palau
@ 2025-06-27 9:06 ` Jonathan Cameron
2025-07-04 15:18 ` Alejandro Lucero Palau
2025-06-27 20:46 ` Dave Jiang
1 sibling, 1 reply; 112+ messages in thread
From: Jonathan Cameron @ 2025-06-27 9:06 UTC (permalink / raw)
To: alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang, Alejandro Lucero, Ben Cheatham
On Tue, 24 Jun 2025 15:13:46 +0100
<alejandro.lucero-palau@amd.com> wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Region creation involves finding available DPA (device-physical-address)
> capacity to map into HPA (host-physical-address) space.
>
> In order to support CXL Type2 devices, define an API, cxl_request_dpa(),
> that tries to allocate the DPA memory the driver requires to operate. The
> memory requested should not be bigger than the max available HPA obtained
> previously with cxl_get_hpa_freespace.
>
> Based on https://lore.kernel.org/linux-cxl/168592158743.1948938.7622563891193802610.stgit@dwillia2-xfh.jf.intel.com/
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Hmm. I wouldn't trust this last guy not to have missed a few things.
See below.
> +static struct cxl_endpoint_decoder *
> +cxl_find_free_decoder(struct cxl_memdev *cxlmd)
> +{
> + struct cxl_port *endpoint = cxlmd->endpoint;
> + struct device *dev;
> +
> + scoped_guard(rwsem_read, &cxl_dpa_rwsem) {
> + dev = device_find_child(&endpoint->dev, NULL,
> + find_free_decoder);
> + }
> + if (dev)
> + return to_cxl_endpoint_decoder(dev);
> +
> + return NULL;
If this code isn't going to get modified later, it could be simpler as
guard(rwsem_read)(&cxl_dpa_rwsem);
dev = device_find_child(&endpoint->dev, NULL, find_free_decoder);
if (!dev)
return NULL;
return to_cxl_endpoint_decoder(dev);
> +}
> +
> +/**
> + * cxl_request_dpa - search and reserve DPA given input constraints
> + * @cxlmd: memdev with an endpoint port with available decoders
> + * @mode: DPA operation mode (ram vs pmem)
> + * @alloc: dpa size required
> + *
> + * Returns a pointer to a cxl_endpoint_decoder struct or an error
> + *
> + * Given that a region needs to allocate from limited HPA capacity it
> + * may be the case that a device has more mappable DPA capacity than
> + * available HPA. The expectation is that @alloc is a driver known
> + * value based on the device capacity but it could not be available
> + * due to HPA constraints.
> + *
> + * Returns a pinned cxl_decoder with at least @alloc bytes of capacity
> + * reserved, or an error pointer. The caller is also expected to own the
> + * lifetime of the memdev registration associated with the endpoint to
> + * pin the decoder registered as well.
> + */
> +struct cxl_endpoint_decoder *cxl_request_dpa(struct cxl_memdev *cxlmd,
> + enum cxl_partition_mode mode,
> + resource_size_t alloc)
> +{
> + struct cxl_endpoint_decoder *cxled __free(put_cxled) =
> + cxl_find_free_decoder(cxlmd);
> + struct device *cxled_dev;
> + int rc;
> +
> + if (!IS_ALIGNED(alloc, SZ_256M))
> + return ERR_PTR(-EINVAL);
> +
> + if (!cxled) {
> + rc = -ENODEV;
> + goto err;
> + }
> +
> + rc = cxl_dpa_set_part(cxled, mode);
> + if (rc)
> + goto err;
> +
> + rc = cxl_dpa_alloc(cxled, alloc);
> + if (rc)
> + goto err;
> +
> + return cxled;
I was kind of expecting us to disable the put above with a return_ptr()
here. If there is a reason why not, add a comment, as it is not obvious
to me anyway!
> +err:
> + put_device(cxled_dev);
It's not been assigned. I'm surprised none of the standard tooling
(sparse, smatch, etc.) screamed about this one.
For complex series like this it's worth running them on each patch just to
avoid possible bot warnings later!
> + return ERR_PTR(rc);
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_request_dpa, "CXL");
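The __free()/return_ptr() pairing raised above can be illustrated with the GCC/Clang cleanup attribute that <linux/cleanup.h> is built on. This is a userspace model only: struct decoder, get_free_decoder(), put_decoder(), alloc_dpa() and request_dpa() are hypothetical, and errors are returned through an out parameter instead of ERR_PTR() to keep it short. It keeps the SZ_256M alignment check from the patch, lets the cleanup drop the reference on every error path (so no put_device() on an unassigned pointer is needed), and disarms the cleanup with return_ptr() on success.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SZ_256M (256ULL << 20)
#define IS_ALIGNED(x, a) (((x) & ((a) - 1)) == 0)

/* Hypothetical refcounted object standing in for cxl_endpoint_decoder. */
struct decoder { int refs; int allocated; };

static struct decoder pool;

static struct decoder *get_free_decoder(void)
{
	pool.refs++;		/* like device_find_child(): takes a reference */
	return &pool;
}

/* Cleanup handler: receives the address of the annotated variable. */
static void put_decoder(struct decoder **d)
{
	if (*d)
		(*d)->refs--;	/* like put_device() */
}

/* Simplified versions of the <linux/cleanup.h> helpers. */
#define AUTO_PUT_DECODER __attribute__((cleanup(put_decoder)))
#define no_free_ptr(p) ({ __typeof__(p) no_free_tmp = (p); (p) = NULL; no_free_tmp; })
#define return_ptr(p) return no_free_ptr(p)

/* Hypothetical allocator: refuses anything larger than 1 GiB. */
static int alloc_dpa(struct decoder *d, unsigned long long size)
{
	if (size > 4 * SZ_256M)
		return -ENOSPC;
	d->allocated = 1;
	return 0;
}

/* Returns a referenced decoder, or NULL with *err set. */
static struct decoder *request_dpa(unsigned long long size, int *err)
{
	struct decoder *d AUTO_PUT_DECODER = get_free_decoder();

	if (!IS_ALIGNED(size, SZ_256M)) {
		*err = -EINVAL;
		return NULL;	/* cleanup drops the reference */
	}
	*err = alloc_dpa(d, size);
	if (*err)
		return NULL;	/* cleanup drops the reference */

	return_ptr(d);		/* d is set to NULL, so no put happens */
}
```

With this shape there is no error label at all, which sidesteps the uninitialised cxled_dev problem.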
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH v17 12/22] sfc: get endpoint decoder
2025-06-24 14:13 ` [PATCH v17 12/22] sfc: get endpoint decoder alejandro.lucero-palau
@ 2025-06-27 9:10 ` Jonathan Cameron
2025-07-04 14:51 ` Alejandro Lucero Palau
2025-07-28 16:30 ` dan.j.williams
1 sibling, 1 reply; 112+ messages in thread
From: Jonathan Cameron @ 2025-06-27 9:10 UTC (permalink / raw)
To: alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang, Alejandro Lucero, Martin Habets,
Edward Cree
On Tue, 24 Jun 2025 15:13:45 +0100
<alejandro.lucero-palau@amd.com> wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Use cxl api for getting DPA (Device Physical Address) to use through an
> endpoint decoder.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
> Acked-by: Edward Cree <ecree.xilinx@gmail.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
> drivers/net/ethernet/sfc/Kconfig | 1 +
> drivers/net/ethernet/sfc/efx_cxl.c | 32 +++++++++++++++++++++++++++++-
> 2 files changed, 32 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/sfc/Kconfig b/drivers/net/ethernet/sfc/Kconfig
> index 979f2801e2a8..e959d9b4f4ce 100644
> --- a/drivers/net/ethernet/sfc/Kconfig
> +++ b/drivers/net/ethernet/sfc/Kconfig
> @@ -69,6 +69,7 @@ config SFC_MCDI_LOGGING
> config SFC_CXL
> bool "Solarflare SFC9100-family CXL support"
> depends on SFC && CXL_BUS >= SFC
> + depends on CXL_REGION
> default SFC
> help
> This enables SFC CXL support if the kernel is configuring CXL for
> diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
> index e2d52ed49535..c0adfd99cc78 100644
> --- a/drivers/net/ethernet/sfc/efx_cxl.c
> +++ b/drivers/net/ethernet/sfc/efx_cxl.c
> @@ -22,6 +22,7 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
> {
> struct efx_nic *efx = &probe_data->efx;
> struct pci_dev *pci_dev = efx->pci_dev;
> + resource_size_t max_size;
> struct efx_cxl *cxl;
> u16 dvsec;
> int rc;
> @@ -86,13 +87,42 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
> return PTR_ERR(cxl->cxlmd);
> }
>
> + cxl->endpoint = cxl_acquire_endpoint(cxl->cxlmd);
> + if (IS_ERR(cxl->endpoint))
> + return PTR_ERR(cxl->endpoint);
> +
> + cxl->cxlrd = cxl_get_hpa_freespace(cxl->cxlmd, 1,
> + CXL_DECODER_F_RAM | CXL_DECODER_F_TYPE2,
> + &max_size);
> +
> + if (IS_ERR(cxl->cxlrd)) {
> + pci_err(pci_dev, "cxl_get_hpa_freespace failed\n");
> + rc = PTR_ERR(cxl->cxlrd);
> + goto endpoint_release;
> + }
> +
> + if (max_size < EFX_CTPIO_BUFFER_SIZE) {
> + pci_err(pci_dev, "%s: not enough free HPA space %pap < %u\n",
> + __func__, &max_size, EFX_CTPIO_BUFFER_SIZE);
> + rc = -ENOSPC;
> + goto put_root_decoder;
> + }
> +
> probe_data->cxl = cxl;
>
> - return 0;
> + goto endpoint_release;
I'd avoid the spider's nest here and just duplicate the release,
or if you really want to avoid that duplication, factor out everything where
it is held into another function and have acquire/function/release as all that
is seen here.
> +
> +put_root_decoder:
> + cxl_put_root_decoder(cxl->cxlrd);
> +endpoint_release:
> + cxl_release_endpoint(cxl->cxlmd, cxl->endpoint);
> + return rc;
> }
>
> void efx_cxl_exit(struct efx_probe_data *probe_data)
> {
> + if (probe_data->cxl)
> + cxl_put_root_decoder(probe_data->cxl->cxlrd);
> }
>
> MODULE_IMPORT_NS("CXL");
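One way to follow the factoring suggestion: move everything that runs with the endpoint held into a helper, so the caller reduces to acquire, call, release, and the helper unwinds only what it took itself. This is a userspace sketch with hypothetical names (acquire_ep(), release_ep(), get_root(), put_root(), init_cxl_locked(), init_cxl()); the two counters stand in for the device locks and the root decoder reference.

```c
#include <assert.h>
#include <errno.h>

static int locked;	/* stands in for the memdev/endpoint device locks */
static int root_refs;	/* stands in for the root decoder reference */

/* Could fail (e.g. -EPROBE_DEFER) in real code; always succeeds here. */
static int acquire_ep(void) { locked = 1; return 0; }
static void release_ep(void) { locked = 0; }
static void get_root(void) { root_refs++; }
static void put_root(void) { root_refs--; }

/* Everything that needs the endpoint held; unwinds only its own state. */
static int init_cxl_locked(unsigned long max_size, unsigned long need)
{
	get_root();
	if (max_size < need) {
		put_root();
		return -ENOSPC;
	}
	return 0;	/* root reference kept until the exit path */
}

static int init_cxl(unsigned long max_size, unsigned long need)
{
	int rc = acquire_ep();

	if (rc)
		return rc;
	rc = init_cxl_locked(max_size, need);
	release_ep();	/* single release point, no goto ladder */
	return rc;
}
```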
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH v17 14/22] sfc: get endpoint decoder
2025-06-24 14:13 ` [PATCH v17 14/22] sfc: get endpoint decoder alejandro.lucero-palau
@ 2025-06-27 9:11 ` Jonathan Cameron
2025-07-07 11:24 ` Alejandro Lucero Palau
2025-07-16 23:48 ` Dave Jiang
1 sibling, 1 reply; 112+ messages in thread
From: Jonathan Cameron @ 2025-06-27 9:11 UTC (permalink / raw)
To: alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang, Alejandro Lucero, Martin Habets,
Edward Cree
On Tue, 24 Jun 2025 15:13:47 +0100
<alejandro.lucero-palau@amd.com> wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Use cxl api for getting DPA (Device Physical Address) to use through an
> endpoint decoder.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
> Acked-by: Edward Cree <ecree.xilinx@gmail.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
> drivers/net/ethernet/sfc/efx_cxl.c | 12 +++++++++++-
> 1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
> index c0adfd99cc78..ffbf0e706330 100644
> --- a/drivers/net/ethernet/sfc/efx_cxl.c
> +++ b/drivers/net/ethernet/sfc/efx_cxl.c
> @@ -108,6 +108,14 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
> goto put_root_decoder;
> }
>
> + cxl->cxled = cxl_request_dpa(cxl->cxlmd, CXL_PARTMODE_RAM,
> + EFX_CTPIO_BUFFER_SIZE);
> + if (IS_ERR(cxl->cxled)) {
> + pci_err(pci_dev, "CXL accel request DPA failed");
> + rc = PTR_ERR(cxl->cxled);
> + goto put_root_decoder;
> + }
> +
> probe_data->cxl = cxl;
>
> goto endpoint_release;
> @@ -121,8 +129,10 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
>
> void efx_cxl_exit(struct efx_probe_data *probe_data)
> {
> - if (probe_data->cxl)
> + if (probe_data->cxl) {
Given this is going to get more complex, I'd be tempted to go with an early exit
approach for !cxl:
if (!probe_data->cxl)
return;
cxl_dpa_free()
etc.
> + cxl_dpa_free(probe_data->cxl->cxled);
> cxl_put_root_decoder(probe_data->cxl->cxlrd);
> + }
> }
>
> MODULE_IMPORT_NS("CXL");
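The early-exit shape suggested for efx_cxl_exit() keeps the teardown flat as more steps are added in later patches. This is a minimal sketch: struct probe_state and struct cxl_state are hypothetical stand-ins for efx_probe_data and efx_cxl, and the counters take the place of the cxl_dpa_free() and cxl_put_root_decoder() calls.

```c
#include <assert.h>
#include <stddef.h>

static int dpa_freed, root_put;	/* record which teardown steps ran */

struct cxl_state { int dummy; };
struct probe_state { struct cxl_state *cxl; };

static void exit_cxl(struct probe_state *pd)
{
	if (!pd->cxl)
		return;		/* CXL was never set up: nothing to undo */

	dpa_freed++;		/* cxl_dpa_free(pd->cxl->cxled) */
	root_put++;		/* cxl_put_root_decoder(pd->cxl->cxlrd) */
}
```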
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH v17 16/22] cxl/region: Factor out interleave ways setup
2025-06-24 14:13 ` [PATCH v17 16/22] cxl/region: Factor out interleave ways setup alejandro.lucero-palau
@ 2025-06-27 9:13 ` Jonathan Cameron
2025-06-27 23:05 ` Dave Jiang
0 siblings, 1 reply; 112+ messages in thread
From: Jonathan Cameron @ 2025-06-27 9:13 UTC (permalink / raw)
To: alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang, Alejandro Lucero, Zhi Wang,
Ben Cheatham
On Tue, 24 Jun 2025 15:13:49 +0100
<alejandro.lucero-palau@amd.com> wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Region creation based on Type3 devices is triggered from user space
> allowing memory combination through interleaving.
>
> In preparation for kernel driven region creation, that is Type2 drivers
> triggering region creation backed with its advertised CXL memory, factor
> out a common helper from the user-sysfs region setup for interleave ways.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Zhi Wang <zhiw@nvidia.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
As a heads up, this code changes a fair bit in Dan's ACQUIRE() series
that may well land before this. Dave can ask for whatever resolution he
wants when we get to that stage!
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH v17 18/22] cxl: Allow region creation by type2 drivers
2025-06-24 14:13 ` [PATCH v17 18/22] cxl: Allow region creation by type2 drivers alejandro.lucero-palau
@ 2025-06-27 9:32 ` Jonathan Cameron
2025-07-07 11:31 ` Alejandro Lucero Palau
2025-08-05 16:33 ` dan.j.williams
1 sibling, 1 reply; 112+ messages in thread
From: Jonathan Cameron @ 2025-06-27 9:32 UTC (permalink / raw)
To: alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang, Alejandro Lucero
On Tue, 24 Jun 2025 15:13:51 +0100
<alejandro.lucero-palau@amd.com> wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Creating a CXL region requires userspace intervention through the cxl
> sysfs files. Type2 support should allow accelerator drivers to create
> such cxl region from kernel code.
>
> Adding that functionality and integrating it with current support for
> memory expanders.
>
> Support an action by the type2 driver to be linked to the created region
> for unwinding the resources allocated properly.
>
> Based on https://lore.kernel.org/linux-cxl/168592159835.1948938.1647215579839222774.stgit@dwillia2-xfh.jf.intel.com/
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
One question in here for others (probably Dan). When does it make sense to
manually request devm region cleanup, and when should we let it flow out
as we are failing the CXL creation anyway and it's one of many things to
clean up if that happens?
> ---
> drivers/cxl/core/region.c | 152 ++++++++++++++++++++++++++++++++++++--
> drivers/cxl/port.c | 5 +-
> include/cxl/cxl.h | 5 ++
> 3 files changed, 153 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 21cf8c11efe3..4ca5ade54ad9 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -2319,6 +2319,12 @@ static int cxl_region_detach(struct cxl_endpoint_decoder *cxled)
> return rc;
> }
>
> +/**
> + * cxl_decoder_kill_region - detach a region from device
> + *
> + * @cxled: endpoint decoder to detach the region from.
> + *
Stray blank line.
> + */
> void cxl_decoder_kill_region(struct cxl_endpoint_decoder *cxled)
> {
> down_write(&cxl_region_rwsem);
> @@ -2326,6 +2332,7 @@ void cxl_decoder_kill_region(struct cxl_endpoint_decoder *cxled)
> cxl_region_detach(cxled);
> up_write(&cxl_region_rwsem);
> }
> +EXPORT_SYMBOL_NS_GPL(cxl_decoder_kill_region, "CXL");
> +/**
> + * cxl_create_region - Establish a region given an endpoint decoder
> + * @cxlrd: root decoder to allocate HPA
> + * @cxled: endpoint decoder with reserved DPA capacity
> + * @ways: interleave ways required
> + *
> + * Returns a fully formed region in the commit state and attached to the
> + * cxl_region driver.
> + */
> +struct cxl_region *cxl_create_region(struct cxl_root_decoder *cxlrd,
> + struct cxl_endpoint_decoder **cxled,
> + int ways, void (*action)(void *),
> + void *data)
> +{
> + struct cxl_region *cxlr;
> +
> + scoped_guard(mutex, &cxlrd->range_lock) {
> + cxlr = __construct_new_region(cxlrd, cxled, ways);
> + if (IS_ERR(cxlr))
> + return cxlr;
> + }
> +
> + if (device_attach(&cxlr->dev) <= 0) {
> + dev_err(&cxlr->dev, "failed to create region\n");
> + drop_region(cxlr);
I'm in two minds about this. If we were to have wrapped the whole thing
up in a devres group and on failure (so carrying on without cxl support)
we tidy that group up, then we'd not need to clean this up here.
However we do some local devm cleanup in construct_region today so maybe
keeping this local makes sense... Dan, maybe you have a better view of
whether cleaning up here is sensible or not?
> + return ERR_PTR(-ENODEV);
> + }
> +
> + if (action)
> + devm_add_action_or_reset(&cxlr->dev, action, data);
This is a little odd looking (and can fail, so it should be error checked).
I'd push the devm registration to the caller.
> +
> + return cxlr;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_create_region, "CXL");
> +
> int cxl_add_to_region(struct cxl_port *root, struct cxl_endpoint_decoder *cxled)
> {
> struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
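devm_add_action_or_reset() runs the action immediately when it cannot register it and then returns the error, which is why its return value needs checking. The userspace model below is a hypothetical simplification: a fixed-size action table, and add_action_or_reset() and release_all() are not kernel functions. It shows the reset-on-failure behaviour and the LIFO release at detach that a caller registering its own region teardown would rely on.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_ACTIONS 2

struct action { void (*fn)(void *); void *data; };
static struct action actions[MAX_ACTIONS];
static int nr_actions;

/* Model of devm_add_action_or_reset(): on failure, run fn right away. */
static int add_action_or_reset(void (*fn)(void *), void *data)
{
	if (nr_actions == MAX_ACTIONS) {
		fn(data);
		return -ENOMEM;
	}
	actions[nr_actions].fn = fn;
	actions[nr_actions].data = data;
	nr_actions++;
	return 0;
}

/* Model of devres release on driver detach: newest action first. */
static void release_all(void)
{
	while (nr_actions > 0) {
		nr_actions--;
		actions[nr_actions].fn(actions[nr_actions].data);
	}
}

/* Hypothetical teardown callback: counts how often it ran. */
static void release_region(void *data) { (*(int *)data)++; }
```

A caller would create the region first and then do `rc = add_action_or_reset(release_region, priv); if (rc) return rc;`, since the teardown has already run by the time the error comes back.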
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH v17 19/22] cxl: Avoid dax creation for accelerators
2025-06-24 14:13 ` [PATCH v17 19/22] cxl: Avoid dax creation for accelerators alejandro.lucero-palau
@ 2025-06-27 9:33 ` Jonathan Cameron
2025-09-03 17:24 ` Davidlohr Bueso
1 sibling, 0 replies; 112+ messages in thread
From: Jonathan Cameron @ 2025-06-27 9:33 UTC (permalink / raw)
To: alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang, Alejandro Lucero
On Tue, 24 Jun 2025 15:13:52 +0100
alejandro.lucero-palau@amd.com wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> By definition a type2 cxl device will use the host managed memory for
> specific functionality, therefore it should not be available to other
> uses.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH v17 20/22] sfc: create cxl region
2025-06-24 14:13 ` [PATCH v17 20/22] sfc: create cxl region alejandro.lucero-palau
@ 2025-06-27 9:38 ` Jonathan Cameron
2025-07-07 11:37 ` Alejandro Lucero Palau
2025-07-28 16:20 ` dan.j.williams
1 sibling, 1 reply; 112+ messages in thread
From: Jonathan Cameron @ 2025-06-27 9:38 UTC (permalink / raw)
To: alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang, Alejandro Lucero
On Tue, 24 Jun 2025 15:13:53 +0100
<alejandro.lucero-palau@amd.com> wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Use cxl api for creating a region using the endpoint decoder related to
> a DPA range.
>
> Add a callback for unwinding sfc cxl initialization when the endpoint port
> is destroyed by potential cxl_acpi or cxl_mem modules removal.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> ---
> drivers/net/ethernet/sfc/efx_cxl.c | 24 +++++++++++++++++++++++-
> 1 file changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
> index ffbf0e706330..7365effe974e 100644
> --- a/drivers/net/ethernet/sfc/efx_cxl.c
> +++ b/drivers/net/ethernet/sfc/efx_cxl.c
> @@ -18,6 +18,16 @@
>
> #define EFX_CTPIO_BUFFER_SIZE SZ_256M
>
> +static void efx_release_cxl_region(void *priv_cxl)
> +{
> + struct efx_probe_data *probe_data = priv_cxl;
> + struct efx_cxl *cxl = probe_data->cxl;
> +
> + iounmap(cxl->ctpio_cxl);
> + cxl_put_root_decoder(cxl->cxlrd);
> + probe_data->cxl_pio_initialised = false;
> +}
> +
> int efx_cxl_init(struct efx_probe_data *probe_data)
> {
> struct efx_nic *efx = &probe_data->efx;
> @@ -116,10 +126,21 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
> goto put_root_decoder;
> }
>
> + cxl->efx_region = cxl_create_region(cxl->cxlrd, &cxl->cxled, 1,
> + efx_release_cxl_region,
As per earlier comment - given when it's released, I'd register the devm callback
out here not in cxl_create_region(). Might irritate the net maintainers though
as it would be a devm callback registered in non CXL code, but I don't think
that is a reason to jump through the hoops you currently have.
> + &probe_data);
> + if (IS_ERR(cxl->efx_region)) {
> + pci_err(pci_dev, "CXL accel create region failed");
> + rc = PTR_ERR(cxl->efx_region);
> + goto err_region;
> + }
> +
> probe_data->cxl = cxl;
>
> goto endpoint_release;
>
> +err_region:
> + cxl_dpa_free(cxl->cxled);
> put_root_decoder:
> cxl_put_root_decoder(cxl->cxlrd);
> endpoint_release:
> @@ -129,7 +150,8 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
>
> void efx_cxl_exit(struct efx_probe_data *probe_data)
> {
> - if (probe_data->cxl) {
> + if (probe_data->cxl_pio_initialised) {
Doesn't make sense yet because it's never true yet. I assume the code
doesn't always fail in a way it didn't until now?
> + cxl_decoder_kill_region(probe_data->cxl->cxled);
> cxl_dpa_free(probe_data->cxl->cxled);
> cxl_put_root_decoder(probe_data->cxl->cxlrd);
> }
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH v17 10/22] cx/memdev: Indicate probe deferral
2025-06-24 14:13 ` [PATCH v17 10/22] cx/memdev: Indicate probe deferral alejandro.lucero-palau
2025-06-27 8:59 ` Jonathan Cameron
@ 2025-06-27 9:42 ` Jonathan Cameron
2025-07-01 15:30 ` Alejandro Lucero Palau
2025-06-27 18:17 ` Dave Jiang
2025-07-16 22:52 ` Dave Jiang
3 siblings, 1 reply; 112+ messages in thread
From: Jonathan Cameron @ 2025-06-27 9:42 UTC (permalink / raw)
To: alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang, Alejandro Lucero
On Tue, 24 Jun 2025 15:13:43 +0100
<alejandro.lucero-palau@amd.com> wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
Typo in patch title.
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH v17 22/22] sfc: support pio mapping based on cxl
2025-06-24 14:13 ` [PATCH v17 22/22] sfc: support pio mapping based on cxl alejandro.lucero-palau
@ 2025-06-27 9:46 ` Jonathan Cameron
2025-07-07 12:06 ` Alejandro Lucero Palau
2025-08-27 17:26 ` ALOK TIWARI
1 sibling, 1 reply; 112+ messages in thread
From: Jonathan Cameron @ 2025-06-27 9:46 UTC (permalink / raw)
To: alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang, Alejandro Lucero
On Tue, 24 Jun 2025 15:13:55 +0100
alejandro.lucero-palau@amd.com wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> A PIO buffer is a region of device memory to which the driver can write a
> packet for TX, with the device handling the transmit doorbell without
> requiring a DMA for getting the packet data, which helps reducing latency
> in certain exchanges. With CXL mem protocol this latency can be lowered
> further.
>
> With a device supporting CXL and successfully initialised, use the cxl
> region to map the memory range and use this mapping for PIO buffers.
>
> Add the disabling of those CXL-based PIO buffers if the callback for
> potential cxl endpoint removal by the CXL code happens.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
There is quite a bit of ifdef magic in here. If there is any way
to push that to stubs in headers, it would probably improve code
readability.
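A common way to move #ifdef logic into a header is to pair the real prototypes with static inline stubs, so call sites need no conditionals. This is a hedged userspace sketch: CONFIG_SFC_CXL is treated as a plain macro left undefined here, struct probe_data is a hypothetical, simplified stand-in for struct efx_probe_data, and the stub signatures are illustrative rather than the sfc driver's actual ones.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for struct efx_probe_data. */
struct probe_data { int cxl_pio_initialised; };

/* --- what a header such as efx_cxl.h could contain --- */
#ifdef CONFIG_SFC_CXL
int efx_cxl_init(struct probe_data *pd);
void efx_cxl_exit(struct probe_data *pd);
#else
static inline int efx_cxl_init(struct probe_data *pd)
{
	pd->cxl_pio_initialised = 0;	/* no CXL: keep using plain PIO */
	return 0;
}
static inline void efx_cxl_exit(struct probe_data *pd) { (void)pd; }
#endif

/* --- caller, free of #ifdefs --- */
static int probe(struct probe_data *pd)
{
	int rc = efx_cxl_init(pd);

	if (rc)
		return rc;
	/* ... rest of probe ... */
	return 0;
}
```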
I was expecting to at some point see handling of the CXL calls
returning -EPROBE_DEFER, but that's not here, so I don't
understand exactly how that is supposed to work if the CXL infrastructure
hasn't arrived at the time of first probe.
Otherwise, my main overall concern is that lifetimes are (I think) more
complex than they need to be. I suggest a solution in an earlier patch (and
in reply to the previous version). Devres groups are really handy for wrapping
up a bunch of devm calls with the option to unwind them all on error or at
a specific point in the remove() path for a driver. That should resolve
most of my concerns, as you'll have something closely approximating a non-devm flow.
Jonathan
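To illustrate the devres group idea: resources registered after a group is opened can be released together, newest first, without touching anything registered before the group. The kernel provides devres_open_group(), devres_remove_group() and devres_release_group() for this. The code below is only a hypothetical userspace model of those semantics (open_group(), add_res() and release_group() are invented for illustration), not the kernel implementation, and it omits bounds checks.

```c
#include <assert.h>
#include <string.h>

#define MAX_RES 8

/* Each slot holds a resource name, or NULL for a group marker. */
static const char *stack[MAX_RES];
static int top;

static char order[MAX_RES + 1];	/* first letters, in release order */
static int nr_released;

static int open_group(void)
{
	stack[top] = NULL;	/* marker */
	return top++;
}

static void add_res(const char *name) { stack[top++] = name; }

/* Release everything added after the marker at @id, newest first. */
static void release_group(int id)
{
	while (top > id + 1) {
		top--;
		order[nr_released++] = stack[top][0];
	}
	top = id;		/* drop the marker itself */
	order[nr_released] = '\0';
}
```

In the sfc case, efx_cxl_init() could open a group before its first CXL call and release it if any step fails, leaving the rest of the driver's devm resources alone.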
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH v17 06/22] cxl: Support dpa initialization without a mailbox
2025-06-27 8:42 ` Jonathan Cameron
@ 2025-06-27 16:43 ` Dave Jiang
2025-07-01 15:23 ` Alejandro Lucero Palau
1 sibling, 0 replies; 112+ messages in thread
From: Dave Jiang @ 2025-06-27 16:43 UTC (permalink / raw)
To: Jonathan Cameron, alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, Alejandro Lucero
On 6/27/25 1:42 AM, Jonathan Cameron wrote:
> On Tue, 24 Jun 2025 15:13:39 +0100
> alejandro.lucero-palau@amd.com wrote:
>
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> Type3 relies on mailbox CXL_MBOX_OP_IDENTIFY command for initializing
>> memdev state params which end up being used for DPA initialization.
>>
>> Allow a Type2 driver to initialize DPA simply by giving the size of its
>> volatile hardware partition.
>>
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>> º
>
> ? Looks like an accidental degree symbol.
>
>> ---
>> drivers/cxl/core/mbox.c | 17 +++++++++++++++++
> Does the location make sense? I'd like some reasoning text for that in the patch
> description. After all, the whole point is that this isn't a mailbox thing!
>
> Maybe moving add_part and this to somewhere more general makes sense?
core/memdev.c? Seems like a memdev type of thing.
DJ
>
>> include/cxl/cxl.h | 1 +
>> 2 files changed, 18 insertions(+)
>>
>> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
>> index d78f6039f997..d3b4ba5214d5 100644
>> --- a/drivers/cxl/core/mbox.c
>> +++ b/drivers/cxl/core/mbox.c
>> @@ -1284,6 +1284,23 @@ static void add_part(struct cxl_dpa_info *info, u64 start, u64 size, enum cxl_pa
>> info->nr_partitions++;
>> }
>>
>> +/**
>> + * cxl_set_capacity: initialize dpa by a driver without a mailbox.
>> + *
>> + * @cxlds: pointer to cxl_dev_state
>> + * @capacity: device volatile memory size
>> + */
>> +void cxl_set_capacity(struct cxl_dev_state *cxlds, u64 capacity)
>> +{
>> + struct cxl_dpa_info range_info = {
>> + .size = capacity,
>> + };
>> +
>> + add_part(&range_info, 0, capacity, CXL_PARTMODE_RAM);
>> + cxl_dpa_setup(cxlds, &range_info);
>> +}
>> +EXPORT_SYMBOL_NS_GPL(cxl_set_capacity, "CXL");
>> +
>> int cxl_mem_dpa_fetch(struct cxl_memdev_state *mds, struct cxl_dpa_info *info)
>> {
>> struct cxl_dev_state *cxlds = &mds->cxlds;
>> diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
>> index 0810c18d7aef..4975ead488b4 100644
>> --- a/include/cxl/cxl.h
>> +++ b/include/cxl/cxl.h
>> @@ -231,4 +231,5 @@ struct cxl_dev_state *_devm_cxl_dev_state_create(struct device *dev,
>> int cxl_map_component_regs(const struct cxl_register_map *map,
>> struct cxl_component_regs *regs,
>> unsigned long map_mask);
>> +void cxl_set_capacity(struct cxl_dev_state *cxlds, u64 capacity);
>> #endif /* __CXL_CXL_H__ */
>
>
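cxl_set_capacity() describes the whole device capacity as a single volatile (RAM) partition starting at DPA 0, then passes it to cxl_dpa_setup(). The sketch below models only the add_part() step, using hypothetical userspace types (struct dpa_info, struct dpa_part, enum part_mode). The inclusive end (start + size - 1) follows the kernel's struct range convention. The zero-size skip is modelled on the kernel helper and is an assumption to check against mbox.c.

```c
#include <assert.h>
#include <stdint.h>

enum part_mode { PART_RAM, PART_PMEM };

/* One DPA partition; end is inclusive, as in struct range. */
struct dpa_part { uint64_t start, end; enum part_mode mode; };

struct dpa_info {
	uint64_t size;
	int nr_partitions;
	struct dpa_part part[2];
};

static void add_part(struct dpa_info *info, uint64_t start, uint64_t size,
		     enum part_mode mode)
{
	int i = info->nr_partitions;

	if (size == 0)
		return;
	info->part[i].start = start;
	info->part[i].end = start + size - 1;
	info->part[i].mode = mode;
	info->nr_partitions++;
}

/* Model of cxl_set_capacity(): one RAM partition covering [0, capacity). */
static struct dpa_info set_capacity(uint64_t capacity)
{
	struct dpa_info info = { .size = capacity };

	add_part(&info, 0, capacity, PART_RAM);
	return info;	/* the kernel hands this to cxl_dpa_setup() instead */
}
```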
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH v17 10/22] cx/memdev: Indicate probe deferral
2025-06-24 14:13 ` [PATCH v17 10/22] cx/memdev: Indicate probe deferral alejandro.lucero-palau
2025-06-27 8:59 ` Jonathan Cameron
2025-06-27 9:42 ` Jonathan Cameron
@ 2025-06-27 18:17 ` Dave Jiang
2025-06-30 16:20 ` Jonathan Cameron
2025-07-01 16:02 ` Alejandro Lucero Palau
2025-07-16 22:52 ` Dave Jiang
3 siblings, 2 replies; 112+ messages in thread
From: Dave Jiang @ 2025-06-27 18:17 UTC (permalink / raw)
To: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet
Cc: Alejandro Lucero
On 6/24/25 7:13 AM, alejandro.lucero-palau@amd.com wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> The first step for a CXL accelerator driver that wants to establish new
> CXL.mem regions is to register a 'struct cxl_memdev'. That kicks off
> cxl_mem_probe() to enumerate all 'struct cxl_port' instances in the
> topology up to the root.
>
> If the port driver has not attached yet the expectation is that the
> driver waits until that link is established. The common cxl_pci driver
> has reason to keep the 'struct cxl_memdev' device attached to the bus
> until the root driver attaches. An accelerator may want to instead defer
> probing until CXL resources can be acquired.
>
> Use the @endpoint attribute of a 'struct cxl_memdev' to convey when an
> accelerator driver probing should be deferred vs failed. Provide that
> indication via a new cxl_acquire_endpoint() API that can retrieve the
> probe status of the memdev.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> ---
> drivers/cxl/core/memdev.c | 42 +++++++++++++++++++++++++++++++++++++++
> drivers/cxl/core/port.c | 2 +-
> drivers/cxl/mem.c | 7 +++++--
> include/cxl/cxl.h | 2 ++
> 4 files changed, 50 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index f43d2aa2928e..e2c6b5b532db 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c
> @@ -1124,6 +1124,48 @@ struct cxl_memdev *devm_cxl_add_memdev(struct device *host,
> }
> EXPORT_SYMBOL_NS_GPL(devm_cxl_add_memdev, "CXL");
>
> +/*
> + * Try to get a locked reference on a memdev's CXL port topology
> + * connection. Be careful to observe when cxl_mem_probe() has deposited
> + * a probe deferral awaiting the arrival of the CXL root driver.
> + */
> +struct cxl_port *cxl_acquire_endpoint(struct cxl_memdev *cxlmd)
An __acquires() annotation is needed here to indicate that this function takes multiple locks and returns with them held.
> +{
> + struct cxl_port *endpoint;
> + int rc = -ENXIO;
> +
> + device_lock(&cxlmd->dev);
> +
> + endpoint = cxlmd->endpoint;
> + if (!endpoint)
> + goto err;
> +
> + if (IS_ERR(endpoint)) {
> + rc = PTR_ERR(endpoint);
> + goto err;
> + }
> +
> + device_lock(&endpoint->dev);
> + if (!endpoint->dev.driver)
> + goto err_endpoint;
> +
> + return endpoint;
> +
> +err_endpoint:
> + device_unlock(&endpoint->dev);
> +err:
> + device_unlock(&cxlmd->dev);
> + return ERR_PTR(rc);
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_acquire_endpoint, "CXL");
> +
> +void cxl_release_endpoint(struct cxl_memdev *cxlmd, struct cxl_port *endpoint)
And __releases() here to annotate the release of those locks.
> +{
> + device_unlock(&endpoint->dev);
> + device_unlock(&cxlmd->dev);
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_release_endpoint, "CXL");
> +
> static void sanitize_teardown_notifier(void *data)
> {
> struct cxl_memdev_state *mds = data;
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 9acf8c7afb6b..fa10a1643e4c 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -1563,7 +1563,7 @@ static int add_port_attach_ep(struct cxl_memdev *cxlmd,
> */
> dev_dbg(&cxlmd->dev, "%s is a root dport\n",
> dev_name(dport_dev));
> - return -ENXIO;
> + return -EPROBE_DEFER;
> }
>
> struct cxl_port *parent_port __free(put_cxl_port) =
> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> index 7f39790d9d98..cda0b2ff73ce 100644
> --- a/drivers/cxl/mem.c
> +++ b/drivers/cxl/mem.c
> @@ -148,14 +148,17 @@ static int cxl_mem_probe(struct device *dev)
> return rc;
>
> rc = devm_cxl_enumerate_ports(cxlmd);
> - if (rc)
> + if (rc) {
> + cxlmd->endpoint = ERR_PTR(rc);
> return rc;
> + }
>
> struct cxl_port *parent_port __free(put_cxl_port) =
> cxl_mem_find_port(cxlmd, &dport);
> if (!parent_port) {
> dev_err(dev, "CXL port topology not found\n");
> - return -ENXIO;
> + cxlmd->endpoint = ERR_PTR(-EPROBE_DEFER);
A kdoc update to 'struct cxl_memdev' will be needed to explain this change of expectation for the endpoint member.
> + return -EPROBE_DEFER;
Can you please explain how the accelerator driver init path is different in this instance, such that it requires the cxl_mem driver to defer probing? Currently, with a type3 device, the cxl_acpi driver will set up the CXL root, host bridges and PCI root ports. At that point the memdev driver will enumerate the rest of the ports and attempt to establish the hierarchy. However, if cxl_acpi is not done, the mem probe will fail. But the cxl_acpi probe will trigger a re-probe sequence at the end when it is done. At that point, the mem probe should discover all the necessary ports if things are correct. If the accelerator init path is different, can we introduce some documentation to explain the difference?
Also, it seems that as long as the port topology is not found, it will always go to deferred probing. At what point do we conclude that things may be missing/broken and we need to fail?
DJ
> }
>
> if (cxl_pmem_size(cxlds) && IS_ENABLED(CONFIG_CXL_PMEM)) {
> diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
> index fcdf98231ffb..2928e16a62e2 100644
> --- a/include/cxl/cxl.h
> +++ b/include/cxl/cxl.h
> @@ -234,4 +234,6 @@ int cxl_map_component_regs(const struct cxl_register_map *map,
> void cxl_set_capacity(struct cxl_dev_state *cxlds, u64 capacity);
> struct cxl_memdev *devm_cxl_add_memdev(struct device *host,
> struct cxl_dev_state *cxlmds);
> +struct cxl_port *cxl_acquire_endpoint(struct cxl_memdev *cxlmd);
> +void cxl_release_endpoint(struct cxl_memdev *cxlmd, struct cxl_port *endpoint);
> #endif /* __CXL_CXL_H__ */
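To make the new expectation for cxlmd->endpoint concrete: after this change the member can hold NULL, a valid port, or an errno encoded with ERR_PTR(). Below is a minimal userspace sketch of that encoding, re-creating the <linux/err.h> helpers locally. struct port, enum endpoint_state and classify_endpoint() are hypothetical, and treating NULL as "not probed yet" is an assumption rather than something the patch states.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace re-creation of the kernel's <linux/err.h> helpers. */
#define MAX_ERRNO	4095
#define EPROBE_DEFER	517	/* kernel-internal errno, absent from userspace errno.h */

static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	/* The top MAX_ERRNO addresses are reserved for encoded errors. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct port { int id; };	/* hypothetical stand-in for struct cxl_port */

/* Hypothetical view an accelerator driver might take of cxlmd->endpoint. */
enum endpoint_state { EP_NOT_PROBED, EP_READY, EP_DEFERRED, EP_FAILED };

static enum endpoint_state classify_endpoint(const struct port *endpoint)
{
	if (!endpoint)
		return EP_NOT_PROBED;	/* assumption: member starts out NULL */
	if (!IS_ERR(endpoint))
		return EP_READY;
	return PTR_ERR(endpoint) == -EPROBE_DEFER ? EP_DEFERRED : EP_FAILED;
}
```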
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH v17 13/22] cxl: Define a driver interface for DPA allocation
2025-06-24 14:13 ` [PATCH v17 13/22] cxl: Define a driver interface for DPA allocation alejandro.lucero-palau
2025-06-27 9:06 ` Jonathan Cameron
@ 2025-06-27 20:46 ` Dave Jiang
2025-07-04 15:21 ` Alejandro Lucero Palau
1 sibling, 1 reply; 112+ messages in thread
From: Dave Jiang @ 2025-06-27 20:46 UTC (permalink / raw)
To: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet
Cc: Alejandro Lucero, Ben Cheatham, Jonathan Cameron
On 6/24/25 7:13 AM, alejandro.lucero-palau@amd.com wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Region creation involves finding available DPA (device-physical-address)
> capacity to map into HPA (host-physical-address) space.
>
> In order to support CXL Type2 devices, define an API, cxl_request_dpa(),
> that tries to allocate the DPA memory the driver requires to operate. The
> memory requested should not be bigger than the max available HPA obtained
> previously with cxl_get_hpa_freespace.
cxl_get_hpa_freespace()
>
> Based on https://lore.kernel.org/linux-cxl/168592158743.1948938.7622563891193802610.stgit@dwillia2-xfh.jf.intel.com/
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
> drivers/cxl/core/hdm.c | 93 ++++++++++++++++++++++++++++++++++++++++++
> drivers/cxl/cxl.h | 2 +
> include/cxl/cxl.h | 5 +++
> 3 files changed, 100 insertions(+)
>
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index 70cae4ebf8a4..b17381e49836 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -3,6 +3,7 @@
> #include <linux/seq_file.h>
> #include <linux/device.h>
> #include <linux/delay.h>
> +#include <cxl/cxl.h>
>
> #include "cxlmem.h"
> #include "core.h"
> @@ -546,6 +547,13 @@ resource_size_t cxl_dpa_resource_start(struct cxl_endpoint_decoder *cxled)
> return base;
> }
>
> +/**
> + * cxl_dpa_free - release DPA (Device Physical Address)
> + *
> + * @cxled: endpoint decoder linked to the DPA
> + *
> + * Returns 0 or error.
> + */
> int cxl_dpa_free(struct cxl_endpoint_decoder *cxled)
> {
> struct cxl_port *port = cxled_to_port(cxled);
> @@ -572,6 +580,7 @@ int cxl_dpa_free(struct cxl_endpoint_decoder *cxled)
> devm_cxl_dpa_release(cxled);
> return 0;
> }
> +EXPORT_SYMBOL_NS_GPL(cxl_dpa_free, "CXL");
>
> int cxl_dpa_set_part(struct cxl_endpoint_decoder *cxled,
> enum cxl_partition_mode mode)
> @@ -686,6 +695,90 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size)
> return devm_add_action_or_reset(&port->dev, cxl_dpa_release, cxled);
> }
>
> +static int find_free_decoder(struct device *dev, const void *data)
> +{
> + struct cxl_endpoint_decoder *cxled;
> + struct cxl_port *port;
> +
> + if (!is_endpoint_decoder(dev))
> + return 0;
> +
> + cxled = to_cxl_endpoint_decoder(dev);
> + port = cxled_to_port(cxled);
> +
> + if (cxled->cxld.id != port->hdm_end + 1)
> + return 0;
> +
> + return 1;
return cxled->cxld.id == port->hdm_end + 1;
> +}
> +
> +static struct cxl_endpoint_decoder *
> +cxl_find_free_decoder(struct cxl_memdev *cxlmd)
> +{
> + struct cxl_port *endpoint = cxlmd->endpoint;
> + struct device *dev;
> +
> + scoped_guard(rwsem_read, &cxl_dpa_rwsem) {
Probably OK to just use guard() here.
> + dev = device_find_child(&endpoint->dev, NULL,
> + find_free_decoder);
> + }
> + if (dev)
> + return to_cxl_endpoint_decoder(dev);
> +
> + return NULL;
> +}
> +
> +/**
> + * cxl_request_dpa - search and reserve DPA given input constraints
> + * @cxlmd: memdev with an endpoint port with available decoders
> + * @mode: DPA operation mode (ram vs pmem)
> + * @alloc: dpa size required
> + *
> + * Returns a pointer to a cxl_endpoint_decoder struct or an error
> + *
> + * Given that a region needs to allocate from limited HPA capacity it
> + * may be the case that a device has more mappable DPA capacity than
> + * available HPA. The expectation is that @alloc is a driver known
> + * value based on the device capacity but it could not be available
> + * due to HPA constraints.
> + *
> + * Returns a pinned cxl_decoder with at least @alloc bytes of capacity
> + * reserved, or an error pointer. The caller is also expected to own the
> + * lifetime of the memdev registration associated with the endpoint to
> + * pin the decoder registered as well.
> + */
> +struct cxl_endpoint_decoder *cxl_request_dpa(struct cxl_memdev *cxlmd,
> + enum cxl_partition_mode mode,
> + resource_size_t alloc)
> +{
> + struct cxl_endpoint_decoder *cxled __free(put_cxled) =
> + cxl_find_free_decoder(cxlmd);
> + struct device *cxled_dev;
> + int rc;
> +
> + if (!IS_ALIGNED(alloc, SZ_256M))
> + return ERR_PTR(-EINVAL);
> +
> + if (!cxled) {
> + rc = -ENODEV;
> + goto err;
return ERR_PTR(-ENODEV);
cxled_dev is not set here. In fact, it is never set anywhere, so the put_device() later will fail. Although the __free() should take care of this anyway, right? So the err path isn't necessary?
> + }
> +
> + rc = cxl_dpa_set_part(cxled, mode);
> + if (rc)
> + goto err;
> +
> + rc = cxl_dpa_alloc(cxled, alloc);
> + if (rc)
> + goto err;
> +
> + return cxled;
return no_free_ptr(cxled);
DJ
> +err:
> + put_device(cxled_dev);
> + return ERR_PTR(rc);
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_request_dpa, "CXL");
> +
> static void cxld_set_interleave(struct cxl_decoder *cxld, u32 *ctrl)
> {
> u16 eig;
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 3af8821f7c15..6e724a8440f5 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -636,6 +636,8 @@ void put_cxl_root(struct cxl_root *cxl_root);
> DEFINE_FREE(put_cxl_root, struct cxl_root *, if (_T) put_cxl_root(_T))
>
> DEFINE_FREE(put_cxl_port, struct cxl_port *, if (!IS_ERR_OR_NULL(_T)) put_device(&_T->dev))
> +DEFINE_FREE(put_cxled, struct cxl_endpoint_decoder *, if (_T) put_device(&_T->cxld.dev))
> +
> int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd);
> void cxl_bus_rescan(void);
> void cxl_bus_drain(void);
> diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
> index dd37b1d88454..a2f3e683724a 100644
> --- a/include/cxl/cxl.h
> +++ b/include/cxl/cxl.h
> @@ -7,6 +7,7 @@
>
> #include <linux/node.h>
> #include <linux/ioport.h>
> +#include <linux/range.h>
> #include <cxl/mailbox.h>
>
> /**
> @@ -247,4 +248,8 @@ struct cxl_root_decoder *cxl_get_hpa_freespace(struct cxl_memdev *cxlmd,
> unsigned long flags,
> resource_size_t *max);
> void cxl_put_root_decoder(struct cxl_root_decoder *cxlrd);
> +struct cxl_endpoint_decoder *cxl_request_dpa(struct cxl_memdev *cxlmd,
> + enum cxl_partition_mode mode,
> + resource_size_t alloc);
> +int cxl_dpa_free(struct cxl_endpoint_decoder *cxled);
> #endif /* __CXL_CXL_H__ */
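Dave's point about the err label can be illustrated outside the kernel. The kernel's __free() is built on the GCC/Clang cleanup attribute, and no_free_ptr() hands the pointer out while disarming the cleanup, so an early return drops the reference on its own and no goto is needed. The sketch below models that with a hypothetical refcounted struct obj; auto_put_obj and take_ptr() are local stand-ins for __free(put_cxled) and no_free_ptr(), not the kernel macros.

```c
#include <stddef.h>

/* Hypothetical refcounted object standing in for a cxl_endpoint_decoder. */
struct obj { int refs; };

static int nr_puts;	/* counts put_obj() calls that dropped a reference */

/* Like device_find_child(): the returned object carries a reference. */
static struct obj *get_obj(struct obj *o)
{
	if (o)
		o->refs++;
	return o;
}

static void put_obj(struct obj *o)
{
	if (o) {
		o->refs--;
		nr_puts++;
	}
}

/* Cleanup handler: the compiler passes a pointer to the annotated variable. */
static void put_obj_cleanup(struct obj **p) { put_obj(*p); }
#define auto_put_obj __attribute__((cleanup(put_obj_cleanup)))

/* Like no_free_ptr(): return the pointer and leave NULL behind. */
#define take_ptr(p) ({ __typeof__(p) taken_ = (p); (p) = NULL; taken_; })

/* Same shape as cxl_request_dpa(), without any goto/err label. */
static struct obj *request(struct obj *candidate, int fail)
{
	struct obj *o auto_put_obj = get_obj(candidate);

	if (!o)
		return NULL;	/* nothing acquired, cleanup is a no-op */
	if (fail)
		return NULL;	/* cleanup drops the reference automatically */
	return take_ptr(o);	/* success: the caller now owns the reference */
}
```

Because the cleanup runs after the return expression is evaluated, take_ptr() has already left NULL in the variable on the success path, so the handler does nothing there.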
* Re: [PATCH v17 11/22] cxl: Define a driver interface for HPA free space enumeration
2025-06-24 14:13 ` [PATCH v17 11/22] cxl: Define a driver interface for HPA free space enumeration alejandro.lucero-palau
@ 2025-06-27 22:42 ` Dave Jiang
2025-07-04 14:45 ` Alejandro Lucero Palau
2025-08-05 16:14 ` dan.j.williams
1 sibling, 1 reply; 112+ messages in thread
From: Dave Jiang @ 2025-06-27 22:42 UTC (permalink / raw)
To: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet
Cc: Alejandro Lucero, Jonathan Cameron
On 6/24/25 7:13 AM, alejandro.lucero-palau@amd.com wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> CXL region creation involves allocating capacity from device DPA
> (device-physical-address space) and assigning it to decode a given HPA
from Device Physical Address (DPA)
> (host-physical-address space). Before determining how much DPA to
Host Physical Address (HPA)
> allocate the amount of available HPA must be determined. Also, not all
> HPA is created equal, some specifically targets RAM, some target PMEM,
s/equal, some specifically targets/equal. Some HPA targets/
s/target PMEM/targets PMEM/
> some is prepared for device-memory flows like HDM-D and HDM-DB, and some
> is host-only (HDM-H).
s/host-only (HDM-H)/HDM-H (host-only)/
>
> In order to support Type2 CXL devices, wrap all of those concerns into
> an API that retrieves a root decoder (platform CXL window) that fits the
> specified constraints and the capacity available for a new region.
>
> Add a complementary function for releasing the reference to such root
> decoder.
>
> Based on https://lore.kernel.org/linux-cxl/168592159290.1948938.13522227102445462976.stgit@dwillia2-xfh.jf.intel.com/
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
> drivers/cxl/core/region.c | 169 ++++++++++++++++++++++++++++++++++++++
> drivers/cxl/cxl.h | 3 +
> include/cxl/cxl.h | 11 +++
> 3 files changed, 183 insertions(+)
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index c3f4dc244df7..03e058ab697e 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -695,6 +695,175 @@ static int free_hpa(struct cxl_region *cxlr)
> return 0;
> }
>
> +struct cxlrd_max_context {
> + struct device * const *host_bridges;
> + int interleave_ways;
> + unsigned long flags;
> + resource_size_t max_hpa;
> + struct cxl_root_decoder *cxlrd;
> +};
> +
> +static int find_max_hpa(struct device *dev, void *data)
> +{
> + struct cxlrd_max_context *ctx = data;
> + struct cxl_switch_decoder *cxlsd;
> + struct cxl_root_decoder *cxlrd;
> + struct resource *res, *prev;
> + struct cxl_decoder *cxld;
> + resource_size_t max;
> + int found = 0;
> +
> + if (!is_root_decoder(dev))
> + return 0;
> +
> + cxlrd = to_cxl_root_decoder(dev);
> + cxlsd = &cxlrd->cxlsd;
> + cxld = &cxlsd->cxld;
> +
> + /*
> + * Flags are single unsigned longs. As CXL_DECODER_F_MAX is less than
> + * 32 bits, the bitmap functions can be used.
Should this be type2_find_max_hpa() since CXL_DECODER_F_MAX here is defined to only check up to the type2 bit?
> + */
> + if (!bitmap_subset(&ctx->flags, &cxld->flags, CXL_DECODER_F_MAX)) {
> + dev_dbg(dev, "flags not matching: %08lx vs %08lx\n",
> + cxld->flags, ctx->flags);
> + return 0;
> + }
> +
> + for (int i = 0; i < ctx->interleave_ways; i++) {
> + for (int j = 0; j < ctx->interleave_ways; j++) {
> + if (ctx->host_bridges[i] == cxlsd->target[j]->dport_dev) {
> + found++;
> + break;
> + }
> + }
> + }
> +
> + if (found != ctx->interleave_ways) {
Is 'found' ever greater than 1, given that it breaks immediately above on the first match? Should the break be there?
> + dev_dbg(dev,
> + "Not enough host bridges. Found %d for %d interleave ways requested\n",
> + found, ctx->interleave_ways);
> + return 0;
> + }
> +
> + /*
> + * Walk the root decoder resource range relying on cxl_region_rwsem to
> + * preclude sibling arrival/departure and find the largest free space
> + * gap.
> + */
> + lockdep_assert_held_read(&cxl_region_rwsem);
> + res = cxlrd->res->child;
> +
> + /* With no resource child the whole parent resource is available */
> + if (!res)
> + max = resource_size(cxlrd->res);
> + else
> + max = 0;
> +
> + for (prev = NULL; res; prev = res, res = res->sibling) {
> + struct resource *next = res->sibling;
Could res be NULL here on the first iteration with res = cxlrd->res->child && res == NULL? Maybe set res to cxlrd->res above, and then pass in res->child to resource_size() above?
> + resource_size_t free = 0;
> +
> + /*
> + * Sanity check for preventing arithmetic problems below as a
> + * resource with size 0 could imply using the end field below
> + * when set to unsigned zero - 1 or all f in hex.
> + */
> + if (prev && !resource_size(prev))
> + continue;
> +
> + if (!prev && res->start > cxlrd->res->start) {
> + free = res->start - cxlrd->res->start;
> + max = max(free, max);
> + }
> + if (prev && res->start > prev->end + 1) {
> + free = res->start - prev->end + 1;
> + max = max(free, max);
> + }
Can you do something like this to avoid repeatedly checking prev?
	if (prev) {
		if (!resource_size(prev))
			continue;
		if (res->start > prev->end + 1) {
			...
		}
	} else {
		if (res->start > cxlrd->res->start) {
			...
		}
	}
> + if (next && res->end + 1 < next->start) {
> + free = next->start - res->end + 1;
> + max = max(free, max);
> + }
> + if (!next && res->end + 1 < cxlrd->res->end + 1) {
> + free = cxlrd->res->end + 1 - res->end + 1;
> + max = max(free, max);
> + }
Maybe
	if (next) {
		...
	} else {
		...
	}
> + }
> +
> + dev_dbg(CXLRD_DEV(cxlrd), "found %pa bytes of free space\n", &max);
> + if (max > ctx->max_hpa) {
> + if (ctx->cxlrd)
> + put_device(CXLRD_DEV(ctx->cxlrd));
> + get_device(CXLRD_DEV(cxlrd));
Should this ref grab be a lot earlier in this function? Before we start using the cxlrd members?
> + ctx->cxlrd = cxlrd;
> + ctx->max_hpa = max;
> + }
> + return 0;
> +}
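The free-space walk above can also be expressed as a single pass with a cursor, which sidesteps the repeated prev/next checks discussed in this review. This is a userspace sketch of that alternative, not the patch's algorithm: struct range_incl stands in for struct resource, children are assumed sorted by start, non-overlapping and contained in the parent, parent end is assumed below UINT64_MAX, and zero-size children are skipped. Because end is inclusive, the gap before a child is child_start - (prev_end + 1); the parentheses matter.

```c
#include <stddef.h>
#include <stdint.h>

/* Inclusive [start, end] range, mirroring struct resource's convention. */
struct range_incl {
	uint64_t start;
	uint64_t end;
};

/*
 * Largest free gap inside @parent given @n children sorted by start,
 * non-overlapping and contained in @parent. With no children, the whole
 * parent is free. Assumes parent->end < UINT64_MAX so "end + 1" can't wrap.
 */
static uint64_t largest_free_gap(const struct range_incl *parent,
				 const struct range_incl *child, size_t n)
{
	uint64_t cursor = parent->start;	/* first byte not known to be busy */
	uint64_t max = 0;

	for (size_t i = 0; i < n; i++) {
		/* Zero-size child: end == start - 1, so it does not split a gap. */
		if (child[i].end + 1 == child[i].start)
			continue;
		/* Free bytes are [cursor, child.start - 1]. */
		if (child[i].start > cursor && child[i].start - cursor > max)
			max = child[i].start - cursor;
		cursor = child[i].end + 1;
	}

	/* Tail gap after the last busy child, up to and including parent->end. */
	if (cursor <= parent->end && parent->end - cursor + 1 > max)
		max = parent->end - cursor + 1;

	return max;
}
```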
> +
> +/**
> + * cxl_get_hpa_freespace - find a root decoder with free capacity per constraints
> + * @endpoint: the endpoint requiring the HPA
> + * @interleave_ways: number of entries in @host_bridges
> + * @flags: CXL_DECODER_F flags for selecting RAM vs PMEM, and Type2 device
> + * @max_avail_contig: output parameter of max contiguous bytes available in the
> + * returned decoder
> + *
> + * Returns a pointer to a struct cxl_root_decoder
> + *
> + * The return tuple of a 'struct cxl_root_decoder' and 'bytes available given
> + * in (@max_avail_contig))' is a point in time snapshot. If by the time the
> + * caller goes to use this root decoder's capacity the capacity is reduced then
> + * caller needs to loop and retry.
> + *
> + * The returned root decoder has an elevated reference count that needs to be
> + * put with cxl_put_root_decoder(cxlrd).
> + */
> +struct cxl_root_decoder *cxl_get_hpa_freespace(struct cxl_memdev *cxlmd,
> + int interleave_ways,
> + unsigned long flags,
> + resource_size_t *max_avail_contig)
> +{
> + struct cxl_port *endpoint = cxlmd->endpoint;
> + struct cxlrd_max_context ctx = {
> + .host_bridges = &endpoint->host_bridge,
> + .flags = flags,
> + };
> + struct cxl_port *root_port;
> + struct cxl_root *root __free(put_cxl_root) = find_cxl_root(endpoint);
This whole line should move down to just before the check for 'root'.
> +
> + if (!endpoint) {
> + dev_dbg(&cxlmd->dev, "endpoint not linked to memdev\n");
> + return ERR_PTR(-ENXIO);
> + }
> +
> + if (!root) {
> + dev_dbg(&endpoint->dev, "endpoint can not be related to a root port\n");
> + return ERR_PTR(-ENXIO);
> + }
> +
> + root_port = &root->port;
> + scoped_guard(rwsem_read, &cxl_region_rwsem)
> + device_for_each_child(&root_port->dev, &ctx, find_max_hpa);
> +
> + if (!ctx.cxlrd)
> + return ERR_PTR(-ENOMEM);
> +
> + *max_avail_contig = ctx.max_hpa;
> + return ctx.cxlrd;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_get_hpa_freespace, "CXL");
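The kdoc above says the returned root decoder and byte count are only a point-in-time snapshot, so a caller may need to loop and retry. Below is a hedged userspace sketch of such a bounded retry loop; struct hpa_pool, query_free(), reserve() and reserve_with_retry() are all hypothetical, and the steal_once field exists only to simulate a concurrent allocation. In real code, per the kdoc, each cxl_get_hpa_freespace() call returns a referenced root decoder that has to be released with cxl_put_root_decoder() before retrying.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for a root decoder's free HPA capacity. */
struct hpa_pool {
	uint64_t free;
	uint64_t steal_once;	/* test hook: capacity a "concurrent" user grabs */
};

/* Point-in-time snapshot, like the @max_avail_contig output. */
static uint64_t query_free(const struct hpa_pool *p)
{
	return p->free;
}

static int reserve(struct hpa_pool *p, uint64_t size)
{
	/* Simulate another allocation landing between query and reserve. */
	if (p->steal_once) {
		p->free -= p->steal_once;
		p->steal_once = 0;
	}
	if (size > p->free)
		return -ENOSPC;
	p->free -= size;
	return 0;
}

/*
 * Reserve up to @want bytes, accepting no less than @min, re-querying after
 * a stale snapshot at most @attempts times.
 */
static int reserve_with_retry(struct hpa_pool *p, uint64_t want, uint64_t min,
			      int attempts, uint64_t *got)
{
	while (attempts-- > 0) {
		uint64_t avail = query_free(p);
		uint64_t size = avail < want ? avail : want;

		if (size < min)
			return -ENOSPC;	/* not enough even in the snapshot */
		if (reserve(p, size) == 0) {
			*got = size;
			return 0;
		}
		/* Snapshot was stale: loop and query again. */
	}
	return -EAGAIN;
}
```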
> +
> +/*
> + * TODO: those references released here should avoid the decoder to be
> + * unregistered.
Are we missing code? Is it done later in the series?
> + */
> +void cxl_put_root_decoder(struct cxl_root_decoder *cxlrd)
> +{
> + put_device(CXLRD_DEV(cxlrd));
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_put_root_decoder, "CXL");
> +
> static ssize_t size_store(struct device *dev, struct device_attribute *attr,
> const char *buf, size_t len)
> {
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index b35eff0977a8..3af8821f7c15 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -665,6 +665,9 @@ struct cxl_root_decoder *to_cxl_root_decoder(struct device *dev);
> struct cxl_switch_decoder *to_cxl_switch_decoder(struct device *dev);
> struct cxl_endpoint_decoder *to_cxl_endpoint_decoder(struct device *dev);
> bool is_root_decoder(struct device *dev);
> +
> +#define CXLRD_DEV(cxlrd) (&(cxlrd)->cxlsd.cxld.dev)
Maybe use lower case, to keep the formatting consistent with other similar macros in the CXL subsystem?
> +
> bool is_switch_decoder(struct device *dev);
> bool is_endpoint_decoder(struct device *dev);
> struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
> diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
> index 2928e16a62e2..dd37b1d88454 100644
> --- a/include/cxl/cxl.h
> +++ b/include/cxl/cxl.h
> @@ -25,6 +25,11 @@ enum cxl_devtype {
>
> struct device;
>
> +#define CXL_DECODER_F_RAM BIT(0)
> +#define CXL_DECODER_F_PMEM BIT(1)
> +#define CXL_DECODER_F_TYPE2 BIT(2)
> +#define CXL_DECODER_F_MAX 3
> +
This is a redefinition of the flags in drivers/cxl/cxl.h. Should those definitions be deleted? Maybe move the whole thing over here and call CXL_DECODER_F_MAX something else?
> /*
> * Using struct_group() allows for per register-block-type helper routines,
> * without requiring block-type agnostic code to include the prefix.
> @@ -236,4 +241,10 @@ struct cxl_memdev *devm_cxl_add_memdev(struct device *host,
> struct cxl_dev_state *cxlmds);
> struct cxl_port *cxl_acquire_endpoint(struct cxl_memdev *cxlmd);
> void cxl_release_endpoint(struct cxl_memdev *cxlmd, struct cxl_port *endpoint);
> +struct cxl_port;
> +struct cxl_root_decoder *cxl_get_hpa_freespace(struct cxl_memdev *cxlmd,
> + int interleave_ways,
> + unsigned long flags,
> + resource_size_t *max);
> +void cxl_put_root_decoder(struct cxl_root_decoder *cxlrd);
> #endif /* __CXL_CXL_H__ */
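The two filters at the top of find_max_hpa() can be modelled in userspace as below. flags_subset() is a single-word equivalent of the bitmap_subset() call, valid under the patch's note that the flags fit in one unsigned long, and count_matching_bridges() mirrors the nested loop with opaque pointers standing in for host bridge devices. Both helpers are hypothetical. Note that in C a break only leaves the innermost loop, so the outer loop still visits every requested host bridge.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Single-word equivalent of bitmap_subset(&want, &have, nbits): every bit
 * set in @want within the low @nbits bits is also set in @have.
 * Assumes 0 < nbits < number of bits in unsigned long.
 */
static bool flags_subset(unsigned long want, unsigned long have,
			 unsigned int nbits)
{
	unsigned long mask = (1UL << nbits) - 1;

	return (want & ~have & mask) == 0;
}

/*
 * Count requested host bridges found among the decoder targets. The break
 * exits only the inner loop, so each requested bridge adds at most one and
 * the outer loop moves on to the next bridge.
 */
static int count_matching_bridges(const void *const *wanted,
				  const void *const *targets, int ways)
{
	int found = 0;

	for (int i = 0; i < ways; i++) {
		for (int j = 0; j < ways; j++) {
			if (wanted[i] == targets[j]) {
				found++;
				break;
			}
		}
	}
	return found;
}
```

With the bit values from include/cxl/cxl.h (RAM = 1, PMEM = 2, TYPE2 = 4), a RAM|TYPE2 request (0x5) matches a decoder advertising RAM|PMEM|TYPE2 (0x7) but not one advertising only RAM|PMEM (0x3).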
* Re: [PATCH v17 16/22] cxl/region: Factor out interleave ways setup
2025-06-27 9:13 ` Jonathan Cameron
@ 2025-06-27 23:05 ` Dave Jiang
2025-06-30 16:20 ` Jonathan Cameron
0 siblings, 1 reply; 112+ messages in thread
From: Dave Jiang @ 2025-06-27 23:05 UTC (permalink / raw)
To: Jonathan Cameron, alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, Alejandro Lucero, Zhi Wang, Ben Cheatham
On 6/27/25 2:13 AM, Jonathan Cameron wrote:
> On Tue, 24 Jun 2025 15:13:49 +0100
> <alejandro.lucero-palau@amd.com> wrote:
>
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> Region creation based on Type3 devices is triggered from user space
>> allowing memory combination through interleaving.
>>
>> In preparation for kernel driven region creation, that is Type2 drivers
>> triggering region creation backed with its advertised CXL memory, factor
>> out a common helper from the user-sysfs region setup for interleave ways.
>>
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>> Reviewed-by: Zhi Wang <zhiw@nvidia.com>
>> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
> As a heads up, this code changes a fair bit in Dan's ACQUIRE() series
> that may well land before this. Dave can ask for whatever resolution he
> wants when we get to that stage!
>
>
We probably want to rebase on top of that. Dan has an immutable branch in cxl.git for the ACQUIRE() patch. Or are you talking about the outstanding CXL changes?
* Re: [PATCH v17 01/22] cxl: Add type2 device basic support
2025-06-25 14:06 ` Jonathan Cameron
@ 2025-06-30 14:38 ` Alejandro Lucero Palau
0 siblings, 0 replies; 112+ messages in thread
From: Alejandro Lucero Palau @ 2025-06-30 14:38 UTC (permalink / raw)
To: Jonathan Cameron, alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang, Alison Schofield
On 6/25/25 15:06, Jonathan Cameron wrote:
> On Tue, 24 Jun 2025 15:13:34 +0100
> <alejandro.lucero-palau@amd.com> wrote:
>
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> Differentiate CXL memory expanders (type 3) from CXL device accelerators
>> (type 2) with a new function for initializing cxl_dev_state and a macro
>> for helping accel drivers to embed cxl_dev_state inside a private
>> struct.
>>
>> Move structs to include/cxl as the size of the accel driver private
>> struct embedding cxl_dev_state needs to know the size of this struct.
>>
>> Use same new initialization with the type3 pci driver.
>>
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
>> Reviewed-by: Alison Schofield <alison.schofield@intel.com>
> Hi Alejandro,
>
> A few really minor comments inline.
Hi Jonathan,
>> ---
>> drivers/cxl/core/mbox.c | 12 +-
>> drivers/cxl/core/memdev.c | 32 +++++
>> drivers/cxl/core/pci.c | 1 +
>> drivers/cxl/core/regs.c | 1 +
>> drivers/cxl/cxl.h | 97 +--------------
>> drivers/cxl/cxlmem.h | 85 +------------
>> drivers/cxl/cxlpci.h | 21 ----
>> drivers/cxl/pci.c | 17 +--
>> include/cxl/cxl.h | 226 +++++++++++++++++++++++++++++++++++
>> include/cxl/pci.h | 23 ++++
>> tools/testing/cxl/test/mem.c | 3 +-
>> 11 files changed, 303 insertions(+), 215 deletions(-)
>> create mode 100644 include/cxl/cxl.h
>> create mode 100644 include/cxl/pci.h
>>
>> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
>> index d72764056ce6..d78f6039f997 100644
>> --- a/drivers/cxl/core/mbox.c
>> +++ b/drivers/cxl/core/mbox.c
>> @@ -1484,23 +1484,21 @@ int cxl_mailbox_init(struct cxl_mailbox *cxl_mbox, struct device *host)
>> }
>> EXPORT_SYMBOL_NS_GPL(cxl_mailbox_init, "CXL");
>>
>> -struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev)
>> +struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev, u64 serial,
>> + u16 dvsec)
>> {
>> struct cxl_memdev_state *mds;
>> int rc;
>>
>> - mds = devm_kzalloc(dev, sizeof(*mds), GFP_KERNEL);
>> + mds = devm_cxl_dev_state_create(dev, CXL_DEVTYPE_CLASSMEM, serial,
>> + dvsec, struct cxl_memdev_state, cxlds,
>> + true);
>
>> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
>> index 3ec6b906371b..9cc4337cacfb 100644
>> --- a/drivers/cxl/cxlmem.h
>> +++ b/drivers/cxl/cxlmem.h
>> @@ -7,6 +7,7 @@
>> #include <linux/cdev.h>
>> #include <linux/uuid.h>
>> #include <linux/node.h>
> Is node still used in here? If the includes was just for
> struct access_coordinates then that is now gone from this file.
Good catch. It is not needed after the code movement.
I'll remove it.
Thanks!
>> +#include <cxl/cxl.h>
>> #include <cxl/event.h>
>> #include <cxl/mailbox.h>
>> #include "cxl.h"
>> @@ -357,87 +358,6 @@ struct cxl_security_state {
>> struct kernfs_node *sanitize_node;
>> };
>> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
>> index 785aa2af5eaa..0d3c67867965 100644
>> --- a/drivers/cxl/pci.c
>> +++ b/drivers/cxl/pci.c
>> @@ -924,19 +927,19 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>> return rc;
>> pci_set_master(pdev);
>>
>> - mds = cxl_memdev_state_create(&pdev->dev);
>> + dvsec = pci_find_dvsec_capability(pdev, PCI_VENDOR_ID_CXL,
>> + CXL_DVSEC_PCIE_DEVICE);
>> + if (!dvsec)
>> + dev_warn(&pdev->dev,
>> + "Device DVSEC not present, skip CXL.mem init\n");
> Could use pci_warn(pdev, "..."); Not particularly important.
>
>> +
>> + mds = cxl_memdev_state_create(&pdev->dev, pci_get_dsn(pdev), dvsec);
> Jonathan
>
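The embedding approach in this patch, where an accelerator driver places cxl_dev_state inside its own private struct and the core allocates the whole thing, can be sketched in userspace as follows. struct dev_state, struct accel_priv, dev_state_alloc() and dev_state_create() are hypothetical stand-ins for cxl_dev_state, the driver's struct and devm_cxl_dev_state_create(); the real helper is devm-managed and takes further arguments. The sketch also shows why the struct definitions had to move to include/cxl: the caller's sizeof() needs the complete type.

```c
#include <stddef.h>
#include <stdlib.h>

/* Trimmed-down, hypothetical stand-in for struct cxl_dev_state. */
struct dev_state {
	int type;
	unsigned long long serial;
	unsigned short dvsec;
};

/* Hypothetical accelerator private struct embedding the core state. */
struct accel_priv {
	int queues;
	struct dev_state cxlds;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Allocate @size zeroed bytes and initialise the dev_state found at @off. */
static void *dev_state_alloc(size_t size, size_t off, int type,
			     unsigned long long serial, unsigned short dvsec)
{
	char *p = calloc(1, size);
	struct dev_state *ds;

	if (!p)
		return NULL;
	ds = (struct dev_state *)(p + off);
	ds->type = type;
	ds->serial = serial;
	ds->dvsec = dvsec;
	return p;
}

/* The caller names its own struct and member; size and offset come from it. */
#define dev_state_create(drv_type, member, type, serial, dvsec)		\
	((drv_type *)dev_state_alloc(sizeof(drv_type),			\
				     offsetof(drv_type, member),	\
				     (type), (serial), (dvsec)))
```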
* Re: [PATCH v17 02/22] sfc: add cxl support
2025-06-25 16:37 ` Jonathan Cameron
@ 2025-06-30 14:52 ` Alejandro Lucero Palau
2025-06-30 14:55 ` Alejandro Lucero Palau
0 siblings, 1 reply; 112+ messages in thread
From: Alejandro Lucero Palau @ 2025-06-30 14:52 UTC (permalink / raw)
To: Jonathan Cameron, alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang, Edward Cree, Alison Schofield
On 6/25/25 17:37, Jonathan Cameron wrote:
> On Tue, 24 Jun 2025 15:13:35 +0100
> <alejandro.lucero-palau@amd.com> wrote:
>
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> Add CXL initialization based on new CXL API for accel drivers and make
>> it dependent on kernel CXL configuration.
>>
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Acked-by: Edward Cree <ecree.xilinx@gmail.com>
>> Reviewed-by: Alison Schofield <alison.schofield@intel.com>
> Hi Alejandro,
>
> I think I'm missing something with respect to the relative life times.
> Throwing one devm_ call into the middle of a probe is normally a recipe
> for at least hard to read code, if not actual bugs. It should be done
> with care and accompanied by at least a comment.
Hi Jonathan,
I agree devm_* is harder in general and prone to some subtle
problems, but I cannot see an issue here apart from the objects being kept
until device unbinding. Still, I think adding a comment can help.
<snip>
> +
> + dvsec = pci_find_dvsec_capability(pci_dev, PCI_VENDOR_ID_CXL,
> + CXL_DVSEC_PCIE_DEVICE);
> + if (!dvsec)
> + return 0;
> +
> + pci_dbg(pci_dev, "CXL_DVSEC_PCIE_DEVICE capability found\n");
> +
> + /* Create a cxl_dev_state embedded in the cxl struct using cxl core api
> + * specifying no mbox available.
> + */
> + cxl = devm_cxl_dev_state_create(&pci_dev->dev, CXL_DEVTYPE_DEVMEM,
> + pci_dev->dev.id, dvsec, struct efx_cxl,
> + cxlds, false);
> The life time of this will outlast everything else in the efx driver.
> Is that definitely safe to do? Mostly from a reviewability and difficulty
> of reasoning we avoid such late releasing of resources.
>
> Perhaps add to the comment before this call what you are doing to ensure that
> it is fine to release this after everything in efx_pci_remove()
>
> Or wrap it up in a devres group and release that group in efx_cxl_exit().
>
> See devres_open_group(), devres_release_group()
>
>
As I said above, I cannot see a problem here, but explicitly
managing those resources with a devres group may make it simpler, so I think
it is good advice to follow.
Thanks!
* Re: [PATCH v17 02/22] sfc: add cxl support
2025-06-30 14:52 ` Alejandro Lucero Palau
@ 2025-06-30 14:55 ` Alejandro Lucero Palau
2025-06-30 16:07 ` Jonathan Cameron
0 siblings, 1 reply; 112+ messages in thread
From: Alejandro Lucero Palau @ 2025-06-30 14:55 UTC (permalink / raw)
To: Jonathan Cameron, alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang, Edward Cree, Alison Schofield
On 6/30/25 15:52, Alejandro Lucero Palau wrote:
>
> On 6/25/25 17:37, Jonathan Cameron wrote:
>> On Tue, 24 Jun 2025 15:13:35 +0100
>> <alejandro.lucero-palau@amd.com> wrote:
>>
>>> From: Alejandro Lucero <alucerop@amd.com>
>>>
>>> Add CXL initialization based on new CXL API for accel drivers and make
>>> it dependent on kernel CXL configuration.
>>>
>>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>>> Acked-by: Edward Cree <ecree.xilinx@gmail.com>
>>> Reviewed-by: Alison Schofield <alison.schofield@intel.com>
>> Hi Alejandro,
>>
>> I think I'm missing something with respect to the relative life times.
>> Throwing one devm_ call into the middle of a probe is normally a recipe
>> for at least hard to read code, if not actual bugs. It should be done
>> with care and accompanied by at least a comment.
>
>
> Hi Jonathan,
>
>
> I agree devm_* being harder in general and prone to some subtle
> problems, but I can not see an issue here apart from the objects kept
> until device unbinding. But I think adding some comment can help.
>
>
> <snip>
>
>> +
>> + dvsec = pci_find_dvsec_capability(pci_dev, PCI_VENDOR_ID_CXL,
>> + CXL_DVSEC_PCIE_DEVICE);
>> + if (!dvsec)
>> + return 0;
>> +
>> + pci_dbg(pci_dev, "CXL_DVSEC_PCIE_DEVICE capability found\n");
>> +
>> + /* Create a cxl_dev_state embedded in the cxl struct using cxl
>> core api
>> + * specifying no mbox available.
>> + */
>> + cxl = devm_cxl_dev_state_create(&pci_dev->dev, CXL_DEVTYPE_DEVMEM,
>> + pci_dev->dev.id, dvsec, struct efx_cxl,
>> + cxlds, false);
>> The life time of this will outlast everything else in the efx driver.
>> Is that definitely safe to do? Mostly from a reviewability and difficulty
>> of reasoning we avoid such late releasing of resources.
>>
>> Perhaps add to the comment before this call what you are doing to ensure that
>> it is fine to release this after everything in efx_pci_remove()
>>
>> Or wrap it up in a devres group and release that group in efx_cxl_exit().
>>
>> See devres_open_group(), devres_release_group()
>>
>>
>
> As I said above, I can not see a problem here, but maybe to explicitly
> managed those resources with a devres group makes it simpler, so I
> think it is a good advice to follow.
>
>
> Thanks!
>
>
FWIW, I just want to add that although I agree with this, it is somewhat
counterintuitive to me, as the goal of devm is to avoid having to care about
when those allocations are released.
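Jonathan's devres group suggestion can be illustrated with a small userspace model: actions are released newest first, and releasing a group unwinds only what was registered after its marker, leaving earlier actions for unbind. struct devres_model, open_group(), release_group() and release_all() are hypothetical stand-ins for the behaviour of devres_open_group() and devres_release_group(), not their implementation.

```c
#include <stddef.h>

#define MAX_ACTIONS 16

/* One registered release action, or a group marker (fn == NULL). */
struct action {
	void (*fn)(void *data);
	void *data;
};

/* Hypothetical per-device release list; actions run newest first. */
struct devres_model {
	struct action a[MAX_ACTIONS];
	size_t n;
};

static int add_action(struct devres_model *d, void (*fn)(void *), void *data)
{
	if (d->n == MAX_ACTIONS)
		return -1;
	d->a[d->n++] = (struct action){ fn, data };
	return 0;
}

/* Push a marker and return its slot as the group id. */
static long open_group(struct devres_model *d)
{
	if (d->n == MAX_ACTIONS)
		return -1;
	d->a[d->n] = (struct action){ NULL, NULL };
	return (long)d->n++;
}

/* Unwind everything registered after @group, then drop the marker itself. */
static void release_group(struct devres_model *d, long group)
{
	while (d->n > (size_t)group) {
		struct action *act = &d->a[--d->n];

		if (act->fn)
			act->fn(act->data);
	}
}

/* Driver unbind: whatever is left is released, still newest first. */
static void release_all(struct devres_model *d)
{
	release_group(d, 0);
}

/* Tiny recorder used to observe release order. */
static char order[MAX_ACTIONS];
static size_t order_len;
static void record(void *data) { order[order_len++] = *(const char *)data; }
```

In the sfc case, following Jonathan's suggestion, the group would be opened before devm_cxl_dev_state_create() and released from efx_cxl_exit(), so the CXL state would no longer outlive the rest of the efx teardown.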
* Re: [PATCH v17 05/22] sfc: setup cxl component regs and set media ready
2025-06-27 8:39 ` Jonathan Cameron
@ 2025-06-30 15:57 ` Alejandro Lucero Palau
2025-08-08 13:11 ` Alejandro Lucero Palau
0 siblings, 1 reply; 112+ messages in thread
From: Alejandro Lucero Palau @ 2025-06-30 15:57 UTC (permalink / raw)
To: Jonathan Cameron, alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
On 6/27/25 09:39, Jonathan Cameron wrote:
> On Tue, 24 Jun 2025 15:13:38 +0100
> alejandro.lucero-palau@amd.com wrote:
>
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> Use CXL core code for discovering and mapping the CXL component
>> registers, and validate that the registers found are as expected.
>>
>> Set media ready explicitly, as there is no means of checking it without
>> a mailbox or the related CXL register, neither of which is mandatory for
>> Type2.
>>
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Perhaps add a brief note to the description on why you decided on the
> mix of warn vs err messages in the different conditions.
After your comment, I think this divergence needs to be addressed. The
warning was inherited from the cxl pci driver, but the situation is different
here, so dev_err should be used instead.
>
> Superficially there is a call in here that can defer. If it can't,
> add a comment on why, because if it can, you should be failing the main
> driver probe until it doesn't defer (or adding a bunch of descriptive
> comments on why that doesn't make sense!)
Commenting on this below.
>
>> ---
>> drivers/net/ethernet/sfc/efx_cxl.c | 34 ++++++++++++++++++++++++++++++
>> 1 file changed, 34 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
>> index f1db7284dee8..ea02eb82b73c 100644
>> --- a/drivers/net/ethernet/sfc/efx_cxl.c
>> +++ b/drivers/net/ethernet/sfc/efx_cxl.c
>> @@ -9,6 +9,7 @@
>> * by the Free Software Foundation, incorporated herein by reference.
>> */
>>
>> +#include <cxl/cxl.h>
>> #include <cxl/pci.h>
>> #include <linux/pci.h>
>>
>> @@ -23,6 +24,7 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
>> struct pci_dev *pci_dev = efx->pci_dev;
>> struct efx_cxl *cxl;
>> u16 dvsec;
>> + int rc;
>>
>> probe_data->cxl_pio_initialised = false;
>>
>> @@ -43,6 +45,38 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
>> if (!cxl)
>> return -ENOMEM;
>>
>> + rc = cxl_pci_setup_regs(pci_dev, CXL_REGLOC_RBI_COMPONENT,
>> + &cxl->cxlds.reg_map);
>> + if (rc) {
>> + dev_warn(&pci_dev->dev, "No component registers (err=%d)\n", rc);
>> + return rc;
> I haven't checked the code paths to see if we might hit them, but this might
> defer, in which case
> return dev_err_probe() is appropriate, as it stashes away the
> cause of the deferral for debugging purposes and doesn't print if that's what
> happened, since we'll be back later.
>
> If we can hit the deferral, then you should catch that at the caller of efx_cxl_init()
> and fail the probe (we'll be back a bit later and should then succeed).
>
I'm scared of opening this can of worms ... but I think adding probe deferral
support to the sfc driver is not an option, or at least something we
want to avoid because of the complexity it would add.
If this call returns -EPROBE_DEFER, it would be because cxl_mem is not
loaded, or because of some delay in the cxl mem device initialization when
probing inside that module, since the other potential CXL modules should be
loaded at this point (subsys_initcall vs device_initcall). I am aware
there could be some latency while those modules are initializing,
although I do not think it is a big problem now, and if it is, I think we
should address it specifically. If the problem is with cxl_mem,
something I have been witnessing, IMO it is preferable to unload the sfc
driver and load it again once cxl_mem is installed, rather than support
probe deferral. Moreover, if the problem is related to the cxl_mem
probe, I doubt probe deferral will help at all, since the sfc CXL
unwinding will remove the mem device as well, so the probing would
have to happen again ...
Did I say I did not want to open this can?
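A minimal userspace model of the two behaviours discussed above may help: dev_err_probe() returning the error it is given and only recording (not loudly logging) a -EPROBE_DEFER, plus a caller that propagates only a deferral. Everything here is a stand-in: model_dev_err_probe(), model_probe() and the "CXL is optional" policy are hypothetical, and EPROBE_DEFER is redefined because it is a kernel-internal value (517 in include/linux/errno.h), not a userspace errno. The real kernel call takes the device, the error and a format string, e.g. dev_err_probe(&pci_dev->dev, -ENODEV, "...\n").

```c
#include <errno.h>
#include <stdio.h>

/* Kernel-internal value from include/linux/errno.h; not a userspace errno. */
#define EPROBE_DEFER 517

/* Stand-ins for the device's stashed deferral reason and the kernel log. */
static const char *deferred_reason;
static int errors_logged;

/*
 * Model of dev_err_probe(dev, err, fmt, ...): always returns err.
 * A deferral is only recorded (the kernel also logs it at debug level);
 * in this simplified model any other error is logged as an error.
 */
static int model_dev_err_probe(int err, const char *msg)
{
	if (err == -EPROBE_DEFER) {
		deferred_reason = msg;
	} else {
		fprintf(stderr, "error %d: %s\n", err, msg);
		errors_logged++;
	}
	return err;
}

/*
 * Hypothetical caller policy for efx_cxl_init()'s return value:
 * a deferral fails the probe so the driver core retries it later;
 * any other CXL error is treated as non-fatal and probing continues
 * without CXL.
 */
static int model_probe(int cxl_init_rc)
{
	if (cxl_init_rc == -EPROBE_DEFER)
		return cxl_init_rc;
	return 0;
}
```

Whether sfc should propagate the deferral at all is the open question here; the model only shows what propagating it would look like.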
>> + }
>> +
>> + if (!cxl->cxlds.reg_map.component_map.hdm_decoder.valid) {
>> + dev_err(&pci_dev->dev, "Expected HDM component register not found\n");
>> + return -ENODEV;
> Trivial but given this is new code maybe differing from style of existing sfc
> and using
> return dev_err_probe(&pci_dev->dev, -ENODEV, "Expected HDM component register not found\n");
> would be a nice to have. Given deferral isn't a thing for this call, it just saves on about
> 2 lines of code for each use.
It makes sense.
Thanks!
> or use pci_err() and pci_warn()?
>
>
>> + }
>> +
>> + if (!cxl->cxlds.reg_map.component_map.ras.valid) {
>> + dev_err(&pci_dev->dev, "Expected RAS component register not found\n");
>> + return -ENODEV;
>> + }
>> +
>> + rc = cxl_map_component_regs(&cxl->cxlds.reg_map,
>> + &cxl->cxlds.regs.component,
>> + BIT(CXL_CM_CAP_CAP_ID_RAS));
>> + if (rc) {
>> + dev_err(&pci_dev->dev, "Failed to map RAS capability.\n");
>> + return rc;
>> + }
>> +
>> + /*
>> + * Set media ready explicitly, as there is neither a mailbox for checking
>> + * this state nor the CXL register involved; neither is mandatory for
>> + * type2.
>> + */
>> + cxl->cxlds.media_ready = true;
>> +
>> probe_data->cxl = cxl;
>>
>> return 0;
^ permalink raw reply [flat|nested] 112+ messages in thread
* Re: [PATCH v17 02/22] sfc: add cxl support
2025-06-30 14:55 ` Alejandro Lucero Palau
@ 2025-06-30 16:07 ` Jonathan Cameron
0 siblings, 0 replies; 112+ messages in thread
From: Jonathan Cameron @ 2025-06-30 16:07 UTC (permalink / raw)
To: Alejandro Lucero Palau
Cc: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang,
Edward Cree, Alison Schofield
On Mon, 30 Jun 2025 15:55:39 +0100
Alejandro Lucero Palau <alucerop@amd.com> wrote:
> On 6/30/25 15:52, Alejandro Lucero Palau wrote:
> >
> > On 6/25/25 17:37, Jonathan Cameron wrote:
> >> On Tue, 24 Jun 2025 15:13:35 +0100
> >> <alejandro.lucero-palau@amd.com> wrote:
> >>
> >>> From: Alejandro Lucero <alucerop@amd.com>
> >>>
> >>> Add CXL initialization based on new CXL API for accel drivers and make
> >>> it dependent on kernel CXL configuration.
> >>>
> >>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> >>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> >>> Acked-by: Edward Cree <ecree.xilinx@gmail.com>
> >>> Reviewed-by: Alison Schofield <alison.schofield@intel.com>
> >> Hi Alejandro,
> >>
> >> I think I'm missing something with respect to the relative life times.
> >> Throwing one devm_ call into the middle of a probe is normally a recipe
> >> for at least hard to read code, if not actual bugs. It should be done
> >> with care and accompanied by at least a comment.
> >
> >
> > Hi Jonathan,
> >
> >
> > I agree devm_* is harder in general and prone to some subtle
> > problems, but I cannot see an issue here apart from the objects being
> > kept until device unbinding. Still, adding a comment can help.
> >
> >
> > <snip>
> >
> >> +
> >> + dvsec = pci_find_dvsec_capability(pci_dev, PCI_VENDOR_ID_CXL,
> >> + CXL_DVSEC_PCIE_DEVICE);
> >> + if (!dvsec)
> >> + return 0;
> >> +
> >> + pci_dbg(pci_dev, "CXL_DVSEC_PCIE_DEVICE capability found\n");
> >> +
> > >> + /* Create a cxl_dev_state embedded in the cxl struct using cxl core api
> >> + * specifying no mbox available.
> >> + */
> >> + cxl = devm_cxl_dev_state_create(&pci_dev->dev, CXL_DEVTYPE_DEVMEM,
> >> + pci_dev->dev.id, dvsec, struct efx_cxl,
> >> + cxlds, false);
> >> The life time of this will outlast everything else in the efx driver.
> >> Is that definitely safe to do? Mostly from a reviewability and
> >> difficulty
> >> of reasoning we avoid such late releasing of resources.
> >>
> >> Perhaps add to the comment before this call what you are doing to
> >> ensure that
> >> it is fine to release this after everything in efx_pci_remove()
> >>
> >> Or wrap it up in a devres group and release that group in
> >> efx_cxl_exit().
> >>
> >> See devres_open_group(), devres_release_group()
> >>
> >>
> >
> > As I said above, I cannot see a problem here, but explicitly managing
> > those resources with a devres group may make it simpler, so I
> > think it is good advice to follow.
> >
> >
> > Thanks!
> >
> >
>
> FWIW, I just want to add that although I agree with this, it is somewhat
> counterintuitive to me, as the goal of devm is to avoid having to care about
> when to release those allocations.
In my view, not quite. It's to enforce that those allocations are released
in the reverse order of the devm setup calls, which is almost always
the right thing to do as long as the whole driver is using devm.
There are uses like you describe though so it's not a universal case
of one or the other. One advantage of the devres group thing is that
folk who are not keen on devm can effectively have normal manual release
flows.
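To make the ordering argument concrete, here is a small userspace model (not the kernel devres API) of managed resources released newest-first at unbind, plus a group that can be released early, which is what devres_open_group(), devres_close_group() and devres_release_group() provide in the kernel. All model_* names and the integer tags are hypothetical, and the model ignores nested groups.

```c
#include <stddef.h>

#define MAX_RES 16

/* Each managed resource remembers which group (0 = none) was open
 * when it was registered. */
struct model_res {
	int tag;
	int group;
	int released;
};

struct model_dev {
	struct model_res res[MAX_RES];
	size_t n;
	int open_group; /* 0 when no group is open */
	int next_group;
};

/* Order in which resources were released, for inspection. */
static int release_log[MAX_RES];
static size_t release_count;

static int model_devm_add(struct model_dev *d, int tag)
{
	if (d->n == MAX_RES)
		return -1;
	d->res[d->n++] = (struct model_res){ tag, d->open_group, 0 };
	return 0;
}

static int model_open_group(struct model_dev *d)
{
	d->open_group = ++d->next_group;
	return d->open_group;
}

static void model_close_group(struct model_dev *d)
{
	d->open_group = 0;
}

static void release_one(struct model_res *r)
{
	r->released = 1;
	release_log[release_count++] = r->tag;
}

/* Like devres_release_group(): free only this group's resources,
 * newest first, leaving everything else for unbind. */
static void model_release_group(struct model_dev *d, int group)
{
	for (size_t i = d->n; i-- > 0;)
		if (d->res[i].group == group && !d->res[i].released)
			release_one(&d->res[i]);
}

/* Driver unbind: release whatever is left, newest first. */
static void model_unbind(struct model_dev *d)
{
	for (size_t i = d->n; i-- > 0;)
		if (!d->res[i].released)
			release_one(&d->res[i]);
}
```

For example, registering 1, opening a group, registering 2 and 3, closing the group and registering 4, then releasing the group frees 3 then 2 (the CXL pieces, from efx_cxl_exit()); unbind later frees 4 then 1.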
Jonathan
* Re: [PATCH v17 10/22] cx/memdev: Indicate probe deferral
2025-06-27 18:17 ` Dave Jiang
@ 2025-06-30 16:20 ` Jonathan Cameron
2025-07-01 16:07 ` Alejandro Lucero Palau
2025-07-01 16:02 ` Alejandro Lucero Palau
1 sibling, 1 reply; 112+ messages in thread
From: Jonathan Cameron @ 2025-06-30 16:20 UTC (permalink / raw)
To: Dave Jiang
Cc: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet, Alejandro Lucero
Hi Dave,
> > +/*
> > + * Try to get a locked reference on a memdev's CXL port topology
> > + * connection. Be careful to observe when cxl_mem_probe() has deposited
> > + * a probe deferral awaiting the arrival of the CXL root driver.
> > + */
> > +struct cxl_port *cxl_acquire_endpoint(struct cxl_memdev *cxlmd)
>
Just focusing on this part.
> Annotation of __acquires() is needed here to annotate that this function is taking multiple locks and keeping the locks.
Messy because it's a conditional case and on error we never have
a call marked __releases() so sparse may moan.
In theory we have __cond_acquires() but I think the sparse tooling
is still missing for that.
One option is to hike the thing into a header as inline and use __acquire()
in the appropriate places. Then sparse can see the markings
without problems.
https://lore.kernel.org/all/20250305161652.GA18280@noisy.programming.kicks-ass.net/
has some discussion on fixing the annotation issues around conditional locks
for LLVM but for now I think we are still stuck.
For the original __cond_acquires()
https://lore.kernel.org/all/CAHk-=wjZfO9hGqJ2_hGQG3U_XzSh9_XaXze=HgPdvJbgrvASfA@mail.gmail.com/
Linus posted sparse and kernel support but I think only the kernel bit merged
as sparse is currently (I think) unmaintained.
>
> > +{
> > + struct cxl_port *endpoint;
> > + int rc = -ENXIO;
> > +
> > + device_lock(&cxlmd->dev);
> > +
> > + endpoint = cxlmd->endpoint;
> > + if (!endpoint)
> > + goto err;
> > +
> > + if (IS_ERR(endpoint)) {
> > + rc = PTR_ERR(endpoint);
> > + goto err;
> > + }
> > +
> > + device_lock(&endpoint->dev);
> > + if (!endpoint->dev.driver)
> > + goto err_endpoint;
> > +
> > + return endpoint;
> > +
> > +err_endpoint:
> > + device_unlock(&endpoint->dev);
> > +err:
> > + device_unlock(&cxlmd->dev);
> > + return ERR_PTR(rc);
> > +}
> > +EXPORT_SYMBOL_NS_GPL(cxl_acquire_endpoint, "CXL");
> > +
> > +void cxl_release_endpoint(struct cxl_memdev *cxlmd, struct cxl_port *endpoint)
>
> And __releases() here to release the lock annotations
> > +{
> > + device_unlock(&endpoint->dev);
> > + device_unlock(&cxlmd->dev);
> > +}
> > +EXPORT_SYMBOL_NS_GPL(cxl_release_endpoint, "CXL");
* Re: [PATCH v17 16/22] cxl/region: Factor out interleave ways setup
2025-06-27 23:05 ` Dave Jiang
@ 2025-06-30 16:20 ` Jonathan Cameron
2025-06-30 16:34 ` Dave Jiang
0 siblings, 1 reply; 112+ messages in thread
From: Jonathan Cameron @ 2025-06-30 16:20 UTC (permalink / raw)
To: Dave Jiang
Cc: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet, Alejandro Lucero,
Zhi Wang, Ben Cheatham
On Fri, 27 Jun 2025 16:05:20 -0700
Dave Jiang <dave.jiang@intel.com> wrote:
> On 6/27/25 2:13 AM, Jonathan Cameron wrote:
> > On Tue, 24 Jun 2025 15:13:49 +0100
> > <alejandro.lucero-palau@amd.com> wrote:
> >
> >> From: Alejandro Lucero <alucerop@amd.com>
> >>
> >> Region creation based on Type3 devices is triggered from user space
> >> allowing memory combination through interleaving.
> >>
> >> In preparation for kernel driven region creation, that is Type2 drivers
> >> triggering region creation backed with its advertised CXL memory, factor
> >> out a common helper from the user-sysfs region setup for interleave ways.
> >>
> >> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> >> Reviewed-by: Zhi Wang <zhiw@nvidia.com>
> >> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> >> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> >> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
> > As a heads up, this code changes a fair bit in Dan's ACQUIRE() series
> > that may well land before this. Dave can ask for whatever resolution he
> > wants when we get to that stage!
> >
> >
> We probably want to rebase on top of that. Dan has an immutable branch in cxl.git for the ACQUIRE() patch. Or are you talking about the outstanding CXL changes?
>
The CXL specific ones from that series.
J
>
* Re: [PATCH v17 16/22] cxl/region: Factor out interleave ways setup
2025-06-30 16:20 ` Jonathan Cameron
@ 2025-06-30 16:34 ` Dave Jiang
0 siblings, 0 replies; 112+ messages in thread
From: Dave Jiang @ 2025-06-30 16:34 UTC (permalink / raw)
To: Jonathan Cameron
Cc: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet, Alejandro Lucero,
Zhi Wang, Ben Cheatham
On 6/30/25 9:20 AM, Jonathan Cameron wrote:
> On Fri, 27 Jun 2025 16:05:20 -0700
> Dave Jiang <dave.jiang@intel.com> wrote:
>
>> On 6/27/25 2:13 AM, Jonathan Cameron wrote:
>>> On Tue, 24 Jun 2025 15:13:49 +0100
>>> <alejandro.lucero-palau@amd.com> wrote:
>>>
>>>> From: Alejandro Lucero <alucerop@amd.com>
>>>>
>>>> Region creation based on Type3 devices is triggered from user space
>>>> allowing memory combination through interleaving.
>>>>
>>>> In preparation for kernel driven region creation, that is Type2 drivers
>>>> triggering region creation backed with its advertised CXL memory, factor
>>>> out a common helper from the user-sysfs region setup for interleave ways.
>>>>
>>>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>>>> Reviewed-by: Zhi Wang <zhiw@nvidia.com>
>>>> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
>>>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>>>> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
>>> As a heads up, this code changes a fair bit in Dan's ACQUIRE() series
>>> that may well land before this. Dave can ask for whatever resolution he
>>> wants when we get to that stage!
>>>
>>>
>> We probably want to rebase on top of that. Dan has an immutable branch in cxl.git for the ACQUIRE() patch. Or are you talking about the outstanding CXL changes?
>>
> The CXL specific ones from that series.
Hopefully we can get those resolved with the next rev when Dan's back from vacation.
>
> J
>>
>
* Re: [PATCH v17 06/22] cxl: Support dpa initialization without a mailbox
2025-06-27 8:42 ` Jonathan Cameron
2025-06-27 16:43 ` Dave Jiang
@ 2025-07-01 15:23 ` Alejandro Lucero Palau
1 sibling, 0 replies; 112+ messages in thread
From: Alejandro Lucero Palau @ 2025-07-01 15:23 UTC (permalink / raw)
To: Jonathan Cameron, alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
On 6/27/25 09:42, Jonathan Cameron wrote:
> On Tue, 24 Jun 2025 15:13:39 +0100
> alejandro.lucero-palau@amd.com wrote:
>
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> Type3 relies on mailbox CXL_MBOX_OP_IDENTIFY command for initializing
>> memdev state params which end up being used for DPA initialization.
>>
>> Allow a Type2 driver to initialize DPA simply by giving the size of its
>> volatile hardware partition.
>>
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>> º
> ? Looks like an accidental degree symbol.
Yes.
>> ---
>> drivers/cxl/core/mbox.c | 17 +++++++++++++++++
> Location make sense? I'd like some reasoning text for that in the patch
> description. After all whole point is this isn't a mailbox thing!
>
> Maybe moving add_part and this to somewhere more general makes sense?
As David suggests, I'll move it to memdev.c
Thanks
>> include/cxl/cxl.h | 1 +
>> 2 files changed, 18 insertions(+)
>>
>> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
>> index d78f6039f997..d3b4ba5214d5 100644
>> --- a/drivers/cxl/core/mbox.c
>> +++ b/drivers/cxl/core/mbox.c
>> @@ -1284,6 +1284,23 @@ static void add_part(struct cxl_dpa_info *info, u64 start, u64 size, enum cxl_pa
>> info->nr_partitions++;
>> }
>>
>> +/**
>> + * cxl_set_capacity: initialize dpa by a driver without a mailbox.
>> + *
>> + * @cxlds: pointer to cxl_dev_state
>> + * @capacity: device volatile memory size
>> + */
>> +void cxl_set_capacity(struct cxl_dev_state *cxlds, u64 capacity)
>> +{
>> + struct cxl_dpa_info range_info = {
>> + .size = capacity,
>> + };
>> +
>> + add_part(&range_info, 0, capacity, CXL_PARTMODE_RAM);
>> + cxl_dpa_setup(cxlds, &range_info);
>> +}
>> +EXPORT_SYMBOL_NS_GPL(cxl_set_capacity, "CXL");
>> +
>> int cxl_mem_dpa_fetch(struct cxl_memdev_state *mds, struct cxl_dpa_info *info)
>> {
>> struct cxl_dev_state *cxlds = &mds->cxlds;
>> diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
>> index 0810c18d7aef..4975ead488b4 100644
>> --- a/include/cxl/cxl.h
>> +++ b/include/cxl/cxl.h
>> @@ -231,4 +231,5 @@ struct cxl_dev_state *_devm_cxl_dev_state_create(struct device *dev,
>> int cxl_map_component_regs(const struct cxl_register_map *map,
>> struct cxl_component_regs *regs,
>> unsigned long map_mask);
>> +void cxl_set_capacity(struct cxl_dev_state *cxlds, u64 capacity);
>> #endif /* __CXL_CXL_H__ */
* Re: [PATCH v17 06/22] cxl: Support dpa initialization without a mailbox
2025-06-27 8:43 ` Jonathan Cameron
@ 2025-07-01 15:25 ` Alejandro Lucero Palau
0 siblings, 0 replies; 112+ messages in thread
From: Alejandro Lucero Palau @ 2025-07-01 15:25 UTC (permalink / raw)
To: Jonathan Cameron, alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
On 6/27/25 09:43, Jonathan Cameron wrote:
> On Tue, 24 Jun 2025 15:13:39 +0100
> alejandro.lucero-palau@amd.com wrote:
>
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> Type3 relies on mailbox CXL_MBOX_OP_IDENTIFY command for initializing
>> memdev state params which end up being used for DPA initialization.
>>
>> Allow a Type2 driver to initialize DPA simply by giving the size of its
>> volatile hardware partition.
>>
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>> º
>> ---
>> drivers/cxl/core/mbox.c | 17 +++++++++++++++++
>> include/cxl/cxl.h | 1 +
>> 2 files changed, 18 insertions(+)
>>
>> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
>> index d78f6039f997..d3b4ba5214d5 100644
>> --- a/drivers/cxl/core/mbox.c
>> +++ b/drivers/cxl/core/mbox.c
>> @@ -1284,6 +1284,23 @@ static void add_part(struct cxl_dpa_info *info, u64 start, u64 size, enum cxl_pa
>> info->nr_partitions++;
>> }
>>
>> +/**
>> + * cxl_set_capacity: initialize dpa by a driver without a mailbox.
>> + *
>> + * @cxlds: pointer to cxl_dev_state
>> + * @capacity: device volatile memory size
>> + */
>> +void cxl_set_capacity(struct cxl_dev_state *cxlds, u64 capacity)
>> +{
>> + struct cxl_dpa_info range_info = {
>> + .size = capacity,
>> + };
>> +
>> + add_part(&range_info, 0, capacity, CXL_PARTMODE_RAM);
>> + cxl_dpa_setup(cxlds, &range_info);
> I missed that this function can in general fail. If that either can't occur
> here for some reason or we don't care if it does, add a comment. Otherwise handle
> the error.
I do not think it should fail under a controlled type2 initialization,
but since it can fail, and future changes may make that more likely, I
will add a check of the call result.
Thanks
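A sketch of what checking the result could look like, as a userspace model: cxl_set_capacity() changed to return int and pass on the cxl_dpa_setup() result, so its caller in efx_cxl_init() can fail. The model_* types and helpers are simplified stand-ins for struct cxl_dpa_info, add_part() and cxl_dpa_setup(), and the failure rule in model_dpa_setup() is invented purely for illustration.

```c
#include <errno.h>
#include <stdint.h>

enum model_partmode { MODEL_PARTMODE_RAM };

struct model_part {
	uint64_t start, size;
	enum model_partmode mode;
};

/* Simplified stand-in for struct cxl_dpa_info. */
struct model_dpa_info {
	uint64_t size;
	int nr_partitions;
	struct model_part part[2];
};

static void model_add_part(struct model_dpa_info *info, uint64_t start,
			   uint64_t size, enum model_partmode mode)
{
	info->part[info->nr_partitions++] =
		(struct model_part){ start, size, mode };
}

/* Hypothetical failure rule standing in for cxl_dpa_setup() errors. */
static int model_dpa_setup(const struct model_dpa_info *info)
{
	if (!info->size || info->nr_partitions == 0)
		return -EINVAL;
	return 0;
}

/* Returns the setup result instead of discarding it. */
static int model_set_capacity(uint64_t capacity)
{
	struct model_dpa_info info = { .size = capacity };

	model_add_part(&info, 0, capacity, MODEL_PARTMODE_RAM);
	return model_dpa_setup(&info);
}
```

With that return type, the sfc caller would simply do `rc = cxl_set_capacity(...); if (rc) return rc;` (assuming the signature changes as sketched).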
>> +}
>> +EXPORT_SYMBOL_NS_GPL(cxl_set_capacity, "CXL");
>> +
>> int cxl_mem_dpa_fetch(struct cxl_memdev_state *mds, struct cxl_dpa_info *info)
>> {
>> struct cxl_dev_state *cxlds = &mds->cxlds;
>> diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
>> index 0810c18d7aef..4975ead488b4 100644
>> --- a/include/cxl/cxl.h
>> +++ b/include/cxl/cxl.h
>> @@ -231,4 +231,5 @@ struct cxl_dev_state *_devm_cxl_dev_state_create(struct device *dev,
>> int cxl_map_component_regs(const struct cxl_register_map *map,
>> struct cxl_component_regs *regs,
>> unsigned long map_mask);
>> +void cxl_set_capacity(struct cxl_dev_state *cxlds, u64 capacity);
>> #endif /* __CXL_CXL_H__ */
* Re: [PATCH v17 09/22] sfc: create type2 cxl memdev
2025-06-27 8:51 ` Jonathan Cameron
@ 2025-07-01 15:30 ` Alejandro Lucero Palau
0 siblings, 0 replies; 112+ messages in thread
From: Alejandro Lucero Palau @ 2025-07-01 15:30 UTC (permalink / raw)
To: Jonathan Cameron, alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang, Martin Habets, Fan Ni, Edward Cree
On 6/27/25 09:51, Jonathan Cameron wrote:
> On Tue, 24 Jun 2025 15:13:42 +0100
> <alejandro.lucero-palau@amd.com> wrote:
>
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> Use cxl API for creating a cxl memory device using the type2
>> cxl_dev_state struct.
>>
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>> Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
>> Reviewed-by: Fan Ni <fan.ni@samsung.com>
>> Acked-by: Edward Cree <ecree.xilinx@gmail.com>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> ---
>> drivers/net/ethernet/sfc/efx_cxl.c | 7 +++++++
>> 1 file changed, 7 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
>> index 5d68ee4e818d..e2d52ed49535 100644
>> --- a/drivers/net/ethernet/sfc/efx_cxl.c
>> +++ b/drivers/net/ethernet/sfc/efx_cxl.c
>> @@ -79,6 +79,13 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
>>
>> cxl_set_capacity(&cxl->cxlds, EFX_CTPIO_BUFFER_SIZE);
>>
>> + cxl->cxlmd = devm_cxl_add_memdev(&pci_dev->dev, &cxl->cxlds);
>> +
> Trivial style thing but common practice (I haven't checked local style)
> is no blank line between call and associated error check.
Right. I'll fix it.
Thanks
>> + if (IS_ERR(cxl->cxlmd)) {
>> + pci_err(pci_dev, "CXL accel memdev creation failed\n");
>> + return PTR_ERR(cxl->cxlmd);
>> + }
>> +
>> probe_data->cxl = cxl;
>>
>> return 0;
* Re: [PATCH v17 10/22] cx/memdev: Indicate probe deferral
2025-06-27 9:42 ` Jonathan Cameron
@ 2025-07-01 15:30 ` Alejandro Lucero Palau
0 siblings, 0 replies; 112+ messages in thread
From: Alejandro Lucero Palau @ 2025-07-01 15:30 UTC (permalink / raw)
To: Jonathan Cameron, alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
On 6/27/25 10:42, Jonathan Cameron wrote:
> On Tue, 24 Jun 2025 15:13:43 +0100
> <alejandro.lucero-palau@amd.com> wrote:
>
>> From: Alejandro Lucero <alucerop@amd.com>
>>
> Typo in patch title.
Oops. I'll fix it.
Thanks
* Re: [PATCH v17 10/22] cx/memdev: Indicate probe deferral
2025-06-27 18:17 ` Dave Jiang
2025-06-30 16:20 ` Jonathan Cameron
@ 2025-07-01 16:02 ` Alejandro Lucero Palau
2025-07-28 17:45 ` dan.j.williams
1 sibling, 1 reply; 112+ messages in thread
From: Alejandro Lucero Palau @ 2025-07-01 16:02 UTC (permalink / raw)
To: Dave Jiang, alejandro.lucero-palau, linux-cxl, netdev,
dan.j.williams, edward.cree, davem, kuba, pabeni, edumazet
On 6/27/25 19:17, Dave Jiang wrote:
>
> On 6/24/25 7:13 AM, alejandro.lucero-palau@amd.com wrote:
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> The first step for a CXL accelerator driver that wants to establish new
>> CXL.mem regions is to register a 'struct cxl_memdev'. That kicks off
>> cxl_mem_probe() to enumerate all 'struct cxl_port' instances in the
>> topology up to the root.
>>
>> If the port driver has not attached yet the expectation is that the
>> driver waits until that link is established. The common cxl_pci driver
>> has reason to keep the 'struct cxl_memdev' device attached to the bus
>> until the root driver attaches. An accelerator may want to instead defer
>> probing until CXL resources can be acquired.
>>
>> Use the @endpoint attribute of a 'struct cxl_memdev' to convey when an
>> accelerator driver probing should be deferred vs failed. Provide that
>> indication via a new cxl_acquire_endpoint() API that can retrieve the
>> probe status of the memdev.
>>
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>> ---
>> drivers/cxl/core/memdev.c | 42 +++++++++++++++++++++++++++++++++++++++
>> drivers/cxl/core/port.c | 2 +-
>> drivers/cxl/mem.c | 7 +++++--
>> include/cxl/cxl.h | 2 ++
>> 4 files changed, 50 insertions(+), 3 deletions(-)
>>
<snip>
> Can you please explain how the accelerator driver init path is different in this instance that it requires cxl_mem driver to defer probing? Currently with a type3, the cxl_acpi driver will setup the CXL root, hostbridges and PCI root ports. At that point the memdev driver will enumerate the rest of the ports and attempt to establish the hierarchy. However if cxl_acpi is not done, the mem probe will fail. But, the cxl_acpi probe will trigger a re-probe sequence at the end when it is done. At that point, the mem probe should discover all the necessary ports if things are correct. If the accelerator init path is different, can we introduce some documentation to explain the difference?
>
> Also, it seems as long as port topology is not found, it will always go to deferred probing. At what point do we conclude that things may be missing/broken and we need to fail?
>
> DJ
>
>
>
Hi Dave,
The commit message comes from Dan's original patch, so I'm afraid I cannot
explain it better myself.
I added this patch back after Dan suggested that, with cxl_acquire_endpoint(),
Type2 initialization can obtain some protection against cxl_mem
or cxl_acpi being removed. I later added protection against, or handling
of, this in the sfc driver after initialization. So this is the main reason
for this patch, at least for me.
Regarding the goal of the original patch, to be honest, I cannot see
the cxl_acpi problem, although I'm not saying it does not exist. But it
is quite confusing to me, and as I said in another patch regarding probe
deferral, supporting that option would add complexity to the current sfc
driver probing. If there is another way to avoid it, that is the
way I would prefer to follow.
Adding documentation about all this would definitely help, even without
the Type2 case.
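For reference, the outcomes that cxl_acquire_endpoint() distinguishes in the code quoted in this thread can be modelled in userspace as below, with the locking omitted. ERR_PTR(), IS_ERR() and PTR_ERR() are simplified copies of the helpers in include/linux/err.h, EPROBE_DEFER is redefined because it is kernel-internal, and struct model_port and classify_endpoint() are hypothetical. In this code only an error pointer deposited in cxlmd->endpoint can yield -EPROBE_DEFER; a NULL endpoint or a port without a bound driver fails with -ENXIO.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Kernel-internal value from include/linux/errno.h. */
#define EPROBE_DEFER 517
#define MAX_ERRNO 4095

/* Simplified userspace copies of the include/linux/err.h helpers. */
static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-in for the endpoint struct cxl_port. */
struct model_port {
	bool has_driver;
};

/* Mirrors the checks in cxl_acquire_endpoint(), without the locking. */
static long classify_endpoint(struct model_port *endpoint)
{
	if (!endpoint)
		return -ENXIO;            /* no endpoint registered */
	if (IS_ERR(endpoint))
		return PTR_ERR(endpoint); /* e.g. a deposited -EPROBE_DEFER */
	if (!endpoint->has_driver)
		return -ENXIO;            /* port exists, driver not bound */
	return 0;                         /* caller gets the locked port */
}
```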
* Re: [PATCH v17 10/22] cx/memdev: Indicate probe deferral
2025-06-30 16:20 ` Jonathan Cameron
@ 2025-07-01 16:07 ` Alejandro Lucero Palau
2025-07-01 16:25 ` Dave Jiang
0 siblings, 1 reply; 112+ messages in thread
From: Alejandro Lucero Palau @ 2025-07-01 16:07 UTC (permalink / raw)
To: Jonathan Cameron, Dave Jiang
Cc: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet
On 6/30/25 17:20, Jonathan Cameron wrote:
> Hi Dave,
>
>>> +/*
>>> + * Try to get a locked reference on a memdev's CXL port topology
>>> + * connection. Be careful to observe when cxl_mem_probe() has deposited
>>> + * a probe deferral awaiting the arrival of the CXL root driver.
>>> + */
>>> +struct cxl_port *cxl_acquire_endpoint(struct cxl_memdev *cxlmd)
> Just focusing on this part.
>
>> Annotation of __acquires() is needed here to annotate that this function is taking multiple locks and keeping the locks.
> Messy because it's a conditional case and on error we never have
> a call marked __releases() so sparse may moan.
>
> In theory we have __cond_acquires() but I think the sparse tooling
> is still missing for that.
>
> One option is to hike the thing into a header as inline and use __acquire()
> in the appropriate places. Then sparse can see the markings
> without problems.
>
> https://lore.kernel.org/all/20250305161652.GA18280@noisy.programming.kicks-ass.net/
>
> has some discussion on fixing the annotation issues around conditional locks
> for LLVM but for now I think we are still stuck.
>
> For the original __cond_acquires()
> https://lore.kernel.org/all/CAHk-=wjZfO9hGqJ2_hGQG3U_XzSh9_XaXze=HgPdvJbgrvASfA@mail.gmail.com/
>
> Linus posted sparse and kernel support but I think only the kernel bit merged
> as sparse is currently (I think) unmaintained.
>
I'm not sure what the conclusion is here: should I do it or not?
I cannot see __acquires being used by the cxl core yet, so I wonder whether
it should be introduced only when new code is added, or whether it
requires a core revision adding all the required annotations. I mean, is
it not a problem that those locks are used in other code paths without
being "advertised" by __acquires?
>>> +{
>>> + struct cxl_port *endpoint;
>>> + int rc = -ENXIO;
>>> +
>>> + device_lock(&cxlmd->dev);
>>> +
>>> + endpoint = cxlmd->endpoint;
>>> + if (!endpoint)
>>> + goto err;
>>> +
>>> + if (IS_ERR(endpoint)) {
>>> + rc = PTR_ERR(endpoint);
>>> + goto err;
>>> + }
>>> +
>>> + device_lock(&endpoint->dev);
>>> + if (!endpoint->dev.driver)
>>> + goto err_endpoint;
>>> +
>>> + return endpoint;
>>> +
>>> +err_endpoint:
>>> + device_unlock(&endpoint->dev);
>>> +err:
>>> + device_unlock(&cxlmd->dev);
>>> + return ERR_PTR(rc);
>>> +}
>>> +EXPORT_SYMBOL_NS_GPL(cxl_acquire_endpoint, "CXL");
>>> +
>>> +void cxl_release_endpoint(struct cxl_memdev *cxlmd, struct cxl_port *endpoint)
>> And __releases() here to release the lock annotations
>>> +{
>>> + device_unlock(&endpoint->dev);
>>> + device_unlock(&cxlmd->dev);
>>> +}
>>> +EXPORT_SYMBOL_NS_GPL(cxl_release_endpoint, "CXL");
>
* Re: [PATCH v17 10/22] cx/memdev: Indicate probe deferral
2025-07-01 16:07 ` Alejandro Lucero Palau
@ 2025-07-01 16:25 ` Dave Jiang
2025-07-01 16:44 ` Jonathan Cameron
0 siblings, 1 reply; 112+ messages in thread
From: Dave Jiang @ 2025-07-01 16:25 UTC (permalink / raw)
To: Alejandro Lucero Palau, Jonathan Cameron
Cc: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet
On 7/1/25 9:07 AM, Alejandro Lucero Palau wrote:
>
> On 6/30/25 17:20, Jonathan Cameron wrote:
>> Hi Dave,
>>
>>>> +/*
>>>> + * Try to get a locked reference on a memdev's CXL port topology
>>>> + * connection. Be careful to observe when cxl_mem_probe() has deposited
>>>> + * a probe deferral awaiting the arrival of the CXL root driver.
>>>> + */
>>>> +struct cxl_port *cxl_acquire_endpoint(struct cxl_memdev *cxlmd)
>> Just focusing on this part.
>>
>>> Annotation of __acquires() is needed here to annotate that this function is taking multiple locks and keeping the locks.
>> Messy because it's a conditional case and on error we never have
>> a call marked __releases() so sparse may moan.
>>
>> In theory we have __cond_acquires() but I think the sparse tooling
>> is still missing for that.
>>
>> One option is to hike the thing into a header as inline and use __acquire()
>> in the appropriate places. Then sparse can see the markings
>> without problems.
>>
>> https://lore.kernel.org/all/20250305161652.GA18280@noisy.programming.kicks-ass.net/
>>
>> has some discussion on fixing the annotation issues around conditional locks
>> for LLVM but for now I think we are still stuck.
>>
>> For the original __cond_acquires()
>> https://lore.kernel.org/all/CAHk-=wjZfO9hGqJ2_hGQG3U_XzSh9_XaXze=HgPdvJbgrvASfA@mail.gmail.com/
>>
>> Linus posted sparse and kernel support but I think only the kernel bit merged
>> as sparse is currently (I think) unmaintained.
>>
>
> Not sure what is the conclusion to this: should I do it or not?
Sounds like we can't with the way it's conditionally done.
>
>
> I can not see the __acquires being used yet by cxl core so I wonder if this needs to be introduced only when new code is added or it should require a core revision for adding all required. I mean, those locks being used in other code parts but not "advertised" by __acquires, is not that a problem?
It's only needed if you acquire a lock, leave it held, and then release it in a different function. That allows sparse to track whether you are locking correctly. You don't need it if it's all done in the same function.
DJ
>
>
>>>> +{
>>>> + struct cxl_port *endpoint;
>>>> + int rc = -ENXIO;
>>>> +
>>>> + device_lock(&cxlmd->dev);
>>>> +
>>>> + endpoint = cxlmd->endpoint;
>>>> + if (!endpoint)
>>>> + goto err;
>>>> +
>>>> + if (IS_ERR(endpoint)) {
>>>> + rc = PTR_ERR(endpoint);
>>>> + goto err;
>>>> + }
>>>> +
>>>> + device_lock(&endpoint->dev);
>>>> + if (!endpoint->dev.driver)
>>>> + goto err_endpoint;
>>>> +
>>>> + return endpoint;
>>>> +
>>>> +err_endpoint:
>>>> + device_unlock(&endpoint->dev);
>>>> +err:
>>>> + device_unlock(&cxlmd->dev);
>>>> + return ERR_PTR(rc);
>>>> +}
>>>> +EXPORT_SYMBOL_NS_GPL(cxl_acquire_endpoint, "CXL");
>>>> +
>>>> +void cxl_release_endpoint(struct cxl_memdev *cxlmd, struct cxl_port *endpoint)
>>> And __releases() here to release the lock annotations
>>>> +{
>>>> + device_unlock(&endpoint->dev);
>>>> + device_unlock(&cxlmd->dev);
>>>> +}
>>>> +EXPORT_SYMBOL_NS_GPL(cxl_release_endpoint, "CXL");
>>
* Re: [PATCH v17 10/22] cx/memdev: Indicate probe deferral
2025-07-01 16:25 ` Dave Jiang
@ 2025-07-01 16:44 ` Jonathan Cameron
0 siblings, 0 replies; 112+ messages in thread
From: Jonathan Cameron @ 2025-07-01 16:44 UTC (permalink / raw)
To: Dave Jiang
Cc: Alejandro Lucero Palau, alejandro.lucero-palau, linux-cxl, netdev,
dan.j.williams, edward.cree, davem, kuba, pabeni, edumazet
On Tue, 1 Jul 2025 09:25:44 -0700
Dave Jiang <dave.jiang@intel.com> wrote:
> On 7/1/25 9:07 AM, Alejandro Lucero Palau wrote:
> >
> > On 6/30/25 17:20, Jonathan Cameron wrote:
> >> Hi Dave,
> >>
> >>>> +/*
> >>>> + * Try to get a locked reference on a memdev's CXL port topology
> >>>> + * connection. Be careful to observe when cxl_mem_probe() has deposited
> >>>> + * a probe deferral awaiting the arrival of the CXL root driver.
> >>>> + */
> >>>> +struct cxl_port *cxl_acquire_endpoint(struct cxl_memdev *cxlmd)
> >> Just focusing on this part.
> >>
> >>> Annotation of __acquires() is needed here to annotate that this function is taking multiple locks and keeping the locks.
> >> Messy because it's a conditional case and on error we never have
> >> a call marked __releases() so sparse may moan.
> >>
> >> In theory we have __cond_acquires() but I think the sparse tooling
> >> is still missing for that.
> >>
> >> One option is to hike the thing into a header as inline and use __acquire()
> >> in the appropriate places. Then sparse can see the markings
> >> without problems.
> >>
> >> https://lore.kernel.org/all/20250305161652.GA18280@noisy.programming.kicks-ass.net/
> >>
> >> has some discussion on fixing the annotation issues around conditional locks
> >> for LLVM but for now I think we are still stuck.
> >>
> >> For the original __cond_acquires()
> >> https://lore.kernel.org/all/CAHk-=wjZfO9hGqJ2_hGQG3U_XzSh9_XaXze=HgPdvJbgrvASfA@mail.gmail.com/
> >>
> >> Linus posted sparse and kernel support but I think only the kernel bit merged
> >> as sparse is currently (I think) unmaintained.
> >>
> >
> > Not sure what the conclusion is here: should I do it or not?
>
> Sounds like we can't with the way it's conditionally done.
All you can do today is hike the function implementation that conditionally
takes locks to the header as static inline and use explicit __acquire() markings
in paths where you exit with locks held.
Here's one I did earlier:
https://elixir.bootlin.com/linux/v6.16-rc4/source/include/linux/iio/iio.h#L674
static inline bool iio_device_claim_direct(struct iio_dev *indio_dev)
{
	if (!__iio_device_claim_direct(indio_dev))
		return false;
	__acquire(iio_dev);
	return true;
}
That exposes the marking so sparse can see it and correctly track the locking.
Or you could step up and maintain sparse (I've been trying to talk someone
into doing that but no luck yet ;)
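For anyone following along, here is a rough userspace sketch of that pattern applied to the acquire/release pair in this patch. It assumes a toy counter in place of device_lock(), and the names (try_acquire_endpoint(), acquire_endpoint(), struct port, struct memdev) are made up for illustration; the __acquire()/__release() definitions mirror what the kernel uses with and without __CHECKER__. The conditional locking stays out of line, and only the inline wrapper's success path carries the annotations sparse needs to see.

```c
#include <stdbool.h>
#include <stddef.h>

/* Mirrors the kernel: sparse sees context changes, the compiler sees no-ops. */
#ifdef __CHECKER__
#define __acquire(x)	__context__(x, 1)
#define __release(x)	__context__(x, -1)
#else
#define __acquire(x)	(void)0
#define __release(x)	(void)0
#endif

/* Toy lock: 'held' counts acquisitions, standing in for device_lock(). */
struct toy_lock { int held; };

static void lock_toy(struct toy_lock *l)   { l->held++; }
static void unlock_toy(struct toy_lock *l) { l->held--; }

struct port {
	struct toy_lock lock;
	bool bound;			/* stands in for endpoint->dev.driver */
};

struct memdev {
	struct toy_lock lock;
	struct port *endpoint;		/* may be NULL */
};

/*
 * Out-of-line helper: returns the endpoint with both locks held, or NULL
 * with neither held. Sparse cannot express this conditional contract.
 */
static struct port *try_acquire_endpoint(struct memdev *md)
{
	struct port *ep;

	lock_toy(&md->lock);
	ep = md->endpoint;
	if (!ep)
		goto err;

	lock_toy(&ep->lock);
	if (!ep->bound)
		goto err_ep;

	return ep;

err_ep:
	unlock_toy(&ep->lock);
err:
	unlock_toy(&md->lock);
	return NULL;
}

/* Inline wrapper: __acquire() only on the path that returns with locks held. */
static inline struct port *acquire_endpoint(struct memdev *md)
{
	struct port *ep = try_acquire_endpoint(md);

	if (!ep)
		return NULL;
	__acquire(&md->lock);
	__acquire(&ep->lock);
	return ep;
}

static inline void release_endpoint(struct memdev *md, struct port *ep)
{
	unlock_toy(&ep->lock);
	unlock_toy(&md->lock);
	__release(&ep->lock);
	__release(&md->lock);
}
```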
> >
> >
> > I cannot see __acquires() being used by the cxl core yet, so I wonder whether it should be introduced only when new code is added, or whether it requires a core revision adding all the required annotations. Those locks are already used in other code paths without being "advertised" by __acquires(); is that not a problem?
>
> It's only needed if you acquire a lock, leave it held, and then release it in a different function. That allows sparse(?) to track whether you are locking correctly. You don't need it if the lock is taken and released in the same function.
Exactly right. There is ongoing work, I believe, to make this work
with LLVM's tracking, but right now that is too simplistic to generate
reliable results. Note that sparse also sometimes gives false positives
if the code flow gets a bit complex but mostly that only happens when
the code probably needs a rethink anyway.
For now I'd go with do nothing here.
Jonathan
>
> DJ
>
>
> >
> >
> >>>> +{
> >>>> + struct cxl_port *endpoint;
> >>>> + int rc = -ENXIO;
> >>>> +
> >>>> + device_lock(&cxlmd->dev);
> >>>> + endpoint = cxlmd->endpoint;
> >>>> + if (!endpoint)
> >>>> + goto err;
> >>>> +
> >>>> + if (IS_ERR(endpoint)) {
> >>>> + rc = PTR_ERR(endpoint);
> >>>> + goto err;
> >>>> + }
> >>>> +
> >>>> + device_lock(&endpoint->dev);
> >>>> + if (!endpoint->dev.driver)
> >>>> + goto err_endpoint;
> >>>> +
> >>>> + return endpoint;
> >>>> +
> >>>> +err_endpoint:
> >>>> + device_unlock(&endpoint->dev);
> >>>> +err:
> >>>> + device_unlock(&cxlmd->dev);
> >>>> + return ERR_PTR(rc);
> >>>> +}
> >>>> +EXPORT_SYMBOL_NS_GPL(cxl_acquire_endpoint, "CXL");
> >>>> +
> >>>> +void cxl_release_endpoint(struct cxl_memdev *cxlmd, struct cxl_port *endpoint)
> >>> And __releases() here to release the lock annotations
> >>>> +{
> >>>> + device_unlock(&endpoint->dev);
> >>>> + device_unlock(&cxlmd->dev);
> >>>> +}
> >>>> +EXPORT_SYMBOL_NS_GPL(cxl_release_endpoint, "CXL");
> >>
>
>
* Re: [PATCH v17 11/22] cxl: Define a driver interface for HPA free space enumeration
2025-06-27 22:42 ` Dave Jiang
@ 2025-07-04 14:45 ` Alejandro Lucero Palau
0 siblings, 0 replies; 112+ messages in thread
From: Alejandro Lucero Palau @ 2025-07-04 14:45 UTC (permalink / raw)
To: Dave Jiang, alejandro.lucero-palau, linux-cxl, netdev,
dan.j.williams, edward.cree, davem, kuba, pabeni, edumazet
Cc: Jonathan Cameron
On 6/27/25 23:42, Dave Jiang wrote:
>
> On 6/24/25 7:13 AM, alejandro.lucero-palau@amd.com wrote:
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> CXL region creation involves allocating capacity from device DPA
>> (device-physical-address space) and assigning it to decode a given HPA
> from Device Physical Address (DPA)
OK
>
>> (host-physical-address space). Before determining how much DPA to
> Host Physical Address (HPA)
OK
>> allocate the amount of available HPA must be determined. Also, not all
>> HPA is created equal, some specifically targets RAM, some target PMEM,
> s/equal, some specifically targets/equal. Some HPA targets/
>
> s/target PMEM/targets PMEM/
>
OK
>> some is prepared for device-memory flows like HDM-D and HDM-DB, and some
>> is host-only (HDM-H).
> s/host-only (HDM-H)/HDM-H (host-only)/
OK
>> In order to support Type2 CXL devices, wrap all of those concerns into
>> an API that retrieves a root decoder (platform CXL window) that fits the
>> specified constraints and the capacity available for a new region.
>>
>> Add a complementary function for releasing the reference to such root
>> decoder.
>>
>> Based on https://lore.kernel.org/linux-cxl/168592159290.1948938.13522227102445462976.stgit@dwillia2-xfh.jf.intel.com/
>>
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> ---
>> drivers/cxl/core/region.c | 169 ++++++++++++++++++++++++++++++++++++++
>> drivers/cxl/cxl.h | 3 +
>> include/cxl/cxl.h | 11 +++
>> 3 files changed, 183 insertions(+)
>>
>> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
>> index c3f4dc244df7..03e058ab697e 100644
>> --- a/drivers/cxl/core/region.c
>> +++ b/drivers/cxl/core/region.c
>> @@ -695,6 +695,175 @@ static int free_hpa(struct cxl_region *cxlr)
>> return 0;
>> }
>>
>> +struct cxlrd_max_context {
>> + struct device * const *host_bridges;
>> + int interleave_ways;
>> + unsigned long flags;
>> + resource_size_t max_hpa;
>> + struct cxl_root_decoder *cxlrd;
>> +};
>> +
>> +static int find_max_hpa(struct device *dev, void *data)
>> +{
>> + struct cxlrd_max_context *ctx = data;
>> + struct cxl_switch_decoder *cxlsd;
>> + struct cxl_root_decoder *cxlrd;
>> + struct resource *res, *prev;
>> + struct cxl_decoder *cxld;
>> + resource_size_t max;
>> + int found = 0;
>> +
>> + if (!is_root_decoder(dev))
>> + return 0;
>> +
>> + cxlrd = to_cxl_root_decoder(dev);
>> + cxlsd = &cxlrd->cxlsd;
>> + cxld = &cxlsd->cxld;
>> +
>> + /*
>> + * Flags are single unsigned longs. As CXL_DECODER_F_MAX is less than
>> + * 32 bits, the bitmap functions can be used.
> Should this be type2_find_max_hpa() since CXL_DECODER_F_MAX here is defined to only check up to the type2 bit?
I've been told several times to not name functions as specific for
accelerators or type2 when they could have a wider use.
>> + */
>> + if (!bitmap_subset(&ctx->flags, &cxld->flags, CXL_DECODER_F_MAX)) {
>> + dev_dbg(dev, "flags not matching: %08lx vs %08lx\n",
>> + cxld->flags, ctx->flags);
>> + return 0;
>> + }
>> +
>> + for (int i = 0; i < ctx->interleave_ways; i++) {
>> + for (int j = 0; j < ctx->interleave_ways; j++) {
>> + if (ctx->host_bridges[i] == cxlsd->target[j]->dport_dev) {
>> + found++;
>> + break;
>> + }
>> + }
>> + }
>> +
>> + if (found != ctx->interleave_ways) {
> 'found' never greater than 1 since it breaks immediately above on the first encounter? Should the break be there?
found can be greater than one. There is a double for loop above with the
break exiting from the inner one.
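To make the counting behaviour concrete, here is a small standalone sketch of the same double loop, with plain pointers standing in for struct device pointers (the function name is hypothetical). Since the break only leaves the inner loop, each requested host bridge contributes at most one match, so 'found' can reach interleave_ways.

```c
#include <stddef.h>

/*
 * Count how many requested host bridges appear among the decoder targets.
 * 'ways' bounds both arrays, as interleave_ways does in the patch.
 */
static int count_matching_bridges(const void *const *wanted,
				  const void *const *targets, int ways)
{
	int found = 0;

	for (int i = 0; i < ways; i++) {
		for (int j = 0; j < ways; j++) {
			if (wanted[i] == targets[j]) {
				found++;
				break;	/* move on to the next i only */
			}
		}
	}
	return found;
}
```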
>> + dev_dbg(dev,
>> + "Not enough host bridges. Found %d for %d interleave ways requested\n",
>> + found, ctx->interleave_ways);
>> + return 0;
>> + }
>> +
>> + /*
>> + * Walk the root decoder resource range relying on cxl_region_rwsem to
>> + * preclude sibling arrival/departure and find the largest free space
>> + * gap.
>> + */
>> + lockdep_assert_held_read(&cxl_region_rwsem);
>> + res = cxlrd->res->child;
>> +
>> + /* With no resource child the whole parent resource is available */
>> + if (!res)
>> + max = resource_size(cxlrd->res);
>> + else
>> + max = 0;
>> +
>> + for (prev = NULL; res; prev = res, res = res->sibling) {
>> + struct resource *next = res->sibling;
> Could res be NULL here on the first iteration with res = cxlrd->res->child && res == NULL? Maybe set res to cxlrd->res above, and then pass in res->child to resource_size() above?
>
I do not think so. If res is NULL, the loop condition fails and the loop body never executes.
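A standalone sketch of the largest-gap scan, assuming a sorted array of inclusive [start, end] ranges in place of the struct resource sibling list (names are hypothetical). With no children it returns the whole window size without entering the loop, matching the behaviour described above. One thing worth double-checking in the patch: C parses res->start - prev->end + 1 as (res->start - prev->end) + 1, so the sketch spells the gap out as next_start - (end + 1).

```c
#include <stddef.h>

struct range_incl {
	unsigned long long start, end;	/* inclusive, like struct resource */
};

static unsigned long long max_ull(unsigned long long a, unsigned long long b)
{
	return a > b ? a : b;
}

/* Largest free gap in 'win' given 'n' sorted, non-overlapping children. */
static unsigned long long largest_gap(struct range_incl win,
				      const struct range_incl *child, size_t n)
{
	unsigned long long max = 0;

	/* No children: the whole window is free and the loop never runs. */
	if (n == 0)
		return win.end - win.start + 1;

	for (size_t i = 0; i < n; i++) {
		/* Gap before the first child. */
		if (i == 0 && child[i].start > win.start)
			max = max_ull(max, child[i].start - win.start);
		/* Gap between this child and the next one. */
		if (i + 1 < n && child[i].end + 1 < child[i + 1].start)
			max = max_ull(max, child[i + 1].start - (child[i].end + 1));
		/* Gap after the last child. */
		if (i + 1 == n && child[i].end < win.end)
			max = max_ull(max, win.end - child[i].end);
	}
	return max;
}
```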
>> + resource_size_t free = 0;
>> +
>> + /*
>> + * Sanity check for preventing arithmetic problems below as a
>> + * resource with size 0 could imply using the end field below
>> + * when set to unsigned zero - 1 or all f in hex.
>> + */
>> + if (prev && !resource_size(prev))
>> + continue;
>> +
>> + if (!prev && res->start > cxlrd->res->start) {
>> + free = res->start - cxlrd->res->start;
>> + max = max(free, max);
>> + }
>> + if (prev && res->start > prev->end + 1) {
>> + free = res->start - prev->end + 1;
>> + max = max(free, max);
>> + }
> Can you do something like this to avoid keep checking prev?
>
> if (prev) {
> if (!resource_size(prev))
> continue;
> if (res->start > prev->end + 1) {
> ...
> }
> } else {
> if (res->start > cxlrd->res->start) {
> ...
> }
> }
Honestly, I prefer not to do that. The same goes for the code below.
>> + if (next && res->end + 1 < next->start) {
>> + free = next->start - res->end + 1;
>> + max = max(free, max);
>> + }
>> + if (!next && res->end + 1 < cxlrd->res->end + 1) {
>> + free = cxlrd->res->end + 1 - res->end + 1;
>> + max = max(free, max);
>> + }
> Maybe
>
> if (next) {
> ...
> } else {
> ...
> }
>
>> + }
>> +
>> + dev_dbg(CXLRD_DEV(cxlrd), "found %pa bytes of free space\n", &max);
>> + if (max > ctx->max_hpa) {
>> + if (ctx->cxlrd)
>> + put_device(CXLRD_DEV(ctx->cxlrd));
>> + get_device(CXLRD_DEV(cxlrd));
> Should this ref grab be a lot earlier in this function? Before we start using the cxlrd members?
I think the protection comes from cxl_region_rwsem, taken before the loop
calling find_max_hpa().
The reference obtained on the root decoder prevents its removal after
that lock is released.
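A rough userspace sketch of the reference handling being described: candidates are examined while a lock keeps the set stable, and only the current best candidate holds an extra reference, so it stays valid once that lock is dropped. The refcount field and helper names are hypothetical stand-ins for get_device()/put_device().

```c
#include <stddef.h>

struct decoder {
	int refcount;			/* stands in for the device refcount */
	unsigned long long free_bytes;
};

struct best_ctx {
	struct decoder *best;
	unsigned long long max;
};

static void get_dec(struct decoder *d) { d->refcount++; }
static void put_dec(struct decoder *d) { d->refcount--; }

/*
 * Called once per candidate while the caller holds the lock that keeps the
 * candidate set stable. Swap the reference only when a better one appears.
 */
static void consider(struct best_ctx *ctx, struct decoder *d)
{
	if (d->free_bytes <= ctx->max)
		return;
	if (ctx->best)
		put_dec(ctx->best);
	get_dec(d);
	ctx->best = d;
	ctx->max = d->free_bytes;
}
```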
>> + ctx->cxlrd = cxlrd;
>> + ctx->max_hpa = max;
>> + }
>> + return 0;
>> +}
>> +
>> +/**
>> + * cxl_get_hpa_freespace - find a root decoder with free capacity per constraints
>> + * @endpoint: the endpoint requiring the HPA
>> + * @interleave_ways: number of entries in @host_bridges
>> + * @flags: CXL_DECODER_F flags for selecting RAM vs PMEM, and Type2 device
>> + * @max_avail_contig: output parameter of max contiguous bytes available in the
>> + * returned decoder
>> + *
>> + * Returns a pointer to a struct cxl_root_decoder
>> + *
>> + * The return tuple of a 'struct cxl_root_decoder' and 'bytes available given
>> + * in (@max_avail_contig))' is a point in time snapshot. If by the time the
>> + * caller goes to use this root decoder's capacity the capacity is reduced then
>> + * caller needs to loop and retry.
>> + *
>> + * The returned root decoder has an elevated reference count that needs to be
>> + * put with cxl_put_root_decoder(cxlrd).
>> + */
>> +struct cxl_root_decoder *cxl_get_hpa_freespace(struct cxl_memdev *cxlmd,
>> + int interleave_ways,
>> + unsigned long flags,
>> + resource_size_t *max_avail_contig)
>> +{
>> + struct cxl_port *endpoint = cxlmd->endpoint;
>> + struct cxlrd_max_context ctx = {
>> + .host_bridges = &endpoint->host_bridge,
>> + .flags = flags,
>> + };
>> + struct cxl_port *root_port;
>> + struct cxl_root *root __free(put_cxl_root) = find_cxl_root(endpoint);
> Should move this whole line below right before checking for 'root'.
yes, that is probably better.
I'll do it.
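A userspace sketch of why the ordering matters with __free(): the kernel's __free() is built on the GCC/Clang cleanup attribute, and declaring the variable after the NULL check means the lookup is never handed a NULL endpoint. The types and helpers below are hypothetical, and the sketch relies on the cleanup attribute, so it needs GCC or Clang.

```c
#include <stddef.h>

struct root { int refcount; };
struct endpoint { struct root *root; };

static int puts_done;

/* Hypothetical lookup: dereferences ep and returns a referenced root. */
static struct root *find_root(struct endpoint *ep)
{
	if (ep->root)
		ep->root->refcount++;
	return ep->root;
}

/* Cleanup handler: receives a pointer to the annotated variable. */
static void put_root(struct root **r)
{
	if (*r) {
		(*r)->refcount--;
		puts_done++;
	}
}

#define AUTO_PUT_ROOT __attribute__((cleanup(put_root)))

static int get_freespace(struct endpoint *ep)
{
	if (!ep)
		return -1;		/* checked before anything touches ep */

	struct root *root AUTO_PUT_ROOT = find_root(ep);
	if (!root)
		return -2;

	/* ... use root; the reference is dropped automatically on return ... */
	return 0;
}
```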
>> +
>> + if (!endpoint) {
>> + dev_dbg(&cxlmd->dev, "endpoint not linked to memdev\n");
>> + return ERR_PTR(-ENXIO);
>> + }
>> +
>> + if (!root) {
>> + dev_dbg(&endpoint->dev, "endpoint can not be related to a root port\n");
>> + return ERR_PTR(-ENXIO);
>> + }
>> +
>> + root_port = &root->port;
>> + scoped_guard(rwsem_read, &cxl_region_rwsem)
>> + device_for_each_child(&root_port->dev, &ctx, find_max_hpa);
>> +
>> + if (!ctx.cxlrd)
>> + return ERR_PTR(-ENOMEM);
>> +
>> + *max_avail_contig = ctx.max_hpa;
>> + return ctx.cxlrd;
>> +}
>> +EXPORT_SYMBOL_NS_GPL(cxl_get_hpa_freespace, "CXL");
>> +
>> +/*
>> + * TODO: those references released here should avoid the decoder to be
>> + * unregistered.
> Are we missing code? Is it done later in the series?
This is related to Dan's comments in v16 about none of those references
currently being considered if the cxl_acpi or cxl_mem modules are removed.
He explicitly asked for this TODO to be added here.
>
>> + */
>> +void cxl_put_root_decoder(struct cxl_root_decoder *cxlrd)
>> +{
>> + put_device(CXLRD_DEV(cxlrd));
>> +}
>> +EXPORT_SYMBOL_NS_GPL(cxl_put_root_decoder, "CXL");
>> +
>> static ssize_t size_store(struct device *dev, struct device_attribute *attr,
>> const char *buf, size_t len)
>> {
>> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
>> index b35eff0977a8..3af8821f7c15 100644
>> --- a/drivers/cxl/cxl.h
>> +++ b/drivers/cxl/cxl.h
>> @@ -665,6 +665,9 @@ struct cxl_root_decoder *to_cxl_root_decoder(struct device *dev);
>> struct cxl_switch_decoder *to_cxl_switch_decoder(struct device *dev);
>> struct cxl_endpoint_decoder *to_cxl_endpoint_decoder(struct device *dev);
>> bool is_root_decoder(struct device *dev);
>> +
>> +#define CXLRD_DEV(cxlrd) (&(cxlrd)->cxlsd.cxld.dev)
> Maybe lower case to keep formatting same as other similar macros under CXL subsystem?
It makes sense. I'll do it.
>
>> +
>> bool is_switch_decoder(struct device *dev);
>> bool is_endpoint_decoder(struct device *dev);
>> struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
>> diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
>> index 2928e16a62e2..dd37b1d88454 100644
>> --- a/include/cxl/cxl.h
>> +++ b/include/cxl/cxl.h
>> @@ -25,6 +25,11 @@ enum cxl_devtype {
>>
>> struct device;
>>
>> +#define CXL_DECODER_F_RAM BIT(0)
>> +#define CXL_DECODER_F_PMEM BIT(1)
>> +#define CXL_DECODER_F_TYPE2 BIT(2)
>> +#define CXL_DECODER_F_MAX 3
>> +
> redefinition from drivers/cxl/cxl.h. Should those definitions be deleted? Maybe move the whole thing over here and call CXL_DECODER_F_MAX something else?
Oh, yes. Thank you for spotting this. I'll fix it.
Thanks!
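Related to the duplicated definitions and the bitmap_subset() comment earlier in this patch, here is a single-word sketch of what the flag match checks: every requested flag must be set on the root decoder. The F_* values mirror the ones in the quoted hunk, and the helper name is hypothetical.

```c
#include <stdbool.h>

/* Local mirrors of the flag values in the quoted hunk (assumption). */
#define F_RAM	(1UL << 0)
#define F_PMEM	(1UL << 1)
#define F_TYPE2	(1UL << 2)
#define F_NBITS	3

/*
 * Single-word equivalent of bitmap_subset(&want, &have, F_NBITS): every bit
 * requested in 'want' (within the first F_NBITS bits) must be set in 'have'.
 */
static bool flags_subset(unsigned long want, unsigned long have)
{
	unsigned long mask = (1UL << F_NBITS) - 1;

	return ((want & ~have) & mask) == 0;
}
```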
>
>> /*
>> * Using struct_group() allows for per register-block-type helper routines,
>> * without requiring block-type agnostic code to include the prefix.
>> @@ -236,4 +241,10 @@ struct cxl_memdev *devm_cxl_add_memdev(struct device *host,
>> struct cxl_dev_state *cxlmds);
>> struct cxl_port *cxl_acquire_endpoint(struct cxl_memdev *cxlmd);
>> void cxl_release_endpoint(struct cxl_memdev *cxlmd, struct cxl_port *endpoint);
>> +struct cxl_port;
>> +struct cxl_root_decoder *cxl_get_hpa_freespace(struct cxl_memdev *cxlmd,
>> + int interleave_ways,
>> + unsigned long flags,
>> + resource_size_t *max);
>> +void cxl_put_root_decoder(struct cxl_root_decoder *cxlrd);
>> #endif /* __CXL_CXL_H__ */
* Re: [PATCH v17 12/22] sfc: get endpoint decoder
2025-06-27 9:10 ` Jonathan Cameron
@ 2025-07-04 14:51 ` Alejandro Lucero Palau
0 siblings, 0 replies; 112+ messages in thread
From: Alejandro Lucero Palau @ 2025-07-04 14:51 UTC (permalink / raw)
To: Jonathan Cameron, alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang, Martin Habets, Edward Cree
On 6/27/25 10:10, Jonathan Cameron wrote:
> On Tue, 24 Jun 2025 15:13:45 +0100
> <alejandro.lucero-palau@amd.com> wrote:
>
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> Use cxl api for getting DPA (Device Physical Address) to use through an
>> endpoint decoder.
>>
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>> Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
>> Acked-by: Edward Cree <ecree.xilinx@gmail.com>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> ---
>> drivers/net/ethernet/sfc/Kconfig | 1 +
>> drivers/net/ethernet/sfc/efx_cxl.c | 32 +++++++++++++++++++++++++++++-
>> 2 files changed, 32 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/sfc/Kconfig b/drivers/net/ethernet/sfc/Kconfig
>> index 979f2801e2a8..e959d9b4f4ce 100644
>> --- a/drivers/net/ethernet/sfc/Kconfig
>> +++ b/drivers/net/ethernet/sfc/Kconfig
>> @@ -69,6 +69,7 @@ config SFC_MCDI_LOGGING
>> config SFC_CXL
>> bool "Solarflare SFC9100-family CXL support"
>> depends on SFC && CXL_BUS >= SFC
>> + depends on CXL_REGION
>> default SFC
>> help
>> This enables SFC CXL support if the kernel is configuring CXL for
>> diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
>> index e2d52ed49535..c0adfd99cc78 100644
>> --- a/drivers/net/ethernet/sfc/efx_cxl.c
>> +++ b/drivers/net/ethernet/sfc/efx_cxl.c
>> @@ -22,6 +22,7 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
>> {
>> struct efx_nic *efx = &probe_data->efx;
>> struct pci_dev *pci_dev = efx->pci_dev;
>> + resource_size_t max_size;
>> struct efx_cxl *cxl;
>> u16 dvsec;
>> int rc;
>> @@ -86,13 +87,42 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
>> return PTR_ERR(cxl->cxlmd);
>> }
>>
>> + cxl->endpoint = cxl_acquire_endpoint(cxl->cxlmd);
>> + if (IS_ERR(cxl->endpoint))
>> + return PTR_ERR(cxl->endpoint);
>> +
>> + cxl->cxlrd = cxl_get_hpa_freespace(cxl->cxlmd, 1,
>> + CXL_DECODER_F_RAM | CXL_DECODER_F_TYPE2,
>> + &max_size);
>> +
>> + if (IS_ERR(cxl->cxlrd)) {
>> + pci_err(pci_dev, "cxl_get_hpa_freespace failed\n");
>> + rc = PTR_ERR(cxl->cxlrd);
>> + goto endpoint_release;
>> + }
>> +
>> + if (max_size < EFX_CTPIO_BUFFER_SIZE) {
>> + pci_err(pci_dev, "%s: not enough free HPA space %pap < %u\n",
>> + __func__, &max_size, EFX_CTPIO_BUFFER_SIZE);
>> + rc = -ENOSPC;
>> + goto put_root_decoder;
>> + }
>> +
>> probe_data->cxl = cxl;
>>
>> - return 0;
>> + goto endpoint_release;
> I'd avoid the spider's nest here and just duplicate the release,
> or if you really want to avoid that duplication, factor out everything where
> it is held into another function so that acquire/function/release is all that
> is seen here.
>
I'll duplicate the release for the default successful return.
Thanks
>> +
>> +put_root_decoder:
>> + cxl_put_root_decoder(cxl->cxlrd);
>> +endpoint_release:
>> + cxl_release_endpoint(cxl->cxlmd, cxl->endpoint);
>> + return rc;
>> }
>>
>> void efx_cxl_exit(struct efx_probe_data *probe_data)
>> {
>> + if (probe_data->cxl)
>> + cxl_put_root_decoder(probe_data->cxl->cxlrd);
>> }
>>
>> MODULE_IMPORT_NS("CXL");
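A userspace sketch of the shape agreed above, with toy counters in place of the endpoint lock and the root decoder reference (all names hypothetical): the success path releases the endpoint explicitly and returns, so the error labels are only reached on failure.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy state standing in for the endpoint lock and root decoder reference. */
struct toy_state {
	bool endpoint_held;
	int root_refs;
	unsigned long long free_hpa;
};

static void acquire_ep(struct toy_state *s) { s->endpoint_held = true; }
static void release_ep(struct toy_state *s) { s->endpoint_held = false; }
static void get_root(struct toy_state *s)   { s->root_refs++; }
static void put_root(struct toy_state *s)   { s->root_refs--; }

static int toy_cxl_init(struct toy_state *s, unsigned long long needed)
{
	int rc;

	acquire_ep(s);
	get_root(s);

	if (s->free_hpa < needed) {
		rc = -ENOSPC;
		goto put_root_decoder;
	}

	/* Success: keep the root decoder reference, drop the endpoint. */
	release_ep(s);
	return 0;

put_root_decoder:
	put_root(s);
	release_ep(s);
	return rc;
}
```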
* Re: [PATCH v17 13/22] cxl: Define a driver interface for DPA allocation
2025-06-27 9:06 ` Jonathan Cameron
@ 2025-07-04 15:18 ` Alejandro Lucero Palau
0 siblings, 0 replies; 112+ messages in thread
From: Alejandro Lucero Palau @ 2025-07-04 15:18 UTC (permalink / raw)
To: Jonathan Cameron, alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang, Ben Cheatham
On 6/27/25 10:06, Jonathan Cameron wrote:
> On Tue, 24 Jun 2025 15:13:46 +0100
> <alejandro.lucero-palau@amd.com> wrote:
>
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> Region creation involves finding available DPA (device-physical-address)
>> capacity to map into HPA (host-physical-address) space.
>>
>> In order to support CXL Type2 devices, define an API, cxl_request_dpa(),
>> that tries to allocate the DPA memory the driver requires to operate. The
>> memory requested should not be bigger than the max available HPA obtained
>> previously with cxl_get_hpa_freespace.
>>
>> Based on https://lore.kernel.org/linux-cxl/168592158743.1948938.7622563891193802610.stgit@dwillia2-xfh.jf.intel.com/
>>
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Hmm. I wouldn't trust this last guy not to have missed a few things.
> See below.
>
My mistake. The patch changed after applying Dan's suggestions, so I
should have removed the tags.
>> +static struct cxl_endpoint_decoder *
>> +cxl_find_free_decoder(struct cxl_memdev *cxlmd)
>> +{
>> + struct cxl_port *endpoint = cxlmd->endpoint;
>> + struct device *dev;
>> +
>> + scoped_guard(rwsem_read, &cxl_dpa_rwsem) {
>> + dev = device_find_child(&endpoint->dev, NULL,
>> + find_free_decoder);
>> + }
>> + if (dev)
>> + return to_cxl_endpoint_decoder(dev);
>> +
>> + return NULL;
> If this code isn't going to get modified later, it could be simpler as
>
> guard(rwsem_read)(&cxl_dpa_rwsem);
> dev = device_find_child(&endpoint->dev, NULL, find_free_decoder);
> if (!dev)
> return NULL;
>
> return to_cxl_endpoint_decoder(dev);
>
Yes, it makes sense.
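For comparison, here is a userspace sketch of what guard() buys here, using the cleanup attribute directly with a toy read lock (READ_GUARD() and the other names are hypothetical, and GCC or Clang is assumed). The lock is dropped on every return from the enclosing scope, which is what lets the NULL check and the conversion sit after the lookup without a scoped block.

```c
#include <stddef.h>

struct toy_rwsem { int readers; };

static void down_read_toy(struct toy_rwsem *s) { s->readers++; }

/* Cleanup handler: receives a pointer to the guard variable. */
static void up_read_toy(struct toy_rwsem **s) { (*s)->readers--; }

/* Hypothetical analogue of guard(rwsem_read)(sem): lock now, unlock at scope exit. */
#define READ_GUARD(sem)						\
	struct toy_rwsem *read_guard_				\
		__attribute__((cleanup(up_read_toy))) =		\
		(down_read_toy(sem), (sem))

static struct toy_rwsem dpa_sem;
static int decoder_free[4] = { 0, 0, 1, 0 };	/* 1 marks a free decoder */

static int *find_free_decoder(void)
{
	READ_GUARD(&dpa_sem);

	for (size_t i = 0; i < sizeof(decoder_free) / sizeof(decoder_free[0]); i++)
		if (decoder_free[i])
			return &decoder_free[i];	/* unlocked on this return */

	return NULL;					/* and on this one */
}
```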
>> +}
>> +
>> +/**
>> + * cxl_request_dpa - search and reserve DPA given input constraints
>> + * @cxlmd: memdev with an endpoint port with available decoders
>> + * @mode: DPA operation mode (ram vs pmem)
>> + * @alloc: dpa size required
>> + *
>> + * Returns a pointer to a cxl_endpoint_decoder struct or an error
>> + *
>> + * Given that a region needs to allocate from limited HPA capacity it
>> + * may be the case that a device has more mappable DPA capacity than
>> + * available HPA. The expectation is that @alloc is a driver known
>> + * value based on the device capacity but it could not be available
>> + * due to HPA constraints.
>> + *
>> + * Returns a pinned cxl_decoder with at least @alloc bytes of capacity
>> + * reserved, or an error pointer. The caller is also expected to own the
>> + * lifetime of the memdev registration associated with the endpoint to
>> + * pin the decoder registered as well.
>> + */
>> +struct cxl_endpoint_decoder *cxl_request_dpa(struct cxl_memdev *cxlmd,
>> + enum cxl_partition_mode mode,
>> + resource_size_t alloc)
>> +{
>> + struct cxl_endpoint_decoder *cxled __free(put_cxled) =
>> + cxl_find_free_decoder(cxlmd);
>> + struct device *cxled_dev;
>> + int rc;
>> +
>> + if (!IS_ALIGNED(alloc, SZ_256M))
>> + return ERR_PTR(-EINVAL);
>> +
>> + if (!cxled) {
>> + rc = -ENODEV;
>> + goto err;
>> + }
>> +
>> + rc = cxl_dpa_set_part(cxled, mode);
>> + if (rc)
>> + goto err;
>> +
>> + rc = cxl_dpa_alloc(cxled, alloc);
>> + if (rc)
>> + goto err;
>> +
>> + return cxled;
> I was kind of expecting us to disable the put above with a return_ptr()
> here. If there is a reason why not, add a comment as it is not obvious
> to me anyway!
It seems I made a mess applying Dan's suggestions. You are right: the put
should not be invoked here.
>
>> +err:
>> + put_device(cxled_dev);
> It's not been assigned. I'm surprised none of the standard tooling
> (sparse, smatch, etc.) screamed about this one.
> For complex series like this it's worth running them on each patch just to
> avoid possible bot warnings later!
This is bad, of course. I did not see it, and I realize now Dan's
changes make this harder to handle.
I'll fix it.
Thanks!
>
>> + return ERR_PTR(rc);
>> +}
>> +EXPORT_SYMBOL_NS_GPL(cxl_request_dpa, "CXL");
>
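A userspace sketch of the fix being discussed, assuming GCC or Clang: the decoder pointer carries a cleanup handler that drops the lookup reference on every error return, and the success path hands the pointer out with a no_free_ptr()-style macro so the handler sees NULL. TAKE_PTR() and the toy types are hypothetical stand-ins for no_free_ptr()/return_ptr() from include/linux/cleanup.h.

```c
#include <errno.h>
#include <stddef.h>

struct toy_decoder { int refcount; int dpa_ok; };

/* Cleanup handler: drops the reference unless ownership was handed out. */
static void put_decoder(struct toy_decoder **d)
{
	if (*d)
		(*d)->refcount--;
}

/* Hand out the pointer and NULL the local so the cleanup does nothing. */
#define TAKE_PTR(p) ({ __typeof__(p) take_ptr_tmp_ = (p); (p) = NULL; take_ptr_tmp_; })

/* Hypothetical lookup returning a referenced decoder (or NULL). */
static struct toy_decoder *find_free(struct toy_decoder *pool)
{
	if (pool)
		pool->refcount++;
	return pool;
}

/* Returns a referenced decoder on success, or NULL with *err set. */
static struct toy_decoder *request_dpa(struct toy_decoder *pool, int *err)
{
	struct toy_decoder *cxled __attribute__((cleanup(put_decoder))) =
		find_free(pool);

	if (!cxled) {
		*err = -ENODEV;
		return NULL;		/* nothing to put */
	}
	if (!cxled->dpa_ok) {
		*err = -ENOSPC;
		return NULL;		/* cleanup drops the lookup reference */
	}
	*err = 0;
	return TAKE_PTR(cxled);		/* caller now owns the reference */
}
```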
* Re: [PATCH v17 13/22] cxl: Define a driver interface for DPA allocation
2025-06-27 20:46 ` Dave Jiang
@ 2025-07-04 15:21 ` Alejandro Lucero Palau
0 siblings, 0 replies; 112+ messages in thread
From: Alejandro Lucero Palau @ 2025-07-04 15:21 UTC (permalink / raw)
To: Dave Jiang, alejandro.lucero-palau, linux-cxl, netdev,
dan.j.williams, edward.cree, davem, kuba, pabeni, edumazet
Cc: Ben Cheatham, Jonathan Cameron
On 6/27/25 21:46, Dave Jiang wrote:
>
> On 6/24/25 7:13 AM, alejandro.lucero-palau@amd.com wrote:
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> Region creation involves finding available DPA (device-physical-address)
>> capacity to map into HPA (host-physical-address) space.
>>
>> In order to support CXL Type2 devices, define an API, cxl_request_dpa(),
>> that tries to allocate the DPA memory the driver requires to operate. The
>> memory requested should not be bigger than the max available HPA obtained
>> previously with cxl_get_hpa_freespace.
> cxl_get_hpa_freespace()
OK
>
>> Based on https://lore.kernel.org/linux-cxl/168592158743.1948938.7622563891193802610.stgit@dwillia2-xfh.jf.intel.com/
>>
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> ---
>> drivers/cxl/core/hdm.c | 93 ++++++++++++++++++++++++++++++++++++++++++
>> drivers/cxl/cxl.h | 2 +
>> include/cxl/cxl.h | 5 +++
>> 3 files changed, 100 insertions(+)
>>
>> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
>> index 70cae4ebf8a4..b17381e49836 100644
>> --- a/drivers/cxl/core/hdm.c
>> +++ b/drivers/cxl/core/hdm.c
>> @@ -3,6 +3,7 @@
>> #include <linux/seq_file.h>
>> #include <linux/device.h>
>> #include <linux/delay.h>
>> +#include <cxl/cxl.h>
>>
>> #include "cxlmem.h"
>> #include "core.h"
>> @@ -546,6 +547,13 @@ resource_size_t cxl_dpa_resource_start(struct cxl_endpoint_decoder *cxled)
>> return base;
>> }
>>
>> +/**
>> + * cxl_dpa_free - release DPA (Device Physical Address)
>> + *
>> + * @cxled: endpoint decoder linked to the DPA
>> + *
>> + * Returns 0 or error.
>> + */
>> int cxl_dpa_free(struct cxl_endpoint_decoder *cxled)
>> {
>> struct cxl_port *port = cxled_to_port(cxled);
>> @@ -572,6 +580,7 @@ int cxl_dpa_free(struct cxl_endpoint_decoder *cxled)
>> devm_cxl_dpa_release(cxled);
>> return 0;
>> }
>> +EXPORT_SYMBOL_NS_GPL(cxl_dpa_free, "CXL");
>>
>> int cxl_dpa_set_part(struct cxl_endpoint_decoder *cxled,
>> enum cxl_partition_mode mode)
>> @@ -686,6 +695,90 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size)
>> return devm_add_action_or_reset(&port->dev, cxl_dpa_release, cxled);
>> }
>>
>> +static int find_free_decoder(struct device *dev, const void *data)
>> +{
>> + struct cxl_endpoint_decoder *cxled;
>> + struct cxl_port *port;
>> +
>> + if (!is_endpoint_decoder(dev))
>> + return 0;
>> +
>> + cxled = to_cxl_endpoint_decoder(dev);
>> + port = cxled_to_port(cxled);
>> +
>> + if (cxled->cxld.id != port->hdm_end + 1)
>> + return 0;
>> +
>> + return 1;
> return cxled->cxld.id == port->hdm_end + 1;
OK
>> +}
>> +
>> +static struct cxl_endpoint_decoder *
>> +cxl_find_free_decoder(struct cxl_memdev *cxlmd)
>> +{
>> + struct cxl_port *endpoint = cxlmd->endpoint;
>> + struct device *dev;
>> +
>> + scoped_guard(rwsem_read, &cxl_dpa_rwsem) {
> Probably ok to just use guard() here
OK
>> + dev = device_find_child(&endpoint->dev, NULL,
>> + find_free_decoder);
>> + }
>> + if (dev)
>> + return to_cxl_endpoint_decoder(dev);
>> +
>> + return NULL;
>> +}
>> +
>> +/**
>> + * cxl_request_dpa - search and reserve DPA given input constraints
>> + * @cxlmd: memdev with an endpoint port with available decoders
>> + * @mode: DPA operation mode (ram vs pmem)
>> + * @alloc: dpa size required
>> + *
>> + * Returns a pointer to a cxl_endpoint_decoder struct or an error
>> + *
>> + * Given that a region needs to allocate from limited HPA capacity it
>> + * may be the case that a device has more mappable DPA capacity than
>> + * available HPA. The expectation is that @alloc is a driver known
>> + * value based on the device capacity but it could not be available
>> + * due to HPA constraints.
>> + *
>> + * Returns a pinned cxl_decoder with at least @alloc bytes of capacity
>> + * reserved, or an error pointer. The caller is also expected to own the
>> + * lifetime of the memdev registration associated with the endpoint to
>> + * pin the decoder registered as well.
>> + */
>> +struct cxl_endpoint_decoder *cxl_request_dpa(struct cxl_memdev *cxlmd,
>> + enum cxl_partition_mode mode,
>> + resource_size_t alloc)
>> +{
>> + struct cxl_endpoint_decoder *cxled __free(put_cxled) =
>> + cxl_find_free_decoder(cxlmd);
>> + struct device *cxled_dev;
>> + int rc;
>> +
>> + if (!IS_ALIGNED(alloc, SZ_256M))
>> + return ERR_PTR(-EINVAL);
>> +
>> + if (!cxled) {
>> + rc = -ENODEV;
>> + goto err;
> return ERR_PTR(-ENODEV);
>
> cxled_dev is not set here. In fact it's never set anywhere. The put_device() later will fail. Although the __free() should take care of it, right? The err path isn't necessary?
As I commented with Jonathan's review, I'll fix it.
>> + }
>> +
>> + rc = cxl_dpa_set_part(cxled, mode);
>> + if (rc)
>> + goto err;
>> +
>> + rc = cxl_dpa_alloc(cxled, alloc);
>> + if (rc)
>> + goto err;
>> +
>> + return cxled;
> return no_free_ptr(cxled);
Yes. Thanks!
>
> DJ
>
>> +err:
>> + put_device(cxled_dev);
>> + return ERR_PTR(rc);
>> +}
>> +EXPORT_SYMBOL_NS_GPL(cxl_request_dpa, "CXL");
>> +
>> static void cxld_set_interleave(struct cxl_decoder *cxld, u32 *ctrl)
>> {
>> u16 eig;
>> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
>> index 3af8821f7c15..6e724a8440f5 100644
>> --- a/drivers/cxl/cxl.h
>> +++ b/drivers/cxl/cxl.h
>> @@ -636,6 +636,8 @@ void put_cxl_root(struct cxl_root *cxl_root);
>> DEFINE_FREE(put_cxl_root, struct cxl_root *, if (_T) put_cxl_root(_T))
>>
>> DEFINE_FREE(put_cxl_port, struct cxl_port *, if (!IS_ERR_OR_NULL(_T)) put_device(&_T->dev))
>> +DEFINE_FREE(put_cxled, struct cxl_endpoint_decoder *, if (_T) put_device(&_T->cxld.dev))
>> +
>> int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd);
>> void cxl_bus_rescan(void);
>> void cxl_bus_drain(void);
>> diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
>> index dd37b1d88454..a2f3e683724a 100644
>> --- a/include/cxl/cxl.h
>> +++ b/include/cxl/cxl.h
>> @@ -7,6 +7,7 @@
>>
>> #include <linux/node.h>
>> #include <linux/ioport.h>
>> +#include <linux/range.h>
>> #include <cxl/mailbox.h>
>>
>> /**
>> @@ -247,4 +248,8 @@ struct cxl_root_decoder *cxl_get_hpa_freespace(struct cxl_memdev *cxlmd,
>> unsigned long flags,
>> resource_size_t *max);
>> void cxl_put_root_decoder(struct cxl_root_decoder *cxlrd);
>> +struct cxl_endpoint_decoder *cxl_request_dpa(struct cxl_memdev *cxlmd,
>> + enum cxl_partition_mode mode,
>> + resource_size_t alloc);
>> +int cxl_dpa_free(struct cxl_endpoint_decoder *cxled);
>> #endif /* __CXL_CXL_H__ */
* Re: [PATCH v17 14/22] sfc: get endpoint decoder
2025-06-27 9:11 ` Jonathan Cameron
@ 2025-07-07 11:24 ` Alejandro Lucero Palau
0 siblings, 0 replies; 112+ messages in thread
From: Alejandro Lucero Palau @ 2025-07-07 11:24 UTC (permalink / raw)
To: Jonathan Cameron, alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang, Martin Habets, Edward Cree
On 6/27/25 10:11, Jonathan Cameron wrote:
> On Tue, 24 Jun 2025 15:13:47 +0100
> <alejandro.lucero-palau@amd.com> wrote:
>
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> Use cxl api for getting DPA (Device Physical Address) to use through an
>> endpoint decoder.
>>
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>> Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
>> Acked-by: Edward Cree <ecree.xilinx@gmail.com>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> ---
>> drivers/net/ethernet/sfc/efx_cxl.c | 12 +++++++++++-
>> 1 file changed, 11 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
>> index c0adfd99cc78..ffbf0e706330 100644
>> --- a/drivers/net/ethernet/sfc/efx_cxl.c
>> +++ b/drivers/net/ethernet/sfc/efx_cxl.c
>> @@ -108,6 +108,14 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
>> goto put_root_decoder;
>> }
>>
>> + cxl->cxled = cxl_request_dpa(cxl->cxlmd, CXL_PARTMODE_RAM,
>> + EFX_CTPIO_BUFFER_SIZE);
>> + if (IS_ERR(cxl->cxled)) {
>> + pci_err(pci_dev, "CXL accel request DPA failed");
>> + rc = PTR_ERR(cxl->cxled);
>> + goto put_root_decoder;
>> + }
>> +
>> probe_data->cxl = cxl;
>>
>> goto endpoint_release;
>> @@ -121,8 +129,10 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
>>
>> void efx_cxl_exit(struct efx_probe_data *probe_data)
>> {
>> - if (probe_data->cxl)
>> + if (probe_data->cxl) {
> Given this is going to get more complex I'd be tempted to go with the early exit
> approach for !cxl
>
> if (!probe_data->cxl)
> return;
>
> cxl_dpa_free()
> etc.
It makes sense. I'll do it.
Thanks
>
>> + cxl_dpa_free(probe_data->cxl->cxled);
>> cxl_put_root_decoder(probe_data->cxl->cxlrd);
>> + }
>> }
>>
>> MODULE_IMPORT_NS("CXL");
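A userspace sketch of the early-exit version of the exit path, with hypothetical toy types: nothing is touched when CXL setup never happened, and teardown runs in reverse order of acquisition (DPA first, then the root decoder reference).

```c
#include <stddef.h>

struct toy_cxl { int dpa_allocated; int root_refs; };
struct toy_probe { struct toy_cxl *cxl; };

static void toy_dpa_free(struct toy_cxl *c)         { c->dpa_allocated = 0; }
static void toy_put_root_decoder(struct toy_cxl *c) { c->root_refs--; }

static void toy_cxl_exit(struct toy_probe *pd)
{
	/* CXL was never set up: nothing to undo. */
	if (!pd->cxl)
		return;

	/* Reverse order of acquisition: DPA was requested after the root decoder. */
	toy_dpa_free(pd->cxl);
	toy_put_root_decoder(pd->cxl);
}
```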
* Re: [PATCH v17 18/22] cxl: Allow region creation by type2 drivers
2025-06-27 9:32 ` Jonathan Cameron
@ 2025-07-07 11:31 ` Alejandro Lucero Palau
0 siblings, 0 replies; 112+ messages in thread
From: Alejandro Lucero Palau @ 2025-07-07 11:31 UTC (permalink / raw)
To: Jonathan Cameron, alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
On 6/27/25 10:32, Jonathan Cameron wrote:
> On Tue, 24 Jun 2025 15:13:51 +0100
> <alejandro.lucero-palau@amd.com> wrote:
>
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> Creating a CXL region requires userspace intervention through the cxl
>> sysfs files. Type2 support should allow accelerator drivers to create
>> such cxl region from kernel code.
>>
>> Adding that functionality and integrating it with current support for
>> memory expanders.
>>
>> Support an action by the type2 driver to be linked to the created region
>> for unwinding the resources allocated properly.
>>
>> Based on https://lore.kernel.org/linux-cxl/168592159835.1948938.1647215579839222774.stgit@dwillia2-xfh.jf.intel.com/
>>
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> One question in here for others (probably Dan). When does it make sense to
> manually request devm region cleanup, and when should we let it flow out
> because we are failing the CXL creation anyway and it's one of many things to
> clean up if that happens?
>
>> ---
>> drivers/cxl/core/region.c | 152 ++++++++++++++++++++++++++++++++++++--
>> drivers/cxl/port.c | 5 +-
>> include/cxl/cxl.h | 5 ++
>> 3 files changed, 153 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
>> index 21cf8c11efe3..4ca5ade54ad9 100644
>> --- a/drivers/cxl/core/region.c
>> +++ b/drivers/cxl/core/region.c
>> @@ -2319,6 +2319,12 @@ static int cxl_region_detach(struct cxl_endpoint_decoder *cxled)
>> return rc;
>> }
>>
>> +/**
>> + * cxl_decoder_kill_region - detach a region from device
>> + *
>> + * @cxled: endpoint decoder to detach the region from.
>> + *
> Stray blank line.
I'll fix it.
>> + */
>> void cxl_decoder_kill_region(struct cxl_endpoint_decoder *cxled)
>> {
>> down_write(&cxl_region_rwsem);
>> @@ -2326,6 +2332,7 @@ void cxl_decoder_kill_region(struct cxl_endpoint_decoder *cxled)
>> cxl_region_detach(cxled);
>> up_write(&cxl_region_rwsem);
>> }
>> +EXPORT_SYMBOL_NS_GPL(cxl_decoder_kill_region, "CXL");
>> +/**
>> + * cxl_create_region - Establish a region given an endpoint decoder
>> + * @cxlrd: root decoder to allocate HPA
>> + * @cxled: endpoint decoder with reserved DPA capacity
>> + * @ways: interleave ways required
>> + *
>> + * Returns a fully formed region in the commit state and attached to the
>> + * cxl_region driver.
>> + */
>> +struct cxl_region *cxl_create_region(struct cxl_root_decoder *cxlrd,
>> + struct cxl_endpoint_decoder **cxled,
>> + int ways, void (*action)(void *),
>> + void *data)
>> +{
>> + struct cxl_region *cxlr;
>> +
>> + scoped_guard(mutex, &cxlrd->range_lock) {
>> + cxlr = __construct_new_region(cxlrd, cxled, ways);
>> + if (IS_ERR(cxlr))
>> + return cxlr;
>> + }
>> +
>> + if (device_attach(&cxlr->dev) <= 0) {
>> + dev_err(&cxlr->dev, "failed to create region\n");
>> + drop_region(cxlr);
> I'm in two minds about this. If we were to have wrapped the whole thing
> up in a devres group and on failure (so carrying on without cxl support)
> we tidy that group up, then we'd not need to clean this up here.
> However we do some local devm cleanup in construct_region today so maybe
> keeping this local makes sense... Dan, maybe you have a better view of
> whether cleaning up here is sensible or not?
>
>> + return ERR_PTR(-ENODEV);
>> + }
>> +
>> + if (action)
>> + devm_add_action_or_reset(&cxlr->dev, action, data);
> This is a little odd looking (and it can fail, so it should be error checked).
> I'd push the devm registration to the caller.
>
I'll add the result check. The caller cannot access the region struct
right now, so that would require exporting more of the CXL core to drivers.
>> +
>> + return cxlr;
>> +}
>> +EXPORT_SYMBOL_NS_GPL(cxl_create_region, "CXL");
>> +
>> int cxl_add_to_region(struct cxl_port *root, struct cxl_endpoint_decoder *cxled)
>> {
>> struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
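A minimal userspace sketch of why the devm_add_action_or_reset() result matters. It assumes the documented contract of that helper (on failure the action runs immediately and the error is returned); model_dev, model_add_action_or_reset() and model_create_region_tail() are hypothetical names, not CXL core API:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical userspace model of a devres action list. It mimics the
 * documented contract of devm_add_action_or_reset(): if registration
 * fails, the action runs immediately and the error is returned.
 */
#define MODEL_MAX_ACTIONS 4

struct model_dev {
	void (*actions[MODEL_MAX_ACTIONS])(void *);
	void *data[MODEL_MAX_ACTIONS];
	int nr;
	int fail_next_add;	/* test knob: simulate an allocation failure */
};

int model_add_action_or_reset(struct model_dev *dev,
			      void (*action)(void *), void *data)
{
	assert(dev != NULL && action != NULL);
	if (dev->fail_next_add || dev->nr == MODEL_MAX_ACTIONS) {
		dev->fail_next_add = 0;
		action(data);		/* the "or_reset" part */
		return -ENOMEM;
	}
	dev->actions[dev->nr] = action;
	dev->data[dev->nr] = data;
	dev->nr++;
	return 0;
}

/* Run registered actions newest-first, as devres does on unbind. */
void model_release_all(struct model_dev *dev)
{
	while (dev->nr > 0) {
		dev->nr--;
		dev->actions[dev->nr](dev->data[dev->nr]);
	}
}

int released;

/* Hypothetical stand-in for efx_release_cxl_region(). */
void model_release_region(void *priv)
{
	(void)priv;
	released++;
}

/* Caller-side shape: check the result instead of ignoring it. */
int model_create_region_tail(struct model_dev *region_dev, void *priv)
{
	int rc;

	rc = model_add_action_or_reset(region_dev, model_release_region, priv);
	if (rc)
		return rc;	/* cleanup already ran; report failure */

	/* ... further setup that may follow on success ... */
	return 0;
}
```

Because the action has already run on failure, carrying on as if it were registered would leave the caller using state that the release callback has just torn down.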
* Re: [PATCH v17 20/22] sfc: create cxl region
2025-06-27 9:38 ` Jonathan Cameron
@ 2025-07-07 11:37 ` Alejandro Lucero Palau
0 siblings, 0 replies; 112+ messages in thread
From: Alejandro Lucero Palau @ 2025-07-07 11:37 UTC (permalink / raw)
To: Jonathan Cameron, alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
On 6/27/25 10:38, Jonathan Cameron wrote:
> On Tue, 24 Jun 2025 15:13:53 +0100
> <alejandro.lucero-palau@amd.com> wrote:
>
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> Use cxl api for creating a region using the endpoint decoder related to
>> a DPA range.
>>
>> Add a callback for unwinding sfc cxl initialization when the endpoint port
>> is destroyed by potential cxl_acpi or cxl_mem modules removal.
>>
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>> ---
>> drivers/net/ethernet/sfc/efx_cxl.c | 24 +++++++++++++++++++++++-
>> 1 file changed, 23 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
>> index ffbf0e706330..7365effe974e 100644
>> --- a/drivers/net/ethernet/sfc/efx_cxl.c
>> +++ b/drivers/net/ethernet/sfc/efx_cxl.c
>> @@ -18,6 +18,16 @@
>>
>> #define EFX_CTPIO_BUFFER_SIZE SZ_256M
>>
>> +static void efx_release_cxl_region(void *priv_cxl)
>> +{
>> + struct efx_probe_data *probe_data = priv_cxl;
>> + struct efx_cxl *cxl = probe_data->cxl;
>> +
>> + iounmap(cxl->ctpio_cxl);
>> + cxl_put_root_decoder(cxl->cxlrd);
>> + probe_data->cxl_pio_initialised = false;
>> +}
>> +
>> int efx_cxl_init(struct efx_probe_data *probe_data)
>> {
>> struct efx_nic *efx = &probe_data->efx;
>> @@ -116,10 +126,21 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
>> goto put_root_decoder;
>> }
>>
>> + cxl->efx_region = cxl_create_region(cxl->cxlrd, &cxl->cxled, 1,
>> + efx_release_cxl_region,
> As per earlier comment - given when it's released, I'd register the devm callback
> out here not in cxl_create_region(). Might irritate the net maintainers though
> as it would be a devm callback registered in non CXL code, but I don't think
> that is a reason to jump through the hoops you currently have.
>
>
>> + &probe_data);
>> + if (IS_ERR(cxl->efx_region)) {
>> + pci_err(pci_dev, "CXL accel create region failed");
>> + rc = PTR_ERR(cxl->efx_region);
>> + goto err_region;
>> + }
>> +
>> probe_data->cxl = cxl;
>>
>> goto endpoint_release;
>>
>> +err_region:
>> + cxl_dpa_free(cxl->cxled);
>> put_root_decoder:
>> cxl_put_root_decoder(cxl->cxlrd);
>> endpoint_release:
>> @@ -129,7 +150,8 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
>>
>> void efx_cxl_exit(struct efx_probe_data *probe_data)
>> {
>> - if (probe_data->cxl) {
>> + if (probe_data->cxl_pio_initialised) {
> Doesn't make sense yet because it's never true yet. I assume the code
> doesn't always fail in a way it didn't until now?
Right. That is my problem with adding functionality incrementally. Now that the sfc
driver is able to catch those potential cxl core module removals, the
unwinding here needs to be based on this other check. But that variable
is set by the init code in the next patch. I'll fix it in this patch in the next
version.
Thanks!
>> + cxl_decoder_kill_region(probe_data->cxl->cxled);
>> cxl_dpa_free(probe_data->cxl->cxled);
>> cxl_put_root_decoder(probe_data->cxl->cxlrd);
>> }
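To illustrate how the goto ladder in efx_cxl_init() and the cxl_pio_initialised check in efx_cxl_exit() fit together, here is a userspace sketch with boolean flags in place of the real resources. model_state, step() and the fail_at knob are hypothetical; the label names mirror the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: each step succeeds unless fail_at names it. */
struct model_state {
	bool have_root, have_dpa, have_region;
	bool pio_initialised;
	int fail_at;	/* 0 = no failure, 1..3 = step that fails */
};

static int step(struct model_state *s, int n, bool *flag)
{
	if (s->fail_at == n)
		return -1;
	*flag = true;
	return 0;
}

int model_cxl_init(struct model_state *s)
{
	int rc;

	assert(s != NULL);
	s->pio_initialised = false;

	rc = step(s, 1, &s->have_root);		/* get root decoder */
	if (rc)
		return rc;
	rc = step(s, 2, &s->have_dpa);		/* request DPA */
	if (rc)
		goto put_root_decoder;
	rc = step(s, 3, &s->have_region);	/* create region */
	if (rc)
		goto err_region;

	/* Only now is there something for exit() to tear down. */
	s->pio_initialised = true;
	return 0;

err_region:
	s->have_dpa = false;		/* free DPA */
put_root_decoder:
	s->have_root = false;		/* put root decoder */
	return rc;
}

void model_cxl_exit(struct model_state *s)
{
	/* Gate teardown on the flag set at the end of a successful init. */
	if (!s->pio_initialised)
		return;
	s->have_region = false;	/* kill region */
	s->have_dpa = false;	/* free DPA */
	s->have_root = false;	/* put root decoder */
	s->pio_initialised = false;
}
```

Setting the flag in the same patch that starts checking it keeps each patch in the series self-consistent.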
* Re: [PATCH v17 22/22] sfc: support pio mapping based on cxl
2025-06-27 9:46 ` Jonathan Cameron
@ 2025-07-07 12:06 ` Alejandro Lucero Palau
0 siblings, 0 replies; 112+ messages in thread
From: Alejandro Lucero Palau @ 2025-07-07 12:06 UTC (permalink / raw)
To: Jonathan Cameron, alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
On 6/27/25 10:46, Jonathan Cameron wrote:
> On Tue, 24 Jun 2025 15:13:55 +0100
> alejandro.lucero-palau@amd.com wrote:
>
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> A PIO buffer is a region of device memory to which the driver can write a
>> packet for TX, with the device handling the transmit doorbell without
>> requiring a DMA for getting the packet data, which helps reducing latency
>> in certain exchanges. With CXL mem protocol this latency can be lowered
>> further.
>>
>> With a device supporting CXL and successfully initialised, use the cxl
>> region to map the memory range and use this mapping for PIO buffers.
>>
>> Add the disabling of those CXL-based PIO buffers if the callback for
>> potential cxl endpoint removal by the CXL code happens.
>>
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> There is quite a bit of ifdef magic in here. If there is any way
> to push that to stubs in headers, it would probably improve code
> readability.
I'll look at it, but I think that would require a major refactoring in
the main sfc code.
>
>
> I was expecting at some point to see handling of the CXL code being
> called returning EPROBE_DEFER, but that's not here, so I don't
> understand exactly how that is supposed to work if the CXL infrastructure
> hasn't arrived at the time of first probe.
As said previously, the sfc code is not doing so because it implies higher
complexity, and I'm not sure the case behind EPROBE_DEFER should be
handled this way. I mean, the system BIOS is doing most of the CXL
hardware initialization, and any such latencies are most likely to
arise at that point. If the kernel cxl code is affected by some
hardware latencies during cxl kernel initialization, I would like to
understand them better. Is this dependent on the cxl hardware
implementation? Is it because of the way the cxl kernel code is implemented?
The case I hit with the cxl_mem module not being loaded does not, IMO,
justify it. Maybe I just need a reference to the CXL specs about this
to make up my mind. Anyway, I'll think about it again.
> Otherwise, my main overall concern is that lifetimes are (I think) more
> complex than they need to be. I suggest a solution in an earlier patch (and
> in reply to the previous version). Devres groups are really handy for wrapping
> up a bunch of devm calls with the option to unwind them all on error or at
> a specific point in the remove() path for a driver. That should resolve
> most of my concerns, as you'll have something closely approximating a non-devm flow.
I think your suggestion makes sense so I'll follow it in next version.
Thanks!
>
> Jonathan
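A rough userspace model of the devres-group idea: open a group before the CXL steps, then release just that group when CXL setup fails (or at a chosen point in remove()), leaving earlier registrations alone. It is loosely modelled on the kernel's devres_open_group()/devres_release_group(); every model_* name here is hypothetical and only a single group is supported:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical single-group model of devres; not kernel code. */
#define MODEL_MAX 8

struct model_devres {
	void (*fn[MODEL_MAX])(int);
	int arg[MODEL_MAX];
	int nr;
	int group_start;	/* index where the open group begins, -1 if none */
};

void model_open_group(struct model_devres *d)
{
	d->group_start = d->nr;
}

void model_add(struct model_devres *d, void (*fn)(int), int arg)
{
	assert(d->nr < MODEL_MAX);
	d->fn[d->nr] = fn;
	d->arg[d->nr] = arg;
	d->nr++;
}

/* Run the group's cleanups newest-first, leaving earlier entries alone. */
void model_release_group(struct model_devres *d)
{
	assert(d->group_start >= 0);
	while (d->nr > d->group_start) {
		d->nr--;
		d->fn[d->nr](d->arg[d->nr]);
	}
	d->group_start = -1;
}

/* Records which cleanup ran, so the order can be checked. */
int order[MODEL_MAX];
int norder;

void model_record(int id)
{
	order[norder++] = id;
}
```

For example, registering cleanup 1 (non-CXL setup), opening the group, registering cleanups 2 (DPA) and 3 (region), then releasing the group runs 3 and then 2, while cleanup 1 stays registered for normal driver unbind.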
* Re: [PATCH v17 10/22] cx/memdev: Indicate probe deferral
2025-06-24 14:13 ` [PATCH v17 10/22] cx/memdev: Indicate probe deferral alejandro.lucero-palau
` (2 preceding siblings ...)
2025-06-27 18:17 ` Dave Jiang
@ 2025-07-16 22:52 ` Dave Jiang
3 siblings, 0 replies; 112+ messages in thread
From: Dave Jiang @ 2025-07-16 22:52 UTC (permalink / raw)
To: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet
Cc: Alejandro Lucero
On 6/24/25 7:13 AM, alejandro.lucero-palau@amd.com wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> The first step for a CXL accelerator driver that wants to establish new
> CXL.mem regions is to register a 'struct cxl_memdev'. That kicks off
> cxl_mem_probe() to enumerate all 'struct cxl_port' instances in the
> topology up to the root.
>
> If the port driver has not attached yet the expectation is that the
> driver waits until that link is established. The common cxl_pci driver
> has reason to keep the 'struct cxl_memdev' device attached to the bus
> until the root driver attaches. An accelerator may want to instead defer
> probing until CXL resources can be acquired.
>
> Use the @endpoint attribute of a 'struct cxl_memdev' to convey when an
> accelerator driver's probing should be deferred vs failed. Provide that
> indication via a new cxl_acquire_endpoint() API that can retrieve the
> probe status of the memdev.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Just noticed this. The subject needs a fix
s/cx/cxl/
DJ
> ---
> drivers/cxl/core/memdev.c | 42 +++++++++++++++++++++++++++++++++++++++
> drivers/cxl/core/port.c | 2 +-
> drivers/cxl/mem.c | 7 +++++--
> include/cxl/cxl.h | 2 ++
> 4 files changed, 50 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index f43d2aa2928e..e2c6b5b532db 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c
> @@ -1124,6 +1124,48 @@ struct cxl_memdev *devm_cxl_add_memdev(struct device *host,
> }
> EXPORT_SYMBOL_NS_GPL(devm_cxl_add_memdev, "CXL");
>
> +/*
> + * Try to get a locked reference on a memdev's CXL port topology
> + * connection. Be careful to observe when cxl_mem_probe() has deposited
> + * a probe deferral awaiting the arrival of the CXL root driver.
> + */
> +struct cxl_port *cxl_acquire_endpoint(struct cxl_memdev *cxlmd)
> +{
> + struct cxl_port *endpoint;
> + int rc = -ENXIO;
> +
> + device_lock(&cxlmd->dev);
> +
> + endpoint = cxlmd->endpoint;
> + if (!endpoint)
> + goto err;
> +
> + if (IS_ERR(endpoint)) {
> + rc = PTR_ERR(endpoint);
> + goto err;
> + }
> +
> + device_lock(&endpoint->dev);
> + if (!endpoint->dev.driver)
> + goto err_endpoint;
> +
> + return endpoint;
> +
> +err_endpoint:
> + device_unlock(&endpoint->dev);
> +err:
> + device_unlock(&cxlmd->dev);
> + return ERR_PTR(rc);
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_acquire_endpoint, "CXL");
> +
> +void cxl_release_endpoint(struct cxl_memdev *cxlmd, struct cxl_port *endpoint)
> +{
> + device_unlock(&endpoint->dev);
> + device_unlock(&cxlmd->dev);
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_release_endpoint, "CXL");
> +
> static void sanitize_teardown_notifier(void *data)
> {
> struct cxl_memdev_state *mds = data;
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 9acf8c7afb6b..fa10a1643e4c 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -1563,7 +1563,7 @@ static int add_port_attach_ep(struct cxl_memdev *cxlmd,
> */
> dev_dbg(&cxlmd->dev, "%s is a root dport\n",
> dev_name(dport_dev));
> - return -ENXIO;
> + return -EPROBE_DEFER;
> }
>
> struct cxl_port *parent_port __free(put_cxl_port) =
> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> index 7f39790d9d98..cda0b2ff73ce 100644
> --- a/drivers/cxl/mem.c
> +++ b/drivers/cxl/mem.c
> @@ -148,14 +148,17 @@ static int cxl_mem_probe(struct device *dev)
> return rc;
>
> rc = devm_cxl_enumerate_ports(cxlmd);
> - if (rc)
> + if (rc) {
> + cxlmd->endpoint = ERR_PTR(rc);
> return rc;
> + }
>
> struct cxl_port *parent_port __free(put_cxl_port) =
> cxl_mem_find_port(cxlmd, &dport);
> if (!parent_port) {
> dev_err(dev, "CXL port topology not found\n");
> - return -ENXIO;
> + cxlmd->endpoint = ERR_PTR(-EPROBE_DEFER);
> + return -EPROBE_DEFER;
> }
>
> if (cxl_pmem_size(cxlds) && IS_ENABLED(CONFIG_CXL_PMEM)) {
> diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
> index fcdf98231ffb..2928e16a62e2 100644
> --- a/include/cxl/cxl.h
> +++ b/include/cxl/cxl.h
> @@ -234,4 +234,6 @@ int cxl_map_component_regs(const struct cxl_register_map *map,
> void cxl_set_capacity(struct cxl_dev_state *cxlds, u64 capacity);
> struct cxl_memdev *devm_cxl_add_memdev(struct device *host,
> struct cxl_dev_state *cxlmds);
> +struct cxl_port *cxl_acquire_endpoint(struct cxl_memdev *cxlmd);
> +void cxl_release_endpoint(struct cxl_memdev *cxlmd, struct cxl_port *endpoint);
> #endif /* __CXL_CXL_H__ */
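The deferral signalling described above can be illustrated with a userspace sketch of how a caller might tell -EPROBE_DEFER apart from a hard failure. ERR_PTR/IS_ERR/PTR_ERR are simplified copies of the kernel helpers, EPROBE_DEFER is hard-coded to the kernel's internal value (517), the device locking is omitted, and model_accel_probe() is hypothetical; per the discussion elsewhere in this thread, sfc itself does not currently defer:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace copies of the kernel's error-pointer helpers. */
#define MAX_ERRNO	4095
#define EPROBE_DEFER	517	/* kernel-internal errno, absent from userspace errno.h */

static inline void *ERR_PTR(long err) { return (void *)(intptr_t)err; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-ins for struct cxl_port and struct cxl_memdev. */
struct model_port { bool driver_bound; };
struct model_memdev { struct model_port *endpoint; };

/* Decision logic of cxl_acquire_endpoint() as quoted, minus the locks. */
struct model_port *model_acquire_endpoint(struct model_memdev *cxlmd)
{
	struct model_port *endpoint;

	assert(cxlmd != NULL);
	endpoint = cxlmd->endpoint;
	if (!endpoint)
		return ERR_PTR(-ENXIO);
	if (IS_ERR(endpoint))
		return endpoint;	/* e.g. -EPROBE_DEFER left by cxl_mem_probe() */
	if (!endpoint->driver_bound)
		return ERR_PTR(-ENXIO);
	return endpoint;
}

/* Hypothetical accelerator probe: defer, or carry on without CXL. */
int model_accel_probe(struct model_memdev *cxlmd, bool *cxl_enabled)
{
	struct model_port *ep = model_acquire_endpoint(cxlmd);

	*cxl_enabled = false;
	if (IS_ERR(ep)) {
		if (PTR_ERR(ep) == -EPROBE_DEFER)
			return -EPROBE_DEFER;	/* driver core retries later */
		return 0;			/* plain NIC, no CXL */
	}
	*cxl_enabled = true;	/* real code would release the locks when done */
	return 0;
}
```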
* Re: [PATCH v17 14/22] sfc: get endpoint decoder
2025-06-24 14:13 ` [PATCH v17 14/22] sfc: get endpoint decoder alejandro.lucero-palau
2025-06-27 9:11 ` Jonathan Cameron
@ 2025-07-16 23:48 ` Dave Jiang
1 sibling, 0 replies; 112+ messages in thread
From: Dave Jiang @ 2025-07-16 23:48 UTC (permalink / raw)
To: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet
Cc: Alejandro Lucero, Martin Habets, Edward Cree, Jonathan Cameron
On 6/24/25 7:13 AM, alejandro.lucero-palau@amd.com wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Use cxl api for getting DPA (Device Physical Address) to use through an
> endpoint decoder.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
> Acked-by: Edward Cree <ecree.xilinx@gmail.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
12/22 and 14/22 have the exact same subject line. Maybe modify one or both to avoid confusion.
DJ
> ---
> drivers/net/ethernet/sfc/efx_cxl.c | 12 +++++++++++-
> 1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
> index c0adfd99cc78..ffbf0e706330 100644
> --- a/drivers/net/ethernet/sfc/efx_cxl.c
> +++ b/drivers/net/ethernet/sfc/efx_cxl.c
> @@ -108,6 +108,14 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
> goto put_root_decoder;
> }
>
> + cxl->cxled = cxl_request_dpa(cxl->cxlmd, CXL_PARTMODE_RAM,
> + EFX_CTPIO_BUFFER_SIZE);
> + if (IS_ERR(cxl->cxled)) {
> + pci_err(pci_dev, "CXL accel request DPA failed");
> + rc = PTR_ERR(cxl->cxled);
> + goto put_root_decoder;
> + }
> +
> probe_data->cxl = cxl;
>
> goto endpoint_release;
> @@ -121,8 +129,10 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
>
> void efx_cxl_exit(struct efx_probe_data *probe_data)
> {
> - if (probe_data->cxl)
> + if (probe_data->cxl) {
> + cxl_dpa_free(probe_data->cxl->cxled);
> cxl_put_root_decoder(probe_data->cxl->cxlrd);
> + }
> }
>
> MODULE_IMPORT_NS("CXL");
* Re: [PATCH v17 00/22] Type2 device basic support
2025-06-24 14:13 [PATCH v17 00/22] Type2 device basic support alejandro.lucero-palau
` (21 preceding siblings ...)
2025-06-24 14:13 ` [PATCH v17 22/22] sfc: support pio mapping based on cxl alejandro.lucero-palau
@ 2025-07-25 20:51 ` dan.j.williams
2025-07-25 21:11 ` dan.j.williams
2025-08-27 16:48 ` PJ Waskiewicz
23 siblings, 1 reply; 112+ messages in thread
From: dan.j.williams @ 2025-07-25 20:51 UTC (permalink / raw)
To: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero
alejandro.lucero-palau@ wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> v17 changes: (Dan Williams review)
> - use devm for cxl_dev_state allocation
> - using current cxl struct for checking capability registers found by
> the driver.
> - simplify dpa initialization without a mailbox not supporting pmem
> - add cxl_acquire_endpoint for protection during initialization
> - add callback/action to cxl_create_region for a driver notified about cxl
> core kernel modules removal.
> - add sfc function to disable CXL-based PIO buffers if such a callback
> is invoked.
> - Always manage a Type2 created region as private not allowing DAX.
[..]
> base-commit: 0ff41df1cb268fc69e703a08a57ee14ae967d0ca
That's v6.15. At time of writing v6.16-rc3 was out. At a minimum I would
expect new functionality targeting the next kernel to be based on -rc2
of the previous kernel.
It might be ok if the conflicts are low, but going forward do move your
baseline to at least -rc2 if not later.
This highlights that CXL needs a
Documentation/process/maintainer-handbooks.rst entry to detail
expectations like this.
* Re: [PATCH v17 00/22] Type2 device basic support
2025-07-25 20:51 ` [PATCH v17 00/22] Type2 device basic support dan.j.williams
@ 2025-07-25 21:11 ` dan.j.williams
0 siblings, 0 replies; 112+ messages in thread
From: dan.j.williams @ 2025-07-25 21:11 UTC (permalink / raw)
To: dan.j.williams, alejandro.lucero-palau, linux-cxl, netdev,
dan.j.williams, edward.cree, davem, kuba, pabeni, edumazet,
dave.jiang
Cc: Alejandro Lucero
dan.j.williams@ wrote:
> alejandro.lucero-palau@ wrote:
> > From: Alejandro Lucero <alucerop@amd.com>
> >
> > v17 changes: (Dan Williams review)
> > - use devm for cxl_dev_state allocation
> > - using current cxl struct for checking capability registers found by
> > the driver.
> > - simplify dpa initialization without a mailbox not supporting pmem
> > - add cxl_acquire_endpoint for protection during initialization
> > - add callback/action to cxl_create_region for a driver notified about cxl
> > core kernel modules removal.
> > - add sfc function to disable CXL-based PIO buffers if such a callback
> > is invoked.
> > - Always manage a Type2 created region as private not allowing DAX.
> [..]
> > base-commit: 0ff41df1cb268fc69e703a08a57ee14ae967d0ca
>
> That's v6.15. At time of writing v6.16-rc3 was out. At a minimum I would
> expect new functionality targeting the next kernel to be based on -rc2
> of the previous kernel.
>
> It might be ok if the conflicts are low, but going forward do move your
> baseline to at least -rc2 if not later.
>
> This highlights that CXL needs a
> Documentation/process/maintainer-handbooks.rst entry to detail
> expectations like this.
...btw:
$ git merge v6.16-rc7
Auto-merging drivers/cxl/core/core.h
Auto-merging drivers/cxl/core/hdm.c
Auto-merging drivers/cxl/core/mbox.c
Auto-merging drivers/cxl/core/memdev.c
Auto-merging drivers/cxl/core/pci.c
Auto-merging drivers/cxl/core/port.c
Auto-merging drivers/cxl/core/region.c
CONFLICT (content): Merge conflict in drivers/cxl/core/region.c
Auto-merging drivers/cxl/cxl.h
CONFLICT (content): Merge conflict in drivers/cxl/cxl.h
Auto-merging drivers/cxl/cxlmem.h
Auto-merging drivers/cxl/mem.c
Auto-merging drivers/cxl/port.c
Auto-merging tools/testing/cxl/Kbuild
Auto-merging tools/testing/cxl/test/mem.c
Auto-merging tools/testing/cxl/test/mock.c
I am ok with conflicts with cxl/next because that is a moving / rebasing
target, but conflicts with mainline are the submitter's problem.
* Re: [PATCH v17 01/22] cxl: Add type2 device basic support
2025-06-24 14:13 ` [PATCH v17 01/22] cxl: Add type2 " alejandro.lucero-palau
2025-06-25 14:06 ` Jonathan Cameron
@ 2025-07-25 21:46 ` dan.j.williams
2025-08-05 10:45 ` Alejandro Lucero Palau
1 sibling, 1 reply; 112+ messages in thread
From: dan.j.williams @ 2025-07-25 21:46 UTC (permalink / raw)
To: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Jonathan Cameron, Alison Schofield
alejandro.lucero-palau@ wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Differentiate CXL memory expanders (type 3) from CXL device accelerators
> (type 2) with a new function for initializing cxl_dev_state and a macro
> for helping accel drivers to embed cxl_dev_state inside a private
> struct.
>
> Move structs to include/cxl as the size of the accel driver private
> struct embedding cxl_dev_state needs to know the size of this struct.
>
> Use same new initialization with the type3 pci driver.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Reviewed-by: Alison Schofield <alison.schofield@intel.com>
> ---
> drivers/cxl/core/mbox.c | 12 +-
> drivers/cxl/core/memdev.c | 32 +++++
> drivers/cxl/core/pci.c | 1 +
> drivers/cxl/core/regs.c | 1 +
> drivers/cxl/cxl.h | 97 +--------------
> drivers/cxl/cxlmem.h | 85 +------------
> drivers/cxl/cxlpci.h | 21 ----
> drivers/cxl/pci.c | 17 +--
> include/cxl/cxl.h | 226 +++++++++++++++++++++++++++++++++++
> include/cxl/pci.h | 23 ++++
> tools/testing/cxl/test/mem.c | 3 +-
> 11 files changed, 303 insertions(+), 215 deletions(-)
> create mode 100644 include/cxl/cxl.h
> create mode 100644 include/cxl/pci.h
Thanks for the updates.
Now, I notice this drops some objects out of the existing documentation
given some kdoc moves out of drivers/cxl/. The patch below fixes that
up, but then uncovers some other Documentation build problems:
$ make -j14 htmldocs SPHINXDIRS="driver-api/cxl/"
make[3]: Nothing to be done for 'html'.
Using alabaster theme
source directory: driver-api/cxl
./include/cxl/cxl.h:24: warning: Enum value 'CXL_DEVTYPE_DEVMEM' not described in enum 'cxl_devtype'
./include/cxl/cxl.h:24: warning: Enum value 'CXL_DEVTYPE_CLASSMEM' not described in enum 'cxl_devtype'
./include/cxl/cxl.h:225: warning: expecting prototype for cxl_dev_state_create(). Prototype was for devm_cxl_dev_state_create() instead
Note: this file was renamed to theory-of-operation.rst in v6.16;
git-apply can usually figure that out.
cxlpci.h is not currently referenced in the documentation build since it
has no kdoc, so there is no need for a new include/cxl/pci.h entry, but it is
something to look out for if a later patch adds some kdoc.
-- 8< --
diff --git a/Documentation/driver-api/cxl/memory-devices.rst b/Documentation/driver-api/cxl/memory-devices.rst
index d732c42526df..ddaee57b80d0 100644
--- a/Documentation/driver-api/cxl/memory-devices.rst
+++ b/Documentation/driver-api/cxl/memory-devices.rst
@@ -344,6 +344,9 @@ CXL Core
.. kernel-doc:: drivers/cxl/cxl.h
:doc: cxl objects
+.. kernel-doc:: include/cxl/cxl.h
+ :internal:
+
.. kernel-doc:: drivers/cxl/cxl.h
:internal:
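The two enum warnings usually mean kernel-doc did not find member descriptions in the `@NAME: description` form it expects (for example, if they are written as `@NAME - description`). A sketch of the expected shape follows; the descriptions are paraphrased, not copied from the patch. The third warning would presumably go away once the comment's first line names the macro it documents, i.e. `devm_cxl_dev_state_create - ...`:

```c
#include <assert.h>

/**
 * enum cxl_devtype - delineate type-2 from generic type-3 devices
 * @CXL_DEVTYPE_DEVMEM: vendor-specific CXL Type-2 device; no requirement
 *			to implement a mailbox
 * @CXL_DEVTYPE_CLASSMEM: CXL Type-3 device following the memory device
 *			  class definition
 */
enum cxl_devtype {
	CXL_DEVTYPE_DEVMEM,
	CXL_DEVTYPE_CLASSMEM,
};

/* Compile-time sanity check that the two types stay distinct. */
static_assert(CXL_DEVTYPE_DEVMEM != CXL_DEVTYPE_CLASSMEM,
	      "device types must be distinct");
```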
* Re: [PATCH v17 02/22] sfc: add cxl support
2025-06-24 14:13 ` [PATCH v17 02/22] sfc: add cxl support alejandro.lucero-palau
2025-06-25 16:37 ` Jonathan Cameron
@ 2025-07-25 22:16 ` dan.j.williams
2025-08-06 8:37 ` Alejandro Lucero Palau
1 sibling, 1 reply; 112+ messages in thread
From: dan.j.williams @ 2025-07-25 22:16 UTC (permalink / raw)
To: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Jonathan Cameron, Edward Cree, Alison Schofield
alejandro.lucero-palau@ wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Add CXL initialization based on new CXL API for accel drivers and make
> it dependent on kernel CXL configuration.
Looks ok, I do feel it is missing Documentation for how someone
determines that this support is even turned on. For example, if
git-bisect lands on this patch the end user will see SFC_CXL enabled in
their kernel and:
pci_dbg(pci_dev, "CXL_DVSEC_PCIE_DEVICE capability found\n");
...in dmesg, but the CXL functionality is disabled.
Not a showstopper, so:
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
...but when you respin patch1 do consider adding a blurb somewhere about
how to detect that CXL is in effect so there is a chance for end users
to help triage CXL operation problems.
[..]
> diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
> new file mode 100644
> index 000000000000..f1db7284dee8
> --- /dev/null
> +++ b/drivers/net/ethernet/sfc/efx_cxl.c
> @@ -0,0 +1,55 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/****************************************************************************
> + *
> + * Driver for AMD network controllers and boards
> + * Copyright (C) 2025, Advanced Micro Devices, Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License version 2 as published
> + * by the Free Software Foundation, incorporated herein by reference.
Per, Documentation/process/license-rules.rst SPDX supersedes the need to
include this boilerplate paragraph, right?
> + */
> +
> +#include <cxl/pci.h>
> +#include <linux/pci.h>
> +
> +#include "net_driver.h"
> +#include "efx_cxl.h"
> +
> +#define EFX_CTPIO_BUFFER_SIZE SZ_256M
> +
> +int efx_cxl_init(struct efx_probe_data *probe_data)
> +{
> + struct efx_nic *efx = &probe_data->efx;
> + struct pci_dev *pci_dev = efx->pci_dev;
> + struct efx_cxl *cxl;
> + u16 dvsec;
> +
> + probe_data->cxl_pio_initialised = false;
> +
> + dvsec = pci_find_dvsec_capability(pci_dev, PCI_VENDOR_ID_CXL,
> + CXL_DVSEC_PCIE_DEVICE);
> + if (!dvsec)
> + return 0;
> +
> + pci_dbg(pci_dev, "CXL_DVSEC_PCIE_DEVICE capability found\n");
> +
> + /* Create a cxl_dev_state embedded in the cxl struct using cxl core api
> + * specifying no mbox available.
> + */
> + cxl = devm_cxl_dev_state_create(&pci_dev->dev, CXL_DEVTYPE_DEVMEM,
> + pci_dev->dev.id, dvsec, struct efx_cxl,
> + cxlds, false);
> +
> + if (!cxl)
> + return -ENOMEM;
> +
> + probe_data->cxl = cxl;
Just note that this defeats the purpose of the
devm_cxl_dev_state_create() scheme which is to allow a container_of()
association of cxl_dev_state with something like a driver's @probe_data.
In this case @probe_data is allocated before @cxl and the devm
allocation of @cxl means that it is freed *after* @probe_data, i.e. not
strictly reverse allocation order.
It is fine as long as nothing in a devm release path tries to walk back
to @probe_data from @cxl, but just something to be aware of.
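A minimal userspace sketch of the container_of() association that devm_cxl_dev_state_create() is meant to enable. model_cxl_dev_state, model_efx_cxl and to_efx_cxl() are hypothetical stand-ins, and the container_of here is a simplified copy without the kernel's type checking. It only reaches the struct that embeds cxl_dev_state (the efx_cxl stand-in), not @probe_data, which merely holds a pointer to it and, as noted above, may already be gone when devres unwinds @cxl:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace container_of (the kernel's adds type checking). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for struct cxl_dev_state and struct efx_cxl. */
struct model_cxl_dev_state { int serial; };

struct model_efx_cxl {
	int pio_bufs;
	struct model_cxl_dev_state cxlds;	/* embedded, as the macro requires */
};

/* A callback handed only the core object can still find its driver struct. */
struct model_efx_cxl *to_efx_cxl(struct model_cxl_dev_state *cxlds)
{
	assert(cxlds != NULL);
	return container_of(cxlds, struct model_efx_cxl, cxlds);
}
```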
* Re: [PATCH v17 03/22] cxl: Move pci generic code
2025-06-24 14:13 ` [PATCH v17 03/22] cxl: Move pci generic code alejandro.lucero-palau
@ 2025-07-25 22:41 ` dan.j.williams
2025-08-06 8:46 ` Alejandro Lucero Palau
0 siblings, 1 reply; 112+ messages in thread
From: dan.j.williams @ 2025-07-25 22:41 UTC (permalink / raw)
To: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Ben Cheatham, Fan Ni, Jonathan Cameron,
Alison Schofield
alejandro.lucero-palau@ wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Inside cxl/core/pci.c there are helpers for CXL PCIe initialization
> meanwhile cxl/pci.c implements the functionality for a Type3 device
> initialization.
>
> Move helper functions from cxl/pci.c to cxl/core/pci.c in order to be
> exported and shared with CXL Type2 device initialization.
>
> Fix cxl mock tests affected by the code move.
Next time it would be nice to have a bit more color commentary on "fixes".
In this case the code was just deleted to address a compilation problem,
but that deletion is ok because this function stopped being called back
in commit 733b57f262b0 ("cxl/pci: Early setup RCH dport component
registers from RCRB").
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
> Reviewed-by: Fan Ni <fan.ni@samsung.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Alison Schofield <alison.schofield@intel.com>
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by still stands, but I question why is_cxl_restricted() needs
to be promoted to a global scope function. Are there going to be RCD
type-2 devices that will have Linux drivers?
* Re: [PATCH v17 04/22] cxl: allow Type2 drivers to map cxl component regs
2025-06-24 14:13 ` [PATCH v17 04/22] cxl: allow Type2 drivers to map cxl component regs alejandro.lucero-palau
2025-06-27 8:27 ` Jonathan Cameron
@ 2025-07-25 22:55 ` dan.j.williams
2025-07-28 16:23 ` Dave Jiang
2025-08-06 9:41 ` Alejandro Lucero Palau
1 sibling, 2 replies; 112+ messages in thread
From: dan.j.williams @ 2025-07-25 22:55 UTC (permalink / raw)
To: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero
alejandro.lucero-palau@ wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Export cxl core functions for a Type2 driver being able to discover and
> map the device component registers.
I would squash this with patch5, up to Dave.
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> ---
> drivers/cxl/core/port.c | 1 +
> drivers/cxl/cxl.h | 7 -------
> drivers/cxl/cxlpci.h | 12 ------------
> include/cxl/cxl.h | 8 ++++++++
> include/cxl/pci.h | 15 +++++++++++++++
> 5 files changed, 24 insertions(+), 19 deletions(-)
>
[..]
> diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
> index 9c1a82c8af3d..0810c18d7aef 100644
> --- a/include/cxl/cxl.h
> +++ b/include/cxl/cxl.h
> @@ -70,6 +70,10 @@ struct cxl_regs {
> );
> };
>
> +#define CXL_CM_CAP_CAP_ID_RAS 0x2
> +#define CXL_CM_CAP_CAP_ID_HDM 0x5
> +#define CXL_CM_CAP_CAP_HDM_VERSION 1
> +
> struct cxl_reg_map {
> bool valid;
> int id;
> @@ -223,4 +227,8 @@ struct cxl_dev_state *_devm_cxl_dev_state_create(struct device *dev,
> (drv_struct *)_devm_cxl_dev_state_create(parent, type, serial, dvsec, \
> sizeof(drv_struct), mbox); \
> })
> +
> +int cxl_map_component_regs(const struct cxl_register_map *map,
> + struct cxl_component_regs *regs,
> + unsigned long map_mask);
With this function now becoming public it really wants some kdoc, and a
rename to add devm_ so that readers are not surprised by hidden devres
behavior behind this API.
It was ok previously because it was private to drivers/cxl/ where
everything is devres managed.
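To illustrate why the devm_ prefix and kdoc matter once the function is public, here is a minimal userspace sketch of devres-style semantics. Every name in it (toy_device, toy_devm_add_action, toy_devm_map_regs, toy_device_release) is hypothetical and only mimics the kernel's behavior, where actions registered against a device run automatically, newest first, when the device goes away. It is not the CXL core code.

```c
#include <stdlib.h>

/* Hypothetical toy model of devres: a per-device stack of release actions. */
typedef void (*toy_release_fn)(void *data);

struct toy_action {
	toy_release_fn fn;
	void *data;
	struct toy_action *next;
};

struct toy_device {
	struct toy_action *actions; /* most recently added first */
};

/* Register @fn(@data) to run automatically when @dev is released. */
static int toy_devm_add_action(struct toy_device *dev, toy_release_fn fn,
			       void *data)
{
	struct toy_action *a = malloc(sizeof(*a));

	if (!a)
		return -1;
	a->fn = fn;
	a->data = data;
	a->next = dev->actions;
	dev->actions = a;
	return 0;
}

/* Run and free all registered actions in LIFO order, like devres. */
static void toy_device_release(struct toy_device *dev)
{
	while (dev->actions) {
		struct toy_action *a = dev->actions;

		dev->actions = a->next;
		a->fn(a->data);
		free(a);
	}
}

struct toy_regs {
	int mapped;
};

static void toy_unmap(void *data)
{
	((struct toy_regs *)data)->mapped = 0;
}

/**
 * toy_devm_map_regs - map registers with automatic cleanup
 * @dev: device whose lifetime bounds the mapping
 * @regs: register block to map
 *
 * The caller must NOT unmap @regs; that happens when @dev is released.
 *
 * Return: 0 on success, -1 on failure.
 */
static int toy_devm_map_regs(struct toy_device *dev, struct toy_regs *regs)
{
	regs->mapped = 1;
	if (toy_devm_add_action(dev, toy_unmap, regs)) {
		toy_unmap(regs);
		return -1;
	}
	return 0;
}
```

A caller who does not know about the hidden release action might also unmap explicitly and release the mapping twice; the devm_ name and the kdoc "Return:"/ownership notes make that contract visible at the call site.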
* Re: [PATCH v17 05/22] sfc: setup cxl component regs and set media ready
2025-06-24 14:13 ` [PATCH v17 05/22] sfc: setup cxl component regs and set media ready alejandro.lucero-palau
2025-06-27 8:39 ` Jonathan Cameron
2025-06-27 8:45 ` Jonathan Cameron
@ 2025-07-25 23:04 ` dan.j.williams
2 siblings, 0 replies; 112+ messages in thread
From: dan.j.williams @ 2025-07-25 23:04 UTC
To: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero
alejandro.lucero-palau@ wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Use cxl code for registers discovery and mapping regarding cxl component
> regs and validate registers found are as expected.
>
> Set media ready explicitly as there is no means for doing so without
> a mailbox, and without the related cxl register, not mandatory for type2.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Looks good, thanks for the changes here to move all the validation to
sfc.
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
* Re: [PATCH v17 06/22] cxl: Support dpa initialization without a mailbox
2025-06-24 14:13 ` [PATCH v17 06/22] cxl: Support dpa initialization without a mailbox alejandro.lucero-palau
2025-06-27 8:42 ` Jonathan Cameron
2025-06-27 8:43 ` Jonathan Cameron
@ 2025-07-26 0:54 ` dan.j.williams
2 siblings, 0 replies; 112+ messages in thread
From: dan.j.williams @ 2025-07-26 0:54 UTC
To: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero
alejandro.lucero-palau@ wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Type3 relies on mailbox CXL_MBOX_OP_IDENTIFY command for initializing
> memdev state params which end up being used for DPA initialization.
>
> Allow a Type2 driver to initialize DPA simply by giving the size of its
> volatile hardware partition.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> º
What is that strange character on the last line?
> ---
> drivers/cxl/core/mbox.c | 17 +++++++++++++++++
> include/cxl/cxl.h | 1 +
> 2 files changed, 18 insertions(+)
I would squash this with the first user.
* Re: [PATCH v17 07/22] sfc: initialize dpa
2025-06-24 14:13 ` [PATCH v17 07/22] sfc: initialize dpa alejandro.lucero-palau
@ 2025-07-26 0:55 ` dan.j.williams
2025-08-08 16:59 ` Alejandro Lucero Palau
0 siblings, 1 reply; 112+ messages in thread
From: dan.j.williams @ 2025-07-26 0:55 UTC
To: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero
alejandro.lucero-palau@ wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Use hardcoded values for initializing dpa as there is no mbox available.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> ---
> drivers/net/ethernet/sfc/efx_cxl.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
> index ea02eb82b73c..5d68ee4e818d 100644
> --- a/drivers/net/ethernet/sfc/efx_cxl.c
> +++ b/drivers/net/ethernet/sfc/efx_cxl.c
> @@ -77,6 +77,8 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
> */
> cxl->cxlds.media_ready = true;
>
> + cxl_set_capacity(&cxl->cxlds, EFX_CTPIO_BUFFER_SIZE);
> +
Yes, definitely squash this with the last patch and you can add:
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
* Re: [PATCH v17 08/22] cxl: Prepare memdev creation for type2
2025-06-24 14:13 ` [PATCH v17 08/22] cxl: Prepare memdev creation for type2 alejandro.lucero-palau
@ 2025-07-26 1:05 ` dan.j.williams
2025-08-08 17:01 ` Alejandro Lucero Palau
0 siblings, 1 reply; 112+ messages in thread
From: dan.j.williams @ 2025-07-26 1:05 UTC
To: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Ben Cheatham, Jonathan Cameron,
Alison Schofield
alejandro.lucero-palau@ wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Current cxl core is relying on a CXL_DEVTYPE_CLASSMEM type device when
> creating a memdev leading to problems when obtaining cxl_memdev_state
> references from a CXL_DEVTYPE_DEVMEM type.
>
> Modify check for obtaining cxl_memdev_state adding CXL_DEVTYPE_DEVMEM
> support.
>
> Make devm_cxl_add_memdev accessible from an accel driver.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Reviewed-by: Alison Schofield <alison.schofield@intel.com>
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
> ---
> drivers/cxl/core/memdev.c | 15 +++++++++++++--
> drivers/cxl/cxlmem.h | 2 --
> drivers/cxl/mem.c | 25 +++++++++++++++++++------
> include/cxl/cxl.h | 2 ++
> 4 files changed, 34 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index c73582d24dd7..f43d2aa2928e 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c
> @@ -7,6 +7,7 @@
> #include <linux/slab.h>
> #include <linux/idr.h>
> #include <linux/pci.h>
> +#include <cxl/cxl.h>
> #include <cxlmem.h>
> #include "trace.h"
> #include "core.h"
> @@ -562,9 +563,16 @@ static const struct device_type cxl_memdev_type = {
> .groups = cxl_memdev_attribute_groups,
> };
>
> +static const struct device_type cxl_accel_memdev_type = {
> + .name = "cxl_accel_memdev",
> + .release = cxl_memdev_release,
> + .devnode = cxl_memdev_devnode,
> +};
> +
> bool is_cxl_memdev(const struct device *dev)
> {
> - return dev->type == &cxl_memdev_type;
> + return (dev->type == &cxl_memdev_type ||
> + dev->type == &cxl_accel_memdev_type);
> }
> EXPORT_SYMBOL_NS_GPL(is_cxl_memdev, "CXL");
>
> @@ -689,7 +697,10 @@ static struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds,
> dev->parent = cxlds->dev;
> dev->bus = &cxl_bus_type;
> dev->devt = MKDEV(cxl_mem_major, cxlmd->id);
> - dev->type = &cxl_memdev_type;
> + if (cxlds->type == CXL_DEVTYPE_DEVMEM)
> + dev->type = &cxl_accel_memdev_type;
> + else
> + dev->type = &cxl_memdev_type;
> device_set_pm_not_required(dev);
> INIT_WORK(&cxlmd->detach_work, detach_memdev);
>
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 9cc4337cacfb..7be51f70902a 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -88,8 +88,6 @@ static inline bool is_cxl_endpoint(struct cxl_port *port)
> return is_cxl_memdev(port->uport_dev);
> }
>
> -struct cxl_memdev *devm_cxl_add_memdev(struct device *host,
> - struct cxl_dev_state *cxlds);
> int devm_cxl_sanitize_setup_notifier(struct device *host,
> struct cxl_memdev *cxlmd);
> struct cxl_memdev_state;
> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> index 9675243bd05b..7f39790d9d98 100644
> --- a/drivers/cxl/mem.c
> +++ b/drivers/cxl/mem.c
> @@ -130,12 +130,18 @@ static int cxl_mem_probe(struct device *dev)
> dentry = cxl_debugfs_create_dir(dev_name(dev));
> debugfs_create_devm_seqfile(dev, "dpamem", dentry, cxl_mem_dpa_show);
>
> - if (test_bit(CXL_POISON_ENABLED_INJECT, mds->poison.enabled_cmds))
> - debugfs_create_file("inject_poison", 0200, dentry, cxlmd,
> - &cxl_poison_inject_fops);
> - if (test_bit(CXL_POISON_ENABLED_CLEAR, mds->poison.enabled_cmds))
> - debugfs_create_file("clear_poison", 0200, dentry, cxlmd,
> - &cxl_poison_clear_fops);
> + /*
> + * Avoid poison debugfs files for Type2 devices as they rely on
> + * cxl_memdev_state.
> + */
I know this already has my Reviewed-by, but this comment is going to be
annoying long term. The CXL specification has already dropped "Type2" as
a name and Linux has already called this DEVMEM, and the comment belongs
on a helper.
Just call a new cxl_memdev_poison_enable() helper unconditionally, put
the mds NULL check inside of it and comment on that helper:
/* For CLASSMEM memory expanders enable poison injection */
cxl_memdev_poison_enable()
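A userspace sketch of the suggested helper shape follows. The struct, the flag bits, and the create-file callback are hypothetical stand-ins for cxl_memdev_state, the poison enabled_cmds bitmap, and debugfs_create_file(); only the structure is the point: the NULL check lives inside the helper, so probe calls it unconditionally.

```c
#include <stddef.h>

/* Hypothetical stand-ins for bits in mds->poison.enabled_cmds. */
#define TOY_POISON_ENABLED_INJECT (1UL << 0)
#define TOY_POISON_ENABLED_CLEAR  (1UL << 1)

/* Hypothetical stand-in for struct cxl_memdev_state. */
struct toy_memdev_state {
	unsigned long poison_enabled_cmds;
};

/* Stand-in for debugfs_create_file(): called once per file created. */
typedef void (*toy_create_file_fn)(const char *name, void *ctx);

/* Example callback that just counts created files. */
static void toy_count_file(const char *name, void *ctx)
{
	(void)name;
	(*(int *)ctx)++;
}

/*
 * For CLASSMEM memory expanders enable poison injection.
 * DEVMEM devices have no memdev state (@mds == NULL): create nothing.
 * Returns the number of files created.
 */
static int toy_memdev_poison_enable(const struct toy_memdev_state *mds,
				    toy_create_file_fn create, void *ctx)
{
	int n = 0;

	if (!mds)
		return 0;
	if (mds->poison_enabled_cmds & TOY_POISON_ENABLED_INJECT) {
		create("inject_poison", ctx);
		n++;
	}
	if (mds->poison_enabled_cmds & TOY_POISON_ENABLED_CLEAR) {
		create("clear_poison", ctx);
		n++;
	}
	return n;
}
```

With this shape, the probe path needs no "if (mds)" block or Type2-specific comment; the single call toy_memdev_poison_enable(mds, ...) is safe for both device types.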
> + if (mds) {
> + if (test_bit(CXL_POISON_ENABLED_INJECT, mds->poison.enabled_cmds))
> + debugfs_create_file("inject_poison", 0200, dentry, cxlmd,
> + &cxl_poison_inject_fops);
> + if (test_bit(CXL_POISON_ENABLED_CLEAR, mds->poison.enabled_cmds))
> + debugfs_create_file("clear_poison", 0200, dentry, cxlmd,
> + &cxl_poison_clear_fops);
> + }
>
> rc = devm_add_action_or_reset(dev, remove_debugfs, dentry);
> if (rc)
> @@ -219,6 +225,13 @@ static umode_t cxl_mem_visible(struct kobject *kobj, struct attribute *a, int n)
> struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
>
> + /*
> + * Avoid poison sysfs files for Type2 devices as they rely on
> + * cxl_memdev_state.
> + */
> + if (!mds)
> + return 0;
> +
> if (a == &dev_attr_trigger_poison_list.attr)
> if (!test_bit(CXL_POISON_ENABLED_LIST,
> mds->poison.enabled_cmds))
Same here: do not sprinkle an "if (!mds)" check; add a
cxl_poison_attr_visible() helper and call it unconditionally in the "if
(a == &dev_attr_trigger_poison_list.attr)" case.
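A minimal userspace sketch of that visibility helper, assuming hypothetical stand-in types (toy_umode_t for umode_t, a toy list-enable bit for CXL_POISON_ENABLED_LIST, and an integer flag in place of comparing against the real attribute pointer): the helper hides the attribute (mode 0) unless memdev state exists and the list command is enabled.

```c
#include <stddef.h>

typedef unsigned short toy_umode_t; /* stand-in for umode_t */

#define TOY_POISON_ENABLED_LIST (1UL << 2) /* hypothetical bit */

/* Hypothetical stand-in for struct cxl_memdev_state. */
struct toy_memdev_state {
	unsigned long poison_enabled_cmds;
};

/* Return @mode only for CLASSMEM devices that support poison listing. */
static toy_umode_t toy_poison_attr_visible(const struct toy_memdev_state *mds,
					   toy_umode_t mode)
{
	if (!mds || !(mds->poison_enabled_cmds & TOY_POISON_ENABLED_LIST))
		return 0;
	return mode;
}

/* Sketch of the is_visible callback: the helper is called unconditionally. */
static toy_umode_t toy_mem_visible(int is_trigger_poison_list,
				   const struct toy_memdev_state *mds,
				   toy_umode_t mode)
{
	if (is_trigger_poison_list)
		return toy_poison_attr_visible(mds, mode);
	return mode;
}
```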
* Re: [PATCH v17 20/22] sfc: create cxl region
2025-06-24 14:13 ` [PATCH v17 20/22] sfc: create cxl region alejandro.lucero-palau
2025-06-27 9:38 ` Jonathan Cameron
@ 2025-07-28 16:20 ` dan.j.williams
2025-08-11 14:38 ` Alejandro Lucero Palau
1 sibling, 1 reply; 112+ messages in thread
From: dan.j.williams @ 2025-07-28 16:20 UTC
To: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero
alejandro.lucero-palau@ wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Use cxl api for creating a region using the endpoint decoder related to
> a DPA range.
>
> Add a callback for unwinding sfc cxl initialization when the endpoint port
> is destroyed by potential cxl_acpi or cxl_mem modules removal.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> ---
> drivers/net/ethernet/sfc/efx_cxl.c | 24 +++++++++++++++++++++++-
> 1 file changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
> index ffbf0e706330..7365effe974e 100644
> --- a/drivers/net/ethernet/sfc/efx_cxl.c
> +++ b/drivers/net/ethernet/sfc/efx_cxl.c
> @@ -18,6 +18,16 @@
>
> #define EFX_CTPIO_BUFFER_SIZE SZ_256M
>
> +static void efx_release_cxl_region(void *priv_cxl)
> +{
> + struct efx_probe_data *probe_data = priv_cxl;
> + struct efx_cxl *cxl = probe_data->cxl;
> +
> + iounmap(cxl->ctpio_cxl);
There is no synchronization here. If someone unbinds a cxl_port
while the driver is using @ctpio_cxl, it looks like it will cause a crash.
The loss of CXL connectivity after the driver has already committed to
it likely means that the whole driver needs to be shutdown, not just
this region cleanup.
* Re: [PATCH v17 04/22] cxl: allow Type2 drivers to map cxl component regs
2025-07-25 22:55 ` dan.j.williams
@ 2025-07-28 16:23 ` Dave Jiang
2025-08-06 9:43 ` Alejandro Lucero Palau
2025-08-06 9:41 ` Alejandro Lucero Palau
1 sibling, 1 reply; 112+ messages in thread
From: Dave Jiang @ 2025-07-28 16:23 UTC
To: dan.j.williams, alejandro.lucero-palau, linux-cxl, netdev,
edward.cree, davem, kuba, pabeni, edumazet
Cc: Alejandro Lucero
On 7/25/25 3:55 PM, dan.j.williams@intel.com wrote:
> alejandro.lucero-palau@ wrote:
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> Export cxl core functions for a Type2 driver being able to discover and
>> map the device component registers.
>
> I would squash this with patch5, up to Dave.
I would prefer that. In general I'd prefer to see the enabling code land in the same patch as its first user, so it is clear how it gets utilized. It makes reviewing a bit easier. Thanks!
DJ
>
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>> ---
>> drivers/cxl/core/port.c | 1 +
>> drivers/cxl/cxl.h | 7 -------
>> drivers/cxl/cxlpci.h | 12 ------------
>> include/cxl/cxl.h | 8 ++++++++
>> include/cxl/pci.h | 15 +++++++++++++++
>> 5 files changed, 24 insertions(+), 19 deletions(-)
>>
> [..]
>> diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
>> index 9c1a82c8af3d..0810c18d7aef 100644
>> --- a/include/cxl/cxl.h
>> +++ b/include/cxl/cxl.h
>> @@ -70,6 +70,10 @@ struct cxl_regs {
>> );
>> };
>>
>> +#define CXL_CM_CAP_CAP_ID_RAS 0x2
>> +#define CXL_CM_CAP_CAP_ID_HDM 0x5
>> +#define CXL_CM_CAP_CAP_HDM_VERSION 1
>> +
>> struct cxl_reg_map {
>> bool valid;
>> int id;
>> @@ -223,4 +227,8 @@ struct cxl_dev_state *_devm_cxl_dev_state_create(struct device *dev,
>> (drv_struct *)_devm_cxl_dev_state_create(parent, type, serial, dvsec, \
>> sizeof(drv_struct), mbox); \
>> })
>> +
>> +int cxl_map_component_regs(const struct cxl_register_map *map,
>> + struct cxl_component_regs *regs,
>> + unsigned long map_mask);
>
> With this function now becoming public it really wants some kdoc, and a
> rename to add devm_ so that readers are not surprised by hidden devres
> behavior behind this API.
>
> It was ok previously because it was private to drivers/cxl/ where
> everything is devres managed.
* Re: [PATCH v17 12/22] sfc: get endpoint decoder
2025-06-24 14:13 ` [PATCH v17 12/22] sfc: get endpoint decoder alejandro.lucero-palau
2025-06-27 9:10 ` Jonathan Cameron
@ 2025-07-28 16:30 ` dan.j.williams
2025-08-11 14:24 ` Alejandro Lucero Palau
1 sibling, 1 reply; 112+ messages in thread
From: dan.j.williams @ 2025-07-28 16:30 UTC
To: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Martin Habets, Edward Cree, Jonathan Cameron
alejandro.lucero-palau@ wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Use cxl api for getting DPA (Device Physical Address) to use through an
> endpoint decoder.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
> Acked-by: Edward Cree <ecree.xilinx@gmail.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
> drivers/net/ethernet/sfc/Kconfig | 1 +
> drivers/net/ethernet/sfc/efx_cxl.c | 32 +++++++++++++++++++++++++++++-
> 2 files changed, 32 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/sfc/Kconfig b/drivers/net/ethernet/sfc/Kconfig
> index 979f2801e2a8..e959d9b4f4ce 100644
> --- a/drivers/net/ethernet/sfc/Kconfig
> +++ b/drivers/net/ethernet/sfc/Kconfig
> @@ -69,6 +69,7 @@ config SFC_MCDI_LOGGING
> config SFC_CXL
> bool "Solarflare SFC9100-family CXL support"
> depends on SFC && CXL_BUS >= SFC
> + depends on CXL_REGION
> default SFC
> help
> This enables SFC CXL support if the kernel is configuring CXL for
> diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
> index e2d52ed49535..c0adfd99cc78 100644
> --- a/drivers/net/ethernet/sfc/efx_cxl.c
> +++ b/drivers/net/ethernet/sfc/efx_cxl.c
> @@ -22,6 +22,7 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
> {
> struct efx_nic *efx = &probe_data->efx;
> struct pci_dev *pci_dev = efx->pci_dev;
> + resource_size_t max_size;
> struct efx_cxl *cxl;
> u16 dvsec;
> int rc;
> @@ -86,13 +87,42 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
> return PTR_ERR(cxl->cxlmd);
> }
>
> + cxl->endpoint = cxl_acquire_endpoint(cxl->cxlmd);
> + if (IS_ERR(cxl->endpoint))
> + return PTR_ERR(cxl->endpoint);
Between Terry's set, the soft reserve set, and now this, it is becoming
clearer that the cxl_core needs a centralized solution to the questions
of:
- Does the platform have CXL and if so might a device ever successfully
complete cxl_mem_probe() for a cxl_memdev that it registered?
- When can a driver assume that no cxl_port topology is going to arrive?
I.e. when to give up on probe deferral.
It is also clear that a class of CXL accelerator drivers would be
served by a simple shared routine to autocreate a region.
I am going to take a stab at refactoring the current classmem case into
a scheme that resolves automatic region assembly at
devm_cxl_add_memdev() time in a way that can be reused to solve this
automatic region creation problem.
* Re: [PATCH v17 10/22] cx/memdev: Indicate probe deferral
2025-07-01 16:02 ` Alejandro Lucero Palau
@ 2025-07-28 17:45 ` dan.j.williams
2025-07-30 3:46 ` dan.j.williams
2025-08-09 11:24 ` Alejandro Lucero Palau
0 siblings, 2 replies; 112+ messages in thread
From: dan.j.williams @ 2025-07-28 17:45 UTC
To: Alejandro Lucero Palau, Dave Jiang, alejandro.lucero-palau,
linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet
Alejandro Lucero Palau wrote:
[..]
> > Can you please explain how the accelerator driver init path is
> > different in this instance that it requires cxl_mem driver to defer
> > probing? Currently with a type3, the cxl_acpi driver will setup the
> > CXL root, hostbridges and PCI root ports. At that point the memdev
> > driver will enumerate the rest of the ports and attempt to establish
> > the hierarchy. However if cxl_acpi is not done, the mem probe will
> > fail. But, the cxl_acpi probe will trigger a re-probe sequence at
> > the end when it is done. At that point, the mem probe should
> > discover all the necessary ports if things are correct. If the
> > accelerator init path is different, can we introduce some
> > documentation to explain the difference?
The biggest difference is that devm_cxl_add_memdev() is "hopeful" in the
cxl_pci case. I.e. cxl_pci_probe() does not fail if the memory device it
registered never passes cxl_mem_probe().
Accelerators are different. They want to know that the CXL side of the
house is up and running before enabling driver features that depend on
it. They also want to safely teardown driver functionality if CXL
capabilities disappear.
cxl_pci does not know or care if or when cxl_mem::probe() succeeds and
cxl_mem::remove() is invoked.
> > Also, it seems as long as port topology is not found, it will always
> > go to deferred probing. At what point do we conclude that things may
> > be missing/broken and we need to fail?
Right, at some point the driver needs to give up on CXL ever arriving.
> Hi Dave,
>
>
> This patch comes from Dan's original one, so I'm afraid I cannot
> explain it better myself.
>
>
> I added this patch again after Dan suggested that, with
> cxl_acquire_endpoint(), Type2 initialization can obtain some protection
> against cxl_mem or cxl_acpi being removed. Later, I also added
> protection against (or handling of) this in the sfc driver after
> initialization. So, at least to me, this is the main reason for this patch.
>
>
> Regarding the goal of the original patch, to be honest, I cannot see
> the cxl_acpi problem, although I'm not saying it does not exist. But it
> is quite confusing to me and, as I said in another patch regarding probe
> deferral, supporting that option would add complexity to the current sfc
> driver probing. If there is another way to avoid it, that is the
> approach I would prefer to follow.
The problem is how to handle the "CXL device in PCIe-only mode" problem.
Even with a CXL endpoint directly attached to a CXL host there is no
guarantee that the device trains the link in CXL mode. So in addition to
the software-dynamic problems of module loading and asynchronous driver
bind/unbind, there is this hardware-dynamic problem.
I am losing my nerve with the cxl_acquire_endpoint() approach. Now that
I see how this driver tried to use it and the questions it generated, it
pushes too much complexity to leaf drivers. In the end, I want to
(inspired by faux_device) get to the point where the caller can assume
that successful devm_cxl_add_memdev() means that CXL is operational and
any non-interleaved CXL regions have finished auto-assembly/creation.
To get there this needs Terry's patches that set pdev->is_cxl on all
ancestor devices in order to make a determination that the hardware-CXL
link is up before going to flush software CXL-link establishment.
> Adding documentation about all this would definitely help, even without
> the Type2 case.
I would ask that you help Terry get the protocol error handling series
in shape as part of the dependency here is to make sure that there is a
capable error model for CXL link events.
Meanwhile, I am going to rework devm_cxl_add_memdev() to make it report
when CXL port arrival is deferred, permanently failed, or successful.
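To make the three outcomes concrete, here is a sketch of how an accelerator's probe could act on such a tri-state report. Everything is an assumption: enum toy_cxl_attach and toy_accel_probe are hypothetical names, not the reworked devm_cxl_add_memdev() interface, and TOY_EPROBE_DEFER is a local constant mirroring the kernel-internal EPROBE_DEFER value (517). Deferral is propagated so probe is retried later, a permanent failure (e.g. the link trained in PCIe-only mode) degrades to PCIe-only operation, and only success enables CXL-dependent features.

```c
/* Hypothetical tri-state outcome of attaching a memdev to the CXL topology. */
enum toy_cxl_attach {
	TOY_CXL_ATTACH_OK,       /* ports enumerated, CXL operational */
	TOY_CXL_ATTACH_DEFERRED, /* topology may still arrive later */
	TOY_CXL_ATTACH_FAILED,   /* e.g. link trained in PCIe-only mode */
};

#define TOY_EPROBE_DEFER 517 /* mirrors the kernel-internal EPROBE_DEFER */

struct toy_accel {
	int cxl_enabled; /* CXL-backed features (e.g. PIO buffers) in use */
};

/*
 * Probe policy: retry on deferral, degrade to PCIe-only on permanent
 * failure, enable CXL-backed features only on success.
 */
static int toy_accel_probe(struct toy_accel *accel, enum toy_cxl_attach res)
{
	switch (res) {
	case TOY_CXL_ATTACH_OK:
		accel->cxl_enabled = 1;
		return 0;
	case TOY_CXL_ATTACH_DEFERRED:
		return -TOY_EPROBE_DEFER;
	case TOY_CXL_ATTACH_FAILED:
	default:
		accel->cxl_enabled = 0;
		return 0; /* keep the device working as a plain PCIe device */
	}
}
```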
* Re: [PATCH v17 10/22] cx/memdev: Indicate probe deferral
2025-07-28 17:45 ` dan.j.williams
@ 2025-07-30 3:46 ` dan.j.williams
2025-08-09 11:24 ` Alejandro Lucero Palau
1 sibling, 0 replies; 112+ messages in thread
From: dan.j.williams @ 2025-07-30 3:46 UTC
To: dan.j.williams, Alejandro Lucero Palau, Dave Jiang,
alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet
dan.j.williams@ wrote:
[..]
> Meanwhile, I am going to rework devm_cxl_add_memdev() to make it report
> when CXL port arrival is deferred, permanently failed, or successful.
Here is a branch with my work-in-progress thoughts on fixing some of
these module load ordering problems and obviating the need for
cxl_acquire_endpoint():
https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/log/?h=for-6.18/cxl-probe-order
Feel free to steal from that branch and take code upstream with my
Co-developed-by. The main missing piece is integration with Terry's
"pdev->is_cxl" enabling to know when it is worth waiting for CXL
scanning and when to fallback to PCIe only.
* Re: [PATCH v17 01/22] cxl: Add type2 device basic support
2025-07-25 21:46 ` dan.j.williams
@ 2025-08-05 10:45 ` Alejandro Lucero Palau
2025-08-05 15:14 ` Dave Jiang
0 siblings, 1 reply; 112+ messages in thread
From: Alejandro Lucero Palau @ 2025-08-05 10:45 UTC
To: dan.j.williams, alejandro.lucero-palau, linux-cxl, netdev,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Jonathan Cameron, Alison Schofield
On 7/25/25 22:46, dan.j.williams@intel.com wrote:
> alejandro.lucero-palau@ wrote:
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> Differentiate CXL memory expanders (type 3) from CXL device accelerators
>> (type 2) with a new function for initializing cxl_dev_state and a macro
>> for helping accel drivers to embed cxl_dev_state inside a private
>> struct.
>>
>> Move structs to include/cxl as the size of the accel driver private
>> struct embedding cxl_dev_state needs to know the size of this struct.
>>
>> Use same new initialization with the type3 pci driver.
>>
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
>> Reviewed-by: Alison Schofield <alison.schofield@intel.com>
>> ---
>> drivers/cxl/core/mbox.c | 12 +-
>> drivers/cxl/core/memdev.c | 32 +++++
>> drivers/cxl/core/pci.c | 1 +
>> drivers/cxl/core/regs.c | 1 +
>> drivers/cxl/cxl.h | 97 +--------------
>> drivers/cxl/cxlmem.h | 85 +------------
>> drivers/cxl/cxlpci.h | 21 ----
>> drivers/cxl/pci.c | 17 +--
>> include/cxl/cxl.h | 226 +++++++++++++++++++++++++++++++++++
>> include/cxl/pci.h | 23 ++++
>> tools/testing/cxl/test/mem.c | 3 +-
>> 11 files changed, 303 insertions(+), 215 deletions(-)
>> create mode 100644 include/cxl/cxl.h
>> create mode 100644 include/cxl/pci.h
> Thanks for the updates.
>
> Now, I notice this drops some objects out of the existing documentation
> given some kdoc moves out of drivers/cxl/. The patch below fixes that
> up, but then uncovers some other Documentation build problems:
>
> $ make -j14 htmldocs SPHINXDIRS="driver-api/cxl/"
> make[3]: Nothing to be done for 'html'.
> Using alabaster theme
> source directory: driver-api/cxl
> ./include/cxl/cxl.h:24: warning: Enum value 'CXL_DEVTYPE_DEVMEM' not described in enum 'cxl_devtype'
> ./include/cxl/cxl.h:24: warning: Enum value 'CXL_DEVTYPE_CLASSMEM' not described in enum 'cxl_devtype'
> ./include/cxl/cxl.h:225: warning: expecting prototype for cxl_dev_state_create(). Prototype was for devm_cxl_dev_state_create() instead
OK. I can fix those problems easily (bad punctuation). I cannot see
the one about the prototype, but maybe it is due to the base commit.
BTW, which base should I use for the next version and for rebasing on
Terry's patches?
Thanks
> Note, this file was renamed in v6.16 to theory-of-operation.rst,
> git-apply can usually figure that out.
>
> cxlpci.h is not currently referenced in the documentation build since it
> has no kdoc, so no need for a new include/cxl/pci.h entry, but
> something to look out for if a later patch adds some kdoc.
>
> -- 8< --
> diff --git a/Documentation/driver-api/cxl/memory-devices.rst b/Documentation/driver-api/cxl/memory-devices.rst
> index d732c42526df..ddaee57b80d0 100644
> --- a/Documentation/driver-api/cxl/memory-devices.rst
> +++ b/Documentation/driver-api/cxl/memory-devices.rst
> @@ -344,6 +344,9 @@ CXL Core
> .. kernel-doc:: drivers/cxl/cxl.h
> :doc: cxl objects
>
> +.. kernel-doc:: include/cxl/cxl.h
> + :internal:
> +
> .. kernel-doc:: drivers/cxl/cxl.h
> :internal:
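For reference, a sketch of enum kernel-doc that kernel-doc parses cleanly, assuming the "not described" warnings come from enumerator lines written as "@NAME - text", which kernel-doc does not recognize as member descriptions. The fix is the "@NAME:" form shown below. The description text paraphrases intent rather than quoting the patch. The third warning is a separate issue: the name in the kdoc header must match the documented symbol (devm_cxl_dev_state_create rather than cxl_dev_state_create).

```c
/**
 * enum cxl_devtype - delineate type-2 from a generic type-3 device
 * @CXL_DEVTYPE_DEVMEM: Vendor-specific CXL Type-2 device implementing HDM-D
 *	or HDM-DB; not required to implement a mailbox or other
 *	memory-device-standard manageability flows.
 * @CXL_DEVTYPE_CLASSMEM: Common class definition of a CXL Type-3 device with
 *	HDM-H and class-mandatory memory device registers.
 */
enum cxl_devtype {
	CXL_DEVTYPE_DEVMEM,
	CXL_DEVTYPE_CLASSMEM,
};
```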
* Re: [PATCH v17 01/22] cxl: Add type2 device basic support
2025-08-05 10:45 ` Alejandro Lucero Palau
@ 2025-08-05 15:14 ` Dave Jiang
0 siblings, 0 replies; 112+ messages in thread
From: Dave Jiang @ 2025-08-05 15:14 UTC
To: Alejandro Lucero Palau, dan.j.williams, alejandro.lucero-palau,
linux-cxl, netdev, edward.cree, davem, kuba, pabeni, edumazet
Cc: Jonathan Cameron, Alison Schofield
On 8/5/25 3:45 AM, Alejandro Lucero Palau wrote:
>
> On 7/25/25 22:46, dan.j.williams@intel.com wrote:
>> alejandro.lucero-palau@ wrote:
>>> From: Alejandro Lucero <alucerop@amd.com>
>>>
>>> Differentiate CXL memory expanders (type 3) from CXL device accelerators
>>> (type 2) with a new function for initializing cxl_dev_state and a macro
>>> for helping accel drivers to embed cxl_dev_state inside a private
>>> struct.
>>>
>>> Move structs to include/cxl as the size of the accel driver private
>>> struct embedding cxl_dev_state needs to know the size of this struct.
>>>
>>> Use same new initialization with the type3 pci driver.
>>>
>>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>>> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
>>> Reviewed-by: Alison Schofield <alison.schofield@intel.com>
>>> ---
>>> drivers/cxl/core/mbox.c | 12 +-
>>> drivers/cxl/core/memdev.c | 32 +++++
>>> drivers/cxl/core/pci.c | 1 +
>>> drivers/cxl/core/regs.c | 1 +
>>> drivers/cxl/cxl.h | 97 +--------------
>>> drivers/cxl/cxlmem.h | 85 +------------
>>> drivers/cxl/cxlpci.h | 21 ----
>>> drivers/cxl/pci.c | 17 +--
>>> include/cxl/cxl.h | 226 +++++++++++++++++++++++++++++++++++
>>> include/cxl/pci.h | 23 ++++
>>> tools/testing/cxl/test/mem.c | 3 +-
>>> 11 files changed, 303 insertions(+), 215 deletions(-)
>>> create mode 100644 include/cxl/cxl.h
>>> create mode 100644 include/cxl/pci.h
>> Thanks for the updates.
>>
>> Now, I notice this drops some objects out of the existing documentation
>> given some kdoc moves out of drivers/cxl/. The patch below fixes that
>> up, but then uncovers some other Documentation build problems:
>>
>> $ make -j14 htmldocs SPHINXDIRS="driver-api/cxl/"
>> make[3]: Nothing to be done for 'html'.
>> Using alabaster theme
>> source directory: driver-api/cxl
>> ./include/cxl/cxl.h:24: warning: Enum value 'CXL_DEVTYPE_DEVMEM' not described in enum 'cxl_devtype'
>> ./include/cxl/cxl.h:24: warning: Enum value 'CXL_DEVTYPE_CLASSMEM' not described in enum 'cxl_devtype'
>> ./include/cxl/cxl.h:225: warning: expecting prototype for cxl_dev_state_create(). Prototype was for devm_cxl_dev_state_create() instead
>
>
> OK. I can fix those problems easily (bad punctuation). I cannot see the one about the prototype, but maybe it is due to the base commit. BTW, which base should I use for the next version and for rebasing on Terry's patches?
The latest upstream RC is usually preferred, so 6.17-rc1 is probably the earliest base to use.
DJ
>
>
> Thanks
>
>
>> Note, this file was renamed in v6.16 to theory-of-operation.rst,
>> git-apply can usually figure that out.
>>
>> cxlpci.h is not currently referenced in the documentation build since it
>> has no kdoc, so no need for a new include/cxl/pci.h entry, but
>> something to look out for if a later patch adds some kdoc.
>>
>> -- 8< --
>> diff --git a/Documentation/driver-api/cxl/memory-devices.rst b/Documentation/driver-api/cxl/memory-devices.rst
>> index d732c42526df..ddaee57b80d0 100644
>> --- a/Documentation/driver-api/cxl/memory-devices.rst
>> +++ b/Documentation/driver-api/cxl/memory-devices.rst
>> @@ -344,6 +344,9 @@ CXL Core
>> .. kernel-doc:: drivers/cxl/cxl.h
>> :doc: cxl objects
>>
>> +.. kernel-doc:: include/cxl/cxl.h
>> + :internal:
>> +
>> .. kernel-doc:: drivers/cxl/cxl.h
>> :internal:
* Re: [PATCH v17 11/22] cxl: Define a driver interface for HPA free space enumeration
2025-06-24 14:13 ` [PATCH v17 11/22] cxl: Define a driver interface for HPA free space enumeration alejandro.lucero-palau
2025-06-27 22:42 ` Dave Jiang
@ 2025-08-05 16:14 ` dan.j.williams
2025-08-11 12:04 ` Alejandro Lucero Palau
1 sibling, 1 reply; 112+ messages in thread
From: dan.j.williams @ 2025-08-05 16:14 UTC
To: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Jonathan Cameron
alejandro.lucero-palau@ wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> CXL region creation involves allocating capacity from device DPA
> (device-physical-address space) and assigning it to decode a given HPA
> (host-physical-address space). Before determining how much DPA to
> allocate the amount of available HPA must be determined. Also, not all
> HPA is created equal, some specifically targets RAM, some target PMEM,
> some is prepared for device-memory flows like HDM-D and HDM-DB, and some
> is host-only (HDM-H).
>
> In order to support Type2 CXL devices, wrap all of those concerns into
> an API that retrieves a root decoder (platform CXL window) that fits the
> specified constraints and the capacity available for a new region.
>
> Add a complementary function for releasing the reference to such root
> decoder.
>
> Based on https://lore.kernel.org/linux-cxl/168592159290.1948938.13522227102445462976.stgit@dwillia2-xfh.jf.intel.com/
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
> drivers/cxl/core/region.c | 169 ++++++++++++++++++++++++++++++++++++++
> drivers/cxl/cxl.h | 3 +
> include/cxl/cxl.h | 11 +++
> 3 files changed, 183 insertions(+)
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index c3f4dc244df7..03e058ab697e 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -695,6 +695,175 @@ static int free_hpa(struct cxl_region *cxlr)
> return 0;
> }
>
> +struct cxlrd_max_context {
> + struct device * const *host_bridges;
> + int interleave_ways;
> + unsigned long flags;
> + resource_size_t max_hpa;
> + struct cxl_root_decoder *cxlrd;
> +};
> +
> +static int find_max_hpa(struct device *dev, void *data)
> +{
> + struct cxlrd_max_context *ctx = data;
> + struct cxl_switch_decoder *cxlsd;
> + struct cxl_root_decoder *cxlrd;
> + struct resource *res, *prev;
> + struct cxl_decoder *cxld;
> + resource_size_t max;
> + int found = 0;
> +
> + if (!is_root_decoder(dev))
> + return 0;
> +
> + cxlrd = to_cxl_root_decoder(dev);
> + cxlsd = &cxlrd->cxlsd;
> + cxld = &cxlsd->cxld;
> +
> + /*
> + * Flags are single unsigned longs. As CXL_DECODER_F_MAX is less than
> + * 32 bits, the bitmap functions can be used.
> + */
Comments are supposed to explain the code, not repeat the code in
natural language.
> + if (!bitmap_subset(&ctx->flags, &cxld->flags, CXL_DECODER_F_MAX)) {
> + dev_dbg(dev, "flags not matching: %08lx vs %08lx\n",
> + cxld->flags, ctx->flags);
> + return 0;
> + }
How is this easier to read than:
if ((cxld->flags & ctx->flags) != ctx->flags)
return 0;
?
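A userspace sketch of the single-word check suggested above. It assumes, as the patch comment says, that all decoder flags fit in one unsigned long. The DEMO_F_* bits are hypothetical stand-ins for the CXL_DECODER_F_* values. There is one difference to be aware of: bitmap_subset() only compares the low CXL_DECODER_F_MAX bits, while the mask test compares the whole word. The two are equivalent as long as no higher bits are ever set.

```c
#include <stdbool.h>

/* Hypothetical flag bits standing in for the CXL_DECODER_F_* values. */
#define DEMO_F_RAM   (1UL << 0)
#define DEMO_F_PMEM  (1UL << 1)
#define DEMO_F_TYPE2 (1UL << 2)

/*
 * True when every bit requested in @want is also set in @have. This is the
 * same question bitmap_subset(&want, &have, nbits) answers when the bitmap
 * fits in a single unsigned long.
 */
static bool flags_subset(unsigned long want, unsigned long have)
{
	return (have & want) == want;
}
```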
> +
> + for (int i = 0; i < ctx->interleave_ways; i++) {
> + for (int j = 0; j < ctx->interleave_ways; j++) {
> + if (ctx->host_bridges[i] == cxlsd->target[j]->dport_dev) {
> + found++;
> + break;
> + }
> + }
> + }
> +
> + if (found != ctx->interleave_ways) {
> + dev_dbg(dev,
> + "Not enough host bridges. Found %d for %d interleave ways requested\n",
> + found, ctx->interleave_ways);
> + return 0;
> + }
> +
> + /*
> + * Walk the root decoder resource range relying on cxl_region_rwsem to
> + * preclude sibling arrival/departure and find the largest free space
> + * gap.
> + */
> + lockdep_assert_held_read(&cxl_region_rwsem);
> + res = cxlrd->res->child;
> +
> + /* With no resource child the whole parent resource is available */
> + if (!res)
> + max = resource_size(cxlrd->res);
> + else
> + max = 0;
> +
> + for (prev = NULL; res; prev = res, res = res->sibling) {
> + struct resource *next = res->sibling;
> + resource_size_t free = 0;
> +
> + /*
> + * Sanity check for preventing arithmetic problems below as a
> + * resource with size 0 could imply using the end field below
> + * when set to unsigned zero - 1 or all f in hex.
> + */
> + if (prev && !resource_size(prev))
> + continue;
> +
> + if (!prev && res->start > cxlrd->res->start) {
> + free = res->start - cxlrd->res->start;
> + max = max(free, max);
> + }
> + if (prev && res->start > prev->end + 1) {
> + free = res->start - prev->end + 1;
> + max = max(free, max);
> + }
> + if (next && res->end + 1 < next->start) {
> + free = next->start - res->end + 1;
> + max = max(free, max);
> + }
> + if (!next && res->end + 1 < cxlrd->res->end + 1) {
> + free = cxlrd->res->end + 1 - res->end + 1;
> + max = max(free, max);
> + }
> + }
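This point is separate from the design question raised below. The gap arithmetic in this loop looks off by two. `res->start - prev->end + 1` evaluates as `(res->start - prev->end) + 1`, but with inclusive `end` fields the free space between two busy ranges is `res->start - (prev->end + 1)`. The `next` and tail branches have the same problem; the leading-gap branch is fine. Below is a minimal userspace sketch of the intended calculation. It uses a hypothetical `struct range_incl` in place of `struct resource` and assumes the children are sorted and non-overlapping.

```c
#include <stddef.h>
#include <stdint.h>

/* Inclusive [start, end] range, mirroring struct resource's convention. */
struct range_incl {
	uint64_t start;
	uint64_t end;
};

/*
 * Largest free gap inside @parent, given @nr busy children sorted by start
 * and non-overlapping. Because ends are inclusive, the free space between a
 * range ending at a and one starting at b is b - (a + 1).
 * Assumes parent->end < UINT64_MAX so that end + 1 does not wrap.
 */
static uint64_t largest_gap(const struct range_incl *parent,
			    const struct range_incl *child, size_t nr)
{
	uint64_t max = 0;
	uint64_t cursor = parent->start; /* first address not known to be busy */

	for (size_t i = 0; i < nr; i++) {
		if (child[i].start > cursor && child[i].start - cursor > max)
			max = child[i].start - cursor;
		cursor = child[i].end + 1;
	}

	/* Tail gap: from the first free address to the end of the parent. */
	if (parent->end + 1 > cursor && parent->end + 1 - cursor > max)
		max = parent->end + 1 - cursor;

	return max;
}
```

With a parent of [0x0, 0xfff] and busy ranges [0x100, 0x1ff] and [0x400, 0x4ff], the expression in the patch yields 0x202 for the middle gap and 0xb02 for the tail. The correct values are 0x200 and 0xb00.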
With the benefit of time to reflect, and looking at this again after all
this time it strikes me that it is simply duplicating
get_free_mem_region() and in a way that can still fail later.
Does it simplify the implementation if this just attempts to
allocate the capacity in each window that might support the mapping
constraints and then pass that allocation to the region construction
routine?
Otherwise, this completes a survey of the capacity that is not
guaranteed to be present when the region finally gets allocated.
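The following is one possible reading of the suggestion, written as a hypothetical userspace model rather than the CXL core API. The idea is to claim the capacity in the first window that satisfies the constraints and hand that claimed range to region construction, so that the survey and the allocation cannot drift apart. In the kernel the claim would presumably go through the resource allocator while holding cxl_region_rwsem, for example via alloc_free_mem_region(), which is one of the exported users of get_free_mem_region(). The bump allocator and the DEMO_* names below are simplifying assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical window flags, standing in for CXL_DECODER_F_* bits. */
#define DEMO_F_RAM   (1UL << 0)
#define DEMO_F_TYPE2 (1UL << 2)

/* Hypothetical platform window with a trivial bump allocator. */
struct demo_window {
	unsigned long flags;
	uint64_t start, end;  /* inclusive */
	uint64_t next_free;   /* first unallocated address, starts at @start */
};

/*
 * Carve @size bytes out of the first window whose flags include @want.
 * The check and the claim happen in one step, so the capacity cannot
 * vanish between "found" and "allocated". Returns the window index and
 * stores the base in @base, or returns -1 if nothing fits.
 */
static int alloc_in_matching_window(struct demo_window *win, size_t nr,
				    unsigned long want, uint64_t size,
				    uint64_t *base)
{
	for (size_t i = 0; i < nr; i++) {
		struct demo_window *w = &win[i];

		if ((w->flags & want) != want)
			continue;
		if (w->end + 1 - w->next_free < size)
			continue;
		*base = w->next_free;
		w->next_free += size;
		return (int)i;
	}
	return -1;
}
```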
* Re: [PATCH v17 18/22] cxl: Allow region creation by type2 drivers
2025-06-24 14:13 ` [PATCH v17 18/22] cxl: Allow region creation by type2 drivers alejandro.lucero-palau
2025-06-27 9:32 ` Jonathan Cameron
@ 2025-08-05 16:33 ` dan.j.williams
2025-08-11 14:45 ` Alejandro Lucero Palau
1 sibling, 1 reply; 112+ messages in thread
From: dan.j.williams @ 2025-08-05 16:33 UTC (permalink / raw)
To: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Jonathan Cameron
alejandro.lucero-palau@ wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Creating a CXL region requires userspace intervention through the cxl
> sysfs files. Type2 support should allow accelerator drivers to create
> such a CXL region from kernel code.
>
> Add that functionality and integrate it with the current support for
> memory expanders.
>
> Allow the type2 driver to link an action to the created region so that
> the allocated resources are unwound properly.
The hardest part of CXL is the fact that typical straight-line driver
expectations like "device present == MMIO available" are violated. An
accelerator driver needs to worry about asynchronous region detach and
CXL port detach.
Ideally any event that takes down a CXL port or the region simply
results in the accelerator driver being detached to clean everything up.
The difficult part about that is that the remove paths for regions and
CXL ports hold locks that prevent the accelerator remove path from
running.
I do not think it is maintainable for every accelerator driver to invent
its own cleanup scheme like this. The expectation should be that a
region can go into a defunct state if someone triggers removal actions
in the wrong order, but otherwise the accelerator driver should be able
to rely on a detach event to clean everything up.
So opting into CXL operation puts a driver into a situation where it can
be unbound whenever the CXL link goes down logically or physically.
Physical device removal of a CXL port expects that the operator has
first shut down all driver operations and, if they have not, that at
least the driver does not crash while awaiting the remove event.
Physical CXL port removal is the "easy" case since that will naturally
result in the accelerator 'struct pci_dev' being removed. The more
difficult cases are the logical removal / shutdown of a CXL port or
region. Those should schedule accelerator detach and put the region into
an error state until that cleanup runs.
So, in summary, do not allow for custom region callbacks, arrange for
accelerator detach and just solve the "fail in-flight operations while
awaiting detach" problem.
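The "fail in-flight operations while awaiting detach" part could look roughly like the userspace sketch below. Everything in it is hypothetical, and no such helpers exist in the CXL core. The CXL side marks the region defunct and schedules detach. New operations are then refused with -ENXIO, and detach can check that the in-flight count has drained before tearing down the mapping.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical accelerator-side view of a CXL region's health. */
struct demo_region {
	atomic_bool defunct;   /* set when the region or port goes away */
	atomic_int inflight;   /* operations currently using the mapping */
};

static void demo_region_init(struct demo_region *r)
{
	atomic_init(&r->defunct, false);
	atomic_init(&r->inflight, 0);
}

/* CXL side: logical removal started; accelerator detach gets scheduled. */
static void demo_region_mark_defunct(struct demo_region *r)
{
	atomic_store(&r->defunct, true);
}

/*
 * Operation entry. Incrementing before checking (with the default
 * sequentially consistent atomics) means that either this operation sees
 * defunct == true, or the remover sees inflight > 0.
 */
static int demo_region_op_begin(struct demo_region *r)
{
	atomic_fetch_add(&r->inflight, 1);
	if (atomic_load(&r->defunct)) {
		atomic_fetch_sub(&r->inflight, 1);
		return -ENXIO;
	}
	return 0;
}

static void demo_region_op_end(struct demo_region *r)
{
	atomic_fetch_sub(&r->inflight, 1);
}

/* Detach side: safe to unmap once no operation is using the region. */
static bool demo_region_idle(struct demo_region *r)
{
	return atomic_load(&r->inflight) == 0;
}
```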
* Re: [PATCH v17 02/22] sfc: add cxl support
2025-07-25 22:16 ` dan.j.williams
@ 2025-08-06 8:37 ` Alejandro Lucero Palau
0 siblings, 0 replies; 112+ messages in thread
From: Alejandro Lucero Palau @ 2025-08-06 8:37 UTC (permalink / raw)
To: dan.j.williams, alejandro.lucero-palau, linux-cxl, netdev,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Jonathan Cameron, Edward Cree, Alison Schofield
On 7/25/25 23:16, dan.j.williams@intel.com wrote:
> alejandro.lucero-palau@ wrote:
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> Add CXL initialization based on new CXL API for accel drivers and make
>> it dependent on kernel CXL configuration.
> Looks ok, I do feel it is missing Documentation for how someone
> determines that this support is even turned on. For example, if
> git-bisect lands on this patch the end user will see SFC_CXL enabled in
> their kernel and:
>
> pci_dbg(pci_dev, "CXL_DVSEC_PCIE_DEVICE capability found\n");
>
> ...in dmesg, but the CXL functionality is disabled.
Not really. There is an empty efx_cxl_init defined in efx_cxl.h when
SFC_CXL is not set.
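A minimal sketch of the stub pattern described here. It assumes efx_cxl.h uses a plain #ifdef on CONFIG_SFC_CXL. The actual header is not quoted in this thread and may use IS_ENABLED() instead. With the option unset, the inline stub simply returns 0, so presumably none of the CXL probing, including its debug message, takes place.

```c
struct efx_probe_data; /* opaque here; the real struct lives in the sfc driver */

#ifdef CONFIG_SFC_CXL
int efx_cxl_init(struct efx_probe_data *probe_data);
#else
/* Compiled-out variant: probing continues without any CXL setup. */
static inline int efx_cxl_init(struct efx_probe_data *probe_data)
{
	(void)probe_data;
	return 0;
}
#endif
```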
>
> Not a showstopper, so:
>
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Thanks!
More comments below.
>
> ...but when you respin patch1 do consider adding a blurb somewhere about
> how to detect that CXL is in effect so there is a chance for end users
> to help triage CXL operation problems.
>
> [..]
>> diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
>> new file mode 100644
>> index 000000000000..f1db7284dee8
>> --- /dev/null
>> +++ b/drivers/net/ethernet/sfc/efx_cxl.c
>> @@ -0,0 +1,55 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/****************************************************************************
>> + *
>> + * Driver for AMD network controllers and boards
>> + * Copyright (C) 2025, Advanced Micro Devices, Inc.
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms of the GNU General Public License version 2 as published
>> + * by the Free Software Foundation, incorporated herein by reference.
> Per, Documentation/process/license-rules.rst SPDX supersedes the need to
> include this boilerplate paragraph, right?
>
Yes. I'll remove it.
>> + */
>> +
>> +#include <cxl/pci.h>
>> +#include <linux/pci.h>
>> +
>> +#include "net_driver.h"
>> +#include "efx_cxl.h"
>> +
>> +#define EFX_CTPIO_BUFFER_SIZE SZ_256M
>> +
>> +int efx_cxl_init(struct efx_probe_data *probe_data)
>> +{
>> + struct efx_nic *efx = &probe_data->efx;
>> + struct pci_dev *pci_dev = efx->pci_dev;
>> + struct efx_cxl *cxl;
>> + u16 dvsec;
>> +
>> + probe_data->cxl_pio_initialised = false;
>> +
>> + dvsec = pci_find_dvsec_capability(pci_dev, PCI_VENDOR_ID_CXL,
>> + CXL_DVSEC_PCIE_DEVICE);
>> + if (!dvsec)
>> + return 0;
>> +
>> + pci_dbg(pci_dev, "CXL_DVSEC_PCIE_DEVICE capability found\n");
>> +
>> + /* Create a cxl_dev_state embedded in the cxl struct using cxl core api
>> + * specifying no mbox available.
>> + */
>> + cxl = devm_cxl_dev_state_create(&pci_dev->dev, CXL_DEVTYPE_DEVMEM,
>> + pci_dev->dev.id, dvsec, struct efx_cxl,
>> + cxlds, false);
>> +
>> + if (!cxl)
>> + return -ENOMEM;
>> +
>> + probe_data->cxl = cxl;
> Just note that this defeats the purpose of the
> devm_cxl_dev_state_create() scheme which is to allow a container_of()
> association of cxl_dev_state with something like a driver's @probe_data.
> In this case @probe_data is allocated before @cxl and the devm
> allocation of @cxl means that it is freed *after* @probe_data, i.e. not
> strictly reverse allocation order.
>
> It is fine as long as nothing in a devm release path tries to walk back
> to @probe_data from @cxl, but just something to be aware of.
Right, but I have to live with the current sfc driver design, and I do not
think there is a good justification for changing it for this case. But I
agree the idea is to have such container_of functionality, which
hopefully other drivers will use.
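For readers unfamiliar with the scheme Dan describes: the _devm_cxl_dev_state_create() macro quoted later in this thread sizes the allocation by sizeof(drv_struct). As a result, the cxl_dev_state lives inside the driver's own structure, and container_of() can map from one to the other. The userspace sketch below uses hypothetical demo_* types, with calloc() standing in for the devm allocation.

```c
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct cxl_dev_state. */
struct demo_dev_state {
	int type;
};

/* Hypothetical driver struct embedding the core state (not a pointer). */
struct demo_drv_data {
	int nic_id;
	struct demo_dev_state cxlds;
};

/* One allocation covers both the driver data and the core state. */
static struct demo_drv_data *demo_drv_data_create(int nic_id)
{
	struct demo_drv_data *d = calloc(1, sizeof(*d));

	if (d)
		d->nic_id = nic_id;
	return d;
}

/* Code holding only &d->cxlds can recover the enclosing driver data. */
static struct demo_drv_data *to_demo_drv_data(struct demo_dev_state *cxlds)
{
	return container_of(cxlds, struct demo_drv_data, cxlds);
}
```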
* Re: [PATCH v17 03/22] cxl: Move pci generic code
2025-07-25 22:41 ` dan.j.williams
@ 2025-08-06 8:46 ` Alejandro Lucero Palau
2025-08-06 9:31 ` Alejandro Lucero Palau
0 siblings, 1 reply; 112+ messages in thread
From: Alejandro Lucero Palau @ 2025-08-06 8:46 UTC (permalink / raw)
To: dan.j.williams, alejandro.lucero-palau, linux-cxl, netdev,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Ben Cheatham, Fan Ni, Jonathan Cameron, Alison Schofield
On 7/25/25 23:41, dan.j.williams@intel.com wrote:
> alejandro.lucero-palau@ wrote:
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> Inside cxl/core/pci.c there are helpers for CXL PCIe initialization
>> meanwhile cxl/pci.c implements the functionality for a Type3 device
>> initialization.
>>
>> Move helper functions from cxl/pci.c to cxl/core/pci.c in order to be
>> exported and shared with CXL Type2 device initialization.
>>
>> Fix cxl mock tests affected by the code move.
> Next time would be nice to have a bit more color commentary on "fixes".
> In this case the code was just deleted to address a compilation problem,
> but that deletion is ok because this function stopped being called back
> in commit 733b57f262b0 ("cxl/pci: Early setup RCH dport component
> registers from RCRB").
Thanks. I'll add this.
>
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
>> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
>> Reviewed-by: Fan Ni <fan.ni@samsung.com>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Reviewed-by: Alison Schofield <alison.schofield@intel.com>
>> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
> Reviewed-by still stands, but I question why is_cxl_restricted() needs
> to be promoted to a global scope function. Are there going to be RCD
> type-2 devices that will have Linux drivers?
This was necessary in previous versions where a new accel API function
required it. It is now gone, so it can be defined as before.
I'll fix it.
Thanks!
* Re: [PATCH v17 03/22] cxl: Move pci generic code
2025-08-06 8:46 ` Alejandro Lucero Palau
@ 2025-08-06 9:31 ` Alejandro Lucero Palau
0 siblings, 0 replies; 112+ messages in thread
From: Alejandro Lucero Palau @ 2025-08-06 9:31 UTC (permalink / raw)
To: dan.j.williams, alejandro.lucero-palau, linux-cxl, netdev,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Ben Cheatham, Fan Ni, Jonathan Cameron, Alison Schofield
On 8/6/25 09:46, Alejandro Lucero Palau wrote:
>
> On 7/25/25 23:41, dan.j.williams@intel.com wrote:
>> alejandro.lucero-palau@ wrote:
>>> From: Alejandro Lucero <alucerop@amd.com>
>>>
>>> Inside cxl/core/pci.c there are helpers for CXL PCIe initialization
>>> meanwhile cxl/pci.c implements the functionality for a Type3 device
>>> initialization.
>>>
>>> Move helper functions from cxl/pci.c to cxl/core/pci.c in order to be
>>> exported and shared with CXL Type2 device initialization.
>>>
>>> Fix cxl mock tests affected by the code move.
>> Next time would be nice to have a bit more color commentary on "fixes".
>> In this case the code was just deleted to address a compilation problem,
>> but that deletion is ok because this function stopped being called back
>> in commit 733b57f262b0 ("cxl/pci: Early setup RCH dport component
>> registers from RCRB").
>
>
> Thanks. I'll add this.
>
>>
>>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>>> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
>>> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
>>> Reviewed-by: Fan Ni <fan.ni@samsung.com>
>>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>>> Reviewed-by: Alison Schofield <alison.schofield@intel.com>
>>> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
>> Reviewed-by still stands, but I question why is_cxl_restricted() needs
>> to be promoted to a global scope function. Are there going to be RCD
>> type-2 devices that will have Linux drivers?
>
>
> This was necessary in previous versions where a new accel API function
> required it. It is now gone, so it can be defined as before.
>
> I'll fix it.
>
It turns out this is not the only reason. After moving
cxl_pci_setup_regs, such a definition is needed in two files, so it needs
to be in a header. Since it should not go in include/cxl, I'll put it in
drivers/cxl/cxlpci.h.
>
> Thanks!
>
* Re: [PATCH v17 04/22] cxl: allow Type2 drivers to map cxl component regs
2025-07-25 22:55 ` dan.j.williams
2025-07-28 16:23 ` Dave Jiang
@ 2025-08-06 9:41 ` Alejandro Lucero Palau
1 sibling, 0 replies; 112+ messages in thread
From: Alejandro Lucero Palau @ 2025-08-06 9:41 UTC (permalink / raw)
To: dan.j.williams, alejandro.lucero-palau, linux-cxl, netdev,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
On 7/25/25 23:55, dan.j.williams@intel.com wrote:
> alejandro.lucero-palau@ wrote:
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> Export cxl core functions for a Type2 driver being able to discover and
>> map the device component registers.
> I would squash this with patch5, up to Dave.
>
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>> ---
>> drivers/cxl/core/port.c | 1 +
>> drivers/cxl/cxl.h | 7 -------
>> drivers/cxl/cxlpci.h | 12 ------------
>> include/cxl/cxl.h | 8 ++++++++
>> include/cxl/pci.h | 15 +++++++++++++++
>> 5 files changed, 24 insertions(+), 19 deletions(-)
>>
> [..]
>> diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
>> index 9c1a82c8af3d..0810c18d7aef 100644
>> --- a/include/cxl/cxl.h
>> +++ b/include/cxl/cxl.h
>> @@ -70,6 +70,10 @@ struct cxl_regs {
>> );
>> };
>>
>> +#define CXL_CM_CAP_CAP_ID_RAS 0x2
>> +#define CXL_CM_CAP_CAP_ID_HDM 0x5
>> +#define CXL_CM_CAP_CAP_HDM_VERSION 1
>> +
>> struct cxl_reg_map {
>> bool valid;
>> int id;
>> @@ -223,4 +227,8 @@ struct cxl_dev_state *_devm_cxl_dev_state_create(struct device *dev,
>> (drv_struct *)_devm_cxl_dev_state_create(parent, type, serial, dvsec, \
>> sizeof(drv_struct), mbox); \
>> })
>> +
>> +int cxl_map_component_regs(const struct cxl_register_map *map,
>> + struct cxl_component_regs *regs,
>> + unsigned long map_mask);
> With this function now becoming public it really wants some kdoc, and a
> rename to add devm_ so that readers are not surprised by hidden devres
> behavior behind this API.
>
> It was ok previously because it was private to drivers/cxl/ where
> everything is devres managed.
I'll do so.
Thanks.
* Re: [PATCH v17 04/22] cxl: allow Type2 drivers to map cxl component regs
2025-07-28 16:23 ` Dave Jiang
@ 2025-08-06 9:43 ` Alejandro Lucero Palau
0 siblings, 0 replies; 112+ messages in thread
From: Alejandro Lucero Palau @ 2025-08-06 9:43 UTC (permalink / raw)
To: Dave Jiang, dan.j.williams, alejandro.lucero-palau, linux-cxl,
netdev, edward.cree, davem, kuba, pabeni, edumazet
On 7/28/25 17:23, Dave Jiang wrote:
>
> On 7/25/25 3:55 PM, dan.j.williams@intel.com wrote:
>> alejandro.lucero-palau@ wrote:
>>> From: Alejandro Lucero <alucerop@amd.com>
>>>
>>> Export cxl core functions for a Type2 driver being able to discover and
>>> map the device component registers.
>> I would squash this with patch5, up to Dave.
> I would prefer that. In general I'd prefer to see the enabling code going with where it's being used to see how it gets utilized. It makes reviewing a bit easier. Thanks!
It was recommended to keep the sfc changes isolated to make it easier for
someone to test the type2 support with another driver. But I don't think
squashing some of them should be a big issue for anyone, so I'll do so.
Thanks
> DJ
>>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>>> ---
>>> drivers/cxl/core/port.c | 1 +
>>> drivers/cxl/cxl.h | 7 -------
>>> drivers/cxl/cxlpci.h | 12 ------------
>>> include/cxl/cxl.h | 8 ++++++++
>>> include/cxl/pci.h | 15 +++++++++++++++
>>> 5 files changed, 24 insertions(+), 19 deletions(-)
>>>
>> [..]
>>> diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
>>> index 9c1a82c8af3d..0810c18d7aef 100644
>>> --- a/include/cxl/cxl.h
>>> +++ b/include/cxl/cxl.h
>>> @@ -70,6 +70,10 @@ struct cxl_regs {
>>> );
>>> };
>>>
>>> +#define CXL_CM_CAP_CAP_ID_RAS 0x2
>>> +#define CXL_CM_CAP_CAP_ID_HDM 0x5
>>> +#define CXL_CM_CAP_CAP_HDM_VERSION 1
>>> +
>>> struct cxl_reg_map {
>>> bool valid;
>>> int id;
>>> @@ -223,4 +227,8 @@ struct cxl_dev_state *_devm_cxl_dev_state_create(struct device *dev,
>>> (drv_struct *)_devm_cxl_dev_state_create(parent, type, serial, dvsec, \
>>> sizeof(drv_struct), mbox); \
>>> })
>>> +
>>> +int cxl_map_component_regs(const struct cxl_register_map *map,
>>> + struct cxl_component_regs *regs,
>>> + unsigned long map_mask);
>> With this function now becoming public it really wants some kdoc, and a
>> rename to add devm_ so that readers are not surprised by hidden devres
>> behavior behind this API.
>>
>> It was ok previously because it was private to drivers/cxl/ where
>> everything is devres managed.
* Re: [PATCH v17 05/22] sfc: setup cxl component regs and set media ready
2025-06-30 15:57 ` Alejandro Lucero Palau
@ 2025-08-08 13:11 ` Alejandro Lucero Palau
0 siblings, 0 replies; 112+ messages in thread
From: Alejandro Lucero Palau @ 2025-08-08 13:11 UTC (permalink / raw)
To: Jonathan Cameron, alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
<snip>
>>> +#include <cxl/cxl.h>
>>> #include <cxl/pci.h>
>>> #include <linux/pci.h>
>>> @@ -23,6 +24,7 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
>>> struct pci_dev *pci_dev = efx->pci_dev;
>>> struct efx_cxl *cxl;
>>> u16 dvsec;
>>> + int rc;
>>> probe_data->cxl_pio_initialised = false;
>>> @@ -43,6 +45,38 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
>>> if (!cxl)
>>> return -ENOMEM;
>>> + rc = cxl_pci_setup_regs(pci_dev, CXL_REGLOC_RBI_COMPONENT,
>>> + &cxl->cxlds.reg_map);
>>> + if (rc) {
>>> + dev_warn(&pci_dev->dev, "No component registers (err=%d)\n", rc);
>>> + return rc;
>> I haven't checked the code paths to see if we might hit them, but this
>> might defer. In which case, return dev_err_probe() is appropriate, as it
>> stashes away the cause of deferral for debugging purposes and doesn't
>> print if that's what happened, as we'll be back later.
>>
>> If we can hit the deferral, then you should catch that at the caller of
>> efx_cxl_init() and fail the probe (we'll be back a bit later and should
>> then succeed).
>>
>
> I'm scared of opening this can ... but I think adding probe deferral
> support to the sfc driver is not an option, or at least something we
> want to avoid because of the complexity it would add.
>
>
It turns out -EPROBE_DEFER can only be returned by this call for a
restricted CXL device, so there is nothing to worry about for sfc.
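For reference, here is the dev_err_probe() behavior Jonathan describes, as a simplified userspace model. The demo_* names are hypothetical. The real helper records the reason on the struct device and emits a debug message, rather than an error, on deferral. The value 517 matches the kernel-internal EPROBE_DEFER, which userspace errno.h does not define.

```c
#include <stdarg.h>
#include <stdio.h>

#define DEMO_EPROBE_DEFER 517 /* kernel-internal value of EPROBE_DEFER */

/* Hypothetical stand-in for struct device with a deferral-reason slot. */
struct demo_dev {
	char deferred_reason[128];
	int printed; /* number of errors actually logged */
};

/*
 * Always returns @err, so callers can write "return dev_err_probe(...)".
 * On -EPROBE_DEFER the reason is stored quietly for later debugging;
 * any other error is logged.
 */
static int demo_dev_err_probe(struct demo_dev *dev, int err,
			      const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	if (err == -DEMO_EPROBE_DEFER) {
		vsnprintf(dev->deferred_reason, sizeof(dev->deferred_reason),
			  fmt, ap);
	} else {
		vfprintf(stderr, fmt, ap);
		dev->printed++;
	}
	va_end(ap);
	return err;
}
```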
I'll also use this reply to let you know I'm going to squash this patch
with the previous one, which you gave your review tag to, so I think it is
better not to carry your tag over after the squash, but happy to do so ...
Thank you
* Re: [PATCH v17 05/22] sfc: setup cxl component regs and set media ready
2025-06-27 8:45 ` Jonathan Cameron
@ 2025-08-08 13:14 ` Alejandro Lucero Palau
0 siblings, 0 replies; 112+ messages in thread
From: Alejandro Lucero Palau @ 2025-08-08 13:14 UTC (permalink / raw)
To: Jonathan Cameron, alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
On 6/27/25 09:45, Jonathan Cameron wrote:
> On Tue, 24 Jun 2025 15:13:38 +0100
> alejandro.lucero-palau@amd.com wrote:
>
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> Use cxl code for registers discovery and mapping regarding cxl component
>> regs and validate registers found are as expected.
>>
>> Set media ready explicitly as there is no means for doing so without
>> a mailbox, and without the related cxl register, not mandatory for type2.
>>
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Few things came to mind reading later patches...
>
> Some of the calls in here register extra devm stuff. So, given we just
> eat any errors in this cxl setup, in my mind we should clean them up.
>
> The devres group approach suggested earlier deals with that for you as
> all the CXL devm stuff will end up in that group and you can tear it
> down on error in efx_cxl_init()
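Below is Jonathan's devres-group suggestion, modeled in userspace. In the kernel this maps to calling devres_open_group() before the CXL setup, then devres_release_group() on failure or devres_close_group() on success. The demo_* helpers are hypothetical and only show the effect: releasing the group unwinds just the CXL resources, newest first, and leaves everything registered earlier in place.

```c
#include <errno.h>
#include <stddef.h>

#define DEMO_MAX_ACTIONS 8

typedef void (*demo_action_fn)(void *data);

/* Hypothetical model of a device's devres list; actions unwind LIFO. */
struct demo_device {
	demo_action_fn fn[DEMO_MAX_ACTIONS];
	void *data[DEMO_MAX_ACTIONS];
	size_t nr;
};

static int demo_devm_add_action(struct demo_device *dev, demo_action_fn fn,
				void *data)
{
	if (dev->nr == DEMO_MAX_ACTIONS)
		return -ENOMEM;
	dev->fn[dev->nr] = fn;
	dev->data[dev->nr] = data;
	dev->nr++;
	return 0;
}

/* In this model, a group is just the list depth at the time it opens. */
static size_t demo_devres_open_group(const struct demo_device *dev)
{
	return dev->nr;
}

/* Undo everything registered since @group was opened, newest first. */
static void demo_devres_release_group(struct demo_device *dev, size_t group)
{
	while (dev->nr > group) {
		dev->nr--;
		dev->fn[dev->nr](dev->data[dev->nr]);
	}
}
```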
This adds to your earlier case for using a devres group, but after
looking at it, I think from a netdev point of view it is preferable to keep
this out. Moreover, now that I have Dan's review tag, I prefer to act quickly ;-)
But I will add the comments you suggested instead, to make it clearer
for reviewers.
Thank you!
* Re: [PATCH v17 07/22] sfc: initialize dpa
2025-07-26 0:55 ` dan.j.williams
@ 2025-08-08 16:59 ` Alejandro Lucero Palau
0 siblings, 0 replies; 112+ messages in thread
From: Alejandro Lucero Palau @ 2025-08-08 16:59 UTC (permalink / raw)
To: dan.j.williams, alejandro.lucero-palau, linux-cxl, netdev,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
On 7/26/25 01:55, dan.j.williams@intel.com wrote:
> alejandro.lucero-palau@ wrote:
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> Use hardcoded values for initializing dpa as there is no mbox available.
>>
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>> ---
>> drivers/net/ethernet/sfc/efx_cxl.c | 2 ++
>> 1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
>> index ea02eb82b73c..5d68ee4e818d 100644
>> --- a/drivers/net/ethernet/sfc/efx_cxl.c
>> +++ b/drivers/net/ethernet/sfc/efx_cxl.c
>> @@ -77,6 +77,8 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
>> */
>> cxl->cxlds.media_ready = true;
>>
>> + cxl_set_capacity(&cxl->cxlds, EFX_CTPIO_BUFFER_SIZE);
>> +
> Yes, definitely squash this with the last patch and you can add:
>
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
I'll do so. Thanks!
* Re: [PATCH v17 08/22] cxl: Prepare memdev creation for type2
2025-07-26 1:05 ` dan.j.williams
@ 2025-08-08 17:01 ` Alejandro Lucero Palau
0 siblings, 0 replies; 112+ messages in thread
From: Alejandro Lucero Palau @ 2025-08-08 17:01 UTC (permalink / raw)
To: dan.j.williams, alejandro.lucero-palau, linux-cxl, netdev,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Ben Cheatham, Jonathan Cameron, Alison Schofield
On 7/26/25 02:05, dan.j.williams@intel.com wrote:
> alejandro.lucero-palau@ wrote:
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> Current cxl core is relying on a CXL_DEVTYPE_CLASSMEM type device when
>> creating a memdev leading to problems when obtaining cxl_memdev_state
>> references from a CXL_DEVTYPE_DEVMEM type.
>>
>> Modify check for obtaining cxl_memdev_state adding CXL_DEVTYPE_DEVMEM
>> support.
>>
>> Make devm_cxl_add_memdev accessible from an accel driver.
>>
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
>> Reviewed-by: Alison Schofield <alison.schofield@intel.com>
>> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
>> ---
>> drivers/cxl/core/memdev.c | 15 +++++++++++++--
>> drivers/cxl/cxlmem.h | 2 --
>> drivers/cxl/mem.c | 25 +++++++++++++++++++------
>> include/cxl/cxl.h | 2 ++
>> 4 files changed, 34 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
>> index c73582d24dd7..f43d2aa2928e 100644
>> --- a/drivers/cxl/core/memdev.c
>> +++ b/drivers/cxl/core/memdev.c
>> @@ -7,6 +7,7 @@
>> #include <linux/slab.h>
>> #include <linux/idr.h>
>> #include <linux/pci.h>
>> +#include <cxl/cxl.h>
>> #include <cxlmem.h>
>> #include "trace.h"
>> #include "core.h"
>> @@ -562,9 +563,16 @@ static const struct device_type cxl_memdev_type = {
>> .groups = cxl_memdev_attribute_groups,
>> };
>>
>> +static const struct device_type cxl_accel_memdev_type = {
>> + .name = "cxl_accel_memdev",
>> + .release = cxl_memdev_release,
>> + .devnode = cxl_memdev_devnode,
>> +};
>> +
>> bool is_cxl_memdev(const struct device *dev)
>> {
>> - return dev->type == &cxl_memdev_type;
>> + return (dev->type == &cxl_memdev_type ||
>> + dev->type == &cxl_accel_memdev_type);
>> }
>> EXPORT_SYMBOL_NS_GPL(is_cxl_memdev, "CXL");
>>
>> @@ -689,7 +697,10 @@ static struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds,
>> dev->parent = cxlds->dev;
>> dev->bus = &cxl_bus_type;
>> dev->devt = MKDEV(cxl_mem_major, cxlmd->id);
>> - dev->type = &cxl_memdev_type;
>> + if (cxlds->type == CXL_DEVTYPE_DEVMEM)
>> + dev->type = &cxl_accel_memdev_type;
>> + else
>> + dev->type = &cxl_memdev_type;
>> device_set_pm_not_required(dev);
>> INIT_WORK(&cxlmd->detach_work, detach_memdev);
>>
>> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
>> index 9cc4337cacfb..7be51f70902a 100644
>> --- a/drivers/cxl/cxlmem.h
>> +++ b/drivers/cxl/cxlmem.h
>> @@ -88,8 +88,6 @@ static inline bool is_cxl_endpoint(struct cxl_port *port)
>> return is_cxl_memdev(port->uport_dev);
>> }
>>
>> -struct cxl_memdev *devm_cxl_add_memdev(struct device *host,
>> - struct cxl_dev_state *cxlds);
>> int devm_cxl_sanitize_setup_notifier(struct device *host,
>> struct cxl_memdev *cxlmd);
>> struct cxl_memdev_state;
>> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
>> index 9675243bd05b..7f39790d9d98 100644
>> --- a/drivers/cxl/mem.c
>> +++ b/drivers/cxl/mem.c
>> @@ -130,12 +130,18 @@ static int cxl_mem_probe(struct device *dev)
>> dentry = cxl_debugfs_create_dir(dev_name(dev));
>> debugfs_create_devm_seqfile(dev, "dpamem", dentry, cxl_mem_dpa_show);
>>
>> - if (test_bit(CXL_POISON_ENABLED_INJECT, mds->poison.enabled_cmds))
>> - debugfs_create_file("inject_poison", 0200, dentry, cxlmd,
>> - &cxl_poison_inject_fops);
>> - if (test_bit(CXL_POISON_ENABLED_CLEAR, mds->poison.enabled_cmds))
>> - debugfs_create_file("clear_poison", 0200, dentry, cxlmd,
>> - &cxl_poison_clear_fops);
>> + /*
>> + * Avoid poison debugfs files for Type2 devices as they rely on
>> + * cxl_memdev_state.
>> + */
> I know this already has my Reviewed-by, but this comment is going to
> be annoying long term. The CXL specification has already dropped "Type2" as
> a name and Linux has already called this DEVMEM, and the comment belongs
> on a helper.
>
> Just call a new cxl_memdev_poison_enable() helper unconditionally, put
> the mds NULL check inside of it and comment on that helper:
>
> /* For CLASSMEM memory expanders enable poison injection */
> cxl_memdev_poison_enable()
Sounds good.
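A userspace sketch of the helper shape Dan suggests. The demo_* types are hypothetical stand-ins for cxl_memdev_state and the debugfs files. The NULL check for DEVMEM memdevs, which have no cxl_memdev_state, lives inside the helper, so the probe path can call it unconditionally.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the poison command bits and memdev state. */
enum { DEMO_POISON_INJECT, DEMO_POISON_CLEAR };

struct demo_memdev_state {
	unsigned long poison_enabled_cmds;
};

/* Records which debugfs files would have been created. */
struct demo_debugfs {
	bool inject_poison;
	bool clear_poison;
};

/* For CLASSMEM memory expanders enable poison injection */
static void demo_memdev_poison_enable(const struct demo_memdev_state *mds,
				      struct demo_debugfs *dbg)
{
	/* DEVMEM (accelerator) memdevs carry no cxl_memdev_state. */
	if (!mds)
		return;
	if (mds->poison_enabled_cmds & (1UL << DEMO_POISON_INJECT))
		dbg->inject_poison = true;
	if (mds->poison_enabled_cmds & (1UL << DEMO_POISON_CLEAR))
		dbg->clear_poison = true;
}
```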
>
>> + if (mds) {
>> + if (test_bit(CXL_POISON_ENABLED_INJECT, mds->poison.enabled_cmds))
>> + debugfs_create_file("inject_poison", 0200, dentry, cxlmd,
>> + &cxl_poison_inject_fops);
>> + if (test_bit(CXL_POISON_ENABLED_CLEAR, mds->poison.enabled_cmds))
>> + debugfs_create_file("clear_poison", 0200, dentry, cxlmd,
>> + &cxl_poison_clear_fops);
>> + }
>>
>> rc = devm_add_action_or_reset(dev, remove_debugfs, dentry);
>> if (rc)
>> @@ -219,6 +225,13 @@ static umode_t cxl_mem_visible(struct kobject *kobj, struct attribute *a, int n)
>> struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
>> struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
>>
>> + /*
>> + * Avoid poison sysfs files for Type2 devices as they rely on
>> + * cxl_memdev_state.
>> + */
>> + if (!mds)
>> + return 0;
>> +
>> if (a == &dev_attr_trigger_poison_list.attr)
>> if (!test_bit(CXL_POISON_ENABLED_LIST,
>> mds->poison.enabled_cmds))
> Same here, do not sprinkle an "if (!mds)" check; add a:
>
> cxl_poison_attr_visible() helper and call it unconditionally in the "if
> (a == &dev_attr_trigger_poison_list.attr)" case.
I'll do so.
Thanks
* Re: [PATCH v17 10/22] cx/memdev: Indicate probe deferral
2025-07-28 17:45 ` dan.j.williams
2025-07-30 3:46 ` dan.j.williams
@ 2025-08-09 11:24 ` Alejandro Lucero Palau
1 sibling, 0 replies; 112+ messages in thread
From: Alejandro Lucero Palau @ 2025-08-09 11:24 UTC (permalink / raw)
To: dan.j.williams, Dave Jiang, alejandro.lucero-palau, linux-cxl,
netdev, edward.cree, davem, kuba, pabeni, edumazet
On 7/28/25 18:45, dan.j.williams@intel.com wrote:
> Alejandro Lucero Palau wrote:
> [..]
>>> Can you please explain how the accelerator driver init path is
>>> different in this instance that it requires cxl_mem driver to defer
>>> probing? Currently with a type3, the cxl_acpi driver will setup the
>>> CXL root, hostbridges and PCI root ports. At that point the memdev
>>> driver will enumerate the rest of the ports and attempt to establish
>>> the hierarchy. However if cxl_acpi is not done, the mem probe will
>>> fail. But, the cxl_acpi probe will trigger a re-probe sequence at
>>> the end when it is done. At that point, the mem probe should
>>> discover all the necessary ports if things are correct. If the
>>> accelerator init path is different, can we introduce some
>>> documentation to explain the difference?
> The biggest difference is that devm_cxl_add_memdev() is "hopeful" in the
> cxl_pci case. I.e. cxl_pci_probe() does not fail if the memory device it
> registered does not ever pass cxl_mem_probe().
>
> Accelerators are different. They want to know that the CXL side of the
> house is up and running before enabling driver features that depend on
> it. They also want to safely teardown driver functionality if CXL
> capabilities disappear.
>
> cxl_pci does not know or care if or when cxl_mem::probe() succeeds and
> cxl_mem::remove() is invoked.
>
>>> Also, it seems as long as port topology is not found, it will always
>>> go to deferred probing. At what point do we conclude that things may
>>> be missing/broken and we need to fail?
> Right, at some point the driver needs to give up on CXL ever arriving.
>
>
>> Hi Dave,
>>
>>
>> This patch comes from Dan's original one, so I'm afraid I cannot
>> explain it better myself.
>>
>>
>> I added this patch back after Dan suggested that, with
>> cxl_acquire_endpoint, Type2 initialization can obtain some protection
>> against cxl_mem or cxl_acpi being removed. Later, I added protection
>> against, or handling of, such removal in the sfc driver after
>> initialization. So, at least for me, this is the main reason for this
>> patch.
>>
>>
>> Regarding the goal of the original patch, to be honest, I cannot see
>> the cxl_acpi problem, although I'm not saying it does not exist. But it
>> is quite confusing to me and, as I said in another patch regarding probe
>> deferral, supporting that option would add complexity to the current sfc
>> driver probing. If there is another way to avoid it, I would prefer to
>> follow that.
> The problem is how to handle the "CXL device in PCIe-only mode" problem.
> Even with a CXL endpoint directly attached to a CXL host there is no
> guarantee that the device trains the link in CXL mode. So in addition to
> the software-dynamic problems of module loading and asynchronous driver
> bind/unbind, there is this hardware-dynamic problem.
OK, but I think this is easy to handle, and it can be done before reaching
this point of the driver's CXL initialization. In fact, the sfc driver
checks for the DVSEC as the first step of CXL initialization and does
nothing else if it is not there. This can be changed to check that CXL.io
is in place rather than legacy PCIe.
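The existing sfc check described above, which looks for the CXL DVSEC before doing anything else, can be modeled as a walk of the PCIe extended capability list. This is a userspace sketch over a config-space snapshot; in the kernel, pci_find_dvsec_capability() performs this walk. Checking that the link actually trained in CXL mode would need an additional register read. That read is not shown here and is left as an assumption:

```c
/*
 * Walk PCIe extended capabilities in a 4 KiB config-space snapshot and
 * return the offset of a matching DVSEC. Constants mirror
 * PCI_EXT_CAP_ID_DVSEC (0x23), PCI_VENDOR_ID_CXL (0x1e98) and
 * CXL_DVSEC_PCIE_DEVICE (0).
 */
#include <stddef.h>
#include <stdint.h>

#define CFG_SPACE_SIZE    4096
#define EXT_CAP_START     0x100
#define EXT_CAP_ID_DVSEC  0x0023
#define VENDOR_ID_CXL     0x1e98
#define DVSEC_PCIE_DEVICE 0x0000

static uint32_t cfg_read32(const uint8_t *cfg, size_t off)
{
	/* PCI configuration space is little-endian */
	return (uint32_t)cfg[off] | (uint32_t)cfg[off + 1] << 8 |
	       (uint32_t)cfg[off + 2] << 16 | (uint32_t)cfg[off + 3] << 24;
}

/* Returns the DVSEC offset, or 0 if no matching DVSEC exists. */
static uint16_t find_dvsec(const uint8_t *cfg, uint16_t vendor, uint16_t dvsec_id)
{
	size_t pos = EXT_CAP_START;
	int ttl = (CFG_SPACE_SIZE - EXT_CAP_START) / 8; /* guard against loops */

	while (pos >= EXT_CAP_START && pos + 12 <= CFG_SPACE_SIZE && ttl-- > 0) {
		uint32_t hdr = cfg_read32(cfg, pos);

		if (hdr == 0 || hdr == 0xffffffff)
			break;
		if ((hdr & 0xffff) == EXT_CAP_ID_DVSEC) {
			uint32_t hdr1 = cfg_read32(cfg, pos + 4); /* vendor: [15:0] */
			uint32_t hdr2 = cfg_read32(cfg, pos + 8); /* DVSEC ID: [15:0] */

			if ((hdr1 & 0xffff) == vendor && (hdr2 & 0xffff) == dvsec_id)
				return (uint16_t)pos;
		}
		pos = hdr >> 20; /* next capability offset, 0 ends the list */
	}
	return 0;
}
```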
>
> I am losing my nerve with the cxl_acquire_endpoint() approach. Now that
> I see how this driver tried to use it and the questions it generated, it
> pushes too much complexity to leaf drivers. In the end, I want to
> (inspired by faux_device) get to the point where the caller can assume
> that successful devm_cxl_add_memdev() means that CXL is operational and
> any non-interleaved CXL regions have finished auto-assembly/creation.
OK
>
> To get there this needs Terry's patches that set pdev->is_cxl on all
> ancestor devices in order to make a determination that the hardware-CXL
> link is up before going to flush software CXL-link establishment.
I have commented on that. If it is extended to check the CXL.io status, it
is easy to add here, and I do not think anything else is needed. I do not
mean that your changes to make this easier to use for leaf drivers are not
needed, but I think they address a different and more complex issue.
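Here is a userspace sketch of the ancestor check described above. It assumes that every device on the path carries an is_cxl-style flag. The struct and function names are hypothetical stand-ins for struct pci_dev and pdev->is_cxl:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device in the PCI hierarchy. */
struct model_dev {
	const struct model_dev *parent; /* NULL at the top of the chain */
	bool is_cxl;                    /* stand-in for pdev->is_cxl */
};

/* The hardware CXL path counts as up only if every ancestor is CXL. */
static bool cxl_path_is_up(const struct model_dev *dev)
{
	for (; dev; dev = dev->parent)
		if (!dev->is_cxl)
			return false;
	return true;
}
```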
>> Adding documentation about all this would definitely help, even without
>> the Type2 case.
> I would ask that you help Terry get the protocol error handling series
> in shape as part of the dependency here is to make sure that there is a
> capable error model for CXL link events.
Sure. I'll help as much as possible there. I reviewed some parts in
previous versions, mainly those I properly understand, but I'll try to
review all of it over the next few days.
>
> Meanwhile, I am going to rework devm_cxl_add_memdev() to make it report
> when CXL port arrival is deferred, permanently failed, or successful.
Already studying those changes.
Thanks!
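Suppose devm_cxl_add_memdev() ends up reporting the three outcomes mentioned above. An accelerator driver might then map them to an operating mode roughly as follows. The helper, the retry budget and the enum are hypothetical. EPROBE_DEFER is defined locally with the kernel's internal value (517):

```c
#include <errno.h>

#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517 /* kernel-internal errno, not exported to userspace */
#endif

enum accel_cxl_mode { ACCEL_CXL_ENABLED, ACCEL_PCIE_ONLY, ACCEL_RETRY_LATER };

/*
 * @rc: hypothetical result of adding the memdev (0, -EPROBE_DEFER or error)
 * @defer_count: how many times probe has already been deferred
 * @max_defers: budget after which the driver gives up on CXL arriving
 */
static enum accel_cxl_mode accel_cxl_outcome(int rc, int defer_count,
					     int max_defers)
{
	if (rc == 0)
		return ACCEL_CXL_ENABLED;  /* CXL operational: enable CXL features */
	if (rc == -EPROBE_DEFER && defer_count < max_defers)
		return ACCEL_RETRY_LATER;  /* port topology may still arrive */
	return ACCEL_PCIE_ONLY;            /* give up on CXL, keep the PCIe datapath */
}
```

The explicit budget is only one possible answer to the question of when to give up on probe deferral.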
* Re: [PATCH v17 11/22] cxl: Define a driver interface for HPA free space enumeration
2025-08-05 16:14 ` dan.j.williams
@ 2025-08-11 12:04 ` Alejandro Lucero Palau
0 siblings, 0 replies; 112+ messages in thread
From: Alejandro Lucero Palau @ 2025-08-11 12:04 UTC (permalink / raw)
To: dan.j.williams, alejandro.lucero-palau, linux-cxl, netdev,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Jonathan Cameron
On 8/5/25 17:14, dan.j.williams@intel.com wrote:
> alejandro.lucero-palau@ wrote:
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> CXL region creation involves allocating capacity from device DPA
>> (device-physical-address space) and assigning it to decode a given HPA
>> (host-physical-address space). Before determining how much DPA to
>> allocate the amount of available HPA must be determined. Also, not all
>> HPA is created equal, some specifically targets RAM, some target PMEM,
>> some is prepared for device-memory flows like HDM-D and HDM-DB, and some
>> is host-only (HDM-H).
>>
>> In order to support Type2 CXL devices, wrap all of those concerns into
>> an API that retrieves a root decoder (platform CXL window) that fits the
>> specified constraints and the capacity available for a new region.
>>
>> Add a complementary function for releasing the reference to such root
>> decoder.
>>
>> Based on https://lore.kernel.org/linux-cxl/168592159290.1948938.13522227102445462976.stgit@dwillia2-xfh.jf.intel.com/
>>
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> ---
>> drivers/cxl/core/region.c | 169 ++++++++++++++++++++++++++++++++++++++
>> drivers/cxl/cxl.h | 3 +
>> include/cxl/cxl.h | 11 +++
>> 3 files changed, 183 insertions(+)
>>
>> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
>> index c3f4dc244df7..03e058ab697e 100644
>> --- a/drivers/cxl/core/region.c
>> +++ b/drivers/cxl/core/region.c
>> @@ -695,6 +695,175 @@ static int free_hpa(struct cxl_region *cxlr)
>> return 0;
>> }
>>
>> +struct cxlrd_max_context {
>> + struct device * const *host_bridges;
>> + int interleave_ways;
>> + unsigned long flags;
>> + resource_size_t max_hpa;
>> + struct cxl_root_decoder *cxlrd;
>> +};
>> +
>> +static int find_max_hpa(struct device *dev, void *data)
>> +{
>> + struct cxlrd_max_context *ctx = data;
>> + struct cxl_switch_decoder *cxlsd;
>> + struct cxl_root_decoder *cxlrd;
>> + struct resource *res, *prev;
>> + struct cxl_decoder *cxld;
>> + resource_size_t max;
>> + int found = 0;
>> +
>> + if (!is_root_decoder(dev))
>> + return 0;
>> +
>> + cxlrd = to_cxl_root_decoder(dev);
>> + cxlsd = &cxlrd->cxlsd;
>> + cxld = &cxlsd->cxld;
>> +
>> + /*
>> + * Flags are single unsigned longs. As CXL_DECODER_F_MAX is less than
>> + * 32 bits, the bitmap functions can be used.
>> + */
> Comments are supposed to explain the code, not repeat the code in
> natural language.
>
>> + if (!bitmap_subset(&ctx->flags, &cxld->flags, CXL_DECODER_F_MAX)) {
>> + dev_dbg(dev, "flags not matching: %08lx vs %08lx\n",
>> + cxld->flags, ctx->flags);
>> + return 0;
>> + }
> How is this easier to read than:
>
> if ((cxld->flags & ctx->flags) != ctx->flags)
> return 0;
>
> ?
I think it is not!
I'll simplify the code with your suggestion.
Thanks!
(more comments below)
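For single-word flag sets, the two forms are equivalent. Every flag the caller requests must also be set on the root decoder, and extra decoder capabilities are allowed. Here is a standalone sketch (the function name is illustrative):

```c
#include <stdbool.h>

/* True when every bit in @want is also set in @have. */
static bool flags_satisfy(unsigned long have, unsigned long want)
{
	return (have & want) == want;
}
```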
>
>> +
>> + for (int i = 0; i < ctx->interleave_ways; i++) {
>> + for (int j = 0; j < ctx->interleave_ways; j++) {
>> + if (ctx->host_bridges[i] == cxlsd->target[j]->dport_dev) {
>> + found++;
>> + break;
>> + }
>> + }
>> + }
>> +
>> + if (found != ctx->interleave_ways) {
>> + dev_dbg(dev,
>> + "Not enough host bridges. Found %d for %d interleave ways requested\n",
>> + found, ctx->interleave_ways);
>> + return 0;
>> + }
>> +
>> + /*
>> + * Walk the root decoder resource range relying on cxl_region_rwsem to
>> + * preclude sibling arrival/departure and find the largest free space
>> + * gap.
>> + */
>> + lockdep_assert_held_read(&cxl_region_rwsem);
>> + res = cxlrd->res->child;
>> +
>> + /* With no resource child the whole parent resource is available */
>> + if (!res)
>> + max = resource_size(cxlrd->res);
>> + else
>> + max = 0;
>> +
>> + for (prev = NULL; res; prev = res, res = res->sibling) {
>> + struct resource *next = res->sibling;
>> + resource_size_t free = 0;
>> +
>> + /*
>> + * Sanity check for preventing arithmetic problems below as a
>> + * resource with size 0 could imply using the end field below
>> + * when set to unsigned zero - 1 or all f in hex.
>> + */
>> + if (prev && !resource_size(prev))
>> + continue;
>> +
>> + if (!prev && res->start > cxlrd->res->start) {
>> + free = res->start - cxlrd->res->start;
>> + max = max(free, max);
>> + }
>> + if (prev && res->start > prev->end + 1) {
>> + free = res->start - prev->end + 1;
>> + max = max(free, max);
>> + }
>> + if (next && res->end + 1 < next->start) {
>> + free = next->start - res->end + 1;
>> + max = max(free, max);
>> + }
>> + if (!next && res->end + 1 < cxlrd->res->end + 1) {
>> + free = cxlrd->res->end + 1 - res->end + 1;
>> + max = max(free, max);
>> + }
>> + }
> With the benefit of time to reflect, and looking at this again after all
> this time it strikes me that it is simply duplicating
> get_free_mem_region() and in a way that can still fail later.
>
> Does it simplify the implementation if this just attempts to
> allocate the capacity in each window that might support the mapping
> constraints and then pass that allocation to the region construction
> routine?
>
> Otherwise, this completes a survey of the capacity that is not
> guaranteed to be present when the region finally gets allocated.
If we use alloc_free_mem_region(), the resource is reserved, so it will not
fail later.
But this requires a major change to the current approach: if we keep
trying to get a suitable root decoder, the one with the largest available
HPA, we need to release the previous allocation each time we obtain a new
one. Then, because the allocated DPA will likely be smaller, another
release will be needed later on. I would say it is not going to simplify
things.
IMO, although such a change makes sense and will be needed once CXL is
hopefully massively deployed, the risk of the surveyed HPA capacity no
longer being available when the region is finally created is quite low at
this moment. The function documents the potential problem as well, so I
would prefer not to aim for the perfect implementation at this point, and
to leave such an improvement for follow-up work, which I will be happy to
do.
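Here is a userspace sketch of the free-gap survey. It assumes sorted, non-overlapping children with inclusive ends, as in struct resource. With inclusive ends, the gap between neighbours is next->start - (prev->end + 1). By that arithmetic, expressions of the form res->start - prev->end + 1 in the quoted hunk appear to overcount each gap by two. Like the quoted code, this sketch only surveys capacity and does not reserve it:

```c
#include <stddef.h>
#include <stdint.h>

/* Inclusive range, like struct resource; assumes end < UINT64_MAX. */
struct range_incl { uint64_t start, end; };

static uint64_t largest_free_gap(struct range_incl parent,
				 const struct range_incl *child, size_t n)
{
	uint64_t cursor = parent.start; /* first address not known to be used */
	uint64_t max = 0;

	for (size_t i = 0; i < n; i++) {
		/* gap before this child: [cursor, child.start - 1] */
		if (child[i].start > cursor && child[i].start - cursor > max)
			max = child[i].start - cursor;
		if (child[i].end + 1 > cursor)
			cursor = child[i].end + 1;
	}
	/* tail gap: [cursor, parent.end] */
	if (parent.end + 1 > cursor && parent.end + 1 - cursor > max)
		max = parent.end + 1 - cursor;
	return max;
}
```

For example, a parent [0x1000, 0x1fff] with children [0x1000, 0x10ff] and [0x1400, 0x14ff] has free gaps of 0x300 and 0xb00 bytes, and the sketch returns 0xb00.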
* Re: [PATCH v17 12/22] sfc: get endpoint decoder
2025-07-28 16:30 ` dan.j.williams
@ 2025-08-11 14:24 ` Alejandro Lucero Palau
2025-09-02 7:11 ` Alejandro Lucero Palau
0 siblings, 1 reply; 112+ messages in thread
From: Alejandro Lucero Palau @ 2025-08-11 14:24 UTC (permalink / raw)
To: dan.j.williams, alejandro.lucero-palau, linux-cxl, netdev,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Martin Habets, Edward Cree, Jonathan Cameron
On 7/28/25 17:30, dan.j.williams@intel.com wrote:
> alejandro.lucero-palau@ wrote:
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> Use cxl api for getting DPA (Device Physical Address) to use through an
>> endpoint decoder.
>>
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>> Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
>> Acked-by: Edward Cree <ecree.xilinx@gmail.com>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> ---
>> drivers/net/ethernet/sfc/Kconfig | 1 +
>> drivers/net/ethernet/sfc/efx_cxl.c | 32 +++++++++++++++++++++++++++++-
>> 2 files changed, 32 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/sfc/Kconfig b/drivers/net/ethernet/sfc/Kconfig
>> index 979f2801e2a8..e959d9b4f4ce 100644
>> --- a/drivers/net/ethernet/sfc/Kconfig
>> +++ b/drivers/net/ethernet/sfc/Kconfig
>> @@ -69,6 +69,7 @@ config SFC_MCDI_LOGGING
>> config SFC_CXL
>> bool "Solarflare SFC9100-family CXL support"
>> depends on SFC && CXL_BUS >= SFC
>> + depends on CXL_REGION
>> default SFC
>> help
>> This enables SFC CXL support if the kernel is configuring CXL for
>> diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
>> index e2d52ed49535..c0adfd99cc78 100644
>> --- a/drivers/net/ethernet/sfc/efx_cxl.c
>> +++ b/drivers/net/ethernet/sfc/efx_cxl.c
>> @@ -22,6 +22,7 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
>> {
>> struct efx_nic *efx = &probe_data->efx;
>> struct pci_dev *pci_dev = efx->pci_dev;
>> + resource_size_t max_size;
>> struct efx_cxl *cxl;
>> u16 dvsec;
>> int rc;
>> @@ -86,13 +87,42 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
>> return PTR_ERR(cxl->cxlmd);
>> }
>>
>> + cxl->endpoint = cxl_acquire_endpoint(cxl->cxlmd);
>> + if (IS_ERR(cxl->endpoint))
>> + return PTR_ERR(cxl->endpoint);
> Between Terry's set, the soft reserve set, and now this, it has become
> clearer that the cxl_core needs a centralized solution to the questions
> of:
>
> - Does the platform have CXL and if so might a device ever successfully
> complete cxl_mem_probe() for a cxl_memdev that it registered?
>
> - When can a driver assume that no cxl_port topology is going to arrive?
> I.e. when to give up on probe deferral.
Hi Dan,
I think your concern is valid, but I think we are mixing things up, or
maybe it is just me getting confused, so let me explain myself.
We have different situations to be aware of:
1) CXL topology not there or not properly configured yet.
2) accelerator relying on PCIe instead of CXL.io
3) potential removal of cxl_mem, cxl_acpi or cxl_port
4) cxl initialization failing due to dynamic modules dependencies
5) CXL errors
I think your patches in the cxl-probe-order branch will hopefully fix
the last situation.
About 2: as I have commented in another patch review in this series,
it is possible to check for this and preclude further CXL initialization.
This is the latest concern you have raised, and it is valid, but in my
understanding the proposal in those patches does not address it, although
it is still useful for 4.
About 3: the only available protection is partial: during initialization
with cxl_acquire, and after initialization with the callback introduced in
patch 18, which you do not like. I think we agreed those modules should not
be allowed to be removed, and supporting that requires follow-up work in
the cxl core.
Regarding 5, I think Terry's patchset introduces the proper handling for
this, or at least some initial work which will surely require adjustments.
Then we have the first situation, which I admit is the most confusing (at
least to me). If we can solve the problem of proper initialization based
on the probe() calls of the CXL devices involved, then any other mechanism
specifically for dealing with this situation requires further explanation
and, I guess, documentation.
AFAIK, the BIOS will perform a good deal of CXL initialization (BTW, I
think we should discuss this at some point so that we share the same
expectations about what is done, how, and when), and then the kernel CXL
initialization will perform its own part based on what the BIOS provides.
That implies registering CXL root ports, downstream/upstream CXL ports,
switches, and so on. If I am not wrong, that happens at the
subsys_initcall level, and therefore earlier than any accelerator driver
initialization. Am I right in assuming that once those modules are done,
the kernel CXL topology/infrastructure is ready for an accelerator to
initialize its CXL functionality? If not, what is the problem or problems?
Is it due to longer-than-expected hardware initialization by the kernel?
If so, could that be left to the BIOS somehow? Is it due to some
asynchronous initialization that is impossible to avoid or to be certain
of? If so, can we document it?
I understand that CXL could/will bring complex topologies where
initialization by a single host may not be possible without synchronizing
with other hosts or CXL infrastructure. Is this what all this is about?
> It is also clear that a class of CXL accelerator drivers would be
> served by a simple shared routine to autocreate a region.
>
> I am going to take a stab at refactoring the current classmem case into
> a scheme that resolves automatic region assembly at
> devm_cxl_add_memdev() time in a way that can be reused to solve this
> automatic region creation problem.
>
Not sure I follow you here. But in any case, do you consider that
necessary for this initial Type2 support?
* Re: [PATCH v17 20/22] sfc: create cxl region
2025-07-28 16:20 ` dan.j.williams
@ 2025-08-11 14:38 ` Alejandro Lucero Palau
0 siblings, 0 replies; 112+ messages in thread
From: Alejandro Lucero Palau @ 2025-08-11 14:38 UTC (permalink / raw)
To: dan.j.williams, alejandro.lucero-palau, linux-cxl, netdev,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
On 7/28/25 17:20, dan.j.williams@intel.com wrote:
> alejandro.lucero-palau@ wrote:
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> Use cxl api for creating a region using the endpoint decoder related to
>> a DPA range.
>>
>> Add a callback for unwinding sfc cxl initialization when the endpoint port
>> is destroyed by potential cxl_acpi or cxl_mem modules removal.
>>
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>> ---
>> drivers/net/ethernet/sfc/efx_cxl.c | 24 +++++++++++++++++++++++-
>> 1 file changed, 23 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
>> index ffbf0e706330..7365effe974e 100644
>> --- a/drivers/net/ethernet/sfc/efx_cxl.c
>> +++ b/drivers/net/ethernet/sfc/efx_cxl.c
>> @@ -18,6 +18,16 @@
>>
>> #define EFX_CTPIO_BUFFER_SIZE SZ_256M
>>
>> +static void efx_release_cxl_region(void *priv_cxl)
>> +{
>> + struct efx_probe_data *probe_data = priv_cxl;
>> + struct efx_cxl *cxl = probe_data->cxl;
>> +
>> + iounmap(cxl->ctpio_cxl);
> There is no synchronization here. If someone unbinds a cxl_port
> while the driver is using @ctpio_cxl, it looks like it will cause a crash.
Yes, the unmapping should happen after changing cxl_pio_initialised. I will
fix it if this mechanism stays.
>
> The loss of CXL connectivity after the driver has already committed to
> it likely means that the whole driver needs to be shutdown, not just
> this region cleanup.
What I am trying to handle here is not a CXL error but the removal of CXL
modules. If the latter happens, the driver will keep using the other
datapaths, just not the CXL PIO memory.
As for a CXL error, I really do not know what the right thing to do is
here. If further CXL.mem writes after such an error are not problematic,
then this is enough. If not, I'm afraid we cannot safely deal with this,
since the host/driver will be notified too late.
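Here is a hypothetical sketch of the ordering agreed above: clear the flag, then unmap. It adds an in-flight counter so that teardown also waits for users already inside the CXL PIO path. The struct and function names are illustrative, not the sfc driver's. A kernel implementation would more likely use an rwsem, SRCU or a percpu_ref than a spin-wait:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct pio_window {
	atomic_bool initialised; /* stand-in for cxl_pio_initialised */
	atomic_int inflight;     /* users currently touching @map */
	void *map;               /* stand-in for the ioremap()ed ctpio_cxl */
};

/* Reader side: returns false if the CXL PIO path must not be used. */
static bool pio_get(struct pio_window *w)
{
	atomic_fetch_add(&w->inflight, 1);
	if (!atomic_load(&w->initialised)) {
		atomic_fetch_sub(&w->inflight, 1);
		return false; /* fall back to the non-CXL datapath */
	}
	return true;
}

static void pio_put(struct pio_window *w)
{
	atomic_fetch_sub(&w->inflight, 1);
}

/* Teardown side, e.g. from a region release callback. */
static void pio_teardown(struct pio_window *w, void (*unmap)(void *))
{
	atomic_store(&w->initialised, false);  /* 1: stop new users */
	while (atomic_load(&w->inflight) != 0) /* 2: drain existing users */
		;
	unmap(w->map);                         /* 3: now safe to unmap */
	w->map = NULL;
}
```

The default sequentially consistent atomics make this safe. If a reader observes initialised == true, its increment precedes the teardown's store, so the drain loop sees it.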
* Re: [PATCH v17 18/22] cxl: Allow region creation by type2 drivers
2025-08-05 16:33 ` dan.j.williams
@ 2025-08-11 14:45 ` Alejandro Lucero Palau
0 siblings, 0 replies; 112+ messages in thread
From: Alejandro Lucero Palau @ 2025-08-11 14:45 UTC (permalink / raw)
To: dan.j.williams, alejandro.lucero-palau, linux-cxl, netdev,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Jonathan Cameron
On 8/5/25 17:33, dan.j.williams@intel.com wrote:
> alejandro.lucero-palau@ wrote:
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> Creating a CXL region requires userspace intervention through the cxl
>> sysfs files. Type2 support should allow accelerator drivers to create
>> such cxl region from kernel code.
>>
>> Adding that functionality and integrating it with current support for
>> memory expanders.
>>
>> Support an action by the type2 driver to be linked to the created region
>> for unwinding the resources allocated properly.
> The hardest part of CXL is the fact that typical straight-line driver
> expectations like "device present == MMIO available" are violated. An
> accelerator driver needs to worry about asynchronous region detach and
> CXL port detach.
>
> Ideally any event that takes down a CXL port or the region simply
> results in the accelerator driver being detached to clean everything up.
>
> The difficult part about that is that the remove path for regions and
> CXL ports hold locks that prevent the accelerator remove path from
> running.
>
> I do not think it is maintainable for every accelerator driver to invent
> its own cleanup scheme like this. The expectation should be that a
> region can go into a defunct state if someone triggers removal actions
> in the wrong order, but otherwise the accelerator driver should be able
> to rely on a detach event to clean everything up.
>
> So opting into CXL operation puts a driver into a situation where it can
> be unbound whenever the CXL link goes down logically or physically.
> Physical device removal of a CXL port expects that the operator has
> first shut down all driver operations, and if they have not, at least the
> driver should not crash while awaiting the remove event.
>
> Physical CXL port removal is the "easy" case since that will naturally
> result in the accelerator 'struct pci_dev' being removed. The more
> difficult cases are the logical removal / shutdown of a CXL port or
> region. Those should schedule accelerator detach and put the region into
> an error state until that cleanup runs.
>
> So, in summary, do not allow for custom region callbacks, arrange for
> accelerator detach and just solve the "fail in-flight operations while
> awaiting detach" problem.
>
This is similar to what I mentioned in patch 20: the idea is to handle
CXL module removal, not CXL errors.
But, taking up the error case, if the error is:
1) not fatal and related to CXL.mem: the sfc driver can keep working
without the CXL PIO memory as a datapath.
2) not fatal and related to CXL.io: I see this as similar to the current
PCIe error handling.
3) fatal and CXL.mem related: there is probably nothing safe to do, or it
is too late.
4) fatal and CXL.io related: not sure if this is similar to 2 or to 3.
* Re: [PATCH v17 00/22] Type2 device basic support
2025-06-24 14:13 [PATCH v17 00/22] Type2 device basic support alejandro.lucero-palau
` (22 preceding siblings ...)
2025-07-25 20:51 ` [PATCH v17 00/22] Type2 device basic support dan.j.williams
@ 2025-08-27 16:48 ` PJ Waskiewicz
2025-08-28 8:02 ` Alejandro Lucero Palau
23 siblings, 1 reply; 112+ messages in thread
From: PJ Waskiewicz @ 2025-08-27 16:48 UTC (permalink / raw)
To: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero
On Tue, 2025-06-24 at 15:13 +0100, alejandro.lucero-palau@amd.com
wrote:
Hi Alejandro,
> From: Alejandro Lucero <alucerop@amd.com>
>
> v17 changes: (Dan Williams review)
> - use devm for cxl_dev_state allocation
> - using current cxl struct for checking capability registers found by
> the driver.
> - simplify dpa initialization without a mailbox not supporting pmem
> - add cxl_acquire_endpoint for protection during initialization
> - add callback/action to cxl_create_region for a driver notified about cxl
> core kernel modules removal.
> - add sfc function to disable CXL-based PIO buffers if such a callback
> is invoked.
> - Always manage a Type2 created region as private not allowing DAX.
>
I've been following the patches here since your initial RFC. What
platform are you testing these on out of curiosity?
I've tried pulling the v16 patches into my test environment, and on CXL
2.0 hosts that I have access to, the patches did not work when trying
to hook up a Type 2 device. Most of it centered around many of the CXL
host registers you try poking not existing. I do have CXL-capable BIOS
firmware on these hosts, but I'm questioning whether there's still
missing firmware, or whether the patches are trying to touch something that
doesn't exist.
I'm working on rebasing to the v17 patches to see if this resolves what
I'm seeing. But it's a bit of a lift, so I figured I'd ask what you're
testing on before burning more time.
Eventually I'd like to either give a Tested-by or shoot back some
amended patches based on testing. But I've not been able to get that
far yet...
Cheers,
-PJ
* Re: [PATCH v17 22/22] sfc: support pio mapping based on cxl
2025-06-24 14:13 ` [PATCH v17 22/22] sfc: support pio mapping based on cxl alejandro.lucero-palau
2025-06-27 9:46 ` Jonathan Cameron
@ 2025-08-27 17:26 ` ALOK TIWARI
1 sibling, 0 replies; 112+ messages in thread
From: ALOK TIWARI @ 2025-08-27 17:26 UTC (permalink / raw)
To: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero
On 6/24/2025 7:43 PM, alejandro.lucero-palau@amd.com wrote:
> + rc = cxl_get_region_range(cxl->efx_region, &range);
> + if (rc) {
> + pci_err(pci_dev, "CXL getting regions params failed");
> + goto err_region_params;
> + }
> +
> + cxl->ctpio_cxl = ioremap(range.start, range.end - range.start + 1);
> + if (!cxl->ctpio_cxl) {
> + pci_err(pci_dev, "CXL ioremap region (%pra) pfailed", &range);
pfailed -> failed
> + rc = -ENOMEM;
> + goto err_region_params;
> + }
> +
Thanks,
Alok
* Re: [PATCH v17 00/22] Type2 device basic support
2025-08-27 16:48 ` PJ Waskiewicz
@ 2025-08-28 8:02 ` Alejandro Lucero Palau
2025-09-04 17:48 ` PJ Waskiewicz
0 siblings, 1 reply; 112+ messages in thread
From: Alejandro Lucero Palau @ 2025-08-28 8:02 UTC (permalink / raw)
To: PJ Waskiewicz, alejandro.lucero-palau, linux-cxl, netdev,
dan.j.williams, edward.cree, davem, kuba, pabeni, edumazet,
dave.jiang
Hi PJ,
On 8/27/25 17:48, PJ Waskiewicz wrote:
> On Tue, 2025-06-24 at 15:13 +0100, alejandro.lucero-palau@amd.com
> wrote:
>
> Hi Alejandro,
>
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> v17 changes: (Dan Williams review)
>> - use devm for cxl_dev_state allocation
>> - using current cxl struct for checking capability registers found by
>> the driver.
>> - simplify dpa initialization without a mailbox not supporting pmem
>> - add cxl_acquire_endpoint for protection during initialization
>> - add callback/action to cxl_create_region for a driver notified about cxl
>> core kernel modules removal.
>> - add sfc function to disable CXL-based PIO buffers if such a callback
>> is invoked.
>> - Always manage a Type2 created region as private not allowing DAX.
>>
> I've been following the patches here since your initial RFC. What
> platform are you testing these on out of curiosity?
Most of the work was done with QEMU. Nowadays, I have several systems
with CXL support and Type2 BIOS support, so it has been successfully
tested there as well.
> I've tried pulling the v16 patches into my test environment, and on CXL
> 2.0 hosts that I have access to, the patches did not work when trying
> to hook up a Type 2 device. Most of it centered around many of the CXL
> host registers you try poking not existing.
Can you share the system logs, ideally from a run with CXL debugging enabled?
> I do have CXL-capable BIOS
> firmware on these hosts, but I'm questioning whether there's still
> missing firmware, or whether the patches are trying to touch something that
> doesn't exist.
May I ask which system you are using? ARM/Intel/AMD/surpriseme? The
output of lspci -vvv would also be useful. I did find some issues with how
the BIOS we got does things, which I will share and work on if that turns
out to be a valid case and not a BIOS problem.
>
> I'm working on rebasing to the v17 patches to see if this resolves what
> I'm seeing. But it's a bit of a lift, so I figured I'd ask what you're
> testing on before burning more time.
>
> Eventually I'd like to either give a Tested-by or shoot back some
> amended patches based on testing. But I've not been able to get that
> far yet...
That would be really good. Let's see if we can figure out what the
problem is there.
Thank you
> Cheers,
> -PJ
* Re: [PATCH v17 12/22] sfc: get endpoint decoder
2025-08-11 14:24 ` Alejandro Lucero Palau
@ 2025-09-02 7:11 ` Alejandro Lucero Palau
0 siblings, 0 replies; 112+ messages in thread
From: Alejandro Lucero Palau @ 2025-09-02 7:11 UTC (permalink / raw)
To: dan.j.williams, alejandro.lucero-palau, linux-cxl, netdev,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Martin Habets, Edward Cree, Jonathan Cameron
Hi all,
I just want to refresh the discussion below. There were no replies to
my comment, and I would like to hear people's comments, since I consider
this the main obstacle before sending v18.
Thanks
On 8/11/25 15:24, Alejandro Lucero Palau wrote:
>
> On 7/28/25 17:30, dan.j.williams@intel.com wrote:
>> alejandro.lucero-palau@ wrote:
>>> From: Alejandro Lucero <alucerop@amd.com>
>>>
>>> Use cxl api for getting DPA (Device Physical Address) to use through an
>>> endpoint decoder.
>>>
>>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>>> Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
>>> Acked-by: Edward Cree <ecree.xilinx@gmail.com>
>>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>>> ---
>>> drivers/net/ethernet/sfc/Kconfig | 1 +
>>> drivers/net/ethernet/sfc/efx_cxl.c | 32 +++++++++++++++++++++++++++++-
>>> 2 files changed, 32 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/net/ethernet/sfc/Kconfig b/drivers/net/ethernet/sfc/Kconfig
>>> index 979f2801e2a8..e959d9b4f4ce 100644
>>> --- a/drivers/net/ethernet/sfc/Kconfig
>>> +++ b/drivers/net/ethernet/sfc/Kconfig
>>> @@ -69,6 +69,7 @@ config SFC_MCDI_LOGGING
>>> config SFC_CXL
>>> bool "Solarflare SFC9100-family CXL support"
>>> depends on SFC && CXL_BUS >= SFC
>>> + depends on CXL_REGION
>>> default SFC
>>> help
>>> This enables SFC CXL support if the kernel is configuring CXL for
>>> diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
>>> index e2d52ed49535..c0adfd99cc78 100644
>>> --- a/drivers/net/ethernet/sfc/efx_cxl.c
>>> +++ b/drivers/net/ethernet/sfc/efx_cxl.c
>>> @@ -22,6 +22,7 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
>>> {
>>> struct efx_nic *efx = &probe_data->efx;
>>> struct pci_dev *pci_dev = efx->pci_dev;
>>> + resource_size_t max_size;
>>> struct efx_cxl *cxl;
>>> u16 dvsec;
>>> int rc;
>>> @@ -86,13 +87,42 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
>>> return PTR_ERR(cxl->cxlmd);
>>> }
>>> + cxl->endpoint = cxl_acquire_endpoint(cxl->cxlmd);
>>> + if (IS_ERR(cxl->endpoint))
>>> + return PTR_ERR(cxl->endpoint);
>> Between Terry's set, the soft reserve set, and now this, it has become
>> clearer that the cxl_core needs a centralized solution to the questions
>> of:
>>
>> - Does the platform have CXL and if so might a device ever successfully
>> complete cxl_mem_probe() for a cxl_memdev that it registered?
>>
>> - When can a driver assume that no cxl_port topology is going to arrive?
>> I.e. when to give up on probe deferral.
>
>
> Hi Dan,
>
> I think your concern is valid, but I think we are mixing things up, or
> maybe it is just me getting confused, so let me explain myself.
>
> We have different situations to be aware of:
>
>
> 1) CXL topology not there or not properly configured yet.
>
> 2) accelerator relying on PCIe instead of CXL.io
>
> 3) potential removal of cxl_mem, cxl_acpi or cxl_port
>
> 4) cxl initialization failing due to dynamic modules dependencies
>
> 5) CXL errors
>
>
> I think your patches in the cxl-probe-order branch will hopefully fix
> the last situation.
>
> About 2: as I have commented in another patch review in this
> series, it is possible to check for this and preclude further CXL
> initialization. This is the latest concern you have raised, and it is
> valid, but in my understanding the proposal in those patches does not
> address it, although it is still useful for 4.
>
> About 3: the only available protection is partial: during
> initialization with cxl_acquire, and after initialization with the
> callback introduced in patch 18, which you do not like. I think we
> agreed those modules should not be allowed to be removed, and
> supporting that requires follow-up work in the cxl core.
>
> Regarding 5, I think Terry's patchset introduces the proper handling
> for this, or at least some initial work which will surely require
> adjustments.
>
> Then we have the first situation, which I admit is the most confusing
> (at least to me). If we can solve proper initialization based on the
> probe() calls of the cxl devices we need to work with, then any other
> mechanism for specifically dealing with this situation requires
> further explanation and, I guess, documentation.
>
> AFAIK, the BIOS performs a good deal of CXL initialization (BTW, I
> think we should discuss this as well at some point, so we share the
> same expectations about what is done, how, and when), and then the
> kernel CXL initialization performs its own part based on what the BIOS
> has given it. That implies registering CXL root ports,
> downstream/upstream cxl ports, switches, ... If I am not wrong, that
> depends on the subsys_initcall level, and therefore happens earlier
> than any accelerator driver initialization. Am I right in assuming
> that once those modules are done, the kernel cxl
> topology/infrastructure is ready to deal with an accelerator
> initializing its cxl functionality? If not, what is the problem or
> problems? Is it due to longer than expected hardware initialization by
> the kernel? If so, could it not be left to the BIOS somehow? Is it due
> to some asynchronous initialization that is impossible to avoid or to
> be certain of? If so, can we document it?
>
> I understand that CXL could/will bring complex topologies where
> initialization by a single host may not be possible without
> synchronizing with other hosts or CXL infrastructure. Is this what all
> this is about?
>
>> It is also clear that a class of CXL accelerator drivers would be
>> served by a simple shared routine to autocreate a region.
>>
>> I am going to take a stab at refactoring the current classmem case into
>> a scheme that resolves automatic region assembly at
>> devm_cxl_add_memdev() time in a way that can be reused to solve this
>> automatic region creation problem.
>>
>
> Not sure I follow you here. But in any case, do you consider it
> necessary for this initial Type2 support?
>
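Dan's two questions quoted above (does the platform have CXL at all, and when should a driver stop deferring its probe) can be modeled as a small decision function. What follows is a minimal user-space sketch under stated assumptions: the inputs `platform_has_cxl`, `endpoint_ready` and `topology_settled`, the enum and both helpers are hypothetical and are not cxl_core API. EPROBE_DEFER is redefined locally because it is a kernel-internal errno value (517) that user-space headers do not export.

```c
#include <assert.h>
#include <stdbool.h>

/* Kernel-internal errno; not exported to user space, so redefined here. */
#define EPROBE_DEFER 517

/* Hypothetical outcome of an accelerator's attempt to set up CXL. */
enum cxl_setup_result {
	CXL_SETUP_OK,        /* endpoint found: continue with CXL.mem setup */
	CXL_SETUP_DEFER,     /* topology may still arrive: retry probe later */
	CXL_SETUP_PCIE_ONLY, /* give up on CXL and keep running as plain PCIe */
};

/*
 * Hypothetical inputs:
 *  platform_has_cxl - firmware describes a CXL host bridge/root at all
 *  endpoint_ready   - cxl_mem probe completed and the endpoint port exists
 *  topology_settled - no further cxl_port enumeration is expected
 */
static enum cxl_setup_result decide_cxl_setup(bool platform_has_cxl,
					      bool endpoint_ready,
					      bool topology_settled)
{
	if (!platform_has_cxl)
		return CXL_SETUP_PCIE_ONLY;
	if (endpoint_ready)
		return CXL_SETUP_OK;
	if (!topology_settled)
		return CXL_SETUP_DEFER;
	/* Topology is final and our endpoint never showed up. */
	return CXL_SETUP_PCIE_ONLY;
}

/* Map the decision onto a probe() return code. */
static int cxl_setup_to_errno(enum cxl_setup_result r)
{
	return r == CXL_SETUP_DEFER ? -EPROBE_DEFER : 0;
}
```

The hard input is the third one: as Dan points out, cxl_core has no central signal today that would tell a driver the topology is settled, so there is no principled point at which to stop returning -EPROBE_DEFER.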
* Re: [PATCH v17 15/22] cxl: Make region type based on endpoint type
2025-06-24 14:13 ` [PATCH v17 15/22] cxl: Make region type based on endpoint type alejandro.lucero-palau
@ 2025-09-03 17:20 ` Davidlohr Bueso
0 siblings, 0 replies; 112+ messages in thread
From: Davidlohr Bueso @ 2025-09-03 17:20 UTC (permalink / raw)
To: alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang, Alejandro Lucero, Zhi Wang,
Jonathan Cameron, Ben Cheatham
On Tue, 24 Jun 2025, alejandro.lucero-palau@amd.com wrote:
>From: Alejandro Lucero <alucerop@amd.com>
>
>Current code is expecting Type3 or CXL_DECODER_HOSTONLYMEM devices only.
>Support for Type2 implies region type needs to be based on the endpoint
s/Type2/HDM-D[B]
>type instead.
>
>Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>Reviewed-by: Zhi Wang <zhiw@nvidia.com>
>Reviewed-by: Dave Jiang <dave.jiang@intel.com>
>Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
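The idea behind this patch, that a region's decoder type follows the endpoint's device type rather than being assumed to be host-only memory, fits in a few lines. This is a minimal sketch, not the patch's code: the enum names mirror those used in drivers/cxl but are re-declared locally without their kernel values, and `region_type_for()` is a hypothetical helper.

```c
#include <assert.h>

/* Local stand-ins for the device and decoder type enums in drivers/cxl. */
enum cxl_devtype {
	CXL_DEVTYPE_DEVMEM,   /* Type2 accelerator memory: HDM-D[B] */
	CXL_DEVTYPE_CLASSMEM, /* Type3 class memory expander */
};

enum cxl_decoder_type {
	CXL_DECODER_DEVMEM,      /* device-coherent memory (HDM-D[B]) */
	CXL_DECODER_HOSTONLYMEM, /* host-only coherent memory (HDM-H) */
};

/*
 * Hypothetical helper: derive the region type from the endpoint instead
 * of treating every region as host-only memory.
 */
static enum cxl_decoder_type region_type_for(enum cxl_devtype type)
{
	return type == CXL_DEVTYPE_DEVMEM ? CXL_DECODER_DEVMEM
					  : CXL_DECODER_HOSTONLYMEM;
}
```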
* Re: [PATCH v17 19/22] cxl: Avoid dax creation for accelerators
2025-06-24 14:13 ` [PATCH v17 19/22] cxl: Avoid dax creation for accelerators alejandro.lucero-palau
2025-06-27 9:33 ` Jonathan Cameron
@ 2025-09-03 17:24 ` Davidlohr Bueso
1 sibling, 0 replies; 112+ messages in thread
From: Davidlohr Bueso @ 2025-09-03 17:24 UTC (permalink / raw)
To: alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang, Alejandro Lucero
On Tue, 24 Jun 2025, alejandro.lucero-palau@amd.com wrote:
>From: Alejandro Lucero <alucerop@amd.com>
>
>By definition, a Type2 CXL device uses the host-managed memory for
>specific functionality; therefore, it should not be available for other
>uses.
>
>Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
>---
> drivers/cxl/core/region.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
>diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
>index 4ca5ade54ad9..e933e4ebed1c 100644
>--- a/drivers/cxl/core/region.c
>+++ b/drivers/cxl/core/region.c
>@@ -3857,6 +3857,13 @@ static int cxl_region_probe(struct device *dev)
> if (rc)
> return rc;
>
>+ /*
>+ * HDM-D[B] (device-memory) regions have accelerator specific usage.
>+ * Skip device-dax registration.
>+ */
>+ if (cxlr->type == CXL_DECODER_DEVMEM)
>+ return 0;
>+
> switch (cxlr->mode) {
> case CXL_PARTMODE_PMEM:
> return devm_cxl_add_pmem_region(cxlr);
>--
>2.34.1
>
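The hunk's effect is ordering: the device-memory check runs before the switch on partition mode, so an HDM-D[B] region never reaches either registration path, whatever its mode. The following user-space model shows that ordering. `region_probe_model()` and `enum region_child` are hypothetical, and the RAM branch is simplified because the real cxl_region_probe() performs additional checks there before registering a dax region.

```c
#include <assert.h>

/* Local stand-ins for the kernel enums used by cxl_region_probe(). */
enum cxl_decoder_type { CXL_DECODER_DEVMEM, CXL_DECODER_HOSTONLYMEM };
enum cxl_partition_mode { CXL_PARTMODE_RAM, CXL_PARTMODE_PMEM };

/* What the probe would go on to register, in this simplified model. */
enum region_child {
	REGION_CHILD_NONE, /* accelerator-private: no dax or pmem device */
	REGION_CHILD_PMEM, /* stands in for devm_cxl_add_pmem_region() */
	REGION_CHILD_DAX,  /* stands in for the device-dax registration path */
};

static enum region_child region_probe_model(enum cxl_decoder_type type,
					    enum cxl_partition_mode mode)
{
	/* HDM-D[B] regions belong to the accelerator driver: skip dax. */
	if (type == CXL_DECODER_DEVMEM)
		return REGION_CHILD_NONE;

	switch (mode) {
	case CXL_PARTMODE_PMEM:
		return REGION_CHILD_PMEM;
	case CXL_PARTMODE_RAM:
	default:
		return REGION_CHILD_DAX;
	}
}
```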
* Re: [PATCH v17 00/22] Type2 device basic support
2025-08-28 8:02 ` Alejandro Lucero Palau
@ 2025-09-04 17:48 ` PJ Waskiewicz
0 siblings, 0 replies; 112+ messages in thread
From: PJ Waskiewicz @ 2025-09-04 17:48 UTC (permalink / raw)
To: Alejandro Lucero Palau, alejandro.lucero-palau, linux-cxl, netdev,
dan.j.williams, edward.cree, davem, kuba, pabeni, edumazet,
dave.jiang
Hi Alejandro,
Apologies for the late reply. I totally lost track of it during the US
holiday...
On Thu, 2025-08-28 at 09:02 +0100, Alejandro Lucero Palau wrote:
> Hi PJ,
>
> On 8/27/25 17:48, PJ Waskiewicz wrote:
> > On Tue, 2025-06-24 at 15:13 +0100, alejandro.lucero-palau@amd.com
> > wrote:
> >
> > Hi Alejandro,
> >
> > > From: Alejandro Lucero <alucerop@amd.com>
> > >
> > > v17 changes: (Dan Williams review)
> > > - use devm for cxl_dev_state allocation
> > > - using current cxl struct for checking capability registers found
> > >   by the driver.
> > > - simplify dpa initialization without a mailbox not supporting pmem
> > > - add cxl_acquire_endpoint for protection during initialization
> > > - add callback/action to cxl_create_region for a driver notified
> > >   about cxl core kernel modules removal.
> > > - add sfc function to disable CXL-based PIO buffers if such a
> > >   callback is invoked.
> > > - Always manage a Type2 created region as private not allowing DAX.
> > >
> > I've been following the patches here since your initial RFC. What
> > platform are you testing these on out of curiosity?
>
>
> Most of the work was done with qemu. Nowadays, I have several systems
> with CXL support and Type2 BIOS support, so it has been successfully
> tested there as well.
I also have a number of systems with Type2 support enabled in the BIOS,
spread between multiple uarch versions of Intel and AMD (EMR/GNR,
Genoa/Turin).
>
> > I've tried pulling the v16 patches into my test environment, and on
> > CXL 2.0 hosts that I have access to, the patches did not work when
> > trying to hook up a Type 2 device. Most of it centered around many of
> > the CXL host registers you try poking not existing.
>
>
> Can you share the system logs and maybe run it with CXL debugging on?
What system logs are you referring to? dmesg? Also what CXL
debugging? Just enabling the dev_dbg() paths for the CXL modules?
>
> > I do have CXL-capable BIOS firmware on these hosts, but I'm
> > questioning whether there's still missing firmware, or the patches
> > are trying to touch something that doesn't exist.
>
>
> May I ask which system you are using? ARM/Intel/AMD/surpriseme?
> lspci -vvv output would also be useful. I did find some issues with
> how the BIOS we got is doing things, something I will share and work
> on if that turns out to be a valid case and not a BIOS problem.
Lately I've been testing on an Intel GNR and an AMD Turin. Let's just
focus on the CRBs from both of them, so I have BIOSes directly from the
CPU vendors (there are other OEM vendors in the mix with the same
results, but we'll leave them out for now).
Our Type2 device successfully links/trains all CXL protocols, and it has
been working for some time on previous generations as well
(SPR/EMR/Genoa). I can't share the full lspci output because this is a
proprietary device, but the link caps show .mem and the other protocols
fully linked/trained. Our drivers also currently map the .mem
acceleration region directly.
What I'm running into happens very early in driver bring-up, when
migrating to the new API you have presented with the CXL core refactors.
In my driver's .probe() function (assume this is a pci_dev), the flow
begins as follows:
- pci_find_dvsec_capability() (returns the correct field pointer)
- cxl_dev_state_create(..., CXL_DEVTYPE_DEVMEM, ...) - succeeds
- cxl_pci_accel_setup_regs() - fails to detect accelerated registers
- cxl_mem_dpa_init()
- cxl_dpa_setup() - returns failure
This is where the wheels have already come off. Note that this is with
the V16 patches, so I'm not sure whether something was resolved between
those and the V17 patches. I'm working right now on getting the V17
patches running on my Purico Turin box. But if there's a specific BIOS I
would need to target for the Purico CRB, that would be useful
information to have as well. My Purico box is running BIOS Revision
5.33.
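The first step in the flow above, pci_find_dvsec_capability(), walks the PCIe extended capability list looking for a Designated Vendor-Specific Extended Capability (DVSEC, capability ID 0x23) whose headers match a vendor ID and a DVSEC ID, and returns its offset or 0. As far as I can tell, cxl_core's register setup locates register blocks through the CXL Register Locator DVSEC (ID 8), so confirming that this DVSEC is present is a cheap first check when register detection fails. The sketch below runs the same walk over a config-space snapshot in user space; `find_dvsec()`, the accessors and the snapshot buffer are illustrative assumptions, not the kernel implementation.

```c
#include <assert.h>
#include <stdint.h>

#define PCI_CFG_SPACE_SIZE     256  /* extended capabilities start here */
#define PCI_CFG_SPACE_EXP_SIZE 4096 /* size of PCIe config space */
#define PCI_EXT_CAP_ID_DVSEC   0x23 /* Designated Vendor-Specific Ext Cap */
#define PCI_VENDOR_ID_CXL      0x1e98
#define CXL_DVSEC_DEVICE       0    /* CXL device DVSEC */
#define CXL_DVSEC_REG_LOCATOR  8    /* CXL register locator DVSEC */

/* Little-endian dword accessors for a 4 KiB config-space snapshot. */
static uint32_t cfg_read32(const uint8_t *cfg, unsigned int off)
{
	return (uint32_t)cfg[off] | (uint32_t)cfg[off + 1] << 8 |
	       (uint32_t)cfg[off + 2] << 16 | (uint32_t)cfg[off + 3] << 24;
}

static void cfg_write32(uint8_t *cfg, unsigned int off, uint32_t val)
{
	for (int i = 0; i < 4; i++)
		cfg[off + i] = (uint8_t)(val >> (8 * i));
}

/*
 * Return the offset of the DVSEC matching (vendor, id), or 0 if none.
 * Ext cap header: [15:0] cap ID, [19:16] version, [31:20] next offset.
 * DVSEC header 1 (+4): [15:0] vendor ID. DVSEC header 2 (+8): [15:0] ID.
 */
static unsigned int find_dvsec(const uint8_t *cfg, uint16_t vendor,
			       uint16_t id)
{
	unsigned int pos = PCI_CFG_SPACE_SIZE;
	/* Bound the walk so a malformed, looping list cannot hang us. */
	int ttl = (PCI_CFG_SPACE_EXP_SIZE - PCI_CFG_SPACE_SIZE) / 8;

	while (pos >= PCI_CFG_SPACE_SIZE &&
	       pos + 12 <= PCI_CFG_SPACE_EXP_SIZE && ttl-- > 0) {
		uint32_t hdr = cfg_read32(cfg, pos);

		if (hdr == 0 || hdr == 0xffffffff) /* empty list or no device */
			break;
		if ((hdr & 0xffff) == PCI_EXT_CAP_ID_DVSEC &&
		    (cfg_read32(cfg, pos + 4) & 0xffff) == vendor &&
		    (cfg_read32(cfg, pos + 8) & 0xffff) == id)
			return pos;
		pos = (hdr >> 20) & 0xffc; /* next pointer, dword aligned */
	}
	return 0;
}
```

Against real hardware, the snapshot could come, for example, from the device's `config` file in sysfs, read as root, since unprivileged reads are truncated to the first 64 bytes.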
>
> >
> > I'm working on rebasing to the v17 patches to see if this resolves
> > what I'm seeing. But it's a bit of a lift, so I figured I'd ask what
> > you're testing on before burning more time.
> >
> > Eventually I'd like to either give a Tested-by or shoot back some
> > amended patches based on testing. But I've not been able to get that
> > far yet...
>
>
> That would be really good. Let's see if we can figure out what the
> problem is there.
Sounds like a plan to me. Thanks for doing the heavy lifting here on
these patches.
Cheers,
-PJ
Thread overview: 112+ messages
2025-06-24 14:13 [PATCH v17 00/22] Type2 device basic support alejandro.lucero-palau
2025-06-24 14:13 ` [PATCH v17 01/22] cxl: Add type2 " alejandro.lucero-palau
2025-06-25 14:06 ` Jonathan Cameron
2025-06-30 14:38 ` Alejandro Lucero Palau
2025-07-25 21:46 ` dan.j.williams
2025-08-05 10:45 ` Alejandro Lucero Palau
2025-08-05 15:14 ` Dave Jiang
2025-06-24 14:13 ` [PATCH v17 02/22] sfc: add cxl support alejandro.lucero-palau
2025-06-25 16:37 ` Jonathan Cameron
2025-06-30 14:52 ` Alejandro Lucero Palau
2025-06-30 14:55 ` Alejandro Lucero Palau
2025-06-30 16:07 ` Jonathan Cameron
2025-07-25 22:16 ` dan.j.williams
2025-08-06 8:37 ` Alejandro Lucero Palau
2025-06-24 14:13 ` [PATCH v17 03/22] cxl: Move pci generic code alejandro.lucero-palau
2025-07-25 22:41 ` dan.j.williams
2025-08-06 8:46 ` Alejandro Lucero Palau
2025-08-06 9:31 ` Alejandro Lucero Palau
2025-06-24 14:13 ` [PATCH v17 04/22] cxl: allow Type2 drivers to map cxl component regs alejandro.lucero-palau
2025-06-27 8:27 ` Jonathan Cameron
2025-07-25 22:55 ` dan.j.williams
2025-07-28 16:23 ` Dave Jiang
2025-08-06 9:43 ` Alejandro Lucero Palau
2025-08-06 9:41 ` Alejandro Lucero Palau
2025-06-24 14:13 ` [PATCH v17 05/22] sfc: setup cxl component regs and set media ready alejandro.lucero-palau
2025-06-27 8:39 ` Jonathan Cameron
2025-06-30 15:57 ` Alejandro Lucero Palau
2025-08-08 13:11 ` Alejandro Lucero Palau
2025-06-27 8:45 ` Jonathan Cameron
2025-08-08 13:14 ` Alejandro Lucero Palau
2025-07-25 23:04 ` dan.j.williams
2025-06-24 14:13 ` [PATCH v17 06/22] cxl: Support dpa initialization without a mailbox alejandro.lucero-palau
2025-06-27 8:42 ` Jonathan Cameron
2025-06-27 16:43 ` Dave Jiang
2025-07-01 15:23 ` Alejandro Lucero Palau
2025-06-27 8:43 ` Jonathan Cameron
2025-07-01 15:25 ` Alejandro Lucero Palau
2025-07-26 0:54 ` dan.j.williams
2025-06-24 14:13 ` [PATCH v17 07/22] sfc: initialize dpa alejandro.lucero-palau
2025-07-26 0:55 ` dan.j.williams
2025-08-08 16:59 ` Alejandro Lucero Palau
2025-06-24 14:13 ` [PATCH v17 08/22] cxl: Prepare memdev creation for type2 alejandro.lucero-palau
2025-07-26 1:05 ` dan.j.williams
2025-08-08 17:01 ` Alejandro Lucero Palau
2025-06-24 14:13 ` [PATCH v17 09/22] sfc: create type2 cxl memdev alejandro.lucero-palau
2025-06-27 8:51 ` Jonathan Cameron
2025-07-01 15:30 ` Alejandro Lucero Palau
2025-06-24 14:13 ` [PATCH v17 10/22] cx/memdev: Indicate probe deferral alejandro.lucero-palau
2025-06-27 8:59 ` Jonathan Cameron
2025-06-27 9:42 ` Jonathan Cameron
2025-07-01 15:30 ` Alejandro Lucero Palau
2025-06-27 18:17 ` Dave Jiang
2025-06-30 16:20 ` Jonathan Cameron
2025-07-01 16:07 ` Alejandro Lucero Palau
2025-07-01 16:25 ` Dave Jiang
2025-07-01 16:44 ` Jonathan Cameron
2025-07-01 16:02 ` Alejandro Lucero Palau
2025-07-28 17:45 ` dan.j.williams
2025-07-30 3:46 ` dan.j.williams
2025-08-09 11:24 ` Alejandro Lucero Palau
2025-07-16 22:52 ` Dave Jiang
2025-06-24 14:13 ` [PATCH v17 11/22] cxl: Define a driver interface for HPA free space enumeration alejandro.lucero-palau
2025-06-27 22:42 ` Dave Jiang
2025-07-04 14:45 ` Alejandro Lucero Palau
2025-08-05 16:14 ` dan.j.williams
2025-08-11 12:04 ` Alejandro Lucero Palau
2025-06-24 14:13 ` [PATCH v17 12/22] sfc: get endpoint decoder alejandro.lucero-palau
2025-06-27 9:10 ` Jonathan Cameron
2025-07-04 14:51 ` Alejandro Lucero Palau
2025-07-28 16:30 ` dan.j.williams
2025-08-11 14:24 ` Alejandro Lucero Palau
2025-09-02 7:11 ` Alejandro Lucero Palau
2025-06-24 14:13 ` [PATCH v17 13/22] cxl: Define a driver interface for DPA allocation alejandro.lucero-palau
2025-06-27 9:06 ` Jonathan Cameron
2025-07-04 15:18 ` Alejandro Lucero Palau
2025-06-27 20:46 ` Dave Jiang
2025-07-04 15:21 ` Alejandro Lucero Palau
2025-06-24 14:13 ` [PATCH v17 14/22] sfc: get endpoint decoder alejandro.lucero-palau
2025-06-27 9:11 ` Jonathan Cameron
2025-07-07 11:24 ` Alejandro Lucero Palau
2025-07-16 23:48 ` Dave Jiang
2025-06-24 14:13 ` [PATCH v17 15/22] cxl: Make region type based on endpoint type alejandro.lucero-palau
2025-09-03 17:20 ` Davidlohr Bueso
2025-06-24 14:13 ` [PATCH v17 16/22] cxl/region: Factor out interleave ways setup alejandro.lucero-palau
2025-06-27 9:13 ` Jonathan Cameron
2025-06-27 23:05 ` Dave Jiang
2025-06-30 16:20 ` Jonathan Cameron
2025-06-30 16:34 ` Dave Jiang
2025-06-24 14:13 ` [PATCH v17 17/22] cxl/region: Factor out interleave granularity setup alejandro.lucero-palau
2025-06-24 14:13 ` [PATCH v17 18/22] cxl: Allow region creation by type2 drivers alejandro.lucero-palau
2025-06-27 9:32 ` Jonathan Cameron
2025-07-07 11:31 ` Alejandro Lucero Palau
2025-08-05 16:33 ` dan.j.williams
2025-08-11 14:45 ` Alejandro Lucero Palau
2025-06-24 14:13 ` [PATCH v17 19/22] cxl: Avoid dax creation for accelerators alejandro.lucero-palau
2025-06-27 9:33 ` Jonathan Cameron
2025-09-03 17:24 ` Davidlohr Bueso
2025-06-24 14:13 ` [PATCH v17 20/22] sfc: create cxl region alejandro.lucero-palau
2025-06-27 9:38 ` Jonathan Cameron
2025-07-07 11:37 ` Alejandro Lucero Palau
2025-07-28 16:20 ` dan.j.williams
2025-08-11 14:38 ` Alejandro Lucero Palau
2025-06-24 14:13 ` [PATCH v17 21/22] cxl: Add function for obtaining region range alejandro.lucero-palau
2025-06-24 14:13 ` [PATCH v17 22/22] sfc: support pio mapping based on cxl alejandro.lucero-palau
2025-06-27 9:46 ` Jonathan Cameron
2025-07-07 12:06 ` Alejandro Lucero Palau
2025-08-27 17:26 ` ALOK TIWARI
2025-07-25 20:51 ` [PATCH v17 00/22] Type2 device basic support dan.j.williams
2025-07-25 21:11 ` dan.j.williams
2025-08-27 16:48 ` PJ Waskiewicz
2025-08-28 8:02 ` Alejandro Lucero Palau
2025-09-04 17:48 ` PJ Waskiewicz