* [PATCH v16 00/22] Type2 device basic support
@ 2025-05-14 13:27 alejandro.lucero-palau
From: alejandro.lucero-palau @ 2025-05-14 13:27 UTC
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero
From: Alejandro Lucero <alucerop@amd.com>
v16 changes:
- rebase against rc4 (Dave Jiang)
- remove duplicate line (Ben Cheatham)
v15 changes:
- remove reference to unused header file (Jonathan Cameron)
- add proper kernel docs to exported functions (Alison Schofield)
- using an array to map the enums to strings (Alison Schofield)
- clarify comment when using bitmap_subset (Jonathan Cameron)
- specify link to type2 support in all patches (Alison Schofield)
Patches changed (minor): 4, 11
v14 changes:
- static null initialization of bitmaps (Jonathan Cameron)
- Fixing cxl tests (Alison Schofield)
- Fixing robot compilation problems
Patches changed (minor): 1, 4, 6, 13
v13 changes:
- use more consistent names for header include guards (Jonathan Cameron)
- using helper for caps bit setting (Jonathan Cameron)
- provide generic function for reporting missing capabilities (Jonathan Cameron)
- rename cxl_pci_setup_memdev_regs to cxl_pci_accel_setup_memdev_regs (Jonathan Cameron)
- cxl_dpa_info size to be set by the Type2 driver (Jonathan Cameron)
- avoiding rc variable when possible (Jonathan Cameron)
- fix spelling (Simon Horman)
- use scoped_guard (Dave Jiang)
- use enum instead of bool (Dave Jiang)
- dropping patch with hardware symbols
v12 changes:
- use new macro cxl_dev_state_create in pci driver (Ben Cheatham)
- add public/private sections in now exported cxl_dev_state struct (Ben
Cheatham)
- fix the include guard in cxl/pci.h to match its file name
- Clarify capabilities found vs expected in error message. (Ben
Cheatham)
- Clarify new CXL_DECODER_F flag (Ben Cheatham)
- Fix changes about cxl memdev creation support by moving code to the
proper patch. (Ben Cheatham)
- Avoid debug and function duplications (Ben Cheatham)
- Fix robot compilation error reported by Simon Horman as well.
- Add doc about new param in cxl_create_region (Simon Horman).
v11 changes:
- Dropping the use of cxl_memdev_state and going back to using
cxl_dev_state.
- Using a helper for an accel driver to allocate its own cxl-related
struct embedding cxl_dev_state.
- Exporting the required structs in include/cxl/cxl.h so that an accel
driver knows the cxl_dev_state size required by the previously mentioned
allocation helper.
- Avoid requiring any struct for DPA initialization by the accel driver,
adding a specific function for creating DPA partitions for accel drivers
without a mailbox.
v10 changes:
- Using cxl_memdev_state instead of cxl_dev_state for type2, since such a
device has memory after all and this facilitates the setup.
- Adapt core to use cxl_memdev_state, allowing accel drivers to work
with it without further awareness of internal cxl structs.
- Using the latest DPA changes for creating DPA partitions, with the accel
driver hardcoding mds values when there is no mailbox.
- capabilities are not a new field but are built up while performing the
current register mapping, and returned to the caller for checking.
- HPA free space supporting interleaving.
- DPA free space dropping max-min for a simple alloc size.
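The capability flow described in the v10 and v6 notes (register discovery builds a capability bitmap, the caller checks that the expected set is a subset of what was found, and missing capabilities are reported by name) can be sketched in plain userspace C. This is an illustration under stated assumptions, not the series' code: the `enum accel_cap` values, `CAP_BIT`, `caps_subset()` and `caps_report_missing()` are hypothetical names, and a single `unsigned long` stands in for the kernel bitmap that the series checks with `bitmap_subset()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical capability numbering, not the kernel's enum. */
enum accel_cap {
	CAP_HDM,
	CAP_RAS,
	CAP_MBOX,
	CAP_MEMDEV,
	CAP_MAX,
};

#define CAP_BIT(c) (1UL << (c))

/* Enum-to-string array, used when reporting missing capabilities. */
static const char *const cap_names[CAP_MAX] = {
	[CAP_HDM] = "HDM decoder",
	[CAP_RAS] = "RAS",
	[CAP_MBOX] = "mailbox",
	[CAP_MEMDEV] = "memdev",
};

/*
 * True if every expected capability was found, i.e. expected is a
 * subset of found: the same test bitmap_subset() does on kernel bitmaps.
 */
static bool caps_subset(unsigned long expected, unsigned long found)
{
	return (expected & ~found) == 0;
}

/* Print each expected-but-missing capability and return how many. */
static int caps_report_missing(unsigned long expected, unsigned long found)
{
	unsigned long missing = expected & ~found;
	int n = 0;

	for (int c = 0; c < CAP_MAX; c++) {
		if (missing & CAP_BIT(c)) {
			fprintf(stderr, "missing capability: %s\n", cap_names[c]);
			n++;
		}
	}
	return n;
}
```

Checking for a subset rather than for equality means a device exposing more capabilities than the driver expects still passes, while any expected capability that discovery did not find is named in the report.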
v9 changes:
- adding forward definitions (Jonathan Cameron)
- using set_bit instead of bitmap_set (Jonathan Cameron)
- fix rebase problem (Jonathan Cameron)
- Improve error path (Jonathan Cameron)
- fix build problems with cxl region dependency (robot)
- fix error path (Simon Horman)
v8 changes:
- Change error path labeling inside sfc cxl code (Edward Cree)
- Properly handling checks and error in sfc cxl code (Simon Horman)
- Fix bug when checking resource_size (Simon Horman)
- Avoid bisect problems reordering patches (Edward Cree)
- Fix buffer allocation size in sfc (Simon Horman)
v7 changes:
- fixing kernel test robot complaints
- fix typo with Type3 mandatory capabilities (Zhi Wang)
- optimize code in cxl_request_resource (Kalesh Anakkur Purayil)
- add sanity check when dealing with resource arithmetic (Fan Ni)
- fix typos and blank lines (Fan Ni)
- keep previous log errors/warnings in sfc driver (Martin Habets)
- add WARN_ON_ONCE if region given is NULL
v6 changes:
- update sfc mcdi_pcol.h with the full set of hardware changes, most of them
unrelated to this patchset. This file is generated automatically from
hardware design changes and is not touched by software. It is updated from
time to time, and an update was required for sfc driver CXL support.
- remove CXL capabilities definitions not used by the patchset or
previous kernel code. (Dave Jiang, Jonathan Cameron)
- Use bitmap_subset instead of reinventing the wheel ... (Ben Cheatham)
- Use cxl_accel_memdev for new device_type created (Ben Cheatham)
- Fix construct_region use of rwsem (Zhi Wang)
- Obtain region range instead of region params (Alison Schofield, Dave
Jiang)
v5 changes:
- Fix SFC configuration based on kernel CXL configuration
- Add subset check for capabilities.
- fix region creation when HDM decoders programmed by firmware/BIOS (Ben
Cheatham)
- Add option for creating dax region based on driver decision (Ben
Cheatham)
- Using sfc probe_data struct for keeping sfc cxl data
v4 changes:
- Use bitmap for capabilities new field (Jonathan Cameron)
- Use cxl_mem attributes for sysfs based on device type (Dave Jiang)
- Add conditional cxl sfc compilation relying on kernel CXL config (kernel test robot)
- Add sfc changes in different patches for facilitating backport (Jonathan Cameron)
- Remove patch for dealing with cxl modules dependencies and using sfc kconfig plus
MODULE_SOFTDEP instead.
v3 changes:
- cxl_dev_state not defined as opaque but only manipulated by accel drivers
through accessors.
- accessors names not identified as only for accel drivers.
- move pci code from pci driver (drivers/cxl/pci.c) to generic pci code
(drivers/cxl/core/pci.c).
- capabilities field from u8 to u32 and initialised by CXL regs discovering
code.
- add capabilities check, removing the current check in the CXL regs
discovery code.
- Not fail if CXL Device Registers not found. Not mandatory for Type2.
- add timeout in acquire_endpoint for solving a race with the endpoint port
creation.
- handle EPROBE_DEFER by sfc driver.
- Limiting interleave ways to 1 for accel driver HPA/DPA requests.
- factoring out interleave ways and granularity helpers from type2 region
creation patch.
- restricting region_creation for type2 to one endpoint decoder.
- add accessor for release_resource.
- handle errors and error messages properly.
v2 changes:
I have removed the introduction about the concerns with BIOS/UEFI after the
discussion confirmed the need for the implemented functionality, at least in
some scenarios.
There are two main changes from the RFC:
1) Following concerns about drivers using CXL core without restrictions, the CXL
structs those drivers work with are opaque to them, so functions are
implemented for reading or modifying those structs indirectly.
2) The driver for using the added functionality is not a test driver but a real
one: the SFC ethernet network driver. It uses the CXL region mapped for PIO
buffers instead of regions inside PCIe BARs.
RFC:
Current CXL kernel code is focused on supporting Type3 CXL devices, aka memory
expanders. Type2 CXL devices, aka device accelerators, share some functionality
but require some special handling.
First of all, a Type2 device is by definition tied to a specific driver doing
more than just memory expansion, so that driver is expected to handle the CXL
specifics. This implies the CXL setup needs to be done by such a driver instead
of by a generic CXL PCI driver as for memory expanders. Most of this setup needs
current CXL core code, which therefore needs to be accessible to those vendor
drivers. This is accomplished by exporting opaque CXL structs and by adding and
exporting functions for working with those structs indirectly.
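The opaque-struct approach described here can be sketched generically in C. The names below (`struct core_dev_state`, `core_get_serial()` and so on) are hypothetical, not CXL core API, and a single file stands in for what would be an exported header plus a core .c file. Note that, per the v11 notes above, later revisions instead export the struct definitions in include/cxl/cxl.h.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* --- What an exported header would expose (hypothetical API) --- */
struct core_dev_state;	/* opaque: drivers cannot see the layout */

struct core_dev_state *core_dev_state_alloc(uint64_t serial);
void core_dev_state_free(struct core_dev_state *s);
uint64_t core_get_serial(const struct core_dev_state *s);
void core_set_media_ready(struct core_dev_state *s, int ready);
int core_media_ready(const struct core_dev_state *s);

/* --- Core-private definition (would live in a core .c file) --- */
struct core_dev_state {
	uint64_t serial;
	int media_ready;
};

struct core_dev_state *core_dev_state_alloc(uint64_t serial)
{
	struct core_dev_state *s = calloc(1, sizeof(*s));

	if (s)
		s->serial = serial;
	return s;
}

void core_dev_state_free(struct core_dev_state *s)
{
	free(s);
}

/* Accessors: the only way a driver reads or modifies the state. */
uint64_t core_get_serial(const struct core_dev_state *s)
{
	return s->serial;
}

void core_set_media_ready(struct core_dev_state *s, int ready)
{
	s->media_ready = !!ready;	/* normalize to 0 or 1 */
}

int core_media_ready(const struct core_dev_state *s)
{
	return s->media_ready;
}
```

Because a driver only holds a pointer to an incomplete type, the core can change the struct layout without breaking drivers, at the cost of one exported function per field a driver needs to touch.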
Some of the patches are based on a patchset sent by Dan Williams [1] which was only
partially integrated, mostly the parts making things ready for Type2 but none
related to specific Type2 support. Those patches based on Dan's work carry Dan's
signature as co-developer, and a link to the original patch.
A final note about CXL.cache is needed. This patchset does not cover it at all,
although the emulated Type2 device advertises it. From the kernel point of view,
supporting CXL.cache will require ensuring the CXL path supports what the Type2
device needs. A device accelerator will likely be connected to a Root Switch,
but other configurations cannot be ruled out. Therefore the kernel will need to
check not just HPA, DPA, interleave and granularity, but also the available
CXL.cache support and resources in each switch in the CXL path to the Type2
device. I expect to contribute to this support in the following months, and
it would be good to discuss it when possible.
[1] https://lore.kernel.org/linux-cxl/98b1f61a-e6c2-71d4-c368-50d958501b0c@intel.com/T/
Alejandro Lucero (22):
cxl: Add type2 device basic support
sfc: add cxl support
cxl: Move pci generic code
cxl: Move register/capability check to driver
cxl: Add function for type2 cxl regs setup
sfc: make regs setup with checking and set media ready
cxl: Support dpa initialization without a mailbox
sfc: initialize dpa
cxl: Prepare memdev creation for type2
sfc: create type2 cxl memdev
cxl: Define a driver interface for HPA free space enumeration
sfc: obtain root decoder with enough HPA free space
cxl: Define a driver interface for DPA allocation
sfc: get endpoint decoder
cxl: Make region type based on endpoint type
cxl/region: Factor out interleave ways setup
cxl/region: Factor out interleave granularity setup
cxl: Allow region creation by type2 drivers
cxl: Add region flag for precluding a device memory to be used for dax
sfc: create cxl region
cxl: Add function for obtaining region range
sfc: support pio mapping based on cxl
drivers/cxl/core/core.h | 2 +
drivers/cxl/core/hdm.c | 86 +++++
drivers/cxl/core/mbox.c | 37 ++-
drivers/cxl/core/memdev.c | 47 ++-
drivers/cxl/core/pci.c | 162 ++++++++++
drivers/cxl/core/port.c | 8 +-
drivers/cxl/core/region.c | 432 +++++++++++++++++++++++---
drivers/cxl/core/regs.c | 40 ++-
drivers/cxl/cxl.h | 111 +------
drivers/cxl/cxlmem.h | 103 +-----
drivers/cxl/cxlpci.h | 23 +-
drivers/cxl/mem.c | 25 +-
drivers/cxl/pci.c | 111 ++-----
drivers/cxl/port.c | 5 +-
drivers/net/ethernet/sfc/Kconfig | 10 +
drivers/net/ethernet/sfc/Makefile | 1 +
drivers/net/ethernet/sfc/ef10.c | 50 ++-
drivers/net/ethernet/sfc/efx.c | 15 +-
drivers/net/ethernet/sfc/efx_cxl.c | 158 ++++++++++
drivers/net/ethernet/sfc/efx_cxl.h | 40 +++
drivers/net/ethernet/sfc/net_driver.h | 12 +
drivers/net/ethernet/sfc/nic.h | 3 +
include/cxl/cxl.h | 292 +++++++++++++++++
include/cxl/pci.h | 36 +++
tools/testing/cxl/Kbuild | 1 -
tools/testing/cxl/test/mem.c | 3 +-
tools/testing/cxl/test/mock.c | 17 -
27 files changed, 1413 insertions(+), 417 deletions(-)
create mode 100644 drivers/net/ethernet/sfc/efx_cxl.c
create mode 100644 drivers/net/ethernet/sfc/efx_cxl.h
create mode 100644 include/cxl/cxl.h
create mode 100644 include/cxl/pci.h
base-commit: b4432656b36e5cc1d50a1f2dc15357543add530e
--
2.34.1
* [PATCH v16 01/22] cxl: Add type2 device basic support
From: alejandro.lucero-palau @ 2025-05-14 13:27 UTC
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Jonathan Cameron
From: Alejandro Lucero <alucerop@amd.com>
Differentiate CXL memory expanders (type 3) from CXL device accelerators
(type 2) with a new function for initializing cxl_dev_state and a macro
that helps accel drivers embed cxl_dev_state inside a private struct.
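The embedding pattern can be tried outside the kernel with a simplified userspace sketch. It is an assumption-laden illustration, not the patch's code: `struct cxl_dev_state` is cut down to a few fields, `calloc()` replaces the kernel allocation, a string stands in for `struct device *`, a `has_mbox` field stands in for setting `cxl_mbox.host`, and `struct accel_state` and `accel_state_alloc()` are hypothetical names for an accel driver's private state. Like the kernel macro, it relies on GNU statement expressions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified, hypothetical stand-ins for the kernel types. */
enum cxl_devtype { CXL_DEVTYPE_DEVMEM, CXL_DEVTYPE_CLASSMEM };

struct cxl_dev_state {
	const char *dev;	/* stands in for struct device *dev */
	enum cxl_devtype type;
	uint64_t serial;
	uint16_t cxl_dvsec;
	bool has_mbox;		/* the kernel sets cxl_mbox.host instead */
};

/*
 * Allocate the whole driver struct (size bytes, zeroed) and initialize
 * the cxl_dev_state header that sits at offset 0.
 */
static struct cxl_dev_state *_cxl_dev_state_create(const char *dev,
						   enum cxl_devtype type,
						   uint64_t serial, uint16_t dvsec,
						   size_t size, bool has_mbox)
{
	struct cxl_dev_state *cxlds = calloc(1, size);

	if (!cxlds)
		return NULL;
	*cxlds = (struct cxl_dev_state){
		.dev = dev,
		.type = type,
		.serial = serial,
		.cxl_dvsec = dvsec,
		.has_mbox = has_mbox,
	};
	return cxlds;
}

/*
 * Same two checks as the kernel macro: @member must be a struct
 * cxl_dev_state placed first in @drv_struct, so the cast is valid.
 */
#define cxl_dev_state_create(parent, type, serial, dvsec, drv_struct, member, mbox) \
	({								\
		static_assert(__builtin_types_compatible_p(		\
				__typeof__(((drv_struct *)NULL)->member), \
				struct cxl_dev_state),			\
			      "member must be a struct cxl_dev_state");	\
		static_assert(offsetof(drv_struct, member) == 0,	\
			      "member must be the first field");	\
		(drv_struct *)_cxl_dev_state_create(parent, type, serial, \
						    dvsec, sizeof(drv_struct), \
						    mbox);		\
	})

/* Hypothetical accel driver state: cxl_dev_state first, private data after. */
struct accel_state {
	struct cxl_dev_state cxlds;
	int nr_pio_buffers;
};

/* Hypothetical probe helper for a Type2 device without a mailbox. */
static struct accel_state *accel_state_alloc(uint64_t serial, uint16_t dvsec)
{
	return cxl_dev_state_create("accel0", CXL_DEVTYPE_DEVMEM, serial, dvsec,
				    struct accel_state, cxlds, false);
}
```

Moving `cxlds` away from offset 0, or declaring it with another type, becomes a build failure rather than a silent bad cast, which is the point of the two static_assert checks.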
Move the structs to include/cxl, because an accel driver embedding
cxl_dev_state in its private struct needs to know the size of
cxl_dev_state.
Use the same new initialization in the type 3 PCI driver.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
---
drivers/cxl/core/mbox.c | 11 +-
drivers/cxl/core/memdev.c | 32 +++++
drivers/cxl/core/pci.c | 1 +
drivers/cxl/core/regs.c | 1 +
drivers/cxl/cxl.h | 97 +--------------
drivers/cxl/cxlmem.h | 88 +-------------
drivers/cxl/cxlpci.h | 21 ----
drivers/cxl/pci.c | 17 +--
include/cxl/cxl.h | 226 +++++++++++++++++++++++++++++++++++
include/cxl/pci.h | 23 ++++
tools/testing/cxl/test/mem.c | 3 +-
11 files changed, 305 insertions(+), 215 deletions(-)
create mode 100644 include/cxl/cxl.h
create mode 100644 include/cxl/pci.h
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index d72764056ce6..ab994d459f46 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -1484,23 +1484,20 @@ int cxl_mailbox_init(struct cxl_mailbox *cxl_mbox, struct device *host)
}
EXPORT_SYMBOL_NS_GPL(cxl_mailbox_init, "CXL");
-struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev)
+struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev, u64 serial,
+ u16 dvsec)
{
struct cxl_memdev_state *mds;
int rc;
- mds = devm_kzalloc(dev, sizeof(*mds), GFP_KERNEL);
+ mds = cxl_dev_state_create(dev, CXL_DEVTYPE_CLASSMEM, serial, dvsec,
+ struct cxl_memdev_state, cxlds, true);
if (!mds) {
dev_err(dev, "No memory available\n");
return ERR_PTR(-ENOMEM);
}
mutex_init(&mds->event.log_lock);
- mds->cxlds.dev = dev;
- mds->cxlds.reg_map.host = dev;
- mds->cxlds.cxl_mbox.host = dev;
- mds->cxlds.reg_map.resource = CXL_RESOURCE_NONE;
- mds->cxlds.type = CXL_DEVTYPE_CLASSMEM;
rc = devm_cxl_register_mce_notifier(dev, &mds->mce_notifier);
if (rc == -EOPNOTSUPP)
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index a16a5886d40a..6cc732aeb9de 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -633,6 +633,38 @@ static void detach_memdev(struct work_struct *work)
static struct lock_class_key cxl_memdev_key;
+void cxl_dev_state_init(struct cxl_dev_state *cxlds, struct device *dev,
+ enum cxl_devtype type, u64 serial, u16 dvsec,
+ bool has_mbox)
+{
+ *cxlds = (struct cxl_dev_state) {
+ .dev = dev,
+ .type = type,
+ .serial = serial,
+ .cxl_dvsec = dvsec,
+ .reg_map.host = dev,
+ .reg_map.resource = CXL_RESOURCE_NONE,
+ };
+
+ if (has_mbox)
+ cxlds->cxl_mbox.host = dev;
+}
+
+struct cxl_dev_state *_cxl_dev_state_create(struct device *dev,
+ enum cxl_devtype type, u64 serial,
+ u16 dvsec, size_t size,
+ bool has_mbox)
+{
+ struct cxl_dev_state *cxlds __free(kfree) = kzalloc(size, GFP_KERNEL);
+
+ if (!cxlds)
+ return NULL;
+
+ cxl_dev_state_init(cxlds, dev, type, serial, dvsec, has_mbox);
+ return_ptr(cxlds);
+}
+EXPORT_SYMBOL_NS_GPL(_cxl_dev_state_create, "CXL");
+
static struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds,
const struct file_operations *fops)
{
diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index 3b80e9a76ba8..0eb339c91413 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -7,6 +7,7 @@
#include <linux/pci.h>
#include <linux/pci-doe.h>
#include <linux/aer.h>
+#include <cxl/pci.h>
#include <cxlpci.h>
#include <cxlmem.h>
#include <cxl.h>
diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c
index 5ca7b0eed568..ecdb22ae6952 100644
--- a/drivers/cxl/core/regs.c
+++ b/drivers/cxl/core/regs.c
@@ -4,6 +4,7 @@
#include <linux/device.h>
#include <linux/slab.h>
#include <linux/pci.h>
+#include <cxl/pci.h>
#include <cxlmem.h>
#include <cxlpci.h>
#include <pmu.h>
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index a9ab46eb0610..844dc0782a5f 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -11,6 +11,7 @@
#include <linux/log2.h>
#include <linux/node.h>
#include <linux/io.h>
+#include <cxl/cxl.h>
extern const struct nvdimm_security_ops *cxl_security_ops;
@@ -200,97 +201,6 @@ static inline int ways_to_eiw(unsigned int ways, u8 *eiw)
#define CXLDEV_MBOX_BG_CMD_COMMAND_VENDOR_MASK GENMASK_ULL(63, 48)
#define CXLDEV_MBOX_PAYLOAD_OFFSET 0x20
-/*
- * Using struct_group() allows for per register-block-type helper routines,
- * without requiring block-type agnostic code to include the prefix.
- */
-struct cxl_regs {
- /*
- * Common set of CXL Component register block base pointers
- * @hdm_decoder: CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure
- * @ras: CXL 2.0 8.2.5.9 CXL RAS Capability Structure
- */
- struct_group_tagged(cxl_component_regs, component,
- void __iomem *hdm_decoder;
- void __iomem *ras;
- );
- /*
- * Common set of CXL Device register block base pointers
- * @status: CXL 2.0 8.2.8.3 Device Status Registers
- * @mbox: CXL 2.0 8.2.8.4 Mailbox Registers
- * @memdev: CXL 2.0 8.2.8.5 Memory Device Registers
- */
- struct_group_tagged(cxl_device_regs, device_regs,
- void __iomem *status, *mbox, *memdev;
- );
-
- struct_group_tagged(cxl_pmu_regs, pmu_regs,
- void __iomem *pmu;
- );
-
- /*
- * RCH downstream port specific RAS register
- * @aer: CXL 3.0 8.2.1.1 RCH Downstream Port RCRB
- */
- struct_group_tagged(cxl_rch_regs, rch_regs,
- void __iomem *dport_aer;
- );
-
- /*
- * RCD upstream port specific PCIe cap register
- * @pcie_cap: CXL 3.0 8.2.1.2 RCD Upstream Port RCRB
- */
- struct_group_tagged(cxl_rcd_regs, rcd_regs,
- void __iomem *rcd_pcie_cap;
- );
-};
-
-struct cxl_reg_map {
- bool valid;
- int id;
- unsigned long offset;
- unsigned long size;
-};
-
-struct cxl_component_reg_map {
- struct cxl_reg_map hdm_decoder;
- struct cxl_reg_map ras;
-};
-
-struct cxl_device_reg_map {
- struct cxl_reg_map status;
- struct cxl_reg_map mbox;
- struct cxl_reg_map memdev;
-};
-
-struct cxl_pmu_reg_map {
- struct cxl_reg_map pmu;
-};
-
-/**
- * struct cxl_register_map - DVSEC harvested register block mapping parameters
- * @host: device for devm operations and logging
- * @base: virtual base of the register-block-BAR + @block_offset
- * @resource: physical resource base of the register block
- * @max_size: maximum mapping size to perform register search
- * @reg_type: see enum cxl_regloc_type
- * @component_map: cxl_reg_map for component registers
- * @device_map: cxl_reg_maps for device registers
- * @pmu_map: cxl_reg_maps for CXL Performance Monitoring Units
- */
-struct cxl_register_map {
- struct device *host;
- void __iomem *base;
- resource_size_t resource;
- resource_size_t max_size;
- u8 reg_type;
- union {
- struct cxl_component_reg_map component_map;
- struct cxl_device_reg_map device_map;
- struct cxl_pmu_reg_map pmu_map;
- };
-};
-
void cxl_probe_component_regs(struct device *dev, void __iomem *base,
struct cxl_component_reg_map *map);
void cxl_probe_device_regs(struct device *dev, void __iomem *base,
@@ -482,11 +392,6 @@ struct cxl_region_params {
resource_size_t cache_size;
};
-enum cxl_partition_mode {
- CXL_PARTMODE_RAM,
- CXL_PARTMODE_PMEM,
-};
-
/*
* Indicate whether this region has been assembled by autodetection or
* userspace assembly. Prevent endpoint decoders outside of automatic
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 3ec6b906371b..e7cd31b9f107 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -7,6 +7,7 @@
#include <linux/cdev.h>
#include <linux/uuid.h>
#include <linux/node.h>
+#include <cxl/cxl.h>
#include <cxl/event.h>
#include <cxl/mailbox.h>
#include "cxl.h"
@@ -357,87 +358,6 @@ struct cxl_security_state {
struct kernfs_node *sanitize_node;
};
-/*
- * enum cxl_devtype - delineate type-2 from a generic type-3 device
- * @CXL_DEVTYPE_DEVMEM - Vendor specific CXL Type-2 device implementing HDM-D or
- * HDM-DB, no requirement that this device implements a
- * mailbox, or other memory-device-standard manageability
- * flows.
- * @CXL_DEVTYPE_CLASSMEM - Common class definition of a CXL Type-3 device with
- * HDM-H and class-mandatory memory device registers
- */
-enum cxl_devtype {
- CXL_DEVTYPE_DEVMEM,
- CXL_DEVTYPE_CLASSMEM,
-};
-
-/**
- * struct cxl_dpa_perf - DPA performance property entry
- * @dpa_range: range for DPA address
- * @coord: QoS performance data (i.e. latency, bandwidth)
- * @cdat_coord: raw QoS performance data from CDAT
- * @qos_class: QoS Class cookies
- */
-struct cxl_dpa_perf {
- struct range dpa_range;
- struct access_coordinate coord[ACCESS_COORDINATE_MAX];
- struct access_coordinate cdat_coord[ACCESS_COORDINATE_MAX];
- int qos_class;
-};
-
-/**
- * struct cxl_dpa_partition - DPA partition descriptor
- * @res: shortcut to the partition in the DPA resource tree (cxlds->dpa_res)
- * @perf: performance attributes of the partition from CDAT
- * @mode: operation mode for the DPA capacity, e.g. ram, pmem, dynamic...
- */
-struct cxl_dpa_partition {
- struct resource res;
- struct cxl_dpa_perf perf;
- enum cxl_partition_mode mode;
-};
-
-/**
- * struct cxl_dev_state - The driver device state
- *
- * cxl_dev_state represents the CXL driver/device state. It provides an
- * interface to mailbox commands as well as some cached data about the device.
- * Currently only memory devices are represented.
- *
- * @dev: The device associated with this CXL state
- * @cxlmd: The device representing the CXL.mem capabilities of @dev
- * @reg_map: component and ras register mapping parameters
- * @regs: Parsed register blocks
- * @cxl_dvsec: Offset to the PCIe device DVSEC
- * @rcd: operating in RCD mode (CXL 3.0 9.11.8 CXL Devices Attached to an RCH)
- * @media_ready: Indicate whether the device media is usable
- * @dpa_res: Overall DPA resource tree for the device
- * @part: DPA partition array
- * @nr_partitions: Number of DPA partitions
- * @serial: PCIe Device Serial Number
- * @type: Generic Memory Class device or Vendor Specific Memory device
- * @cxl_mbox: CXL mailbox context
- * @cxlfs: CXL features context
- */
-struct cxl_dev_state {
- struct device *dev;
- struct cxl_memdev *cxlmd;
- struct cxl_register_map reg_map;
- struct cxl_regs regs;
- int cxl_dvsec;
- bool rcd;
- bool media_ready;
- struct resource dpa_res;
- struct cxl_dpa_partition part[CXL_NR_PARTITIONS_MAX];
- unsigned int nr_partitions;
- u64 serial;
- enum cxl_devtype type;
- struct cxl_mailbox cxl_mbox;
-#ifdef CONFIG_CXL_FEATURES
- struct cxl_features_state *cxlfs;
-#endif
-};
-
static inline resource_size_t cxl_pmem_size(struct cxl_dev_state *cxlds)
{
/*
@@ -833,7 +753,11 @@ int cxl_dev_state_identify(struct cxl_memdev_state *mds);
int cxl_await_media_ready(struct cxl_dev_state *cxlds);
int cxl_enumerate_cmds(struct cxl_memdev_state *mds);
int cxl_mem_dpa_fetch(struct cxl_memdev_state *mds, struct cxl_dpa_info *info);
-struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev);
+struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev, u64 serial,
+ u16 dvsec);
+void cxl_dev_state_init(struct cxl_dev_state *cxlds, struct device *dev,
+ enum cxl_devtype type, u64 serial, u16 dvsec,
+ bool has_mbox);
void set_exclusive_cxl_commands(struct cxl_memdev_state *mds,
unsigned long *cmds);
void clear_exclusive_cxl_commands(struct cxl_memdev_state *mds,
diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
index 54e219b0049e..570e53e26f11 100644
--- a/drivers/cxl/cxlpci.h
+++ b/drivers/cxl/cxlpci.h
@@ -7,29 +7,8 @@
#define CXL_MEMORY_PROGIF 0x10
-/*
- * See section 8.1 Configuration Space Registers in the CXL 2.0
- * Specification. Names are taken straight from the specification with "CXL" and
- * "DVSEC" redundancies removed. When obvious, abbreviations may be used.
- */
#define PCI_DVSEC_HEADER1_LENGTH_MASK GENMASK(31, 20)
-/* CXL 2.0 8.1.3: PCIe DVSEC for CXL Device */
-#define CXL_DVSEC_PCIE_DEVICE 0
-#define CXL_DVSEC_CAP_OFFSET 0xA
-#define CXL_DVSEC_MEM_CAPABLE BIT(2)
-#define CXL_DVSEC_HDM_COUNT_MASK GENMASK(5, 4)
-#define CXL_DVSEC_CTRL_OFFSET 0xC
-#define CXL_DVSEC_MEM_ENABLE BIT(2)
-#define CXL_DVSEC_RANGE_SIZE_HIGH(i) (0x18 + (i * 0x10))
-#define CXL_DVSEC_RANGE_SIZE_LOW(i) (0x1C + (i * 0x10))
-#define CXL_DVSEC_MEM_INFO_VALID BIT(0)
-#define CXL_DVSEC_MEM_ACTIVE BIT(1)
-#define CXL_DVSEC_MEM_SIZE_LOW_MASK GENMASK(31, 28)
-#define CXL_DVSEC_RANGE_BASE_HIGH(i) (0x20 + (i * 0x10))
-#define CXL_DVSEC_RANGE_BASE_LOW(i) (0x24 + (i * 0x10))
-#define CXL_DVSEC_MEM_BASE_LOW_MASK GENMASK(31, 28)
-
#define CXL_DVSEC_RANGE_MAX 2
/* CXL 2.0 8.1.4: Non-CXL Function Map DVSEC */
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 785aa2af5eaa..0d3c67867965 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -11,6 +11,8 @@
#include <linux/pci.h>
#include <linux/aer.h>
#include <linux/io.h>
+#include <cxl/cxl.h>
+#include <cxl/pci.h>
#include <cxl/mailbox.h>
#include "cxlmem.h"
#include "cxlpci.h"
@@ -911,6 +913,7 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
int rc, pmu_count;
unsigned int i;
bool irq_avail;
+ u16 dvsec;
/*
* Double check the anonymous union trickery in struct cxl_regs
@@ -924,19 +927,19 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
return rc;
pci_set_master(pdev);
- mds = cxl_memdev_state_create(&pdev->dev);
+ dvsec = pci_find_dvsec_capability(pdev, PCI_VENDOR_ID_CXL,
+ CXL_DVSEC_PCIE_DEVICE);
+ if (!dvsec)
+ dev_warn(&pdev->dev,
+ "Device DVSEC not present, skip CXL.mem init\n");
+
+ mds = cxl_memdev_state_create(&pdev->dev, pci_get_dsn(pdev), dvsec);
if (IS_ERR(mds))
return PTR_ERR(mds);
cxlds = &mds->cxlds;
pci_set_drvdata(pdev, cxlds);
cxlds->rcd = is_cxl_restricted(pdev);
- cxlds->serial = pci_get_dsn(pdev);
- cxlds->cxl_dvsec = pci_find_dvsec_capability(
- pdev, PCI_VENDOR_ID_CXL, CXL_DVSEC_PCIE_DEVICE);
- if (!cxlds->cxl_dvsec)
- dev_warn(&pdev->dev,
- "Device DVSEC not present, skip CXL.mem init\n");
rc = cxl_pci_setup_regs(pdev, CXL_REGLOC_RBI_MEMDEV, &map);
if (rc)
diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
new file mode 100644
index 000000000000..b7f79313409b
--- /dev/null
+++ b/include/cxl/cxl.h
@@ -0,0 +1,226 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright(c) 2020 Intel Corporation. */
+/* Copyright(c) 2025 Advanced Micro Devices, Inc. */
+
+#ifndef __CXL_CXL_H__
+#define __CXL_CXL_H__
+
+#include <linux/node.h>
+#include <linux/ioport.h>
+#include <cxl/mailbox.h>
+
+/**
+ * enum cxl_devtype - delineate type-2 from a generic type-3 device
+ * @CXL_DEVTYPE_DEVMEM - Vendor specific CXL Type-2 device implementing HDM-D or
+ * HDM-DB, no requirement that this device implements a
+ * mailbox, or other memory-device-standard manageability
+ * flows.
+ * @CXL_DEVTYPE_CLASSMEM - Common class definition of a CXL Type-3 device with
+ * HDM-H and class-mandatory memory device registers
+ */
+enum cxl_devtype {
+ CXL_DEVTYPE_DEVMEM,
+ CXL_DEVTYPE_CLASSMEM,
+};
+
+struct device;
+
+/*
+ * Using struct_group() allows for per register-block-type helper routines,
+ * without requiring block-type agnostic code to include the prefix.
+ */
+struct cxl_regs {
+ /*
+ * Common set of CXL Component register block base pointers
+ * @hdm_decoder: CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure
+ * @ras: CXL 2.0 8.2.5.9 CXL RAS Capability Structure
+ */
+ struct_group_tagged(cxl_component_regs, component,
+ void __iomem *hdm_decoder;
+ void __iomem *ras;
+ );
+ /*
+ * Common set of CXL Device register block base pointers
+ * @status: CXL 2.0 8.2.8.3 Device Status Registers
+ * @mbox: CXL 2.0 8.2.8.4 Mailbox Registers
+ * @memdev: CXL 2.0 8.2.8.5 Memory Device Registers
+ */
+ struct_group_tagged(cxl_device_regs, device_regs,
+ void __iomem *status, *mbox, *memdev;
+ );
+
+ struct_group_tagged(cxl_pmu_regs, pmu_regs,
+ void __iomem *pmu;
+ );
+
+ /*
+ * RCH downstream port specific RAS register
+ * @aer: CXL 3.0 8.2.1.1 RCH Downstream Port RCRB
+ */
+ struct_group_tagged(cxl_rch_regs, rch_regs,
+ void __iomem *dport_aer;
+ );
+
+ /*
+ * RCD upstream port specific PCIe cap register
+ * @pcie_cap: CXL 3.0 8.2.1.2 RCD Upstream Port RCRB
+ */
+ struct_group_tagged(cxl_rcd_regs, rcd_regs,
+ void __iomem *rcd_pcie_cap;
+ );
+};
+
+struct cxl_reg_map {
+ bool valid;
+ int id;
+ unsigned long offset;
+ unsigned long size;
+};
+
+struct cxl_component_reg_map {
+ struct cxl_reg_map hdm_decoder;
+ struct cxl_reg_map ras;
+};
+
+struct cxl_device_reg_map {
+ struct cxl_reg_map status;
+ struct cxl_reg_map mbox;
+ struct cxl_reg_map memdev;
+};
+
+struct cxl_pmu_reg_map {
+ struct cxl_reg_map pmu;
+};
+
+/**
+ * struct cxl_register_map - DVSEC harvested register block mapping parameters
+ * @host: device for devm operations and logging
+ * @base: virtual base of the register-block-BAR + @block_offset
+ * @resource: physical resource base of the register block
+ * @max_size: maximum mapping size to perform register search
+ * @reg_type: see enum cxl_regloc_type
+ * @component_map: cxl_reg_map for component registers
+ * @device_map: cxl_reg_maps for device registers
+ * @pmu_map: cxl_reg_maps for CXL Performance Monitoring Units
+ */
+struct cxl_register_map {
+ struct device *host;
+ void __iomem *base;
+ resource_size_t resource;
+ resource_size_t max_size;
+ u8 reg_type;
+ union {
+ struct cxl_component_reg_map component_map;
+ struct cxl_device_reg_map device_map;
+ struct cxl_pmu_reg_map pmu_map;
+ };
+};
+
+/**
+ * struct cxl_dpa_perf - DPA performance property entry
+ * @dpa_range: range for DPA address
+ * @coord: QoS performance data (i.e. latency, bandwidth)
+ * @cdat_coord: raw QoS performance data from CDAT
+ * @qos_class: QoS Class cookies
+ */
+struct cxl_dpa_perf {
+ struct range dpa_range;
+ struct access_coordinate coord[ACCESS_COORDINATE_MAX];
+ struct access_coordinate cdat_coord[ACCESS_COORDINATE_MAX];
+ int qos_class;
+};
+
+enum cxl_partition_mode {
+ CXL_PARTMODE_RAM,
+ CXL_PARTMODE_PMEM,
+};
+
+/**
+ * struct cxl_dpa_partition - DPA partition descriptor
+ * @res: shortcut to the partition in the DPA resource tree (cxlds->dpa_res)
+ * @perf: performance attributes of the partition from CDAT
+ * @mode: operation mode for the DPA capacity, e.g. ram, pmem, dynamic...
+ */
+struct cxl_dpa_partition {
+ struct resource res;
+ struct cxl_dpa_perf perf;
+ enum cxl_partition_mode mode;
+};
+
+#define CXL_NR_PARTITIONS_MAX 2
+
+/**
+ * struct cxl_dev_state - The driver device state
+ *
+ * cxl_dev_state represents the CXL driver/device state. It provides an
+ * interface to mailbox commands as well as some cached data about the device.
+ * Currently only memory devices are represented.
+ *
+ * @dev: The device associated with this CXL state
+ * @cxlmd: The device representing the CXL.mem capabilities of @dev
+ * @reg_map: component and ras register mapping parameters
+ * @regs: Parsed register blocks
+ * @cxl_dvsec: Offset to the PCIe device DVSEC
+ * @rcd: operating in RCD mode (CXL 3.0 9.11.8 CXL Devices Attached to an RCH)
+ * @media_ready: Indicate whether the device media is usable
+ * @dpa_res: Overall DPA resource tree for the device
+ * @part: DPA partition array
+ * @nr_partitions: Number of DPA partitions
+ * @serial: PCIe Device Serial Number
+ * @type: Generic Memory Class device or Vendor Specific Memory device
+ * @cxl_mbox: CXL mailbox context
+ * @cxlfs: CXL features context
+ */
+struct cxl_dev_state {
+ /* public for Type2 drivers */
+ struct device *dev;
+ struct cxl_memdev *cxlmd;
+
+ /* private for Type2 drivers */
+ struct cxl_register_map reg_map;
+ struct cxl_regs regs;
+ int cxl_dvsec;
+ bool rcd;
+ bool media_ready;
+ struct resource dpa_res;
+ struct cxl_dpa_partition part[CXL_NR_PARTITIONS_MAX];
+ unsigned int nr_partitions;
+ u64 serial;
+ enum cxl_devtype type;
+ struct cxl_mailbox cxl_mbox;
+#ifdef CONFIG_CXL_FEATURES
+ struct cxl_features_state *cxlfs;
+#endif
+};
+
+struct cxl_dev_state *_cxl_dev_state_create(struct device *dev,
+ enum cxl_devtype type, u64 serial,
+ u16 dvsec, size_t size,
+ bool has_mbox);
+
+/**
+ * cxl_dev_state_create - safely create and cast a cxl dev state embedded in a
+ * driver specific struct.
+ *
+ * @parent: device behind the request
+ * @type: CXL device type
+ * @serial: device identification
+ * @dvsec: dvsec capability offset
+ * @drv_struct: driver struct embedding a cxl_dev_state struct
+ * @member: drv_struct member as cxl_dev_state
+ * @mbox: true if mailbox supported
+ *
+ * Return: pointer to the newly allocated @drv_struct, with its embedded
+ * cxl_dev_state struct initialized.
+ *
+ * Introduced for Type2 driver support.
+ */
+#define cxl_dev_state_create(parent, type, serial, dvsec, drv_struct, member, mbox) \
+ ({ \
+ static_assert(__same_type(struct cxl_dev_state, \
+ ((drv_struct *)NULL)->member)); \
+ static_assert(offsetof(drv_struct, member) == 0); \
+ (drv_struct *)_cxl_dev_state_create(parent, type, serial, dvsec, \
+ sizeof(drv_struct), mbox); \
+ })
+#endif /* __CXL_CXL_H__ */
diff --git a/include/cxl/pci.h b/include/cxl/pci.h
new file mode 100644
index 000000000000..5729a93b252a
--- /dev/null
+++ b/include/cxl/pci.h
@@ -0,0 +1,23 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright(c) 2020 Intel Corporation. All rights reserved. */
+
+#ifndef __CXL_CXL_PCI_H__
+#define __CXL_CXL_PCI_H__
+
+/* CXL 2.0 8.1.3: PCIe DVSEC for CXL Device */
+#define CXL_DVSEC_PCIE_DEVICE 0
+#define CXL_DVSEC_CAP_OFFSET 0xA
+#define CXL_DVSEC_MEM_CAPABLE BIT(2)
+#define CXL_DVSEC_HDM_COUNT_MASK GENMASK(5, 4)
+#define CXL_DVSEC_CTRL_OFFSET 0xC
+#define CXL_DVSEC_MEM_ENABLE BIT(2)
+#define CXL_DVSEC_RANGE_SIZE_HIGH(i) (0x18 + ((i) * 0x10))
+#define CXL_DVSEC_RANGE_SIZE_LOW(i) (0x1C + ((i) * 0x10))
+#define CXL_DVSEC_MEM_INFO_VALID BIT(0)
+#define CXL_DVSEC_MEM_ACTIVE BIT(1)
+#define CXL_DVSEC_MEM_SIZE_LOW_MASK GENMASK(31, 28)
+#define CXL_DVSEC_RANGE_BASE_HIGH(i) (0x20 + ((i) * 0x10))
+#define CXL_DVSEC_RANGE_BASE_LOW(i) (0x24 + ((i) * 0x10))
+#define CXL_DVSEC_MEM_BASE_LOW_MASK GENMASK(31, 28)
+
+#endif
diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index bf9caa908f89..e62cb5049cf5 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -1717,7 +1717,7 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
if (rc)
return rc;
- mds = cxl_memdev_state_create(dev);
+ mds = cxl_memdev_state_create(dev, pdev->id + 1, 0);
if (IS_ERR(mds))
return PTR_ERR(mds);
@@ -1733,7 +1733,6 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
mds->event.buf = (struct cxl_get_event_payload *) mdata->event_buf;
INIT_DELAYED_WORK(&mds->security.poll_dwork, cxl_mockmem_sanitize_work);
- cxlds->serial = pdev->id + 1;
if (is_rcd(pdev))
cxlds->rcd = true;
--
2.34.1
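The range registers defined in include/cxl/pci.h split each DVSEC range's size and base across a HIGH and a LOW dword. Only bits 31:28 of each LOW dword carry size or address bits, so ranges come in 256 MiB units, and the low bits of SIZE_LOW hold the valid and active flags. Below is a minimal userspace sketch of how those macros combine. The `cfg[]` array standing in for PCI config space and the `read_dvsec()`/`decode_range()` helpers are hypothetical. The sketch also assumes the DVSEC starts at offset 0 of that array. In the kernel the reads would instead go through pci_read_config_dword() relative to the DVSEC offset.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for the kernel's BIT() and GENMASK(), 32-bit only. */
#define BIT(n) (1u << (n))
#define GENMASK(h, l) ((~0u >> (31 - (h))) & (~0u << (l)))

/* Offsets and masks as defined in include/cxl/pci.h. */
#define CXL_DVSEC_RANGE_SIZE_HIGH(i) (0x18 + ((i) * 0x10))
#define CXL_DVSEC_RANGE_SIZE_LOW(i) (0x1C + ((i) * 0x10))
#define CXL_DVSEC_MEM_INFO_VALID BIT(0)
#define CXL_DVSEC_MEM_ACTIVE BIT(1)
#define CXL_DVSEC_MEM_SIZE_LOW_MASK GENMASK(31, 28)
#define CXL_DVSEC_RANGE_BASE_HIGH(i) (0x20 + ((i) * 0x10))
#define CXL_DVSEC_RANGE_BASE_LOW(i) (0x24 + ((i) * 0x10))
#define CXL_DVSEC_MEM_BASE_LOW_MASK GENMASK(31, 28)

/* Hypothetical config-space image; the DVSEC is assumed to start at 0. */
static uint32_t cfg[0x40 / 4];

/* Hypothetical stand-in for pci_read_config_dword(pdev, dvsec + off, ...). */
static uint32_t read_dvsec(unsigned int off)
{
	return cfg[off / 4];
}

struct dvsec_range {
	int valid;	/* Memory_Info_Valid flag from SIZE_LOW */
	int active;	/* Memory_Active flag from SIZE_LOW */
	uint64_t base;
	uint64_t size;
};

/* Decode DVSEC range @i by merging its HIGH dword with the masked LOW one. */
static struct dvsec_range decode_range(int i)
{
	struct dvsec_range r;
	uint32_t size_lo, base_lo;

	assert(i == 0 || i == 1);	/* the macros index ranges 1 and 2 as 0 and 1 */

	size_lo = read_dvsec(CXL_DVSEC_RANGE_SIZE_LOW(i));
	base_lo = read_dvsec(CXL_DVSEC_RANGE_BASE_LOW(i));

	r.valid = !!(size_lo & CXL_DVSEC_MEM_INFO_VALID);
	r.active = !!(size_lo & CXL_DVSEC_MEM_ACTIVE);
	r.size = ((uint64_t)read_dvsec(CXL_DVSEC_RANGE_SIZE_HIGH(i)) << 32) |
		 (size_lo & CXL_DVSEC_MEM_SIZE_LOW_MASK);
	r.base = ((uint64_t)read_dvsec(CXL_DVSEC_RANGE_BASE_HIGH(i)) << 32) |
		 (base_lo & CXL_DVSEC_MEM_BASE_LOW_MASK);
	return r;
}
```

Masking the LOW dwords with CXL_DVSEC_MEM_SIZE_LOW_MASK and CXL_DVSEC_MEM_BASE_LOW_MASK discards the flag and reserved bits before the two halves are merged into one 64-bit value.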
* [PATCH v16 02/22] sfc: add cxl support
From: alejandro.lucero-palau @ 2025-05-14 13:27 UTC
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Jonathan Cameron, Edward Cree
From: Alejandro Lucero <alucerop@amd.com>
Add CXL initialization based on the new CXL API for accelerator drivers,
and make it dependent on the kernel CXL configuration.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
---
drivers/net/ethernet/sfc/Kconfig | 9 +++++
drivers/net/ethernet/sfc/Makefile | 1 +
drivers/net/ethernet/sfc/efx.c | 15 +++++++-
drivers/net/ethernet/sfc/efx_cxl.c | 55 +++++++++++++++++++++++++++
drivers/net/ethernet/sfc/efx_cxl.h | 40 +++++++++++++++++++
drivers/net/ethernet/sfc/net_driver.h | 10 +++++
6 files changed, 129 insertions(+), 1 deletion(-)
create mode 100644 drivers/net/ethernet/sfc/efx_cxl.c
create mode 100644 drivers/net/ethernet/sfc/efx_cxl.h
diff --git a/drivers/net/ethernet/sfc/Kconfig b/drivers/net/ethernet/sfc/Kconfig
index c4c43434f314..979f2801e2a8 100644
--- a/drivers/net/ethernet/sfc/Kconfig
+++ b/drivers/net/ethernet/sfc/Kconfig
@@ -66,6 +66,15 @@ config SFC_MCDI_LOGGING
Driver-Interface) commands and responses, allowing debugging of
driver/firmware interaction. The tracing is actually enabled by
a sysfs file 'mcdi_logging' under the PCI device.
+config SFC_CXL
+ bool "Solarflare SFC9100-family CXL support"
+ depends on SFC && CXL_BUS >= SFC
+ default SFC
+ help
+ This enables SFC CXL support when the kernel is configured with CXL,
+ allowing CTPIO to use CXL.mem. An SFC device with CXL support and
+ CXL-aware firmware can use this to minimize latencies when sending
+ through CTPIO.
source "drivers/net/ethernet/sfc/falcon/Kconfig"
source "drivers/net/ethernet/sfc/siena/Kconfig"
diff --git a/drivers/net/ethernet/sfc/Makefile b/drivers/net/ethernet/sfc/Makefile
index d99039ec468d..bb0f1891cde6 100644
--- a/drivers/net/ethernet/sfc/Makefile
+++ b/drivers/net/ethernet/sfc/Makefile
@@ -13,6 +13,7 @@ sfc-$(CONFIG_SFC_SRIOV) += sriov.o ef10_sriov.o ef100_sriov.o ef100_rep.o \
mae.o tc.o tc_bindings.o tc_counters.o \
tc_encap_actions.o tc_conntrack.o
+sfc-$(CONFIG_SFC_CXL) += efx_cxl.o
obj-$(CONFIG_SFC) += sfc.o
obj-$(CONFIG_SFC_FALCON) += falcon/
diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c
index 112e55b98ed3..537668278375 100644
--- a/drivers/net/ethernet/sfc/efx.c
+++ b/drivers/net/ethernet/sfc/efx.c
@@ -34,6 +34,7 @@
#include "selftest.h"
#include "sriov.h"
#include "efx_devlink.h"
+#include "efx_cxl.h"
#include "mcdi_port_common.h"
#include "mcdi_pcol.h"
@@ -981,12 +982,15 @@ static void efx_pci_remove(struct pci_dev *pci_dev)
efx_pci_remove_main(efx);
efx_fini_io(efx);
+
+ probe_data = container_of(efx, struct efx_probe_data, efx);
+ efx_cxl_exit(probe_data);
+
pci_dbg(efx->pci_dev, "shutdown successful\n");
efx_fini_devlink_and_unlock(efx);
efx_fini_struct(efx);
free_netdev(efx->net_dev);
- probe_data = container_of(efx, struct efx_probe_data, efx);
kfree(probe_data);
};
@@ -1190,6 +1194,15 @@ static int efx_pci_probe(struct pci_dev *pci_dev,
if (rc)
goto fail2;
+ /* A successful CXL initialization implies that a CXL region was
+ * created to be used for PIO buffers. If there is no CXL support, or
+ * initialization fails, probe_data->cxl_pio_initialised will be false
+ * and the legacy PIO buffers at specific PCI BAR regions will be used.
+ */
+ rc = efx_cxl_init(probe_data);
+ if (rc)
+ pci_err(pci_dev, "CXL initialization failed with error %d\n", rc);
+
rc = efx_pci_probe_post_io(efx);
if (rc) {
/* On failure, retry once immediately.
diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
new file mode 100644
index 000000000000..753d5b7d49b6
--- /dev/null
+++ b/drivers/net/ethernet/sfc/efx_cxl.c
@@ -0,0 +1,55 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/****************************************************************************
+ *
+ * Driver for AMD network controllers and boards
+ * Copyright (C) 2025, Advanced Micro Devices, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation, incorporated herein by reference.
+ */
+
+#include <cxl/pci.h>
+#include <linux/pci.h>
+
+#include "net_driver.h"
+#include "efx_cxl.h"
+
+#define EFX_CTPIO_BUFFER_SIZE SZ_256M
+
+int efx_cxl_init(struct efx_probe_data *probe_data)
+{
+ struct efx_nic *efx = &probe_data->efx;
+ struct pci_dev *pci_dev = efx->pci_dev;
+ struct efx_cxl *cxl;
+ u16 dvsec;
+
+ probe_data->cxl_pio_initialised = false;
+
+ dvsec = pci_find_dvsec_capability(pci_dev, PCI_VENDOR_ID_CXL,
+ CXL_DVSEC_PCIE_DEVICE);
+ if (!dvsec)
+ return 0;
+
+ pci_dbg(pci_dev, "CXL_DVSEC_PCIE_DEVICE capability found\n");
+
+ /* Create a cxl_dev_state embedded in struct efx_cxl using the CXL core
+ * API, specifying that no mailbox is available.
+ */
+ cxl = cxl_dev_state_create(&pci_dev->dev, CXL_DEVTYPE_DEVMEM,
+ pci_dev->dev.id, dvsec, struct efx_cxl,
+ cxlds, false);
+
+ if (!cxl)
+ return -ENOMEM;
+
+ probe_data->cxl = cxl;
+
+ return 0;
+}
+
+void efx_cxl_exit(struct efx_probe_data *probe_data)
+{
+}
+
+MODULE_IMPORT_NS("CXL");
diff --git a/drivers/net/ethernet/sfc/efx_cxl.h b/drivers/net/ethernet/sfc/efx_cxl.h
new file mode 100644
index 000000000000..961639cef692
--- /dev/null
+++ b/drivers/net/ethernet/sfc/efx_cxl.h
@@ -0,0 +1,40 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/****************************************************************************
+ * Driver for AMD network controllers and boards
+ * Copyright (C) 2025, Advanced Micro Devices, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation, incorporated herein by reference.
+ */
+
+#ifndef EFX_CXL_H
+#define EFX_CXL_H
+
+#ifdef CONFIG_SFC_CXL
+
+#include <cxl/cxl.h>
+
+struct cxl_root_decoder;
+struct cxl_port;
+struct cxl_endpoint_decoder;
+struct cxl_region;
+struct efx_probe_data;
+
+struct efx_cxl {
+ struct cxl_dev_state cxlds;
+ struct cxl_memdev *cxlmd;
+ struct cxl_root_decoder *cxlrd;
+ struct cxl_port *endpoint;
+ struct cxl_endpoint_decoder *cxled;
+ struct cxl_region *efx_region;
+ void __iomem *ctpio_cxl;
+};
+
+int efx_cxl_init(struct efx_probe_data *probe_data);
+void efx_cxl_exit(struct efx_probe_data *probe_data);
+#else
+static inline int efx_cxl_init(struct efx_probe_data *probe_data) { return 0; }
+static inline void efx_cxl_exit(struct efx_probe_data *probe_data) {}
+#endif
+#endif
diff --git a/drivers/net/ethernet/sfc/net_driver.h b/drivers/net/ethernet/sfc/net_driver.h
index 5c0f306fb019..0e685b8a9980 100644
--- a/drivers/net/ethernet/sfc/net_driver.h
+++ b/drivers/net/ethernet/sfc/net_driver.h
@@ -1199,14 +1199,24 @@ struct efx_nic {
atomic_t n_rx_noskb_drops;
};
+#ifdef CONFIG_SFC_CXL
+struct efx_cxl;
+#endif
+
/**
* struct efx_probe_data - State after hardware probe
* @pci_dev: The PCI device
* @efx: Efx NIC details
+ * @cxl: details of related cxl objects
+ * @cxl_pio_initialised: cxl initialization outcome.
*/
struct efx_probe_data {
struct pci_dev *pci_dev;
struct efx_nic efx;
+#ifdef CONFIG_SFC_CXL
+ struct efx_cxl *cxl;
+ bool cxl_pio_initialised;
+#endif
};
static inline struct efx_nic *efx_netdev_priv(struct net_device *dev)
--
2.34.1
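struct efx_cxl places its struct cxl_dev_state first so that cxl_dev_state_create() can allocate the whole driver structure and hand it back already cast. Below is a minimal userspace sketch of that embed-and-cast pattern. It assumes GCC or Clang for statement expressions, __typeof__ and __builtin_types_compatible_p. The simplified cxl_dev_state fields, `struct my_accel`, `dev_state_alloc()` and `DEV_STATE_CREATE()` are hypothetical stand-ins, not the CXL core API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified, hypothetical stand-in for the CXL core's struct cxl_dev_state. */
struct cxl_dev_state {
	int type;
	unsigned long long serial;
	unsigned short dvsec;
};

/* Hypothetical driver wrapper modelled on struct efx_cxl: core state first. */
struct my_accel {
	struct cxl_dev_state cxlds;
	void *region;
};

/*
 * Hypothetical stand-in for _cxl_dev_state_create(): allocate @size bytes
 * (the whole driver struct) and initialize the leading cxl_dev_state.
 */
static struct cxl_dev_state *dev_state_alloc(int type, unsigned long long serial,
					     unsigned short dvsec, size_t size)
{
	struct cxl_dev_state *cxlds = calloc(1, size);

	if (!cxlds)
		return NULL;
	cxlds->type = type;
	cxlds->serial = serial;
	cxlds->dvsec = dvsec;
	return cxlds;
}

/*
 * Same compile-time checks as cxl_dev_state_create(): @member must be a
 * struct cxl_dev_state and must sit at offset 0, so the final cast is valid.
 */
#define DEV_STATE_CREATE(type, serial, dvsec, drv_struct, member)		\
	({									\
		static_assert(__builtin_types_compatible_p(			\
				      struct cxl_dev_state,			\
				      __typeof__(((drv_struct *)NULL)->member)), \
			      "member must be a struct cxl_dev_state");		\
		static_assert(offsetof(drv_struct, member) == 0,		\
			      "member must be the first field");		\
		(drv_struct *)dev_state_alloc(type, serial, dvsec,		\
					      sizeof(drv_struct));		\
	})
```

Because both checks are compile-time assertions, moving `cxlds` away from offset 0 or changing its type becomes a build failure rather than a bad pointer cast at runtime.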
* [PATCH v16 03/22] cxl: Move pci generic code
From: alejandro.lucero-palau @ 2025-05-14 13:27 UTC
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Ben Cheatham, Fan Ni, Jonathan Cameron
From: Alejandro Lucero <alucerop@amd.com>
Inside cxl/core/pci.c there are helpers for CXL PCIe initialization,
while cxl/pci.c implements the functionality for Type3 device
initialization.
Move helper functions from cxl/pci.c to cxl/core/pci.c so that they can
be exported and shared with CXL Type2 device initialization.
Fix the cxl mock tests affected by the code move.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
drivers/cxl/core/core.h | 2 +
drivers/cxl/core/pci.c | 62 +++++++++++++++++++++++++++++++
drivers/cxl/core/regs.c | 1 -
drivers/cxl/cxl.h | 2 -
drivers/cxl/cxlpci.h | 2 +
drivers/cxl/pci.c | 70 -----------------------------------
include/cxl/pci.h | 13 +++++++
tools/testing/cxl/Kbuild | 1 -
tools/testing/cxl/test/mock.c | 17 ---------
9 files changed, 79 insertions(+), 91 deletions(-)
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index 17b692eb3257..2f39944074f6 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -134,4 +134,6 @@ int cxl_set_feature(struct cxl_mailbox *cxl_mbox, const uuid_t *feat_uuid,
u16 *return_code);
#endif
+resource_size_t cxl_rcd_component_reg_phys(struct device *dev,
+ struct cxl_dport *dport);
#endif /* __CXL_CORE_H__ */
diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index 0eb339c91413..447dc8d3138f 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -1033,6 +1033,68 @@ bool cxl_endpoint_decoder_reset_detected(struct cxl_port *port)
}
EXPORT_SYMBOL_NS_GPL(cxl_endpoint_decoder_reset_detected, "CXL");
+static int cxl_rcrb_get_comp_regs(struct pci_dev *pdev,
+ struct cxl_register_map *map,
+ struct cxl_dport *dport)
+{
+ resource_size_t component_reg_phys;
+
+ *map = (struct cxl_register_map) {
+ .host = &pdev->dev,
+ .resource = CXL_RESOURCE_NONE,
+ };
+
+ struct cxl_port *port __free(put_cxl_port) =
+ cxl_pci_find_port(pdev, &dport);
+ if (!port)
+ return -EPROBE_DEFER;
+
+ component_reg_phys = cxl_rcd_component_reg_phys(&pdev->dev, dport);
+ if (component_reg_phys == CXL_RESOURCE_NONE)
+ return -ENXIO;
+
+ map->resource = component_reg_phys;
+ map->reg_type = CXL_REGLOC_RBI_COMPONENT;
+ map->max_size = CXL_COMPONENT_REG_BLOCK_SIZE;
+
+ return 0;
+}
+
+int cxl_pci_setup_regs(struct pci_dev *pdev, enum cxl_regloc_type type,
+ struct cxl_register_map *map)
+{
+ int rc;
+
+ rc = cxl_find_regblock(pdev, type, map);
+
+ /*
+ * If the Register Locator DVSEC does not exist, check if it
+ * is an RCH and try to extract the Component Registers from
+ * an RCRB.
+ */
+ if (rc && type == CXL_REGLOC_RBI_COMPONENT && is_cxl_restricted(pdev)) {
+ struct cxl_dport *dport;
+ struct cxl_port *port __free(put_cxl_port) =
+ cxl_pci_find_port(pdev, &dport);
+ if (!port)
+ return -EPROBE_DEFER;
+
+ rc = cxl_rcrb_get_comp_regs(pdev, map, dport);
+ if (rc)
+ return rc;
+
+ rc = cxl_dport_map_rcd_linkcap(pdev, dport);
+ if (rc)
+ return rc;
+
+ } else if (rc) {
+ return rc;
+ }
+
+ return cxl_setup_regs(map);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_pci_setup_regs, "CXL");
+
int cxl_pci_get_bandwidth(struct pci_dev *pdev, struct access_coordinate *c)
{
int speed, bw;
diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c
index ecdb22ae6952..fdb99d05a66c 100644
--- a/drivers/cxl/core/regs.c
+++ b/drivers/cxl/core/regs.c
@@ -642,4 +642,3 @@ resource_size_t cxl_rcd_component_reg_phys(struct device *dev,
return CXL_RESOURCE_NONE;
return __rcrb_to_component(dev, &dport->rcrb, CXL_RCRB_UPSTREAM);
}
-EXPORT_SYMBOL_NS_GPL(cxl_rcd_component_reg_phys, "CXL");
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 844dc0782a5f..b60738f5d11a 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -221,8 +221,6 @@ int cxl_find_regblock(struct pci_dev *pdev, enum cxl_regloc_type type,
struct cxl_register_map *map);
int cxl_setup_regs(struct cxl_register_map *map);
struct cxl_dport;
-resource_size_t cxl_rcd_component_reg_phys(struct device *dev,
- struct cxl_dport *dport);
int cxl_dport_map_rcd_linkcap(struct pci_dev *pdev, struct cxl_dport *dport);
#define CXL_RESOURCE_NONE ((resource_size_t) -1)
diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
index 570e53e26f11..0611d96d76da 100644
--- a/drivers/cxl/cxlpci.h
+++ b/drivers/cxl/cxlpci.h
@@ -114,4 +114,6 @@ void read_cdat_data(struct cxl_port *port);
void cxl_cor_error_detected(struct pci_dev *pdev);
pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
pci_channel_state_t state);
+int cxl_pci_setup_regs(struct pci_dev *pdev, enum cxl_regloc_type type,
+ struct cxl_register_map *map);
#endif /* __CXL_PCI_H__ */
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 0d3c67867965..57f125e39051 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -467,76 +467,6 @@ static int cxl_pci_setup_mailbox(struct cxl_memdev_state *mds, bool irq_avail)
return 0;
}
-/*
- * Assume that any RCIEP that emits the CXL memory expander class code
- * is an RCD
- */
-static bool is_cxl_restricted(struct pci_dev *pdev)
-{
- return pci_pcie_type(pdev) == PCI_EXP_TYPE_RC_END;
-}
-
-static int cxl_rcrb_get_comp_regs(struct pci_dev *pdev,
- struct cxl_register_map *map,
- struct cxl_dport *dport)
-{
- resource_size_t component_reg_phys;
-
- *map = (struct cxl_register_map) {
- .host = &pdev->dev,
- .resource = CXL_RESOURCE_NONE,
- };
-
- struct cxl_port *port __free(put_cxl_port) =
- cxl_pci_find_port(pdev, &dport);
- if (!port)
- return -EPROBE_DEFER;
-
- component_reg_phys = cxl_rcd_component_reg_phys(&pdev->dev, dport);
- if (component_reg_phys == CXL_RESOURCE_NONE)
- return -ENXIO;
-
- map->resource = component_reg_phys;
- map->reg_type = CXL_REGLOC_RBI_COMPONENT;
- map->max_size = CXL_COMPONENT_REG_BLOCK_SIZE;
-
- return 0;
-}
-
-static int cxl_pci_setup_regs(struct pci_dev *pdev, enum cxl_regloc_type type,
- struct cxl_register_map *map)
-{
- int rc;
-
- rc = cxl_find_regblock(pdev, type, map);
-
- /*
- * If the Register Locator DVSEC does not exist, check if it
- * is an RCH and try to extract the Component Registers from
- * an RCRB.
- */
- if (rc && type == CXL_REGLOC_RBI_COMPONENT && is_cxl_restricted(pdev)) {
- struct cxl_dport *dport;
- struct cxl_port *port __free(put_cxl_port) =
- cxl_pci_find_port(pdev, &dport);
- if (!port)
- return -EPROBE_DEFER;
-
- rc = cxl_rcrb_get_comp_regs(pdev, map, dport);
- if (rc)
- return rc;
-
- rc = cxl_dport_map_rcd_linkcap(pdev, dport);
- if (rc)
- return rc;
-
- } else if (rc) {
- return rc;
- }
-
- return cxl_setup_regs(map);
-}
-
static int cxl_pci_ras_unmask(struct pci_dev *pdev)
{
struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
diff --git a/include/cxl/pci.h b/include/cxl/pci.h
index 5729a93b252a..e1a1727de3b3 100644
--- a/include/cxl/pci.h
+++ b/include/cxl/pci.h
@@ -4,6 +4,19 @@
#ifndef __CXL_CXL_PCI_H__
#define __CXL_CXL_PCI_H__
+#include <linux/pci.h>
+
+/*
+ * Assume that the caller has already validated that @pdev has CXL
+ * capabilities. Any RCiEP with CXL capabilities is treated as a
+ * Restricted CXL Device (RCD) and finds its upstream port and endpoint
+ * registers in a Root Complex Register Block (RCRB).
+ */
+static inline bool is_cxl_restricted(struct pci_dev *pdev)
+{
+ return pci_pcie_type(pdev) == PCI_EXP_TYPE_RC_END;
+}
+
/* CXL 2.0 8.1.3: PCIe DVSEC for CXL Device */
#define CXL_DVSEC_PCIE_DEVICE 0
#define CXL_DVSEC_CAP_OFFSET 0xA
diff --git a/tools/testing/cxl/Kbuild b/tools/testing/cxl/Kbuild
index 387f3df8b988..2455fabc317d 100644
--- a/tools/testing/cxl/Kbuild
+++ b/tools/testing/cxl/Kbuild
@@ -12,7 +12,6 @@ ldflags-y += --wrap=cxl_await_media_ready
ldflags-y += --wrap=cxl_hdm_decode_init
ldflags-y += --wrap=cxl_dvsec_rr_decode
ldflags-y += --wrap=devm_cxl_add_rch_dport
-ldflags-y += --wrap=cxl_rcd_component_reg_phys
ldflags-y += --wrap=cxl_endpoint_parse_cdat
ldflags-y += --wrap=cxl_dport_init_ras_reporting
diff --git a/tools/testing/cxl/test/mock.c b/tools/testing/cxl/test/mock.c
index af2594e4f35d..3c6a071fbbe3 100644
--- a/tools/testing/cxl/test/mock.c
+++ b/tools/testing/cxl/test/mock.c
@@ -268,23 +268,6 @@ struct cxl_dport *__wrap_devm_cxl_add_rch_dport(struct cxl_port *port,
}
EXPORT_SYMBOL_NS_GPL(__wrap_devm_cxl_add_rch_dport, "CXL");
-resource_size_t __wrap_cxl_rcd_component_reg_phys(struct device *dev,
- struct cxl_dport *dport)
-{
- int index;
- resource_size_t component_reg_phys;
- struct cxl_mock_ops *ops = get_cxl_mock_ops(&index);
-
- if (ops && ops->is_mock_port(dev))
- component_reg_phys = CXL_RESOURCE_NONE;
- else
- component_reg_phys = cxl_rcd_component_reg_phys(dev, dport);
- put_cxl_mock_ops(index);
-
- return component_reg_phys;
-}
-EXPORT_SYMBOL_NS_GPL(__wrap_cxl_rcd_component_reg_phys, "CXL");
-
void __wrap_cxl_endpoint_parse_cdat(struct cxl_port *port)
{
int index;
--
2.34.1
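The next patch replaces the hard-coded status/mailbox/memdev check in cxl_probe_regs() with capability bitmaps. The caller then compares what was found against what it expects, using bitmap_subset() and bitmap_andnot() in cxl_check_caps(). Below is a minimal sketch of the same logic that uses a single unsigned long as the bitmap. The enum values and the `check_caps()` name are assumptions for illustration, not the kernel definitions.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed bit positions; the names mirror the patch's CXL_DEV_CAP_* enum. */
enum cap {
	CAP_RAS,
	CAP_HDM,
	CAP_DEV_STATUS,
	CAP_MAILBOX_PRIMARY,
	CAP_MEMDEV,
	CAP_MAX
};

static const char *const cap_name[CAP_MAX] = {
	[CAP_RAS] = "CAP_RAS",
	[CAP_HDM] = "CAP_HDM",
	[CAP_DEV_STATUS] = "CAP_DEV_STATUS",
	[CAP_MAILBOX_PRIMARY] = "CAP_MAILBOX_PRIMARY",
	[CAP_MEMDEV] = "CAP_MEMDEV",
};

/*
 * Return 0 if every expected bit is also set in @found; otherwise report
 * each missing capability and return -1, mirroring cxl_check_caps().
 */
static int check_caps(unsigned long expected, unsigned long found)
{
	unsigned long missing;

	/* Only the low CAP_MAX bits carry capabilities. */
	assert(expected < (1UL << CAP_MAX) && found < (1UL << CAP_MAX));

	missing = expected & ~found;	/* like bitmap_andnot() */
	if (!missing)			/* like bitmap_subset(expected, found) */
		return 0;

	for (int i = 0; i < CAP_MAX; i++)
		if (missing & (1UL << i))
			fprintf(stderr, "%s capability not found\n", cap_name[i]);
	return -1;
}
```

With the Type3 expectations set in cxl_pci_probe() (HDM, device status, primary mailbox, memdev), a device that additionally reports RAS passes, while one that lacks the mailbox fails and names each missing capability.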
* [PATCH v16 04/22] cxl: Move register/capability check to driver
From: alejandro.lucero-palau @ 2025-05-14 13:27 UTC
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Jonathan Cameron
From: Alejandro Lucero <alucerop@amd.com>
Type3 has some mandatory capabilities which are optional for Type2.
In order to use the same register/capability discovery code for both
types, avoid any assumption about which capabilities must be present.
Instead, export the capabilities found so that the caller can check
them against the ones it expects.
Add a function that reports any expected capabilities that are missing.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
drivers/cxl/core/pci.c | 41 +++++++++++++++++++++++++++++++++++++++--
drivers/cxl/core/port.c | 8 ++++----
drivers/cxl/core/regs.c | 38 ++++++++++++++++++++++----------------
drivers/cxl/cxl.h | 6 +++---
drivers/cxl/cxlpci.h | 2 +-
drivers/cxl/pci.c | 24 +++++++++++++++++++++---
include/cxl/cxl.h | 24 ++++++++++++++++++++++++
7 files changed, 114 insertions(+), 29 deletions(-)
diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index 447dc8d3138f..e2b6420592de 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -1061,7 +1061,7 @@ static int cxl_rcrb_get_comp_regs(struct pci_dev *pdev,
}
int cxl_pci_setup_regs(struct pci_dev *pdev, enum cxl_regloc_type type,
- struct cxl_register_map *map)
+ struct cxl_register_map *map, unsigned long *caps)
{
int rc;
@@ -1091,7 +1091,7 @@ int cxl_pci_setup_regs(struct pci_dev *pdev, enum cxl_regloc_type type,
return rc;
}
- return cxl_setup_regs(map);
+ return cxl_setup_regs(map, caps);
}
EXPORT_SYMBOL_NS_GPL(cxl_pci_setup_regs, "CXL");
@@ -1218,3 +1218,40 @@ int cxl_gpf_port_setup(struct cxl_dport *dport)
return 0;
}
+
+/**
+ * cxl_check_caps - check expected caps are included in the found caps.
+ *
+ * @pdev: device checking the caps
+ * @expected: capabilities expected by the driver
+ * @found: capabilities found
+ *
+ * Return: 0 if all expected capabilities were found, -1 otherwise.
+ */
+int cxl_check_caps(struct pci_dev *pdev, unsigned long *expected,
+ unsigned long *found)
+{
+ static const char * const cap_name[CXL_MAX_CAPS] = {
+ [CXL_DEV_CAP_RAS] = "CXL_DEV_CAP_RAS",
+ [CXL_DEV_CAP_HDM] = "CXL_DEV_CAP_HDM",
+ [CXL_DEV_CAP_DEV_STATUS] = "CXL_DEV_CAP_DEV_STATUS",
+ [CXL_DEV_CAP_MAILBOX_PRIMARY] = "CXL_DEV_CAP_MAILBOX_PRIMARY",
+ [CXL_DEV_CAP_MEMDEV] = "CXL_DEV_CAP_MEMDEV"
+ };
+ DECLARE_BITMAP(missing, CXL_MAX_CAPS);
+
+ if (bitmap_subset(expected, found, CXL_MAX_CAPS))
+ /* all good */
+ return 0;
+
+ bitmap_andnot(missing, expected, found, CXL_MAX_CAPS);
+
+ for (int i = 0; i < CXL_MAX_CAPS; i++) {
+ if (test_bit(i, missing))
+ dev_err(&pdev->dev, "%s capability not found\n",
+ cap_name[i]);
+ }
+
+ return -1;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_check_caps, "CXL");
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 726bd4a7de27..7a105687d450 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -755,7 +755,7 @@ static struct cxl_port *cxl_port_alloc(struct device *uport_dev,
}
static int cxl_setup_comp_regs(struct device *host, struct cxl_register_map *map,
- resource_size_t component_reg_phys)
+ resource_size_t component_reg_phys, unsigned long *caps)
{
*map = (struct cxl_register_map) {
.host = host,
@@ -769,7 +769,7 @@ static int cxl_setup_comp_regs(struct device *host, struct cxl_register_map *map
map->reg_type = CXL_REGLOC_RBI_COMPONENT;
map->max_size = CXL_COMPONENT_REG_BLOCK_SIZE;
- return cxl_setup_regs(map);
+ return cxl_setup_regs(map, caps);
}
static int cxl_port_setup_regs(struct cxl_port *port,
@@ -778,7 +778,7 @@ static int cxl_port_setup_regs(struct cxl_port *port,
if (dev_is_platform(port->uport_dev))
return 0;
return cxl_setup_comp_regs(&port->dev, &port->reg_map,
- component_reg_phys);
+ component_reg_phys, NULL);
}
static int cxl_dport_setup_regs(struct device *host, struct cxl_dport *dport,
@@ -795,7 +795,7 @@ static int cxl_dport_setup_regs(struct device *host, struct cxl_dport *dport,
* NULL.
*/
rc = cxl_setup_comp_regs(dport->dport_dev, &dport->reg_map,
- component_reg_phys);
+ component_reg_phys, NULL);
dport->reg_map.host = host;
return rc;
}
diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c
index fdb99d05a66c..2ba997106434 100644
--- a/drivers/cxl/core/regs.c
+++ b/drivers/cxl/core/regs.c
@@ -4,6 +4,7 @@
#include <linux/device.h>
#include <linux/slab.h>
#include <linux/pci.h>
+#include <cxl/cxl.h>
#include <cxl/pci.h>
#include <cxlmem.h>
#include <cxlpci.h>
@@ -11,6 +12,12 @@
#include "core.h"
+static void cxl_cap_set_bit(int bit, unsigned long *caps)
+{
+ if (caps)
+ set_bit(bit, caps);
+}
+
/**
* DOC: cxl registers
*
@@ -30,6 +37,7 @@
* @dev: Host device of the @base mapping
* @base: Mapping containing the HDM Decoder Capability Header
* @map: Map object describing the register block information found
+ * @caps: capabilities to be set when discovered
*
* See CXL 2.0 8.2.4 Component Register Layout and Definition
* See CXL 2.0 8.2.5.5 CXL Device Register Interface
@@ -37,7 +45,8 @@
* Probe for component register information and return it in map object.
*/
void cxl_probe_component_regs(struct device *dev, void __iomem *base,
- struct cxl_component_reg_map *map)
+ struct cxl_component_reg_map *map,
+ unsigned long *caps)
{
int cap, cap_count;
u32 cap_array;
@@ -85,6 +94,7 @@ void cxl_probe_component_regs(struct device *dev, void __iomem *base,
decoder_cnt = cxl_hdm_decoder_count(hdr);
length = 0x20 * decoder_cnt + 0x10;
rmap = &map->hdm_decoder;
+ cxl_cap_set_bit(CXL_DEV_CAP_HDM, caps);
break;
}
case CXL_CM_CAP_CAP_ID_RAS:
@@ -92,6 +102,7 @@ void cxl_probe_component_regs(struct device *dev, void __iomem *base,
offset);
length = CXL_RAS_CAPABILITY_LENGTH;
rmap = &map->ras;
+ cxl_cap_set_bit(CXL_DEV_CAP_RAS, caps);
break;
default:
dev_dbg(dev, "Unknown CM cap ID: %d (0x%x)\n", cap_id,
@@ -114,11 +125,12 @@ EXPORT_SYMBOL_NS_GPL(cxl_probe_component_regs, "CXL");
* @dev: Host device of the @base mapping
* @base: Mapping of CXL 2.0 8.2.8 CXL Device Register Interface
* @map: Map object describing the register block information found
+ * @caps: capabilities to be set when discovered
*
* Probe for device register information and return it in map object.
*/
void cxl_probe_device_regs(struct device *dev, void __iomem *base,
- struct cxl_device_reg_map *map)
+ struct cxl_device_reg_map *map, unsigned long *caps)
{
int cap, cap_count;
u64 cap_array;
@@ -147,10 +159,12 @@ void cxl_probe_device_regs(struct device *dev, void __iomem *base,
case CXLDEV_CAP_CAP_ID_DEVICE_STATUS:
dev_dbg(dev, "found Status capability (0x%x)\n", offset);
rmap = &map->status;
+ cxl_cap_set_bit(CXL_DEV_CAP_DEV_STATUS, caps);
break;
case CXLDEV_CAP_CAP_ID_PRIMARY_MAILBOX:
dev_dbg(dev, "found Mailbox capability (0x%x)\n", offset);
rmap = &map->mbox;
+ cxl_cap_set_bit(CXL_DEV_CAP_MAILBOX_PRIMARY, caps);
break;
case CXLDEV_CAP_CAP_ID_SECONDARY_MAILBOX:
dev_dbg(dev, "found Secondary Mailbox capability (0x%x)\n", offset);
@@ -158,6 +172,7 @@ void cxl_probe_device_regs(struct device *dev, void __iomem *base,
case CXLDEV_CAP_CAP_ID_MEMDEV:
dev_dbg(dev, "found Memory Device capability (0x%x)\n", offset);
rmap = &map->memdev;
+ cxl_cap_set_bit(CXL_DEV_CAP_MEMDEV, caps);
break;
default:
if (cap_id >= 0x8000)
@@ -434,7 +449,7 @@ static void cxl_unmap_regblock(struct cxl_register_map *map)
map->base = NULL;
}
-static int cxl_probe_regs(struct cxl_register_map *map)
+static int cxl_probe_regs(struct cxl_register_map *map, unsigned long *caps)
{
struct cxl_component_reg_map *comp_map;
struct cxl_device_reg_map *dev_map;
@@ -444,21 +459,12 @@ static int cxl_probe_regs(struct cxl_register_map *map)
switch (map->reg_type) {
case CXL_REGLOC_RBI_COMPONENT:
comp_map = &map->component_map;
- cxl_probe_component_regs(host, base, comp_map);
+ cxl_probe_component_regs(host, base, comp_map, caps);
dev_dbg(host, "Set up component registers\n");
break;
case CXL_REGLOC_RBI_MEMDEV:
dev_map = &map->device_map;
- cxl_probe_device_regs(host, base, dev_map);
- if (!dev_map->status.valid || !dev_map->mbox.valid ||
- !dev_map->memdev.valid) {
- dev_err(host, "registers not found: %s%s%s\n",
- !dev_map->status.valid ? "status " : "",
- !dev_map->mbox.valid ? "mbox " : "",
- !dev_map->memdev.valid ? "memdev " : "");
- return -ENXIO;
- }
-
+ cxl_probe_device_regs(host, base, dev_map, caps);
dev_dbg(host, "Probing device registers...\n");
break;
default:
@@ -468,7 +474,7 @@ static int cxl_probe_regs(struct cxl_register_map *map)
return 0;
}
-int cxl_setup_regs(struct cxl_register_map *map)
+int cxl_setup_regs(struct cxl_register_map *map, unsigned long *caps)
{
int rc;
@@ -476,7 +482,7 @@ int cxl_setup_regs(struct cxl_register_map *map)
if (rc)
return rc;
- rc = cxl_probe_regs(map);
+ rc = cxl_probe_regs(map, caps);
cxl_unmap_regblock(map);
return rc;
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index b60738f5d11a..dfe8a04b0ea2 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -202,9 +202,9 @@ static inline int ways_to_eiw(unsigned int ways, u8 *eiw)
#define CXLDEV_MBOX_PAYLOAD_OFFSET 0x20
void cxl_probe_component_regs(struct device *dev, void __iomem *base,
- struct cxl_component_reg_map *map);
+ struct cxl_component_reg_map *map, unsigned long *caps);
void cxl_probe_device_regs(struct device *dev, void __iomem *base,
- struct cxl_device_reg_map *map);
+ struct cxl_device_reg_map *map, unsigned long *caps);
int cxl_map_component_regs(const struct cxl_register_map *map,
struct cxl_component_regs *regs,
unsigned long map_mask);
@@ -219,7 +219,7 @@ int cxl_find_regblock_instance(struct pci_dev *pdev, enum cxl_regloc_type type,
struct cxl_register_map *map, unsigned int index);
int cxl_find_regblock(struct pci_dev *pdev, enum cxl_regloc_type type,
struct cxl_register_map *map);
-int cxl_setup_regs(struct cxl_register_map *map);
+int cxl_setup_regs(struct cxl_register_map *map, unsigned long *caps);
struct cxl_dport;
int cxl_dport_map_rcd_linkcap(struct pci_dev *pdev, struct cxl_dport *dport);
diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
index 0611d96d76da..e003495295a0 100644
--- a/drivers/cxl/cxlpci.h
+++ b/drivers/cxl/cxlpci.h
@@ -115,5 +115,5 @@ void cxl_cor_error_detected(struct pci_dev *pdev);
pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
pci_channel_state_t state);
int cxl_pci_setup_regs(struct pci_dev *pdev, enum cxl_regloc_type type,
- struct cxl_register_map *map);
+ struct cxl_register_map *map, unsigned long *caps);
#endif /* __CXL_PCI_H__ */
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 57f125e39051..694bdfc5b7ea 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -836,6 +836,8 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
struct pci_host_bridge *host_bridge = pci_find_host_bridge(pdev->bus);
struct cxl_dpa_info range_info = { 0 };
+ DECLARE_BITMAP(expected, CXL_MAX_CAPS) = {};
+ DECLARE_BITMAP(found, CXL_MAX_CAPS) = {};
struct cxl_memdev_state *mds;
struct cxl_dev_state *cxlds;
struct cxl_register_map map;
@@ -871,7 +873,16 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
cxlds->rcd = is_cxl_restricted(pdev);
- rc = cxl_pci_setup_regs(pdev, CXL_REGLOC_RBI_MEMDEV, &map);
+ /*
+ * These are the mandatory capabilities for a Type3 device.
+ * Only checking capabilities used by current Linux drivers.
+ */
+ set_bit(CXL_DEV_CAP_HDM, expected);
+ set_bit(CXL_DEV_CAP_DEV_STATUS, expected);
+ set_bit(CXL_DEV_CAP_MAILBOX_PRIMARY, expected);
+ set_bit(CXL_DEV_CAP_MEMDEV, expected);
+
+ rc = cxl_pci_setup_regs(pdev, CXL_REGLOC_RBI_MEMDEV, &map, found);
if (rc)
return rc;
@@ -883,8 +894,8 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
* If the component registers can't be found, the cxl_pci driver may
* still be useful for management functions so don't return an error.
*/
- rc = cxl_pci_setup_regs(pdev, CXL_REGLOC_RBI_COMPONENT,
- &cxlds->reg_map);
+ rc = cxl_pci_setup_regs(pdev, CXL_REGLOC_RBI_COMPONENT, &cxlds->reg_map,
+ found);
if (rc)
dev_warn(&pdev->dev, "No component registers (%d)\n", rc);
else if (!cxlds->reg_map.component_map.ras.valid)
@@ -895,6 +906,13 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
if (rc)
dev_dbg(&pdev->dev, "Failed to map RAS capability.\n");
+ /*
+ * Check that the mandatory caps are, at least, a subset of those
+ * found.
+ */
+ if (cxl_check_caps(pdev, expected, found))
+ return -ENXIO;
+
rc = cxl_pci_type3_init_mailbox(cxlds);
if (rc)
return rc;
diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
index b7f79313409b..412c45a2f351 100644
--- a/include/cxl/cxl.h
+++ b/include/cxl/cxl.h
@@ -25,6 +25,26 @@ enum cxl_devtype {
struct device;
+/*
+ * Capabilities as defined for:
+ *
+ * Component Registers (Table 8-22 CXL 3.2 specification)
+ * Device Registers (8.2.8.2.1 CXL 3.2 specification)
+ *
+ * and currently being used for kernel CXL support.
+ */
+
+enum cxl_dev_cap {
+ /* capabilities from Component Registers */
+ CXL_DEV_CAP_RAS,
+ CXL_DEV_CAP_HDM,
+ /* capabilities from Device Registers */
+ CXL_DEV_CAP_DEV_STATUS,
+ CXL_DEV_CAP_MAILBOX_PRIMARY,
+ CXL_DEV_CAP_MEMDEV,
+ CXL_MAX_CAPS
+};
+
/*
* Using struct_group() allows for per register-block-type helper routines,
* without requiring block-type agnostic code to include the prefix.
@@ -223,4 +243,8 @@ struct cxl_dev_state *_cxl_dev_state_create(struct device *dev,
(drv_struct *)_cxl_dev_state_create(parent, type, serial, dvsec, \
sizeof(drv_struct), mbox); \
})
+
+struct pci_dev;
+int cxl_check_caps(struct pci_dev *pdev, unsigned long *expected,
+ unsigned long *found);
#endif /* __CXL_CXL_H__ */
--
2.34.1
^ permalink raw reply related [flat|nested] 84+ messages in thread
* [PATCH v16 05/22] cxl: Add function for type2 cxl regs setup
2025-05-14 13:27 [PATCH v16 00/22] Type2 device basic support alejandro.lucero-palau
` (3 preceding siblings ...)
2025-05-14 13:27 ` [PATCH v16 04/22] cxl: Move register/capability check to driver alejandro.lucero-palau
@ 2025-05-14 13:27 ` alejandro.lucero-palau
2025-05-20 2:41 ` Alison Schofield
2025-05-21 18:28 ` Dan Williams
2025-05-14 13:27 ` [PATCH v16 06/22] sfc: make regs setup with checking and set media ready alejandro.lucero-palau
` (16 subsequent siblings)
21 siblings, 2 replies; 84+ messages in thread
From: alejandro.lucero-palau @ 2025-05-14 13:27 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Ben Cheatham, Jonathan Cameron
From: Alejandro Lucero <alucerop@amd.com>
Create a new function for Type2 devices that initializes the CXL
register setup and mapping in the cxl_dev_state struct.
Export the capabilities found so the driver can check them against
the ones it expects.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
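The capability bitmaps passed around here are indexed by enum cxl_dev_cap, so the driver-side check reduces to testing whether the expected bits are a subset of the found bits. A minimal userspace sketch of that idea, assuming one unsigned long is wide enough for all CXL_MAX_CAPS bits (it is for the five capabilities defined in this series); cap_set() and caps_subset() are hypothetical helpers, not the kernel's cxl_check_caps():

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors enum cxl_dev_cap as added to include/cxl/cxl.h in this series. */
enum cxl_dev_cap {
	CXL_DEV_CAP_RAS,
	CXL_DEV_CAP_HDM,
	CXL_DEV_CAP_DEV_STATUS,
	CXL_DEV_CAP_MAILBOX_PRIMARY,
	CXL_DEV_CAP_MEMDEV,
	CXL_MAX_CAPS
};

/* Hypothetical helper: record one capability bit in a one-word bitmap. */
static void cap_set(unsigned long *caps, enum cxl_dev_cap cap)
{
	assert(cap < CXL_MAX_CAPS);
	*caps |= 1UL << cap;
}

/* Hypothetical helper: true when every expected bit is also in found. */
static bool caps_subset(unsigned long expected, unsigned long found)
{
	return (expected & ~found) == 0;
}
```

For example, with expected = {HDM, RAS}, a found set of {HDM, RAS, DEV_STATUS} passes, while a found set missing RAS fails; the failing case is where a driver would bail out with -ENXIO.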
drivers/cxl/core/pci.c | 62 ++++++++++++++++++++++++++++++++++++++++++
include/cxl/cxl.h | 3 ++
2 files changed, 65 insertions(+)
diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index e2b6420592de..b05c6e64bfe2 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -1095,6 +1095,68 @@ int cxl_pci_setup_regs(struct pci_dev *pdev, enum cxl_regloc_type type,
}
EXPORT_SYMBOL_NS_GPL(cxl_pci_setup_regs, "CXL");
+static int cxl_pci_accel_setup_memdev_regs(struct pci_dev *pdev,
+ struct cxl_dev_state *cxlds,
+ unsigned long *caps)
+{
+ struct cxl_register_map map;
+ int rc;
+
+ rc = cxl_pci_setup_regs(pdev, CXL_REGLOC_RBI_MEMDEV, &map, caps);
+ /*
+ * This call can return -ENODEV if the regs are not found. This is not
+ * an error for Type2 since these regs are not mandatory. If they do
+ * exist, mapping them should not fail. If they should exist, the
+ * driver calling cxl_check_caps() is where the problem will be found.
+ */
+ if (rc == -ENODEV)
+ return 0;
+
+ if (rc)
+ return rc;
+
+ return cxl_map_device_regs(&map, &cxlds->regs.device_regs);
+}
+
+/**
+ * cxl_pci_accel_setup_regs - initialize the CXL device regs found and
+ * export the capabilities found.
+ *
+ * @pdev: device checking the caps.
+ * @cxlds: pointer to driver cxl_dev_state struct.
+ * @caps: pointer to the caller's capabilities bitmap to set.
+ *
+ * Return: 0 on success or a negative error code.
+ */
+int cxl_pci_accel_setup_regs(struct pci_dev *pdev, struct cxl_dev_state *cxlds,
+ unsigned long *caps)
+{
+ int rc;
+
+ rc = cxl_pci_accel_setup_memdev_regs(pdev, cxlds, caps);
+ if (rc)
+ return rc;
+
+ rc = cxl_pci_setup_regs(pdev, CXL_REGLOC_RBI_COMPONENT,
+ &cxlds->reg_map, caps);
+ if (rc) {
+ dev_warn(&pdev->dev, "No component registers (err=%d)\n", rc);
+ return rc;
+ }
+
+ if (!caps || !test_bit(CXL_DEV_CAP_RAS, caps))
+ return 0;
+
+ rc = cxl_map_component_regs(&cxlds->reg_map,
+ &cxlds->regs.component,
+ BIT(CXL_CM_CAP_CAP_ID_RAS));
+ if (rc)
+ dev_dbg(&pdev->dev, "Failed to map RAS capability.\n");
+
+ return rc;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_pci_accel_setup_regs, "CXL");
+
int cxl_pci_get_bandwidth(struct pci_dev *pdev, struct access_coordinate *c)
{
int speed, bw;
diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
index 412c45a2f351..6ab6dcf81824 100644
--- a/include/cxl/cxl.h
+++ b/include/cxl/cxl.h
@@ -247,4 +247,7 @@ struct cxl_dev_state *_cxl_dev_state_create(struct device *dev,
struct pci_dev;
int cxl_check_caps(struct pci_dev *pdev, unsigned long *expected,
unsigned long *found);
+
+int cxl_pci_accel_setup_regs(struct pci_dev *pdev, struct cxl_dev_state *cxlmds,
+ unsigned long *caps);
#endif /* __CXL_CXL_H__ */
--
2.34.1
* [PATCH v16 06/22] sfc: make regs setup with checking and set media ready
2025-05-14 13:27 [PATCH v16 00/22] Type2 device basic support alejandro.lucero-palau
` (4 preceding siblings ...)
2025-05-14 13:27 ` [PATCH v16 05/22] cxl: Add function for type2 cxl regs setup alejandro.lucero-palau
@ 2025-05-14 13:27 ` alejandro.lucero-palau
2025-05-21 18:34 ` Dan Williams
2025-05-14 13:27 ` [PATCH v16 07/22] cxl: Support dpa initialization without a mailbox alejandro.lucero-palau
` (15 subsequent siblings)
21 siblings, 1 reply; 84+ messages in thread
From: alejandro.lucero-palau @ 2025-05-14 13:27 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Martin Habets, Zhi Wang, Edward Cree,
Jonathan Cameron, Ben Cheatham
From: Alejandro Lucero <alucerop@amd.com>
Use cxl code for register discovery and mapping.
Validate the capabilities found in those registers against the expected
capabilities.
Set media ready explicitly, as there is no means of doing so without a
mailbox or the related CXL register, neither of which is mandatory for
Type2.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
Reviewed-by: Zhi Wang <zhi@nvidia.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
---
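When the capability check fails, a useful diagnostic names each capability that was expected but not found. A minimal userspace sketch, assuming one-word bitmaps indexed by enum cxl_dev_cap; cap_names[] and report_missing() are hypothetical and only illustrate mapping the enums to strings, not how the kernel reports missing capabilities:

```c
#include <assert.h>
#include <string.h>

enum cxl_dev_cap {
	CXL_DEV_CAP_RAS,
	CXL_DEV_CAP_HDM,
	CXL_DEV_CAP_DEV_STATUS,
	CXL_DEV_CAP_MAILBOX_PRIMARY,
	CXL_DEV_CAP_MEMDEV,
	CXL_MAX_CAPS
};

/* Hypothetical enum-to-string table, indexed by enum cxl_dev_cap. */
static const char *const cap_names[CXL_MAX_CAPS] = {
	[CXL_DEV_CAP_RAS] = "RAS",
	[CXL_DEV_CAP_HDM] = "HDM",
	[CXL_DEV_CAP_DEV_STATUS] = "DEV_STATUS",
	[CXL_DEV_CAP_MAILBOX_PRIMARY] = "MAILBOX_PRIMARY",
	[CXL_DEV_CAP_MEMDEV] = "MEMDEV",
};

/*
 * Write the names of expected-but-missing capabilities into buf as a
 * space-separated list (truncated if buf is too small) and return how
 * many capabilities were missing.
 */
static int report_missing(unsigned long expected, unsigned long found,
			  char *buf, size_t len)
{
	unsigned long missing = expected & ~found;
	int count = 0;

	assert(len > 0);
	buf[0] = '\0';
	for (int cap = 0; cap < CXL_MAX_CAPS; cap++) {
		if (!(missing & (1UL << cap)))
			continue;
		if (count)
			strncat(buf, " ", len - strlen(buf) - 1);
		strncat(buf, cap_names[cap], len - strlen(buf) - 1);
		count++;
	}
	return count;
}
```

For the sfc case (expected HDM and RAS) on a device that only reports DEV_STATUS, report_missing() returns 2 and fills the buffer with "RAS HDM".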
drivers/net/ethernet/sfc/efx_cxl.c | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)
diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
index 753d5b7d49b6..e94af8bf3a79 100644
--- a/drivers/net/ethernet/sfc/efx_cxl.c
+++ b/drivers/net/ethernet/sfc/efx_cxl.c
@@ -19,10 +19,13 @@
int efx_cxl_init(struct efx_probe_data *probe_data)
{
+ DECLARE_BITMAP(expected, CXL_MAX_CAPS) = {};
+ DECLARE_BITMAP(found, CXL_MAX_CAPS) = {};
struct efx_nic *efx = &probe_data->efx;
struct pci_dev *pci_dev = efx->pci_dev;
struct efx_cxl *cxl;
u16 dvsec;
+ int rc;
probe_data->cxl_pio_initialised = false;
@@ -43,6 +46,29 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
if (!cxl)
return -ENOMEM;
+ set_bit(CXL_DEV_CAP_HDM, expected);
+ set_bit(CXL_DEV_CAP_RAS, expected);
+
+ rc = cxl_pci_accel_setup_regs(pci_dev, &cxl->cxlds, found);
+ if (rc) {
+ pci_err(pci_dev, "CXL accel setup regs failed\n");
+ return rc;
+ }
+
+ /*
+ * Check that the mandatory caps are, at least, a subset of those
+ * found.
+ */
+ if (cxl_check_caps(pci_dev, expected, found))
+ return -ENXIO;
+
+ /*
+ * Set media ready explicitly, as there is neither a mailbox for checking
+ * this state nor the related CXL register; neither is mandatory for
+ * Type2.
+ */
+ cxl->cxlds.media_ready = true;
+
probe_data->cxl = cxl;
return 0;
--
2.34.1
* [PATCH v16 07/22] cxl: Support dpa initialization without a mailbox
2025-05-14 13:27 [PATCH v16 00/22] Type2 device basic support alejandro.lucero-palau
` (5 preceding siblings ...)
2025-05-14 13:27 ` [PATCH v16 06/22] sfc: make regs setup with checking and set media ready alejandro.lucero-palau
@ 2025-05-14 13:27 ` alejandro.lucero-palau
2025-05-20 2:40 ` Alison Schofield
2025-05-21 18:47 ` Dan Williams
2025-05-14 13:27 ` [PATCH v16 08/22] sfc: initialize dpa alejandro.lucero-palau
` (14 subsequent siblings)
21 siblings, 2 replies; 84+ messages in thread
From: alejandro.lucero-palau @ 2025-05-14 13:27 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Ben Cheatham, Jonathan Cameron
From: Alejandro Lucero <alucerop@amd.com>
Type3 relies on the mailbox CXL_MBOX_OP_IDENTIFY command for initializing
memdev state params, which end up being used for DPA initialization.
Allow a Type2 driver to initialize DPA simply by giving the size of its
volatile and/or non-volatile hardware partitions.
Export cxl_dpa_setup as well for initializing those added DPA partitions
with the proper resources.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
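Because cxl_mem_dpa_init() only receives sizes, the partition layout it produces is fixed: RAM starts at DPA 0 and PMEM starts immediately after it. A minimal userspace sketch of that layout, using simplified copies of the structures this patch moves to include/cxl/cxl.h; the zero-size skip in add_part() is an assumption about the existing helper, which is only partially visible in this diff, and dpa_init() is a hypothetical stand-in for the kernel function:

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64;

/* Simplified stand-ins for the kernel types used by struct cxl_dpa_info. */
enum cxl_partition_mode { CXL_PARTMODE_RAM, CXL_PARTMODE_PMEM };

struct range {
	u64 start, end;		/* inclusive end, as in the kernel */
};

#define CXL_NR_PARTITIONS_MAX 2

struct cxl_dpa_info {
	u64 size;
	struct cxl_dpa_part_info {
		struct range range;
		enum cxl_partition_mode mode;
	} part[CXL_NR_PARTITIONS_MAX];
	int nr_partitions;
};

/* Assumed behaviour: zero-sized partitions are not recorded. */
static void add_part(struct cxl_dpa_info *info, u64 start, u64 size,
		     enum cxl_partition_mode mode)
{
	int i = info->nr_partitions;

	if (size == 0)
		return;
	assert(i < CXL_NR_PARTITIONS_MAX);
	info->part[i].range = (struct range){ start, start + size - 1 };
	info->part[i].mode = mode;
	info->nr_partitions++;
}

/* Same layout as cxl_mem_dpa_init(): RAM first, PMEM right after it. */
static void dpa_init(struct cxl_dpa_info *info, u64 volatile_bytes,
		     u64 persistent_bytes)
{
	add_part(info, 0, volatile_bytes, CXL_PARTMODE_RAM);
	add_part(info, volatile_bytes, persistent_bytes, CXL_PARTMODE_PMEM);
}
```

A RAM-only accelerator, such as the sfc case later in the series, passes 0 for the persistent size and ends up with a single RAM partition.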
drivers/cxl/core/mbox.c | 26 ++++++++++++++++++++------
drivers/cxl/cxlmem.h | 13 -------------
include/cxl/cxl.h | 14 ++++++++++++++
3 files changed, 34 insertions(+), 19 deletions(-)
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index ab994d459f46..b14cfc6e3dba 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -1284,6 +1284,22 @@ static void add_part(struct cxl_dpa_info *info, u64 start, u64 size, enum cxl_pa
info->nr_partitions++;
}
+/**
+ * cxl_mem_dpa_init - initialize DPA for a driver without a mailbox.
+ *
+ * @info: pointer to cxl_dpa_info
+ * @volatile_bytes: device volatile memory size
+ * @persistent_bytes: device persistent memory size
+ */
+void cxl_mem_dpa_init(struct cxl_dpa_info *info, u64 volatile_bytes,
+ u64 persistent_bytes)
+{
+ add_part(info, 0, volatile_bytes, CXL_PARTMODE_RAM);
+ add_part(info, volatile_bytes, persistent_bytes,
+ CXL_PARTMODE_PMEM);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_mem_dpa_init, "CXL");
+
int cxl_mem_dpa_fetch(struct cxl_memdev_state *mds, struct cxl_dpa_info *info)
{
struct cxl_dev_state *cxlds = &mds->cxlds;
@@ -1298,9 +1314,8 @@ int cxl_mem_dpa_fetch(struct cxl_memdev_state *mds, struct cxl_dpa_info *info)
info->size = mds->total_bytes;
if (mds->partition_align_bytes == 0) {
- add_part(info, 0, mds->volatile_only_bytes, CXL_PARTMODE_RAM);
- add_part(info, mds->volatile_only_bytes,
- mds->persistent_only_bytes, CXL_PARTMODE_PMEM);
+ cxl_mem_dpa_init(info, mds->volatile_only_bytes,
+ mds->persistent_only_bytes);
return 0;
}
@@ -1310,9 +1325,8 @@ int cxl_mem_dpa_fetch(struct cxl_memdev_state *mds, struct cxl_dpa_info *info)
return rc;
}
- add_part(info, 0, mds->active_volatile_bytes, CXL_PARTMODE_RAM);
- add_part(info, mds->active_volatile_bytes, mds->active_persistent_bytes,
- CXL_PARTMODE_PMEM);
+ cxl_mem_dpa_init(info, mds->active_volatile_bytes,
+ mds->active_persistent_bytes);
return 0;
}
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index e7cd31b9f107..e47f51025efd 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -98,19 +98,6 @@ int devm_cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
resource_size_t base, resource_size_t len,
resource_size_t skipped);
-#define CXL_NR_PARTITIONS_MAX 2
-
-struct cxl_dpa_info {
- u64 size;
- struct cxl_dpa_part_info {
- struct range range;
- enum cxl_partition_mode mode;
- } part[CXL_NR_PARTITIONS_MAX];
- int nr_partitions;
-};
-
-int cxl_dpa_setup(struct cxl_dev_state *cxlds, const struct cxl_dpa_info *info);
-
static inline struct cxl_ep *cxl_ep_load(struct cxl_port *port,
struct cxl_memdev *cxlmd)
{
diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
index 6ab6dcf81824..6d2cebae2ca2 100644
--- a/include/cxl/cxl.h
+++ b/include/cxl/cxl.h
@@ -213,6 +213,17 @@ struct cxl_dev_state {
#endif
};
+#define CXL_NR_PARTITIONS_MAX 2
+
+struct cxl_dpa_info {
+ u64 size;
+ struct cxl_dpa_part_info {
+ struct range range;
+ enum cxl_partition_mode mode;
+ } part[CXL_NR_PARTITIONS_MAX];
+ int nr_partitions;
+};
+
struct cxl_dev_state *_cxl_dev_state_create(struct device *dev,
enum cxl_devtype type, u64 serial,
u16 dvsec, size_t size,
@@ -250,4 +261,7 @@ int cxl_check_caps(struct pci_dev *pdev, unsigned long *expected,
int cxl_pci_accel_setup_regs(struct pci_dev *pdev, struct cxl_dev_state *cxlmds,
unsigned long *caps);
+void cxl_mem_dpa_init(struct cxl_dpa_info *info, u64 volatile_bytes,
+ u64 persistent_bytes);
+int cxl_dpa_setup(struct cxl_dev_state *cxlds, const struct cxl_dpa_info *info);
#endif /* __CXL_CXL_H__ */
--
2.34.1
* [PATCH v16 08/22] sfc: initialize dpa
2025-05-14 13:27 [PATCH v16 00/22] Type2 device basic support alejandro.lucero-palau
` (6 preceding siblings ...)
2025-05-14 13:27 ` [PATCH v16 07/22] cxl: Support dpa initialization without a mailbox alejandro.lucero-palau
@ 2025-05-14 13:27 ` alejandro.lucero-palau
2025-05-14 13:27 ` [PATCH v16 09/22] cxl: Prepare memdev creation for type2 alejandro.lucero-palau
` (13 subsequent siblings)
21 siblings, 0 replies; 84+ messages in thread
From: alejandro.lucero-palau @ 2025-05-14 13:27 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Jonathan Cameron, Edward Cree
From: Alejandro Lucero <alucerop@amd.com>
Use hardcoded values for defining and initializing DPA, as there is no
mailbox available.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
---
drivers/net/ethernet/sfc/efx_cxl.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
index e94af8bf3a79..aac25d936c4b 100644
--- a/drivers/net/ethernet/sfc/efx_cxl.c
+++ b/drivers/net/ethernet/sfc/efx_cxl.c
@@ -23,6 +23,9 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
DECLARE_BITMAP(found, CXL_MAX_CAPS) = {};
struct efx_nic *efx = &probe_data->efx;
struct pci_dev *pci_dev = efx->pci_dev;
+ struct cxl_dpa_info sfc_dpa_info = {
+ .size = EFX_CTPIO_BUFFER_SIZE
+ };
struct efx_cxl *cxl;
u16 dvsec;
int rc;
@@ -69,6 +72,11 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
*/
cxl->cxlds.media_ready = true;
+ cxl_mem_dpa_init(&sfc_dpa_info, EFX_CTPIO_BUFFER_SIZE, 0);
+ rc = cxl_dpa_setup(&cxl->cxlds, &sfc_dpa_info);
+ if (rc)
+ return rc;
+
probe_data->cxl = cxl;
return 0;
--
2.34.1
* [PATCH v16 09/22] cxl: Prepare memdev creation for type2
2025-05-14 13:27 [PATCH v16 00/22] Type2 device basic support alejandro.lucero-palau
` (7 preceding siblings ...)
2025-05-14 13:27 ` [PATCH v16 08/22] sfc: initialize dpa alejandro.lucero-palau
@ 2025-05-14 13:27 ` alejandro.lucero-palau
2025-05-20 2:40 ` Alison Schofield
2025-05-21 18:49 ` Dan Williams
2025-05-14 13:27 ` [PATCH v16 10/22] sfc: create type2 cxl memdev alejandro.lucero-palau
` (12 subsequent siblings)
21 siblings, 2 replies; 84+ messages in thread
From: alejandro.lucero-palau @ 2025-05-14 13:27 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Ben Cheatham, Jonathan Cameron
From: Alejandro Lucero <alucerop@amd.com>
The current cxl core relies on a CXL_DEVTYPE_CLASSMEM type device when
creating a memdev, leading to problems when obtaining cxl_memdev_state
references from a CXL_DEVTYPE_DEVMEM type.
Modify the check for obtaining cxl_memdev_state to add CXL_DEVTYPE_DEVMEM
support.
Make devm_cxl_add_memdev accessible from an accel driver.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
---
drivers/cxl/core/memdev.c | 15 +++++++++++++--
drivers/cxl/cxlmem.h | 2 --
drivers/cxl/mem.c | 25 +++++++++++++++++++------
include/cxl/cxl.h | 2 ++
4 files changed, 34 insertions(+), 10 deletions(-)
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index 6cc732aeb9de..31af5c1ebe11 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -7,6 +7,7 @@
#include <linux/slab.h>
#include <linux/idr.h>
#include <linux/pci.h>
+#include <cxl/cxl.h>
#include <cxlmem.h>
#include "trace.h"
#include "core.h"
@@ -562,9 +563,16 @@ static const struct device_type cxl_memdev_type = {
.groups = cxl_memdev_attribute_groups,
};
+static const struct device_type cxl_accel_memdev_type = {
+ .name = "cxl_accel_memdev",
+ .release = cxl_memdev_release,
+ .devnode = cxl_memdev_devnode,
+};
+
bool is_cxl_memdev(const struct device *dev)
{
- return dev->type == &cxl_memdev_type;
+ return (dev->type == &cxl_memdev_type ||
+ dev->type == &cxl_accel_memdev_type);
}
EXPORT_SYMBOL_NS_GPL(is_cxl_memdev, "CXL");
@@ -689,7 +697,10 @@ static struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds,
dev->parent = cxlds->dev;
dev->bus = &cxl_bus_type;
dev->devt = MKDEV(cxl_mem_major, cxlmd->id);
- dev->type = &cxl_memdev_type;
+ if (cxlds->type == CXL_DEVTYPE_DEVMEM)
+ dev->type = &cxl_accel_memdev_type;
+ else
+ dev->type = &cxl_memdev_type;
device_set_pm_not_required(dev);
INIT_WORK(&cxlmd->detach_work, detach_memdev);
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index e47f51025efd..9fdaf5cf1dd9 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -88,8 +88,6 @@ static inline bool is_cxl_endpoint(struct cxl_port *port)
return is_cxl_memdev(port->uport_dev);
}
-struct cxl_memdev *devm_cxl_add_memdev(struct device *host,
- struct cxl_dev_state *cxlds);
int devm_cxl_sanitize_setup_notifier(struct device *host,
struct cxl_memdev *cxlmd);
struct cxl_memdev_state;
diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
index 9675243bd05b..7f39790d9d98 100644
--- a/drivers/cxl/mem.c
+++ b/drivers/cxl/mem.c
@@ -130,12 +130,18 @@ static int cxl_mem_probe(struct device *dev)
dentry = cxl_debugfs_create_dir(dev_name(dev));
debugfs_create_devm_seqfile(dev, "dpamem", dentry, cxl_mem_dpa_show);
- if (test_bit(CXL_POISON_ENABLED_INJECT, mds->poison.enabled_cmds))
- debugfs_create_file("inject_poison", 0200, dentry, cxlmd,
- &cxl_poison_inject_fops);
- if (test_bit(CXL_POISON_ENABLED_CLEAR, mds->poison.enabled_cmds))
- debugfs_create_file("clear_poison", 0200, dentry, cxlmd,
- &cxl_poison_clear_fops);
+ /*
+ * Avoid poison debugfs files for Type2 devices as they rely on
+ * cxl_memdev_state.
+ */
+ if (mds) {
+ if (test_bit(CXL_POISON_ENABLED_INJECT, mds->poison.enabled_cmds))
+ debugfs_create_file("inject_poison", 0200, dentry, cxlmd,
+ &cxl_poison_inject_fops);
+ if (test_bit(CXL_POISON_ENABLED_CLEAR, mds->poison.enabled_cmds))
+ debugfs_create_file("clear_poison", 0200, dentry, cxlmd,
+ &cxl_poison_clear_fops);
+ }
rc = devm_add_action_or_reset(dev, remove_debugfs, dentry);
if (rc)
@@ -219,6 +225,13 @@ static umode_t cxl_mem_visible(struct kobject *kobj, struct attribute *a, int n)
struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
+ /*
+ * Avoid poison sysfs files for Type2 devices as they rely on
+ * cxl_memdev_state.
+ */
+ if (!mds)
+ return 0;
+
if (a == &dev_attr_trigger_poison_list.attr)
if (!test_bit(CXL_POISON_ENABLED_LIST,
mds->poison.enabled_cmds))
diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
index 6d2cebae2ca2..19d194d98665 100644
--- a/include/cxl/cxl.h
+++ b/include/cxl/cxl.h
@@ -264,4 +264,6 @@ int cxl_pci_accel_setup_regs(struct pci_dev *pdev, struct cxl_dev_state *cxlmds,
void cxl_mem_dpa_init(struct cxl_dpa_info *info, u64 volatile_bytes,
u64 persistent_bytes);
int cxl_dpa_setup(struct cxl_dev_state *cxlds, const struct cxl_dpa_info *info);
+struct cxl_memdev *devm_cxl_add_memdev(struct device *host,
+ struct cxl_dev_state *cxlmds);
#endif /* __CXL_CXL_H__ */
--
2.34.1
* [PATCH v16 10/22] sfc: create type2 cxl memdev
2025-05-14 13:27 [PATCH v16 00/22] Type2 device basic support alejandro.lucero-palau
` (8 preceding siblings ...)
2025-05-14 13:27 ` [PATCH v16 09/22] cxl: Prepare memdev creation for type2 alejandro.lucero-palau
@ 2025-05-14 13:27 ` alejandro.lucero-palau
2025-05-14 13:27 ` [PATCH v16 11/22] cxl: Define a driver interface for HPA free space enumeration alejandro.lucero-palau
` (11 subsequent siblings)
21 siblings, 0 replies; 84+ messages in thread
From: alejandro.lucero-palau @ 2025-05-14 13:27 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Martin Habets, Fan Ni, Edward Cree,
Jonathan Cameron
From: Alejandro Lucero <alucerop@amd.com>
Use the cxl API for creating a CXL memory device from the Type2
cxl_dev_state struct.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
drivers/net/ethernet/sfc/efx_cxl.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
index aac25d936c4b..53ff97ad07f5 100644
--- a/drivers/net/ethernet/sfc/efx_cxl.c
+++ b/drivers/net/ethernet/sfc/efx_cxl.c
@@ -77,6 +77,13 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
if (rc)
return rc;
+ cxl->cxlmd = devm_cxl_add_memdev(&pci_dev->dev, &cxl->cxlds);
+
+ if (IS_ERR(cxl->cxlmd)) {
+ pci_err(pci_dev, "CXL accel memdev creation failed\n");
+ return PTR_ERR(cxl->cxlmd);
+ }
+
probe_data->cxl = cxl;
return 0;
--
2.34.1
* [PATCH v16 11/22] cxl: Define a driver interface for HPA free space enumeration
2025-05-14 13:27 [PATCH v16 00/22] Type2 device basic support alejandro.lucero-palau
` (9 preceding siblings ...)
2025-05-14 13:27 ` [PATCH v16 10/22] sfc: create type2 cxl memdev alejandro.lucero-palau
@ 2025-05-14 13:27 ` alejandro.lucero-palau
2025-05-20 2:36 ` Alison Schofield
2025-05-21 19:31 ` Dan Williams
2025-05-14 13:27 ` [PATCH v16 12/22] sfc: obtain root decoder with enough HPA free space alejandro.lucero-palau
` (10 subsequent siblings)
21 siblings, 2 replies; 84+ messages in thread
From: alejandro.lucero-palau @ 2025-05-14 13:27 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Jonathan Cameron
From: Alejandro Lucero <alucerop@amd.com>
CXL region creation involves allocating capacity from device DPA
(device-physical-address space) and assigning it to decode a given HPA
(host-physical-address space). Before determining how much DPA to
allocate, the amount of available HPA must be determined. Also, not all
HPA is created equal: some specifically targets RAM, some targets PMEM,
some is prepared for device-memory flows like HDM-D and HDM-DB, and some
is host-only (HDM-H).
In order to support Type2 CXL devices, wrap all of those concerns into
an API that retrieves a root decoder (platform CXL window) that fits the
specified constraints and the capacity available for a new region.
Add a complementary function for releasing the reference to such a root
decoder.
Based on https://lore.kernel.org/linux-cxl/168592159290.1948938.13522227102445462976.stgit@dwillia2-xfh.jf.intel.com/
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
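find_max_hpa() applies two filters before measuring free space: the requested flags must be a subset of the decoder's flags, and every requested host bridge must be one of the decoder's targets. A minimal userspace sketch of both checks, using opaque pointers in place of struct device and the decoder's target array; flags_subset() and count_matching_bridges() are hypothetical names, and the inner loop here is bounded by the number of decoder targets:

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values as defined in include/cxl/cxl.h by this patch. */
#define CXL_DECODER_F_RAM   (1UL << 0)
#define CXL_DECODER_F_PMEM  (1UL << 1)
#define CXL_DECODER_F_TYPE2 (1UL << 2)
#define CXL_DECODER_F_MAX   3

/* Equivalent of bitmap_subset() over CXL_DECODER_F_MAX bits of one word. */
static bool flags_subset(unsigned long want, unsigned long have)
{
	unsigned long mask = (1UL << CXL_DECODER_F_MAX) - 1;

	return ((want & ~have) & mask) == 0;
}

/*
 * Count how many requested host bridges appear among the decoder's
 * targets; the decoder qualifies only if the count equals ways.
 */
static int count_matching_bridges(const void *const *bridges, int ways,
				  const void *const *targets, int nr_targets)
{
	int found = 0;

	for (int i = 0; i < ways; i++) {
		for (int j = 0; j < nr_targets; j++) {
			if (bridges[i] == targets[j]) {
				found++;
				break;
			}
		}
	}
	return found;
}
```

A Type2 request for RAM (CXL_DECODER_F_RAM | CXL_DECODER_F_TYPE2) therefore matches a window flagged RAM | PMEM | TYPE2 but not a RAM-only window.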
drivers/cxl/core/region.c | 166 ++++++++++++++++++++++++++++++++++++++
drivers/cxl/cxl.h | 3 +
include/cxl/cxl.h | 11 +++
3 files changed, 180 insertions(+)
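Since resource ends are inclusive, the free space between neighbouring resources a and b is b->start - (a->end + 1), and the tail gap up to the parent's end is parent->end - last->end. A minimal userspace sketch of the largest-gap walk, using a simplified interval struct in place of struct resource and a sorted, non-overlapping array in place of the sibling list (both assumptions for illustration); it tracks a single cursor instead of comparing prev and next, and ignores a parent ending at the top of the address space:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t resource_size_t;

/* Inclusive [start, end] interval, like struct resource. */
struct res {
	resource_size_t start, end;
};

/*
 * Largest free gap inside parent, given children sorted by start and
 * non-overlapping. cursor is the first address not yet claimed.
 */
static resource_size_t max_free_gap(const struct res *parent,
				    const struct res *child, size_t n)
{
	resource_size_t cursor = parent->start;
	resource_size_t max = 0;

	for (size_t i = 0; i < n; i++) {
		/* Gap before this child: child start minus first free addr. */
		if (child[i].start > cursor && child[i].start - cursor > max)
			max = child[i].start - cursor;
		cursor = child[i].end + 1;
	}
	/* Tail gap between the last child and the end of the parent. */
	if (parent->end + 1 > cursor && parent->end + 1 - cursor > max)
		max = parent->end + 1 - cursor;
	return max;
}
```

For a 64 KiB window with children at [0x1000, 0x1fff] and [0x8000, 0x8fff], the gaps are 0x1000, 0x6000 and 0x7000 bytes, so the function returns 0x7000.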
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index c3f4dc244df7..4affa1f22fd1 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -695,6 +695,172 @@ static int free_hpa(struct cxl_region *cxlr)
return 0;
}
+struct cxlrd_max_context {
+ struct device * const *host_bridges;
+ int interleave_ways;
+ unsigned long flags;
+ resource_size_t max_hpa;
+ struct cxl_root_decoder *cxlrd;
+};
+
+static int find_max_hpa(struct device *dev, void *data)
+{
+ struct cxlrd_max_context *ctx = data;
+ struct cxl_switch_decoder *cxlsd;
+ struct cxl_root_decoder *cxlrd;
+ struct resource *res, *prev;
+ struct cxl_decoder *cxld;
+ resource_size_t max;
+ int found = 0;
+
+ if (!is_root_decoder(dev))
+ return 0;
+
+ cxlrd = to_cxl_root_decoder(dev);
+ cxlsd = &cxlrd->cxlsd;
+ cxld = &cxlsd->cxld;
+
+ /*
+ * Flags are single unsigned longs. As CXL_DECODER_F_MAX is less than
+ * 32 bits, the bitmap functions can be used.
+ */
+ if (!bitmap_subset(&ctx->flags, &cxld->flags, CXL_DECODER_F_MAX)) {
+ dev_dbg(dev, "flags not matching: %08lx vs %08lx\n",
+ cxld->flags, ctx->flags);
+ return 0;
+ }
+
+ for (int i = 0; i < ctx->interleave_ways; i++) {
+ for (int j = 0; j < cxlsd->nr_targets; j++) {
+ if (ctx->host_bridges[i] == cxlsd->target[j]->dport_dev) {
+ found++;
+ break;
+ }
+ }
+ }
+
+ if (found != ctx->interleave_ways) {
+ dev_dbg(dev,
+ "Not enough host bridges. Found %d for %d interleave ways requested\n",
+ found, ctx->interleave_ways);
+ return 0;
+ }
+
+ /*
+ * Walk the root decoder resource range relying on cxl_region_rwsem to
+ * preclude sibling arrival/departure and find the largest free space
+ * gap.
+ */
+ lockdep_assert_held_read(&cxl_region_rwsem);
+ res = cxlrd->res->child;
+
+ /* With no resource child the whole parent resource is available */
+ if (!res)
+ max = resource_size(cxlrd->res);
+ else
+ max = 0;
+
+ for (prev = NULL; res; prev = res, res = res->sibling) {
+ struct resource *next = res->sibling;
+ resource_size_t free = 0;
+
+ /*
+ * Sanity check to prevent arithmetic problems below: a resource
+ * with size 0 may have its end field set to start - 1, which
+ * wraps to all ones (0xff..f) when start is zero.
+ */
+ if (prev && !resource_size(prev))
+ continue;
+
+ if (!prev && res->start > cxlrd->res->start) {
+ free = res->start - cxlrd->res->start;
+ max = max(free, max);
+ }
+ if (prev && res->start > prev->end + 1) {
+ free = res->start - (prev->end + 1);
+ max = max(free, max);
+ }
+ if (next && res->end + 1 < next->start) {
+ free = next->start - (res->end + 1);
+ max = max(free, max);
+ }
+ if (!next && res->end + 1 < cxlrd->res->end + 1) {
+ free = cxlrd->res->end - res->end;
+ max = max(free, max);
+ }
+ }
+
+ dev_dbg(CXLRD_DEV(cxlrd), "found %pa bytes of free space\n", &max);
+ if (max > ctx->max_hpa) {
+ if (ctx->cxlrd)
+ put_device(CXLRD_DEV(ctx->cxlrd));
+ get_device(CXLRD_DEV(cxlrd));
+ ctx->cxlrd = cxlrd;
+ ctx->max_hpa = max;
+ }
+ return 0;
+}
+
+/**
+ * cxl_get_hpa_freespace - find a root decoder with free capacity per constraints
+ * @cxlmd: the CXL memory device with an endpoint that is mapped by the returned
+ * decoder
+ * @interleave_ways: number of entries in @host_bridges
+ * @flags: CXL_DECODER_F flags for selecting RAM vs PMEM, and Type2 device
+ * @max_avail_contig: output parameter of max contiguous bytes available in the
+ * returned decoder
+ *
+ * Return: a pointer to a struct cxl_root_decoder, or an ERR_PTR() on failure.
+ *
+ * The returned tuple of a 'struct cxl_root_decoder' and the 'bytes available
+ * given in @max_avail_contig' is a point-in-time snapshot. If, by the time
+ * the caller goes to use this root decoder's capacity, the capacity has been
+ * reduced, the caller needs to loop and retry.
+ *
+ * The returned root decoder has an elevated reference count that needs to be
+ * put with cxl_put_root_decoder(cxlrd).
+ */
+struct cxl_root_decoder *cxl_get_hpa_freespace(struct cxl_memdev *cxlmd,
+ int interleave_ways,
+ unsigned long flags,
+ resource_size_t *max_avail_contig)
+{
+ struct cxl_port *endpoint = cxlmd->endpoint;
+ struct cxlrd_max_context ctx = {
+ .host_bridges = &endpoint->host_bridge,
+ .flags = flags,
+ };
+ struct cxl_port *root_port;
+ struct cxl_root *root __free(put_cxl_root) = find_cxl_root(endpoint);
+
+ if (!is_cxl_endpoint(endpoint)) {
+ dev_dbg(&endpoint->dev, "hpa requestor is not an endpoint\n");
+ return ERR_PTR(-EINVAL);
+ }
+
+ if (!root) {
+ dev_dbg(&endpoint->dev, "endpoint cannot be related to a root port\n");
+ return ERR_PTR(-ENXIO);
+ }
+
+ root_port = &root->port;
+ scoped_guard(rwsem_read, &cxl_region_rwsem)
+ device_for_each_child(&root_port->dev, &ctx, find_max_hpa);
+
+ if (!ctx.cxlrd)
+ return ERR_PTR(-ENOMEM);
+
+ *max_avail_contig = ctx.max_hpa;
+ return ctx.cxlrd;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_get_hpa_freespace, "CXL");
+
+void cxl_put_root_decoder(struct cxl_root_decoder *cxlrd)
+{
+ put_device(CXLRD_DEV(cxlrd));
+}
+EXPORT_SYMBOL_NS_GPL(cxl_put_root_decoder, "CXL");
+
static ssize_t size_store(struct device *dev, struct device_attribute *attr,
const char *buf, size_t len)
{
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index dfe8a04b0ea2..6fc6fd7b571d 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -672,6 +672,9 @@ struct cxl_root_decoder *to_cxl_root_decoder(struct device *dev);
struct cxl_switch_decoder *to_cxl_switch_decoder(struct device *dev);
struct cxl_endpoint_decoder *to_cxl_endpoint_decoder(struct device *dev);
bool is_root_decoder(struct device *dev);
+
+#define CXLRD_DEV(cxlrd) (&(cxlrd)->cxlsd.cxld.dev)
+
bool is_switch_decoder(struct device *dev);
bool is_endpoint_decoder(struct device *dev);
struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
index 19d194d98665..489faef786c4 100644
--- a/include/cxl/cxl.h
+++ b/include/cxl/cxl.h
@@ -25,6 +25,11 @@ enum cxl_devtype {
struct device;
+#define CXL_DECODER_F_RAM BIT(0)
+#define CXL_DECODER_F_PMEM BIT(1)
+#define CXL_DECODER_F_TYPE2 BIT(2)
+#define CXL_DECODER_F_MAX 3
+
/*
* Capabilities as defined for:
*
@@ -266,4 +271,10 @@ void cxl_mem_dpa_init(struct cxl_dpa_info *info, u64 volatile_bytes,
int cxl_dpa_setup(struct cxl_dev_state *cxlds, const struct cxl_dpa_info *info);
struct cxl_memdev *devm_cxl_add_memdev(struct device *host,
struct cxl_dev_state *cxlmds);
+struct cxl_port;
+struct cxl_root_decoder *cxl_get_hpa_freespace(struct cxl_memdev *cxlmd,
+ int interleave_ways,
+ unsigned long flags,
+ resource_size_t *max);
+void cxl_put_root_decoder(struct cxl_root_decoder *cxlrd);
#endif /* __CXL_CXL_H__ */
--
2.34.1
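Patch 11 exports cxl_get_hpa_freespace(). The function walks the root decoders and appears to report the one with the most free contiguous HPA that satisfies the requested CXL_DECODER_F_* flags. The helper that does the walk (find_max_hpa()) is not part of this excerpt, so the following is only a simplified userspace model. It assumes that a decoder qualifies when the requested flags are a subset of its own flags, and it ignores interleave ways. The model_* names and the struct are hypothetical. In the kernel, the returned decoder also carries a device reference that must be dropped with cxl_put_root_decoder().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit values mirror the CXL_DECODER_F_* defines added in include/cxl/cxl.h. */
#define DEC_F_RAM   (1UL << 0)
#define DEC_F_PMEM  (1UL << 1)
#define DEC_F_TYPE2 (1UL << 2)

/* Hypothetical, reduced stand-in for a root decoder. */
struct model_root_decoder {
	const char *name;
	unsigned long flags;      /* capabilities of this HPA window */
	uint64_t max_free_contig; /* largest free contiguous HPA span (bytes) */
};

/* Assumed matching rule: every requested bit must be set in @have. */
static int flags_subset(unsigned long want, unsigned long have)
{
	return (want & ~have) == 0;
}

/*
 * Return the index of the qualifying decoder with the most free contiguous
 * HPA and store that size in *max; return -1 (and *max = 0) if none match.
 */
static int model_get_hpa_freespace(const struct model_root_decoder *rds,
				   size_t n, unsigned long flags, uint64_t *max)
{
	int best = -1;
	size_t i;

	assert(max != NULL);
	*max = 0;
	for (i = 0; i < n; i++) {
		if (!flags_subset(flags, rds[i].flags))
			continue;
		if (best < 0 || rds[i].max_free_contig > *max) {
			best = (int)i;
			*max = rds[i].max_free_contig;
		}
	}
	return best;
}
```

The sfc driver requests CXL_DECODER_F_RAM | CXL_DECODER_F_TYPE2. Under the model, a window that only advertises RAM and PMEM is skipped however large it is.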
* [PATCH v16 12/22] sfc: obtain root decoder with enough HPA free space
2025-05-14 13:27 [PATCH v16 00/22] Type2 device basic support alejandro.lucero-palau
` (10 preceding siblings ...)
2025-05-14 13:27 ` [PATCH v16 11/22] cxl: Define a driver interface for HPA free space enumeration alejandro.lucero-palau
@ 2025-05-14 13:27 ` alejandro.lucero-palau
2025-05-21 19:56 ` Dan Williams
2025-05-14 13:27 ` [PATCH v16 13/22] cxl: Define a driver interface for DPA allocation alejandro.lucero-palau
` (9 subsequent siblings)
21 siblings, 1 reply; 84+ messages in thread
From: alejandro.lucero-palau @ 2025-05-14 13:27 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Martin Habets, Edward Cree, Jonathan Cameron
From: Alejandro Lucero <alucerop@amd.com>
Querying the available HPA space is the first step towards obtaining
an HPA range suitable for the accelerator driver's purposes.
Add this call to the efx CXL initialization.
Make the sfc CXL build depend on CXL_REGION.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
drivers/net/ethernet/sfc/Kconfig | 1 +
drivers/net/ethernet/sfc/efx_cxl.c | 19 +++++++++++++++++++
2 files changed, 20 insertions(+)
diff --git a/drivers/net/ethernet/sfc/Kconfig b/drivers/net/ethernet/sfc/Kconfig
index 979f2801e2a8..e959d9b4f4ce 100644
--- a/drivers/net/ethernet/sfc/Kconfig
+++ b/drivers/net/ethernet/sfc/Kconfig
@@ -69,6 +69,7 @@ config SFC_MCDI_LOGGING
config SFC_CXL
bool "Solarflare SFC9100-family CXL support"
depends on SFC && CXL_BUS >= SFC
+ depends on CXL_REGION
default SFC
help
This enables SFC CXL support if the kernel is configuring CXL for
diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
index 53ff97ad07f5..5635672b3fc3 100644
--- a/drivers/net/ethernet/sfc/efx_cxl.c
+++ b/drivers/net/ethernet/sfc/efx_cxl.c
@@ -26,6 +26,7 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
struct cxl_dpa_info sfc_dpa_info = {
.size = EFX_CTPIO_BUFFER_SIZE
};
+ resource_size_t max_size;
struct efx_cxl *cxl;
u16 dvsec;
int rc;
@@ -84,6 +85,22 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
return PTR_ERR(cxl->cxlmd);
}
+ cxl->cxlrd = cxl_get_hpa_freespace(cxl->cxlmd, 1,
+ CXL_DECODER_F_RAM | CXL_DECODER_F_TYPE2,
+ &max_size);
+
+ if (IS_ERR(cxl->cxlrd)) {
+ pci_err(pci_dev, "cxl_get_hpa_freespace failed\n");
+ return PTR_ERR(cxl->cxlrd);
+ }
+
+ if (max_size < EFX_CTPIO_BUFFER_SIZE) {
+ pci_err(pci_dev, "%s: not enough free HPA space %pap < %u\n",
+ __func__, &max_size, EFX_CTPIO_BUFFER_SIZE);
+ cxl_put_root_decoder(cxl->cxlrd);
+ return -ENOSPC;
+ }
+
probe_data->cxl = cxl;
return 0;
@@ -91,6 +108,8 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
void efx_cxl_exit(struct efx_probe_data *probe_data)
{
+ if (probe_data->cxl)
+ cxl_put_root_decoder(probe_data->cxl->cxlrd);
}
MODULE_IMPORT_NS("CXL");
--
2.34.1
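In patch 12, efx_cxl_init() releases the root decoder returned by cxl_get_hpa_freespace() with cxl_put_root_decoder() when the free HPA space is smaller than EFX_CTPIO_BUFFER_SIZE. Otherwise it keeps the reference until efx_cxl_exit(). The sketch below models only that ownership rule with a plain counter. model_ref, model_get() and model_put() are hypothetical stand-ins for the device reference, not kernel APIs.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the device reference held on a root decoder. */
struct model_ref {
	int count;
};

static void model_get(struct model_ref *r)
{
	r->count++;
}

static void model_put(struct model_ref *r)
{
	assert(r->count > 0);	/* an extra put would underflow here */
	r->count--;
}

/*
 * Same shape as the check in efx_cxl_init(): take the reference, then give
 * it back before failing with -ENOSPC if the HPA space is too small.
 */
static int model_init_check(struct model_ref *rd, uint64_t max_size,
			    uint64_t needed)
{
	model_get(rd);
	if (max_size < needed) {
		model_put(rd);
		return -ENOSPC;
	}
	return 0;	/* success: the reference is dropped at exit time */
}
```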
* [PATCH v16 13/22] cxl: Define a driver interface for DPA allocation
2025-05-14 13:27 [PATCH v16 00/22] Type2 device basic support alejandro.lucero-palau
` (11 preceding siblings ...)
2025-05-14 13:27 ` [PATCH v16 12/22] sfc: obtain root decoder with enough HPA free space alejandro.lucero-palau
@ 2025-05-14 13:27 ` alejandro.lucero-palau
2025-05-20 2:39 ` Alison Schofield
2025-05-21 20:23 ` Dan Williams
2025-05-14 13:27 ` [PATCH v16 14/22] sfc: get endpoint decoder alejandro.lucero-palau
` (8 subsequent siblings)
21 siblings, 2 replies; 84+ messages in thread
From: alejandro.lucero-palau @ 2025-05-14 13:27 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Ben Cheatham, Jonathan Cameron
From: Alejandro Lucero <alucerop@amd.com>
Region creation involves finding available DPA (device-physical-address)
capacity to map into HPA (host-physical-address) space.
In order to support CXL Type2 devices, define an API, cxl_request_dpa(),
that tries to allocate the DPA memory the driver requires to operate. The
memory requested should not be larger than the maximum available HPA
obtained previously with cxl_get_hpa_freespace().
Based on https://lore.kernel.org/linux-cxl/168592158743.1948938.7622563891193802610.stgit@dwillia2-xfh.jf.intel.com/
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
drivers/cxl/core/hdm.c | 86 ++++++++++++++++++++++++++++++++++++++++++
include/cxl/cxl.h | 5 +++
2 files changed, 91 insertions(+)
diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index 70cae4ebf8a4..500df2deceef 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -3,6 +3,7 @@
#include <linux/seq_file.h>
#include <linux/device.h>
#include <linux/delay.h>
+#include <cxl/cxl.h>
#include "cxlmem.h"
#include "core.h"
@@ -546,6 +547,13 @@ resource_size_t cxl_dpa_resource_start(struct cxl_endpoint_decoder *cxled)
return base;
}
+/**
+ * cxl_dpa_free - release DPA (Device Physical Address)
+ *
+ * @cxled: endpoint decoder linked to the DPA
+ *
+ * Returns 0 or error.
+ */
int cxl_dpa_free(struct cxl_endpoint_decoder *cxled)
{
struct cxl_port *port = cxled_to_port(cxled);
@@ -572,6 +580,7 @@ int cxl_dpa_free(struct cxl_endpoint_decoder *cxled)
devm_cxl_dpa_release(cxled);
return 0;
}
+EXPORT_SYMBOL_NS_GPL(cxl_dpa_free, "CXL");
int cxl_dpa_set_part(struct cxl_endpoint_decoder *cxled,
enum cxl_partition_mode mode)
@@ -686,6 +695,83 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size)
return devm_add_action_or_reset(&port->dev, cxl_dpa_release, cxled);
}
+static int find_free_decoder(struct device *dev, const void *data)
+{
+ struct cxl_endpoint_decoder *cxled;
+ struct cxl_port *port;
+
+ if (!is_endpoint_decoder(dev))
+ return 0;
+
+ cxled = to_cxl_endpoint_decoder(dev);
+ port = cxled_to_port(cxled);
+
+ if (cxled->cxld.id != port->hdm_end + 1)
+ return 0;
+
+ return 1;
+}
+
+/**
+ * cxl_request_dpa - search and reserve DPA given input constraints
+ * @cxlmd: memdev with an endpoint port with available decoders
+ * @mode: DPA operation mode (ram vs pmem)
+ * @alloc: dpa size required
+ *
+ * Returns a pointer to a cxl_endpoint_decoder struct or an error
+ *
+ * Given that a region needs to allocate from limited HPA capacity it
+ * may be the case that a device has more mappable DPA capacity than
+ * available HPA. The expectation is that @alloc is a driver known
+ * value based on the device capacity but it could not be available
+ * due to HPA constraints.
+ *
+ * Returns a pinned cxl_decoder with at least @alloc bytes of capacity
+ * reserved, or an error pointer. The caller is also expected to own the
+ * lifetime of the memdev registration associated with the endpoint to
+ * pin the decoder registered as well.
+ */
+struct cxl_endpoint_decoder *cxl_request_dpa(struct cxl_memdev *cxlmd,
+ enum cxl_partition_mode mode,
+ resource_size_t alloc)
+{
+ struct cxl_port *endpoint = cxlmd->endpoint;
+ struct cxl_endpoint_decoder *cxled;
+ struct device *cxled_dev;
+ int rc;
+
+ if (!IS_ALIGNED(alloc, SZ_256M))
+ return ERR_PTR(-EINVAL);
+
+ down_read(&cxl_dpa_rwsem);
+ cxled_dev = device_find_child(&endpoint->dev, NULL, find_free_decoder);
+ up_read(&cxl_dpa_rwsem);
+
+ if (!cxled_dev)
+ return ERR_PTR(-ENXIO);
+
+ cxled = to_cxl_endpoint_decoder(cxled_dev);
+
+ if (!cxled) {
+ rc = -ENODEV;
+ goto err;
+ }
+
+ rc = cxl_dpa_set_part(cxled, mode);
+ if (rc)
+ goto err;
+
+ rc = cxl_dpa_alloc(cxled, alloc);
+ if (rc)
+ goto err;
+
+ return cxled;
+err:
+ put_device(cxled_dev);
+ return ERR_PTR(rc);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_request_dpa, "CXL");
+
static void cxld_set_interleave(struct cxl_decoder *cxld, u32 *ctrl)
{
u16 eig;
diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
index 489faef786c4..b3ca0e988ae7 100644
--- a/include/cxl/cxl.h
+++ b/include/cxl/cxl.h
@@ -7,6 +7,7 @@
#include <linux/node.h>
#include <linux/ioport.h>
+#include <linux/range.h>
#include <cxl/mailbox.h>
/**
@@ -277,4 +278,8 @@ struct cxl_root_decoder *cxl_get_hpa_freespace(struct cxl_memdev *cxlmd,
unsigned long flags,
resource_size_t *max);
void cxl_put_root_decoder(struct cxl_root_decoder *cxlrd);
+struct cxl_endpoint_decoder *cxl_request_dpa(struct cxl_memdev *cxlmd,
+ enum cxl_partition_mode mode,
+ resource_size_t alloc);
+int cxl_dpa_free(struct cxl_endpoint_decoder *cxled);
#endif /* __CXL_CXL_H__ */
--
2.34.1
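Patch 13's cxl_request_dpa() rejects sizes that are not 256 MiB aligned. It then uses find_free_decoder() to pick the endpoint decoder whose id equals port->hdm_end + 1, so decoders are consumed strictly in order. The following userspace model covers those two checks. model_decoder and the model_* functions are hypothetical, and hdm_end is assumed to be -1 when no decoder has been allocated yet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_SZ_256M (256ULL << 20)

/* Hypothetical, reduced endpoint decoder: only its id matters here. */
struct model_decoder {
	int id;
};

/* Same test as the kernel's IS_ALIGNED() for a power-of-two alignment. */
static int model_is_aligned(uint64_t x, uint64_t a)
{
	return (x & (a - 1)) == 0;
}

/*
 * Like find_free_decoder(): the only acceptable candidate is the decoder
 * right after the last allocated one (hdm_end). Returns NULL if absent.
 */
static const struct model_decoder *
model_find_free_decoder(const struct model_decoder *decs, size_t n, int hdm_end)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (decs[i].id == hdm_end + 1)
			return &decs[i];
	return NULL;
}
```

The commit message also states that the request should not exceed the HPA space reported by cxl_get_hpa_freespace(). The function body quoted above does not compare against that value, so the caller is expected to enforce it.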
* [PATCH v16 14/22] sfc: get endpoint decoder
2025-05-14 13:27 [PATCH v16 00/22] Type2 device basic support alejandro.lucero-palau
` (12 preceding siblings ...)
2025-05-14 13:27 ` [PATCH v16 13/22] cxl: Define a driver interface for DPA allocation alejandro.lucero-palau
@ 2025-05-14 13:27 ` alejandro.lucero-palau
2025-05-21 20:28 ` Dan Williams
2025-05-14 13:27 ` [PATCH v16 15/22] cxl: Make region type based on endpoint type alejandro.lucero-palau
` (7 subsequent siblings)
21 siblings, 1 reply; 84+ messages in thread
From: alejandro.lucero-palau @ 2025-05-14 13:27 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Martin Habets, Edward Cree, Jonathan Cameron
From: Alejandro Lucero <alucerop@amd.com>
Use the CXL API to obtain the DPA (Device Physical Address) to use
through an endpoint decoder.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
drivers/net/ethernet/sfc/efx_cxl.c | 19 +++++++++++++++++--
1 file changed, 17 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
index 5635672b3fc3..20db9aa382ec 100644
--- a/drivers/net/ethernet/sfc/efx_cxl.c
+++ b/drivers/net/ethernet/sfc/efx_cxl.c
@@ -98,18 +98,33 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
pci_err(pci_dev, "%s: not enough free HPA space %pap < %u\n",
__func__, &max_size, EFX_CTPIO_BUFFER_SIZE);
cxl_put_root_decoder(cxl->cxlrd);
- return -ENOSPC;
+ rc = -ENOSPC;
+ goto sfc_put_decoder;
+ }
+
+ cxl->cxled = cxl_request_dpa(cxl->cxlmd, CXL_PARTMODE_RAM,
+ EFX_CTPIO_BUFFER_SIZE);
+ if (IS_ERR(cxl->cxled)) {
+ pci_err(pci_dev, "CXL accel request DPA failed");
+ rc = PTR_ERR(cxl->cxled);
+ goto sfc_put_decoder;
}
probe_data->cxl = cxl;
return 0;
+
+sfc_put_decoder:
+ cxl_put_root_decoder(cxl->cxlrd);
+ return rc;
}
void efx_cxl_exit(struct efx_probe_data *probe_data)
{
- if (probe_data->cxl)
+ if (probe_data->cxl) {
+ cxl_dpa_free(probe_data->cxl->cxled);
cxl_put_root_decoder(probe_data->cxl->cxlrd);
+ }
}
MODULE_IMPORT_NS("CXL");
--
2.34.1
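Patch 14 gives efx_cxl_init() a goto-based error path and makes efx_cxl_exit() free the DPA before putting the root decoder. The usual kernel convention is that each error label undoes exactly what was acquired before the failing step, once. As quoted, the -ENOSPC branch still calls cxl_put_root_decoder() before jumping to sfc_put_decoder, which calls it again. In the model below, that sequence would trip the assertion in rd_put(). All names here, and the -ENXIO stand-in error, are hypothetical.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical resources: a root decoder reference and a DPA allocation. */
struct model_state {
	int rd_refs;
	int dpa_allocs;
};

static void rd_get(struct model_state *s) { s->rd_refs++; }
static void rd_put(struct model_state *s) { assert(s->rd_refs > 0); s->rd_refs--; }
static void dpa_get(struct model_state *s) { s->dpa_allocs++; }
static void dpa_put(struct model_state *s) { assert(s->dpa_allocs > 0); s->dpa_allocs--; }

/*
 * Init order: 1) root decoder, 2) HPA size check, 3) DPA request.
 * Each failure jumps to a label that undoes only the earlier steps.
 */
static int model_init(struct model_state *s, int hpa_ok, int dpa_ok)
{
	int rc;

	rd_get(s);
	if (!hpa_ok) {
		rc = -ENOSPC;
		goto put_decoder;	/* no separate put before the goto */
	}
	if (!dpa_ok) {
		rc = -ENXIO;		/* stand-in for PTR_ERR(cxled) */
		goto put_decoder;
	}
	dpa_get(s);
	return 0;

put_decoder:
	rd_put(s);
	return rc;
}

/* Teardown in reverse order of acquisition, as efx_cxl_exit() does. */
static void model_exit(struct model_state *s)
{
	dpa_put(s);
	rd_put(s);
}
```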
* [PATCH v16 15/22] cxl: Make region type based on endpoint type
2025-05-14 13:27 [PATCH v16 00/22] Type2 device basic support alejandro.lucero-palau
` (13 preceding siblings ...)
2025-05-14 13:27 ` [PATCH v16 14/22] sfc: get endpoint decoder alejandro.lucero-palau
@ 2025-05-14 13:27 ` alejandro.lucero-palau
2025-05-20 2:39 ` Alison Schofield
2025-05-14 13:27 ` [PATCH v16 16/22] cxl/region: Factor out interleave ways setup alejandro.lucero-palau
` (6 subsequent siblings)
21 siblings, 1 reply; 84+ messages in thread
From: alejandro.lucero-palau @ 2025-05-14 13:27 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Zhi Wang, Jonathan Cameron, Ben Cheatham
From: Alejandro Lucero <alucerop@amd.com>
The current code only expects Type3 or CXL_DECODER_HOSTONLYMEM devices.
Supporting Type2 requires the region type to be based on the endpoint
type instead.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Zhi Wang <zhiw@nvidia.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
---
drivers/cxl/core/region.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 4affa1f22fd1..e13a812529ff 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2714,7 +2714,8 @@ static ssize_t create_ram_region_show(struct device *dev,
}
static struct cxl_region *__create_region(struct cxl_root_decoder *cxlrd,
- enum cxl_partition_mode mode, int id)
+ enum cxl_partition_mode mode, int id,
+ enum cxl_decoder_type target_type)
{
int rc;
@@ -2736,7 +2737,7 @@ static struct cxl_region *__create_region(struct cxl_root_decoder *cxlrd,
return ERR_PTR(-EBUSY);
}
- return devm_cxl_add_region(cxlrd, id, mode, CXL_DECODER_HOSTONLYMEM);
+ return devm_cxl_add_region(cxlrd, id, mode, target_type);
}
static ssize_t create_region_store(struct device *dev, const char *buf,
@@ -2750,7 +2751,7 @@ static ssize_t create_region_store(struct device *dev, const char *buf,
if (rc != 1)
return -EINVAL;
- cxlr = __create_region(cxlrd, mode, id);
+ cxlr = __create_region(cxlrd, mode, id, CXL_DECODER_HOSTONLYMEM);
if (IS_ERR(cxlr))
return PTR_ERR(cxlr);
@@ -3522,7 +3523,8 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
do {
cxlr = __create_region(cxlrd, cxlds->part[part].mode,
- atomic_read(&cxlrd->region_id));
+ atomic_read(&cxlrd->region_id),
+ cxled->cxld.target_type);
} while (IS_ERR(cxlr) && PTR_ERR(cxlr) == -EBUSY);
if (IS_ERR(cxlr)) {
--
2.34.1
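Patch 15 threads a target_type argument through __create_region(). The sysfs create_region path keeps passing CXL_DECODER_HOSTONLYMEM, while construct_region() passes the endpoint decoder's own cxld.target_type. The mapping can be sketched as follows. The enums and the function are hypothetical model names, not the kernel's definitions.

```c
#include <assert.h>

/* Hypothetical mirror of the two decoder target types involved. */
enum model_target_type {
	MODEL_DECODER_DEVMEM,		/* Type2 accelerator memory */
	MODEL_DECODER_HOSTONLYMEM,	/* Type3 memory expander */
};

/* Where the region creation request originates. */
enum model_origin {
	MODEL_FROM_SYSFS,	/* create_region_store() */
	MODEL_FROM_ENDPOINT,	/* construct_region() */
};

/* Select the target type handed to devm_cxl_add_region(). */
static enum model_target_type
model_region_type(enum model_origin origin, enum model_target_type endpoint_type)
{
	if (origin == MODEL_FROM_SYSFS)
		return MODEL_DECODER_HOSTONLYMEM;
	return endpoint_type;
}
```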
* [PATCH v16 16/22] cxl/region: Factor out interleave ways setup
2025-05-14 13:27 [PATCH v16 00/22] Type2 device basic support alejandro.lucero-palau
` (14 preceding siblings ...)
2025-05-14 13:27 ` [PATCH v16 15/22] cxl: Make region type based on endpoint type alejandro.lucero-palau
@ 2025-05-14 13:27 ` alejandro.lucero-palau
2025-05-20 2:37 ` Alison Schofield
2025-05-14 13:27 ` [PATCH v16 17/22] cxl/region: Factor out interleave granularity setup alejandro.lucero-palau
` (5 subsequent siblings)
21 siblings, 1 reply; 84+ messages in thread
From: alejandro.lucero-palau @ 2025-05-14 13:27 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Zhi Wang, Jonathan Cameron, Ben Cheatham
From: Alejandro Lucero <alucerop@amd.com>
Region creation for Type3 devices is triggered from user space,
allowing memory to be combined through interleaving.
In preparation for kernel-driven region creation, that is, Type2 drivers
triggering the creation of a region backed by their advertised CXL
memory, factor out a common helper from the user-sysfs region setup for
interleave ways.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Zhi Wang <zhiw@nvidia.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
---
drivers/cxl/core/region.c | 46 +++++++++++++++++++++++----------------
1 file changed, 27 insertions(+), 19 deletions(-)
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index e13a812529ff..0f61c9e9b954 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -464,22 +464,14 @@ static ssize_t interleave_ways_show(struct device *dev,
static const struct attribute_group *get_cxl_region_target_group(void);
-static ssize_t interleave_ways_store(struct device *dev,
- struct device_attribute *attr,
- const char *buf, size_t len)
+static int set_interleave_ways(struct cxl_region *cxlr, int val)
{
- struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev->parent);
+ struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld;
- struct cxl_region *cxlr = to_cxl_region(dev);
struct cxl_region_params *p = &cxlr->params;
- unsigned int val, save;
- int rc;
+ int save, rc;
u8 iw;
- rc = kstrtouint(buf, 0, &val);
- if (rc)
- return rc;
-
rc = ways_to_eiw(val, &iw);
if (rc)
return rc;
@@ -494,20 +486,36 @@ static ssize_t interleave_ways_store(struct device *dev,
return -EINVAL;
}
- rc = down_write_killable(&cxl_region_rwsem);
- if (rc)
- return rc;
- if (p->state >= CXL_CONFIG_INTERLEAVE_ACTIVE) {
- rc = -EBUSY;
- goto out;
- }
+ lockdep_assert_held_write(&cxl_region_rwsem);
+ if (p->state >= CXL_CONFIG_INTERLEAVE_ACTIVE)
+ return -EBUSY;
save = p->interleave_ways;
p->interleave_ways = val;
rc = sysfs_update_group(&cxlr->dev.kobj, get_cxl_region_target_group());
if (rc)
p->interleave_ways = save;
-out:
+
+ return rc;
+}
+
+static ssize_t interleave_ways_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t len)
+{
+ struct cxl_region *cxlr = to_cxl_region(dev);
+ unsigned int val;
+ int rc;
+
+ rc = kstrtouint(buf, 0, &val);
+ if (rc)
+ return rc;
+
+ rc = down_write_killable(&cxl_region_rwsem);
+ if (rc)
+ return rc;
+
+ rc = set_interleave_ways(cxlr, val);
up_write(&cxl_region_rwsem);
if (rc)
return rc;
--
2.34.1
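set_interleave_ways() first encodes the requested ways with ways_to_eiw(). Then, with cxl_region_rwsem held for write, it refuses any change once the region is at CXL_CONFIG_INTERLEAVE_ACTIVE or beyond. If the sysfs group update fails, it restores the previous value. A userspace model follows. The encoding it uses is the CXL HDM decoder encoding as we understand it: powers of two up to 16 map to 0 through 4, and 3, 6 and 12 map to 8, 9 and 10. The model_* names, the two-value state enum and the update_ok parameter are hypothetical. The root-decoder compatibility check, which is not visible in this excerpt, is omitted.

```c
#include <assert.h>
#include <errno.h>

static int is_pow2(unsigned int x)
{
	return x && !(x & (x - 1));
}

static unsigned int ilog2_u(unsigned int x)
{
	unsigned int r = 0;

	while (x >>= 1)
		r++;
	return r;
}

/* Encode interleave ways: 1,2,4,8,16 -> 0..4 and 3,6,12 -> 8..10. */
static int model_ways_to_eiw(unsigned int ways, unsigned char *eiw)
{
	if (ways > 16)
		return -EINVAL;
	if (is_pow2(ways)) {
		*eiw = (unsigned char)ilog2_u(ways);
		return 0;
	}
	if (ways % 3)
		return -EINVAL;
	ways /= 3;
	if (!is_pow2(ways))
		return -EINVAL;
	*eiw = (unsigned char)(ilog2_u(ways) + 8);
	return 0;
}

enum model_state { MODEL_CONFIG_IDLE, MODEL_CONFIG_INTERLEAVE_ACTIVE };

struct model_region {
	enum model_state state;
	int interleave_ways;
};

/* The caller is assumed to hold the region lock for write. */
static int model_set_ways(struct model_region *r, unsigned int val, int update_ok)
{
	unsigned char iw;
	int save, rc;

	rc = model_ways_to_eiw(val, &iw);
	if (rc)
		return rc;
	if (r->state >= MODEL_CONFIG_INTERLEAVE_ACTIVE)
		return -EBUSY;
	save = r->interleave_ways;
	r->interleave_ways = (int)val;
	if (!update_ok) {	/* stands in for a failing sysfs_update_group() */
		r->interleave_ways = save;
		return -ENOMEM;
	}
	return 0;
}
```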
* [PATCH v16 17/22] cxl/region: Factor out interleave granularity setup
2025-05-14 13:27 [PATCH v16 00/22] Type2 device basic support alejandro.lucero-palau
` (15 preceding siblings ...)
2025-05-14 13:27 ` [PATCH v16 16/22] cxl/region: Factor out interleave ways setup alejandro.lucero-palau
@ 2025-05-14 13:27 ` alejandro.lucero-palau
2025-05-20 2:38 ` Alison Schofield
2025-05-14 13:27 ` [PATCH v16 18/22] cxl: Allow region creation by type2 drivers alejandro.lucero-palau
` (4 subsequent siblings)
21 siblings, 1 reply; 84+ messages in thread
From: alejandro.lucero-palau @ 2025-05-14 13:27 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Zhi Wang, Jonathan Cameron, Ben Cheatham
From: Alejandro Lucero <alucerop@amd.com>
Region creation for Type3 devices is triggered from user space,
allowing memory to be combined through interleaving.
In preparation for kernel-driven region creation, that is, Type2 drivers
triggering the creation of a region backed by their advertised CXL
memory, factor out a common helper from the user-sysfs region setup for
interleave granularity.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Zhi Wang <zhiw@nvidia.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
---
drivers/cxl/core/region.c | 39 +++++++++++++++++++++++----------------
1 file changed, 23 insertions(+), 16 deletions(-)
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 0f61c9e9b954..4113ee6daec9 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -540,21 +540,14 @@ static ssize_t interleave_granularity_show(struct device *dev,
return rc;
}
-static ssize_t interleave_granularity_store(struct device *dev,
- struct device_attribute *attr,
- const char *buf, size_t len)
+static int set_interleave_granularity(struct cxl_region *cxlr, int val)
{
- struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev->parent);
+ struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld;
- struct cxl_region *cxlr = to_cxl_region(dev);
struct cxl_region_params *p = &cxlr->params;
- int rc, val;
+ int rc;
u16 ig;
- rc = kstrtoint(buf, 0, &val);
- if (rc)
- return rc;
-
rc = granularity_to_eig(val, &ig);
if (rc)
return rc;
@@ -570,16 +563,30 @@ static ssize_t interleave_granularity_store(struct device *dev,
if (cxld->interleave_ways > 1 && val != cxld->interleave_granularity)
return -EINVAL;
+ lockdep_assert_held_write(&cxl_region_rwsem);
+ if (p->state >= CXL_CONFIG_INTERLEAVE_ACTIVE)
+ return -EBUSY;
+
+ p->interleave_granularity = val;
+ return 0;
+}
+
+static ssize_t interleave_granularity_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t len)
+{
+ struct cxl_region *cxlr = to_cxl_region(dev);
+ int rc, val;
+
+ rc = kstrtoint(buf, 0, &val);
+ if (rc)
+ return rc;
+
rc = down_write_killable(&cxl_region_rwsem);
if (rc)
return rc;
- if (p->state >= CXL_CONFIG_INTERLEAVE_ACTIVE) {
- rc = -EBUSY;
- goto out;
- }
- p->interleave_granularity = val;
-out:
+ rc = set_interleave_granularity(cxlr, val);
up_write(&cxl_region_rwsem);
if (rc)
return rc;
--
2.34.1
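set_interleave_granularity() validates the value with granularity_to_eig(). When the root decoder itself interleaves (interleave_ways > 1), it also requires the region granularity to equal the root decoder's. The model below assumes the CXL encoding of 256 bytes through 16 KiB, in powers of two, as 0 through 6. The names are hypothetical, the lock is not modeled, and any checks not visible in this excerpt are omitted.

```c
#include <assert.h>
#include <errno.h>

/* Encode granularity: 256, 512, ..., 16384 bytes -> 0, 1, ..., 6. */
static int model_granularity_to_eig(int granularity, unsigned short *eig)
{
	int g = granularity, bits = 0;

	if (granularity < 256 || granularity > 16384 ||
	    (granularity & (granularity - 1)))
		return -EINVAL;
	while (g >>= 1)
		bits++;
	*eig = (unsigned short)(bits - 8);
	return 0;
}

/*
 * Region-level check: when the root decoder interleaves, the region must
 * use exactly the root decoder's granularity.
 */
static int model_check_granularity(int val, int root_ways, int root_gran)
{
	unsigned short ig;
	int rc = model_granularity_to_eig(val, &ig);

	if (rc)
		return rc;
	if (root_ways > 1 && val != root_gran)
		return -EINVAL;
	return 0;
}
```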
* [PATCH v16 18/22] cxl: Allow region creation by type2 drivers
2025-05-14 13:27 [PATCH v16 00/22] Type2 device basic support alejandro.lucero-palau
` (16 preceding siblings ...)
2025-05-14 13:27 ` [PATCH v16 17/22] cxl/region: Factor out interleave granularity setup alejandro.lucero-palau
@ 2025-05-14 13:27 ` alejandro.lucero-palau
2025-05-20 2:37 ` Alison Schofield
2025-05-21 20:45 ` Dan Williams
2025-05-14 13:27 ` [PATCH v16 19/22] cxl: Add region flag for precluding a device memory to be used for dax alejandro.lucero-palau
` (3 subsequent siblings)
21 siblings, 2 replies; 84+ messages in thread
From: alejandro.lucero-palau @ 2025-05-14 13:27 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Jonathan Cameron
From: Alejandro Lucero <alucerop@amd.com>
Creating a CXL region requires userspace intervention through the CXL
sysfs files. Type2 support should allow accelerator drivers to create
such a CXL region from kernel code.
Add that functionality and integrate it with the current support for
memory expanders.
Based on https://lore.kernel.org/linux-cxl/168592159835.1948938.1647215579839222774.stgit@dwillia2-xfh.jf.intel.com/
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
drivers/cxl/core/region.c | 140 +++++++++++++++++++++++++++++++++++---
drivers/cxl/port.c | 5 +-
include/cxl/cxl.h | 4 ++
3 files changed, 140 insertions(+), 9 deletions(-)
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 4113ee6daec9..f82da914d125 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2316,6 +2316,21 @@ static int cxl_region_detach(struct cxl_endpoint_decoder *cxled)
return rc;
}
+/**
+ * cxl_accel_region_detach - detach a region from a Type2 device
+ *
+ * @cxled: Type2 endpoint decoder to detach the region from.
+ *
+ * Returns 0 or error.
+ */
+int cxl_accel_region_detach(struct cxl_endpoint_decoder *cxled)
+{
+ guard(rwsem_write)(&cxl_region_rwsem);
+ cxled->part = -1;
+ return cxl_region_detach(cxled);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_accel_region_detach, "CXL");
+
void cxl_decoder_kill_region(struct cxl_endpoint_decoder *cxled)
{
down_write(&cxl_region_rwsem);
@@ -2822,6 +2837,14 @@ cxl_find_region_by_name(struct cxl_root_decoder *cxlrd, const char *name)
return to_cxl_region(region_dev);
}
+static void drop_region(struct cxl_region *cxlr)
+{
+ struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
+ struct cxl_port *port = cxlrd_to_port(cxlrd);
+
+ devm_release_action(port->uport_dev, unregister_region, cxlr);
+}
+
static ssize_t delete_region_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t len)
@@ -3526,14 +3549,12 @@ static int __construct_region(struct cxl_region *cxlr,
return 0;
}
-/* Establish an empty region covering the given HPA range */
-static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
- struct cxl_endpoint_decoder *cxled)
+static struct cxl_region *construct_region_begin(struct cxl_root_decoder *cxlrd,
+ struct cxl_endpoint_decoder *cxled)
{
struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
- struct cxl_port *port = cxlrd_to_port(cxlrd);
struct cxl_dev_state *cxlds = cxlmd->cxlds;
- int rc, part = READ_ONCE(cxled->part);
+ int part = READ_ONCE(cxled->part);
struct cxl_region *cxlr;
do {
@@ -3542,13 +3563,23 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
cxled->cxld.target_type);
} while (IS_ERR(cxlr) && PTR_ERR(cxlr) == -EBUSY);
- if (IS_ERR(cxlr)) {
+ if (IS_ERR(cxlr))
dev_err(cxlmd->dev.parent,
"%s:%s: %s failed assign region: %ld\n",
dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
__func__, PTR_ERR(cxlr));
- return cxlr;
- }
+ return cxlr;
+};
+
+/* Establish an empty region covering the given HPA range */
+static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
+ struct cxl_endpoint_decoder *cxled)
+{
+ struct cxl_port *port = cxlrd_to_port(cxlrd);
+ struct cxl_region *cxlr;
+ int rc;
+
+ cxlr = construct_region_begin(cxlrd, cxled);
rc = __construct_region(cxlr, cxlrd, cxled);
if (rc) {
@@ -3559,6 +3590,99 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
return cxlr;
}
+static struct cxl_region *
+__construct_new_region(struct cxl_root_decoder *cxlrd,
+ struct cxl_endpoint_decoder *cxled, int ways)
+{
+ struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
+ struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld;
+ struct cxl_region_params *p;
+ struct cxl_region *cxlr;
+ int rc;
+
+ cxlr = construct_region_begin(cxlrd, cxled);
+ if (IS_ERR(cxlr))
+ return cxlr;
+
+ guard(rwsem_write)(&cxl_region_rwsem);
+
+ /*
+ * Sanity check. This should not happen with an accel driver handling
+ * the region creation.
+ */
+ p = &cxlr->params;
+ if (p->state >= CXL_CONFIG_INTERLEAVE_ACTIVE) {
+ dev_err(cxlmd->dev.parent,
+ "%s:%s: %s unexpected region state\n",
+ dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
+ __func__);
+ rc = -EBUSY;
+ goto err;
+ }
+
+ rc = set_interleave_ways(cxlr, ways);
+ if (rc)
+ goto err;
+
+ rc = set_interleave_granularity(cxlr, cxld->interleave_granularity);
+ if (rc)
+ goto err;
+
+ rc = alloc_hpa(cxlr, resource_size(cxled->dpa_res));
+ if (rc)
+ goto err;
+
+ scoped_guard(rwsem_read, &cxl_dpa_rwsem) {
+ rc = cxl_region_attach(cxlr, cxled, 0);
+ if (rc)
+ goto err;
+ }
+
+ if (rc)
+ goto err;
+
+ rc = cxl_region_decode_commit(cxlr);
+ if (rc)
+ goto err;
+
+ p->state = CXL_CONFIG_COMMIT;
+
+ return cxlr;
+err:
+ drop_region(cxlr);
+ return ERR_PTR(rc);
+}
+
+/**
+ * cxl_create_region - Establish a region given an endpoint decoder
+ * @cxlrd: root decoder to allocate HPA
+ * @cxled: endpoint decoder with reserved DPA capacity
+ * @ways: interleave ways required
+ *
+ * Returns a fully formed region in the commit state and attached to the
+ * cxl_region driver.
+ */
+struct cxl_region *cxl_create_region(struct cxl_root_decoder *cxlrd,
+ struct cxl_endpoint_decoder *cxled, int ways)
+{
+ struct cxl_region *cxlr;
+
+ scoped_guard(mutex, &cxlrd->range_lock) {
+ cxlr = __construct_new_region(cxlrd, cxled, ways);
+ if (IS_ERR(cxlr))
+ return cxlr;
+ }
+
+ if (device_attach(&cxlr->dev) <= 0) {
+ dev_err(&cxlr->dev, "failed to create region\n");
+ drop_region(cxlr);
+ return ERR_PTR(-ENODEV);
+ }
+
+ return cxlr;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_create_region, "CXL");
+
int cxl_add_to_region(struct cxl_port *root, struct cxl_endpoint_decoder *cxled)
{
struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
index a35fc5552845..69b8d8344029 100644
--- a/drivers/cxl/port.c
+++ b/drivers/cxl/port.c
@@ -33,6 +33,7 @@ static void schedule_detach(void *cxlmd)
static int discover_region(struct device *dev, void *root)
{
struct cxl_endpoint_decoder *cxled;
+ struct cxl_memdev *cxlmd;
int rc;
if (!is_endpoint_decoder(dev))
@@ -42,7 +43,9 @@ static int discover_region(struct device *dev, void *root)
if ((cxled->cxld.flags & CXL_DECODER_F_ENABLE) == 0)
return 0;
- if (cxled->state != CXL_DECODER_STATE_AUTO)
+ cxlmd = cxled_to_memdev(cxled);
+ if (cxled->state != CXL_DECODER_STATE_AUTO ||
+ cxlmd->cxlds->type == CXL_DEVTYPE_DEVMEM)
return 0;
/*
diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
index b3ca0e988ae7..d9cd10537fb1 100644
--- a/include/cxl/cxl.h
+++ b/include/cxl/cxl.h
@@ -282,4 +282,8 @@ struct cxl_endpoint_decoder *cxl_request_dpa(struct cxl_memdev *cxlmd,
enum cxl_partition_mode mode,
resource_size_t alloc);
int cxl_dpa_free(struct cxl_endpoint_decoder *cxled);
+struct cxl_region *cxl_create_region(struct cxl_root_decoder *cxlrd,
+ struct cxl_endpoint_decoder *cxled, int ways);
+
+int cxl_accel_region_detach(struct cxl_endpoint_decoder *cxled);
#endif /* __CXL_CXL_H__ */
--
2.34.1
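In patch 18, __construct_new_region() runs a fixed sequence: interleave ways, granularity, HPA allocation, attaching the endpoint decoder, and committing the decoders. It calls drop_region() on any failure. cxl_create_region() then binds the region driver with device_attach() and drops the region if that fails too. Separately, in the hunk as quoted, construct_region() appears to pass the value returned by construct_region_begin() to __construct_region() without an IS_ERR() check. The model below reproduces only the new-region control flow. The step names, the fail_at parameter and the counters are hypothetical.

```c
#include <assert.h>
#include <errno.h>

enum model_step {
	STEP_WAYS,
	STEP_GRANULARITY,
	STEP_ALLOC_HPA,
	STEP_ATTACH,
	STEP_COMMIT,
	STEP_DEVICE_ATTACH,
	STEP_NONE,	/* no step fails */
};

struct model_result {
	int created;	/* regions created (construct_region_begin) */
	int dropped;	/* regions dropped (drop_region) */
	int committed;	/* regions that reached the commit state */
};

/* Returns 0 on success or an error from the first failing step. */
static int model_create_region(enum model_step fail_at, struct model_result *res)
{
	int step;

	res->created++;
	for (step = STEP_WAYS; step <= STEP_COMMIT; step++) {
		if (step == (int)fail_at) {
			res->dropped++;
			return -EINVAL;	/* stand-in for the step's own error */
		}
	}
	res->committed++;

	/* device_attach() returning <= 0 means no driver bound. */
	if (fail_at == STEP_DEVICE_ATTACH) {
		res->dropped++;
		return -ENODEV;
	}
	return 0;
}
```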
* [PATCH v16 19/22] cxl: Add region flag for precluding a device memory to be used for dax
2025-05-14 13:27 [PATCH v16 00/22] Type2 device basic support alejandro.lucero-palau
` (17 preceding siblings ...)
2025-05-14 13:27 ` [PATCH v16 18/22] cxl: Allow region creation by type2 drivers alejandro.lucero-palau
@ 2025-05-14 13:27 ` alejandro.lucero-palau
2025-05-20 2:36 ` Alison Schofield
2025-05-21 20:49 ` Dan Williams
2025-05-14 13:27 ` [PATCH v16 20/22] sfc: create cxl region alejandro.lucero-palau
` (2 subsequent siblings)
21 siblings, 2 replies; 84+ messages in thread
From: alejandro.lucero-palau @ 2025-05-14 13:27 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Zhi Wang, Jonathan Cameron, Ben Cheatham
From: Alejandro Lucero <alucerop@amd.com>
By definition, a Type2 CXL device uses the host-managed memory for
specific functionality, so that memory should not be available for
other uses. However, a DAX interface could be good enough in some cases.
Add a flag to a CXL region stating that no DAX device should be
created, and allow a Type2 driver to set that flag at region creation
time.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Zhi Wang <zhiw@nvidia.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
---
drivers/cxl/core/region.c | 10 +++++++++-
drivers/cxl/cxl.h | 3 +++
include/cxl/cxl.h | 3 ++-
3 files changed, 14 insertions(+), 2 deletions(-)
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index f82da914d125..06647bae210f 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -3658,12 +3658,14 @@ __construct_new_region(struct cxl_root_decoder *cxlrd,
* @cxlrd: root decoder to allocate HPA
* @cxled: endpoint decoder with reserved DPA capacity
* @ways: interleave ways required
+ * @no_dax: if true no DAX device should be created
*
* Returns a fully formed region in the commit state and attached to the
* cxl_region driver.
*/
struct cxl_region *cxl_create_region(struct cxl_root_decoder *cxlrd,
- struct cxl_endpoint_decoder *cxled, int ways)
+ struct cxl_endpoint_decoder *cxled, int ways,
+ bool no_dax)
{
struct cxl_region *cxlr;
@@ -3679,6 +3681,9 @@ struct cxl_region *cxl_create_region(struct cxl_root_decoder *cxlrd,
return ERR_PTR(-ENODEV);
}
+ if (no_dax)
+ set_bit(CXL_REGION_F_NO_DAX, &cxlr->flags);
+
return cxlr;
}
EXPORT_SYMBOL_NS_GPL(cxl_create_region, "CXL");
@@ -3842,6 +3847,9 @@ static int cxl_region_probe(struct device *dev)
if (rc)
return rc;
+ if (test_bit(CXL_REGION_F_NO_DAX, &cxlr->flags))
+ return 0;
+
switch (cxlr->mode) {
case CXL_PARTMODE_PMEM:
return devm_cxl_add_pmem_region(cxlr);
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 6fc6fd7b571d..8c418d62b0e4 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -405,6 +405,9 @@ struct cxl_region_params {
*/
#define CXL_REGION_F_NEEDS_RESET 1
+/* Allow Type2 drivers to specify if a dax region should not be created. */
+#define CXL_REGION_F_NO_DAX 2
+
/**
* struct cxl_region - CXL region
* @dev: This region's device
diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
index d9cd10537fb1..867dd33adaff 100644
--- a/include/cxl/cxl.h
+++ b/include/cxl/cxl.h
@@ -283,7 +283,8 @@ struct cxl_endpoint_decoder *cxl_request_dpa(struct cxl_memdev *cxlmd,
resource_size_t alloc);
int cxl_dpa_free(struct cxl_endpoint_decoder *cxled);
struct cxl_region *cxl_create_region(struct cxl_root_decoder *cxlrd,
- struct cxl_endpoint_decoder *cxled, int ways);
+ struct cxl_endpoint_decoder *cxled, int ways,
+ bool no_dax);
int cxl_accel_region_detach(struct cxl_endpoint_decoder *cxled);
#endif /* __CXL_CXL_H__ */
--
2.34.1
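The effect of CXL_REGION_F_NO_DAX on region probing can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the kernel code:

- set_flag() and test_flag() are hypothetical stand-ins for set_bit() and test_bit().
- The probe_result values are a simplified, hypothetical summary of which child device cxl_region_probe() would add for each partition mode.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit numbers mirroring the patch; flags live in an unsigned long. */
#define REGION_F_NEEDS_RESET 1
#define REGION_F_NO_DAX 2

enum part_mode { PARTMODE_RAM, PARTMODE_PMEM };
/* Hypothetical summary of what probe would register. */
enum probe_result { ADDED_NOTHING, ADDED_DAX_REGION, ADDED_PMEM_REGION };

struct region {
	unsigned long flags;
	enum part_mode mode;
};

/* User-space stand-ins for the kernel's set_bit()/test_bit(). */
static void set_flag(int nr, unsigned long *flags)
{
	*flags |= 1UL << nr;
}

static bool test_flag(int nr, const unsigned long *flags)
{
	return (*flags >> nr) & 1UL;
}

/* Creation: record the caller's no_dax choice in the region flags. */
static void create_region(struct region *r, enum part_mode mode, bool no_dax)
{
	r->flags = 0;
	r->mode = mode;
	if (no_dax)
		set_flag(REGION_F_NO_DAX, &r->flags);
}

/* Probe: return early, before the mode switch, when NO_DAX is set. */
static enum probe_result probe_region(const struct region *r)
{
	if (test_flag(REGION_F_NO_DAX, &r->flags))
		return ADDED_NOTHING;

	switch (r->mode) {
	case PARTMODE_PMEM:
		return ADDED_PMEM_REGION;
	case PARTMODE_RAM:
	default:
		return ADDED_DAX_REGION;
	}
}
```

The model only works because create_region() records the flag before probe_region() reads it. The same ordering requirement applies wherever the real flag is set relative to the region being probed.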
* [PATCH v16 20/22] sfc: create cxl region
2025-05-14 13:27 [PATCH v16 00/22] Type2 device basic support alejandro.lucero-palau
` (18 preceding siblings ...)
2025-05-14 13:27 ` [PATCH v16 19/22] cxl: Add region flag for precluding a device memory to be used for dax alejandro.lucero-palau
@ 2025-05-14 13:27 ` alejandro.lucero-palau
2025-05-21 21:01 ` Dan Williams
2025-05-14 13:27 ` [PATCH v16 21/22] cxl: Add function for obtaining region range alejandro.lucero-palau
2025-05-14 13:27 ` [PATCH v16 22/22] sfc: support pio mapping based on cxl alejandro.lucero-palau
21 siblings, 1 reply; 84+ messages in thread
From: alejandro.lucero-palau @ 2025-05-14 13:27 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Martin Habets, Edward Cree, Jonathan Cameron
From: Alejandro Lucero <alucerop@amd.com>
Use the CXL API to create a region from the endpoint decoder associated
with the allocated DPA range, specifying that no DAX device should be
created.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
drivers/net/ethernet/sfc/efx_cxl.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
index 20db9aa382ec..960293a04ed3 100644
--- a/drivers/net/ethernet/sfc/efx_cxl.c
+++ b/drivers/net/ethernet/sfc/efx_cxl.c
@@ -110,10 +110,19 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
goto sfc_put_decoder;
}
+ cxl->efx_region = cxl_create_region(cxl->cxlrd, cxl->cxled, 1, true);
+ if (IS_ERR(cxl->efx_region)) {
+ pci_err(pci_dev, "CXL accel create region failed\n");
+ rc = PTR_ERR(cxl->efx_region);
+ goto err_region;
+ }
+
probe_data->cxl = cxl;
return 0;
+err_region:
+ cxl_dpa_free(cxl->cxled);
sfc_put_decoder:
cxl_put_root_decoder(cxl->cxlrd);
return rc;
@@ -122,6 +131,7 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
void efx_cxl_exit(struct efx_probe_data *probe_data)
{
if (probe_data->cxl) {
+ cxl_accel_region_detach(probe_data->cxl->cxled);
cxl_dpa_free(probe_data->cxl->cxled);
cxl_put_root_decoder(probe_data->cxl->cxlrd);
}
--
2.34.1
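The error path in efx_cxl_init() follows the usual kernel goto-unwind pattern. Resources are acquired in order, and each failure jumps to a label that releases only what was already acquired, in reverse order.

The following is a minimal user-space sketch of that shape. Each step's result is passed in as a return code, and undo actions are logged so their order can be checked. The names are hypothetical and only echo the helpers used in the patch.

```c
#include <assert.h>
#include <string.h>

/* Records undo actions in the order they run. */
static char undo_log[64];

static void undo(const char *what)
{
	strcat(undo_log, what);
	strcat(undo_log, ";");
}

/* Shape of efx_cxl_init(): get root decoder, request DPA, create region. */
static int init_sketch(int decoder_rc, int dpa_rc, int region_rc)
{
	int rc;

	undo_log[0] = '\0';
	rc = decoder_rc;          /* get root decoder */
	if (rc)
		return rc;        /* nothing to undo yet */
	rc = dpa_rc;              /* request DPA */
	if (rc)
		goto put_decoder;
	rc = region_rc;           /* create region */
	if (rc)
		goto free_dpa;
	return 0;

free_dpa:
	undo("dpa_free");
put_decoder:
	undo("put_root_decoder");
	return rc;
}
```

Because the labels fall through, a late failure runs every earlier undo step, while an early failure skips the steps that never happened. Patch 22 extends the same chain with an err_region_params label that detaches the region before the DPA is freed.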
* [PATCH v16 21/22] cxl: Add function for obtaining region range
2025-05-14 13:27 [PATCH v16 00/22] Type2 device basic support alejandro.lucero-palau
` (19 preceding siblings ...)
2025-05-14 13:27 ` [PATCH v16 20/22] sfc: create cxl region alejandro.lucero-palau
@ 2025-05-14 13:27 ` alejandro.lucero-palau
2025-05-20 2:35 ` Alison Schofield
2025-05-21 21:31 ` Dan Williams
2025-05-14 13:27 ` [PATCH v16 22/22] sfc: support pio mapping based on cxl alejandro.lucero-palau
21 siblings, 2 replies; 84+ messages in thread
From: alejandro.lucero-palau @ 2025-05-14 13:27 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Zhi Wang, Jonathan Cameron
From: Alejandro Lucero <alucerop@amd.com>
A CXL region struct contains the physical address range to work with.
Type2 drivers can create a CXL region but do not have access to the
related struct, as it is private to the kernel CXL core.
Add a function for getting the CXL region range, so that a Type2 driver
can map that memory range.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Zhi Wang <zhiw@nvidia.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
---
drivers/cxl/core/region.c | 23 +++++++++++++++++++++++
include/cxl/cxl.h | 2 ++
2 files changed, 25 insertions(+)
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 06647bae210f..9b7c6b8304d6 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2726,6 +2726,29 @@ static struct cxl_region *devm_cxl_add_region(struct cxl_root_decoder *cxlrd,
return ERR_PTR(rc);
}
+/**
+ * cxl_get_region_range - obtain range linked to a CXL region
+ *
+ * @region: a pointer to struct cxl_region
+ * @range: a pointer to a struct range to be set
+ *
+ * Return: 0 on success or a negative errno on failure.
+ */
+int cxl_get_region_range(struct cxl_region *region, struct range *range)
+{
+ if (WARN_ON_ONCE(!region))
+ return -ENODEV;
+
+ if (!region->params.res)
+ return -ENOSPC;
+
+ range->start = region->params.res->start;
+ range->end = region->params.res->end;
+
+ return 0;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_get_region_range, "CXL");
+
static ssize_t __create_region_show(struct cxl_root_decoder *cxlrd, char *buf)
{
return sysfs_emit(buf, "region%u\n", atomic_read(&cxlrd->region_id));
diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
index 867dd33adaff..f6977eafd7e9 100644
--- a/include/cxl/cxl.h
+++ b/include/cxl/cxl.h
@@ -287,4 +287,6 @@ struct cxl_region *cxl_create_region(struct cxl_root_decoder *cxlrd,
bool no_dax);
int cxl_accel_region_detach(struct cxl_endpoint_decoder *cxled);
+struct range;
+int cxl_get_region_range(struct cxl_region *region, struct range *range);
#endif /* __CXL_CXL_H__ */
--
2.34.1
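The contract of cxl_get_region_range() can be modelled in user space as shown below. This is a sketch only:

- The structs are simplified stand-ins for the kernel's struct resource, struct range and struct cxl_region.
- WARN_ON_ONCE() is omitted.

In both the resource and the range, end is inclusive. That is why the mapping size in patch 22 is computed as end - start + 1.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel types; end is inclusive in both. */
struct resource { uint64_t start, end; };
struct range { uint64_t start, end; };
struct region_params { struct resource *res; };
struct region { struct region_params params; };

/* Same checks as cxl_get_region_range(), minus WARN_ON_ONCE(). */
static int get_region_range(const struct region *region, struct range *range)
{
	if (!region)
		return -ENODEV;
	if (!region->params.res)
		return -ENOSPC;  /* no HPA assigned to the region yet */
	range->start = region->params.res->start;
	range->end = region->params.res->end;
	return 0;
}

/* Length to pass to ioremap(): end is inclusive, hence the +1. */
static uint64_t range_len_sketch(const struct range *r)
{
	return r->end - r->start + 1;
}
```

The kernel's include/linux/range.h already provides range_len(), which computes the same end - start + 1 value. It could replace the open-coded expression in efx_cxl_init().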
* [PATCH v16 22/22] sfc: support pio mapping based on cxl
2025-05-14 13:27 [PATCH v16 00/22] Type2 device basic support alejandro.lucero-palau
` (20 preceding siblings ...)
2025-05-14 13:27 ` [PATCH v16 21/22] cxl: Add function for obtaining region range alejandro.lucero-palau
@ 2025-05-14 13:27 ` alejandro.lucero-palau
2025-05-21 21:48 ` Dan Williams
21 siblings, 1 reply; 84+ messages in thread
From: alejandro.lucero-palau @ 2025-05-14 13:27 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Edward Cree, Jonathan Cameron
From: Alejandro Lucero <alucerop@amd.com>
When the device supports CXL and CXL initialisation succeeded, use the
CXL region to map the memory range and use this mapping for PIO buffers.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
drivers/net/ethernet/sfc/ef10.c | 50 +++++++++++++++++++++++----
drivers/net/ethernet/sfc/efx_cxl.c | 18 ++++++++++
drivers/net/ethernet/sfc/net_driver.h | 2 ++
drivers/net/ethernet/sfc/nic.h | 3 ++
4 files changed, 66 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
index 47349c148c0c..1a13fdbbc1b3 100644
--- a/drivers/net/ethernet/sfc/ef10.c
+++ b/drivers/net/ethernet/sfc/ef10.c
@@ -24,6 +24,7 @@
#include <linux/wait.h>
#include <linux/workqueue.h>
#include <net/udp_tunnel.h>
+#include "efx_cxl.h"
/* Hardware control for EF10 architecture including 'Huntington'. */
@@ -106,7 +107,7 @@ static int efx_ef10_get_vf_index(struct efx_nic *efx)
static int efx_ef10_init_datapath_caps(struct efx_nic *efx)
{
- MCDI_DECLARE_BUF(outbuf, MC_CMD_GET_CAPABILITIES_V4_OUT_LEN);
+ MCDI_DECLARE_BUF(outbuf, MC_CMD_GET_CAPABILITIES_V7_OUT_LEN);
struct efx_ef10_nic_data *nic_data = efx->nic_data;
size_t outlen;
int rc;
@@ -177,6 +178,12 @@ static int efx_ef10_init_datapath_caps(struct efx_nic *efx)
efx->num_mac_stats);
}
+ if (outlen < MC_CMD_GET_CAPABILITIES_V7_OUT_LEN)
+ nic_data->datapath_caps3 = 0;
+ else
+ nic_data->datapath_caps3 = MCDI_DWORD(outbuf,
+ GET_CAPABILITIES_V7_OUT_FLAGS3);
+
return 0;
}
@@ -919,6 +926,9 @@ static void efx_ef10_forget_old_piobufs(struct efx_nic *efx)
static void efx_ef10_remove(struct efx_nic *efx)
{
struct efx_ef10_nic_data *nic_data = efx->nic_data;
+#ifdef CONFIG_SFC_CXL
+ struct efx_probe_data *probe_data;
+#endif
int rc;
#ifdef CONFIG_SFC_SRIOV
@@ -949,7 +959,12 @@ static void efx_ef10_remove(struct efx_nic *efx)
efx_mcdi_rx_free_indir_table(efx);
+#ifdef CONFIG_SFC_CXL
+ probe_data = container_of(efx, struct efx_probe_data, efx);
+ if (nic_data->wc_membase && !probe_data->cxl_pio_in_use)
+#else
if (nic_data->wc_membase)
+#endif
iounmap(nic_data->wc_membase);
rc = efx_mcdi_free_vis(efx);
@@ -1140,6 +1155,9 @@ static int efx_ef10_dimension_resources(struct efx_nic *efx)
unsigned int channel_vis, pio_write_vi_base, max_vis;
struct efx_ef10_nic_data *nic_data = efx->nic_data;
unsigned int uc_mem_map_size, wc_mem_map_size;
+#ifdef CONFIG_SFC_CXL
+ struct efx_probe_data *probe_data;
+#endif
void __iomem *membase;
int rc;
@@ -1263,8 +1281,25 @@ static int efx_ef10_dimension_resources(struct efx_nic *efx)
iounmap(efx->membase);
efx->membase = membase;
- /* Set up the WC mapping if needed */
- if (wc_mem_map_size) {
+ if (!wc_mem_map_size)
+ goto skip_pio;
+
+ /* Set up the WC mapping */
+
+#ifdef CONFIG_SFC_CXL
+ probe_data = container_of(efx, struct efx_probe_data, efx);
+ if ((nic_data->datapath_caps3 &
+ (1 << MC_CMD_GET_CAPABILITIES_V7_OUT_CXL_CONFIG_ENABLE_LBN)) &&
+ probe_data->cxl_pio_initialised) {
+ /* Use PIO through the CXL mapping */
+ nic_data->pio_write_base = probe_data->cxl->ctpio_cxl +
+ (pio_write_vi_base * efx->vi_stride +
+ ER_DZ_TX_PIOBUF - uc_mem_map_size);
+ probe_data->cxl_pio_in_use = true;
+ } else
+#endif
+ {
+ /* Using legacy PIO BAR mapping */
nic_data->wc_membase = ioremap_wc(efx->membase_phys +
uc_mem_map_size,
wc_mem_map_size);
@@ -1279,12 +1314,13 @@ static int efx_ef10_dimension_resources(struct efx_nic *efx)
nic_data->wc_membase +
(pio_write_vi_base * efx->vi_stride + ER_DZ_TX_PIOBUF -
uc_mem_map_size);
-
- rc = efx_ef10_link_piobufs(efx);
- if (rc)
- efx_ef10_free_piobufs(efx);
}
+ rc = efx_ef10_link_piobufs(efx);
+ if (rc)
+ efx_ef10_free_piobufs(efx);
+
+skip_pio:
netif_dbg(efx, probe, efx->net_dev,
"memory BAR at %pa (virtual %p+%x UC, %p+%x WC)\n",
&efx->membase_phys, efx->membase, uc_mem_map_size,
diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
index 960293a04ed3..efa3cf1a56c3 100644
--- a/drivers/net/ethernet/sfc/efx_cxl.c
+++ b/drivers/net/ethernet/sfc/efx_cxl.c
@@ -28,6 +28,7 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
};
resource_size_t max_size;
struct efx_cxl *cxl;
+ struct range range;
u16 dvsec;
int rc;
@@ -117,10 +118,26 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
goto err_region;
}
+ rc = cxl_get_region_range(cxl->efx_region, &range);
+ if (rc) {
+ pci_err(pci_dev, "CXL getting region params failed\n");
+ goto err_region_params;
+ }
+
+ cxl->ctpio_cxl = ioremap(range.start, range.end - range.start + 1);
+ if (!cxl->ctpio_cxl) {
+ pci_err(pci_dev, "CXL ioremap region (%pra) failed\n", &range);
+ rc = -ENOMEM;
+ goto err_region_params;
+ }
+
probe_data->cxl = cxl;
+ probe_data->cxl_pio_initialised = true;
return 0;
+err_region_params:
+ cxl_accel_region_detach(cxl->cxled);
err_region:
cxl_dpa_free(cxl->cxled);
sfc_put_decoder:
@@ -131,6 +148,7 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
void efx_cxl_exit(struct efx_probe_data *probe_data)
{
if (probe_data->cxl) {
+ iounmap(probe_data->cxl->ctpio_cxl);
cxl_accel_region_detach(probe_data->cxl->cxled);
cxl_dpa_free(probe_data->cxl->cxled);
cxl_put_root_decoder(probe_data->cxl->cxlrd);
diff --git a/drivers/net/ethernet/sfc/net_driver.h b/drivers/net/ethernet/sfc/net_driver.h
index 0e685b8a9980..894b62d6ada9 100644
--- a/drivers/net/ethernet/sfc/net_driver.h
+++ b/drivers/net/ethernet/sfc/net_driver.h
@@ -1209,6 +1209,7 @@ struct efx_cxl;
* @efx: Efx NIC details
* @cxl: details of related cxl objects
* @cxl_pio_initialised: cxl initialization outcome.
+ * @cxl_pio_in_use: PIO using CXL mapping
*/
struct efx_probe_data {
struct pci_dev *pci_dev;
@@ -1216,6 +1217,7 @@ struct efx_probe_data {
#ifdef CONFIG_SFC_CXL
struct efx_cxl *cxl;
bool cxl_pio_initialised;
+ bool cxl_pio_in_use;
#endif
};
diff --git a/drivers/net/ethernet/sfc/nic.h b/drivers/net/ethernet/sfc/nic.h
index 9fa5c4c713ab..c87cc9214690 100644
--- a/drivers/net/ethernet/sfc/nic.h
+++ b/drivers/net/ethernet/sfc/nic.h
@@ -152,6 +152,8 @@ enum {
* %MC_CMD_GET_CAPABILITIES response)
* @datapath_caps2: Further Capabilities of datapath firmware (FLAGS2 field of
* %MC_CMD_GET_CAPABILITIES response)
+ * @datapath_caps3: Further Capabilities of datapath firmware (FLAGS3 field of
+ * %MC_CMD_GET_CAPABILITIES response)
* @rx_dpcpu_fw_id: Firmware ID of the RxDPCPU
* @tx_dpcpu_fw_id: Firmware ID of the TxDPCPU
* @must_probe_vswitching: Flag: vswitching has yet to be setup after MC reboot
@@ -186,6 +188,7 @@ struct efx_ef10_nic_data {
bool must_check_datapath_caps;
u32 datapath_caps;
u32 datapath_caps2;
+ u32 datapath_caps3;
unsigned int rx_dpcpu_fw_id;
unsigned int tx_dpcpu_fw_id;
bool must_probe_vswitching;
--
2.34.1
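The decision logic that patch 22 adds to efx_ef10_dimension_resources() can be sketched in user space:

- FLAGS3 is treated as zero when the firmware returns a response shorter than the V7 length.
- CXL PIO is used only when the CXL_CONFIG_ENABLE bit is set and CXL initialisation succeeded.
- The PIO write base offset is the same for the CXL mapping and the legacy WC mapping.

The numeric values of CXL_CONFIG_ENABLE_LBN and TX_PIOBUF_OFFSET below are hypothetical placeholders, not the real MCDI or register-map values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical placeholders; the real values come from the MCDI
 * definitions and the EF10 register map (ER_DZ_TX_PIOBUF). */
#define CXL_CONFIG_ENABLE_LBN 11
#define TX_PIOBUF_OFFSET 0x1000

/* Older firmware returns a shorter response: treat FLAGS3 as absent. */
static uint32_t caps3_from_response(size_t outlen, size_t v7_len,
				    uint32_t flags3)
{
	return outlen < v7_len ? 0 : flags3;
}

/* CXL PIO needs both the firmware capability and a working CXL setup. */
static bool use_cxl_pio(uint32_t caps3, bool cxl_pio_initialised)
{
	return (caps3 & (1u << CXL_CONFIG_ENABLE_LBN)) && cxl_pio_initialised;
}

/* Offset of the first PIO write window from the start of the mapping
 * (CXL or WC); the patch uses this expression for both cases. */
static size_t pio_write_offset(unsigned int pio_write_vi_base,
			       unsigned int vi_stride, size_t uc_mem_map_size)
{
	return (size_t)pio_write_vi_base * vi_stride + TX_PIOBUF_OFFSET -
	       uc_mem_map_size;
}
```

Only the base pointer differs between the two paths. efx_ef10_link_piobufs() therefore runs after either branch, and efx_ef10_remove() skips iounmap() of wc_membase when the CXL mapping was used, because efx_cxl_exit() unmaps ctpio_cxl.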
* Re: [PATCH v16 21/22] cxl: Add function for obtaining region range
2025-05-14 13:27 ` [PATCH v16 21/22] cxl: Add function for obtaining region range alejandro.lucero-palau
@ 2025-05-20 2:35 ` Alison Schofield
2025-05-21 21:31 ` Dan Williams
1 sibling, 0 replies; 84+ messages in thread
From: Alison Schofield @ 2025-05-20 2:35 UTC (permalink / raw)
To: alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang, Alejandro Lucero, Zhi Wang,
Jonathan Cameron
On Wed, May 14, 2025 at 02:27:42PM +0100, alejandro.lucero-palau@amd.com wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> A CXL region struct contains the physical address to work with.
>
> Type2 drivers can create a CXL region but do not have access to the
> related struct as it is defined as private by the kernel CXL core.
> Add a function for getting the cxl region range to be used for mapping
> such memory range by a Type2 driver.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Zhi Wang <zhiw@nvidia.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
snip
* Re: [PATCH v16 11/22] cxl: Define a driver interface for HPA free space enumeration
2025-05-14 13:27 ` [PATCH v16 11/22] cxl: Define a driver interface for HPA free space enumeration alejandro.lucero-palau
@ 2025-05-20 2:36 ` Alison Schofield
2025-05-21 19:31 ` Dan Williams
1 sibling, 0 replies; 84+ messages in thread
From: Alison Schofield @ 2025-05-20 2:36 UTC (permalink / raw)
To: alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang, Alejandro Lucero, Jonathan Cameron
On Wed, May 14, 2025 at 02:27:32PM +0100, alejandro.lucero-palau@amd.com wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> CXL region creation involves allocating capacity from device DPA
> (device-physical-address space) and assigning it to decode a given HPA
> (host-physical-address space). Before determining how much DPA to
> allocate the amount of available HPA must be determined. Also, not all
> HPA is created equal, some specifically targets RAM, some target PMEM,
> some is prepared for device-memory flows like HDM-D and HDM-DB, and some
> is host-only (HDM-H).
>
> In order to support Type2 CXL devices, wrap all of those concerns into
> an API that retrieves a root decoder (platform CXL window) that fits the
> specified constraints and the capacity available for a new region.
>
> Add a complementary function for releasing the reference to such root
> decoder.
>
> Based on https://lore.kernel.org/linux-cxl/168592159290.1948938.13522227102445462976.stgit@dwillia2-xfh.jf.intel.com/
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
snip
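The root decoder selection described above can be modelled as a search over platform windows. The search keeps only windows whose capabilities cover the required flags and picks the one with the most free HPA.

This is a hypothetical user-space sketch, not the kernel's HPA free space enumeration:

- The WIN_* flags are made up to represent the device-memory, host-only, RAM and PMEM properties the text mentions.
- The real code walks root decoder devices and computes free space from resources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability flags for a platform CXL window. */
#define WIN_TYPE2 (1u << 0) /* can decode device memory (HDM-D/HDM-DB) */
#define WIN_TYPE3 (1u << 1) /* host-only memory (HDM-H) */
#define WIN_RAM   (1u << 2)
#define WIN_PMEM  (1u << 3)

struct window {
	unsigned int flags;
	uint64_t free_bytes; /* HPA not yet claimed by any region */
};

/* Return the index of the window that has all required flags and the
 * most free HPA, storing that capacity in *max_avail; -1 if none fits. */
static int pick_window(const struct window *w, size_t n,
		       unsigned int required, uint64_t *max_avail)
{
	int best = -1;

	*max_avail = 0;
	for (size_t i = 0; i < n; i++) {
		if ((w[i].flags & required) != required)
			continue; /* required flags must be a subset */
		if (best < 0 || w[i].free_bytes > *max_avail) {
			best = (int)i;
			*max_avail = w[i].free_bytes;
		}
	}
	return best;
}
```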
* Re: [PATCH v16 19/22] cxl: Add region flag for precluding a device memory to be used for dax
2025-05-14 13:27 ` [PATCH v16 19/22] cxl: Add region flag for precluding a device memory to be used for dax alejandro.lucero-palau
@ 2025-05-20 2:36 ` Alison Schofield
2025-05-21 20:49 ` Dan Williams
1 sibling, 0 replies; 84+ messages in thread
From: Alison Schofield @ 2025-05-20 2:36 UTC (permalink / raw)
To: alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang, Alejandro Lucero, Zhi Wang,
Jonathan Cameron, Ben Cheatham
On Wed, May 14, 2025 at 02:27:40PM +0100, alejandro.lucero-palau@amd.com wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> By definition a type2 cxl device will use the host managed memory for
> specific functionality, therefore it should not be available to other
> uses. However, a dax interface could be just good enough in some cases.
>
> Add a flag to a CXL region to explicitly state that no DAX device should
> be created. Allow a Type2 driver to set that flag at region creation time.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Zhi Wang <zhiw@nvidia.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
snip
* Re: [PATCH v16 18/22] cxl: Allow region creation by type2 drivers
2025-05-14 13:27 ` [PATCH v16 18/22] cxl: Allow region creation by type2 drivers alejandro.lucero-palau
@ 2025-05-20 2:37 ` Alison Schofield
2025-05-21 20:45 ` Dan Williams
1 sibling, 0 replies; 84+ messages in thread
From: Alison Schofield @ 2025-05-20 2:37 UTC (permalink / raw)
To: alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang, Alejandro Lucero, Jonathan Cameron
On Wed, May 14, 2025 at 02:27:39PM +0100, alejandro.lucero-palau@amd.com wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Creating a CXL region requires userspace intervention through the cxl
> sysfs files. Type2 support should allow accelerator drivers to create
> such a CXL region from kernel code.
>
> Add that functionality and integrate it with the current support for
> memory expanders.
>
> Based on https://lore.kernel.org/linux-cxl/168592159835.1948938.1647215579839222774.stgit@dwillia2-xfh.jf.intel.com/
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
snip
* Re: [PATCH v16 16/22] cxl/region: Factor out interleave ways setup
2025-05-14 13:27 ` [PATCH v16 16/22] cxl/region: Factor out interleave ways setup alejandro.lucero-palau
@ 2025-05-20 2:37 ` Alison Schofield
0 siblings, 0 replies; 84+ messages in thread
From: Alison Schofield @ 2025-05-20 2:37 UTC (permalink / raw)
To: alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang, Alejandro Lucero, Zhi Wang,
Jonathan Cameron, Ben Cheatham
On Wed, May 14, 2025 at 02:27:37PM +0100, alejandro.lucero-palau@amd.com wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Region creation based on Type3 devices is triggered from user space
> allowing memory combination through interleaving.
>
> In preparation for kernel-driven region creation, that is, Type2 drivers
> triggering region creation backed by their advertised CXL memory, factor
> out a common helper from the user-sysfs region setup for interleave ways.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Zhi Wang <zhiw@nvidia.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
snip
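A factored-out helper lets the sysfs store path and a Type2 driver share one validation point. The sketch below shows that idea for interleave ways, using the HDM decoder encoding rule: powers of two up to 16, plus 3, 6 and 12.

The struct and function names are hypothetical. The in-kernel helper also checks the value against the root decoder's interleave and the region state, which this sketch leaves out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Interleave ways accepted by the CXL HDM decoder encoding. */
static bool ways_valid(int ways)
{
	switch (ways) {
	case 1: case 2: case 4: case 8: case 16:
	case 3: case 6: case 12:
		return true;
	default:
		return false;
	}
}

struct region_params { int interleave_ways; };

/* Hypothetical shared helper: validate, then apply. Both a sysfs write
 * and a kernel caller would go through this single function. */
static int set_interleave_ways(struct region_params *p, int ways)
{
	if (!ways_valid(ways))
		return -EINVAL;
	p->interleave_ways = ways;
	return 0;
}
```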
* Re: [PATCH v16 17/22] cxl/region: Factor out interleave granularity setup
2025-05-14 13:27 ` [PATCH v16 17/22] cxl/region: Factor out interleave granularity setup alejandro.lucero-palau
@ 2025-05-20 2:38 ` Alison Schofield
0 siblings, 0 replies; 84+ messages in thread
From: Alison Schofield @ 2025-05-20 2:38 UTC (permalink / raw)
To: alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang, Alejandro Lucero, Zhi Wang,
Jonathan Cameron, Ben Cheatham
On Wed, May 14, 2025 at 02:27:38PM +0100, alejandro.lucero-palau@amd.com wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Region creation based on Type3 devices is triggered from user space
> allowing memory combination through interleaving.
>
> In preparation for kernel-driven region creation, that is, Type2 drivers
> triggering region creation backed by their advertised CXL memory, factor
> out a common helper from the user-sysfs region setup for interleave
> granularity.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Zhi Wang <zhiw@nvidia.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
snip
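The granularity side of the same refactor validates a byte granularity and converts it to the encoded form that an HDM decoder stores. CXL allows powers of two from 256 bytes to 16 KiB, encoded as log2(granularity) - 8.

This is a hypothetical helper written for illustration, not the kernel's own conversion function.

```c
#include <assert.h>
#include <errno.h>

/* Map a granularity in bytes to its encoded form (eig):
 * 256 -> 0, 512 -> 1, ..., 16384 -> 6. Anything else is rejected. */
static int granularity_to_eig_sketch(unsigned int gran, unsigned int *eig)
{
	unsigned int e = 0;

	if (gran < 256 || gran > 16384 || (gran & (gran - 1)))
		return -EINVAL; /* out of range or not a power of two */
	while ((256u << e) != gran)
		e++;
	*eig = e;
	return 0;
}
```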
* Re: [PATCH v16 15/22] cxl: Make region type based on endpoint type
2025-05-14 13:27 ` [PATCH v16 15/22] cxl: Make region type based on endpoint type alejandro.lucero-palau
@ 2025-05-20 2:39 ` Alison Schofield
0 siblings, 0 replies; 84+ messages in thread
From: Alison Schofield @ 2025-05-20 2:39 UTC (permalink / raw)
To: alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang, Alejandro Lucero, Zhi Wang,
Jonathan Cameron, Ben Cheatham
On Wed, May 14, 2025 at 02:27:36PM +0100, alejandro.lucero-palau@amd.com wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Current code is expecting Type3 or CXL_DECODER_HOSTONLYMEM devices only.
> Support for Type2 implies region type needs to be based on the endpoint
> type instead.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Zhi Wang <zhiw@nvidia.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
snip
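"Region type based on endpoint type" can be pictured as the region inheriting its target type from the endpoint decoder that backs it, instead of assuming host-only memory. A later attach then rejects endpoints of a different type.

This is a hypothetical sketch. The enum names only echo CXL_DECODER_DEVMEM and CXL_DECODER_HOSTONLYMEM, and the -ENXIO error code is chosen for illustration.

```c
#include <assert.h>
#include <errno.h>

/* Simplified stand-ins for the kernel's decoder target types. */
enum decoder_type { DECODER_DEVMEM, DECODER_HOSTONLYMEM };

struct endpoint_decoder { enum decoder_type target_type; };
struct region { enum decoder_type type; };

/* The region takes its type from the endpoint decoder backing it, so a
 * Type2 (device memory) endpoint yields a DEVMEM region. */
static void init_region_type(struct region *r,
			     const struct endpoint_decoder *cxled)
{
	r->type = cxled->target_type;
}

/* Endpoints attached later must match the region's type. */
static int attach_endpoint(const struct region *r,
			   const struct endpoint_decoder *cxled)
{
	if (cxled->target_type != r->type)
		return -ENXIO; /* error code chosen for illustration */
	return 0;
}
```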
* Re: [PATCH v16 13/22] cxl: Define a driver interface for DPA allocation
2025-05-14 13:27 ` [PATCH v16 13/22] cxl: Define a driver interface for DPA allocation alejandro.lucero-palau
@ 2025-05-20 2:39 ` Alison Schofield
2025-05-21 20:23 ` Dan Williams
1 sibling, 0 replies; 84+ messages in thread
From: Alison Schofield @ 2025-05-20 2:39 UTC (permalink / raw)
To: alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang, Alejandro Lucero, Ben Cheatham,
Jonathan Cameron
On Wed, May 14, 2025 at 02:27:34PM +0100, alejandro.lucero-palau@amd.com wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Region creation involves finding available DPA (device-physical-address)
> capacity to map into HPA (host-physical-address) space.
>
> In order to support CXL Type2 devices, define an API, cxl_request_dpa(),
> that tries to allocate the DPA memory the driver requires to operate. The
> memory requested should not be bigger than the max available HPA obtained
> previously with cxl_get_hpa_freespace().
>
> Based on https://lore.kernel.org/linux-cxl/168592158743.1948938.7622563891193802610.stgit@dwillia2-xfh.jf.intel.com/
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
snip
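The constraint in the quoted description is that a DPA request must fit both the device's free DPA and the largest HPA window reported earlier. It can be sketched as a bounded allocation.

This is a hypothetical model, not cxl_request_dpa() itself. The pool struct and the choice of -EINVAL versus -ENOSPC are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical pool of unallocated device DPA. */
struct dpa_pool { uint64_t free; };

/* Reserve alloc bytes of DPA, refusing sizes that could not be mapped
 * into the largest available HPA window (max_hpa). */
static int request_dpa_sketch(struct dpa_pool *pool, uint64_t max_hpa,
			      uint64_t alloc)
{
	if (alloc == 0 || alloc > max_hpa)
		return -EINVAL; /* would not fit in any host window */
	if (alloc > pool->free)
		return -ENOSPC; /* device has too little free DPA */
	pool->free -= alloc;
	return 0;
}
```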
* Re: [PATCH v16 09/22] cxl: Prepare memdev creation for type2
2025-05-14 13:27 ` [PATCH v16 09/22] cxl: Prepare memdev creation for type2 alejandro.lucero-palau
@ 2025-05-20 2:40 ` Alison Schofield
2025-05-21 18:49 ` Dan Williams
1 sibling, 0 replies; 84+ messages in thread
From: Alison Schofield @ 2025-05-20 2:40 UTC (permalink / raw)
To: alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang, Alejandro Lucero, Ben Cheatham,
Jonathan Cameron
On Wed, May 14, 2025 at 02:27:30PM +0100, alejandro.lucero-palau@amd.com wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> The current CXL core relies on a CXL_DEVTYPE_CLASSMEM type device when
> creating a memdev, which leads to problems when obtaining cxl_memdev_state
> references from a CXL_DEVTYPE_DEVMEM type.
>
> Modify the check for obtaining cxl_memdev_state to add CXL_DEVTYPE_DEVMEM
> support.
>
> Make devm_cxl_add_memdev accessible from an accel driver.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
snip
* Re: [PATCH v16 07/22] cxl: Support dpa initialization without a mailbox
2025-05-14 13:27 ` [PATCH v16 07/22] cxl: Support dpa initialization without a mailbox alejandro.lucero-palau
@ 2025-05-20 2:40 ` Alison Schofield
2025-05-21 18:47 ` Dan Williams
1 sibling, 0 replies; 84+ messages in thread
From: Alison Schofield @ 2025-05-20 2:40 UTC (permalink / raw)
To: alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang, Alejandro Lucero, Ben Cheatham,
Jonathan Cameron
On Wed, May 14, 2025 at 02:27:28PM +0100, alejandro.lucero-palau@amd.com wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Type3 relies on the mailbox CXL_MBOX_OP_IDENTIFY command for initializing
> memdev state params which end up being used for DPA initialization.
>
> Allow a Type2 driver to initialize DPA simply by giving the size of its
> volatile and/or non-volatile hardware partitions.
>
> Export cxl_dpa_setup as well for initializing those added DPA partitions
> with the proper resources.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
snip
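Without a mailbox, the DPA layout can be derived directly from the partition sizes a Type2 driver supplies. The sketch assumes the usual CXL ordering of volatile capacity first, followed by persistent capacity, and it skips zero-sized partitions.

The struct and function names are hypothetical. They do not reproduce the kernel's cxl_dpa_info handling.

```c
#include <assert.h>
#include <stdint.h>

enum part_mode { PART_RAM, PART_PMEM };

struct dpa_part { enum part_mode mode; uint64_t start, size; };

/* Hypothetical layout built only from driver-supplied sizes. */
struct dpa_info {
	int nr_partitions;
	struct dpa_part part[2];
	uint64_t size; /* total DPA */
};

/* Volatile capacity first, then persistent; zero-sized parts skipped. */
static void dpa_info_from_sizes(struct dpa_info *info, uint64_t vol,
				uint64_t pmem)
{
	uint64_t next = 0;

	info->nr_partitions = 0;
	if (vol) {
		info->part[info->nr_partitions++] =
			(struct dpa_part){ PART_RAM, next, vol };
		next += vol;
	}
	if (pmem) {
		info->part[info->nr_partitions++] =
			(struct dpa_part){ PART_PMEM, next, pmem };
		next += pmem;
	}
	info->size = next;
}
```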
* Re: [PATCH v16 05/22] cxl: Add function for type2 cxl regs setup
2025-05-14 13:27 ` [PATCH v16 05/22] cxl: Add function for type2 cxl regs setup alejandro.lucero-palau
@ 2025-05-20 2:41 ` Alison Schofield
2025-05-21 18:28 ` Dan Williams
1 sibling, 0 replies; 84+ messages in thread
From: Alison Schofield @ 2025-05-20 2:41 UTC (permalink / raw)
To: alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang, Alejandro Lucero, Ben Cheatham,
Jonathan Cameron
On Wed, May 14, 2025 at 02:27:26PM +0100, alejandro.lucero-palau@amd.com wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Create a new function that a Type2 device uses to set up and map the
> CXL registers in its cxl_dev_state struct.
>
> Export the capabilities found so the driver can check them against the
> ones it expects.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
snip
* Re: [PATCH v16 04/22] cxl: Move register/capability check to driver
2025-05-14 13:27 ` [PATCH v16 04/22] cxl: Move register/capability check to driver alejandro.lucero-palau
@ 2025-05-20 2:41 ` Alison Schofield
2025-05-21 18:23 ` Dan Williams
1 sibling, 0 replies; 84+ messages in thread
From: Alison Schofield @ 2025-05-20 2:41 UTC (permalink / raw)
To: alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang, Alejandro Lucero, Jonathan Cameron
On Wed, May 14, 2025 at 02:27:25PM +0100, alejandro.lucero-palau@amd.com wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Type3 has some mandatory capabilities which are optional for Type2.
>
> In order to support the same register/capability discovery code for both
> types, avoid any assumption about what capabilities should be there, and
> export the capabilities found so the caller can check them against the
> expected ones.
>
> Add a function to facilitate reporting expected capabilities that are
> missing.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
snip
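Moving the capability check into the driver comes down to a subset test: every expected capability bit must appear among the found bits. The difference between the two sets is what gets reported as missing.

This is a hypothetical sketch using a single unsigned long in place of a kernel bitmap. caps_ok() plays the role of bitmap_subset(expected, found), and the CAP_* numbers are made up.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical capability bit numbers (not the kernel's enum values). */
enum cap { CAP_HDM, CAP_RAS, CAP_MBOX, CAP_MEMDEV };

/* Driver-side check: is every expected capability among those found? */
static bool caps_ok(unsigned long expected, unsigned long found)
{
	return (expected & ~found) == 0;
}

/* Bits the driver expected but discovery did not report. */
static unsigned long caps_missing(unsigned long expected, unsigned long found)
{
	return expected & ~found;
}
```

A Type3 driver would pass a larger expected set than a Type2 driver. The shared discovery code therefore no longer needs to know which device type it is probing.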
* Re: [PATCH v16 03/22] cxl: Move pci generic code
2025-05-14 13:27 ` [PATCH v16 03/22] cxl: Move pci generic code alejandro.lucero-palau
@ 2025-05-20 2:42 ` Alison Schofield
2025-05-21 17:44 ` Dan Williams
1 sibling, 0 replies; 84+ messages in thread
From: Alison Schofield @ 2025-05-20 2:42 UTC (permalink / raw)
To: alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang, Alejandro Lucero, Ben Cheatham,
Fan Ni, Jonathan Cameron
On Wed, May 14, 2025 at 02:27:24PM +0100, alejandro.lucero-palau@amd.com wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Inside cxl/core/pci.c there are helpers for CXL PCIe initialization,
> while cxl/pci.c implements the functionality for Type3 device
> initialization.
>
> Move helper functions from cxl/pci.c to cxl/core/pci.c in order to be
> exported and shared with CXL Type2 device initialization.
>
> Fix cxl mock tests affected by the code move.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
> Reviewed-by: Fan Ni <fan.ni@samsung.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
snip
* Re: [PATCH v16 01/22] cxl: Add type2 device basic support
2025-05-14 13:27 ` [PATCH v16 01/22] cxl: Add type2 " alejandro.lucero-palau
@ 2025-05-20 2:43 ` Alison Schofield
2025-05-20 7:18 ` Alejandro Lucero Palau
2025-05-20 7:17 ` dan.j.williams
1 sibling, 1 reply; 84+ messages in thread
From: Alison Schofield @ 2025-05-20 2:43 UTC (permalink / raw)
To: alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang, Alejandro Lucero, Jonathan Cameron
On Wed, May 14, 2025 at 02:27:22PM +0100, alejandro.lucero-palau@amd.com wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Differentiate CXL memory expanders (type 3) from CXL device accelerators
> (type 2) with a new function for initializing cxl_dev_state and a macro
> for helping accel drivers to embed cxl_dev_state inside a private
> struct.
>
> Move structs to include/cxl as the size of the accel driver private
> struct embedding cxl_dev_state needs to know the size of this struct.
>
> Use same new initialization with the type3 pci driver.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
snip
* Re: [PATCH v16 01/22] cxl: Add type2 device basic support
2025-05-14 13:27 ` [PATCH v16 01/22] cxl: Add type2 " alejandro.lucero-palau
2025-05-20 2:43 ` Alison Schofield
@ 2025-05-20 7:17 ` dan.j.williams
2025-05-21 10:44 ` Alejandro Lucero Palau
1 sibling, 1 reply; 84+ messages in thread
From: dan.j.williams @ 2025-05-20 7:17 UTC (permalink / raw)
To: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Jonathan Cameron
alejandro.lucero-palau@ wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Differentiate CXL memory expanders (type 3) from CXL device accelerators
> (type 2) with a new function for initializing cxl_dev_state and a macro
> for helping accel drivers to embed cxl_dev_state inside a private
> struct.
>
> Move structs to include/cxl as the size of the accel driver private
> struct embedding cxl_dev_state needs to know the size of this struct.
>
> Use same new initialization with the type3 pci driver.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> ---
> drivers/cxl/core/mbox.c | 11 +-
> drivers/cxl/core/memdev.c | 32 +++++
> drivers/cxl/core/pci.c | 1 +
> drivers/cxl/core/regs.c | 1 +
> drivers/cxl/cxl.h | 97 +--------------
> drivers/cxl/cxlmem.h | 88 +-------------
> drivers/cxl/cxlpci.h | 21 ----
> drivers/cxl/pci.c | 17 +--
> include/cxl/cxl.h | 226 +++++++++++++++++++++++++++++++++++
> include/cxl/pci.h | 23 ++++
> tools/testing/cxl/test/mem.c | 3 +-
> 11 files changed, 305 insertions(+), 215 deletions(-)
> create mode 100644 include/cxl/cxl.h
> create mode 100644 include/cxl/pci.h
>
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index d72764056ce6..ab994d459f46 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -1484,23 +1484,20 @@ int cxl_mailbox_init(struct cxl_mailbox *cxl_mbox, struct device *host)
> }
> EXPORT_SYMBOL_NS_GPL(cxl_mailbox_init, "CXL");
>
> -struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev)
> +struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev, u64 serial,
> + u16 dvsec)
> {
> struct cxl_memdev_state *mds;
> int rc;
>
> - mds = devm_kzalloc(dev, sizeof(*mds), GFP_KERNEL);
> + mds = cxl_dev_state_create(dev, CXL_DEVTYPE_CLASSMEM, serial, dvsec,
> + struct cxl_memdev_state, cxlds, true);
Existing cxl_memdev_state_create() callers expect that any state
allocation is managed by devres.
It is ok to make cxl_memdev_state_create() manually allocate, but then
you still need to take care of existing caller expectations.
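A minimal userspace sketch of the devres contract referred to here, under stated assumptions: fake_device, fake_devm_add_action_or_reset(), fake_devres_release_all(), toy_state_create() and devm_toy_state_create() are hypothetical stand-ins, not kernel APIs. It models two properties of the real devm_add_action_or_reset(): registered release actions run in reverse order when the device is released, and if registration fails the action runs immediately. A manual create/destroy pair with a devm_ wrapper on top keeps the devres behaviour for callers that expect it, while other callers can own the lifetime themselves.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_ACTIONS 8

/* Hypothetical stand-in for a device carrying a devres action list. */
struct fake_device {
	void (*action[MAX_ACTIONS])(void *);
	void *data[MAX_ACTIONS];
	int nr_actions;
};

/* Models devm_add_action_or_reset(): if registration fails, run it now. */
static int fake_devm_add_action_or_reset(struct fake_device *dev,
					 void (*action)(void *), void *data)
{
	if (dev->nr_actions == MAX_ACTIONS) {
		action(data);
		return -1;
	}
	dev->action[dev->nr_actions] = action;
	dev->data[dev->nr_actions] = data;
	dev->nr_actions++;
	return 0;
}

/* Models driver detach: release actions run in reverse (LIFO) order. */
static void fake_devres_release_all(struct fake_device *dev)
{
	while (dev->nr_actions > 0) {
		dev->nr_actions--;
		dev->action[dev->nr_actions](dev->data[dev->nr_actions]);
	}
}

static int live_states;	/* number of states allocated but not yet freed */

struct toy_state {
	int initialized;
};

/* Manual variant: the caller owns the result and calls toy_state_destroy(). */
static struct toy_state *toy_state_create(void)
{
	struct toy_state *s = calloc(1, sizeof(*s));

	if (s) {
		s->initialized = 1;
		live_states++;
	}
	return s;
}

static void toy_state_destroy(void *s)
{
	free(s);
	live_states--;
}

/* devm variant: the state's lifetime is tied to @dev via a release action. */
static struct toy_state *devm_toy_state_create(struct fake_device *dev)
{
	struct toy_state *s = toy_state_create();

	if (!s)
		return NULL;
	if (fake_devm_add_action_or_reset(dev, toy_state_destroy, s))
		return NULL;	/* already freed by the _or_reset path */
	return s;
}
```

The devm_ wrapper adds no new cleanup logic; it only hands the manual destroy function to the device's release list.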
> if (!mds) {
> dev_err(dev, "No memory available\n");
> return ERR_PTR(-ENOMEM);
> }
>
> mutex_init(&mds->event.log_lock);
> - mds->cxlds.dev = dev;
> - mds->cxlds.reg_map.host = dev;
> - mds->cxlds.cxl_mbox.host = dev;
> - mds->cxlds.reg_map.resource = CXL_RESOURCE_NONE;
> - mds->cxlds.type = CXL_DEVTYPE_CLASSMEM;
>
> rc = devm_cxl_register_mce_notifier(dev, &mds->mce_notifier);
Ugh, but this function is now confused, as some resources are devm and
some are manually allocated. This is a bit of a mess. Like, why does this need
to dev_warn() on every boot on MCE-less archs like ARM64?
I was trying to keep the incremental fixup small, but this makes it
bigger, and likely means we need to clean this up before this patch.
Ugh2, it looks like the current arrangement will cause a NULL pointer
dereference if an MCE fires between cxl_memdev_state_create() and
devm_cxl_add_memdev().
Ugh3, it looks like the MCE notifier is registered once per memdev, but triggers
memory_failure() once per SPA match. That really wants to be registered
once per region.
That whole situation needs a rethink, but for now make the other
cleanups a TODO.
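A single-threaded toy model of the window described in Ugh2, assuming (as the comment implies) that the MCE handler reaches the memdev through a back-pointer that is only published when the memdev is added. All toy_* names are hypothetical. The NULL check only marks where the hazard is; in the kernel a plain check without ordering or locking against the code that publishes the pointer would not be a complete fix, which is part of why Dan's TODO to move the registration to cxl_region_probe() is attractive.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NOTIFY_DONE 0
#define TOY_NOTIFY_OK   1

#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical minimal notifier, chained like the MCE decode chain. */
struct toy_nb {
	int (*call)(struct toy_nb *nb, unsigned long addr);
	struct toy_nb *next;
};

struct toy_memdev {
	int endpoint_id;
};

struct toy_mds {
	struct toy_memdev *cxlmd;	/* published later, when the memdev is added */
	struct toy_nb mce_nb;
	int failures_reported;
};

static struct toy_nb *toy_chain;	/* single global chain */

static void toy_register(struct toy_nb *nb)
{
	nb->next = toy_chain;
	toy_chain = nb;
}

static void toy_fire_mce(unsigned long addr)
{
	for (struct toy_nb *nb = toy_chain; nb; nb = nb->next)
		nb->call(nb, addr);
}

static int toy_handle_mce(struct toy_nb *nb, unsigned long addr)
{
	struct toy_mds *mds = toy_container_of(nb, struct toy_mds, mce_nb);

	(void)addr;
	/*
	 * Without this check, an MCE arriving after registration but before
	 * the memdev is added would dereference a NULL memdev pointer.
	 */
	if (!mds->cxlmd)
		return TOY_NOTIFY_DONE;
	mds->failures_reported++;
	return TOY_NOTIFY_OK;
}
```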
> if (rc == -EOPNOTSUPP)
> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index a16a5886d40a..6cc732aeb9de 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c
> @@ -633,6 +633,38 @@ static void detach_memdev(struct work_struct *work)
>
> static struct lock_class_key cxl_memdev_key;
>
> +void cxl_dev_state_init(struct cxl_dev_state *cxlds, struct device *dev,
> + enum cxl_devtype type, u64 serial, u16 dvsec,
> + bool has_mbox)
As far as I can see this can be static as _cxl_dev_state_create() is the
only caller in this whole series. Fixup included below:
> +{
> + *cxlds = (struct cxl_dev_state) {
> + .dev = dev,
> + .type = type,
> + .serial = serial,
> + .cxl_dvsec = dvsec,
> + .reg_map.host = dev,
> + .reg_map.resource = CXL_RESOURCE_NONE,
> + };
> +
> + if (has_mbox)
> + cxlds->cxl_mbox.host = dev;
> +}
> +
> +struct cxl_dev_state *_cxl_dev_state_create(struct device *dev,
> + enum cxl_devtype type, u64 serial,
> + u16 dvsec, size_t size,
> + bool has_mbox)
> +{
> + struct cxl_dev_state *cxlds __free(kfree) = kzalloc(size, GFP_KERNEL);
> +
> + if (!cxlds)
> + return NULL;
> +
> + cxl_dev_state_init(cxlds, dev, type, serial, dvsec, has_mbox);
> + return_ptr(cxlds);
This function is so simple, there is no need to use scope-based cleanup.
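The point here is that with no failure path after the allocation, __free(kfree) plus return_ptr() buys nothing. A minimal userspace sketch of the plain version, assuming (in this sketch) that the common state must be the first member of the driver struct, because the generic helper initializes the start of the allocation. dev_state, _dev_state_create(), dev_state_create() and DEV_STATE_CHECK() are hypothetical names, not the CXL API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct cxl_dev_state. */
struct dev_state {
	int type;
	unsigned long long serial;
	int has_mbox;
};

static void dev_state_init(struct dev_state *ds, int type,
			   unsigned long long serial, int has_mbox)
{
	*ds = (struct dev_state) {
		.type = type,
		.serial = serial,
		.has_mbox = has_mbox,
	};
}

/*
 * Allocate the whole driver struct (@size bytes) and initialize the common
 * part at offset 0. Nothing can fail after the allocation, so a plain
 * return is enough; no scope-based cleanup is needed.
 */
static struct dev_state *_dev_state_create(size_t size, int type,
					   unsigned long long serial,
					   int has_mbox)
{
	struct dev_state *ds;

	assert(size >= sizeof(*ds));
	ds = calloc(1, size);
	if (!ds)
		return NULL;
	dev_state_init(ds, type, serial, has_mbox);
	return ds;
}

/* Typed wrapper returning a pointer to the driver's private struct. */
#define dev_state_create(drv_struct, type, serial, has_mbox)		\
	((drv_struct *)_dev_state_create(sizeof(drv_struct), (type),	\
					 (serial), (has_mbox)))

/* Compile-time check that the embedded state is the first member. */
#define DEV_STATE_CHECK(drv_struct, member)				\
	_Static_assert(offsetof(drv_struct, member) == 0,		\
		       #member " must be the first member of " #drv_struct)

struct accel_priv {
	struct dev_state cxlds;	/* common state, embedded first */
	int private_counter;	/* driver-private data follows */
};
DEV_STATE_CHECK(struct accel_priv, cxlds);
```

Because the driver struct is a single allocation, one free() releases both the common state and the private data.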
-- 8< --
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index ab994d459f46..5664514dfb83 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -1488,27 +1488,48 @@ struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev, u64 serial,
u16 dvsec)
{
struct cxl_memdev_state *mds;
- int rc;
mds = cxl_dev_state_create(dev, CXL_DEVTYPE_CLASSMEM, serial, dvsec,
struct cxl_memdev_state, cxlds, true);
- if (!mds) {
- dev_err(dev, "No memory available\n");
+ if (!mds)
return ERR_PTR(-ENOMEM);
- }
mutex_init(&mds->event.log_lock);
- rc = devm_cxl_register_mce_notifier(dev, &mds->mce_notifier);
- if (rc == -EOPNOTSUPP)
- dev_warn(dev, "CXL MCE unsupported\n");
- else if (rc)
- return ERR_PTR(rc);
+ /* TODO: move this registration to cxl_region_probe() */
+ cxl_register_mce_notifier(mds);
return mds;
}
EXPORT_SYMBOL_NS_GPL(cxl_memdev_state_create, "CXL");
+void cxl_memdev_state_destroy(struct cxl_memdev_state *mds)
+{
+ cxl_unregister_mce_notifier(mds);
+ kfree(mds);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_memdev_state_destroy, "CXL");
+
+static void mds_destroy(void *mds)
+{
+ cxl_memdev_state_destroy(mds);
+}
+
+struct cxl_memdev_state *devm_cxl_memdev_state_create(struct device *dev,
+ u64 serial, u16 dvsec)
+{
+ struct cxl_memdev_state *mds = cxl_memdev_state_create(dev, serial, dvsec);
+ int rc;
+
+ if (IS_ERR(mds))
+ return mds;
+ rc = devm_add_action_or_reset(dev, mds_destroy, mds);
+ if (rc)
+ return ERR_PTR(rc);
+ return mds;
+}
+EXPORT_SYMBOL_NS_GPL(devm_cxl_memdev_state_create, "CXL");
+
void __init cxl_mbox_init(void)
{
struct dentry *mbox_debugfs;
diff --git a/drivers/cxl/core/mce.c b/drivers/cxl/core/mce.c
index ff8d078c6ca1..71cc650f54ae 100644
--- a/drivers/cxl/core/mce.c
+++ b/drivers/cxl/core/mce.c
@@ -48,18 +48,16 @@ static int cxl_handle_mce(struct notifier_block *nb, unsigned long val,
return NOTIFY_OK;
}
-static void cxl_unregister_mce_notifier(void *mce_notifier)
+void cxl_unregister_mce_notifier(struct cxl_memdev_state *mds)
{
- mce_unregister_decode_chain(mce_notifier);
+ mce_unregister_decode_chain(&mds->mce_notifier);
}
-int devm_cxl_register_mce_notifier(struct device *dev,
- struct notifier_block *mce_notifier)
+void cxl_register_mce_notifier(struct cxl_memdev_state *mds)
{
+ struct notifier_block *mce_notifier = &mds->mce_notifier;
+
mce_notifier->notifier_call = cxl_handle_mce;
mce_notifier->priority = MCE_PRIO_UC;
- mce_register_decode_chain(mce_notifier);
-
- return devm_add_action_or_reset(dev, cxl_unregister_mce_notifier,
- mce_notifier);
+ mce_register_decode_chain(&mds->mce_notifier);
}
diff --git a/drivers/cxl/core/mce.h b/drivers/cxl/core/mce.h
index ace73424eeb6..b7930761b0d6 100644
--- a/drivers/cxl/core/mce.h
+++ b/drivers/cxl/core/mce.h
@@ -4,16 +4,17 @@
#define _CXL_CORE_MCE_H_
#include <linux/notifier.h>
+#include <asm/mce.h>
#ifdef CONFIG_CXL_MCE
-int devm_cxl_register_mce_notifier(struct device *dev,
- struct notifier_block *mce_notifer);
+void cxl_unregister_mce_notifier(struct cxl_memdev_state *mds);
+void cxl_register_mce_notifier(struct cxl_memdev_state *mds);
#else
-static inline int
-devm_cxl_register_mce_notifier(struct device *dev,
- struct notifier_block *mce_notifier)
+static inline void cxl_unregister_mce_notifier(struct cxl_memdev_state *mds)
+{
+}
+static inline void cxl_register_mce_notifier(struct cxl_memdev_state *mds)
{
- return -EOPNOTSUPP;
}
#endif
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index 6cc732aeb9de..3baf5b4502d0 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -633,7 +633,7 @@ static void detach_memdev(struct work_struct *work)
static struct lock_class_key cxl_memdev_key;
-void cxl_dev_state_init(struct cxl_dev_state *cxlds, struct device *dev,
+static void cxl_dev_state_init(struct cxl_dev_state *cxlds, struct device *dev,
enum cxl_devtype type, u64 serial, u16 dvsec,
bool has_mbox)
{
@@ -655,13 +655,13 @@ struct cxl_dev_state *_cxl_dev_state_create(struct device *dev,
u16 dvsec, size_t size,
bool has_mbox)
{
- struct cxl_dev_state *cxlds __free(kfree) = kzalloc(size, GFP_KERNEL);
+ struct cxl_dev_state *cxlds = kzalloc(size, GFP_KERNEL);
if (!cxlds)
return NULL;
cxl_dev_state_init(cxlds, dev, type, serial, dvsec, has_mbox);
- return_ptr(cxlds);
+ return cxlds;
}
EXPORT_SYMBOL_NS_GPL(_cxl_dev_state_create, "CXL");
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index e7cd31b9f107..897589a6c6ca 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -416,7 +416,9 @@ struct cxl_memdev_state {
struct cxl_poison_state poison;
struct cxl_security_state security;
struct cxl_fw_state fw;
+#ifdef CONFIG_CXL_MCE
struct notifier_block mce_notifier;
+#endif
};
static inline struct cxl_memdev_state *
@@ -755,9 +757,11 @@ int cxl_enumerate_cmds(struct cxl_memdev_state *mds);
int cxl_mem_dpa_fetch(struct cxl_memdev_state *mds, struct cxl_dpa_info *info);
struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev, u64 serial,
u16 dvsec);
-void cxl_dev_state_init(struct cxl_dev_state *cxlds, struct device *dev,
- enum cxl_devtype type, u64 serial, u16 dvsec,
- bool has_mbox);
+void cxl_memdev_state_destroy(struct cxl_memdev_state *mds);
+
+struct cxl_memdev_state *devm_cxl_memdev_state_create(struct device *dev,
+ u64 serial, u16 dvsec);
+
void set_exclusive_cxl_commands(struct cxl_memdev_state *mds,
unsigned long *cmds);
void clear_exclusive_cxl_commands(struct cxl_memdev_state *mds,
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 0d3c67867965..d5447c7d540f 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -933,7 +933,7 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
dev_warn(&pdev->dev,
"Device DVSEC not present, skip CXL.mem init\n");
- mds = cxl_memdev_state_create(&pdev->dev, pci_get_dsn(pdev), dvsec);
+ mds = devm_cxl_memdev_state_create(&pdev->dev, pci_get_dsn(pdev), dvsec);
if (IS_ERR(mds))
return PTR_ERR(mds);
cxlds = &mds->cxlds;
diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index e62cb5049cf5..c40afc743451 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -1717,7 +1717,7 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
if (rc)
return rc;
- mds = cxl_memdev_state_create(dev, pdev->id + 1, 0);
+ mds = devm_cxl_memdev_state_create(dev, pdev->id + 1, 0);
if (IS_ERR(mds))
return PTR_ERR(mds);
* Re: [PATCH v16 01/22] cxl: Add type2 device basic support
2025-05-20 2:43 ` Alison Schofield
@ 2025-05-20 7:18 ` Alejandro Lucero Palau
2025-05-20 20:06 ` Dave Jiang
0 siblings, 1 reply; 84+ messages in thread
From: Alejandro Lucero Palau @ 2025-05-20 7:18 UTC (permalink / raw)
To: Alison Schofield, alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang, Jonathan Cameron
Hi Alison,
On 5/20/25 03:43, Alison Schofield wrote:
> On Wed, May 14, 2025 at 02:27:22PM +0100, alejandro.lucero-palau@amd.com wrote:
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> Differentiate CXL memory expanders (type 3) from CXL device accelerators
>> (type 2) with a new function for initializing cxl_dev_state and a macro
>> for helping accel drivers to embed cxl_dev_state inside a private
>> struct.
>>
>> Move structs to include/cxl as the size of the accel driver private
>> struct embedding cxl_dev_state needs to know the size of this struct.
>>
>> Use same new initialization with the type3 pci driver.
>>
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Reviewed-by: Alison Schofield <alison.schofield@intel.com>
>
> snip
Thank you for all the review tags. Much appreciated.
I'm afraid Dave merged the patchset a few hours ago. Maybe he can still
add your tags at some point, since I think he is using a specific branch
for this merge that will likely be merged into another one in the next few days.
Dave, can you reply here to confirm you have seen this?
* Re: [PATCH v16 02/22] sfc: add cxl support
2025-05-14 13:27 ` [PATCH v16 02/22] sfc: add cxl support alejandro.lucero-palau
@ 2025-05-20 7:37 ` dan.j.williams
2025-05-21 10:50 ` Alejandro Lucero Palau
0 siblings, 1 reply; 84+ messages in thread
From: dan.j.williams @ 2025-05-20 7:37 UTC (permalink / raw)
To: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Jonathan Cameron, Edward Cree
alejandro.lucero-palau@ wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Add CXL initialization based on new CXL API for accel drivers and make
> it dependent on kernel CXL configuration.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Acked-by: Edward Cree <ecree.xilinx@gmail.com>
> ---
> drivers/net/ethernet/sfc/Kconfig | 9 +++++
> drivers/net/ethernet/sfc/Makefile | 1 +
> drivers/net/ethernet/sfc/efx.c | 15 +++++++-
> drivers/net/ethernet/sfc/efx_cxl.c | 55 +++++++++++++++++++++++++++
> drivers/net/ethernet/sfc/efx_cxl.h | 40 +++++++++++++++++++
> drivers/net/ethernet/sfc/net_driver.h | 10 +++++
> 6 files changed, 129 insertions(+), 1 deletion(-)
> create mode 100644 drivers/net/ethernet/sfc/efx_cxl.c
> create mode 100644 drivers/net/ethernet/sfc/efx_cxl.h
>
[..]
> +int efx_cxl_init(struct efx_probe_data *probe_data)
> +{
> + struct efx_nic *efx = &probe_data->efx;
> + struct pci_dev *pci_dev = efx->pci_dev;
> + struct efx_cxl *cxl;
> + u16 dvsec;
> +
> + probe_data->cxl_pio_initialised = false;
> +
> + dvsec = pci_find_dvsec_capability(pci_dev, PCI_VENDOR_ID_CXL,
> + CXL_DVSEC_PCIE_DEVICE);
> + if (!dvsec)
> + return 0;
> +
> + pci_dbg(pci_dev, "CXL_DVSEC_PCIE_DEVICE capability found\n");
> +
> + /* Create a cxl_dev_state embedded in the cxl struct using cxl core api
> + * specifying no mbox available.
> + */
> + cxl = cxl_dev_state_create(&pci_dev->dev, CXL_DEVTYPE_DEVMEM,
> + pci_dev->dev.id, dvsec, struct efx_cxl,
> + cxlds, false);
> +
> + if (!cxl)
> + return -ENOMEM;
> +
> + probe_data->cxl = cxl;
> +
> + return 0;
> +}
> +
> +void efx_cxl_exit(struct efx_probe_data *probe_data)
> +{
So this is empty which means it leaks the cxl_dev_state_create()
allocation, right?
The motivation for the cxl_dev_state_create() macro is so that
you do not need to manage more independently allocated driver objects.
For example, the existing kfree(probe_data) can also free the
cxl_dev_state with a change like below (UNTESTED).
Otherwise, something needs to be responsible for freeing 'struct efx_cxl'.
-- 8< --
diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c
index 112e55b98ed3..0135384c6fa1 100644
--- a/drivers/net/ethernet/sfc/efx.c
+++ b/drivers/net/ethernet/sfc/efx.c
@@ -1149,13 +1149,22 @@ static int efx_pci_probe_post_io(struct efx_nic *efx)
static int efx_pci_probe(struct pci_dev *pci_dev,
const struct pci_device_id *entry)
{
- struct efx_probe_data *probe_data, **probe_ptr;
+ struct efx_probe_data *probe_data = NULL, **probe_ptr;
struct net_device *net_dev;
struct efx_nic *efx;
int rc;
/* Allocate probe data and struct efx_nic */
- probe_data = kzalloc(sizeof(*probe_data), GFP_KERNEL);
+ dvsec = pci_find_dvsec_capability(pci_dev, PCI_VENDOR_ID_CXL,
+ CXL_DVSEC_PCIE_DEVICE);
+ if (dvsec) {
+ cxl = cxl_dev_state_create(&pci_dev->dev, CXL_DEVTYPE_DEVMEM,
+ pci_dev->dev.id, dvsec,
+ struct efx_probe_data, cxl.cxlds, false);
+ if (cxl)
+ probe_data = container_of(cxl, typeof(*probe_data), cxl.cxlds);
+ } else
+ probe_data = kzalloc(sizeof(*probe_data), GFP_KERNEL);
if (!probe_data)
return -ENOMEM;
probe_data->pci_dev = pci_dev;
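A toy model of the ownership question raised above, with hypothetical toy_* and tracked_* names and a live-allocation counter standing in for leak detection. Embedding the accelerator state in the probe data means the existing single free releases both; keeping a separate allocation also works, but then the exit hook must free it explicitly, which is exactly what an empty efx_cxl_exit() misses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

static int live_allocs;	/* allocations not yet freed */

static void *tracked_zalloc(size_t size)
{
	void *p = calloc(1, size);

	if (p)
		live_allocs++;
	return p;
}

static void tracked_free(void *p)
{
	if (p) {
		free(p);
		live_allocs--;
	}
}

struct toy_cxl_state {
	bool present;
};

/* Option 1: CXL state embedded in the probe data, one lifetime. */
struct toy_probe_data {
	struct toy_cxl_state cxl;
	int nic_state;
};

static struct toy_probe_data *toy_probe(bool has_cxl_dvsec)
{
	struct toy_probe_data *pd = tracked_zalloc(sizeof(*pd));

	if (pd && has_cxl_dvsec)
		pd->cxl.present = true;
	return pd;
}

static void toy_remove(struct toy_probe_data *pd)
{
	tracked_free(pd);	/* also releases the embedded CXL state */
}

/* Option 2: separately allocated CXL state, needs an explicit release. */
struct toy_probe_data_sep {
	struct toy_cxl_state *cxl;
	int nic_state;
};

static struct toy_probe_data_sep *toy_probe_sep(bool has_cxl_dvsec)
{
	struct toy_probe_data_sep *pd = tracked_zalloc(sizeof(*pd));

	if (!pd)
		return NULL;
	if (has_cxl_dvsec) {
		pd->cxl = tracked_zalloc(sizeof(*pd->cxl));
		if (!pd->cxl) {
			tracked_free(pd);
			return NULL;
		}
		pd->cxl->present = true;
	}
	return pd;
}

static void toy_cxl_exit(struct toy_probe_data_sep *pd)
{
	tracked_free(pd->cxl);	/* the release an empty exit hook forgets */
	pd->cxl = NULL;
}

static void toy_remove_sep(struct toy_probe_data_sep *pd)
{
	toy_cxl_exit(pd);
	tracked_free(pd);
}
```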
* Re: [PATCH v16 01/22] cxl: Add type2 device basic support
2025-05-20 7:18 ` Alejandro Lucero Palau
@ 2025-05-20 20:06 ` Dave Jiang
2025-05-21 9:30 ` Alejandro Lucero Palau
0 siblings, 1 reply; 84+ messages in thread
From: Dave Jiang @ 2025-05-20 20:06 UTC (permalink / raw)
To: Alejandro Lucero Palau, Alison Schofield, alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, Jonathan Cameron
On 5/20/25 12:18 AM, Alejandro Lucero Palau wrote:
> Hi Alison,
>
> On 5/20/25 03:43, Alison Schofield wrote:
>> On Wed, May 14, 2025 at 02:27:22PM +0100, alejandro.lucero-palau@amd.com wrote:
>>> From: Alejandro Lucero <alucerop@amd.com>
>>>
>>> Differentiate CXL memory expanders (type 3) from CXL device accelerators
>>> (type 2) with a new function for initializing cxl_dev_state and a macro
>>> for helping accel drivers to embed cxl_dev_state inside a private
>>> struct.
>>>
>>> Move structs to include/cxl as the size of the accel driver private
>>> struct embedding cxl_dev_state needs to know the size of this struct.
>>>
>>> Use same new initialization with the type3 pci driver.
>>>
>>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>>> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
>> Reviewed-by: Alison Schofield <alison.schofield@intel.com>
>>
>> snip
>
>
> Thank you for all the review tags. Much appreciated.
>
>
> I'm afraid Dave merged the patchset a few hours ago. Maybe he can still add your tags at some point, since I think he is using a specific branch for this merge that will likely be merged into another one in the next few days.
>
>
> Dave, can you reply here to confirm you have seen this?
Hi Alejandro, sorry about the confusion. The branch I had you test was a test branch for my local testing before merging. The code has not been officially merged, as it was not pushed to the upstream cxl/next branch. It looks like there are some concerns from Alison and Dan that we need to address.
>
>
* Re: [PATCH v16 01/22] cxl: Add type2 device basic support
2025-05-20 20:06 ` Dave Jiang
@ 2025-05-21 9:30 ` Alejandro Lucero Palau
0 siblings, 0 replies; 84+ messages in thread
From: Alejandro Lucero Palau @ 2025-05-21 9:30 UTC (permalink / raw)
To: Dave Jiang, Alison Schofield, alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, Jonathan Cameron
On 5/20/25 21:06, Dave Jiang wrote:
>
> On 5/20/25 12:18 AM, Alejandro Lucero Palau wrote:
>> Hi Alison,
>>
>> On 5/20/25 03:43, Alison Schofield wrote:
>>> On Wed, May 14, 2025 at 02:27:22PM +0100, alejandro.lucero-palau@amd.com wrote:
>>>> From: Alejandro Lucero <alucerop@amd.com>
>>>>
>>>> Differentiate CXL memory expanders (type 3) from CXL device accelerators
>>>> (type 2) with a new function for initializing cxl_dev_state and a macro
>>>> for helping accel drivers to embed cxl_dev_state inside a private
>>>> struct.
>>>>
>>>> Move structs to include/cxl as the size of the accel driver private
>>>> struct embedding cxl_dev_state needs to know the size of this struct.
>>>>
>>>> Use same new initialization with the type3 pci driver.
>>>>
>>>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>>>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>>>> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
>>> Reviewed-by: Alison Schofield <alison.schofield@intel.com>
>>>
>>> snip
>>
>> Thank you for all the review tags. Much appreciated.
>>
>>
>> I'm afraid Dave merged the patchset a few hours ago. Maybe he can still add your tags at some point, since I think he is using a specific branch for this merge that will likely be merged into another one in the next few days.
>>
>>
>> Dave, can you reply here to confirm you have seen this?
> Hi Alejandro, sorry about the confusion. The branch I had you test was a test branch for my local testing before merging. The code has not been officially merged, as it was not pushed to the upstream cxl/next branch. It looks like there are some concerns from Alison and Dan that we need to address.
Hi Dave. No worries.
I'll address Dan's concerns and then send a v17. AFAIK, there are no
concerns from Alison, only review tags.
Thanks
>>
* Re: [PATCH v16 01/22] cxl: Add type2 device basic support
2025-05-20 7:17 ` dan.j.williams
@ 2025-05-21 10:44 ` Alejandro Lucero Palau
0 siblings, 0 replies; 84+ messages in thread
From: Alejandro Lucero Palau @ 2025-05-21 10:44 UTC (permalink / raw)
To: dan.j.williams, alejandro.lucero-palau, linux-cxl, netdev,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Jonathan Cameron
On 5/20/25 08:17, dan.j.williams@intel.com wrote:
> alejandro.lucero-palau@ wrote:
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> Differentiate CXL memory expanders (type 3) from CXL device accelerators
>> (type 2) with a new function for initializing cxl_dev_state and a macro
>> for helping accel drivers to embed cxl_dev_state inside a private
>> struct.
>>
>> Move structs to include/cxl as the size of the accel driver private
>> struct embedding cxl_dev_state needs to know the size of this struct.
>>
>> Use same new initialization with the type3 pci driver.
>>
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
>> ---
>> drivers/cxl/core/mbox.c | 11 +-
>> drivers/cxl/core/memdev.c | 32 +++++
>> drivers/cxl/core/pci.c | 1 +
>> drivers/cxl/core/regs.c | 1 +
>> drivers/cxl/cxl.h | 97 +--------------
>> drivers/cxl/cxlmem.h | 88 +-------------
>> drivers/cxl/cxlpci.h | 21 ----
>> drivers/cxl/pci.c | 17 +--
>> include/cxl/cxl.h | 226 +++++++++++++++++++++++++++++++++++
>> include/cxl/pci.h | 23 ++++
>> tools/testing/cxl/test/mem.c | 3 +-
>> 11 files changed, 305 insertions(+), 215 deletions(-)
>> create mode 100644 include/cxl/cxl.h
>> create mode 100644 include/cxl/pci.h
>>
>> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
>> index d72764056ce6..ab994d459f46 100644
>> --- a/drivers/cxl/core/mbox.c
>> +++ b/drivers/cxl/core/mbox.c
>> @@ -1484,23 +1484,20 @@ int cxl_mailbox_init(struct cxl_mailbox *cxl_mbox, struct device *host)
>> }
>> EXPORT_SYMBOL_NS_GPL(cxl_mailbox_init, "CXL");
>>
>> -struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev)
>> +struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev, u64 serial,
>> + u16 dvsec)
>> {
>> struct cxl_memdev_state *mds;
>> int rc;
>>
>> - mds = devm_kzalloc(dev, sizeof(*mds), GFP_KERNEL);
>> + mds = cxl_dev_state_create(dev, CXL_DEVTYPE_CLASSMEM, serial, dvsec,
>> + struct cxl_memdev_state, cxlds, true);
> Existing cxl_memdev_state_create() callers expect that any state
> allocation is managed by devres.
>
> It is ok to make cxl_memdev_state_create() manually allocate, but then
> you still need to take care of existing caller expectations.
I'm surprised to see this. I think the intent was for
cxl_dev_state_create() to be managed by devres, and somehow it ended up
without it.
I think the best option here is to add it. I do not think it needs any
special action beyond freeing that memory.
>> if (!mds) {
>> dev_err(dev, "No memory available\n");
>> return ERR_PTR(-ENOMEM);
>> }
>>
>> mutex_init(&mds->event.log_lock);
>> - mds->cxlds.dev = dev;
>> - mds->cxlds.reg_map.host = dev;
>> - mds->cxlds.cxl_mbox.host = dev;
>> - mds->cxlds.reg_map.resource = CXL_RESOURCE_NONE;
>> - mds->cxlds.type = CXL_DEVTYPE_CLASSMEM;
>>
>> rc = devm_cxl_register_mce_notifier(dev, &mds->mce_notifier);
> Ugh, but this function is now confused, as some resources are devm and
> some are manually allocated. This is a bit of a mess. Like, why does this need
> to dev_warn() on every boot on MCE-less archs like ARM64?
>
> I was trying to keep the incremental fixup small, but this makes it
> bigger, and likely means we need to clean this up before this patch.
>
> Ugh2, it looks like the current arrangement will cause a NULL pointer
> dereference if an MCE fires between cxl_memdev_state_create() and
> devm_cxl_add_memdev().
>
> Ugh3, it looks like the MCE notifier is registered once per memdev, but triggers
> memory_failure() once per SPA match. That really wants to be registered
> once per region.
>
> That whole situation needs a rethink, but for now make the other
> cleanups a TODO.
I can do those TODOs, but I do not think they are relevant to this patch.
I mean, you have discovered a problem there, but AFAIK this patch does not
introduce it.
>
>> if (rc == -EOPNOTSUPP)
>> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
>> index a16a5886d40a..6cc732aeb9de 100644
>> --- a/drivers/cxl/core/memdev.c
>> +++ b/drivers/cxl/core/memdev.c
>> @@ -633,6 +633,38 @@ static void detach_memdev(struct work_struct *work)
>>
>> static struct lock_class_key cxl_memdev_key;
>>
>> +void cxl_dev_state_init(struct cxl_dev_state *cxlds, struct device *dev,
>> + enum cxl_devtype type, u64 serial, u16 dvsec,
>> + bool has_mbox)
> As far as I can see this can be static as _cxl_dev_state_create() is the
> only caller in this whole series. Fixup included below:
Sure.
>> +{
>> + *cxlds = (struct cxl_dev_state) {
>> + .dev = dev,
>> + .type = type,
>> + .serial = serial,
>> + .cxl_dvsec = dvsec,
>> + .reg_map.host = dev,
>> + .reg_map.resource = CXL_RESOURCE_NONE,
>> + };
>> +
>> + if (has_mbox)
>> + cxlds->cxl_mbox.host = dev;
>> +}
>> +
>> +struct cxl_dev_state *_cxl_dev_state_create(struct device *dev,
>> + enum cxl_devtype type, u64 serial,
>> + u16 dvsec, size_t size,
>> + bool has_mbox)
>> +{
>> + struct cxl_dev_state *cxlds __free(kfree) = kzalloc(size, GFP_KERNEL);
>> +
>> + if (!cxlds)
>> + return NULL;
>> +
>> + cxl_dev_state_init(cxlds, dev, type, serial, dvsec, has_mbox);
>> + return_ptr(cxlds);
> This function is so simple, there is no need to use scope-based cleanup.
>
OK. I'll do so.
Thanks!
* Re: [PATCH v16 02/22] sfc: add cxl support
2025-05-20 7:37 ` dan.j.williams
@ 2025-05-21 10:50 ` Alejandro Lucero Palau
2025-05-21 17:12 ` Dan Williams
0 siblings, 1 reply; 84+ messages in thread
From: Alejandro Lucero Palau @ 2025-05-21 10:50 UTC (permalink / raw)
To: dan.j.williams, alejandro.lucero-palau, linux-cxl, netdev,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Jonathan Cameron, Edward Cree
On 5/20/25 08:37, dan.j.williams@intel.com wrote:
> alejandro.lucero-palau@ wrote:
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> Add CXL initialization based on new CXL API for accel drivers and make
>> it dependent on kernel CXL configuration.
>>
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Acked-by: Edward Cree <ecree.xilinx@gmail.com>
>> ---
>> drivers/net/ethernet/sfc/Kconfig | 9 +++++
>> drivers/net/ethernet/sfc/Makefile | 1 +
>> drivers/net/ethernet/sfc/efx.c | 15 +++++++-
>> drivers/net/ethernet/sfc/efx_cxl.c | 55 +++++++++++++++++++++++++++
>> drivers/net/ethernet/sfc/efx_cxl.h | 40 +++++++++++++++++++
>> drivers/net/ethernet/sfc/net_driver.h | 10 +++++
>> 6 files changed, 129 insertions(+), 1 deletion(-)
>> create mode 100644 drivers/net/ethernet/sfc/efx_cxl.c
>> create mode 100644 drivers/net/ethernet/sfc/efx_cxl.h
>>
> [..]
>> +int efx_cxl_init(struct efx_probe_data *probe_data)
>> +{
>> + struct efx_nic *efx = &probe_data->efx;
>> + struct pci_dev *pci_dev = efx->pci_dev;
>> + struct efx_cxl *cxl;
>> + u16 dvsec;
>> +
>> + probe_data->cxl_pio_initialised = false;
>> +
>> + dvsec = pci_find_dvsec_capability(pci_dev, PCI_VENDOR_ID_CXL,
>> + CXL_DVSEC_PCIE_DEVICE);
>> + if (!dvsec)
>> + return 0;
>> +
>> + pci_dbg(pci_dev, "CXL_DVSEC_PCIE_DEVICE capability found\n");
>> +
>> + /* Create a cxl_dev_state embedded in the cxl struct using cxl core api
>> + * specifying no mbox available.
>> + */
>> + cxl = cxl_dev_state_create(&pci_dev->dev, CXL_DEVTYPE_DEVMEM,
>> + pci_dev->dev.id, dvsec, struct efx_cxl,
>> + cxlds, false);
>> +
>> + if (!cxl)
>> + return -ENOMEM;
>> +
>> + probe_data->cxl = cxl;
>> +
>> + return 0;
>> +}
>> +
>> +void efx_cxl_exit(struct efx_probe_data *probe_data)
>> +{
> So this is empty which means it leaks the cxl_dev_state_create()
> allocation, right?
Yes, because I was wrongly relying on devres ...
Previous patchsets were doing the explicit release here.
Your suggestion below relies on adding more CXL awareness to the
generic efx code, which is what we want to avoid by using the specific efx_cxl.* files.
As I mentioned in patch 1, I think the right thing to do is to add
devres for cxl_dev_state_create().
Before sending v17 with this change, are you ok with the rest of the
patches, or do you want to go through them as well?
Thanks
>
> The motivation for the cxl_dev_state_create() macro is so that
> you do not need to manage more independently allocated driver objects.
> For example, the existing kfree(probe_data) can also free the
> cxl_dev_state with a change like below (UNTESTED).
>
> Otherwise, something needs to be responsible for freeing 'struct efx_cxl'.
>
> -- 8< --
> diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c
> index 112e55b98ed3..0135384c6fa1 100644
> --- a/drivers/net/ethernet/sfc/efx.c
> +++ b/drivers/net/ethernet/sfc/efx.c
> @@ -1149,13 +1149,22 @@ static int efx_pci_probe_post_io(struct efx_nic *efx)
> static int efx_pci_probe(struct pci_dev *pci_dev,
> const struct pci_device_id *entry)
> {
> - struct efx_probe_data *probe_data, **probe_ptr;
> + struct efx_probe_data *probe_data = NULL, **probe_ptr;
> struct net_device *net_dev;
> struct efx_nic *efx;
> int rc;
>
> /* Allocate probe data and struct efx_nic */
> - probe_data = kzalloc(sizeof(*probe_data), GFP_KERNEL);
> + dvsec = pci_find_dvsec_capability(pci_dev, PCI_VENDOR_ID_CXL,
> + CXL_DVSEC_PCIE_DEVICE);
> + if (dvsec) {
> + cxl = cxl_dev_state_create(&pci_dev->dev, CXL_DEVTYPE_DEVMEM,
> + pci_dev->dev.id, dvsec,
> + struct efx_probe_data, cxl.cxlds, false);
> + if (cxl)
> + probe_data = container_of(cxl, typeof(*probe_data), cxl.cxlds);
> + } else
> + probe_data = kzalloc(sizeof(*probe_data), GFP_KERNEL);
> if (!probe_data)
> return -ENOMEM;
> probe_data->pci_dev = pci_dev;
* Re: [PATCH v16 02/22] sfc: add cxl support
2025-05-21 10:50 ` Alejandro Lucero Palau
@ 2025-05-21 17:12 ` Dan Williams
2025-05-22 8:49 ` Alejandro Lucero Palau
0 siblings, 1 reply; 84+ messages in thread
From: Dan Williams @ 2025-05-21 17:12 UTC (permalink / raw)
To: Alejandro Lucero Palau, dan.j.williams, alejandro.lucero-palau,
linux-cxl, netdev, edward.cree, davem, kuba, pabeni, edumazet,
dave.jiang
Cc: Jonathan Cameron, Edward Cree
Alejandro Lucero Palau wrote:
[..]
> >> +void efx_cxl_exit(struct efx_probe_data *probe_data)
> >> +{
> > So this is empty which means it leaks the cxl_dev_state_create()
> > allocation, right?
>
>
> Yes, because I was wrongly relying on devres ...
>
>
> Previous patchsets were doing the explicit release here.
>
>
> Your suggestion below relies on adding more awareness of cxl into
> generic efx code, which is what we want to avoid by using the specific
> efx_cxl.* files.
>
> As I mentioned in patch 1, I think the right thing to do is to add
> devres for cxl_dev_state_create.
...but I thought netdev is anti-devres? I am ok having a
devm_cxl_dev_state_create() alongside a "manual" cxl_dev_state_create()
if that is the case.
> Before sending v17 with this change, are you ok with the rest of the
> patches, or do you want to go through them as well?
So I did start taking a look and then turned away upon finding a
memory leak in the first 2 patches of the series. I will continue going
through it, but in general the lifetime and locking rules of the CXL
subsystem continue to be a source of trouble in new enabling. At a
minimum that indicates a need/opportunity to review the rules at a
future CXL collab meeting.
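Here is some background on the devres option. A devm_ variant registers a release action against the device. The driver core runs all registered actions in reverse (LIFO) order when the driver detaches. devm_add_action_or_reset() runs the action immediately if registration itself fails, so the caller never leaks the object. What follows is a minimal userspace model of those semantics only, not of the kernel implementation. struct devres_list, model_add_action_or_reset(), model_release_all() and record_release() are all hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

typedef void (*release_fn)(void *data);

struct devres_action {
	release_fn fn;
	void *data;
	struct devres_action *next;	/* newest first */
};

/* Hypothetical stand-in for the per-device devres list. */
struct devres_list {
	struct devres_action *head;
};

/*
 * Models devm_add_action_or_reset(): if the action cannot be recorded,
 * run it right away so @data is still released.
 */
static int model_add_action_or_reset(struct devres_list *dl, release_fn fn,
				     void *data)
{
	struct devres_action *a = malloc(sizeof(*a));

	if (!a) {
		fn(data);
		return -ENOMEM;
	}
	a->fn = fn;
	a->data = data;
	a->next = dl->head;
	dl->head = a;
	return 0;
}

/* Models driver detach: release actions run newest-first (LIFO). */
static void model_release_all(struct devres_list *dl)
{
	while (dl->head) {
		struct devres_action *a = dl->head;

		dl->head = a->next;
		a->fn(a->data);
		free(a);
	}
}

/* Demo release action: records the order in which actions ran. */
static int release_order[8];
static int nr_released;

static void record_release(void *data)
{
	assert(nr_released < 8);
	release_order[nr_released++] = *(int *)data;
}
```

Offering both a devm_cxl_dev_state_create() and a manual cxl_dev_state_create() would let drivers that avoid devres keep explicit lifetime control. Devres users would get the LIFO teardown shown here.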
* Re: [PATCH v16 03/22] cxl: Move pci generic code
2025-05-14 13:27 ` [PATCH v16 03/22] cxl: Move pci generic code alejandro.lucero-palau
2025-05-20 2:42 ` Alison Schofield
@ 2025-05-21 17:44 ` Dan Williams
1 sibling, 0 replies; 84+ messages in thread
From: Dan Williams @ 2025-05-21 17:44 UTC (permalink / raw)
To: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Ben Cheatham, Fan Ni, Jonathan Cameron
alejandro.lucero-palau@ wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Inside cxl/core/pci.c there are helpers for CXL PCIe initialization
> meanwhile cxl/pci.c implements the functionality for a Type3 device
> initialization.
>
> Move helper functions from cxl/pci.c to cxl/core/pci.c in order to be
> exported and shared with CXL Type2 device initialization.
>
> Fix cxl mock tests affected by the code move.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
> Reviewed-by: Fan Ni <fan.ni@samsung.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
> drivers/cxl/core/core.h | 2 +
> drivers/cxl/core/pci.c | 62 +++++++++++++++++++++++++++++++
> drivers/cxl/core/regs.c | 1 -
> drivers/cxl/cxl.h | 2 -
> drivers/cxl/cxlpci.h | 2 +
> drivers/cxl/pci.c | 70 -----------------------------------
> include/cxl/pci.h | 13 +++++++
> tools/testing/cxl/Kbuild | 1 -
> tools/testing/cxl/test/mock.c | 17 ---------
> 9 files changed, 79 insertions(+), 91 deletions(-)
Yeah, simple enough:
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
* Re: [PATCH v16 04/22] cxl: Move register/capability check to driver
2025-05-14 13:27 ` [PATCH v16 04/22] cxl: Move register/capability check to driver alejandro.lucero-palau
2025-05-20 2:41 ` Alison Schofield
@ 2025-05-21 18:23 ` Dan Williams
2025-05-22 9:45 ` Alejandro Lucero Palau
1 sibling, 1 reply; 84+ messages in thread
From: Dan Williams @ 2025-05-21 18:23 UTC (permalink / raw)
To: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Jonathan Cameron
alejandro.lucero-palau@ wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Type3 has some mandatory capabilities which are optional for Type2.
>
> In order to support same register/capability discovery code for both
> types, avoid any assumption about what capabilities should be there, and
> export the capabilities found for the caller doing the capabilities
> check based on the expected ones.
>
> Add a function for facilitating the report of capabilities missing the
> expected ones.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
> drivers/cxl/core/pci.c | 41 +++++++++++++++++++++++++++++++++++++++--
> drivers/cxl/core/port.c | 8 ++++----
> drivers/cxl/core/regs.c | 38 ++++++++++++++++++++++----------------
> drivers/cxl/cxl.h | 6 +++---
> drivers/cxl/cxlpci.h | 2 +-
> drivers/cxl/pci.c | 24 +++++++++++++++++++++---
> include/cxl/cxl.h | 24 ++++++++++++++++++++++++
> 7 files changed, 114 insertions(+), 29 deletions(-)
[..]
>
> @@ -434,7 +449,7 @@ static void cxl_unmap_regblock(struct cxl_register_map *map)
> map->base = NULL;
> }
>
> -static int cxl_probe_regs(struct cxl_register_map *map)
> +static int cxl_probe_regs(struct cxl_register_map *map, unsigned long *caps)
> {
> struct cxl_component_reg_map *comp_map;
> struct cxl_device_reg_map *dev_map;
> @@ -444,21 +459,12 @@ static int cxl_probe_regs(struct cxl_register_map *map)
> switch (map->reg_type) {
> case CXL_REGLOC_RBI_COMPONENT:
> comp_map = &map->component_map;
> - cxl_probe_component_regs(host, base, comp_map);
> + cxl_probe_component_regs(host, base, comp_map, caps);
> dev_dbg(host, "Set up component registers\n");
> break;
> case CXL_REGLOC_RBI_MEMDEV:
> dev_map = &map->device_map;
> - cxl_probe_device_regs(host, base, dev_map);
> - if (!dev_map->status.valid || !dev_map->mbox.valid ||
> - !dev_map->memdev.valid) {
> - dev_err(host, "registers not found: %s%s%s\n",
> - !dev_map->status.valid ? "status " : "",
> - !dev_map->mbox.valid ? "mbox " : "",
> - !dev_map->memdev.valid ? "memdev " : "");
> - return -ENXIO;
> - }
> -
> + cxl_probe_device_regs(host, base, dev_map, caps);
I thought we talked about this before [1], i.e. that there is no need
to pass @caps through the stack.
[1]: http://lore.kernel.org/678b06a26cddc_20fa29492@dwillia2-xfh.jf.intel.com.notmuch
Here is the proposal that moves this simple check to the leaf consumer
where it belongs, rather than plumbing @caps everywhere. Note how this
removes burden from the core instead of adding burden to support more
use cases:
diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c
index ecdb22ae6952..5f511cf4bab0 100644
--- a/drivers/cxl/core/regs.c
+++ b/drivers/cxl/core/regs.c
@@ -434,7 +434,7 @@ static void cxl_unmap_regblock(struct cxl_register_map *map)
map->base = NULL;
}
-static int cxl_probe_regs(struct cxl_register_map *map)
+static void cxl_probe_regs(struct cxl_register_map *map)
{
struct cxl_component_reg_map *comp_map;
struct cxl_device_reg_map *dev_map;
@@ -450,22 +450,11 @@ static int cxl_probe_regs(struct cxl_register_map *map)
case CXL_REGLOC_RBI_MEMDEV:
dev_map = &map->device_map;
cxl_probe_device_regs(host, base, dev_map);
- if (!dev_map->status.valid || !dev_map->mbox.valid ||
- !dev_map->memdev.valid) {
- dev_err(host, "registers not found: %s%s%s\n",
- !dev_map->status.valid ? "status " : "",
- !dev_map->mbox.valid ? "mbox " : "",
- !dev_map->memdev.valid ? "memdev " : "");
- return -ENXIO;
- }
-
dev_dbg(host, "Probing device registers...\n");
break;
default:
break;
}
-
- return 0;
}
int cxl_setup_regs(struct cxl_register_map *map)
@@ -476,10 +465,10 @@ int cxl_setup_regs(struct cxl_register_map *map)
if (rc)
return rc;
- rc = cxl_probe_regs(map);
+ cxl_probe_regs(map);
cxl_unmap_regblock(map);
- return rc;
+ return 0;
}
EXPORT_SYMBOL_NS_GPL(cxl_setup_regs, "CXL");
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index d5447c7d540f..cfe4b5fa948a 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -945,6 +945,16 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
if (rc)
return rc;
+ /* Check for mandatory CXL Memory Class Device capabilities */
+ if (!map.device_map.status.valid || !map.device_map.mbox.valid ||
+ !map.device_map.memdev.valid) {
+ dev_err(&pdev->dev, "registers not found: %s%s%s\n",
+ !map.device_map.status.valid ? "status " : "",
+ !map.device_map.mbox.valid ? "mbox " : "",
+ !map.device_map.memdev.valid ? "memdev " : "");
+ return -ENXIO;
+ }
+
rc = cxl_map_device_regs(&map, &cxlds->regs.device_regs);
if (rc)
return rc;
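The effect of this proposal is that the core only records which register blocks it discovered, through the .valid flags. Each consumer then applies its own policy. Below is a minimal userspace sketch of the Type-3 side. It assumes simplified stand-ins for struct cxl_device_reg_map, and type3_check_device_regs() is a hypothetical name. It builds the same "registers not found" list that the dev_err() in the diff prints. A Type-2 driver would test a different set of flags and never pay for the Type-3 rules.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Simplified stand-ins for the kernel's register map bookkeeping. */
struct reg_map {
	bool valid;
};

struct device_reg_map {
	struct reg_map status;
	struct reg_map mbox;
	struct reg_map memdev;
};

/*
 * Type-3 policy, evaluated in the leaf driver: all three device register
 * blocks are mandatory. On failure @msg lists the missing blocks.
 */
static int type3_check_device_regs(const struct device_reg_map *m,
				   char *msg, size_t len)
{
	if (m->status.valid && m->mbox.valid && m->memdev.valid)
		return 0;

	snprintf(msg, len, "registers not found: %s%s%s",
		 !m->status.valid ? "status " : "",
		 !m->mbox.valid ? "mbox " : "",
		 !m->memdev.valid ? "memdev " : "");
	return -ENXIO;
}
```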
* Re: [PATCH v16 05/22] cxl: Add function for type2 cxl regs setup
2025-05-14 13:27 ` [PATCH v16 05/22] cxl: Add function for type2 cxl regs setup alejandro.lucero-palau
2025-05-20 2:41 ` Alison Schofield
@ 2025-05-21 18:28 ` Dan Williams
2025-05-22 9:52 ` Alejandro Lucero Palau
1 sibling, 1 reply; 84+ messages in thread
From: Dan Williams @ 2025-05-21 18:28 UTC (permalink / raw)
To: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Ben Cheatham, Jonathan Cameron
alejandro.lucero-palau@ wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Create a new function for a type2 device initialising
> cxl_dev_state struct regarding cxl regs setup and mapping.
>
> Export the capabilities found for checking them against the
> expected ones by the driver.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
> drivers/cxl/core/pci.c | 62 ++++++++++++++++++++++++++++++++++++++++++
> include/cxl/cxl.h | 3 ++
> 2 files changed, 65 insertions(+)
>
> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index e2b6420592de..b05c6e64bfe2 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c
> @@ -1095,6 +1095,68 @@ int cxl_pci_setup_regs(struct pci_dev *pdev, enum cxl_regloc_type type,
> }
> EXPORT_SYMBOL_NS_GPL(cxl_pci_setup_regs, "CXL");
>
> +static int cxl_pci_accel_setup_memdev_regs(struct pci_dev *pdev,
> + struct cxl_dev_state *cxlds,
> + unsigned long *caps)
> +{
> + struct cxl_register_map map;
> + int rc;
> +
> + rc = cxl_pci_setup_regs(pdev, CXL_REGLOC_RBI_MEMDEV, &map, caps);
> + /*
> + * This call can return -ENODEV if regs not found. This is not an error
> + * for Type2 since these regs are not mandatory. If they do exist then
> + * mapping them should not fail. If they should exist, it is with driver
> + * calling cxl_pci_check_caps() where the problem should be found.
> + */
The driver should know in advance if calling:
cxl_pci_setup_regs(pdev, CXL_REGLOC_RBI_MEMDEV, &map);
...will fail. Put that logic where it belongs, in the probe function of
the type-2 driver directly. This helper is not helping; it is just
obfuscating.
* Re: [PATCH v16 06/22] sfc: make regs setup with checking and set media ready
2025-05-14 13:27 ` [PATCH v16 06/22] sfc: make regs setup with checking and set media ready alejandro.lucero-palau
@ 2025-05-21 18:34 ` Dan Williams
2025-05-22 10:07 ` Alejandro Lucero Palau
0 siblings, 1 reply; 84+ messages in thread
From: Dan Williams @ 2025-05-21 18:34 UTC (permalink / raw)
To: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Martin Habets, Zhi Wang, Edward Cree,
Jonathan Cameron, Ben Cheatham
alejandro.lucero-palau@ wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Use cxl code for registers discovery and mapping.
>
> Validate capabilities found based on those registers against expected
> capabilities.
>
> Set media ready explicitly as there is no means for doing so without
> a mailbox and without the related cxl register, not mandatory for type2.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
> Reviewed-by: Zhi Wang <zhi@nvidia.com>
> Acked-by: Edward Cree <ecree.xilinx@gmail.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
> ---
> drivers/net/ethernet/sfc/efx_cxl.c | 26 ++++++++++++++++++++++++++
> 1 file changed, 26 insertions(+)
>
> diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
> index 753d5b7d49b6..e94af8bf3a79 100644
> --- a/drivers/net/ethernet/sfc/efx_cxl.c
> +++ b/drivers/net/ethernet/sfc/efx_cxl.c
> @@ -19,10 +19,13 @@
>
> int efx_cxl_init(struct efx_probe_data *probe_data)
> {
> + DECLARE_BITMAP(expected, CXL_MAX_CAPS) = {};
> + DECLARE_BITMAP(found, CXL_MAX_CAPS) = {};
> struct efx_nic *efx = &probe_data->efx;
> struct pci_dev *pci_dev = efx->pci_dev;
> struct efx_cxl *cxl;
> u16 dvsec;
> + int rc;
>
> probe_data->cxl_pio_initialised = false;
>
> @@ -43,6 +46,29 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
> if (!cxl)
> return -ENOMEM;
>
> + set_bit(CXL_DEV_CAP_HDM, expected);
> + set_bit(CXL_DEV_CAP_RAS, expected);
> +
> + rc = cxl_pci_accel_setup_regs(pci_dev, &cxl->cxlds, found);
> + if (rc) {
> + pci_err(pci_dev, "CXL accel setup regs failed");
> + return rc;
> + }
> +
> + /*
> + * Checking mandatory caps are there as, at least, a subset of those
> + * found.
> + */
> + if (cxl_check_caps(pci_dev, expected, found))
> + return -ENXIO;
This all looks like an obfuscated way of writing:
rc = cxl_pci_setup_regs(pci_dev, CXL_REGLOC_RBI_COMPONENT, &map);
if (rc)
	return rc;
if (!map.component_map.ras.valid || !map.component_map.hdm_decoder.valid)
	return -ENXIO; /* sfc cxl expectations not met */
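For a small, fixed set of capabilities, the bitmap comparison in the patch (cxl_check_caps() over the expected/found bitmaps) reduces to a subset test. "expected" is a subset of "found" exactly when expected & ~found is zero. With only two required capabilities, that is the same as testing the two valid flags directly. Below is a minimal userspace sketch. The CAP_* bit positions and both helper names are hypothetical and do not correspond to the kernel's CXL_DEV_CAP_* values.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit positions, for illustration only. */
enum { CAP_HDM = 0, CAP_RAS = 1, CAP_MAILBOX = 2 };

#define BIT(n) (1UL << (n))

/* Single-word equivalent of bitmap_subset(expected, found, nbits). */
static bool caps_subset(unsigned long expected, unsigned long found)
{
	return (expected & ~found) == 0;
}

/* The direct form: test exactly the two things sfc needs. */
static bool sfc_caps_ok(bool hdm_valid, bool ras_valid)
{
	return hdm_valid && ras_valid;
}
```

The two forms agree for every combination of the HDM and RAS flags. The direct form just says what the driver needs without the bitmap plumbing.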
* Re: [PATCH v16 07/22] cxl: Support dpa initialization without a mailbox
2025-05-14 13:27 ` [PATCH v16 07/22] cxl: Support dpa initialization without a mailbox alejandro.lucero-palau
2025-05-20 2:40 ` Alison Schofield
@ 2025-05-21 18:47 ` Dan Williams
2025-05-22 10:24 ` Alejandro Lucero Palau
1 sibling, 1 reply; 84+ messages in thread
From: Dan Williams @ 2025-05-21 18:47 UTC (permalink / raw)
To: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Ben Cheatham, Jonathan Cameron
alejandro.lucero-palau@ wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Type3 relies on mailbox CXL_MBOX_OP_IDENTIFY command for initializing
> memdev state params which end up being used for DPA initialization.
>
> Allow a Type2 driver to initialize DPA simply by giving the size of its
> volatile and/or non-volatile hardware partitions.
>
> Export cxl_dpa_setup as well for initializing those added DPA partitions
> with the proper resources.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
> drivers/cxl/core/mbox.c | 26 ++++++++++++++++++++------
> drivers/cxl/cxlmem.h | 13 -------------
> include/cxl/cxl.h | 14 ++++++++++++++
> 3 files changed, 34 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index ab994d459f46..b14cfc6e3dba 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -1284,6 +1284,22 @@ static void add_part(struct cxl_dpa_info *info, u64 start, u64 size, enum cxl_pa
> info->nr_partitions++;
> }
>
> +/**
> + * cxl_mem_dpa_init: initialize dpa by a driver without a mailbox.
> + *
> + * @info: pointer to cxl_dpa_info
> + * @volatile_bytes: device volatile memory size
> + * @persistent_bytes: device persistent memory size
> + */
> +void cxl_mem_dpa_init(struct cxl_dpa_info *info, u64 volatile_bytes,
> + u64 persistent_bytes)
I struggle to imagine a Type-2 device with PMEM, or needing anything
more complicated than a single volatile range. No need to pre-enable
something that may never exist.
Let's just have a cxl_set_capacity() for the simple / common case:
int cxl_set_capacity(struct cxl_dev_state *cxlds, u64 capacity)
{
	struct cxl_dpa_info range_info = { .size = capacity };

	add_part(&range_info, 0, capacity, CXL_PARTMODE_RAM);
	return cxl_dpa_setup(cxlds, &range_info);
}
...then there is no need to move 'struct cxl_dpa_info' to a public
header, or require type-2 drivers to pass in a pointless PMEM capacity.
If more complicated devices show up later the code can always be made
more sophisticated at that point.
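To make the proposed shape concrete, here is a userspace model of what a cxl_set_capacity()-style helper would hand to cxl_dpa_setup(): a single RAM partition covering [0, capacity - 1]. The structures are simplified stand-ins. The kernel's struct cxl_dpa_info carries a size plus an array of partitions, each with an inclusive range and a mode. model_add_part() is loosely modeled on the add_part() helper quoted from drivers/cxl/core/mbox.c. Its early return for a zero size, the partition bound, and all model_* names are assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define NR_PARTITIONS_MAX 2	/* hypothetical bound for the model */

enum partmode { PARTMODE_RAM, PARTMODE_PMEM };

/* Inclusive range, like the kernel's struct range. */
struct range {
	uint64_t start, end;
};

struct dpa_part_info {
	struct range range;
	enum partmode mode;
};

struct dpa_info {
	uint64_t size;
	struct dpa_part_info part[NR_PARTITIONS_MAX];
	int nr_partitions;
};

/* Loosely modeled on add_part(); the size == 0 check is an assumption. */
static void model_add_part(struct dpa_info *info, uint64_t start,
			   uint64_t size, enum partmode mode)
{
	int i = info->nr_partitions;

	if (size == 0)
		return;
	assert(i < NR_PARTITIONS_MAX);
	info->part[i].range = (struct range){ start, start + size - 1 };
	info->part[i].mode = mode;
	info->nr_partitions++;
}

/* What a cxl_set_capacity()-style helper would build: one RAM partition. */
static struct dpa_info model_set_capacity(uint64_t capacity)
{
	struct dpa_info info = { .size = capacity };

	model_add_part(&info, 0, capacity, PARTMODE_RAM);
	return info;
}
```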
* Re: [PATCH v16 09/22] cxl: Prepare memdev creation for type2
2025-05-14 13:27 ` [PATCH v16 09/22] cxl: Prepare memdev creation for type2 alejandro.lucero-palau
2025-05-20 2:40 ` Alison Schofield
@ 2025-05-21 18:49 ` Dan Williams
1 sibling, 0 replies; 84+ messages in thread
From: Dan Williams @ 2025-05-21 18:49 UTC (permalink / raw)
To: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Ben Cheatham, Jonathan Cameron
alejandro.lucero-palau@ wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Current cxl core is relying on a CXL_DEVTYPE_CLASSMEM type device when
> creating a memdev leading to problems when obtaining cxl_memdev_state
> references from a CXL_DEVTYPE_DEVMEM type.
>
> Modify check for obtaining cxl_memdev_state adding CXL_DEVTYPE_DEVMEM
> support.
>
> Make devm_cxl_add_memdev accessible from an accel driver.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> ---
> drivers/cxl/core/memdev.c | 15 +++++++++++++--
> drivers/cxl/cxlmem.h | 2 --
> drivers/cxl/mem.c | 25 +++++++++++++++++++------
> include/cxl/cxl.h | 2 ++
> 4 files changed, 34 insertions(+), 10 deletions(-)
Makes sense
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
* Re: [PATCH v16 11/22] cxl: Define a driver interface for HPA free space enumeration
2025-05-14 13:27 ` [PATCH v16 11/22] cxl: Define a driver interface for HPA free space enumeration alejandro.lucero-palau
2025-05-20 2:36 ` Alison Schofield
@ 2025-05-21 19:31 ` Dan Williams
2025-05-22 10:56 ` Alejandro Lucero Palau
1 sibling, 1 reply; 84+ messages in thread
From: Dan Williams @ 2025-05-21 19:31 UTC (permalink / raw)
To: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Jonathan Cameron
alejandro.lucero-palau@ wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> CXL region creation involves allocating capacity from device DPA
> (device-physical-address space) and assigning it to decode a given HPA
> (host-physical-address space). Before determining how much DPA to
> allocate the amount of available HPA must be determined. Also, not all
> HPA is created equal, some specifically targets RAM, some target PMEM,
> some is prepared for device-memory flows like HDM-D and HDM-DB, and some
> is host-only (HDM-H).
>
> In order to support Type2 CXL devices, wrap all of those concerns into
> an API that retrieves a root decoder (platform CXL window) that fits the
> specified constraints and the capacity available for a new region.
>
> Add a complementary function for releasing the reference to such root
> decoder.
>
> Based on https://lore.kernel.org/linux-cxl/168592159290.1948938.13522227102445462976.stgit@dwillia2-xfh.jf.intel.com/
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
> drivers/cxl/core/region.c | 166 ++++++++++++++++++++++++++++++++++++++
> drivers/cxl/cxl.h | 3 +
> include/cxl/cxl.h | 11 +++
> 3 files changed, 180 insertions(+)
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index c3f4dc244df7..4affa1f22fd1 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -695,6 +695,172 @@ static int free_hpa(struct cxl_region *cxlr)
> return 0;
> }
>
> +struct cxlrd_max_context {
> + struct device * const *host_bridges;
> + int interleave_ways;
> + unsigned long flags;
> + resource_size_t max_hpa;
> + struct cxl_root_decoder *cxlrd;
> +};
> +
> +static int find_max_hpa(struct device *dev, void *data)
> +{
> + struct cxlrd_max_context *ctx = data;
> + struct cxl_switch_decoder *cxlsd;
> + struct cxl_root_decoder *cxlrd;
> + struct resource *res, *prev;
> + struct cxl_decoder *cxld;
> + resource_size_t max;
> + int found = 0;
> +
> + if (!is_root_decoder(dev))
> + return 0;
> +
> + cxlrd = to_cxl_root_decoder(dev);
> + cxlsd = &cxlrd->cxlsd;
> + cxld = &cxlsd->cxld;
> +
> + /*
> + * Flags are single unsigned longs. As CXL_DECODER_F_MAX is less than
> + * 32 bits, the bitmap functions can be used.
> + */
> + if (!bitmap_subset(&ctx->flags, &cxld->flags, CXL_DECODER_F_MAX)) {
> + dev_dbg(dev, "flags not matching: %08lx vs %08lx\n",
> + cxld->flags, ctx->flags);
> + return 0;
> + }
> +
> + for (int i = 0; i < ctx->interleave_ways; i++) {
> + for (int j = 0; j < ctx->interleave_ways; j++) {
> + if (ctx->host_bridges[i] == cxlsd->target[j]->dport_dev) {
> + found++;
> + break;
> + }
> + }
> + }
> +
> + if (found != ctx->interleave_ways) {
> + dev_dbg(dev,
> + "Not enough host bridges. Found %d for %d interleave ways requested\n",
> + found, ctx->interleave_ways);
> + return 0;
> + }
> +
> + /*
> + * Walk the root decoder resource range relying on cxl_region_rwsem to
> + * preclude sibling arrival/departure and find the largest free space
> + * gap.
> + */
> + lockdep_assert_held_read(&cxl_region_rwsem);
> + res = cxlrd->res->child;
> +
> + /* With no resource child the whole parent resource is available */
> + if (!res)
> + max = resource_size(cxlrd->res);
> + else
> + max = 0;
> +
> + for (prev = NULL; res; prev = res, res = res->sibling) {
> + struct resource *next = res->sibling;
> + resource_size_t free = 0;
> +
> + /*
> + * Sanity check for preventing arithmetic problems below as a
> + * resource with size 0 could imply using the end field below
> + * when set to unsigned zero - 1 or all f in hex.
> + */
> + if (prev && !resource_size(prev))
> + continue;
> +
> + if (!prev && res->start > cxlrd->res->start) {
> + free = res->start - cxlrd->res->start;
> + max = max(free, max);
> + }
> + if (prev && res->start > prev->end + 1) {
> + free = res->start - prev->end + 1;
> + max = max(free, max);
> + }
> + if (next && res->end + 1 < next->start) {
> + free = next->start - res->end + 1;
> + max = max(free, max);
> + }
> + if (!next && res->end + 1 < cxlrd->res->end + 1) {
> + free = cxlrd->res->end + 1 - res->end + 1;
> + max = max(free, max);
> + }
> + }
> +
> + dev_dbg(CXLRD_DEV(cxlrd), "found %pa bytes of free space\n", &max);
> + if (max > ctx->max_hpa) {
> + if (ctx->cxlrd)
> + put_device(CXLRD_DEV(ctx->cxlrd));
> + get_device(CXLRD_DEV(cxlrd));
> + ctx->cxlrd = cxlrd;
> + ctx->max_hpa = max;
> + }
> + return 0;
> +}
> +
> +/**
> + * cxl_get_hpa_freespace - find a root decoder with free capacity per constraints
> + * @cxlmd: the CXL memory device with an endpoint that is mapped by the returned
> + * decoder
> + * @interleave_ways: number of entries in @host_bridges
> + * @flags: CXL_DECODER_F flags for selecting RAM vs PMEM, and Type2 device
> + * @max_avail_contig: output parameter of max contiguous bytes available in the
> + * returned decoder
> + *
> + * Returns a pointer to a struct cxl_root_decoder
> + *
> + * The return tuple of a 'struct cxl_root_decoder' and 'bytes available given
> + * in (@max_avail_contig))' is a point in time snapshot. If by the time the
> + * caller goes to use this root decoder's capacity the capacity is reduced then
> + * caller needs to loop and retry.
> + *
> + * The returned root decoder has an elevated reference count that needs to be
> + * put with cxl_put_root_decoder(cxlrd).
> + */
> +struct cxl_root_decoder *cxl_get_hpa_freespace(struct cxl_memdev *cxlmd,
> + int interleave_ways,
> + unsigned long flags,
> + resource_size_t *max_avail_contig)
> +{
> + struct cxl_port *endpoint = cxlmd->endpoint;
> + struct cxlrd_max_context ctx = {
> + .host_bridges = &endpoint->host_bridge,
> + .flags = flags,
> + };
> + struct cxl_port *root_port;
> + struct cxl_root *root __free(put_cxl_root) = find_cxl_root(endpoint);
> +
> + if (!is_cxl_endpoint(endpoint)) {
> + dev_dbg(&endpoint->dev, "hpa requestor is not an endpoint\n");
> + return ERR_PTR(-EINVAL);
> + }
This seems confused because the @cxlmd argument is always an endpoint.
The dynamic state is whether that endpoint is currently connected to the
CXL HDM decode hierarchy, or not.
That state changes relative to whether @cxlmd is bound to the cxl_mem
driver. So the above check is also racy.
I think this wants to be:
guard(device)(&cxlmd->dev);
if (!cxlmd->endpoint)
	return ERR_PTR(-ENXIO);
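cxl_get_hpa_freespace() returns a pointer, so the -ENXIO in this suggestion has to travel as ERR_PTR(-ENXIO), the same way the patch returns ERR_PTR(-EINVAL). Callers such as the sfc IS_ERR() check can then detect it. The kernel reserves the top MAX_ERRNO (4095) pointer values for encoded errors. Below is a minimal userspace model of include/linux/err.h. struct root_decoder and model_get_freespace() are hypothetical stand-ins.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace model of include/linux/err.h. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Pointers in the last MAX_ERRNO values of the address space are errors. */
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct root_decoder {
	int id;
};	/* hypothetical stand-in */

/* Hypothetical lookup: fail when the memdev has no endpoint attached. */
static struct root_decoder *model_get_freespace(bool endpoint_attached,
						 struct root_decoder *rd)
{
	if (!endpoint_attached)
		return ERR_PTR(-ENXIO);
	return rd;
}
```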
> + if (!root) {
> + dev_dbg(&endpoint->dev, "endpoint can not be related to a root port\n");
> + return ERR_PTR(-ENXIO);
> + }
> +
> + root_port = &root->port;
> + scoped_guard(rwsem_read, &cxl_region_rwsem)
> + device_for_each_child(&root_port->dev, &ctx, find_max_hpa);
> +
> + if (!ctx.cxlrd)
> + return ERR_PTR(-ENOMEM);
> +
> + *max_avail_contig = ctx.max_hpa;
> + return ctx.cxlrd;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_get_hpa_freespace, "CXL");
> +
> +void cxl_put_root_decoder(struct cxl_root_decoder *cxlrd)
> +{
> + put_device(CXLRD_DEV(cxlrd));
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_put_root_decoder, "CXL");
I think this cxl_put_root_decoder() requirement is manageable for the
initial merge, but it is not something to commit to long term.
The device's HPA freespace and CXL HDM should be freed at cxl_mem detach
time, but that will require more infrastructure.
The reference does not stop the root decoder from being unregistered and
it is clearly broken to allow it to be unregistered while drivers have
pending allocations.
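The free-space walk in find_max_hpa() is easiest to check against a small model. struct resource ranges are inclusive. The gap between a child ending at "end" and the next one starting at "start" is therefore start - (end + 1), and the tail gap up to the parent's end is parent_end - child_end. An expression of the form start - end + 1 overcounts each gap by two bytes.

Below is a minimal userspace sketch using a cursor over the first uncovered byte. It is equivalent to checking the gap before the first child, between siblings, and after the last child. It assumes the children are sorted by start, non-empty and non-overlapping, as siblings in the resource tree are. struct res and largest_gap() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive [start, end] range, like the kernel's struct resource. */
struct res {
	uint64_t start;
	uint64_t end;
};

static uint64_t max_u64(uint64_t a, uint64_t b)
{
	return a > b ? a : b;
}

/*
 * Largest free gap inside @parent given @n sorted, non-overlapping,
 * non-empty children.
 */
static uint64_t largest_gap(const struct res *parent,
			    const struct res *child, size_t n)
{
	uint64_t max = 0;
	uint64_t next_free = parent->start;	/* first byte not yet covered */

	if (n == 0)
		return parent->end - parent->start + 1;

	for (size_t i = 0; i < n; i++) {
		assert(child[i].start >= next_free);
		/* Gap before this child: [next_free, child.start - 1]. */
		max = max_u64(max, child[i].start - next_free);
		next_free = child[i].end + 1;
	}
	/* Tail gap [next_free, parent->end]; empty if a child ends there. */
	if (next_free <= parent->end)
		max = max_u64(max, parent->end - next_free + 1);
	return max;
}
```

For a 4 KiB window [0x1000, 0x1fff] with children [0x1100, 0x11ff] and [0x1400, 0x17ff], the gaps are 0x100, 0x200 and 0x800 bytes, so the result is 0x800.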
* Re: [PATCH v16 12/22] sfc: obtain root decoder with enough HPA free space
2025-05-14 13:27 ` [PATCH v16 12/22] sfc: obtain root decoder with enough HPA free space alejandro.lucero-palau
@ 2025-05-21 19:56 ` Dan Williams
2025-06-06 12:59 ` Alejandro Lucero Palau
0 siblings, 1 reply; 84+ messages in thread
From: Dan Williams @ 2025-05-21 19:56 UTC (permalink / raw)
To: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Martin Habets, Edward Cree, Jonathan Cameron
alejandro.lucero-palau@ wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Asking for available HPA space is the previous step to try to obtain
> an HPA range suitable to accel driver purposes.
>
> Add this call to efx cxl initialization.
>
> Make sfc cxl build dependent on CXL region.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
> Acked-by: Edward Cree <ecree.xilinx@gmail.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
> drivers/net/ethernet/sfc/Kconfig | 1 +
> drivers/net/ethernet/sfc/efx_cxl.c | 19 +++++++++++++++++++
> 2 files changed, 20 insertions(+)
>
> diff --git a/drivers/net/ethernet/sfc/Kconfig b/drivers/net/ethernet/sfc/Kconfig
> index 979f2801e2a8..e959d9b4f4ce 100644
> --- a/drivers/net/ethernet/sfc/Kconfig
> +++ b/drivers/net/ethernet/sfc/Kconfig
> @@ -69,6 +69,7 @@ config SFC_MCDI_LOGGING
> config SFC_CXL
> bool "Solarflare SFC9100-family CXL support"
> depends on SFC && CXL_BUS >= SFC
> + depends on CXL_REGION
> default SFC
> help
> This enables SFC CXL support if the kernel is configuring CXL for
> diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
> index 53ff97ad07f5..5635672b3fc3 100644
> --- a/drivers/net/ethernet/sfc/efx_cxl.c
> +++ b/drivers/net/ethernet/sfc/efx_cxl.c
> @@ -26,6 +26,7 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
> struct cxl_dpa_info sfc_dpa_info = {
> .size = EFX_CTPIO_BUFFER_SIZE
> };
> + resource_size_t max_size;
> struct efx_cxl *cxl;
> u16 dvsec;
> int rc;
> @@ -84,6 +85,22 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
> return PTR_ERR(cxl->cxlmd);
> }
>
> + cxl->cxlrd = cxl_get_hpa_freespace(cxl->cxlmd, 1,
> + CXL_DECODER_F_RAM | CXL_DECODER_F_TYPE2,
> + &max_size);
> +
> + if (IS_ERR(cxl->cxlrd)) {
> + pci_err(pci_dev, "cxl_get_hpa_freespace failed\n");
> + return PTR_ERR(cxl->cxlrd);
> + }
This is a simple enough model, but it does mean that if async-driver
loading causes this driver to load before cxl_acpi or cxl_mem have
completed their init work, then it will die here.
It is also worth noting that nothing stops cxl_mem or cxl_acpi from
detaching immediately after passing the above check. So more work is
needed here (likely post-merge) to revoke and invalidate usage of that
freespace when that happens.
Otherwise you can end up with something like:
Driver1                    Driver2                    Notes
cxl_get_hpa_freespace()                               "Driver1 gets rangeX"
--- cxl_acpi unloaded ---                             "forgets rangeX was assigned"
--- cxl_acpi reloaded ---
                           cxl_get_hpa_freespace()    "Driver2 gets rangeX"
use_cxl(rangeX)            use_cxl(rangeX)            "...uh oh"
So longer term there needs to be notification back to the creator of the
memdev to require it to handle cleaning up when the CXL topology is torn
down either physically or logically.
To date the CXL subsystem has not reset decoders on unload because it
needs to handle coordinating with HDM decode established by platform
firmware. Type-2 drivers, however, should be prepared to have their CXL
range revoked at any moment.
The Type-3 case handles this because cxl_mem is the driver itself; for
Type-2, that driver wants to coordinate with cxl_mem on these events. To
me that looks like cxl_mem error handler operation callbacks.
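Here is one way to picture the notification described here, as a purely hypothetical model; none of these names exist in the CXL core. The allocator keeps a list of outstanding grants, each carrying a revoke callback supplied by the Type-2 driver. Tearing down the topology invokes every callback before the bookkeeping is forgotten. In the Driver1/Driver2 sequence, Driver1 would then stop using rangeX before Driver2 can be handed the same range. The model tracks only grants, not address ranges.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_GRANTS 4

/* Hypothetical per-driver state. */
struct accel_driver {
	bool range_valid;	/* may the driver use its HPA range? */
};

typedef void (*revoke_fn)(struct accel_driver *drv);

struct grant {
	struct accel_driver *owner;
	revoke_fn revoke;
};

/* Hypothetical stand-in for the root decoder's allocation bookkeeping. */
struct hpa_allocator {
	struct grant grants[MAX_GRANTS];
	size_t nr;
};

/* Driver-supplied callback: stop issuing accesses to the range. */
static void accel_revoke(struct accel_driver *drv)
{
	drv->range_valid = false;
}

static bool hpa_grant(struct hpa_allocator *a, struct accel_driver *drv,
		      revoke_fn revoke)
{
	if (a->nr == MAX_GRANTS)
		return false;
	a->grants[a->nr++] = (struct grant){ drv, revoke };
	drv->range_valid = true;
	return true;
}

/* Topology teardown (e.g. cxl_acpi unbind): notify holders, then forget. */
static void hpa_teardown(struct hpa_allocator *a)
{
	for (size_t i = 0; i < a->nr; i++)
		a->grants[i].revoke(a->grants[i].owner);
	a->nr = 0;
}
```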
* Re: [PATCH v16 13/22] cxl: Define a driver interface for DPA allocation
2025-05-14 13:27 ` [PATCH v16 13/22] cxl: Define a driver interface for DPA allocation alejandro.lucero-palau
2025-05-20 2:39 ` Alison Schofield
@ 2025-05-21 20:23 ` Dan Williams
2025-06-06 13:09 ` Alejandro Lucero Palau
1 sibling, 1 reply; 84+ messages in thread
From: Dan Williams @ 2025-05-21 20:23 UTC (permalink / raw)
To: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Ben Cheatham, Jonathan Cameron
alejandro.lucero-palau@ wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Region creation involves finding available DPA (device-physical-address)
> capacity to map into HPA (host-physical-address) space.
>
> In order to support CXL Type2 devices, define an API, cxl_request_dpa(),
> that tries to allocate the DPA memory the driver requires to operate. The
> memory requested should not be bigger than the max available HPA obtained
> previously with cxl_get_hpa_freespace.
>
> Based on https://lore.kernel.org/linux-cxl/168592158743.1948938.7622563891193802610.stgit@dwillia2-xfh.jf.intel.com/
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
> drivers/cxl/core/hdm.c | 86 ++++++++++++++++++++++++++++++++++++++++++
> include/cxl/cxl.h | 5 +++
> 2 files changed, 91 insertions(+)
>
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index 70cae4ebf8a4..500df2deceef 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -3,6 +3,7 @@
> #include <linux/seq_file.h>
> #include <linux/device.h>
> #include <linux/delay.h>
> +#include <cxl/cxl.h>
>
> #include "cxlmem.h"
> #include "core.h"
> @@ -546,6 +547,13 @@ resource_size_t cxl_dpa_resource_start(struct cxl_endpoint_decoder *cxled)
> return base;
> }
>
> +/**
> + * cxl_dpa_free - release DPA (Device Physical Address)
> + *
> + * @cxled: endpoint decoder linked to the DPA
> + *
> + * Returns 0 or error.
> + */
> int cxl_dpa_free(struct cxl_endpoint_decoder *cxled)
> {
> struct cxl_port *port = cxled_to_port(cxled);
> @@ -572,6 +580,7 @@ int cxl_dpa_free(struct cxl_endpoint_decoder *cxled)
> devm_cxl_dpa_release(cxled);
> return 0;
> }
> +EXPORT_SYMBOL_NS_GPL(cxl_dpa_free, "CXL");
>
> int cxl_dpa_set_part(struct cxl_endpoint_decoder *cxled,
> enum cxl_partition_mode mode)
> @@ -686,6 +695,83 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size)
> return devm_add_action_or_reset(&port->dev, cxl_dpa_release, cxled);
> }
>
> +static int find_free_decoder(struct device *dev, const void *data)
> +{
> + struct cxl_endpoint_decoder *cxled;
> + struct cxl_port *port;
> +
> + if (!is_endpoint_decoder(dev))
> + return 0;
> +
> + cxled = to_cxl_endpoint_decoder(dev);
> + port = cxled_to_port(cxled);
> +
> + if (cxled->cxld.id != port->hdm_end + 1)
> + return 0;
> +
> + return 1;
> +}
> +
> +/**
> + * cxl_request_dpa - search and reserve DPA given input constraints
> + * @cxlmd: memdev with an endpoint port with available decoders
> + * @mode: DPA operation mode (ram vs pmem)
> + * @alloc: dpa size required
> + *
> + * Returns a pointer to a cxl_endpoint_decoder struct or an error
> + *
> + * Given that a region needs to allocate from limited HPA capacity it
> + * may be the case that a device has more mappable DPA capacity than
> + * available HPA. The expectation is that @alloc is a driver known
> + * value based on the device capacity but it could not be available
> + * due to HPA constraints.
> + *
> + * Returns a pinned cxl_decoder with at least @alloc bytes of capacity
> + * reserved, or an error pointer. The caller is also expected to own the
> + * lifetime of the memdev registration associated with the endpoint to
> + * pin the decoder registered as well.
> + */
> +struct cxl_endpoint_decoder *cxl_request_dpa(struct cxl_memdev *cxlmd,
> + enum cxl_partition_mode mode,
> + resource_size_t alloc)
> +{
> + struct cxl_port *endpoint = cxlmd->endpoint;
> + struct cxl_endpoint_decoder *cxled;
> + struct device *cxled_dev;
> + int rc;
> +
> + if (!IS_ALIGNED(alloc, SZ_256M))
> + return ERR_PTR(-EINVAL);
> +
> + down_read(&cxl_dpa_rwsem);
> + cxled_dev = device_find_child(&endpoint->dev, NULL, find_free_decoder);
> + up_read(&cxl_dpa_rwsem);
In another effort [1] I am trying to get rid of all explicit unlock
management of cxl_dpa_rwsem and cxl_region_rwsem, and ultimately get rid
of all "goto" use in the CXL core.
[1]: http://lore.kernel.org/20250507072145.3614298-1-dan.j.williams@intel.com
So that conversion here would be:
DEFINE_FREE(put_cxled, struct cxl_endpoint_decoder *, if (_T) put_device(&_T->cxld.dev))
struct cxl_endpoint_decoder *cxl_find_free_decoder(struct cxl_memdev *cxlmd)
{
struct cxl_port *endpoint = cxlmd->endpoint;
struct device *dev;
scoped_guard(rwsem_read, &cxl_dpa_rwsem)
dev = device_find_child(&endpoint->dev, NULL, find_free_decoder);
if (dev)
return to_cxl_endpoint_decoder(dev);
return NULL;
}
...and then:
struct cxl_endpoint_decoder *cxled __free(put_cxled) = cxl_find_free_decoder(cxlmd);
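For readers less familiar with the helpers in the suggestion above: DEFINE_FREE()/__free() from <linux/cleanup.h> are built on the compiler's `cleanup` variable attribute. A rough userspace sketch of the same idea follows, assuming GCC or Clang; `struct decoder`, `find_free()` and `AUTO_PUT_DECODER` are hypothetical stand-ins, not CXL core code. The reference taken by the lookup is dropped on every return path, which is what removes the need for `goto err` unwinding.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted device object. */
struct decoder {
	int refcount;
	int id;
};

static void decoder_put(struct decoder *d)
{
	if (d)
		d->refcount--;
}

/* Cleanup callback: receives a pointer to the annotated variable. */
static void put_decoder_cleanup(struct decoder **dp)
{
	decoder_put(*dp);
}

/* Simplified analogue of what __free(put_cxled) provides. */
#define AUTO_PUT_DECODER __attribute__((cleanup(put_decoder_cleanup)))

/* Hypothetical lookup: returns a referenced decoder matching @id, or NULL. */
static struct decoder *find_free(struct decoder *pool, size_t n, int id)
{
	for (size_t i = 0; i < n; i++) {
		if (pool[i].id == id) {
			pool[i].refcount++;
			return &pool[i];
		}
	}
	return NULL;
}

/* Every return path drops the reference without an explicit put. */
static int use_decoder(struct decoder *pool, size_t n, int id, int fail)
{
	struct decoder *d AUTO_PUT_DECODER = find_free(pool, n, id);

	if (!d)
		return -1;	/* cleanup sees NULL and does nothing */
	if (fail)
		return -2;	/* reference dropped on scope exit */
	return d->id;		/* value read before the reference is dropped */
}
```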
> +
> + if (!cxled_dev)
> + return ERR_PTR(-ENXIO);
> +
> + cxled = to_cxl_endpoint_decoder(cxled_dev);
> +
> + if (!cxled) {
> + rc = -ENODEV;
> + goto err;
> + }
> +
> + rc = cxl_dpa_set_part(cxled, mode);
> + if (rc)
> + goto err;
> +
> + rc = cxl_dpa_alloc(cxled, alloc);
The current user of this interface is sysfs. The expectation there is
that if 2 userspace threads are racing to allocate DPA space, the kernel
will protect itself and not get confused, but the result will be that
one thread loses the race and needs to redo its allocation.
That's not an interface that the kernel can support, so there needs to
be some locking to enforce that 2 threads racing cxl_request_dpa() each
end up with independent allocations. That likely needs to be a
synchronization primitive over the entire process due to the way that
CXL requires in-order allocation of DPA and HPA. Effectively you need to
complete the entire HPA allocation, DPA allocation, and decoder
programming in one atomic unit.
I think to start, since there is only 1 Type-2 driver in the kernel and
its only use case is single-threaded setup, this is not yet an immediate
problem.
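A minimal userspace model of the "one atomic unit" idea, assuming a single HPA window and a single device whose DPA and decoder ids are handed out strictly in order. `struct alloc_state`, `request_region()` and the mutex are hypothetical stand-ins, not CXL core interfaces. The point is only that one lock covers the whole HPA + DPA + decoder-commit sequence, so two racing callers get independent, non-overlapping allocations instead of one caller having to retry.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical model: one HPA window and one device, both allocated in order. */
struct alloc_state {
	pthread_mutex_t lock;	/* serializes the whole HPA+DPA+commit unit */
	uint64_t hpa_next, hpa_end;
	uint64_t dpa_next, dpa_end;
	int next_decoder;	/* decoders committed in increasing id order */
};

struct allocation {
	uint64_t hpa, dpa, size;
	int decoder_id;
};

/* Returns 0 and fills @out, or -1 if either address space is exhausted. */
static int request_region(struct alloc_state *s, uint64_t size,
			  struct allocation *out)
{
	int rc = -1;

	pthread_mutex_lock(&s->lock);
	if (s->hpa_end - s->hpa_next >= size &&
	    s->dpa_end - s->dpa_next >= size) {
		out->hpa = s->hpa_next;
		out->dpa = s->dpa_next;
		out->size = size;
		out->decoder_id = s->next_decoder++;
		s->hpa_next += size;
		s->dpa_next += size;
		rc = 0;
	}
	pthread_mutex_unlock(&s->lock);
	return rc;
}

/* Thread body for exercising two concurrent callers. */
struct racer_arg {
	struct alloc_state *s;
	struct allocation a;
	int rc;
};

static void *racer(void *p)
{
	struct racer_arg *r = p;

	r->rc = request_region(r->s, 256, &r->a);
	return NULL;
}
```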
^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [PATCH v16 14/22] sfc: get endpoint decoder
2025-05-14 13:27 ` [PATCH v16 14/22] sfc: get endpoint decoder alejandro.lucero-palau
@ 2025-05-21 20:28 ` Dan Williams
0 siblings, 0 replies; 84+ messages in thread
From: Dan Williams @ 2025-05-21 20:28 UTC (permalink / raw)
To: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Martin Habets, Edward Cree, Jonathan Cameron
alejandro.lucero-palau@ wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Use cxl api for getting DPA (Device Physical Address) to use through an
> endpoint decoder.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
> Acked-by: Edward Cree <ecree.xilinx@gmail.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
> drivers/net/ethernet/sfc/efx_cxl.c | 19 +++++++++++++++++--
> 1 file changed, 17 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
> index 5635672b3fc3..20db9aa382ec 100644
> --- a/drivers/net/ethernet/sfc/efx_cxl.c
> +++ b/drivers/net/ethernet/sfc/efx_cxl.c
> @@ -98,18 +98,33 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
> pci_err(pci_dev, "%s: not enough free HPA space %pap < %u\n",
> __func__, &max_size, EFX_CTPIO_BUFFER_SIZE);
> cxl_put_root_decoder(cxl->cxlrd);
> - return -ENOSPC;
> + rc = -ENOSPC;
> + goto sfc_put_decoder;
> + }
> +
> + cxl->cxled = cxl_request_dpa(cxl->cxlmd, CXL_PARTMODE_RAM,
> + EFX_CTPIO_BUFFER_SIZE);
> + if (IS_ERR(cxl->cxled)) {
> + pci_err(pci_dev, "CXL accel request DPA failed");
> + rc = PTR_ERR(cxl->cxled);
> + goto sfc_put_decoder;
> }
>
> probe_data->cxl = cxl;
>
> return 0;
> +
> +sfc_put_decoder:
> + cxl_put_root_decoder(cxl->cxlrd);
> + return rc;
> }
>
> void efx_cxl_exit(struct efx_probe_data *probe_data)
> {
> - if (probe_data->cxl)
> + if (probe_data->cxl) {
> + cxl_dpa_free(probe_data->cxl->cxled);
> cxl_put_root_decoder(probe_data->cxl->cxlrd);
Again there is nothing magic about a reference count that keeps the
allocation valid until this point. The endpoint decoder could have long
been unregistered before this point. All the reference allows you to be
sure is that the allocation backing the object is still there, but
cxl_dpa_free() is probably going to crash in cxled_to_port() or
devm_cxl_dpa_release().
* Re: [PATCH v16 18/22] cxl: Allow region creation by type2 drivers
2025-05-14 13:27 ` [PATCH v16 18/22] cxl: Allow region creation by type2 drivers alejandro.lucero-palau
2025-05-20 2:37 ` Alison Schofield
@ 2025-05-21 20:45 ` Dan Williams
2025-06-06 13:27 ` Alejandro Lucero Palau
1 sibling, 1 reply; 84+ messages in thread
From: Dan Williams @ 2025-05-21 20:45 UTC (permalink / raw)
To: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Jonathan Cameron
alejandro.lucero-palau@ wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Creating a CXL region requires userspace intervention through the cxl
> sysfs files. Type2 support should allow accelerator drivers to create
> such cxl region from kernel code.
>
> Adding that functionality and integrating it with current support for
> memory expanders.
>
> Based on https://lore.kernel.org/linux-cxl/168592159835.1948938.1647215579839222774.stgit@dwillia2-xfh.jf.intel.com/
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
> drivers/cxl/core/region.c | 140 +++++++++++++++++++++++++++++++++++---
> drivers/cxl/port.c | 5 +-
> include/cxl/cxl.h | 4 ++
> 3 files changed, 140 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 4113ee6daec9..f82da914d125 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -2316,6 +2316,21 @@ static int cxl_region_detach(struct cxl_endpoint_decoder *cxled)
> return rc;
> }
>
> +/**
> + * cxl_accel_region_detach - detach a region from a Type2 device
> + *
> + * @cxled: Type2 endpoint decoder to detach the region from.
> + *
> + * Returns 0 or error.
> + */
> +int cxl_accel_region_detach(struct cxl_endpoint_decoder *cxled)
> +{
> + guard(rwsem_write)(&cxl_region_rwsem);
> + cxled->part = -1;
> + return cxl_region_detach(cxled);
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_accel_region_detach, "CXL");
There's nothing "accel" about the above sequence, it is nearly identical
to cxl_decoder_kill_region().
In general there does not need to be a parallel universe of "cxl_accel_"
helpers for Type-2, just use existing infrastructure and maybe enlighten
it a bit to accommodate a Type-2 nuance.
> +
> void cxl_decoder_kill_region(struct cxl_endpoint_decoder *cxled)
> {
> down_write(&cxl_region_rwsem);
> @@ -2822,6 +2837,14 @@ cxl_find_region_by_name(struct cxl_root_decoder *cxlrd, const char *name)
> return to_cxl_region(region_dev);
> }
>
> +static void drop_region(struct cxl_region *cxlr)
> +{
> + struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
> + struct cxl_port *port = cxlrd_to_port(cxlrd);
> +
> + devm_release_action(port->uport_dev, unregister_region, cxlr);
> +}
> +
> static ssize_t delete_region_store(struct device *dev,
> struct device_attribute *attr,
> const char *buf, size_t len)
> @@ -3526,14 +3549,12 @@ static int __construct_region(struct cxl_region *cxlr,
> return 0;
> }
>
> -/* Establish an empty region covering the given HPA range */
> -static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
> - struct cxl_endpoint_decoder *cxled)
> +static struct cxl_region *construct_region_begin(struct cxl_root_decoder *cxlrd,
> + struct cxl_endpoint_decoder *cxled)
> {
> struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> - struct cxl_port *port = cxlrd_to_port(cxlrd);
> struct cxl_dev_state *cxlds = cxlmd->cxlds;
> - int rc, part = READ_ONCE(cxled->part);
> + int part = READ_ONCE(cxled->part);
> struct cxl_region *cxlr;
>
> do {
> @@ -3542,13 +3563,23 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
> cxled->cxld.target_type);
> } while (IS_ERR(cxlr) && PTR_ERR(cxlr) == -EBUSY);
>
> - if (IS_ERR(cxlr)) {
> + if (IS_ERR(cxlr))
> dev_err(cxlmd->dev.parent,
> "%s:%s: %s failed assign region: %ld\n",
> dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
> __func__, PTR_ERR(cxlr));
> - return cxlr;
> - }
> + return cxlr;
> +};
> +
> +/* Establish an empty region covering the given HPA range */
> +static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
> + struct cxl_endpoint_decoder *cxled)
> +{
> + struct cxl_port *port = cxlrd_to_port(cxlrd);
> + struct cxl_region *cxlr;
> + int rc;
> +
> + cxlr = construct_region_begin(cxlrd, cxled);
>
> rc = __construct_region(cxlr, cxlrd, cxled);
> if (rc) {
> @@ -3559,6 +3590,99 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
> return cxlr;
> }
>
> +static struct cxl_region *
> +__construct_new_region(struct cxl_root_decoder *cxlrd,
> + struct cxl_endpoint_decoder *cxled, int ways)
What is the point of an @ways argument when @cxled is not an array? It
was an array in the original proposal. Recall that this interface needs
to be useful not only to Type-2 but also the nascent CXL PMEM case which
will likely need to create interleave CXL PMEM regions from label data.
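To illustrate the point about @ways, here is a hypothetical helper in which @ways only makes sense as the length of an array of endpoint decoders. `struct ep_decoder` and `region_size()` are illustrative, not proposed CXL API. The accepted way counts mirror what the CXL core's ways_to_eiw() accepts (1, 2, 4, 8, 16, 3, 6, 12), and the sketch assumes every target contributes equal DPA capacity, so the region spans `ways * per-device size`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an endpoint decoder with reserved DPA. */
struct ep_decoder {
	uint64_t dpa_size;
};

/* Interleave-way counts accepted by the CXL core's ways_to_eiw(). */
static int valid_ways(int ways)
{
	switch (ways) {
	case 1: case 2: case 3: case 4: case 6:
	case 8: case 12: case 16:
		return 1;
	default:
		return 0;
	}
}

/*
 * Hypothetical region-size helper: @ways is the length of @eps.
 * Every target must contribute the same DPA capacity.
 * Returns 0 on invalid input.
 */
static uint64_t region_size(const struct ep_decoder *const *eps, int ways)
{
	if (!eps || !valid_ways(ways))
		return 0;
	for (int i = 0; i < ways; i++)
		if (!eps[i] || eps[i]->dpa_size != eps[0]->dpa_size)
			return 0;
	return eps[0]->dpa_size * (uint64_t)ways;
}
```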
* Re: [PATCH v16 19/22] cxl: Add region flag for precluding a device memory to be used for dax
2025-05-14 13:27 ` [PATCH v16 19/22] cxl: Add region flag for precluding a device memory to be used for dax alejandro.lucero-palau
2025-05-20 2:36 ` Alison Schofield
@ 2025-05-21 20:49 ` Dan Williams
2025-06-06 13:39 ` Alejandro Lucero Palau
1 sibling, 1 reply; 84+ messages in thread
From: Dan Williams @ 2025-05-21 20:49 UTC (permalink / raw)
To: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Zhi Wang, Jonathan Cameron, Ben Cheatham
alejandro.lucero-palau@ wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> By definition a type2 cxl device will use the host managed memory for
> specific functionality, therefore it should not be available to other
> uses. However, a dax interface could be just good enough in some cases.
>
> Add a flag to a cxl region to specifically state that no dax device
> should be created. Allow a Type2 driver to set that flag at region creation time.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Zhi Wang <zhiw@nvidia.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> ---
What was wrong with the original proposal?
+
+ /*
+ * HDM-D[B] (device-memory) regions have accelerator
+ * specific usage, skip device-dax registration.
+ */
+ if (cxlr->type == CXL_DECODER_DEVMEM)
+ return 0;
I really do not want the ABI presentation policy layer leaking that deep into
the region creation flow. Another way to determine this is if devices
hosting the region are not driven by the generic CXL device memory class
driver 'cxl_pci'.
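A minimal sketch of that alternative: decide about device-dax from which driver binds the devices hosting the region rather than from a region flag. The `struct target` type and `region_wants_dax()` are hypothetical, and "sfc" is only an illustrative non-cxl_pci driver name; a real implementation would presumably inspect the driver bound to each endpoint's PCI device inside the CXL core.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: name of the PCI driver bound to each region target. */
struct target {
	const char *driver_name;
};

/*
 * Register device-dax only when every device hosting the region is
 * driven by the generic memory class driver.
 */
static bool region_wants_dax(const struct target *targets, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!targets[i].driver_name ||
		    strcmp(targets[i].driver_name, "cxl_pci") != 0)
			return false;
	return n > 0;
}
```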
* Re: [PATCH v16 20/22] sfc: create cxl region
2025-05-14 13:27 ` [PATCH v16 20/22] sfc: create cxl region alejandro.lucero-palau
@ 2025-05-21 21:01 ` Dan Williams
2025-06-06 13:44 ` Alejandro Lucero Palau
0 siblings, 1 reply; 84+ messages in thread
From: Dan Williams @ 2025-05-21 21:01 UTC (permalink / raw)
To: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Martin Habets, Edward Cree, Jonathan Cameron
alejandro.lucero-palau@ wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Use cxl api for creating a region using the endpoint decoder related to
> a DPA range specifying no DAX device should be created.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
> Acked-by: Edward Cree <ecree.xilinx@gmail.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
> drivers/net/ethernet/sfc/efx_cxl.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
> index 20db9aa382ec..960293a04ed3 100644
> --- a/drivers/net/ethernet/sfc/efx_cxl.c
> +++ b/drivers/net/ethernet/sfc/efx_cxl.c
> @@ -110,10 +110,19 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
> goto sfc_put_decoder;
> }
>
> + cxl->efx_region = cxl_create_region(cxl->cxlrd, cxl->cxled, 1, true);
> + if (IS_ERR(cxl->efx_region)) {
> + pci_err(pci_dev, "CXL accel create region failed");
> + rc = PTR_ERR(cxl->efx_region);
> + goto err_region;
> + }
> +
> probe_data->cxl = cxl;
>
> return 0;
>
> +err_region:
> + cxl_dpa_free(cxl->cxled);
> sfc_put_decoder:
> cxl_put_root_decoder(cxl->cxlrd);
> return rc;
> @@ -122,6 +131,7 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
> void efx_cxl_exit(struct efx_probe_data *probe_data)
> {
> if (probe_data->cxl) {
> + cxl_accel_region_detach(probe_data->cxl->cxled);
Here is more late magic hoping that cxl_accel_region_detach() can
actually find something useful to do at this point. I notice that this
series has dropped the cxl_acquire_endpoint() proposal which at least
guaranteed a consistent state of the world throughout this whole
process.
Did I miss the discussion where that approach was abandoned?
The idea would be that at setup time you do:
add_memdev()
acquire_endpoint()
register_hdm_error_handlers()
get_hpa()
get_dpa()
create_region()
release_endpoint()
...where that new register_hdm_error_handlers() is what coordinates
cleaning up everything the type-2 driver cares about upon the memdev
being detached from the CXL topology.
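The setup ordering above can be modeled in userspace as below. This is a sketch under stated assumptions: a mutex stands in for acquire_endpoint()/release_endpoint(), and `struct step`, `run_setup()`, `take()` and `give()` are hypothetical. It only shows setup-time unwinding while the endpoint is held; the detach-time cleanup that register_hdm_error_handlers() would coordinate is not modeled.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical setup step: do_fn() returns 0 on success, undo_fn() reverts it. */
struct step {
	const char *name;
	int (*do_fn)(void *ctx);
	void (*undo_fn)(void *ctx);
};

/* Run @steps in order while the endpoint is held; unwind on failure. */
static int run_setup(pthread_mutex_t *endpoint_lock, const struct step *steps,
		     size_t n, void *ctx)
{
	size_t i;
	int rc = 0;

	pthread_mutex_lock(endpoint_lock);	/* models acquire_endpoint() */
	for (i = 0; i < n; i++) {
		rc = steps[i].do_fn(ctx);
		if (rc)
			break;
	}
	if (rc) {
		/* Undo only the steps that completed, newest first. */
		while (i-- > 0)
			if (steps[i].undo_fn)
				steps[i].undo_fn(ctx);
	}
	pthread_mutex_unlock(endpoint_lock);	/* models release_endpoint() */
	return rc;
}

/* Example steps: count resources held, optionally failing one step. */
struct setup_ctx {
	int held;
	int fail_step;	/* index of the step to fail, or -1 */
	int calls;
};

static int take(void *p)
{
	struct setup_ctx *c = p;

	if (c->calls++ == c->fail_step)
		return -1;
	c->held++;
	return 0;
}

static void give(void *p)
{
	((struct setup_ctx *)p)->held--;
}
```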
Then your efx_cxl_exit() is automatically handled by:
del_memdev()
...which includes detaching the memdev from the cxl topology.
> cxl_dpa_free(probe_data->cxl->cxled);
> cxl_put_root_decoder(probe_data->cxl->cxlrd);
Otherwise, these long held references are not buying you anything but
the ability to determine "whoops, should have let go of these
resources a long time ago, everything I needed for cleanup is now in a
defunct state".
* Re: [PATCH v16 21/22] cxl: Add function for obtaining region range
2025-05-14 13:27 ` [PATCH v16 21/22] cxl: Add function for obtaining region range alejandro.lucero-palau
2025-05-20 2:35 ` Alison Schofield
@ 2025-05-21 21:31 ` Dan Williams
2025-06-06 14:03 ` Alejandro Lucero Palau
1 sibling, 1 reply; 84+ messages in thread
From: Dan Williams @ 2025-05-21 21:31 UTC (permalink / raw)
To: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Zhi Wang, Jonathan Cameron
alejandro.lucero-palau@ wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> A CXL region struct contains the physical address to work with.
>
> Type2 drivers can create a CXL region but do not have access to the
> related struct as it is defined as private by the kernel CXL core.
> Add a function for getting the cxl region range to be used for mapping
> such memory range by a Type2 driver.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Zhi Wang <zhiw@nvidia.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> ---
> drivers/cxl/core/region.c | 23 +++++++++++++++++++++++
> include/cxl/cxl.h | 2 ++
> 2 files changed, 25 insertions(+)
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 06647bae210f..9b7c6b8304d6 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -2726,6 +2726,29 @@ static struct cxl_region *devm_cxl_add_region(struct cxl_root_decoder *cxlrd,
> return ERR_PTR(rc);
> }
>
> +/**
> + * cxl_get_region_range - obtain range linked to a CXL region
> + *
> + * @region: a pointer to struct cxl_region
> + * @range: a pointer to a struct range to be set
> + *
> + * Returns 0 or error.
> + */
> +int cxl_get_region_range(struct cxl_region *region, struct range *range)
> +{
> + if (WARN_ON_ONCE(!region))
> + return -ENODEV;
> +
> + if (!region->params.res)
> + return -ENOSPC;
> +
> + range->start = region->params.res->start;
> + range->end = region->params.res->end;
Region params are only consistent under cxl_region_rwsem. Whatever is
consuming this will want to have that consistent snapshot and some
coordination with the region shutdown / de-commit flow.
This again raises the question, what do you expect to happen after the
->remove(&cxlmd->dev) event?
For Type-3 the expectation is leave all the decoders in place and
reassemble the region from past hardware settings (as if the decode
range had been established by platform firmware).
Another model could be "never trust an existing decoder and always reset
the configuration when the driver loads". That would also involve walking
the topology to reset any upstream switch decoders that were decoding
the old configuration.
The current model in these patches is "unwind nothing at cxl_mem detach
time, hope that probe_data->cxl->cxlmd is attached immediately upon
return from devm_add_cxl_memdev(), hope that it remains attached until
efx_cxl_exit() runs, and always assume a fresh "from scratch" HDM decode
configuration at efx_cxl_init() time".
I do often cringe at the complexity of the CXL subsystem, but it is all
complexity that the CXL programming model mandates. Specifically, CXL
window capacity being a dynamically assigned limited resource that needs
runtime re-configuration across multiple devices/switches, and resources
that can interleave host-bridges and endpoints. Compare that to PCIe
that mostly statically assigns MMIO resources throughout the topology,
rarely needs to reassign that, and never interleaves.
Yes, there is some self-inflicted complexity in the CXL subsystem
introduced by allowing drivers like cxl_mem and cxl_acpi to logically
detach at runtime. However given cxl_mem needs to be prepared for
physical detachment there is no simple escape from handling the "dynamic
CXL HDM decode teardown" problem.
* Re: [PATCH v16 22/22] sfc: support pio mapping based on cxl
2025-05-14 13:27 ` [PATCH v16 22/22] sfc: support pio mapping based on cxl alejandro.lucero-palau
@ 2025-05-21 21:48 ` Dan Williams
2025-05-23 1:13 ` Edward Cree
0 siblings, 1 reply; 84+ messages in thread
From: Dan Williams @ 2025-05-21 21:48 UTC (permalink / raw)
To: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Edward Cree, Jonathan Cameron
alejandro.lucero-palau@ wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> With a device supporting CXL and successfully initialised, use the cxl
> region to map the memory range and use this mapping for PIO buffers.
I was really hoping to get to the end of this patchset and read the
compelling story about why anyone with this device would be clamoring
for the CXL support to be lit up. One of the best ways to get
maintainers to accept complexity is to offset the complexity with
compelling end user impact. So, every time someone dips into
drivers/cxl/ and gets discouraged by "ugh, the complexity" they can
refer to this patch and be reminded "oh, the benefit!".
Maybe that would be more obvious to me if I knew what a "PIO buffer" was
used for currently, but some more words about the why of all this would
help clarify if the design is making the right complexity vs benefit
tradeoffs.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Acked-by: Edward Cree <ecree.xilinx@gmail.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
> drivers/net/ethernet/sfc/ef10.c | 50 +++++++++++++++++++++++----
> drivers/net/ethernet/sfc/efx_cxl.c | 18 ++++++++++
> drivers/net/ethernet/sfc/net_driver.h | 2 ++
> drivers/net/ethernet/sfc/nic.h | 3 ++
> 4 files changed, 66 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
> index 47349c148c0c..1a13fdbbc1b3 100644
> --- a/drivers/net/ethernet/sfc/ef10.c
> +++ b/drivers/net/ethernet/sfc/ef10.c
> @@ -24,6 +24,7 @@
> #include <linux/wait.h>
> #include <linux/workqueue.h>
> #include <net/udp_tunnel.h>
> +#include "efx_cxl.h"
>
> /* Hardware control for EF10 architecture including 'Huntington'. */
>
> @@ -106,7 +107,7 @@ static int efx_ef10_get_vf_index(struct efx_nic *efx)
>
> static int efx_ef10_init_datapath_caps(struct efx_nic *efx)
> {
> - MCDI_DECLARE_BUF(outbuf, MC_CMD_GET_CAPABILITIES_V4_OUT_LEN);
> + MCDI_DECLARE_BUF(outbuf, MC_CMD_GET_CAPABILITIES_V7_OUT_LEN);
> struct efx_ef10_nic_data *nic_data = efx->nic_data;
> size_t outlen;
> int rc;
> @@ -177,6 +178,12 @@ static int efx_ef10_init_datapath_caps(struct efx_nic *efx)
> efx->num_mac_stats);
> }
>
> + if (outlen < MC_CMD_GET_CAPABILITIES_V7_OUT_LEN)
> + nic_data->datapath_caps3 = 0;
> + else
> + nic_data->datapath_caps3 = MCDI_DWORD(outbuf,
> + GET_CAPABILITIES_V7_OUT_FLAGS3);
> +
> return 0;
> }
>
> @@ -919,6 +926,9 @@ static void efx_ef10_forget_old_piobufs(struct efx_nic *efx)
> static void efx_ef10_remove(struct efx_nic *efx)
> {
> struct efx_ef10_nic_data *nic_data = efx->nic_data;
> +#ifdef CONFIG_SFC_CXL
> + struct efx_probe_data *probe_data;
> +#endif
Not my driver to maintain, but the ifdefery really does not belong here...
> int rc;
>
> #ifdef CONFIG_SFC_SRIOV
> @@ -949,7 +959,12 @@ static void efx_ef10_remove(struct efx_nic *efx)
>
> efx_mcdi_rx_free_indir_table(efx);
>
> +#ifdef CONFIG_SFC_CXL
> + probe_data = container_of(efx, struct efx_probe_data, efx);
> + if (nic_data->wc_membase && !probe_data->cxl_pio_in_use)
> +#else
> if (nic_data->wc_membase)
> +#endif
...in the header do something like:
#ifdef CONFIG_SFC_CXL
static inline void unmap_wc_membase(struct efx_nic *efx, struct efx_ef10_nic_data *nic_data)
{
struct efx_probe_data *probe_data = container_of(efx, struct efx_probe_data, efx);
if (nic_data->wc_membase && !probe_data->cxl_pio_in_use)
iounmap(nic_data->wc_membase);
}
#else
static inline void unmap_wc_membase(struct efx_nic *efx, struct efx_ef10_nic_data *nic_data)
{
if (nic_data->wc_membase)
iounmap(nic_data->wc_membase);
}
#endif
> iounmap(nic_data->wc_membase);
>
> rc = efx_mcdi_free_vis(efx);
> @@ -1140,6 +1155,9 @@ static int efx_ef10_dimension_resources(struct efx_nic *efx)
> unsigned int channel_vis, pio_write_vi_base, max_vis;
> struct efx_ef10_nic_data *nic_data = efx->nic_data;
> unsigned int uc_mem_map_size, wc_mem_map_size;
> +#ifdef CONFIG_SFC_CXL
> + struct efx_probe_data *probe_data;
> +#endif
> void __iomem *membase;
> int rc;
>
> @@ -1263,8 +1281,25 @@ static int efx_ef10_dimension_resources(struct efx_nic *efx)
> iounmap(efx->membase);
> efx->membase = membase;
>
> - /* Set up the WC mapping if needed */
> - if (wc_mem_map_size) {
> + if (!wc_mem_map_size)
> + goto skip_pio;
> +
> + /* Set up the WC mapping */
> +
> +#ifdef CONFIG_SFC_CXL
> + probe_data = container_of(efx, struct efx_probe_data, efx);
> + if ((nic_data->datapath_caps3 &
> + (1 << MC_CMD_GET_CAPABILITIES_V7_OUT_CXL_CONFIG_ENABLE_LBN)) &&
> + probe_data->cxl_pio_initialised) {
> + /* Using PIO through CXL mapping? */
> + nic_data->pio_write_base = probe_data->cxl->ctpio_cxl +
> + (pio_write_vi_base * efx->vi_stride +
> + ER_DZ_TX_PIOBUF - uc_mem_map_size);
> + probe_data->cxl_pio_in_use = true;
> + } else
> +#endif
Looks like another static inline helper.
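One possible shape for that helper, sketched in userspace with trimmed-down hypothetical structures modeled on the fields used in the patch (`efx_cxl_setup_pio()` and `pick_pio_base()` are invented names, and the real ctpio_cxl lives in probe_data->cxl). It assumes the non-CXL path computes the same offset from the WC mapping. The #else stub lets the caller drop its own #ifdef; when CONFIG_SFC_CXL is not defined, the stub always falls back to the WC mapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for the sfc probe data. */
struct probe_data {
	bool cxl_pio_initialised;
	bool cxl_pio_in_use;
	uint8_t *ctpio_cxl;	/* CXL-backed PIO mapping */
};

#ifdef CONFIG_SFC_CXL
/* Point PIO writes at the CXL mapping when the NIC and driver allow it. */
static inline bool efx_cxl_setup_pio(struct probe_data *pd, bool cxl_cap,
				     size_t offset, uint8_t **pio_write_base)
{
	if (!cxl_cap || !pd->cxl_pio_initialised)
		return false;
	*pio_write_base = pd->ctpio_cxl + offset;
	pd->cxl_pio_in_use = true;
	return true;
}
#else
static inline bool efx_cxl_setup_pio(struct probe_data *pd, bool cxl_cap,
				     size_t offset, uint8_t **pio_write_base)
{
	(void)pd;
	(void)cxl_cap;
	(void)offset;
	(void)pio_write_base;
	return false;	/* CXL support compiled out */
}
#endif

/* The caller stays free of #ifdef: fall back to the WC mapping. */
static uint8_t *pick_pio_base(struct probe_data *pd, bool cxl_cap,
			      uint8_t *wc_membase, size_t offset)
{
	uint8_t *base;

	if (!efx_cxl_setup_pio(pd, cxl_cap, offset, &base))
		base = wc_membase + offset;
	return base;
}
```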
* Re: [PATCH v16 02/22] sfc: add cxl support
2025-05-21 17:12 ` Dan Williams
@ 2025-05-22 8:49 ` Alejandro Lucero Palau
2025-05-22 19:41 ` Dan Williams
0 siblings, 1 reply; 84+ messages in thread
From: Alejandro Lucero Palau @ 2025-05-22 8:49 UTC (permalink / raw)
To: Dan Williams, alejandro.lucero-palau, linux-cxl, netdev,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Jonathan Cameron, Edward Cree
On 5/21/25 18:12, Dan Williams wrote:
> Alejandro Lucero Palau wrote:
> [..]
>>>> +void efx_cxl_exit(struct efx_probe_data *probe_data)
>>>> +{
>>> So this is empty which means it leaks the cxl_dev_state_create()
>>> allocation, right?
>>
>> Yes, because I was wrongly relying on devres ...
>>
>>
>> Previous patchsets were doing the explicit release here.
>>
>>
>> Your suggestion below relies on adding more awareness of cxl into
>> generic efx code, which is what we want to avoid by using the specific efx_cxl.* files.
>>
>> As I mentioned in patch 1, I think the right thing to do is to add
>> devres for cxl_dev_state_create.
> ...but I thought netdev is anti-devres? I am ok having a
> devm_cxl_dev_state_create() alongside a "manual" cxl_dev_state_create()
> if that is the case.
But this netdev driver is using the CXL API, where devres is already used.
AFAIK, netdev maintainers prefer that netdev drivers not use devres,
but I do not think they can impose that view on an external API, especially
when other driver types could likely also make use of it in the future.
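A userspace sketch of the devm/manual pairing being discussed: the managed variant queues its release on a per-device action list that runs in reverse order at detach and, like the kernel's devm_add_action_or_reset(), frees the object immediately if queuing fails. The `struct device` here, `add_action()`, `release_all()` and the `state_*` functions are hypothetical models, not the devres or CXL APIs.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_ACTIONS 8

/* Hypothetical, minimal model of a device's devres action list. */
struct device {
	void (*fn[MAX_ACTIONS])(void *);
	void *data[MAX_ACTIONS];
	int n;
};

/* Queue @fn(@data) to run when the driver detaches. */
static int add_action(struct device *dev, void (*fn)(void *), void *data)
{
	if (dev->n == MAX_ACTIONS)
		return -1;
	dev->fn[dev->n] = fn;
	dev->data[dev->n] = data;
	dev->n++;
	return 0;
}

/* Detach: run queued actions in reverse registration order. */
static void release_all(struct device *dev)
{
	while (dev->n > 0) {
		dev->n--;
		dev->fn[dev->n](dev->data[dev->n]);
	}
}

struct dev_state {
	int id;
};

static int live_states;

/* Manual variant: the caller must call state_destroy() itself. */
static struct dev_state *state_create(int id)
{
	struct dev_state *s = malloc(sizeof(*s));

	if (s) {
		s->id = id;
		live_states++;
	}
	return s;
}

static void state_destroy(void *p)
{
	free(p);
	live_states--;
}

/* Managed variant: teardown is tied to @dev's lifetime. */
static struct dev_state *devm_state_create(struct device *dev, int id)
{
	struct dev_state *s = state_create(id);

	if (s && add_action(dev, state_destroy, s)) {
		state_destroy(s);	/* "or_reset": undo immediately */
		return NULL;
	}
	return s;
}
```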
>> Before sending v17 with this change, are you ok with the rest of the
>> patches or you want to go through them as well?
> So I did start taking a look and then turned away upon finding a
> memory-leak on the first 2 patches in the series. I will continue going
> through it, but in general the lifetime and locking rules of the CXL
> subsystem continue to be a source of trouble in new enabling. At a
> minimum that indicates a need/opportunity to review the rules at a
> future CXL collab meeting.
Great. And I agree about the potential improvements, mostly required after
all this new code (hopefully) gets merged, which I'll be happy
to contribute to. Also, note that this patchset's original RFC, and every cover
letter since, states "basic Type2 support".
* Re: [PATCH v16 04/22] cxl: Move register/capability check to driver
2025-05-21 18:23 ` Dan Williams
@ 2025-05-22 9:45 ` Alejandro Lucero Palau
2025-05-22 19:51 ` Dan Williams
0 siblings, 1 reply; 84+ messages in thread
From: Alejandro Lucero Palau @ 2025-05-22 9:45 UTC (permalink / raw)
To: Dan Williams, alejandro.lucero-palau, linux-cxl, netdev,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Jonathan Cameron
On 5/21/25 19:23, Dan Williams wrote:
> alejandro.lucero-palau@ wrote:
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> Type3 has some mandatory capabilities which are optional for Type2.
>>
>> In order to support same register/capability discovery code for both
>> types, avoid any assumption about what capabilities should be there, and
>> export the capabilities found for the caller doing the capabilities
>> check based on the expected ones.
>>
>> Add a function for facilitating the report of capabilities missing the
>> expected ones.
>>
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> ---
>> drivers/cxl/core/pci.c | 41 +++++++++++++++++++++++++++++++++++++++--
>> drivers/cxl/core/port.c | 8 ++++----
>> drivers/cxl/core/regs.c | 38 ++++++++++++++++++++++----------------
>> drivers/cxl/cxl.h | 6 +++---
>> drivers/cxl/cxlpci.h | 2 +-
>> drivers/cxl/pci.c | 24 +++++++++++++++++++++---
>> include/cxl/cxl.h | 24 ++++++++++++++++++++++++
>> 7 files changed, 114 insertions(+), 29 deletions(-)
> [..]
>> @@ -434,7 +449,7 @@ static void cxl_unmap_regblock(struct cxl_register_map *map)
>> map->base = NULL;
>> }
>>
>> -static int cxl_probe_regs(struct cxl_register_map *map)
>> +static int cxl_probe_regs(struct cxl_register_map *map, unsigned long *caps)
>> {
>> struct cxl_component_reg_map *comp_map;
>> struct cxl_device_reg_map *dev_map;
>> @@ -444,21 +459,12 @@ static int cxl_probe_regs(struct cxl_register_map *map)
>> switch (map->reg_type) {
>> case CXL_REGLOC_RBI_COMPONENT:
>> comp_map = &map->component_map;
>> - cxl_probe_component_regs(host, base, comp_map);
>> + cxl_probe_component_regs(host, base, comp_map, caps);
>> dev_dbg(host, "Set up component registers\n");
>> break;
>> case CXL_REGLOC_RBI_MEMDEV:
>> dev_map = &map->device_map;
>> - cxl_probe_device_regs(host, base, dev_map);
>> - if (!dev_map->status.valid || !dev_map->mbox.valid ||
>> - !dev_map->memdev.valid) {
>> - dev_err(host, "registers not found: %s%s%s\n",
>> - !dev_map->status.valid ? "status " : "",
>> - !dev_map->mbox.valid ? "mbox " : "",
>> - !dev_map->memdev.valid ? "memdev " : "");
>> - return -ENXIO;
>> - }
>> -
>> + cxl_probe_device_regs(host, base, dev_map, caps);
> I thought we talked about this before [1] , i.e. that there is no need
> to pass @caps through the stack.
You did not like adding a new capability field to the current structs
because the needed information was already in the map. I think that
was a fair comment, so now @caps, a variable provided by the caller, is
set during discovery without adding any field to internal structs.
Regarding what you suggest below, I have to disagree. This change was
introduced for dealing with a driver using CXL, that is, a Type2 or
future Type1 driver. IMO, most of the inner workings should be hidden
from those clients, so working with the map struct is far from ideal,
and it is not currently accessible to those drivers. With these new
drivers the core does not know what should be there, so the check is
delayed and left to the driver. IMO, from a Type2/Type1 driver
perspective, it is better to deal with expected/found caps than to be
aware of the internal CXL register discovery and maps. I will come back
to this in a later comment on your review that states "the driver should
know". That is true, as the patchset introduces this expectation to make
it easy to know/check whether the right registers are there. But note
this is not only about that end goal, but also about the API handling something
going wrong on that discovery/mapping.
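A sketch of the expected-vs-found check from the Type2 driver's point of view, modeling the capability bitmap as a single `unsigned long`. The `enum cap` values and helper names are hypothetical rather than the series' actual definitions. `caps_ok()` is the subset test (the patches use bitmap_subset() for this), and `caps_missing()` builds the kind of message the cxl_pci check quoted below prints.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical capability ids; the series defines its own enum. */
enum cap { CAP_HDM, CAP_RAS, CAP_STATUS, CAP_MBOX, CAP_MEMDEV, CAP_MAX };

static const char *const cap_names[CAP_MAX] = {
	[CAP_HDM] = "hdm", [CAP_RAS] = "ras", [CAP_STATUS] = "status",
	[CAP_MBOX] = "mbox", [CAP_MEMDEV] = "memdev",
};

#define BIT(n) (1UL << (n))

/* True when every expected capability was found (a subset test). */
static bool caps_ok(unsigned long expected, unsigned long found)
{
	return (expected & ~found) == 0;
}

/* Write the names of missing capabilities into @buf, space separated. */
static void caps_missing(unsigned long expected, unsigned long found,
			 char *buf, size_t len)
{
	unsigned long missing = expected & ~found;
	size_t used = 0;

	buf[0] = '\0';
	for (int i = 0; i < CAP_MAX && used < len; i++)
		if (missing & BIT(i))
			used += (size_t)snprintf(buf + used, len - used, "%s ",
						 cap_names[i]);
}
```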
> [1]: http://lore.kernel.org/678b06a26cddc_20fa29492@dwillia2-xfh.jf.intel.com.notmuch
>
> Here is the proposal that moves this simple check to the leaf consumer
> where it belongs vs plumbing @caps everywhere, note how this removes
> burden from the core, not add burden to support more use cases:
>
> diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c
> index ecdb22ae6952..5f511cf4bab0 100644
> --- a/drivers/cxl/core/regs.c
> +++ b/drivers/cxl/core/regs.c
> @@ -434,7 +434,7 @@ static void cxl_unmap_regblock(struct cxl_register_map *map)
> map->base = NULL;
> }
>
> -static int cxl_probe_regs(struct cxl_register_map *map)
> +static void cxl_probe_regs(struct cxl_register_map *map)
> {
> struct cxl_component_reg_map *comp_map;
> struct cxl_device_reg_map *dev_map;
> @@ -450,22 +450,11 @@ static int cxl_probe_regs(struct cxl_register_map *map)
> case CXL_REGLOC_RBI_MEMDEV:
> dev_map = &map->device_map;
> cxl_probe_device_regs(host, base, dev_map);
> - if (!dev_map->status.valid || !dev_map->mbox.valid ||
> - !dev_map->memdev.valid) {
> - dev_err(host, "registers not found: %s%s%s\n",
> - !dev_map->status.valid ? "status " : "",
> - !dev_map->mbox.valid ? "mbox " : "",
> - !dev_map->memdev.valid ? "memdev " : "");
> - return -ENXIO;
> - }
> -
> dev_dbg(host, "Probing device registers...\n");
> break;
> default:
> break;
> }
> -
> - return 0;
> }
>
> int cxl_setup_regs(struct cxl_register_map *map)
> @@ -476,10 +465,10 @@ int cxl_setup_regs(struct cxl_register_map *map)
> if (rc)
> return rc;
>
> - rc = cxl_probe_regs(map);
> + cxl_probe_regs(map);
> cxl_unmap_regblock(map);
>
> - return rc;
> + return 0;
> }
> EXPORT_SYMBOL_NS_GPL(cxl_setup_regs, "CXL");
>
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index d5447c7d540f..cfe4b5fa948a 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -945,6 +945,16 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> if (rc)
> return rc;
>
> + /* Check for mandatory CXL Memory Class Device capabilities */
> + if (!map.device_map.status.valid || !map.device_map.mbox.valid ||
> + !map.device_map.memdev.valid) {
> + dev_err(&pdev->dev, "registers not found: %s%s%s\n",
> + !map.device_map.status.valid ? "status " : "",
> + !map.device_map.mbox.valid ? "mbox " : "",
> + !map.device_map.memdev.valid ? "memdev " : "");
> + return -ENXIO;
> + }
> +
> rc = cxl_map_device_regs(&map, &cxlds->regs.device_regs);
> if (rc)
> return rc;
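The mechanism/policy split in the diff above can be modeled in userspace: discovery only records which register blocks it found and never fails, while the leaf driver decides what is mandatory. This is a sketch under that assumption; struct demo_device_map and the demo_* functions are hypothetical stand-ins for struct cxl_device_reg_map, cxl_probe_device_regs() and the check added to cxl_pci_probe().

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for one discovered register block. */
struct demo_reg {
	bool valid;
};

/* Hypothetical stand-in for the device register map. */
struct demo_device_map {
	struct demo_reg status, mbox, memdev;
};

/* Mechanism: record what discovery found; never fail. */
static void demo_probe_device_regs(struct demo_device_map *map,
				   bool status, bool mbox, bool memdev)
{
	map->status.valid = status;
	map->mbox.valid = mbox;
	map->memdev.valid = memdev;
}

/* Policy: a Type-3 leaf driver treats all three blocks as mandatory. */
static int demo_type3_check(const struct demo_device_map *map)
{
	if (!map->status.valid || !map->mbox.valid || !map->memdev.valid) {
		fprintf(stderr, "registers not found: %s%s%s\n",
			!map->status.valid ? "status " : "",
			!map->mbox.valid ? "mbox " : "",
			!map->memdev.valid ? "memdev " : "");
		return -ENXIO;
	}
	return 0;
}
```

A Type-2 driver would run the same discovery and apply its own, possibly much smaller, set of checks.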
^ permalink raw reply [flat|nested] 84+ messages in thread
* Re: [PATCH v16 05/22] cxl: Add function for type2 cxl regs setup
2025-05-21 18:28 ` Dan Williams
@ 2025-05-22 9:52 ` Alejandro Lucero Palau
2025-05-22 20:04 ` Dan Williams
0 siblings, 1 reply; 84+ messages in thread
From: Alejandro Lucero Palau @ 2025-05-22 9:52 UTC (permalink / raw)
To: Dan Williams, alejandro.lucero-palau, linux-cxl, netdev,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Ben Cheatham, Jonathan Cameron
On 5/21/25 19:28, Dan Williams wrote:
> alejandro.lucero-palau@ wrote:
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> Create a new function for a type2 device initialising
>> cxl_dev_state struct regarding cxl regs setup and mapping.
>>
>> Export the capabilities found for checking them against the
>> expected ones by the driver.
>>
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> ---
>> drivers/cxl/core/pci.c | 62 ++++++++++++++++++++++++++++++++++++++++++
>> include/cxl/cxl.h | 3 ++
>> 2 files changed, 65 insertions(+)
>>
>> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
>> index e2b6420592de..b05c6e64bfe2 100644
>> --- a/drivers/cxl/core/pci.c
>> +++ b/drivers/cxl/core/pci.c
>> @@ -1095,6 +1095,68 @@ int cxl_pci_setup_regs(struct pci_dev *pdev, enum cxl_regloc_type type,
>> }
>> EXPORT_SYMBOL_NS_GPL(cxl_pci_setup_regs, "CXL");
>>
>> +static int cxl_pci_accel_setup_memdev_regs(struct pci_dev *pdev,
>> + struct cxl_dev_state *cxlds,
>> + unsigned long *caps)
>> +{
>> + struct cxl_register_map map;
>> + int rc;
>> +
>> + rc = cxl_pci_setup_regs(pdev, CXL_REGLOC_RBI_MEMDEV, &map, caps);
>> + /*
>> + * This call can return -ENODEV if regs not found. This is not an error
>> + * for Type2 since these regs are not mandatory. If they do exist then
>> + * mapping them should not fail. If they should exist, it is with driver
>> + * calling cxl_pci_check_caps() where the problem should be found.
>> + */
> The driver should know in advance if calling:
>
> cxl_pci_setup_regs(pdev, CXL_REGLOC_RBI_MEMDEV, &map);
>
> ...will fail. Put that logic where it belongs in the probe function of
> the type-2 driver directly. This helper is not helping, it is just
> obfuscating.
As I said in the previous email, I disagree. The CXL API should be
handling all this. A client only cares about certain things, let's say
manageable things like capabilities, without going deep into the CXL spec
about how all that needs to be implemented. This patch introduces a
function embedding different calls for those inner workings, which should
only be handled by the CXL core.
* Re: [PATCH v16 06/22] sfc: make regs setup with checking and set media ready
2025-05-21 18:34 ` Dan Williams
@ 2025-05-22 10:07 ` Alejandro Lucero Palau
2025-05-22 20:22 ` Dan Williams
0 siblings, 1 reply; 84+ messages in thread
From: Alejandro Lucero Palau @ 2025-05-22 10:07 UTC (permalink / raw)
To: Dan Williams, alejandro.lucero-palau, linux-cxl, netdev,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Martin Habets, Zhi Wang, Edward Cree, Jonathan Cameron,
Ben Cheatham
On 5/21/25 19:34, Dan Williams wrote:
> alejandro.lucero-palau@ wrote:
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> Use cxl code for registers discovery and mapping.
>>
>> Validate capabilities found based on those registers against expected
>> capabilities.
>>
>> Set media ready explicitly as there is no means for doing so without
>> a mailbox and without the related cxl register, not mandatory for type2.
>>
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>> Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
>> Reviewed-by: Zhi Wang <zhi@nvidia.com>
>> Acked-by: Edward Cree <ecree.xilinx@gmail.com>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
>> ---
>> drivers/net/ethernet/sfc/efx_cxl.c | 26 ++++++++++++++++++++++++++
>> 1 file changed, 26 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
>> index 753d5b7d49b6..e94af8bf3a79 100644
>> --- a/drivers/net/ethernet/sfc/efx_cxl.c
>> +++ b/drivers/net/ethernet/sfc/efx_cxl.c
>> @@ -19,10 +19,13 @@
>>
>> int efx_cxl_init(struct efx_probe_data *probe_data)
>> {
>> + DECLARE_BITMAP(expected, CXL_MAX_CAPS) = {};
>> + DECLARE_BITMAP(found, CXL_MAX_CAPS) = {};
>> struct efx_nic *efx = &probe_data->efx;
>> struct pci_dev *pci_dev = efx->pci_dev;
>> struct efx_cxl *cxl;
>> u16 dvsec;
>> + int rc;
>>
>> probe_data->cxl_pio_initialised = false;
>>
>> @@ -43,6 +46,29 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
>> if (!cxl)
>> return -ENOMEM;
>>
>> + set_bit(CXL_DEV_CAP_HDM, expected);
>> + set_bit(CXL_DEV_CAP_RAS, expected);
>> +
>> + rc = cxl_pci_accel_setup_regs(pci_dev, &cxl->cxlds, found);
>> + if (rc) {
>> + pci_err(pci_dev, "CXL accel setup regs failed");
>> + return rc;
>> + }
>> +
>> + /*
>> + * Checking mandatory caps are there as, at least, a subset of those
>> + * found.
>> + */
>> + if (cxl_check_caps(pci_dev, expected, found))
>> + return -ENXIO;
> This all looks like an obfuscated way of writing:
>
> cxl_pci_setup_regs(pdev, CXL_REGLOC_RBI_MEMDEV, &map);
> if (!map.component_map.ras.valid || !map.component_map.hdm_decoder.valid)
> /* sfc cxl expectations not met */
That is an unfair comment.
Map is not available here, and I do not think it should be. The CXL API
should hide all this. Adding that new accel function avoids repeating
something all the drivers will go through:
cxl_pci_setup_regs(pdev, CXL_REGLOC_RBI_MEMDEV, &map);
Maybe cxl_map_device_regs(&map, &cxlds->regs.device_regs);
cxl_pci_setup_regs(pdev, CXL_REGLOC_RBI_COMPONENT, &cxlds->reg_map, caps);
And maybe cxl_map_component_regs(&cxlds->reg_map, &cxlds->regs.component, BIT(CXL_CM_CAP_CAP_ID_RAS));
* Re: [PATCH v16 07/22] cxl: Support dpa initialization without a mailbox
2025-05-21 18:47 ` Dan Williams
@ 2025-05-22 10:24 ` Alejandro Lucero Palau
0 siblings, 0 replies; 84+ messages in thread
From: Alejandro Lucero Palau @ 2025-05-22 10:24 UTC (permalink / raw)
To: Dan Williams, alejandro.lucero-palau, linux-cxl, netdev,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Ben Cheatham, Jonathan Cameron
On 5/21/25 19:47, Dan Williams wrote:
> alejandro.lucero-palau@ wrote:
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> Type3 relies on mailbox CXL_MBOX_OP_IDENTIFY command for initializing
>> memdev state params which end up being used for DPA initialization.
>>
>> Allow a Type2 driver to initialize DPA simply by giving the size of its
>> volatile and/or non-volatile hardware partitions.
>>
>> Export cxl_dpa_setup as well for initializing those added DPA partitions
>> with the proper resources.
>>
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> ---
>> drivers/cxl/core/mbox.c | 26 ++++++++++++++++++++------
>> drivers/cxl/cxlmem.h | 13 -------------
>> include/cxl/cxl.h | 14 ++++++++++++++
>> 3 files changed, 34 insertions(+), 19 deletions(-)
>>
>> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
>> index ab994d459f46..b14cfc6e3dba 100644
>> --- a/drivers/cxl/core/mbox.c
>> +++ b/drivers/cxl/core/mbox.c
>> @@ -1284,6 +1284,22 @@ static void add_part(struct cxl_dpa_info *info, u64 start, u64 size, enum cxl_pa
>> info->nr_partitions++;
>> }
>>
>> +/**
>> + * cxl_mem_dpa_init: initialize dpa by a driver without a mailbox.
>> + *
>> + * @info: pointer to cxl_dpa_info
>> + * @volatile_bytes: device volatile memory size
>> + * @persistent_bytes: device persistent memory size
>> + */
>> +void cxl_mem_dpa_init(struct cxl_dpa_info *info, u64 volatile_bytes,
>> + u64 persistent_bytes)
> I struggle to imagine a Type-2 device with PMEM, or needing anything
> more complicated than a single volatile range. No need to pre-enable
> something that may never exist.
>
> Let's just have a cxl_set_capacity() for the simple / common case:
>
> int cxl_set_capacity(struct cxl_dev_state *cxlds, u64 capacity)
> {
> struct cxl_dpa_info range_info = { 0 };
>
> add_part(&range_info, 0, capacity, CXL_PARTMODE_RAM);
> return cxl_dpa_setup(cxlds, &range_info);
> }
>
> ...then there is no need to move 'struct cxl_dpa_info' to a public
> header, or require type-2 drivers to pass in a pointless PMEM capacity.
>
> If more complicated devices show up later the code can always be made
> more sophisticated at that point.
That seems fine to me. The only problem I see is that a driver with a mailbox
will initialize this differently: getting the cxl_dpa_info first and then
calling cxl_dpa_setup(), or having all of that hidden in another function. In
any case, I guess the first driver needing that will have to work it out.
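A userspace sketch of the single-volatile-range helper proposed above, assuming only RAM partitions matter for now. demo_dpa_info, demo_add_part() and demo_set_capacity() are hypothetical stand-ins that loosely mirror struct cxl_dpa_info, add_part() and the suggested cxl_set_capacity(); the real helper would pass the result on to cxl_dpa_setup(), which is not modeled here. Rejecting a zero capacity is an assumption of this sketch.

```c
#include <errno.h>
#include <stdint.h>

enum demo_partmode { DEMO_PARTMODE_RAM, DEMO_PARTMODE_PMEM };

#define DEMO_MAX_PARTITIONS 2

/* One DPA partition: an inclusive [start, end] range plus its mode. */
struct demo_dpa_part {
	uint64_t start, end;
	enum demo_partmode mode;
};

struct demo_dpa_info {
	int nr_partitions;
	struct demo_dpa_part part[DEMO_MAX_PARTITIONS];
};

/* Append a partition; zero-sized requests or a full table are ignored. */
static void demo_add_part(struct demo_dpa_info *info, uint64_t start,
			  uint64_t size, enum demo_partmode mode)
{
	int i = info->nr_partitions;

	if (size == 0 || i >= DEMO_MAX_PARTITIONS)
		return;
	info->part[i].start = start;
	info->part[i].end = start + size - 1;
	info->part[i].mode = mode;
	info->nr_partitions++;
}

/* Describe a device with a single volatile range starting at DPA 0. */
static int demo_set_capacity(struct demo_dpa_info *info, uint64_t capacity)
{
	if (capacity == 0)
		return -EINVAL;
	*info = (struct demo_dpa_info){ 0 };
	demo_add_part(info, 0, capacity, DEMO_PARTMODE_RAM);
	return 0;
}
```

The caller never sees a PMEM size at all, which is the point of the simpler interface; a mailbox-capable driver would still fill the info from its IDENTIFY data instead.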
* Re: [PATCH v16 11/22] cxl: Define a driver interface for HPA free space enumeration
2025-05-21 19:31 ` Dan Williams
@ 2025-05-22 10:56 ` Alejandro Lucero Palau
2025-05-22 20:31 ` Dan Williams
0 siblings, 1 reply; 84+ messages in thread
From: Alejandro Lucero Palau @ 2025-05-22 10:56 UTC (permalink / raw)
To: Dan Williams, alejandro.lucero-palau, linux-cxl, netdev,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Jonathan Cameron
On 5/21/25 20:31, Dan Williams wrote:
> alejandro.lucero-palau@ wrote:
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> CXL region creation involves allocating capacity from device DPA
>> (device-physical-address space) and assigning it to decode a given HPA
>> (host-physical-address space). Before determining how much DPA to
>> allocate the amount of available HPA must be determined. Also, not all
>> HPA is created equal, some specifically targets RAM, some target PMEM,
>> some is prepared for device-memory flows like HDM-D and HDM-DB, and some
>> is host-only (HDM-H).
>>
>> In order to support Type2 CXL devices, wrap all of those concerns into
>> an API that retrieves a root decoder (platform CXL window) that fits the
>> specified constraints and the capacity available for a new region.
>>
>> Add a complementary function for releasing the reference to such root
>> decoder.
>>
>> Based on https://lore.kernel.org/linux-cxl/168592159290.1948938.13522227102445462976.stgit@dwillia2-xfh.jf.intel.com/
>>
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> ---
>> drivers/cxl/core/region.c | 166 ++++++++++++++++++++++++++++++++++++++
>> drivers/cxl/cxl.h | 3 +
>> include/cxl/cxl.h | 11 +++
>> 3 files changed, 180 insertions(+)
>>
>> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
>> index c3f4dc244df7..4affa1f22fd1 100644
>> --- a/drivers/cxl/core/region.c
>> +++ b/drivers/cxl/core/region.c
>> @@ -695,6 +695,172 @@ static int free_hpa(struct cxl_region *cxlr)
>> return 0;
>> }
>>
snip
>> +struct cxl_root_decoder *cxl_get_hpa_freespace(struct cxl_memdev *cxlmd,
>> + int interleave_ways,
>> + unsigned long flags,
>> + resource_size_t *max_avail_contig)
>> +{
>> + struct cxl_port *endpoint = cxlmd->endpoint;
>> + struct cxlrd_max_context ctx = {
>> + .host_bridges = &endpoint->host_bridge,
>> + .flags = flags,
>> + };
>> + struct cxl_port *root_port;
>> + struct cxl_root *root __free(put_cxl_root) = find_cxl_root(endpoint);
>> +
>> + if (!is_cxl_endpoint(endpoint)) {
>> + dev_dbg(&endpoint->dev, "hpa requestor is not an endpoint\n");
>> + return ERR_PTR(-EINVAL);
>> + }
> This seems confused because the @cxlmd argument is always an endpoint.
> The dynamic state is whether that endpoint is currently connected to the
> CXL HDM decode hierarchy, or not.
>
> That state changes relative to whether @cxlmd is bound to the cxl_mem
> driver. So the above check is also racy.
>
> I think this wants to be:
>
> guard(device)(&cxlmd->dev);
> if (!cxlmd->endpoint)
> return ERR_PTR(-ENXIO);
It makes sense. I'll do so.
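Because cxl_get_hpa_freespace() returns a pointer, every early exit has to encode its errno with ERR_PTR() so callers can test the result with IS_ERR()/PTR_ERR(). The following userspace sketch imitates that encoding; the demo_* helpers and structs are hypothetical, the device lock taken by guard(device)() is not modeled, and the 4095 limit mirrors MAX_ERRNO in the kernel's include/linux/err.h.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095

/* Encode a negative errno in the top page of the address space. */
static inline void *demo_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static inline long demo_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True if @ptr falls in the range reserved for encoded errnos. */
static inline bool demo_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}

struct demo_port { int id; };
struct demo_memdev { struct demo_port *endpoint; };

/* Hypothetical lookup: fail with -ENXIO while no endpoint is attached. */
static struct demo_port *demo_get_endpoint(struct demo_memdev *md)
{
	/* The kernel version would hold guard(device)(&cxlmd->dev) here. */
	if (!md->endpoint)
		return demo_err_ptr(-ENXIO);
	return md->endpoint;
}
```

Returning a bare -ENXIO from such a function would instead hand the caller a bogus non-error pointer, which is why the check and the return type have to agree.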
>> + if (!root) {
>> + dev_dbg(&endpoint->dev, "endpoint can not be related to a root port\n");
>> + return ERR_PTR(-ENXIO);
>> + }
>> +
>> + root_port = &root->port;
>> + scoped_guard(rwsem_read, &cxl_region_rwsem)
>> + device_for_each_child(&root_port->dev, &ctx, find_max_hpa);
>> +
>> + if (!ctx.cxlrd)
>> + return ERR_PTR(-ENOMEM);
>> +
>> + *max_avail_contig = ctx.max_hpa;
>> + return ctx.cxlrd;
>> +}
>> +EXPORT_SYMBOL_NS_GPL(cxl_get_hpa_freespace, "CXL");
>> +
>> +void cxl_put_root_decoder(struct cxl_root_decoder *cxlrd)
>> +{
>> + put_device(CXLRD_DEV(cxlrd));
>> +}
>> +EXPORT_SYMBOL_NS_GPL(cxl_put_root_decoder, "CXL");
> I think this cxl_put_root_decoder() requirement is manageable
> for the initial merge, but it is not something to commit to long term.
> The device's HPA freespace and CXL HDM should be freed at cxl_mem detach
> time, but that will require more infrastructure.
>
> The reference does not stop the root decoder from being unregistered and
> it is clearly broken to allow it to be unregistered while drivers have
> pending allocations.
I agree those problems need to be addressed, hopefully in the
short-to-mid term, meaning as follow-ups to this patchset.
* Re: [PATCH v16 02/22] sfc: add cxl support
2025-05-22 8:49 ` Alejandro Lucero Palau
@ 2025-05-22 19:41 ` Dan Williams
2025-06-04 8:09 ` Jonathan Cameron
0 siblings, 1 reply; 84+ messages in thread
From: Dan Williams @ 2025-05-22 19:41 UTC (permalink / raw)
To: Alejandro Lucero Palau, Dan Williams, alejandro.lucero-palau,
linux-cxl, netdev, edward.cree, davem, kuba, pabeni, edumazet,
dave.jiang
Cc: Jonathan Cameron, Edward Cree
Alejandro Lucero Palau wrote:
>
> On 5/21/25 18:12, Dan Williams wrote:
> > Alejandro Lucero Palau wrote:
> > [..]
> >>>> +void efx_cxl_exit(struct efx_probe_data *probe_data)
> >>>> +{
> >>> So this is empty which means it leaks the cxl_dev_state_create()
> >>> allocation, right?
> >>
> >> Yes, because I was wrongly relying on devres ...
> >>
> >>
> >> Previous patchsets were doing the explicit release here.
> >>
> >>
> >> Your suggestion below relies on adding more awareness of cxl into
> >> generic efx code, which is what we want to avoid by using the specific efx_cxl.* files.
> >>
> >> As I mentioned in patch 1, I think the right thing to do is to add
> >> devres for cxl_dev_state_create.
> > ...but I thought netdev is anti-devres? I am ok having a
> > devm_cxl_dev_state_create() alongside a "manual" cxl_dev_state_create()
> > if that is the case.
>
>
> But a netdev driver is using the CXL API, where devres is already used.
> AFAIK, netdev maintainers prefer that netdev drivers do not use devres,
> but I do not think they can impose their view on an external API, especially
> when other driver types could also make use of it in the future.
From the CXL perspective I am neutral. As long as the parallel manual
interfaces arrange to undo everything it "should just work (TM)". You
would need to create the manual version of devm_cxl_add_memdev(), and
audit that the paired cxl_del_memdev() does not allow any cxl_core
internal devres events to leak past the ->remove() event for the
accelerator driver.
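Part of that audit is ordering: devres runs a device's registered release actions in reverse order of registration when the driver detaches, so a manual teardown path has to undo the same steps newest-first before ->remove() returns. A userspace sketch of that LIFO behaviour, assuming a fixed-size action list; struct demo_devres and its helpers are hypothetical and are not the kernel's devres implementation.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_ACTIONS 8

typedef void (*demo_action_fn)(void *data);

/* Hypothetical per-device list of release actions, in registration order. */
struct demo_devres {
	int nr;
	demo_action_fn fn[DEMO_MAX_ACTIONS];
	void *data[DEMO_MAX_ACTIONS];
};

/* Register a release action; fails when the fixed-size list is full. */
static int demo_add_action(struct demo_devres *dr, demo_action_fn fn,
			   void *data)
{
	if (dr->nr == DEMO_MAX_ACTIONS)
		return -1;
	dr->fn[dr->nr] = fn;
	dr->data[dr->nr] = data;
	dr->nr++;
	return 0;
}

/* Run every action newest-first, as devres does on driver detach. */
static void demo_release_all(struct demo_devres *dr)
{
	while (dr->nr > 0) {
		dr->nr--;
		dr->fn[dr->nr](dr->data[dr->nr]);
	}
}

/* Example action: append a one-character tag so the release order is visible. */
static char demo_log[DEMO_MAX_ACTIONS + 1];

static void demo_log_action(void *data)
{
	size_t len = strlen(demo_log);

	if (len < DEMO_MAX_ACTIONS) {
		demo_log[len] = *(char *)data;
		demo_log[len + 1] = '\0';
	}
}
```

Registering actions tagged A, B and C and then calling demo_release_all() leaves demo_log as "CBA"; a manual cxl_del_memdev()-style path would need to reproduce that same reverse order by hand.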
> >> Before sending v17 with this change, are you ok with the rest of the
> >> patches, or do you want to go through them as well?
> > So I did start taking a look and then turned away upon finding a
> > memory leak in the first 2 patches of the series. I will continue going
> > through it, but in general the lifetime and locking rules of the CXL
> > subsystem continue to be a source of trouble in new enabling. At a
> > minimum that indicates a need/opportunity to review the rules at a
> > future CXL collab meeting.
>
> Great. And I agree about the potential improvements, mostly required after
> all this new code (hopefully) ends up being merged, which I'll be happy
> to contribute to. Also, note that this patchset's original RFC and the cover
> letters since then state "basic Type2 support".
It would help to define "basic" in terms of impact. How much end-user
benefit arrives at this stage, and what is driving motivation to go
beyond basic. E.g. "PIO buffer in CXL == X amount of goodness, and Y
amount of goodness comes with additional changes".
* Re: [PATCH v16 04/22] cxl: Move register/capability check to driver
2025-05-22 9:45 ` Alejandro Lucero Palau
@ 2025-05-22 19:51 ` Dan Williams
2025-05-23 9:12 ` Alejandro Lucero Palau
0 siblings, 1 reply; 84+ messages in thread
From: Dan Williams @ 2025-05-22 19:51 UTC (permalink / raw)
To: Alejandro Lucero Palau, Dan Williams, alejandro.lucero-palau,
linux-cxl, netdev, edward.cree, davem, kuba, pabeni, edumazet,
dave.jiang
Cc: Jonathan Cameron
Alejandro Lucero Palau wrote:
[..]
> You did not like adding a new capability field to the current structs
> because the information needed was already there in the map. I think it
> was a fair comment, so the caps, a variable the caller provides, is set
> during discovery without adding any internal struct.
The objection was not limited to data structure changes, it also
includes sprinkling an @caps argument throughout the stack for an
as yet to be determined benefit.
> Regarding what you suggest below, I have to disagree. This change was
> introduced to deal with a driver using CXL, that is, a Type2 or a
> future Type1 driver. IMO, most of the inner workings should be hidden from
> those clients, and therefore working with the map struct is far from
> ideal; it is also not currently accessible from those drivers.
Checking a couple validity flags in a now public (in include/cxl/pci.h)
data-structure is far from ideal?
> With these new drivers the core does not know what should be there, so
> the check is delayed and left to the driver.
Correct, left to the driver to read from an existing mechanism.
> IMO, from a Type2/Type1 driver perspective, it is better to deal with
> expected/found caps than to be aware of the internal CXL register
> discovery and maps.
Not if a maintainer of the CXL register discovery and maps remains
unconvinced to merge a parallel redundant mechanism to achieve the exact
same goal.
* Re: [PATCH v16 05/22] cxl: Add function for type2 cxl regs setup
2025-05-22 9:52 ` Alejandro Lucero Palau
@ 2025-05-22 20:04 ` Dan Williams
2025-06-06 11:59 ` Alejandro Lucero Palau
0 siblings, 1 reply; 84+ messages in thread
From: Dan Williams @ 2025-05-22 20:04 UTC (permalink / raw)
To: Alejandro Lucero Palau, Dan Williams, alejandro.lucero-palau,
linux-cxl, netdev, edward.cree, davem, kuba, pabeni, edumazet,
dave.jiang
Cc: Ben Cheatham, Jonathan Cameron
Alejandro Lucero Palau wrote:
[..]
> > The driver should know in advance if calling:
> >
> > cxl_pci_setup_regs(pdev, CXL_REGLOC_RBI_MEMDEV, &map);
> >
> > ...will fail. Put that logic where it belongs in the probe function of
> > the type-2 driver directly. This helper is not helping, it is just
> > obfuscating.
>
>
> As I said in the previous email, I disagree. The CXL API should be
> handling all this. A client only cares about certain things, let's say
> manageable things like capabilities, without going deep into the CXL spec
> about how all that needs to be implemented. This patch introduces a
> function embedding different calls for those inner workings, which should
> only be handled by the CXL core.
No. Please keep this policy out of the core. Do not invent a new
"capabilities" contract that the CXL core needs to maintain, and do not
add thin "cxl_pci_accel_" helpers that just wrap existing core
functionality. Call existing core functions directly and only augment
them at the point where fundamental assumptions are violated between
the "Type-3" and "Type-2" device models.
If the fundamental assumption violation boils down to a policy
difference between Type-2 and Type-3 then move that policy out of the
core. For example, register discovery is a mechanism, what the client
does with the result of that mechanism is policy and belongs in the
leaf/client. It was an accident of implementation that mandatory Type-3
register blocks were validated in the core rather than in cxl_pci from the
outset.
* Re: [PATCH v16 06/22] sfc: make regs setup with checking and set media ready
2025-05-22 10:07 ` Alejandro Lucero Palau
@ 2025-05-22 20:22 ` Dan Williams
2025-05-22 20:53 ` Dan Williams
0 siblings, 1 reply; 84+ messages in thread
From: Dan Williams @ 2025-05-22 20:22 UTC (permalink / raw)
To: Alejandro Lucero Palau, Dan Williams, alejandro.lucero-palau,
linux-cxl, netdev, edward.cree, davem, kuba, pabeni, edumazet,
dave.jiang
Cc: Martin Habets, Zhi Wang, Edward Cree, Jonathan Cameron,
Ben Cheatham
Alejandro Lucero Palau wrote:
>
> On 5/21/25 19:34, Dan Williams wrote:
> > alejandro.lucero-palau@ wrote:
> > [..]
> > This all looks like an obfuscated way of writing:
> >
> > cxl_pci_setup_regs(pdev, CXL_REGLOC_RBI_MEMDEV, &map);
> > if (!map.component_map.ras.valid || !map.component_map.hdm_decoder.valid)
> > /* sfc cxl expectations not met */
>
>
> That is an unfair comment.
>
>
> Map is not available here
Why is @map not available? It was made public in patch 1?
> and I do not think it should be.
...that is the point of contention.
> The CXL API
> should hide all this.
A new "capabilities" contract is not hiding anything, it is layering on
a redundant mechanism.
> Adding that new accel function avoids repeating something all the
> drivers will go through:
When / if multiple drivers end up with the same policy, then look to
refactor them into a helper for that policy.
> cxl_pci_setup_regs(pdev, CXL_REGLOC_RBI_MEMDEV, &map);
>
> Maybe cxl_map_device_regs(&map, &cxlds->regs.device_regs);
>
> cxl_pci_setup_regs(pdev, CXL_REGLOC_RBI_COMPONENT, &cxlds->reg_map, caps);
>
> And maybe cxl_map_component_regs(&cxlds->reg_map, &cxlds->regs.component, BIT(CXL_CM_CAP_CAP_ID_RAS));
This still looks like no net improvement over the mechanisms the core
currently has. cxl_pci_accel_setup_memdev_regs() spends time explaining
how it might not work for all CXL accelerator device types, it looks
like pure maintenance burden to me. I want future accelerator driver
writers to carefully think through this problem and not look to a
cxl_pci_accel_ helper to do that for them. Outside of Type-3 there is
simply too much freedom in the CXL specification for a client driver to
expect a pre-built helper for their use case.
* Re: [PATCH v16 11/22] cxl: Define a driver interface for HPA free space enumeration
2025-05-22 10:56 ` Alejandro Lucero Palau
@ 2025-05-22 20:31 ` Dan Williams
0 siblings, 0 replies; 84+ messages in thread
From: Dan Williams @ 2025-05-22 20:31 UTC (permalink / raw)
To: Alejandro Lucero Palau, Dan Williams, alejandro.lucero-palau,
linux-cxl, netdev, edward.cree, davem, kuba, pabeni, edumazet,
dave.jiang
Cc: Jonathan Cameron
Alejandro Lucero Palau wrote:
[..]
> > The reference does not stop the root decoder from being unregistered and
> > it is clearly broken to allow it to be unregistered while drivers have
> > pending allocations.
>
>
> I agree those problems need to be addressed, hopefully in the
> short-to-mid term, meaning as follow-ups to this patchset.
When I made the comment above it was before realizing that v16 dropped
the cxl_acquire_endpoint() mechanism. There are just too many
memory-safety bugs in this patchset around CXL topology setup/teardown.
I am not seeing a way to address all the problems solely with
follow-ups.
* Re: [PATCH v16 06/22] sfc: make regs setup with checking and set media ready
2025-05-22 20:22 ` Dan Williams
@ 2025-05-22 20:53 ` Dan Williams
2025-05-22 21:09 ` Dan Williams
0 siblings, 1 reply; 84+ messages in thread
From: Dan Williams @ 2025-05-22 20:53 UTC (permalink / raw)
To: Dan Williams, Alejandro Lucero Palau, alejandro.lucero-palau,
linux-cxl, netdev, edward.cree, davem, kuba, pabeni, edumazet,
dave.jiang
Cc: Martin Habets, Zhi Wang, Edward Cree, Jonathan Cameron,
Ben Cheatham
Dan Williams wrote:
> Alejandro Lucero Palau wrote:
> >
> > On 5/21/25 19:34, Dan Williams wrote:
> > > alejandro.lucero-palau@ wrote:
> > > [..]
> > > This all looks like an obfuscated way of writing:
> > >
> > > cxl_pci_setup_regs(pdev, CXL_REGLOC_RBI_MEMDEV, &map);
> > > if (!map.component_map.ras.valid || !map.component_map.hdm_decoder.valid)
> > > /* sfc cxl expectations not met */
Now, I do notice that the proposal above got the register blocks wrong.
I.e. that should be:
cxl_pci_setup_regs(pdev, CXL_REGLOC_RBI_COMPONENT, &map);
if (!map.component_map.ras.valid || !map.component_map.hdm_decoder.valid)
/* sfc cxl expectations not met */
I also apologize for the negative connotation of "obfuscated". I should
have said:
"This simply boils down to the following..." which is yes, a bit of a
mouthful to type out, but it has the benefit of no new changes to the
core which is my preference until it becomes exceedingly clear that new
APIs are needed.
So,
map.component_map.hdm_decoder.valid
map.component_map.ras.valid
...instead of:
set_bit(CXL_DEV_CAP_HDM, expected);
set_bit(CXL_DEV_CAP_RAS, expected);
...please.
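The sfc-side check requested above amounts to two validity tests on the component register map. A userspace sketch, assuming the map has already been filled in by cxl_pci_setup_regs(pdev, CXL_REGLOC_RBI_COMPONENT, &map); struct demo_component_map and demo_sfc_check() are hypothetical stand-ins for map.component_map and the sfc probe logic.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for per-block validity in map.component_map. */
struct demo_reg {
	bool valid;
};

struct demo_component_map {
	struct demo_reg hdm_decoder;
	struct demo_reg ras;
};

/* sfc-style policy: require both the HDM decoder and RAS capability blocks. */
static int demo_sfc_check(const struct demo_component_map *cmap)
{
	if (!cmap->hdm_decoder.valid || !cmap->ras.valid) {
		fprintf(stderr, "sfc cxl expectations not met: %s%s\n",
			!cmap->hdm_decoder.valid ? "hdm " : "",
			!cmap->ras.valid ? "ras " : "");
		return -ENXIO;
	}
	return 0;
}
```

Compared with the set_bit()/cxl_check_caps() version, nothing new is needed from the core: the driver reads flags that discovery already records.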
* Re: [PATCH v16 06/22] sfc: make regs setup with checking and set media ready
2025-05-22 20:53 ` Dan Williams
@ 2025-05-22 21:09 ` Dan Williams
0 siblings, 0 replies; 84+ messages in thread
From: Dan Williams @ 2025-05-22 21:09 UTC (permalink / raw)
To: Dan Williams, Alejandro Lucero Palau, alejandro.lucero-palau,
linux-cxl, netdev, edward.cree, davem, kuba, pabeni, edumazet,
dave.jiang
Cc: Martin Habets, Zhi Wang, Edward Cree, Jonathan Cameron,
Ben Cheatham
Dan Williams wrote:
> Dan Williams wrote:
> > Alejandro Lucero Palau wrote:
> > >
> > > On 5/21/25 19:34, Dan Williams wrote:
> > > > alejandro.lucero-palau@ wrote:
> > > >> From: Alejandro Lucero <alucerop@amd.com>
> > > >>
> > > >> Use cxl code for registers discovery and mapping.
> > > >>
> > > >> Validate capabilities found based on those registers against expected
> > > >> capabilities.
> > > >>
> > > >> Set media ready explicitly as there is no means for doing so without
> > > >> a mailbox and without the related cxl register, not mandatory for type2.
> > > >>
> > > >> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> > > >> Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
> > > >> Reviewed-by: Zhi Wang <zhi@nvidia.com>
> > > >> Acked-by: Edward Cree <ecree.xilinx@gmail.com>
> > > >> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > > >> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
> > > >> ---
> > > >> drivers/net/ethernet/sfc/efx_cxl.c | 26 ++++++++++++++++++++++++++
> > > >> 1 file changed, 26 insertions(+)
> > > >>
> > > >> diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
> > > >> index 753d5b7d49b6..e94af8bf3a79 100644
> > > >> --- a/drivers/net/ethernet/sfc/efx_cxl.c
> > > >> +++ b/drivers/net/ethernet/sfc/efx_cxl.c
> > > >> @@ -19,10 +19,13 @@
> > > >>
> > > >> int efx_cxl_init(struct efx_probe_data *probe_data)
> > > >> {
> > > >> + DECLARE_BITMAP(expected, CXL_MAX_CAPS) = {};
> > > >> + DECLARE_BITMAP(found, CXL_MAX_CAPS) = {};
> > > >> struct efx_nic *efx = &probe_data->efx;
> > > >> struct pci_dev *pci_dev = efx->pci_dev;
> > > >> struct efx_cxl *cxl;
> > > >> u16 dvsec;
> > > >> + int rc;
> > > >>
> > > >> probe_data->cxl_pio_initialised = false;
> > > >>
> > > >> @@ -43,6 +46,29 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
> > > >> if (!cxl)
> > > >> return -ENOMEM;
> > > >>
> > > >> + set_bit(CXL_DEV_CAP_HDM, expected);
> > > >> + set_bit(CXL_DEV_CAP_RAS, expected);
> > > >> +
> > > >> + rc = cxl_pci_accel_setup_regs(pci_dev, &cxl->cxlds, found);
> > > >> + if (rc) {
> > > >> + pci_err(pci_dev, "CXL accel setup regs failed");
> > > >> + return rc;
> > > >> + }
> > > >> +
> > > >> + /*
> > > >> + * Checking mandatory caps are there as, at least, a subset of those
> > > >> + * found.
> > > >> + */
> > > >> + if (cxl_check_caps(pci_dev, expected, found))
> > > >> + return -ENXIO;
> > > > This all looks like an obfuscated way of writing:
> > > >
> > > > cxl_pci_setup_regs(pdev, CXL_REGLOC_RBI_MEMDEV, &map);
> > > > if (!map.component_map.ras.valid || !map.component_map.hdm_decoder.valid)
> > > > /* sfc cxl expectations not met */
>
> Now, I do notice that the proposal above got the register blocks wrong.
> I.e. that should be:
>
> cxl_pci_setup_regs(pdev, CXL_REGLOC_RBI_COMPONENT, &map);
> if (!map.component_map.ras.valid || !map.component_map.hdm_decoder.valid)
> /* sfc cxl expectations not met */
Actually that is subtly wrong again and points to a shared wart that, if
it could be cleaned up, would benefit cxl_pci Type-3 use case as well.
That @map unfortunately needs to be cxlds->reg_map.
I do not like the fact that 'struct cxl_dev_state' carries that 'struct
cxl_register_map' just to forward the enumeration to the endpoint
cxl_port. I also do not like that cxl_pci maps the RAS capability while
cxl_port maps the hdm_decoder capability. Ideally cxl_port would own
both those capabilities.
In that scenario simple use cases like sfc that only care about HDM and
RAS can forget about enumerating and mapping CXL component registers
altogether, just devm_cxl_add_memdev() and the core handles the rest.
Now that is the kind of helper and CXL core improvement I am interested
in seeing, and perhaps precludes the need for 'struct cxl_register_map'
to be moved to include/cxl/pci.h.
It may be something we can do after Terry's CXL protocol error series.
* Re: [PATCH v16 22/22] sfc: support pio mapping based on cxl
2025-05-21 21:48 ` Dan Williams
@ 2025-05-23 1:13 ` Edward Cree
0 siblings, 0 replies; 84+ messages in thread
From: Edward Cree @ 2025-05-23 1:13 UTC
To: Dan Williams, alejandro.lucero-palau, linux-cxl, netdev,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Jonathan Cameron
On 21/05/2025 22:48, Dan Williams wrote:
> Maybe that would be more obvious to me if I knew what a "PIO buffer" was
> used for currently, but some more words about the why of all this would
> help clarify if the design is making the right complexity vs benefit
> tradeoffs.
A PIO buffer is a region of device memory to which the driver can write
packet data for TX, so that when the device handles the transmit
doorbell it doesn't have to DMA that data across from host memory.
Essentially it's spending CPU time to save a round-trip across the PCIe
bus, reducing latency; the driver heuristically decides whether a TX is
more bandwidth- or latency-sensitive, and in the latter case uses PIO.
I don't know too much about the CXL side of things (hopefully Alejandro
will elaborate) but AIUI using CXL instead of PCIe for this reduces the
latency further.
Some of the above information should probably be added to the series
cover letter or this patch description.
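To make the bandwidth-versus-latency decision described above concrete, here is a minimal userspace sketch of such a heuristic. It assumes, for illustration only, that a send counts as latency-sensitive when the TX queue has nothing outstanding and the packet fits in the PIO buffer. The names, the buffer size and the exact rule are hypothetical and not taken from the sfc driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PIOBUF_SIZE 256 /* hypothetical size of the device-side buffer */

struct tx_queue {
	unsigned int pending;              /* descriptors not yet completed */
	unsigned char piobuf[PIOBUF_SIZE]; /* stands in for device memory */
};

/* Hypothetical heuristic: an idle queue plus a small packet suggests a
 * latency-sensitive send; otherwise assume throughput matters more. */
static bool tx_may_pio(const struct tx_queue *q, size_t len)
{
	return q->pending == 0 && len <= PIOBUF_SIZE;
}

/* Returns true if the CPU pushed the packet (PIO), false if it was left
 * for the device to fetch from host memory by DMA. */
static bool tx_send(struct tx_queue *q, const void *pkt, size_t len)
{
	bool pio = tx_may_pio(q, len);

	if (pio)
		memcpy(q->piobuf, pkt, len); /* CPU writes data into the device */
	/* else: post a DMA descriptor pointing at host memory (not modelled) */
	q->pending++;
	return pio;
}
```

The trade-off is visible in the memcpy(): CPU time is spent writing the payload so the device does not need a PCIe (or CXL) round trip to fetch it.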
* Re: [PATCH v16 04/22] cxl: Move register/capability check to driver
2025-05-22 19:51 ` Dan Williams
@ 2025-05-23 9:12 ` Alejandro Lucero Palau
2025-05-23 16:55 ` Dan Williams
0 siblings, 1 reply; 84+ messages in thread
From: Alejandro Lucero Palau @ 2025-05-23 9:12 UTC
To: Dan Williams, alejandro.lucero-palau, linux-cxl, netdev,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Jonathan Cameron
On 5/22/25 20:51, Dan Williams wrote:
> Alejandro Lucero Palau wrote:
> [..]
>> You did not want a new capability field added to the current structs
>> because the information needed was already there in the map. I think it
>> was a fair comment, so the caps, a variable the caller provides, is set
>> during discovery without adding any internal struct.
> The objection was not limited to data structure changes, it also
> includes sprinkling an @caps argument throughout the stack for an
> as yet to be determined benefit.
>
>> Regarding what you suggest below, I have to disagree. This change was
>> introduced to deal with a driver using CXL, that is, a Type2 or
>> future Type1 driver. IMO, most of the inner workings should be hidden from
>> those clients, and therefore working with the map struct is far from
>> ideal; moreover, it is not currently accessible from those drivers.
> Checking a couple validity flags in a now public (in include/cxl/pci.h)
> data-structure is far from ideal?
>
>> With these new drivers the core does not know what should be there, so
>> the check is delayed and left to the driver.
> Correct, left to the driver to read from an existing mechanism.
>
>> IMO, from a Type2/Type1 driver perspective, it is better to deal with
>> caps expected/found than being aware of those internal CXL register
>> discovery and maps.
> Not if a maintainer of the CXL register discovery and maps remains
> unconvinced to merge a parallel redundant mechanism to achieve the exact
> same goal.
OK. You are the maintainer and you'll get what you want. I'm not going
to fight this if no one else backs me up.
Since you refer to your position as maintainer, let me just say, in a
critical but constructive way, that I'm a bit pissed off about this.
It is not because we disagree, nor because you as the maintainer carry a
weight on this that I don't. I accept that, and I am also happy to discuss
all this with you even if I end up doing things your way. I'm pissed off
because you have been silent for months while other people in the CXL
kernel community reviewed the patches, commented and raised concerns.
I think it is discouraging that you, as the maintainer, let me and the
people involved in those reviews go down a path you disagree with and
said nothing. Again, it is not because you have another view, surely
more relevant than that of those less familiar with the code involved,
but because you disappeared for so long.
I need some days off now, which is well aligned with a four-day long
weekend for me, but I'll be back with new energy next week to
address all those concerns.
Thank you
* Re: [PATCH v16 04/22] cxl: Move register/capability check to driver
2025-05-23 9:12 ` Alejandro Lucero Palau
@ 2025-05-23 16:55 ` Dan Williams
0 siblings, 0 replies; 84+ messages in thread
From: Dan Williams @ 2025-05-23 16:55 UTC
To: Alejandro Lucero Palau, Dan Williams, alejandro.lucero-palau,
linux-cxl, netdev, edward.cree, davem, kuba, pabeni, edumazet,
dave.jiang
Cc: Jonathan Cameron
Alejandro Lucero Palau wrote:
>
> On 5/22/25 20:51, Dan Williams wrote:
> > Alejandro Lucero Palau wrote:
> > [..]
> >> You did not want a new capability field added to the current structs
> >> because the information needed was already there in the map. I think it
> >> was a fair comment, so the caps, a variable the caller provides, is set
> >> during discovery without adding any internal struct.
> > The objection was not limited to data structure changes, it also
> > includes sprinkling an @caps argument throughout the stack for an
> > as yet to be determined benefit.
> >
> >> Regarding what you suggest below, I have to disagree. This change was
> >> introduced to deal with a driver using CXL, that is, a Type2 or
> >> future Type1 driver. IMO, most of the inner workings should be hidden from
> >> those clients, and therefore working with the map struct is far from
> >> ideal; moreover, it is not currently accessible from those drivers.
> > Checking a couple validity flags in a now public (in include/cxl/pci.h)
> > data-structure is far from ideal?
> >
> >> With these new drivers the core does not know what should be there, so
> >> the check is delayed and left to the driver.
> > Correct, left to the driver to read from an existing mechanism.
> >
> >> IMO, from a Type2/Type1 driver perspective, it is better to deal with
> >> caps expected/found than being aware of those internal CXL register
> >> discovery and maps.
> > Not if a maintainer of the CXL register discovery and maps remains
> > unconvinced to merge a parallel redundant mechanism to achieve the exact
> > same goal.
>
>
> OK. You are the maintainer and you'll get what you want. I'm not going
> to fight this if no one else backs me up.
>
>
> Since you refer to your position as maintainer, let me just say, in a
> critical but constructive way, that I'm a bit pissed off about this.
>
>
> It is not because we disagree, nor because you as the maintainer carry a
> weight on this that I don't. I accept that, and I am also happy to discuss
> all this with you even if I end up doing things your way. I'm pissed off
> because you have been silent for months while other people in the CXL
> kernel community reviewed the patches, commented and raised concerns.
> I think it is discouraging that you, as the maintainer, let me and the
> people involved in those reviews go down a path you disagree with and
> said nothing. Again, it is not because you have another view, surely
> more relevant than that of those less familiar with the code involved,
> but because you disappeared for so long.
That is fair, and you are justified in being upset.
I cringed when realizing that here we are yet again at a late hour and I
have fundamental comments that postpone the merge.
It is also the case that my time is contended and I need to make
priority calls. My criteria for keeping this at the bottom of the queue,
wrongly or rightly, was that I had the sense that this is still a
performance optimization not a fundamental blocker for end users. Where
making progress on other priorities unblocked a larger number of
stakeholders or had a larger impact on end users.
There is also something missing in the CXL patch review process. It
should not be the case that we have this many review tags and versions
yet still have a memory leak in patch1, and mismatched object validity
expectations throughout. For my part I am going to take ownership of the
fact that the lifetime and locking rules of the CXL object model are not
well understood and will offer a presentation of that at the next CXL
collab meeting. If the review load for lifetime and locking can scale to
more people that hopefully helps me be less of a bottleneck ("pain in
the neck?") going forward.
> I need some days off now, which is well aligned with a four-day long
> weekend for me, but I'll be back with new energy next week to
> address all those concerns.
Your time and effort here are appreciated. Our discussion on the
register enumeration did make me realize the shaky foundations you were
looking to enhance. That back and forth revealed a path to get to what
we both want which is less complexity exported to leaf drivers *and*
minimal incremental burden on top of the core. I.e. a minimal
devm_cxl_add_memdev() is likely all SFC needs to do for basic CXL init.
Let's work towards that.
Again, apologies for the late rug pull.
* Re: [PATCH v16 02/22] sfc: add cxl support
2025-05-22 19:41 ` Dan Williams
@ 2025-06-04 8:09 ` Jonathan Cameron
0 siblings, 0 replies; 84+ messages in thread
From: Jonathan Cameron @ 2025-06-04 8:09 UTC
To: Dan Williams
Cc: Alejandro Lucero Palau, alejandro.lucero-palau, linux-cxl, netdev,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang,
Edward Cree
On Thu, 22 May 2025 12:41:19 -0700
Dan Williams <dan.j.williams@intel.com> wrote:
> Alejandro Lucero Palau wrote:
> >
> > On 5/21/25 18:12, Dan Williams wrote:
> > > Alejandro Lucero Palau wrote:
> > > [..]
> > >>>> +void efx_cxl_exit(struct efx_probe_data *probe_data)
> > >>>> +{
> > >>> So this is empty which means it leaks the cxl_dev_state_create()
> > >>> allocation, right?
> > >>
> > >> Yes, because I was wrongly relying on devres ...
> > >>
> > >>
> > >> Previous patchsets were doing the explicit release here.
> > >>
> > >>
> > >> Your suggestion below relies on adding more awareness of cxl into
> > >> generic efx code, which is what we want to avoid by using the specific efx_cxl.* files.
> > >>
> > >> As I mentioned in patch 1, I think the right thing to do is to add
> > >> devres for cxl_dev_state_create.
> > > ...but I thought netdev is anti-devres? I am ok having a
> > > devm_cxl_dev_state_create() alongside a "manual" cxl_dev_state_create()
> > > if that is the case.
> >
> >
> > But this netdev driver is using the CXL API, where devres is already used.
> > AFAIK, netdev maintainers prefer that netdev drivers not use devres,
> > but I do not think they can impose their view on an external API, especially
> > when other driver types could likely also make use of it in the future.
>
> From the CXL perspective I am neutral. As long as the parallel manual
> interfaces arrange to undo everything it "should just work (TM)". You
> would need to create the manual version of devm_cxl_add_memdev(), and
> audit that the paired cxl_del_memdev() does not result in any cxl_core
> internal devres events to leak past the ->remove() event for the
> accelerator driver.
Maybe look at wrapping the CXL calls up in a devres group. Those
are sometimes useful for cases where we need to wind all devm stuff
down at a particular point in a remove flow. Might allow us to
have devm_cxl_add_memdev() wrapped up by cxl_add_memdev() and
cxl_del_memdev() or versions of those in the net driver.
Whether that group management sits in the network driver or is in
CXL helpers is an open question and might be refined over time.
Jonathan
>
> > >> Before sending v17 with this change, are you ok with the rest of the
> > >> patches or you want to go through them as well?
> > > So I did start taking a look and then turned away upon finding a
> > > memory-leak on the first 2 patches in the series. I will continue going
> > > through it, but in general the lifetime and locking rules of the CXL
> > > subsystem continue to be a source of trouble in new enabling. At a
> > > minimum that indicates a need/opportunity to review the rules at a
> > > future CXL collab meeting.
> >
> > Great. And I agree about potential improvements mostly required after
> > all this new code (hopefully) ends up being merged, which I'll be happy
> > to contribute. Also, note that this patchset's original RFC, and every cover
> > letter since, states "basic Type2 support".
>
> It would help to define "basic" in terms of impact. How much end-user
> benefit arrives at this stage, and what is driving motivation to go
> beyond basic. E.g. "PIO buffer in CXL == X amount of goodness, and Y
> amount of goodness comes with additional changes".
>
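A devres group, as suggested above, is essentially a marker in the device's LIFO list of release actions, so everything registered after the marker can be unwound at a chosen point in ->remove() rather than at driver detach. The sketch below models that behaviour in userspace; all names are hypothetical. On a real struct device the kernel's devres_open_group() and devres_release_group() provide this.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ACTIONS 16

typedef void (*release_fn)(void *data);

/* One entry in a per-device LIFO of cleanup actions; fn == NULL marks
 * the start of a group. */
struct action {
	release_fn fn;
	void *data;
};

struct fake_dev {
	struct action stack[MAX_ACTIONS];
	size_t top;
};

/* Open a group: push a marker and return its index as the group id. */
static size_t group_open(struct fake_dev *dev)
{
	assert(dev->top < MAX_ACTIONS);
	dev->stack[dev->top] = (struct action){ NULL, NULL };
	return dev->top++;
}

static void add_action(struct fake_dev *dev, release_fn fn, void *data)
{
	assert(dev->top < MAX_ACTIONS && fn);
	dev->stack[dev->top++] = (struct action){ fn, data };
}

/* Unwind everything registered since the group was opened, newest
 * first, leaving older (outer) actions untouched. Nested group markers
 * are simply skipped. */
static void group_release(struct fake_dev *dev, size_t id)
{
	while (dev->top > id + 1) {
		struct action *a = &dev->stack[--dev->top];

		if (a->fn)
			a->fn(a->data);
	}
	dev->top = id; /* drop the marker itself */
}

/* Example release action: records the order in which actions ran. */
static int release_log[MAX_ACTIONS];
static size_t release_count;

static void record_release(void *data)
{
	release_log[release_count++] = *(int *)data;
}
```

In the scenario discussed, the net driver would open a group before the CXL calls and release it in its own remove path, so no CXL devres action outlives ->remove().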
* Re: [PATCH v16 05/22] cxl: Add function for type2 cxl regs setup
2025-05-22 20:04 ` Dan Williams
@ 2025-06-06 11:59 ` Alejandro Lucero Palau
0 siblings, 0 replies; 84+ messages in thread
From: Alejandro Lucero Palau @ 2025-06-06 11:59 UTC
To: Dan Williams, alejandro.lucero-palau, linux-cxl, netdev,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Ben Cheatham, Jonathan Cameron
On 5/22/25 21:04, Dan Williams wrote:
> Alejandro Lucero Palau wrote:
> [..]
>>> The driver should know in advance if calling:
>>>
>>> cxl_pci_setup_regs(pdev, CXL_REGLOC_RBI_MEMDEV, &map);
>>>
>>> ...will fail. Put that logic where it belongs in the probe function of
>>> the type-2 driver directly. This helper is not helping, it is just
>>> obfuscating.
>>
>> As I said in the previous email, I disagree. The CXL API should be
>> handling all this. A client only cares about certain things, let's say
>> manageable things like capabilities, without going deep into CXL specs
>> about how all that needs to be implemented. This patch introduces a
>> function embedding different calls for those inner workings which should
>> only be handled by the CXL core.
> No. Please keep this policy out of the core. Do not invent a new
> "capabilities" contract that the CXL core needs to maintain, and do not
> add thin "cxl_pci_accel_" helpers that just wrap existing core
> functionality. Call existing core functions directly and only augment
> them at the point where fundamental assumptions are violated between
> the "Type-3" and "Type-2" device models.
>
> If the fundamental assumption violation boils down to a policy
> difference between Type-2 and Type-3 then move that policy out of the
> core. For example, register discovery is a mechanism, what the client
> does with the result of that mechanism is policy and belongs in the
> leaf/client. It was an accident of implementation that mandatory Type-3
> register blocks were validated in the core not in cxl_pci from the
> outset.
While addressing this I realized that what you propose is far simpler,
which confused me for some time: how did I end up adding so much
complexity?
Then I realized the problem was the original request not to allow
Type2 drivers to use CXL core structs. All this changed with the
way cxl_dev_state_create is now implemented, so your proposal makes a lot
of sense in the current situation.
I guess the way the patchset evolved made things harder.
On another note, addressing this makes the patchset two patches shorter,
which is always a good thing.
* Re: [PATCH v16 12/22] sfc: obtain root decoder with enough HPA free space
2025-05-21 19:56 ` Dan Williams
@ 2025-06-06 12:59 ` Alejandro Lucero Palau
0 siblings, 0 replies; 84+ messages in thread
From: Alejandro Lucero Palau @ 2025-06-06 12:59 UTC
To: Dan Williams, alejandro.lucero-palau, linux-cxl, netdev,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Martin Habets, Edward Cree, Jonathan Cameron
On 5/21/25 20:56, Dan Williams wrote:
> alejandro.lucero-palau@ wrote:
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> Asking for available HPA space is the previous step to try to obtain
>> an HPA range suitable to accel driver purposes.
>>
>> Add this call to efx cxl initialization.
>>
>> Make sfc cxl build dependent on CXL region.
>>
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>> Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
>> Acked-by: Edward Cree <ecree.xilinx@gmail.com>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> ---
>> drivers/net/ethernet/sfc/Kconfig | 1 +
>> drivers/net/ethernet/sfc/efx_cxl.c | 19 +++++++++++++++++++
>> 2 files changed, 20 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/sfc/Kconfig b/drivers/net/ethernet/sfc/Kconfig
>> index 979f2801e2a8..e959d9b4f4ce 100644
>> --- a/drivers/net/ethernet/sfc/Kconfig
>> +++ b/drivers/net/ethernet/sfc/Kconfig
>> @@ -69,6 +69,7 @@ config SFC_MCDI_LOGGING
>> config SFC_CXL
>> bool "Solarflare SFC9100-family CXL support"
>> depends on SFC && CXL_BUS >= SFC
>> + depends on CXL_REGION
>> default SFC
>> help
>> This enables SFC CXL support if the kernel is configuring CXL for
>> diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
>> index 53ff97ad07f5..5635672b3fc3 100644
>> --- a/drivers/net/ethernet/sfc/efx_cxl.c
>> +++ b/drivers/net/ethernet/sfc/efx_cxl.c
>> @@ -26,6 +26,7 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
>> struct cxl_dpa_info sfc_dpa_info = {
>> .size = EFX_CTPIO_BUFFER_SIZE
>> };
>> + resource_size_t max_size;
>> struct efx_cxl *cxl;
>> u16 dvsec;
>> int rc;
>> @@ -84,6 +85,22 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
>> return PTR_ERR(cxl->cxlmd);
>> }
>>
>> + cxl->cxlrd = cxl_get_hpa_freespace(cxl->cxlmd, 1,
>> + CXL_DECODER_F_RAM | CXL_DECODER_F_TYPE2,
>> + &max_size);
>> +
>> + if (IS_ERR(cxl->cxlrd)) {
>> + pci_err(pci_dev, "cxl_get_hpa_freespace failed\n");
>> + return PTR_ERR(cxl->cxlrd);
>> + }
> This is a simple enough model, but it does mean that if async-driver
> loading causes this driver to load before cxl_acpi or cxl_mem have
> completed their init work, then it will die here.
>
> It is also worth noting that nothing stops cxl_mem or cxl_acpi from
> detaching immediately after passing the above check. So more work is
> needed here (likely post-merge) to revoke and invalidate usage of that
> freespace when that happens.
>
> Otherwise you can do something like:
>
> Driver1                    Driver2                    Notes
> cxl_get_hpa_freespace()                               "Driver1 gets rangeX"
> --- cxl_acpi unloaded ---                             "forgets rangeX was assigned"
> --- cxl_acpi reloaded ---
>                            cxl_get_hpa_freespace()    "Driver2 gets rangeX"
> use_cxl(rangeX)            use_cxl(rangeX)            "...uh oh"
I've been thinking about this and other similar comments in later
patches. I have to admit it is confusing, at least for me, because I did
not understand why cxl_acpi or cxl_mem can be removed when users/clients
depend on them. I think (though it may be a wrong assumption) they should
not be, but the code does not implement that restriction. In other words,
it is not a feature but something to fix, with two options: disallow it,
which means removal has to detect the situation, or allow it and have
removal unwind everything that depends on them.
If this assumption is right, then I understand your comments about
cxl_acquire_endpoint. It may be worth saying that I had related
cxl_acquire_endpoint to the problem with initialization and device
model probing, something that IMO requires further discussion.
> So longer term there needs to be notification back to the creator of the
> memdev to require it to handle cleaning up when the CXL topology is torn
> down either physically or logically.
>
> To date the CXL subsystem has not reset decoders on unload because it
> needs to handle coordinating with HDM decode established by platform
> firmware. Type-2 driver however should be prepared to have their CXL
> range revoked at any moment.
>
> The Type-3 case handles this because cxl_mem is the driver itself, for
> Type-2 that driver wants to coordinate with cxl_mem on these events. To
> me that looks like cxl_mem error handler operation callbacks.
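One way to picture the notification described above is a provider that remembers which clients hold an HPA range and calls back into each of them before it forgets the assignment. The sketch below is a hypothetical userspace model of the rangeX timeline: a single range, plus a revoke callback standing in for the cxl_mem error-handler style operations mentioned in the review. None of the names correspond to existing CXL core APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_HOLDERS 4

/* Hypothetical client of a CXL HPA range (e.g. a Type-2 driver). */
struct hpa_client {
	const char *name;
	bool range_valid;               /* may the client still use its range? */
	void (*revoke)(struct hpa_client *c);
};

/* Hypothetical stand-in for the root decoder's free-space tracking. */
struct hpa_root {
	struct hpa_client *holders[MAX_HOLDERS];
	size_t nr;
	bool range_x_taken;
};

static bool hpa_get(struct hpa_root *root, struct hpa_client *c)
{
	if (root->range_x_taken || root->nr == MAX_HOLDERS)
		return false;
	root->range_x_taken = true;
	root->holders[root->nr++] = c;
	c->range_valid = true;
	return true;
}

/* Teardown (e.g. the provider unbinding): notify every holder before
 * forgetting the assignment, so no two clients end up sharing rangeX. */
static void hpa_root_teardown(struct hpa_root *root)
{
	for (size_t i = 0; i < root->nr; i++)
		root->holders[i]->revoke(root->holders[i]);
	root->nr = 0;
	root->range_x_taken = false;
}

static void client_revoke(struct hpa_client *c)
{
	c->range_valid = false; /* stop touching the range and clean up */
}
```

Without the revoke loop, teardown would only clear range_x_taken, and after "reload" both clients would believe they own rangeX, which is exactly the "...uh oh" row in the timeline.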
* Re: [PATCH v16 13/22] cxl: Define a driver interface for DPA allocation
2025-05-21 20:23 ` Dan Williams
@ 2025-06-06 13:09 ` Alejandro Lucero Palau
0 siblings, 0 replies; 84+ messages in thread
From: Alejandro Lucero Palau @ 2025-06-06 13:09 UTC
To: Dan Williams, alejandro.lucero-palau, linux-cxl, netdev,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Ben Cheatham, Jonathan Cameron
On 5/21/25 21:23, Dan Williams wrote:
> alejandro.lucero-palau@ wrote:
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> Region creation involves finding available DPA (device-physical-address)
>> capacity to map into HPA (host-physical-address) space.
>>
>> In order to support CXL Type2 devices, define an API, cxl_request_dpa(),
>> that tries to allocate the DPA memory the driver requires to operate. The
>> memory requested should not be bigger than the max available HPA obtained
>> previously with cxl_get_hpa_freespace.
>>
>> Based on https://lore.kernel.org/linux-cxl/168592158743.1948938.7622563891193802610.stgit@dwillia2-xfh.jf.intel.com/
>>
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> ---
>> drivers/cxl/core/hdm.c | 86 ++++++++++++++++++++++++++++++++++++++++++
>> include/cxl/cxl.h | 5 +++
>> 2 files changed, 91 insertions(+)
>>
>> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
>> index 70cae4ebf8a4..500df2deceef 100644
>> --- a/drivers/cxl/core/hdm.c
>> +++ b/drivers/cxl/core/hdm.c
>> @@ -3,6 +3,7 @@
>> #include <linux/seq_file.h>
>> #include <linux/device.h>
>> #include <linux/delay.h>
>> +#include <cxl/cxl.h>
>>
>> #include "cxlmem.h"
>> #include "core.h"
>> @@ -546,6 +547,13 @@ resource_size_t cxl_dpa_resource_start(struct cxl_endpoint_decoder *cxled)
>> return base;
>> }
>>
>> +/**
>> + * cxl_dpa_free - release DPA (Device Physical Address)
>> + *
>> + * @cxled: endpoint decoder linked to the DPA
>> + *
>> + * Returns 0 or error.
>> + */
>> int cxl_dpa_free(struct cxl_endpoint_decoder *cxled)
>> {
>> struct cxl_port *port = cxled_to_port(cxled);
>> @@ -572,6 +580,7 @@ int cxl_dpa_free(struct cxl_endpoint_decoder *cxled)
>> devm_cxl_dpa_release(cxled);
>> return 0;
>> }
>> +EXPORT_SYMBOL_NS_GPL(cxl_dpa_free, "CXL");
>>
>> int cxl_dpa_set_part(struct cxl_endpoint_decoder *cxled,
>> enum cxl_partition_mode mode)
>> @@ -686,6 +695,83 @@ int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, unsigned long long size)
>> return devm_add_action_or_reset(&port->dev, cxl_dpa_release, cxled);
>> }
>>
>> +static int find_free_decoder(struct device *dev, const void *data)
>> +{
>> + struct cxl_endpoint_decoder *cxled;
>> + struct cxl_port *port;
>> +
>> + if (!is_endpoint_decoder(dev))
>> + return 0;
>> +
>> + cxled = to_cxl_endpoint_decoder(dev);
>> + port = cxled_to_port(cxled);
>> +
>> + if (cxled->cxld.id != port->hdm_end + 1)
>> + return 0;
>> +
>> + return 1;
>> +}
>> +
>> +/**
>> + * cxl_request_dpa - search and reserve DPA given input constraints
>> + * @cxlmd: memdev with an endpoint port with available decoders
>> + * @mode: DPA operation mode (ram vs pmem)
>> + * @alloc: dpa size required
>> + *
>> + * Returns a pointer to a cxl_endpoint_decoder struct or an error
>> + *
>> + * Given that a region needs to allocate from limited HPA capacity it
>> + * may be the case that a device has more mappable DPA capacity than
>> + * available HPA. The expectation is that @alloc is a driver known
>> + * value based on the device capacity but it could not be available
>> + * due to HPA constraints.
>> + *
>> + * Returns a pinned cxl_decoder with at least @alloc bytes of capacity
>> + * reserved, or an error pointer. The caller is also expected to own the
>> + * lifetime of the memdev registration associated with the endpoint to
>> + * pin the decoder registered as well.
>> + */
>> +struct cxl_endpoint_decoder *cxl_request_dpa(struct cxl_memdev *cxlmd,
>> + enum cxl_partition_mode mode,
>> + resource_size_t alloc)
>> +{
>> + struct cxl_port *endpoint = cxlmd->endpoint;
>> + struct cxl_endpoint_decoder *cxled;
>> + struct device *cxled_dev;
>> + int rc;
>> +
>> + if (!IS_ALIGNED(alloc, SZ_256M))
>> + return ERR_PTR(-EINVAL);
>> +
>> + down_read(&cxl_dpa_rwsem);
>> + cxled_dev = device_find_child(&endpoint->dev, NULL, find_free_decoder);
>> + up_read(&cxl_dpa_rwsem);
> In another effort [1] I am trying to get rid of all explicit unlock
> management of cxl_dpa_rwsem and cxl_region_rwsem, and ultimately get rid
> of all "goto" use in the CXL core.
>
> [1]: http://lore.kernel.org/20250507072145.3614298-1-dan.j.williams@intel.com
>
> So that conversion here would be:
>
> DEFINE_FREE(put_cxled, struct cxl_endpoint_decoder *, if (_T) put_device(&_T->cxld.dev))
> struct cxl_endpoint_decoder *cxl_find_free_decoder(struct cxl_memdev *cxlmd)
> {
> struct cxl_port *endpoint = cxlmd->endpoint;
> struct device *dev;
>
> scoped_guard(rwsem_read, &cxl_dpa_rwsem)
> dev = device_find_child(&endpoint->dev, NULL, find_free_decoder);
> if (dev)
> return to_cxl_endpoint_decoder(dev);
> return NULL;
> }
>
> ...and then:
>
> struct cxl_endpoint_decoder *cxled __free(put_cxled) = cxl_find_free_decoder(cxlmd);
>
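For readers less familiar with the scope-based helpers referenced in this suggestion: the kernel's __free()/DEFINE_FREE() and guard()/scoped_guard() in include/linux/cleanup.h are built on the compiler's cleanup attribute. The userspace sketch below shows the same idea with a hypothetical refcounted object: the reference taken by a lookup is dropped on every return path without a goto label. The auto_put macro and all other names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal refcounted object standing in for a decoder's struct device. */
struct obj {
	int refs;
};

static void obj_put(struct obj *o)
{
	o->refs--;
}

/* Cleanup handler: the compiler passes a pointer to the variable. */
static void obj_put_cleanup(struct obj **p)
{
	if (*p)
		obj_put(*p);
}

/* Hypothetical analogue of the kernel's __free(put_cxled) annotation. */
#define auto_put __attribute__((cleanup(obj_put_cleanup)))

/* Lookup that returns a new reference, like device_find_child(). */
static struct obj *find_obj(struct obj *candidate, int want)
{
	if (!want)
		return NULL;
	candidate->refs++;
	return candidate;
}

/* Every return path drops the reference; no goto/err label needed. */
static int use_obj(struct obj *candidate, int want, int fail_midway)
{
	struct obj *o auto_put = find_obj(candidate, want);

	if (!o)
		return -1;
	if (fail_midway)
		return -2; /* reference is still dropped on this early return */
	return 0;
}
```

This is why the suggested conversion removes the `goto err` paths from cxl_request_dpa(): the put_device() happens automatically when cxled goes out of scope.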
>> +
>> + if (!cxled_dev)
>> + return ERR_PTR(-ENXIO);
>> +
>> + cxled = to_cxl_endpoint_decoder(cxled_dev);
>> +
>> + if (!cxled) {
>> + rc = -ENODEV;
>> + goto err;
>> + }
>> +
>> + rc = cxl_dpa_set_part(cxled, mode);
>> + if (rc)
>> + goto err;
>> +
>> + rc = cxl_dpa_alloc(cxled, alloc);
> The current user of this interface is sysfs. The expectation there is
> that if 2 userspace threads are racing to allocate DPA space, the kernel
> will protect itself and not get confused, but the result will be that
> one thread loses the race and needs to redo its allocation.
>
> That's not an interface that the kernel can support, so there needs to
> be some locking to enforce that 2 threads racing cxl_request_dpa() each
> end up with independent allocations. That likely needs to be a
> synchronization primitive over the entire process due to the way that
> CXL requires in-order allocation of DPA and HPA. Effectively you need to
> complete the entire HPA allocation, DPA allocation, and decoder
> programming in one atomic unit.
I do not understand this atomic requirement. As I understand it,
dpa_alloc, with the proper locking, will satisfy just one request if two
contend, and the second one will need to retry. Once the decoder
resources have been obtained, there is nothing to worry about, assuming
the commit of those decoders is also performed with the proper locking.
>
> I think to start since there is only 1 Type-2 driver in the kernel and
> its only use case is single-threaded setup this is not yet an immediate
> problem.
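The difference between the two positions can be sketched as follows: if selecting the next in-order decoder and carving out its DPA happen under one lock, two concurrent callers always end up with distinct decoders and non-overlapping DPA, and neither has to retry. This is a hypothetical userspace model (names, decoder count and capacity are made up; the 256 MiB alignment check mirrors the IS_ALIGNED() test in the patch). It covers only decoder selection and DPA allocation, whereas the review argues that HPA allocation and decoder programming belong in the same critical section.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

#define SZ_256M (256ULL << 20)
#define NR_DECODERS 4

/* Hypothetical model of one endpoint: decoders must be used in order. */
struct endpoint {
	pthread_mutex_t lock; /* covers the whole find + alloc sequence */
	int hdm_end;          /* last claimed decoder id, -1 if none */
	uint64_t dpa_next;    /* next free device-physical address */
	uint64_t dpa_size;    /* total DPA capacity */
};

struct dpa_request {
	struct endpoint *ep;
	uint64_t size;
	int decoder_id;       /* out: decoder used, or negative errno */
	uint64_t dpa_base;    /* out: start of the allocated DPA */
};

static void *request_dpa(void *arg)
{
	struct dpa_request *req = arg;
	struct endpoint *ep = req->ep;

	if (req->size % SZ_256M) {
		req->decoder_id = -EINVAL;
		return NULL;
	}

	pthread_mutex_lock(&ep->lock);
	/* The only usable decoder is the one right after hdm_end... */
	int id = ep->hdm_end + 1;

	if (id >= NR_DECODERS || ep->dpa_next + req->size > ep->dpa_size) {
		req->decoder_id = -ENOSPC;
	} else {
		/* ...and claiming it plus its DPA is one indivisible step. */
		req->decoder_id = id;
		req->dpa_base = ep->dpa_next;
		ep->dpa_next += req->size;
		ep->hdm_end = id;
	}
	pthread_mutex_unlock(&ep->lock);
	return NULL;
}
```

If the lock were dropped between reading hdm_end and updating it, both threads could pick the same decoder id, which is the in-kernel equivalent of the sysfs "loser retries" behaviour.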
* Re: [PATCH v16 18/22] cxl: Allow region creation by type2 drivers
2025-05-21 20:45 ` Dan Williams
@ 2025-06-06 13:27 ` Alejandro Lucero Palau
0 siblings, 0 replies; 84+ messages in thread
From: Alejandro Lucero Palau @ 2025-06-06 13:27 UTC
To: Dan Williams, alejandro.lucero-palau, linux-cxl, netdev,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Jonathan Cameron
On 5/21/25 21:45, Dan Williams wrote:
> alejandro.lucero-palau@ wrote:
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> Creating a CXL region requires userspace intervention through the cxl
>> sysfs files. Type2 support should allow accelerator drivers to create
>> such cxl region from kernel code.
>>
>> Add that functionality and integrate it with the current support for
>> memory expanders.
>>
>> Based on https://lore.kernel.org/linux-cxl/168592159835.1948938.1647215579839222774.stgit@dwillia2-xfh.jf.intel.com/
>>
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> ---
>> drivers/cxl/core/region.c | 140 +++++++++++++++++++++++++++++++++++---
>> drivers/cxl/port.c | 5 +-
>> include/cxl/cxl.h | 4 ++
>> 3 files changed, 140 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
>> index 4113ee6daec9..f82da914d125 100644
>> --- a/drivers/cxl/core/region.c
>> +++ b/drivers/cxl/core/region.c
>> @@ -2316,6 +2316,21 @@ static int cxl_region_detach(struct cxl_endpoint_decoder *cxled)
>> return rc;
>> }
>>
>> +/**
>> + * cxl_accel_region_detach - detach a region from a Type2 device
>> + *
>> + * @cxled: Type2 endpoint decoder to detach the region from.
>> + *
>> + * Returns 0 or error.
>> + */
>> +int cxl_accel_region_detach(struct cxl_endpoint_decoder *cxled)
>> +{
>> + guard(rwsem_write)(&cxl_region_rwsem);
>> + cxled->part = -1;
>> + return cxl_region_detach(cxled);
>> +}
>> +EXPORT_SYMBOL_NS_GPL(cxl_accel_region_detach, "CXL");
> There's nothing "accel" about the above sequence, it is nearly identical
> to cxl_decoder_kill_region().
>
> In general there does not need to be a parallel universe of "cxl_accel_"
> helpers for Type-2, just use existing infrastructure and maybe enlighten
> it a bit to accommodate a Type-2 nuance.
>
>> +
>> void cxl_decoder_kill_region(struct cxl_endpoint_decoder *cxled)
>> {
>> down_write(&cxl_region_rwsem);
>> @@ -2822,6 +2837,14 @@ cxl_find_region_by_name(struct cxl_root_decoder *cxlrd, const char *name)
>> return to_cxl_region(region_dev);
>> }
>>
>> +static void drop_region(struct cxl_region *cxlr)
>> +{
>> + struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
>> + struct cxl_port *port = cxlrd_to_port(cxlrd);
>> +
>> + devm_release_action(port->uport_dev, unregister_region, cxlr);
>> +}
>> +
>> static ssize_t delete_region_store(struct device *dev,
>> struct device_attribute *attr,
>> const char *buf, size_t len)
>> @@ -3526,14 +3549,12 @@ static int __construct_region(struct cxl_region *cxlr,
>> return 0;
>> }
>>
>> -/* Establish an empty region covering the given HPA range */
>> -static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
>> - struct cxl_endpoint_decoder *cxled)
>> +static struct cxl_region *construct_region_begin(struct cxl_root_decoder *cxlrd,
>> + struct cxl_endpoint_decoder *cxled)
>> {
>> struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
>> - struct cxl_port *port = cxlrd_to_port(cxlrd);
>> struct cxl_dev_state *cxlds = cxlmd->cxlds;
>> - int rc, part = READ_ONCE(cxled->part);
>> + int part = READ_ONCE(cxled->part);
>> struct cxl_region *cxlr;
>>
>> do {
>> @@ -3542,13 +3563,23 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
>> cxled->cxld.target_type);
>> } while (IS_ERR(cxlr) && PTR_ERR(cxlr) == -EBUSY);
>>
>> - if (IS_ERR(cxlr)) {
>> + if (IS_ERR(cxlr))
>> dev_err(cxlmd->dev.parent,
>> "%s:%s: %s failed assign region: %ld\n",
>> dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
>> __func__, PTR_ERR(cxlr));
>> - return cxlr;
>> - }
>> + return cxlr;
>> +};
>> +
>> +/* Establish an empty region covering the given HPA range */
>> +static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
>> + struct cxl_endpoint_decoder *cxled)
>> +{
>> + struct cxl_port *port = cxlrd_to_port(cxlrd);
>> + struct cxl_region *cxlr;
>> + int rc;
>> +
>> + cxlr = construct_region_begin(cxlrd, cxled);
>>
>> rc = __construct_region(cxlr, cxlrd, cxled);
>> if (rc) {
>> @@ -3559,6 +3590,99 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
>> return cxlr;
>> }
>>
>> +static struct cxl_region *
>> +__construct_new_region(struct cxl_root_decoder *cxlrd,
>> + struct cxl_endpoint_decoder *cxled, int ways)
> What is the point of an @ways argument when @cxled is not an array? It
> was an array in the original proposal. Recall that this interface needs
> to be useful not only to Type-2 but also the nascent CXL PMEM case which
> will likely need to create interleave CXL PMEM regions from label data.
Yes, you are right. I'll fix it and add the CXL PMEM case in the commit
message.
Thanks
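As Dan points out, an @ways argument only makes sense if the endpoint decoders are passed as an array, which the CXL PMEM case would need for interleaved regions. A hedged sketch of the argument checks such an array-based interface might perform; mock_cxled and mock_check_targets are hypothetical, and the accepted counts assume the interleave-way values defined for CXL HDM decoders (1, 2, 4, 8, 16 and 3, 6, 12).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an endpoint decoder; not the kernel's struct. */
struct mock_cxled {
	int id;
};

/* Interleave-way counts defined for CXL HDM decoders. */
static int mock_ways_valid(int ways)
{
	switch (ways) {
	case 1: case 2: case 4: case 8: case 16:
	case 3: case 6: case 12:
		return 1;
	default:
		return 0;
	}
}

/*
 * Check the first @ways entries of @cxled before building an interleaved
 * region: the count must be a legal interleave, every slot must be
 * populated, and no endpoint decoder may appear twice.
 */
static int mock_check_targets(struct mock_cxled *const cxled[], int ways)
{
	assert(cxled != NULL);
	if (!mock_ways_valid(ways))
		return -EINVAL;
	for (int i = 0; i < ways; i++) {
		if (!cxled[i])
			return -EINVAL;
		for (int j = 0; j < i; j++)
			if (cxled[j] == cxled[i])
				return -EEXIST;
	}
	return 0;
}
```

A Type-2 caller would simply pass a one-element array with ways set to 1, so the same entry point can serve both the accelerator and the interleaved PMEM cases.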
* Re: [PATCH v16 19/22] cxl: Add region flag for precluding a device memory to be used for dax
2025-05-21 20:49 ` Dan Williams
@ 2025-06-06 13:39 ` Alejandro Lucero Palau
0 siblings, 0 replies; 84+ messages in thread
From: Alejandro Lucero Palau @ 2025-06-06 13:39 UTC (permalink / raw)
To: Dan Williams, alejandro.lucero-palau, linux-cxl, netdev,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Zhi Wang, Jonathan Cameron, Ben Cheatham
On 5/21/25 21:49, Dan Williams wrote:
> alejandro.lucero-palau@ wrote:
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> By definition a type2 cxl device will use the host managed memory for
>> specific functionality, therefore it should not be available to other
>> uses. However, a dax interface could be just good enough in some cases.
>>
>> Add a flag to a cxl region to explicitly state that no dax device should
>> be created. Allow a Type2 driver to set that flag at region creation time.
>>
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>> Reviewed-by: Zhi Wang <zhiw@nvidia.com>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
>> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
>> ---
> What was wrong with the original proposal?
It was suggested that, although this is the most likely requirement for
most accelerators, someone could be happy with automatic
dax creation.
>
> +
> + /*
> + * HDM-D[B] (device-memory) regions have accelerator
> + * specific usage, skip device-dax registration.
> + */
> + if (cxlr->type == CXL_DECODER_DEVMEM)
> + return 0;
>
> I really do not want the ABI presentation policy layer leaking that deep into
> the region creation flow. Another way to determine this is if devices
> hosting the region are not driven by the generic CXL device memory class
> driver 'cxl_pci'.
I have found a problem with this patch which precludes its goal (no_dax
flag set after region probing). I was going to fix it but after your
comment, I will just drop the no_dax option.
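Dan's alternative to a no_dax flag is to derive the policy from the topology: skip device-dax registration when the devices hosting the region are not driven by cxl_pci. A sketch of that check under stated assumptions: each hosting device is reduced to the name of its bound PCI driver, and mock_target and mock_want_devdax are illustrative names, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: name of the PCI driver bound to each hosting device. */
struct mock_target {
	const char *pci_driver; /* NULL when no driver is bound */
};

/*
 * Policy derived from the topology instead of a region flag: register
 * device-dax only when every hosting device is driven by the generic
 * "cxl_pci" class driver. Any other driver (an accelerator driver, for
 * example) is assumed to own the memory for its own purposes.
 */
static bool mock_want_devdax(const struct mock_target *targets, size_t n)
{
	if (n == 0)
		return false;
	assert(targets != NULL);
	for (size_t i = 0; i < n; i++) {
		const char *drv = targets[i].pci_driver;

		if (!drv || strcmp(drv, "cxl_pci") != 0)
			return false;
	}
	return true;
}
```

Keeping the decision at this layer means region creation itself needs no knowledge of the ABI presentation policy, which is the separation Dan asks for.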
* Re: [PATCH v16 20/22] sfc: create cxl region
2025-05-21 21:01 ` Dan Williams
@ 2025-06-06 13:44 ` Alejandro Lucero Palau
0 siblings, 0 replies; 84+ messages in thread
From: Alejandro Lucero Palau @ 2025-06-06 13:44 UTC (permalink / raw)
To: Dan Williams, alejandro.lucero-palau, linux-cxl, netdev,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Martin Habets, Edward Cree, Jonathan Cameron
On 5/21/25 22:01, Dan Williams wrote:
> alejandro.lucero-palau@ wrote:
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> Use the cxl API to create a region using the endpoint decoder related to
>> a DPA range, specifying that no DAX device should be created.
>>
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>> Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
>> Acked-by: Edward Cree <ecree.xilinx@gmail.com>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> ---
>> drivers/net/ethernet/sfc/efx_cxl.c | 10 ++++++++++
>> 1 file changed, 10 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
>> index 20db9aa382ec..960293a04ed3 100644
>> --- a/drivers/net/ethernet/sfc/efx_cxl.c
>> +++ b/drivers/net/ethernet/sfc/efx_cxl.c
>> @@ -110,10 +110,19 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
>> goto sfc_put_decoder;
>> }
>>
>> + cxl->efx_region = cxl_create_region(cxl->cxlrd, cxl->cxled, 1, true);
>> + if (IS_ERR(cxl->efx_region)) {
>> + pci_err(pci_dev, "CXL accel create region failed");
>> + rc = PTR_ERR(cxl->efx_region);
>> + goto err_region;
>> + }
>> +
>> probe_data->cxl = cxl;
>>
>> return 0;
>>
>> +err_region:
>> + cxl_dpa_free(cxl->cxled);
>> sfc_put_decoder:
>> cxl_put_root_decoder(cxl->cxlrd);
>> return rc;
>> @@ -122,6 +131,7 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
>> void efx_cxl_exit(struct efx_probe_data *probe_data)
>> {
>> if (probe_data->cxl) {
>> + cxl_accel_region_detach(probe_data->cxl->cxled);
> Here is more late magic hoping that cxl_accel_region_detach() can
> actually find something useful to do at this point. I notice that this
> series has dropped the cxl_acquire_endpoint() proposal which at least
> guaranteed a consistent state of the world throughout this whole
> process.
>
> Did I miss the discussion where that approach was abandoned?
As I have commented in patch 12, I was not contemplating those modules
to be removed while the SFC driver depends on them.
>
> The idea would be that at setup time you do:
>
> add_memdev()
> acquire_endpoint()
> register_hdm_error_handlers()
> get_hpa()
> get_dpa()
> create_region()
> release_endpoint()
>
> ...where that new register_hdm_error_handlers() is what coordinates
> cleaning up everything the type-2 driver cares about upon the memdev
> being detached from the CXL topology.
To clarify: we assume this detachment is due to cxl_mem or cxl_acpi
being removed, so a clean unwind has to be implemented.
Correct?
>
> Then your efx_cxl_exit() is automatically handled by:
>
> del_memdev()
>
> ...which includes detaching the memdev from the cxl topology.
>
>> cxl_dpa_free(probe_data->cxl->cxled);
>> cxl_put_root_decoder(probe_data->cxl->cxlrd);
> Otherwise, these long held references are not buying you anything but
> the ability to determine "whoops, should have let go of these
> resources a long time ago, everything I needed for cleanup is now in a
> defunct state".
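The control flow of the setup sequence Dan outlines can be sketched as follows, assuming the endpoint is held from acquire_endpoint() to release_endpoint() and that a failure unwinds the completed steps in reverse before the endpoint is released. Step names are taken from his outline; mock_step, mock_setup_held and the text log are hypothetical stand-ins for real CXL calls, and the del_memdev() teardown path driven by the HDM error handlers is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical setup step: a name and whether its "do" action fails. */
struct mock_step {
	const char *name;
	int fail;
};

/* Log of actions taken, in order, so the sequence can be inspected. */
static char mock_log[256];

static void mock_record(const char *verb, const char *name)
{
	size_t len = strlen(mock_log);

	snprintf(mock_log + len, sizeof(mock_log) - len, "%s:%s ", verb, name);
}

/*
 * Run the steps while the endpoint is held. On failure, undo only the
 * steps that completed, newest first, then release the endpoint in
 * either case so the topology is never left half-configured.
 */
static int mock_setup_held(const struct mock_step *steps, int n)
{
	int i, rc = 0;

	assert(steps != NULL || n == 0);
	mock_log[0] = '\0';
	mock_record("acquire", "endpoint");
	for (i = 0; i < n; i++) {
		if (steps[i].fail) {
			mock_record("fail", steps[i].name);
			rc = -1;
			break;
		}
		mock_record("do", steps[i].name);
	}
	if (rc)
		while (--i >= 0)
			mock_record("undo", steps[i].name);
	mock_record("release", "endpoint");
	return rc;
}
```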
* Re: [PATCH v16 21/22] cxl: Add function for obtaining region range
2025-05-21 21:31 ` Dan Williams
@ 2025-06-06 14:03 ` Alejandro Lucero Palau
0 siblings, 0 replies; 84+ messages in thread
From: Alejandro Lucero Palau @ 2025-06-06 14:03 UTC (permalink / raw)
To: Dan Williams, alejandro.lucero-palau, linux-cxl, netdev,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Zhi Wang, Jonathan Cameron
On 5/21/25 22:31, Dan Williams wrote:
> alejandro.lucero-palau@ wrote:
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> A CXL region struct contains the physical address to work with.
>>
>> Type2 drivers can create a CXL region but have no access to the
>> related struct as it is defined as private by the kernel CXL core.
>> Add a function for getting the cxl region range to be used for mapping
>> such memory range by a Type2 driver.
>>
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>> Reviewed-by: Zhi Wang <zhiw@nvidia.com>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
>> ---
>> drivers/cxl/core/region.c | 23 +++++++++++++++++++++++
>> include/cxl/cxl.h | 2 ++
>> 2 files changed, 25 insertions(+)
>>
>> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
>> index 06647bae210f..9b7c6b8304d6 100644
>> --- a/drivers/cxl/core/region.c
>> +++ b/drivers/cxl/core/region.c
>> @@ -2726,6 +2726,29 @@ static struct cxl_region *devm_cxl_add_region(struct cxl_root_decoder *cxlrd,
>> return ERR_PTR(rc);
>> }
>>
>> +/**
>> + * cxl_get_region_range - obtain range linked to a CXL region
>> + *
>> + * @region: a pointer to struct cxl_region
>> + * @range: a pointer to a struct range to be set
>> + *
>> + * Returns 0 or error.
>> + */
>> +int cxl_get_region_range(struct cxl_region *region, struct range *range)
>> +{
>> + if (WARN_ON_ONCE(!region))
>> + return -ENODEV;
>> +
>> + if (!region->params.res)
>> + return -ENOSPC;
>> +
>> + range->start = region->params.res->start;
>> + range->end = region->params.res->end;
> Region params are only consistent under cxl_region_rwsem. Whatever is
> consuming this will want to have that consistent snapshot and some
> coordination with the region shutdown / de-commit flow.
I assumed the owner of the region could be confident the region would be
there ...
This is more of the same problem discussed in previous patches ... where
having cxl_acquire_endpoint makes sense. But it makes me wonder if we
should allow cxl_acpi or cxl_mem to be removed at all while clients
depend on them. Is there a case for this apart from the fact that
the current implementation with kernel modules allows it?
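Dan's point about cxl_get_region_range() is that region params are only consistent under cxl_region_rwsem. A userspace sketch of taking a consistent snapshot under the read side of a reader/writer lock, with teardown taking the write side; pthread_rwlock_t stands in for the rwsem, and mock_region, mock_range and mock_get_region_range are hypothetical. The -ENODEV and -ENOSPC returns mirror the quoted patch. This only makes the snapshot consistent; it does not provide the coordination with the region shutdown / de-commit flow that Dan also asks for.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-ins; not the kernel's struct cxl_region or struct range. */
struct mock_range {
	uint64_t start, end;
};

struct mock_region {
	int has_res; /* cleared when the region is torn down */
	struct mock_range res;
};

/* Plays the role of cxl_region_rwsem in this model. */
static pthread_rwlock_t mock_region_rwsem = PTHREAD_RWLOCK_INITIALIZER;

/*
 * Copy start/end while holding the read side, so the caller never sees a
 * half-updated range even if teardown (write side) runs concurrently.
 */
static int mock_get_region_range(struct mock_region *r, struct mock_range *out)
{
	int rc = 0;

	if (!r)
		return -ENODEV;
	assert(out != NULL);
	pthread_rwlock_rdlock(&mock_region_rwsem);
	if (!r->has_res)
		rc = -ENOSPC;
	else
		*out = r->res;
	pthread_rwlock_unlock(&mock_region_rwsem);
	return rc;
}

/* Teardown path: the writer invalidates the resource. */
static void mock_region_teardown(struct mock_region *r)
{
	pthread_rwlock_wrlock(&mock_region_rwsem);
	r->has_res = 0;
	pthread_rwlock_unlock(&mock_region_rwsem);
}
```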
>
> This again raises the question, what do you expect to happen after the
> ->remove(&cxlmd->dev) event?
>
> For Type-3 the expectation is leave all the decoders in place and
> reassemble the region from past hardware settings (as if the decode
> range had been established by platform firmware).
Not sure I follow this logic. So Type3 devices can still be accessed
after cxl_acpi or cxl_mem are removed? How does it work when those
modules are loaded again?
>
> Another model could be "never trust an existing decoder and always reset
> the configuration when the driver loads". That would also involve walking
> the topology to reset any upstream switch decoders that were decoding
> the old configuration.
>
> The current model in these patches is "unwind nothing at cxl_mem detach
> time, hope that probe_data->cxl->cxlmd is attached immediately upon
> return from devm_add_cxl_memdev(), hope that it remains attached until
> efx_cxl_exit() runs, and always assume a fresh "from scratch" HDM decode
> configuration at efx_cxl_init() time".
Part of that expected model is fine as long as the sfc driver, on exit,
does the cxl unwinding, like cxl_accel_region_detach. That should leave
the SFC CXL HDM decoder in a fresh-start state... Anyway, I got what you
are warning about here and in previous patches, so I will try to address
them, or at least identify most of the potential situations, using your
reviews as the starting point. Maybe it is a good moment for going back
to my statement about the patchset being about "basic" support ...
>
> I do often cringe at the complexity of the CXL subsystem, but it is all
> complexity that the CXL programming model mandates. Specifically, CXL
> window capacity being a dynamically assigned limited resource that needs
> runtime re-configuration across multiple devices/switches, and resources
> that can interleave host-bridges and endpoints. Compare that to PCIe
> that mostly statically assigns MMIO resources throughout the topology,
> rarely needs to reassign that, and never interleaves.
>
> Yes, there is some self-inflicted complexity in the CXL subsystem
> introduced by allowing drivers like cxl_mem and cxl_acpi to logically
> detach at runtime. However given cxl_mem needs to be prepared for
> physical detachment there is no simple escape from handling the "dynamic
> CXL HDM decode teardown" problem.
end of thread, other threads:[~2025-06-06 14:03 UTC | newest]
Thread overview: 84+ messages
2025-05-14 13:27 [PATCH v16 00/22] Type2 device basic support alejandro.lucero-palau
2025-05-14 13:27 ` [PATCH v16 01/22] cxl: Add type2 " alejandro.lucero-palau
2025-05-20 2:43 ` Alison Schofield
2025-05-20 7:18 ` Alejandro Lucero Palau
2025-05-20 20:06 ` Dave Jiang
2025-05-21 9:30 ` Alejandro Lucero Palau
2025-05-20 7:17 ` dan.j.williams
2025-05-21 10:44 ` Alejandro Lucero Palau
2025-05-14 13:27 ` [PATCH v16 02/22] sfc: add cxl support alejandro.lucero-palau
2025-05-20 7:37 ` dan.j.williams
2025-05-21 10:50 ` Alejandro Lucero Palau
2025-05-21 17:12 ` Dan Williams
2025-05-22 8:49 ` Alejandro Lucero Palau
2025-05-22 19:41 ` Dan Williams
2025-06-04 8:09 ` Jonathan Cameron
2025-05-14 13:27 ` [PATCH v16 03/22] cxl: Move pci generic code alejandro.lucero-palau
2025-05-20 2:42 ` Alison Schofield
2025-05-21 17:44 ` Dan Williams
2025-05-14 13:27 ` [PATCH v16 04/22] cxl: Move register/capability check to driver alejandro.lucero-palau
2025-05-20 2:41 ` Alison Schofield
2025-05-21 18:23 ` Dan Williams
2025-05-22 9:45 ` Alejandro Lucero Palau
2025-05-22 19:51 ` Dan Williams
2025-05-23 9:12 ` Alejandro Lucero Palau
2025-05-23 16:55 ` Dan Williams
2025-05-14 13:27 ` [PATCH v16 05/22] cxl: Add function for type2 cxl regs setup alejandro.lucero-palau
2025-05-20 2:41 ` Alison Schofield
2025-05-21 18:28 ` Dan Williams
2025-05-22 9:52 ` Alejandro Lucero Palau
2025-05-22 20:04 ` Dan Williams
2025-06-06 11:59 ` Alejandro Lucero Palau
2025-05-14 13:27 ` [PATCH v16 06/22] sfc: make regs setup with checking and set media ready alejandro.lucero-palau
2025-05-21 18:34 ` Dan Williams
2025-05-22 10:07 ` Alejandro Lucero Palau
2025-05-22 20:22 ` Dan Williams
2025-05-22 20:53 ` Dan Williams
2025-05-22 21:09 ` Dan Williams
2025-05-14 13:27 ` [PATCH v16 07/22] cxl: Support dpa initialization without a mailbox alejandro.lucero-palau
2025-05-20 2:40 ` Alison Schofield
2025-05-21 18:47 ` Dan Williams
2025-05-22 10:24 ` Alejandro Lucero Palau
2025-05-14 13:27 ` [PATCH v16 08/22] sfc: initialize dpa alejandro.lucero-palau
2025-05-14 13:27 ` [PATCH v16 09/22] cxl: Prepare memdev creation for type2 alejandro.lucero-palau
2025-05-20 2:40 ` Alison Schofield
2025-05-21 18:49 ` Dan Williams
2025-05-14 13:27 ` [PATCH v16 10/22] sfc: create type2 cxl memdev alejandro.lucero-palau
2025-05-14 13:27 ` [PATCH v16 11/22] cxl: Define a driver interface for HPA free space enumeration alejandro.lucero-palau
2025-05-20 2:36 ` Alison Schofield
2025-05-21 19:31 ` Dan Williams
2025-05-22 10:56 ` Alejandro Lucero Palau
2025-05-22 20:31 ` Dan Williams
2025-05-14 13:27 ` [PATCH v16 12/22] sfc: obtain root decoder with enough HPA free space alejandro.lucero-palau
2025-05-21 19:56 ` Dan Williams
2025-06-06 12:59 ` Alejandro Lucero Palau
2025-05-14 13:27 ` [PATCH v16 13/22] cxl: Define a driver interface for DPA allocation alejandro.lucero-palau
2025-05-20 2:39 ` Alison Schofield
2025-05-21 20:23 ` Dan Williams
2025-06-06 13:09 ` Alejandro Lucero Palau
2025-05-14 13:27 ` [PATCH v16 14/22] sfc: get endpoint decoder alejandro.lucero-palau
2025-05-21 20:28 ` Dan Williams
2025-05-14 13:27 ` [PATCH v16 15/22] cxl: Make region type based on endpoint type alejandro.lucero-palau
2025-05-20 2:39 ` Alison Schofield
2025-05-14 13:27 ` [PATCH v16 16/22] cxl/region: Factor out interleave ways setup alejandro.lucero-palau
2025-05-20 2:37 ` Alison Schofield
2025-05-14 13:27 ` [PATCH v16 17/22] cxl/region: Factor out interleave granularity setup alejandro.lucero-palau
2025-05-20 2:38 ` Alison Schofield
2025-05-14 13:27 ` [PATCH v16 18/22] cxl: Allow region creation by type2 drivers alejandro.lucero-palau
2025-05-20 2:37 ` Alison Schofield
2025-05-21 20:45 ` Dan Williams
2025-06-06 13:27 ` Alejandro Lucero Palau
2025-05-14 13:27 ` [PATCH v16 19/22] cxl: Add region flag for precluding a device memory to be used for dax alejandro.lucero-palau
2025-05-20 2:36 ` Alison Schofield
2025-05-21 20:49 ` Dan Williams
2025-06-06 13:39 ` Alejandro Lucero Palau
2025-05-14 13:27 ` [PATCH v16 20/22] sfc: create cxl region alejandro.lucero-palau
2025-05-21 21:01 ` Dan Williams
2025-06-06 13:44 ` Alejandro Lucero Palau
2025-05-14 13:27 ` [PATCH v16 21/22] cxl: Add function for obtaining region range alejandro.lucero-palau
2025-05-20 2:35 ` Alison Schofield
2025-05-21 21:31 ` Dan Williams
2025-06-06 14:03 ` Alejandro Lucero Palau
2025-05-14 13:27 ` [PATCH v16 22/22] sfc: support pio mapping based on cxl alejandro.lucero-palau
2025-05-21 21:48 ` Dan Williams
2025-05-23 1:13 ` Edward Cree