* [PATCH v21 00/23] Type2 device basic support
@ 2025-11-19 19:22 alejandro.lucero-palau
2025-11-19 19:22 ` [PATCH v21 01/23] cxl/mem: refactor memdev allocation alejandro.lucero-palau
` (24 more replies)
0 siblings, 25 replies; 51+ messages in thread
From: alejandro.lucero-palau @ 2025-11-19 19:22 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero
From: Alejandro Lucero <alucerop@amd.com>
This patchset should be applied on top of the base commit listed below, followed by
Terry's v13 series on CXL error handling. The first 4 patches come from Dan's
for-6.18/cxl-probe-order branch with minor modifications.
v21 changes:
patch1-2: v20 patch1 split in two, with the code move done in the second
patch in v21. (Jonathan)
patch1-4: adding my Signed-off-by tag along with Dan's
patch5: fix duplication of CXL_NR_PARTITION definition
patch7: dropped the cxl test fixes removing unused function. It was
sent independently ahead of this version.
patch12: optimization for max free space calculation (Jonathan)
patch19: optimization for returning on error (Jonathan)
v20 changes:
patch 1: using release helpers (Jonathan).
patch 6: minor fix in comments (Jonathan).
patch 7 & 8: change commit mentioning sfc changes
patch 11: Fix interleave_ways setting (Jonathan)
Change assignment location (Dave)
patch 13: changing error return order (Jonathan)
removing blank line (Dave)
patch 18: Add check for only supporting uncommitted decoders
(Ben, Dave)
Add check for returned value (Dave)
v19 changes:
Removal of cxl_acquire_endpoint and driver callback for unexpected cxl
module removal. Dan's patches made them unnecessary.
patch 4: remove code already moved by Terry's patches (Ben Cheatham)
patch 6: removed unrelated change (Ben Cheatham)
patch 7: fix error report inconsistencies (Jonathan, Dave)
patch 9: remove unnecessary comment (Ben Cheatham)
patch 11: fix __free usage (Jonathan Cameron, Ben Cheatham)
patch 13: style fixes (Jonathan Cameron, Dave Jiang)
patch 14: move code to previous patch (Jonathan Cameron)
patch 18: group code in one locking (Dave Jiang)
use __free helper (Ben Cheatham)
v18 changes:
patch 1: minor changes and fixing docs generation (Jonathan, Dan)
patch4: merged with v17 patch5
patch 5: merging v17 patches 6 and 7
patch 6: adding helpers for clarity
patch 9:
- minor changes (Dave)
- simplifying flags check (Dan)
patch 10: minor changes (Jonathan)
patch 11:
- minor changes (Dave)
- fix mess (Jonathan, Dave)
patch 18: minor changes (Jonathan, Dan)
v17 changes: (Dan Williams review)
- use devm for cxl_dev_state allocation
- using current cxl struct for checking capability registers found by
the driver.
- simplify dpa initialization without a mailbox not supporting pmem
- add cxl_acquire_endpoint for protection during initialization
- add callback/action to cxl_create_region for a driver notified about cxl
core kernel modules removal.
- add sfc function to disable CXL-based PIO buffers if such a callback
is invoked.
- Always manage a Type2 created region as private not allowing DAX.
v16 changes:
- rebase against rc4 (Dave Jiang)
- remove duplicate line (Ben Cheatham)
v15 changes:
- remove reference to unused header file (Jonathan Cameron)
- add proper kernel docs to exported functions (Alison Schofield)
- using an array to map the enums to strings (Alison Schofield)
- clarify comment when using bitmap_subset (Jonathan Cameron)
- specify link to type2 support in all patches (Alison Schofield)
Patches changed (minor): 4, 11
v14 changes:
- static null initialization of bitmaps (Jonathan Cameron)
- Fixing cxl tests (Alison Schofield)
- Fixing robot compilation problems
Patches changed (minor): 1, 4, 6, 13
v13 changes:
- using names for headers checking more consistent (Jonathan Cameron)
- using helper for caps bit setting (Jonathan Cameron)
- provide generic function for reporting missing capabilities (Jonathan Cameron)
- rename cxl_pci_setup_memdev_regs to cxl_pci_accel_setup_memdev_regs (Jonathan Cameron)
- cxl_dpa_info size to be set by the Type2 driver (Jonathan Cameron)
- avoiding rc variable when possible (Jonathan Cameron)
- fix spelling (Simon Horman)
- use scoped_guard (Dave Jiang)
- use enum instead of bool (Dave Jiang)
- dropping patch with hardware symbols
v12 changes:
- use new macro cxl_dev_state_create in pci driver (Ben Cheatham)
- add public/private sections in now exported cxl_dev_state struct (Ben
Cheatham)
- fix cxl/pci.h regarding file name for checking if defined
- Clarify capabilities found vs expected in error message. (Ben
Cheatham)
- Clarify new CXL_DECODER_F flag (Ben Cheatham)
- Fix changes about cxl memdev creation support moving code to the
proper patch. (Ben Cheatham)
- Avoid debug and function duplications (Ben Cheatham)
v11 changes:
- Dropping the use of cxl_memdev_state and going back to using
cxl_dev_state.
- Using a helper for an accel driver to allocate its own cxl-related
struct embedding cxl_dev_state.
- Exporting the required structs in include/cxl/cxl.h for an accel
driver being able to know the cxl_dev_state size required in the
previously mentioned helper for allocation.
- Avoid using any struct for dpa initialization by the accel driver
adding a specific function for creating dpa partitions by accel
drivers without a mailbox.
v10 changes:
- Using cxl_memdev_state instead of cxl_dev_state for type2 which has a
memory after all and facilitates the setup.
- Adapt core for using cxl_memdev_state allowing accel drivers to work
with them without further awareness of internal cxl structs.
- Using last DPA changes for creating DPA partitions with accel driver
hardcoding mds values when no mailbox.
- capabilities not a new field but built up when current register maps
is performed and returned to the caller for checking.
- HPA free space supporting interleaving.
- DPA free space dropping max-min for a simple alloc size.
v9 changes:
- adding forward definitions (Jonathan Cameron)
- using set_bit instead of bitmap_set (Jonathan Cameron)
- fix rebase problem (Jonathan Cameron)
- Improve error path (Jonathan Cameron)
- fix build problems with cxl region dependency (robot)
- fix error path (Simon Horman)
v8 changes:
- Change error path labeling inside sfc cxl code (Edward Cree)
- Properly handling checks and error in sfc cxl code (Simon Horman)
- Fix bug when checking resource_size (Simon Horman)
- Avoid bisect problems reordering patches (Edward Cree)
- Fix buffer allocation size in sfc (Simon Horman)
v7 changes:
- fixing kernel test robot complaints
- fix type with Type3 mandatory capabilities (Zhi Wang)
- optimize code in cxl_request_resource (Kalesh Anakkur Purayil)
- add sanity check when dealing with resource arithmetic (Fan Ni)
- fix typos and blank lines (Fan Ni)
- keep previous log errors/warnings in sfc driver (Martin Habets)
- add WARN_ON_ONCE if region given is NULL
v6 changes:
- update sfc mcdi_pcol.h with full hardware changes most not related to
this patchset. This is an automatic file created from hardware design
changes and not touched by software. It is updated from time to time
and it required update for the sfc driver CXL support.
- remove CXL capabilities definitions not used by the patchset or
previous kernel code. (Dave Jiang, Jonathan Cameron)
- Use bitmap_subset instead of reinventing the wheel ... (Ben Cheatham)
- Use cxl_accel_memdev for new device_type created (Ben Cheatham)
- Fix construct_region use of rwsem (Zhi Wang)
- Obtain region range instead of region params (Alison Schofield, Dave
Jiang)
v5 changes:
- Fix SFC configuration based on kernel CXL configuration
- Add subset check for capabilities.
- fix region creation when HDM decoders programmed by firmware/BIOS (Ben
Cheatham)
- Add option for creating dax region based on driver decision (Ben
Cheatham)
- Using sfc probe_data struct for keeping sfc cxl data
v4 changes:
- Use bitmap for capabilities new field (Jonathan Cameron)
- Use cxl_mem attributes for sysfs based on device type (Dave Jiang)
- Add conditional cxl sfc compilation relying on kernel CXL config (kernel test robot)
- Add sfc changes in different patches for facilitating backport (Jonathan Cameron)
- Remove patch for dealing with cxl modules dependencies and using sfc kconfig plus
MODULE_SOFTDEP instead.
v3 changes:
- cxl_dev_state not defined as opaque but only manipulated by accel drivers
through accessors.
- accessors names not identified as only for accel drivers.
- move pci code from pci driver (drivers/cxl/pci.c) to generic pci code
(drivers/cxl/core/pci.c).
- capabilities field from u8 to u32 and initialised by CXL regs discovering
code.
- add capabilities check and removing current check by CXL regs discovering
code.
- Not fail if CXL Device Registers not found. Not mandatory for Type2.
- add timeout in acquire_endpoint for solving a race with the endpoint port
creation.
- handle EPROBE_DEFER by sfc driver.
- Limiting interleave ways to 1 for accel driver HPA/DPA requests.
- factoring out interleave ways and granularity helpers from type2 region
creation patch.
- restricting region_creation for type2 to one endpoint decoder.
v2 changes:
I have removed the introduction about the concerns with BIOS/UEFI after the
discussion that confirmed the need for the functionality implemented, at
least in some scenarios.
There are two main changes from the RFC:
1) Following concerns about drivers using CXL core without restrictions, the CXL
struct to work with is opaque to those drivers, therefore functions are
implemented for modifying or reading those structs indirectly.
2) The driver for using the added functionality is not a test driver but a real
one: the SFC ethernet network driver. It uses the CXL region mapped for PIO
buffers instead of regions inside PCIe BARs.
RFC:
Current CXL kernel code is focused on supporting Type3 CXL devices, aka memory
expanders. Type2 CXL devices, aka device accelerators, share some functionalities
but require some special handling.
First of all, Type2 devices are by definition tied to drivers that do something more than
just memory expansion, so such a driver is expected to work with the CXL specifics. This implies the CXL
setup needs to be done by that driver instead of by a generic CXL PCI driver,
as is the case for memory expanders. Most of this setup needs to use current CXL core code,
which therefore needs to be accessible to those vendor drivers. This is accomplished
by exporting opaque CXL structs and by adding and exporting functions for working with
those structs indirectly.
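As a rough illustration of the opaque-struct approach described above, here is a
minimal userspace C sketch. Every name in it (accel_state, accel_state_create,
accel_set_serial, accel_get_serial, driver_setup) is hypothetical and does not
correspond to any function in this patchset; it only shows the general pattern of
hiding a struct layout behind accessors.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* "Public header" part: drivers only see a forward declaration. */
struct accel_state;			/* hypothetical, layout hidden */

struct accel_state *accel_state_create(void);
void accel_state_destroy(struct accel_state *st);
void accel_set_serial(struct accel_state *st, uint64_t serial);
uint64_t accel_get_serial(const struct accel_state *st);

/*
 * "Driver" part: at this point the struct is an incomplete type, so the
 * driver cannot touch its fields and must go through the accessors.
 */
uint64_t driver_setup(struct accel_state *st)
{
	accel_set_serial(st, 0x1234);
	return accel_get_serial(st);
}

/* "Core" part: the only place that knows the layout. */
struct accel_state {
	uint64_t serial;
};

struct accel_state *accel_state_create(void)
{
	return calloc(1, sizeof(struct accel_state));
}

void accel_state_destroy(struct accel_state *st)
{
	free(st);
}

void accel_set_serial(struct accel_state *st, uint64_t serial)
{
	assert(st);
	st->serial = serial;
}

uint64_t accel_get_serial(const struct accel_state *st)
{
	assert(st);
	return st->serial;
}
```

The point of the split is that the core can change the struct layout without
touching any driver, since drivers never dereference the pointer themselves.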
Some of the patches are based on a patchset sent by Dan Williams [1] which was only
partially integrated, mostly the parts making things ready for Type2 but none
related to specific Type2 support. The patches based on Dan's work have Dan's
sign-off as co-developer, and a link to the original patch.
A final note about CXL.cache is needed. This patchset does not cover it at all,
although the emulated Type2 device advertises it. From the kernel point of view,
supporting CXL.cache will require making sure the CXL path supports what the Type2
device needs. A device accelerator will likely be connected to a Root Switch,
but other configurations cannot be ruled out. Therefore the kernel will need to
check not just HPA, DPA, interleave and granularity, but also the available
CXL.cache support and resources in each switch in the CXL path to the Type2
device. I expect to contribute to this support in the following months, and
it would be good to discuss it when possible.
[1] https://lore.kernel.org/linux-cxl/98b1f61a-e6c2-71d4-c368-50d958501b0c@intel.com/T/
Alejandro Lucero (21):
cxl/mem: refactor memdev allocation
cxl/mem: Arrange for always-synchronous memdev attach
cxl: Add type2 device basic support
sfc: add cxl support
cxl: Move pci generic code
cxl/sfc: Map cxl component regs
cxl/sfc: Initialize dpa without a mailbox
cxl: Prepare memdev creation for type2
sfc: create type2 cxl memdev
cxl: Define a driver interface for HPA free space enumeration
sfc: get root decoder
cxl: Define a driver interface for DPA allocation
sfc: get endpoint decoder
cxl: Make region type based on endpoint type
cxl/region: Factor out interleave ways setup
cxl/region: Factor out interleave granularity setup
cxl: Allow region creation by type2 drivers
cxl: Avoid dax creation for accelerators
sfc: create cxl region
cxl: Add function for obtaining region range
sfc: support pio mapping based on cxl
Dan Williams (2):
cxl/port: Arrange for always synchronous endpoint attach
cxl/mem: Introduce a memdev creation ->probe() operation
drivers/cxl/Kconfig | 2 +-
drivers/cxl/core/core.h | 10 +-
drivers/cxl/core/hdm.c | 91 ++++++
drivers/cxl/core/mbox.c | 63 +---
drivers/cxl/core/memdev.c | 212 +++++++++----
drivers/cxl/core/pci.c | 63 ++++
drivers/cxl/core/pci_drv.c | 88 +-----
drivers/cxl/core/port.c | 1 +
drivers/cxl/core/region.c | 413 +++++++++++++++++++++++---
drivers/cxl/core/regs.c | 2 +-
drivers/cxl/cxl.h | 125 +-------
drivers/cxl/cxlmem.h | 91 +-----
drivers/cxl/cxlpci.h | 21 +-
drivers/cxl/mem.c | 152 +++++++---
drivers/cxl/port.c | 46 ++-
drivers/cxl/private.h | 16 +
drivers/net/ethernet/sfc/Kconfig | 10 +
drivers/net/ethernet/sfc/Makefile | 1 +
drivers/net/ethernet/sfc/ef10.c | 50 +++-
drivers/net/ethernet/sfc/efx.c | 15 +-
drivers/net/ethernet/sfc/efx_cxl.c | 165 ++++++++++
drivers/net/ethernet/sfc/efx_cxl.h | 40 +++
drivers/net/ethernet/sfc/net_driver.h | 12 +
drivers/net/ethernet/sfc/nic.h | 3 +
include/cxl/cxl.h | 291 ++++++++++++++++++
include/cxl/pci.h | 21 ++
tools/testing/cxl/test/mem.c | 5 +-
27 files changed, 1486 insertions(+), 523 deletions(-)
create mode 100644 drivers/cxl/private.h
create mode 100644 drivers/net/ethernet/sfc/efx_cxl.c
create mode 100644 drivers/net/ethernet/sfc/efx_cxl.h
create mode 100644 include/cxl/cxl.h
create mode 100644 include/cxl/pci.h
base-commit: 211ddde0823f1442e4ad052a2f30f050145ccada
prerequisite-patch-id: f8f1003c82226bdbd967c0755c41d6602f35884f
prerequisite-patch-id: 8bccb1a750b00b11bfc347f3f2e1a162990f6275
prerequisite-patch-id: d9142fe7f0c216b3ea219847b9514b5997df63be
prerequisite-patch-id: bbba5b3224f0c6a0a331769652e5d6a0a3c28934
prerequisite-patch-id: 7c9fa56417d63fdb17a09abf932de8048c5b334b
prerequisite-patch-id: f418c5b2aea8b65520742f750f4b79f8cf4f0c90
prerequisite-patch-id: 9205c9a8b15f9571c6ecf9ef46b526ac8c9d9b33
prerequisite-patch-id: 7390649b7e6b0c0628de8403d46a5047e1e12417
prerequisite-patch-id: 70e95c74c1777b9e281ba54add0024746f5ff5e1
prerequisite-patch-id: 5a2273b31ad4755e14fc8bca28362f2bff54a909
prerequisite-patch-id: e9dc88f1b91dce5dc3d46ff2b5bf184aba06439d
prerequisite-patch-id: 0c5c038156ff28f810a63cd08ddab7867619af23
prerequisite-patch-id: 7e719ed404f664ee8d9b98d56f58326f55ea2175
prerequisite-patch-id: ad0c7b6122a0398a2654c92ab0c0527cb8a968c6
prerequisite-patch-id: c2829969f73d41d63b50983b92fef4cf72f87d03
prerequisite-patch-id: e1d0d259bd20b59cd9dff76880f6214e88c1fe32
prerequisite-patch-id: db84a3b9aefceef39764452998967f7aef0a3796
prerequisite-patch-id: cfb91a38e8c55201344eda86b730c0991ab8d79e
prerequisite-patch-id: 9889b65c6eff79af627158dac6cfe67f2b10fc21
prerequisite-patch-id: a4e751c90817a7d5016f7840f64185108fe4393b
prerequisite-patch-id: e90c5457d242847534b1c7f657541ecc7c72f23a
prerequisite-patch-id: 16f41d388ef33e355d90b9a38d1bacfa9f5740d4
prerequisite-patch-id: 8654e54082d6dba5d83dfdfb2bc2fd85b12d4a12
prerequisite-patch-id: 1afa817cac87367bea6af9d6eed8582b070d8424
prerequisite-patch-id: f5c386200140e5b90cbe5914dba04076cbb79d2f
--
2.34.1
* [PATCH v21 01/23] cxl/mem: refactor memdev allocation
2025-11-19 19:22 [PATCH v21 00/23] Type2 device basic support alejandro.lucero-palau
@ 2025-11-19 19:22 ` alejandro.lucero-palau
2025-11-20 18:08 ` Jonathan Cameron
` (2 more replies)
2025-11-19 19:22 ` [PATCH v21 02/23] cxl/mem: Arrange for always-synchronous memdev attach alejandro.lucero-palau
` (23 subsequent siblings)
24 siblings, 3 replies; 51+ messages in thread
From: alejandro.lucero-palau @ 2025-11-19 19:22 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero
From: Alejandro Lucero <alucerop@amd.com>
In preparation for always-synchronous memdev attach, refactor memdev
allocation and fix a release bug in devm_cxl_add_memdev() when an error
occurs after a successful allocation.
The diff is busy as this moves cxl_memdev_alloc() down below the definition
of cxl_memdev_fops and introduces devm_cxl_memdev_add_or_reset() to
preclude needing to export more symbols from the cxl_core.
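The refactored allocation relies on the kernel's scope-based cleanup helpers
(__free(), return_ptr(), no_free_ptr()). For readers unfamiliar with the
pattern, a minimal userspace sketch follows. It assumes GCC or Clang, and the
names auto_free, steal_ptr, widget and widget_alloc are hypothetical stand-ins,
not kernel APIs.

```c
#include <assert.h>
#include <stdlib.h>

/* Counts frees done by the cleanup handler so the behaviour is observable. */
static int auto_frees;

/* Cleanup handlers receive a pointer to the annotated variable. */
static void free_ptr(void *p)
{
	void **pp = p;

	if (*pp) {
		free(*pp);
		auto_frees++;
	}
}

/* Rough userspace analogue of the kernel's __free(kfree) annotation. */
#define auto_free __attribute__((cleanup(free_ptr)))

/* Rough analogue of no_free_ptr(): hand the pointer out, disarm the cleanup. */
#define steal_ptr(p) ({ __typeof__(p) __tmp = (p); (p) = NULL; __tmp; })

struct widget {
	int id;
};

/* Returns NULL on failure; a partially set up widget is released automatically. */
struct widget *widget_alloc(int id)
{
	auto_free struct widget *w = calloc(1, sizeof(*w));

	if (!w)
		return NULL;
	if (id < 0)
		return NULL;	/* error path: w is freed by the cleanup handler */

	w->id = id;
	return steal_ptr(w);	/* success: ownership moves to the caller */
}
```

Every early return releases the allocation without a goto label, which is why
the error paths in the diff below shrink to plain return statements.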
Fixes: 1c3333a28d45 ("cxl/mem: Do not rely on device_add() side effects for dev_set_name() failures")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
---
drivers/cxl/core/memdev.c | 134 +++++++++++++++++++++-----------------
drivers/cxl/private.h | 10 +++
2 files changed, 86 insertions(+), 58 deletions(-)
create mode 100644 drivers/cxl/private.h
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index e370d733e440..8de19807ac7b 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -8,6 +8,7 @@
#include <linux/idr.h>
#include <linux/pci.h>
#include <cxlmem.h>
+#include "private.h"
#include "trace.h"
#include "core.h"
@@ -648,42 +649,25 @@ static void detach_memdev(struct work_struct *work)
static struct lock_class_key cxl_memdev_key;
-static struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds,
- const struct file_operations *fops)
+int devm_cxl_memdev_add_or_reset(struct device *host, struct cxl_memdev *cxlmd)
{
- struct cxl_memdev *cxlmd;
- struct device *dev;
- struct cdev *cdev;
+ struct device *dev = &cxlmd->dev;
+ struct cdev *cdev = &cxlmd->cdev;
int rc;
- cxlmd = kzalloc(sizeof(*cxlmd), GFP_KERNEL);
- if (!cxlmd)
- return ERR_PTR(-ENOMEM);
-
- rc = ida_alloc_max(&cxl_memdev_ida, CXL_MEM_MAX_DEVS - 1, GFP_KERNEL);
- if (rc < 0)
- goto err;
- cxlmd->id = rc;
- cxlmd->depth = -1;
-
- dev = &cxlmd->dev;
- device_initialize(dev);
- lockdep_set_class(&dev->mutex, &cxl_memdev_key);
- dev->parent = cxlds->dev;
- dev->bus = &cxl_bus_type;
- dev->devt = MKDEV(cxl_mem_major, cxlmd->id);
- dev->type = &cxl_memdev_type;
- device_set_pm_not_required(dev);
- INIT_WORK(&cxlmd->detach_work, detach_memdev);
-
- cdev = &cxlmd->cdev;
- cdev_init(cdev, fops);
- return cxlmd;
+ rc = cdev_device_add(cdev, dev);
+ if (rc) {
+ /*
+ * The cdev was briefly live, shutdown any ioctl operations that
+ * saw that state.
+ */
+ cxl_memdev_shutdown(dev);
+ return rc;
+ }
-err:
- kfree(cxlmd);
- return ERR_PTR(rc);
+ return devm_add_action_or_reset(host, cxl_memdev_unregister, cxlmd);
}
+EXPORT_SYMBOL_NS_GPL(devm_cxl_memdev_add_or_reset, "CXL");
static long __cxl_memdev_ioctl(struct cxl_memdev *cxlmd, unsigned int cmd,
unsigned long arg)
@@ -1051,48 +1035,82 @@ static const struct file_operations cxl_memdev_fops = {
.llseek = noop_llseek,
};
-struct cxl_memdev *devm_cxl_add_memdev(struct device *host,
- struct cxl_dev_state *cxlds)
+struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds)
{
- struct cxl_memdev *cxlmd;
+ struct cxl_memdev *cxlmd __free(kfree) =
+ kzalloc(sizeof(*cxlmd), GFP_KERNEL);
struct device *dev;
struct cdev *cdev;
int rc;
- cxlmd = cxl_memdev_alloc(cxlds, &cxl_memdev_fops);
- if (IS_ERR(cxlmd))
- return cxlmd;
-
- dev = &cxlmd->dev;
- rc = dev_set_name(dev, "mem%d", cxlmd->id);
- if (rc)
- goto err;
+ if (!cxlmd)
+ return ERR_PTR(-ENOMEM);
- /*
- * Activate ioctl operations, no cxl_memdev_rwsem manipulation
- * needed as this is ordered with cdev_add() publishing the device.
- */
+ rc = ida_alloc_max(&cxl_memdev_ida, CXL_MEM_MAX_DEVS - 1, GFP_KERNEL);
+ if (rc < 0)
+ return ERR_PTR(rc);
+ cxlmd->id = rc;
+ cxlmd->depth = -1;
cxlmd->cxlds = cxlds;
cxlds->cxlmd = cxlmd;
+ dev = &cxlmd->dev;
+ device_initialize(dev);
+ lockdep_set_class(&dev->mutex, &cxl_memdev_key);
+ dev->parent = cxlds->dev;
+ dev->bus = &cxl_bus_type;
+ dev->devt = MKDEV(cxl_mem_major, cxlmd->id);
+ dev->type = &cxl_memdev_type;
+ device_set_pm_not_required(dev);
+ INIT_WORK(&cxlmd->detach_work, detach_memdev);
+
cdev = &cxlmd->cdev;
- rc = cdev_device_add(cdev, dev);
+ cdev_init(cdev, &cxl_memdev_fops);
+ return_ptr(cxlmd);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_memdev_alloc, "CXL");
+
+static void __cxlmd_free(struct cxl_memdev *cxlmd)
+{
+ if (IS_ERR(cxlmd))
+ return;
+
+ if (cxlmd->cxlds)
+ cxlmd->cxlds->cxlmd = NULL;
+
+ put_device(&cxlmd->dev);
+ kfree(cxlmd);
+}
+
+DEFINE_FREE(cxlmd_free, struct cxl_memdev *, __cxlmd_free(_T))
+
+/**
+ * devm_cxl_add_memdev - Add a CXL memory device
+ * @host: devres alloc/release context and parent for the memdev
+ * @cxlds: CXL device state to associate with the memdev
+ *
+ * Upon return the device will have had a chance to attach to the
+ * cxl_mem driver, but may fail if the CXL topology is not ready
+ * (hardware CXL link down, or software platform CXL root not attached)
+ */
+struct cxl_memdev *devm_cxl_add_memdev(struct device *host,
+ struct cxl_dev_state *cxlds)
+{
+ struct cxl_memdev *cxlmd __free(cxlmd_free) = cxl_memdev_alloc(cxlds);
+ int rc;
+
+ if (IS_ERR(cxlmd))
+ return cxlmd;
+
+ rc = dev_set_name(&cxlmd->dev, "mem%d", cxlmd->id);
if (rc)
- goto err;
+ return ERR_PTR(rc);
- rc = devm_add_action_or_reset(host, cxl_memdev_unregister, cxlmd);
+ rc = devm_cxl_memdev_add_or_reset(host, cxlmd);
if (rc)
return ERR_PTR(rc);
- return cxlmd;
-err:
- /*
- * The cdev was briefly live, shutdown any ioctl operations that
- * saw that state.
- */
- cxl_memdev_shutdown(dev);
- put_device(dev);
- return ERR_PTR(rc);
+ return no_free_ptr(cxlmd);
}
EXPORT_SYMBOL_NS_GPL(devm_cxl_add_memdev, "CXL");
diff --git a/drivers/cxl/private.h b/drivers/cxl/private.h
new file mode 100644
index 000000000000..50c2ac57afb5
--- /dev/null
+++ b/drivers/cxl/private.h
@@ -0,0 +1,10 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright(c) 2025 Intel Corporation. */
+
+/* Private interfaces betwen common drivers ("cxl_mem") and the cxl_core */
+
+#ifndef __CXL_PRIVATE_H__
+#define __CXL_PRIVATE_H__
+struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds);
+int devm_cxl_memdev_add_or_reset(struct device *host, struct cxl_memdev *cxlmd);
+#endif /* __CXL_PRIVATE_H__ */
--
2.34.1
* [PATCH v21 02/23] cxl/mem: Arrange for always-synchronous memdev attach
2025-11-19 19:22 [PATCH v21 00/23] Type2 device basic support alejandro.lucero-palau
2025-11-19 19:22 ` [PATCH v21 01/23] cxl/mem: refactor memdev allocation alejandro.lucero-palau
@ 2025-11-19 19:22 ` alejandro.lucero-palau
2025-12-02 5:03 ` dan.j.williams
2025-11-19 19:22 ` [PATCH v21 03/23] cxl/port: Arrange for always synchronous endpoint attach alejandro.lucero-palau
` (22 subsequent siblings)
24 siblings, 1 reply; 51+ messages in thread
From: alejandro.lucero-palau @ 2025-11-19 19:22 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero
From: Alejandro Lucero <alucerop@amd.com>
In preparation for CXL accelerator drivers that have a hard dependency on
CXL capability initialization, arrange for the endpoint probe result to be
conveyed to the caller of devm_cxl_add_memdev().
As it stands, cxl_pci does not care about the attach state of the cxl_memdev
because all generic memory expansion functionality can be handled by the
cxl_core. An accelerator driver, however, needs to know whether to perform
driver-specific initialization if CXL is available, or to execute a fallback
to PCIe-only operation.
By moving devm_cxl_add_memdev() to cxl_mem.ko it removes async module
loading as one reason that a memdev may not be attached upon return from
devm_cxl_add_memdev().
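To make the caller-side consequence concrete, here is a userspace sketch of how
an accelerator driver might branch on the result. It assumes an attach failure
is reported to the caller as an error pointer. fake_add_memdev(), accel_probe()
and enum pio_mode are hypothetical stand-ins, and the ERR_PTR-style helpers are
re-implemented locally rather than taken from kernel headers.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Local re-implementations of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long err)
{
	return (void *)err;
}

static inline long PTR_ERR(const void *p)
{
	return (long)p;
}

static inline bool IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

struct fake_memdev {
	int id;
};

static struct fake_memdev the_memdev = { .id = 0 };

/* Hypothetical stand-in for devm_cxl_add_memdev(): fails if CXL is not ready. */
struct fake_memdev *fake_add_memdev(bool cxl_ready)
{
	return cxl_ready ? &the_memdev : ERR_PTR(-ENXIO);
}

enum pio_mode { PIO_PCIE_BAR, PIO_CXL };

/*
 * Hypothetical accelerator probe: because the attach result is known
 * synchronously on return, the driver can pick its mode right here.
 */
enum pio_mode accel_probe(bool cxl_ready)
{
	struct fake_memdev *md = fake_add_memdev(cxl_ready);

	if (IS_ERR(md))
		return PIO_PCIE_BAR;	/* fall back to PCIe-only operation */

	return PIO_CXL;
}
```

With asynchronous attach the driver could not make this decision at probe time,
since the memdev might bind to cxl_mem at some later, unknown point.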
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
---
drivers/cxl/Kconfig | 2 +-
drivers/cxl/core/memdev.c | 44 -------------------------------------
drivers/cxl/mem.c | 46 +++++++++++++++++++++++++++++++++++++++
3 files changed, 47 insertions(+), 45 deletions(-)
diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
index 360c78fa7e97..6b871cbbce13 100644
--- a/drivers/cxl/Kconfig
+++ b/drivers/cxl/Kconfig
@@ -22,6 +22,7 @@ if CXL_BUS
config CXL_PCI
bool "PCI manageability"
default CXL_BUS
+ select CXL_MEM
help
The CXL specification defines a "CXL memory device" sub-class in the
PCI "memory controller" base class of devices. Device's identified by
@@ -89,7 +90,6 @@ config CXL_PMEM
config CXL_MEM
tristate "CXL: Memory Expansion"
- depends on CXL_PCI
default CXL_BUS
help
The CXL.mem protocol allows a device to act as a provider of "System
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index 8de19807ac7b..639bd0376d32 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -1070,50 +1070,6 @@ struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds)
}
EXPORT_SYMBOL_NS_GPL(cxl_memdev_alloc, "CXL");
-static void __cxlmd_free(struct cxl_memdev *cxlmd)
-{
- if (IS_ERR(cxlmd))
- return;
-
- if (cxlmd->cxlds)
- cxlmd->cxlds->cxlmd = NULL;
-
- put_device(&cxlmd->dev);
- kfree(cxlmd);
-}
-
-DEFINE_FREE(cxlmd_free, struct cxl_memdev *, __cxlmd_free(_T))
-
-/**
- * devm_cxl_add_memdev - Add a CXL memory device
- * @host: devres alloc/release context and parent for the memdev
- * @cxlds: CXL device state to associate with the memdev
- *
- * Upon return the device will have had a chance to attach to the
- * cxl_mem driver, but may fail if the CXL topology is not ready
- * (hardware CXL link down, or software platform CXL root not attached)
- */
-struct cxl_memdev *devm_cxl_add_memdev(struct device *host,
- struct cxl_dev_state *cxlds)
-{
- struct cxl_memdev *cxlmd __free(cxlmd_free) = cxl_memdev_alloc(cxlds);
- int rc;
-
- if (IS_ERR(cxlmd))
- return cxlmd;
-
- rc = dev_set_name(&cxlmd->dev, "mem%d", cxlmd->id);
- if (rc)
- return ERR_PTR(rc);
-
- rc = devm_cxl_memdev_add_or_reset(host, cxlmd);
- if (rc)
- return ERR_PTR(rc);
-
- return no_free_ptr(cxlmd);
-}
-EXPORT_SYMBOL_NS_GPL(devm_cxl_add_memdev, "CXL");
-
static void sanitize_teardown_notifier(void *data)
{
struct cxl_memdev_state *mds = data;
diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
index d2155f45240d..3f581c37f3ba 100644
--- a/drivers/cxl/mem.c
+++ b/drivers/cxl/mem.c
@@ -7,6 +7,7 @@
#include "cxlmem.h"
#include "cxlpci.h"
+#include "private.h"
/**
* DOC: cxl mem
@@ -202,6 +203,50 @@ static int cxl_mem_probe(struct device *dev)
return devm_add_action_or_reset(dev, enable_suspend, NULL);
}
+static void __cxlmd_free(struct cxl_memdev *cxlmd)
+{
+ if (IS_ERR(cxlmd))
+ return;
+
+ if (cxlmd->cxlds)
+ cxlmd->cxlds->cxlmd = NULL;
+
+ put_device(&cxlmd->dev);
+ kfree(cxlmd);
+}
+
+DEFINE_FREE(cxlmd_free, struct cxl_memdev *, __cxlmd_free(_T))
+
+/**
+ * devm_cxl_add_memdev - Add a CXL memory device
+ * @host: devres alloc/release context and parent for the memdev
+ * @cxlds: CXL device state to associate with the memdev
+ *
+ * Upon return the device will have had a chance to attach to the
+ * cxl_mem driver, but may fail if the CXL topology is not ready
+ * (hardware CXL link down, or software platform CXL root not attached)
+ */
+struct cxl_memdev *devm_cxl_add_memdev(struct device *host,
+ struct cxl_dev_state *cxlds)
+{
+ struct cxl_memdev *cxlmd __free(cxlmd_free) = cxl_memdev_alloc(cxlds);
+ int rc;
+
+ if (IS_ERR(cxlmd))
+ return cxlmd;
+
+ rc = dev_set_name(&cxlmd->dev, "mem%d", cxlmd->id);
+ if (rc)
+ return ERR_PTR(rc);
+
+ rc = devm_cxl_memdev_add_or_reset(host, cxlmd);
+ if (rc)
+ return ERR_PTR(rc);
+
+ return no_free_ptr(cxlmd);
+}
+EXPORT_SYMBOL_NS_GPL(devm_cxl_add_memdev, "CXL");
+
static ssize_t trigger_poison_list_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t len)
@@ -249,6 +294,7 @@ static struct cxl_driver cxl_mem_driver = {
.probe = cxl_mem_probe,
.id = CXL_DEVICE_MEMORY_EXPANDER,
.drv = {
+ .probe_type = PROBE_FORCE_SYNCHRONOUS,
.dev_groups = cxl_mem_groups,
},
};
--
2.34.1
* [PATCH v21 03/23] cxl/port: Arrange for always synchronous endpoint attach
2025-11-19 19:22 [PATCH v21 00/23] Type2 device basic support alejandro.lucero-palau
2025-11-19 19:22 ` [PATCH v21 01/23] cxl/mem: refactor memdev allocation alejandro.lucero-palau
2025-11-19 19:22 ` [PATCH v21 02/23] cxl/mem: Arrange for always-synchronous memdev attach alejandro.lucero-palau
@ 2025-11-19 19:22 ` alejandro.lucero-palau
2025-12-02 5:08 ` dan.j.williams
2025-11-19 19:22 ` [PATCH v21 04/23] cxl/mem: Introduce a memdev creation ->probe() operation alejandro.lucero-palau
` (21 subsequent siblings)
24 siblings, 1 reply; 51+ messages in thread
From: alejandro.lucero-palau @ 2025-11-19 19:22 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero
From: Dan Williams <dan.j.williams@intel.com>
Make it so that upon return from devm_cxl_add_endpoint() that
cxl_mem_probe() can assume that the endpoint has had a chance to complete
cxl_port_probe().
I.e. cxl_port module loading has completed prior to device registration.
MODULE_SOFTDEP() is not sufficient for this purpose, but a hard link-time
dependency is reliable.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
---
drivers/cxl/mem.c | 38 --------------------------------------
drivers/cxl/port.c | 41 +++++++++++++++++++++++++++++++++++++++++
drivers/cxl/private.h | 7 ++++++-
3 files changed, 47 insertions(+), 39 deletions(-)
diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
index 3f581c37f3ba..cb16adfa56c8 100644
--- a/drivers/cxl/mem.c
+++ b/drivers/cxl/mem.c
@@ -46,44 +46,6 @@ static int cxl_mem_dpa_show(struct seq_file *file, void *data)
return 0;
}
-static int devm_cxl_add_endpoint(struct device *host, struct cxl_memdev *cxlmd,
- struct cxl_dport *parent_dport)
-{
- struct cxl_port *parent_port = parent_dport->port;
- struct cxl_port *endpoint, *iter, *down;
- int rc;
-
- /*
- * Now that the path to the root is established record all the
- * intervening ports in the chain.
- */
- for (iter = parent_port, down = NULL; !is_cxl_root(iter);
- down = iter, iter = to_cxl_port(iter->dev.parent)) {
- struct cxl_ep *ep;
-
- ep = cxl_ep_load(iter, cxlmd);
- ep->next = down;
- }
-
- /* Note: endpoint port component registers are derived from @cxlds */
- endpoint = devm_cxl_add_port(host, &cxlmd->dev, CXL_RESOURCE_NONE,
- parent_dport);
- if (IS_ERR(endpoint))
- return PTR_ERR(endpoint);
-
- rc = cxl_endpoint_autoremove(cxlmd, endpoint);
- if (rc)
- return rc;
-
- if (!endpoint->dev.driver) {
- dev_err(&cxlmd->dev, "%s failed probe\n",
- dev_name(&endpoint->dev));
- return -ENXIO;
- }
-
- return 0;
-}
-
static int cxl_debugfs_poison_inject(void *data, u64 dpa)
{
struct cxl_memdev *cxlmd = data;
diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
index 51c8f2f84717..ef65d983e1c8 100644
--- a/drivers/cxl/port.c
+++ b/drivers/cxl/port.c
@@ -6,6 +6,7 @@
#include "cxlmem.h"
#include "cxlpci.h"
+#include "private.h"
/**
* DOC: cxl port
@@ -156,10 +157,50 @@ static struct cxl_driver cxl_port_driver = {
.probe = cxl_port_probe,
.id = CXL_DEVICE_PORT,
.drv = {
+ .probe_type = PROBE_FORCE_SYNCHRONOUS,
.dev_groups = cxl_port_attribute_groups,
},
};
+int devm_cxl_add_endpoint(struct device *host, struct cxl_memdev *cxlmd,
+ struct cxl_dport *parent_dport)
+{
+ struct cxl_port *parent_port = parent_dport->port;
+ struct cxl_port *endpoint, *iter, *down;
+ int rc;
+
+ /*
+ * Now that the path to the root is established record all the
+ * intervening ports in the chain.
+ */
+ for (iter = parent_port, down = NULL; !is_cxl_root(iter);
+ down = iter, iter = to_cxl_port(iter->dev.parent)) {
+ struct cxl_ep *ep;
+
+ ep = cxl_ep_load(iter, cxlmd);
+ ep->next = down;
+ }
+
+ /* Note: endpoint port component registers are derived from @cxlds */
+ endpoint = devm_cxl_add_port(host, &cxlmd->dev, CXL_RESOURCE_NONE,
+ parent_dport);
+ if (IS_ERR(endpoint))
+ return PTR_ERR(endpoint);
+
+ rc = cxl_endpoint_autoremove(cxlmd, endpoint);
+ if (rc)
+ return rc;
+
+ if (!endpoint->dev.driver) {
+ dev_err(&cxlmd->dev, "%s failed probe\n",
+ dev_name(&endpoint->dev));
+ return -ENXIO;
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL_NS_GPL(devm_cxl_add_endpoint, "CXL");
+
static int __init cxl_port_init(void)
{
return cxl_driver_register(&cxl_port_driver);
diff --git a/drivers/cxl/private.h b/drivers/cxl/private.h
index 50c2ac57afb5..f8d1ff64f534 100644
--- a/drivers/cxl/private.h
+++ b/drivers/cxl/private.h
@@ -1,10 +1,15 @@
/* SPDX-License-Identifier: GPL-2.0 */
/* Copyright(c) 2025 Intel Corporation. */
-/* Private interfaces betwen common drivers ("cxl_mem") and the cxl_core */
+/*
+ * Private interfaces between common drivers ("cxl_mem", "cxl_port") and
+ * the cxl_core.
+ */
#ifndef __CXL_PRIVATE_H__
#define __CXL_PRIVATE_H__
struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds);
int devm_cxl_memdev_add_or_reset(struct device *host, struct cxl_memdev *cxlmd);
+int devm_cxl_add_endpoint(struct device *host, struct cxl_memdev *cxlmd,
+ struct cxl_dport *parent_dport);
#endif /* __CXL_PRIVATE_H__ */
--
2.34.1
* [PATCH v21 04/23] cxl/mem: Introduce a memdev creation ->probe() operation
2025-11-19 19:22 [PATCH v21 00/23] Type2 device basic support alejandro.lucero-palau
` (2 preceding siblings ...)
2025-11-19 19:22 ` [PATCH v21 03/23] cxl/port: Arrange for always synchronous endpoint attach alejandro.lucero-palau
@ 2025-11-19 19:22 ` alejandro.lucero-palau
2025-11-19 19:22 ` [PATCH v21 05/23] cxl: Add type2 device basic support alejandro.lucero-palau
` (20 subsequent siblings)
24 siblings, 0 replies; 51+ messages in thread
From: alejandro.lucero-palau @ 2025-11-19 19:22 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero
From: Dan Williams <dan.j.williams@intel.com>
Allow a driver to pass a routine to be called in cxl_mem_probe()
context. This mirrors the semantics of faux_device_create() and lets
cxl_mem run CXL-topology-attach dependent logic on behalf of the
caller.
This capability is needed for CXL accelerator device drivers that need to
make decisions about enabling CXL dependent functionality in the device, or
falling back to PCIe-only operation.
The probe callback runs after the port topology is successfully attached
for the given memdev.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
---
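Note (not part of the commit): the flow described above can be sketched in
userspace C. This is only an illustration under stated assumptions. Every
mock_* and accel_* name is hypothetical, the ERR_PTR helpers are local
re-implementations of the kernel ones, and "driver_bound" stands in for
cxlmd->dev.driver being set. It shows the optional probe callback running
after topology attach, and the add routine failing when ops were supplied
but the driver did not bind, preferring the error stored in ->endpoint.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace stand-ins for the kernel's ERR_PTR helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long err) { return (void *)err; }
static inline long PTR_ERR(const void *p) { return (long)p; }
static inline int IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

struct mock_memdev;

struct mock_memdev_ops {
	int (*probe)(struct mock_memdev *md);
};

struct mock_memdev {
	const struct mock_memdev_ops *ops;
	void *endpoint;		/* ERR_PTR until topology attach succeeds */
	int driver_bound;	/* stands in for cxlmd->dev.driver != NULL */
};

/* Mock of the mem driver probe: attach topology, then run the callback. */
static int mock_mem_probe(struct mock_memdev *md, int topology_rc)
{
	static int fake_port;
	int rc;

	if (topology_rc) {
		md->endpoint = ERR_PTR(topology_rc);
		return topology_rc;
	}
	md->endpoint = &fake_port;

	if (md->ops) {
		rc = md->ops->probe(md);
		if (rc)
			return rc;
	}
	return 0;
}

/* Mock of the add routine: with ops, an unbound driver is a hard error. */
static struct mock_memdev *mock_add_memdev(const struct mock_memdev_ops *ops,
					   int topology_rc)
{
	struct mock_memdev *md = calloc(1, sizeof(*md));

	if (!md)
		return ERR_PTR(-ENOMEM);

	md->ops = ops;
	md->endpoint = ERR_PTR(-ENXIO);
	md->driver_bound = (mock_mem_probe(md, topology_rc) == 0);

	if (ops && !md->driver_bound) {
		/* Prefer the more precise error recorded in ->endpoint. */
		void *err = IS_ERR(md->endpoint) ? md->endpoint :
						   ERR_PTR(-ENXIO);

		free(md);
		return err;
	}
	return md;
}

/* Hypothetical accelerator callbacks. */
static int accel_probe_ok(struct mock_memdev *md) { (void)md; return 0; }
static int accel_probe_fail(struct mock_memdev *md)
{
	(void)md;
	return -EOPNOTSUPP;
}
```

In this sketch a caller passing NULL ops (as cxl_pci does in the patch) still
gets a memdev back when the topology is not ready, while a caller passing ops
gets an error pointer it can use to decide on falling back to PCIe-only
operation.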
drivers/cxl/core/memdev.c | 5 ++++-
drivers/cxl/core/pci_drv.c | 2 +-
drivers/cxl/cxlmem.h | 9 ++++++++-
drivers/cxl/mem.c | 27 +++++++++++++++++++++++++--
drivers/cxl/private.h | 3 ++-
tools/testing/cxl/test/mem.c | 2 +-
6 files changed, 41 insertions(+), 7 deletions(-)
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index 639bd0376d32..5e8af91c921e 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -1035,7 +1035,8 @@ static const struct file_operations cxl_memdev_fops = {
.llseek = noop_llseek,
};
-struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds)
+struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds,
+ const struct cxl_memdev_ops *ops)
{
struct cxl_memdev *cxlmd __free(kfree) =
kzalloc(sizeof(*cxlmd), GFP_KERNEL);
@@ -1051,6 +1052,8 @@ struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds)
return ERR_PTR(rc);
cxlmd->id = rc;
cxlmd->depth = -1;
+ cxlmd->ops = ops;
+ cxlmd->endpoint = ERR_PTR(-ENXIO);
cxlmd->cxlds = cxlds;
cxlds->cxlmd = cxlmd;
diff --git a/drivers/cxl/core/pci_drv.c b/drivers/cxl/core/pci_drv.c
index bc3c959f7eb6..f43590062efd 100644
--- a/drivers/cxl/core/pci_drv.c
+++ b/drivers/cxl/core/pci_drv.c
@@ -1007,7 +1007,7 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
if (rc)
dev_dbg(&pdev->dev, "No CXL Features discovered\n");
- cxlmd = devm_cxl_add_memdev(&pdev->dev, cxlds);
+ cxlmd = devm_cxl_add_memdev(&pdev->dev, cxlds, NULL);
if (IS_ERR(cxlmd))
return PTR_ERR(cxlmd);
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 434031a0c1f7..e55f52a5598d 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -34,6 +34,10 @@
(FIELD_GET(CXLMDEV_RESET_NEEDED_MASK, status) != \
CXLMDEV_RESET_NEEDED_NOT)
+struct cxl_memdev_ops {
+ int (*probe)(struct cxl_memdev *cxlmd);
+};
+
/**
* struct cxl_memdev - CXL bus object representing a Type-3 Memory Device
* @dev: driver core device object
@@ -43,6 +47,7 @@
* @cxl_nvb: coordinate removal of @cxl_nvd if present
* @cxl_nvd: optional bridge to an nvdimm if the device supports pmem
* @endpoint: connection to the CXL port topology for this memory device
+ * @ops: incremental caller specific probe routine
* @id: id number of this memdev instance.
* @depth: endpoint port depth
* @scrub_cycle: current scrub cycle set for this device
@@ -59,6 +64,7 @@ struct cxl_memdev {
struct cxl_nvdimm_bridge *cxl_nvb;
struct cxl_nvdimm *cxl_nvd;
struct cxl_port *endpoint;
+ const struct cxl_memdev_ops *ops;
int id;
int depth;
u8 scrub_cycle;
@@ -96,7 +102,8 @@ static inline bool is_cxl_endpoint(struct cxl_port *port)
}
struct cxl_memdev *devm_cxl_add_memdev(struct device *host,
- struct cxl_dev_state *cxlds);
+ struct cxl_dev_state *cxlds,
+ const struct cxl_memdev_ops *ops);
int devm_cxl_sanitize_setup_notifier(struct device *host,
struct cxl_memdev *cxlmd);
struct cxl_memdev_state;
diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
index cb16adfa56c8..b57bc6f38e64 100644
--- a/drivers/cxl/mem.c
+++ b/drivers/cxl/mem.c
@@ -144,6 +144,12 @@ static int cxl_mem_probe(struct device *dev)
return rc;
}
+ if (cxlmd->ops) {
+ rc = cxlmd->ops->probe(cxlmd);
+ if (rc)
+ return rc;
+ }
+
rc = devm_cxl_memdev_edac_register(cxlmd);
if (rc)
dev_dbg(dev, "CXL memdev EDAC registration failed rc=%d\n", rc);
@@ -183,15 +189,17 @@ DEFINE_FREE(cxlmd_free, struct cxl_memdev *, __cxlmd_free(_T))
* devm_cxl_add_memdev - Add a CXL memory device
* @host: devres alloc/release context and parent for the memdev
* @cxlds: CXL device state to associate with the memdev
+ * @ops: optional operations to run in cxl_mem::{probe,remove}() context
*
* Upon return the device will have had a chance to attach to the
* cxl_mem driver, but may fail if the CXL topology is not ready
* (hardware CXL link down, or software platform CXL root not attached)
*/
struct cxl_memdev *devm_cxl_add_memdev(struct device *host,
- struct cxl_dev_state *cxlds)
+ struct cxl_dev_state *cxlds,
+ const struct cxl_memdev_ops *ops)
{
- struct cxl_memdev *cxlmd __free(cxlmd_free) = cxl_memdev_alloc(cxlds);
+ struct cxl_memdev *cxlmd __free(cxlmd_free) = cxl_memdev_alloc(cxlds, ops);
int rc;
if (IS_ERR(cxlmd))
@@ -205,6 +213,21 @@ struct cxl_memdev *devm_cxl_add_memdev(struct device *host,
if (rc)
return ERR_PTR(rc);
+ /*
+ * If ops is provided fail if the driver is not attached upon
+ * return. The ->endpoint ERR_PTR may have a more precise error
+ * code to convey. Note that failure here could be the result of
+ * a race to teardown the CXL port topology. I.e.
+ * cxl_mem_probe() could have succeeded and then cxl_mem unbound
+ * before the lock is acquired.
+ */
+ guard(device)(&cxlmd->dev);
+ if (ops && !cxlmd->dev.driver) {
+ if (IS_ERR(cxlmd->endpoint))
+ return ERR_CAST(cxlmd->endpoint);
+ return ERR_PTR(-ENXIO);
+ }
+
return no_free_ptr(cxlmd);
}
EXPORT_SYMBOL_NS_GPL(devm_cxl_add_memdev, "CXL");
diff --git a/drivers/cxl/private.h b/drivers/cxl/private.h
index f8d1ff64f534..7c04797a3a28 100644
--- a/drivers/cxl/private.h
+++ b/drivers/cxl/private.h
@@ -8,7 +8,8 @@
#ifndef __CXL_PRIVATE_H__
#define __CXL_PRIVATE_H__
-struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds);
+struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds,
+ const struct cxl_memdev_ops *ops);
int devm_cxl_memdev_add_or_reset(struct device *host, struct cxl_memdev *cxlmd);
int devm_cxl_add_endpoint(struct device *host, struct cxl_memdev *cxlmd,
struct cxl_dport *parent_dport);
diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index d533481672b7..33d06ec5a4b9 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -1768,7 +1768,7 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
cxl_mock_add_event_logs(&mdata->mes);
- cxlmd = devm_cxl_add_memdev(&pdev->dev, cxlds);
+ cxlmd = devm_cxl_add_memdev(&pdev->dev, cxlds, NULL);
if (IS_ERR(cxlmd))
return PTR_ERR(cxlmd);
--
2.34.1
* [PATCH v21 05/23] cxl: Add type2 device basic support
2025-11-19 19:22 [PATCH v21 00/23] Type2 device basic support alejandro.lucero-palau
` (3 preceding siblings ...)
2025-11-19 19:22 ` [PATCH v21 04/23] cxl/mem: Introduce a memdev creation ->probe() operation alejandro.lucero-palau
@ 2025-11-19 19:22 ` alejandro.lucero-palau
2025-11-19 19:22 ` [PATCH v21 06/23] sfc: add cxl support alejandro.lucero-palau
` (19 subsequent siblings)
24 siblings, 0 replies; 51+ messages in thread
From: alejandro.lucero-palau @ 2025-11-19 19:22 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Jonathan Cameron, Alison Schofield,
Ben Cheatham
From: Alejandro Lucero <alucerop@amd.com>
Differentiate CXL memory expanders (type 3) from CXL device accelerators
(type 2) with a new function for initializing cxl_dev_state and a macro
that helps accel drivers embed cxl_dev_state inside a private struct.
Move the structs to include/cxl, since an accel driver embedding
cxl_dev_state in its private struct needs the full definition to know
its size.
Use the same new initialization in the type3 pci driver.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
---
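Note (not part of the commit): the embedding macro can be illustrated with a
userspace sketch. The assumptions are that mock_dev_state, accel_priv and the
mock_* helpers are hypothetical stand-ins, calloc() replaces devm_kzalloc(),
and the kernel's __same_type() is spelled out with the GCC/Clang builtin
__builtin_types_compatible_p. The two compile-time checks are what make the
final cast safe: the member has the expected type and sits at offset 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

enum mock_devtype { MOCK_DEVTYPE_DEVMEM, MOCK_DEVTYPE_CLASSMEM };

/* Stand-in for struct cxl_dev_state, reduced to a few fields. */
struct mock_dev_state {
	enum mock_devtype type;
	uint64_t serial;
	uint16_t dvsec;
	bool has_mbox;
};

/* Allocate @size zeroed bytes and initialise the leading mock_dev_state. */
static struct mock_dev_state *_mock_dev_state_create(enum mock_devtype type,
						     uint64_t serial,
						     uint16_t dvsec,
						     size_t size, bool has_mbox)
{
	struct mock_dev_state *ds = calloc(1, size);

	if (!ds)
		return NULL;

	*ds = (struct mock_dev_state) {
		.type = type,
		.serial = serial,
		.dvsec = dvsec,
		.has_mbox = has_mbox,
	};
	return ds;
}

/*
 * Allocate a driver struct that embeds a mock_dev_state as its first
 * member. Uses a GNU statement expression, as the kernel macro does.
 */
#define mock_dev_state_create(type, serial, dvsec, drv_struct, member, mbox)	\
	({									\
		_Static_assert(__builtin_types_compatible_p(			\
				struct mock_dev_state,				\
				__typeof__(((drv_struct *)NULL)->member)),	\
			       "member must be a struct mock_dev_state");	\
		_Static_assert(offsetof(drv_struct, member) == 0,		\
			       "member must be first in drv_struct");		\
		(drv_struct *)_mock_dev_state_create(type, serial, dvsec,	\
						     sizeof(drv_struct), mbox);	\
	})

/* Hypothetical accelerator driver private struct. */
struct accel_priv {
	struct mock_dev_state ds;	/* must stay the first member */
	int queue_count;
};
```

Moving ds below queue_count in accel_priv, or changing its type, would turn
into a build failure rather than a silent miscast, which appears to be the
point of the static assertions in the real macro.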
drivers/cxl/core/mbox.c | 12 +-
drivers/cxl/core/memdev.c | 32 +++++
drivers/cxl/core/pci_drv.c | 15 +--
drivers/cxl/cxl.h | 97 +--------------
drivers/cxl/cxlmem.h | 87 +-------------
include/cxl/cxl.h | 226 +++++++++++++++++++++++++++++++++++
tools/testing/cxl/test/mem.c | 3 +-
7 files changed, 276 insertions(+), 196 deletions(-)
create mode 100644 include/cxl/cxl.h
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index fa6dd0c94656..bee84d0101d1 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -1514,23 +1514,21 @@ int cxl_mailbox_init(struct cxl_mailbox *cxl_mbox, struct device *host)
}
EXPORT_SYMBOL_NS_GPL(cxl_mailbox_init, "CXL");
-struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev)
+struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev, u64 serial,
+ u16 dvsec)
{
struct cxl_memdev_state *mds;
int rc;
- mds = devm_kzalloc(dev, sizeof(*mds), GFP_KERNEL);
+ mds = devm_cxl_dev_state_create(dev, CXL_DEVTYPE_CLASSMEM, serial,
+ dvsec, struct cxl_memdev_state, cxlds,
+ true);
if (!mds) {
dev_err(dev, "No memory available\n");
return ERR_PTR(-ENOMEM);
}
mutex_init(&mds->event.log_lock);
- mds->cxlds.dev = dev;
- mds->cxlds.reg_map.host = dev;
- mds->cxlds.cxl_mbox.host = dev;
- mds->cxlds.reg_map.resource = CXL_RESOURCE_NONE;
- mds->cxlds.type = CXL_DEVTYPE_CLASSMEM;
rc = devm_cxl_register_mce_notifier(dev, &mds->mce_notifier);
if (rc == -EOPNOTSUPP)
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index 5e8af91c921e..dd10f17eb6ad 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -649,6 +649,38 @@ static void detach_memdev(struct work_struct *work)
static struct lock_class_key cxl_memdev_key;
+static void cxl_dev_state_init(struct cxl_dev_state *cxlds, struct device *dev,
+ enum cxl_devtype type, u64 serial, u16 dvsec,
+ bool has_mbox)
+{
+ *cxlds = (struct cxl_dev_state) {
+ .dev = dev,
+ .type = type,
+ .serial = serial,
+ .cxl_dvsec = dvsec,
+ .reg_map.host = dev,
+ .reg_map.resource = CXL_RESOURCE_NONE,
+ };
+
+ if (has_mbox)
+ cxlds->cxl_mbox.host = dev;
+}
+
+struct cxl_dev_state *_devm_cxl_dev_state_create(struct device *dev,
+ enum cxl_devtype type,
+ u64 serial, u16 dvsec,
+ size_t size, bool has_mbox)
+{
+ struct cxl_dev_state *cxlds = devm_kzalloc(dev, size, GFP_KERNEL);
+
+ if (!cxlds)
+ return NULL;
+
+ cxl_dev_state_init(cxlds, dev, type, serial, dvsec, has_mbox);
+ return cxlds;
+}
+EXPORT_SYMBOL_NS_GPL(_devm_cxl_dev_state_create, "CXL");
+
int devm_cxl_memdev_add_or_reset(struct device *host, struct cxl_memdev *cxlmd)
{
struct device *dev = &cxlmd->dev;
diff --git a/drivers/cxl/core/pci_drv.c b/drivers/cxl/core/pci_drv.c
index f43590062efd..18ed819d847d 100644
--- a/drivers/cxl/core/pci_drv.c
+++ b/drivers/cxl/core/pci_drv.c
@@ -12,6 +12,7 @@
#include <linux/aer.h>
#include <linux/io.h>
#include <cxl/mailbox.h>
+#include <cxl/cxl.h>
#include "cxlmem.h"
#include "cxlpci.h"
#include "cxl.h"
@@ -912,6 +913,7 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
int rc, pmu_count;
unsigned int i;
bool irq_avail;
+ u16 dvsec;
/*
* Double check the anonymous union trickery in struct cxl_regs
@@ -925,19 +927,18 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
return rc;
pci_set_master(pdev);
- mds = cxl_memdev_state_create(&pdev->dev);
+ dvsec = pci_find_dvsec_capability(pdev, PCI_VENDOR_ID_CXL,
+ PCI_DVSEC_CXL_DEVICE);
+ if (!dvsec)
+ pci_warn(pdev, "Device DVSEC not present, skip CXL.mem init\n");
+
+ mds = cxl_memdev_state_create(&pdev->dev, pci_get_dsn(pdev), dvsec);
if (IS_ERR(mds))
return PTR_ERR(mds);
cxlds = &mds->cxlds;
pci_set_drvdata(pdev, cxlds);
cxlds->rcd = is_cxl_restricted(pdev);
- cxlds->serial = pci_get_dsn(pdev);
- cxlds->cxl_dvsec = pci_find_dvsec_capability(
- pdev, PCI_VENDOR_ID_CXL, PCI_DVSEC_CXL_DEVICE);
- if (!cxlds->cxl_dvsec)
- dev_warn(&pdev->dev,
- "Device DVSEC not present, skip CXL.mem init\n");
rc = cxl_pci_setup_regs(pdev, CXL_REGLOC_RBI_MEMDEV, &map);
if (rc)
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index b7654d40dc9e..1517250b0ec2 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -12,6 +12,7 @@
#include <linux/node.h>
#include <linux/io.h>
#include <linux/range.h>
+#include <cxl/cxl.h>
extern const struct nvdimm_security_ops *cxl_security_ops;
@@ -201,97 +202,6 @@ static inline int ways_to_eiw(unsigned int ways, u8 *eiw)
#define CXLDEV_MBOX_BG_CMD_COMMAND_VENDOR_MASK GENMASK_ULL(63, 48)
#define CXLDEV_MBOX_PAYLOAD_OFFSET 0x20
-/*
- * Using struct_group() allows for per register-block-type helper routines,
- * without requiring block-type agnostic code to include the prefix.
- */
-struct cxl_regs {
- /*
- * Common set of CXL Component register block base pointers
- * @hdm_decoder: CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure
- * @ras: CXL 2.0 8.2.5.9 CXL RAS Capability Structure
- */
- struct_group_tagged(cxl_component_regs, component,
- void __iomem *hdm_decoder;
- void __iomem *ras;
- );
- /*
- * Common set of CXL Device register block base pointers
- * @status: CXL 2.0 8.2.8.3 Device Status Registers
- * @mbox: CXL 2.0 8.2.8.4 Mailbox Registers
- * @memdev: CXL 2.0 8.2.8.5 Memory Device Registers
- */
- struct_group_tagged(cxl_device_regs, device_regs,
- void __iomem *status, *mbox, *memdev;
- );
-
- struct_group_tagged(cxl_pmu_regs, pmu_regs,
- void __iomem *pmu;
- );
-
- /*
- * RCH downstream port specific RAS register
- * @aer: CXL 3.0 8.2.1.1 RCH Downstream Port RCRB
- */
- struct_group_tagged(cxl_rch_regs, rch_regs,
- void __iomem *dport_aer;
- );
-
- /*
- * RCD upstream port specific PCIe cap register
- * @pcie_cap: CXL 3.0 8.2.1.2 RCD Upstream Port RCRB
- */
- struct_group_tagged(cxl_rcd_regs, rcd_regs,
- void __iomem *rcd_pcie_cap;
- );
-};
-
-struct cxl_reg_map {
- bool valid;
- int id;
- unsigned long offset;
- unsigned long size;
-};
-
-struct cxl_component_reg_map {
- struct cxl_reg_map hdm_decoder;
- struct cxl_reg_map ras;
-};
-
-struct cxl_device_reg_map {
- struct cxl_reg_map status;
- struct cxl_reg_map mbox;
- struct cxl_reg_map memdev;
-};
-
-struct cxl_pmu_reg_map {
- struct cxl_reg_map pmu;
-};
-
-/**
- * struct cxl_register_map - DVSEC harvested register block mapping parameters
- * @host: device for devm operations and logging
- * @base: virtual base of the register-block-BAR + @block_offset
- * @resource: physical resource base of the register block
- * @max_size: maximum mapping size to perform register search
- * @reg_type: see enum cxl_regloc_type
- * @component_map: cxl_reg_map for component registers
- * @device_map: cxl_reg_maps for device registers
- * @pmu_map: cxl_reg_maps for CXL Performance Monitoring Units
- */
-struct cxl_register_map {
- struct device *host;
- void __iomem *base;
- resource_size_t resource;
- resource_size_t max_size;
- u8 reg_type;
- union {
- struct cxl_component_reg_map component_map;
- struct cxl_device_reg_map device_map;
- struct cxl_pmu_reg_map pmu_map;
- };
-};
-
void cxl_probe_component_regs(struct device *dev, void __iomem *base,
struct cxl_component_reg_map *map);
void cxl_probe_device_regs(struct device *dev, void __iomem *base,
@@ -497,11 +407,6 @@ struct cxl_region_params {
resource_size_t cache_size;
};
-enum cxl_partition_mode {
- CXL_PARTMODE_RAM,
- CXL_PARTMODE_PMEM,
-};
-
/*
* Indicate whether this region has been assembled by autodetection or
* userspace assembly. Prevent endpoint decoders outside of automatic
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index e55f52a5598d..ceeda8796cba 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -9,6 +9,7 @@
#include <linux/node.h>
#include <cxl/event.h>
#include <cxl/mailbox.h>
+#include <cxl/cxl.h>
#include "cxl.h"
/* CXL 2.0 8.2.8.5.1.1 Memory Device Status Register */
@@ -112,8 +113,6 @@ int devm_cxl_dpa_reserve(struct cxl_endpoint_decoder *cxled,
resource_size_t base, resource_size_t len,
resource_size_t skipped);
-#define CXL_NR_PARTITIONS_MAX 2
-
struct cxl_dpa_info {
u64 size;
struct cxl_dpa_part_info {
@@ -372,87 +371,6 @@ struct cxl_security_state {
struct kernfs_node *sanitize_node;
};
-/*
- * enum cxl_devtype - delineate type-2 from a generic type-3 device
- * @CXL_DEVTYPE_DEVMEM - Vendor specific CXL Type-2 device implementing HDM-D or
- * HDM-DB, no requirement that this device implements a
- * mailbox, or other memory-device-standard manageability
- * flows.
- * @CXL_DEVTYPE_CLASSMEM - Common class definition of a CXL Type-3 device with
- * HDM-H and class-mandatory memory device registers
- */
-enum cxl_devtype {
- CXL_DEVTYPE_DEVMEM,
- CXL_DEVTYPE_CLASSMEM,
-};
-
-/**
- * struct cxl_dpa_perf - DPA performance property entry
- * @dpa_range: range for DPA address
- * @coord: QoS performance data (i.e. latency, bandwidth)
- * @cdat_coord: raw QoS performance data from CDAT
- * @qos_class: QoS Class cookies
- */
-struct cxl_dpa_perf {
- struct range dpa_range;
- struct access_coordinate coord[ACCESS_COORDINATE_MAX];
- struct access_coordinate cdat_coord[ACCESS_COORDINATE_MAX];
- int qos_class;
-};
-
-/**
- * struct cxl_dpa_partition - DPA partition descriptor
- * @res: shortcut to the partition in the DPA resource tree (cxlds->dpa_res)
- * @perf: performance attributes of the partition from CDAT
- * @mode: operation mode for the DPA capacity, e.g. ram, pmem, dynamic...
- */
-struct cxl_dpa_partition {
- struct resource res;
- struct cxl_dpa_perf perf;
- enum cxl_partition_mode mode;
-};
-
-/**
- * struct cxl_dev_state - The driver device state
- *
- * cxl_dev_state represents the CXL driver/device state. It provides an
- * interface to mailbox commands as well as some cached data about the device.
- * Currently only memory devices are represented.
- *
- * @dev: The device associated with this CXL state
- * @cxlmd: The device representing the CXL.mem capabilities of @dev
- * @reg_map: component and ras register mapping parameters
- * @regs: Parsed register blocks
- * @cxl_dvsec: Offset to the PCIe device DVSEC
- * @rcd: operating in RCD mode (CXL 3.0 9.11.8 CXL Devices Attached to an RCH)
- * @media_ready: Indicate whether the device media is usable
- * @dpa_res: Overall DPA resource tree for the device
- * @part: DPA partition array
- * @nr_partitions: Number of DPA partitions
- * @serial: PCIe Device Serial Number
- * @type: Generic Memory Class device or Vendor Specific Memory device
- * @cxl_mbox: CXL mailbox context
- * @cxlfs: CXL features context
- */
-struct cxl_dev_state {
- struct device *dev;
- struct cxl_memdev *cxlmd;
- struct cxl_register_map reg_map;
- struct cxl_regs regs;
- int cxl_dvsec;
- bool rcd;
- bool media_ready;
- struct resource dpa_res;
- struct cxl_dpa_partition part[CXL_NR_PARTITIONS_MAX];
- unsigned int nr_partitions;
- u64 serial;
- enum cxl_devtype type;
- struct cxl_mailbox cxl_mbox;
-#ifdef CONFIG_CXL_FEATURES
- struct cxl_features_state *cxlfs;
-#endif
-};
-
static inline resource_size_t cxl_pmem_size(struct cxl_dev_state *cxlds)
{
/*
@@ -857,7 +775,8 @@ int cxl_dev_state_identify(struct cxl_memdev_state *mds);
int cxl_await_media_ready(struct cxl_dev_state *cxlds);
int cxl_enumerate_cmds(struct cxl_memdev_state *mds);
int cxl_mem_dpa_fetch(struct cxl_memdev_state *mds, struct cxl_dpa_info *info);
-struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev);
+struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev, u64 serial,
+ u16 dvsec);
void set_exclusive_cxl_commands(struct cxl_memdev_state *mds,
unsigned long *cmds);
void clear_exclusive_cxl_commands(struct cxl_memdev_state *mds,
diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
new file mode 100644
index 000000000000..13d448686189
--- /dev/null
+++ b/include/cxl/cxl.h
@@ -0,0 +1,226 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright(c) 2020 Intel Corporation. */
+/* Copyright(c) 2025 Advanced Micro Devices, Inc. */
+
+#ifndef __CXL_CXL_H__
+#define __CXL_CXL_H__
+
+#include <linux/node.h>
+#include <linux/ioport.h>
+#include <cxl/mailbox.h>
+
+/**
+ * enum cxl_devtype - delineate type-2 from a generic type-3 device
+ * @CXL_DEVTYPE_DEVMEM: Vendor specific CXL Type-2 device implementing HDM-D or
+ * HDM-DB, no requirement that this device implements a
+ * mailbox, or other memory-device-standard manageability
+ * flows.
+ * @CXL_DEVTYPE_CLASSMEM: Common class definition of a CXL Type-3 device with
+ * HDM-H and class-mandatory memory device registers
+ */
+enum cxl_devtype {
+ CXL_DEVTYPE_DEVMEM,
+ CXL_DEVTYPE_CLASSMEM,
+};
+
+struct device;
+
+/*
+ * Using struct_group() allows for per register-block-type helper routines,
+ * without requiring block-type agnostic code to include the prefix.
+ */
+struct cxl_regs {
+ /*
+ * Common set of CXL Component register block base pointers
+ * @hdm_decoder: CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure
+ * @ras: CXL 2.0 8.2.5.9 CXL RAS Capability Structure
+ */
+ struct_group_tagged(cxl_component_regs, component,
+ void __iomem *hdm_decoder;
+ void __iomem *ras;
+ );
+ /*
+ * Common set of CXL Device register block base pointers
+ * @status: CXL 2.0 8.2.8.3 Device Status Registers
+ * @mbox: CXL 2.0 8.2.8.4 Mailbox Registers
+ * @memdev: CXL 2.0 8.2.8.5 Memory Device Registers
+ */
+ struct_group_tagged(cxl_device_regs, device_regs,
+ void __iomem *status, *mbox, *memdev;
+ );
+
+ struct_group_tagged(cxl_pmu_regs, pmu_regs,
+ void __iomem *pmu;
+ );
+
+ /*
+ * RCH downstream port specific RAS register
+ * @aer: CXL 3.0 8.2.1.1 RCH Downstream Port RCRB
+ */
+ struct_group_tagged(cxl_rch_regs, rch_regs,
+ void __iomem *dport_aer;
+ );
+
+ /*
+ * RCD upstream port specific PCIe cap register
+ * @pcie_cap: CXL 3.0 8.2.1.2 RCD Upstream Port RCRB
+ */
+ struct_group_tagged(cxl_rcd_regs, rcd_regs,
+ void __iomem *rcd_pcie_cap;
+ );
+};
+
+struct cxl_reg_map {
+ bool valid;
+ int id;
+ unsigned long offset;
+ unsigned long size;
+};
+
+struct cxl_component_reg_map {
+ struct cxl_reg_map hdm_decoder;
+ struct cxl_reg_map ras;
+};
+
+struct cxl_device_reg_map {
+ struct cxl_reg_map status;
+ struct cxl_reg_map mbox;
+ struct cxl_reg_map memdev;
+};
+
+struct cxl_pmu_reg_map {
+ struct cxl_reg_map pmu;
+};
+
+/**
+ * struct cxl_register_map - DVSEC harvested register block mapping parameters
+ * @host: device for devm operations and logging
+ * @base: virtual base of the register-block-BAR + @block_offset
+ * @resource: physical resource base of the register block
+ * @max_size: maximum mapping size to perform register search
+ * @reg_type: see enum cxl_regloc_type
+ * @component_map: cxl_reg_map for component registers
+ * @device_map: cxl_reg_maps for device registers
+ * @pmu_map: cxl_reg_maps for CXL Performance Monitoring Units
+ */
+struct cxl_register_map {
+ struct device *host;
+ void __iomem *base;
+ resource_size_t resource;
+ resource_size_t max_size;
+ u8 reg_type;
+ union {
+ struct cxl_component_reg_map component_map;
+ struct cxl_device_reg_map device_map;
+ struct cxl_pmu_reg_map pmu_map;
+ };
+};
+
+/**
+ * struct cxl_dpa_perf - DPA performance property entry
+ * @dpa_range: range for DPA address
+ * @coord: QoS performance data (i.e. latency, bandwidth)
+ * @cdat_coord: raw QoS performance data from CDAT
+ * @qos_class: QoS Class cookies
+ */
+struct cxl_dpa_perf {
+ struct range dpa_range;
+ struct access_coordinate coord[ACCESS_COORDINATE_MAX];
+ struct access_coordinate cdat_coord[ACCESS_COORDINATE_MAX];
+ int qos_class;
+};
+
+enum cxl_partition_mode {
+ CXL_PARTMODE_RAM,
+ CXL_PARTMODE_PMEM,
+};
+
+/**
+ * struct cxl_dpa_partition - DPA partition descriptor
+ * @res: shortcut to the partition in the DPA resource tree (cxlds->dpa_res)
+ * @perf: performance attributes of the partition from CDAT
+ * @mode: operation mode for the DPA capacity, e.g. ram, pmem, dynamic...
+ */
+struct cxl_dpa_partition {
+ struct resource res;
+ struct cxl_dpa_perf perf;
+ enum cxl_partition_mode mode;
+};
+
+#define CXL_NR_PARTITIONS_MAX 2
+
+/**
+ * struct cxl_dev_state - The driver device state
+ *
+ * cxl_dev_state represents the CXL driver/device state. It provides an
+ * interface to mailbox commands as well as some cached data about the device.
+ * Currently only memory devices are represented.
+ *
+ * @dev: The device associated with this CXL state
+ * @cxlmd: The device representing the CXL.mem capabilities of @dev
+ * @reg_map: component and ras register mapping parameters
+ * @regs: Parsed register blocks
+ * @cxl_dvsec: Offset to the PCIe device DVSEC
+ * @rcd: operating in RCD mode (CXL 3.0 9.11.8 CXL Devices Attached to an RCH)
+ * @media_ready: Indicate whether the device media is usable
+ * @dpa_res: Overall DPA resource tree for the device
+ * @part: DPA partition array
+ * @nr_partitions: Number of DPA partitions
+ * @serial: PCIe Device Serial Number
+ * @type: Generic Memory Class device or Vendor Specific Memory device
+ * @cxl_mbox: CXL mailbox context
+ * @cxlfs: CXL features context
+ */
+struct cxl_dev_state {
+ /* public for Type2 drivers */
+ struct device *dev;
+ struct cxl_memdev *cxlmd;
+
+ /* private for Type2 drivers */
+ struct cxl_register_map reg_map;
+ struct cxl_regs regs;
+ int cxl_dvsec;
+ bool rcd;
+ bool media_ready;
+ struct resource dpa_res;
+ struct cxl_dpa_partition part[CXL_NR_PARTITIONS_MAX];
+ unsigned int nr_partitions;
+ u64 serial;
+ enum cxl_devtype type;
+ struct cxl_mailbox cxl_mbox;
+#ifdef CONFIG_CXL_FEATURES
+ struct cxl_features_state *cxlfs;
+#endif
+};
+
+struct cxl_dev_state *_devm_cxl_dev_state_create(struct device *dev,
+ enum cxl_devtype type,
+ u64 serial, u16 dvsec,
+ size_t size, bool has_mbox);
+
+/**
+ * devm_cxl_dev_state_create - safely create and cast a cxl dev state embedded
+ * in a driver specific struct.
+ *
+ * @parent: device behind the request
+ * @type: CXL device type
+ * @serial: device identification
+ * @dvsec: dvsec capability offset
+ * @drv_struct: driver struct embedding a cxl_dev_state struct
+ * @member: drv_struct member as cxl_dev_state
+ * @mbox: true if mailbox supported
+ *
+ * Returns a pointer to the allocated drv_struct embedding an initialized
+ * cxl_dev_state struct, or NULL on allocation failure.
+ *
+ * Introduced for Type2 driver support.
+ */
+#define devm_cxl_dev_state_create(parent, type, serial, dvsec, drv_struct, member, mbox) \
+ ({ \
+ static_assert(__same_type(struct cxl_dev_state, \
+ ((drv_struct *)NULL)->member)); \
+ static_assert(offsetof(drv_struct, member) == 0); \
+ (drv_struct *)_devm_cxl_dev_state_create(parent, type, serial, dvsec, \
+ sizeof(drv_struct), mbox); \
+ })
+#endif /* __CXL_CXL_H__ */
diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
index 33d06ec5a4b9..6fbe0af3e8f8 100644
--- a/tools/testing/cxl/test/mem.c
+++ b/tools/testing/cxl/test/mem.c
@@ -1717,7 +1717,7 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
if (rc)
return rc;
- mds = cxl_memdev_state_create(dev);
+ mds = cxl_memdev_state_create(dev, pdev->id + 1, 0);
if (IS_ERR(mds))
return PTR_ERR(mds);
@@ -1733,7 +1733,6 @@ static int cxl_mock_mem_probe(struct platform_device *pdev)
mds->event.buf = (struct cxl_get_event_payload *) mdata->event_buf;
INIT_DELAYED_WORK(&mds->security.poll_dwork, cxl_mockmem_sanitize_work);
- cxlds->serial = pdev->id + 1;
if (is_rcd(pdev))
cxlds->rcd = true;
--
2.34.1
* [PATCH v21 06/23] sfc: add cxl support
2025-11-19 19:22 [PATCH v21 00/23] Type2 device basic support alejandro.lucero-palau
` (4 preceding siblings ...)
2025-11-19 19:22 ` [PATCH v21 05/23] cxl: Add type2 device basic support alejandro.lucero-palau
@ 2025-11-19 19:22 ` alejandro.lucero-palau
2025-11-19 19:22 ` [PATCH v21 07/23] cxl: Move pci generic code alejandro.lucero-palau
` (18 subsequent siblings)
24 siblings, 0 replies; 51+ messages in thread
From: alejandro.lucero-palau @ 2025-11-19 19:22 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Jonathan Cameron, Edward Cree, Alison Schofield
From: Alejandro Lucero <alucerop@amd.com>
Add CXL initialization based on new CXL API for accel drivers and make
it dependent on kernel CXL configuration.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
---
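Note (not part of the commit): the probe-time behaviour added to efx.c is
"try CXL, log on failure, carry on with legacy PIO". A userspace sketch of
that pattern follows. The assumptions are that all mock_* names, the
pio_backend enum and the cxl_rc parameter (which simulates the result of CXL
setup) are hypothetical, and that the flag only loosely mirrors the
"initialised" flag mentioned in the probe comment.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

struct mock_probe_data {
	bool cxl_pio_initialised;
};

enum pio_backend { PIO_LEGACY_BAR, PIO_CXL_REGION };

/* Hypothetical CXL bring-up: only sets the flag when setup succeeded. */
static int mock_cxl_init(struct mock_probe_data *pd, int cxl_rc)
{
	if (cxl_rc)
		return cxl_rc;

	pd->cxl_pio_initialised = true;
	return 0;
}

/* Probe keeps going on CXL failure and reports which PIO path is used. */
static enum pio_backend mock_probe(struct mock_probe_data *pd, int cxl_rc)
{
	int rc = mock_cxl_init(pd, cxl_rc);

	if (rc)
		fprintf(stderr, "CXL initialization failed with error %d\n",
			rc);

	return pd->cxl_pio_initialised ? PIO_CXL_REGION : PIO_LEGACY_BAR;
}
```

The error from the CXL step is logged but deliberately not propagated, so a
device without a usable CXL link still probes as a plain PCIe NIC.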
drivers/net/ethernet/sfc/Kconfig | 9 +++++
drivers/net/ethernet/sfc/Makefile | 1 +
drivers/net/ethernet/sfc/efx.c | 15 ++++++-
drivers/net/ethernet/sfc/efx_cxl.c | 56 +++++++++++++++++++++++++++
drivers/net/ethernet/sfc/efx_cxl.h | 40 +++++++++++++++++++
drivers/net/ethernet/sfc/net_driver.h | 10 +++++
6 files changed, 130 insertions(+), 1 deletion(-)
create mode 100644 drivers/net/ethernet/sfc/efx_cxl.c
create mode 100644 drivers/net/ethernet/sfc/efx_cxl.h
diff --git a/drivers/net/ethernet/sfc/Kconfig b/drivers/net/ethernet/sfc/Kconfig
index c4c43434f314..979f2801e2a8 100644
--- a/drivers/net/ethernet/sfc/Kconfig
+++ b/drivers/net/ethernet/sfc/Kconfig
@@ -66,6 +66,15 @@ config SFC_MCDI_LOGGING
Driver-Interface) commands and responses, allowing debugging of
driver/firmware interaction. The tracing is actually enabled by
a sysfs file 'mcdi_logging' under the PCI device.
+config SFC_CXL
+ bool "Solarflare SFC9100-family CXL support"
+ depends on SFC && CXL_BUS >= SFC
+ default SFC
+ help
+ This enables SFC CXL support, which uses CTPIO over CXL.mem when
+ the kernel is configured with CXL. An SFC device with CXL support
+ and CXL-aware firmware can use it to minimize latency when
+ sending through CTPIO.
source "drivers/net/ethernet/sfc/falcon/Kconfig"
source "drivers/net/ethernet/sfc/siena/Kconfig"
diff --git a/drivers/net/ethernet/sfc/Makefile b/drivers/net/ethernet/sfc/Makefile
index d99039ec468d..bb0f1891cde6 100644
--- a/drivers/net/ethernet/sfc/Makefile
+++ b/drivers/net/ethernet/sfc/Makefile
@@ -13,6 +13,7 @@ sfc-$(CONFIG_SFC_SRIOV) += sriov.o ef10_sriov.o ef100_sriov.o ef100_rep.o \
mae.o tc.o tc_bindings.o tc_counters.o \
tc_encap_actions.o tc_conntrack.o
+sfc-$(CONFIG_SFC_CXL) += efx_cxl.o
obj-$(CONFIG_SFC) += sfc.o
obj-$(CONFIG_SFC_FALCON) += falcon/
diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c
index 112e55b98ed3..537668278375 100644
--- a/drivers/net/ethernet/sfc/efx.c
+++ b/drivers/net/ethernet/sfc/efx.c
@@ -34,6 +34,7 @@
#include "selftest.h"
#include "sriov.h"
#include "efx_devlink.h"
+#include "efx_cxl.h"
#include "mcdi_port_common.h"
#include "mcdi_pcol.h"
@@ -981,12 +982,15 @@ static void efx_pci_remove(struct pci_dev *pci_dev)
efx_pci_remove_main(efx);
efx_fini_io(efx);
+
+ probe_data = container_of(efx, struct efx_probe_data, efx);
+ efx_cxl_exit(probe_data);
+
pci_dbg(efx->pci_dev, "shutdown successful\n");
efx_fini_devlink_and_unlock(efx);
efx_fini_struct(efx);
free_netdev(efx->net_dev);
- probe_data = container_of(efx, struct efx_probe_data, efx);
kfree(probe_data);
};
@@ -1190,6 +1194,15 @@ static int efx_pci_probe(struct pci_dev *pci_dev,
if (rc)
goto fail2;
+ /* A successful cxl initialization implies a CXL region created to be
+ * used for PIO buffers. If there is no CXL support, or initialization
+ * fails, probe_data->cxl_pio_initialised will be false and legacy PIO
+ * buffers defined at specific PCI BAR regions will be used.
+ */
+ rc = efx_cxl_init(probe_data);
+ if (rc)
+ pci_err(pci_dev, "CXL initialization failed with error %d\n", rc);
+
rc = efx_pci_probe_post_io(efx);
if (rc) {
/* On failure, retry once immediately.
diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
new file mode 100644
index 000000000000..8e0481d8dced
--- /dev/null
+++ b/drivers/net/ethernet/sfc/efx_cxl.c
@@ -0,0 +1,56 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/****************************************************************************
+ *
+ * Driver for AMD network controllers and boards
+ * Copyright (C) 2025, Advanced Micro Devices, Inc.
+ */
+
+#include <linux/pci.h>
+
+#include "net_driver.h"
+#include "efx_cxl.h"
+
+#define EFX_CTPIO_BUFFER_SIZE SZ_256M
+
+int efx_cxl_init(struct efx_probe_data *probe_data)
+{
+ struct efx_nic *efx = &probe_data->efx;
+ struct pci_dev *pci_dev = efx->pci_dev;
+ struct efx_cxl *cxl;
+ u16 dvsec;
+
+ probe_data->cxl_pio_initialised = false;
+
+ /* Is the device configured with and using CXL? */
+ if (!pcie_is_cxl(pci_dev))
+ return 0;
+
+ dvsec = pci_find_dvsec_capability(pci_dev, PCI_VENDOR_ID_CXL,
+ PCI_DVSEC_CXL_DEVICE);
+ if (!dvsec) {
+ pci_err(pci_dev, "CXL_DVSEC_PCIE_DEVICE capability not found\n");
+ return 0;
+ }
+
+ pci_dbg(pci_dev, "CXL_DVSEC_PCIE_DEVICE capability found\n");
+
+ /* Create a cxl_dev_state embedded in the cxl struct using cxl core api
+ * specifying no mbox available.
+ */
+ cxl = devm_cxl_dev_state_create(&pci_dev->dev, CXL_DEVTYPE_DEVMEM,
+ pci_dev->dev.id, dvsec, struct efx_cxl,
+ cxlds, false);
+
+ if (!cxl)
+ return -ENOMEM;
+
+ probe_data->cxl = cxl;
+
+ return 0;
+}
+
+void efx_cxl_exit(struct efx_probe_data *probe_data)
+{
+}
+
+MODULE_IMPORT_NS("CXL");
diff --git a/drivers/net/ethernet/sfc/efx_cxl.h b/drivers/net/ethernet/sfc/efx_cxl.h
new file mode 100644
index 000000000000..961639cef692
--- /dev/null
+++ b/drivers/net/ethernet/sfc/efx_cxl.h
@@ -0,0 +1,40 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/****************************************************************************
+ * Driver for AMD network controllers and boards
+ * Copyright (C) 2025, Advanced Micro Devices, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation, incorporated herein by reference.
+ */
+
+#ifndef EFX_CXL_H
+#define EFX_CXL_H
+
+#ifdef CONFIG_SFC_CXL
+
+#include <cxl/cxl.h>
+
+struct cxl_root_decoder;
+struct cxl_port;
+struct cxl_endpoint_decoder;
+struct cxl_region;
+struct efx_probe_data;
+
+struct efx_cxl {
+ struct cxl_dev_state cxlds;
+ struct cxl_memdev *cxlmd;
+ struct cxl_root_decoder *cxlrd;
+ struct cxl_port *endpoint;
+ struct cxl_endpoint_decoder *cxled;
+ struct cxl_region *efx_region;
+ void __iomem *ctpio_cxl;
+};
+
+int efx_cxl_init(struct efx_probe_data *probe_data);
+void efx_cxl_exit(struct efx_probe_data *probe_data);
+#else
+static inline int efx_cxl_init(struct efx_probe_data *probe_data) { return 0; }
+static inline void efx_cxl_exit(struct efx_probe_data *probe_data) {}
+#endif
+#endif
diff --git a/drivers/net/ethernet/sfc/net_driver.h b/drivers/net/ethernet/sfc/net_driver.h
index b98c259f672d..3964b2c56609 100644
--- a/drivers/net/ethernet/sfc/net_driver.h
+++ b/drivers/net/ethernet/sfc/net_driver.h
@@ -1197,14 +1197,24 @@ struct efx_nic {
atomic_t n_rx_noskb_drops;
};
+#ifdef CONFIG_SFC_CXL
+struct efx_cxl;
+#endif
+
/**
* struct efx_probe_data - State after hardware probe
* @pci_dev: The PCI device
* @efx: Efx NIC details
+ * @cxl: details of related cxl objects
+ * @cxl_pio_initialised: cxl initialization outcome.
*/
struct efx_probe_data {
struct pci_dev *pci_dev;
struct efx_nic efx;
+#ifdef CONFIG_SFC_CXL
+ struct efx_cxl *cxl;
+ bool cxl_pio_initialised;
+#endif
};
static inline struct efx_nic *efx_netdev_priv(struct net_device *dev)
--
2.34.1
* [PATCH v21 07/23] cxl: Move pci generic code
2025-11-19 19:22 [PATCH v21 00/23] Type2 device basic support alejandro.lucero-palau
` (5 preceding siblings ...)
2025-11-19 19:22 ` [PATCH v21 06/23] sfc: add cxl support alejandro.lucero-palau
@ 2025-11-19 19:22 ` alejandro.lucero-palau
2025-11-19 19:22 ` [PATCH v21 08/23] cxl/sfc: Map cxl component regs alejandro.lucero-palau
` (17 subsequent siblings)
24 siblings, 0 replies; 51+ messages in thread
From: alejandro.lucero-palau @ 2025-11-19 19:22 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Ben Cheatham, Fan Ni, Jonathan Cameron,
Alison Schofield
From: Alejandro Lucero <alucerop@amd.com>
cxl/core/pci.c holds the helpers for CXL PCIe initialization, while
cxl/core/pci_drv.c implements the Type3 device initialization.
Move the helper functions from cxl/core/pci_drv.c to cxl/core/pci.c so
that they can be exported and shared with CXL Type2 device initialization.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
---
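Note for readers, not part of the patch: the fallback order implemented by
cxl_pci_setup_regs() can be sketched in userspace C. This is a simplified
model under stated assumptions. All mock_* names are hypothetical, the
-ENODEV value for a missing Register Locator entry is an assumption, and
the port lookup (-EPROBE_DEFER) and link capability mapping steps are left
out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical subset of enum cxl_regloc_type. */
enum mock_regloc_type { MOCK_RBI_EMPTY, MOCK_RBI_COMPONENT, MOCK_RBI_MEMDEV };

/* Where the register block was eventually found. */
enum mock_source { MOCK_SRC_NONE, MOCK_SRC_REGLOC, MOCK_SRC_RCRB };

/* Hypothetical device description. */
struct mock_dev {
	bool has_regloc;	/* models cxl_find_regblock() succeeding */
	bool is_rc_end;		/* models is_cxl_restricted() */
	bool has_rcrb;		/* models a valid RCRB component block */
};

static int mock_setup_regs(const struct mock_dev *d,
			   enum mock_regloc_type type,
			   enum mock_source *src)
{
	/* Assumption: a missing Register Locator entry yields -ENODEV. */
	int rc = d->has_regloc ? 0 : -ENODEV;

	*src = MOCK_SRC_NONE;

	/* Only component registers of a restricted device fall back to RCRB. */
	if (rc && type == MOCK_RBI_COMPONENT && d->is_rc_end) {
		if (!d->has_rcrb)
			return -ENXIO;
		*src = MOCK_SRC_RCRB;
		return 0;
	} else if (rc) {
		return rc;
	}

	*src = MOCK_SRC_REGLOC;
	return 0;
}
```

The fallback applies to one combination only: lookup failure, component
register type, and a Root Complex integrated endpoint. Every other failure
is returned to the caller unchanged.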
drivers/cxl/core/core.h | 3 ++
drivers/cxl/core/pci.c | 62 +++++++++++++++++++++++++++++++++
drivers/cxl/core/pci_drv.c | 70 --------------------------------------
drivers/cxl/core/regs.c | 1 -
drivers/cxl/cxl.h | 2 --
drivers/cxl/cxlpci.h | 13 +++++++
6 files changed, 78 insertions(+), 73 deletions(-)
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index a7a0838c8f23..2b2d3af0b5ec 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -232,4 +232,7 @@ static inline bool cxl_pci_drv_bound(struct pci_dev *pdev) { return false; };
static inline int cxl_pci_driver_init(void) { return 0; }
static inline void cxl_pci_driver_exit(void) { }
#endif
+
+resource_size_t cxl_rcd_component_reg_phys(struct device *dev,
+ struct cxl_dport *dport);
#endif /* __CXL_CORE_H__ */
diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index a66f7a84b5c8..566d57ba0579 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -775,6 +775,68 @@ bool cxl_endpoint_decoder_reset_detected(struct cxl_port *port)
}
EXPORT_SYMBOL_NS_GPL(cxl_endpoint_decoder_reset_detected, "CXL");
+static int cxl_rcrb_get_comp_regs(struct pci_dev *pdev,
+ struct cxl_register_map *map,
+ struct cxl_dport *dport)
+{
+ resource_size_t component_reg_phys;
+
+ *map = (struct cxl_register_map) {
+ .host = &pdev->dev,
+ .resource = CXL_RESOURCE_NONE,
+ };
+
+ struct cxl_port *port __free(put_cxl_port) =
+ cxl_pci_find_port(pdev, &dport);
+ if (!port)
+ return -EPROBE_DEFER;
+
+ component_reg_phys = cxl_rcd_component_reg_phys(&pdev->dev, dport);
+ if (component_reg_phys == CXL_RESOURCE_NONE)
+ return -ENXIO;
+
+ map->resource = component_reg_phys;
+ map->reg_type = CXL_REGLOC_RBI_COMPONENT;
+ map->max_size = CXL_COMPONENT_REG_BLOCK_SIZE;
+
+ return 0;
+}
+
+int cxl_pci_setup_regs(struct pci_dev *pdev, enum cxl_regloc_type type,
+ struct cxl_register_map *map)
+{
+ int rc;
+
+ rc = cxl_find_regblock(pdev, type, map);
+
+ /*
+ * If the Register Locator DVSEC does not exist, check if it
+ * is an RCH and try to extract the Component Registers from
+ * an RCRB.
+ */
+ if (rc && type == CXL_REGLOC_RBI_COMPONENT && is_cxl_restricted(pdev)) {
+ struct cxl_dport *dport;
+ struct cxl_port *port __free(put_cxl_port) =
+ cxl_pci_find_port(pdev, &dport);
+ if (!port)
+ return -EPROBE_DEFER;
+
+ rc = cxl_rcrb_get_comp_regs(pdev, map, dport);
+ if (rc)
+ return rc;
+
+ rc = cxl_dport_map_rcd_linkcap(pdev, dport);
+ if (rc)
+ return rc;
+
+ } else if (rc) {
+ return rc;
+ }
+
+ return cxl_setup_regs(map);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_pci_setup_regs, "CXL");
+
int cxl_pci_get_bandwidth(struct pci_dev *pdev, struct access_coordinate *c)
{
int speed, bw;
diff --git a/drivers/cxl/core/pci_drv.c b/drivers/cxl/core/pci_drv.c
index 18ed819d847d..a35e746e6303 100644
--- a/drivers/cxl/core/pci_drv.c
+++ b/drivers/cxl/core/pci_drv.c
@@ -467,76 +467,6 @@ static int cxl_pci_setup_mailbox(struct cxl_memdev_state *mds, bool irq_avail)
return 0;
}
-/*
- * Assume that any RCIEP that emits the CXL memory expander class code
- * is an RCD
- */
-static bool is_cxl_restricted(struct pci_dev *pdev)
-{
- return pci_pcie_type(pdev) == PCI_EXP_TYPE_RC_END;
-}
-
-static int cxl_rcrb_get_comp_regs(struct pci_dev *pdev,
- struct cxl_register_map *map,
- struct cxl_dport *dport)
-{
- resource_size_t component_reg_phys;
-
- *map = (struct cxl_register_map) {
- .host = &pdev->dev,
- .resource = CXL_RESOURCE_NONE,
- };
-
- struct cxl_port *port __free(put_cxl_port) =
- cxl_pci_find_port(pdev, &dport);
- if (!port)
- return -EPROBE_DEFER;
-
- component_reg_phys = cxl_rcd_component_reg_phys(&pdev->dev, dport);
- if (component_reg_phys == CXL_RESOURCE_NONE)
- return -ENXIO;
-
- map->resource = component_reg_phys;
- map->reg_type = CXL_REGLOC_RBI_COMPONENT;
- map->max_size = CXL_COMPONENT_REG_BLOCK_SIZE;
-
- return 0;
-}
-
-static int cxl_pci_setup_regs(struct pci_dev *pdev, enum cxl_regloc_type type,
- struct cxl_register_map *map)
-{
- int rc;
-
- rc = cxl_find_regblock(pdev, type, map);
-
- /*
- * If the Register Locator DVSEC does not exist, check if it
- * is an RCH and try to extract the Component Registers from
- * an RCRB.
- */
- if (rc && type == CXL_REGLOC_RBI_COMPONENT && is_cxl_restricted(pdev)) {
- struct cxl_dport *dport;
- struct cxl_port *port __free(put_cxl_port) =
- cxl_pci_find_port(pdev, &dport);
- if (!port)
- return -EPROBE_DEFER;
-
- rc = cxl_rcrb_get_comp_regs(pdev, map, dport);
- if (rc)
- return rc;
-
- rc = cxl_dport_map_rcd_linkcap(pdev, dport);
- if (rc)
- return rc;
-
- } else if (rc) {
- return rc;
- }
-
- return cxl_setup_regs(map);
-}
-
static int cxl_pci_ras_unmask(struct pci_dev *pdev)
{
struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c
index fb70ffbba72d..fc7fbd4f39d2 100644
--- a/drivers/cxl/core/regs.c
+++ b/drivers/cxl/core/regs.c
@@ -641,4 +641,3 @@ resource_size_t cxl_rcd_component_reg_phys(struct device *dev,
return CXL_RESOURCE_NONE;
return __rcrb_to_component(dev, &dport->rcrb, CXL_RCRB_UPSTREAM);
}
-EXPORT_SYMBOL_NS_GPL(cxl_rcd_component_reg_phys, "CXL");
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 1517250b0ec2..536c9d99e0e6 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -222,8 +222,6 @@ int cxl_find_regblock(struct pci_dev *pdev, enum cxl_regloc_type type,
struct cxl_register_map *map);
int cxl_setup_regs(struct cxl_register_map *map);
struct cxl_dport;
-resource_size_t cxl_rcd_component_reg_phys(struct device *dev,
- struct cxl_dport *dport);
int cxl_dport_map_rcd_linkcap(struct pci_dev *pdev, struct cxl_dport *dport);
#define CXL_RESOURCE_NONE ((resource_size_t) -1)
diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
index 3526e6d75f79..24aba9ff6d2e 100644
--- a/drivers/cxl/cxlpci.h
+++ b/drivers/cxl/cxlpci.h
@@ -74,6 +74,17 @@ static inline bool cxl_pci_flit_256(struct pci_dev *pdev)
return lnksta2 & PCI_EXP_LNKSTA2_FLIT;
}
+/*
+ * Assume that the caller has already validated that @pdev has CXL
+ * capabilities. Any RCiEP with CXL capabilities is treated as a
+ * Restricted CXL Device (RCD) and finds upstream port and endpoint
+ * registers in a Root Complex Register Block (RCRB).
+ */
+static inline bool is_cxl_restricted(struct pci_dev *pdev)
+{
+ return pci_pcie_type(pdev) == PCI_EXP_TYPE_RC_END;
+}
+
int devm_cxl_port_enumerate_dports(struct cxl_port *port);
struct cxl_dev_state;
void read_cdat_data(struct cxl_port *port);
@@ -89,4 +100,6 @@ static inline void cxl_uport_init_ras_reporting(struct cxl_port *port,
struct device *host) { }
#endif
+int cxl_pci_setup_regs(struct pci_dev *pdev, enum cxl_regloc_type type,
+ struct cxl_register_map *map);
#endif /* __CXL_PCI_H__ */
--
2.34.1
* [PATCH v21 08/23] cxl/sfc: Map cxl component regs
2025-11-19 19:22 [PATCH v21 00/23] Type2 device basic support alejandro.lucero-palau
` (6 preceding siblings ...)
2025-11-19 19:22 ` [PATCH v21 07/23] cxl: Move pci generic code alejandro.lucero-palau
@ 2025-11-19 19:22 ` alejandro.lucero-palau
2025-11-21 6:54 ` PJ Waskiewicz
2025-11-19 19:22 ` [PATCH v21 09/23] cxl/sfc: Initialize dpa without a mailbox alejandro.lucero-palau
` (16 subsequent siblings)
24 siblings, 1 reply; 51+ messages in thread
From: alejandro.lucero-palau @ 2025-11-19 19:22 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Edward Cree, Jonathan Cameron, Ben Cheatham
From: Alejandro Lucero <alucerop@amd.com>
Export cxl core functions so that a Type2 driver can discover and map
the device component registers.
Use them in the sfc driver cxl initialization.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
---
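Note for readers, not part of the patch: the way the sfc code selects
which component capability gets mapped can be sketched in userspace C.
This is a minimal model under stated assumptions. The mock_* names are
hypothetical, the capability IDs (0x2 for RAS, 0x5 for HDM) are the values
moved to include/cxl/cxl.h below, and the loop only approximates what
cxl_map_component_regs() is expected to do with map_mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MOCK_CAP_ID_RAS	0x2	/* same value as CXL_CM_CAP_CAP_ID_RAS */
#define MOCK_CAP_ID_HDM	0x5	/* same value as CXL_CM_CAP_CAP_ID_HDM */
#define MOCK_BIT(n)	(1UL << (n))

/* Hypothetical stand-in for struct cxl_reg_map. */
struct mock_reg_map {
	bool valid;
	int id;
};

/* Hypothetical stand-in for struct cxl_component_reg_map. */
struct mock_component_map {
	struct mock_reg_map hdm_decoder;
	struct mock_reg_map ras;
};

/*
 * Returns a bitmask of the capabilities that would be mapped: a capability
 * is mapped only if it was discovered (valid) and requested in map_mask.
 */
static unsigned long mock_map_component_regs(const struct mock_component_map *m,
					      unsigned long map_mask)
{
	const struct mock_reg_map *caps[] = { &m->hdm_decoder, &m->ras };
	unsigned long mapped = 0;
	size_t i;

	for (i = 0; i < sizeof(caps) / sizeof(caps[0]); i++) {
		if (!caps[i]->valid)
			continue;
		if (!(map_mask & MOCK_BIT(caps[i]->id)))
			continue;
		mapped |= MOCK_BIT(caps[i]->id);
	}

	return mapped;
}
```

With both capabilities discovered and a mask of MOCK_BIT(MOCK_CAP_ID_RAS),
only the RAS block is selected, which matches the single
BIT(CXL_CM_CAP_CAP_ID_RAS) request in efx_cxl_init().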
drivers/cxl/core/pci.c | 1 +
drivers/cxl/core/pci_drv.c | 1 +
drivers/cxl/core/port.c | 1 +
drivers/cxl/core/regs.c | 1 +
drivers/cxl/cxl.h | 7 ------
drivers/cxl/cxlpci.h | 12 ----------
drivers/net/ethernet/sfc/efx_cxl.c | 35 ++++++++++++++++++++++++++++++
include/cxl/cxl.h | 19 ++++++++++++++++
include/cxl/pci.h | 21 ++++++++++++++++++
9 files changed, 79 insertions(+), 19 deletions(-)
create mode 100644 include/cxl/pci.h
diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index 566d57ba0579..90a0763e72c4 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -6,6 +6,7 @@
#include <linux/delay.h>
#include <linux/pci.h>
#include <linux/pci-doe.h>
+#include <cxl/pci.h>
#include <linux/aer.h>
#include <cxlpci.h>
#include <cxlmem.h>
diff --git a/drivers/cxl/core/pci_drv.c b/drivers/cxl/core/pci_drv.c
index a35e746e6303..4c767e2471b8 100644
--- a/drivers/cxl/core/pci_drv.c
+++ b/drivers/cxl/core/pci_drv.c
@@ -11,6 +11,7 @@
#include <linux/pci.h>
#include <linux/aer.h>
#include <linux/io.h>
+#include <cxl/pci.h>
#include <cxl/mailbox.h>
#include <cxl/cxl.h>
#include "cxlmem.h"
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index d19ebf052d76..7c828c75e7b8 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -11,6 +11,7 @@
#include <linux/idr.h>
#include <linux/node.h>
#include <cxl/einj.h>
+#include <cxl/pci.h>
#include <cxlmem.h>
#include <cxlpci.h>
#include <cxl.h>
diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c
index fc7fbd4f39d2..dcf444f1fe48 100644
--- a/drivers/cxl/core/regs.c
+++ b/drivers/cxl/core/regs.c
@@ -4,6 +4,7 @@
#include <linux/device.h>
#include <linux/slab.h>
#include <linux/pci.h>
+#include <cxl/pci.h>
#include <cxlmem.h>
#include <cxlpci.h>
#include <pmu.h>
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 536c9d99e0e6..d7ddca6f7115 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -39,10 +39,6 @@ extern const struct nvdimm_security_ops *cxl_security_ops;
#define CXL_CM_CAP_HDR_ARRAY_SIZE_MASK GENMASK(31, 24)
#define CXL_CM_CAP_PTR_MASK GENMASK(31, 20)
-#define CXL_CM_CAP_CAP_ID_RAS 0x2
-#define CXL_CM_CAP_CAP_ID_HDM 0x5
-#define CXL_CM_CAP_CAP_HDM_VERSION 1
-
/* HDM decoders CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure */
#define CXL_HDM_DECODER_CAP_OFFSET 0x0
#define CXL_HDM_DECODER_COUNT_MASK GENMASK(3, 0)
@@ -206,9 +202,6 @@ void cxl_probe_component_regs(struct device *dev, void __iomem *base,
struct cxl_component_reg_map *map);
void cxl_probe_device_regs(struct device *dev, void __iomem *base,
struct cxl_device_reg_map *map);
-int cxl_map_component_regs(const struct cxl_register_map *map,
- struct cxl_component_regs *regs,
- unsigned long map_mask);
int cxl_map_device_regs(const struct cxl_register_map *map,
struct cxl_device_regs *regs);
int cxl_map_pmu_regs(struct cxl_register_map *map, struct cxl_pmu_regs *regs);
diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
index 24aba9ff6d2e..53760ce31af8 100644
--- a/drivers/cxl/cxlpci.h
+++ b/drivers/cxl/cxlpci.h
@@ -13,16 +13,6 @@
*/
#define CXL_PCI_DEFAULT_MAX_VECTORS 16
-/* Register Block Identifier (RBI) */
-enum cxl_regloc_type {
- CXL_REGLOC_RBI_EMPTY = 0,
- CXL_REGLOC_RBI_COMPONENT,
- CXL_REGLOC_RBI_VIRT,
- CXL_REGLOC_RBI_MEMDEV,
- CXL_REGLOC_RBI_PMU,
- CXL_REGLOC_RBI_TYPES
-};
-
/*
* Table Access DOE, CDAT Read Entry Response
*
@@ -100,6 +90,4 @@ static inline void cxl_uport_init_ras_reporting(struct cxl_port *port,
struct device *host) { }
#endif
-int cxl_pci_setup_regs(struct pci_dev *pdev, enum cxl_regloc_type type,
- struct cxl_register_map *map);
#endif /* __CXL_PCI_H__ */
diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
index 8e0481d8dced..34126bc4826c 100644
--- a/drivers/net/ethernet/sfc/efx_cxl.c
+++ b/drivers/net/ethernet/sfc/efx_cxl.c
@@ -7,6 +7,8 @@
#include <linux/pci.h>
+#include <cxl/cxl.h>
+#include <cxl/pci.h>
#include "net_driver.h"
#include "efx_cxl.h"
@@ -18,6 +20,7 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
struct pci_dev *pci_dev = efx->pci_dev;
struct efx_cxl *cxl;
u16 dvsec;
+ int rc;
probe_data->cxl_pio_initialised = false;
@@ -44,6 +47,38 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
if (!cxl)
return -ENOMEM;
+ rc = cxl_pci_setup_regs(pci_dev, CXL_REGLOC_RBI_COMPONENT,
+ &cxl->cxlds.reg_map);
+ if (rc) {
+ pci_err(pci_dev, "No component registers\n");
+ return rc;
+ }
+
+ if (!cxl->cxlds.reg_map.component_map.hdm_decoder.valid) {
+ pci_err(pci_dev, "Expected HDM component register not found\n");
+ return -ENODEV;
+ }
+
+ if (!cxl->cxlds.reg_map.component_map.ras.valid) {
+ pci_err(pci_dev, "Expected RAS component register not found\n");
+ return -ENODEV;
+ }
+
+ rc = cxl_map_component_regs(&cxl->cxlds.reg_map,
+ &cxl->cxlds.regs.component,
+ BIT(CXL_CM_CAP_CAP_ID_RAS));
+ if (rc) {
+ pci_err(pci_dev, "Failed to map RAS capability.\n");
+ return rc;
+ }
+
+ /*
+ * Set media ready explicitly: there is neither a mailbox nor the
+ * related CXL register for checking this state, as neither is
+ * mandatory for Type2.
+ */
+ cxl->cxlds.media_ready = true;
+
probe_data->cxl = cxl;
return 0;
diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
index 13d448686189..7f2e23bce1f7 100644
--- a/include/cxl/cxl.h
+++ b/include/cxl/cxl.h
@@ -70,6 +70,10 @@ struct cxl_regs {
);
};
+#define CXL_CM_CAP_CAP_ID_RAS 0x2
+#define CXL_CM_CAP_CAP_ID_HDM 0x5
+#define CXL_CM_CAP_CAP_HDM_VERSION 1
+
struct cxl_reg_map {
bool valid;
int id;
@@ -223,4 +227,19 @@ struct cxl_dev_state *_devm_cxl_dev_state_create(struct device *dev,
(drv_struct *)_devm_cxl_dev_state_create(parent, type, serial, dvsec, \
sizeof(drv_struct), mbox); \
})
+
+/**
+ * cxl_map_component_regs - map cxl component registers
+ *
+ * @map: cxl register map to update with the mappings
+ * @regs: cxl component registers to work with
+ * @map_mask: cxl component regs to map
+ *
+ * Returns integer: success (0) or error (-ENOMEM)
+ *
+ * Made public for Type2 driver support.
+ */
+int cxl_map_component_regs(const struct cxl_register_map *map,
+ struct cxl_component_regs *regs,
+ unsigned long map_mask);
#endif /* __CXL_CXL_H__ */
diff --git a/include/cxl/pci.h b/include/cxl/pci.h
new file mode 100644
index 000000000000..a172439f08c6
--- /dev/null
+++ b/include/cxl/pci.h
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright(c) 2020 Intel Corporation. All rights reserved. */
+
+#ifndef __CXL_CXL_PCI_H__
+#define __CXL_CXL_PCI_H__
+
+/* Register Block Identifier (RBI) */
+enum cxl_regloc_type {
+ CXL_REGLOC_RBI_EMPTY = 0,
+ CXL_REGLOC_RBI_COMPONENT,
+ CXL_REGLOC_RBI_VIRT,
+ CXL_REGLOC_RBI_MEMDEV,
+ CXL_REGLOC_RBI_PMU,
+ CXL_REGLOC_RBI_TYPES
+};
+
+struct cxl_register_map;
+
+int cxl_pci_setup_regs(struct pci_dev *pdev, enum cxl_regloc_type type,
+ struct cxl_register_map *map);
+#endif
--
2.34.1
* [PATCH v21 09/23] cxl/sfc: Initialize dpa without a mailbox
2025-11-19 19:22 [PATCH v21 00/23] Type2 device basic support alejandro.lucero-palau
` (7 preceding siblings ...)
2025-11-19 19:22 ` [PATCH v21 08/23] cxl/sfc: Map cxl component regs alejandro.lucero-palau
@ 2025-11-19 19:22 ` alejandro.lucero-palau
2025-11-19 19:22 ` [PATCH v21 10/23] cxl: Prepare memdev creation for type2 alejandro.lucero-palau
` (15 subsequent siblings)
24 siblings, 0 replies; 51+ messages in thread
From: alejandro.lucero-palau @ 2025-11-19 19:22 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Edward Cree, Ben Cheatham, Jonathan Cameron
From: Alejandro Lucero <alucerop@amd.com>
Type3 relies on the mailbox CXL_MBOX_OP_IDENTIFY command to initialize the
memdev state params, which end up being used for DPA initialization.
Allow a Type2 driver to initialize DPA simply by giving the size of its
volatile hardware partition.
Move related functions to memdev.
Add sfc driver as the client.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
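Note for readers, not part of the patch: the partition layout that
cxl_set_capacity() hands to cxl_dpa_setup() can be sketched in userspace
C. This is a minimal model under stated assumptions. The mock_* types are
hypothetical simplifications of struct cxl_dpa_info, the array size of 2
is an assumption, and the final cxl_dpa_setup() call is not modelled.

```c
#include <assert.h>
#include <stdint.h>

#define MOCK_NR_PARTITIONS 2	/* assumption: room for RAM and PMEM */

enum mock_partmode { MOCK_PARTMODE_RAM, MOCK_PARTMODE_PMEM };

struct mock_range {
	uint64_t start;
	uint64_t end;	/* inclusive */
};

struct mock_dpa_part {
	struct mock_range range;
	enum mock_partmode mode;
};

struct mock_dpa_info {
	uint64_t size;
	struct mock_dpa_part part[MOCK_NR_PARTITIONS];
	int nr_partitions;
};

/* Same logic as add_part() in the diff: empty partitions are skipped. */
static void mock_add_part(struct mock_dpa_info *info, uint64_t start,
			  uint64_t size, enum mock_partmode mode)
{
	int i = info->nr_partitions;

	if (size == 0)
		return;

	info->part[i].range.start = start;
	info->part[i].range.end = start + size - 1;
	info->part[i].mode = mode;
	info->nr_partitions++;
}

/* Same shape as cxl_set_capacity(): one volatile partition at DPA 0. */
static struct mock_dpa_info mock_set_capacity(uint64_t capacity)
{
	struct mock_dpa_info info = { .size = capacity };

	mock_add_part(&info, 0, capacity, MOCK_PARTMODE_RAM);
	return info;
}
```

For the sfc value of SZ_256M (0x10000000) this yields a single RAM
partition covering DPA 0x0 through 0x0fffffff. A capacity of 0 yields no
partition at all.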
drivers/cxl/core/core.h | 2 +
drivers/cxl/core/mbox.c | 51 +----------------------
drivers/cxl/core/memdev.c | 66 ++++++++++++++++++++++++++++++
drivers/net/ethernet/sfc/efx_cxl.c | 5 +++
include/cxl/cxl.h | 1 +
5 files changed, 75 insertions(+), 50 deletions(-)
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index 2b2d3af0b5ec..1c1726856139 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -91,6 +91,8 @@ void __iomem *devm_cxl_iomap_block(struct device *dev, resource_size_t addr,
struct dentry *cxl_debugfs_create_dir(const char *dir);
int cxl_dpa_set_part(struct cxl_endpoint_decoder *cxled,
enum cxl_partition_mode mode);
+struct cxl_memdev_state;
+int cxl_mem_get_partition_info(struct cxl_memdev_state *mds);
int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, u64 size);
int cxl_dpa_free(struct cxl_endpoint_decoder *cxled);
resource_size_t cxl_dpa_size(struct cxl_endpoint_decoder *cxled);
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index bee84d0101d1..d57a0c2d39fb 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -1144,7 +1144,7 @@ EXPORT_SYMBOL_NS_GPL(cxl_mem_get_event_records, "CXL");
*
* See CXL @8.2.9.5.2.1 Get Partition Info
*/
-static int cxl_mem_get_partition_info(struct cxl_memdev_state *mds)
+int cxl_mem_get_partition_info(struct cxl_memdev_state *mds)
{
struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
struct cxl_mbox_get_partition_info pi;
@@ -1300,55 +1300,6 @@ int cxl_mem_sanitize(struct cxl_memdev *cxlmd, u16 cmd)
return -EBUSY;
}
-static void add_part(struct cxl_dpa_info *info, u64 start, u64 size, enum cxl_partition_mode mode)
-{
- int i = info->nr_partitions;
-
- if (size == 0)
- return;
-
- info->part[i].range = (struct range) {
- .start = start,
- .end = start + size - 1,
- };
- info->part[i].mode = mode;
- info->nr_partitions++;
-}
-
-int cxl_mem_dpa_fetch(struct cxl_memdev_state *mds, struct cxl_dpa_info *info)
-{
- struct cxl_dev_state *cxlds = &mds->cxlds;
- struct device *dev = cxlds->dev;
- int rc;
-
- if (!cxlds->media_ready) {
- info->size = 0;
- return 0;
- }
-
- info->size = mds->total_bytes;
-
- if (mds->partition_align_bytes == 0) {
- add_part(info, 0, mds->volatile_only_bytes, CXL_PARTMODE_RAM);
- add_part(info, mds->volatile_only_bytes,
- mds->persistent_only_bytes, CXL_PARTMODE_PMEM);
- return 0;
- }
-
- rc = cxl_mem_get_partition_info(mds);
- if (rc) {
- dev_err(dev, "Failed to query partition information\n");
- return rc;
- }
-
- add_part(info, 0, mds->active_volatile_bytes, CXL_PARTMODE_RAM);
- add_part(info, mds->active_volatile_bytes, mds->active_persistent_bytes,
- CXL_PARTMODE_PMEM);
-
- return 0;
-}
-EXPORT_SYMBOL_NS_GPL(cxl_mem_dpa_fetch, "CXL");
-
int cxl_get_dirty_count(struct cxl_memdev_state *mds, u32 *count)
{
struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index dd10f17eb6ad..b995eb991cdd 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -584,6 +584,72 @@ bool is_cxl_memdev(const struct device *dev)
}
EXPORT_SYMBOL_NS_GPL(is_cxl_memdev, "CXL");
+static void add_part(struct cxl_dpa_info *info, u64 start, u64 size, enum cxl_partition_mode mode)
+{
+ int i = info->nr_partitions;
+
+ if (size == 0)
+ return;
+
+ info->part[i].range = (struct range) {
+ .start = start,
+ .end = start + size - 1,
+ };
+ info->part[i].mode = mode;
+ info->nr_partitions++;
+}
+
+int cxl_mem_dpa_fetch(struct cxl_memdev_state *mds, struct cxl_dpa_info *info)
+{
+ struct cxl_dev_state *cxlds = &mds->cxlds;
+ struct device *dev = cxlds->dev;
+ int rc;
+
+ if (!cxlds->media_ready) {
+ info->size = 0;
+ return 0;
+ }
+
+ info->size = mds->total_bytes;
+
+ if (mds->partition_align_bytes == 0) {
+ add_part(info, 0, mds->volatile_only_bytes, CXL_PARTMODE_RAM);
+ add_part(info, mds->volatile_only_bytes,
+ mds->persistent_only_bytes, CXL_PARTMODE_PMEM);
+ return 0;
+ }
+
+ rc = cxl_mem_get_partition_info(mds);
+ if (rc) {
+ dev_err(dev, "Failed to query partition information\n");
+ return rc;
+ }
+
+ add_part(info, 0, mds->active_volatile_bytes, CXL_PARTMODE_RAM);
+ add_part(info, mds->active_volatile_bytes, mds->active_persistent_bytes,
+ CXL_PARTMODE_PMEM);
+
+ return 0;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_mem_dpa_fetch, "CXL");
+
+/**
+ * cxl_set_capacity() - initialize DPA for a driver without a mailbox
+ *
+ * @cxlds: pointer to cxl_dev_state
+ * @capacity: device volatile memory size
+ */
+int cxl_set_capacity(struct cxl_dev_state *cxlds, u64 capacity)
+{
+ struct cxl_dpa_info range_info = {
+ .size = capacity,
+ };
+
+ add_part(&range_info, 0, capacity, CXL_PARTMODE_RAM);
+ return cxl_dpa_setup(cxlds, &range_info);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_set_capacity, "CXL");
+
/**
* set_exclusive_cxl_commands() - atomically disable user cxl commands
* @mds: The device state to operate on
diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
index 34126bc4826c..0b10a2e6aceb 100644
--- a/drivers/net/ethernet/sfc/efx_cxl.c
+++ b/drivers/net/ethernet/sfc/efx_cxl.c
@@ -79,6 +79,11 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
*/
cxl->cxlds.media_ready = true;
+ if (cxl_set_capacity(&cxl->cxlds, EFX_CTPIO_BUFFER_SIZE)) {
+ pci_err(pci_dev, "dpa capacity setup failed\n");
+ return -ENODEV;
+ }
+
probe_data->cxl = cxl;
return 0;
diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
index 7f2e23bce1f7..fb2f8f2395d5 100644
--- a/include/cxl/cxl.h
+++ b/include/cxl/cxl.h
@@ -242,4 +242,5 @@ struct cxl_dev_state *_devm_cxl_dev_state_create(struct device *dev,
int cxl_map_component_regs(const struct cxl_register_map *map,
struct cxl_component_regs *regs,
unsigned long map_mask);
+int cxl_set_capacity(struct cxl_dev_state *cxlds, u64 capacity);
#endif /* __CXL_CXL_H__ */
--
2.34.1
* [PATCH v21 10/23] cxl: Prepare memdev creation for type2
2025-11-19 19:22 [PATCH v21 00/23] Type2 device basic support alejandro.lucero-palau
` (8 preceding siblings ...)
2025-11-19 19:22 ` [PATCH v21 09/23] cxl/sfc: Initialize dpa without a mailbox alejandro.lucero-palau
@ 2025-11-19 19:22 ` alejandro.lucero-palau
2025-11-19 19:22 ` [PATCH v21 11/23] sfc: create type2 cxl memdev alejandro.lucero-palau
` (14 subsequent siblings)
24 siblings, 0 replies; 51+ messages in thread
From: alejandro.lucero-palau @ 2025-11-19 19:22 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Ben Cheatham, Jonathan Cameron,
Alison Schofield
From: Alejandro Lucero <alucerop@amd.com>
The cxl core currently relies on a CXL_DEVTYPE_CLASSMEM type device when
creating a memdev, leading to problems when obtaining cxl_memdev_state
references from a CXL_DEVTYPE_DEVMEM type.
Modify the check for obtaining cxl_memdev_state, adding CXL_DEVTYPE_DEVMEM
support.
Make devm_cxl_add_memdev accessible from an accel driver.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
---
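Note for readers, not part of the patch: the device type selection added
to cxl_memdev_alloc() and the widened is_cxl_memdev() check can be
sketched in userspace C. This is a minimal model under stated assumptions.
The mock_* names are hypothetical, and the name string of the class memdev
type is a placeholder because the diff does not show it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct device_type. */
struct mock_device_type {
	const char *name;
};

/* Placeholder name: the real cxl_memdev_type name is not shown in the diff. */
static const struct mock_device_type mock_memdev_type = {
	.name = "mock_class_memdev",
};

/* Name taken from the cxl_accel_memdev_type definition in the diff. */
static const struct mock_device_type mock_accel_memdev_type = {
	.name = "cxl_accel_memdev",
};

/* Hypothetical mirror of the two CXL device types used in the series. */
enum mock_devtype { MOCK_DEVTYPE_DEVMEM, MOCK_DEVTYPE_CLASSMEM };

/* Mirrors the type choice made at memdev allocation time. */
static const struct mock_device_type *mock_pick_type(enum mock_devtype type)
{
	if (type == MOCK_DEVTYPE_DEVMEM)
		return &mock_accel_memdev_type;
	return &mock_memdev_type;
}

/* Mirrors is_cxl_memdev(): either type object identifies a memdev. */
static bool mock_is_memdev(const struct mock_device_type *type)
{
	return type == &mock_memdev_type || type == &mock_accel_memdev_type;
}
```

Identity is decided by pointer comparison against the two static type
objects, so an accelerator memdev is still recognised as a memdev, while
the accel type carries no attribute groups in the diff.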
drivers/cxl/core/memdev.c | 15 +++++++++++--
drivers/cxl/cxlmem.h | 7 ------
drivers/cxl/mem.c | 45 +++++++++++++++++++++++++++++----------
include/cxl/cxl.h | 7 ++++++
4 files changed, 54 insertions(+), 20 deletions(-)
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index b995eb991cdd..759f3a4fc2a9 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -7,6 +7,7 @@
#include <linux/slab.h>
#include <linux/idr.h>
#include <linux/pci.h>
+#include <cxl/cxl.h>
#include <cxlmem.h>
#include "private.h"
#include "trace.h"
@@ -578,9 +579,16 @@ static const struct device_type cxl_memdev_type = {
.groups = cxl_memdev_attribute_groups,
};
+static const struct device_type cxl_accel_memdev_type = {
+ .name = "cxl_accel_memdev",
+ .release = cxl_memdev_release,
+ .devnode = cxl_memdev_devnode,
+};
+
bool is_cxl_memdev(const struct device *dev)
{
- return dev->type == &cxl_memdev_type;
+ return (dev->type == &cxl_memdev_type ||
+ dev->type == &cxl_accel_memdev_type);
}
EXPORT_SYMBOL_NS_GPL(is_cxl_memdev, "CXL");
@@ -1161,7 +1169,10 @@ struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds,
dev->parent = cxlds->dev;
dev->bus = &cxl_bus_type;
dev->devt = MKDEV(cxl_mem_major, cxlmd->id);
- dev->type = &cxl_memdev_type;
+ if (cxlds->type == CXL_DEVTYPE_DEVMEM)
+ dev->type = &cxl_accel_memdev_type;
+ else
+ dev->type = &cxl_memdev_type;
device_set_pm_not_required(dev);
INIT_WORK(&cxlmd->detach_work, detach_memdev);
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index ceeda8796cba..9e5da2f20753 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -35,10 +35,6 @@
(FIELD_GET(CXLMDEV_RESET_NEEDED_MASK, status) != \
CXLMDEV_RESET_NEEDED_NOT)
-struct cxl_memdev_ops {
- int (*probe)(struct cxl_memdev *cxlmd);
-};
-
/**
* struct cxl_memdev - CXL bus object representing a Type-3 Memory Device
* @dev: driver core device object
@@ -102,9 +98,6 @@ static inline bool is_cxl_endpoint(struct cxl_port *port)
return is_cxl_memdev(port->uport_dev);
}
-struct cxl_memdev *devm_cxl_add_memdev(struct device *host,
- struct cxl_dev_state *cxlds,
- const struct cxl_memdev_ops *ops);
int devm_cxl_sanitize_setup_notifier(struct device *host,
struct cxl_memdev *cxlmd);
struct cxl_memdev_state;
diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
index b57bc6f38e64..47dcab76801f 100644
--- a/drivers/cxl/mem.c
+++ b/drivers/cxl/mem.c
@@ -66,6 +66,26 @@ static int cxl_debugfs_poison_clear(void *data, u64 dpa)
DEFINE_DEBUGFS_ATTRIBUTE(cxl_poison_clear_fops, NULL,
cxl_debugfs_poison_clear, "%llx\n");
+static void cxl_memdev_poison_enable(struct cxl_memdev_state *mds,
+ struct cxl_memdev *cxlmd,
+ struct dentry *dentry)
+{
+ /*
+ * Avoid poison debugfs for DEVMEM aka accelerators: these files rely on
+ * cxl_memdev_state, which such devices do not have.
+ */
+ if (!mds)
+ return;
+
+ if (test_bit(CXL_POISON_ENABLED_INJECT, mds->poison.enabled_cmds))
+ debugfs_create_file("inject_poison", 0200, dentry, cxlmd,
+ &cxl_poison_inject_fops);
+
+ if (test_bit(CXL_POISON_ENABLED_CLEAR, mds->poison.enabled_cmds))
+ debugfs_create_file("clear_poison", 0200, dentry, cxlmd,
+ &cxl_poison_clear_fops);
+}
+
static int cxl_mem_probe(struct device *dev)
{
struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
@@ -93,12 +113,7 @@ static int cxl_mem_probe(struct device *dev)
dentry = cxl_debugfs_create_dir(dev_name(dev));
debugfs_create_devm_seqfile(dev, "dpamem", dentry, cxl_mem_dpa_show);
- if (test_bit(CXL_POISON_ENABLED_INJECT, mds->poison.enabled_cmds))
- debugfs_create_file("inject_poison", 0200, dentry, cxlmd,
- &cxl_poison_inject_fops);
- if (test_bit(CXL_POISON_ENABLED_CLEAR, mds->poison.enabled_cmds))
- debugfs_create_file("clear_poison", 0200, dentry, cxlmd,
- &cxl_poison_clear_fops);
+ cxl_memdev_poison_enable(mds, cxlmd, dentry);
rc = devm_add_action_or_reset(dev, remove_debugfs, dentry);
if (rc)
@@ -248,16 +263,24 @@ static ssize_t trigger_poison_list_store(struct device *dev,
}
static DEVICE_ATTR_WO(trigger_poison_list);
-static umode_t cxl_mem_visible(struct kobject *kobj, struct attribute *a, int n)
+static bool cxl_poison_attr_visible(struct kobject *kobj, struct attribute *a)
{
struct device *dev = kobj_to_dev(kobj);
struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
- if (a == &dev_attr_trigger_poison_list.attr)
- if (!test_bit(CXL_POISON_ENABLED_LIST,
- mds->poison.enabled_cmds))
- return 0;
+ if (!mds ||
+ !test_bit(CXL_POISON_ENABLED_LIST, mds->poison.enabled_cmds))
+ return false;
+
+ return true;
+}
+
+static umode_t cxl_mem_visible(struct kobject *kobj, struct attribute *a, int n)
+{
+ if (a == &dev_attr_trigger_poison_list.attr &&
+ !cxl_poison_attr_visible(kobj, a))
+ return 0;
return a->mode;
}
diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
index fb2f8f2395d5..043fc31c764e 100644
--- a/include/cxl/cxl.h
+++ b/include/cxl/cxl.h
@@ -153,6 +153,10 @@ struct cxl_dpa_partition {
#define CXL_NR_PARTITIONS_MAX 2
+struct cxl_memdev_ops {
+ int (*probe)(struct cxl_memdev *cxlmd);
+};
+
/**
* struct cxl_dev_state - The driver device state
*
@@ -243,4 +247,7 @@ int cxl_map_component_regs(const struct cxl_register_map *map,
struct cxl_component_regs *regs,
unsigned long map_mask);
int cxl_set_capacity(struct cxl_dev_state *cxlds, u64 capacity);
+struct cxl_memdev *devm_cxl_add_memdev(struct device *host,
+ struct cxl_dev_state *cxlds,
+ const struct cxl_memdev_ops *ops);
#endif /* __CXL_CXL_H__ */
--
2.34.1
* [PATCH v21 11/23] sfc: create type2 cxl memdev
2025-11-19 19:22 [PATCH v21 00/23] Type2 device basic support alejandro.lucero-palau
` (9 preceding siblings ...)
2025-11-19 19:22 ` [PATCH v21 10/23] cxl: Prepare memdev creation for type2 alejandro.lucero-palau
@ 2025-11-19 19:22 ` alejandro.lucero-palau
2025-11-19 19:22 ` [PATCH v21 12/23] cxl: Define a driver interface for HPA free space enumeration alejandro.lucero-palau
` (13 subsequent siblings)
24 siblings, 0 replies; 51+ messages in thread
From: alejandro.lucero-palau @ 2025-11-19 19:22 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Martin Habets, Fan Ni, Edward Cree,
Jonathan Cameron
From: Alejandro Lucero <alucerop@amd.com>
Use cxl API for creating a cxl memory device using the type2
cxl_dev_state struct.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
---
drivers/net/ethernet/sfc/efx_cxl.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
index 0b10a2e6aceb..f6eda93e67e2 100644
--- a/drivers/net/ethernet/sfc/efx_cxl.c
+++ b/drivers/net/ethernet/sfc/efx_cxl.c
@@ -84,6 +84,12 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
return -ENODEV;
}
+ cxl->cxlmd = devm_cxl_add_memdev(&pci_dev->dev, &cxl->cxlds, NULL);
+ if (IS_ERR(cxl->cxlmd)) {
+ pci_err(pci_dev, "CXL accel memdev creation failed\n");
+ return PTR_ERR(cxl->cxlmd);
+ }
+
probe_data->cxl = cxl;
return 0;
--
2.34.1
* [PATCH v21 12/23] cxl: Define a driver interface for HPA free space enumeration
2025-11-19 19:22 [PATCH v21 00/23] Type2 device basic support alejandro.lucero-palau
` (10 preceding siblings ...)
2025-11-19 19:22 ` [PATCH v21 11/23] sfc: create type2 cxl memdev alejandro.lucero-palau
@ 2025-11-19 19:22 ` alejandro.lucero-palau
2025-11-19 19:22 ` [PATCH v21 13/23] sfc: get root decoder alejandro.lucero-palau
` (12 subsequent siblings)
24 siblings, 0 replies; 51+ messages in thread
From: alejandro.lucero-palau @ 2025-11-19 19:22 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Jonathan Cameron
From: Alejandro Lucero <alucerop@amd.com>
CXL region creation involves allocating capacity from Device Physical
Address (DPA) space and assigning it to decode a given Host Physical
Address (HPA) range. Before determining how much DPA to allocate, the
amount of available HPA must be determined. Also, not all HPA is created
equal: some HPA targets RAM, some targets PMEM, some is prepared for
device-memory flows like HDM-D and HDM-DB, and some is HDM-H (host-only).
In order to support Type2 CXL devices, wrap all of those concerns into
an API that retrieves a root decoder (platform CXL window) that fits the
specified constraints and the capacity available for a new region.
Add a complementary function for releasing the reference to that root
decoder.
Based on https://lore.kernel.org/linux-cxl/168592159290.1948938.13522227102445462976.stgit@dwillia2-xfh.jf.intel.com/
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
drivers/cxl/core/region.c | 159 ++++++++++++++++++++++++++++++++++++++
drivers/cxl/cxl.h | 3 +
include/cxl/cxl.h | 6 ++
3 files changed, 168 insertions(+)
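
The free-space search described in the commit message can be modelled in
userspace as below. This is only a sketch under stated assumptions: struct
span, struct window, largest_free_gap() and pick_window() are hypothetical
names, not kernel API. Children are assumed sorted, non-overlapping and
non-empty, the parent range is assumed not to end at the maximum u64 value,
and the host bridge matching done by the patch is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive range, following the start/end convention of struct resource. */
struct span {
	uint64_t start;
	uint64_t end;	/* inclusive */
};

/* Hypothetical stand-in for a root decoder window and its allocations. */
struct window {
	unsigned long flags;
	struct span res;
	const struct span *child;	/* sorted by start, non-overlapping */
	size_t nr_children;
};

static uint64_t max_u64(uint64_t a, uint64_t b)
{
	return a > b ? a : b;
}

/* Largest contiguous run of bytes in w->res not covered by any child. */
static uint64_t largest_free_gap(const struct window *w)
{
	uint64_t next_free = w->res.start;	/* first byte not yet accounted */
	uint64_t best = 0;

	for (size_t i = 0; i < w->nr_children; i++) {
		const struct span *c = &w->child[i];

		assert(c->start >= next_free && c->end >= c->start &&
		       c->end <= w->res.end);
		/* Gap between the previous child (or window start) and c */
		best = max_u64(best, c->start - next_free);
		next_free = c->end + 1;
	}

	/* Gap between the last child and the end of the window */
	return max_u64(best, w->res.end + 1 - next_free);
}

/*
 * Pick the window that carries every requested flag and has the largest
 * free gap. Returns NULL when no window qualifies or none has free space.
 */
static const struct window *pick_window(const struct window *w, size_t n,
					unsigned long want,
					uint64_t *max_avail)
{
	const struct window *best = NULL;
	uint64_t best_gap = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t gap;

		if ((w[i].flags & want) != want)
			continue;

		gap = largest_free_gap(&w[i]);
		if (gap > best_gap) {
			best = &w[i];
			best_gap = gap;
		}
	}

	if (best)
		*max_avail = best_gap;
	return best;
}
```

Tracking the first unaccounted byte (next_free) rather than the previous
child's inclusive end keeps each gap a plain subtraction, which avoids
the +1/-1 adjustments around inclusive end fields.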
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index b06fee1978ba..c9bf7415535e 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -711,6 +711,165 @@ static int free_hpa(struct cxl_region *cxlr)
return 0;
}
+struct cxlrd_max_context {
+ struct device * const *host_bridges;
+ int interleave_ways;
+ unsigned long flags;
+ resource_size_t max_hpa;
+ struct cxl_root_decoder *cxlrd;
+};
+
+static int find_max_hpa(struct device *dev, void *data)
+{
+ struct cxlrd_max_context *ctx = data;
+ struct cxl_switch_decoder *cxlsd;
+ struct cxl_root_decoder *cxlrd;
+ struct resource *res, *prev;
+ struct cxl_decoder *cxld;
+ resource_size_t free = 0;
+ resource_size_t max;
+ int found = 0;
+
+ if (!is_root_decoder(dev))
+ return 0;
+
+ cxlrd = to_cxl_root_decoder(dev);
+ cxlsd = &cxlrd->cxlsd;
+ cxld = &cxlsd->cxld;
+
+ if ((cxld->flags & ctx->flags) != ctx->flags) {
+ dev_dbg(dev, "flags not matching: %08lx vs %08lx\n",
+ cxld->flags, ctx->flags);
+ return 0;
+ }
+
+ for (int i = 0; i < ctx->interleave_ways; i++) {
+ for (int j = 0; j < ctx->interleave_ways; j++) {
+ if (ctx->host_bridges[i] == cxlsd->target[j]->dport_dev) {
+ found++;
+ break;
+ }
+ }
+ }
+
+ if (found != ctx->interleave_ways) {
+ dev_dbg(dev,
+ "Not enough host bridges. Found %d for %d interleave ways requested\n",
+ found, ctx->interleave_ways);
+ return 0;
+ }
+
+ /*
+ * Walk the root decoder resource range relying on cxl_rwsem.region to
+ * preclude sibling arrival/departure and find the largest free space
+ * gap.
+ */
+ lockdep_assert_held_read(&cxl_rwsem.region);
+ res = cxlrd->res->child;
+
+ /* With no resource child the whole parent resource is available */
+ if (!res)
+ max = resource_size(cxlrd->res);
+ else
+ max = 0;
+
+ for (prev = NULL; res; prev = res, res = res->sibling) {
+ /*
+ * Sanity check to prevent arithmetic problems below: a resource of
+ * size 0 has end == start - 1, which for start == 0 is unsigned
+ * zero - 1, or all f in hex.
+ */
+ if (prev && !resource_size(prev))
+ continue;
+
+ if (!prev && res->start > cxlrd->res->start) {
+ free = res->start - cxlrd->res->start;
+ max = max(free, max);
+ }
+ if (prev && res->start > prev->end + 1) {
+ free = res->start - prev->end - 1;
+ max = max(free, max);
+ }
+ }
+
+ if (prev && prev->end + 1 < cxlrd->res->end + 1) {
+ free = cxlrd->res->end - prev->end;
+ max = max(free, max);
+ }
+
+ dev_dbg(cxlrd_dev(cxlrd), "found %pa bytes of free space\n", &max);
+ if (max > ctx->max_hpa) {
+ if (ctx->cxlrd)
+ put_device(cxlrd_dev(ctx->cxlrd));
+ get_device(cxlrd_dev(cxlrd));
+ ctx->cxlrd = cxlrd;
+ ctx->max_hpa = max;
+ }
+ return 0;
+}
+
+/**
+ * cxl_get_hpa_freespace - find a root decoder with free capacity per constraints
+ * @cxlmd: the mem device requiring the HPA
+ * @interleave_ways: number of entries in @host_bridges
+ * @flags: CXL_DECODER_F flags for selecting RAM vs PMEM, and Type2 device
+ * @max_avail_contig: output parameter of max contiguous bytes available in the
+ * returned decoder
+ *
+ * Returns a pointer to a struct cxl_root_decoder, or an ERR_PTR() on failure
+ *
+ * The returned tuple of a 'struct cxl_root_decoder' and the bytes available
+ * (given in @max_avail_contig) is a point-in-time snapshot. If the capacity
+ * has been reduced by the time the caller goes to use this decoder, the
+ * caller needs to loop and retry.
+ *
+ * The returned root decoder has an elevated reference count that needs to be
+ * put with cxl_put_root_decoder(cxlrd).
+ */
+struct cxl_root_decoder *cxl_get_hpa_freespace(struct cxl_memdev *cxlmd,
+ int interleave_ways,
+ unsigned long flags,
+ resource_size_t *max_avail_contig)
+{
+ struct cxlrd_max_context ctx = {
+ .flags = flags,
+ .interleave_ways = interleave_ways,
+ };
+ struct cxl_port *root_port;
+ struct cxl_port *endpoint;
+
+ endpoint = cxlmd->endpoint;
+ if (!endpoint) {
+ dev_dbg(&cxlmd->dev, "endpoint not linked to memdev\n");
+ return ERR_PTR(-ENXIO);
+ }
+
+ ctx.host_bridges = &endpoint->host_bridge;
+
+ struct cxl_root *root __free(put_cxl_root) = find_cxl_root(endpoint);
+ if (!root) {
+ dev_dbg(&endpoint->dev, "endpoint is not related to a root port\n");
+ return ERR_PTR(-ENXIO);
+ }
+
+ root_port = &root->port;
+ scoped_guard(rwsem_read, &cxl_rwsem.region)
+ device_for_each_child(&root_port->dev, &ctx, find_max_hpa);
+
+ if (!ctx.cxlrd)
+ return ERR_PTR(-ENOMEM);
+
+ *max_avail_contig = ctx.max_hpa;
+ return ctx.cxlrd;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_get_hpa_freespace, "CXL");
+
+void cxl_put_root_decoder(struct cxl_root_decoder *cxlrd)
+{
+ put_device(cxlrd_dev(cxlrd));
+}
+EXPORT_SYMBOL_NS_GPL(cxl_put_root_decoder, "CXL");
+
static ssize_t size_store(struct device *dev, struct device_attribute *attr,
const char *buf, size_t len)
{
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index d7ddca6f7115..78845e0e3e4f 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -679,6 +679,9 @@ struct cxl_root_decoder *to_cxl_root_decoder(struct device *dev);
struct cxl_switch_decoder *to_cxl_switch_decoder(struct device *dev);
struct cxl_endpoint_decoder *to_cxl_endpoint_decoder(struct device *dev);
bool is_root_decoder(struct device *dev);
+
+#define cxlrd_dev(cxlrd) (&(cxlrd)->cxlsd.cxld.dev)
+
bool is_switch_decoder(struct device *dev);
bool is_endpoint_decoder(struct device *dev);
struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port,
diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
index 043fc31c764e..2ec514c77021 100644
--- a/include/cxl/cxl.h
+++ b/include/cxl/cxl.h
@@ -250,4 +250,10 @@ int cxl_set_capacity(struct cxl_dev_state *cxlds, u64 capacity);
struct cxl_memdev *devm_cxl_add_memdev(struct device *host,
struct cxl_dev_state *cxlds,
const struct cxl_memdev_ops *ops);
+struct cxl_port;
+struct cxl_root_decoder *cxl_get_hpa_freespace(struct cxl_memdev *cxlmd,
+ int interleave_ways,
+ unsigned long flags,
+ resource_size_t *max);
+void cxl_put_root_decoder(struct cxl_root_decoder *cxlrd);
#endif /* __CXL_CXL_H__ */
--
2.34.1
* [PATCH v21 13/23] sfc: get root decoder
2025-11-19 19:22 [PATCH v21 00/23] Type2 device basic support alejandro.lucero-palau
` (11 preceding siblings ...)
2025-11-19 19:22 ` [PATCH v21 12/23] cxl: Define a driver interface for HPA free space enumeration alejandro.lucero-palau
@ 2025-11-19 19:22 ` alejandro.lucero-palau
2025-11-19 19:22 ` [PATCH v21 14/23] cxl: Define a driver interface for DPA allocation alejandro.lucero-palau
` (11 subsequent siblings)
24 siblings, 0 replies; 51+ messages in thread
From: alejandro.lucero-palau @ 2025-11-19 19:22 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Martin Habets, Edward Cree, Jonathan Cameron,
Ben Cheatham
From: Alejandro Lucero <alucerop@amd.com>
Use the CXL API to get the HPA (Host Physical Address) range to use from
a CXL root decoder.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
---
drivers/cxl/cxl.h | 15 ---------------
drivers/net/ethernet/sfc/Kconfig | 1 +
drivers/net/ethernet/sfc/efx_cxl.c | 20 ++++++++++++++++++++
include/cxl/cxl.h | 14 ++++++++++++++
4 files changed, 35 insertions(+), 15 deletions(-)
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 78845e0e3e4f..5441a296c351 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -220,21 +220,6 @@ int cxl_dport_map_rcd_linkcap(struct pci_dev *pdev, struct cxl_dport *dport);
#define CXL_RESOURCE_NONE ((resource_size_t) -1)
#define CXL_TARGET_STRLEN 20
-/*
- * cxl_decoder flags that define the type of memory / devices this
- * decoder supports as well as configuration lock status See "CXL 2.0
- * 8.2.5.12.7 CXL HDM Decoder 0 Control Register" for details.
- * Additionally indicate whether decoder settings were autodetected,
- * user customized.
- */
-#define CXL_DECODER_F_RAM BIT(0)
-#define CXL_DECODER_F_PMEM BIT(1)
-#define CXL_DECODER_F_TYPE2 BIT(2)
-#define CXL_DECODER_F_TYPE3 BIT(3)
-#define CXL_DECODER_F_LOCK BIT(4)
-#define CXL_DECODER_F_ENABLE BIT(5)
-#define CXL_DECODER_F_MASK GENMASK(5, 0)
-
enum cxl_decoder_type {
CXL_DECODER_DEVMEM = 2,
CXL_DECODER_HOSTONLYMEM = 3,
diff --git a/drivers/net/ethernet/sfc/Kconfig b/drivers/net/ethernet/sfc/Kconfig
index 979f2801e2a8..e959d9b4f4ce 100644
--- a/drivers/net/ethernet/sfc/Kconfig
+++ b/drivers/net/ethernet/sfc/Kconfig
@@ -69,6 +69,7 @@ config SFC_MCDI_LOGGING
config SFC_CXL
bool "Solarflare SFC9100-family CXL support"
depends on SFC && CXL_BUS >= SFC
+ depends on CXL_REGION
default SFC
help
This enables SFC CXL support if the kernel is configuring CXL for
diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
index f6eda93e67e2..d7c34c978434 100644
--- a/drivers/net/ethernet/sfc/efx_cxl.c
+++ b/drivers/net/ethernet/sfc/efx_cxl.c
@@ -18,6 +18,7 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
{
struct efx_nic *efx = &probe_data->efx;
struct pci_dev *pci_dev = efx->pci_dev;
+ resource_size_t max_size;
struct efx_cxl *cxl;
u16 dvsec;
int rc;
@@ -90,6 +91,23 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
return PTR_ERR(cxl->cxlmd);
}
+ cxl->cxlrd = cxl_get_hpa_freespace(cxl->cxlmd, 1,
+ CXL_DECODER_F_RAM | CXL_DECODER_F_TYPE2,
+ &max_size);
+
+ if (IS_ERR(cxl->cxlrd)) {
+ dev_err(&pci_dev->dev, "cxl_get_hpa_freespace failed\n");
+ return PTR_ERR(cxl->cxlrd);
+ }
+
+ if (max_size < EFX_CTPIO_BUFFER_SIZE) {
+ dev_err(&pci_dev->dev,
+ "%s: not enough free HPA space %pap < %u\n",
+ __func__, &max_size, EFX_CTPIO_BUFFER_SIZE);
+ cxl_put_root_decoder(cxl->cxlrd);
+ return -ENOSPC;
+ }
+
probe_data->cxl = cxl;
return 0;
@@ -97,6 +115,8 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
void efx_cxl_exit(struct efx_probe_data *probe_data)
{
+ if (probe_data->cxl)
+ cxl_put_root_decoder(probe_data->cxl->cxlrd);
}
MODULE_IMPORT_NS("CXL");
diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
index 2ec514c77021..2966b95e80a6 100644
--- a/include/cxl/cxl.h
+++ b/include/cxl/cxl.h
@@ -153,6 +153,20 @@ struct cxl_dpa_partition {
#define CXL_NR_PARTITIONS_MAX 2
+/*
+ * cxl_decoder flags that define the type of memory / devices this
+ * decoder supports as well as configuration lock status See "CXL 2.0
+ * 8.2.5.12.7 CXL HDM Decoder 0 Control Register" for details.
+ * Additionally indicate whether decoder settings were autodetected,
+ * user customized.
+ */
+#define CXL_DECODER_F_RAM BIT(0)
+#define CXL_DECODER_F_PMEM BIT(1)
+#define CXL_DECODER_F_TYPE2 BIT(2)
+#define CXL_DECODER_F_TYPE3 BIT(3)
+#define CXL_DECODER_F_LOCK BIT(4)
+#define CXL_DECODER_F_ENABLE BIT(5)
+
struct cxl_memdev_ops {
int (*probe)(struct cxl_memdev *cxlmd);
};
--
2.34.1
* [PATCH v21 14/23] cxl: Define a driver interface for DPA allocation
2025-11-19 19:22 [PATCH v21 00/23] Type2 device basic support alejandro.lucero-palau
` (12 preceding siblings ...)
2025-11-19 19:22 ` [PATCH v21 13/23] sfc: get root decoder alejandro.lucero-palau
@ 2025-11-19 19:22 ` alejandro.lucero-palau
2025-11-19 19:22 ` [PATCH v21 15/23] sfc: get endpoint decoder alejandro.lucero-palau
` (10 subsequent siblings)
24 siblings, 0 replies; 51+ messages in thread
From: alejandro.lucero-palau @ 2025-11-19 19:22 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Jonathan Cameron
From: Alejandro Lucero <alucerop@amd.com>
Region creation involves finding available DPA (device-physical-address)
capacity to map into HPA (host-physical-address) space.
In order to support CXL Type2 devices, define an API, cxl_request_dpa(),
that tries to allocate the DPA memory the driver requires to operate. The
memory requested should not be bigger than the max available HPA obtained
previously with cxl_get_hpa_freespace().
Based on https://lore.kernel.org/linux-cxl/168592158743.1948938.7622563891193802610.stgit@dwillia2-xfh.jf.intel.com/
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
---
drivers/cxl/core/hdm.c | 84 ++++++++++++++++++++++++++++++++++++++++++
drivers/cxl/cxl.h | 1 +
include/cxl/cxl.h | 5 +++
3 files changed, 90 insertions(+)
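
The flow of cxl_request_dpa() (alignment check, picking the next free
decoder, reserving capacity, and dropping the decoder reference on any
error path) can be sketched in userspace as follows. Everything here is
a hypothetical stand-in: struct decoder, struct port and request_dpa() are
not kernel API, the meaning given to hdm_end is an assumption, and the
compiler cleanup attribute (a GCC/Clang extension) plays the role that
__free(put_cxled) and no_free_ptr() play in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SZ_256M			(256ULL << 20)
#define IS_ALIGNED_256M(x)	(((x) & (SZ_256M - 1)) == 0)

/* Hypothetical stand-in for an endpoint decoder with a reference count. */
struct decoder {
	int id;
	int refs;
	uint64_t dpa_size;	/* 0 while nothing is reserved */
};

/* Hypothetical stand-in for an endpoint port. */
struct port {
	struct decoder *dec;
	int nr_decoders;
	int hdm_end;		/* assumed: id of last reserved decoder, -1 if none */
	uint64_t dpa_avail;
};

/* Cleanup handler: receives a pointer to the annotated variable. */
static void put_decoder(struct decoder **d)
{
	if (*d)
		(*d)->refs--;
}

/* The next free decoder is the one right after the last reserved one. */
static struct decoder *find_free_decoder(struct port *p)
{
	for (int i = 0; i < p->nr_decoders; i++) {
		if (p->dec[i].id == p->hdm_end + 1) {
			p->dec[i].refs++;	/* returned with a reference held */
			return &p->dec[i];
		}
	}
	return NULL;
}

/* Returns 0 and sets *out on success, or a negative errno on failure. */
static int request_dpa(struct port *p, uint64_t alloc, struct decoder **out)
{
	if (!IS_ALIGNED_256M(alloc))
		return -EINVAL;

	/* Reference is dropped on every return below unless handed over */
	struct decoder *d __attribute__((cleanup(put_decoder))) =
		find_free_decoder(p);

	if (!d)
		return -ENODEV;

	if (alloc > p->dpa_avail)
		return -ENOSPC;

	p->dpa_avail -= alloc;
	p->hdm_end = d->id;
	d->dpa_size = alloc;

	*out = d;
	d = NULL;	/* like no_free_ptr(): the caller now owns the reference */
	return 0;
}
```

Clearing the local pointer before the successful return is what keeps the
cleanup handler from dropping the reference that was just handed to the
caller.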
diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index d3a094ca01ad..88c8d14b8a63 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -3,6 +3,7 @@
#include <linux/seq_file.h>
#include <linux/device.h>
#include <linux/delay.h>
+#include <cxl/cxl.h>
#include "cxlmem.h"
#include "core.h"
@@ -546,6 +547,12 @@ bool cxl_resource_contains_addr(const struct resource *res, const resource_size_
return resource_contains(res, &_addr);
}
+/**
+ * cxl_dpa_free - release DPA (Device Physical Address)
+ * @cxled: endpoint decoder linked to the DPA
+ *
+ * Returns 0 or error.
+ */
int cxl_dpa_free(struct cxl_endpoint_decoder *cxled)
{
struct cxl_port *port = cxled_to_port(cxled);
@@ -572,6 +579,7 @@ int cxl_dpa_free(struct cxl_endpoint_decoder *cxled)
devm_cxl_dpa_release(cxled);
return 0;
}
+EXPORT_SYMBOL_NS_GPL(cxl_dpa_free, "CXL");
int cxl_dpa_set_part(struct cxl_endpoint_decoder *cxled,
enum cxl_partition_mode mode)
@@ -603,6 +611,82 @@ int cxl_dpa_set_part(struct cxl_endpoint_decoder *cxled,
return 0;
}
+static int find_free_decoder(struct device *dev, const void *data)
+{
+ struct cxl_endpoint_decoder *cxled;
+ struct cxl_port *port;
+
+ if (!is_endpoint_decoder(dev))
+ return 0;
+
+ cxled = to_cxl_endpoint_decoder(dev);
+ port = cxled_to_port(cxled);
+
+ return cxled->cxld.id == (port->hdm_end + 1);
+}
+
+static struct cxl_endpoint_decoder *
+cxl_find_free_decoder(struct cxl_memdev *cxlmd)
+{
+ struct cxl_port *endpoint = cxlmd->endpoint;
+ struct device *dev;
+
+ guard(rwsem_read)(&cxl_rwsem.dpa);
+ dev = device_find_child(&endpoint->dev, NULL,
+ find_free_decoder);
+ if (!dev)
+ return NULL;
+
+ return to_cxl_endpoint_decoder(dev);
+}
+
+/**
+ * cxl_request_dpa - search and reserve DPA given input constraints
+ * @cxlmd: memdev with an endpoint port with available decoders
+ * @mode: CXL partition mode (ram vs pmem)
+ * @alloc: dpa size required
+ *
+ * Returns a pointer to a 'struct cxl_endpoint_decoder' on success or
+ * an errno encoded pointer on failure.
+ *
+ * Given that a region needs to allocate from limited HPA capacity it
+ * may be the case that a device has more mappable DPA capacity than
+ * available HPA. The expectation is that @alloc is a driver-known
+ * value based on the device capacity, which might not be fully
+ * available due to HPA constraints.
+ *
+ * Returns a pinned cxl_decoder with at least @alloc bytes of capacity
+ * reserved, or an error pointer. The caller is also expected to own the
+ * lifetime of the memdev registration associated with the endpoint to
+ * pin the decoder registered as well.
+ */
+struct cxl_endpoint_decoder *cxl_request_dpa(struct cxl_memdev *cxlmd,
+ enum cxl_partition_mode mode,
+ resource_size_t alloc)
+{
+ int rc;
+
+ if (!IS_ALIGNED(alloc, SZ_256M))
+ return ERR_PTR(-EINVAL);
+
+ struct cxl_endpoint_decoder *cxled __free(put_cxled) =
+ cxl_find_free_decoder(cxlmd);
+
+ if (!cxled)
+ return ERR_PTR(-ENODEV);
+
+ rc = cxl_dpa_set_part(cxled, mode);
+ if (rc)
+ return ERR_PTR(rc);
+
+ rc = cxl_dpa_alloc(cxled, alloc);
+ if (rc)
+ return ERR_PTR(rc);
+
+ return no_free_ptr(cxled);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_request_dpa, "CXL");
+
static int __cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, u64 size)
{
struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 5441a296c351..06a111392c3b 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -640,6 +640,7 @@ struct cxl_root *find_cxl_root(struct cxl_port *port);
DEFINE_FREE(put_cxl_root, struct cxl_root *, if (_T) put_device(&_T->port.dev))
DEFINE_FREE(put_cxl_port, struct cxl_port *, if (!IS_ERR_OR_NULL(_T)) put_device(&_T->dev))
+DEFINE_FREE(put_cxled, struct cxl_endpoint_decoder *, if (!IS_ERR_OR_NULL(_T)) put_device(&_T->cxld.dev))
DEFINE_FREE(put_cxl_root_decoder, struct cxl_root_decoder *, if (!IS_ERR_OR_NULL(_T)) put_device(&_T->cxlsd.cxld.dev))
DEFINE_FREE(put_cxl_region, struct cxl_region *, if (!IS_ERR_OR_NULL(_T)) put_device(&_T->dev))
diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
index 2966b95e80a6..1cbe53ad0416 100644
--- a/include/cxl/cxl.h
+++ b/include/cxl/cxl.h
@@ -7,6 +7,7 @@
#include <linux/node.h>
#include <linux/ioport.h>
+#include <linux/range.h>
#include <cxl/mailbox.h>
/**
@@ -270,4 +271,8 @@ struct cxl_root_decoder *cxl_get_hpa_freespace(struct cxl_memdev *cxlmd,
unsigned long flags,
resource_size_t *max);
void cxl_put_root_decoder(struct cxl_root_decoder *cxlrd);
+struct cxl_endpoint_decoder *cxl_request_dpa(struct cxl_memdev *cxlmd,
+ enum cxl_partition_mode mode,
+ resource_size_t alloc);
+int cxl_dpa_free(struct cxl_endpoint_decoder *cxled);
#endif /* __CXL_CXL_H__ */
--
2.34.1
* [PATCH v21 15/23] sfc: get endpoint decoder
2025-11-19 19:22 [PATCH v21 00/23] Type2 device basic support alejandro.lucero-palau
` (13 preceding siblings ...)
2025-11-19 19:22 ` [PATCH v21 14/23] cxl: Define a driver interface for DPA allocation alejandro.lucero-palau
@ 2025-11-19 19:22 ` alejandro.lucero-palau
2025-11-26 1:27 ` PJ Waskiewicz
2025-11-19 19:22 ` [PATCH v21 16/23] cxl: Make region type based on endpoint type alejandro.lucero-palau
` (9 subsequent siblings)
24 siblings, 1 reply; 51+ messages in thread
From: alejandro.lucero-palau @ 2025-11-19 19:22 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Martin Habets, Edward Cree, Jonathan Cameron,
Ben Cheatham
From: Alejandro Lucero <alucerop@amd.com>
Use the CXL API to get the DPA (Device Physical Address) range to use
through an endpoint decoder.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
---
drivers/net/ethernet/sfc/efx_cxl.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
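
The acquire and release ordering that this patch adds to efx_cxl_init()
and efx_cxl_exit() can be modelled in userspace as below. This is a sketch
only: struct fake_hw, struct accel and the helper names are hypothetical
and merely stand in for the root decoder reference and the DPA
reservation. The point shown is that a failure in a later step undoes the
earlier one, and that teardown runs in reverse order of setup.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical resources: a root decoder refcount and a DPA reservation. */
struct fake_hw {
	int rd_refs;		/* outstanding root decoder references */
	bool dpa_reserved;
	bool fail_dpa;		/* test knob: make the DPA step fail */
};

struct accel {
	struct fake_hw *hw;
	bool ready;
};

static int get_root_decoder(struct fake_hw *hw)
{
	hw->rd_refs++;
	return 0;
}

static void put_root_decoder(struct fake_hw *hw)
{
	assert(hw->rd_refs > 0);
	hw->rd_refs--;
}

static int reserve_dpa(struct fake_hw *hw)
{
	if (hw->fail_dpa)
		return -ENODEV;
	hw->dpa_reserved = true;
	return 0;
}

static void free_dpa(struct fake_hw *hw)
{
	hw->dpa_reserved = false;
}

static int accel_init(struct accel *a, struct fake_hw *hw)
{
	int rc;

	rc = get_root_decoder(hw);
	if (rc)
		return rc;

	rc = reserve_dpa(hw);
	if (rc)
		goto err_put_rd;

	a->hw = hw;
	a->ready = true;
	return 0;

err_put_rd:
	/* Undo the earlier step so a failed init leaves nothing behind */
	put_root_decoder(hw);
	return rc;
}

static void accel_exit(struct accel *a)
{
	if (!a->ready)
		return;

	/* Reverse order of accel_init(): DPA first, then the decoder ref */
	free_dpa(a->hw);
	put_root_decoder(a->hw);
	a->ready = false;
}
```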
diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
index d7c34c978434..1a50bb2c0913 100644
--- a/drivers/net/ethernet/sfc/efx_cxl.c
+++ b/drivers/net/ethernet/sfc/efx_cxl.c
@@ -108,6 +108,14 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
return -ENOSPC;
}
+ cxl->cxled = cxl_request_dpa(cxl->cxlmd, CXL_PARTMODE_RAM,
+ EFX_CTPIO_BUFFER_SIZE);
+ if (IS_ERR(cxl->cxled)) {
+ pci_err(pci_dev, "CXL accel request DPA failed\n");
+ cxl_put_root_decoder(cxl->cxlrd);
+ return PTR_ERR(cxl->cxled);
+ }
+
probe_data->cxl = cxl;
return 0;
@@ -115,8 +123,10 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
void efx_cxl_exit(struct efx_probe_data *probe_data)
{
- if (probe_data->cxl)
+ if (probe_data->cxl) {
+ cxl_dpa_free(probe_data->cxl->cxled);
cxl_put_root_decoder(probe_data->cxl->cxlrd);
+ }
}
MODULE_IMPORT_NS("CXL");
--
2.34.1
* [PATCH v21 16/23] cxl: Make region type based on endpoint type
2025-11-19 19:22 [PATCH v21 00/23] Type2 device basic support alejandro.lucero-palau
` (14 preceding siblings ...)
2025-11-19 19:22 ` [PATCH v21 15/23] sfc: get endpoint decoder alejandro.lucero-palau
@ 2025-11-19 19:22 ` alejandro.lucero-palau
2025-11-19 19:22 ` [PATCH v21 17/23] cxl/region: Factor out interleave ways setup alejandro.lucero-palau
` (8 subsequent siblings)
24 siblings, 0 replies; 51+ messages in thread
From: alejandro.lucero-palau @ 2025-11-19 19:22 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Zhi Wang, Jonathan Cameron, Ben Cheatham,
Alison Schofield, Davidlohr Bueso
From: Alejandro Lucero <alucerop@amd.com>
The current code expects Type3 or CXL_DECODER_HOSTONLYMEM devices only.
Supporting Type2 implies that the region type needs to be based on the
endpoint type (HDM-D[B]) instead.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Zhi Wang <zhiw@nvidia.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Davidlohr Bueso <daves@stgolabs.net>
---
drivers/cxl/core/region.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index c9bf7415535e..85c2c7ab45b8 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2752,7 +2752,8 @@ static ssize_t create_ram_region_show(struct device *dev,
}
static struct cxl_region *__create_region(struct cxl_root_decoder *cxlrd,
- enum cxl_partition_mode mode, int id)
+ enum cxl_partition_mode mode, int id,
+ enum cxl_decoder_type target_type)
{
int rc;
@@ -2774,7 +2775,7 @@ static struct cxl_region *__create_region(struct cxl_root_decoder *cxlrd,
return ERR_PTR(-EBUSY);
}
- return devm_cxl_add_region(cxlrd, id, mode, CXL_DECODER_HOSTONLYMEM);
+ return devm_cxl_add_region(cxlrd, id, mode, target_type);
}
static ssize_t create_region_store(struct device *dev, const char *buf,
@@ -2788,7 +2789,7 @@ static ssize_t create_region_store(struct device *dev, const char *buf,
if (rc != 1)
return -EINVAL;
- cxlr = __create_region(cxlrd, mode, id);
+ cxlr = __create_region(cxlrd, mode, id, CXL_DECODER_HOSTONLYMEM);
if (IS_ERR(cxlr))
return PTR_ERR(cxlr);
@@ -3682,7 +3683,8 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
do {
cxlr = __create_region(cxlrd, cxlds->part[part].mode,
- atomic_read(&cxlrd->region_id));
+ atomic_read(&cxlrd->region_id),
+ cxled->cxld.target_type);
} while (IS_ERR(cxlr) && PTR_ERR(cxlr) == -EBUSY);
if (IS_ERR(cxlr)) {
--
2.34.1
* [PATCH v21 17/23] cxl/region: Factor out interleave ways setup
2025-11-19 19:22 [PATCH v21 00/23] Type2 device basic support alejandro.lucero-palau
` (15 preceding siblings ...)
2025-11-19 19:22 ` [PATCH v21 16/23] cxl: Make region type based on endpoint type alejandro.lucero-palau
@ 2025-11-19 19:22 ` alejandro.lucero-palau
2025-11-19 19:22 ` [PATCH v21 18/23] cxl/region: Factor out interleave granularity setup alejandro.lucero-palau
` (7 subsequent siblings)
24 siblings, 0 replies; 51+ messages in thread
From: alejandro.lucero-palau @ 2025-11-19 19:22 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Zhi Wang, Jonathan Cameron, Ben Cheatham,
Alison Schofield
From: Alejandro Lucero <alucerop@amd.com>
Region creation based on Type3 devices is triggered from user space,
allowing memory combination through interleaving.
In preparation for kernel-driven region creation, that is, Type2 drivers
triggering region creation backed by their advertised CXL memory, factor
out a common helper from the user-sysfs region setup for interleave ways.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Zhi Wang <zhiw@nvidia.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
---
drivers/cxl/core/region.c | 43 ++++++++++++++++++++++++---------------
1 file changed, 27 insertions(+), 16 deletions(-)
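
The shape of this refactor can be sketched in userspace as below: parsing
and locking stay in the user-facing store function, while the shared helper
only validates, asserts that the lock is held, and rolls back on failure.
The names mirror the patch, but struct region, update_targets() and the
write_locked flag are hypothetical stand-ins, and the rule in ways_valid()
(1, 2, 4, 8, 16 or 3, 6, 12 ways) is an assumption about what ways_to_eiw()
accepts.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical region; write_locked models holding the rwsem for write. */
struct region {
	bool write_locked;
	bool active;
	int interleave_ways;
	int max_targets;	/* limit the "group update" step can fail on */
};

static bool is_pow2(unsigned int v)
{
	return v && !(v & (v - 1));
}

/* Assumed rule: a power of two up to 16, or 3 times a power of two. */
static bool ways_valid(unsigned int ways)
{
	if (ways == 0 || ways > 16)
		return false;
	return is_pow2(ways) || (ways % 3 == 0 && is_pow2(ways / 3));
}

/* Stand-in for a step that can fail after the new value is stored. */
static int update_targets(const struct region *r)
{
	return r->interleave_ways > r->max_targets ? -ENOSPC : 0;
}

/* Shared helper: the caller must already hold the lock for write. */
static int set_interleave_ways(struct region *r, unsigned int val)
{
	int save, rc;

	if (!ways_valid(val))
		return -EINVAL;

	assert(r->write_locked);	/* role of lockdep_assert_held_write() */

	if (r->active)
		return -EBUSY;

	save = r->interleave_ways;
	r->interleave_ways = (int)val;
	rc = update_targets(r);
	if (rc)
		r->interleave_ways = save;	/* roll back on failure */

	return rc;
}

/* User-facing wrapper: parse, lock, delegate, unlock. */
static long interleave_ways_store(struct region *r, const char *buf, long len)
{
	char *end;
	unsigned long val = strtoul(buf, &end, 0);
	int rc;

	if (end == buf || (*end != '\0' && *end != '\n') || val > 16)
		return -EINVAL;

	r->write_locked = true;
	rc = set_interleave_ways(r, (unsigned int)val);
	r->write_locked = false;

	return rc ? rc : len;
}
```

A kernel-driven caller would take the lock itself and call
set_interleave_ways() directly, skipping the string parsing that only the
sysfs path needs.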
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 85c2c7ab45b8..d618adee94e0 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -491,22 +491,14 @@ static ssize_t interleave_ways_show(struct device *dev,
static const struct attribute_group *get_cxl_region_target_group(void);
-static ssize_t interleave_ways_store(struct device *dev,
- struct device_attribute *attr,
- const char *buf, size_t len)
+static int set_interleave_ways(struct cxl_region *cxlr, int val)
{
- struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev->parent);
+ struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld;
- struct cxl_region *cxlr = to_cxl_region(dev);
struct cxl_region_params *p = &cxlr->params;
- unsigned int val, save;
- int rc;
+ int save, rc;
u8 iw;
- rc = kstrtouint(buf, 0, &val);
- if (rc)
- return rc;
-
rc = ways_to_eiw(val, &iw);
if (rc)
return rc;
@@ -521,9 +513,7 @@ static ssize_t interleave_ways_store(struct device *dev,
return -EINVAL;
}
- ACQUIRE(rwsem_write_kill, rwsem)(&cxl_rwsem.region);
- if ((rc = ACQUIRE_ERR(rwsem_write_kill, &rwsem)))
- return rc;
+ lockdep_assert_held_write(&cxl_rwsem.region);
if (p->state >= CXL_CONFIG_INTERLEAVE_ACTIVE)
return -EBUSY;
@@ -531,10 +521,31 @@ static ssize_t interleave_ways_store(struct device *dev,
save = p->interleave_ways;
p->interleave_ways = val;
rc = sysfs_update_group(&cxlr->dev.kobj, get_cxl_region_target_group());
- if (rc) {
+ if (rc)
p->interleave_ways = save;
+
+ return rc;
+}
+
+static ssize_t interleave_ways_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t len)
+{
+ struct cxl_region *cxlr = to_cxl_region(dev);
+ unsigned int val;
+ int rc;
+
+ rc = kstrtouint(buf, 0, &val);
+ if (rc)
+ return rc;
+
+ ACQUIRE(rwsem_write_kill, rwsem)(&cxl_rwsem.region);
+ if ((rc = ACQUIRE_ERR(rwsem_write_kill, &rwsem)))
+ return rc;
+
+ rc = set_interleave_ways(cxlr, val);
+ if (rc)
return rc;
- }
return len;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 51+ messages in thread
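
The split made by patch 17 (string parsing and lock acquisition stay in the sysfs store callback, while the validated update moves into a helper that kernel callers can reach directly) can be modelled in userspace. The following is only a minimal sketch under stated assumptions: region_model, update_group_model(), set_ways_model() and store_ways_model() are hypothetical names invented for illustration, strtoul() stands in for kstrtouint(), and the locking is left out entirely.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical stand-in for the region parameters touched by the helper. */
struct region_model {
	int interleave_ways;
	int fail_update;	/* non-zero simulates a failed sysfs group update */
};

/* Simulates sysfs_update_group(); fails only when asked to. */
static int update_group_model(const struct region_model *r)
{
	return r->fail_update ? -ENOMEM : 0;
}

/* Core helper: takes an already parsed value, restores the old one on failure. */
static int set_ways_model(struct region_model *r, int val)
{
	int save, rc;

	if (val <= 0)
		return -EINVAL;

	save = r->interleave_ways;
	r->interleave_ways = val;
	rc = update_group_model(r);
	if (rc)
		r->interleave_ways = save;

	return rc;
}

/* User-facing wrapper: parses the string, then defers to the helper. */
static long store_ways_model(struct region_model *r, const char *buf, long len)
{
	unsigned long val;
	char *end;
	int rc;

	errno = 0;
	val = strtoul(buf, &end, 0);
	if (errno || end == buf || val > INT_MAX)
		return -EINVAL;

	rc = set_ways_model(r, (int)val);
	if (rc)
		return rc;

	return len;
}
```

A kernel-side caller corresponds to calling set_ways_model() directly with an integer, while the sysfs path corresponds to store_ways_model(); both end up in the same save/restore logic.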
* [PATCH v21 18/23] cxl/region: Factor out interleave granularity setup
2025-11-19 19:22 [PATCH v21 00/23] Type2 device basic support alejandro.lucero-palau
` (16 preceding siblings ...)
2025-11-19 19:22 ` [PATCH v21 17/23] cxl/region: Factor out interleave ways setup alejandro.lucero-palau
@ 2025-11-19 19:22 ` alejandro.lucero-palau
2025-11-19 19:22 ` [PATCH v21 19/23] cxl: Allow region creation by type2 drivers alejandro.lucero-palau
` (6 subsequent siblings)
24 siblings, 0 replies; 51+ messages in thread
From: alejandro.lucero-palau @ 2025-11-19 19:22 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Zhi Wang, Jonathan Cameron, Ben Cheatham,
Alison Schofield
From: Alejandro Lucero <alucerop@amd.com>
Region creation based on Type3 devices is triggered from user space
allowing memory combination through interleaving.
In preparation for kernel driven region creation, that is Type2 drivers
triggering region creation backed with their advertised CXL memory, factor
out a common helper from the user-sysfs region setup for interleave
granularity.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Zhi Wang <zhiw@nvidia.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
---
drivers/cxl/core/region.c | 39 +++++++++++++++++++++++++--------------
1 file changed, 25 insertions(+), 14 deletions(-)
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index d618adee94e0..1b0668fec02e 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -565,21 +565,14 @@ static ssize_t interleave_granularity_show(struct device *dev,
return sysfs_emit(buf, "%d\n", p->interleave_granularity);
}
-static ssize_t interleave_granularity_store(struct device *dev,
- struct device_attribute *attr,
- const char *buf, size_t len)
+static int set_interleave_granularity(struct cxl_region *cxlr, int val)
{
- struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev->parent);
+ struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld;
- struct cxl_region *cxlr = to_cxl_region(dev);
struct cxl_region_params *p = &cxlr->params;
- int rc, val;
+ int rc;
u16 ig;
- rc = kstrtoint(buf, 0, &val);
- if (rc)
- return rc;
-
rc = granularity_to_eig(val, &ig);
if (rc)
return rc;
@@ -595,14 +588,32 @@ static ssize_t interleave_granularity_store(struct device *dev,
if (cxld->interleave_ways > 1 && val != cxld->interleave_granularity)
return -EINVAL;
- ACQUIRE(rwsem_write_kill, rwsem)(&cxl_rwsem.region);
- if ((rc = ACQUIRE_ERR(rwsem_write_kill, &rwsem)))
- return rc;
-
+ lockdep_assert_held_write(&cxl_rwsem.region);
if (p->state >= CXL_CONFIG_INTERLEAVE_ACTIVE)
return -EBUSY;
p->interleave_granularity = val;
+ return 0;
+}
+
+static ssize_t interleave_granularity_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t len)
+{
+ struct cxl_region *cxlr = to_cxl_region(dev);
+ int rc, val;
+
+ rc = kstrtoint(buf, 0, &val);
+ if (rc)
+ return rc;
+
+ ACQUIRE(rwsem_write_kill, rwsem)(&cxl_rwsem.region);
+ if ((rc = ACQUIRE_ERR(rwsem_write_kill, &rwsem)))
+ return rc;
+
+ rc = set_interleave_granularity(cxlr, val);
+ if (rc)
+ return rc;
return len;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 51+ messages in thread
* [PATCH v21 19/23] cxl: Allow region creation by type2 drivers
2025-11-19 19:22 [PATCH v21 00/23] Type2 device basic support alejandro.lucero-palau
` (17 preceding siblings ...)
2025-11-19 19:22 ` [PATCH v21 18/23] cxl/region: Factor out interleave granularity setup alejandro.lucero-palau
@ 2025-11-19 19:22 ` alejandro.lucero-palau
2025-11-19 19:22 ` [PATCH v21 20/23] cxl: Avoid dax creation for accelerators alejandro.lucero-palau
` (5 subsequent siblings)
24 siblings, 0 replies; 51+ messages in thread
From: alejandro.lucero-palau @ 2025-11-19 19:22 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Jonathan Cameron
From: Alejandro Lucero <alucerop@amd.com>
Creating a CXL region requires userspace intervention through the cxl
sysfs files. Type2 support should allow accelerator drivers to create
such a CXL region from kernel code.
Add that functionality and integrate it with current support for
memory expanders. Only support uncommitted CXL_DECODER_DEVMEM decoders.
Based on https://lore.kernel.org/linux-cxl/168592159835.1948938.1647215579839222774.stgit@dwillia2-xfh.jf.intel.com/
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
---
drivers/cxl/core/core.h | 5 --
drivers/cxl/core/hdm.c | 7 ++
drivers/cxl/core/region.c | 132 ++++++++++++++++++++++++++++++++++++--
drivers/cxl/port.c | 5 +-
include/cxl/cxl.h | 11 ++++
5 files changed, 147 insertions(+), 13 deletions(-)
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index 1c1726856139..9a6775845afe 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -15,11 +15,6 @@ extern const struct device_type cxl_pmu_type;
extern struct attribute_group cxl_base_attribute_group;
-enum cxl_detach_mode {
- DETACH_ONLY,
- DETACH_INVALIDATE,
-};
-
#ifdef CONFIG_CXL_REGION
extern struct device_attribute dev_attr_create_pmem_region;
extern struct device_attribute dev_attr_create_ram_region;
diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index 88c8d14b8a63..33b767bdedec 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -1104,6 +1104,13 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
/* decoders are enabled if committed */
if (committed) {
+ if (cxled && cxled->cxld.target_type == CXL_DECODER_DEVMEM) {
+ dev_warn(&port->dev,
+ "decoder%d.%d: DEVMEM decoder committed by firmware. Unsupported\n",
+ port->id, cxld->id);
+ kfree(cxled);
+ return -ENXIO;
+ }
cxld->flags |= CXL_DECODER_F_ENABLE;
if (ctrl & CXL_HDM_DECODER0_CTRL_LOCK)
cxld->flags |= CXL_DECODER_F_LOCK;
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 1b0668fec02e..3af96c265351 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2380,6 +2380,7 @@ int cxl_decoder_detach(struct cxl_region *cxlr,
}
return 0;
}
+EXPORT_SYMBOL_NS_GPL(cxl_decoder_detach, "CXL");
static int __attach_target(struct cxl_region *cxlr,
struct cxl_endpoint_decoder *cxled, int pos,
@@ -2863,6 +2864,14 @@ cxl_find_region_by_name(struct cxl_root_decoder *cxlrd, const char *name)
return to_cxl_region(region_dev);
}
+static void drop_region(struct cxl_region *cxlr)
+{
+ struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
+ struct cxl_port *port = cxlrd_to_port(cxlrd);
+
+ devm_release_action(port->uport_dev, unregister_region, cxlr);
+}
+
static ssize_t delete_region_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t len)
@@ -3693,14 +3702,12 @@ static int __construct_region(struct cxl_region *cxlr,
return 0;
}
-/* Establish an empty region covering the given HPA range */
-static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
- struct cxl_endpoint_decoder *cxled)
+static struct cxl_region *construct_region_begin(struct cxl_root_decoder *cxlrd,
+ struct cxl_endpoint_decoder *cxled)
{
struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
- struct cxl_port *port = cxlrd_to_port(cxlrd);
struct cxl_dev_state *cxlds = cxlmd->cxlds;
- int rc, part = READ_ONCE(cxled->part);
+ int part = READ_ONCE(cxled->part);
struct cxl_region *cxlr;
do {
@@ -3709,13 +3716,26 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
cxled->cxld.target_type);
} while (IS_ERR(cxlr) && PTR_ERR(cxlr) == -EBUSY);
- if (IS_ERR(cxlr)) {
+ if (IS_ERR(cxlr))
dev_err(cxlmd->dev.parent,
"%s:%s: %s failed assign region: %ld\n",
dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
__func__, PTR_ERR(cxlr));
+
+ return cxlr;
+}
+
+/* Establish an empty region covering the given HPA range */
+static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
+ struct cxl_endpoint_decoder *cxled)
+{
+ struct cxl_port *port = cxlrd_to_port(cxlrd);
+ struct cxl_region *cxlr;
+ int rc;
+
+ cxlr = construct_region_begin(cxlrd, cxled);
+ if (IS_ERR(cxlr))
return cxlr;
- }
rc = __construct_region(cxlr, cxlrd, cxled);
if (rc) {
@@ -3726,6 +3746,104 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
return cxlr;
}
+DEFINE_FREE(cxl_region_drop, struct cxl_region *, if (!IS_ERR_OR_NULL(_T)) drop_region(_T))
+
+static struct cxl_region *
+__construct_new_region(struct cxl_root_decoder *cxlrd,
+ struct cxl_endpoint_decoder **cxled, int ways)
+{
+ struct cxl_memdev *cxlmd = cxled_to_memdev(cxled[0]);
+ struct cxl_decoder *cxld = &cxlrd->cxlsd.cxld;
+ struct cxl_region_params *p;
+ resource_size_t size = 0;
+ int rc, i;
+
+ struct cxl_region *cxlr __free(cxl_region_drop) =
+ construct_region_begin(cxlrd, cxled[0]);
+ if (IS_ERR(cxlr))
+ return cxlr;
+
+ guard(rwsem_write)(&cxl_rwsem.region);
+
+ /*
+ * Sanity check. This should not happen with an accel driver handling
+ * the region creation.
+ */
+ p = &cxlr->params;
+ if (p->state >= CXL_CONFIG_INTERLEAVE_ACTIVE) {
+ dev_err(cxlmd->dev.parent,
+ "%s:%s: %s unexpected region state\n",
+ dev_name(&cxlmd->dev), dev_name(&cxled[0]->cxld.dev),
+ __func__);
+ return ERR_PTR(-EBUSY);
+ }
+
+ rc = set_interleave_ways(cxlr, ways);
+ if (rc)
+ return ERR_PTR(rc);
+
+ rc = set_interleave_granularity(cxlr, cxld->interleave_granularity);
+ if (rc)
+ return ERR_PTR(rc);
+
+ scoped_guard(rwsem_read, &cxl_rwsem.dpa) {
+ for (i = 0; i < ways; i++) {
+ if (!cxled[i]->dpa_res)
+ return ERR_PTR(-EINVAL);
+ size += resource_size(cxled[i]->dpa_res);
+ }
+
+ rc = alloc_hpa(cxlr, size);
+ if (rc)
+ return ERR_PTR(rc);
+
+ for (i = 0; i < ways; i++) {
+ rc = cxl_region_attach(cxlr, cxled[i], 0);
+ if (rc)
+ return ERR_PTR(rc);
+ }
+ }
+
+ rc = cxl_region_decode_commit(cxlr);
+ if (rc)
+ return ERR_PTR(rc);
+
+ p->state = CXL_CONFIG_COMMIT;
+
+ return no_free_ptr(cxlr);
+}
+
+/**
+ * cxl_create_region - Establish a region given an endpoint decoder
+ * @cxlrd: root decoder to allocate HPA
+ * @cxled: endpoint decoders with reserved DPA capacity
+ * @ways: interleave ways required
+ *
+ * Returns a fully formed region in the commit state and attached to the
+ * cxl_region driver.
+ */
+struct cxl_region *cxl_create_region(struct cxl_root_decoder *cxlrd,
+ struct cxl_endpoint_decoder **cxled,
+ int ways)
+{
+ struct cxl_region *cxlr;
+
+ mutex_lock(&cxlrd->range_lock);
+ cxlr = __construct_new_region(cxlrd, cxled, ways);
+ mutex_unlock(&cxlrd->range_lock);
+ if (IS_ERR(cxlr))
+ return cxlr;
+
+ if (device_attach(&cxlr->dev) <= 0) {
+ dev_err(&cxlr->dev, "failed to create region\n");
+ drop_region(cxlr);
+ return ERR_PTR(-ENODEV);
+ }
+
+ return cxlr;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_create_region, "CXL");
+
static struct cxl_region *
cxl_find_region_by_range(struct cxl_root_decoder *cxlrd, struct range *hpa)
{
diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
index ef65d983e1c8..033de5a3ffd5 100644
--- a/drivers/cxl/port.c
+++ b/drivers/cxl/port.c
@@ -34,6 +34,7 @@ static void schedule_detach(void *cxlmd)
static int discover_region(struct device *dev, void *unused)
{
struct cxl_endpoint_decoder *cxled;
+ struct cxl_memdev *cxlmd;
int rc;
if (!is_endpoint_decoder(dev))
@@ -43,7 +44,9 @@ static int discover_region(struct device *dev, void *unused)
if ((cxled->cxld.flags & CXL_DECODER_F_ENABLE) == 0)
return 0;
- if (cxled->state != CXL_DECODER_STATE_AUTO)
+ cxlmd = cxled_to_memdev(cxled);
+ if (cxled->state != CXL_DECODER_STATE_AUTO ||
+ cxlmd->cxlds->type == CXL_DEVTYPE_DEVMEM)
return 0;
/*
diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
index 1cbe53ad0416..c6fd8fbd36c4 100644
--- a/include/cxl/cxl.h
+++ b/include/cxl/cxl.h
@@ -275,4 +275,15 @@ struct cxl_endpoint_decoder *cxl_request_dpa(struct cxl_memdev *cxlmd,
enum cxl_partition_mode mode,
resource_size_t alloc);
int cxl_dpa_free(struct cxl_endpoint_decoder *cxled);
+struct cxl_region *cxl_create_region(struct cxl_root_decoder *cxlrd,
+ struct cxl_endpoint_decoder **cxled,
+ int ways);
+enum cxl_detach_mode {
+ DETACH_ONLY,
+ DETACH_INVALIDATE,
+};
+
+int cxl_decoder_detach(struct cxl_region *cxlr,
+ struct cxl_endpoint_decoder *cxled, int pos,
+ enum cxl_detach_mode mode);
#endif /* __CXL_CXL_H__ */
--
2.34.1
^ permalink raw reply related [flat|nested] 51+ messages in thread
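
__construct_new_region() in patch 19 relies on scope-based cleanup: the region is dropped on every early return, and only the success path hands ownership to the caller through no_free_ptr(). A rough userspace model of that pattern, assuming GCC or Clang (the cleanup attribute is a compiler extension), could look like the sketch below. obj_model, drop_obj(), take_obj() and build_obj() are hypothetical names and not part of the CXL core.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object standing in for a half-built region. */
struct obj_model {
	int dropped;
};

/* Cleanup handler: runs at scope exit unless ownership was handed off. */
static void drop_obj(struct obj_model **p)
{
	if (*p)
		(*p)->dropped = 1;
}

/* Hand ownership to the caller: return the pointer and disarm the cleanup. */
static struct obj_model *take_obj(struct obj_model **p)
{
	struct obj_model *ret = *p;

	*p = NULL;
	return ret;
}

/* Any early return drops the object; only the success path keeps it. */
static struct obj_model *build_obj(struct obj_model *storage, int fail_step)
{
	struct obj_model *o __attribute__((cleanup(drop_obj))) = storage;

	if (fail_step == 1)
		return NULL;	/* e.g. a setup step failed */
	if (fail_step == 2)
		return NULL;	/* e.g. the commit step failed */

	return take_obj(&o);
}
```

The cleanup handler sees whatever value the variable holds at scope exit, so a handler that only tests for NULL would also run on an error-encoded pointer. That is worth keeping in mind wherever such a variable can hold an ERR_PTR() value.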
* [PATCH v21 20/23] cxl: Avoid dax creation for accelerators
2025-11-19 19:22 [PATCH v21 00/23] Type2 device basic support alejandro.lucero-palau
` (18 preceding siblings ...)
2025-11-19 19:22 ` [PATCH v21 19/23] cxl: Allow region creation by type2 drivers alejandro.lucero-palau
@ 2025-11-19 19:22 ` alejandro.lucero-palau
2025-11-19 19:22 ` [PATCH v21 21/23] sfc: create cxl region alejandro.lucero-palau
` (4 subsequent siblings)
24 siblings, 0 replies; 51+ messages in thread
From: alejandro.lucero-palau @ 2025-11-19 19:22 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Jonathan Cameron, Davidlohr Bueso, Ben Cheatham
From: Alejandro Lucero <alucerop@amd.com>
By definition a type2 cxl device will use the host managed memory for
specific functionality, therefore it should not be available to other
uses.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Davidlohr Bueso <daves@stgolabs.net>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
---
drivers/cxl/core/region.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 3af96c265351..4f56d1ad062b 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -4128,6 +4128,13 @@ static int cxl_region_probe(struct device *dev)
return rc;
}
+ /*
+ * HDM-D[B] (device-memory) regions have accelerator specific usage.
+ * Skip device-dax registration.
+ */
+ if (cxlr->type == CXL_DECODER_DEVMEM)
+ return 0;
+
switch (cxlr->mode) {
case CXL_PARTMODE_PMEM:
rc = devm_cxl_region_edac_register(cxlr);
--
2.34.1
^ permalink raw reply related [flat|nested] 51+ messages in thread
* [PATCH v21 21/23] sfc: create cxl region
2025-11-19 19:22 [PATCH v21 00/23] Type2 device basic support alejandro.lucero-palau
` (19 preceding siblings ...)
2025-11-19 19:22 ` [PATCH v21 20/23] cxl: Avoid dax creation for accelerators alejandro.lucero-palau
@ 2025-11-19 19:22 ` alejandro.lucero-palau
2025-11-19 19:22 ` [PATCH v21 22/23] cxl: Add function for obtaining region range alejandro.lucero-palau
` (3 subsequent siblings)
24 siblings, 0 replies; 51+ messages in thread
From: alejandro.lucero-palau @ 2025-11-19 19:22 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Edward Cree, Jonathan Cameron
From: Alejandro Lucero <alucerop@amd.com>
Use cxl api for creating a region using the endpoint decoder related to
a DPA range.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
---
drivers/net/ethernet/sfc/efx_cxl.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
index 1a50bb2c0913..79fe99d83f9f 100644
--- a/drivers/net/ethernet/sfc/efx_cxl.c
+++ b/drivers/net/ethernet/sfc/efx_cxl.c
@@ -116,6 +116,14 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
return PTR_ERR(cxl->cxled);
}
+ cxl->efx_region = cxl_create_region(cxl->cxlrd, &cxl->cxled, 1);
+ if (IS_ERR(cxl->efx_region)) {
+ pci_err(pci_dev, "CXL accel create region failed");
+ cxl_put_root_decoder(cxl->cxlrd);
+ cxl_dpa_free(cxl->cxled);
+ return PTR_ERR(cxl->efx_region);
+ }
+
probe_data->cxl = cxl;
return 0;
@@ -124,6 +132,8 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
void efx_cxl_exit(struct efx_probe_data *probe_data)
{
if (probe_data->cxl) {
+ cxl_decoder_detach(NULL, probe_data->cxl->cxled, 0,
+ DETACH_INVALIDATE);
cxl_dpa_free(probe_data->cxl->cxled);
cxl_put_root_decoder(probe_data->cxl->cxlrd);
}
--
2.34.1
^ permalink raw reply related [flat|nested] 51+ messages in thread
* [PATCH v21 22/23] cxl: Add function for obtaining region range
2025-11-19 19:22 [PATCH v21 00/23] Type2 device basic support alejandro.lucero-palau
` (20 preceding siblings ...)
2025-11-19 19:22 ` [PATCH v21 21/23] sfc: create cxl region alejandro.lucero-palau
@ 2025-11-19 19:22 ` alejandro.lucero-palau
2025-11-19 19:22 ` [PATCH v21 23/23] sfc: support pio mapping based on cxl alejandro.lucero-palau
` (2 subsequent siblings)
24 siblings, 0 replies; 51+ messages in thread
From: alejandro.lucero-palau @ 2025-11-19 19:22 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Zhi Wang, Jonathan Cameron
From: Alejandro Lucero <alucerop@amd.com>
A CXL region struct contains the physical address to work with.
Type2 drivers can create a CXL region but have no access to the
related struct as it is defined as private by the kernel CXL core.
Add a function for getting the cxl region range to be used for mapping
such memory range by a Type2 driver.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Zhi Wang <zhiw@nvidia.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
---
drivers/cxl/core/region.c | 23 +++++++++++++++++++++++
include/cxl/cxl.h | 2 ++
2 files changed, 25 insertions(+)
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 4f56d1ad062b..44e82b2eb247 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2757,6 +2757,29 @@ static struct cxl_region *devm_cxl_add_region(struct cxl_root_decoder *cxlrd,
return ERR_PTR(rc);
}
+/**
+ * cxl_get_region_range - obtain range linked to a CXL region
+ *
+ * @region: a pointer to struct cxl_region
+ * @range: a pointer to a struct range to be set
+ *
+ * Returns 0 or error.
+ */
+int cxl_get_region_range(struct cxl_region *region, struct range *range)
+{
+ if (WARN_ON_ONCE(!region))
+ return -ENODEV;
+
+ if (!region->params.res)
+ return -ENOSPC;
+
+ range->start = region->params.res->start;
+ range->end = region->params.res->end;
+
+ return 0;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_get_region_range, "CXL");
+
static ssize_t __create_region_show(struct cxl_root_decoder *cxlrd, char *buf)
{
return sysfs_emit(buf, "region%u\n", atomic_read(&cxlrd->region_id));
diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
index c6fd8fbd36c4..e5d1e5a20e06 100644
--- a/include/cxl/cxl.h
+++ b/include/cxl/cxl.h
@@ -286,4 +286,6 @@ enum cxl_detach_mode {
int cxl_decoder_detach(struct cxl_region *cxlr,
struct cxl_endpoint_decoder *cxled, int pos,
enum cxl_detach_mode mode);
+struct range;
+int cxl_get_region_range(struct cxl_region *region, struct range *range);
#endif /* __CXL_CXL_H__ */
--
2.34.1
^ permalink raw reply related [flat|nested] 51+ messages in thread
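
cxl_get_region_range() in patch 22 copies the resource bounds into a range whose start and end are both inclusive, which is why a caller mapping the region has to compute the size as end - start + 1. A small userspace sketch of that arithmetic follows; range_model and range_model_len() are hypothetical stand-ins for illustration, not the kernel definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for a range whose bounds are both inclusive. */
struct range_model {
	uint64_t start;
	uint64_t end;
};

/* Number of bytes covered by an inclusive [start, end] range. */
static uint64_t range_model_len(const struct range_model *r)
{
	return r->end - r->start + 1;
}
```

For example, a range from 0x1000 to 0x1fff covers 0x1000 bytes, not 0xfff.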
* [PATCH v21 23/23] sfc: support pio mapping based on cxl
2025-11-19 19:22 [PATCH v21 00/23] Type2 device basic support alejandro.lucero-palau
` (21 preceding siblings ...)
2025-11-19 19:22 ` [PATCH v21 22/23] cxl: Add function for obtaining region range alejandro.lucero-palau
@ 2025-11-19 19:22 ` alejandro.lucero-palau
2025-11-21 6:41 ` [PATCH v21 00/23] Type2 device basic support PJ Waskiewicz
2025-11-28 19:44 ` PJ Waskiewicz
24 siblings, 0 replies; 51+ messages in thread
From: alejandro.lucero-palau @ 2025-11-19 19:22 UTC (permalink / raw)
To: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Edward Cree, Jonathan Cameron
From: Alejandro Lucero <alucerop@amd.com>
A PIO buffer is a region of device memory to which the driver can write a
packet for TX, with the device handling the transmit doorbell without
requiring a DMA for getting the packet data, which helps reduce latency
in certain exchanges. With CXL mem protocol this latency can be lowered
further.
With a device supporting CXL and successfully initialised, use the cxl
region to map the memory range and use this mapping for PIO buffers.
Add the disabling of those CXL-based PIO buffers if the callback for
potential cxl endpoint removal by the CXL code happens.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
---
drivers/net/ethernet/sfc/ef10.c | 50 +++++++++++++++++++++++----
drivers/net/ethernet/sfc/efx_cxl.c | 31 ++++++++++++++---
drivers/net/ethernet/sfc/net_driver.h | 2 ++
drivers/net/ethernet/sfc/nic.h | 3 ++
4 files changed, 75 insertions(+), 11 deletions(-)
diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
index fcec81f862ec..2bb6d3136c7c 100644
--- a/drivers/net/ethernet/sfc/ef10.c
+++ b/drivers/net/ethernet/sfc/ef10.c
@@ -24,6 +24,7 @@
#include <linux/wait.h>
#include <linux/workqueue.h>
#include <net/udp_tunnel.h>
+#include "efx_cxl.h"
/* Hardware control for EF10 architecture including 'Huntington'. */
@@ -106,7 +107,7 @@ static int efx_ef10_get_vf_index(struct efx_nic *efx)
static int efx_ef10_init_datapath_caps(struct efx_nic *efx)
{
- MCDI_DECLARE_BUF(outbuf, MC_CMD_GET_CAPABILITIES_V4_OUT_LEN);
+ MCDI_DECLARE_BUF(outbuf, MC_CMD_GET_CAPABILITIES_V7_OUT_LEN);
struct efx_ef10_nic_data *nic_data = efx->nic_data;
size_t outlen;
int rc;
@@ -177,6 +178,12 @@ static int efx_ef10_init_datapath_caps(struct efx_nic *efx)
efx->num_mac_stats);
}
+ if (outlen < MC_CMD_GET_CAPABILITIES_V7_OUT_LEN)
+ nic_data->datapath_caps3 = 0;
+ else
+ nic_data->datapath_caps3 = MCDI_DWORD(outbuf,
+ GET_CAPABILITIES_V7_OUT_FLAGS3);
+
return 0;
}
@@ -919,6 +926,9 @@ static void efx_ef10_forget_old_piobufs(struct efx_nic *efx)
static void efx_ef10_remove(struct efx_nic *efx)
{
struct efx_ef10_nic_data *nic_data = efx->nic_data;
+#ifdef CONFIG_SFC_CXL
+ struct efx_probe_data *probe_data;
+#endif
int rc;
#ifdef CONFIG_SFC_SRIOV
@@ -949,7 +959,12 @@ static void efx_ef10_remove(struct efx_nic *efx)
efx_mcdi_rx_free_indir_table(efx);
+#ifdef CONFIG_SFC_CXL
+ probe_data = container_of(efx, struct efx_probe_data, efx);
+ if (nic_data->wc_membase && !probe_data->cxl_pio_in_use)
+#else
if (nic_data->wc_membase)
+#endif
iounmap(nic_data->wc_membase);
rc = efx_mcdi_free_vis(efx);
@@ -1140,6 +1155,9 @@ static int efx_ef10_dimension_resources(struct efx_nic *efx)
unsigned int channel_vis, pio_write_vi_base, max_vis;
struct efx_ef10_nic_data *nic_data = efx->nic_data;
unsigned int uc_mem_map_size, wc_mem_map_size;
+#ifdef CONFIG_SFC_CXL
+ struct efx_probe_data *probe_data;
+#endif
void __iomem *membase;
int rc;
@@ -1263,8 +1281,25 @@ static int efx_ef10_dimension_resources(struct efx_nic *efx)
iounmap(efx->membase);
efx->membase = membase;
- /* Set up the WC mapping if needed */
- if (wc_mem_map_size) {
+ if (!wc_mem_map_size)
+ goto skip_pio;
+
+ /* Set up the WC mapping */
+
+#ifdef CONFIG_SFC_CXL
+ probe_data = container_of(efx, struct efx_probe_data, efx);
+ if ((nic_data->datapath_caps3 &
+ (1 << MC_CMD_GET_CAPABILITIES_V7_OUT_CXL_CONFIG_ENABLE_LBN)) &&
+ probe_data->cxl_pio_initialised) {
+ /* Using PIO through CXL mapping? */
+ nic_data->pio_write_base = probe_data->cxl->ctpio_cxl +
+ (pio_write_vi_base * efx->vi_stride +
+ ER_DZ_TX_PIOBUF - uc_mem_map_size);
+ probe_data->cxl_pio_in_use = true;
+ } else
+#endif
+ {
+ /* Using legacy PIO BAR mapping */
nic_data->wc_membase = ioremap_wc(efx->membase_phys +
uc_mem_map_size,
wc_mem_map_size);
@@ -1279,12 +1314,13 @@ static int efx_ef10_dimension_resources(struct efx_nic *efx)
nic_data->wc_membase +
(pio_write_vi_base * efx->vi_stride + ER_DZ_TX_PIOBUF -
uc_mem_map_size);
-
- rc = efx_ef10_link_piobufs(efx);
- if (rc)
- efx_ef10_free_piobufs(efx);
}
+ rc = efx_ef10_link_piobufs(efx);
+ if (rc)
+ efx_ef10_free_piobufs(efx);
+
+skip_pio:
netif_dbg(efx, probe, efx->net_dev,
"memory BAR at %pa (virtual %p+%x UC, %p+%x WC)\n",
&efx->membase_phys, efx->membase, uc_mem_map_size,
diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
index 79fe99d83f9f..a84ce45398c1 100644
--- a/drivers/net/ethernet/sfc/efx_cxl.c
+++ b/drivers/net/ethernet/sfc/efx_cxl.c
@@ -11,6 +11,7 @@
#include <cxl/pci.h>
#include "net_driver.h"
#include "efx_cxl.h"
+#include "efx.h"
#define EFX_CTPIO_BUFFER_SIZE SZ_256M
@@ -20,6 +21,7 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
struct pci_dev *pci_dev = efx->pci_dev;
resource_size_t max_size;
struct efx_cxl *cxl;
+ struct range range;
u16 dvsec;
int rc;
@@ -119,19 +121,40 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
cxl->efx_region = cxl_create_region(cxl->cxlrd, &cxl->cxled, 1);
if (IS_ERR(cxl->efx_region)) {
pci_err(pci_dev, "CXL accel create region failed");
- cxl_put_root_decoder(cxl->cxlrd);
- cxl_dpa_free(cxl->cxled);
- return PTR_ERR(cxl->efx_region);
+ rc = PTR_ERR(cxl->efx_region);
+ goto err_dpa;
+ }
+
+ rc = cxl_get_region_range(cxl->efx_region, &range);
+ if (rc) {
+ pci_err(pci_dev, "CXL getting regions params failed");
+ goto err_detach;
+ }
+
+ cxl->ctpio_cxl = ioremap(range.start, range.end - range.start + 1);
+ if (!cxl->ctpio_cxl) {
+ pci_err(pci_dev, "CXL ioremap region (%pra) failed", &range);
+ rc = -ENOMEM;
+ goto err_detach;
}
probe_data->cxl = cxl;
+ probe_data->cxl_pio_initialised = true;
return 0;
+
+err_detach:
+ cxl_decoder_detach(NULL, cxl->cxled, 0, DETACH_INVALIDATE);
+err_dpa:
+ cxl_put_root_decoder(cxl->cxlrd);
+ cxl_dpa_free(cxl->cxled);
+ return rc;
}
void efx_cxl_exit(struct efx_probe_data *probe_data)
{
- if (probe_data->cxl) {
+ if (probe_data->cxl_pio_initialised) {
+ iounmap(probe_data->cxl->ctpio_cxl);
cxl_decoder_detach(NULL, probe_data->cxl->cxled, 0,
DETACH_INVALIDATE);
cxl_dpa_free(probe_data->cxl->cxled);
diff --git a/drivers/net/ethernet/sfc/net_driver.h b/drivers/net/ethernet/sfc/net_driver.h
index 3964b2c56609..bea4eecdf842 100644
--- a/drivers/net/ethernet/sfc/net_driver.h
+++ b/drivers/net/ethernet/sfc/net_driver.h
@@ -1207,6 +1207,7 @@ struct efx_cxl;
* @efx: Efx NIC details
* @cxl: details of related cxl objects
* @cxl_pio_initialised: cxl initialization outcome.
+ * @cxl_pio_in_use: PIO using CXL mapping
*/
struct efx_probe_data {
struct pci_dev *pci_dev;
@@ -1214,6 +1215,7 @@ struct efx_probe_data {
#ifdef CONFIG_SFC_CXL
struct efx_cxl *cxl;
bool cxl_pio_initialised;
+ bool cxl_pio_in_use;
#endif
};
diff --git a/drivers/net/ethernet/sfc/nic.h b/drivers/net/ethernet/sfc/nic.h
index 9fa5c4c713ab..c87cc9214690 100644
--- a/drivers/net/ethernet/sfc/nic.h
+++ b/drivers/net/ethernet/sfc/nic.h
@@ -152,6 +152,8 @@ enum {
* %MC_CMD_GET_CAPABILITIES response)
* @datapath_caps2: Further Capabilities of datapath firmware (FLAGS2 field of
* %MC_CMD_GET_CAPABILITIES response)
+ * @datapath_caps3: Further Capabilities of datapath firmware (FLAGS3 field of
+ * %MC_CMD_GET_CAPABILITIES response)
* @rx_dpcpu_fw_id: Firmware ID of the RxDPCPU
* @tx_dpcpu_fw_id: Firmware ID of the TxDPCPU
* @must_probe_vswitching: Flag: vswitching has yet to be setup after MC reboot
@@ -186,6 +188,7 @@ struct efx_ef10_nic_data {
bool must_check_datapath_caps;
u32 datapath_caps;
u32 datapath_caps2;
+ u32 datapath_caps3;
unsigned int rx_dpcpu_fw_id;
unsigned int tx_dpcpu_fw_id;
bool must_probe_vswitching;
--
2.34.1
^ permalink raw reply related [flat|nested] 51+ messages in thread
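
The error handling added to efx_cxl_init() in patch 23 is a goto ladder: a failure after region creation jumps to err_detach and falls through to err_dpa, while a failure of region creation itself skips the detach step. The sketch below models only that control flow in userspace. trace_model, note() and init_model() are hypothetical names, the return codes are arbitrary, and the letters recorded merely label the teardown steps.

```c
#include <assert.h>
#include <string.h>

/* Records which teardown steps ran, in order, for inspection. */
struct trace_model {
	char steps[8];
	int n;
};

static void note(struct trace_model *t, char c)
{
	t->steps[t->n++] = c;
	t->steps[t->n] = '\0';
}

/*
 * fail_at: 0 = success, 1 = region creation fails,
 * 2 = range lookup fails, 3 = mapping fails.
 */
static int init_model(struct trace_model *t, int fail_at)
{
	int rc;

	if (fail_at == 1) {
		rc = -1;
		goto err_dpa;		/* nothing attached yet */
	}
	if (fail_at == 2 || fail_at == 3) {
		rc = -2;
		goto err_detach;	/* region exists, must be detached */
	}
	return 0;

err_detach:
	note(t, 'D');			/* detach decoder from region */
err_dpa:
	note(t, 'R');			/* put root decoder */
	note(t, 'F');			/* free DPA */
	return rc;
}
```

Ordering the labels from the latest acquired resource to the earliest keeps each failure point to a single jump and avoids repeating the shared teardown calls.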
* Re: [PATCH v21 01/23] cxl/mem: refactor memdev allocation
2025-11-19 19:22 ` [PATCH v21 01/23] cxl/mem: refactor memdev allocation alejandro.lucero-palau
@ 2025-11-20 18:08 ` Jonathan Cameron
2025-11-20 18:27 ` Alejandro Lucero Palau
2025-11-20 20:27 ` Koralahalli Channabasappa, Smita
2025-12-02 2:52 ` dan.j.williams
2 siblings, 1 reply; 51+ messages in thread
From: Jonathan Cameron @ 2025-11-20 18:08 UTC (permalink / raw)
To: alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang, Alejandro Lucero
On Wed, 19 Nov 2025 19:22:14 +0000
alejandro.lucero-palau@amd.com wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> In preparation for always-synchronous memdev attach, refactor memdev
> allocation and fix release bug in devm_cxl_add_memdev() when error after
> a successful allocation.
>
> The diff is busy as this moves cxl_memdev_alloc() down below the definition
> of cxl_memdev_fops and introduces devm_cxl_memdev_add_or_reset() to
> preclude needing to export more symbols from the cxl_core.
>
> Fixes: 1c3333a28d45 ("cxl/mem: Do not rely on device_add() side effects for dev_set_name() failures")
>
No line break here. Fixes is part of the tag block and some tools
get grumpy if that isn't contiguous. That includes a bot that runs
on linux-next.
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
This SOB chain is wrong. What was Dan's role in this? As first SOB with no
Co-developed tag he would normally also be the author (From above)
I'm out of time for today so will leave review for another time. Just flagging
that without these tag chains being correct Dave can't pick this up even
if everything else is good.
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH v21 01/23] cxl/mem: refactor memdev allocation
2025-11-20 18:08 ` Jonathan Cameron
@ 2025-11-20 18:27 ` Alejandro Lucero Palau
2025-11-21 12:06 ` Jonathan Cameron
0 siblings, 1 reply; 51+ messages in thread
From: Alejandro Lucero Palau @ 2025-11-20 18:27 UTC (permalink / raw)
To: Jonathan Cameron, alejandro.lucero-palau
Cc: linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
On 11/20/25 18:08, Jonathan Cameron wrote:
> On Wed, 19 Nov 2025 19:22:14 +0000
> alejandro.lucero-palau@amd.com wrote:
>
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> In preparation for always-synchronous memdev attach, refactor memdev
>> allocation and fix release bug in devm_cxl_add_memdev() when error after
>> a successful allocation.
>>
>> The diff is busy as this moves cxl_memdev_alloc() down below the definition
>> of cxl_memdev_fops and introduces devm_cxl_memdev_add_or_reset() to
>> preclude needing to export more symbols from the cxl_core.
>>
>> Fixes: 1c3333a28d45 ("cxl/mem: Do not rely on device_add() side effects for dev_set_name() failures")
>>
> No line break here. Fixes is part of the tag block and some tools
> get grumpy if that isn't contiguous. That includes a bot that runs
> on linux-next.
>
OK
>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> This SOB chain is wrong. What was Dan's role in this? As first SOB with no
> Co-developed tag he would normally also be the author (From above)
The original patch is Dan's work; I modified it.
For the previous revision I asked what I should do, and whether adding my
Signed-off-by to Dan's would be enough. Dave's answer was yes.
Someone, likely me, misunderstood something in that exchange.
I added my Signed-off-by to patches 1 to 4 along with Dan's, which I think
Dave also suggested in another review.
Please tell me what I should do here.
Thank you
>
> I'm out of time for today so will leave review for another time. Just flagging
> that without these tag chains being correct Dave can't pick this up even
> if everything else is good.
>
* Re: [PATCH v21 01/23] cxl/mem: refactor memdev allocation
2025-11-19 19:22 ` [PATCH v21 01/23] cxl/mem: refactor memdev allocation alejandro.lucero-palau
2025-11-20 18:08 ` Jonathan Cameron
@ 2025-11-20 20:27 ` Koralahalli Channabasappa, Smita
2025-11-21 13:41 ` Alejandro Lucero Palau
2025-12-02 2:52 ` dan.j.williams
2 siblings, 1 reply; 51+ messages in thread
From: Koralahalli Channabasappa, Smita @ 2025-11-20 20:27 UTC (permalink / raw)
To: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero
Hi Alejandro,
On 11/19/2025 11:22 AM, alejandro.lucero-palau@amd.com wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> In preparation for always-synchronous memdev attach, refactor memdev
> allocation and fix release bug in devm_cxl_add_memdev() when error after
> a successful allocation.
>
> The diff is busy as this moves cxl_memdev_alloc() down below the definition
> of cxl_memdev_fops and introduces devm_cxl_memdev_add_or_reset() to
> preclude needing to export more symbols from the cxl_core.
>
> Fixes: 1c3333a28d45 ("cxl/mem: Do not rely on device_add() side effects for dev_set_name() failures")
>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> ---
> drivers/cxl/core/memdev.c | 134 +++++++++++++++++++++-----------------
> drivers/cxl/private.h | 10 +++
> 2 files changed, 86 insertions(+), 58 deletions(-)
> create mode 100644 drivers/cxl/private.h
>
> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index e370d733e440..8de19807ac7b 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c
> @@ -8,6 +8,7 @@
> #include <linux/idr.h>
> #include <linux/pci.h>
> #include <cxlmem.h>
> +#include "private.h"
> #include "trace.h"
> #include "core.h"
>
> @@ -648,42 +649,25 @@ static void detach_memdev(struct work_struct *work)
>
> static struct lock_class_key cxl_memdev_key;
>
> -static struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds,
> - const struct file_operations *fops)
> +int devm_cxl_memdev_add_or_reset(struct device *host, struct cxl_memdev *cxlmd)
> {
> - struct cxl_memdev *cxlmd;
> - struct device *dev;
> - struct cdev *cdev;
> + struct device *dev = &cxlmd->dev;
> + struct cdev *cdev = &cxlmd->cdev;
> int rc;
>
> - cxlmd = kzalloc(sizeof(*cxlmd), GFP_KERNEL);
> - if (!cxlmd)
> - return ERR_PTR(-ENOMEM);
> -
> - rc = ida_alloc_max(&cxl_memdev_ida, CXL_MEM_MAX_DEVS - 1, GFP_KERNEL);
> - if (rc < 0)
> - goto err;
> - cxlmd->id = rc;
> - cxlmd->depth = -1;
> -
> - dev = &cxlmd->dev;
> - device_initialize(dev);
> - lockdep_set_class(&dev->mutex, &cxl_memdev_key);
> - dev->parent = cxlds->dev;
> - dev->bus = &cxl_bus_type;
> - dev->devt = MKDEV(cxl_mem_major, cxlmd->id);
> - dev->type = &cxl_memdev_type;
> - device_set_pm_not_required(dev);
> - INIT_WORK(&cxlmd->detach_work, detach_memdev);
> -
> - cdev = &cxlmd->cdev;
> - cdev_init(cdev, fops);
> - return cxlmd;
> + rc = cdev_device_add(cdev, dev);
> + if (rc) {
> + /*
> + * The cdev was briefly live, shutdown any ioctl operations that
> + * saw that state.
> + */
> + cxl_memdev_shutdown(dev);
> + return rc;
> + }
>
> -err:
> - kfree(cxlmd);
> - return ERR_PTR(rc);
> + return devm_add_action_or_reset(host, cxl_memdev_unregister, cxlmd);
> }
> +EXPORT_SYMBOL_NS_GPL(devm_cxl_memdev_add_or_reset, "CXL");
>
> static long __cxl_memdev_ioctl(struct cxl_memdev *cxlmd, unsigned int cmd,
> unsigned long arg)
> @@ -1051,48 +1035,82 @@ static const struct file_operations cxl_memdev_fops = {
> .llseek = noop_llseek,
> };
>
> -struct cxl_memdev *devm_cxl_add_memdev(struct device *host,
> - struct cxl_dev_state *cxlds)
> +struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds)
> {
> - struct cxl_memdev *cxlmd;
> + struct cxl_memdev *cxlmd __free(kfree) =
> + kzalloc(sizeof(*cxlmd), GFP_KERNEL);
> struct device *dev;
> struct cdev *cdev;
> int rc;
>
> - cxlmd = cxl_memdev_alloc(cxlds, &cxl_memdev_fops);
> - if (IS_ERR(cxlmd))
> - return cxlmd;
> -
> - dev = &cxlmd->dev;
> - rc = dev_set_name(dev, "mem%d", cxlmd->id);
> - if (rc)
> - goto err;
> + if (!cxlmd)
> + return ERR_PTR(-ENOMEM);
>
> - /*
> - * Activate ioctl operations, no cxl_memdev_rwsem manipulation
> - * needed as this is ordered with cdev_add() publishing the device.
> - */
> + rc = ida_alloc_max(&cxl_memdev_ida, CXL_MEM_MAX_DEVS - 1, GFP_KERNEL);
> + if (rc < 0)
> + return ERR_PTR(rc);
> + cxlmd->id = rc;
> + cxlmd->depth = -1;
> cxlmd->cxlds = cxlds;
> cxlds->cxlmd = cxlmd;
>
> + dev = &cxlmd->dev;
> + device_initialize(dev);
> + lockdep_set_class(&dev->mutex, &cxl_memdev_key);
> + dev->parent = cxlds->dev;
> + dev->bus = &cxl_bus_type;
> + dev->devt = MKDEV(cxl_mem_major, cxlmd->id);
> + dev->type = &cxl_memdev_type;
> + device_set_pm_not_required(dev);
> + INIT_WORK(&cxlmd->detach_work, detach_memdev);
> +
> cdev = &cxlmd->cdev;
> - rc = cdev_device_add(cdev, dev);
> + cdev_init(cdev, &cxl_memdev_fops);
> + return_ptr(cxlmd);
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_memdev_alloc, "CXL");
> +
> +static void __cxlmd_free(struct cxl_memdev *cxlmd)
> +{
> + if (IS_ERR(cxlmd))
> + return;
> +
> + if (cxlmd->cxlds)
> + cxlmd->cxlds->cxlmd = NULL;
> +
This series caused a NULL deref in devm_cxl_add_memdev().
__cxlmd_free() only checks IS_ERR(cxlmd) and proceeds to dereference
cxlmd->cxlds.
Adding a NULL check for cxlmd fixed the crash in my setup.
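For readers following along, here is a minimal user-space sketch of the
guard described above. It is not the kernel code: it assumes GCC or Clang
for __attribute__((cleanup)), and struct memdev, memdev_free(), take_ptr()
and add_memdev() are hypothetical stand-ins for cxl_memdev, __cxlmd_free(),
no_free_ptr() and devm_cxl_add_memdev(). The point it illustrates is that
once ownership is handed to the caller, the cleanup handler still runs, with
NULL, so it has to tolerate NULL as well as error pointers.

```c
/*
 * User-space model of scope-based cleanup, NOT kernel code.
 * Assumes GCC or Clang. All names below are hypothetical stand-ins.
 */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095
#define IS_ERR_OR_NULL(p) \
	(!(p) || (uintptr_t)(p) >= (uintptr_t)-MAX_ERRNO)

struct dev_state;

struct memdev {
	int id;
	struct dev_state *state;
};

struct dev_state {
	struct memdev *md;
};

static int nr_freed;	/* counts real frees, for demonstration only */

/* Cleanup body: must tolerate NULL as well as error pointers. */
static void memdev_free(struct memdev *md)
{
	if (IS_ERR_OR_NULL(md))
		return;
	if (md->state)
		md->state->md = NULL;
	free(md);
	nr_freed++;
}

/* The cleanup attribute passes a pointer to the annotated variable. */
static void memdev_cleanup(struct memdev **mdp)
{
	memdev_free(*mdp);
}

/* Hand ownership to the caller and leave NULL behind for the handler. */
static struct memdev *take_ptr(struct memdev **mdp)
{
	struct memdev *md = *mdp;

	*mdp = NULL;
	return md;
}

struct memdev *add_memdev(struct dev_state *state, int fail)
{
	struct memdev *md __attribute__((cleanup(memdev_cleanup))) =
		calloc(1, sizeof(*md));

	if (!md)
		return NULL;
	md->state = state;
	state->md = md;

	if (fail)
		return NULL;	/* handler frees md and clears state->md */

	return take_ptr(&md);	/* handler now runs with md == NULL */
}

void memdev_selftest(void)
{
	struct dev_state ok = { 0 }, bad = { 0 };
	struct memdev *md = add_memdev(&ok, 0);

	assert(md && ok.md == md && nr_freed == 0);	/* success: nothing freed */
	assert(!add_memdev(&bad, 1));			/* failure: handler frees */
	assert(!bad.md && nr_freed == 1);
	memdev_free(md);				/* caller owns md, frees it */
	assert(!ok.md && nr_freed == 2);
}
```

In kernel terms this would presumably correspond to testing
IS_ERR_OR_NULL(cxlmd) at the top of __cxlmd_free(); whether that is the fix
that ends up in the series is not settled in this thread.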
BUG: kernel NULL pointer dereference, address: 0000000000000358
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 1553a7067 P4D 0
Oops: Oops: 0000 [#1] SMP NOPTI
RIP: 0010:devm_cxl_add_memdev+0x71/0xb0 [cxl_mem]
Code: 89 c4 e8 c2 c8 be f8 85 c0 75 17 48 89 de 4c 89 ef e8 b3 08 f9 ff
85 c0 75 08 45 31 e4 45 31 ed eb 08 48 98 49 89 dd 48 89 c3 <49> 8b 85
58 03 00 00 48 85 c0 74 08 48 c7 40 08 00 00 00 00 4c 89
CR2: 0000000000000358 CR3: 00000001553a6002 CR4: 0000000000771ef0
PKRU: 55555554
Call Trace:
<TASK>
cxl_pci_probe+0x409/0xb00 [cxl_pci]
? update_load_avg+0x83/0x780
local_pci_probe+0x4d/0xb0
work_for_cpu_fn+0x1e/0x30
process_scheduled_works+0xa9/0x420
? __pfx_worker_thread+0x10/0x10
worker_thread+0x127/0x270
...
Thanks
Smita
> + put_device(&cxlmd->dev);
> + kfree(cxlmd);
> +}
> +
> +DEFINE_FREE(cxlmd_free, struct cxl_memdev *, __cxlmd_free(_T))
> +
> +/**
> + * devm_cxl_add_memdev - Add a CXL memory device
> + * @host: devres alloc/release context and parent for the memdev
> + * @cxlds: CXL device state to associate with the memdev
> + *
> + * Upon return the device will have had a chance to attach to the
> + * cxl_mem driver, but may fail if the CXL topology is not ready
> + * (hardware CXL link down, or software platform CXL root not attached)
> + */
> +struct cxl_memdev *devm_cxl_add_memdev(struct device *host,
> + struct cxl_dev_state *cxlds)
> +{
> + struct cxl_memdev *cxlmd __free(cxlmd_free) = cxl_memdev_alloc(cxlds);
> + int rc;
> +
> + if (IS_ERR(cxlmd))
> + return cxlmd;
> +
> + rc = dev_set_name(&cxlmd->dev, "mem%d", cxlmd->id);
> if (rc)
> - goto err;
> + return ERR_PTR(rc);
>
> - rc = devm_add_action_or_reset(host, cxl_memdev_unregister, cxlmd);
> + rc = devm_cxl_memdev_add_or_reset(host, cxlmd);
> if (rc)
> return ERR_PTR(rc);
> - return cxlmd;
>
> -err:
> - /*
> - * The cdev was briefly live, shutdown any ioctl operations that
> - * saw that state.
> - */
> - cxl_memdev_shutdown(dev);
> - put_device(dev);
> - return ERR_PTR(rc);
> + return no_free_ptr(cxlmd);
> }
> EXPORT_SYMBOL_NS_GPL(devm_cxl_add_memdev, "CXL");
>
> diff --git a/drivers/cxl/private.h b/drivers/cxl/private.h
> new file mode 100644
> index 000000000000..50c2ac57afb5
> --- /dev/null
> +++ b/drivers/cxl/private.h
> @@ -0,0 +1,10 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/* Copyright(c) 2025 Intel Corporation. */
> +
> +/* Private interfaces between common drivers ("cxl_mem") and the cxl_core */
> +
> +#ifndef __CXL_PRIVATE_H__
> +#define __CXL_PRIVATE_H__
> +struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds);
> +int devm_cxl_memdev_add_or_reset(struct device *host, struct cxl_memdev *cxlmd);
> +#endif /* __CXL_PRIVATE_H__ */
* Re: [PATCH v21 00/23] Type2 device basic support
2025-11-19 19:22 [PATCH v21 00/23] Type2 device basic support alejandro.lucero-palau
` (22 preceding siblings ...)
2025-11-19 19:22 ` [PATCH v21 23/23] sfc: support pio mapping based on cxl alejandro.lucero-palau
@ 2025-11-21 6:41 ` PJ Waskiewicz
2025-11-21 10:40 ` Alejandro Lucero Palau
2025-11-28 19:44 ` PJ Waskiewicz
24 siblings, 1 reply; 51+ messages in thread
From: PJ Waskiewicz @ 2025-11-21 6:41 UTC (permalink / raw)
To: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero
On Wed, 2025-11-19 at 19:22 +0000, alejandro.lucero-palau@amd.com
wrote:
Hi Alejandro,
Sorry it's been a bit since I've been able to comment. I've been
trying to test these patchsets with varying degrees of success. Still
haven't gotten things up and running fully. One comment below.
> From: Alejandro Lucero <alucerop@amd.com>
>
> The patchset should be applied on the described base commit then
> applying
> Terry's v13 about CXL error handling. The first 4 patches come from
> Dan's
> for-6.18/cxl-probe-order branch with minor modifications.
>
> v21 changes;
>
> patch1-2: v20 patch1 splitted up doing the code move in the second
> patch in v21. (Jonathan)
>
> patch1-4: adding my Signed-off tag along with Dan's
>
> patch5: fix duplication of CXL_NR_PARTITION definition
>
> patch7: dropped the cxl test fixes removing unused function. It was
> sent independently ahead of this version.
>
> patch12: optimization for max free space calculation (Jonathan)
>
> patch19: optimization for returning on error (Jonathan)
I cannot test these v21 patches or the v20 patches for the same reason.
I suspect v19 is also affected, but I was stuck on v17 for a while (b4
was really not liking the prereq patches you required to get the tree
into a usable state to apply your patchset).
When I build and go to install the kernel mods, depmod fails:
DEPMOD /lib/modules/6.18.0-rc6+
depmod: ERROR: Cycle detected: cxl_core -> cxl_mem -> cxl_port ->
cxl_core
depmod: ERROR: Cycle detected: cxl_core -> cxl_mem -> cxl_core
depmod: ERROR: Found 3 modules in dependency cycles!
I repro'd this on a few different systems, and just finally repro'd
this on a box outside of my work network.
This is unusable unfortunately, so I couldn't test this even if I wanted to.
My .config for CXL:
CONFIG_PCIEAER_CXL=y
CONFIG_CXL_BUS=m
CONFIG_CXL_PCI=y
# CONFIG_CXL_MEM_RAW_COMMANDS is not set
CONFIG_CXL_ACPI=m
CONFIG_CXL_PMEM=m
CONFIG_CXL_MEM=m
CONFIG_CXL_FEATURES=y
# CONFIG_CXL_EDAC_MEM_FEATURES is not set
CONFIG_CXL_PORT=m
CONFIG_CXL_SUSPEND=y
CONFIG_CXL_REGION=y
# CONFIG_CXL_REGION_INVALIDATION_TEST is not set
CONFIG_CXL_RAS=y
CONFIG_CXL_RCH_RAS=y
CONFIG_CXL_PMU=m
CONFIG_DEV_DAX_CXL=m
Pretty simple to repro.
$ make -j<N> && make modules && make modules_install
Hopefully there's a solution here that doesn't involve building the
whole mess into the kernel directly.
Cheers,
-PJ
* Re: [PATCH v21 08/23] cxl/sfc: Map cxl component regs
2025-11-19 19:22 ` [PATCH v21 08/23] cxl/sfc: Map cxl component regs alejandro.lucero-palau
@ 2025-11-21 6:54 ` PJ Waskiewicz
2025-11-21 11:01 ` Alejandro Lucero Palau
0 siblings, 1 reply; 51+ messages in thread
From: PJ Waskiewicz @ 2025-11-21 6:54 UTC (permalink / raw)
To: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Edward Cree, Jonathan Cameron, Ben Cheatham
On Wed, 2025-11-19 at 19:22 +0000, alejandro.lucero-palau@amd.com
wrote:
Hi Alejandro,
> From: Alejandro Lucero <alucerop@amd.com>
>
> Export cxl core functions for a Type2 driver being able to discover
> and
> map the device component registers.
>
> Use it in sfc driver cxl initialization.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Acked-by: Edward Cree <ecree.xilinx@gmail.com>
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
> ---
> drivers/cxl/core/pci.c | 1 +
> drivers/cxl/core/pci_drv.c | 1 +
> drivers/cxl/core/port.c | 1 +
> drivers/cxl/core/regs.c | 1 +
> drivers/cxl/cxl.h | 7 ------
> drivers/cxl/cxlpci.h | 12 ----------
> drivers/net/ethernet/sfc/efx_cxl.c | 35
> ++++++++++++++++++++++++++++++
> include/cxl/cxl.h | 19 ++++++++++++++++
> include/cxl/pci.h | 21 ++++++++++++++++++
> 9 files changed, 79 insertions(+), 19 deletions(-)
> create mode 100644 include/cxl/pci.h
>
> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index 566d57ba0579..90a0763e72c4 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c
> @@ -6,6 +6,7 @@
> #include <linux/delay.h>
> #include <linux/pci.h>
> #include <linux/pci-doe.h>
> +#include <cxl/pci.h>
> #include <linux/aer.h>
> #include <cxlpci.h>
> #include <cxlmem.h>
> diff --git a/drivers/cxl/core/pci_drv.c b/drivers/cxl/core/pci_drv.c
> index a35e746e6303..4c767e2471b8 100644
> --- a/drivers/cxl/core/pci_drv.c
> +++ b/drivers/cxl/core/pci_drv.c
> @@ -11,6 +11,7 @@
> #include <linux/pci.h>
> #include <linux/aer.h>
> #include <linux/io.h>
> +#include <cxl/pci.h>
> #include <cxl/mailbox.h>
> #include <cxl/cxl.h>
> #include "cxlmem.h"
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index d19ebf052d76..7c828c75e7b8 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -11,6 +11,7 @@
> #include <linux/idr.h>
> #include <linux/node.h>
> #include <cxl/einj.h>
> +#include <cxl/pci.h>
> #include <cxlmem.h>
> #include <cxlpci.h>
> #include <cxl.h>
> diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c
> index fc7fbd4f39d2..dcf444f1fe48 100644
> --- a/drivers/cxl/core/regs.c
> +++ b/drivers/cxl/core/regs.c
> @@ -4,6 +4,7 @@
> #include <linux/device.h>
> #include <linux/slab.h>
> #include <linux/pci.h>
> +#include <cxl/pci.h>
> #include <cxlmem.h>
> #include <cxlpci.h>
> #include <pmu.h>
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 536c9d99e0e6..d7ddca6f7115 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -39,10 +39,6 @@ extern const struct nvdimm_security_ops
> *cxl_security_ops;
> #define CXL_CM_CAP_HDR_ARRAY_SIZE_MASK GENMASK(31, 24)
> #define CXL_CM_CAP_PTR_MASK GENMASK(31, 20)
>
> -#define CXL_CM_CAP_CAP_ID_RAS 0x2
> -#define CXL_CM_CAP_CAP_ID_HDM 0x5
> -#define CXL_CM_CAP_CAP_HDM_VERSION 1
> -
> /* HDM decoders CXL 2.0 8.2.5.12 CXL HDM Decoder Capability
> Structure */
> #define CXL_HDM_DECODER_CAP_OFFSET 0x0
> #define CXL_HDM_DECODER_COUNT_MASK GENMASK(3, 0)
> @@ -206,9 +202,6 @@ void cxl_probe_component_regs(struct device *dev,
> void __iomem *base,
> struct cxl_component_reg_map *map);
> void cxl_probe_device_regs(struct device *dev, void __iomem *base,
> struct cxl_device_reg_map *map);
> -int cxl_map_component_regs(const struct cxl_register_map *map,
> - struct cxl_component_regs *regs,
> - unsigned long map_mask);
> int cxl_map_device_regs(const struct cxl_register_map *map,
> struct cxl_device_regs *regs);
> int cxl_map_pmu_regs(struct cxl_register_map *map, struct
> cxl_pmu_regs *regs);
> diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
> index 24aba9ff6d2e..53760ce31af8 100644
> --- a/drivers/cxl/cxlpci.h
> +++ b/drivers/cxl/cxlpci.h
> @@ -13,16 +13,6 @@
> */
> #define CXL_PCI_DEFAULT_MAX_VECTORS 16
>
> -/* Register Block Identifier (RBI) */
> -enum cxl_regloc_type {
> - CXL_REGLOC_RBI_EMPTY = 0,
> - CXL_REGLOC_RBI_COMPONENT,
> - CXL_REGLOC_RBI_VIRT,
> - CXL_REGLOC_RBI_MEMDEV,
> - CXL_REGLOC_RBI_PMU,
> - CXL_REGLOC_RBI_TYPES
> -};
> -
> /*
> * Table Access DOE, CDAT Read Entry Response
> *
> @@ -100,6 +90,4 @@ static inline void
> cxl_uport_init_ras_reporting(struct cxl_port *port,
> struct device *host)
> { }
> #endif
>
> -int cxl_pci_setup_regs(struct pci_dev *pdev, enum cxl_regloc_type
> type,
> - struct cxl_register_map *map);
> #endif /* __CXL_PCI_H__ */
> diff --git a/drivers/net/ethernet/sfc/efx_cxl.c
> b/drivers/net/ethernet/sfc/efx_cxl.c
> index 8e0481d8dced..34126bc4826c 100644
> --- a/drivers/net/ethernet/sfc/efx_cxl.c
> +++ b/drivers/net/ethernet/sfc/efx_cxl.c
> @@ -7,6 +7,8 @@
>
> #include <linux/pci.h>
>
> +#include <cxl/cxl.h>
> +#include <cxl/pci.h>
> #include "net_driver.h"
> #include "efx_cxl.h"
>
> @@ -18,6 +20,7 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
> struct pci_dev *pci_dev = efx->pci_dev;
> struct efx_cxl *cxl;
> u16 dvsec;
> + int rc;
>
> probe_data->cxl_pio_initialised = false;
>
> @@ -44,6 +47,38 @@ int efx_cxl_init(struct efx_probe_data
> *probe_data)
> if (!cxl)
> return -ENOMEM;
>
> + rc = cxl_pci_setup_regs(pci_dev, CXL_REGLOC_RBI_COMPONENT,
> + &cxl->cxlds.reg_map);
> + if (rc) {
> + pci_err(pci_dev, "No component registers\n");
> + return rc;
> + }
> +
> + if (!cxl->cxlds.reg_map.component_map.hdm_decoder.valid) {
> + pci_err(pci_dev, "Expected HDM component register
> not found\n");
> + return -ENODEV;
> + }
> +
> + if (!cxl->cxlds.reg_map.component_map.ras.valid) {
> + pci_err(pci_dev, "Expected RAS component register
> not found\n");
> + return -ENODEV;
> + }
> +
> + rc = cxl_map_component_regs(&cxl->cxlds.reg_map,
> + &cxl->cxlds.regs.component,
> + BIT(CXL_CM_CAP_CAP_ID_RAS));
I'm going to reiterate a previous concern here. When all of
this was in the CXL core, the CXL core owned the entirety of whatever
BAR these registers were in. Now with a Type2 device, splitting
this out has implications.
The cxl_map_component_regs() is going to try and map the register map
you request as a reserved resource, which will fail if this Type2
driver has the BAR mapped (which basically all of these drivers do).
I think it's worth either a big comment or something explicit in the
patch description that calls this limitation or restriction out.
Hardware designers will be caught off-guard if they design their
hardware where the CXL component regs are in a BAR shared by other
register maps in their devices. If they land the CXL regs in the
middle of that BAR, they will have to do some serious gymnastics in the
drivers to map pieces of their BAR to allow the kernel to map the
component regs. OR...they can have some breadcrumbs to try and design
the HW where the CXL component regs are at the very beginning or very
end of their BAR. That way drivers have an easier way to reserve a
subset of a contiguous BAR, and allow the kernel to grab the remainder
for CXL access and management.
I think this is a pretty serious implication that I don't see a way
around. But letting a HW designer fall into this hole and realize they
can only fix it with a horrible set of driver hacks, or a silicon
respin, really sucks.
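To make the layout argument concrete, here is a small sketch in plain C
(user-space arithmetic only, not kernel code). struct range,
split_bar_around() and ranges_overlap() are hypothetical names, and the BAR
size and offsets are made-up numbers. It shows how many separate pieces a
driver is left to map once the CXL component register block is carved out:
one if the block sits at either end of the BAR, two if it sits in the middle.

```c
/*
 * Illustration only, NOT kernel code: plain arithmetic on byte ranges.
 * All names and numbers below are hypothetical.
 */
#include <assert.h>
#include <stdint.h>

struct range {
	uint64_t start;	/* first byte offset within the BAR */
	uint64_t len;	/* length in bytes */
};

/*
 * Split a BAR of bar_len bytes around the CXL component register block
 * `cxl`. Stores the pieces left for the driver in out[0..1] and returns
 * how many there are (0, 1 or 2), or -1 if `cxl` does not fit in the BAR.
 */
int split_bar_around(uint64_t bar_len, struct range cxl, struct range out[2])
{
	int n = 0;

	if (cxl.len == 0 || cxl.start > bar_len ||
	    cxl.len > bar_len - cxl.start)
		return -1;

	if (cxl.start > 0)
		out[n++] = (struct range){ 0, cxl.start };
	if (cxl.start + cxl.len < bar_len)
		out[n++] = (struct range){ cxl.start + cxl.len,
					   bar_len - (cxl.start + cxl.len) };
	return n;
}

/* True if two ranges share at least one byte, so both cannot be reserved. */
int ranges_overlap(struct range a, struct range b)
{
	return a.start < b.start + b.len && b.start < a.start + a.len;
}

void bar_layout_selftest(void)
{
	const uint64_t bar_len = 0x100000;	/* made-up 1 MiB BAR */
	struct range whole = { 0, bar_len };
	struct range at_start = { 0x00000, 0x10000 };
	struct range in_middle = { 0x80000, 0x10000 };
	struct range at_end = { 0xf0000, 0x10000 };
	struct range out[2];

	/* Driver holds the whole BAR: any CXL block inside it conflicts. */
	assert(ranges_overlap(whole, in_middle));

	/* CXL block at either end: one contiguous piece is left. */
	assert(split_bar_around(bar_len, at_start, out) == 1);
	assert(out[0].start == 0x10000 && out[0].len == 0xf0000);
	assert(split_bar_around(bar_len, at_end, out) == 1);

	/* CXL block in the middle: two separate pieces must be mapped. */
	assert(split_bar_around(bar_len, in_middle, out) == 2);
	assert(out[1].start == 0x90000 && out[1].len == 0x70000);
}
```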
Cheers,
-PJ
> + if (rc) {
> + pci_err(pci_dev, "Failed to map RAS capability.\n");
> + return rc;
> + }
> +
> + /*
> + * Set media ready explicitly as there are neither mailbox
> for checking
> + * this state nor the CXL register involved, both not
> mandatory for
> + * type2.
> + */
> + cxl->cxlds.media_ready = true;
> +
> probe_data->cxl = cxl;
>
> return 0;
> diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
> index 13d448686189..7f2e23bce1f7 100644
> --- a/include/cxl/cxl.h
> +++ b/include/cxl/cxl.h
> @@ -70,6 +70,10 @@ struct cxl_regs {
> );
> };
>
> +#define CXL_CM_CAP_CAP_ID_RAS 0x2
> +#define CXL_CM_CAP_CAP_ID_HDM 0x5
> +#define CXL_CM_CAP_CAP_HDM_VERSION 1
> +
> struct cxl_reg_map {
> bool valid;
> int id;
> @@ -223,4 +227,19 @@ struct cxl_dev_state
> *_devm_cxl_dev_state_create(struct device *dev,
> (drv_struct *)_devm_cxl_dev_state_create(parent,
> type, serial, dvsec, \
>
> sizeof(drv_struct), mbox); \
> })
> +
> +/**
> + * cxl_map_component_regs - map cxl component registers
> + *
> + * @map: cxl register map to update with the mappings
> + * @regs: cxl component registers to work with
> + * @map_mask: cxl component regs to map
> + *
> + * Returns integer: success (0) or error (-ENOMEM)
> + *
> + * Made public for Type2 driver support.
> + */
> +int cxl_map_component_regs(const struct cxl_register_map *map,
> + struct cxl_component_regs *regs,
> + unsigned long map_mask);
> #endif /* __CXL_CXL_H__ */
> diff --git a/include/cxl/pci.h b/include/cxl/pci.h
> new file mode 100644
> index 000000000000..a172439f08c6
> --- /dev/null
> +++ b/include/cxl/pci.h
> @@ -0,0 +1,21 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/* Copyright(c) 2020 Intel Corporation. All rights reserved. */
> +
> +#ifndef __CXL_CXL_PCI_H__
> +#define __CXL_CXL_PCI_H__
> +
> +/* Register Block Identifier (RBI) */
> +enum cxl_regloc_type {
> + CXL_REGLOC_RBI_EMPTY = 0,
> + CXL_REGLOC_RBI_COMPONENT,
> + CXL_REGLOC_RBI_VIRT,
> + CXL_REGLOC_RBI_MEMDEV,
> + CXL_REGLOC_RBI_PMU,
> + CXL_REGLOC_RBI_TYPES
> +};
> +
> +struct cxl_register_map;
> +
> +int cxl_pci_setup_regs(struct pci_dev *pdev, enum cxl_regloc_type
> type,
> + struct cxl_register_map *map);
> +#endif
* Re: [PATCH v21 00/23] Type2 device basic support
2025-11-21 6:41 ` [PATCH v21 00/23] Type2 device basic support PJ Waskiewicz
@ 2025-11-21 10:40 ` Alejandro Lucero Palau
2025-11-22 1:08 ` PJ Waskiewicz
0 siblings, 1 reply; 51+ messages in thread
From: Alejandro Lucero Palau @ 2025-11-21 10:40 UTC (permalink / raw)
To: PJ Waskiewicz, alejandro.lucero-palau, linux-cxl, netdev,
dan.j.williams, edward.cree, davem, kuba, pabeni, edumazet,
dave.jiang
On 11/21/25 06:41, PJ Waskiewicz wrote:
> On Wed, 2025-11-19 at 19:22 +0000, alejandro.lucero-palau@amd.com
> wrote:
>
> Hi Alejandro,
>
> Sorry it's been a bit since I've been able to comment. I've been
> trying to test these patchsets with varying degrees of success. Still
> haven't gotten things up and running fully. One comment below.
Hi,
No worries!
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> The patchset should be applied on the described base commit then
>> applying
>> Terry's v13 about CXL error handling. The first 4 patches come from
>> Dan's
>> for-6.18/cxl-probe-order branch with minor modifications.
>>
>> v21 changes;
>>
>> patch1-2: v20 patch1 splitted up doing the code move in the second
>> patch in v21. (Jonathan)
>>
>> patch1-4: adding my Signed-off tag along with Dan's
>>
>> patch5: fix duplication of CXL_NR_PARTITION definition
>>
>> patch7: dropped the cxl test fixes removing unused function. It was
>> sent independently ahead of this version.
>>
>> patch12: optimization for max free space calculation (Jonathan)
>>
>> patch19: optimization for returning on error (Jonathan)
> I cannot test these v21 patches or the v20 patches for the same reason.
> I suspect v19 is also affected, but I was stuck on v17 for awhile (b4
> was really not likely the prereq patches you required to get the tree
> into a usable state to apply your patchset).
>
> When I build and go to install the kernel mods, depmod fails:
>
> DEPMOD /lib/modules/6.18.0-rc6+
> depmod: ERROR: Cycle detected: cxl_core -> cxl_mem -> cxl_port ->
> cxl_core
> depmod: ERROR: Cycle detected: cxl_core -> cxl_mem -> cxl_core
> depmod: ERROR: Found 3 modules in dependency cycles!
>
> I repro'd this on a few different systems, and just finally repro'd
> this on a box outside of my work network.
>
> This is unusable unfortunately, so I can't test this if I wanted to.
I have been able to reproduce this, and I think after the changes
introduced in patches 2 & 3, we also need this:
diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
index 6b871cbbce13..94a3102ce86b 100644
--- a/drivers/cxl/Kconfig
+++ b/drivers/cxl/Kconfig
@@ -1,6 +1,6 @@
# SPDX-License-Identifier: GPL-2.0-only
menuconfig CXL_BUS
- tristate "CXL (Compute Express Link) Devices Support"
+ bool "CXL (Compute Express Link) Devices Support"
depends on PCI
select FW_LOADER
select FW_UPLOAD
This change means that neither CXL_BUS nor cxl_mem can be built as a module
any longer.
This should at least be enough for you to test the patchset.
If this is agreed, I will send a v22 with it.
Thank you!
> My .config for CXL:
>
>
> CONFIG_PCIEAER_CXL=y
> CONFIG_CXL_BUS=m
> CONFIG_CXL_PCI=y
> # CONFIG_CXL_MEM_RAW_COMMANDS is not set
> CONFIG_CXL_ACPI=m
> CONFIG_CXL_PMEM=m
> CONFIG_CXL_MEM=m
> CONFIG_CXL_FEATURES=y
> # CONFIG_CXL_EDAC_MEM_FEATURES is not set
> CONFIG_CXL_PORT=m
> CONFIG_CXL_SUSPEND=y
> CONFIG_CXL_REGION=y
> # CONFIG_CXL_REGION_INVALIDATION_TEST is not set
> CONFIG_CXL_RAS=y
> CONFIG_CXL_RCH_RAS=y
> CONFIG_CXL_PMU=m
> CONFIG_DEV_DAX_CXL=m
>
> Pretty simple to repro.
>
> $ make -j<N> && make modules && make modules_install
>
> Hopefully there's a solution here that doesn't involve building the
> whole mess into the kernel directly.
>
> Cheers,
> -PJ
* Re: [PATCH v21 08/23] cxl/sfc: Map cxl component regs
2025-11-21 6:54 ` PJ Waskiewicz
@ 2025-11-21 11:01 ` Alejandro Lucero Palau
2025-11-22 1:11 ` PJ Waskiewicz
0 siblings, 1 reply; 51+ messages in thread
From: Alejandro Lucero Palau @ 2025-11-21 11:01 UTC (permalink / raw)
To: PJ Waskiewicz, alejandro.lucero-palau, linux-cxl, netdev,
dan.j.williams, edward.cree, davem, kuba, pabeni, edumazet,
dave.jiang
Cc: Edward Cree, Jonathan Cameron, Ben Cheatham
On 11/21/25 06:54, PJ Waskiewicz wrote:
> On Wed, 2025-11-19 at 19:22 +0000, alejandro.lucero-palau@amd.com
> wrote:
>
> Hi Alejandro,
Hi PJ,
<snip>
>> + }
>> +
>> + rc = cxl_map_component_regs(&cxl->cxlds.reg_map,
>> + &cxl->cxlds.regs.component,
>> + BIT(CXL_CM_CAP_CAP_ID_RAS));
> I'm going to reiterate a previous concern here with this. When all of
> this was in the CXL core, the CXL core owned whatever BAR these
> registers were in in its entirety. Now with a Type2 device, splitting
> this out has implications.
I have not forgotten your concern and, as I said then, I will work on a
follow-up for this once the basic Type2 support patchset goes through.
The client linked to this patchset is the sfc driver, and we do not have
this problem with the card supporting CXL. But I fully understand this is
a problem for other Type2 API clients, which are more than hypothetical.
> The cxl_map_component_regs() is going to try and map the register map
> you request as a reserved resource, which will fail if this Type2
> driver has the BAR mapped (which basically all of these drivers do).
>
> I think it's worth either a big comment or something explicit in the
> patch description that calls this limitation or restriction out.
> Hardware designers will be caught off-guard if they design their
> hardware where the CXL component regs are in a BAR shared by other
> register maps in their devices. If they land the CXL regs in the
> middle of that BAR, they will have to do some serious gymnastics in the
> drivers to map pieces of their BAR to allow the kernel to map the
> component regs. OR...they can have some breadcrumbs to try and design
> the HW where the CXL component regs are at the very beginning or very
> end of their BAR. That way drivers have an easier way to reserve a
> subset of a contiguous BAR, and allow the kernel to grab the remainder
> for CXL access and management.
I have thought about the proper solution for this, and IMO it implies adding
a new argument where the client can specify the already mapped memory,
so that the CXL regs become available to the CXL core. It should not be too
complicated, but I prefer to leave it for a follow-up. Not sure if
you want something more elaborate where the code can solve this
without the driver writer being aware of it, but the failing call could be
more chatty about this possibility so the user knows.
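As a rough illustration of that idea (a user-space model only; struct
premapped and regs_from_premapped() are hypothetical names, not an existing
or proposed CXL core interface), the lookup could amount to a bounds check
against the mapping the driver already holds:

```c
/*
 * Hypothetical user-space model, NOT a kernel API.
 * All names and numbers below are made up for illustration.
 */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A mapping the Type2 driver already holds for (part of) a BAR. */
struct premapped {
	uint8_t *base;		/* CPU address of the first mapped byte */
	size_t bar_offset;	/* BAR offset that `base` corresponds to */
	size_t len;		/* number of bytes mapped */
};

/*
 * Return a pointer to the register block [reg_offset, reg_offset + reg_len)
 * inside the driver's existing mapping, or NULL if the block is not fully
 * covered and a separate mapping would be needed.
 */
uint8_t *regs_from_premapped(const struct premapped *pm, size_t reg_offset,
			     size_t reg_len)
{
	size_t rel;

	if (!pm || !pm->base || reg_len == 0)
		return NULL;
	if (reg_offset < pm->bar_offset)
		return NULL;

	rel = reg_offset - pm->bar_offset;
	if (rel > pm->len || reg_len > pm->len - rel)
		return NULL;

	return pm->base + rel;
}

void premapped_selftest(void)
{
	static uint8_t fake_bar[0x1000];	/* stands in for mapped MMIO */
	struct premapped pm = { fake_bar, 0x2000, sizeof(fake_bar) };

	/* Block fully inside the existing mapping: reuse it. */
	assert(regs_from_premapped(&pm, 0x2400, 0x100) == fake_bar + 0x400);
	/* Block starts before the mapping: not covered. */
	assert(regs_from_premapped(&pm, 0x1000, 0x100) == NULL);
	/* Block runs past the end of the mapping: not covered. */
	assert(regs_from_premapped(&pm, 0x2f80, 0x100) == NULL);
}
```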
But I agree the current patchset should at least comment on this
explicitly in the code. I will do so in v22; if there is general
concern about this case not being supported in the current work, I will
address it in that version as well.
Thank you!
>
> I think this is a pretty serious implication that I don't see a way
> around. But letting a HW designer fall into this hole and realize they
> can only fix it with a horrible set of driver hacks, or a silicon
> respin, really sucks.
>
> Cheers,
> -PJ
>
>> + if (rc) {
>> + pci_err(pci_dev, "Failed to map RAS capability.\n");
>> + return rc;
>> + }
>> +
>> + /*
>> + * Set media ready explicitly as there are neither mailbox
>> for checking
>> + * this state nor the CXL register involved, both not
>> mandatory for
>> + * type2.
>> + */
>> + cxl->cxlds.media_ready = true;
>> +
>> probe_data->cxl = cxl;
>>
>> return 0;
>> diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
>> index 13d448686189..7f2e23bce1f7 100644
>> --- a/include/cxl/cxl.h
>> +++ b/include/cxl/cxl.h
>> @@ -70,6 +70,10 @@ struct cxl_regs {
>> );
>> };
>>
>> +#define CXL_CM_CAP_CAP_ID_RAS 0x2
>> +#define CXL_CM_CAP_CAP_ID_HDM 0x5
>> +#define CXL_CM_CAP_CAP_HDM_VERSION 1
>> +
>> struct cxl_reg_map {
>> bool valid;
>> int id;
>> @@ -223,4 +227,19 @@ struct cxl_dev_state
>> *_devm_cxl_dev_state_create(struct device *dev,
>> (drv_struct *)_devm_cxl_dev_state_create(parent,
>> type, serial, dvsec, \
>>
>> sizeof(drv_struct), mbox); \
>> })
>> +
>> +/**
>> + * cxl_map_component_regs - map cxl component registers
>> + *
>> + * @map: cxl register map to update with the mappings
>> + * @regs: cxl component registers to work with
>> + * @map_mask: cxl component regs to map
>> + *
>> + * Returns integer: success (0) or error (-ENOMEM)
>> + *
>> + * Made public for Type2 driver support.
>> + */
>> +int cxl_map_component_regs(const struct cxl_register_map *map,
>> + struct cxl_component_regs *regs,
>> + unsigned long map_mask);
>> #endif /* __CXL_CXL_H__ */
>> diff --git a/include/cxl/pci.h b/include/cxl/pci.h
>> new file mode 100644
>> index 000000000000..a172439f08c6
>> --- /dev/null
>> +++ b/include/cxl/pci.h
>> @@ -0,0 +1,21 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +/* Copyright(c) 2020 Intel Corporation. All rights reserved. */
>> +
>> +#ifndef __CXL_CXL_PCI_H__
>> +#define __CXL_CXL_PCI_H__
>> +
>> +/* Register Block Identifier (RBI) */
>> +enum cxl_regloc_type {
>> + CXL_REGLOC_RBI_EMPTY = 0,
>> + CXL_REGLOC_RBI_COMPONENT,
>> + CXL_REGLOC_RBI_VIRT,
>> + CXL_REGLOC_RBI_MEMDEV,
>> + CXL_REGLOC_RBI_PMU,
>> + CXL_REGLOC_RBI_TYPES
>> +};
>> +
>> +struct cxl_register_map;
>> +
>> +int cxl_pci_setup_regs(struct pci_dev *pdev, enum cxl_regloc_type
>> type,
>> + struct cxl_register_map *map);
>> +#endif
* Re: [PATCH v21 01/23] cxl/mem: refactor memdev allocation
2025-11-20 18:27 ` Alejandro Lucero Palau
@ 2025-11-21 12:06 ` Jonathan Cameron
2025-11-21 13:46 ` Alejandro Lucero Palau
0 siblings, 1 reply; 51+ messages in thread
From: Jonathan Cameron @ 2025-11-21 12:06 UTC (permalink / raw)
To: Alejandro Lucero Palau
Cc: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
On Thu, 20 Nov 2025 18:27:50 +0000
Alejandro Lucero Palau <alucerop@amd.com> wrote:
> On 11/20/25 18:08, Jonathan Cameron wrote:
> > On Wed, 19 Nov 2025 19:22:14 +0000
> > alejandro.lucero-palau@amd.com wrote:
> >
> >> From: Alejandro Lucero <alucerop@amd.com>
> >>
> >> In preparation for always-synchronous memdev attach, refactor memdev
> >> allocation and fix release bug in devm_cxl_add_memdev() when error after
> >> a successful allocation.
> >>
> >> The diff is busy as this moves cxl_memdev_alloc() down below the definition
> >> of cxl_memdev_fops and introduces devm_cxl_memdev_add_or_reset() to
> >> preclude needing to export more symbols from the cxl_core.
> >>
> >> Fixes: 1c3333a28d45 ("cxl/mem: Do not rely on device_add() side effects for dev_set_name() failures")
> >>
> > No line break here. Fixes is part of the tag block and some tools
> > get grumpy if that isn't contiguous. That includes a bot that runs
> > on linux-next.
> >
>
> OK
>
>
> >> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> >> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> > This SOB chain is wrong. What was Dan's role in this? As first SOB with no
> > Co-developed tag he would normally also be the author (From above)
>
>
> The original patch is Dan's work. I did change it.
>
>
> From the previous revision I asked what I should do and if adding my
> Signed-off to Dan's one would be enough. Dave's answer was a yes.
>
> Someone, likely I, misunderstood something in that exchange.
>
>
> I did add my Signed-off to the patches 1 to 4 along with Dan's ones,
> what I think it was suggested by Dave as well in another review.
>
>
> Please, tell me what should I do here.
Change the author to Dan. IIRC
git commit --amend --author="Dan Williams <dan.j.williams@intel.com>"
should do that for you
Then author and first SoB will be Dan and you will be noting you 'handled'
the patch. Feel free to add a comment # Changed XYZ
to your SoB - or if appropriate a co-developed-by for yourself.
>
>
> Thank you
>
>
> >
> > I'm out of time for today so will leave review for another time. Just flagging
> > that without these tag chains being correct Dave can't pick this up even
> > if everything else is good.
> >
>
>
* Re: [PATCH v21 01/23] cxl/mem: refactor memdev allocation
2025-11-20 20:27 ` Koralahalli Channabasappa, Smita
@ 2025-11-21 13:41 ` Alejandro Lucero Palau
0 siblings, 0 replies; 51+ messages in thread
From: Alejandro Lucero Palau @ 2025-11-21 13:41 UTC (permalink / raw)
To: Koralahalli Channabasappa, Smita, alejandro.lucero-palau,
linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
On 11/20/25 20:27, Koralahalli Channabasappa, Smita wrote:
> Hi Alejandro,
>
Hi,
<snip>
> On 11/19/2025 11:22 AM, alejandro.lucero-palau@amd.com wrote:
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> +
>> +static void __cxlmd_free(struct cxl_memdev *cxlmd)
>> +{
>> + if (IS_ERR(cxlmd))
>> + return;
>> +
>> + if (cxlmd->cxlds)
>> + cxlmd->cxlds->cxlmd = NULL;
>> +
>
> This series caused a NULL deref in devm_cxl_add_memdev().
> __cxlmd_free() only checks IS_ERR(cxlmd) and proceeds to dereference
> cxlmd->cxlds.
>
> Adding a NULL check for cxlmd fixed the crash in my setup.
>
Yes. Believe it or not, I'm pretty sure I added that after the
IS_ERR check, but it seems I lost it in the refactoring.
Thank you for reporting it. I'll fix it in v22.
Thank you
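For reference, a minimal userspace sketch of why the cleanup helper can see
NULL even when nothing failed. It assumes the usual scope-based cleanup
semantics (the cleanup always runs at scope exit, and handing the pointer to
the caller leaves NULL behind in the local variable), modelled here with the
GCC/Clang cleanup attribute. All names (memdev_free, take_ptr, add_memdev) are
hypothetical stand-ins, not the v22 code:

```c
#include <stdint.h>
#include <stdlib.h>

/* Simplified userspace stand-ins for the kernel's ERR_PTR helpers. */
#define MAX_ERRNO 4095
#define ERR_PTR(err) ((void *)(intptr_t)(err))
#define IS_ERR(p) ((uintptr_t)(p) >= (uintptr_t)-MAX_ERRNO)
#define IS_ERR_OR_NULL(p) (!(p) || IS_ERR(p))

struct dev_state;
struct memdev { struct dev_state *cxlds; };
struct dev_state { struct memdev *cxlmd; };

static int nr_freed; /* counts real frees, for demonstration only */

/* Must tolerate NULL: the cleanup also runs on the success path. */
static void memdev_free(struct memdev *md)
{
	if (IS_ERR_OR_NULL(md))
		return;
	if (md->cxlds)
		md->cxlds->cxlmd = NULL;
	nr_freed++;
	free(md);
}

/* The cleanup attribute passes a pointer to the annotated variable. */
static void memdev_cleanup(struct memdev **mdp)
{
	memdev_free(*mdp);
}

/* Hand ownership to the caller and leave NULL behind in the variable. */
static struct memdev *take_ptr(struct memdev **mdp)
{
	struct memdev *md = *mdp;

	*mdp = NULL;
	return md;
}

/* Hypothetical allocation path; 'fail' simulates a late error. */
static struct memdev *add_memdev(struct dev_state *ds, int fail)
{
	struct memdev *md __attribute__((cleanup(memdev_cleanup))) =
		calloc(1, sizeof(*md));

	if (!md)
		return ERR_PTR(-12); /* -ENOMEM; cleanup sees NULL */

	md->cxlds = ds;
	ds->cxlmd = md;

	if (fail)
		return ERR_PTR(-22); /* -EINVAL; cleanup frees md */

	return take_ptr(&md); /* cleanup still runs, now with NULL */
}
```

Without the NULL test in memdev_free() the success path would dereference
NULL in the cleanup, which would be consistent with the small fault address
in the report below.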
> BUG: kernel NULL pointer dereference, address: 0000000000000358
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 1553a7067 P4D 0
> Oops: Oops: 0000 [#1] SMP NOPTI
> RIP: 0010:devm_cxl_add_memdev+0x71/0xb0 [cxl_mem]
> Code: 89 c4 e8 c2 c8 be f8 85 c0 75 17 48 89 de 4c 89 ef e8 b3 08 f9
> ff 85 c0 75 08 45 31 e4 45 31 ed eb 08 48 98 49 89 dd 48 89 c3 <49> 8b
> 85 58 03 00 00 48 85 c0 74 08 48 c7 40 08 00 00 00 00 4c 89
> CR2: 0000000000000358 CR3: 00000001553a6002 CR4: 0000000000771ef0
> PKRU: 55555554
> Call Trace:
> <TASK>
> cxl_pci_probe+0x409/0xb00 [cxl_pci]
> ? update_load_avg+0x83/0x780
> local_pci_probe+0x4d/0xb0
> work_for_cpu_fn+0x1e/0x30
> process_scheduled_works+0xa9/0x420
> ? __pfx_worker_thread+0x10/0x10
> worker_thread+0x127/0x270
> ...
>
> Thanks
> Smita
>
>> + put_device(&cxlmd->dev);
>> + kfree(cxlmd);
>> +}
>> +
>> +DEFINE_FREE(cxlmd_free, struct cxl_memdev *, __cxlmd_free(_T))
>> +
>> +/**
>> + * devm_cxl_add_memdev - Add a CXL memory device
>> + * @host: devres alloc/release context and parent for the memdev
>> + * @cxlds: CXL device state to associate with the memdev
>> + *
>> + * Upon return the device will have had a chance to attach to the
>> + * cxl_mem driver, but may fail if the CXL topology is not ready
>> + * (hardware CXL link down, or software platform CXL root not attached)
>> + */
>> +struct cxl_memdev *devm_cxl_add_memdev(struct device *host,
>> + struct cxl_dev_state *cxlds)
>> +{
>> + struct cxl_memdev *cxlmd __free(cxlmd_free) = cxl_memdev_alloc(cxlds);
>> + int rc;
>> +
>> + if (IS_ERR(cxlmd))
>> + return cxlmd;
>> +
>> + rc = dev_set_name(&cxlmd->dev, "mem%d", cxlmd->id);
>> if (rc)
>> - goto err;
>> + return ERR_PTR(rc);
>> - rc = devm_add_action_or_reset(host, cxl_memdev_unregister, cxlmd);
>> + rc = devm_cxl_memdev_add_or_reset(host, cxlmd);
>> if (rc)
>> return ERR_PTR(rc);
>> - return cxlmd;
>> -err:
>> - /*
>> - * The cdev was briefly live, shutdown any ioctl operations that
>> - * saw that state.
>> - */
>> - cxl_memdev_shutdown(dev);
>> - put_device(dev);
>> - return ERR_PTR(rc);
>> + return no_free_ptr(cxlmd);
>> }
>> EXPORT_SYMBOL_NS_GPL(devm_cxl_add_memdev, "CXL");
>> diff --git a/drivers/cxl/private.h b/drivers/cxl/private.h
>> new file mode 100644
>> index 000000000000..50c2ac57afb5
>> --- /dev/null
>> +++ b/drivers/cxl/private.h
>> @@ -0,0 +1,10 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +/* Copyright(c) 2025 Intel Corporation. */
>> +
>> +/* Private interfaces between common drivers ("cxl_mem") and the cxl_core */
>> +
>> +#ifndef __CXL_PRIVATE_H__
>> +#define __CXL_PRIVATE_H__
>> +struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds);
>> +int devm_cxl_memdev_add_or_reset(struct device *host, struct cxl_memdev *cxlmd);
>> +#endif /* __CXL_PRIVATE_H__ */
>
* Re: [PATCH v21 01/23] cxl/mem: refactor memdev allocation
2025-11-21 12:06 ` Jonathan Cameron
@ 2025-11-21 13:46 ` Alejandro Lucero Palau
0 siblings, 0 replies; 51+ messages in thread
From: Alejandro Lucero Palau @ 2025-11-21 13:46 UTC (permalink / raw)
To: Jonathan Cameron
Cc: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
On 11/21/25 12:06, Jonathan Cameron wrote:
> On Thu, 20 Nov 2025 18:27:50 +0000
> Alejandro Lucero Palau <alucerop@amd.com> wrote:
>
>> On 11/20/25 18:08, Jonathan Cameron wrote:
>>> On Wed, 19 Nov 2025 19:22:14 +0000
>>> alejandro.lucero-palau@amd.com wrote:
>>>
>>>> From: Alejandro Lucero <alucerop@amd.com>
>>>>
>>>> In preparation for always-synchronous memdev attach, refactor memdev
>>>> allocation and fix release bug in devm_cxl_add_memdev() when error after
>>>> a successful allocation.
>>>>
>>>> The diff is busy as this moves cxl_memdev_alloc() down below the definition
>>>> of cxl_memdev_fops and introduces devm_cxl_memdev_add_or_reset() to
>>>> preclude needing to export more symbols from the cxl_core.
>>>>
>>>> Fixes: 1c3333a28d45 ("cxl/mem: Do not rely on device_add() side effects for dev_set_name() failures")
>>>>
>>> No line break here. Fixes is part of the tag block and some tools
>>> get grumpy if that isn't contiguous. That includes a bot that runs
>>> on linux-next.
>>>
>> OK
>>
>>
>>>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>>>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>>> This SOB chain is wrong. What was Dan's role in this? As first SOB with no
>>> Co-developed tag he would normally also be the author (From above)
>>
>> The original patch is Dan's work. I did change it.
>>
>>
>> From the previous revision I asked what I should do and if adding my
>> Signed-off to Dan's one would be enough. Dave's answer was a yes.
>>
>> Someone, likely I, misunderstood something in that exchange.
>>
>>
>> I did add my Signed-off to the patches 1 to 4 along with Dan's ones,
>> what I think it was suggested by Dave as well in another review.
Oh, the amend for patch 1 and 2 after the refactoring!
Silly me. I will do so.
Thank you
>>
>>
>> Please, tell me what should I do here.
> Change the author to Dan. IIRC
>
> git commit --amend --author="Dan Williams <dan.j.williams@intel.com>"
>
> should do that for you
>
> Then author and first SoB will be Dan and you will be noting you 'handled'
> the patch. Feel free to add a comment # Changed XYZ
> to your SoB - or if appropriate a co-developed-by for yourself.
>
>
>>
>> Thank you
>>
>>
>>> I'm out of time for today so will leave review for another time. Just flagging
>>> that without these tag chains being correct Dave can't pick this up even
>>> if everything else is good.
>>>
>>
* Re: [PATCH v21 00/23] Type2 device basic support
2025-11-21 10:40 ` Alejandro Lucero Palau
@ 2025-11-22 1:08 ` PJ Waskiewicz
0 siblings, 0 replies; 51+ messages in thread
From: PJ Waskiewicz @ 2025-11-22 1:08 UTC (permalink / raw)
To: Alejandro Lucero Palau, alejandro.lucero-palau, linux-cxl, netdev,
dan.j.williams, edward.cree, davem, kuba, pabeni, edumazet,
dave.jiang
On Fri, 2025-11-21 at 10:40 +0000, Alejandro Lucero Palau wrote:
>
> On 11/21/25 06:41, PJ Waskiewicz wrote:
> > On Wed, 2025-11-19 at 19:22 +0000, alejandro.lucero-palau@amd.com
> > wrote:
> >
> > Hi Alejandro,
> >
> > Sorry it's been a bit since I've been able to comment. I've been
> > trying to test these patchsets with varying degrees of success.
> > Still
> > haven't gotten things up and running fully. One comment below.
>
>
> Hi,
>
>
> No worries!
>
>
> > > From: Alejandro Lucero <alucerop@amd.com>
> > >
> > > The patchset should be applied on the described base commit then
> > > applying
> > > Terry's v13 about CXL error handling. The first 4 patches come
> > > from
> > > Dan's
> > > for-6.18/cxl-probe-order branch with minor modifications.
> > >
> > > v21 changes;
> > >
> > > patch1-2: v20 patch1 splitted up doing the code move in the
> > > second
> > > patch in v21. (Jonathan)
> > >
> > > patch1-4: adding my Signed-off tag along with Dan's
> > >
> > > patch5: fix duplication of CXL_NR_PARTITION definition
> > >
> > > patch7: dropped the cxl test fixes removing unused function.
> > > It was
> > > sent independently ahead of this version.
> > >
> > > patch12: optimization for max free space calculation
> > > (Jonathan)
> > >
> > > patch19: optimization for returning on error (Jonathan)
> > I cannot test these v21 patches or the v20 patches for the same reason.
> > I suspect v19 is also affected, but I was stuck on v17 for a while (b4
> > was really not liking the prereq patches you required to get the tree
> > into a usable state to apply your patchset).
> >
> > When I build and go to install the kernel mods, depmod fails:
> >
> > DEPMOD /lib/modules/6.18.0-rc6+
> > depmod: ERROR: Cycle detected: cxl_core -> cxl_mem -> cxl_port ->
> > cxl_core
> > depmod: ERROR: Cycle detected: cxl_core -> cxl_mem -> cxl_core
> > depmod: ERROR: Found 3 modules in dependency cycles!
> >
> > I repro'd this on a few different systems, and just finally repro'd
> > this on a box outside of my work network.
> >
> > This is unusable unfortunately, so I can't test this if I wanted
> > to.
>
>
> I have been able to reproduce this, and I think after the changes
> introduced in patches 2 & 3, we also need this:
>
>
>
> diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
> index 6b871cbbce13..94a3102ce86b 100644
> --- a/drivers/cxl/Kconfig
> +++ b/drivers/cxl/Kconfig
> @@ -1,6 +1,6 @@
> # SPDX-License-Identifier: GPL-2.0-only
> menuconfig CXL_BUS
> - tristate "CXL (Compute Express Link) Devices Support"
> + bool "CXL (Compute Express Link) Devices Support"
> depends on PCI
> select FW_LOADER
> select FW_UPLOAD
>
>
> This change implies that neither CXL_BUS nor cxl_mem can be built as a
> module any longer.
>
>
> This should at least be enough for you to be able to test the patchset.
>
>
> If this is agreed, I will send a v22 with it.
This seems reasonable to me to make things work.
I'd definitely want Dave or Dan to weigh in though, since this means CXL
can no longer be built as a module: it's either built-in or absent.
Personally, if this is the price to pay for the non-asynchronous nature of
memdev creation for a Type2 driver, I'm fine with that.
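To make the depmod complaint concrete, here is a minimal sketch of the check
it is effectively doing: module dependencies have to form a DAG, and the
edges it printed do not. The edge list is taken from the error output quoted
above; the helper names (dfs, has_cycle) and the adjacency-matrix layout are
just for illustration and are not how depmod is implemented:

```c
#include <stdbool.h>

#define NR_MODS 3
enum { CXL_CORE, CXL_MEM, CXL_PORT };

/*
 * deps[a][b] is true for an edge "a -> b" as printed by depmod.
 * state: 0 = unvisited, 1 = on the current DFS path, 2 = fully explored.
 */
static bool dfs(bool deps[NR_MODS][NR_MODS], int node, int state[NR_MODS])
{
	state[node] = 1;
	for (int next = 0; next < NR_MODS; next++) {
		if (!deps[node][next])
			continue;
		if (state[next] == 1)
			return true; /* back edge: cycle found */
		if (state[next] == 0 && dfs(deps, next, state))
			return true;
	}
	state[node] = 2;
	return false;
}

/* Returns true when the dependency graph contains at least one cycle. */
static bool has_cycle(bool deps[NR_MODS][NR_MODS])
{
	int state[NR_MODS] = { 0 };

	for (int i = 0; i < NR_MODS; i++)
		if (state[i] == 0 && dfs(deps, i, state))
			return true;
	return false;
}
```

With CXL_BUS as a bool there are no inter-module edges left to check, which
is presumably why the Kconfig change makes the error go away rather than
actually untangling the dependency.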
I'm rebuilding now and will rebase my drivers onto this, and hopefully
be testing again over the weekend. If things start looking good, I'll
send some Tested-by: and also review more deeply.
Cheers,
-PJ
* Re: [PATCH v21 08/23] cxl/sfc: Map cxl component regs
2025-11-21 11:01 ` Alejandro Lucero Palau
@ 2025-11-22 1:11 ` PJ Waskiewicz
0 siblings, 0 replies; 51+ messages in thread
From: PJ Waskiewicz @ 2025-11-22 1:11 UTC (permalink / raw)
To: Alejandro Lucero Palau, alejandro.lucero-palau, linux-cxl, netdev,
dan.j.williams, edward.cree, davem, kuba, pabeni, edumazet,
dave.jiang
Cc: Edward Cree, Jonathan Cameron, Ben Cheatham
On Fri, 2025-11-21 at 11:01 +0000, Alejandro Lucero Palau wrote:
>
> On 11/21/25 06:54, PJ Waskiewicz wrote:
> > On Wed, 2025-11-19 at 19:22 +0000, alejandro.lucero-palau@amd.com
> > wrote:
> >
> > Hi Alejandro,
>
>
> Hi PJ,
>
>
> <snip>
>
>
> > > + }
> > > +
> > > + rc = cxl_map_component_regs(&cxl->cxlds.reg_map,
> > > + &cxl->cxlds.regs.component,
> > > + BIT(CXL_CM_CAP_CAP_ID_RAS));
> > I'm going to reiterate a previous concern here with this. When all
> > of
> > this was in the CXL core, the CXL core owned whatever BAR these
> > registers were in in its entirety. Now with a Type2 device,
> > splitting
> > this out has implications.
>
>
> I have not forgotten your concern and, as I said then, I will work on a
> follow-up for this once the basic Type2 support patchset goes through.
>
> The client linked to this patchset is the sfc driver and we do not have
> this problem for the card supporting CXL. But I fully understand this is
> a problem for other, more than potential, Type2 API clients.
>
>
> > The cxl_map_component_regs() is going to try and map the register
> > map
> > you request as a reserved resource, which will fail if this Type2
> > driver has the BAR mapped (which basically all of these drivers
> > do).
> >
> > I think it's worth either a big comment or something explicit in
> > the
> > patch description that calls this limitation or restriction out.
> > Hardware designers will be caught off-guard if they design their
> > hardware where the CXL component regs are in a BAR shared by other
> > register maps in their devices. If they land the CXL regs in the
> > middle of that BAR, they will have to do some serious gymnastics in
> > the
> > drivers to map pieces of their BAR to allow the kernel to map the
> > component regs. OR...they can have some breadcrumbs to try and
> > design
> > the HW where the CXL component regs are at the very beginning or
> > very
> > end of their BAR. That way drivers have an easier way to reserve a
> > subset of a contiguous BAR, and allow the kernel to grab the
> > remainder
> > for CXL access and management.
>
>
> I have thought about the proper solution for this and IMO it implies
> adding a new argument where the client can specify the already mapped
> memory, so the CXL regs become available to the CXL core. It should not
> be too complicated, but I prefer to leave it for a follow-up. Not sure if
> you want something more complicated where the code can solve this without
> the driver writer's awareness, but the failing call could be more chatty
> about this possibility so the user can know.
That would be a good addition. Maybe something to indicate "hey, go
check if someone else already claimed ownership of this memory region"
instead of using a divining rod to find this in /proc/iomem on a hunch.
:)
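To spell out the layout concern for anyone reading this later, here is a
small userspace model of exclusive range claims on a single BAR. It is only
a sketch under simplified assumptions: bar_request() is a hypothetical
helper, not a kernel API, and the offsets are made up. It just shows that a
later claim overlapping an earlier one gets -EBUSY, and that leaving the CXL
component register range unclaimed lets the second claim succeed:

```c
#include <errno.h>
#include <stddef.h>

#define MAX_CLAIMS 8

struct claim {
	unsigned long start, end; /* inclusive */
	const char *owner;
};

struct bar {
	unsigned long start, end; /* inclusive bounds of the BAR */
	struct claim claims[MAX_CLAIMS];
	size_t nr_claims;
};

/* Claim [start, start + len - 1] exclusively; overlaps fail with -EBUSY. */
static int bar_request(struct bar *bar, unsigned long start,
		       unsigned long len, const char *owner)
{
	unsigned long end = start + len - 1;

	if (!len || end < start || start < bar->start || end > bar->end)
		return -EINVAL;

	for (size_t i = 0; i < bar->nr_claims; i++) {
		const struct claim *c = &bar->claims[i];

		if (start <= c->end && end >= c->start)
			return -EBUSY; /* someone already owns part of it */
	}

	if (bar->nr_claims == MAX_CLAIMS)
		return -ENOMEM;

	bar->claims[bar->nr_claims++] = (struct claim){ start, end, owner };
	return 0;
}
```

If the component regs sit at either end of the BAR the driver needs one
claim for the rest; if they sit in the middle it needs two, which is the
"gymnastics" case.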
>
>
> But I agree the current patchset should at least specifically comment on
> this in the code. I will do so in v22, but if there is a general concern
> about this case not being supported in the current work, I'll address it
> in that next patchset version.
If you could capture this in either a comment or just the patch
description, I feel like that's enough of a paper trail that people doing
this sort of design work should be informed.
I'd just hate to see all this work you're doing to make it in, and a
hardware designer somewhere not knowing the restrictions, and getting
irritated when their shiny new chip doesn't work with your code. We
can at least help them with documentation.
Cheers,
-PJ
* Re: [PATCH v21 15/23] sfc: get endpoint decoder
2025-11-19 19:22 ` [PATCH v21 15/23] sfc: get endpoint decoder alejandro.lucero-palau
@ 2025-11-26 1:27 ` PJ Waskiewicz
2025-11-26 9:09 ` Alejandro Lucero Palau
0 siblings, 1 reply; 51+ messages in thread
From: PJ Waskiewicz @ 2025-11-26 1:27 UTC (permalink / raw)
To: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Martin Habets, Edward Cree, Jonathan Cameron,
Ben Cheatham
Hi Alejandro,
On Wed, 2025-11-19 at 19:22 +0000, alejandro.lucero-palau@amd.com
wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Use cxl api for getting DPA (Device Physical Address) to use through
> an
> endpoint decoder.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
> Acked-by: Edward Cree <ecree.xilinx@gmail.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> ---
> drivers/net/ethernet/sfc/efx_cxl.c | 12 +++++++++++-
> 1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/sfc/efx_cxl.c
> b/drivers/net/ethernet/sfc/efx_cxl.c
> index d7c34c978434..1a50bb2c0913 100644
> --- a/drivers/net/ethernet/sfc/efx_cxl.c
> +++ b/drivers/net/ethernet/sfc/efx_cxl.c
> @@ -108,6 +108,14 @@ int efx_cxl_init(struct efx_probe_data
> *probe_data)
> return -ENOSPC;
> }
>
> + cxl->cxled = cxl_request_dpa(cxl->cxlmd, CXL_PARTMODE_RAM,
> + EFX_CTPIO_BUFFER_SIZE);
I've been really struggling to get this flow working in my environment.
The above function call has a call-chain like this:
- cxl_request_dpa()
=> cxl_dpa_alloc()
=> __cxl_dpa_alloc()
=> __cxl_dpa_reserve()
=> __request_region()
That last call to __request_region() is not handling a Type2 device
that has its mem region defined as EFI Special Purpose memory.
Basically if the underlying hardware has the memory region marked that
way, it's still getting mapped into the host's physical address map,
but it's explicitly telling the OS to bugger off and not try to map it
as system RAM, which is what we want. Since this is being used as an
acceleration path, we don't want the OS to muck about with it.
The issue here is now that I have to build CXL into the kernel itself
to get around the circular dependency issue with depmod, I see this
when my kernel boots and the device trains, but *before* my driver
loads:
# cat /proc/iomem
[...snip...]
c050000000-c08fffffff : CXL Window 0
c050000000-c08fffffff : Soft Reserved
That right there is my device. And it's being presented correctly that
it's reserved so the OS doesn't mess with it. However, that call to
__request_region() fails with -EBUSY since it can't take ownership of
that region since it's already owned by the core.
I can't just skip over this flow for DPA init, so I'm at a bit of a
loss how to proceed. How is your device presenting the .mem region to
the host?
Cheers,
-PJ
* Re: [PATCH v21 15/23] sfc: get endpoint decoder
2025-11-26 1:27 ` PJ Waskiewicz
@ 2025-11-26 9:09 ` Alejandro Lucero Palau
2025-11-26 18:35 ` PJ Waskiewicz
0 siblings, 1 reply; 51+ messages in thread
From: Alejandro Lucero Palau @ 2025-11-26 9:09 UTC (permalink / raw)
To: PJ Waskiewicz, alejandro.lucero-palau, linux-cxl, netdev,
dan.j.williams, edward.cree, davem, kuba, pabeni, edumazet,
dave.jiang
Cc: Martin Habets, Edward Cree, Jonathan Cameron, Ben Cheatham
On 11/26/25 01:27, PJ Waskiewicz wrote:
> Hi Alejandro,
>
> On Wed, 2025-11-19 at 19:22 +0000, alejandro.lucero-palau@amd.com
> wrote:
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> Use cxl api for getting DPA (Device Physical Address) to use through
>> an
>> endpoint decoder.
>>
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>> Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
>> Acked-by: Edward Cree <ecree.xilinx@gmail.com>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
>> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
>> ---
>> drivers/net/ethernet/sfc/efx_cxl.c | 12 +++++++++++-
>> 1 file changed, 11 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/sfc/efx_cxl.c
>> b/drivers/net/ethernet/sfc/efx_cxl.c
>> index d7c34c978434..1a50bb2c0913 100644
>> --- a/drivers/net/ethernet/sfc/efx_cxl.c
>> +++ b/drivers/net/ethernet/sfc/efx_cxl.c
>> @@ -108,6 +108,14 @@ int efx_cxl_init(struct efx_probe_data
>> *probe_data)
>> return -ENOSPC;
>> }
>>
>> + cxl->cxled = cxl_request_dpa(cxl->cxlmd, CXL_PARTMODE_RAM,
>> + EFX_CTPIO_BUFFER_SIZE);
> I've been really struggling to get this flow working in my environment.
> The above function call has a call-chain like this:
>
> - cxl_request_dpa()
> => cxl_dpa_alloc()
> => __cxl_dpa_alloc()
> => __cxl_dpa_reserve()
> => __request_region()
>
> That last call to __request_region() is not handling a Type2 device
> that has its mem region defined as EFI Special Purpose memory.
> Basically if the underlying hardware has the memory region marked that
> way, it's still getting mapped into the host's physical address map,
> but it's explicitly telling the OS to bugger off and not try to map it
> as system RAM, which is what we want. Since this is being used as an
> acceleration path, we don't want the OS to muck about with it.
>
> The issue here is now that I have to build CXL into the kernel itself
> to get around the circular dependency issue with depmod, I see this
> when my kernel boots and the device trains, but *before* my driver
> loads:
>
> # cat /proc/iomem
> [...snip...]
> c050000000-c08fffffff : CXL Window 0
> c050000000-c08fffffff : Soft Reserved
>
> That right there is my device. And it's being presented correctly that
> it's reserved so the OS doesn't mess with it. However, that call to
> __request_region() fails with -EBUSY since it can't take ownership of
> that region since it's already owned by the core.
>
> I can't just skip over this flow for DPA init, so I'm at a bit of a
> loss how to proceed. How is your device presenting the .mem region to
> the host?
Hi PJ,
My work is based on the device not using EFI_CONVENTIONAL_MEMORY +
EFI_MEMORY_SP but EFI_RESERVED_TYPE. In the first case the kernel can
try to use that memory and the BIOS goes through default initialization;
the latter keeps both the BIOS and the kernel from touching that memory.
Because no BIOS supports this yet, I had to remove DAX support from
the kernel and deal (for testing) with some BIOS initialization we will
not have in production.
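To make the distinction between the two cases explicit, here is a simplified
sketch of how I understand an x86 kernel with soft-reserve support enabled
to classify these two EFI descriptor kinds. The EFI constants are the UEFI
ones; the enum, the classify() helper and the MEM_UNMODELLED bucket are
hypothetical and only there for illustration, and every other memory type is
deliberately left out:

```c
#include <stdint.h>

/* UEFI memory types and attributes relevant to this discussion. */
#define EFI_RESERVED_TYPE        0
#define EFI_CONVENTIONAL_MEMORY  7
#define EFI_MEMORY_WB            0x0000000000000008ULL
#define EFI_MEMORY_SP            0x0000000000040000ULL

enum mem_class {
	MEM_RESERVED,      /* "Reserved" in /proc/iomem */
	MEM_SOFT_RESERVED, /* "Soft Reserved" in /proc/iomem */
	MEM_SYSTEM_RAM,    /* "System RAM" in /proc/iomem */
	MEM_UNMODELLED,    /* any type this sketch does not cover */
};

/* Simplified model; assumes soft-reserve handling is enabled. */
static enum mem_class classify(uint32_t type, uint64_t attr)
{
	switch (type) {
	case EFI_CONVENTIONAL_MEMORY:
		if (attr & EFI_MEMORY_SP)
			return MEM_SOFT_RESERVED; /* left for a driver/DAX */
		if (attr & EFI_MEMORY_WB)
			return MEM_SYSTEM_RAM;
		return MEM_RESERVED;
	case EFI_RESERVED_TYPE:
		return MEM_RESERVED; /* SP attribute makes no difference */
	default:
		return MEM_UNMODELLED;
	}
}
```

So, under these assumptions, a "Soft Reserved" entry in /proc/iomem would
point at the first case (conventional memory with the SP attribute), while
EFI_RESERVED_TYPE would show up as plain "Reserved".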
For your case I thought this work
https://lore.kernel.org/linux-cxl/20251120031925.87762-1-Smita.KoralahalliChannabasappa@amd.com/T/#me2bc0d25a2129993e68df444aae073addf886751
was solving your problem, but after looking at it now, I think it will
only be useful for Type3 and the hotplug case. Maybe it is time to add
Type2 handling there. I'll study that patchset in more detail and
comment on how it could solve your case.
FWIW, last year in Vienna I raised the concern of the kernel doing
exactly what you are witnessing, and I proposed having a way to take
the device/memory away from DAX, but I was told unanimously that it was
not necessary and that if the BIOS did the wrong thing, it should not be
fixed in the kernel. In hindsight I would say this conflict was not well
understood then (me included) in all its details, so maybe it is time to
have this capability, maybe from user space or maybe via a specific kernel
param triggering the hand-off of the device from DAX.
>
> Cheers,
> -PJ
* Re: [PATCH v21 15/23] sfc: get endpoint decoder
2025-11-26 9:09 ` Alejandro Lucero Palau
@ 2025-11-26 18:35 ` PJ Waskiewicz
2025-11-27 9:08 ` Alejandro Lucero Palau
2025-12-02 16:35 ` Dave Jiang
0 siblings, 2 replies; 51+ messages in thread
From: PJ Waskiewicz @ 2025-11-26 18:35 UTC (permalink / raw)
To: Alejandro Lucero Palau, alejandro.lucero-palau, linux-cxl, netdev,
dan.j.williams, edward.cree, davem, kuba, pabeni, edumazet,
dave.jiang
Cc: Martin Habets, Edward Cree, Jonathan Cameron, Ben Cheatham
Hi Alejandro,
On Wed, 2025-11-26 at 09:09 +0000, Alejandro Lucero Palau wrote:
>
> On 11/26/25 01:27, PJ Waskiewicz wrote:
> > Hi Alejandro,
> >
> > On Wed, 2025-11-19 at 19:22 +0000, alejandro.lucero-palau@amd.com
> > wrote:
> > > From: Alejandro Lucero <alucerop@amd.com>
> > >
> > > Use cxl api for getting DPA (Device Physical Address) to use
> > > through
> > > an
> > > endpoint decoder.
> > >
> > > Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> > > Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
> > > Acked-by: Edward Cree <ecree.xilinx@gmail.com>
> > > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > > Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
> > > Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> > > ---
> > > drivers/net/ethernet/sfc/efx_cxl.c | 12 +++++++++++-
> > > 1 file changed, 11 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/net/ethernet/sfc/efx_cxl.c
> > > b/drivers/net/ethernet/sfc/efx_cxl.c
> > > index d7c34c978434..1a50bb2c0913 100644
> > > --- a/drivers/net/ethernet/sfc/efx_cxl.c
> > > +++ b/drivers/net/ethernet/sfc/efx_cxl.c
> > > @@ -108,6 +108,14 @@ int efx_cxl_init(struct efx_probe_data
> > > *probe_data)
> > > return -ENOSPC;
> > > }
> > >
> > > + cxl->cxled = cxl_request_dpa(cxl->cxlmd,
> > > CXL_PARTMODE_RAM,
> > > + EFX_CTPIO_BUFFER_SIZE);
> > I've been really struggling to get this flow working in my
> > environment.
> > The above function call has a call-chain like this:
> >
> > - cxl_request_dpa()
> > => cxl_dpa_alloc()
> > => __cxl_dpa_alloc()
> > => __cxl_dpa_reserve()
> > => __request_region()
> >
> > That last call to __request_region() is not handling a Type2 device
> > that has its mem region defined as EFI Special Purpose memory.
> > Basically if the underlying hardware has the memory region marked
> > that
> > way, it's still getting mapped into the host's physical address
> > map,
> > but it's explicitly telling the OS to bugger off and not try to map
> > it
> > as system RAM, which is what we want. Since this is being used as
> > an
> > acceleration path, we don't want the OS to muck about with it.
> >
> > The issue here is now that I have to build CXL into the kernel
> > itself
> > to get around the circular dependency issue with depmod, I see this
> > when my kernel boots and the device trains, but *before* my driver
> > loads:
> >
> > # cat /proc/iomem
> > [...snip...]
> > c050000000-c08fffffff : CXL Window 0
> > c050000000-c08fffffff : Soft Reserved
> >
> > That right there is my device. And it's being presented correctly
> > that
> > it's reserved so the OS doesn't mess with it. However, that call
> > to
> > __request_region() fails with -EBUSY since it can't take ownership
> > of
> > that region since it's already owned by the core.
> >
> > I can't just skip over this flow for DPA init, so I'm at a bit of a
> > loss how to proceed. How is your device presenting the .mem region
> > to
> > the host?
>
>
> Hi PJ,
>
>
> My work is based on the device not using EFI_CONVENTIONAL_MEMORY +
> EFI_MEMORY_SP but EFI_RESERVED_TYPE. In the first case the kernel can
> try to use that memory and the BIOS goes through default
> initialization,
I'm not sure I follow. Your device is based on using
EFI_RESERVED_TYPE? Or is it based on the former? My device is based
on EFI_RESERVED_TYPE, which translates into the Soft Reserved status as
a result of BIOS enumeration and the CXL core enumerating that memory
resource.
> the latter will avoid BIOS or kernel to mess with such a memory.
> Because
> there is no BIOS yet supporting this I had to remove DAX support from
> the kernel and deal (for testing) with some BIOS initialization we
> will
> not have in production.
Can you elaborate what you mean here? Do you mean the proposed patches
here are trying to work around this BIOS limitation?
I'm not sure I understand what BIOS limitations you mean though. I see
on both an AMD and Intel host (CXL 2.0-capable) the same behavior that
I'd expect of EFI_RESERVED_TYPE getting set aside so the OS doesn't
mess with it. This is on CRB-level stuff plus production-level
platforms.
>
>
> For your case I thought this work
> https://lore.kernel.org/linux-cxl/20251120031925.87762-1-Smita.KoralahalliChannabasappa@amd.com/T/#me2bc0d25a2129993e68df444aae073addf886751
>
> was solving your problem but after looking at it now, I think that
> will
> only be useful for Type3 and the hotplug case. Maybe it is time to
> add
> Type2 handling there. I'll study that patchset with more detail and
> comment for solving your case.
I just looked through that, and I might be able to cherry-pick some
stuff. I'll do the same offline and see if I can come up with a
workable solution to get past this wall for now.
That said though, I don't really want or care about DAX. I can already
find and map the underlying CXL.mem accelerated region through other
means (RCRB, DVSEC, etc.).
What I'm trying to do is get the regionX object to instantiate on my
CXL.mem memory block, so that I can remove the region, ultimately
tearing down the decoders, and allowing me to hotplug the device. The
patches here seem to still assume a Type3-ish device where there's DPA
needing to get mapped into HPA, which our devices are already allocated
in the decoders due to the EFI_RESERVED_TYPE enumeration. But the
patches aren't seeing that firmware already set them up, since the
decoders haven't been committed yet.
My root decoder has 1GB of space, which is the size of my endpoint
device's memory size (1GB). There is no DPA to map, and the HPA
already appears "full" since the device is already configured in the
decoder.
TL;DR: if your device you're testing with presents the CXL.mem region
as EFI_RESERVED_TYPE, I don't see how these patches are working.
Unless you're doing something extra outside of the patches, which isn't
obvious to me.
>
>
> FWIW, last year in Vienna I raised the concern of the kernel doing
> exactly what you are witnessing, and I proposed having a way for
> taking
> the device/memory from DAX but I was told unanimously that was not
> necessary and if the BIOS did the wrong thing, not fixing that in the
> kernel. In hindsight I would say this conflict was not well
> understood
> then (me included) with all the details, so maybe it is time to have
> this capacity, maybe from user space or maybe specific kernel param
> triggering the device passing from DAX.
I do recall this. Unfortunately I brought up similar concerns way back
in Dublin in 2021 regarding all of this flow well before 2.0-capable
hosts arrived. I think I started asking the questions way too early,
since this was of little to no concern at the time (nor was Type2
device support).
Cheers,
-PJ
* Re: [PATCH v21 15/23] sfc: get endpoint decoder
2025-11-26 18:35 ` PJ Waskiewicz
@ 2025-11-27 9:08 ` Alejandro Lucero Palau
2025-12-02 8:49 ` PJ Waskiewicz
2025-12-02 16:35 ` Dave Jiang
1 sibling, 1 reply; 51+ messages in thread
From: Alejandro Lucero Palau @ 2025-11-27 9:08 UTC (permalink / raw)
To: PJ Waskiewicz, alejandro.lucero-palau, linux-cxl, netdev,
dan.j.williams, edward.cree, davem, kuba, pabeni, edumazet,
dave.jiang
Cc: Martin Habets, Edward Cree, Jonathan Cameron, Ben Cheatham
On 11/26/25 18:35, PJ Waskiewicz wrote:
> Hi Alejandro,
>
> On Wed, 2025-11-26 at 09:09 +0000, Alejandro Lucero Palau wrote:
>> On 11/26/25 01:27, PJ Waskiewicz wrote:
>>> Hi Alejandro,
>>>
>>> On Wed, 2025-11-19 at 19:22 +0000, alejandro.lucero-palau@amd.com
>>> wrote:
>>>> From: Alejandro Lucero <alucerop@amd.com>
>>>>
>>>> Use cxl api for getting DPA (Device Physical Address) to use
>>>> through
>>>> an
>>>> endpoint decoder.
>>>>
>>>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>>>> Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
>>>> Acked-by: Edward Cree <ecree.xilinx@gmail.com>
>>>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>>>> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
>>>> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
>>>> ---
>>>> drivers/net/ethernet/sfc/efx_cxl.c | 12 +++++++++++-
>>>> 1 file changed, 11 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/net/ethernet/sfc/efx_cxl.c
>>>> b/drivers/net/ethernet/sfc/efx_cxl.c
>>>> index d7c34c978434..1a50bb2c0913 100644
>>>> --- a/drivers/net/ethernet/sfc/efx_cxl.c
>>>> +++ b/drivers/net/ethernet/sfc/efx_cxl.c
>>>> @@ -108,6 +108,14 @@ int efx_cxl_init(struct efx_probe_data
>>>> *probe_data)
>>>> return -ENOSPC;
>>>> }
>>>>
>>>> + cxl->cxled = cxl_request_dpa(cxl->cxlmd,
>>>> CXL_PARTMODE_RAM,
>>>> + EFX_CTPIO_BUFFER_SIZE);
>>> I've been really struggling to get this flow working in my
>>> environment.
>>> The above function call has a call-chain like this:
>>>
>>> - cxl_request_dpa()
>>> => cxl_dpa_alloc()
>>> => __cxl_dpa_alloc()
>>> => __cxl_dpa_reserve()
>>> => __request_region()
>>>
>>> That last call to __request_region() is not handling a Type2 device
>>> that has its mem region defined as EFI Special Purpose memory.
>>> Basically if the underlying hardware has the memory region marked
>>> that
>>> way, it's still getting mapped into the host's physical address
>>> map,
>>> but it's explicitly telling the OS to bugger off and not try to map
>>> it
>>> as system RAM, which is what we want. Since this is being used as
>>> an
>>> acceleration path, we don't want the OS to muck about with it.
>>>
>>> The issue here is now that I have to build CXL into the kernel
>>> itself
>>> to get around the circular dependency issue with depmod, I see this
>>> when my kernel boots and the device trains, but *before* my driver
>>> loads:
>>>
>>> # cat /proc/iomem
>>> [...snip...]
>>> c050000000-c08fffffff : CXL Window 0
>>> c050000000-c08fffffff : Soft Reserved
>>>
>>> That right there is my device. And it's being presented correctly
>>> that
>>> it's reserved so the OS doesn't mess with it. However, that call
>>> to
>>> __request_region() fails with -EBUSY since it can't take ownership
>>> of
>>> that region since it's already owned by the core.
>>>
>>> I can't just skip over this flow for DPA init, so I'm at a bit of a
>>> loss how to proceed. How is your device presenting the .mem region
>>> to
>>> the host?
>>
>> Hi PJ,
>>
>>
>> My work is based on the device not using EFI_CONVENTIONAL_MEMORY +
>> EFI_MEMORY_SP but EFI_RESERVED_TYPE. In the first case the kernel can
>> try to use that memory and the BIOS goes through default
>> initialization,
> I'm not sure I follow. Your device is based on using
> EFI_RESERVED_TYPE? Or is it based on the former? My device is based
> on EFI_RESERVED_TYPE, which translates into the Soft Reserved status as
> a result of BIOS enumeration and the CXL core enumerating that memory
> resource.
My device will be based on EFI_RESERVED_TYPE, but the devices I got for
testing do not have that special flag, so the BIOS passes the memory to
the kernel in the HMAT as EFI_CONVENTIONAL_MEMORY + EFI_MEMORY_SP.
I'm not sure what you mean by that CXL core enumeration. With a Type2
device, supposedly not a PCI memory class device, the CXL PCI driver
cannot attach to it. AFAIK, that memory region can only be used (or, what
I think is the problem you have, reserved/allocated in terms of
resources) by DAX/HMEM.
However, I thought that could not happen at all with EFI_RESERVED_TYPE.
If it does happen, I would say it is fully wrong. If not, what is the
meaning of this EFI memory type from the kernel's point of view?
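
As a rough illustration of the distinction being discussed, here is a
simplified userspace sketch (not kernel code). The numeric values are the
UEFI memory type and attribute definitions; the function iomem_label() is
hypothetical, and the mapping to /proc/iomem labels assumes an x86-style
kernel with soft-reserve support enabled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* UEFI memory types and the "special purpose" attribute bit. */
#define EFI_RESERVED_TYPE	0
#define EFI_CONVENTIONAL_MEMORY	7
#define EFI_MEMORY_SP		0x0000000000040000ULL

/*
 * Hypothetical helper: which label a range would be expected to carry,
 * under the assumptions stated above. Only the two types discussed in
 * this thread are modeled.
 */
static const char *iomem_label(uint32_t type, uint64_t attr)
{
	if (type == EFI_CONVENTIONAL_MEMORY)
		return (attr & EFI_MEMORY_SP) ? "Soft Reserved" : "System RAM";
	if (type == EFI_RESERVED_TYPE)
		return "Reserved";
	return "other";
}
```

Under these assumptions a range shown as "Soft Reserved" points at
EFI_CONVENTIONAL_MEMORY + EFI_MEMORY_SP rather than EFI_RESERVED_TYPE, which
is why checking the flag the firmware actually reports is worthwhile.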
>> the latter will avoid BIOS or kernel to mess with such a memory.
>> Because
>> there is no BIOS yet supporting this I had to remove DAX support from
>> the kernel and deal (for testing) with some BIOS initialization we
>> will
>> not have in production.
> Can you elaborate what you mean here? Do you mean the proposed patches
> here are trying to work around this BIOS limitation?
Apologies for not being clearer here. The proposed patches discussed
here are the right ones for our device ... once we get all the pieces
together, chiefly our UEFI driver advertising that special flag and the
(AMD) BIOS supporting it. If you got such a BIOS from AMD, lucky you!
>
> I'm not sure I understand what BIOS limitations you mean though. I see
> on both an AMD and Intel host (CXL 2.0-capable) the same behavior that
> I'd expect of EFI_RESERVED_TYPE getting set aside so the OS doesn't
> mess with it. This is on CRB-level stuff plus production-level
> platforms.
From your previous emails, the system seems to detect the memory as
Soft Reserved ... which implies DAX can use it.
I'm not sure whether you confirmed the flag the kernel is using from the
HMAT table, but it is worth double-checking if not.
>>
>> For your case I thought this work
>> https://lore.kernel.org/linux-cxl/20251120031925.87762-1-Smita.KoralahalliChannabasappa@amd.com/T/#me2bc0d25a2129993e68df444aae073addf886751
>>
>> was solving your problem but after looking at it now, I think that
>> will
>> only be useful for Type3 and the hotplug case. Maybe it is time to
>> add
>> Type2 handling there. I'll study that patchset with more detail and
>> comment for solving your case.
> I just looked through that, and I might be able to cherry-pick some
> stuff. I'll do the same offline and see if I can come up with a
> workable solution to get past this wall for now.
>
> That said though, I don't really want or care about DAX. I can already
> find and map the underlying CXL.mem accelerated region through other
> means (RCRB, DVSEC, etc.).
>
> What I'm trying to do is get the regionX object to instantiate on my
> CXL.mem memory block, so that I can remove the region, ultimately
> tearing down the decoders, and allowing me to hotplug the device. The
> patches here seem to still assume a Type3-ish device where there's DPA
> needing to get mapped into HPA, which our devices are already allocated
> in the decoders due to the EFI_RESERVED_TYPE enumeration. But the
> patches aren't seeing that firmware already set them up, since the
> decoders haven't been committed yet.
If you mean the HDM decoders are configured by the BIOS and the CXL Host
Bridge also has the right configuration for redirecting to the CXL Root
Port your device is attached to, the (AMD) BIOS is doing so without
EFI_RESERVED_TYPE as well. So apart from that potential conflict with
DAX/HMEM, which I'm not sure is happening, you could be facing the
problem of the current patches not supporting a Type2 device with already
committed HDM decoders, but you are saying yours are not committed ...
Anyway, Benjamin Cheatham pointed out this other problem, which I was also
aware of from my testing, but as I said when he brought it up, I would
prefer to support that as follow-up work, since the client behind this
initial (and basic) Type2 support, the sfc driver, does not require it.
>
> My root decoder has 1GB of space, which is the size of my endpoint
> device's memory size (1GB). There is no DPA to map, and the HPA
> already appears "full" since the device is already configured in the
> decoder.
This makes me think it is weird that your device's HDM is not committed.
BTW, is the Root decoder CFMWS size the same on Intel and AMD systems? I
bet it is not, from discussing this with Dan and co., but I am curious to
know in your case.
>
> TL;DR: if your device you're testing with presents the CXL.mem region
> as EFI_RESERVED_TYPE, I don't see how these patches are working.
> Unless you're doing something extra outside of the patches, which isn't
> obvious to me.
Yes, sorry, that is the case. I'm applying some dirty changes to these
patches for testing with my current testing devices, including the BIOS
and the Host.
>>
>> FWIW, last year in Vienna I raised the concern of the kernel doing
>> exactly what you are witnessing, and I proposed having a way for
>> taking
>> the device/memory from DAX but I was told unanimously that was not
>> necessary and if the BIOS did the wrong thing, not fixing that in the
>> kernel. In hindsight I would say this conflict was not well
>> understood
>> then (me included) with all the details, so maybe it is time to have
>> this capacity, maybe from user space or maybe specific kernel param
>> triggering the device passing from DAX.
> I do recall this. Unfortunately I brought up similar concerns way back
> in Dublin in 2021 regarding all of this flow well before 2.0-capable
> hosts arrived. I think I started asking the questions way too early,
> since this was of little to no concern at the time (nor was Type2
> device support).
Maybe we can make the case now. I'll seize LPC to discuss this further.
Will you be there?
Thank you
>
> Cheers,
> -PJ
* Re: [PATCH v21 00/23] Type2 device basic support
2025-11-19 19:22 [PATCH v21 00/23] Type2 device basic support alejandro.lucero-palau
` (23 preceding siblings ...)
2025-11-21 6:41 ` [PATCH v21 00/23] Type2 device basic support PJ Waskiewicz
@ 2025-11-28 19:44 ` PJ Waskiewicz
2025-11-28 20:29 ` Alejandro Lucero Palau
24 siblings, 1 reply; 51+ messages in thread
From: PJ Waskiewicz @ 2025-11-28 19:44 UTC (permalink / raw)
To: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero
Hi Alejandro,
On Wed, 2025-11-19 at 19:22 +0000, alejandro.lucero-palau@amd.com
wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> The patchset should be applied on the described base commit then
> applying
> Terry's v13 about CXL error handling. The first 4 patches come from
> Dan's
> for-6.18/cxl-probe-order branch with minor modifications.
>
> v21 changes;
>
> patch1-2: v20 patch1 splitted up doing the code move in the second
> patch in v21. (Jonathan)
>
> patch1-4: adding my Signed-off tag along with Dan's
>
> patch5: fix duplication of CXL_NR_PARTITION definition
>
> patch7: dropped the cxl test fixes removing unused function. It was
> sent independently ahead of this version.
>
> patch12: optimization for max free space calculation (Jonathan)
>
> patch19: optimization for returning on error (Jonathan)
>
So I'm unable to get these patches working with a Type2 device that
just needs its existing resources auto-discovered by the CXL core.
These patches are assuming the underlying device will require full
setup and allocations for DPA and HPA. I'd argue that a true Type2
device will not be doing that today with existing BIOS implementations.
I've tested this behavior on both Intel and AMD platforms (GNR and
Turin), and they're behaving the same way. Both will train up the
Type2 device, see there's an advertised CXL.mem region marked EFI
Special Purpose memory, and will map it and program the decoders.
These patches partially see that those decoders are already programmed,
but do not act on that fact, and still attempt the whole flow of
dynamically allocating, configuring, and committing. This assumption
fails the init path.
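
A minimal sketch of the decision described above, as a userspace model
rather than kernel code. The struct, the enum, and choose_setup_path() are
all hypothetical names introduced for illustration; the real check would
live wherever the endpoint decoder is initialized.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of an endpoint HDM decoder. */
struct decoder_state {
	bool programmed_by_fw;	/* firmware already set up this decoder */
	uint64_t size;		/* size of the range it decodes */
};

enum setup_path {
	SETUP_REUSE_FIRMWARE,	/* adopt the existing mapping as-is */
	SETUP_ALLOCATE,		/* request DPA, find HPA, program, commit */
	SETUP_INVALID,		/* existing mapping cannot satisfy the driver */
};

/* Pick a setup path for a driver that needs 'wanted' bytes of CXL.mem. */
static enum setup_path choose_setup_path(const struct decoder_state *d,
					 uint64_t wanted)
{
	if (d->programmed_by_fw)
		return d->size >= wanted ? SETUP_REUSE_FIRMWARE : SETUP_INVALID;
	return SETUP_ALLOCATE;
}
```

The point of the sketch is only that the allocate/configure/commit flow is
one branch, not the unconditional path.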
I think there needs to be a bit of a re-think here. I briefly chatted
with Dan offline about this, and we do think a different approach is
likely needed. The current CXL core for Type3 devices can handle when
the BIOS/platform firmware already discovers and maps resources, so we
should be able to do that for this case.
If you're going to be at Plumbers in a week or so, this would be a
great topic if we could grab a whiteboard somewhere and just hack on
it. Otherwise we can also chat on the Discord (I just joined finally).
Cheers,
-PJ
* Re: [PATCH v21 00/23] Type2 device basic support
2025-11-28 19:44 ` PJ Waskiewicz
@ 2025-11-28 20:29 ` Alejandro Lucero Palau
2025-11-29 16:26 ` Alejandro Lucero Palau
0 siblings, 1 reply; 51+ messages in thread
From: Alejandro Lucero Palau @ 2025-11-28 20:29 UTC (permalink / raw)
To: PJ Waskiewicz, alejandro.lucero-palau, linux-cxl, netdev,
dan.j.williams, edward.cree, davem, kuba, pabeni, edumazet,
dave.jiang
On 11/28/25 19:44, PJ Waskiewicz wrote:
> Hi Alejandro,
>
> On Wed, 2025-11-19 at 19:22 +0000, alejandro.lucero-palau@amd.com
> wrote:
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> The patchset should be applied on the described base commit then
>> applying
>> Terry's v13 about CXL error handling. The first 4 patches come from
>> Dan's
>> for-6.18/cxl-probe-order branch with minor modifications.
>>
>> v21 changes;
>>
>> patch1-2: v20 patch1 splitted up doing the code move in the second
>> patch in v21. (Jonathan)
>>
>> patch1-4: adding my Signed-off tag along with Dan's
>>
>> patch5: fix duplication of CXL_NR_PARTITION definition
>>
>> patch7: dropped the cxl test fixes removing unused function. It was
>> sent independently ahead of this version.
>>
>> patch12: optimization for max free space calculation (Jonathan)
>>
>> patch19: optimization for returning on error (Jonathan)
>>
> So I'm unable to get these patches working with a Type2 device that
> just needs its existing resources auto-discovered by the CXL core.
> These patches are assuming the underlying device will require full
> setup and allocations for DPA and HPA. I'd argue that a true Type2
> device will not be doing that today with existing BIOS implementations.
Well, I'd argue this patchset is what the sfc driver needs, which is the
client for this "initial Type2 basic support".
> I've tested this behavior on both Intel and AMD platforms (GNR and
> Turin), and they're behaving the same way. Both will train up the
> Type2 device, see there's an advertised CXL.mem region marked EFI
> Special Purpose memory, and will map it and program the decoders.
> These patches partially see those decoders are already programmed, but
> does not bypass that fact, and still attempts to dynamically allocate,
> configure, and commit, the whole flow. This assumption fails the init
> path.
Fair enough. We knew about this and, as I said, it is something I would
prefer to do as follow-up work; otherwise this patchset will be delayed,
likely until a new requirement is found, like the problem with the DVSEC
BAR already being mapped, and then it will wait for the next thing not
covered in this "initial Type2 basic support".
>
> I think there needs to be a bit of a re-think here. I briefly chatted
> with Dan offline about this, and we do think a different approach is
> likely needed. The current CXL core for Type3 devices can handle when
> the BIOS/platform firmware already discovers and maps resources, so we
> should be able to do that for this case.
I'm sad to hear that ... I'm getting internal pressure for getting this
Type2 done and I realize now it will require "a different approach" for
being accepted.
Being honest, this is quite demoralizing. Maybe I'm not the right person
to get this through.
> If you're going to be at Plumbers in a week or so, this would be a
> great topic if we could grab a whiteboard somewhere and just hack on
> it. Otherwise we can also chat on the Discord (I just joined finally)
>
> Cheers,
> -PJ
* Re: [PATCH v21 00/23] Type2 device basic support
2025-11-28 20:29 ` Alejandro Lucero Palau
@ 2025-11-29 16:26 ` Alejandro Lucero Palau
0 siblings, 0 replies; 51+ messages in thread
From: Alejandro Lucero Palau @ 2025-11-29 16:26 UTC (permalink / raw)
To: PJ Waskiewicz, alejandro.lucero-palau, linux-cxl, netdev,
dan.j.williams, edward.cree, davem, kuba, pabeni, edumazet,
dave.jiang
On 11/28/25 20:29, Alejandro Lucero Palau wrote:
>
> On 11/28/25 19:44, PJ Waskiewicz wrote:
>> Hi Alejandro,
>>
>> On Wed, 2025-11-19 at 19:22 +0000, alejandro.lucero-palau@amd.com
>> wrote:
>>> From: Alejandro Lucero <alucerop@amd.com>
>>>
>>> The patchset should be applied on the described base commit then
>>> applying
>>> Terry's v13 about CXL error handling. The first 4 patches come from
>>> Dan's
>>> for-6.18/cxl-probe-order branch with minor modifications.
>>>
>>> v21 changes;
>>>
>>> patch1-2: v20 patch1 splitted up doing the code move in the second
>>> patch in v21. (Jonathan)
>>> patch1-4: adding my Signed-off tag along with Dan's
>>>
>>> patch5: fix duplication of CXL_NR_PARTITION definition
>>>
>>> patch7: dropped the cxl test fixes removing unused function. It was
>>> sent independently ahead of this version.
>>>
>>> patch12: optimization for max free space calculation (Jonathan)
>>>
>>> patch19: optimization for returning on error (Jonathan)
>>>
>> So I'm unable to get these patches working with a Type2 device that
>> just needs its existing resources auto-discovered by the CXL core.
>> These patches are assuming the underlying device will require full
>> setup and allocations for DPA and HPA. I'd argue that a true Type2
>> device will not be doing that today with existing BIOS implementations.
>
>
> Well, I'd argue this patchset is what the sfc driver needs, which is
> the client for this "initial Type2 basic support".
>
>
>> I've tested this behavior on both Intel and AMD platforms (GNR and
>> Turin), and they're behaving the same way. Both will train up the
>> Type2 device, see there's an advertised CXL.mem region marked EFI
>> Special Purpose memory, and will map it and program the decoders.
>> These patches partially see those decoders are already programmed, but
>> does not bypass that fact, and still attempts to dynamically allocate,
>> configure, and commit, the whole flow. This assumption fails the init
>> path.
>
>
> Fair enough. We knew about this and as I said, something I would
> prefer to do as a follow up work or this patchset will be delayed,
> likely until a new requirement is found out like the problem about
> DVSEC BAR already being mapped, then waiting for the next thing not
> covered in this "initial Type2 basic support".
>
>
>>
>> I think there needs to be a bit of a re-think here. I briefly chatted
>> with Dan offline about this, and we do think a different approach is
>> likely needed. The current CXL core for Type3 devices can handle when
>> the BIOS/platform firmware already discovers and maps resources, so we
>> should be able to do that for this case.
>
>
> I'm sad to hear that ... I'm getting internal pressure for getting
> this Type2 done and I realize now it will require "a different
> approach" for being accepted.
>
>
> Being honest, this is quite demoralizing. Maybe I'm not the right
> person to get this through.
>
>
Feeling more positive today ...
Looking at my hack for solving this problem, which I ran into (as I
explained) when testing with a BIOS that does not yet support
EFI_RESERVED_TYPE and the EFI_ADAPTER_INFO_PROTOCOL protocol, I think
it is possible to include the changes with some minor adjustments and
without too much code. It relies on the same code as Type3 when
initializing the endpoint decoder, so the region will be created
automatically at that point, although through the device creation +
probe for the port and the region. So the way for the Type2 driver to
get the HPA to work with (ioremap) needs to be based on other means,
since the region might not be there yet after the call that creates the
memdev.
It brings up other things to discuss about what the Type2 driver should
do on exit, since the endpoint HDM and the CXL Host Bridge HDM should not
be modified when doing the unwinding. So the sooner we can see what
could be done, the better for starting that discussion.
I will send v22 including this functionality early next week. Benjamin
Cheatham solved this problem with a different approach which, IMO, is
more complex, mainly because the region creation when the endpoint
decoder is initialised is precluded by a check, which interestingly Ben
proposed in earlier patchset versions ...
>
>> If you're going to be at Plumbers in a week or so, this would be a
>> great topic if we could grab a whiteboard somewhere and just hack on
>> it. Otherwise we can also chat on the Discord (I just joined finally)
>>
>> Cheers,
>> -PJ
* Re: [PATCH v21 01/23] cxl/mem: refactor memdev allocation
2025-11-19 19:22 ` [PATCH v21 01/23] cxl/mem: refactor memdev allocation alejandro.lucero-palau
2025-11-20 18:08 ` Jonathan Cameron
2025-11-20 20:27 ` Koralahalli Channabasappa, Smita
@ 2025-12-02 2:52 ` dan.j.williams
2025-12-02 4:58 ` dan.j.williams
2025-12-02 8:47 ` Alejandro Lucero Palau
2 siblings, 2 replies; 51+ messages in thread
From: dan.j.williams @ 2025-12-02 2:52 UTC (permalink / raw)
To: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero
alejandro.lucero-palau@ wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> In preparation for always-synchronous memdev attach, refactor memdev
> allocation and fix release bug in devm_cxl_add_memdev() when error after
> a successful allocation.
Never do "refactor and fix". Always do "fix" then "refactor" separately.
In this case though I wonder what release bug you are referring to?
If cxl_memdev_alloc() fails, nothing to free.
If dev_set_name() fails, it puts the device which calls
cxl_memdev_release() which undoes cxl_memdev_alloc(). (Now, that weird
and busted devm_cxl_memdev_edac_release() somehow snuck into
cxl_memdev_release() when I was not looking. I will fix that separately,
but no leak there that I can see.)
If cdev_device_add() fails we need to shut down the ioctl path, but
otherwise put_device() cleans everything up.
If the devm_add_action_or_reset() fails the device needs to be both
unregistered and final put. It does not use device_unregister() because
the cdev also needs to be deleted. So cdev_device_del() handles the
device_del() and the caller is responsible for the final put_device().
What bug are you referring to?
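
The walk-through above can be modeled in userspace as follows. This is only
a sketch of the refcount argument: struct fake_dev, fake_put() and
fake_add() are hypothetical stand-ins, not the memdev code, and the ioctl
shutdown and cdev deletion steps are reduced to comments.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted device object. */
struct fake_dev {
	int refs;
	int *released;	/* set to 1 when the release step runs */
};

enum fail_step { FAIL_NONE, FAIL_SET_NAME, FAIL_ADD, FAIL_DEVM_ACTION };

/* Drop one reference; the last put runs the release step. */
static void fake_put(struct fake_dev *d)
{
	if (--d->refs == 0) {
		*d->released = 1;
		free(d);
	}
}

/* Create the object, failing at the requested step to exercise unwinding. */
static int fake_add(enum fail_step fail, int *released, struct fake_dev **out)
{
	struct fake_dev *d = calloc(1, sizeof(*d));

	if (!d)
		return -ENOMEM;	/* allocation failed: nothing to free */
	d->refs = 1;
	d->released = released;

	if (fail == FAIL_SET_NAME)
		goto err_put;	/* the put undoes the allocation */
	if (fail == FAIL_ADD)
		goto err_put;	/* (shut down ioctls first), then put */
	if (fail == FAIL_DEVM_ACTION)
		goto err_put;	/* (delete the device first), then final put */

	*out = d;
	return 0;

err_put:
	fake_put(d);
	return -EINVAL;
}
```

In this model every failure after a successful allocation ends in exactly
one final put, which runs the release step once, so nothing is leaked.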
> The diff is busy as this moves cxl_memdev_alloc() down below the definition
> of cxl_memdev_fops and introduces devm_cxl_memdev_add_or_reset() to
> preclude needing to export more symbols from the cxl_core.
Will need to read the code to figure out what this patch is trying to do
because this changelog is not orienting me to the problem that is being
solved.
> Fixes: 1c3333a28d45 ("cxl/mem: Do not rely on device_add() side effects for dev_set_name() failures")
Maybe this Fixes: tag is wrong and this is instead a bug introduced by
my probe order RFC? At least Jonathan pinged me about a bug there that I
will go look at next.
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Why does this have my Sign-off?
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> ---
> drivers/cxl/core/memdev.c | 134 +++++++++++++++++++++-----------------
> drivers/cxl/private.h | 10 +++
> 2 files changed, 86 insertions(+), 58 deletions(-)
> create mode 100644 drivers/cxl/private.h
>
> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index e370d733e440..8de19807ac7b 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c
> @@ -8,6 +8,7 @@
> #include <linux/idr.h>
> #include <linux/pci.h>
> #include <cxlmem.h>
> +#include "private.h"
> #include "trace.h"
> #include "core.h"
>
> @@ -648,42 +649,25 @@ static void detach_memdev(struct work_struct *work)
>
> static struct lock_class_key cxl_memdev_key;
>
> -static struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds,
> - const struct file_operations *fops)
> +int devm_cxl_memdev_add_or_reset(struct device *host, struct cxl_memdev *cxlmd)
Can you say more why Type-2 drivers need an "_or_reset()" export? If a
Type-2 driver is calling devm_cxl_add_memdev() from its ->probe()
routine, then just return on failure. Confused.
* Re: [PATCH v21 01/23] cxl/mem: refactor memdev allocation
2025-12-02 2:52 ` dan.j.williams
@ 2025-12-02 4:58 ` dan.j.williams
2025-12-02 8:47 ` Alejandro Lucero Palau
1 sibling, 0 replies; 51+ messages in thread
From: dan.j.williams @ 2025-12-02 4:58 UTC (permalink / raw)
To: dan.j.williams, alejandro.lucero-palau, linux-cxl, netdev,
dan.j.williams, edward.cree, davem, kuba, pabeni, edumazet,
dave.jiang
Cc: Alejandro Lucero
dan.j.williams@ wrote:
[..]
> > diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> > index e370d733e440..8de19807ac7b 100644
> > --- a/drivers/cxl/core/memdev.c
> > +++ b/drivers/cxl/core/memdev.c
> > @@ -8,6 +8,7 @@
> > #include <linux/idr.h>
> > #include <linux/pci.h>
> > #include <cxlmem.h>
> > +#include "private.h"
> > #include "trace.h"
> > #include "core.h"
> >
> > @@ -648,42 +649,25 @@ static void detach_memdev(struct work_struct *work)
> >
> > static struct lock_class_key cxl_memdev_key;
> >
> > -static struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds,
> > - const struct file_operations *fops)
> > +int devm_cxl_memdev_add_or_reset(struct device *host, struct cxl_memdev *cxlmd)
>
> Can you say more why Type-2 drivers need an "_or_reset()" export? If a
> Type-2 driver is calling devm_cxl_add_memdev() from its ->probe()
> routine, then just return on failure. Confused.
Oh, is this replacing my "cxl/mem: Arrange for always-synchronous memdev
attach"? I see an _or_reset method in there too. So the description in
my changelog, which did not get carried over to this replacement patch,
is that a Type-2 driver may want to fall back to PCIe-only operation and
not rely on probe failure to clean up the aborted memdev setup.
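
A minimal userspace sketch of that fallback pattern, assuming the attach
result is known when the add call returns. Everything here is hypothetical:
struct memdev, add_memdev_sync() and accel_init() only model the control
flow and are not the cxl_core API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a memdev whose attach state is known on return. */
struct memdev {
	bool attached;
};

enum op_mode { MODE_CXL, MODE_PCIE_ONLY };

/*
 * Stand-in for a synchronous add: by the time this returns, the probe
 * has either attached the memdev or not.
 */
static struct memdev *add_memdev_sync(struct memdev *md, bool topology_ready)
{
	md->attached = topology_ready;
	return md;
}

/* Accelerator init: use CXL if the memdev attached, else fall back. */
static enum op_mode accel_init(struct memdev *md, bool topology_ready)
{
	struct memdev *m = add_memdev_sync(md, topology_ready);

	if (m && m->attached)
		return MODE_CXL;
	return MODE_PCIE_ONLY;	/* keep the device usable without CXL */
}
```

The driver's own probe still succeeds in the fallback case; only the CXL
specific setup is skipped.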
...but really that quick and dirty patch from me was poor quality and
introduced bugs. Here is a much smaller version that achieves the same
result and drops the opportunity for new bugs. I will send this out
after rebasing that whole probe-order branch.
-- >8 --
From c41c535392ed3fcf203d754b8468a5ca91d83438 Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams@intel.com>
Date: Thu, 31 Jul 2025 14:15:05 -0700
Subject: [PATCH] cxl/mem: Arrange for always-synchronous memdev attach
In preparation for CXL accelerator drivers that have a hard dependency on
CXL capability initialization, arrange for the endpoint probe result to be
conveyed to the caller of devm_cxl_add_memdev().
As it stands cxl_pci does not care about the attach state of the cxl_memdev
because all generic memory expansion functionality can be handled by the
cxl_core. For accelerators, that driver needs to know whether to perform
driver specific initialization if CXL is available, or execute a fallback
to PCIe only operation.
By moving devm_cxl_add_memdev() to cxl_mem.ko it removes async module
loading as one reason that a memdev may not be attached upon return from
devm_cxl_add_memdev().
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
drivers/cxl/Kconfig | 2 +-
drivers/cxl/cxlmem.h | 2 ++
drivers/cxl/core/memdev.c | 10 +++++++---
drivers/cxl/mem.c | 17 +++++++++++++++++
4 files changed, 27 insertions(+), 4 deletions(-)
diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig
index 48b7314afdb8..f1361ed6a0d4 100644
--- a/drivers/cxl/Kconfig
+++ b/drivers/cxl/Kconfig
@@ -22,6 +22,7 @@ if CXL_BUS
config CXL_PCI
tristate "PCI manageability"
default CXL_BUS
+ select CXL_MEM
help
The CXL specification defines a "CXL memory device" sub-class in the
PCI "memory controller" base class of devices. Device's identified by
@@ -89,7 +90,6 @@ config CXL_PMEM
config CXL_MEM
tristate "CXL: Memory Expansion"
- depends on CXL_PCI
default CXL_BUS
help
The CXL.mem protocol allows a device to act as a provider of "System
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index c12ab4fc9512..012e68acad34 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -95,6 +95,8 @@ static inline bool is_cxl_endpoint(struct cxl_port *port)
return is_cxl_memdev(port->uport_dev);
}
+struct cxl_memdev *__devm_cxl_add_memdev(struct device *host,
+ struct cxl_dev_state *cxlds);
struct cxl_memdev *devm_cxl_add_memdev(struct device *host,
struct cxl_dev_state *cxlds);
int devm_cxl_sanitize_setup_notifier(struct device *host,
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index 4dff7f44d908..7a4153e1c6a7 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -1050,8 +1050,12 @@ static const struct file_operations cxl_memdev_fops = {
.llseek = noop_llseek,
};
-struct cxl_memdev *devm_cxl_add_memdev(struct device *host,
- struct cxl_dev_state *cxlds)
+/*
+ * Core helper for devm_cxl_add_memdev() that wants to both create a device and
+ * assert to the caller that upon return cxl_mem::probe() has been invoked.
+ */
+struct cxl_memdev *__devm_cxl_add_memdev(struct device *host,
+ struct cxl_dev_state *cxlds)
{
struct cxl_memdev *cxlmd;
struct device *dev;
@@ -1093,7 +1097,7 @@ struct cxl_memdev *devm_cxl_add_memdev(struct device *host,
put_device(dev);
return ERR_PTR(rc);
}
-EXPORT_SYMBOL_NS_GPL(devm_cxl_add_memdev, "CXL");
+EXPORT_SYMBOL_FOR_MODULES(__devm_cxl_add_memdev, "cxl_mem");
static void sanitize_teardown_notifier(void *data)
{
diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
index 6e6777b7bafb..55883797ab2d 100644
--- a/drivers/cxl/mem.c
+++ b/drivers/cxl/mem.c
@@ -201,6 +201,22 @@ static int cxl_mem_probe(struct device *dev)
return devm_add_action_or_reset(dev, enable_suspend, NULL);
}
+/**
+ * devm_cxl_add_memdev - Add a CXL memory device
+ * @host: devres alloc/release context and parent for the memdev
+ * @cxlds: CXL device state to associate with the memdev
+ *
+ * Upon return the device will have had a chance to attach to the
+ * cxl_mem driver, but may fail if the CXL topology is not ready
+ * (hardware CXL link down, or software platform CXL root not attached)
+ */
+struct cxl_memdev *devm_cxl_add_memdev(struct device *host,
+ struct cxl_dev_state *cxlds)
+{
+ return __devm_cxl_add_memdev(host, cxlds);
+}
+EXPORT_SYMBOL_NS_GPL(devm_cxl_add_memdev, "CXL");
+
static ssize_t trigger_poison_list_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t len)
@@ -248,6 +264,7 @@ static struct cxl_driver cxl_mem_driver = {
.probe = cxl_mem_probe,
.id = CXL_DEVICE_MEMORY_EXPANDER,
.drv = {
+ .probe_type = PROBE_FORCE_SYNCHRONOUS,
.dev_groups = cxl_mem_groups,
},
};
--
2.51.1
* Re: [PATCH v21 02/23] cxl/mem: Arrange for always-synchronous memdev attach
2025-11-19 19:22 ` [PATCH v21 02/23] cxl/mem: Arrange for always-synchronous memdev attach alejandro.lucero-palau
@ 2025-12-02 5:03 ` dan.j.williams
0 siblings, 0 replies; 51+ messages in thread
From: dan.j.williams @ 2025-12-02 5:03 UTC (permalink / raw)
To: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero
alejandro.lucero-palau@ wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> In preparation for CXL accelerator drivers that have a hard dependency on
> CXL capability initialization, arrange for the endpoint probe result to be
> conveyed to the caller of devm_cxl_add_memdev().
>
> As it stands cxl_pci does not care about the attach state of the cxl_memdev
> because all generic memory expansion functionality can be handled by the
> cxl_core. For accelerators, the driver needs to know whether to perform
> driver-specific initialization if CXL is available, or execute a fallback
> to PCIe-only operation.
>
> By moving devm_cxl_add_memdev() to cxl_mem.ko it removes async module
> loading as one reason that a memdev may not be attached upon return from
> devm_cxl_add_memdev().
>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
No, there is no such thing as non-author sign-offs without
Co-developed-by. If you take authorship and take most of the changelog
text verbatim then either be clear about what additional changes you
made to take authorship, or leave the original authorship intact.
For this patch we can just drop it because the simpler proposal in my
reply to patch 1 seems a better way to go.
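Editorial illustration of the changelog quoted above: a minimal userspace sketch, not kernel code, of an accelerator probe that picks CXL or PCIe-only operation from the memdev attach result. All names (mock_cxl_add_memdev, mock_accel_probe, enum accel_mode) are hypothetical, and the model returns an errno where the real devm_cxl_add_memdev() returns a pointer or ERR_PTR(). The two failure causes are the ones named in the kernel-doc of the patch.

```c
#include <errno.h>
#include <stdbool.h>

/* Operating mode an accelerator driver ends up in after probe. */
enum accel_mode {
	ACCEL_MODE_PCIE_ONLY,
	ACCEL_MODE_CXL,
};

/*
 * Hypothetical stand-in for devm_cxl_add_memdev(). Attach fails when the
 * CXL link is down or the platform CXL root is not attached.
 */
int mock_cxl_add_memdev(bool link_up, bool root_attached)
{
	if (!link_up || !root_attached)
		return -ENXIO;
	return 0;
}

/* Hypothetical accelerator probe: use CXL if attach worked, else fall back. */
enum accel_mode mock_accel_probe(bool link_up, bool root_attached)
{
	int rc = mock_cxl_add_memdev(link_up, root_attached);

	if (rc)
		return ACCEL_MODE_PCIE_ONLY;	/* CXL unavailable: PCIe-only operation */
	return ACCEL_MODE_CXL;			/* driver-specific CXL initialization */
}
```

The point of the synchronous attach is that the result is already final when the add call returns, so the probe can branch on it directly instead of waiting for a later asynchronous event.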
* Re: [PATCH v21 03/23] cxl/port: Arrange for always synchronous endpoint attach
2025-11-19 19:22 ` [PATCH v21 03/23] cxl/port: Arrange for always synchronous endpoint attach alejandro.lucero-palau
@ 2025-12-02 5:08 ` dan.j.williams
0 siblings, 0 replies; 51+ messages in thread
From: dan.j.williams @ 2025-12-02 5:08 UTC (permalink / raw)
To: alejandro.lucero-palau, linux-cxl, netdev, dan.j.williams,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero
alejandro.lucero-palau@ wrote:
> From: Dan Williams <dan.j.williams@intel.com>
>
> Make it so that upon return from devm_cxl_add_endpoint() that
> cxl_mem_probe() can assume that the endpoint has had a chance to complete
> cxl_port_probe().
>
> I.e. cxl_port module loading has completed prior to device registration.
>
> MODULE_SOFTDEP() is not sufficient for this purpose, but a hard link-time
> dependency is reliable.
>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
I am mostly keeping this patch the same, but removing private.h and
using EXPORT_SYMBOL_FOR_MODULES() to limit the visibility of the export.
* Re: [PATCH v21 01/23] cxl/mem: refactor memdev allocation
2025-12-02 2:52 ` dan.j.williams
2025-12-02 4:58 ` dan.j.williams
@ 2025-12-02 8:47 ` Alejandro Lucero Palau
1 sibling, 0 replies; 51+ messages in thread
From: Alejandro Lucero Palau @ 2025-12-02 8:47 UTC (permalink / raw)
To: dan.j.williams, alejandro.lucero-palau, linux-cxl, netdev,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
On 12/2/25 02:52, dan.j.williams@intel.com wrote:
> alejandro.lucero-palau@ wrote:
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> In preparation for always-synchronous memdev attach, refactor memdev
>> allocation and fix release bug in devm_cxl_add_memdev() when error after
>> a successful allocation.
> Never do "refactor and fix". Always do "fix" then "refactor" separately.
Ok.
> In this case though I wonder what release bug you are referring to?
>
> If cxl_memdev_alloc() fails, nothing to free.
>
> If dev_set_name() fails, it puts the device which calls
> cxl_memdev_release() which undoes cxl_memdev_alloc(). (Now, that weird
> and busted devm_cxl_memdev_edac_release() somehow snuck into
> cxl_memdev_release() when I was not looking. I will fix that separately,
> but no leak there that I can see.)
>
> If cdev_device_add() fails we need to shutdown the ioctl path, but
> otherwise put_device() cleans everything up.
>
> If the devm_add_action_or_reset() fails the device needs to be both
> unregistered and final put. It does not use device_unregister() because
> the cdev also needs to be deleted. So cdev_device_del() handles the
> device_del() and the caller is responsible for the final put_device().
>
> What bug are you referring to?
You are right. I was missing the release from cxl_memdev_type linked to
put_device.
I guess I got confused with devm and __free approaches ...
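Editorial illustration of the unwind paths Dan walks through above: a minimal userspace model, not the kernel implementation, in which a reference count stands in for struct device and a counter tracks unreleased allocations. Every name here (mock_memdev, mock_put, mock_unregister, mock_add_memdev, live_memdevs, enum fail_point) is hypothetical, and the errno values are arbitrary choices for the sketch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the refcounted memdev device. */
struct mock_memdev {
	int refcount;		/* models the struct device reference count */
	bool registered;	/* models a successful cdev_device_add() */
	bool ioctl_live;	/* models the ioctl path still being reachable */
};

/* Allocations not yet released; non-zero after an error would be a leak. */
int live_memdevs;

/* Step at which the model is asked to fail. */
enum fail_point {
	FAIL_NONE,
	FAIL_ALLOC,
	FAIL_SET_NAME,
	FAIL_CDEV_ADD,
	FAIL_DEVM_ACTION,
};

/* Models put_device(): the release callback runs on the last reference. */
void mock_put(struct mock_memdev *md)
{
	if (--md->refcount == 0) {
		live_memdevs--;
		free(md);
	}
}

/* Models the unregister action: stop ioctls, delete the cdev, final put. */
void mock_unregister(struct mock_memdev *md)
{
	md->ioctl_live = false;
	md->registered = false;
	mock_put(md);
}

int mock_add_memdev(enum fail_point fail, struct mock_memdev **out)
{
	struct mock_memdev *md;

	*out = NULL;

	/* Allocation failure: nothing exists yet, so nothing to free. */
	md = (fail == FAIL_ALLOC) ? NULL : calloc(1, sizeof(*md));
	if (!md)
		return -ENOMEM;
	md->refcount = 1;
	md->ioctl_live = true;
	live_memdevs++;

	/* Naming failure: dropping the only reference undoes the allocation. */
	if (fail == FAIL_SET_NAME) {
		mock_put(md);
		return -ENOMEM;
	}

	/* Registration failure: shut down the ioctl path, then drop the reference. */
	if (fail == FAIL_CDEV_ADD) {
		md->ioctl_live = false;
		mock_put(md);
		return -EBUSY;
	}
	md->registered = true;

	/* devres failure: the "_or_reset" part runs the unregister action now. */
	if (fail == FAIL_DEVM_ACTION) {
		mock_unregister(md);
		return -ENOMEM;
	}

	*out = md;
	return 0;
}
```

In this model every failure point leaves live_memdevs at zero, which matches the conclusion in the exchange above that the existing error paths do not leak. On success the caller owns one reference until mock_unregister() runs.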
>
>> The diff is busy as this moves cxl_memdev_alloc() down below the definition
>> of cxl_memdev_fops and introduces devm_cxl_memdev_add_or_reset() to
>> preclude needing to export more symbols from the cxl_core.
> Will need to read the code to figure out what this patch is trying to do
> because this changelog is not orienting me to the problem that is being
> solved.
>
>> Fixes: 1c3333a28d45 ("cxl/mem: Do not rely on device_add() side effects for dev_set_name() failures")
> Maybe this Fixes: tag is wrong and this is instead a bug introduced by
> my probe order RFC? At least Jonathan pinged me about a bug there that I
> will go look at next.
This Fixes: tag is wrong due to what you pointed out above.
Not sure what you/Jonathan are referring to here. PJ found a problem
with cyclic module dependencies with the changes introduced by these
first two patches.
It can be solved by changing the CXL_BUS config from tristate to bool,
which PJ tried successfully. I was expecting some comments before adding
it to the next patchset version ...
>
>> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> Why does this have my Sign-off?
It was your original patch.
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>> ---
>> drivers/cxl/core/memdev.c | 134 +++++++++++++++++++++-----------------
>> drivers/cxl/private.h | 10 +++
>> 2 files changed, 86 insertions(+), 58 deletions(-)
>> create mode 100644 drivers/cxl/private.h
>>
>> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
>> index e370d733e440..8de19807ac7b 100644
>> --- a/drivers/cxl/core/memdev.c
>> +++ b/drivers/cxl/core/memdev.c
>> @@ -8,6 +8,7 @@
>> #include <linux/idr.h>
>> #include <linux/pci.h>
>> #include <cxlmem.h>
>> +#include "private.h"
>> #include "trace.h"
>> #include "core.h"
>>
>> @@ -648,42 +649,25 @@ static void detach_memdev(struct work_struct *work)
>>
>> static struct lock_class_key cxl_memdev_key;
>>
>> -static struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds,
>> - const struct file_operations *fops)
>> +int devm_cxl_memdev_add_or_reset(struct device *host, struct cxl_memdev *cxlmd)
> Can you say more why Type-2 drivers need an "_or_reset()" export? If a
> Type-2 driver is calling devm_cxl_add_memdev() from its ->probe()
> routine, then just return on failure. Confused.
Well, maybe it is you who should answer that question. It comes from
something you suggested I should use for solving problems with Type2 and
potential module removal. I added those patches for the first time two
months ago and now you are finally commenting on them.
This is the short story: my comments suggesting how I think we should
deal with that problem were ignored, then you suddenly chimed in and
offered your way of solving it, pointing to your branch. I used and
tested it, which indeed fixed those potential removals ... I worked on
the patches to solve some minor issues, then Jonathan suggested
refactoring the first patch. I thought I had found a problem with the
allocation ... I tried to solve it ... I kept the original commit as you
were the one proposing it and you are a native English speaker ... you
realized in the next patch review that those are indeed your work on
solving the problem ... then you propose another patch ...
I really hope you review all this in the impending v22, where I will
present a solution for the Type2 initialization when HDM is committed by
firmware.
* Re: [PATCH v21 15/23] sfc: get endpoint decoder
2025-11-27 9:08 ` Alejandro Lucero Palau
@ 2025-12-02 8:49 ` PJ Waskiewicz
2025-12-02 9:09 ` Alejandro Lucero Palau
0 siblings, 1 reply; 51+ messages in thread
From: PJ Waskiewicz @ 2025-12-02 8:49 UTC (permalink / raw)
To: Alejandro Lucero Palau, alejandro.lucero-palau, linux-cxl, netdev,
dan.j.williams, edward.cree, davem, kuba, pabeni, edumazet,
dave.jiang
Cc: Martin Habets, Edward Cree, Jonathan Cameron, Ben Cheatham
Hi Alejandro,
On Thu, 2025-11-27 at 09:08 +0000, Alejandro Lucero Palau wrote:
> >
> > TL;DR: if your device you're testing with presents the CXL.mem region
> > as EFI_RESERVED_TYPE, I don't see how these patches are working.
> > Unless you're doing something extra outside of the patches, which
> > isn't obvious to me.
>
>
> Yes, sorry, that is the case. I'm applying some dirty changes to these
> patches for testing with my current testing devices, including the
> BIOS and the Host.
>
Well, this is basically the issue.
You are proposing these patches to support Type 2 devices, and use the
X4 with SFC as the vehicle. But out of the box, following the same
flow, my driver for my (proprietary) device can't match the behavior.
If you're having to make modifications to these patches to work with
your device, even if it's to work around a weird platform or BIOS, then
these patches can't be considered as-is.
I have two main platforms in use for development. One is an Intel
Granite Rapids, one is an AMD Turin. I've got production SKU's and
CRB's, so I have a cross-section of BIOS's. All of them behave the
exact same way with these patches. I do have a BIOS that is doing the
right thing from what I can tell (tracing with a bus analyzer, and also
ILA taps).
CXL Type 2 device support is desperately needed. I'm happy you've been
championing this to get it merged. I'm also very committed to helping
test, modify, etc. So please don't be discouraged.
I'm also one who's dealt with internal pressures from a company to get
something working upstream. But honestly, upstream work doesn't align
with corporate or company calendars. Been there, done that, hasn't
gotten easier. The kernel can't take a patchset that doesn't work at
face value. It's unfortunately as simple as that. So let's figure out
how to get it working out of the box with the patches.
>
> > >
> > > FWIW, last year in Vienna I raised the concern of the kernel doing
> > > exactly what you are witnessing, and I proposed having a way for
> > > taking the device/memory from DAX but I was told unanimously that
> > > was not necessary and if the BIOS did the wrong thing, not fixing
> > > that in the kernel. In hindsight I would say this conflict was not
> > > well understood then (me included) with all the details, so maybe
> > > it is time to have this capacity, maybe from user space or maybe
> > > specific kernel param triggering the device passing from DAX.
> > I do recall this. Unfortunately I brought up similar concerns way
> > back in Dublin in 2021 regarding all of this flow well before
> > 2.0-capable hosts arrived. I think I started asking the questions way
> > too early, since this was of little to no concern at the time (nor
> > was Type2 device support).
>
>
> Maybe we can make the case now. I'll seize LPC to discuss this further.
> Will you be there?
Yep. I'll be there, as will Dan. We definitely need to find some time
and chat.
Cheers,
-PJ
* Re: [PATCH v21 15/23] sfc: get endpoint decoder
2025-12-02 8:49 ` PJ Waskiewicz
@ 2025-12-02 9:09 ` Alejandro Lucero Palau
0 siblings, 0 replies; 51+ messages in thread
From: Alejandro Lucero Palau @ 2025-12-02 9:09 UTC (permalink / raw)
To: PJ Waskiewicz, alejandro.lucero-palau, linux-cxl, netdev,
dan.j.williams, edward.cree, davem, kuba, pabeni, edumazet,
dave.jiang
Cc: Martin Habets, Edward Cree, Jonathan Cameron, Ben Cheatham
On 12/2/25 08:49, PJ Waskiewicz wrote:
> Hi Alejandro,
Hi PJ,
> On Thu, 2025-11-27 at 09:08 +0000, Alejandro Lucero Palau wrote:
>>> TL;DR: if your device you're testing with presents the CXL.mem region
>>> as EFI_RESERVED_TYPE, I don't see how these patches are working.
>>> Unless you're doing something extra outside of the patches, which
>>> isn't obvious to me.
>>
>> Yes, sorry, that is the case. I'm applying some dirty changes to these
>> patches for testing with my current testing devices, including the
>> BIOS and the Host.
>>
> Well, this is basically the issue.
>
> You are proposing these patches to support Type 2 devices, and use the
> X4 with SFC as the vehicle. But out of the box, following the same
> flow, my driver for my (proprietary) device can't match the behavior.
> If you're having to make modifications to these patches to work with
> your device, even if it's to work around a weird platform or BIOS, then
> these patches can't be considered as-is.
I disagree. From day one I was told our device should not be touched by
BIOS. If I have no hardware yet doing so it should not be changing the
specific support I need, something I can easily emulate with QEMU and
what I have been using in parallel to other specific hardware emulations
internally.
>
> I have two main platforms in use for development. One is an Intel
> Granite Rapids, one is an AMD Turin. I've got production SKU's and
> CRB's, so I have a cross-section of BIOS's. All of them behave the
> exact same way with these patches. I do have a BIOS that is doing the
> right thing from what I can tell (tracing with a bus analyzer, and also
> ILA taps).
>
> CXL Type 2 device support is desperately needed. I'm happy you've been
> championing this to get it merged. I'm also very committed to helping
> test, modify, etc. So please don't be discouraged.
Thanks for the kind words.
>
> I'm also one who's dealt with internal pressures from a company to get
> something working upstream. But honestly, upstream work doesn't align
> with corporate or company calendars. Been there, done that, hasn't
> gotten easier. The kernel can't take a patchset that doesn't work at
> face value. It's unfortunately as simple as that. So let's figure out
> how to get it working out of the box with the patches.
Yes, management usually does not understand how kernel upstream effort
happens, but it is not a relief to know that ...
I did work on your problem (and I guess not only yours) these last days
and I'm happy to say I have a solution ready to be shared. It will be in
v22, which I hope to have ready later today, and it is simple and, I
think, clean enough to be accepted without too many adjustments. Of
course, it will depend on how quickly review happens and on the exchanges
that follow about how to do it if it is not liked.
Thank you
>>>> FWIW, last year in Vienna I raised the concern of the kernel doing
>>>> exactly what you are witnessing, and I proposed having a way for
>>>> taking the device/memory from DAX but I was told unanimously that
>>>> was not necessary and if the BIOS did the wrong thing, not fixing
>>>> that in the kernel. In hindsight I would say this conflict was not
>>>> well understood then (me included) with all the details, so maybe
>>>> it is time to have this capacity, maybe from user space or maybe
>>>> specific kernel param triggering the device passing from DAX.
>>> I do recall this. Unfortunately I brought up similar concerns way
>>> back in Dublin in 2021 regarding all of this flow well before
>>> 2.0-capable hosts arrived. I think I started asking the questions way
>>> too early, since this was of little to no concern at the time (nor
>>> was Type2 device support).
>>
>> Maybe we can make the case now. I'll seize LPC to discuss this further.
>> Will you be there?
> Yep. I'll be there, as will Dan. We definitely need to find some time
> and chat.
>
>
> Cheers,
> -PJ
* Re: [PATCH v21 15/23] sfc: get endpoint decoder
2025-11-26 18:35 ` PJ Waskiewicz
2025-11-27 9:08 ` Alejandro Lucero Palau
@ 2025-12-02 16:35 ` Dave Jiang
1 sibling, 0 replies; 51+ messages in thread
From: Dave Jiang @ 2025-12-02 16:35 UTC (permalink / raw)
To: PJ Waskiewicz, Alejandro Lucero Palau, alejandro.lucero-palau,
linux-cxl, netdev, dan.j.williams, edward.cree, davem, kuba,
pabeni, edumazet
Cc: Martin Habets, Edward Cree, Jonathan Cameron, Ben Cheatham
On 11/26/25 11:35 AM, PJ Waskiewicz wrote:
> Hi Alejandro,
>
> On Wed, 2025-11-26 at 09:09 +0000, Alejandro Lucero Palau wrote:
>>
>> On 11/26/25 01:27, PJ Waskiewicz wrote:
>>> Hi Alejandro,
>>>
>>> On Wed, 2025-11-19 at 19:22 +0000, alejandro.lucero-palau@amd.com
>>> wrote:
>>>> From: Alejandro Lucero <alucerop@amd.com>
<snip>
>
> What I'm trying to do is get the regionX object to instantiate on my
> CXL.mem memory block, so that I can remove the region, ultimately
> tearing down the decoders, and allowing me to hotplug the device. The
> patches here seem to still assume a Type3-ish device where there's DPA
> needing to get mapped into HPA, which our devices are already allocated
> in the decoders due to the EFI_RESERVED_TYPE enumeration. But the
> patches aren't seeing that firmware already set them up, since the
> decoders haven't been committed yet.
Can you please clarify what you mean by "the decoders haven't been committed yet"? If the region is set up, then isn't the expectation that the decoders are committed on the device?
Thread overview: 51+ messages
2025-11-19 19:22 [PATCH v21 00/23] Type2 device basic support alejandro.lucero-palau
2025-11-19 19:22 ` [PATCH v21 01/23] cxl/mem: refactor memdev allocation alejandro.lucero-palau
2025-11-20 18:08 ` Jonathan Cameron
2025-11-20 18:27 ` Alejandro Lucero Palau
2025-11-21 12:06 ` Jonathan Cameron
2025-11-21 13:46 ` Alejandro Lucero Palau
2025-11-20 20:27 ` Koralahalli Channabasappa, Smita
2025-11-21 13:41 ` Alejandro Lucero Palau
2025-12-02 2:52 ` dan.j.williams
2025-12-02 4:58 ` dan.j.williams
2025-12-02 8:47 ` Alejandro Lucero Palau
2025-11-19 19:22 ` [PATCH v21 02/23] cxl/mem: Arrange for always-synchronous memdev attach alejandro.lucero-palau
2025-12-02 5:03 ` dan.j.williams
2025-11-19 19:22 ` [PATCH v21 03/23] cxl/port: Arrange for always synchronous endpoint attach alejandro.lucero-palau
2025-12-02 5:08 ` dan.j.williams
2025-11-19 19:22 ` [PATCH v21 04/23] cxl/mem: Introduce a memdev creation ->probe() operation alejandro.lucero-palau
2025-11-19 19:22 ` [PATCH v21 05/23] cxl: Add type2 device basic support alejandro.lucero-palau
2025-11-19 19:22 ` [PATCH v21 06/23] sfc: add cxl support alejandro.lucero-palau
2025-11-19 19:22 ` [PATCH v21 07/23] cxl: Move pci generic code alejandro.lucero-palau
2025-11-19 19:22 ` [PATCH v21 08/23] cxl/sfc: Map cxl component regs alejandro.lucero-palau
2025-11-21 6:54 ` PJ Waskiewicz
2025-11-21 11:01 ` Alejandro Lucero Palau
2025-11-22 1:11 ` PJ Waskiewicz
2025-11-19 19:22 ` [PATCH v21 09/23] cxl/sfc: Initialize dpa without a mailbox alejandro.lucero-palau
2025-11-19 19:22 ` [PATCH v21 10/23] cxl: Prepare memdev creation for type2 alejandro.lucero-palau
2025-11-19 19:22 ` [PATCH v21 11/23] sfc: create type2 cxl memdev alejandro.lucero-palau
2025-11-19 19:22 ` [PATCH v21 12/23] cxl: Define a driver interface for HPA free space enumeration alejandro.lucero-palau
2025-11-19 19:22 ` [PATCH v21 13/23] sfc: get root decoder alejandro.lucero-palau
2025-11-19 19:22 ` [PATCH v21 14/23] cxl: Define a driver interface for DPA allocation alejandro.lucero-palau
2025-11-19 19:22 ` [PATCH v21 15/23] sfc: get endpoint decoder alejandro.lucero-palau
2025-11-26 1:27 ` PJ Waskiewicz
2025-11-26 9:09 ` Alejandro Lucero Palau
2025-11-26 18:35 ` PJ Waskiewicz
2025-11-27 9:08 ` Alejandro Lucero Palau
2025-12-02 8:49 ` PJ Waskiewicz
2025-12-02 9:09 ` Alejandro Lucero Palau
2025-12-02 16:35 ` Dave Jiang
2025-11-19 19:22 ` [PATCH v21 16/23] cxl: Make region type based on endpoint type alejandro.lucero-palau
2025-11-19 19:22 ` [PATCH v21 17/23] cxl/region: Factor out interleave ways setup alejandro.lucero-palau
2025-11-19 19:22 ` [PATCH v21 18/23] cxl/region: Factor out interleave granularity setup alejandro.lucero-palau
2025-11-19 19:22 ` [PATCH v21 19/23] cxl: Allow region creation by type2 drivers alejandro.lucero-palau
2025-11-19 19:22 ` [PATCH v21 20/23] cxl: Avoid dax creation for accelerators alejandro.lucero-palau
2025-11-19 19:22 ` [PATCH v21 21/23] sfc: create cxl region alejandro.lucero-palau
2025-11-19 19:22 ` [PATCH v21 22/23] cxl: Add function for obtaining region range alejandro.lucero-palau
2025-11-19 19:22 ` [PATCH v21 23/23] sfc: support pio mapping based on cxl alejandro.lucero-palau
2025-11-21 6:41 ` [PATCH v21 00/23] Type2 device basic support PJ Waskiewicz
2025-11-21 10:40 ` Alejandro Lucero Palau
2025-11-22 1:08 ` PJ Waskiewicz
2025-11-28 19:44 ` PJ Waskiewicz
2025-11-28 20:29 ` Alejandro Lucero Palau
2025-11-29 16:26 ` Alejandro Lucero Palau