* [PATCH v26 0/8] Type2 device basic support
@ 2026-04-23 18:05 alejandro.lucero-palau
2026-04-23 18:05 ` [PATCH v26 1/8] sfc: add cxl support alejandro.lucero-palau
` (8 more replies)
0 siblings, 9 replies; 20+ messages in thread
From: alejandro.lucero-palau @ 2026-04-23 18:05 UTC (permalink / raw)
To: linux-cxl, djbw, edward.cree, davem, kuba, pabeni, edumazet,
dave.jiang
Cc: Alejandro Lucero
From: Alejandro Lucero <alucerop@amd.com>
I feel this version suits better a RFC based on v25 plus the changes
introduced. It tries to address the problem of user space actions
triggering accelerator region removal, and extending Dan's approach
in here:
https://lore.kernel.org/linux-cxl/20260403210050.1058650-1-dan.j.williams@intel.com/T/#m2d9e5c59ce4873641037f9bcd2f70d5679cda03c
As discussed there, the protection needs to cover the accelerator driver
handling of the region physical range for things like ioremap, and to
offer a way for such driver to undo things or at least stop using that
very same physical range if the region disappears. It ends up merging
several of v25 patches into just one.
Another change is linking the region release to the endpoint device and
keep with Dan's autoremove. This makes unnecessary the previous cxl_exit
and similar actions are now performed inside region detach callback.
I think the outcome is quite promising, lowering the accelerator driver
complexity regarding region handling. The cost is those new callbacks
which I think are totally necessary for doing things safely with the
region by the driver.
I have successfully tested the cases of cxl_acpi removal while sfc
driver remaining, and cxl_mem unbinding sfc cxl memdev.
v26 changes:
- patch 6: it implements all the new functionlity described above.
- patch 8: region detach callback invokes a new sfc function for safely
stopping cxl-backed piobufs use.
Alejandro Lucero (8):
sfc: add cxl support
cxl/sfc: Map cxl regs
cxl/sfc: Initialize dpa without a mailbox
cxl: Prepare memdev creation for type2
sfc: create type2 cxl memdev
cxl: attach region to an accelerator/type2 memdev
cxl: Avoid dax creation for accelerators
sfc: support pio mapping based on cxl
drivers/cxl/core/core.h | 2 +
drivers/cxl/core/mbox.c | 51 +---------
drivers/cxl/core/memdev.c | 81 +++++++++++++++-
drivers/cxl/core/pci.c | 1 +
drivers/cxl/core/port.c | 1 +
drivers/cxl/core/region.c | 125 ++++++++++++++++++++++++-
drivers/cxl/core/regs.c | 1 +
drivers/cxl/cxlmem.h | 6 --
drivers/cxl/cxlpci.h | 12 ---
drivers/cxl/mem.c | 45 ++++++---
drivers/cxl/pci.c | 1 +
drivers/net/ethernet/sfc/Kconfig | 9 ++
drivers/net/ethernet/sfc/Makefile | 1 +
drivers/net/ethernet/sfc/ef10.c | 72 +++++++++++++--
drivers/net/ethernet/sfc/efx.c | 14 ++-
drivers/net/ethernet/sfc/efx.h | 1 +
drivers/net/ethernet/sfc/efx_cxl.c | 128 ++++++++++++++++++++++++++
drivers/net/ethernet/sfc/efx_cxl.h | 31 +++++++
drivers/net/ethernet/sfc/net_driver.h | 12 +++
drivers/net/ethernet/sfc/nic.h | 3 +
include/cxl/cxl.h | 25 +++++
include/cxl/pci.h | 22 +++++
22 files changed, 552 insertions(+), 92 deletions(-)
create mode 100644 drivers/net/ethernet/sfc/efx_cxl.c
create mode 100644 drivers/net/ethernet/sfc/efx_cxl.h
create mode 100644 include/cxl/pci.h
base-commit: 6596a02b207886e9e00bb0161c7fd59fea53c081
--
2.34.1
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH v26 1/8] sfc: add cxl support
2026-04-23 18:05 [PATCH v26 0/8] Type2 device basic support alejandro.lucero-palau
@ 2026-04-23 18:05 ` alejandro.lucero-palau
2026-04-29 21:14 ` Cheatham, Benjamin
2026-04-23 18:05 ` [PATCH v26 2/8] cxl/sfc: Map cxl regs alejandro.lucero-palau
` (7 subsequent siblings)
8 siblings, 1 reply; 20+ messages in thread
From: alejandro.lucero-palau @ 2026-04-23 18:05 UTC (permalink / raw)
To: linux-cxl, djbw, edward.cree, davem, kuba, pabeni, edumazet,
dave.jiang
Cc: Alejandro Lucero, Jonathan Cameron, Edward Cree, Alison Schofield,
Dan Williams
From: Alejandro Lucero <alucerop@amd.com>
Add CXL initialization based on new CXL API for accel drivers and make
it dependent on kernel CXL configuration.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
---
drivers/net/ethernet/sfc/Kconfig | 9 +++++
drivers/net/ethernet/sfc/Makefile | 1 +
drivers/net/ethernet/sfc/efx.c | 14 +++++++-
drivers/net/ethernet/sfc/efx_cxl.c | 52 +++++++++++++++++++++++++++
drivers/net/ethernet/sfc/efx_cxl.h | 29 +++++++++++++++
drivers/net/ethernet/sfc/net_driver.h | 10 ++++++
6 files changed, 114 insertions(+), 1 deletion(-)
create mode 100644 drivers/net/ethernet/sfc/efx_cxl.c
create mode 100644 drivers/net/ethernet/sfc/efx_cxl.h
diff --git a/drivers/net/ethernet/sfc/Kconfig b/drivers/net/ethernet/sfc/Kconfig
index c4c43434f314..979f2801e2a8 100644
--- a/drivers/net/ethernet/sfc/Kconfig
+++ b/drivers/net/ethernet/sfc/Kconfig
@@ -66,6 +66,15 @@ config SFC_MCDI_LOGGING
Driver-Interface) commands and responses, allowing debugging of
driver/firmware interaction. The tracing is actually enabled by
a sysfs file 'mcdi_logging' under the PCI device.
+config SFC_CXL
+ bool "Solarflare SFC9100-family CXL support"
+ depends on SFC && CXL_BUS >= SFC
+ default SFC
+ help
+ This enables SFC CXL support if the kernel is configuring CXL for
+ using CTPIO with CXL.mem. The SFC device with CXL support and
+ with a CXL-aware firmware can be used for minimizing latencies
+ when sending through CTPIO.
source "drivers/net/ethernet/sfc/falcon/Kconfig"
source "drivers/net/ethernet/sfc/siena/Kconfig"
diff --git a/drivers/net/ethernet/sfc/Makefile b/drivers/net/ethernet/sfc/Makefile
index d99039ec468d..bb0f1891cde6 100644
--- a/drivers/net/ethernet/sfc/Makefile
+++ b/drivers/net/ethernet/sfc/Makefile
@@ -13,6 +13,7 @@ sfc-$(CONFIG_SFC_SRIOV) += sriov.o ef10_sriov.o ef100_sriov.o ef100_rep.o \
mae.o tc.o tc_bindings.o tc_counters.o \
tc_encap_actions.o tc_conntrack.o
+sfc-$(CONFIG_SFC_CXL) += efx_cxl.o
obj-$(CONFIG_SFC) += sfc.o
obj-$(CONFIG_SFC_FALCON) += falcon/
diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c
index 8f136a11d396..90ccbe310386 100644
--- a/drivers/net/ethernet/sfc/efx.c
+++ b/drivers/net/ethernet/sfc/efx.c
@@ -34,6 +34,7 @@
#include "selftest.h"
#include "sriov.h"
#include "efx_devlink.h"
+#include "efx_cxl.h"
#include "mcdi_port_common.h"
#include "mcdi_pcol.h"
@@ -981,12 +982,14 @@ static void efx_pci_remove(struct pci_dev *pci_dev)
efx_pci_remove_main(efx);
efx_fini_io(efx);
+
+ probe_data = container_of(efx, struct efx_probe_data, efx);
+
pci_dbg(efx->pci_dev, "shutdown successful\n");
efx_fini_devlink_and_unlock(efx);
efx_fini_struct(efx);
free_netdev(efx->net_dev);
- probe_data = container_of(efx, struct efx_probe_data, efx);
kfree(probe_data);
};
@@ -1190,6 +1193,15 @@ static int efx_pci_probe(struct pci_dev *pci_dev,
if (rc)
goto fail2;
+ /* A successful cxl initialization implies a CXL region created to be
+ * used for PIO buffers. If there is no CXL support, or initialization
+ * fails, cxl_pio_initialised will be false and legacy PIO buffers
+ * defined at specific PCI BAR regions will be used.
+ */
+ rc = efx_cxl_init(probe_data);
+ if (rc)
+ pci_err(pci_dev, "CXL initialization failed with error %d\n", rc);
+
rc = efx_pci_probe_post_io(efx);
if (rc) {
/* On failure, retry once immediately.
diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
new file mode 100644
index 000000000000..b7e8d85a43d3
--- /dev/null
+++ b/drivers/net/ethernet/sfc/efx_cxl.c
@@ -0,0 +1,52 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/****************************************************************************
+ *
+ * Driver for AMD network controllers and boards
+ * Copyright (C) 2025, Advanced Micro Devices, Inc.
+ */
+
+#include <linux/pci.h>
+
+#include "net_driver.h"
+#include "efx_cxl.h"
+
+#define EFX_CTPIO_BUFFER_SIZE SZ_256M
+
+int efx_cxl_init(struct efx_probe_data *probe_data)
+{
+ struct efx_nic *efx = &probe_data->efx;
+ struct pci_dev *pci_dev = efx->pci_dev;
+ struct efx_cxl *cxl;
+ u16 dvsec;
+
+ probe_data->cxl_pio_initialised = false;
+
+ /* Is the device configured with and using CXL? */
+ if (!pcie_is_cxl(pci_dev))
+ return 0;
+
+ dvsec = pci_find_dvsec_capability(pci_dev, PCI_VENDOR_ID_CXL,
+ PCI_DVSEC_CXL_DEVICE);
+ if (!dvsec) {
+ pci_info(pci_dev, "CXL_DVSEC_PCIE_DEVICE capability not found\n");
+ return 0;
+ }
+
+ pci_dbg(pci_dev, "CXL_DVSEC_PCIE_DEVICE capability found\n");
+
+ /* Create a cxl_dev_state embedded in the cxl struct using cxl core api
+ * specifying no mbox available.
+ */
+ cxl = devm_cxl_dev_state_create(&pci_dev->dev, CXL_DEVTYPE_DEVMEM,
+ pci_get_dsn(pci_dev), dvsec,
+ struct efx_cxl, cxlds, false);
+
+ if (!cxl)
+ return -ENOMEM;
+
+ probe_data->cxl = cxl;
+
+ return 0;
+}
+
+MODULE_IMPORT_NS("CXL");
diff --git a/drivers/net/ethernet/sfc/efx_cxl.h b/drivers/net/ethernet/sfc/efx_cxl.h
new file mode 100644
index 000000000000..04e46278464d
--- /dev/null
+++ b/drivers/net/ethernet/sfc/efx_cxl.h
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/****************************************************************************
+ * Driver for AMD network controllers and boards
+ * Copyright (C) 2025, Advanced Micro Devices, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation, incorporated herein by reference.
+ */
+
+#ifndef EFX_CXL_H
+#define EFX_CXL_H
+
+#ifdef CONFIG_SFC_CXL
+
+#include <cxl/cxl.h>
+
+struct efx_probe_data;
+
+struct efx_cxl {
+ struct cxl_dev_state cxlds;
+ struct cxl_memdev *cxlmd;
+};
+
+int efx_cxl_init(struct efx_probe_data *probe_data);
+#else
+static inline int efx_cxl_init(struct efx_probe_data *probe_data) { return 0; }
+#endif
+#endif
diff --git a/drivers/net/ethernet/sfc/net_driver.h b/drivers/net/ethernet/sfc/net_driver.h
index b98c259f672d..3964b2c56609 100644
--- a/drivers/net/ethernet/sfc/net_driver.h
+++ b/drivers/net/ethernet/sfc/net_driver.h
@@ -1197,14 +1197,24 @@ struct efx_nic {
atomic_t n_rx_noskb_drops;
};
+#ifdef CONFIG_SFC_CXL
+struct efx_cxl;
+#endif
+
/**
* struct efx_probe_data - State after hardware probe
* @pci_dev: The PCI device
* @efx: Efx NIC details
+ * @cxl: details of related cxl objects
+ * @cxl_pio_initialised: cxl initialization outcome.
*/
struct efx_probe_data {
struct pci_dev *pci_dev;
struct efx_nic efx;
+#ifdef CONFIG_SFC_CXL
+ struct efx_cxl *cxl;
+ bool cxl_pio_initialised;
+#endif
};
static inline struct efx_nic *efx_netdev_priv(struct net_device *dev)
--
2.34.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v26 2/8] cxl/sfc: Map cxl regs
2026-04-23 18:05 [PATCH v26 0/8] Type2 device basic support alejandro.lucero-palau
2026-04-23 18:05 ` [PATCH v26 1/8] sfc: add cxl support alejandro.lucero-palau
@ 2026-04-23 18:05 ` alejandro.lucero-palau
2026-04-23 18:05 ` [PATCH v26 3/8] cxl/sfc: Initialize dpa without a mailbox alejandro.lucero-palau
` (6 subsequent siblings)
8 siblings, 0 replies; 20+ messages in thread
From: alejandro.lucero-palau @ 2026-04-23 18:05 UTC (permalink / raw)
To: linux-cxl, djbw, edward.cree, davem, kuba, pabeni, edumazet,
dave.jiang
Cc: Alejandro Lucero, Dan Williams, Jonathan Cameron, Ben Cheatham
From: Alejandro Lucero <alucerop@amd.com>
Export cxl core functions for a Type2 driver being able to discover and
map the device registers.
Use it in sfc driver cxl initialization.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
---
drivers/cxl/core/pci.c | 1 +
drivers/cxl/core/port.c | 1 +
drivers/cxl/core/regs.c | 1 +
drivers/cxl/cxlpci.h | 12 ------------
drivers/cxl/pci.c | 1 +
drivers/net/ethernet/sfc/efx_cxl.c | 26 ++++++++++++++++++++++++++
include/cxl/pci.h | 22 ++++++++++++++++++++++
7 files changed, 52 insertions(+), 12 deletions(-)
create mode 100644 include/cxl/pci.h
diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index d1f487b3d809..2bcd683aa286 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -6,6 +6,7 @@
#include <linux/delay.h>
#include <linux/pci.h>
#include <linux/pci-doe.h>
+#include <cxl/pci.h>
#include <linux/aer.h>
#include <cxlpci.h>
#include <cxlmem.h>
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index c5aacd7054f1..dbe30e7c383b 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -11,6 +11,7 @@
#include <linux/idr.h>
#include <linux/node.h>
#include <cxl/einj.h>
+#include <cxl/pci.h>
#include <cxlmem.h>
#include <cxlpci.h>
#include <cxl.h>
diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c
index 93710cf4f0a6..20c2d9fbcfe7 100644
--- a/drivers/cxl/core/regs.c
+++ b/drivers/cxl/core/regs.c
@@ -4,6 +4,7 @@
#include <linux/device.h>
#include <linux/slab.h>
#include <linux/pci.h>
+#include <cxl/pci.h>
#include <cxlmem.h>
#include <cxlpci.h>
#include <pmu.h>
diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
index b826eb53cf7b..110ec9c44f09 100644
--- a/drivers/cxl/cxlpci.h
+++ b/drivers/cxl/cxlpci.h
@@ -13,16 +13,6 @@
*/
#define CXL_PCI_DEFAULT_MAX_VECTORS 16
-/* Register Block Identifier (RBI) */
-enum cxl_regloc_type {
- CXL_REGLOC_RBI_EMPTY = 0,
- CXL_REGLOC_RBI_COMPONENT,
- CXL_REGLOC_RBI_VIRT,
- CXL_REGLOC_RBI_MEMDEV,
- CXL_REGLOC_RBI_PMU,
- CXL_REGLOC_RBI_TYPES
-};
-
/*
* Table Access DOE, CDAT Read Entry Response
*
@@ -112,6 +102,4 @@ static inline void devm_cxl_port_ras_setup(struct cxl_port *port)
}
#endif
-int cxl_pci_setup_regs(struct pci_dev *pdev, enum cxl_regloc_type type,
- struct cxl_register_map *map);
#endif /* __CXL_PCI_H__ */
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index bace662dc988..1f10231c8d43 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -11,6 +11,7 @@
#include <linux/pci.h>
#include <linux/aer.h>
#include <linux/io.h>
+#include <cxl/pci.h>
#include <cxl/mailbox.h>
#include "cxlmem.h"
#include "cxlpci.h"
diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
index b7e8d85a43d3..f88c2b3fbb2c 100644
--- a/drivers/net/ethernet/sfc/efx_cxl.c
+++ b/drivers/net/ethernet/sfc/efx_cxl.c
@@ -7,6 +7,8 @@
#include <linux/pci.h>
+#include <cxl/cxl.h>
+#include <cxl/pci.h>
#include "net_driver.h"
#include "efx_cxl.h"
@@ -18,6 +20,7 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
struct pci_dev *pci_dev = efx->pci_dev;
struct efx_cxl *cxl;
u16 dvsec;
+ int rc;
probe_data->cxl_pio_initialised = false;
@@ -44,6 +47,29 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
if (!cxl)
return -ENOMEM;
+ rc = cxl_pci_setup_regs(pci_dev, CXL_REGLOC_RBI_COMPONENT,
+ &cxl->cxlds.reg_map);
+ if (rc) {
+ pci_err(pci_dev, "No component registers\n");
+ return rc;
+ }
+
+ if (!cxl->cxlds.reg_map.component_map.hdm_decoder.valid) {
+ pci_err(pci_dev, "Expected HDM component register not found\n");
+ return -ENODEV;
+ }
+
+ if (!cxl->cxlds.reg_map.component_map.ras.valid) {
+ pci_err(pci_dev, "Expected RAS component register not found\n");
+ return -ENODEV;
+ }
+
+ /* Set media ready explicitly as there are neither mailbox for checking
+ * this state nor the CXL register involved, both not mandatory for
+ * type2.
+ */
+ cxl->cxlds.media_ready = true;
+
probe_data->cxl = cxl;
return 0;
diff --git a/include/cxl/pci.h b/include/cxl/pci.h
new file mode 100644
index 000000000000..3e0000015871
--- /dev/null
+++ b/include/cxl/pci.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright(c) 2020 Intel Corporation. All rights reserved. */
+
+#ifndef __CXL_CXL_PCI_H__
+#define __CXL_CXL_PCI_H__
+
+/* Register Block Identifier (RBI) */
+enum cxl_regloc_type {
+ CXL_REGLOC_RBI_EMPTY = 0,
+ CXL_REGLOC_RBI_COMPONENT,
+ CXL_REGLOC_RBI_VIRT,
+ CXL_REGLOC_RBI_MEMDEV,
+ CXL_REGLOC_RBI_PMU,
+ CXL_REGLOC_RBI_TYPES
+};
+
+struct cxl_register_map;
+struct pci_dev;
+
+int cxl_pci_setup_regs(struct pci_dev *pdev, enum cxl_regloc_type type,
+ struct cxl_register_map *map);
+#endif
--
2.34.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v26 3/8] cxl/sfc: Initialize dpa without a mailbox
2026-04-23 18:05 [PATCH v26 0/8] Type2 device basic support alejandro.lucero-palau
2026-04-23 18:05 ` [PATCH v26 1/8] sfc: add cxl support alejandro.lucero-palau
2026-04-23 18:05 ` [PATCH v26 2/8] cxl/sfc: Map cxl regs alejandro.lucero-palau
@ 2026-04-23 18:05 ` alejandro.lucero-palau
2026-04-23 18:05 ` [PATCH v26 4/8] cxl: Prepare memdev creation for type2 alejandro.lucero-palau
` (5 subsequent siblings)
8 siblings, 0 replies; 20+ messages in thread
From: alejandro.lucero-palau @ 2026-04-23 18:05 UTC (permalink / raw)
To: linux-cxl, djbw, edward.cree, davem, kuba, pabeni, edumazet,
dave.jiang
Cc: Alejandro Lucero, Dan Williams, Ben Cheatham, Jonathan Cameron
From: Alejandro Lucero <alucerop@amd.com>
Type3 relies on mailbox CXL_MBOX_OP_IDENTIFY command for initializing
memdev state params which end up being used for DPA initialization.
Allow a Type2 driver to initialize DPA simply by giving the size of its
volatile hardware partition.
Move related functions to memdev.
Add sfc driver as the client.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
drivers/cxl/core/core.h | 2 +
drivers/cxl/core/mbox.c | 51 +----------------------
drivers/cxl/core/memdev.c | 66 ++++++++++++++++++++++++++++++
drivers/net/ethernet/sfc/efx_cxl.c | 5 +++
include/cxl/cxl.h | 2 +
5 files changed, 76 insertions(+), 50 deletions(-)
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index 82ca3a476708..70a1da02ba9c 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -99,6 +99,8 @@ void __iomem *devm_cxl_iomap_block(struct device *dev, resource_size_t addr,
struct dentry *cxl_debugfs_create_dir(const char *dir);
int cxl_dpa_set_part(struct cxl_endpoint_decoder *cxled,
enum cxl_partition_mode mode);
+struct cxl_memdev_state;
+int cxl_mem_get_partition_info(struct cxl_memdev_state *mds);
int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, u64 size);
int cxl_dpa_free(struct cxl_endpoint_decoder *cxled);
resource_size_t cxl_dpa_size(struct cxl_endpoint_decoder *cxled);
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 7c6c5b7450a5..97b1e61ad018 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -1152,7 +1152,7 @@ EXPORT_SYMBOL_NS_GPL(cxl_mem_get_event_records, "CXL");
*
* See CXL @8.2.9.5.2.1 Get Partition Info
*/
-static int cxl_mem_get_partition_info(struct cxl_memdev_state *mds)
+int cxl_mem_get_partition_info(struct cxl_memdev_state *mds)
{
struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
struct cxl_mbox_get_partition_info pi;
@@ -1308,55 +1308,6 @@ int cxl_mem_sanitize(struct cxl_memdev *cxlmd, u16 cmd)
return -EBUSY;
}
-static void add_part(struct cxl_dpa_info *info, u64 start, u64 size, enum cxl_partition_mode mode)
-{
- int i = info->nr_partitions;
-
- if (size == 0)
- return;
-
- info->part[i].range = (struct range) {
- .start = start,
- .end = start + size - 1,
- };
- info->part[i].mode = mode;
- info->nr_partitions++;
-}
-
-int cxl_mem_dpa_fetch(struct cxl_memdev_state *mds, struct cxl_dpa_info *info)
-{
- struct cxl_dev_state *cxlds = &mds->cxlds;
- struct device *dev = cxlds->dev;
- int rc;
-
- if (!cxlds->media_ready) {
- info->size = 0;
- return 0;
- }
-
- info->size = mds->total_bytes;
-
- if (mds->partition_align_bytes == 0) {
- add_part(info, 0, mds->volatile_only_bytes, CXL_PARTMODE_RAM);
- add_part(info, mds->volatile_only_bytes,
- mds->persistent_only_bytes, CXL_PARTMODE_PMEM);
- return 0;
- }
-
- rc = cxl_mem_get_partition_info(mds);
- if (rc) {
- dev_err(dev, "Failed to query partition information\n");
- return rc;
- }
-
- add_part(info, 0, mds->active_volatile_bytes, CXL_PARTMODE_RAM);
- add_part(info, mds->active_volatile_bytes, mds->active_persistent_bytes,
- CXL_PARTMODE_PMEM);
-
- return 0;
-}
-EXPORT_SYMBOL_NS_GPL(cxl_mem_dpa_fetch, "CXL");
-
int cxl_get_dirty_count(struct cxl_memdev_state *mds, u32 *count)
{
struct cxl_mailbox *cxl_mbox = &mds->cxlds.cxl_mbox;
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index 80e65690eb77..b8ff1c07e528 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -585,6 +585,72 @@ bool is_cxl_memdev(const struct device *dev)
}
EXPORT_SYMBOL_NS_GPL(is_cxl_memdev, "CXL");
+static void add_part(struct cxl_dpa_info *info, u64 start, u64 size, enum cxl_partition_mode mode)
+{
+ int i = info->nr_partitions;
+
+ if (size == 0)
+ return;
+
+ info->part[i].range = (struct range) {
+ .start = start,
+ .end = start + size - 1,
+ };
+ info->part[i].mode = mode;
+ info->nr_partitions++;
+}
+
+int cxl_mem_dpa_fetch(struct cxl_memdev_state *mds, struct cxl_dpa_info *info)
+{
+ struct cxl_dev_state *cxlds = &mds->cxlds;
+ struct device *dev = cxlds->dev;
+ int rc;
+
+ if (!cxlds->media_ready) {
+ info->size = 0;
+ return 0;
+ }
+
+ info->size = mds->total_bytes;
+
+ if (mds->partition_align_bytes == 0) {
+ add_part(info, 0, mds->volatile_only_bytes, CXL_PARTMODE_RAM);
+ add_part(info, mds->volatile_only_bytes,
+ mds->persistent_only_bytes, CXL_PARTMODE_PMEM);
+ return 0;
+ }
+
+ rc = cxl_mem_get_partition_info(mds);
+ if (rc) {
+ dev_err(dev, "Failed to query partition information\n");
+ return rc;
+ }
+
+ add_part(info, 0, mds->active_volatile_bytes, CXL_PARTMODE_RAM);
+ add_part(info, mds->active_volatile_bytes, mds->active_persistent_bytes,
+ CXL_PARTMODE_PMEM);
+
+ return 0;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_mem_dpa_fetch, "CXL");
+
+/**
+ * cxl_set_capacity: initialize dpa by a driver without a mailbox.
+ *
+ * @cxlds: pointer to cxl_dev_state
+ * @capacity: device volatile memory size
+ */
+int cxl_set_capacity(struct cxl_dev_state *cxlds, u64 capacity)
+{
+ struct cxl_dpa_info range_info = {
+ .size = capacity,
+ };
+
+ add_part(&range_info, 0, capacity, CXL_PARTMODE_RAM);
+ return cxl_dpa_setup(cxlds, &range_info);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_set_capacity, "CXL");
+
/**
* set_exclusive_cxl_commands() - atomically disable user cxl commands
* @mds: The device state to operate on
diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
index f88c2b3fbb2c..4d55c08cf2a1 100644
--- a/drivers/net/ethernet/sfc/efx_cxl.c
+++ b/drivers/net/ethernet/sfc/efx_cxl.c
@@ -70,6 +70,11 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
*/
cxl->cxlds.media_ready = true;
+ if (cxl_set_capacity(&cxl->cxlds, EFX_CTPIO_BUFFER_SIZE)) {
+ pci_err(pci_dev, "dpa capacity setup failed\n");
+ return -ENODEV;
+ }
+
probe_data->cxl = cxl;
return 0;
diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
index fa7269154620..1346771edc4e 100644
--- a/include/cxl/cxl.h
+++ b/include/cxl/cxl.h
@@ -223,4 +223,6 @@ struct cxl_dev_state *_devm_cxl_dev_state_create(struct device *dev,
(drv_struct *)_devm_cxl_dev_state_create(parent, type, serial, dvsec, \
sizeof(drv_struct), mbox); \
})
+
+int cxl_set_capacity(struct cxl_dev_state *cxlds, u64 capacity);
#endif /* __CXL_CXL_H__ */
--
2.34.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v26 4/8] cxl: Prepare memdev creation for type2
2026-04-23 18:05 [PATCH v26 0/8] Type2 device basic support alejandro.lucero-palau
` (2 preceding siblings ...)
2026-04-23 18:05 ` [PATCH v26 3/8] cxl/sfc: Initialize dpa without a mailbox alejandro.lucero-palau
@ 2026-04-23 18:05 ` alejandro.lucero-palau
2026-04-30 23:23 ` Dan Williams (nvidia)
2026-04-23 18:05 ` [PATCH v26 5/8] sfc: create type2 cxl memdev alejandro.lucero-palau
` (4 subsequent siblings)
8 siblings, 1 reply; 20+ messages in thread
From: alejandro.lucero-palau @ 2026-04-23 18:05 UTC (permalink / raw)
To: linux-cxl, djbw, edward.cree, davem, kuba, pabeni, edumazet,
dave.jiang
Cc: Alejandro Lucero, Ben Cheatham, Jonathan Cameron,
Alison Schofield, Dan Williams
From: Alejandro Lucero <alucerop@amd.com>
Current cxl core is relying on a CXL_DEVTYPE_CLASSMEM type device when
creating a memdev leading to problems when obtaining cxl_memdev_state
references from a CXL_DEVTYPE_DEVMEM type.
Modify check for obtaining cxl_memdev_state adding CXL_DEVTYPE_DEVMEM
support.
Make devm_cxl_add_memdev accessible from an accel driver.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
---
drivers/cxl/core/memdev.c | 15 +++++++++++--
drivers/cxl/cxlmem.h | 6 ------
drivers/cxl/mem.c | 45 +++++++++++++++++++++++++++++----------
include/cxl/cxl.h | 6 ++++++
4 files changed, 53 insertions(+), 19 deletions(-)
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index b8ff1c07e528..ef39803bb139 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -7,6 +7,7 @@
#include <linux/slab.h>
#include <linux/idr.h>
#include <linux/pci.h>
+#include <cxl/cxl.h>
#include <cxlmem.h>
#include "trace.h"
#include "core.h"
@@ -579,9 +580,16 @@ static const struct device_type cxl_memdev_type = {
.groups = cxl_memdev_attribute_groups,
};
+static const struct device_type cxl_accel_memdev_type = {
+ .name = "cxl_accel_memdev",
+ .release = cxl_memdev_release,
+ .devnode = cxl_memdev_devnode,
+};
+
bool is_cxl_memdev(const struct device *dev)
{
- return dev->type == &cxl_memdev_type;
+ return (dev->type == &cxl_memdev_type ||
+ dev->type == &cxl_accel_memdev_type);
}
EXPORT_SYMBOL_NS_GPL(is_cxl_memdev, "CXL");
@@ -776,7 +784,10 @@ static struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds,
dev->parent = cxlds->dev;
dev->bus = &cxl_bus_type;
dev->devt = MKDEV(cxl_mem_major, cxlmd->id);
- dev->type = &cxl_memdev_type;
+ if (cxlds->type == CXL_DEVTYPE_DEVMEM)
+ dev->type = &cxl_accel_memdev_type;
+ else
+ dev->type = &cxl_memdev_type;
device_set_pm_not_required(dev);
INIT_WORK(&cxlmd->detach_work, detach_memdev);
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 776c50d1db51..92cca400d113 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -34,10 +34,6 @@
(FIELD_GET(CXLMDEV_RESET_NEEDED_MASK, status) != \
CXLMDEV_RESET_NEEDED_NOT)
-struct cxl_memdev_attach {
- int (*probe)(struct cxl_memdev *cxlmd);
-};
-
/**
* struct cxl_memdev - CXL bus object representing a Type-3 Memory Device
* @dev: driver core device object
@@ -103,8 +99,6 @@ static inline bool is_cxl_endpoint(struct cxl_port *port)
struct cxl_memdev *__devm_cxl_add_memdev(struct cxl_dev_state *cxlds,
const struct cxl_memdev_attach *attach);
-struct cxl_memdev *devm_cxl_add_memdev(struct cxl_dev_state *cxlds,
- const struct cxl_memdev_attach *attach);
int devm_cxl_sanitize_setup_notifier(struct device *host,
struct cxl_memdev *cxlmd);
struct cxl_memdev_state;
diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
index fcffe24dcb42..ff858318091f 100644
--- a/drivers/cxl/mem.c
+++ b/drivers/cxl/mem.c
@@ -65,6 +65,26 @@ static int cxl_debugfs_poison_clear(void *data, u64 dpa)
DEFINE_DEBUGFS_ATTRIBUTE(cxl_poison_clear_fops, NULL,
cxl_debugfs_poison_clear, "%llx\n");
+static void cxl_memdev_poison_enable(struct cxl_memdev_state *mds,
+ struct cxl_memdev *cxlmd,
+ struct dentry *dentry)
+{
+ /*
+ * Avoid poison debugfs for DEVMEM aka accelerators as they rely on
+ * cxl_memdev_state.
+ */
+ if (!mds)
+ return;
+
+ if (test_bit(CXL_POISON_ENABLED_INJECT, mds->poison.enabled_cmds))
+ debugfs_create_file("inject_poison", 0200, dentry, cxlmd,
+ &cxl_poison_inject_fops);
+
+ if (test_bit(CXL_POISON_ENABLED_CLEAR, mds->poison.enabled_cmds))
+ debugfs_create_file("clear_poison", 0200, dentry, cxlmd,
+ &cxl_poison_clear_fops);
+}
+
static int cxl_mem_probe(struct device *dev)
{
struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
@@ -92,12 +112,7 @@ static int cxl_mem_probe(struct device *dev)
dentry = cxl_debugfs_create_dir(dev_name(dev));
debugfs_create_devm_seqfile(dev, "dpamem", dentry, cxl_mem_dpa_show);
- if (test_bit(CXL_POISON_ENABLED_INJECT, mds->poison.enabled_cmds))
- debugfs_create_file("inject_poison", 0200, dentry, cxlmd,
- &cxl_poison_inject_fops);
- if (test_bit(CXL_POISON_ENABLED_CLEAR, mds->poison.enabled_cmds))
- debugfs_create_file("clear_poison", 0200, dentry, cxlmd,
- &cxl_poison_clear_fops);
+ cxl_memdev_poison_enable(mds, cxlmd, dentry);
rc = devm_add_action_or_reset(dev, remove_debugfs, dentry);
if (rc)
@@ -206,16 +221,24 @@ static ssize_t trigger_poison_list_store(struct device *dev,
}
static DEVICE_ATTR_WO(trigger_poison_list);
-static umode_t cxl_mem_visible(struct kobject *kobj, struct attribute *a, int n)
+static bool cxl_poison_attr_visible(struct kobject *kobj, struct attribute *a)
{
struct device *dev = kobj_to_dev(kobj);
struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
- if (a == &dev_attr_trigger_poison_list.attr)
- if (!test_bit(CXL_POISON_ENABLED_LIST,
- mds->poison.enabled_cmds))
- return 0;
+ if (!mds ||
+ !test_bit(CXL_POISON_ENABLED_LIST, mds->poison.enabled_cmds))
+ return false;
+
+ return true;
+}
+
+static umode_t cxl_mem_visible(struct kobject *kobj, struct attribute *a, int n)
+{
+ if (a == &dev_attr_trigger_poison_list.attr &&
+ !cxl_poison_attr_visible(kobj, a))
+ return 0;
return a->mode;
}
diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
index 1346771edc4e..10a9b8fa2f6b 100644
--- a/include/cxl/cxl.h
+++ b/include/cxl/cxl.h
@@ -149,6 +149,10 @@ struct cxl_dpa_partition {
#define CXL_NR_PARTITIONS_MAX 2
+struct cxl_memdev_attach {
+ int (*probe)(struct cxl_memdev *cxlmd);
+};
+
/**
* struct cxl_dev_state - The driver device state
*
@@ -225,4 +229,6 @@ struct cxl_dev_state *_devm_cxl_dev_state_create(struct device *dev,
})
int cxl_set_capacity(struct cxl_dev_state *cxlds, u64 capacity);
+struct cxl_memdev *devm_cxl_add_memdev(struct cxl_dev_state *cxlds,
+ const struct cxl_memdev_attach *attach);
#endif /* __CXL_CXL_H__ */
--
2.34.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v26 5/8] sfc: create type2 cxl memdev
2026-04-23 18:05 [PATCH v26 0/8] Type2 device basic support alejandro.lucero-palau
` (3 preceding siblings ...)
2026-04-23 18:05 ` [PATCH v26 4/8] cxl: Prepare memdev creation for type2 alejandro.lucero-palau
@ 2026-04-23 18:05 ` alejandro.lucero-palau
2026-04-23 18:05 ` [PATCH v26 6/8] cxl: attach region to an accelerator/type2 memdev alejandro.lucero-palau
` (3 subsequent siblings)
8 siblings, 0 replies; 20+ messages in thread
From: alejandro.lucero-palau @ 2026-04-23 18:05 UTC (permalink / raw)
To: linux-cxl, djbw, edward.cree, davem, kuba, pabeni, edumazet,
dave.jiang
Cc: Alejandro Lucero, Martin Habets, Fan Ni, Edward Cree,
Jonathan Cameron
From: Alejandro Lucero <alucerop@amd.com>
Use cxl API for creating a cxl memory device using the type2
cxl_dev_state struct.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
---
drivers/net/ethernet/sfc/efx_cxl.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
index 4d55c08cf2a1..7d8a6c2133c8 100644
--- a/drivers/net/ethernet/sfc/efx_cxl.c
+++ b/drivers/net/ethernet/sfc/efx_cxl.c
@@ -75,6 +75,12 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
return -ENODEV;
}
+ cxl->cxlmd = devm_cxl_add_memdev(&cxl->cxlds, NULL);
+ if (IS_ERR(cxl->cxlmd)) {
+ pci_err(pci_dev, "CXL accel memdev creation failed\n");
+ return PTR_ERR(cxl->cxlmd);
+ }
+
probe_data->cxl = cxl;
return 0;
--
2.34.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v26 6/8] cxl: attach region to an accelerator/type2 memdev
2026-04-23 18:05 [PATCH v26 0/8] Type2 device basic support alejandro.lucero-palau
` (4 preceding siblings ...)
2026-04-23 18:05 ` [PATCH v26 5/8] sfc: create type2 cxl memdev alejandro.lucero-palau
@ 2026-04-23 18:05 ` alejandro.lucero-palau
2026-04-29 21:14 ` Cheatham, Benjamin
2026-05-01 2:00 ` Dan Williams (nvidia)
2026-04-23 18:05 ` [PATCH v26 7/8] cxl: Avoid dax creation for accelerators alejandro.lucero-palau
` (2 subsequent siblings)
8 siblings, 2 replies; 20+ messages in thread
From: alejandro.lucero-palau @ 2026-04-23 18:05 UTC (permalink / raw)
To: linux-cxl, djbw, edward.cree, davem, kuba, pabeni, edumazet,
dave.jiang
Cc: Alejandro Lucero
From: Alejandro Lucero <alucerop@amd.com>
Support an accelerator driver to safely work with an autodiscovered
region from a committed HDM decoder through:
1) an accelerator driver cxl_attach_region struct with attach
and detach callbacks.
2) a specific function, cxl_memdev_attach_region() keeping the
required locks for finding a region linked to the memdev
endpoint, and
3) invoking attach callback while keeping the locking allowing to
work (ioremap and other internal stuff) with the related physical
range by the accelerator driver, and
4) linking a detach callback to the endpoint device removal where
the accelerator driver can stop using the region range.
This covers the cases of a potential removal of cxl_acpi module or a
accelerator memdev unbinding from cxl_mem driver through sysfs.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
---
drivers/cxl/core/region.c | 118 ++++++++++++++++++++++++++++-
drivers/net/ethernet/sfc/efx_cxl.c | 37 +++++++++
drivers/net/ethernet/sfc/efx_cxl.h | 2 +
include/cxl/cxl.h | 17 +++++
4 files changed, 171 insertions(+), 3 deletions(-)
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index e50dc716d4e8..68f5a1fd1b1c 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -2711,9 +2711,16 @@ static struct cxl_region *devm_cxl_add_region(struct cxl_root_decoder *cxlrd,
if (rc)
goto err;
- rc = devm_add_action_or_reset(port->uport_dev, unregister_region, cxlr);
- if (rc)
- return ERR_PTR(rc);
+ /*
+ * For accelerators/type2, region release linked to endpoint device.
+ * See handling of cxl_endpoint_region_autoremove() below by
+ * cxl_memdev_attach_region().
+ */
+ if (type == CXL_DECODER_HOSTONLYMEM) {
+ rc = devm_add_action_or_reset(port->uport_dev, unregister_region, cxlr);
+ if (rc)
+ return ERR_PTR(rc);
+ }
dev_dbg(port->uport_dev, "%s: created %s\n",
dev_name(&cxlrd->cxlsd.cxld.dev), dev_name(dev));
@@ -4043,6 +4050,111 @@ static int cxl_region_can_probe(struct cxl_region *cxlr)
return 0;
}
+static int first_mapped_decoder(struct device *dev, const void *data)
+{
+ struct cxl_endpoint_decoder *cxled;
+
+ if (!is_endpoint_decoder(dev))
+ return 0;
+
+ cxled = to_cxl_endpoint_decoder(dev);
+ if (cxled->cxld.region)
+ return 1;
+
+ return 0;
+}
+
+/*
+ * As this is running in endpoint port remove context it does not race cxl_root
+ * destruction since port topologies are always removed depth first.
+ */
+static void cxl_endpoint_region_autoremove(void *_cxlr)
+{
+ unregister_region(_cxlr);
+}
+
+/**
+ * cxl_memdev_attach_region - bind region to accelerator memdev
+ *
+ * @cxlmd: a pointer to cxl_memdev to use
+ * @attach: a pointer to region attach struct with callbacks for
+ * safely working with a region range by the caller
+ *
+ * Returns 0 or error.
+ */
+int cxl_memdev_attach_region(struct cxl_memdev *cxlmd,
+ struct cxl_attach_region *attach)
+{
+ struct cxl_port *endpoint = cxlmd->endpoint;
+ struct cxl_endpoint_decoder *cxled;
+ struct cxl_region *cxlr;
+ int rc;
+
+ /* hold endpoint lock to setup autoremove of the region */
+ guard(device)(&endpoint->dev);
+ if (!endpoint->dev.driver)
+ return -ENXIO;
+ guard(rwsem_read)(&cxl_rwsem.region);
+ guard(rwsem_read)(&cxl_rwsem.dpa);
+
+ /*
+ * TODO auto-instantiate a region, for now assume this will find an
+ * auto-region
+ */
+ struct device *dev __free(put_device) =
+ device_find_child(&endpoint->dev, NULL, first_mapped_decoder);
+
+ if (!dev) {
+ dev_dbg(cxlmd->cxlds->dev, "no region found for memdev %s\n",
+ dev_name(&cxlmd->dev));
+ return -ENXIO;
+ }
+
+ cxled = to_cxl_endpoint_decoder(dev);
+ cxlr = cxled->cxld.region;
+
+ if (cxlr->params.state < CXL_CONFIG_COMMIT) {
+ dev_dbg(cxlmd->cxlds->dev,
+ "region %s not committed for memdev %s\n",
+ dev_name(&cxlr->dev), dev_name(&cxlmd->dev));
+ return -ENXIO;
+ }
+
+ if (cxlr->params.nr_targets > 1) {
+ dev_dbg(cxlmd->cxlds->dev,
+ "Only attach to local non-interleaved region\n");
+ return -ENXIO;
+ }
+
+ attach->region = (struct range) {
+ .start = cxlr->params.res->start,
+ .end = cxlr->params.res->end,
+ };
+
+ /*
+ * With endpoint locked leave the caller to safely work with the region
+ * range.
+ */
+ rc = attach->attach(attach->data);
+ if (rc)
+ return rc;
+
+ /* Only teardown regions that pass validation, ignore the rest */
+ rc = devm_add_action_or_reset(&endpoint->dev,
+ cxl_endpoint_region_autoremove, cxlr);
+ if (rc)
+ return rc;
+
+ /* Link type2 driver callback for stopping use of the region range. */
+ rc = devm_add_action_or_reset(&endpoint->dev,
+ attach->detach, attach->data);
+ if (rc)
+ return rc;
+
+ return 0;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_memdev_attach_region, "CXL");
+
static int cxl_region_probe(struct device *dev)
{
struct cxl_region *cxlr = to_cxl_region(dev);
diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
index 7d8a6c2133c8..9ca16b051d62 100644
--- a/drivers/net/ethernet/sfc/efx_cxl.c
+++ b/drivers/net/ethernet/sfc/efx_cxl.c
@@ -14,6 +14,36 @@
#define EFX_CTPIO_BUFFER_SIZE SZ_256M
+/* Called with cxl endpoint device locked for precluding potential related
+ * cxl region removal triggered from user space, allowing safely mapping of
+ * such cxl region by the sfc driver.
+ */
+static int efx_cxl_map_region(void *data) {
+ struct efx_probe_data *probe_data = data;
+ struct efx_nic *efx = &probe_data->efx;
+ struct pci_dev *pci_dev = efx->pci_dev;
+ struct efx_cxl *cxl = probe_data->cxl;
+ struct range *cxl_pio_range = &cxl->attach_region.region;
+
+ cxl->ctpio_cxl = ioremap(cxl_pio_range->start,
+ cxl_pio_range->end - cxl_pio_range->start + 1);
+ if (!cxl->ctpio_cxl) {
+ pci_err(pci_dev, "CXL ioremap region (%pra) failed\n",
+ cxl_pio_range);
+ return -ENOMEM;
+ }
+ probe_data->cxl_pio_initialised = true;
+ return 0;
+}
+
+/* Called at driver exit or when user space triggers cxl region removal. */
+static void efx_cxl_unmap_region(void *data) {
+ struct efx_probe_data *probe_data = data;
+
+ probe_data->cxl_pio_initialised = false;
+ iounmap(probe_data->cxl->ctpio_cxl);
+}
+
int efx_cxl_init(struct efx_probe_data *probe_data)
{
struct efx_nic *efx = &probe_data->efx;
@@ -81,6 +111,13 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
return PTR_ERR(cxl->cxlmd);
}
+ probe_data->cxl->attach_region.attach = efx_cxl_map_region;
+ probe_data->cxl->attach_region.detach = efx_cxl_unmap_region;
+ probe_data->cxl->attach_region.data = probe_data;
+
+ cxl_memdev_attach_region(cxl->cxlmd,
+ &probe_data->cxl->attach_region);
+
probe_data->cxl = cxl;
return 0;
diff --git a/drivers/net/ethernet/sfc/efx_cxl.h b/drivers/net/ethernet/sfc/efx_cxl.h
index 04e46278464d..1c294cd1df56 100644
--- a/drivers/net/ethernet/sfc/efx_cxl.h
+++ b/drivers/net/ethernet/sfc/efx_cxl.h
@@ -20,6 +20,8 @@ struct efx_probe_data;
struct efx_cxl {
struct cxl_dev_state cxlds;
struct cxl_memdev *cxlmd;
+ struct cxl_attach_region attach_region;
+ void __iomem *ctpio_cxl;
};
int efx_cxl_init(struct efx_probe_data *probe_data);
diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
index 10a9b8fa2f6b..22d9435b351f 100644
--- a/include/cxl/cxl.h
+++ b/include/cxl/cxl.h
@@ -7,6 +7,7 @@
#include <linux/node.h>
#include <linux/ioport.h>
+#include <linux/pci.h>
#include <cxl/mailbox.h>
/**
@@ -153,6 +154,20 @@ struct cxl_memdev_attach {
int (*probe)(struct cxl_memdev *cxlmd);
};
+/**
+ * struct cxl_attach_region - accelerator region handling
+ * @attach: invoked at cxl_memdev_attach_region() with endpoint device locked.
+ * @detach: invoked at endpoint release.
+ * @data: pointer referencing accelerator data for attach and detach calls.
+ * @region: initialised with autodiscovered region values linked to memdev.
+ */
+struct cxl_attach_region {
+ int (*attach)(void *);
+ void (*detach)(void *);
+ void *data;
+ struct range region;
+};
+
/**
* struct cxl_dev_state - The driver device state
*
@@ -231,4 +246,6 @@ struct cxl_dev_state *_devm_cxl_dev_state_create(struct device *dev,
int cxl_set_capacity(struct cxl_dev_state *cxlds, u64 capacity);
struct cxl_memdev *devm_cxl_add_memdev(struct cxl_dev_state *cxlds,
const struct cxl_memdev_attach *attach);
+struct cxl_region;
+int cxl_memdev_attach_region(struct cxl_memdev *cxlmd, struct cxl_attach_region *attach);
#endif /* __CXL_CXL_H__ */
--
2.34.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v26 7/8] cxl: Avoid dax creation for accelerators
2026-04-23 18:05 [PATCH v26 0/8] Type2 device basic support alejandro.lucero-palau
` (5 preceding siblings ...)
2026-04-23 18:05 ` [PATCH v26 6/8] cxl: attach region to an accelerator/type2 memdev alejandro.lucero-palau
@ 2026-04-23 18:05 ` alejandro.lucero-palau
2026-04-29 21:14 ` Cheatham, Benjamin
2026-04-23 18:05 ` [PATCH v26 8/8] sfc: support pio mapping based on cxl alejandro.lucero-palau
2026-04-23 22:07 ` [PATCH v26 0/8] Type2 device basic support Dave Jiang
8 siblings, 1 reply; 20+ messages in thread
From: alejandro.lucero-palau @ 2026-04-23 18:05 UTC (permalink / raw)
To: linux-cxl, djbw, edward.cree, davem, kuba, pabeni, edumazet,
dave.jiang
Cc: Alejandro Lucero, Jonathan Cameron, Davidlohr Bueso, Ben Cheatham
From: Alejandro Lucero <alucerop@amd.com>
By definition a type2 cxl device will use the host managed memory for
specific functionality, therefore it should not be available to other
uses like DAX.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Davidlohr Bueso <daves@stgolabs.net>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
---
drivers/cxl/core/region.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index 68f5a1fd1b1c..3af2aed6c6f1 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -4165,6 +4165,13 @@ static int cxl_region_probe(struct device *dev)
if (rc)
return rc;
+ /*
+ * HDM-D[B] (device-memory) regions have accelerator specific usage.
+ * Skip device-dax registration.
+ */
+ if (cxlr->type == CXL_DECODER_DEVMEM)
+ return 0;
+
/*
* From this point on any path that changes the region's state away from
* CXL_CONFIG_COMMIT is also responsible for releasing the driver.
--
2.34.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v26 8/8] sfc: support pio mapping based on cxl
2026-04-23 18:05 [PATCH v26 0/8] Type2 device basic support alejandro.lucero-palau
` (6 preceding siblings ...)
2026-04-23 18:05 ` [PATCH v26 7/8] cxl: Avoid dax creation for accelerators alejandro.lucero-palau
@ 2026-04-23 18:05 ` alejandro.lucero-palau
2026-04-23 22:07 ` [PATCH v26 0/8] Type2 device basic support Dave Jiang
8 siblings, 0 replies; 20+ messages in thread
From: alejandro.lucero-palau @ 2026-04-23 18:05 UTC (permalink / raw)
To: linux-cxl, djbw, edward.cree, davem, kuba, pabeni, edumazet,
dave.jiang
Cc: Alejandro Lucero
From: Alejandro Lucero <alucerop@amd.com>
A PIO buffer is a region of device memory to which the driver can write a
packet for TX, with the device handling the transmit doorbell without
requiring a DMA for getting the packet data, which helps reducing latency
in certain exchanges. With CXL mem protocol this latency can be lowered
further.
With a device supporting CXL and successfully initialised, use the cxl
region to map the memory range and use this mapping for PIO buffers.
Add the disabling of those CXL-based PIO buffers if the callback for
potential cxl endpoint removal by the CXL core happens.
Signed-off-by: Alejandro Lucero <alucerop@amd.com>
---
drivers/net/ethernet/sfc/ef10.c | 72 ++++++++++++++++++++++++---
drivers/net/ethernet/sfc/efx.h | 1 +
drivers/net/ethernet/sfc/efx_cxl.c | 2 +
drivers/net/ethernet/sfc/net_driver.h | 2 +
drivers/net/ethernet/sfc/nic.h | 3 ++
5 files changed, 73 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
index 7e04f115bbaa..be9bd8d5aec9 100644
--- a/drivers/net/ethernet/sfc/ef10.c
+++ b/drivers/net/ethernet/sfc/ef10.c
@@ -24,6 +24,7 @@
#include <linux/wait.h>
#include <linux/workqueue.h>
#include <net/udp_tunnel.h>
+#include "efx_cxl.h"
/* Hardware control for EF10 architecture including 'Huntington'. */
@@ -106,7 +107,7 @@ static int efx_ef10_get_vf_index(struct efx_nic *efx)
static int efx_ef10_init_datapath_caps(struct efx_nic *efx)
{
- MCDI_DECLARE_BUF(outbuf, MC_CMD_GET_CAPABILITIES_V4_OUT_LEN);
+ MCDI_DECLARE_BUF(outbuf, MC_CMD_GET_CAPABILITIES_V7_OUT_LEN);
struct efx_ef10_nic_data *nic_data = efx->nic_data;
size_t outlen;
int rc;
@@ -177,6 +178,12 @@ static int efx_ef10_init_datapath_caps(struct efx_nic *efx)
efx->num_mac_stats);
}
+ if (outlen < MC_CMD_GET_CAPABILITIES_V7_OUT_LEN)
+ nic_data->datapath_caps3 = 0;
+ else
+ nic_data->datapath_caps3 = MCDI_DWORD(outbuf,
+ GET_CAPABILITIES_V7_OUT_FLAGS3);
+
return 0;
}
@@ -771,6 +778,35 @@ static int efx_ef10_alloc_piobufs(struct efx_nic *efx, unsigned int n)
return rc;
}
+#ifdef CONFIG_SFC_CXL
+/* Invoked from cxl core when a cxl region is removed. This is expected at
+ * driver exit linked to cxl core devm releases which does not require the
+ * below sync.
+ *
+ * However, it is required when user space actions triggger such a cxl region
+ * removal forcing any cxl piobuf usage to stop. Setting per tx queue piobuf
+ * to NULL is safe if such a tx queue is not currently in use inside
+ * efx_hard_start_xmit() implying tx_queue locked.
+ *
+ * After this the cxl region physical range can be safely unmap.
+ */
+void efx_ef10_disable_piobufs(struct efx_nic *efx)
+{
+ struct efx_tx_queue *tx_queue;
+ struct efx_channel *channel;
+
+ local_bh_disable();
+ efx_for_each_channel(channel, efx)
+ efx_for_each_channel_tx_queue(tx_queue, channel) {
+ HARD_TX_LOCK(efx->net_dev, tx_queue->core_txq,
+ smp_processor_id());
+ tx_queue->piobuf = NULL;
+ HARD_TX_UNLOCK(efx->net_dev, tx_queue->core_txq);
+ }
+ local_bh_enable();
+}
+#endif
+
static int efx_ef10_link_piobufs(struct efx_nic *efx)
{
struct efx_ef10_nic_data *nic_data = efx->nic_data;
@@ -1140,6 +1176,9 @@ static int efx_ef10_dimension_resources(struct efx_nic *efx)
unsigned int channel_vis, pio_write_vi_base, max_vis;
struct efx_ef10_nic_data *nic_data = efx->nic_data;
unsigned int uc_mem_map_size, wc_mem_map_size;
+#ifdef CONFIG_SFC_CXL
+ struct efx_probe_data *probe_data;
+#endif
void __iomem *membase;
int rc;
@@ -1263,8 +1302,25 @@ static int efx_ef10_dimension_resources(struct efx_nic *efx)
iounmap(efx->membase);
efx->membase = membase;
- /* Set up the WC mapping if needed */
- if (wc_mem_map_size) {
+ if (!wc_mem_map_size)
+ goto skip_pio;
+
+ /* Set up the WC mapping */
+
+#ifdef CONFIG_SFC_CXL
+ probe_data = container_of(efx, struct efx_probe_data, efx);
+ if ((nic_data->datapath_caps3 &
+ (1 << MC_CMD_GET_CAPABILITIES_V7_OUT_CXL_CONFIG_ENABLE_LBN)) &&
+ probe_data->cxl_pio_initialised) {
+ /* Using PIO through CXL mapping */
+ nic_data->pio_write_base = probe_data->cxl->ctpio_cxl;
+ nic_data->pio_write_vi_base = pio_write_vi_base;
+
+ probe_data->cxl_pio_in_use = true;
+ } else
+#endif
+ {
+ /* Using legacy PIO BAR mapping */
nic_data->wc_membase = ioremap_wc(efx->membase_phys +
uc_mem_map_size,
wc_mem_map_size);
@@ -1279,12 +1335,14 @@ static int efx_ef10_dimension_resources(struct efx_nic *efx)
nic_data->wc_membase +
(pio_write_vi_base * efx->vi_stride + ER_DZ_TX_PIOBUF -
uc_mem_map_size);
-
- rc = efx_ef10_link_piobufs(efx);
- if (rc)
- efx_ef10_free_piobufs(efx);
}
+ rc = efx_ef10_link_piobufs(efx);
+ if (rc)
+ efx_ef10_free_piobufs(efx);
+
+skip_pio:
+
netif_dbg(efx, probe, efx->net_dev,
"memory BAR at %pa (virtual %p+%x UC, %p+%x WC)\n",
&efx->membase_phys, efx->membase, uc_mem_map_size,
diff --git a/drivers/net/ethernet/sfc/efx.h b/drivers/net/ethernet/sfc/efx.h
index 45e191686625..37fd1cf96582 100644
--- a/drivers/net/ethernet/sfc/efx.h
+++ b/drivers/net/ethernet/sfc/efx.h
@@ -237,4 +237,5 @@ static inline bool efx_rwsem_assert_write_locked(struct rw_semaphore *sem)
int efx_xdp_tx_buffers(struct efx_nic *efx, int n, struct xdp_frame **xdpfs,
bool flush);
+void efx_ef10_disable_piobufs(struct efx_nic *efx);
#endif /* EFX_EFX_H */
diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
index 9ca16b051d62..8ba1645cfed8 100644
--- a/drivers/net/ethernet/sfc/efx_cxl.c
+++ b/drivers/net/ethernet/sfc/efx_cxl.c
@@ -10,6 +10,7 @@
#include <cxl/cxl.h>
#include <cxl/pci.h>
#include "net_driver.h"
+#include "efx.h"
#include "efx_cxl.h"
#define EFX_CTPIO_BUFFER_SIZE SZ_256M
@@ -40,6 +41,7 @@ static int efx_cxl_map_region(void *data) {
static void efx_cxl_unmap_region(void *data) {
struct efx_probe_data *probe_data = data;
+ efx_ef10_disable_piobufs(&probe_data->efx);
probe_data->cxl_pio_initialised = false;
iounmap(probe_data->cxl->ctpio_cxl);
}
diff --git a/drivers/net/ethernet/sfc/net_driver.h b/drivers/net/ethernet/sfc/net_driver.h
index 3964b2c56609..bea4eecdf842 100644
--- a/drivers/net/ethernet/sfc/net_driver.h
+++ b/drivers/net/ethernet/sfc/net_driver.h
@@ -1207,6 +1207,7 @@ struct efx_cxl;
* @efx: Efx NIC details
* @cxl: details of related cxl objects
* @cxl_pio_initialised: cxl initialization outcome.
+ * @cxl_pio_in_use: PIO using CXL mapping
*/
struct efx_probe_data {
struct pci_dev *pci_dev;
@@ -1214,6 +1215,7 @@ struct efx_probe_data {
#ifdef CONFIG_SFC_CXL
struct efx_cxl *cxl;
bool cxl_pio_initialised;
+ bool cxl_pio_in_use;
#endif
};
diff --git a/drivers/net/ethernet/sfc/nic.h b/drivers/net/ethernet/sfc/nic.h
index ec3b2df43b68..7480f9995dfb 100644
--- a/drivers/net/ethernet/sfc/nic.h
+++ b/drivers/net/ethernet/sfc/nic.h
@@ -152,6 +152,8 @@ enum {
* %MC_CMD_GET_CAPABILITIES response)
* @datapath_caps2: Further Capabilities of datapath firmware (FLAGS2 field of
* %MC_CMD_GET_CAPABILITIES response)
+ * @datapath_caps3: Further Capabilities of datapath firmware (FLAGS3 field of
+ * %MC_CMD_GET_CAPABILITIES response)
* @rx_dpcpu_fw_id: Firmware ID of the RxDPCPU
* @tx_dpcpu_fw_id: Firmware ID of the TxDPCPU
* @must_probe_vswitching: Flag: vswitching has yet to be setup after MC reboot
@@ -187,6 +189,7 @@ struct efx_ef10_nic_data {
bool must_check_datapath_caps;
u32 datapath_caps;
u32 datapath_caps2;
+ u32 datapath_caps3;
unsigned int rx_dpcpu_fw_id;
unsigned int tx_dpcpu_fw_id;
bool must_probe_vswitching;
--
2.34.1
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH v26 0/8] Type2 device basic support
2026-04-23 18:05 [PATCH v26 0/8] Type2 device basic support alejandro.lucero-palau
` (7 preceding siblings ...)
2026-04-23 18:05 ` [PATCH v26 8/8] sfc: support pio mapping based on cxl alejandro.lucero-palau
@ 2026-04-23 22:07 ` Dave Jiang
8 siblings, 0 replies; 20+ messages in thread
From: Dave Jiang @ 2026-04-23 22:07 UTC (permalink / raw)
To: alejandro.lucero-palau, linux-cxl, djbw, edward.cree, davem, kuba,
pabeni, edumazet
Cc: Alejandro Lucero
On 4/23/26 11:05 AM, alejandro.lucero-palau@amd.com wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> I feel this version suits better a RFC based on v25 plus the changes
> introduced. It tries to address the problem of user space actions
> triggering accelerator region removal, and extending Dan's approach
> in here:
>
> https://lore.kernel.org/linux-cxl/20260403210050.1058650-1-dan.j.williams@intel.com/T/#m2d9e5c59ce4873641037f9bcd2f70d5679cda03c
>
> As discussed there, the protection needs to cover the accelerator driver
> handling of the region physical range for things like ioremap, and to
> offer a way for such driver to undo things or at least stop using that
> very same physical range if the region disappears. It ends up merging
> several of v25 patches into just one.
>
> Another change is linking the region release to the endpoint device and
> keep with Dan's autoremove. This makes unnecessary the previous cxl_exit
> and similar actions are now performed inside region detach callback.
So this would only work if the region is single device right? Which is fine for initial implementation. Maybe make a comment in the code that this is the assumption being made for the person one day needing to deal with the possibility of interleaving.
DJ
>
> I think the outcome is quite promising, lowering the accelerator driver
> complexity regarding region handling. The cost is those new callbacks
> which I think are totally necessary for doing things safely with the
> region by the driver.
>
> I have successfully tested the cases of cxl_acpi removal while sfc
> driver remaining, and cxl_mem unbinding sfc cxl memdev.
>
> v26 changes:
> - patch 6: it implements all the new functionlity described above.
> - patch 8: region detach callback invokes a new sfc function for safely
> stopping cxl-backed piobufs use.
>
> Alejandro Lucero (8):
> sfc: add cxl support
> cxl/sfc: Map cxl regs
> cxl/sfc: Initialize dpa without a mailbox
> cxl: Prepare memdev creation for type2
> sfc: create type2 cxl memdev
> cxl: attach region to an accelerator/type2 memdev
> cxl: Avoid dax creation for accelerators
> sfc: support pio mapping based on cxl
>
> drivers/cxl/core/core.h | 2 +
> drivers/cxl/core/mbox.c | 51 +---------
> drivers/cxl/core/memdev.c | 81 +++++++++++++++-
> drivers/cxl/core/pci.c | 1 +
> drivers/cxl/core/port.c | 1 +
> drivers/cxl/core/region.c | 125 ++++++++++++++++++++++++-
> drivers/cxl/core/regs.c | 1 +
> drivers/cxl/cxlmem.h | 6 --
> drivers/cxl/cxlpci.h | 12 ---
> drivers/cxl/mem.c | 45 ++++++---
> drivers/cxl/pci.c | 1 +
> drivers/net/ethernet/sfc/Kconfig | 9 ++
> drivers/net/ethernet/sfc/Makefile | 1 +
> drivers/net/ethernet/sfc/ef10.c | 72 +++++++++++++--
> drivers/net/ethernet/sfc/efx.c | 14 ++-
> drivers/net/ethernet/sfc/efx.h | 1 +
> drivers/net/ethernet/sfc/efx_cxl.c | 128 ++++++++++++++++++++++++++
> drivers/net/ethernet/sfc/efx_cxl.h | 31 +++++++
> drivers/net/ethernet/sfc/net_driver.h | 12 +++
> drivers/net/ethernet/sfc/nic.h | 3 +
> include/cxl/cxl.h | 25 +++++
> include/cxl/pci.h | 22 +++++
> 22 files changed, 552 insertions(+), 92 deletions(-)
> create mode 100644 drivers/net/ethernet/sfc/efx_cxl.c
> create mode 100644 drivers/net/ethernet/sfc/efx_cxl.h
> create mode 100644 include/cxl/pci.h
>
>
> base-commit: 6596a02b207886e9e00bb0161c7fd59fea53c081
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v26 1/8] sfc: add cxl support
2026-04-23 18:05 ` [PATCH v26 1/8] sfc: add cxl support alejandro.lucero-palau
@ 2026-04-29 21:14 ` Cheatham, Benjamin
2026-05-01 10:07 ` Alejandro Lucero Palau
0 siblings, 1 reply; 20+ messages in thread
From: Cheatham, Benjamin @ 2026-04-29 21:14 UTC (permalink / raw)
To: alejandro.lucero-palau, linux-cxl, djbw, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Jonathan Cameron, Edward Cree, Alison Schofield,
Dan Williams
On 4/23/2026 1:05 PM, alejandro.lucero-palau@amd.com wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Add CXL initialization based on new CXL API for accel drivers and make
> it dependent on kernel CXL configuration.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Acked-by: Edward Cree <ecree.xilinx@gmail.com>
> Reviewed-by: Alison Schofield <alison.schofield@intel.com>
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Code looks good, just some notes on the copyright notices and a tiny nit. With the copyright fixed:
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
[snip]
> diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
> new file mode 100644
> index 000000000000..b7e8d85a43d3
> --- /dev/null
> +++ b/drivers/net/ethernet/sfc/efx_cxl.c
> @@ -0,0 +1,52 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/****************************************************************************
> + *
> + * Driver for AMD network controllers and boards
> + * Copyright (C) 2025, Advanced Micro Devices, Inc.
I'm not a lawyer, but I'm pretty sure the year needs to be 2026 here.
> + */
> +
> +#include <linux/pci.h>
> +
> +#include "net_driver.h"
> +#include "efx_cxl.h"
> +
> +#define EFX_CTPIO_BUFFER_SIZE SZ_256M
> +
> +int efx_cxl_init(struct efx_probe_data *probe_data)
> +{
> + struct efx_nic *efx = &probe_data->efx;
> + struct pci_dev *pci_dev = efx->pci_dev;
> + struct efx_cxl *cxl;
> + u16 dvsec;
> +
> + probe_data->cxl_pio_initialised = false;
> +
> + /* Is the device configured with and using CXL? */
> + if (!pcie_is_cxl(pci_dev))
> + return 0;
> +
> + dvsec = pci_find_dvsec_capability(pci_dev, PCI_VENDOR_ID_CXL,
> + PCI_DVSEC_CXL_DEVICE);
> + if (!dvsec) {
> + pci_info(pci_dev, "CXL_DVSEC_PCIE_DEVICE capability not found\n");
> + return 0;
> + }
> +
> + pci_dbg(pci_dev, "CXL_DVSEC_PCIE_DEVICE capability found\n");
> +
> + /* Create a cxl_dev_state embedded in the cxl struct using cxl core api
> + * specifying no mbox available.
> + */
> + cxl = devm_cxl_dev_state_create(&pci_dev->dev, CXL_DEVTYPE_DEVMEM,
> + pci_get_dsn(pci_dev), dvsec,
> + struct efx_cxl, cxlds, false);
> +
> + if (!cxl)
> + return -ENOMEM;
> +
> + probe_data->cxl = cxl;
> +
> + return 0;
> +}
> +
> +MODULE_IMPORT_NS("CXL");
> diff --git a/drivers/net/ethernet/sfc/efx_cxl.h b/drivers/net/ethernet/sfc/efx_cxl.h
> new file mode 100644
> index 000000000000..04e46278464d
> --- /dev/null
> +++ b/drivers/net/ethernet/sfc/efx_cxl.h
> @@ -0,0 +1,29 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/****************************************************************************
> + * Driver for AMD network controllers and boards
> + * Copyright (C) 2025, Advanced Micro Devices, Inc.
Same thing here.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU General Public License version 2 as published
> + * by the Free Software Foundation, incorporated herein by reference.
> + */
> +
> +#ifndef EFX_CXL_H
> +#define EFX_CXL_H
> +
> +#ifdef CONFIG_SFC_CXL
> +
> +#include <cxl/cxl.h>
> +
> +struct efx_probe_data;
> +
> +struct efx_cxl {
> + struct cxl_dev_state cxlds;
> + struct cxl_memdev *cxlmd;
> +};
> +
> +int efx_cxl_init(struct efx_probe_data *probe_data);
> +#else
> +static inline int efx_cxl_init(struct efx_probe_data *probe_data) { return 0; }
> +#endif
> +#endif
> diff --git a/drivers/net/ethernet/sfc/net_driver.h b/drivers/net/ethernet/sfc/net_driver.h
> index b98c259f672d..3964b2c56609 100644
> --- a/drivers/net/ethernet/sfc/net_driver.h
> +++ b/drivers/net/ethernet/sfc/net_driver.h
> @@ -1197,14 +1197,24 @@ struct efx_nic {
> atomic_t n_rx_noskb_drops;
> };
>
> +#ifdef CONFIG_SFC_CXL
> +struct efx_cxl;
> +#endif
> +
> /**
> * struct efx_probe_data - State after hardware probe
> * @pci_dev: The PCI device
> * @efx: Efx NIC details
> + * @cxl: details of related cxl objects
> + * @cxl_pio_initialised: cxl initialization outcome.
Tiny nit: The description of the variable should use the english spelling as well (i.e. initialisation).
> */
> struct efx_probe_data {
> struct pci_dev *pci_dev;
> struct efx_nic efx;
> +#ifdef CONFIG_SFC_CXL
> + struct efx_cxl *cxl;
> + bool cxl_pio_initialised;
> +#endif
> };
>
> static inline struct efx_nic *efx_netdev_priv(struct net_device *dev)
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v26 6/8] cxl: attach region to an accelerator/type2 memdev
2026-04-23 18:05 ` [PATCH v26 6/8] cxl: attach region to an accelerator/type2 memdev alejandro.lucero-palau
@ 2026-04-29 21:14 ` Cheatham, Benjamin
2026-05-01 10:35 ` Alejandro Lucero Palau
2026-05-01 2:00 ` Dan Williams (nvidia)
1 sibling, 1 reply; 20+ messages in thread
From: Cheatham, Benjamin @ 2026-04-29 21:14 UTC (permalink / raw)
To: alejandro.lucero-palau, linux-cxl, djbw, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero
On 4/23/2026 1:05 PM, alejandro.lucero-palau@amd.com wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Support an accelerator driver to safely work with an autodiscovered
> region from a committed HDM decoder through:
>
> 1) an accelerator driver cxl_attach_region struct with attach
> and detach callbacks.
>
> 2) a specific function, cxl_memdev_attach_region() keeping the
> required locks for finding a region linked to the memdev
> endpoint, and
>
> 3) invoking attach callback while keeping the locking allowing to
> work (ioremap and other internal stuff) with the related physical
> range by the accelerator driver, and
>
> 4) linking a detach callback to the endpoint device removal where
> the accelerator driver can stop using the region range.
>
> This covers the cases of a potential removal of cxl_acpi module or a
> accelerator memdev unbinding from cxl_mem driver through sysfs.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> ---
> drivers/cxl/core/region.c | 118 ++++++++++++++++++++++++++++-
> drivers/net/ethernet/sfc/efx_cxl.c | 37 +++++++++
> drivers/net/ethernet/sfc/efx_cxl.h | 2 +
> include/cxl/cxl.h | 17 +++++
> 4 files changed, 171 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index e50dc716d4e8..68f5a1fd1b1c 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -2711,9 +2711,16 @@ static struct cxl_region *devm_cxl_add_region(struct cxl_root_decoder *cxlrd,
> if (rc)
> goto err;
>
> - rc = devm_add_action_or_reset(port->uport_dev, unregister_region, cxlr);
> - if (rc)
> - return ERR_PTR(rc);
> + /*
> + * For accelerators/type2, region release linked to endpoint device.
> + * See handling of cxl_endpoint_region_autoremove() below by
> + * cxl_memdev_attach_region().
> + */
> + if (type == CXL_DECODER_HOSTONLYMEM) {
> + rc = devm_add_action_or_reset(port->uport_dev, unregister_region, cxlr);
> + if (rc)
> + return ERR_PTR(rc);
> + }
>
> dev_dbg(port->uport_dev, "%s: created %s\n",
> dev_name(&cxlrd->cxlsd.cxld.dev), dev_name(dev));
> @@ -4043,6 +4050,111 @@ static int cxl_region_can_probe(struct cxl_region *cxlr)
> return 0;
> }
>
> +static int first_mapped_decoder(struct device *dev, const void *data)
> +{
> + struct cxl_endpoint_decoder *cxled;
> +
> + if (!is_endpoint_decoder(dev))
> + return 0;
> +
> + cxled = to_cxl_endpoint_decoder(dev);
> + if (cxled->cxld.region)
> + return 1;
> +
> + return 0;
> +}
> +
> +/*
> + * As this is running in endpoint port remove context it does not race cxl_root
> + * destruction since port topologies are always removed depth first.
> + */
> +static void cxl_endpoint_region_autoremove(void *_cxlr)
> +{
> + unregister_region(_cxlr);
> +}
> +
> +/**
> + * cxl_memdev_attach_region - bind region to accelerator memdev
> + *
> + * @cxlmd: a pointer to cxl_memdev to use
> + * @attach: a pointer to region attach struct with callbacks for
> + * safely working with a region range by the caller
> + *
> + * Returns 0 or error.
> + */
> +int cxl_memdev_attach_region(struct cxl_memdev *cxlmd,
> + struct cxl_attach_region *attach)
> +{
> + struct cxl_port *endpoint = cxlmd->endpoint;
> + struct cxl_endpoint_decoder *cxled;
> + struct cxl_region *cxlr;
> + int rc;
> +
> + /* hold endpoint lock to setup autoremove of the region */
> + guard(device)(&endpoint->dev);
> + if (!endpoint->dev.driver)
> + return -ENXIO;
> + guard(rwsem_read)(&cxl_rwsem.region);
> + guard(rwsem_read)(&cxl_rwsem.dpa);
> +
> + /*
> + * TODO auto-instantiate a region, for now assume this will find an
> + * auto-region
> + */
> + struct device *dev __free(put_device) =
> + device_find_child(&endpoint->dev, NULL, first_mapped_decoder);
> +
> + if (!dev) {
> + dev_dbg(cxlmd->cxlds->dev, "no region found for memdev %s\n",
> + dev_name(&cxlmd->dev));
> + return -ENXIO;
> + }
> +
> + cxled = to_cxl_endpoint_decoder(dev);
> + cxlr = cxled->cxld.region;
> +
> + if (cxlr->params.state < CXL_CONFIG_COMMIT) {
> + dev_dbg(cxlmd->cxlds->dev,
> + "region %s not committed for memdev %s\n",
> + dev_name(&cxlr->dev), dev_name(&cxlmd->dev));
> + return -ENXIO;
> + }
> +
> + if (cxlr->params.nr_targets > 1) {
> + dev_dbg(cxlmd->cxlds->dev,
> + "Only attach to local non-interleaved region\n");
> + return -ENXIO;
> + }
> +
> + attach->region = (struct range) {
> + .start = cxlr->params.res->start,
> + .end = cxlr->params.res->end,
> + };
> +
> + /*
> + * With endpoint locked leave the caller to safely work with the region
> + * range.
> + */
This reads a bit weirdly, how about: "Let the caller do its region-specific setup"? I
don't think it's important to mention the endpoint being locked here.
> + rc = attach->attach(attach->data);
> + if (rc)
> + return rc;
> +
> + /* Only teardown regions that pass validation, ignore the rest */
> + rc = devm_add_action_or_reset(&endpoint->dev,
> + cxl_endpoint_region_autoremove, cxlr);
> + if (rc)
> + return rc;
> +
> + /* Link type2 driver callback for stopping use of the region range. */
> + rc = devm_add_action_or_reset(&endpoint->dev,
> + attach->detach, attach->data);
> + if (rc)
> + return rc;
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_memdev_attach_region, "CXL");
> +
> static int cxl_region_probe(struct device *dev)
> {
> struct cxl_region *cxlr = to_cxl_region(dev);
> diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
> index 7d8a6c2133c8..9ca16b051d62 100644
> --- a/drivers/net/ethernet/sfc/efx_cxl.c
> +++ b/drivers/net/ethernet/sfc/efx_cxl.c
> @@ -14,6 +14,36 @@
>
> #define EFX_CTPIO_BUFFER_SIZE SZ_256M
>
> +/* Called with cxl endpoint device locked for precluding potential related
> + * cxl region removal triggered from user space, allowing safely mapping of
> + * such cxl region by the sfc driver.
> + */
> +static int efx_cxl_map_region(void *data) {
> + struct efx_probe_data *probe_data = data;
> + struct efx_nic *efx = &probe_data->efx;
> + struct pci_dev *pci_dev = efx->pci_dev;
> + struct efx_cxl *cxl = probe_data->cxl;
> + struct range *cxl_pio_range = &cxl->attach_region.region;
> +
> + cxl->ctpio_cxl = ioremap(cxl_pio_range->start,
> + cxl_pio_range->end - cxl_pio_range->start + 1);
> + if (!cxl->ctpio_cxl) {
> + pci_err(pci_dev, "CXL ioremap region (%pra) failed\n",
> + cxl_pio_range);
> + return -ENOMEM;
> + }
> + probe_data->cxl_pio_initialised = true;
> + return 0;
> +}
> +
> +/* Called at driver exit or when user space triggers cxl region removal. */
> +static void efx_cxl_unmap_region(void *data) {
> + struct efx_probe_data *probe_data = data;
> +
> + probe_data->cxl_pio_initialised = false;
> + iounmap(probe_data->cxl->ctpio_cxl);
> +}
> +
> int efx_cxl_init(struct efx_probe_data *probe_data)
> {
> struct efx_nic *efx = &probe_data->efx;
> @@ -81,6 +111,13 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
> return PTR_ERR(cxl->cxlmd);
> }
>
> + probe_data->cxl->attach_region.attach = efx_cxl_map_region;
> + probe_data->cxl->attach_region.detach = efx_cxl_unmap_region;
> + probe_data->cxl->attach_region.data = probe_data;
> +
> + cxl_memdev_attach_region(cxl->cxlmd,
> + &probe_data->cxl->attach_region);
I would think you want to check the return code here, no? I looked at the last patch
and don't think this will cause issues, but I would at least expect some debug output
here.
> +
> probe_data->cxl = cxl;
>
> return 0;
> diff --git a/drivers/net/ethernet/sfc/efx_cxl.h b/drivers/net/ethernet/sfc/efx_cxl.h
> index 04e46278464d..1c294cd1df56 100644
> --- a/drivers/net/ethernet/sfc/efx_cxl.h
> +++ b/drivers/net/ethernet/sfc/efx_cxl.h
> @@ -20,6 +20,8 @@ struct efx_probe_data;
> struct efx_cxl {
> struct cxl_dev_state cxlds;
> struct cxl_memdev *cxlmd;
> + struct cxl_attach_region attach_region;
> + void __iomem *ctpio_cxl;
> };
>
> int efx_cxl_init(struct efx_probe_data *probe_data);
> diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
> index 10a9b8fa2f6b..22d9435b351f 100644
> --- a/include/cxl/cxl.h
> +++ b/include/cxl/cxl.h
> @@ -7,6 +7,7 @@
>
> #include <linux/node.h>
> #include <linux/ioport.h>
> +#include <linux/pci.h>
> #include <cxl/mailbox.h>
>
> /**
> @@ -153,6 +154,20 @@ struct cxl_memdev_attach {
> int (*probe)(struct cxl_memdev *cxlmd);
> };
>
> +/**
> + * struct cxl_attach_region - accelerator region handling
> + * @attach: invoked at cxl_memdev_attach_region() with endpoint device locked.
> + * @detach: invoked at endpoint release.
> + * @data: pointer referencing accelerator data for attach and detach calls.
> + * @region: initialised with autodiscovered region values linked to memdev.
> + */
> +struct cxl_attach_region {
> + int (*attach)(void *);
> + void (*detach)(void *);
> + void *data;
> + struct range region;
> +};
I like your approach here, but have a problem with this. It seems like an attempt to circumvent the cxl_memdev_attach
struct above, which makes it borderline useless (as of now). Would you be opposed to folding it into cxl_memdev_attach
instead? I'm thinking doing something like:
/**
* struct cxl_memdev_attach - CXL memdev driver and region attach handlers
*
* @mem_probe: Callback used during cxl_mem::probe for memdev specific setup
* @region_attach: Device specific setup invoked during cxl_memdev_attach_region() with endpoint device locked
* @region_detach: Device specific cleanup invoked at endpoint release
* @region_data: Pointer passed to region_attach and region_detach calls
* @region: Memory range covered by autodiscovered region linked to memdev
*/
struct cxl_memdev_attach {
int (*mem_probe)(struct cxl_memdev *cxlmd);
int (*region_attach)(void *data);
int (*region_detach)(void *data);
void *region_data;
struct range region;
};
and then replace cxl_attach_region with cxl_memdev_attach accordingly. I think that's a 3-way win: You get
what you want to do, Dan's work isn't invalidated, and cxl_memdev_attach becomes a one-stop shop for all
CXL accelerator attach shenanigans.
> +
> /**
> * struct cxl_dev_state - The driver device state
> *
> @@ -231,4 +246,6 @@ struct cxl_dev_state *_devm_cxl_dev_state_create(struct device *dev,
> int cxl_set_capacity(struct cxl_dev_state *cxlds, u64 capacity);
> struct cxl_memdev *devm_cxl_add_memdev(struct cxl_dev_state *cxlds,
> const struct cxl_memdev_attach *attach);
> +struct cxl_region;
> +int cxl_memdev_attach_region(struct cxl_memdev *cxlmd, struct cxl_attach_region *attach);
> #endif /* __CXL_CXL_H__ */
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v26 7/8] cxl: Avoid dax creation for accelerators
2026-04-23 18:05 ` [PATCH v26 7/8] cxl: Avoid dax creation for accelerators alejandro.lucero-palau
@ 2026-04-29 21:14 ` Cheatham, Benjamin
0 siblings, 0 replies; 20+ messages in thread
From: Cheatham, Benjamin @ 2026-04-29 21:14 UTC (permalink / raw)
To: alejandro.lucero-palau, linux-cxl, djbw, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Jonathan Cameron, Davidlohr Bueso
On 4/23/2026 1:05 PM, alejandro.lucero-palau@amd.com wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> By definition a type2 cxl device will use the host managed memory for
> specific functionality, therefore it should not be available to other
> uses like DAX.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Davidlohr Bueso <daves@stgolabs.net>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
> ---
> drivers/cxl/core/region.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 68f5a1fd1b1c..3af2aed6c6f1 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -4165,6 +4165,13 @@ static int cxl_region_probe(struct device *dev)
> if (rc)
> return rc;
>
> + /*
> + * HDM-D[B] (device-memory) regions have accelerator specific usage.
> + * Skip device-dax registration.
> + */
> + if (cxlr->type == CXL_DECODER_DEVMEM)
> + return 0;
This should probably go before the call to cxl_region_can_probe() to avoid
taking the cxl_rwsem. Not really a deal-breaker but would be a good optimization.
Thanks,
Ben
> +
> /*
> * From this point on any path that changes the region's state away from
> * CXL_CONFIG_COMMIT is also responsible for releasing the driver.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v26 4/8] cxl: Prepare memdev creation for type2
2026-04-23 18:05 ` [PATCH v26 4/8] cxl: Prepare memdev creation for type2 alejandro.lucero-palau
@ 2026-04-30 23:23 ` Dan Williams (nvidia)
0 siblings, 0 replies; 20+ messages in thread
From: Dan Williams (nvidia) @ 2026-04-30 23:23 UTC (permalink / raw)
To: alejandro.lucero-palau, linux-cxl, djbw, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero, Ben Cheatham, Jonathan Cameron,
Alison Schofield
alejandro.lucero-palau@ wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Current cxl core is relying on a CXL_DEVTYPE_CLASSMEM type device when
> creating a memdev leading to problems when obtaining cxl_memdev_state
> references from a CXL_DEVTYPE_DEVMEM type.
>
> Modify check for obtaining cxl_memdev_state adding CXL_DEVTYPE_DEVMEM
> support.
>
> Make devm_cxl_add_memdev accessible from an accel driver.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Reviewed-by: Alison Schofield <alison.schofield@intel.com>
> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
This did not address the comment I had on the last posting about not
explicitly calling out accelerators.
https://lore.kernel.org/all/69cb4379e2b73_1b0cc610085@dwillia2-mobl4.notmuch/
So a small naming change to make this more palatable because the current
"cxl_memdev_type" is a superset of all possible CXL memdevs. In other
words, PCI_CLASS_MEMORY_CXL, requires the presence of a mailbox. Other
devices with CXL.mem may not be "accelerators".
See below a proposal to flip the polarity of the naming:
> ---
> drivers/cxl/core/memdev.c | 15 +++++++++++--
> drivers/cxl/cxlmem.h | 6 ------
> drivers/cxl/mem.c | 45 +++++++++++++++++++++++++++++----------
> include/cxl/cxl.h | 6 ++++++
> 4 files changed, 53 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> index b8ff1c07e528..ef39803bb139 100644
> --- a/drivers/cxl/core/memdev.c
> +++ b/drivers/cxl/core/memdev.c
> @@ -7,6 +7,7 @@
> #include <linux/slab.h>
> #include <linux/idr.h>
> #include <linux/pci.h>
> +#include <cxl/cxl.h>
> #include <cxlmem.h>
> #include "trace.h"
> #include "core.h"
> @@ -579,9 +580,16 @@ static const struct device_type cxl_memdev_type = {
> .groups = cxl_memdev_attribute_groups,
> };
>
> +static const struct device_type cxl_accel_memdev_type = {
> + .name = "cxl_accel_memdev",
> + .release = cxl_memdev_release,
> + .devnode = cxl_memdev_devnode,
> +};
/*
* CXL memory device with mandatory PCI_CLASS_MEMORY_CXL functions
* like a mailbox and corresponding sysfs ABI.
*/
static const struct device_type cxl_class_memdev_type = {
.name = "cxl_memdev",
.release = cxl_memdev_release,
.devnode = cxl_memdev_devnode,
.groups = cxl_memdev_attribute_groups,
};
/* CXL memory device with no class device attributes */
static const struct device_type cxl_memdev_type = {
.name = "cxl_memdev",
.release = cxl_memdev_release,
.devnode = cxl_memdev_devnode,
};
That ".name" can be the same between the types because to userspace they
are fundamentally the same type of object. The distinction can be
determined by walking sysfs to see what is there.
I do not want userspace to start keying off of the DEVTYPE uevent
variable distinction unless we have a really good reason for that.
> bool is_cxl_memdev(const struct device *dev)
> {
> - return dev->type == &cxl_memdev_type;
> + return (dev->type == &cxl_memdev_type ||
> + dev->type == &cxl_accel_memdev_type);
> }
> EXPORT_SYMBOL_NS_GPL(is_cxl_memdev, "CXL");
>
> @@ -776,7 +784,10 @@ static struct cxl_memdev *cxl_memdev_alloc(struct cxl_dev_state *cxlds,
> dev->parent = cxlds->dev;
> dev->bus = &cxl_bus_type;
> dev->devt = MKDEV(cxl_mem_major, cxlmd->id);
> - dev->type = &cxl_memdev_type;
> + if (cxlds->type == CXL_DEVTYPE_DEVMEM)
> + dev->type = &cxl_accel_memdev_type;
> + else
> + dev->type = &cxl_memdev_type;
> device_set_pm_not_required(dev);
> INIT_WORK(&cxlmd->detach_work, detach_memdev);
>
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 776c50d1db51..92cca400d113 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -34,10 +34,6 @@
> (FIELD_GET(CXLMDEV_RESET_NEEDED_MASK, status) != \
> CXLMDEV_RESET_NEEDED_NOT)
>
> -struct cxl_memdev_attach {
> - int (*probe)(struct cxl_memdev *cxlmd);
> -};
> -
> /**
> * struct cxl_memdev - CXL bus object representing a Type-3 Memory Device
> * @dev: driver core device object
> @@ -103,8 +99,6 @@ static inline bool is_cxl_endpoint(struct cxl_port *port)
>
> struct cxl_memdev *__devm_cxl_add_memdev(struct cxl_dev_state *cxlds,
> const struct cxl_memdev_attach *attach);
> -struct cxl_memdev *devm_cxl_add_memdev(struct cxl_dev_state *cxlds,
> - const struct cxl_memdev_attach *attach);
One of the dirty aspects of my RFC was continuing with the idea that
drivers external to CXL would need to worry about all the details around
how to construct cxl_memdev_attach.
So similar to above I think we want to have a
devm_cxl_add_class_memdev() that only the cxl_pci driver uses, and a
specific export for the single-memdev-to-single-region case.
devm_cxl_add
I will move some of that into the rework of unregister_region() I am
doing to solve the sysfs region deletion race. It is a similar problem.
> int devm_cxl_sanitize_setup_notifier(struct device *host,
> struct cxl_memdev *cxlmd);
> struct cxl_memdev_state;
> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> index fcffe24dcb42..ff858318091f 100644
> --- a/drivers/cxl/mem.c
> +++ b/drivers/cxl/mem.c
> @@ -65,6 +65,26 @@ static int cxl_debugfs_poison_clear(void *data, u64 dpa)
> DEFINE_DEBUGFS_ATTRIBUTE(cxl_poison_clear_fops, NULL,
> cxl_debugfs_poison_clear, "%llx\n");
>
> +static void cxl_memdev_poison_enable(struct cxl_memdev_state *mds,
> + struct cxl_memdev *cxlmd,
> + struct dentry *dentry)
> +{
> + /*
> + * Avoid poison debugfs for DEVMEM aka accelerators as they rely on
> + * cxl_memdev_state.
> + */
"Disable poison debugfs for memdevs without mailboxes."
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v26 6/8] cxl: attach region to an accelerator/type2 memdev
2026-04-23 18:05 ` [PATCH v26 6/8] cxl: attach region to an accelerator/type2 memdev alejandro.lucero-palau
2026-04-29 21:14 ` Cheatham, Benjamin
@ 2026-05-01 2:00 ` Dan Williams (nvidia)
2026-05-01 10:59 ` Alejandro Lucero Palau
1 sibling, 1 reply; 20+ messages in thread
From: Dan Williams (nvidia) @ 2026-05-01 2:00 UTC (permalink / raw)
To: alejandro.lucero-palau, linux-cxl, djbw, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Cc: Alejandro Lucero
alejandro.lucero-palau@ wrote:
> From: Alejandro Lucero <alucerop@amd.com>
>
> Support an accelerator driver to safely work with an autodiscovered
> region from a committed HDM decoder through:
>
> 1) an accelerator driver cxl_attach_region struct with attach
> and detach callbacks.
>
> 2) a specific function, cxl_memdev_attach_region() keeping the
> required locks for finding a region linked to the memdev
> endpoint, and
>
> 3) invoking attach callback while keeping the locking allowing to
> work (ioremap and other internal stuff) with the related physical
> range by the accelerator driver, and
>
> 4) linking a detach callback to the endpoint device removal where
> the accelerator driver can stop using the region range.
>
> This covers the cases of a potential removal of cxl_acpi module or a
> accelerator memdev unbinding from cxl_mem driver through sysfs.
>
> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
> ---
> drivers/cxl/core/region.c | 118 ++++++++++++++++++++++++++++-
> drivers/net/ethernet/sfc/efx_cxl.c | 37 +++++++++
> drivers/net/ethernet/sfc/efx_cxl.h | 2 +
> include/cxl/cxl.h | 17 +++++
> 4 files changed, 171 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index e50dc716d4e8..68f5a1fd1b1c 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -2711,9 +2711,16 @@ static struct cxl_region *devm_cxl_add_region(struct cxl_root_decoder *cxlrd,
> if (rc)
> goto err;
>
> - rc = devm_add_action_or_reset(port->uport_dev, unregister_region, cxlr);
> - if (rc)
> - return ERR_PTR(rc);
> + /*
> + * For accelerators/type2, region release linked to endpoint device.
> + * See handling of cxl_endpoint_region_autoremove() below by
> + * cxl_memdev_attach_region().
> + */
> + if (type == CXL_DECODER_HOSTONLYMEM) {
> + rc = devm_add_action_or_reset(port->uport_dev, unregister_region, cxlr);
> + if (rc)
> + return ERR_PTR(rc);
> + }
A couple problems here.
1/ Nothing stops a CXL class device from implementing a decoder with
CXL_DECODER_DEVMEM (HDM-DB).
2/ This breaks the automatic cleanup of autoassembly failures in
construct_region().
We simply need to support multiple independent sources of
unregister_region(). Stay tuned for a scheme for that.
>
> dev_dbg(port->uport_dev, "%s: created %s\n",
> dev_name(&cxlrd->cxlsd.cxld.dev), dev_name(dev));
> @@ -4043,6 +4050,111 @@ static int cxl_region_can_probe(struct cxl_region *cxlr)
> return 0;
> }
>
> +static int first_mapped_decoder(struct device *dev, const void *data)
> +{
> + struct cxl_endpoint_decoder *cxled;
> +
> + if (!is_endpoint_decoder(dev))
> + return 0;
> +
> + cxled = to_cxl_endpoint_decoder(dev);
> + if (cxled->cxld.region)
> + return 1;
> +
> + return 0;
> +}
> +
> +/*
> + * As this is running in endpoint port remove context it does not race cxl_root
> + * destruction since port topologies are always removed depth first.
> + */
> +static void cxl_endpoint_region_autoremove(void *_cxlr)
> +{
> + unregister_region(_cxlr);
> +}
> +
> +/**
> + * cxl_memdev_attach_region - bind region to accelerator memdev
> + *
> + * @cxlmd: a pointer to cxl_memdev to use
> + * @attach: a pointer to region attach struct with callbacks for
> + * safely working with a region range by the caller
> + *
> + * Returns 0 or error.
> + */
> +int cxl_memdev_attach_region(struct cxl_memdev *cxlmd,
> + struct cxl_attach_region *attach)
> +{
> + struct cxl_port *endpoint = cxlmd->endpoint;
> + struct cxl_endpoint_decoder *cxled;
> + struct cxl_region *cxlr;
> + int rc;
> +
> + /* hold endpoint lock to setup autoremove of the region */
> + guard(device)(&endpoint->dev);
This does not handle the case when ->endpoint is an ERR_PTR() because
the memdev never attached in the first instance.
> +/* Called at driver exit or when user space triggers cxl region removal. */
> +static void efx_cxl_unmap_region(void *data) {
> + struct efx_probe_data *probe_data = data;
> +
> + probe_data->cxl_pio_initialised = false;
> + iounmap(probe_data->cxl->ctpio_cxl);
> +}
I do not see how an async event can safely zap that ctpio_cxl space with
zero coordination with the driver, and I do not think you want to burden
the fast path with new locks to coordinate this.
Can we please stick with the violent but simple "unload driver" approach
for now? Someone removing cxl_acpi, disabling port drivers, or disabling
the cxl_mem driver gets to keep all the pieces. Just like force
unloading your storage driver underneath your root filesystem. Do not do
it unless you want to see the fireworks or test various hotplug flows.
This graceful handling of something that should never happen, outside of a
test suite exercising CXL core object lifetimes, is not a near term need.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v26 1/8] sfc: add cxl support
2026-04-29 21:14 ` Cheatham, Benjamin
@ 2026-05-01 10:07 ` Alejandro Lucero Palau
0 siblings, 0 replies; 20+ messages in thread
From: Alejandro Lucero Palau @ 2026-05-01 10:07 UTC (permalink / raw)
To: Cheatham, Benjamin, alejandro.lucero-palau, linux-cxl, djbw,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
Cc: Jonathan Cameron, Edward Cree, Alison Schofield, Dan Williams
On 4/29/26 22:14, Cheatham, Benjamin wrote:
> On 4/23/2026 1:05 PM, alejandro.lucero-palau@amd.com wrote:
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> Add CXL initialization based on new CXL API for accel drivers and make
>> it dependent on kernel CXL configuration.
>>
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> Acked-by: Edward Cree <ecree.xilinx@gmail.com>
>> Reviewed-by: Alison Schofield <alison.schofield@intel.com>
>> Reviewed-by: Dan Williams <dan.j.williams@intel.com>
>> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Code looks good, just some notes on the copyright notices and a tiny nit. With the copyright fixed:
>
> Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Thanks. I'll fix those minor issues.
>
> [snip]
>
>> diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
>> new file mode 100644
>> index 000000000000..b7e8d85a43d3
>> --- /dev/null
>> +++ b/drivers/net/ethernet/sfc/efx_cxl.c
>> @@ -0,0 +1,52 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/****************************************************************************
>> + *
>> + * Driver for AMD network controllers and boards
>> + * Copyright (C) 2025, Advanced Micro Devices, Inc.
> I'm not a lawyer, but I'm pretty sure the year needs to be 2026 here.
>> + */
>> +
>> +#include <linux/pci.h>
>> +
>> +#include "net_driver.h"
>> +#include "efx_cxl.h"
>> +
>> +#define EFX_CTPIO_BUFFER_SIZE SZ_256M
>> +
>> +int efx_cxl_init(struct efx_probe_data *probe_data)
>> +{
>> + struct efx_nic *efx = &probe_data->efx;
>> + struct pci_dev *pci_dev = efx->pci_dev;
>> + struct efx_cxl *cxl;
>> + u16 dvsec;
>> +
>> + probe_data->cxl_pio_initialised = false;
>> +
>> + /* Is the device configured with and using CXL? */
>> + if (!pcie_is_cxl(pci_dev))
>> + return 0;
>> +
>> + dvsec = pci_find_dvsec_capability(pci_dev, PCI_VENDOR_ID_CXL,
>> + PCI_DVSEC_CXL_DEVICE);
>> + if (!dvsec) {
>> + pci_info(pci_dev, "CXL_DVSEC_PCIE_DEVICE capability not found\n");
>> + return 0;
>> + }
>> +
>> + pci_dbg(pci_dev, "CXL_DVSEC_PCIE_DEVICE capability found\n");
>> +
>> + /* Create a cxl_dev_state embedded in the cxl struct using cxl core api
>> + * specifying no mbox available.
>> + */
>> + cxl = devm_cxl_dev_state_create(&pci_dev->dev, CXL_DEVTYPE_DEVMEM,
>> + pci_get_dsn(pci_dev), dvsec,
>> + struct efx_cxl, cxlds, false);
>> +
>> + if (!cxl)
>> + return -ENOMEM;
>> +
>> + probe_data->cxl = cxl;
>> +
>> + return 0;
>> +}
>> +
>> +MODULE_IMPORT_NS("CXL");
>> diff --git a/drivers/net/ethernet/sfc/efx_cxl.h b/drivers/net/ethernet/sfc/efx_cxl.h
>> new file mode 100644
>> index 000000000000..04e46278464d
>> --- /dev/null
>> +++ b/drivers/net/ethernet/sfc/efx_cxl.h
>> @@ -0,0 +1,29 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +/****************************************************************************
>> + * Driver for AMD network controllers and boards
>> + * Copyright (C) 2025, Advanced Micro Devices, Inc.
> Same thing here.
>
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms of the GNU General Public License version 2 as published
>> + * by the Free Software Foundation, incorporated herein by reference.
>> + */
>> +
>> +#ifndef EFX_CXL_H
>> +#define EFX_CXL_H
>> +
>> +#ifdef CONFIG_SFC_CXL
>> +
>> +#include <cxl/cxl.h>
>> +
>> +struct efx_probe_data;
>> +
>> +struct efx_cxl {
>> + struct cxl_dev_state cxlds;
>> + struct cxl_memdev *cxlmd;
>> +};
>> +
>> +int efx_cxl_init(struct efx_probe_data *probe_data);
>> +#else
>> +static inline int efx_cxl_init(struct efx_probe_data *probe_data) { return 0; }
>> +#endif
>> +#endif
>> diff --git a/drivers/net/ethernet/sfc/net_driver.h b/drivers/net/ethernet/sfc/net_driver.h
>> index b98c259f672d..3964b2c56609 100644
>> --- a/drivers/net/ethernet/sfc/net_driver.h
>> +++ b/drivers/net/ethernet/sfc/net_driver.h
>> @@ -1197,14 +1197,24 @@ struct efx_nic {
>> atomic_t n_rx_noskb_drops;
>> };
>>
>> +#ifdef CONFIG_SFC_CXL
>> +struct efx_cxl;
>> +#endif
>> +
>> /**
>> * struct efx_probe_data - State after hardware probe
>> * @pci_dev: The PCI device
>> * @efx: Efx NIC details
>> + * @cxl: details of related cxl objects
>> + * @cxl_pio_initialised: cxl initialization outcome.
> Tiny nit: The description of the variable should use the english spelling as well (i.e. initialisation).
>
>> */
>> struct efx_probe_data {
>> struct pci_dev *pci_dev;
>> struct efx_nic efx;
>> +#ifdef CONFIG_SFC_CXL
>> + struct efx_cxl *cxl;
>> + bool cxl_pio_initialised;
>> +#endif
>> };
>>
>> static inline struct efx_nic *efx_netdev_priv(struct net_device *dev)
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v26 6/8] cxl: attach region to an accelerator/type2 memdev
2026-04-29 21:14 ` Cheatham, Benjamin
@ 2026-05-01 10:35 ` Alejandro Lucero Palau
0 siblings, 0 replies; 20+ messages in thread
From: Alejandro Lucero Palau @ 2026-05-01 10:35 UTC (permalink / raw)
To: Cheatham, Benjamin, alejandro.lucero-palau, linux-cxl, djbw,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
On 4/29/26 22:14, Cheatham, Benjamin wrote:
> On 4/23/2026 1:05 PM, alejandro.lucero-palau@amd.com wrote:
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> Support an accelerator driver to safely work with an autodiscovered
>> region from a committed HDM decoder through:
>>
>> 1) an accelerator driver cxl_attach_region struct with attach
>> and detach callbacks.
>>
>> 2) a specific function, cxl_memdev_attach_region() keeping the
>> required locks for finding a region linked to the memdev
>> endpoint, and
>>
>> 3) invoking attach callback while keeping the locking allowing to
>> work (ioremap and other internal stuff) with the related physical
>> range by the accelerator driver, and
>>
>> 4) linking a detach callback to the endpoint device removal where
>> the accelerator driver can stop using the region range.
>>
>> This covers the cases of a potential removal of cxl_acpi module or a
>> accelerator memdev unbinding from cxl_mem driver through sysfs.
>>
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>> ---
>> drivers/cxl/core/region.c | 118 ++++++++++++++++++++++++++++-
>> drivers/net/ethernet/sfc/efx_cxl.c | 37 +++++++++
>> drivers/net/ethernet/sfc/efx_cxl.h | 2 +
>> include/cxl/cxl.h | 17 +++++
>> 4 files changed, 171 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
>> index e50dc716d4e8..68f5a1fd1b1c 100644
>> --- a/drivers/cxl/core/region.c
>> +++ b/drivers/cxl/core/region.c
>> @@ -2711,9 +2711,16 @@ static struct cxl_region *devm_cxl_add_region(struct cxl_root_decoder *cxlrd,
>> if (rc)
>> goto err;
>>
>> - rc = devm_add_action_or_reset(port->uport_dev, unregister_region, cxlr);
>> - if (rc)
>> - return ERR_PTR(rc);
>> + /*
>> + * For accelerators/type2, region release linked to endpoint device.
>> + * See handling of cxl_endpoint_region_autoremove() below by
>> + * cxl_memdev_attach_region().
>> + */
>> + if (type == CXL_DECODER_HOSTONLYMEM) {
>> + rc = devm_add_action_or_reset(port->uport_dev, unregister_region, cxlr);
>> + if (rc)
>> + return ERR_PTR(rc);
>> + }
>>
>> dev_dbg(port->uport_dev, "%s: created %s\n",
>> dev_name(&cxlrd->cxlsd.cxld.dev), dev_name(dev));
>> @@ -4043,6 +4050,111 @@ static int cxl_region_can_probe(struct cxl_region *cxlr)
>> return 0;
>> }
>>
>> +static int first_mapped_decoder(struct device *dev, const void *data)
>> +{
>> + struct cxl_endpoint_decoder *cxled;
>> +
>> + if (!is_endpoint_decoder(dev))
>> + return 0;
>> +
>> + cxled = to_cxl_endpoint_decoder(dev);
>> + if (cxled->cxld.region)
>> + return 1;
>> +
>> + return 0;
>> +}
>> +
>> +/*
>> + * As this is running in endpoint port remove context it does not race cxl_root
>> + * destruction since port topologies are always removed depth first.
>> + */
>> +static void cxl_endpoint_region_autoremove(void *_cxlr)
>> +{
>> + unregister_region(_cxlr);
>> +}
>> +
>> +/**
>> + * cxl_memdev_attach_region - bind region to accelerator memdev
>> + *
>> + * @cxlmd: a pointer to cxl_memdev to use
>> + * @attach: a pointer to region attach struct with callbacks for
>> + * safely working with a region range by the caller
>> + *
>> + * Returns 0 or error.
>> + */
>> +int cxl_memdev_attach_region(struct cxl_memdev *cxlmd,
>> + struct cxl_attach_region *attach)
>> +{
>> + struct cxl_port *endpoint = cxlmd->endpoint;
>> + struct cxl_endpoint_decoder *cxled;
>> + struct cxl_region *cxlr;
>> + int rc;
>> +
>> + /* hold endpoint lock to setup autoremove of the region */
>> + guard(device)(&endpoint->dev);
>> + if (!endpoint->dev.driver)
>> + return -ENXIO;
>> + guard(rwsem_read)(&cxl_rwsem.region);
>> + guard(rwsem_read)(&cxl_rwsem.dpa);
>> +
>> + /*
>> + * TODO auto-instantiate a region, for now assume this will find an
>> + * auto-region
>> + */
>> + struct device *dev __free(put_device) =
>> + device_find_child(&endpoint->dev, NULL, first_mapped_decoder);
>> +
>> + if (!dev) {
>> + dev_dbg(cxlmd->cxlds->dev, "no region found for memdev %s\n",
>> + dev_name(&cxlmd->dev));
>> + return -ENXIO;
>> + }
>> +
>> + cxled = to_cxl_endpoint_decoder(dev);
>> + cxlr = cxled->cxld.region;
>> +
>> + if (cxlr->params.state < CXL_CONFIG_COMMIT) {
>> + dev_dbg(cxlmd->cxlds->dev,
>> + "region %s not committed for memdev %s\n",
>> + dev_name(&cxlr->dev), dev_name(&cxlmd->dev));
>> + return -ENXIO;
>> + }
>> +
>> + if (cxlr->params.nr_targets > 1) {
>> + dev_dbg(cxlmd->cxlds->dev,
>> + "Only attach to local non-interleaved region\n");
>> + return -ENXIO;
>> + }
>> +
>> + attach->region = (struct range) {
>> + .start = cxlr->params.res->start,
>> + .end = cxlr->params.res->end,
>> + };
>> +
>> + /*
>> + * With endpoint locked leave the caller to safely work with the region
>> + * range.
>> + */
> This reads a bit weirdly, how about: "Let the caller do its region-specific setup"? I
> don't think it's important to mention the endpoint being locked here.
I wanted to emphasize the safety of such attachment regarding the
problem of concurrent unwinding triggered from user space.
>> + rc = attach->attach(attach->data);
>> + if (rc)
>> + return rc;
>> +
>> + /* Only teardown regions that pass validation, ignore the rest */
>> + rc = devm_add_action_or_reset(&endpoint->dev,
>> + cxl_endpoint_region_autoremove, cxlr);
>> + if (rc)
>> + return rc;
>> +
>> + /* Link type2 driver callback for stopping use of the region range. */
>> + rc = devm_add_action_or_reset(&endpoint->dev,
>> + attach->detach, attach->data);
>> + if (rc)
>> + return rc;
>> +
>> + return 0;
>> +}
>> +EXPORT_SYMBOL_NS_GPL(cxl_memdev_attach_region, "CXL");
>> +
>> static int cxl_region_probe(struct device *dev)
>> {
>> struct cxl_region *cxlr = to_cxl_region(dev);
>> diff --git a/drivers/net/ethernet/sfc/efx_cxl.c b/drivers/net/ethernet/sfc/efx_cxl.c
>> index 7d8a6c2133c8..9ca16b051d62 100644
>> --- a/drivers/net/ethernet/sfc/efx_cxl.c
>> +++ b/drivers/net/ethernet/sfc/efx_cxl.c
>> @@ -14,6 +14,36 @@
>>
>> #define EFX_CTPIO_BUFFER_SIZE SZ_256M
>>
>> +/* Called with cxl endpoint device locked for precluding potential related
>> + * cxl region removal triggered from user space, allowing safely mapping of
>> + * such cxl region by the sfc driver.
>> + */
>> +static int efx_cxl_map_region(void *data) {
>> + struct efx_probe_data *probe_data = data;
>> + struct efx_nic *efx = &probe_data->efx;
>> + struct pci_dev *pci_dev = efx->pci_dev;
>> + struct efx_cxl *cxl = probe_data->cxl;
>> + struct range *cxl_pio_range = &cxl->attach_region.region;
>> +
>> + cxl->ctpio_cxl = ioremap(cxl_pio_range->start,
>> + cxl_pio_range->end - cxl_pio_range->start + 1);
>> + if (!cxl->ctpio_cxl) {
>> + pci_err(pci_dev, "CXL ioremap region (%pra) failed\n",
>> + cxl_pio_range);
>> + return -ENOMEM;
>> + }
>> + probe_data->cxl_pio_initialised = true;
>> + return 0;
>> +}
>> +
>> +/* Called at driver exit or when user space triggers cxl region removal. */
>> +static void efx_cxl_unmap_region(void *data) {
>> + struct efx_probe_data *probe_data = data;
>> +
>> + probe_data->cxl_pio_initialised = false;
>> + iounmap(probe_data->cxl->ctpio_cxl);
>> +}
>> +
>> int efx_cxl_init(struct efx_probe_data *probe_data)
>> {
>> struct efx_nic *efx = &probe_data->efx;
>> @@ -81,6 +111,13 @@ int efx_cxl_init(struct efx_probe_data *probe_data)
>> return PTR_ERR(cxl->cxlmd);
>> }
>>
>> + probe_data->cxl->attach_region.attach = efx_cxl_map_region;
>> + probe_data->cxl->attach_region.detach = efx_cxl_unmap_region;
>> + probe_data->cxl->attach_region.data = probe_data;
>> +
>> + cxl_memdev_attach_region(cxl->cxlmd,
>> + &probe_data->cxl->attach_region);
> I would think you want to check the return code here, no? I looked at the last patch
> and don't think this will cause issues, but I would at least expect some debug output
> here.
Of course, I forgot to do so. RFC hastiness ...
>> +
>> probe_data->cxl = cxl;
>>
>> return 0;
>> diff --git a/drivers/net/ethernet/sfc/efx_cxl.h b/drivers/net/ethernet/sfc/efx_cxl.h
>> index 04e46278464d..1c294cd1df56 100644
>> --- a/drivers/net/ethernet/sfc/efx_cxl.h
>> +++ b/drivers/net/ethernet/sfc/efx_cxl.h
>> @@ -20,6 +20,8 @@ struct efx_probe_data;
>> struct efx_cxl {
>> struct cxl_dev_state cxlds;
>> struct cxl_memdev *cxlmd;
>> + struct cxl_attach_region attach_region;
>> + void __iomem *ctpio_cxl;
>> };
>>
>> int efx_cxl_init(struct efx_probe_data *probe_data);
>> diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
>> index 10a9b8fa2f6b..22d9435b351f 100644
>> --- a/include/cxl/cxl.h
>> +++ b/include/cxl/cxl.h
>> @@ -7,6 +7,7 @@
>>
>> #include <linux/node.h>
>> #include <linux/ioport.h>
>> +#include <linux/pci.h>
>> #include <cxl/mailbox.h>
>>
>> /**
>> @@ -153,6 +154,20 @@ struct cxl_memdev_attach {
>> int (*probe)(struct cxl_memdev *cxlmd);
>> };
>>
>> +/**
>> + * struct cxl_attach_region - accelerator region handling
>> + * @attach: invoked at cxl_memdev_attach_region() with endpoint device locked.
>> + * @detach: invoked at endpoint release.
>> + * @data: pointer referencing accelerator data for attach and detach calls.
>> + * @region: initialised with autodiscovered region values linked to memdev.
>> + */
>> +struct cxl_attach_region {
>> + int (*attach)(void *);
>> + void (*detach)(void *);
>> + void *data;
>> + struct range region;
>> +};
> I like your approach here,
Thank you.
> but have a problem with this. It seems like an attempt to circumvent the cxl_memdev_attach
> struct above, which makes it borderline useless (as of now). Would you be opposed to folding it into cxl_memdev_attach
> instead? I'm thinking doing something like:
>
> /**
> * struct cxl_memdev_attach - CXL memdev driver and region attach handlers
> *
> * @mem_probe: Callback used during cxl_mem::probe for memdev specific setup
> * @region_attach: Device specific setup invoked during cxl_memdev_attach_region() with endpoint device locked
> * @region_detach: Device specific cleanup invoked at endpoint release
> * @region_data: Pointer passed to region_attach and region_detach calls
> * @region: Memory range covered by autodiscovered region linked to memdev
> */
> struct cxl_memdev_attach {
> int (*mem_probe)(struct cxl_memdev *cxlmd);
> int (*region_attach)(void *data);
> int (*region_detach)(void *data);
> void *region_data;
> struct range region;
> };
>
> and then replace cxl_attach_region with cxl_memdev_attach accordingly. I think that's a 3-way win: You get
> what you want to do, Dan's work isn't invalidated, and cxl_memdev_attach becomes a one-stop shop for all
> CXL accelerator attach shenanigans.
I would prefer to have such mem_probe aka attach by DanW decoupled from
region attachment.
First of all, because they target different situations, endpoint not
being ready in the first case, dealing with a potential committed
decoder and auto discovered region in the second.
Also, the mem_probe thing is not necessary for our device and for any
other one connected to one CXL Root Bridge as I have been saying since
Dan added that option (or waiting for another reason behind its
necessity). I agree it could be necessary for other drivers, so happy to
have it ... for getting the memdev. Note the current patchset version
does only cover the case of a region already there, but as the effort
had since first version, there will be a need for getting a region where
the caller can decide the size based on not only available CFMWS but
whatever the driver considers knowing the amount available and the own
driver configuration. Once you have DRAM in the device, I envision cases
where the amount of memory used for CXL purposes being only part of it
because the device/driver configuration decides so, then using some
portion of that memory for other internal purposes.
>> +
>> /**
>> * struct cxl_dev_state - The driver device state
>> *
>> @@ -231,4 +246,6 @@ struct cxl_dev_state *_devm_cxl_dev_state_create(struct device *dev,
>> int cxl_set_capacity(struct cxl_dev_state *cxlds, u64 capacity);
>> struct cxl_memdev *devm_cxl_add_memdev(struct cxl_dev_state *cxlds,
>> const struct cxl_memdev_attach *attach);
>> +struct cxl_region;
>> +int cxl_memdev_attach_region(struct cxl_memdev *cxlmd, struct cxl_attach_region *attach);
>> #endif /* __CXL_CXL_H__ */
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v26 6/8] cxl: attach region to an accelerator/type2 memdev
2026-05-01 2:00 ` Dan Williams (nvidia)
@ 2026-05-01 10:59 ` Alejandro Lucero Palau
2026-05-02 0:46 ` Dan Williams (nvidia)
0 siblings, 1 reply; 20+ messages in thread
From: Alejandro Lucero Palau @ 2026-05-01 10:59 UTC (permalink / raw)
To: Dan Williams (nvidia), alejandro.lucero-palau, linux-cxl,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
On 5/1/26 03:00, Dan Williams (nvidia) wrote:
> alejandro.lucero-palau@ wrote:
>> From: Alejandro Lucero <alucerop@amd.com>
>>
>> Support an accelerator driver to safely work with an autodiscovered
>> region from a committed HDM decoder through:
>>
>> 1) an accelerator driver cxl_attach_region struct with attach
>> and detach callbacks.
>>
>> 2) a specific function, cxl_memdev_attach_region() keeping the
>> required locks for finding a region linked to the memdev
>> endpoint, and
>>
>> 3) invoking attach callback while keeping the locking allowing to
>> work (ioremap and other internal stuff) with the related physical
>> range by the accelerator driver, and
>>
>> 4) linking a detach callback to the endpoint device removal where
>> the accelerator driver can stop using the region range.
>>
>> This covers the cases of a potential removal of cxl_acpi module or a
>> accelerator memdev unbinding from cxl_mem driver through sysfs.
>>
>> Signed-off-by: Alejandro Lucero <alucerop@amd.com>
>> ---
>> drivers/cxl/core/region.c | 118 ++++++++++++++++++++++++++++-
>> drivers/net/ethernet/sfc/efx_cxl.c | 37 +++++++++
>> drivers/net/ethernet/sfc/efx_cxl.h | 2 +
>> include/cxl/cxl.h | 17 +++++
>> 4 files changed, 171 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
>> index e50dc716d4e8..68f5a1fd1b1c 100644
>> --- a/drivers/cxl/core/region.c
>> +++ b/drivers/cxl/core/region.c
>> @@ -2711,9 +2711,16 @@ static struct cxl_region *devm_cxl_add_region(struct cxl_root_decoder *cxlrd,
>> if (rc)
>> goto err;
>>
>> - rc = devm_add_action_or_reset(port->uport_dev, unregister_region, cxlr);
>> - if (rc)
>> - return ERR_PTR(rc);
>> + /*
>> + * For accelerators/type2, region release linked to endpoint device.
>> + * See handling of cxl_endpoint_region_autoremove() below by
>> + * cxl_memdev_attach_region().
>> + */
>> + if (type == CXL_DECODER_HOSTONLYMEM) {
>> + rc = devm_add_action_or_reset(port->uport_dev, unregister_region, cxlr);
>> + if (rc)
>> + return ERR_PTR(rc);
>> + }
> A couple problems here.
>
> 1/ Nothing stops a CXL class device from implementing a decoder with
> CXL_DECODER_DEVMEM (HDM-DB).
Uhmmm...
>
> 2/ This breaks the automatic cleanup of autoassembly failures in
> construct_region().
I did not see this potential problem. Looking at it.
> We simply need to support multiple independent sources of
> unregister_region(). Stay tuned for a scheme for that.
>
I will.
>>
>> dev_dbg(port->uport_dev, "%s: created %s\n",
>> dev_name(&cxlrd->cxlsd.cxld.dev), dev_name(dev));
>> @@ -4043,6 +4050,111 @@ static int cxl_region_can_probe(struct cxl_region *cxlr)
>> return 0;
>> }
>>
>> +static int first_mapped_decoder(struct device *dev, const void *data)
>> +{
>> + struct cxl_endpoint_decoder *cxled;
>> +
>> + if (!is_endpoint_decoder(dev))
>> + return 0;
>> +
>> + cxled = to_cxl_endpoint_decoder(dev);
>> + if (cxled->cxld.region)
>> + return 1;
>> +
>> + return 0;
>> +}
>> +
>> +/*
>> + * As this is running in endpoint port remove context it does not race cxl_root
>> + * destruction since port topologies are always removed depth first.
>> + */
>> +static void cxl_endpoint_region_autoremove(void *_cxlr)
>> +{
>> + unregister_region(_cxlr);
>> +}
>> +
>> +/**
>> + * cxl_memdev_attach_region - bind region to accelerator memdev
>> + *
>> + * @cxlmd: a pointer to cxl_memdev to use
>> + * @attach: a pointer to region attach struct with callbacks for
>> + * safely working with a region range by the caller
>> + *
>> + * Returns 0 or error.
>> + */
>> +int cxl_memdev_attach_region(struct cxl_memdev *cxlmd,
>> + struct cxl_attach_region *attach)
>> +{
>> + struct cxl_port *endpoint = cxlmd->endpoint;
>> + struct cxl_endpoint_decoder *cxled;
>> + struct cxl_region *cxlr;
>> + int rc;
>> +
>> + /* hold endpoint lock to setup autoremove of the region */
>> + guard(device)(&endpoint->dev);
> This does not handle the case when ->endpoint is an ERR_PTR() because
> the memdev never attached in the first instance.
Not sure about this but, is it not the success of devm_cxl_add_memdev()
ensuring this can not happen?
Note I decouple region attach from memdev creation. No memdev, no call
to this function.
>> +/* Called at driver exit or when user space triggers cxl region removal. */
>> +static void efx_cxl_unmap_region(void *data) {
>> + struct efx_probe_data *probe_data = data;
>> +
>> + probe_data->cxl_pio_initialised = false;
>> + iounmap(probe_data->cxl->ctpio_cxl);
>> +}
> I do not see how an async event can safely zap that ctpio_cxl space with
> zero coordination with the driver, and I do not think you want to burden
> the fast path with new locks to coordinate this.
You should look patch 8 where your concern is hopefully tackled. I need
to test this further, but there is no need of additional locks.
>
> Can we please stick with the violent but simple "unload driver" approach
> for now? Someone removing cxl_acpi, disabling port drivers, or disabling
> the cxl_mem driver gets to keep all the pieces. Just like force
> unloading your storage driver underneath your root filesystem. Do not do
> it unless you want to see the fireworks or test various hotplug flows.
I have to disagree. The use of CXL is only part of the datapath. The
driver can keep going without CXL. The related buffers can be used until
the sync between the efx_ef10_disable_piobufs() added in patch 8 and the
CXL CTPIO datapath.
Although the option of unloading the driver is possible, I do not think
CXL should decide what to do here when there exists another option.
Thank you,
Alejandro
>
> This graceful handling of something that should never happen, outside of a
> test suite exercising CXL core object lifetimes, is not a near term need.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v26 6/8] cxl: attach region to an accelerator/type2 memdev
2026-05-01 10:59 ` Alejandro Lucero Palau
@ 2026-05-02 0:46 ` Dan Williams (nvidia)
2026-05-05 20:51 ` Alejandro Lucero Palau
0 siblings, 1 reply; 20+ messages in thread
From: Dan Williams (nvidia) @ 2026-05-02 0:46 UTC (permalink / raw)
To: Alejandro Lucero Palau, Dan Williams (nvidia),
alejandro.lucero-palau, linux-cxl, edward.cree, davem, kuba,
pabeni, edumazet, dave.jiang
Alejandro Lucero Palau wrote:
[..]
> >> + }
> > A couple problems here.
> >
> > 1/ Nothing stops a CXL class device from implementing a decoder with
> > CXL_DECODER_DEVMEM (HDM-DB).
>
> Uhmmm...
Consider a helpful expander that does not require the host OS to use
cpu_cache_invalidate_memregion() whenever DPA space is changed. I
imagine that would be useful for DCD where there could be a higher
frequency of extent changes.
> >> + /* hold endpoint lock to setup autoremove of the region */
> >> + guard(device)(&endpoint->dev);
> > This does not handle the case when ->endpoint is an ERR_PTR() because
> > the memdev never attached in the first instance.
>
> Not sure about this but, is it not the success of devm_cxl_add_memdev()
> ensuring this can not happen?
That is only ensured by using the "attach" mechanism.
devm_cxl_add_memdev(..., NULL) is only for the generic memory expander
case. Where the entire usage model is governed by memdev ABIs.
> Note I decouple region attach from memdev creation. No memdev, no call
> to this function.
Which leads to this problem.
> >> +/* Called at driver exit or when user space triggers cxl region removal. */
> >> +static void efx_cxl_unmap_region(void *data) {
> >> + struct efx_probe_data *probe_data = data;
> >> +
> >> + probe_data->cxl_pio_initialised = false;
> >> + iounmap(probe_data->cxl->ctpio_cxl);
> >> +}
> > I do not see how an async event can safely zap that ctpio_cxl space with
> > zero coordination with the driver, and I do not think you want to burden
> > the fast path with new locks to coordinate this.
>
> You should look patch 8 where your concern is hopefully tackled. I need
> to test this further, but there is no need of additional locks.
Please do not structure patches with bugs in the middle of the series.
It burns reviewer resources.
> > Can we please stick with the violent but simple "unload driver" approach
> > for now? Someone removing cxl_acpi, disabling port drivers, or disabling
> > the cxl_mem driver gets to keep all the pieces. Just like force
> > unloading your storage driver underneath your root filesystem. Do not do
> > it unless you want to see the fireworks or test various hotplug flows.
>
>
> I have to disagree. The use of CXL is only part of the datapath. The
> driver can keep going without CXL. The related buffers can be used until
> the sync between the efx_ef10_disable_piobufs() added in patch 8 and the
> CXL CTPIO datapath.
>
>
> Although the option of unloading the driver is possible, I do not think
> CXL should decide what to do here when there exists another option.
The concern is creeping complexity. There is no such concept in existing
Linux drivers where an MMIO mapping disappears out from underneath a
running driver. It may start returning all 0xff and fire an error
interrupt, but the driver does not need to worry about responding to
async unmap.
All drivers must already be prepared to be unloaded. So we start with
the simple semantic first to get this functionality landed and then
think about adding sophistication like live fallback to PCI operation.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v26 6/8] cxl: attach region to an accelerator/type2 memdev
2026-05-02 0:46 ` Dan Williams (nvidia)
@ 2026-05-05 20:51 ` Alejandro Lucero Palau
0 siblings, 0 replies; 20+ messages in thread
From: Alejandro Lucero Palau @ 2026-05-05 20:51 UTC (permalink / raw)
To: Dan Williams (nvidia), alejandro.lucero-palau, linux-cxl,
edward.cree, davem, kuba, pabeni, edumazet, dave.jiang
On 5/2/26 01:46, Dan Williams (nvidia) wrote:
> Alejandro Lucero Palau wrote:
> [..]
>>>> + }
>>> A couple problems here.
>>>
>>> 1/ Nothing stops a CXL class device from implementing a decoder with
>>> CXL_DECODER_DEVMEM (HDM-DB).
>> Uhmmm...
> Consider a helpful expander that does not require the host OS to use
> cpu_cache_invalidate_memregion() whenever DPA space is changed. I
> imagine that would be useful for DCD where there could be a higher
> frequency of extent changes.
What do you want here then? We had a param for specifically asking for
creating a dax device for the region, but it was considered unnecessary,
and I think it was you then. Should I put it back?
>>>> + /* hold endpoint lock to setup autoremove of the region */
>>>> + guard(device)(&endpoint->dev);
>>> This does not handle the case when ->endpoint is an ERR_PTR() because
>>> the memdev never attached in the first instance.
>> Not sure about this but, is it not the success of devm_cxl_add_memdev()
>> ensuring this can not happen?
> That is only ensured by using the "attach" mechanism.
> devm_cxl_add_memdev(..., NULL) is only for the generic memory expander
> case. Where the entire usage model is governed by memdev ABIs.
>
Is your concern that the memdev probe could not happen synchronously so
a further call like the one implemented here could fail due to the
endpoint not there yet?
If so, I do not think that is possible with cxl_mem driver using
PROBE_FORCE_SYNCHRONOUS. It can happen for the case of the device
connected to a cxl switch port which has not been initialised yet, as
you described to me in the LPC when I asked for a case for needing the
memdev attach call.
>> Note I decouple region attach from memdev creation. No memdev, no call
>> to this function.
> Which leads to this problem.
If I am right with my previous assertion, which problem do you refer to
here?
>
>>>> +/* Called at driver exit or when user space triggers cxl region removal. */
>>>> +static void efx_cxl_unmap_region(void *data) {
>>>> + struct efx_probe_data *probe_data = data;
>>>> +
>>>> + probe_data->cxl_pio_initialised = false;
>>>> + iounmap(probe_data->cxl->ctpio_cxl);
>>>> +}
>>> I do not see how an async event can safely zap that ctpio_cxl space with
>>> zero coordination with the driver, and I do not think you want to burden
>>> the fast path with new locks to coordinate this.
>> You should look patch 8 where your concern is hopefully tackled. I need
>> to test this further, but there is no need of additional locks.
> Please do not structure patches with bugs in the middle of the series.
> It burns reviewer resources.
Could you be more specific about the bug?
I'm introducing a detach callback here which will do things necessary up
to this point/patch in the driver functionality. In patch 8 the detach
code is extended to unwind the use of ctpio buffers which are not used
until that same patch.
>>> Can we please stick with the violent but simple "unload driver" approach
>>> for now? Someone removing cxl_acpi, disabling port drivers, or disabling
>>> the cxl_mem driver gets to keep all the pieces. Just like force
>>> unloading your storage driver underneath your root filesystem. Do not do
>>> it unless you want to see the fireworks or test various hotplug flows.
>>
>> I have to disagree. The use of CXL is only part of the datapath. The
>> driver can keep going without CXL. The related buffers can be used until
>> the sync between the efx_ef10_disable_piobufs() added in patch 8 and the
>> CXL CTPIO datapath.
>>
>>
>> Although the option of unloading the driver is possible, I do not think
>> CXL should decide what to do here when there exists another option.
> The concern is creeping complexity. There is no such concept in existing
> Linux drivers where an MMIO mapping disappears out from underneath a
> running driver. It may start returning all 0xff and fire an error
> interrupt, but the driver does not need to worry about responding to
> async unmap.
>
> All drivers must already be prepared to be unloaded. So we start with
> the simple semantic first to get this functionality landed and then
> think about adding sophistication like live fallback to PCI operation.
Ok. Let's try this one. You want to trigger device_release_driver or
something similar on the pci_dev->dev linked to the memdev. Right?
Do we have this support now? If we do not, have you evaluated the
complexity required for ensuring no deadlocks if this is triggered while
the sfc driver is still probing?
Maybe I'm overthinking this option, so if you have a clear idea about
how to do this, please tell me, assuming it is a matter of calling such
a pci_dev "unbinding" from its current driver.
^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2026-05-05 20:53 UTC | newest]
Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-23 18:05 [PATCH v26 0/8] Type2 device basic support alejandro.lucero-palau
2026-04-23 18:05 ` [PATCH v26 1/8] sfc: add cxl support alejandro.lucero-palau
2026-04-29 21:14 ` Cheatham, Benjamin
2026-05-01 10:07 ` Alejandro Lucero Palau
2026-04-23 18:05 ` [PATCH v26 2/8] cxl/sfc: Map cxl regs alejandro.lucero-palau
2026-04-23 18:05 ` [PATCH v26 3/8] cxl/sfc: Initialize dpa without a mailbox alejandro.lucero-palau
2026-04-23 18:05 ` [PATCH v26 4/8] cxl: Prepare memdev creation for type2 alejandro.lucero-palau
2026-04-30 23:23 ` Dan Williams (nvidia)
2026-04-23 18:05 ` [PATCH v26 5/8] sfc: create type2 cxl memdev alejandro.lucero-palau
2026-04-23 18:05 ` [PATCH v26 6/8] cxl: attach region to an accelerator/type2 memdev alejandro.lucero-palau
2026-04-29 21:14 ` Cheatham, Benjamin
2026-05-01 10:35 ` Alejandro Lucero Palau
2026-05-01 2:00 ` Dan Williams (nvidia)
2026-05-01 10:59 ` Alejandro Lucero Palau
2026-05-02 0:46 ` Dan Williams (nvidia)
2026-05-05 20:51 ` Alejandro Lucero Palau
2026-04-23 18:05 ` [PATCH v26 7/8] cxl: Avoid dax creation for accelerators alejandro.lucero-palau
2026-04-29 21:14 ` Cheatham, Benjamin
2026-04-23 18:05 ` [PATCH v26 8/8] sfc: support pio mapping based on cxl alejandro.lucero-palau
2026-04-23 22:07 ` [PATCH v26 0/8] Type2 device basic support Dave Jiang
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox