* [PATCH v4 0/5] cxl: Support Back-Invalidate
@ 2026-06-05 0:33 Davidlohr Bueso
2026-06-05 0:33 ` [PATCH 1/5] cxl: Add BI register probing and port initialization Davidlohr Bueso
` (5 more replies)
0 siblings, 6 replies; 15+ messages in thread
From: Davidlohr Bueso @ 2026-06-05 0:33 UTC (permalink / raw)
To: dave.jiang, djbw
Cc: jic23, benjamin.cheatham, alucerop, alison.schofield, gourry,
dongjoo.seo1, dave, linux-cxl
Hello,
Changes from v3 (https://lore.kernel.org/all/20260516022037.1761453-1-dave@stgolabs.net/):
- In patch 1, map BI Decoder of root port/DSP at dport creation time
in cxl_port_add_dport() (Ben).
- In patch 2, BI rt/decoder commit logic to be a function instead
of a macro approach. (Dave)
- Patch 4 is reworked to have endpoint decoders use the coherency
model based on the root decoder's cfmw restrictions (Dan).
- Folded patch 5 into patch 2 (Dave)
- Moved 'nr_bi' counter from port to dport (Sashiko).
- Added a new patch 5 for a trivial flag rename.
This series is the initial plumbing to enable HDM-DB in Linux. The
model allows Type 2 and Type 3 devices to expose their local memory
to the host CPU in a coherent manner. In alignment with what was
discussed at 2024 LPC type2 support session, this series takes the
Type 3 memory expander approach, which is more direct. Further,
afaik there is no Type 2 + BI hardware out there.
A flagship use case of Type 3 + BI is coherent shared memory, and
there is currently a big gap in this regard (ie: GFAM). Another
is P2P via PCIe UIO, which is also lacking today. Media Operation
(4402h) for ranged sanitize/zero also trigger snoops and there are
some patches out there[0]. As such this series focuses on BI enablement
in terms of discovery and configuration.
The idea is for Type 3 memdevs and future Type 2 devices to make use
of cxlds->bi when committing HDM decoders, such that different device
coherence models can be differentiated as:
type2 hdm-d: cxlds->type == CXL_DEVTYPE_DEVMEM && cxlds->bi == false
type2 hdm-db: cxlds->type == CXL_DEVTYPE_DEVMEM && cxlds->bi == true
type3 hdm-h: cxlds->type == CXL_DEVTYPE_CLASSMEM && cxlds->bi == false
type3 hdm-db: cxlds->type == CXL_DEVTYPE_CLASSMEM && cxlds->bi == true
Because ->bi becoming true does not depend on auto-committing upon HDM
decoder enumeration (port driver), for now an already-committed BI
decoder during enumeration is unsupported and the driver errors out.
o Patch 1 probes BI capabilities during register discovery, and maps BI
Decoder registers.
o Patch 2 implements the BI-ID allocation (and deallocation).
o Patch 3 adds coherency support for endpoint decoders based on what the
root decoder restricts.
o Patch 4 adds support for HDM-DB region creation.
o Patch 5 trivially renames some coherency flags.
Testing has been done with the qemu HDM-DB Type 3 counterpart + this series:
https://lore.kernel.org/all/20260602231806.2469011-1-dave@stgolabs.net/
1. HDM Decoder with BI through ad-hoc region creation.
------------------------------------------------------
# cxl list -D
[
{
"decoder":"decoder0.0",
"resource":4563402752,
"size":10737418240,
"interleave_ways":1,
"accelmem_capable":true,
"nr_targets":1
}
]
# ROOT=/sys/bus/cxl/devices/decoder0.0
# EP=/sys/bus/cxl/devices/decoder2.0
# echo region0 > $ROOT/create_ram_region
# REG=/sys/bus/cxl/devices/region0
# echo ram > $EP/mode
# echo 0x40000000 > $EP/dpa_size
# echo 1 > $REG/interleave_ways
# echo 4096 > $REG/interleave_granularity
# echo 0x40000000 > $REG/size
# cat $EP/bi
0
# echo decoder2.0 > $REG/target0
[ 53.495224] cxl_core:cxl_port_attach_region:1290: cxl region0: mem0:endpoint2 decoder2.0 add: mem0:decoder2.0 @ 0 next: none nr_eps:1
[ 53.497758] cxl_core:cxl_port_attach_region:1290: cxl region0: pci0000:0c:port1 decoder1.0 add: mem0:decoder2.0 @ 0 next: mem0 nr_ep1
[ 53.499892] cxl_core:cxl_port_setup_targets:1608: cxl region0: pci0000:0c:port1 iw: 1 ig: 4096
[ 53.501233] cxl_core:cxl_port_setup_targets:1633: cxl region0: pci0000:0c:port1 target[0] = 0000:0c:00.0 for mem0:decoder2.0 @ 0
[ 53.502985] cxl_core:cxl_calc_interleave_pos:2003: cxl_mem mem0: decoder:decoder2.0 parent:0000:0d:00.0 port:endpoint2 range:0x110000
[ 53.504895] cxl_core:cxl_region_attach:2226: cxl decoder2.0: Test cxl_calc_interleave_pos(): success test_pos:0 cxled->pos:0
# cat $EP/bi
1
# echo 1 > $REG/commit
# cxl list -D
[
{
"root decoders":[
{
"decoder":"decoder0.0",
"resource":4563402752,
"size":10737418240,
"interleave_ways":1,
"accelmem_capable":true,
"nr_targets":1
}
]
},
{
"port decoders":[
{
"decoder":"decoder1.0",
"resource":4563402752,
"size":1073741824,
"interleave_ways":1,
"region":"region0",
"nr_targets":1
}
]
},
{
"endpoint decoders":[
{
"decoder":"decoder2.0",
"resource":4563402752,
"size":1073741824,
"interleave_ways":1,
"region":"region0",
"dpa_resource":0,
"dpa_size":1073741824,
"mode":"ram"
}
]
}
]
2. Type3 device does not support BI, but CFMW has BI restriction
----------------------------------------------------------------
# echo decoder2.0 > $REG/target0
[ 104.557797] cxl region0: mem0:decoder2.0 BI not enabled on device
[ 104.561014] cxl_port endpoint2: failed to attach decoder2.0 to region0: -6
3. Type3 device supports BI but CFMW does not, create HDM-H region
------------------------------------------------------------------
# cxl list -D
[
{
"decoder":"decoder0.0",
"resource":4563402752,
"size":10737418240,
"interleave_ways":1,
"max_available_extent":10737418240,
"pmem_capable":true,
"volatile_capable":true,
"nr_targets":1
}
]
# echo decoder2.0 > $REG/target0
[ 124.712027] cxl_core:cxl_port_attach_region:1290: cxl region0: mem0:endpoint2 decoder2.0 add: mem0:decoder2.0 @ 0 next: none nr_eps:1
[ 124.714329] cxl_core:cxl_port_attach_region:1290: cxl region0: pci0000:0c:port1 decoder1.0 add: mem0:decoder2.0 @ 0 next: mem0 nr_ep1
[ 124.716407] cxl_core:cxl_port_setup_targets:1608: cxl region0: pci0000:0c:port1 iw: 1 ig: 4096
[ 124.718269] cxl_core:cxl_port_setup_targets:1633: cxl region0: pci0000:0c:port1 target[0] = 0000:0c:00.0 for mem0:decoder2.0 @ 0
[ 124.722076] cxl_core:cxl_calc_interleave_pos:2003: cxl_mem mem0: decoder:decoder2.0 parent:0000:0d:00.0 port:endpoint2 range:0x110000
[ 124.724454] cxl_core:cxl_region_attach:2226: cxl decoder2.0: Test cxl_calc_interleave_pos(): success test_pos:0 cxled->pos:0
# cxl list -D
[
{
"root decoders":[
{
"decoder":"decoder0.0",
"resource":4563402752,
"size":10737418240,
"interleave_ways":1,
"max_available_extent":9663676416,
"pmem_capable":true,
"volatile_capable":true,
"nr_targets":1
}
]
},
{
"port decoders":[
{
"decoder":"decoder1.0",
"resource":4563402752,
"size":1073741824,
"interleave_ways":1,
"region":"region0",
"nr_targets":1
}
]
},
{
"endpoint decoders":[
{
"decoder":"decoder2.0",
"resource":4563402752,
"size":1073741824,
"interleave_ways":1,
"region":"region0",
"dpa_resource":0,
"dpa_size":1073741824,
"mode":"ram"
}
]
}
]
3. Unbind + Bind
----------------
# echo mem0 > /sys/bus/cxl/drivers/cxl_mem/unbind
[ 64.553865] cxl_core:__cxl_bi_ctrl_endpoint:1163: cxl_pci 0000:0d:00.0: BI requests disabled
[ 64.557464] cxl_core:cxl_port_commit_reap:898: cxl_port port1: reap: decoder1.0 commit_end: -1
[ 64.559529] cxl_core:cxl_port_commit_reap:898: cxl_port endpoint2: reap: decoder2.0 commit_end: -1
[ 64.563576] cxl_core:cxl_detach_ep:1541: cxl_mem mem0: disconnect mem0 from port1
echo mem0 > /sys/bus/cxl/drivers/cxl_mem/bind
[ 5073.975585] cxl_core:devm_cxl_enumerate_ports:1856: cxl_mem mem0: scan: iter: mem0 dport_dev: 0000:0c:00.0 parent: pci0000:0c
[ 5073.978125] cxl_core:devm_cxl_enumerate_ports:1862: cxl_mem mem0: found already registered port port1:pci0000:0c
[ 5073.980013] cxl_core:cxl_port_alloc:752: cxl_mem mem0: host-bridge: pci0000:0c
[ 5073.982601] cxl_core:cxl_cdat_get_length:488: cxl_port endpoint2: CDAT length 160
[ 5073.988634] cxl_core:cxl_port_perf_data_calculate:207: cxl_port endpoint2: Failed to retrieve ep perf coordinates.
[ 5073.991824] cxl_core:cxl_endpoint_parse_cdat:423: cxl_port endpoint2: Failed to do perf coord calculations.
[ 5074.016377] cxl_core:init_hdm_decoder:1112: cxl_port endpoint2: decoder2.0: range: 0x0-0xffffffffffffffff iw: 1 ig: 4096
[ 5074.021643] cxl_core:add_hdm_decoder:39: cxl_mem mem0: decoder2.0 added to endpoint2
[ 5074.023072] cxl_core:init_hdm_decoder:1112: cxl_port endpoint2: decoder2.1: range: 0x0-0xffffffffffffffff iw: 1 ig: 256
[ 5074.025527] cxl_core:add_hdm_decoder:39: cxl_mem mem0: decoder2.1 added to endpoint2
[ 5074.027751] cxl_core:init_hdm_decoder:1112: cxl_port endpoint2: decoder2.2: range: 0x0-0xffffffffffffffff iw: 1 ig: 256
[ 5074.032182] cxl_core:add_hdm_decoder:39: cxl_mem mem0: decoder2.2 added to endpoint2
[ 5074.033535] cxl_core:init_hdm_decoder:1112: cxl_port endpoint2: decoder2.3: range: 0x0-0xffffffffffffffff iw: 1 ig: 256
[ 5074.035690] cxl_core:add_hdm_decoder:39: cxl_mem mem0: decoder2.3 added to endpoint2
[ 5074.036904] cxl_core:cxl_bus_probe:2281: cxl_port endpoint2: probe: 0
[ 5074.038017] cxl_core:devm_cxl_add_port:960: cxl_mem mem0: endpoint2 added to port1
[ 5074.039153] cxl_core:__cxl_bi_ctrl_endpoint:1163: cxl_pci 0000:0d:00.0: BI requests enabled
[ 5074.040168] cxl_core:cxl_bus_probe:2281: cxl_mem mem0: probe: 0
4. Discovery behind a Switch
----------------------------
# dmesg | grep BI
[ 0.628477] cxl_core:cxl_probe_component_regs:102: cxl_pci 0000:0f:00.0: found BI Decoder capability (0xab4)
[ 0.737987] cxl_core:cxl_probe_component_regs:102: pcieport 0000:0c:00.0: found BI Decoder capability (0xab4)
[ 0.747784] cxl_port:cxl_port_map_bi:109: cxl_port port2: BI RT registers not found
[ 0.755151] cxl_core:cxl_probe_component_regs:96: cxl_port port2: found BI RT capability (0xaa8)
[ 0.790106] cxl_core:cxl_probe_component_regs:102: pcieport 0000:0e:00.0: found BI Decoder capability (0xab4)
[ 0.943920] cxl_core:__cxl_bi_wait_commit:994: pcieport 0000:0e:00.0: BI-ID commit wait took 101750us
[ 0.944767] cxl_core:__cxl_bi_ctrl_endpoint:1163: cxl_pci 0000:0f:00.0: BI requests enabled
5. Corrupted hw (commit timeout handling)
-----------------------------------------
[ 0.617234] cxl_core:cxl_probe_component_regs:102: cxl_pci 0000:0f:00.0: found BI Decoder capability (0xab4)
[ 0.706959] cxl_core:cxl_probe_component_regs:102: pcieport 0000:0c:00.0: found BI Decoder capability (0xab4)
[ 0.718585] cxl_port:cxl_port_map_bi:109: cxl_port port2: BI RT registers not found
[ 0.743932] cxl_core:cxl_probe_component_regs:96: cxl_port port2: found BI RT capability (0xaa8)
[ 0.783715] cxl_core:cxl_probe_component_regs:102: pcieport 0000:0e:00.0: found BI Decoder capability (0xab4)
[ 2.994821] cxl_core:__cxl_bi_wait_commit:985: pcieport 0000:0e:00.0: BI-ID commit timed out (2000000us)
[ 5.010606] cxl_core:__cxl_bi_wait_commit:985: pcieport 0000:0e:00.0: BI-ID commit timed out (2000000us)
[ 5.013423] cxl_mem:cxl_mem_probe:155: cxl_mem mem0: BI setup failed rc=-110
6. Mixed Configurations (BI-capable Type 3 but DSP 68b)
-------------------------------------------------------
[ 0.837902] cxl_mem:cxl_mem_probe:155: cxl_mem mem0: BI setup failed rc=-22
Applies against 'next' branch from cxl.git.
[0] https://lore.kernel.org/linux-cxl/20260428200410.705675-1-dave@stgolabs.net/
Thanks!
Davidlohr Bueso (5):
cxl: Add BI register probing and port initialization
cxl/pci: Add BI topology enable/disable
cxl/hdm: Add BI coherency support for endpoint decoders
cxl: Add HDM-DB region creation
cxl/hdm: Rename decoder coherency flags
Documentation/ABI/testing/sysfs-bus-cxl | 18 +-
drivers/cxl/acpi.c | 15 +-
drivers/cxl/core/hdm.c | 38 ++-
drivers/cxl/core/pci.c | 365 ++++++++++++++++++++++++
drivers/cxl/core/port.c | 33 ++-
drivers/cxl/core/region.c | 51 ++--
drivers/cxl/core/regs.c | 14 +
drivers/cxl/cxl.h | 45 ++-
drivers/cxl/cxlmem.h | 2 +
drivers/cxl/mem.c | 4 +
drivers/cxl/port.c | 71 +++++
include/cxl/cxl.h | 8 +
12 files changed, 631 insertions(+), 33 deletions(-)
--
2.39.5
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 1/5] cxl: Add BI register probing and port initialization
2026-06-05 0:33 [PATCH v4 0/5] cxl: Support Back-Invalidate Davidlohr Bueso
@ 2026-06-05 0:33 ` Davidlohr Bueso
2026-06-05 1:33 ` sashiko-bot
2026-06-05 0:33 ` [PATCH 2/5] cxl/pci: Add BI topology enable/disable Davidlohr Bueso
` (4 subsequent siblings)
5 siblings, 1 reply; 15+ messages in thread
From: Davidlohr Bueso @ 2026-06-05 0:33 UTC (permalink / raw)
To: dave.jiang, djbw
Cc: jic23, benjamin.cheatham, alucerop, alison.schofield, gourry,
dongjoo.seo1, dave, linux-cxl
Add register probing for BI Route Table and BI Decoder capability
structures in cxl_probe_component_regs(), and helpers to map them.
cxl_dport_map_bi() maps the BI Decoder of a downstream port (root
port or switch DSP) at dport-creation time via cxl_port_add_dport();
cxl_port_map_bi() maps a port's own BI capability during port probe
when the upstream link is in 256B Flit operation -- BI Decoder for
an endpoint, BI RT for a switch USP.
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
---
drivers/cxl/core/regs.c | 14 ++++++++
drivers/cxl/cxl.h | 6 ++++
drivers/cxl/port.c | 71 +++++++++++++++++++++++++++++++++++++++++
include/cxl/cxl.h | 6 ++++
4 files changed, 97 insertions(+)
diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c
index 93710cf4f0a6..a6caa793e7a4 100644
--- a/drivers/cxl/core/regs.c
+++ b/drivers/cxl/core/regs.c
@@ -92,6 +92,18 @@ void cxl_probe_component_regs(struct device *dev, void __iomem *base,
length = CXL_RAS_CAPABILITY_LENGTH;
rmap = &map->ras;
break;
+ case CXL_CM_CAP_CAP_ID_BI_RT:
+ dev_dbg(dev, "found BI RT capability (0x%x)\n",
+ offset);
+ length = CXL_BI_RT_CAPABILITY_LENGTH;
+ rmap = &map->bi_rt;
+ break;
+ case CXL_CM_CAP_CAP_ID_BI_DECODER:
+ dev_dbg(dev, "found BI Decoder capability (0x%x)\n",
+ offset);
+ length = CXL_BI_DECODER_CAPABILITY_LENGTH;
+ rmap = &map->bi_decoder;
+ break;
default:
dev_dbg(dev, "Unknown CM cap ID: %d (0x%x)\n", cap_id,
offset);
@@ -211,6 +223,8 @@ int cxl_map_component_regs(const struct cxl_register_map *map,
} mapinfo[] = {
{ &map->component_map.hdm_decoder, ®s->hdm_decoder },
{ &map->component_map.ras, ®s->ras },
+ { &map->component_map.bi_rt, ®s->bi_rt },
+ { &map->component_map.bi_decoder, ®s->bi_decoder },
};
int i;
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 1297594beaec..760f51b43891 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -42,6 +42,12 @@ extern const struct nvdimm_security_ops *cxl_security_ops;
#define CXL_CM_CAP_CAP_ID_RAS 0x2
#define CXL_CM_CAP_CAP_ID_HDM 0x5
#define CXL_CM_CAP_CAP_HDM_VERSION 1
+#define CXL_CM_CAP_CAP_ID_BI_RT 0xB
+#define CXL_CM_CAP_CAP_ID_BI_DECODER 0xC
+
+/* CXL 4.0 8.2.4.26 / 8.2.4.27 BI Capability Structures */
+#define CXL_BI_RT_CAPABILITY_LENGTH 0xC
+#define CXL_BI_DECODER_CAPABILITY_LENGTH 0xC
/* HDM decoders CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure */
#define CXL_HDM_DECODER_CAP_OFFSET 0x0
diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
index ada51948d52f..2dd712c5d312 100644
--- a/drivers/cxl/port.c
+++ b/drivers/cxl/port.c
@@ -58,6 +58,62 @@ static int discover_region(struct device *dev, void *unused)
return 0;
}
+static void cxl_dport_map_bi(struct cxl_dport *dport)
+{
+ struct cxl_register_map *map = &dport->reg_map;
+ struct device *dev = dport->dport_dev;
+
+ if (!map->component_map.bi_decoder.valid) {
+ dev_dbg(dev, "BI Decoder registers not found\n");
+ return;
+ }
+
+ if (cxl_map_component_regs(map, &dport->regs.component,
+ BIT(CXL_CM_CAP_CAP_ID_BI_DECODER)))
+ dev_dbg(dev, "Failed to map BI Decoder capability\n");
+}
+
+static void cxl_port_map_bi(struct cxl_port *port)
+{
+ struct cxl_register_map *map = &port->reg_map;
+ struct cxl_dport *parent_dport = port->parent_dport;
+ struct device *udev;
+ int cap_id;
+
+ /* no upstream BI registers above host bridges or the cxl_root */
+ if (!parent_dport || is_cxl_root(parent_dport->port))
+ return;
+
+ udev = is_cxl_endpoint(port) ?
+ port->uport_dev->parent : port->uport_dev;
+ if (!dev_is_pci(udev))
+ return;
+
+ /* BI requires 256B Flit on the upstream link */
+ if (!cxl_pci_flit_256(to_pci_dev(udev)))
+ return;
+
+ /* map this port's own BI capability */
+ if (is_cxl_endpoint(port)) {
+ if (!map->component_map.bi_decoder.valid) {
+ dev_dbg(&port->dev, "BI Decoder registers not found\n");
+ return;
+ }
+ cap_id = CXL_CM_CAP_CAP_ID_BI_DECODER;
+ } else {
+ if (!map->component_map.bi_rt.valid) {
+ dev_dbg(&port->dev, "BI RT registers not found\n");
+ return;
+ }
+ cap_id = CXL_CM_CAP_CAP_ID_BI_RT;
+ }
+
+ map->host = &port->dev;
+ if (cxl_map_component_regs(map, &port->regs, BIT(cap_id)))
+ dev_dbg(&port->dev, "Failed to map BI capability 0x%x\n",
+ cap_id);
+}
+
static int cxl_switch_port_probe(struct cxl_port *port)
{
/* Reset nr_dports for rebind of driver */
@@ -66,6 +122,8 @@ static int cxl_switch_port_probe(struct cxl_port *port)
/* Cache the data early to ensure is_visible() works */
read_cdat_data(port);
+ cxl_port_map_bi(port);
+
return 0;
}
@@ -128,6 +186,8 @@ static int cxl_endpoint_port_probe(struct cxl_port *port)
read_cdat_data(port);
cxl_endpoint_parse_cdat(port);
+ cxl_port_map_bi(port);
+
get_device(&cxlmd->dev);
rc = devm_add_action_or_reset(&port->dev, schedule_detach, cxlmd);
if (rc)
@@ -261,6 +321,17 @@ static struct cxl_dport *cxl_port_add_dport(struct cxl_port *port,
/* This group was only needed for early exit above */
devres_remove_group(&port->dev, no_free_ptr(port_dr_group));
+ if (dev_is_pci(dport_dev)) {
+ switch (pci_pcie_type(to_pci_dev(dport_dev))) {
+ case PCI_EXP_TYPE_ROOT_PORT:
+ case PCI_EXP_TYPE_DOWNSTREAM:
+ cxl_dport_map_bi(dport);
+ break;
+ default:
+ break;
+ }
+ }
+
cxl_switch_parse_cdat(dport);
/* New dport added, update the decoder targets */
diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
index fa7269154620..8ce9b4e9ca73 100644
--- a/include/cxl/cxl.h
+++ b/include/cxl/cxl.h
@@ -34,10 +34,14 @@ struct cxl_regs {
* Common set of CXL Component register block base pointers
* @hdm_decoder: CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure
* @ras: CXL 2.0 8.2.5.9 CXL RAS Capability Structure
+ * @bi_rt: CXL 4.0 8.2.4.26 CXL BI Route Table Capability Structure
+ * @bi_decoder: CXL 4.0 8.2.4.27 CXL BI Decoder Capability Structure
*/
struct_group_tagged(cxl_component_regs, component,
void __iomem *hdm_decoder;
void __iomem *ras;
+ void __iomem *bi_rt;
+ void __iomem *bi_decoder;
);
/*
* Common set of CXL Device register block base pointers
@@ -80,6 +84,8 @@ struct cxl_reg_map {
struct cxl_component_reg_map {
struct cxl_reg_map hdm_decoder;
struct cxl_reg_map ras;
+ struct cxl_reg_map bi_rt;
+ struct cxl_reg_map bi_decoder;
};
struct cxl_device_reg_map {
--
2.39.5
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 2/5] cxl/pci: Add BI topology enable/disable
2026-06-05 0:33 [PATCH v4 0/5] cxl: Support Back-Invalidate Davidlohr Bueso
2026-06-05 0:33 ` [PATCH 1/5] cxl: Add BI register probing and port initialization Davidlohr Bueso
@ 2026-06-05 0:33 ` Davidlohr Bueso
2026-06-05 1:29 ` sashiko-bot
2026-06-05 2:23 ` Richard Cheng
2026-06-05 0:33 ` [PATCH 3/5] cxl/hdm: Add BI coherency support for endpoint decoders Davidlohr Bueso
` (3 subsequent siblings)
5 siblings, 2 replies; 15+ messages in thread
From: Davidlohr Bueso @ 2026-06-05 0:33 UTC (permalink / raw)
To: dave.jiang, djbw
Cc: jic23, benjamin.cheatham, alucerop, alison.schofield, gourry,
dongjoo.seo1, dave, linux-cxl
Implement cxl_bi_setup(), which walks the CXL port topology to enable
BI flows on the device and every component in the path between the
device and the root port. The walk is two-pass: first to verify each
component advertises BI capability, then to write BI Enable on each
dport's BI Decoder Control register and on the device. On failure
during the enable pass, partially-enabled dports are rolled back.
The teardown counterpart, cxl_bi_dealloc(), runs automatically on
cxl_mem unbind.
cxl_bi_setup() does not touch HDM decoder configuration -- the BI bit
in each decoder is programmed separately at commit time. After a
successful setup the device is BI-capable but not yet using BI for
decode coherence.
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
---
drivers/cxl/core/pci.c | 365 +++++++++++++++++++++++++++++++++++++++++
drivers/cxl/cxl.h | 28 ++++
drivers/cxl/mem.c | 4 +
include/cxl/cxl.h | 2 +
4 files changed, 399 insertions(+)
diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index d1f487b3d809..291d4f617e71 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -2,11 +2,13 @@
/* Copyright(c) 2021 Intel Corporation. All rights reserved. */
#include <linux/units.h>
#include <linux/io-64-nonatomic-lo-hi.h>
+#include <linux/iopoll.h>
#include <linux/device.h>
#include <linux/delay.h>
#include <linux/pci.h>
#include <linux/pci-doe.h>
#include <linux/aer.h>
+#include <linux/string_choices.h>
#include <cxlpci.h>
#include <cxlmem.h>
#include <cxl.h>
@@ -926,3 +928,366 @@ int cxl_port_get_possible_dports(struct cxl_port *port)
return ctx.count;
}
+
+/*
+ * BI requires 256B Flit operation on the link. RP/DSP/endpoint must
+ * also have the BI Decoder cap mapped (@bi); for USPs the BI RT cap
+ * is optional per CXL 4.0 8.2.4.26, so absent @bi is allowed.
+ */
+static bool cxl_is_bi_capable(struct pci_dev *pdev, void __iomem *bi)
+{
+ if (!cxl_pci_flit_256(pdev))
+ return false;
+ if (pci_pcie_type(pdev) != PCI_EXP_TYPE_UPSTREAM && !bi) {
+ dev_dbg(&pdev->dev, "No BI Decoder registers.\n");
+ return false;
+ }
+ return true;
+}
+
+/* limit any insane timeouts from hw */
+#define CXL_BI_COMMIT_MAXTMO_US (5 * USEC_PER_SEC)
+
+static unsigned long __cxl_bi_get_timeout_us(struct device *dev,
+ unsigned int scale,
+ unsigned int base)
+{
+ static const unsigned long scale_tbl[] = {
+ 1, 10, 100, 1000, 10000, 100000, 1000000, 10000000,
+ };
+
+ if (scale >= ARRAY_SIZE(scale_tbl)) {
+ dev_dbg(dev, "Invalid BI commit timeout scale: %u\n", scale);
+ return CXL_BI_COMMIT_MAXTMO_US;
+ }
+
+ return scale_tbl[scale] * base;
+}
+
+static int __cxl_bi_wait_commit(struct device *dev, void __iomem *status_reg,
+ u32 committed_bit, u32 err_bit,
+ unsigned int scale, unsigned int base)
+{
+ unsigned long tmo_us, poll_us;
+ ktime_t start;
+ u32 status;
+ int rc;
+
+ tmo_us = min_t(unsigned long, CXL_BI_COMMIT_MAXTMO_US,
+ __cxl_bi_get_timeout_us(dev, scale, base));
+ poll_us = max_t(unsigned long, tmo_us / 10, 1); /* ~10% */
+ start = ktime_get();
+
+ rc = readx_poll_timeout(readl, status_reg, status,
+ status & (committed_bit | err_bit),
+ poll_us, tmo_us);
+ if (rc) {
+ dev_dbg(dev, "BI-ID commit timed out (%luus)\n", tmo_us);
+ return rc; /* -ETIMEDOUT */
+ }
+
+ if (status & err_bit) {
+ dev_dbg(dev, "BI-ID commit rejected by hardware\n");
+ return -EIO;
+ }
+
+ dev_dbg(dev, "BI-ID commit wait took %lluus\n",
+ ktime_to_us(ktime_sub(ktime_get(), start)));
+ return 0;
+}
+
+/* BI RT only exists on switch upstream ports. */
+static int __cxl_bi_commit_rt(struct device *dev, void __iomem *bi)
+{
+ u32 status, ctrl;
+ unsigned int scale, base;
+
+ if (!bi)
+ return 0;
+
+ if (!FIELD_GET(CXL_BI_RT_CAPS_EXPLICIT_COMMIT_REQ,
+ readl(bi + CXL_BI_RT_CAPS_OFFSET)))
+ return 0;
+
+ ctrl = readl(bi + CXL_BI_RT_CTRL_OFFSET);
+ writel(ctrl & ~CXL_BI_RT_CTRL_BI_COMMIT, bi + CXL_BI_RT_CTRL_OFFSET);
+ writel(ctrl | CXL_BI_RT_CTRL_BI_COMMIT, bi + CXL_BI_RT_CTRL_OFFSET);
+
+ status = readl(bi + CXL_BI_RT_STATUS_OFFSET);
+ scale = FIELD_GET(CXL_BI_RT_STATUS_BI_COMMIT_TM_SCALE, status);
+ base = FIELD_GET(CXL_BI_RT_STATUS_BI_COMMIT_TM_BASE, status);
+
+ return __cxl_bi_wait_commit(dev, bi + CXL_BI_RT_STATUS_OFFSET,
+ CXL_BI_RT_STATUS_BI_COMMITTED,
+ CXL_BI_RT_STATUS_BI_ERR_NOT_COMMITTED,
+ scale, base);
+}
+
+static int __cxl_bi_commit_decoder(struct device *dev, void __iomem *bi)
+{
+ u32 status, ctrl;
+ unsigned int scale, base;
+
+ if (!bi)
+ return -EINVAL;
+
+ if (!FIELD_GET(CXL_BI_DECODER_CAPS_EXPLICIT_COMMIT_REQ,
+ readl(bi + CXL_BI_DECODER_CAPS_OFFSET)))
+ return 0;
+
+ ctrl = readl(bi + CXL_BI_DECODER_CTRL_OFFSET);
+ writel(ctrl & ~CXL_BI_DECODER_CTRL_BI_COMMIT,
+ bi + CXL_BI_DECODER_CTRL_OFFSET);
+ writel(ctrl | CXL_BI_DECODER_CTRL_BI_COMMIT,
+ bi + CXL_BI_DECODER_CTRL_OFFSET);
+
+ status = readl(bi + CXL_BI_DECODER_STATUS_OFFSET);
+ scale = FIELD_GET(CXL_BI_DECODER_STATUS_BI_COMMIT_TM_SCALE, status);
+ base = FIELD_GET(CXL_BI_DECODER_STATUS_BI_COMMIT_TM_BASE, status);
+
+ return __cxl_bi_wait_commit(dev, bi + CXL_BI_DECODER_STATUS_OFFSET,
+ CXL_BI_DECODER_STATUS_BI_COMMITTED,
+ CXL_BI_DECODER_STATUS_BI_ERR_NOT_COMMITTED,
+ scale, base);
+}
+
+/* Enable or dealloc BI-ID changes in the given level of the topology. */
+static int __cxl_bi_ctrl_dport(struct cxl_dport *dport, bool enable)
+{
+ struct pci_dev *pdev = to_pci_dev(dport->dport_dev);
+ void __iomem *bi = dport->regs.bi_decoder;
+ struct cxl_port *port = dport->port;
+ u32 ctrl, value;
+ int rc;
+
+ guard(device)(&port->dev);
+ if (!bi)
+ return -EINVAL;
+
+ ctrl = readl(bi + CXL_BI_DECODER_CTRL_OFFSET);
+
+ switch (pci_pcie_type(pdev)) {
+ case PCI_EXP_TYPE_ROOT_PORT:
+ if (enable) {
+ /*
+ * There is no point of failure from here on,
+ * BI will be enabled on the endpoint device.
+ */
+ dport->nr_bi++;
+
+ if (FIELD_GET(CXL_BI_DECODER_CTRL_BI_FW, ctrl) &&
+ !FIELD_GET(CXL_BI_DECODER_CTRL_BI_ENABLE, ctrl))
+ return 0;
+
+ value = ctrl | CXL_BI_DECODER_CTRL_BI_FW;
+ value &= ~CXL_BI_DECODER_CTRL_BI_ENABLE;
+ } else {
+ if (WARN_ON_ONCE(dport->nr_bi == 0))
+ return -EINVAL;
+ if (--dport->nr_bi > 0)
+ return 0;
+
+ value = ctrl & ~(CXL_BI_DECODER_CTRL_BI_FW |
+ CXL_BI_DECODER_CTRL_BI_ENABLE);
+ }
+
+ writel(value, bi + CXL_BI_DECODER_CTRL_OFFSET);
+ return 0;
+ case PCI_EXP_TYPE_DOWNSTREAM:
+ if (enable) {
+ value = ctrl & ~CXL_BI_DECODER_CTRL_BI_FW;
+ value |= CXL_BI_DECODER_CTRL_BI_ENABLE;
+ } else {
+ if (!FIELD_GET(CXL_BI_DECODER_CTRL_BI_ENABLE, ctrl))
+ return 0;
+ value = ctrl & ~(CXL_BI_DECODER_CTRL_BI_FW |
+ CXL_BI_DECODER_CTRL_BI_ENABLE);
+ }
+
+ writel(value, bi + CXL_BI_DECODER_CTRL_OFFSET);
+
+ rc = __cxl_bi_commit_decoder(dport->dport_dev, bi);
+ if (rc)
+ return rc;
+
+ return __cxl_bi_commit_rt(&port->dev, port->regs.bi_rt);
+ default:
+ return -EINVAL;
+ }
+}
+
+static int cxl_bi_ctrl_dport_enable(struct cxl_dport *dport)
+{
+ return __cxl_bi_ctrl_dport(dport, true);
+}
+
+static int cxl_bi_ctrl_dport_disable(struct cxl_dport *dport)
+{
+ return __cxl_bi_ctrl_dport(dport, false);
+}
+
+static int __cxl_bi_ctrl_endpoint(struct cxl_dev_state *cxlds, bool enable)
+{
+ struct cxl_port *endpoint = cxlds->cxlmd->endpoint;
+ void __iomem *bi = endpoint->regs.bi_decoder;
+ u32 ctrl, val;
+
+ if (!bi)
+ return -EINVAL;
+
+ ctrl = readl(bi + CXL_BI_DECODER_CTRL_OFFSET);
+
+ if (enable) {
+ if (FIELD_GET(CXL_BI_DECODER_CTRL_BI_ENABLE, ctrl)) {
+ if (cxlds->bi)
+ return 0;
+ dev_err(cxlds->dev,
+ "BI already enabled in hardware\n");
+ return -EBUSY;
+ }
+ val = ctrl | CXL_BI_DECODER_CTRL_BI_ENABLE;
+ } else {
+ if (!FIELD_GET(CXL_BI_DECODER_CTRL_BI_ENABLE, ctrl)) {
+ if (!cxlds->bi)
+ return 0;
+ dev_err(cxlds->dev,
+ "BI already disabled in hardware\n");
+ return -EBUSY;
+ }
+ val = ctrl & ~CXL_BI_DECODER_CTRL_BI_ENABLE;
+ }
+
+ writel(val, bi + CXL_BI_DECODER_CTRL_OFFSET);
+ cxlds->bi = enable;
+
+ dev_dbg(cxlds->dev, "BI requests %s\n",
+ str_enabled_disabled(enable));
+
+ return 0;
+}
+
+static int cxl_bi_ctrl_endpoint_enable(struct cxl_dev_state *cxlds)
+{
+ return __cxl_bi_ctrl_endpoint(cxlds, true);
+}
+
+static int cxl_bi_ctrl_endpoint_disable(struct cxl_dev_state *cxlds)
+{
+ return __cxl_bi_ctrl_endpoint(cxlds, false);
+}
+
+/*
+ * devm teardown on cxl_mem unbind. Endpoint decoders may still be
+ * committed here (cxl_workqueue tears them down asynchronously), but
+ * memory access has been quiesced.
+ */
+static void cxl_bi_dealloc(void *data)
+{
+ struct cxl_dev_state *cxlds = data;
+ struct cxl_dport *dport_iter, *dport;
+ struct cxl_port *port_iter;
+
+ if (!dev_is_pci(cxlds->dev))
+ return;
+
+ struct cxl_port *port __free(put_cxl_port) =
+ cxl_pci_find_port(to_pci_dev(cxlds->dev), &dport);
+ if (!port || !cxlds->bi)
+ return;
+
+ scoped_guard(rwsem_read, &cxl_rwsem.region)
+ cxl_bi_ctrl_endpoint_disable(cxlds);
+
+ port_iter = port;
+ dport_iter = dport;
+ while (!is_cxl_root(port_iter)) {
+ int rc = cxl_bi_ctrl_dport_disable(dport_iter);
+
+ /* best effort */
+ if (rc)
+ dev_dbg(&port_iter->dev,
+ "BI dport disable failed: %d\n", rc);
+
+ dport_iter = port_iter->parent_dport;
+ port_iter = dport_iter->port;
+ }
+}
+
+int cxl_bi_setup(struct cxl_dev_state *cxlds)
+{
+ struct cxl_port *endpoint = cxlds->cxlmd->endpoint;
+ struct cxl_dport *dport_iter, *dport, *failed;
+ struct cxl_port *port_iter;
+ struct pci_dev *pdev;
+ int rc;
+
+ if (!dev_is_pci(cxlds->dev))
+ return 0;
+
+ pdev = to_pci_dev(cxlds->dev);
+ struct cxl_port *port __free(put_cxl_port) =
+ cxl_pci_find_port(pdev, &dport);
+
+ if (!port)
+ return -EINVAL;
+
+ if (!cxl_is_bi_capable(pdev, endpoint->regs.bi_decoder))
+ return 0;
+
+ /* walkup the topology twice, first to check, then to enable */
+ port_iter = port;
+ dport_iter = dport;
+ while (!is_cxl_root(port_iter)) {
+ /* check rp, dsp */
+ if (!cxl_is_bi_capable(to_pci_dev(dport_iter->dport_dev),
+ dport_iter->regs.bi_decoder))
+ return -EINVAL;
+
+ /* check usp */
+ if (dev_is_pci(port_iter->uport_dev) &&
+ pci_pcie_type(to_pci_dev(port_iter->uport_dev)) ==
+ PCI_EXP_TYPE_UPSTREAM)
+ if (!cxl_is_bi_capable(to_pci_dev(port_iter->uport_dev),
+ port_iter->regs.bi_rt))
+ return -EINVAL;
+
+ dport_iter = port_iter->parent_dport;
+ port_iter = dport_iter->port;
+ }
+
+ port_iter = port;
+ dport_iter = dport;
+ while (!is_cxl_root(port_iter)) {
+ rc = cxl_bi_ctrl_dport_enable(dport_iter);
+ if (rc)
+ goto err_rollback;
+
+ dport_iter = port_iter->parent_dport;
+ port_iter = dport_iter->port;
+ }
+
+ /* finally, enable BI on the device */
+ rc = cxl_bi_ctrl_endpoint_enable(cxlds);
+ if (rc)
+ goto err_rollback;
+
+ return devm_add_action_or_reset(&cxlds->cxlmd->dev,
+ cxl_bi_dealloc, cxlds);
+
+err_rollback:
+ /*
+ * Undo all dports enabled so far, including the failed one.
+ */
+ failed = dport_iter;
+ dport_iter = dport;
+ port_iter = port;
+ while (!is_cxl_root(port_iter)) {
+ cxl_bi_ctrl_dport_disable(dport_iter);
+ if (dport_iter == failed)
+ break;
+ dport_iter = port_iter->parent_dport;
+ port_iter = dport_iter->port;
+ }
+ return rc;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_bi_setup, "CXL");
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 760f51b43891..89c0c6ce24a3 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -167,6 +167,31 @@ static inline int ways_to_eiw(unsigned int ways, u8 *eiw)
#define CXL_HEADERLOG_SIZE SZ_512
#define CXL_HEADERLOG_SIZE_U32 SZ_512 / sizeof(u32)
+/* CXL 4.0 8.2.4.26 CXL BI Route Table Capability Structure */
+#define CXL_BI_RT_CAPS_OFFSET 0x0
+#define CXL_BI_RT_CAPS_EXPLICIT_COMMIT_REQ BIT(0)
+#define CXL_BI_RT_CTRL_OFFSET 0x4
+#define CXL_BI_RT_CTRL_BI_COMMIT BIT(0)
+#define CXL_BI_RT_STATUS_OFFSET 0x8
+#define CXL_BI_RT_STATUS_BI_COMMITTED BIT(0)
+#define CXL_BI_RT_STATUS_BI_ERR_NOT_COMMITTED BIT(1)
+#define CXL_BI_RT_STATUS_BI_COMMIT_TM_SCALE GENMASK(11, 8)
+#define CXL_BI_RT_STATUS_BI_COMMIT_TM_BASE GENMASK(15, 12)
+
+/* CXL 4.0 8.2.4.27 CXL BI Decoder Capability Structure */
+#define CXL_BI_DECODER_CAPS_OFFSET 0x0
+#define CXL_BI_DECODER_CAPS_HDMD_CAP BIT(0)
+#define CXL_BI_DECODER_CAPS_EXPLICIT_COMMIT_REQ BIT(1)
+#define CXL_BI_DECODER_CTRL_OFFSET 0x4
+#define CXL_BI_DECODER_CTRL_BI_FW BIT(0)
+#define CXL_BI_DECODER_CTRL_BI_ENABLE BIT(1)
+#define CXL_BI_DECODER_CTRL_BI_COMMIT BIT(2)
+#define CXL_BI_DECODER_STATUS_OFFSET 0x8
+#define CXL_BI_DECODER_STATUS_BI_COMMITTED BIT(0)
+#define CXL_BI_DECODER_STATUS_BI_ERR_NOT_COMMITTED BIT(1)
+#define CXL_BI_DECODER_STATUS_BI_COMMIT_TM_SCALE GENMASK(11, 8)
+#define CXL_BI_DECODER_STATUS_BI_COMMIT_TM_BASE GENMASK(15, 12)
+
/* CXL 2.0 8.2.8.1 Device Capabilities Array Register */
#define CXLDEV_CAP_ARRAY_OFFSET 0x0
#define CXLDEV_CAP_ARRAY_CAP_ID 0
@@ -630,6 +655,7 @@ struct cxl_rcrb_info {
* @coord: access coordinates (bandwidth and latency performance attributes)
* @link_latency: calculated PCIe downstream latency
* @gpf_dvsec: Cached GPF port DVSEC
+ * @nr_bi: number of BI-enabled endpoints below this dport
*/
struct cxl_dport {
struct device *dport_dev;
@@ -642,6 +668,7 @@ struct cxl_dport {
struct access_coordinate coord[ACCESS_COORDINATE_MAX];
long link_latency;
int gpf_dvsec;
+ int nr_bi;
};
/**
@@ -906,6 +933,7 @@ void cxl_coordinates_combine(struct access_coordinate *out,
struct access_coordinate *c2);
bool cxl_endpoint_decoder_reset_detected(struct cxl_port *port);
+int cxl_bi_setup(struct cxl_dev_state *cxlds);
struct cxl_dport *devm_cxl_add_dport_by_dev(struct cxl_port *port,
struct device *dport_dev);
diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
index fcffe24dcb42..f2d01a218895 100644
--- a/drivers/cxl/mem.c
+++ b/drivers/cxl/mem.c
@@ -150,6 +150,10 @@ static int cxl_mem_probe(struct device *dev)
if (rc)
dev_dbg(dev, "CXL memdev EDAC registration failed rc=%d\n", rc);
+ rc = cxl_bi_setup(cxlds);
+ if (rc)
+ dev_dbg(dev, "BI setup failed rc=%d\n", rc);
+
/*
* The kernel may be operating out of CXL memory on this device,
* there is no spec defined way to determine whether this device
diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
index 8ce9b4e9ca73..4938e93edb63 100644
--- a/include/cxl/cxl.h
+++ b/include/cxl/cxl.h
@@ -168,6 +168,7 @@ struct cxl_dpa_partition {
* @regs: Parsed register blocks
* @cxl_dvsec: Offset to the PCIe device DVSEC
* @rcd: operating in RCD mode (CXL 3.0 9.11.8 CXL Devices Attached to an RCH)
+ * @bi: device is BI (Back-Invalidate) enabled
* @media_ready: Indicate whether the device media is usable
* @dpa_res: Overall DPA resource tree for the device
* @part: DPA partition array
@@ -187,6 +188,7 @@ struct cxl_dev_state {
struct cxl_device_regs regs;
int cxl_dvsec;
bool rcd;
+ bool bi;
bool media_ready;
struct resource dpa_res;
struct cxl_dpa_partition part[CXL_NR_PARTITIONS_MAX];
--
2.39.5
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 3/5] cxl/hdm: Add BI coherency support for endpoint decoders
2026-06-05 0:33 [PATCH v4 0/5] cxl: Support Back-Invalidate Davidlohr Bueso
2026-06-05 0:33 ` [PATCH 1/5] cxl: Add BI register probing and port initialization Davidlohr Bueso
2026-06-05 0:33 ` [PATCH 2/5] cxl/pci: Add BI topology enable/disable Davidlohr Bueso
@ 2026-06-05 0:33 ` Davidlohr Bueso
2026-06-05 0:33 ` [PATCH 4/5] cxl: Add HDM-DB region creation Davidlohr Bueso
` (2 subsequent siblings)
5 siblings, 0 replies; 15+ messages in thread
From: Davidlohr Bueso @ 2026-06-05 0:33 UTC (permalink / raw)
To: dave.jiang, djbw
Cc: jic23, benjamin.cheatham, alucerop, alison.schofield, gourry,
dongjoo.seo1, dave, linux-cxl
Cache the HDM decoder's Supported Coherency Models on struct cxl_hdm.
Region attach consults this to verify the HDM supports the region's
coherency type.
For uncommitted endpoint decoders, init_hdm_decoder() defaults
target_type from supported_coherency: Type 3 devices default to
HDM-DB when the HDM is device-coherent-only, HDM-H otherwise.
Pre-committed decoders with the BI bit set are not supported because
endpoint and port enumerations are independent -- at decoder
enumeration cxlds->bi is not yet known, so the topology cannot be
verified.
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
---
drivers/cxl/core/hdm.c | 24 +++++++++++++++++++++---
drivers/cxl/cxl.h | 6 ++++++
drivers/cxl/cxlmem.h | 2 ++
3 files changed, 29 insertions(+), 3 deletions(-)
diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index 0c80b76a5f9b..4a41cde9fdba 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -87,6 +87,8 @@ static void parse_hdm_decoder_caps(struct cxl_hdm *cxlhdm)
cxlhdm->iw_cap_mask |= BIT(3) | BIT(6) | BIT(12);
if (FIELD_GET(CXL_HDM_DECODER_INTERLEAVE_16_WAY, hdm_cap))
cxlhdm->iw_cap_mask |= BIT(16);
+ cxlhdm->supported_coherency =
+ FIELD_GET(CXL_HDM_DECODER_SUPPORTED_COHERENCY_MASK, hdm_cap);
}
static bool should_emulate_decoders(struct cxl_endpoint_dvsec_info *info)
@@ -1023,6 +1025,14 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
else
cxld->target_type = CXL_DECODER_DEVMEM;
+ /*
+ * Autocommit BI-enabled decoders is not supported.
+ * At this point cxlds->bi is not yet setup, so there
+ * are no guarantees that the platform supports BI.
+ */
+ if (FIELD_GET(CXL_HDM_DECODER0_CTRL_BI, ctrl))
+ return -ENXIO;
+
guard(rwsem_write)(&cxl_rwsem.region);
if (cxld->id != cxl_num_decoders_committed(port)) {
dev_warn(&port->dev,
@@ -1042,12 +1052,20 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
if (cxled) {
struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
struct cxl_dev_state *cxlds = cxlmd->cxlds;
+ struct cxl_hdm *cxlhdm = dev_get_drvdata(&port->dev);
/*
- * Default by devtype until a device arrives that needs
- * more precision.
+ * HDMs that advertise support for both coherency
+ * modes (CXL_HDM_DECODER_COHERENCY_BOTH) default to
+ * host-only here; the region attach path will switch
+ * target_type to device-coherent if the region's
+ * root decoder has the CFMWS BI bit set. Only HDMs
+ * that strictly support device-coherent mode default
+ * to HDM-DB.
*/
- if (cxlds->type == CXL_DEVTYPE_CLASSMEM)
+ if (cxlds->type == CXL_DEVTYPE_CLASSMEM &&
+ cxlhdm->supported_coherency !=
+ CXL_HDM_DECODER_COHERENCY_DEV)
cxld->target_type = CXL_DECODER_HOSTONLYMEM;
else
cxld->target_type = CXL_DECODER_DEVMEM;
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 89c0c6ce24a3..917a6aee5936 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -57,6 +57,11 @@ extern const struct nvdimm_security_ops *cxl_security_ops;
#define CXL_HDM_DECODER_INTERLEAVE_14_12 BIT(9)
#define CXL_HDM_DECODER_INTERLEAVE_3_6_12_WAY BIT(11)
#define CXL_HDM_DECODER_INTERLEAVE_16_WAY BIT(12)
+#define CXL_HDM_DECODER_SUPPORTED_COHERENCY_MASK GENMASK(22, 21)
+#define CXL_HDM_DECODER_COHERENCY_UNKNOWN 0x0
+#define CXL_HDM_DECODER_COHERENCY_DEV 0x1
+#define CXL_HDM_DECODER_COHERENCY_HOST 0x2
+#define CXL_HDM_DECODER_COHERENCY_BOTH 0x3
#define CXL_HDM_DECODER_CTRL_OFFSET 0x4
#define CXL_HDM_DECODER_ENABLE BIT(1)
#define CXL_HDM_DECODER0_BASE_LOW_OFFSET(i) (0x20 * (i) + 0x10)
@@ -71,6 +76,7 @@ extern const struct nvdimm_security_ops *cxl_security_ops;
#define CXL_HDM_DECODER0_CTRL_COMMITTED BIT(10)
#define CXL_HDM_DECODER0_CTRL_COMMIT_ERROR BIT(11)
#define CXL_HDM_DECODER0_CTRL_HOSTONLY BIT(12)
+#define CXL_HDM_DECODER0_CTRL_BI BIT(13)
#define CXL_HDM_DECODER0_TL_LOW(i) (0x20 * (i) + 0x24)
#define CXL_HDM_DECODER0_TL_HIGH(i) (0x20 * (i) + 0x28)
#define CXL_HDM_DECODER0_SKIP_LOW(i) CXL_HDM_DECODER0_TL_LOW(i)
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 776c50d1db51..f41d39cc3795 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -837,6 +837,7 @@ int cxl_mem_sanitize(struct cxl_memdev *cxlmd, u16 cmd);
* @target_count: for switch decoders, max downstream port targets
* @interleave_mask: interleave granularity capability, see check_interleave_cap()
* @iw_cap_mask: bitmask of supported interleave ways, see check_interleave_cap()
+ * @supported_coherency: HDM Decoder Capability supported coherency mask
* @port: mapped cxl_port, see devm_cxl_setup_hdm()
*/
struct cxl_hdm {
@@ -845,6 +846,7 @@ struct cxl_hdm {
unsigned int target_count;
unsigned int interleave_mask;
unsigned long iw_cap_mask;
+ unsigned int supported_coherency;
struct cxl_port *port;
};
--
2.39.5
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 4/5] cxl: Add HDM-DB region creation
2026-06-05 0:33 [PATCH v4 0/5] cxl: Support Back-Invalidate Davidlohr Bueso
` (2 preceding siblings ...)
2026-06-05 0:33 ` [PATCH 3/5] cxl/hdm: Add BI coherency support for endpoint decoders Davidlohr Bueso
@ 2026-06-05 0:33 ` Davidlohr Bueso
2026-06-05 1:34 ` sashiko-bot
2026-06-05 2:37 ` Richard Cheng
2026-06-05 0:33 ` [PATCH 5/5] cxl/hdm: Rename decoder coherency flags Davidlohr Bueso
2026-06-05 2:38 ` [PATCH v4 0/5] cxl: Support Back-Invalidate Richard Cheng
5 siblings, 2 replies; 15+ messages in thread
From: Davidlohr Bueso @ 2026-06-05 0:33 UTC (permalink / raw)
To: dave.jiang, djbw
Cc: jic23, benjamin.cheatham, alucerop, alison.schofield, gourry,
dongjoo.seo1, dave, linux-cxl
A region inherits its coherency from the chosen root decoder: HDM-DB
if the root has CXL_DECODER_F_BI, otherwise HDM-H.
Surface the topology through read-only sysfs:
- decoderX.Y/cap_bi (root): CFMWS BI restriction.
- decoderX.Y/bi (endpoint): '1' when configured for HDM-DB.
cxl_region_attach() rejects endpoints whose device or HDM cannot
serve the region's type; target_type is inherited from cxlr->type
in cxl_rr_assign_decoder().
The HDM Decoder Control BI bit is set at commit time only when the
target_type is DEVMEM and the BI capability is advertised
(cxlds->bi for endpoints, root F_BI for switches).
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
---
Documentation/ABI/testing/sysfs-bus-cxl | 18 +++++++--
drivers/cxl/acpi.c | 11 +++++-
drivers/cxl/core/hdm.c | 14 +++++++
drivers/cxl/core/port.c | 29 ++++++++++++--
drivers/cxl/core/region.c | 51 +++++++++++++++++--------
drivers/cxl/cxl.h | 1 +
6 files changed, 100 insertions(+), 24 deletions(-)
diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
index 16a9b3d2e2c0..f631a94f098d 100644
--- a/Documentation/ABI/testing/sysfs-bus-cxl
+++ b/Documentation/ABI/testing/sysfs-bus-cxl
@@ -297,7 +297,7 @@ Description:
Each entry in the list is a dport id.
-What: /sys/bus/cxl/devices/decoderX.Y/cap_{pmem,ram,type2,type3}
+What: /sys/bus/cxl/devices/decoderX.Y/cap_{pmem,ram,type2,type3,bi}
Date: June, 2021
KernelVersion: v5.14
Contact: linux-cxl@vger.kernel.org
@@ -306,8 +306,9 @@ Description:
represents a fixed memory window identified by platform
firmware. A fixed window may only support a subset of memory
types. The 'cap_*' attributes indicate whether persistent
- memory, volatile memory, accelerator memory, and / or expander
- memory may be mapped behind this decoder's memory window.
+ memory, volatile memory, accelerator memory, expander memory,
+ and / or back-invalidate (HDM-DB) memory may be mapped behind
+ this decoder's memory window.
What: /sys/bus/cxl/devices/decoderX.Y/target_type
@@ -426,6 +427,17 @@ Description:
current cached value.
+What: /sys/bus/cxl/devices/decoderX.Y/bi
+Date: June, 2026
+KernelVersion: v7.2
+Contact: linux-cxl@vger.kernel.org
+Description:
+ (RO) Shows '1' if this endpoint decoder is currently configured
+ for HDM-DB (device-managed coherency with back-invalidate).
+ The HDM-DB state is inherited from the region the decoder is
+ attached to, which is in turn set from the chosen root
+ decoder's CFMWS BI restriction (see cap_bi).
+
What: /sys/bus/cxl/devices/decoderX.Y/delete_region
Date: May, 2022
KernelVersion: v6.0
diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index 127537628817..6a9ba82719e9 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -152,6 +152,8 @@ static unsigned long cfmws_to_decoder_flags(int restrictions)
flags |= CXL_DECODER_F_PMEM;
if (restrictions & ACPI_CEDT_CFMWS_RESTRICT_FIXED)
flags |= CXL_DECODER_F_LOCK;
+ if (restrictions & ACPI_CEDT_CFMWS_RESTRICT_BI)
+ flags |= CXL_DECODER_F_BI;
return flags;
}
@@ -198,6 +200,12 @@ static int cxl_acpi_cfmws_verify(struct device *dev,
dev_dbg(dev, "CFMWS length %d greater than expected %d\n",
cfmws->header.length, expected_len);
+ if ((cfmws->restrictions & ACPI_CEDT_CFMWS_RESTRICT_HOSTONLYMEM) &&
+ (cfmws->restrictions & ACPI_CEDT_CFMWS_RESTRICT_BI)) {
+ dev_err(dev, "CFMWS cannot have both HDM-H and HDM-DB\n");
+ return -EINVAL;
+ }
+
return 0;
}
@@ -437,7 +445,8 @@ static int __cxl_parse_cfmws(struct acpi_cedt_cfmws *cfmws,
cxld = &cxlrd->cxlsd.cxld;
cxld->flags = cfmws_to_decoder_flags(cfmws->restrictions);
- cxld->target_type = CXL_DECODER_HOSTONLYMEM;
+ cxld->target_type = (cxld->flags & CXL_DECODER_F_TYPE2) ?
+ CXL_DECODER_DEVMEM : CXL_DECODER_HOSTONLYMEM;
cxld->hpa_range = (struct range) {
.start = cfmws->base_hpa,
.end = cfmws->base_hpa + cfmws->window_size - 1,
diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index 4a41cde9fdba..70c797c77b70 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -705,9 +705,23 @@ static void cxld_set_interleave(struct cxl_decoder *cxld, u32 *ctrl)
static void cxld_set_type(struct cxl_decoder *cxld, u32 *ctrl)
{
+ bool bi = cxld->target_type == CXL_DECODER_DEVMEM;
+
+ if (is_endpoint_decoder(&cxld->dev)) {
+ struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(&cxld->dev);
+ struct cxl_dev_state *cxlds = cxled_to_memdev(cxled)->cxlds;
+
+ bi = bi && cxlds->bi;
+ } else if (cxld->region) {
+ struct cxl_root_decoder *cxlrd = cxld->region->cxlrd;
+
+ bi = bi && (cxlrd->cxlsd.cxld.flags & CXL_DECODER_F_BI);
+ }
+
u32p_replace_bits(ctrl,
!!(cxld->target_type == CXL_DECODER_HOSTONLYMEM),
CXL_HDM_DECODER0_CTRL_HOSTONLY);
+ u32p_replace_bits(ctrl, bi, CXL_HDM_DECODER0_CTRL_BI);
}
static void cxlsd_set_targets(struct cxl_switch_decoder *cxlsd, u64 *tgt)
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index c5aacd7054f1..2b463a8b06a7 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -131,6 +131,7 @@ CXL_DECODER_FLAG_ATTR(cap_ram, CXL_DECODER_F_RAM);
CXL_DECODER_FLAG_ATTR(cap_type2, CXL_DECODER_F_TYPE2);
CXL_DECODER_FLAG_ATTR(cap_type3, CXL_DECODER_F_TYPE3);
CXL_DECODER_FLAG_ATTR(locked, CXL_DECODER_F_LOCK);
+CXL_DECODER_FLAG_ATTR(cap_bi, CXL_DECODER_F_BI);
static ssize_t target_type_show(struct device *dev,
struct device_attribute *attr, char *buf)
@@ -233,6 +234,18 @@ static ssize_t mode_store(struct device *dev, struct device_attribute *attr,
}
static DEVICE_ATTR_RW(mode);
+static ssize_t bi_show(struct device *dev, struct device_attribute *attr,
+ char *buf)
+{
+ struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev);
+ struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
+ struct cxl_dev_state *cxlds = cxlmd->cxlds;
+
+ return sysfs_emit(buf, "%d\n", cxlds->bi && cxled->cxld.region &&
+ cxled->cxld.target_type == CXL_DECODER_DEVMEM);
+}
+static DEVICE_ATTR_RO(bi);
+
static ssize_t dpa_resource_show(struct device *dev, struct device_attribute *attr,
char *buf)
{
@@ -329,6 +342,7 @@ static struct attribute *cxl_decoder_root_attrs[] = {
&dev_attr_cap_ram.attr,
&dev_attr_cap_type2.attr,
&dev_attr_cap_type3.attr,
+ &dev_attr_cap_bi.attr,
&dev_attr_target_list.attr,
&dev_attr_qos_class.attr,
SET_CXL_REGION_ATTR(create_pmem_region)
@@ -339,16 +353,22 @@ static struct attribute *cxl_decoder_root_attrs[] = {
static bool can_create_pmem(struct cxl_root_decoder *cxlrd)
{
- unsigned long flags = CXL_DECODER_F_TYPE3 | CXL_DECODER_F_PMEM;
+ unsigned long flags = cxlrd->cxlsd.cxld.flags;
+ unsigned long hdm_h = CXL_DECODER_F_TYPE3 | CXL_DECODER_F_PMEM;
+ unsigned long hdm_db = CXL_DECODER_F_TYPE2 | CXL_DECODER_F_BI |
+ CXL_DECODER_F_PMEM;
- return (cxlrd->cxlsd.cxld.flags & flags) == flags;
+ return (flags & hdm_h) == hdm_h || (flags & hdm_db) == hdm_db;
}
static bool can_create_ram(struct cxl_root_decoder *cxlrd)
{
- unsigned long flags = CXL_DECODER_F_TYPE3 | CXL_DECODER_F_RAM;
+ unsigned long flags = cxlrd->cxlsd.cxld.flags;
+ unsigned long hdm_h = CXL_DECODER_F_TYPE3 | CXL_DECODER_F_RAM;
+ unsigned long hdm_db = CXL_DECODER_F_TYPE2 | CXL_DECODER_F_BI |
+ CXL_DECODER_F_RAM;
- return (cxlrd->cxlsd.cxld.flags & flags) == flags;
+ return (flags & hdm_h) == hdm_h || (flags & hdm_db) == hdm_db;
}
static umode_t cxl_root_decoder_visible(struct kobject *kobj, struct attribute *a, int n)
@@ -402,6 +422,7 @@ static const struct attribute_group *cxl_decoder_switch_attribute_groups[] = {
static struct attribute *cxl_decoder_endpoint_attrs[] = {
&dev_attr_target_type.attr,
&dev_attr_mode.attr,
+ &dev_attr_bi.attr,
&dev_attr_dpa_size.attr,
&dev_attr_dpa_resource.attr,
SET_CXL_REGION_ATTR(region)
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index e50dc716d4e8..3f9ddacfae4e 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -1129,17 +1129,6 @@ static int cxl_rr_assign_decoder(struct cxl_port *port, struct cxl_region *cxlr,
return -EBUSY;
}
- /*
- * Endpoints should already match the region type, but backstop that
- * assumption with an assertion. Switch-decoders change mapping-type
- * based on what is mapped when they are assigned to a region.
- */
- dev_WARN_ONCE(&cxlr->dev,
- port == cxled_to_port(cxled) &&
- cxld->target_type != cxlr->type,
- "%s:%s mismatch decoder type %d -> %d\n",
- dev_name(&cxled_to_memdev(cxled)->dev),
- dev_name(&cxld->dev), cxld->target_type, cxlr->type);
cxld->target_type = cxlr->type;
cxl_rr->decoder = cxld;
return 0;
@@ -2034,6 +2023,7 @@ static int cxl_region_attach(struct cxl_region *cxlr,
struct cxl_region_params *p = &cxlr->params;
struct cxl_port *ep_port, *root_port;
struct cxl_dport *dport;
+ struct cxl_hdm *cxlhdm;
int rc = -ENXIO;
rc = check_interleave_cap(&cxled->cxld, p->interleave_ways,
@@ -2083,10 +2073,31 @@ static int cxl_region_attach(struct cxl_region *cxlr,
return -ENXIO;
}
- if (cxled->cxld.target_type != cxlr->type) {
- dev_dbg(&cxlr->dev, "%s:%s type mismatch: %d vs %d\n",
- dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
- cxled->cxld.target_type, cxlr->type);
+ /*
+ * Verify the device and HDM are capable of the region's flavor before
+ * proceeding. The endpoint decoder's target_type is then inherited
+ * from cxlr->type later in cxl_rr_assign_decoder().
+ */
+ if (cxlr->type == CXL_DECODER_DEVMEM &&
+ cxlds->type == CXL_DEVTYPE_CLASSMEM && !cxlds->bi) {
+ dev_err(&cxlr->dev, "%s:%s BI not enabled on device\n",
+ dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev));
+ return -ENXIO;
+ }
+
+ cxlhdm = dev_get_drvdata(&ep_port->dev);
+ if (!cxlhdm)
+ return -ENXIO;
+ if (cxlr->type == CXL_DECODER_HOSTONLYMEM &&
+ cxlhdm->supported_coherency == CXL_HDM_DECODER_COHERENCY_DEV) {
+ dev_dbg(&cxlr->dev, "%s:%s HDM is device-coherent only\n",
+ dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev));
+ return -ENXIO;
+ }
+ if (cxlr->type == CXL_DECODER_DEVMEM &&
+ cxlhdm->supported_coherency == CXL_HDM_DECODER_COHERENCY_HOST) {
+ dev_dbg(&cxlr->dev, "%s:%s HDM is host-only coherent\n",
+ dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev));
return -ENXIO;
}
@@ -2772,6 +2783,7 @@ static ssize_t create_region_store(struct device *dev, const char *buf,
size_t len, enum cxl_partition_mode mode)
{
struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
+ enum cxl_decoder_type target_type;
struct cxl_region *cxlr;
int rc, id;
@@ -2779,7 +2791,14 @@ static ssize_t create_region_store(struct device *dev, const char *buf,
if (rc != 1)
return -EINVAL;
- cxlr = __create_region(cxlrd, mode, id, CXL_DECODER_HOSTONLYMEM);
+ /*
+ * The CFMWS dictates endpoint coherency: a BI-restricted window
+ * produces an HDM-DB region; otherwise HDM-H.
+ */
+ target_type = (cxlrd->cxlsd.cxld.flags & CXL_DECODER_F_BI) ?
+ CXL_DECODER_DEVMEM : CXL_DECODER_HOSTONLYMEM;
+
+ cxlr = __create_region(cxlrd, mode, id, target_type);
if (IS_ERR(cxlr))
return PTR_ERR(cxlr);
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 917a6aee5936..73d4d2d4616d 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -287,6 +287,7 @@ int cxl_dport_map_rcd_linkcap(struct pci_dev *pdev, struct cxl_dport *dport);
#define CXL_DECODER_F_LOCK BIT(4)
#define CXL_DECODER_F_ENABLE BIT(5)
#define CXL_DECODER_F_NORMALIZED_ADDRESSING BIT(6)
+#define CXL_DECODER_F_BI BIT(7)
#define CXL_DECODER_F_RESET_MASK (CXL_DECODER_F_ENABLE | CXL_DECODER_F_LOCK)
enum cxl_decoder_type {
--
2.39.5
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 5/5] cxl/hdm: Rename decoder coherency flags
2026-06-05 0:33 [PATCH v4 0/5] cxl: Support Back-Invalidate Davidlohr Bueso
` (3 preceding siblings ...)
2026-06-05 0:33 ` [PATCH 4/5] cxl: Add HDM-DB region creation Davidlohr Bueso
@ 2026-06-05 0:33 ` Davidlohr Bueso
2026-06-05 2:38 ` [PATCH v4 0/5] cxl: Support Back-Invalidate Richard Cheng
5 siblings, 0 replies; 15+ messages in thread
From: Davidlohr Bueso @ 2026-06-05 0:33 UTC (permalink / raw)
To: dave.jiang, djbw
Cc: jic23, benjamin.cheatham, alucerop, alison.schofield, gourry,
dongjoo.seo1, dave, linux-cxl
Align with the ACPI CXL Window restriction naming and convert
CXL_DECODER_F_TYPE2/F_TYPE3 to F_DEVMEM/F_HOSTONLY. Type2 and
Type3 coherency models was named prior to Back-Invalidate.
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
---
drivers/cxl/acpi.c | 6 +++---
drivers/cxl/core/port.c | 12 ++++++------
drivers/cxl/cxl.h | 4 ++--
3 files changed, 11 insertions(+), 11 deletions(-)
diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index 6a9ba82719e9..087dd1e86d40 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -143,9 +143,9 @@ static unsigned long cfmws_to_decoder_flags(int restrictions)
unsigned long flags = CXL_DECODER_F_ENABLE;
if (restrictions & ACPI_CEDT_CFMWS_RESTRICT_DEVMEM)
- flags |= CXL_DECODER_F_TYPE2;
+ flags |= CXL_DECODER_F_DEVMEM;
if (restrictions & ACPI_CEDT_CFMWS_RESTRICT_HOSTONLYMEM)
- flags |= CXL_DECODER_F_TYPE3;
+ flags |= CXL_DECODER_F_HOSTONLY;
if (restrictions & ACPI_CEDT_CFMWS_RESTRICT_VOLATILE)
flags |= CXL_DECODER_F_RAM;
if (restrictions & ACPI_CEDT_CFMWS_RESTRICT_PMEM)
@@ -445,7 +445,7 @@ static int __cxl_parse_cfmws(struct acpi_cedt_cfmws *cfmws,
cxld = &cxlrd->cxlsd.cxld;
cxld->flags = cfmws_to_decoder_flags(cfmws->restrictions);
- cxld->target_type = (cxld->flags & CXL_DECODER_F_TYPE2) ?
+ cxld->target_type = (cxld->flags & CXL_DECODER_F_DEVMEM) ?
CXL_DECODER_DEVMEM : CXL_DECODER_HOSTONLYMEM;
cxld->hpa_range = (struct range) {
.start = cfmws->base_hpa,
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 2b463a8b06a7..495a5fe869a1 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -128,8 +128,8 @@ static DEVICE_ATTR_RO(name)
CXL_DECODER_FLAG_ATTR(cap_pmem, CXL_DECODER_F_PMEM);
CXL_DECODER_FLAG_ATTR(cap_ram, CXL_DECODER_F_RAM);
-CXL_DECODER_FLAG_ATTR(cap_type2, CXL_DECODER_F_TYPE2);
-CXL_DECODER_FLAG_ATTR(cap_type3, CXL_DECODER_F_TYPE3);
+CXL_DECODER_FLAG_ATTR(cap_type2, CXL_DECODER_F_DEVMEM);
+CXL_DECODER_FLAG_ATTR(cap_type3, CXL_DECODER_F_HOSTONLY);
CXL_DECODER_FLAG_ATTR(locked, CXL_DECODER_F_LOCK);
CXL_DECODER_FLAG_ATTR(cap_bi, CXL_DECODER_F_BI);
@@ -354,8 +354,8 @@ static struct attribute *cxl_decoder_root_attrs[] = {
static bool can_create_pmem(struct cxl_root_decoder *cxlrd)
{
unsigned long flags = cxlrd->cxlsd.cxld.flags;
- unsigned long hdm_h = CXL_DECODER_F_TYPE3 | CXL_DECODER_F_PMEM;
- unsigned long hdm_db = CXL_DECODER_F_TYPE2 | CXL_DECODER_F_BI |
+ unsigned long hdm_h = CXL_DECODER_F_HOSTONLY | CXL_DECODER_F_PMEM;
+ unsigned long hdm_db = CXL_DECODER_F_DEVMEM | CXL_DECODER_F_BI |
CXL_DECODER_F_PMEM;
return (flags & hdm_h) == hdm_h || (flags & hdm_db) == hdm_db;
@@ -364,8 +364,8 @@ static bool can_create_pmem(struct cxl_root_decoder *cxlrd)
static bool can_create_ram(struct cxl_root_decoder *cxlrd)
{
unsigned long flags = cxlrd->cxlsd.cxld.flags;
- unsigned long hdm_h = CXL_DECODER_F_TYPE3 | CXL_DECODER_F_RAM;
- unsigned long hdm_db = CXL_DECODER_F_TYPE2 | CXL_DECODER_F_BI |
+ unsigned long hdm_h = CXL_DECODER_F_HOSTONLY | CXL_DECODER_F_RAM;
+ unsigned long hdm_db = CXL_DECODER_F_DEVMEM | CXL_DECODER_F_BI |
CXL_DECODER_F_RAM;
return (flags & hdm_h) == hdm_h || (flags & hdm_db) == hdm_db;
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 73d4d2d4616d..bab8ca3dc2a4 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -282,8 +282,8 @@ int cxl_dport_map_rcd_linkcap(struct pci_dev *pdev, struct cxl_dport *dport);
*/
#define CXL_DECODER_F_RAM BIT(0)
#define CXL_DECODER_F_PMEM BIT(1)
-#define CXL_DECODER_F_TYPE2 BIT(2)
-#define CXL_DECODER_F_TYPE3 BIT(3)
+#define CXL_DECODER_F_DEVMEM BIT(2)
+#define CXL_DECODER_F_HOSTONLY BIT(3)
#define CXL_DECODER_F_LOCK BIT(4)
#define CXL_DECODER_F_ENABLE BIT(5)
#define CXL_DECODER_F_NORMALIZED_ADDRESSING BIT(6)
--
2.39.5
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 2/5] cxl/pci: Add BI topology enable/disable
2026-06-05 0:33 ` [PATCH 2/5] cxl/pci: Add BI topology enable/disable Davidlohr Bueso
@ 2026-06-05 1:29 ` sashiko-bot
2026-06-05 2:11 ` Richard Cheng
2026-06-05 2:20 ` Richard Cheng
2026-06-05 2:23 ` Richard Cheng
1 sibling, 2 replies; 15+ messages in thread
From: sashiko-bot @ 2026-06-05 1:29 UTC (permalink / raw)
To: Davidlohr Bueso; +Cc: linux-cxl
Thank you for your contribution! Sashiko AI review found 6 potential issue(s) to consider:
- [High] Missing BI reference counting for shared Downstream Ports breaks BI coherence in cascaded switch topologies.
- [High] A hardware-reported timeout base of 0 causes an infinite polling loop in `readx_poll_timeout`, leading to a kernel hang.
- [High] Missing explicit commit sequence for the BI Decoder on the endpoint device.
- [High] Topology walks in `cxl_bi_setup` and `cxl_bi_dealloc` stop early, failing to configure or rollback BI on the Root Port.
- [High] `cxl_bi_dealloc` returns early and fails to disable BI on the endpoint hardware if the upstream port topology is unbound prior to memdev teardown.
- [Medium] Mixing `goto`-based error handling with `__free()` scope cleanup attributes in the same function violates subsystem guidelines.
--
commit 6dd8e8661d8e77fc7f88e425d2633e5ccda84b51
Author: Davidlohr Bueso <dave@stgolabs.net>
cxl/pci: Add BI topology enable/disable
Implement cxl_bi_setup() which walks the CXL port topology to enable
BI flows on the device and every component in the path between the
device and the root port. The teardown counterpart, cxl_bi_dealloc(),
runs automatically on cxl_mem unbind.
> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c
[ ... ]
> +static unsigned long __cxl_bi_get_timeout_us(struct device *dev,
> + unsigned int scale,
> + unsigned int base)
> +{
> + static const unsigned long scale_tbl[] = {
> + 1, 10, 100, 1000, 10000, 100000, 1000000, 10000000,
> + };
> +
> + if (scale >= ARRAY_SIZE(scale_tbl)) {
> + dev_dbg(dev, "Invalid BI commit timeout scale: %u\n", scale);
> + return CXL_BI_COMMIT_MAXTMO_US;
> + }
> +
> + return scale_tbl[scale] * base;
> +}
[Severity: High]
If the hardware provides a base of 0, could this return a 0 timeout?
[ ... ]
> +static int __cxl_bi_wait_commit(struct device *dev, void __iomem *status_reg,
> + u32 committed_bit, u32 err_bit,
> + unsigned int scale, unsigned int base)
> +{
> + unsigned long tmo_us, poll_us;
> + ktime_t start;
> + u32 status;
> + int rc;
> +
> + tmo_us = min_t(unsigned long, CXL_BI_COMMIT_MAXTMO_US,
> + __cxl_bi_get_timeout_us(dev, scale, base));
> + poll_us = max_t(unsigned long, tmo_us / 10, 1); /* ~10% */
> + start = ktime_get();
> +
> + rc = readx_poll_timeout(readl, status_reg, status,
> + status & (committed_bit | err_bit),
> + poll_us, tmo_us);
[Severity: High]
Does a tmo_us of 0 passed to readx_poll_timeout cause it to poll
indefinitely?
If a faulty or malicious device sets base to 0 and never sets the commit
or error bits, this looks like it might result in a kernel thread spinning
forever.
[ ... ]
> +static int __cxl_bi_ctrl_dport(struct cxl_dport *dport, bool enable)
> +{
> + struct pci_dev *pdev = to_pci_dev(dport->dport_dev);
> + void __iomem *bi = dport->regs.bi_decoder;
> + struct cxl_port *port = dport->port;
> + u32 ctrl, value;
> + int rc;
> +
> + guard(device)(&port->dev);
> + if (!bi)
> + return -EINVAL;
> +
> + ctrl = readl(bi + CXL_BI_DECODER_CTRL_OFFSET);
> +
> + switch (pci_pcie_type(pdev)) {
> + case PCI_EXP_TYPE_ROOT_PORT:
> + if (enable) {
> + /*
> + * There is no point of failure from here on,
> + * BI will be enabled on the endpoint device.
> + */
> + dport->nr_bi++;
[ ... ]
> + case PCI_EXP_TYPE_DOWNSTREAM:
> + if (enable) {
> + value = ctrl & ~CXL_BI_DECODER_CTRL_BI_FW;
> + value |= CXL_BI_DECODER_CTRL_BI_ENABLE;
> + } else {
> + if (!FIELD_GET(CXL_BI_DECODER_CTRL_BI_ENABLE, ctrl))
> + return 0;
> + value = ctrl & ~(CXL_BI_DECODER_CTRL_BI_FW |
> + CXL_BI_DECODER_CTRL_BI_ENABLE);
> + }
[Severity: High]
Should this case maintain a reference count for the downstream port similar
to how dport->nr_bi is used in the PCI_EXP_TYPE_ROOT_PORT case?
In cascaded switch topologies, an upstream switch's downstream port can be
shared by multiple downstream endpoints. If one endpoint is unbound,
cxl_bi_dealloc will clear the BI_ENABLE bit on the shared downstream port
unconditionally.
Could this silently break Back-Invalidate for all other active endpoints
relying on that same path?
[ ... ]
> +static int __cxl_bi_ctrl_endpoint(struct cxl_dev_state *cxlds, bool enable)
> +{
> + struct cxl_port *endpoint = cxlds->cxlmd->endpoint;
> + void __iomem *bi = endpoint->regs.bi_decoder;
> + u32 ctrl, val;
> +
> + if (!bi)
> + return -EINVAL;
> +
> + ctrl = readl(bi + CXL_BI_DECODER_CTRL_OFFSET);
> +
> + if (enable) {
> + if (FIELD_GET(CXL_BI_DECODER_CTRL_BI_ENABLE, ctrl)) {
> + if (cxlds->bi)
> + return 0;
> + dev_err(cxlds->dev,
> + "BI already enabled in hardware\n");
> + return -EBUSY;
> + }
> + val = ctrl | CXL_BI_DECODER_CTRL_BI_ENABLE;
> + } else {
> + if (!FIELD_GET(CXL_BI_DECODER_CTRL_BI_ENABLE, ctrl)) {
> + if (!cxlds->bi)
> + return 0;
> + dev_err(cxlds->dev,
> + "BI already disabled in hardware\n");
> + return -EBUSY;
> + }
> + val = ctrl & ~CXL_BI_DECODER_CTRL_BI_ENABLE;
> + }
> +
> + writel(val, bi + CXL_BI_DECODER_CTRL_OFFSET);
> + cxlds->bi = enable;
> +
> + dev_dbg(cxlds->dev, "BI requests %s\n",
> + str_enabled_disabled(enable));
> +
> + return 0;
> +}
[Severity: High]
Is an explicit commit sequence required here after writing the control
register?
Unlike __cxl_bi_ctrl_dport which calls __cxl_bi_commit_decoder after
modifying the control register, the endpoint function does not commit
the changes.
If the endpoint's BI decoder requires an explicit commit, it seems the enable
or disable operation might not take effect on the hardware.
[ ... ]
> +static void cxl_bi_dealloc(void *data)
> +{
> + struct cxl_dev_state *cxlds = data;
> + struct cxl_dport *dport_iter, *dport;
> + struct cxl_port *port_iter;
> +
> + if (!dev_is_pci(cxlds->dev))
> + return;
> +
> + struct cxl_port *port __free(put_cxl_port) =
> + cxl_pci_find_port(to_pci_dev(cxlds->dev), &dport);
> + if (!port || !cxlds->bi)
> + return;
> +
> + scoped_guard(rwsem_read, &cxl_rwsem.region)
> + cxl_bi_ctrl_endpoint_disable(cxlds);
[Severity: High]
Does this early return prevent proper endpoint cleanup if the port topology
is torn down before the memdev teardown?
If cxl_pci_find_port returns NULL because the upstream port was removed
or unbound, cxl_bi_ctrl_endpoint_disable is skipped. This leaves the
endpoint hardware with BI_ENABLE set and cxlds->bi equal to true.
If the device is subsequently rebound, cxl_bi_ctrl_endpoint_enable will see
the hardware bit already set, print 'BI already enabled in hardware',
and fail the probe with -EBUSY.
> +
> + port_iter = port;
> + dport_iter = dport;
> + while (!is_cxl_root(port_iter)) {
> + int rc = cxl_bi_ctrl_dport_disable(dport_iter);
> +
> + /* best effort */
> + if (rc)
> + dev_dbg(&port_iter->dev,
> + "BI dport disable failed: %d\n", rc);
> +
> + dport_iter = port_iter->parent_dport;
> + port_iter = dport_iter->port;
> + }
> +}
[Severity: High]
Does this loop terminate before disabling BI on the Root Port?
In the CXL topology, the Root Port is a downstream port (dport) owned by
the CXL Root or Host Bridge (port). When the traversal reaches the Host
Bridge, is_cxl_root(port_iter) evaluates to true and the loop terminates
before executing the loop body for the Root Port's dport.
[ ... ]
> +int cxl_bi_setup(struct cxl_dev_state *cxlds)
> +{
> + struct cxl_port *endpoint = cxlds->cxlmd->endpoint;
> + struct cxl_dport *dport_iter, *dport, *failed;
> + struct cxl_port *port_iter;
> + struct pci_dev *pdev;
> + int rc;
> +
> + if (!dev_is_pci(cxlds->dev))
> + return 0;
> +
> + pdev = to_pci_dev(cxlds->dev);
> + struct cxl_port *port __free(put_cxl_port) =
> + cxl_pci_find_port(pdev, &dport);
> +
> + if (!port)
> + return -EINVAL;
[ ... ]
> + port_iter = port;
> + dport_iter = dport;
> + while (!is_cxl_root(port_iter)) {
> + rc = cxl_bi_ctrl_dport_enable(dport_iter);
> + if (rc)
> + goto err_rollback;
[Severity: Medium]
Is it safe to mix goto-based error handling with __free scope cleanup
attributes in the same function?
The Cleanup and Guard Subsystem Details explicitly forbid mixing goto-based
cleanup labels and scope-based __free declarations within the same function
due to the potential for confusing ownership semantics and double-free bugs.
[ ... ]
> +err_rollback:
> + /*
> + * Undo all dports enabled so far, including the failed one.
> + */
> + failed = dport_iter;
> + dport_iter = dport;
> + port_iter = port;
> + while (!is_cxl_root(port_iter)) {
> + cxl_bi_ctrl_dport_disable(dport_iter);
> + if (dport_iter == failed)
> + break;
> + dport_iter = port_iter->parent_dport;
> + port_iter = dport_iter->port;
> + }
> + return rc;
> +}
[Severity: High]
Does this rollback loop also skip the Root Port because of the
is_cxl_root check, similar to the teardown loop?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260605003329.2584012-1-dave@stgolabs.net?part=2
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/5] cxl: Add BI register probing and port initialization
2026-06-05 0:33 ` [PATCH 1/5] cxl: Add BI register probing and port initialization Davidlohr Bueso
@ 2026-06-05 1:33 ` sashiko-bot
2026-06-05 2:03 ` Richard Cheng
0 siblings, 1 reply; 15+ messages in thread
From: sashiko-bot @ 2026-06-05 1:33 UTC (permalink / raw)
To: Davidlohr Bueso; +Cc: linux-cxl
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] Silent failure to map and configure CXL BI Route Table due to probe sequence mismatch
--
commit 34d3cc24691a25107f1dbbcc8e03fc18ffcae164
Author: Davidlohr Bueso <dave@stgolabs.net>
cxl: Add BI register probing and port initialization
This commit adds register probing for BI Route Table and BI Decoder capability
structures. It introduces helpers to map these structures at dport-creation
time and port probe time.
> diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
> index ada51948d52fa..2dd712c5d312a 100644
> --- a/drivers/cxl/port.c
> +++ b/drivers/cxl/port.c
[ ... ]
> @@ -58,6 +58,62 @@ static int discover_region(struct device *dev, void *unused)
[ ... ]
> +static void cxl_port_map_bi(struct cxl_port *port)
> +{
> + struct cxl_register_map *map = &port->reg_map;
[ ... ]
> + /* map this port's own BI capability */
> + if (is_cxl_endpoint(port)) {
> + if (!map->component_map.bi_decoder.valid) {
> + dev_dbg(&port->dev, "BI Decoder registers not found\n");
> + return;
> + }
> + cap_id = CXL_CM_CAP_CAP_ID_BI_DECODER;
> + } else {
> + if (!map->component_map.bi_rt.valid) {
> + dev_dbg(&port->dev, "BI RT registers not found\n");
> + return;
> + }
[Severity: High]
Does this silently fail to map the BI Route Table registers for switch
ports?
When called from cxl_switch_port_probe(), the component registers have not
been discovered yet. The registers are actually probed and marked valid when
the first downstream port is added, which triggers cxl_port_add_dport() to call
cxl_port_setup_regs().
Because the valid flag is always false here during switch port probe, the map
function returns early and permanently leaves port->regs.bi_rt unmapped. Will
this bypass the required hardware commit protocol for the switch upstream port
and break Back-Invalidate coherency for downstream endpoints?
> + cap_id = CXL_CM_CAP_CAP_ID_BI_RT;
> + }
> +
> + map->host = &port->dev;
> + if (cxl_map_component_regs(map, &port->regs, BIT(cap_id)))
> + dev_dbg(&port->dev, "Failed to map BI capability 0x%x\n",
> + cap_id);
> +}
> +
> static int cxl_switch_port_probe(struct cxl_port *port)
> {
> /* Reset nr_dports for rebind of driver */
[ ... ]
> @@ -66,6 +122,8 @@ static int cxl_switch_port_probe(struct cxl_port *port)
> /* Cache the data early to ensure is_visible() works */
> read_cdat_data(port);
>
> + cxl_port_map_bi(port);
> +
> return 0;
> }
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260605003329.2584012-1-dave@stgolabs.net?part=1
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 4/5] cxl: Add HDM-DB region creation
2026-06-05 0:33 ` [PATCH 4/5] cxl: Add HDM-DB region creation Davidlohr Bueso
@ 2026-06-05 1:34 ` sashiko-bot
2026-06-05 2:37 ` Richard Cheng
1 sibling, 0 replies; 15+ messages in thread
From: sashiko-bot @ 2026-06-05 1:34 UTC (permalink / raw)
To: Davidlohr Bueso; +Cc: linux-cxl
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] Type-2 (CXL_DEVTYPE_DEVMEM) endpoints lacking Back-Invalidation (BI) support are incorrectly allowed to attach to HDM-DB regions, resulting in an invalid HDM-D configuration within a BI-restricted window.
--
commit 17274d5d8a1db5fc909bbe15ef4736f49f5f5bb1
Author: Davidlohr Bueso <dave@stgolabs.net>
cxl: Add HDM-DB region creation
A region inherits its coherency from the chosen root decoder: HDM-DB if the
root has CXL_DECODER_F_BI, otherwise HDM-H. This commit surfaces the
topology through sysfs and enables HDM-DB capability checking.
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index e50dc716d4e82..3f9ddacfae4ea 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
[ ... ]
> @@ -2083,10 +2073,31 @@ static int cxl_region_attach(struct cxl_region *cxlr,
> return -ENXIO;
> }
>
> - if (cxled->cxld.target_type != cxlr->type) {
> - dev_dbg(&cxlr->dev, "%s:%s type mismatch: %d vs %d\n",
> - dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
> - cxled->cxld.target_type, cxlr->type);
> + /*
> + * Verify the device and HDM are capable of the region's flavor before
> + * proceeding. The endpoint decoder's target_type is then inherited
> + * from cxlr->type later in cxl_rr_assign_decoder().
> + */
> + if (cxlr->type == CXL_DECODER_DEVMEM &&
> + cxlds->type == CXL_DEVTYPE_CLASSMEM && !cxlds->bi) {
[Severity: High]
Does this capability check incorrectly allow Type-2 devices without back
invalidation support to bypass the validation?
Since this check requires cxlds->type == CXL_DEVTYPE_CLASSMEM (Type-3
devices), Type-2 devices bypass it. If they attach to an HDM-DB region,
cxld_set_type() in drivers/cxl/core/hdm.c will configure the decoder as
HDM-D (BI=0) because of this logic:
cxld_set_type()
bi = bi && cxlds->bi;
Will this violate the host bridge's coherency expectations by allowing an
HDM-D configuration within a BI-restricted window?
> + dev_err(&cxlr->dev, "%s:%s BI not enabled on device\n",
> + dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev));
> + return -ENXIO;
> + }
> +
> + cxlhdm = dev_get_drvdata(&ep_port->dev);
> + if (!cxlhdm)
> + return -ENXIO;
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260605003329.2584012-1-dave@stgolabs.net?part=4
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/5] cxl: Add BI register probing and port initialization
2026-06-05 1:33 ` sashiko-bot
@ 2026-06-05 2:03 ` Richard Cheng
0 siblings, 0 replies; 15+ messages in thread
From: Richard Cheng @ 2026-06-05 2:03 UTC (permalink / raw)
To: sashiko-reviews; +Cc: Davidlohr Bueso, linux-cxl
On Fri, Jun 05, 2026 at 01:33:07AM +0800, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
> - [High] Silent failure to map and configure CXL BI Route Table due to probe sequence mismatch
> --
>
> commit 34d3cc24691a25107f1dbbcc8e03fc18ffcae164
> Author: Davidlohr Bueso <dave@stgolabs.net>
>
> cxl: Add BI register probing and port initialization
>
> This commit adds register probing for BI Route Table and BI Decoder capability
> structures. It introduces helpers to map these structures at dport-creation
> time and port probe time.
>
> > diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
> > index ada51948d52fa..2dd712c5d312a 100644
> > --- a/drivers/cxl/port.c
> > +++ b/drivers/cxl/port.c
> [ ... ]
> > @@ -58,6 +58,62 @@ static int discover_region(struct device *dev, void *unused)
> [ ... ]
> > +static void cxl_port_map_bi(struct cxl_port *port)
> > +{
> > + struct cxl_register_map *map = &port->reg_map;
> [ ... ]
> > + /* map this port's own BI capability */
> > + if (is_cxl_endpoint(port)) {
> > + if (!map->component_map.bi_decoder.valid) {
> > + dev_dbg(&port->dev, "BI Decoder registers not found\n");
> > + return;
> > + }
> > + cap_id = CXL_CM_CAP_CAP_ID_BI_DECODER;
> > + } else {
> > + if (!map->component_map.bi_rt.valid) {
> > + dev_dbg(&port->dev, "BI RT registers not found\n");
> > + return;
> > + }
>
> [Severity: High]
> Does this silently fail to map the BI Route Table registers for switch
> ports?
>
> When called from cxl_switch_port_probe(), the component registers have not
> been discovered yet. The registers are actually probed and marked valid when
> the first downstream port is added, which triggers cxl_port_add_dport() to call
> cxl_port_setup_regs().
>
> Because the valid flag is always false here during switch port probe, the map
> function returns early and permanently leaves port->regs.bi_rt unmapped. Will
> this bypass the required hardware commit protocol for the switch upstream port
> and break Back-Invalidate coherency for downstream endpoints?
>
> > + cap_id = CXL_CM_CAP_CAP_ID_BI_RT;
> > + }
> > +
> > + map->host = &port->dev;
> > + if (cxl_map_component_regs(map, &port->regs, BIT(cap_id)))
> > + dev_dbg(&port->dev, "Failed to map BI capability 0x%x\n",
> > + cap_id);
> > +}
> > +
> > static int cxl_switch_port_probe(struct cxl_port *port)
> > {
> > /* Reset nr_dports for rebind of driver */
> [ ... ]
> > @@ -66,6 +122,8 @@ static int cxl_switch_port_probe(struct cxl_port *port)
> > /* Cache the data early to ensure is_visible() works */
> > read_cdat_data(port);
> >
> > + cxl_port_map_bi(port);
> > +
> > return 0;
> > }
>
Hi Davidlohr,
cxl_port_map_bi() is called here before any dport exists.
Component register discovery happends later in cxl_port_add_dport() and then
cxl_port_setup_regs() on the first dport.
So for switch USP, bi_rt.valid is false at probe time, mapping is skipped
and never retried. I saw from your cover letter's dmesg
"""
cxl_port port2: BI RT registers not found ← probe-time map_bi
...
cxl_port port2: found BI RT capability ← later, on first dport
"""
Endpoints and RP/DSP paths are fine, but switch BI RT mapping might
be the gap.
Let me know what you think, please correct me if I'm wrong, thanks
Best regards,
Richard Cheng.
> --
> Sashiko AI review · https://sashiko.dev/#/patchset/20260605003329.2584012-1-dave@stgolabs.net?part=1
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/5] cxl/pci: Add BI topology enable/disable
2026-06-05 1:29 ` sashiko-bot
@ 2026-06-05 2:11 ` Richard Cheng
2026-06-05 2:20 ` Richard Cheng
1 sibling, 0 replies; 15+ messages in thread
From: Richard Cheng @ 2026-06-05 2:11 UTC (permalink / raw)
To: sashiko-reviews; +Cc: Davidlohr Bueso, linux-cxl
On Fri, Jun 05, 2026 at 01:29:39AM +0800, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 6 potential issue(s) to consider:
> - [High] Missing BI reference counting for shared Downstream Ports breaks BI coherence in cascaded switch topologies.
> - [High] A hardware-reported timeout base of 0 causes an infinite polling loop in `readx_poll_timeout`, leading to a kernel hang.
> - [High] Missing explicit commit sequence for the BI Decoder on the endpoint device.
> - [High] Topology walks in `cxl_bi_setup` and `cxl_bi_dealloc` stop early, failing to configure or rollback BI on the Root Port.
> - [High] `cxl_bi_dealloc` returns early and fails to disable BI on the endpoint hardware if the upstream port topology is unbound prior to memdev teardown.
> - [Medium] Mixing `goto`-based error handling with `__free()` scope cleanup attributes in the same function violates subsystem guidelines.
> --
>
> commit 6dd8e8661d8e77fc7f88e425d2633e5ccda84b51
> Author: Davidlohr Bueso <dave@stgolabs.net>
>
> cxl/pci: Add BI topology enable/disable
>
> Implement cxl_bi_setup() which walks the CXL port topology to enable
> BI flows on the device and every component in the path between the
> device and the root port. The teardown counterpart, cxl_bi_dealloc(),
> runs automatically on cxl_mem unbind.
>
> > diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> > --- a/drivers/cxl/core/pci.c
> > +++ b/drivers/cxl/core/pci.c
>
> [ ... ]
>
> > +static unsigned long __cxl_bi_get_timeout_us(struct device *dev,
> > + unsigned int scale,
> > + unsigned int base)
> > +{
> > + static const unsigned long scale_tbl[] = {
> > + 1, 10, 100, 1000, 10000, 100000, 1000000, 10000000,
> > + };
> > +
> > + if (scale >= ARRAY_SIZE(scale_tbl)) {
> > + dev_dbg(dev, "Invalid BI commit timeout scale: %u\n", scale);
> > + return CXL_BI_COMMIT_MAXTMO_US;
> > + }
> > +
> > + return scale_tbl[scale] * base;
> > +}
>
> [Severity: High]
> If the hardware provides a base of 0, could this return a 0 timeout?
>
> [ ... ]
>
> > +static int __cxl_bi_wait_commit(struct device *dev, void __iomem *status_reg,
> > + u32 committed_bit, u32 err_bit,
> > + unsigned int scale, unsigned int base)
> > +{
> > + unsigned long tmo_us, poll_us;
> > + ktime_t start;
> > + u32 status;
> > + int rc;
> > +
> > + tmo_us = min_t(unsigned long, CXL_BI_COMMIT_MAXTMO_US,
> > + __cxl_bi_get_timeout_us(dev, scale, base));
> > + poll_us = max_t(unsigned long, tmo_us / 10, 1); /* ~10% */
> > + start = ktime_get();
> > +
> > + rc = readx_poll_timeout(readl, status_reg, status,
> > + status & (committed_bit | err_bit),
> > + poll_us, tmo_us);
>
> [Severity: High]
> Does a tmo_us of 0 passed to readx_poll_timeout cause it to poll
> indefinitely?
>
> If a faulty or malicious device sets base to 0 and never sets the commit
> or error bits, this looks like it might result in a kernel thread spinning
> forever.
>
> [ ... ]
>
> > +static int __cxl_bi_ctrl_dport(struct cxl_dport *dport, bool enable)
> > +{
> > + struct pci_dev *pdev = to_pci_dev(dport->dport_dev);
> > + void __iomem *bi = dport->regs.bi_decoder;
> > + struct cxl_port *port = dport->port;
> > + u32 ctrl, value;
> > + int rc;
> > +
> > + guard(device)(&port->dev);
> > + if (!bi)
> > + return -EINVAL;
> > +
> > + ctrl = readl(bi + CXL_BI_DECODER_CTRL_OFFSET);
> > +
> > + switch (pci_pcie_type(pdev)) {
> > + case PCI_EXP_TYPE_ROOT_PORT:
> > + if (enable) {
> > + /*
> > + * There is no point of failure from here on,
> > + * BI will be enabled on the endpoint device.
> > + */
> > + dport->nr_bi++;
>
> [ ... ]
>
> > + case PCI_EXP_TYPE_DOWNSTREAM:
> > + if (enable) {
> > + value = ctrl & ~CXL_BI_DECODER_CTRL_BI_FW;
> > + value |= CXL_BI_DECODER_CTRL_BI_ENABLE;
> > + } else {
> > + if (!FIELD_GET(CXL_BI_DECODER_CTRL_BI_ENABLE, ctrl))
> > + return 0;
> > + value = ctrl & ~(CXL_BI_DECODER_CTRL_BI_FW |
> > + CXL_BI_DECODER_CTRL_BI_ENABLE);
> > + }
>
> [Severity: High]
> Should this case maintain a reference count for the downstream port similar
> to how dport->nr_bi is used in the PCI_EXP_TYPE_ROOT_PORT case?
>
> In cascaded switch topologies, an upstream switch's downstream port can be
> shared by multiple downstream endpoints. If one endpoint is unbound,
> cxl_bi_dealloc will clear the BI_ENABLE bit on the shared downstream port
> unconditionally.
>
> Could this silently break Back-Invalidate for all other active endpoints
> relying on that same path?
>
Acked on sashiko-bot's finding, 2 BI-capable endpoints behind
the same DSP can break it, first unbind breaks the second.
Switch DSP unconditionally clear BI_ENABLE on the first memdev unbind.
> [ ... ]
>
> > +static int __cxl_bi_ctrl_endpoint(struct cxl_dev_state *cxlds, bool enable)
> > +{
> > + struct cxl_port *endpoint = cxlds->cxlmd->endpoint;
> > + void __iomem *bi = endpoint->regs.bi_decoder;
> > + u32 ctrl, val;
> > +
> > + if (!bi)
> > + return -EINVAL;
> > +
> > + ctrl = readl(bi + CXL_BI_DECODER_CTRL_OFFSET);
> > +
> > + if (enable) {
> > + if (FIELD_GET(CXL_BI_DECODER_CTRL_BI_ENABLE, ctrl)) {
> > + if (cxlds->bi)
> > + return 0;
> > + dev_err(cxlds->dev,
> > + "BI already enabled in hardware\n");
> > + return -EBUSY;
> > + }
> > + val = ctrl | CXL_BI_DECODER_CTRL_BI_ENABLE;
> > + } else {
> > + if (!FIELD_GET(CXL_BI_DECODER_CTRL_BI_ENABLE, ctrl)) {
> > + if (!cxlds->bi)
> > + return 0;
> > + dev_err(cxlds->dev,
> > + "BI already disabled in hardware\n");
> > + return -EBUSY;
> > + }
> > + val = ctrl & ~CXL_BI_DECODER_CTRL_BI_ENABLE;
> > + }
> > +
> > + writel(val, bi + CXL_BI_DECODER_CTRL_OFFSET);
> > + cxlds->bi = enable;
> > +
> > + dev_dbg(cxlds->dev, "BI requests %s\n",
> > + str_enabled_disabled(enable));
> > +
> > + return 0;
> > +}
>
> [Severity: High]
> Is an explicit commit sequence required here after writing the control
> register?
>
> Unlike __cxl_bi_ctrl_dport which calls __cxl_bi_commit_decoder after
> modifying the control register, the endpoint function does not commit
> the changes.
>
> If the endpoint's BI decoder requires an explicit commit, it seems the enable
> or disable operation might not take effect on the hardware.
>
> [ ... ]
>
> > +static void cxl_bi_dealloc(void *data)
> > +{
> > + struct cxl_dev_state *cxlds = data;
> > + struct cxl_dport *dport_iter, *dport;
> > + struct cxl_port *port_iter;
> > +
> > + if (!dev_is_pci(cxlds->dev))
> > + return;
> > +
> > + struct cxl_port *port __free(put_cxl_port) =
> > + cxl_pci_find_port(to_pci_dev(cxlds->dev), &dport);
> > + if (!port || !cxlds->bi)
> > + return;
> > +
> > + scoped_guard(rwsem_read, &cxl_rwsem.region)
> > + cxl_bi_ctrl_endpoint_disable(cxlds);
>
> [Severity: High]
> Does this early return prevent proper endpoint cleanup if the port topology
> is torn down before the memdev teardown?
>
> If cxl_pci_find_port returns NULL because the upstream port was removed
> or unbound, cxl_bi_ctrl_endpoint_disable is skipped. This leaves the
> endpoint hardware with BI_ENABLE set and cxlds->bi equal to true.
>
> If the device is subsequently rebound, cxl_bi_ctrl_endpoint_enable will see
> the hardware bit already set, print 'BI already enabled in hardware',
> and fail the probe with -EBUSY.
>
> > +
> > + port_iter = port;
> > + dport_iter = dport;
> > + while (!is_cxl_root(port_iter)) {
> > + int rc = cxl_bi_ctrl_dport_disable(dport_iter);
> > +
> > + /* best effort */
> > + if (rc)
> > + dev_dbg(&port_iter->dev,
> > + "BI dport disable failed: %d\n", rc);
> > +
> > + dport_iter = port_iter->parent_dport;
> > + port_iter = dport_iter->port;
> > + }
> > +}
>
> [Severity: High]
> Does this loop terminate before disabling BI on the Root Port?
>
> In the CXL topology, the Root Port is a downstream port (dport) owned by
> the CXL Root or Host Bridge (port). When the traversal reaches the Host
> Bridge, is_cxl_root(port_iter) evaluates to true and the loop terminates
> before executing the loop body for the Root Port's dport.
>
> [ ... ]
>
> > +int cxl_bi_setup(struct cxl_dev_state *cxlds)
> > +{
> > + struct cxl_port *endpoint = cxlds->cxlmd->endpoint;
> > + struct cxl_dport *dport_iter, *dport, *failed;
> > + struct cxl_port *port_iter;
> > + struct pci_dev *pdev;
> > + int rc;
> > +
> > + if (!dev_is_pci(cxlds->dev))
> > + return 0;
> > +
> > + pdev = to_pci_dev(cxlds->dev);
> > + struct cxl_port *port __free(put_cxl_port) =
> > + cxl_pci_find_port(pdev, &dport);
> > +
> > + if (!port)
> > + return -EINVAL;
>
> [ ... ]
>
> > + port_iter = port;
> > + dport_iter = dport;
> > + while (!is_cxl_root(port_iter)) {
> > + rc = cxl_bi_ctrl_dport_enable(dport_iter);
> > + if (rc)
> > + goto err_rollback;
>
> [Severity: Medium]
> Is it safe to mix goto-based error handling with __free scope cleanup
> attributes in the same function?
>
> The Cleanup and Guard Subsystem Details explicitly forbid mixing goto-based
> cleanup labels and scope-based __free declarations within the same function
> due to the potential for confusing ownership semantics and double-free bugs.
>
> [ ... ]
>
> > +err_rollback:
> > + /*
> > + * Undo all dports enabled so far, including the failed one.
> > + */
> > + failed = dport_iter;
> > + dport_iter = dport;
> > + port_iter = port;
> > + while (!is_cxl_root(port_iter)) {
> > + cxl_bi_ctrl_dport_disable(dport_iter);
> > + if (dport_iter == failed)
> > + break;
> > + dport_iter = port_iter->parent_dport;
> > + port_iter = dport_iter->port;
> > + }
> > + return rc;
> > +}
>
> [Severity: High]
> Does this rollback loop also skip the Root Port because of the
> is_cxl_root check, similar to the teardown loop?
>
> --
> Sashiko AI review · https://sashiko.dev/#/patchset/20260605003329.2584012-1-dave@stgolabs.net?part=2
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/5] cxl/pci: Add BI topology enable/disable
2026-06-05 1:29 ` sashiko-bot
2026-06-05 2:11 ` Richard Cheng
@ 2026-06-05 2:20 ` Richard Cheng
1 sibling, 0 replies; 15+ messages in thread
From: Richard Cheng @ 2026-06-05 2:20 UTC (permalink / raw)
To: sashiko-reviews; +Cc: Davidlohr Bueso, linux-cxl
On Fri, Jun 05, 2026 at 01:29:39AM +0800, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 6 potential issue(s) to consider:
> - [High] Missing BI reference counting for shared Downstream Ports breaks BI coherence in cascaded switch topologies.
> - [High] A hardware-reported timeout base of 0 causes an infinite polling loop in `readx_poll_timeout`, leading to a kernel hang.
> - [High] Missing explicit commit sequence for the BI Decoder on the endpoint device.
> - [High] Topology walks in `cxl_bi_setup` and `cxl_bi_dealloc` stop early, failing to configure or rollback BI on the Root Port.
> - [High] `cxl_bi_dealloc` returns early and fails to disable BI on the endpoint hardware if the upstream port topology is unbound prior to memdev teardown.
> - [Medium] Mixing `goto`-based error handling with `__free()` scope cleanup attributes in the same function violates subsystem guidelines.
> --
>
> commit 6dd8e8661d8e77fc7f88e425d2633e5ccda84b51
> Author: Davidlohr Bueso <dave@stgolabs.net>
>
> cxl/pci: Add BI topology enable/disable
>
> Implement cxl_bi_setup() which walks the CXL port topology to enable
> BI flows on the device and every component in the path between the
> device and the root port. The teardown counterpart, cxl_bi_dealloc(),
> runs automatically on cxl_mem unbind.
>
> > diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> > --- a/drivers/cxl/core/pci.c
> > +++ b/drivers/cxl/core/pci.c
>
> [ ... ]
>
> > +static unsigned long __cxl_bi_get_timeout_us(struct device *dev,
> > + unsigned int scale,
> > + unsigned int base)
> > +{
> > + static const unsigned long scale_tbl[] = {
> > + 1, 10, 100, 1000, 10000, 100000, 1000000, 10000000,
> > + };
> > +
> > + if (scale >= ARRAY_SIZE(scale_tbl)) {
> > + dev_dbg(dev, "Invalid BI commit timeout scale: %u\n", scale);
> > + return CXL_BI_COMMIT_MAXTMO_US;
> > + }
> > +
> > + return scale_tbl[scale] * base;
> > +}
>
> [Severity: High]
> If the hardware provides a base of 0, could this return a 0 timeout?
>
Yes, if base is 0, "readx_poll_timeout()" will never timeout
and thus hang there forever.
"tmo_us = min(5s, 0)" will result in 0, making tmo_us == 0
> [ ... ]
>
> > +static int __cxl_bi_wait_commit(struct device *dev, void __iomem *status_reg,
> > + u32 committed_bit, u32 err_bit,
> > + unsigned int scale, unsigned int base)
> > +{
> > + unsigned long tmo_us, poll_us;
> > + ktime_t start;
> > + u32 status;
> > + int rc;
> > +
> > + tmo_us = min_t(unsigned long, CXL_BI_COMMIT_MAXTMO_US,
> > + __cxl_bi_get_timeout_us(dev, scale, base));
> > + poll_us = max_t(unsigned long, tmo_us / 10, 1); /* ~10% */
> > + start = ktime_get();
> > +
> > + rc = readx_poll_timeout(readl, status_reg, status,
> > + status & (committed_bit | err_bit),
> > + poll_us, tmo_us);
>
> [Severity: High]
> Does a tmo_us of 0 passed to readx_poll_timeout cause it to poll
> indefinitely?
>
> If a faulty or malicious device sets base to 0 and never sets the commit
> or error bits, this looks like it might result in a kernel thread spinning
> forever.
>
Acked on this.
We should consider this situation.
> [ ... ]
>
> > +static int __cxl_bi_ctrl_dport(struct cxl_dport *dport, bool enable)
> > +{
> > + struct pci_dev *pdev = to_pci_dev(dport->dport_dev);
> > + void __iomem *bi = dport->regs.bi_decoder;
> > + struct cxl_port *port = dport->port;
> > + u32 ctrl, value;
> > + int rc;
> > +
> > + guard(device)(&port->dev);
> > + if (!bi)
> > + return -EINVAL;
> > +
> > + ctrl = readl(bi + CXL_BI_DECODER_CTRL_OFFSET);
> > +
> > + switch (pci_pcie_type(pdev)) {
> > + case PCI_EXP_TYPE_ROOT_PORT:
> > + if (enable) {
> > + /*
> > + * There is no point of failure from here on,
> > + * BI will be enabled on the endpoint device.
> > + */
> > + dport->nr_bi++;
>
> [ ... ]
>
> > + case PCI_EXP_TYPE_DOWNSTREAM:
> > + if (enable) {
> > + value = ctrl & ~CXL_BI_DECODER_CTRL_BI_FW;
> > + value |= CXL_BI_DECODER_CTRL_BI_ENABLE;
> > + } else {
> > + if (!FIELD_GET(CXL_BI_DECODER_CTRL_BI_ENABLE, ctrl))
> > + return 0;
> > + value = ctrl & ~(CXL_BI_DECODER_CTRL_BI_FW |
> > + CXL_BI_DECODER_CTRL_BI_ENABLE);
> > + }
>
> [Severity: High]
> Should this case maintain a reference count for the downstream port similar
> to how dport->nr_bi is used in the PCI_EXP_TYPE_ROOT_PORT case?
>
> In cascaded switch topologies, an upstream switch's downstream port can be
> shared by multiple downstream endpoints. If one endpoint is unbound,
> cxl_bi_dealloc will clear the BI_ENABLE bit on the shared downstream port
> unconditionally.
>
> Could this silently break Back-Invalidate for all other active endpoints
> relying on that same path?
>
> [ ... ]
>
> > +static int __cxl_bi_ctrl_endpoint(struct cxl_dev_state *cxlds, bool enable)
> > +{
> > + struct cxl_port *endpoint = cxlds->cxlmd->endpoint;
> > + void __iomem *bi = endpoint->regs.bi_decoder;
> > + u32 ctrl, val;
> > +
> > + if (!bi)
> > + return -EINVAL;
> > +
> > + ctrl = readl(bi + CXL_BI_DECODER_CTRL_OFFSET);
> > +
> > + if (enable) {
> > + if (FIELD_GET(CXL_BI_DECODER_CTRL_BI_ENABLE, ctrl)) {
> > + if (cxlds->bi)
> > + return 0;
> > + dev_err(cxlds->dev,
> > + "BI already enabled in hardware\n");
> > + return -EBUSY;
> > + }
> > + val = ctrl | CXL_BI_DECODER_CTRL_BI_ENABLE;
> > + } else {
> > + if (!FIELD_GET(CXL_BI_DECODER_CTRL_BI_ENABLE, ctrl)) {
> > + if (!cxlds->bi)
> > + return 0;
> > + dev_err(cxlds->dev,
> > + "BI already disabled in hardware\n");
> > + return -EBUSY;
> > + }
> > + val = ctrl & ~CXL_BI_DECODER_CTRL_BI_ENABLE;
> > + }
> > +
> > + writel(val, bi + CXL_BI_DECODER_CTRL_OFFSET);
> > + cxlds->bi = enable;
> > +
> > + dev_dbg(cxlds->dev, "BI requests %s\n",
> > + str_enabled_disabled(enable));
> > +
> > + return 0;
> > +}
>
> [Severity: High]
> Is an explicit commit sequence required here after writing the control
> register?
>
> Unlike __cxl_bi_ctrl_dport which calls __cxl_bi_commit_decoder after
> modifying the control register, the endpoint function does not commit
> the changes.
>
> If the endpoint's BI decoder requires an explicit commit, it seems the enable
> or disable operation might not take effect on the hardware.
>
> [ ... ]
>
> > +static void cxl_bi_dealloc(void *data)
> > +{
> > + struct cxl_dev_state *cxlds = data;
> > + struct cxl_dport *dport_iter, *dport;
> > + struct cxl_port *port_iter;
> > +
> > + if (!dev_is_pci(cxlds->dev))
> > + return;
> > +
> > + struct cxl_port *port __free(put_cxl_port) =
> > + cxl_pci_find_port(to_pci_dev(cxlds->dev), &dport);
> > + if (!port || !cxlds->bi)
> > + return;
> > +
> > + scoped_guard(rwsem_read, &cxl_rwsem.region)
> > + cxl_bi_ctrl_endpoint_disable(cxlds);
>
> [Severity: High]
> Does this early return prevent proper endpoint cleanup if the port topology
> is torn down before the memdev teardown?
>
> If cxl_pci_find_port returns NULL because the upstream port was removed
> or unbound, cxl_bi_ctrl_endpoint_disable is skipped. This leaves the
> endpoint hardware with BI_ENABLE set and cxlds->bi equal to true.
>
> If the device is subsequently rebound, cxl_bi_ctrl_endpoint_enable will see
> the hardware bit already set, print 'BI already enabled in hardware',
> and fail the probe with -EBUSY.
>
> > +
> > + port_iter = port;
> > + dport_iter = dport;
> > + while (!is_cxl_root(port_iter)) {
> > + int rc = cxl_bi_ctrl_dport_disable(dport_iter);
> > +
> > + /* best effort */
> > + if (rc)
> > + dev_dbg(&port_iter->dev,
> > + "BI dport disable failed: %d\n", rc);
> > +
> > + dport_iter = port_iter->parent_dport;
> > + port_iter = dport_iter->port;
> > + }
> > +}
>
> [Severity: High]
> Does this loop terminate before disabling BI on the Root Port?
>
> In the CXL topology, the Root Port is a downstream port (dport) owned by
> the CXL Root or Host Bridge (port). When the traversal reaches the Host
> Bridge, is_cxl_root(port_iter) evaluates to true and the loop terminates
> before executing the loop body for the Root Port's dport.
>
> [ ... ]
>
> > +int cxl_bi_setup(struct cxl_dev_state *cxlds)
> > +{
> > + struct cxl_port *endpoint = cxlds->cxlmd->endpoint;
> > + struct cxl_dport *dport_iter, *dport, *failed;
> > + struct cxl_port *port_iter;
> > + struct pci_dev *pdev;
> > + int rc;
> > +
> > + if (!dev_is_pci(cxlds->dev))
> > + return 0;
> > +
> > + pdev = to_pci_dev(cxlds->dev);
> > + struct cxl_port *port __free(put_cxl_port) =
> > + cxl_pci_find_port(pdev, &dport);
> > +
> > + if (!port)
> > + return -EINVAL;
>
> [ ... ]
>
> > + port_iter = port;
> > + dport_iter = dport;
> > + while (!is_cxl_root(port_iter)) {
> > + rc = cxl_bi_ctrl_dport_enable(dport_iter);
> > + if (rc)
> > + goto err_rollback;
>
> [Severity: Medium]
> Is it safe to mix goto-based error handling with __free scope cleanup
> attributes in the same function?
>
> The Cleanup and Guard Subsystem Details explicitly forbid mixing goto-based
> cleanup labels and scope-based __free declarations within the same function
> due to the potential for confusing ownership semantics and double-free bugs.
>
> [ ... ]
>
> > +err_rollback:
> > + /*
> > + * Undo all dports enabled so far, including the failed one.
> > + */
> > + failed = dport_iter;
> > + dport_iter = dport;
> > + port_iter = port;
> > + while (!is_cxl_root(port_iter)) {
> > + cxl_bi_ctrl_dport_disable(dport_iter);
> > + if (dport_iter == failed)
> > + break;
> > + dport_iter = port_iter->parent_dport;
> > + port_iter = dport_iter->port;
> > + }
> > + return rc;
> > +}
>
> [Severity: High]
> Does this rollback loop also skip the Root Port because of the
> is_cxl_root check, similar to the teardown loop?
>
> --
> Sashiko AI review · https://sashiko.dev/#/patchset/20260605003329.2584012-1-dave@stgolabs.net?part=2
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/5] cxl/pci: Add BI topology enable/disable
2026-06-05 0:33 ` [PATCH 2/5] cxl/pci: Add BI topology enable/disable Davidlohr Bueso
2026-06-05 1:29 ` sashiko-bot
@ 2026-06-05 2:23 ` Richard Cheng
1 sibling, 0 replies; 15+ messages in thread
From: Richard Cheng @ 2026-06-05 2:23 UTC (permalink / raw)
To: Davidlohr Bueso
Cc: dave.jiang, djbw, jic23, benjamin.cheatham, alucerop,
alison.schofield, gourry, dongjoo.seo1, linux-cxl
On Thu, Jun 04, 2026 at 05:33:26PM +0800, Davidlohr Bueso wrote:
> Implement cxl_bi_setup(), which walks the CXL port topology to enable
> BI flows on the device and every component in the path between the
> device and the root port. The walk is two-pass: first to verify each
> component advertises BI capability, then to write BI Enable on each
> dport's BI Decoder Control register and on the device. On failure
> during the enable pass, partially-enabled dports are rolled back.
>
> The teardown counterpart, cxl_bi_dealloc(), runs automatically on
> cxl_mem unbind.
>
> cxl_bi_setup() does not touch HDM decoder configuration -- the BI bit
> in each decoder is programmed separately at commit time. After a
> successful setup the device is BI-capable but not yet using BI for
> decode coherence.
>
> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
> ---
> drivers/cxl/core/pci.c | 365 +++++++++++++++++++++++++++++++++++++++++
> drivers/cxl/cxl.h | 28 ++++
> drivers/cxl/mem.c | 4 +
> include/cxl/cxl.h | 2 +
> 4 files changed, 399 insertions(+)
>
> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index d1f487b3d809..291d4f617e71 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c
> @@ -2,11 +2,13 @@
> /* Copyright(c) 2021 Intel Corporation. All rights reserved. */
> #include <linux/units.h>
> #include <linux/io-64-nonatomic-lo-hi.h>
> +#include <linux/iopoll.h>
> #include <linux/device.h>
> #include <linux/delay.h>
> #include <linux/pci.h>
> #include <linux/pci-doe.h>
> #include <linux/aer.h>
> +#include <linux/string_choices.h>
> #include <cxlpci.h>
> #include <cxlmem.h>
> #include <cxl.h>
> @@ -926,3 +928,366 @@ int cxl_port_get_possible_dports(struct cxl_port *port)
>
> return ctx.count;
> }
> +
> +/*
> + * BI requires 256B Flit operation on the link. RP/DSP/endpoint must
> + * also have the BI Decoder cap mapped (@bi); for USPs the BI RT cap
> + * is optional per CXL 4.0 8.2.4.26, so absent @bi is allowed.
> + */
> +static bool cxl_is_bi_capable(struct pci_dev *pdev, void __iomem *bi)
> +{
> + if (!cxl_pci_flit_256(pdev))
> + return false;
> + if (pci_pcie_type(pdev) != PCI_EXP_TYPE_UPSTREAM && !bi) {
> + dev_dbg(&pdev->dev, "No BI Decoder registers.\n");
> + return false;
> + }
> + return true;
> +}
> +
> +/* limit any insane timeouts from hw */
> +#define CXL_BI_COMMIT_MAXTMO_US (5 * USEC_PER_SEC)
> +
> +static unsigned long __cxl_bi_get_timeout_us(struct device *dev,
> + unsigned int scale,
> + unsigned int base)
> +{
> + static const unsigned long scale_tbl[] = {
> + 1, 10, 100, 1000, 10000, 100000, 1000000, 10000000,
> + };
> +
> + if (scale >= ARRAY_SIZE(scale_tbl)) {
> + dev_dbg(dev, "Invalid BI commit timeout scale: %u\n", scale);
> + return CXL_BI_COMMIT_MAXTMO_US;
> + }
> +
> + return scale_tbl[scale] * base;
> +}
> +
> +static int __cxl_bi_wait_commit(struct device *dev, void __iomem *status_reg,
> + u32 committed_bit, u32 err_bit,
> + unsigned int scale, unsigned int base)
> +{
> + unsigned long tmo_us, poll_us;
> + ktime_t start;
> + u32 status;
> + int rc;
> +
> + tmo_us = min_t(unsigned long, CXL_BI_COMMIT_MAXTMO_US,
> + __cxl_bi_get_timeout_us(dev, scale, base));
> + poll_us = max_t(unsigned long, tmo_us / 10, 1); /* ~10% */
> + start = ktime_get();
> +
> + rc = readx_poll_timeout(readl, status_reg, status,
> + status & (committed_bit | err_bit),
> + poll_us, tmo_us);
> + if (rc) {
> + dev_dbg(dev, "BI-ID commit timed out (%luus)\n", tmo_us);
> + return rc; /* -ETIMEDOUT */
> + }
> +
> + if (status & err_bit) {
> + dev_dbg(dev, "BI-ID commit rejected by hardware\n");
> + return -EIO;
> + }
> +
> + dev_dbg(dev, "BI-ID commit wait took %lluus\n",
> + ktime_to_us(ktime_sub(ktime_get(), start)));
> + return 0;
> +}
> +
> +/* BI RT only exists on switch upstream ports. */
> +static int __cxl_bi_commit_rt(struct device *dev, void __iomem *bi)
> +{
> + u32 status, ctrl;
> + unsigned int scale, base;
> +
> + if (!bi)
> + return 0;
> +
> + if (!FIELD_GET(CXL_BI_RT_CAPS_EXPLICIT_COMMIT_REQ,
> + readl(bi + CXL_BI_RT_CAPS_OFFSET)))
> + return 0;
> +
> + ctrl = readl(bi + CXL_BI_RT_CTRL_OFFSET);
> + writel(ctrl & ~CXL_BI_RT_CTRL_BI_COMMIT, bi + CXL_BI_RT_CTRL_OFFSET);
> + writel(ctrl | CXL_BI_RT_CTRL_BI_COMMIT, bi + CXL_BI_RT_CTRL_OFFSET);
> +
> + status = readl(bi + CXL_BI_RT_STATUS_OFFSET);
> + scale = FIELD_GET(CXL_BI_RT_STATUS_BI_COMMIT_TM_SCALE, status);
> + base = FIELD_GET(CXL_BI_RT_STATUS_BI_COMMIT_TM_BASE, status);
> +
> + return __cxl_bi_wait_commit(dev, bi + CXL_BI_RT_STATUS_OFFSET,
> + CXL_BI_RT_STATUS_BI_COMMITTED,
> + CXL_BI_RT_STATUS_BI_ERR_NOT_COMMITTED,
> + scale, base);
> +}
> +
> +static int __cxl_bi_commit_decoder(struct device *dev, void __iomem *bi)
> +{
> + u32 status, ctrl;
> + unsigned int scale, base;
> +
> + if (!bi)
> + return -EINVAL;
> +
> + if (!FIELD_GET(CXL_BI_DECODER_CAPS_EXPLICIT_COMMIT_REQ,
> + readl(bi + CXL_BI_DECODER_CAPS_OFFSET)))
> + return 0;
> +
> + ctrl = readl(bi + CXL_BI_DECODER_CTRL_OFFSET);
> + writel(ctrl & ~CXL_BI_DECODER_CTRL_BI_COMMIT,
> + bi + CXL_BI_DECODER_CTRL_OFFSET);
> + writel(ctrl | CXL_BI_DECODER_CTRL_BI_COMMIT,
> + bi + CXL_BI_DECODER_CTRL_OFFSET);
> +
> + status = readl(bi + CXL_BI_DECODER_STATUS_OFFSET);
> + scale = FIELD_GET(CXL_BI_DECODER_STATUS_BI_COMMIT_TM_SCALE, status);
> + base = FIELD_GET(CXL_BI_DECODER_STATUS_BI_COMMIT_TM_BASE, status);
> +
> + return __cxl_bi_wait_commit(dev, bi + CXL_BI_DECODER_STATUS_OFFSET,
> + CXL_BI_DECODER_STATUS_BI_COMMITTED,
> + CXL_BI_DECODER_STATUS_BI_ERR_NOT_COMMITTED,
> + scale, base);
> +}
> +
> +/* Enable or dealloc BI-ID changes in the given level of the topology. */
> +static int __cxl_bi_ctrl_dport(struct cxl_dport *dport, bool enable)
> +{
> + struct pci_dev *pdev = to_pci_dev(dport->dport_dev);
> + void __iomem *bi = dport->regs.bi_decoder;
> + struct cxl_port *port = dport->port;
> + u32 ctrl, value;
> + int rc;
> +
> + guard(device)(&port->dev);
> + if (!bi)
> + return -EINVAL;
> +
> + ctrl = readl(bi + CXL_BI_DECODER_CTRL_OFFSET);
> +
> + switch (pci_pcie_type(pdev)) {
> + case PCI_EXP_TYPE_ROOT_PORT:
> + if (enable) {
> + /*
> + * There is no point of failure from here on,
> + * BI will be enabled on the endpoint device.
> + */
> + dport->nr_bi++;
> +
> + if (FIELD_GET(CXL_BI_DECODER_CTRL_BI_FW, ctrl) &&
> + !FIELD_GET(CXL_BI_DECODER_CTRL_BI_ENABLE, ctrl))
> + return 0;
> +
> + value = ctrl | CXL_BI_DECODER_CTRL_BI_FW;
> + value &= ~CXL_BI_DECODER_CTRL_BI_ENABLE;
> + } else {
> + if (WARN_ON_ONCE(dport->nr_bi == 0))
> + return -EINVAL;
> + if (--dport->nr_bi > 0)
> + return 0;
> +
> + value = ctrl & ~(CXL_BI_DECODER_CTRL_BI_FW |
> + CXL_BI_DECODER_CTRL_BI_ENABLE);
> + }
> +
> + writel(value, bi + CXL_BI_DECODER_CTRL_OFFSET);
> + return 0;
> + case PCI_EXP_TYPE_DOWNSTREAM:
> + if (enable) {
> + value = ctrl & ~CXL_BI_DECODER_CTRL_BI_FW;
> + value |= CXL_BI_DECODER_CTRL_BI_ENABLE;
> + } else {
> + if (!FIELD_GET(CXL_BI_DECODER_CTRL_BI_ENABLE, ctrl))
> + return 0;
> + value = ctrl & ~(CXL_BI_DECODER_CTRL_BI_FW |
> + CXL_BI_DECODER_CTRL_BI_ENABLE);
> + }
> +
> + writel(value, bi + CXL_BI_DECODER_CTRL_OFFSET);
> +
> + rc = __cxl_bi_commit_decoder(dport->dport_dev, bi);
> + if (rc)
> + return rc;
> +
> + return __cxl_bi_commit_rt(&port->dev, port->regs.bi_rt);
> + default:
> + return -EINVAL;
> + }
> +}
> +
> +static int cxl_bi_ctrl_dport_enable(struct cxl_dport *dport)
> +{
> + return __cxl_bi_ctrl_dport(dport, true);
> +}
> +
> +static int cxl_bi_ctrl_dport_disable(struct cxl_dport *dport)
> +{
> + return __cxl_bi_ctrl_dport(dport, false);
> +}
> +
> +static int __cxl_bi_ctrl_endpoint(struct cxl_dev_state *cxlds, bool enable)
> +{
> + struct cxl_port *endpoint = cxlds->cxlmd->endpoint;
> + void __iomem *bi = endpoint->regs.bi_decoder;
> + u32 ctrl, val;
> +
> + if (!bi)
> + return -EINVAL;
> +
> + ctrl = readl(bi + CXL_BI_DECODER_CTRL_OFFSET);
> +
> + if (enable) {
> + if (FIELD_GET(CXL_BI_DECODER_CTRL_BI_ENABLE, ctrl)) {
> + if (cxlds->bi)
> + return 0;
> + dev_err(cxlds->dev,
> + "BI already enabled in hardware\n");
> + return -EBUSY;
> + }
> + val = ctrl | CXL_BI_DECODER_CTRL_BI_ENABLE;
> + } else {
> + if (!FIELD_GET(CXL_BI_DECODER_CTRL_BI_ENABLE, ctrl)) {
> + if (!cxlds->bi)
> + return 0;
> + dev_err(cxlds->dev,
> + "BI already disabled in hardware\n");
> + return -EBUSY;
> + }
> + val = ctrl & ~CXL_BI_DECODER_CTRL_BI_ENABLE;
> + }
> +
> + writel(val, bi + CXL_BI_DECODER_CTRL_OFFSET);
> + cxlds->bi = enable;
> +
> + dev_dbg(cxlds->dev, "BI requests %s\n",
> + str_enabled_disabled(enable));
> +
> + return 0;
> +}
> +
> +static int cxl_bi_ctrl_endpoint_enable(struct cxl_dev_state *cxlds)
> +{
> + return __cxl_bi_ctrl_endpoint(cxlds, true);
> +}
> +
> +static int cxl_bi_ctrl_endpoint_disable(struct cxl_dev_state *cxlds)
> +{
> + return __cxl_bi_ctrl_endpoint(cxlds, false);
> +}
> +
> +/*
> + * devm teardown on cxl_mem unbind. Endpoint decoders may still be
> + * committed here (cxl_workqueue tears them down asynchronously), but
> + * memory access has been quiesced.
> + */
> +static void cxl_bi_dealloc(void *data)
> +{
> + struct cxl_dev_state *cxlds = data;
> + struct cxl_dport *dport_iter, *dport;
> + struct cxl_port *port_iter;
> +
> + if (!dev_is_pci(cxlds->dev))
> + return;
> +
> + struct cxl_port *port __free(put_cxl_port) =
> + cxl_pci_find_port(to_pci_dev(cxlds->dev), &dport);
> + if (!port || !cxlds->bi)
> + return;
> +
> + scoped_guard(rwsem_read, &cxl_rwsem.region)
> + cxl_bi_ctrl_endpoint_disable(cxlds);
> +
> + port_iter = port;
> + dport_iter = dport;
> + while (!is_cxl_root(port_iter)) {
> + int rc = cxl_bi_ctrl_dport_disable(dport_iter);
> +
> + /* best effort */
> + if (rc)
> + dev_dbg(&port_iter->dev,
> + "BI dport disable failed: %d\n", rc);
> +
> + dport_iter = port_iter->parent_dport;
> + port_iter = dport_iter->port;
> + }
> +}
> +
> +int cxl_bi_setup(struct cxl_dev_state *cxlds)
> +{
> + struct cxl_port *endpoint = cxlds->cxlmd->endpoint;
> + struct cxl_dport *dport_iter, *dport, *failed;
> + struct cxl_port *port_iter;
> + struct pci_dev *pdev;
> + int rc;
> +
> + if (!dev_is_pci(cxlds->dev))
> + return 0;
> +
> + pdev = to_pci_dev(cxlds->dev);
> + struct cxl_port *port __free(put_cxl_port) =
> + cxl_pci_find_port(pdev, &dport);
> +
> + if (!port)
> + return -EINVAL;
> +
> + if (!cxl_is_bi_capable(pdev, endpoint->regs.bi_decoder))
> + return 0;
> +
> + /* walkup the topology twice, first to check, then to enable */
> + port_iter = port;
> + dport_iter = dport;
> + while (!is_cxl_root(port_iter)) {
> + /* check rp, dsp */
> + if (!cxl_is_bi_capable(to_pci_dev(dport_iter->dport_dev),
> + dport_iter->regs.bi_decoder))
> + return -EINVAL;
> +
> + /* check usp */
> + if (dev_is_pci(port_iter->uport_dev) &&
> + pci_pcie_type(to_pci_dev(port_iter->uport_dev)) ==
> + PCI_EXP_TYPE_UPSTREAM)
> + if (!cxl_is_bi_capable(to_pci_dev(port_iter->uport_dev),
> + port_iter->regs.bi_rt))
> + return -EINVAL;
> +
> + dport_iter = port_iter->parent_dport;
> + port_iter = dport_iter->port;
> + }
> +
> + port_iter = port;
> + dport_iter = dport;
> + while (!is_cxl_root(port_iter)) {
> + rc = cxl_bi_ctrl_dport_enable(dport_iter);
> + if (rc)
> + goto err_rollback;
> +
> + dport_iter = port_iter->parent_dport;
> + port_iter = dport_iter->port;
> + }
> +
> + /* finally, enable BI on the device */
> + rc = cxl_bi_ctrl_endpoint_enable(cxlds);
> + if (rc)
> + goto err_rollback;
> +
> + return devm_add_action_or_reset(&cxlds->cxlmd->dev,
> + cxl_bi_dealloc, cxlds);
> +
> +err_rollback:
> + /*
> + * Undo all dports enabled so far, including the failed one.
> + */
> + failed = dport_iter;
> + dport_iter = dport;
> + port_iter = port;
> + while (!is_cxl_root(port_iter)) {
> + cxl_bi_ctrl_dport_disable(dport_iter);
> + if (dport_iter == failed)
> + break;
> + dport_iter = port_iter->parent_dport;
> + port_iter = dport_iter->port;
> + }
> + return rc;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_bi_setup, "CXL");
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 760f51b43891..89c0c6ce24a3 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -167,6 +167,31 @@ static inline int ways_to_eiw(unsigned int ways, u8 *eiw)
> #define CXL_HEADERLOG_SIZE SZ_512
> #define CXL_HEADERLOG_SIZE_U32 SZ_512 / sizeof(u32)
>
> +/* CXL 4.0 8.2.4.26 CXL BI Route Table Capability Structure */
> +#define CXL_BI_RT_CAPS_OFFSET 0x0
> +#define CXL_BI_RT_CAPS_EXPLICIT_COMMIT_REQ BIT(0)
> +#define CXL_BI_RT_CTRL_OFFSET 0x4
> +#define CXL_BI_RT_CTRL_BI_COMMIT BIT(0)
> +#define CXL_BI_RT_STATUS_OFFSET 0x8
> +#define CXL_BI_RT_STATUS_BI_COMMITTED BIT(0)
> +#define CXL_BI_RT_STATUS_BI_ERR_NOT_COMMITTED BIT(1)
> +#define CXL_BI_RT_STATUS_BI_COMMIT_TM_SCALE GENMASK(11, 8)
> +#define CXL_BI_RT_STATUS_BI_COMMIT_TM_BASE GENMASK(15, 12)
> +
> +/* CXL 4.0 8.2.4.27 CXL BI Decoder Capability Structure */
> +#define CXL_BI_DECODER_CAPS_OFFSET 0x0
> +#define CXL_BI_DECODER_CAPS_HDMD_CAP BIT(0)
> +#define CXL_BI_DECODER_CAPS_EXPLICIT_COMMIT_REQ BIT(1)
> +#define CXL_BI_DECODER_CTRL_OFFSET 0x4
> +#define CXL_BI_DECODER_CTRL_BI_FW BIT(0)
> +#define CXL_BI_DECODER_CTRL_BI_ENABLE BIT(1)
> +#define CXL_BI_DECODER_CTRL_BI_COMMIT BIT(2)
> +#define CXL_BI_DECODER_STATUS_OFFSET 0x8
> +#define CXL_BI_DECODER_STATUS_BI_COMMITTED BIT(0)
> +#define CXL_BI_DECODER_STATUS_BI_ERR_NOT_COMMITTED BIT(1)
> +#define CXL_BI_DECODER_STATUS_BI_COMMIT_TM_SCALE GENMASK(11, 8)
> +#define CXL_BI_DECODER_STATUS_BI_COMMIT_TM_BASE GENMASK(15, 12)
> +
> /* CXL 2.0 8.2.8.1 Device Capabilities Array Register */
> #define CXLDEV_CAP_ARRAY_OFFSET 0x0
> #define CXLDEV_CAP_ARRAY_CAP_ID 0
> @@ -630,6 +655,7 @@ struct cxl_rcrb_info {
> * @coord: access coordinates (bandwidth and latency performance attributes)
> * @link_latency: calculated PCIe downstream latency
> * @gpf_dvsec: Cached GPF port DVSEC
> + * @nr_bi: number of BI-enabled endpoints below this dport
> */
> struct cxl_dport {
> struct device *dport_dev;
> @@ -642,6 +668,7 @@ struct cxl_dport {
> struct access_coordinate coord[ACCESS_COORDINATE_MAX];
> long link_latency;
> int gpf_dvsec;
> + int nr_bi;
> };
>
> /**
> @@ -906,6 +933,7 @@ void cxl_coordinates_combine(struct access_coordinate *out,
> struct access_coordinate *c2);
>
> bool cxl_endpoint_decoder_reset_detected(struct cxl_port *port);
> +int cxl_bi_setup(struct cxl_dev_state *cxlds);
> struct cxl_dport *devm_cxl_add_dport_by_dev(struct cxl_port *port,
> struct device *dport_dev);
>
> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> index fcffe24dcb42..f2d01a218895 100644
> --- a/drivers/cxl/mem.c
> +++ b/drivers/cxl/mem.c
> @@ -150,6 +150,10 @@ static int cxl_mem_probe(struct device *dev)
> if (rc)
> dev_dbg(dev, "CXL memdev EDAC registration failed rc=%d\n", rc);
>
> + rc = cxl_bi_setup(cxlds);
> + if (rc)
> + dev_dbg(dev, "BI setup failed rc=%d\n", rc);
> +
I don't think dynamic debug here is enough, we probe the device
and the error might not surface, HDM-DB might fails later at region attach
and we lack of traces. Maybe dev_warn() would be better for this kind of
non-recoverable errors.
What do you think ?
Best regards,
Richard Cheng.
> /*
> * The kernel may be operating out of CXL memory on this device,
> * there is no spec defined way to determine whether this device
> diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
> index 8ce9b4e9ca73..4938e93edb63 100644
> --- a/include/cxl/cxl.h
> +++ b/include/cxl/cxl.h
> @@ -168,6 +168,7 @@ struct cxl_dpa_partition {
> * @regs: Parsed register blocks
> * @cxl_dvsec: Offset to the PCIe device DVSEC
> * @rcd: operating in RCD mode (CXL 3.0 9.11.8 CXL Devices Attached to an RCH)
> + * @bi: device is BI (Back-Invalidate) enabled
> * @media_ready: Indicate whether the device media is usable
> * @dpa_res: Overall DPA resource tree for the device
> * @part: DPA partition array
> @@ -187,6 +188,7 @@ struct cxl_dev_state {
> struct cxl_device_regs regs;
> int cxl_dvsec;
> bool rcd;
> + bool bi;
> bool media_ready;
> struct resource dpa_res;
> struct cxl_dpa_partition part[CXL_NR_PARTITIONS_MAX];
> --
> 2.39.5
>
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 4/5] cxl: Add HDM-DB region creation
2026-06-05 0:33 ` [PATCH 4/5] cxl: Add HDM-DB region creation Davidlohr Bueso
2026-06-05 1:34 ` sashiko-bot
@ 2026-06-05 2:37 ` Richard Cheng
1 sibling, 0 replies; 15+ messages in thread
From: Richard Cheng @ 2026-06-05 2:37 UTC (permalink / raw)
To: Davidlohr Bueso
Cc: dave.jiang, djbw, jic23, benjamin.cheatham, alucerop,
alison.schofield, gourry, dongjoo.seo1, linux-cxl
On Thu, Jun 04, 2026 at 05:33:28PM +0800, Davidlohr Bueso wrote:
> A region inherits its coherency from the chosen root decoder: HDM-DB
> if the root has CXL_DECODER_F_BI, otherwise HDM-H.
>
> Surface the topology through read-only sysfs:
>
> - decoderX.Y/cap_bi (root): CFMWS BI restriction.
> - decoderX.Y/bi (endpoint): '1' when configured for HDM-DB.
>
> cxl_region_attach() rejects endpoints whose device or HDM cannot
> serve the region's type; target_type is inherited from cxlr->type
> in cxl_rr_assign_decoder().
>
> The HDM Decoder Control BI bit is set at commit time only when the
> target_type is DEVMEM and the BI capability is advertised
> (cxlds->bi for endpoints, root F_BI for switches).
>
> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
> ---
> Documentation/ABI/testing/sysfs-bus-cxl | 18 +++++++--
> drivers/cxl/acpi.c | 11 +++++-
> drivers/cxl/core/hdm.c | 14 +++++++
> drivers/cxl/core/port.c | 29 ++++++++++++--
> drivers/cxl/core/region.c | 51 +++++++++++++++++--------
> drivers/cxl/cxl.h | 1 +
> 6 files changed, 100 insertions(+), 24 deletions(-)
>
> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> index 16a9b3d2e2c0..f631a94f098d 100644
> --- a/Documentation/ABI/testing/sysfs-bus-cxl
> +++ b/Documentation/ABI/testing/sysfs-bus-cxl
> @@ -297,7 +297,7 @@ Description:
> Each entry in the list is a dport id.
>
>
> -What: /sys/bus/cxl/devices/decoderX.Y/cap_{pmem,ram,type2,type3}
> +What: /sys/bus/cxl/devices/decoderX.Y/cap_{pmem,ram,type2,type3,bi}
> Date: June, 2021
> KernelVersion: v5.14
> Contact: linux-cxl@vger.kernel.org
> @@ -306,8 +306,9 @@ Description:
> represents a fixed memory window identified by platform
> firmware. A fixed window may only support a subset of memory
> types. The 'cap_*' attributes indicate whether persistent
> - memory, volatile memory, accelerator memory, and / or expander
> - memory may be mapped behind this decoder's memory window.
> + memory, volatile memory, accelerator memory, expander memory,
> + and / or back-invalidate (HDM-DB) memory may be mapped behind
> + this decoder's memory window.
>
>
> What: /sys/bus/cxl/devices/decoderX.Y/target_type
> @@ -426,6 +427,17 @@ Description:
> current cached value.
>
>
> +What: /sys/bus/cxl/devices/decoderX.Y/bi
> +Date: June, 2026
> +KernelVersion: v7.2
> +Contact: linux-cxl@vger.kernel.org
> +Description:
> + (RO) Shows '1' if this endpoint decoder is currently configured
> + for HDM-DB (device-managed coherency with back-invalidate).
> + The HDM-DB state is inherited from the region the decoder is
> + attached to, which is in turn set from the chosen root
> + decoder's CFMWS BI restriction (see cap_bi).
> +
> What: /sys/bus/cxl/devices/decoderX.Y/delete_region
> Date: May, 2022
> KernelVersion: v6.0
> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> index 127537628817..6a9ba82719e9 100644
> --- a/drivers/cxl/acpi.c
> +++ b/drivers/cxl/acpi.c
> @@ -152,6 +152,8 @@ static unsigned long cfmws_to_decoder_flags(int restrictions)
> flags |= CXL_DECODER_F_PMEM;
> if (restrictions & ACPI_CEDT_CFMWS_RESTRICT_FIXED)
> flags |= CXL_DECODER_F_LOCK;
> + if (restrictions & ACPI_CEDT_CFMWS_RESTRICT_BI)
> + flags |= CXL_DECODER_F_BI;
>
> return flags;
> }
> @@ -198,6 +200,12 @@ static int cxl_acpi_cfmws_verify(struct device *dev,
> dev_dbg(dev, "CFMWS length %d greater than expected %d\n",
> cfmws->header.length, expected_len);
>
> + if ((cfmws->restrictions & ACPI_CEDT_CFMWS_RESTRICT_HOSTONLYMEM) &&
> + (cfmws->restrictions & ACPI_CEDT_CFMWS_RESTRICT_BI)) {
> + dev_err(dev, "CFMWS cannot have both HDM-H and HDM-DB\n");
> + return -EINVAL;
> + }
> +
> return 0;
> }
>
> @@ -437,7 +445,8 @@ static int __cxl_parse_cfmws(struct acpi_cedt_cfmws *cfmws,
>
> cxld = &cxlrd->cxlsd.cxld;
> cxld->flags = cfmws_to_decoder_flags(cfmws->restrictions);
> - cxld->target_type = CXL_DECODER_HOSTONLYMEM;
> + cxld->target_type = (cxld->flags & CXL_DECODER_F_TYPE2) ?
> + CXL_DECODER_DEVMEM : CXL_DECODER_HOSTONLYMEM;
> cxld->hpa_range = (struct range) {
> .start = cfmws->base_hpa,
> .end = cfmws->base_hpa + cfmws->window_size - 1,
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index 4a41cde9fdba..70c797c77b70 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -705,9 +705,23 @@ static void cxld_set_interleave(struct cxl_decoder *cxld, u32 *ctrl)
>
> static void cxld_set_type(struct cxl_decoder *cxld, u32 *ctrl)
> {
> + bool bi = cxld->target_type == CXL_DECODER_DEVMEM;
> +
> + if (is_endpoint_decoder(&cxld->dev)) {
> + struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(&cxld->dev);
> + struct cxl_dev_state *cxlds = cxled_to_memdev(cxled)->cxlds;
> +
> + bi = bi && cxlds->bi;
> + } else if (cxld->region) {
> + struct cxl_root_decoder *cxlrd = cxld->region->cxlrd;
> +
> + bi = bi && (cxlrd->cxlsd.cxld.flags & CXL_DECODER_F_BI);
> + }
> +
> u32p_replace_bits(ctrl,
> !!(cxld->target_type == CXL_DECODER_HOSTONLYMEM),
> CXL_HDM_DECODER0_CTRL_HOSTONLY);
> + u32p_replace_bits(ctrl, bi, CXL_HDM_DECODER0_CTRL_BI);
> }
>
> static void cxlsd_set_targets(struct cxl_switch_decoder *cxlsd, u64 *tgt)
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index c5aacd7054f1..2b463a8b06a7 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -131,6 +131,7 @@ CXL_DECODER_FLAG_ATTR(cap_ram, CXL_DECODER_F_RAM);
> CXL_DECODER_FLAG_ATTR(cap_type2, CXL_DECODER_F_TYPE2);
> CXL_DECODER_FLAG_ATTR(cap_type3, CXL_DECODER_F_TYPE3);
> CXL_DECODER_FLAG_ATTR(locked, CXL_DECODER_F_LOCK);
> +CXL_DECODER_FLAG_ATTR(cap_bi, CXL_DECODER_F_BI);
>
> static ssize_t target_type_show(struct device *dev,
> struct device_attribute *attr, char *buf)
> @@ -233,6 +234,18 @@ static ssize_t mode_store(struct device *dev, struct device_attribute *attr,
> }
> static DEVICE_ATTR_RW(mode);
>
> +static ssize_t bi_show(struct device *dev, struct device_attribute *attr,
> + char *buf)
> +{
> + struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev);
> + struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> + struct cxl_dev_state *cxlds = cxlmd->cxlds;
> +
> + return sysfs_emit(buf, "%d\n", cxlds->bi && cxled->cxld.region &&
> + cxled->cxld.target_type == CXL_DECODER_DEVMEM);
> +}
> +static DEVICE_ATTR_RO(bi);
> +
> static ssize_t dpa_resource_show(struct device *dev, struct device_attribute *attr,
> char *buf)
> {
> @@ -329,6 +342,7 @@ static struct attribute *cxl_decoder_root_attrs[] = {
> &dev_attr_cap_ram.attr,
> &dev_attr_cap_type2.attr,
> &dev_attr_cap_type3.attr,
> + &dev_attr_cap_bi.attr,
> &dev_attr_target_list.attr,
> &dev_attr_qos_class.attr,
> SET_CXL_REGION_ATTR(create_pmem_region)
> @@ -339,16 +353,22 @@ static struct attribute *cxl_decoder_root_attrs[] = {
>
> static bool can_create_pmem(struct cxl_root_decoder *cxlrd)
> {
> - unsigned long flags = CXL_DECODER_F_TYPE3 | CXL_DECODER_F_PMEM;
> + unsigned long flags = cxlrd->cxlsd.cxld.flags;
> + unsigned long hdm_h = CXL_DECODER_F_TYPE3 | CXL_DECODER_F_PMEM;
> + unsigned long hdm_db = CXL_DECODER_F_TYPE2 | CXL_DECODER_F_BI |
> + CXL_DECODER_F_PMEM;
>
> - return (cxlrd->cxlsd.cxld.flags & flags) == flags;
> + return (flags & hdm_h) == hdm_h || (flags & hdm_db) == hdm_db;
> }
>
> static bool can_create_ram(struct cxl_root_decoder *cxlrd)
> {
> - unsigned long flags = CXL_DECODER_F_TYPE3 | CXL_DECODER_F_RAM;
> + unsigned long flags = cxlrd->cxlsd.cxld.flags;
> + unsigned long hdm_h = CXL_DECODER_F_TYPE3 | CXL_DECODER_F_RAM;
> + unsigned long hdm_db = CXL_DECODER_F_TYPE2 | CXL_DECODER_F_BI |
> + CXL_DECODER_F_RAM;
>
> - return (cxlrd->cxlsd.cxld.flags & flags) == flags;
> + return (flags & hdm_h) == hdm_h || (flags & hdm_db) == hdm_db;
> }
>
> static umode_t cxl_root_decoder_visible(struct kobject *kobj, struct attribute *a, int n)
> @@ -402,6 +422,7 @@ static const struct attribute_group *cxl_decoder_switch_attribute_groups[] = {
> static struct attribute *cxl_decoder_endpoint_attrs[] = {
> &dev_attr_target_type.attr,
> &dev_attr_mode.attr,
> + &dev_attr_bi.attr,
> &dev_attr_dpa_size.attr,
> &dev_attr_dpa_resource.attr,
> SET_CXL_REGION_ATTR(region)
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index e50dc716d4e8..3f9ddacfae4e 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -1129,17 +1129,6 @@ static int cxl_rr_assign_decoder(struct cxl_port *port, struct cxl_region *cxlr,
> return -EBUSY;
> }
>
> - /*
> - * Endpoints should already match the region type, but backstop that
> - * assumption with an assertion. Switch-decoders change mapping-type
> - * based on what is mapped when they are assigned to a region.
> - */
> - dev_WARN_ONCE(&cxlr->dev,
> - port == cxled_to_port(cxled) &&
> - cxld->target_type != cxlr->type,
> - "%s:%s mismatch decoder type %d -> %d\n",
> - dev_name(&cxled_to_memdev(cxled)->dev),
> - dev_name(&cxld->dev), cxld->target_type, cxlr->type);
> cxld->target_type = cxlr->type;
> cxl_rr->decoder = cxld;
> return 0;
> @@ -2034,6 +2023,7 @@ static int cxl_region_attach(struct cxl_region *cxlr,
> struct cxl_region_params *p = &cxlr->params;
> struct cxl_port *ep_port, *root_port;
> struct cxl_dport *dport;
> + struct cxl_hdm *cxlhdm;
> int rc = -ENXIO;
>
> rc = check_interleave_cap(&cxled->cxld, p->interleave_ways,
> @@ -2083,10 +2073,31 @@ static int cxl_region_attach(struct cxl_region *cxlr,
> return -ENXIO;
> }
>
> - if (cxled->cxld.target_type != cxlr->type) {
> - dev_dbg(&cxlr->dev, "%s:%s type mismatch: %d vs %d\n",
> - dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
> - cxled->cxld.target_type, cxlr->type);
> + /*
> + * Verify the device and HDM are capable of the region's flavor before
> + * proceeding. The endpoint decoder's target_type is then inherited
> + * from cxlr->type later in cxl_rr_assign_decoder().
> + */
> + if (cxlr->type == CXL_DECODER_DEVMEM &&
> + cxlds->type == CXL_DEVTYPE_CLASSMEM && !cxlds->bi) {
> + dev_err(&cxlr->dev, "%s:%s BI not enabled on device\n",
> + dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev));
> + return -ENXIO;
> + }
Hi Davidlohr,
This might not be a hole today, but CXL_DEVTYPE_DEVMEM type-2 device
will bypass this part and commit HDM-D in a BI-restricted window, Type-2
will never have cxlds->bi set.
Maybe something like "DEVMEM" regions on F_BI windows, require cxlds->bi
for all device types.
> +
> + cxlhdm = dev_get_drvdata(&ep_port->dev);
> + if (!cxlhdm)
> + return -ENXIO;
> + if (cxlr->type == CXL_DECODER_HOSTONLYMEM &&
> + cxlhdm->supported_coherency == CXL_HDM_DECODER_COHERENCY_DEV) {
> + dev_dbg(&cxlr->dev, "%s:%s HDM is device-coherent only\n",
> + dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev));
> + return -ENXIO;
> + }
> + if (cxlr->type == CXL_DECODER_DEVMEM &&
> + cxlhdm->supported_coherency == CXL_HDM_DECODER_COHERENCY_HOST) {
> + dev_dbg(&cxlr->dev, "%s:%s HDM is host-only coherent\n",
> + dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev));
> return -ENXIO;
> }
>
> @@ -2772,6 +2783,7 @@ static ssize_t create_region_store(struct device *dev, const char *buf,
> size_t len, enum cxl_partition_mode mode)
> {
> struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
> + enum cxl_decoder_type target_type;
> struct cxl_region *cxlr;
> int rc, id;
>
> @@ -2779,7 +2791,14 @@ static ssize_t create_region_store(struct device *dev, const char *buf,
> if (rc != 1)
> return -EINVAL;
>
> - cxlr = __create_region(cxlrd, mode, id, CXL_DECODER_HOSTONLYMEM);
> + /*
> + * The CFMWS dictates endpoint coherency: a BI-restricted window
> + * produces an HDM-DB region; otherwise HDM-H.
> + */
> + target_type = (cxlrd->cxlsd.cxld.flags & CXL_DECODER_F_BI) ?
> + CXL_DECODER_DEVMEM : CXL_DECODER_HOSTONLYMEM;
> +
> + cxlr = __create_region(cxlrd, mode, id, target_type);
> if (IS_ERR(cxlr))
> return PTR_ERR(cxlr);
>
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 917a6aee5936..73d4d2d4616d 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -287,6 +287,7 @@ int cxl_dport_map_rcd_linkcap(struct pci_dev *pdev, struct cxl_dport *dport);
> #define CXL_DECODER_F_LOCK BIT(4)
> #define CXL_DECODER_F_ENABLE BIT(5)
> #define CXL_DECODER_F_NORMALIZED_ADDRESSING BIT(6)
> +#define CXL_DECODER_F_BI BIT(7)
> #define CXL_DECODER_F_RESET_MASK (CXL_DECODER_F_ENABLE | CXL_DECODER_F_LOCK)
>
> enum cxl_decoder_type {
> --
> 2.39.5
>
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v4 0/5] cxl: Support Back-Invalidate
2026-06-05 0:33 [PATCH v4 0/5] cxl: Support Back-Invalidate Davidlohr Bueso
` (4 preceding siblings ...)
2026-06-05 0:33 ` [PATCH 5/5] cxl/hdm: Rename decoder coherency flags Davidlohr Bueso
@ 2026-06-05 2:38 ` Richard Cheng
5 siblings, 0 replies; 15+ messages in thread
From: Richard Cheng @ 2026-06-05 2:38 UTC (permalink / raw)
To: Davidlohr Bueso
Cc: dave.jiang, djbw, jic23, benjamin.cheatham, alucerop,
alison.schofield, gourry, dongjoo.seo1, linux-cxl
On Thu, Jun 04, 2026 at 05:33:24PM +0800, Davidlohr Bueso wrote:
> Hello,
>
> Changes from v3 (https://lore.kernel.org/all/20260516022037.1761453-1-dave@stgolabs.net/):
> - In patch 1, map BI Decoder of root port/DSP at dport creation time
> in cxl_port_add_dport() (Ben).
> - In patch 2, BI rt/decoder commit logic to be a function instead
> of a macro approach. (Dave)
> - Patch 4 is reworked to have endpoint decoders use the coherency
> model based on the root decoder's cfmw restrictions (Dan).
> - Folded patch 5 into patch 2 (Dave)
> - Moved 'nr_bi' counter from port to dport (Sashiko).
> - Added a new patch 5 for a trivial flag rename.
>
Hi Davidlohr,
All the above patches lacks of v4 tag,
small nit pick but it confuses me at first,
maybe add the tags to every patches in the patch series.
Best regards,
Richard Cheng.
> This series is the initial plumbing to enable HDM-DB in Linux. The
> model allows Type 2 and Type 3 devices to expose their local memory
> to the host CPU in a coherent manner. In alignment with what was
> discussed at 2024 LPC type2 support session, this series takes the
> Type 3 memory expander approach, which is more direct. Further,
> afaik there is no Type 2 + BI hardware out there.
>
> A flagship use case of Type 3 + BI is coherent shared memory, and
> there is currently a big gap in this regard (ie: GFAM). Another
> is P2P via PCIe UIO, which is also lacking today. Media Operation
> (4402h) for ranged sanitize/zero also trigger snoops and there are
> some patches out there[0]. As such this series focuses on BI enablement
> in terms of discovery and configuration.
>
> The idea is for Type 3 memdevs and future Type 2 devices to make use
> of cxlds->bi when committing HDM decoders, such that different device
> coherence models can be differentiated as:
>
> type2 hdm-d: cxlds->type == CXL_DEVTYPE_DEVMEM && cxlds->bi == false
> type2 hdm-db: cxlds->type == CXL_DEVTYPE_DEVMEM && cxlds->bi == true
> type3 hdm-h: cxlds->type == CXL_DEVTYPE_CLASSMEM && cxlds->bi == false
> type3 hdm-db: cxlds->type == CXL_DEVTYPE_CLASSMEM && cxlds->bi == true
>
> Because ->bi becoming true does not depend on auto-committing upon HDM
> decoder enumeration (port driver), for now an already-committed BI
> decoder during enumeration is unsupported and the driver errors out.
>
> o Patch 1 probes BI capabilities during register discovery, and maps BI
> Decoder registers.
>
> o Patch 2 implements the BI-ID allocation (and deallocation).
>
> o Patch 3 adds coherency support for endpoint decoders based on what the
> root decoder restricts.
>
> o Patch 4 adds support for HDM-DB region creation.
>
> o Patch 5 trivially renames some coherency flags.
>
> Testing has been done with the qemu HDM-DB Type 3 counterpart + this series:
> https://lore.kernel.org/all/20260602231806.2469011-1-dave@stgolabs.net/
>
> 1. HDM Decoder with BI through ad-hoc region creation.
> ------------------------------------------------------
> # cxl list -D
> [
> {
> "decoder":"decoder0.0",
> "resource":4563402752,
> "size":10737418240,
> "interleave_ways":1,
> "accelmem_capable":true,
> "nr_targets":1
> }
> ]
>
> # ROOT=/sys/bus/cxl/devices/decoder0.0
> # EP=/sys/bus/cxl/devices/decoder2.0
> # echo region0 > $ROOT/create_ram_region
> # REG=/sys/bus/cxl/devices/region0
> # echo ram > $EP/mode
> # echo 0x40000000 > $EP/dpa_size
> # echo 1 > $REG/interleave_ways
> # echo 4096 > $REG/interleave_granularity
> # echo 0x40000000 > $REG/size
> # cat $EP/bi
> 0
> # echo decoder2.0 > $REG/target0
> [ 53.495224] cxl_core:cxl_port_attach_region:1290: cxl region0: mem0:endpoint2 decoder2.0 add: mem0:decoder2.0 @ 0 next: none nr_eps:1
> [ 53.497758] cxl_core:cxl_port_attach_region:1290: cxl region0: pci0000:0c:port1 decoder1.0 add: mem0:decoder2.0 @ 0 next: mem0 nr_ep1
> [ 53.499892] cxl_core:cxl_port_setup_targets:1608: cxl region0: pci0000:0c:port1 iw: 1 ig: 4096
> [ 53.501233] cxl_core:cxl_port_setup_targets:1633: cxl region0: pci0000:0c:port1 target[0] = 0000:0c:00.0 for mem0:decoder2.0 @ 0
> [ 53.502985] cxl_core:cxl_calc_interleave_pos:2003: cxl_mem mem0: decoder:decoder2.0 parent:0000:0d:00.0 port:endpoint2 range:0x110000
> [ 53.504895] cxl_core:cxl_region_attach:2226: cxl decoder2.0: Test cxl_calc_interleave_pos(): success test_pos:0 cxled->pos:0
> # cat $EP/bi
> 1
> # echo 1 > $REG/commit
> # cxl list -D
> [
> {
> "root decoders":[
> {
> "decoder":"decoder0.0",
> "resource":4563402752,
> "size":10737418240,
> "interleave_ways":1,
> "accelmem_capable":true,
> "nr_targets":1
> }
> ]
> },
> {
> "port decoders":[
> {
> "decoder":"decoder1.0",
> "resource":4563402752,
> "size":1073741824,
> "interleave_ways":1,
> "region":"region0",
> "nr_targets":1
> }
> ]
> },
> {
> "endpoint decoders":[
> {
> "decoder":"decoder2.0",
> "resource":4563402752,
> "size":1073741824,
> "interleave_ways":1,
> "region":"region0",
> "dpa_resource":0,
> "dpa_size":1073741824,
> "mode":"ram"
> }
> ]
> }
> ]
>
> 2. Type3 device does not support BI, but CFMW has BI restriction
> ----------------------------------------------------------------
> # echo decoder2.0 > $REG/target0
> [ 104.557797] cxl region0: mem0:decoder2.0 BI not enabled on device
> [ 104.561014] cxl_port endpoint2: failed to attach decoder2.0 to region0: -6
>
> 3. Type3 device supports BI but CFMW does not, create HDM-H region
> ------------------------------------------------------------------
> # cxl list -D
> [
> {
> "decoder":"decoder0.0",
> "resource":4563402752,
> "size":10737418240,
> "interleave_ways":1,
> "max_available_extent":10737418240,
> "pmem_capable":true,
> "volatile_capable":true,
> "nr_targets":1
> }
> ]
> # echo decoder2.0 > $REG/target0
> [ 124.712027] cxl_core:cxl_port_attach_region:1290: cxl region0: mem0:endpoint2 decoder2.0 add: mem0:decoder2.0 @ 0 next: none nr_eps:1
> [ 124.714329] cxl_core:cxl_port_attach_region:1290: cxl region0: pci0000:0c:port1 decoder1.0 add: mem0:decoder2.0 @ 0 next: mem0 nr_ep1
> [ 124.716407] cxl_core:cxl_port_setup_targets:1608: cxl region0: pci0000:0c:port1 iw: 1 ig: 4096
> [ 124.718269] cxl_core:cxl_port_setup_targets:1633: cxl region0: pci0000:0c:port1 target[0] = 0000:0c:00.0 for mem0:decoder2.0 @ 0
> [ 124.722076] cxl_core:cxl_calc_interleave_pos:2003: cxl_mem mem0: decoder:decoder2.0 parent:0000:0d:00.0 port:endpoint2 range:0x110000
> [ 124.724454] cxl_core:cxl_region_attach:2226: cxl decoder2.0: Test cxl_calc_interleave_pos(): success test_pos:0 cxled->pos:0
> # cxl list -D
> [
> {
> "root decoders":[
> {
> "decoder":"decoder0.0",
> "resource":4563402752,
> "size":10737418240,
> "interleave_ways":1,
> "max_available_extent":9663676416,
> "pmem_capable":true,
> "volatile_capable":true,
> "nr_targets":1
> }
> ]
> },
> {
> "port decoders":[
> {
> "decoder":"decoder1.0",
> "resource":4563402752,
> "size":1073741824,
> "interleave_ways":1,
> "region":"region0",
> "nr_targets":1
> }
> ]
> },
> {
> "endpoint decoders":[
> {
> "decoder":"decoder2.0",
> "resource":4563402752,
> "size":1073741824,
> "interleave_ways":1,
> "region":"region0",
> "dpa_resource":0,
> "dpa_size":1073741824,
> "mode":"ram"
> }
> ]
> }
> ]
>
>
> 3. Unbind + Bind
> ----------------
> # echo mem0 > /sys/bus/cxl/drivers/cxl_mem/unbind
> [ 64.553865] cxl_core:__cxl_bi_ctrl_endpoint:1163: cxl_pci 0000:0d:00.0: BI requests disabled
> [ 64.557464] cxl_core:cxl_port_commit_reap:898: cxl_port port1: reap: decoder1.0 commit_end: -1
> [ 64.559529] cxl_core:cxl_port_commit_reap:898: cxl_port endpoint2: reap: decoder2.0 commit_end: -1
> [ 64.563576] cxl_core:cxl_detach_ep:1541: cxl_mem mem0: disconnect mem0 from port1
>
> echo mem0 > /sys/bus/cxl/drivers/cxl_mem/bind
> [ 5073.975585] cxl_core:devm_cxl_enumerate_ports:1856: cxl_mem mem0: scan: iter: mem0 dport_dev: 0000:0c:00.0 parent: pci0000:0c
> [ 5073.978125] cxl_core:devm_cxl_enumerate_ports:1862: cxl_mem mem0: found already registered port port1:pci0000:0c
> [ 5073.980013] cxl_core:cxl_port_alloc:752: cxl_mem mem0: host-bridge: pci0000:0c
> [ 5073.982601] cxl_core:cxl_cdat_get_length:488: cxl_port endpoint2: CDAT length 160
> [ 5073.988634] cxl_core:cxl_port_perf_data_calculate:207: cxl_port endpoint2: Failed to retrieve ep perf coordinates.
> [ 5073.991824] cxl_core:cxl_endpoint_parse_cdat:423: cxl_port endpoint2: Failed to do perf coord calculations.
> [ 5074.016377] cxl_core:init_hdm_decoder:1112: cxl_port endpoint2: decoder2.0: range: 0x0-0xffffffffffffffff iw: 1 ig: 4096
> [ 5074.021643] cxl_core:add_hdm_decoder:39: cxl_mem mem0: decoder2.0 added to endpoint2
> [ 5074.023072] cxl_core:init_hdm_decoder:1112: cxl_port endpoint2: decoder2.1: range: 0x0-0xffffffffffffffff iw: 1 ig: 256
> [ 5074.025527] cxl_core:add_hdm_decoder:39: cxl_mem mem0: decoder2.1 added to endpoint2
> [ 5074.027751] cxl_core:init_hdm_decoder:1112: cxl_port endpoint2: decoder2.2: range: 0x0-0xffffffffffffffff iw: 1 ig: 256
> [ 5074.032182] cxl_core:add_hdm_decoder:39: cxl_mem mem0: decoder2.2 added to endpoint2
> [ 5074.033535] cxl_core:init_hdm_decoder:1112: cxl_port endpoint2: decoder2.3: range: 0x0-0xffffffffffffffff iw: 1 ig: 256
> [ 5074.035690] cxl_core:add_hdm_decoder:39: cxl_mem mem0: decoder2.3 added to endpoint2
> [ 5074.036904] cxl_core:cxl_bus_probe:2281: cxl_port endpoint2: probe: 0
> [ 5074.038017] cxl_core:devm_cxl_add_port:960: cxl_mem mem0: endpoint2 added to port1
> [ 5074.039153] cxl_core:__cxl_bi_ctrl_endpoint:1163: cxl_pci 0000:0d:00.0: BI requests enabled
> [ 5074.040168] cxl_core:cxl_bus_probe:2281: cxl_mem mem0: probe: 0
>
>
> 4. Discovery behind a Switch
> ----------------------------
> # dmesg | grep BI
> [ 0.628477] cxl_core:cxl_probe_component_regs:102: cxl_pci 0000:0f:00.0: found BI Decoder capability (0xab4)
> [ 0.737987] cxl_core:cxl_probe_component_regs:102: pcieport 0000:0c:00.0: found BI Decoder capability (0xab4)
> [ 0.747784] cxl_port:cxl_port_map_bi:109: cxl_port port2: BI RT registers not found
> [ 0.755151] cxl_core:cxl_probe_component_regs:96: cxl_port port2: found BI RT capability (0xaa8)
> [ 0.790106] cxl_core:cxl_probe_component_regs:102: pcieport 0000:0e:00.0: found BI Decoder capability (0xab4)
> [ 0.943920] cxl_core:__cxl_bi_wait_commit:994: pcieport 0000:0e:00.0: BI-ID commit wait took 101750us
> [ 0.944767] cxl_core:__cxl_bi_ctrl_endpoint:1163: cxl_pci 0000:0f:00.0: BI requests enabled
>
> 5. Corrupted hw (commit timeout handling)
> -----------------------------------------
> [ 0.617234] cxl_core:cxl_probe_component_regs:102: cxl_pci 0000:0f:00.0: found BI Decoder capability (0xab4)
> [ 0.706959] cxl_core:cxl_probe_component_regs:102: pcieport 0000:0c:00.0: found BI Decoder capability (0xab4)
> [ 0.718585] cxl_port:cxl_port_map_bi:109: cxl_port port2: BI RT registers not found
> [ 0.743932] cxl_core:cxl_probe_component_regs:96: cxl_port port2: found BI RT capability (0xaa8)
> [ 0.783715] cxl_core:cxl_probe_component_regs:102: pcieport 0000:0e:00.0: found BI Decoder capability (0xab4)
> [ 2.994821] cxl_core:__cxl_bi_wait_commit:985: pcieport 0000:0e:00.0: BI-ID commit timed out (2000000us)
> [ 5.010606] cxl_core:__cxl_bi_wait_commit:985: pcieport 0000:0e:00.0: BI-ID commit timed out (2000000us)
> [ 5.013423] cxl_mem:cxl_mem_probe:155: cxl_mem mem0: BI setup failed rc=-110
>
> 6. Mixed Configurations (BI-capable Type 3 but DSP 68b)
> -------------------------------------------------------
> [ 0.837902] cxl_mem:cxl_mem_probe:155: cxl_mem mem0: BI setup failed rc=-22
>
>
> Applies against 'next' branch from cxl.git.
>
> [0] https://lore.kernel.org/linux-cxl/20260428200410.705675-1-dave@stgolabs.net/
>
> Thanks!
>
> Davidlohr Bueso (5):
> cxl: Add BI register probing and port initialization
> cxl/pci: Add BI topology enable/disable
> cxl/hdm: Add BI coherency support for endpoint decoders
> cxl: Add HDM-DB region creation
> cxl/hdm: Rename decoder coherency flags
>
> Documentation/ABI/testing/sysfs-bus-cxl | 18 +-
> drivers/cxl/acpi.c | 15 +-
> drivers/cxl/core/hdm.c | 38 ++-
> drivers/cxl/core/pci.c | 365 ++++++++++++++++++++++++
> drivers/cxl/core/port.c | 33 ++-
> drivers/cxl/core/region.c | 51 ++--
> drivers/cxl/core/regs.c | 14 +
> drivers/cxl/cxl.h | 45 ++-
> drivers/cxl/cxlmem.h | 2 +
> drivers/cxl/mem.c | 4 +
> drivers/cxl/port.c | 71 +++++
> include/cxl/cxl.h | 8 +
> 12 files changed, 631 insertions(+), 33 deletions(-)
>
> --
> 2.39.5
>
>
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2026-06-05 2:38 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-05 0:33 [PATCH v4 0/5] cxl: Support Back-Invalidate Davidlohr Bueso
2026-06-05 0:33 ` [PATCH 1/5] cxl: Add BI register probing and port initialization Davidlohr Bueso
2026-06-05 1:33 ` sashiko-bot
2026-06-05 2:03 ` Richard Cheng
2026-06-05 0:33 ` [PATCH 2/5] cxl/pci: Add BI topology enable/disable Davidlohr Bueso
2026-06-05 1:29 ` sashiko-bot
2026-06-05 2:11 ` Richard Cheng
2026-06-05 2:20 ` Richard Cheng
2026-06-05 2:23 ` Richard Cheng
2026-06-05 0:33 ` [PATCH 3/5] cxl/hdm: Add BI coherency support for endpoint decoders Davidlohr Bueso
2026-06-05 0:33 ` [PATCH 4/5] cxl: Add HDM-DB region creation Davidlohr Bueso
2026-06-05 1:34 ` sashiko-bot
2026-06-05 2:37 ` Richard Cheng
2026-06-05 0:33 ` [PATCH 5/5] cxl/hdm: Rename decoder coherency flags Davidlohr Bueso
2026-06-05 2:38 ` [PATCH v4 0/5] cxl: Support Back-Invalidate Richard Cheng
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.