* [PATCH 0/6] cxl: Support Back-Invalidate
@ 2026-03-15 20:27 Davidlohr Bueso
2026-03-15 20:27 ` [PATCH 1/6] cxl: Add Back-Invalidate register definitions and structures Davidlohr Bueso
` (5 more replies)
0 siblings, 6 replies; 21+ messages in thread
From: Davidlohr Bueso @ 2026-03-15 20:27 UTC (permalink / raw)
To: dave.jiang, dan.j.williams
Cc: jonathan.cameron, alison.schofield, ira.weiny, gourry,
dongjoo.seo1, anisa.su, linux-cxl, Davidlohr Bueso
Hello,
Changes from rfc (https://lore.kernel.org/all/20250812010228.2589787-1-dave@stgolabs.net/):
o Dropped rfc status and split the series into smaller patches.
o Renamed from toggle to ctrl + cleanup cxl_bi_ctrl_dport() by using a
switch statement + better variable declarations. (Jonathan)
o Introduce proper definitions for HDM supported coherence models. (Jonathan)
o Fixed a regs remapping (unbind+bind) issue.
o Added rollback logic if enabling BI on a component fails during the
hierarchy walk.
o Reworked BI regmaps.
o BI-ID deallocation via delete_endpoint() instead of in cxl_detach_ep().
The following is the initial plumbing to enable HDM-DB in Linux. This model
allows type2 and type3 devices to expose their local memory to the host CPU
in a coherent manner. In alignment with what was discussed at the 2024 LPC type2
support session, this series takes the type3 memory expander approach, which
is more direct. Further, afaik there is no type2+BI hardware out there.
A flagship use case of type3 + BI is coherent shared memory, and there is
currently a big gap in this regard (ie: GFAM). Another one is P2P via PCIe UIO,
which is also lacking today. The Media Operation (4402h) command for ranged
sanitize/zero also triggers snoops, but such cmd support is missing from the
driver as well (albeit this might be the easiest entry point). As such this
series focuses on BI enablement in terms of discovery and configuration.
This adds two interfaces around cxlds to allow the setup and deallocation of
BI-IDs. The idea is for type3 memdevs and future type2 devices to make use of
cxlds->bi when committing HDM decoders, such that different device coherence
models can be differentiated as:
type2 hdm-d: cxlds->type == CXL_DEVTYPE_DEVMEM && cxlds->bi == false
type2 hdm-db: cxlds->type == CXL_DEVTYPE_DEVMEM && cxlds->bi == true
type3 hdm-h: cxlds->type == CXL_DEVTYPE_CLASSMEM && cxlds->bi == false
type3 hdm-db: cxlds->type == CXL_DEVTYPE_CLASSMEM && cxlds->bi == true
Because ->bi becoming true does not depend on auto-committing during HDM decoder
port enumeration (port driver), this case is treated as unsupported for now:
initializing an HDM decoder that already has its BI bit set will error out.
o Patch 1 adds BI Decoder and BI Route Table register definitions, capability IDs,
and structures.
o Patch 2 probes BI capabilities during register discovery and maps BI Decoder
registers on dports and endpoints, and BI Route Table registers on upstream
switch ports.
o Patch 3 implements the BI-ID allocation and deallocation API. Both do the
required hierarchy iteration (bottom-up) ensuring it is safe to enable/disable
BI for every component.
o Patch 4 wires BI setup into cxl_mem_probe and teardown.
o Patch 5 introduces cxl_dpa_set_coherence() to switch between host-only and
device coherent modes, and programs the BI control bit at commit time.
o Patch 6 adds sysfs interfaces for creating BI-enabled region for memory
devices.
Testing has been done with the qemu HDM-DB type3 counterpart (now merged).
1. HDM Decoder with BI through ad-hoc region creation.
------------------------------------------------------
# cxl list -D
[
{
"decoder":"decoder0.0",
"resource":4563402752,
"size":10737418240,
"interleave_ways":1,
"max_available_extent":10737418240,
"pmem_capable":true,
"volatile_capable":true,
"accelmem_capable":true,
"nr_targets":1
}
]
Program the endpoint decoder
# echo ram > /sys/bus/cxl/devices/decoder2.0/mode
# echo 1 > /sys/bus/cxl/devices/decoder2.0/bi
# echo 0x40000000 > /sys/bus/cxl/devices/decoder2.0/dpa_size
Create a region in the root decoder
# echo region0 > /sys/bus/cxl/devices/decoder0.0/create_ram_bi_region
[ 22.730899] cxl_core:cxl_region_can_probe:4214: cxl_region region0: config state: 0
[ 22.732655] cxl_core:cxl_bus_probe:2334: cxl_region region0: probe: -6
[ 22.734200] cxl_core:devm_cxl_add_region:2636: cxl_acpi ACPI0017:00: decoder0.0: created region0
Configure the region with the same IG, IW as the root and endpoint decoders
# echo 256 > /sys/bus/cxl/devices/region0/interleave_granularity
# echo 1 > /sys/bus/cxl/devices/region0/interleave_ways
# echo 0x40000000 > /sys/bus/cxl/devices/region0/size
Link the endpoint decoder as a target in the region
# echo decoder2.0 > /sys/bus/cxl/devices/region0/target0
[ 99.819398] cxl_core:cxl_port_attach_region:1244: cxl region0: mem0:endpoint2 decoder2.0 add: mem0:decoder2.0 @ 0 n1
[ 99.821802] cxl_core:cxl_port_attach_region:1244: cxl region0: pci0000:0c:port1 decoder1.0 add: mem0:decoder2.0 @ 01
[ 99.823965] cxl_core:cxl_port_setup_targets:1562: cxl region0: pci0000:0c:port1 iw: 1 ig: 256
[ 99.825153] cxl_core:cxl_port_setup_targets:1587: cxl region0: pci0000:0c:port1 target[0] = 0000:0c:01.0 for mem0:d0
[ 99.826715] cxl_core:cxl_calc_interleave_pos:1956: cxl_mem mem0: decoder:decoder2.0 parent:0000:0e:00.0 port:endpoi0
[ 99.828491] cxl_core:cxl_region_attach:2157: cxl decoder2.0: Test cxl_calc_interleave_pos(): success test_pos:0 cxl0
Commit the changes
# echo 1 > /sys/bus/cxl/devices/region0/commit
[ 134.556073] cxl region0: Bypassing cpu_cache_invalidate_memregion() for testing!
non interleaved decoder 120000000 40000000 0
non interleaved decoder 120000000 40000000 0
# cxl list -D (tooling still needs to be updated to show the decoder target type)
[
{
"root decoders":[
{
"decoder":"decoder0.0",
"resource":4563402752,
"size":10737418240,
"interleave_ways":1,
"max_available_extent":9395240960,
"pmem_capable":true,
"volatile_capable":true,
"accelmem_capable":true,
"nr_targets":1
}
]
},
{
"port decoders":[
{
"decoder":"decoder1.0",
"resource":4831838208,
"size":1073741824,
"interleave_ways":1,
"region":"region0",
"nr_targets":1
}
]
},
{
"endpoint decoders":[
{
"decoder":"decoder2.0",
"resource":4831838208,
"size":1073741824,
"interleave_ways":1,
"region":"region0",
"dpa_resource":0,
"dpa_size":1073741824,
"mode":"ram"
}
]
}
]
2. On a type3 HDM-DB volatile device, create one BI and one regular region.
--------------------------------------------------------------------------
# echo ram > /sys/bus/cxl/devices/decoder2.0/mode
# echo 1 > /sys/bus/cxl/devices/decoder2.0/bi
# echo 0x20000000 > /sys/bus/cxl/devices/decoder2.0/dpa_size
# echo region0 > /sys/bus/cxl/devices/decoder0.0/create_ram_bi_region
# echo 256 > /sys/bus/cxl/devices/region0/interleave_granularity
# echo 1 > /sys/bus/cxl/devices/region0/interleave_ways
# echo 0x20000000 > /sys/bus/cxl/devices/region0/size
# echo decoder2.0 > /sys/bus/cxl/devices/region0/target0
# echo 1 > /sys/bus/cxl/devices/region0/commit
# echo ram > /sys/bus/cxl/devices/decoder2.1/mode
# echo 0 > /sys/bus/cxl/devices/decoder2.1/bi (this is already the default)
# echo 0x20000000 > /sys/bus/cxl/devices/decoder2.1/dpa_size
# echo region1 > /sys/bus/cxl/devices/decoder0.0/create_ram_region
# echo 256 > /sys/bus/cxl/devices/region1/interleave_granularity
# echo 1 > /sys/bus/cxl/devices/region1/interleave_ways
# echo 0x20000000 > /sys/bus/cxl/devices/region1/size
# echo decoder2.1 > /sys/bus/cxl/devices/region1/target0
# echo 1 > /sys/bus/cxl/devices/region1/commit
# cxl list -D
[
{
"root decoders":[
{
"decoder":"decoder0.0",
"resource":4563402752,
"size":10737418240,
"interleave_ways":1,
"max_available_extent":9395240960,
"pmem_capable":true,
"volatile_capable":true,
"accelmem_capable":true,
"nr_targets":1
}
]
},
{
"port decoders":[
{
"decoder":"decoder1.0",
"resource":4831838208,
"size":536870912,
"interleave_ways":1,
"region":"region0",
"nr_targets":1
},
{
"decoder":"decoder1.1",
"resource":5368709120,
"size":536870912,
"interleave_ways":1,
"region":"region1",
"nr_targets":1
}
]
},
{
"endpoint decoders":[
{
"decoder":"decoder2.0",
"resource":4831838208,
"size":536870912,
"interleave_ways":1,
"region":"region0",
"dpa_resource":0,
"dpa_size":536870912,
"mode":"ram"
},
{
"decoder":"decoder2.1",
"resource":5368709120,
"size":536870912,
"interleave_ways":1,
"region":"region1",
"dpa_resource":536870912,
"dpa_size":536870912,
"mode":"ram"
}
]
}
]
3. Unbind + Bind
----------------
[ 0.799944] cxl_core:cxl_bi_ctrl_endpoint:1123: cxl_pci 0000:0d:00.0: device capable of issuing BI requests
# echo mem0 > /sys/bus/cxl/drivers/cxl_mem/unbind
[ 108.682475] cxl_core:cxl_bi_ctrl_endpoint:1123: cxl_pci 0000:0d:00.0: device incapable of issuing BI requests
[ 108.685569] cxl_core:cxl_detach_ep:1600: cxl_mem mem0: disconnect mem0 from port1
# echo mem0 > /sys/bus/cxl/drivers/cxl_mem/bind
[ 152.828550] cxl_core:devm_cxl_enumerate_ports:1906: cxl_mem mem0: scan: iter: mem0 dport_dev: 0000:0c:00.0 parent: c
[ 152.833431] cxl_core:devm_cxl_enumerate_ports:1912: cxl_mem mem0: found already registered port port1:pci0000:0c
[ 152.836573] cxl_core:cxl_port_alloc:802: cxl_mem mem0: host-bridge: pci0000:0c
[ 152.838898] cxl_core:cxl_cdat_get_length:486: cxl_port endpoint2: CDAT length 160
[ 152.846065] cxl_core:cxl_port_perf_data_calculate:207: cxl_port endpoint2: Failed to retrieve ep perf coordinates.
[ 152.848011] cxl_core:cxl_endpoint_parse_cdat:423: cxl_port endpoint2: Failed to do perf coord calculations.
[ 152.870738] cxl_core:init_hdm_decoder:1139: cxl_port endpoint2: decoder2.0: range: 0x0-0xffffffffffffffff iw: 1 ig:6
[ 152.876618] cxl_core:add_hdm_decoder:39: cxl_mem mem0: decoder2.0 added to endpoint2
[ 152.880951] cxl_core:init_hdm_decoder:1139: cxl_port endpoint2: decoder2.1: range: 0x0-0xffffffffffffffff iw: 1 ig:6
[ 152.883983] cxl_core:add_hdm_decoder:39: cxl_mem mem0: decoder2.1 added to endpoint2
[ 152.885263] cxl_core:init_hdm_decoder:1139: cxl_port endpoint2: decoder2.2: range: 0x0-0xffffffffffffffff iw: 1 ig:6
[ 152.887181] cxl_core:add_hdm_decoder:39: cxl_mem mem0: decoder2.2 added to endpoint2
[ 152.888514] cxl_core:init_hdm_decoder:1139: cxl_port endpoint2: decoder2.3: range: 0x0-0xffffffffffffffff iw: 1 ig:6
[ 152.890362] cxl_core:add_hdm_decoder:39: cxl_mem mem0: decoder2.3 added to endpoint2
[ 152.891493] cxl_core:cxl_bus_probe:2334: cxl_port endpoint2: probe: 0
[ 152.892420] cxl_core:devm_cxl_add_port:1010: cxl_mem mem0: endpoint2 added to port1
[ 152.893360] cxl_core:cxl_bi_ctrl_endpoint:1123: cxl_pci 0000:0d:00.0: device capable of issuing BI requests
[ 152.894395] cxl_core:cxl_bus_probe:2334: cxl_mem mem0: probe: 0
Unbind both devices:
# echo mem0 > /sys/bus/cxl/drivers/cxl_mem/unbind
[ 344.593463] cxl_core:cxl_bi_ctrl_endpoint:1123: cxl_pci 0000:0d:00.0: device incapable of issuing BI requests
# echo mem1 > /sys/bus/cxl/drivers/cxl_mem/unbind
[ 349.861528] cxl_core:cxl_bi_ctrl_endpoint:1123: cxl_pci 0000:0e:00.0: device incapable of issuing BI requests
4. Discovery of 2 BI capable devices behind a switch (w/ flit mode)
-------------------------------------------------------------------
[ 0.758525] cxl_core:cxl_probe_component_regs:102: cxl_pci 0000:11:00.0: found BI Decoder capability (0xab4)
[ 0.767234] cxl_core:cxl_probe_component_regs:102: cxl_pci 0000:0f:00.0: found BI Decoder capability (0xab4)
[ 0.842162] cxl_core:cxl_probe_component_regs:102: pcieport 0000:0c:00.0: found BI Decoder capability (0xab4)
[ 0.863272] cxl_port:cxl_uport_init_bi:94: cxl_port port2: BI RT registers not found
[ 0.907717] cxl_core:cxl_probe_component_regs:96: cxl_port port2: found BI RT capability (0xaa8)
[ 1.113556] cxl_core:cxl_probe_component_regs:102: pcieport 0000:0e:03.0: found BI Decoder capability (0xab4)
[ 1.128571] cxl_core:cxl_probe_component_regs:102: pcieport 0000:0e:01.0: found BI Decoder capability (0xab4)
[ 1.281506] cxl_core:cxl_probe_component_regs:102: pcieport 0000:0e:02.0: found BI Decoder capability (0xab4)
[ 1.290050] cxl_core:cxl_probe_component_regs:102: pcieport 0000:0e:00.0: found BI Decoder capability (0xab4)
[ 1.612582] cxl_core:__cxl_bi_commit:1024: pcieport 0000:0e:00.0: BI-ID commit wait took 203541us
[ 1.616836] cxl_core:cxl_bi_ctrl_endpoint:1123: cxl_pci 0000:0f:00.0: device capable of issuing BI requests
[ 1.820621] cxl_core:__cxl_bi_commit:1024: pcieport 0000:0e:02.0: BI-ID commit wait took 203526us
[ 1.823760] cxl_core:cxl_bi_ctrl_endpoint:1123: cxl_pci 0000:11:00.0: device capable of issuing BI requests
5. Mixed Configurations (BI-capable type3 but DSP 68b)
------------------------------------------------------
[ 0.842026] cxl_core:cxl_probe_component_regs:102: cxl_pci 0000:11:00.0: found BI Decoder capability (0xab4)
[ 0.842092] cxl_core:cxl_probe_component_regs:102: cxl_pci 0000:0f:00.0: found BI Decoder capability (0xab4)
[ 0.878436] cxl_core:cxl_probe_component_regs:102: pcieport 0000:0c:00.0: found BI Decoder capability (0xab4)
[ 1.297020] cxl_core:cxl_probe_component_regs:102: pcieport 0000:0e:03.0: found BI Decoder capability (0xab4)
[ 1.315171] cxl_core:cxl_probe_component_regs:102: pcieport 0000:0e:02.0: found BI Decoder capability (0xab4)
[ 1.475849] cxl_mem:cxl_mem_probe:160: cxl_mem mem3: BI setup failed rc=-22
[ 1.476831] cxl_core:cxl_probe_component_regs:102: pcieport 0000:0e:01.0: found BI Decoder capability (0xab4)
[ 1.477785] cxl_core:cxl_probe_component_regs:102: pcieport 0000:0e:00.0: found BI Decoder capability (0xab4)
[ 1.671067] cxl_mem:cxl_mem_probe:160: cxl_mem mem0: BI setup failed rc=-22
# echo 1 > /sys/bus/cxl/devices/decoder5.0/bi
-bash: echo: write error: Invalid argument
# echo region0 > /sys/bus/cxl/devices/decoder0.0/create_ram_bi_region
[ 287.198551] cxl_core:cxl_region_can_probe:4227: cxl_region region0: config state: 0
[ 287.200005] cxl_core:cxl_bus_probe:2334: cxl_region region0: probe: -6
[ 287.201012] cxl_core:devm_cxl_add_region:2649: cxl_acpi ACPI0017:00: decoder0.0: created region0
# echo decoder5.0 > /sys/bus/cxl/devices/region0/target0
[ 384.895032] cxl_core:cxl_region_attach:2062: cxl region0: mem0:decoder5.0 BI not enabled on device
[ 384.896300] cxl_port endpoint5: failed to attach decoder5.0 to region0: -6
-bash: echo: write error: No such device or address
6. Detect mismatch between decoder and region types
---------------------------------------------------
# echo ram > /sys/bus/cxl/devices/decoder2.0/mode
# echo 1 > /sys/bus/cxl/devices/decoder2.0/bi
# echo 0x40000000 > /sys/bus/cxl/devices/decoder2.0/dpa_size
# echo region0 > /sys/bus/cxl/devices/decoder0.0/create_ram_region <- no bi
# echo 256 > /sys/bus/cxl/devices/region0/interleave_granularity
# echo 1 > /sys/bus/cxl/devices/region0/interleave_ways
# echo 0x40000000 > /sys/bus/cxl/devices/region0/size
# echo decoder2.0 > /sys/bus/cxl/devices/region0/target0
# echo 1 > /sys/bus/cxl/devices/region0/commit
[ 24.738308] cxl_core:cxl_region_attach:2048: cxl region0: mem0:decoder2.0 type mismatch: 2 vs 3
Applies against the 'next' branch of cxl.git + the type2 support preparation
series (https://lore.kernel.org/linux-cxl/20260306164741.3796372-1-alejandro.lucero-palau@amd.com)
Thanks!
Davidlohr Bueso (6):
cxl: Add Back-Invalidate register definitions and structures
cxl: Add BI register probing and port initialization
cxl/pci: Add Back-Invalidate topology enable/disable
cxl: Wire BI setup and dealloc into device lifecycle
cxl/hdm: Add BI coherency support for endpoint decoders
cxl: Add HDM-DB region creation and sysfs interface
Documentation/ABI/testing/sysfs-bus-cxl | 40 ++-
drivers/cxl/acpi.c | 2 +
drivers/cxl/core/core.h | 4 +
drivers/cxl/core/hdm.c | 60 ++++-
drivers/cxl/core/pci.c | 339 ++++++++++++++++++++++++
drivers/cxl/core/port.c | 63 ++++-
drivers/cxl/core/region.c | 65 ++++-
drivers/cxl/core/regs.c | 13 +
drivers/cxl/cxl.h | 41 +++
drivers/cxl/cxlmem.h | 2 +
drivers/cxl/mem.c | 20 ++
drivers/cxl/port.c | 88 ++++++
include/cxl/cxl.h | 5 +
13 files changed, 726 insertions(+), 16 deletions(-)
--
2.39.5
* [PATCH 1/6] cxl: Add Back-Invalidate register definitions and structures
2026-03-15 20:27 [PATCH 0/6] cxl: Support Back-Invalidate Davidlohr Bueso
@ 2026-03-15 20:27 ` Davidlohr Bueso
2026-03-19 16:59 ` Jonathan Cameron
` (2 more replies)
2026-03-15 20:27 ` [PATCH 2/6] cxl: Add BI register probing and port initialization Davidlohr Bueso
` (4 subsequent siblings)
5 siblings, 3 replies; 21+ messages in thread
From: Davidlohr Bueso @ 2026-03-15 20:27 UTC (permalink / raw)
To: dave.jiang, dan.j.williams
Cc: jonathan.cameron, alison.schofield, ira.weiny, gourry,
dongjoo.seo1, anisa.su, linux-cxl, Davidlohr Bueso
Add CXL Back-Invalidate (BI) capability IDs, register definitions for
the BI Route Table and BI Decoder capability structures, and associated
fields. This includes HDM decoder coherency capability and control fields
needed to support HDM-DB (device-managed coherency with back-invalidate).
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
---
drivers/cxl/cxl.h | 41 +++++++++++++++++++++++++++++++++++++++++
drivers/cxl/cxlmem.h | 2 ++
include/cxl/cxl.h | 5 +++++
3 files changed, 48 insertions(+)
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 031846eab02c..efe06d60b364 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -42,6 +42,8 @@ extern const struct nvdimm_security_ops *cxl_security_ops;
#define CXL_CM_CAP_CAP_ID_RAS 0x2
#define CXL_CM_CAP_CAP_ID_HDM 0x5
#define CXL_CM_CAP_CAP_HDM_VERSION 1
+#define CXL_CM_CAP_CAP_ID_BI_RT 0xB
+#define CXL_CM_CAP_CAP_ID_BI_DECODER 0xC
/* HDM decoders CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure */
#define CXL_HDM_DECODER_CAP_OFFSET 0x0
@@ -51,6 +53,10 @@ extern const struct nvdimm_security_ops *cxl_security_ops;
#define CXL_HDM_DECODER_INTERLEAVE_14_12 BIT(9)
#define CXL_HDM_DECODER_INTERLEAVE_3_6_12_WAY BIT(11)
#define CXL_HDM_DECODER_INTERLEAVE_16_WAY BIT(12)
+#define CXL_HDM_DECODER_SUPPORTED_COHERENCY_MASK GENMASK(22, 21)
+#define CXL_HDM_DECODER_COHERENCY_DEV 0x1
+#define CXL_HDM_DECODER_COHERENCY_HOST 0x2
+#define CXL_HDM_DECODER_COHERENCY_BOTH 0x3
#define CXL_HDM_DECODER_CTRL_OFFSET 0x4
#define CXL_HDM_DECODER_ENABLE BIT(1)
#define CXL_HDM_DECODER0_BASE_LOW_OFFSET(i) (0x20 * (i) + 0x10)
@@ -65,6 +71,7 @@ extern const struct nvdimm_security_ops *cxl_security_ops;
#define CXL_HDM_DECODER0_CTRL_COMMITTED BIT(10)
#define CXL_HDM_DECODER0_CTRL_COMMIT_ERROR BIT(11)
#define CXL_HDM_DECODER0_CTRL_HOSTONLY BIT(12)
+#define CXL_HDM_DECODER0_CTRL_BI BIT(13)
#define CXL_HDM_DECODER0_TL_LOW(i) (0x20 * (i) + 0x24)
#define CXL_HDM_DECODER0_TL_HIGH(i) (0x20 * (i) + 0x28)
#define CXL_HDM_DECODER0_SKIP_LOW(i) CXL_HDM_DECODER0_TL_LOW(i)
@@ -152,6 +159,33 @@ static inline int ways_to_eiw(unsigned int ways, u8 *eiw)
#define CXL_HEADERLOG_SIZE SZ_512
#define CXL_HEADERLOG_SIZE_U32 SZ_512 / sizeof(u32)
+/* CXL 3.2 8.2.4.26 CXL BI Route Table Capability Structure */
+#define CXL_BI_RT_CAPABILITY_LENGTH 0xC
+#define CXL_BI_RT_CAPS_OFFSET 0x0
+#define CXL_BI_RT_CAPS_EXPLICIT_COMMIT_REQ BIT(0)
+#define CXL_BI_RT_CTRL_OFFSET 0x4
+#define CXL_BI_RT_CTRL_BI_COMMIT BIT(0)
+#define CXL_BI_RT_STATUS_OFFSET 0x8
+#define CXL_BI_RT_STATUS_BI_COMMITTED BIT(0)
+#define CXL_BI_RT_STATUS_BI_ERR_NOT_COMMITTED BIT(1)
+#define CXL_BI_RT_STATUS_BI_COMMIT_TM_SCALE GENMASK(11, 8)
+#define CXL_BI_RT_STATUS_BI_COMMIT_TM_BASE GENMASK(15, 12)
+
+/* CXL 3.2 8.2.4.27 CXL BI Decoder Capability Structure */
+#define CXL_BI_DECODER_CAPABILITY_LENGTH 0xC
+#define CXL_BI_DECODER_CAPS_OFFSET 0x0
+#define CXL_BI_DECODER_CAPS_HDMD_CAP BIT(0)
+#define CXL_BI_DECODER_CAPS_EXPLICIT_COMMIT_REQ BIT(1)
+#define CXL_BI_DECODER_CTRL_OFFSET 0x4
+#define CXL_BI_DECODER_CTRL_BI_FW BIT(0)
+#define CXL_BI_DECODER_CTRL_BI_ENABLE BIT(1)
+#define CXL_BI_DECODER_CTRL_BI_COMMIT BIT(2)
+#define CXL_BI_DECODER_STATUS_OFFSET 0x8
+#define CXL_BI_DECODER_STATUS_BI_COMMITTED BIT(0)
+#define CXL_BI_DECODER_STATUS_BI_ERR_NOT_COMMITTED BIT(1)
+#define CXL_BI_DECODER_STATUS_BI_COMMIT_TM_SCALE GENMASK(11, 8)
+#define CXL_BI_DECODER_STATUS_BI_COMMIT_TM_BASE GENMASK(15, 12)
+
/* CXL 2.0 8.2.8.1 Device Capabilities Array Register */
#define CXLDEV_CAP_ARRAY_OFFSET 0x0
#define CXLDEV_CAP_ARRAY_CAP_ID 0
@@ -241,6 +275,7 @@ int cxl_dport_map_rcd_linkcap(struct pci_dev *pdev, struct cxl_dport *dport);
#define CXL_DECODER_F_LOCK BIT(4)
#define CXL_DECODER_F_ENABLE BIT(5)
#define CXL_DECODER_F_NORMALIZED_ADDRESSING BIT(6)
+#define CXL_DECODER_F_BI BIT(7)
enum cxl_decoder_type {
CXL_DECODER_DEVMEM = 2,
@@ -522,6 +557,7 @@ struct cxl_dax_region {
* @decoder_ida: allocator for decoder ids
* @reg_map: component and ras register mapping parameters
* @regs: mapped component registers
+ * @uport_regs: mapped upstream port component registers (BI RT)
* @nr_dports: number of entries in @dports
* @hdm_end: track last allocated HDM decoder instance for allocation ordering
* @commit_end: cursor to track highest committed decoder for commit ordering
@@ -530,6 +566,7 @@ struct cxl_dax_region {
* @cdat: Cached CDAT data
* @cdat_available: Should a CDAT attribute be available in sysfs
* @pci_latency: Upstream latency in picoseconds
+ * @nr_bi: number of BI-enabled endpoints below this port
* @component_reg_phys: Physical address of component register
*/
struct cxl_port {
@@ -544,6 +581,7 @@ struct cxl_port {
struct ida decoder_ida;
struct cxl_register_map reg_map;
struct cxl_component_regs regs;
+ struct cxl_component_regs uport_regs;
int nr_dports;
int hdm_end;
int commit_end;
@@ -555,6 +593,7 @@ struct cxl_port {
} cdat;
bool cdat_available;
long pci_latency;
+ int nr_bi;
resource_size_t component_reg_phys;
};
@@ -875,6 +914,8 @@ void cxl_coordinates_combine(struct access_coordinate *out,
struct access_coordinate *c2);
bool cxl_endpoint_decoder_reset_detected(struct cxl_port *port);
+int cxl_bi_setup(struct cxl_dev_state *cxlds);
+int cxl_bi_dealloc(struct cxl_dev_state *cxlds);
struct cxl_dport *devm_cxl_add_dport_by_dev(struct cxl_port *port,
struct device *dport_dev);
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 281546de426e..efab65f68575 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -837,6 +837,7 @@ int cxl_mem_sanitize(struct cxl_memdev *cxlmd, u16 cmd);
* @target_count: for switch decoders, max downstream port targets
* @interleave_mask: interleave granularity capability, see check_interleave_cap()
* @iw_cap_mask: bitmask of supported interleave ways, see check_interleave_cap()
+ * @supported_coherency: HDM Decoder Capability supported coherency mask
* @port: mapped cxl_port, see devm_cxl_setup_hdm()
*/
struct cxl_hdm {
@@ -845,6 +846,7 @@ struct cxl_hdm {
unsigned int target_count;
unsigned int interleave_mask;
unsigned long iw_cap_mask;
+ unsigned int supported_coherency;
struct cxl_port *port;
};
diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
index fa7269154620..74be940364e1 100644
--- a/include/cxl/cxl.h
+++ b/include/cxl/cxl.h
@@ -34,10 +34,12 @@ struct cxl_regs {
* Common set of CXL Component register block base pointers
* @hdm_decoder: CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure
* @ras: CXL 2.0 8.2.5.9 CXL RAS Capability Structure
+ * @bi: CXL 3.2 8.2.4.26/27 CXL BI Capability Structure
*/
struct_group_tagged(cxl_component_regs, component,
void __iomem *hdm_decoder;
void __iomem *ras;
+ void __iomem *bi;
);
/*
* Common set of CXL Device register block base pointers
@@ -80,6 +82,7 @@ struct cxl_reg_map {
struct cxl_component_reg_map {
struct cxl_reg_map hdm_decoder;
struct cxl_reg_map ras;
+ struct cxl_reg_map bi;
};
struct cxl_device_reg_map {
@@ -162,6 +165,7 @@ struct cxl_dpa_partition {
* @regs: Parsed register blocks
* @cxl_dvsec: Offset to the PCIe device DVSEC
* @rcd: operating in RCD mode (CXL 3.0 9.11.8 CXL Devices Attached to an RCH)
+ * @bi: device is BI (Back-Invalidate) enabled
* @media_ready: Indicate whether the device media is usable
* @dpa_res: Overall DPA resource tree for the device
* @part: DPA partition array
@@ -181,6 +185,7 @@ struct cxl_dev_state {
struct cxl_device_regs regs;
int cxl_dvsec;
bool rcd;
+ bool bi;
bool media_ready;
struct resource dpa_res;
struct cxl_dpa_partition part[CXL_NR_PARTITIONS_MAX];
--
2.39.5
* [PATCH 2/6] cxl: Add BI register probing and port initialization
2026-03-15 20:27 [PATCH 0/6] cxl: Support Back-Invalidate Davidlohr Bueso
2026-03-15 20:27 ` [PATCH 1/6] cxl: Add Back-Invalidate register definitions and structures Davidlohr Bueso
@ 2026-03-15 20:27 ` Davidlohr Bueso
2026-03-20 15:46 ` Jonathan Cameron
` (2 more replies)
2026-03-15 20:27 ` [PATCH 3/6] cxl/pci: Add Back-Invalidate topology enable/disable Davidlohr Bueso
` (3 subsequent siblings)
5 siblings, 3 replies; 21+ messages in thread
From: Davidlohr Bueso @ 2026-03-15 20:27 UTC (permalink / raw)
To: dave.jiang, dan.j.williams
Cc: jonathan.cameron, alison.schofield, ira.weiny, gourry,
dongjoo.seo1, anisa.su, linux-cxl, Davidlohr Bueso
Add register probing for BI Route Table and BI Decoder capability
structures in cxl_probe_component_regs(), and initialize BI registers
during port probe for both switch ports and endpoint ports.
For switch ports, map BI Decoder registers on downstream ports and
BI Route Table registers on upstream ports. For endpoint ports, map
the BI Decoder registers directly into the port's register block.
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
---
drivers/cxl/core/regs.c | 13 ++++++
drivers/cxl/port.c | 88 +++++++++++++++++++++++++++++++++++++++++
2 files changed, 101 insertions(+)
diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c
index 93710cf4f0a6..82e6018fd4cf 100644
--- a/drivers/cxl/core/regs.c
+++ b/drivers/cxl/core/regs.c
@@ -92,6 +92,18 @@ void cxl_probe_component_regs(struct device *dev, void __iomem *base,
length = CXL_RAS_CAPABILITY_LENGTH;
rmap = &map->ras;
break;
+ case CXL_CM_CAP_CAP_ID_BI_RT:
+ dev_dbg(dev, "found BI RT capability (0x%x)\n",
+ offset);
+ length = CXL_BI_RT_CAPABILITY_LENGTH;
+ rmap = &map->bi;
+ break;
+ case CXL_CM_CAP_CAP_ID_BI_DECODER:
+ dev_dbg(dev, "found BI Decoder capability (0x%x)\n",
+ offset);
+ length = CXL_BI_DECODER_CAPABILITY_LENGTH;
+ rmap = &map->bi;
+ break;
default:
dev_dbg(dev, "Unknown CM cap ID: %d (0x%x)\n", cap_id,
offset);
@@ -211,6 +223,7 @@ int cxl_map_component_regs(const struct cxl_register_map *map,
} mapinfo[] = {
{ &map->component_map.hdm_decoder, &regs->hdm_decoder },
{ &map->component_map.ras, &regs->ras },
+ { &map->component_map.bi, &regs->bi },
};
int i;
diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
index ada51948d52f..0540f0681ffb 100644
--- a/drivers/cxl/port.c
+++ b/drivers/cxl/port.c
@@ -58,6 +58,90 @@ static int discover_region(struct device *dev, void *unused)
return 0;
}
+static int cxl_dport_init_bi(struct cxl_dport *dport)
+{
+ struct cxl_register_map *map = &dport->reg_map;
+ struct device *dev = dport->dport_dev;
+
+ if (dport->regs.bi)
+ return 0;
+
+ if (!cxl_pci_flit_256(to_pci_dev(dev)))
+ return 0;
+
+ if (!map->component_map.bi.valid) {
+ dev_dbg(dev, "BI decoder registers not found\n");
+ return 0;
+ }
+
+ if (cxl_map_component_regs(map, &dport->regs.component,
+ BIT(CXL_CM_CAP_CAP_ID_BI_DECODER))) {
+ dev_dbg(dev, "Failed to map BI decoder capability.\n");
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static void cxl_uport_init_bi(struct cxl_port *port, struct device *host)
+{
+ struct cxl_register_map *map = &port->reg_map;
+
+ if (port->uport_regs.bi)
+ return;
+
+ if (!map->component_map.bi.valid) {
+ dev_dbg(host, "BI RT registers not found\n");
+ return;
+ }
+
+ map->host = host;
+ if (cxl_map_component_regs(map, &port->uport_regs,
+ BIT(CXL_CM_CAP_CAP_ID_BI_RT)))
+ dev_dbg(&port->dev, "Failed to map BI RT capability\n");
+}
+
+static void cxl_endpoint_init_bi(struct cxl_port *port)
+{
+ struct cxl_register_map *map = &port->reg_map;
+
+ cxl_dport_init_bi(port->parent_dport);
+
+ if (!map->component_map.bi.valid)
+ return;
+
+ if (cxl_map_component_regs(map, &port->regs,
+ BIT(CXL_CM_CAP_CAP_ID_BI_DECODER)))
+ dev_dbg(&port->dev, "Failed to map BI decoder capability\n");
+}
+
+static void cxl_switch_port_init_bi(struct cxl_port *port)
+{
+ struct cxl_dport *parent_dport = port->parent_dport;
+
+ if (is_cxl_root(to_cxl_port(port->dev.parent)))
+ return;
+
+ if (dev_is_pci(port->uport_dev) &&
+ !cxl_pci_flit_256(to_pci_dev(port->uport_dev)))
+ return;
+
+ if (parent_dport && dev_is_pci(parent_dport->dport_dev)) {
+ struct pci_dev *pdev = to_pci_dev(parent_dport->dport_dev);
+
+ switch (pci_pcie_type(pdev)) {
+ case PCI_EXP_TYPE_ROOT_PORT:
+ case PCI_EXP_TYPE_DOWNSTREAM:
+ cxl_dport_init_bi(parent_dport);
+ break;
+ default:
+ break;
+ }
+ }
+
+ cxl_uport_init_bi(port, &port->dev);
+}
+
static int cxl_switch_port_probe(struct cxl_port *port)
{
/* Reset nr_dports for rebind of driver */
@@ -66,6 +150,8 @@ static int cxl_switch_port_probe(struct cxl_port *port)
/* Cache the data early to ensure is_visible() works */
read_cdat_data(port);
+ cxl_switch_port_init_bi(port);
+
return 0;
}
@@ -128,6 +214,8 @@ static int cxl_endpoint_port_probe(struct cxl_port *port)
read_cdat_data(port);
cxl_endpoint_parse_cdat(port);
+ cxl_endpoint_init_bi(port);
+
get_device(&cxlmd->dev);
rc = devm_add_action_or_reset(&port->dev, schedule_detach, cxlmd);
if (rc)
--
2.39.5
* [PATCH 3/6] cxl/pci: Add Back-Invalidate topology enable/disable
2026-03-15 20:27 [PATCH 0/6] cxl: Support Back-Invalidate Davidlohr Bueso
2026-03-15 20:27 ` [PATCH 1/6] cxl: Add Back-Invalidate register definitions and structures Davidlohr Bueso
2026-03-15 20:27 ` [PATCH 2/6] cxl: Add BI register probing and port initialization Davidlohr Bueso
@ 2026-03-15 20:27 ` Davidlohr Bueso
2026-03-20 16:20 ` Cheatham, Benjamin
` (2 more replies)
2026-03-15 20:27 ` [PATCH 4/6] cxl: Wire BI setup and dealloc into device lifecycle Davidlohr Bueso
` (2 subsequent siblings)
5 siblings, 3 replies; 21+ messages in thread
From: Davidlohr Bueso @ 2026-03-15 20:27 UTC (permalink / raw)
To: dave.jiang, dan.j.williams
Cc: jonathan.cameron, alison.schofield, ira.weiny, gourry,
dongjoo.seo1, anisa.su, linux-cxl, Davidlohr Bueso
Implement cxl_bi_setup() and cxl_bi_dealloc() which walk the CXL port
topology to enable/disable BI flows on all components in the path.
Upon successful setup, this enablement does not touch the current HDM decoder
configuration (the BI bit is not set), and therefore the device is left in a
BI-capable state without yet making use of it for decode coherence.
Upon a BI-ID removal event, the device is expected to be offline.
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
---
drivers/cxl/core/pci.c | 339 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 339 insertions(+)
diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index d1f487b3d809..5f0226397dfa 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -926,3 +926,342 @@ int cxl_port_get_possible_dports(struct cxl_port *port)
return ctx.count;
}
+
+static bool cxl_is_bi_capable(struct pci_dev *pdev, void __iomem *bi)
+{
+ if (!cxl_pci_flit_256(pdev))
+ return false;
+
+ if (pci_pcie_type(pdev) != PCI_EXP_TYPE_UPSTREAM && !bi) {
+ dev_dbg(&pdev->dev, "No BI Decoder registers.\n");
+ return false;
+ }
+
+ return true;
+}
+
+/* limit any insane timeouts from hw */
+#define CXL_BI_COMMIT_MAXTMO_US (5 * USEC_PER_SEC)
+
+static unsigned long __cxl_bi_get_timeout_us(struct device *dev,
+ int scale, int base)
+{
+ static const unsigned long scale_tbl[] = {
+ 1, 10, 100, 1000, 10000, 100000, 1000000, 10000000,
+ };
+
+ if (scale >= ARRAY_SIZE(scale_tbl)) {
+ dev_dbg(dev, "Invalid BI commit timeout scale: %d\n", scale);
+ return CXL_BI_COMMIT_MAXTMO_US;
+ }
+
+ return scale_tbl[scale] * base;
+}
+
+#define ___cxl_bi_commit(dev, bi, ctype) \
+do { \
+ u32 status, ctrl; \
+ int scale, base; \
+ ktime_t tmo, now, start; \
+ unsigned long poll_us, tmo_us; \
+ \
+ ctrl = readl(bi + CXL_BI_##ctype##_CTRL_OFFSET); \
+ writel(ctrl & ~CXL_BI_##ctype##_CTRL_BI_COMMIT, \
+ (bi) + CXL_BI_##ctype##_CTRL_OFFSET); \
+ writel(ctrl | CXL_BI_##ctype##_CTRL_BI_COMMIT, \
+ (bi) + CXL_BI_##ctype##_CTRL_OFFSET); \
+ \
+ status = readl((bi) + CXL_BI_##ctype##_STATUS_OFFSET); \
+ scale = FIELD_GET(CXL_BI_##ctype##_STATUS_BI_COMMIT_TM_SCALE, \
+ status); \
+ base = FIELD_GET(CXL_BI_##ctype##_STATUS_BI_COMMIT_TM_BASE, \
+ status); \
+ \
+ /* ... and poll */ \
+ tmo_us = min_t(unsigned long, CXL_BI_COMMIT_MAXTMO_US, \
+ __cxl_bi_get_timeout_us((dev), scale, base)); \
+ poll_us = tmo_us / 10; /* arbitrary 10% of timeout */ \
+ start = now = ktime_get(); \
+ tmo = ktime_add_us(now, tmo_us); \
+ while (!FIELD_GET(CXL_BI_##ctype##_STATUS_BI_COMMITTED, status) && \
+ !FIELD_GET(CXL_BI_##ctype##_STATUS_BI_ERR_NOT_COMMITTED, \
+ status)) { \
+ if (ktime_after(now, tmo)) { \
+ dev_dbg((dev), "BI-ID commit timed out (%luus)\n", \
+ tmo_us); \
+ return -ETIMEDOUT; \
+ } \
+ \
+ fsleep(poll_us); \
+ now = ktime_get(); \
+ status = readl((bi) + CXL_BI_##ctype##_STATUS_OFFSET); \
+ } \
+ \
+ if (FIELD_GET(CXL_BI_##ctype##_STATUS_BI_ERR_NOT_COMMITTED, status)) \
+ return -EIO; \
+ \
+ dev_dbg((dev), "BI-ID commit wait took %lluus\n", \
+ ktime_to_us(ktime_sub(now, start))); \
+} while (0)
+
+static int __cxl_bi_commit_rt(struct device *dev, void __iomem *bi)
+{
+ if (!bi)
+ return 0;
+
+ if (FIELD_GET(CXL_BI_RT_CAPS_EXPLICIT_COMMIT_REQ,
+ readl(bi + CXL_BI_RT_CAPS_OFFSET)))
+ ___cxl_bi_commit(dev, bi, RT);
+
+ return 0;
+}
+
+static int __cxl_bi_commit(struct device *dev, void __iomem *bi)
+{
+ if (!bi)
+ return -EINVAL;
+
+ ___cxl_bi_commit(dev, bi, DECODER);
+ return 0;
+}
+
+/* enable or disable BI flows at the given level of the topology */
+static int cxl_bi_ctrl_dport(struct cxl_dport *dport, bool enable)
+{
+ u32 ctrl, value;
+ void __iomem *bi = dport->regs.bi;
+ struct cxl_port *port = dport->port;
+ struct pci_dev *pdev = to_pci_dev(dport->dport_dev);
+
+ guard(device)(&port->dev);
+
+ if (!bi)
+ return -EINVAL;
+
+ ctrl = readl(bi + CXL_BI_DECODER_CTRL_OFFSET);
+
+ switch (pci_pcie_type(pdev)) {
+ case PCI_EXP_TYPE_ROOT_PORT:
+ if (enable) {
+ /*
+ * There is no point of failure from here on,
+ * BI will be enabled on the endpoint device.
+ */
+ port->nr_bi++;
+
+ if (FIELD_GET(CXL_BI_DECODER_CTRL_BI_FW, ctrl))
+ return 0;
+
+ value = ctrl | CXL_BI_DECODER_CTRL_BI_FW;
+ value &= ~CXL_BI_DECODER_CTRL_BI_ENABLE;
+ } else {
+ if (WARN_ON_ONCE(port->nr_bi == 0))
+ return -EINVAL;
+ if (--port->nr_bi > 0)
+ return 0;
+
+ value = ctrl & ~CXL_BI_DECODER_CTRL_BI_FW;
+ }
+
+ writel(value, bi + CXL_BI_DECODER_CTRL_OFFSET);
+ return 0;
+ case PCI_EXP_TYPE_DOWNSTREAM:
+ if (enable) {
+ value = ctrl & ~CXL_BI_DECODER_CTRL_BI_FW;
+ value |= CXL_BI_DECODER_CTRL_BI_ENABLE;
+ } else {
+ if (!FIELD_GET(CXL_BI_DECODER_CTRL_BI_ENABLE, ctrl))
+ return 0;
+
+ value = ctrl & ~CXL_BI_DECODER_CTRL_BI_ENABLE;
+ }
+
+ writel(value, bi + CXL_BI_DECODER_CTRL_OFFSET);
+
+ if (FIELD_GET(CXL_BI_DECODER_CAPS_EXPLICIT_COMMIT_REQ,
+ readl(bi + CXL_BI_DECODER_CAPS_OFFSET))) {
+ int rc = __cxl_bi_commit(dport->dport_dev,
+ dport->regs.bi);
+ if (rc)
+ return rc;
+ }
+
+ return __cxl_bi_commit_rt(&port->dev, port->uport_regs.bi);
+ default:
+ return -EINVAL;
+ }
+}
+
+static int cxl_bi_ctrl_endpoint(struct cxl_dev_state *cxlds, bool enable)
+{
+ u32 ctrl, val;
+ struct cxl_port *endpoint = cxlds->cxlmd->endpoint;
+ void __iomem *bi = endpoint->regs.bi;
+
+ if (!bi)
+ return -EINVAL;
+
+ ctrl = readl(bi + CXL_BI_DECODER_CTRL_OFFSET);
+
+ if (enable) {
+ if (FIELD_GET(CXL_BI_DECODER_CTRL_BI_ENABLE, ctrl)) {
+ WARN_ON_ONCE(!cxlds->bi);
+ return 0;
+ }
+ val = ctrl | CXL_BI_DECODER_CTRL_BI_ENABLE;
+ } else {
+ if (!FIELD_GET(CXL_BI_DECODER_CTRL_BI_ENABLE, ctrl)) {
+ WARN_ON_ONCE(cxlds->bi);
+ return 0;
+ }
+ val = ctrl & ~CXL_BI_DECODER_CTRL_BI_ENABLE;
+ }
+
+ writel(val, bi + CXL_BI_DECODER_CTRL_OFFSET);
+ cxlds->bi = enable;
+
+ dev_dbg(cxlds->dev, "device %scapable of issuing BI requests\n",
+ enable ? "" : "in");
+
+ return 0;
+}
+
+int cxl_bi_setup(struct cxl_dev_state *cxlds)
+{
+ struct pci_dev *pdev = to_pci_dev(cxlds->dev);
+ struct cxl_port *endpoint = cxlds->cxlmd->endpoint;
+ struct cxl_dport *dport, *_dport, *failed;
+ struct cxl_port *parent_port, *port;
+ int rc;
+
+ struct cxl_port *_port __free(put_cxl_port) =
+ cxl_pci_find_port(pdev, &_dport);
+
+ if (!_port)
+ return -EINVAL;
+
+ if (!cxl_is_bi_capable(pdev, endpoint->regs.bi))
+ return 0;
+
+ /* walk up the topology twice, first to check, then to enable */
+ port = _port;
+ dport = _dport;
+ while (1) {
+ parent_port = to_cxl_port(port->dev.parent);
+ /* check rp, dsp */
+ if (!cxl_is_bi_capable(to_pci_dev(dport->dport_dev),
+ dport->regs.bi))
+ return -EINVAL;
+
+ /* check usp */
+ if (pci_pcie_type(to_pci_dev(dport->dport_dev)) ==
+ PCI_EXP_TYPE_DOWNSTREAM) {
+ if (!cxl_is_bi_capable(to_pci_dev(port->uport_dev),
+ port->uport_regs.bi))
+ return -EINVAL;
+ }
+
+ if (is_cxl_root(parent_port))
+ break;
+
+ dport = port->parent_dport;
+ port = parent_port;
+ }
+
+ port = _port;
+ dport = _dport;
+ while (1) {
+ parent_port = to_cxl_port(port->dev.parent);
+
+ rc = cxl_bi_ctrl_dport(dport, true);
+ if (rc)
+ goto err_rollback;
+
+ if (is_cxl_root(parent_port))
+ break;
+
+ dport = port->parent_dport;
+ port = parent_port;
+ }
+
+ /* finally, enable BI on the device */
+ cxl_bi_ctrl_endpoint(cxlds, true);
+ return 0;
+
+err_rollback:
+ /*
+ * Undo all dports enabled so far by re-walking from the bottom
+ * up to (but not including) the failed dport.
+ */
+ failed = dport;
+ dport = _dport;
+ port = _port;
+ while (dport != failed) {
+ parent_port = to_cxl_port(port->dev.parent);
+
+ cxl_bi_ctrl_dport(dport, false);
+ if (is_cxl_root(parent_port))
+ break;
+ dport = port->parent_dport;
+ port = parent_port;
+ }
+ return rc;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_bi_setup, "CXL");
+
+int cxl_bi_dealloc(struct cxl_dev_state *cxlds)
+{
+ struct cxl_memdev *cxlmd = cxlds->cxlmd;
+ struct cxl_port *endpoint = cxlmd->endpoint;
+ struct cxl_port *parent_port, *port;
+ struct cxl_dport *dport, *_dport;
+
+ struct cxl_port *_port __free(put_cxl_port) =
+ cxl_pci_find_port(to_pci_dev(cxlds->dev), &_dport);
+
+ if (!_port)
+ return -EINVAL;
+
+ if (!cxlds->bi)
+ return 0;
+
+ if (endpoint) {
+ /* ensure the device is offline and unmapped */
+ scoped_guard(rwsem_read, &cxl_rwsem.region) {
+ if (cxl_num_decoders_committed(endpoint) > 0)
+ return -EBUSY;
+ }
+
+ /* first, disable BI on the device */
+ cxl_bi_ctrl_endpoint(cxlds, false);
+ } else {
+ /*
+ * Teardown path: the endpoint was already removed, which
+ * tears down regions and uncommits decoders. The endpoint
+ * BI registers are no longer mapped so just clear the flag
+ * and walk the dports below.
+ */
+ cxlds->bi = false;
+ }
+
+ port = _port;
+ dport = _dport;
+ while (1) {
+ int rc;
+
+ parent_port = to_cxl_port(port->dev.parent);
+
+ rc = cxl_bi_ctrl_dport(dport, false);
+ if (rc)
+ return rc;
+
+ if (is_cxl_root(parent_port))
+ break;
+
+ dport = port->parent_dport;
+ port = parent_port;
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_bi_dealloc, "CXL");
--
2.39.5
^ permalink raw reply related [flat|nested] 21+ messages in thread
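The commit polling in ___cxl_bi_commit() above derives its timeout from a
scale/base pair read out of the status register (timeout = base * 10^scale
microseconds, capped at 5 seconds). A standalone sketch of that computation
follows; names are illustrative, not the kernel's, and this models only the
arithmetic of __cxl_bi_get_timeout_us() plus the cap applied at the call site:

```c
#include <assert.h>

/* mirrors CXL_BI_COMMIT_MAXTMO_US: limit any insane timeouts from hw */
#define BI_COMMIT_MAXTMO_US (5UL * 1000000UL)

/* timeout = scale_tbl[scale] * base microseconds, capped at 5 s */
static unsigned long bi_timeout_us(int scale, int base)
{
	static const unsigned long scale_tbl[] = {
		1, 10, 100, 1000, 10000, 100000, 1000000, 10000000,
	};
	unsigned long us;

	/* invalid scale from hardware: fall back to the cap */
	if (scale < 0 || scale >= (int)(sizeof(scale_tbl) / sizeof(scale_tbl[0])))
		return BI_COMMIT_MAXTMO_US;

	us = scale_tbl[scale] * (unsigned long)base;
	return us < BI_COMMIT_MAXTMO_US ? us : BI_COMMIT_MAXTMO_US;
}
```

The patch then polls at one tenth of this interval, an arbitrary choice noted
in the macro itself.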
* [PATCH 4/6] cxl: Wire BI setup and dealloc into device lifecycle
2026-03-15 20:27 [PATCH 0/6] cxl: Support Back-Invalidate Davidlohr Bueso
` (2 preceding siblings ...)
2026-03-15 20:27 ` [PATCH 3/6] cxl/pci: Add Back-Invalidate topology enable/disable Davidlohr Bueso
@ 2026-03-15 20:27 ` Davidlohr Bueso
2026-03-20 16:20 ` Cheatham, Benjamin
2026-03-20 16:29 ` Jonathan Cameron
2026-03-15 20:27 ` [PATCH 5/6] cxl/hdm: Add BI coherency support for endpoint decoders Davidlohr Bueso
2026-03-15 20:27 ` [PATCH 6/6] cxl: Add HDM-DB region creation and sysfs interface Davidlohr Bueso
5 siblings, 2 replies; 21+ messages in thread
From: Davidlohr Bueso @ 2026-03-15 20:27 UTC (permalink / raw)
To: dave.jiang, dan.j.williams
Cc: jonathan.cameron, alison.schofield, ira.weiny, gourry,
dongjoo.seo1, anisa.su, linux-cxl, Davidlohr Bueso
Setup BI flows during cxl_mem_probe() after endpoint enumeration and
EDAC registration, and tear down BI during endpoint detach.
BI setup failure is non-fatal: the device continues to operate
in HDM-H mode without back-invalidate support.
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
---
drivers/cxl/mem.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
index fcffe24dcb42..3adee41885c5 100644
--- a/drivers/cxl/mem.c
+++ b/drivers/cxl/mem.c
@@ -65,6 +65,11 @@ static int cxl_debugfs_poison_clear(void *data, u64 dpa)
DEFINE_DEBUGFS_ATTRIBUTE(cxl_poison_clear_fops, NULL,
cxl_debugfs_poison_clear, "%llx\n");
+static void devm_cxl_bi_dealloc(void *data)
+{
+ cxl_bi_dealloc(data);
+}
+
static int cxl_mem_probe(struct device *dev)
{
struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
@@ -150,6 +155,21 @@ static int cxl_mem_probe(struct device *dev)
if (rc)
dev_dbg(dev, "CXL memdev EDAC registration failed rc=%d\n", rc);
+ rc = cxl_bi_setup(cxlds);
+ if (rc)
+ dev_dbg(dev, "BI setup failed rc=%d\n", rc);
+
+ /*
+ * Register BI dealloc after setup so devm ordering ensures
+ * it runs before the endpoint is removed by delete_endpoint().
+ */
+ if (cxlds->bi) {
+ rc = devm_add_action_or_reset(dev, devm_cxl_bi_dealloc,
+ cxlds);
+ if (rc)
+ return rc;
+ }
+
/*
* The kernel may be operating out of CXL memory on this device,
* there is no spec defined way to determine whether this device
--
2.39.5
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH 5/6] cxl/hdm: Add BI coherency support for endpoint decoders
2026-03-15 20:27 [PATCH 0/6] cxl: Support Back-Invalidate Davidlohr Bueso
` (3 preceding siblings ...)
2026-03-15 20:27 ` [PATCH 4/6] cxl: Wire BI setup and dealloc into device lifecycle Davidlohr Bueso
@ 2026-03-15 20:27 ` Davidlohr Bueso
2026-03-20 16:20 ` Cheatham, Benjamin
2026-03-15 20:27 ` [PATCH 6/6] cxl: Add HDM-DB region creation and sysfs interface Davidlohr Bueso
5 siblings, 1 reply; 21+ messages in thread
From: Davidlohr Bueso @ 2026-03-15 20:27 UTC (permalink / raw)
To: dave.jiang, dan.j.williams
Cc: jonathan.cameron, alison.schofield, ira.weiny, gourry,
dongjoo.seo1, anisa.su, linux-cxl, Davidlohr Bueso
Add cxl_dpa_set_coherence() to allow setting the coherency type of an
endpoint decoder, with validation against hardware capabilities. Beyond
the coherency models the HDM decoder supports, the main dependency for
creating these regions is that the device state be BI-ready (cxlds->bi).
Already-committed HDM decoders found with the BI bit set during
enumeration are not supported, because endpoint and port enumerations
are independent. Program the BI bit in the HDM decoder control register
at commit time.
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
---
drivers/cxl/core/core.h | 2 ++
drivers/cxl/core/hdm.c | 60 +++++++++++++++++++++++++++++++++++++----
2 files changed, 57 insertions(+), 5 deletions(-)
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index 19494a8615d3..da4c094b6702 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -97,6 +97,8 @@ void __iomem *devm_cxl_iomap_block(struct device *dev, resource_size_t addr,
struct dentry *cxl_debugfs_create_dir(const char *dir);
int cxl_dpa_set_part(struct cxl_endpoint_decoder *cxled,
enum cxl_partition_mode mode);
+int cxl_dpa_set_coherence(struct cxl_endpoint_decoder *cxled,
+ enum cxl_decoder_type type);
int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, u64 size);
int cxl_dpa_free(struct cxl_endpoint_decoder *cxled);
resource_size_t cxl_dpa_size(struct cxl_endpoint_decoder *cxled);
diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index b2db5967f5c0..6a6071b1f1dd 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -87,6 +87,8 @@ static void parse_hdm_decoder_caps(struct cxl_hdm *cxlhdm)
cxlhdm->iw_cap_mask |= BIT(3) | BIT(6) | BIT(12);
if (FIELD_GET(CXL_HDM_DECODER_INTERLEAVE_16_WAY, hdm_cap))
cxlhdm->iw_cap_mask |= BIT(16);
+ cxlhdm->supported_coherency =
+ FIELD_GET(CXL_HDM_DECODER_SUPPORTED_COHERENCY_MASK, hdm_cap);
}
static bool should_emulate_decoders(struct cxl_endpoint_dvsec_info *info)
@@ -603,6 +605,34 @@ int cxl_dpa_set_part(struct cxl_endpoint_decoder *cxled,
return 0;
}
+int cxl_dpa_set_coherence(struct cxl_endpoint_decoder *cxled,
+ enum cxl_decoder_type type)
+{
+ struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
+ struct cxl_dev_state *cxlds = cxlmd->cxlds;
+ struct cxl_port *port = to_cxl_port(cxled->cxld.dev.parent);
+ struct cxl_hdm *cxlhdm = dev_get_drvdata(&port->dev);
+
+ guard(rwsem_write)(&cxl_rwsem.dpa);
+ if (cxled->cxld.flags & CXL_DECODER_F_ENABLE)
+ return -EBUSY;
+
+ if (!cxlds->bi && type == CXL_DECODER_DEVMEM)
+ return -EINVAL;
+
+ /* A device-coherent-only decoder cannot be set to host-only */
+ if (type == CXL_DECODER_HOSTONLYMEM &&
+ cxlhdm->supported_coherency == CXL_HDM_DECODER_COHERENCY_DEV)
+ return -EINVAL;
+ /* A host-only decoder cannot be set to device-coherent */
+ if (type == CXL_DECODER_DEVMEM &&
+ cxlhdm->supported_coherency == CXL_HDM_DECODER_COHERENCY_HOST)
+ return -EINVAL;
+
+ cxled->cxld.target_type = type;
+ return 0;
+}
+
static int __cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, u64 size)
{
struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
@@ -713,6 +743,9 @@ static void cxld_set_type(struct cxl_decoder *cxld, u32 *ctrl)
u32p_replace_bits(ctrl,
!!(cxld->target_type == CXL_DECODER_HOSTONLYMEM),
CXL_HDM_DECODER0_CTRL_HOSTONLY);
+ u32p_replace_bits(ctrl,
+ !!(cxld->target_type == CXL_DECODER_DEVMEM),
+ CXL_HDM_DECODER0_CTRL_BI);
}
static void cxlsd_set_targets(struct cxl_switch_decoder *cxlsd, u64 *tgt)
@@ -1030,6 +1063,14 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
else
cxld->target_type = CXL_DECODER_DEVMEM;
+ /*
+ * Autocommit of BI-enabled decoders is not supported.
+ * At this point ->bi is not yet set up, so there
+ * is no guarantee that the platform supports BI.
+ */
+ if (FIELD_GET(CXL_HDM_DECODER0_CTRL_BI, ctrl))
+ return -ENXIO;
+
guard(rwsem_write)(&cxl_rwsem.region);
if (cxld->id != cxl_num_decoders_committed(port)) {
dev_warn(&port->dev,
@@ -1049,16 +1090,24 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
if (cxled) {
struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
struct cxl_dev_state *cxlds = cxlmd->cxlds;
+ struct cxl_port *ep_port = to_cxl_port(cxld->dev.parent);
+ struct cxl_hdm *cxlhdm = dev_get_drvdata(&ep_port->dev);
/*
- * Default by devtype until a device arrives that needs
- * more precision.
+ * For type3 HDM-DB devices, users can later change the
+ * target_type, if supported by the HDM decoder.
+ *
+ * Devices that support both coherency modes default
+ * to host-only.
*/
- if (cxlds->type == CXL_DEVTYPE_CLASSMEM)
- cxld->target_type = CXL_DECODER_HOSTONLYMEM;
- else
+ if (cxlds->type == CXL_DEVTYPE_CLASSMEM) {
+ if (cxlhdm->supported_coherency == CXL_HDM_DECODER_COHERENCY_DEV)
+ cxld->target_type = CXL_DECODER_DEVMEM;
+ else
+ cxld->target_type = CXL_DECODER_HOSTONLYMEM;
+ } else {
cxld->target_type = CXL_DECODER_DEVMEM;
+ }
} else {
/* To be overridden by region type at commit time */
cxld->target_type = CXL_DECODER_HOSTONLYMEM;
--
2.39.5
^ permalink raw reply related [flat|nested] 21+ messages in thread
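The acceptance rules in cxl_dpa_set_coherence() above reduce to three checks
against the two-bit supported-coherency field and the device's BI readiness.
A standalone model of that logic, for illustration only (the enum values
mirror the patch's CXL_HDM_DECODER_COHERENCY_* definitions; everything else
is hypothetical):

```c
#include <assert.h>
#include <stdbool.h>

/* mirrors CXL_HDM_DECODER_COHERENCY_{DEV,HOST,BOTH} */
enum coherency { COH_DEV = 1, COH_HOST = 2, COH_BOTH = 3 };
enum dec_type { DEC_DEVMEM, DEC_HOSTONLY };

/* returns 0 if the requested target type is allowed, -1 otherwise */
static int check_coherence(bool bi_ready, enum coherency supported,
			   enum dec_type requested)
{
	/* HDM-DB requires the BI plumbing to already be up */
	if (!bi_ready && requested == DEC_DEVMEM)
		return -1;
	/* device-coherent-only hardware cannot decode host-only */
	if (requested == DEC_HOSTONLY && supported == COH_DEV)
		return -1;
	/* host-only hardware cannot decode device-coherent */
	if (requested == DEC_DEVMEM && supported == COH_HOST)
		return -1;
	return 0;
}
```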
* [PATCH 6/6] cxl: Add HDM-DB region creation and sysfs interface
2026-03-15 20:27 [PATCH 0/6] cxl: Support Back-Invalidate Davidlohr Bueso
` (4 preceding siblings ...)
2026-03-15 20:27 ` [PATCH 5/6] cxl/hdm: Add BI coherency support for endpoint decoders Davidlohr Bueso
@ 2026-03-15 20:27 ` Davidlohr Bueso
2026-03-20 16:39 ` Jonathan Cameron
5 siblings, 1 reply; 21+ messages in thread
From: Davidlohr Bueso @ 2026-03-15 20:27 UTC (permalink / raw)
To: dave.jiang, dan.j.williams
Cc: jonathan.cameron, alison.schofield, ira.weiny, gourry,
dongjoo.seo1, anisa.su, linux-cxl, Davidlohr Bueso
A single Type 3 device can expose different parts of its memory with different
coherency semantics. For example, some memory ranges within a Type 3 device
could be configured as HDM-H, while other memory ranges on the same device
could be configured as HDM-DB. This allows for flexible memory configuration
within a single device. As such, coherency models are defined per memory region.
For accelerators (type2), it is expected for the respective drivers to manage
the HDM-D[B] region creation. For type3, relevant sysfs tunables are provided
to the user.
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
---
Documentation/ABI/testing/sysfs-bus-cxl | 40 +++++++++++++--
drivers/cxl/acpi.c | 2 +
drivers/cxl/core/core.h | 2 +
drivers/cxl/core/port.c | 63 +++++++++++++++++++++++-
drivers/cxl/core/region.c | 65 ++++++++++++++++++++++---
5 files changed, 161 insertions(+), 11 deletions(-)
diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
index c80a1b5a03db..2959cc532f7c 100644
--- a/Documentation/ABI/testing/sysfs-bus-cxl
+++ b/Documentation/ABI/testing/sysfs-bus-cxl
@@ -297,7 +297,7 @@ Description:
Each entry in the list is a dport id.
-What: /sys/bus/cxl/devices/decoderX.Y/cap_{pmem,ram,type2,type3}
+What: /sys/bus/cxl/devices/decoderX.Y/cap_{pmem,ram,type2,type3,bi}
Date: June, 2021
KernelVersion: v5.14
Contact: linux-cxl@vger.kernel.org
@@ -306,8 +306,9 @@ Description:
represents a fixed memory window identified by platform
firmware. A fixed window may only support a subset of memory
types. The 'cap_*' attributes indicate whether persistent
- memory, volatile memory, accelerator memory, and / or expander
- memory may be mapped behind this decoder's memory window.
+ memory, volatile memory, accelerator memory, expander memory,
+ and / or back-invalidate (HDM-DB) memory may be mapped behind
+ this decoder's memory window.
What: /sys/bus/cxl/devices/decoderX.Y/target_type
@@ -426,6 +427,39 @@ Description:
current cached value.
+What: /sys/bus/cxl/devices/decoderX.Y/create_{pmem,ram}_bi_region
+Date: March, 2026
+KernelVersion: v7.1
+Contact: linux-cxl@vger.kernel.org
+Description:
+ (RW) Same as create_{pmem,ram}_region but creates an HDM-DB
+ (Back-Invalidate) region where the device manages coherency
+ via BISnp/BIRsp messages. Only visible when the root decoder
+ has the cap_bi flag set. Requires 256B Flit mode and BI
+ capability throughout the CXL topology.
+
+
+What: /sys/bus/cxl/devices/decoderX.Y/bi
+Date: March, 2026
+KernelVersion: v7.1
+Contact: linux-cxl@vger.kernel.org
+Description:
+ (RW) For endpoint decoders: set the coherence model of this
+ decoder. Write '1' to set HDM-DB (device-managed coherency
+ with back-invalidate), '0' for HDM-H (host-only coherency).
+ Reads show '1' if the device has Back-Invalidate enabled,
+ '0' otherwise. Can only be written when the decoder is
+ disabled.
+
+What: /sys/bus/cxl/devices/regionZ/bi
+Date: March, 2026
+KernelVersion: v7.1
+Contact: linux-cxl@vger.kernel.org
+Description:
+ (RO) Shows '1' if the region uses HDM-DB coherency model,
+ '0' otherwise.
+
+
What: /sys/bus/cxl/devices/decoderX.Y/delete_region
Date: May, 2022
KernelVersion: v6.0
diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index c959fca69b3f..7a5cdf5081cd 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -145,6 +145,8 @@ static unsigned long cfmws_to_decoder_flags(int restrictions)
flags |= CXL_DECODER_F_PMEM;
if (restrictions & ACPI_CEDT_CFMWS_RESTRICT_FIXED)
flags |= CXL_DECODER_F_LOCK;
+ if (restrictions & ACPI_CEDT_CFMWS_RESTRICT_BI)
+ flags |= CXL_DECODER_F_BI;
return flags;
}
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index da4c094b6702..eceba6c71564 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -29,6 +29,8 @@ struct cxl_region_context {
extern struct device_attribute dev_attr_create_pmem_region;
extern struct device_attribute dev_attr_create_ram_region;
+extern struct device_attribute dev_attr_create_pmem_bi_region;
+extern struct device_attribute dev_attr_create_ram_bi_region;
extern struct device_attribute dev_attr_delete_region;
extern struct device_attribute dev_attr_region;
extern const struct device_type cxl_pmem_region_type;
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 5c82e6f32572..9bc2fb804aa0 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -131,6 +131,7 @@ CXL_DECODER_FLAG_ATTR(cap_ram, CXL_DECODER_F_RAM);
CXL_DECODER_FLAG_ATTR(cap_type2, CXL_DECODER_F_TYPE2);
CXL_DECODER_FLAG_ATTR(cap_type3, CXL_DECODER_F_TYPE3);
CXL_DECODER_FLAG_ATTR(locked, CXL_DECODER_F_LOCK);
+CXL_DECODER_FLAG_ATTR(cap_bi, CXL_DECODER_F_BI);
static ssize_t target_type_show(struct device *dev,
struct device_attribute *attr, char *buf)
@@ -233,6 +234,38 @@ static ssize_t mode_store(struct device *dev, struct device_attribute *attr,
}
static DEVICE_ATTR_RW(mode);
+static ssize_t bi_show(struct device *dev, struct device_attribute *attr,
+ char *buf)
+{
+ struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev);
+ struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
+ struct cxl_dev_state *cxlds = cxlmd->cxlds;
+
+ return sysfs_emit(buf, "%d\n", !!cxlds->bi);
+}
+
+static ssize_t bi_store(struct device *dev, struct device_attribute *attr,
+ const char *buf, size_t len)
+{
+ struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev);
+ enum cxl_decoder_type type;
+ ssize_t rc;
+
+ if (sysfs_streq(buf, "1"))
+ type = CXL_DECODER_DEVMEM;
+ else if (sysfs_streq(buf, "0"))
+ type = CXL_DECODER_HOSTONLYMEM;
+ else
+ return -EINVAL;
+
+ rc = cxl_dpa_set_coherence(cxled, type);
+ if (rc)
+ return rc;
+
+ return len;
+}
+static DEVICE_ATTR_RW(bi);
+
static ssize_t dpa_resource_show(struct device *dev, struct device_attribute *attr,
char *buf)
{
@@ -329,10 +362,13 @@ static struct attribute *cxl_decoder_root_attrs[] = {
&dev_attr_cap_ram.attr,
&dev_attr_cap_type2.attr,
&dev_attr_cap_type3.attr,
+ &dev_attr_cap_bi.attr,
&dev_attr_target_list.attr,
&dev_attr_qos_class.attr,
SET_CXL_REGION_ATTR(create_pmem_region)
SET_CXL_REGION_ATTR(create_ram_region)
+ SET_CXL_REGION_ATTR(create_pmem_bi_region)
+ SET_CXL_REGION_ATTR(create_ram_bi_region)
SET_CXL_REGION_ATTR(delete_region)
NULL,
};
@@ -351,6 +387,21 @@ static bool can_create_ram(struct cxl_root_decoder *cxlrd)
return (cxlrd->cxlsd.cxld.flags & flags) == flags;
}
+static bool can_create_bi(struct cxl_root_decoder *cxlrd)
+{
+ return cxlrd->cxlsd.cxld.flags & CXL_DECODER_F_BI;
+}
+
+static bool can_create_pmem_bi(struct cxl_root_decoder *cxlrd)
+{
+ return can_create_pmem(cxlrd) && can_create_bi(cxlrd);
+}
+
+static bool can_create_ram_bi(struct cxl_root_decoder *cxlrd)
+{
+ return can_create_ram(cxlrd) && can_create_bi(cxlrd);
+}
+
static umode_t cxl_root_decoder_visible(struct kobject *kobj, struct attribute *a, int n)
{
struct device *dev = kobj_to_dev(kobj);
@@ -362,8 +413,17 @@ static umode_t cxl_root_decoder_visible(struct kobject *kobj, struct attribute *
if (a == CXL_REGION_ATTR(create_ram_region) && !can_create_ram(cxlrd))
return 0;
+ if (a == CXL_REGION_ATTR(create_pmem_bi_region) &&
+ !can_create_pmem_bi(cxlrd))
+ return 0;
+
+ if (a == CXL_REGION_ATTR(create_ram_bi_region) &&
+ !can_create_ram_bi(cxlrd))
+ return 0;
+
if (a == CXL_REGION_ATTR(delete_region) &&
- !(can_create_pmem(cxlrd) || can_create_ram(cxlrd)))
+ !(can_create_pmem(cxlrd) || can_create_ram(cxlrd) ||
+ can_create_pmem_bi(cxlrd) || can_create_ram_bi(cxlrd)))
return 0;
return a->mode;
@@ -402,6 +462,7 @@ static const struct attribute_group *cxl_decoder_switch_attribute_groups[] = {
static struct attribute *cxl_decoder_endpoint_attrs[] = {
&dev_attr_target_type.attr,
&dev_attr_mode.attr,
+ &dev_attr_bi.attr,
&dev_attr_dpa_size.attr,
&dev_attr_dpa_resource.attr,
SET_CXL_REGION_ATTR(region)
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index bd4c4a4a27da..4bc23ac3b5ed 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -626,6 +626,15 @@ static ssize_t mode_show(struct device *dev, struct device_attribute *attr,
}
static DEVICE_ATTR_RO(mode);
+static ssize_t bi_show(struct device *dev, struct device_attribute *attr,
+ char *buf)
+{
+ struct cxl_region *cxlr = to_cxl_region(dev);
+
+ return sysfs_emit(buf, "%d\n", cxlr->type == CXL_DECODER_DEVMEM);
+}
+static DEVICE_ATTR_RO(bi);
+
static int alloc_hpa(struct cxl_region *cxlr, resource_size_t size)
{
struct cxl_root_decoder *cxlrd = cxlr->cxlrd;
@@ -775,6 +784,7 @@ static struct attribute *cxl_region_attrs[] = {
&dev_attr_resource.attr,
&dev_attr_size.attr,
&dev_attr_mode.attr,
+ &dev_attr_bi.attr,
&dev_attr_extended_linear_cache_size.attr,
NULL,
};
@@ -2041,6 +2051,13 @@ static int cxl_region_attach(struct cxl_region *cxlr,
return -ENXIO;
}
+ if (cxlr->type == CXL_DECODER_DEVMEM &&
+ cxlds->type == CXL_DEVTYPE_CLASSMEM && !cxlds->bi) {
+ dev_dbg(&cxlr->dev, "%s:%s BI not enabled on device\n",
+ dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev));
+ return -ENXIO;
+ }
+
if (!cxled->dpa_res) {
dev_dbg(&cxlr->dev, "%s:%s: missing DPA allocation.\n",
dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev));
@@ -2650,7 +2667,8 @@ static ssize_t create_ram_region_show(struct device *dev,
}
static struct cxl_region *__create_region(struct cxl_root_decoder *cxlrd,
- enum cxl_partition_mode mode, int id)
+ enum cxl_partition_mode mode,
+ bool bi, int id)
{
int rc;
@@ -2672,11 +2690,14 @@ static struct cxl_region *__create_region(struct cxl_root_decoder *cxlrd,
return ERR_PTR(-EBUSY);
}
- return devm_cxl_add_region(cxlrd, id, mode, CXL_DECODER_HOSTONLYMEM);
+ return devm_cxl_add_region(cxlrd, id, mode,
+ bi ? CXL_DECODER_DEVMEM
+ : CXL_DECODER_HOSTONLYMEM);
}
static ssize_t create_region_store(struct device *dev, const char *buf,
- size_t len, enum cxl_partition_mode mode)
+ size_t len, enum cxl_partition_mode mode,
+ bool bi)
{
struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
struct cxl_region *cxlr;
@@ -2686,7 +2707,7 @@ static ssize_t create_region_store(struct device *dev, const char *buf,
if (rc != 1)
return -EINVAL;
- cxlr = __create_region(cxlrd, mode, id);
+ cxlr = __create_region(cxlrd, mode, bi, id);
if (IS_ERR(cxlr))
return PTR_ERR(cxlr);
@@ -2697,7 +2718,7 @@ static ssize_t create_pmem_region_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t len)
{
- return create_region_store(dev, buf, len, CXL_PARTMODE_PMEM);
+ return create_region_store(dev, buf, len, CXL_PARTMODE_PMEM, false);
}
DEVICE_ATTR_RW(create_pmem_region);
@@ -2705,10 +2726,40 @@ static ssize_t create_ram_region_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t len)
{
- return create_region_store(dev, buf, len, CXL_PARTMODE_RAM);
+ return create_region_store(dev, buf, len, CXL_PARTMODE_RAM, false);
}
DEVICE_ATTR_RW(create_ram_region);
+static ssize_t create_pmem_bi_region_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ return __create_region_show(to_cxl_root_decoder(dev), buf);
+}
+
+static ssize_t create_pmem_bi_region_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t len)
+{
+ return create_region_store(dev, buf, len, CXL_PARTMODE_PMEM, true);
+}
+DEVICE_ATTR_RW(create_pmem_bi_region);
+
+static ssize_t create_ram_bi_region_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ return __create_region_show(to_cxl_root_decoder(dev), buf);
+}
+
+static ssize_t create_ram_bi_region_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t len)
+{
+ return create_region_store(dev, buf, len, CXL_PARTMODE_RAM, true);
+}
+DEVICE_ATTR_RW(create_ram_bi_region);
+
static ssize_t region_show(struct device *dev, struct device_attribute *attr,
char *buf)
{
@@ -3883,7 +3934,7 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
struct cxl_region *cxlr;
do {
- cxlr = __create_region(cxlrd, cxlds->part[part].mode,
+ cxlr = __create_region(cxlrd, cxlds->part[part].mode, false,
atomic_read(&cxlrd->region_id));
} while (IS_ERR(cxlr) && PTR_ERR(cxlr) == -EBUSY);
--
2.39.5
^ permalink raw reply related [flat|nested] 21+ messages in thread
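Two decisions thread through the region patch above: the new
create_{pmem,ram}_bi_region attributes request a DEVMEM (HDM-DB) region where
the existing attributes keep HOSTONLY (HDM-H), and cxl_region_attach() refuses
to add a class-memory endpoint to an HDM-DB region until BI is enabled on the
device. A minimal model of both, with illustrative names (not the kernel's):

```c
#include <assert.h>
#include <stdbool.h>

enum dec_type { DEC_HOSTONLY, DEC_DEVMEM };

/* models the bi flag threaded into __create_region() */
static enum dec_type region_target_type(bool bi)
{
	return bi ? DEC_DEVMEM : DEC_HOSTONLY;
}

/* models the cxl_region_attach() check: a CLASSMEM endpoint may only
 * join an HDM-DB region once cxlds->bi has been set by cxl_bi_setup() */
static bool attach_allowed(enum dec_type region_type, bool is_classmem,
			   bool dev_bi)
{
	if (region_type == DEC_DEVMEM && is_classmem && !dev_bi)
		return false;
	return true;
}
```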
* Re: [PATCH 1/6] cxl: Add Back-Invalidate register definitions and structures
2026-03-15 20:27 ` [PATCH 1/6] cxl: Add Back-Invalidate register definitions and structures Davidlohr Bueso
@ 2026-03-19 16:59 ` Jonathan Cameron
2026-03-20 14:57 ` Jonathan Cameron
2026-03-23 22:11 ` Dave Jiang
2 siblings, 0 replies; 21+ messages in thread
From: Jonathan Cameron @ 2026-03-19 16:59 UTC (permalink / raw)
To: Davidlohr Bueso
Cc: dave.jiang, dan.j.williams, alison.schofield, ira.weiny, gourry,
dongjoo.seo1, anisa.su, linux-cxl
On Sun, 15 Mar 2026 13:27:36 -0700
Davidlohr Bueso <dave@stgolabs.net> wrote:
> Add CXL Back-Invalidate (BI) capability IDs, register definitions for
> the BI Route Table and BI Decoder capability structures, and associated
> fields. This includes HDM decoder coherency capability and control fields
> needed to support HDM-DB (device-managed coherency with back-invalidate).
>
> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
> ---
> drivers/cxl/cxl.h | 41 +++++++++++++++++++++++++++++++++++++++++
> drivers/cxl/cxlmem.h | 2 ++
> include/cxl/cxl.h | 5 +++++
> 3 files changed, 48 insertions(+)
>
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 031846eab02c..efe06d60b364 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
>
> /* HDM decoders CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure */
> #define CXL_HDM_DECODER_CAP_OFFSET 0x0
> @@ -51,6 +53,10 @@ extern const struct nvdimm_security_ops *cxl_security_ops;
> #define CXL_HDM_DECODER_INTERLEAVE_14_12 BIT(9)
> #define CXL_HDM_DECODER_INTERLEAVE_3_6_12_WAY BIT(11)
> #define CXL_HDM_DECODER_INTERLEAVE_16_WAY BIT(12)
> +#define CXL_HDM_DECODER_SUPPORTED_COHERENCY_MASK GENMASK(22, 21)
Given this definitely wasn't in CXL 2.0 probably want to update the spec
references. I think that would be neater than adding specific notes
to say where these are in CXL r4.0
I'd define 0 as well. My favorite case of papering over a spec hole.
The 'unknown' value ;) To actually make use of this on devices
that have that value we'd need a list of IDs :(
> +#define CXL_HDM_DECODER_COHERENCY_DEV 0x1
> +#define CXL_HDM_DECODER_COHERENCY_HOST 0x2
> +#define CXL_HDM_DECODER_COHERENCY_BOTH 0x3
> #define CXL_HDM_DECODER_CTRL_OFFSET 0x4
> #define CXL_HDM_DECODER_ENABLE BIT(1)
> #define CXL_HDM_DECODER0_BASE_LOW_OFFSET(i) (0x20 * (i) + 0x10)
> @@ -65,6 +71,7 @@ extern const struct nvdimm_security_ops *cxl_security_ops;
> #define CXL_HDM_DECODER0_CTRL_COMMITTED BIT(10)
> #define CXL_HDM_DECODER0_CTRL_COMMIT_ERROR BIT(11)
> #define CXL_HDM_DECODER0_CTRL_HOSTONLY BIT(12)
> +#define CXL_HDM_DECODER0_CTRL_BI BIT(13)
> #define CXL_HDM_DECODER0_TL_LOW(i) (0x20 * (i) + 0x24)
> #define CXL_HDM_DECODER0_TL_HIGH(i) (0x20 * (i) + 0x28)
> #define CXL_HDM_DECODER0_SKIP_LOW(i) CXL_HDM_DECODER0_TL_LOW(i)
> @@ -152,6 +159,33 @@ static inline int ways_to_eiw(unsigned int ways, u8 *eiw)
> #define CXL_HEADERLOG_SIZE SZ_512
> #define CXL_HEADERLOG_SIZE_U32 SZ_512 / sizeof(u32)
>
> +/* CXL 3.2 8.2.4.26 CXL BI Route Table Capability Structure */
I think we decided a while back that all new comments should use
latest public (via click through) available spec which is 4.0 now.
That being driven by the fact that getting hold of old spec versions
is not as trivial as it should be.
> +#define CXL_BI_RT_CAPABILITY_LENGTH 0xC
> +
> /* CXL 2.0 8.2.8.1 Device Capabilities Array Register */
> #define CXLDEV_CAP_ARRAY_OFFSET 0x0
> #define CXLDEV_CAP_ARRAY_CAP_ID 0
> @@ -241,6 +275,7 @@ int cxl_dport_map_rcd_linkcap(struct pci_dev *pdev, struct cxl_dport *dport);
> #define CXL_DECODER_F_LOCK BIT(4)
> #define CXL_DECODER_F_ENABLE BIT(5)
> #define CXL_DECODER_F_NORMALIZED_ADDRESSING BIT(6)
> +#define CXL_DECODER_F_BI BIT(7)
Hmm. The comment above these looks to have grown stale.
/*
* cxl_decoder flags that define the type of memory / devices this
* decoder supports as well as configuration lock status See "CXL 2.0
* 8.2.5.12.7 CXL HDM Decoder 0 Control Register" for details.
* Additionally indicate whether decoder settings were autodetected,
* user customized.
*/
That definitely doesn't cover what is here. Anyhow, not
a problem for this patch.
> diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
> index fa7269154620..74be940364e1 100644
> --- a/include/cxl/cxl.h
> +++ b/include/cxl/cxl.h
> @@ -34,10 +34,12 @@ struct cxl_regs {
> * Common set of CXL Component register block base pointers
> * @hdm_decoder: CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure
> * @ras: CXL 2.0 8.2.5.9 CXL RAS Capability Structure
> + * @bi: CXL 3.2 8.2.4.26/27 CXL BI Capability Structure
Not sure what the general view is, but I think it's odd to have different
spec versions used in a single block of comments. Maybe drag them all into
the modern CXL r4.0 world.
> */
> struct_group_tagged(cxl_component_regs, component,
> void __iomem *hdm_decoder;
> void __iomem *ras;
> + void __iomem *bi;
> );
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH 1/6] cxl: Add Back-Invalidate register definitions and structures
2026-03-15 20:27 ` [PATCH 1/6] cxl: Add Back-Invalidate register definitions and structures Davidlohr Bueso
2026-03-19 16:59 ` Jonathan Cameron
@ 2026-03-20 14:57 ` Jonathan Cameron
2026-03-23 22:11 ` Dave Jiang
2 siblings, 0 replies; 21+ messages in thread
From: Jonathan Cameron @ 2026-03-20 14:57 UTC (permalink / raw)
To: Davidlohr Bueso
Cc: dave.jiang, dan.j.williams, alison.schofield, ira.weiny, gourry,
dongjoo.seo1, anisa.su, linux-cxl
On Sun, 15 Mar 2026 13:27:36 -0700
Davidlohr Bueso <dave@stgolabs.net> wrote:
> Add CXL Back-Invalidate (BI) capability IDs, register definitions for
> the BI Route Table and BI Decoder capability structures, and associated
> fields. This includes HDM decoder coherency capability and control fields
> needed to support HDM-DB (device-managed coherency with back-invalidate).
>
> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
> ---
> diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
> index fa7269154620..74be940364e1 100644
> --- a/include/cxl/cxl.h
> +++ b/include/cxl/cxl.h
> @@ -34,10 +34,12 @@ struct cxl_regs {
> * Common set of CXL Component register block base pointers
> * @hdm_decoder: CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure
> * @ras: CXL 2.0 8.2.5.9 CXL RAS Capability Structure
> + * @bi: CXL 3.2 8.2.4.26/27 CXL BI Capability Structure
Whilst it is indeed one or the other depending on whether we are looking
at upstream or downstream ports, smashing them into a single entry here
seems liable to generate long-term confusion.
Other than a bit of storage, and given these are optional anyway, I'm
guessing it doesn't cost us much to have separate bi_rt and bi_decoder
entries?
I think we'd just hit the
if (!mi->rmap->valid) check and continue in cxl_map_component_regs().
I also don't really think it's our problem to check for hardware that
surfaces these caps on the wrong type of device. We'll map it but not
use it.
> */
> struct_group_tagged(cxl_component_regs, component,
> void __iomem *hdm_decoder;
> void __iomem *ras;
> + void __iomem *bi;
> );
> /*
> * Common set of CXL Device register block base pointers
> @@ -80,6 +82,7 @@ struct cxl_reg_map {
> struct cxl_component_reg_map {
> struct cxl_reg_map hdm_decoder;
> struct cxl_reg_map ras;
> + struct cxl_reg_map bi;
> };
>
> struct cxl_device_reg_map {
> @@ -162,6 +165,7 @@ struct cxl_dpa_partition {
> * @regs: Parsed register blocks
> * @cxl_dvsec: Offset to the PCIe device DVSEC
> * @rcd: operating in RCD mode (CXL 3.0 9.11.8 CXL Devices Attached to an RCH)
> + * @bi: device is BI (Back-Invalidate) enabled
> * @media_ready: Indicate whether the device media is usable
> * @dpa_res: Overall DPA resource tree for the device
> * @part: DPA partition array
> @@ -181,6 +185,7 @@ struct cxl_dev_state {
> struct cxl_device_regs regs;
> int cxl_dvsec;
> bool rcd;
> + bool bi;
> bool media_ready;
> struct resource dpa_res;
> struct cxl_dpa_partition part[CXL_NR_PARTITIONS_MAX];
* Re: [PATCH 2/6] cxl: Add BI register probing and port initialization
2026-03-15 20:27 ` [PATCH 2/6] cxl: Add BI register probing and port initialization Davidlohr Bueso
@ 2026-03-20 15:46 ` Jonathan Cameron
2026-03-20 16:19 ` Cheatham, Benjamin
2026-03-23 23:10 ` Dave Jiang
2 siblings, 0 replies; 21+ messages in thread
From: Jonathan Cameron @ 2026-03-20 15:46 UTC (permalink / raw)
To: Davidlohr Bueso
Cc: dave.jiang, dan.j.williams, alison.schofield, ira.weiny, gourry,
dongjoo.seo1, anisa.su, linux-cxl
On Sun, 15 Mar 2026 13:27:37 -0700
Davidlohr Bueso <dave@stgolabs.net> wrote:
> Add register probing for BI Route Table and BI Decoder capability
> structures in cxl_probe_component_regs(), and initialize BI registers
> during port probe for both switch ports and endpoint ports.
>
> For switch ports, map BI Decoder registers on downstream ports and
> BI Route Table registers on upstream ports. For endpoint ports, map
> the BI Decoder registers directly into the port's register block.
>
> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Various minor things in here. Biggest one is whether we can use port->regs
for the switch upstream port registers.
> ---
> drivers/cxl/core/regs.c | 13 ++++++
> drivers/cxl/port.c | 88 +++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 101 insertions(+)
>
> diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c
> index 93710cf4f0a6..82e6018fd4cf 100644
> --- a/drivers/cxl/core/regs.c
> +++ b/drivers/cxl/core/regs.c
> @@ -92,6 +92,18 @@ void cxl_probe_component_regs(struct device *dev, void __iomem *base,
> length = CXL_RAS_CAPABILITY_LENGTH;
> rmap = &map->ras;
> break;
> + case CXL_CM_CAP_CAP_ID_BI_RT:
> + dev_dbg(dev, "found BI RT capability (0x%x)\n",
> + offset);
> + length = CXL_BI_RT_CAPABILITY_LENGTH;
> + rmap = &map->bi;
> + break;
> + case CXL_CM_CAP_CAP_ID_BI_DECODER:
> + dev_dbg(dev, "found BI Decoder capability (0x%x)\n",
> + offset);
> + length = CXL_BI_DECODER_CAPABILITY_LENGTH;
> + rmap = &map->bi;
This was what triggered my reply on whether we should separate the two bi
capabilities.
> + break;
> default:
> dev_dbg(dev, "Unknown CM cap ID: %d (0x%x)\n", cap_id,
> offset);
> @@ -211,6 +223,7 @@ int cxl_map_component_regs(const struct cxl_register_map *map,
> } mapinfo[] = {
> { &map->component_map.hdm_decoder, ®s->hdm_decoder },
> { &map->component_map.ras, ®s->ras },
> + { &map->component_map.bi, ®s->bi },
> };
> int i;
>
> diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
> index ada51948d52f..0540f0681ffb 100644
> --- a/drivers/cxl/port.c
> +++ b/drivers/cxl/port.c
> @@ -58,6 +58,90 @@ static int discover_region(struct device *dev, void *unused)
> return 0;
> }
>
> +static int cxl_dport_init_bi(struct cxl_dport *dport)
> +{
> + struct cxl_register_map *map = &dport->reg_map;
> + struct device *dev = dport->dport_dev;
> +
> + if (dport->regs.bi)
As below. Maybe a comment on why we might hit this twice.
> + return 0;
> +
> + if (!cxl_pci_flit_256(to_pci_dev(dev)))
> + return 0;
> +
> + if (!map->component_map.bi.valid) {
> + dev_dbg(dev, "BI decoder registers not found\n");
> + return 0;
> + }
> +
> + if (cxl_map_component_regs(map, &dport->regs.component,
> + BIT(CXL_CM_CAP_CAP_ID_BI_DECODER))) {
> + dev_dbg(dev, "Failed to map BI decoder capability.\n");
> + return -EINVAL;
> + }
> +
> + return 0;
> +}
> +
> +static void cxl_uport_init_bi(struct cxl_port *port, struct device *host)
> +{
> + struct cxl_register_map *map = &port->reg_map;
> +
> + if (port->uport_regs.bi)
Maybe a comment on why we'd hit this twice, and on the locking
requirements that ensure we don't race between this check and the
initialization.
In general, uport_regs feels like too general a name to me given we
already have uport-related registers mapped in various different places,
i.e. in cxl_hdm and in devm_cxl_port_ras_setup(), which puts them in
port->regs. Can we not put this one in there as well, alongside the RAS
register map? Assuming I read this right, I'd be keen on a comment
update to make it clear that one is all about upstream port component
regs.
> + return;
> +
> + if (!map->component_map.bi.valid) {
> + dev_dbg(host, "BI RT registers not found\n");
> + return;
> + }
> +
> + map->host = host;
> + if (cxl_map_component_regs(map, &port->uport_regs,
> + BIT(CXL_CM_CAP_CAP_ID_BI_RT)))
> + dev_dbg(&port->dev, "Failed to map BI RT capability\n");
> +}
> +
> +static void cxl_endpoint_init_bi(struct cxl_port *port)
> +{
> + struct cxl_register_map *map = &port->reg_map;
> +
> + cxl_dport_init_bi(port->parent_dport);
> +
> + if (!map->component_map.bi.valid)
> + return;
> +
> + if (cxl_map_component_regs(map, &port->regs,
> + BIT(CXL_CM_CAP_CAP_ID_BI_DECODER)))
> + dev_dbg(&port->dev, "Failed to map BI decoder capability\n");
> +}
> +
> +static void cxl_switch_port_init_bi(struct cxl_port *port)
> +{
> + struct cxl_dport *parent_dport = port->parent_dport;
> +
> + if (is_cxl_root(to_cxl_port(port->dev.parent)))
Rings a bell as something I moaned about before in a different series,
as a pattern that is repeated too much. I'm still not hugely keen on
this not being named for what it is really checking about this device.
I believe this is excluding the
devices/platform/ACPI0017:00/root0/portX devices, which are the host
bridges:
	if (parent_port_is_cxl_root(port))
		return;
> + return;
> +
> + if (dev_is_pci(port->uport_dev) &&
I kind of wish this were also named to indicate (I think) that it's
excluding CXL test devices.
> + !cxl_pci_flit_256(to_pci_dev(port->uport_dev)))
> + return;
> +
> + if (parent_dport && dev_is_pci(parent_dport->dport_dev)) {
> + struct pci_dev *pdev = to_pci_dev(parent_dport->dport_dev);
> +
> + switch (pci_pcie_type(pdev)) {
> + case PCI_EXP_TYPE_ROOT_PORT:
> + case PCI_EXP_TYPE_DOWNSTREAM:
> + cxl_dport_init_bi(parent_dport);
> + break;
> + default:
> + break;
> + }
> + }
> +
> + cxl_uport_init_bi(port, &port->dev);
> +}
> +
> static int cxl_switch_port_probe(struct cxl_port *port)
> {
> /* Reset nr_dports for rebind of driver */
> @@ -66,6 +150,8 @@ static int cxl_switch_port_probe(struct cxl_port *port)
> /* Cache the data early to ensure is_visible() works */
> read_cdat_data(port);
>
> + cxl_switch_port_init_bi(port);
> +
> return 0;
> }
>
> @@ -128,6 +214,8 @@ static int cxl_endpoint_port_probe(struct cxl_port *port)
> read_cdat_data(port);
> cxl_endpoint_parse_cdat(port);
>
> + cxl_endpoint_init_bi(port);
> +
> get_device(&cxlmd->dev);
> rc = devm_add_action_or_reset(&port->dev, schedule_detach, cxlmd);
> if (rc)
* Re: [PATCH 2/6] cxl: Add BI register probing and port initialization
2026-03-15 20:27 ` [PATCH 2/6] cxl: Add BI register probing and port initialization Davidlohr Bueso
2026-03-20 15:46 ` Jonathan Cameron
@ 2026-03-20 16:19 ` Cheatham, Benjamin
2026-03-23 23:10 ` Dave Jiang
2 siblings, 0 replies; 21+ messages in thread
From: Cheatham, Benjamin @ 2026-03-20 16:19 UTC (permalink / raw)
To: Davidlohr Bueso, dave.jiang, dan.j.williams
Cc: jonathan.cameron, alison.schofield, ira.weiny, gourry,
dongjoo.seo1, anisa.su, linux-cxl
On 3/15/2026 3:27 PM, Davidlohr Bueso wrote:
> Add register probing for BI Route Table and BI Decoder capability
> structures in cxl_probe_component_regs(), and initialize BI registers
> during port probe for both switch ports and endpoint ports.
>
> For switch ports, map BI Decoder registers on downstream ports and
> BI Route Table registers on upstream ports. For endpoint ports, map
> the BI Decoder registers directly into the port's register block.
>
> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
> ---
> drivers/cxl/core/regs.c | 13 ++++++
> drivers/cxl/port.c | 88 +++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 101 insertions(+)
>
> diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c
> index 93710cf4f0a6..82e6018fd4cf 100644
> --- a/drivers/cxl/core/regs.c
> +++ b/drivers/cxl/core/regs.c
> @@ -92,6 +92,18 @@ void cxl_probe_component_regs(struct device *dev, void __iomem *base,
> length = CXL_RAS_CAPABILITY_LENGTH;
> rmap = &map->ras;
> break;
> + case CXL_CM_CAP_CAP_ID_BI_RT:
> + dev_dbg(dev, "found BI RT capability (0x%x)\n",
> + offset);
> + length = CXL_BI_RT_CAPABILITY_LENGTH;
> + rmap = &map->bi;
> + break;
> + case CXL_CM_CAP_CAP_ID_BI_DECODER:
> + dev_dbg(dev, "found BI Decoder capability (0x%x)\n",
> + offset);
> + length = CXL_BI_DECODER_CAPABILITY_LENGTH;
> + rmap = &map->bi;
> + break;
> default:
> dev_dbg(dev, "Unknown CM cap ID: %d (0x%x)\n", cap_id,
> offset);
> @@ -211,6 +223,7 @@ int cxl_map_component_regs(const struct cxl_register_map *map,
> } mapinfo[] = {
> { &map->component_map.hdm_decoder, ®s->hdm_decoder },
> { &map->component_map.ras, ®s->ras },
> + { &map->component_map.bi, ®s->bi },
> };
> int i;
>
> diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
> index ada51948d52f..0540f0681ffb 100644
> --- a/drivers/cxl/port.c
> +++ b/drivers/cxl/port.c
> @@ -58,6 +58,90 @@ static int discover_region(struct device *dev, void *unused)
> return 0;
> }
>
> +static int cxl_dport_init_bi(struct cxl_dport *dport)
> +{
> + struct cxl_register_map *map = &dport->reg_map;
> + struct device *dev = dport->dport_dev;
> +
> + if (dport->regs.bi)
> + return 0;
> +
> + if (!cxl_pci_flit_256(to_pci_dev(dev)))
> + return 0;
> +
> + if (!map->component_map.bi.valid) {
> + dev_dbg(dev, "BI decoder registers not found\n");
> + return 0;
> + }
> +
> + if (cxl_map_component_regs(map, &dport->regs.component,
> + BIT(CXL_CM_CAP_CAP_ID_BI_DECODER))) {
> + dev_dbg(dev, "Failed to map BI decoder capability.\n");
> + return -EINVAL;
> + }
> +
> + return 0;
> +}
> +
> +static void cxl_uport_init_bi(struct cxl_port *port, struct device *host)
> +{
> + struct cxl_register_map *map = &port->reg_map;
> +
> + if (port->uport_regs.bi)
> + return;
> +
> + if (!map->component_map.bi.valid) {
> + dev_dbg(host, "BI RT registers not found\n");
> + return;
> + }
> +
> + map->host = host;
> + if (cxl_map_component_regs(map, &port->uport_regs,
> + BIT(CXL_CM_CAP_CAP_ID_BI_RT)))
> + dev_dbg(&port->dev, "Failed to map BI RT capability\n");
Same question as Jonathan here.
If you can use port->regs then you can combine the endpoint and switch port
init BI functions below and rework this accordingly.
> +}
> +
> +static void cxl_endpoint_init_bi(struct cxl_port *port)
> +{
> + struct cxl_register_map *map = &port->reg_map;
> +
> + cxl_dport_init_bi(port->parent_dport);
> +
> + if (!map->component_map.bi.valid)
> + return;
> +
> + if (cxl_map_component_regs(map, &port->regs,
> + BIT(CXL_CM_CAP_CAP_ID_BI_DECODER)))
> + dev_dbg(&port->dev, "Failed to map BI decoder capability\n");
> +}
> +
> +static void cxl_switch_port_init_bi(struct cxl_port *port)
> +{
> + struct cxl_dport *parent_dport = port->parent_dport;
> +
> + if (is_cxl_root(to_cxl_port(port->dev.parent)))
This will panic if to_cxl_port() returns NULL. I'd split it out and
check that to_cxl_port() != NULL first.
parent_port_is_cxl_root() has the same issue, so this still applies
there.
> + return;
> +
> + if (dev_is_pci(port->uport_dev) &&
> + !cxl_pci_flit_256(to_pci_dev(port->uport_dev)))
If you can use port->regs above, this check would become:
	if (!is_cxl_endpoint(port) && dev_is_pci(port->uport_dev) &&
	    !cxl_pci_flit_256(to_pci_dev(port->uport_dev)))
> + return;
> +
> + if (parent_dport && dev_is_pci(parent_dport->dport_dev)) {
> + struct pci_dev *pdev = to_pci_dev(parent_dport->dport_dev);
> +
> + switch (pci_pcie_type(pdev)) {
> + case PCI_EXP_TYPE_ROOT_PORT:
> + case PCI_EXP_TYPE_DOWNSTREAM:
> + cxl_dport_init_bi(parent_dport);
> + break;
> + default:
> + break;
> + }
> + }
> +
> + cxl_uport_init_bi(port, &port->dev);
> +}
> +
> static int cxl_switch_port_probe(struct cxl_port *port)
> {
> /* Reset nr_dports for rebind of driver */
> @@ -66,6 +150,8 @@ static int cxl_switch_port_probe(struct cxl_port *port)
> /* Cache the data early to ensure is_visible() works */
> read_cdat_data(port);
>
> + cxl_switch_port_init_bi(port);
> +
> return 0;
> }
>
> @@ -128,6 +214,8 @@ static int cxl_endpoint_port_probe(struct cxl_port *port)
> read_cdat_data(port);
> cxl_endpoint_parse_cdat(port);
>
> + cxl_endpoint_init_bi(port);
> +
> get_device(&cxlmd->dev);
> rc = devm_add_action_or_reset(&port->dev, schedule_detach, cxlmd);
> if (rc)
* Re: [PATCH 3/6] cxl/pci: Add Back-Invalidate topology enable/disable
2026-03-15 20:27 ` [PATCH 3/6] cxl/pci: Add Back-Invalidate topology enable/disable Davidlohr Bueso
@ 2026-03-20 16:20 ` Cheatham, Benjamin
2026-03-20 20:52 ` Alison Schofield
2026-03-20 16:27 ` [PATCH 3/6] cxl/pci: Add Back-Invalidate topology enable/disabl Jonathan Cameron
2026-03-24 0:21 ` [PATCH 3/6] cxl/pci: Add Back-Invalidate topology enable/disable Dave Jiang
2 siblings, 1 reply; 21+ messages in thread
From: Cheatham, Benjamin @ 2026-03-20 16:20 UTC (permalink / raw)
To: Davidlohr Bueso, dave.jiang, dan.j.williams
Cc: jonathan.cameron, alison.schofield, ira.weiny, gourry,
dongjoo.seo1, anisa.su, linux-cxl
On 3/15/2026 3:27 PM, Davidlohr Bueso wrote:
> Implement cxl_bi_setup() and cxl_bi_dealloc() which walk the CXL port
> topology to enable/disable BI flows on all components in the path.
>
> Upon a successful setup, this enablement does not influence the current
> HDM decoder setup by enabling the BI bit, and therefore the device is left
> in a BI capable state, but not making use of it in the decode coherence.
> Upon a BI-ID removal event, it is expected for the device to be offline.
>
> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
> ---
> drivers/cxl/core/pci.c | 339 +++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 339 insertions(+)
>
> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index d1f487b3d809..5f0226397dfa 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c
> @@ -926,3 +926,342 @@ int cxl_port_get_possible_dports(struct cxl_port *port)
>
> return ctx.count;
> }
> +
> +static bool cxl_is_bi_capable(struct pci_dev *pdev, void __iomem *bi)
> +{
> + if (!cxl_pci_flit_256(pdev))
> + return false;
I think you can drop this; I don't see a case where it wasn't already
checked in the previous patch.
> +
> + if (pci_pcie_type(pdev) != PCI_EXP_TYPE_UPSTREAM && !bi) {
> + dev_dbg(&pdev->dev, "No BI Decoder registers.\n");
> + return false;
> + }
> +
> + return true;
> +}
> +
> +/* limit any insane timeouts from hw */
> +#define CXL_BI_COMMIT_MAXTMO_US (5 * USEC_PER_SEC)
> +
> +static unsigned long __cxl_bi_get_timeout_us(struct device *dev,
> + int scale, int base)
> +{
> + static const unsigned long scale_tbl[] = {
> + 1, 10, 100, 1000, 10000, 100000, 1000000, 10000000,
> + };
> +
> + if (scale >= ARRAY_SIZE(scale_tbl)) {
> + dev_dbg(dev, "Invalid BI commit timeout scale: %d\n", scale);
> + return CXL_BI_COMMIT_MAXTMO_US;
> + }
> +
> + return scale_tbl[scale] * base;
scale needs to be an unsigned int for this comparison against
ARRAY_SIZE() to be clean.
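FWIW, a quick user-space sketch of the point. timeout_us() is a made-up
stand-in for __cxl_bi_get_timeout_us(), not driver code; the 3-bit
TM_SCALE field can only yield 0..7, so every in-range value indexes the
table, and the bound check catches out-of-range values cleanly once the
parameter is unsigned:

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

static const unsigned long scale_tbl[] = {
	1, 10, 100, 1000, 10000, 100000, 1000000, 10000000,
};

/* hypothetical stand-in for __cxl_bi_get_timeout_us(): with a signed
 * scale, a negative value would only be rejected by accident of the
 * int -> size_t promotion in the comparison below */
static unsigned long timeout_us(unsigned int scale, unsigned int base)
{
	if (scale >= ARRAY_SIZE(scale_tbl))
		return 0;	/* caller clamps to the max timeout */
	return scale_tbl[scale] * base;
}
```

The unsigned declaration makes the intent explicit and avoids a
-Wsign-compare warning against the size_t from ARRAY_SIZE().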
> +}
> +
> +#define ___cxl_bi_commit(dev, bi, ctype) \
> +do { \
> + u32 status, ctrl; \
> + int scale, base; \
> + ktime_t tmo, now, start; \
> + unsigned long poll_us, tmo_us; \
> + \
> + ctrl = readl(bi + CXL_BI_##ctype##_CTRL_OFFSET); \
> + writel(ctrl & ~CXL_BI_##ctype##_CTRL_BI_COMMIT, \
> + (bi) + CXL_BI_##ctype##_CTRL_OFFSET); \
> + writel(ctrl | CXL_BI_##ctype##_CTRL_BI_COMMIT, \
> + (bi) + CXL_BI_##ctype##_CTRL_OFFSET); \
> + \
> + status = readl((bi) + CXL_BI_##ctype##_STATUS_OFFSET); \
> + scale = FIELD_GET(CXL_BI_##ctype##_STATUS_BI_COMMIT_TM_SCALE, \
> + status); \
> + base = FIELD_GET(CXL_BI_##ctype##_STATUS_BI_COMMIT_TM_BASE, \
> + status); \
> + \
> + /* ... and poll */ \
> + tmo_us = min_t(unsigned long, CXL_BI_COMMIT_MAXTMO_US, \
> + __cxl_bi_get_timeout_us((dev), scale, base)); \
> + poll_us = tmo_us / 10; /* arbitrary 10% of timeout */ \
> + start = now = ktime_get(); \
> + tmo = ktime_add_us(now, tmo_us); \
> + while (!FIELD_GET(CXL_BI_##ctype##_STATUS_BI_COMMITTED, status) && \
> + !FIELD_GET(CXL_BI_##ctype##_STATUS_BI_ERR_NOT_COMMITTED, \
> + status)) { \
> + if (ktime_after(now, tmo)) { \
> + dev_dbg((dev), "BI-ID commit timed out (%luus)\n", \
> + tmo_us); \
> + return -ETIMEDOUT; \
> + } \
> + \
> + fsleep(poll_us); \
> + now = ktime_get(); \
> + status = readl((bi) + CXL_BI_##ctype##_STATUS_OFFSET); \
> + } \
> + \
> + if (FIELD_GET(CXL_BI_##ctype##_STATUS_BI_ERR_NOT_COMMITTED, status)) \
> + return -EIO; \
> + \
> + dev_dbg((dev), "BI-ID commit wait took %lluus\n", \
> + ktime_to_us(ktime_sub(now, start))); \
> +} while (0)
I'm split on whether it's better to use a macro here or just duplicate the
code between __cxl_bi_commit() and __cxl_bi_commit_rt(). On one hand it saves
space, but I hate looking at long macros...
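A third option that keeps one copy of the logic without the ctype##
token pasting: pass the register offsets and bit masks in a small
struct. Rough user-space sketch with a fake register file standing in
for the __iomem mapping (all names made up, and the timeout polling
elided since the fake hardware is instantaneous):

```c
#include <assert.h>
#include <stdint.h>

/* hypothetical offset/mask bundle; one instance each for the RT and
 * Decoder capability layouts */
struct bi_reg_layout {
	unsigned int ctrl_off, status_off;
	uint32_t commit_bit, committed_bit, err_bit;
};

/* fake 32-bit register file standing in for the __iomem mapping */
static uint32_t regs[16];

static uint32_t rd(unsigned int off)            { return regs[off / 4]; }
static void wr(unsigned int off, uint32_t v)    { regs[off / 4] = v; }

/* one commit routine shared by both flavours: toggle the commit bit,
 * then check status (real code would poll with a timeout here) */
static int bi_commit(const struct bi_reg_layout *l)
{
	uint32_t ctrl = rd(l->ctrl_off);

	wr(l->ctrl_off, ctrl & ~l->commit_bit);
	wr(l->ctrl_off, ctrl | l->commit_bit);

	uint32_t status = rd(l->status_off);
	if (status & l->err_bit)
		return -5;	/* -EIO */
	return (status & l->committed_bit) ? 0 : -110; /* -ETIMEDOUT */
}
```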
> +
> +static int __cxl_bi_commit_rt(struct device *dev, void __iomem *bi)
> +{
> + if (!bi)
> + return 0;
> +
> + if (FIELD_GET(CXL_BI_RT_CAPS_EXPLICIT_COMMIT_REQ,
> + readl(bi + CXL_BI_RT_CAPS_OFFSET)))
> + ___cxl_bi_commit(dev, bi, RT);
> +
> + return 0;
> +}
I'd make this return void. Leaving it as an int return suggests it can
fail, which isn't worth the mental overhead imo, especially if it only
saves a few lines below.
> +
> +static int __cxl_bi_commit(struct device *dev, void __iomem *bi)
> +{
> + if (!bi)
> + return -EINVAL;
> +
> + ___cxl_bi_commit(dev, bi, DECODER);
> + return 0;
> +}
> +
> +/* enable or dealloc BI-ID changes in the given level of the topology */
> +static int cxl_bi_ctrl_dport(struct cxl_dport *dport, bool enable)
> +{
> + u32 ctrl, value;
> + void __iomem *bi = dport->regs.bi;
> + struct cxl_port *port = dport->port;
> + struct pci_dev *pdev = to_pci_dev(dport->dport_dev);
> +
> + guard(device)(&port->dev);
> +
> + if (!bi)
> + return -EINVAL;
> +
> + ctrl = readl(bi + CXL_BI_DECODER_CTRL_OFFSET);
> +
> + switch (pci_pcie_type(pdev)) {
> + case PCI_EXP_TYPE_ROOT_PORT:
> + if (enable) {
> + /*
> + * There is no point of failure from here on,
> + * BI will be enabled on the endpoint device.
> + */
> + port->nr_bi++;
> +
> + if (FIELD_GET(CXL_BI_DECODER_CTRL_BI_FW, ctrl))
> + return 0;
> +
> + value = ctrl | CXL_BI_DECODER_CTRL_BI_FW;
> + value &= ~CXL_BI_DECODER_CTRL_BI_ENABLE;
> + } else {
> + if (WARN_ON_ONCE(port->nr_bi == 0))
> + return -EINVAL;
> + if (--port->nr_bi > 0)
> + return 0;
> +
> + value = ctrl & ~CXL_BI_DECODER_CTRL_BI_FW;
> + }
> +
> + writel(value, bi + CXL_BI_DECODER_CTRL_OFFSET);
> + return 0;
> + case PCI_EXP_TYPE_DOWNSTREAM:
> + if (enable) {
> + value = ctrl & ~CXL_BI_DECODER_CTRL_BI_FW;
> + value |= CXL_BI_DECODER_CTRL_BI_ENABLE;
> + } else {
> + if (!FIELD_GET(CXL_BI_DECODER_CTRL_BI_ENABLE, ctrl))
> + return 0;
> +
> + value = ctrl & ~CXL_BI_DECODER_CTRL_BI_ENABLE;
> + }
> +
> + writel(value, bi + CXL_BI_DECODER_CTRL_OFFSET);
> +
> + if (FIELD_GET(CXL_BI_DECODER_CAPS_EXPLICIT_COMMIT_REQ,
> + readl(bi + CXL_BI_DECODER_CAPS_OFFSET))) {
> + int rc = __cxl_bi_commit(dport->dport_dev,
> + dport->regs.bi);
It's been a while since I looked at how these decoders work, but why not move the
explicit commit check into __cxl_bi_commit()? I think it's only called here,
but I could be missing something.
> + if (rc)
> + return rc;
> + }
> +
> + return __cxl_bi_commit_rt(&port->dev, port->uport_regs.bi);
> + default:
> + return -EINVAL;
> + }
> +}
> +
> +static int cxl_bi_ctrl_endpoint(struct cxl_dev_state *cxlds, bool enable)
> +{
> + u32 ctrl, val;
> + struct cxl_port *endpoint = cxlds->cxlmd->endpoint;
> + void __iomem *bi = endpoint->regs.bi;
> +
> + if (!bi)
> + return -EINVAL;
> +
> + ctrl = readl(bi + CXL_BI_DECODER_CTRL_OFFSET);
> +
> + if (enable) {
> + if (FIELD_GET(CXL_BI_DECODER_CTRL_BI_ENABLE, ctrl)) {
> + WARN_ON_ONCE(!cxlds->bi);
> + return 0;
> + }
> + val = ctrl | CXL_BI_DECODER_CTRL_BI_ENABLE;
> + } else {
> + if (!FIELD_GET(CXL_BI_DECODER_CTRL_BI_ENABLE, ctrl)) {
> + WARN_ON_ONCE(cxlds->bi);
> + return 0;
> + }
> + val = ctrl & ~CXL_BI_DECODER_CTRL_BI_ENABLE;
> + }
> +
> + writel(val, bi + CXL_BI_DECODER_CTRL_OFFSET);
> + cxlds->bi = enable;
> +
> + dev_dbg(cxlds->dev, "device %scapable of issuing BI requests\n",
> + enable ? "" : "in");
Nit: it's pretty hard to tell the difference between the enable and
disable cases when it's only two letters. I'd prefer something like:
	dev_dbg(cxlds->dev, "%s issuing BI requests\n", enable ? "enabled" : "disabled");
The device part isn't really important since dev_dbg() prints the
device name anyway.
> +
> + return 0;
> +}
> +
> +int cxl_bi_setup(struct cxl_dev_state *cxlds)
> +{
> + struct pci_dev *pdev = to_pci_dev(cxlds->dev);
> + struct cxl_port *endpoint = cxlds->cxlmd->endpoint;
> + struct cxl_dport *dport, *_dport, *failed;
> + struct cxl_port *parent_port, *port;
> + int rc;
> +
> + struct cxl_port *_port __free(put_cxl_port) =
> + cxl_pci_find_port(pdev, &_dport);
> +
> + if (!_port)
> + return -EINVAL;
> +
> + if (!cxl_is_bi_capable(pdev, endpoint->regs.bi))
> + return 0;
> +
> + /* walkup the topology twice, first to check, then to enable */
> + port = _port;
> + dport = _dport;
> + while (1) {
You should be able to replace this with "while (!is_cxl_root(port))"; it's more
resilient imo.
> + parent_port = to_cxl_port(port->dev.parent);
> + /* check rp, dsp */
> + if (!cxl_is_bi_capable(to_pci_dev(dport->dport_dev),
> + dport->regs.bi))
> + return -EINVAL;
> +
> + /* check usp */
> + if (pci_pcie_type(to_pci_dev(dport->dport_dev)) ==
> + PCI_EXP_TYPE_DOWNSTREAM) {
> + if (!cxl_is_bi_capable(to_pci_dev(port->uport_dev),
> + port->uport_regs.bi))
> + return -EINVAL;
> + }
> +
> + if (is_cxl_root(parent_port))
> + break;
> +
> + dport = port->parent_dport;
> + port = parent_port;
> + }
> +
> + port = _port;
> + dport = _dport;
> + while (1) {
Same thing here.
> + parent_port = to_cxl_port(port->dev.parent);
> +
> + rc = cxl_bi_ctrl_dport(dport, true);
> + if (rc)
> + goto err_rollback;
> +
> + if (is_cxl_root(parent_port))
> + break;
> +
> + dport = port->parent_dport;
> + port = parent_port;
> + }
> +
> + /* finally, enable BI on the device */
> + cxl_bi_ctrl_endpoint(cxlds, true);
> + return 0;
> +
> +err_rollback:
> + /*
> + * Undo all dports enabled so far by re-walking from the bottom
> + * up to (but not including) the failed dport.
> + */
> + failed = dport;
> + dport = _dport;
> + port = _port;
> + while (dport != failed) {
> + parent_port = to_cxl_port(port->dev.parent);
> +
> + cxl_bi_ctrl_dport(dport, false);
> + if (is_cxl_root(parent_port))
> + break;
> + dport = port->parent_dport;
> + port = parent_port;
> + }
> + return rc;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_bi_setup, "CXL");
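FWIW, the err_rollback path above is the usual "undo only the enabled
prefix" shape. A toy user-space sketch, with component indices standing
in for the port walk (not driver code):

```c
#include <assert.h>
#include <stdbool.h>

#define NR 4
static bool enabled[NR];
static int fail_at = -1;	/* inject a failure for testing */

static int enable_one(int i)
{
	if (i == fail_at)
		return -22;	/* -EINVAL */
	enabled[i] = true;
	return 0;
}

static void disable_one(int i) { enabled[i] = false; }

/* enable components bottom-up; on failure, disable only the ones
 * already enabled, i.e. the prefix before the failed index */
static int enable_all(void)
{
	int i, rc;

	for (i = 0; i < NR; i++) {
		rc = enable_one(i);
		if (rc)
			goto rollback;
	}
	return 0;

rollback:
	while (--i >= 0)
		disable_one(i);
	return rc;
}
```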
> +
> +int cxl_bi_dealloc(struct cxl_dev_state *cxlds)
> +{
> + struct cxl_memdev *cxlmd = cxlds->cxlmd;
> + struct cxl_port *endpoint = cxlmd->endpoint;
> + struct cxl_port *parent_port, *port;
> + struct cxl_dport *dport, *_dport;
> +
> + struct cxl_port *_port __free(put_cxl_port) =
> + cxl_pci_find_port(to_pci_dev(cxlds->dev), &_dport);
> +
> + if (!_port)
> + return -EINVAL;
> +
> + if (!cxlds->bi)
> + return 0;
> +
> + if (endpoint) {
> + /* ensure the device is offline and unmapped */
> + scoped_guard(rwsem_read, &cxl_rwsem.region) {
> + if (cxl_num_decoders_committed(endpoint) > 0)
> + return -EBUSY;
> + }
> +
> + /* first, disable BI on the device */
> + cxl_bi_ctrl_endpoint(cxlds, false);
> + } else {
> + /*
> + * Teardown path: the endpoint was already removed, which
> + * tears down regions and uncommits decoders. The endpoint
> + * BI registers are no longer mapped so just clear the flag
> + * and walk the dports below.
> + */
> + cxlds->bi = false;
> + }
> +
> + port = _port;
> + dport = _dport;
> + while (1) {
And here.
> + int rc;
> +
> + parent_port = to_cxl_port(port->dev.parent);
> +
> + rc = cxl_bi_ctrl_dport(dport, false);
> + if (rc)
> + return rc;
Would it be better not to return on an error here? It may be better to
disable as many decoders as possible, but I don't know enough to say
either way.
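If continuing is the right call, one hedged shape is to keep walking
but remember the first failure. Toy sketch, not driver code:

```c
#include <assert.h>

#define NR_LEVELS 3
static int rcs[NR_LEVELS];	/* injected per-level results, for testing */
static int attempts;

static int disable_level(int i)
{
	attempts++;
	return rcs[i];
}

/* best-effort teardown: attempt every level regardless of failures,
 * then report the first error seen */
static int teardown_all(void)
{
	int first_rc = 0;

	for (int i = 0; i < NR_LEVELS; i++) {
		int rc = disable_level(i);

		if (rc && !first_rc)
			first_rc = rc;
	}
	return first_rc;
}
```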
> +
> + if (is_cxl_root(parent_port))
> + break;
> +
> + dport = port->parent_dport;
> + port = parent_port;
> + }
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_bi_dealloc, "CXL");
* Re: [PATCH 4/6] cxl: Wire BI setup and dealloc into device lifecycle
2026-03-15 20:27 ` [PATCH 4/6] cxl: Wire BI setup and dealloc into device lifecycle Davidlohr Bueso
@ 2026-03-20 16:20 ` Cheatham, Benjamin
2026-03-20 16:29 ` Jonathan Cameron
1 sibling, 0 replies; 21+ messages in thread
From: Cheatham, Benjamin @ 2026-03-20 16:20 UTC (permalink / raw)
To: Davidlohr Bueso, dave.jiang, dan.j.williams
Cc: jonathan.cameron, alison.schofield, ira.weiny, gourry,
dongjoo.seo1, anisa.su, linux-cxl
On 3/15/2026 3:27 PM, Davidlohr Bueso wrote:
> Setup BI flows during cxl_mem_probe() after endpoint enumeration and
> EDAC registration, and tear down BI during endpoint detach.
>
> BI setup failure is non-fatal - the device continues to operate
> in HDM-H mode without back-invalidate support.
>
> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
> ---
> drivers/cxl/mem.c | 20 ++++++++++++++++++++
> 1 file changed, 20 insertions(+)
>
> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> index fcffe24dcb42..3adee41885c5 100644
> --- a/drivers/cxl/mem.c
> +++ b/drivers/cxl/mem.c
> @@ -65,6 +65,11 @@ static int cxl_debugfs_poison_clear(void *data, u64 dpa)
> DEFINE_DEBUGFS_ATTRIBUTE(cxl_poison_clear_fops, NULL,
> cxl_debugfs_poison_clear, "%llx\n");
>
> +static void devm_cxl_bi_dealloc(void *data)
> +{
> + cxl_bi_dealloc(data);
> +}
> +
> static int cxl_mem_probe(struct device *dev)
> {
> struct cxl_memdev *cxlmd = to_cxl_memdev(dev);
> @@ -150,6 +155,21 @@ static int cxl_mem_probe(struct device *dev)
> if (rc)
> dev_dbg(dev, "CXL memdev EDAC registration failed rc=%d\n", rc);
>
> + rc = cxl_bi_setup(cxlds);
> + if (rc)
> + dev_dbg(dev, "BI setup failed rc=%d\n", rc);
> +
> + /*
> + * Register BI dealloc after setup so devm ordering ensures
> + * it runs before the endpoint is removed by delete_endpoint().
> + */
> + if (cxlds->bi) {
> + rc = devm_add_action_or_reset(dev, devm_cxl_bi_dealloc,
> + cxlds);
> + if (rc)
> + return rc;
Can I ask why this isn't done at the end of cxl_bi_setup()? It would
alleviate the ordering concerns and make more sense imo. The only reason
I can see is that you're worried deallocation could fail and you don't
want the device usable in a half-on/half-off state. I'm not sure that's
a concern, since it looks like disabling BI on the endpoint can't fail,
at which point the hardware above would fall back to HDM-H requests?
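For reference, devm actions run LIFO on teardown, which is what the
ordering comment in the patch relies on: registering the BI dealloc
after the earlier actions makes it run before them on release. A toy
sketch of that behaviour (not the devres implementation):

```c
#include <assert.h>

/* toy LIFO action stack mimicking devm ordering */
#define MAX_ACTIONS 8
static void (*actions[MAX_ACTIONS])(void);
static int nr_actions;

/* record the order the actions actually ran in */
static int order[MAX_ACTIONS], nr_run;

static void act_a(void) { order[nr_run++] = 0; }
static void act_b(void) { order[nr_run++] = 1; }

static void add_action(void (*fn)(void)) { actions[nr_actions++] = fn; }

/* teardown pops the stack: last registered runs first */
static void release_all(void)
{
	while (nr_actions > 0)
		actions[--nr_actions]();
}
```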
> + }
> +
> /*
> * The kernel may be operating out of CXL memory on this device,
> * there is no spec defined way to determine whether this device
* Re: [PATCH 5/6] cxl/hdm: Add BI coherency support for endpoint decoders
2026-03-15 20:27 ` [PATCH 5/6] cxl/hdm: Add BI coherency support for endpoint decoders Davidlohr Bueso
@ 2026-03-20 16:20 ` Cheatham, Benjamin
0 siblings, 0 replies; 21+ messages in thread
From: Cheatham, Benjamin @ 2026-03-20 16:20 UTC (permalink / raw)
To: Davidlohr Bueso, dave.jiang, dan.j.williams
Cc: jonathan.cameron, alison.schofield, ira.weiny, gourry,
dongjoo.seo1, anisa.su, linux-cxl
On 3/15/2026 3:27 PM, Davidlohr Bueso wrote:
> Add cxl_dpa_set_coherence() to allow setting the coherency type of an
> endpoint decoder, with validation against hardware capabilities. Other
> than the HDM decoder supported coherency models, the main dependency
> to create these regions is for the device state to be BI-ready (cxlds->bi),
> for which already committed HDM decoders with the BI bit set detected during
> enumeration is currently not supported because endpoint and port enumerations
> are independent. Program the BI bit in the HDM decoder control register at
> commit time.
>
> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
> ---
> drivers/cxl/core/core.h | 2 ++
> drivers/cxl/core/hdm.c | 60 +++++++++++++++++++++++++++++++++++++----
> 2 files changed, 57 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> index 19494a8615d3..da4c094b6702 100644
> --- a/drivers/cxl/core/core.h
> +++ b/drivers/cxl/core/core.h
> @@ -97,6 +97,8 @@ void __iomem *devm_cxl_iomap_block(struct device *dev, resource_size_t addr,
> struct dentry *cxl_debugfs_create_dir(const char *dir);
> int cxl_dpa_set_part(struct cxl_endpoint_decoder *cxled,
> enum cxl_partition_mode mode);
> +int cxl_dpa_set_coherence(struct cxl_endpoint_decoder *cxled,
> + enum cxl_decoder_type type);
> int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, u64 size);
> int cxl_dpa_free(struct cxl_endpoint_decoder *cxled);
> resource_size_t cxl_dpa_size(struct cxl_endpoint_decoder *cxled);
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index b2db5967f5c0..6a6071b1f1dd 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -87,6 +87,8 @@ static void parse_hdm_decoder_caps(struct cxl_hdm *cxlhdm)
> cxlhdm->iw_cap_mask |= BIT(3) | BIT(6) | BIT(12);
> if (FIELD_GET(CXL_HDM_DECODER_INTERLEAVE_16_WAY, hdm_cap))
> cxlhdm->iw_cap_mask |= BIT(16);
> + cxlhdm->supported_coherency =
> + FIELD_GET(CXL_HDM_DECODER_SUPPORTED_COHERENCY_MASK, hdm_cap);
> }
>
> static bool should_emulate_decoders(struct cxl_endpoint_dvsec_info *info)
> @@ -603,6 +605,34 @@ int cxl_dpa_set_part(struct cxl_endpoint_decoder *cxled,
> return 0;
> }
>
> +int cxl_dpa_set_coherence(struct cxl_endpoint_decoder *cxled,
> + enum cxl_decoder_type type)
> +{
> + struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> + struct cxl_dev_state *cxlds = cxlmd->cxlds;
> + struct cxl_port *port = to_cxl_port(cxled->cxld.dev.parent);
> + struct cxl_hdm *cxlhdm = dev_get_drvdata(&port->dev);
> +
> + guard(rwsem_write)(&cxl_rwsem.dpa);
> + if (cxled->cxld.flags & CXL_DECODER_F_ENABLE)
> + return -EBUSY;
> +
> + if (!cxlds->bi && type == CXL_DECODER_DEVMEM)
> + return -EINVAL;
> +
> + /* Device coherent only cannot be set to host-only */
> + if (type == CXL_DECODER_HOSTONLYMEM &&
> + cxlhdm->supported_coherency == CXL_HDM_DECODER_COHERENCY_DEV)
> + return -EINVAL;
> + /* Host-only coherent cannot be set to device coherent */
> + if (type == CXL_DECODER_DEVMEM &&
> + cxlhdm->supported_coherency == CXL_HDM_DECODER_COHERENCY_HOST)
> + return -EINVAL;
> +
> + cxled->cxld.target_type = type;
> + return 0;
> +}
> +
> static int __cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, u64 size)
> {
> struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> @@ -713,6 +743,9 @@ static void cxld_set_type(struct cxl_decoder *cxld, u32 *ctrl)
> u32p_replace_bits(ctrl,
> !!(cxld->target_type == CXL_DECODER_HOSTONLYMEM),
> CXL_HDM_DECODER0_CTRL_HOSTONLY);
> + u32p_replace_bits(ctrl,
> + !!(cxld->target_type == CXL_DECODER_DEVMEM),
> + CXL_HDM_DECODER0_CTRL_BI);
> }
>
> static void cxlsd_set_targets(struct cxl_switch_decoder *cxlsd, u64 *tgt)
> @@ -1030,6 +1063,14 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
> else
> cxld->target_type = CXL_DECODER_DEVMEM;
>
> + /*
> + * Autocommit BI-enabled decoders is not supported.
> + * At this point ->bi is not yet setup, so there
> + * are no guarantees that the platform supports BI.
> + */
> + if (FIELD_GET(CXL_HDM_DECODER0_CTRL_BI, ctrl))
> + return -ENXIO;
->bi here is ambiguous, I think you mean the endpoint device isn't configured yet. If so,
I'd spell it out.
Also, why don't we leave it enabled and then disable it if the device doesn't support it
as part of cxl_mem probe? It adds some unnecessary work when it's disabled, but
allows for more configurations.
> +
> guard(rwsem_write)(&cxl_rwsem.region);
> if (cxld->id != cxl_num_decoders_committed(port)) {
> dev_warn(&port->dev,
> @@ -1049,16 +1090,24 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
> if (cxled) {
> struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> struct cxl_dev_state *cxlds = cxlmd->cxlds;
> + struct cxl_port *ep_port = to_cxl_port(cxld->dev.parent);
> + struct cxl_hdm *cxlhdm = dev_get_drvdata(&ep_port->dev);
>
> /*
> - * Default by devtype until a device arrives that needs
> - * more precision.
> + * For type3 HDM-DB devices, users can later change the
> + * target_type, if supported by the HDM decoder.
> + *
> + * Devices that support both coherency modes default
> + * to host-only.
> */
> - if (cxlds->type == CXL_DEVTYPE_CLASSMEM)
> - cxld->target_type = CXL_DECODER_HOSTONLYMEM;
> - else
> + if (cxlds->type == CXL_DEVTYPE_CLASSMEM) {
> + if (cxlhdm->supported_coherency == CXL_HDM_DECODER_COHERENCY_DEV)
> + cxld->target_type = CXL_DECODER_DEVMEM;
> + else
> + cxld->target_type = CXL_DECODER_HOSTONLYMEM;
> + } else {
> cxld->target_type = CXL_DECODER_DEVMEM;
> + }
Could refactor to:

	if (cxlds->type == CXL_DEVTYPE_CLASSMEM &&
	    cxlhdm->supported_coherency != CXL_HDM_DECODER_COHERENCY_DEV)
		cxld->target_type = CXL_DECODER_HOSTONLYMEM;
	else
		cxld->target_type = CXL_DECODER_DEVMEM;

(supported_coherency already holds the extracted field, so a plain comparison
works here rather than FIELD_GET()). The if statement is a bit long, but you
could add a bool above to hold the second part, i.e.:

	bool dev_coherent =
		cxlhdm->supported_coherency == CXL_HDM_DECODER_COHERENCY_DEV;
> } else {
> /* To be overridden by region type at commit time */
> cxld->target_type = CXL_DECODER_HOSTONLYMEM;
> --
> 2.39.5
>
* Re: [PATCH 3/6] cxl/pci: Add Back-Invalidate topology enable/disable
2026-03-15 20:27 ` [PATCH 3/6] cxl/pci: Add Back-Invalidate topology enable/disable Davidlohr Bueso
2026-03-20 16:20 ` Cheatham, Benjamin
@ 2026-03-20 16:27 ` Jonathan Cameron
2026-03-24 0:21 ` [PATCH 3/6] cxl/pci: Add Back-Invalidate topology enable/disable Dave Jiang
2 siblings, 0 replies; 21+ messages in thread
From: Jonathan Cameron @ 2026-03-20 16:27 UTC (permalink / raw)
To: Davidlohr Bueso
Cc: dave.jiang, dan.j.williams, alison.schofield, ira.weiny, gourry,
dongjoo.seo1, anisa.su, linux-cxl
On Sun, 15 Mar 2026 13:27:38 -0700
Davidlohr Bueso <dave@stgolabs.net> wrote:
> Implement cxl_bi_setup() and cxl_bi_dealloc() which walk the CXL port
> topology to enable/disable BI flows on all components in the path.
>
> Upon a successful setup, this enablement does not influence the current
> HDM decoder setup by enabling the BI bit, and therefore the device is left
> in a BI capable state, but not making use of it in the decode coherence.
> Upon a BI-ID removal event, it is expected for the device to be offline.
>
> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Various things inline.
> ---
> drivers/cxl/core/pci.c | 339 +++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 339 insertions(+)
>
> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index d1f487b3d809..5f0226397dfa 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c
> +static int __cxl_bi_commit_rt(struct device *dev, void __iomem *bi)
> +{
> + if (!bi)
> + return 0;
> +
> + if (FIELD_GET(CXL_BI_RT_CAPS_EXPLICIT_COMMIT_REQ,
> + readl(bi + CXL_BI_RT_CAPS_OFFSET)))
> + ___cxl_bi_commit(dev, bi, RT);
> +
> + return 0;
> +}
> +
> +static int __cxl_bi_commit(struct device *dev, void __iomem *bi)
> +{
> + if (!bi)
> + return -EINVAL;
> +
See below. I'd do
if (FIELD_GET(CXL_BI_DECODER_CAPS_EXPLICIT_COMMIT_REQ,
readl(bi + CXL_BI_DECODER_CAPS_OFFSET)))
___cxl_bi_commit(dev, bi, DECODER);
and call this function unconditionally.
> + ___cxl_bi_commit(dev, bi, DECODER);
> + return 0;
> +}
> +
> +/* enable or dealloc BI-ID changes in the given level of the topology */
> +static int cxl_bi_ctrl_dport(struct cxl_dport *dport, bool enable)
> +{
> + u32 ctrl, value;
> + void __iomem *bi = dport->regs.bi;
> + struct cxl_port *port = dport->port;
> + struct pci_dev *pdev = to_pci_dev(dport->dport_dev);
> +
> + guard(device)(&port->dev);
> +
> + if (!bi)
> + return -EINVAL;
> +
> + ctrl = readl(bi + CXL_BI_DECODER_CTRL_OFFSET);
> +
> + switch (pci_pcie_type(pdev)) {
> + case PCI_EXP_TYPE_ROOT_PORT:
> + if (enable) {
> + /*
> + * There is no point of failure from here on,
> + * BI will be enabled on the endpoint device.
> + */
> + port->nr_bi++;
> +
> + if (FIELD_GET(CXL_BI_DECODER_CTRL_BI_FW, ctrl))
> + return 0;
> +
> + value = ctrl | CXL_BI_DECODER_CTRL_BI_FW;
> + value &= ~CXL_BI_DECODER_CTRL_BI_ENABLE;
> + } else {
> + if (WARN_ON_ONCE(port->nr_bi == 0))
> + return -EINVAL;
> + if (--port->nr_bi > 0)
> + return 0;
> +
> + value = ctrl & ~CXL_BI_DECODER_CTRL_BI_FW;
> + }
> +
> + writel(value, bi + CXL_BI_DECODER_CTRL_OFFSET);
> + return 0;
> + case PCI_EXP_TYPE_DOWNSTREAM:
> + if (enable) {
This stuff is odd enough that I'd throw in a reference to Table 9-13
(Downstream Port Handling of BISnp). It confused me that we were turning
off forwarding, but we aren't. We are simply performing checks before
doing so!
> + value = ctrl & ~CXL_BI_DECODER_CTRL_BI_FW;
> + value |= CXL_BI_DECODER_CTRL_BI_ENABLE;
> + } else {
> + if (!FIELD_GET(CXL_BI_DECODER_CTRL_BI_ENABLE, ctrl))
> + return 0;
> +
> + value = ctrl & ~CXL_BI_DECODER_CTRL_BI_ENABLE;
> + }
> +
> + writel(value, bi + CXL_BI_DECODER_CTRL_OFFSET);
> +
> + if (FIELD_GET(CXL_BI_DECODER_CAPS_EXPLICIT_COMMIT_REQ,
> + readl(bi + CXL_BI_DECODER_CAPS_OFFSET))) {
Why check externally here but internally in __cxl_bi_commit_rt?
I'd do the same in both cases (probably move the check inside for this one).
> + int rc = __cxl_bi_commit(dport->dport_dev,
> + dport->regs.bi);
> + if (rc)
> + return rc;
> + }
> +
> + return __cxl_bi_commit_rt(&port->dev, port->uport_regs.bi);
> + default:
> + return -EINVAL;
> + }
> +}
> +int cxl_bi_setup(struct cxl_dev_state *cxlds)
> +{
> + struct pci_dev *pdev = to_pci_dev(cxlds->dev);
> + struct cxl_port *endpoint = cxlds->cxlmd->endpoint;
> + struct cxl_dport *dport, *_dport, *failed;
> + struct cxl_port *parent_port, *port;
> + int rc;
> +
> + struct cxl_port *_port __free(put_cxl_port) =
> + cxl_pci_find_port(pdev, &_dport);
> +
> + if (!_port)
> + return -EINVAL;
> +
> + if (!cxl_is_bi_capable(pdev, endpoint->regs.bi))
> + return 0;
> +
> + /* walkup the topology twice, first to check, then to enable */
> + port = _port;
> + dport = _dport;
> + while (1) {
> + parent_port = to_cxl_port(port->dev.parent);
> + /* check rp, dsp */
> + if (!cxl_is_bi_capable(to_pci_dev(dport->dport_dev),
> + dport->regs.bi))
> + return -EINVAL;
> +
> + /* check usp */
> + if (pci_pcie_type(to_pci_dev(dport->dport_dev)) ==
> + PCI_EXP_TYPE_DOWNSTREAM) {
Why check the type of the dport_dev rather than checking the type of
to_pci_dev(port->uport_dev), and whether it's a PCI device in the
first place?
> + if (!cxl_is_bi_capable(to_pci_dev(port->uport_dev),
> + port->uport_regs.bi))
> + return -EINVAL;
> + }
> +
> + if (is_cxl_root(parent_port))
> + break;
> +
> + dport = port->parent_dport;
> + port = parent_port;
> + }
> +
> + port = _port;
> + dport = _dport;
> + while (1) {
> + parent_port = to_cxl_port(port->dev.parent);
I wonder if we can construct an iterator macro to do this walk.
Something like (completely untested and maybe horribly broken)

#define for_cxl_port_to_root(port, dport) \
	for (struct cxl_port *parent_port = to_cxl_port(port->dev.parent); \
	     !is_cxl_root(parent_port); \
	     dport = port->parent_dport, port = parent_port, \
	     parent_port = to_cxl_port(port->dev.parent))

used as

	port = _port;
	dport = _dport;
	for_cxl_port_to_root(port, dport) {
		if (!cxl_is_bi_capable(to_pci_dev(dport->dport_dev),
				       dport->regs.bi))
			return -EINVAL;
		if (pci_pcie_type(to_pci_dev(dport->dport_dev)) ==
		    PCI_EXP_TYPE_DOWNSTREAM)
			if (!cxl_is_bi_capable(to_pci_dev(port->uport_dev),
					       port->uport_regs.bi))
				return -EINVAL;
	}

	port = _port;
	dport = _dport;
	for_cxl_port_to_root(port, dport) {
		rc = cxl_bi_ctrl_dport(dport, true);
		if (rc)
			goto err_rollback;
	}

May not be worth it if there aren't more instances of this already that
we can also use it for.
> +
> + rc = cxl_bi_ctrl_dport(dport, true);
> + if (rc)
> + goto err_rollback;
> +
> + if (is_cxl_root(parent_port))
> + break;
> +
> + dport = port->parent_dport;
> + port = parent_port;
> + }
> +
> + /* finally, enable BI on the device */
> + cxl_bi_ctrl_endpoint(cxlds, true);
> + return 0;
> +
> +err_rollback:
> + /*
> + * Undo all dports enabled so far by re-walking from the bottom
> + * up to (but not including) the failed dport.
> + */
> + failed = dport;
> + dport = _dport;
> + port = _port;
> + while (dport != failed) {
> + parent_port = to_cxl_port(port->dev.parent);
> +
> + cxl_bi_ctrl_dport(dport, false);
> + if (is_cxl_root(parent_port))
> + break;
> + dport = port->parent_dport;
> + port = parent_port;
> + }
> + return rc;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_bi_setup, "CXL");
> +
> +int cxl_bi_dealloc(struct cxl_dev_state *cxlds)
> +{
> + struct cxl_memdev *cxlmd = cxlds->cxlmd;
> + struct cxl_port *endpoint = cxlmd->endpoint;
> + struct cxl_port *parent_port, *port;
> + struct cxl_dport *dport, *_dport;
> +
> + struct cxl_port *_port __free(put_cxl_port) =
> + cxl_pci_find_port(to_pci_dev(cxlds->dev), &_dport);
> +
> + if (!_port)
> + return -EINVAL;
> +
> + if (!cxlds->bi)
> + return 0;
> +
> + if (endpoint) {
> + /* ensure the device is offline and unmapped */
> + scoped_guard(rwsem_read, &cxl_rwsem.region) {
Is a one-time check enough? I've not checked, but are we holding
something else to prevent a race immediately after this, before
cxl_bi_ctrl_endpoint() is called?
> + if (cxl_num_decoders_committed(endpoint) > 0)
> + return -EBUSY;
> + }
> +
> + /* first, disable BI on the device */
> + cxl_bi_ctrl_endpoint(cxlds, false);
> + } else {
> + /*
> + * Teardown path: the endpoint was already removed, which
> + * tears down regions and uncommits decoders. The endpoint
> + * BI registers are no longer mapped so just clear the flag
> + * and walk the dports below.
> + */
> + cxlds->bi = false;
> + }
> +
> + port = _port;
> + dport = _dport;
> + while (1) {
> + int rc;
> +
> + parent_port = to_cxl_port(port->dev.parent);
> +
> + rc = cxl_bi_ctrl_dport(dport, false);
> + if (rc)
> + return rc;
> +
> + if (is_cxl_root(parent_port))
> + break;
> +
> + dport = port->parent_dport;
> + port = parent_port;
> + }
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_bi_dealloc, "CXL");
* Re: [PATCH 4/6] cxl: Wire BI setup and dealloc into device lifecycle
2026-03-15 20:27 ` [PATCH 4/6] cxl: Wire BI setup and dealloc into device lifecycle Davidlohr Bueso
2026-03-20 16:20 ` Cheatham, Benjamin
@ 2026-03-20 16:29 ` Jonathan Cameron
1 sibling, 0 replies; 21+ messages in thread
From: Jonathan Cameron @ 2026-03-20 16:29 UTC (permalink / raw)
To: Davidlohr Bueso
Cc: dave.jiang, dan.j.williams, alison.schofield, ira.weiny, gourry,
dongjoo.seo1, anisa.su, linux-cxl
On Sun, 15 Mar 2026 13:27:39 -0700
Davidlohr Bueso <dave@stgolabs.net> wrote:
> Setup BI flows during cxl_mem_probe() after endpoint enumeration and
> EDAC registration, and tear down BI during endpoint detach.
>
> BI setup failure is non-fatal - the device continues to operate
> in HDM-H mode without back-invalidate support.
>
> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Not sure it needed to be separate from patch 3 but it's fine as is.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
* Re: [PATCH 6/6] cxl: Add HDM-DB region creation and sysfs interface
2026-03-15 20:27 ` [PATCH 6/6] cxl: Add HDM-DB region creation and sysfs interface Davidlohr Bueso
@ 2026-03-20 16:39 ` Jonathan Cameron
0 siblings, 0 replies; 21+ messages in thread
From: Jonathan Cameron @ 2026-03-20 16:39 UTC (permalink / raw)
To: Davidlohr Bueso
Cc: dave.jiang, dan.j.williams, alison.schofield, ira.weiny, gourry,
dongjoo.seo1, anisa.su, linux-cxl
On Sun, 15 Mar 2026 13:27:41 -0700
Davidlohr Bueso <dave@stgolabs.net> wrote:
> A single Type 3 device can expose different parts of its memory with different
> coherency semantics. For example, some memory ranges within a Type 3 device
> could be configured as HDM-H, while other memory ranges on the same device
> could be configured as HDM-DB. This allows for flexible memory configuration
> within a single device. As such, coherency models are defined per memory region.
>
> For accelerators (type2), it is expected for the respective drivers to manage
> the HDM-D[B] region creation. For type3, relevant sysfs tunables are provided
> to the user.
>
> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Seems fine to me, but I'm suffering Friday so don't really trust my
wakefulness enough to tag this version.
* Re: [PATCH 3/6] cxl/pci: Add Back-Invalidate topology enable/disable
2026-03-20 16:20 ` Cheatham, Benjamin
@ 2026-03-20 20:52 ` Alison Schofield
0 siblings, 0 replies; 21+ messages in thread
From: Alison Schofield @ 2026-03-20 20:52 UTC (permalink / raw)
To: Cheatham, Benjamin
Cc: Davidlohr Bueso, dave.jiang, dan.j.williams, jonathan.cameron,
ira.weiny, gourry, dongjoo.seo1, anisa.su, linux-cxl
On Fri, Mar 20, 2026 at 11:20:05AM -0500, Cheatham, Benjamin wrote:
> On 3/15/2026 3:27 PM, Davidlohr Bueso wrote:
Little bit -
> > + cxlds->bi = enable;
> > +
> > + dev_dbg(cxlds->dev, "device %scapable of issuing BI requests\n",
> > + enable ? "" : "in");
>
> Nit: It's pretty hard to tell the difference between the enable/disable case when it's only two letters.
> I'd prefer something like:
> dev_dbg(cxlds->dev, "%s issuing BI requests\n", enable ? "enabled" : "disabled");
>
> The device part isn't really important since the dev_dbg() prints the device name anyway.
>
There's a string helper for that:
dev_dbg(cxlds->dev, "%s issuing BI requests\n",
str_enabled_disabled(enable));
* Re: [PATCH 1/6] cxl: Add Back-Invalidate register definitions and structures
2026-03-15 20:27 ` [PATCH 1/6] cxl: Add Back-Invalidate register definitions and structures Davidlohr Bueso
2026-03-19 16:59 ` Jonathan Cameron
2026-03-20 14:57 ` Jonathan Cameron
@ 2026-03-23 22:11 ` Dave Jiang
2 siblings, 0 replies; 21+ messages in thread
From: Dave Jiang @ 2026-03-23 22:11 UTC (permalink / raw)
To: Davidlohr Bueso, dan.j.williams
Cc: jonathan.cameron, alison.schofield, ira.weiny, gourry,
dongjoo.seo1, anisa.su, linux-cxl
On 3/15/26 1:27 PM, Davidlohr Bueso wrote:
> Add CXL Back-Invalidate (BI) capability IDs, register definitions for
> the BI Route Table and BI Decoder capability structures, and associated
> fields. This includes HDM decoder coherency capability and control fields
> needed to support HDM-DB (device-managed coherency with back-invalidate).
>
> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
> ---
> drivers/cxl/cxl.h | 41 +++++++++++++++++++++++++++++++++++++++++
> drivers/cxl/cxlmem.h | 2 ++
> include/cxl/cxl.h | 5 +++++
> 3 files changed, 48 insertions(+)
>
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 031846eab02c..efe06d60b364 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -42,6 +42,8 @@ extern const struct nvdimm_security_ops *cxl_security_ops;
> #define CXL_CM_CAP_CAP_ID_RAS 0x2
> #define CXL_CM_CAP_CAP_ID_HDM 0x5
> #define CXL_CM_CAP_CAP_HDM_VERSION 1
> +#define CXL_CM_CAP_CAP_ID_BI_RT 0xB
> +#define CXL_CM_CAP_CAP_ID_BI_DECODER 0xC
>
> /* HDM decoders CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure */
> #define CXL_HDM_DECODER_CAP_OFFSET 0x0
> @@ -51,6 +53,10 @@ extern const struct nvdimm_security_ops *cxl_security_ops;
> #define CXL_HDM_DECODER_INTERLEAVE_14_12 BIT(9)
> #define CXL_HDM_DECODER_INTERLEAVE_3_6_12_WAY BIT(11)
> #define CXL_HDM_DECODER_INTERLEAVE_16_WAY BIT(12)
> +#define CXL_HDM_DECODER_SUPPORTED_COHERENCY_MASK GENMASK(22, 21)
> +#define CXL_HDM_DECODER_COHERENCY_DEV 0x1
> +#define CXL_HDM_DECODER_COHERENCY_HOST 0x2
> +#define CXL_HDM_DECODER_COHERENCY_BOTH 0x3
> #define CXL_HDM_DECODER_CTRL_OFFSET 0x4
> #define CXL_HDM_DECODER_ENABLE BIT(1)
> #define CXL_HDM_DECODER0_BASE_LOW_OFFSET(i) (0x20 * (i) + 0x10)
> @@ -65,6 +71,7 @@ extern const struct nvdimm_security_ops *cxl_security_ops;
> #define CXL_HDM_DECODER0_CTRL_COMMITTED BIT(10)
> #define CXL_HDM_DECODER0_CTRL_COMMIT_ERROR BIT(11)
> #define CXL_HDM_DECODER0_CTRL_HOSTONLY BIT(12)
> +#define CXL_HDM_DECODER0_CTRL_BI BIT(13)
> #define CXL_HDM_DECODER0_TL_LOW(i) (0x20 * (i) + 0x24)
> #define CXL_HDM_DECODER0_TL_HIGH(i) (0x20 * (i) + 0x28)
> #define CXL_HDM_DECODER0_SKIP_LOW(i) CXL_HDM_DECODER0_TL_LOW(i)
> @@ -152,6 +159,33 @@ static inline int ways_to_eiw(unsigned int ways, u8 *eiw)
> #define CXL_HEADERLOG_SIZE SZ_512
> #define CXL_HEADERLOG_SIZE_U32 SZ_512 / sizeof(u32)
>
> +/* CXL 3.2 8.2.4.26 CXL BI Route Table Capability Structure */
> +#define CXL_BI_RT_CAPABILITY_LENGTH 0xC
> +#define CXL_BI_RT_CAPS_OFFSET 0x0
> +#define CXL_BI_RT_CAPS_EXPLICIT_COMMIT_REQ BIT(0)
> +#define CXL_BI_RT_CTRL_OFFSET 0x4
> +#define CXL_BI_RT_CTRL_BI_COMMIT BIT(0)
> +#define CXL_BI_RT_STATUS_OFFSET 0x8
> +#define CXL_BI_RT_STATUS_BI_COMMITTED BIT(0)
> +#define CXL_BI_RT_STATUS_BI_ERR_NOT_COMMITTED BIT(1)
> +#define CXL_BI_RT_STATUS_BI_COMMIT_TM_SCALE GENMASK(11, 8)
> +#define CXL_BI_RT_STATUS_BI_COMMIT_TM_BASE GENMASK(15, 12)
> +
> +/* CXL 3.2 8.2.4.27 CXL BI Decoder Capability Structure */
> +#define CXL_BI_DECODER_CAPABILITY_LENGTH 0xC
> +#define CXL_BI_DECODER_CAPS_OFFSET 0x0
> +#define CXL_BI_DECODER_CAPS_HDMD_CAP BIT(0)
> +#define CXL_BI_DECODER_CAPS_EXPLICIT_COMMIT_REQ BIT(1)
> +#define CXL_BI_DECODER_CTRL_OFFSET 0x4
> +#define CXL_BI_DECODER_CTRL_BI_FW BIT(0)
> +#define CXL_BI_DECODER_CTRL_BI_ENABLE BIT(1)
> +#define CXL_BI_DECODER_CTRL_BI_COMMIT BIT(2)
> +#define CXL_BI_DECODER_STATUS_OFFSET 0x8
> +#define CXL_BI_DECODER_STATUS_BI_COMMITTED BIT(0)
> +#define CXL_BI_DECODER_STATUS_BI_ERR_NOT_COMMITTED BIT(1)
> +#define CXL_BI_DECODER_STATUS_BI_COMMIT_TM_SCALE GENMASK(11, 8)
> +#define CXL_BI_DECODER_STATUS_BI_COMMIT_TM_BASE GENMASK(15, 12)
> +
> /* CXL 2.0 8.2.8.1 Device Capabilities Array Register */
> #define CXLDEV_CAP_ARRAY_OFFSET 0x0
> #define CXLDEV_CAP_ARRAY_CAP_ID 0
> @@ -241,6 +275,7 @@ int cxl_dport_map_rcd_linkcap(struct pci_dev *pdev, struct cxl_dport *dport);
> #define CXL_DECODER_F_LOCK BIT(4)
> #define CXL_DECODER_F_ENABLE BIT(5)
> #define CXL_DECODER_F_NORMALIZED_ADDRESSING BIT(6)
> +#define CXL_DECODER_F_BI BIT(7)
>
> enum cxl_decoder_type {
> CXL_DECODER_DEVMEM = 2,
> @@ -522,6 +557,7 @@ struct cxl_dax_region {
> * @decoder_ida: allocator for decoder ids
> * @reg_map: component and ras register mapping parameters
> * @regs: mapped component registers
> + * @uport_regs: mapped upstream port component registers (BI RT)
Aren't port->regs upstream port registers? Why another entry?
DJ
> * @nr_dports: number of entries in @dports
> * @hdm_end: track last allocated HDM decoder instance for allocation ordering
> * @commit_end: cursor to track highest committed decoder for commit ordering
> @@ -530,6 +566,7 @@ struct cxl_dax_region {
> * @cdat: Cached CDAT data
> * @cdat_available: Should a CDAT attribute be available in sysfs
> * @pci_latency: Upstream latency in picoseconds
> + * @nr_bi: number of BI-enabled endpoints below this port
> * @component_reg_phys: Physical address of component register
> */
> struct cxl_port {
> @@ -544,6 +581,7 @@ struct cxl_port {
> struct ida decoder_ida;
> struct cxl_register_map reg_map;
> struct cxl_component_regs regs;
> + struct cxl_component_regs uport_regs;
> int nr_dports;
> int hdm_end;
> int commit_end;
> @@ -555,6 +593,7 @@ struct cxl_port {
> } cdat;
> bool cdat_available;
> long pci_latency;
> + int nr_bi;
> resource_size_t component_reg_phys;
> };
>
> @@ -875,6 +914,8 @@ void cxl_coordinates_combine(struct access_coordinate *out,
> struct access_coordinate *c2);
>
> bool cxl_endpoint_decoder_reset_detected(struct cxl_port *port);
> +int cxl_bi_setup(struct cxl_dev_state *cxlds);
> +int cxl_bi_dealloc(struct cxl_dev_state *cxlds);
> struct cxl_dport *devm_cxl_add_dport_by_dev(struct cxl_port *port,
> struct device *dport_dev);
>
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 281546de426e..efab65f68575 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -837,6 +837,7 @@ int cxl_mem_sanitize(struct cxl_memdev *cxlmd, u16 cmd);
> * @target_count: for switch decoders, max downstream port targets
> * @interleave_mask: interleave granularity capability, see check_interleave_cap()
> * @iw_cap_mask: bitmask of supported interleave ways, see check_interleave_cap()
> + * @supported_coherency: HDM Decoder Capability supported coherency mask
> * @port: mapped cxl_port, see devm_cxl_setup_hdm()
> */
> struct cxl_hdm {
> @@ -845,6 +846,7 @@ struct cxl_hdm {
> unsigned int target_count;
> unsigned int interleave_mask;
> unsigned long iw_cap_mask;
> + unsigned int supported_coherency;
> struct cxl_port *port;
> };
>
> diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h
> index fa7269154620..74be940364e1 100644
> --- a/include/cxl/cxl.h
> +++ b/include/cxl/cxl.h
> @@ -34,10 +34,12 @@ struct cxl_regs {
> * Common set of CXL Component register block base pointers
> * @hdm_decoder: CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure
> * @ras: CXL 2.0 8.2.5.9 CXL RAS Capability Structure
> + * @bi: CXL 3.2 8.2.4.26/27 CXL BI Capability Structure
> */
> struct_group_tagged(cxl_component_regs, component,
> void __iomem *hdm_decoder;
> void __iomem *ras;
> + void __iomem *bi;
> );
> /*
> * Common set of CXL Device register block base pointers
> @@ -80,6 +82,7 @@ struct cxl_reg_map {
> struct cxl_component_reg_map {
> struct cxl_reg_map hdm_decoder;
> struct cxl_reg_map ras;
> + struct cxl_reg_map bi;
> };
>
> struct cxl_device_reg_map {
> @@ -162,6 +165,7 @@ struct cxl_dpa_partition {
> * @regs: Parsed register blocks
> * @cxl_dvsec: Offset to the PCIe device DVSEC
> * @rcd: operating in RCD mode (CXL 3.0 9.11.8 CXL Devices Attached to an RCH)
> + * @bi: device is BI (Back-Invalidate) enabled
> * @media_ready: Indicate whether the device media is usable
> * @dpa_res: Overall DPA resource tree for the device
> * @part: DPA partition array
> @@ -181,6 +185,7 @@ struct cxl_dev_state {
> struct cxl_device_regs regs;
> int cxl_dvsec;
> bool rcd;
> + bool bi;
> bool media_ready;
> struct resource dpa_res;
> struct cxl_dpa_partition part[CXL_NR_PARTITIONS_MAX];
* Re: [PATCH 2/6] cxl: Add BI register probing and port initialization
2026-03-15 20:27 ` [PATCH 2/6] cxl: Add BI register probing and port initialization Davidlohr Bueso
2026-03-20 15:46 ` Jonathan Cameron
2026-03-20 16:19 ` Cheatham, Benjamin
@ 2026-03-23 23:10 ` Dave Jiang
2 siblings, 0 replies; 21+ messages in thread
From: Dave Jiang @ 2026-03-23 23:10 UTC (permalink / raw)
To: Davidlohr Bueso, dan.j.williams
Cc: jonathan.cameron, alison.schofield, ira.weiny, gourry,
dongjoo.seo1, anisa.su, linux-cxl
On 3/15/26 1:27 PM, Davidlohr Bueso wrote:
> Add register probing for BI Route Table and BI Decoder capability
> structures in cxl_probe_component_regs(), and initialize BI registers
> during port probe for both switch ports and endpoint ports.
>
> For switch ports, map BI Decoder registers on downstream ports and
> BI Route Table registers on upstream ports. For endpoint ports, map
> the BI Decoder registers directly into the port's register block.
>
> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
> ---
> drivers/cxl/core/regs.c | 13 ++++++
> drivers/cxl/port.c | 88 +++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 101 insertions(+)
>
> diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c
> index 93710cf4f0a6..82e6018fd4cf 100644
> --- a/drivers/cxl/core/regs.c
> +++ b/drivers/cxl/core/regs.c
> @@ -92,6 +92,18 @@ void cxl_probe_component_regs(struct device *dev, void __iomem *base,
> length = CXL_RAS_CAPABILITY_LENGTH;
> rmap = &map->ras;
> break;
> + case CXL_CM_CAP_CAP_ID_BI_RT:
> + dev_dbg(dev, "found BI RT capability (0x%x)\n",
> + offset);
> + length = CXL_BI_RT_CAPABILITY_LENGTH;
> + rmap = &map->bi;
> + break;
> + case CXL_CM_CAP_CAP_ID_BI_DECODER:
> + dev_dbg(dev, "found BI Decoder capability (0x%x)\n",
> + offset);
> + length = CXL_BI_DECODER_CAPABILITY_LENGTH;
> + rmap = &map->bi;
> + break;
> default:
> dev_dbg(dev, "Unknown CM cap ID: %d (0x%x)\n", cap_id,
> offset);
> @@ -211,6 +223,7 @@ int cxl_map_component_regs(const struct cxl_register_map *map,
> } mapinfo[] = {
> { &map->component_map.hdm_decoder, ®s->hdm_decoder },
> { &map->component_map.ras, ®s->ras },
> + { &map->component_map.bi, ®s->bi },
> };
> int i;
>
> diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
> index ada51948d52f..0540f0681ffb 100644
> --- a/drivers/cxl/port.c
> +++ b/drivers/cxl/port.c
> @@ -58,6 +58,90 @@ static int discover_region(struct device *dev, void *unused)
> return 0;
> }
>
> +static int cxl_dport_init_bi(struct cxl_dport *dport)
> +{
> + struct cxl_register_map *map = &dport->reg_map;
> + struct device *dev = dport->dport_dev;
> +
> + if (dport->regs.bi)
> + return 0;
> +
> + if (!cxl_pci_flit_256(to_pci_dev(dev)))
> + return 0;
> +
> + if (!map->component_map.bi.valid) {
> + dev_dbg(dev, "BI decoder registers not found\n");
> + return 0;
> + }
> +
> + if (cxl_map_component_regs(map, &dport->regs.component,
> + BIT(CXL_CM_CAP_CAP_ID_BI_DECODER))) {
> + dev_dbg(dev, "Failed to map BI decoder capability.\n");
> + return -EINVAL;
> + }
> +
> + return 0;
> +}
> +
> +static void cxl_uport_init_bi(struct cxl_port *port, struct device *host)
> +{
> + struct cxl_register_map *map = &port->reg_map;
> +
> + if (port->uport_regs.bi)
> + return;
> +
> + if (!map->component_map.bi.valid) {
> + dev_dbg(host, "BI RT registers not found\n");
> + return;
> + }
> +
> + map->host = host;
> + if (cxl_map_component_regs(map, &port->uport_regs,
> + BIT(CXL_CM_CAP_CAP_ID_BI_RT)))
> + dev_dbg(&port->dev, "Failed to map BI RT capability\n");
> +}
> +
> +static void cxl_endpoint_init_bi(struct cxl_port *port)
> +{
> + struct cxl_register_map *map = &port->reg_map;
> +
> + cxl_dport_init_bi(port->parent_dport);
I'm not sure this looks right. The parent dport of an endpoint would be the dport of a switch or a root port. Wouldn't this already have been probed when the RP or switch is being probed for BI? To init the endpoint BI, wouldn't you want to init the BI registers on the device itself? And therefore shouldn't it be cxl_uport_init_bi(endpoint_port, host)?
DJ
> +
> + if (!map->component_map.bi.valid)
> + return;
> +
> + if (cxl_map_component_regs(map, &port->regs,
> + BIT(CXL_CM_CAP_CAP_ID_BI_DECODER)))
> + dev_dbg(&port->dev, "Failed to map BI decoder capability\n");
> +}
> +
> +static void cxl_switch_port_init_bi(struct cxl_port *port)
> +{
> + struct cxl_dport *parent_dport = port->parent_dport;
> +
> + if (is_cxl_root(to_cxl_port(port->dev.parent)))
> + return;
> +
> + if (dev_is_pci(port->uport_dev) &&
> + !cxl_pci_flit_256(to_pci_dev(port->uport_dev)))
> + return;
> +
> + if (parent_dport && dev_is_pci(parent_dport->dport_dev)) {
> + struct pci_dev *pdev = to_pci_dev(parent_dport->dport_dev);
> +
> + switch (pci_pcie_type(pdev)) {
> + case PCI_EXP_TYPE_ROOT_PORT:
> + case PCI_EXP_TYPE_DOWNSTREAM:
> + cxl_dport_init_bi(parent_dport);
> + break;
> + default:
> + break;
> + }
> + }
> +
> + cxl_uport_init_bi(port, &port->dev);
> +}
> +
> static int cxl_switch_port_probe(struct cxl_port *port)
> {
> /* Reset nr_dports for rebind of driver */
> @@ -66,6 +150,8 @@ static int cxl_switch_port_probe(struct cxl_port *port)
> /* Cache the data early to ensure is_visible() works */
> read_cdat_data(port);
>
> + cxl_switch_port_init_bi(port);
> +
> return 0;
> }
>
> @@ -128,6 +214,8 @@ static int cxl_endpoint_port_probe(struct cxl_port *port)
> read_cdat_data(port);
> cxl_endpoint_parse_cdat(port);
>
> + cxl_endpoint_init_bi(port);
> +
> get_device(&cxlmd->dev);
> rc = devm_add_action_or_reset(&port->dev, schedule_detach, cxlmd);
> if (rc)
* Re: [PATCH 3/6] cxl/pci: Add Back-Invalidate topology enable/disable
2026-03-15 20:27 ` [PATCH 3/6] cxl/pci: Add Back-Invalidate topology enable/disable Davidlohr Bueso
2026-03-20 16:20 ` Cheatham, Benjamin
2026-03-20 16:27 ` [PATCH 3/6] cxl/pci: Add Back-Invalidate topology enable/disable Jonathan Cameron
@ 2026-03-24 0:21 ` Dave Jiang
2 siblings, 0 replies; 21+ messages in thread
From: Dave Jiang @ 2026-03-24 0:21 UTC (permalink / raw)
To: Davidlohr Bueso, dan.j.williams
Cc: jonathan.cameron, alison.schofield, ira.weiny, gourry,
dongjoo.seo1, anisa.su, linux-cxl
On 3/15/26 1:27 PM, Davidlohr Bueso wrote:
> Implement cxl_bi_setup() and cxl_bi_dealloc() which walk the CXL port
> topology to enable/disable BI flows on all components in the path.
>
> Upon successful setup, this enablement does not touch the current HDM
> decoder configuration (i.e. it does not set the BI bit), so the device
> is left in a BI-capable state but does not yet use BI for decode
> coherence. Upon a BI-ID removal event, the device is expected to be
> offline.
>
> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
> ---
> drivers/cxl/core/pci.c | 339 +++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 339 insertions(+)
>
> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index d1f487b3d809..5f0226397dfa 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c
> @@ -926,3 +926,342 @@ int cxl_port_get_possible_dports(struct cxl_port *port)
>
> return ctx.count;
> }
> +
> +static bool cxl_is_bi_capable(struct pci_dev *pdev, void __iomem *bi)
> +{
> + if (!cxl_pci_flit_256(pdev))
> + return false;
> +
> + if (pci_pcie_type(pdev) != PCI_EXP_TYPE_UPSTREAM && !bi) {
> + dev_dbg(&pdev->dev, "No BI Decoder registers.\n");
> + return false;
> + }
> +
> + return true;
> +}
> +
> +/* limit any insane timeouts from hw */
> +#define CXL_BI_COMMIT_MAXTMO_US (5 * USEC_PER_SEC)
> +
> +static unsigned long __cxl_bi_get_timeout_us(struct device *dev,
> + int scale, int base)
> +{
> + static const unsigned long scale_tbl[] = {
> + 1, 10, 100, 1000, 10000, 100000, 1000000, 10000000,
> + };
> +
> + if (scale >= ARRAY_SIZE(scale_tbl)) {
> + dev_dbg(dev, "Invalid BI commit timeout scale: %d\n", scale);
> + return CXL_BI_COMMIT_MAXTMO_US;
> + }
> +
> + return scale_tbl[scale] * base;
> +}
> +
> +#define ___cxl_bi_commit(dev, bi, ctype) \
> +do { \
> + u32 status, ctrl; \
> + int scale, base; \
> + ktime_t tmo, now, start; \
> + unsigned long poll_us, tmo_us; \
> + \
> + ctrl = readl(bi + CXL_BI_##ctype##_CTRL_OFFSET); \
> + writel(ctrl & ~CXL_BI_##ctype##_CTRL_BI_COMMIT, \
> + (bi) + CXL_BI_##ctype##_CTRL_OFFSET); \
> + writel(ctrl | CXL_BI_##ctype##_CTRL_BI_COMMIT, \
> + (bi) + CXL_BI_##ctype##_CTRL_OFFSET); \
> + \
> + status = readl((bi) + CXL_BI_##ctype##_STATUS_OFFSET); \
> + scale = FIELD_GET(CXL_BI_##ctype##_STATUS_BI_COMMIT_TM_SCALE, \
> + status); \
> + base = FIELD_GET(CXL_BI_##ctype##_STATUS_BI_COMMIT_TM_BASE, \
> + status); \
> + \
> + /* ... and poll */ \
> + tmo_us = min_t(unsigned long, CXL_BI_COMMIT_MAXTMO_US, \
> + __cxl_bi_get_timeout_us((dev), scale, base)); \
> + poll_us = tmo_us / 10; /* arbitrary 10% of timeout */ \
> + start = now = ktime_get(); \
> + tmo = ktime_add_us(now, tmo_us); \
> + while (!FIELD_GET(CXL_BI_##ctype##_STATUS_BI_COMMITTED, status) && \
> + !FIELD_GET(CXL_BI_##ctype##_STATUS_BI_ERR_NOT_COMMITTED, \
> + status)) { \
> + if (ktime_after(now, tmo)) { \
> + dev_dbg((dev), "BI-ID commit timed out (%luus)\n", \
> + tmo_us); \
> + return -ETIMEDOUT; \
> + } \
> + \
> + fsleep(poll_us); \
> + now = ktime_get(); \
> + status = readl((bi) + CXL_BI_##ctype##_STATUS_OFFSET); \
> + } \
> + \
> + if (FIELD_GET(CXL_BI_##ctype##_STATUS_BI_ERR_NOT_COMMITTED, status)) \
> + return -EIO; \
> + \
> + dev_dbg((dev), "BI-ID commit wait took %lluus\n", \
> + ktime_to_us(ktime_sub(now, start))); \
> +} while (0)
This is pretty ugly. Can you use a helper macro to do the string substitution so this can be a function instead? Maybe something that spits back out the identifier that concats everything together, like:
#define CONCAT_CXL_BI(ctype, suffix) CXL_BI_##ctype##_##suffix
DJ
> +
> +static int __cxl_bi_commit_rt(struct device *dev, void __iomem *bi)
> +{
> + if (!bi)
> + return 0;
> +
> + if (FIELD_GET(CXL_BI_RT_CAPS_EXPLICIT_COMMIT_REQ,
> + readl(bi + CXL_BI_RT_CAPS_OFFSET)))
> + ___cxl_bi_commit(dev, bi, RT);
> +
> + return 0;
> +}
> +
> +static int __cxl_bi_commit(struct device *dev, void __iomem *bi)
> +{
> + if (!bi)
> + return -EINVAL;
> +
> + ___cxl_bi_commit(dev, bi, DECODER);
> + return 0;
> +}
> +
> +/* enable or dealloc BI-ID changes in the given level of the topology */
> +static int cxl_bi_ctrl_dport(struct cxl_dport *dport, bool enable)
> +{
> + u32 ctrl, value;
> + void __iomem *bi = dport->regs.bi;
> + struct cxl_port *port = dport->port;
> + struct pci_dev *pdev = to_pci_dev(dport->dport_dev);
> +
> + guard(device)(&port->dev);
> +
> + if (!bi)
> + return -EINVAL;
> +
> + ctrl = readl(bi + CXL_BI_DECODER_CTRL_OFFSET);
> +
> + switch (pci_pcie_type(pdev)) {
> + case PCI_EXP_TYPE_ROOT_PORT:
> + if (enable) {
> + /*
> + * There is no point of failure from here on,
> + * BI will be enabled on the endpoint device.
> + */
> + port->nr_bi++;
> +
> + if (FIELD_GET(CXL_BI_DECODER_CTRL_BI_FW, ctrl))
> + return 0;
> +
> + value = ctrl | CXL_BI_DECODER_CTRL_BI_FW;
> + value &= ~CXL_BI_DECODER_CTRL_BI_ENABLE;
> + } else {
> + if (WARN_ON_ONCE(port->nr_bi == 0))
> + return -EINVAL;
> + if (--port->nr_bi > 0)
> + return 0;
> +
> + value = ctrl & ~CXL_BI_DECODER_CTRL_BI_FW;
> + }
> +
> + writel(value, bi + CXL_BI_DECODER_CTRL_OFFSET);
> + return 0;
> + case PCI_EXP_TYPE_DOWNSTREAM:
> + if (enable) {
> + value = ctrl & ~CXL_BI_DECODER_CTRL_BI_FW;
> + value |= CXL_BI_DECODER_CTRL_BI_ENABLE;
> + } else {
> + if (!FIELD_GET(CXL_BI_DECODER_CTRL_BI_ENABLE, ctrl))
> + return 0;
> +
> + value = ctrl & ~CXL_BI_DECODER_CTRL_BI_ENABLE;
> + }
> +
> + writel(value, bi + CXL_BI_DECODER_CTRL_OFFSET);
> +
> + if (FIELD_GET(CXL_BI_DECODER_CAPS_EXPLICIT_COMMIT_REQ,
> + readl(bi + CXL_BI_DECODER_CAPS_OFFSET))) {
> + int rc = __cxl_bi_commit(dport->dport_dev,
> + dport->regs.bi);
> + if (rc)
> + return rc;
> + }
> +
> + return __cxl_bi_commit_rt(&port->dev, port->uport_regs.bi);
> + default:
> + return -EINVAL;
> + }
> +}
> +
> +static int cxl_bi_ctrl_endpoint(struct cxl_dev_state *cxlds, bool enable)
> +{
> + u32 ctrl, val;
> + struct cxl_port *endpoint = cxlds->cxlmd->endpoint;
> + void __iomem *bi = endpoint->regs.bi;
> +
> + if (!bi)
> + return -EINVAL;
> +
> + ctrl = readl(bi + CXL_BI_DECODER_CTRL_OFFSET);
> +
> + if (enable) {
> + if (FIELD_GET(CXL_BI_DECODER_CTRL_BI_ENABLE, ctrl)) {
> + WARN_ON_ONCE(!cxlds->bi);
> + return 0;
> + }
> + val = ctrl | CXL_BI_DECODER_CTRL_BI_ENABLE;
> + } else {
> + if (!FIELD_GET(CXL_BI_DECODER_CTRL_BI_ENABLE, ctrl)) {
> + WARN_ON_ONCE(cxlds->bi);
> + return 0;
> + }
> + val = ctrl & ~CXL_BI_DECODER_CTRL_BI_ENABLE;
> + }
> +
> + writel(val, bi + CXL_BI_DECODER_CTRL_OFFSET);
> + cxlds->bi = enable;
> +
> + dev_dbg(cxlds->dev, "device %scapable of issuing BI requests\n",
> + enable ? "" : "in");
> +
> + return 0;
> +}
> +
> +int cxl_bi_setup(struct cxl_dev_state *cxlds)
> +{
> + struct pci_dev *pdev = to_pci_dev(cxlds->dev);
> + struct cxl_port *endpoint = cxlds->cxlmd->endpoint;
> + struct cxl_dport *dport, *_dport, *failed;
> + struct cxl_port *parent_port, *port;
> + int rc;
> +
> + struct cxl_port *_port __free(put_cxl_port) =
> + cxl_pci_find_port(pdev, &_dport);
> +
> + if (!_port)
> + return -EINVAL;
> +
> + if (!cxl_is_bi_capable(pdev, endpoint->regs.bi))
> + return 0;
> +
> + /* walkup the topology twice, first to check, then to enable */
> + port = _port;
> + dport = _dport;
> + while (1) {
> + parent_port = to_cxl_port(port->dev.parent);
> + /* check rp, dsp */
> + if (!cxl_is_bi_capable(to_pci_dev(dport->dport_dev),
> + dport->regs.bi))
> + return -EINVAL;
> +
> + /* check usp */
> + if (pci_pcie_type(to_pci_dev(dport->dport_dev)) ==
> + PCI_EXP_TYPE_DOWNSTREAM) {
> + if (!cxl_is_bi_capable(to_pci_dev(port->uport_dev),
> + port->uport_regs.bi))
> + return -EINVAL;
> + }
> +
> + if (is_cxl_root(parent_port))
> + break;
> +
> + dport = port->parent_dport;
> + port = parent_port;
> + }
> +
> + port = _port;
> + dport = _dport;
> + while (1) {
> + parent_port = to_cxl_port(port->dev.parent);
> +
> + rc = cxl_bi_ctrl_dport(dport, true);
> + if (rc)
> + goto err_rollback;
> +
> + if (is_cxl_root(parent_port))
> + break;
> +
> + dport = port->parent_dport;
> + port = parent_port;
> + }
> +
> + /* finally, enable BI on the device */
> + cxl_bi_ctrl_endpoint(cxlds, true);
> + return 0;
> +
> +err_rollback:
> + /*
> + * Undo all dports enabled so far by re-walking from the bottom
> + * up to (but not including) the failed dport.
> + */
> + failed = dport;
> + dport = _dport;
> + port = _port;
> + while (dport != failed) {
> + parent_port = to_cxl_port(port->dev.parent);
> +
> + cxl_bi_ctrl_dport(dport, false);
> + if (is_cxl_root(parent_port))
> + break;
> + dport = port->parent_dport;
> + port = parent_port;
> + }
> + return rc;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_bi_setup, "CXL");
> +
> +int cxl_bi_dealloc(struct cxl_dev_state *cxlds)
> +{
> + struct cxl_memdev *cxlmd = cxlds->cxlmd;
> + struct cxl_port *endpoint = cxlmd->endpoint;
> + struct cxl_port *parent_port, *port;
> + struct cxl_dport *dport, *_dport;
> +
> + struct cxl_port *_port __free(put_cxl_port) =
> + cxl_pci_find_port(to_pci_dev(cxlds->dev), &_dport);
> +
> + if (!_port)
> + return -EINVAL;
> +
> + if (!cxlds->bi)
> + return 0;
> +
> + if (endpoint) {
> + /* ensure the device is offline and unmapped */
> + scoped_guard(rwsem_read, &cxl_rwsem.region) {
> + if (cxl_num_decoders_committed(endpoint) > 0)
> + return -EBUSY;
> + }
> +
> + /* first, disable BI on the device */
> + cxl_bi_ctrl_endpoint(cxlds, false);
> + } else {
> + /*
> + * Teardown path: the endpoint was already removed, which
> + * tears down regions and uncommits decoders. The endpoint
> + * BI registers are no longer mapped so just clear the flag
> + * and walk the dports below.
> + */
> + cxlds->bi = false;
> + }
> +
> + port = _port;
> + dport = _dport;
> + while (1) {
> + int rc;
> +
> + parent_port = to_cxl_port(port->dev.parent);
> +
> + rc = cxl_bi_ctrl_dport(dport, false);
> + if (rc)
> + return rc;
> +
> + if (is_cxl_root(parent_port))
> + break;
> +
> + dport = port->parent_dport;
> + port = parent_port;
> + }
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_bi_dealloc, "CXL");