* [PATCH RFC 0/3] cxl: Initial support for Back-Invalidate
@ 2025-08-12 1:02 Davidlohr Bueso
2025-08-12 1:02 ` [PATCH 1/3] cxl/pci: Back-Invalidate device discovery and setup Davidlohr Bueso
` (3 more replies)
0 siblings, 4 replies; 8+ messages in thread
From: Davidlohr Bueso @ 2025-08-12 1:02 UTC
To: dave.jiang, dan.j.williams
Cc: jonathan.cameron, ira.weiny, alison.schofield, alucerop,
a.manzanares, anisa.su, linux-cxl, dave
Hello,
The following is some initial plumbing to enable HDM-DB in Linux. This model allows devices,
specifically Type 2 and Type 3 devices, to expose their local memory to the host CPU in a
coherent manner. In alignment with what was discussed at last year's LPC type2 support session,
this series takes the type3 memory expander approach, which is more direct.
While this is an early RFC and I'm sure there will be many thoughts, the next phase would be to
integrate BI with Alejandro's Type2 work as well as with Jonathan's Cache Coherency subsystem
series (aka memregion inv). This might be a good topic for the upcoming LPC's devmem session:
https://lore.kernel.org/linux-cxl/20250624141355.269056-1-alejandro.lucero-palau@amd.com
https://lore.kernel.org/linux-cxl/20250624154805.66985-1-Jonathan.Cameron@huawei.com
o Patch 1 adds the BI cachemem register discovery along with two interfaces around cxlds to allow
the setup and deallocation of BI-IDs. The idea is for type3 memdevs and future type2 devices
to make use of cxlds->bi when committing HDM decoders, such that different device coherence models
can be differentiated as:
type2 hdm-db: cxlds->type == CXL_DEVTYPE_DEVMEM && cxlds->bi == true
type2 hdm-d: cxlds->type == CXL_DEVTYPE_DEVMEM && cxlds->bi == false
type3 hdm-h: cxlds->type == CXL_DEVTYPE_CLASSMEM && cxlds->bi == false
type3 hdm-db: cxlds->type == CXL_DEVTYPE_CLASSMEM && cxlds->bi == true
Because ->bi becoming true does not depend on decoders being auto-committed at HDM decoder
enumeration time (port driver), an auto-committed decoder with its BI bit set is treated as
unsupported for now, and initializing such a decoder will error out.
o Patch 2 renames/updates some of the CXL Window coherency restrictions. This should be picked
up regardless as the spec has been updated already.
o Patch 3 deals with the HDM decoder programming changes around whether or not to set the
BI bit. Based on the model above, decoder target types are straightforward: DEVMEM or HOSTONLY
for type2 and regular type3. For type3 HDM-DB this is not as clear, so this patch
will 1) rely on the HDM capability for supporting coherence models, and 2) allow the user,
when possible, to change it when configuring the BI-capable HDM decoder. It gives the user sysfs
tools to create BI-enabled memory regions (see testing below).
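As a rough illustration of the table in the Patch 1 description, the four coherence models can be derived from the device type and the ->bi flag alone. This is a userspace sketch, not kernel code: enum devtype, struct dev_state and coherence_model() are made-up stand-ins for enum cxl_devtype and struct cxl_dev_state.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for CXL_DEVTYPE_DEVMEM / CXL_DEVTYPE_CLASSMEM. */
enum devtype { DEVTYPE_DEVMEM, DEVTYPE_CLASSMEM };

/* Hypothetical stand-in for the two cxlds fields the table relies on. */
struct dev_state {
	enum devtype type;
	bool bi;	/* true once BI-ID setup succeeded for the device */
};

/* Map (type, bi) to the coherence model names used in the cover letter. */
static const char *coherence_model(const struct dev_state *ds)
{
	assert(ds);

	if (ds->type == DEVTYPE_DEVMEM)
		return ds->bi ? "type2 hdm-db" : "type2 hdm-d";

	return ds->bi ? "type3 hdm-db" : "type3 hdm-h";
}
```

A type3 memdev whose BI setup succeeded would therefore be reported as "type3 hdm-db", and the same device without BI as "type3 hdm-h".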
Testing has been done with the qemu hdm-db type3 counterpart:
https://lore.kernel.org/linux-cxl/20250806055708.196851-1-dave@stgolabs.net/
1. BI discovery + BI-ID setup flow.
----------------------------------
i. 1 direct attached + 1 regular (hdm-h)
-+-[0000:00]-+-00.0
| +-01.0
| +-02.0
| +-03.0
| +-1f.0
| +-1f.2
| \-1f.3
\-[0000:0c]-+-00.0-[0d]----00.0
\-01.0-[0e]----00.0
[ 0.799322] cxl_core:cxl_probe_component_regs:102: pci 0000:0c:00.0: found BI Decoder capability (0xab4)
[ 0.805787] cxl_core:cxl_probe_component_regs:102: pci 0000:0c:01.0: found BI Decoder capability (0xab4)
[ 1.862826] cxl_core:cxl_probe_component_regs:102: cxl_pci 0000:0d:00.0: found BI Decoder capability (0xab4)
[ 1.944015] cxl_core:__cxl_is_bi_capable:1115: cxl_pci 0000:0e:00.0: No BI Decoder registers.
[ 1.994400] cxl_core:__cxl_bi_decoder_endpoint_enable:1279: cxl_pci 0000:0d:00.0: device capable of issuing BI requests
ii. 4 attached hdm-db to a 4x switch under the rp
-+-[0000:00]-+-00.0
| +-01.0
| +-02.0
| +-03.0
| +-1f.0
| +-1f.2
| \-1f.3
\-[0000:0c]-+-00.0-[0d-12]----00.0-[0e-12]--+-00.0-[0f]----00.0
| +-01.0-[10]----00.0
| +-02.0-[11]----00.0
| \-03.0-[12]----00.0
\-01.0-[13]--
[ 1.866181] cxl_core:cxl_probe_component_regs:102: cxl_pci 0000:10:00.0: found BI Decoder capability (0xab4)
[ 1.871194] cxl_core:cxl_probe_component_regs:102: cxl_pci 0000:0f:00.0: found BI Decoder capability (0xab4)
[ 1.891820] cxl_core:cxl_probe_component_regs:96: cxl port2: found BI RT capability (0xaa8)
[ 1.896952] cxl_core:cxl_probe_component_regs:102: pcieport 0000:0e:00.0: found BI Decoder capability (0xab4)
[ 1.897578] cxl_core:cxl_probe_component_regs:102: pcieport 0000:0e:01.0: found BI Decoder capability (0xab4)
[ 1.906578] cxl_core:cxl_probe_component_regs:102: pcieport 0000:0e:02.0: found BI Decoder capability (0xab4)
[ 2.036098] cxl_core:cxl_probe_component_regs:102: pcieport 0000:0e:03.0: found BI Decoder capability (0xab4)
[ 3.740107] cxl_core:__cxl_bi_commit:1180: pcieport 0000:0e:03.0: BI-ID commit wait took 212059us
[ 3.944239] cxl_core:__cxl_bi_commit_rt:1170: cxl_port port2: BI-ID commit wait took 202985us
[ 3.950446] cxl_core:__cxl_bi_decoder_endpoint_enable:1278: cxl_pci 0000:12:00.0: device capable of issuing BI requests
[ 4.192058] cxl_core:__cxl_bi_commit:1180: pcieport 0000:0e:02.0: BI-ID commit wait took 207341us
[ 4.400570] cxl_core:__cxl_bi_commit_rt:1170: cxl_port port2: BI-ID commit wait took 207559us
[ 4.411170] cxl_core:__cxl_bi_decoder_endpoint_enable:1278: cxl_pci 0000:11:00.0: device capable of issuing BI requests
[ 4.664350] cxl_core:__cxl_bi_commit:1180: pcieport 0000:0e:01.0: BI-ID commit wait took 205648us
[ 4.872299] cxl_core:__cxl_bi_commit_rt:1170: cxl_port port2: BI-ID commit wait took 204156us
[ 4.884823] cxl_core:__cxl_bi_decoder_endpoint_enable:1278: cxl_pci 0000:10:00.0: device capable of issuing BI requests
[ 5.128481] cxl_core:__cxl_bi_commit:1180: pcieport 0000:0e:00.0: BI-ID commit wait took 203259us
[ 5.336216] cxl_core:__cxl_bi_commit_rt:1170: cxl_port port2: BI-ID commit wait took 204688us
[ 5.341730] cxl_core:__cxl_bi_decoder_endpoint_enable:1278: cxl_pci 0000:0f:00.0: device capable of issuing BI requests
# echo 1 > /sys/bus/pci/devices/0000\:0f\:00.0/remove
[ 52.280098] cxl_core:__cxl_bi_commit:1180: pcieport 0000:0e:00.0: BI-ID commit wait took 204661us
[ 52.488101] cxl_core:__cxl_bi_commit_rt:1170: cxl_port port2: BI-ID commit wait took 206595us
[ 52.497619] cxl_core:cxl_detach_ep:1500: cxl_mem mem0: disconnect mem0 from port2
[ 52.500290] cxl_core:cxl_detach_ep:1500: cxl_mem mem0: disconnect mem0 from port1
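For context on the "BI-ID commit wait" lines above: patch 1 derives the polling timeout from the scale and base fields of the BI status register and caps it at 5 seconds. The following userspace sketch mirrors that arithmetic under the assumption that scale n means 10^n microseconds for n in 0..7; the function names are made up here, and the scale/base values in the example are illustrative, not what the QEMU device reports.

```c
#include <assert.h>

/* Same cap as CXL_BI_COMMIT_MAXTMO_US in patch 1 (5 seconds). */
#define BI_COMMIT_MAXTMO_US (5UL * 1000000UL)

/*
 * scale selects a power of ten in microseconds (0 => 1 us ... 7 => 10 s);
 * any other scale yields 0, as in the patch. base is a 4-bit multiplier.
 */
static unsigned long bi_timeout_us(unsigned int scale, unsigned int base)
{
	unsigned long unit = 1;

	assert(base <= 15);
	if (scale > 7)
		return 0;

	while (scale--)
		unit *= 10;

	return unit * base;
}

/* Timeout actually used for polling: the hardware value, capped. */
static unsigned long bi_clamped_timeout_us(unsigned int scale,
					    unsigned int base)
{
	unsigned long tmo = bi_timeout_us(scale, base);

	return tmo < BI_COMMIT_MAXTMO_US ? tmo : BI_COMMIT_MAXTMO_US;
}
```

With scale 5 and base 2 this gives a 200 ms timeout and, at the patch's 10% polling interval, a 20 ms sleep between status reads. A device advertising scale 7 would be capped at 5 s.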
2. HDM Decoder with BI through ad-hoc region creation.
------------------------------------------------------
# cxl list -D
[
{
"decoder":"decoder0.0",
"resource":4563402752,
"size":10737418240,
"interleave_ways":1,
"max_available_extent":10737418240,
"pmem_capable":true,
"volatile_capable":true,
"accelmem_capable":true,
"nr_targets":1
}
]
Program the endpoint decoder
# echo ram > /sys/bus/cxl/devices/decoder2.0/mode
# echo 1 > /sys/bus/cxl/devices/decoder2.0/bi
# echo 0x40000000 > /sys/bus/cxl/devices/decoder2.0/dpa_size
Create a region in the root decoder
# echo region0 > /sys/bus/cxl/devices/decoder0.0/create_ram_bi_region
[ 154.073077] cxl_core:cxl_region_can_probe:3588: cxl_region region0: config state: 0
[ 154.073495] cxl_core:cxl_bus_probe:2125: cxl_region region0: probe: -6
[ 154.073939] cxl_core:devm_cxl_add_region:2563: cxl_acpi ACPI0017:00: decoder0.0: created region0
Configure the region with the same IG and IW as the root and endpoint decoders
# echo 256 > /sys/bus/cxl/devices/region0/interleave_granularity
# echo 1 > /sys/bus/cxl/devices/region0/interleave_ways
# echo 0x40000000 > /sys/bus/cxl/devices/region0/size
Link the endpoint decoder as a target in the region
# echo decoder2.0 > /sys/bus/cxl/devices/region0/target0
[ 177.351922] cxl_core:cxl_port_attach_region:1174: cxl region0: mem0:endpoint2 decoder2.0 add: mem0:decoder2.0 @ 0 next: none nr_eps: 1 nr_targets: 1
[ 177.353539] cxl_core:cxl_port_attach_region:1174: cxl region0: pci0000:0c:port1 decoder1.0 add: mem0:decoder2.0 @ 0 next: mem0 nr_eps: 1 nr_targets: 1
[ 177.354133] cxl_core:cxl_port_setup_targets:1494: cxl region0: pci0000:0c:port1 iw: 1 ig: 256
[ 177.354508] cxl_core:cxl_port_setup_targets:1518: cxl region0: pci0000:0c:port1 target[0] = 0000:0c:00.0 for mem0:decoder2.0 @ 0
[ 177.355182] cxl_core:cxl_calc_interleave_pos:1885: cxl_mem mem0: decoder:decoder2.0 parent:0000:0d:00.0 port:endpoint2 range:0x120000000-0x15fffffff pos:0
[ 177.355793] cxl_core:cxl_region_attach:2087: cxl decoder2.0: Test cxl_calc_interleave_pos(): success test_pos:0 cxled->pos:0
Commit the changes
# echo 1 > /sys/bus/cxl/devices/region0/commit
[ 183.126515] cxl region0: Bypassing cpu_cache_invalidate_memregion() for testing!
non interleaved decoder 120000000 40000000 0
non interleaved decoder 120000000 40000000 0
# cxl list -D (tooling still needs to be updated to show decoder target type)
[
{
"root decoders":[
{
"decoder":"decoder0.0",
"resource":4563402752,
"size":10737418240,
"interleave_ways":1,
"max_available_extent":9395240960,
"pmem_capable":true,
"volatile_capable":true,
"accelmem_capable":true,
"nr_targets":1
}
]
},
{
"port decoders":[
{
"decoder":"decoder1.0",
"resource":4831838208,
"size":1073741824,
"interleave_ways":1,
"region":"region0",
"nr_targets":1
}
]
},
{
"endpoint decoders":[
{
"decoder":"decoder2.0",
"resource":4831838208,
"size":1073741824,
"interleave_ways":1,
"region":"region0",
"dpa_resource":0,
"dpa_size":1073741824,
"mode":"ram"
}
]
}
]
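The sysfs sequence above could also be driven from a small C helper, e.g. from a test harness. This is only a sketch: attr_write(), run_steps() and the step table are names made up here, the attribute paths are the ones used in the walkthrough (the bi and create_ram_bi_region attributes only exist with this series applied), and the caller is assumed to pass /sys/bus/cxl/devices as the base directory.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write a string to one attribute file; returns 0 on success, -1 on error. */
static int attr_write(const char *path, const char *val)
{
	FILE *f;
	int rc = 0;

	assert(path && val);
	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fputs(val, f) == EOF)
		rc = -1;
	/* buffered write errors only show up when the stream is flushed */
	if (fclose(f) == EOF)
		rc = -1;
	return rc;
}

struct step {
	const char *attr;	/* path relative to the base directory */
	const char *val;
};

/* Same order as the walkthrough: endpoint decoder, region, target, commit. */
static const struct step bi_region_steps[] = {
	{ "decoder2.0/mode", "ram" },
	{ "decoder2.0/bi", "1" },
	{ "decoder2.0/dpa_size", "0x40000000" },
	{ "decoder0.0/create_ram_bi_region", "region0" },
	{ "region0/interleave_granularity", "256" },
	{ "region0/interleave_ways", "1" },
	{ "region0/size", "0x40000000" },
	{ "region0/target0", "decoder2.0" },
	{ "region0/commit", "1" },
};

/* Apply the steps in order and stop at the first failure. */
static int run_steps(const char *base, const struct step *steps, size_t n)
{
	char path[512];

	for (size_t i = 0; i < n; i++) {
		int len = snprintf(path, sizeof(path), "%s/%s", base,
				   steps[i].attr);

		if (len < 0 || (size_t)len >= sizeof(path))
			return -1;
		if (attr_write(path, steps[i].val))
			return -1;
	}
	return 0;
}
```

Stopping at the first failed write matches how the shell sequence would be used by hand: a later step such as commit makes no sense if the endpoint decoder could not be programmed.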
ii. On a type3 HDM-DB volatile device, create one BI and one regular region.
# echo ram > /sys/bus/cxl/devices/decoder2.0/mode
# echo 1 > /sys/bus/cxl/devices/decoder2.0/bi
# echo 0x20000000 > /sys/bus/cxl/devices/decoder2.0/dpa_size
# echo region0 > /sys/bus/cxl/devices/decoder0.0/create_ram_bi_region
# echo 256 > /sys/bus/cxl/devices/region0/interleave_granularity
# echo 1 > /sys/bus/cxl/devices/region0/interleave_ways
# echo 0x20000000 > /sys/bus/cxl/devices/region0/size
# echo decoder2.0 > /sys/bus/cxl/devices/region0/target0
# echo 1 > /sys/bus/cxl/devices/region0/commit
# echo ram > /sys/bus/cxl/devices/decoder2.1/mode
# echo 0 > /sys/bus/cxl/devices/decoder2.1/bi (this is already the default)
# echo 0x20000000 > /sys/bus/cxl/devices/decoder2.1/dpa_size
# echo region1 > /sys/bus/cxl/devices/decoder0.0/create_ram_region
# echo 256 > /sys/bus/cxl/devices/region1/interleave_granularity
# echo 1 > /sys/bus/cxl/devices/region1/interleave_ways
# echo 0x20000000 > /sys/bus/cxl/devices/region1/size
# echo decoder2.1 > /sys/bus/cxl/devices/region1/target0
# echo 1 > /sys/bus/cxl/devices/region1/commit
# cxl list -D
[
{
"root decoders":[
{
"decoder":"decoder0.0",
"resource":4563402752,
"size":10737418240,
"interleave_ways":1,
"max_available_extent":9395240960,
"pmem_capable":true,
"volatile_capable":true,
"accelmem_capable":true,
"nr_targets":1
}
]
},
{
"port decoders":[
{
"decoder":"decoder1.0",
"resource":4831838208,
"size":536870912,
"interleave_ways":1,
"region":"region0",
"nr_targets":1
},
{
"decoder":"decoder1.1",
"resource":5368709120,
"size":536870912,
"interleave_ways":1,
"region":"region1",
"nr_targets":1
}
]
},
{
"endpoint decoders":[
{
"decoder":"decoder2.0",
"resource":4831838208,
"size":536870912,
"interleave_ways":1,
"region":"region0",
"dpa_resource":0,
"dpa_size":536870912,
"mode":"ram"
},
{
"decoder":"decoder2.1",
"resource":5368709120,
"size":536870912,
"interleave_ways":1,
"region":"region1",
"dpa_resource":536870912,
"dpa_size":536870912,
"mode":"ram"
}
]
}
]
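The numbers in the two listings can be cross-checked with a bit of arithmetic. The sketch below assumes that max_available_extent is the free space between the end of the last region and the end of the root decoder window, which is consistent with the values shown but is not stated in this thread; struct range and tail_extent() are names made up for the example.

```c
#include <assert.h>
#include <stdint.h>

struct range {
	uint64_t start;
	uint64_t size;
};

static uint64_t range_end(struct range r)
{
	return r.start + r.size;
}

/* Free space between the end of the last region and the end of the window. */
static uint64_t tail_extent(struct range window, struct range last_region)
{
	assert(range_end(last_region) <= range_end(window));
	return range_end(window) - range_end(last_region);
}
```

With the window at 4563402752 (0x110000000) of size 10737418240, both the single 1 GiB region at 0x120000000 and the pair of 512 MiB regions end at 0x160000000, so in both cases the tail is 0x230000000 = 9395240960 bytes, matching the listings.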
iii. Detect mismatch between decoder and region types.
[ 217.911488] cxl_core:cxl_region_attach:1977: cxl region1: mem0:decoder2.0 type mismatch: 2 vs 3
[ 23.269994] cxl_core:cxl_region_attach:1987: cxl region0: mem0:decoder2.0 type mismatch: 3 vs 2 (no HDM-DB device)
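To read the two messages above: the first number is assumed to be the endpoint decoder's target type and the second the region's type, with 2 standing for DEVMEM and 3 for HOSTONLYMEM. The sketch below shows that kind of check in userspace C; the enum, check_attach() and the -ENXIO return value are assumptions made for the example, not taken from patch 3.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Assumed numeric values, chosen to match the "2 vs 3" in the logs. */
enum decoder_type {
	DECODER_DEVMEM = 2,
	DECODER_HOSTONLYMEM = 3,
};

/* Refuse to attach a decoder whose target type differs from the region's. */
static int check_attach(enum decoder_type decoder, enum decoder_type region)
{
	if (decoder != region) {
		fprintf(stderr, "type mismatch: %d vs %d\n",
			(int)decoder, (int)region);
		return -ENXIO;
	}
	return 0;
}
```

Under these assumptions, attaching a BI-enabled decoder (type 2) to a region made with create_ram_region (type 3) gives the first message, and attaching a regular decoder to a BI region gives the second.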
Applies against the 'next' branch of cxl.git.
Thanks!
Davidlohr Bueso (3):
cxl/pci: Back-Invalidate device discovery and setup
acpi, tables: Rename coherency CFMW restrictions
cxl: Support creating HDM-DB regions
Documentation/ABI/testing/sysfs-bus-cxl | 41 ++-
drivers/cxl/acpi.c | 6 +-
drivers/cxl/core/core.h | 4 +
drivers/cxl/core/hdm.c | 56 ++++-
drivers/cxl/core/pci.c | 318 ++++++++++++++++++++++++
drivers/cxl/core/port.c | 57 ++++-
drivers/cxl/core/region.c | 57 ++++-
drivers/cxl/core/regs.c | 13 +
drivers/cxl/cxl.h | 47 +++-
drivers/cxl/cxlmem.h | 3 +
drivers/cxl/mem.c | 2 +
drivers/cxl/pci.c | 5 +
drivers/cxl/port.c | 67 +++++
include/acpi/actbl1.h | 5 +-
tools/testing/cxl/test/cxl.c | 18 +-
15 files changed, 661 insertions(+), 38 deletions(-)
--
2.39.5
* [PATCH 1/3] cxl/pci: Back-Invalidate device discovery and setup
2025-08-12 1:02 [PATCH RFC 0/3] cxl: Initial support for Back-Invalidate Davidlohr Bueso
@ 2025-08-12 1:02 ` Davidlohr Bueso
2025-08-15 15:20 ` Jonathan Cameron
2025-08-12 1:02 ` [PATCH 2/3] acpi, tables: Rename coherency CFMW restrictions Davidlohr Bueso
` (2 subsequent siblings)
3 siblings, 1 reply; 8+ messages in thread
From: Davidlohr Bueso @ 2025-08-12 1:02 UTC
To: dave.jiang, dan.j.williams
Cc: jonathan.cameron, ira.weiny, alison.schofield, alucerop,
a.manzanares, anisa.su, linux-cxl, dave
Introduce two calls to enable/setup and disable/dealloc BI flows,
adding the general cachemem register plumbing.
int cxl_bi_setup(struct cxl_dev_state *cxlds);
int cxl_bi_dealloc(struct cxl_dev_state *cxlds);
Both do the required hierarchy iteration, ensuring it is safe to
enable/disable BI for every component. A successful setup does not
set the BI bit in the current HDM decoder setup, so the device is
left in a BI-capable state without making use of it in the decode
coherence. Upon a BI-ID removal event, the device is expected to
be offline.
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
---
drivers/cxl/core/pci.c | 318 ++++++++++++++++++++++++++++++++++++++++
drivers/cxl/core/port.c | 2 +
drivers/cxl/core/regs.c | 13 ++
drivers/cxl/cxl.h | 40 +++++
drivers/cxl/cxlmem.h | 2 +
drivers/cxl/mem.c | 2 +
drivers/cxl/pci.c | 5 +
drivers/cxl/port.c | 67 +++++++++
8 files changed, 449 insertions(+)
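The "check first, then enable" hierarchy iteration described in the commit message can be pictured with a simplified model. This is a userspace sketch, not the kernel implementation: struct hop and bi_setup_walk() are made up here, and each hop stands for one dport/uport level between the endpoint and the root port.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* One level of the topology, linked towards the root. */
struct hop {
	bool bi_capable;
	bool bi_enabled;
	struct hop *parent;	/* NULL above the root port */
};

/*
 * Walk up twice: the first pass only checks, so an incapable component
 * anywhere on the path leaves every hop untouched. The second pass
 * enables BI on each hop.
 */
static int bi_setup_walk(struct hop *leaf)
{
	struct hop *h;

	assert(leaf);

	for (h = leaf; h; h = h->parent)
		if (!h->bi_capable)
			return -EINVAL;

	for (h = leaf; h; h = h->parent)
		h->bi_enabled = true;

	return 0;
}
```

The point of the two passes is that a failed capability check cannot leave the path half-enabled. The real code additionally has to handle commit failures during the second walk, which this model leaves out.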
diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index b50551601c2e..d0218e240f0e 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -1069,6 +1069,324 @@ int cxl_pci_get_bandwidth(struct pci_dev *pdev, struct access_coordinate *c)
return 0;
}
+static bool cxl_is_bi_capable(struct pci_dev *pdev, void __iomem *bi)
+{
+ if (!cxl_pci_flit_256(pdev))
+ return false;
+
+ if (pci_pcie_type(pdev) != PCI_EXP_TYPE_UPSTREAM && !bi) {
+ dev_dbg(&pdev->dev, "No BI Decoder registers.\n");
+ return false;
+ }
+
+ return true;
+}
+
+/* limit any insane timeouts from hw */
+#define CXL_BI_COMMIT_MAXTMO_US (5 * USEC_PER_SEC)
+
+static unsigned long __cxl_bi_get_timeout_us(int scale, int base)
+{
+ unsigned long tmo;
+
+ switch (scale) {
+ case 0: /* 1 us */
+ tmo = 1;
+ break;
+ case 1: /* 10 us */
+ tmo = 10UL;
+ break;
+ case 2: /* 100 us */
+ tmo = 100UL;
+ break;
+ case 3: /* 1 ms */
+ tmo = 1000UL;
+ break;
+ case 4: /* 10 ms */
+ tmo = 10000UL;
+ break;
+ case 5: /* 100 ms */
+ tmo = 100000UL;
+ break;
+ case 6: /* 1 s */
+ tmo = 1000000UL;
+ break;
+ case 7: /* 10 s */
+ tmo = 10000000UL;
+ break;
+ default:
+ tmo = 0;
+ break;
+ }
+
+ return tmo * base;
+}
+
+#define ___cxl_bi_commit(dev, bi, ctype) \
+do { \
+ u32 status, ctrl; \
+ int scale, base; \
+ ktime_t tmo, now, start; \
+ unsigned long poll_us, tmo_us; \
+ \
+ ctrl = readl(bi + CXL_BI_##ctype##_CTRL_OFFSET); \
+ writel(ctrl & ~CXL_BI_##ctype##_CTRL_BI_COMMIT, \
+ (bi) + CXL_BI_##ctype##_CTRL_OFFSET); \
+ writel(ctrl | CXL_BI_##ctype##_CTRL_BI_COMMIT, \
+ (bi) + CXL_BI_##ctype##_CTRL_OFFSET); \
+ \
+ status = readl((bi) + CXL_BI_##ctype##_STATUS_OFFSET); \
+ scale = FIELD_GET(CXL_BI_##ctype##_STATUS_BI_COMMIT_TM_SCALE, \
+ status); \
+ base = FIELD_GET(CXL_BI_##ctype##_STATUS_BI_COMMIT_TM_BASE, \
+ status); \
+ \
+ /* ... and poll */ \
+ tmo_us = min_t(unsigned long, CXL_BI_COMMIT_MAXTMO_US, \
+ __cxl_bi_get_timeout_us(scale, base)); \
+ poll_us = tmo_us / 10; /* arbitrary 10% of timeout */ \
+ start = now = ktime_get(); \
+ tmo = ktime_add_us(now, tmo_us); \
+ while (!FIELD_GET(CXL_BI_##ctype##_STATUS_BI_COMMITTED, status) && \
+ !FIELD_GET(CXL_BI_##ctype##_STATUS_BI_ERR_NOT_COMMITTED, \
+ status)) { \
+ if (ktime_after(now, tmo)) { \
+ dev_dbg((dev), "BI-ID commit timedout (%luus)", \
+ tmo_us); \
+ return -ETIMEDOUT; \
+ } \
+ \
+ fsleep(poll_us); \
+ now = ktime_get(); \
+ status = readl((bi) + CXL_BI_##ctype##_STATUS_OFFSET); \
+ } \
+ \
+ if (FIELD_GET(CXL_BI_##ctype##_STATUS_BI_ERR_NOT_COMMITTED, status)) \
+ return -EIO; \
+ \
+ dev_dbg((dev), "BI-ID commit wait took %lluus", \
+ ktime_to_us(ktime_sub(now, start))); \
+} while (0)
+
+static int __cxl_bi_commit_rt(struct device *dev, void __iomem *bi)
+{
+ if (bi) /* optional */
+ ___cxl_bi_commit(dev, bi, RT);
+
+ return 0;
+}
+
+static int __cxl_bi_commit(struct device *dev, void __iomem *bi)
+{
+ if (!bi)
+ return -EINVAL;
+
+ ___cxl_bi_commit(dev, bi, DECODER);
+ return 0;
+}
+
+/* enable or dealloc BI-ID changes in the given level of the topology */
+static int cxl_bi_toggle_dport(struct cxl_dport *dport, bool enable)
+{
+ u32 cap, ctrl, value;
+ void __iomem *bi = dport->regs.bi;
+ struct cxl_port *port = dport->port;
+ struct pci_dev *pdev = to_pci_dev(dport->dport_dev);
+
+ guard(device)(&port->dev);
+
+ if (!bi)
+ return -EINVAL;
+
+ ctrl = readl(bi + CXL_BI_DECODER_CTRL_OFFSET);
+
+ if (pci_pcie_type(pdev) == PCI_EXP_TYPE_ROOT_PORT) {
+ if (enable) {
+ /*
+ * There is no point of failure from here on,
+ * BI will be enabled on the endpoint device.
+ */
+ port->nr_bi++;
+
+ if (FIELD_GET(CXL_BI_DECODER_CTRL_BI_FW, ctrl))
+ return 0;
+
+ value = ctrl | CXL_BI_DECODER_CTRL_BI_FW;
+ value &= ~CXL_BI_DECODER_CTRL_BI_ENABLE;
+ } else {
+ if (--port->nr_bi > 0)
+ return 0;
+
+ value = ctrl & ~CXL_BI_DECODER_CTRL_BI_FW;
+ }
+
+ writel(value, bi + CXL_BI_DECODER_CTRL_OFFSET);
+ } else if (pci_pcie_type(pdev) == PCI_EXP_TYPE_DOWNSTREAM) {
+ int rc;
+
+ if (enable) {
+ value = ctrl & ~CXL_BI_DECODER_CTRL_BI_FW;
+ value |= CXL_BI_DECODER_CTRL_BI_ENABLE;
+ } else {
+ if (!FIELD_GET(CXL_BI_DECODER_CTRL_BI_ENABLE, ctrl))
+ return 0;
+
+ value = ctrl & ~CXL_BI_DECODER_CTRL_BI_ENABLE;
+ }
+
+ writel(value, bi + CXL_BI_DECODER_CTRL_OFFSET);
+
+ cap = readl(bi + CXL_BI_DECODER_CAPS_OFFSET);
+ if (FIELD_GET(CXL_BI_DECODER_CAPS_EXPLICIT_COMMIT_REQ, cap)) {
+ rc = __cxl_bi_commit(dport->dport_dev, dport->regs.bi);
+ if (rc)
+ return rc;
+ }
+
+ /* ... and the routing table, if any */
+ rc = __cxl_bi_commit_rt(&port->dev, port->uport_regs.bi);
+ if (rc)
+ return rc;
+ } else
+ return -EINVAL;
+
+ return 0;
+}
+
+static int cxl_bi_toggle_endpoint(struct cxl_dev_state *cxlds, bool enable)
+{
+ u32 ctrl, val;
+ void __iomem *bi = cxlds->regs.bi;
+
+ ctrl = readl(bi + CXL_BI_DECODER_CTRL_OFFSET);
+
+ if (enable) {
+ if (FIELD_GET(CXL_BI_DECODER_CTRL_BI_ENABLE, ctrl)) {
+ WARN_ON_ONCE(!cxlds->bi);
+ return 0;
+ }
+ val = ctrl | CXL_BI_DECODER_CTRL_BI_ENABLE;
+ } else {
+ if (!FIELD_GET(CXL_BI_DECODER_CTRL_BI_ENABLE, ctrl)) {
+ WARN_ON_ONCE(cxlds->bi);
+ return 0;
+ }
+ val = ctrl & ~CXL_BI_DECODER_CTRL_BI_ENABLE;
+ }
+
+ writel(val, bi + CXL_BI_DECODER_CTRL_OFFSET);
+ cxlds->bi = enable;
+
+ dev_dbg(cxlds->dev, "device %scapable of issuing BI requests\n",
+ enable ? "":"in");
+
+ return 0;
+}
+
+int cxl_bi_setup(struct cxl_dev_state *cxlds)
+{
+ struct pci_dev *pdev = to_pci_dev(cxlds->dev);
+ struct cxl_dport *dport, *_dport;
+ struct cxl_port *parent_port, *port, *_port __free(put_cxl_port) =
+ cxl_pci_find_port(to_pci_dev(cxlds->dev), &_dport);
+
+ if (!_port)
+ return -EINVAL;
+
+ if (!cxl_is_bi_capable(pdev, cxlds->regs.bi))
+ return 0;
+
+ /* walkup the topology twice, first to check, then to enable */
+ port = _port;
+ dport = _dport;
+ while (1) {
+ parent_port = to_cxl_port(port->dev.parent);
+ /* check rp, dsp */
+ if (!cxl_is_bi_capable(to_pci_dev(dport->dport_dev),
+ dport->regs.bi))
+ return -EINVAL;
+
+ /* check usp */
+ if (pci_pcie_type(to_pci_dev(dport->dport_dev)) ==
+ PCI_EXP_TYPE_DOWNSTREAM) {
+ if (!cxl_is_bi_capable(to_pci_dev(port->uport_dev),
+ port->uport_regs.bi))
+ return -EINVAL;
+ }
+
+ if (is_cxl_root(parent_port))
+ break;
+
+ dport = port->parent_dport;
+ port = parent_port;
+ }
+
+ port = _port;
+ dport = _dport;
+ while (1) {
+ int rc;
+
+ parent_port = to_cxl_port(port->dev.parent);
+
+ rc = cxl_bi_toggle_dport(dport, true);
+ if (rc)
+ return rc;
+
+ if (is_cxl_root(parent_port))
+ break;
+
+ dport = port->parent_dport;
+ port = parent_port;
+ }
+
+ /* finally, enable BI on the device */
+ cxl_bi_toggle_endpoint(cxlds, true);
+ return 0;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_bi_setup, "CXL");
+
+int cxl_bi_dealloc(struct cxl_dev_state *cxlds)
+{
+ struct cxl_dport *dport;
+ struct cxl_memdev *cxlmd = cxlds->cxlmd;
+ struct cxl_port *endpoint = cxlmd->endpoint;
+ struct cxl_port *parent_port, *port __free(put_cxl_port) =
+ cxl_pci_find_port(to_pci_dev(cxlds->dev), &dport);
+
+ if (!port)
+ return -EINVAL;
+
+ /* ensure the device is offline and unmapped */
+ if (!endpoint || cxl_num_decoders_committed(endpoint) > 0)
+ return -EBUSY;
+
+ if (!cxlds->bi)
+ return 0;
+
+ /* first, disable BI on the device */
+ cxl_bi_toggle_endpoint(cxlds, false);
+
+ while (1) {
+ int rc;
+
+ parent_port = to_cxl_port(port->dev.parent);
+
+ rc = cxl_bi_toggle_dport(dport, false);
+ if (rc)
+ return rc;
+
+ if (is_cxl_root(parent_port))
+ break;
+
+ dport = port->parent_dport;
+ port = parent_port;
+ }
+
+ cxlds->bi = false;
+ return 0;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_bi_dealloc, "CXL");
+
/*
* Set max timeout such that platforms will optimize GPF flow to avoid
* the implied worst-case scenario delays. On a sane platform, all
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 29197376b18e..38f74bcbfec2 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -1470,6 +1470,8 @@ static void cxl_detach_ep(void *data)
{
struct cxl_memdev *cxlmd = data;
+ cxl_bi_dealloc(cxlmd->cxlds);
+
for (int i = cxlmd->depth - 1; i >= 1; i--) {
struct cxl_port *port, *parent_port;
struct detach_ctx ctx = {
diff --git a/drivers/cxl/core/regs.c b/drivers/cxl/core/regs.c
index 5ca7b0eed568..19f4572981ab 100644
--- a/drivers/cxl/core/regs.c
+++ b/drivers/cxl/core/regs.c
@@ -92,6 +92,18 @@ void cxl_probe_component_regs(struct device *dev, void __iomem *base,
length = CXL_RAS_CAPABILITY_LENGTH;
rmap = &map->ras;
break;
+ case CXL_CM_CAP_CAP_ID_BI_RT:
+ dev_dbg(dev, "found BI RT capability (0x%x)\n",
+ offset);
+ length = CXL_BI_RT_CAPABILITY_LENGTH;
+ rmap = &map->bi;
+ break;
+ case CXL_CM_CAP_CAP_ID_BI_DECODER:
+ dev_dbg(dev, "found BI Decoder capability (0x%x)\n",
+ offset);
+ length = CXL_BI_DECODER_CAPABILITY_LENGTH;
+ rmap = &map->bi;
+ break;
default:
dev_dbg(dev, "Unknown CM cap ID: %d (0x%x)\n", cap_id,
offset);
@@ -211,6 +223,7 @@ int cxl_map_component_regs(const struct cxl_register_map *map,
} mapinfo[] = {
{ &map->component_map.hdm_decoder, &regs->hdm_decoder },
{ &map->component_map.ras, &regs->ras },
+ { &map->component_map.bi, &regs->bi },
};
int i;
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index b7111e3568d0..6361365c5ce9 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -39,6 +39,8 @@ extern const struct nvdimm_security_ops *cxl_security_ops;
#define CXL_CM_CAP_CAP_ID_RAS 0x2
#define CXL_CM_CAP_CAP_ID_HDM 0x5
+#define CXL_CM_CAP_CAP_ID_BI_RT 0xB
+#define CXL_CM_CAP_CAP_ID_BI_DECODER 0xC
#define CXL_CM_CAP_CAP_HDM_VERSION 1
/* HDM decoders CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure */
@@ -150,6 +152,33 @@ static inline int ways_to_eiw(unsigned int ways, u8 *eiw)
#define CXL_HEADERLOG_SIZE SZ_512
#define CXL_HEADERLOG_SIZE_U32 SZ_512 / sizeof(u32)
+/* CXL 3.2 8.2.4.26 CXL BI Routing Table Capability Structure */
+#define CXL_BI_RT_CAPABILITY_LENGTH 0xC
+#define CXL_BI_RT_CAPS_OFFSET 0x0
+#define CXL_BI_RT_CAPS_EXPLICIT_COMMIT_REQ BIT(0)
+#define CXL_BI_RT_CTRL_OFFSET 0x4
+#define CXL_BI_RT_CTRL_BI_COMMIT BIT(0)
+#define CXL_BI_RT_STATUS_OFFSET 0x8
+#define CXL_BI_RT_STATUS_BI_COMMITTED BIT(0)
+#define CXL_BI_RT_STATUS_BI_ERR_NOT_COMMITTED BIT(1)
+#define CXL_BI_RT_STATUS_BI_COMMIT_TM_SCALE GENMASK(11, 8)
+#define CXL_BI_RT_STATUS_BI_COMMIT_TM_BASE GENMASK(15, 12)
+
+/* CXL 3.2 8.2.4.27 CXL BI Decoder Capability Structure */
+#define CXL_BI_DECODER_CAPABILITY_LENGTH 0xC
+#define CXL_BI_DECODER_CAPS_OFFSET 0x0
+#define CXL_BI_DECODER_CAPS_HDMD_CAP BIT(0)
+#define CXL_BI_DECODER_CAPS_EXPLICIT_COMMIT_REQ BIT(1)
+#define CXL_BI_DECODER_CTRL_OFFSET 0x4
+#define CXL_BI_DECODER_CTRL_BI_FW BIT(0)
+#define CXL_BI_DECODER_CTRL_BI_ENABLE BIT(1)
+#define CXL_BI_DECODER_CTRL_BI_COMMIT BIT(2)
+#define CXL_BI_DECODER_STATUS_OFFSET 0x8
+#define CXL_BI_DECODER_STATUS_BI_COMMITTED BIT(0)
+#define CXL_BI_DECODER_STATUS_BI_ERR_NOT_COMMITTED BIT(1)
+#define CXL_BI_DECODER_STATUS_BI_COMMIT_TM_SCALE GENMASK(11, 8)
+#define CXL_BI_DECODER_STATUS_BI_COMMIT_TM_BASE GENMASK(15, 12)
+
/* CXL 2.0 8.2.8.1 Device Capabilities Array Register */
#define CXLDEV_CAP_ARRAY_OFFSET 0x0
#define CXLDEV_CAP_ARRAY_CAP_ID 0
@@ -209,10 +238,13 @@ struct cxl_regs {
* Common set of CXL Component register block base pointers
* @hdm_decoder: CXL 2.0 8.2.5.12 CXL HDM Decoder Capability Structure
* @ras: CXL 2.0 8.2.5.9 CXL RAS Capability Structure
+ * @bi: CXL 3.2 8.2.4.26 CXL BI Routing Table Capability Structure, or
+ * CXL 3.2 8.2.4.27 CXL BI Decoder Capability Structure
*/
struct_group_tagged(cxl_component_regs, component,
void __iomem *hdm_decoder;
void __iomem *ras;
+ void __iomem *bi;
);
/*
* Common set of CXL Device register block base pointers
@@ -255,6 +287,7 @@ struct cxl_reg_map {
struct cxl_component_reg_map {
struct cxl_reg_map hdm_decoder;
struct cxl_reg_map ras;
+ struct cxl_reg_map bi;
};
struct cxl_device_reg_map {
@@ -586,6 +619,7 @@ struct cxl_dax_region {
* @parent_dport: dport that points to this port in the parent
* @decoder_ida: allocator for decoder ids
* @reg_map: component and ras register mapping parameters
+ * @uport_regs: mapped component registers
* @nr_dports: number of entries in @dports
* @hdm_end: track last allocated HDM decoder instance for allocation ordering
* @commit_end: cursor to track highest committed decoder for commit ordering
@@ -594,6 +628,7 @@ struct cxl_dax_region {
* @cdat: Cached CDAT data
* @cdat_available: Should a CDAT attribute be available in sysfs
* @pci_latency: Upstream latency in picoseconds
+ * @nr_bi: number of devices that are BI enabled under this port
*/
struct cxl_port {
struct device dev;
@@ -606,6 +641,7 @@ struct cxl_port {
struct cxl_dport *parent_dport;
struct ida decoder_ida;
struct cxl_register_map reg_map;
+ struct cxl_component_regs uport_regs;
int nr_dports;
int hdm_end;
int commit_end;
@@ -617,6 +653,7 @@ struct cxl_port {
} cdat;
bool cdat_available;
long pci_latency;
+ int nr_bi;
};
/**
@@ -905,6 +942,9 @@ void cxl_coordinates_combine(struct access_coordinate *out,
bool cxl_endpoint_decoder_reset_detected(struct cxl_port *port);
+int cxl_bi_setup(struct cxl_dev_state *cxlds);
+int cxl_bi_dealloc(struct cxl_dev_state *cxlds);
+
/*
* Unit test builds overrides this to __weak, find the 'strong' version
* of these symbols in tools/testing/cxl/.
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 751478dfc410..cc0a48031fdd 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -418,6 +418,7 @@ struct cxl_dpa_partition {
* @regs: Parsed register blocks
* @cxl_dvsec: Offset to the PCIe device DVSEC
* @rcd: operating in RCD mode (CXL 3.0 9.11.8 CXL Devices Attached to an RCH)
+ * @bi: The device is capable of supporting back invalidate flows
* @media_ready: Indicate whether the device media is usable
* @dpa_res: Overall DPA resource tree for the device
* @part: DPA partition array
@@ -434,6 +435,7 @@ struct cxl_dev_state {
struct cxl_regs regs;
int cxl_dvsec;
bool rcd;
+ bool bi;
bool media_ready;
struct resource dpa_res;
struct cxl_dpa_partition part[CXL_NR_PARTITIONS_MAX];
diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
index 6e6777b7bafb..a5f3154dd5e3 100644
--- a/drivers/cxl/mem.c
+++ b/drivers/cxl/mem.c
@@ -184,6 +184,8 @@ static int cxl_mem_probe(struct device *dev)
if (rc)
dev_dbg(dev, "CXL memdev EDAC registration failed rc=%d\n", rc);
+ cxl_bi_setup(cxlds);
+
/*
* The kernel may be operating out of CXL memory on this device,
* there is no spec defined way to determine whether this device
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index bd100ac31672..870bd95752a3 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -962,6 +962,11 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
if (rc)
dev_dbg(&pdev->dev, "Failed to map RAS capability.\n");
+ rc = cxl_map_component_regs(&cxlds->reg_map, &cxlds->regs.component,
+ BIT(CXL_CM_CAP_CAP_ID_BI_DECODER));
+ if (rc)
+ dev_dbg(&pdev->dev, "Failed to map BI decoder capability.\n");
+
rc = cxl_pci_type3_init_mailbox(cxlds);
if (rc)
return rc;
diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
index fe4b593331da..1c86198c190b 100644
--- a/drivers/cxl/port.c
+++ b/drivers/cxl/port.c
@@ -57,6 +57,69 @@ static int discover_region(struct device *dev, void *unused)
return 0;
}
+static int cxl_dport_init_bi(struct cxl_dport *dport)
+{
+ struct cxl_register_map *map = &dport->reg_map;
+ struct device *dev = dport->dport_dev;
+
+ if (!cxl_pci_flit_256(to_pci_dev(dev)))
+ return 0;
+
+ if (!map->component_map.bi.valid) {
+ dev_dbg(dev, "BI decoder registers not found\n");
+ return 0;
+ }
+
+ if (cxl_map_component_regs(map, &dport->regs.component,
+ BIT(CXL_CM_CAP_CAP_ID_BI_DECODER))) {
+ dev_dbg(dev, "Failed to map BI decoder capability.\n");
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static void cxl_uport_init_bi(struct cxl_port *port, struct device *host)
+{
+ struct cxl_register_map *map = &port->reg_map;
+
+ if (!map->component_map.bi.valid) {
+ dev_dbg(host, "BI RT registers not found\n");
+ return;
+ }
+
+ map->host = host;
+ if (cxl_map_component_regs(map, &port->uport_regs,
+ BIT(CXL_CM_CAP_CAP_ID_BI_RT)))
+ dev_dbg(&port->dev, "Failed to map BI RT capability\n");
+}
+
+static void cxl_switch_port_init_bi(struct cxl_port *port)
+{
+ struct cxl_dport *parent_dport = port->parent_dport;
+
+ if (is_cxl_root(to_cxl_port(port->dev.parent)))
+ return;
+
+ if (dev_is_pci(&port->dev) && !cxl_pci_flit_256(to_pci_dev(&port->dev)))
+ return;
+
+ if (parent_dport && dev_is_pci(parent_dport->dport_dev)) {
+ struct pci_dev *pdev = to_pci_dev(parent_dport->dport_dev);
+
+ switch (pci_pcie_type(pdev)) {
+ case PCI_EXP_TYPE_ROOT_PORT:
+ case PCI_EXP_TYPE_DOWNSTREAM:
+ cxl_dport_init_bi(parent_dport);
+ break;
+ default:
+ break;
+ }
+ }
+
+ cxl_uport_init_bi(port, &port->dev);
+}
+
static int cxl_switch_port_probe(struct cxl_port *port)
{
struct cxl_hdm *cxlhdm;
@@ -71,6 +134,8 @@ static int cxl_switch_port_probe(struct cxl_port *port)
cxl_switch_parse_cdat(port);
+ cxl_switch_port_init_bi(port);
+
cxlhdm = devm_cxl_setup_hdm(port, NULL);
if (!IS_ERR(cxlhdm))
return devm_cxl_enumerate_decoders(cxlhdm, NULL);
@@ -112,6 +177,8 @@ static int cxl_endpoint_port_probe(struct cxl_port *port)
read_cdat_data(port);
cxl_endpoint_parse_cdat(port);
+ cxl_dport_init_bi(port->parent_dport);
+
get_device(&cxlmd->dev);
rc = devm_add_action_or_reset(&port->dev, schedule_detach, cxlmd);
if (rc)
--
2.39.5
* [PATCH 2/3] acpi, tables: Rename coherency CFMW restrictions
2025-08-12 1:02 [PATCH RFC 0/3] cxl: Initial support for Back-Invalidate Davidlohr Bueso
2025-08-12 1:02 ` [PATCH 1/3] cxl/pci: Back-Invalidate device discovery and setup Davidlohr Bueso
@ 2025-08-12 1:02 ` Davidlohr Bueso
2025-08-15 15:27 ` Jonathan Cameron
2025-08-12 1:02 ` [PATCH 3/3] cxl: Support creating HDM-DB regions Davidlohr Bueso
2025-08-12 14:53 ` [PATCH RFC 0/3] cxl: Initial support for Back-Invalidate Jonathan Cameron
3 siblings, 1 reply; 8+ messages in thread
From: Davidlohr Bueso @ 2025-08-12 1:02 UTC
To: dave.jiang, dan.j.williams
Cc: jonathan.cameron, ira.weiny, alison.schofield, alucerop,
a.manzanares, anisa.su, linux-cxl, dave
These CFMWS restriction bits have been renamed in more
recent CXL specs, as type3 devices (memory expanders) can
also use HDM-DB for device-coherent memory.
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
---
drivers/cxl/acpi.c | 4 ++--
include/acpi/actbl1.h | 4 ++--
tools/testing/cxl/test/cxl.c | 18 +++++++++---------
3 files changed, 13 insertions(+), 13 deletions(-)
diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index 712624cba2b6..9b32392c82fc 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -113,9 +113,9 @@ static unsigned long cfmws_to_decoder_flags(int restrictions)
{
unsigned long flags = CXL_DECODER_F_ENABLE;
- if (restrictions & ACPI_CEDT_CFMWS_RESTRICT_TYPE2)
+ if (restrictions & ACPI_CEDT_CFMWS_RESTRICT_DEVMEM)
flags |= CXL_DECODER_F_TYPE2;
- if (restrictions & ACPI_CEDT_CFMWS_RESTRICT_TYPE3)
+ if (restrictions & ACPI_CEDT_CFMWS_RESTRICT_HOSTONLYMEM)
flags |= CXL_DECODER_F_TYPE3;
if (restrictions & ACPI_CEDT_CFMWS_RESTRICT_VOLATILE)
flags |= CXL_DECODER_F_RAM;
diff --git a/include/acpi/actbl1.h b/include/acpi/actbl1.h
index 99fd1588ff38..eb787dfbd2fa 100644
--- a/include/acpi/actbl1.h
+++ b/include/acpi/actbl1.h
@@ -560,8 +560,8 @@ struct acpi_cedt_cfmws_target_element {
/* Values for Restrictions field above */
-#define ACPI_CEDT_CFMWS_RESTRICT_TYPE2 (1)
-#define ACPI_CEDT_CFMWS_RESTRICT_TYPE3 (1<<1)
+#define ACPI_CEDT_CFMWS_RESTRICT_DEVMEM (1)
+#define ACPI_CEDT_CFMWS_RESTRICT_HOSTONLYMEM (1<<1)
#define ACPI_CEDT_CFMWS_RESTRICT_VOLATILE (1<<2)
#define ACPI_CEDT_CFMWS_RESTRICT_PMEM (1<<3)
#define ACPI_CEDT_CFMWS_RESTRICT_FIXED (1<<4)
diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
index 6a25cca5636f..ba50338f8ada 100644
--- a/tools/testing/cxl/test/cxl.c
+++ b/tools/testing/cxl/test/cxl.c
@@ -210,7 +210,7 @@ static struct {
},
.interleave_ways = 0,
.granularity = 4,
- .restrictions = ACPI_CEDT_CFMWS_RESTRICT_TYPE3 |
+ .restrictions = ACPI_CEDT_CFMWS_RESTRICT_HOSTONLYMEM |
ACPI_CEDT_CFMWS_RESTRICT_VOLATILE,
.qtg_id = FAKE_QTG_ID,
.window_size = SZ_256M * 4UL,
@@ -225,7 +225,7 @@ static struct {
},
.interleave_ways = 1,
.granularity = 4,
- .restrictions = ACPI_CEDT_CFMWS_RESTRICT_TYPE3 |
+ .restrictions = ACPI_CEDT_CFMWS_RESTRICT_HOSTONLYMEM |
ACPI_CEDT_CFMWS_RESTRICT_VOLATILE,
.qtg_id = FAKE_QTG_ID,
.window_size = SZ_256M * 8UL,
@@ -240,7 +240,7 @@ static struct {
},
.interleave_ways = 0,
.granularity = 4,
- .restrictions = ACPI_CEDT_CFMWS_RESTRICT_TYPE3 |
+ .restrictions = ACPI_CEDT_CFMWS_RESTRICT_HOSTONLYMEM |
ACPI_CEDT_CFMWS_RESTRICT_PMEM,
.qtg_id = FAKE_QTG_ID,
.window_size = SZ_256M * 4UL,
@@ -255,7 +255,7 @@ static struct {
},
.interleave_ways = 1,
.granularity = 4,
- .restrictions = ACPI_CEDT_CFMWS_RESTRICT_TYPE3 |
+ .restrictions = ACPI_CEDT_CFMWS_RESTRICT_HOSTONLYMEM |
ACPI_CEDT_CFMWS_RESTRICT_PMEM,
.qtg_id = FAKE_QTG_ID,
.window_size = SZ_256M * 8UL,
@@ -270,7 +270,7 @@ static struct {
},
.interleave_ways = 0,
.granularity = 4,
- .restrictions = ACPI_CEDT_CFMWS_RESTRICT_TYPE3 |
+ .restrictions = ACPI_CEDT_CFMWS_RESTRICT_HOSTONLYMEM |
ACPI_CEDT_CFMWS_RESTRICT_PMEM,
.qtg_id = FAKE_QTG_ID,
.window_size = SZ_256M * 4UL,
@@ -285,7 +285,7 @@ static struct {
},
.interleave_ways = 0,
.granularity = 4,
- .restrictions = ACPI_CEDT_CFMWS_RESTRICT_TYPE3 |
+ .restrictions = ACPI_CEDT_CFMWS_RESTRICT_HOSTONLYMEM |
ACPI_CEDT_CFMWS_RESTRICT_VOLATILE,
.qtg_id = FAKE_QTG_ID,
.window_size = SZ_256M,
@@ -302,7 +302,7 @@ static struct {
.interleave_arithmetic = ACPI_CEDT_CFMWS_ARITHMETIC_XOR,
.interleave_ways = 0,
.granularity = 4,
- .restrictions = ACPI_CEDT_CFMWS_RESTRICT_TYPE3 |
+ .restrictions = ACPI_CEDT_CFMWS_RESTRICT_HOSTONLYMEM |
ACPI_CEDT_CFMWS_RESTRICT_PMEM,
.qtg_id = FAKE_QTG_ID,
.window_size = SZ_256M * 8UL,
@@ -318,7 +318,7 @@ static struct {
.interleave_arithmetic = ACPI_CEDT_CFMWS_ARITHMETIC_XOR,
.interleave_ways = 1,
.granularity = 0,
- .restrictions = ACPI_CEDT_CFMWS_RESTRICT_TYPE3 |
+ .restrictions = ACPI_CEDT_CFMWS_RESTRICT_HOSTONLYMEM |
ACPI_CEDT_CFMWS_RESTRICT_PMEM,
.qtg_id = FAKE_QTG_ID,
.window_size = SZ_256M * 8UL,
@@ -334,7 +334,7 @@ static struct {
.interleave_arithmetic = ACPI_CEDT_CFMWS_ARITHMETIC_XOR,
.interleave_ways = 8,
.granularity = 1,
- .restrictions = ACPI_CEDT_CFMWS_RESTRICT_TYPE3 |
+ .restrictions = ACPI_CEDT_CFMWS_RESTRICT_HOSTONLYMEM |
ACPI_CEDT_CFMWS_RESTRICT_PMEM,
.qtg_id = FAKE_QTG_ID,
.window_size = SZ_512M * 6UL,
--
2.39.5
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH 3/3] cxl: Support creating HDM-DB regions
2025-08-12 1:02 [PATCH RFC 0/3] cxl: Initial support for Back-Invalidate Davidlohr Bueso
2025-08-12 1:02 ` [PATCH 1/3] cxl/pci: Back-Invalidate device discovery and setup Davidlohr Bueso
2025-08-12 1:02 ` [PATCH 2/3] acpi, tables: Rename coherency CFMW restrictions Davidlohr Bueso
@ 2025-08-12 1:02 ` Davidlohr Bueso
2025-08-15 15:41 ` Jonathan Cameron
2025-08-12 14:53 ` [PATCH RFC 0/3] cxl: Initial support for Back-Invalidate Jonathan Cameron
3 siblings, 1 reply; 8+ messages in thread
From: Davidlohr Bueso @ 2025-08-12 1:02 UTC (permalink / raw)
To: dave.jiang, dan.j.williams
Cc: jonathan.cameron, ira.weiny, alison.schofield, alucerop,
a.manzanares, anisa.su, linux-cxl, dave
A single Type 3 device can expose different parts of its memory with different
coherency semantics. For example, some memory ranges within a Type 3 device
could be configured as HDM-H, while other memory ranges on the same device
could be configured as HDM-DB. This allows for flexible memory configuration
within a single device. As such, coherency models are defined per memory region.
For accelerators (type2), accelerator drivers are expected to manage the
HDM-D[B] region creation. For type3, relevant sysfs tunables are provided to
the user.
Other than the coherency models supported by the HDM decoder, the main
dependency for creating these regions is that the device state is BI-ready
(cxlds->bi). Already committed HDM decoders that are detected during
enumeration with the BI bit set are currently not supported, because endpoint
and port enumerations are independent.
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
---
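For reviewers, the checks that cxl_dpa_set_coherence() applies before changing
an endpoint decoder's target type can be sketched in userspace roughly as
below. This is only an illustration under stated assumptions, not the kernel
code: check_coherence(), the DECODER_* values and the COHERENCY_* names are
made up for this note, the 0x1/0x2 encodings are read off the comparisons in
this patch, and the -EBUSY check for already enabled decoders is left out.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for enum cxl_decoder_type. */
enum decoder_type {
	DECODER_DEVMEM = 2,	/* HDM-D[B], device coherent */
	DECODER_HOSTONLYMEM = 3,	/* HDM-H, host-only coherent */
};

/* Assumed meaning of the supported coherency field, per this patch. */
#define COHERENCY_DEV_ONLY	0x1	/* decoder only does device coherent */
#define COHERENCY_HOST_ONLY	0x2	/* decoder only does host-only coherent */

/*
 * Return 0 if @type may be selected, -EINVAL otherwise.
 * @bi_ready mirrors cxlds->bi, @supported mirrors cxlhdm->supported_coherency.
 */
static int check_coherence(bool bi_ready, unsigned int supported,
			   enum decoder_type type)
{
	/* Device coherence needs a BI-ready device. */
	if (!bi_ready && type == DECODER_DEVMEM)
		return -EINVAL;

	/* The decoder must support the requested model. */
	if (type == DECODER_HOSTONLYMEM && supported == COHERENCY_DEV_ONLY)
		return -EINVAL;
	if (type == DECODER_DEVMEM && supported == COHERENCY_HOST_ONLY)
		return -EINVAL;

	return 0;
}
```

Any other value of the field (0x0 or 0x3) passes both decoder checks, which
matches what the patch does.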
Documentation/ABI/testing/sysfs-bus-cxl | 41 ++++++++++++++----
drivers/cxl/acpi.c | 2 +
drivers/cxl/core/core.h | 4 ++
drivers/cxl/core/hdm.c | 56 +++++++++++++++++++++---
drivers/cxl/core/port.c | 55 +++++++++++++++++++++++-
drivers/cxl/core/region.c | 57 +++++++++++++++++++++----
drivers/cxl/cxl.h | 7 ++-
drivers/cxl/cxlmem.h | 1 +
include/acpi/actbl1.h | 1 +
9 files changed, 199 insertions(+), 25 deletions(-)
diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
index 6b4e8c7a963d..7d9e3db736c3 100644
--- a/Documentation/ABI/testing/sysfs-bus-cxl
+++ b/Documentation/ABI/testing/sysfs-bus-cxl
@@ -297,17 +297,18 @@ Description:
Each entry in the list is a dport id.
-What: /sys/bus/cxl/devices/decoderX.Y/cap_{pmem,ram,type2,type3}
-Date: June, 2021
-KernelVersion: v5.14
+What: /sys/bus/cxl/devices/decoderX.Y/cap_{pmem,ram,type2,type3,bi}
+Date: June, 2021, August, 2025
+KernelVersion: v5.14, v6.18 (bi)
Contact: linux-cxl@vger.kernel.org
Description:
(RO) When a CXL decoder is of devtype "cxl_decoder_root", it
represents a fixed memory window identified by platform
firmware. A fixed window may only support a subset of memory
types. The 'cap_*' attributes indicate whether persistent
- memory, volatile memory, accelerator memory, and / or expander
- memory may be mapped behind this decoder's memory window.
+ memory, volatile memory, accelerator memory, back invalidation,
+ and / or expander memory may be mapped behind this decoder's
+ memory window.
What: /sys/bus/cxl/devices/decoderX.Y/target_type
@@ -352,6 +353,17 @@ Description:
next allocation.
+What: /sys/bus/cxl/devices/decoderX.Y/bi
+Date: August, 2025
+KernelVersion: v6.18
+Contact: linux-cxl@vger.kernel.org
+Description:
+ (RW) Specify whether or not the memory range for this endpoint
+ decoder will use Back-Invalidation (HDM-DB) for device managed
+ coherence. It may only be written to when the decoder is in the
+ 'disabled' state.
+
+
What: /sys/bus/cxl/devices/decoderX.Y/dpa_resource
Date: May, 2022
KernelVersion: v6.0
@@ -410,15 +422,17 @@ Description:
interleave_granularity).
-What: /sys/bus/cxl/devices/decoderX.Y/create_{pmem,ram}_region
-Date: May, 2022, January, 2023
-KernelVersion: v6.0 (pmem), v6.3 (ram)
+What: /sys/bus/cxl/devices/decoderX.Y/create_{pmem,ram}_{bi}_region
+Date: May, 2022, January, 2023, August, 2025
+KernelVersion: v6.0 (pmem), v6.3 (ram), v6.18 (bi)
Contact: linux-cxl@vger.kernel.org
Description:
(RW) Write a string in the form 'regionZ' to start the process
of defining a new persistent, or volatile memory region
(interleave-set) within the decode range bounded by root decoder
- 'decoderX.Y'. The value written must match the current value
+ 'decoderX.Y'. Further, it is specified whether or not the region
+ also supports Back-Invalidate flows for device memory coherence
+ management. The value written must match the current value
returned from reading this attribute. An atomic compare exchange
operation is done on write to assign the requested id to a
region and allocate the region-id for the next creation attempt.
@@ -509,6 +523,15 @@ Description:
region. For more details on the possible modes see
/sys/bus/cxl/devices/decoderX.Y/mode
+What: /sys/bus/cxl/devices/regionZ/bi
+Date: August, 2025
+KernelVersion: v6.18
+Contact: linux-cxl@vger.kernel.org
+Description:
+ (RO) The coherence model of a region is established at region
+ creation time and dictates the type of the endpoint decoders that
+ comprise the region. See /sys/bus/cxl/devices/decoderX.Y/bi for
+ more details.
What: /sys/bus/cxl/devices/regionZ/resource
Date: May, 2022
diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index 9b32392c82fc..364f003d1961 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -123,6 +123,8 @@ static unsigned long cfmws_to_decoder_flags(int restrictions)
flags |= CXL_DECODER_F_PMEM;
if (restrictions & ACPI_CEDT_CFMWS_RESTRICT_FIXED)
flags |= CXL_DECODER_F_LOCK;
+ if (restrictions & ACPI_CEDT_CFMWS_RESTRICT_BI)
+ flags |= CXL_DECODER_F_BI;
return flags;
}
diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index 2669f251d677..79fac8bc74f7 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -21,6 +21,8 @@ enum cxl_detach_mode {
#ifdef CONFIG_CXL_REGION
extern struct device_attribute dev_attr_create_pmem_region;
extern struct device_attribute dev_attr_create_ram_region;
+extern struct device_attribute dev_attr_create_pmem_bi_region;
+extern struct device_attribute dev_attr_create_ram_bi_region;
extern struct device_attribute dev_attr_delete_region;
extern struct device_attribute dev_attr_region;
extern const struct device_type cxl_pmem_region_type;
@@ -89,6 +91,8 @@ void __iomem *devm_cxl_iomap_block(struct device *dev, resource_size_t addr,
struct dentry *cxl_debugfs_create_dir(const char *dir);
int cxl_dpa_set_part(struct cxl_endpoint_decoder *cxled,
enum cxl_partition_mode mode);
+int cxl_dpa_set_coherence(struct cxl_endpoint_decoder *cxled,
+ enum cxl_decoder_type type);
int cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, u64 size);
int cxl_dpa_free(struct cxl_endpoint_decoder *cxled);
resource_size_t cxl_dpa_size(struct cxl_endpoint_decoder *cxled);
diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
index e9e1d555cec6..9e213785c335 100644
--- a/drivers/cxl/core/hdm.c
+++ b/drivers/cxl/core/hdm.c
@@ -96,6 +96,10 @@ static void parse_hdm_decoder_caps(struct cxl_hdm *cxlhdm)
cxlhdm->iw_cap_mask |= BIT(3) | BIT(6) | BIT(12);
if (FIELD_GET(CXL_HDM_DECODER_INTERLEAVE_16_WAY, hdm_cap))
cxlhdm->iw_cap_mask |= BIT(16);
+ if (FIELD_GET(CXL_HDM_DECODER_SUPPORTED_COHERENCY_MASK, hdm_cap))
+ cxlhdm->supported_coherency =
+ FIELD_GET(CXL_HDM_DECODER_SUPPORTED_COHERENCY_MASK,
+ hdm_cap);
}
static bool should_emulate_decoders(struct cxl_endpoint_dvsec_info *info)
@@ -613,6 +617,31 @@ int cxl_dpa_set_part(struct cxl_endpoint_decoder *cxled,
return 0;
}
+int cxl_dpa_set_coherence(struct cxl_endpoint_decoder *cxled,
+ enum cxl_decoder_type type)
+{
+ struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
+ struct cxl_dev_state *cxlds = cxlmd->cxlds;
+ struct cxl_port *port = to_cxl_port(cxled->cxld.dev.parent);
+ struct cxl_hdm *cxlhdm = dev_get_drvdata(&port->dev);
+
+ guard(rwsem_write)(&cxl_rwsem.dpa);
+ if (cxled->cxld.flags & CXL_DECODER_F_ENABLE)
+ return -EBUSY;
+
+ if (!cxlds->bi && type == CXL_DECODER_DEVMEM)
+ return -EINVAL;
+
+ if (type == CXL_DECODER_HOSTONLYMEM &&
+ cxlhdm->supported_coherency == 0x1)
+ return -EINVAL;
+ if (type == CXL_DECODER_DEVMEM && cxlhdm->supported_coherency == 0x2)
+ return -EINVAL;
+
+ cxled->cxld.target_type = type;
+ return 0;
+}
+
static int __cxl_dpa_alloc(struct cxl_endpoint_decoder *cxled, u64 size)
{
struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
@@ -723,6 +752,9 @@ static void cxld_set_type(struct cxl_decoder *cxld, u32 *ctrl)
u32p_replace_bits(ctrl,
!!(cxld->target_type == CXL_DECODER_HOSTONLYMEM),
CXL_HDM_DECODER0_CTRL_HOSTONLY);
+ u32p_replace_bits(ctrl,
+ !!(cxld->target_type == CXL_DECODER_DEVMEM),
+ CXL_HDM_DECODER0_CTRL_BI);
}
static void cxlsd_set_targets(struct cxl_switch_decoder *cxlsd, u64 *tgt)
@@ -1033,6 +1065,15 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
cxld->flags |= CXL_DECODER_F_ENABLE;
if (ctrl & CXL_HDM_DECODER0_CTRL_LOCK)
cxld->flags |= CXL_DECODER_F_LOCK;
+
+ /*
+ * Auto-committed BI-enabled decoders are not supported.
+ * At this point ->bi is not yet set up, so there
+ * are no guarantees that the platform supports BI.
+ */
+ if (FIELD_GET(CXL_HDM_DECODER0_CTRL_BI, ctrl))
+ return -ENXIO;
+
if (FIELD_GET(CXL_HDM_DECODER0_CTRL_HOSTONLY, ctrl))
cxld->target_type = CXL_DECODER_HOSTONLYMEM;
else
@@ -1057,14 +1098,19 @@ static int init_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld,
if (cxled) {
struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
struct cxl_dev_state *cxlds = cxlmd->cxlds;
+ struct cxl_port *port = to_cxl_port(cxld->dev.parent);
+ struct cxl_hdm *cxlhdm = dev_get_drvdata(&port->dev);
/*
- * Default by devtype until a device arrives that needs
- * more precision.
+ * For type3 HDM-DB devices, users can later change the
+ * target_type, if supported by the HDM decoder.
*/
- if (cxlds->type == CXL_DEVTYPE_CLASSMEM)
- cxld->target_type = CXL_DECODER_HOSTONLYMEM;
- else
+ if (cxlds->type == CXL_DEVTYPE_CLASSMEM) {
+ if (cxlhdm->supported_coherency == 0x1)
+ cxld->target_type = CXL_DECODER_DEVMEM;
+ else
+ cxld->target_type = CXL_DECODER_HOSTONLYMEM;
+ } else
cxld->target_type = CXL_DECODER_DEVMEM;
} else {
/* To be overridden by region type at commit time */
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 38f74bcbfec2..6f95ac6ac01e 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -122,6 +122,7 @@ CXL_DECODER_FLAG_ATTR(cap_ram, CXL_DECODER_F_RAM);
CXL_DECODER_FLAG_ATTR(cap_type2, CXL_DECODER_F_TYPE2);
CXL_DECODER_FLAG_ATTR(cap_type3, CXL_DECODER_F_TYPE3);
CXL_DECODER_FLAG_ATTR(locked, CXL_DECODER_F_LOCK);
+CXL_DECODER_FLAG_ATTR(cap_bi, CXL_DECODER_F_BI);
static ssize_t target_type_show(struct device *dev,
struct device_attribute *attr, char *buf)
@@ -224,6 +225,38 @@ static ssize_t mode_store(struct device *dev, struct device_attribute *attr,
}
static DEVICE_ATTR_RW(mode);
+static ssize_t bi_show(struct device *dev, struct device_attribute *attr,
+ char *buf)
+{
+ struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev);
+ struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
+ struct cxl_dev_state *cxlds = cxlmd->cxlds;
+
+ return sysfs_emit(buf, "%d\n", !!cxlds->bi);
+}
+
+static ssize_t bi_store(struct device *dev, struct device_attribute *attr,
+ const char *buf, size_t len)
+{
+ struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev);
+ enum cxl_decoder_type type;
+ ssize_t rc;
+
+ if (sysfs_streq(buf, "1"))
+ type = CXL_DECODER_DEVMEM;
+ else if (sysfs_streq(buf, "0"))
+ type = CXL_DECODER_HOSTONLYMEM;
+ else
+ return -EINVAL;
+
+ rc = cxl_dpa_set_coherence(cxled, type);
+ if (rc)
+ return rc;
+
+ return len;
+}
+static DEVICE_ATTR_RW(bi);
+
static ssize_t dpa_resource_show(struct device *dev, struct device_attribute *attr,
char *buf)
{
@@ -320,10 +353,13 @@ static struct attribute *cxl_decoder_root_attrs[] = {
&dev_attr_cap_ram.attr,
&dev_attr_cap_type2.attr,
&dev_attr_cap_type3.attr,
+ &dev_attr_cap_bi.attr,
&dev_attr_target_list.attr,
&dev_attr_qos_class.attr,
SET_CXL_REGION_ATTR(create_pmem_region)
SET_CXL_REGION_ATTR(create_ram_region)
+ SET_CXL_REGION_ATTR(create_pmem_bi_region)
+ SET_CXL_REGION_ATTR(create_ram_bi_region)
SET_CXL_REGION_ATTR(delete_region)
NULL,
};
@@ -342,7 +378,15 @@ static bool can_create_ram(struct cxl_root_decoder *cxlrd)
return (cxlrd->cxlsd.cxld.flags & flags) == flags;
}
-static umode_t cxl_root_decoder_visible(struct kobject *kobj, struct attribute *a, int n)
+static bool can_create_bi(struct cxl_root_decoder *cxlrd)
+{
+ unsigned long flags = CXL_DECODER_F_BI;
+
+ return (cxlrd->cxlsd.cxld.flags & flags) == flags;
+}
+
+static umode_t cxl_root_decoder_visible(struct kobject *kobj,
+ struct attribute *a, int n)
{
struct device *dev = kobj_to_dev(kobj);
struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
@@ -353,6 +397,14 @@ static umode_t cxl_root_decoder_visible(struct kobject *kobj, struct attribute *
if (a == CXL_REGION_ATTR(create_ram_region) && !can_create_ram(cxlrd))
return 0;
+ if (a == CXL_REGION_ATTR(create_pmem_bi_region) &&
+ (!can_create_pmem(cxlrd) || !can_create_bi(cxlrd)))
+ return 0;
+
+ if (a == CXL_REGION_ATTR(create_ram_bi_region) &&
+ (!can_create_ram(cxlrd) || !can_create_bi(cxlrd)))
+ return 0;
+
if (a == CXL_REGION_ATTR(delete_region) &&
!(can_create_pmem(cxlrd) || can_create_ram(cxlrd)))
return 0;
@@ -393,6 +445,7 @@ static const struct attribute_group *cxl_decoder_switch_attribute_groups[] = {
static struct attribute *cxl_decoder_endpoint_attrs[] = {
&dev_attr_target_type.attr,
&dev_attr_mode.attr,
+ &dev_attr_bi.attr,
&dev_attr_dpa_size.attr,
&dev_attr_dpa_resource.attr,
SET_CXL_REGION_ATTR(region)
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index e9bf42d91689..0c8a6e539599 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -624,6 +624,15 @@ static ssize_t mode_show(struct device *dev, struct device_attribute *attr,
}
static DEVICE_ATTR_RO(mode);
+static ssize_t bi_show(struct device *dev, struct device_attribute *attr,
+ char *buf)
+{
+ struct cxl_region *cxlr = to_cxl_region(dev);
+
+ return sysfs_emit(buf, "%d\n", cxlr->type == CXL_DECODER_DEVMEM);
+}
+static DEVICE_ATTR_RO(bi);
+
static int alloc_hpa(struct cxl_region *cxlr, resource_size_t size)
{
struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
@@ -754,6 +763,7 @@ static struct attribute *cxl_region_attrs[] = {
&dev_attr_resource.attr,
&dev_attr_size.attr,
&dev_attr_mode.attr,
+ &dev_attr_bi.attr,
NULL,
};
@@ -2523,7 +2533,7 @@ static int cxl_region_calculate_adistance(struct notifier_block *nb,
* @cxlrd: root decoder
* @id: memregion id to create, or memregion_free() on failure
* @mode: mode for the endpoint decoders of this region
- * @type: select whether this is an expander or accelerator (type-2 or type-3)
+ * @type: select whether this is HDM-H or HDM-D[B]
*
* This is the second step of region initialization. Regions exist within an
* address space which is mapped by a @cxlrd.
@@ -2586,8 +2596,21 @@ static ssize_t create_ram_region_show(struct device *dev,
return __create_region_show(to_cxl_root_decoder(dev), buf);
}
+static ssize_t create_pmem_bi_region_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ return __create_region_show(to_cxl_root_decoder(dev), buf);
+}
+
+static ssize_t create_ram_bi_region_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ return __create_region_show(to_cxl_root_decoder(dev), buf);
+}
+
static struct cxl_region *__create_region(struct cxl_root_decoder *cxlrd,
- enum cxl_partition_mode mode, int id)
+ enum cxl_partition_mode mode,
+ bool bi, int id)
{
int rc;
@@ -2609,11 +2632,13 @@ static struct cxl_region *__create_region(struct cxl_root_decoder *cxlrd,
return ERR_PTR(-EBUSY);
}
- return devm_cxl_add_region(cxlrd, id, mode, CXL_DECODER_HOSTONLYMEM);
+ return devm_cxl_add_region(cxlrd, id, mode, bi ?
+ CXL_DECODER_DEVMEM : CXL_DECODER_HOSTONLYMEM);
}
static ssize_t create_region_store(struct device *dev, const char *buf,
- size_t len, enum cxl_partition_mode mode)
+ size_t len, enum cxl_partition_mode mode,
+ bool bi)
{
struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(dev);
struct cxl_region *cxlr;
@@ -2623,7 +2648,7 @@ static ssize_t create_region_store(struct device *dev, const char *buf,
if (rc != 1)
return -EINVAL;
- cxlr = __create_region(cxlrd, mode, id);
+ cxlr = __create_region(cxlrd, mode, bi, id);
if (IS_ERR(cxlr))
return PTR_ERR(cxlr);
@@ -2634,7 +2659,7 @@ static ssize_t create_pmem_region_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t len)
{
- return create_region_store(dev, buf, len, CXL_PARTMODE_PMEM);
+ return create_region_store(dev, buf, len, CXL_PARTMODE_PMEM, false);
}
DEVICE_ATTR_RW(create_pmem_region);
@@ -2642,10 +2667,26 @@ static ssize_t create_ram_region_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t len)
{
- return create_region_store(dev, buf, len, CXL_PARTMODE_RAM);
+ return create_region_store(dev, buf, len, CXL_PARTMODE_RAM, false);
}
DEVICE_ATTR_RW(create_ram_region);
+static ssize_t create_pmem_bi_region_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t len)
+{
+ return create_region_store(dev, buf, len, CXL_PARTMODE_PMEM, true);
+}
+DEVICE_ATTR_RW(create_pmem_bi_region);
+
+static ssize_t create_ram_bi_region_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t len)
+{
+ return create_region_store(dev, buf, len, CXL_PARTMODE_RAM, true);
+}
+DEVICE_ATTR_RW(create_ram_bi_region);
+
static ssize_t region_show(struct device *dev, struct device_attribute *attr,
char *buf)
{
@@ -3414,7 +3455,7 @@ static struct cxl_region *construct_region(struct cxl_root_decoder *cxlrd,
struct cxl_region *cxlr;
do {
- cxlr = __create_region(cxlrd, cxlds->part[part].mode,
+ cxlr = __create_region(cxlrd, cxlds->part[part].mode, false,
atomic_read(&cxlrd->region_id));
} while (IS_ERR(cxlr) && PTR_ERR(cxlr) == -EBUSY);
diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
index 6361365c5ce9..0a9532fa4ec1 100644
--- a/drivers/cxl/cxl.h
+++ b/drivers/cxl/cxl.h
@@ -51,6 +51,7 @@ extern const struct nvdimm_security_ops *cxl_security_ops;
#define CXL_HDM_DECODER_INTERLEAVE_14_12 BIT(9)
#define CXL_HDM_DECODER_INTERLEAVE_3_6_12_WAY BIT(11)
#define CXL_HDM_DECODER_INTERLEAVE_16_WAY BIT(12)
+#define CXL_HDM_DECODER_SUPPORTED_COHERENCY_MASK GENMASK(22, 21)
#define CXL_HDM_DECODER_CTRL_OFFSET 0x4
#define CXL_HDM_DECODER_ENABLE BIT(1)
#define CXL_HDM_DECODER0_BASE_LOW_OFFSET(i) (0x20 * (i) + 0x10)
@@ -65,6 +66,7 @@ extern const struct nvdimm_security_ops *cxl_security_ops;
#define CXL_HDM_DECODER0_CTRL_COMMITTED BIT(10)
#define CXL_HDM_DECODER0_CTRL_COMMIT_ERROR BIT(11)
#define CXL_HDM_DECODER0_CTRL_HOSTONLY BIT(12)
+#define CXL_HDM_DECODER0_CTRL_BI BIT(13)
#define CXL_HDM_DECODER0_TL_LOW(i) (0x20 * (i) + 0x24)
#define CXL_HDM_DECODER0_TL_HIGH(i) (0x20 * (i) + 0x28)
#define CXL_HDM_DECODER0_SKIP_LOW(i) CXL_HDM_DECODER0_TL_LOW(i)
@@ -363,8 +365,9 @@ int cxl_dport_map_rcd_linkcap(struct pci_dev *pdev, struct cxl_dport *dport);
#define CXL_DECODER_F_TYPE2 BIT(2)
#define CXL_DECODER_F_TYPE3 BIT(3)
#define CXL_DECODER_F_LOCK BIT(4)
-#define CXL_DECODER_F_ENABLE BIT(5)
-#define CXL_DECODER_F_MASK GENMASK(5, 0)
+#define CXL_DECODER_F_BI BIT(5)
+#define CXL_DECODER_F_ENABLE BIT(6)
+#define CXL_DECODER_F_MASK GENMASK(6, 0)
enum cxl_decoder_type {
CXL_DECODER_DEVMEM = 2,
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index cc0a48031fdd..d05ceea29261 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -922,6 +922,7 @@ struct cxl_hdm {
unsigned int target_count;
unsigned int interleave_mask;
unsigned long iw_cap_mask;
+ unsigned int supported_coherency;
struct cxl_port *port;
};
diff --git a/include/acpi/actbl1.h b/include/acpi/actbl1.h
index eb787dfbd2fa..7f35eb0e8458 100644
--- a/include/acpi/actbl1.h
+++ b/include/acpi/actbl1.h
@@ -565,6 +565,7 @@ struct acpi_cedt_cfmws_target_element {
#define ACPI_CEDT_CFMWS_RESTRICT_VOLATILE (1<<2)
#define ACPI_CEDT_CFMWS_RESTRICT_PMEM (1<<3)
#define ACPI_CEDT_CFMWS_RESTRICT_FIXED (1<<4)
+#define ACPI_CEDT_CFMWS_RESTRICT_BI (1<<5)
/* 2: CXL XOR Interleave Math Structure */
--
2.39.5
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH RFC 0/3] cxl: Initial support for Back-Invalidate
2025-08-12 1:02 [PATCH RFC 0/3] cxl: Initial support for Back-Invalidate Davidlohr Bueso
` (2 preceding siblings ...)
2025-08-12 1:02 ` [PATCH 3/3] cxl: Support creating HDM-DB regions Davidlohr Bueso
@ 2025-08-12 14:53 ` Jonathan Cameron
3 siblings, 0 replies; 8+ messages in thread
From: Jonathan Cameron @ 2025-08-12 14:53 UTC (permalink / raw)
To: Davidlohr Bueso
Cc: dave.jiang, dan.j.williams, ira.weiny, alison.schofield, alucerop,
a.manzanares, anisa.su, linux-cxl
On Mon, 11 Aug 2025 18:02:25 -0700
Davidlohr Bueso <dave@stgolabs.net> wrote:
> Hello,
>
> The following is some initial plumbing to enabling HDM-DB in Linux. This model allows devices,
> specifically Type 2 and Type 3 devices, to expose their local memory to the host CPU in a
> coherent manner, In alignment with what was discussed at last year's LPC type2 support session,
> this series takes the type3 memory expander approach, which is more direct.
>
> While this is an early RFC and I'm sure many thoughts, the next phase of this would be to
> integrate Bi with Alejandro's Type2 work as well as with Jonathan's Cache Coherency subsystem
> series (aka memregion inv)... this might be a good topic for the upcoming LPC's devmem session:
> https://lore.kernel.org/linux-cxl/20250624141355.269056-1-alejandro.lucero-palau@amd.com
> https://lore.kernel.org/linux-cxl/20250624154805.66985-1-Jonathan.Cameron@huawei.com
Hi Davidlohr,
For type 3 devices, when are HDM-DB / BI flows useful?
I think it's worth calling those cases out.
IIRC
- Shared coherent memory
- Media operations (8.2.10.9.5.3 in 3.2 spec)
So the range based sanitize and zero stuff.
But not, I think, on DCD extent add or remove, which is fair enough, as
the device would have to track prefetches that occurred to addresses it wasn't
backing at the time.
So I'm not immediately sure how it will combine with explicit cache coherency management.
So far we haven't hooked anything up for non BI enabled shared memory.
For type 2, do we have a BI capable example?
>
> o Patch 1 adds the BI cachemem register discovery along with two interfaces around cxlds to allow
> the setup and deallocations of BI-IDs. The idea is for type3 memdevs and future type2 devices
> to make use of cxlds->bi when committing HDM decoders, such that different device coherence models
> can be differentiated as:
>
> type2 hdm-db: cxlds->type == CXL_DEVTYPE_DEVMEM && cxlds->bi == true
> type2 hdm-d: cxlds->type == CXL_DEVTYPE_DEVMEM && cxlds->bi == false
> type3 hdm-h: cxlds->type == CXL_DEVTYPE_CLASSMEM && cxlds->bi == false
> type3 hdm-db: cxlds->type == CXL_DEVTYPE_CLASSMEM && cxlds->bi == true
>
> Because ->bi becoming true does not depend on auto-committing upon HDM decoder port/enumeration
> (port driver), for now this is set as unsupported and will error out when initializing the HDM
> decoder that has its BI bit set.
>
> o Patch 2 renames/updates some of the CXL Window coherency restrictions. This should be picked
> up regardless as the spec has been updated already.
>
> o Patch 3 deals with the HDM decoder programming changes around whether or not to set the
> BI bit. Based on the model above, decoder target types are straightforward: DEVMEM or HOSTONLY
> for type2 and regular type3, but for type3 HDM-DB, this is not as clear, for which this patch
> will 1) rely on the HDM capability for supporting coherence models, and 2) allow, when possible,
> to change it by the user when configuring the BI-capable HDM decoder. It gives the user sysfs
> tools to create BI-enabled memory regions (see testing below).
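The four-way split quoted above for patch 1 can be written down as a small
classifier. This is a userspace sketch only, assuming nothing beyond the two
fields named in the cover letter (type and bi); the enum and struct names
below are invented for illustration and are not the kernel definitions.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for CXL_DEVTYPE_DEVMEM / CXL_DEVTYPE_CLASSMEM. */
enum devtype {
	DEVTYPE_DEVMEM,		/* type2, accelerator */
	DEVTYPE_CLASSMEM,	/* type3, memory expander */
};

enum coherence_model {
	TYPE2_HDM_D,
	TYPE2_HDM_DB,
	TYPE3_HDM_H,
	TYPE3_HDM_DB,
};

/* Only the two fields of the device state that matter here. */
struct dev_state {
	enum devtype type;
	bool bi;
};

/* Map (type, bi) to the coherence model, following the table above. */
static enum coherence_model coherence_model(const struct dev_state *ds)
{
	if (ds->type == DEVTYPE_DEVMEM)
		return ds->bi ? TYPE2_HDM_DB : TYPE2_HDM_D;

	return ds->bi ? TYPE3_HDM_DB : TYPE3_HDM_H;
}
```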
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH 1/3] cxl/pci: Back-Invalidate device discovery and setup
2025-08-12 1:02 ` [PATCH 1/3] cxl/pci: Back-Invalidate device discovery and setup Davidlohr Bueso
@ 2025-08-15 15:20 ` Jonathan Cameron
0 siblings, 0 replies; 8+ messages in thread
From: Jonathan Cameron @ 2025-08-15 15:20 UTC (permalink / raw)
To: Davidlohr Bueso
Cc: dave.jiang, dan.j.williams, ira.weiny, alison.schofield, alucerop,
a.manzanares, anisa.su, linux-cxl
On Mon, 11 Aug 2025 18:02:26 -0700
Davidlohr Bueso <dave@stgolabs.net> wrote:
> Introduce two calls to enable/setup and disable/dealloc BI flows,
> adding the general cachemem register plumbing.
>
> int cxl_bi_setup(struct cxl_dev_state *cxlds);
> int cxl_bi_dealloc(struct cxl_dev_state *cxlds);
>
> Both do the required hierarchy iteration ensuring it is safe to
> enable/disable BI for every component. Upon a successful setup,
> this enablement does not influence the current HDM decoder setup
> by enabling the BI bit, and therefore the device is left in a
> BI capable state, but not making use of it in the decode coherence.
> Upon a BI-ID removal event, it is expected for the device to
> be offline.
>
> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
This gets a little complex, so this is very much a first look review rather
than giving any feedback on the overall approach.
J
> ---
> drivers/cxl/core/pci.c | 318 ++++++++++++++++++++++++++++++++++++++++
> drivers/cxl/core/port.c | 2 +
> drivers/cxl/core/regs.c | 13 ++
> drivers/cxl/cxl.h | 40 +++++
> drivers/cxl/cxlmem.h | 2 +
> drivers/cxl/mem.c | 2 +
> drivers/cxl/pci.c | 5 +
> drivers/cxl/port.c | 67 +++++++++
> 8 files changed, 449 insertions(+)
>
> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index b50551601c2e..d0218e240f0e 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c
> +/* limit any insane timeouts from hw */
> +#define CXL_BI_COMMIT_MAXTMO_US (5 * USEC_PER_SEC)
> +
> +static unsigned long __cxl_bi_get_timeout_us(int scale, int base)
> +{
> + unsigned long tmo;
> +
> + switch (scale) {
> + case 0: /* 1 us */
> + tmo = 1;
Maybe a look up table and range check is more compact?
> + break;
> + case 1: /* 10 us */
> + tmo = 10UL;
> + break;
> + case 2: /* 100 us */
> + tmo = 100UL;
> + break;
> + case 3: /* 1 ms */
> + tmo = 1000UL;
> + break;
> + case 4: /* 10 ms */
> + tmo = 10000UL;
> + break;
> + case 5: /* 100 ms */
> + tmo = 100000UL;
> + break;
> + case 6: /* 1 s */
> + tmo = 1000000UL;
> + break;
> + case 7: /* 10 s */
> + tmo = 10000000UL;
> + break;
> + default:
> + tmo = 0;
> + break;
> + }
> +
> + return tmo * base;
> +}
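To make the lookup table suggestion above concrete, something along these
lines might do. It is a userspace sketch rather than a drop-in replacement:
bi_tmo_scale_us[] and bi_get_timeout_us() are names invented here, and it
keeps the behaviour of the switch, where an out of range scale yields 0.

```c
#include <assert.h>
#include <stddef.h>

/* Scale encodings 0..7 map to 1 us .. 10 s, as in the switch above. */
static const unsigned long bi_tmo_scale_us[] = {
	1UL,		/* 1 us */
	10UL,		/* 10 us */
	100UL,		/* 100 us */
	1000UL,		/* 1 ms */
	10000UL,	/* 10 ms */
	100000UL,	/* 100 ms */
	1000000UL,	/* 1 s */
	10000000UL,	/* 10 s */
};

static_assert(sizeof(bi_tmo_scale_us) / sizeof(bi_tmo_scale_us[0]) == 8,
	      "one entry per scale encoding");

/* Stand-in for __cxl_bi_get_timeout_us(): returns 0 for an unknown scale. */
static unsigned long bi_get_timeout_us(int scale, int base)
{
	size_t n = sizeof(bi_tmo_scale_us) / sizeof(bi_tmo_scale_us[0]);

	/* Range check replaces the default: case of the switch. */
	if (scale < 0 || (size_t)scale >= n)
		return 0;

	return bi_tmo_scale_us[scale] * base;
}
```

In the kernel the element count would presumably come from ARRAY_SIZE()
instead of the open-coded sizeof division.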
> +
> +/* enable or dealloc BI-ID changes in the given level of the topology */
> +static int cxl_bi_toggle_dport(struct cxl_dport *dport, bool enable)
> +{
> + u32 cap, ctrl, value;
> + void __iomem *bi = dport->regs.bi;
> + struct cxl_port *port = dport->port;
> + struct pci_dev *pdev = to_pci_dev(dport->dport_dev);
> +
> + guard(device)(&port->dev);
> +
> + if (!bi)
> + return -EINVAL;
> +
> + ctrl = readl(bi + CXL_BI_DECODER_CTRL_OFFSET);
> +
> + if (pci_pcie_type(pdev) == PCI_EXP_TYPE_ROOT_PORT) {
Maybe a switch?
> + if (enable) {
> + /*
> + * There is no point of failure from here on,
> + * BI will be enabled on the endpoint device.
> + */
> + port->nr_bi++;
> +
> + if (FIELD_GET(CXL_BI_DECODER_CTRL_BI_FW, ctrl))
> + return 0;
> +
> + value = ctrl | CXL_BI_DECODER_CTRL_BI_FW;
> + value &= ~CXL_BI_DECODER_CTRL_BI_ENABLE;
> + } else {
> + if (--port->nr_bi > 0)
> + return 0;
> +
> + value = ctrl & ~CXL_BI_DECODER_CTRL_BI_FW;
> + }
> +
> + writel(value, bi + CXL_BI_DECODER_CTRL_OFFSET);
Unless more code is coming in later patches, I'd return here
so the reviewer doesn't need to go and see what else happens.
> + } else if (pci_pcie_type(pdev) == PCI_EXP_TYPE_DOWNSTREAM) {
With the return above, this can be just
	if (pci_pcie_type...
> + int rc;
> +
> + if (enable) {
> + value = ctrl & ~CXL_BI_DECODER_CTRL_BI_FW;
> + value |= CXL_BI_DECODER_CTRL_BI_ENABLE;
> + } else {
> + if (!FIELD_GET(CXL_BI_DECODER_CTRL_BI_ENABLE, ctrl))
> + return 0;
> +
> + value = ctrl & ~CXL_BI_DECODER_CTRL_BI_ENABLE;
> + }
> +
> + writel(value, bi + CXL_BI_DECODER_CTRL_OFFSET);
> +
> + cap = readl(bi + CXL_BI_DECODER_CAPS_OFFSET);
> + if (FIELD_GET(CXL_BI_DECODER_CAPS_EXPLICIT_COMMIT_REQ, cap)) {
> + rc = __cxl_bi_commit(dport->dport_dev, dport->regs.bi);
> + if (rc)
> + return rc;
> + }
> +
> + /* ... and the routing table, if any */
> + rc = __cxl_bi_commit_rt(&port->dev, port->uport_regs.bi);
> + if (rc)
> + return rc;
I'd do
		return __cxl_bi_commit_rt(&port->dev, port->uport_regs.bi);
	}
	return -EINVAL;
> + } else
> + return -EINVAL;
If the else stays, it needs braces to match the other branches:
	} else {
		return -EINVAL;
	}
> +
> + return 0;
> +}
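To make the switch and early-return suggestions concrete, here is a rough userspace model of the control flow. Everything in it is a stand-in: struct model_dport, the PORT_* enum and the model_bi_* helpers are hypothetical, the bit positions are placeholders rather than the real CXL_BI_DECODER_CTRL_* values, and the register accesses and commit steps are reduced to plain field updates. Splitting the two cases into helpers goes a little further than what was asked for in the review.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit positions, NOT taken from the patch or the spec. */
#define BI_CTRL_FW	(1u << 0)
#define BI_CTRL_ENABLE	(1u << 1)

enum port_type { PORT_ROOT, PORT_DOWNSTREAM, PORT_OTHER };

/* Hypothetical stand-in for a dport and its BI Decoder Control register. */
struct model_dport {
	enum port_type type;
	uint32_t ctrl;	/* models the register contents */
	int nr_bi;	/* models port->nr_bi */
};

/* Root port: forward BI, refcounted by the number of BI users below. */
static int model_bi_rp(struct model_dport *d, bool enable)
{
	if (enable) {
		d->nr_bi++;
		if (d->ctrl & BI_CTRL_FW)
			return 0;
		d->ctrl = (d->ctrl | BI_CTRL_FW) & ~BI_CTRL_ENABLE;
		return 0;
	}

	if (--d->nr_bi > 0)
		return 0;
	d->ctrl &= ~BI_CTRL_FW;
	return 0;
}

/* Downstream switch port: enable BI; the real code then commits. */
static int model_bi_dsp(struct model_dport *d, bool enable)
{
	if (enable) {
		d->ctrl = (d->ctrl & ~BI_CTRL_FW) | BI_CTRL_ENABLE;
		return 0;
	}

	if (!(d->ctrl & BI_CTRL_ENABLE))
		return 0;
	d->ctrl &= ~BI_CTRL_ENABLE;
	return 0;
}

/* Each case returns directly, so there is nothing left to read below. */
static int model_bi_set_dport(struct model_dport *d, bool enable)
{
	switch (d->type) {
	case PORT_ROOT:
		return model_bi_rp(d, enable);
	case PORT_DOWNSTREAM:
		return model_bi_dsp(d, enable);
	default:
		return -EINVAL;
	}
}
```

With this shape the trailing return 0 and the bare else branch both disappear.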
> +
> +static int cxl_bi_toggle_endpoint(struct cxl_dev_state *cxlds, bool enable)
Something called toggle would normally not take an enable - it would just switch
between two states every time. So maybe rename?
> +{
> + u32 ctrl, val;
> + void __iomem *bi = cxlds->regs.bi;
> +
> + ctrl = readl(bi + CXL_BI_DECODER_CTRL_OFFSET);
> +
> + if (enable) {
> + if (FIELD_GET(CXL_BI_DECODER_CTRL_BI_ENABLE, ctrl)) {
> + WARN_ON_ONCE(!cxlds->bi);
> + return 0;
> + }
> + val = ctrl | CXL_BI_DECODER_CTRL_BI_ENABLE;
> + } else {
> + if (!FIELD_GET(CXL_BI_DECODER_CTRL_BI_ENABLE, ctrl)) {
> + WARN_ON_ONCE(cxlds->bi);
> + return 0;
> + }
> + val = ctrl & ~CXL_BI_DECODER_CTRL_BI_ENABLE;
> + }
> +
> + writel(val, bi + CXL_BI_DECODER_CTRL_OFFSET);
> + cxlds->bi = enable;
> +
> + dev_dbg(cxlds->dev, "device %scapable of issuing BI requests\n",
> + enable ? "":"in");
> +
> + return 0;
> +}
> +
> +int cxl_bi_setup(struct cxl_dev_state *cxlds)
> +{
> + struct pci_dev *pdev = to_pci_dev(cxlds->dev);
> + struct cxl_dport *dport, *_dport;
> + struct cxl_port *parent_port, *port, *_port __free(put_cxl_port) =
> + cxl_pci_find_port(to_pci_dev(cxlds->dev), &_dport);
That is horrible! Lines aren't that expensive.
struct cxl_port *parent_port, *port;
struct cxl_port *_port __free(put_cxl_port) =
cxl_pci_find_port(to_pci_dev(cxlds->dev), &_dport);
> +
> + if (!_port)
> + return -EINVAL;
> +
> + if (!cxl_is_bi_capable(pdev, cxlds->regs.bi))
> + return 0;
> +
> + /* walkup the topology twice, first to check, then to enable */
> + port = _port;
> + dport = _dport;
> + while (1) {
> + parent_port = to_cxl_port(port->dev.parent);
> + /* check rp, dsp */
> + if (!cxl_is_bi_capable(to_pci_dev(dport->dport_dev),
> + dport->regs.bi))
> + return -EINVAL;
> +
> + /* check usp */
> + if (pci_pcie_type(to_pci_dev(dport->dport_dev)) ==
> + PCI_EXP_TYPE_DOWNSTREAM) {
> + if (!cxl_is_bi_capable(to_pci_dev(port->uport_dev),
> + port->uport_regs.bi))
> + return -EINVAL;
> + }
> +
> + if (is_cxl_root(parent_port))
> + break;
> +
> + dport = port->parent_dport;
> + port = parent_port;
Similar to the comment below on __free() applying to a non-obvious succession
of things, without the previous one being obviously released as we'd expect.
> + }
> +
> + port = _port;
> + dport = _dport;
> + while (1) {
> + int rc;
> +
> + parent_port = to_cxl_port(port->dev.parent);
> +
> + rc = cxl_bi_toggle_dport(dport, true);
> + if (rc)
> + return rc;
> +
> + if (is_cxl_root(parent_port))
> + break;
> +
> + dport = port->parent_dport;
> + port = parent_port;
> + }
> +
> + /* finally, enable BI on the device */
> + cxl_bi_toggle_endpoint(cxlds, true);
> + return 0;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_bi_setup, "CXL");
> +
> +int cxl_bi_dealloc(struct cxl_dev_state *cxlds)
> +{
> + struct cxl_dport *dport;
> + struct cxl_memdev *cxlmd = cxlds->cxlmd;
> + struct cxl_port *endpoint = cxlmd->endpoint;
> + struct cxl_port *parent_port, *port __free(put_cxl_port) =
> + cxl_pci_find_port(to_pci_dev(cxlds->dev), &dport);
As above, mixing __free() and non-__free() declarations on one line is not pretty.
Split them up.
> +
> + if (!port)
> + return -EINVAL;
> +
> + /* ensure the device is offline and unmapped */
> + if (!endpoint || cxl_num_decoders_committed(endpoint) > 0)
> + return -EBUSY;
> +
> + if (!cxlds->bi)
> + return 0;
> +
> + /* first, disable BI on the device */
> + cxl_bi_toggle_endpoint(cxlds, false);
> +
> + while (1) {
> + int rc;
> +
> + parent_port = to_cxl_port(port->dev.parent);
> +
> + rc = cxl_bi_toggle_dport(dport, false);
> + if (rc)
> + return rc;
> +
> + if (is_cxl_root(parent_port))
> + break;
> +
> + dport = port->parent_dport;
> + port = parent_port;
Reassigning port with a __free() in use on it looks complex at best.
I've lost track of what put_cxl_port() ends up being called on or where
we got a reference to that.
> + }
> +
> + cxlds->bi = false;
> + return 0;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_bi_dealloc, "CXL");
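To illustrate the concern about reassigning a __free() variable: the kernel's __free() is built on the compiler's cleanup attribute, and the handler sees whatever value the variable holds when it goes out of scope. The userspace sketch below (GCC/Clang extension) models that with a made-up struct obj and made-up helpers; it is not the CXL code. It also walks all the way to the top object, whereas the reviewed loop stops below the CXL root.

```c
#include <stddef.h>

/* Hypothetical refcounted object standing in for struct cxl_port. */
struct obj {
	int refs;
	struct obj *parent;
};

static void put_obj(struct obj *o)
{
	if (o)
		o->refs--;
}

/* Cleanup handler: receives a pointer to the variable leaving scope. */
static void put_objp(struct obj **p)
{
	put_obj(*p);
}

#define auto_put __attribute__((cleanup(put_objp)))

/* Mirrors the reviewed loop: the auto-released variable is also the cursor. */
static void walk_reusing_cursor(struct obj *start)
{
	struct obj *port auto_put = start;

	start->refs++;			/* reference taken on 'start' */
	while (port->parent)
		port = port->parent;	/* cursor moves up the chain */
}	/* put runs on the last object visited, not on 'start' */

/* Alternative: the owning pointer never moves, a plain cursor walks. */
static void walk_separate_cursor(struct obj *start)
{
	struct obj *held auto_put = start;
	struct obj *port;

	start->refs++;
	for (port = held; port->parent; port = port->parent)
		;
}	/* put runs on 'start', balancing the reference taken above */
```

In the first variant the reference taken on the starting object is never dropped and a different object loses one. Keeping a separate owning pointer, as cxl_bi_setup() does with _port, avoids that.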
> +
> /*
> * Set max timeout such that platforms will optimize GPF flow to avoid
> * the implied worst-case scenario delays. On a sane platform, all
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index b7111e3568d0..6361365c5ce9 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> /**
> @@ -905,6 +942,9 @@ void cxl_coordinates_combine(struct access_coordinate *out,
>
> bool cxl_endpoint_decoder_reset_detected(struct cxl_port *port);
>
> +int cxl_bi_setup(struct cxl_dev_state *cxlds);
> +int cxl_bi_dealloc(struct cxl_dev_state *cxlds);
> diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c
> index fe4b593331da..1c86198c190b 100644
> --- a/drivers/cxl/port.c
> +++ b/drivers/cxl/port.c
> +static void cxl_switch_port_init_bi(struct cxl_port *port)
> +{
> + struct cxl_dport *parent_dport = port->parent_dport;
> +
> + if (is_cxl_root(to_cxl_port(port->dev.parent)))
This is a pretty common bit of code, maybe we should name it via
a #define. I'm not entirely sure (as I get lost in these)
but is it detecting whether we are on a host bridge?
> + return;
> +
> + if (dev_is_pci(&port->dev) && !cxl_pci_flit_256(to_pci_dev(&port->dev)))
> + return;
> +
> + if (parent_dport && dev_is_pci(parent_dport->dport_dev)) {
> + struct pci_dev *pdev = to_pci_dev(parent_dport->dport_dev);
> +
> + switch (pci_pcie_type(pdev)) {
> + case PCI_EXP_TYPE_ROOT_PORT:
> + case PCI_EXP_TYPE_DOWNSTREAM:
> + cxl_dport_init_bi(parent_dport);
> + break;
> + default:
> + break;
> + }
> + }
> +
> + cxl_uport_init_bi(port, &port->dev);
> +}
* Re: [PATCH 2/3] acpi, tables: Rename coherency CFMW restrictions
2025-08-12 1:02 ` [PATCH 2/3] acpi, tables: Rename coherency CFMW restrictions Davidlohr Bueso
@ 2025-08-15 15:27 ` Jonathan Cameron
0 siblings, 0 replies; 8+ messages in thread
From: Jonathan Cameron @ 2025-08-15 15:27 UTC (permalink / raw)
To: Davidlohr Bueso
Cc: dave.jiang, dan.j.williams, ira.weiny, alison.schofield, alucerop,
a.manzanares, anisa.su, linux-cxl, Rafael J. Wysocki
On Mon, 11 Aug 2025 18:02:27 -0700
Davidlohr Bueso <dave@stgolabs.net> wrote:
> This has been renamed in more recent CXL specs, as
> type3 (memory expanders) can also use HDM-DB for
> device coherent memory.
>
> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
actbl1.h is generated from ACPICA (more or less, anyway).
So you need to make the changes there.
That makes renaming trickier.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ca321d1ca6723ed0e04edd09de49c92b24e3648e
is an example of such a rename.
Seems like a reasonable thing to do - can we decouple this from
the rest of the set?
> ---
> drivers/cxl/acpi.c | 4 ++--
> include/acpi/actbl1.h | 4 ++--
> tools/testing/cxl/test/cxl.c | 18 +++++++++---------
> 3 files changed, 13 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> index 712624cba2b6..9b32392c82fc 100644
> --- a/drivers/cxl/acpi.c
> +++ b/drivers/cxl/acpi.c
> @@ -113,9 +113,9 @@ static unsigned long cfmws_to_decoder_flags(int restrictions)
> {
> unsigned long flags = CXL_DECODER_F_ENABLE;
>
> - if (restrictions & ACPI_CEDT_CFMWS_RESTRICT_TYPE2)
> + if (restrictions & ACPI_CEDT_CFMWS_RESTRICT_DEVMEM)
> flags |= CXL_DECODER_F_TYPE2;
> - if (restrictions & ACPI_CEDT_CFMWS_RESTRICT_TYPE3)
> + if (restrictions & ACPI_CEDT_CFMWS_RESTRICT_HOSTONLYMEM)
> flags |= CXL_DECODER_F_TYPE3;
> if (restrictions & ACPI_CEDT_CFMWS_RESTRICT_VOLATILE)
> flags |= CXL_DECODER_F_RAM;
> diff --git a/include/acpi/actbl1.h b/include/acpi/actbl1.h
> index 99fd1588ff38..eb787dfbd2fa 100644
> --- a/include/acpi/actbl1.h
> +++ b/include/acpi/actbl1.h
> @@ -560,8 +560,8 @@ struct acpi_cedt_cfmws_target_element {
>
> /* Values for Restrictions field above */
>
> -#define ACPI_CEDT_CFMWS_RESTRICT_TYPE2 (1)
> -#define ACPI_CEDT_CFMWS_RESTRICT_TYPE3 (1<<1)
> +#define ACPI_CEDT_CFMWS_RESTRICT_DEVMEM (1)
> +#define ACPI_CEDT_CFMWS_RESTRICT_HOSTONLYMEM (1<<1)
> #define ACPI_CEDT_CFMWS_RESTRICT_VOLATILE (1<<2)
> #define ACPI_CEDT_CFMWS_RESTRICT_PMEM (1<<3)
> #define ACPI_CEDT_CFMWS_RESTRICT_FIXED (1<<4)
> diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
> index 6a25cca5636f..ba50338f8ada 100644
> --- a/tools/testing/cxl/test/cxl.c
> +++ b/tools/testing/cxl/test/cxl.c
> @@ -210,7 +210,7 @@ static struct {
> },
> .interleave_ways = 0,
> .granularity = 4,
> - .restrictions = ACPI_CEDT_CFMWS_RESTRICT_TYPE3 |
> + .restrictions = ACPI_CEDT_CFMWS_RESTRICT_HOSTONLYMEM |
> ACPI_CEDT_CFMWS_RESTRICT_VOLATILE,
> .qtg_id = FAKE_QTG_ID,
> .window_size = SZ_256M * 4UL,
> @@ -225,7 +225,7 @@ static struct {
> },
> .interleave_ways = 1,
> .granularity = 4,
> - .restrictions = ACPI_CEDT_CFMWS_RESTRICT_TYPE3 |
> + .restrictions = ACPI_CEDT_CFMWS_RESTRICT_HOSTONLYMEM |
> ACPI_CEDT_CFMWS_RESTRICT_VOLATILE,
> .qtg_id = FAKE_QTG_ID,
> .window_size = SZ_256M * 8UL,
> @@ -240,7 +240,7 @@ static struct {
> },
> .interleave_ways = 0,
> .granularity = 4,
> - .restrictions = ACPI_CEDT_CFMWS_RESTRICT_TYPE3 |
> + .restrictions = ACPI_CEDT_CFMWS_RESTRICT_HOSTONLYMEM |
> ACPI_CEDT_CFMWS_RESTRICT_PMEM,
> .qtg_id = FAKE_QTG_ID,
> .window_size = SZ_256M * 4UL,
> @@ -255,7 +255,7 @@ static struct {
> },
> .interleave_ways = 1,
> .granularity = 4,
> - .restrictions = ACPI_CEDT_CFMWS_RESTRICT_TYPE3 |
> + .restrictions = ACPI_CEDT_CFMWS_RESTRICT_HOSTONLYMEM |
> ACPI_CEDT_CFMWS_RESTRICT_PMEM,
> .qtg_id = FAKE_QTG_ID,
> .window_size = SZ_256M * 8UL,
> @@ -270,7 +270,7 @@ static struct {
> },
> .interleave_ways = 0,
> .granularity = 4,
> - .restrictions = ACPI_CEDT_CFMWS_RESTRICT_TYPE3 |
> + .restrictions = ACPI_CEDT_CFMWS_RESTRICT_HOSTONLYMEM |
> ACPI_CEDT_CFMWS_RESTRICT_PMEM,
> .qtg_id = FAKE_QTG_ID,
> .window_size = SZ_256M * 4UL,
> @@ -285,7 +285,7 @@ static struct {
> },
> .interleave_ways = 0,
> .granularity = 4,
> - .restrictions = ACPI_CEDT_CFMWS_RESTRICT_TYPE3 |
> + .restrictions = ACPI_CEDT_CFMWS_RESTRICT_HOSTONLYMEM |
> ACPI_CEDT_CFMWS_RESTRICT_VOLATILE,
> .qtg_id = FAKE_QTG_ID,
> .window_size = SZ_256M,
> @@ -302,7 +302,7 @@ static struct {
> .interleave_arithmetic = ACPI_CEDT_CFMWS_ARITHMETIC_XOR,
> .interleave_ways = 0,
> .granularity = 4,
> - .restrictions = ACPI_CEDT_CFMWS_RESTRICT_TYPE3 |
> + .restrictions = ACPI_CEDT_CFMWS_RESTRICT_HOSTONLYMEM |
> ACPI_CEDT_CFMWS_RESTRICT_PMEM,
> .qtg_id = FAKE_QTG_ID,
> .window_size = SZ_256M * 8UL,
> @@ -318,7 +318,7 @@ static struct {
> .interleave_arithmetic = ACPI_CEDT_CFMWS_ARITHMETIC_XOR,
> .interleave_ways = 1,
> .granularity = 0,
> - .restrictions = ACPI_CEDT_CFMWS_RESTRICT_TYPE3 |
> + .restrictions = ACPI_CEDT_CFMWS_RESTRICT_HOSTONLYMEM |
> ACPI_CEDT_CFMWS_RESTRICT_PMEM,
> .qtg_id = FAKE_QTG_ID,
> .window_size = SZ_256M * 8UL,
> @@ -334,7 +334,7 @@ static struct {
> .interleave_arithmetic = ACPI_CEDT_CFMWS_ARITHMETIC_XOR,
> .interleave_ways = 8,
> .granularity = 1,
> - .restrictions = ACPI_CEDT_CFMWS_RESTRICT_TYPE3 |
> + .restrictions = ACPI_CEDT_CFMWS_RESTRICT_HOSTONLYMEM |
> ACPI_CEDT_CFMWS_RESTRICT_PMEM,
> .qtg_id = FAKE_QTG_ID,
> .window_size = SZ_512M * 6UL,
* Re: [PATCH 3/3] cxl: Support creating HDM-DB regions
2025-08-12 1:02 ` [PATCH 3/3] cxl: Support creating HDM-DB regions Davidlohr Bueso
@ 2025-08-15 15:41 ` Jonathan Cameron
0 siblings, 0 replies; 8+ messages in thread
From: Jonathan Cameron @ 2025-08-15 15:41 UTC (permalink / raw)
To: Davidlohr Bueso
Cc: dave.jiang, dan.j.williams, ira.weiny, alison.schofield, alucerop,
a.manzanares, anisa.su, linux-cxl
On Mon, 11 Aug 2025 18:02:28 -0700
Davidlohr Bueso <dave@stgolabs.net> wrote:
> A single Type 3 device can expose different parts of its memory with different
> coherency semantics. For example, some memory ranges within a Type 3 device
> could be configured as HDM-H, while other memory ranges on the same device
> could be configured as HDM-DB. This allows for flexible memory configuration
> within a single device. As such, coherency models are defined per memory region.
>
> For accelerators (type2), it is expected for accelerator drivers to manage the
> HDM-D[B] region creation. For type3, relevant sysfs tunables are provided to
> the user.
>
> Other than the HDM decoder supported coherency models, the main dependency
> to create these regions is for the device state to be BI-ready (cxlds->bi),
> for which already committed HDM decoders with the BI bit set detected during
> enumeration is currently not supported because endpoint and port enumerations
> are independent.
>
> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
> ---
> Documentation/ABI/testing/sysfs-bus-cxl | 41 ++++++++++++++----
> drivers/cxl/acpi.c | 2 +
> drivers/cxl/core/core.h | 4 ++
> drivers/cxl/core/hdm.c | 56 +++++++++++++++++++++---
> drivers/cxl/core/port.c | 55 +++++++++++++++++++++++-
> drivers/cxl/core/region.c | 57 +++++++++++++++++++++----
> drivers/cxl/cxl.h | 7 ++-
> drivers/cxl/cxlmem.h | 1 +
> include/acpi/actbl1.h | 1 +
> 9 files changed, 199 insertions(+), 25 deletions(-)
>
> diff --git a/Documentation/ABI/testing/sysfs-bus-cxl b/Documentation/ABI/testing/sysfs-bus-cxl
> index 6b4e8c7a963d..7d9e3db736c3 100644
> --- a/Documentation/ABI/testing/sysfs-bus-cxl
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 6361365c5ce9..0a9532fa4ec1 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -51,6 +51,7 @@ extern const struct nvdimm_security_ops *cxl_security_ops;
> #define CXL_HDM_DECODER_INTERLEAVE_14_12 BIT(9)
> #define CXL_HDM_DECODER_INTERLEAVE_3_6_12_WAY BIT(11)
> #define CXL_HDM_DECODER_INTERLEAVE_16_WAY BIT(12)
> +#define CXL_HDM_DECODER_SUPPORTED_COHERENCY_MASK GENMASK(22, 21)
Given the values are a bit non-obvious (particularly what 0 means :),
I'd add defines for them alongside the mask.
You use the value 1 later, which should have been a define.
> #define CXL_HDM_DECODER_CTRL_OFFSET 0x4
> #define CXL_HDM_DECODER_ENABLE BIT(1)
> #define CXL_HDM_DECODER0_BASE_LOW_OFFSET(i) (0x20 * (i) + 0x10)
> @@ -65,6 +66,7 @@ extern const struct nvdimm_security_ops *cxl_security_ops;
> #define CXL_HDM_DECODER0_CTRL_COMMITTED BIT(10)
> #define CXL_HDM_DECODER0_CTRL_COMMIT_ERROR BIT(11)
> #define CXL_HDM_DECODER0_CTRL_HOSTONLY BIT(12)
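As a sketch of what named values next to the mask might look like, here is a userspace model. The enum names and the helper are hypothetical, and the encodings (0 unknown, 1 device coherent, 2 host-only coherent, 3 both) are an assumption that must be checked against the CXL specification before anything like this is used. Only the bit range 22:21 comes from the patch.

```c
#include <stdint.h>

/* Bits 22:21, matching GENMASK(22, 21) in the patch. */
#define HDM_CAP_COHERENCY_SHIFT	21
#define HDM_CAP_COHERENCY_MASK	(0x3u << HDM_CAP_COHERENCY_SHIFT)

/* ASSUMED encodings: verify against the CXL spec before relying on them. */
enum hdm_coherency {
	HDM_COHERENCY_UNKNOWN	= 0,
	HDM_COHERENCY_DEV	= 1,
	HDM_COHERENCY_HOSTONLY	= 2,
	HDM_COHERENCY_BOTH	= 3,
};

/* Userspace equivalent of FIELD_GET(mask, cap) for this one field. */
static enum hdm_coherency hdm_cap_coherency(uint32_t cap)
{
	return (enum hdm_coherency)((cap & HDM_CAP_COHERENCY_MASK) >>
				    HDM_CAP_COHERENCY_SHIFT);
}
```

Callers can then compare against HDM_COHERENCY_DEV instead of a bare 1.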
Thread overview: 8+ messages
2025-08-12 1:02 [PATCH RFC 0/3] cxl: Initial support for Back-Invalidate Davidlohr Bueso
2025-08-12 1:02 ` [PATCH 1/3] cxl/pci: Back-Invalidate device discovery and setup Davidlohr Bueso
2025-08-15 15:20 ` Jonathan Cameron
2025-08-12 1:02 ` [PATCH 2/3] acpi, tables: Rename coherency CFMW restrictions Davidlohr Bueso
2025-08-15 15:27 ` Jonathan Cameron
2025-08-12 1:02 ` [PATCH 3/3] cxl: Support creating HDM-DB regions Davidlohr Bueso
2025-08-15 15:41 ` Jonathan Cameron
2025-08-12 14:53 ` [PATCH RFC 0/3] cxl: Initial support for Back-Invalidate Jonathan Cameron