* [PATCH v7 0/3] hw/cxl: Add a performant (and correct) path for the non interleaved cases
@ 2026-03-06 12:11 Alireza Sanaee via qemu development
2026-03-06 12:11 ` [PATCH v7 1/3] hw/cxl: Use HPA in cxl_cfmws_find_device() rather than offset in window Alireza Sanaee via qemu development
` (5 more replies)
0 siblings, 6 replies; 12+ messages in thread
From: Alireza Sanaee via qemu development @ 2026-03-06 12:11 UTC
To: qemu-devel, gourry, lizhijian
Cc: alireza.sanaee, anisa.su887, armbru, david, imammedo,
jonathan.cameron, linuxarm, mst, nifan.cxl, peterx, philmd,
pbonzini, venkataravis, xiaoguangrong.eric
Hey everyone,
This is v7 of the performant CXL type 3 regions series:
v6 -> v7:
- Rebased on top of the latest master. Base-commit stated at the end of cover-letter.
- Thanks to Gregory and Zhijian for testing and feedback. Addressed
their comments.
v5 -> v6:
- Use object_unparent() in the third commit when deleting alias regions.
- Thanks to Gregory for the suggestion and testing.
v4 -> v5:
- Fixed some minor patch style like missing trailing white space and such.
v3 -> v4:
- Tear down path changed, given that it is done differently than
setup.
- Dropped Gregory's tested-by tag due to tear down changes.
v2 -> v3:
- Addressed Zhijian Li's comments. Thanks for the feedback.
v1 -> v2:
- Mainly rebase.
==========================================================
The CXL address to device decoding logic is complex because of the need
to correctly decode fine grained interleave. The current implementation
prevents use with KVM where executed instructions may reside in that
memory and gives very slow performance even in TCG.
In many real cases non interleaved memory configurations are useful and
for those we can use a more conventional memory region alias allowing
similar performance to other memory in the system.
Whether this fast path is applicable can be established once the full
set of HDM decoders has been committed (in whatever order the guest
decides to commit them). As such a check is performed on each commit /
uncommit of HDM decoder to establish if the alias should be added or
removed.
Performance numbers:
For a read/write test with 4K block size, 256M region size, and 1 thread
with 100 iterations on TCG (similar results are expected on KVM):
- Non-interleaved region (fast path): 25-30 seconds.
- Interleaved region (no fast path): Never finishes within 10
minutes.
Tested Topologies and Region Layouts
====================================
This series was validated across multiple CXL topology configurations,
covering single-device, multi-device, multi-host-bridge, and switched
fabrics. Region creation was exercised using the `cxl` userspace tool
with both non-interleaved and interleaved setups.
Decoder and memdev identifiers were discovered using:
cxl list
cxl list -D
Decoder IDs (e.g. decoder0.0) and memdev names (mem0, mem1) are
environment-specific. Commands below use placeholders such as
<decoder_span_both> which should be replaced with IDs from `cxl list -D`.
---------------------------------------------------------------------
Region Layout Notation
----------------------
CFMW (CXL Fixed Memory Window) is shown as a linear address space
containing regions:
CFMW: [ R0 | R1 | R2 ]
R0, R1, R2 are regions created by `cxl create-region`.
Non-interleaved region:
R0 (ways=1) -> entirely on one device (mem0 or mem1)
Fast path: APPLICABLE
2-way interleaved region (g=256):
R1 (ways=2, g=256) striped across devices:
|mem0|mem1|mem0|mem1|mem0|mem1| ...
256 256 256 256 256 256 bytes
Fast path: NOT APPLICABLE
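To make the striping above concrete, here is a minimal C sketch (not code from this series) of how an address selects a device within a region. The name interleave_target is illustrative only; the arithmetic follows the (addr / granularity) % ways pattern that the HDM decoder lookup uses.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative only: index of the interleave target serving an address.
 * granularity is in bytes (256 in the example above); ways is the number
 * of devices in the stripe, with 1 meaning non-interleaved.
 */
static unsigned int interleave_target(uint64_t addr, uint64_t granularity,
                                      unsigned int ways)
{
    assert(granularity > 0 && ways > 0);
    return (unsigned int)((addr / granularity) % ways);
}
```

With ways=1 every address resolves to target 0, which is why a single contiguous alias can stand in for the whole region. With ways=2 and g=256 the target flips every 256 bytes, so no single alias can describe the mapping.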
---------------------------------------------------------------------
1) One device, one host bridge, one fixed window
------------------------------------------------
QEMU:
-M q35,cxl=on,cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.size=4G
-device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=12
-device cxl-rp,id=rp0,bus=cxl.0,port=0,chassis=0,slot=2
-object memory-backend-ram,id=mem0,size=512M,share=on
-device cxl-type3,id=dev0,bus=rp0,memdev=mem0
Topology:
Host
|
+-- CXL Host Bridge (cxl.0)
|
+-- Root Port (rp0)
|
+-- Type-3 (dev0, mem0)
Regions created:
cxl create-region ... -w 1 ... mem0 (Fast path: YES)
cxl create-region ... -w 1 ... mem0 (Fast path: YES)
Layout:
CFMW: [ R0 | R1 ]
R0 -> mem0 (Fast path: YES)
R1 -> mem0 (Fast path: YES)
---------------------------------------------------------------------
2) One host bridge, two Type-3 devices (via two root ports)
------------------------------------------------------------
QEMU:
-M q35,cxl=on,cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.size=4G
-device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=12
-device cxl-rp,id=rp0,bus=cxl.0,port=0,chassis=0,slot=2
-device cxl-rp,id=rp1,bus=cxl.0,port=1,chassis=0,slot=3
-object memory-backend-ram,id=mem0,size=512M,share=on
-object memory-backend-ram,id=mem1,size=512M,share=on
-device cxl-type3,id=dev0,bus=rp0,memdev=mem0
-device cxl-type3,id=dev1,bus=rp1,memdev=mem1
Topology:
Host
|
+-- CXL Host Bridge (cxl.0)
|
+-- Root Port (rp0) -- Type-3 (dev0, mem0)
|
+-- Root Port (rp1) -- Type-3 (dev1, mem1)
Region patterns exercised:
2.1 All non-interleaved:
R0 -> mem0 (Fast path: YES)
R1 -> mem0 (Fast path: YES)
R2 -> mem1 (Fast path: YES)
R3 -> mem1 (Fast path: YES)
2.2 Interleaved + local:
R0 -> mem0/mem1 interleaved (Fast path: NO)
R1 -> mem0 (Fast path: YES)
2.3 Local + interleaved + local:
R0 -> mem0 (Fast path: YES)
R1 -> mem0/mem1 interleaved (Fast path: NO)
R2 -> mem1 (Fast path: YES)
---------------------------------------------------------------------
3) Two host bridges, one device per host bridge
------------------------------------------------
QEMU:
-M q35,cxl=on,
cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.size=4G,
cxl-fmw.1.targets.0=cxl.1,cxl-fmw.1.size=4G,
cxl-fmw.2.targets.0=cxl.0,cxl-fmw.2.size=4G
-device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=12
-device cxl-rp,id=rp0,bus=cxl.0,port=0,chassis=0,slot=2
-object memory-backend-ram,id=mem0,size=512M,share=on
-device cxl-type3,id=dev0,bus=rp0,memdev=mem0
-device pxb-cxl,id=cxl.1,bus=pcie.0,bus_nr=13
-device cxl-rp,id=rp1,bus=cxl.1,port=0,chassis=1,slot=2
-object memory-backend-ram,id=mem1,size=512M,share=on
-device cxl-type3,id=dev1,bus=rp1,memdev=mem1
Region patterns identical to section 2, and fast-path applicability is
identical per region mapping (non-interleaved: YES, interleaved: NO).
---------------------------------------------------------------------
4) Switch topology
------------------
QEMU:
-M q35,cxl=on,cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.size=4G
-device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=12
-device cxl-rp,id=rp0,bus=cxl.0,port=0,chassis=0,slot=2
-device cxl-rp,id=rp1,bus=cxl.0,port=1,chassis=0,slot=3
-device cxl-upstream,id=us0,bus=rp0
-device cxl-downstream,id=ds0,bus=us0,port=0,chassis=0,slot=4
-object memory-backend-ram,id=mem0,size=512M,share=on
-device cxl-type3,id=dev0,bus=ds0,memdev=mem0
Topology (detailed):
Host
|
+-- CXL Host Bridge (cxl.0)
|
+-- Root Port (rp0)
| |
| +-- CXL Switch (upstream us0)
| |
| +-- Downstream Port (ds0) -- Type-3 (mem0)
| |
| +-- Downstream Port (ds1) -- Type-3 (mem1) [optional]
+-- Root Port (rp1)
|
+-- More devices/switches.
Fast-path interpretation in this topology:
If only mem0 exists:
All regions -> Fast path: YES
If mem0 and mem1 exist:
Non-interleaved regions -> Fast path: YES
Interleaved regions -> Fast path: NO
---------------------------------------------------------------------
Summary
-------
Across all topologies, region creation, enablement, and HDM decoder
commit/uncommit flows were exercised. The fast path is enabled only when
all decoders describe a non-interleaved mapping and is removed when any
interleave configuration is introduced.
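The rule in this summary can be sketched in C roughly as follows. This is a simplified illustration, not the code in the series: struct decoder_view and fast_path_applicable() are hypothetical names, and the real implementation walks the QEMU object tree and the HDM decoder registers instead of taking an array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of one HDM decoder on the path. */
struct decoder_view {
    bool committed;       /* COMMITTED bit is set */
    unsigned int iw_enc;  /* encoded interleave ways; 0 means 1 way */
};

/*
 * The fast path applies only if the fixed window has a single target and
 * every decoder on the path (host bridge, optional switch, endpoint) is
 * committed and non-interleaved.
 */
static bool fast_path_applicable(unsigned int window_targets,
                                 const struct decoder_view *path, size_t n)
{
    assert(path != NULL || n == 0);

    if (window_targets != 1) {
        return false;
    }
    for (size_t i = 0; i < n; i++) {
        if (!path[i].committed || path[i].iw_enc != 0) {
            return false;
        }
    }
    return n > 0;
}
```

Because the guest may commit decoders in any order, a check of this kind has to be repeated on every commit and uncommit instead of being evaluated once.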
Alireza Sanaee (3):
hw/cxl: Use HPA in cxl_cfmws_find_device() rather than offset in
window.
hw/cxl: Allow cxl_cfmws_find_device() to filter on whether interleaved
paths are accepted
hw/cxl: Add a performant (and correct) path for the non interleaved
cases
hw/cxl/cxl-component-utils.c | 6 +
hw/cxl/cxl-host.c | 231 +++++++++++++++++++++++++++++++++--
hw/mem/cxl_type3.c | 4 +
include/hw/cxl/cxl.h | 1 +
include/hw/cxl/cxl_device.h | 4 +
5 files changed, 237 insertions(+), 9 deletions(-)
base-commit: 483cb5b74cd247b1520e0994b4fae4d8fe44cb00
--
2.43.0
* [PATCH v7 1/3] hw/cxl: Use HPA in cxl_cfmws_find_device() rather than offset in window.
2026-03-06 12:11 [PATCH v7 0/3] hw/cxl: Add a performant (and correct) path for the non interleaved cases Alireza Sanaee via qemu development
@ 2026-03-06 12:11 ` Alireza Sanaee via qemu development
2026-03-06 16:15 ` Gregory Price
2026-03-06 12:11 ` [PATCH v7 2/3] hw/cxl: Allow cxl_cfmws_find_device() to filter on whether interleaved paths are accepted Alireza Sanaee via qemu development
` (4 subsequent siblings)
5 siblings, 1 reply; 12+ messages in thread
From: Alireza Sanaee via qemu development @ 2026-03-06 12:11 UTC
To: qemu-devel, gourry, lizhijian
Cc: alireza.sanaee, anisa.su887, armbru, david, imammedo,
jonathan.cameron, linuxarm, mst, nifan.cxl, peterx, philmd,
pbonzini, venkataravis, xiaoguangrong.eric
This function will shortly be used to help find if there is a route to a
device, serving an HPA, under a particular fixed memory window. Rather than
having that new use case subtract the base address in the caller, only to
add it again in cxl_cfmws_find_device(), push the responsibility for
calculating the HPA to the caller.
This also reduces the inconsistency in the meaning of the hwaddr addr
parameter between this function and the calls made within it that access
the HDM decoders that operate on HPA.
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Alireza Sanaee <alireza.sanaee@huawei.com>
---
Thanks to Li for the tag.
Change log:
v6->v7: No change.
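For readers less familiar with the two address forms discussed in the commit message, a minimal C sketch of the conversion follows. struct window and both helpers are hypothetical stand-ins, not QEMU types; only the base-plus-offset relationship is taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a fixed memory window. */
struct window {
    uint64_t base;  /* HPA at which the window starts */
    uint64_t size;  /* length of the window in bytes */
};

/* Memory region callbacks receive an offset; decoders want an HPA. */
static uint64_t window_offset_to_hpa(const struct window *w, uint64_t offset)
{
    assert(offset < w->size);
    return w->base + offset;
}

/* Inverse mapping: where a given HPA lands inside the window. */
static uint64_t hpa_to_window_offset(const struct window *w, uint64_t hpa)
{
    assert(hpa >= w->base && hpa - w->base < w->size);
    return hpa - w->base;
}
```

A caller that already holds an HPA, such as a decoder base read from the HDM registers, can pass it straight through, while the read/write callbacks do the addition once at the call site.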
hw/cxl/cxl-host.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/hw/cxl/cxl-host.c b/hw/cxl/cxl-host.c
index f3479b1991..a94b893e99 100644
--- a/hw/cxl/cxl-host.c
+++ b/hw/cxl/cxl-host.c
@@ -168,9 +168,6 @@ static PCIDevice *cxl_cfmws_find_device(CXLFixedWindow *fw, hwaddr addr)
bool target_found;
PCIDevice *rp, *d;
- /* Address is relative to memory region. Convert to HPA */
- addr += fw->base;
-
rb_index = (addr / cxl_decode_ig(fw->enc_int_gran)) % fw->num_targets;
hb = PCI_HOST_BRIDGE(fw->target_hbs[rb_index]->cxl_host_bridge);
if (!hb || !hb->bus || !pci_bus_is_cxl(hb->bus)) {
@@ -254,7 +251,7 @@ static MemTxResult cxl_read_cfmws(void *opaque, hwaddr addr, uint64_t *data,
CXLFixedWindow *fw = opaque;
PCIDevice *d;
- d = cxl_cfmws_find_device(fw, addr);
+ d = cxl_cfmws_find_device(fw, addr + fw->base);
if (d == NULL) {
*data = 0;
/* Reads to invalid address return poison */
@@ -271,7 +268,7 @@ static MemTxResult cxl_write_cfmws(void *opaque, hwaddr addr,
CXLFixedWindow *fw = opaque;
PCIDevice *d;
- d = cxl_cfmws_find_device(fw, addr);
+ d = cxl_cfmws_find_device(fw, addr + fw->base);
if (d == NULL) {
/* Writes to invalid address are silent */
return MEMTX_OK;
--
2.43.0
* [PATCH v7 2/3] hw/cxl: Allow cxl_cfmws_find_device() to filter on whether interleaved paths are accepted
2026-03-06 12:11 [PATCH v7 0/3] hw/cxl: Add a performant (and correct) path for the non interleaved cases Alireza Sanaee via qemu development
2026-03-06 12:11 ` [PATCH v7 1/3] hw/cxl: Use HPA in cxl_cfmws_find_device() rather than offset in window Alireza Sanaee via qemu development
@ 2026-03-06 12:11 ` Alireza Sanaee via qemu development
2026-03-06 15:12 ` Jonathan Cameron via qemu development
2026-03-06 16:15 ` Gregory Price
2026-03-06 12:11 ` [PATCH v7 3/3] hw/cxl: Add a performant (and correct) path for the non interleaved cases Alireza Sanaee via qemu development
` (3 subsequent siblings)
5 siblings, 2 replies; 12+ messages in thread
From: Alireza Sanaee via qemu development @ 2026-03-06 12:11 UTC
To: qemu-devel, gourry, lizhijian
Cc: alireza.sanaee, anisa.su887, armbru, david, imammedo,
jonathan.cameron, linuxarm, mst, nifan.cxl, peterx, philmd,
pbonzini, venkataravis, xiaoguangrong.eric
Extend cxl_cfmws_find_device() with a parameter that filters on whether the
address lies in an interleaved range. For now all callers accept
interleave configurations, so there is no functional change.
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Alireza Sanaee <alireza.sanaee@huawei.com>
---
Thanks to Li for the tag.
Change log:
v6->v7: No change!
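The filtering behaviour described in the commit message can be illustrated with a small C sketch. struct route and find_device() are hypothetical names used for illustration; the only detail taken from the patch is that a non-zero encoded interleave-ways field marks a decoder as interleaved, and that such a route is rejected when the caller does not allow interleave.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result of decoding one hop on the way to a device. */
struct route {
    const char *device;   /* device reached through this decoder */
    unsigned int iw_enc;  /* encoded interleave ways; 0 means 1 way */
};

/* Return the device, or NULL if the route is interleaved and not allowed. */
static const char *find_device(const struct route *r, bool allow_interleave)
{
    assert(r != NULL);

    bool interleaved = r->iw_enc != 0;

    if (interleaved && !allow_interleave) {
        return NULL;
    }
    return r->device;
}
```

Passing true preserves the existing behaviour for the read/write callbacks, while passing false turns the same lookup into a test for whether a direct route exists.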
hw/cxl/cxl-host.c | 33 ++++++++++++++++++++++++++-------
1 file changed, 26 insertions(+), 7 deletions(-)
diff --git a/hw/cxl/cxl-host.c b/hw/cxl/cxl-host.c
index a94b893e99..2dc9f77007 100644
--- a/hw/cxl/cxl-host.c
+++ b/hw/cxl/cxl-host.c
@@ -104,7 +104,7 @@ void cxl_fmws_link_targets(Error **errp)
}
static bool cxl_hdm_find_target(uint32_t *cache_mem, hwaddr addr,
- uint8_t *target)
+ uint8_t *target, bool *interleaved)
{
int hdm_inc = R_CXL_HDM_DECODER1_BASE_LO - R_CXL_HDM_DECODER0_BASE_LO;
unsigned int hdm_count;
@@ -138,6 +138,11 @@ static bool cxl_hdm_find_target(uint32_t *cache_mem, hwaddr addr,
found = true;
ig_enc = FIELD_EX32(ctrl, CXL_HDM_DECODER0_CTRL, IG);
iw_enc = FIELD_EX32(ctrl, CXL_HDM_DECODER0_CTRL, IW);
+
+ if (interleaved) {
+ *interleaved = iw_enc != 0;
+ }
+
target_idx = (addr / cxl_decode_ig(ig_enc)) % (1 << iw_enc);
if (target_idx < 4) {
@@ -157,7 +162,8 @@ static bool cxl_hdm_find_target(uint32_t *cache_mem, hwaddr addr,
return found;
}
-static PCIDevice *cxl_cfmws_find_device(CXLFixedWindow *fw, hwaddr addr)
+static PCIDevice *cxl_cfmws_find_device(CXLFixedWindow *fw, hwaddr addr,
+ bool allow_interleave)
{
CXLComponentState *hb_cstate, *usp_cstate;
PCIHostState *hb;
@@ -165,9 +171,13 @@ static PCIDevice *cxl_cfmws_find_device(CXLFixedWindow *fw, hwaddr addr)
int rb_index;
uint32_t *cache_mem;
uint8_t target;
- bool target_found;
+ bool target_found, interleaved;
PCIDevice *rp, *d;
+ if ((fw->num_targets > 1) && !allow_interleave) {
+ return NULL;
+ }
+
rb_index = (addr / cxl_decode_ig(fw->enc_int_gran)) % fw->num_targets;
hb = PCI_HOST_BRIDGE(fw->target_hbs[rb_index]->cxl_host_bridge);
if (!hb || !hb->bus || !pci_bus_is_cxl(hb->bus)) {
@@ -187,11 +197,16 @@ static PCIDevice *cxl_cfmws_find_device(CXLFixedWindow *fw, hwaddr addr)
cache_mem = hb_cstate->crb.cache_mem_registers;
- target_found = cxl_hdm_find_target(cache_mem, addr, &target);
+ target_found = cxl_hdm_find_target(cache_mem, addr, &target,
+ &interleaved);
if (!target_found) {
return NULL;
}
+ if (interleaved && !allow_interleave) {
+ return NULL;
+ }
+
rp = pcie_find_port_by_pn(hb->bus, target);
if (!rp) {
return NULL;
@@ -223,11 +238,15 @@ static PCIDevice *cxl_cfmws_find_device(CXLFixedWindow *fw, hwaddr addr)
cache_mem = usp_cstate->crb.cache_mem_registers;
- target_found = cxl_hdm_find_target(cache_mem, addr, &target);
+ target_found = cxl_hdm_find_target(cache_mem, addr, &target, &interleaved);
if (!target_found) {
return NULL;
}
+ if (interleaved && !allow_interleave) {
+ return NULL;
+ }
+
d = pcie_find_port_by_pn(&PCI_BRIDGE(d)->sec_bus, target);
if (!d) {
return NULL;
@@ -251,7 +270,7 @@ static MemTxResult cxl_read_cfmws(void *opaque, hwaddr addr, uint64_t *data,
CXLFixedWindow *fw = opaque;
PCIDevice *d;
- d = cxl_cfmws_find_device(fw, addr + fw->base);
+ d = cxl_cfmws_find_device(fw, addr + fw->base, true);
if (d == NULL) {
*data = 0;
/* Reads to invalid address return poison */
@@ -268,7 +287,7 @@ static MemTxResult cxl_write_cfmws(void *opaque, hwaddr addr,
CXLFixedWindow *fw = opaque;
PCIDevice *d;
- d = cxl_cfmws_find_device(fw, addr + fw->base);
+ d = cxl_cfmws_find_device(fw, addr + fw->base, true);
if (d == NULL) {
/* Writes to invalid address are silent */
return MEMTX_OK;
--
2.43.0
* [PATCH v7 3/3] hw/cxl: Add a performant (and correct) path for the non interleaved cases
2026-03-06 12:11 [PATCH v7 0/3] hw/cxl: Add a performant (and correct) path for the non interleaved cases Alireza Sanaee via qemu development
2026-03-06 12:11 ` [PATCH v7 1/3] hw/cxl: Use HPA in cxl_cfmws_find_device() rather than offset in window Alireza Sanaee via qemu development
2026-03-06 12:11 ` [PATCH v7 2/3] hw/cxl: Allow cxl_cfmws_find_device() to filter on whether interleaved paths are accepted Alireza Sanaee via qemu development
@ 2026-03-06 12:11 ` Alireza Sanaee via qemu development
2026-03-06 15:15 ` Jonathan Cameron via qemu development
2026-03-06 16:16 ` Gregory Price
2026-03-06 13:29 ` [PATCH v7 0/3] " Alireza Sanaee via qemu development
` (2 subsequent siblings)
5 siblings, 2 replies; 12+ messages in thread
From: Alireza Sanaee via qemu development @ 2026-03-06 12:11 UTC
To: qemu-devel, gourry, lizhijian
Cc: alireza.sanaee, anisa.su887, armbru, david, imammedo,
jonathan.cameron, linuxarm, mst, nifan.cxl, peterx, philmd,
pbonzini, venkataravis, xiaoguangrong.eric
The CXL address to device decoding logic is complex because of the need to
correctly decode fine grained interleave. The current implementation
prevents use with KVM where executed instructions may reside in that memory
and gives very slow performance even in TCG.
In many real cases non interleaved memory configurations are useful and for
those we can use a more conventional memory region alias allowing similar
performance to other memory in the system.
Whether this fast path is applicable can be established once the full set
of HDM decoders has been committed (in whatever order the guest decides to
commit them). As such a check is performed on each commit/uncommit of HDM
decoder to establish if the alias should be added or removed.
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Co-developed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Alireza Sanaee <alireza.sanaee@huawei.com>
---
Thanks to Gregory, and Li Zhijian for their feedback.
v6->v7:
- Fixed a div by zero situation in the code for interleaved_ways_dec func.
- Changed the signature of cfmws_update_non_interleaved function to void.
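As a reading aid for the decoder walk in the diff below, here is a minimal C sketch of how the DPA base for each decoder is accumulated: the skip is added first, and after each decoder its size divided by the interleave ways is added. struct decoder_regs and decoder_dpa_base() are hypothetical names, and the fields are assumed to be already decoded from the registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, already-decoded view of one endpoint HDM decoder. */
struct decoder_regs {
    uint64_t skip;      /* DPA skipped before this decoder */
    uint64_t size;      /* HPA range size covered by this decoder */
    unsigned int ways;  /* decoded interleave ways, 1 if not interleaved */
};

/* DPA at which decoder idx starts, walking decoders 0..idx in order. */
static uint64_t decoder_dpa_base(const struct decoder_regs *d, size_t idx)
{
    uint64_t dpa = 0;

    for (size_t i = 0; i <= idx; i++) {
        assert(d[i].ways > 0);
        dpa += d[i].skip;
        if (i == idx) {
            break;
        }
        /* Each device holds only 1/ways of an interleaved HPA range. */
        dpa += d[i].size / d[i].ways;
    }
    return dpa;
}
```

The resulting DPA is what selects the backing memory region: offsets below the volatile backend size fall in the volatile backend, and anything above is rebased into the persistent one.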
hw/cxl/cxl-component-utils.c | 6 ++
hw/cxl/cxl-host.c | 197 +++++++++++++++++++++++++++++++++++
hw/mem/cxl_type3.c | 4 +
include/hw/cxl/cxl.h | 1 +
include/hw/cxl/cxl_device.h | 4 +
5 files changed, 212 insertions(+)
diff --git a/hw/cxl/cxl-component-utils.c b/hw/cxl/cxl-component-utils.c
index 07aabe331c..a624357978 100644
--- a/hw/cxl/cxl-component-utils.c
+++ b/hw/cxl/cxl-component-utils.c
@@ -143,6 +143,12 @@ static void dumb_hdm_handler(CXLComponentState *cxl_cstate, hwaddr offset,
value = FIELD_DP32(value, CXL_HDM_DECODER0_CTRL, COMMITTED, 0);
}
stl_le_p((uint8_t *)cache_mem + offset, value);
+
+ if (should_commit) {
+ cfmws_update_non_interleaved(true);
+ } else if (should_uncommit) {
+ cfmws_update_non_interleaved(false);
+ }
}
static void bi_handler(CXLComponentState *cxl_cstate, hwaddr offset,
diff --git a/hw/cxl/cxl-host.c b/hw/cxl/cxl-host.c
index 2dc9f77007..079b27133b 100644
--- a/hw/cxl/cxl-host.c
+++ b/hw/cxl/cxl-host.c
@@ -264,6 +264,203 @@ static PCIDevice *cxl_cfmws_find_device(CXLFixedWindow *fw, hwaddr addr,
return d;
}
+typedef struct CXLDirectPTState {
+ CXLType3Dev *ct3d;
+ hwaddr decoder_base;
+ hwaddr decoder_size;
+ hwaddr dpa_base;
+ unsigned int hdm_decoder_idx;
+} CXLDirectPTState;
+
+static void cxl_fmws_direct_passthrough_setup(CXLDirectPTState *state,
+ CXLFixedWindow *fw)
+{
+ CXLType3Dev *ct3d = state->ct3d;
+ MemoryRegion *mr = NULL;
+ uint64_t vmr_size = 0, pmr_size = 0, offset = 0;
+ MemoryRegion *direct_mr;
+ g_autofree char *direct_mr_name = NULL;
+ unsigned int idx = state->hdm_decoder_idx;
+
+ if (ct3d->hostvmem) {
+ MemoryRegion *vmr = host_memory_backend_get_memory(ct3d->hostvmem);
+
+ vmr_size = memory_region_size(vmr);
+ if (state->dpa_base < vmr_size) {
+ mr = vmr;
+ offset = state->dpa_base;
+ }
+ }
+ if (!mr && ct3d->hostpmem) {
+ MemoryRegion *pmr = host_memory_backend_get_memory(ct3d->hostpmem);
+
+ pmr_size = memory_region_size(pmr);
+ if (state->dpa_base - vmr_size < pmr_size) {
+ mr = pmr;
+ offset = state->dpa_base - vmr_size;
+ }
+ }
+ if (!mr) {
+ return;
+ }
+
+ if (ct3d->direct_mr_fw[idx]) {
+ return;
+ }
+
+ direct_mr = &ct3d->direct_mr[idx];
+ direct_mr_name = g_strdup_printf("cxl-direct-mapping-alias-%u", idx);
+ if (!direct_mr_name) {
+ return;
+ }
+
+ memory_region_init_alias(direct_mr, OBJECT(ct3d), direct_mr_name, mr,
+ offset, state->decoder_size);
+ memory_region_add_subregion(&fw->mr,
+ state->decoder_base - fw->base, direct_mr);
+ ct3d->direct_mr_fw[idx] = fw;
+}
+
+static void cxl_fmws_direct_passthrough_remove(CXLType3Dev *ct3d,
+ uint64_t decoder_base,
+ unsigned int idx)
+{
+ CXLFixedWindow *owner_fw = ct3d->direct_mr_fw[idx];
+ MemoryRegion *direct_mr = &ct3d->direct_mr[idx];
+
+ if (!owner_fw) {
+ return;
+ }
+
+ if (!memory_region_is_mapped(direct_mr)) {
+ return;
+ }
+
+ if (cxl_cfmws_find_device(owner_fw, decoder_base, false)) {
+ return;
+ }
+
+ memory_region_del_subregion(&owner_fw->mr, direct_mr);
+ object_unparent(OBJECT(direct_mr));
+ ct3d->direct_mr_fw[idx] = NULL;
+}
+
+static int cxl_fmws_direct_passthrough(Object *obj, void *opaque)
+{
+ CXLDirectPTState *state = opaque;
+ CXLFixedWindow *fw;
+
+ if (!object_dynamic_cast(obj, TYPE_CXL_FMW)) {
+ return 0;
+ }
+
+ fw = CXL_FMW(obj);
+
+ /* Verify not interleaved */
+ if (!cxl_cfmws_find_device(fw, state->decoder_base, false)) {
+ return 0;
+ }
+
+ cxl_fmws_direct_passthrough_setup(state, fw);
+
+ return 0;
+}
+
+static int update_non_interleaved(Object *obj, void *opaque)
+{
+ const int hdm_inc = R_CXL_HDM_DECODER1_BASE_LO - R_CXL_HDM_DECODER0_BASE_LO;
+ bool commit = *(bool *)opaque;
+ CXLType3Dev *ct3d;
+ uint32_t *cache_mem;
+ unsigned int hdm_count, i;
+ int interleave_ways_dec;
+ uint32_t cap;
+ uint64_t dpa_base = 0;
+
+ if (!object_dynamic_cast(obj, TYPE_CXL_TYPE3)) {
+ return 0;
+ }
+
+ ct3d = CXL_TYPE3(obj);
+ cache_mem = ct3d->cxl_cstate.crb.cache_mem_registers;
+ cap = ldl_le_p(cache_mem + R_CXL_HDM_DECODER_CAPABILITY);
+ hdm_count = cxl_decoder_count_dec(FIELD_EX32(cap,
+ CXL_HDM_DECODER_CAPABILITY,
+ DECODER_COUNT));
+ for (i = 0; i < hdm_count; i++) {
+ uint64_t decoder_base, decoder_size, skip;
+ uint32_t hdm_ctrl, low, high;
+ int iw, committed;
+
+ hdm_ctrl = ldl_le_p(cache_mem + R_CXL_HDM_DECODER0_CTRL + i * hdm_inc);
+ committed = FIELD_EX32(hdm_ctrl, CXL_HDM_DECODER0_CTRL, COMMITTED);
+
+ /*
+ * Optimization: Looking for a fully committed path; if the type 3 HDM
+ * decoder is not committed, it cannot lie on such a path.
+ */
+ if (commit && !committed) {
+ return 0;
+ }
+
+ low = ldl_le_p(cache_mem + R_CXL_HDM_DECODER0_DPA_SKIP_LO +
+ i * hdm_inc);
+ high = ldl_le_p(cache_mem + R_CXL_HDM_DECODER0_DPA_SKIP_HI +
+ i * hdm_inc);
+ skip = ((uint64_t)high << 32) | (low & 0xf0000000);
+ dpa_base += skip;
+
+ low = ldl_le_p(cache_mem + R_CXL_HDM_DECODER0_SIZE_LO + i * hdm_inc);
+ high = ldl_le_p(cache_mem + R_CXL_HDM_DECODER0_SIZE_HI + i * hdm_inc);
+ decoder_size = ((uint64_t)high << 32) | (low & 0xf0000000);
+
+ low = ldl_le_p(cache_mem + R_CXL_HDM_DECODER0_BASE_LO + i * hdm_inc);
+ high = ldl_le_p(cache_mem + R_CXL_HDM_DECODER0_BASE_HI + i * hdm_inc);
+ decoder_base = ((uint64_t)high << 32) | (low & 0xf0000000);
+
+ iw = FIELD_EX32(hdm_ctrl, CXL_HDM_DECODER0_CTRL, IW);
+
+ if (iw == 0) {
+ if (!commit) {
+ cxl_fmws_direct_passthrough_remove(ct3d, decoder_base, i);
+ } else {
+ CXLDirectPTState state = {
+ .ct3d = ct3d,
+ .decoder_base = decoder_base,
+ .decoder_size = decoder_size,
+ .dpa_base = dpa_base,
+ .hdm_decoder_idx = i,
+ };
+
+ object_child_foreach_recursive(object_get_root(),
+ cxl_fmws_direct_passthrough,
+ &state);
+ }
+ }
+
+ interleave_ways_dec = cxl_interleave_ways_dec(iw, &error_fatal);
+ if (interleave_ways_dec == 0) {
+ return 0;
+ }
+
+ dpa_base += decoder_size / interleave_ways_dec;
+ }
+
+ return 0;
+}
+
+void cfmws_update_non_interleaved(bool commit)
+{
+ /*
+ * Walk endpoints to find both committed and uncommitted decoders,
+ * then check if they are not interleaved (but the path is fully set up).
+ */
+ object_child_foreach_recursive(object_get_root(),
+ update_non_interleaved, &commit);
+
+ return;
+}
+
static MemTxResult cxl_read_cfmws(void *opaque, hwaddr addr, uint64_t *data,
unsigned size, MemTxAttrs attrs)
{
diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 4739239da3..d9fc0bec8f 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -427,6 +427,8 @@ static void hdm_decoder_commit(CXLType3Dev *ct3d, int which)
ctrl = FIELD_DP32(ctrl, CXL_HDM_DECODER0_CTRL, COMMITTED, 1);
stl_le_p(cache_mem + R_CXL_HDM_DECODER0_CTRL + which * hdm_inc, ctrl);
+
+ cfmws_update_non_interleaved(true);
}
static void hdm_decoder_uncommit(CXLType3Dev *ct3d, int which)
@@ -442,6 +444,8 @@ static void hdm_decoder_uncommit(CXLType3Dev *ct3d, int which)
ctrl = FIELD_DP32(ctrl, CXL_HDM_DECODER0_CTRL, COMMITTED, 0);
stl_le_p(cache_mem + R_CXL_HDM_DECODER0_CTRL + which * hdm_inc, ctrl);
+
+ cfmws_update_non_interleaved(false);
}
static int ct3d_qmp_uncor_err_to_cxl(CxlUncorErrorType qmp_err)
diff --git a/include/hw/cxl/cxl.h b/include/hw/cxl/cxl.h
index 998f495a98..d8cd8359d2 100644
--- a/include/hw/cxl/cxl.h
+++ b/include/hw/cxl/cxl.h
@@ -71,4 +71,5 @@ CXLComponentState *cxl_usp_to_cstate(CXLUpstreamPort *usp);
typedef struct CXLDownstreamPort CXLDownstreamPort;
DECLARE_INSTANCE_CHECKER(CXLDownstreamPort, CXL_DSP, TYPE_CXL_DSP)
+void cfmws_update_non_interleaved(bool commit);
#endif
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index 393f312217..ba551fa5f9 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -685,6 +685,8 @@ typedef struct CXLSetFeatureInfo {
size_t data_size;
} CXLSetFeatureInfo;
+typedef struct CXLFixedWindow CXLFixedWindow;
+
struct CXLSanitizeInfo;
typedef struct CXLAlertConfig {
@@ -712,6 +714,8 @@ struct CXLType3Dev {
uint64_t sn;
/* State */
+ MemoryRegion direct_mr[CXL_HDM_DECODER_COUNT];
+ CXLFixedWindow *direct_mr_fw[CXL_HDM_DECODER_COUNT];
AddressSpace hostvmem_as;
AddressSpace hostpmem_as;
CXLComponentState cxl_cstate;
--
2.43.0
* Re: [PATCH v7 0/3] hw/cxl: Add a performant (and correct) path for the non interleaved cases
2026-03-06 12:11 [PATCH v7 0/3] hw/cxl: Add a performant (and correct) path for the non interleaved cases Alireza Sanaee via qemu development
` (2 preceding siblings ...)
2026-03-06 12:11 ` [PATCH v7 3/3] hw/cxl: Add a performant (and correct) path for the non interleaved cases Alireza Sanaee via qemu development
@ 2026-03-06 13:29 ` Alireza Sanaee via qemu development
2026-03-06 15:19 ` Jonathan Cameron via qemu development
2026-03-06 16:16 ` Gregory Price
5 siblings, 0 replies; 12+ messages in thread
From: Alireza Sanaee via qemu development @ 2026-03-06 13:29 UTC
To: qemu-devel, gourry, lizhijian
Cc: anisa.su887, armbru, david, imammedo, jonathan.cameron, linuxarm,
mst, nifan.cxl, peterx, philmd, pbonzini, venkataravis,
xiaoguangrong.eric
On Fri, 6 Mar 2026 12:11:48 +0000
Alireza Sanaee <alireza.sanaee@huawei.com> wrote:
Hi,
> Hey everyone,
>
> This is v7 of performant CXL type 3 regions set:
>
>
> v7 -> v8:
oops, typo. v8 is wrong.
v6 -> v7:
> - Rebased on top of the latest master. Base-commit stated at the end of cover-letter.
> - Thanks to Gregory and Zhijian for testing and feedback. Addressed
> their comments.
> v5 -> v6:
> - Use object_unparent() in the third commit when deleting alias regions.
> - Thanks to Gregory for the suggestion and testing.
> v4 -> v5:
> - Fixed some minor patch style like missing trailing white space and such.
> v3 -> v4:
> - Tear down path changed, given that it is done differently than
> setup.
> - Dropped Gregory's tested-by tag due to tear down changes.
> v2 -> v3:
> - Addressing Zhijian Li. Thanks for the feedback.
> v1 -> v2:
> - Mainly rebase.
>
> ==========================================================
>
> The CXL address to device decoding logic is complex because of the need
> to correctly decode fine grained interleave. The current implementation
> prevents use with KVM where executed instructions may reside in that
> memory and gives very slow performance even in TCG.
>
> In many real cases non interleaved memory configurations are useful and
> for those we can use a more conventional memory region alias allowing
> similar performance to other memory in the system.
>
> Whether this fast path is applicable can be established once the full
> set of HDM decoders has been committed (in whatever order the guest
> decides to commit them). As such a check is performed on each commit /
> uncommit of HDM decoder to establish if the alias should be added or
> removed.
>
>
> Performance numbers:
>
> For a read/write test with 4K block size, 256M region size, and 1 thread
> with 100 iteration on TCG (it should do similar on KVM):
>
> - Non-interleaved region (fast path): 25-30 seconds.
> - Interleaved region (no fast path): Never finishes within 10
> minutes.
>
> Tested Topologies and Region Layouts
> ====================================
>
> This series was validated across multiple CXL topology configurations,
> covering single-device, multi-device, multi-host-bridge, and switched
> fabrics. Region creation was exercised using the `cxl` userspace tool
> with both non-interleaved and interleaved setups.
>
> Decoder and memdev identifiers were discovered using:
>
> cxl list
> cxl list -D
>
> Decoder IDs (e.g. decoder0.0) and memdev names (mem0, mem1) are
> environment-specific. Commands below use placeholders such as
> <decoder_span_both> which should be replaced with IDs from `cxl list -D`.
>
> ---------------------------------------------------------------------
>
> Region Layout Notation
> ----------------------
>
> CFMW (CXL Fixed Memory Window) is shown as a linear address space
> containing regions:
>
> CFMW: [ R0 | R1 | R2 ]
>
> R0, R1, R2 are regions created by `cxl create-region`.
>
> Non-interleaved region:
>
> R0 (ways=1) -> entirely on one device (mem0 or mem1)
> Fast path: APPLICABLE
>
> 2-way interleaved region (g=256):
>
> R1 (ways=2, g=256) striped across devices:
>
> |mem0|mem1|mem0|mem1|mem0|mem1| ...
> 256 256 256 256 256 256 bytes
>
> Fast path: NOT APPLICABLE
>
> ---------------------------------------------------------------------
>
> 1) One device, one host bridge, one fixed window
> ------------------------------------------------
>
> QEMU:
>
> -M q35,cxl=on,cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.size=4G
> -device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=12
> -device cxl-rp,id=rp0,bus=cxl.0,port=0,chassis=0,slot=2
> -object memory-backend-ram,id=mem0,size=512M,share=on
> -device cxl-type3,id=dev0,bus=rp0,memdev=mem0
>
> Topology:
>
> Host
> |
> +-- CXL Host Bridge (cxl.0)
> |
> +-- Root Port (rp0)
> |
> +-- Type-3 (dev0, mem0)
>
> Regions created:
>
> cxl create-region ... -w 1 ... mem0 (Fast path: YES)
> cxl create-region ... -w 1 ... mem0 (Fast path: YES)
>
> Layout:
>
> CFMW: [ R0 | R1 ]
>
> R0 -> mem0 (Fast path: YES)
> R1 -> mem0 (Fast path: YES)
>
> ---------------------------------------------------------------------
>
> 2) One host bridge, two Type-3 devices (via two root ports)
> ------------------------------------------------------------
>
> QEMU:
>
> -M q35,cxl=on,cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.size=4G
> -device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=12
> -device cxl-rp,id=rp0,bus=cxl.0,port=0,chassis=0,slot=2
> -device cxl-rp,id=rp1,bus=cxl.0,port=1,chassis=0,slot=3
> -object memory-backend-ram,id=mem0,size=512M,share=on
> -object memory-backend-ram,id=mem1,size=512M,share=on
> -device cxl-type3,id=dev0,bus=rp0,memdev=mem0
> -device cxl-type3,id=dev1,bus=rp1,memdev=mem1
>
> Topology:
>
> Host
> |
> +-- CXL Host Bridge (cxl.0)
> |
> +-- Root Port (rp0) -- Type-3 (dev0, mem0)
> |
> +-- Root Port (rp1) -- Type-3 (dev1, mem1)
>
> Region patterns exercised:
>
> 2.1 All non-interleaved:
> R0 -> mem0 (Fast path: YES)
> R1 -> mem0 (Fast path: YES)
> R2 -> mem1 (Fast path: YES)
> R3 -> mem1 (Fast path: YES)
>
> 2.2 Interleaved + local:
> R0 -> mem0/mem1 interleaved (Fast path: NO)
> R1 -> mem0 (Fast path: YES)
>
> 2.3 Local + interleaved + local:
> R0 -> mem0 (Fast path: YES)
> R1 -> mem0/mem1 interleaved (Fast path: NO)
> R2 -> mem1 (Fast path: YES)
>
> ---------------------------------------------------------------------
>
> 3) Two host bridges, one device per host bridge
> ------------------------------------------------
>
> QEMU:
>
> -M q35,cxl=on,
> cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.size=4G,
> cxl-fmw.1.targets.0=cxl.1,cxl-fmw.1.size=4G,
> cxl-fmw.2.targets.0=cxl.0,cxl-fmw.2.size=4G
> -device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=12
> -device cxl-rp,id=rp0,bus=cxl.0,port=0,chassis=0,slot=2
> -object memory-backend-ram,id=mem0,size=512M,share=on
> -device cxl-type3,id=dev0,bus=rp0,memdev=mem0
> -device pxb-cxl,id=cxl.1,bus=pcie.0,bus_nr=13
> -device cxl-rp,id=rp1,bus=cxl.1,port=0,chassis=1,slot=2
> -object memory-backend-ram,id=mem1,size=512M,share=on
> -device cxl-type3,id=dev1,bus=rp1,memdev=mem1
>
> Region patterns identical to section 2, and fast-path applicability is
> identical per region mapping (non-interleaved: YES, interleaved: NO).
>
> ---------------------------------------------------------------------
>
> 4) Switch topology
> ------------------
>
> QEMU:
>
> -M q35,cxl=on,cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.size=4G
> -device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=12
> -device cxl-rp,id=rp0,bus=cxl.0,port=0,chassis=0,slot=2
> -device cxl-rp,id=rp1,bus=cxl.0,port=0,chassis=0,slot=3
> -device cxl-upstream,id=us0,bus=rp0
> -device cxl-downstream,id=ds0,bus=us0,port=0,chassis=0,slot=4
> -object memory-backend-ram,id=mem0,size=512M,share=on
> -device cxl-type3,id=dev0,bus=ds0,memdev=mem0
>
> Topology (detailed):
>
> Host
> |
> +-- CXL Host Bridge (cxl.0)
> |
> +-- Root Port (rp0)
> | |
> | +-- CXL Switch (upstream us0)
> | |
> | +-- Downstream Port (ds0) -- Type-3 (mem0)
> | |
> | +-- Downstream Port (ds1) -- Type-3 (mem1) [optional]
> +-- Root Port (rp1)
> |
> +-- More devices/switches.
>
> Fast-path interpretation in this topology:
>
> If only mem0 exists:
> All regions -> Fast path: YES
>
> If mem0 and mem1 exist:
> Non-interleaved regions -> Fast path: YES
> Interleaved regions -> Fast path: NO
>
> ---------------------------------------------------------------------
>
> Summary
> -------
>
> Across all topologies, region creation, enablement, and HDM decoder
> commit/uncommit flows were exercised. The fast path is enabled only when
> all decoders describe a non-interleaved mapping and is removed when any
> interleave configuration is introduced.
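The rule in this summary can be read as a predicate over the decoders on the route from the fixed window to the device. A minimal sketch, assuming a hypothetical Hop type that records only the commit state and decoded interleave ways of each decoder (the series itself reads these from the HDM decoder registers):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of one decoder on the route from window to device. */
typedef struct {
    bool committed;
    unsigned ways;   /* decoded interleave ways, 1 if not interleaved */
} Hop;

/* The alias is usable only if every hop is committed and non-interleaved. */
static bool fast_path_enabled(const Hop *path, size_t n)
{
    assert(path != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!path[i].committed || path[i].ways != 1) {
            return false;
        }
    }
    /* An empty route describes no mapping at all. */
    return n > 0;
}
```

Re-evaluating such a predicate on every commit and uncommit is what lets the guest program the decoders in any order.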
>
> Alireza Sanaee (3):
> hw/cxl: Use HPA in cxl_cfmws_find_device() rather than offset in
> window.
> hw/cxl: Allow cxl_cfmws_find_device() to filter on whether interleaved
> paths are accepted
> hw/cxl: Add a performant (and correct) path for the non interleaved
> cases
>
> hw/cxl/cxl-component-utils.c | 6 +
> hw/cxl/cxl-host.c | 231 +++++++++++++++++++++++++++++++++--
> hw/mem/cxl_type3.c | 4 +
> include/hw/cxl/cxl.h | 1 +
> include/hw/cxl/cxl_device.h | 4 +
> 5 files changed, 237 insertions(+), 9 deletions(-)
>
>
> base-commit: 483cb5b74cd247b1520e0994b4fae4d8fe44cb00
* Re: [PATCH v7 2/3] hw/cxl: Allow cxl_cfmws_find_device() to filter on whether interleaved paths are accepted
2026-03-06 12:11 ` [PATCH v7 2/3] hw/cxl: Allow cxl_cfmws_find_device() to filter on whether interleaved paths are accepted Alireza Sanaee via qemu development
@ 2026-03-06 15:12 ` Jonathan Cameron via qemu development
2026-03-06 16:15 ` Gregory Price
1 sibling, 0 replies; 12+ messages in thread
From: Jonathan Cameron via qemu development @ 2026-03-06 15:12 UTC (permalink / raw)
To: Alireza Sanaee
Cc: qemu-devel, gourry, lizhijian, anisa.su887, armbru, david,
imammedo, linuxarm, mst, nifan.cxl, peterx, philmd, pbonzini,
venkataravis, xiaoguangrong.eric
On Fri, 6 Mar 2026 12:11:50 +0000
Alireza Sanaee <alireza.sanaee@huawei.com> wrote:
> Extend cxl_cfmws_find_device() with a parameter that filters on whether the
> address lies in an interleaved range. For now all callers accept
> interleave configurations so no functional changes.
>
> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
> Signed-off-by: Alireza Sanaee <alireza.sanaee@huawei.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
* Re: [PATCH v7 3/3] hw/cxl: Add a performant (and correct) path for the non interleaved cases
2026-03-06 12:11 ` [PATCH v7 3/3] hw/cxl: Add a performant (and correct) path for the non interleaved cases Alireza Sanaee via qemu development
@ 2026-03-06 15:15 ` Jonathan Cameron via qemu development
2026-03-06 16:16 ` Gregory Price
1 sibling, 0 replies; 12+ messages in thread
From: Jonathan Cameron via qemu development @ 2026-03-06 15:15 UTC (permalink / raw)
To: Alireza Sanaee
Cc: qemu-devel, gourry, lizhijian, anisa.su887, armbru, david,
imammedo, linuxarm, mst, nifan.cxl, peterx, philmd, pbonzini,
venkataravis, xiaoguangrong.eric
On Fri, 6 Mar 2026 12:11:51 +0000
Alireza Sanaee <alireza.sanaee@huawei.com> wrote:
> The CXL address to device decoding logic is complex because of the need to
> correctly decode fine grained interleave. The current implementation
> prevents use with KVM where executed instructions may reside in that memory
> and gives very slow performance even in TCG.
>
> In many real cases non interleaved memory configurations are useful and for
> those we can use a more conventional memory region alias allowing similar
> performance to other memory in the system.
>
> Whether this fast path is applicable can be established once the full set
> of HDM decoders has been committed (in whatever order the guest decides to
> commit them). As such a check is performed on each commit/uncommit of HDM
> decoder to establish if the alias should be added or removed.
>
> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
> Co-developed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Signed-off-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Changes look good to me. Thanks for the quick turn around.
Jonathan
> Signed-off-by: Alireza Sanaee <alireza.sanaee@huawei.com>
* Re: [PATCH v7 0/3] hw/cxl: Add a performant (and correct) path for the non interleaved cases
2026-03-06 12:11 [PATCH v7 0/3] hw/cxl: Add a performant (and correct) path for the non interleaved cases Alireza Sanaee via qemu development
` (3 preceding siblings ...)
2026-03-06 13:29 ` [PATCH v7 0/3] " Alireza Sanaee via qemu development
@ 2026-03-06 15:19 ` Jonathan Cameron via qemu development
2026-03-06 16:16 ` Gregory Price
5 siblings, 0 replies; 12+ messages in thread
From: Jonathan Cameron via qemu development @ 2026-03-06 15:19 UTC (permalink / raw)
To: Alireza Sanaee, mst
Cc: qemu-devel, gourry, lizhijian, anisa.su887, armbru, david,
imammedo, linuxarm, nifan.cxl, peterx, philmd, pbonzini,
venkataravis, xiaoguangrong.eric
On Fri, 6 Mar 2026 12:11:48 +0000
Alireza Sanaee <alireza.sanaee@huawei.com> wrote:
> Hey everyone,
Michael,
Assuming all other feedback that might come in is good, I consider this series
ready for upstream. So if you happen to do another pull request before
the soft freeze feel free to pick it up directly.
If not I'll carry it on my staging tree and post a rebased version next cycle.
Otherwise for CXL stuff there are a few fixes being revised. I don't currently
see anything new other than this as being ready for upstream.
Thanks for picking up so much earlier this cycle. That has made things
a lot more manageable!
Thanks,
Jonathan
* Re: [PATCH v7 1/3] hw/cxl: Use HPA in cxl_cfmws_find_device() rather than offset in window.
2026-03-06 12:11 ` [PATCH v7 1/3] hw/cxl: Use HPA in cxl_cfmws_find_device() rather than offset in window Alireza Sanaee via qemu development
@ 2026-03-06 16:15 ` Gregory Price
0 siblings, 0 replies; 12+ messages in thread
From: Gregory Price @ 2026-03-06 16:15 UTC (permalink / raw)
To: Alireza Sanaee
Cc: qemu-devel, lizhijian, anisa.su887, armbru, david, imammedo,
jonathan.cameron, linuxarm, mst, nifan.cxl, peterx, philmd,
pbonzini, venkataravis, xiaoguangrong.eric
On Fri, Mar 06, 2026 at 12:11:49PM +0000, Alireza Sanaee wrote:
> This function will shortly be used to help find if there is a route to a
> device, serving an HPA, under a particular fixed memory window. Rather than
> having that new use case subtract the base address in the caller, only to
> add it again in cxl_cfmws_find_device(), push the responsibility for
> calculating the HPA to the caller.
>
> This also reduces the inconsistency in the meaning of the hwaddr addr
> parameter between this function and the calls made within it that access
> the HDM decoders that operate on HPA.
>
> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Signed-off-by: Alireza Sanaee <alireza.sanaee@huawei.com>
Reviewed-by: Gregory Price <gourry@gourry.net>
Tested-by: Gregory Price <gourry@gourry.net>
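To make the calling convention concrete, here is a minimal sketch under simplified assumptions. Window and DecoderRange are hypothetical stand-ins, not QEMU types: the MMIO callback converts its window-relative offset to an HPA once, and the lookup then compares that HPA directly against decoder ranges, which are programmed in HPA terms.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the fixed window; only base and size matter. */
typedef struct {
    uint64_t base;
    uint64_t size;
} Window;

/* Hypothetical stand-in for an HDM decoder range, programmed as HPA. */
typedef struct {
    uint64_t base;
    uint64_t size;
} DecoderRange;

/* The lookup takes an HPA, so it can test decoder ranges without fixups. */
static bool decoder_claims_hpa(const DecoderRange *d, uint64_t hpa)
{
    return hpa >= d->base && hpa - d->base < d->size;
}

/* Read/write callbacks receive a window offset and convert it once. */
static uint64_t window_offset_to_hpa(const Window *w, uint64_t offset)
{
    assert(offset < w->size);
    return w->base + offset;
}
```

A caller that already holds an HPA, such as a decoder base read from the registers, can skip the conversion and call the lookup directly.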
> ---
> Thanks to Li for the tag.
> Change log:
> v6->v7: No change.
>
> hw/cxl/cxl-host.c | 7 ++-----
> 1 file changed, 2 insertions(+), 5 deletions(-)
>
> diff --git a/hw/cxl/cxl-host.c b/hw/cxl/cxl-host.c
> index f3479b1991..a94b893e99 100644
> --- a/hw/cxl/cxl-host.c
> +++ b/hw/cxl/cxl-host.c
> @@ -168,9 +168,6 @@ static PCIDevice *cxl_cfmws_find_device(CXLFixedWindow *fw, hwaddr addr)
> bool target_found;
> PCIDevice *rp, *d;
>
> - /* Address is relative to memory region. Convert to HPA */
> - addr += fw->base;
> -
> rb_index = (addr / cxl_decode_ig(fw->enc_int_gran)) % fw->num_targets;
> hb = PCI_HOST_BRIDGE(fw->target_hbs[rb_index]->cxl_host_bridge);
> if (!hb || !hb->bus || !pci_bus_is_cxl(hb->bus)) {
> @@ -254,7 +251,7 @@ static MemTxResult cxl_read_cfmws(void *opaque, hwaddr addr, uint64_t *data,
> CXLFixedWindow *fw = opaque;
> PCIDevice *d;
>
> - d = cxl_cfmws_find_device(fw, addr);
> + d = cxl_cfmws_find_device(fw, addr + fw->base);
> if (d == NULL) {
> *data = 0;
> /* Reads to invalid address return poison */
> @@ -271,7 +268,7 @@ static MemTxResult cxl_write_cfmws(void *opaque, hwaddr addr,
> CXLFixedWindow *fw = opaque;
> PCIDevice *d;
>
> - d = cxl_cfmws_find_device(fw, addr);
> + d = cxl_cfmws_find_device(fw, addr + fw->base);
> if (d == NULL) {
> /* Writes to invalid address are silent */
> return MEMTX_OK;
> --
> 2.43.0
>
* Re: [PATCH v7 2/3] hw/cxl: Allow cxl_cfmws_find_device() to filter on whether interleaved paths are accepted
2026-03-06 12:11 ` [PATCH v7 2/3] hw/cxl: Allow cxl_cfmws_find_device() to filter on whether interleaved paths are accepted Alireza Sanaee via qemu development
2026-03-06 15:12 ` Jonathan Cameron via qemu development
@ 2026-03-06 16:15 ` Gregory Price
1 sibling, 0 replies; 12+ messages in thread
From: Gregory Price @ 2026-03-06 16:15 UTC (permalink / raw)
To: Alireza Sanaee
Cc: qemu-devel, lizhijian, anisa.su887, armbru, david, imammedo,
jonathan.cameron, linuxarm, mst, nifan.cxl, peterx, philmd,
pbonzini, venkataravis, xiaoguangrong.eric
On Fri, Mar 06, 2026 at 12:11:50PM +0000, Alireza Sanaee wrote:
> Extend cxl_cfmws_find_device() with a parameter that filters on whether the
> address lies in an interleaved range. For now all callers accept
> interleave configurations so no functional changes.
>
> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
> Signed-off-by: Alireza Sanaee <alireza.sanaee@huawei.com>
Reviewed-by: Gregory Price <gourry@gourry.net>
Tested-by: Gregory Price <gourry@gourry.net>
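The filter can be illustrated in isolation as follows. This is a sketch, not the patch: Decoder, decoder_find_target() and route() are hypothetical simplifications of the register-based code, limited to power-of-two ways as in the quoted 1 << iw_enc expression.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified decoder; this is not QEMU's register layout. */
typedef struct {
    uint64_t base;
    uint64_t size;
    uint64_t granularity;   /* stripe size in bytes */
    uint8_t iw_enc;         /* encoded ways; 0 means a single target */
    uint8_t targets[4];     /* port number per interleave position */
} Decoder;

/*
 * Find the target port for hpa. If interleaved is non-NULL, report whether
 * the matching decoder stripes across more than one target.
 */
static bool decoder_find_target(const Decoder *d, uint64_t hpa,
                                uint8_t *target, bool *interleaved)
{
    assert(d->granularity > 0 && d->iw_enc <= 2);
    if (hpa < d->base || hpa - d->base >= d->size) {
        return false;
    }
    if (interleaved) {
        *interleaved = d->iw_enc != 0;
    }
    *target = d->targets[(hpa / d->granularity) % (1u << d->iw_enc)];
    return true;
}

/* Returns the target port, or -1 if there is no route the caller accepts. */
static int route(const Decoder *d, uint64_t hpa, bool allow_interleave)
{
    uint8_t target;
    bool interleaved;

    if (!decoder_find_target(d, hpa, &target, &interleaved)) {
        return -1;
    }
    if (interleaved && !allow_interleave) {
        return -1;
    }
    return target;
}
```

Passing allow_interleave as true keeps the existing behaviour, which matches the "no functional changes" note in the commit message.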
> ---
> Thanks to Li for the tag.
>
> Change log:
> v6->v7: No change!
> hw/cxl/cxl-host.c | 33 ++++++++++++++++++++++++++-------
> 1 file changed, 26 insertions(+), 7 deletions(-)
>
> diff --git a/hw/cxl/cxl-host.c b/hw/cxl/cxl-host.c
> index a94b893e99..2dc9f77007 100644
> --- a/hw/cxl/cxl-host.c
> +++ b/hw/cxl/cxl-host.c
> @@ -104,7 +104,7 @@ void cxl_fmws_link_targets(Error **errp)
> }
>
> static bool cxl_hdm_find_target(uint32_t *cache_mem, hwaddr addr,
> - uint8_t *target)
> + uint8_t *target, bool *interleaved)
> {
> int hdm_inc = R_CXL_HDM_DECODER1_BASE_LO - R_CXL_HDM_DECODER0_BASE_LO;
> unsigned int hdm_count;
> @@ -138,6 +138,11 @@ static bool cxl_hdm_find_target(uint32_t *cache_mem, hwaddr addr,
> found = true;
> ig_enc = FIELD_EX32(ctrl, CXL_HDM_DECODER0_CTRL, IG);
> iw_enc = FIELD_EX32(ctrl, CXL_HDM_DECODER0_CTRL, IW);
> +
> + if (interleaved) {
> + *interleaved = iw_enc != 0;
> + }
> +
> target_idx = (addr / cxl_decode_ig(ig_enc)) % (1 << iw_enc);
>
> if (target_idx < 4) {
> @@ -157,7 +162,8 @@ static bool cxl_hdm_find_target(uint32_t *cache_mem, hwaddr addr,
> return found;
> }
>
> -static PCIDevice *cxl_cfmws_find_device(CXLFixedWindow *fw, hwaddr addr)
> +static PCIDevice *cxl_cfmws_find_device(CXLFixedWindow *fw, hwaddr addr,
> + bool allow_interleave)
> {
> CXLComponentState *hb_cstate, *usp_cstate;
> PCIHostState *hb;
> @@ -165,9 +171,13 @@ static PCIDevice *cxl_cfmws_find_device(CXLFixedWindow *fw, hwaddr addr)
> int rb_index;
> uint32_t *cache_mem;
> uint8_t target;
> - bool target_found;
> + bool target_found, interleaved;
> PCIDevice *rp, *d;
>
> + if ((fw->num_targets > 1) && !allow_interleave) {
> + return NULL;
> + }
> +
> rb_index = (addr / cxl_decode_ig(fw->enc_int_gran)) % fw->num_targets;
> hb = PCI_HOST_BRIDGE(fw->target_hbs[rb_index]->cxl_host_bridge);
> if (!hb || !hb->bus || !pci_bus_is_cxl(hb->bus)) {
> @@ -187,11 +197,16 @@ static PCIDevice *cxl_cfmws_find_device(CXLFixedWindow *fw, hwaddr addr)
>
> cache_mem = hb_cstate->crb.cache_mem_registers;
>
> - target_found = cxl_hdm_find_target(cache_mem, addr, &target);
> + target_found = cxl_hdm_find_target(cache_mem, addr, &target,
> + &interleaved);
> if (!target_found) {
> return NULL;
> }
>
> + if (interleaved && !allow_interleave) {
> + return NULL;
> + }
> +
> rp = pcie_find_port_by_pn(hb->bus, target);
> if (!rp) {
> return NULL;
> @@ -223,11 +238,15 @@ static PCIDevice *cxl_cfmws_find_device(CXLFixedWindow *fw, hwaddr addr)
>
> cache_mem = usp_cstate->crb.cache_mem_registers;
>
> - target_found = cxl_hdm_find_target(cache_mem, addr, &target);
> + target_found = cxl_hdm_find_target(cache_mem, addr, &target, &interleaved);
> if (!target_found) {
> return NULL;
> }
>
> + if (interleaved && !allow_interleave) {
> + return NULL;
> + }
> +
> d = pcie_find_port_by_pn(&PCI_BRIDGE(d)->sec_bus, target);
> if (!d) {
> return NULL;
> @@ -251,7 +270,7 @@ static MemTxResult cxl_read_cfmws(void *opaque, hwaddr addr, uint64_t *data,
> CXLFixedWindow *fw = opaque;
> PCIDevice *d;
>
> - d = cxl_cfmws_find_device(fw, addr + fw->base);
> + d = cxl_cfmws_find_device(fw, addr + fw->base, true);
> if (d == NULL) {
> *data = 0;
> /* Reads to invalid address return poison */
> @@ -268,7 +287,7 @@ static MemTxResult cxl_write_cfmws(void *opaque, hwaddr addr,
> CXLFixedWindow *fw = opaque;
> PCIDevice *d;
>
> - d = cxl_cfmws_find_device(fw, addr + fw->base);
> + d = cxl_cfmws_find_device(fw, addr + fw->base, true);
> if (d == NULL) {
> /* Writes to invalid address are silent */
> return MEMTX_OK;
> --
> 2.43.0
>
* Re: [PATCH v7 3/3] hw/cxl: Add a performant (and correct) path for the non interleaved cases
2026-03-06 12:11 ` [PATCH v7 3/3] hw/cxl: Add a performant (and correct) path for the non interleaved cases Alireza Sanaee via qemu development
2026-03-06 15:15 ` Jonathan Cameron via qemu development
@ 2026-03-06 16:16 ` Gregory Price
1 sibling, 0 replies; 12+ messages in thread
From: Gregory Price @ 2026-03-06 16:16 UTC (permalink / raw)
To: Alireza Sanaee
Cc: qemu-devel, lizhijian, anisa.su887, armbru, david, imammedo,
jonathan.cameron, linuxarm, mst, nifan.cxl, peterx, philmd,
pbonzini, venkataravis, xiaoguangrong.eric
On Fri, Mar 06, 2026 at 12:11:51PM +0000, Alireza Sanaee wrote:
> The CXL address to device decoding logic is complex because of the need to
> correctly decode fine grained interleave. The current implementation
> prevents use with KVM where executed instructions may reside in that memory
> and gives very slow performance even in TCG.
>
> In many real cases non interleaved memory configurations are useful and for
> those we can use a more conventional memory region alias allowing similar
> performance to other memory in the system.
>
> Whether this fast path is applicable can be established once the full set
> of HDM decoders has been committed (in whatever order the guest decides to
> commit them). As such a check is performed on each commit/uncommit of HDM
> decoder to establish if the alias should be added or removed.
>
> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
> Co-developed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Signed-off-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Signed-off-by: Alireza Sanaee <alireza.sanaee@huawei.com>
Reviewed-by: Gregory Price <gourry@gourry.net>
Tested-by: Gregory Price <gourry@gourry.net>
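One detail worth spelling out is how the alias offset into the backing memory is derived. A minimal sketch of the DPA accumulation performed by the loop in update_non_interleaved(), using a hypothetical EndpointDecoder type in place of the register reads:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one endpoint HDM decoder. */
typedef struct {
    uint64_t skip;   /* DPA skipped before this decoder */
    uint64_t size;   /* size of the HPA range decoded */
    unsigned ways;   /* decoded interleave ways, 1 if not interleaved */
} EndpointDecoder;

/*
 * Device physical address at which decoder idx starts. Decoders are walked
 * in order: each earlier one contributes its skip and then consumes
 * size / ways bytes of device capacity.
 */
static uint64_t decoder_dpa_base(const EndpointDecoder *dec, size_t idx)
{
    uint64_t dpa_base = 0;

    for (size_t i = 0; i < idx; i++) {
        assert(dec[i].ways > 0);
        dpa_base += dec[i].skip;
        dpa_base += dec[i].size / dec[i].ways;
    }
    return dpa_base + dec[idx].skip;
}
```

The result is the DPA that the setup function compares against the volatile and persistent backend sizes to pick the memory region and the offset to alias.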
> ---
> Thanks to Gregory, and Li Zhijian for their feedback.
> v6->v7:
> - Fixed a div by zero situation in the code for interleaved_ways_dec func.
> - Changed the signature of cfmws_update_non_interleaved function to void.
>
> hw/cxl/cxl-component-utils.c | 6 ++
> hw/cxl/cxl-host.c | 197 +++++++++++++++++++++++++++++++++++
> hw/mem/cxl_type3.c | 4 +
> include/hw/cxl/cxl.h | 1 +
> include/hw/cxl/cxl_device.h | 4 +
> 5 files changed, 212 insertions(+)
>
> diff --git a/hw/cxl/cxl-component-utils.c b/hw/cxl/cxl-component-utils.c
> index 07aabe331c..a624357978 100644
> --- a/hw/cxl/cxl-component-utils.c
> +++ b/hw/cxl/cxl-component-utils.c
> @@ -143,6 +143,12 @@ static void dumb_hdm_handler(CXLComponentState *cxl_cstate, hwaddr offset,
> value = FIELD_DP32(value, CXL_HDM_DECODER0_CTRL, COMMITTED, 0);
> }
> stl_le_p((uint8_t *)cache_mem + offset, value);
> +
> + if (should_commit) {
> + cfmws_update_non_interleaved(true);
> + } else if (should_uncommit) {
> + cfmws_update_non_interleaved(false);
> + }
> }
>
> static void bi_handler(CXLComponentState *cxl_cstate, hwaddr offset,
> diff --git a/hw/cxl/cxl-host.c b/hw/cxl/cxl-host.c
> index 2dc9f77007..079b27133b 100644
> --- a/hw/cxl/cxl-host.c
> +++ b/hw/cxl/cxl-host.c
> @@ -264,6 +264,203 @@ static PCIDevice *cxl_cfmws_find_device(CXLFixedWindow *fw, hwaddr addr,
> return d;
> }
>
> +typedef struct CXLDirectPTState {
> + CXLType3Dev *ct3d;
> + hwaddr decoder_base;
> + hwaddr decoder_size;
> + hwaddr dpa_base;
> + unsigned int hdm_decoder_idx;
> +} CXLDirectPTState;
> +
> +static void cxl_fmws_direct_passthrough_setup(CXLDirectPTState *state,
> + CXLFixedWindow *fw)
> +{
> + CXLType3Dev *ct3d = state->ct3d;
> + MemoryRegion *mr = NULL;
> + uint64_t vmr_size = 0, pmr_size = 0, offset = 0;
> + MemoryRegion *direct_mr;
> + g_autofree char *direct_mr_name = NULL;
> + unsigned int idx = state->hdm_decoder_idx;
> +
> + if (ct3d->hostvmem) {
> + MemoryRegion *vmr = host_memory_backend_get_memory(ct3d->hostvmem);
> +
> + vmr_size = memory_region_size(vmr);
> + if (state->dpa_base < vmr_size) {
> + mr = vmr;
> + offset = state->dpa_base;
> + }
> + }
> + if (!mr && ct3d->hostpmem) {
> + MemoryRegion *pmr = host_memory_backend_get_memory(ct3d->hostpmem);
> +
> + pmr_size = memory_region_size(pmr);
> + if (state->dpa_base - vmr_size < pmr_size) {
> + mr = pmr;
> + offset = state->dpa_base - vmr_size;
> + }
> + }
> + if (!mr) {
> + return;
> + }
> +
> + if (ct3d->direct_mr_fw[idx]) {
> + return;
> + }
> +
> + direct_mr = &ct3d->direct_mr[idx];
> + direct_mr_name = g_strdup_printf("cxl-direct-mapping-alias-%u", idx);
> + if (!direct_mr_name) {
> + return;
> + }
> +
> + memory_region_init_alias(direct_mr, OBJECT(ct3d), direct_mr_name, mr,
> + offset, state->decoder_size);
> + memory_region_add_subregion(&fw->mr,
> + state->decoder_base - fw->base, direct_mr);
> + ct3d->direct_mr_fw[idx] = fw;
> +}
> +
> +static void cxl_fmws_direct_passthrough_remove(CXLType3Dev *ct3d,
> + uint64_t decoder_base,
> + unsigned int idx)
> +{
> + CXLFixedWindow *owner_fw = ct3d->direct_mr_fw[idx];
> + MemoryRegion *direct_mr = &ct3d->direct_mr[idx];
> +
> + if (!owner_fw) {
> + return;
> + }
> +
> + if (!memory_region_is_mapped(direct_mr)) {
> + return;
> + }
> +
> + if (cxl_cfmws_find_device(owner_fw, decoder_base, false)) {
> + return;
> + }
> +
> + memory_region_del_subregion(&owner_fw->mr, direct_mr);
> + object_unparent(OBJECT(direct_mr));
> + ct3d->direct_mr_fw[idx] = NULL;
> +}
> +
> +static int cxl_fmws_direct_passthrough(Object *obj, void *opaque)
> +{
> + CXLDirectPTState *state = opaque;
> + CXLFixedWindow *fw;
> +
> + if (!object_dynamic_cast(obj, TYPE_CXL_FMW)) {
> + return 0;
> + }
> +
> + fw = CXL_FMW(obj);
> +
> + /* Verify not interleaved */
> + if (!cxl_cfmws_find_device(fw, state->decoder_base, false)) {
> + return 0;
> + }
> +
> + cxl_fmws_direct_passthrough_setup(state, fw);
> +
> + return 0;
> +}
> +
> +static int update_non_interleaved(Object *obj, void *opaque)
> +{
> + const int hdm_inc = R_CXL_HDM_DECODER1_BASE_LO - R_CXL_HDM_DECODER0_BASE_LO;
> + bool commit = *(bool *)opaque;
> + CXLType3Dev *ct3d;
> + uint32_t *cache_mem;
> + unsigned int hdm_count, i;
> + int interleave_ways_dec;
> + uint32_t cap;
> + uint64_t dpa_base = 0;
> +
> + if (!object_dynamic_cast(obj, TYPE_CXL_TYPE3)) {
> + return 0;
> + }
> +
> + ct3d = CXL_TYPE3(obj);
> + cache_mem = ct3d->cxl_cstate.crb.cache_mem_registers;
> + cap = ldl_le_p(cache_mem + R_CXL_HDM_DECODER_CAPABILITY);
> + hdm_count = cxl_decoder_count_dec(FIELD_EX32(cap,
> + CXL_HDM_DECODER_CAPABILITY,
> + DECODER_COUNT));
> + for (i = 0; i < hdm_count; i++) {
> + uint64_t decoder_base, decoder_size, skip;
> + uint32_t hdm_ctrl, low, high;
> + int iw, committed;
> +
> + hdm_ctrl = ldl_le_p(cache_mem + R_CXL_HDM_DECODER0_CTRL + i * hdm_inc);
> + committed = FIELD_EX32(hdm_ctrl, CXL_HDM_DECODER0_CTRL, COMMITTED);
> +
> + /*
> + * Optimization: Looking for a fully committed path; if the type 3 HDM
> + * decoder is not committed, it cannot lie on such a path.
> + */
> + if (commit && !committed) {
> + return 0;
> + }
> +
> + low = ldl_le_p(cache_mem + R_CXL_HDM_DECODER0_DPA_SKIP_LO +
> + i * hdm_inc);
> + high = ldl_le_p(cache_mem + R_CXL_HDM_DECODER0_DPA_SKIP_HI +
> + i * hdm_inc);
> + skip = ((uint64_t)high << 32) | (low & 0xf0000000);
> + dpa_base += skip;
> +
> + low = ldl_le_p(cache_mem + R_CXL_HDM_DECODER0_SIZE_LO + i * hdm_inc);
> + high = ldl_le_p(cache_mem + R_CXL_HDM_DECODER0_SIZE_HI + i * hdm_inc);
> + decoder_size = ((uint64_t)high << 32) | (low & 0xf0000000);
> +
> + low = ldl_le_p(cache_mem + R_CXL_HDM_DECODER0_BASE_LO + i * hdm_inc);
> + high = ldl_le_p(cache_mem + R_CXL_HDM_DECODER0_BASE_HI + i * hdm_inc);
> + decoder_base = ((uint64_t)high << 32) | (low & 0xf0000000);
> +
> + iw = FIELD_EX32(hdm_ctrl, CXL_HDM_DECODER0_CTRL, IW);
> +
> + if (iw == 0) {
> + if (!commit) {
> + cxl_fmws_direct_passthrough_remove(ct3d, decoder_base, i);
> + } else {
> + CXLDirectPTState state = {
> + .ct3d = ct3d,
> + .decoder_base = decoder_base,
> + .decoder_size = decoder_size,
> + .dpa_base = dpa_base,
> + .hdm_decoder_idx = i,
> + };
> +
> + object_child_foreach_recursive(object_get_root(),
> + cxl_fmws_direct_passthrough,
> + &state);
> + }
> + }
> +
> + interleave_ways_dec = cxl_interleave_ways_dec(iw, &error_fatal);
> + if (interleave_ways_dec == 0) {
> + return 0;
> + }
> +
> + dpa_base += decoder_size / interleave_ways_dec;
> + }
> +
> + return 0;
> +}
> +
> +void cfmws_update_non_interleaved(bool commit)
> +{
> + /*
> + * Walk endpoints to find both committed and uncommitted decoders,
> + * then check if they are not interleaved (but the path is fully set up).
> + */
> + object_child_foreach_recursive(object_get_root(),
> + update_non_interleaved, &commit);
> +
> + return;
> +}
> +
> static MemTxResult cxl_read_cfmws(void *opaque, hwaddr addr, uint64_t *data,
> unsigned size, MemTxAttrs attrs)
> {
> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index 4739239da3..d9fc0bec8f 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c
> @@ -427,6 +427,8 @@ static void hdm_decoder_commit(CXLType3Dev *ct3d, int which)
> ctrl = FIELD_DP32(ctrl, CXL_HDM_DECODER0_CTRL, COMMITTED, 1);
>
> stl_le_p(cache_mem + R_CXL_HDM_DECODER0_CTRL + which * hdm_inc, ctrl);
> +
> + cfmws_update_non_interleaved(true);
> }
>
> static void hdm_decoder_uncommit(CXLType3Dev *ct3d, int which)
> @@ -442,6 +444,8 @@ static void hdm_decoder_uncommit(CXLType3Dev *ct3d, int which)
> ctrl = FIELD_DP32(ctrl, CXL_HDM_DECODER0_CTRL, COMMITTED, 0);
>
> stl_le_p(cache_mem + R_CXL_HDM_DECODER0_CTRL + which * hdm_inc, ctrl);
> +
> + cfmws_update_non_interleaved(false);
> }
>
> static int ct3d_qmp_uncor_err_to_cxl(CxlUncorErrorType qmp_err)
> diff --git a/include/hw/cxl/cxl.h b/include/hw/cxl/cxl.h
> index 998f495a98..d8cd8359d2 100644
> --- a/include/hw/cxl/cxl.h
> +++ b/include/hw/cxl/cxl.h
> @@ -71,4 +71,5 @@ CXLComponentState *cxl_usp_to_cstate(CXLUpstreamPort *usp);
> typedef struct CXLDownstreamPort CXLDownstreamPort;
> DECLARE_INSTANCE_CHECKER(CXLDownstreamPort, CXL_DSP, TYPE_CXL_DSP)
>
> +void cfmws_update_non_interleaved(bool commit);
> #endif
> diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
> index 393f312217..ba551fa5f9 100644
> --- a/include/hw/cxl/cxl_device.h
> +++ b/include/hw/cxl/cxl_device.h
> @@ -685,6 +685,8 @@ typedef struct CXLSetFeatureInfo {
> size_t data_size;
> } CXLSetFeatureInfo;
>
> +typedef struct CXLFixedWindow CXLFixedWindow;
> +
> struct CXLSanitizeInfo;
>
> typedef struct CXLAlertConfig {
> @@ -712,6 +714,8 @@ struct CXLType3Dev {
> uint64_t sn;
>
> /* State */
> + MemoryRegion direct_mr[CXL_HDM_DECODER_COUNT];
> + CXLFixedWindow *direct_mr_fw[CXL_HDM_DECODER_COUNT];
> AddressSpace hostvmem_as;
> AddressSpace hostpmem_as;
> CXLComponentState cxl_cstate;
> --
> 2.43.0
>
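For readers following the DPA bookkeeping in update_non_interleaved() above, here is a minimal standalone sketch of the same walk. It is not QEMU code: struct hdm_regs, hdm_reg64(), interleave_ways_dec() and dpa_base_of() are hypothetical names made up for illustration, and the register block is replaced by a plain array. The 0xf0000000 mask and the order of operations (add skip, use the base, then add size divided by ways) are taken from the quoted patch; the interleave-ways table is my understanding of the CXL encoding and should be checked against the spec.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the register values of one HDM decoder. */
struct hdm_regs {
    uint32_t skip_lo, skip_hi;   /* DPA skip, low/high halves */
    uint32_t size_lo, size_hi;   /* decoder size, low/high halves */
    uint8_t iw;                  /* encoded interleave ways */
};

/* Combine a LO/HI pair; only bits 31:28 of the low half are address bits. */
uint64_t hdm_reg64(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | (lo & 0xf0000000u);
}

/* Decode the interleave-ways field; returns 0 for a reserved encoding. */
int interleave_ways_dec(uint8_t iw)
{
    switch (iw) {
    case 0x0: return 1;
    case 0x1: return 2;
    case 0x2: return 4;
    case 0x3: return 8;
    case 0x4: return 16;
    case 0x8: return 3;
    case 0x9: return 6;
    case 0xa: return 12;
    default:  return 0;
    }
}

/*
 * DPA base of decoder 'which': each earlier decoder contributes its skip
 * plus its share of the region (size / ways); the target decoder only
 * contributes its own skip. Returns UINT64_MAX on a reserved encoding.
 */
uint64_t dpa_base_of(const struct hdm_regs *d, size_t which)
{
    uint64_t dpa_base = 0;

    assert(d != NULL);
    for (size_t i = 0; i <= which; i++) {
        dpa_base += hdm_reg64(d[i].skip_lo, d[i].skip_hi);
        if (i == which) {
            break;
        }
        int ways = interleave_ways_dec(d[i].iw);
        if (ways == 0) {
            return UINT64_MAX;
        }
        dpa_base += hdm_reg64(d[i].size_lo, d[i].size_hi) / ways;
    }
    return dpa_base;
}
```

With two decoders, where the first is 256 MiB non-interleaved and the second has a 256 MiB skip, the second decoder's DPA base would come out at 512 MiB. In the patch, only decoders with iw == 0 are candidates for the direct alias.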
* Re: [PATCH v7 0/3] hw/cxl: Add a performant (and correct) path for the non interleaved cases
2026-03-06 12:11 [PATCH v7 0/3] hw/cxl: Add a performant (and correct) path for the non interleaved cases Alireza Sanaee via qemu development
` (4 preceding siblings ...)
2026-03-06 15:19 ` Jonathan Cameron via qemu development
@ 2026-03-06 16:16 ` Gregory Price
5 siblings, 0 replies; 12+ messages in thread
From: Gregory Price @ 2026-03-06 16:16 UTC (permalink / raw)
To: Alireza Sanaee
Cc: qemu-devel, lizhijian, anisa.su887, armbru, david, imammedo,
jonathan.cameron, linuxarm, mst, nifan.cxl, peterx, philmd,
pbonzini, venkataravis, xiaoguangrong.eric
On Fri, Mar 06, 2026 at 12:11:48PM +0000, Alireza Sanaee wrote:
> Hey everyone,
>
> This is v7 of performant CXL type 3 regions set:
>
Just wanted to thank you again for this, it has really helped accelerate
(lol) a bunch of development that has been going on.
~Gregory
Thread overview: 12+ messages
2026-03-06 12:11 [PATCH v7 0/3] hw/cxl: Add a performant (and correct) path for the non interleaved cases Alireza Sanaee via qemu development
2026-03-06 12:11 ` [PATCH v7 1/3] hw/cxl: Use HPA in cxl_cfmws_find_device() rather than offset in window Alireza Sanaee via qemu development
2026-03-06 16:15 ` Gregory Price
2026-03-06 12:11 ` [PATCH v7 2/3] hw/cxl: Allow cxl_cfmws_find_device() to filter on whether interleaved paths are accepted Alireza Sanaee via qemu development
2026-03-06 15:12 ` Jonathan Cameron via qemu development
2026-03-06 16:15 ` Gregory Price
2026-03-06 12:11 ` [PATCH v7 3/3] hw/cxl: Add a performant (and correct) path for the non interleaved cases Alireza Sanaee via qemu development
2026-03-06 15:15 ` Jonathan Cameron via qemu development
2026-03-06 16:16 ` Gregory Price
2026-03-06 13:29 ` [PATCH v7 0/3] " Alireza Sanaee via qemu development
2026-03-06 15:19 ` Jonathan Cameron via qemu development
2026-03-06 16:16 ` Gregory Price