From: Anisa Su <anisa.su887@gmail.com>
To: linux-cxl@vger.kernel.org
Cc: dan.j.williams@intel.com, ira.weiny@intel.com, dave@stgolabs.net,
linux-cxl@vger.kernel.org, nifan.cxl@gmail.com,
dongjoo.seo1@samsung.com
Subject: Re: [RFC PATCH 0/3] Add Support for Multiple DC Regions
Date: Wed, 3 Dec 2025 21:19:24 +0000
Message-ID: <aTCpXMxnxtq4ZAPI@deb-101020-bm01.eng.stellus.in>
In-Reply-To: <20251203203540.1091827-1-anisa.su887@gmail.com>
On Wed, Dec 03, 2025 at 08:29:10PM +0000, anisa.su887@gmail.com wrote:
> From: Anisa Su <anisa.su@samsung.com>
>
> This patchset introduces support for multiple DC regions. It is rebased on top
> of the latest branch published to Ira's repository:
> https://github.com/weiny2/linux-kernel/tree/dcd-v6-2025-09-23.
> We hope it will be useful in the meantime for others and restart some
> discussion around how to move DCD forward.
>
> The corresponding NDCTL support can be found on this branch:
> https://github.com/anisa-su993/anisa-ndctl/tree/multiple-dc-region-support.
> I will reply to this thread with a reference to the thread for the
> NDCTL patches once published.
>
NDCTL thread: https://lore.kernel.org/linux-cxl/20251203211642.1104918-1-anisa.su887@gmail.com/T/#u
> Testing:
> This patchset was tested on a QEMU VM with the following topology:
>
> PCIE Root (pcie.0)
> │
> ├─ CXL Fixed Memory Window cxl-fmw.0
> ├─ CXL Root Complex cxl.0
> │   └─ Root Port root_port1
> │       └─ CXL Type-3 Device cxl-dcd0
> │
> ├─ CXL Fixed Memory Window cxl-fmw.1
> ├─ CXL Root Complex cxl.1
> │   └─ Root Port root_port2
> │       └─ CXL Type-3 Device cxl-dcd1
> └─
>
> "-object memory-backend-file,id=cxl-mem1,share=on,mem-path=/tmp/t3_cxl1.raw,size=8G \
> -object memory-backend-file,id=cxl-lsa1,share=on,mem-path=/tmp/t3_lsa1.raw,size=1M \
> -object memory-backend-file,id=cxl-mem2,share=on,mem-path=/tmp/t3_cxl2.raw,size=8G \
> -object memory-backend-file,id=cxl-lsa2,share=on,mem-path=/tmp/t3_lsa2.raw,size=1M \
> -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.0,hdm_for_passthrough=true \
> -device pxb-cxl,bus_nr=48,bus=pcie.0,id=cxl.1,hdm_for_passthrough=true \
> -device cxl-rp,port=0,bus=cxl.0,id=root_port1,chassis=0,slot=1 \
> -device cxl-rp,port=1,bus=cxl.1,id=root_port2,chassis=1,slot=1 \
> -device cxl-type3,bus=root_port1,volatile-dc-memdev=cxl-mem1,id=cxl-dcd0,lsa=cxl-lsa1,num-dc-regions=8,sn=99 \
> -device cxl-type3,bus=root_port2,volatile-dc-memdev=cxl-mem2,id=cxl-dcd1,lsa=cxl-lsa2,num-dc-regions=8,sn=100 \
> -machine cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.size=8G,cxl-fmw.1.targets.0=cxl.1,cxl-fmw.1.size=8G"
>
> Two CFMWs and two root complexes are emulated because QEMU creates
> 4 decoders per topology level, so a single root complex provides only
> 4 upstream decoders. To create more than 4 regions, we need a total of
> 8 upstream decoders. This does mean that we can only create
> 4 regions on each device, although each device supports up to 8.
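
A rough sketch of the decoder arithmetic above, for anyone reproducing the setup. It assumes each committed region consumes one decoder at a given level; the constant names are made up for illustration and do not come from QEMU or the kernel.

```python
# Illustrative arithmetic only; constant names are hypothetical.
DECODERS_PER_LEVEL = 4      # QEMU default described above
ROOT_COMPLEXES = 2          # cxl.0 and cxl.1
DC_REGIONS_PER_DEVICE = 8   # num-dc-regions=8 on each cxl-type3 device

# Assumption: one decoder per committed region at each level.
regions_per_device = min(DECODERS_PER_LEVEL, DC_REGIONS_PER_DEVICE)
total_regions = regions_per_device * ROOT_COMPLEXES

print(regions_per_device, total_regions)
```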
>
> Using `cxl list`, we can see mem0 and mem1 have dynamic_ram_* capabilities:
> root@deb-101020-bm01:~# cxl list
> [
> {
> "memdevs":[
> {
> "memdev":"mem0",
> "dynamic_ram_0_size":1073741824,
> "dynamic_ram_1_size":1073741824,
> "dynamic_ram_2_size":1073741824,
> "dynamic_ram_3_size":1073741824,
> "dynamic_ram_4_size":1073741824,
> "dynamic_ram_5_size":1073741824,
> "dynamic_ram_6_size":1073741824,
> "dynamic_ram_7_size":1073741824,
> "serial":100,
> "host":"0000:31:00.0",
> "firmware_version":"BWFW VERSION 00"
> },
> {
> "memdev":"mem1",
> "dynamic_ram_0_size":1073741824,
> "dynamic_ram_1_size":1073741824,
> "dynamic_ram_2_size":1073741824,
> "dynamic_ram_3_size":1073741824,
> "dynamic_ram_4_size":1073741824,
> "dynamic_ram_5_size":1073741824,
> "dynamic_ram_6_size":1073741824,
> "dynamic_ram_7_size":1073741824,
> "serial":99,
> "host":"0000:0d:00.0",
> "firmware_version":"BWFW VERSION 00"
> }
> ]
> }
> ]
>
> To create the 8 regions:
> cxl create-region -m -d decoder0.0 -w 1 -s 1G mem1 -t dynamic_ram_0
> cxl create-region -m -d decoder0.0 -w 1 -s 1G mem1 -t dynamic_ram_1
> cxl create-region -m -d decoder0.0 -w 1 -s 1G mem1 -t dynamic_ram_2
> cxl create-region -m -d decoder0.0 -w 1 -s 1G mem1 -t dynamic_ram_3
>
> cxl create-region -m -d decoder0.1 -w 1 -s 1G mem0 -t dynamic_ram_4
> cxl create-region -m -d decoder0.1 -w 1 -s 1G mem0 -t dynamic_ram_5
> cxl create-region -m -d decoder0.1 -w 1 -s 1G mem0 -t dynamic_ram_6
> cxl create-region -m -d decoder0.1 -w 1 -s 1G mem0 -t dynamic_ram_7
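
The eight invocations above follow a regular pattern, so they can be generated rather than typed. A minimal sketch: create_region_cmds is a hypothetical helper, and it only uses the flags that already appear in the commands above.

```python
# Hypothetical helper: builds the create-region invocations listed above.
# Only flags that appear in the original commands are used.
def create_region_cmds(decoder, memdev, partitions):
    return [
        f"cxl create-region -m -d {decoder} -w 1 -s 1G {memdev} -t dynamic_ram_{n}"
        for n in partitions
    ]

cmds = (create_region_cmds("decoder0.0", "mem1", range(0, 4))
        + create_region_cmds("decoder0.1", "mem0", range(4, 8)))

for cmd in cmds:
    print(cmd)
```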
>
>
> We can verify the 8 regions:
> root@deb-101020-bm01:~# cxl list
> [
> {
> "memdevs":[
> ...
> },
> {
> "regions":[
> {
> "region":"region0",
> "resource":79993765888,
> "size":1073741824,
> "interleave_ways":1,
> "interleave_granularity":256,
> "decode_state":"commit"
> },
> {
> "region":"region6",
> "resource":81067507712,
> "size":1073741824,
> "interleave_ways":1,
> "interleave_granularity":256,
> "decode_state":"commit"
> },
> {
> "region":"region7",
> "resource":82141249536,
> "size":1073741824,
> "interleave_ways":1,
> "interleave_granularity":256,
> "decode_state":"commit"
> },
> {
> "region":"region8",
> "resource":83214991360,
> "size":1073741824,
> "interleave_ways":1,
> "interleave_granularity":256,
> "decode_state":"commit"
> },
> {
> "region":"region1",
> "resource":88315265024,
> "size":1073741824,
> "interleave_ways":1,
> "interleave_granularity":256,
> "decode_state":"commit"
> },
> {
> "region":"region2",
> "resource":89389006848,
> "size":1073741824,
> "interleave_ways":1,
> "interleave_granularity":256,
> "decode_state":"commit"
> },
> {
> "region":"region3",
> "resource":90462748672,
> "size":1073741824,
> "interleave_ways":1,
> "interleave_granularity":256,
> "decode_state":"commit"
> },
> {
> "region":"region4",
> "resource":91536490496,
> "size":1073741824,
> "interleave_ways":1,
> "interleave_granularity":256,
> "decode_state":"commit"
> }
> ]
> }
> ]
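
As a quick sanity check on the listing above, the "resource" values form two runs of four back-to-back 1 GiB regions. The sketch below copies the values from the output; grouping them into two runs is my reading of the numbers, not something the tool reports.

```python
GIB = 1 << 30

# "resource" values copied from the listing above, split into the two
# contiguous address runs (this grouping is an assumption).
run_a = [79993765888, 81067507712, 82141249536, 83214991360]
run_b = [88315265024, 89389006848, 90462748672, 91536490496]

def is_contiguous(resources, size=GIB):
    """True if each region starts exactly where the previous one ends."""
    return all(b - a == size for a, b in zip(resources, resources[1:]))

print(is_contiguous(run_a), is_contiguous(run_b))
print(hex(run_a[0]), hex(run_b[0]))
```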
>
> Extents of various sizes (128MB, 256MB, 512MB, and 1GB) are added from mem1,
> one for each of regions 0-3, then DAX devices are created from them.
> The extent DPA ranges (in MB) are as follows, so that each one maps to a
> distinct region:
> - [0-128] --> region0
> - [1024-1280] --> region1
> - [2048-2560] --> region2
> - [3072-4096] --> region3
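
To make the mapping explicit, here is a small sketch of how each extent lands in its own region. It assumes the DPA ranges are in MB and that each DC region spans 1024 MB (matching the 1G dynamic_ram_* sizes); region_index is a hypothetical helper, not kernel code.

```python
# Assumptions: DPA ranges are in MB and each DC region spans 1024 MB.
REGION_MB = 1024

# (start, end) DPA pairs from the list above.
extents_mb = [(0, 128), (1024, 1280), (2048, 2560), (3072, 4096)]

def region_index(start_mb, end_mb):
    """Return the DC region an extent falls in; reject boundary crossings."""
    first = start_mb // REGION_MB
    last = (end_mb - 1) // REGION_MB
    if first != last:
        raise ValueError("extent crosses a region boundary")
    return first

# (region index, extent size in MB) for each extent.
mapping = [(region_index(s, e), e - s) for s, e in extents_mb]
print(mapping)
```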
>
> The correct sizes can be verified when creating the DAX device.
> root@deb-101020-bm01:~/libcxlmi# daxctl create-device -r region0
> [
> {
> "chardev":"dax0.1",
> "size":134217728,
> "target_node":1,
> "align":2097152,
> "mode":"devdax"
> }
> ]
> created 1 device
> root@deb-101020-bm01:~/libcxlmi# daxctl create-device -r region1
> [
> {
> "chardev":"dax1.1",
> "size":268435456,
> "target_node":1,
> "align":2097152,
> "mode":"devdax"
> }
> ]
> created 1 device
> root@deb-101020-bm01:~/libcxlmi# daxctl create-device -r region2
> [
> {
> "chardev":"dax2.1",
> "size":536870912,
> "target_node":1,
> "align":2097152,
> "mode":"devdax"
> }
> ]
> created 1 device
> root@deb-101020-bm01:~/libcxlmi# daxctl create-device -r region3
> [
> {
> "chardev":"dax3.1",
> "size":1073741824,
> "target_node":1,
> "align":2097152,
> "mode":"devdax"
> }
> ]
> created 1 device
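
For reference, the byte sizes reported above match the extent sizes exactly, and each is a multiple of the 2 MiB alignment. A minimal check, with the values copied from the daxctl output:

```python
MIB = 1 << 20

# "size" and "align" values copied from the daxctl output above.
dax_sizes = {"dax0.1": 134217728, "dax1.1": 268435456,
             "dax2.1": 536870912, "dax3.1": 1073741824}
ALIGN = 2097152

# Extent sizes added for each device, in MiB.
expected_mib = {"dax0.1": 128, "dax1.1": 256, "dax2.1": 512, "dax3.1": 1024}

for name, size in dax_sizes.items():
    assert size == expected_mib[name] * MIB, name
    assert size % ALIGN == 0, name
print("all sizes match")
```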
>
> Then the DAX devices are reconfigured to system-ram mode and verified with lsmem.
> root@deb-101020-bm01:~/libcxlmi# daxctl reconfigure-device dax0.1 -m system-ram
> [
> {
> "chardev":"dax0.1",
> "size":134217728,
> "target_node":1,
> "align":2097152,
> "mode":"system-ram",
> "online_memblocks":1,
> "total_memblocks":1,
> "movable":true
> }
> ]
> reconfigured 1 device
> root@deb-101020-bm01:~/libcxlmi# daxctl reconfigure-device dax1.1 -m system-ram
> ...
> root@deb-101020-bm01:~/libcxlmi# daxctl reconfigure-device dax2.1 -m system-ram
> ...
> root@deb-101020-bm01:~/libcxlmi# daxctl reconfigure-device dax3.1 -m system-ram
> ...
>
>
> root@deb-101020-bm01:~/libcxlmi# lsmem
> RANGE                                  SIZE  STATE REMOVABLE   BLOCK
> 0x0000000000000000-0x000000007fffffff    2G online       yes    0-15
> 0x0000000100000000-0x000000027fffffff    6G online       yes   32-79
> 0x00000012a0000000-0x00000012a7ffffff  128M online       yes     596
> 0x00000012e0000000-0x00000012efffffff  256M online       yes 604-605
> 0x0000001320000000-0x000000133fffffff  512M online       yes 612-615
> 0x0000001360000000-0x000000139fffffff    1G online       yes 620-627
>
> Memory block size: 128M
> Total online memory: 9.9G
> Total offline memory: 0B
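
The SIZE and BLOCK columns for the four hot-added ranges can be re-derived from the addresses and the 128M memory block size. A short sketch using the values from the lsmem output above:

```python
BLOCK = 128 << 20  # "Memory block size: 128M"

# (start, end) pairs copied from the four hot-added ranges above.
ranges = [
    (0x12a0000000, 0x12a7ffffff),
    (0x12e0000000, 0x12efffffff),
    (0x1320000000, 0x133fffffff),
    (0x1360000000, 0x139fffffff),
]

# (size in MiB, first block, last block) for each range.
summary = [((end - start + 1) >> 20, start // BLOCK, end // BLOCK)
           for start, end in ranges]

for size_mib, first, last in summary:
    print(f"{size_mib}M blocks {first}-{last}")
```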
>
> -------------------------------------------------------------------------------
> Note: I did try hacking QEMU to create 8 decoders at each level to avoid having
> 2 separate host bridges/DCDs by modifying include/hw/cxl/cxl_component.h like so:
>
> #define CXL_HDM_DECODER_COUNT 8
> HDM_DECODER_INIT(0);
> HDM_DECODER_INIT(1);
> HDM_DECODER_INIT(2);
> HDM_DECODER_INIT(3);
> HDM_DECODER_INIT(4);
> HDM_DECODER_INIT(5);
> HDM_DECODER_INIT(6);
> HDM_DECODER_INIT(7);
>
> However, when attempting to create the 5th cxl region,
> I ran into a timeout error when committing the decoders.
> I did not spend much time pursuing this further; most likely
> more changes are needed on the QEMU side.
> The 8 decoders do show up correctly under sysfs, though.
>
> Fan Ni (3):
> core/region: fix return logic for store_targetN
> dax/cxl: add existing dc extents when probing dax region
> dcd: Add support for multiple DC regions
>
> drivers/cxl/core/cdat.c | 2 +-
> drivers/cxl/core/core.h | 9 +-
> drivers/cxl/core/extent.c | 2 +-
> drivers/cxl/core/hdm.c | 18 +++-
> drivers/cxl/core/mbox.c | 39 +++++----
> drivers/cxl/core/memdev.c | 179 +++++++++++++++++++++++++-------------
> drivers/cxl/core/port.c | 45 ++++++++--
> drivers/cxl/core/region.c | 65 ++++++++------
> drivers/cxl/cxl.h | 23 ++++-
> drivers/cxl/cxlmem.h | 5 +-
> drivers/dax/cxl.c | 28 ++----
> 11 files changed, 281 insertions(+), 134 deletions(-)
>
> --
> 2.51.0
>