* Assistance Needed with CXL Memory Region Creation
@ 2025-01-07 2:21 Yasunori Gotou (Fujitsu)
2025-01-16 11:13 ` Yasunori Gotou (Fujitsu)
0 siblings, 1 reply; 6+ messages in thread
From: Yasunori Gotou (Fujitsu) @ 2025-01-07 2:21 UTC
To: 'Dan Williams', 'linux-cxl@vger.kernel.org'
Cc: Yasunori Gotou (Fujitsu)
Hello,
I hope this message finds you well. I have a question regarding the use of CXL memory based
on the v2.0 specification, and I would appreciate any assistance you can provide.
I recently acquired a real machine equipped with actual CXL memory, and I'm attempting to
create a region using the "cxl create-region" command.
However, I encountered an issue where the command returns an ERANGE error.
Upon investigation, I discovered that the get_free_mem_region() function in kernel/resource.c
is returning this error.
It seems that the function is unable to allocate an area from CXL Window 0 due to it being "Soft Reserved."
Below is the relevant section from /proc/iomem:
====
89d500000-89fffffff : Reserved
8a0000000-189fffffff : CXL Window 0
  8a0000000-189fffffff : Soft Reserved ------!!!!
fd00000000-fd03ffffff : Reserved
=====
In contrast, when using a QEMU CXL emulation environment, there are no resources under CXL Window 0,
allowing get_free_mem_region() to succeed and enabling region creation:
=====
100000000-47fffffff : System RAM
a90000000-e8fffffff : CXL Window 0 <--- there is no resource.
ec0000000-bfffffffff : PCI Bus 0000:00
c000000000-c00000ffff : PCI Bus 0000:0c
=====
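The difference between the two maps can be checked mechanically. Below is a minimal Python sketch, assuming the usual /proc/iomem layout in which each nesting level is indented by two more spaces. The function name and the sample strings are illustrative only, and reading the real file generally requires root to see non-zero addresses.

```python
import re

# Matches one /proc/iomem line: optional indent, "start-end : name".
LINE_RE = re.compile(r"^(\s*)([0-9a-f]+)-([0-9a-f]+) : (.*)$")


def cxl_window_children(iomem_text):
    """Map each 'CXL Window N' entry to the names of its direct children.

    Assumes children are indented two spaces deeper than their parent,
    which is how /proc/iomem prints nested resources.
    """
    result = {}
    current = None        # name of the CXL window we are inside, if any
    current_depth = -1    # indentation depth of that window
    for line in iomem_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        depth = len(m.group(1)) // 2
        name = m.group(4).strip()
        if current is not None and depth <= current_depth:
            current = None                      # left the window's subtree
        if name.startswith("CXL Window"):
            current, current_depth = name, depth
            result[name] = []
        elif current is not None and depth == current_depth + 1:
            result[current].append(name)
    return result


REAL_MACHINE = """\
89d500000-89fffffff : Reserved
8a0000000-189fffffff : CXL Window 0
  8a0000000-189fffffff : Soft Reserved
fd00000000-fd03ffffff : Reserved
"""

QEMU = """\
100000000-47fffffff : System RAM
a90000000-e8fffffff : CXL Window 0
ec0000000-bfffffffff : PCI Bus 0000:00
"""

print(cxl_window_children(REAL_MACHINE))  # window has a Soft Reserved child
print(cxl_window_children(QEMU))          # window has no children
```

A window whose child list is empty is the case in which the QEMU setup succeeds; a window with a "Soft Reserved" child is the case seen on the real machine.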
From my understanding, the Soft Reserved area originates from EFI_MEMORY_SP.
Additionally, the CXL Memory Device SW Guide Rev1.1 (*) says the following:
---
The only cases where it is recommended to omit the EFI_MEMORY_SP attribute are those
where the system contains exclusively CXL memory, or where CXL memory is interleaved
with DDR memory or where the CXL memory is required to boot.
----
Therefore, I think this /proc/iomem map is normal for a real machine.
I expected a region to be created from the Soft Reserved area, but the actual behavior differs.
I'm confused about why the kernel/driver rejects creating a region in the Soft Reserved area, and I have
several questions.
1. Is this behavior intended?
2. If so, what steps are necessary to create a region in the Soft Reserved area?
3. Alternatively, is the Soft Reserved area not meant for region creation?
(*)https://cdrdv2-public.intel.com/643805/643805_CXL_Memory_Device_SW_Guide_Rev1_1.pdf
Thanks,
---
Yasunori Goto
* RE: Assistance Needed with CXL Memory Region Creation
2025-01-07 2:21 Assistance Needed with CXL Memory Region Creation Yasunori Gotou (Fujitsu)
@ 2025-01-16 11:13 ` Yasunori Gotou (Fujitsu)
2025-01-16 18:26 ` Dan Williams
0 siblings, 1 reply; 6+ messages in thread
From: Yasunori Gotou (Fujitsu) @ 2025-01-16 11:13 UTC
To: 'Dan Williams', 'linux-cxl@vger.kernel.org'
Ping....
Is there anyone who can provide information to help resolve this issue?
I hit this issue on kernel 6.13-rc4. Is there a fix for it?
If not, is there a kernel version that can create a region in special purpose memory?
I'll add some more information below...
> Hello,
>
> I hope this message finds you well. I have a question regarding the use of CXL
> memory based on the v2.0 specification, and I would appreciate any assistance
> you can provide.
>
> I recently acquired a real machine equipped with actual CXL memory, and I'm
> attempting to create a region using the "cxl create-region" command.
> However, I encountered an issue where the command returns an ERANGE error.
> Upon investigation, I discovered that the get_free_mem_region() function in
> kernel/resource.c is returning this error.
When get_free_mem_region() returns the ERANGE error, __region_intersects() does not return REGION_DISJOINT.
As a result, the following "if" statement evaluates to true, causing the for loop to continue through the entire CXL memory window.
------------------------
static struct resource *
get_free_mem_region(struct device *dev, struct resource *base,
	:
	:
	for (addr = gfr_start(base, size, align, flags);
	     gfr_continue(base, addr, align, flags);
	     addr = gfr_next(addr, align, flags)) {
		if (__region_intersects(base, addr, size, 0, IORES_DESC_NONE) !=
		    REGION_DISJOINT)
			continue;	<---- !!!!!
----
On the other hand, in the CXL emulation environment, there is no EFI_MEMORY_SP area.
Therefore, __region_intersects() returns REGION_DISJOINT, allowing the code
in the following "}else{" block to be executed.
-----
		} else {
			res->start = addr;
			res->end = addr + size - 1;
			res->name = name;
			res->desc = desc;
			res->flags = IORESOURCE_MEM;	<-----!!!!
			/*
			 * Only succeed if the resource hosts an exclusive
			 * range after the insert
			 */
			if (__insert_resource(base, res) || res->child)
				break;
			write_unlock(&resource_lock);
		}
------
I think cxl create-region should succeed in the special purpose memory area, but I cannot
understand why the above code results in an ERANGE error...
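The loop quoted above can be modeled in user space to show why a fully populated window always ends in ERANGE. This is a simplified Python sketch, not the kernel code: it assumes an ascending scan in steps of the alignment, treats any overlapping child as a conflict (as a lookup with flags 0 and IORES_DESC_NONE would), and uses a 256 MiB alignment purely as an illustrative value. The names `find_free_range` and `Resource` are hypothetical.

```python
import errno
from collections import namedtuple

# Inclusive [start, end] range, like struct resource.
Resource = namedtuple("Resource", "start end name")


def overlaps(res, start, size):
    """True if [start, start + size - 1] shares any byte with res."""
    return res.start <= start + size - 1 and start <= res.end


def find_free_range(window, children, size, align):
    """Return the first aligned address in window that is free of children.

    Returns -errno.ERANGE when every candidate intersects a child,
    which models the 'continue' branch being taken on every iteration.
    """
    addr = (window.start + align - 1) // align * align   # align up
    while addr + size - 1 <= window.end:
        if any(overlaps(c, addr, size) for c in children):
            addr += align          # candidate is occupied, try the next one
            continue
        return addr                # disjoint: the new resource would go here
    return -errno.ERANGE


SZ_256M = 0x10000000  # illustrative alignment, an assumption of this sketch

# Real machine: Soft Reserved fills the whole window.
win = Resource(0x8a0000000, 0x189fffffff, "CXL Window 0")
soft = Resource(0x8a0000000, 0x189fffffff, "Soft Reserved")
print(find_free_range(win, [soft], 0x1000000000, SZ_256M))   # -ERANGE

# QEMU: the window has no children, so the first candidate is free.
qemu = Resource(0xa90000000, 0xe8fffffff, "CXL Window 0")
print(hex(find_free_range(qemu, [], 0x400000000, SZ_256M)))
```

In this model the only way to get an address back from the real machine's window is to remove or shrink the child resource first, which matches the observation that the scan never reaches the else branch.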
>
> It seems that the function is unable to allocate an area from CXL Window 0 due
> to it being "Soft Reserved."
> Below is the relevant section from /proc/iomem:
> ====
> 89d500000-89fffffff : Reserved
> 8a0000000-189fffffff : CXL Window 0
> 8a0000000-189fffffff : Soft Reserved ------!!!!
> fd00000000-fd03ffffff : Reserved
> =====
>
> In contrast, when using a QEMU CXL emulation environment, there are no
> resources under CXL Window 0, allowing get_free_mem_region() to succeed
> and enabling region creation:
> =====
> 100000000-47fffffff : System RAM
> a90000000-e8fffffff : CXL Window 0 <--- there is no resource.
> ec0000000-bfffffffff : PCI Bus 0000:00
> c000000000-c00000ffff : PCI Bus 0000:0c
> =====
>
> From my understanding, the Soft Reserved area originates from
> EFI_MEMORY_SP.
> Additionally, the CXL Memory Device SW Guide Rev1.1(*) says the followings
> ---
> The only cases where it is recommended to omit the EFI_MEMORY_SP
> attribute are those where the system contains exclusively CXL memory, or
> where CXL memory is interleaved with DDR memory or where the CXL memory
> is required to boot.
> ----
> Therefore I think this /proc/iomem map is natural for the real machine.
> And, I expected a region to be created from the Soft Reserved area, but the
> actual behavior differs.
>
> I'm confused why the kernel/driver rejects creating a region in the Soft
> Reserved area, and I have many questions.
>
> 1. Is this behavior is intended?
> 2. If so, what steps are necessary to create a region in the Soft Reserved area?
> 3. Alternatively, is the Soft Reserved area not meant for region creation?
>
> (*)https://cdrdv2-public.intel.com/643805/643805_CXL_Memory_Device_SW_Guide_Rev1_1.pdf
Thanks,
---
Yasunori Goto
* RE: Assistance Needed with CXL Memory Region Creation
2025-01-16 11:13 ` Yasunori Gotou (Fujitsu)
@ 2025-01-16 18:26 ` Dan Williams
2025-01-17 7:00 ` Yasunori Gotou (Fujitsu)
2025-01-23 9:56 ` Zhijian Li (Fujitsu)
0 siblings, 2 replies; 6+ messages in thread
From: Dan Williams @ 2025-01-16 18:26 UTC
To: Yasunori Gotou (Fujitsu), 'Dan Williams',
'linux-cxl@vger.kernel.org'
Yasunori Gotou (Fujitsu) wrote:
> Ping....
> Is there anyone who can provide information to help resolve this issue?
I missed this earlier... holidays + merge window opening == inbox
overflow.
> I met this issue at kernel 6.13-rc4. Are there any fix for this issue?
> Otherwise, are there any kernel version which can create region under the special purpose memory?
>
> I'll add some more information below...
Thanks for that!
> I think cxl create-region should succeed in the special purpose memory area, but I cannot
> understand why the above code results in an ERANGE error...
[..]
> > Below is the relevant section from /proc/iomem:
> > ====
> > 89d500000-89fffffff : Reserved
> > 8a0000000-189fffffff : CXL Window 0
> > 8a0000000-189fffffff : Soft Reserved ------!!!!
> > fd00000000-fd03ffffff : Reserved
> > =====
This memory map is saying that BIOS created a CXL Window "CXL Window 0"
AND it populated that window with a CXL region marked "Soft Reserved".
The window is fully consumed by that existing BIOS created region. The
expectation is that the CXL subsystem parses the BIOS configuration and
creates a "cxl_region" object as a child of that Soft Reserved range.
However, for whatever reason it looks like the driver failed to parse
the CXL configuration. If that had worked the flow to create a new
region in that space would require first deleting the BIOS created
region.
So, the behavior you are seeing is expected. You can not create a
cxl_region in a space that already has a BIOS created cxl_region.
The work in this patchset [1] is aimed at making sure that even if the
kernel does not understand the BIOS CXL configuration it will still at
least transfer control of that address range to the device-dax
subsystem.
[1]: http://lore.kernel.org/cover.1737046620.git.nathan.fontenot@amd.com
...but that's just a fallback crutch. I would be interested to see more
details on why the kernel failed to assemble the region that the BIOS
created in this case.
The overall flow is:
- BIOS creates ACPI CFMWS
- BIOS optionally creates regions within one or more CFMWS ranges
- BIOS builds EFI memory map with the CXL regions marked EFI_MEMORY_SP
- Linux boots and sees EFI_MEMORY_SP + CXL overlap and waits for the CXL
subsystem to assemble the region
- Region creation is only allowed with free capacity
The known bugs are:
- Corner case CXL configurations that trip up the driver (memory side
caching and CXL interleaved with DDR are current examples being
worked)
- Reliable fallback to "CXL unaware" behavior when region assembly
fails, should be addressed by [1]. A temporary workaround is to
disable the cxl_acpi driver so that hmem_register_device() skips CXL
range deferral.
- Inability to delete regions that were created by the BIOS. Should also
be addressed by [1].
* RE: Assistance Needed with CXL Memory Region Creation
2025-01-16 18:26 ` Dan Williams
@ 2025-01-17 7:00 ` Yasunori Gotou (Fujitsu)
2025-02-03 10:08 ` Yasunori Gotou (Fujitsu)
2025-01-23 9:56 ` Zhijian Li (Fujitsu)
1 sibling, 1 reply; 6+ messages in thread
From: Yasunori Gotou (Fujitsu) @ 2025-01-17 7:00 UTC
To: 'Dan Williams', 'linux-cxl@vger.kernel.org'
Cc: Yasunori Gotou (Fujitsu)
> Yasunori Gotou (Fujitsu) wrote:
> > Ping....
> > Is there anyone who can provide information to help resolve this issue?
>
> I missed this earlier... holidays + merge window opening == inbox overflow.
Thank you for your help!
> > I met this issue at kernel 6.13-rc4. Are there any fix for this issue?
> > Otherwise, are there any kernel version which can create region under the
> special purpose memory?
> >
> > I'll add some more information below...
>
> Thanks for that!
>
> > I think cxl create-region should succeed in the special purpose memory
> > area, but I cannot understand why the above code results in an ERANGE
> error...
> [..]
> > > Below is the relevant section from /proc/iomem:
> > > ====
> > > 89d500000-89fffffff : Reserved
> > > 8a0000000-189fffffff : CXL Window 0
> > > 8a0000000-189fffffff : Soft Reserved ------!!!!
> > > fd00000000-fd03ffffff : Reserved
> > > =====
>
> This memory map is saying that BIOS created a CXL Window "CXL Window 0"
> AND it populated that window with a CXL region marked "Soft Reserved".
> The window is fully consumed by that existing BIOS created region. The
> expectation is that the CXL subsystem parses the BIOS configuration and
> creates a "cxl_region" object as a child of that Soft Reserved range.
>
> However, for whatever reason it looks like the driver failed to parse the CXL
> configuration. If that had worked the flow to create a new region in that space
> would require first deleting the BIOS created region.
Hmm, when I ran the "cxl list -R" command, no region was listed.
So the cxl driver may indeed have failed to create the region at boot time.
----
# cxl list -R
Warning: no matching devices found
[]
----
> So, the behavior you are seeing is expected. You can not create a cxl_region in a
> space that already has a BIOS created cxl_region.
>
> The work in this patchset [1] is aimed at making sure that even if the kernel
> does not understand the BIOS CXL configuration it will still at least transfer
> control of that address range to the device-dax subsystem.
>
> [1]: http://lore.kernel.org/cover.1737046620.git.nathan.fontenot@amd.com
Thank you for introducing this patch set.
I tried both daxctl create-device and cxl create-region.
- "daxctl create-device -u": the command does not seem to be able to find the device.
----
# daxctl create-device -u
error creating devices: No such device or address
created 0 devices
----
- "cxl create-region": get_free_mem_region() now succeeds,
but ENOSPC occurs in another place.
-------
# cxl create-region -d decoder0.0 -t ram -m mem0
cxl region: create_region: decoder9.1: set_dpa_size failed: No space left on device
--------
The strace output is below:
--------
:
:
openat(AT_FDCWD, "/sys/bus/cxl/devices/root0/decoder0.0/region0/size", O_WRONLY|O_CLOEXEC) = 3
write(3, "0x1000000000\n\0", 14) = 14 <---succeeded by this patch!!!
close(3) = 0
:
(snip)
:
openat(AT_FDCWD, "/sys/bus/cxl/devices/root0/port2/endpoint9/decoder9.1/dpa_size", O_WRONLY|O_CLOEXEC) = 3
write(3, "0x1000000000\n\0", 14) = -1 ENOSPC (No space left on device) <--- new error!!!
close(3) = 0
write(2, "cxl region: create_region: ", 27cxl region: create_region: ) = 27
write(2, "decoder9.1: set_dpa_size failed:"..., 57decoder9.1: set_dpa_size failed: No space left on device
) = 57
------
>
> ...but that's just a fallback crutch. I would be interested to see more details on
> why the kernel failed to assemble the region that the BIOS created in this case.
>
> The overall flow is:
>
> - BIOS creates ACPI CFMWS
> - BIOS optionally creates regions within one more CFMWS ranges
> - BIOS builds EFI memory map with the CXL regions marked EFI_MEMORY_SP
> - Linux boots and sees EFI_MEMORY_SP + CXL overlap and waits for the CXL
> subsystem to assemble the region
> - Region creation is only allowed with free capacity
>
> The known bugs are:
> - Corner case CXL configurations that trip up the driver (memory side
> caching and CXL interleaved with DDR are current examples being
> worked)
> - Reliable fallback to "CXL unaware" behavior when region assembly
> fails, should be address by [1]. A temporary workaround is to
> disable the cxl_acpi driver so that hmem_register_device() skips CXL
> range deferral.
> - Inability to delete regions that were created by the BIOS. Should also
> be addressed by [1].
Thank you for the information.
I'll check the BIOS implementation and investigate the behavior of the CXL devices.
Thanks,
----
Yasunori Goto
* Re: Assistance Needed with CXL Memory Region Creation
2025-01-16 18:26 ` Dan Williams
2025-01-17 7:00 ` Yasunori Gotou (Fujitsu)
@ 2025-01-23 9:56 ` Zhijian Li (Fujitsu)
1 sibling, 0 replies; 6+ messages in thread
From: Zhijian Li (Fujitsu) @ 2025-01-23 9:56 UTC
To: Dan Williams, Yasunori Gotou (Fujitsu),
'linux-cxl@vger.kernel.org'
Hi Dan,
I have a question on CXL range deferral.
On 17/01/2025 02:26, Dan Williams wrote:
> Yasunori Gotou (Fujitsu) wrote:
>> Ping....
>> Is there anyone who can provide information to help resolve this issue?
>
> I missed this earlier... holidays + merge window opening == inbox
> overflow.
>
>> I met this issue at kernel 6.13-rc4. Are there any fix for this issue?
>> Otherwise, are there any kernel version which can create region under the special purpose memory?
>>
>> I'll add some more information below...
>
> Thanks for that!
>
>> I think cxl create-region should succeed in the special purpose memory area, but I cannot
>> understand why the above code results in an ERANGE error...
> [..]
>>> Below is the relevant section from /proc/iomem:
>>> ====
>>> 89d500000-89fffffff : Reserved
>>> 8a0000000-189fffffff : CXL Window 0
>>> 8a0000000-189fffffff : Soft Reserved ------!!!!
>>> fd00000000-fd03ffffff : Reserved
>>> =====
>
> This memory map is saying that BIOS created a CXL Window "CXL Window 0"
> AND it populated that window with a CXL region marked "Soft Reserved".
> The window is fully consumed by that existing BIOS created region. The
> expectation is that the CXL subsystem parses the BIOS configuration and
> creates a "cxl_region" object as a child of that Soft Reserved range.
>
> However, for whatever reason it looks like the driver failed to parse
> the CXL configuration. If that had worked the flow to create a new
> region in that space would require first deleting the BIOS created
> region.
>
> So, the behavior you are seeing is expected. You can not create a
> cxl_region in a space that already has a BIOS created cxl_region.
>
> The work in this patchset [1] is aimed at making sure that even if the
> kernel does not understand the BIOS CXL configuration it will still at
> least transfer control of that address range to the device-dax
> subsystem.
>
> [1]: http://lore.kernel.org/cover.1737046620.git.nathan.fontenot@amd.com
>
> ...but that's just a fallback crutch. I would be interested to see more
> details on why the kernel failed to assemble the region that the BIOS
> created in this case.
>
> The overall flow is:
>
> - BIOS creates ACPI CFMWS
> - BIOS optionally creates regions within one more CFMWS ranges
> - BIOS builds EFI memory map with the CXL regions marked EFI_MEMORY_SP
> - Linux boots and sees EFI_MEMORY_SP + CXL overlap and waits for the CXL
> subsystem to assemble the region
> - Region creation is only allowed with free capacity
>
> The known bugs are:
> - Corner case CXL configurations that trip up the driver (memory side
> caching and CXL interleaved with DDR are current examples being
> worked)
> - Reliable fallback to "CXL unaware" behavior when region assembly
> fails, should be address by [1]. A temporary workaround is to
> disable the cxl_acpi driver so that hmem_register_device() skips CXL
> range deferral.
Currently, we are hitting the "CXL range deferral" issue:
[ 7.722331] hmem_platform hmem_platform.0: deferring range to CXL: [mem 0x8a0000000-0x189fffffff flags 0x80000200]
After looking into the code, I found the following.
If both cxl_acpi and dax_hmem are enabled, the "CXL range deferral" will occur, right?
(cxl_acpi parses the CFMWS and adds the CXL Window to iomem_resource, which must intersect with the res of the CXL region.)
	if (IS_ENABLED(CONFIG_CXL_REGION) &&
	    region_intersects(res->start, resource_size(res), IORESOURCE_MEM,
			      IORES_DESC_CXL) != REGION_DISJOINT) {
		dev_dbg(host, "deferring range to CXL: %pr\n", res);
		return 0;
	}
In that case, under what circumstances can cxl_hmem transfer the CXL region to dax/kmem?
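For reference, the check quoted above reduces to an interval test against resources described as IORES_DESC_CXL. The following Python sketch is a simplified model under stated assumptions: it ignores the resource tree hierarchy and the REGION_MIXED case, and it takes the description in this message (cxl_acpi registers the CXL Window as a CXL resource) at face value. `should_defer_to_cxl` and the `Res` tuple are hypothetical names, not kernel APIs.

```python
from collections import namedtuple

# Inclusive [start, end] range plus a descriptor tag such as "CXL".
Res = namedtuple("Res", "start end desc")


def intersects_desc(resources, start, size, desc):
    """True if [start, start + size - 1] overlaps any resource with desc."""
    end = start + size - 1
    return any(r.desc == desc and r.start <= end and start <= r.end
               for r in resources)


def should_defer_to_cxl(resources, start, size, cxl_region_enabled=True):
    """Model of the quoted check: defer when the range touches a CXL window."""
    return cxl_region_enabled and intersects_desc(resources, start, size, "CXL")


# With cxl_acpi loaded, the CFMWS window is present as a CXL resource.
with_cxl_acpi = [Res(0x8a0000000, 0x189fffffff, "CXL")]
# Without cxl_acpi there is no CXL window to intersect with.
without_cxl_acpi = []

soft_reserved = (0x8a0000000, 0x1000000000)   # start, size from the dmesg line
print(should_defer_to_cxl(with_cxl_acpi, *soft_reserved))     # deferred
print(should_defer_to_cxl(without_cxl_acpi, *soft_reserved))  # not deferred
```

Under this model the range is deferred whenever a CXL window overlaps it, which is consistent with the temporary workaround of disabling cxl_acpi mentioned earlier in the thread.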
Thanks
Zhijian
> - Inability to delete regions that were created by the BIOS. Should also
> be addressed by [1].
>
* RE: Assistance Needed with CXL Memory Region Creation
2025-01-17 7:00 ` Yasunori Gotou (Fujitsu)
@ 2025-02-03 10:08 ` Yasunori Gotou (Fujitsu)
0 siblings, 0 replies; 6+ messages in thread
From: Yasunori Gotou (Fujitsu) @ 2025-02-03 10:08 UTC
To: 'Dan Williams', 'linux-cxl@vger.kernel.org'
Hi,
> >
> > ...but that's just a fallback crutch. I would be interested to see
> > more details on why the kernel failed to assemble the region that the BIOS
> created in this case.
> >
> > The overall flow is:
> >
> > - BIOS creates ACPI CFMWS
> > - BIOS optionally creates regions within one more CFMWS ranges
> > - BIOS builds EFI memory map with the CXL regions marked
> EFI_MEMORY_SP
> > - Linux boots and sees EFI_MEMORY_SP + CXL overlap and waits for the
> CXL
> > subsystem to assemble the region
> > - Region creation is only allowed with free capacity
> >
> > The known bugs are:
> > - Corner case CXL configurations that trip up the driver (memory side
> > caching and CXL interleaved with DDR are current examples being
> > worked)
> > - Reliable fallback to "CXL unaware" behavior when region assembly
> > fails, should be address by [1]. A temporary workaround is to
> > disable the cxl_acpi driver so that hmem_register_device() skips CXL
> > range deferral.
> > - Inability to delete regions that were created by the BIOS. Should also
> > be addressed by [1].
>
> Thank you for your information.
> I'll check BIOS implementation and investigate behaviors of the cxl devices.
After investigation, we found the following two problems that caused region creation to fail
at boot time. (Thank you!)
The first issue has been resolved, but we still need your opinion on the second one.
1) The HDM Decoder Capability indicated that the base address of the CXL memory was zero.
We found a related setting in the BIOS, and the issue was resolved when I changed it.
2) There is a mismatch in the interleave granularity value between the CXL memory device and
the other decoders.
Currently, my system has only one CXL memory device, so the interleave is 1-way.
Therefore, its granularity should not matter.
Below is the debug log showing the error:
------------------------
# dmesg | grep ig:
[ 7.683843] cxl_port port1: decoder1.0: range: 0x0-0xffffffffffffffff iw: 1 ig: 256
[ 7.691103] cxl_port port2: decoder2.0: range: 0x8a0000000-0x189fffffff iw: 1 ig: 256
[ 7.719023] cxl_port port5: decoder5.0: range: 0x0-0xffffffffffffffff iw: 1 ig: 256
[ 7.744093] cxl_port port6: decoder6.0: range: 0x0-0xffffffffffffffff iw: 1 ig: 256
[ 8.452989] cxl_port endpoint9: decoder9.0: range: 0x8a0000000-0x189fffffff iw: 1 ig: 1024 <----!!!!
[ 8.454208] cxl_port endpoint9: decoder9.1: range: 0x0-0xffffffffffffffff iw: 1 ig: 1024 <----!!!!
[ 8.454273] cxl_pci 0000:01:00.0: mem0:decoder9.0: construct_region region0 res: [mem 0x8a0000000-0x189fffffff flags 0x200] iw: 1 ig: 1024 <---!!!
-----
The other decoders report a granularity of 256, but only endpoint9 reports a granularity of 1024.
As a result, the cxl driver fails to initialize the device.
----
[ 8.454308] cxl region0: pci0000:00:port2 cxl_port_setup_targets expected iw: 1 ig: 1024 [mem 0x8a0000000-0x189fffffff flags 0x200]
[ 8.454310] cxl region0: pci0000:00:port2 cxl_port_setup_targets got iw: 1 ig: 256 state: enabled 0x8a0000000:0x189fffffff
-----
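Mismatches like this can be spotted by scanning the decoder lines from dmesg. Below is a small Python sketch, assuming the "decoderX.Y: range: ... iw: N ig: M" line format shown in the log above (the format may differ between kernel versions). `find_ig_outliers` is a hypothetical helper name, and treating the range 0x0-0xffffffffffffffff as "not programmed" is an assumption of this sketch. Comparing granularity across decoders that share a range is only meaningful for a 1-way setup like this one.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   cxl_port port2: decoder2.0: range: 0x8a0000000-0x189fffffff iw: 1 ig: 256
DECODER_RE = re.compile(
    r"(decoder\d+\.\d+): range: (0x[0-9a-f]+)-(0x[0-9a-f]+) iw: (\d+) ig: (\d+)"
)
UNPROGRAMMED = ("0x0", "0xffffffffffffffff")


def find_ig_outliers(dmesg_text):
    """Group programmed decoders by range; return ranges with differing ig."""
    by_range = defaultdict(dict)
    for m in DECODER_RE.finditer(dmesg_text):
        name, start, end, _iw, ig = m.groups()
        if (start, end) == UNPROGRAMMED:
            continue                      # decoder not committed to a range
        by_range[(start, end)][name] = int(ig)
    return {rng: decs for rng, decs in by_range.items()
            if len(set(decs.values())) > 1}


LOG = """\
[ 7.691103] cxl_port port2: decoder2.0: range: 0x8a0000000-0x189fffffff iw: 1 ig: 256
[ 7.719023] cxl_port port5: decoder5.0: range: 0x0-0xffffffffffffffff iw: 1 ig: 256
[ 8.452989] cxl_port endpoint9: decoder9.0: range: 0x8a0000000-0x189fffffff iw: 1 ig: 1024
"""

for rng, decs in find_ig_outliers(LOG).items():
    print(rng, decs)
```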
I suspect my CXL memory device is reporting an Interleave Granularity value of 2 (= 1024 bytes)
in the Control Register of the CXL HDM Decoder Capability Structure.
I could not find a way to change this value to 256 through the BIOS configuration.
Within our team, one opinion is that checking the granularity value is overkill since our interleave is 1-way.
Another opinion is that the cxl driver should use a granularity of 256 bytes as the default when the interleave is 1-way,
considering the possibility of unusual hardware being released in the future.
Could you share your thoughts on these opinions?
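To make the two opinions concrete, here is a small Python sketch. It assumes the encoding implied above, where an encoded value n means 256 << n bytes (so 2 means 1024), and it caps the encoding at 6 as an assumption about the valid range. `targets_match` models a strict comparison of (iw, ig) pairs; the `relax_1way` flag is hypothetical and only illustrates the proposal under discussion, not existing driver behavior.

```python
MIN_GRANULARITY = 256   # bytes for encoded value 0
MAX_ENCODED_IG = 6      # assumed upper bound of the encoding


def eig_to_granularity(eig):
    """Decode an encoded interleave granularity into bytes (256 << eig)."""
    if not 0 <= eig <= MAX_ENCODED_IG:
        raise ValueError(f"invalid encoded granularity: {eig}")
    return MIN_GRANULARITY << eig


def granularity_to_eig(granularity):
    """Encode a granularity in bytes; it must be 256 << n for a valid n."""
    for eig in range(MAX_ENCODED_IG + 1):
        if MIN_GRANULARITY << eig == granularity:
            return eig
    raise ValueError(f"unsupported granularity: {granularity}")


def targets_match(expected, got, relax_1way=False):
    """Compare (iw, ig) pairs.

    With relax_1way=True (hypothetical), granularity is ignored when both
    sides are 1-way, since a single target is never interleaved.
    """
    exp_iw, exp_ig = expected
    got_iw, got_ig = got
    if exp_iw != got_iw:
        return False
    if relax_1way and exp_iw == 1:
        return True
    return exp_ig == got_ig


endpoint = (1, eig_to_granularity(2))   # iw: 1 ig: 1024, as logged for endpoint9
port = (1, 256)                         # iw: 1 ig: 256, as logged for port2
print(targets_match(endpoint, port))                    # strict check fails
print(targets_match(endpoint, port, relax_1way=True))   # relaxed check passes
```

The relaxed variant still rejects any mismatch once the interleave is wider than 1-way, so it only changes the outcome for single-device configurations such as this one.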
Thank you,
----
Yasunori Goto