* dax behavior on CXL 1.1 hosts
@ 2025-05-15 22:50 Yiannis Nikolakopoulos
2025-05-19 22:56 ` Dan Williams
0 siblings, 1 reply; 4+ messages in thread
From: Yiannis Nikolakopoulos @ 2025-05-15 22:50 UTC
To: linux-cxl; +Cc: yiannis, dan.j.williams
Hi,
I am trying to understand the dax behavior on CXL 1.1 hosts running
recent kernels and what it is that I have probably misconfigured.
To my understanding this series [1] introduced some different
behavior, and I am trying to figure out what I am getting wrong here
(but of course I might as well be looking in a completely wrong
place).
System:
CXL 1.1 Host (x86) + CXL 2.0 Type 3 memory expander (FPGA).
CONFIG_EFI_SOFT_RESERVE is enabled.
The memory region appears correctly as soft reserved in the e820 tables.
When running with Kernel 6.2.16 (or older) the dax device appears as
expected, and I can either map it or configure it as system ram. No
issues there.
When running with 6.14.5 (and others, e.g. 6.12-lts, 6.3), the
soft reserved region is still there, but the memory is not brought
online and the dax device does not appear at all. I have tried what the
patchset suggested, `memhp_default_state=offline`, but there is no dax
device to reconfigure afterwards. Nor could I manually create
anything with daxctl, as no dax regions are found. With online as
the default, still nothing happens: memory is not used and no dax device
appears.
I have dyndbg enabled for dax and hmem and the only entry I see is:
[ 12.714566] dax_hmem:hmem_register_device:73: hmem_platform hmem_platform.0: deferring range to CXL: [mem 0x2050000000-0x240fffffff flags 0x80000200]
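For reference, the size of the range being deferred can be read off the log line above. The following is a minimal Python sketch, not part of any kernel tooling; the helper name `parse_mem_range` and the regular expression are illustrative assumptions based only on the format of the line shown.

```python
import re

# Log line copied from the dyndbg output above (joined onto one line).
LOG_LINE = (
    "[ 12.714566] dax_hmem:hmem_register_device:73: hmem_platform "
    "hmem_platform.0: deferring range to CXL: "
    "[mem 0x2050000000-0x240fffffff flags 0x80000200]"
)

# Hypothetical pattern: matches the "[mem START-END flags FLAGS]" part.
RANGE_RE = re.compile(r"\[mem (0x[0-9a-f]+)-(0x[0-9a-f]+) flags (0x[0-9a-f]+)\]")


def parse_mem_range(line):
    """Return (start, end, flags) as integers, or None if no range is found."""
    match = RANGE_RE.search(line)
    if match is None:
        return None
    return tuple(int(group, 16) for group in match.groups())


start, end, flags = parse_mem_range(LOG_LINE)
size_bytes = end - start + 1  # the end address is inclusive
print(f"range size: {size_bytes / 2**30:.0f} GiB")
```

For the line above, the range works out to 15 GiB.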
What is it that I am missing here?
Thanks in advance.
Best regards,
Yiannis
[1] https://lore.kernel.org/linux-cxl/167601992097.1924368.18291887895351917895.stgit@dwillia2-xfh.jf.intel.com/
* Re: dax behavior on CXL 1.1 hosts
From: Dan Williams @ 2025-05-19 22:56 UTC
To: Yiannis Nikolakopoulos, linux-cxl; +Cc: yiannis, dan.j.williams
Yiannis Nikolakopoulos wrote:
> Hi,
>
> I am trying to understand the dax behavior on CXL 1.1 hosts running
> recent kernels and what it is that I have probably misconfigured.
> To my understanding this series [1] introduced some different
> behavior, and I am trying to figure out what I am getting wrong here
> (but of course I might as well be looking in a completely wrong
> place).
[..]
> When running with Kernel 6.2.16 (or older) the dax device appears as
> expected, and I can either map it or configure it as system ram. No
> issues there.
[..]
> What is it that I am missing here?
Starting around v6.3, the CXL subsystem began attempting to take over
dax device registration.
09d09e04d2fc cxl/dax: Create dax devices for CXL RAM regions
I.e. instead of simply relying on memory-map information, "dax_hmem",
let the CXL subsystem assemble a cxl_region which outputs a dax-device
on the backend.
The rationale for this is that a CXL Region enables RAS flows that raw
memory map enumeration does not.
The problem, though, is what happens when the CXL subsystem fails to parse
the configuration. In that case you end up with neither a CXL Region nor
the original "raw" dax_hmem device.
There has been a slow drip of fixes to get the CXL subsystem to
understand all the various platform quirks contributing to CXL Region
assembly failure. There are also efforts like this [1] in flight to
attempt to recover dax operation after a Region assembly failure.
Until then, the workaround is to prevent the cxl_acpi driver from
loading. When cxl_acpi is disabled, dax_hmem proceeds to produce a raw
dax device. That configuration forfeits all the RAS support, but that is
to be expected when the CXL subsystem needs help to parse the system
topology.
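As a rough way to confirm that the workaround took effect, one can check that cxl_acpi is absent from /proc/modules and that a device shows up under /sys/bus/dax/devices. The sketch below is illustrative Python, not an official tool; the helper names are made up for this example, and it assumes cxl_acpi is built as a module (a built-in driver does not appear in /proc/modules). How the module is kept from loading, for example with a modprobe blacklist entry or the `module_blacklist=` kernel parameter, is left to the distribution's conventions.

```python
import os

MODULES_PATH = "/proc/modules"
DAX_DEVICES_DIR = "/sys/bus/dax/devices"


def loaded_modules(proc_modules_text):
    """Return the set of module names listed in /proc/modules content."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def list_dax_devices(sysfs_dir=DAX_DEVICES_DIR):
    """Return sorted dax device names, or an empty list if the directory is absent."""
    try:
        return sorted(os.listdir(sysfs_dir))
    except OSError:
        return []


def report():
    """Print whether cxl_acpi is loaded and which dax devices exist."""
    try:
        with open(MODULES_PATH) as f:
            modules = loaded_modules(f.read())
    except OSError:
        modules = set()  # e.g. not running on Linux
    print("cxl_acpi loaded:", "cxl_acpi" in modules)
    print("dax devices:", list_dax_devices() or "none found")


if __name__ == "__main__":
    report()
```

With the workaround in place, the expectation is that cxl_acpi is reported as not loaded and at least one dax device is listed.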
[1]: http://lore.kernel.org/20250403183315.286710-1-terry.bowman@amd.com
* Re: dax behavior on CXL 1.1 hosts
From: Yiannis Nikolakopoulos @ 2025-05-21 21:22 UTC
To: Dan Williams; +Cc: linux-cxl, yiannis
On Tue, 20 May 2025 at 00:56, Dan Williams <dan.j.williams@intel.com> wrote:
>
> Yiannis Nikolakopoulos wrote:
> > Hi,
> >
> > I am trying to understand the dax behavior on CXL 1.1 hosts running
> > recent kernels and what it is that I have probably misconfigured.
> > To my understanding this series [1] introduced some different
> > behavior, and I am trying to figure out what I am getting wrong here
> > (but of course I might as well be looking in a completely wrong
> > place).
> [..]
> > When running with Kernel 6.2.16 (or older) the dax device appears as
> > expected, and I can either map it or configure it as system ram. No
> > issues there.
> [..]
> > What is it that I am missing here?
>
> Starting around v6.3, the CXL subsystem began attempting to take over
> dax device registration.
>
> 09d09e04d2fc cxl/dax: Create dax devices for CXL RAM regions
>
> I.e. instead of simply relying on memory-map information, "dax_hmem",
> let the CXL subsystem assemble a cxl_region which outputs a dax-device
> on the backend.
>
> The rationale for this is that a CXL Region enables RAS flows that raw
> memory map enumeration does not.
>
> The problem, though, is what happens when the CXL subsystem fails to parse
> the configuration. In that case you end up with neither a CXL Region nor
> the original "raw" dax_hmem device.
Thanks a lot for the explanation, this helped! I would guess this is what must
be happening in my case given some failures I see in the logs.
>
> There has been a slow drip of fixes to get the CXL subsystem to
> understand all the various platform quirks contributing to CXL Region
> assembly failure. There are also efforts like this [1] in flight to
> attempt to recover dax operation after a Region assembly failure.
Interesting, I will keep an eye on this and test the next version.
>
> Until then, the workaround is to prevent the cxl_acpi driver from
> loading. When cxl_acpi is disabled, dax_hmem proceeds to produce a raw
> dax device. That configuration forfeits all the RAS support, but that is
> to be expected when the CXL subsystem needs help to parse the system
> topology.
Indeed the workaround works as expected, thank you very much for that.
Nevertheless, if figuring out the quirks of this platform provides any value to
the core driver and the community here, I can find some time within the next
couple of weeks to debug and delve deeper. I can of course give more details
on the platform, logs etc.
Thanks again for the help.
/Yiannis
>
> [1]: http://lore.kernel.org/20250403183315.286710-1-terry.bowman@amd.com
* Re: dax behavior on CXL 1.1 hosts
From: Dan Williams @ 2025-05-21 21:57 UTC
To: Yiannis Nikolakopoulos, Dan Williams; +Cc: linux-cxl, yiannis
Yiannis Nikolakopoulos wrote:
> On Tue, 20 May 2025 at 00:56, Dan Williams <dan.j.williams@intel.com> wrote:
[..]
> > Until then, the workaround is to prevent the cxl_acpi driver from
> > loading. When cxl_acpi is disabled, dax_hmem proceeds to produce a raw
> > dax device. That configuration forfeits all the RAS support, but that is
> > to be expected when the CXL subsystem needs help to parse the system
> > topology.
> Indeed the workaround works as expected, thank you very much for that.
>
> Nevertheless, if figuring out the quirks of this platform provides any value to
> the core driver and the community here, I can find some time within the next
> couple of weeks to debug and delve deeper. I can of course give more details
> on the platform, logs etc.
One of the larger known platform quirks that gives the CXL subsystem
heartburn is platforms that translate CXL HPA to another address space.
Patches in flight to address that are here:
http://lore.kernel.org/20250218132356.1809075-1-rrichter@amd.com