Linux CXL
* Is there any plan to support CXL GPF in Linux
@ 2024-06-27  8:28 Yee Li
  2024-07-09  0:42 ` Dan Williams
  0 siblings, 1 reply; 9+ messages in thread
From: Yee Li @ 2024-06-27  8:28 UTC (permalink / raw)
  To: linux-cxl

Dear All,

Based on CXL Memory Device SW Guide and CXL Spec, the OS driver plays
a part in the CXL GPF sequence for persistent memory (like Samsung
CMM-H).

So, is there any plan to support CXL GPF in Linux?

Including init GPF DVSEC, flush data to GPF domain, etc.


Thanks,
Yee

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Is there any plan to support CXL GPF in Linux
  2024-06-27  8:28 Is there any plan to support CXL GPF in Linux Yee Li
@ 2024-07-09  0:42 ` Dan Williams
  2024-07-09  6:06   ` Christoph Hellwig
  0 siblings, 1 reply; 9+ messages in thread
From: Dan Williams @ 2024-07-09  0:42 UTC (permalink / raw)
  To: Yee Li, linux-cxl

Yee Li wrote:
> Dear All,
> 
> Based on CXL Memory Device SW Guide and CXL Spec, the OS driver plays
> a part in the CXL GPF sequence for persistent memory (like Samsung
> CMM-H).
> 
> So, is there any plan to support CXL GPF in Linux?

See the "Maturity Map" [1] document for what the driver supports.

[1]: http://lore.kernel.org/172005486862.2048248.6668794717827294862.stgit@dwillia2-xfh.jf.intel.com

In terms of "plans", it is in the "patches welcome" state. While CXL
PMEM was an early focus of the kernel enabling, no PMEM devices
materialized in the market so focus moved elsewhere.

> Including init GPF DVSEC, flush data to GPF domain, etc.

One thing that guide does not cover is what OS software should do about
a dirty shutdown failure. To my knowledge there is no specific plumbing
for handling NVMe device write-cache failures beyond: "hope filesystem
logging and metadata checksums can recover a consistent filesystem".

I do agree that the driver has a responsibility to set switch timeout
values, but that is more an unfortunate complexity imposed by the spec.
Just set the max and rely on devices to minimize GPF response times to
avoid the worst case wait times that those timeouts imply. In any event,
enabling that is "up for grabs."
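
To make the "just set the max" arithmetic concrete, here is a small
sketch of the base-times-scale timeout encoding as I read the GPF DVSEC
in the CXL spec. The 4-bit field widths and the 1 µs to 10 s scale range
are my reading of the spec, not something confirmed in this thread, so
verify against the specification before relying on them:

```python
# Sketch of the CXL GPF phase-timeout encoding (assumed layout): a 4-bit
# base multiplied by a power-of-ten scale. Scale encodings 0-7 are taken
# here to cover 1 us through 10 s; higher encodings are reserved.
_SCALE_US = {0: 1, 1: 10, 2: 100, 3: 1_000,
             4: 10_000, 5: 100_000, 6: 1_000_000, 7: 10_000_000}

def gpf_timeout_us(base: int, scale: int) -> int:
    """Decode a GPF phase timeout (base, scale) pair to microseconds."""
    if not 0 <= base <= 0xF:
        raise ValueError("base is a 4-bit field")
    if scale not in _SCALE_US:
        raise ValueError("reserved scale encoding")
    return base * _SCALE_US[scale]

# "Just set the max": base=0xF, scale=7.
print(gpf_timeout_us(0xF, 7))  # -> 150000000 (150 seconds per phase)
```

Under this reading, the worst-case wait those timeouts imply is a couple
of minutes per phase, which is why relying on devices to respond quickly
matters more than tuning the fields tightly.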


* Re: Is there any plan to support CXL GPF in Linux
  2024-07-09  0:42 ` Dan Williams
@ 2024-07-09  6:06   ` Christoph Hellwig
  2024-07-09  7:22     ` Yee Li
  2024-07-09 18:33     ` Dan Williams
  0 siblings, 2 replies; 9+ messages in thread
From: Christoph Hellwig @ 2024-07-09  6:06 UTC (permalink / raw)
  To: Dan Williams; +Cc: Yee Li, linux-cxl

On Mon, Jul 08, 2024 at 05:42:39PM -0700, Dan Williams wrote:
> One thing that guide does not cover is what should OS software do with a
> dirty shutdown failure. To my knowledge there is no specific plumbing
> for handling NVME device write-cache failures beyond: "hope filesystem
> logging and metadata checksums can recover a consistent filesystem".
> 
> I do agree that the driver has a responsibility to set switch timeout
> values, but that is more an unfortunate complexity imposed by the spec.
> Just set the max and rely on devices to minimize GPF response times to
> avoid the worst case wait times that those timeouts imply. In any event,
> enabling that is "up for grabs."

Why would anyone specifically care about a (presumably non-volatile) write
cache failure?  A non-volatile write cache is simply part of the device
and its failure-rate guarantee.  So any data lost from it will be
recovered the same way as a media failure, SoC failure, interconnect
failure, etc.



* Re: Is there any plan to support CXL GPF in Linux
  2024-07-09  6:06   ` Christoph Hellwig
@ 2024-07-09  7:22     ` Yee Li
  2024-07-09  7:35       ` Christoph Hellwig
  2024-07-09 18:33     ` Dan Williams
  1 sibling, 1 reply; 9+ messages in thread
From: Yee Li @ 2024-07-09  7:22 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Dan Williams, linux-cxl

As far as I know, for CXL pmem the DRAM cache can be very large (even
equal in size to the NAND flash). Data lives in the DRAM cache most of
the time and is only synced to the backing media on power loss or on
explicit sync operations.

Christoph Hellwig <hch@infradead.org> wrote on Tue, Jul 9, 2024 at 14:06:
>
> On Mon, Jul 08, 2024 at 05:42:39PM -0700, Dan Williams wrote:
> > One thing that guide does not cover is what should OS software do with a
> > dirty shutdown failure. To my knowledge there is no specific plumbing
> > for handling NVME device write-cache failures beyond: "hope filesystem
> > logging and metadata checksums can recover a consistent filesystem".
> >
> > I do agree that the driver has a responsibility to set switch timeout
> > values, but that is more an unfortunate complexity imposed by the spec.
> > Just set the max and rely on devices to minimize GPF response times to
> > avoid the worst case wait times that those timeouts imply. In any event,
> > enabling that is "up for grabs."
>
> Why would anyone specifically care about a (presumably non-volatile) write
> cache failure?  A non-volatile write cache is simply part of the device
> and it's failure rate guarantee.  So any data lost from it will be
> recovered the same way as a media failure, SOC failure, interconnect
> failure, etc.
>


* Re: Is there any plan to support CXL GPF in Linux
  2024-07-09  7:22     ` Yee Li
@ 2024-07-09  7:35       ` Christoph Hellwig
  2024-07-09  9:31         ` Yee Li
  0 siblings, 1 reply; 9+ messages in thread
From: Christoph Hellwig @ 2024-07-09  7:35 UTC (permalink / raw)
  To: Yee Li; +Cc: Christoph Hellwig, Dan Williams, linux-cxl

On Tue, Jul 09, 2024 at 03:22:53PM +0800, Yee Li wrote:
> As I know, for cxl pmem, the DRAM cache is very large (even equal to
> NAND flash). The data is stored in the DRAM cache. Data sync during
> power loss or some sync operations. Data is stored in the DRAM cache
> for most of the time.

If the device can't provide enough power on its own to take care of
the DRAM flushing then it's simply broken.  All these details are
something the manufacturer needs to take care of.



* Re: Is there any plan to support CXL GPF in Linux
  2024-07-09  7:35       ` Christoph Hellwig
@ 2024-07-09  9:31         ` Yee Li
  2024-07-09 19:13           ` Dan Williams
  0 siblings, 1 reply; 9+ messages in thread
From: Yee Li @ 2024-07-09  9:31 UTC (permalink / raw)
  To: Christoph Hellwig, Dan Williams; +Cc: linux-cxl

Dear Christoph,

I agree with you.
The CXL pmem device should handle the GPF event sent from the host or
switch well by itself.

Dear Dan,

The ACPI.FADT.PERSISTENT_CPU_CACHES flag is not recognized by Linux.
Does that mean CXL pmem flush operations reference the NFIT Platform
Capabilities instead? Or will the flag and capability be supported in
the future?
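
For reference, a hypothetical decoder for that FADT field. Per my
reading of recent ACPI revisions it is a small enumeration in the FADT
Flags dword rather than a single bit; the bit position and width used
here are assumptions and must be checked against the ACPI specification:

```python
# Hypothetical decoder for the FADT "Persistent CPU Caches" field.
# Assumption: a 2-bit enumeration in the FADT Flags dword at bit 22
# (0 = not reported, 1 = caches not persistent, 2 = persistent).
# Verify both the position and the encodings against the ACPI spec.
NOT_REPORTED, NOT_PERSISTENT, PERSISTENT = 0, 1, 2

def persistent_cpu_caches(fadt_flags: int, shift: int = 22) -> int:
    """Extract the assumed 2-bit Persistent CPU Caches enumeration."""
    return (fadt_flags >> shift) & 0x3

# Example: a flags value with the field set to "persistent".
print(persistent_cpu_caches(PERSISTENT << 22))  # -> 2
```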

Christoph Hellwig <hch@infradead.org> wrote on Tue, Jul 9, 2024 at 15:35:

>
> On Tue, Jul 09, 2024 at 03:22:53PM +0800, Yee Li wrote:
> > As I know, for cxl pmem, the DRAM cache is very large (even equal to
> > NAND flash). The data is stored in the DRAM cache. Data sync during
> > power loss or some sync operations. Data is stored in the DRAM cache
> > for most of the time.
>
> If the device can't provide enough power on it's own to take care of
> the DRAM flushing then it's simply broken.  All these details are
> something the manufacturer needs to take care off.
>


* Re: Is there any plan to support CXL GPF in Linux
  2024-07-09  6:06   ` Christoph Hellwig
  2024-07-09  7:22     ` Yee Li
@ 2024-07-09 18:33     ` Dan Williams
  1 sibling, 0 replies; 9+ messages in thread
From: Dan Williams @ 2024-07-09 18:33 UTC (permalink / raw)
  To: Christoph Hellwig, Dan Williams; +Cc: Yee Li, linux-cxl

Christoph Hellwig wrote:
> On Mon, Jul 08, 2024 at 05:42:39PM -0700, Dan Williams wrote:
> > One thing that guide does not cover is what should OS software do with a
> > dirty shutdown failure. To my knowledge there is no specific plumbing
> > for handling NVME device write-cache failures beyond: "hope filesystem
> > logging and metadata checksums can recover a consistent filesystem".
> > 
> > I do agree that the driver has a responsibility to set switch timeout
> > values, but that is more an unfortunate complexity imposed by the spec.
> > Just set the max and rely on devices to minimize GPF response times to
> > avoid the worst case wait times that those timeouts imply. In any event,
> > enabling that is "up for grabs."
> 
> Why would anyone specifically care about a (presumably non-volatile) write
> cache failure?  A non-volatile write cache is simply part of the device
> and it's failure rate guarantee.  So any data lost from it will be
> recovered the same way as a media failure, SOC failure, interconnect
> failure, etc.

Right, my concern is that the CXL specification is over-specified here
in its suggestion that system-software manage the dirty-state each boot.
I assume that if an NVMe device experienced a super-cap failure that
prevented its write-cache from being drained on power-loss, that event
would be logged somewhere. If the administrator did not react to that
event, the kernel would just keep using the device on the next boot as
if nothing happened.

So I am more trying to preclude complicated patches around
dirty-shutdown handling, since Yee mentioned the recommendations in the
CXL driver writer's guide. I.e., do not follow that guide literally, and
the recommendation to tightly scope GPF timeouts also seems
over-specified.


* Re: Is there any plan to support CXL GPF in Linux
  2024-07-09  9:31         ` Yee Li
@ 2024-07-09 19:13           ` Dan Williams
  2024-07-10  3:46             ` Yee Li
  0 siblings, 1 reply; 9+ messages in thread
From: Dan Williams @ 2024-07-09 19:13 UTC (permalink / raw)
  To: Yee Li, Christoph Hellwig, Dan Williams; +Cc: linux-cxl

Hi Yee, please refrain from top-posting [1]

[1]: https://subspace.kernel.org/etiquette.html

Yee Li wrote:
> Dear Christoph,
> 
> I agree with you.
> cxl pmem device should process the GPF event well by itself which is sent
> from Host or Switch.
> 
> Dear Dan,
> 
> ACPI.FADT.PERSISTENT_CPU_CACHES flag is not recognized by Linux.
> Does it mean "cxl pmem flush operations reference the NFIT Platform
> Capabilities"? Or, support the flag and cap in the future.

The Persistent CPU Caches flag is a performance optimization to avoid
CPU cache flushing when the platform is trusted to take care of cache
flushing at power-loss.

Software can force "persistent caches" behavior by:

    echo 0 > /sys/block/pmem0/dax/write_cache

From a kernel developer perspective, I feel more comfortable with
userspace making that "go fast at the risk of data-loss" decision. Maybe
after the industry gets more experience with platforms that set that bit
the kernel can default to trusting it, but given the slow roll out of
CXL PMEM devices I think Linux is ok to take a "wait and see" attitude.
At a minimum, if someone wants to draft a patch, it should be a build
time configuration option to trust the FADT.
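
For anyone experimenting, the knob above can be inspected before
toggling. A hedged sketch; the device name "pmem0" is only an example
and may not exist on your system:

```shell
# Inspect and (optionally) clear the DAX write-cache flushing knob.
# "pmem0" is an example device name; substitute your own.
dev=pmem0
knob=/sys/block/$dev/dax/write_cache

if [ -e "$knob" ]; then
    # 1 = kernel issues CPU cache flushes on sync; 0 = trust the platform.
    printf 'current write_cache: %s\n' "$(cat "$knob")"
    # Uncomment to trust platform/device persistence instead:
    # echo 0 > "$knob"
else
    printf 'no dax write_cache knob for %s\n' "$dev"
fi
```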


* Re: Is there any plan to support CXL GPF in Linux
  2024-07-09 19:13           ` Dan Williams
@ 2024-07-10  3:46             ` Yee Li
  0 siblings, 0 replies; 9+ messages in thread
From: Yee Li @ 2024-07-10  3:46 UTC (permalink / raw)
  To: Dan Williams; +Cc: Christoph Hellwig, linux-cxl

Dan Williams <dan.j.williams@intel.com> wrote on Wed, Jul 10, 2024 at 03:13:
>
> Hi Yee, please refrain from top-posting [1]
>
> [1]: https://subspace.kernel.org/etiquette.html

Sorry, I will carefully read the mailing list etiquette.

> The Persistent CPU Caches flag is a performance optimization to avoid
> CPU cache flushing when the platform is trusted to take care of cache
> flushing at power-loss.
>
> Software can force "persistent caches" behavior by:
>
>     echo 0 > /sys/block/pmem0/dax/write_cache
>
> From a kernel developer perspective, I feel more comfortable with
> userspace making that "go fast at the risk of data-loss" decision. Maybe
> after the industry gets more experience with platforms that set that bit
> the kernel can default to trusting it, but given the slow roll out of
> CXL PMEM devices I think Linux is ok to take a "wait and see" attitude.
> At a minimum, if someone wants to draft a patch, it should be a build
> time configuration option to trust the FADT.

You are correct, exporting the DAX write-cache control to userspace is
a comfortable and safe method until we can trust the FADT flags.
Many thanks for your reply.


end of thread, other threads:[~2024-07-10  3:47 UTC | newest]

Thread overview: 9+ messages
2024-06-27  8:28 Is there any plan to support CXL GPF in Linux Yee Li
2024-07-09  0:42 ` Dan Williams
2024-07-09  6:06   ` Christoph Hellwig
2024-07-09  7:22     ` Yee Li
2024-07-09  7:35       ` Christoph Hellwig
2024-07-09  9:31         ` Yee Li
2024-07-09 19:13           ` Dan Williams
2024-07-10  3:46             ` Yee Li
2024-07-09 18:33     ` Dan Williams
