From: Dan Williams <dan.j.williams@intel.com>
To: Christoph Hellwig <hch@infradead.org>,
Dan Williams <dan.j.williams@intel.com>
Cc: Yee Li <seven.yi.lee@gmail.com>, <linux-cxl@vger.kernel.org>
Subject: Re: Is there any plan to support CXL GPF in Linux
Date: Tue, 9 Jul 2024 11:33:48 -0700 [thread overview]
Message-ID: <668d828c33c87_102cc294c7@dwillia2-xfh.jf.intel.com.notmuch> (raw)
In-Reply-To: <ZozTTpbMjff82vO0@infradead.org>
Christoph Hellwig wrote:
> On Mon, Jul 08, 2024 at 05:42:39PM -0700, Dan Williams wrote:
> > One thing that guide does not cover is what should OS software do with a
> > dirty shutdown failure. To my knowledge there is no specific plumbing
> > for handling NVMe device write-cache failures beyond: "hope filesystem
> > logging and metadata checksums can recover a consistent filesystem".
> >
> > I do agree that the driver has a responsibility to set switch timeout
> > values, but that is more an unfortunate complexity imposed by the spec.
> > Just set the max and rely on devices to minimize GPF response times to
> > avoid the worst case wait times that those timeouts imply. In any event,
> > enabling that is "up for grabs."
>
> Why would anyone specifically care about a (presumably non-volatile) write
> cache failure? A non-volatile write cache is simply part of the device
> and its failure rate guarantee. So any data lost from it will be
> recovered the same way as a media failure, SOC failure, interconnect
> failure, etc.
Right, my concern is that the CXL specification is over-specified here
in its suggestion that system software manage the dirty state on each boot.
I assume that if an NVMe device experienced a super-cap failure that
prevented its write cache from being drained on power loss, that event
would be logged somewhere. If the administrator did not react to that
event, the kernel would just keep using the device on the next boot as
if nothing had happened.
So I am more trying to preclude complicated patches around
dirty-shutdown handling, since Yee mentioned the recommendations in the
CXL driver writer's guide. I.e., do not follow that guide literally; the
recommendation to tightly scope GPF timeouts also seems over-specified.
Thread overview: 9+ messages
2024-06-27 8:28 Is there any plan to support CXL GPF in Linux Yee Li
2024-07-09 0:42 ` Dan Williams
2024-07-09 6:06 ` Christoph Hellwig
2024-07-09 7:22 ` Yee Li
2024-07-09 7:35 ` Christoph Hellwig
2024-07-09 9:31 ` Yee Li
2024-07-09 19:13 ` Dan Williams
2024-07-10 3:46 ` Yee Li
2024-07-09 18:33 ` Dan Williams [this message]