Linux PCI subsystem development
From: Lukas Wunner <lukas@wunner.de>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>,
	Gavin Hindman <gavin.hindman@intel.com>,
	Linuxarm <linuxarm@huawei.com>,
	"Weiny, Ira" <ira.weiny@intel.com>,
	Linux PCI <linux-pci@vger.kernel.org>,
	linux-cxl@vger.kernel.org, CHUCK_LEVER <chuck.lever@oracle.com>
Subject: Re: [RFC PATCH 0/1] DOE usage with pcie/portdrv
Date: Sat, 14 May 2022 15:55:21 +0200
Message-ID: <20220514135521.GB14833@wunner.de>
In-Reply-To: <CAPcyv4hUKjt7QrA__wQ0KowfaxyQuMjHB5V-=rZBm=UbV4OvSg@mail.gmail.com>

On Wed, May 11, 2022 at 12:43:34PM -0700, Dan Williams wrote:
> On Wed, May 11, 2022 at 12:20 PM Lukas Wunner <lukas@wunner.de> wrote:
> > But the reset argument still stands:  That same section says that all
> > IDE streams transition to Insecure and all keys are invalidated upon
> > reset.
> 
> Right, this isn't the only problem with reset vs ongoing CXL operations...
> 
> https://lore.kernel.org/linux-cxl/164740402242.3912056.8303625392871313860.stgit@dwillia2-desk3.amr.corp.intel.com/

The above-linked cover letter refers to AER.

I believe that with AER, the kernel is notified of an error via an interrupt
and asynchronously attempts recovery through a reset.
Obviously, an eternity may pass until the kernel gets around to doing that,
and whether accesses performed between the initial error and the reset
succeed is essentially undefined.  So it's a "best effort" kind of error
recovery.

With the advent of DPC, the situation has improved considerably, as the
hardware (not the kernel) automatically disables the link as soon as the
initial error occurs.  Any subsequent accesses will fail, and the kernel
does not perform a reset itself (the hardware already did that) but merely
attempts to bring the link back up.  That has made error recovery pretty
solid and NVMe drives now seamlessly recover from errors without the need
to unbind/rebind the driver.  Data centers heavily depend on that feature.
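The AER/DPC contrast can be illustrated with a toy model.  This is a
hypothetical sketch, not kernel code: the names (Mechanism, Link,
error_occurs, device_access, kernel_recover) are invented for
illustration, and real recovery involves driver callbacks, link training
and timeouts that the model omits.  It captures only two points: under
AER the link stays up after the error until the kernel resets it, so
intervening accesses have an undefined outcome; under DPC the hardware
takes the link down immediately, so intervening accesses fail and the
kernel merely brings the link back up.

```python
from dataclasses import dataclass
from enum import Enum


class Mechanism(Enum):
    AER = "AER"  # error reported via interrupt; the kernel resets later
    DPC = "DPC"  # hardware contains the error by disabling the link


class Access(Enum):
    OK = "ok"                # link up, no outstanding error
    UNDEFINED = "undefined"  # link still up after an error: may or may not succeed
    FAILED = "failed"        # link down: the access fails


@dataclass
class Link:
    up: bool = True
    error_pending: bool = False


def error_occurs(link: Link, mech: Mechanism) -> None:
    """Hardware reaction at the instant the error occurs (toy model)."""
    link.error_pending = True
    if mech is Mechanism.DPC:
        link.up = False  # DPC: hardware disables the link immediately
    # AER: only an interrupt is raised; the link stays up for now


def device_access(link: Link) -> Access:
    """Outcome of a driver access issued at this point in time."""
    if not link.up:
        return Access.FAILED
    if link.error_pending:
        return Access.UNDEFINED
    return Access.OK


def kernel_recover(link: Link, mech: Mechanism) -> int:
    """Recovery performed later by the kernel; returns resets the kernel issued."""
    kernel_resets = 1 if mech is Mechanism.AER else 0  # DPC: hardware already did it
    link.up = True  # the link is operational again once recovery completes
    link.error_pending = False
    return kernel_resets


def scenario(mech: Mechanism) -> tuple[Access, int, Access]:
    link = Link()
    error_occurs(link, mech)
    during = device_access(link)  # access between the error and recovery
    resets = kernel_recover(link, mech)
    after = device_access(link)
    return during, resets, after


for mech in Mechanism:
    during, resets, after = scenario(mech)
    print(f"{mech.value}: during={during.value} kernel_resets={resets} after={after.value}")
```

Running it prints "AER: during=undefined kernel_resets=1 after=ok" and
"DPC: during=failed kernel_resets=0 after=ok".  The "during" column is
the window described above: undefined under AER, a clean failure under
DPC.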

Perhaps if CXL.mem used DPC, it would be able to recover more reliably?

Circling back to the SPDM/IDE topic, while NVMe is now capable of
reliably recovering from errors, it does expect the kernel to handle
recovery within a few seconds.  I'm not sure we can continue to
guarantee that if the kernel depends on user space to perform
re-authentication with SPDM after reset.  That's another headache
that we could avoid with in-kernel SPDM authentication.
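The timing concern can be expressed as a simple budget check.  This is a
hypothetical sketch: the 3-second budget, the latency figures and the
function names are illustrative assumptions, not values taken from NVMe,
SPDM or the kernel.  It shows why moving re-authentication to user space
weakens the guarantee: the recovery path gains an extra hand-off term
that the kernel does not control and that may be unbounded.

```python
from typing import Optional

# Assumed budget: the text only says "within a few seconds";
# 3.0 s is an illustrative value, not taken from NVMe or the kernel.
RECOVERY_BUDGET_S = 3.0


def recovery_time(link_up_s: float, reauth_s: float,
                  handoff_s: Optional[float]) -> Optional[float]:
    """Total time to recover when SPDM re-authentication follows link-up.

    handoff_s models the extra delay of notifying a user-space agent and
    waiting for it to run; 0.0 means in-kernel authentication.  None means
    the agent never responds (e.g. it is not running), so the total is
    unbounded.
    """
    if handoff_s is None:
        return None
    return link_up_s + handoff_s + reauth_s


def within_budget(total: Optional[float],
                  budget: float = RECOVERY_BUDGET_S) -> bool:
    """True only if recovery completes and fits inside the budget."""
    return total is not None and total <= budget


# All latencies below are made-up inputs for illustration.
in_kernel = recovery_time(link_up_s=1.0, reauth_s=0.5, handoff_s=0.0)
user_space_slow = recovery_time(link_up_s=1.0, reauth_s=0.5, handoff_s=2.0)
user_space_absent = recovery_time(link_up_s=1.0, reauth_s=0.5, handoff_s=None)
print(within_budget(in_kernel),
      within_budget(user_space_slow),
      within_budget(user_space_absent))
```

With these inputs only the in-kernel path fits (1.5 s); the slow
user-space agent overshoots (3.5 s), and an absent agent has no upper
bound at all, which is exactly what the kernel cannot guarantee against.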

Thanks,

Lukas
