public inbox for linux-pci@vger.kernel.org
 help / color / mirror / Atom feed
From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
To: Lukas Wunner <lukas@wunner.de>
Cc: Dave Jiang <dave.jiang@intel.com>, <linux-cxl@vger.kernel.org>,
	<linux-pci@vger.kernel.org>, <dan.j.williams@intel.com>,
	<ira.weiny@intel.com>, <vishal.l.verma@intel.com>,
	<alison.schofield@intel.com>, <dave@stgolabs.net>,
	<bhelgaas@google.com>
Subject: Re: [PATCH v3 4/4] cxl: Add post reset warning if reset is detected as Secondary Bus Reset (SBR)
Date: Thu, 4 Apr 2024 14:13:21 +0100	[thread overview]
Message-ID: <20240404141321.00005943@Huawei.com> (raw)
In-Reply-To: <Zg5qGJCAZJWY4xy9@wunner.de>

On Thu, 4 Apr 2024 10:51:36 +0200
Lukas Wunner <lukas@wunner.de> wrote:

> On Wed, Apr 03, 2024 at 04:32:57PM +0100, Jonathan Cameron wrote:
> > On Tue, 2 Apr 2024 16:45:32 -0700 Dave Jiang <dave.jiang@intel.com> wrote:
> > Why not pass the info on what reset was done down from the PCI core?
> > I see Bjorn commented it would be *possible* to do it in the PCI core
> > but raised other concerns that needed addressing first (I think you've
> > dealt with those now).  Doesn't look that hard to me (I've not coded it
> > up yet though).
> > 
> > The core code knows how far it got down the list reset_methods before
> > it succeeded in resetting.  So...
> > 
> > Modify __pci_reset_function_locked() to return the index of the reset
> > method that succeeded. Then pass that to pci_dev_restore().
> > Finally push it into a reset_done2() that takes that as an extra
> > parameter and the driver can see if it is FLR or SBR.  
> 
> The reset types to distinguish per PCIe r6.2 sec 6.6 are
> Conventional Reset and Function Level Reset.
> 
> Secondary Bus Reset is a Conventional Reset.
> 
> The spec subdivides Conventional Reset into Cold, Warm and Hot,
> but that distinction is probably irrelevant for the kernel.

Agreed. SBR is only called out explicitly here because it's the one
with a handy triggering mechamism.

> 
> I think a more generalized (and therefore better) approach would be
> to store the reset type the device has undergone in struct pci_dev,
> right next to error_state, so that not just the ->reset_done()
> callback benefits from the information.  The reset type applied has
> consequences beyond the individual driver:  E.g. an FLR does not
> affect CMA-SPDM session state, but a Conventional Reset does.
> So there may be consumers of that information in the PCI core as well.

That makes sense if we do go the route of enhancing the information
provided for a reset.

> 
> It's worth noting that we already have an enum pcie_reset_state in
> <linux/pci.h> which distinguishes between deassert, warm and hot reset.
> It is currently only used by PowerPC EEH to convey to the platform
> which type of reset it should apply.  It might be possible to extend
> the enum so that it can be used to store the reset type that *was*
> applied to a device in struct pci_dev.
> 
> That all being said, checking for the *symptoms* of a Conventional Reset,
> as Dave has done here, may actually be more robust than just relying on
> what type of reset was applied.  E.g. after an FLR was handled, the device
> may experience a DPC-induced Hot Reset.  

This sounds like a plausible reason for doing it by symptom checking.

> By checking for the *symptoms*,
> the driver may be able to catch that the device has undergone a
> Conventional Reset immediately after an FLR.  Also, who knows if all
> devices are well-behaved and retain their state during an FLR, as they
> should per the spec?  Maybe there are broken devices which do not respect
> that rule.  Checking for symptoms of a Conventional Reset would catch
> those devices as well.

I'm not particularly keen on complexity additions to the kernel for
possible broken devices. For CXL devices the rules are very clear 
and the HDM decoder must not be reset.  If not chances are host OS will
take out BIOS setup memory and that isn't healthy.

Perhaps the key point here is that the patch title is misleading / simplistic.
The patch only warns if a reset happened that caused a configuration mismatch
for the address decoders.  SBR at other times is fine.

So even if we had a reset_type available, the driver would still need
to see if it mattered.

So I've ended up arguing myself into the fact all this code is needed anyway.
Perhaps change the patch title to

cxl: Add post reset warning if reset results in loss of previously committed HDM decoders.

If something along those lines..

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

Jonathan

> 
> Thanks,
> 
> Lukas


      reply	other threads:[~2024-04-04 13:13 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-02 23:45 [PATCH 0/4 v3] PCI: Add Secondary Bus Reset (SBR) support for CXL Dave Jiang
2024-04-02 23:45 ` [PATCH v3 1/4] PCI/cxl: Move PCI CXL vendor Id to a common location from CXL subsystem Dave Jiang
2024-04-02 23:45 ` [PATCH v3 2/4] PCI: Add check for CXL Secondary Bus Reset Dave Jiang
2024-04-03  8:26   ` Lukas Wunner
2024-04-04  0:19     ` Dave Jiang
2024-04-03 15:01   ` Jonathan Cameron
2024-04-02 23:45 ` [PATCH v3 3/4] PCI: Create new reset method to force SBR for CXL Dave Jiang
2024-04-03 15:09   ` Jonathan Cameron
2024-04-04  0:21     ` Dave Jiang
2024-04-04 13:29       ` Jonathan Cameron
2024-04-04 14:42         ` Dan Williams
2024-04-02 23:45 ` [PATCH v3 4/4] cxl: Add post reset warning if reset is detected as Secondary Bus Reset (SBR) Dave Jiang
2024-04-03 15:32   ` Jonathan Cameron
2024-04-03 16:27     ` Dan Williams
2024-04-04 13:16       ` Jonathan Cameron
2024-04-04  8:51     ` Lukas Wunner
2024-04-04 13:13       ` Jonathan Cameron [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240404141321.00005943@Huawei.com \
    --to=jonathan.cameron@huawei.com \
    --cc=alison.schofield@intel.com \
    --cc=bhelgaas@google.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=dave@stgolabs.net \
    --cc=ira.weiny@intel.com \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=lukas@wunner.de \
    --cc=vishal.l.verma@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox