From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>, <linux-cxl@vger.kernel.org>,
<linux-pci@vger.kernel.org>, <ira.weiny@intel.com>,
<vishal.l.verma@intel.com>, <alison.schofield@intel.com>,
<dave@stgolabs.net>, <bhelgaas@google.com>, <lukas@wunner.de>
Subject: Re: [PATCH v3 4/4] cxl: Add post reset warning if reset is detected as Secondary Bus Reset (SBR)
Date: Thu, 4 Apr 2024 14:16:34 +0100 [thread overview]
Message-ID: <20240404141634.00003872@Huawei.com> (raw)
In-Reply-To: <660d83708993_2459629416@dwillia2-mobl3.amr.corp.intel.com.notmuch>
On Wed, 3 Apr 2024 09:27:28 -0700
Dan Williams <dan.j.williams@intel.com> wrote:
> Jonathan Cameron wrote:
> > On Tue, 2 Apr 2024 16:45:32 -0700
> > Dave Jiang <dave.jiang@intel.com> wrote:
> >
> > > SBR is equivalent to a device been hot removed and inserted again. Doing a
> > > SBR on a CXL type 3 device is problematic if the exported device memory is
> > > part of system memory that cannot be offlined. The event is equivalent to
> > > violently ripping out that range of memory from the kernel. While the
> > > hardware requires the "Unmask SBR" bit set in the Port Control Extensions
> > > register and the kernel currently does not unmask it, user can unmask
> > > this bit via setpci or similar tool.
> > >
> > > The driver does not have a way to detect whether a reset coming from the
> > > PCI subsystem is a Function Level Reset (FLR) or SBR. The only way to
> > > detect is to note if a decoder is marked as enabled in software but the
> > > decoder control register indicates it's not committed.
> > >
> > > A helper function is added to find discrepancy between the decoder
> > > software state versus the hardware register state.
> > >
> > > Suggested-by: Dan Williams <dan.j.williams@intel.com>
> > > Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> >
> > As I said way back on v1, this smells hacky.
> >
> > Why not pass the info on what reset was done down from the PCI core?
> > I see Bjorn commented it would be *possible* to do it in the PCI core
> > but raised other concerns that needed addressing first (I think you've
> > dealt with thosenow). Doesn't look that hard to me (I've not coded it
> > up yet though).
> >
> > The core code knows how far it got down the list reset_methods before
> > it succeeded in resetting. So...
> >
> > Modify __pci_reset_function_locked() to return the index of the reset
> > method that succeeded. Then pass that to pci_dev_restore().
> > Finally push it into a reset_done2() that takes that as an extra
> > parameter and the driver can see if it is FLR or SBR.
> > The extended reset_done is to avoid modifying lots of drivers.
> > However a quick grep suggests it's not that heavily used (15ish?)
> > so maybe just add the parameter.
> >
> > There are a few other paths, but non look that problematic at
> > first glance...
> >
> > So Bjorn, now the rest of this is hopefully close to what you'll be
> > happey with, which way do you prefer?
>
> I will defer to Bjorn, but I am not fan of this reset_done2() proposal.
> "Revalidate after reset" is a common driver pattern and all that
> plumbing the effective-reset-type does is make cxl_reset_done() more
> precise for no discernible value.
As per other thread branch, I think you are right, but key is this is not
detecting the SBR at all, it's detecting HDM decoders not being in expected
state. If they weren't setup before SBR, then we don't warn. So SBR is
the cause, but not what is being detected (which is a subset of SBR results)
>
next prev parent reply other threads:[~2024-04-04 13:16 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-04-02 23:45 [PATCH 0/4 v3] PCI: Add Secondary Bus Reset (SBR) support for CXL Dave Jiang
2024-04-02 23:45 ` [PATCH v3 1/4] PCI/cxl: Move PCI CXL vendor Id to a common location from CXL subsystem Dave Jiang
2024-04-02 23:45 ` [PATCH v3 2/4] PCI: Add check for CXL Secondary Bus Reset Dave Jiang
2024-04-03 8:26 ` Lukas Wunner
2024-04-04 0:19 ` Dave Jiang
2024-04-03 15:01 ` Jonathan Cameron
2024-04-02 23:45 ` [PATCH v3 3/4] PCI: Create new reset method to force SBR for CXL Dave Jiang
2024-04-03 15:09 ` Jonathan Cameron
2024-04-04 0:21 ` Dave Jiang
2024-04-04 13:29 ` Jonathan Cameron
2024-04-04 14:42 ` Dan Williams
2024-04-02 23:45 ` [PATCH v3 4/4] cxl: Add post reset warning if reset is detected as Secondary Bus Reset (SBR) Dave Jiang
2024-04-03 15:32 ` Jonathan Cameron
2024-04-03 16:27 ` Dan Williams
2024-04-04 13:16 ` Jonathan Cameron [this message]
2024-04-04 8:51 ` Lukas Wunner
2024-04-04 13:13 ` Jonathan Cameron
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240404141634.00003872@Huawei.com \
--to=jonathan.cameron@huawei.com \
--cc=alison.schofield@intel.com \
--cc=bhelgaas@google.com \
--cc=dan.j.williams@intel.com \
--cc=dave.jiang@intel.com \
--cc=dave@stgolabs.net \
--cc=ira.weiny@intel.com \
--cc=linux-cxl@vger.kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=lukas@wunner.de \
--cc=vishal.l.verma@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox