linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Alex Williamson <alex.williamson@redhat.com>
To: Amey Narkhede <ameynarkhede03@gmail.com>
Cc: Oliver O'Halloran <oohall@gmail.com>,
	Bjorn Helgaas <helgaas@kernel.org>,
	Shanker R Donthineni <sdonthineni@nvidia.com>,
	Bjorn Helgaas <bhelgaas@google.com>,
	linux-pci <linux-pci@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Sinan Kaya <okaya@kernel.org>, Vikram Sethi <vsethi@nvidia.com>
Subject: Re: [PATCH v4 2/2] PCI: Enable NO_BUS_RESET quirk for Nvidia GPUs
Date: Wed, 5 May 2021 13:13:57 -0600	[thread overview]
Message-ID: <20210505131357.07e55042@redhat.com> (raw)
In-Reply-To: <20210505174032.sursnpwkfrc5qji2@archlinux>

On Wed, 5 May 2021 23:10:32 +0530
Amey Narkhede <ameynarkhede03@gmail.com> wrote:

> On 21/05/05 01:56PM, Oliver O'Halloran wrote:
> > On Wed, May 5, 2021 at 12:50 PM Bjorn Helgaas <helgaas@kernel.org> wrote:  
> > >
> > > On Mon, May 03, 2021 at 09:07:11PM -0500, Shanker R Donthineni wrote:  
> > > > On 5/3/21 5:42 PM, Bjorn Helgaas wrote:  
> > > > > Obviously _RST only works for built-in devices, since there's no AML
> > > > > for plug-in devices, right?  So if there's a plug-in card with this
> > > > > GPU, neither SBR nor _RST will work?  
> > > > These are not plug-in PCIe GPU cards, will exist on upcoming server
> > > > baseboards. ACPI-reset should wok for plug-in devices as well as long
> > > > as firmware has _RST method defined in ACPI-device associated with
> > > > the PCIe hot-plug slot.  
> > >
> > > Maybe I'm missing something, but I don't see how _RST can work for
> > > plug-in devices.  _RST is part of the system firmware, and that
> > > firmware knows nothing about what will be plugged into the slot.  So
> > > if system firmware supplies _RST that knows how to reset the Nvidia
> > > GPU, it's not going to do the right thing if you plug in an NVMe
> > > device instead.
> > >
> > > Can you elaborate on how _RST would work for plug-in devices?  

I'm not sure I really understand these concerns about plug-in devices.
In this case I believe we're dealing with an embedded GPU, there is no
case where one of these GPUs would be a discrete device on a plug-in
card.  I'm also assuming all SoCs integrating this GPU will provide a
_RST method, but we're also disabling SBR in this series to avoid the
only other generic reset option we'd have for this device.

In the more general case, I'd expect that system firmware isn't going
to implement an _RST method for a pluggable slot, so we'll lookup the
ACPI handle, fail to find a _RST method and drop to the next option.
For a PCI/e slot, at best the _RST method might be included in the _PRR
scope rather than the device scope to indicate it affects the entire
slot.  That could be something like the #PERST below or a warm reset.  I
don't think we're enabling that here, are we?

Otherwise system firmware would need to dynamically provide a _RST
method if it recognized and had support for the plugin card.

> > Power cycling the slot or just re-asserting #PERST probably. IBM has
> > been doing that on Power boxes since forever and it mostly works.
> > Mostly.  
> According to ACPI spec v6.3 section 7.3.25, _RST just performs normal
> FLR in most cases but if the device supports _PRR(Power Resource for Reset)
> then reset operation causes the device to be reported as missing from the bus
> that indicates that it affects all the devices on the bus.

We're only looking for _RST on the device handle, so I think we're
limited to the device context limitations.  Per the referenced section:

7.3.25 _RST (Device Reset)

  This object executes a reset on the associated device or devices. If
                                                                    ^^
  included in a device context, the reset must not affect any other
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  ACPI-described devices; if included in a power resource for reset
  ^^^^^^^^^^^^^^^^^^^^^^
  (_PRR, Section 7.3.26) the reset must affect all ACPI-described
  devices that reference it.

  When this object is described in a device context, it executes a
  function level reset that only affects the device it is associated
  with; neither parent nor children should be affected by the execution
  of this reset. Executing this must only result in this device
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  resetting without the device appearing as if it has been removed from
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  the bus altogether, to prevent OSPM re-enumeration of devices on
  ^^^^^^^^^^^^^^^^^^
  hot-pluggable buses (e.g. USB).

  If a device reset is supported by the platform, but cannot meet the
  function level and bus requirement, the device should instead
  implement a _PRR (Section 7.3.26).

  Devices can define both a _RST and a _PRR if supported by the
  hardware.

  Arguments: Non

  Return Value: None


It's a bit unfortunate that they use the phrase "function level reset",
but since this method is not specific to a PCI device, I think this
could just as easily be replaced with "individual device scope reset".
The implementation of that could be an PCI FLR, or any number of device
or platform specific operations.  To me this reads like a system
firmware provided, device specific reset.  Thanks,

Alex


  reply	other threads:[~2021-05-05 19:14 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-29  0:49 [PATCH v4 1/2] PCI: Add support for a function level reset based on _RST method Shanker Donthineni
2021-04-29  0:49 ` [PATCH v4 2/2] PCI: Enable NO_BUS_RESET quirk for Nvidia GPUs Shanker Donthineni
2021-04-30 17:01   ` Bjorn Helgaas
2021-04-30 22:11     ` Shanker R Donthineni
2021-05-03 22:42       ` Bjorn Helgaas
2021-05-04  2:07         ` Shanker R Donthineni
2021-05-05  2:12           ` Bjorn Helgaas
2021-05-05  3:51             ` Shanker R Donthineni
2021-05-05  3:56             ` Oliver O'Halloran
2021-05-05 17:40               ` Amey Narkhede
2021-05-05 19:13                 ` Alex Williamson [this message]
2021-05-05 20:04                   ` Shanker R Donthineni
2021-05-05 20:40                   ` Bjorn Helgaas
2021-05-05 12:15       ` Pali Rohár
2021-05-05 15:35         ` Shanker R Donthineni
2021-04-30 18:39 ` [PATCH v4 1/2] PCI: Add support for a function level reset based on _RST method Alex Williamson
2021-04-30 19:05   ` Shanker R Donthineni

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210505131357.07e55042@redhat.com \
    --to=alex.williamson@redhat.com \
    --cc=ameynarkhede03@gmail.com \
    --cc=bhelgaas@google.com \
    --cc=helgaas@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=okaya@kernel.org \
    --cc=oohall@gmail.com \
    --cc=sdonthineni@nvidia.com \
    --cc=vsethi@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).