linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Bjorn Helgaas <helgaas@kernel.org>
To: Ido Schimmel <idosch@nvidia.com>
Cc: Petr Machata <petrm@nvidia.com>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	netdev@vger.kernel.org, Amit Cohen <amcohen@nvidia.com>,
	mlxsw@nvidia.com, linux-pci@vger.kernel.org,
	Alex Williamson <alex.williamson@redhat.com>,
	Lukas Wunner <lukas@wunner.de>
Subject: Re: [PATCH net-next 6/6] mlxsw: pci: Add support for new reset flow
Date: Wed, 29 Mar 2023 11:01:44 -0500	[thread overview]
Message-ID: <20230329160144.GA2967030@bhelgaas> (raw)
In-Reply-To: <ZCBOdunTNYsufhcn@shredder>

[+cc Alex, Lukas for link-disable reset thoughts, beginning of thread:
https://lore.kernel.org/all/cover.1679502371.git.petrm@nvidia.com/]

On Sun, Mar 26, 2023 at 04:53:58PM +0300, Ido Schimmel wrote:
> On Thu, Mar 23, 2023 at 11:51:15AM -0500, Bjorn Helgaas wrote:
> > On Wed, Mar 22, 2023 at 05:49:35PM +0100, Petr Machata wrote:
> > > From: Amit Cohen <amcohen@nvidia.com>
> > > 
> > > The driver resets the device during probe and during a devlink reload.
> > > The current reset method reloads the current firmware version or a pending
> > > one, if one was previously flashed using devlink. However, the reset does
> > > not take down the PCI link, preventing the PCI firmware from being
> > > upgraded, unless the system is rebooted.
> > 
> > Just to make sure I understand this correctly, the above sounds like
> > "firmware" includes two parts that have different rules for loading:
> > 
> >   - Current reset method is completely mlxsw-specific and resets the
> >     mlxsw core but not the PCIe interface; this loads only firmware
> >     part A
> > 
> >   - A PCIe reset resets both the mlxsw core and the PCIe interface;
> >     this loads both firmware part A and part B
> 
> Yes. A few years ago I had to flash a new firmware in order to test a
> fix in the PCIe firmware and the bug still reproduced after a devlink
> reload. Only after a reboot the new PCIe firmware was loaded and the bug
> was fixed. Bugs in PCIe firmware are not common, but we would like to
> avoid the scenario where users must reboot the machine in order to load
> the new firmware.
> 
> > > To solve this problem, a new reset command (6) was implemented in the
> > > firmware. Unlike the current command (1), after issuing the new command
> > > the device will not start the reset immediately, but only after the PCI
> > > link was disabled. The driver is expected to wait for 500ms before
> > > re-enabling the link to give the firmware enough time to start the reset.
> > 
> > I guess the idea here is that the mlxsw driver:
> > 
> >   - Tells the firmware we're going to reset
> >     (MLXSW_REG_MRSR_COMMAND_RESET_AT_PCI_DISABLE)
> > 
> >   - Saves PCI config state
> > 
> >   - Disables the link (mlxsw_pci_link_toggle()), which causes a PCIe
> >     hot reset
> > 
> >   - The firmware notices the link disable and starts its own internal
> >     reset
> > 
> >   - The mlxsw driver waits 500ms
> >     (MLXSW_PCI_TOGGLE_WAIT_BEFORE_EN_MSECS)
> > 
> >   - Enables link and waits for it to be active
> >     (mlxsw_pci_link_active_check()
> > 
> >   - Waits for device to be ready again (mlxsw_pci_device_id_read())
> 
> Correct.
> 
> > So the first question is why you don't simply use
> > pci_reset_function(), since it is supposed to cause a reset and do all
> > the necessary waiting for the device to be ready.  This is quite
> > complicated to do correctly; in fact, we still discover issues there
> > regularly.  There are many special cases in PCIe r6.0, sec 6.6.1, and
> > it would be much better if we can avoid trying to handle them all in
> > individual drivers.
> 
> I see that this function takes the device lock and I think (didn't try)
> it will deadlock if we were to replace the current code with it since we
> also perform a reset during probe where I believe the device lock is
> already taken.
> 
> __pci_reset_function_locked() is another option, but it assumes the
> device lock was already taken, which is correct during probe, but not
> when reset is performed as part of devlink reload.
> 
> Let's put the locking issues aside and assume we can use
> __pci_reset_function_locked(). I'm trying to figure out what it can
> allow us to remove from the driver in favor of common PCI code. It
> essentially invokes one of the supported reset methods. Looking at my
> device, I see the following:
> 
>  # cat /sys/class/pci_bus/0000\:01/device/0000\:01\:00.0/reset_method 
>  pm bus
> 
> So I assume it will invoke pci_pm_reset(). I'm not sure it can work for
> us as our reset procedure requires us to disable the link on the
> downstream port as a way of notifying the device that it should start
> the reset procedure.

Hmmm, pci_pm_reset() puts the device in D3hot, then back to D0.  Spec
says that results in "undefined internal Function state," which
doesn't even sound like a guaranteed reset, but it's what we have, and
in any case, it does not disable the link.

> We might be able to use the "device_specific" method and add quirks in
> "pci_dev_reset_methods". However, I'm not sure what would be the
> benefit, as it basically means moving the code in
> mlxsw_pci_link_toggle() to drivers/pci/quirks.c. Also, when the "probe"
> argument is "true" we can't actually determine if this reset method is
> supported or not, as we can't query that from the configuration space of
> the device in the current implementation. It's queried using a command
> interface that is specific to mlxsw and resides in the driver itself.
> Not usable from drivers/pci/quirks.c.

Spec (PCIe r6.0, sec 6.6.1) says "Disabling a Link causes Downstream
components to undergo a hot reset."  That seems like it *could* be a
general-purpose method of doing a reset, and I don't know why the PCI
core doesn't support it.  Added Alex and Lukas in case they know.

But it sounds like there's some wrinkle with your device?  I suppose a
link disable actually causes a reset, but that reset may not trigger
the firmware reload you need?  If we had a generic "disable link"
reset method, maybe a device quirk could disable it if necessary?

> > Of course, pci_reset_function() does *not* include details like
> > MLXSW_PCI_TOGGLE_WAIT_BEFORE_EN_MSECS.
> > 
> > I assume that flashing the firmware to the device followed by a power
> > cycle (without ever doing MLXSW_REG_MRSR_COMMAND_RESET_AT_PCI_DISABLE)
> > would load the new firmware everywhere.  Can we not do the same with a
> > PCIe reset?
> 
> Yes, that's what we would like to achieve.
> 
> Thanks for the feedback!

  reply	other threads:[~2023-03-29 16:05 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <cover.1679502371.git.petrm@nvidia.com>
     [not found] ` <c61d07469ecf5d3053442e24d4d050405f466b76.1679502371.git.petrm@nvidia.com>
2023-03-23  9:13   ` [PATCH net-next 6/6] mlxsw: pci: Add support for new reset flow Petr Machata
2023-03-23 16:51   ` Bjorn Helgaas
2023-03-26 13:53     ` Ido Schimmel
2023-03-29 16:01       ` Bjorn Helgaas [this message]
2023-03-29 17:10         ` Alex Williamson
2023-03-30  8:26           ` Ido Schimmel
2023-03-30 18:49             ` Alex Williamson
2023-04-13  8:10               ` Ido Schimmel
2023-04-13 15:26                 ` Jakub Kicinski
2023-04-13 10:26         ` Lukas Wunner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230329160144.GA2967030@bhelgaas \
    --to=helgaas@kernel.org \
    --cc=alex.williamson@redhat.com \
    --cc=amcohen@nvidia.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=idosch@nvidia.com \
    --cc=kuba@kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=lukas@wunner.de \
    --cc=mlxsw@nvidia.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=petrm@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).