All of lore.kernel.org
 help / color / mirror / Atom feed
From: Bjorn Helgaas <helgaas@kernel.org>
To: Ido Schimmel <idosch@nvidia.com>
Cc: Petr Machata <petrm@nvidia.com>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	netdev@vger.kernel.org, Amit Cohen <amcohen@nvidia.com>,
	mlxsw@nvidia.com, linux-pci@vger.kernel.org,
	Alex Williamson <alex.williamson@redhat.com>,
	Lukas Wunner <lukas@wunner.de>
Subject: Re: [PATCH net-next 6/6] mlxsw: pci: Add support for new reset flow
Date: Wed, 29 Mar 2023 11:01:44 -0500	[thread overview]
Message-ID: <20230329160144.GA2967030@bhelgaas> (raw)
In-Reply-To: <ZCBOdunTNYsufhcn@shredder>

[+cc Alex, Lukas for link-disable reset thoughts, beginning of thread:
https://lore.kernel.org/all/cover.1679502371.git.petrm@nvidia.com/]

On Sun, Mar 26, 2023 at 04:53:58PM +0300, Ido Schimmel wrote:
> On Thu, Mar 23, 2023 at 11:51:15AM -0500, Bjorn Helgaas wrote:
> > On Wed, Mar 22, 2023 at 05:49:35PM +0100, Petr Machata wrote:
> > > From: Amit Cohen <amcohen@nvidia.com>
> > > 
> > > The driver resets the device during probe and during a devlink reload.
> > > The current reset method reloads the current firmware version or a pending
> > > one, if one was previously flashed using devlink. However, the reset does
> > > not take down the PCI link, preventing the PCI firmware from being
> > > upgraded, unless the system is rebooted.
> > 
> > Just to make sure I understand this correctly, the above sounds like
> > "firmware" includes two parts that have different rules for loading:
> > 
> >   - Current reset method is completely mlxsw-specific and resets the
> >     mlxsw core but not the PCIe interface; this loads only firmware
> >     part A
> > 
> >   - A PCIe reset resets both the mlxsw core and the PCIe interface;
> >     this loads both firmware part A and part B
> 
> Yes. A few years ago I had to flash a new firmware in order to test a
> fix in the PCIe firmware and the bug still reproduced after a devlink
> reload. Only after a reboot the new PCIe firmware was loaded and the bug
> was fixed. Bugs in PCIe firmware are not common, but we would like to
> avoid the scenario where users must reboot the machine in order to load
> the new firmware.
> 
> > > To solve this problem, a new reset command (6) was implemented in the
> > > firmware. Unlike the current command (1), after issuing the new command
> > > the device will not start the reset immediately, but only after the PCI
> > > link was disabled. The driver is expected to wait for 500ms before
> > > re-enabling the link to give the firmware enough time to start the reset.
> > 
> > I guess the idea here is that the mlxsw driver:
> > 
> >   - Tells the firmware we're going to reset
> >     (MLXSW_REG_MRSR_COMMAND_RESET_AT_PCI_DISABLE)
> > 
> >   - Saves PCI config state
> > 
> >   - Disables the link (mlxsw_pci_link_toggle()), which causes a PCIe
> >     hot reset
> > 
> >   - The firmware notices the link disable and starts its own internal
> >     reset
> > 
> >   - The mlxsw driver waits 500ms
> >     (MLXSW_PCI_TOGGLE_WAIT_BEFORE_EN_MSECS)
> > 
> >   - Enables link and waits for it to be active
> >     (mlxsw_pci_link_active_check()
> > 
> >   - Waits for device to be ready again (mlxsw_pci_device_id_read())
> 
> Correct.
> 
> > So the first question is why you don't simply use
> > pci_reset_function(), since it is supposed to cause a reset and do all
> > the necessary waiting for the device to be ready.  This is quite
> > complicated to do correctly; in fact, we still discover issues there
> > regularly.  There are many special cases in PCIe r6.0, sec 6.6.1, and
> > it would be much better if we can avoid trying to handle them all in
> > individual drivers.
> 
> I see that this function takes the device lock and I think (didn't try)
> it will deadlock if we were to replace the current code with it since we
> also perform a reset during probe where I believe the device lock is
> already taken.
> 
> __pci_reset_function_locked() is another option, but it assumes the
> device lock was already taken, which is correct during probe, but not
> when reset is performed as part of devlink reload.
> 
> Let's put the locking issues aside and assume we can use
> __pci_reset_function_locked(). I'm trying to figure out what it can
> allow us to remove from the driver in favor of common PCI code. It
> essentially invokes one of the supported reset methods. Looking at my
> device, I see the following:
> 
>  # cat /sys/class/pci_bus/0000\:01/device/0000\:01\:00.0/reset_method 
>  pm bus
> 
> So I assume it will invoke pci_pm_reset(). I'm not sure it can work for
> us as our reset procedure requires us to disable the link on the
> downstream port as a way of notifying the device that it should start
> the reset procedure.

Hmmm, pci_pm_reset() puts the device in D3hot, then back to D0.  Spec
says that results in "undefined internal Function state," which
doesn't even sound like a guaranteed reset, but it's what we have, and
in any case, it does not disable the link.

> We might be able to use the "device_specific" method and add quirks in
> "pci_dev_reset_methods". However, I'm not sure what would be the
> benefit, as it basically means moving the code in
> mlxsw_pci_link_toggle() to drivers/pci/quirks.c. Also, when the "probe"
> argument is "true" we can't actually determine if this reset method is
> supported or not, as we can't query that from the configuration space of
> the device in the current implementation. It's queried using a command
> interface that is specific to mlxsw and resides in the driver itself.
> Not usable from drivers/pci/quirks.c.

Spec (PCIe r6.0, sec 6.6.1) says "Disabling a Link causes Downstream
components to undergo a hot reset."  That seems like it *could* be a
general-purpose method of doing a reset, and I don't know why the PCI
core doesn't support it.  Added Alex and Lukas in case they know.

But it sounds like there's some wrinkle with your device?  I suppose a
link disable actually causes a reset, but that reset may not trigger
the firmware reload you need?  If we had a generic "disable link"
reset method, maybe a device quirk could disable it if necessary?

> > Of course, pci_reset_function() does *not* include details like
> > MLXSW_PCI_TOGGLE_WAIT_BEFORE_EN_MSECS.
> > 
> > I assume that flashing the firmware to the device followed by a power
> > cycle (without ever doing MLXSW_REG_MRSR_COMMAND_RESET_AT_PCI_DISABLE)
> > would load the new firmware everywhere.  Can we not do the same with a
> > PCIe reset?
> 
> Yes, that's what we would like to achieve.
> 
> Thanks for the feedback!

  reply	other threads:[~2023-03-29 16:05 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-22 16:49 [PATCH net-next 0/6] mlxsw: Add support for new reset flow Petr Machata
2023-03-22 16:49 ` [PATCH net-next 1/6] mlxsw: reg: Move 'mpsc' definition in 'mlxsw_reg_infos' Petr Machata
2023-03-26 13:59   ` Simon Horman
2023-03-22 16:49 ` [PATCH net-next 2/6] mlxsw: reg: Add Management Capabilities Mask Register Petr Machata
2023-03-26 14:00   ` Simon Horman
2023-03-26 14:02   ` Simon Horman
2023-03-27  9:35     ` Petr Machata
2023-03-27  9:53       ` Simon Horman
2023-03-22 16:49 ` [PATCH net-next 3/6] mlxsw: Extend MRSR pack() function to support new commands Petr Machata
2023-03-26 14:02   ` Simon Horman
2023-03-22 16:49 ` [PATCH net-next 4/6] mlxsw: pci: Rename mlxsw_pci_sw_reset() Petr Machata
2023-03-26 14:02   ` Simon Horman
2023-03-22 16:49 ` [PATCH net-next 5/6] mlxsw: pci: Move software reset code to a separate function Petr Machata
2023-03-26 14:03   ` Simon Horman
2023-03-22 16:49 ` [PATCH net-next 6/6] mlxsw: pci: Add support for new reset flow Petr Machata
2023-03-23  9:13   ` Petr Machata
2023-03-23 16:51   ` Bjorn Helgaas
2023-03-26 13:53     ` Ido Schimmel
2023-03-29 16:01       ` Bjorn Helgaas [this message]
2023-03-29 17:10         ` Alex Williamson
2023-03-30  8:26           ` Ido Schimmel
2023-03-30 18:49             ` Alex Williamson
2023-04-13  8:10               ` Ido Schimmel
2023-04-13 15:26                 ` Jakub Kicinski
2023-04-13 10:26         ` Lukas Wunner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230329160144.GA2967030@bhelgaas \
    --to=helgaas@kernel.org \
    --cc=alex.williamson@redhat.com \
    --cc=amcohen@nvidia.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=idosch@nvidia.com \
    --cc=kuba@kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=lukas@wunner.de \
    --cc=mlxsw@nvidia.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=petrm@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.