From: Leon Romanovsky <leon@kernel.org>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: Niklas Schnelle <schnelle@linux.ibm.com>,
Saeed Mahameed <saeedm@nvidia.com>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Gerd Bayer <gbayer@linux.ibm.com>,
Alexander Schmidt <alexs@linux.ibm.com>,
netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH] net/mlx5: stop waiting for PCI link if reset is required
Date: Sun, 9 Apr 2023 11:54:25 +0300
Message-ID: <20230409085425.GC14869@unreal>
In-Reply-To: <20230405210613.GA3638573@bhelgaas>
On Wed, Apr 05, 2023 at 04:06:13PM -0500, Bjorn Helgaas wrote:
> On Tue, Apr 04, 2023 at 05:27:35PM +0200, Niklas Schnelle wrote:
> > On Mon, 2023-04-03 at 21:21 +0300, Leon Romanovsky wrote:
> > > On Mon, Apr 03, 2023 at 09:56:56AM +0200, Niklas Schnelle wrote:
> > > > After an error on the PCI link, the driver does not need to wait
> > > > for the link to become functional again as a reset is required. Stop
> > > > the wait loop in this case to accelerate the recovery flow.
> > > >
> > > > Co-developed-by: Alexander Schmidt <alexs@linux.ibm.com>
> > > > Signed-off-by: Alexander Schmidt <alexs@linux.ibm.com>
> > > > Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
> > > > ---
> > > > drivers/net/ethernet/mellanox/mlx5/core/health.c | 12 ++++++++++--
> > > > 1 file changed, 10 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c
> > > > index f9438d4e43ca..81ca44e0705a 100644
> > > > --- a/drivers/net/ethernet/mellanox/mlx5/core/health.c
> > > > +++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c
> > > > @@ -325,6 +325,8 @@ int mlx5_health_wait_pci_up(struct mlx5_core_dev *dev)
> > > > while (sensor_pci_not_working(dev)) {
> > >
> > > According to the comment in sensor_pci_not_working(), this loop is
> > > supposed to wait until PCI is ready again. Otherwise, we would bail
> > > out with a pci_channel_offline() error already in the first iteration.
> >
> > Well, yes. The problem is that this works for intermittent errors,
> > including when the card resets itself, which seems to be the use case in
> > mlx5_fw_reset_complete_reload() and mlx5_devlink_reload_fw_activate().
> > If there is a PCI error that requires a link reset, however, we see some
> > problems, although it does eventually work after running into the timeout.
> >
> > As I understand it and as implemented at least on s390,
> > pci_channel_io_frozen is only set for fatal errors that require a reset,
> > while non-fatal errors will have pci_channel_io_normal (see also
> > Documentation/PCI/pcieaer-howto.rst)
>
> Yes, I think that's true, see handle_error_source().
>
> > thus I think pci_channel_offline()
> > should only be true if a reset is required or there is a permanent
> > error.
>
> Yes, I think pci_channel_offline() will only be true when a fatal
> error has been reported via AER or DPC (or a hotplug driver says the
> device has been removed). The driver resetting the device should not
> cause such a fatal error.
Thank you for the explanation and confirmation.
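For readers following along, here is a small user-space model of the behavior agreed on above. It is a sketch, not the mlx5 or PCI core code: every name in it (model_dev, model_channel_offline(), model_wait_pci_up(), and the poll budget standing in for a real timeout) is hypothetical, and returning -EIO for the offline case is an assumption, not necessarily what the patch returns. It treats any state other than the normal one as offline, matching the explanation that pci_channel_offline() is only true after a fatal error or device removal.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the PCI channel states discussed in the thread. */
enum model_channel_state {
	MODEL_IO_NORMAL,       /* no error, or a non-fatal one */
	MODEL_IO_FROZEN,       /* fatal error reported, reset required */
	MODEL_IO_PERM_FAILURE, /* device is gone for good */
};

struct model_dev {
	enum model_channel_state state;
	int polls_until_up; /* polls left before the link reports "working" */
};

/* Offline means anything other than the normal state. */
static bool model_channel_offline(const struct model_dev *d)
{
	return d->state != MODEL_IO_NORMAL;
}

/* Stand-in for a sensor check; each call consumes one simulated poll. */
static bool model_sensor_not_working(struct model_dev *d)
{
	if (d->polls_until_up > 0) {
		d->polls_until_up--;
		return true;
	}
	return false;
}

/*
 * Poll until the link works, give up after max_polls, and stop early
 * when the channel is offline, because waiting cannot help before a reset.
 */
static int model_wait_pci_up(struct model_dev *d, int max_polls)
{
	int polls = 0;

	assert(max_polls > 0);
	while (model_sensor_not_working(d)) {
		if (model_channel_offline(d))
			return -EIO; /* assumed error code for this model */
		if (++polls >= max_polls)
			return -ETIMEDOUT;
		/* a real driver would sleep between polls here */
	}
	return 0;
}
```

Checking the offline state before the timeout is what makes a frozen channel fail fast instead of spinning until the deadline, while a device that is merely going through its own reset stays in the normal state and keeps being polled.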