From: Jacob Keller <jacob.e.keller@intel.com>
To: Niklas Schnelle <schnelle@linux.ibm.com>,
Saeed Mahameed <saeedm@nvidia.com>,
Leon Romanovsky <leon@kernel.org>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>
Cc: Gerd Bayer <gbayer@linux.ibm.com>,
Alexander Schmidt <alexs@linux.ibm.com>,
Leon Romanovsky <leonro@nvidia.com>, <netdev@vger.kernel.org>,
<linux-rdma@vger.kernel.org>, <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH net-next v2] net/mlx5: stop waiting for PCI link if reset is required
Date: Wed, 12 Apr 2023 16:33:06 -0700
Message-ID: <0166d13c-dc55-c376-28ca-dae0a872b518@intel.com>
In-Reply-To: <20230411105103.2835394-1-schnelle@linux.ibm.com>
On 4/11/2023 3:51 AM, Niklas Schnelle wrote:
> After an error on the PCI link, the driver does not need to wait
> for the link to become functional again as a reset is required. Stop
> the wait loop in this case to accelerate the recovery flow.
>
OK, so if the PCI link is completely offline (pci_channel_offline), we
bail out immediately and fail to recover, reporting this to the user
right away rather than only after the timeout completes. A system
administrator can then step in and perform the appropriate reset?
Essentially, we know the link will never recover at this point, so we
stop wasting time.
Makes sense.
> Co-developed-by: Alexander Schmidt <alexs@linux.ibm.com>
> Signed-off-by: Alexander Schmidt <alexs@linux.ibm.com>
> Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
> Link: https://lore.kernel.org/r/20230403075657.168294-1-schnelle@linux.ibm.com
> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
> ---
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
> drivers/net/ethernet/mellanox/mlx5/core/health.c | 12 ++++++++++--
> 1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c
> index f9438d4e43ca..81ca44e0705a 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/health.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c
> @@ -325,6 +325,8 @@ int mlx5_health_wait_pci_up(struct mlx5_core_dev *dev)
>  	while (sensor_pci_not_working(dev)) {
>  		if (time_after(jiffies, end))
>  			return -ETIMEDOUT;
> +		if (pci_channel_offline(dev->pdev))
> +			return -EIO;
>  		msleep(100);
>  	}
>  	return 0;
> @@ -332,10 +334,16 @@ int mlx5_health_wait_pci_up(struct mlx5_core_dev *dev)
>
>  static int mlx5_health_try_recover(struct mlx5_core_dev *dev)
>  {
> +	int rc;
> +
>  	mlx5_core_warn(dev, "handling bad device here\n");
>  	mlx5_handle_bad_state(dev);
> -	if (mlx5_health_wait_pci_up(dev)) {
> -		mlx5_core_err(dev, "health recovery flow aborted, PCI reads still not working\n");
> +	rc = mlx5_health_wait_pci_up(dev);
> +	if (rc) {
> +		if (rc == -ETIMEDOUT)
> +			mlx5_core_err(dev, "health recovery flow aborted, PCI reads still not working\n");
> +		else
> +			mlx5_core_err(dev, "health recovery flow aborted, PCI channel offline\n");
>  		return -EIO;
>  	}
>  	mlx5_core_err(dev, "starting health recovery flow\n");
>
> base-commit: 09a9639e56c01c7a00d6c0ca63f4c7c41abe075d
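
The control flow of the patched wait loop can be illustrated outside the kernel with a small userspace sketch. This is not the driver code: struct fake_dev, fake_pci_not_working(), fake_channel_offline() and fake_wait_pci_up() are hypothetical stand-ins, a poll budget replaces the jiffies/time_after() deadline, and incrementing a counter replaces msleep(100). It only shows how the two exit paths differ: a timeout yields -ETIMEDOUT after the whole budget is spent, while an offline channel yields -EIO as soon as it is seen.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for device state; not the real struct mlx5_core_dev. */
struct fake_dev {
	int polls_until_up;      /* poll count at which PCI reads work again; negative = never */
	int polls_until_offline; /* poll count at which the channel goes offline; negative = never */
	int polls;               /* polls performed so far */
};

/* Assumed analogue of sensor_pci_not_working(). */
static bool fake_pci_not_working(const struct fake_dev *dev)
{
	return dev->polls_until_up < 0 || dev->polls < dev->polls_until_up;
}

/* Assumed analogue of pci_channel_offline(). */
static bool fake_channel_offline(const struct fake_dev *dev)
{
	return dev->polls_until_offline >= 0 &&
	       dev->polls >= dev->polls_until_offline;
}

/* Same shape as the patched loop: timeout check, offline check, then wait. */
static int fake_wait_pci_up(struct fake_dev *dev, int max_polls)
{
	while (fake_pci_not_working(dev)) {
		if (dev->polls > max_polls)
			return -ETIMEDOUT;
		if (fake_channel_offline(dev))
			return -EIO;	/* a reset is required, waiting cannot help */
		dev->polls++;		/* stands in for msleep(100) */
	}
	return 0;
}
```

With a budget of 10 polls, a device whose channel is already offline returns -EIO without waiting at all, whereas a device that merely never comes back uses up the full budget before returning -ETIMEDOUT. That difference is the time saving the patch description refers to.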
Thread overview: 3+ messages
2023-04-11 10:51 [PATCH net-next v2] net/mlx5: stop waiting for PCI link if reset is required Niklas Schnelle
2023-04-12 23:33 ` Jacob Keller [this message]
2023-04-13 23:01 ` Saeed Mahameed