linux-rdma.vger.kernel.org archive mirror
From: Patrisious Haddad <phaddad@nvidia.com>
To: Sagi Grimberg <sagi@grimberg.me>, Christoph Hellwig <hch@lst.de>
Cc: Patrisious Haddad <phaddad@nvidia.com>,
	Leon Romanovsky <leonro@nvidia.com>,
	Linux-nvme <linux-nvme@lists.infradead.org>,
	<linux-rdma@vger.kernel.org>,
	Michael Guralnik <michaelgur@nvidia.com>,
	Israel Rukshin <israelr@nvidia.com>,
	Maor Gottlieb <maorg@nvidia.com>,
	"Max Gurtovoy" <mgurtovoy@nvidia.com>
Subject: [PATCH rdma-next 0/4] Provide more error details when a QP moves to
Date: Wed, 7 Sep 2022 14:37:56 +0300
Message-ID: <20220907113800.22182-1-phaddad@nvidia.com>

The following series adds debug prints for fatal QP events, which help
in finding the root cause of the errors.
ib_get_qp_err_syndrome is called from a work queue because the QP event
callback runs in interrupt context and can't sleep.
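
The deferral described above can be modelled in plain userspace C. This
is only a sketch of the pattern, not code from the series: fake_qp,
qp_event_handler, run_pending_work and query_err_syndrome are
hypothetical names, and query_err_syndrome merely stands in for a call
that may sleep, such as ib_get_qp_err_syndrome, whose real signature is
not shown in this cover letter.

```c
/* Userspace model only: none of these names are kernel APIs. */
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_PENDING 8
#define MSG_LEN 64

struct fake_qp {
	int qp_num;
	unsigned int hw_syndrome;	/* hypothetical vendor error code */
};

/* Fixed-size ring of deferred work items. */
static struct fake_qp *pending[MAX_PENDING];
static size_t head, tail;	/* tail - head == number of queued items */

/* Stand-in for a query that may sleep (for example a firmware command). */
static int query_err_syndrome(const struct fake_qp *qp, char *buf, size_t len)
{
	snprintf(buf, len, "QP %d: syndrome 0x%x", qp->qp_num, qp->hw_syndrome);
	return 0;
}

/* Models the QP event callback: it must not sleep, so it only queues work. */
static int qp_event_handler(struct fake_qp *qp)
{
	if (tail - head == MAX_PENDING)
		return -1;	/* queue full: drop the event rather than block */
	pending[tail % MAX_PENDING] = qp;
	tail++;
	return 0;
}

/* Models the work item that runs later, where sleeping is allowed. */
static size_t run_pending_work(char out[][MSG_LEN], size_t max_out)
{
	size_t n = 0;

	while (head != tail && n < max_out) {
		struct fake_qp *qp = pending[head % MAX_PENDING];

		head++;
		assert(qp != NULL);
		query_err_syndrome(qp, out[n], MSG_LEN);
		n++;
	}
	return n;
}
```

In this model the handler returns immediately after queueing, and the
message is only built when run_pending_work() drains the ring, which
mirrors the split between the interrupt-context callback and the work
item.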

The function is especially useful for debugging, for a few reasons:
First, it provides the information in a human-readable form, which
makes it easier to identify the root cause of a bug.
Second, it allows providing vendor-specific error codes or information
that can be very useful to users who know them.
Last and most important, the function reports the reason the QP moved
to the error state even in cases where no CQE is generated; without
this feature, finding the root cause of such errors would be much
harder.

An example of such a case is a remote write with an RKEY violation:
no CQE is generated on the remote side, but this print makes it
possible to know the reason behind the failure.

Thanks.

Israel Rukshin (1):
  nvme-rdma: add more error details when a QP moves to an error state

Patrisious Haddad (3):
  net/mlx5: Introduce CQE error syndrome
  RDMA/core: Introduce ib_get_qp_err_syndrome function
  RDMA/mlx5: Implement ib_get_qp_err_syndrome

 drivers/infiniband/core/device.c     |  1 +
 drivers/infiniband/core/verbs.c      |  8 +++++
 drivers/infiniband/hw/mlx5/main.c    |  1 +
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  1 +
 drivers/infiniband/hw/mlx5/qp.c      | 42 ++++++++++++++++++++++++-
 drivers/infiniband/hw/mlx5/qp.h      |  2 +-
 drivers/infiniband/hw/mlx5/qpc.c     |  4 ++-
 drivers/nvme/host/rdma.c             | 24 ++++++++++++++
 include/linux/mlx5/mlx5_ifc.h        | 47 +++++++++++++++++++++++++---
 include/rdma/ib_verbs.h              | 13 ++++++++
 10 files changed, 135 insertions(+), 8 deletions(-)

-- 
2.18.1

