From: Leon Romanovsky <leon@kernel.org>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: Leon Romanovsky <leonro@nvidia.com>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>,
linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
Mark Zhang <markzhang@nvidia.com>,
netdev@vger.kernel.org, Paolo Abeni <pabeni@redhat.com>,
Patrisious Haddad <phaddad@nvidia.com>,
Saeed Mahameed <saeedm@nvidia.com>,
Tariq Toukan <tariqt@nvidia.com>,
Yishai Hadas <yishaih@nvidia.com>,
Sagi Grimberg <sagi@grimberg.me>, Christoph Hellwig <hch@lst.de>
Subject: [PATCH rdma-next v1 0/3] Provide more error details when a QP moves to error state
Date: Wed, 4 Jan 2023 11:43:33 +0200
Message-ID: <cover.1672821186.git.leonro@nvidia.com>
From: Leon Romanovsky <leonro@nvidia.com>
Changelog:
v1:
* Reworked mlx4 to allow a non-atomic IB QP event handler.
v0: https://lore.kernel.org/linux-rdma/20220907113800.22182-1-phaddad@nvidia.com/
------------------------------------------
The following series adds the ability to get information about fatal QP events.
This functionality is extremely useful for the following reasons:
* It provides information about why a QP moved to the error state
in cases where no CQE is generated.
* It allows vendor-specific error codes and information to be reported,
which can be very useful to users who know them.
An example of a case without a CQE is a remote write with an RKEY violation.
In this flow, no CQEs are generated on the remote side, and such an error
is hard to debug without any indication.
Thanks.
Mark Zhang (1):
RDMA/mlx: Calling qp event handler in workqueue context
Patrisious Haddad (2):
net/mlx5: Introduce CQE error syndrome
RDMA/mlx5: Print error syndrome in case of fatal QP errors
drivers/infiniband/hw/mlx4/main.c | 8 ++
drivers/infiniband/hw/mlx4/mlx4_ib.h | 3 +
drivers/infiniband/hw/mlx4/qp.c | 121 +++++++++++------
drivers/infiniband/hw/mlx5/main.c | 7 +
drivers/infiniband/hw/mlx5/qp.c | 164 ++++++++++++++++++------
drivers/infiniband/hw/mlx5/qp.h | 4 +-
drivers/infiniband/hw/mlx5/qpc.c | 7 +-
drivers/net/ethernet/mellanox/mlx4/qp.c | 14 +-
include/linux/mlx4/qp.h | 1 +
include/linux/mlx5/mlx5_ifc.h | 47 ++++++-
include/rdma/ib_verbs.h | 2 +-
11 files changed, 292 insertions(+), 86 deletions(-)
--
2.38.1