netdev.vger.kernel.org archive mirror
From: Leon Romanovsky <leon@kernel.org>
To: Aru <aru.kolappan@oracle.com>
Cc: jgg@ziepe.ca, saeedm@nvidia.com, linux-rdma@vger.kernel.org,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	manjunath.b.patil@oracle.com, rama.nichanamatlu@oracle.com
Subject: Re: [PATCH 1/1] net/mlx5: add dynamic logging for mlx5_dump_err_cqe
Date: Tue, 18 Oct 2022 10:47:38 +0300
Message-ID: <Y05aGuXSEtSt2aS2@unreal>
In-Reply-To: <a7fad299-6df5-e79b-960a-c85c7ea4235a@oracle.com>

On Fri, Oct 14, 2022 at 12:12:36PM -0700, Aru wrote:
> Hi Leon,
> 
> Thank you for reviewing the patch.
> 
> The method you mentioned disables the dump permanently for the kernel.
> We thought the vendor might have enabled it for their own use when needed.
> Hence we made it dynamic, so that it can be enabled/disabled at runtime.
> 
> Especially in a production environment, having the option to turn this log
> on/off at runtime will be helpful.
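A minimal userspace sketch of the runtime on/off behavior described above, assuming a single hypothetical flag (`cqe_dump_enabled`) that stands in for the on/off state of one dynamic debug print site. In an actual kernel built with CONFIG_DYNAMIC_DEBUG, `pr_debug()`/`dev_dbg()` sites are switched through /sys/kernel/debug/dynamic_debug/control rather than through a variable like this one.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the on/off state of one dynamic debug print
 * site. Off by default, as the original patch proposes. */
static bool cqe_dump_enabled = false;

/* Hypothetical runtime switch; in the kernel this role is played by
 * writing "+p" or "-p" for the print site to the dynamic debug control file. */
static void set_cqe_dump(bool on)
{
	cqe_dump_enabled = on;
}

/* Print the error CQE notice only while the switch is on.
 * Returns true if a line was printed. The same flag is checked for
 * every syndrome. */
static bool maybe_dump_cqe(unsigned int syndrome)
{
	if (!cqe_dump_enabled)
		return false;
	printf("dump error cqe (syndrome 0x%x)\n", syndrome);
	return true;
}
```

Because one flag gates the dump for every syndrome alike, turning it off hides expected and unexpected errors together, which is the concern raised in the reply that follows.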

While you are interested in turning this specific warning on/off, your change
will hide all syndromes, as it is unlikely that anyone runs in production
with debug prints enabled.

 -   mlx5_ib_warn(dev, "dump error cqe\n");
 +   mlx5_ib_dbg(dev, "dump error cqe\n");

Something like this will do the trick without affecting the others.

diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index 457f57b088c6..966206085eb3 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -267,10 +267,29 @@ static void handle_responder(struct ib_wc *wc, struct mlx5_cqe64 *cqe,
 	wc->wc_flags |= IB_WC_WITH_NETWORK_HDR_TYPE;
 }
 
-static void dump_cqe(struct mlx5_ib_dev *dev, struct mlx5_err_cqe *cqe)
+static void dump_cqe(struct mlx5_ib_dev *dev, struct mlx5_err_cqe *cqe,
+		     struct ib_wc *wc, int dump)
 {
-	mlx5_ib_warn(dev, "dump error cqe\n");
-	mlx5_dump_err_cqe(dev->mdev, cqe);
+	const char *level;
+
+	if (!dump)
+		return;
+
+	mlx5_ib_warn(dev, "WC error: %d, Message: %s\n", wc->status,
+		     ib_wc_status_msg(wc->status));
+
+	if (dump == 1) {
+		mlx5_ib_warn(dev, "dump error cqe\n");
+		level = KERN_WARNING;
+	}
+
+	if (dump == 2) {
+		mlx5_ib_dbg(dev, "dump error cqe\n");
+		level = KERN_DEBUG;
+	}
+
+	print_hex_dump(level, "", DUMP_PREFIX_OFFSET, 16, 1, cqe, sizeof(*cqe),
+		       false);
 }
 
 static void mlx5_handle_error_cqe(struct mlx5_ib_dev *dev,
@@ -300,6 +319,7 @@ static void mlx5_handle_error_cqe(struct mlx5_ib_dev *dev,
 		wc->status = IB_WC_BAD_RESP_ERR;
 		break;
 	case MLX5_CQE_SYNDROME_LOCAL_ACCESS_ERR:
+		dump = 2;
 		wc->status = IB_WC_LOC_ACCESS_ERR;
 		break;
 	case MLX5_CQE_SYNDROME_REMOTE_INVAL_REQ_ERR:
@@ -328,11 +348,7 @@ static void mlx5_handle_error_cqe(struct mlx5_ib_dev *dev,
 	}
 
 	wc->vendor_err = cqe->vendor_err_synd;
-	if (dump) {
-		mlx5_ib_warn(dev, "WC error: %d, Message: %s\n", wc->status,
-			     ib_wc_status_msg(wc->status));
-		dump_cqe(dev, cqe);
-	}
+	dump_cqe(dev, cqe, wc, dump);
 }
 
 static void handle_atomics(struct mlx5_ib_qp *qp, struct mlx5_cqe64 *cqe64,
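A userspace sketch of the per-syndrome severity idea in the diff above: each syndrome maps to no dump, a warning-level dump, or a debug-level dump, so only the expected syndromes go quiet. The enum values are hypothetical placeholders rather than the real MLX5_CQE_SYNDROME_* constants, and the mapping is illustrative: it demotes REMOTE_ACCESS_ERR, the example from the original commit message, whereas the diff above demotes LOCAL_ACCESS_ERR. The KERN_* prefixes are userspace copies of the definitions in include/linux/kern_levels.h.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace copies of the printk level prefixes from
 * include/linux/kern_levels.h. */
#define KERN_SOH     "\001"
#define KERN_WARNING KERN_SOH "4"
#define KERN_DEBUG   KERN_SOH "7"

/* Hypothetical placeholders, not the real MLX5_CQE_SYNDROME_* values. */
enum syndrome {
	SYND_LOCAL_LENGTH_ERR,
	SYND_WR_FLUSH_ERR,
	SYND_REMOTE_ACCESS_ERR,
};

/* Same meaning as the "dump" variable in the diff:
 * 0 = no dump, 1 = warning-level dump, 2 = debug-level dump. */
enum dump_level { DUMP_NONE = 0, DUMP_WARN = 1, DUMP_DEBUG = 2 };

static enum dump_level dump_level_for(enum syndrome s)
{
	switch (s) {
	case SYND_WR_FLUSH_ERR:
		return DUMP_NONE;	/* assumed silent in this sketch */
	case SYND_REMOTE_ACCESS_ERR:
		return DUMP_DEBUG;	/* expected, e.g. rkey revoked by the app */
	default:
		return DUMP_WARN;	/* everything else stays visible */
	}
}

/* printk prefix for a dump level, or NULL when nothing is printed. */
static const char *dump_prefix(enum dump_level level)
{
	assert(level <= DUMP_DEBUG);
	switch (level) {
	case DUMP_WARN:
		return KERN_WARNING;
	case DUMP_DEBUG:
		return KERN_DEBUG;
	default:
		return NULL;
	}
}
```

The `default` arm keeps every syndrome not listed at warning level, so a newly handled syndrome stays visible unless someone deliberately demotes it.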

> 
> Feel free to share your thoughts.

And please don't top-post.

Thanks
> 
> Thanks,
> Aru
> 
> On 10/13/22 3:43 AM, Leon Romanovsky wrote:
> > On Wed, Oct 12, 2022 at 04:52:52PM -0700, Aru Kolappan wrote:
> > > From: Arumugam Kolappan <aru.kolappan@oracle.com>
> > > 
> > > Presently, the mlx5 driver dumps the error CQE by default for a few
> > > syndromes. Some syndromes are expected due to application behavior
> > > (e.g., REMOTE_ACCESS_ERR when an rkey is revoked before the RDMA
> > > operation completes). There is no option to disable the log if the
> > > application decides to do so. This patch converts the log into a dynamic
> > > print that is disabled by default. Users can enable/disable this logging
> > > at runtime if needed.
> > > 
> > > Suggested-by: Manjunath Patil <manjunath.b.patil@oracle.com>
> > > Signed-off-by: Arumugam Kolappan <aru.kolappan@oracle.com>
> > > ---
> > >   drivers/infiniband/hw/mlx5/cq.c | 2 +-
> > >   include/linux/mlx5/cq.h         | 4 ++--
> > >   2 files changed, 3 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
> > > index be189e0..890cdc3 100644
> > > --- a/drivers/infiniband/hw/mlx5/cq.c
> > > +++ b/drivers/infiniband/hw/mlx5/cq.c
> > > @@ -269,7 +269,7 @@ static void handle_responder(struct ib_wc *wc, struct mlx5_cqe64 *cqe,
> > >   static void dump_cqe(struct mlx5_ib_dev *dev, struct mlx5_err_cqe *cqe)
> > >   {
> > > -	mlx5_ib_warn(dev, "dump error cqe\n");
> > > +	mlx5_ib_dbg(dev, "dump error cqe\n");
> > This path should be handled in switch<->case of mlx5_handle_error_cqe()
> > by skipping dump_cqe for MLX5_CQE_SYNDROME_REMOTE_ACCESS_ERR.
> > 
> > diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
> > index be189e0525de..2d75c3071a1e 100644
> > --- a/drivers/infiniband/hw/mlx5/cq.c
> > +++ b/drivers/infiniband/hw/mlx5/cq.c
> > @@ -306,6 +306,7 @@ static void mlx5_handle_error_cqe(struct mlx5_ib_dev *dev,
> >                  wc->status = IB_WC_REM_INV_REQ_ERR;
> >                  break;
> >          case MLX5_CQE_SYNDROME_REMOTE_ACCESS_ERR:
> > +               dump = 0;
> >                  wc->status = IB_WC_REM_ACCESS_ERR;
> >                  break;
> >          case MLX5_CQE_SYNDROME_REMOTE_OP_ERR:
> > 
> > Thanks

Thread overview: 6+ messages
2022-10-12 23:52 [PATCH 1/1] net/mlx5: add dynamic logging for mlx5_dump_err_cqe Aru Kolappan
2022-10-13 10:43 ` Leon Romanovsky
2022-10-14 19:12   ` Aru
2022-10-18  7:47     ` Leon Romanovsky [this message]
2022-10-20  8:24       ` Aru
2022-10-20 11:54         ` Leon Romanovsky
