public inbox for linux-rdma@vger.kernel.org
 help / color / mirror / Atom feed
From: Yanjun Zhu <yanjun.zhu@linux.dev>
To: Bob Pearson <rpearsonhpe@gmail.com>,
	jgg@nvidia.com, zyjzyj2000@gmail.com, linux-rdma@vger.kernel.org
Subject: Re: [PATCH for-rc] RDMA/rxe: Fix rnr retry behavior
Date: Fri, 13 May 2022 10:40:10 +0800	[thread overview]
Message-ID: <e587f531-0650-1548-1fe0-04d0152a5082@linux.dev> (raw)
In-Reply-To: <20220512194901.76696-1-rpearsonhpe@gmail.com>

在 2022/5/13 3:49, Bob Pearson 写道:
> Currently the completer tasklet when it sets the retransmit timer or the
> nak timer sets the same flag (qp->req.need_retry) so that if either
> timer fires it will attempt to perform a retry flow on the send queue.
> This has the effect of responding to an RNR NAK at the first retransmit
> timer event which does not allow for the requested rnr timeout.
> 
> This patch adds a new flag (qp->req.need_rnr_timeout) which, if set,
> prevents a retry flow until the rnr nak timer fires.
> 
> This patch fixes rnr retry errors which can be observed by running the
> pyverbs test suite 50-100X. With this patch applied they do not occur.

Do you mean that running run_tests.py for 50-100times can reproduce this 
bug? I want to reproduce this problem.

Thanks a lot.
Zhu Yanjun

> 
> Link: https://lore.kernel.org/linux-rdma/a8287823-1408-4273-bc22-99a0678db640@gmail.com/
> Fixes: 8700e3e7c485 ("Soft RoCE (RXE) - The software RoCE driver")
> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
> ---
>   drivers/infiniband/sw/rxe/rxe_comp.c  | 4 +---
>   drivers/infiniband/sw/rxe/rxe_qp.c    | 1 +
>   drivers/infiniband/sw/rxe/rxe_req.c   | 6 ++++--
>   drivers/infiniband/sw/rxe/rxe_verbs.h | 1 +
>   4 files changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
> index 138b3e7d3a5f..bc668cb211b1 100644
> --- a/drivers/infiniband/sw/rxe/rxe_comp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_comp.c
> @@ -733,9 +733,7 @@ int rxe_completer(void *arg)
>   				if (qp->comp.rnr_retry != 7)
>   					qp->comp.rnr_retry--;
>   
> -				qp->req.need_retry = 1;
> -				pr_debug("qp#%d set rnr nak timer\n",
> -					 qp_num(qp));
> +				qp->req.need_rnr_timeout = 1;
>   				mod_timer(&qp->rnr_nak_timer,
>   					  jiffies + rnrnak_jiffies(aeth_syn(pkt)
>   						& ~AETH_TYPE_MASK));
> diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
> index 62acf890af6c..1c962468714e 100644
> --- a/drivers/infiniband/sw/rxe/rxe_qp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_qp.c
> @@ -513,6 +513,7 @@ static void rxe_qp_reset(struct rxe_qp *qp)
>   	atomic_set(&qp->ssn, 0);
>   	qp->req.opcode = -1;
>   	qp->req.need_retry = 0;
> +	qp->req.need_rnr_timeout = 0;
>   	qp->req.noack_pkts = 0;
>   	qp->resp.msn = 0;
>   	qp->resp.opcode = -1;
> diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
> index ae5fbc79dd5c..770ae4279f73 100644
> --- a/drivers/infiniband/sw/rxe/rxe_req.c
> +++ b/drivers/infiniband/sw/rxe/rxe_req.c
> @@ -103,7 +103,8 @@ void rnr_nak_timer(struct timer_list *t)
>   {
>   	struct rxe_qp *qp = from_timer(qp, t, rnr_nak_timer);
>   
> -	pr_debug("qp#%d rnr nak timer fired\n", qp_num(qp));
> +	qp->req.need_retry = 1;
> +	qp->req.need_rnr_timeout = 0;
>   	rxe_run_task(&qp->req.task, 1);
>   }
>   
> @@ -624,10 +625,11 @@ int rxe_requester(void *arg)
>   		qp->req.need_rd_atomic = 0;
>   		qp->req.wait_psn = 0;
>   		qp->req.need_retry = 0;
> +		qp->req.need_rnr_timeout = 0;
>   		goto exit;
>   	}
>   
> -	if (unlikely(qp->req.need_retry)) {
> +	if (unlikely(qp->req.need_retry && !qp->req.need_rnr_timeout)) {
>   		req_retry(qp);
>   		qp->req.need_retry = 0;
>   	}
> diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
> index e7eff1ca75e9..ab3186478c3f 100644
> --- a/drivers/infiniband/sw/rxe/rxe_verbs.h
> +++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
> @@ -123,6 +123,7 @@ struct rxe_req_info {
>   	int			need_rd_atomic;
>   	int			wait_psn;
>   	int			need_retry;
> +	int			need_rnr_timeout;
>   	int			noack_pkts;
>   	struct rxe_task		task;
>   };
> 
> base-commit: c5eb0a61238dd6faf37f58c9ce61c9980aaffd7a


  reply	other threads:[~2022-05-13  2:40 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-12 19:49 [PATCH for-rc] RDMA/rxe: Fix rnr retry behavior Bob Pearson
2022-05-13  2:40 ` Yanjun Zhu [this message]
2022-05-13 13:38   ` Yanjun Zhu
2022-05-13 15:42     ` Bob Pearson
2022-05-13 18:22     ` Bob Pearson
2022-05-14  2:30       ` Yanjun Zhu
2022-05-14  3:25         ` Bob Pearson
2022-05-13 13:04 ` Tom Talpey
2022-05-13 15:33   ` Bob Pearson
2022-05-13 17:43     ` Tom Talpey
2022-05-13 18:32       ` Bob Pearson
2022-05-13 19:08         ` Tom Talpey

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e587f531-0650-1548-1fe0-04d0152a5082@linux.dev \
    --to=yanjun.zhu@linux.dev \
    --cc=jgg@nvidia.com \
    --cc=linux-rdma@vger.kernel.org \
    --cc=rpearsonhpe@gmail.com \
    --cc=zyjzyj2000@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox