From: Yanjun Zhu <yanjun.zhu@linux.dev>
To: Bob Pearson <rpearsonhpe@gmail.com>,
jgg@nvidia.com, zyjzyj2000@gmail.com, linux-rdma@vger.kernel.org
Subject: Re: [PATCH for-rc] RDMA/rxe: Fix rnr retry behavior
Date: Fri, 13 May 2022 10:40:10 +0800
Message-ID: <e587f531-0650-1548-1fe0-04d0152a5082@linux.dev>
In-Reply-To: <20220512194901.76696-1-rpearsonhpe@gmail.com>
On 2022/5/13 3:49, Bob Pearson wrote:
> Currently the completer tasklet when it sets the retransmit timer or the
> nak timer sets the same flag (qp->req.need_retry) so that if either
> timer fires it will attempt to perform a retry flow on the send queue.
> This has the effect of responding to an RNR NAK at the first retransmit
> timer event which does not allow for the requested rnr timeout.
>
> This patch adds a new flag (qp->req.need_rnr_timeout) which, if set,
> prevents a retry flow until the rnr nak timer fires.
>
> This patch fixes rnr retry errors which can be observed by running the
> pyverbs test suite 50-100X. With this patch applied they do not occur.
Do you mean that running run_tests.py 50-100 times reproduces this
bug? I would like to reproduce the problem.
Thanks a lot.
Zhu Yanjun
>
> Link: https://lore.kernel.org/linux-rdma/a8287823-1408-4273-bc22-99a0678db640@gmail.com/
> Fixes: 8700e3e7c485 ("Soft RoCE (RXE) - The software RoCE driver")
> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
> ---
> drivers/infiniband/sw/rxe/rxe_comp.c | 4 +---
> drivers/infiniband/sw/rxe/rxe_qp.c | 1 +
> drivers/infiniband/sw/rxe/rxe_req.c | 6 ++++--
> drivers/infiniband/sw/rxe/rxe_verbs.h | 1 +
> 4 files changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
> index 138b3e7d3a5f..bc668cb211b1 100644
> --- a/drivers/infiniband/sw/rxe/rxe_comp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_comp.c
> @@ -733,9 +733,7 @@ int rxe_completer(void *arg)
> if (qp->comp.rnr_retry != 7)
> qp->comp.rnr_retry--;
>
> - qp->req.need_retry = 1;
> - pr_debug("qp#%d set rnr nak timer\n",
> - qp_num(qp));
> + qp->req.need_rnr_timeout = 1;
> mod_timer(&qp->rnr_nak_timer,
> jiffies + rnrnak_jiffies(aeth_syn(pkt)
> & ~AETH_TYPE_MASK));
> diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c b/drivers/infiniband/sw/rxe/rxe_qp.c
> index 62acf890af6c..1c962468714e 100644
> --- a/drivers/infiniband/sw/rxe/rxe_qp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_qp.c
> @@ -513,6 +513,7 @@ static void rxe_qp_reset(struct rxe_qp *qp)
> atomic_set(&qp->ssn, 0);
> qp->req.opcode = -1;
> qp->req.need_retry = 0;
> + qp->req.need_rnr_timeout = 0;
> qp->req.noack_pkts = 0;
> qp->resp.msn = 0;
> qp->resp.opcode = -1;
> diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
> index ae5fbc79dd5c..770ae4279f73 100644
> --- a/drivers/infiniband/sw/rxe/rxe_req.c
> +++ b/drivers/infiniband/sw/rxe/rxe_req.c
> @@ -103,7 +103,8 @@ void rnr_nak_timer(struct timer_list *t)
> {
> struct rxe_qp *qp = from_timer(qp, t, rnr_nak_timer);
>
> - pr_debug("qp#%d rnr nak timer fired\n", qp_num(qp));
> + qp->req.need_retry = 1;
> + qp->req.need_rnr_timeout = 0;
> rxe_run_task(&qp->req.task, 1);
> }
>
> @@ -624,10 +625,11 @@ int rxe_requester(void *arg)
> qp->req.need_rd_atomic = 0;
> qp->req.wait_psn = 0;
> qp->req.need_retry = 0;
> + qp->req.need_rnr_timeout = 0;
> goto exit;
> }
>
> - if (unlikely(qp->req.need_retry)) {
> + if (unlikely(qp->req.need_retry && !qp->req.need_rnr_timeout)) {
> req_retry(qp);
> qp->req.need_retry = 0;
> }
> diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h
> index e7eff1ca75e9..ab3186478c3f 100644
> --- a/drivers/infiniband/sw/rxe/rxe_verbs.h
> +++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
> @@ -123,6 +123,7 @@ struct rxe_req_info {
> int need_rd_atomic;
> int wait_psn;
> int need_retry;
> + int need_rnr_timeout;
> int noack_pkts;
> struct rxe_task task;
> };
>
> base-commit: c5eb0a61238dd6faf37f58c9ce61c9980aaffd7a
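
For anyone following the thread, the flag interaction described in the commit message can be modeled in user space roughly as below. This is only a sketch under stated assumptions, not the driver code: struct req_flags, the on_*() helpers, requester_should_retry() and rnr_retry_demo() are hypothetical names made up for illustration. Only the two flag names and the gating condition come from the patch, and the retransmit path is simplified to "sets need_retry".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two flags the patch touches in
 * struct rxe_req_info; everything else in that struct is omitted. */
struct req_flags {
	int need_retry;
	int need_rnr_timeout;
};

/* Completer saw an RNR NAK: after the patch it no longer sets
 * need_retry, it only blocks retries until the RNR timer fires. */
static void on_rnr_nak(struct req_flags *req)
{
	req->need_rnr_timeout = 1;
}

/* Simplified model (assumption) of the retransmit path asking for a retry. */
static void on_retransmit_timeout(struct req_flags *req)
{
	req->need_retry = 1;
}

/* RNR NAK timer fired: request the retry and lift the block,
 * mirroring the patched rnr_nak_timer(). */
static void on_rnr_nak_timer(struct req_flags *req)
{
	req->need_retry = 1;
	req->need_rnr_timeout = 0;
}

/* One requester pass: returns true if a retry flow would start,
 * using the same condition as the patched rxe_requester(). */
static bool requester_should_retry(struct req_flags *req)
{
	if (req->need_retry && !req->need_rnr_timeout) {
		req->need_retry = 0;
		return true;
	}
	return false;
}

/* Walks the sequence from the commit message; returns 0 on success. */
int rnr_retry_demo(void)
{
	struct req_flags req = { 0, 0 };

	on_rnr_nak(&req);
	on_retransmit_timeout(&req);
	/* Retransmit timeout during the RNR wait must not trigger a retry. */
	assert(!requester_should_retry(&req));

	on_rnr_nak_timer(&req);
	/* Once the RNR timer fires, the retry goes ahead exactly once. */
	assert(requester_should_retry(&req));
	assert(!requester_should_retry(&req));
	return 0;
}
```

Calling rnr_retry_demo() from a small main() exercises the sequence. Without the need_rnr_timeout check in requester_should_retry(), the first assertion would fail, which corresponds to the early retry the patch is meant to prevent.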