From: Yanjun Zhu <yanjun.zhu@linux.dev>
To: Bob Pearson <rpearsonhpe@gmail.com>,
jgg@nvidia.com, zyjzyj2000@gmail.com, linux-rdma@vger.kernel.org
Subject: Re: [PATCH for-rc] RDMA/rxe: Fix rnr retry behavior
Date: Fri, 13 May 2022 21:38:52 +0800
Message-ID: <36f6e476-9762-6d39-e167-abb8dcc9f2bb@linux.dev>
In-Reply-To: <e587f531-0650-1548-1fe0-04d0152a5082@linux.dev>
On 2022/5/13 10:40, Yanjun Zhu wrote:
> On 2022/5/13 3:49, Bob Pearson wrote:
>> Currently the completer tasklet when it sets the retransmit timer or the
>> nak timer sets the same flag (qp->req.need_retry) so that if either
>> timer fires it will attempt to perform a retry flow on the send queue.
>> This has the effect of responding to an RNR NAK at the first retransmit
>> timer event which does not allow for the requested rnr timeout.
>>
>> This patch adds a new flag (qp->req.need_rnr_timeout) which, if set,
>> prevents a retry flow until the rnr nak timer fires.
>>
>> This patch fixes rnr retry errors which can be observed by running the
>> pyverbs test suite 50-100X. With this patch applied they do not occur.
>
> Do you mean that running run_tests.py 50-100 times can reproduce this
> bug? I want to reproduce this problem.
After running run_tests.py 50-100 times, I cannot reproduce this problem.
Zhu Yanjun
>
> Thanks a lot.
> Zhu Yanjun
>
>>
>> Link:
>> https://lore.kernel.org/linux-rdma/a8287823-1408-4273-bc22-99a0678db640@gmail.com/
>>
>> Fixes: 8700e3e7c485 ("Soft RoCE (RXE) - The software RoCE driver")
>> Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
>> ---
>> drivers/infiniband/sw/rxe/rxe_comp.c | 4 +---
>> drivers/infiniband/sw/rxe/rxe_qp.c | 1 +
>> drivers/infiniband/sw/rxe/rxe_req.c | 6 ++++--
>> drivers/infiniband/sw/rxe/rxe_verbs.h | 1 +
>> 4 files changed, 7 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c
>> b/drivers/infiniband/sw/rxe/rxe_comp.c
>> index 138b3e7d3a5f..bc668cb211b1 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_comp.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_comp.c
>> @@ -733,9 +733,7 @@ int rxe_completer(void *arg)
>> if (qp->comp.rnr_retry != 7)
>> qp->comp.rnr_retry--;
>> - qp->req.need_retry = 1;
>> - pr_debug("qp#%d set rnr nak timer\n",
>> - qp_num(qp));
>> + qp->req.need_rnr_timeout = 1;
>> mod_timer(&qp->rnr_nak_timer,
>> jiffies + rnrnak_jiffies(aeth_syn(pkt)
>> & ~AETH_TYPE_MASK));
>> diff --git a/drivers/infiniband/sw/rxe/rxe_qp.c
>> b/drivers/infiniband/sw/rxe/rxe_qp.c
>> index 62acf890af6c..1c962468714e 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_qp.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_qp.c
>> @@ -513,6 +513,7 @@ static void rxe_qp_reset(struct rxe_qp *qp)
>> atomic_set(&qp->ssn, 0);
>> qp->req.opcode = -1;
>> qp->req.need_retry = 0;
>> + qp->req.need_rnr_timeout = 0;
>> qp->req.noack_pkts = 0;
>> qp->resp.msn = 0;
>> qp->resp.opcode = -1;
>> diff --git a/drivers/infiniband/sw/rxe/rxe_req.c
>> b/drivers/infiniband/sw/rxe/rxe_req.c
>> index ae5fbc79dd5c..770ae4279f73 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_req.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_req.c
>> @@ -103,7 +103,8 @@ void rnr_nak_timer(struct timer_list *t)
>> {
>> struct rxe_qp *qp = from_timer(qp, t, rnr_nak_timer);
>> - pr_debug("qp#%d rnr nak timer fired\n", qp_num(qp));
>> + qp->req.need_retry = 1;
>> + qp->req.need_rnr_timeout = 0;
>> rxe_run_task(&qp->req.task, 1);
>> }
>> @@ -624,10 +625,11 @@ int rxe_requester(void *arg)
>> qp->req.need_rd_atomic = 0;
>> qp->req.wait_psn = 0;
>> qp->req.need_retry = 0;
>> + qp->req.need_rnr_timeout = 0;
>> goto exit;
>> }
>> - if (unlikely(qp->req.need_retry)) {
>> + if (unlikely(qp->req.need_retry && !qp->req.need_rnr_timeout)) {
>> req_retry(qp);
>> qp->req.need_retry = 0;
>> }
>> diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h
>> b/drivers/infiniband/sw/rxe/rxe_verbs.h
>> index e7eff1ca75e9..ab3186478c3f 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_verbs.h
>> +++ b/drivers/infiniband/sw/rxe/rxe_verbs.h
>> @@ -123,6 +123,7 @@ struct rxe_req_info {
>> int need_rd_atomic;
>> int wait_psn;
>> int need_retry;
>> + int need_rnr_timeout;
>> int noack_pkts;
>> struct rxe_task task;
>> };
>>
>> base-commit: c5eb0a61238dd6faf37f58c9ce61c9980aaffd7a
>
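
For readers following the quoted patch, the gating it describes can be sketched as a small user-space model. This is only an illustration under stated assumptions: struct req_model and every function name below are hypothetical stand-ins, not the rxe driver's types or entry points, and the timers are reduced to plain function calls. It shows the one behavior the commit message claims: a retry requested by the retransmit path is held back while an RNR wait is pending, and runs only after the RNR NAK timer has fired.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two requester flags; not the kernel struct. */
struct req_model {
	bool need_retry;       /* a retry flow has been requested */
	bool need_rnr_timeout; /* an RNR NAK wait is still pending */
	int retries_run;       /* number of retry flows actually run */
};

/* Completer saw an RNR NAK: with the patch it only arms the RNR wait. */
static void on_rnr_nak(struct req_model *r)
{
	r->need_rnr_timeout = true;
}

/* Retransmit path (simplified): requests a retry flow. */
static void on_retransmit_timer(struct req_model *r)
{
	r->need_retry = true;
}

/* RNR NAK timer fired: the wait is over, so request the retry. */
static void on_rnr_nak_timer(struct req_model *r)
{
	r->need_retry = true;
	r->need_rnr_timeout = false;
}

/* One requester pass: run the retry flow only when it is not gated. */
static bool requester_pass(struct req_model *r)
{
	if (r->need_retry && !r->need_rnr_timeout) {
		r->retries_run++;
		r->need_retry = false;
		return true;
	}
	return false;
}

/* Walk through the scenario from the commit message. */
static void demo_rnr_then_retransmit(void)
{
	struct req_model r = { false, false, 0 };

	on_rnr_nak(&r);              /* responder asked us to wait */
	on_retransmit_timer(&r);     /* retransmit timer fires first */
	assert(!requester_pass(&r)); /* retry is held back */

	on_rnr_nak_timer(&r);        /* requested RNR timeout elapsed */
	assert(requester_pass(&r));  /* now the retry flow runs */
	assert(r.retries_run == 1);
}
```

Calling demo_rnr_then_retransmit() from a test main exercises the sequence. In this model, dropping the need_rnr_timeout check from requester_pass() reproduces the early retry that the patch is meant to prevent.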
Thread overview: 12+ messages
2022-05-12 19:49 [PATCH for-rc] RDMA/rxe: Fix rnr retry behavior Bob Pearson
2022-05-13 2:40 ` Yanjun Zhu
2022-05-13 13:38 ` Yanjun Zhu [this message]
2022-05-13 15:42 ` Bob Pearson
2022-05-13 18:22 ` Bob Pearson
2022-05-14 2:30 ` Yanjun Zhu
2022-05-14 3:25 ` Bob Pearson
2022-05-13 13:04 ` Tom Talpey
2022-05-13 15:33 ` Bob Pearson
2022-05-13 17:43 ` Tom Talpey
2022-05-13 18:32 ` Bob Pearson
2022-05-13 19:08 ` Tom Talpey