From: Jason Gunthorpe <jgg@nvidia.com>
To: Bob Pearson <rpearsonhpe@gmail.com>
Cc: Yanjun Zhu <yanjun.zhu@linux.dev>,
Zhu Yanjun <zyjzyj2000@gmail.com>,
"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>
Subject: Re: bug report for rdma_rxe
Date: Mon, 25 Apr 2022 19:58:31 -0300 [thread overview]
Message-ID: <20220425225831.GG2125828@nvidia.com> (raw)
In-Reply-To: <dfba7eb7-8467-59b5-2c2a-071ed1e4949f@gmail.com>
On Mon, Apr 25, 2022 at 11:58:55AM -0500, Bob Pearson wrote:
> On 4/24/22 19:04, Yanjun Zhu wrote:
> > 在 2022/4/23 5:04, Bob Pearson 写道:
> >> Local operations in the rdma_rxe driver are not obviously idempotent. But, the
> >> RC retry mechanism backs up the send queue to the point of the wqe that is
> >> currently being acknowledged and re-walks the sq. Each send or write operation is
> >> retried with the exception that the first one is truncated by the packets already
> >> having been acknowledged. Each read and atomic operation is resent except that
> >> read data already received in the first wqe is not requested. But all the
> >> local operations are replayed. The problem is local invalidate which is destructive.
> >> For example
> >
> > Is there any example or just your analysis?
>
> I have a colleague at HPE who is testing Lustre/o2iblnd/rxe. They are testing over a
> highly reliable network so do not expect to see dropped or out of order packets. But they
> see multiple timeout flows. When working on rping a week ago I also saw lots of timeouts
> and verified that the timeout code in rxe has the behavior that when a new RC operation is
> sent the retry timer is modified to go off at jiffies + qp->timeout_jiffies but only if
> there is not a currently pending timer. Once set it is never cleared so it will fire
> typically a few msec later initiating a retry flow. If IO operations are frequent then
> there will be a timeout every few msec (about 20 times a second for typical timeout values.)
> o2iblnd uses fast reg MRs to write data to the target system and then local invalidate
> operations to invalidate the MR and then increments the key portion of the rkey and resets
> the map and then does a reg mr operation. Retry flows cause the local invalidate and reg MR
> operations to be re-executed over and over again. A single retry can cause a half a dozen
> invalidate operations to be run with various rkeys and they mostly fail because they don't
> match the current MR. This results in Lustre crashing.
>
> Currently I am actually happy that the unneeded retries are happening because it makes
> testing the retry code a lot easier. But eventually it would be good to clear or reset the timer
> after the operation is completed which would greatly reduce the number of retries. Also
> it will be important to figure out how the IBA intended for local invalidates and reg MRs to
> work. The way they are now they cannot be successfully retried. Also marking them as done
> and skipping them in the retry sequence does not work. (It breaks some of the blktests test
> cases.)
local operations on a QP are not supposed to need retry because they
are not supposed to go on the network, so backing up the sq past its
current position should not re-execute any local operations until the
sq passes its actual head.
Or, stated differently, you have a head/tail pointer for local work
and a head/tail pointer for network work and the two track
independently within the defined ordering constraints.
Jason
next prev parent reply other threads:[~2022-04-25 22:58 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-04-22 21:04 bug report for rdma_rxe Bob Pearson
2022-04-23 1:54 ` Bob Pearson
2022-04-25 0:04 ` Yanjun Zhu
2022-04-25 16:58 ` Bob Pearson
2022-04-25 22:58 ` Jason Gunthorpe [this message]
2022-04-26 1:40 ` Bob Pearson
2022-04-26 11:42 ` Jason Gunthorpe
2022-04-28 13:31 ` Bob Pearson
2022-04-28 14:29 ` Jason Gunthorpe
2022-04-26 3:10 ` Zhu Yanjun
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220425225831.GG2125828@nvidia.com \
--to=jgg@nvidia.com \
--cc=linux-rdma@vger.kernel.org \
--cc=rpearsonhpe@gmail.com \
--cc=yanjun.zhu@linux.dev \
--cc=zyjzyj2000@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.