Linux RDMA and InfiniBand development
From: Jason Gunthorpe <jgg@nvidia.com>
To: Bob Pearson <rpearsonhpe@gmail.com>
Cc: Yanjun Zhu <yanjun.zhu@linux.dev>,
	Zhu Yanjun <zyjzyj2000@gmail.com>,
	"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>
Subject: Re: bug report for rdma_rxe
Date: Mon, 25 Apr 2022 19:58:31 -0300
Message-ID: <20220425225831.GG2125828@nvidia.com>
In-Reply-To: <dfba7eb7-8467-59b5-2c2a-071ed1e4949f@gmail.com>

On Mon, Apr 25, 2022 at 11:58:55AM -0500, Bob Pearson wrote:
> On 4/24/22 19:04, Yanjun Zhu wrote:
> > On 2022/4/23 5:04, Bob Pearson wrote:
> >> Local operations in the rdma_rxe driver are not obviously idempotent. But the
> >> RC retry mechanism backs up the send queue to the point of the wqe that is
> >> currently being acknowledged and re-walks the sq. Each send or write operation is
> >> retried, except that the first one is truncated by the packets that have already
> >> been acknowledged. Each read and atomic operation is resent, except that
> >> read data already received in the first wqe is not requested again. But all the
> >> local operations are replayed. The problem is local invalidate, which is destructive.
> >> For example
> > 
> > Is there any example or just your analysis?
> 
> I have a colleague at HPE who is testing Lustre/o2iblnd/rxe. They are testing over a
> highly reliable network, so they do not expect to see dropped or out-of-order packets, but
> they see multiple timeout flows. When working on rping a week ago I also saw lots of
> timeouts. I verified that the timeout code in rxe behaves as follows: when a new RC
> operation is sent, the retry timer is set to go off at jiffies + qp->timeout_jiffies, but
> only if there is no timer currently pending. Once set, it is never cleared, so it will
> fire, typically a few msec later, initiating a retry flow. If IO operations are frequent then
> there will be a timeout every few msec (about 20 times a second for typical timeout values).
> o2iblnd uses fast reg MRs to write data to the target system, then local invalidate
> operations to invalidate the MR; it then increments the key portion of the rkey, resets
> the map, and does a reg MR operation. Retry flows cause the local invalidate and reg MR
> operations to be re-executed over and over again. A single retry can cause half a dozen
> invalidate operations to be run with various rkeys, and they mostly fail because they don't
> match the current MR. This results in Lustre crashing.
> 
> Currently I am actually happy that the unneeded retries are happening because it makes
> testing the retry code a lot easier. But eventually it would be good to clear or reset the
> timer after the operation is completed, which would greatly reduce the number of retries.
> It will also be important to figure out how the IBA intended local invalidates and reg MRs
> to work. As they are now, they cannot be successfully retried. Marking them as done
> and skipping them in the retry sequence does not work either. (It breaks some of the
> blktests test cases.)
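
The timer behavior described in the quoted text can be sketched as a small
userspace model. This is only an illustration of the description above, not
the rxe code: struct retry_timer and every function name here are
hypothetical, time is a plain counter rather than jiffies, and counter
wraparound is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a per-QP retry timer; not the rxe structures. */
struct retry_timer {
	bool pending;          /* timer is armed */
	unsigned long expires; /* time at which it fires */
};

/* As described: a new RC operation arms the timer only if none is pending. */
static void on_send(struct retry_timer *t, unsigned long now,
		    unsigned long timeout)
{
	assert(t != NULL);
	if (!t->pending) {
		t->pending = true;
		t->expires = now + timeout;
	}
}

/* As described: completing the operation leaves the timer armed. */
static void on_complete_current(struct retry_timer *t)
{
	(void)t;
}

/* Suggested change: clear the timer once nothing is outstanding. */
static void on_complete_cleared(struct retry_timer *t, unsigned int outstanding)
{
	if (outstanding == 0)
		t->pending = false;
}

/* Returns true if the timer fires at 'now', which starts a retry flow. */
static bool timer_tick(struct retry_timer *t, unsigned long now)
{
	if (t->pending && now >= t->expires) {
		t->pending = false;
		return true;
	}
	return false;
}
```

With on_complete_current() the timer still fires after the operation has
completed, which matches the spurious retry flows reported above; with
on_complete_cleared() and no outstanding operations it does not.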

Local operations on a QP are not supposed to need retry because they
are not supposed to go on the network. Backing up the sq past its
current position should therefore not re-execute any local operations
until the sq passes its actual head.

Or, stated differently, you have a head/tail pointer for local work
and a head/tail pointer for network work and the two track
independently within the defined ordering constraints.

Jason
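
One way to picture the two independently tracked positions is the userspace
sketch below. It is an assumption-laden illustration, not the rxe
implementation: struct send_queue, net_pos, local_done and both functions are
hypothetical names, and the ordering constraints mentioned above (for example
fencing) are not modeled. Note also that the quoted text reports that simply
marking local operations done and skipping them broke some blktests cases, so
this shows only the bookkeeping idea.

```c
#include <assert.h>
#include <stddef.h>

enum wqe_kind { WQE_NETWORK, WQE_LOCAL };

/* Hypothetical work queue element; exec_count is only for illustration. */
struct wqe {
	enum wqe_kind kind;
	int exec_count; /* times this WQE was sent or executed */
};

struct send_queue {
	struct wqe *wqes;
	size_t count;
	size_t net_pos;    /* next WQE the requester will process */
	size_t local_done; /* high-water mark: local WQEs below it have run */
};

/* Walk the queue from net_pos; replay network work, never local work. */
static void sq_process(struct send_queue *sq)
{
	assert(sq->net_pos <= sq->count);
	for (; sq->net_pos < sq->count; sq->net_pos++) {
		struct wqe *w = &sq->wqes[sq->net_pos];

		/* Local work below the high-water mark already ran: skip. */
		if (w->kind == WQE_LOCAL && sq->net_pos < sq->local_done)
			continue;

		w->exec_count++;

		/* Advance the mark only when passing the previous head. */
		if (sq->net_pos >= sq->local_done)
			sq->local_done = sq->net_pos + 1;
	}
}

/* A retry rewinds only the network position, never local_done. */
static void sq_retry_from(struct send_queue *sq, size_t first_unacked)
{
	assert(first_unacked <= sq->net_pos);
	sq->net_pos = first_unacked;
}
```

After sq_process(), sq_retry_from(sq, 0) and a second sq_process() on a queue
of network, local, network WQEs, the two network WQEs have been processed
twice and the local WQE once.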

