From: jgunthorpe@obsidianresearch.com (Jason Gunthorpe)
Subject: Unexpected issues with 2 NVME initiators using the same target
Date: Tue, 20 Jun 2017 15:19:58 -0600
Message-ID: <20170620211958.GA5574@obsidianresearch.com>
In-Reply-To: <C14B071E-F1B2-466A-82CF-4E20BFAD9DC1@oracle.com>

On Tue, Jun 20, 2017 at 04:56:39PM -0400, Chuck Lever wrote:
> > I thought the use of MR's with SEND was a new invention? If you use
> > the local rdma lkey with send, it is never invalidated, and this is
> > not an issue, which IIRC, was the historical configuration for NFS.
>
> We may be conflating things a bit.
>
> RPC-over-RDMA client uses persistently registered buffers, using
> the lkey, for inline data. The use of MRs is reserved for NFS READ
> and WRITE payloads. The inline buffers are never explicitly
> invalidated by RPC-over-RDMA.

That makes much more sense, but is that the original question in this
thread? Why are we even talking about invalidate ordering then?

> > All ULPs must ensure SEND/RDMA Write resources remain stable until the
> > CQ indicates that work is completed. 'In a perfect world' this
> > includes not changing the source memory as that would cause
> > retransmitted packets to be different.
>
> I assume you mean the sending side (the server) for RDMA
> Write. I believe rdma_rw uses the local rdma lkey by default
> for RDMA Write source buffers.

RDMA Write or SEND

> >>> No. The SQ side is asynchronous to the CQ side, the HCA will pipeline
> >>> send packets on the wire up to some internal limit.
> >>
> >> So if my ULP issues FastReg followed by Send followed by
> >> LocalInv (signaled), I can't rely on the LocalInv completion
> >> to imply that the Send is also complete?
> >
> > Correct.
> >
> > This is explicitly defined in Table 79 of the IBA.
> >
> > It describes the ordering requirements, if you order Send followed by
> > LocalInv the ordering is 'L' which means they are not ordered unless
> > the WR has the Local Invalidate Fence bit set.
> >
> > LIF is an optional feature, I do not know if any of our hardware
> > supports it, but it is defined to cause the local invalidate to wait
> > until all ongoing references to the MR are completed.
>
> Now, since there was confusion about using an MR for a
> Send operation, let me clarify. If the client does:
> FastReg(payload buffer)
> Send(inline buffer)
> ...
> Recv
> LocalInv(payload buffer)
> wait for LI completion

Not sure what you are describing?

Is Recv landing memory for a SEND? In that case it is using an lkey,
lkeys are not remotely usable, so it does not need synchronous
invalidation. In all cases the LocalInv must only be posted once a CQE
for the Recv is observed.

If Recv is RDMA WRITE target memory, then it is using the rkey and it
does need synchronous invalidation. This must be done once a recv
CQE is observed, or optimized by having the other side send via one of
the _INV operations.

In no case can you pipeline a LocalInv into the SQ that would impact
RQ activity, even with any of the fences.

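To make the rule above concrete, here is a minimal sketch of the decision a
ULP has to make for a payload MR. It is a toy model, not verbs code: the
names PayloadMR, on_recv_cqe and next_step are hypothetical, and the
invalidated_rkey argument stands in for whatever the completion reports when
the peer used one of the _INV operations.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PayloadMR:
    """Hypothetical per-request state for one registered payload buffer."""
    rkey: int
    recv_cqe_seen: bool = False
    remotely_invalidated: bool = False


def on_recv_cqe(mr: PayloadMR, invalidated_rkey: Optional[int] = None) -> None:
    """Record the Recv completion and whether the peer invalidated our rkey."""
    mr.recv_cqe_seen = True
    if invalidated_rkey == mr.rkey:
        mr.remotely_invalidated = True


def next_step(mr: PayloadMR) -> str:
    """Decide what the ULP may do with the MR right now."""
    if not mr.recv_cqe_seen:
        # Posting LocalInv here would be pipelining it ahead of RQ activity,
        # which no fence can make safe.
        return "wait_for_recv_cqe"
    if mr.remotely_invalidated:
        # The peer used an _INV operation, so no LocalInv is needed.
        return "reuse_mr"
    # Post LocalInv only now, and wait for its completion before reuse.
    return "post_local_inv_and_wait"
```

The point of the model is only that the LocalInv decision is driven from the
Recv completion, never queued on the SQ in advance.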
> Is setting IB_SEND_FENCE on the LocalInv enough to ensure
> that the Send is complete?

No.

There are two fences in the spec, IB_SEND_FENCE is the mandatory one,
and it only interacts with RDMA READ and ATOMIC entries.

Local Invalidate Fence (the optional one) also will not order the two
because LIF is only defined to order against SQE's that use the
MR. Since Send is using the global dma lkey it does not interact with
the LocalInv and LIF will not order them.

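The two fence rules as I have stated them here can be written down as a
small predicate. This is a sketch of the reasoning in this message only, not
a transcription of the spec's ordering table; Op and local_inv_ordered_after
are made-up names, and prior_uses_mr means "the earlier SQE references the MR
being invalidated".

```python
from enum import Enum, auto


class Op(Enum):
    SEND = auto()
    RDMA_WRITE = auto()
    RDMA_READ = auto()
    ATOMIC = auto()


def local_inv_ordered_after(prior_op: Op, prior_uses_mr: bool,
                            send_fence: bool = False,
                            local_inv_fence: bool = False) -> bool:
    """True if a LocalInv is ordered after an earlier SQE on the same SQ."""
    # IB_SEND_FENCE only interacts with earlier RDMA READ and ATOMIC entries.
    if send_fence and prior_op in (Op.RDMA_READ, Op.ATOMIC):
        return True
    # LIF only orders against earlier SQEs that use the MR being invalidated.
    if local_inv_fence and prior_uses_mr:
        return True
    # Otherwise the two are not ordered; wait for the earlier CQE instead.
    return False
```

For the case in question, a Send from the global dma lkey followed by
LocalInv, the predicate is false for every combination of the two fences, so
the only safe option is to wait for the Send's own completion.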
> > No idea on the relative performance of LIF vs doing it manually, but
> > the need for one or the other is unambiguously clear in the spec.
>
> It seems to me that the guarantee that the server sees
> only one copy of the Send payload is good enough. That
> means that by the time Recv completion occurs on the
> client, even if the client HCA still thinks it needs to
> retransmit the Send containing the RPC Call, the server
> ULP has already seen and processed that Send payload,
> and the HCA on the server won't deliver that payload a
> second time.

Yes, that is OK reasoning.

> If the only concern about preserving that inline buffer is
> guaranteeing that retransmits contain the same content, I
> don't think we have a problem. All HCA retransmits of an
> RPC Call, until the matching RPC Reply is received on the
> client, will contain the same content.

Right.

> The issue about the HCA not being able to access the inline
> buffer during a retransmit is also not an issue for RPC-
> over-RDMA because these buffers are always registered with
> the local rdma lkey.

Exactly.

Jason