* RFC: RPC/RDMA memory invalidation
From: Chuck Lever @ 2015-10-28 19:56 UTC
To: Linux RDMA Mailing List
RPC/RDMA is moving towards a model where R_keys are invalidated
as part of reply handling (either the client does it in the
reply handler, or the server does it via Send With Invalidate).
This fences the RPC's memory from the server before the RPC
consumer is awoken and can access it.
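For illustration, a minimal sketch of the two reply-time paths using
the kernel verbs API of that era (the helper names and the completion
plumbing are assumptions, not the actual xprtrdma or server code):

#include <rdma/ib_verbs.h>

/*
 * Client-side path: explicitly invalidate the R_key in the reply
 * handler before the RPC consumer is awoken.  frwr_post_local_inv()
 * is a hypothetical helper; real code must also wait for the
 * LOCAL_INV completion before touching the memory.
 */
static int frwr_post_local_inv(struct ib_qp *qp, u32 rkey)
{
	struct ib_send_wr inv_wr = { }, *bad_wr;

	inv_wr.opcode = IB_WR_LOCAL_INV;
	inv_wr.send_flags = IB_SEND_SIGNALED;
	inv_wr.ex.invalidate_rkey = rkey;

	return ib_post_send(qp, &inv_wr, &bad_wr);
}

/*
 * Server-side alternative: piggy-back the invalidation on the reply
 * Send, so the client's HCA invalidates the R_key before the reply
 * is delivered.  svc_post_send_with_inv() is likewise hypothetical.
 */
static int svc_post_send_with_inv(struct ib_qp *qp, struct ib_sge *sge,
				  u32 client_rkey)
{
	struct ib_send_wr send_wr = { }, *bad_wr;

	send_wr.opcode = IB_WR_SEND_WITH_INV;
	send_wr.sg_list = sge;
	send_wr.num_sge = 1;
	send_wr.send_flags = IB_SEND_SIGNALED;
	send_wr.ex.invalidate_rkey = client_rkey;

	return ib_post_send(qp, &send_wr, &bad_wr);
}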
There are some cases where no reply occurs, however.
- A signal such as ^C or a software fault
- A soft timeout
- A local RPC client error
- A GSS credential problem
The safest thing to do is to ensure that memory is completely
fenced (invalidated and DMA unmapped) before allowing such an
abnormally terminated RPC to exit and its memory to be re-used.
Unfortunately, in the current kernel RPC client implementation
there is no place where an RPC can park itself, after it is awoken
by means other than a reply, to wait for R_key invalidation
to complete. Even if invalidation is started asynchronously
as the RPC exits, that opens a window in which the RPC can
complete and exit while its memory is still registered and
exposed for a short period.
One way to handle this rare situation is to ensure that such
an error exit always results in a connection loss if the
request has registered memory. Any registered memory would be
invalidated by the loss and fenced from the server; only a
DMA unmap, which never sleeps, would be needed before the RPC
exits.
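If connection loss alone were enough to fence the memory, the
abnormal-exit path could reduce to something like the sketch below
(function and parameter names are hypothetical; only
ib_dma_unmap_sg() is the real API, and it never sleeps):

#include <rdma/ib_verbs.h>

/*
 * Hypothetical abnormal-exit path, assuming the connection has
 * already been torn down and the loss alone fenced the MR from
 * the server.  Only the local DMA unmap remains, and it is safe
 * in a context that cannot sleep.
 */
static void frwr_unmap_on_error_exit(struct ib_device *device,
				     struct scatterlist *sg, int nents,
				     enum dma_data_direction dir)
{
	ib_dma_unmap_sg(device, sg, nents, dir);
}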
Knocking the QP out of RTS may seem drastic, but I believe in
some of these cases the connection may already be gone.
A key question is whether connection loss guarantees that the
server is fenced, for all device types, from existing
registered MRs. After reconnect, each MR must be registered
again before it can be accessed remotely. Is this true for the
Linux IB core, and all kernel providers, when using FRWR?
After a connection loss, the Linux kernel RPC/RDMA client
creates a new QP as it reconnects, thus I’d expect the QPN to
be different on the new connection. That should be enough to
prevent access to MRs that were registered with the previous
QP and PD, right?
I ask here because I am often surprised by the subtlety of
standards language. ;-)
--
Chuck Lever
* Re: RFC: RPC/RDMA memory invalidation
From: Jason Gunthorpe @ 2015-10-28 20:10 UTC
To: Chuck Lever; +Cc: Linux RDMA Mailing List
On Wed, Oct 28, 2015 at 03:56:08PM -0400, Chuck Lever wrote:
> A key question is whether connection loss guarantees that the
> server is fenced, for all device types, from existing
> registered MRs. After reconnect, each MR must be registered
> again before it can be accessed remotely. Is this true for the
> Linux IB core, and all kernel providers, when using FRWR?
MR validation is not linked to a QP in any way. The memory is not
fully fenced until the invalidate completes, or the MR unregister
completes. Nothing else is good enough.
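In code terms that means posting the LOCAL_INV signaled and sleeping
until its completion is reaped, roughly as sketched below (the struct,
helper name, and CQ-handler wiring are assumptions, not existing
xprtrdma code):

#include <linux/completion.h>
#include <rdma/ib_verbs.h>

/*
 * Hypothetical synchronous fence: post a signaled LOCAL_INV and
 * sleep until its completion is reaped.  Only after this returns
 * is the memory truly fenced from the peer.  The send CQ handler
 * that looks up the context from wr_id and calls complete() is
 * omitted here.
 */
struct inv_ctx {
	struct completion	done;
};

static int fence_mr_sync(struct ib_qp *qp, u32 rkey, struct inv_ctx *ctx)
{
	struct ib_send_wr inv_wr = { }, *bad_wr;
	int rc;

	init_completion(&ctx->done);

	inv_wr.opcode = IB_WR_LOCAL_INV;
	inv_wr.send_flags = IB_SEND_SIGNALED;
	inv_wr.ex.invalidate_rkey = rkey;
	inv_wr.wr_id = (u64)(uintptr_t)ctx;	/* matched in the CQ handler */

	rc = ib_post_send(qp, &inv_wr, &bad_wr);
	if (rc)
		return rc;

	/* Block until the LOCAL_INV completion has been seen */
	wait_for_completion(&ctx->done);
	return 0;
}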
> After a connection loss, the Linux kernel RPC/RDMA client
> creates a new QP as it reconnects, thus I’d expect the QPN to
> be different on the new connection. That should be enough to
> prevent access to MRs that were registered with the previous
> QP and PD, right?
No, the NFS implementation creates a single PD for everything and any
QP in the PD can access all the MRs. This is another security issue of
a different sort.
If there was one PD per QP then the above would be true, since the MR
is linked to the PD.
Even so, moving a QP out of RTR is not a synchronous operation, and
until the CQ is drained, the disposition of ongoing RDMA is not
defined.
Basically: You can't avoid actually doing a blocking invalidate
operation. The core layer must allow for this if it is going to async
cancel RPCs.
FWIW, the same is true on the send side too: if the RPC had send
buffers and gets canceled, you have to block until the completion
for that Send is seen.
Jason
* Re: RFC: RPC/RDMA memory invalidation
From: Chuck Lever @ 2015-10-28 21:30 UTC
To: Jason Gunthorpe; +Cc: Linux RDMA Mailing List
> On Oct 28, 2015, at 4:10 PM, Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote:
>
> On Wed, Oct 28, 2015 at 03:56:08PM -0400, Chuck Lever wrote:
>
>> A key question is whether connection loss guarantees that the
>> server is fenced, for all device types, from existing
>> registered MRs. After reconnect, each MR must be registered
>> again before it can be accessed remotely. Is this true for the
>> Linux IB core, and all kernel providers, when using FRWR?
>
> MR validation is not linked to a QP in any way. The memory is not
> fully fenced until the invalidate completes, or the MR unregister
> completes. Nothing else is good enough.
IBTA spec states:
> MW access operations (i.e. RDMA Write, RDMA Reads, and Atomics)
> are only allowed if the Type 2B MW is in the Valid state and the
> QP Number (QPN) and PD of the QP performing the MW access operation
> matches the QPN and PD associated with the Bound Type 2B MW.
Once the QP is out of RTS, there can be no incoming RDMA
requests that match the R_key, QPN, PD tuple. I think you
are saying that the QP state change has the same problem
as not waiting for an invalidation to complete.
>> After a connection loss, the Linux kernel RPC/RDMA client
>> creates a new QP as it reconnects, thus I’d expect the QPN to
>> be different on the new connection. That should be enough to
>> prevent access to MRs that were registered with the previous
>> QP and PD, right?
>
> No, the NFS implementation creates a single PD for everything and any
> QP in the PD can access all the MRs. This is another security issue of
> a different sort.
I’m speaking only of the client at the moment.
> If there was one PD per QP then the above would be true, since the MR
> is linked to the PD.
There is a per-connection struct rpcrdma_ia that contains
both a PD and a QP. Therefore there is one PD and only one
QP (on the client) per connection.
Transport reconnect replaces the QP, but not the PD. See
rpcrdma_ep_connect().
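Roughly, the per-connection setup looks like the sketch below (heavily
simplified; the real rpcrdma_ep_connect() does much more, and the
helper shown here is hypothetical; ib_alloc_pd() takes a flags
argument on newer kernels only):

#include <linux/err.h>
#include <rdma/ib_verbs.h>
#include <rdma/rdma_cm.h>

/*
 * One PD and one QP per rdma_cm_id, so MRs registered against this
 * PD are reachable only through this connection's QP.  Assumes an
 * RC cm_id (RDMA_PS_TCP); queue sizes are arbitrary examples.
 */
static int setup_connection(struct rdma_cm_id *id, struct ib_cq *send_cq,
			    struct ib_cq *recv_cq, struct ib_pd **pd_out)
{
	struct ib_qp_init_attr attr = { };
	struct ib_pd *pd;
	int rc;

	pd = ib_alloc_pd(id->device, 0);
	if (IS_ERR(pd))
		return PTR_ERR(pd);

	attr.qp_type = IB_QPT_RC;
	attr.send_cq = send_cq;
	attr.recv_cq = recv_cq;
	attr.cap.max_send_wr = 128;
	attr.cap.max_recv_wr = 128;
	attr.cap.max_send_sge = 2;
	attr.cap.max_recv_sge = 1;

	rc = rdma_create_qp(id, pd, &attr);
	if (rc) {
		ib_dealloc_pd(pd);
		return rc;
	}

	*pd_out = pd;
	return 0;
}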
> Even so, moving a QP out of RTR is not a synchronous operation, and
> until the CQ is drained, the disposition of ongoing RDMA is not
> defined.
>
> Basically: You can't avoid actually doing a blocking invalidate
> operation. The core layer must allow for this if it is going to async
> cancel RPCs.
Disappointing, but understood.
> FWIW, the same is true on the send side too: if the RPC had send
> buffers and gets canceled, you have to block until the completion
> for that Send is seen.
By “you have to block” you mean the send buffer cannot be reused
until the Send WR is known to have completed, and new Send WRs
cannot be posted until it is known that enough send queue resources
are available.
The connection recovery logic in rpcrdma_ep_connect should flush
pending CQs. New RPCs are blocked until a new connection is
established, although I’m not certain we are careful to ensure
the hardware has truly relinquished the send buffer before it is
made available for re-use. A known issue.
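A sketch of what being careful would look like on the send side,
under the assumption of a simple semaphore sized to the send queue
depth (not the actual xprtrdma accounting):

#include <linux/semaphore.h>
#include <rdma/ib_verbs.h>

/*
 * Hypothetical send-queue accounting: a sender must acquire a slot
 * before posting, and the slot (and thus the send buffer) is
 * released only from the send CQ handler, after the completion has
 * been reaped.  Initialize with sema_init(&sq->slots, sq_depth).
 */
struct send_queue {
	struct semaphore	slots;
};

static int post_one_send(struct send_queue *sq, struct ib_qp *qp,
			 struct ib_send_wr *wr)
{
	struct ib_send_wr *bad_wr;
	int rc;

	/* Blocks until a send queue slot is known to be free */
	rc = down_interruptible(&sq->slots);
	if (rc)
		return rc;

	wr->send_flags |= IB_SEND_SIGNALED;
	rc = ib_post_send(qp, wr, &bad_wr);
	if (rc)
		up(&sq->slots);
	return rc;
}

/* Called from the send CQ handler once the Send has completed */
static void send_done(struct send_queue *sq)
{
	/* Only now may the send buffer be reused or freed */
	up(&sq->slots);
}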
--
Chuck Lever
* Re: RFC: RPC/RDMA memory invalidation
From: Jason Gunthorpe @ 2015-10-28 21:51 UTC
To: Chuck Lever; +Cc: Linux RDMA Mailing List
On Wed, Oct 28, 2015 at 05:30:17PM -0400, Chuck Lever wrote:
> IBTA spec states:
>
> > MW access operations (i.e. RDMA Write, RDMA Reads, and Atomics)
> > are only allowed if the Type 2B MW is in the Valid state and the
> > QP Number (QPN) and PD of the QP performing the MW access operation
> > matches the QPN and PD associated with the Bound Type 2B MW.
>
> Once the QP is out of RTS, there can be no incoming RDMA
> requests that match the R_key, QPN, PD tuple. I think you
> are saying that the QP state change has the same problem
> as not waiting for an invalidation to complete.
An MW (Memory Window) is something different from an MR.
MRs do not match on the QPN.
> > If there was one PD per QP then the above would be true, since the MR
> > is linked to the PD.
>
> There is a per-connection struct rpcrdma_ia that contains
> both a PD and a QP. Therefore there is one PD and only one
> QP (on the client) per connection.
Oh, that is great then
> > FWIW, the same is true on the send side too: if the RPC had send
> > buffers and gets canceled, you have to block until the completion
> > for that Send is seen.
>
> By “you have to block” you mean the send buffer cannot be reused
> until the Send WR is known to have completed, and new Send WRs
> cannot be posted until it is known that enough send queue resources
> are available.
Yes
> I’m not certain we are careful to ensure
> the hardware has truly relinquished the send buffer before it is
> made available for re-use. A known issue.
This is the issue I was thinking of, yes. Ideally the CPU would not
touch the send buffer until the HW is done with it, in any
situation. This is less serious than having a rogue writable R_Key,
however.
Jason