From: Chuck Lever III <chuck.lever@oracle.com>
To: Neil Brown <neilb@suse.de>
Cc: Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
"rostedt@goodmis.org" <rostedt@goodmis.org>
Subject: Re: [PATCH v3 1/2] SUNRPC: Fix NFSD's request deferral on RDMA transports
Date: Mon, 8 May 2023 04:10:16 +0000 [thread overview]
Message-ID: <4CC536D3-10A2-46EB-B12D-231CED2EBBA5@oracle.com> (raw)
In-Reply-To: <168350522604.9647.16917749332832150697@noble.neil.brown.name>
> On May 7, 2023, at 5:20 PM, NeilBrown <neilb@suse.de> wrote:
>
> On Fri, 08 Apr 2022, Chuck Lever wrote:
>> Trond Myklebust reports an NFSD crash in svc_rdma_sendto(). Further
>> investigation shows that the crash occurred while NFSD was handling
>> a deferred request.
>>
>> This patch addresses two inter-related issues that prevent request
>> deferral from working correctly for RPC/RDMA requests:
>>
>> 1. Prevent the crash by ensuring that the original
>> svc_rqst::rq_xprt_ctxt value is available when the request is
>> revisited. Otherwise svc_rdma_sendto() does not have a Receive
>> context available with which to construct its reply.
>>
>> 2. Possibly since before commit 71641d99ce03 ("svcrdma: Properly
>> compute .len and .buflen for received RPC Calls"),
>> svc_rdma_recvfrom() did not include the transport header in the
>> returned xdr_buf. There should have been no need for svc_defer()
>> and friends to save and restore that header, as of that commit.
>> This issue is addressed in a backport-friendly way by simply
>> having svc_rdma_recvfrom() set rq_xprt_hlen to zero
>> unconditionally, just as svc_tcp_recvfrom() does. This enables
>> svc_deferred_recv() to correctly reconstruct an RPC message
>> received via RPC/RDMA.
>
> I'm a bit late to this party but .... this patch is bad and I don't know
> how best to fix it.
> It is bad for two reasons.
> 1/ It can leak memory. When a deferral happens, the context is moved
> into an svc_deferred_req. There are a couple of places which assume
> that an svc_deferred_req can be freed with kfree(), such as
> svc_delete_xprt() and svc_revisit(). These will now leak the
> context. This is the bit that is hard to fix.
>
> 2/ It can cause a double-free with UDP (and maybe rdma).
> When a request is deferred, the ctxt is moved to the dreq.
> When that request is revisited the ctxt is copied back into the rqst.
> If the rqst is again deferred then the old dreq is reused and,
> importantly, the rq_xprt_ctxt is not cleared. So when the release
> function is called the ctxt is freed.
> When the request is revisited a second time that ctxt (now pointing
> to freed and possibly reused memory) is copied back into the rqst.
> When the request completes the ctxt is again freed - double free
> oops.
>
> For UDP there is no value in saving the ctxt in the dreq - the content
> of the ctxt, which is an skbuf, has been copied into the dreq. So maybe
> the UDB ctxt is a very different beast than the rdma ctxt and needs to
> be handled completely differently.
>
> We can fix the UDP double-free by always doing
> rqstp->rq_xprt_ctxt = NULL;
> whether a new dreq was needed or not. But that doesn't fix the leaking
> of ctxts and is really just a band-aid.
A double-free is potentially catastrophic, so I would
say this is a reasonable change that can be back-ported
without fuss (while noting that more is needed).
The RDMA recv_ctxt is persistent for the life of the
connection. Releasing one of those just puts it back on
a free list. So, maybe the second free could cause free
list corruption?
> Thoughts?
> Do we need to preserve ALL of the svc_rdma_recv_ctxt for deferred
> requests? Could we just copy some bits into the dreq (allocate a bit
> more space somehow) so that a simple kfree() is still enough?
> Or do we need a free_ctxt() handler attached to the xprt?
While I take some time to review code and options, did
you know there is a deferral fault injector that might
possibly help you sort through some of this? I doubt I
took the time to try forcing a second deferral of the
same request.
> Thanks,
> NeilBrown
>
>
>
>
>>
>> Reported-by: Trond Myklebust <trondmy@hammerspace.com>
>> Link: https://lore.kernel.org/linux-nfs/82662b7190f26fb304eb0ab1bb04279072439d4e.camel@hammerspace.com/
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>> Cc: <stable@vger.kernel.org>
>> ---
>> include/linux/sunrpc/svc.h | 1 +
>> net/sunrpc/svc_xprt.c | 3 +++
>> net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 2 +-
>> 3 files changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
>> index a5dda4987e8b..217711fc9cac 100644
>> --- a/include/linux/sunrpc/svc.h
>> +++ b/include/linux/sunrpc/svc.h
>> @@ -395,6 +395,7 @@ struct svc_deferred_req {
>> size_t addrlen;
>> struct sockaddr_storage daddr; /* where reply must come from */
>> size_t daddrlen;
>> + void *xprt_ctxt;
>> struct cache_deferred_req handle;
>> size_t xprt_hlen;
>> int argslen;
>> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
>> index 0c117d3bfda8..b42cfffa7395 100644
>> --- a/net/sunrpc/svc_xprt.c
>> +++ b/net/sunrpc/svc_xprt.c
>> @@ -1231,6 +1231,8 @@ static struct cache_deferred_req *svc_defer(struct cache_req *req)
>> dr->daddr = rqstp->rq_daddr;
>> dr->argslen = rqstp->rq_arg.len >> 2;
>> dr->xprt_hlen = rqstp->rq_xprt_hlen;
>> + dr->xprt_ctxt = rqstp->rq_xprt_ctxt;
>> + rqstp->rq_xprt_ctxt = NULL;
>>
>> /* back up head to the start of the buffer and copy */
>> skip = rqstp->rq_arg.len - rqstp->rq_arg.head[0].iov_len;
>> @@ -1269,6 +1271,7 @@ static noinline int svc_deferred_recv(struct svc_rqst *rqstp)
>> rqstp->rq_xprt_hlen = dr->xprt_hlen;
>> rqstp->rq_daddr = dr->daddr;
>> rqstp->rq_respages = rqstp->rq_pages;
>> + rqstp->rq_xprt_ctxt = dr->xprt_ctxt;
>> svc_xprt_received(rqstp->rq_xprt);
>> return (dr->argslen<<2) - dr->xprt_hlen;
>> }
>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>> index cf76a6ad127b..864131a9fc6e 100644
>> --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>> +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>> @@ -831,7 +831,7 @@ int svc_rdma_recvfrom(struct svc_rqst *rqstp)
>> goto out_err;
>> if (ret == 0)
>> goto out_drop;
>> - rqstp->rq_xprt_hlen = ret;
>> + rqstp->rq_xprt_hlen = 0;
>>
>> if (svc_rdma_is_reverse_direction_reply(xprt, ctxt))
>> goto out_backchannel;
--
Chuck Lever
next prev parent reply other threads:[~2023-05-08 4:10 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-04-07 17:43 [PATCH v3 0/2] Fix request deferral for NFS/RDMA Chuck Lever
2022-04-07 17:43 ` [PATCH v3 1/2] SUNRPC: Fix NFSD's request deferral on RDMA transports Chuck Lever
2022-04-07 20:31 ` Trond Myklebust
2023-05-08 0:20 ` NeilBrown
2023-05-08 4:10 ` Chuck Lever III [this message]
2022-04-07 17:43 ` [PATCH v3 2/2] SUNRPC: Fix the svc_deferred_event trace class Chuck Lever
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4CC536D3-10A2-46EB-B12D-231CED2EBBA5@oracle.com \
--to=chuck.lever@oracle.com \
--cc=linux-nfs@vger.kernel.org \
--cc=neilb@suse.de \
--cc=rostedt@goodmis.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox