From: "J. Bruce Fields" <bfields@fieldses.org>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: svc_xprt_put is no longer BH-safe
Date: Tue, 8 Nov 2016 16:42:08 -0500 [thread overview]
Message-ID: <20161108214208.GB27681@fieldses.org> (raw)
In-Reply-To: <8AD6B9DB-E777-46E4-BB7E-774F5FC8B2B6@oracle.com>
On Tue, Nov 08, 2016 at 04:05:54PM -0500, Chuck Lever wrote:
>
> > On Nov 8, 2016, at 3:20 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> >
> > On Tue, Nov 08, 2016 at 03:13:13PM -0500, Chuck Lever wrote:
> >>
> >>> On Nov 8, 2016, at 3:03 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> >>>
> >>> On Sat, Oct 29, 2016 at 12:21:03PM -0400, Chuck Lever wrote:
> >>>> Hi Bruce-
> >>>
> >>> Sorry for the slow response!
> >>>
> >>> ...
> >>>> In commit 39a9beab5acb83176e8b9a4f0778749a09341f1f ('rpc: share one xps between
> >>>> all backchannels') you added:
> >>>>
> >>>> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> >>>> index f5572e3..4f01f63 100644
> >>>> --- a/net/sunrpc/svc_xprt.c
> >>>> +++ b/net/sunrpc/svc_xprt.c
> >>>> @@ -136,6 +136,8 @@ static void svc_xprt_free(struct kref *kref)
> >>>> /* See comment on corresponding get in xs_setup_bc_tcp(): */
> >>>> if (xprt->xpt_bc_xprt)
> >>>> xprt_put(xprt->xpt_bc_xprt);
> >>>> + if (xprt->xpt_bc_xps)
> >>>> + xprt_switch_put(xprt->xpt_bc_xps);
> >>>> xprt->xpt_ops->xpo_free(xprt);
> >>>> module_put(owner);
> >>>> }
> >>>>
> >>>> svc_xprt_free() is invoked by svc_xprt_put(). svc_xprt_put() is called
> >>>> from svc_rdma in soft IRQ context (eg. svc_rdma_wc_receive).
> >>>
> >>> Is that necessary? I wonder why the svcrdma code seems to be taking so
> >>> many of its own references on svc_xprts.
> >>
> >> The idea is to keep the xprt around while Work Requests (I/O) are running,
> >> so that the xprt is guaranteed to be there during work completions. The
> >> completion handlers (where svc_xprt_put is often invoked) run in soft IRQ
> >> context.
> >>
> >> It's simple to change completions to use a Work Queue instead, but testing
> >> so far shows that will result in a performance loss. I'm still studying it.
> >>
> >> Is there another way to keep the xprt's ref count boosted while I/O is
> >> going on?
> >
> > Why do you need the svc_xprt in the completion?
>
> 1. The svc_xprt contains the svcrdma_xprt, which contains the Send Queue
> accounting mechanism. SQ accounting has to be updated for each completion:
> the completion indicates that the SQ entry used by this Work Request is
> now available, and that other callers waiting for enough SQEs can go ahead.
>
> 2. When handling a Receive completion, the incoming RPC message is enqueued
> on the svc_xprt via svc_xprt_enqueue, unless RDMA Reads are needed.
>
> 3. When handling a Read completion, the incoming RPC message is enqueued on
> the svc_xprt via svc_xprt_enqueue.
>
>
> > Can the xpo_detach method wait for any pending completions?
>
> So the completions would call wake_up on a waitqueue, or use a kernel
> completion? That sounds more expensive than managing an atomic reference
> count.
>
> I suppose some other reference count could be used to trigger a work item
> that would invoke xpo_detach.
>
> But based on your comments, then, svc_xprt_put() was never intended to be
> BH-safe.
I'm not sure what was intended. It doesn't look to me like any other
callers require it.
--b.
>
>
> > --b.
> >
> >>
> >>
> >>>> However, xprt_switch_put() takes a spin lock (xps_lock) which is locked
> >>>> everywhere without disabling BHs.
> >>>>
> >>>> It looks to me like 39a9beab5acb makes svc_xprt_put() no longer BH-safe?
> >>>> Not sure if svc_xprt_put() was intended to be BH-safe beforehand.
> >>>>
> >>>> Maybe xprt_switch_put() could be invoked in ->xpo_free, but that seems
> >>>> like a temporary solution.
> >>>
> >>> Since xpo_free is also called from svc_xprt_put that doesn't sound like
> >>> it would change anything. Or do we not trunk over RDMA for some reason?
> >>
> >> It's quite desirable to trunk NFS/RDMA on multiple connections, and it
> >> should work just like it does for TCP, but so far it's not been tested.
>
>
> --
> Chuck Lever
>
>
prev parent reply other threads:[~2016-11-08 21:42 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-10-29 16:21 svc_xprt_put is no longer BH-safe Chuck Lever
2016-11-08 20:03 ` J. Bruce Fields
2016-11-08 20:13 ` Chuck Lever
2016-11-08 20:20 ` J. Bruce Fields
2016-11-08 21:05 ` Chuck Lever
2016-11-08 21:42 ` J. Bruce Fields [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20161108214208.GB27681@fieldses.org \
--to=bfields@fieldses.org \
--cc=chuck.lever@oracle.com \
--cc=linux-nfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.