* svc_xprt_put is no longer BH-safe
@ 2016-10-29 16:21 Chuck Lever
2016-11-08 20:03 ` J. Bruce Fields
0 siblings, 1 reply; 6+ messages in thread
From: Chuck Lever @ 2016-10-29 16:21 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: Linux NFS Mailing List
Hi Bruce-
I hit this lockdep splat this morning
Oct 29 11:38:25 klimt kernel: =================================
Oct 29 11:38:25 klimt kernel: [ INFO: inconsistent lock state ]
Oct 29 11:38:25 klimt kernel: 4.9.0-rc2-00003-g114ddae #9 Not tainted
Oct 29 11:38:25 klimt kernel: ---------------------------------
Oct 29 11:38:25 klimt kernel: inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
Oct 29 11:38:25 klimt kernel: swapper/6/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
Oct 29 11:38:25 klimt kernel: (
Oct 29 11:38:25 klimt kernel: &(&xps->xps_lock)->rlock
Oct 29 11:38:25 klimt kernel: ){+.?...}
Oct 29 11:38:25 klimt kernel: , at:
Oct 29 11:38:25 klimt kernel: [<ffffffffa0488056>] xprt_switch_free+0x26/0xb0 [sunrpc]
Oct 29 11:38:25 klimt kernel: {SOFTIRQ-ON-W} state was registered at:
Oct 29 11:38:25 klimt kernel:
Oct 29 11:38:25 klimt kernel: [<ffffffff810e2483>] __lock_acquire+0x343/0x1670
Oct 29 11:38:25 klimt kernel:
Oct 29 11:38:25 klimt kernel: [<ffffffff810e3c97>] lock_acquire+0x197/0x1f0
Oct 29 11:38:25 klimt kernel:
Oct 29 11:38:25 klimt kernel: [<ffffffff8174cbf8>] _raw_spin_lock+0x38/0x50
Oct 29 11:38:25 klimt kernel:
Oct 29 11:38:25 klimt kernel: [<ffffffffa0488056>] xprt_switch_free+0x26/0xb0 [sunrpc]
Oct 29 11:38:25 klimt kernel:
Oct 29 11:38:25 klimt kernel: [<ffffffffa0488242>] xprt_switch_put+0x22/0x30 [sunrpc]
Oct 29 11:38:25 klimt kernel:
Oct 29 11:38:25 klimt kernel: [<ffffffffa048492d>] svc_xprt_free+0x5d/0x80 [sunrpc]
Oct 29 11:38:25 klimt kernel:
Oct 29 11:38:25 klimt kernel: [<ffffffffa048554a>] svc_xprt_release+0x12a/0x140 [sunrpc]
Oct 29 11:38:25 klimt kernel:
Oct 29 11:38:25 klimt kernel: [<ffffffffa0487242>] svc_recv+0xcb2/0xed0 [sunrpc]
Oct 29 11:38:25 klimt kernel:
Oct 29 11:38:25 klimt kernel: [<ffffffffa04d57b8>] nfsd+0xe8/0x160 [nfsd]
Oct 29 11:38:25 klimt kernel:
Oct 29 11:38:25 klimt kernel: [<ffffffff810b267b>] kthread+0x10b/0x120
Oct 29 11:38:25 klimt kernel:
Oct 29 11:38:25 klimt kernel: [<ffffffff8174d8ba>] ret_from_fork+0x2a/0x40
Oct 29 11:38:25 klimt kernel: irq event stamp: 483745258
Oct 29 11:38:25 klimt kernel: hardirqs last enabled at (483745258):
Oct 29 11:38:25 klimt kernel: [<ffffffff81093a45>] __local_bh_enable_ip+0x95/0xd0
Oct 29 11:38:25 klimt kernel: hardirqs last disabled at (483745257):
Oct 29 11:38:25 klimt kernel: [<ffffffff81093a05>] __local_bh_enable_ip+0x55/0xd0
Oct 29 11:38:25 klimt kernel: softirqs last enabled at (483744848):
Oct 29 11:38:25 klimt kernel: [<ffffffff81092a52>] _local_bh_enable+0x42/0x50
Oct 29 11:38:25 klimt kernel: softirqs last disabled at (483744849):
Oct 29 11:38:25 klimt kernel: [<ffffffff81093b4b>] irq_exit+0x5b/0xf0
Oct 29 11:38:25 klimt kernel: #012other info that might help us debug this:
Oct 29 11:38:25 klimt kernel: Possible unsafe locking scenario:
Oct 29 11:38:25 klimt kernel: CPU0
Oct 29 11:38:25 klimt kernel: ----
Oct 29 11:38:25 klimt kernel: lock(
Oct 29 11:38:25 klimt kernel: &(&xps->xps_lock)->rlock
Oct 29 11:38:25 klimt kernel: );
Oct 29 11:38:25 klimt kernel: <Interrupt>
Oct 29 11:38:25 klimt kernel: lock(
Oct 29 11:38:25 klimt kernel: &(&xps->xps_lock)->rlock
Oct 29 11:38:25 klimt kernel: );
Oct 29 11:38:25 klimt kernel: #012 *** DEADLOCK ***
Oct 29 11:38:25 klimt kernel: no locks held by swapper/6/0.
Oct 29 11:38:25 klimt kernel: #012stack backtrace:
Oct 29 11:38:25 klimt kernel: CPU: 6 PID: 0 Comm: swapper/6 Not tainted 4.9.0-rc2-00003-g114ddae #9
Oct 29 11:38:25 klimt kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 1.0c 09/09/2015
Oct 29 11:38:25 klimt kernel: ffff88087bd83be8 ffffffff81392e51 ffff880857008040 ffffffff827e2500
Oct 29 11:38:25 klimt kernel: ffff88087bd83c38 ffffffff811b3e76 0000000000000001 0000000000000001
Oct 29 11:38:25 klimt kernel: 0000000000000000 0000000000000006 ffff880857008040 ffffffff810e1060
Oct 29 11:38:25 klimt kernel: Call Trace:
Oct 29 11:38:25 klimt kernel: <IRQ>
Oct 29 11:38:25 klimt kernel: [<ffffffff81392e51>] dump_stack+0x85/0xc4
Oct 29 11:38:25 klimt kernel: [<ffffffff811b3e76>] print_usage_bug+0x1eb/0x1fc
Oct 29 11:38:25 klimt kernel: [<ffffffff810e1060>] ? print_irq_inversion_bug+0x200/0x200
Oct 29 11:38:25 klimt kernel: [<ffffffff810e1a65>] mark_lock+0x175/0x290
Oct 29 11:38:25 klimt kernel: [<ffffffff810e23cf>] __lock_acquire+0x28f/0x1670
Oct 29 11:38:25 klimt kernel: [<ffffffff810e3c97>] lock_acquire+0x197/0x1f0
Oct 29 11:38:25 klimt kernel: [<ffffffffa0488056>] ? xprt_switch_free+0x26/0xb0 [sunrpc]
Oct 29 11:38:25 klimt kernel: [<ffffffff8174cbf8>] _raw_spin_lock+0x38/0x50
Oct 29 11:38:25 klimt kernel: [<ffffffffa0488056>] ? xprt_switch_free+0x26/0xb0 [sunrpc]
Oct 29 11:38:25 klimt kernel: [<ffffffffa0488056>] xprt_switch_free+0x26/0xb0 [sunrpc]
Oct 29 11:38:25 klimt kernel: [<ffffffffa0488242>] xprt_switch_put+0x22/0x30 [sunrpc]
Oct 29 11:38:25 klimt kernel: [<ffffffffa048492d>] svc_xprt_free+0x5d/0x80 [sunrpc]
Oct 29 11:38:25 klimt kernel: [<ffffffffa048501d>] svc_xprt_put+0x1d/0x20 [sunrpc]
Oct 29 11:38:25 klimt kernel: [<ffffffffa05753db>] svc_rdma_wc_receive+0xcb/0xe0 [rpcrdma]
Oct 29 11:38:25 klimt kernel: [<ffffffffa0240d55>] __ib_process_cq+0x35/0xc0 [ib_core]
Oct 29 11:38:25 klimt kernel: [<ffffffffa0240ed2>] ib_poll_handler+0x22/0x60 [ib_core]
Oct 29 11:38:25 klimt kernel: [<ffffffff813c8bb5>] irq_poll_softirq+0x85/0x100
Oct 29 11:38:25 klimt kernel: [<ffffffff817507a9>] __do_softirq+0x1f9/0x425
Oct 29 11:38:25 klimt kernel: [<ffffffff81093b4b>] irq_exit+0x5b/0xf0
Oct 29 11:38:25 klimt kernel: [<ffffffff817502df>] do_IRQ+0xef/0x110
Oct 29 11:38:25 klimt kernel: [<ffffffff8174e016>] common_interrupt+0x96/0x96
Oct 29 11:38:25 klimt kernel: <EOI>
Oct 29 11:38:25 klimt kernel: [<ffffffff815b996c>] ? cpuidle_enter_state+0x22c/0x300
Oct 29 11:38:25 klimt kernel: [<ffffffff815b9a77>] cpuidle_enter+0x17/0x20
Oct 29 11:38:25 klimt kernel: [<ffffffff810dbe7d>] call_cpuidle+0x3d/0x50
Oct 29 11:38:25 klimt kernel: [<ffffffff810dc13e>] cpu_startup_entry+0x19e/0x240
Oct 29 11:38:25 klimt kernel: [<ffffffff81059590>] start_secondary+0x160/0x1a0
In commit 39a9beab5acb83176e8b9a4f0778749a09341f1f ('rpc: share one xps between
all backchannels') you added:
diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index f5572e3..4f01f63 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -136,6 +136,8 @@ static void svc_xprt_free(struct kref *kref)
/* See comment on corresponding get in xs_setup_bc_tcp(): */
if (xprt->xpt_bc_xprt)
xprt_put(xprt->xpt_bc_xprt);
+ if (xprt->xpt_bc_xps)
+ xprt_switch_put(xprt->xpt_bc_xps);
xprt->xpt_ops->xpo_free(xprt);
module_put(owner);
}
svc_xprt_free() is invoked by svc_xprt_put(). svc_xprt_put() is called
from svc_rdma in soft IRQ context (eg. svc_rdma_wc_receive).
However, xprt_switch_put() takes a spin lock (xps_lock) which is locked
everywhere without disabling BHs.
It looks to me like 39a9beab5acb makes svc_xprt_put() no longer BH-safe?
Not sure if svc_xprt_put() was intended to be BH-safe beforehand.
Maybe xprt_switch_put() could be invoked in ->xpo_free, but that seems
like a temporary solution.
--
Chuck Lever
^ permalink raw reply related [flat|nested] 6+ messages in thread* Re: svc_xprt_put is no longer BH-safe 2016-10-29 16:21 svc_xprt_put is no longer BH-safe Chuck Lever @ 2016-11-08 20:03 ` J. Bruce Fields 2016-11-08 20:13 ` Chuck Lever 0 siblings, 1 reply; 6+ messages in thread From: J. Bruce Fields @ 2016-11-08 20:03 UTC (permalink / raw) To: Chuck Lever; +Cc: Linux NFS Mailing List On Sat, Oct 29, 2016 at 12:21:03PM -0400, Chuck Lever wrote: > Hi Bruce- Sorry for the slow response! ... > In commit 39a9beab5acb83176e8b9a4f0778749a09341f1f ('rpc: share one xps between > all backchannels') you added: > > diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c > index f5572e3..4f01f63 100644 > --- a/net/sunrpc/svc_xprt.c > +++ b/net/sunrpc/svc_xprt.c > @@ -136,6 +136,8 @@ static void svc_xprt_free(struct kref *kref) > /* See comment on corresponding get in xs_setup_bc_tcp(): */ > if (xprt->xpt_bc_xprt) > xprt_put(xprt->xpt_bc_xprt); > + if (xprt->xpt_bc_xps) > + xprt_switch_put(xprt->xpt_bc_xps); > xprt->xpt_ops->xpo_free(xprt); > module_put(owner); > } > > svc_xprt_free() is invoked by svc_xprt_put(). svc_xprt_put() is called > from svc_rdma in soft IRQ context (eg. svc_rdma_wc_receive). Is that necessary? I wonder why the svcrdma code seems to be taking so many of its own references on svc_xprts. > However, xprt_switch_put() takes a spin lock (xps_lock) which is locked > everywhere without disabling BHs. > > It looks to me like 39a9beab5acb makes svc_xprt_put() no longer BH-safe? > Not sure if svc_xprt_put() was intended to be BH-safe beforehand. > > Maybe xprt_switch_put() could be invoked in ->xpo_free, but that seems > like a temporary solution. Since xpo_free is also called from svc_xprt_put that doesn't sound like it would change anything. Or do we not trunk over RDMA for some reason? --b. > > > -- > Chuck Lever > > ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: svc_xprt_put is no longer BH-safe 2016-11-08 20:03 ` J. Bruce Fields @ 2016-11-08 20:13 ` Chuck Lever 2016-11-08 20:20 ` J. Bruce Fields 0 siblings, 1 reply; 6+ messages in thread From: Chuck Lever @ 2016-11-08 20:13 UTC (permalink / raw) To: J. Bruce Fields; +Cc: Linux NFS Mailing List > On Nov 8, 2016, at 3:03 PM, J. Bruce Fields <bfields@fieldses.org> wrote: > > On Sat, Oct 29, 2016 at 12:21:03PM -0400, Chuck Lever wrote: >> Hi Bruce- > > Sorry for the slow response! > > ... >> In commit 39a9beab5acb83176e8b9a4f0778749a09341f1f ('rpc: share one xps between >> all backchannels') you added: >> >> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c >> index f5572e3..4f01f63 100644 >> --- a/net/sunrpc/svc_xprt.c >> +++ b/net/sunrpc/svc_xprt.c >> @@ -136,6 +136,8 @@ static void svc_xprt_free(struct kref *kref) >> /* See comment on corresponding get in xs_setup_bc_tcp(): */ >> if (xprt->xpt_bc_xprt) >> xprt_put(xprt->xpt_bc_xprt); >> + if (xprt->xpt_bc_xps) >> + xprt_switch_put(xprt->xpt_bc_xps); >> xprt->xpt_ops->xpo_free(xprt); >> module_put(owner); >> } >> >> svc_xprt_free() is invoked by svc_xprt_put(). svc_xprt_put() is called >> from svc_rdma in soft IRQ context (eg. svc_rdma_wc_receive). > > Is that necessary? I wonder why the svcrdma code seems to be taking so > many of its own references on svc_xprts. The idea is to keep the xprt around while Work Requests (I/O) are running, so that the xprt is guaranteed to be there during work completions. The completion handlers (where svc_xprt_put is often invoked) run in soft IRQ context. It's simple to change completions to use a Work Queue instead, but testing so far shows that will result in a performance loss. I'm still studying it. Is there another way to keep the xprt's ref count boosted while I/O is going on? >> However, xprt_switch_put() takes a spin lock (xps_lock) which is locked >> everywhere without disabling BHs. >> >> It looks to me like 39a9beab5acb makes svc_xprt_put() no longer BH-safe? >> Not sure if svc_xprt_put() was intended to be BH-safe beforehand. >> >> Maybe xprt_switch_put() could be invoked in ->xpo_free, but that seems >> like a temporary solution. > > Since xpo_free is also called from svc_xprt_put that doesn't sound like > it would change anything. Or do we not trunk over RDMA for some reason? It's quite desirable to trunk NFS/RDMA on multiple connections, and it should work just like it does for TCP, but so far it's not been tested. -- Chuck Lever ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: svc_xprt_put is no longer BH-safe 2016-11-08 20:13 ` Chuck Lever @ 2016-11-08 20:20 ` J. Bruce Fields 2016-11-08 21:05 ` Chuck Lever 0 siblings, 1 reply; 6+ messages in thread From: J. Bruce Fields @ 2016-11-08 20:20 UTC (permalink / raw) To: Chuck Lever; +Cc: Linux NFS Mailing List On Tue, Nov 08, 2016 at 03:13:13PM -0500, Chuck Lever wrote: > > > On Nov 8, 2016, at 3:03 PM, J. Bruce Fields <bfields@fieldses.org> wrote: > > > > On Sat, Oct 29, 2016 at 12:21:03PM -0400, Chuck Lever wrote: > >> Hi Bruce- > > > > Sorry for the slow response! > > > > ... > >> In commit 39a9beab5acb83176e8b9a4f0778749a09341f1f ('rpc: share one xps between > >> all backchannels') you added: > >> > >> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c > >> index f5572e3..4f01f63 100644 > >> --- a/net/sunrpc/svc_xprt.c > >> +++ b/net/sunrpc/svc_xprt.c > >> @@ -136,6 +136,8 @@ static void svc_xprt_free(struct kref *kref) > >> /* See comment on corresponding get in xs_setup_bc_tcp(): */ > >> if (xprt->xpt_bc_xprt) > >> xprt_put(xprt->xpt_bc_xprt); > >> + if (xprt->xpt_bc_xps) > >> + xprt_switch_put(xprt->xpt_bc_xps); > >> xprt->xpt_ops->xpo_free(xprt); > >> module_put(owner); > >> } > >> > >> svc_xprt_free() is invoked by svc_xprt_put(). svc_xprt_put() is called > >> from svc_rdma in soft IRQ context (eg. svc_rdma_wc_receive). > > > > Is that necessary? I wonder why the svcrdma code seems to be taking so > > many of its own references on svc_xprts. > > The idea is to keep the xprt around while Work Requests (I/O) are running, > so that the xprt is guaranteed to be there during work completions. The > completion handlers (where svc_xprt_put is often invoked) run in soft IRQ > context. > > It's simple to change completions to use a Work Queue instead, but testing > so far shows that will result in a performance loss. I'm still studying it. > > Is there another way to keep the xprt's ref count boosted while I/O is > going on? Why do you need the svc_xprt in the completion? Can the xpo_detach method wait for any pending completions? --b. > > > >> However, xprt_switch_put() takes a spin lock (xps_lock) which is locked > >> everywhere without disabling BHs. > >> > >> It looks to me like 39a9beab5acb makes svc_xprt_put() no longer BH-safe? > >> Not sure if svc_xprt_put() was intended to be BH-safe beforehand. > >> > >> Maybe xprt_switch_put() could be invoked in ->xpo_free, but that seems > >> like a temporary solution. > > > > Since xpo_free is also called from svc_xprt_put that doesn't sound like > > it would change anything. Or do we not trunk over RDMA for some reason? > > It's quite desirable to trunk NFS/RDMA on multiple connections, and it > should work just like it does for TCP, but so far it's not been tested. > > > -- > Chuck Lever > > ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: svc_xprt_put is no longer BH-safe 2016-11-08 20:20 ` J. Bruce Fields @ 2016-11-08 21:05 ` Chuck Lever 2016-11-08 21:42 ` J. Bruce Fields 0 siblings, 1 reply; 6+ messages in thread From: Chuck Lever @ 2016-11-08 21:05 UTC (permalink / raw) To: J. Bruce Fields; +Cc: Linux NFS Mailing List > On Nov 8, 2016, at 3:20 PM, J. Bruce Fields <bfields@fieldses.org> wrote: > > On Tue, Nov 08, 2016 at 03:13:13PM -0500, Chuck Lever wrote: >> >>> On Nov 8, 2016, at 3:03 PM, J. Bruce Fields <bfields@fieldses.org> wrote: >>> >>> On Sat, Oct 29, 2016 at 12:21:03PM -0400, Chuck Lever wrote: >>>> Hi Bruce- >>> >>> Sorry for the slow response! >>> >>> ... >>>> In commit 39a9beab5acb83176e8b9a4f0778749a09341f1f ('rpc: share one xps between >>>> all backchannels') you added: >>>> >>>> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c >>>> index f5572e3..4f01f63 100644 >>>> --- a/net/sunrpc/svc_xprt.c >>>> +++ b/net/sunrpc/svc_xprt.c >>>> @@ -136,6 +136,8 @@ static void svc_xprt_free(struct kref *kref) >>>> /* See comment on corresponding get in xs_setup_bc_tcp(): */ >>>> if (xprt->xpt_bc_xprt) >>>> xprt_put(xprt->xpt_bc_xprt); >>>> + if (xprt->xpt_bc_xps) >>>> + xprt_switch_put(xprt->xpt_bc_xps); >>>> xprt->xpt_ops->xpo_free(xprt); >>>> module_put(owner); >>>> } >>>> >>>> svc_xprt_free() is invoked by svc_xprt_put(). svc_xprt_put() is called >>>> from svc_rdma in soft IRQ context (eg. svc_rdma_wc_receive). >>> >>> Is that necessary? I wonder why the svcrdma code seems to be taking so >>> many of its own references on svc_xprts. >> >> The idea is to keep the xprt around while Work Requests (I/O) are running, >> so that the xprt is guaranteed to be there during work completions. The >> completion handlers (where svc_xprt_put is often invoked) run in soft IRQ >> context. >> >> It's simple to change completions to use a Work Queue instead, but testing >> so far shows that will result in a performance loss. I'm still studying it. >> >> Is there another way to keep the xprt's ref count boosted while I/O is >> going on? > > Why do you need the svc_xprt in the completion? 1. The svc_xprt contains the svcrdma_xprt, which contains the Send Queue accounting mechanism. SQ accounting has to be updated for each completion: the completion indicates that the SQ entry used by this Work Request is now available, and that other callers waiting for enough SQEs can go ahead. 2. When handling a Receive completion, the incoming RPC message is enqueued on the svc_xprt via svc_xprt_enqueue, unless RDMA Reads are needed. 3. When handling a Read completion, the incoming RPC message is enqueued on the svc_xprt via svc_xprt_enqueue. > Can the xpo_detach method wait for any pending completions? So the completions would call wake_up on a waitqueue, or use a kernel completion? That sounds more expensive than managing an atomic reference count. I suppose some other reference count could be used to trigger a work item that would invoke xpo_detach. But based on your comments, then, svc_xprt_put() was never intended to be BH-safe. > --b. > >> >> >>>> However, xprt_switch_put() takes a spin lock (xps_lock) which is locked >>>> everywhere without disabling BHs. >>>> >>>> It looks to me like 39a9beab5acb makes svc_xprt_put() no longer BH-safe? >>>> Not sure if svc_xprt_put() was intended to be BH-safe beforehand. >>>> >>>> Maybe xprt_switch_put() could be invoked in ->xpo_free, but that seems >>>> like a temporary solution. >>> >>> Since xpo_free is also called from svc_xprt_put that doesn't sound like >>> it would change anything. Or do we not trunk over RDMA for some reason? >> >> It's quite desirable to trunk NFS/RDMA on multiple connections, and it >> should work just like it does for TCP, but so far it's not been tested. -- Chuck Lever ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: svc_xprt_put is no longer BH-safe 2016-11-08 21:05 ` Chuck Lever @ 2016-11-08 21:42 ` J. Bruce Fields 0 siblings, 0 replies; 6+ messages in thread From: J. Bruce Fields @ 2016-11-08 21:42 UTC (permalink / raw) To: Chuck Lever; +Cc: Linux NFS Mailing List On Tue, Nov 08, 2016 at 04:05:54PM -0500, Chuck Lever wrote: > > > On Nov 8, 2016, at 3:20 PM, J. Bruce Fields <bfields@fieldses.org> wrote: > > > > On Tue, Nov 08, 2016 at 03:13:13PM -0500, Chuck Lever wrote: > >> > >>> On Nov 8, 2016, at 3:03 PM, J. Bruce Fields <bfields@fieldses.org> wrote: > >>> > >>> On Sat, Oct 29, 2016 at 12:21:03PM -0400, Chuck Lever wrote: > >>>> Hi Bruce- > >>> > >>> Sorry for the slow response! > >>> > >>> ... > >>>> In commit 39a9beab5acb83176e8b9a4f0778749a09341f1f ('rpc: share one xps between > >>>> all backchannels') you added: > >>>> > >>>> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c > >>>> index f5572e3..4f01f63 100644 > >>>> --- a/net/sunrpc/svc_xprt.c > >>>> +++ b/net/sunrpc/svc_xprt.c > >>>> @@ -136,6 +136,8 @@ static void svc_xprt_free(struct kref *kref) > >>>> /* See comment on corresponding get in xs_setup_bc_tcp(): */ > >>>> if (xprt->xpt_bc_xprt) > >>>> xprt_put(xprt->xpt_bc_xprt); > >>>> + if (xprt->xpt_bc_xps) > >>>> + xprt_switch_put(xprt->xpt_bc_xps); > >>>> xprt->xpt_ops->xpo_free(xprt); > >>>> module_put(owner); > >>>> } > >>>> > >>>> svc_xprt_free() is invoked by svc_xprt_put(). svc_xprt_put() is called > >>>> from svc_rdma in soft IRQ context (eg. svc_rdma_wc_receive). > >>> > >>> Is that necessary? I wonder why the svcrdma code seems to be taking so > >>> many of its own references on svc_xprts. > >> > >> The idea is to keep the xprt around while Work Requests (I/O) are running, > >> so that the xprt is guaranteed to be there during work completions. The > >> completion handlers (where svc_xprt_put is often invoked) run in soft IRQ > >> context. > >> > >> It's simple to change completions to use a Work Queue instead, but testing > >> so far shows that will result in a performance loss. I'm still studying it. > >> > >> Is there another way to keep the xprt's ref count boosted while I/O is > >> going on? > > > > Why do you need the svc_xprt in the completion? > > 1. The svc_xprt contains the svcrdma_xprt, which contains the Send Queue > accounting mechanism. SQ accounting has to be updated for each completion: > the completion indicates that the SQ entry used by this Work Request is > now available, and that other callers waiting for enough SQEs can go ahead. > > 2. When handling a Receive completion, the incoming RPC message is enqueued > on the svc_xprt via svc_xprt_enqueue, unless RDMA Reads are needed. > > 3. When handling a Read completion, the incoming RPC message is enqueued on > the svc_xprt via svc_xprt_enqueue. > > > > Can the xpo_detach method wait for any pending completions? > > So the completions would call wake_up on a waitqueue, or use a kernel > completion? That sounds more expensive than managing an atomic reference > count. > > I suppose some other reference count could be used to trigger a work item > that would invoke xpo_detach. > > But based on your comments, then, svc_xprt_put() was never intended to be > BH-safe. I'm not sure what was intended. It doesn't look to me like any other callers require it. --b. > > > > --b. > > > >> > >> > >>>> However, xprt_switch_put() takes a spin lock (xps_lock) which is locked > >>>> everywhere without disabling BHs. > >>>> > >>>> It looks to me like 39a9beab5acb makes svc_xprt_put() no longer BH-safe? > >>>> Not sure if svc_xprt_put() was intended to be BH-safe beforehand. > >>>> > >>>> Maybe xprt_switch_put() could be invoked in ->xpo_free, but that seems > >>>> like a temporary solution. > >>> > >>> Since xpo_free is also called from svc_xprt_put that doesn't sound like > >>> it would change anything. Or do we not trunk over RDMA for some reason? > >> > >> It's quite desirable to trunk NFS/RDMA on multiple connections, and it > >> should work just like it does for TCP, but so far it's not been tested. > > > -- > Chuck Lever > > ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2016-11-08 21:42 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2016-10-29 16:21 svc_xprt_put is no longer BH-safe Chuck Lever 2016-11-08 20:03 ` J. Bruce Fields 2016-11-08 20:13 ` Chuck Lever 2016-11-08 20:20 ` J. Bruce Fields 2016-11-08 21:05 ` Chuck Lever 2016-11-08 21:42 ` J. Bruce Fields
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.