* Re: [PATCH] [SUNRPC]: avoid race between xs_reset_transport and xs_tcp_setup_socket [not found] <1413881868-13111-1-git-send-email-wen.gang.wang@oracle.com> @ 2014-10-27 1:03 ` Wengang 2014-10-27 11:52 ` Trond Myklebust 0 siblings, 1 reply; 3+ messages in thread From: Wengang @ 2014-10-27 1:03 UTC (permalink / raw) To: linux-nfs; +Cc: Wengang Wang Could somebody help to review this patch please? thanks, Wengang 于 2014年10月21日 16:57, Wengang Wang 写道: > A panic with call trace like this: > > crash> bt > PID: 1842 TASK: ffff8824d1d523c0 CPU: 29 COMMAND: "kworker/29:1" > #0 [ffff88052a351a40] machine_kexec at ffffffff8103b40d > #1 [ffff88052a351ab0] crash_kexec at ffffffff810b98c5 > #2 [ffff88052a351b80] oops_end at ffffffff815077d8 > #3 [ffff88052a351bb0] no_context at ffffffff81048dff > #4 [ffff88052a351bf0] __bad_area_nosemaphore at ffffffff81048f80 > #5 [ffff88052a351c40] bad_area_nosemaphore at ffffffff81049183 > #6 [ffff88052a351c50] do_page_fault at ffffffff8150a32e > #7 [ffff88052a351d60] page_fault at ffffffff81506d55 > [exception RIP: xs_tcp_reuse_connection+24] > RIP: ffffffffa0439518 RSP: ffff88052a351e10 RFLAGS: 00010282 > RAX: ffff8824d1d523c0 RBX: ffff880d0d2d1000 RCX: ffff88407f3ae088 > RDX: 0000000000000000 RSI: 0000000000001d00 RDI: ffff880d0d2d1000 > RBP: ffff88052a351e20 R8: ffff88407f3af260 R9: ffffffff819ab880 > R10: 0000000000000000 R11: ffff883f03de4820 R12: 00000000fffffff5 > R13: ffff880d0d2d1000 R14: ffff8815e260b840 R15: 0000000000000000 > ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 > #8 [ffff88052a351e28] xs_tcp_setup_socket at ffffffffa043b01a [sunrpc] > #9 [ffff88052a351e58] process_one_work at ffffffff8108c0d9 > #10 [ffff88052a351ea8] worker_thread at ffffffff8108ca1a > #11 [ffff88052a351ee8] kthread at ffffffff81090ff7 > #12 [ffff88052a351f48] kernel_thread_helper at ffffffff8150fe84 > > In xs_tcp_setup_socket, if the xprt->sock is not NULL, it calls > xs_tcp_reuse_connection. But in xs_tcp_reuse_connection, the sock and > inet is seen to be zero when crash happened > > crash> sock_xprt.sock ffff880d0d2d1000 > sock = 0x0 > crash> sock_xprt.inet ffff880d0d2d1000 > inet = 0x0 > > the xprt.state is 532 which is XPRT_CONNECTING|XPRT_BOUND|XPRT_INITIALIZED > > This looks like a race with xs_reset_transport(). > > The fix is to wait the cancel and wait until connect_worker finishes. > > Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> > --- > net/sunrpc/xprtsock.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c > index 3b305ab..718c57f 100644 > --- a/net/sunrpc/xprtsock.c > +++ b/net/sunrpc/xprtsock.c > @@ -869,6 +869,9 @@ static void xs_reset_transport(struct sock_xprt *transport) > if (sk == NULL) > return; > > + /* avoid a race with xs_tcp_setup_socket */ > + cancel_delayed_work_sync(&transport->connect_worker); > + > transport->srcport = 0; > > write_lock_bh(&sk->sk_callback_lock); ^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] [SUNRPC]: avoid race between xs_reset_transport and xs_tcp_setup_socket 2014-10-27 1:03 ` [PATCH] [SUNRPC]: avoid race between xs_reset_transport and xs_tcp_setup_socket Wengang @ 2014-10-27 11:52 ` Trond Myklebust 2014-10-28 1:05 ` Wengang 0 siblings, 1 reply; 3+ messages in thread From: Trond Myklebust @ 2014-10-27 11:52 UTC (permalink / raw) To: Wengang; +Cc: linux-nfs On Mon, Oct 27, 2014 at 3:03 AM, Wengang <wen.gang.wang@oracle.com> wrote: > Could somebody help to review this patch please? > > thanks, > Wengang > > 于 2014年10月21日 16:57, Wengang Wang 写道: >> >> A panic with call trace like this: >> >> crash> bt >> PID: 1842 TASK: ffff8824d1d523c0 CPU: 29 COMMAND: "kworker/29:1" >> #0 [ffff88052a351a40] machine_kexec at ffffffff8103b40d >> #1 [ffff88052a351ab0] crash_kexec at ffffffff810b98c5 >> #2 [ffff88052a351b80] oops_end at ffffffff815077d8 >> #3 [ffff88052a351bb0] no_context at ffffffff81048dff >> #4 [ffff88052a351bf0] __bad_area_nosemaphore at ffffffff81048f80 >> #5 [ffff88052a351c40] bad_area_nosemaphore at ffffffff81049183 >> #6 [ffff88052a351c50] do_page_fault at ffffffff8150a32e >> #7 [ffff88052a351d60] page_fault at ffffffff81506d55 >> [exception RIP: xs_tcp_reuse_connection+24] >> RIP: ffffffffa0439518 RSP: ffff88052a351e10 RFLAGS: 00010282 >> RAX: ffff8824d1d523c0 RBX: ffff880d0d2d1000 RCX: ffff88407f3ae088 >> RDX: 0000000000000000 RSI: 0000000000001d00 RDI: ffff880d0d2d1000 >> RBP: ffff88052a351e20 R8: ffff88407f3af260 R9: ffffffff819ab880 >> R10: 0000000000000000 R11: ffff883f03de4820 R12: 00000000fffffff5 >> R13: ffff880d0d2d1000 R14: ffff8815e260b840 R15: 0000000000000000 >> ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 >> #8 [ffff88052a351e28] xs_tcp_setup_socket at ffffffffa043b01a [sunrpc] >> #9 [ffff88052a351e58] process_one_work at ffffffff8108c0d9 >> #10 [ffff88052a351ea8] worker_thread at ffffffff8108ca1a >> #11 [ffff88052a351ee8] kthread at ffffffff81090ff7 >> #12 [ffff88052a351f48] kernel_thread_helper at ffffffff8150fe84 >> >> In xs_tcp_setup_socket, if the xprt->sock is not NULL, it calls >> xs_tcp_reuse_connection. But in xs_tcp_reuse_connection, the sock and >> inet is seen to be zero when crash happened >> >> crash> sock_xprt.sock ffff880d0d2d1000 >> sock = 0x0 >> crash> sock_xprt.inet ffff880d0d2d1000 >> inet = 0x0 >> >> the xprt.state is 532 which is XPRT_CONNECTING|XPRT_BOUND|XPRT_INITIALIZED >> >> This looks like a race with xs_reset_transport(). >> >> The fix is to wait the cancel and wait until connect_worker finishes. >> >> Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> >> --- >> net/sunrpc/xprtsock.c | 3 +++ >> 1 file changed, 3 insertions(+) >> >> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c >> index 3b305ab..718c57f 100644 >> --- a/net/sunrpc/xprtsock.c >> +++ b/net/sunrpc/xprtsock.c >> @@ -869,6 +869,9 @@ static void xs_reset_transport(struct sock_xprt >> *transport) >> if (sk == NULL) >> return; >> + /* avoid a race with xs_tcp_setup_socket */ >> + cancel_delayed_work_sync(&transport->connect_worker); >> + >> transport->srcport = 0; >> write_lock_bh(&sk->sk_callback_lock); > In mainline, there are only 2 callers of xs_reset_transport(): 1) xs_close(), which already performs the above call 2) xs_udp_setup_socket() which cannot conflict with xs_tcp_setup_socket() Cheers Trond ^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] [SUNRPC]: avoid race between xs_reset_transport and xs_tcp_setup_socket 2014-10-27 11:52 ` Trond Myklebust @ 2014-10-28 1:05 ` Wengang 0 siblings, 0 replies; 3+ messages in thread From: Wengang @ 2014-10-28 1:05 UTC (permalink / raw) To: Trond Myklebust; +Cc: linux-nfs Hi Trond, Thanks for your review! The problem happened against source code without cancel_delayed_work_sync being called in xs_close(). I didn't realize the difference with between mainline code, sorry for confusing and thanks for your time. thanks, wengang 于 2014年10月27日 19:52, Trond Myklebust 写道: > On Mon, Oct 27, 2014 at 3:03 AM, Wengang <wen.gang.wang@oracle.com> wrote: >> Could somebody help to review this patch please? >> >> thanks, >> Wengang >> >> 于 2014年10月21日 16:57, Wengang Wang 写道: >>> A panic with call trace like this: >>> >>> crash> bt >>> PID: 1842 TASK: ffff8824d1d523c0 CPU: 29 COMMAND: "kworker/29:1" >>> #0 [ffff88052a351a40] machine_kexec at ffffffff8103b40d >>> #1 [ffff88052a351ab0] crash_kexec at ffffffff810b98c5 >>> #2 [ffff88052a351b80] oops_end at ffffffff815077d8 >>> #3 [ffff88052a351bb0] no_context at ffffffff81048dff >>> #4 [ffff88052a351bf0] __bad_area_nosemaphore at ffffffff81048f80 >>> #5 [ffff88052a351c40] bad_area_nosemaphore at ffffffff81049183 >>> #6 [ffff88052a351c50] do_page_fault at ffffffff8150a32e >>> #7 [ffff88052a351d60] page_fault at ffffffff81506d55 >>> [exception RIP: xs_tcp_reuse_connection+24] >>> RIP: ffffffffa0439518 RSP: ffff88052a351e10 RFLAGS: 00010282 >>> RAX: ffff8824d1d523c0 RBX: ffff880d0d2d1000 RCX: ffff88407f3ae088 >>> RDX: 0000000000000000 RSI: 0000000000001d00 RDI: ffff880d0d2d1000 >>> RBP: ffff88052a351e20 R8: ffff88407f3af260 R9: ffffffff819ab880 >>> R10: 0000000000000000 R11: ffff883f03de4820 R12: 00000000fffffff5 >>> R13: ffff880d0d2d1000 R14: ffff8815e260b840 R15: 0000000000000000 >>> ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 >>> #8 [ffff88052a351e28] xs_tcp_setup_socket at ffffffffa043b01a [sunrpc] >>> #9 [ffff88052a351e58] process_one_work at ffffffff8108c0d9 >>> #10 [ffff88052a351ea8] worker_thread at ffffffff8108ca1a >>> #11 [ffff88052a351ee8] kthread at ffffffff81090ff7 >>> #12 [ffff88052a351f48] kernel_thread_helper at ffffffff8150fe84 >>> >>> In xs_tcp_setup_socket, if the xprt->sock is not NULL, it calls >>> xs_tcp_reuse_connection. But in xs_tcp_reuse_connection, the sock and >>> inet is seen to be zero when crash happened >>> >>> crash> sock_xprt.sock ffff880d0d2d1000 >>> sock = 0x0 >>> crash> sock_xprt.inet ffff880d0d2d1000 >>> inet = 0x0 >>> >>> the xprt.state is 532 which is XPRT_CONNECTING|XPRT_BOUND|XPRT_INITIALIZED >>> >>> This looks like a race with xs_reset_transport(). >>> >>> The fix is to wait the cancel and wait until connect_worker finishes. >>> >>> Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> >>> --- >>> net/sunrpc/xprtsock.c | 3 +++ >>> 1 file changed, 3 insertions(+) >>> >>> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c >>> index 3b305ab..718c57f 100644 >>> --- a/net/sunrpc/xprtsock.c >>> +++ b/net/sunrpc/xprtsock.c >>> @@ -869,6 +869,9 @@ static void xs_reset_transport(struct sock_xprt >>> *transport) >>> if (sk == NULL) >>> return; >>> + /* avoid a race with xs_tcp_setup_socket */ >>> + cancel_delayed_work_sync(&transport->connect_worker); >>> + >>> transport->srcport = 0; >>> write_lock_bh(&sk->sk_callback_lock); > In mainline, there are only 2 callers of xs_reset_transport(): > 1) xs_close(), which already performs the above call > 2) xs_udp_setup_socket() which cannot conflict with xs_tcp_setup_socket() > > Cheers > Trond > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2014-10-28 1:03 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <1413881868-13111-1-git-send-email-wen.gang.wang@oracle.com>
2014-10-27 1:03 ` [PATCH] [SUNRPC]: avoid race between xs_reset_transport and xs_tcp_setup_socket Wengang
2014-10-27 11:52 ` Trond Myklebust
2014-10-28 1:05 ` Wengang
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox