public inbox for linux-nfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Wengang <wen.gang.wang@oracle.com>
To: Trond Myklebust <trondmy@gmail.com>
Cc: linux-nfs@vger.kernel.org
Subject: Re: [PATCH] [SUNRPC]: avoid race between xs_reset_transport and xs_tcp_setup_socket
Date: Tue, 28 Oct 2014 09:05:26 +0800	[thread overview]
Message-ID: <544EEBD6.8020004@oracle.com> (raw)
In-Reply-To: <CAABAsM5YLCDOek+ZY4XiSBvc6Z=P7invXopG=r=ZSSNQJ2S8zQ@mail.gmail.com>

Hi Trond,

Thanks for your review!
The problem happened against source code without cancel_delayed_work_sync
being called in xs_close(). I didn't realize the difference with between 
mainline code,
sorry for confusing and thanks for your time.

thanks,
wengang

于 2014年10月27日 19:52, Trond Myklebust 写道:
> On Mon, Oct 27, 2014 at 3:03 AM, Wengang <wen.gang.wang@oracle.com> wrote:
>> Could somebody help to review this patch please?
>>
>> thanks,
>> Wengang
>>
>> 于 2014年10月21日 16:57, Wengang Wang 写道:
>>> A panic with call trace like this:
>>>
>>> crash> bt
>>> PID: 1842   TASK: ffff8824d1d523c0  CPU: 29  COMMAND: "kworker/29:1"
>>>    #0 [ffff88052a351a40] machine_kexec at ffffffff8103b40d
>>>    #1 [ffff88052a351ab0] crash_kexec at ffffffff810b98c5
>>>    #2 [ffff88052a351b80] oops_end at ffffffff815077d8
>>>    #3 [ffff88052a351bb0] no_context at ffffffff81048dff
>>>    #4 [ffff88052a351bf0] __bad_area_nosemaphore at ffffffff81048f80
>>>    #5 [ffff88052a351c40] bad_area_nosemaphore at ffffffff81049183
>>>    #6 [ffff88052a351c50] do_page_fault at ffffffff8150a32e
>>>    #7 [ffff88052a351d60] page_fault at ffffffff81506d55
>>>       [exception RIP: xs_tcp_reuse_connection+24]
>>>       RIP: ffffffffa0439518  RSP: ffff88052a351e10  RFLAGS: 00010282
>>>       RAX: ffff8824d1d523c0  RBX: ffff880d0d2d1000  RCX: ffff88407f3ae088
>>>       RDX: 0000000000000000  RSI: 0000000000001d00  RDI: ffff880d0d2d1000
>>>       RBP: ffff88052a351e20   R8: ffff88407f3af260   R9: ffffffff819ab880
>>>       R10: 0000000000000000  R11: ffff883f03de4820  R12: 00000000fffffff5
>>>       R13: ffff880d0d2d1000  R14: ffff8815e260b840  R15: 0000000000000000
>>>       ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>>>    #8 [ffff88052a351e28] xs_tcp_setup_socket at ffffffffa043b01a [sunrpc]
>>>    #9 [ffff88052a351e58] process_one_work at ffffffff8108c0d9
>>> #10 [ffff88052a351ea8] worker_thread at ffffffff8108ca1a
>>> #11 [ffff88052a351ee8] kthread at ffffffff81090ff7
>>> #12 [ffff88052a351f48] kernel_thread_helper at ffffffff8150fe84
>>>
>>> In xs_tcp_setup_socket, if the xprt->sock is not NULL, it calls
>>> xs_tcp_reuse_connection. But in xs_tcp_reuse_connection, the sock and
>>> inet is seen to be zero when crash happened
>>>
>>> crash> sock_xprt.sock ffff880d0d2d1000
>>>     sock = 0x0
>>> crash> sock_xprt.inet ffff880d0d2d1000
>>>     inet = 0x0
>>>
>>> the xprt.state is 532 which is XPRT_CONNECTING|XPRT_BOUND|XPRT_INITIALIZED
>>>
>>> This looks like a race with xs_reset_transport().
>>>
>>> The fix is to wait the cancel and wait until connect_worker finishes.
>>>
>>> Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
>>> ---
>>>    net/sunrpc/xprtsock.c | 3 +++
>>>    1 file changed, 3 insertions(+)
>>>
>>> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
>>> index 3b305ab..718c57f 100644
>>> --- a/net/sunrpc/xprtsock.c
>>> +++ b/net/sunrpc/xprtsock.c
>>> @@ -869,6 +869,9 @@ static void xs_reset_transport(struct sock_xprt
>>> *transport)
>>>          if (sk == NULL)
>>>                  return;
>>>    +     /* avoid a race with xs_tcp_setup_socket */
>>> +       cancel_delayed_work_sync(&transport->connect_worker);
>>> +
>>>          transport->srcport = 0;
>>>          write_lock_bh(&sk->sk_callback_lock);
> In mainline, there are only 2 callers of xs_reset_transport():
> 1) xs_close(), which already performs the above call
> 2) xs_udp_setup_socket() which cannot conflict with xs_tcp_setup_socket()
>
> Cheers
>    Trond
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


      reply	other threads:[~2014-10-28  1:03 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <1413881868-13111-1-git-send-email-wen.gang.wang@oracle.com>
2014-10-27  1:03 ` [PATCH] [SUNRPC]: avoid race between xs_reset_transport and xs_tcp_setup_socket Wengang
2014-10-27 11:52   ` Trond Myklebust
2014-10-28  1:05     ` Wengang [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=544EEBD6.8020004@oracle.com \
    --to=wen.gang.wang@oracle.com \
    --cc=linux-nfs@vger.kernel.org \
    --cc=trondmy@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox