From: "Igor O." <igoros@gmail.com>
To: linux-nfs@vger.kernel.org
Subject: Re: Race between RPC queuing on xprt_pending and a write space notification?
Date: Thu, 4 May 2017 16:29:03 -0700 [thread overview]
Message-ID: <CABGc9E5r_fSVuLL0uHD7ESeRhZfDwBmAHPqXyx9zW9H2U1i==Q@mail.gmail.com> (raw)
In-Reply-To: <CABGc9E44m=UvKFtTEaa9hiOJuXfYaNjPr=H92VvqvTbKQ828qQ@mail.gmail.com>
I found a fix for a race that sounds very similar to what I am seeing.
I am not 100% sure yet but I am reporting progress here in case anyone
is looking into this.
commit d48f9ce73c997573e1b512893fa6eddf353a6f69
Author: David Vrabel <david.vrabel@citrix.com>
Date: Mon Sep 19 13:58:30 2016 +0100
sunrpc: fix write space race causing stalls
Write space becoming available may race with putting the task to sleep
in xprt_wait_for_buffer_space(). The existing mechanism to avoid the
race does not work.
This (edited) partial trace illustrates the problem:
[1] rpc_task_run_action: task:43546@5 ... action=call_transmit
[2] xs_write_space <-xs_tcp_write_space
[3] xprt_write_space <-xs_write_space
[4] rpc_task_sleep: task:43546@5 ...
[5] xs_write_space <-xs_tcp_write_space
[1] Task 43546 runs but is out of write space.
[2] Space becomes available, xs_write_space() clears the
SOCKWQ_ASYNC_NOSPACE bit.
[3] xprt_write_space() attemts to wake xprt->snd_task (== 43546), but
this has not yet been queued and the wake up is lost.
[4] xs_nospace() is called which calls xprt_wait_for_buffer_space()
which queues task 43546.
[5] The call to sk->sk_write_space() at the end of xs_nospace() (which
is supposed to handle the above race) does not call
xprt_write_space() as the SOCKWQ_ASYNC_NOSPACE bit is clear and
thus the task is not woken.
Fix the race by resetting the SOCKWQ_ASYNC_NOSPACE bit in xs_nospace()
so the second call to sk->sk_write_space() calls xprt_write_space().
Suggested-by: Trond Myklebust <trondmy@primarydata.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
cc: stable@vger.kernel.org # 4.4
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Igor
On Thu, May 4, 2017 at 9:22 AM, Igor O. <igoros@gmail.com> wrote:
> Hi all,
>
> I am trying to understand an unresponsive NFS client issue that I can
> reproduce in a particular workload running for a few hours. The VM
> running the NFS client is Ubuntu 14.04.5, so this is using the
> 4.4.0-34 kernel.
>
> Based on rpc_debug traces, I have a possible hypothesis on how this
> can happen as a result of a race on the xprt_pending queue. I was
> wondering whether anyone here can comment on whether the hypothesis is
> plausible or not.
>
> The client is stuck in the following state:
>
> * There is one pending RPC (61009) that began transmitting, but
> the transmission was incomplete due to EAGAIN, and will never complete
> + RPC 61009 is forever reported as pending by
> sunrpc/rpc_clnt/*/tasks in debugfs
> + Obviously no other RPC can transmit, since the transport is
> locked by RPC 61009
>
> * But, there is space in the output TCP buffer:
> + Output TCP socket buffer is empty, according to netstat
> + Server is reporting non-zero TCP window to the client,
> according to tcpdump
>
> This is the rpc_debug trace of how RPC 61009 got into this state:
>
> May 4 06:12:02 ub14045-gold kernel: [155440.176714] RPC: 61009
> xprt_transmit(524448)
> May 4 06:12:02 ub14045-gold kernel: [155440.176734] RPC:
> xs_tcp_send_request(524448) = 0
> May 4 06:12:02 ub14045-gold kernel: [155440.176735] RPC:
> xs_tcp_send_request(261456) = -11
> May 4 06:12:02 ub14045-gold kernel: [155440.176738] RPC: 61009
> xmit incomplete (261456 left of 524448)
> May 4 06:12:02 ub14045-gold kernel: [155440.176771] RPC:
> write space: waking waiting task on xprt ffff88003d1e7000
> May 4 06:12:02 ub14045-gold kernel: [155440.176778] RPC: 61009
> sleep_on(queue "xprt_pending" time 4333753084)
> May 4 06:12:02 ub14045-gold kernel: [155440.176778] RPC: 61009
> added to queue ffff88003d1e7258 "xprt_pending"
>
> In other words:
>
> 1. 61009 gets EAGAIN, reports xmit incomplete
> 2. In race, write space becomes available, the xprt_pending queue
> tasks get woken. The queue is empty, though, because 61009 hasn't
> queued itself yet.
> 3. 61009 queues itself on xprt_pending queue, but it just missed
> the notification.
> 4. 61009 is forever stuck on xprt_pending queue. There won't be
> any future "write space" notification and it will never time out.
>
> At this point, there isn't any notification that will wake 61009 on
> the xprt_pending queue. Since I have a hard mount, there will be no
> timeout on the wait, as implemented in xprt_wait_for_buffer_space().
>
> Can anyone provide insight on whether the xprt_pending queue handles
> the above race condition? Is there another mechanism apart from "write
> space" notification (that can be missed) and timeout (not scheduled
> due to hard mount) that should still wake the RPC from the
> xprt_pending queue?
>
> Apart from that, any suggested next steps to investigate this?
>
> Thank you for reading,
> Igor Ostrovsky
prev parent reply other threads:[~2017-05-04 23:29 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-05-04 16:22 Race between RPC queuing on xprt_pending and a write space notification? Igor O.
2017-05-04 23:29 ` Igor O. [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CABGc9E5r_fSVuLL0uHD7ESeRhZfDwBmAHPqXyx9zW9H2U1i==Q@mail.gmail.com' \
--to=igoros@gmail.com \
--cc=linux-nfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).