From: NeilBrown <neilb@suse.de>
To: "J.Bruce Fields" <bfields@citi.umich.edu>
Cc: Ben Myers <bpm@sgi.com>, Olga Kornievskaia <aglo@citi.umich.edu>,
NFS <linux-nfs@vger.kernel.org>
Subject: Re: [PATCH] NFSD/sunrpc: avoid deadlock on TCP connection due to memory pressure.
Date: Fri, 26 Jul 2013 06:33:03 +1000 [thread overview]
Message-ID: <20130726063303.0d1495b3@notabene.brown> (raw)
In-Reply-To: <20130725201805.GB17962@fieldses.org>
[-- Attachment #1: Type: text/plain, Size: 5229 bytes --]
On Thu, 25 Jul 2013 16:18:05 -0400 "J.Bruce Fields" <bfields@citi.umich.edu>
wrote:
> On Thu, Jul 25, 2013 at 11:30:23AM +1000, NeilBrown wrote:
> >
> > Since we enabled auto-tuning for sunrpc TCP connections we do not
> > guarantee that there is enough write-space on each connection to
> > queue a reply.
> >
> > If memory pressure causes the window to shrink too small, the request
> > throttling in sunrpc/svc will not accept any requests so no more requests
> > will be handled. Even when pressure decreases the window will not
> > grow again until data is sent on the connection.
> > This means we get a deadlock: no requests will be handled until there
> > is more space, and no space will be allocated until a request is
> > handled.
> >
> > This can be simulated by modifying svc_tcp_has_wspace to inflate the
> > number of byte required and removing the 'svc_sock_setbufsize' calls
> > in svc_setup_socket.
>
> Ah-hah!
>
> > I found that multiplying by 16 was enough to make the requirement
> > exceed the default allocation. With this modification in place:
> > mount -o vers=3,proto=tcp 127.0.0.1:/home /mnt
> > would block and eventually time out because the nfs server could not
> > accept any requests.
>
> So, this?:
Close enough. I just put "//" in front of the lines I didn't want rather
than delete them. But yes: exactly that effect.
>
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index 305374d..36de50d 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -1193,6 +1193,7 @@ static int svc_tcp_has_wspace(struct svc_xprt *xprt)
> if (test_bit(XPT_LISTENER, &xprt->xpt_flags))
> return 1;
> required = atomic_read(&xprt->xpt_reserved) + serv->sv_max_mesg;
> + required *= 16;
> if (sk_stream_wspace(svsk->sk_sk) >= required)
> return 1;
> set_bit(SOCK_NOSPACE, &svsk->sk_sock->flags);
> @@ -1378,14 +1379,8 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
> /* Initialize the socket */
> if (sock->type == SOCK_DGRAM)
> svc_udp_init(svsk, serv);
> - else {
> - /* initialise setting must have enough space to
> - * receive and respond to one request.
> - */
> - svc_sock_setbufsize(svsk->sk_sock, 4 * serv->sv_max_mesg,
> - 4 * serv->sv_max_mesg);
> + else
> svc_tcp_init(svsk, serv);
> - }
>
> dprintk("svc: svc_setup_socket created %p (inet %p)\n",
> svsk, svsk->sk_sk);
>
> > This patch relaxes the request throttling to always allow at least one
> > request through per connection. It does this by checking both
> > sk_stream_min_wspace() and xprt->xpt_reserved
> > are zero.
> > The first is zero when the TCP transmit queue is empty.
> > The second is zero when there are no RPC requests being processed.
> > When both of these are zero the socket is idle and so one more
> > request can safely be allowed through.
> >
> > Applying this patch allows the above mount command to succeed cleanly.
> > Tracing shows that the allocated write buffer space quickly grows and
> > after a few requests are handled, the extra tests are no longer needed
> > to permit further requests to be processed.
> >
> > The main purpose of request throttling is to handle the case when one
> > client is slow at collecting replies and the send queue gets full of
> > replies that the client hasn't acknowledged (at the TCP level) yet.
> > As we only change behaviour when the send queue is empty this main
> > purpose is still preserved.
> >
> > Reported-by: Ben Myers <bpm@sgi.com>
> > Signed-off-by: NeilBrown <neilb@suse.de>
> >
> > --
> > As you can see I've changed the patch. While writing up the above
> > description realised there was a weakness and so added the sk_stream_min_wspace
> > test. That allowed me to write the final paragraph.
>
> This is great, thanks!
>
> Inclined to queue it up for 3.11 and stable....
I'd agree for 3.11.
It feels a bit border-line for stable. "dead-lock" and "has been seen in the
wild" are technically enough justification...
I'd probably mark it as "pleas don't apply to -stable until 3.11 is released"
or something like that, just for a bit of breathing space.
Your call though.
NeilBrown
>
> --b.
>
> >
> > NeilBrown
> >
> >
> > diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> > index 305374d..7762b9f 100644
> > --- a/net/sunrpc/svcsock.c
> > +++ b/net/sunrpc/svcsock.c
> > @@ -1193,7 +1193,9 @@ static int svc_tcp_has_wspace(struct svc_xprt *xprt)
> > if (test_bit(XPT_LISTENER, &xprt->xpt_flags))
> > return 1;
> > required = atomic_read(&xprt->xpt_reserved) + serv->sv_max_mesg;
> > - if (sk_stream_wspace(svsk->sk_sk) >= required)
> > + if (sk_stream_wspace(svsk->sk_sk) >= required ||
> > + (sk_stream_min_wspace(svsk->sk_sk) == 0 &&
> > + atomic_read(&xprt->xpt_reserved) == 0))
> > return 1;
> > set_bit(SOCK_NOSPACE, &svsk->sk_sock->flags);
> > return 0;
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]
next prev parent reply other threads:[~2013-07-25 20:33 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20130710092255.0240a36d@notabene.brown>
2013-07-10 2:27 ` Is tcp autotuning really what NFS wants? J.Bruce Fields
2013-07-10 4:32 ` NeilBrown
2013-07-10 19:07 ` J.Bruce Fields
2013-07-15 4:32 ` NeilBrown
2013-07-16 1:58 ` J.Bruce Fields
2013-07-16 4:00 ` NeilBrown
2013-07-16 14:24 ` J.Bruce Fields
2013-07-18 0:03 ` Ben Myers
2013-07-24 21:07 ` J.Bruce Fields
2013-07-25 1:30 ` [PATCH] NFSD/sunrpc: avoid deadlock on TCP connection due to memory pressure NeilBrown
2013-07-25 12:35 ` Jim Rees
2013-07-25 20:18 ` J.Bruce Fields
2013-07-25 20:33 ` NeilBrown [this message]
2013-07-26 14:19 ` J.Bruce Fields
2013-07-30 2:48 ` NeilBrown
2013-08-01 2:49 ` J.Bruce Fields
2013-07-10 17:33 ` Is tcp autotuning really what NFS wants? Dean
2013-07-10 17:39 ` Ben Greear
2013-07-15 4:35 ` NeilBrown
2013-07-15 23:32 ` Ben Greear
2013-07-16 4:46 ` NeilBrown
2013-07-10 19:59 ` Michael Richardson
2013-07-15 1:26 ` Jim Rees
2013-07-15 5:02 ` NeilBrown
2013-07-15 11:57 ` Jim Rees
2013-07-15 13:42 ` Jim Rees
2013-07-16 1:10 ` NeilBrown
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130726063303.0d1495b3@notabene.brown \
--to=neilb@suse.de \
--cc=aglo@citi.umich.edu \
--cc=bfields@citi.umich.edu \
--cc=bpm@sgi.com \
--cc=linux-nfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).