linux-nfs.vger.kernel.org archive mirror
From: NeilBrown <neilb@suse.de>
To: "J.Bruce Fields" <bfields@citi.umich.edu>
Cc: Ben Myers <bpm@sgi.com>, Olga Kornievskaia <aglo@citi.umich.edu>,
	NFS <linux-nfs@vger.kernel.org>
Subject: Re: [PATCH] NFSD/sunrpc: avoid deadlock on TCP connection due to memory pressure.
Date: Fri, 26 Jul 2013 06:33:03 +1000
Message-ID: <20130726063303.0d1495b3@notabene.brown>
In-Reply-To: <20130725201805.GB17962@fieldses.org>

On Thu, 25 Jul 2013 16:18:05 -0400 "J.Bruce Fields" <bfields@citi.umich.edu>
wrote:

> On Thu, Jul 25, 2013 at 11:30:23AM +1000, NeilBrown wrote:
> > 
> > Since we enabled auto-tuning for sunrpc TCP connections we do not
> > guarantee that there is enough write-space on each connection to
> > queue a reply.
> > 
> > If memory pressure causes the window to shrink too small, the request
> > throttling in sunrpc/svc will not accept any requests so no more requests
> > will be handled.  Even when pressure decreases the window will not
> > grow again until data is sent on the connection.
> > This means we get a deadlock:  no requests will be handled until there
> > is more space, and no space will be allocated until a request is
> > handled.
> > 
> > This can be simulated by modifying svc_tcp_has_wspace to inflate the
> > number of bytes required and removing the 'svc_sock_setbufsize' calls
> > in svc_setup_socket.
> 
> Ah-hah!
> 
> > I found that multiplying by 16 was enough to make the requirement
> > exceed the default allocation.  With this modification in place:
> >    mount -o vers=3,proto=tcp 127.0.0.1:/home /mnt
> > would block and eventually time out because the nfs server could not
> > accept any requests.
> 
> So, this?:

Close enough.  I just put "//" in front of the lines I didn't want rather
than delete them.  But yes: exactly that effect.


> 
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index 305374d..36de50d 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -1193,6 +1193,7 @@ static int svc_tcp_has_wspace(struct svc_xprt *xprt)
>  	if (test_bit(XPT_LISTENER, &xprt->xpt_flags))
>  		return 1;
>  	required = atomic_read(&xprt->xpt_reserved) + serv->sv_max_mesg;
> +	required *= 16;
>  	if (sk_stream_wspace(svsk->sk_sk) >= required)
>  		return 1;
>  	set_bit(SOCK_NOSPACE, &svsk->sk_sock->flags);
> @@ -1378,14 +1379,8 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
>  	/* Initialize the socket */
>  	if (sock->type == SOCK_DGRAM)
>  		svc_udp_init(svsk, serv);
> -	else {
> -		/* initialise setting must have enough space to
> -		 * receive and respond to one request.
> -		 */
> -		svc_sock_setbufsize(svsk->sk_sock, 4 * serv->sv_max_mesg,
> -					4 * serv->sv_max_mesg);
> +	else
>  		svc_tcp_init(svsk, serv);
> -	}
>  
>  	dprintk("svc: svc_setup_socket created %p (inet %p)\n",
>  				svsk, svsk->sk_sk);
> 
> > This patch relaxes the request throttling to always allow at least one
> > request through per connection.  It does this by checking that both
> >   sk_stream_min_wspace() and xprt->xpt_reserved
> > are zero.
> > The first is zero when the TCP transmit queue is empty.
> > The second is zero when there are no RPC requests being processed.
> > When both of these are zero the socket is idle and so one more
> > request can safely be allowed through.
> > 
> > Applying this patch allows the above mount command to succeed cleanly.
> > Tracing shows that the allocated write buffer space quickly grows and
> > after a few requests are handled, the extra tests are no longer needed
> > to permit further requests to be processed.
> > 
> > The main purpose of request throttling is to handle the case when one
> > client is slow at collecting replies and the send queue gets full of
> > replies that the client hasn't acknowledged (at the TCP level) yet.
> > As we only change behaviour when the send queue is empty this main
> > purpose is still preserved.
> > 
> > Reported-by: Ben Myers <bpm@sgi.com>
> > Signed-off-by: NeilBrown <neilb@suse.de>
> > 
> > --
> > As you can see I've changed the patch.  While writing up the above
> > description I realised there was a weakness and so added the
> > sk_stream_min_wspace test.  That allowed me to write the final paragraph.
> 
> This is great, thanks!
> 
> Inclined to queue it up for 3.11 and stable....

I'd agree for 3.11.
It feels a bit border-line for stable.  "dead-lock" and "has been seen in the
wild" are technically enough justification...
I'd probably mark it as "please don't apply to -stable until 3.11 is released"
or something like that, just for a bit of breathing space.
Your call though.

NeilBrown

> 
> --b.
> 
> > 
> > NeilBrown
> > 
> > 
> > diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> > index 305374d..7762b9f 100644
> > --- a/net/sunrpc/svcsock.c
> > +++ b/net/sunrpc/svcsock.c
> > @@ -1193,7 +1193,9 @@ static int svc_tcp_has_wspace(struct svc_xprt *xprt)
> >  	if (test_bit(XPT_LISTENER, &xprt->xpt_flags))
> >  		return 1;
> >  	required = atomic_read(&xprt->xpt_reserved) + serv->sv_max_mesg;
> > -	if (sk_stream_wspace(svsk->sk_sk) >= required)
> > +	if (sk_stream_wspace(svsk->sk_sk) >= required ||
> > +	    (sk_stream_min_wspace(svsk->sk_sk) == 0 &&
> > +	     atomic_read(&xprt->xpt_reserved) == 0))
> >  		return 1;
> >  	set_bit(SOCK_NOSPACE, &svsk->sk_sock->flags);
> >  	return 0;
> 
> 

