From: Ben Myers <bpm@sgi.com>
To: "J. Bruce Fields" <bfields@fieldses.org>
Cc: linux-nfs@vger.kernel.org
Subject: Re: sunrpc: socket buffer size tuneable
Date: Fri, 25 Jan 2013 13:10:25 -0600
Message-ID: <20130125191025.GV30652@sgi.com>
In-Reply-To: <20130125184522.GB29596@fieldses.org>

Hey Bruce,

On Fri, Jan 25, 2013 at 01:45:22PM -0500, J. Bruce Fields wrote:
> On Thu, Jan 24, 2013 at 06:59:30PM -0600, Ben Myers wrote:
> > At 1020 threads the send buffer size wraps and becomes negative, causing
> > the nfs server to grind to a halt.
> 
> Actually this appears to be a problem in only one place:
> 
> > @@ -525,18 +575,17 @@ static int svc_udp_recvfrom(struct svc_r
> >  	size_t len;
> >  	int err;
> >  
> > -	if (test_and_clear_bit(XPT_CHNGBUF, &svsk->sk_xprt.xpt_flags))
> > -	    /* udp sockets need large rcvbuf as all pending
> > -	     * requests are still in that buffer.  sndbuf must
> > -	     * also be large enough that there is enough space
> > -	     * for one reply per thread.  We count all threads
> > -	     * rather than threads in a particular pool, which
> > -	     * provides an upper bound on the number of threads
> > -	     * which will access the socket.
> > -	     */
> > -	    svc_sock_setbufsize(svsk->sk_sock,
> > -				(serv->sv_nrthreads+3) * serv->sv_max_mesg,
> > -				(serv->sv_nrthreads+3) * serv->sv_max_mesg);
> 
> Elsewhere it's not dependent on the number of threads:
> 
> >  	svc_sock_setbufsize(svsk->sk_sock,
> > -			    3 * svsk->sk_xprt.xpt_server->sv_max_mesg,
> > -			    3 * svsk->sk_xprt.xpt_server->sv_max_mesg);
> ...
> > -		svc_sock_setbufsize(svsk->sk_sock, 4 * serv->sv_max_mesg,
> > -					4 * serv->sv_max_mesg);
> 
> Since sv_max_mesg is at most a megabyte, those won't overflow.
> 
> If you saw an overflow in the tcp case, I suspect that was with a kernel
> before 9660439861aa8dbd5e2b8087f33e20760c2c9afc "svcrpc: take advantage
> of tcp autotuning"?

Correct.  This is an updated version of an old patch I've been carrying around
for a while.
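
For what it's worth, here is roughly where the wrap comes from, as a
back-of-the-envelope check.  I'm assuming 4K pages, sv_max_mesg at the 1MB
max payload rounded up by a page, and that svc_sock_setbufsize doubles the
value it is handed before storing it in the signed sk_sndbuf -- that's my
reading of the old code, so treat the exact numbers as approximate:

#include <limits.h>
#include <stdio.h>

/* Userspace sketch of the pre-autotuning sndbuf arithmetic at a large
 * thread count -- not kernel code, just the numbers. */
int main(void)
{
	unsigned int nrthreads = 1020;
	unsigned int sv_max_mesg = (1024 * 1024) + 4096;	/* assumed 1MB + page */
	unsigned long long want =
		(unsigned long long)(nrthreads + 3) * sv_max_mesg * 2;

	printf("requested sndbuf: %llu, INT_MAX: %d\n", want, INT_MAX);
	if (want > (unsigned long long)INT_MAX)
		printf("wraps negative when stored in a signed int\n");
	return 0;
}

That comes out around 2.15GB, just past INT_MAX, which lines up with the
~1020-thread failure point.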

> Honestly I'm a bit mystified by the udp comment in the code above; do we
> *really* need buffers that large?

I've been assuming that the author was used to running with just a few nfsd
threads, and didn't expect that someone would try a large number.

> And if so, why don't we set them to
> that size at the start in svc_udp_init?

Yeah, that's strange...

> But in the udp case I'm inclined to do just the minimum to fix the
> overflow and otherwise leave the code alone--it doesn't get as much use
> and testing as it once did, so it's probably better to leave it alone.
> 
> So something like the below.

It looks reasonable.  I'll give it a try.
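
Quick pencil check that the clamp keeps things in range, under the same
sv_max_mesg and doubling assumptions as above (so a sketch, not gospel):

	(min(nrthreads, 128) + 3) * (1024*1024 + 4096) * 2
	= 131 * 1052672 * 2
	= 275,800,064	(well under INT_MAX = 2,147,483,647)

so the signed buffer size can't go negative no matter how many nfsd threads
are configured.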
 
> This still leaves the issue of tcp buffer size under memory pressure.

Yep.  That may be why this patch works for me.

Thanks,
Ben

> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index 0a148c9..f3a525c 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -525,7 +525,7 @@ static int svc_udp_recvfrom(struct svc_rqst *rqstp)
>  	size_t len;
>  	int err;
>  
> -	if (test_and_clear_bit(XPT_CHNGBUF, &svsk->sk_xprt.xpt_flags))
> +	if (test_and_clear_bit(XPT_CHNGBUF, &svsk->sk_xprt.xpt_flags)) {
>  	    /* udp sockets need large rcvbuf as all pending
>  	     * requests are still in that buffer.  sndbuf must
>  	     * also be large enough that there is enough space
> @@ -534,9 +534,13 @@ static int svc_udp_recvfrom(struct svc_rqst *rqstp)
>  	     * provides an upper bound on the number of threads
>  	     * which will access the socket.
>  	     */
> +	    unsigned int threads = serv->sv_nrthreads;
> +	    /* arbitrary limit, mainly just to prevent overflow: */
> +	    threads = min(threads, 128U);
>  	    svc_sock_setbufsize(svsk->sk_sock,
> -				(serv->sv_nrthreads+3) * serv->sv_max_mesg,
> -				(serv->sv_nrthreads+3) * serv->sv_max_mesg);
> +				(threads+3) * serv->sv_max_mesg,
> +				(threads+3) * serv->sv_max_mesg);
> +	}
>  
>  	clear_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags);
>  	skb = NULL;
