From: "J. Bruce Fields" <bfields@fieldses.org>
To: "Myklebust, Trond" <Trond.Myklebust@netapp.com>
Cc: Ben Myers <bpm@sgi.com>, Olga Kornievskaia <aglo@citi.umich.edu>,
"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
Jim Rees <rees@umich.edu>
Subject: Re: sunrpc: socket buffer size tuneable
Date: Fri, 25 Jan 2013 17:34:54 -0500
Message-ID: <20130125223454.GK29596@fieldses.org>
In-Reply-To: <4FA345DA4F4AE44899BD2B03EEEC2FA91833C0D5@sacexcmbx05-prd.hq.netapp.com>
On Fri, Jan 25, 2013 at 10:20:12PM +0000, Myklebust, Trond wrote:
> > -----Original Message-----
> > From: J. Bruce Fields [mailto:bfields@fieldses.org]
> > Sent: Friday, January 25, 2013 4:57 PM
> > To: Myklebust, Trond
> > Cc: Ben Myers; Olga Kornievskaia; linux-nfs@vger.kernel.org; Jim Rees
> > Subject: Re: sunrpc: socket buffer size tuneable
> >
> > On Fri, Jan 25, 2013 at 09:45:12PM +0000, Myklebust, Trond wrote:
> > > > > -----Original Message-----
> > > > > From: J. Bruce Fields [mailto:bfields@fieldses.org]
> > > > > Sent: Friday, January 25, 2013 4:35 PM
> > > > > To: Myklebust, Trond
> > > > > Cc: Ben Myers; Olga Kornievskaia; linux-nfs@vger.kernel.org; Jim Rees
> > > > > Subject: Re: sunrpc: socket buffer size tuneable
> > > >
> > > > On Fri, Jan 25, 2013 at 09:29:09PM +0000, Myklebust, Trond wrote:
> > > > > > > -----Original Message-----
> > > > > > > From: J. Bruce Fields [mailto:bfields@fieldses.org]
> > > > > > > Sent: Friday, January 25, 2013 4:21 PM
> > > > > > > To: Myklebust, Trond
> > > > > > > Cc: Ben Myers; Olga Kornievskaia; linux-nfs@vger.kernel.org; Jim Rees
> > > > > > > Subject: Re: sunrpc: socket buffer size tuneable
> > > > > >
> > > > > > On Fri, Jan 25, 2013 at 09:12:55PM +0000, Myklebust, Trond
> > > > > > wrote:
> > > > >
> > > > > > > > Why is it not sufficient to clamp the TCP values of 'snd' and
> > > > > > > > 'rcv' using sysctl_tcp_wmem/sysctl_tcp_rmem?
> > > > > > > > ...and clamp the UDP values using
> > > > > > > > sysctl_[wr]mem_min/sysctl_[wr]mem_max?
> > > > > >
> > > > > > Yeah, I was just looking at that--so, Ben, something like:
> > > > > >
> > > > > > > echo "1048576 1048576 4194304" >/proc/sys/net/ipv4/tcp_wmem
> > > > > >
> > > > > > But I'm unclear on some of the details: do we need to set the
> > > > > > minimum or only the default? And does it need any more
> > > > > > allowance for protocol overhead?
> > > > >
> > > > > > I meant adding a check either to svc_sock_setbufsize or to the 2
> > > > > > call-sites that enforces the above limits.
> > > >
> > > > I lost you.
> > > >
> > > > It's not svc_sock_setbufsize that's setting too-small values, if
> > > > that's what you mean.
> > > >
> > >
> > > I understood that the problem was svc_udp_recvfrom() and
> > > svc_setup_socket() were using negative values in the calls to
> > > svc_sock_setbufsize(). Looking again at svc_setup_socket(), I don't
> > > see how that could do so, but svc_udp_recvfrom() definitely has
> > > potential to cause damage.
> >
> > Right, the changelog was confusing, the problem they're actually hitting is
> > with tcp. Looks like tcp autotuning is decreasing the send buffer below the
> > size we requested in svc_sock_setbufsize().
>
> Yes. As far as I can tell, that is endemic unless you lock the sndbuf size. Grep for sk_stream_moderate_sndbuf(), and you'll see what I mean.

Yes. So I guess I'll investigate a little more, then make an amateur
attempt at an interface to enforce a minimum and see if the network
developers think it's a reasonable idea.
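
For illustration only, here is a minimal userspace sketch of what "locking the sndbuf size" looks like from outside the kernel. It is not the in-kernel interface proposed above, and the helper name lock_sndbuf is hypothetical. On Linux, an explicit SO_SNDBUF marks the socket's send buffer as user-locked, which is the condition sk_stream_moderate_sndbuf() checks before shrinking it. Note that this pins the size in both directions, whereas the idea above is only a floor.

```python
import socket


def lock_sndbuf(sock, requested):
    """Request a fixed send buffer size; return the size the kernel reports.

    Hypothetical helper for illustration. On Linux the reported value is
    normally double the request (bookkeeping overhead) and the request is
    capped by net.core.wmem_max.
    """
    # An explicit SO_SNDBUF sets the user lock on the send buffer, so TCP
    # autotuning no longer grows or shrinks it.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, requested)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        print("autotuned default:", s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
        print("after locking:    ", lock_sndbuf(s, 64 * 1024))
    finally:
        s.close()
```

The trade-off is that a locked buffer also loses the benefit of autotuning on fast links, which is presumably why a minimum rather than a lock is the more attractive interface.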

Alternatively: is there some better strategy for the server here?
It's trying to prevent threads from blocking by refusing to accept more
RPCs than it has send buffer space to reply to.

Presumably the fear is that all your threads could block trying to send
responses to a small number of slow clients.

Are there better ways to prevent that?

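
The admission rule described above can be sketched as a toy model. This is not the sunrpc code: the class SendSpaceGate and its method names are hypothetical, and it assumes every reply is charged at a fixed worst-case size.

```python
class SendSpaceGate:
    """Toy model: admit a request only if its worst-case reply fits."""

    def __init__(self, sndbuf, max_reply):
        self.sndbuf = sndbuf        # current send buffer size, in bytes
        self.max_reply = max_reply  # assumed worst-case reply size, in bytes
        self.reserved = 0           # space promised to admitted requests

    def try_admit(self):
        """Return True and reserve space if one more reply is guaranteed to fit."""
        if self.reserved + self.max_reply > self.sndbuf:
            return False            # request stays queued; no thread blocks
        self.reserved += self.max_reply
        return True

    def reply_sent(self):
        """Release the reservation of one completed request."""
        self.reserved -= self.max_reply

    def shrink(self, new_sndbuf):
        """Stand-in for something else lowering the buffer size."""
        self.sndbuf = new_sndbuf


if __name__ == "__main__":
    MB = 1024 * 1024
    gate = SendSpaceGate(sndbuf=4 * MB, max_reply=1 * MB)
    print([gate.try_admit() for _ in range(5)])
    for _ in range(4):
        gate.reply_sent()
    gate.shrink(512 * 1024)
    print(gate.try_admit())
```

In this model, once the buffer drops below a single worst-case reply, no request is ever admitted, even with nothing outstanding. That would be consistent with the stall discussed earlier in the thread, though the model says nothing about whether a different strategy would avoid it.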
--b.