Re: [RFC] [PATCH 1/1] tcp-autotuning-on-recv-window-fix

From: "J. Bruce Fields" <bfields@fieldses.org>
To: Dean Hildebrand <seattleplus@gmail.com>
Cc: Olga Kornievskaia <aglo@citi.umich.edu>, linux-nfs@vger.kernel.org
Subject: Re: [RFC] [PATCH 1/1] tcp-autotuning-on-recv-window-fix
Date: Wed, 22 Oct 2008 17:52:49 -0400
Message-ID: <20081022215249.GD7454@fieldses.org>
In-Reply-To: <48FF9B87.6050909@gmail.com>

On Wed, Oct 22, 2008 at 02:30:47PM -0700, Dean Hildebrand wrote:
>
>
> J. Bruce Fields wrote:
>> On Tue, Oct 21, 2008 at 02:31:38PM -0400, Olga Kornievskaia wrote:
>>   
>>> From: Olga Kornievskaia <aglo@citi.umich.edu>
>>> Date: Tue, 21 Oct 2008 14:13:47 -0400
>>> Subject: [RFC] [PATCH 1/1] tcp-autotuning-on-recv-window-fix
>>>
>>> This patch allows the NFSv4 server to make use of TCP autotuning behaviour,
>>> which was previously disabled by setting the sk_userlocks variable.
>>>
>>> This patch sets the receive buffers to be big enough to receive the
>>> whole RPC request. This buffer size has to be set on the listening
>>> socket, not on the accepted socket as was previously done.
>>
>> The point there being that our previous buffer-size settings were made
>> too late to actually have an effect?
>>
>>   
>>> This patch removes the code that readjusts the receive/send buffer sizes for
>>> the accepted socket. Previously this code was used to influence the TCP
>>> window management behaviour, which is no longer needed when autotuning
>>> is enabled.
>>
>> Could we get a really brief summary of the performance improvement for a
>> high-speed network, to include in the commit message?
>>
>> The one remaining worry I recall is that we assume the tcp autotuning
>> never decreases the size of the buffer below the size we initially
>> requested.  Apparently that assumption is true.  There's some worry
>> about whether that's true by design or merely true of the current
>> implementation.
>>   
> If it does happen, I assume the fix there is to set the minimum tcp  
> buffer settings big enough for a single request?

That might be a workaround.  The fix would be a kernel patch.  That
doesn't look likely.

> In fact, I believe the big impact here is that NFS will finally start
> using the Linux TCP buffer settings (like every other tool).

You're talking about the various sysctls?

Since they're global, I don't think they're really very useful as tools
to tune nfsd.

--b.

> Is there any way to get some documentation out there that this is
> the case?  Maybe an addition to the website would be the right place
> for now?
> Dean
>> That doesn't look like a big worry--I'm inclined to apply this patch as
>> is--but moving the sk_{rcv,snd}buf assignments to a simple function in
>> the networking code and documenting the requirements there might be a
>> nice thing to do (as a separate patch).
>>
>> --b.
>>
>>   
>>> Signed-off-by: Olga Kornievskaia <aglo@citi.umich.edu>
>>> ---
>>>  net/sunrpc/svcsock.c |   35 +++++++----------------------------
>>>  1 files changed, 7 insertions(+), 28 deletions(-)
>>>
>>> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
>>> index 3e65719..4bb535e 100644
>>> --- a/net/sunrpc/svcsock.c
>>> +++ b/net/sunrpc/svcsock.c
>>> @@ -349,7 +349,6 @@ static void svc_sock_setbufsize(struct socket *sock, unsigned int snd,
>>>  	lock_sock(sock->sk);
>>>  	sock->sk->sk_sndbuf = snd * 2;
>>>  	sock->sk->sk_rcvbuf = rcv * 2;
>>> -	sock->sk->sk_userlocks |= SOCK_SNDBUF_LOCK|SOCK_RCVBUF_LOCK;
>>>  	release_sock(sock->sk);
>>>  #endif
>>>  }
>>> @@ -801,23 +800,6 @@ static int svc_tcp_recvfrom(struct svc_rqst *rqstp)
>>>  		test_bit(XPT_CONN, &svsk->sk_xprt.xpt_flags),
>>>  		test_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags));
>>>  
>>> -	if (test_and_clear_bit(XPT_CHNGBUF, &svsk->sk_xprt.xpt_flags))
>>> -		/* sndbuf needs to have room for one request
>>> -		 * per thread, otherwise we can stall even when the
>>> -		 * network isn't a bottleneck.
>>> -		 *
>>> -		 * We count all threads rather than threads in a
>>> -		 * particular pool, which provides an upper bound
>>> -		 * on the number of threads which will access the socket.
>>> -		 *
>>> -		 * rcvbuf just needs to be able to hold a few requests.
>>> -		 * Normally they will be removed from the queue
>>> -		 * as soon a a complete request arrives.
>>> -		 */
>>> -		svc_sock_setbufsize(svsk->sk_sock,
>>> -				    (serv->sv_nrthreads+3) * serv->sv_max_mesg,
>>> -				    3 * serv->sv_max_mesg);
>>> -
>>>  	clear_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags);
>>>  
>>>  	/* Receive data. If we haven't got the record length yet, get
>>> @@ -1065,15 +1047,6 @@ static void svc_tcp_init(struct svc_sock *svsk, struct svc_serv *serv)
>>>  
>>>  		tcp_sk(sk)->nonagle |= TCP_NAGLE_OFF;
>>>  
>>> -		/* initialise setting must have enough space to
>>> -		 * receive and respond to one request.
>>> -		 * svc_tcp_recvfrom will re-adjust if necessary
>>> -		 */
>>> -		svc_sock_setbufsize(svsk->sk_sock,
>>> -				    3 * svsk->sk_xprt.xpt_server->sv_max_mesg,
>>> -				    3 * svsk->sk_xprt.xpt_server->sv_max_mesg);
>>> -
>>> -		set_bit(XPT_CHNGBUF, &svsk->sk_xprt.xpt_flags);
>>>  		set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags);
>>>  		if (sk->sk_state != TCP_ESTABLISHED)
>>>  			set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
>>> @@ -1143,8 +1116,14 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
>>>  	/* Initialize the socket */
>>>  	if (sock->type == SOCK_DGRAM)
>>>  		svc_udp_init(svsk, serv);
>>> -	else
>>> +	else {
>>> +		/* initialise setting must have enough space to
>>>     
>>
>> s/initialise/initial/
>>
>>   
>>> +		 * receive and respond to one request.
>>> +		 */
>>> +		svc_sock_setbufsize(svsk->sk_sock, 4 * serv->sv_max_mesg,
>>> +					4 * serv->sv_max_mesg);
>>>  		svc_tcp_init(svsk, serv);
>>> +	}
>>>  
>>>  	dprintk("svc: svc_setup_socket created %p (inet %p)\n",
>>>  				svsk, svsk->sk_sk);
>>> -- 
>>> 1.5.0.2
>>>
>>>     
>>
>>


